-
Notifications
You must be signed in to change notification settings - Fork 157
Description
CIR-2018-296
Currently, in Cypher 9, the result of a MERGE statement depends on the evaluation order - i.e. the order of rows coming in to the MERGE statement (through the "driving table" of the MERGE statement). This is an undesirable situation, because it means that formally defining the semantics of MERGE becomes more involved. It it also undesirable because in order to know the outcome the driving table needs to be ordered, which makes queries more complex and more expensive.
It would be beneficial if we could improve the semantics of MERGE so that the behaviour does not depend on the evaluation order of the driving table.
Any changes to the semantics of MERGE need to take care not to lose the ability of solving the cases MERGE was designed to solve:
- Match or create a node such that there is only a single node in the graph that matches the given pattern.
- Match or create a relationship between two nodes, such that there is only a single matching relationship between the two nodes.
MERGE extends these functionalities needed to support these use cases to larger patterns than single entities, and it is often - but not only - there that the dependency on evaluation order becomes apparent.
Example 1:
MERGE (a)-[:KNOWS]-(b) with driving table:
| a | b |
|---|---|
| Node[1] | Node[2] |
| Node[2] | Node[1] |
The direction of the created relationship is currently defined to be "left to right" (from a to b in this case) for the first encountered row. While this will match the pattern regardless of which node is a and which is b, the result of the clause depends on the order of the driving table. With the order stated above, a relationship will be created from Node[1] to Node[2], but if the order had been the opposite a relationship from Node[2] to Node[1] would be created.
Example 2:
Given graph created by CREATE (:A {num:1})-[:R]->(:A {num:2})
Execute query MATCH (n:A) MERGE (n)-[:R]->()-[:R]->()
If n matches (:A{num:1}) first and (:A{num:2}) second, the result will be that MERGE will create the pattern for both input rows. If however n first matches (:A{num:2}) and (:A{num:1}) second, the creation of the pattern starting from (:A{num:2}) will create enough nodes and relationships for the pattern to match when n is (:A{num:1}).
Proposed solution
I first heard of the basics of this solution from the group at University of Edinburgh, working on defining the semantics of Cypher.
The basic idea is to not allow MERGE to read its own writes. So instead of MERGE directly creating entities, one can think of it as only making note of what to create (the rows for which there are no matches), and then once the entire driving table has been seen the creations are performed.
If that is all that is done, MERGE would not solve the initial use cases, since for example MERGE (:X{y:z}) would create more than one node given a driving table that contains z=3 more than once. The use case MERGE intends to solve is for such cases to only create a single node for all such rows.
In order to address this as well, I propose that the creations during the create-phase of MERGE would perform idempotent creations, such that creations with the same inputs would all be collapsed to the same entities.
If we consider Example 2 above under these semantics, we would see the pattern created for both nodes under any evaluation order.
In order to ensure only a single relationship being created in Example 1, and to ensure that the direction of that relationship is the same under any evaluation order, I further propose that the creation direction of an undirected pattern in MERGE be based on the order of the nodes (as in the order ORDER BY on a column containing the nodes bound to a and b would produce) instead of the lexical order of the pattern.
ON MATCH and ON CREATE
Still remaining as dependant on evaluation order is the ON MATCH and ON CREATE subclauses of MERGE. The redefinition of MERGE as acting in two phases: one matching or marking for creation, and one performing the creations with the creations based on equal inputs being merged to create the same entities, means that multiple rows now get evaluated as having created the entity, even if it is the same entity for those rows. The ON CREATE branch would thus be taken for multiple rows that create the same entity. This might seem confusing, but actually has some benefit since it means that the ON MATCH and ON CREATE are no longer as dependant on evaluation order.