Resolving MERGE being dependant on evaluation order

# CIR-2018-296

Currently, in _Cypher 9_, the result of a `MERGE` statement depends on the evaluation order - i.e. the order of rows coming in to the `MERGE` statement (through the _"driving table"_ of the `MERGE` statement). This is an undesirable situation, because it means that formally defining the semantics of `MERGE` becomes more involved. It it also undesirable because in order to know the outcome the driving table needs to be ordered, which makes queries more complex and more expensive.

It would be beneficial if we could improve the semantics of `MERGE` so that the behaviour does not depend on the evaluation order of the _driving table_.

Any changes to the semantics of `MERGE` need to take care not to lose the ability of solving the cases `MERGE` was designed to solve:

- Match or create a node such that there is only a single node in the graph that matches the given pattern.
- Match or create a relationship between two nodes, such that there is only a single matching relationship between the two nodes.

`MERGE` extends these functionalities needed to support these use cases to larger patterns than single entities, and it is often - but not only - there that the dependency on evaluation order becomes apparent.

### Example 1:

`MERGE (a)-[:KNOWS]-(b)` with driving table:

a | b
--|--
Node[1] | Node[2]
Node[2] | Node[1]

The direction of the created relationship is currently defined to be "left to right" (from `a` to `b` in this case) for the _first encountered_ row. While this will match the pattern regardless of which node is `a` and which is `b`, the result of the clause depends on the order of the driving table. With the order stated above, a relationship will be created from `Node[1]` to `Node[2]`, but if the order had been the opposite a relationship from `Node[2]` to `Node[1]` would be created.

### Example 2:

Given graph created by `CREATE (:A {num:1})-[:R]->(:A {num:2})`
Execute query `MATCH (n:A) MERGE (n)-[:R]->()-[:R]->()`

If `n` matches `(:A{num:1})` first and `(:A{num:2})` second, the result will be that `MERGE` will *create* the pattern for both input rows. If however `n` first matches `(:A{num:2})` and `(:A{num:1})` second, the creation of the pattern starting from `(:A{num:2})` will create enough nodes and relationships for the pattern to match when `n` is `(:A{num:1})`.

## Proposed solution

_I first heard of the basics of this solution from the group at University of Edinburgh, working on defining the semantics of Cypher._

The basic idea is to not allow `MERGE` to read its own writes. So instead of `MERGE` directly creating entities, one can think of it as only making note of what to create (the rows for which there are no matches), and then once the entire driving table has been seen the creations are performed.

If that is all that is done, `MERGE` would not solve the initial use cases, since for example `MERGE (:X{y:z})` would create more than one node given a driving table that contains `z=3` more than once. The use case `MERGE` intends to solve is for such cases to only create a single node for all such rows.

In order to address this as well, I propose that the creations during the create-phase of `MERGE` would perform _idempotent creations_, such that creations with the same inputs would all be collapsed to the same entities.

If we consider _Example 2_ above under these semantics, we would see the pattern created for both nodes under any evaluation order.

In order to ensure only a single relationship being created in _Example 1_, and to ensure that the direction of that relationship is the same under any evaluation order, I further propose that the creation direction of an undirected pattern in `MERGE` be based on the order of the nodes (as in the order `ORDER BY` on a column containing the nodes bound to `a` and `b` would produce) instead of the lexical order of the pattern.

## `ON MATCH` and `ON CREATE`

Still remaining as dependant on evaluation order is the `ON MATCH` and `ON CREATE` subclauses of `MERGE`. The redefinition of `MERGE` as acting in two phases: one matching or marking for creation, and one performing the creations with the creations based on equal inputs being merged to create the same entities, means that multiple rows now get evaluated as having created the entity, even if it is the same entity for those rows. The `ON CREATE` branch would thus be taken for multiple rows that create the same entity. This might seem confusing, but actually has some benefit since it means that the `ON MATCH` and `ON CREATE` are no longer as dependant on evaluation order.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Resolving MERGE being dependant on evaluation order #296

CIR-2018-296

Example 1:

Example 2:

Proposed solution

`ON MATCH` and `ON CREATE`

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Resolving MERGE being dependant on evaluation order #296

Description

CIR-2018-296

Example 1:

Example 2:

Proposed solution

ON MATCH and ON CREATE

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

`ON MATCH` and `ON CREATE`