Replace YJS with custom server-authoritative document sync#655
Replace YJS with custom server-authoritative document sync#655softmarshmallow merged 10 commits intomainfrom
Conversation
… utilities - Add DocumentClock class for managing monotonic document clocks. - Introduce diff utilities for computing, applying, and composing document diffs. - Create protocol types for defining the wire protocol between client and server. - Implement transport layer with WebSocketTransport for communication. - Add presence management for tracking user presence in the document. - Include validation logic for incoming diffs to ensure structural integrity. - Set up TypeScript configuration for the project.
- Deleted y-document.ts and y-patches.ts files, removing Yjs dependencies. - Updated package.json to remove Yjs and related packages. - Introduced @grida/canvas-sync as a new dependency for document synchronization. - Refactored integration tests to utilize a new makeNode helper function for creating nodes. - Enhanced document worker to include CORS support and improved WebSocket message handling. - Implemented atomic write operations in SyncStorage to ensure data integrity during diff applications.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
WalkthroughReplaces the Yjs-based multiplayer sync with a new workspace package Changes
Sequence Diagram(s)sequenceDiagram
participant Editor as Editor
participant Adapter as DocumentSyncAdapter
participant Client as SyncClient
participant Transport as WebSocketTransport
participant Server as G1DO
Editor->>Adapter: document change (throttled)
Adapter->>Client: pushDiff(diff)
Client->>Transport: send(push message)
Transport->>Server: websocket push
Server->>Server: validate & applyDiff (tick clock)
Server->>Transport: push_ok / patch broadcast
Transport->>Client: deliver(push_ok / patch)
Client->>Adapter: emit stateChange
Adapter->>Editor: applyDocumentPatches
sequenceDiagram
participant ClientA as Client A
participant Server as G1DO
participant ClientB as Client B
ClientA->>Server: push(diff: add node X)
Server->>Server: validate & apply, broadcast patch
Server->>ClientA: push_ok(commit)
Server->>ClientB: patch(diff: node X added)
ClientB->>ClientB: apply patch, recompute state
Note over ClientA,ClientB: Both clients converge to same state
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 618259e6c2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // If we have unsent changes from before reconnect, push them | ||
| if (!isDiffEmpty(this._unsent)) { | ||
| this._schedulePush(); | ||
| } |
There was a problem hiding this comment.
Requeue unacked speculative diffs after reconnect
When the socket drops after _flush() has moved a local change from _unsent into _speculative, reconnect only schedules a resend if _unsent is non-empty. That leaves the in-flight diff stranded forever, so the server never commits it. In that state, a later push_ok can acknowledge a newer push while _handlePushOk still applies _speculative[0], causing client/server divergence and lost edits after transient disconnects.
Useful? React with 👍 / 👎.
| // If the set is the same but order changed, emit a reorder | ||
| if (ops.length === 0) { | ||
| ops.push({ op: "reorder", ids: after }); |
There was a problem hiding this comment.
Preserve scene ordering when membership also changes
computeSceneDiff emits reorder only when there are no add/remove ops, so mixed changes lose ordering information. For example, before=[s1,s2,s3] and after=[s3,s2,s4] produces only remove s1 + add s4; applying that yields [s2,s3,s4], not [s3,s2,s4]. Any coalesced edit that both changes scene membership and reorders surviving scenes will sync to the wrong order.
Useful? React with 👍 / 👎.
| case "add": | ||
| if (!projectedNodes.has(sceneOp.id)) { | ||
| errors.push({ | ||
| target: sceneOp.id, | ||
| code: "SCENE_ADD_MISSING_NODE", |
There was a problem hiding this comment.
Reject non-scene nodes in scene-add validation
scene add currently validates only node existence, not node type, so a client can add a non-scene node ID into scenes. Downstream editor logic assumes scenes_ref entries are actual SceneNodes (e.g. getScene asserts type === "scene"), so accepting this payload can poison canonical state and trigger runtime failures when scene actions execute.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 12
🧹 Nitpick comments (7)
packages/grida-canvas-sync/tsconfig.json (1)
6-6: Move Vitest globals out of the src-only tsconfig.This config injects
vitest/globalsintosrc, but Lines 15-16 exclude__tests__. That makes test-only globals available in production code whiletsc --noEmitskips the files that actually use them. Please confirm there is another tsconfig covering the tests; otherwise split this into a library tsconfig plus a test tsconfig.Also applies to: 15-16
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/grida-canvas-sync/tsconfig.json` at line 6, The tsconfig currently injects "vitest/globals" into the src-only build (the "types": ["vitest/globals"] entry) while simultaneously excluding test folders ("__tests__"), which exposes test-only globals to production code; fix this by removing "vitest/globals" from this tsconfig and either create a separate tsconfig.test.json that extends the base and adds "types": ["vitest/globals"] for tests, or split into a library tsconfig (no vitest types) plus a test-specific tsconfig that includes the __tests__ patterns; ensure the test tsconfig is referenced by your test runner or npm scripts so tests still pick up vitest globals.editor/grida-canvas/plugins/sync/presence-sync.ts (1)
140-160: UnusedpeerIdvariable in presence iteration.The
peerIdfromObject.entries(peers)(line 140) is not used in the loop body. The cursor ID is taken frompresence.cursor.cursor_idinstead. This is fine if the cursor_id is the canonical identifier, but the unused variable could be removed for clarity.♻️ Optional: Use array iteration if peerId is unused
- for (const [peerId, presence] of Object.entries(peers)) { + for (const presence of Object.values(peers)) {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@editor/grida-canvas/plugins/sync/presence-sync.ts` around lines 140 - 160, The loop is unpacking [peerId, presence] from Object.entries(peers) but never uses peerId; update the iteration to avoid the unused variable by iterating values only (e.g., use Object.values(peers) or for (const presence of Object.values(peers))) and keep the existing logic that reads presence.cursor.cursor_id into the cursors map; ensure symbols referenced remain presence, cursor, cursors and this._colorToPalette.editor/grida-canvas/plugins/sync/index.ts (2)
54-60: Track TODOs for schema version and clock persistence.The hardcoded schema version (line 55) and
lastClock: 0(line 58) are flagged with TODOs. These are important for:
- Schema version: Ensures clients and server agree on document format
- lastClock: Enables proper catch-up after page refresh without full state resync
Consider creating issues to track these before the sync feature is considered production-ready.
Do you want me to open issues to track these TODOs?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@editor/grida-canvas/plugins/sync/index.ts` around lines 54 - 60, The code creates a SyncClient with hardcoded schema and lastClock (see SyncClient instantiation) which risks version drift and lost progress; replace the literal schema string with a constant imported from the canonical schema package (e.g., import SCHEMA_VERSION from `@grida/schema` and use it in the SyncClient options) and load persisted clock state instead of 0 by reading the stored sync-state (e.g., from OPFS/localStorage) before constructing SyncClient and passing that value as lastClock; also add TODO-to-issue entries for "sync: derive schema version from `@grida/schema`" and "sync: persist/load lastClock from storage" so these tasks are tracked before production rollout.
40-40: Consider using structured logging or removing debug console.log.The
console.logstatements on lines 40 and 75 are useful for debugging but may clutter production logs. Consider using a configurable logger or removing them.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@editor/grida-canvas/plugins/sync/index.ts` at line 40, Replace the two console.log debug statements (the one logging "sync::constructor" for the constructor/initialization and the other at line ~75) with either removal or calls to the project's structured logger; locate the logs in the Sync plugin initializer/constructor in editor/grida-canvas/plugins/sync/index.ts (search for "sync::constructor") and either remove them or change them to use the shared logger instance (or a debug-level logger call) controlled by configuration so debug output can be enabled in dev and suppressed in production.packages/grida-canvas-sync/src/client.ts (2)
192-200: Presence updates are silently dropped when not connected.This is acceptable for ephemeral data like cursor position, but worth noting that presence updates during reconnection are lost. Consider documenting this behavior or returning a boolean to indicate success.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/grida-canvas-sync/src/client.ts` around lines 192 - 200, The setPresence method currently drops updates when this._status !== "ready" without feedback; update setPresence(presence: PresenceState) to return a boolean indicating success (true when this._transport.send was invoked and false when dropped) and ensure callers can act on the result, or alternatively add a clear comment/docs on the method explaining that presence updates are ephemeral and are not queued during reconnection; reference the setPresence method, the this._status check, and this._transport.send call when making the change.
308-311: Unnecessary scheduling when only speculative diffs remain.The condition
this._speculative.length > 0triggers scheduling even when those diffs are already in-flight. Since_flush()guards with_pushInFlight, this won't cause duplicate sends, but it's misleading and creates unnecessary timer churn.Only schedule when there are unsent changes
// If there's more to push, schedule it - if (!isDiffEmpty(this._unsent) || this._speculative.length > 0) { + if (!isDiffEmpty(this._unsent)) { this._schedulePush(); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/grida-canvas-sync/src/client.ts` around lines 308 - 311, The scheduling condition is too broad: change the block that calls this._schedulePush() so it only schedules when there are actual unsent diffs (use !isDiffEmpty(this._unsent)) and remove the this._speculative.length > 0 check; _flush() already prevents duplicate sends via _pushInFlight, so only unsent changes should trigger _schedulePush() to avoid unnecessary timer churn while speculative diffs are in-flight.packages/grida-canvas-sync/__tests__/helpers.ts (1)
161-177: Minor: Redundant condition in connect handler.The condition
msg.lastClock === 0is redundant sincemsg.lastClock < this.clock.valuealready covers thelastClock === 0case (assuming server clock starts at 0 and only increases).Simplify condition
private _handleConnect(session: MockSession, msg: ConnectMessage): void { - if (msg.lastClock === 0 || msg.lastClock < this.clock.value) { + if (msg.lastClock < this.clock.value) { // Send full state session.transport.deliver({🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/grida-canvas-sync/__tests__/helpers.ts` around lines 161 - 177, The if condition in _handleConnect is redundant: remove the explicit msg.lastClock === 0 check and use a single comparison msg.lastClock < this.clock.value to decide to send the full state; update the branch in the private method _handleConnect (involving MockSession, ConnectMessage, and this.clock.value) so when msg.lastClock < this.clock.value it sends the full state (type "connect_ok" with clock, state, scenes) and otherwise sends the up-to-date response (type "connect_ok" with only clock).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@editor/grida-canvas/plugins/sync/document-sync.ts`:
- Around line 59-84: The throttled editor callback (created via editor.throttle
inside the subscribeWithSelector block) can delay local edits so they get
overwritten by incoming remote state; fix it by assigning the result of
editor.throttle(...) to a named throttled function (e.g., flushPendingEdits or
throttledPush) instead of inlining it, and then ensure any remote state-apply
path (the code handling incoming stateChange / document replacement — also
around the other subscribe block at lines 92-117) calls throttledPush.flush() or
invokes throttledPush synchronously before applying the remote document; keep
the same 30ms throttle and trailing behavior but guarantee pending local edits
are flushed via the named throttled function (and update usages of _mutex,
_lastTid, lastSyncedState, and _client.pushDiff accordingly).
In `@editor/grida-canvas/plugins/sync/serialize.ts`:
- Around line 49-60: Add server-side validation to block remote edits to the
pseudo-node "__doc_meta__": in validateNodeOp (and mirror checks in applyDiff if
it applies ops directly), detect operations whose target node id ===
"__doc_meta__" and reject patch/put/remove operations from remote peers by
returning/throwing a validation error (or marking op invalid) instead of
applying them; ensure the check covers op types that mutate (e.g., "patch",
"put", "remove") and leaves reads/no-ops unaffected so the server’s canonical
doc-level fields (links, images, bitmaps, properties, entry_scene_id) cannot be
mutated by incoming sync ops.
In `@packages/grida-canvas-sync/__tests__/helpers.ts`:
- Around line 323-336: The current comparison only iterates Object.keys(sNode)
so extra properties on clientNodes[id] are missed; update the check inside the
serverIds loop (the block using serverNodes, clientNodes, sNode, cNode) to also
iterate Object.keys(cNode) and assert each client key exists on sNode and has
the same JSON value (or perform a symmetric key-set comparison between sNode and
cNode) and throw a clear error mentioning the node id and offending key when a
client-only field or value mismatch is found.
In `@packages/grida-canvas-sync/src/client.ts`:
- Around line 255-278: On connect_ok in _handleConnectOk, clear the stale
speculative diff state (_speculative) before calling _recomputeLocalState so
previously in-flight speculative patches from the prior connection are not
applied to the new canonical state; locate _handleConnectOk and after updating
this._canonical and this._serverClock (and before this._recomputeLocalState())
set this._speculative to an empty/initial diff value (matching how _speculative
is initialized elsewhere) and then proceed to _recomputeLocalState and the
existing _schedulePush logic for any truly unsent changes in _unsent.
- Around line 392-408: The ref-check in _recomputeLocalState is false-positive
because applyDiff always returns new objects; update the function to skip
applying logically-empty diffs by checking each speculative diff with
isDiffEmpty(spec) before calling applyDiff on _speculative, and also use
isDiffEmpty to guard applying _unsent (or perform a deep/content equality check
between merged and _localState instead of a reference check) so stateChange only
emits when content actually differs; reference symbols: _recomputeLocalState,
_speculative, applyDiff, isDiffEmpty, _unsent, and _localState.
In `@packages/grida-canvas-sync/src/clock.ts`:
- Around line 16-18: DocumentClock must enforce monotonic, non-NaN, non-negative
values: update the constructor (DocumentClock.constructor) and the reset method
(DocumentClock.reset) to validate their incoming number argument: ensure it's a
finite number (not NaN/Infinity), >= 0, and does not roll back the clock (for
reset require newValue >= this._value or else throw or ignore). If invalid,
throw a clear error (or reject) rather than accepting the value; keep the
internal field (_value) unchanged on invalid input. This guarantees the
monotonic behavior used by clock comparisons.
In `@packages/grida-canvas-sync/src/diff.ts`:
- Around line 143-164: computeDiff currently emits separate remove/add when
membership changes but length stays the same (e.g. [a,b]→[c,b]) which loses
ordering; update the logic after building ops to detect the case where ops
include both "remove" and "add" and before.length === after.length and, in that
case, replace ops with a single { op: "reorder", ids: after } so
applyDiff(before, ops) reconstructs after; reference the local variables ops,
before, after and the "reorder" op when making the change.
In `@packages/grida-canvas-sync/src/transport.ts`:
- Around line 92-109: connect() must cancel any pending reconnect timers before
opening a new socket to avoid a stale timer creating a second WebSocket; modify
connect() to clear and null out _reconnectTimer (same logic as in disconnect())
before calling _openSocket(), and also apply the same guard where reconnection
is scheduled (the other _openSocket() invocations around the reconnect logic) so
_openSocket() cannot be invoked by an outstanding timer after a manual connect;
reference connect(), disconnect(), _reconnectTimer, _openSocket(), _ws and
_intentionalClose when making the change.
In `@packages/grida-canvas-sync/src/validate.ts`:
- Around line 32-33: ValidationErrorCode includes "SCENE_ADD_NOT_SCENE" but
there's no code that emits it; add a check in the scene validation block (the
code handling scene additions/removals—look for the logic that currently emits
SCENE_ADD and SCENE_REMOVE) to verify the referenced node's type === "scene"
and, if not, throw/return a ValidationError using
ValidationErrorCode.SCENE_ADD_NOT_SCENE (include the offending node id and
context). Alternatively, if that validation is not needed, remove the unused
enum value ValidationErrorCode.SCENE_ADD_NOT_SCENE to keep the API consistent.
In `@services/grida-canvas-document-worker-cf/src/room.ts`:
- Around line 270-273: The code advances the in-memory clock and canonical
before persisting, so if this.storage.applyDiff(msg.diff, newClock) throws the
DO will be left in a non-persistent state; instead, compute the next clock value
(e.g. call the tick/next function but do not assign it to this.clock), call
this.storage.applyDiff(msg.diff, nextClock) first, and only after that succeeds
assign this.clock = nextClock and set this.canonical = applyDiff(this.canonical,
msg.diff); keep references to this.clock.tick()/tick result, applyDiff,
this.canonical, and this.storage.applyDiff to locate the changes.
- Around line 137-145: The current oversized-message guard uses message.length
(UTF-16 code units) which undercounts wire bytes; change the check to compute
UTF-8 byte length (e.g., use Buffer.byteLength(message, 'utf8') or new
TextEncoder().encode(message).length) when comparing against MAX_MESSAGE_SIZE in
the same block, so the if(...) uses the UTF-8 byte count and the error path that
calls this._send(ws, {...}) remains unchanged.
---
Nitpick comments:
In `@editor/grida-canvas/plugins/sync/index.ts`:
- Around line 54-60: The code creates a SyncClient with hardcoded schema and
lastClock (see SyncClient instantiation) which risks version drift and lost
progress; replace the literal schema string with a constant imported from the
canonical schema package (e.g., import SCHEMA_VERSION from `@grida/schema` and use
it in the SyncClient options) and load persisted clock state instead of 0 by
reading the stored sync-state (e.g., from OPFS/localStorage) before constructing
SyncClient and passing that value as lastClock; also add TODO-to-issue entries
for "sync: derive schema version from `@grida/schema`" and "sync: persist/load
lastClock from storage" so these tasks are tracked before production rollout.
- Line 40: Replace the two console.log debug statements (the one logging
"sync::constructor" for the constructor/initialization and the other at line
~75) with either removal or calls to the project's structured logger; locate the
logs in the Sync plugin initializer/constructor in
editor/grida-canvas/plugins/sync/index.ts (search for "sync::constructor") and
either remove them or change them to use the shared logger instance (or a
debug-level logger call) controlled by configuration so debug output can be
enabled in dev and suppressed in production.
In `@editor/grida-canvas/plugins/sync/presence-sync.ts`:
- Around line 140-160: The loop is unpacking [peerId, presence] from
Object.entries(peers) but never uses peerId; update the iteration to avoid the
unused variable by iterating values only (e.g., use Object.values(peers) or for
(const presence of Object.values(peers))) and keep the existing logic that reads
presence.cursor.cursor_id into the cursors map; ensure symbols referenced remain
presence, cursor, cursors and this._colorToPalette.
In `@packages/grida-canvas-sync/__tests__/helpers.ts`:
- Around line 161-177: The if condition in _handleConnect is redundant: remove
the explicit msg.lastClock === 0 check and use a single comparison msg.lastClock
< this.clock.value to decide to send the full state; update the branch in the
private method _handleConnect (involving MockSession, ConnectMessage, and
this.clock.value) so when msg.lastClock < this.clock.value it sends the full
state (type "connect_ok" with clock, state, scenes) and otherwise sends the
up-to-date response (type "connect_ok" with only clock).
In `@packages/grida-canvas-sync/src/client.ts`:
- Around line 192-200: The setPresence method currently drops updates when
this._status !== "ready" without feedback; update setPresence(presence:
PresenceState) to return a boolean indicating success (true when
this._transport.send was invoked and false when dropped) and ensure callers can
act on the result, or alternatively add a clear comment/docs on the method
explaining that presence updates are ephemeral and are not queued during
reconnection; reference the setPresence method, the this._status check, and
this._transport.send call when making the change.
- Around line 308-311: The scheduling condition is too broad: change the block
that calls this._schedulePush() so it only schedules when there are actual
unsent diffs (use !isDiffEmpty(this._unsent)) and remove the
this._speculative.length > 0 check; _flush() already prevents duplicate sends
via _pushInFlight, so only unsent changes should trigger _schedulePush() to
avoid unnecessary timer churn while speculative diffs are in-flight.
In `@packages/grida-canvas-sync/tsconfig.json`:
- Line 6: The tsconfig currently injects "vitest/globals" into the src-only
build (the "types": ["vitest/globals"] entry) while simultaneously excluding
test folders ("__tests__"), which exposes test-only globals to production code;
fix this by removing "vitest/globals" from this tsconfig and either create a
separate tsconfig.test.json that extends the base and adds "types":
["vitest/globals"] for tests, or split into a library tsconfig (no vitest types)
plus a test-specific tsconfig that includes the __tests__ patterns; ensure the
test tsconfig is referenced by your test runner or npm scripts so tests still
pick up vitest globals.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 56920275-a013-4c3f-b6bf-2a1d506138c9
⛔ Files ignored due to path filters (1)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (36)
editor/grida-canvas-hosted/playground/playground.tsxeditor/grida-canvas/__tests__/headless/sync-integration.test.tseditor/grida-canvas/plugins/sync/document-sync.tseditor/grida-canvas/plugins/sync/index.tseditor/grida-canvas/plugins/sync/presence-sync.tseditor/grida-canvas/plugins/sync/serialize.tseditor/grida-canvas/plugins/yjs/__tests__/y-patches.test.tseditor/grida-canvas/plugins/yjs/index.tseditor/grida-canvas/plugins/yjs/y-awareness.tseditor/grida-canvas/plugins/yjs/y-document.tseditor/grida-canvas/plugins/yjs/y-patches.tseditor/package.jsonpackages/grida-canvas-sync/__tests__/client.test.tspackages/grida-canvas-sync/__tests__/diff.test.tspackages/grida-canvas-sync/__tests__/helpers.tspackages/grida-canvas-sync/__tests__/integration.test.tspackages/grida-canvas-sync/__tests__/validate.test.tspackages/grida-canvas-sync/package.jsonpackages/grida-canvas-sync/src/client.tspackages/grida-canvas-sync/src/clock.tspackages/grida-canvas-sync/src/diff.tspackages/grida-canvas-sync/src/index.tspackages/grida-canvas-sync/src/presence.tspackages/grida-canvas-sync/src/protocol.tspackages/grida-canvas-sync/src/transport.tspackages/grida-canvas-sync/src/validate.tspackages/grida-canvas-sync/tsconfig.jsonservices/grida-canvas-document-worker-cf/package.jsonservices/grida-canvas-document-worker-cf/src/do.tsservices/grida-canvas-document-worker-cf/src/index.tsservices/grida-canvas-document-worker-cf/src/lib/index.tsservices/grida-canvas-document-worker-cf/src/lib/storage.tsservices/grida-canvas-document-worker-cf/src/lib/websocket.tsservices/grida-canvas-document-worker-cf/src/room.tsservices/grida-canvas-document-worker-cf/src/storage.tsservices/grida-canvas-document-worker-cf/tsconfig.json
💤 Files with no reviewable changes (9)
- services/grida-canvas-document-worker-cf/src/lib/index.ts
- services/grida-canvas-document-worker-cf/src/lib/storage.ts
- editor/grida-canvas/plugins/yjs/tests/y-patches.test.ts
- editor/grida-canvas/plugins/yjs/y-awareness.ts
- editor/grida-canvas/plugins/yjs/y-patches.ts
- editor/grida-canvas/plugins/yjs/index.ts
- services/grida-canvas-document-worker-cf/src/lib/websocket.ts
- services/grida-canvas-document-worker-cf/src/do.ts
- editor/grida-canvas/plugins/yjs/y-document.ts
| this._unsubscribeEditor = this._editor.doc.subscribeWithSelector( | ||
| (state) => state.document, | ||
| editor.throttle( | ||
| ( | ||
| store: EditorDocumentStore, | ||
| _next: grida.program.document.Document, | ||
| _prev: grida.program.document.Document | ||
| ) => { | ||
| if (store.locked) return; | ||
| if (this._lastTid === store.tid) return; | ||
| this._lastTid = store.tid; | ||
|
|
||
| this._mutex(() => { | ||
| const currentState = documentToState( | ||
| this._editor.doc.state.document | ||
| ); | ||
| const diff = computeDiff(this.lastSyncedState, currentState); | ||
| if (diff) { | ||
| this.lastSyncedState = currentState; | ||
| this._client.pushDiff(diff); | ||
| } | ||
| }); | ||
| }, | ||
| 30, // 30ms throttle, same as old YJS plugin | ||
| { trailing: true } | ||
| ) |
There was a problem hiding this comment.
Do not let the editor outrun SyncClient during the throttle window.
Local edits only reach SyncClient when this throttled callback runs, but remote stateChange replaces the whole document immediately. That means an edit made inside the throttle window can be overwritten by an incoming patch before pushDiff() ever sees it, so the change is silently lost instead of being rebased. The manual flushEditorToServer() helper in the headless tests currently hides this race.
Also applies to: 92-117
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@editor/grida-canvas/plugins/sync/document-sync.ts` around lines 59 - 84, The
throttled editor callback (created via editor.throttle inside the
subscribeWithSelector block) can delay local edits so they get overwritten by
incoming remote state; fix it by assigning the result of editor.throttle(...) to
a named throttled function (e.g., flushPendingEdits or throttledPush) instead of
inlining it, and then ensure any remote state-apply path (the code handling
incoming stateChange / document replacement — also around the other subscribe
block at lines 92-117) calls throttledPush.flush() or invokes throttledPush
synchronously before applying the remote document; keep the same 30ms throttle
and trailing behavior but guarantee pending local edits are flushed via the
named throttled function (and update usages of _mutex, _lastTid,
lastSyncedState, and _client.pushDiff accordingly).
| // Store document-level data as a special meta record | ||
| const meta: DocMetaRecord = { | ||
| type: "__doc_meta__", | ||
| id: "__doc_meta__", | ||
| links: doc.links ?? {}, | ||
| images: doc.images ?? {}, | ||
| bitmaps: doc.bitmaps ?? {}, | ||
| properties: doc.properties ?? {}, | ||
| metadata: doc.metadata, | ||
| entry_scene_id: doc.entry_scene_id ?? null, | ||
| }; | ||
| nodes["__doc_meta__"] = meta as unknown as SerializedNode; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if there's any special handling for __doc_meta__ in the sync layer
echo "=== Checking for __doc_meta__ handling in diff/validate logic ==="
rg -n "__doc_meta__" --type tsRepository: gridaco/grida
Length of output: 1612
🏁 Script executed:
# Examine the test that checks __doc_meta__ exclusion
sed -n '300,320p' editor/grida-canvas/__tests__/headless/sync-integration.test.tsRepository: gridaco/grida
Length of output: 878
🏁 Script executed:
# Examine the server-side filtering mentioned at line 448-450
sed -n '445,455p' editor/grida-canvas/__tests__/headless/sync-integration.test.tsRepository: gridaco/grida
Length of output: 530
🏁 Script executed:
# Check if validateNodeOp has special handling for __doc_meta__
sed -n '106,167p' packages/grida-canvas-sync/src/validate.ts | head -80Repository: gridaco/grida
Length of output: 1654
🏁 Script executed:
# Check applyDiff for __doc_meta__ handling
sed -n '177,224p' packages/grida-canvas-sync/src/diff.tsRepository: gridaco/grida
Length of output: 1200
🏁 Script executed:
# Examine stateToDocument to see where __doc_meta__ is filtered
sed -n '73,105p' editor/grida-canvas/plugins/sync/serialize.tsRepository: gridaco/grida
Length of output: 1144
🏁 Script executed:
# Search for any other places that might validate/protect __doc_meta__
rg -n "validateNodeOp|applyDiff|__doc_meta__" editor/grida-canvas plugins/sync packages/grida-canvas-sync/src --type ts -B 2 -A 2 | grep -E "(validateNodeOp|applyDiff|__doc_meta__|server|Server)" | head -40Repository: gridaco/grida
Length of output: 3377
🏁 Script executed:
# Check if there's any server-side middleware or validation that blocks __doc_meta__
fd -e ts -path "*/server/*" -o -path "*/validate*" packages/grida-canvas-sync | head -10Repository: gridaco/grida
Length of output: 225
🏁 Script executed:
# Search for any validation that explicitly blocks __doc_meta__ operations
rg -n "__doc_meta__" packages/grida-canvas-sync/src --type ts -B 3 -A 3Repository: gridaco/grida
Length of output: 39
🏁 Script executed:
# Check if there are tests for __doc_meta__ patch operations or protection
rg -n "patch.*__doc_meta__|__doc_meta__.*patch" editor/grida-canvas/__tests__ --type ts -B 2 -A 5Repository: gridaco/grida
Length of output: 39
🏁 Script executed:
# Look for any comments about __doc_meta__ security or design decisions
rg -n "__doc_meta__" editor/grida-canvas/plugins/sync --type ts -B 5 -A 5 | head -60Repository: gridaco/grida
Length of output: 4918
Add server-side validation to prevent remote clients from patching __doc_meta__ metadata fields.
The __doc_meta__ pseudo-node stores critical document-level metadata (links, images, bitmaps, properties, entry_scene_id). While stateToDocument correctly filters it out before returning a Document to clients, the server's sync layer in applyDiff and validateNodeOp currently treats __doc_meta__ like any other node without special guards.
This allows a remote peer to send patch operations targeting __doc_meta__ that would modify the server's canonical document metadata. Add explicit validation in validateNodeOp to reject patch/put/remove operations on the __doc_meta__ node, or document this as an intentional design choice if server-side integrity is managed elsewhere.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@editor/grida-canvas/plugins/sync/serialize.ts` around lines 49 - 60, Add
server-side validation to block remote edits to the pseudo-node "__doc_meta__":
in validateNodeOp (and mirror checks in applyDiff if it applies ops directly),
detect operations whose target node id === "__doc_meta__" and reject
patch/put/remove operations from remote peers by returning/throwing a validation
error (or marking op invalid) instead of applying them; ensure the check covers
op types that mutate (e.g., "patch", "put", "remove") and leaves reads/no-ops
unaffected so the server’s canonical doc-level fields (links, images, bitmaps,
properties, entry_scene_id) cannot be mutated by incoming sync ops.
…se, persist-before-advance, and code quality - Fix critical scene diff ordering bug: computeSceneDiff now always emits a single reorder op, preventing [a,b]→[c,b] from producing [b,c] - Fix reconnect state corruption: clear stale speculative diffs and requeue them as unsent on connect_ok - Fix persist-before-advance: storage.applyDiff runs before in-memory clock/canonical advance, so a storage throw leaves state consistent - Fix spurious stateChange events: fast-path ref equality when no speculative/unsent changes exist, skip applyDiff for empty diffs - Fix double-socket race: cancel pending reconnect timer in connect() - Fix tombstone pruning gap: record tombstone_floor clock, getDelta returns null (forcing full state) for stale sinceClock values - Fix message size guard: use UTF-8 byte length instead of UTF-16 - Implement SCENE_ADD_NOT_SCENE validation (was declared but unused) - Add clock validation guards (non-negative safe integer) - Fix assertConvergence to check keys from both server and client - Remove all unused imports/variables flagged by code quality checks
There was a problem hiding this comment.
🧹 Nitpick comments (1)
editor/grida-canvas/plugins/sync/presence-sync.ts (1)
157-158: Consider tracking presence features for follow-up.The
marqueeandephemeral_chatfields are set tonullwith TODO comments. These features may be useful for collaborative editing (showing other users' selection boxes and chat bubbles).Would you like me to open an issue to track implementing marquee and cursor chat in the presence protocol?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@editor/grida-canvas/plugins/sync/presence-sync.ts` around lines 157 - 158, Add tracked follow-up work for the null presence fields by opening a repository issue to implement marquee and ephemeral_chat in the presence protocol, and update the TODO comments in presence-sync.ts (the marquee and ephemeral_chat entries) to reference that issue number/URL; ensure the issue describes expected behavior (selection marquee rendering, ephemeral cursor chat bubbles), data shape, and backwards-compatible handling so future work can implement PresenceSync methods using those fields.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@editor/grida-canvas/plugins/sync/presence-sync.ts`:
- Around line 157-158: Add tracked follow-up work for the null presence fields
by opening a repository issue to implement marquee and ephemeral_chat in the
presence protocol, and update the TODO comments in presence-sync.ts (the marquee
and ephemeral_chat entries) to reference that issue number/URL; ensure the issue
describes expected behavior (selection marquee rendering, ephemeral cursor chat
bubbles), data shape, and backwards-compatible handling so future work can
implement PresenceSync methods using those fields.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3c74d197-13de-4e2b-8c45-9e0bdec8279f
📒 Files selected for processing (13)
editor/grida-canvas/plugins/sync/presence-sync.tspackages/grida-canvas-sync/__tests__/client.test.tspackages/grida-canvas-sync/__tests__/diff.test.tspackages/grida-canvas-sync/__tests__/helpers.tspackages/grida-canvas-sync/__tests__/integration.test.tspackages/grida-canvas-sync/__tests__/validate.test.tspackages/grida-canvas-sync/src/client.tspackages/grida-canvas-sync/src/clock.tspackages/grida-canvas-sync/src/diff.tspackages/grida-canvas-sync/src/transport.tspackages/grida-canvas-sync/src/validate.tsservices/grida-canvas-document-worker-cf/src/room.tsservices/grida-canvas-document-worker-cf/src/storage.ts
✅ Files skipped from review due to trivial changes (2)
- packages/grida-canvas-sync/tests/integration.test.ts
- packages/grida-canvas-sync/tests/client.test.ts
🚧 Files skipped from review as they are similar to previous changes (4)
- packages/grida-canvas-sync/src/clock.ts
- services/grida-canvas-document-worker-cf/src/storage.ts
- packages/grida-canvas-sync/src/transport.ts
- packages/grida-canvas-sync/tests/helpers.ts
… SyncClient handling for empty states
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
editor/grida-canvas/plugins/sync/document-sync.ts (1)
66-89:⚠️ Potential issue | 🔴 CriticalPending local edits can still be lost when remote state arrives.
At Line 66 the editor→client path is trailing-throttled, while Line 98 applies remote
stateChangeimmediately. A local edit inside the 30ms window can be overwritten beforepushDiff()runs, so it never gets rebased/pushed.Suggested fix pattern
+ // 1) Extract the local diff push logic into a named immediate function. + // 2) Use a named throttled wrapper only for editor subscription events. + // 3) Before applying remote state, force a sync of pending local edits + // by calling the immediate push function (or flushing the named throttle) + // so local changes are not dropped. - editor.throttle((store, _next, _prev) => { ...pushDiff... }, 30, { trailing: true }) + const throttledPushLocal = editor.throttle(() => { + this._pushLocalDiffNow(); + }, 30, { trailing: true }); this._unsubscribeEditor = this._editor.doc.subscribeWithSelector( (state) => state.document, - /* inline throttled callback */ + throttledPushLocal ); this._unsubscribeClient = this._client.on("stateChange", (newState) => { + this._pushLocalDiffNow(); // flush pending local edits before remote apply this._mutex(() => { // apply remote state }); });Also applies to: 98-121
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@editor/grida-canvas/plugins/sync/document-sync.ts` around lines 66 - 89, The trailing-only throttle on the editor→client path (editor.throttle callback that checks this._lastTid and calls this._mutex -> this._client.pushDiff) can lose local edits when remote stateChange is applied immediately; fix by making the local-to-client flush serialize with remote application: either change the throttle options to run leading (e.g., { leading: true, trailing: true }) so local edits are pushed immediately, or ensure the remote stateChange handler acquires the same this._mutex (or awaits a flush of the throttled handler) before applying incoming changes so pushDiff and remote state application cannot interleave and overwrite pending local edits (reference editor.throttle, this._lastTid, this._mutex, this.lastSyncedState, this._client.pushDiff, and the remote stateChange handler).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/grida-canvas-sync/__tests__/client.test.ts`:
- Around line 23-25: MockTransport.send currently ignores connection state and
always pushes messages, violating the ISyncTransport contract that send() must
throw when not connected; update MockTransport.send to check the transport's
connection status (e.g., this.connected or an isConnected() flag) and throw an
error if not connected, otherwise push the message into this.sent—ensure the
method signature remains send(message: ClientMessage): void and that the thrown
error is descriptive so tests can detect connection-state bugs in SyncClient.
- Around line 452-471: The test must explicitly assert that listeners are not
invoked after teardown: after calling client.destroy(), clear the handler's
recorded calls (handler.mockClear() or equivalent) then call
transport.deliver(...) and assert the handler was not called
(expect(handler).not.toHaveBeenCalled()); keep the existing
expect(client.status).toBe("disconnected") as well. This uses the existing
handler mock, client.destroy(), transport.deliver(), and client.status to verify
listener teardown.
- Around line 160-165: The test calls makeNode with an invalid third argument
("scene") causing a TS type error; update the call in the test case ("preserves
local state when server responds with no state and no diff") to pass only the
valid parameters by removing the third argument so the makeNode invocation
becomes makeNode("scene", { name: "Main" }), leaving the other makeNode calls
unchanged.
---
Duplicate comments:
In `@editor/grida-canvas/plugins/sync/document-sync.ts`:
- Around line 66-89: The trailing-only throttle on the editor→client path
(editor.throttle callback that checks this._lastTid and calls this._mutex ->
this._client.pushDiff) can lose local edits when remote stateChange is applied
immediately; fix by making the local-to-client flush serialize with remote
application: either change the throttle options to run leading (e.g., { leading:
true, trailing: true }) so local edits are pushed immediately, or ensure the
remote stateChange handler acquires the same this._mutex (or awaits a flush of
the throttled handler) before applying incoming changes so pushDiff and remote
state application cannot interleave and overwrite pending local edits (reference
editor.throttle, this._lastTid, this._mutex, this.lastSyncedState,
this._client.pushDiff, and the remote stateChange handler).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d6d23f11-f015-4da1-9008-ea12f8aada45
⛔ Files ignored due to path filters (1)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (5)
editor/grida-canvas/plugins/sync/document-sync.tspackages/grida-canvas-sync/__tests__/client.test.tspackages/grida-canvas-sync/src/client.tsservices/grida-canvas-document-worker-cf/package.jsonservices/grida-canvas-document-worker-cf/worker-configuration.d.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- services/grida-canvas-document-worker-cf/package.json
- packages/grida-canvas-sync/src/client.ts
| it("preserves local state when server responds with no state and no diff", () => { | ||
| const initialState: DocumentState = { | ||
| nodes: { | ||
| n1: makeNode("n1", { width: 100 }), | ||
| scene: makeNode("scene", { name: "Main" }, "scene"), | ||
| }, |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify there are no 3-arg makeNode() calls in this test file.
python - <<'PY'
from pathlib import Path
import re
p = Path("packages/grida-canvas-sync/__tests__/client.test.ts")
src = p.read_text()
for m in re.finditer(r'\bmakeNode\s*\(([^)]*)\)', src):
args = [a.strip() for a in m.group(1).split(",")]
if len(args) > 2:
line = src.count("\n", 0, m.start()) + 1
print(f"Line {line}: {m.group(0)}")
PYRepository: gridaco/grida
Length of output: 291
🏁 Script executed:
cat -n "packages/grida-canvas-sync/__tests__/client.test.ts" | sed -n '160,170p'Repository: gridaco/grida
Length of output: 551
Remove the invalid third argument to makeNode at line 164.
The function makeNode accepts only an id (string) and optional props object, but the call passes three arguments: makeNode("scene", { name: "Main" }, "scene"). The third argument "scene" does not correspond to any parameter and causes a TypeScript type error.
Proposed fix
- scene: makeNode("scene", { name: "Main" }, "scene"),
+ scene: makeNode("scene", { name: "Main" }),📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| it("preserves local state when server responds with no state and no diff", () => { | |
| const initialState: DocumentState = { | |
| nodes: { | |
| n1: makeNode("n1", { width: 100 }), | |
| scene: makeNode("scene", { name: "Main" }, "scene"), | |
| }, | |
| it("preserves local state when server responds with no state and no diff", () => { | |
| const initialState: DocumentState = { | |
| nodes: { | |
| n1: makeNode("n1", { width: 100 }), | |
| scene: makeNode("scene", { name: "Main" }), | |
| }, |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/grida-canvas-sync/__tests__/client.test.ts` around lines 160 - 165,
The test calls makeNode with an invalid third argument ("scene") causing a TS
type error; update the call in the test case ("preserves local state when server
responds with no state and no diff") to pass only the valid parameters by
removing the third argument so the makeNode invocation becomes makeNode("scene",
{ name: "Main" }), leaving the other makeNode calls unchanged.
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
packages/grida-canvas-sync/__tests__/client.test.ts (1)
139-150: Test name overstates what is asserted.The test title includes
connecting, but assertions only checksyncingandready. Consider renaming the test or explicitly assertingconnecting.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/grida-canvas-sync/__tests__/client.test.ts` around lines 139 - 150, The test title claims "connecting → syncing → ready" but only asserts "syncing" and "ready"; update the spec to either rename the test to reflect only "syncing → ready" or add an explicit assertion for the "connecting" state: after calling transport.simulateConnected() assert that statuses contains "connecting" (in addition to or instead of the current "syncing" check), or adjust the test name string passed to it(...) to remove "connecting". Reference the existing test block using the it(...) description, the createClientAndTransport() helper, transport.simulateConnected(), client.on("statusChange", ...), transport.deliver({ type: "connect_ok", ... }) and client.status when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/wg/research/crdt/tldraw.md`:
- Around line 250-252: Update the conflicting phrasing around TLDraw's
conflict-resolution: replace the blanket “not last-write-wins” statement near
the line containing “TLDraw uses optimistic concurrency with
server-authoritative rebase” and adjust the later sentence that claims
same-field concurrent edits are effectively LWW via server order so both
describe the same behavior — e.g., say TLDraw is optimistic with
server-authoritative rebase which resolves concurrent same-field edits by
adopting server-ordered application (a deterministic resolution that can appear
LWW for that class of conflict) — and ensure both the earlier “not
last-write-wins” claim and the later “server order = LWW-like” phrasing are
aligned to this clarified wording.
- Around line 19-24: Several fenced code blocks in the tldraw CRDT doc are
missing language tags (MD040); update every unlabeled triple-backtick block
shown (the lists and ASCII diagrams such as the lines starting with
"@tldraw/store …", the WebSocket diagram, the "Client Server" diagrams, the
numbered steps, and the state transition "AwaitingConnectMessage → Connected …")
to include an explicit language like "text" (e.g., replace ``` with ```text) so
markdownlint no longer flags MD040 and the diagrams render as plain text.
- Around line 1-11: The markdown is missing required YAML frontmatter; add a
top-of-file frontmatter block containing title (use "TLDraw Sync: Real-Time
Collaboration Architecture" or similar), description (short SEO summary),
keywords (array of relevant terms), and format: md so the renderer picks up the
page; ensure the frontmatter is a valid YAML block (--- at top and bottom)
before the existing content/header so tooling and guidelines for
docs/{wg,reference} are satisfied.
---
Nitpick comments:
In `@packages/grida-canvas-sync/__tests__/client.test.ts`:
- Around line 139-150: The test title claims "connecting → syncing → ready" but
only asserts "syncing" and "ready"; update the spec to either rename the test to
reflect only "syncing → ready" or add an explicit assertion for the "connecting"
state: after calling transport.simulateConnected() assert that statuses contains
"connecting" (in addition to or instead of the current "syncing" check), or
adjust the test name string passed to it(...) to remove "connecting". Reference
the existing test block using the it(...) description, the
createClientAndTransport() helper, transport.simulateConnected(),
client.on("statusChange", ...), transport.deliver({ type: "connect_ok", ... })
and client.status when making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b2fd69e7-9612-46ae-ab9b-91da7222d5c8
📒 Files selected for processing (2)
docs/wg/research/crdt/tldraw.mdpackages/grida-canvas-sync/__tests__/client.test.ts
| # TLDraw Sync: Real-Time Collaboration Architecture | ||
|
|
||
| > Research document covering tldraw's sync engine — architecture, data model, | ||
| > protocol, and conflict resolution strategy. | ||
| > | ||
| > **Source repo:** `tldraw/tldraw` (main branch, ~2024-2026) | ||
| > **Key packages:** `@tldraw/store`, `@tldraw/sync-core`, `@tldraw/sync` | ||
|
|
||
| --- | ||
|
|
||
| ## 1. Architecture Overview |
There was a problem hiding this comment.
Add required docs frontmatter (title, description, keywords, format).
This new docs/wg/**.md page is missing required SEO frontmatter and format: md.
🔧 Proposed fix
+---
+title: TLDraw Sync: Real-Time Collaboration Architecture
+description: Research notes on TLDraw’s server-authoritative sync architecture, protocol, diff model, and optimistic rebase behavior.
+keywords:
+ - tldraw
+ - realtime collaboration
+ - sync protocol
+ - optimistic rebase
+ - conflict resolution
+format: md
+---
+
# TLDraw Sync: Real-Time Collaboration ArchitectureAs per coding guidelines, docs/{wg,reference}/**/*.md: “Include SEO frontmatter with title, description, and keywords … Add format: md frontmatter…”.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # TLDraw Sync: Real-Time Collaboration Architecture | |
| > Research document covering tldraw's sync engine — architecture, data model, | |
| > protocol, and conflict resolution strategy. | |
| > | |
| > **Source repo:** `tldraw/tldraw` (main branch, ~2024-2026) | |
| > **Key packages:** `@tldraw/store`, `@tldraw/sync-core`, `@tldraw/sync` | |
| --- | |
| ## 1. Architecture Overview | |
| --- | |
| title: TLDraw Sync: Real-Time Collaboration Architecture | |
| description: Research notes on TLDraw’s server-authoritative sync architecture, protocol, diff model, and optimistic rebase behavior. | |
| keywords: | |
| - tldraw | |
| - realtime collaboration | |
| - sync protocol | |
| - optimistic rebase | |
| - conflict resolution | |
| format: md | |
| --- | |
| # TLDraw Sync: Real-Time Collaboration Architecture | |
| > Research document covering tldraw's sync engine — architecture, data model, | |
| > protocol, and conflict resolution strategy. | |
| > | |
| > **Source repo:** `tldraw/tldraw` (main branch, ~2024-2026) | |
| > **Key packages:** `@tldraw/store`, `@tldraw/sync-core`, `@tldraw/sync` | |
| --- | |
| ## 1. Architecture Overview |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/wg/research/crdt/tldraw.md` around lines 1 - 11, The markdown is missing
required YAML frontmatter; add a top-of-file frontmatter block containing title
(use "TLDraw Sync: Real-Time Collaboration Architecture" or similar),
description (short SEO summary), keywords (array of relevant terms), and format:
md so the renderer picks up the page; ensure the frontmatter is a valid YAML
block (--- at top and bottom) before the existing content/header so tooling and
guidelines for docs/{wg,reference} are satisfied.
| ``` | ||
| @tldraw/store — Generic record store with typed IDs, diffs, and history | ||
| @tldraw/sync-core — Protocol types, TLSyncRoom (server), TLSyncClient, storage interfaces | ||
| @tldraw/sync — React hook (useSync) that wires TLSyncClient to a TLStore | ||
| @tldraw/tlschema — Schema definitions, migrations, record types for tldraw shapes | ||
| ``` |
There was a problem hiding this comment.
Specify languages for unlabeled fenced code blocks (MD040).
markdownlint warnings are valid here; add explicit fence languages (e.g., text) to these blocks.
🔧 Proposed fix
-```
+```text
`@tldraw/store` — Generic record store with typed IDs, diffs, and history
...
-```
+```
-```
+```text
┌──────────┐ WebSocket ┌─────────────┐ WebSocket ┌──────────┐
...
-```
+```
-```
+```text
Client Server
...
-```
+```
-```
+```text
Client Server
...
-```
+```
-```
+```text
1. Flush store history
...
-```
+```
-```
+```text
AwaitingConnectMessage → Connected → AwaitingRemoval → (removed)
-```
+```Also applies to: 28-37, 173-195, 204-222, 263-274, 382-384
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 19-19: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/wg/research/crdt/tldraw.md` around lines 19 - 24, Several fenced code
blocks in the tldraw CRDT doc are missing language tags (MD040); update every
unlabeled triple-backtick block shown (the lists and ASCII diagrams such as the
lines starting with "@tldraw/store …", the WebSocket diagram, the "Client
Server" diagrams, the numbered steps, and the state transition
"AwaitingConnectMessage → Connected …") to include an explicit language like
"text" (e.g., replace ``` with ```text) so markdownlint no longer flags MD040
and the diagrams render as plain text.
| TLDraw uses **optimistic concurrency with server-authoritative rebase** — not | ||
| CRDTs, not OT, not last-write-wins. | ||
|
|
There was a problem hiding this comment.
Resolve contradiction in conflict-resolution wording.
Section 5 says “not last-write-wins,” but later the doc says same-field concurrent edits are effectively LWW via server order. Align these statements to avoid design ambiguity.
🔧 Proposed wording adjustment
-TLDraw uses **optimistic concurrency with server-authoritative rebase** — not
-CRDTs, not OT, not last-write-wins.
+TLDraw uses **optimistic concurrency with server-authoritative rebase** — not
+CRDTs and not OT. For same-field concurrent edits, final resolution is
+effectively server-order LWW after validation/rebase.Also applies to: 429-430
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/wg/research/crdt/tldraw.md` around lines 250 - 252, Update the
conflicting phrasing around TLDraw's conflict-resolution: replace the blanket
“not last-write-wins” statement near the line containing “TLDraw uses optimistic
concurrency with server-authoritative rebase” and adjust the later sentence that
claims same-field concurrent edits are effectively LWW via server order so both
describe the same behavior — e.g., say TLDraw is optimistic with
server-authoritative rebase which resolves concurrent same-field edits by
adopting server-ordered application (a deterministic resolution that can appear
LWW for that class of conflict) — and ensure both the earlier “not
last-write-wins” claim and the later “server order = LWW-like” phrasing are
aligned to this clarified wording.
Summary
Replaces the YJS-based PoC sync layer with a custom server-authoritative document sync system. The old system bridged YJS ↔ Immer ↔ Editor state, bypassing our FlatBuffers format entirely. The new system operates directly on typed property-level diffs aligned with the document model.
@grida/canvas-sync— standalone sync engine with protocol types, diff/patch/compose logic, server-side validation, andSyncClient(optimistic rebase model inspired by TLDraw's tlsync)grida-canvas-document-worker-cf—SyncRoomDurable Object with SQLite persistence, hibernatable WebSockets, atomic transactions, 1MB message guard, and presence relayplugins/sync/— replacesplugins/yjs/, bridgesSyncClient↔EditorDocumentStorevia typed serialization layeryjs,y-websocket,y-protocols,y-webrtcdependencies and the entireplugins/yjs/directoryTest coverage
@grida/canvas-sync— unit tests for diff/validate/client + 17 multi-client integration tests through aMockServerthat implementsSyncRoomsemantics (concurrent edits, LWW conflicts, reconnection, rapid bursts, 3-client scenarios)Editorinstances — two real editors wired throughMockServer, testing the full pipeline: editor dispatch → Zustand → serialize → diff → sync → broadcast → deserialize → applyDocumentPatches → editor stateKey design decisions
DocumentDiffwith typedNodeOp(put/patch/remove) andFieldOp(put/delete)ISyncTransportinterface —WebSocketTransportfor production,MockTransportfor testsArchitecture
See
crdt-impl.plan.mdat repo root for the full design document.Summary by CodeRabbit
New Features
Chores
Tests
Documentation