adding reconstruction codes for code review #150

saminur · 2025-12-19T16:43:06Z

No description provided.

bradley-erickson · 2026-01-04T17:04:35Z

@saminur much of this code looks okay as a proof of concept.
How the tabs work, and which commands are associated with them, should be documented further. That information is crucial for determining the best way for tabs to operate within the system.
Any new or modified commands should be added to the dispatch variable in modules/writing_observer/writing_observer/reconstruct_doc.py, where all other Google Docs commands are handled.

bradley-erickson · 2026-01-15T14:25:50Z

extension/writing-process/src/background.js


 import { googledocs_id_from_url } from './writing_common';
-
+import { tab_id_from_url } from './writing_common';


This should be combined with the line above it:
import { googledocs_id_from_url, tab_id_from_url } from './writing_common';

bradley-erickson · 2026-01-23T14:40:09Z

modules/writing_observer/writing_observer/reconstruct_doc.py

+        for cmd in commands:
+            self._apply_cmd(cmd, default_tab, event_timestamp)
+
+    def _apply_cmd(self, cmd: dict, current_tab: str, event_timestamp: Optional[int] = None) -> None:


We have the dispatch variable so we can easily call commands. If commands are missing, we should add them to the dispatch variable and call them that way.

ty = cmd.get("ty")
dispatch[ty](...)

I addressed this by centralizing all command handling through dispatch. Text‑mutating commands go through text_dispatch via _cmd_text, while tab‑specific commands (mkch, ucp, ac, nm, ae, te) now have dedicated handlers in dispatch. In DocState._dispatch_cmd, every command resolves to dispatch[ty] (if present), so the handling is centralized as requested.
Please take a look at the following commit:
ca13755
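
The single-table dispatch pattern requested in this thread can be sketched roughly as follows. This is an illustration only, not the code in reconstruct_doc.py: the `Doc` class, the handler functions, the `is`/`ds` command types, and the `ibi`, `s`, `si`, `ei` fields are assumptions made for the example. Only the `ty` key, the `ae` and `te` command names, and the `dispatch[ty](...)` call shape come from the discussion.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for the reconstructed document state."""
    text: str = ""


def insert(doc: Doc, cmd: dict) -> Doc:
    # Assumed fields: 'ibi' is a 1-based insert index, 's' the inserted string.
    i = cmd["ibi"] - 1
    return Doc(doc.text[:i] + cmd["s"] + doc.text[i:])


def delete(doc: Doc, cmd: dict) -> Doc:
    # Assumed fields: 'si'/'ei' are 1-based inclusive start/end indices.
    return Doc(doc.text[:cmd["si"] - 1] + doc.text[cmd["ei"]:])


def null(doc: Doc, cmd: dict) -> Doc:
    # Commands that do not change the text are accepted and ignored.
    return doc


# One table for every command type; missing commands are added here
# rather than in a second table.
dispatch = {
    "is": insert,
    "ds": delete,
    "ae": null,
    "te": null,
}


def apply_cmd(doc: Doc, cmd: dict) -> Doc:
    ty = cmd.get("ty")
    if ty not in dispatch:
        raise KeyError(f"Unhandled command type: {ty!r}")
    return dispatch[ty](doc, cmd)


doc = Doc()
for cmd in [
    {"ty": "is", "ibi": 1, "s": "Hello world"},
    {"ty": "ds", "si": 6, "ei": 11},
    {"ty": "ae"},
]:
    doc = apply_cmd(doc, cmd)
```

Raising on an unknown `ty` is one possible choice; it makes missing commands visible during review instead of silently dropping them.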

bradley-erickson · 2026-01-23T14:43:19Z

modules/writing_observer/writing_observer/writing_analysis.py

    '''
-    # If it's not a relevant event, ignore it
-    if event['client']['event'] not in ["google_docs_save", "document_history"]:
+    def _extract_tab_id(client, root_event):


Shouldn't the tab already be available in the event based on the changes we made to the extension? I don't see why we need to have additional code to extract it.

I removed _extract_tab_id and now rely solely on client.tab_id / event.tab_id. If it’s missing, we fall back to "t.0" rather than parsing the URL. This aligns with the updated extension contract and avoids redundant extraction logic.
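
A minimal sketch of the fallback described in this reply, assuming the event is a plain dict with an optional `client` sub-dict. The function name `resolve_tab_id` and the constant name are hypothetical; the `tab_id` key and the `"t.0"` default are taken from the reply.

```python
DEFAULT_TAB_ID = "t.0"


def resolve_tab_id(event: dict) -> str:
    """Return the tab id carried by the event, or the default tab."""
    # Prefer client.tab_id, then event.tab_id; no URL parsing is attempted.
    client = event.get("client") or {}
    return client.get("tab_id") or event.get("tab_id") or DEFAULT_TAB_ID
```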

bradley-erickson · 2026-01-23T14:47:20Z

modules/writing_observer/writing_observer/writing_analysis.py

+        "position": internal_state.get("position", 0) if isinstance(internal_state, dict) else 0,
+        "edit_metadata": internal_state.get("edit_metadata", {"cursor": [], "length": []})
+            if isinstance(internal_state, dict) else {"cursor": [], "length": []},
+        "doc_state": doc_state.to_dict(),


This doc_state contains a lot of redundant information. Let's say a student has 30 pages of text on Tab 1 and 20 pages on Tab 2. The overall text key will have 50 pages of text while the doc_state also includes those 30 pages of text and the 20 pages of text.

I updated the reducer so doc_state is stored only in internal state and removed from the external output. This avoids duplicating full tab text alongside the overall text in the response.

The internal/external state split is an older implementation. Initially, having two states was the plan, but as the system has grown, we are converging on using a single state.
That being said, we ought to keep the states the same.
The redundant information I previously mentioned is still an issue even if we are only keeping it in a single state.

I removed the tab list and metadata from writing_analysis.py so reconstruction now only emits text, position, and edit_metadata, and I keep doc_state only in the internal state to preserve incremental reconstruction. This reduces redundancy and keeps external output lean. If you want, I can also align internal/external shapes further, but that would require deciding whether to drop doc_state entirely or accept it as internal-only.

Check ff2c693.

> we are converging on using a single state
> we ought to keep the states the same

I apologize if this was not clear, but the states ought to remain the same. The external state is not being used and we are only using the internal state.

These changes also do not fix the problem that the doc_state key includes a duplication of the text. What if a student has 50 pages of text? In your implementation, we are storing those 50 pages twice in the same object.
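
The duplication concern can be illustrated with a small sketch. The state shapes, the helper names `full_text_from_tabs` and `text_chars`, and the idea of deriving the combined text on demand are assumptions for illustration; the thread does not settle on a fix. Short strings stand in for the 30 and 20 pages in the earlier example.

```python
from typing import Dict


def full_text_from_tabs(tabs: Dict[str, str], separator: str = "") -> str:
    """Join per-tab text in insertion order."""
    return separator.join(tabs.values())


def text_chars(state: dict) -> int:
    """Count the text characters held in a state object."""
    top_level = len(state.get("text", ""))
    per_tab = sum(len(t) for t in state.get("doc_state", {}).get("tabs", {}).values())
    return top_level + per_tab


tabs = {"t.0": "a" * 30, "t.1": "b" * 20}  # stand-ins for 30 and 20 pages

# Shape under review: the combined text and the per-tab text are both stored.
duplicated = {"text": full_text_from_tabs(tabs), "doc_state": {"tabs": tabs}}

# One possible alternative: store per-tab text once, derive the rest on demand.
single_copy = {"doc_state": {"tabs": tabs}}
full_text = full_text_from_tabs(single_copy["doc_state"]["tabs"])
```

Here `text_chars(duplicated)` is twice `text_chars(single_copy)`, which is the overhead the comment describes.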

bradley-erickson · 2026-01-23T14:47:53Z

modules/writing_observer/writing_observer/writing_analysis.py

+        "text": gdoc_reconstruct_doc.render_full_text(doc_state),
+        "tabs": tabs,
+        "position": internal_state.get("position", 0) if isinstance(internal_state, dict) else 0,
+        "edit_metadata": internal_state.get("edit_metadata", {"cursor": [], "length": []})


I don't see this being updated with the information it had prior to your changes.
This applies to both position and edit_metadata.

I restored cursor and edit_metadata by sourcing them directly from the active tab’s google_text state (so they reflect the latest cursor and edit history). These fields are now written into both internal and external state on every update.

Check the commit ca13755

bradley-erickson · 2026-01-23T14:49:01Z

modules/writing_observer/writing_observer/writing_analysis.py

+        tabs.append({
+            "tab_id": tab_id,
+            "title": tab.name or tab_id,
+            "last_accessed": tab.last_timestamp or tab.first_timestamp,


This is already recorded in some other reducers. I'm not sure it's needed here.

I removed last_accessed from the external tabs payload to avoid redundancy since it’s already tracked in other reducers. The tab output now only includes tab_id, title, and text.

Please take a look at 19a802f

I think we briefly discussed this, but it may make more sense to create a new reducer for tab information that is per student per document scoped.
We have a document_list reducer right now that tracks this metadata for a document. A similar reducer for tabs (on a per document basis) would be the correct way to store this information on the system.

This partially comes back to reviewing what was already there and making sure that information is still present - the reconstruction data only provided the text, the cursor, and some edit metadata. It did not include metadata about the document itself, that was the job of another reducer.

Acknowledged! I have removed the tab metadata updates (first_timestamp/last_timestamp, last_url, last_server_time) and the tabs payload from reconstruct so it only returns text + cursor + edit metadata. I agree that tab metadata should live in a separate per‑student/per‑doc reducer (similar to document_list), and I’ll handle that in a follow‑up change.
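
A separate tab metadata reducer, scoped per student and per document, might look roughly like the sketch below. It is written as a plain function and does not use the system's actual reducer registration; the event fields `student_id`, `doc_id`, `tab_title`, and `timestamp`, and both function names, are hypothetical. Only `tab_id`, `first_timestamp`, `last_timestamp`, and the `"t.0"` default appear in the thread.

```python
from typing import Dict, Iterable, Optional, Tuple

TabKey = Tuple[str, str]  # (student_id, doc_id): hypothetical scope key


def tab_list_reducer(event: dict, state: Optional[dict]) -> dict:
    """Fold one event into per-document tab metadata (no text is stored)."""
    state = dict(state or {"tabs": {}})
    tabs = dict(state["tabs"])
    tab_id = event.get("tab_id") or "t.0"
    previous = tabs.get(tab_id, {})
    timestamp = event.get("timestamp")
    tabs[tab_id] = {
        "title": event.get("tab_title") or previous.get("title") or tab_id,
        "first_timestamp": previous.get("first_timestamp", timestamp),
        "last_timestamp": timestamp,
    }
    state["tabs"] = tabs
    return state


def reduce_events(events: Iterable[dict]) -> Dict[TabKey, dict]:
    """Keep one independent state per (student, document) pair."""
    states: Dict[TabKey, dict] = {}
    for event in events:
        key = (event["student_id"], event["doc_id"])
        states[key] = tab_list_reducer(event, states.get(key))
    return states
```

Keeping this state apart from reconstruction leaves the reconstruction output limited to text, cursor, and edit metadata, as requested in the review.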

bradley-erickson · 2026-01-23T14:51:11Z

@saminur you need to review how the reconstruction code worked previously and make sure your new reconstruction code still adheres to that behavior.
Additionally, I'm seeing repeated lines in the text portion of your reconstruction output, which means it is not reconstructing the text properly.

bradley-erickson · 2026-01-26T14:57:35Z

modules/writing_observer/writing_observer/reconstruct_doc.py

+
+
+# Centralized dispatch for all command types.
+dispatch = {


Why not use the existing dispatch variable? There is no clear indication in the code why we need both.

The cleanest solution is to add the missing commands to our current dispatch variable instead of creating a new one.

adding reconstruction codes for code review

b09e62f

saminur requested a review from bradley-erickson December 19, 2025 16:43

saminur self-assigned this Dec 19, 2025

saminur added the enhancement New feature or request label Dec 19, 2025

saminur added 2 commits January 11, 2026 12:28

working offline reducer for the google doc reconstruction

b3c784a

added the tab-specific code in extension

6aa5329

bradley-erickson reviewed Jan 15, 2026

saminur added 3 commits January 16, 2026 09:39

addressed brad's comment on merging import

366ded8

Add tab-scoped reconstruct reducer and localize reconstruction helpers

ba58f15

Support doc+tab scopes for reconstruction in writing_observer Move googledoc reconstruction helpers into module and add tab scope Wire gdoc_tab_scope reconstruct reducers and update registrations

Removed unused file command_state.py and load_event.py and update old…

139bec7

… reconstruct_doc.py pipeline with Tab specific reconstruction

bradley-erickson reviewed Jan 23, 2026

saminur added 2 commits January 23, 2026 21:24

added centralized dispatch, cursor metadata and doc_state

ca13755

remove redundant codes from writing_analysis.py

19a802f

bradley-erickson reviewed Jan 26, 2026

saminur added 3 commits January 26, 2026 13:33

single dispatch system for doc and tab

c1f0ac3

tab list and metadata from reconstruction reducer

ff2c693

code for tab_list reducer

d374e37

