Skip to content

Conversation

@saminur
Copy link

@saminur saminur commented Dec 19, 2025

No description provided.

@saminur saminur self-assigned this Dec 19, 2025
@saminur saminur added the enhancement New feature or request label Dec 19, 2025
@bradley-erickson
Copy link

@saminur much of this code looks okay as a proof of concept
More information about how the tabs work / what commands are associated with tabs should be documented further. The information about how they work is crucial for determining the best way these will operate within the system.
Any of the these new or modified commands should be added to the modules/writing_observer/writing_observer/reconstruct_doc.py under the dispatch variable where all other Google doc commands are handled.


import { googledocs_id_from_url } from './writing_common';

import { tab_id_from_url } from './writing_common';

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be combined with the line above it.
import { googledocs_id_from_url, tab_id_from_url } from './writing_common';

Support doc+tab scopes for reconstruction in writing_observer
Move googledoc reconstruction helpers into module and add tab scope
Wire gdoc_tab_scope reconstruct reducers and update registrations
… reconstruct_doc.py pipelien with Tab specific recosntruction
for cmd in commands:
self._apply_cmd(cmd, default_tab, event_timestamp)

def _apply_cmd(self, cmd: dict, current_tab: str, event_timestamp: Optional[int] = None) -> None:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have the dispatch variable so we can easily call commands. If commands are missing, we should add them to the dispatch variable and call them that way.

ty = cmd.get("ty")
dispatch[ty](...)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I addressed this by centralizing all command handling through dispatch. Text‑mutating commands go through text_dispatch via _cmd_text, while tab‑specific commands (mkch, ucp, ac, nm, ae, te) now have dedicated handlers in dispatch. In DocState._dispatch_cmd, every command resolves to dispatch[ty] (if present), so the handling is centralized as requested.
Please take a look at teh following commit.
ca13755

'''
# If it's not a relevant event, ignore it
if event['client']['event'] not in ["google_docs_save", "document_history"]:
def _extract_tab_id(client, root_event):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the tab already be available in the event based on the changes we made to the extension? I don't see why we need to have additional code to extract it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed _extract_tab_id and now rely solely on client.tab_id / event.tab_id. If it’s missing, we fall back to "t.0" rather than parsing the URL. This aligns with the updated extension contract and avoids redundant extraction logic.

"position": internal_state.get("position", 0) if isinstance(internal_state, dict) else 0,
"edit_metadata": internal_state.get("edit_metadata", {"cursor": [], "length": []})
if isinstance(internal_state, dict) else {"cursor": [], "length": []},
"doc_state": doc_state.to_dict(),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc_state contains a lot of redundant information. Let's say a student has 30 pages of text on Tab 1 and 20 pages on Tab 2. The overall text key will have 50 pages of text while the doc_state also includes those 30 pages of text and the 20 pages of text.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the reducer so doc_state is stored only in internal state and removed from the external output. This avoids duplicating full tab text alongside the overall text in the response.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The internal / external state is an older implementation. Initially having 2 state was the plan, but as the system has grown, we are converging on using a single state.
That being said, we ought to keep the states the same.
The redundant information I previously mentioned is still an issue even if we are only keeping it in a single state.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the tab list and metadata from writing_analysis.py so reconstruction now only emits text, position, and edit_metadata, and I keep doc_state only in the internal state to preserve incremental reconstruction. This reduces redundancy and keeps external output lean. If you want, I can also align internal/external shapes further, but that would require deciding whether to drop doc_state entirely or accept it as internal-only.

check ff2c693

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are converging on using a single state
we ought to keep the states the same

I apologize if this was not clear, but the states ought to remain the same. The external state is not being used and we are only using the internal state.

These changes also do not fix the problem that the doc_state key includes a duplication of the text. What if a student has 50 pages of text? In your implementation, we are storing those 50 pages twice in the same object.

"text": gdoc_reconstruct_doc.render_full_text(doc_state),
"tabs": tabs,
"position": internal_state.get("position", 0) if isinstance(internal_state, dict) else 0,
"edit_metadata": internal_state.get("edit_metadata", {"cursor": [], "length": []})
Copy link

@bradley-erickson bradley-erickson Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this being updated with the information it previously had prior to your changes.
Both position and edit_metadata

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I restored cursor and edit_metadata by sourcing them directly from the active tab’s google_text state (so they reflect the latest cursor and edit history). These fields are now written into both internal and external state on every update.

Check the commit ca13755

tabs.append({
"tab_id": tab_id,
"title": tab.name or tab_id,
"last_accessed": tab.last_timestamp or tab.first_timestamp,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already recorded in some other reducers. I'm not sure its needed here.

Copy link
Author

@saminur saminur Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed last_accessed from the external tabs payload to avoid redundancy since it’s already tracked in other reducers. The tab output now only includes tab_id, title, and text.

Please take a look at 19a802f

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we briefly discussed this, but it may make more sense to create a new reducer for tab information that is per student per document scoped.
We have a document_list reducer right now that tracks this metadata for a document. A similar reducer for tabs (on a per document basis) would be the correct way to store this information on the system.

This partially comes back to reviewing what was already there and making sure that information is still present - the reconstruction data only provided the text, the cursor, and some edit metadata. It did not include metadata about the document itself, that was the job of another reducer.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged! I have removed the tab metadata updates (first_timestamp/last_timestamp, last_url, last_server_time) and the tabs payload from reconstruct so it only returns text + cursor + edit metadata. I agree that tab metadata should live in a separate per‑student/per‑doc reducer (similar to document_list), and I’ll handle that in a follow‑up change.

@bradley-erickson
Copy link

@saminur you need to review how the reconstruction code worked prior and make sure your new reconstruction code still adheres to it.
Additionally, I'm seeing repeated lines of text included in the text portion of your reconstruct meaning that it is not reconstructing the text properly.



# Centralized dispatch for all command types.
dispatch = {
Copy link

@bradley-erickson bradley-erickson Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the existing dispatch variable? There is not clear indication in the code why we need both.

The cleanest solution is adding the missing commands to our current dispatch function instead of creating a new one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants