-
Notifications
You must be signed in to change notification settings - Fork 2
adding reconstruction codes for code review #150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
@saminur much of this code looks okay as a proof of concept |
|
|
||
| import { googledocs_id_from_url } from './writing_common'; | ||
|
|
||
| import { tab_id_from_url } from './writing_common'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be combined with the line above it.
import { googledocs_id_from_url, tab_id_from_url } from './writing_common';
Support doc+tab scopes for reconstruction in writing_observer Move googledoc reconstruction helpers into module and add tab scope Wire gdoc_tab_scope reconstruct reducers and update registrations
… reconstruct_doc.py pipelien with Tab specific recosntruction
| for cmd in commands: | ||
| self._apply_cmd(cmd, default_tab, event_timestamp) | ||
|
|
||
| def _apply_cmd(self, cmd: dict, current_tab: str, event_timestamp: Optional[int] = None) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have the dispatch variable so we can easily call commands. If commands are missing, we should add them to the dispatch variable and call them that way.
ty = cmd.get("ty")
dispatch[ty](...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I addressed this by centralizing all command handling through dispatch. Text‑mutating commands go through text_dispatch via _cmd_text, while tab‑specific commands (mkch, ucp, ac, nm, ae, te) now have dedicated handlers in dispatch. In DocState._dispatch_cmd, every command resolves to dispatch[ty] (if present), so the handling is centralized as requested.
Please take a look at teh following commit.
ca13755
| ''' | ||
| # If it's not a relevant event, ignore it | ||
| if event['client']['event'] not in ["google_docs_save", "document_history"]: | ||
| def _extract_tab_id(client, root_event): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the tab already be available in the event based on the changes we made to the extension? I don't see why we need to have additional code to extract it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed _extract_tab_id and now rely solely on client.tab_id / event.tab_id. If it’s missing, we fall back to "t.0" rather than parsing the URL. This aligns with the updated extension contract and avoids redundant extraction logic.
| "position": internal_state.get("position", 0) if isinstance(internal_state, dict) else 0, | ||
| "edit_metadata": internal_state.get("edit_metadata", {"cursor": [], "length": []}) | ||
| if isinstance(internal_state, dict) else {"cursor": [], "length": []}, | ||
| "doc_state": doc_state.to_dict(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doc_state contains a lot of redundant information. Let's say a student has 30 pages of text on Tab 1 and 20 pages on Tab 2. The overall text key will have 50 pages of text while the doc_state also includes those 30 pages of text and the 20 pages of text.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the reducer so doc_state is stored only in internal state and removed from the external output. This avoids duplicating full tab text alongside the overall text in the response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The internal / external state is an older implementation. Initially having 2 state was the plan, but as the system has grown, we are converging on using a single state.
That being said, we ought to keep the states the same.
The redundant information I previously mentioned is still an issue even if we are only keeping it in a single state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the tab list and metadata from writing_analysis.py so reconstruction now only emits text, position, and edit_metadata, and I keep doc_state only in the internal state to preserve incremental reconstruction. This reduces redundancy and keeps external output lean. If you want, I can also align internal/external shapes further, but that would require deciding whether to drop doc_state entirely or accept it as internal-only.
check ff2c693
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are converging on using a single state
we ought to keep the states the same
I apologize if this was not clear, but the states ought to remain the same. The external state is not being used and we are only using the internal state.
These changes also do not fix the problem that the doc_state key includes a duplication of the text. What if a student has 50 pages of text? In your implementation, we are storing those 50 pages twice in the same object.
| "text": gdoc_reconstruct_doc.render_full_text(doc_state), | ||
| "tabs": tabs, | ||
| "position": internal_state.get("position", 0) if isinstance(internal_state, dict) else 0, | ||
| "edit_metadata": internal_state.get("edit_metadata", {"cursor": [], "length": []}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see this being updated with the information it previously had prior to your changes.
Both position and edit_metadata
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I restored cursor and edit_metadata by sourcing them directly from the active tab’s google_text state (so they reflect the latest cursor and edit history). These fields are now written into both internal and external state on every update.
Check the commit ca13755
| tabs.append({ | ||
| "tab_id": tab_id, | ||
| "title": tab.name or tab_id, | ||
| "last_accessed": tab.last_timestamp or tab.first_timestamp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already recorded in some other reducers. I'm not sure its needed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed last_accessed from the external tabs payload to avoid redundancy since it’s already tracked in other reducers. The tab output now only includes tab_id, title, and text.
Please take a look at 19a802f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we briefly discussed this, but it may make more sense to create a new reducer for tab information that is per student per document scoped.
We have a document_list reducer right now that tracks this metadata for a document. A similar reducer for tabs (on a per document basis) would be the correct way to store this information on the system.
This partially comes back to reviewing what was already there and making sure that information is still present - the reconstruction data only provided the text, the cursor, and some edit metadata. It did not include metadata about the document itself, that was the job of another reducer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Acknowledged! I have removed the tab metadata updates (first_timestamp/last_timestamp, last_url, last_server_time) and the tabs payload from reconstruct so it only returns text + cursor + edit metadata. I agree that tab metadata should live in a separate per‑student/per‑doc reducer (similar to document_list), and I’ll handle that in a follow‑up change.
|
@saminur you need to review how the reconstruction code worked prior and make sure your new reconstruction code still adheres to it. |
|
|
||
|
|
||
| # Centralized dispatch for all command types. | ||
| dispatch = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the existing dispatch variable? There is not clear indication in the code why we need both.
The cleanest solution is adding the missing commands to our current dispatch function instead of creating a new one.
No description provided.