feat: complete blog post on nested tokenization by KarelZe · Pull Request #58 · KarelZe/blog

KarelZe · 2025-12-10T12:29:19Z

No description provided.

gemini-code-assist · 2025-12-10T12:29:33Z

Summary of Changes

Hello @KarelZe, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request completes and publishes two distinct blog posts by updating their metadata. The primary post, as indicated by the title, focuses on 'nested tokenization', while a second post discusses 'faithful synthetic data'. Both are transitioned from draft status to public, making them accessible to readers.

Highlights

Blog Post Publication: The pull request publishes two new blog posts by changing their 'draft' status from 'true' to 'false'.
Nested Tokenization Post: The blog post titled 'Nested Tokenization for Larger Context in Large Images' is now live.
Faithful Synthetic Data Post: The blog post titled 'How Faithful is Your Synthetic Data?' is also now live.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the root of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request marks two blog posts, 'How Faithful is Your Synthetic Data?' and 'Nested Tokenization for Larger Context in Large Images', as ready for publication by changing the draft status to false. However, a review of the content reveals that both posts are incomplete and contain numerous issues such as TODO comments, personal notes, typos, and unfinished sentences. Publishing them in their current state would be premature. I recommend reverting these changes and completing the posts before merging. Additionally, there is a potentially redundant index.md.draft file for the 'Nested Tokenization' post, which should be clarified.

gemini-code-assist · 2025-12-10T12:30:32Z

 description: My thoughts on the paper "How Faithful is Your Synthetic Data?".
 DisableComments: false
-draft: true
+draft: false


This post is being marked for publication, but it appears to be incomplete. It contains several TODO comments which indicate that work is still pending. Publishing the article in this state is likely not intended.

Here are the locations of the TODOs:

Line 129: TODO: add d3.js plot...

Line 185: TODO: Simplify into simple train-val loop...

Line 405: TODO: simplify.

Line 518: TODO: its unclear to me, what the k-nn is used for.

Line 534: TODO: translate + make a svg

Line 544: TODO: translate + make a svg

Please complete these sections and remove the TODO comments before setting draft: false.
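The check requested here (no `TODO` left once `draft` is no longer `true`) could be automated. The following is a minimal, hypothetical sketch, not something that exists in this repository. It assumes YAML front matter delimited by `---` lines, as in Hugo posts, and treats any line containing the literal string `TODO` as a blocker. The names `publish_blockers`, `FRONT_MATTER` and `DRAFT_TRUE` are illustrative only.

```python
import re

# Hypothetical pre-publish check (not part of this repository).
# Assumes YAML front matter delimited by "---" lines, as in Hugo posts.
FRONT_MATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
DRAFT_TRUE = re.compile(r"^draft:\s*true\s*$", re.MULTILINE)


def publish_blockers(text: str) -> list[str]:
    """List remaining TODO lines if the post is no longer marked as a draft."""
    match = FRONT_MATTER.match(text)
    front = match.group(1) if match else ""
    body = text[match.end():] if match else text

    # A post that still says draft: true is allowed to contain TODOs.
    if DRAFT_TRUE.search(front):
        return []

    # Count front-matter lines so reported numbers match the file, not the body.
    offset = text[: match.end()].count("\n") if match else 0
    return [
        f"line {lineno}: {line.strip()}"
        for lineno, line in enumerate(body.splitlines(), start=offset + 1)
        if "TODO" in line
    ]


if __name__ == "__main__":
    post = "---\ntitle: Demo\ndraft: false\n---\nIntro\nTODO: add plot\n"
    print(publish_blockers(post))  # ['line 6: TODO: add plot']
```

Because reported numbers are file line numbers, the output can be compared directly against a review's line references. The sketch only catches explicit markers; the personal notes, typos and unfinished sentences flagged in the nested-tokenization post would still need a human pass.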

Suggested change

-draft: false
+draft: true

gemini-code-assist · 2025-12-10T12:30:32Z

 description: My thoughts on the paper "Nested Tokenization for Larger Context in Large Images".
 disableComments: false
-draft: true
+draft: false


This post is being marked for publication, but it seems to be an unfinished draft. It contains several personal comments, typos, and an incomplete sentence that should be addressed before publishing.

Specifically:

Personal notes/comments that look like reminders for the author on lines 34, 35, 38, 39, and 47. These should be removed or integrated into the main text.

Typos:

Line 22: "Ressources" should be "Resources".

Line 36: "down-sights" should be "downsides".

Line 37: "Mare practically" should be "More practically".

Incomplete sentence on line 40, which ends with "while".

Please finish editing this post before marking it as ready for publication.

Suggested change

-draft: false
+draft: true

gemini-code-assist · 2025-12-10T12:30:32Z

 categories: [ai]
 disableComments: false
-draft: true
+draft: false


This file, which appears to be a draft version of nested-tokenization/index.md, is also being marked for publication. It contains the same issues as the main markdown file, such as personal comments and incomplete content.

Furthermore, having both index.md and index.md.draft for the same post is confusing. It's unclear if this file is intended for publication or if it's a backup that should be removed.

Please clarify the purpose of this file. If it's not meant to be published, it should be removed or kept as a draft. If it is, it needs to be completed first.

draft: true

feat: complete blog post on nested tokenization

3d3301d

KarelZe self-assigned this Dec 10, 2025

gemini-code-assist Bot reviewed Dec 10, 2025

KarelZe added 4 commits December 10, 2025 19:40

feat: complete intro on nested tokenization

ff7f2dc

Merge remote-tracking branch 'origin/master' into complete-nested-tokenization

2f9d100

feat: add section why cropping and down-sampling of vision input is problematic

3cb0b45

feat: add sample implementation for dynamic scaling

f26470f

KarelZe wants to merge 5 commits into master from complete-nested-tokenization
