feat: complete blog post on nested tokenization #58

Draft
KarelZe wants to merge 5 commits into master from complete-nested-tokenization

KarelZe (Owner) commented Dec 10, 2025

No description provided.

KarelZe self-assigned this Dec 10, 2025

gemini-code-assist (Contributor) commented

Summary of Changes

Hello @KarelZe, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request completes and publishes two distinct blog posts by updating their metadata. The primary post, as indicated by the title, focuses on 'nested tokenization', while a second post discusses 'faithful synthetic data'. Both are transitioned from draft status to public, making them accessible to readers.

Highlights

  • Blog Post Publication: The pull request publishes two new blog posts by changing their 'draft' status from 'true' to 'false'.
  • Nested Tokenization Post: The blog post titled 'Nested Tokenization for Larger Context in Large Images' is now live.
  • Faithful Synthetic Data Post: The blog post titled 'How Faithful is Your Synthetic Data?' is also now live.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request marks two blog posts, 'How Faithful is Your Synthetic Data?' and 'Nested Tokenization for Larger Context in Large Images', as ready for publication by changing the draft status to false. However, a review of the content reveals that both posts are incomplete and contain numerous issues such as TODO comments, personal notes, typos, and unfinished sentences. Publishing them in their current state would be premature. I recommend reverting these changes and completing the posts before merging. Additionally, there is a potentially redundant index.md.draft file for the 'Nested Tokenization' post, which should be clarified.

description: My thoughts on the paper "How Faithful is Your Synthetic Data?".
DisableComments: false
- draft: true
+ draft: false

critical

This post is being marked for publication, but it appears to be incomplete. It contains several TODO comments which indicate that work is still pending. Publishing the article in this state is likely not intended.

Here are the locations of the TODOs:

  • Line 129: TODO: add d3.js plot...
  • Line 185: TODO: Simplify into simple train-val loop...
  • Line 405: TODO: simplify.
  • Line 518: TODO: its unclear to me, what the k-nn is used for.
  • Line 534: TODO: translate + make a svg
  • Line 544: TODO: translate + make a svg

Please complete these sections and remove the TODO comments before setting draft: false.

Suggested change
- draft: false
+ draft: true
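
The check requested in the comment above could be automated before a post is published. The following is a minimal sketch, not part of this repository: it assumes the posts use YAML front matter delimited by `---` lines and that unfinished work is marked with the literal string `TODO`. All function names are hypothetical.

```python
import re

# Assumption: front matter is a YAML block between two `---` lines at the top of the file.
FRONT_MATTER = re.compile(r"\A---\s*\n(.*?)\n---\s*\n", re.DOTALL)
# Matches a `draft: false` entry on its own line inside the front matter.
DRAFT_FALSE = re.compile(r"^draft:\s*false\s*$", re.MULTILINE | re.IGNORECASE)


def find_todos(text):
    """Return (line_number, stripped_line) for every line containing 'TODO'."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if "TODO" in line
    ]


def is_published(text):
    """Return True if the front matter sets `draft: false`."""
    match = FRONT_MATTER.match(text)
    return bool(match and DRAFT_FALSE.search(match.group(1)))


def publication_blockers(text):
    """Return leftover TODO lines, but only for posts marked as published."""
    return find_todos(text) if is_published(text) else []


if __name__ == "__main__":
    # Hypothetical post content, used only to demonstrate the helper.
    post = "---\ntitle: example\ndraft: false\n---\nIntro text.\nTODO: simplify.\n"
    for number, line in publication_blockers(post):
        print(f"line {number}: {line}")
```

A helper like this could run in CI or as a pre-commit hook, so that a post with open TODOs cannot be switched to `draft: false` unnoticed. Posts still marked `draft: true` are ignored by design.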

description: My thoughts on the paper "Nested Tokenization for Larger Context in Large Images".
disableComments: false
- draft: true
+ draft: false

critical

This post is being marked for publication, but it seems to be an unfinished draft. It contains several personal comments, typos, and an incomplete sentence that should be addressed before publishing.

Specifically:

  • Personal notes/comments that look like reminders for the author on lines 34, 35, 38, 39, and 47. These should be removed or integrated into the main text.
  • Typos:
    • Line 22: "Ressources" should be "Resources".
    • Line 36: "down-sights" should be "downsides".
    • Line 37: "Mare practically" should be "More practically".
  • Incomplete sentence on line 40, which ends with "while".

Please finish editing this post before marking it as ready for publication.

Suggested change
- draft: false
+ draft: true

categories: [ai]
disableComments: false
- draft: true
+ draft: false

critical

This file, which appears to be a draft version of nested-tokenization/index.md, is also being marked for publication. It contains the same issues as the main markdown file, such as personal comments and incomplete content.

Furthermore, having both index.md and index.md.draft for the same post is confusing. It's unclear if this file is intended for publication or if it's a backup that should be removed.

Please clarify the purpose of this file. If it's not meant to be published, it should be removed or kept as a draft. If it is, it needs to be completed first.

draft: true
