feat(slides): add slides.createFromJson — agent-friendly blueprint-to-slides tool #348

Open
n0012 wants to merge 22 commits into gemini-cli-extensions:main from n0012:feat/slides-create-from-json

n0012 commented Apr 25, 2026

What this adds

slides.createFromJson — a high-level tool designed specifically for AI agents that need to build presentations without knowing the raw Slides API format.

Agents pass a JSON blueprint; the server translates it into a Slides API batchUpdate in one round trip.

Blueprint format

// Multiple slides
{ "slides": [{ "elements": [...] }, ...] }

// Single slide shorthand
{ "elements": [...] }
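A minimal sketch of how the two blueprint forms could be normalized into one list of slide definitions. The type and function names here (`Blueprint`, `SlideDef`, `normalizeBlueprint`) are illustrative assumptions, not the PR's actual code; only the `slides` and `elements` keys come from the description above.

```typescript
// Hypothetical types: only the `slides` and `elements` keys come from the PR text.
type BlueprintElement = Record<string, unknown>;

interface SlideDef {
  elements: BlueprintElement[];
}

type Blueprint =
  | { slides: Array<{ elements?: BlueprintElement[] }> }
  | { elements?: BlueprintElement[] };

// Accept either blueprint form and always return a list of slide definitions.
function normalizeBlueprint(input: Blueprint): SlideDef[] {
  if ("slides" in input && Array.isArray(input.slides)) {
    // Multi-slide form: default a missing `elements` to an empty array.
    return input.slides.map((s) => ({ ...s, elements: s.elements ?? [] }));
  }
  // Single-slide shorthand: wrap the elements in one slide.
  const single = input as { elements?: BlueprintElement[] };
  return [{ elements: single.elements ?? [] }];
}
```

Defaulting `elements` to an empty array in both branches means a slide written as `{}` produces an empty slide instead of failing later when the elements are iterated.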

Element schema

Each element has a type (text | shape | image), a position ({x,y,w,h} in points on a 720×405 grid), and optional content, shape_type, url, and layer (z-index) fields.

style object (all optional):

| Property | Description |
| --- | --- |
| size | Font size (points) |
| bold, italic, underline, strikethrough | Text decoration |
| font_family | Font name; defaults to "Arial" for cross-platform compatibility |
| align | START \| CENTER \| END |
| vertical_align | TOP \| MIDDLE \| BOTTOM |
| color | Text color {red,green,blue} (0-1) |
| bg_color | Shape fill {red,green,blue} (0-1) |
| border_color / border_weight | Shape outline |
| no_border | Strip border entirely |
| bold_phrases | Array of substrings to bold within a text box |
| bold_until | Bold characters 0–N |
| links | [{text, url}] inline hyperlinks by matching text |
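For readers who prefer types, the element and style schema could be written roughly as follows. This is a sketch inferred from the property list above; the interface names, the `RECTANGLE` shape type value, the unit of `border_weight`, and the `fitsGrid` helper are assumptions rather than the PR's definitions.

```typescript
// Illustrative types inferred from the schema described in the PR text.
interface Rgb { red: number; green: number; blue: number } // each channel 0-1

interface ElementStyle {
  size?: number; // font size in points
  bold?: boolean;
  italic?: boolean;
  underline?: boolean;
  strikethrough?: boolean;
  font_family?: string; // described as defaulting to "Arial"
  align?: "START" | "CENTER" | "END";
  vertical_align?: "TOP" | "MIDDLE" | "BOTTOM";
  color?: Rgb;
  bg_color?: Rgb;
  border_color?: Rgb;
  border_weight?: number; // unit assumed to be points
  no_border?: boolean;
  bold_phrases?: string[];
  bold_until?: number;
  links?: Array<{ text: string; url: string }>;
}

interface SlideElement {
  type: "text" | "shape" | "image";
  position: { x: number; y: number; w: number; h: number };
  content?: string;
  shape_type?: string;
  url?: string;
  layer?: number;
  style?: ElementStyle;
}

// Hypothetical helper: true when the element stays inside the 720x405 grid.
function fitsGrid(el: SlideElement): boolean {
  const { x, y, w, h } = el.position;
  return x >= 0 && y >= 0 && x + w <= 720 && y + h <= 405;
}

// Example single-slide blueprint: a full-bleed background shape and a title.
const titleSlide: { elements: SlideElement[] } = {
  elements: [
    {
      type: "shape",
      shape_type: "RECTANGLE", // assumed to be passed through to the Slides API
      position: { x: 0, y: 0, w: 720, h: 405 },
      layer: 0,
      style: { bg_color: { red: 0.95, green: 0.96, blue: 0.98 }, no_border: true },
    },
    {
      type: "text",
      content: "Quarterly review: highlights",
      position: { x: 60, y: 160, w: 600, h: 80 },
      layer: 1,
      style: { size: 32, align: "CENTER", vertical_align: "MIDDLE", bold_phrases: ["highlights"] },
    },
  ],
};
```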

How layers work

Elements are sorted before rendering: shapes → images → text, then by layer value within each group. This ensures background shapes reliably render beneath text without requiring agents to manually sequence API calls.
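The ordering rule can be sketched as a two-key sort. The names below are hypothetical, and treating a missing `layer` as 0 is an assumption; the PR text only states the shapes → images → text order followed by the layer value.

```typescript
type ElementType = "text" | "shape" | "image";

interface Layered {
  type: ElementType;
  layer?: number;
}

// Render order by type: shapes first, then images, then text.
const TYPE_ORDER: Record<ElementType, number> = { shape: 0, image: 1, text: 2 };

// Return a sorted copy: by type group first, then by layer within the group.
// A missing layer is treated as 0 (an assumption, not stated in the PR).
function sortForRendering<T extends Layered>(elements: T[]): T[] {
  return [...elements].sort(
    (a, b) =>
      TYPE_ORDER[a.type] - TYPE_ORDER[b.type] || (a.layer ?? 0) - (b.layer ?? 0),
  );
}
```

Because the type group is the primary key, a text element with `layer: 0` is still created after a shape with `layer: 2`, which is what keeps background shapes beneath text.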

Also included

  • slides.create: create a blank presentation, returns {presentationId, title, url}
  • slides.batchUpdate: raw Slides API request array passthrough — escape hatch for operations createFromJson doesn't cover

Relationship to #237

This PR complements #237. The slides.create and slides.batchUpdate implementations here are intentionally minimal; #237's versions (with seeding, rollback logic, and folder support) are more complete and should take precedence on merge. slides.createFromJson is not present in #237 and is the primary new contribution here.

Happy to rebase on top of #237 once it merges to reduce overlap.

Validation

  • TypeScript type-check passes (tsc --noEmit)
  • Manually validated end-to-end: created a 7-slide presentation with mixed shapes, text layers, colors, and inline links through the MCP in a real Gemini CLI / Claude Code session

…eFromJson

slides.createFromJson is a high-level agent-friendly tool that accepts a
JSON blueprint and translates it into Slides API batch requests in one call.
Avoids requiring agents to know the raw Slides API request format.

Blueprint format:
  { "slides": [{ "elements": [...] }, ...] }  — multiple slides
  { "elements": [...] }                         — single slide (shorthand)

Each element supports: type (text|shape|image), position (x,y,w,h in PT on
a 720x405 grid), content, shape_type, url, layer (z-index), and a style
object with: size, bold, italic, underline, strikethrough, font_family,
align, vertical_align, color, bg_color, border_color, border_weight,
no_border, bold_phrases, bold_until, links.

Hardcoded Google Sans replaced with configurable font_family (defaults to
Arial for cross-platform compatibility).

Layers are sorted before rendering: shapes → images → text, then by layer
value, so background shapes reliably render beneath text.

Also adds:
- slides.create: create a blank presentation (returns ID + URL)
- slides.batchUpdate: execute raw Slides API request arrays (escape hatch
  for operations not covered by createFromJson)

Complements gemini-cli-extensions#237 which adds further write tools (addSlide, insertText,
replaceText, deleteSlide). The batchUpdate and create implementations here
are intentionally minimal; gemini-cli-extensions#237's versions may supersede them on merge.

gemini-code-assist bot left a comment

Code Review

This pull request introduces new Google Slides tools for creating presentations, performing batch updates, and generating slides from JSON blueprints. The review feedback highlights the need for consistency by using the registerTool wrapper to respect feature flags and suggests enhancing input schemas to support structured objects. Additionally, the feedback addresses a potential crash in the createFromJson service method and recommends adjusting the slide insertion logic to append new slides to the end of a presentation.

slidesService.getSlideThumbnail,
);

server.registerTool(

high

The slides.create tool is being registered using server.registerTool directly, which bypasses the registerTool wrapper defined on line 154. This wrapper is responsible for checking if the tool is enabled via feature flags (WORKSPACE_FEATURE_OVERRIDES). Using the wrapper ensures consistency and allows users to disable these tools if needed.

Suggested change
- server.registerTool(
+ registerTool(
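To illustrate why routing through a wrapper matters, here is a rough sketch of a feature-flag-aware registration helper. Everything in it is a stand-in: the `ToolServer` interface, `makeRegisterTool`, and the `isEnabled` predicate are hypothetical, and the actual wrapper and the format of WORKSPACE_FEATURE_OVERRIDES are not shown in this thread.

```typescript
// Hypothetical stand-ins: the real server API and flag format are not shown here.
type ToolHandler = (args: unknown) => unknown;

interface ToolServer {
  registerTool(name: string, config: object, handler: ToolHandler): void;
}

// Build a wrapper that consults a feature-flag predicate before registering.
function makeRegisterTool(server: ToolServer, isEnabled: (name: string) => boolean) {
  return (name: string, config: object, handler: ToolHandler): boolean => {
    if (!isEnabled(name)) {
      return false; // disabled tools never reach the server
    }
    server.registerTool(name, config, handler);
    return true;
  };
}
```

A tool registered with `server.registerTool` directly skips the `isEnabled` check entirely, which is the inconsistency the comment points out.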

);

server.registerTool(
'slides.batchUpdate',

high

Similar to slides.create, this tool should be registered using the registerTool wrapper to respect feature flags.

  registerTool(

});

server.registerTool(
'slides.createFromJson',

high

This tool should also use the registerTool wrapper to ensure it can be managed via feature flags.

  registerTool(

Comment on lines +482 to +486
requests: z
.string()
.describe(
'JSON string of an array of Slides API request objects (e.g., [{"createSlide":{}}, {"createShape":{...}}]). Will be parsed server-side.',
),

medium

The requests field is restricted to a string, but the SlidesService.batchUpdate implementation (line 329) and most MCP clients support passing structured arrays directly. Allowing both a JSON string and an array of objects provides a better experience for AI agents.

Suggested change
- requests: z
- .string()
- .describe(
- 'JSON string of an array of Slides API request objects (e.g., [{"createSlide":{}}, {"createShape":{...}}]). Will be parsed server-side.',
- ),
+ requests: z
+ .union([z.string(), z.array(z.any())])
+ .describe(
+ 'An array of Slides API request objects or a JSON string of that array (e.g., [{"createSlide":{}}, {"createShape":{...}}]).',
+ ),
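If the schema accepts both forms, the handler needs to coerce the input before use. A small sketch of that step, with a hypothetical `coerceRequests` helper that is not part of the PR:

```typescript
// Accept either a JSON string or an already-structured array of requests.
function coerceRequests(input: string | unknown[]): unknown[] {
  const parsed: unknown = typeof input === "string" ? JSON.parse(input) : input;
  if (!Array.isArray(parsed)) {
    // A JSON string that decodes to a non-array (for example an object) is rejected.
    throw new Error("requests must be an array of Slides API request objects");
  }
  return parsed;
}
```

Invalid JSON still throws from `JSON.parse`, so a caller would probably want to catch both error cases and return a readable tool error.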

Comment on lines +602 to +606
slideJson: z
.string()
.describe(
'JSON string of the slide blueprint. Use {"slides":[{"elements":[...]},...]} for multiple slides or {"elements":[...]} for one slide. Will be parsed server-side.',
),

medium

The slideElementSchema defined on lines 493-591 is currently unused. It should be applied to the slideJson input schema to provide the AI agent with structured validation and clear documentation of the expected blueprint format. Additionally, allowing both objects and strings makes the tool more robust.

        slideJson: z
          .union([
            z.object({
              slides: z.array(z.object({ elements: z.array(slideElementSchema) })),
            }),
            z.object({
              elements: z.array(slideElementSchema),
            }),
            z.string(),
          ])
          .describe(
            'The slide blueprint. Use {"slides":[{"elements":[...]}]} for multiple slides or {"elements":[...]} for one slide. Can be a JSON string or object.',
          ),

Comment on lines +679 to +681
const slideDefs = (slideJson as any).slides
? (slideJson as any).slides
: [{ elements: (slideJson as any).elements || [] }];

medium

If the slides format is used but an individual slide object is missing the elements property (e.g., { "slides": [{}] }), slideDefs[i].elements will be undefined. This will cause a crash in buildSlideRequests when it attempts to spread or iterate over elements (line 421).

Suggested change
- const slideDefs = (slideJson as any).slides
- ? (slideJson as any).slides
- : [{ elements: (slideJson as any).elements || [] }];
+ const slideDefs = (slideJson as any).slides
+ ? (slideJson as any).slides.map((s: any) => ({ ...s, elements: s.elements || [] }))
+ : [{ elements: (slideJson as any).elements || [] }];

requests.push({
createSlide: {
objectId: slideId,
insertionIndex: i + 1,

medium

Hardcoding insertionIndex: i + 1 causes new slides to always be inserted at the beginning of the presentation (after the first slide). For an 'add slides' tool, the expected behavior is usually to append slides to the end. Omitting insertionIndex entirely will cause the Slides API to append the new slides to the end of the presentation.

Suggested change
insertionIndex: i + 1,
slideLayoutReference: { predefinedLayout: 'BLANK' },
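A sketch of a request builder that appends by default and only sets `insertionIndex` when the caller asks for a position. The function name and the optional `startIndex` parameter are assumptions for illustration; `createSlide`, `objectId`, `insertionIndex`, and `slideLayoutReference` are the fields quoted in the diff above.

```typescript
interface CreateSlideRequest {
  createSlide: {
    objectId: string;
    insertionIndex?: number;
    slideLayoutReference: { predefinedLayout: "BLANK" };
  };
}

// Build one createSlide request per slide ID.
// With no startIndex, insertionIndex is omitted so the API appends each slide.
function buildCreateSlideRequests(
  slideIds: string[],
  startIndex?: number,
): CreateSlideRequest[] {
  return slideIds.map((objectId, i) => ({
    createSlide: {
      objectId,
      slideLayoutReference: { predefinedLayout: "BLANK" },
      ...(startIndex === undefined ? {} : { insertionIndex: startIndex + i }),
    },
  }));
}
```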

…late image URLs

Three gaps vs the Python reference implementation:

indent: expose indentStart in updateParagraphStyle so agents can
express bullet/sub-item indentation without raw batchUpdate calls.
Style property: indent (number, points).

vertical_align on shapes: contentAlignment was only applied to text
elements. Shapes (e.g. a labelled colored box) also accept
contentAlignment — added to the shape updateShapeProperties block
so vertical_align works on both element types.

Image URL sanitization: when an LLM outputs a template URL with
unresolved placeholders (e.g. https://.../{{color}}/{{icon}}.png),
the Slides API call fails. Detect '{' or '%7B' in the URL and fall
back to a neutral info icon rather than breaking the whole slide.
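The three changes described in this commit message could look roughly like the helpers below. Function names are hypothetical, the fallback icon URL is left as a parameter because the commit does not give one, and matching a lowercase `%7b` is an addition beyond what the commit describes.

```typescript
// indent: map the blueprint's `indent` (points) to indentStart.
function indentRequest(objectId: string, indentPt: number) {
  return {
    updateParagraphStyle: {
      objectId,
      textRange: { type: "ALL" },
      style: { indentStart: { magnitude: indentPt, unit: "PT" } },
      fields: "indentStart",
    },
  };
}

// vertical_align: the same contentAlignment request works for shapes and text boxes.
function verticalAlignRequest(objectId: string, align: "TOP" | "MIDDLE" | "BOTTOM") {
  return {
    updateShapeProperties: {
      objectId,
      shapeProperties: { contentAlignment: align },
      fields: "contentAlignment",
    },
  };
}

// Image URLs: fall back when the URL still contains a template placeholder.
// Checks for '{' and its percent-encoded form; lowercase '%7b' is also matched here.
function sanitizeImageUrl(url: string, fallbackUrl: string): string {
  const hasPlaceholder = url.includes("{") || url.toUpperCase().includes("%7B");
  return hasPlaceholder ? fallbackUrl : url;
}
```

Swapping in a fallback image keeps the rest of the batch valid, so one bad icon URL does not discard an otherwise correct slide.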

n0012 commented Apr 25, 2026

To test, run: gemini extensions install https://github.com/n0012/workspace

The new tool for batch-creating slides is slides.createFromJson.

n0012 added 20 commits April 25, 2026 19:55
Adds an optional 'theme' parameter with built-in named palettes:
  - google  — Google Blue/Green + Google Sans
  - minimal — dark grey headers + Arial
  - dark    — dark background + blue accent + Arial

When a theme is active:
  - Color fields in style accept named aliases instead of RGB objects:
    primary, primary_text, secondary, secondary_text, surface,
    surface_alt, text, text_muted, background
  - font_family defaults to the theme font (pass 'theme' to opt in explicitly)
  - text color defaults to theme.text when not specified
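Alias resolution can be pictured as a lookup followed by a hex-to-RGB conversion. The sketch below is an assumption about the mechanism, not the PR's code: `resolveColor`, `hexToRgb`, and the flat `googleTheme` map are hypothetical names, and the hex values are the ones listed for the 'google' theme elsewhere in this PR's commit notes.

```typescript
interface Rgb { red: number; green: number; blue: number } // each channel 0-1

// Hex values for the 'google' theme as listed in this PR's commit notes.
const googleTheme: Record<string, string> = {
  primary: "#1A73E8",
  secondary: "#34A853",
  surface: "#E8F0FE",
  surface_alt: "#E6F4EA",
  text: "#1F1F1F",
  text_muted: "#444746",
};

// Convert "#RRGGBB" to the 0-1 channel form used by the blueprint.
function hexToRgb(hex: string): Rgb {
  const n = parseInt(hex.slice(1), 16);
  return {
    red: ((n >> 16) & 255) / 255,
    green: ((n >> 8) & 255) / 255,
    blue: (n & 255) / 255,
  };
}

// Resolve a style color: explicit RGB passes through, aliases go via the theme.
function resolveColor(value: string | Rgb, theme: Record<string, string>): Rgb {
  if (typeof value !== "string") return value; // explicit RGB bypasses the theme
  const hex = theme[value];
  if (hex === undefined) throw new Error(`Unknown color alias: ${value}`);
  return hexToRgb(hex);
}
```

The pass-through branch is what keeps older blueprints with explicit RGB values working unchanged.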

Fully backward compatible — existing blueprints with explicit RGB values
and no theme parameter behave identically to before.

Theme changes:
- Add 'light' theme (default): near-black #202124 headers,
  Google Blue 600 #1A73E8 accent, Arial, white background.
  Safe cross-platform — no Google-specific fonts required.

- Fix 'google' theme to match official Google brand palette:
    primary:   #1A73E8 (Google Blue 600, was #1C549E — wrong)
    secondary: #34A853 (Google Green,    was #198A3D — wrong)
    surface:   #E8F0FE (Blue 50)
    surfaceAlt:#E6F4EA (Green 50)
    text:      #1F1F1F (gm3-sys-color-on-surface)
    textMuted: #444746 (gm3-sys-color-on-surface-variant)
    font:      Google Sans (unchanged)

- Fix 'dark' theme primary to #0B57D0 (Material 3 gm3-sys-color-primary)

- 'light' is now the default when no theme param is provided,
  giving new users sensible font and color defaults out of the box.
  Backward compat: existing blueprints with explicit RGB values
  get theme.text as text color default (was hardcoded black).

Adds concise layout guidance to the tool description so agents
understand the 720x405 grid, layer ordering, padding conventions,
font size hierarchy, and theme alias usage without needing a
separate skill loaded. Principles-based, not rigid recipes.

Replaces prescriptive layout recipe with design intent guidance:
- Lead with 'let content drive layout' not 'header bars go here'
- Deck consistency comes from theme/margin/type system, not templates
- Technical notes (layers, text clipping) kept as reference
- Opinionated layout patterns live in slides-designer skill instead

Retrieves and writes per-slide speaker notes.

getSpeakerNotes returns an array — one entry per slide — with
slideIndex, slideObjectId, speakerNotesObjectId, and notes text.
updateSpeakerNotes replaces notes on a single slide by objectId.
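As a rough illustration of the getSpeakerNotes shape, the sketch below walks a presentation's slides and collects the notes text. The trimmed-down `Slide` types model only the fields used here, `extractSpeakerNotes` is a hypothetical name, and a 0-based `slideIndex` is an assumption.

```typescript
// Minimal subset of the presentation structure needed to read notes.
interface TextElement { textRun?: { content?: string } }
interface PageElement {
  objectId: string;
  shape?: { text?: { textElements?: TextElement[] } };
}
interface Slide {
  objectId: string;
  slideProperties?: {
    notesPage?: {
      notesProperties?: { speakerNotesObjectId?: string };
      pageElements?: PageElement[];
    };
  };
}

interface SpeakerNotesEntry {
  slideIndex: number; // assumed 0-based
  slideObjectId: string;
  speakerNotesObjectId: string | null;
  notes: string;
}

// One entry per slide; slides without a notes shape get an empty string.
function extractSpeakerNotes(slides: Slide[]): SpeakerNotesEntry[] {
  return slides.map((slide, slideIndex) => {
    const notesPage = slide.slideProperties?.notesPage;
    const notesId = notesPage?.notesProperties?.speakerNotesObjectId ?? null;
    const notesShape = notesPage?.pageElements?.find((el) => el.objectId === notesId);
    const notes = (notesShape?.shape?.text?.textElements ?? [])
      .map((te) => te.textRun?.content ?? "")
      .join("")
      .trimEnd();
    return { slideIndex, slideObjectId: slide.objectId, speakerNotesObjectId: notesId, notes };
  });
}
```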

Approach adapted from gemini-cli-extensions#235
by @stefanoamorelli (MIT). Key difference: we target slides
by objectId rather than index, matching the pattern used by
getMetadata and createFromJson throughout this PR.

…cription

Two observed failure modes addressed:

1. Theme ignored: agent set theme:'google' but hardcoded RGB values,
   so theme had no visual effect. Description now leads with 'HOW
   THEMES WORK — IMPORTANT' and explicitly states that hardcoding RGB
   bypasses the theme entirely. Color aliases are the mechanism.

2. Speaker notes skipped: agent built the deck but never called
   updateSpeakerNotes because the prompt didn't mention notes.
   Description now instructs: for any professional presentation,
   write speaker notes for every slide after creation.

speaker_notes can now be included per-slide in the JSON blueprint:
  { slides: [{ speaker_notes: '...', elements: [...] }] }

The server writes notes automatically after creating slides — no
separate getSpeakerNotes/updateSpeakerNotes calls needed. This
removes the most common failure mode (agent creates slides but
never calls updateSpeakerNotes because it's a separate step).
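One way the server-side step could work is to turn each slide's speaker_notes into an insertText request once the slides exist. This is a sketch under assumptions: `inlineNotesRequests` is a hypothetical name, and the notes shape IDs are assumed to have been read back from the presentation after the slides were created.

```typescript
interface SlideWithNotes {
  speaker_notes?: string;
}

// Build insertText requests for every slide that carries non-empty notes.
// notesObjectIds[i] is assumed to be the notes shape ID of slide i.
function inlineNotesRequests(slides: SlideWithNotes[], notesObjectIds: string[]): object[] {
  const requests: object[] = [];
  slides.forEach((slide, i) => {
    const text = slide.speaker_notes?.trim();
    const objectId = notesObjectIds[i];
    if (!text || objectId === undefined) return; // nothing to write for this slide
    requests.push({ insertText: { objectId, insertionIndex: 0, text } });
  });
  return requests;
}
```

Freshly created slides start with empty notes, so this sketch only inserts; replacing notes on an existing slide would also need to clear the old text first.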

Also updates tool description to reflect inline notes and
reinforces that theme aliases are the mechanism for brand colors.

Model was ignoring optional speaker_notes — field existed but was
never populated across three test runs. Changed schema description
to REQUIRED with 'Omitting speaker_notes produces an unprofessional
deck' framing. The server still handles missing notes gracefully
(no crash), but the model should now always include them.

The 'google' theme was producing blue-and-green-only decks that
didn't look authentically Google. Three changes:

1. Headers now use #202124 (near-black) instead of #1A73E8. Real
   Google decks use dark headers with brand colors as accents,
   not solid blue bars on every slide.

2. Added all four Google brand colors as accent aliases:
   'blue' #4285F4, 'red' #EA4335, 'yellow' #FBBC05, 'green' #34A853
   (also accessible as accent1-accent4).

3. Tool description now says 'use all four across a deck for an
   authentic Google look' to prevent agents defaulting to blue-only.

Theme interface extended with optional accent1-accent4 fields.
Backward compatible — existing blueprints using 'primary'/'secondary'
still resolve correctly.

Adds explicit instruction to include the signature Google four-color
bar (blue/red/yellow/green, 4×180pt rectangles at y=397) on every
slide in a Google-themed deck. Also adds semantic color aliases
'blue', 'red', 'yellow', 'green' that resolve to the four brand
colors so agents can reference them by name.

Dark theme had both primary and secondary as blue — every slide
was blue-on-dark with zero variety. Replaced with Material Design
300-weight colors that are light enough for dark backgrounds:
  Blue 300 #8AB4F8, Red 300 #F28B82, Yellow 300 #FDD663, Green 300 #81C995

Light theme: added accent1-4 as muted professional tones (Blue,
Red 600, Amber, Green 700) for agents that want variety without
switching to the Google brand palette.

All three themes now have four accent colors available via the
blue/red/yellow/green and accent1-4 aliases.

Adds 'LESS IS MORE' principle to tool description: color is for
emphasis not decoration, most slides should be mostly white with
dark text. Removes 'four-color bar on every slide' instruction
(was producing template-heavy output). Theme description simplified.

When createFromJson returns with no speaker_notes provided, the
response now includes:
  speakerNotesStatus: 'MISSING'
  action_required: 'Call slides.updateSpeakerNotes for each slideId...'

This nudges the agent to make a second pass for notes instead of
considering the job done. Addresses the failure mode where Gemini
drops speaker_notes from the JSON blueprint because it's focused
on layout complexity.
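A sketch of how the response could be assembled. The `slideIds` field, the type name, and the helper are hypothetical; the hint text is passed in as a parameter because the full `action_required` message is not reproduced in this commit note, and reading "no speaker_notes provided" as "no slide has notes" is an assumption.

```typescript
interface CreateFromJsonResult {
  slideIds: string[]; // hypothetical field name
  speakerNotesStatus?: "MISSING";
  action_required?: string;
}

// Attach the nudge only when no slide in the blueprint carried speaker notes.
function withNotesStatus(
  slideIds: string[],
  slides: Array<{ speaker_notes?: string }>,
  hint: string,
): CreateFromJsonResult {
  const anyNotes = slides.some((s) => (s.speaker_notes ?? "").trim().length > 0);
  return anyNotes
    ? { slideIds }
    : { slideIds, speakerNotesStatus: "MISSING", action_required: hint };
}
```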

Tool description updated to describe both approaches: inline
(simpler) or two-pass (focus on layout first, notes second).

One-sentence notes don't make a usable talk track. Guided to write
4-6 sentences per slide: opening line, key points, transition.

Google Slides creates a blank first slide (objectId 'p') with every
new presentation. createFromJson now deletes it automatically after
building the real slides, so the deck starts clean. Fails silently
if the slide doesn't exist (not a new presentation).
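The cleanup step might be sketched as a single deleteObject request wrapped in a try/catch. The `BatchUpdate` callback type and the function name are assumptions standing in for whatever client call the server actually uses.

```typescript
// Hypothetical callback that sends a list of requests to the Slides API.
type BatchUpdate = (requests: object[]) => Promise<unknown>;

// Try to delete the default slide 'p'; report whether the delete went through.
async function deleteDefaultSlide(batchUpdate: BatchUpdate): Promise<boolean> {
  try {
    await batchUpdate([{ deleteObject: { objectId: "p" } }]);
    return true;
  } catch {
    // Slide 'p' is absent (not a new presentation): ignore and carry on.
    return false;
  }
}
```

Sending the delete in its own call, after the real slides are built, means a failure here cannot roll back the slides that were just created.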
Dark theme had no real usage and added complexity. Light theme
updated to use same Google brand colors (just Arial instead of
Google Sans). Two themes: light (default, Arial) and google
(Google Sans). Both share the same palette and layout philosophy.

One theme. Google brand palette is always active — no parameter
needed. Color aliases resolve automatically. Removes light/dark
themes and the theme enum from the schema. Users can still use
hardcoded RGB for one-off colors.

Server now automatically adds to every slide created by createFromJson:
- 'Google Cloud' text (10pt, text_muted) at bottom-left (x=36, y=375)
- 'Proprietary & Confidential' text (10pt, right-aligned) at bottom-right
- Four-color brand bar (red/yellow/blue/green, 4×180pt, h=8) at y=397
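The bar geometry follows from the grid: four 180pt segments span the 720pt width, and y=397 plus h=8 reaches the 405pt bottom edge. A sketch expressed as blueprint elements, where the function name, the `RECTANGLE` shape type value, and the use of color aliases in `bg_color` are assumptions:

```typescript
const GRID_WIDTH = 720;
const BAR_COLORS = ["red", "yellow", "blue", "green"] as const;

// Four equal-width rectangles laid edge to edge along the bottom of the slide.
function brandBarElements(y = 397, h = 8) {
  const w = GRID_WIDTH / BAR_COLORS.length; // 720 / 4 = 180
  return BAR_COLORS.map((color, i) => ({
    type: "shape" as const,
    shape_type: "RECTANGLE", // assumed to be passed through to the Slides API
    position: { x: i * w, y, w, h },
    style: { bg_color: color, no_border: true }, // color alias, resolved by the server
  }));
}
```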

Agents no longer need to include these in the blueprint JSON. Ensures
consistent positioning across all slides — fixes the offset/missing
footer inconsistency from agent-generated footers.

Footer text ('Google Cloud', 'Proprietary & Confidential') and
four-color brand bar were hardcoded into createFromJson. This locked
every user into Google branding. Moved to the slides-designer skill
where brand decisions belong. The server stays generic — a different
skill can teach different branding.

Brand-neutral template skill for slides.createFromJson with
placeholder branding ('YOUR COMPANY', generic colors). Users
copy and customize for their organization. Includes all 5 layout
patterns, design philosophy, footer snippet, and tips.

The Google Cloud version lives in n0012/ai-skills/slides-designer
as a filled-in example of this template.
1 participant