
Enhance inference documentation with batch API support#583

Merged
NiveditJain merged 4 commits into main from inference-standards
Dec 8, 2025

Conversation

@NiveditJain
Member

Added details on the batch inference API format compatible with OpenAI and other providers. Included instructions for uploading JSONL files, making inference requests, and retrieving output files. Updated example requests and responses for clarity.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 8, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Summary by CodeRabbit

  • Documentation
    • Added comprehensive guidance for batch inference using JSONL format, including file upload procedures, request/response examples, and batch output configuration.

Walkthrough

Adds documentation describing batch inference via JSONL: how to upload files with PUT /v0/files/, invoke /v0/infer/ using a file_id, specify Output-Format: jsonl, and retrieve completed JSONL outputs. Contains example requests, responses, and notes that existing inference endpoints/status logic remain unchanged.
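The flow the walkthrough describes (upload a JSONL batch, invoke inference by `file_id`, fetch JSONL output) can be sketched by building the two payloads involved. This is a minimal sketch: the field names are taken from the examples in this PR, and the helper names (`build_batch_jsonl`, `build_infer_body`) are hypothetical, not part of any documented SDK.

```python
import json

def build_batch_jsonl(items):
    """Serialize (key, prompt, model) tuples into the one-request-per-line
    JSONL body that the docs show being uploaded via PUT /v0/files/."""
    lines = []
    for key, prompt, model in items:
        lines.append(json.dumps({
            "key": key,
            "request": {
                "contents": [{"parts": [{"text": prompt}]}],
                "generation_config": {"temperature": 0.7},
                "model": model,
            },
        }))
    return "\n".join(lines) + "\n"  # JSONL: newline-terminated records

def build_infer_body(file_id, sla=60):
    """JSON body for POST /v0/infer/ when referencing an uploaded file."""
    return [{"file_id": file_id, "sla": sla}]

# Mirror the two example rows from the PR's sample file
batch = build_batch_jsonl([
    ("object-1", "Describe the process of photosynthesis.", "deepseek:r1-32b"),
    ("object-2", "What are the main ingredients in a Margherita pizza?", "openai:gpt-4o"),
])
```

The actual upload and inference calls would then send `batch` and `build_infer_body(...)` over HTTP with the `Authorization` header shown in the docs' curl examples.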

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Batch inference docs<br>`docs/docs/inference.md` | Added documentation for batch inference using JSONL: file upload (PUT /v0/files/), invoking inference with file_id at /v0/infer/, use of Output-Format: jsonl, example request/response payloads, status tracking, and sample completed output with output_url. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

  • Verify example requests/responses match actual API behavior and schemas
  • Confirm endpoint paths and HTTP methods are correct
  • Ensure JSONL examples are syntactically valid and representative
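The last review item — checking that the JSONL examples are syntactically valid — can be automated with a small validator. This is a sketch under the assumption (from the examples in this PR) that every batch input line must carry `key` and `request` fields; the helper name is hypothetical.

```python
import json

REQUIRED_INPUT_KEYS = {"key", "request"}  # per the batch examples in this PR

def validate_batch_jsonl(text):
    """Return a list of (line_number, error) tuples for a JSONL batch body.
    An empty list means every non-blank line parsed and had the expected keys."""
    errors = []
    for i, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are skipped, not flagged
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((i, f"invalid JSON: {exc.msg}"))
            continue
        missing = REQUIRED_INPUT_KEYS - obj.keys()
        if missing:
            errors.append((i, f"missing keys: {sorted(missing)}"))
    return errors
```

Running this over the documented sample file should return an empty list; a malformed or incomplete line is reported with its 1-based line number.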

Poem

🐰 I nibbled text and found a clue,

Batch files line up, each one true,
PUT and infer, the steps align,
JSONL outputs, tidy, fine,
Hopping on docs — a helpful chew.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Enhance inference documentation with batch API support' directly and accurately summarizes the main change: adding documentation for batch API inference capabilities. |
| Description check | ✅ Passed | The description is directly related to the changeset, detailing the batch inference API documentation additions including JSONL file handling, requests, responses, and output file retrieval. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@gemini-code-assist
Contributor

Summary of Changes

Hello @NiveditJain, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the inference documentation by introducing a new section dedicated to batch inference API support. It provides clear, step-by-step instructions and practical examples for users to leverage batch processing, including file uploads, request formatting, and output retrieval, aligning the API's capabilities with common industry standards.

Highlights

  • Batch Inference API Documentation: Added comprehensive documentation for the batch inference API, detailing its compatibility with formats used by OpenAI and other providers.
  • JSONL File Upload: Included instructions and examples for uploading JSONL files containing multiple inference requests using the PUT /v0/files/ endpoint.
  • Batch Inference Requests: Provided guidance on how to initiate batch inference using the /infer/ API by passing a file_id obtained from the file upload.
  • Output File Retrieval: Documented the process for requesting and retrieving inference outputs as JSONL files, including an example of downloading results from a provided output_url.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize Gemini Code Assist for the GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request enhances the inference documentation by adding details about the batch inference API. The changes are clear and provide good examples for the new workflow. I've made a few suggestions to correct minor grammatical errors and improve the overall readability of the documentation.

NiveditJain and others added 2 commits December 8, 2025 21:25
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cf44b38 and bcf7924.

📒 Files selected for processing (1)
  • docs/docs/inference.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/docs/inference.md

128-128: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


130-130: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


134-134: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


136-136: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


146-146: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


152-152: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


165-165: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


176-176: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


178-178: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


188-188: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


191-191: Files should end with a single newline character

(MD047, single-trailing-newline)

🔇 Additional comments (1)
docs/docs/inference.md (1)

128-143: The reported markdown formatting issue is incorrect; a blank line already exists before the heading.

The review identifies a non-existent markdown formatting problem. A blank line already exists at line 128 before the ### PUT /v0/files/ heading at line 129, so the proposed diff would introduce a duplicate blank line.

The curl syntax concern about using -F (multipart form-data) with -X PUT is valid to consider, but requires verification against the actual server implementation expectations, which is not visible in the repository.

Likely an incorrect or invalid review comment.

@NiveditJain NiveditJain merged commit 464b667 into main Dec 8, 2025
4 checks passed
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (4)
docs/docs/inference.md (4)

129-149: Fix markdown formatting: add blank lines before code blocks (MD031).

Multiple code blocks are missing required blank lines. Additionally, the text on line 145 can be improved for clarity.

````diff
 This API is used to upload a file to the server. Example request:
+
 ```bash
 curl -X PUT https://models.exosphere.host/v0/files/mydata.jsonl \
   -H "Authorization: Bearer <your-api-key>" \
   -F file="@mydata.jsonl"
 ```

 Example response:
+
 ```json
 {
     "file_id": "ae0b977c-76a0-4d71-81a5-05a6d8844852",
     "file_name": "mydata.jsonl",
     "bytes": 1000,
     "mime_type": "application/jsonl"
 }
 ```

-The expected file content should look like:
+Expected file format (JSONL):
+
 ```jsonl
 {"key": "object-1", "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}], "generation_config": {"temperature": 0.7}, "model": "deepseek:r1-32b"}}
 {"key": "object-2", "request": {"contents": [{"parts": [{"text": "What are the main ingredients in a Margherita pizza?"}]}], "generation_config": {"temperature": 0.7}, "model": "openai:gpt-4o"}}
 ```
````

---

151-162: Add blank line before code block (MD031).

The bash code block needs a blank line before it.

````diff
 Now you can pass the file_id to the `/infer/` API to run inference on the file. Example request:
+
 ```bash
 curl -X POST https://models.exosphere.host/v0/infer/ \
   -H "Content-Type: application/json" \
   -H "Authorization: Bearer <your-api-key>" \
   -d '[
     {
         "file_id": "ae0b977c-76a0-4d71-81a5-05a6d8844852",
         "sla": 60
     }
   ]'
 ```
````

---

164-186: Add missing blank lines around code blocks (MD031).

Multiple code blocks in this section lack required blank lines before and after.

````diff
 You can further request outputs as a file by passing the header `Output-Format: jsonl` to the API. Example request:
+
 ```bash
 curl -X POST https://models.exosphere.host/v0/infer/ \
   -H "Content-Type: application/json" \
   -H "Authorization: Bearer <your-api-key>" \
   -H "Output-Format: jsonl" \
   -d '[
     {
         "file_id": "ae0b977c-76a0-4d71-81a5-05a6d8844852",
         "sla": 60
     }
   ]'
 ```

 Example response:
+
 ```json
 {
     "status": "completed",
     "task_id": "2f92fc35-07d6-4737-aefa-8ddffd32f3fc",
     "total_items": 2,
     "output_url": "https://files.exosphere.host/v0/files/ae0b977c-76a0-4d71-81a5-05a6d8844852.jsonl"
 }
 ```
````

---

187-191: Fix markdown formatting and improve text clarity (MD031, MD047).

Add blank line before the final code block, improve text clarity, and add a trailing newline to the file.

````diff
-You can download the output file from the `output_url` and the content should look like:
+You can download the output file from the `output_url`. Expected output format (JSONL):
+
 ```jsonl
 {"key": "object-1", "output": {"type": "text", "text": "Photosynthesis is the process by which plants, algae, and some bacteria convert light energy into chemical energy."}}
 {"key": "object-2", "output": {"type": "text", "text": "The main ingredients in a Margherita pizza are tomato sauce, mozzarella cheese, and basil."}}
 ```
````
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bcf79246425a2911609b0d521d2c39236d1e1240 and 03c65d6d8f07c76bedebf876304858aab6f62a6f.

📒 Files selected for processing (1)
  • docs/docs/inference.md (1 hunks)

Comment on lines +125 to +128

Exosphere inference APIs also support the standard batch inference API format used by OpenAI, Gemini, and other providers. You can upload a JSONL file containing multiple inference requests, similar to OpenAI's batch API format and pass the file to the `/infer/` API.

### `PUT /v0/files/`
Contributor


⚠️ Potential issue | 🟡 Minor

Add blank line before heading (MD022).

There should be a blank line between the paragraph and the section heading.

```diff
 > **Note**: Auto retry policy will be triggered for transient failures without any additional cost.

 Exosphere inference APIs also support the standard batch inference API format used by OpenAI, Gemini, and other providers. You can upload a JSONL file containing multiple inference requests, similar to OpenAI's batch API format and pass the file to the `/infer/` API.
+
 ### `PUT /v0/files/`
```
🤖 Prompt for AI Agents
In docs/docs/inference.md around lines 125 to 128, there is no blank line
between the preceding paragraph and the "### `PUT /v0/files/`" heading which
violates MD022; insert a single blank line between the paragraph that ends with
"...pass the file to the `/infer/` API." and the "### `PUT /v0/files/`" heading
so the heading is separated by an empty line.

@NiveditJain NiveditJain deleted the inference-standards branch December 8, 2025 16:18