Enhance inference documentation with batch API support #583
Added details on the batch inference API format compatible with OpenAI and other providers. Included instructions for uploading JSONL files, making inference requests, and retrieving output files. Updated example requests and responses for clarity.
Note: Other AI code review bot(s) detected
CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough
Adds documentation describing batch inference via JSONL: how to upload files with PUT /v0/files/, invoke /v0/infer/ using a file_id, specify Output-Format: jsonl, and retrieve completed JSONL outputs. Contains example requests, responses, and notes that existing inference endpoints/status logic remain unchanged.

Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
Summary of Changes
Hello @NiveditJain, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the inference documentation by introducing a new section dedicated to batch inference API support. It provides clear, step-by-step instructions and practical examples for users to leverage batch processing, including file uploads, request formatting, and output retrieval, aligning the API's capabilities with common industry standards.
Code Review
This pull request enhances the inference documentation by adding details about the batch inference API. The changes are clear and provide good examples for the new workflow. I've made a few suggestions to correct minor grammatical errors and improve the overall readability of the documentation.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
docs/docs/inference.md(1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/docs/inference.md
128-128: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
130-130: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
134-134: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
136-136: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
146-146: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
152-152: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
165-165: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
176-176: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
178-178: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
188-188: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
191-191: Files should end with a single newline character
(MD047, single-trailing-newline)
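The MD031 findings above all ask for the same mechanical change, which could be scripted rather than applied by hand. The following is a minimal sketch, not the fixer that markdownlint itself ships: the function name `surround_fences_with_blank_lines` is hypothetical, and it assumes only three-backtick fences are used (no `~~~` fences, no nested four-backtick fences).

```python
FENCE = "`" * 3  # three backticks, built this way to keep the example easy to embed


def surround_fences_with_blank_lines(text: str) -> str:
    """Ensure a blank line before each opening fence and after each closing fence."""
    out: list[str] = []
    in_fence = False
    need_blank_after = False  # True right after a closing fence was emitted
    for line in text.split("\n"):
        is_fence = line.lstrip().startswith(FENCE)
        # A closing fence followed directly by content needs a separator.
        if need_blank_after and line.strip():
            out.append("")
        need_blank_after = False
        if is_fence and not in_fence:
            # Opening fence: separate it from a preceding non-blank line.
            if out and out[-1].strip():
                out.append("")
            in_fence = True
        elif is_fence and in_fence:
            in_fence = False
            need_blank_after = True
        out.append(line)
    return "\n".join(out)
```

Running the function twice yields the same result as running it once, so it should be safe to apply to a file that is already partly fixed.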
🔇 Additional comments (1)
docs/docs/inference.md (1)
128-143: Markdown formatting issue is incorrect; blank line already exists before heading.

The review identifies a non-existent markdown formatting problem. A blank line already exists at line 128 before the `### PUT /v0/files/` heading at line 129, so the proposed diff would introduce a duplicate blank line.

The curl syntax concern about using `-F` (multipart form-data) with `-X PUT` is valid to consider, but requires verification against the actual server implementation expectations, which is not visible in the repository.

Likely an incorrect or invalid review comment.
Actionable comments posted: 1
♻️ Duplicate comments (4)
docs/docs/inference.md (4)
`129-149`: **Fix markdown formatting: add blank lines before code blocks (MD031).**

Multiple code blocks are missing required blank lines. Additionally, the text on line 145 can be improved for clarity.

````diff
 This API is used to upload a file to the server. Example request:
+
 ```bash
 curl -X PUT https://models.exosphere.host/v0/files/mydata.jsonl \
   -H "Authorization: Bearer <your-api-key>" \
   -F file="@mydata.jsonl"
 ```
 Example response:
+
 ```json
 {
   "file_id": "ae0b977c-76a0-4d71-81a5-05a6d8844852",
   "file_name": "mydata.jsonl",
   "bytes": 1000,
   "mime_type": "application/jsonl"
 }
 ```
-The expected file content should look like:
+Expected file format (JSONL):
+
 ```jsonl
 {"key": "object-1", "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}], "generation_config": {"temperature": 0.7}, "model": "deepseek:r1-32b"}}
 {"key": "object-2", "request": {"contents": [{"parts": [{"text": "What are the main ingredients in a Margherita pizza?"}]}], "generation_config": {"temperature": 0.7}, "model": "openai:gpt-4o"}}
 ```
````

---

`151-162`: **Add blank line before code block (MD031).**

The bash code block needs a blank line before it.

````diff
 Now you can pass the file_id to the `/infer/` API to run inference on the file. Example request:
+
 ```bash
 curl -X POST https://models.exosphere.host/v0/infer/ \
   -H "Content-Type: application/json" \
   -H "Authorization: Bearer <your-api-key>" \
   -d '[
     {
       "file_id": "ae0b977c-76a0-4d71-81a5-05a6d8844852",
       "sla": 60
     }
   ]'
 ```
````

---

`164-186`: **Add missing blank lines around code blocks (MD031).**

Multiple code blocks in this section lack required blank lines before and after.

````diff
 You can further request outputs as a file by passing the header `Output-Format: jsonl` to the API. Example request:
+
 ```bash
 curl -X POST https://models.exosphere.host/v0/infer/ \
   -H "Content-Type: application/json" \
   -H "Authorization: Bearer <your-api-key>" \
   -H "Output-Format: jsonl" \
   -d '[
     {
       "file_id": "ae0b977c-76a0-4d71-81a5-05a6d8844852",
       "sla": 60
     }
   ]'
 ```
 Example response:
+
 ```json
 {
   "status": "completed",
   "task_id": "2f92fc35-07d6-4737-aefa-8ddffd32f3fc",
   "total_items": 2,
   "output_url": "https://files.exosphere.host/v0/files/ae0b977c-76a0-4d71-81a5-05a6d8844852.jsonl"
 }
 ```
````

---

`187-191`: **Fix markdown formatting and improve text clarity (MD031, MD047).**

Add blank line before the final code block, improve text clarity, and add trailing newline to the file.

````diff
-You can download the output file from the `output_url` and the content should look like:
+You can download the output file from the `output_url`. Expected output format (JSONL):
+
 ```jsonl
 {"key": "object-1", "output": {"type": "text", "text": "Photosynthesis is the process by which plants, algae, and some bacteria convert light energy into chemical energy."}}
 {"key": "object-2", "output": {"type": "text", "text": "The main ingredients in a Margherita pizza are tomato sauce, mozzarella cheese, and basil."}}
 ```
````
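To make the two JSONL shapes quoted in these suggestions concrete, here is a minimal sketch that writes request lines and reads output lines. The field names (`key`, `request.contents[].parts[].text`, `generation_config`, `model`, `output.text`) are taken from the examples in the documentation under review; the function names `build_batch_lines` and `parse_output_lines` are hypothetical, and the sketch assumes every output record carries a text output like the two shown.

```python
import json


def build_batch_lines(prompts: dict[str, str], model: str, temperature: float = 0.7) -> str:
    """Serialize one request per line, keyed so outputs can be matched back."""
    lines = []
    for key, prompt in prompts.items():
        record = {
            "key": key,
            "request": {
                "contents": [{"parts": [{"text": prompt}]}],
                "generation_config": {"temperature": temperature},
                "model": model,
            },
        }
        lines.append(json.dumps(record))
    # JSONL: one JSON object per line, with a trailing newline.
    return "\n".join(lines) + "\n"


def parse_output_lines(jsonl_text: str) -> dict[str, str]:
    """Map each record's key to its output text, skipping blank lines."""
    results: dict[str, str] = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        # Assumption: outputs look like {"type": "text", "text": "..."}.
        results[record["key"]] = record["output"]["text"]
    return results
```

The string returned by `build_batch_lines` is what would be saved as `mydata.jsonl` before the upload step, and `parse_output_lines` would be applied to the body downloaded from `output_url`. Neither function performs any network call.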
📥 Commits: Reviewing files that changed from the base of the PR and between bcf79246425a2911609b0d521d2c39236d1e1240 and 03c65d6d8f07c76bedebf876304858aab6f62a6f.
Exosphere inference APIs also support the standard batch inference API format used by OpenAI, Gemini, and other providers. You can upload a JSONL file containing multiple inference requests, similar to OpenAI's batch API format and pass the file to the `/infer/` API.
### `PUT /v0/files/`
Add blank line before heading (MD022).
There should be a blank line between the paragraph and the section heading.
> **Note**: Auto retry policy will be triggered for transient failures without any additional cost.
Exosphere inference APIs also support the standard batch inference API format used by OpenAI, Gemini, and other providers. You can upload a JSONL file containing multiple inference requests, similar to OpenAI's batch API format and pass the file to the `/infer/` API.
+
### `PUT /v0/files/`
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
128-128: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
🤖 Prompt for AI Agents
In docs/docs/inference.md around lines 125 to 128, there is no blank line
between the preceding paragraph and the "### `PUT /v0/files/`" heading which
violates MD022; insert a single blank line between the paragraph that ends with
"...pass the file to the `/infer/` API." and the "### `PUT /v0/files/`" heading
so the heading is separated by an empty line.
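The MD022 fix described in this prompt can also be applied programmatically. The lint output reports the missing blank line as "Below" while the prompt asks for one above the heading, so the sketch below covers both sides. It is an illustration under stated assumptions, not markdownlint's own fixer: the name `surround_headings_with_blank_lines` is hypothetical, only ATX headings (`#` to `######` followed by whitespace) are recognized, and lines inside three-backtick fences are left alone so that shell comments are not mistaken for headings.

```python
import re

FENCE = "`" * 3  # three backticks
HEADING = re.compile(r"#{1,6}\s")  # ATX heading marker at the start of a line


def surround_headings_with_blank_lines(text: str) -> str:
    """Ensure one blank line above and below every ATX heading outside code fences."""
    out: list[str] = []
    in_fence = False
    prev_was_heading = False
    for line in text.split("\n"):
        is_fence = line.lstrip().startswith(FENCE)
        is_heading = not in_fence and not is_fence and HEADING.match(line) is not None
        if is_heading and out and out[-1].strip():
            out.append("")  # blank line above the heading
        elif prev_was_heading and line.strip():
            out.append("")  # blank line below the previous heading
        if is_fence:
            in_fence = not in_fence
        out.append(line)
        prev_was_heading = is_heading
    return "\n".join(out)
```

A heading on the first line of the file gets no blank line above it, since there is nothing to separate it from.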