feat: docling document metering via docling polling to generate logs by ricofurtado · Pull Request #1620 · langflow-ai/openrag

ricofurtado · 2026-05-18T18:12:36Z

This pull request introduces Docling usage metering for billing and auditing: a detailed record is logged for every file submitted to Docling. Metering is optional, controlled by configuration, and adds minimal overhead when disabled. Records are appended to a JSONL log without interleaving, and the metering hooks plug into the existing ingestion flow. The polling logic is also extended to count poll attempts and to report success only when the Docling result can actually be fetched.

Key changes include:

Docling Usage Metering (Billing/Auditing):

Added a new DoclingMeteringService (src/services/docling_metering_service.py) that writes a JSONL record for each file conversion attempt, capturing timing, outcome, file details, user, and deployment mode. Records are appended under an asyncio lock so that concurrent writes within the process do not interleave.
Integrated DoclingMeteringService into the service container and dependency injection system, so it is available throughout the application when enabled.
Added configuration options in settings.py to control metering enablement, log path, and deployment mode.

Ingestion Flow Integration:

Updated LangflowFileService to record metering events at all major ingestion outcomes (success, Docling polling failure, or Langflow error) using a new _record_meter helper method. Metering is fire-and-forget and does not block or fail the main ingestion path.
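The outcome-recording pattern around the Langflow step can be sketched as a small wrapper. `ingest_with_metering` and its `record` callback are hypothetical; in the PR this logic lives inline in `LangflowFileService.upload_and_ingest_file` via `_record_meter`. Like the PR as submitted, this sketch awaits the recorder inline, and it treats a missing recorder as "metering disabled".

```python
from __future__ import annotations

import logging
import time
from collections.abc import Awaitable, Callable
from typing import Any

logger = logging.getLogger(__name__)


async def ingest_with_metering(
    ingest: Callable[[], Awaitable[Any]],
    record: Callable[..., Awaitable[None]] | None,
    *,
    task_id: str,
    poll_count: int,
) -> Any:
    """Run the Langflow step and meter its outcome (hypothetical helper)."""
    start = time.monotonic()

    async def meter(outcome: str, detail: str | None = None) -> None:
        if record is None:  # metering disabled: behave like a no-op
            return
        try:
            await record(
                task_id=task_id,
                outcome=outcome,
                failure_detail=detail,
                poll_count=poll_count,
                elapsed_seconds=time.monotonic() - start,
            )
        except Exception:
            # Metering must never fail ingestion.
            logger.exception("metering failed for task %s", task_id)

    try:
        result = await ingest()
    except Exception as exc:
        await meter("langflow_failed", str(exc))
        raise  # callers still see the original Langflow error
    await meter("success")
    return result
```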

Polling and Status Tracking Improvements:

Enhanced DoclingPollingService to track and return the number of poll attempts (poll_count) and to consider a Docling task successful only if its result is actually available and fetchable.
Updated all call sites to propagate and log poll_count for accurate metering.

Other Minor Changes:

Added missing imports and type hints to support the new features.

These changes provide a solid foundation for future billing and auditing features while maintaining performance and reliability.

Summary by CodeRabbit

New Features
- Added optional usage metering for Docling file conversions, tracking submission time, processing duration, file metadata, conversion outcomes, and polling statistics. Metering can be enabled via configuration.
Tests
- Added comprehensive unit tests for metering service functionality and integration with file ingestion workflow.

coderabbitai · 2026-05-18T18:12:52Z

Walkthrough

This PR introduces a complete JSONL-based metering system for Docling file submissions: configurable settings and service wiring enable optional recording; a new DoclingMeteringService appends metrics to a log file; polling now tracks iteration count and validates result availability; and LangflowFileService records outcomes (success, Docling failure, Langflow failure) with elapsed time and poll statistics. Tests verify JSONL serialization, concurrent-write safety, and metering integration across success and error paths.

Changes

Metering and Result Validation

| Layer | File(s) | Summary |
| --- | --- | --- |
| Configuration and Dependency Wiring | `src/config/settings.py`, `src/app/container.py`, `src/dependencies.py` | Environment-driven settings `ENABLE_DOCLING_METERING`, `DOCLING_METERING_LOG_PATH`, and `DOCLING_DEPLOYMENT_MODE` are introduced; the service container conditionally constructs `DoclingMeteringService` and wires it into `LangflowFileService`; the FastAPI dependency provider exposes the optional metering service. |
| Metering Service Implementation | `src/services/docling_metering_service.py` | `DoclingMeterRecord` dataclass defines the JSONL schema with submission/terminal timestamps, elapsed time, outcome/failure detail, poll count, and deployment mode. `DoclingMeteringService` atomically appends records to a JSONL log under an `asyncio.Lock`, creates parent directories on demand, and gracefully ignores I/O errors. |
| Polling Service Poll Count and Result Validation | `src/services/docling_polling_service.py`, `src/services/docling_service.py` | `DoclingPollResult` gains a `poll_count` field. The polling loop now increments the poll count each iteration and includes it in all terminal outcomes. On Docling `SUCCESS`, the service fetches the result and returns `FAILED` with error detail if the fetch fails (missing `document.json_content`). |
| Langflow File Service Metering Integration | `src/services/langflow_file_service.py` | The constructor now accepts a `metering_service` dependency. `upload_and_ingest_file` extracts file metadata, captures the submission timestamp, and tracks poll count. Metering is recorded on three outcomes: polling failure (with `outcome`/`failure_detail`/`poll_count`), Langflow failure (with `outcome="langflow_failed"`), and success (with `outcome="success"`). The private `_record_meter` helper no-ops when metering is disabled. |
| Unit and Integration Test Coverage | `tests/unit/test_docling_metering_service.py`, `tests/unit/test_docling_polling_service.py` | `DoclingMeteringService` tests verify JSONL line output, multi-record appending, directory creation, and exception swallowing. Integration tests mock the polling and ingestion flows and assert metering is triggered with the correct outcome/failure detail/poll count for success, polling failure, and Langflow failure paths. Tests also verify that the null-service and legacy (no-polling) paths record appropriately. Polling tests stub and assert `fetch_task_result` behavior. |
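The settings and container wiring in the first row can be sketched as follows. Only the three environment variable names come from this PR; the truthy-string parsing, the placeholder fallback path, the `unknown` deployment-mode default, and the `build_metering_service` factory hook are assumptions for illustration.

```python
import os
from collections.abc import Callable
from dataclasses import dataclass
from pathlib import Path
from typing import Any

_TRUTHY = {"1", "true", "yes", "on"}  # assumed accepted spellings


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean flag from the environment."""
    raw = os.getenv(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


@dataclass(frozen=True)
class MeteringSettings:
    enabled: bool
    log_path: Path
    deployment_mode: str

    @classmethod
    def from_env(cls) -> "MeteringSettings":
        return cls(
            enabled=env_flag("ENABLE_DOCLING_METERING"),
            # Placeholder default; the PR derives it from get_data_file(...).
            log_path=Path(os.getenv("DOCLING_METERING_LOG_PATH", "data/docling_tasks_logs.jsonl")),
            deployment_mode=os.getenv("DOCLING_DEPLOYMENT_MODE", "unknown"),
        )


def build_metering_service(
    settings: MeteringSettings, factory: Callable[[Path, str], Any]
) -> Any:
    # Construct only when enabled; consumers receive None and skip metering otherwise.
    if not settings.enabled:
        return None
    return factory(settings.log_path, settings.deployment_mode)
```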

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

langflow-ai/openrag#1572: Both PRs touch the Docling two-phase ingestion pipeline—especially LangflowFileService and the Docling polling flow—adding complementary wiring (docling_polling_service vs docling_metering_service) and shared outcome/polling metadata handling.

Suggested labels

enhancement

Suggested reviewers

lucaseduoli

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 26.92%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding Docling metering functionality integrated with docling polling to generate logs. |

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (2)

tests/unit/test_docling_metering_service.py (1)

151-243: ⚡ Quick win

Add a regression test for Docling submission failure metering.

Current tests validate post-submission outcomes, but not the submit_to_docling exception path. Once fixed in service code, a dedicated test should assert outcome="submit_failed" is recorded.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_docling_metering_service.py` around lines 151 - 243, Add a
regression test that injects a failing submit_to_docling path and asserts
metering records outcome="submit_failed"; specifically, in the tests for
LangflowFileService.upload_and_ingest_file, patch or set
langflow_service.submit_to_docling (or svc.submit_to_docling) to raise an
exception, call upload_and_ingest_file with mock_polling_service (or None as
needed), catch the exception if the service propagates it, then inspect
mock_metering_service.build_record.call_args.kwargs to assert
build_kwargs["outcome"] == "submit_failed" and include any expected
failure_detail; mirror naming and setup used in existing tests (e.g.,
mock_metering_service, file_tuple, file_task) so the new test follows the
test_metering_* pattern.

tests/unit/test_docling_polling_service.py (1)

43-53: ⚡ Quick win

Assert poll_count in these updated tests to lock the metering contract.

These tests now cover the fetch-on-success behavior, but they still don’t verify result.poll_count, which is part of the new polling/metering contract. Please assert expected values (e.g., immediate terminal paths should be 1, processing sequence should match number of status checks).

Suggested assertions

 async def test_returns_success_immediately_when_already_done(polling_service, mock_docling_service):
@@
     assert result.outcome == PollOutcome.SUCCESS
+    assert result.poll_count == 1
     assert mock_docling_service.check_task_status.call_count == 1
     mock_docling_service.fetch_task_result.assert_awaited_once_with("t1")
@@
 async def test_loops_through_processing_then_success(
@@
     assert result.outcome == PollOutcome.SUCCESS
+    assert result.poll_count == 4
     assert mock_docling_service.check_task_status.call_count == 4
@@
 async def test_success_status_requires_fetchable_result(polling_service, mock_docling_service):
@@
     assert result.outcome == PollOutcome.FAILED
+    assert result.poll_count == 1
     assert "missing document.json_content" in (result.detail or "")

Also applies to: 57-75, 78-91

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_docling_polling_service.py` around lines 43 - 53, The tests
must assert the new polling/metering field result.poll_count to lock the
contract: update test_returns_success_immediately_when_already_done (and the
other tests at lines 57-75 and 78-91) to include an assertion that
result.poll_count == 1 for the immediate-success path; for tests modeling
repeated status checks assert result.poll_count equals the number of
check_task_status calls to match polling_service.poll_until_ready behavior and
PollOutcome constants (e.g., PollOutcome.SUCCESS). Use the existing mocks
(mock_docling_service.check_task_status) and the returned result from
polling_service.poll_until_ready to derive and assert the expected poll_count.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d4d6299f-5815-4978-b378-94af391ddef5

📥 Commits

Reviewing files that changed from the base of the PR and between 2869a11 and 17da376.

📒 Files selected for processing (9)

src/app/container.py
src/config/settings.py
src/dependencies.py
src/services/docling_metering_service.py
src/services/docling_polling_service.py
src/services/docling_service.py
src/services/langflow_file_service.py
tests/unit/test_docling_metering_service.py
tests/unit/test_docling_polling_service.py

coderabbitai · 2026-05-18T18:18:36Z

+from config.paths import get_data_file as _get_data_file
+
+DOCLING_METERING_LOG_PATH = os.getenv(
+    "DOCLING_METERING_LOG_PATH", _get_data_file("docling_tasks_logs.jsonl")
+)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Move the get_data_file import to the module import section.

Line 211 introduces a module-level import after executable statements, which triggers Ruff E402 and fails CI.

💡 Proposed fix

-from config.paths import get_flows_path
+from config.paths import get_data_file, get_flows_path
 ...
-from config.paths import get_data_file as _get_data_file
-
 DOCLING_METERING_LOG_PATH = os.getenv(
-    "DOCLING_METERING_LOG_PATH", _get_data_file("docling_tasks_logs.jsonl")
+    "DOCLING_METERING_LOG_PATH", get_data_file("docling_tasks_logs.jsonl")
 )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/config/settings.py` around lines 211 - 215, The import of get_data_file is currently done inline as "from config.paths import get_data_file as _get_data_file" after executable code which triggers Ruff E402; move that import into the top module import section with the other imports so it is a normal module-level import, then update the usage that sets DOCLING_METERING_LOG_PATH to call _get_data_file (or rename to get_data_file) as before; ensure the symbol referenced is config.paths.get_data_file (or _get_data_file) so the assignment to DOCLING_METERING_LOG_PATH remains unchanged and CI no longer fails.

coderabbitai · 2026-05-18T18:18:36Z

+from datetime import datetime, timezone
+from typing import Optional
+


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
ruff check src/services/docling_metering_service.py

Repository: langflow-ai/openrag

Length of output: 2871

Resolve the Ruff UP045/UP017 failures in this module.

This file currently fails lint on Optional[...] and timezone.utc usage, blocking the pipeline. The fixes are mechanical and low-risk.

Update the imports and type annotations:

Change `from datetime import datetime, timezone` to `from datetime import UTC, datetime`

Remove `from typing import Optional` (use `X | None` syntax instead)

Replace all `Optional[str]` with `str | None` (lines 39, 44, 72, 77)

Replace `timezone.utc` with `UTC` (line 50)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/docling_metering_service.py` around lines 23 - 25, Update the imports and annotations to fix Ruff UP045/UP017: replace "from datetime import datetime, timezone" with "from datetime import UTC, datetime", remove the "from typing import Optional" import, change all occurrences of "Optional[str]" to the union form "str | None" (in the functions/variables around the symbols that currently reference Optional at lines where variables/params are declared, including the annotations near the functions/variables referenced by names in this module), and replace any use of "timezone.utc" with "UTC" (notably where datetime.now(...) is called). Ensure you update the four specific Optional[str] occurrences and the single timezone.utc usage to the new forms.

coderabbitai · 2026-05-18T18:18:36Z

        task_id = await self.submit_to_docling(filename, content, owner=owner, jwt_token=jwt_token)
+        submitted_at = datetime.now(timezone.utc).isoformat()
+        _submit_wall = time.monotonic()



⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Record a metering event when Docling submission itself fails.

If submit_to_docling(...) raises, no meter record is emitted, so failed conversion attempts are currently missing from billing/audit logs.

💡 Proposed fix

-        task_id = await self.submit_to_docling(filename, content, owner=owner, jwt_token=jwt_token)
-        submitted_at = datetime.now(timezone.utc).isoformat()
-        _submit_wall = time.monotonic()
+        submitted_at = datetime.now(timezone.utc).isoformat()
+        _submit_wall = time.monotonic()
+        try:
+            task_id = await self.submit_to_docling(
+                filename, content, owner=owner, jwt_token=jwt_token
+            )
+        except Exception as e:
+            await self._record_meter(
+                task_id=f"submit_failed:{filename}:{int(time.time()*1000)}",
+                filename=filename,
+                size_bytes=size_bytes,
+                mimetype=mimetype,
+                owner=owner,
+                submitted_at=submitted_at,
+                elapsed_seconds=time.monotonic() - _submit_wall,
+                outcome="submit_failed",
+                failure_detail=str(e),
+                poll_count=0,
+            )
+            raise

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/langflow_file_service.py` around lines 603 - 606, Wrap the call to submit_to_docling(...) in a try/except so that if submit_to_docling(filename, content, owner=owner, jwt_token=jwt_token) raises, you still emit a metering event indicating a failed submission (include filename/owner/timestamp/error) via the service's existing metering API (e.g., self.meter.record or self.record_meter_event) and then re-raise the exception; keep successful path behavior (setting submitted_at and _submit_wall) unchanged when no error occurs.

coderabbitai · 2026-05-18T18:18:36Z

+                await self._record_meter(
+                    task_id=task_id,
+                    filename=filename,
+                    size_bytes=size_bytes,
+                    mimetype=mimetype,
+                    owner=owner,
+                    submitted_at=submitted_at,
+                    elapsed_seconds=time.monotonic() - _submit_wall,
+                    outcome=poll_result.outcome.value,
+                    failure_detail=poll_result.detail,
+                    poll_count=poll_count,
+                )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The metering path is currently blocking ingestion despite “fire-and-forget” intent.

await self._record_meter(...) on all terminal paths adds write latency to user-facing ingestion flow.

💡 Proposed fix

+    def _record_meter_background(self, **kwargs) -> None:
+        if self.metering_service is None:
+            return
+        task = asyncio.create_task(self._record_meter(**kwargs))
+        task.add_done_callback(lambda t: t.exception())  # force retrieval; _record_meter handles/logs
-                await self._record_meter(
+                self._record_meter_background(
                     task_id=task_id,
                     filename=filename,
                     size_bytes=size_bytes,
                     mimetype=mimetype,
                     owner=owner,
                     submitted_at=submitted_at,
                     elapsed_seconds=time.monotonic() - _submit_wall,
                     outcome=poll_result.outcome.value,
                     failure_detail=poll_result.detail,
                     poll_count=poll_count,
                 )
 ...
-                await self._record_meter(
+                self._record_meter_background(
                     task_id=task_id,
                     filename=filename,
                     size_bytes=size_bytes,
                     mimetype=mimetype,
                     owner=owner,
                     submitted_at=submitted_at,
                     elapsed_seconds=time.monotonic() - _submit_wall,
                     outcome="langflow_failed",
                     failure_detail=str(e),
                     poll_count=poll_count,
                 )
 ...
-                await self._record_meter(
+                self._record_meter_background(
                     task_id=task_id,
                     filename=filename,
                     size_bytes=size_bytes,
                     mimetype=mimetype,
                     owner=owner,
                     submitted_at=submitted_at,
                     elapsed_seconds=time.monotonic() - _submit_wall,
                     outcome="success",
                     poll_count=poll_count,
                 )

Also applies to: 709-720, 733-743, 766-767

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/langflow_file_service.py` around lines 649 - 660, The metering call self._record_meter(...) is awaited and blocks ingestion; change it to fire-and-forget by scheduling it as a background task (e.g., asyncio.create_task or the service's event loop) instead of awaiting it wherever used (including the occurrences around poll_result handling and the other noted blocks), and ensure the task is protected against exceptions (either by adding try/except inside _record_meter or attaching a done-callback to log exceptions) so failures won't be unhandled or crash the process.
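A sketch of a fire-and-forget helper that also addresses two details of the proposed fix above. The asyncio documentation for `create_task` warns that the event loop keeps only weak references to tasks, so a task that is not stored elsewhere can be garbage-collected before it finishes; and `Task.exception()` raises `CancelledError` when called on a cancelled task. `BackgroundMeter`, `fire`, and `drain` are hypothetical names, not part of this PR.

```python
import asyncio
import logging
from collections.abc import Awaitable, Callable

logger = logging.getLogger(__name__)


class BackgroundMeter:
    """Schedule metering writes without awaiting them on the ingestion path."""

    def __init__(self, record: Callable[..., Awaitable[None]]) -> None:
        self._record = record
        # Strong references: the event loop itself only holds weak ones.
        self._tasks: set[asyncio.Task[None]] = set()

    def fire(self, **kwargs) -> None:
        """Must be called from inside a running event loop."""
        task = asyncio.create_task(self._record(**kwargs))
        self._tasks.add(task)
        task.add_done_callback(self._on_done)

    def _on_done(self, task: asyncio.Task[None]) -> None:
        self._tasks.discard(task)
        if task.cancelled():  # Task.exception() would raise CancelledError here
            return
        exc = task.exception()
        if exc is not None:
            logger.error("metering write failed", exc_info=exc)

    async def drain(self) -> None:
        """Wait for pending writes, e.g. from a shutdown hook."""
        if self._tasks:
            await asyncio.gather(*self._tasks, return_exceptions=True)
```

Calling `drain()` from the application's shutdown path would flush pending records before the process exits; whether the service container exposes such a hook is not shown in this PR.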

ricofurtado added 2 commits May 18, 2026 13:15

feat: implement Docling usage metering service and integrate with LangflowFileService (af741f9)

fix: reorder parameters in submit_to_docling method and remove unused fixture in tests (08ef0dc)

github-actions bot added the backend (🔷 Issues related to backend services: OpenSearch, Langflow, APIs) and tests labels May 18, 2026

style: ruff format (auto) (17da376)

coderabbitai Bot reviewed May 18, 2026

ricofurtado changed the title ~~Docling metering via docling polling to generate logs~~ feat:docling metering via docling polling to generate logs May 19, 2026

github-actions bot added the enhancement (🔵 New feature or request) label May 19, 2026

ricofurtado changed the title ~~feat:docling metering via docling polling to generate logs~~ feat: docling document metering via docling polling to generate logs May 19, 2026

github-actions bot added and then removed the enhancement (🔵 New feature or request) label May 19, 2026

ricofurtado wants to merge 3 commits into main from docling-metering-via-docling-polling-to-generate-logs