feat: add file cost estimation command and public API #44

Merged
grumbach merged 5 commits into main from feat/file-cost-estimate
Apr 22, 2026

Conversation

@grumbach grumbach commented Apr 17, 2026

Summary

Adds ant file cost <path> to estimate upload cost without uploading. Encrypts the file locally to determine chunk count, samples one or more network quotes, and extrapolates the total. No wallet required.

New public API: Client::estimate_upload_cost(path, mode, progress) returns UploadCostEstimate.

Example output

Estimated upload cost for document.pdf
  Size:    50.0 MB
  Chunks:  13
  Cost:    0.0025 ANT (gas: 0.000150 ETH)

JSON (--json):

{"file_size":52428800,"chunk_count":13,"storage_cost_atto":"49556250000000000","estimated_gas_cost_wei":"150000000000000","payment_mode":"single"}

13 chunks fit in one 64-chunk wave; gas = 1 wave × GAS_PER_WAVE_TX (1.5M gas) × ARBITRUM_GAS_PRICE_WEI (0.1 gwei) = 1.5 × 10^14 wei = 0.000150 ETH. Human output and JSON agree on the same wei value.
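The wave arithmetic above can be sketched as follows. This is a minimal illustration, not the ant-core source: the constant values are copied from this description, while the function names `single_mode_gas_wei` and `format_eth` are hypothetical.

```rust
// Constant values are taken from the PR description; the function names
// below are hypothetical and exist only for this sketch.
const UPLOAD_WAVE_SIZE: usize = 64;
const GAS_PER_WAVE_TX: u128 = 1_500_000;
const ARBITRUM_GAS_PRICE_WEI: u128 = 100_000_000; // 0.1 gwei
const WEI_PER_ETH: u128 = 1_000_000_000_000_000_000;

/// Advisory single-mode gas estimate: one payment transaction per wave.
fn single_mode_gas_wei(chunk_count: usize) -> u128 {
    // Round up: a partially filled wave still costs one transaction.
    let waves = chunk_count.div_ceil(UPLOAD_WAVE_SIZE);
    (waves as u128) * GAS_PER_WAVE_TX * ARBITRUM_GAS_PRICE_WEI
}

/// Render a wei amount as ETH with six decimal places (truncating).
fn format_eth(wei: u128) -> String {
    let whole = wei / WEI_PER_ETH;
    // 1e12 wei per micro-ETH, so this keeps exactly six decimals.
    let micro = (wei % WEI_PER_ETH) / 1_000_000_000_000;
    format!("{whole}.{micro:06}")
}
```

Under these assumptions a 13-chunk file and a 64-chunk file produce the same gas figure, and a 65-chunk file doubles it, because the heuristic is a step function of the wave count.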

Accuracy

Tested against actual uploads on a local devnet. Storage cost estimates match actual costs exactly:

4 KB   | Est: 49556250000000000 | Act: 49556250000000000 | ratio: 1.00
100 KB | Est: 49601285156250000 | Act: 49601285156250000 | ratio: 1.00
1 MB   | Est: 49623829101562500 | Act: 49623829101562500 | ratio: 1.00
10 MB  | Est: 49646390625000000 | Act: 49646390625000000 | ratio: 1.00
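For anyone reproducing the comparison above, an estimate-versus-actual check can be done in integer arithmetic so that atto-scale values never pass through f64. This is a sketch only; `within_tolerance` is a hypothetical helper, not the assertion used in the E2E test.

```rust
/// Returns true when `estimate` is within `tolerance_pct` percent of `actual`.
/// Hypothetical helper; integer-only to avoid floating-point rounding on
/// atto-scale amounts.
fn within_tolerance(estimate: u128, actual: u128, tolerance_pct: u128) -> bool {
    let diff = estimate.abs_diff(actual);
    // diff / actual <= pct / 100  is equivalent to  diff * 100 <= actual * pct
    diff.saturating_mul(100) <= actual.saturating_mul(tolerance_pct)
}
```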

Gas is an advisory heuristic based on per-transaction budgets (see constants below), not a live gas-oracle query. Treat it as an order-of-magnitude figure.

Changes

ant-core:

  • Client::estimate_upload_cost(path, mode, progress) with encryption progress support
  • UploadCostEstimate struct (Serialize/Deserialize, JSON-safe String amounts)
  • PaymentMode gains Serialize/Deserialize + #[serde(rename_all = "snake_case")]
  • Storage cost uses median price × 3 matching SingleNodePayment::from_quotes
  • Gas heuristic uses three named constants:
    • GAS_PER_WAVE_TX = 1_500_000 (one pay_for_quotes call on Arbitrum, up to 64 entries)
    • GAS_PER_MERKLE_TX = 500_000 (one merkle sub-batch tx)
    • ARBITRUM_GAS_PRICE_WEI = 100_000_000 (0.1 gwei baseline)
  • AlreadyStored retry: samples up to ESTIMATE_SAMPLE_CAP = 5 chunk addresses before deciding. Returns the new typed Error::CostEstimationInconclusive when every sample reports stored; only returns a zero-cost estimate when every address in the file is sampled and all are stored.
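As a rough illustration of the "median price × 3" bullet above, the extrapolation might look like the sketch below. It assumes the total is median × 3 × chunk count and that the upper median is taken for an even number of quotes; neither detail is confirmed here, and both function names are hypothetical.

```rust
/// Median of the quoted prices in atto. Assumption: the upper median is
/// used when the number of quotes is even.
fn median_price(mut prices: Vec<u128>) -> Option<u128> {
    if prices.is_empty() {
        return None;
    }
    prices.sort_unstable();
    Some(prices[prices.len() / 2])
}

/// Assumed extrapolation: (median × 3) per chunk, times the chunk count.
/// Returns None when there are no quotes or the product overflows u128.
fn extrapolate_storage_cost(prices: Vec<u128>, chunk_count: u64) -> Option<u128> {
    let per_chunk = median_price(prices)?.checked_mul(3)?;
    per_chunk.checked_mul(u128::from(chunk_count))
}
```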

ant-cli:

  • ant file cost <path> [--merkle | --no-merkle] [--json]
  • Progress bar wired through drive_upload_progress so the encryption phase shows chunk progress, not a static spinner
  • CostEstimationInconclusive surfaces with a retry-suggestion message

Tests:

  • 4 E2E tests: accuracy vs actual upload, no-wallet, payment mode selection, tiny file rejection
  • Unit test for the new CostEstimationInconclusive error variant
  • Added to CI workflow

Design decisions

  • AlreadyStored retry, not silent zero: A single chunk being already stored (DataMap-adjacent chunks often collide with prior uploads) tells us nothing about the other 99% of the file. We now probe up to 5 distinct chunk addresses and only claim "fully stored" when every address in the file has been confirmed stored.
  • Gas as String: Prevents JSON integer overflow (u128 exceeds JS safe integer range). Matches upload output pattern.
  • Heuristic gas with named constants: Documented as advisory. Actual gas varies by network conditions; a live gas-oracle query can be added later without changing the API.
  • Progress support: The progress parameter accepts an UploadEvent sender for encryption progress on large files.
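To make the "Gas as String" point concrete: the storage cost from the example JSON already exceeds JavaScript's largest safe integer (2^53 − 1), so emitting it as a bare JSON number would lose precision in JS consumers. The sketch below hand-rolls the JSON purely for illustration; the PR itself derives Serialize, and `exceeds_js_safe_integer` and `amounts_to_json` are hypothetical names.

```rust
/// Largest integer a JavaScript Number can represent exactly (2^53 - 1).
const JS_MAX_SAFE_INTEGER: u128 = 9_007_199_254_740_991;

/// True when a JS consumer parsing this as a Number would lose precision.
fn exceeds_js_safe_integer(value: u128) -> bool {
    value > JS_MAX_SAFE_INTEGER
}

/// Illustration only: emit both amounts as JSON strings, not numbers.
/// The real struct uses serde; this hand-written formatter is an assumption.
fn amounts_to_json(storage_cost_atto: u128, gas_wei: u128) -> String {
    format!(
        "{{\"storage_cost_atto\":\"{storage_cost_atto}\",\"estimated_gas_cost_wei\":\"{gas_wei}\"}}"
    )
}
```

With the example values, the atto amount is past the safe range while the wei amount is not; keeping both as strings means consumers do not have to special-case either field.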

Test plan

  • cargo clippy --all-targets --all-features -- -D warnings
  • cargo fmt --all -- --check
  • cargo test -p ant-core --lib passes
  • E2E test test_estimate_matches_actual_cost (4 file sizes, 15% tolerance)
  • E2E test test_estimate_works_without_wallet
  • E2E test test_estimate_payment_mode
  • E2E test test_estimate_rejects_tiny_files
  • Existing E2E tests unaffected
  • Adversarial code review + UX review, all findings addressed

Add `ant file cost <path>` to estimate upload cost without uploading.
Encrypts the file locally to determine chunk count, requests a single
quote from the network for a representative chunk, and extrapolates
the total storage cost. No wallet required.

New public API: `Client::estimate_upload_cost(path, mode)` returns
`UploadCostEstimate` with file size, chunk count, storage cost in
atto, estimated gas in wei, and payment mode.

Gas estimation uses a conservative heuristic based on chunk count
and payment mode (merkle vs single). Storage cost is the median
quoted price multiplied by chunk count.

Supports --json for structured output and --merkle/--no-merkle to
override payment mode selection.
@Nic-dorman

Overall

Solid feature. API is clean (option-style args, serializable result, no-wallet path, progress hook), the cost extrapolation correctly mirrors SingleNodePayment::from_quotes (median × 3), tests are comprehensive with actual-vs-estimate validation, and CI is green. The design decisions in the description read as thoughtful.

Issues

P1 — AlreadyStored path returns a misleading "free" estimate

ant-core/src/data/client/file.rs, the Err(Error::AlreadyStored) => { ... storage_cost_atto: "0" ... } branch:

get_store_quotes returns AlreadyStored only when a majority of the first chunk's close group confirms storage. The PR description explicitly says "Checking one chunk and claiming the whole file is stored is unreliable … Removed after adversarial review" — but this branch does exactly that. The CLI's format_cost will then print Cost: free (already stored) for a file where 99% of chunks still need to be paid for.

Two cleaner options:

  • Return a typed error ("could not obtain representative quote; first chunk already stored") and let the user rerun, or
  • Retry with the next address in spill.addresses if the first sample hits AlreadyStored, and only surface "free" if you've sampled a few and they all report stored.

P1 — Single-mode gas heuristic is likely off by a large factor

let waves = chunk_count.div_ceil(UPLOAD_WAVE_SIZE);
(waves as u128) * 150_000 * 1_000_000_000

batch_pay flattens all median payments in a wave into one wallet.pay_for_quotes(...) call containing up to 64 (quote_hash, rewards_address, amount) entries (batch.rs:236-253). A multi-payment tx of that shape on Arbitrum is well over 150k gas — more like 500k–1M+. The merkle case is closer (one tx per sub-batch is fine at ~150k), but single-mode gas will systematically under-estimate.

Either bump the per-wave constant, measure it once against an actual upload, or clearly label single-mode gas as "minimum possible" rather than "estimated."

P2 — polish

  • Redundant hash in estimate_upload_cost: first_addr is already the content address, but the code does let first_address = compute_address(&first_chunk); and passes &first_address to get_store_quotes. Just use first_addr.
  • chunk_count as u64 — the rest of the file uses u64::try_from(...). Minor style inconsistency, but follow the pattern.
  • CLI drops the progress hook: handle_file_cost passes None, so multi-GB files show only a static "Encrypting file…" spinner while several minutes of encryption happen. The plumbing for UploadEvent exists; the file upload command wires it through drive_upload_progress. Worth reusing, since encryption is the slow part of the estimate.
  • Example output mismatch in PR description: JSON shows estimated_gas_cost_wei: "150000000000000" (0.00015 ETH) but the human output shows gas: 0.0050 ETH. The two don't correspond — probably a stale paste, but worth fixing so reviewers aren't confused about what the heuristic produces.

Nits

  • UploadCostEstimate.estimated_gas_cost_wei: String is a nice JS-safe choice and consistent with storage_cost_atto. The pre-existing FileUploadResult.gas_cost_wei: u128 is asymmetric but that's not this PR's problem.

Nothing security-sensitive; no wallet is touched, estimate is advisory, spill cleanup runs on Drop. The P1 items are the ones I'd want fixed before merge; the rest can go in a follow-up.

P1 fixes from Nic's review on #44:

- Drop the AlreadyStored -> "free" best-effort branch. A majority confirming
  the first chunk is stored says nothing about the other 99% of chunks, so
  returning a zero-cost estimate was misleading. Now surfaces a typed
  InvalidData error so callers can retry instead of trusting the bogus cost.
- Rework single-mode gas heuristic. batch_pay flattens every chunk's close
  group quotes into one pay_for_quotes call, so gas scales with the number
  of quote entries in the wave (chunks x recipients/chunk), not with the
  number of waves. 150k/wave was off by 5-10x on full waves; replace with
  75k base + 25k per entry, summed across waves. Bump merkle budget to
  500k/sub-batch to reflect tree verification + pool commitment.

P2 polish:

- Drop the redundant compute_address re-hash on the first chunk; the spill
  address is already the content address.
- Replace `chunk_count as u64` with a checked conversion to match the rest
  of the file.
- Wire the progress hook through handle_file_cost so large-file encryption
  emits Encrypting / Encrypted events instead of a static spinner, reusing
  drive_upload_progress.

All 4 e2e_cost_estimate tests still pass on a local devnet.

P1 — AlreadyStored branch now samples up to ESTIMATE_SAMPLE_CAP chunk
addresses instead of trusting a single probe. Only returns zero-cost when
every address in the file is confirmed stored; otherwise returns a new
typed Error::CostEstimationInconclusive so callers can handle it cleanly.
The CLI renders this case with a helpful retry-suggestion message.

P1 — Replace the per-wave gas heuristic with named constants:
  GAS_PER_WAVE_TX        = 1_500_000 gas  (Arbitrum pay_for_quotes with 64 entries)
  GAS_PER_MERKLE_TX      =   500_000 gas
  ARBITRUM_GAS_PRICE_WEI = 100_000_000   (0.1 gwei baseline)
Each constant carries a comment explaining where the number comes from and
that it is advisory, not a live oracle query.

No change to the chunk-count conversion or progress plumbing (already done
in the previous review commit).
@grumbach

Hey @Nic-dorman — pushed a follow-up. Mapping to the review items:

P1 — AlreadyStored "free" estimate → fixed. Instead of trusting a single probe, estimate_upload_cost now samples up to ESTIMATE_SAMPLE_CAP = 5 distinct chunk addresses. If every sample reports AlreadyStored AND we happen to have sampled every address in the file (small files only), we return the zero-cost estimate — that case is accurate. Otherwise we surface a new typed error Error::CostEstimationInconclusive(String), and the CLI renders it with a retry hint rather than lying. Commit: fbb2a84.
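The decision logic described above could be sketched roughly like this. It is a simplified model rather than the code in fbb2a84: `Probe`, `Outcome`, and `sample_for_quote` are hypothetical names, the network call is replaced by a closure, and the address list is assumed to contain distinct entries.

```rust
const ESTIMATE_SAMPLE_CAP: usize = 5;

/// Result of asking the network about one chunk address (hypothetical type).
#[derive(Debug, PartialEq)]
enum Probe {
    Quoted(u128),
    AlreadyStored,
}

/// What the estimator can conclude after sampling (hypothetical type).
#[derive(Debug, PartialEq)]
enum Outcome {
    Priced(u128),
    FullyStored,
    Inconclusive,
}

/// Probe up to ESTIMATE_SAMPLE_CAP addresses and stop at the first quote.
fn sample_for_quote<F>(addresses: &[[u8; 32]], mut probe: F) -> Outcome
where
    F: FnMut(&[u8; 32]) -> Probe,
{
    if addresses.is_empty() {
        // Nothing to sample, so nothing can be claimed either way.
        return Outcome::Inconclusive;
    }
    let mut sampled = 0;
    for addr in addresses.iter().take(ESTIMATE_SAMPLE_CAP) {
        sampled += 1;
        if let Probe::Quoted(price) = probe(addr) {
            return Outcome::Priced(price);
        }
    }
    // Every sample said "stored". That only proves the whole file is stored
    // when the samples covered every address in the file.
    if sampled == addresses.len() {
        Outcome::FullyStored
    } else {
        Outcome::Inconclusive
    }
}
```

In this model a file with five or fewer chunks can legitimately come back as fully stored, while a larger file whose first five samples are all stored maps to the inconclusive case.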

P1 — Single-mode gas heuristic → replaced the 150k/wave magic number with three named constants, each with a doc comment explaining where the number comes from:

  • GAS_PER_WAVE_TX = 1_500_000 — conservative upper bound for one pay_for_quotes on Arbitrum with up to 64 entries (21k base + ~64 × 20k for the SSTOREs).
  • GAS_PER_MERKLE_TX = 500_000 — one merkle sub-batch tx (tree verification + pool commitment).
  • ARBITRUM_GAS_PRICE_WEI = 100_000_000 — 0.1 gwei, a typical Arbitrum baseline.

Gas is now waves × GAS_PER_WAVE_TX × ARBITRUM_GAS_PRICE_WEI (and the merkle-analogous product). Doc comment calls this out as advisory, not a live oracle query. Same commit.

P2 — Redundant compute_address → already removed in the earlier review commit (af1e894); first_addr is passed directly to get_store_quotes. The new retry loop uses spill.addresses.iter() directly.

P2 — chunk_count as u64 → earlier commit switched to u64::try_from(...).map_err(...)?; this commit relaxes it to u64::try_from(chunk_count).unwrap_or(u64::MAX) on the storage path since chunk_count came from a Vec::len() (can't physically overflow u64 on any real machine) and saturating to MAX is safe. The gas math now uses u128::try_from(...).unwrap_or(u128::MAX) similarly.
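For reference, the saturating conversion pattern mentioned above looks like this in isolation. The wrapper function names are hypothetical; only the `try_from(...).unwrap_or(MAX)` expressions come from the comment.

```rust
/// Convert a Vec::len()-style count to u64, saturating instead of failing.
fn chunk_count_u64(chunk_count: usize) -> u64 {
    u64::try_from(chunk_count).unwrap_or(u64::MAX)
}

/// Same pattern for the gas math, which works in u128.
fn chunk_count_u128(chunk_count: usize) -> u128 {
    u128::try_from(chunk_count).unwrap_or(u128::MAX)
}
```

Compared with `chunk_count as u64`, this keeps the conversion explicit and checked without adding an error path that cannot be reached on current targets.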

P2 — CLI drops progress hook → already wired in af1e894: handle_file_cost reuses drive_upload_progress, so the encryption phase shows a live chunk counter.

P2 — PR description example mismatch → fixed in the PR body. With the new constants a 13-chunk file is one wave: 1 × 1_500_000 × 100_000_000 = 1.5e14 wei = 0.000150 ETH, which is what both the human output and the JSON now print.

No CHANGELOG in the repo, so nothing to update there. Let me know if you want the sample cap bumped or the gas price made configurable before we merge.


@Nic-dorman Nic-dorman left a comment


macOS CI failure is due to failing to get enough quotes, not related to the PR

@grumbach grumbach merged commit b0c501a into main Apr 22, 2026
22 of 24 checks passed