init: manifold implementation by nikbpetrov · Pull Request #192 · forecastingresearch/forecastbench

nikbpetrov · 2026-05-16T11:23:31Z

In addition to the usual read/write patterns updated during this refactor, note the new bitwise-OR update in `search_markets`: it avoids having to pass ids around (and reduces the memory footprint, perhaps negligibly).

Note the second commit. During some testing, the fetch's 30s timeout proved too short when IO in the GCP buckets was high, so I bumped it to 60s, and then further, to match `max_time` from `backoff`.
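The point of tying the fetch timeout to the retry policy's `max_time` is that one value bounds both. The sketch below illustrates that idea with the standard library only, rather than the `backoff` package the PR actually uses; `retry_until_deadline` and `MAX_TIME_S` are hypothetical names, and the value 120s is an assumption (the PR does not state the number). Each attempt receives the remaining budget as its timeout, so no single call can outlive the overall retry window.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical single source of truth: the retry window and the timeout
# handed to each attempt both derive from this one value.
MAX_TIME_S = 120.0  # assumption; the PR does not state the actual number


def retry_until_deadline(
    fn: Callable[[float], T],
    max_time: float = MAX_TIME_S,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Retry fn with exponential backoff until it succeeds or max_time is used up.

    fn receives the remaining budget as its timeout, so a single attempt can
    never run past the overall retry window.
    """
    start = clock()
    delay = base_delay
    while True:
        remaining = max_time - (clock() - start)
        if remaining <= 0:
            raise TimeoutError(f"gave up after {max_time}s")
        try:
            return fn(remaining)
        except Exception:
            # Re-raise the last error if the next wait would exceed the budget.
            if (clock() - start) + delay >= max_time:
                raise
            sleep(delay)
            delay *= 2
```

In practice `fn` would wrap an HTTP call that accepts a timeout argument; the `sleep` and `clock` parameters exist only so the behaviour can be exercised without real waiting.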

Parity testing: the old and new code match perfectly when run together, with both fetch jobs finishing in ~10s; for the update jobs, the old code ran in 26 mins vs 6 mins for the new code (see comment below). In my run, both jobs' output differs very slightly from last night's prod data (1 new id out of 2130, and 3 values slightly different out of 1981), all within the expected deviation for a market source.
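A parity check like the one summarised above boils down to counting new ids, missing ids, and differing values. A minimal sketch, assuming each job's output can be reduced to an id-to-value mapping; `ParityReport` and `compare_outputs` are hypothetical and do not describe the harness actually used for this PR.

```python
from dataclasses import dataclass, field
from math import isclose
from typing import Dict, Set, Tuple


@dataclass
class ParityReport:
    added_ids: Set[str] = field(default_factory=set)    # present only in the new output
    removed_ids: Set[str] = field(default_factory=set)  # present only in the old output
    changed: Dict[str, Tuple[float, float]] = field(default_factory=dict)  # id -> (old, new)


def compare_outputs(
    old: Dict[str, float], new: Dict[str, float], rel_tol: float = 1e-9
) -> ParityReport:
    """Diff two id -> value mappings, treating values within rel_tol as equal."""
    report = ParityReport()
    report.added_ids = set(new) - set(old)
    report.removed_ids = set(old) - set(new)
    for key in set(old) & set(new):
        if not isclose(old[key], new[key], rel_tol=rel_tol):
            report.changed[key] = (old[key], new[key])
    return report
```

A nonzero `added_ids` or `changed` count against prod data is not necessarily a regression for a live market source, which is why the comparison above is judged against expected deviation rather than exact equality.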

Full pipeline test: not done, as per the Slack discussion.

nikbpetrov · 2026-05-16T18:36:27Z

dbb3f5d introduces new behaviour (verified with logs against the old code), but it looks like low-hanging fruit that saves a lot of IO (update job: 8 mins vs 28 mins for the new code, in ideal conditions).

Edit: Actually, this was the intended behaviour behind the current prod code too (here), but there seems to be a bug:

```python
# Regenerate resolution files in case they've been deleted
resolved_files = gcp.storage.list_with_prefix(
    bucket_name=env.QUESTION_BANK_BUCKET, prefix=source
)
filename = f"{row['id']}.jsonl"
if filename not in resolved_files:
    market = _get_market(row["id"])
    _create_resolution_file(dfq, index, market)
```

Here, the entries in `resolved_files` include the prefix while `filename` does not, so the membership check never matches: a resolution file is always created and uploaded, causing excessive writes to GCP.
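One way to make that membership check behave as intended is to normalise both sides to bare ids before comparing. The sketch assumes `list_with_prefix` returns full object names such as `manifold/123.jsonl` (inferred from the description above, not verified against the helper); `existing_resolution_ids` and `needs_resolution_file` are hypothetical. If the listing currently runs once per row, building the set once up front would also avoid repeated bucket listings.

```python
import posixpath
from typing import Iterable, Set

SUFFIX = ".jsonl"


def existing_resolution_ids(object_names: Iterable[str]) -> Set[str]:
    """Reduce bucket object names like 'manifold/123.jsonl' to bare ids like '123'."""
    ids: Set[str] = set()
    for name in object_names:
        base = posixpath.basename(name)  # drop the 'source/' prefix
        if base.endswith(SUFFIX):
            ids.add(base[: -len(SUFFIX)])
    return ids


def needs_resolution_file(market_id: str, existing_ids: Set[str]) -> bool:
    """True only when no resolution file exists yet for this market."""
    return market_id not in existing_ids
```

With a listing of `["manifold/abc.jsonl"]`, the original check `"abc.jsonl" not in listing` is always true, whereas `needs_resolution_file("abc", existing_resolution_ids(listing))` returns `False`.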

houtanb · 2026-05-18T06:32:26Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f2a49e7752

chatgpt-codex-connector · 2026-05-18T06:36:29Z

```diff
+                existing_df=existing_df,
+            )
+            if df_res is not None:
+                resolution_files[row["id"]] = df_res
```


Avoid uploading unchanged resolution files

When _build_resolution_file returns existing_df for an already up-to-date unresolved market, this line adds that unchanged DataFrame to resolution_files before the identity check below. As a result, func_manifold_update still passes it to _source_io.upload_resolution_files, so nightly runs where most Manifold files are current rewrite every unresolved resolution file despite logging that the write was skipped, increasing job time and bucket writes. Only add the file in the df_res is not existing_df branch.
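The suggested fix amounts to treating "the builder returned the same object" as "unchanged" and keeping such entries out of the upload set. A minimal sketch, assuming the builder signals "up to date" by returning the very object it was given, as the review describes for `_build_resolution_file`; `collect_changed_resolution_files` and `build` are hypothetical stand-ins, and plain lists stand in for DataFrames.

```python
from typing import Callable, Dict, Iterable, Mapping, Optional, TypeVar

Frame = TypeVar("Frame")  # stands in for a pandas DataFrame


def collect_changed_resolution_files(
    market_ids: Iterable[str],
    existing: Mapping[str, Frame],
    build: Callable[[str, Optional[Frame]], Optional[Frame]],
) -> Dict[str, Frame]:
    """Return only the resolution files that actually need uploading."""
    to_upload: Dict[str, Frame] = {}
    for market_id in market_ids:
        existing_df = existing.get(market_id)
        df_res = build(market_id, existing_df)
        if df_res is None:
            continue  # builder produced nothing for this market
        if df_res is existing_df:
            continue  # same object back: already up to date, skip the write
        to_upload[market_id] = df_res
    return to_upload
```

An identity check (`is`) is cheap and matches the review's wording, but it only works if the builder reliably returns the original object when nothing changed; a content comparison would be more robust if that contract is not guaranteed.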

houtanb · 2026-05-18T06:38:03Z

@claude review

nikbpetrov · 2026-05-18T12:21:26Z

@houtanb I ran the full pipeline (I cancelled the resolve and leaderboard jobs once they had started, to avoid wasting resources; both start fine).

The polymarket fetch job failed due to a timeout, although it seemed to be fetching data fine until then. Not sure what causes this at this point; I recall some issues with some providers?

I did not run acled, as previously agreed.

nikbpetrov added 6 commits May 16, 2026 14:11

- init: manifold implementation (d896a15)
- fix: ensure manifold's fetch timeout is consistent with its `max_time` backoff (7b73c34)
- format (98c4c89)
- fix: load all existing resolution files, not just those in dff; matches old code (f678efd)
- fix: skip writing resolution files that have not changed (saves a ton of IO) (dbb3f5d)
- format (f2a49e7)

nikbpetrov requested a review from houtanb May 16, 2026 18:49

chatgpt-codex-connector Bot reviewed May 18, 2026

nikbpetrov wants to merge 6 commits into forecastingresearch:main from nikbpetrov:manifold
