Skip to content

Refactor: Infer source#166

Closed
nikbpetrov wants to merge 14 commits into
forecastingresearch:mainfrom
nikbpetrov:infer
Closed

Refactor: Infer source#166
nikbpetrov wants to merge 14 commits into
forecastingresearch:mainfrom
nikbpetrov:infer

Conversation

@nikbpetrov
Copy link
Copy Markdown
Collaborator

Before merging, remember to remove -new suffix from the name of the new jobs in src/orchestration/func_infer_fetch/Makefile and src/orchestration/func_infer_update/Makefile

@houtanb
Copy link
Copy Markdown
Member

houtanb commented Apr 14, 2026

First pass, quick review from codex and claude:

  1. Avoid importing fetch/update dependencies at module import time — src/sources/infer.py:9-16 Moving the old INFER fetch/update code into sources.infer makes every consumer of the source registry import backoff, certifi, and requests just to load InferSource. That breaks entrypoints that import sources or helpers.question_curation but do not install those packages (for example src/orchestration/func_resolve/requirements.txt, src/leaderboard/requirements.txt, src/curate_questions/create_question_set/requirements.txt, and the metadata jobs), so those jobs will now fail during startup with ModuleNotFoundError before any work begins.
  2. source_intro/resolution_criteria are ClassVars only on InferSource, no base class contract
  3. orchestration/__init__.py docstring still says "resolve pipeline" but now also does fetch/update

Me quick review:

  1. Redundant: source_type: ClassVar[SourceType] = SourceType.MARKET
  2. What is display_name for? This should be "Rand Forecasting Institute", but I don't think we're ever displaying sources anywhere. Continue to use "infer" for historical reasons for "source"
  3. On cursory review, it seems like now you're downloading all resolution files from GCP Storage whereas I think you only want those in dff, otherwise you'll be downloading all resolved resolution files as well, which will grow with time. Likely not a big deal for infer, but costly for metaculus and polymarket.

@nikbpetrov
Copy link
Copy Markdown
Collaborator Author

  1. In-freaking-excusable. I get it's part consequence of the intermedia code structure, but still... all this testing and seemingly sensible arguments I put forward that a single test was going to be sufficient and it all proves to be little more than a fart in the wind. I will revert to running much more comprehensive test suite during this refactor moving forward and will init a separate repo where testing files will go. Will work on a fix too - not sure what the best option is.
  2. Fixed, though strict enforcement is deferred (i.e. tests + typing check) until all sources are refactored as otherwise sources that are yet to be refactored will be throwing errors (we can add those vars there but that means touching many more sources prematurely)
  3. Fixed
  4. Fixed
  5. Was meant to be for logging/debugging in my head based on the initial plan (when I wrote it), but you're right it's not used so I've removed
  6. Good catch; I've added an optional arg to pass ids so we don't load everything. The only hypothetical caveat is that files whose ids are not passed, i.e. not in dff, won't get updated - safe assumption, right?

@houtanb
Copy link
Copy Markdown
Member

houtanb commented Apr 14, 2026

I don't see the new commits.

@nikbpetrov
Copy link
Copy Markdown
Collaborator Author

Yup, still working on it; don't have 1. solved yet, see Slack. Don't have the testing pipeline ready yet either, not continuing without one at this point.

@houtanb
Copy link
Copy Markdown
Member

houtanb commented Apr 14, 2026

Of course, sorry, did not mean to pressure you. I saw "fixed" for some of them so was looking for those changes. Take your time, I'll respond to your Slack message tonight or tomorrow. Need to think a bit about it

@nikbpetrov
Copy link
Copy Markdown
Collaborator Author

nikbpetrov commented Apr 16, 2026

Source identity split solved. Please note that I've touched a number of files almost exclusively due to the fact that I wanted to move resolution_criteria and source_intro into sources/_metadata.py. I have then used the import/reexport pattern in each helpers/<source>.py. Technically a handful of these files could be deleted altogether if I just changed the import pattern in question_curation to import from sources metadata, but I've refrained and will do that during the dedicated question_curation refactor PR - let me know if you prefer otherwise (e.g. so we don't forget later, but we should be fine!)

@houtanb houtanb linked an issue Apr 16, 2026 that may be closed by this pull request
@nikbpetrov nikbpetrov requested a review from houtanb April 20, 2026 06:54
@houtanb
Copy link
Copy Markdown
Member

houtanb commented Apr 22, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eb30585e61

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/orchestration/_source_io.py
Comment thread src/orchestration/func_infer_fetch/Makefile Outdated
@houtanb
Copy link
Copy Markdown
Member

houtanb commented Apr 22, 2026

@claude review

@nikbpetrov
Copy link
Copy Markdown
Collaborator Author

@houtanb I have rebased and integrated yfinance based on #5042c68.

Note that nullified Qs (and, unique to this source, TICKER_RENAMES now go in metadata).

I've kept your terminology for DELISTED_STOCKS and preserved tests as they are but do note that this is quite unique. I don't understand why not call them nullified questions, as they are, and reference them as such? A comment can tell us that they are actually delisted. Or are you set on DELISTED_STOCKS?

@houtanb
Copy link
Copy Markdown
Member

houtanb commented Apr 23, 2026

Not attached to the name. Just noting variable naming is not consistent or enforced anywhere. See, e.g. NULLIFICATION_START_DATES infred.py and _IDS_TO_NULLIFY in wikipedia.py.

Also, it should be changed to align with the similar usage elsewhere: a dict of str, date instead of an array of NullifiedQuestion.

@nikbpetrov
Copy link
Copy Markdown
Collaborator Author

Not attached to the name. Just noting variable naming is not consistent or enforced anywhere. See, e.g. NULLIFICATION_START_DATES infred.py and _IDS_TO_NULLIFY in wikipedia.py.

Also, it should be changed to align with the similar usage elsewhere: a dict of str, date instead of an array of NullifiedQuestion.

Not finding NULLIFICATION_START_DATES in fred.py - in general nullification Qs for fred seem fine to me but lmk if I am missing something.

For wikipedia, you make a good point - it's scattered and inconsistent. I was mostly focused on backwards compatibility (see the import/export pattern in helpers/wikipedia.py). I'd prefer to address this when we get to the wikipedia source refactor, but let me know if you'd prefer to address it within this PR.

@houtanb
Copy link
Copy Markdown
Member

houtanb commented Apr 23, 2026

No need to address it here, just pointing out inconsistencies that I agree we should deal with at some point!

@houtanb
Copy link
Copy Markdown
Member

houtanb commented May 6, 2026

@nikbpetrov to keep you updated (sorry for the delay). I did the code review and all looks good. Will do test runs tomorrow and providing that's good, can merge. If all psases, I can rebase on top of main, implement the recent FRED changes and merge.

@houtanb
Copy link
Copy Markdown
Member

houtanb commented May 15, 2026

@claude review

@houtanb
Copy link
Copy Markdown
Member

houtanb commented May 15, 2026

Merged in 3fd0ad2 and 2594813

@houtanb houtanb closed this May 15, 2026
@nikbpetrov nikbpetrov deleted the infer branch May 16, 2026 09:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactor codebase

2 participants