Refactor: Infer source #166
Closed
14 commits
b2335f1 base for fetch/update implementation across sources (nikbpetrov)
361a739 init: infer implementation (nikbpetrov)
d9a3189 _fb_types fix (nikbpetrov)
6c23958 formatting (nikbpetrov)
a88bfd0 fix: stale docstring (nikbpetrov)
77bc459 fix: add source_intro and resolution_criteria as BaseSource class con… (nikbpetrov)
b4e1459 fix: remove source_type from the definition of each source (nikbpetrov)
6905d2d fix: load_existing_resolution_files can now only load relevant questi… (nikbpetrov)
684cb2a fix: remove display_name (nikbpetrov)
d96d27b fix: INIT source identity split (nikbpetrov)
c981293 fix: revert benchmark start date shorthand (nikbpetrov)
637f172 fix: resolve needs backoff as it imports all sources (nikbpetrov)
3ea30a8 fix: update job names (nikbpetrov)
0e0c73d rebase and integrate yfinance (nikbpetrov)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -1,9 +1,6 @@ |
| 1 | | -"""Infer-specific variables.""" |
| | 1 | +"""Infer-specific variables. Delegates to sources._metadata.""" |
| 2 | 2 | |
| 3 | | -SOURCE_INTRO = ( |
| 4 | | -    "We would like you to predict the outcome of a prediction market. A prediction market, in this " |
| 5 | | -    "context, is the aggregate of predictions submitted by users on the website INFER Public. " |
| 6 | | -    "You're going to predict the probability that the market will resolve as 'Yes'." |
| 7 | | -) |
| | 3 | +from sources._metadata import SOURCE_METADATA |
| 8 | 4 | |
| 9 | | -RESOLUTION_CRITERIA = "Resolves to the outcome of the question found at {url}." |
| | 5 | +SOURCE_INTRO = SOURCE_METADATA["infer"]["source_intro"] |
| | 6 | +RESOLUTION_CRITERIA = SOURCE_METADATA["infer"]["resolution_criteria"] |
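
The delegation in this hunk can be sketched roughly as follows. The layout of `SOURCE_METADATA` is an assumption: the diff only shows that it is indexed by source name and then by `source_intro` / `resolution_criteria`, and `sources/_metadata.py` itself is not part of the visible changes. The strings are copied from the removed lines, and the URL is a made-up placeholder.

```python
# Hypothetical stand-in for sources/_metadata.py; the real module is not shown in this diff.
SOURCE_METADATA = {
    "infer": {
        "source_intro": (
            "We would like you to predict the outcome of a prediction market. "
            "A prediction market, in this context, is the aggregate of predictions "
            "submitted by users on the website INFER Public. You're going to predict "
            "the probability that the market will resolve as 'Yes'."
        ),
        "resolution_criteria": "Resolves to the outcome of the question found at {url}.",
    },
}

# What the per-source module reduces to after the change: two lookups.
SOURCE_INTRO = SOURCE_METADATA["infer"]["source_intro"]
RESOLUTION_CRITERIA = SOURCE_METADATA["infer"]["resolution_criteria"]

# The {url} placeholder is filled in per question (illustrative URL only).
example = RESOLUTION_CRITERIA.format(url="https://example.com/questions/1")
print(example)
```

Keeping the strings in one mapping means the per-source modules stay as thin aliases, so existing imports of `SOURCE_INTRO` and `RESOLUTION_CRITERIA` should keep working unchanged.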
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -1,9 +1,6 @@ |
| 1 | | -"""Manifold-specific variables.""" |
| | 1 | +"""Manifold-specific variables. Delegates to sources._metadata.""" |
| 2 | 2 | |
| 3 | | -SOURCE_INTRO = ( |
| 4 | | -    "We would like you to predict the outcome of a prediction market. A prediction market, in this " |
| 5 | | -    "context, is the aggregate of predictions submitted by users on the website Manifold. " |
| 6 | | -    "You're going to predict the probability that the market will resolve as 'Yes'." |
| 7 | | -) |
| | 3 | +from sources._metadata import SOURCE_METADATA |
| 8 | 4 | |
| 9 | | -RESOLUTION_CRITERIA = "Resolves to the outcome of the question found at {url}." |
| | 5 | +SOURCE_INTRO = SOURCE_METADATA["manifold"]["source_intro"] |
| | 6 | +RESOLUTION_CRITERIA = SOURCE_METADATA["manifold"]["resolution_criteria"] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -1,13 +1,6 @@ |
| 1 | | -"""Yfinance-specific variables.""" |
| | 1 | +"""Yfinance-specific variables. Delegates to sources._metadata.""" |
| 2 | 2 | |
| 3 | | -SOURCE_INTRO = ( |
| 4 | | -    "Yahoo Finance provides financial data on stocks, bonds, and currencies and also offers news, " |
| 5 | | -    "commentary and tools for personal financial management. You're going to predict how questions " |
| 6 | | -    "based on this data will resolve." |
| 7 | | -) |
| | 3 | +from sources._metadata import SOURCE_METADATA |
| 8 | 4 | |
| 9 | | -RESOLUTION_CRITERIA = ( |
| 10 | | -    "Resolves to the market close price at {url} for the resolution date. If the resolution date " |
| 11 | | -    "coincides with a day the market is closed (weekend, holiday, etc.) the previous market close " |
| 12 | | -    "price is used." |
| 13 | | -) |
| | 5 | +SOURCE_INTRO = SOURCE_METADATA["yfinance"]["source_intro"] |
| | 6 | +RESOLUTION_CRITERIA = SOURCE_METADATA["yfinance"]["resolution_criteria"] |
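
The Yfinance resolution criteria text says that when the resolution date falls on a day the market is closed, the previous market close price is used. A minimal sketch of that rule is below. It is not taken from this PR: the function name `close_on_or_before`, the dict-of-closes input, and the prices are all hypothetical, and the project's actual resolver may work differently.

```python
from __future__ import annotations

from bisect import bisect_right
from datetime import date


def close_on_or_before(closes: dict[date, float], resolution_date: date) -> float:
    """Return the close for resolution_date, or the most recent earlier close."""
    trading_days = sorted(closes)
    # Number of trading days that fall on or before resolution_date.
    idx = bisect_right(trading_days, resolution_date)
    if idx == 0:
        raise ValueError("no market close on or before the resolution date")
    return closes[trading_days[idx - 1]]


# Made-up prices for Friday 2024-03-01 and Monday 2024-03-04.
closes = {date(2024, 3, 1): 100.0, date(2024, 3, 4): 101.5}

# Saturday 2024-03-02 has no close, so the lookup falls back to Friday's.
print(close_on_or_before(closes, date(2024, 3, 2)))
```

A date that is itself a trading day returns its own close, since `bisect_right` counts it as "on or before".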
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -1 +1 @@ |
| 1 | | -"""Orchestration layer for resolve pipeline.""" |
| | 1 | +"""Orchestration layer.""" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,108 @@ |
| | 1 | +"""Shared IO helpers for source fetch/update orchestration.""" |
| | 2 | + |
| | 3 | +from __future__ import annotations |
| | 4 | + |
| | 5 | +import json |
| | 6 | +import logging |
| | 7 | +import os |
| | 8 | +from typing import Iterable |
| | 9 | + |
| | 10 | +import pandas as pd |
| | 11 | + |
| | 12 | +from helpers import constants, data_utils, env |
| | 13 | +from utils import gcp |
| | 14 | + |
| | 15 | +logger = logging.getLogger(__name__) |
| | 16 | + |
| | 17 | + |
| | 18 | +def write_fetch_output(source: str, dff: pd.DataFrame) -> None: |
| | 19 | +    """Write fetch DataFrame to <source>_fetch.jsonl and upload. |
| | 20 | + |
| | 21 | +    Args: |
| | 22 | +        source (str): Source name (e.g. "infer"). |
| | 23 | +        dff (pd.DataFrame): Fetched data to write. |
| | 24 | +    """ |
| | 25 | +    filenames = data_utils.generate_filenames(source) |
| | 26 | +    local = filenames["local_fetch"] |
| | 27 | +    with open(local, "w", encoding="utf-8") as f: |
| | 28 | +        for record in dff.to_dict(orient="records"): |
| | 29 | +            f.write(json.dumps(record, ensure_ascii=False) + "\n") |
| | 30 | +    logger.info(f"Uploading {filenames['jsonl_fetch']} to GCP...") |
| | 31 | +    gcp.storage.upload( |
| | 32 | +        bucket_name=env.QUESTION_BANK_BUCKET, |
| | 33 | +        local_filename=local, |
| | 34 | +    ) |
| | 35 | + |
| | 36 | + |
| | 37 | +def load_existing_resolution_files( |
| | 38 | +    source: str, |
| | 39 | +    ids: Iterable[str] | None = None, |
| | 40 | +) -> dict[str, pd.DataFrame]: |
| | 41 | +    """Download <source>/<id>.jsonl resolution files. |
| | 42 | + |
| | 43 | +    If ids is given, download only those. If ids is None, list the bucket and |
| | 44 | +    download every .jsonl under <source>/ — use sparingly, scales with backlog. |
| | 45 | + |
| | 46 | +    Args: |
| | 47 | +        source (str): Source name (e.g. "infer"). |
| | 48 | +        ids (Iterable[str] | None): Specific question IDs to load. If None, |
| | 49 | +            load every resolution file present in the bucket for this source. |
| | 50 | + |
| | 51 | +    Returns: |
| | 52 | +        dict mapping question_id to its resolution DataFrame. |
| | 53 | +    """ |
| | 54 | +    if ids is None: |
| | 55 | +        paths = gcp.storage.list_with_prefix( |
| | 56 | +            bucket_name=env.QUESTION_BANK_BUCKET, prefix=f"{source}/" |
| | 57 | +        ) |
| | 58 | +        question_ids = [ |
| | 59 | +            os.path.basename(p).removesuffix(".jsonl") for p in paths if p.endswith(".jsonl") |
| | 60 | +        ] |
| | 61 | +    else: |
| | 62 | +        question_ids = [str(qid) for qid in ids] |
| | 63 | + |
| | 64 | +    result: dict[str, pd.DataFrame] = {} |
| | 65 | +    for question_id in question_ids: |
| | 66 | +        basename = f"{question_id}.jsonl" |
| | 67 | +        remote_path = f"{source}/{basename}" |
| | 68 | +        local_filename = f"/tmp/{source}_{basename}" |
| | 69 | + |
| | 70 | +        gcp.storage.download_no_error_message_on_404( |
| | 71 | +            bucket_name=env.QUESTION_BANK_BUCKET, |
| | 72 | +            filename=remote_path, |
| | 73 | +            local_filename=local_filename, |
| | 74 | +        ) |
| | 75 | +        if os.path.exists(local_filename): |
| | 76 | +            df = pd.read_json( |
| | 77 | +                local_filename, |
| | 78 | +                lines=True, |
| | 79 | +                dtype=constants.RESOLUTION_FILE_COLUMN_DTYPE, |
| | 80 | +                convert_dates=False, |
| | 81 | +            ) |
| | 82 | +            if not df.empty: |
| | 83 | +                result[question_id] = df |
| | 84 | +    logger.info(f"Loaded {len(result)} existing resolution files for {source}.") |
| | 85 | +    return result |
| | 86 | + |
| | 87 | + |
| | 88 | +def upload_resolution_files(source: str, resolution_files: dict[str, pd.DataFrame]) -> None: |
| | 89 | +    """Upload per-question resolution files to <source>/<id>.jsonl. |
| | 90 | + |
| | 91 | +    Args: |
| | 92 | +        source (str): Source name (e.g. "infer"). |
| | 93 | +        resolution_files (dict): Mapping of question_id to resolution DataFrame. |
| | 94 | +    """ |
| | 95 | +    for question_id, df in resolution_files.items(): |
| | 96 | +        basename = f"{question_id}.jsonl" |
| | 97 | +        remote_filename = f"{source}/{basename}" |
| | 98 | +        local_filename = f"/tmp/{basename}" |
| | 99 | + |
| | 100 | +        df[["id", "date", "value"]].to_json( |
| | 101 | +            local_filename, orient="records", lines=True, date_format="iso" |
| | 102 | +        ) |
| | 103 | +        gcp.storage.upload( |
| | 104 | +            bucket_name=env.QUESTION_BANK_BUCKET, |
| | 105 | +            local_filename=local_filename, |
| | 106 | +            filename=remote_filename, |
| | 107 | +        ) |
| | 108 | +    logger.info(f"Uploaded {len(resolution_files)} resolution files for {source}.") |
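
The helpers in this file rely on two small conventions: one JSON object per line for the `.jsonl` files, and question IDs derived from object paths by stripping the directory and the `.jsonl` suffix. The sketch below illustrates both using only the standard library. The function names `write_jsonl`, `read_jsonl` and `question_ids_from_paths` are hypothetical and not part of the PR; the GCP calls and pandas are left out, and a temporary directory stands in for the fixed `/tmp` paths so the example is self-contained.

```python
import json
import os
import tempfile


def write_jsonl(path, records):
    """Write one JSON object per line, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def question_ids_from_paths(paths):
    """Derive question IDs from '<source>/<id>.jsonl' paths, ignoring other files."""
    return [os.path.basename(p).removesuffix(".jsonl") for p in paths if p.endswith(".jsonl")]


# Made-up resolution rows with the id/date/value columns used in the diff.
records = [
    {"id": "q1", "date": "2024-03-01", "value": 0.4},
    {"id": "q1", "date": "2024-03-02", "value": 0.5},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "q1.jsonl")
    write_jsonl(path, records)
    loaded = read_jsonl(path)

ids = question_ids_from_paths(["infer/q1.jsonl", "infer/q2.jsonl", "infer/README.txt"])
print(loaded, ids)
```

`str.removesuffix` requires Python 3.9 or later, which matches its use in `load_existing_resolution_files`.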