Make DATAPATH configurable and fix path joins (fixes #3) #5

Open
mkzung wants to merge 1 commit into MITIBMxGraph:main from mkzung:configurable-datapath

mkzung commented Jul 4, 2026

Problem

preprocess_glass.py and preprocess_sub2vec.py hardcode DATAPATH = "./dataset/", and the README tells users to edit it by hand (issue #3). Paths are built by string concatenation with an inconsistent leading slash (DATAPATH+"/background_nodes.csv" vs DATAPATH+"connected_components.csv"), so both forms work only because the default value ends in /. If a user sets the natural DATAPATH="./dataset" (no trailing slash), the second form produces ./datasetconnected_components.csv and the scripts crash with FileNotFoundError.
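A minimal sketch of the failure mode, using only the two file names quoted above. The helper names `concat_paths` and `joined_paths` are hypothetical and do not appear in the scripts; they only contrast string concatenation with `os.path.join`.

```python
import os


def concat_paths(datapath):
    """Build both paths the old way, by string concatenation (hypothetical helper)."""
    return (
        datapath + "/background_nodes.csv",      # form with a leading slash
        datapath + "connected_components.csv",   # form without a leading slash
    )


def joined_paths(datapath):
    """Build both paths with os.path.join (hypothetical helper)."""
    return (
        os.path.join(datapath, "background_nodes.csv"),
        os.path.join(datapath, "connected_components.csv"),
    )


if __name__ == "__main__":
    for datapath in ("./dataset", "./dataset/"):
        print(datapath, "concat:", concat_paths(datapath))
        print(datapath, "join:  ", joined_paths(datapath))
```

With concatenation, `./dataset` breaks the second form and `./dataset/` yields a double slash in the first; with `os.path.join`, both spellings of the folder normalize to the same two files.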

Fix

  • Add a --datapath argument (falls back to $ELLIPTIC2_DATAPATH, then to the default ./dataset) so the folder no longer needs hand-editing (closes #3, "Ease of use").
  • Build every path with os.path.join, removing the trailing-slash dependency and the ./dataset//... double slash.
  • Fix a copy-paste bug in the out-of-range warning (n2id[c1] -> n2id[c2] in the c2 branch).
  • Correct the README: the split is fixed by the deterministic per-subgraph assignment, not a user-editable "split size" (the unused train/val variables are removed).
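The lookup order described in the first bullet could be sketched as follows. This is an illustration under assumptions, not the code in the commit: the function name `parse_datapath` and the `env` parameter are hypothetical, and only the `--datapath` flag, the `ELLIPTIC2_DATAPATH` variable, and the `./dataset` default come from the description above.

```python
import argparse
import os


def parse_datapath(argv=None, env=None):
    """Resolve the dataset folder: --datapath, else $ELLIPTIC2_DATAPATH, else ./dataset.

    `env` is a hypothetical parameter that makes the lookup testable;
    it defaults to the real process environment.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Preprocessing (sketch)")
    parser.add_argument(
        "--datapath",
        default=env.get("ELLIPTIC2_DATAPATH", "./dataset"),
        help="folder containing the dataset CSV files",
    )
    args = parser.parse_args(argv)
    return args.datapath


if __name__ == "__main__":
    datapath = parse_datapath()
    # Every file path is then derived with os.path.join, never by concatenation.
    print(os.path.join(datapath, "background_nodes.csv"))
```

Reading the environment variable as the argparse default keeps the precedence in one place: an explicit flag always wins, and the hardcoded folder is used only when neither source is set.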

Verified without downloading the full dataset: both ./dataset and ./dataset/ now resolve correctly, and both scripts compile.
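The description does not say how the compile check was run. One way to do a comparable check without the dataset is sketched below; the helper name `syntax_ok` is hypothetical, and it uses Python's built-in `compile`, which only confirms that a file parses and compiles, not that it runs correctly.

```python
import sys


def syntax_ok(path):
    """Return True if the file at `path` compiles as Python source (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as f:
        source = f.read()
    try:
        # Compile only; the module body is never executed.
        compile(source, path, "exec")
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    # Example: python check.py preprocess_glass.py preprocess_sub2vec.py
    for script in sys.argv[1:]:
        print(script, "ok" if syntax_ok(script) else "SYNTAX ERROR")
```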
