diff --git a/.context/week09-svg-style.md b/.context/week09-svg-style.md new file mode 100644 index 0000000..5464ee2 --- /dev/null +++ b/.context/week09-svg-style.md @@ -0,0 +1,70 @@ +# Week 9 SVG house style (match Weeks 5-8 exactly) + +All hero SVGs live in `assets/icons/` and are referenced from the deck as `../../assets/icons/.svg`. Hand-authored SVG only, NO mermaid. They must render correctly as static SVG in a browser (Reveal.js), and read clearly from the back of a 30-foot room. + +## Canvas +- Root: `` (use height 320 for shorter diagrams; never exceed 360). Wide 16:9-ish hero. +- Everything is drawn in this 1000-wide coordinate space. Keep a ~40px left/right margin (content x from ~40 to ~960). + +## Typography (font-family always "sans-serif", or "monospace" for code/commands/filenames) +- Title (top, centered): `x="500" y="24" font-size="15" fill="#1E293B" font-weight="bold" text-anchor="middle"`. +- Subtitle/italic tagline under title: `x="500" y="42" font-size="11" fill="#64748B" font-style="italic" text-anchor="middle"`. +- Card title: font-size 12-13 bold, in the card's dark color. +- Card body lines: font-size 9.5-10.5. +- Monospace command/filename/code: font-size 10-11.5, `font-family="monospace"`. +- Bottom "moral strip" text: font-size 10-11. +- Never go below font-size 9 for anything meant to be read; 6-7 is allowed only to *depict* "too small" failure cases. + +## Color families (light fill / saturated stroke / dark text) +- Blue (structure, primary): fill `#DBEAFE`, stroke `#2563EB`, text `#1E40AF`. Lighter fill `#EFF6FF`. +- Green (good / success / semantics-rich): fill `#D1FAE5`, stroke `#059669`, text `#065F46`. +- Purple (compose / tooling): fill `#EDE9FE` (header band `#DDD6FE`), stroke `#7C3AED`, text `#5B21B6`. +- Amber (caution / steps): fill `#FEF3C7`, stroke `#D97706`, text `#92400E`. +- Teal (export / alt): fill `#CCFBF1`, stroke `#0D9488`, text `#134E4A`. 
+- Pink/magenta (agent): fill `#FCE7F3` (header band `#FBCFE8`), stroke `#BE185D`, text `#831843` / `#9D174D`. +- Red (danger / missing / "broken"): fill `#FEE2E2`, stroke `#DC2626`, text `#991B1B` / `#7F1D1D`. +- Neutrals: text `#1E293B`, muted `#475569` / `#64748B` / `#94A3B8`, borders `#CBD5E1`, panel fill `#F1F5F9`, white `#FFFFFF`. +- ORCID brand green (use ONLY for ORCID marks): `#A6CE39`. DOI/DataCite can use blue. + +## Card recipe +``` + +``` +- Use stroke-width 1.4 for normal cards, 1.8-2 for the emphasized/centerpiece card. +- Two-tone "named" card (command/skill name in a header band): draw the full card, then a header band rect on top: +``` + + +/command:name +``` + +## Arrow markers (paste once per file in a , vary the id) +``` + + + +``` +- Forward arrows: stroke `#475569` width 1.2-1.6, `marker-end="url(#arrowfwd)"`. +- Feedback/loop arrows: red `#DC2626`, `stroke-dasharray="6,3"`, with a red marker variant. + +## Badge recipe (e.g. NEW, TRIVIAL, ORCID) +``` + +NEW +``` + +## Bottom "moral strip" (most heroes end with one) +``` + +One-line takeaway. +``` + +## Mock code/terminal block (for CLI slides) +- Draw a dark rounded rect `fill="#0F172A"` (slate-900) or light `#F8FAFC` with border; render commands as `monospace` `` lines, prompt `$` in muted color, command in `#E2E8F0` (on dark) or `#1E293B` (on light), flags/comments in a muted/green tint. Keep ~14px line height. + +## Hard rules +- No overlapping text. Compute x/y so labels sit inside their boxes (use `text-anchor="middle"` and center on the box). +- Arrows end exactly at the target box edge (gap of ~2px), not inside it. +- Title + subtitle at top; optional moral strip at bottom; main content in between (y ~60 to ~320). +- Reference templates to mirror: `assets/icons/figures-pipeline.svg` (horizontal pipeline), `assets/icons/figures-plugin-map.svg` (center + satellite cards), `assets/icons/figures-failure-modes.svg` (left/right before-after with mock plots). +- Output valid standalone SVG. 
Self-check: every ``/``/`` is closed; the file starts with ``. diff --git a/.gitignore b/.gitignore index 65e6a32..2623094 100644 --- a/.gitignore +++ b/.gitignore @@ -6,3 +6,6 @@ node_modules/ .env .env.local .gstack/ + +# Working QA scratch (renders, base64 embeds, DataCite probes) +.context/qa/ diff --git a/assets/icons/bids-conversion-flow.svg b/assets/icons/bids-conversion-flow.svg new file mode 100644 index 0000000..3c65bd9 --- /dev/null +++ b/assets/icons/bids-conversion-flow.svg @@ -0,0 +1,78 @@ + + + + + + + + /neuroinformatics:bids-conversion -- a guided 6-step workflow + Brain Imaging Data Structure (BIDS): one command walks raw recordings into a validated, shareable layout. + + + + + 1. Inventory + source data + formats, + subjects, + channels + + + + + + 2. Scaffold + dataset_ + description.json + participants.tsv + + + + + + 3. Convert + files + BrainVision, + EEGLAB .set, + EDF, BDF + + + + + + 4. JSON + sidecars + Sampling- + Frequency, + EEGReference... + + + + + + 5. TSV + tables + channels, + events, + electrodes + + + + + + 6. Validate + bids- + validator + + + + + + + + + + + + Modalities: EEG, EMG, MEG, fMRI, behavioral. 
+ diff --git a/assets/icons/bids-tree.svg b/assets/icons/bids-tree.svg new file mode 100644 index 0000000..3d58c23 --- /dev/null +++ b/assets/icons/bids-tree.svg @@ -0,0 +1,61 @@ + + BIDS -- one layout, every dataset + Brain Imaging Data Structure: predictable folders and filenames for one HBN-EEG subject + + + + + + + + + + + + + ds00XXXX/ + ├─ dataset_description.json + ├─ participants.tsv + ├─ README + └─ sub-NDARAB1234/ +    └─ eeg/ +       ├─ sub-..._task-surroundSupp_eeg.set +       ├─ sub-..._task-surroundSupp_eeg.json +       ├─ sub-..._task-surroundSupp_channels.tsv +       ├─ sub-..._task-surroundSupp_events.tsv +       └─ sub-..._task-surroundSupp_events.json + + + .set = signals · .tsv = tables · .json = sidecar metadata + sub- = subject GUID · task- = task label · one entity per key + Same folders + names in every BIDS dataset, anywhere. + + + + + dataset_description.json + name, BIDSVersion, authors, license + + + + + _eeg.json (sidecar) + sampling rate, reference, channel count + + + + + _channels.tsv + name, type, units per electrode + + + + + _events.json + HED annotations -- the WHAT of events + + + + + Predictable names + sidecars = tools find everything without asking you. + diff --git a/assets/icons/bids-validator-agent.svg b/assets/icons/bids-validator-agent.svg new file mode 100644 index 0000000..af98fb2 --- /dev/null +++ b/assets/icons/bids-validator-agent.svg @@ -0,0 +1,81 @@ + + The bids-validator agent -- this week's mechanical defence + A looping agent locates, validates, fixes with confirmation, and re-validates -- then hands you a clean report + + + + + + + + + + + + + + + + + + bids-validator agent + + + + 1. locate dataset + + + + + 2. run BIDS validator + + + + + 3. categorize: errors / warnings / info + + + + + 4. apply fixes (with confirmation) + + + + + 5. 
re-validate + + + + loop until clean + + + + + + + + + + BIDS Validation Report + + Subjects: 12 Modalities: eeg + Errors fixed: 2 + [FIXED] missing dataset_description.json + [FIXED] _eeg.json missing + PowerLineFrequency -> 60 + Remaining warnings: 2 + + + + Ready for submission: YES + + + + fixes your data LOCALLY; + nemar-cli validates again AT THE GATE. + + + + + The agent automates the boring checks so the submission gate is a formality, not a surprise. + diff --git a/assets/icons/bids-why.svg b/assets/icons/bids-why.svg new file mode 100644 index 0000000..cb6d46c --- /dev/null +++ b/assets/icons/bids-why.svg @@ -0,0 +1,72 @@ + + Why BIDS -- one layout, every tool + A standard structure means analysis tools, validators, and archives all read your data unchanged + + + + + + + + + + + + + + + + + + + + + + + + + + BIDS dataset + one predictable layout + + machine-readable sidecars + + + + + EEGLAB + MATLAB analysis + + + + MNE-Python + Python analysis + + + + BIDS validator + checks compliance + + + + BIDS Apps + containerized pipelines + + + + OpenNeuro + public archive + + + + NEMAR + EEG compute gateway + + + + mega-analysis + pool many datasets + + + + Standard structure turns 'my data' into 'reusable data'. 
+ diff --git a/assets/icons/demo-roadmap-neuro.svg b/assets/icons/demo-roadmap-neuro.svg new file mode 100644 index 0000000..b40219f --- /dev/null +++ b/assets/icons/demo-roadmap-neuro.svg @@ -0,0 +1,73 @@ + + Live demo -- small and honest + Two short actions, ~4 minutes -- annotate one event, then validate the practicum dataset + + + + + + + + + + + + Step 1 -- HEDit + + + ~2:00 + + one rich prose description of an HBN event + + + + prose + "hard cut to a new + scene; bright outdoor + shot; subject viewing" + + + + + + + validated HED + Sensory-event, + Visual-presentation, + (Onset, Scene-cut) + schema-valid + + the recreate-the-stimulus bar in miniature + + + + + + + Step 2 -- validate the dataset + + + ~2:00 + + nemar dataset validate + run on the HBN practicum dataset + + + + $ nemar dataset validate ./hbn-shot + BIDS: eeg subjects: 12 runs: 24 + errors: 0 + warnings: 2 (non-blocking) + Ready for submission: YES + + a clean BIDS report, validated at the gate + + + + + + + + We do not manufacture a pass. + If the validator flags real errors, we read them live and fix the data -- the report reflects the dataset, not our hopes. + diff --git a/assets/icons/doi-metadata-gap.svg b/assets/icons/doi-metadata-gap.svg new file mode 100644 index 0000000..c538b3c --- /dev/null +++ b/assets/icons/doi-metadata-gap.svg @@ -0,0 +1,72 @@ + + The metadata gap -- findability and credit + Live DataCite records, same dataset, same 8 authors: NEMAR nm000103 vs OpenNeuro ds005505 (HBN-EEG Release 1) + + + + NEMAR -- nm000103 + concept DOI 10.82901/NEMAR.nm000103 + + + OpenNeuro -- ds005505 + 10.18112/openneuro.ds005505.v1.0.1 (version only) + + + + FINDABILITY (FAIR) -- can anyone discover and reuse it? + + + + License / reuse terms + ✓ CC-BY-NC-SA-4.0 + ✗ none + + + + Keywords (subjects) + ✓ 8 terms + ✗ 0 + + + + Description / abstract + ✓ yes + ✗ none + + + + Links to papers + related datasets + ✓ 5 links + ✗ 0 + + + + CREDIT -- does reuse trace back to you? 
+ + + + Authors linked to ORCID iD + ✓ 8 / 8 + ✗ 0 / 8 + + + + Stable concept DOI · funding references + ✓ yes · 2 + ✗ version-only · 0 + + + + + + + + + + OpenNeuro's DOI record carries only a title and author names -- everything else is blank. + License, keywords, description, related links, ORCID, funding: NEMAR fills every field. Source: api.datacite.org (live). + + + + Findable and citable is metadata, not luck -- nemar-cli populates it; the same dataset on OpenNeuro stays bare. + diff --git a/assets/icons/doi-orcid.svg b/assets/icons/doi-orcid.svg new file mode 100644 index 0000000..69f8374 --- /dev/null +++ b/assets/icons/doi-orcid.svg @@ -0,0 +1,100 @@ + + DOI minting + ORCID auto-link -- credit, automatically + Publish a dataset and every author's contribution lands on their ORCID record without a single manual step + + + + + + + + + + + + + Dataset authors + + + + + iD + Author 1 + 0000-0001-2345-6789 + + + + + iD + Author 2 + 0000-0002-9876-5432 + + + + + iD + Author 3 + 0000-0003-1122-3344 + + + each linked to an ORCID iD + + + + + + + + + DataCite DOI (via EZID) + + DOI minted on publish, metadata + includes every author's ORCID iD + + + + concept DOI 10.82901/NEMAR.nm000104 + + + per-version DOIs + ...v1.0.0 ...v1.1.0 ...v2.0.0 + + + + auto + + + + + + + iD + ORCID record -- Author 1 + + Works -- Datasets + + + + + HBN-EEG: movie shot-change dataset + Dataset -- NEMAR -- 2026 + Source: DataCite doi:10.82901/NEMAR... + + appears on each author's + ORCID record automatically + no copy-paste, no manual "add work" + + + + + + + + + + OpenNeuro does not link authors to ORCID on the DOI yet. + Same DataCite DOI infrastructure, but the credit never reaches the author's ORCID profile -- NEMAR closes that loop. + + + + Authors -> ORCID iDs -> DataCite DOI -> back onto every author's ORCID record. Credit flows automatically. 
+ diff --git a/assets/icons/events-thin.svg b/assets/icons/events-thin.svg new file mode 100644 index 0000000..6766036 --- /dev/null +++ b/assets/icons/events-thin.svg @@ -0,0 +1,84 @@ + + events.tsv is thin -- an onset and a cryptic code + The shared events file carries timing and a number; everything that gives it meaning is left out. + + + + + + + + + + what the file actually holds + sub-01_task-movie_events.tsv + + + + + + + + + + + onset + duration + value + + + + + + 0.000 + n/a + 12 + + 1.500 + n/a + 14 + + 3.000 + n/a + 12 + + 4.500 + n/a + 13 + + + + + but + + + + + + what the code "12" never tells you + + + x + stimulus content -- what was on screen? + + x + modality -- visual or audio? + + x + condition -- foreground vs background contrast? + + x + participant response -- was there one? + + x + trial context -- which block, which run? + + + + + HBN originally shipped numeric event codes; step one was replacing them with meaningful strings. + + + + The meaning isn't lost -- it just never reaches the shared file. + diff --git a/assets/icons/hed-anatomy.svg b/assets/icons/hed-anatomy.svg new file mode 100644 index 0000000..01f64fb --- /dev/null +++ b/assets/icons/hed-anatomy.svg @@ -0,0 +1,97 @@ + + HED -- the fix in principle + Hierarchical Event Descriptors put the meaning into controlled, composable tags. + + + + + + + + + + + + one tag = a comma-separated path through the schema + + Sensory-event, Visual-presentation, (Foreground-disk, (Contrast, High)), (Background, Uniform), (Temporal-frequency, 25 Hz) + + + + analysis works at any level of the hierarchy + + + + Action + + + Move + + + Move-body-part + + + Move-upper-extremity + + + Press + + + + + + + + + tag at the leaf (Press) + OR at any ancestor: a + "Move" query matches all. 
+ + + + + + + + the sidecar pattern -- meaning lives beside the data + + + events.tsv (unchanged) + + + + onset + value + 0.000 + 12 + 1.500 + 14 + 3.000 + 12 + 4.500 + 13 + + + + maps + + + events.json (HED keys) + + "value": { + "HED": { + "12": "Sensory-event, + Visual-presentation, + (Contrast, High)", + "14": "Sensory-event, + (Background, Uniform)" + } + + + + events.tsv unchanged; meaning lives in events.json + + + + Controlled, composable, validatable -- the schema means the same thing across labs. + diff --git a/assets/icons/hed-workflow-pain.svg b/assets/icons/hed-workflow-pain.svg new file mode 100644 index 0000000..0a6d278 --- /dev/null +++ b/assets/icons/hed-workflow-pain.svg @@ -0,0 +1,55 @@ + + + + + + + + Why HED workflows stall for most labs + Hierarchical Event Descriptors (HED): every event needs hand-built, schema-valid tags before anyone can search them. + + + + + 1. Read the paper + hours per session + extract event structure + by hand + + + + + + 2. Learn the schema + ~2000 tags + expert-only + vocabulary + + + + + + 3. Write the sidecar + value levels + + value slots + error-prone + + + + + + 4. Validate & repeat + cryptic validator + messages + loop until clean + + + + + + + + + + HED adoption stays inside the labs that build HED. + diff --git a/assets/icons/hedit-pipeline.svg b/assets/icons/hedit-pipeline.svg new file mode 100644 index 0000000..60791a7 --- /dev/null +++ b/assets/icons/hedit-pipeline.svg @@ -0,0 +1,52 @@ + + + + + + + + + + + HEDit -- describe in English, get validated HED + Today: natural language -> validated HED (LangGraph multi-agent) + + + + + Parser + natural language -> structured facts: + action, body-part, direction, + magnitude, unit + + + + + + Tagger + retrieve HED nodes + (RAG over schema), + compose the tag string + + + + + + Validator + official HED validator: + tag exists? units valid? + value slot well-formed? 
+ + + + + + + + + re-tag with validator feedback + + + + The HED schema is the contract -- no agent invents vocabulary. + diff --git a/assets/icons/nemar-upload-publish.svg b/assets/icons/nemar-upload-publish.svg new file mode 100644 index 0000000..424d7ea --- /dev/null +++ b/assets/icons/nemar-upload-publish.svg @@ -0,0 +1,88 @@ + + nemar-cli -- upload to publish + Four commands, end to end -- private while you stage, public only when you ask + + + + + + + + + + + PRIVATE repo -- you're admin: invite collaborators, push directly + + + PUBLIC -- on request; then PR + tags + + + + + 1 + Authenticate + + nemar auth login + one-time, API key cached + + + + + + + + + + 2 + Validate + + nemar dataset validate ./ds + BIDS check, must pass + + + + + + + + + 3 + Upload + + nemar dataset upload ./ds + private by default + + + + + + + + + 4 + Publish request + + nemar dataset publish request <id> + admin approves -> public + DOI + + + + + + + + + + + + + + + + + + + + nemar-cli makes a private GitHub repo, you the admin -- invite collaborators and push directly, until publish. + After publishing, changes go through pull requests + version tags. (OpenNeuro also allows private upload, just CLI-only.) + diff --git a/assets/icons/nemar-validate.svg b/assets/icons/nemar-validate.svg new file mode 100644 index 0000000..97cbf44 --- /dev/null +++ b/assets/icons/nemar-validate.svg @@ -0,0 +1,53 @@ + + nemar-cli -- validation is now trivial + One command checks Brain Imaging Data Structure (BIDS) compliance before anything leaves your machine + + + + TRIVIAL + + + + + + + + + + terminal -- nemar-cli + + + $ + nemar dataset validate ./hbn-eeg + + + Resolving BIDS validator (Deno) ... + Scanning ./hbn-eeg -- 124 files, 12 subjects + Checking sidecars, events, channels ... + ! 
2 warnings (optional README fields) + + + BIDS valid -- 0 errors, 2 warnings + $ + + + + + wraps the official BIDS + validator (Deno) + same engine NEMAR runs server-side + + + runs automatically on upload + AND on every update PR + no way to ship an invalid dataset + + + no separate toolchain + to install + the CLI fetches the validator for you + + + + Validation moved from a multi-step chore to a single command you can run before, during, and after upload. + diff --git a/assets/icons/neuro-plugin-map.svg b/assets/icons/neuro-plugin-map.svg new file mode 100644 index 0000000..dbf4921 --- /dev/null +++ b/assets/icons/neuro-plugin-map.svg @@ -0,0 +1,45 @@ + + The neuroinformatics plugin -- 2 skills + 1 agent + Conversion skill at the centre; the validator agent defends it; experiment-design waits in the wings + + + + + + + + + + + + bids-validator + agent -- autonomous validate + fix + this week's mechanical defence + + + + + + + + + + /neuroinformatics:bids-conversion + Guided BIDS conversion + EEG, EMG, MEG, fMRI + today's focus + + + + + + /neuroinformatics:experiment-design + PsychoPy + Lab Streaming Layer + data collection + (in the plugin; not today's focus) + + + + + HED annotation lives inside the skills; today's focus is conversion + validation. + diff --git a/assets/icons/openneuro-flow.svg b/assets/icons/openneuro-flow.svg new file mode 100644 index 0000000..6f38b92 --- /dev/null +++ b/assets/icons/openneuro-flow.svg @@ -0,0 +1,42 @@ + + + + + + + + OpenNeuro -- the default open BIDS archive + Brain Imaging Data Structure (BIDS) in, a citable public dataset out. + + + + + Validate (BIDS) + pass the validator first + + + + + + Upload + browser or CLI + + + + + + Public + DOI + citable, versioned dataset + + + + + + + + + + NOTE + Private upload exists, but only via CLI / direct push -- no polished GUI for it. + And the DOI record stays sparse: no ORCID author links, minimal metadata. 
+ diff --git a/assets/icons/openneuro-nemar.svg b/assets/icons/openneuro-nemar.svg new file mode 100644 index 0000000..c770993 --- /dev/null +++ b/assets/icons/openneuro-nemar.svg @@ -0,0 +1,40 @@ + + Two homes for BIDS data -- OpenNeuro and NEMAR + Both speak BIDS; one is the broad archive, one sits next to a supercomputer + + + + + + OpenNeuro + Broad open neuroimaging archive + All modalities: fMRI, MRI, EEG, MEG... + Open sharing (private upload too, CLI-only) + DataCite DOI for every version + the general-purpose front door + + + + + + + NEMAR + EEG / MEG focus, BIDS-native + Sits next to SDSC supercomputer compute + Analyze without downloading + GitHub-backed: you're admin, invite collaborators + compute lives where the data lives + + + + + + + + BIDS + + + + + HBN-EEG lives on both. + diff --git a/assets/icons/recreate-the-stimulus-figure.svg b/assets/icons/recreate-the-stimulus-figure.svg new file mode 100644 index 0000000..8b26fa9 --- /dev/null +++ b/assets/icons/recreate-the-stimulus-figure.svg @@ -0,0 +1,24 @@ + + + + + + The model got the HED annotation only -- no image, no example + + + + + + + + + Reproduced from the HED + Gratings, background, fixation, contrast, + four disks present -- all correct. + + + + The miss: disk size and position + Both are awkward to express in HED, so + they were left out of the annotation. 
+ diff --git a/assets/icons/recreate-the-stimulus.png b/assets/icons/recreate-the-stimulus.png new file mode 100644 index 0000000..527449c Binary files /dev/null and b/assets/icons/recreate-the-stimulus.png differ diff --git a/assets/icons/reuse-credit-gap.svg b/assets/icons/reuse-credit-gap.svg new file mode 100644 index 0000000..69533fe --- /dev/null +++ b/assets/icons/reuse-credit-gap.svg @@ -0,0 +1,96 @@ + + A finished analysis nobody can reuse + A rich raw recording funnels down to a thin, locked artifact -- structure, semantics, and credit all lost + + + + + + + + + + Rich raw recording + + + + EEG 128-ch continuous + + + + Event markers (timed) + + + + Behavioral log + + + + Stimulus log + + + + + + + + + + + + + all that's left + + + + + Thin shared artifact + + + + events.tsv + 1 cryptic column, no labels + onset + + + + results figure + locked in a folder, no provenance + + + + + + + + + + + + + Structure + can't find things + + + + + + + + Semantics + can't decode events + + + + + + + + Credit + reuse traces to no one + + + + + Analysis-ready means no forensic search for unreported details. + diff --git a/assets/icons/sidecar-anatomy.svg b/assets/icons/sidecar-anatomy.svg new file mode 100644 index 0000000..100f667 --- /dev/null +++ b/assets/icons/sidecar-anatomy.svg @@ -0,0 +1,79 @@ + + Where structure ends -- the sidecar and events.tsv + The sidecar describes the recording; events.tsv lists when things happened, but not what they meant + + + + + + + + + sub-..._task-surroundSupp_eeg.json + + + + { + "TaskName" + : "surroundSupp", + "SamplingFrequency" + : 500, + "EEGReference" + : "Cz", + "PowerLineFrequency" + : 60, + "EEGChannelCount" + : 128 + } + + + Machine-readable metadata about the recording. + Tools read this instead of asking you. + + + sub-..._task-surroundSupp_events.tsv + + + + + onset + duration + value + + + + + + + + 0.000 + n/a + 12 + + + 1.500 + n/a + 14 + + + 3.000 + n/a + 12 + + + value 12, 14 ... what do these codes mean? 
+ the table alone cannot tell you + + + + structure says WHERE the event is + it cannot say WHAT it was. + + + + → HED + + + + BIDS gives you structure; HED adds the semantics that make events self-describing. + diff --git a/assets/icons/two-standards-bar.svg b/assets/icons/two-standards-bar.svg new file mode 100644 index 0000000..bbec2f1 --- /dev/null +++ b/assets/icons/two-standards-bar.svg @@ -0,0 +1,37 @@ + + Two standards, one bar for clarity + Structure says where the data lives; semantics say what each event meant + + + + + THE BAR + A language model can recreate the stimulus from the annotation alone. + + + + + + + BIDS = STRUCTURE + Where the files live on disk + Naming: sub- / ses- / task- + Sidecars + TSV tables + answers WHERE + + + + + + + HED = SEMANTICS + What each event actually meant + Controlled, composable vocabulary + Machine-readable annotations + answers WHAT + + + + + Structure + semantics together clear the bar -- reproducible by humans and machines alike. + diff --git a/blog/week-09-neuroinformatics.md b/blog/week-09-neuroinformatics.md new file mode 100644 index 0000000..091d9b4 --- /dev/null +++ b/blog/week-09-neuroinformatics.md @@ -0,0 +1,244 @@ +# Week 9 Guide: Neuroinformatics -- Standards, Sharing, and Credit + +*A finished analysis is not a finished contribution. The data behind it has to be reproducible, shareable, and citable, or the work dies with the paper. Two standards carry the weight: the Brain Imaging Data Structure (BIDS) answers where everything lives (structure), and Hierarchical Event Descriptors (HED) answer what every event meant (semantics). The single most useful idea this week: the bar for a complete annotation is concrete and falsifiable. 
**A language model should be able to reconstruct the stimulus, or the experiment, from the annotation alone.** That is not a metaphor; it is exactly the test demonstrated in the Healthy Brain Network EEG (HBN-EEG) paper (Shirazi et al., 2024, Figure 9), where Claude Sonnet 3.5 regenerated the Surround Suppression stimulus from its HED description with no image. The `neuroinformatics` plugin gets data to that bar; HEDit automates the hardest leg (natural language to validated HED); and `nemar-cli` ships it with rich DataCite metadata linked to each author's Open Researcher and Contributor ID (ORCID), the part OpenNeuro still leaves blank.* + +This guide accompanies [Week 9](../sessions/week-09/) of the Agentic Research Course by the [Open Science Collective](https://osc.earth). It builds directly on Week 8 (figures), where a language model regenerated a stimulus figure from its HED annotation; this week that same trick becomes the standard of annotation completeness. The dataset throughout is HBN-EEG, the very data the course has analyzed since Week 3 ("The Present" movie). It is itself a BIDS + HED dataset published on both OpenNeuro and NEMAR with exactly the tools this session teaches. The loop closes: the data you analyzed becomes the worked example for how to share data. + +> **Scope note.** Week 9 is about *standardizing and sharing* data you already have. The `neuroinformatics` plugin also ships an `experiment-design` skill (PsychoPy + Lab Streaming Layer, the data-*collection* side); that is in the plugin and the README, but not the focus of this session. + +--- + +## The Reuse-and-Credit Gap + +A lab collects dense, synchronized data. What reaches re-users is a thin `events.tsv` with one cryptic column and a results figure locked in a folder. Three locks snap shut at once: + +- **Structure** -- where is everything? Custom folder layouts mean every re-user writes glue code before they can start. +- **Semantics** -- what did the events mean? 
A numeric event code is meaningless outside the lab. +- **Credit** -- who is cited when the data is reused? Without a Digital Object Identifier (DOI) and author identifiers, reuse traces back to no one. + +The phrase from the HBN-EEG paper is the target: a dataset is "analysis-ready" when re-users need **no forensic search for unreported details**. The information is not lost; it just never reaches the shared artifact. BIDS, HED, and good sharing are what close the three locks. + +--- + +## Two Standards, One Bar + +**BIDS answers where; HED answers what.** The bar that judges both is the same: someone, or a language model, can reconstruct your experiment without emailing you. + +- **BIDS (Brain Imaging Data Structure)** -- a filesystem convention plus metadata. *Structure.* +- **HED (Hierarchical Event Descriptors)** -- a controlled, composable, validatable event vocabulary. *Semantics.* + +Plant the bar early; it pays off when we hit Figure 9. + +--- + +## BIDS: the Structure Standard + +BIDS is a filesystem convention plus metadata: predictable names (`sub-`, `ses-`, `task-`), JSON sidecars, and TSV tables, with a top-level `dataset_description.json` and `participants.tsv`. + +```text +ds00XXXX/ (dataset root) +├── dataset_description.json (name, BIDSVersion, authors, license) +├── participants.tsv (age, sex, p-factor ...) +├── README +└── sub-NDARAB1234/ + └── eeg/ + ├── sub-..._task-surroundSupp_eeg.set (signals) + ├── sub-..._task-surroundSupp_eeg.json (sidecar) + ├── sub-..._task-surroundSupp_channels.tsv (name, type, units) + ├── sub-..._task-surroundSupp_events.tsv (onset, duration, value) + └── sub-..._task-surroundSupp_events.json (HED annotations) +``` + +**Why BIDS is worth it: one layout, every tool.** A BIDS dataset is readable by EEGLAB, MNE-Python, the BIDS validator, and BIDS Apps, and it is the upload format both OpenNeuro and NEMAR expect. Standard structure is also what makes mega-analysis across studies possible. 
The payoff is leverage, not bureaucracy. + +### Where structure ends + +The JSON sidecar carries acquisition metadata; `events.tsv` carries the timeline. + +```json +{ + "TaskName": "surroundSupp", + "SamplingFrequency": 500, + "EEGReference": "Cz", + "PowerLineFrequency": 60, + "EEGChannelCount": 128 +} +``` + +```text +onset duration value +0.000 n/a 12 +1.500 n/a 14 +3.000 n/a 12 +``` + +Structure tells you **where** an event sits on the timeline. It cannot tell you **what** the event was. That gap is semantics, and it is HED's job. + +--- + +## HED: the Semantics Standard + +`events.tsv` is thin: an onset and a cryptic numeric code. Stimulus content, modality, condition, participant response, trial context -- all real, all recorded, none of it in the shared file. (HBN originally shipped numeric event codes; the first curation step was replacing them with meaningful strings, then annotating with HED.) + +HED is the fix. Each tag is a node in a controlled, hierarchical schema; written in long form it is a slash-separated path from the root (commas separate different tags within an annotation). The hierarchy carries meaning, so analysis works at any level: + +```text +Action/Move/Move-body-part/Move-upper-extremity/Press +``` + +You can analyze at the leaf (`Press`) or at any ancestor (`Move`). Tags compose, take typed values with units, and validate against the official schema, so a tag means the same thing across labs. The sidecar pattern keeps `events.tsv` unchanged; all semantics live in `events.json` under HED keys, so existing analyses keep working. + +The HBN-EEG paper states three objectives for HED: build event context; create machine-readable and human-understandable annotations for mega-analysis and machine learning; and support task transparency and reproducibility. + +### The bar: recreate the stimulus + +Here is the test made concrete. In the HBN-EEG paper, the HED annotation of the Surround Suppression task was handed to Claude Sonnet 3.5 with **no image** -- and the model regenerated the visual stimulus from the annotation alone. 
+ +Everything structural came back correct: the gratings, the vertical-grating background, the central fixation point, the contrast relationship, four foreground disks present. The **only** miss was the disks' **size and position** -- both are awkward to express in HED, so they were left out of the annotation, and the model had no way to reproduce them. + +That miss is the proof the test is honest, and a real lesson: HED nails event semantics, but spatial geometry (size, position) is hard to encode. The rule that falls out: + +> If a language model can rebuild your stimulus from the annotation alone, the annotation is complete. If it can't, you left something out. + +The same trick drew several of the paper's figures (a callback to Week 8). Cite: Shirazi et al. (2024), *HBN-EEG: The FAIR implementation of the Healthy Brain Network EEG dataset*, bioRxiv [10.1101/2024.10.03.615261](https://doi.org/10.1101/2024.10.03.615261). + +--- + +## HEDit: AI-assisted HED + +HED workflows stall for most labs, and it is a workflow problem, not a willingness problem: roughly 2000 tags, expert-only fluency, a validator with cryptic messages. Adoption stays inside the labs that build the schema. + +[HEDit](https://github.com/Annotation-Garden/HEDit) turns the wall into a paragraph. You write one rich prose description per event value; HEDit runs a Parser to Tagger to Validator pipeline (a LangGraph multi-agent system with the official HED validator in the loop) and returns a BIDS-compliant `events.json` with HED tags plus a provenance trail. + +- **Parser** -- natural language to structured facts (action, body-part, direction, magnitude, unit) +- **Tagger** -- retrieve HED nodes (retrieval over the schema) and compose the tag string +- **Validator** -- the official HED validator: does the tag exist, are units valid, is the value slot well-formed? On failure, the error feeds back to the Tagger. + +The schema is the contract; no agent invents vocabulary. 
And HEDit is only as good as the description: it is tuned for exactly the detail the recreate-the-stimulus bar demands. Garbage in, garbage out. + +--- + +## The neuroinformatics Plugin: 2 Skills + 1 Agent + +The plugin packages the BIDS workflow for Claude Code. + +- **`/neuroinformatics:bids-conversion`** -- a guided conversion to BIDS. +- **`bids-validator`** (agent) -- autonomous validation and fixes. This week's mechanical defence. +- **`/neuroinformatics:experiment-design`** -- the data-collection side (PsychoPy + Lab Streaming Layer); in the plugin, not today's focus. + +HED annotation is woven into the skills rather than exposed as a separate command. + +### /neuroinformatics:bids-conversion + +A guided six-step workflow that ends where the next act begins, validation: + +1. **Inventory source data** -- formats, subjects, channels +2. **Scaffold** -- `dataset_description.json`, `participants.tsv`, the directory tree +3. **Convert files** -- BrainVision, EEGLAB `.set`, EDF, BDF +4. **JSON sidecars** -- `SamplingFrequency`, `EEGReference`, channel counts +5. **TSV tables** -- `channels`, `events`, `electrodes`, `coordsystem` +6. **Validate** -- the BIDS validator + +Modalities: EEG, EMG, MEG, fMRI, and behavioral data. + +### The bids-validator agent: the mechanical defence + +The agent locates the dataset, runs the BIDS validator, categorizes findings (errors must-fix, warnings should-fix, info optional), applies fixes with confirmation, re-validates, and reports readiness for OpenNeuro/NEMAR. + +```text +## BIDS Validation Report +Subjects: 12 Modalities: eeg +Errors fixed: 2 + [FIXED] missing dataset_description.json + [FIXED] _eeg.json missing PowerLineFrequency -> 60 +Remaining warnings: 2 +Ready for submission: YES +``` + +This is Week 9's equivalent of `cite-the-card` (Week 5) and `validate_fonts.py` (Week 8): a deterministic gate that turns "looks fine" into pass/fail. 
Note the division of labour: the agent fixes your data **locally**; `nemar-cli` validates again **at the upload gate**. + +--- + +## Sharing and Credit + +### OpenNeuro + +[OpenNeuro](https://openneuro.org) is the de-facto open BIDS archive: validated on ingest, public, DOI-minted, the default home for shared neuro data and a genuinely great resource. Two honest caveats that motivate what follows: private upload *is* possible, but only via the command line / direct push (there is no polished GUI for it), and the DOI record stays sparse -- no ORCID author links, minimal metadata. + +### NEMAR + +[NEMAR](https://nemar.org) (the Neuroelectromagnetic Data Archive and Tools Resource) specializes in EEG/MEG BIDS datasets and sits next to San Diego Supercomputer Center compute, so you can analyze without downloading. HBN-EEG lives on both OpenNeuro and NEMAR, which is exactly what lets us compare their DOI records head to head. + +### nemar-cli: validation is trivial + +```bash +nemar dataset validate ./my-dataset +``` + +It wraps the official BIDS validator (Deno), and it also runs automatically on upload and on every update pull request. No separate toolchain to install or configure. + +### nemar-cli: upload to publish, and the collaboration model + +```bash +nemar auth login # one-time, API key cached +nemar dataset validate ./my-dataset # BIDS check, must pass +nemar dataset upload ./my-dataset # creates a private GitHub repo +nemar dataset publish request nm000XXX # admin approves -> public + DOI +``` + +The model is what matters: `upload` creates a **private GitHub repository where you are the admin**. You invite collaborators and push directly to it while you stage. After publishing, changes go through pull requests and version tags.
(OpenNeuro also supports private upload, just command-line only, so the NEMAR advantage is this collaboration model plus the rich DOI metadata below, not "private vs public.") + +### DOI minting + ORCID auto-link + +On publish, `nemar-cli` mints a **concept DOI** (one stable citation across all versions) plus per-version DOIs, via EZID writing DataCite kernel-4 metadata, and **auto-links every author's ORCID iD** in that metadata. The dataset then appears on each author's ORCID record automatically, with no manual "add work." OpenNeuro does not link authors to ORCID on the DOI yet. + +--- + +## The Metadata Gap: Proof on a Real Dataset + +This is not a claim; it is live DataCite data on the audience's own dataset. The same HBN-EEG Release 1, the same eight authors, two homes: + +| DataCite field | NEMAR `nm000103` | OpenNeuro `ds005505` | +|---|---|---| +| DOI | `10.82901/NEMAR.nm000103` (concept) | `10.18112/openneuro.ds005505.v1.0.1` (version-only) | +| Stable concept DOI | yes | no | +| **Authors linked to ORCID iD** | **8 / 8** | **0 / 8** | +| License | CC-BY-NC-SA-4.0 | none | +| Subject keywords | 8 | 0 | +| Description / abstract | yes | none | +| Links to papers + related datasets | 5 | 0 | +| Funding references | 2 | 0 | + +OpenNeuro's DOI record carries only a title and author names; everything else is blank. NEMAR fills every field. **Findability (the F in FAIR: Findable, Accessible, Interoperable, Reusable) and credit are metadata the platform writes, not luck.** (Source: `api.datacite.org`, live records.) + +--- + +## Live Walkthrough + +Two small, honest actions, about four minutes total: + +1. **HEDit** -- write one rich prose description of an HBN event and watch it become a validated HED string. The recreate-the-stimulus bar in miniature: the richer the description, the better the tag. +2. **`nemar dataset validate`** -- run it on the HBN practicum dataset and read the clean BIDS report. + +If validation surfaces something, we walk it. 
We do not manufacture a pass. + +--- + +## Common Pitfalls + +- **Treating filenames as metadata.** A re-user learning the condition from `sub-003_ses-02_task-pullwalk_events.tsv` is reading a string, not a machine-queryable field. Put it in the sidecar. +- **Shipping numeric event codes.** Replace them with meaningful strings, then annotate with HED. +- **Stopping at BIDS.** Structure without semantics still leaves the events undecodable. HED is the second half. +- **Skipping validation.** The `bids-validator` agent fixes locally; `nemar dataset validate` checks at the gate. Both, by design. +- **Assuming a DOI means credit.** A DOI with no ORCID author identifiers does not propagate to anyone's ORCID record. Check the DataCite metadata, not just that a DOI exists. + +--- + +## Before Next Week + +- Install [`research-skills`](https://github.com/neuromechanist/research-skills) if you have not; it bundles `neuroinformatics`, `figures`, `manuscript`, `opencite`, `grant`, `project`, and `presentation`. +- If you have a small EEG/EMG dataset, try `/neuroinformatics:bids-conversion` on it and run the `bids-validator` agent. +- Browse HBN-EEG on [NEMAR](https://nemar.org) and [OpenNeuro](https://openneuro.org); compare the two DOI records on [DataCite](https://commons.datacite.org). +- Optional: try [HEDit](https://github.com/Annotation-Garden/HEDit) on one event from your own experiment, written as a rich paragraph. + +Week 10 is the capstone: building your own plugin. This week was the last research-workflow plugin of the course; next you make one. diff --git a/presentations/week-09/outline.md b/presentations/week-09/outline.md new file mode 100644 index 0000000..35d2c05 --- /dev/null +++ b/presentations/week-09/outline.md @@ -0,0 +1,27 @@ +# Week 9 Outline -- Neuroinformatics: Standards, Sharing, and Credit + +## Target +23 slides, ~30 min presentation + ~5 min live demo + ~15 min Q&A. + +## Core message +You spent eight weeks producing an analysis. 
This week is about making the *data* behind it reproducible, shareable, and citable, so the work outlives the paper. Two standards do the heavy lifting: **BIDS** answers *where everything lives* (structure); **HED** answers *what every event meant* (semantics). The bar for "good enough" is concrete and high: an annotation is complete when a language model can reconstruct the stimulus or the experiment from the annotation alone, with no forensic search for unreported details. The `neuroinformatics` plugin gets your data to that bar (`/neuroinformatics:bids-conversion` + the `bids-validator` agent), `HEDit` automates the hardest part (natural language to validated HED), and `nemar-cli` ships it to the world with the one feature OpenNeuro still lacks: automatic ORCID author linking on the minted DOI, so credit flows back to you. + +## Narrative arc +1. **The gap** -- a finished analysis nobody can rerun, events nobody can decode, credit nobody can trace. +2. **BIDS** -- the structure standard: directory layout, sidecars, TSVs. Where things live. +3. **HED** -- the semantics standard: hierarchical, composable, validatable event vocabulary. What things mean. The "recreate the stimulus" bar (HBN-EEG, Figure 9). +4. **HEDit** -- AI-assisted HED: describe in English, get validated HED back. +5. **The neuroinformatics plugin** -- `bids-conversion` skill + `bids-validator` agent (the mechanical defence). +6. **Sharing and credit** -- OpenNeuro then NEMAR; `nemar-cli` makes validation trivial, keeps data private until ready, and mints DOIs with ORCID author auto-linking. +7. **Demo + close** -- a tiny HEDit annotation, a `nemar dataset validate`, then Week 10 (build your own plugin). + +## Practicum thread +The dataset throughout is HBN-EEG (Shirazi et al., 2024) -- the very data the course has analyzed since Week 3 ("The Present" movie). It is a BIDS + HED dataset published on both OpenNeuro and NEMAR with exactly the tools this session teaches. 
The loop closes: the data you analyzed is itself the worked example for how to share data. + +## Continuity +- **Week 5 (lit review):** cite-the-card -> here, cite-the-dataset (every shared dataset earns a DOI). +- **Week 8 (figures):** last week an LLM-from-HED regenerated a stimulus figure; this week that same trick becomes the *standard of annotation completeness*. +- **Week 10 (plugins):** today is the last research-workflow plugin; next week you build your own. + +## Out of scope (deliberate) +PsychoPy experiment design and Lab Streaming Layer (the data *collection* side) are in the plugin and the README but are not taught live this week; the session is about *standardizing and sharing* data you already have. diff --git a/presentations/week-09/presentation.json b/presentations/week-09/presentation.json new file mode 100644 index 0000000..d988e45 --- /dev/null +++ b/presentations/week-09/presentation.json @@ -0,0 +1,668 @@ +{ + "presentation": { + "metadata": { + "title": "Agentic Research Course", + "author": "Seyed Yahya Shirazi, Ph.D.", + "description": "Week 9: Neuroinformatics -- standards, sharing, and credit. 
BIDS for structure, HED for semantics, the bar that an LLM can recreate the stimulus from the annotation alone, the neuroinformatics plugin (bids-conversion + bids-validator agent), and sharing on OpenNeuro/NEMAR where nemar-cli mints DOIs with automatic ORCID author linking.", + "theme": "default", + "aspectRatio": "16:9", + "controls": { + "slideNumbers": true, + "progress": true + } + }, + "slides": [ + { + "id": "title", + "layout": "title", + "transition": "fade", + "elements": [ + { + "type": "text", + "content": "# Agentic Research Course", + "style": { "fontSize": "xxl", "alignment": "center", "fontWeight": "bold" }, + "position": { "area": "center" } + }, + { + "type": "text", + "content": "Week 9: Neuroinformatics -- Standards, Sharing, and Credit", + "style": { "fontSize": "large", "alignment": "center", "color": "#2563EB" }, + "position": { "area": "center", "order": 1 } + }, + { + "type": "text", + "content": "**Seyed Yahya Shirazi, Ph.D.**\nAssistant Project Scientist, Swartz Center for Computational Neuroscience\nUC San Diego\n\n[Open Science Collective](https://osc.earth)", + "style": { "fontSize": "medium", "alignment": "center", "color": "#475569" }, + "position": { "area": "center", "order": 2 } + }, + { + "type": "text", + "content": "Course: [courses.osc.earth/agentic-research](https://courses.osc.earth/agentic-research/) | Discord: [discord.gg/5dWJCUmUww](https://discord.gg/5dWJCUmUww) | Recording: published to YouTube within 48 h", + "style": { "fontSize": "medium", "alignment": "center", "color": "#2563EB" }, + "position": { "area": "center", "order": 3 } + } + ], + "speakerNotes": "- Welcome back.\n\n- Last week: publication-grade figures, validated before export.\n\n- Today: the data underneath the analysis. 
Eight weeks of work produces a result; this week makes the *data* reproducible, shareable, and citable, so it outlives the paper.\n\n- Two standards do the heavy lifting: BIDS (Brain Imaging Data Structure) for structure, HED (Hierarchical Event Descriptors) for semantics. Then the neuroinformatics plugin, and sharing on OpenNeuro / NEMAR.\n\n- The dataset throughout is HBN-EEG, the same data the course has analyzed since Week 3. It is itself shared with exactly these tools.\n\n- Format: ~30 min, two tiny live demos, 15 min Q&A." + }, + { + "id": "where-we-are", + "layout": "single-column", + "transition": "slide", + "elements": [ + { + "type": "text", + "content": "## Where We Are", + "style": { "fontSize": "xl" }, + "position": { "area": "header" } + }, + { + "type": "bullets", + "items": [ + { "text": "**Weeks 1-4 -- Git, Claude Code, project management, CI/CD.** The safety net and the agent." }, + { "text": "**Weeks 5-7 -- Lit review, grants, manuscripts.** The writing pipeline; cite-the-card discipline throughout.", "animation": { "fragment": true, "type": "fade", "index": 0 } }, + { "text": "**Week 8 -- Scientific figures.** Publication-grade panels, validated before export.", "animation": { "fragment": true, "type": "fade", "index": 1 } }, + { "text": "**Today -- the data underneath.** You have an analysis and figures; now make the *data* reproducible, shareable, and citable.", "animation": { "fragment": true, "type": "slide-up", "index": 2 } } + ], + "bulletStyle": "disc", + "style": { "fontSize": "xl" }, + "position": { "area": "content" } + } + ], + "speakerNotes": "[Press right 3x to reveal fragments]\n\n- Recap the arc. Weeks 1-4 were the engine and the safety net; 5-7 the writing pipeline; 8 the figures.\n\n- Land the closing bullet: everything so far assumed you *had* analysis-ready data. Today is how that data gets made and shared.\n\n- The thread is reproducibility and credit: who can rerun this, and who gets cited when they do." 
+ }, + { + "id": "reuse-gap", + "layout": "single-column", + "transition": "slide", + "elements": [ + { + "type": "text", + "content": "## A Finished Analysis Nobody Can Reuse", + "style": { "fontSize": "xl" }, + "position": { "area": "header" } + }, + { + "type": "image", + "src": "../../assets/icons/reuse-credit-gap.svg", + "alt": "A rich raw recording (EEG 128-channel, event markers, behavioral log, stimulus log) funneling down to a thin shared artifact: a one-column events.tsv and a lone results figure locked in a folder. Three padlocks across the bottom labelled Structure (can't find things), Semantics (can't decode events), and Credit (reuse traces to no one).", + "width": "82%", + "position": { "area": "content" } + }, + { + "type": "callout", + "calloutType": "warning", + "content": "Three locks at once: **structure** (where is everything?), **semantics** (what did the events mean?), **credit** (who is cited when it is reused?). Analysis-ready means **no forensic search for unreported details.**", + "animation": { "fragment": true, "type": "slide-up", "index": 0 }, + "position": { "area": "footer" } + } + ], + "speakerNotes": "- The motivation slide. A lab collects dense, synchronized data; what reaches re-users is a thin file and a figure.\n\n- Three failures, and the talk is structured around them: structure, semantics, credit.\n\n- 'Forensic search for unreported details' is the phrase from the HBN-EEG paper. That is exactly what BIDS + HED + good sharing eliminate.\n\n- This is not a will problem. The data exists; the format and the platform decide whether anyone can use it." 
+ }, + { + "id": "two-standards", + "layout": "single-column", + "transition": "slide", + "elements": [ + { + "type": "text", + "content": "## Two Standards, One Bar for Clarity", + "style": { "fontSize": "xl" }, + "position": { "area": "header" } + }, + { + "type": "image", + "src": "../../assets/icons/two-standards-bar.svg", + "alt": "A full-width banner reading 'the bar -- a language model can recreate the stimulus from the annotation alone'. Below it two cards: a blue BIDS = STRUCTURE card (where files live; naming sub- ses- task-; sidecars + TSVs; answers WHERE) and a green HED = SEMANTICS card (what each event meant; controlled composable vocabulary; machine-readable; answers WHAT).", + "width": "88%", + "position": { "area": "content" } + }, + { + "type": "callout", + "calloutType": "important", + "content": "**BIDS answers where; HED answers what.** The bar that judges both: someone, or an LLM, can reconstruct your experiment without emailing you.", + "animation": { "fragment": true, "type": "slide-up", "index": 0 }, + "position": { "area": "footer" } + } + ], + "speakerNotes": "- Plant the two-standard framing and the bar early; the bar pays off on slide 10.\n\n- BIDS = structure: the filesystem and metadata convention. HED = semantics: the controlled vocabulary for what happened.\n\n- The bar is deliberately concrete and falsifiable. Not 'good annotation' in the abstract, but: can a model rebuild the stimulus from it?\n\n- Define both abbreviations on first use: BIDS = Brain Imaging Data Structure, HED = Hierarchical Event Descriptors."
+ }, + { + "id": "what-is-bids", + "layout": "single-column", + "transition": "slide", + "elements": [ + { + "type": "text", + "content": "## What Is BIDS -- One Layout, Every Dataset", + "style": { "fontSize": "xl" }, + "position": { "area": "header" } + }, + { + "type": "image", + "src": "../../assets/icons/bids-tree.svg", + "alt": "An annotated BIDS directory tree for one HBN-EEG subject: dataset root with dataset_description.json, participants.tsv, README, and a sub-NDARAB1234 folder containing an eeg subfolder with _eeg.set, _eeg.json sidecar, _channels.tsv, _events.tsv, and _events.json. Annotation chips point at the description file, the sidecar, channels.tsv, and events.json.", + "width": "90%", + "position": { "area": "content" } + }, + { + "type": "callout", + "calloutType": "note", + "content": "A filesystem convention plus metadata: predictable names (`sub-`, `ses-`, `task-`), JSON sidecars, and TSV tables. **The same shape in every BIDS dataset, anywhere.**", + "animation": { "fragment": true, "type": "slide-up", "index": 0 }, + "position": { "area": "footer" } + } + ], + "speakerNotes": "- Walk the tree. Root holds dataset_description.json and participants.tsv; each subject has a modality folder (eeg/) with the signals plus sidecars and TSVs.\n\n- The naming is the contract: a tool that knows BIDS finds the events file without being told where it is.\n\n- HBN uses NDAR GUIDs for subject IDs; the task here is surroundSupp.\n\n- Sidecars and TSVs are where the next two acts live: the sidecar is acquisition metadata, events.tsv is the timeline." 
+ }, + { + "id": "why-bids", + "layout": "single-column", + "transition": "slide", + "elements": [ + { + "type": "text", + "content": "## Why BIDS -- One Layout, Every Tool", + "style": { "fontSize": "xl" }, + "position": { "area": "header" } + }, + { + "type": "image", + "src": "../../assets/icons/bids-why.svg", + "alt": "A hub-and-spoke diagram with a central BIDS dataset node and arrows out to EEGLAB, MNE-Python, the BIDS validator, BIDS Apps, OpenNeuro, NEMAR, and mega-analysis.", + "width": "84%", + "position": { "area": "content" } + }, + { + "type": "callout", + "calloutType": "tip", + "content": "Readable by EEGLAB, MNE, validators, and BIDS Apps; the upload format both OpenNeuro and NEMAR expect. **Standard structure turns 'my data' into 'reusable data.'**", + "animation": { "fragment": true, "type": "slide-up", "index": 0 }, + "position": { "area": "footer" } + } + ], + "speakerNotes": "- BIDS is worth the effort because the tooling assumes it.\n\n- One layout means analysis software, validators, BIDS Apps, and both major archives all read your data without custom glue.\n\n- It is also what makes mega-analysis across studies possible: common structure plus, next, common semantics.\n\n- Land it: the payoff of BIDS is leverage, not bureaucracy." + }, + { + "id": "sidecar-events", + "layout": "single-column", + "transition": "slide", + "elements": [ + { + "type": "text", + "content": "## Where Structure Ends -- the Sidecar and events.tsv", + "style": { "fontSize": "xl" }, + "position": { "area": "header" } + }, + { + "type": "image", + "src": "../../assets/icons/sidecar-anatomy.svg", + "alt": "Two panels. Left: a mock _eeg.json sidecar with TaskName, SamplingFrequency 500, EEGReference Cz, PowerLineFrequency 60, EEGChannelCount 128. Right: a mock events.tsv with onset, duration, value columns and three rows. 
A red tag under the table reads 'structure says WHERE the event is; it cannot say WHAT it was', and a green arrow exits to the right labelled toward HED.", + "width": "90%", + "position": { "area": "content" } + }, + { + "type": "callout", + "calloutType": "important", + "content": "The sidecar carries acquisition metadata; `events.tsv` carries the timeline. **Structure tells you where an event is, not what it was.** That gap is semantics.", + "animation": { "fragment": true, "type": "slide-up", "index": 0 }, + "position": { "area": "footer" } + } + ], + "speakerNotes": "- This is the hinge slide between structure and semantics.\n\n- The sidecar answers acquisition questions: sampling rate, reference, line frequency, channel counts. BIDS handles this well.\n\n- events.tsv gives onset, duration, and a value. It places events on the timeline but does not say what they meant.\n\n- The green arrow is the bridge: to say what, you need HED." + }, + { + "id": "events-thin", + "layout": "single-column", + "transition": "slide", + "elements": [ + { + "type": "text", + "content": "## events.tsv Is Thin", + "style": { "fontSize": "xl" }, + "position": { "area": "header" } + }, + { + "type": "image", + "src": "../../assets/icons/events-thin.svg", + "alt": "Left: a thin events.tsv with onset, duration, value columns and rows whose value is a cryptic numeric code (12, 14, 12, 13). Right: a red 'what's missing' list (stimulus content, modality, condition, participant response, trial context). A note: HBN originally shipped numeric event codes; step one was replacing them with meaningful strings.", + "width": "90%", + "position": { "area": "content" } + }, + { + "type": "callout", + "calloutType": "warning", + "content": "An onset and a cryptic code. 
Stimulus, modality, condition, response, context -- all real, all recorded, **none of it in the shared file.** The meaning is not lost; it just never reaches re-users.", + "animation": { "fragment": true, "type": "slide-up", "index": 0 }, + "position": { "area": "footer" } + } + ], + "speakerNotes": "- The one-column problem. A numeric code is meaningless to anyone outside the lab.\n\n- Everything a re-analysis needs (what was on screen, was it visual or audio, which condition, did the subject respond) exists in the raw records but is not in events.tsv.\n\n- For HBN-EEG, the very first curation step was replacing numeric codes with meaningful strings, then annotating with HED.\n\n- Set up the fix: HED carries this meaning in a sidecar without touching the timeseries." + }, + { + "id": "hed-principle", + "layout": "single-column", + "transition": "slide", + "elements": [ + { + "type": "text", + "content": "## HED -- the Fix in Principle", + "style": { "fontSize": "xl" }, + "position": { "area": "header" } + }, + { + "type": "image", + "src": "../../assets/icons/hed-anatomy.svg", + "alt": "Three zones. Top: one HED tag shown as a comma-separated path through the schema (Sensory-event, Visual-presentation, foreground disk with high contrast, uniform background, 25 Hz). Bottom-left: an inheritance tree Action to Move to Move-body-part to Move-upper-extremity to Press, noting analysis works at the leaf or any ancestor. Bottom-right: the sidecar pattern, an unchanged events.tsv beside an events.json mapping value codes to HED tag strings.", + "width": "92%", + "position": { "area": "content" } + }, + { + "type": "callout", + "calloutType": "note", + "content": "Controlled, composable, validatable. 
The hierarchy means analysis works at any level, and `events.tsv` stays unchanged: **all meaning lives in `events.json`** under HED keys.", + "animation": { "fragment": true, "type": "slide-up", "index": 0 }, + "position": { "area": "footer" } + } + ], + "speakerNotes": "- HED = Hierarchical Event Descriptors. A controlled vocabulary that is composable and machine-validatable.\n\n- One tag is a path through the schema, and tags combine with commas into an annotation; the hierarchy carries meaning, so you can analyze at the leaf (Press) or at any ancestor (Move).\n\n- The sidecar pattern: events.tsv does not change, so existing analyses keep working; the semantics live in events.json.\n\n- Three stated objectives from the paper: build event context; machine-readable and human-understandable annotation for mega-analysis and ML; task transparency and reproducibility." + }, + { + "id": "recreate-the-stimulus", + "layout": "single-column", + "transition": "slide", + "elements": [ + { + "type": "text", + "content": "## The Bar: Recreate the Stimulus", + "style": { "fontSize": "xl" }, + "position": { "area": "header" } + }, + { + "type": "image", + "src": "../../assets/icons/recreate-the-stimulus-figure.svg", + "alt": "HBN-EEG Figure 9, annotated. A banner notes the model was given the HED annotation only -- no image, no example. Two panels: left, the intended Surround Suppression stimulus (four horizontal-grating disks on a vertical-grating background with central fixation); right, Claude Sonnet 3.5's regeneration from the HED alone.
A green callout under the left panel lists what was reproduced (four disks, gratings, background, fixation, contrast); an amber callout under the right panel notes the only miss: the disks' size and position, which were both left out because they are awkward to express in HED.", + "width": "82%", + "position": { "area": "content" } + }, + { + "type": "callout", + "calloutType": "important", + "content": "HBN-EEG (Healthy Brain Network EEG), Figure 9: Claude redrew the Surround Suppression stimulus from the **HED annotation alone.** If a model can rebuild your stimulus from the annotation, it is complete; if it can't, you left something out. *(Shirazi et al., 2024)*", + "animation": { "fragment": true, "type": "slide-up", "index": 0 }, + "position": { "area": "footer" } + } + ], + "speakerNotes": "- This is the slide trainees will quote. The bar made concrete.\n\n- In the HBN-EEG paper, the HED annotation of the Surround Suppression task was handed to a language model with no image, and it regenerated the visual stimulus as SVG, accurate down to foreground/background contrast and flicker; the only miss was the disks' size and position, which were never in the annotation.\n\n- That is the test: annotation completeness = an LLM can reconstruct the stimulus. The same trick drew several of the paper's figures (a callback to last week's figures session).\n\n- Walk the annotations: everything structural was reproduced from the HED alone (the gratings, the vertical-grating background, central fixation, the contrast relationship, four disks present). The only miss is the disks' size and position: both are awkward to express in HED, so they were left out, and the model had no way to reproduce them. That miss is the proof the test is honest, and a real lesson: HED nails event semantics, but spatial geometry (size, position) is hard to encode.\n\n- Cite: Shirazi et al. (2024), bioRxiv 10.1101/2024.10.03.615261."
+ }, + { + "id": "hed-workflow-pain", + "layout": "single-column", + "transition": "slide", + "elements": [ + { + "type": "text", + "content": "## Why HED Workflows Stall", + "style": { "fontSize": "xl" }, + "position": { "area": "header" } + }, + { + "type": "image", + "src": "../../assets/icons/hed-workflow-pain.svg", + "alt": "A four-step pain pipeline: read the paper (hours per session), learn the schema (about 2000 tags, expert-only), write the sidecar (value levels and value slots, error-prone), validate and repeat (cryptic validator messages). A red bottom strip reads: HED adoption stays inside the labs that build HED.", + "width": "92%", + "position": { "area": "content" } + }, + { + "type": "callout", + "calloutType": "warning", + "content": "About 2000 tags, expert-only fluency, a validator with cryptic messages. **A workflow problem, not a willingness problem** -- and where HEDit comes in.", + "animation": { "fragment": true, "type": "slide-up", "index": 0 }, + "position": { "area": "footer" } + } + ], + "speakerNotes": "- Frame the gap honestly: HED is powerful, but writing it by hand is a wall for most labs.\n\n- Reading the paper, learning a 2000-tag schema, hand-writing value-level sidecars, looping on a terse validator. The labs that author the schema can do this fluently; everyone else stops.\n\n- This is the adoption bottleneck the next slide addresses." 
+ }, + { + "id": "hedit", + "layout": "single-column", + "transition": "slide", + "elements": [ + { + "type": "text", + "content": "## HEDit -- Describe in English, Get Validated HED", + "style": { "fontSize": "xl" }, + "position": { "area": "header" } + }, + { + "type": "image", + "src": "../../assets/icons/hedit-pipeline.svg", + "alt": "A three-stage pipeline: Parser (natural language to structured facts: action, body-part, direction, magnitude, unit), Tagger (retrieve HED nodes via RAG over the schema, compose the tag string), Validator (official HED validator: tag exists, units valid, value slot well-formed), with a red dashed feedback loop from Validator back to Tagger labelled re-tag with validator feedback. A bottom rail: the HED schema is the contract; no agent invents vocabulary.", + "width": "92%", + "position": { "area": "content" } + }, + { + "type": "callout", + "calloutType": "tip", + "content": "You write one rich prose description per event; HEDit returns BIDS-compliant HED with the **official validator in the loop.** Tuned for exactly the detail the recreate-the-stimulus bar demands -- garbage in, garbage out.", + "animation": { "fragment": true, "type": "slide-up", "index": 0 }, + "position": { "area": "footer" } + } + ], + "speakerNotes": "- HEDit turns the wall into a paragraph. You describe the event in plain English; it composes and validates the HED.\n\n- LangGraph multi-agent: Parser extracts structured facts, Tagger composes from the schema (RAG over node names and definitions), Validator runs the official HED validator and loops back on failure.\n\n- The schema is the contract: no agent invents vocabulary.\n\n- The link to the bar: HEDit is only as good as the description. The more completely you describe the stimulus, the closer the annotation gets to recreatable." 
+ }, + { + "id": "plugin-map", + "layout": "single-column", + "transition": "slide", + "elements": [ + { + "type": "text", + "content": "## The neuroinformatics Plugin -- 2 Skills + 1 Agent", + "style": { "fontSize": "xl" }, + "position": { "area": "header" } + }, + { + "type": "image", + "src": "../../assets/icons/neuro-plugin-map.svg", + "alt": "A plugin map: a central blue card /neuroinformatics:bids-conversion (guided BIDS conversion), a magenta bids-validator agent card with an arrow into the center (autonomous validate and fix, this week's mechanical defence), and a dimmed secondary card /neuroinformatics:experiment-design (PsychoPy + Lab Streaming Layer, data collection, not today's focus). A bottom strip: HED annotation lives inside the skills.", + "width": "88%", + "position": { "area": "content" } + }, + { + "type": "callout", + "calloutType": "note", + "content": "Today's focus: **`/neuroinformatics:bids-conversion`** and the **`bids-validator`** agent. The `experiment-design` skill (PsychoPy + LSL, the collection side) ships in the plugin too.", + "animation": { "fragment": true, "type": "slide-up", "index": 0 }, + "position": { "area": "footer" } + } + ], + "speakerNotes": "- The plugin map. Two user-facing skills plus one autonomous agent.\n\n- bids-conversion (center): guided conversion to BIDS. bids-validator (agent): autonomous validation and fixes. experiment-design (dimmed): the data-collection side, PsychoPy and Lab Streaming Layer, in the README but not today.\n\n- HED annotation is woven into the skills rather than a separate command.\n\n- Next two slides: one per today's-focus skill/agent." 
+ }, + { + "id": "bids-conversion", + "layout": "single-column", + "transition": "slide", + "elements": [ + { + "type": "text", + "content": "## /neuroinformatics:bids-conversion", + "style": { "fontSize": "xl" }, + "position": { "area": "header" } + }, + { + "type": "image", + "src": "../../assets/icons/bids-conversion-flow.svg", + "alt": "A guided six-step conversion flow: 1 Inventory source data (formats, subjects, channels), 2 Scaffold (dataset_description.json, participants.tsv), 3 Convert files (BrainVision, EEGLAB .set, EDF, BDF), 4 JSON sidecars (SamplingFrequency, EEGReference), 5 TSV tables (channels, events, electrodes), 6 Validate (bids-validator). Bottom strip: modalities EEG, EMG, MEG, fMRI, behavioral.", + "width": "94%", + "position": { "area": "content" } + }, + { + "type": "callout", + "calloutType": "tip", + "content": "A guided six-step workflow that ends where the next act begins: **validation.** Handles EEG, EMG, MEG, fMRI, and behavioral data from the common source formats.", + "animation": { "fragment": true, "type": "slide-up", "index": 0 }, + "position": { "area": "footer" } + } + ], + "speakerNotes": "- The conversion skill walks you from raw files to a valid BIDS dataset in six steps.\n\n- Inventory, scaffold the dataset (dataset_description.json, participants.tsv), convert the files, write JSON sidecars, write the TSV tables, then validate.\n\n- Source formats: BrainVision, EEGLAB .set, EDF, BDF; modalities EEG, EMG, MEG, fMRI, behavioral.\n\n- The last step is validation, which is the next slide's agent." 
+ }, + { + "id": "bids-validator-agent", + "layout": "single-column", + "transition": "slide", + "elements": [ + { + "type": "text", + "content": "## The bids-validator Agent -- the Mechanical Defence", + "style": { "fontSize": "xl" }, + "position": { "area": "header" } + }, + { + "type": "image", + "src": "../../assets/icons/bids-validator-agent.svg", + "alt": "Left: a magenta bids-validator agent card with a vertical loop, locate dataset, run BIDS validator, categorize errors warnings info, apply fixes with confirmation, re-validate, with a dashed loop-until-clean arrow. Right: a sample BIDS Validation Report with subjects and modalities, errors fixed (missing dataset_description.json, _eeg.json missing PowerLineFrequency added 60), remaining warnings, and Ready for submission YES in green. A note: fixes your data locally; nemar-cli validates again at the gate.", + "width": "92%", + "position": { "area": "content" } + }, + { + "type": "callout", + "calloutType": "important", + "content": "Runs the validator, categorizes errors vs warnings, fixes with confirmation, re-validates, and reports readiness. This week's `cite-the-card`: **a deterministic gate that turns 'looks fine' into pass/fail.**", + "animation": { "fragment": true, "type": "slide-up", "index": 0 }, + "position": { "area": "footer" } + } + ], + "speakerNotes": "- Every week has a mechanical defence; this is Week 9's.\n\n- The agent locates the dataset, runs the BIDS validator, sorts findings into errors (must-fix), warnings (should-fix), and info, applies fixes with your confirmation, re-validates, and reports whether the dataset is ready for OpenNeuro/NEMAR.\n\n- Division of labour: this agent fixes your data locally; nemar-cli validates again at the upload gate. Two checks, by design.\n\n- Parallels cite-the-card (Week 5) and validate_fonts.py (Week 8): convert a vague worry into a precise, falsifiable result." 
+ }, + { + "id": "openneuro", + "layout": "single-column", + "transition": "slide", + "elements": [ + { + "type": "text", + "content": "## Where BIDS Data Goes -- OpenNeuro", + "style": { "fontSize": "xl" }, + "position": { "area": "header" } + }, + { + "type": "image", + "src": "../../assets/icons/openneuro-flow.svg", + "alt": "A three-step flow: Validate (BIDS), Upload (browser or CLI), Public plus DOI. An amber honest-note card below reads: private upload exists but only via CLI or direct push, with no polished GUI, and the DOI record stays sparse with no ORCID author links and minimal metadata.", + "width": "88%", + "position": { "area": "content" } + }, + { + "type": "callout", + "calloutType": "note", + "content": "The de-facto open BIDS archive: validated on ingest, public, and DOI-minted. **Honest caveat:** private upload exists, but only via CLI / direct push (no polished GUI), and the DOI record stays sparse -- no ORCID author links.", + "animation": { "fragment": true, "type": "slide-up", "index": 0 }, + "position": { "area": "footer" } + } + ], + "speakerNotes": "- OpenNeuro is the default home for shared neuro data and a genuinely great resource.\n\n- Validate, upload (browser or CLI), and the dataset is public with a DOI.\n\n- Two honest caveats that set up NEMAR: private upload is possible but only via CLI / direct push (no polished GUI for it), and the DOI record does not link authors to their ORCID iDs yet (we prove this with live data in a few slides).\n\n- Not a knock on OpenNeuro; a gap that NEMAR fills." + }, + { + "id": "nemar", + "layout": "single-column", + "transition": "slide", + "elements": [ + { + "type": "text", + "content": "## NEMAR -- EEG/MEG Focus, Compute Adjacency", + "style": { "fontSize": "xl" }, + "position": { "area": "header" } + }, + { + "type": "image", + "src": "../../assets/icons/openneuro-nemar.svg", + "alt": "Two cards sharing a central BIDS chip. 
Left: OpenNeuro, a broad open neuro archive, all modalities, public, DataCite DOI. Right: NEMAR, EEG/MEG focus, BIDS, next to San Diego Supercomputer Center compute so you can analyze without downloading, and GitHub-backed via nemar-cli so you are admin of your dataset and can invite collaborators. A note: HBN-EEG lives on both.", + "width": "88%", + "position": { "area": "content" } + }, + { + "type": "callout", + "calloutType": "tip", + "content": "The Neuroelectromagnetic Data Archive and Tools Resource: BIDS datasets specialized for EEG/MEG, **sitting next to SDSC compute** so you can analyze without downloading. HBN-EEG lives on both.", + "animation": { "fragment": true, "type": "slide-up", "index": 0 }, + "position": { "area": "footer" } + } + ], + "speakerNotes": "- NEMAR = Neuroelectromagnetic Data Archive and Tools Resource. Both archives speak BIDS; the difference is focus and features.\n\n- NEMAR specializes in EEG/MEG and sits next to San Diego Supercomputer Center compute, so analysis can run where the data is.\n\n- HBN-EEG is on both OpenNeuro and NEMAR, which is exactly what lets us compare their DOI records head-to-head shortly.\n\n- Next: why uploading to NEMAR via nemar-cli is now easy." + }, + { + "id": "nemar-validate", + "layout": "single-column", + "transition": "slide", + "elements": [ + { + "type": "text", + "content": "## nemar-cli -- Validation Is Now Trivial", + "style": { "fontSize": "xl" }, + "position": { "area": "header" } + }, + { + "type": "image", + "src": "../../assets/icons/nemar-validate.svg", + "alt": "A dark terminal card running `nemar dataset validate ./hbn-eeg` with output ending in a green line, BIDS valid -- 0 errors, 2 warnings, and a green TRIVIAL badge. 
Three side notes: wraps the official BIDS validator (Deno); also runs automatically on upload and on every update PR; no separate toolchain to install.", + "width": "90%", + "position": { "area": "content" } + }, + { + "type": "callout", + "calloutType": "tip", + "content": "`nemar dataset validate ./ds` wraps the official BIDS validator, and it **also runs automatically on upload and on every update PR.** No separate toolchain to install or configure.", + "animation": { "fragment": true, "type": "slide-up", "index": 0 }, + "position": { "area": "footer" } + } + ], + "speakerNotes": "- The first nemar-cli headline: validation is one command.\n\n- nemar dataset validate ./ds wraps the official Deno-based BIDS validator. Nothing else to install.\n\n- It is not a one-off: validation runs automatically when you upload and on every update pull request, so a dataset cannot drift out of spec.\n\n- This is the gate that complements the plugin's bids-validator agent: fix locally, then the platform checks again." + }, + { + "id": "nemar-upload-publish", + "layout": "single-column", + "transition": "slide", + "elements": [ + { + "type": "text", + "content": "## nemar-cli -- Upload to Publish", + "style": { "fontSize": "xl" }, + "position": { "area": "header" } + }, + { + "type": "image", + "src": "../../assets/icons/nemar-upload-publish.svg", + "alt": "A four-step command pipeline with a private-to-public toggle. Step 1 nemar auth login (one-time, token cached). Step 2 nemar dataset validate ./ds (BIDS check, must pass). Step 3 nemar dataset upload ./ds (private by default, lock icon). Step 4 nemar dataset publish request id (admin approves to public plus DOI, globe icon). 
A red bottom strip: nemar-cli creates a private GitHub repo where you are admin -- invite collaborators and push directly until publish; after publishing, changes go through pull requests and version tags.", + "width": "92%", + "position": { "area": "content" } + }, + { + "type": "callout", + "calloutType": "important", + "content": "`auth login` → `validate` → `upload` → `publish request`. Upload creates a **private GitHub repo where you're admin** -- invite collaborators and push directly until publish; after that, changes go via **PR + version tags**.", + "animation": { "fragment": true, "type": "slide-up", "index": 0 }, + "position": { "area": "footer" } + } + ], + "speakerNotes": "- The full path in four commands: log in once, validate, upload, request publication.\n\n- Upload creates a private GitHub repo where you're the admin; you can invite collaborators and push directly until you publish. On publish request, an admin approves and the dataset goes public with a DOI; after that, changes go through pull requests and version tags.\n\n- OpenNeuro also supports private upload, just CLI-only -- so the real NEMAR advantages are this collaboration model plus the rich DOI metadata (next two slides), not 'private vs public'.\n\n- The DOI minted at publish is where the next two slides go." + }, + { + "id": "doi-orcid", + "layout": "single-column", + "transition": "slide", + "elements": [ + { + "type": "text", + "content": "## DOI Minting + ORCID Auto-Link", + "style": { "fontSize": "xl" }, + "position": { "area": "header" } + }, + { + "type": "image", + "src": "../../assets/icons/doi-orcid.svg", + "alt": "A left-to-right flow: dataset authors each linked to an ORCID iD chip, into a DataCite DOI (via EZID) box noting a concept DOI plus per-version DOIs, then onto each author's ORCID record automatically (an ORCID profile card listing the dataset). 
An amber callout: OpenNeuro does not link authors to ORCID on the DOI yet.", + "width": "90%", + "position": { "area": "content" } + }, + { + "type": "callout", + "calloutType": "important", + "content": "On publish, nemar-cli mints a **concept DOI + per-version DOIs (DataCite via EZID)** and **auto-links every author's ORCID iD** in the metadata. Your dataset lands on your ORCID record automatically. OpenNeuro does not do this yet.", + "animation": { "fragment": true, "type": "slide-up", "index": 0 }, + "position": { "area": "footer" } + } + ], + "speakerNotes": "- The differentiator, as a mechanism.\n\n- At publish, nemar-cli mints a concept DOI (one stable citation across all versions) plus per-version DOIs, via EZID writing DataCite kernel-4 metadata.\n\n- Crucially, it auto-collects and embeds every author's ORCID iD, so the dataset appears on each author's ORCID record with no manual 'add work'.\n\n- OpenNeuro does not link authors to ORCID on the DOI yet. The next slide proves this on a real dataset." + }, + { + "id": "metadata-gap", + "layout": "single-column", + "transition": "slide", + "elements": [ + { + "type": "text", + "content": "## The Metadata Gap -- Same Dataset, Two Homes", + "style": { "fontSize": "xl" }, + "position": { "area": "header" } + }, + { + "type": "image", + "src": "../../assets/icons/doi-metadata-gap.svg", + "alt": "A side-by-side DataCite comparison of the same dataset on two homes: NEMAR nm000103 vs OpenNeuro ds005505, HBN-EEG Release 1, the same eight authors. Findability rows: license (CC-BY-NC-SA-4.0 vs none), keywords (8 vs 0), description (yes vs none), links to papers and related datasets (5 vs 0). Credit rows: authors linked to ORCID (8 of 8 vs 0 of 8, highlighted), stable concept DOI and funding (yes and 2 vs version-only and 0). A strip: OpenNeuro's DOI record carries only a title and author names; everything else is blank. 
Source api.datacite.org.", + "width": "92%", + "position": { "area": "content" } + }, + { + "type": "callout", + "calloutType": "warning", + "content": "Live DataCite records, same data, same 8 authors. OpenNeuro's DOI carries **only a title and author names**; NEMAR fills license, keywords, description, related links, **8/8 ORCID**, and funding. **Findable and citable is metadata, not luck.**", + "animation": { "fragment": true, "type": "slide-up", "index": 0 }, + "position": { "area": "footer" } + } + ], + "speakerNotes": "- The proof, on the audience's own dataset. This is the strongest slide in the deck because it is real, current data.\n\n- Both DOIs describe HBN-EEG Release 1 with the same eight authors. Pulled live from api.datacite.org.\n\n- OpenNeuro's record: title and author names, nothing else. No license, no keywords, no description, no related links, 0/8 ORCID, no funding.\n\n- NEMAR's record: CC-BY-NC-SA-4.0, 8 keywords, abstract + citation, 5 related links (papers, GitHub, Zenodo archive), 8/8 ORCID, 2 funders, and a stable concept DOI.\n\n- Findability (the F in FAIR) and credit are fields the platform writes. nemar-cli writes them; on OpenNeuro the same dataset stays bare." + }, + { + "id": "demo-roadmap", + "layout": "single-column", + "transition": "slide", + "elements": [ + { + "type": "text", + "content": "## Live Demo -- Small and Honest", + "style": { "fontSize": "xl" }, + "position": { "area": "header" } + }, + { + "type": "image", + "src": "../../assets/icons/demo-roadmap-neuro.svg", + "alt": "Two demo steps with timing badges. Step 1 (about 2 minutes), HEDit: one rich prose description of an HBN event becomes a validated HED string, captioned the recreate-the-stimulus bar in miniature. Step 2 (about 2 minutes), a dark terminal running nemar dataset validate on the HBN practicum dataset producing a clean BIDS report. 
A tail note: we do not manufacture a pass.", + "width": "90%", + "position": { "area": "content" } + }, + { + "type": "callout", + "calloutType": "note", + "content": "Two actions: (1) **HEDit** turns one prose description into validated HED; (2) **`nemar dataset validate`** on the HBN dataset returns a clean BIDS report. **We do not manufacture a pass.**", + "animation": { "fragment": true, "type": "slide-up", "index": 0 }, + "position": { "area": "footer" } + } + ], + "speakerNotes": "- Two small, honest live actions, ~4 minutes total.\n\n- Step 1: HEDit. Write one rich prose description of an HBN event and watch it become a validated HED string. The recreate-the-stimulus bar in miniature: the richer the description, the better the tag.\n\n- Step 2: nemar dataset validate on the HBN practicum dataset, returning a clean BIDS report.\n\n- If validation surfaces something, we walk it; we do not manufacture a pass." + }, + { + "id": "what-next", + "layout": "two-column", + "transition": "slide", + "elements": [ + { + "type": "text", + "content": "## What Today Gives You / What's Next", + "style": { "fontSize": "xl" }, + "position": { "area": "header" } + }, + { + "type": "bullets", + "items": [ + { "text": "**BIDS for structure, HED for semantics**, and a concrete bar: an LLM can recreate the stimulus from the annotation alone.", "animation": { "fragment": true, "type": "fade", "index": 0 } }, + { "text": "**The neuroinformatics plugin**: `bids-conversion` + the `bids-validator` agent (the mechanical defence).", "animation": { "fragment": true, "type": "fade", "index": 1 } }, + { "text": "**Sharing with credit**: nemar-cli's trivial validation, private-until-ready upload, and DOIs with ORCID auto-link.", "animation": { "fragment": true, "type": "fade", "index": 2 } }, + { "text": "**The loop closes:** the HBN data you've analyzed since Week 3 is itself a BIDS + HED dataset, shared this exact way.", "animation": { "fragment": true, "type": "slide-up", "index": 3 } } + 
], + "bulletStyle": "disc", + "style": { "fontSize": "large" }, + "position": { "area": "left" } + }, + { + "type": "callout", + "calloutType": "tip", + "content": "**Next week -- Week 10: build your own plugin.** The last research-workflow plugin of the course; next you make one.\n\n**Questions? Ask while the demo runs.**", + "animation": { "fragment": true, "type": "slide-up", "index": 4 }, + "position": { "area": "right" } + } + ], + "speakerNotes": "[Press right 5x to reveal fragments]\n\n- Land the takeaways one at a time: two standards + the bar; the plugin; sharing with credit.\n\n- Close the loop: the practicum dataset is itself a BIDS + HED dataset on OpenNeuro and NEMAR, shared with these exact tools.\n\n- Next week is Week 10: building your own plugin, the capstone.\n\n- Open the floor: questions while the demo runs." + } + ] + } +} diff --git a/presentations/week-09/slide-plan.md b/presentations/week-09/slide-plan.md new file mode 100644 index 0000000..e896b32 --- /dev/null +++ b/presentations/week-09/slide-plan.md @@ -0,0 +1,129 @@ +# Week 9 Slide Plan -- Neuroinformatics: Standards, Sharing, and Credit + +## Target: 23 slides, ~30 min presentation, then ~5 min live demo + ~15 min Q&A + +**Core message.** A finished analysis is not a finished contribution. The data behind it has to be reproducible, shareable, and citable, or the work dies with the paper. Two standards carry the weight: **Brain Imaging Data Structure (BIDS)** answers *where everything lives* (structure), and **Hierarchical Event Descriptors (HED)** answers *what every event meant* (semantics). 
The single most useful idea this session is that the bar for a complete annotation is concrete and falsifiable: **a language model should be able to reconstruct the stimulus, or the experiment, from the annotation alone.** This is not a metaphor; it is exactly the test demonstrated in the HBN-EEG paper (Shirazi et al., 2024, Figure 9), where Claude 3.5 Sonnet regenerated the Surround Suppression stimulus as SVG using only its HED description. The `neuroinformatics` plugin gets data to that bar; `HEDit` automates the hardest leg (natural language to validated HED); `nemar-cli` ships it with automatic ORCID author linking on the DOI, the one feature OpenNeuro still lacks.
+
+The arc mirrors prior weeks: a clear failure mode up front, a standard that fixes it, a mechanical defence per stage (the `bids-validator` agent; the HED validator in HEDit's loop; validation-on-upload in nemar-cli), and every theoretical point landing on the HBN practicum dataset, which is itself a BIDS + HED dataset published on OpenNeuro and NEMAR with these exact tools.
+
+## Definitions on first use
+- **BIDS** -- Brain Imaging Data Structure.
+- **HED** -- Hierarchical Event Descriptors.
+- **DOI** -- Digital Object Identifier.
+- **ORCID** -- Open Researcher and Contributor ID.
+- **HBN** -- Healthy Brain Network (the practicum dataset).
+
+## Slide Inventory
+
+### Opening (2 slides, ~2 min)
+
+1. **Title** -- "Week 9: Neuroinformatics -- Standards, Sharing, and Credit." Author block, course / Discord / recording links. Same title-slide pattern as Weeks 5-8.
+2. **Where we are** -- One bullet per Weeks 1-8, paired by theme, landing on: you have an analysis (and figures, from last week); today is about the *data* underneath it: reproducible, shareable, citable. Fragment-animated bullets, final bullet `slide-up`.
+
+### Act 1 -- The gap (2 slides, ~3 min)
+
+3. 
**The forensic-search problem** -- A great analysis nobody can rerun, an `events.tsv` that is one cryptic numeric column, and reuse that traces credit to no one. Three locks: structure, semantics, credit. Asset: `reuse-credit-gap.svg` (a rich raw recording collapsing into a thin shared artifact, with three padlocks labelled Structure / Semantics / Credit). Callout (paper phrase): *"analysis-ready means no forensic search for unreported details."*
+4. **Two standards and a bar for clarity** -- BIDS answers *where* (structure); HED answers *what* (semantics). The bar that judges both: someone, or a language model, can reconstruct your experiment without emailing you. Asset: `two-standards-bar.svg` (left: BIDS = structure/filesystem; right: HED = semantics/meaning; a banner across the top: "the bar -- an LLM can recreate the stimulus from the annotation alone"). This plants the bar early; it pays off on slide 10.
+
+### Act 2 -- BIDS, the structure standard (3 slides, ~5 min)
+
+5. **What is BIDS** -- A filesystem convention plus metadata: `sub-