27 changes: 27 additions & 0 deletions README.md
@@ -4,6 +4,11 @@

## probably a mini oracle. definitely a mini chatbot

> **Current refactor:** Probaboracle is in a clean-baseline refactor. The
> Beta `6.0` pulse is preserved as a diagnostic snapshot, not active baseline
> proof. The next comparable evidence gate is a new fixed-prompt `eval-pulse`
> from the proper-config candidate.

Probaboracle is a small, local, agent-backed CLI mini chatbot using the **[Polinko research model](https://github.com/tryskian/polinko)**.

It only accepts four question types:
@@ -63,6 +68,24 @@ healthy.
- the clean-baseline candidate keeps the same pulse method before the next
beta-boundary decision

## Data Viz Direction

Probsie charts follow the eval shape first. Shared chart families are useful
only when the data shape naturally matches.

The initial visualisation set is:

- pulse charts:
  - stacked horizontal bars for `anchor`, `counted_seam`, and `excluded_noise`
  - grouped or faceted pulse comparison for snapshot versus clean baseline
- detail table:
  - row id, prompt, output, pulse label, reason, and seam note below the chart
- row and lens charts:
  - row-level `pass / fail / pending` stack by prompt type
  - prompt-by-lens table heatmap
  - fail-family horizontal bars
  - correction slope only when true before/after pairs exist
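
As a rough illustration of the pulse chart data shape, the sketch below turns labelled eval rows into stacked-bar segments. The three labels come from the list above; the row layout (a dict with a `pulse_label` key) and the `pulse_segments` helper are assumptions made for this example, not part of Probsie's actual code.

```python
from collections import Counter

# Labels taken from the pulse chart description; the ordering is an assumption.
PULSE_LABELS = ("anchor", "counted_seam", "excluded_noise")


def pulse_segments(rows):
    """Return one stacked-bar segment per pulse label as cumulative row counts.

    `rows` is a hypothetical list of dicts, each carrying a "pulse_label" key.
    Rows with labels outside PULSE_LABELS are ignored.
    """
    counts = Counter(row["pulse_label"] for row in rows)
    segments = []
    start = 0
    for label in PULSE_LABELS:
        end = start + counts[label]  # Counter returns 0 for a missing label
        segments.append({"label": label, "start": start, "end": end})
        start = end
    return segments
```

Calling `pulse_segments` once for the snapshot rows and once for the clean-baseline rows would give the two bars that a grouped or faceted comparison needs.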

## Run It

```sh
@@ -91,6 +114,10 @@ make check
  - beta map and research reading path
- [docs/research/070_CB-CLEAN_BASELINE_RESET.md](./docs/research/070_CB-CLEAN_BASELINE_RESET.md)
  - current reset boundary, docs cleanup, and first local pulse plan
- [docs/diagrams/EVAL_CHART.md](./docs/diagrams/EVAL_CHART.md)
  - current static eval chart contract
- [docs/diagrams/PIPELINE.md](./docs/diagrams/PIPELINE.md)
  - public generation and eval-shape diagrams
- [docs/runtime/templates/README.md](./docs/runtime/templates/README.md)
  - public templates for future research docs and pulse reports
- [docs/governance/DECISIONS.md](./docs/governance/DECISIONS.md)
32 changes: 32 additions & 0 deletions docs/governance/DECISIONS.md
@@ -1181,3 +1181,35 @@ If a decision crosses layers, say so plainly instead of flattening the method in
scratch material or older row-level wording. Putting the templates under
runtime docs keeps the process discoverable while preserving the public
research lane for actual findings.

## D-062: Probsie chart types follow eval shape first

- Date: `2026-05-25`
- Category: `eval_quality`, `workflow_environment`
- Tags: `data_viz`, `observable_plot`, `pulse_method`, `chart_types`
- Provenance: `human-led visualisation planning with Codex chart-type review`
- Decision:
  - choose chart types from Probsie's eval data shape before aligning with
    neighbouring toy repos
  - use shared chart families only when the data shape naturally matches:
    - bars for counts
    - stacked bars for part-to-whole eval labels
    - table heatmaps for matrix-style lens/status reads
    - slope charts only for real before/after correction pairs
  - keep the initial Probsie chart set focused on:
    - `eval-pulse-stack`: stacked horizontal bar chart
    - `pulse-comparison`: grouped or faceted stacked horizontal bars
    - `eval-detail-table`: detail table below the chart
    - `row-verdict-stack`: stacked bar chart by prompt type
    - `lens-table-heatmap`: table heatmap
    - `fail-family-bars`: horizontal bar chart
    - `correction-slope`: optional slope chart when correction pairs exist
  - treat small multiples as a layout choice inside comparison views, not as a
    standalone chart family
  - treat lollipop styling as an option inside `fail-family-bars`, not as a
    separate chart family
  - keep Sankey-style method storytelling out of the initial eval-data set
- Why: Probsie's data is primarily an eval-method surface: row-level verdicts,
  sidecar lenses, fixed-prompt pulse labels, and pulse-level verdicts. The
  chart set should make that method legible without forcing another toy's
  shape onto Probsie.
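
One way to read the "eval shape first" rule is as a small gate over the chart set. The sketch below is illustrative only: the chart ids and families are copied from this decision, while the `available_charts` helper and the `before`/`after` pair layout are hypothetical names introduced for the example.

```python
# Chart ids and families copied from D-062.
CHART_SET = {
    "eval-pulse-stack": "stacked horizontal bar chart",
    "pulse-comparison": "grouped or faceted stacked horizontal bars",
    "eval-detail-table": "detail table",
    "row-verdict-stack": "stacked bar chart by prompt type",
    "lens-table-heatmap": "table heatmap",
    "fail-family-bars": "horizontal bar chart",
    "correction-slope": "slope chart",
}


def available_charts(correction_pairs):
    """Return the chart ids to render for the current eval data.

    `correction_pairs` is a hypothetical list of dicts with "before" and
    "after" keys. The optional slope chart is included only when at least
    one pair has both values present.
    """
    has_pairs = any(
        pair.get("before") is not None and pair.get("after") is not None
        for pair in correction_pairs
    )
    return [
        chart_id
        for chart_id in CHART_SET
        if chart_id != "correction-slope" or has_pairs
    ]
```

With no correction pairs, the helper returns the six non-optional charts; a single complete pair is enough to add `correction-slope`.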