Skip to content

AustralianCancerDataNetwork/StudyAgent

 
 

Repository files navigation

OHDSI Study Design Assistant (in development)

Overview

The goal OHDSI Study Design Assistant (SDA) is to provide an experience similar to working with a coding agent but for designing and executing observational retrospective studies using OHDSI tools. SDA is designed to organize and enable users to interact with a wide variety of agentic tools to suppor their study work. It does so by providing a clean separation between the agentic user experience and the generative AI tools. Check out the tag first_agent_and_strategus for the first version to assist with Strategus (not validated) as shown in the more recent video for the second version (no sound). This demonstrates a possible way for the agent to help the user design, run, and interpret the results of an OHDSI incidence rate analysis using the CohortIncidenceModule of OHDSI Strategus. This older video shows an prior test of this concept.

Want to contribute?

Here are some ways:

  • Create a fork of the project, branch the new project's main branch, edit the README.md and do a pull request back this main branch. Your changes could be integrated very quickly that way!
  • Join the discussion on the OHDSI Forums
  • Attend the Generative AI WG monthly calls (currently 2nd Tuesdays of the month at 12 Eastern) or reach out directly to Rich Boyce on the OHDSI Teams or the OHDSI forums.
  • You may also post "question" issues on this repo.

Roadmap

Near term

  • data_quality_interpretation : study agent provides interpretation from Data Quality Dashboard, Achilles Heel data quality checks, and Achilles data source characterizations over one or more sources that a user intends to use within a study. In this mode, the study agent derive insights from those sources based on the user's study intent. This is important because it will make the information in the characterizations and QC reports more relevant and actionable to users than static and broad-scope reports (current state). Users will use this tool from R initially.

  • create_new_phenotype_definition : Study agent will guide the user through the creation of a definition for an EHR phenotype for the target or outcome cohort relevant to their study intent. This workflow involves selection of concepts, organization of concepts into concept sets, and assembly into cohort definition logic. In addition to concept retrieval, the agent will support reasoning over the semantic relationships encoded in the OMOP vocabulary system (via identity, hierarchical, compositional, associative and attribute links) to help users identify appropriate inclusions, exclusions, and boundary conditions. This enables deterministic validation of constructed concept sets, supports principled disambiguation of similar concepts during grounding, and provides traceable justification for why specific concepts or groups are included in a phenotype definition. Users will use this tool from R or Atlas initially.

  • keeper_design_sample : Study agent helps the user to create the createKeeper function to pull cases matching a clinical definition. This will guide the user through building the set of symptoms, related differential diagnoses (those that need to be ruled out), diagnostic procedures, complications, exposures, and measurements for the clinical definition.

Long term

Build out the entire set of planned services, each one evaluated and user-tested.

Design

  • An Agent Client Protocol (ACP) server that owns interaction policy: confirmations, safe summaries, and tool invocation routing.

    • acp_agent/: interaction policy + routing; calls MCP tools or falls back to core.
  • Multiple MCP servers that own tool contracts: JSON schemas + deterministic tool outputs.

    • mcp_server/: exposes tool APIs (core tools plus phenotype retrieval and prompt bundles).
  • Core logic stays pure and reusable across both ACP and MCP layers.

    • core/: pure, deterministic business logic (no IO, no network).

Why this architecture matters

ACP provides consistent UX and control across environments (R, Atlas/WebAPI, notebooks), while MCP provides a shared tool bus that can be reused across agents and institutions. ACP orchestrates tool calls and LLM calls; MCP owns retrieval, prompt assets, and deterministic tool outputs. This enables the same core tools can be accessed via MCP or directly by ACP without coupling to datasets or local files.

NOTE: at no time for any of the services should an LLM see row-level data (this can be accomplished through the careful use of protocols (MCP for tooling, Agent Client Protocol for OHDSI tool <-> LLM communication) and a security layer).

What is implemented so far?

Current unit tests

See docs/TESTING.md for install and CLI smoke tests.

phenotype_recommendation flow (ACP + MCP + LLM)

  1. ACP calls MCP phenotype_search to retrieve candidates.
  2. ACP calls MCP phenotype_prompt_bundle to fetch prompt assets and output schema.
  3. ACP calls an OpenAI-compatible LLM API to rank candidates.
  4. Core validates and filters LLM output.

For details on the design, see docs/PHENOTYPE_RECOMMENDATION_DESIGN.md.

phenotype_improvements flow (ACP + MCP + LLM)

  1. ACP calls MCP phenotype_prompt_bundle for improvement prompts.
  2. ACP calls an OpenAI-compatible LLM API for improvement suggestions.
  3. ACP calls MCP phenotype_improvements with LLM output for validation.

This flow reviews one phenotype definition at a time. If multiple cohorts are provided, ACP uses the first.

concept-sets-review flow (ACP + MCP + LLM)

  1. ACP calls MCP lint_prompt_bundle for lint prompts.
  2. ACP calls an OpenAI-compatible LLM API for findings/patches/actions.
  3. ACP calls MCP propose_concept_set_diff with LLM output for validation.

cohort-critique-general-design flow (ACP + MCP + LLM)

  1. ACP calls MCP phenotype_prompt_bundle for cohort critique prompts.
  2. ACP calls an OpenAI-compatible LLM API for findings/patches.
  3. ACP calls MCP cohort_lint with LLM output for validation.

phenotype_validation_review flow (ACP + MCP + LLM)

  1. ACP calls MCP keeper_sanitize_row to remove PHI/PII (fail-closed).
  2. ACP calls MCP keeper_prompt_bundle and keeper_build_prompt for a sanitized patient prompt.
  3. ACP calls an OpenAI-compatible LLM API to review the patient summary.
  4. ACP calls MCP keeper_parse_response to normalize the label.

LLM requests never include row-level PHI/PII; only sanitized summaries are sent.

For details on PHI/PII handling, see docs/PHENOTYPE_VALIDATION_REVIEW.md.

phenotype_recommendation_advice flow (ACP + MCP + LLM)

  1. ACP calls MCP phenotype_recommendation_advice for advisory prompt assets and schema.
  2. ACP calls an OpenAI-compatible LLM API to return actionable guidance.
  3. Core validates the advisory output.

This flow is used as a fallback when users do not accept initial recommendations.

Strategus incidence shell (R)

The interactive Strategus shell orchestrates phenotype selection, improvements, and script generation for a CohortIncidence study. See docs/STRATEGUS_SHELL.md.

Service Registry

Service definitions live in docs/SERVICE_REGISTRY.yaml. ACP exposes a /services endpoint that reports registry entries plus any additional ACP-implemented services. You can list services quickly with doit list_services.

Example run for phenotype_recommendation

Prerequisite: you have embedded phenotype definitions - see ./docs/PHENOTYPE_INDEXING.md

  1. Start the ACP server (runs on http://127.0.0.1:8765/ by default):
export LLM_API_KEY=<YOUR KEY>
export LLM_API_URL="<URL BASE>/api/chat/completions"
export LLM_LOG=1
export LLM_MODEL=<a model that supports completions> 
export EMBED_API_KEY=<YOUR KEY>
export EMBED_MODEL=<a text embedding model>
export EMBED_URL="<URL BASE>/v1/embeddings"
export PHENOTYPE_INDEX_DIR="<ABSOLUTE PATH TO phenotype_index>"
export STUDY_AGENT_MCP_CWD="<REPO ROOT (optional, for stable relative paths)>"
export STUDY_AGENT_HOST=127.0.0.1
export STUDY_AGENT_PORT=8765
export STUDY_AGENT_MCP_COMMAND=study-agent-mcp
export STUDY_AGENT_MCP_ARGS=""
study-agent-acp

Note: This starts MCP via stdio. If you use MCP over HTTP, do not set STUDY_AGENT_MCP_COMMAND. Note: Prefer stopping the ACP process (SIGINT/SIGTERM) so the MCP subprocess is closed cleanly. Killing the MCP directly can leave defunct processes. Note: ACP uses a threaded HTTP server by default. Set STUDY_AGENT_THREADING=0 to disable threading. Note: /health includes MCP preflight details under mcp_index when MCP is configured. Troubleshooting: run python mcp_server/scripts/mcp_probe.py to verify index paths and search without ACP.

MCP over HTTP (recommended for cross-platform stability)

Start MCP as a separate HTTP service:

export MCP_TRANSPORT=http
export MCP_HOST=127.0.0.1
export MCP_PORT=8790
export MCP_PATH=/mcp
study-agent-mcp

Then point ACP at it:

export STUDY_AGENT_MCP_URL="http://127.0.0.1:8790/mcp"
study-agent-acp

Note: STUDY_AGENT_MCP_URL must include the port (e.g. :8790). When set, ACP uses HTTP and ignores STUDY_AGENT_MCP_COMMAND.

PowerShell (Windows) quickstart:

$env:MCP_TRANSPORT = "http"
$env:MCP_HOST = "127.0.0.1"
$env:MCP_PORT = "8790"
$env:MCP_PATH = "/mcp"
study-agent-mcp
$env:STUDY_AGENT_MCP_URL = "http://127.0.0.1:8790/mcp"
study-agent-acp
  1. Run phenotype_recommendation
curl -s -X POST http://127.0.0.1:8765/flows/phenotype_recommendation \
  -H 'Content-Type: application/json' \
  -d '{"study_intent":"Identify clinical risk factors for older adult patients who experience an adverse event of acute gastro-intenstinal (GI) bleeding", "top_k":20, "max_results":10,"candidate_limit":10}'

Planned Services

Below is a set of planned study agent services, organized by category. For each service, document the input, output, and validation approach.

High Level Conceptual

protocol_generator

Input: PICO/TAR for a study intent.
Output: Templated protocol.
Validation: Protocol completeness and consistency review.

background_writer

Input: PICO/TAR and hypothesis.
Output: Background document justifying the study (systematic research summary).
Validation: Source coverage and alignment with hypothesis.

protocol_critique

Input: Protocol.
Output: Critique reviewing required components and consistency.
Validation: Checklist of required components; coherence checks.

dag_create

Input: Protocol or study intent statement.
Output: Directed acyclic graph of known causal/associative relations (LLM + literature discovery).
Validation: Consistency with cited relations and domain plausibility.

explain_cohort_diagnostics

Input: The user's study intent statement and cohort diagnostics output including code to run and the results files
Output: narrative summary / report of the analysis.
Validation: Correctly reported summary of the methods and results.

explain_incidence/estimation/characterization_results

Input: The user's study intent statement and cohort diagnostics and a completed analysis with strategus output folders with code to run and the results files (incidence/estimation/characterization).
Output: narrative summary / report of the analysis.
Validation: Correctly reported summary of the methods and results.

High Level Operational

strategus_*

Input: Study specification intent or existing Strategus JSON.
Output: Composed/compared/edited/criticized/debugged Strategus JSON.
Validation: Schema validation and diff review.

Search and Suggest

phenotype_recommendations

Input: Study intent.
Output: Suggested phenotypes with cohort definition artifacts for user-accepted selections.
Validation: Allowed-id filtering; user confirmation before writes.

phenotype_improvements (or phenotype fit)

Input: Selected phenotypes + study intent.
Output: Improved cohort definitions or Atlas records for accepted changes.
Validation: Target cohort ID validation; user confirmation before writes.

concept_set_recommendations

Input: Phenotype/covariate intent lacking a cohort definition.
Output: Suggested concept sets and created concept set artifacts if accepted.
Validation: Concept set schema validation; user confirmation before writes.

propose_negative_control_outcomes

Input: Target (optionally comparator).
Output: Recommended negative control outcomes with cohort definitions if accepted.
Validation: Clinical plausibility check; user confirmation before writes.

propose_comparator

Input: Target.
Output: Proposed comparator cohort definition if accepted (optionally using OHDSI Comparator Selector).
Validation: Comparator appropriateness review; user confirmation before writes.

propose_adjustment_set

Input: Study intent + DAG.
Output: Adjustment set from OHDSI features plus suggested FeatureExtraction features.
Validation: Confounder/collider/mediator checks against DAG. E.g., showing the user if any known and biased collider that someone in another paper published might accidentally be including in their study design. See this JAMA article for more about colliders. Also, potentially using a knowledge graph of causal findings from the entire literature to informat the user of the same.

Study Component Testing, Improvement, and Linting

propose_concept_set_diff

Input: Concept set + study intent.
Output: Proposed patches to concept set artifacts if accepted.
Validation: Deterministic diff rules; user confirmation before writes.

phenotype_characterize

Input: Selected phenotype(s).
Output: R code (or Atlas services) to characterize populations.
Validation: Execution preview; user confirmation before running.

phenotype_data_quality_review

Input: Phenotype definitions + data quality sources (DQD, Achilles Heel, characterization).
Output: Mitigations and patches for accepted issues.
Validation: Issue traceability to data quality sources; user confirmation before writes.

phenotype_dataset_profiler

Input: Phenotype definition(s) + datasets.
Output: R code to run (e.g., Cohort Diagnostics) and a brief summary of drivers of cohort size variation.
Validation: Reproducible execution outputs; summary tied to diagnostics.

phenotype_validation_review

Input: Selected phenotype definition (usually for an outcome cohort) and a narrative clinical description with differential diagnoses and known associated factors for validation and to compare to known phenotype performance.
Output: code to extract sample cases based on the clinical description and LLM-assessment of a sample (user-specified or random) of cohort records stripped of PHI. Validation: Sampling logic review; user confirmation.

cohort_definition_build

Input: Phenotype/covariate intent without a cohort definition.
Output: Capr code for cohort definition.
Validation: Schema validation; user confirmation before writes.

cohort_definition_lint

Input: Cohort JSON.
Output: Proposed patches for design issues and execution efficiency.
Validation: Deterministic lint rules; user confirmation before writes.

review_negative_control

Input: Target + outcome.
Output: Judgement on causal implausibility with explanation and citations.
Validation: Citation review and domain plausibility.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 64.0%
  • R 36.0%