OHDSI Study Design Assistant (in development)

Overview

The goal of the OHDSI Study Design Assistant (SDA) is to provide an experience similar to working with a coding agent, but for designing and executing observational retrospective studies using OHDSI tools. SDA is designed to organize a wide variety of agentic tools and enable users to interact with them to support their study work. It does so by cleanly separating the agentic user experience from the generative AI tools. Check out the tag `first_agent_and_strategus` for the first version to assist with Strategus (not validated), and see the more recent video for the second version (no sound). It demonstrates one possible way for the agent to help the user design, run, and interpret the results of an OHDSI incidence rate analysis using the CohortIncidenceModule of OHDSI Strategus. An older video shows a prior test of this concept.

Want to contribute?

Here are some ways:

Fork the project, create a branch from your fork's main branch, edit README.md, and open a pull request back to this repository's main branch. Your changes could be integrated very quickly that way!
Join the discussion on the OHDSI Forums
Attend the Generative AI WG monthly calls (currently the 2nd Tuesday of the month at 12 Eastern), or reach out directly to Rich Boyce on OHDSI Teams or the OHDSI Forums.
You may also post "question" issues on this repo.

Roadmap

Near term

data_quality_interpretation : The study agent interprets the output of the Data Quality Dashboard, Achilles Heel data quality checks, and Achilles data source characterizations for one or more sources that a user intends to use in a study. In this mode, the study agent derives insights from those sources based on the user's study intent. This is important because it will make the information in the characterizations and QC reports more relevant and actionable to users than the static, broad-scope reports available today. Users will use this tool from R initially.
create_new_phenotype_definition : Study agent will guide the user through the creation of a definition for an EHR phenotype for the target or outcome cohort relevant to their study intent. This workflow involves selection of concepts, organization of concepts into concept sets, and assembly into cohort definition logic. In addition to concept retrieval, the agent will support reasoning over the semantic relationships encoded in the OMOP vocabulary system (via identity, hierarchical, compositional, associative and attribute links) to help users identify appropriate inclusions, exclusions, and boundary conditions. This enables deterministic validation of constructed concept sets, supports principled disambiguation of similar concepts during grounding, and provides traceable justification for why specific concepts or groups are included in a phenotype definition. Users will use this tool from R or Atlas initially.
keeper_design_sample : Study agent helps the user prepare the inputs for the createKeeper function to pull cases matching a clinical definition. This will guide the user through building the set of symptoms, related differential diagnoses (those that need to be ruled out), diagnostic procedures, complications, exposures, and measurements for the clinical definition.

Long term

Build out the entire set of planned services, each one evaluated and user-tested.

Design

An Agent Client Protocol (ACP) server that owns interaction policy: confirmations, safe summaries, and tool invocation routing.
- acp_agent/: interaction policy + routing; calls MCP tools or falls back to core.
Multiple MCP servers that own tool contracts: JSON schemas + deterministic tool outputs.
- mcp_server/: exposes tool APIs (core tools plus phenotype retrieval and prompt bundles).
Core logic stays pure and reusable across both ACP and MCP layers.
- core/: pure, deterministic business logic (no IO, no network).
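To make the layering concrete, here is a minimal sketch in which one pure core function is exposed as an MCP tool and called by ACP, with the documented fallback to core when MCP is unavailable. All function and tool names are hypothetical, and a real MCP server speaks JSON-RPC over stdio or HTTP rather than being called as a local function.

```python
from typing import Any, Callable, Dict, List, Optional


# --- core/ (hypothetical): pure, deterministic logic with no IO and no network ---
def keep_allowed(items: List[Dict[str, Any]], allowed_ids: List[int]) -> List[Dict[str, Any]]:
    """Return the items whose "id" is in allowed_ids, preserving input order."""
    allowed = set(allowed_ids)
    return [item for item in items if item.get("id") in allowed]


CORE_FUNCTIONS: Dict[str, Callable[..., Any]] = {"keep_allowed": keep_allowed}


# --- mcp_server/ (hypothetical): expose core functions as named tools ---
def mcp_handle(tool: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch a tool call by name and wrap the result in a JSON-like envelope."""
    return {"tool": tool, "result": CORE_FUNCTIONS[tool](**arguments)}


McpClient = Callable[[str, Dict[str, Any]], Dict[str, Any]]


# --- acp_agent/ (hypothetical): route to MCP, fall back to core in-process ---
def acp_call_tool(tool: str, arguments: Dict[str, Any], mcp: Optional[McpClient] = None) -> Any:
    """Prefer the MCP tool; if MCP is absent or unreachable, call the same core function."""
    if mcp is not None:
        try:
            return mcp(tool, arguments)["result"]
        except ConnectionError:
            pass  # MCP unreachable: use the identical pure function directly
    return CORE_FUNCTIONS[tool](**arguments)
```

Because the core function is pure, the MCP path and the fallback path return identical results for the same arguments, which is what lets the same tools be reused across layers.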

Why this architecture matters

ACP provides a consistent UX and control across environments (R, Atlas/WebAPI, notebooks), while MCP provides a shared tool bus that can be reused across agents and institutions. ACP orchestrates tool calls and LLM calls; MCP owns retrieval, prompt assets, and deterministic tool outputs. This allows the same core tools to be accessed either via MCP or directly by ACP, without coupling them to datasets or local files.

NOTE: At no time should any of the services expose row-level data to an LLM. This can be accomplished through careful use of protocols (MCP for tooling, Agent Client Protocol for OHDSI tool <-> LLM communication) together with a security layer.

What is implemented so far?

Current unit tests

See docs/TESTING.md for install and CLI smoke tests.

`phenotype_recommendation` flow (ACP + MCP + LLM)

ACP calls MCP phenotype_search to retrieve candidates.
ACP calls MCP phenotype_prompt_bundle to fetch prompt assets and output schema.
ACP calls an OpenAI-compatible LLM API to rank candidates.
Core validates and filters LLM output.

For details on the design, see docs/PHENOTYPE_RECOMMENDATION_DESIGN.md.
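The "Core validates and filters LLM output" step could look roughly like this sketch. It assumes the LLM returns JSON with a `recommendations` list whose items carry `cohort_id` and `justification`; those field names are hypothetical, not the repository's actual output schema. Any ID not returned by phenotype_search is dropped, so the model cannot introduce cohorts that were never retrieved.

```python
import json
from typing import Any, Dict, List


def validate_recommendations(llm_text: str, candidate_ids: List[int], max_results: int) -> List[Dict[str, Any]]:
    """Keep well-formed, non-duplicate recommendations that refer to retrieved candidates."""
    try:
        payload = json.loads(llm_text)
    except json.JSONDecodeError:
        return []  # unparseable output: recommend nothing rather than guess
    recs = payload.get("recommendations") if isinstance(payload, dict) else None
    if not isinstance(recs, list):
        return []
    allowed = set(candidate_ids)
    seen = set()
    kept: List[Dict[str, Any]] = []
    for rec in recs:
        cid = rec.get("cohort_id") if isinstance(rec, dict) else None
        # Allowed-id filtering: only IDs that phenotype_search actually returned survive.
        if isinstance(cid, int) and cid in allowed and cid not in seen:
            seen.add(cid)
            kept.append({"cohort_id": cid, "justification": str(rec.get("justification", ""))})
        if len(kept) >= max_results:
            break
    return kept
```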

`phenotype_improvements` flow (ACP + MCP + LLM)

ACP calls MCP phenotype_prompt_bundle for improvement prompts.
ACP calls an OpenAI-compatible LLM API for improvement suggestions.
ACP calls MCP phenotype_improvements with LLM output for validation.

This flow reviews one phenotype definition at a time. If multiple cohorts are provided, ACP uses the first.
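A sketch of the single-definition rule, combined with the target cohort ID validation listed for phenotype_improvements under Planned Services. The `id` and `target_cohort_id` fields are hypothetical names, not the MCP tool's actual schema.

```python
from typing import Any, Dict, List


def select_cohort_for_review(cohorts: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Review one definition at a time: use the first cohort provided."""
    if not cohorts:
        raise ValueError("at least one cohort definition is required")
    return cohorts[0]


def keep_suggestions_for(target_cohort_id: int, suggestions: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop LLM suggestions that target any cohort other than the one under review."""
    return [s for s in suggestions if s.get("target_cohort_id") == target_cohort_id]
```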

`concept-sets-review` flow (ACP + MCP + LLM)

ACP calls MCP lint_prompt_bundle for lint prompts.
ACP calls an OpenAI-compatible LLM API for findings/patches/actions.
ACP calls MCP propose_concept_set_diff with LLM output for validation.

`cohort-critique-general-design` flow (ACP + MCP + LLM)

ACP calls MCP phenotype_prompt_bundle for cohort critique prompts.
ACP calls an OpenAI-compatible LLM API for findings/patches.
ACP calls MCP cohort_lint with LLM output for validation.

`phenotype_validation_review` flow (ACP + MCP + LLM)

ACP calls MCP keeper_sanitize_row to remove PHI/PII (fail-closed).
ACP calls MCP keeper_prompt_bundle and keeper_build_prompt for a sanitized patient prompt.
ACP calls an OpenAI-compatible LLM API to review the patient summary.
ACP calls MCP keeper_parse_response to normalize the label.

LLM requests never include row-level PHI/PII; only sanitized summaries are sent.

For details on PHI/PII handling, see docs/PHENOTYPE_VALIDATION_REVIEW.md.
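The fail-closed behaviour of keeper_sanitize_row can be illustrated with an allowlist: any unexpected field or identifier-like value rejects the whole row, so nothing reaches the LLM. The field allowlist, the two regex patterns, and the yes/no/unsure label set below are hypothetical and far from complete de-identification; the actual rules are described in docs/PHENOTYPE_VALIDATION_REVIEW.md.

```python
import re
from typing import Dict

# Hypothetical allowlist of fields considered safe to summarize for an LLM.
ALLOWED_FIELDS = {"age_group", "gender", "conditions", "drugs", "procedures", "measurements"}
# Hypothetical deny patterns for values that look like exact dates or identifiers.
DENY_PATTERNS = [
    re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),  # ISO calendar dates
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US SSN-like numbers
]


class SanitizationError(ValueError):
    """Raised when a row cannot be shown to be safe; the caller must send nothing."""


def sanitize_row(row: Dict[str, object]) -> Dict[str, str]:
    unknown = set(row) - ALLOWED_FIELDS
    if unknown:
        # Fail closed: an unexpected column might carry PHI, so refuse the whole row.
        raise SanitizationError(f"unexpected fields: {sorted(unknown)}")
    clean: Dict[str, str] = {}
    for key, value in row.items():
        text = str(value)
        if any(p.search(text) for p in DENY_PATTERNS):
            raise SanitizationError(f"field {key!r} contains an identifier-like value")
        clean[key] = text
    return clean


def parse_label(llm_text: str) -> str:
    """Normalize a free-text verdict to 'yes', 'no', or 'unsure' (first yes/no word wins)."""
    match = re.search(r"\b(yes|no)\b", llm_text.lower())
    return match.group(1) if match else "unsure"
```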

`phenotype_recommendation_advice` flow (ACP + MCP + LLM)

ACP calls MCP phenotype_recommendation_advice for advisory prompt assets and schema.
ACP calls an OpenAI-compatible LLM API to return actionable guidance.
Core validates the advisory output.

This flow is used as a fallback when users do not accept initial recommendations.

Strategus incidence shell (R)

The interactive Strategus shell orchestrates phenotype selection, improvements, and script generation for a CohortIncidence study. See docs/STRATEGUS_SHELL.md.

Service Registry

Service definitions live in docs/SERVICE_REGISTRY.yaml. ACP exposes a /services endpoint that reports registry entries plus any additional ACP-implemented services. You can list services quickly with doit list_services.

Example run for `phenotype_recommendation`

Prerequisite: you have embedded the phenotype definitions (see ./docs/PHENOTYPE_INDEXING.md).

Start the ACP server (runs on http://127.0.0.1:8765/ by default):

```bash
export LLM_API_KEY="<YOUR KEY>"
export LLM_API_URL="<URL BASE>/api/chat/completions"
export LLM_LOG=1
export LLM_MODEL="<a model that supports completions>"
export EMBED_API_KEY="<YOUR KEY>"
export EMBED_MODEL="<a text embedding model>"
export EMBED_URL="<URL BASE>/v1/embeddings"
export PHENOTYPE_INDEX_DIR="<ABSOLUTE PATH TO phenotype_index>"
export STUDY_AGENT_MCP_CWD="<REPO ROOT (optional, for stable relative paths)>"
export STUDY_AGENT_HOST=127.0.0.1
export STUDY_AGENT_PORT=8765
export STUDY_AGENT_MCP_COMMAND=study-agent-mcp
export STUDY_AGENT_MCP_ARGS=""
study-agent-acp
```

Note: This starts MCP via stdio. If you use MCP over HTTP, do not set STUDY_AGENT_MCP_COMMAND.

Note: Prefer stopping the ACP process (SIGINT/SIGTERM) so that the MCP subprocess is closed cleanly. Killing the MCP process directly can leave defunct processes.

Note: ACP uses a threaded HTTP server by default. Set STUDY_AGENT_THREADING=0 to disable threading.

Note: /health includes MCP preflight details under mcp_index when MCP is configured.

Troubleshooting: run python mcp_server/scripts/mcp_probe.py to verify index paths and search without ACP.

MCP over HTTP (recommended for cross-platform stability)

Start MCP as a separate HTTP service:

```bash
export MCP_TRANSPORT=http
export MCP_HOST=127.0.0.1
export MCP_PORT=8790
export MCP_PATH=/mcp
study-agent-mcp
```

Then point ACP at it:

```bash
export STUDY_AGENT_MCP_URL="http://127.0.0.1:8790/mcp"
study-agent-acp
```

Note: STUDY_AGENT_MCP_URL must include the port (e.g. :8790). When set, ACP uses HTTP and ignores STUDY_AGENT_MCP_COMMAND.

PowerShell (Windows) quickstart:

```powershell
$env:MCP_TRANSPORT = "http"
$env:MCP_HOST = "127.0.0.1"
$env:MCP_PORT = "8790"
$env:MCP_PATH = "/mcp"
study-agent-mcp

$env:STUDY_AGENT_MCP_URL = "http://127.0.0.1:8790/mcp"
study-agent-acp
```
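The transport rules in the notes for this section (an explicit STUDY_AGENT_MCP_URL wins over STUDY_AGENT_MCP_COMMAND, the URL must include a port, STUDY_AGENT_THREADING=0 disables threading) can be summarized in a small resolver. This is a sketch of the documented behaviour, not ACP's actual configuration code; the function names and the "none" result (no MCP configured, so ACP would fall back to core) are assumptions.

```python
from typing import Dict, Mapping
from urllib.parse import urlparse


def resolve_mcp_transport(env: Mapping[str, str]) -> Dict[str, object]:
    """Pick the MCP transport from environment variables, HTTP taking precedence."""
    url = env.get("STUDY_AGENT_MCP_URL", "").strip()
    if url:
        if urlparse(url).port is None:
            raise ValueError("STUDY_AGENT_MCP_URL must include the port, e.g. :8790")
        return {"transport": "http", "url": url}  # STUDY_AGENT_MCP_COMMAND is ignored
    command = env.get("STUDY_AGENT_MCP_COMMAND", "").strip()
    if command:
        args = env.get("STUDY_AGENT_MCP_ARGS", "").split()
        return {"transport": "stdio", "command": command, "args": args}
    return {"transport": "none"}  # assumption: ACP falls back to core tools


def threading_enabled(env: Mapping[str, str]) -> bool:
    """Threaded HTTP server unless STUDY_AGENT_THREADING is exactly "0"."""
    return env.get("STUDY_AGENT_THREADING", "1") != "0"
```

In practice you would pass `os.environ` to both functions.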

Run phenotype_recommendation

```bash
curl -s -X POST http://127.0.0.1:8765/flows/phenotype_recommendation \
  -H 'Content-Type: application/json' \
  -d '{"study_intent": "Identify clinical risk factors for older adult patients who experience an adverse event of acute gastrointestinal (GI) bleeding", "top_k": 20, "max_results": 10, "candidate_limit": 10}'
```
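For scripted use from Python, the same request can be built with the standard library. This sketch assumes the flow responds with a JSON body; the helper names and the 300-second timeout are illustrative choices, not part of the ACP API.

```python
import json
import urllib.request

ACP_BASE = "http://127.0.0.1:8765"  # default ACP host and port from the example above


def build_flow_request(flow: str, payload: dict, base: str = ACP_BASE) -> urllib.request.Request:
    """Build a POST to /flows/<flow> with a JSON body, mirroring the curl call."""
    return urllib.request.Request(
        f"{base}/flows/{flow}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_flow(flow: str, payload: dict, base: str = ACP_BASE, timeout: float = 300.0) -> dict:
    """Send the request to a running ACP server and decode the (assumed JSON) response."""
    with urllib.request.urlopen(build_flow_request(flow, payload, base), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `run_flow("phenotype_recommendation", {"study_intent": "...", "top_k": 20, "max_results": 10, "candidate_limit": 10})` sends the same body as the curl command while `study-agent-acp` is running.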

Planned Services

Below is a set of planned study agent services, organized by category. For each service, the expected input, output, and validation approach are listed.

High Level Conceptual

`protocol_generator`

Input: PICO/TAR for a study intent.
Output: Templated protocol.
Validation: Protocol completeness and consistency review.

`background_writer`

Input: PICO/TAR and hypothesis.
Output: Background document justifying the study (systematic research summary).
Validation: Source coverage and alignment with hypothesis.

`protocol_critique`

Input: Protocol.
Output: Critique reviewing required components and consistency.
Validation: Checklist of required components; coherence checks.

`dag_create`

Input: Protocol or study intent statement.
Output: Directed acyclic graph of known causal/associative relations (LLM + literature discovery).
Validation: Consistency with cited relations and domain plausibility.

`explain_cohort_diagnostics`

Input: The user's study intent statement and cohort diagnostics output, including the code to run and the results files.
Output: Narrative summary/report of the analysis.
Validation: Correctly reported summary of the methods and results.

`explain_incidence/estimation/characterization_results`

Input: The user's study intent statement, cohort diagnostics, and a completed analysis with Strategus output folders, including the code to run and the results files (incidence/estimation/characterization).
Output: Narrative summary/report of the analysis.
Validation: Correctly reported summary of the methods and results.

High Level Operational

`strategus_*`

Input: Study specification intent or existing Strategus JSON.
Output: Composed/compared/edited/criticized/debugged Strategus JSON.
Validation: Schema validation and diff review.

Search and Suggest

`phenotype_recommendations`

Input: Study intent.
Output: Suggested phenotypes with cohort definition artifacts for user-accepted selections.
Validation: Allowed-id filtering; user confirmation before writes.
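The "user confirmation before writes" rule that recurs across these services amounts to a gate between proposal and write. A minimal sketch, assuming hypothetical `confirm` and `write` callbacks (for example, an interactive prompt in R and a save call to Atlas/WebAPI):

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def confirm_then_write(proposals: List[T], confirm: Callable[[T], bool], write: Callable[[T], None]) -> List[T]:
    """Write only the proposals the user explicitly accepts; return what was written."""
    written: List[T] = []
    for proposal in proposals:
        if confirm(proposal):  # nothing is persisted without an explicit yes
            write(proposal)
            written.append(proposal)
    return written
```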

`phenotype_improvements` (or `phenotype fit`)

Input: Selected phenotypes + study intent.
Output: Improved cohort definitions or Atlas records for accepted changes.
Validation: Target cohort ID validation; user confirmation before writes.

`concept_set_recommendations`

Input: Phenotype/covariate intent lacking a cohort definition.
Output: Suggested concept sets and created concept set artifacts if accepted.
Validation: Concept set schema validation; user confirmation before writes.

`propose_negative_control_outcomes`

Input: Target (optionally comparator).
Output: Recommended negative control outcomes with cohort definitions if accepted.
Validation: Clinical plausibility check; user confirmation before writes.

`propose_comparator`

Input: Target.
Output: Proposed comparator cohort definition if accepted (optionally using OHDSI Comparator Selector).
Validation: Comparator appropriateness review; user confirmation before writes.

`propose_adjustment_set`

Input: Study intent + DAG.
Output: Adjustment set from OHDSI features plus suggested FeatureExtraction features.
Validation: Confounder/collider/mediator checks against the DAG. For example, warning the user if their design might accidentally include a known collider that has been reported in the published literature. See this JAMA article for more about colliders. A knowledge graph of causal findings from the entire literature could also be used to inform the user of such issues.
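As a rough illustration of the DAG checks, this sketch labels a candidate covariate using only ancestor/descendant relations. It is a simplification: proper adjustment-set selection needs d-separation over all paths (structures such as M-bias are not caught here), and encoding the DAG as a dict mapping each node to its children is an assumption.

```python
from typing import Dict, Set

Dag = Dict[str, Set[str]]  # node -> set of direct children


def descendants(dag: Dag, node: str) -> Set[str]:
    """All nodes reachable from `node` by following directed edges."""
    seen: Set[str] = set()
    stack = list(dag.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(dag.get(current, ()))
    return seen


def classify(dag: Dag, exposure: str, outcome: str, var: str) -> str:
    """Heuristic role of `var` relative to exposure -> outcome."""
    exposure_desc = descendants(dag, exposure)
    outcome_desc = descendants(dag, outcome)
    var_desc = descendants(dag, var)
    if var in exposure_desc and outcome in var_desc:
        return "mediator"    # on a directed path exposure -> var -> outcome
    if var in exposure_desc and var in outcome_desc:
        return "collider"    # common effect: adjusting for it can induce bias
    if exposure in var_desc and outcome in var_desc:
        return "confounder"  # common cause of exposure and outcome
    return "other"
```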

Study Component Testing, Improvement, and Linting

`propose_concept_set_diff`

Input: Concept set + study intent.
Output: Proposed patches to concept set artifacts if accepted.
Validation: Deterministic diff rules; user confirmation before writes.
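One way to keep concept set patching deterministic is to apply add/remove operations to a set of concept IDs and report every operation that cannot be applied cleanly. The patch shape (add/remove lists of concept IDs) and the rejection rules are hypothetical, not the actual propose_concept_set_diff contract.

```python
from typing import Iterable, List, Tuple


def apply_concept_set_patch(current: Iterable[int], add: List[int], remove: List[int]) -> Tuple[List[int], List[str]]:
    """Apply an add/remove patch; return the sorted concept IDs and the rejected operations."""
    result = set(current)
    conflicts = set(add) & set(remove)
    rejected: List[str] = [f"{cid}: listed in both add and remove" for cid in sorted(conflicts)]
    for cid in remove:
        if cid in conflicts:
            continue  # ambiguous intent: apply neither side
        if cid in result:
            result.remove(cid)
        else:
            rejected.append(f"remove {cid}: not in concept set")
    for cid in add:
        if cid in conflicts:
            continue
        if cid in result:
            rejected.append(f"add {cid}: already present")
        else:
            result.add(cid)
    return sorted(result), rejected
```

Sorting the output makes the result independent of set iteration order, so the same patch always produces the same diff.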

`phenotype_characterize`

Input: Selected phenotype(s).
Output: R code (or Atlas services) to characterize populations.
Validation: Execution preview; user confirmation before running.

`phenotype_data_quality_review`

Input: Phenotype definitions + data quality sources (DQD, Achilles Heel, characterization).
Output: Mitigations and patches for accepted issues.
Validation: Issue traceability to data quality sources; user confirmation before writes.

`phenotype_dataset_profiler`

Input: Phenotype definition(s) + datasets.
Output: R code to run (e.g., Cohort Diagnostics) and a brief summary of drivers of cohort size variation.
Validation: Reproducible execution outputs; summary tied to diagnostics.

`phenotype_validation_review`

Input: Selected phenotype definition (usually for an outcome cohort) and a narrative clinical description with differential diagnoses and known associated factors, used for validation and for comparison with known phenotype performance.
Output: Code to extract sample cases based on the clinical description, and an LLM assessment of a sample (user-specified or random) of cohort records stripped of PHI.
Validation: Sampling logic review; user confirmation.
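Reproducible sampling makes the sampling logic reviewable. A sketch assuming a list of cohort person IDs and an optional user-specified list; the function name and the fixed default seed are assumptions.

```python
import random
from typing import Iterable, List, Optional


def choose_review_sample(person_ids: Iterable[int], k: int, seed: int = 0,
                         user_ids: Optional[List[int]] = None) -> List[int]:
    """Return the user's IDs that are cohort members, else a seeded random sample of size k."""
    pool = list(person_ids)
    if user_ids:
        members = set(pool)
        return [pid for pid in user_ids if pid in members]  # ignore IDs outside the cohort
    # A fixed seed means a reviewer can regenerate exactly the same sample.
    return sorted(random.Random(seed).sample(pool, min(k, len(pool))))
```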

`cohort_definition_build`

Input: Phenotype/covariate intent without a cohort definition.
Output: Capr code for cohort definition.
Validation: Schema validation; user confirmation before writes.

`cohort_definition_lint`

Input: Cohort JSON.
Output: Proposed patches for design issues and execution efficiency.
Validation: Deterministic lint rules; user confirmation before writes.
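Two deterministic lint rules that can be checked on Atlas/Circe cohort JSON without any LLM are concept sets that are defined but never used, and CodesetId references that point at no defined concept set. The sketch assumes the usual `ConceptSets` array with `id` fields and `CodesetId` references inside criteria; the rule set is illustrative, not cohort_lint's actual rules.

```python
from typing import Any, Dict, List, Set


def _collect_codeset_refs(node: Any, refs: Set[int]) -> None:
    """Recursively gather every integer CodesetId in the nested cohort JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "CodesetId" and isinstance(value, int):
                refs.add(value)
            else:
                _collect_codeset_refs(value, refs)
    elif isinstance(node, list):
        for item in node:
            _collect_codeset_refs(item, refs)


def lint_cohort(cohort: Dict[str, Any]) -> List[str]:
    """Report unused concept sets and dangling CodesetId references."""
    defined = {cs["id"] for cs in cohort.get("ConceptSets", [])}
    refs: Set[int] = set()
    _collect_codeset_refs({k: v for k, v in cohort.items() if k != "ConceptSets"}, refs)
    findings = [f"concept set {i} is defined but never referenced" for i in sorted(defined - refs)]
    findings += [f"CodesetId {i} is referenced but not defined" for i in sorted(refs - defined)]
    return findings
```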

`review_negative_control`

Input: Target + outcome.
Output: Judgement on causal implausibility with explanation and citations.
Validation: Citation review and domain plausibility.
