This is a portfolio project for a deployable business-agent pattern:
- Document the messy human workflow before agentifying it.
- Define a black-box contract for inputs, outputs, policy, and audit.
- Build portable business logic that does not depend on one agent vendor.
- Add thin agent/runtime adapters around that stable core.
- Evaluate behavior against fixtures instead of trusting a prompt demo.
The example workflow is a sales-data request process. A requester sends an unstructured email asking for a report. The system interprets the request, asks for clarification where needed, checks permissions, generates a deterministic CSV/XLSX report from synthetic data, drafts a response, and records an audit event.
The project demonstrates:
- Business-process understanding before implementation.
- A bounded agent that could sit behind email, Teams, Copilot, OpenAI, a web form, or another surface.
- Deterministic policy checks, report generation, response drafting, and audit logging.
- Vendor-neutral core logic with OpenAI Agents SDK as one working adapter.
- OpenAI-compatible provider support through Together AI, while treating model choice as an evaluated operational decision.
- Microsoft Copilot/Agents mapping without committing to Microsoft implementation earlier than necessary.
- A companion scenario with the sales-reporting project, in which the generated data files feed a human-in-the-loop reporting and synthesis workflow.
This is not a general BI replacement. It is a narrow, governed request-handling workflow.
Start here:
- Process overview: what the manual workflow is and where an agent helps.
- Black-box contract: stable input/output and audit expectations.
- Architecture: diagrams for the portable core, adapter boundary, and before/after workflow.
- Evaluation results: deterministic baseline and live-adapter observations.
- Microsoft Copilot mapping: how the same core would surface in Microsoft 365 without platform lock-in.
- Companion project scenario: how this could feed a human-overseen reporting workflow.
- Demo script: suggested walkthrough for an interview or screen recording.
Then inspect the implementation:
- agent_core: vendor-neutral schemas, policy, reports, response drafting, audit, evaluation.
- implementations/openai_agents_sdk: thin OpenAI Agents SDK adapter.
- samples: synthetic inbound requests, policy/data fixtures, expected outputs, and committed example artifacts.
- tests: deterministic tests plus adapter and evaluation checks.
flowchart LR
E["Inbound email or channel message"] --> A["Thin agent adapter"]
A --> R["Structured request"]
R --> C["Portable core"]
C --> V["Validation"]
C --> P["Policy decision"]
C --> G["CSV/XLSX generation"]
C --> D["Response drafting"]
C --> U["Audit event"]
G --> O["Report file"]
D --> M["Reply message"]
The model interprets language. The portable core decides, calculates, writes files, drafts governed responses, and audits.
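A minimal sketch of that split, under stated assumptions: the names `StructuredRequest`, `CoreResult`, `decide_policy`, `handle_request`, the role table, and the status strings are hypothetical and are not taken from `agent_core`. The point is only that the adapter's job ends once it has produced a structured request; everything after that is plain deterministic Python.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical policy table: which roles may request which report types.
ALLOWED_REPORTS = {
    "sales_manager": {"regional_summary", "monthly_trend"},
    "analyst": {"monthly_trend"},
}


@dataclass
class StructuredRequest:
    """What a thin adapter hands to the core after interpreting a message."""
    requester_role: str
    report_type: Optional[str]  # None when the model could not determine it
    output_format: str = "csv"


@dataclass
class CoreResult:
    status: str  # "generated", "clarification_required", or "denied" (illustrative)
    reply: str
    audit: dict = field(default_factory=dict)


def decide_policy(request: StructuredRequest) -> bool:
    """Deterministic permission check; no model call is involved."""
    allowed = ALLOWED_REPORTS.get(request.requester_role, set())
    return request.report_type in allowed


def handle_request(request: StructuredRequest) -> CoreResult:
    """Validate, apply policy, draft a reply, and record an audit event."""
    if request.report_type is None:
        status, reply = "clarification_required", "Which report do you need?"
    elif not decide_policy(request):
        status, reply = "denied", "You are not permitted to request this report."
    else:
        status = "generated"
        fmt = request.output_format.upper()
        reply = f"Your {request.report_type} report is attached as {fmt}."
    audit = {
        "role": request.requester_role,
        "report_type": request.report_type,
        "status": status,
    }
    return CoreResult(status=status, reply=reply, audit=audit)
```

Because the core takes a plain data object, any adapter (OpenAI Agents SDK, a web form, a Copilot surface) can call it the same way, and the same fixtures can exercise it without a model in the loop.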
python -m venv .venv
python -m pip install -r requirements.txt
python -m unittest discover -s tests

Run one fixture through the portable core:

python tools\run_core_case.py case-001

Run the deterministic evaluation baseline:

python tools\evaluate_cases.py --implementation core

Expected summary:
Summary: 8 passed, 0 failed, 8 total, pass_rate=1.0
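For orientation, a summary line of this shape could be produced along the following lines. This is a sketch, not the code in `tools\evaluate_cases.py`; `CaseResult` and `summarize` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    case_id: str
    passed: bool


def summarize(results: list[CaseResult]) -> str:
    """Collapse per-case outcomes into a one-line summary."""
    total = len(results)
    passed = sum(1 for r in results if r.passed)
    failed = total - passed
    # Guard against an empty run so the pass rate is always defined.
    pass_rate = passed / total if total else 0.0
    return (
        f"Summary: {passed} passed, {failed} failed, "
        f"{total} total, pass_rate={pass_rate}"
    )


if __name__ == "__main__":
    # Eight passing fixtures, mirroring the baseline count above.
    results = [CaseResult(f"case-{i:03d}", True) for i in range(1, 9)]
    print(summarize(results))
```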
Install the OpenAI Agents SDK:
python -m pip install -r requirements-openai.txt

Run one case with the default OpenAI provider:
$env:OPENAI_API_KEY = "sk-..."
python tools\run_openai_agent_case.py case-001

Run through Together AI's OpenAI-compatible endpoint:
$env:OPENAI_AGENT_PROVIDER = "together"
$env:TOGETHER_API_KEY = "your-together-key"
$env:OPENAI_AGENT_MODEL = "openai/gpt-oss-20b"
python tools\run_openai_agent_case.py case-001

Compare model behavior for one run:

python tools\run_openai_agent_case.py case-001 --model "provider/model-name"

Evaluate the live adapter against fixtures:

python tools\evaluate_cases.py --implementation openai --cases case-002 case-003 case-008

Committed example outputs are under samples/example-outputs:
- reports/case-001.xlsx: approved XLSX report.
- reports/case-008.csv: approved CSV trend report.
- traces/case-001.json: generated-report audit example.
- traces/case-002.json: clarification-required audit example.
- traces/case-005.json: approval-required audit example.
The regular generated/ folder, which holds output from local runs, is ignored by Git.
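The provider environment variables used in the live-adapter commands (`OPENAI_AGENT_PROVIDER`, `OPENAI_API_KEY`, `TOGETHER_API_KEY`, `OPENAI_AGENT_MODEL`) and the `--model` override could be resolved roughly as follows. This is a sketch under assumptions: `ProviderSettings` and `resolve_provider` are hypothetical names, the precedence of `--model` over the environment variable is assumed, and the Together base URL should be checked against Together AI's current documentation.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed OpenAI-compatible base URL for Together AI; verify before relying on it.
TOGETHER_BASE_URL = "https://api.together.xyz/v1"


@dataclass(frozen=True)
class ProviderSettings:
    provider: str
    api_key: str
    base_url: Optional[str]  # None means "use the client's default endpoint"
    model: Optional[str]     # None means "use the adapter's default model"


def resolve_provider(
    env: Mapping[str, str] = os.environ,
    model_override: Optional[str] = None,
) -> ProviderSettings:
    """Pick endpoint, key, and model from the environment.

    A command-line override (assumed here) takes precedence over
    OPENAI_AGENT_MODEL so one run can be compared against another model.
    """
    provider = env.get("OPENAI_AGENT_PROVIDER", "openai").lower()
    model = model_override or env.get("OPENAI_AGENT_MODEL")

    if provider == "together":
        key_name, base_url = "TOGETHER_API_KEY", TOGETHER_BASE_URL
    elif provider == "openai":
        key_name, base_url = "OPENAI_API_KEY", None
    else:
        raise ValueError(f"Unknown provider: {provider!r}")

    api_key = env.get(key_name)
    if not api_key:
        raise ValueError(f"{key_name} is not set")
    return ProviderSettings(provider, api_key, base_url, model)
```

Keeping this resolution in one small function means the adapter itself never branches on vendor, which is what lets model choice stay an evaluated operational decision rather than a code change.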
Key design decisions:
- The agent must call the portable core for every request.
- The agent may extract intent, but must not calculate metrics or decide policy.
- Clarification terminates the current processing run. A clarified reply starts a new run linked by request metadata.
- Model choice is evaluated by pass rate and reliability, not preference or token price alone.
- Microsoft is treated as a deployment surface until a real tenant-backed implementation is justified.
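The clarification rule above (a clarification ends the run, and the clarified reply starts a new run linked by request metadata) can be illustrated with a small sketch. The `Run` record, its field names, and the helper functions are hypothetical and only show one way such linking might work.

```python
import dataclasses
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Run:
    run_id: str
    thread_id: str                # hypothetical: identifies the email/chat thread
    parent_run_id: Optional[str]  # set when this run follows a clarification
    status: str = "open"


def start_run(thread_id: str, parent: Optional[Run] = None) -> Run:
    """Begin a fresh processing run, optionally linked to an earlier one."""
    return Run(
        run_id=uuid.uuid4().hex,
        thread_id=thread_id,
        parent_run_id=parent.run_id if parent else None,
    )


def close_for_clarification(run: Run) -> Run:
    """Terminate the run; no state is kept open waiting for the reply."""
    return dataclasses.replace(run, status="clarification_required")


if __name__ == "__main__":
    first = close_for_clarification(start_run("thread-42"))
    # The requester's clarified reply arrives later and starts a new run.
    follow_up = start_run(first.thread_id, parent=first)
    print(first.status, follow_up.parent_run_id == first.run_id)
```

Treating each run as terminal keeps the core stateless between messages; the link lives in metadata, so the audit trail can still reconstruct the full exchange.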
- docs/: workflow docs, architecture notes, demo material.
- samples/: synthetic data, requests, expected outputs, examples.
- agent_core/: vendor-neutral schemas, policy, reports, audit, evaluation.
- implementations/: runtime-specific adapters.
- tools/: CLI runners and evaluation commands.
- tests/: unit, adapter, and evaluation tests.
- generated/: local generated output, ignored by Git.
- Backlog
- Architecture
- Demo script
- Companion project scenario
- Implementation notes
- Enterprise AI considerations
- Limitations and extensions
- Process overview
- Current human workflow
- Black-box contract
- Supported request types
- Policy and permissions
- Clarification rules
- Evaluation plan
- Evaluation results
- Synthetic business domain
- Portable core
- OpenAI Agents SDK adapter
- Microsoft Copilot mapping
- OpenAI agent transcript example
- Plan review
- Source chat PDF