This project demonstrates a deployable business-agent pattern: document a messy human-operated request process, define the black-box contract, then implement one portable agentic capability that can be surfaced through OpenAI, Microsoft, email, Teams, forms, or other channels.
The first vertical slice is a sales-data request workflow. Requesters send unstructured requests for report extracts. The system interprets the request, asks for clarification where needed, checks policy, generates a deterministic CSV/XLSX report from synthetic data, drafts a response, and records an audit trail.
Goal: make the project navigable and ready for implementation.
- Move planning artifacts into `docs/`.
- Initialize Git repository.
- Create private GitHub remote.
- Add project `README.md` with thesis, scope, and quickstart placeholder.
- Add `.gitignore` for Python, Node, generated reports, local env files, and temporary traces.
- Decide initial implementation stack.
- Create top-level folders for docs, samples, core logic, implementations, and generated outputs.
Goal: document the business workflow before agentifying it.
- Write `docs/process-overview.md`.
- Write `docs/current-human-workflow.md`.
- Write `docs/black-box-contract.md`.
- Write `docs/supported-request-types.md`.
- Write `docs/policy-and-permissions.md`.
- Write `docs/clarification-rules.md`.
- Write `docs/evaluation-plan.md`.
- Update `docs/plan-review.md` as the framing note rather than the active plan.
Acceptance criteria:
- A reader can understand the current manual process without reading code.
- Inputs, outputs, rules, exceptions, and audit requirements are explicit.
- The documentation explains when a web form would be sufficient and when an agentic intake layer is justified.
Goal: create a realistic but safe data and policy environment.
- Define `SalesOrders` schema.
- Generate synthetic sales dataset.
- Create requester profiles with roles, regions, and entitlements.
- Define policy fixtures for margin access, regional access, raw extract approval, and customer-level restrictions.
- Add sample inbound requests covering clear, ambiguous, unauthorized, and approval-required cases.
- Add expected structured outputs for each sample request.
Acceptance criteria:
- The dataset is plausible enough for demos but contains no real data.
- Every sample request has a known expected decision.
- Data and permissions are versioned as fixtures.
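As a rough illustration of the synthetic-data items above, the sketch below generates a small `SalesOrders`-style dataset from a fixed seed so the output can be committed as a versioned fixture. Every field name, the region and product lists, and the function names `generate_orders` and `to_csv` are assumptions made for this example. The plan does not yet define the real schema.

```python
import csv
import io
import random
from dataclasses import asdict, dataclass, fields
from datetime import date, timedelta


@dataclass(frozen=True)
class SalesOrder:
    # Hypothetical columns; the real SalesOrders schema is still to be defined.
    order_id: str
    order_date: str  # ISO 8601 date
    region: str
    customer_id: str
    product: str
    quantity: int
    unit_price: float
    unit_cost: float


REGIONS = ["AMER", "APAC", "EMEA"]          # assumed region codes
PRODUCTS = ["Gadget", "Gizmo", "Widget"]    # assumed product names


def generate_orders(n: int, seed: int = 42) -> list[SalesOrder]:
    """Return n synthetic orders; the same seed always yields the same rows."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    start = date(2024, 1, 1)
    orders = []
    for i in range(n):
        price = round(rng.uniform(10, 500), 2)
        orders.append(
            SalesOrder(
                order_id=f"SO-{i + 1:05d}",
                order_date=(start + timedelta(days=rng.randrange(365))).isoformat(),
                region=rng.choice(REGIONS),
                customer_id=f"CUST-{rng.randrange(1, 51):03d}",
                product=rng.choice(PRODUCTS),
                quantity=rng.randint(1, 20),
                unit_price=price,
                unit_cost=round(price * rng.uniform(0.4, 0.8), 2),
            )
        )
    return orders


def to_csv(orders: list[SalesOrder]) -> str:
    """Serialize orders to CSV text with a fixed column order and line ending."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=[f.name for f in fields(SalesOrder)], lineterminator="\n"
    )
    writer.writeheader()
    for order in orders:
        writer.writerow(asdict(order))
    return buf.getvalue()
```

Because the generator is seeded and no real records are read, regenerating the fixture should reproduce the committed file, which keeps expected decisions for sample requests stable.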
Goal: build vendor-neutral request processing primitives.
- Define structured request schema.
- Define policy decision schema.
- Define report plan schema.
- Define audit event schema.
- Implement request validation and completeness checks.
- Implement deterministic policy checks.
- Implement deterministic report query/build logic.
- Implement CSV output.
- Implement XLSX output with summary, data, and request metadata tabs.
- Implement response drafting from structured outcomes.
- Implement audit log generation.
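The report-build and audit items in this list could look something like the following minimal sketch. It assumes rows are plain dictionaries with `region`, `quantity`, and `unit_price` keys, and that revenue by region is the only metric. The function names `build_report` and `audit_event` are hypothetical. Amounts are summed in integer cents and output rows are sorted, so the result does not depend on input order.

```python
import csv
import hashlib
import io
import json
from collections import defaultdict


def build_report(rows: list[dict], regions: set[str]) -> str:
    """Aggregate revenue per region and return CSV text (hypothetical shape)."""
    totals_cents: dict[str, int] = defaultdict(int)
    for row in rows:
        if row["region"] in regions:
            # Integer cents avoid float summation differences between runs.
            totals_cents[row["region"]] += round(row["unit_price"] * 100) * row["quantity"]
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["region", "revenue"])
    for region in sorted(totals_cents):  # stable row order
        cents = totals_cents[region]
        writer.writerow([region, f"{cents // 100}.{cents % 100:02d}"])
    return buf.getvalue()


def audit_event(request_id: str, outcome: str, report_csv: str, timestamp: str) -> str:
    """Return one JSON audit line; the timestamp is passed in, not read from a clock."""
    event = {
        "request_id": request_id,
        "outcome": outcome,
        "report_sha256": hashlib.sha256(report_csv.encode("utf-8")).hexdigest(),
        "timestamp": timestamp,
    }
    return json.dumps(event, sort_keys=True)
```

Hashing the report content in the audit event is one way to tie a logged decision to the exact file that was produced. Whether the real audit schema includes such a field is an open design choice.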
Acceptance criteria:
- Core logic can run without any agent SDK.
- Calculations and file generation are deterministic.
- Policy checks are testable without model calls.
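To make the "no agent SDK, no model calls" criteria concrete, here is a minimal sketch of a structured request, a requester profile, and a pure-function policy check. All class names, field names, and reason codes are assumptions for illustration. The actual schemas and policy rules are to be defined in the docs listed earlier.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(str, Enum):
    APPROVED = "approved"
    REJECTED = "rejected"
    APPROVAL_REQUIRED = "approval_required"
    NEEDS_CLARIFICATION = "needs_clarification"


@dataclass(frozen=True)
class ReportRequest:
    # Hypothetical structured request produced by the intake step.
    requester_id: str
    regions: tuple[str, ...]
    metrics: tuple[str, ...]
    date_from: Optional[str] = None
    date_to: Optional[str] = None
    raw_extract: bool = False


@dataclass(frozen=True)
class RequesterProfile:
    # Hypothetical entitlement fixture.
    requester_id: str
    regions: frozenset[str]
    can_view_margin: bool = False


@dataclass(frozen=True)
class PolicyDecision:
    outcome: Outcome
    reasons: tuple[str, ...]


def missing_fields(req: ReportRequest) -> list[str]:
    """Completeness check: list the fields a clarification should ask about."""
    missing = []
    if not req.regions:
        missing.append("regions")
    if not req.metrics:
        missing.append("metrics")
    if req.date_from is None or req.date_to is None:
        missing.append("date_range")
    return missing


def check_policy(req: ReportRequest, profile: RequesterProfile) -> PolicyDecision:
    """Deterministic decision: clarification first, then denial, then approval gates."""
    missing = missing_fields(req)
    if missing:
        return PolicyDecision(Outcome.NEEDS_CLARIFICATION, tuple(f"missing:{m}" for m in missing))
    denied = []
    out_of_scope = sorted(set(req.regions) - profile.regions)
    if out_of_scope:
        denied.append("region_not_entitled:" + ",".join(out_of_scope))
    if "margin" in req.metrics and not profile.can_view_margin:
        denied.append("margin_not_entitled")
    if denied:
        return PolicyDecision(Outcome.REJECTED, tuple(denied))
    if req.raw_extract:
        return PolicyDecision(Outcome.APPROVAL_REQUIRED, ("raw_extract_requires_approval",))
    return PolicyDecision(Outcome.APPROVED, ())
```

Since `check_policy` takes only data and returns only data, each fixture request can be asserted against its expected decision in an ordinary unit test.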
Goal: show a working code-first agent implementation.
- Create OpenAI implementation folder.
- Define the intake agent instructions.
- Expose core operations as tools.
- Use structured outputs for extracted request intent.
- Implement clarification path.
- Implement approved-report path.
- Implement rejection path.
- Implement approval-required path.
- Capture traces or transcript examples.
Acceptance criteria:
- The agent can process fixture requests end to end.
- The model interprets language, while tools perform policy, data, and file work.
- Outputs include structured request JSON, report file, response text, and audit log.
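The split between model and tools might be sketched as a small tool registry: each core operation is described by a name, a description, and a JSON Schema for its arguments, and a dispatcher runs the handler when the model asks for it. This sketch is deliberately SDK-free and does not reproduce the OpenAI API. The `Tool` class, the `dispatch` function, and the stand-in `check_policy_tool` handler with its hard-coded entitlement are all hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    parameters: dict[str, Any]  # JSON Schema describing the arguments
    handler: Callable[..., dict[str, Any]]


def check_policy_tool(requester_id: str, regions: list[str], metrics: list[str]) -> dict[str, Any]:
    # Stand-in for the real core policy check; the entitlement is hard-coded here.
    allowed = {"EMEA"}
    denied = sorted(set(regions) - allowed)
    return {"outcome": "rejected" if denied else "approved", "denied_regions": denied}


TOOLS = {
    tool.name: tool
    for tool in [
        Tool(
            name="check_policy",
            description="Check whether the requester may receive the requested report.",
            parameters={
                "type": "object",
                "properties": {
                    "requester_id": {"type": "string"},
                    "regions": {"type": "array", "items": {"type": "string"}},
                    "metrics": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["requester_id", "regions", "metrics"],
            },
            handler=check_policy_tool,
        )
    ]
}


def dispatch(name: str, arguments_json: str) -> str:
    """Run a model-requested tool call and return a JSON string result."""
    tool = TOOLS.get(name)
    if tool is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(arguments_json)
        missing = [k for k in tool.parameters.get("required", []) if k not in args]
        if missing:
            return json.dumps({"error": "missing arguments: " + ", ".join(missing)})
        result = tool.handler(**args)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or unexpected argument shapes are reported, not raised.
        return json.dumps({"error": str(exc)})
    return json.dumps(result, sort_keys=True)
```

In this shape the model only chooses a tool name and arguments. The policy outcome comes from the handler, so it stays the same regardless of which provider produced the call.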
Goal: make the demo credible as an engineered workflow, not a prompt trick.
- Define evaluation cases from sample inbound requests.
- Check extraction accuracy against expected structured request fields.
- Check policy decision accuracy.
- Check whether clarification is requested when required.
- Check report files contain expected filters, dimensions, and metrics.
- Check audit events are created.
- Add a simple command to run the evaluation set.
- Record provider and model for each live evaluation run.
- Document when repeated extraction failures should trigger live model comparison.
- Document known limitations and failure cases.
Acceptance criteria:
- Evaluation can be run repeatably.
- Failures are visible and actionable.
- Model choice can be evaluated by observed pass rate and reliability, not preference.
- At least one example exists for success, clarification, rejection, and approval-required outcomes.
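A repeatable evaluation run along these lines could be sketched as below. It compares extracted fields against fixture expectations, records the provider and model, and reports a pass rate with per-field mismatches. The `EvalCase` and `EvalResult` classes, the `run_eval` function, and the shape of the summary dictionary are assumptions. The `extract` callable stands in for whatever live or stubbed extraction step is used.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class EvalCase:
    case_id: str
    request_text: str
    expected: dict[str, Any]  # expected structured request fields


@dataclass(frozen=True)
class EvalResult:
    case_id: str
    mismatches: dict[str, tuple[Any, Any]]  # field -> (expected, actual)

    @property
    def passed(self) -> bool:
        return not self.mismatches


def run_eval(
    cases: list[EvalCase],
    extract: Callable[[str], dict[str, Any]],
    provider: str,
    model: str,
) -> dict[str, Any]:
    """Run every case through `extract` and summarize field-level mismatches."""
    results = []
    for case in cases:
        actual = extract(case.request_text)
        mismatches = {
            key: (want, actual.get(key))
            for key, want in case.expected.items()
            if actual.get(key) != want
        }
        results.append(EvalResult(case.case_id, mismatches))
    passed = sum(1 for r in results if r.passed)
    return {
        "provider": provider,
        "model": model,
        "total": len(results),
        "passed": passed,
        "pass_rate": passed / len(results) if results else 0.0,
        "failures": {r.case_id: r.mismatches for r in results if not r.passed},
    }
```

Keeping the mismatched field and both values in the summary is what makes a failure actionable, and the recorded provider and model let pass rates be compared across runs.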
Goal: demonstrate avoidance of vendor lock-in and practical Microsoft awareness.
- Write `docs/microsoft-copilot-mapping.md`.
- Map portable core concepts to Copilot Studio concepts.
- Map portable core concepts to Microsoft 365 Agents SDK/custom engine agent concepts.
- Identify which components would live in Graph, Power Automate, Dataverse, Fabric, or a custom API.
- Describe edge lock-in explicitly: auth, mailbox, Teams, tenant governance, approvals, publishing, and monitoring.
- Sketch thin-adapter architecture for a Microsoft-facing implementation.
- Decide whether to build a small Microsoft 365 Agents SDK adapter or keep this as documented architecture in v1.
Acceptance criteria:
- The Microsoft story is specific enough for an interview discussion.
- Business logic remains outside Microsoft-specific configuration.
- The repo can explain what would change between OpenAI and Microsoft deployments.
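One way to picture the thin-adapter idea is a small interface that converts channel payloads to and from a channel-neutral message, with the core handler unaware of the channel. The sketch below is an assumption-heavy illustration. The payload keys used by `EmailAdapter` and `ChatAdapter` are invented for the example and do not reflect the real Graph, Teams, or Microsoft 365 Agents SDK schemas.

```python
from dataclasses import dataclass
from typing import Any, Callable, Protocol


@dataclass(frozen=True)
class InboundRequest:
    requester_id: str
    text: str
    channel: str


@dataclass(frozen=True)
class OutboundReply:
    text: str
    attachments: tuple[str, ...] = ()


class ChannelAdapter(Protocol):
    """Everything channel-specific lives behind these two methods."""

    def to_inbound(self, payload: dict[str, Any]) -> InboundRequest: ...

    def from_outbound(self, reply: OutboundReply) -> dict[str, Any]: ...


class EmailAdapter:
    # Hypothetical email payload with "from" and "body" keys.
    def to_inbound(self, payload: dict[str, Any]) -> InboundRequest:
        return InboundRequest(payload["from"], payload["body"], "email")

    def from_outbound(self, reply: OutboundReply) -> dict[str, Any]:
        return {"body": reply.text, "attachments": list(reply.attachments)}


class ChatAdapter:
    # Hypothetical chat payload with a nested "user" object and a "message" key.
    def to_inbound(self, payload: dict[str, Any]) -> InboundRequest:
        return InboundRequest(payload["user"]["id"], payload["message"], "chat")

    def from_outbound(self, reply: OutboundReply) -> dict[str, Any]:
        return {"message": reply.text, "files": list(reply.attachments)}


def handle(
    adapter: ChannelAdapter,
    payload: dict[str, Any],
    core: Callable[[InboundRequest], OutboundReply],
) -> dict[str, Any]:
    """Normalize the payload, run the portable core, and map the reply back."""
    return adapter.from_outbound(core(adapter.to_inbound(payload)))
```

Under this arrangement, moving between deployments would mean swapping the adapter and its auth and publishing setup, while the `core` callable and its policy logic stay unchanged.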
Goal: make the project easy to inspect and discuss.
- Write final `README.md`.
- Add architecture diagram.
- Add before/after workflow diagram.
- Add demo script.
- Add generated example reports.
- Add anonymized trace/audit examples.
- Add short implementation notes explaining design trade-offs.
- Add a recording-ready demo outline instead of committing video/GIF media.
- Add final limitations and extension ideas.
Acceptance criteria:
- A reviewer can understand the project in five minutes.
- A technical interviewer can inspect the implementation and tests.
- The project clearly avoids claiming to be a general BI replacement.
- Should evaluation use static fixture expectations only, or also model-graded qualitative checks?
- What pass-rate or repeated-failure threshold should trigger trying a stronger model in a real deployment?
- Should inbound email be simulated as plain text, `.eml`, or JSON-wrapped messages?
- Should a future Microsoft implementation target Microsoft 365 Agents SDK first, or Copilot Studio first if a real tenant becomes available?
- Real mailbox ingestion.
- Graph API outbound replies.
- Teams bot surface.
- Power Automate approval flow.
- Power BI/Fabric semantic model integration.
- Natural-language SQL over arbitrary schemas.
- Multi-agent decomposition.
- Web form comparison UI.
- Hosted demo.