diff --git a/.aider/conventions.md b/.aider/conventions.md
new file mode 100644
index 0000000..6a8dc93
--- /dev/null
+++ b/.aider/conventions.md
@@ -0,0 +1,279 @@
+# Decispher: Project Conventions for Aider
+
+
+These are the active engineering decisions for this repository.
+Aider should follow all of these conventions when making changes.
+
+## Email
+
+### Migrate email service to Zoho and update SMTP infrastructure
+
+**Convention:** Migrate all email services to Zoho and update the SMTP server infrastructure, including the implementation of new routing rules to block any traffic to the legacy SMTP server.
+
+**Why:** The team decided to move to Zoho to consolidate mailing services and address the limitations or overhead associated with the existing legacy SMTP infrastructure.
+
+> ⚠️ This is a **HIGH** priority rule — do not violate it.
+
+**Relevant files:** `infrastructure/mail`, `services/smtp`, `config/email_routing`
+
+## Infrastructure
+
+### Migrate from Shipsy to in-house mapping event system
+
+**Convention:** Switch from the third-party Shipsy provider to an in-house developed mapping event system.
+
+**Why:** The team identified that the Shipsy service was negatively impacting the platform's scalability, and moving to an internal solution reduces external dependencies.
+
+> ⚠️ This is a **HIGH** priority rule — do not violate it.
+
+**Relevant files:** `services/shipping-integration`, `infrastructure/event-bus`
+
+## Api
+
+### Abandon RFC 7807 for error responses
+
+**Convention:** We have decided to officially discontinue the use of RFC 7807 (Problem Details for HTTP APIs) for all API error responses moving forward.
+
+**Why:** The team determined that the RFC 7807 specification is outdated and no longer aligns with the current requirements and standards of the API architecture.
+
+> ⚠️ This is a **HIGH** priority rule — do not violate it.
+
+**Relevant files:** `api/responses`, `api/error-handling`
+
+## Vector-search
+
+### Standardize on HNSW for new vector indexes
+
+**Convention:** All new vector indexes must be created using the HNSW algorithm. Existing IVFFlat indexes (specifically in the llm_cache table) are to be migrated to HNSW in Sprint 16.
+
+**Why:** HNSW is the current architectural standard for vector indexing. The previous rejection of the migration to HNSW was due to operational risks in production, not a lack of performance or technical suitability of HNSW.
+
+**Relevant files:** `db/schema/vector_indexes`, `db/migrations/sprint_16/migrate_llm_cache_to_hnsw`
+
+## Rfc-7807
+
+### Establish authoritative RFC 7807 error format convention
+
+**Convention:** Adopt the HIGH severity specification as the authoritative version for the RFC 7807 error format, which includes fields: type, title, status, detail, and instance.
+
+**Why:** The team identified that two existing conventions were redundant. Designating the HIGH severity entry as canonical while allowing the fusion engine to merge duplicate references ensures consistency across documentation and API implementations.
+
+**Relevant files:** `packages/api/src/plugins/error-handler.ts`
+
+## Billing
+
+### Establish ownership and modification constraints for credits and billing system
+
+**Convention:** Sara is the primary owner of the billing module; all changes to the credit_ledger schema, DrizzleCreditRepository, and the EFFORT_MODE_CONFIGS require specific approvals from Sara and Ali. Furthermore, the system must strictly adhere to the append-only ledger constraint per ADR-019 and maintain SERIALIZABLE transaction requirements.
+
+**Why:** To ensure accountability and maintain architectural integrity of the financial ledger and billing configuration, specific code ownership and structural constraints have been formalized.
+
+> ⚠️ This is a **HIGH** priority rule — do not violate it.
+
+**Relevant files:** `packages/api/src/routes/credits.ts`, `packages/decision-store/src/repositories/credit-repository.ts`, `packages/common/src/types/credits.ts`
+
+### Ownership of Billing Module
+
+**Convention:** The billing module, including Stripe integration, credit ledger, credit deduction logic, and Stripe webhook handlers, is owned by U05F9P78LTG. All changes to billing flows require their review.
+
+**Why:** This statement clarifies responsibility for the billing module and its components to ensure proper review and maintenance.
+
+**Relevant files:** `packages/api/src/billing/`
+
+## Llm
+
+### Define Model Fallback Ordering Strategy for API Rate Limits
+
+**Convention:** Establish explicit provider fallback orderings: For extraction, use Anthropic → DeepSeek → OpenAI. For detection, use Google → OpenAI → DeepSeek.
+
+**Why:** To maintain system reliability and avoid task failure when individual LLM providers hit rate limits, a hierarchical fallback mechanism ensures work is diverted to alternative models before resorting to the Dead Letter Queue (DLQ) after retries.
+
+> ⚠️ This is a **HIGH** priority rule — do not violate it.
+
+**Relevant files:** `src/llm/client_factory.py`, `src/llm/fallback_logic.py`
+
+## Aws
+
+### Migrate infrastructure orchestration from AWS ECS to AWS EKS
+
+**Convention:** The team will migrate from AWS ECS to AWS EKS for container orchestration.
+
+**Why:** EKS provides superior orchestration flexibility, including native Horizontal Pod Autoscaler and improved multi-AZ/multi-region failover capabilities, which are necessary for the current scale, outweighing the operational overhead of Kubernetes.
+
+> ⚠️ This is a **HIGH** priority rule — do not violate it.
+
+**Relevant files:** `infrastructure/terraform`, `infrastructure/k8s`
+
+## Pgvector
+
+### Use cosine distance for pgvector similarity searches
+
+**Convention:** We have standardized on cosine distance (using the <=> operator in pgvector) for all similarity search operations.
+
+**Why:** Cosine distance provides significantly better recall (12% improvement) on normalized text embeddings compared to L2 distance. Furthermore, L2 distance is overly sensitive to embedding magnitude, making it less reliable for our specific use case.
+
+### Use cosine distance over L2 for semantic text embedding similarity with pgvector HNSW
+
+**Convention:** We decided to use cosine distance for semantic similarity search of text embeddings with pgvector HNSW for deduplication.
+
+**Why:** Cosine distance is invariant to vector magnitude, meaning it only considers the direction of vectors. This property is precisely what is desired for semantic similarity of text embeddings, as it allows for accurate comparison of semantic meaning regardless of variations in embedding vector norms. L2 (Euclidean) distance, on the other hand, would incorrectly penalize vectors with different magnitudes, even if they share the same semantic direction.
+
+## Mongodb
+
+### Use MongoDB Atlas for analytics event ingestion
+
+**Convention:** MongoDB is strictly prohibited for use in core pipeline services (including the core decision pipeline, authentication, and the context store). These services must exclusively use PostgreSQL 16 and Redis. Any deviation requires a formal ADR.
+
+**Why:** To maintain architectural integrity and prevent fragmentation in the core tech stack. Previous attempts to introduce MongoDB for event queues nearly caused instability, highlighting the need for a hard, enforceable constraint.
+
+**Relevant files:** `analytics/storage`, `infrastructure/database-policy`
+
+### Use MongoDB Atlas for schemaless analytics webhook storage
+
+**Convention:** Use MongoDB Atlas specifically for the analytics event ingestion pipeline, while keeping all other core application data in PostgreSQL.
+
+**Why:** MongoDB Atlas provides the necessary horizontal sharding and schemaless structure to handle the required 50k write operations per second, whereas PostgreSQL performance degrades under this load.
+
+> ⚠️ This is a **HIGH** priority rule — do not violate it.
+
+**Relevant files:** `services/analytics-webhook-handler`, `infrastructure/database-clusters`
+
+### Use MongoDB for Analytics Events Pipeline
+
+**Convention:** We will use MongoDB for the analytics events pipeline, provisioning a MongoDB Atlas cluster to handle the data.
+
+**Why:** MongoDB offers 10x the write throughput compared to PostgreSQL for high-cardinality event data, which is essential to meet the current scale requirements. The previous constraint was established before these new scale demands emerged.
+
+> ⚠️ This is a **HIGH** priority rule — do not violate it.
+
+**Relevant files:** `packages/api/src/analytics/`
+
+### Prohibition of MongoDB in the Tech Stack for Analytics Events
+
+**Convention:** MongoDB is strictly prohibited from being integrated into the current technology stack, including for analytics events.
+
+**Why:** There is an active and non-negotiable architectural constraint against MongoDB in the stack due to the critical requirement for ACID compliance across all billing and user data. MongoDB does not satisfy this fundamental requirement.
+
+> ⚠️ This is a **CRITICAL** priority rule — do not violate it.
+
+## Postgresql
+
+### Prohibit MongoDB and mandate PostgreSQL for core pipelines
+
+**Convention:** The core pipeline must exclusively use PostgreSQL 16 with pgvector and Redis; the use of MongoDB is strictly prohibited.
+
+**Why:** Enforcing a specific database stack ensures architectural consistency, simplifies maintenance, and leverages existing infrastructure and expertise with PostgreSQL and pgvector.
+
+> ⚠️ This is a **CRITICAL** priority rule — do not violate it.
+
+**Relevant files:** `infrastructure/database`, `src/db/config.ts`
+
+### Standardize on PostgreSQL with pgvector for primary storage and vector search
+
+**Convention:** Use PostgreSQL with pgvector and HNSW indexes as the standard solution for primary datastore and vector search operations.
+
+**Why:** PostgreSQL with pgvector provides the ability to manage both SQL-based relational data and vector search capabilities within a single system, simplifying the architecture compared to managing separate databases.
+
+> ⚠️ This is a **HIGH** priority rule — do not violate it.
+
+### We use PostgreSQL with pgvector for all data storage
+
+**Convention:** After evaluating MongoDB, DynamoDB, and PostgreSQL, we chose PostgreSQL 16 with pgvector HNSW indexes. Reason: vector similarity search, ACID guarantees, and single DB for both structured data and embeddings.
+
+**Why:** After evaluating MongoDB, DynamoDB, and PostgreSQL, we chose PostgreSQL 16 with pgvector HNSW indexes. Reason: vector similarity search, ACID guarantees, and single DB for both structured data and embeddings.
+
+> ⚠️ This is a **CRITICAL** priority rule — do not violate it.
+
+## Event-sourcing
+
+### Abandoning EventStoreDB for monorepo event handling
+
+**Convention:** The team decided to discontinue the use of EventStoreDB and removed event sourcing as an architectural pattern following the migration back to a monorepo.
+
+**Why:** The complexity of maintaining three separate runbooks for EventStoreDB operations outweighed the benefits of its auditability features for the current team size and system scale.
+
+## API
+
+### Enforce RFC 7807 for Internal API Error Formats
+
+**Convention:** All internal API routes must adhere to the RFC 7807 error format, consistent with public-facing API routes.
+
+**Why:** Inconsistent error formats, specifically plain strings from internal routes, prevent AI tools from reliably parsing and analyzing errors, leading to broken analysis workflows.
+
+**Relevant files:** `packages/api/src/routes/internal/`
+
+## LLM
+
+### LLM Provider Strategy by Pipeline Step and Effort Mode
+
+**Convention:** Each LLM pipeline step (detection, extraction, formatting) has its own provider configuration managed via environment variables. An 'effort mode' concept allows overriding these configurations per company at request time, defining specific LLM models for different quality/cost tiers: Saver uses gemini-flash, Balanced mixes gemini-flash, claude-haiku, and gpt-4o-mini, Pro uses claude-sonnet for extraction, and Super uses claude-opus.
+
+**Why:** The strategy is designed to provide flexibility and optimization across different pipeline steps and 'effort modes'. By configuring providers per step and allowing overrides based on company effort modes, the system can balance cost, performance, and model quality according to specific requirements, from 'Saver' (likely cost-optimized) to 'Super' (likely highest quality/cost). The multi-provider abstraction facilitates this dynamic selection.
+
+> ⚠️ This is a **HIGH** priority rule — do not violate it.
+
+**Relevant files:** `packages/analyzer/`
+
+### Define LLM Model Combinations for Saver, Balanced, Pro, and Super Effort Modes
+
+**Convention:** The specific LLM model combinations for the multi-provider effort modes were finalized: Saver mode uses `gemini-flash` for detection, extraction, and format. Balanced mode uses `gemini-flash` for detection, `claude-haiku` for extraction, and `gpt-4o-mini` for format. Pro mode uses `gemini-flash` for detection, `claude-sonnet` for extraction, and `gpt-4o-mini` for format. Super mode uses `gemini-flash` for detection, `claude-opus` for extraction, and `claude-sonnet` for format.
+
+**Why:** The chosen LLM model combinations for each effort mode (Saver, Balanced, Pro, Super) were selected to provide different performance and cost profiles, aligning with the multi-provider strategy. Cost analysis confirmed that the proposed combinations, ranging from ~$0.08/1M tokens for Saver to ~$4.50/1M tokens for Super, ensure fine margins at current credit pricing.
+
+> ⚠️ This is a **HIGH** priority rule — do not violate it.
+
+### Implement Multi-Provider LLM Abstraction for Pipeline Steps with Per-Company Overrides
+
+**Convention:** We will implement a multi-provider abstraction where each pipeline step (detection, extraction, enrichment, formatting) has its own LLM provider configuration via environment variables. At request time, an 'effort mode' can override the provider selection on a per-company basis.
+
+**Why:** This approach allows companies with high context volume (Tier 3+) to pay extra for Claude-Sonnet's accuracy where needed, while companies with tighter budgets can use more cost-effective options like Gemini-Flash for all steps. It also decouples our infrastructure from individual LLM vendor stability and enables independent contract negotiations with different providers (Anthropic, OpenAI, Google).
+
+> ⚠️ This is a **HIGH** priority rule — do not violate it.
+
+## Vector-database
+
+### Implementation details for text embeddings in PostgreSQL using OpenAI's text-embedding-3-small and HNSW indexing
+
+**Convention:** We will use the `text-embedding-3-small` OpenAI model to generate 1536-dimension embeddings. These embeddings will be stored in the `knowledge_chunks` table within PostgreSQL. The HNSW index used for vector search will be configured with `ef_construction=200` and `m=16`.
+
+**Why:** The chosen HNSW parameters (`ef_construction=200` and `m=16`) are set to provide an optimal tradeoff between recall accuracy and search speed. The `text-embedding-3-small` model is selected for generating the text embeddings.
+
+**Relevant files:** `packages/decision-store/src/schema.ts`
+
+## Redis
+
+### Implement Redis Semantic Caching for LLM Embedding Calls
+
+**Convention:** Implemented Redis semantic caching for LLM embedding calls. The cache key is a hash of the input text, model, and provider. The cache entries have a Time-To-Live (TTL) of 1 hour.
+
+**Why:** Redis was a natural extension since it is already in use for BullMQ and session caching. This implementation reduced redundant embedding calls by approximately 40% in tests.
+
+## Migration
+
+### Plan to Migrate Application Infrastructure from Railway to AWS ECS
+
+**Convention:** The trigger metric for initiating the AWS migration has been adjusted from 20 paying customers to 30 paying customers. The Q3 2026 timeline for the migration still holds.
+
+**Why:** This adjustment is due to Railway costs being more predictable than initially expected. Additionally, the VPC isolation requirement, which was a significant factor, only applies to enterprise customers, a segment we are targeting at a later stage.
+
+> ⚠️ This is a **HIGH** priority rule — do not violate it.
+
+## Architecture
+
+### Defer Microservices Adoption, Maintain Monorepo Architecture
+
+**Convention:** To defer the adoption of a microservices architecture and continue with a monorepo architecture utilizing shared packages. The decision to revisit microservices will be made when the team size reaches 8 or more members.
+
+**Why:** An earlier attempt (Phase 1) to split the recorder and analyzer into separate gRPC services resulted in brutal deployment complexity for a 3-person team. This led to approximately 40% of the team's time being spent debugging inter-service authentication and network failures, making it unmanageable for the current team size.
+
+> ⚠️ This is a **HIGH** priority rule — do not violate it.
+
+## Database
+
+### Standardize on PostgreSQL and Redis; Prohibit MongoDB
+
+**Convention:** MongoDB is strictly prohibited in this stack due to its lack of ACID compliance. PostgreSQL will be used as the primary datastore for all persistent data, especially critical billing and user data. Redis will be used exclusively for caching purposes.
+
+**Why:** ACID compliance is a non-negotiable requirement for billing and user data to guarantee data integrity and consistency. PostgreSQL provides robust ACID transaction support. Adopting a standardized approach with PostgreSQL and Redis simplifies the technology stack and enforces critical data integrity requirements.
+
+> ⚠️ This is a **CRITICAL** priority rule — do not violate it.
diff --git a/.clinerules b/.clinerules
new file mode 100644
index 0000000..ca6cd2e
--- /dev/null
+++ b/.clinerules
@@ -0,0 +1,107 @@
+# Decispher Engineering Rules
+# Auto-generated by Decispher — do not edit manually
+
+You are working in a codebase governed by the following engineering decisions.
+Before writing or modifying code, review these rules and do not violate them.
+
+## Rules
+
+1. 🔴 **[CRITICAL]** The core pipeline must exclusively use PostgreSQL 16 with pgvector and Redis; the use of MongoDB is strictly prohibited.
+   - **Why:** Enforcing a specific database stack ensures architectural consistency, simplifies maintenance, and leverages existing infrastructure and expertise with PostgreSQL and pgvector.
+   - **Files:** `infrastructure/database`, `src/db/config.ts`
+
+2. 🔴 **[CRITICAL]** After evaluating MongoDB, DynamoDB, and PostgreSQL, we chose PostgreSQL 16 with pgvector HNSW indexes. Reason: vector similarity search, ACID guarantees, and single DB for both structured data and embeddings.
+   - **Why:** After evaluating MongoDB, DynamoDB, and PostgreSQL, we chose PostgreSQL 16 with pgvector HNSW indexes. Reason: vector similarity search, ACID guarantees, and single DB for both structured data and embeddings.
+
+3. 🔴 **[CRITICAL]** MongoDB is strictly prohibited from being integrated into the current technology stack, including for analytics events.
+   - **Why:** There is an active and non-negotiable architectural constraint against MongoDB in the stack due to the critical requirement for ACID compliance across all billing and user data. MongoDB does not satisfy this fundamental requirement.
+
+4. 🔴 **[CRITICAL]** MongoDB is strictly prohibited in this stack due to its lack of ACID compliance. PostgreSQL will be used as the primary datastore for all persistent data, especially critical billing and user data. Redis will be used exclusively for caching purposes.
+   - **Why:** ACID compliance is a non-negotiable requirement for billing and user data to guarantee data integrity and consistency. PostgreSQL provides robust ACID transaction support. Adopting a standardized approach with PostgreSQL and Redis simplifies the technology stack and enforces critical data integrity requirements.
+
+5. 🟠 **[HIGH]** Migrate all email services to Zoho and update the SMTP server infrastructure, including the implementation of new routing rules to block any traffic to the legacy SMTP server.
+   - **Why:** The team decided to move to Zoho to consolidate mailing services and address the limitations or overhead associated with the existing legacy SMTP infrastructure.
+   - **Files:** `infrastructure/mail`, `services/smtp`, `config/email_routing`
+
+6. 🟠 **[HIGH]** Switch from the third-party Shipsy provider to an in-house developed mapping event system.
+   - **Why:** The team identified that the Shipsy service was negatively impacting the platform's scalability, and moving to an internal solution reduces external dependencies.
+   - **Files:** `services/shipping-integration`, `infrastructure/event-bus`
+
+7. 🟠 **[HIGH]** We have decided to officially discontinue the use of RFC 7807 (Problem Details for HTTP APIs) for all API error responses moving forward.
+   - **Why:** The team determined that the RFC 7807 specification is outdated and no longer aligns with the current requirements and standards of the API architecture.
+   - **Files:** `api/responses`, `api/error-handling`
+
+8. 🟠 **[HIGH]** Sara is the primary owner of the billing module; all changes to the credit_ledger schema, DrizzleCreditRepository, and the EFFORT_MODE_CONFIGS require specific approvals from Sara and Ali. Furthermore, the system must strictly adhere to the append-only ledger constraint per ADR-019 and maintain SERIALIZABLE transaction requirements.
+   - **Why:** To ensure accountability and maintain architectural integrity of the financial ledger and billing configuration, specific code ownership and structural constraints have been formalized.
+   - **Files:** `packages/api/src/routes/credits.ts`, `packages/decision-store/src/repositories/credit-repository.ts`, `packages/common/src/types/credits.ts`
+
+9. 🟠 **[HIGH]** Establish explicit provider fallback orderings: For extraction, use Anthropic → DeepSeek → OpenAI. For detection, use Google → OpenAI → DeepSeek.
+   - **Why:** To maintain system reliability and avoid task failure when individual LLM providers hit rate limits, a hierarchical fallback mechanism ensures work is diverted to alternative models before resorting to the Dead Letter Queue (DLQ) after retries.
+   - **Files:** `src/llm/client_factory.py`, `src/llm/fallback_logic.py`
+
+10. 🟠 **[HIGH]** The team will migrate from AWS ECS to AWS EKS for container orchestration.
+   - **Why:** EKS provides superior orchestration flexibility, including native Horizontal Pod Autoscaler and improved multi-AZ/multi-region failover capabilities, which are necessary for the current scale, outweighing the operational overhead of Kubernetes.
+   - **Files:** `infrastructure/terraform`, `infrastructure/k8s`
+
+11. 🟠 **[HIGH]** Use PostgreSQL with pgvector and HNSW indexes as the standard solution for primary datastore and vector search operations.
+   - **Why:** PostgreSQL with pgvector provides the ability to manage both SQL-based relational data and vector search capabilities within a single system, simplifying the architecture compared to managing separate databases.
+
+12. 🟠 **[HIGH]** Use MongoDB Atlas specifically for the analytics event ingestion pipeline, while keeping all other core application data in PostgreSQL.
+   - **Why:** MongoDB Atlas provides the necessary horizontal sharding and schemaless structure to handle the required 50k write operations per second, whereas PostgreSQL performance degrades under this load.
+   - **Files:** `services/analytics-webhook-handler`, `infrastructure/database-clusters`
+
+13. 🟠 **[HIGH]** We will use MongoDB for the analytics events pipeline, provisioning a MongoDB Atlas cluster to handle the data.
+   - **Why:** MongoDB offers 10x the write throughput compared to PostgreSQL for high-cardinality event data, which is essential to meet the current scale requirements. The previous constraint was established before these new scale demands emerged.
+   - **Files:** `packages/api/src/analytics/`
+
+14. 🟠 **[HIGH]** Each LLM pipeline step (detection, extraction, formatting) has its own provider configuration managed via environment variables. An 'effort mode' concept allows overriding these configurations per company at request time, defining specific LLM models for different quality/cost tiers: Saver uses gemini-flash, Balanced mixes gemini-flash, claude-haiku, and gpt-4o-mini, Pro uses claude-sonnet for extraction, and Super uses claude-opus.
+   - **Why:** The strategy is designed to provide flexibility and optimization across different pipeline steps and 'effort modes'. By configuring providers per step and allowing overrides based on company effort modes, the system can balance cost, performance, and model quality according to specific requirements, from 'Saver' (likely cost-optimized) to 'Super' (likely highest quality/cost). The multi-provider abstraction facilitates this dynamic selection.
+   - **Files:** `packages/analyzer/`
+
+15. 🟠 **[HIGH]** The trigger metric for initiating the AWS migration has been adjusted from 20 paying customers to 30 paying customers. The Q3 2026 timeline for the migration still holds.
+   - **Why:** This adjustment is due to Railway costs being more predictable than initially expected. Additionally, the VPC isolation requirement, which was a significant factor, only applies to enterprise customers, a segment we are targeting at a later stage.
+
+16. 🟠 **[HIGH]** To defer the adoption of a microservices architecture and continue with a monorepo architecture utilizing shared packages. The decision to revisit microservices will be made when the team size reaches 8 or more members.
+   - **Why:** An earlier attempt (Phase 1) to split the recorder and analyzer into separate gRPC services resulted in brutal deployment complexity for a 3-person team. This led to approximately 40% of the team's time being spent debugging inter-service authentication and network failures, making it unmanageable for the current team size.
+
+17. 🟠 **[HIGH]** The specific LLM model combinations for the multi-provider effort modes were finalized: Saver mode uses `gemini-flash` for detection, extraction, and format. Balanced mode uses `gemini-flash` for detection, `claude-haiku` for extraction, and `gpt-4o-mini` for format. Pro mode uses `gemini-flash` for detection, `claude-sonnet` for extraction, and `gpt-4o-mini` for format. Super mode uses `gemini-flash` for detection, `claude-opus` for extraction, and `claude-sonnet` for format.
+   - **Why:** The chosen LLM model combinations for each effort mode (Saver, Balanced, Pro, Super) were selected to provide different performance and cost profiles, aligning with the multi-provider strategy. Cost analysis confirmed that the proposed combinations, ranging from ~$0.08/1M tokens for Saver to ~$4.50/1M tokens for Super, ensure fine margins at current credit pricing.
+
+18. 🟠 **[HIGH]** We will implement a multi-provider abstraction where each pipeline step (detection, extraction, enrichment, formatting) has its own LLM provider configuration via environment variables. At request time, an 'effort mode' can override the provider selection on a per-company basis.
+   - **Why:** This approach allows companies with high context volume (Tier 3+) to pay extra for Claude-Sonnet's accuracy where needed, while companies with tighter budgets can use more cost-effective options like Gemini-Flash for all steps. It also decouples our infrastructure from individual LLM vendor stability and enables independent contract negotiations with different providers (Anthropic, OpenAI, Google).
+
+19. 🟡 **[MEDIUM]** All new vector indexes must be created using the HNSW algorithm. Existing IVFFlat indexes (specifically in the llm_cache table) are to be migrated to HNSW in Sprint 16.
+   - **Why:** HNSW is the current architectural standard for vector indexing. The previous rejection of the migration to HNSW was due to operational risks in production, not a lack of performance or technical suitability of HNSW.
+   - **Files:** `db/schema/vector_indexes`, `db/migrations/sprint_16/migrate_llm_cache_to_hnsw`
+
+20. 🟡 **[MEDIUM]** Adopt the HIGH severity specification as the authoritative version for the RFC 7807 error format, which includes fields: type, title, status, detail, and instance.
+   - **Why:** The team identified that two existing conventions were redundant. Designating the HIGH severity entry as canonical while allowing the fusion engine to merge duplicate references ensures consistency across documentation and API implementations.
+   - **Files:** `packages/api/src/plugins/error-handler.ts`
+
+21. 🟡 **[MEDIUM]** We have standardized on cosine distance (using the <=> operator in pgvector) for all similarity search operations.
+   - **Why:** Cosine distance provides significantly better recall (12% improvement) on normalized text embeddings compared to L2 distance. Furthermore, L2 distance is overly sensitive to embedding magnitude, making it less reliable for our specific use case.
+
+22. 🟡 **[MEDIUM]** MongoDB is strictly prohibited for use in core pipeline services (including the core decision pipeline, authentication, and the context store). These services must exclusively use PostgreSQL 16 and Redis. Any deviation requires a formal ADR.
+   - **Why:** To maintain architectural integrity and prevent fragmentation in the core tech stack. Previous attempts to introduce MongoDB for event queues nearly caused instability, highlighting the need for a hard, enforceable constraint.
+   - **Files:** `analytics/storage`, `infrastructure/database-policy`
+
+23. 🟡 **[MEDIUM]** The team decided to discontinue the use of EventStoreDB and removed event sourcing as an architectural pattern following the migration back to a monorepo.
+   - **Why:** The complexity of maintaining three separate runbooks for EventStoreDB operations outweighed the benefits of its auditability features for the current team size and system scale.
+
+24. 🟡 **[MEDIUM]** All internal API routes must adhere to the RFC 7807 error format, consistent with public-facing API routes.
+   - **Why:** Inconsistent error formats, specifically plain strings from internal routes, prevent AI tools from reliably parsing and analyzing errors, leading to broken analysis workflows.
+   - **Files:** `packages/api/src/routes/internal/`
+
+25. 🟡 **[MEDIUM]** We will use the `text-embedding-3-small` OpenAI model to generate 1536-dimension embeddings. These embeddings will be stored in the `knowledge_chunks` table within PostgreSQL. The HNSW index used for vector search will be configured with `ef_construction=200` and `m=16`.
+   - **Why:** The chosen HNSW parameters (`ef_construction=200` and `m=16`) are set to provide an optimal tradeoff between recall accuracy and search speed. The `text-embedding-3-small` model is selected for generating the text embeddings.
+   - **Files:** `packages/decision-store/src/schema.ts`
+
+26. 🟡 **[MEDIUM]** We decided to use cosine distance for semantic similarity search of text embeddings with pgvector HNSW for deduplication.
+   - **Why:** Cosine distance is invariant to vector magnitude, meaning it only considers the direction of vectors. This property is precisely what is desired for semantic similarity of text embeddings, as it allows for accurate comparison of semantic meaning regardless of variations in embedding vector norms. L2 (Euclidean) distance, on the other hand, would incorrectly penalize vectors with different magnitudes, even if they share the same semantic direction.
+
+27. 🟡 **[MEDIUM]** Implemented Redis semantic caching for LLM embedding calls. The cache key is a hash of the input text, model, and provider. The cache entries have a Time-To-Live (TTL) of 1 hour.
+   - **Why:** Redis was a natural extension since it is already in use for BullMQ and session caching. This implementation reduced redundant embedding calls by approximately 40% in tests.
+
+28. 🟡 **[MEDIUM]** The billing module, including Stripe integration, credit ledger, credit deduction logic, and Stripe webhook handlers, is owned by U05F9P78LTG. All changes to billing flows require their review.
+   - **Why:** This statement clarifies responsibility for the billing module and its components to ensure proper review and maintenance.
+   - **Files:** `packages/api/src/billing/`
diff --git a/.cursorrules b/.cursorrules
new file mode 100644
index 0000000..58ee49f
--- /dev/null
+++ b/.cursorrules
@@ -0,0 +1,152 @@
+# Project Decisions & Conventions
+# Auto-generated by Decispher — Do not edit manually
+
+## Email
+
+- Migrate all email services to Zoho and update the SMTP server infrastructure, including the implementation of new routing rules to block any traffic to the legacy SMTP server.
+  Rationale: The team decided to move to Zoho to consolidate mailing services and address the limitations or overhead associated with the existing legacy SMTP infrastructure.
+  Files: infrastructure/mail, services/smtp, config/email_routing
+
+## Infrastructure
+
+- Switch from the third-party Shipsy provider to an in-house developed mapping event system.
+  Rationale: The team identified that the Shipsy service was negatively impacting the platform's scalability, and moving to an internal solution reduces external dependencies.
+  Files: services/shipping-integration, infrastructure/event-bus
+
+## Api
+
+- We have decided to officially discontinue the use of RFC 7807 (Problem Details for HTTP APIs) for all API error responses moving forward.
+  Rationale: The team determined that the RFC 7807 specification is outdated and no longer aligns with the current requirements and standards of the API architecture.
+ Files: api/responses, api/error-handling + +## Vector-search + +- All new vector indexes must be created using the HNSW algorithm. Existing IVFFlat indexes (specifically in the llm_cache table) are to be migrated to HNSW in Sprint 16. + Rationale: HNSW is the current architectural standard for vector indexing. The previous rejection of the migration to HNSW was due to operational risks in production, not a lack of performance or technical suitability of HNSW. + Files: db/schema/vector_indexes, db/migrations/sprint_16/migrate_llm_cache_to_hnsw + +## Rfc-7807 + +- Adopt the HIGH severity specification as the authoritative version for the RFC 7807 error format, which includes fields: type, title, status, detail, and instance. + Rationale: The team identified that two existing conventions were redundant. Designating the HIGH severity entry as canonical while allowing the fusion engine to merge duplicate references ensures consistency across documentation and API implementations. + Files: packages/api/src/plugins/error-handler.ts + +## Billing + +- Sara is the primary owner of the billing module; all changes to the credit_ledger schema, DrizzleCreditRepository, and the EFFORT_MODE_CONFIGS require specific approvals from Sara and Ali. Furthermore, the system must strictly adhere to the append-only ledger constraint per ADR-019 and maintain SERIALIZABLE transaction requirements. + Rationale: To ensure accountability and maintain architectural integrity of the financial ledger and billing configuration, specific code ownership and structural constraints have been formalized. + Files: packages/api/src/routes/credits.ts, packages/decision-store/src/repositories/credit-repository.ts, packages/common/src/types/credits.ts + +- The billing module, including Stripe integration, credit ledger, credit deduction logic, and Stripe webhook handlers, is owned by U05F9P78LTG. All changes to billing flows require their review. 
+ Rationale: This statement clarifies responsibility for the billing module and its components to ensure proper review and maintenance. + Files: packages/api/src/billing/ + +## Llm + +- Establish explicit provider fallback orderings: For extraction, use Anthropic → DeepSeek → OpenAI. For detection, use Google → OpenAI → DeepSeek. + Rationale: To maintain system reliability and avoid task failure when individual LLM providers hit rate limits, a hierarchical fallback mechanism ensures work is diverted to alternative models before resorting to the Dead Letter Queue (DLQ) after retries. + Files: src/llm/client_factory.py, src/llm/fallback_logic.py + +## Aws + +- The team will migrate from AWS ECS to AWS EKS for container orchestration. + Rationale: EKS provides superior orchestration flexibility, including native Horizontal Pod Autoscaler and improved multi-AZ/multi-region failover capabilities, which are necessary for the current scale, outweighing the operational overhead of Kubernetes. + Files: infrastructure/terraform, infrastructure/k8s + +## Pgvector + +- We have standardized on cosine distance (using the <=> operator in pgvector) for all similarity search operations. + Rationale: Cosine distance provides significantly better recall (12% improvement) on normalized text embeddings compared to L2 distance. Furthermore, L2 distance is overly sensitive to embedding magnitude, making it less reliable for our specific use case. + Files: * + +- We decided to use cosine distance for semantic similarity search of text embeddings with pgvector HNSW for deduplication. + Rationale: Cosine distance is invariant to vector magnitude, meaning it only considers the direction of vectors. This property is precisely what is desired for semantic similarity of text embeddings, as it allows for accurate comparison of semantic meaning regardless of variations in embedding vector norms. 
L2 (Euclidean) distance, on the other hand, would incorrectly penalize vectors with different magnitudes, even if they share the same semantic direction. + Files: * + +## Mongodb + +- MongoDB is strictly prohibited for use in core pipeline services (including the core decision pipeline, authentication, and the context store). These services must exclusively use PostgreSQL 16 and Redis. Any deviation requires a formal ADR. + Rationale: To maintain architectural integrity and prevent fragmentation in the core tech stack. Previous attempts to introduce MongoDB for event queues nearly caused instability, highlighting the need for a hard, enforceable constraint. + Files: analytics/storage, infrastructure/database-policy + +- Use MongoDB Atlas specifically for the analytics event ingestion pipeline, while keeping all other core application data in PostgreSQL. + Rationale: MongoDB Atlas provides the necessary horizontal sharding and schemaless structure to handle the required 50k write operations per second, whereas PostgreSQL performance degrades under this load. + Files: services/analytics-webhook-handler, infrastructure/database-clusters + +- We will use MongoDB for the analytics events pipeline, provisioning a MongoDB Atlas cluster to handle the data. + Rationale: MongoDB offers 10x the write throughput compared to PostgreSQL for high-cardinality event data, which is essential to meet the current scale requirements. The previous constraint was established before these new scale demands emerged. + Files: packages/api/src/analytics/ + +- MongoDB is strictly prohibited from being integrated into the current technology stack, including for analytics events. + Rationale: There is an active and non-negotiable architectural constraint against MongoDB in the stack due to the critical requirement for ACID compliance across all billing and user data. MongoDB does not satisfy this fundamental requirement. 
+ Files: * + +## Postgresql + +- The core pipeline must exclusively use PostgreSQL 16 with pgvector and Redis; the use of MongoDB is strictly prohibited. + Rationale: Enforcing a specific database stack ensures architectural consistency, simplifies maintenance, and leverages existing infrastructure and expertise with PostgreSQL and pgvector. + Files: infrastructure/database, src/db/config.ts + +- Use PostgreSQL with pgvector and HNSW indexes as the standard solution for primary datastore and vector search operations. + Rationale: PostgreSQL with pgvector provides the ability to manage both SQL-based relational data and vector search capabilities within a single system, simplifying the architecture compared to managing separate databases. + Files: * + +- After evaluating MongoDB, DynamoDB, and PostgreSQL, we chose PostgreSQL 16 with pgvector HNSW indexes. Reason: vector similarity search, ACID guarantees, and single DB for both structured data and embeddings. + Rationale: After evaluating MongoDB, DynamoDB, and PostgreSQL, we chose PostgreSQL 16 with pgvector HNSW indexes. Reason: vector similarity search, ACID guarantees, and single DB for both structured data and embeddings. + Files: * + +## Event-sourcing + +- The team decided to discontinue the use of EventStoreDB and removed event sourcing as an architectural pattern following the migration back to a monorepo. + Rationale: The complexity of maintaining three separate runbooks for EventStoreDB operations outweighed the benefits of its auditability features for the current team size and system scale. + Files: * + +## API + +- All internal API routes must adhere to the RFC 7807 error format, consistent with public-facing API routes. + Rationale: Inconsistent error formats, specifically plain strings from internal routes, prevent AI tools from reliably parsing and analyzing errors, leading to broken analysis workflows. 
+ Files: packages/api/src/routes/internal/ + +## LLM + +- Each LLM pipeline step (detection, extraction, formatting) has its own provider configuration managed via environment variables. An 'effort mode' concept allows overriding these configurations per company at request time, defining specific LLM models for different quality/cost tiers: Saver uses gemini-flash, Balanced mixes gemini-flash, claude-haiku, and gpt-4o-mini, Pro uses claude-sonnet for extraction, and Super uses claude-opus. + Rationale: The strategy is designed to provide flexibility and optimization across different pipeline steps and 'effort modes'. By configuring providers per step and allowing overrides based on company effort modes, the system can balance cost, performance, and model quality according to specific requirements, from 'Saver' (likely cost-optimized) to 'Super' (likely highest quality/cost). The multi-provider abstraction facilitates this dynamic selection. + Files: packages/analyzer/ + +- The specific LLM model combinations for the multi-provider effort modes were finalized: Saver mode uses `gemini-flash` for detection, extraction, and format. Balanced mode uses `gemini-flash` for detection, `claude-haiku` for extraction, and `gpt-4o-mini` for format. Pro mode uses `gemini-flash` for detection, `claude-sonnet` for extraction, and `gpt-4o-mini` for format. Super mode uses `gemini-flash` for detection, `claude-opus` for extraction, and `claude-sonnet` for format. + Rationale: The chosen LLM model combinations for each effort mode (Saver, Balanced, Pro, Super) were selected to provide different performance and cost profiles, aligning with the multi-provider strategy. Cost analysis confirmed that the proposed combinations, ranging from ~$0.08/1M tokens for Saver to ~$4.50/1M tokens for Super, ensure fine margins at current credit pricing. 
+ Files: * + +- We will implement a multi-provider abstraction where each pipeline step (detection, extraction, enrichment, formatting) has its own LLM provider configuration via environment variables. At request time, an 'effort mode' can override the provider selection on a per-company basis. + Rationale: This approach allows companies with high context volume (Tier 3+) to pay extra for Claude-Sonnet's accuracy where needed, while companies with tighter budgets can use more cost-effective options like Gemini-Flash for all steps. It also decouples our infrastructure from individual LLM vendor stability and enables independent contract negotiations with different providers (Anthropic, OpenAI, Google). + Files: * + +## Vector-database + +- We will use the `text-embedding-3-small` OpenAI model to generate 1536-dimension embeddings. These embeddings will be stored in the `knowledge_chunks` table within PostgreSQL. The HNSW index used for vector search will be configured with `ef_construction=200` and `m=16`. + Rationale: The chosen HNSW parameters (`ef_construction=200` and `m=16`) are set to provide an optimal tradeoff between recall accuracy and search speed. The `text-embedding-3-small` model is selected for generating the text embeddings. + Files: packages/decision-store/src/schema.ts + +## Redis + +- Implemented Redis semantic caching for LLM embedding calls. The cache key is a hash of the input text, model, and provider. The cache entries have a Time-To-Live (TTL) of 1 hour. + Rationale: Redis was a natural extension since it is already in use for BullMQ and session caching. This implementation reduced redundant embedding calls by approximately 40% in tests. + Files: * + +## Migration + +- The trigger metric for initiating the AWS migration has been adjusted from 20 paying customers to 30 paying customers. The Q3 2026 timeline for the migration still holds. + Rationale: This adjustment is due to Railway costs being more predictable than initially expected. 
Additionally, the VPC isolation requirement, which was a significant factor, only applies to enterprise customers, a segment we are targeting at a later stage. + Files: * + +## Architecture + +- Defer the adoption of a microservices architecture and continue with a monorepo architecture that uses shared packages. Microservices will be revisited when the team reaches 8 or more members. + Rationale: An earlier attempt (Phase 1) to split the recorder and analyzer into separate gRPC services created severe deployment complexity for a 3-person team. Approximately 40% of the team's time went to debugging inter-service authentication and network failures, which made the split unmanageable at the current team size. + Files: * + +## Database + +- MongoDB is strictly prohibited in this stack due to its lack of ACID compliance. PostgreSQL will be used as the primary datastore for all persistent data, especially critical billing and user data. Redis will be used exclusively for caching purposes. + Rationale: ACID compliance is a non-negotiable requirement for billing and user data to guarantee data integrity and consistency. PostgreSQL provides robust ACID transaction support. Adopting a standardized approach with PostgreSQL and Redis simplifies the technology stack and enforces critical data integrity requirements.
+ Files: * diff --git a/.decispher/context-rules.json b/.decispher/context-rules.json new file mode 100644 index 0000000..ee6f71a --- /dev/null +++ b/.decispher/context-rules.json @@ -0,0 +1,1514 @@ +{ + "$schema": "https://decispher.dev/schemas/context-rules/v1.json", + "specVersion": "1.0.0", + "generatedAt": "2026-05-01T23:41:20.866Z", + "companyId": "1489dcdc-ef7f-4cc5-b0cb-b453efa059f4", + "meta": { + "totalRules": 27, + "rulesByType": { + "decision": 18, + "convention": 2, + "constraint": 3, + "rationale": 0, + "ownership": 2, + "history": 1, + "plan": 1 + }, + "rulesBySeverity": { + "CRITICAL": 3, + "HIGH": 14, + "MEDIUM": 10, + "LOW": 0 + }, + "generator": "decispher-ai-blocker@1.0.0" + }, + "rules": [ + { + "id": "e9097831-0c7b-43f1-895a-b115dd36c20b", + "type": "decision", + "title": "Migrate email service to Zoho and update SMTP infrastructure", + "problem": "Need to replace the existing webmaster mailing service and transition away from the current SMTP server.", + "decision": "Migrate all email services to Zoho and update the SMTP server infrastructure, including the implementation of new routing rules to block any traffic to the legacy SMTP server.", + "rationale": "The team decided to move to Zoho to consolidate mailing services and address the limitations or overhead associated with the existing legacy SMTP infrastructure.", + "severity": "HIGH", + "status": "active", + "confidence": 0.5, + "affectedFiles": [ + "infrastructure/mail", + "services/smtp", + "config/email_routing" + ], + "tags": [ + "email", + "infrastructure", + "zoho", + "smtp", + "migration" + ], + "alternatives": [], + "sources": [ + { + "type": "slack", + "confidence": 0.5, + "ref": { + "priority": "high", + "threadTs": "1777377502.008759", + "channelId": "C0ALKBAGZQS", + "sourceUrl": null, + "channelName": null, + "slackTeamId": null, + "triggeredBy": "U05F9P78LTG", + "detectionMethod": "explicit", + "triggeredByName": null + }, + "snippet": "Migrate all email services to Zoho and 
update the SMTP server infrastructure, including the implementation of new routing rules to block any traffic to the legacy SMTP server." + } + ], + "supersededBy": null, + "enforcement": { + "blockOnViolation": false, + "requiresExplicitOverride": false, + "filePatterns": [ + "infrastructure/mail", + "services/smtp", + "config/email_routing" + ], + "exemptPatterns": [] + }, + "createdAt": "2026-04-28T12:00:42.011Z", + "updatedAt": "2026-04-28T16:34:25.219Z", + "version": 1, + "createdBy": "U05F9P78LTG", + "reviewedBy": "U05F9P78LTG" + }, + { + "id": "bd1748c1-766b-4f12-9819-536593b3c5ed", + "type": "decision", + "title": "Migrate from Shipsy to in-house mapping event system", + "problem": "Dependency on third-party provider Shipsy is causing scalability issues.", + "decision": "Switch from the third-party Shipsy provider to an in-house developed mapping event system.", + "rationale": "The team identified that the Shipsy service was negatively impacting the platform's scalability, and moving to an internal solution reduces external dependencies.", + "severity": "HIGH", + "status": "active", + "confidence": 0.5, + "affectedFiles": [ + "services/shipping-integration", + "infrastructure/event-bus" + ], + "tags": [ + "infrastructure", + "scalability", + "backend", + "migration" + ], + "alternatives": [ + { + "option": "Continue using Shipsy", + "reasonRejected": "It acts as a bottleneck for system scalability." + } + ], + "sources": [ + { + "type": "slack", + "confidence": 0.5, + "ref": { + "priority": "high", + "threadTs": "1777232325.309009", + "channelId": "C0ALJ121XLZ", + "sourceUrl": null, + "channelName": null, + "slackTeamId": null, + "triggeredBy": "U05F9P78LTG", + "detectionMethod": "explicit", + "triggeredByName": null + }, + "snippet": "Switch from the third-party Shipsy provider to an in-house developed mapping event system." 
+ } + ], + "supersededBy": null, + "enforcement": null, + "createdAt": "2026-04-26T19:40:00.437Z", + "updatedAt": "2026-04-26T19:40:40.165Z", + "version": 1, + "createdBy": "U05F9P78LTG", + "reviewedBy": "U05F9P78LTG" + }, + { + "id": "d01a8dd8-f37c-406d-8050-d98eaab82da0", + "type": "decision", + "title": "Abandon RFC 7807 for error responses", + "problem": "The current API error handling standard (RFC 7807) is considered outdated for the team's needs.", + "decision": "We have decided to officially discontinue the use of RFC 7807 (Problem Details for HTTP APIs) for all API error responses moving forward.", + "rationale": "The team determined that the RFC 7807 specification is outdated and no longer aligns with the current requirements and standards of the API architecture.", + "severity": "HIGH", + "status": "active", + "confidence": 0.74, + "affectedFiles": [ + "api/responses", + "api/error-handling" + ], + "tags": [ + "api", + "rfc7807", + "standards", + "backend" + ], + "alternatives": [], + "sources": [ + { + "type": "slack", + "confidence": 0.5, + "ref": { + "priority": "high", + "threadTs": "1776879200.206639", + "channelId": "C0ALJ121XLZ", + "sourceUrl": "https://newworkspace-zdx9462.slack.com/archives/C0ALJ121XLZ/p1776879200206639?thread_ts=1776879200.206639&cid=C0ALJ121XLZ", + "channelName": "decispher-test-1", + "slackTeamId": null, + "triggeredBy": "U05F9P78LTG", + "detectionMethod": "explicit", + "triggeredByName": "Ali Abbas" + }, + "snippet": "We have decided to officially discontinue the use of RFC 7807 (Problem Details for HTTP APIs) for all API error responses moving forward." 
+ } + ], + "supersededBy": "127df168-2a46-4b88-a207-e6887cc4183f", + "enforcement": { + "blockOnViolation": false, + "requiresExplicitOverride": false, + "filePatterns": [ + "api/responses", + "api/error-handling" + ], + "exemptPatterns": [] + }, + "createdAt": "2026-04-22T17:35:14.262Z", + "updatedAt": "2026-04-22T17:37:25.303Z", + "version": 1, + "createdBy": "U05F9P78LTG", + "reviewedBy": "df51fb32-47a2-48b2-ac32-3887a582a966" + }, + { + "id": "f3db51f2-6914-415f-9fca-78783563463d", + "type": "decision", + "title": "Standardize on HNSW for new vector indexes", + "problem": "Uncertainty regarding whether the rejection of IVFFlat to HNSW migration applied to the index technology choice or the migration process itself.", + "decision": "All new vector indexes must be created using the HNSW algorithm. Existing IVFFlat indexes (specifically in the llm_cache table) are to be migrated to HNSW in Sprint 16.", + "rationale": "HNSW is the current architectural standard for vector indexing. The previous rejection of the migration to HNSW was due to operational risks in production, not a lack of performance or technical suitability of HNSW.", + "severity": "MEDIUM", + "status": "active", + "confidence": 0.48, + "affectedFiles": [ + "db/schema/vector_indexes", + "db/migrations/sprint_16/migrate_llm_cache_to_hnsw" + ], + "tags": [ + "vector-search", + "postgresql", + "hnsw", + "architecture", + "database" + ], + "alternatives": [ + { + "option": "IVFFlat", + "reasonRejected": "The team has standardized on HNSW for new indexes to maintain architectural consistency, despite potential performance profiles for specific query patterns." 
+ } + ], + "sources": [ + { + "type": "slack", + "confidence": 0.48, + "ref": { + "priority": "high", + "threadTs": "1776840745.000000", + "channelId": "C_DB_PERFORMANCE", + "sourceUrl": null, + "channelName": "db-performance", + "slackTeamId": "T05FJS3A8JG", + "triggeredBy": null, + "detectionMethod": "explicit", + "triggeredByName": null + }, + "snippet": "All new vector indexes must be created using the HNSW algorithm. Existing IVFFlat indexes (specifically in the llm_cache table) are to be migrated to HNSW in Sprint 16." + } + ], + "supersededBy": null, + "enforcement": { + "blockOnViolation": false, + "requiresExplicitOverride": false, + "filePatterns": [ + "db/schema/vector_indexes", + "db/migrations/sprint_16/migrate_llm_cache_to_hnsw" + ], + "exemptPatterns": [] + }, + "createdAt": "2026-04-22T06:15:23.717Z", + "updatedAt": "2026-04-22T11:44:22.889Z", + "version": 1, + "createdBy": "U_REZA", + "reviewedBy": "df51fb32-47a2-48b2-ac32-3887a582a966" + }, + { + "id": "17f772b1-8901-4961-834a-da4fbcf68132", + "type": "convention", + "title": "Establish authoritative RFC 7807 error format convention", + "problem": "Duplicate and conflicting conventions regarding RFC 7807 error format severity were documented in the Decispher system.", + "decision": "Adopt the HIGH severity specification as the authoritative version for the RFC 7807 error format, which includes fields: type, title, status, detail, and instance.", + "rationale": "The team identified that two existing conventions were redundant. 
Designating the HIGH severity entry as canonical while allowing the fusion engine to merge duplicate references ensures consistency across documentation and API implementations.", + "severity": "MEDIUM", + "status": "active", + "confidence": 0.45, + "affectedFiles": [ + "packages/api/src/plugins/error-handler.ts" + ], + "tags": [ + "rfc-7807", + "api-design", + "error-handling", + "decispher" + ], + "alternatives": [ + { + "option": "MEDIUM severity specification", + "reasonRejected": "The HIGH severity version was explicitly selected as the authoritative and canonical standard." + } + ], + "sources": [ + { + "type": "slack", + "confidence": 0.45, + "ref": { + "priority": "high", + "threadTs": "1776840345.000000", + "channelId": "C_API_DESIGN", + "sourceUrl": null, + "channelName": "api-design", + "slackTeamId": "T05FJS3A8JG", + "triggeredBy": null, + "detectionMethod": "explicit", + "triggeredByName": null + }, + "snippet": "Adopt the HIGH severity specification as the authoritative version for the RFC 7807 error format, which includes fields: type, title, status, detail, and instance." 
+ } + ], + "supersededBy": null, + "enforcement": { + "blockOnViolation": false, + "requiresExplicitOverride": false, + "filePatterns": [ + "packages/api/src/plugins/error-handler.ts" + ], + "exemptPatterns": [] + }, + "createdAt": "2026-04-22T06:14:38.568Z", + "updatedAt": "2026-04-22T11:44:39.580Z", + "version": 1, + "createdBy": "U_NADIA", + "reviewedBy": "df51fb32-47a2-48b2-ac32-3887a582a966" + }, + { + "id": "08631b5a-a39c-4766-b840-beb3a13fde5e", + "type": "ownership", + "title": "Establish ownership and modification constraints for credits and billing system", + "problem": "Uncertainty regarding ownership of the billing module and the requirements for implementing new effort modes.", + "decision": "Sara is the primary owner of the billing module; all changes to the credit_ledger schema, DrizzleCreditRepository, and the EFFORT_MODE_CONFIGS require specific approvals from Sara and Ali. Furthermore, the system must strictly adhere to the append-only ledger constraint per ADR-019 and maintain SERIALIZABLE transaction requirements.", + "rationale": "To ensure accountability and maintain architectural integrity of the financial ledger and billing configuration, specific code ownership and structural constraints have been formalized.", + "severity": "HIGH", + "status": "active", + "confidence": 0.43, + "affectedFiles": [ + "packages/api/src/routes/credits.ts", + "packages/decision-store/src/repositories/credit-repository.ts", + "packages/common/src/types/credits.ts" + ], + "tags": [ + "billing", + "ownership", + "credits", + "compliance" + ], + "alternatives": [], + "sources": [ + { + "type": "slack", + "confidence": 0.43, + "ref": { + "priority": "high", + "threadTs": "1776839945.000000", + "channelId": "C_ENG_GENERAL", + "sourceUrl": null, + "references": [ + "ADR-019" + ], + "channelName": "engineering-general", + "slackTeamId": "T05FJS3A8JG", + "triggeredBy": null, + "detectionMethod": "explicit", + "triggeredByName": null + }, + "snippet": "Sara is the primary 
owner of the billing module; all changes to the credit_ledger schema, DrizzleCreditRepository, and the EFFORT_MODE_CONFIGS require specific approvals from Sara and Ali. Furthermore, the system must st…" + } + ], + "supersededBy": null, + "enforcement": { + "blockOnViolation": false, + "requiresExplicitOverride": false, + "filePatterns": [ + "packages/api/src/routes/credits.ts", + "packages/decision-store/src/repositories/credit-repository.ts", + "packages/common/src/types/credits.ts" + ], + "exemptPatterns": [] + }, + "createdAt": "2026-04-22T06:14:04.546Z", + "updatedAt": "2026-04-22T11:43:40.979Z", + "version": 1, + "createdBy": "U_SARA", + "reviewedBy": "df51fb32-47a2-48b2-ac32-3887a582a966" + }, + { + "id": "ac87f0e5-a048-4197-a533-31b2a71ea5a8", + "type": "decision", + "title": "Define Model Fallback Ordering Strategy for API Rate Limits", + "problem": "Handling 429 rate limit errors from LLM providers during extraction and detection tasks.", + "decision": "Establish explicit provider fallback orderings: For extraction, use Anthropic → DeepSeek → OpenAI. 
For detection, use Google → OpenAI → DeepSeek.", + "rationale": "To maintain system reliability and avoid task failure when individual LLM providers hit rate limits, a hierarchical fallback mechanism ensures work is diverted to alternative models before resorting to the Dead Letter Queue (DLQ) after retries.", + "severity": "HIGH", + "status": "active", + "confidence": 0.5, + "affectedFiles": [ + "src/llm/client_factory.py", + "src/llm/fallback_logic.py" + ], + "tags": [ + "llm", + "reliability", + "rate-limiting", + "architecture" + ], + "alternatives": [], + "sources": [ + { + "type": "slack", + "confidence": 0.5, + "ref": { + "priority": "high", + "threadTs": "1776839545.000000", + "channelId": "C_AI_PIPELINE", + "sourceUrl": null, + "references": [ + "ADR-003", + "ADR-013" + ], + "channelName": "ai-pipeline", + "slackTeamId": "T05FJS3A8JG", + "triggeredBy": null, + "detectionMethod": "explicit", + "triggeredByName": null + }, + "snippet": "Establish explicit provider fallback orderings: For extraction, use Anthropic → DeepSeek → OpenAI. For detection, use Google → OpenAI → DeepSeek." 
+ } + ], + "supersededBy": null, + "enforcement": { + "blockOnViolation": false, + "requiresExplicitOverride": false, + "filePatterns": [ + "src/llm/client_factory.py", + "src/llm/fallback_logic.py" + ], + "exemptPatterns": [] + }, + "createdAt": "2026-04-22T06:12:54.597Z", + "updatedAt": "2026-04-22T11:45:21.636Z", + "version": 1, + "createdBy": "U_FATIMA", + "reviewedBy": "df51fb32-47a2-48b2-ac32-3887a582a966" + }, + { + "id": "add28e0b-7a4c-4b40-9679-0d488565bb4b", + "type": "decision", + "title": "Migrate infrastructure orchestration from AWS ECS to AWS EKS", + "problem": "AWS ECS lacks built-in support for multi-region failover without complex custom routing and requires additional tooling for advanced scaling capabilities.", + "decision": "The team will migrate from AWS ECS to AWS EKS for container orchestration.", + "rationale": "EKS provides superior orchestration flexibility, including native Horizontal Pod Autoscaler and improved multi-AZ/multi-region failover capabilities, which are necessary for the current scale, outweighing the operational overhead of Kubernetes.", + "severity": "HIGH", + "status": "active", + "confidence": 0.48, + "affectedFiles": [ + "infrastructure/terraform", + "infrastructure/k8s" + ], + "tags": [ + "aws", + "eks", + "ecs", + "kubernetes", + "infrastructure", + "migration" + ], + "alternatives": [ + { + "option": "AWS ECS", + "reasonRejected": "Lacks sufficient multi-region failover support and requires custom routing implementations." + }, + { + "option": "Railway", + "reasonRejected": "Retained only as a temporary fallback, deemed insufficient for long-term production orchestration." 
+ } + ], + "sources": [ + { + "type": "slack", + "confidence": 0.48, + "ref": { + "priority": "high", + "threadTs": "1776839145.000000", + "channelId": "C_DEVOPS", + "sourceUrl": null, + "channelName": "devops", + "slackTeamId": "T05FJS3A8JG", + "triggeredBy": null, + "detectionMethod": "explicit", + "triggeredByName": null + }, + "snippet": "The team will migrate from AWS ECS to AWS EKS for container orchestration." + } + ], + "supersededBy": null, + "enforcement": { + "blockOnViolation": false, + "requiresExplicitOverride": false, + "filePatterns": [ + "infrastructure/terraform", + "infrastructure/k8s" + ], + "exemptPatterns": [] + }, + "createdAt": "2026-04-22T06:11:59.569Z", + "updatedAt": "2026-04-22T11:43:57.133Z", + "version": 1, + "createdBy": "U_OMAR", + "reviewedBy": "df51fb32-47a2-48b2-ac32-3887a582a966" + }, + { + "id": "75490c8c-3a3d-4253-a8ff-9768e6359aaf", + "type": "decision", + "title": "Use cosine distance for pgvector similarity searches", + "problem": "Determine the optimal distance metric for embedding similarity search in pgvector to maximize recall.", + "decision": "We have standardized on cosine distance (using the <=> operator in pgvector) for all similarity search operations.", + "rationale": "Cosine distance provides significantly better recall (12% improvement) on normalized text embeddings compared to L2 distance. Furthermore, L2 distance is overly sensitive to embedding magnitude, making it less reliable for our specific use case.", + "severity": "MEDIUM", + "status": "active", + "confidence": 0.5, + "affectedFiles": [], + "tags": [ + "pgvector", + "postgresql", + "embeddings", + "vector-search" + ], + "alternatives": [ + { + "option": "L2 distance", + "reasonRejected": "It is sensitive to embedding magnitude and demonstrated poorer recall compared to cosine distance for our data." 
+ } + ], + "sources": [ + { + "type": "slack", + "confidence": 0.5, + "ref": { + "priority": "high", + "threadTs": "1776838845.000000", + "channelId": "C_ML_PLATFORM", + "sourceUrl": null, + "channelName": "ml-platform", + "slackTeamId": "T05FJS3A8JG", + "triggeredBy": null, + "detectionMethod": "explicit", + "triggeredByName": null + }, + "snippet": "We have standardized on cosine distance (using the <=> operator in pgvector) for all similarity search operations." + } + ], + "supersededBy": null, + "enforcement": { + "blockOnViolation": false, + "requiresExplicitOverride": false, + "filePatterns": [], + "exemptPatterns": [] + }, + "createdAt": "2026-04-22T06:10:45.689Z", + "updatedAt": "2026-04-22T11:44:59.552Z", + "version": 1, + "createdBy": "U_FATIMA", + "reviewedBy": "df51fb32-47a2-48b2-ac32-3887a582a966" + }, + { + "id": "a06bac38-381d-472f-852d-f902db9a70c4", + "type": "decision", + "title": "Use MongoDB Atlas for analytics event ingestion", + "problem": "The team needs a high-throughput storage solution for analytics event ingestion, but is restricted to PostgreSQL and Redis for general data storage.", + "decision": "MongoDB is strictly prohibited for use in core pipeline services (including the core decision pipeline, authentication, and the context store). These services must exclusively use PostgreSQL 16 and Redis. Any deviation requires a formal ADR.", + "rationale": "To maintain architectural integrity and prevent fragmentation in the core tech stack. 
Previous attempts to introduce MongoDB for event queues nearly caused instability, highlighting the need for a hard, enforceable constraint.", + "severity": "MEDIUM", + "status": "active", + "confidence": 0.48, + "affectedFiles": [ + "analytics/storage", + "infrastructure/database-policy" + ], + "tags": [ + "mongodb", + "postgresql", + "analytics", + "infrastructure" + ], + "alternatives": [ + { + "option": "PostgreSQL partitioned tables", + "reasonRejected": "The team expressed concern that it may struggle with the required write throughput of 50k events per second." + } + ], + "sources": [ + { + "type": "slack", + "confidence": 0.48, + "ref": { + "priority": "high", + "threadTs": "1776838445.000000", + "channelId": "C_DATA_PIPELINE", + "sourceUrl": null, + "channelName": "data-pipeline", + "slackTeamId": "T05FJS3A8JG", + "triggeredBy": null, + "detectionMethod": "explicit", + "triggeredByName": null + }, + "snippet": "MongoDB Atlas is permitted exclusively for analytics event ingestion, while all other core application data must remain on PostgreSQL." + }, + { + "type": "slack", + "confidence": 0.5, + "ref": { + "priority": "high", + "threadTs": "1777674096.801579", + "channelId": "C0ALKBAGZQS", + "sourceUrl": "https://newworkspace-zdx9462.slack.com/archives/C0ALKBAGZQS/p1777674096801579?thread_ts=1777674096.801579&cid=C0ALKBAGZQS", + "channelName": "bot-test-1", + "slackTeamId": null, + "triggeredBy": "U05F9P78LTG", + "detectionMethod": "explicit", + "triggeredByName": "Ali Abbas" + }, + "snippet": "MongoDB is strictly prohibited for use in core pipeline services (including the core decision pipeline, authentication, and the context store). These services must exclusively use PostgreSQL 16 and Redis. 
Any deviation r…" + } + ], + "supersededBy": null, + "enforcement": { + "blockOnViolation": false, + "requiresExplicitOverride": false, + "filePatterns": [ + "analytics/storage", + "infrastructure/database-policy" + ], + "exemptPatterns": [] + }, + "createdAt": "2026-04-22T06:09:36.210Z", + "updatedAt": "2026-05-01T23:41:10.725Z", + "version": 2, + "createdBy": "U_REZA", + "reviewedBy": "df51fb32-47a2-48b2-ac32-3887a582a966" + }, + { + "id": "8d730075-9042-46a9-b163-9bedcaaee371", + "type": "constraint", + "title": "Prohibit MongoDB and mandate PostgreSQL for core pipelines", + "problem": "Need to define and enforce the database technology stack to ensure system consistency and data integrity.", + "decision": "The core pipeline must exclusively use PostgreSQL 16 with pgvector and Redis; the use of MongoDB is strictly prohibited.", + "rationale": "Enforcing a specific database stack ensures architectural consistency, simplifies maintenance, and leverages existing infrastructure and expertise with PostgreSQL and pgvector.", + "severity": "CRITICAL", + "status": "active", + "confidence": 0.48, + "affectedFiles": [ + "infrastructure/database", + "src/db/config.ts" + ], + "tags": [ + "postgresql", + "database", + "mongodb", + "infrastructure", + "backend" + ], + "alternatives": [ + { + "option": "MongoDB", + "reasonRejected": "Prohibited to maintain stack consistency and data integrity requirements." + } + ], + "sources": [ + { + "type": "slack", + "confidence": 0.48, + "ref": { + "priority": "high", + "threadTs": "1776838145.000000", + "channelId": "C_BACKEND_INFRA", + "sourceUrl": null, + "channelName": "backend-infra", + "slackTeamId": "T05FJS3A8JG", + "triggeredBy": null, + "detectionMethod": "explicit", + "triggeredByName": null + }, + "snippet": "The core pipeline must exclusively use PostgreSQL 16 with pgvector and Redis; the use of MongoDB is strictly prohibited." 
+ } + ], + "supersededBy": null, + "enforcement": { + "blockOnViolation": false, + "requiresExplicitOverride": false, + "filePatterns": [ + "infrastructure/database", + "src/db/config.ts" + ], + "exemptPatterns": [] + }, + "createdAt": "2026-04-22T06:07:48.054Z", + "updatedAt": "2026-04-22T11:43:24.579Z", + "version": 1, + "createdBy": "U_NADIA", + "reviewedBy": "df51fb32-47a2-48b2-ac32-3887a582a966" + }, + { + "id": "81145171-98f5-43da-b922-9c76d78741b5", + "type": "decision", + "title": "Standardize on PostgreSQL with pgvector for primary storage and vector search", + "problem": "Selecting the primary datastore to handle both standard relational data and vector search requirements efficiently.", + "decision": "Use PostgreSQL with pgvector and HNSW indexes as the standard solution for primary datastore and vector search operations.", + "rationale": "PostgreSQL with pgvector provides the ability to manage both SQL-based relational data and vector search capabilities within a single system, simplifying the architecture compared to managing separate databases.", + "severity": "HIGH", + "status": "active", + "confidence": 0.5, + "affectedFiles": [], + "tags": [ + "postgresql", + "pgvector", + "database", + "infrastructure", + "vector-search" + ], + "alternatives": [ + { + "option": "MongoDB", + "reasonRejected": "The team preferred the relational capabilities of PostgreSQL and the unified support for vector search provided by pgvector." + }, + { + "option": "CockroachDB", + "reasonRejected": "The team decided that PostgreSQL with pgvector was sufficient and preferred over the complexity or features offered by CockroachDB." 
+ } + ], + "sources": [ + { + "type": "slack", + "confidence": 0.5, + "ref": { + "priority": "high", + "threadTs": "1776837955.000000", + "channelId": "C_ENG_GENERAL", + "sourceUrl": null, + "channelName": "engineering-general", + "slackTeamId": "T05FJS3A8JG", + "triggeredBy": null, + "detectionMethod": "explicit", + "triggeredByName": null + }, + "snippet": "Use PostgreSQL with pgvector and HNSW indexes as the standard solution for primary datastore and vector search operations." + } + ], + "supersededBy": null, + "enforcement": { + "blockOnViolation": false, + "requiresExplicitOverride": false, + "filePatterns": [], + "exemptPatterns": [] + }, + "createdAt": "2026-04-22T06:06:49.679Z", + "updatedAt": "2026-04-22T11:44:09.055Z", + "version": 1, + "createdBy": "U_ALI", + "reviewedBy": "df51fb32-47a2-48b2-ac32-3887a582a966" + }, + { + "id": "3213c504-c4a3-4364-bc9f-3e3213242b7a", + "type": "decision", + "title": "Use MongoDB Atlas for schemaless analytics webhook storage", + "problem": "", + "decision": "Use MongoDB Atlas specifically for the analytics event ingestion pipeline, while keeping all other core application data in PostgreSQL.", + "rationale": "MongoDB Atlas provides the necessary horizontal sharding and schemaless structure to handle the required 50k write operations per second, whereas PostgreSQL performance degrades under this load.", + "severity": "HIGH", + "status": "active", + "confidence": 0.5, + "affectedFiles": [ + "services/analytics-webhook-handler", + "infrastructure/database-clusters" + ], + "tags": [ + "mongodb", + "postgresql", + "analytics", + "database", + "infrastructure", + "fusion:contradicts" + ], + "alternatives": [ + { + "option": "PostgreSQL JSONB", + "reasonRejected": "Proved too difficult and inefficient to index effectively for schemaless event data." 
+ } + ], + "sources": [ + { + "type": "slack", + "confidence": 0.5, + "ref": { + "priority": "high", + "threadTs": "1776590790.779709", + "channelId": "C0ALJ121XLZ", + "channelName": "decispher-test-1", + "triggeredBy": "U05F9P78LTG", + "detectionMethod": "explicit" + }, + "snippet": "Use MongoDB Atlas as the primary datastore for all raw analytics event logs, bypassing the existing PostgreSQL setup for this specific data type." + }, + { + "type": "slack", + "confidence": 0.5, + "ref": { + "priority": "high", + "threadTs": "1777674632.806469", + "channelId": "C0ALKBAGZQS", + "sourceUrl": "https://newworkspace-zdx9462.slack.com/archives/C0ALKBAGZQS/p1777674632806469?thread_ts=1777674632.806469&cid=C0ALKBAGZQS", + "channelName": "bot-test-1", + "slackTeamId": null, + "triggeredBy": "U05F9P78LTG", + "detectionMethod": "explicit", + "triggeredByName": "Ali Abbas" + }, + "snippet": "Use MongoDB Atlas specifically for the analytics event ingestion pipeline, while keeping all other core application data in PostgreSQL." 
+ } + ], + "supersededBy": null, + "enforcement": { + "blockOnViolation": false, + "requiresExplicitOverride": false, + "filePatterns": [ + "services/analytics-webhook-handler", + "infrastructure/database-clusters" + ], + "exemptPatterns": [] + }, + "createdAt": "2026-04-19T09:26:59.727Z", + "updatedAt": "2026-05-01T23:41:07.751Z", + "version": 2, + "createdBy": "U05F9P78LTG", + "reviewedBy": "df51fb32-47a2-48b2-ac32-3887a582a966" + }, + { + "id": "7b6d8f24-86b1-44f7-8279-529084ec8cc3", + "type": "history", + "title": "Abandoning EventStoreDB for monorepo event handling", + "problem": "The team was experiencing excessive operational overhead and complexity managing EventStoreDB for event sourcing, which did not provide enough value regarding auditability at their current scale.", + "decision": "The team decided to discontinue the use of EventStoreDB and removed event sourcing as an architectural pattern following the migration back to a monorepo.", + "rationale": "The complexity of maintaining three separate runbooks for EventStoreDB operations outweighed the benefits of its auditability features for the current team size and system scale.", + "severity": "MEDIUM", + "status": "active", + "confidence": 0.48, + "affectedFiles": [], + "tags": [ + "event-sourcing", + "eventstoredb", + "monorepo", + "architecture", + "infrastructure" + ], + "alternatives": [ + { + "option": "Retaining EventStoreDB for event sourcing", + "reasonRejected": "The operational complexity was too high and not justified by the benefits gained." 
+ } + ], + "sources": [ + { + "type": "slack", + "confidence": 0.48, + "ref": { + "priority": "high", + "threadTs": "1776558094.932279", + "channelId": "C0ALJ121XLZ", + "channelName": "decispher-test-1", + "triggeredBy": "U05F9P78LTG", + "detectionMethod": "explicit" + } + } + ], + "supersededBy": null, + "enforcement": { + "blockOnViolation": false, + "requiresExplicitOverride": false, + "filePatterns": [], + "exemptPatterns": [] + }, + "createdAt": "2026-04-19T00:22:20.176Z", + "updatedAt": "2026-04-19T00:22:57.812Z", + "version": 1, + "createdBy": "U05F9P78LTG", + "reviewedBy": "U05F9P78LTG" + }, + { + "id": "d17917ac-183e-43cf-b52a-894b44e7eb32", + "type": "convention", + "title": "Enforce RFC 7807 for Internal API Error Formats", + "problem": "Internal API routes are returning plain strings instead of adhering to the RFC 7807 error format, which breaks AI tools that parse our errors due to inconsistent formats.", + "decision": "All internal API routes must adhere to the RFC 7807 error format, consistent with public-facing API routes.", + "rationale": "Inconsistent error formats, specifically plain strings from internal routes, prevent AI tools from reliably parsing and analyzing errors, leading to broken analysis workflows.", + "severity": "MEDIUM", + "status": "active", + "confidence": 0.48, + "affectedFiles": [ + "packages/api/src/routes/internal/" + ], + "tags": [ + "API", + "error handling", + "RFC 7807", + "internal API", + "convention" + ], + "alternatives": [], + "sources": [ + { + "type": "slack", + "confidence": 0.48, + "ref": { + "priority": "high", + "threadTs": "1776553405.605959", + "channelId": "C0ALJ121XLZ", + "channelName": "decispher-test-1", + "triggeredBy": "U05F9P78LTG", + "detectionMethod": "explicit" + } + } + ], + "supersededBy": null, + "enforcement": null, + "createdAt": "2026-04-18T23:03:47.745Z", + "updatedAt": "2026-04-18T23:05:16.624Z", + "version": 1, + "createdBy": "U05F9P78LTG", + "reviewedBy": "U05F9P78LTG" + }, + { + "id": 
"465c1e1a-bb9e-4b3f-ade4-dc7674a5870a", + "type": "decision", + "title": "Use MongoDB for Analytics Events Pipeline", + "problem": "PostgreSQL's write throughput is insufficient for high-cardinality analytics event data, failing to meet new scale requirements.", + "decision": "We will use MongoDB for the analytics events pipeline, provisioning a MongoDB Atlas cluster to handle the data.", + "rationale": "MongoDB offers 10x the write throughput compared to PostgreSQL for high-cardinality event data, which is essential to meet the current scale requirements. The previous constraint was established before these new scale demands emerged.", + "severity": "HIGH", + "status": "active", + "confidence": 0.5, + "affectedFiles": [ + "packages/api/src/analytics/" + ], + "tags": [ + "mongodb", + "analytics", + "database", + "pipeline", + "backend", + "fusion:contradicts" + ], + "alternatives": [ + { + "option": "PostgreSQL", + "reasonRejected": "PostgreSQL's write throughput is 10x lower than MongoDB for high-cardinality event data, making it unsuitable for the new scale requirements of the analytics pipeline." 
+ } + ], + "sources": [ + { + "type": "slack", + "confidence": 0.5, + "ref": { + "priority": "high", + "threadTs": "1776548082.933139", + "channelId": "C0ALJ121XLZ", + "channelName": "decispher-test-1", + "triggeredBy": "U05F9P78LTG", + "detectionMethod": "explicit" + } + } + ], + "supersededBy": null, + "enforcement": { + "blockOnViolation": false, + "requiresExplicitOverride": false, + "filePatterns": [ + "packages/api/src/analytics/" + ], + "exemptPatterns": [] + }, + "createdAt": "2026-04-18T21:34:59.620Z", + "updatedAt": "2026-04-18T21:35:58.658Z", + "version": 1, + "createdBy": "U05F9P78LTG", + "reviewedBy": "U05F9P78LTG" + }, + { + "id": "3b06241d-4a87-4cf9-a36c-cd9c831b3a97", + "type": "decision", + "title": "LLM Provider Strategy by Pipeline Step and Effort Mode", + "problem": null, + "decision": "Each LLM pipeline step (detection, extraction, formatting) has its own provider configuration managed via environment variables. An 'effort mode' concept allows overriding these configurations per company at request time, defining specific LLM models for different quality/cost tiers: Saver uses gemini-flash, Balanced mixes gemini-flash, claude-haiku, and gpt-4o-mini, Pro uses claude-sonnet for extraction, and Super uses claude-opus.", + "rationale": "The strategy is designed to provide flexibility and optimization across different pipeline steps and 'effort modes'. By configuring providers per step and allowing overrides based on company effort modes, the system can balance cost, performance, and model quality according to specific requirements, from 'Saver' (likely cost-optimized) to 'Super' (likely highest quality/cost). 
The multi-provider abstraction facilitates this dynamic selection.", + "severity": "HIGH", + "status": "active", + "confidence": 0.4, + "affectedFiles": [ + "packages/analyzer/" + ], + "tags": [ + "LLM", + "AI", + "provider strategy", + "architecture", + "environment variables" + ], + "alternatives": [], + "sources": [ + { + "type": "slack", + "confidence": 0.4, + "ref": { + "priority": "high", + "threadTs": "1776546311.192409", + "channelId": "C0ALJ121XLZ", + "channelName": null, + "triggeredBy": null, + "detectionMethod": "explicit" + } + } + ], + "supersededBy": null, + "enforcement": null, + "createdAt": "2026-04-18T21:08:43.793Z", + "updatedAt": "2026-04-18T21:09:50.739Z", + "version": 1, + "createdBy": "U05F9P78LTG", + "reviewedBy": "U05F9P78LTG" + }, + { + "id": "f2770524-250d-48f2-bf4c-ea54e9744ada", + "type": "decision", + "title": "Implementation details for text embeddings in PostgreSQL using OpenAI's text-embedding-3-small and HNSW indexing", + "problem": null, + "decision": "We will use the `text-embedding-3-small` OpenAI model to generate 1536-dimension embeddings. These embeddings will be stored in the `knowledge_chunks` table within PostgreSQL. The HNSW index used for vector search will be configured with `ef_construction=200` and `m=16`.", + "rationale": "The chosen HNSW parameters (`ef_construction=200` and `m=16`) are set to provide an optimal tradeoff between recall accuracy and search speed. 
The `text-embedding-3-small` model is selected for generating the text embeddings.", + "severity": "MEDIUM", + "status": "active", + "confidence": 0.48, + "affectedFiles": [ + "packages/decision-store/src/schema.ts" + ], + "tags": [ + "vector-database", + "embeddings", + "PostgreSQL", + "HNSW", + "OpenAI", + "text-embedding-3-small" + ], + "alternatives": [], + "sources": [ + { + "type": "slack", + "confidence": 0.48, + "ref": { + "priority": "high", + "threadTs": "1776545993.253939", + "channelId": "C0ALJ121XLZ", + "channelName": "decispher-test-1", + "triggeredBy": "U05F9P78LTG", + "detectionMethod": "explicit" + } + } + ], + "supersededBy": null, + "enforcement": { + "blockOnViolation": false, + "requiresExplicitOverride": false, + "filePatterns": [ + "packages/decision-store/src/schema.ts" + ], + "exemptPatterns": [] + }, + "createdAt": "2026-04-18T21:00:58.211Z", + "updatedAt": "2026-04-18T21:01:44.258Z", + "version": 1, + "createdBy": "U05F9P78LTG", + "reviewedBy": "U05F9P78LTG" + }, + { + "id": "e1408395-0737-4115-a49b-e616d5227fe5", + "type": "decision", + "title": "Use cosine distance over L2 for semantic text embedding similarity with pgvector HNSW", + "problem": "How to accurately measure semantic similarity of text embeddings for deduplication search using pgvector HNSW?", + "decision": "We decided to use cosine distance for semantic similarity search of text embeddings with pgvector HNSW for deduplication.", + "rationale": "Cosine distance is invariant to vector magnitude, meaning it only considers the direction of vectors. This property is precisely what is desired for semantic similarity of text embeddings, as it allows for accurate comparison of semantic meaning regardless of variations in embedding vector norms. 
L2 (Euclidean) distance, on the other hand, would incorrectly penalize vectors with different magnitudes, even if they share the same semantic direction.", + "severity": "MEDIUM", + "status": "active", + "confidence": 0.714, + "affectedFiles": [], + "tags": [ + "pgvector", + "HNSW", + "embeddings", + "semantic search", + "cosine distance", + "L2 distance", + "deduplication" + ], + "alternatives": [ + { + "option": "L2 (Euclidean) distance", + "reasonRejected": "L2 distance penalizes vectors with different norms (magnitudes) even if they point in the same semantic direction, which is not suitable for accurately measuring semantic similarity of text embeddings." + } + ], + "sources": [ + { + "type": "slack", + "confidence": 0.48, + "ref": { + "priority": "high", + "threadTs": "1776521423.382989", + "channelId": "C0ALJ121XLZ", + "channelName": "decispher-test-1", + "triggeredBy": "U05F9P78LTG", + "detectionMethod": "explicit" + } + } + ], + "supersededBy": null, + "enforcement": { + "blockOnViolation": false, + "requiresExplicitOverride": false, + "filePatterns": [], + "exemptPatterns": [] + }, + "createdAt": "2026-04-18T14:10:45.429Z", + "updatedAt": "2026-04-19T00:00:18.657Z", + "version": 1, + "createdBy": "U05F9P78LTG", + "reviewedBy": "U05F9P78LTG" + }, + { + "id": "fdb94885-f909-42c2-993b-28951d6b8bf6", + "type": "decision", + "title": "Implement Redis Semantic Caching for LLM Embedding Calls", + "problem": "Redundant and inefficient LLM embedding calls were occurring.", + "decision": "Implemented Redis semantic caching for LLM embedding calls. The cache key is a hash of the input text, model, and provider. The cache entries have a Time-To-Live (TTL) of 1 hour.", + "rationale": "Redis was a natural extension since it is already in use for BullMQ and session caching. 
This implementation reduced redundant embedding calls by approximately 40% in tests.", + "severity": "MEDIUM", + "status": "active", + "confidence": 0.48, + "affectedFiles": [], + "tags": [ + "redis", + "caching", + "llm", + "embeddings", + "performance", + "optimization" + ], + "alternatives": [], + "sources": [ + { + "type": "slack", + "confidence": 0.48, + "ref": { + "priority": "high", + "threadTs": "1776520914.829239", + "channelId": "C0ALJ121XLZ", + "channelName": "decispher-test-1", + "triggeredBy": "U05F9P78LTG", + "detectionMethod": "explicit" + } + } + ], + "supersededBy": null, + "enforcement": null, + "createdAt": "2026-04-18T14:08:08.109Z", + "updatedAt": "2026-04-18T14:08:35.063Z", + "version": 1, + "createdBy": "U05F9P78LTG", + "reviewedBy": "U05F9P78LTG" + }, + { + "id": "393329ba-9640-498c-89f1-46c1324aaeff", + "type": "constraint", + "title": "Prohibition of MongoDB in the Tech Stack for Analytics Events", + "problem": "Considering MongoDB for analytics events due to perceived better write throughput for time-series data and high cardinality event logs.", + "decision": "MongoDB is strictly prohibited from being integrated into the current technology stack, including for analytics events.", + "rationale": "There is an active and non-negotiable architectural constraint against MongoDB in the stack due to the critical requirement for ACID compliance across all billing and user data. MongoDB does not satisfy this fundamental requirement.", + "severity": "CRITICAL", + "status": "active", + "confidence": 0.43, + "affectedFiles": [], + "tags": [ + "mongodb", + "database", + "analytics", + "acid compliance", + "constraint", + "architecture" + ], + "alternatives": [ + { + "option": "MongoDB for analytics events", + "reasonRejected": "It violates an active architectural constraint due to its lack of native ACID compliance, which is non-negotiable for billing and user data within our stack." 
+ } + ], + "sources": [ + { + "type": "slack", + "confidence": 0.43, + "ref": { + "priority": "high", + "threadTs": "1776519194.797029", + "channelId": "C0ALJ121XLZ", + "channelName": "decispher-test-1", + "triggeredBy": "U05F9P78LTG", + "detectionMethod": "explicit" + } + } + ], + "supersededBy": null, + "enforcement": { + "blockOnViolation": false, + "requiresExplicitOverride": false, + "filePatterns": [], + "exemptPatterns": [] + }, + "createdAt": "2026-04-18T13:33:59.601Z", + "updatedAt": "2026-04-18T13:34:22.478Z", + "version": 1, + "createdBy": "U05F9P78LTG", + "reviewedBy": "U05F9P78LTG" + }, + { + "id": "71727a5f-aefc-4a9d-af63-247b7a964dc1", + "type": "plan", + "title": "Plan to Migrate Application Infrastructure from Railway to AWS ECS", + "problem": "The current hosting platform, Railway, becomes cost-prohibitive at scale (exceeding $500/month) and lacks the VPC isolation capabilities required for enterprise customers.", + "decision": "The trigger metric for initiating the AWS migration has been adjusted from 20 paying customers to 30 paying customers. The Q3 2026 timeline for the migration still holds.", + "rationale": "This adjustment is due to Railway costs being more predictable than initially expected. 
Additionally, the VPC isolation requirement, which was a significant factor, only applies to enterprise customers, a segment we are targeting at a later stage.", + "severity": "HIGH", + "status": "active", + "confidence": 0.5, + "affectedFiles": [], + "tags": [ + "migration", + "infrastructure", + "aws", + "ecs", + "railway", + "cost", + "vpc" + ], + "alternatives": [], + "sources": [ + { + "type": "slack", + "confidence": 0.5, + "ref": { + "priority": "high", + "threadTs": "1776519051.201129", + "channelId": "C0ANM3QAQMN", + "channelName": "decispher-live-test", + "triggeredBy": "U05F9P78LTG", + "detectionMethod": "explicit" + } + } + ], + "supersededBy": null, + "enforcement": { + "blockOnViolation": false, + "requiresExplicitOverride": false, + "filePatterns": [], + "exemptPatterns": [] + }, + "createdAt": "2026-04-18T13:30:54.166Z", + "updatedAt": "2026-04-18T23:35:33.377Z", + "version": 2, + "createdBy": "U05F9P78LTG", + "reviewedBy": "U05F9P78LTG" + }, + { + "id": "670907d2-a11a-4cc9-8a02-67fb54fef3f7", + "type": "decision", + "title": "Defer Microservices Adoption, Maintain Monorepo Architecture", + "problem": "The team considered adopting a microservices architecture for the recorder and analyzer components but faced challenges.", + "decision": "To defer the adoption of a microservices architecture and continue with a monorepo architecture utilizing shared packages. The decision to revisit microservices will be made when the team size reaches 8 or more members.", + "rationale": "An earlier attempt (Phase 1) to split the recorder and analyzer into separate gRPC services resulted in brutal deployment complexity for a 3-person team. 
This led to approximately 40% of the team's time being spent debugging inter-service authentication and network failures, making it unmanageable for the current team size.", + "severity": "HIGH", + "status": "active", + "confidence": 0.48, + "affectedFiles": [], + "tags": [ + "architecture", + "microservices", + "monorepo", + "team-size", + "deployment" + ], + "alternatives": [ + { + "option": "Adopt a microservices architecture by splitting recorder and analyzer into separate gRPC services.", + "reasonRejected": "The previous attempt in Phase 1 led to brutal deployment complexity for a 3-person team, consuming 40% of their time debugging inter-service authentication and network failures." + } + ], + "sources": [ + { + "type": "slack", + "confidence": 0.48, + "ref": { + "priority": "high", + "threadTs": "1776518787.443219", + "channelId": "C0ANM3QAQMN", + "channelName": "decispher-live-test", + "triggeredBy": "U05F9P78LTG", + "detectionMethod": "explicit" + } + } + ], + "supersededBy": null, + "enforcement": { + "blockOnViolation": false, + "requiresExplicitOverride": false, + "filePatterns": [], + "exemptPatterns": [] + }, + "createdAt": "2026-04-18T13:26:33.042Z", + "updatedAt": "2026-04-18T13:27:55.817Z", + "version": 1, + "createdBy": "U05F9P78LTG", + "reviewedBy": "U05F9P78LTG" + }, + { + "id": "43a78d92-5543-412c-9be3-00596ee8ce65", + "type": "ownership", + "title": "Ownership of Billing Module", + "problem": null, + "decision": "The billing module, including Stripe integration, credit ledger, credit deduction logic, and Stripe webhook handlers, is owned by U05F9P78LTG. 
All changes to billing flows require their review.", + "rationale": "This statement clarifies responsibility for the billing module and its components to ensure proper review and maintenance.", + "severity": "MEDIUM", + "status": "active", + "confidence": 0.38, + "affectedFiles": [ + "packages/api/src/billing/" + ], + "tags": [ + "billing", + "ownership", + "team" + ], + "alternatives": [], + "sources": [ + { + "type": "slack", + "confidence": 0.38, + "ref": { + "priority": "high", + "threadTs": "1776518686.983299", + "channelId": "C0ANM3QAQMN", + "channelName": "decispher-live-test", + "triggeredBy": "U05F9P78LTG", + "detectionMethod": "explicit" + } + } + ], + "supersededBy": null, + "enforcement": { + "blockOnViolation": false, + "requiresExplicitOverride": false, + "filePatterns": [ + "packages/api/src/billing/" + ], + "exemptPatterns": [] + }, + "createdAt": "2026-04-18T13:24:50.857Z", + "updatedAt": "2026-04-18T13:25:43.024Z", + "version": 1, + "createdBy": "U05F9P78LTG", + "reviewedBy": "U05F9P78LTG" + }, + { + "id": "3998d8e6-48a3-4161-860b-be094e4feb87", + "type": "constraint", + "title": "Standardize on PostgreSQL and Redis; Prohibit MongoDB", + "problem": "Ensure ACID compliance for critical billing and user data, and standardize data storage technologies to maintain data integrity and consistency.", + "decision": "MongoDB is strictly prohibited in this stack due to its lack of ACID compliance. PostgreSQL will be used as the primary datastore for all persistent data, especially critical billing and user data. Redis will be used exclusively for caching purposes.", + "rationale": "ACID compliance is a non-negotiable requirement for billing and user data to guarantee data integrity and consistency. PostgreSQL provides robust ACID transaction support. 
Adopting a standardized approach with PostgreSQL and Redis simplifies the technology stack and enforces critical data integrity requirements.", + "severity": "CRITICAL", + "status": "active", + "confidence": 0.74, + "affectedFiles": [], + "tags": [ + "database", + "postgresql", + "redis", + "mongodb", + "data-storage", + "acid" + ], + "alternatives": [ + { + "option": "MongoDB", + "reasonRejected": "MongoDB was rejected because it does not provide the necessary ACID compliance required for critical billing and user data, which is a non-negotiable architectural requirement for data integrity." + } + ], + "sources": [ + { + "type": "slack", + "confidence": 0.5, + "ref": { + "priority": "high", + "threadTs": "1776518248.102319", + "channelId": "C0ANM3QAQMN", + "channelName": "decispher-live-test", + "triggeredBy": "U05F9P78LTG", + "detectionMethod": "explicit" + } + } + ], + "supersededBy": null, + "enforcement": { + "blockOnViolation": false, + "requiresExplicitOverride": false, + "filePatterns": [], + "exemptPatterns": [] + }, + "createdAt": "2026-04-18T13:17:48.935Z", + "updatedAt": "2026-04-18T23:32:02.232Z", + "version": 1, + "createdBy": "U05F9P78LTG", + "reviewedBy": "U05F9P78LTG" + }, + { + "id": "b6869b8c-7d43-47d4-9cd6-dddb0f9f92b9", + "type": "decision", + "title": "Define LLM Model Combinations for Saver, Balanced, Pro, and Super Effort Modes", + "problem": "We need to lock down exactly which model combination maps to which effort mode.", + "decision": "The specific LLM model combinations for the multi-provider effort modes were finalized: Saver mode uses `gemini-flash` for detection, extraction, and format. Balanced mode uses `gemini-flash` for detection, `claude-haiku` for extraction, and `gpt-4o-mini` for format. Pro mode uses `gemini-flash` for detection, `claude-sonnet` for extraction, and `gpt-4o-mini` for format. 
Super mode uses `gemini-flash` for detection, `claude-opus` for extraction, and `claude-sonnet` for format.", + "rationale": "The chosen LLM model combinations for each effort mode (Saver, Balanced, Pro, Super) were selected to provide different performance and cost profiles, aligning with the multi-provider strategy. Cost analysis confirmed that the proposed combinations, ranging from ~$0.08/1M tokens for Saver to ~$4.50/1M tokens for Super, ensure fine margins at current credit pricing.", + "severity": "HIGH", + "status": "active", + "confidence": 0.48, + "affectedFiles": [], + "tags": [ + "LLM", + "model-selection", + "multi-provider", + "cost-optimization", + "pricing-tiers", + "gemini-flash", + "claude-haiku", + "gpt-4o-mini", + "claude-sonnet", + "claude-opus" + ], + "alternatives": [], + "sources": [ + { + "type": "slack", + "confidence": 0.48, + "ref": { + "priority": "high", + "threadTs": "1776513910.490949", + "channelId": "C0ALJ121XLZ", + "channelName": "decispher-test-1", + "triggeredBy": "U05F9P78LTG", + "detectionMethod": "explicit" + } + } + ], + "supersededBy": null, + "enforcement": null, + "createdAt": "2026-04-18T12:06:37.563Z", + "updatedAt": "2026-04-18T12:07:56.083Z", + "version": 1, + "createdBy": "U05F9P78LTG", + "reviewedBy": "U05F9P78LTG" + }, + { + "id": "6acd0667-b5bc-4fd2-8811-36010ed7c2ac", + "type": "decision", + "title": "Implement Multi-Provider LLM Abstraction for Pipeline Steps with Per-Company Overrides", + "problem": "The current LLM provider strategy is unmaintainable, using different providers (Gemini-Flash, Claude-Sonnet, GPT-4o-mini) for different pipeline steps, leading to high costs (Claude-Sonnet is 60% of the bill) and inconsistent availability (Sonnet outages).", + "decision": "We will implement a multi-provider abstraction where each pipeline step (detection, extraction, enrichment, formatting) has its own LLM provider configuration via environment variables. 
At request time, an 'effort mode' can override the provider selection on a per-company basis.", + "rationale": "This approach allows companies with high context volume (Tier 3+) to pay extra for Claude-Sonnet's accuracy where needed, while companies with tighter budgets can use more cost-effective options like Gemini-Flash for all steps. It also decouples our infrastructure from individual LLM vendor stability and enables independent contract negotiations with different providers (Anthropic, OpenAI, Google).", + "severity": "HIGH", + "status": "active", + "confidence": 0.48, + "affectedFiles": [], + "tags": [ + "LLM", + "AI", + "architecture", + "multi-provider", + "cost-management", + "pipeline", + "vendor-strategy" + ], + "alternatives": [ + { + "option": "Continue with current fragmented multi-provider setup (Gemini-Flash for detection, Claude-Sonnet for extraction, GPT-4o-mini for formatting).", + "reasonRejected": "This approach is unmaintainable, costly (Claude-Sonnet accounts for 60% of the LLM bill), and suffers from inconsistent provider availability issues." + }, + { + "option": "Consolidate to a single LLM provider for all pipeline steps.", + "reasonRejected": "This would limit flexibility, potentially sacrificing accuracy for high-tier companies or forcing budget-conscious companies to pay for more expensive models than necessary. It would also lead to vendor lock-in and a single point of failure for LLM stability." 
+ } + ], + "sources": [ + { + "type": "slack", + "confidence": 0.48, + "ref": { + "priority": "high", + "threadTs": "1776512831.575679", + "channelId": "C0ANM3QAQMN", + "channelName": "decispher-live-test", + "triggeredBy": "U0AUM92Q1C0", + "detectionMethod": "explicit" + } + } + ], + "supersededBy": null, + "enforcement": { + "blockOnViolation": false, + "requiresExplicitOverride": false, + "filePatterns": [], + "exemptPatterns": [] + }, + "createdAt": "2026-04-18T11:50:06.354Z", + "updatedAt": "2026-04-18T11:50:59.825Z", + "version": 1, + "createdBy": "U0AUM92Q1C0", + "reviewedBy": "df51fb32-47a2-48b2-ac32-3887a582a966" + } + ] +} diff --git a/.decispher/decisions.md b/.decispher/decisions.md new file mode 100644 index 0000000..0329aa9 --- /dev/null +++ b/.decispher/decisions.md @@ -0,0 +1,1191 @@ + +## Decision: Migrate email service to Zoho and update SMTP infrastructure + +**Status**: Active +**Date**: 2026-04-28 +**Severity**: Critical + +**Files**: +- `infrastructure/mail` +- `services/smtp` +- `config/email_routing` + +**Rules**: +```json +{ + "conditions": [ + { + "type": "file", + "pattern": "{infrastructure/mail/**,services/smtp/**,config/email_routing/**}", + "content_rules": [ + { + "mode": "regex", + "pattern": "(legacy_smtp|smtp_old|old_mail_server)", + "patterns": [ + "legacy_smtp", + "smtp_old" + ] + } + ] + } + ], + "match_mode": "any" +} +``` + +### Context + +**Problem:** Need to replace the existing webmaster mailing service and transition away from the current SMTP server. + +**Decision:** Migrate all email services to Zoho and update the SMTP server infrastructure, including the implementation of new routing rules to block any traffic to the legacy SMTP server. + +**Rationale:** The team decided to move to Zoho to consolidate mailing services and address the limitations or overhead associated with the existing legacy SMTP infrastructure. 
+ +--- + + +## Decision: Migrate from Shipsy to in-house mapping event system + +**Status**: Active +**Date**: 2026-04-26 +**Severity**: Critical + +**Files**: +- `services/shipping-integration` +- `infrastructure/event-bus` + +### Context + +**Problem:** Dependency on third-party provider Shipsy is causing scalability issues. + +**Decision:** Switch from the third-party Shipsy provider to an in-house developed mapping event system. + +**Rationale:** The team identified that the Shipsy service was negatively impacting the platform's scalability, and moving to an internal solution reduces external dependencies. + +**Alternatives Considered:** +- **Continue using Shipsy**: It acts as a bottleneck for system scalability. + +--- + + +## Decision: Abandon RFC 7807 for error responses + +**Status**: Active +**Date**: 2026-04-22 +**Severity**: Critical + +**Files**: +- `api/responses` +- `api/error-handling` + +**Rules**: +```json +{ + "conditions": [ + { + "type": "file", + "pattern": "api/responses/**/*", + "content_rules": [ + { + "mode": "string", + "patterns": [ + "application/problem+json", + "type", + "title", + "status", + "detail", + "instance" + ] + } + ] + }, + { + "type": "file", + "pattern": "api/error-handling/**/*", + "content_rules": [ + { + "mode": "string", + "patterns": [ + "application/problem+json", + "type", + "title", + "status", + "detail", + "instance" + ] + } + ] + } + ], + "match_mode": "any" +} +``` + +### Context + +**Problem:** The current API error handling standard (RFC 7807) is considered outdated for the team's needs. + +**Decision:** We have decided to officially discontinue the use of RFC 7807 (Problem Details for HTTP APIs) for all API error responses moving forward. + +**Rationale:** The team determined that the RFC 7807 specification is outdated and no longer aligns with the current requirements and standards of the API architecture. 
+ +--- + + +## Decision: Standardize on HNSW for new vector indexes + +**Status**: Active +**Date**: 2026-04-22 +**Severity**: Warning + +**Files**: +- `db/schema/vector_indexes` +- `db/migrations/sprint_16/migrate_llm_cache_to_hnsw` + +**Rules**: +```json +{ + "conditions": [ + { + "type": "file", + "pattern": "db/schema/vector_indexes", + "content_rules": [ + { + "mode": "regex", + "start": 0, + "pattern": "USING\\s+(IVFFLAT|IVFFLAT\\s+)" + } + ] + }, + { + "type": "file", + "pattern": "db/migrations/sprint_16/migrate_llm_cache_to_hnsw", + "content_rules": [ + { + "mode": "regex", + "start": 0, + "pattern": "CREATE\\s+INDEX.*USING\\s+IVFFLAT" + } + ] + } + ], + "match_mode": "any" +} +``` + +### Context + +**Problem:** Uncertainty regarding whether the rejection of IVFFlat to HNSW migration applied to the index technology choice or the migration process itself. + +**Decision:** All new vector indexes must be created using the HNSW algorithm. Existing IVFFlat indexes (specifically in the llm_cache table) are to be migrated to HNSW in Sprint 16. + +**Rationale:** HNSW is the current architectural standard for vector indexing. The previous rejection of the migration to HNSW was due to operational risks in production, not a lack of performance or technical suitability of HNSW. + +**Alternatives Considered:** +- **IVFFlat**: The team has standardized on HNSW for new indexes to maintain architectural consistency, despite potential performance profiles for specific query patterns. 
+ +--- + + +## Decision: Establish authoritative RFC 7807 error format convention + +**Status**: Active +**Date**: 2026-04-22 +**Severity**: Warning + +**Files**: +- `packages/api/src/plugins/error-handler.ts` + +**Rules**: +```json +{ + "conditions": [ + { + "type": "file", + "pattern": "packages/api/src/plugins/error-handler.ts", + "content_rules": [ + { + "mode": "regex", + "start": 0, + "pattern": "(?s)^(?!.*(type|title|status|detail|instance)).*$" + } + ] + } + ], + "match_mode": "all" +} +``` + +### Context + +**Problem:** Duplicate and conflicting conventions regarding RFC 7807 error format severity were documented in the Decispher system. + +**Decision:** Adopt the HIGH severity specification as the authoritative version for the RFC 7807 error format, which includes fields: type, title, status, detail, and instance. + +**Rationale:** The team identified that two existing conventions were redundant. Designating the HIGH severity entry as canonical while allowing the fusion engine to merge duplicate references ensures consistency across documentation and API implementations. + +**Alternatives Considered:** +- **MEDIUM severity specification**: The HIGH severity version was explicitly selected as the authoritative and canonical standard. 
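
For illustration only, the five fields named in this decision (type, title, status, detail, instance) can be sketched as a small response builder. This is a minimal sketch, not the contents of `error-handler.ts`: the function name `problem_details`, the example URI, and the example path are hypothetical. The `about:blank` default for `type` and the `application/problem+json` media type come from RFC 7807 itself.

```python
import json

# Media type defined by RFC 7807 for problem detail responses.
PROBLEM_JSON = "application/problem+json"


def problem_details(status, title, detail=None, instance=None, type_uri="about:blank"):
    """Build an RFC 7807 body (hypothetical helper, not the project's handler)."""
    body = {"type": type_uri, "title": title, "status": status}
    # "detail" and "instance" are optional members; include them only when given.
    if detail is not None:
        body["detail"] = detail
    if instance is not None:
        body["instance"] = instance
    return body


# Example values are invented for illustration.
example = problem_details(
    status=404,
    title="Decision not found",
    detail="No decision exists with the requested id.",
    instance="/decisions/unknown-id",
    type_uri="https://example.com/problems/decision-not-found",
)

print(PROBLEM_JSON)
print(json.dumps(example, indent=2))
```

A handler built this way would send the dictionary as the response body with `Content-Type` set to `PROBLEM_JSON`.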
+ +--- + + +## Decision: Establish ownership and modification constraints for credits and billing system + +**Status**: Active +**Date**: 2026-04-22 +**Severity**: Critical + +**Files**: +- `packages/api/src/routes/credits.ts` +- `packages/decision-store/src/repositories/credit-repository.ts` +- `packages/common/src/types/credits.ts` + +**Rules**: +```json +{ + "conditions": [ + { + "type": "file", + "pattern": "packages/{api/src/routes/credits.ts,decision-store/src/repositories/credit-repository.ts,common/src/types/credits.ts}", + "content_rules": [ + { + "mode": "regex", + "start": 1, + "pattern": "(credit_ledger|DrizzleCreditRepository|EFFORT_MODE_CONFIGS)" + } + ] + } + ], + "match_mode": "all" +} +``` + +### Context + +**Problem:** Uncertainty regarding ownership of the billing module and the requirements for implementing new effort modes. + +**Decision:** Sara is the primary owner of the billing module; all changes to the credit_ledger schema, DrizzleCreditRepository, and the EFFORT_MODE_CONFIGS require specific approvals from Sara and Ali. Furthermore, the system must strictly adhere to the append-only ledger constraint per ADR-019 and maintain SERIALIZABLE transaction requirements. + +**Rationale:** To ensure accountability and maintain architectural integrity of the financial ledger and billing configuration, specific code ownership and structural constraints have been formalized. 
+ +--- + + +## Decision: Define Model Fallback Ordering Strategy for API Rate Limits + +**Status**: Active +**Date**: 2026-04-22 +**Severity**: Critical + +**Files**: +- `src/llm/client_factory.py` +- `src/llm/fallback_logic.py` + +**Rules**: +```json +{ + "conditions": [ + { + "type": "file", + "pattern": "src/llm/{client_factory,fallback_logic}.py", + "content_rules": [ + { + "mode": "regex", + "start": 0, + "pattern": "(?i)fallback" + } + ], + "content_match_mode": "all" + } + ], + "match_mode": "all" +} +``` + +### Context + +**Problem:** Handling 429 rate limit errors from LLM providers during extraction and detection tasks. + +**Decision:** Establish explicit provider fallback orderings: For extraction, use Anthropic → DeepSeek → OpenAI. For detection, use Google → OpenAI → DeepSeek. + +**Rationale:** To maintain system reliability and avoid task failure when individual LLM providers hit rate limits, a hierarchical fallback mechanism ensures work is diverted to alternative models before resorting to the Dead Letter Queue (DLQ) after retries. + +--- + + +## Decision: Migrate infrastructure orchestration from AWS ECS to AWS EKS + +**Status**: Active +**Date**: 2026-04-22 +**Severity**: Critical + +**Files**: +- `infrastructure/terraform` +- `infrastructure/k8s` + +**Rules**: +```json +{ + "conditions": [ + { + "type": "file", + "pattern": "infrastructure/terraform/**/*", + "content_rules": [ + { + "mode": "string", + "patterns": [ + "aws_ecs_cluster", + "aws_ecs_service", + "aws_ecs_task_definition" + ] + } + ], + "content_match_mode": "any" + } + ], + "match_mode": "any" +} +``` + +### Context + +**Problem:** AWS ECS lacks built-in support for multi-region failover without complex custom routing and requires additional tooling for advanced scaling capabilities. + +**Decision:** The team will migrate from AWS ECS to AWS EKS for container orchestration. 
+ +**Rationale:** EKS provides superior orchestration flexibility, including native Horizontal Pod Autoscaler and improved multi-AZ/multi-region failover capabilities, which are necessary for the current scale, outweighing the operational overhead of Kubernetes. + +**Alternatives Considered:** +- **AWS ECS**: Lacks sufficient multi-region failover support and requires custom routing implementations. +- **Railway**: Retained only as a temporary fallback, deemed insufficient for long-term production orchestration. + +--- + + +## Decision: Use cosine distance for pgvector similarity searches + +**Status**: Active +**Date**: 2026-04-22 +**Severity**: Warning + +**Rules**: +```json +{ + "conditions": [ + { + "type": "file", + "pattern": "**/*.sql", + "content_rules": [ + { + "mode": "regex", + "start": 0, + "pattern": "<->" + } + ] + } + ], + "match_mode": "all" +} +``` + +### Context + +**Problem:** Determine the optimal distance metric for embedding similarity search in pgvector to maximize recall. + +**Decision:** We have standardized on cosine distance (using the <=> operator in pgvector) for all similarity search operations. + +**Rationale:** Cosine distance provides significantly better recall (12% improvement) on normalized text embeddings compared to L2 distance. Furthermore, L2 distance is overly sensitive to embedding magnitude, making it less reliable for our specific use case. + +**Alternatives Considered:** +- **L2 distance**: It is sensitive to embedding magnitude and demonstrated poorer recall compared to cosine distance for our data. 
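
The magnitude sensitivity mentioned in the rationale can be seen in a few lines. The sketch below uses plain Python rather than pgvector, and the vectors are made up for illustration. It computes the same quantities that pgvector exposes as `<=>` (cosine distance) and `<->` (L2 distance).

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def l2_distance(a, b):
    """Euclidean distance; the quantity pgvector's <-> operator returns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two vectors pointing in the same direction with very different magnitudes.
v = [1.0, 2.0, 3.0]
scaled = [10.0 * x for x in v]

print("cosine:", cosine_distance(v, scaled))  # effectively zero
print("l2:", l2_distance(v, scaled))          # large, driven only by magnitude
```

Because the two vectors differ only in length, cosine distance treats them as identical while L2 distance reports them as far apart.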
+ +--- + + +## Decision: Use MongoDB Atlas for analytics event ingestion + +**Status**: Active +**Date**: 2026-04-22 +**Severity**: Warning + +**Files**: +- `analytics/storage` +- `infrastructure/database-policy` + +**Rules**: +```json +{ + "conditions": [ + { + "type": "file", + "pattern": "{analytics/storage/**,infrastructure/database-policy/**}", + "content_rules": [ + { + "mode": "regex", + "start": 0, + "pattern": "mongodb|mongoclient|mongodb-driver" + } + ], + "content_match_mode": "any" + } + ], + "match_mode": "any" +} +``` + +### Context + +**Problem:** The team needs a high-throughput storage solution for analytics event ingestion, but is restricted to PostgreSQL and Redis for general data storage. + +**Decision:** MongoDB is strictly prohibited for use in core pipeline services (including the core decision pipeline, authentication, and the context store). These services must exclusively use PostgreSQL 16 and Redis. Any deviation requires a formal ADR. + +**Rationale:** To maintain architectural integrity and prevent fragmentation in the core tech stack. Previous attempts to introduce MongoDB for event queues nearly caused instability, highlighting the need for a hard, enforceable constraint. + +**Alternatives Considered:** +- **PostgreSQL partitioned tables**: The team expressed concern that it may struggle with the required write throughput of 50k events per second. 
+ +--- + + +## Decision: Prohibit MongoDB and mandate PostgreSQL for core pipelines + +**Status**: Active +**Date**: 2026-04-22 +**Severity**: Critical + +**Files**: +- `infrastructure/database` +- `src/db/config.ts` + +**Rules**: +```json +{ + "conditions": [ + { + "type": "file", + "pattern": "{infrastructure/database/**,src/db/config.ts}", + "content_rules": [ + { + "mode": "string", + "patterns": [ + "mongodb", + "mongoose", + "mongo" + ] + } + ] + } + ], + "match_mode": "all" +} +``` + +### Context + +**Problem:** Need to define and enforce the database technology stack to ensure system consistency and data integrity. + +**Decision:** The core pipeline must exclusively use PostgreSQL 16 with pgvector and Redis; the use of MongoDB is strictly prohibited. + +**Rationale:** Enforcing a specific database stack ensures architectural consistency, simplifies maintenance, and leverages existing infrastructure and expertise with PostgreSQL and pgvector. + +**Alternatives Considered:** +- **MongoDB**: Prohibited to maintain stack consistency and data integrity requirements. + +--- + + +## Decision: Standardize on PostgreSQL with pgvector for primary storage and vector search + +**Status**: Active +**Date**: 2026-04-22 +**Severity**: Critical + +**Rules**: +```json +{ + "conditions": [ + { + "type": "file", + "pattern": "**/*", + "content_rules": [ + { + "mode": "regex", + "start": 1, + "pattern": "(?i)(mongodb|cockroachdb|elasticsearch|pinecone)", + "patterns": [ + "mongodb", + "cockroachdb", + "elasticsearch", + "pinecone" + ] + } + ] + } + ], + "match_mode": "all" +} +``` + +### Context + +**Problem:** Selecting the primary datastore to handle both standard relational data and vector search requirements efficiently. + +**Decision:** Use PostgreSQL with pgvector and HNSW indexes as the standard solution for primary datastore and vector search operations. 
+ +**Rationale:** PostgreSQL with pgvector provides the ability to manage both SQL-based relational data and vector search capabilities within a single system, simplifying the architecture compared to managing separate databases. + +**Alternatives Considered:** +- **MongoDB**: The team preferred the relational capabilities of PostgreSQL and the unified support for vector search provided by pgvector. +- **CockroachDB**: The team decided that PostgreSQL with pgvector was sufficient and preferred over the complexity or features offered by CockroachDB. + +--- + + +## Decision: Use MongoDB Atlas for schemaless analytics webhook storage + +**Status**: Active +**Date**: 2026-04-19 +**Severity**: Critical + +**Files**: +- `services/analytics-webhook-handler` +- `infrastructure/database-clusters` + +**Rules**: +```json +{ + "conditions": [ + { + "type": "file", + "pattern": "services/analytics-webhook-handler/**", + "content_rules": [ + { + "mode": "string", + "patterns": [ + "INSERT INTO", + "UPDATE ", + "pg_query" + ] + } + ] + }, + { + "type": "file", + "pattern": "infrastructure/database-clusters/**", + "content_rules": [ + { + "mode": "regex", + "start": 0, + "pattern": "postgresql" + } + ] + } + ], + "match_mode": "any" +} +``` + +### Context + +**Decision:** Use MongoDB Atlas specifically for the analytics event ingestion pipeline, while keeping all other core application data in PostgreSQL. + +**Rationale:** MongoDB Atlas provides the necessary horizontal sharding and schemaless structure to handle the required 50k write operations per second, whereas PostgreSQL performance degrades under this load. + +**Alternatives Considered:** +- **PostgreSQL JSONB**: Proved too difficult and inefficient to index effectively for schemaless event data. 
+ +--- + + +## Decision: Abandoning EventStoreDB for monorepo event handling + +**Status**: Active +**Date**: 2026-04-19 +**Severity**: Warning + +**Rules**: +```json +{ + "conditions": [ + { + "type": "file", + "pattern": "**/*", + "content_rules": [ + { + "mode": "string", + "patterns": [ + "EventStoreDB", + "EventStore", + "event-sourcing" + ] + } + ] + } + ], + "match_mode": "any" +} +``` + +### Context + +**Problem:** The team was experiencing excessive operational overhead and complexity managing EventStoreDB for event sourcing, which did not provide enough value regarding auditability at their current scale. + +**Decision:** The team decided to discontinue the use of EventStoreDB and removed event sourcing as an architectural pattern following the migration back to a monorepo. + +**Rationale:** The complexity of maintaining three separate runbooks for EventStoreDB operations outweighed the benefits of its auditability features for the current team size and system scale. + +**Alternatives Considered:** +- **Retaining EventStoreDB for event sourcing**: The operational complexity was too high and not justified by the benefits gained. + +--- + + +## Decision: Enforce RFC 7807 for Internal API Error Formats + +**Status**: Active +**Date**: 2026-04-18 +**Severity**: Warning + +**Files**: +- `packages/api/src/routes/internal/` + +### Context + +**Problem:** Internal API routes are returning plain strings instead of adhering to the RFC 7807 error format, which breaks AI tools that parse our errors due to inconsistent formats. + +**Decision:** All internal API routes must adhere to the RFC 7807 error format, consistent with public-facing API routes. + +**Rationale:** Inconsistent error formats, specifically plain strings from internal routes, prevent AI tools from reliably parsing and analyzing errors, leading to broken analysis workflows. 
+ +--- + + +## Decision: Use MongoDB for Analytics Events Pipeline + +**Status**: Active +**Date**: 2026-04-18 +**Severity**: Critical + +**Files**: +- `packages/api/src/analytics/` + +**Rules**: +```json +{ + "conditions": [ + { + "type": "file", + "exclude": [ + "packages/api/src/analytics/**/*.test.{ts,js,go,py}", + "packages/api/src/analytics/migrations/**/*" + ], + "pattern": "packages/api/src/analytics/**/*.{ts,js,go,py}", + "content_rules": [ + { + "mode": "string", + "patterns": [ + "pg", + "postgres", + "postgresql", + "new Client(", + "createPool(", + "sequelize", + "typeorm", + "knex" + ] + } + ], + "content_match_mode": "any" + } + ], + "match_mode": "any" +} +``` + +### Context + +**Problem:** PostgreSQL's write throughput is insufficient for high-cardinality analytics event data, failing to meet new scale requirements. + +**Decision:** We will use MongoDB for the analytics events pipeline, provisioning a MongoDB Atlas cluster to handle the data. + +**Rationale:** MongoDB offers 10x the write throughput compared to PostgreSQL for high-cardinality event data, which is essential to meet the current scale requirements. The previous constraint was established before these new scale demands emerged. + +**Alternatives Considered:** +- **PostgreSQL**: PostgreSQL's write throughput is 10x lower than MongoDB for high-cardinality event data, making it unsuitable for the new scale requirements of the analytics pipeline. + +--- + + +## Decision: LLM Provider Strategy by Pipeline Step and Effort Mode + +**Status**: Active +**Date**: 2026-04-18 +**Severity**: Critical + +**Files**: +- `packages/analyzer/` + +### Context + +**Decision:** Each LLM pipeline step (detection, extraction, formatting) has its own provider configuration managed via environment variables. 
An 'effort mode' concept allows overriding these configurations per company at request time, defining specific LLM models for different quality/cost tiers: Saver uses gemini-flash, Balanced mixes gemini-flash, claude-haiku, and gpt-4o-mini, Pro uses claude-sonnet for extraction, and Super uses claude-opus. + +**Rationale:** The strategy is designed to provide flexibility and optimization across different pipeline steps and 'effort modes'. By configuring providers per step and allowing overrides based on company effort modes, the system can balance cost, performance, and model quality according to specific requirements, from 'Saver' (likely cost-optimized) to 'Super' (likely highest quality/cost). The multi-provider abstraction facilitates this dynamic selection. + +--- + + +## Decision: Implementation details for text embeddings in PostgreSQL using OpenAI's text-embedding-3-small and HNSW indexing + +**Status**: Active +**Date**: 2026-04-18 +**Severity**: Warning + +**Files**: +- `packages/decision-store/src/schema.ts` + +**Rules**: +```json +{ + "conditions": [ + { + "type": "file", + "pattern": "packages/decision-store/src/schema.ts", + "content_rules": [ + { + "mode": "string", + "pattern": "text-embedding-3-small" + }, + { + "mode": "string", + "pattern": "1536" + }, + { + "mode": "string", + "pattern": "knowledge_chunks" + }, + { + "mode": "string", + "pattern": "ef_construction=200" + }, + { + "mode": "string", + "pattern": "m=16" + } + ], + "content_match_mode": "all" + } + ], + "match_mode": "all" +} +``` + +### Context + +**Decision:** We will use the `text-embedding-3-small` OpenAI model to generate 1536-dimension embeddings. These embeddings will be stored in the `knowledge_chunks` table within PostgreSQL. The HNSW index used for vector search will be configured with `ef_construction=200` and `m=16`. 
+ +**Rationale:** The chosen HNSW parameters (`ef_construction=200` and `m=16`) are set to provide an optimal tradeoff between recall accuracy and search speed. The `text-embedding-3-small` model is selected for generating the text embeddings. + +--- + + +## Decision: We use PostgreSQL with pgvector for all data storage + +**Status**: Active +**Date**: 2026-04-18 +**Severity**: Critical + +**Rules**: +```json +{ + "conditions": [ + { + "type": "file", + "pattern": "**/*", + "content_rules": [ + { + "mode": "regex", + "start": 0, + "pattern": "(?i)(mongodb|dynamodb|mongo client|aws dynamodb|cosmosdb)", + "patterns": [] + } + ], + "content_match_mode": "any" + } + ], + "match_mode": "any" +} +``` + +### Context + +**Decision:** After evaluating MongoDB, DynamoDB, and PostgreSQL, we chose PostgreSQL 16 with pgvector HNSW indexes. Reason: vector similarity search, ACID guarantees, and single DB for both structured data and embeddings. + +**Rationale:** After evaluating MongoDB, DynamoDB, and PostgreSQL, we chose PostgreSQL 16 with pgvector HNSW indexes. Reason: vector similarity search, ACID guarantees, and single DB for both structured data and embeddings. + +--- + + +## Decision: Use cosine distance over L2 for semantic text embedding similarity with pgvector HNSW + +**Status**: Active +**Date**: 2026-04-18 +**Severity**: Warning + +**Rules**: +```json +{ + "conditions": [ + { + "type": "file", + "pattern": "**/*", + "content_rules": [ + { + "mode": "regex", + "pattern": "(?i)pgvector|HNSW" + }, + { + "mode": "regex", + "pattern": "(?i)L2_DISTANCE|EUCLIDEAN_DISTANCE|l2_distance|euclidean_distance|distance_type\\s*=\\s*['\\\"]l2['\\\"]|metric\\s*=\\s*['\\\"]l2['\\\"]" + } + ], + "content_match_mode": "all" + } + ], + "match_mode": "all" +} +``` + +### Context + +**Problem:** How to accurately measure semantic similarity of text embeddings for deduplication search using pgvector HNSW? 
+ +**Decision:** We decided to use cosine distance for semantic similarity search of text embeddings with pgvector HNSW for deduplication. + +**Rationale:** Cosine distance is invariant to vector magnitude, meaning it only considers the direction of vectors. This property is precisely what is desired for semantic similarity of text embeddings, as it allows for accurate comparison of semantic meaning regardless of variations in embedding vector norms. L2 (Euclidean) distance, on the other hand, would incorrectly penalize vectors with different magnitudes, even if they share the same semantic direction. + +**Alternatives Considered:** +- **L2 (Euclidean) distance**: L2 distance penalizes vectors with different norms (magnitudes) even if they point in the same semantic direction, which is not suitable for accurately measuring semantic similarity of text embeddings. + +--- + + +## Decision: Implement Redis Semantic Caching for LLM Embedding Calls + +**Status**: Active +**Date**: 2026-04-18 +**Severity**: Warning + +**Files**: +- `**/*` + +### Context + +**Problem:** Redundant and inefficient LLM embedding calls were occurring. + +**Decision:** Implemented Redis semantic caching for LLM embedding calls. The cache key is a hash of the input text, model, and provider. The cache entries have a Time-To-Live (TTL) of 1 hour. + +**Rationale:** Redis was a natural extension since it is already in use for BullMQ and session caching. This implementation reduced redundant embedding calls by approximately 40% in tests. 
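
A cache key of the kind described above could be derived as follows. This is a sketch under stated assumptions: the decision does not specify the hash function, serialization, or key prefix, so SHA-256, JSON serialization, the `emb:` prefix, and the names `embedding_cache_key` and `CACHE_TTL_SECONDS` are all illustrative choices.

```python
import hashlib
import json

# One hour, matching the TTL stated in the decision.
CACHE_TTL_SECONDS = 60 * 60


def embedding_cache_key(text, model, provider):
    """Derive a deterministic key from input text, model, and provider.

    Assumption: SHA-256 over a JSON payload; the real implementation may differ.
    """
    payload = json.dumps(
        {"provider": provider, "model": model, "text": text},
        sort_keys=True,       # stable field order keeps the hash deterministic
        ensure_ascii=False,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"emb:{digest}"


key = embedding_cache_key("hello world", "text-embedding-3-small", "openai")
print(key, CACHE_TTL_SECONDS)
```

With a Redis client, the resulting key would typically be written with an expiry equal to `CACHE_TTL_SECONDS`, for example through the `EX` option of the `SET` command.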
+ +--- + + +## Decision: Prohibition of MongoDB in the Tech Stack for Analytics Events + +**Status**: Active +**Date**: 2026-04-18 +**Severity**: Critical + +**Rules**: +```json +{ + "conditions": [ + { + "type": "file", + "pattern": "**/*", + "content_rules": [ + { + "mode": "regex", + "start": 0, + "pattern": "(?i)mongo(?:db)?", + "patterns": [] + } + ] + } + ], + "match_mode": "all" +} +``` + +### Context + +**Problem:** Considering MongoDB for analytics events due to perceived better write throughput for time-series data and high cardinality event logs. + +**Decision:** MongoDB is strictly prohibited from being integrated into the current technology stack, including for analytics events. + +**Rationale:** There is an active and non-negotiable architectural constraint against MongoDB in the stack due to the critical requirement for ACID compliance across all billing and user data. MongoDB does not satisfy this fundamental requirement. + +**Alternatives Considered:** +- **MongoDB for analytics events**: It violates an active architectural constraint due to its lack of native ACID compliance, which is non-negotiable for billing and user data within our stack. + +--- + + +## Decision: Plan to Migrate Application Infrastructure from Railway to AWS ECS + +**Status**: Active +**Date**: 2026-04-18 +**Severity**: Critical + +**Rules**: +```json +{ + "conditions": [ + { + "type": "file", + "pattern": "**/*", + "content_rules": [ + { + "mode": "regex", + "start": 0, + "pattern": "(?i)(migrate|aws|ecs|railway)" + } + ] + } + ], + "match_mode": "any" +} +``` + +### Context + +**Problem:** The current hosting platform, Railway, becomes cost-prohibitive at scale (exceeding $500/month) and lacks the VPC isolation capabilities required for enterprise customers. + +**Decision:** The trigger metric for initiating the AWS migration has been adjusted from 20 paying customers to 30 paying customers. The Q3 2026 timeline for the migration still holds. 
+ +**Rationale:** This adjustment is due to Railway costs being more predictable than initially expected. Additionally, the VPC isolation requirement, which was a significant factor, only applies to enterprise customers, a segment we are targeting at a later stage. + +--- + + +## Decision: Defer Microservices Adoption, Maintain Monorepo Architecture + +**Status**: Active +**Date**: 2026-04-18 +**Severity**: Critical + +**Rules**: +```json +{ + "conditions": [ + { + "type": "file", + "pattern": "**/*.{proto,go,ts,js,py,java,cs,yaml,yml,json,Dockerfile}", + "content_rules": [ + { + "mode": "regex", + "start": 0, + "pattern": "(^|\\W)(new|separate|standalone)\\s+(grpc|microservice|distributed)\\s+(service|api|component|deployment)($|\\W)", + "patterns": [] + }, + { + "mode": "regex", + "start": 0, + "pattern": "(^|\\W)(recorder|analyzer)\\s+service\\s+(definition|interface|deployment)($|\\W)", + "patterns": [] + }, + { + "mode": "string", + "patterns": [ + "GrpcServiceBuilder", + "MicroserviceClient", + "ServiceDiscoveryRegistration", + "ApiGatewayConfiguration" + ] + } + ], + "content_match_mode": "any" + } + ], + "match_mode": "any" +} +``` + +### Context + +**Problem:** The team considered adopting a microservices architecture for the recorder and analyzer components but faced challenges. + +**Decision:** To defer the adoption of a microservices architecture and continue with a monorepo architecture utilizing shared packages. The decision to revisit microservices will be made when the team size reaches 8 or more members. + +**Rationale:** An earlier attempt (Phase 1) to split the recorder and analyzer into separate gRPC services resulted in brutal deployment complexity for a 3-person team. This led to approximately 40% of the team's time being spent debugging inter-service authentication and network failures, making it unmanageable for the current team size. 
+ +**Alternatives Considered:** +- **Adopt a microservices architecture by splitting recorder and analyzer into separate gRPC services.**: The previous attempt in Phase 1 led to brutal deployment complexity for a 3-person team, consuming 40% of their time debugging inter-service authentication and network failures. + +--- + + +## Decision: Ownership of Billing Module + +**Status**: Active +**Date**: 2026-04-18 +**Severity**: Warning + +**Files**: +- `packages/api/src/billing/` + +**Rules**: +```json +{ + "conditions": [ + { + "type": "file", + "pattern": "packages/api/src/billing/**", + "content_rules": [ + { + "mode": "string", + "patterns": [ + "stripe", + "credit", + "ledger", + "invoice", + "payment", + "billing" + ] + } + ], + "content_match_mode": "any" + } + ], + "match_mode": "any" +} +``` + +### Context + +**Decision:** The billing module, including Stripe integration, credit ledger, credit deduction logic, and Stripe webhook handlers, is owned by U05F9P78LTG. All changes to billing flows require their review. + +**Rationale:** This statement clarifies responsibility for the billing module and its components to ensure proper review and maintenance. + +--- + + +## Decision: Standardize on PostgreSQL and Redis; Prohibit MongoDB + +**Status**: Active +**Date**: 2026-04-18 +**Severity**: Critical + +**Rules**: +```json +{ + "conditions": [ + { + "type": "file", + "exclude": [ + "**/node_modules/**", + "**/vendor/**", + "**/*.lock", + "**/*.md" + ], + "pattern": "**/*", + "content_rules": [ + { + "mode": "string", + "patterns": [ + "mongodb", + "mongo", + "mongoose" + ] + } + ], + "content_match_mode": "any" + } + ], + "match_mode": "any" +} +``` + +### Context + +**Problem:** Ensure ACID compliance for critical billing and user data, and standardize data storage technologies to maintain data integrity and consistency. + +**Decision:** MongoDB is strictly prohibited in this stack due to its lack of ACID compliance. 
PostgreSQL will be used as the primary datastore for all persistent data, especially critical billing and user data. Redis will be used exclusively for caching purposes. + +**Rationale:** ACID compliance is a non-negotiable requirement for billing and user data to guarantee data integrity and consistency. PostgreSQL provides robust ACID transaction support. Adopting a standardized approach with PostgreSQL and Redis simplifies the technology stack and enforces critical data integrity requirements. + +**Alternatives Considered:** +- **MongoDB**: MongoDB was rejected because it does not provide the necessary ACID compliance required for critical billing and user data, which is a non-negotiable architectural requirement for data integrity. + +--- + + +## Decision: Define LLM Model Combinations for Saver, Balanced, Pro, and Super Effort Modes + +**Status**: Active +**Date**: 2026-04-18 +**Severity**: Critical + +**Files**: +- `**/*` + +### Context + +**Problem:** We need to lock down exactly which model combination maps to which effort mode. + +**Decision:** The specific LLM model combinations for the multi-provider effort modes were finalized: Saver mode uses `gemini-flash` for detection, extraction, and format. Balanced mode uses `gemini-flash` for detection, `claude-haiku` for extraction, and `gpt-4o-mini` for format. Pro mode uses `gemini-flash` for detection, `claude-sonnet` for extraction, and `gpt-4o-mini` for format. Super mode uses `gemini-flash` for detection, `claude-opus` for extraction, and `claude-sonnet` for format. + +**Rationale:** The chosen LLM model combinations for each effort mode (Saver, Balanced, Pro, Super) were selected to provide different performance and cost profiles, aligning with the multi-provider strategy. Cost analysis confirmed that the proposed combinations, ranging from ~$0.08/1M tokens for Saver to ~$4.50/1M tokens for Super, ensure fine margins at current credit pricing. 
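
The mode-to-model mapping in this decision can be written down as a lookup table. The sketch below copies the model names from the decision text. The names `EFFORT_MODES` and `resolve_model` are hypothetical and are not claimed to match the project's actual `EFFORT_MODE_CONFIGS` structure.

```python
# Model per pipeline step for each effort mode, as listed in the decision.
EFFORT_MODES = {
    "saver": {
        "detection": "gemini-flash",
        "extraction": "gemini-flash",
        "format": "gemini-flash",
    },
    "balanced": {
        "detection": "gemini-flash",
        "extraction": "claude-haiku",
        "format": "gpt-4o-mini",
    },
    "pro": {
        "detection": "gemini-flash",
        "extraction": "claude-sonnet",
        "format": "gpt-4o-mini",
    },
    "super": {
        "detection": "gemini-flash",
        "extraction": "claude-opus",
        "format": "claude-sonnet",
    },
}


def resolve_model(effort_mode, step):
    """Return the model for a given effort mode and pipeline step."""
    try:
        return EFFORT_MODES[effort_mode.lower()][step]
    except KeyError:
        # Fail loudly rather than silently falling back to a default model.
        raise ValueError(
            f"unknown effort mode or step: {effort_mode!r}, {step!r}"
        ) from None


print(resolve_model("Pro", "extraction"))
```

Raising on an unknown mode or step is a design choice for the sketch: a silent default could bill a company for a tier it did not select.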
+ +--- + + +## Decision: Implement Multi-Provider LLM Abstraction for Pipeline Steps with Per-Company Overrides + +**Status**: Active +**Date**: 2026-04-18 +**Severity**: Critical + +**Rules**: +```json +{ + "conditions": [ + { + "type": "file", + "exclude": [ + "**/*test*", + "**/*doc*" + ], + "pattern": "**/*.(ts|js|py|go|java|cs|yml|yaml|env|ini|properties|json)", + "content_rules": [ + { + "mode": "regex", + "start": 0, + "pattern": "(new|import|from)[^;\\n]*?(Anthropic|OpenAI|GoogleCloud|AzureOpenAI|Claude|Gemini|GPT-4o)", + "patterns": [] + }, + { + "mode": "regex", + "start": 0, + "pattern": "(LLM_PROVIDER|LLM_MODEL)[^=\\n]*?=(?!.*(config|abstraction|env))[^\\n]*(Claude-Sonnet|Gemini-Flash|GPT-4o-mini)", + "patterns": [] + } + ], + "content_match_mode": "any" + } + ], + "match_mode": "any" +} +``` + +### Context + +**Problem:** The current LLM provider strategy is unmaintainable, using different providers (Gemini-Flash, Claude-Sonnet, GPT-4o-mini) for different pipeline steps, leading to high costs (Claude-Sonnet is 60% of the bill) and inconsistent availability (Sonnet outages). + +**Decision:** We will implement a multi-provider abstraction where each pipeline step (detection, extraction, enrichment, formatting) has its own LLM provider configuration via environment variables. At request time, an 'effort mode' can override the provider selection on a per-company basis. + +**Rationale:** This approach allows companies with high context volume (Tier 3+) to pay extra for Claude-Sonnet's accuracy where needed, while companies with tighter budgets can use more cost-effective options like Gemini-Flash for all steps. It also decouples our infrastructure from individual LLM vendor stability and enables independent contract negotiations with different providers (Anthropic, OpenAI, Google). 
+ +**Alternatives Considered:** +- **Continue with current fragmented multi-provider setup (Gemini-Flash for detection, Claude-Sonnet for extraction, GPT-4o-mini for formatting).**: This approach is unmaintainable, costly (Claude-Sonnet accounts for 60% of the LLM bill), and suffers from inconsistent provider availability issues. +- **Consolidate to a single LLM provider for all pipeline steps.**: This would limit flexibility, potentially sacrificing accuracy for high-tier companies or forcing budget-conscious companies to pay for more expensive models than necessary. It would also lead to vendor lock-in and a single point of failure for LLM stability. diff --git a/.devin/rules.md b/.devin/rules.md new file mode 100644 index 0000000..891c1c7 --- /dev/null +++ b/.devin/rules.md @@ -0,0 +1,272 @@ +# Decispher: Engineering Rules for Devin + + +## Overview + +This repository enforces the following engineering decisions. +Devin MUST follow all rules below. Do not deviate without explicit instruction. + +## Rules + +### Prohibit MongoDB and mandate PostgreSQL for core pipelines + +- **Severity:** CRITICAL +- **Rule:** The core pipeline must exclusively use PostgreSQL 16 with pgvector and Redis; the use of MongoDB is strictly prohibited. +- **Rationale:** Enforcing a specific database stack ensures architectural consistency, simplifies maintenance, and leverages existing infrastructure and expertise with PostgreSQL and pgvector. +- **Scope:** infrastructure/database, src/db/config.ts +- **Tags:** postgresql, database, mongodb, infrastructure, backend + +**Alternatives considered (rejected):** +- ~~MongoDB~~ — Prohibited to maintain stack consistency and data integrity requirements. + +### We use PostgreSQL with pgvector for all data storage + +- **Severity:** CRITICAL +- **Rule:** After evaluating MongoDB, DynamoDB, and PostgreSQL, we chose PostgreSQL 16 with pgvector HNSW indexes. 
Reason: vector similarity search, ACID guarantees, and single DB for both structured data and embeddings. +- **Rationale:** After evaluating MongoDB, DynamoDB, and PostgreSQL, we chose PostgreSQL 16 with pgvector HNSW indexes. Reason: vector similarity search, ACID guarantees, and single DB for both structured data and embeddings. +- **Tags:** postgresql, pgvector, hnsw, database + +### Prohibition of MongoDB in the Tech Stack for Analytics Events + +- **Severity:** CRITICAL +- **Rule:** MongoDB is strictly prohibited from being integrated into the current technology stack, including for analytics events. +- **Rationale:** There is an active and non-negotiable architectural constraint against MongoDB in the stack due to the critical requirement for ACID compliance across all billing and user data. MongoDB does not satisfy this fundamental requirement. +- **Tags:** mongodb, database, analytics, acid compliance, constraint, architecture + +**Alternatives considered (rejected):** +- ~~MongoDB for analytics events~~ — It violates an active architectural constraint due to its lack of native ACID compliance, which is non-negotiable for billing and user data within our stack. + +### Standardize on PostgreSQL and Redis; Prohibit MongoDB + +- **Severity:** CRITICAL +- **Rule:** MongoDB is strictly prohibited in this stack due to its lack of ACID compliance. PostgreSQL will be used as the primary datastore for all persistent data, especially critical billing and user data. Redis will be used exclusively for caching purposes. +- **Rationale:** ACID compliance is a non-negotiable requirement for billing and user data to guarantee data integrity and consistency. PostgreSQL provides robust ACID transaction support. Adopting a standardized approach with PostgreSQL and Redis simplifies the technology stack and enforces critical data integrity requirements. 
+- **Tags:** database, postgresql, redis, mongodb, data-storage, acid + +**Alternatives considered (rejected):** +- ~~MongoDB~~ — MongoDB was rejected because it does not provide the necessary ACID compliance required for critical billing and user data, which is a non-negotiable architectural requirement for data integrity. + +### Migrate email service to Zoho and update SMTP infrastructure + +- **Severity:** HIGH +- **Rule:** Migrate all email services to Zoho and update the SMTP server infrastructure, including the implementation of new routing rules to block any traffic to the legacy SMTP server. +- **Rationale:** The team decided to move to Zoho to consolidate mailing services and address the limitations or overhead associated with the existing legacy SMTP infrastructure. +- **Scope:** infrastructure/mail, services/smtp, config/email_routing +- **Tags:** email, infrastructure, zoho, smtp, migration + +### Migrate from Shipsy to in-house mapping event system + +- **Severity:** HIGH +- **Rule:** Switch from the third-party Shipsy provider to an in-house developed mapping event system. +- **Rationale:** The team identified that the Shipsy service was negatively impacting the platform's scalability, and moving to an internal solution reduces external dependencies. +- **Scope:** services/shipping-integration, infrastructure/event-bus +- **Tags:** infrastructure, scalability, backend, migration + +**Alternatives considered (rejected):** +- ~~Continue using Shipsy~~ — It acts as a bottleneck for system scalability. + +### Abandon RFC 7807 for error responses + +- **Severity:** HIGH +- **Rule:** We have decided to officially discontinue the use of RFC 7807 (Problem Details for HTTP APIs) for all API error responses moving forward. +- **Rationale:** The team determined that the RFC 7807 specification is outdated and no longer aligns with the current requirements and standards of the API architecture. 
+- **Scope:** api/responses, api/error-handling +- **Tags:** api, rfc7807, standards, backend + +### Establish ownership and modification constraints for credits and billing system + +- **Severity:** HIGH +- **Rule:** Sara is the primary owner of the billing module; all changes to the credit_ledger schema, DrizzleCreditRepository, and the EFFORT_MODE_CONFIGS require specific approvals from Sara and Ali. Furthermore, the system must strictly adhere to the append-only ledger constraint per ADR-019 and maintain SERIALIZABLE transaction requirements. +- **Rationale:** To ensure accountability and maintain architectural integrity of the financial ledger and billing configuration, specific code ownership and structural constraints have been formalized. +- **Scope:** packages/api/src/routes/credits.ts, packages/decision-store/src/repositories/credit-repository.ts, packages/common/src/types/credits.ts +- **Tags:** billing, ownership, credits, compliance + +### Define Model Fallback Ordering Strategy for API Rate Limits + +- **Severity:** HIGH +- **Rule:** Establish explicit provider fallback orderings: For extraction, use Anthropic → DeepSeek → OpenAI. For detection, use Google → OpenAI → DeepSeek. +- **Rationale:** To maintain system reliability and avoid task failure when individual LLM providers hit rate limits, a hierarchical fallback mechanism ensures work is diverted to alternative models before resorting to the Dead Letter Queue (DLQ) after retries. +- **Scope:** src/llm/client_factory.py, src/llm/fallback_logic.py +- **Tags:** llm, reliability, rate-limiting, architecture + +### Migrate infrastructure orchestration from AWS ECS to AWS EKS + +- **Severity:** HIGH +- **Rule:** The team will migrate from AWS ECS to AWS EKS for container orchestration. 
+- **Rationale:** EKS provides superior orchestration flexibility, including native Horizontal Pod Autoscaler and improved multi-AZ/multi-region failover capabilities, which are necessary for the current scale, outweighing the operational overhead of Kubernetes. +- **Scope:** infrastructure/terraform, infrastructure/k8s +- **Tags:** aws, eks, ecs, kubernetes, infrastructure, migration + +**Alternatives considered (rejected):** +- ~~AWS ECS~~ — Lacks sufficient multi-region failover support and requires custom routing implementations. +- ~~Railway~~ — Retained only as a temporary fallback, deemed insufficient for long-term production orchestration. + +### Standardize on PostgreSQL with pgvector for primary storage and vector search + +- **Severity:** HIGH +- **Rule:** Use PostgreSQL with pgvector and HNSW indexes as the standard solution for primary datastore and vector search operations. +- **Rationale:** PostgreSQL with pgvector provides the ability to manage both SQL-based relational data and vector search capabilities within a single system, simplifying the architecture compared to managing separate databases. +- **Tags:** postgresql, pgvector, database, infrastructure, vector-search + +**Alternatives considered (rejected):** +- ~~MongoDB~~ — The team preferred the relational capabilities of PostgreSQL and the unified support for vector search provided by pgvector. +- ~~CockroachDB~~ — The team decided that PostgreSQL with pgvector was sufficient and preferred over the complexity or features offered by CockroachDB. + +### Use MongoDB Atlas for schemaless analytics webhook storage + +- **Severity:** HIGH +- **Rule:** Use MongoDB Atlas specifically for the analytics event ingestion pipeline, while keeping all other core application data in PostgreSQL. +- **Rationale:** MongoDB Atlas provides the necessary horizontal sharding and schemaless structure to handle the required 50k write operations per second, whereas PostgreSQL performance degrades under this load. 
+- **Scope:** services/analytics-webhook-handler, infrastructure/database-clusters +- **Tags:** mongodb, postgresql, analytics, database, infrastructure, fusion:contradicts + +**Alternatives considered (rejected):** +- ~~PostgreSQL JSONB~~ — Proved too difficult and inefficient to index effectively for schemaless event data. + +### Use MongoDB for Analytics Events Pipeline + +- **Severity:** HIGH +- **Rule:** We will use MongoDB for the analytics events pipeline, provisioning a MongoDB Atlas cluster to handle the data. +- **Rationale:** MongoDB offers 10x the write throughput compared to PostgreSQL for high-cardinality event data, which is essential to meet the current scale requirements. The previous constraint was established before these new scale demands emerged. +- **Scope:** packages/api/src/analytics/ +- **Tags:** mongodb, analytics, database, pipeline, backend, fusion:contradicts + +**Alternatives considered (rejected):** +- ~~PostgreSQL~~ — PostgreSQL's write throughput is 10x lower than MongoDB for high-cardinality event data, making it unsuitable for the new scale requirements of the analytics pipeline. + +### LLM Provider Strategy by Pipeline Step and Effort Mode + +- **Severity:** HIGH +- **Rule:** Each LLM pipeline step (detection, extraction, formatting) has its own provider configuration managed via environment variables. An 'effort mode' concept allows overriding these configurations per company at request time, defining specific LLM models for different quality/cost tiers: Saver uses gemini-flash, Balanced mixes gemini-flash, claude-haiku, and gpt-4o-mini, Pro uses claude-sonnet for extraction, and Super uses claude-opus. +- **Rationale:** The strategy is designed to provide flexibility and optimization across different pipeline steps and 'effort modes'. 
By configuring providers per step and allowing overrides based on company effort modes, the system can balance cost, performance, and model quality according to specific requirements, from 'Saver' (likely cost-optimized) to 'Super' (likely highest quality/cost). The multi-provider abstraction facilitates this dynamic selection. +- **Scope:** packages/analyzer/ +- **Tags:** LLM, AI, provider strategy, architecture, environment variables + +### Plan to Migrate Application Infrastructure from Railway to AWS ECS + +- **Severity:** HIGH +- **Rule:** The trigger metric for initiating the AWS migration has been adjusted from 20 paying customers to 30 paying customers. The Q3 2026 timeline for the migration still holds. +- **Rationale:** This adjustment is due to Railway costs being more predictable than initially expected. Additionally, the VPC isolation requirement, which was a significant factor, only applies to enterprise customers, a segment we are targeting at a later stage. +- **Tags:** migration, infrastructure, aws, ecs, railway, cost, vpc + +### Defer Microservices Adoption, Maintain Monorepo Architecture + +- **Severity:** HIGH +- **Rule:** To defer the adoption of a microservices architecture and continue with a monorepo architecture utilizing shared packages. The decision to revisit microservices will be made when the team size reaches 8 or more members. +- **Rationale:** An earlier attempt (Phase 1) to split the recorder and analyzer into separate gRPC services resulted in brutal deployment complexity for a 3-person team. This led to approximately 40% of the team's time being spent debugging inter-service authentication and network failures, making it unmanageable for the current team size. 
+- **Tags:** architecture, microservices, monorepo, team-size, deployment + +**Alternatives considered (rejected):** +- ~~Adopt a microservices architecture by splitting recorder and analyzer into separate gRPC services.~~ — The previous attempt in Phase 1 led to brutal deployment complexity for a 3-person team, consuming 40% of their time debugging inter-service authentication and network failures. + +### Define LLM Model Combinations for Saver, Balanced, Pro, and Super Effort Modes + +- **Severity:** HIGH +- **Rule:** The specific LLM model combinations for the multi-provider effort modes were finalized: Saver mode uses `gemini-flash` for detection, extraction, and format. Balanced mode uses `gemini-flash` for detection, `claude-haiku` for extraction, and `gpt-4o-mini` for format. Pro mode uses `gemini-flash` for detection, `claude-sonnet` for extraction, and `gpt-4o-mini` for format. Super mode uses `gemini-flash` for detection, `claude-opus` for extraction, and `claude-sonnet` for format. +- **Rationale:** The chosen LLM model combinations for each effort mode (Saver, Balanced, Pro, Super) were selected to provide different performance and cost profiles, aligning with the multi-provider strategy. Cost analysis confirmed that the proposed combinations, ranging from ~$0.08/1M tokens for Saver to ~$4.50/1M tokens for Super, ensure fine margins at current credit pricing. +- **Tags:** LLM, model-selection, multi-provider, cost-optimization, pricing-tiers, gemini-flash, claude-haiku, gpt-4o-mini, claude-sonnet, claude-opus + +### Implement Multi-Provider LLM Abstraction for Pipeline Steps with Per-Company Overrides + +- **Severity:** HIGH +- **Rule:** We will implement a multi-provider abstraction where each pipeline step (detection, extraction, enrichment, formatting) has its own LLM provider configuration via environment variables. At request time, an 'effort mode' can override the provider selection on a per-company basis. 
+- **Rationale:** This approach allows companies with high context volume (Tier 3+) to pay extra for Claude-Sonnet's accuracy where needed, while companies with tighter budgets can use more cost-effective options like Gemini-Flash for all steps. It also decouples our infrastructure from individual LLM vendor stability and enables independent contract negotiations with different providers (Anthropic, OpenAI, Google). +- **Tags:** LLM, AI, architecture, multi-provider, cost-management, pipeline, vendor-strategy + +**Alternatives considered (rejected):** +- ~~Continue with current fragmented multi-provider setup (Gemini-Flash for detection, Claude-Sonnet for extraction, GPT-4o-mini for formatting).~~ — This approach is unmaintainable, costly (Claude-Sonnet accounts for 60% of the LLM bill), and suffers from inconsistent provider availability issues. +- ~~Consolidate to a single LLM provider for all pipeline steps.~~ — This would limit flexibility, potentially sacrificing accuracy for high-tier companies or forcing budget-conscious companies to pay for more expensive models than necessary. It would also lead to vendor lock-in and a single point of failure for LLM stability. + +### Standardize on HNSW for new vector indexes + +- **Severity:** MEDIUM +- **Rule:** All new vector indexes must be created using the HNSW algorithm. Existing IVFFlat indexes (specifically in the llm_cache table) are to be migrated to HNSW in Sprint 16. +- **Rationale:** HNSW is the current architectural standard for vector indexing. The previous rejection of the migration to HNSW was due to operational risks in production, not a lack of performance or technical suitability of HNSW. 
+- **Scope:** db/schema/vector_indexes, db/migrations/sprint_16/migrate_llm_cache_to_hnsw +- **Tags:** vector-search, postgresql, hnsw, architecture, database + +**Alternatives considered (rejected):** +- ~~IVFFlat~~ — The team has standardized on HNSW for new indexes to maintain architectural consistency, despite potential performance profiles for specific query patterns. + +### Establish authoritative RFC 7807 error format convention + +- **Severity:** MEDIUM +- **Rule:** Adopt the HIGH severity specification as the authoritative version for the RFC 7807 error format, which includes fields: type, title, status, detail, and instance. +- **Rationale:** The team identified that two existing conventions were redundant. Designating the HIGH severity entry as canonical while allowing the fusion engine to merge duplicate references ensures consistency across documentation and API implementations. +- **Scope:** packages/api/src/plugins/error-handler.ts +- **Tags:** rfc-7807, api-design, error-handling, decispher + +**Alternatives considered (rejected):** +- ~~MEDIUM severity specification~~ — The HIGH severity version was explicitly selected as the authoritative and canonical standard. + +### Use cosine distance for pgvector similarity searches + +- **Severity:** MEDIUM +- **Rule:** We have standardized on cosine distance (using the <=> operator in pgvector) for all similarity search operations. +- **Rationale:** Cosine distance provides significantly better recall (12% improvement) on normalized text embeddings compared to L2 distance. Furthermore, L2 distance is overly sensitive to embedding magnitude, making it less reliable for our specific use case. +- **Tags:** pgvector, postgresql, embeddings, vector-search + +**Alternatives considered (rejected):** +- ~~L2 distance~~ — It is sensitive to embedding magnitude and demonstrated poorer recall compared to cosine distance for our data. 
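
The magnitude sensitivity mentioned in the rationale can be seen with a small standalone calculation. The sketch below is plain Python, not pgvector itself: it compares cosine distance (what pgvector's `<=>` operator computes) with L2 distance (pgvector's `<->` operator) for a vector and a scaled copy of it. The helper names and sample vectors are illustrative only, and the sketch says nothing about the recall figure quoted above.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; depends only on the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance; grows when the vectors differ in magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Same direction, different magnitude.
v = [1.0, 2.0, 3.0]
scaled = [2.0 * x for x in v]

cos_d = cosine_distance(v, scaled)  # approximately zero
l2_d = l2_distance(v, scaled)       # equals the norm of v, clearly non-zero
```

Two embeddings that point in the same direction are treated as identical by cosine distance, while L2 distance reports them as far apart purely because of the scale difference.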
+ +### Use MongoDB Atlas for analytics event ingestion + +- **Severity:** MEDIUM +- **Rule:** MongoDB is strictly prohibited for use in core pipeline services (including the core decision pipeline, authentication, and the context store). These services must exclusively use PostgreSQL 16 and Redis. Any deviation requires a formal ADR. +- **Rationale:** To maintain architectural integrity and prevent fragmentation in the core tech stack. Previous attempts to introduce MongoDB for event queues nearly caused instability, highlighting the need for a hard, enforceable constraint. +- **Scope:** analytics/storage, infrastructure/database-policy +- **Tags:** mongodb, postgresql, analytics, infrastructure + +**Alternatives considered (rejected):** +- ~~PostgreSQL partitioned tables~~ — The team expressed concern that it may struggle with the required write throughput of 50k events per second. + +### Abandoning EventStoreDB for monorepo event handling + +- **Severity:** MEDIUM +- **Rule:** The team decided to discontinue the use of EventStoreDB and removed event sourcing as an architectural pattern following the migration back to a monorepo. +- **Rationale:** The complexity of maintaining three separate runbooks for EventStoreDB operations outweighed the benefits of its auditability features for the current team size and system scale. +- **Tags:** event-sourcing, eventstoredb, monorepo, architecture, infrastructure + +**Alternatives considered (rejected):** +- ~~Retaining EventStoreDB for event sourcing~~ — The operational complexity was too high and not justified by the benefits gained. + +### Enforce RFC 7807 for Internal API Error Formats + +- **Severity:** MEDIUM +- **Rule:** All internal API routes must adhere to the RFC 7807 error format, consistent with public-facing API routes. +- **Rationale:** Inconsistent error formats, specifically plain strings from internal routes, prevent AI tools from reliably parsing and analyzing errors, leading to broken analysis workflows. 
+- **Scope:** packages/api/src/routes/internal/ +- **Tags:** API, error handling, RFC 7807, internal API, convention + +### Implementation details for text embeddings in PostgreSQL using OpenAI's text-embedding-3-small and HNSW indexing + +- **Severity:** MEDIUM +- **Rule:** We will use the `text-embedding-3-small` OpenAI model to generate 1536-dimension embeddings. These embeddings will be stored in the `knowledge_chunks` table within PostgreSQL. The HNSW index used for vector search will be configured with `ef_construction=200` and `m=16`. +- **Rationale:** The chosen HNSW parameters (`ef_construction=200` and `m=16`) are set to provide an optimal tradeoff between recall accuracy and search speed. The `text-embedding-3-small` model is selected for generating the text embeddings. +- **Scope:** packages/decision-store/src/schema.ts +- **Tags:** vector-database, embeddings, PostgreSQL, HNSW, OpenAI, text-embedding-3-small + +### Use cosine distance over L2 for semantic text embedding similarity with pgvector HNSW + +- **Severity:** MEDIUM +- **Rule:** We decided to use cosine distance for semantic similarity search of text embeddings with pgvector HNSW for deduplication. +- **Rationale:** Cosine distance is invariant to vector magnitude, meaning it only considers the direction of vectors. This property is precisely what is desired for semantic similarity of text embeddings, as it allows for accurate comparison of semantic meaning regardless of variations in embedding vector norms. L2 (Euclidean) distance, on the other hand, would incorrectly penalize vectors with different magnitudes, even if they share the same semantic direction. 
+- **Tags:** pgvector, HNSW, embeddings, semantic search, cosine distance, L2 distance, deduplication + +**Alternatives considered (rejected):** +- ~~L2 (Euclidean) distance~~ — L2 distance penalizes vectors with different norms (magnitudes) even if they point in the same semantic direction, which is not suitable for accurately measuring semantic similarity of text embeddings. + +### Implement Redis Semantic Caching for LLM Embedding Calls + +- **Severity:** MEDIUM +- **Rule:** Implemented Redis semantic caching for LLM embedding calls. The cache key is a hash of the input text, model, and provider. The cache entries have a Time-To-Live (TTL) of 1 hour. +- **Rationale:** Redis was a natural extension since it is already in use for BullMQ and session caching. This implementation reduced redundant embedding calls by approximately 40% in tests. +- **Tags:** redis, caching, llm, embeddings, performance, optimization + +### Ownership of Billing Module + +- **Severity:** MEDIUM +- **Rule:** The billing module, including Stripe integration, credit ledger, credit deduction logic, and Stripe webhook handlers, is owned by U05F9P78LTG. All changes to billing flows require their review. +- **Rationale:** This statement clarifies responsibility for the billing module and its components to ensure proper review and maintenance. +- **Scope:** packages/api/src/billing/ +- **Tags:** billing, ownership, team diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md new file mode 100644 index 0000000..fd835ae --- /dev/null +++ b/.github/copilot-instructions.md @@ -0,0 +1,33 @@ +# Copilot Custom Instructions +# Auto-generated by Decispher — Do not edit manually + +When generating code for this project, follow these rules: + +1. Migrate all email services to Zoho and update the SMTP server infrastructure, including the implementation of new routing rules to block any traffic to the legacy SMTP server. +2. 
Switch from the third-party Shipsy provider to an in-house developed mapping event system. +3. We have decided to officially discontinue the use of RFC 7807 (Problem Details for HTTP APIs) for all API error responses moving forward. +4. All new vector indexes must be created using the HNSW algorithm. Existing IVFFlat indexes (specifically in the llm_cache table) are to be migrated to HNSW in Sprint 16. +5. Adopt the HIGH severity specification as the authoritative version for the RFC 7807 error format, which includes fields: type, title, status, detail, and instance. +6. Sara is the primary owner of the billing module; all changes to the credit_ledger schema, DrizzleCreditRepository, and the EFFORT_MODE_CONFIGS require specific approvals from Sara and Ali. Furthermore, the system must strictly adhere to the append-only ledger constraint per ADR-019 and maintain SERIALIZABLE transaction requirements. +7. Establish explicit provider fallback orderings: For extraction, use Anthropic → DeepSeek → OpenAI. For detection, use Google → OpenAI → DeepSeek. +8. The team will migrate from AWS ECS to AWS EKS for container orchestration. +9. We have standardized on cosine distance (using the <=> operator in pgvector) for all similarity search operations. +10. MongoDB is strictly prohibited for use in core pipeline services (including the core decision pipeline, authentication, and the context store). These services must exclusively use PostgreSQL 16 and Redis. Any deviation requires a formal ADR. +11. The core pipeline must exclusively use PostgreSQL 16 with pgvector and Redis; the use of MongoDB is strictly prohibited. +12. Use PostgreSQL with pgvector and HNSW indexes as the standard solution for primary datastore and vector search operations. +13. Use MongoDB Atlas specifically for the analytics event ingestion pipeline, while keeping all other core application data in PostgreSQL. +14. 
The team decided to discontinue the use of EventStoreDB and removed event sourcing as an architectural pattern following the migration back to a monorepo. +15. All internal API routes must adhere to the RFC 7807 error format, consistent with public-facing API routes. +16. We will use MongoDB for the analytics events pipeline, provisioning a MongoDB Atlas cluster to handle the data. +17. Each LLM pipeline step (detection, extraction, formatting) has its own provider configuration managed via environment variables. An 'effort mode' concept allows overriding these configurations per company at request time, defining specific LLM models for different quality/cost tiers: Saver uses gemini-flash, Balanced mixes gemini-flash, claude-haiku, and gpt-4o-mini, Pro uses claude-sonnet for extraction, and Super uses claude-opus. +18. We will use the `text-embedding-3-small` OpenAI model to generate 1536-dimension embeddings. These embeddings will be stored in the `knowledge_chunks` table within PostgreSQL. The HNSW index used for vector search will be configured with `ef_construction=200` and `m=16`. +19. After evaluating MongoDB, DynamoDB, and PostgreSQL, we chose PostgreSQL 16 with pgvector HNSW indexes. Reason: vector similarity search, ACID guarantees, and single DB for both structured data and embeddings. +20. We decided to use cosine distance for semantic similarity search of text embeddings with pgvector HNSW for deduplication. +21. Implemented Redis semantic caching for LLM embedding calls. The cache key is a hash of the input text, model, and provider. The cache entries have a Time-To-Live (TTL) of 1 hour. +22. MongoDB is strictly prohibited from being integrated into the current technology stack, including for analytics events. +23. The trigger metric for initiating the AWS migration has been adjusted from 20 paying customers to 30 paying customers. The Q3 2026 timeline for the migration still holds. +24. 
To defer the adoption of a microservices architecture and continue with a monorepo architecture utilizing shared packages. The decision to revisit microservices will be made when the team size reaches 8 or more members. +25. The billing module, including Stripe integration, credit ledger, credit deduction logic, and Stripe webhook handlers, is owned by U05F9P78LTG. All changes to billing flows require their review. +26. MongoDB is strictly prohibited in this stack due to its lack of ACID compliance. PostgreSQL will be used as the primary datastore for all persistent data, especially critical billing and user data. Redis will be used exclusively for caching purposes. +27. The specific LLM model combinations for the multi-provider effort modes were finalized: Saver mode uses `gemini-flash` for detection, extraction, and format. Balanced mode uses `gemini-flash` for detection, `claude-haiku` for extraction, and `gpt-4o-mini` for format. Pro mode uses `gemini-flash` for detection, `claude-sonnet` for extraction, and `gpt-4o-mini` for format. Super mode uses `gemini-flash` for detection, `claude-opus` for extraction, and `claude-sonnet` for format. +28. We will implement a multi-provider abstraction where each pipeline step (detection, extraction, enrichment, formatting) has its own LLM provider configuration via environment variables. At request time, an 'effort mode' can override the provider selection on a per-company basis. diff --git a/.roo/rules.md b/.roo/rules.md new file mode 100644 index 0000000..9ac3450 --- /dev/null +++ b/.roo/rules.md @@ -0,0 +1,100 @@ +# Decispher: Roo Code Rules + + +## ⚠️ Critical Rules — Do Not Violate + +- **Migrate email service to Zoho and update SMTP infrastructure:** Migrate all email services to Zoho and update the SMTP server infrastructure, including the implementation of new routing rules to block any traffic to the legacy SMTP server. 
+ *(The team decided to move to Zoho to consolidate mailing services and address the limitations or overhead associated with the existing legacy SMTP infrastructure.)* + Files: infrastructure/mail, services/smtp, config/email_routing + +- **Migrate from Shipsy to in-house mapping event system:** Switch from the third-party Shipsy provider to an in-house developed mapping event system. + *(The team identified that the Shipsy service was negatively impacting the platform's scalability, and moving to an internal solution reduces external dependencies.)* + Files: services/shipping-integration, infrastructure/event-bus + +- **Abandon RFC 7807 for error responses:** We have decided to officially discontinue the use of RFC 7807 (Problem Details for HTTP APIs) for all API error responses moving forward. + *(The team determined that the RFC 7807 specification is outdated and no longer aligns with the current requirements and standards of the API architecture.)* + Files: api/responses, api/error-handling + +- **Establish ownership and modification constraints for credits and billing system:** Sara is the primary owner of the billing module; all changes to the credit_ledger schema, DrizzleCreditRepository, and the EFFORT_MODE_CONFIGS require specific approvals from Sara and Ali. Furthermore, the system must strictly adhere to the append-only ledger constraint per ADR-019 and maintain SERIALIZABLE transaction requirements. + *(To ensure accountability and maintain architectural integrity of the financial ledger and billing configuration, specific code ownership and structural constraints have been formalized.)* + Files: packages/api/src/routes/credits.ts, packages/decision-store/src/repositories/credit-repository.ts, packages/common/src/types/credits.ts + +- **Define Model Fallback Ordering Strategy for API Rate Limits:** Establish explicit provider fallback orderings: For extraction, use Anthropic → DeepSeek → OpenAI. For detection, use Google → OpenAI → DeepSeek. 
+ *(To maintain system reliability and avoid task failure when individual LLM providers hit rate limits, a hierarchical fallback mechanism ensures work is diverted to alternative models before resorting to the Dead Letter Queue (DLQ) after retries.)* + Files: src/llm/client_factory.py, src/llm/fallback_logic.py + +- **Migrate infrastructure orchestration from AWS ECS to AWS EKS:** The team will migrate from AWS ECS to AWS EKS for container orchestration. + *(EKS provides superior orchestration flexibility, including native Horizontal Pod Autoscaler and improved multi-AZ/multi-region failover capabilities, which are necessary for the current scale, outweighing the operational overhead of Kubernetes.)* + Files: infrastructure/terraform, infrastructure/k8s + +- **Prohibit MongoDB and mandate PostgreSQL for core pipelines:** The core pipeline must exclusively use PostgreSQL 16 with pgvector and Redis; the use of MongoDB is strictly prohibited. + *(Enforcing a specific database stack ensures architectural consistency, simplifies maintenance, and leverages existing infrastructure and expertise with PostgreSQL and pgvector.)* + Files: infrastructure/database, src/db/config.ts + +- **Standardize on PostgreSQL with pgvector for primary storage and vector search:** Use PostgreSQL with pgvector and HNSW indexes as the standard solution for primary datastore and vector search operations. + *(PostgreSQL with pgvector provides the ability to manage both SQL-based relational data and vector search capabilities within a single system, simplifying the architecture compared to managing separate databases.)* + +- **Use MongoDB Atlas for schemaless analytics webhook storage:** Use MongoDB Atlas specifically for the analytics event ingestion pipeline, while keeping all other core application data in PostgreSQL. 
+ *(MongoDB Atlas provides the necessary horizontal sharding and schemaless structure to handle the required 50k write operations per second, whereas PostgreSQL performance degrades under this load.)* + Files: services/analytics-webhook-handler, infrastructure/database-clusters + +- **Use MongoDB for Analytics Events Pipeline:** We will use MongoDB for the analytics events pipeline, provisioning a MongoDB Atlas cluster to handle the data. + *(MongoDB offers 10x the write throughput compared to PostgreSQL for high-cardinality event data, which is essential to meet the current scale requirements. The previous constraint was established before these new scale demands emerged.)* + Files: packages/api/src/analytics/ + +- **LLM Provider Strategy by Pipeline Step and Effort Mode:** Each LLM pipeline step (detection, extraction, formatting) has its own provider configuration managed via environment variables. An 'effort mode' concept allows overriding these configurations per company at request time, defining specific LLM models for different quality/cost tiers: Saver uses gemini-flash, Balanced mixes gemini-flash, claude-haiku, and gpt-4o-mini, Pro uses claude-sonnet for extraction, and Super uses claude-opus. + *(The strategy is designed to provide flexibility and optimization across different pipeline steps and 'effort modes'. By configuring providers per step and allowing overrides based on company effort modes, the system can balance cost, performance, and model quality according to specific requirements, from 'Saver' (likely cost-optimized) to 'Super' (likely highest quality/cost). The multi-provider abstraction facilitates this dynamic selection.)* + Files: packages/analyzer/ + +- **We use PostgreSQL with pgvector for all data storage:** After evaluating MongoDB, DynamoDB, and PostgreSQL, we chose PostgreSQL 16 with pgvector HNSW indexes. Reason: vector similarity search, ACID guarantees, and single DB for both structured data and embeddings. 
+ *(After evaluating MongoDB, DynamoDB, and PostgreSQL, we chose PostgreSQL 16 with pgvector HNSW indexes. Reason: vector similarity search, ACID guarantees, and single DB for both structured data and embeddings.)* + +- **Prohibition of MongoDB in the Tech Stack for Analytics Events:** MongoDB is strictly prohibited from being integrated into the current technology stack, including for analytics events. + *(There is an active and non-negotiable architectural constraint against MongoDB in the stack due to the critical requirement for ACID compliance across all billing and user data. MongoDB does not satisfy this fundamental requirement.)* + +- **Plan to Migrate Application Infrastructure from Railway to AWS ECS:** The trigger metric for initiating the AWS migration has been adjusted from 20 paying customers to 30 paying customers. The Q3 2026 timeline for the migration still holds. + *(This adjustment is due to Railway costs being more predictable than initially expected. Additionally, the VPC isolation requirement, which was a significant factor, only applies to enterprise customers, a segment we are targeting at a later stage.)* + +- **Defer Microservices Adoption, Maintain Monorepo Architecture:** Defer the adoption of a microservices architecture and continue with a monorepo architecture utilizing shared packages. The decision to revisit microservices will be made when the team size reaches 8 or more members. + *(An earlier attempt (Phase 1) to split the recorder and analyzer into separate gRPC services resulted in brutal deployment complexity for a 3-person team. This led to approximately 40% of the team's time being spent debugging inter-service authentication and network failures, making it unmanageable for the current team size.)* + +- **Standardize on PostgreSQL and Redis; Prohibit MongoDB:** MongoDB is strictly prohibited in this stack due to its lack of ACID compliance.
PostgreSQL will be used as the primary datastore for all persistent data, especially critical billing and user data. Redis will be used exclusively for caching purposes. + *(ACID compliance is a non-negotiable requirement for billing and user data to guarantee data integrity and consistency. PostgreSQL provides robust ACID transaction support. Adopting a standardized approach with PostgreSQL and Redis simplifies the technology stack and enforces critical data integrity requirements.)* + +- **Define LLM Model Combinations for Saver, Balanced, Pro, and Super Effort Modes:** The specific LLM model combinations for the multi-provider effort modes were finalized: Saver mode uses `gemini-flash` for detection, extraction, and format. Balanced mode uses `gemini-flash` for detection, `claude-haiku` for extraction, and `gpt-4o-mini` for format. Pro mode uses `gemini-flash` for detection, `claude-sonnet` for extraction, and `gpt-4o-mini` for format. Super mode uses `gemini-flash` for detection, `claude-opus` for extraction, and `claude-sonnet` for format. + *(The chosen LLM model combinations for each effort mode (Saver, Balanced, Pro, Super) were selected to provide different performance and cost profiles, aligning with the multi-provider strategy. Cost analysis confirmed that the proposed combinations, ranging from ~$0.08/1M tokens for Saver to ~$4.50/1M tokens for Super, ensure fine margins at current credit pricing.)* + +- **Implement Multi-Provider LLM Abstraction for Pipeline Steps with Per-Company Overrides:** We will implement a multi-provider abstraction where each pipeline step (detection, extraction, enrichment, formatting) has its own LLM provider configuration via environment variables. At request time, an 'effort mode' can override the provider selection on a per-company basis. 
+ *(This approach allows companies with high context volume (Tier 3+) to pay extra for Claude-Sonnet's accuracy where needed, while companies with tighter budgets can use more cost-effective options like Gemini-Flash for all steps. It also decouples our infrastructure from individual LLM vendor stability and enables independent contract negotiations with different providers (Anthropic, OpenAI, Google).)* + +## General Conventions + +- **Standardize on HNSW for new vector indexes:** All new vector indexes must be created using the HNSW algorithm. Existing IVFFlat indexes (specifically in the llm_cache table) are to be migrated to HNSW in Sprint 16. + *(HNSW is the current architectural standard for vector indexing. The previous rejection of the migration to HNSW was due to operational risks in production, not a lack of performance or technical suitability of HNSW.)* + +- **Establish authoritative RFC 7807 error format convention:** Adopt the HIGH severity specification as the authoritative version for the RFC 7807 error format, which includes fields: type, title, status, detail, and instance. + *(The team identified that two existing conventions were redundant. Designating the HIGH severity entry as canonical while allowing the fusion engine to merge duplicate references ensures consistency across documentation and API implementations.)* + +- **Use cosine distance for pgvector similarity searches:** We have standardized on cosine distance (using the <=> operator in pgvector) for all similarity search operations. + *(Cosine distance provides significantly better recall (12% improvement) on normalized text embeddings compared to L2 distance. Furthermore, L2 distance is overly sensitive to embedding magnitude, making it less reliable for our specific use case.)* + +- **Use MongoDB Atlas for analytics event ingestion:** MongoDB is strictly prohibited for use in core pipeline services (including the core decision pipeline, authentication, and the context store). 
These services must exclusively use PostgreSQL 16 and Redis. Any deviation requires a formal ADR. + *(To maintain architectural integrity and prevent fragmentation in the core tech stack. Previous attempts to introduce MongoDB for event queues nearly caused instability, highlighting the need for a hard, enforceable constraint.)* + +- **Abandoning EventStoreDB for monorepo event handling:** The team decided to discontinue the use of EventStoreDB and removed event sourcing as an architectural pattern following the migration back to a monorepo. + *(The complexity of maintaining three separate runbooks for EventStoreDB operations outweighed the benefits of its auditability features for the current team size and system scale.)* + +- **Enforce RFC 7807 for Internal API Error Formats:** All internal API routes must adhere to the RFC 7807 error format, consistent with public-facing API routes. + *(Inconsistent error formats, specifically plain strings from internal routes, prevent AI tools from reliably parsing and analyzing errors, leading to broken analysis workflows.)* + +- **Implementation details for text embeddings in PostgreSQL using OpenAI's text-embedding-3-small and HNSW indexing:** We will use the `text-embedding-3-small` OpenAI model to generate 1536-dimension embeddings. These embeddings will be stored in the `knowledge_chunks` table within PostgreSQL. The HNSW index used for vector search will be configured with `ef_construction=200` and `m=16`. + *(The chosen HNSW parameters (`ef_construction=200` and `m=16`) are set to provide an optimal tradeoff between recall accuracy and search speed. The `text-embedding-3-small` model is selected for generating the text embeddings.)* + +- **Use cosine distance over L2 for semantic text embedding similarity with pgvector HNSW:** We decided to use cosine distance for semantic similarity search of text embeddings with pgvector HNSW for deduplication. 
+ *(Cosine distance is invariant to vector magnitude, meaning it only considers the direction of vectors. This property is precisely what is desired for semantic similarity of text embeddings, as it allows for accurate comparison of semantic meaning regardless of variations in embedding vector norms. L2 (Euclidean) distance, on the other hand, would incorrectly penalize vectors with different magnitudes, even if they share the same semantic direction.)* + +- **Implement Redis Semantic Caching for LLM Embedding Calls:** Implemented Redis semantic caching for LLM embedding calls. The cache key is a hash of the input text, model, and provider. The cache entries have a Time-To-Live (TTL) of 1 hour. + *(Redis was a natural extension since it is already in use for BullMQ and session caching. This implementation reduced redundant embedding calls by approximately 40% in tests.)* + +- **Ownership of Billing Module:** The billing module, including Stripe integration, credit ledger, credit deduction logic, and Stripe webhook handlers, is owned by U05F9P78LTG. All changes to billing flows require their review. + *(This statement clarifies responsibility for the billing module and its components to ensure proper review and maintenance.)* diff --git a/.windsurfrules b/.windsurfrules new file mode 100644 index 0000000..74e4125 --- /dev/null +++ b/.windsurfrules @@ -0,0 +1,112 @@ +# Decispher: Active Engineering Rules +# Auto-generated — do not edit manually +# Source: Decispher decision store + +The following rules are active engineering decisions for this codebase. +Cascade MUST respect these rules when generating or modifying code. + +## [CRITICAL] + +- The core pipeline must exclusively use PostgreSQL 16 with pgvector and Redis; the use of MongoDB is strictly prohibited. + Reason: Enforcing a specific database stack ensures architectural consistency, simplifies maintenance, and leverages existing infrastructure and expertise with PostgreSQL and pgvector. 
+ Files: infrastructure/database, src/db/config.ts + +- After evaluating MongoDB, DynamoDB, and PostgreSQL, we chose PostgreSQL 16 with pgvector HNSW indexes. Reason: vector similarity search, ACID guarantees, and single DB for both structured data and embeddings. + Reason: After evaluating MongoDB, DynamoDB, and PostgreSQL, we chose PostgreSQL 16 with pgvector HNSW indexes. Reason: vector similarity search, ACID guarantees, and single DB for both structured data and embeddings. + +- MongoDB is strictly prohibited from being integrated into the current technology stack, including for analytics events. + Reason: There is an active and non-negotiable architectural constraint against MongoDB in the stack due to the critical requirement for ACID compliance across all billing and user data. MongoDB does not satisfy this fundamental requirement. + +- MongoDB is strictly prohibited in this stack due to its lack of ACID compliance. PostgreSQL will be used as the primary datastore for all persistent data, especially critical billing and user data. Redis will be used exclusively for caching purposes. + Reason: ACID compliance is a non-negotiable requirement for billing and user data to guarantee data integrity and consistency. PostgreSQL provides robust ACID transaction support. Adopting a standardized approach with PostgreSQL and Redis simplifies the technology stack and enforces critical data integrity requirements. + +## [HIGH] + +- Migrate all email services to Zoho and update the SMTP server infrastructure, including the implementation of new routing rules to block any traffic to the legacy SMTP server. + Reason: The team decided to move to Zoho to consolidate mailing services and address the limitations or overhead associated with the existing legacy SMTP infrastructure. + Files: infrastructure/mail, services/smtp, config/email_routing + +- Switch from the third-party Shipsy provider to an in-house developed mapping event system. 
+ Reason: The team identified that the Shipsy service was negatively impacting the platform's scalability, and moving to an internal solution reduces external dependencies. + Files: services/shipping-integration, infrastructure/event-bus + +- We have decided to officially discontinue the use of RFC 7807 (Problem Details for HTTP APIs) for all API error responses moving forward. + Reason: The team determined that the RFC 7807 specification is outdated and no longer aligns with the current requirements and standards of the API architecture. + Files: api/responses, api/error-handling + +- Sara is the primary owner of the billing module; all changes to the credit_ledger schema, DrizzleCreditRepository, and the EFFORT_MODE_CONFIGS require specific approvals from Sara and Ali. Furthermore, the system must strictly adhere to the append-only ledger constraint per ADR-019 and maintain SERIALIZABLE transaction requirements. + Reason: To ensure accountability and maintain architectural integrity of the financial ledger and billing configuration, specific code ownership and structural constraints have been formalized. + Files: packages/api/src/routes/credits.ts, packages/decision-store/src/repositories/credit-repository.ts, packages/common/src/types/credits.ts + +- Establish explicit provider fallback orderings: For extraction, use Anthropic → DeepSeek → OpenAI. For detection, use Google → OpenAI → DeepSeek. + Reason: To maintain system reliability and avoid task failure when individual LLM providers hit rate limits, a hierarchical fallback mechanism ensures work is diverted to alternative models before resorting to the Dead Letter Queue (DLQ) after retries. + Files: src/llm/client_factory.py, src/llm/fallback_logic.py + +- The team will migrate from AWS ECS to AWS EKS for container orchestration. 
+ Reason: EKS provides superior orchestration flexibility, including native Horizontal Pod Autoscaler and improved multi-AZ/multi-region failover capabilities, which are necessary for the current scale, outweighing the operational overhead of Kubernetes. + Files: infrastructure/terraform, infrastructure/k8s + +- Use PostgreSQL with pgvector and HNSW indexes as the standard solution for primary datastore and vector search operations. + Reason: PostgreSQL with pgvector provides the ability to manage both SQL-based relational data and vector search capabilities within a single system, simplifying the architecture compared to managing separate databases. + +- Use MongoDB Atlas specifically for the analytics event ingestion pipeline, while keeping all other core application data in PostgreSQL. + Reason: MongoDB Atlas provides the necessary horizontal sharding and schemaless structure to handle the required 50k write operations per second, whereas PostgreSQL performance degrades under this load. + Files: services/analytics-webhook-handler, infrastructure/database-clusters + +- We will use MongoDB for the analytics events pipeline, provisioning a MongoDB Atlas cluster to handle the data. + Reason: MongoDB offers 10x the write throughput compared to PostgreSQL for high-cardinality event data, which is essential to meet the current scale requirements. The previous constraint was established before these new scale demands emerged. + Files: packages/api/src/analytics/ + +- Each LLM pipeline step (detection, extraction, formatting) has its own provider configuration managed via environment variables. An 'effort mode' concept allows overriding these configurations per company at request time, defining specific LLM models for different quality/cost tiers: Saver uses gemini-flash, Balanced mixes gemini-flash, claude-haiku, and gpt-4o-mini, Pro uses claude-sonnet for extraction, and Super uses claude-opus. 
+ Reason: The strategy is designed to provide flexibility and optimization across different pipeline steps and 'effort modes'. By configuring providers per step and allowing overrides based on company effort modes, the system can balance cost, performance, and model quality according to specific requirements, from 'Saver' (likely cost-optimized) to 'Super' (likely highest quality/cost). The multi-provider abstraction facilitates this dynamic selection. + Files: packages/analyzer/ + +- The trigger metric for initiating the AWS migration has been adjusted from 20 paying customers to 30 paying customers. The Q3 2026 timeline for the migration still holds. + Reason: This adjustment is due to Railway costs being more predictable than initially expected. Additionally, the VPC isolation requirement, which was a significant factor, only applies to enterprise customers, a segment we are targeting at a later stage. + +- Defer the adoption of a microservices architecture and continue with a monorepo architecture utilizing shared packages. The decision to revisit microservices will be made when the team size reaches 8 or more members. + Reason: An earlier attempt (Phase 1) to split the recorder and analyzer into separate gRPC services resulted in brutal deployment complexity for a 3-person team. This led to approximately 40% of the team's time being spent debugging inter-service authentication and network failures, making it unmanageable for the current team size. + +- The specific LLM model combinations for the multi-provider effort modes were finalized: Saver mode uses `gemini-flash` for detection, extraction, and format. Balanced mode uses `gemini-flash` for detection, `claude-haiku` for extraction, and `gpt-4o-mini` for format. Pro mode uses `gemini-flash` for detection, `claude-sonnet` for extraction, and `gpt-4o-mini` for format. Super mode uses `gemini-flash` for detection, `claude-opus` for extraction, and `claude-sonnet` for format.
+ Reason: The chosen LLM model combinations for each effort mode (Saver, Balanced, Pro, Super) were selected to provide different performance and cost profiles, aligning with the multi-provider strategy. Cost analysis confirmed that the proposed combinations, ranging from ~$0.08/1M tokens for Saver to ~$4.50/1M tokens for Super, ensure fine margins at current credit pricing. + +- We will implement a multi-provider abstraction where each pipeline step (detection, extraction, enrichment, formatting) has its own LLM provider configuration via environment variables. At request time, an 'effort mode' can override the provider selection on a per-company basis. + Reason: This approach allows companies with high context volume (Tier 3+) to pay extra for Claude-Sonnet's accuracy where needed, while companies with tighter budgets can use more cost-effective options like Gemini-Flash for all steps. It also decouples our infrastructure from individual LLM vendor stability and enables independent contract negotiations with different providers (Anthropic, OpenAI, Google). + +## [MEDIUM] + +- All new vector indexes must be created using the HNSW algorithm. Existing IVFFlat indexes (specifically in the llm_cache table) are to be migrated to HNSW in Sprint 16. + Reason: HNSW is the current architectural standard for vector indexing. The previous rejection of the migration to HNSW was due to operational risks in production, not a lack of performance or technical suitability of HNSW. + Files: db/schema/vector_indexes, db/migrations/sprint_16/migrate_llm_cache_to_hnsw + +- Adopt the HIGH severity specification as the authoritative version for the RFC 7807 error format, which includes fields: type, title, status, detail, and instance. + Reason: The team identified that two existing conventions were redundant. Designating the HIGH severity entry as canonical while allowing the fusion engine to merge duplicate references ensures consistency across documentation and API implementations. 
+ Files: packages/api/src/plugins/error-handler.ts + +- We have standardized on cosine distance (using the <=> operator in pgvector) for all similarity search operations. + Reason: Cosine distance provides significantly better recall (12% improvement) on normalized text embeddings compared to L2 distance. Furthermore, L2 distance is overly sensitive to embedding magnitude, making it less reliable for our specific use case. + +- MongoDB is strictly prohibited for use in core pipeline services (including the core decision pipeline, authentication, and the context store). These services must exclusively use PostgreSQL 16 and Redis. Any deviation requires a formal ADR. + Reason: To maintain architectural integrity and prevent fragmentation in the core tech stack. Previous attempts to introduce MongoDB for event queues nearly caused instability, highlighting the need for a hard, enforceable constraint. + Files: analytics/storage, infrastructure/database-policy + +- The team decided to discontinue the use of EventStoreDB and removed event sourcing as an architectural pattern following the migration back to a monorepo. + Reason: The complexity of maintaining three separate runbooks for EventStoreDB operations outweighed the benefits of its auditability features for the current team size and system scale. + +- All internal API routes must adhere to the RFC 7807 error format, consistent with public-facing API routes. + Reason: Inconsistent error formats, specifically plain strings from internal routes, prevent AI tools from reliably parsing and analyzing errors, leading to broken analysis workflows. + Files: packages/api/src/routes/internal/ + +- We will use the `text-embedding-3-small` OpenAI model to generate 1536-dimension embeddings. These embeddings will be stored in the `knowledge_chunks` table within PostgreSQL. The HNSW index used for vector search will be configured with `ef_construction=200` and `m=16`. 
+ Reason: The chosen HNSW parameters (`ef_construction=200` and `m=16`) are set to provide an optimal tradeoff between recall accuracy and search speed. The `text-embedding-3-small` model is selected for generating the text embeddings. + Files: packages/decision-store/src/schema.ts + +- We decided to use cosine distance for semantic similarity search of text embeddings with pgvector HNSW for deduplication. + Reason: Cosine distance is invariant to vector magnitude, meaning it only considers the direction of vectors. This property is precisely what is desired for semantic similarity of text embeddings, as it allows for accurate comparison of semantic meaning regardless of variations in embedding vector norms. L2 (Euclidean) distance, on the other hand, would incorrectly penalize vectors with different magnitudes, even if they share the same semantic direction. + +- Implemented Redis semantic caching for LLM embedding calls. The cache key is a hash of the input text, model, and provider. The cache entries have a Time-To-Live (TTL) of 1 hour. + Reason: Redis was a natural extension since it is already in use for BullMQ and session caching. This implementation reduced redundant embedding calls by approximately 40% in tests. + +- The billing module, including Stripe integration, credit ledger, credit deduction logic, and Stripe webhook handlers, is owned by U05F9P78LTG. All changes to billing flows require their review. + Reason: This statement clarifies responsibility for the billing module and its components to ensure proper review and maintenance. + Files: packages/api/src/billing/ diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000..122e737 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,50 @@ +# AGENTS.md +# Auto-generated by Decispher — Do not edit manually + +## Instructions for AI Agents + +The following decisions are active engineering constraints. +Any code change that violates these decisions MUST be flagged. 
+ +### CRITICAL + +| Decision | Rationale | Files | +|----------|-----------|-------| +| The core pipeline must exclusively use PostgreSQL 16 with pgvector and Redis; the use of MongoDB is strictly prohibited. | Enforcing a specific database stack ensures architectural consistency, simplifies maintenance, and leverages existing infrastructure and expertise with PostgreSQL and pgvector. | infrastructure/database, src/db/config.ts | +| After evaluating MongoDB, DynamoDB, and PostgreSQL, we chose PostgreSQL 16 with pgvector HNSW indexes. Reason: vector similarity search, ACID guarantees, and single DB for both structured data and embeddings. | After evaluating MongoDB, DynamoDB, and PostgreSQL, we chose PostgreSQL 16 with pgvector HNSW indexes. Reason: vector similarity search, ACID guarantees, and single DB for both structured data and embeddings. | * | +| MongoDB is strictly prohibited from being integrated into the current technology stack, including for analytics events. | There is an active and non-negotiable architectural constraint against MongoDB in the stack due to the critical requirement for ACID compliance across all billing and user data. MongoDB does not satisfy this fundamental requirement. | * | +| MongoDB is strictly prohibited in this stack due to its lack of ACID compliance. PostgreSQL will be used as the primary datastore for all persistent data, especially critical billing and user data. Redis will be used exclusively for caching purposes. | ACID compliance is a non-negotiable requirement for billing and user data to guarantee data integrity and consistency. PostgreSQL provides robust ACID transaction support. Adopting a standardized approach with PostgreSQL and Redis simplifies the technology stack and enforces critical data integrity requirements. 
| * | + +### HIGH + +| Decision | Rationale | Files | +|----------|-----------|-------| +| Migrate all email services to Zoho and update the SMTP server infrastructure, including the implementation of new routing rules to block any traffic to the legacy SMTP server. | The team decided to move to Zoho to consolidate mailing services and address the limitations or overhead associated with the existing legacy SMTP infrastructure. | infrastructure/mail, services/smtp, config/email_routing | +| Switch from the third-party Shipsy provider to an in-house developed mapping event system. | The team identified that the Shipsy service was negatively impacting the platform's scalability, and moving to an internal solution reduces external dependencies. | services/shipping-integration, infrastructure/event-bus | +| We have decided to officially discontinue the use of RFC 7807 (Problem Details for HTTP APIs) for all API error responses moving forward. | The team determined that the RFC 7807 specification is outdated and no longer aligns with the current requirements and standards of the API architecture. | api/responses, api/error-handling | +| Sara is the primary owner of the billing module; all changes to the credit_ledger schema, DrizzleCreditRepository, and the EFFORT_MODE_CONFIGS require specific approvals from Sara and Ali. Furthermore, the system must strictly adhere to the append-only ledger constraint per ADR-019 and maintain SERIALIZABLE transaction requirements. | To ensure accountability and maintain architectural integrity of the financial ledger and billing configuration, specific code ownership and structural constraints have been formalized. | packages/api/src/routes/credits.ts, packages/decision-store/src/repositories/credit-repository.ts, packages/common/src/types/credits.ts | +| Establish explicit provider fallback orderings: For extraction, use Anthropic → DeepSeek → OpenAI. For detection, use Google → OpenAI → DeepSeek. 
| To maintain system reliability and avoid task failure when individual LLM providers hit rate limits, a hierarchical fallback mechanism ensures work is diverted to alternative models before resorting to the Dead Letter Queue (DLQ) after retries. | src/llm/client_factory.py, src/llm/fallback_logic.py | +| The team will migrate from AWS ECS to AWS EKS for container orchestration. | EKS provides superior orchestration flexibility, including native Horizontal Pod Autoscaler and improved multi-AZ/multi-region failover capabilities, which are necessary for the current scale, outweighing the operational overhead of Kubernetes. | infrastructure/terraform, infrastructure/k8s | +| Use PostgreSQL with pgvector and HNSW indexes as the standard solution for primary datastore and vector search operations. | PostgreSQL with pgvector provides the ability to manage both SQL-based relational data and vector search capabilities within a single system, simplifying the architecture compared to managing separate databases. | * | +| Use MongoDB Atlas specifically for the analytics event ingestion pipeline, while keeping all other core application data in PostgreSQL. | MongoDB Atlas provides the necessary horizontal sharding and schemaless structure to handle the required 50k write operations per second, whereas PostgreSQL performance degrades under this load. | services/analytics-webhook-handler, infrastructure/database-clusters | +| We will use MongoDB for the analytics events pipeline, provisioning a MongoDB Atlas cluster to handle the data. | MongoDB offers 10x the write throughput compared to PostgreSQL for high-cardinality event data, which is essential to meet the current scale requirements. The previous constraint was established before these new scale demands emerged. | packages/api/src/analytics/ | +| Each LLM pipeline step (detection, extraction, formatting) has its own provider configuration managed via environment variables. 
An 'effort mode' concept allows overriding these configurations per company at request time, defining specific LLM models for different quality/cost tiers: Saver uses gemini-flash, Balanced mixes gemini-flash, claude-haiku, and gpt-4o-mini, Pro uses claude-sonnet for extraction, and Super uses claude-opus. | The strategy is designed to provide flexibility and optimization across different pipeline steps and 'effort modes'. By configuring providers per step and allowing overrides based on company effort modes, the system can balance cost, performance, and model quality according to specific requirements, from 'Saver' (likely cost-optimized) to 'Super' (likely highest quality/cost). The multi-provider abstraction facilitates this dynamic selection. | packages/analyzer/ | +| The trigger metric for initiating the AWS migration has been adjusted from 20 paying customers to 30 paying customers. The Q3 2026 timeline for the migration still holds. | This adjustment is due to Railway costs being more predictable than initially expected. Additionally, the VPC isolation requirement, which was a significant factor, only applies to enterprise customers, a segment we are targeting at a later stage. | * | +| Defer the adoption of a microservices architecture and continue with a monorepo architecture utilizing shared packages. The decision to revisit microservices will be made when the team size reaches 8 or more members. | An earlier attempt (Phase 1) to split the recorder and analyzer into separate gRPC services resulted in brutal deployment complexity for a 3-person team. This led to approximately 40% of the team's time being spent debugging inter-service authentication and network failures, making it unmanageable for the current team size. | * | +| The specific LLM model combinations for the multi-provider effort modes were finalized: Saver mode uses `gemini-flash` for detection, extraction, and format.
Balanced mode uses `gemini-flash` for detection, `claude-haiku` for extraction, and `gpt-4o-mini` for format. Pro mode uses `gemini-flash` for detection, `claude-sonnet` for extraction, and `gpt-4o-mini` for format. Super mode uses `gemini-flash` for detection, `claude-opus` for extraction, and `claude-sonnet` for format. | The chosen LLM model combinations for each effort mode (Saver, Balanced, Pro, Super) were selected to provide different performance and cost profiles, aligning with the multi-provider strategy. Cost analysis confirmed that the proposed combinations, ranging from ~$0.08/1M tokens for Saver to ~$4.50/1M tokens for Super, keep margins acceptable at current credit pricing. | * | +| We will implement a multi-provider abstraction where each pipeline step (detection, extraction, enrichment, formatting) has its own LLM provider configuration via environment variables. At request time, an 'effort mode' can override the provider selection on a per-company basis. | This approach allows companies with high context volume (Tier 3+) to pay extra for Claude-Sonnet's accuracy where needed, while companies with tighter budgets can use more cost-effective options like Gemini-Flash for all steps. It also decouples our infrastructure from individual LLM vendor stability and enables independent contract negotiations with different providers (Anthropic, OpenAI, Google). | * | + +### MEDIUM + +| Decision | Rationale | Files | +|----------|-----------|-------| +| All new vector indexes must be created using the HNSW algorithm. Existing IVFFlat indexes (specifically in the llm_cache table) are to be migrated to HNSW in Sprint 16. | HNSW is the current architectural standard for vector indexing. The previous rejection of the migration to HNSW was due to operational risks in production, not a lack of performance or technical suitability of HNSW.
| db/schema/vector_indexes, db/migrations/sprint_16/migrate_llm_cache_to_hnsw | +| Adopt the HIGH severity specification as the authoritative version for the RFC 7807 error format, which includes fields: type, title, status, detail, and instance. | The team identified that two existing conventions were redundant. Designating the HIGH severity entry as canonical while allowing the fusion engine to merge duplicate references ensures consistency across documentation and API implementations. | packages/api/src/plugins/error-handler.ts | +| We have standardized on cosine distance (using the <=> operator in pgvector) for all similarity search operations. | Cosine distance provides significantly better recall (12% improvement) on normalized text embeddings compared to L2 distance. Furthermore, L2 distance is overly sensitive to embedding magnitude, making it less reliable for our specific use case. | * | +| MongoDB is strictly prohibited for use in core pipeline services (including the core decision pipeline, authentication, and the context store). These services must exclusively use PostgreSQL 16 and Redis. Any deviation requires a formal ADR. | To maintain architectural integrity and prevent fragmentation in the core tech stack. Previous attempts to introduce MongoDB for event queues nearly caused instability, highlighting the need for a hard, enforceable constraint. | analytics/storage, infrastructure/database-policy | +| The team decided to discontinue the use of EventStoreDB and removed event sourcing as an architectural pattern following the migration back to a monorepo. | The complexity of maintaining three separate runbooks for EventStoreDB operations outweighed the benefits of its auditability features for the current team size and system scale. | * | +| All internal API routes must adhere to the RFC 7807 error format, consistent with public-facing API routes. 
| Inconsistent error formats, specifically plain strings from internal routes, prevent AI tools from reliably parsing and analyzing errors, leading to broken analysis workflows. | packages/api/src/routes/internal/ | +| We will use the `text-embedding-3-small` OpenAI model to generate 1536-dimension embeddings. These embeddings will be stored in the `knowledge_chunks` table within PostgreSQL. The HNSW index used for vector search will be configured with `ef_construction=200` and `m=16`. | The chosen HNSW parameters (`ef_construction=200` and `m=16`) are set to provide an optimal tradeoff between recall accuracy and search speed. The `text-embedding-3-small` model is selected for generating the text embeddings. | packages/decision-store/src/schema.ts | +| We decided to use cosine distance for semantic similarity search of text embeddings with pgvector HNSW for deduplication. | Cosine distance is invariant to vector magnitude, meaning it only considers the direction of vectors. This property is precisely what is desired for semantic similarity of text embeddings, as it allows for accurate comparison of semantic meaning regardless of variations in embedding vector norms. L2 (Euclidean) distance, on the other hand, would incorrectly penalize vectors with different magnitudes, even if they share the same semantic direction. | * | +| Implemented Redis semantic caching for LLM embedding calls. The cache key is a hash of the input text, model, and provider. The cache entries have a Time-To-Live (TTL) of 1 hour. | Redis was a natural extension since it is already in use for BullMQ and session caching. This implementation reduced redundant embedding calls by approximately 40% in tests. | * | +| The billing module, including Stripe integration, credit ledger, credit deduction logic, and Stripe webhook handlers, is owned by U05F9P78LTG. All changes to billing flows require their review. 
| This statement clarifies responsibility for the billing module and its components to ensure proper review and maintenance. | packages/api/src/billing/ | diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..f03c2d2 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,156 @@ +# CLAUDE.md +# Auto-generated by Decispher — Do not edit manually + +## Project Context + +This project follows specific engineering decisions captured by the team. +Violating these decisions requires explicit approval. + +## Active Decisions + +### Migrate email service to Zoho and update SMTP infrastructure (HIGH) +- **Decision:** Migrate all email services to Zoho and update the SMTP server infrastructure, including the implementation of new routing rules to block any traffic to the legacy SMTP server. +- **Rationale:** The team decided to move to Zoho to consolidate mailing services and address the limitations or overhead associated with the existing legacy SMTP infrastructure. +- **Affected files:** `infrastructure/mail`, `services/smtp`, `config/email_routing` + +### Migrate from Shipsy to in-house mapping event system (HIGH) +- **Decision:** Switch from the third-party Shipsy provider to an in-house developed mapping event system. +- **Rationale:** The team identified that the Shipsy service was negatively impacting the platform's scalability, and moving to an internal solution reduces external dependencies. +- **Affected files:** `services/shipping-integration`, `infrastructure/event-bus` +- **Do NOT:** Continue using Shipsy (It acts as a bottleneck for system scalability.) + +### Abandon RFC 7807 for error responses (HIGH) +- **Decision:** We have decided to officially discontinue the use of RFC 7807 (Problem Details for HTTP APIs) for all API error responses moving forward. +- **Rationale:** The team determined that the RFC 7807 specification is outdated and no longer aligns with the current requirements and standards of the API architecture. 
+- **Affected files:** `api/responses`, `api/error-handling` + +### Standardize on HNSW for new vector indexes (MEDIUM) +- **Decision:** All new vector indexes must be created using the HNSW algorithm. Existing IVFFlat indexes (specifically in the llm_cache table) are to be migrated to HNSW in Sprint 16. +- **Rationale:** HNSW is the current architectural standard for vector indexing. The previous rejection of the migration to HNSW was due to operational risks in production, not a lack of performance or technical suitability of HNSW. +- **Affected files:** `db/schema/vector_indexes`, `db/migrations/sprint_16/migrate_llm_cache_to_hnsw` +- **Do NOT:** IVFFlat (The team has standardized on HNSW for new indexes to maintain architectural consistency, despite potential performance profiles for specific query patterns.) + +### Establish authoritative RFC 7807 error format convention (MEDIUM) +- **Decision:** Adopt the HIGH severity specification as the authoritative version for the RFC 7807 error format, which includes fields: type, title, status, detail, and instance. +- **Rationale:** The team identified that two existing conventions were redundant. Designating the HIGH severity entry as canonical while allowing the fusion engine to merge duplicate references ensures consistency across documentation and API implementations. +- **Affected files:** `packages/api/src/plugins/error-handler.ts` +- **Do NOT:** MEDIUM severity specification (The HIGH severity version was explicitly selected as the authoritative and canonical standard.) + +### Establish ownership and modification constraints for credits and billing system (HIGH) +- **Decision:** Sara is the primary owner of the billing module; all changes to the credit_ledger schema, DrizzleCreditRepository, and the EFFORT_MODE_CONFIGS require specific approvals from Sara and Ali. Furthermore, the system must strictly adhere to the append-only ledger constraint per ADR-019 and maintain SERIALIZABLE transaction requirements. 
+- **Rationale:** To ensure accountability and maintain architectural integrity of the financial ledger and billing configuration, specific code ownership and structural constraints have been formalized. +- **Affected files:** `packages/api/src/routes/credits.ts`, `packages/decision-store/src/repositories/credit-repository.ts`, `packages/common/src/types/credits.ts` + +### Define Model Fallback Ordering Strategy for API Rate Limits (HIGH) +- **Decision:** Establish explicit provider fallback orderings: For extraction, use Anthropic → DeepSeek → OpenAI. For detection, use Google → OpenAI → DeepSeek. +- **Rationale:** To maintain system reliability and avoid task failure when individual LLM providers hit rate limits, a hierarchical fallback mechanism ensures work is diverted to alternative models before resorting to the Dead Letter Queue (DLQ) after retries. +- **Affected files:** `src/llm/client_factory.py`, `src/llm/fallback_logic.py` + +### Migrate infrastructure orchestration from AWS ECS to AWS EKS (HIGH) +- **Decision:** The team will migrate from AWS ECS to AWS EKS for container orchestration. +- **Rationale:** EKS provides superior orchestration flexibility, including native Horizontal Pod Autoscaler and improved multi-AZ/multi-region failover capabilities, which are necessary for the current scale, outweighing the operational overhead of Kubernetes. +- **Affected files:** `infrastructure/terraform`, `infrastructure/k8s` +- **Do NOT:** AWS ECS (Lacks sufficient multi-region failover support and requires custom routing implementations.) +- **Do NOT:** Railway (Retained only as a temporary fallback, deemed insufficient for long-term production orchestration.) + +### Use cosine distance for pgvector similarity searches (MEDIUM) +- **Decision:** We have standardized on cosine distance (using the <=> operator in pgvector) for all similarity search operations. 
+- **Rationale:** Cosine distance provides significantly better recall (12% improvement) on normalized text embeddings compared to L2 distance. Furthermore, L2 distance is overly sensitive to embedding magnitude, making it less reliable for our specific use case. +- **Do NOT:** L2 distance (It is sensitive to embedding magnitude and demonstrated poorer recall compared to cosine distance for our data.) + +### Use MongoDB Atlas for analytics event ingestion (MEDIUM) +- **Decision:** MongoDB is strictly prohibited for use in core pipeline services (including the core decision pipeline, authentication, and the context store). These services must exclusively use PostgreSQL 16 and Redis. Any deviation requires a formal ADR. +- **Rationale:** To maintain architectural integrity and prevent fragmentation in the core tech stack. Previous attempts to introduce MongoDB for event queues nearly caused instability, highlighting the need for a hard, enforceable constraint. +- **Affected files:** `analytics/storage`, `infrastructure/database-policy` +- **Do NOT:** PostgreSQL partitioned tables (The team expressed concern that it may struggle with the required write throughput of 50k events per second.) + +### Prohibit MongoDB and mandate PostgreSQL for core pipelines (CRITICAL) +- **Decision:** The core pipeline must exclusively use PostgreSQL 16 with pgvector and Redis; the use of MongoDB is strictly prohibited. +- **Rationale:** Enforcing a specific database stack ensures architectural consistency, simplifies maintenance, and leverages existing infrastructure and expertise with PostgreSQL and pgvector. +- **Affected files:** `infrastructure/database`, `src/db/config.ts` +- **Do NOT:** MongoDB (Prohibited to maintain stack consistency and data integrity requirements.) 
+ +### Standardize on PostgreSQL with pgvector for primary storage and vector search (HIGH) +- **Decision:** Use PostgreSQL with pgvector and HNSW indexes as the standard solution for primary datastore and vector search operations. +- **Rationale:** PostgreSQL with pgvector provides the ability to manage both SQL-based relational data and vector search capabilities within a single system, simplifying the architecture compared to managing separate databases. +- **Do NOT:** MongoDB (The team preferred the relational capabilities of PostgreSQL and the unified support for vector search provided by pgvector.) +- **Do NOT:** CockroachDB (The team decided that PostgreSQL with pgvector was sufficient and preferred over the complexity or features offered by CockroachDB.) + +### Use MongoDB Atlas for schemaless analytics webhook storage (HIGH) +- **Decision:** Use MongoDB Atlas specifically for the analytics event ingestion pipeline, while keeping all other core application data in PostgreSQL. +- **Rationale:** MongoDB Atlas provides the necessary horizontal sharding and schemaless structure to handle the required 50k write operations per second, whereas PostgreSQL performance degrades under this load. +- **Affected files:** `services/analytics-webhook-handler`, `infrastructure/database-clusters` +- **Do NOT:** PostgreSQL JSONB (Proved too difficult and inefficient to index effectively for schemaless event data.) + +### Abandoning EventStoreDB for monorepo event handling (MEDIUM) +- **Decision:** The team decided to discontinue the use of EventStoreDB and removed event sourcing as an architectural pattern following the migration back to a monorepo. +- **Rationale:** The complexity of maintaining three separate runbooks for EventStoreDB operations outweighed the benefits of its auditability features for the current team size and system scale. 
+- **Do NOT:** Retaining EventStoreDB for event sourcing (The operational complexity was too high and not justified by the benefits gained.) + +### Enforce RFC 7807 for Internal API Error Formats (MEDIUM) +- **Decision:** All internal API routes must adhere to the RFC 7807 error format, consistent with public-facing API routes. +- **Rationale:** Inconsistent error formats, specifically plain strings from internal routes, prevent AI tools from reliably parsing and analyzing errors, leading to broken analysis workflows. +- **Affected files:** `packages/api/src/routes/internal/` + +### Use MongoDB for Analytics Events Pipeline (HIGH) +- **Decision:** We will use MongoDB for the analytics events pipeline, provisioning a MongoDB Atlas cluster to handle the data. +- **Rationale:** MongoDB offers 10x the write throughput compared to PostgreSQL for high-cardinality event data, which is essential to meet the current scale requirements. The previous constraint was established before these new scale demands emerged. +- **Affected files:** `packages/api/src/analytics/` +- **Do NOT:** PostgreSQL (PostgreSQL's write throughput is 10x lower than MongoDB for high-cardinality event data, making it unsuitable for the new scale requirements of the analytics pipeline.) + +### LLM Provider Strategy by Pipeline Step and Effort Mode (HIGH) +- **Decision:** Each LLM pipeline step (detection, extraction, formatting) has its own provider configuration managed via environment variables. An 'effort mode' concept allows overriding these configurations per company at request time, defining specific LLM models for different quality/cost tiers: Saver uses gemini-flash, Balanced mixes gemini-flash, claude-haiku, and gpt-4o-mini, Pro uses claude-sonnet for extraction, and Super uses claude-opus. +- **Rationale:** The strategy is designed to provide flexibility and optimization across different pipeline steps and 'effort modes'. 
By configuring providers per step and allowing overrides based on company effort modes, the system can balance cost, performance, and model quality according to specific requirements, from 'Saver' (likely cost-optimized) to 'Super' (likely highest quality/cost). The multi-provider abstraction facilitates this dynamic selection. +- **Affected files:** `packages/analyzer/` + +### Implementation details for text embeddings in PostgreSQL using OpenAI's text-embedding-3-small and HNSW indexing (MEDIUM) +- **Decision:** We will use the `text-embedding-3-small` OpenAI model to generate 1536-dimension embeddings. These embeddings will be stored in the `knowledge_chunks` table within PostgreSQL. The HNSW index used for vector search will be configured with `ef_construction=200` and `m=16`. +- **Rationale:** The chosen HNSW parameters (`ef_construction=200` and `m=16`) are set to provide an optimal tradeoff between recall accuracy and search speed. The `text-embedding-3-small` model is selected for generating the text embeddings. +- **Affected files:** `packages/decision-store/src/schema.ts` + +### We use PostgreSQL with pgvector for all data storage (CRITICAL) +- **Decision:** After evaluating MongoDB, DynamoDB, and PostgreSQL, we chose PostgreSQL 16 with pgvector HNSW indexes. +- **Rationale:** PostgreSQL provides vector similarity search, ACID guarantees, and a single database for both structured data and embeddings. + +### Use cosine distance over L2 for semantic text embedding similarity with pgvector HNSW (MEDIUM) +- **Decision:** We decided to use cosine distance for semantic similarity search of text embeddings with pgvector HNSW for deduplication. +- **Rationale:** Cosine distance is invariant to vector magnitude, meaning it only considers the direction of vectors.
This property is precisely what is desired for semantic similarity of text embeddings, as it allows for accurate comparison of semantic meaning regardless of variations in embedding vector norms. L2 (Euclidean) distance, on the other hand, would incorrectly penalize vectors with different magnitudes, even if they share the same semantic direction. +- **Do NOT:** L2 (Euclidean) distance (L2 distance penalizes vectors with different norms (magnitudes) even if they point in the same semantic direction, which is not suitable for accurately measuring semantic similarity of text embeddings.) + +### Implement Redis Semantic Caching for LLM Embedding Calls (MEDIUM) +- **Decision:** Implemented Redis semantic caching for LLM embedding calls. The cache key is a hash of the input text, model, and provider. The cache entries have a Time-To-Live (TTL) of 1 hour. +- **Rationale:** Redis was a natural extension since it is already in use for BullMQ and session caching. This implementation reduced redundant embedding calls by approximately 40% in tests. + +### Prohibition of MongoDB in the Tech Stack for Analytics Events (CRITICAL) +- **Decision:** MongoDB is strictly prohibited from being integrated into the current technology stack, including for analytics events. +- **Rationale:** There is an active and non-negotiable architectural constraint against MongoDB in the stack due to the critical requirement for ACID compliance across all billing and user data. MongoDB does not satisfy this fundamental requirement. +- **Do NOT:** MongoDB for analytics events (It violates an active architectural constraint due to its lack of native ACID compliance, which is non-negotiable for billing and user data within our stack.) + +### Plan to Migrate Application Infrastructure from Railway to AWS ECS (HIGH) +- **Decision:** The trigger metric for initiating the AWS migration has been adjusted from 20 paying customers to 30 paying customers. The Q3 2026 timeline for the migration still holds. 
+- **Rationale:** This adjustment is due to Railway costs being more predictable than initially expected. Additionally, the VPC isolation requirement, which was a significant factor, only applies to enterprise customers, a segment we are targeting at a later stage. + +### Defer Microservices Adoption, Maintain Monorepo Architecture (HIGH) +- **Decision:** To defer the adoption of a microservices architecture and continue with a monorepo architecture utilizing shared packages. The decision to revisit microservices will be made when the team size reaches 8 or more members. +- **Rationale:** An earlier attempt (Phase 1) to split the recorder and analyzer into separate gRPC services resulted in brutal deployment complexity for a 3-person team. This led to approximately 40% of the team's time being spent debugging inter-service authentication and network failures, making it unmanageable for the current team size. +- **Do NOT:** Adopt a microservices architecture by splitting recorder and analyzer into separate gRPC services. (The previous attempt in Phase 1 led to brutal deployment complexity for a 3-person team, consuming 40% of their time debugging inter-service authentication and network failures.) + +### Ownership of Billing Module (MEDIUM) +- **Decision:** The billing module, including Stripe integration, credit ledger, credit deduction logic, and Stripe webhook handlers, is owned by U05F9P78LTG. All changes to billing flows require their review. +- **Rationale:** This statement clarifies responsibility for the billing module and its components to ensure proper review and maintenance. +- **Affected files:** `packages/api/src/billing/` + +### Standardize on PostgreSQL and Redis; Prohibit MongoDB (CRITICAL) +- **Decision:** MongoDB is strictly prohibited in this stack due to its lack of ACID compliance. PostgreSQL will be used as the primary datastore for all persistent data, especially critical billing and user data. 
Redis will be used exclusively for caching purposes. +- **Rationale:** ACID compliance is a non-negotiable requirement for billing and user data to guarantee data integrity and consistency. PostgreSQL provides robust ACID transaction support. Adopting a standardized approach with PostgreSQL and Redis simplifies the technology stack and enforces critical data integrity requirements. +- **Do NOT:** MongoDB (MongoDB was rejected because it does not provide the necessary ACID compliance required for critical billing and user data, which is a non-negotiable architectural requirement for data integrity.) + +### Define LLM Model Combinations for Saver, Balanced, Pro, and Super Effort Modes (HIGH) +- **Decision:** The specific LLM model combinations for the multi-provider effort modes were finalized: Saver mode uses `gemini-flash` for detection, extraction, and format. Balanced mode uses `gemini-flash` for detection, `claude-haiku` for extraction, and `gpt-4o-mini` for format. Pro mode uses `gemini-flash` for detection, `claude-sonnet` for extraction, and `gpt-4o-mini` for format. Super mode uses `gemini-flash` for detection, `claude-opus` for extraction, and `claude-sonnet` for format. +- **Rationale:** The chosen LLM model combinations for each effort mode (Saver, Balanced, Pro, Super) were selected to provide different performance and cost profiles, aligning with the multi-provider strategy. Cost analysis confirmed that the proposed combinations, ranging from ~$0.08/1M tokens for Saver to ~$4.50/1M tokens for Super, keep margins acceptable at current credit pricing. + +### Implement Multi-Provider LLM Abstraction for Pipeline Steps with Per-Company Overrides (HIGH) +- **Decision:** We will implement a multi-provider abstraction where each pipeline step (detection, extraction, enrichment, formatting) has its own LLM provider configuration via environment variables. At request time, an 'effort mode' can override the provider selection on a per-company basis.
+- **Rationale:** This approach allows companies with high context volume (Tier 3+) to pay extra for Claude-Sonnet's accuracy where needed, while companies with tighter budgets can use more cost-effective options like Gemini-Flash for all steps. It also decouples our infrastructure from individual LLM vendor stability and enables independent contract negotiations with different providers (Anthropic, OpenAI, Google). +- **Do NOT:** Continue with current fragmented multi-provider setup (Gemini-Flash for detection, Claude-Sonnet for extraction, GPT-4o-mini for formatting). (This approach is unmaintainable, costly (Claude-Sonnet accounts for 60% of the LLM bill), and suffers from inconsistent provider availability issues.) +- **Do NOT:** Consolidate to a single LLM provider for all pipeline steps. (This would limit flexibility, potentially sacrificing accuracy for high-tier companies or forcing budget-conscious companies to pay for more expensive models than necessary. It would also lead to vendor lock-in and a single point of failure for LLM stability.)
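
The per-step configuration with effort-mode overrides described in the two decisions above could be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the names `PipelineStep`, `EFFORT_MODE_MODELS`, `resolveModel`, and the `LLM_MODEL_<STEP>` environment variable scheme are assumptions made for the example. The model names per mode are taken from the effort-mode decision; enrichment is not listed there, so the sketch assumes it always falls back to the per-step default.

```typescript
// Hypothetical names throughout: PipelineStep, EffortMode, EFFORT_MODE_MODELS,
// resolveModel, and the LLM_MODEL_<STEP> env variable scheme are illustrative only.
type PipelineStep = "detection" | "extraction" | "enrichment" | "formatting";
type EffortMode = "saver" | "balanced" | "pro" | "super";
type Env = Record<string, string | undefined>;

// Model combinations as listed in the effort-mode decision.
// Enrichment has no entry, so it falls through to the environment default.
const EFFORT_MODE_MODELS: Record<EffortMode, Partial<Record<PipelineStep, string>>> = {
  saver: { detection: "gemini-flash", extraction: "gemini-flash", formatting: "gemini-flash" },
  balanced: { detection: "gemini-flash", extraction: "claude-haiku", formatting: "gpt-4o-mini" },
  pro: { detection: "gemini-flash", extraction: "claude-sonnet", formatting: "gpt-4o-mini" },
  super: { detection: "gemini-flash", extraction: "claude-opus", formatting: "claude-sonnet" },
};

// Resolve the model for one pipeline step.
// A company's effort mode (if any) takes precedence; otherwise the per-step
// environment variable is used. Missing configuration is treated as an error.
function resolveModel(step: PipelineStep, env: Env, effortMode?: EffortMode): string {
  const override = effortMode ? EFFORT_MODE_MODELS[effortMode][step] : undefined;
  if (override !== undefined) {
    return override;
  }
  const key = `LLM_MODEL_${step.toUpperCase()}`;
  const fromEnv = env[key];
  if (fromEnv === undefined || fromEnv === "") {
    throw new Error(`No model configured for step "${step}" (expected ${key})`);
  }
  return fromEnv;
}
```

Passing the environment in as an argument, rather than reading a global inside the function, keeps the lookup easy to test. A caller would typically pass the process environment plus the effort mode stored for the requesting company.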