🚀 [v0.4.0-beta.0] Total Traceability & SSoT Hardening #12
Merged
…tructive database preparation
…ing (families/coefficients/labor-mix)
- database.py: UPSERT on append, audit logging, version/run_id propagation
- etl_pipeline.py: DELETE by period (not TRUNCATE), extract SINAPI version
- Remove embedded .git from AutoSINAPI/ (was preventing tracking)
- Update .gitignore to not ignore AutoSINAPI/ toolkit
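The difference between deleting by period and truncating can be sketched as follows. This is a minimal illustration only: it uses the standard-library sqlite3 module in place of PostgreSQL, and the table name precos, its columns, and the helper replace_period are hypothetical. Only the column name data_referencia appears in this PR's commits.

```python
import sqlite3


def replace_period(conn, period, rows):
    """Delete only the rows of one period, then load that period again."""
    with conn:  # delete + insert are committed together, or rolled back together
        conn.execute("DELETE FROM precos WHERE data_referencia = ?", (period,))
        conn.executemany(
            "INSERT INTO precos (codigo, data_referencia, valor) VALUES (?, ?, ?)",
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE precos (codigo TEXT, data_referencia TEXT, valor REAL)")
conn.executemany(
    "INSERT INTO precos VALUES (?, ?, ?)",
    [("A1", "2025-06", 10.0), ("A1", "2025-07", 11.0)],  # made-up sample rows
)
conn.commit()

# Reload 2025-07 only; a TRUNCATE would also have wiped 2025-06.
replace_period(conn, "2025-07", [("A1", "2025-07", 12.5)])
remaining = conn.execute(
    "SELECT data_referencia, valor FROM precos ORDER BY data_referencia"
).fetchall()
```

Scoping the delete to one period only works for tables that carry a period column, which is consistent with the later commit that reverts the structure tables to TRUNCATE because they have no data_referencia.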
New test files:
- test_migration.py: Validates Alembic 002 migration (traceability columns)
- test_traceability_db.py: UPSERT, audit log, version propagation
- test_traceability_etl.py: ETL traceability (DELETE by period, version extraction)
- test_traceability_api.py: API traceability (audit endpoint, schemas)
- test_sandbox_integration.py: E2E integration with mock SINAPI data
Updated test files:
- test_database.py: Added UPSERT behavior, traceability propagation tests
- test_pipeline.py: Added sinapi_versao extraction, DELETE by period tests
- test_file_input.py: (already existed, staged)
Features tested:
- Migration 002 creates traceability columns + audit log table
- _append_data() now does UPSERT (not just INSERT IGNORE)
- sinapi_versao and etl_run_id propagated through ETL
- DELETE by period replaces TRUNCATE for structure tables
- New /audit/{tipo}/{codigo} API endpoint
- TraceabilityMixin in Pydantic schemas
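The UPSERT behavior listed above can be sketched as follows. This is an illustration under stated assumptions, not the code in database.py: sqlite3 stands in for PostgreSQL (both accept the INSERT ... ON CONFLICT ... DO UPDATE form), and the table layout, the sample rows, and the helper append_data are hypothetical. Only the column names sinapi_versao and etl_run_id come from this PR.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE insumos (
        codigo TEXT PRIMARY KEY,
        descricao TEXT,
        sinapi_versao TEXT,
        etl_run_id TEXT
    )
    """
)

# On a key collision, overwrite the row instead of skipping it (INSERT IGNORE)
# or creating a duplicate (plain INSERT).
UPSERT_SQL = """
    INSERT INTO insumos (codigo, descricao, sinapi_versao, etl_run_id)
    VALUES (?, ?, ?, ?)
    ON CONFLICT (codigo) DO UPDATE SET
        descricao = excluded.descricao,
        sinapi_versao = excluded.sinapi_versao,
        etl_run_id = excluded.etl_run_id
"""


def append_data(conn, rows, sinapi_versao):
    """Upsert (codigo, descricao) pairs, stamping each row with version and run id."""
    run_id = str(uuid.uuid4())
    with conn:
        conn.executemany(
            UPSERT_SQL, [(codigo, desc, sinapi_versao, run_id) for codigo, desc in rows]
        )
    return run_id


first_run = append_data(conn, [("00001", "CIMENTO")], "2025-06")
second_run = append_data(conn, [("00001", "CIMENTO PORTLAND")], "2025-07")
stored = conn.execute(
    "SELECT descricao, sinapi_versao, etl_run_id FROM insumos"
).fetchall()
```

After the second load the table still holds a single row for the key, carrying the newer description and the run id of the load that last touched it.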
- Create api/sandbox_utils.py for sandbox mode (AUTOSINAPI_SANDBOX)
- Update config.py to support mode='sandbox' with sandbox_ table prefix
- Update database.py: always propagate sinapi_versao/etl_run_id
- Fix test_migration.py: correct assertions for Alembic 002 migration
- Fix test_database.py: use call.args[0] for TextClause content checks
- Fix test_pipeline.py: use _execute_phase_1_acquisition mock, accent namin
- Fix test_traceability_etl.py: same pattern fixes
- Fix test_traceability_api.py: use app.dependency_overrides for DB mock
- Skip integration tests requiring real PostgreSQL
- 50 total: 37 ETL + 13 API tests passing, 2 skipped
- etl_pipeline.py: Phase 0 checks config.DB_TABLE_INSUMOS (not hardcoded)
- config.py: add DB_TABLE_AUDIT_LOG for sandbox prefix support
- database.py: use config table names for audit log DDL and queries
- database.py: drop/create audit log with config table name
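One way to read the commit above is that every table name is resolved through configuration, so the sandbox_ prefix applies uniformly. The sketch below is an assumption about the shape of that logic, not the contents of config.py: the helpers table_name and build_table_config are hypothetical, as is the base name "insumos". The sandbox_ prefix, the mode value 'sandbox', the config keys, and the sinapi_audit_log table name are taken from this PR.

```python
SANDBOX_PREFIX = "sandbox_"


def table_name(base, mode):
    """Return the physical table name for a logical table in the given mode."""
    return f"{SANDBOX_PREFIX}{base}" if mode == "sandbox" else base


def build_table_config(mode):
    """Resolve all table names in one place so no caller hardcodes them."""
    return {
        "DB_TABLE_INSUMOS": table_name("insumos", mode),  # base name assumed
        "DB_TABLE_AUDIT_LOG": table_name("sinapi_audit_log", mode),
    }


sandbox_tables = build_table_config("sandbox")
default_tables = build_table_config("production")  # any non-sandbox mode
```

With names resolved this way, the Phase 0 check and the audit log DDL read the same mapping, so a sandbox run cannot touch an unprefixed table by accident.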
- Fix etl_pipeline.py: use config table names for Phase 0 check
- Fix database.py: use config DB_TABLE_AUDIT_LOG, add uuid import
- Fix database.py: change etl_run_id to VARCHAR(36) type
- Fix database.py: add DISTINCT ON dedup for UPSERT queries
- Fix etl_pipeline.py: add column existence checks in placeholder gen
- Fix etl_pipeline.py: revert structure tables to TRUNCATE (no data_referencia)
- Add run_sandbox.py for sandbox ETL execution
- Migration 002 applied to real DB
- Sandbox ETL populated 2025-07: 1,160,750 records
- Traceability fields verified: sinapi_versao, etl_run_id
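Two of the fixes above have simple explanations that a short sketch can make concrete. In PostgreSQL, an INSERT ... ON CONFLICT DO UPDATE statement fails if the same key appears twice in one batch, so the input must be deduplicated first. A canonical UUID string is 36 characters long, which matches VARCHAR(36). The code below is a pure-Python analogue of the dedup step, not the DISTINCT ON query from the PR. The helper dedup_by_key is hypothetical, and keeping the last occurrence is an assumption, since the PR does not state which row wins.

```python
import uuid


def dedup_by_key(rows, key_cols):
    """Keep one row per key (the last one seen), preserving first-seen key order."""
    latest = {}
    for row in rows:
        latest[tuple(row[col] for col in key_cols)] = row
    return list(latest.values())


batch = [
    {"codigo": "1", "valor": 1},
    {"codigo": "2", "valor": 2},
    {"codigo": "1", "valor": 3},  # same key as the first row
]
unique_rows = dedup_by_key(batch, ["codigo"])

# A run id in canonical text form: 32 hex digits plus 4 hyphens.
run_id = str(uuid.uuid4())
run_id_length = len(run_id)
```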
…integrity
- Refactor Trends endpoint to support agrupar_por (classificacao, grupo, item) and codigos filter.
- Update UI to allow switching trend dimensions and individual item analysis.
- Fix Processor to correctly extract Grupo column from 'Analítico' Excel sheet.
- Harden ETL pipeline to protect classifications during placeholder merging.
- Ensure etl_run_id and sinapi_versao are propagated to all 10 DB tables.
- Standardize metadata with UPPER(TRIM()) in trend analysis.
- Update documentation in READMEs and history records.
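The effect of standardizing metadata with UPPER(TRIM()) before grouping can be illustrated in Python. This is an analogue, not the SQL used by the Trends endpoint: the helpers normalize and count_by_group and the sample labels are hypothetical. Note that Python's str.strip() removes all surrounding whitespace, while SQL TRIM removes only spaces by default.

```python
from collections import Counter


def normalize(label):
    """Python counterpart of UPPER(TRIM(label))."""
    return label.strip().upper()


def count_by_group(labels):
    """Group labels after normalization so spelling variants collapse together."""
    return Counter(normalize(label) for label in labels)


raw_labels = [" Material", "MATERIAL ", "material", "Equipamento"]  # made-up labels
grouped = count_by_group(raw_labels)
```

Without the normalization step, the first three labels would form three separate groups and split one trend line into three.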
…traceability and data intelligence
🚀 [v0.4.0-beta.0] Total Traceability & SSoT Hardening
This pull request unifies two major development cycles of the AutoSINAPI Toolkit: the implementation of Data Traceability (Audit Logging & UPSERT) and SSoT Enrichment (Families, Labor Mix, and Smart Discovery).
✨ What's new?
1. Traceability & Reliability (feat/traceability)
- … sinapi_audit_log. Every pipeline run now generates a unique record (run_id) containing the SINAPI version, the number of records affected, and the tables updated.
- … created_at, updated_at, sinapi_versao, and etl_run_id were added natively to all tables in the data model.
- … native UPSERT (insert or update) based on dynamically defined primary-key columns, eliminating duplicates in partial loads.
- … (INSUMO_DESCONHECIDO_XXXX) to maintain referential integrity.
2. Data Enrichment (feat/etl-evolution-ssot-hardening)
- … (insumos_familias, coeficientes_familia_mensal) and "Labor" (composicoes_mix_mao_de_obra).
- … Downloader and PipelineETL can now intelligently locate SINAPI ZIP files that the user places manually in the root of the downloads/ folder, ignoring irrelevant PDF files.
- … percentual_mo (labor percentage).
🛠️ Fixes and Internal Improvements (Chore/Fix)
- … pytest validates the structure accurately without breaking on local dependencies. 100% pass rate (35/35 tests).
- … ValueError).
- … (test_migration.py), delegating the migrations to the API stack.
- … autoSINAPI_API.
✅ Validation Checklist
- … pytest)
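The manual-file discovery described under Data Enrichment can be sketched with pathlib. This is a minimal illustration, not the logic inside Downloader or PipelineETL: the helper find_sinapi_zips is hypothetical, and the only rule taken from the description is that ZIP files in the root of the downloads folder are picked up while PDFs are ignored.

```python
from pathlib import Path


def find_sinapi_zips(downloads_dir):
    """Return ZIP files in the root of downloads_dir, sorted by name.

    Subfolders are not searched, and files with any other extension
    (PDFs, for example) are skipped.
    """
    root = Path(downloads_dir)
    return sorted(
        path
        for path in root.iterdir()
        if path.is_file() and path.suffix.lower() == ".zip"
    )
```

Comparing the lowercased suffix means a file saved as .ZIP is also found. A real implementation would likely also check the file name against the expected SINAPI pattern, which is not specified here.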