A lightweight, scalable data ingestion service built in Go with support for multiple data sources, privacy protection, and tiered processing configurations.
- Multi-Source Ingestion: Support for CSV, JSON, and PostgreSQL data sources
- Privacy Protection: Built-in data sanitization and encryption
- Tiered Processing: Configurable performance tiers (small, medium, large)
- Dual Storage: PostgreSQL for metadata and ClickHouse for analytics
- RESTful API: Simple HTTP endpoints for data ingestion
- Docker Support: Easy deployment with Docker Compose
- Graceful Shutdown: Proper signal handling and cleanup
- Auto-scaling Workers: Configurable worker pools for optimal performance
```
┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│  Data Sources   │─────▶│  Privacy Guard  │─────▶│    Transform    │
│  (CSV/JSON/DB)  │      │ (Sanitization)  │      │  (Processing)   │
└─────────────────┘      └─────────────────┘      └─────────────────┘
                                                           │
                                                           ▼
┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│   PostgreSQL    │◀─────│     Storage     │◀─────│    Ingestor     │
│   (Metadata)    │      │      Layer      │      │    (Workers)    │
└─────────────────┘      └─────────────────┘      └─────────────────┘
                                  │
                                  ▼
┌─────────────────┐      ┌─────────────────┐
│   ClickHouse    │◀─────│     Storage     │
│  (Analytics)    │      │      Layer      │
└─────────────────┘      └─────────────────┘
```
- Go 1.23.4 or later
- Docker and Docker Compose
- PostgreSQL 15+
- ClickHouse 23.3+
1. Clone the repository

   ```shell
   git clone https://github.com/lucent1/ingest-light.git
   cd ingest-light
   ```

2. Start the services

   ```shell
   docker-compose up -d
   ```

   This will start:

   - PostgreSQL on port 5432
   - ClickHouse on ports 8123 (HTTP) and 9000 (native)
   - Ingest service on port 8080

3. Verify the setup

   ```shell
   curl http://localhost:8080/status
   ```

To run the service locally without Docker:

1. Install dependencies

   ```shell
   go mod download
   ```

2. Set up databases

   - Start PostgreSQL and create a database named `ingestion`
   - Start ClickHouse

3. Configure environment variables

   ```shell
   export POSTGRES_URL="postgres://username:password@localhost:5432/ingestion?sslmode=disable"
   export CLICKHOUSE_URL="localhost:9000"
   export PRIVACY_ENCRYPTION_KEY="your-secure-encryption-key"
   export PRIVACY_SALT="your-secure-salt"
   ```

4. Run the service

   ```shell
   go run cmd/server/main.go
   ```
The service uses a YAML configuration file (config.yaml) to define processing tiers:
```yaml
tiers:
  small:
    workers: 4
    buffer_size: 100
    batch_size: 50
    timeout_ms: 500
    retention_days: 1
  medium:
    workers: 8
    buffer_size: 500
    batch_size: 100
    timeout_ms: 300
    retention_days: 7
  large:
    workers: 16
    buffer_size: 1000
    batch_size: 200
    timeout_ms: 200
    retention_days: 14
```

| Variable | Description | Default |
|---|---|---|
| `CONFIG_PATH` | Path to configuration file | `config.yaml` |
| `POSTGRES_URL` | PostgreSQL connection string | - |
| `CLICKHOUSE_URL` | ClickHouse connection string | - |
| `PRIVACY_ENCRYPTION_KEY` | Encryption key for privacy | - |
| `PRIVACY_SALT` | Salt for privacy functions | - |
Ingest data into the system.

Request Body:

```json
{
  "source": "csv_file",
  "tier": "medium",
  "payload": [
    {
      "id": 1,
      "name": "John Doe",
      "email": "john@example.com"
    }
  ]
}
```

Response:

```json
{
  "status": "processed",
  "processed_at": "2024-01-01T00:00:00Z",
  "tier": "medium",
  "records": 1
}
```

Check service health status.

Response:

```json
{
  "status": "ok",
  "time": "2024-01-01T00:00:00Z"
}
```

Get current tier configuration.

Response:

```json
{
  "workers": 8,
  "buffer_size": 500,
  "batch_size": 100,
  "timeout_ms": 300,
  "retention_days": 7
}
```

The service includes a privacy guard that:
- Sanitizes sensitive data before processing
- Encrypts personal information
- Provides configurable data retention policies
- Supports GDPR compliance requirements
Important: Change the default encryption key and salt in production!
Run the test suite:

```shell
go test ./...
```

Run specific test packages:

```shell
go test ./internal/privacy
go test ./internal/transform
```

The service provides built-in monitoring:
- Health check endpoint (`/status`)
- Tier configuration endpoint (`/tiers`)
- Structured logging for debugging
- Graceful shutdown handling
Choose the appropriate tier based on your workload:
- Small: Low-volume, real-time processing
- Medium: Balanced performance for most use cases
- Large: High-volume, batch processing
- PostgreSQL: Configure connection pooling and indexing
- ClickHouse: Optimize for analytical queries and compression
```
ingest/
├── cmd/server/       # Main application entry point
├── internal/         # Internal packages
│   ├── adapter/      # Data source adapters
│   ├── api/          # HTTP handlers and types
│   ├── cleaner/      # Data retention and cleanup
│   ├── config/       # Configuration management
│   ├── db/           # Database connections, schemas, and storage
│   ├── ingest/       # Core ingestion logic
│   ├── privacy/      # Privacy and security
│   └── transform/    # Data transformation
├── pkg/              # Public packages
├── examples/         # Usage examples
└── test/             # Test files
```
To add a new data source:

- Implement the `SourceAdapter` interface in `internal/adapter/`
- Add configuration for the new source
- Update the main application to use the new adapter
To add a new storage backend:

- Implement the storage interface in `internal/db/`
- Add connection logic in `internal/db/`
- Update the storage initialization