SeaFlows 🌊

SeaFlows is a high-performance, concurrent sFlow (v5) collector and exporter written in Go. Designed for speed, reliability, and a low resource footprint, it ingests network samples, aggregates flow data in memory, and persists it to disk through rrdcached, which batches writes for I/O-efficient time-series storage. It also includes an Exporter API that serves this data to your frontend dashboards.

🚀 Key Features

The Collector

  • High-Concurrency Ingestion: Uses a scalable goroutine worker pool and tuned OS network buffers to keep up with heavy UDP bursts and minimize dropped datagrams.
  • Smart Memory Broker: Aggregates flow records in memory over a configurable interval (e.g., 60 seconds) before flushing. It estimates the true traffic volume of each sample as SamplingRate * PacketSize, so the stored rates reflect actual traffic rather than only the sampled packets.
  • Low-Latency Storage (rrdcached): Communicates directly with the rrdcached daemon via Unix sockets, avoiding fork/exec overhead and leaving disk writes to the daemon, which batches them.
  • Auto-Provisioning: Dynamically creates hierarchical directories and RRD databases on the fly as new source/destination MAC address pairs are discovered.
  • Deep Packet Parsing: Safely decodes XDR-encoded sFlow datagrams, robustly handling IPv4, IPv6, VLAN tags (802.1Q), and preventing panics on malformed or truncated packets.
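
The in-memory aggregation described above can be sketched as follows. This is a minimal illustration, not the project's actual Broker: the FlowKey and Broker types and their methods are hypothetical names, and the only detail taken from the README is that each sample contributes SamplingRate * PacketSize bytes to its MAC-to-MAC flow.

```go
package main

import (
	"fmt"
	"sync"
)

// FlowKey identifies one MAC-to-MAC flow (hypothetical type for this sketch).
type FlowKey struct {
	SrcMAC, DstMAC string
}

// Broker accumulates estimated byte counts per flow between flushes.
type Broker struct {
	mu     sync.Mutex
	totals map[FlowKey]uint64
}

// NewBroker returns an empty broker ready to accept samples.
func NewBroker() *Broker {
	return &Broker{totals: make(map[FlowKey]uint64)}
}

// Add scales one sampled packet by its sampling rate and adds the
// estimate to the running total for the flow.
func (b *Broker) Add(key FlowKey, samplingRate, packetSize uint32) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.totals[key] += uint64(samplingRate) * uint64(packetSize)
}

// Flush returns the accumulated totals and starts a fresh interval.
func (b *Broker) Flush() map[FlowKey]uint64 {
	b.mu.Lock()
	defer b.mu.Unlock()
	out := b.totals
	b.totals = make(map[FlowKey]uint64)
	return out
}

func main() {
	b := NewBroker()
	key := FlowKey{SrcMAC: "001122334455", DstMAC: "aabbccddeeff"}
	b.Add(key, 1024, 1500) // one 1500-byte packet sampled at 1-in-1024
	b.Add(key, 1024, 64)   // one 64-byte packet sampled at 1-in-1024
	fmt.Println(b.Flush()[key]) // 1024*1500 + 1024*64 estimated bytes
}
```

A mutex-guarded map keeps the sketch short; a real collector under heavy load might prefer per-worker maps merged at flush time to reduce lock contention.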

The Exporter

  • RESTful API: A lightweight HTTP server built in Go that directly queries the RRD files and returns clean, structured JSON data for your frontend.
  • Time-Series Querying: Built-in support for different time schedules (daily, weekly, monthly, yearly) to fetch the appropriate resolution of data (using AVERAGE and MAX RRAs).
  • Clean Data Paths: Seamlessly maps the frontend requests to the file system using sanitized MAC addresses (e.g., 001122334455).
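
As an illustration of the MAC sanitization mentioned above, here is a minimal sketch. The function name sanitizeMAC, the accepted separators, and the choice to lowercase the result are assumptions made for this example; the README only shows the bare 12-digit form (e.g., 001122334455).

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// sanitizeMAC reduces a MAC address to 12 hex digits with no separators.
// Lowercasing the result is an assumption of this sketch.
func sanitizeMAC(s string) (string, error) {
	// Drop the separators used by the common notations
	// (00:11:22:33:44:55, 00-11-22-33-44-55, 0011.2233.4455).
	cleaned := strings.NewReplacer(":", "", "-", "", ".", "").Replace(strings.TrimSpace(s))
	cleaned = strings.ToLower(cleaned)
	if len(cleaned) != 12 {
		return "", fmt.Errorf("invalid MAC %q: want 12 hex digits", s)
	}
	// Anything that is not pure hex (including path separators) is rejected.
	if _, err := hex.DecodeString(cleaned); err != nil {
		return "", fmt.Errorf("invalid MAC %q: %w", s, err)
	}
	return cleaned, nil
}

func main() {
	for _, in := range []string{"00:11:22:33:44:55", "AABBCCDDEEFF", "../etc/passwd"} {
		out, err := sanitizeMAC(in)
		fmt.Println(out, err)
	}
}
```

Because only hex digits survive validation, a value built this way cannot carry path separators into a file system lookup.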

🏗 Architecture

SeaFlows is composed of two main operational blocks:

  1. The Collector (cmd/collector): Listens on UDP port 6343. It dispatches incoming datagrams to a pool of worker goroutines that decode them into normalized FlowRecord structs. The Broker Service aggregates the byte counts per MAC-to-MAC flow and flushes them to rrdcached every 60 seconds.
  2. The Exporter (cmd/exporter): An HTTP API service that reads the .rrd files generated by the collector. It handles time calculations and extracts the requested time-series data using the github.com/ziutek/rrd library, making it immediately available for visualization.
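
The collector's dispatch step can be pictured with the sketch below. It is not the project's code: runPool and checkVersion are hypothetical names, and checkVersion stands in for a full decoder by checking only that the datagram is not truncated and that its first 32-bit big-endian word is the sFlow version 5.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"runtime"
	"sync"
)

// checkVersion is a stand-in for a full sFlow decoder: it only verifies
// that the datagram is long enough and starts with version 5.
func checkVersion(datagram []byte) error {
	if len(datagram) < 4 {
		return errors.New("truncated datagram")
	}
	if v := binary.BigEndian.Uint32(datagram[:4]); v != 5 {
		return fmt.Errorf("unsupported sFlow version %d", v)
	}
	return nil
}

// runPool starts the given number of workers, lets them drain the channel,
// and reports how many datagrams decoded cleanly and how many were rejected.
func runPool(workers int, in <-chan []byte, decode func([]byte) error) (ok, bad int) {
	var (
		wg sync.WaitGroup
		mu sync.Mutex
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for d := range in {
				err := decode(d)
				mu.Lock()
				if err != nil {
					bad++
				} else {
					ok++
				}
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return ok, bad
}

func main() {
	in := make(chan []byte, 3)
	in <- []byte{0, 0, 0, 5} // minimal header carrying version 5
	in <- []byte{0, 0}       // truncated
	in <- []byte{0, 0, 0, 4} // wrong version
	close(in)

	ok, bad := runPool(runtime.NumCPU(), in, checkVersion)
	fmt.Printf("decoded=%d rejected=%d\n", ok, bad)
}
```

In a real collector the channel would be fed by the UDP read loop rather than closed after three test datagrams; returning an error instead of indexing blindly is what keeps malformed input from panicking a worker.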

Storage Strategy (RRD)

Data is stored using the ABSOLUTE Data Source type. RRDtool divides each ABSOLUTE value by the time elapsed since the previous update, so the aggregated byte counts sent by the Broker are stored as bytes per second.

  • DS: bytes4 (IPv4) and bytes6 (IPv6)
  • Step: 300 seconds (5 minutes)
  • Retention: Configured with AVERAGE and MAX RRAs to retain granular 5-minute data for 2 days, scaling up to daily summaries kept for over 2 years.
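
To make the ABSOLUTE arithmetic concrete, the sketch below computes the rate RRDtool would derive from one flushed byte count and formats the matching rrdcached UPDATE command. The function names, the example file path, and the bytes4-then-bytes6 value order are assumptions for illustration. The timestamp is an absolute Unix time because rrdcached does not accept the N shorthand over its socket protocol.

```go
package main

import "fmt"

// absoluteRate mirrors what RRDtool does for an ABSOLUTE data source:
// the submitted count is divided by the seconds since the last update.
func absoluteRate(byteCount uint64, elapsedSeconds float64) float64 {
	return float64(byteCount) / elapsedSeconds
}

// updateCommand formats one rrdcached UPDATE line. The value order
// (bytes4 first, then bytes6) is an assumption of this sketch.
func updateCommand(rrdPath string, unixTS int64, bytes4, bytes6 uint64) string {
	return fmt.Sprintf("UPDATE %s %d:%d:%d\n", rrdPath, unixTS, bytes4, bytes6)
}

func main() {
	// 45,000,000 bytes aggregated over a 60-second flush interval
	// correspond to 750,000 bytes/s, i.e. 6 Mbit/s.
	fmt.Println(absoluteRate(45_000_000, 60))

	// Hypothetical file path and timestamp, for illustration only.
	fmt.Print(updateCommand("001122334455/aabbccddeeff.rrd", 1700000000, 45_000_000, 1200))
}
```

The returned string would be written to the Unix socket configured for rrdcached; reading and checking the daemon's status reply is left out of the sketch.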

⚙️ Prerequisites

  • Go: Version 1.21 or higher.
  • OS: Linux (recommended for optimal Unix socket and UDP buffer performance).
  • RRDtool & RRDCached: Must be installed and running on the host system.
    # Example for Debian/Ubuntu
    sudo apt update
    sudo apt install rrdtool rrdcached librrd-dev

🛠 Installation & Setup

1. Configure RRDCached

Ensure rrdcached is configured to listen on a Unix socket and has access to your target data directory.

# Example start command
rrdcached -l unix:/var/run/rrdcached.sock -b /srv/rrd/flows -j /var/lib/rrdcached/journal -F -w 300 -z 300

2. Clone the Repository

git clone https://github.com/vajra77/SeaFlows.git
cd SeaFlows

3. Build the Binaries

SeaFlows provides two distinct binaries. You can build them both from the root directory:

go mod tidy
go build -o bin/seaflows-collector ./cmd/collector
go build -o bin/seaflows-exporter ./cmd/exporter

Alternatively, use the sample Makefile shipped with the code:

make collector
make exporter

4. Run the Services

Prepare the .env configuration file: Copy the .env.example file to the project bin directory (create the directory if it does not exist) and rename it to .env. Fill in the configuration variables to suit your environment. This is an example configuration:

# NETWORK
EXPORTER_ADDRESS="127.0.0.1:8080"
COLLECTOR_ADDRESS="192.168.201.193:6343"

# RRD
RRD_CACHE_SOCKET="/var/run/rrdcached.sock"
RRD_BASE_PATH="/srv/rrd"
RRD_GAMMA=1.0

# IX-F MAP
IXF_URL="https://my.namex.it/api/v4/member-export/ixf/1.0"
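
For readers wiring these variables into their own tooling, here is a minimal sketch of reading them in Go. It is not how SeaFlows itself loads its configuration: the Config struct, the loadConfig function, and the decision to treat every variable as required are assumptions, and the sketch expects the variables to already be present in the process environment rather than parsing the .env file.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config mirrors the variables from the example .env (hypothetical struct).
type Config struct {
	ExporterAddress  string
	CollectorAddress string
	RRDCacheSocket   string
	RRDBasePath      string
	RRDGamma         float64 // meaning not described here; parsed as a float because the sample value is 1.0
	IXFURL           string
}

// loadConfig reads every variable through lookup, which has the same
// signature as os.LookupEnv. Treating all of them as required is an
// assumption of this sketch.
func loadConfig(lookup func(string) (string, bool)) (Config, error) {
	var cfg Config
	fields := []struct {
		name string
		dst  *string
	}{
		{"EXPORTER_ADDRESS", &cfg.ExporterAddress},
		{"COLLECTOR_ADDRESS", &cfg.CollectorAddress},
		{"RRD_CACHE_SOCKET", &cfg.RRDCacheSocket},
		{"RRD_BASE_PATH", &cfg.RRDBasePath},
		{"IXF_URL", &cfg.IXFURL},
	}
	for _, f := range fields {
		v, ok := lookup(f.name)
		if !ok || v == "" {
			return Config{}, fmt.Errorf("missing variable %s", f.name)
		}
		*f.dst = v
	}
	raw, ok := lookup("RRD_GAMMA")
	if !ok {
		return Config{}, fmt.Errorf("missing variable RRD_GAMMA")
	}
	gamma, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return Config{}, fmt.Errorf("invalid RRD_GAMMA %q: %w", raw, err)
	}
	cfg.RRDGamma = gamma
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.LookupEnv)
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function as a parameter keeps the loader testable without touching the real environment.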

Start the Collector: Ensure the user has write permissions to the /srv/rrd/flows directory and the rrdcached.sock.

./seaflows-collector

Start the Exporter API:

./seaflows-exporter

(Note: For production, it is highly recommended to run both SeaFlows components as systemd services. Configuration examples are available in the assets directory.)


📡 Exporter API Usage

Once the Exporter is running, you can query flow data using standard HTTP GET requests.

Example Request:

curl "http://localhost:8080/api/flow?src=001122334455&dst=AABBCCDDEEFF&schedule=daily&proto=4"

Supported Schedules:

  • daily (Last 24 hours)
  • weekly (Last 7 days)
  • monthly (Last 30 days)
  • yearly (Last 12 months)
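
The same request can be built programmatically. The sketch below only assembles and validates the URL; the flowURL helper is a hypothetical name, the proto value is passed through unchecked, and the shape of the JSON response is not assumed here.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// supportedSchedules lists the schedule names documented above.
var supportedSchedules = map[string]bool{
	"daily": true, "weekly": true, "monthly": true, "yearly": true,
}

// flowURL builds a query URL for the /api/flow endpoint.
func flowURL(base, src, dst, schedule string, proto int) (string, error) {
	if !supportedSchedules[schedule] {
		return "", fmt.Errorf("unsupported schedule %q", schedule)
	}
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/api/flow"
	q := url.Values{}
	q.Set("src", src)
	q.Set("dst", dst)
	q.Set("schedule", schedule)
	q.Set("proto", strconv.Itoa(proto))
	u.RawQuery = q.Encode() // Encode sorts the parameters by key
	return u.String(), nil
}

func main() {
	u, err := flowURL("http://localhost:8080", "001122334455", "AABBCCDDEEFF", "daily", 4)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u)
}
```

The resulting string can be passed to http.Get; parameter order differs from the curl example because url.Values sorts keys, which does not change the meaning of the request.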

📂 Project Structure

SeaFlows/
├── assets/               # Various helper files
├── cmd/
│   ├── collector/        # Main entry point for the UDP Collector
│   └── exporter/         # HTTP API for querying RRD data
├── internal/
│   ├── handlers/         # UDP Socket listener and worker pool logic, API handlers
│   ├── helpers/          # Support for parsing IX-F JSON files (to map MAC addresses)
│   ├── middleware/       # API Authentication support
│   ├── models/           # Data structures (Datagram, Sample) & Interfaces
│   └── services/         # Business logic (Broker aggregation, RRD storage)
├── go.mod
└── README.md

📊 Tuning for Production

  • CPU / Workers: SeaFlows automatically scales its worker pool based on runtime.NumCPU(). For heavy loads (e.g., 10+ switches), assign at least 2-4 vCPUs to your VM.
  • Memory: The application is highly memory-efficient. A footprint of 1GB-2GB RAM is recommended to allow the Linux kernel to comfortably cache RRD files.
  • UDP Buffers: SeaFlows requests a 16MB read buffer from the OS. To fully utilize this, you may need to increase your sysctl limits:
    sudo sysctl -w net.core.rmem_max=16777216
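
For context on why the sysctl limit matters, this is roughly how a Go program requests a large receive buffer. The sketch is an illustration rather than the collector's actual code: listenUDP and readBufferBytes are hypothetical names. On Linux, a request above net.core.rmem_max is typically capped by the kernel without an error, which is why raising the limit is needed for the full 16MB to take effect.

```go
package main

import (
	"fmt"
	"net"
)

// readBufferBytes is 16 MiB, the same value as the sysctl setting above.
const readBufferBytes = 16 << 20

// listenUDP opens a UDP socket and asks the OS for a large receive buffer.
func listenUDP(addr string) (*net.UDPConn, error) {
	udpAddr, err := net.ResolveUDPAddr("udp", addr)
	if err != nil {
		return nil, err
	}
	conn, err := net.ListenUDP("udp", udpAddr)
	if err != nil {
		return nil, err
	}
	// The kernel may grant less than requested if its limit is lower.
	if err := conn.SetReadBuffer(readBufferBytes); err != nil {
		conn.Close()
		return nil, err
	}
	return conn, nil
}

func main() {
	// Port 0 lets the OS pick a free port for this demonstration;
	// a collector would bind its configured address instead.
	conn, err := listenUDP("127.0.0.1:0")
	if err != nil {
		fmt.Println("listen error:", err)
		return
	}
	defer conn.Close()
	fmt.Println("listening on", conn.LocalAddr(), "requested buffer:", readBufferBytes)
}
```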

📝 License

This project is licensed under the GNU GPL-3.0 License - see the LICENSE file for details.
