Merged
77 changes: 76 additions & 1 deletion CHANGELOG.md
@@ -5,7 +5,82 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

-## [1.2.1] - unreleased
+## [2.0.0] - unreleased

### Security advisory

- **Be cautious when piping raw `_msg` into `email_body_html`.** The example
config switched to `body: "{{ _msg }}"` in v1.2.0 (#26 fix), and operators may
reasonably mirror that in `email_body_html`. The email notifier marks `body`
as `safe` (treated as pre-escaped HTML) before injecting it into the email
envelope, so a log line containing raw HTML or `<script>` tags would render
unescaped in the recipient's mail client. This is pre-existing behaviour from
v1.x, not a regression introduced in v2.0.0, but the surface is wider now that
the example actively uses `_msg`. If your VictoriaLogs ingests untrusted
content (web request bodies, user-controlled fields), wrap the offending
field with `| escape` or render via plain `body` (not `email_body_html`)
for email destinations until the email path is hardened in a follow-up.
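
To make the risk concrete, the sketch below uses Python's standard `html.escape` to show what escaping does to a hostile log line. It only illustrates the effect; the template engine's `escape` filter is a separate implementation and may emit slightly different entities. The sample log line and the helper name `escape_for_html_body` are invented for the example.

```python
import html


def escape_for_html_body(msg: str) -> str:
    """Escape a raw log message before embedding it in an HTML email body."""
    # html.escape rewrites &, < and >; with quote=True it also rewrites " and '.
    return html.escape(msg, quote=True)


# Hypothetical log line carrying markup from an untrusted request body.
raw = '<script>alert("x")</script> GET /login'
escaped = escape_for_html_body(raw)
print(escaped)
```

After escaping, the mail client displays the markup as text instead of interpreting it.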

### Breaking changes

- **`victorialogs` is now a map of named sources.** A single valerter instance can tail multiple VL backends and route alerts per source. The v1.x single-URL shape (`victorialogs.url: ...` at the top level) is rejected at load with an actionable migration error.

Migrate from:

```yaml
victorialogs:
  url: "http://victorialogs:9428"
  basic_auth:
    username: "u"
    password: "p"
```

To:

```yaml
victorialogs:
  default:
    url: "http://victorialogs:9428"
    basic_auth:
      username: "u"
      password: "p"
```

Then optionally target sources per rule via `vl_sources: [name, ...]`, or omit the field to fan out across every configured source. Credentials, TLS, and headers are per-source, self-contained in each `VlSourceConfig`.

- **Default throttle key is now `{rule}-{source}:global`** (was `{rule}:global` in v1.x). Multi-source deployments get isolated throttle buckets per source with no extra config. Users who want cross-source dedup must override `throttle.key` explicitly (e.g. `key: "{{ rule_name }}"`).

- **Source names are restricted to `^[a-zA-Z0-9_]+$`.** No dashes, colons, dots, or spaces allowed. Validated at load. The constraint avoids ambiguity in the default throttle key format above.

- **Notifier output formats extended with `vl_source`.** The Mattermost footer now reads `valerter | <rule> | <source> | <timestamp>` instead of `valerter | <rule> | <timestamp>`. The default webhook payload exposes `vl_source` as a top-level JSON field. Downstream parsers / dashboards that match exact strings in either output need to update.

- **All per-rule Prometheus metrics now also carry a `vl_source` label.** Affected counters: `valerter_alerts_sent_total`, `valerter_alerts_throttled_total`, `valerter_alerts_passed_total`, `valerter_alerts_failed_total`, `valerter_email_recipient_errors_total`, `valerter_lines_discarded_total`, `valerter_logs_matched_total`, `valerter_notify_errors_total`, `valerter_parse_errors_total`, `valerter_reconnections_total`, `valerter_rule_panics_total`, `valerter_rule_errors_total`. Affected gauge/histogram: `valerter_last_query_timestamp`, `valerter_query_duration_seconds`. Dashboards and alerts that grouped by `rule_name` alone keep working but get an extra `vl_source` dimension; PromQL using `sum by (rule_name) (...)` still rolls up correctly. `valerter_queue_size` stays unlabeled (the queue is shared, not per-source).

- **`valerter_victorialogs_up{rule_name}` removed and replaced by `valerter_vl_source_up{vl_source}`.** The new gauge is per-source (one value per configured source, regardless of how many rules tail it) since reachability is a property of the source, not the rule. Alerts and panels need to migrate from per-rule to per-source semantics. Examples:

```promql
# v1.x (per-rule): valerter_victorialogs_up{rule_name="nginx-5xx"} == 0
# v2.0.0 (per-source): valerter_vl_source_up{vl_source="prod"} == 0

# v1.x (any rule down): min(valerter_victorialogs_up) == 0
# v2.0.0 (any source): min(valerter_vl_source_up) == 0
```

The label key is now `vl_source` (not `rule_name`), and the cardinality drops from `|rules|` to `|sources|`.

- **`defaults.max_streams` cap introduced (default 50).** Total VictoriaLogs streams = sum of `(rule, source)` pairs spawned for enabled rules. Breaching the cap fails the config at load with both the actual count and the cap value. Configurable via `defaults.max_streams: <usize>`. Disabled rules do not contribute. Prevents accidental fan-out from DoSing a backend.

### Added

- **Multi-source VictoriaLogs support** (issue #34). The engine spawns one task per `(rule, source)` pair with per-source cancellation and reconnect isolation, so a single unhealthy source does not stop alerts on the others.
- **`{{ vl_source }}` template variable** available everywhere `{{ rule_name }}` is: layer 1 templates (`title`, `body`, `email_body_html`), `throttle.key`, and notifier-level layer 2 contexts (`subject_template`, `body_template`). Always non-empty, owned `String`, equal to the source name currently processing the event. Synthetic value wins over any event field literally named `vl_source` (matches the `rule_name` collision policy).
- **`AlertPayload.vl_source`** propagated end-to-end so notifiers can render the source name. See Breaking changes above for the related output format updates on Mattermost and webhook destinations.
- **`valerter_vl_source_up{vl_source}` per-source reachability gauge.** Initialized to 0 for every configured source at startup; engine flips to 1 on tail connect success and back to 0 on permanent failure or stream error. Replaces the v1.x per-rule `valerter_victorialogs_up`.
- **`±10%` uniform jitter on reconnect backoff** (per `(rule, source)` task). Sources behind a flapping load balancer no longer reconnect in lock-step, breaking the thundering-herd alignment over a few cycles. Hardcoded jitter range; not configurable in this release.
- **`tests/metrics_snapshot.rs` integration test.** Spins up a 2-source 1-rule engine, scrapes `/metrics`, and asserts the set of metric names + label keys (not values) against an inline expected string. Catches accidental relabel/rename in future PRs.
- **`examples/multi-source/config.yaml`** reference and top-level **[`MIGRATION.md`](MIGRATION.md)** for v1.x upgraders.
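
The reconnect jitter listed above can be sketched as follows. This is a minimal illustration in Python, not the engine's code (the engine is a Rust crate and pulls in `rand` for this); the function name `jittered_backoff` and the 10-second base delay are assumptions.

```python
import random
from typing import Optional


def jittered_backoff(base_seconds: float, jitter: float = 0.10,
                     rng: Optional[random.Random] = None) -> float:
    """Scale base_seconds by a uniform factor in [1 - jitter, 1 + jitter]."""
    rng = rng or random.Random()
    factor = rng.uniform(1.0 - jitter, 1.0 + jitter)
    return base_seconds * factor


# Tasks sharing the same 10 s base delay draw independent factors,
# so their reconnect attempts drift apart over successive cycles.
rng = random.Random(42)  # seeded only to make the demo reproducible
delays = [jittered_backoff(10.0, rng=rng) for _ in range(5)]
print([round(d, 2) for d in delays])
```

Each delay stays within 10% of the base, which is enough to desynchronize tasks without noticeably changing the average reconnect rate.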

## [1.2.1] - 2026-04-16

### Fixed
- **`{{ rule_name }}` available in top-level templates and throttle key** (issue #31). `rule_name` is now injected into the render context of `templates.<name>.title`, `body`, and `email_body_html`, and also into the `throttle.key` template, not just the notifier-level `subject_template` / `body_template`. Configs that referenced `{{ rule_name }}` in a top-level template previously rendered an empty string; they now render the rule name. If an event field happens to be literally named `rule_name`, the synthetic rule name wins, matching the collision policy of the existing notifier-level contexts.
3 changes: 2 additions & 1 deletion Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 4 additions & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "valerter"
-version = "1.2.1"
+version = "2.0.0"
edition = "2024"
description = "Real-time log alerting for VictoriaLogs"
license = "Apache-2.0"
@@ -31,6 +31,9 @@ minijinja = { version = "2.12", features = ["builtins", "json"] }
# Caching/Throttle
moka = { version = "0.12", features = ["sync"] }

# Random (jitter on reconnect backoff to break thundering-herd alignment).
rand = "0.8"

# Observability
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter", "json"] }
214 changes: 214 additions & 0 deletions MIGRATION.md
@@ -0,0 +1,214 @@
# Migration Guide

This guide covers upgrading from Valerter **v1.x** to **v2.0.0**. Follow it section by section. Every breaking change has a before / after snippet you can copy.

If you only need a one-line summary: **the `victorialogs` section is now a map of named sources, every per-rule Prometheus metric gained a `vl_source` label, and `valerter_victorialogs_up` was replaced by the per-source `valerter_vl_source_up`.**

## 1. Pre-Upgrade Checklist

Walk through this before you flip the binary. Each bullet is a concrete file or query you should touch.

### Configuration

- [ ] Open every `config.yaml`, `rules.d/*.yaml`, and `notifiers.d/*.yaml` you ship.
- [ ] Find the top-level `victorialogs:` block. If it has a direct `url:` key, you must rewrite it (see Section 2).
- [ ] Decide on your source name(s). Names must match `^[a-zA-Z0-9_]+$`. If you have only one backend today, pick `default`.
- [ ] Decide if any rule should pin to a subset of sources via `vl_sources: [name, ...]`. By default, rules fan out across every configured source.

### Prometheus Dashboards

- [ ] Search every Grafana dashboard JSON for `valerter_victorialogs_up`. Every match must move to `valerter_vl_source_up` (see Section 3).
- [ ] Search for PromQL queries that group by `rule_name` only (e.g. `sum by (rule_name) (valerter_alerts_sent_total)`). They keep working but now silently aggregate across sources. If that is not what you want, add `vl_source` to the `by` clause.
- [ ] Verify that panels showing one series per rule still read correctly when a rule fans out to multiple sources, since each source now adds its own series.
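
The dashboard search can be scripted. The sketch below walks a parsed dashboard JSON document and reports where the old metric name appears. The `panels` / `targets` / `expr` layout in the sample is a simplified, assumed dashboard shape; the walker itself does not depend on it and visits every string value.

```python
import json

OLD_METRIC = "valerter_victorialogs_up"


def find_metric_refs(node, metric=OLD_METRIC, path="$"):
    """Yield the JSON path of every string value that mentions `metric`."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_metric_refs(value, metric, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_metric_refs(value, metric, f"{path}[{index}]")
    elif isinstance(node, str) and metric in node:
        yield path


# Hypothetical, heavily trimmed dashboard export.
dashboard = json.loads("""
{"panels": [
  {"title": "VL up", "targets": [{"expr": "min(valerter_victorialogs_up) == 0"}]},
  {"title": "Alerts", "targets": [{"expr": "sum by (rule_name) (valerter_alerts_sent_total)"}]}
]}
""")
hits = list(find_metric_refs(dashboard))
print(hits)
```

Running the same walker with `metric="sum by (rule_name)"` is a quick way to list the queries that will now aggregate across sources.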

### Prometheus Alerts

- [ ] Find every `alert:` rule that referenced `valerter_victorialogs_up{rule_name=...}`. Migrate the label key from `rule_name` to `vl_source` (see Section 3 for examples).
- [ ] Re-evaluate cardinality. Every per-rule metric now has one series per `(rule, source)` pair, so alerts evaluated per series multiply by the number of sources each rule tails.

### Notifier Output

- [ ] If you parse the Mattermost footer string downstream, expect a 4-segment `valerter | <rule> | <source> | <timestamp>` instead of the 3-segment v1.x format (see Section 4).
- [ ] If you consume the default webhook payload, expect a new top-level `vl_source` field.

## 2. Config Migration

The `victorialogs` section is now a **map of named sources**. The v1.x single-URL shape is rejected at load with an actionable error.

### Before (v1.x)

```yaml
victorialogs:
  url: "http://victorialogs:9428"
  basic_auth:
    username: "u"
    password: "p"
```

### After (v2.0.0)

```yaml
victorialogs:
  default:
    url: "http://victorialogs:9428"
    basic_auth:
      username: "u"
      password: "p"
```

The minimum-effort migration is one new key (`default:`) and one extra indent level. Credentials, TLS, and headers move under each source, self-contained.
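
If you maintain many config files, the rewrite is mechanical enough to script. The following is a minimal sketch that operates on an already-parsed config (a plain `dict`); loading and dumping YAML is left to whichever library you use. The function name `migrate_victorialogs` is hypothetical, and the v1.x detection rule (a string `url` directly under `victorialogs`) is an assumption based on the shapes shown above.

```python
import copy


def migrate_victorialogs(config: dict, source_name: str = "default") -> dict:
    """Wrap a v1.x `victorialogs` block under a named source; leave v2 shapes alone."""
    migrated = copy.deepcopy(config)
    vl = migrated.get("victorialogs")
    # v1.x shape: `url` is a string sitting directly under `victorialogs`.
    if isinstance(vl, dict) and isinstance(vl.get("url"), str):
        migrated["victorialogs"] = {source_name: vl}
    return migrated


v1 = {
    "victorialogs": {
        "url": "http://victorialogs:9428",
        "basic_auth": {"username": "u", "password": "p"},
    }
}
v2 = migrate_victorialogs(v1)
print(v2)
```

The helper is idempotent: running it on an already-migrated config returns an equal config, so it is safe to apply across a mixed fleet.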

### Targeting sources per rule

Add `vl_sources: [name, ...]` to a rule to restrict it to a subset of sources. Omit the field to fan out across every configured source:

```yaml
rules:
  - name: "prod_only_alert"
    query: '...'
    vl_sources: [prod]   # only the `prod` source
    notify: { template: "...", destinations: ["..."] }

  - name: "all_envs_alert"
    query: '...'
    # no vl_sources → fans out across every source
    notify: { template: "...", destinations: ["..."] }
```
```

See [`examples/multi-source/`](examples/multi-source/) for a complete reference with prod + staging.

### Source name format

Source names must match `^[a-zA-Z0-9_]+$`. No dashes, dots, colons, or spaces. The constraint avoids ambiguity in the default throttle key format below. Validation runs at load time.
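
To check candidate names before you deploy, the pattern can be applied directly. This is a small pre-flight sketch, not Valerter's own validator; the helper name is hypothetical.

```python
import re

# Same character class as the documented pattern; fullmatch anchors both ends.
SOURCE_NAME = re.compile(r"[a-zA-Z0-9_]+")


def is_valid_source_name(name: str) -> bool:
    """Return True when `name` consists only of letters, digits, and underscores."""
    return SOURCE_NAME.fullmatch(name) is not None


for candidate in ["prod", "eu_west_1", "eu-west-1", "prod.eu", ""]:
    print(repr(candidate), is_valid_source_name(candidate))
```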

### Default throttle key change

The default throttle key changed from the literal string `<rule_name>:global` (v1.x) to `<rule_name>-<vl_source>:global` (v2.0.0). These angle-bracket placeholders are descriptive notation, not template syntax. Multi-source deployments get isolated throttle buckets per source automatically. If you want **cross-source dedup** (one bucket shared across sources for the same rule), set `throttle.key` explicitly:

```yaml
rules:
  - name: "shared_bucket_alert"
    throttle:
      key: "{{ rule_name }}"   # back to v1.x semantics
    # ...
```
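
One way to read the link between the source-name restriction and the key format: because a source name can never contain a dash, the last dash in the default key always separates rule from source, even when the rule name itself contains dashes. The sketch below illustrates that reading; both helpers are hypothetical and are not part of Valerter.

```python
SUFFIX = ":global"


def default_throttle_key(rule_name: str, vl_source: str) -> str:
    """Build a key in the documented v2.0.0 default format."""
    return f"{rule_name}-{vl_source}{SUFFIX}"


def split_default_key(key: str) -> tuple:
    """Recover (rule_name, vl_source), relying on sources never containing '-'."""
    if not key.endswith(SUFFIX):
        raise ValueError(f"not a default throttle key: {key!r}")
    prefix = key[: -len(SUFFIX)]
    rule_name, sep, vl_source = prefix.rpartition("-")
    if not sep:
        raise ValueError(f"missing rule/source separator: {key!r}")
    return rule_name, vl_source


key = default_throttle_key("nginx-5xx", "prod")
print(key, split_default_key(key))
```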

### `defaults.max_streams` cap

A new cap on total `(rule, source)` pairs spawned, default `50`. Disabled rules do not contribute. Breaching the cap fails the config at load with both the actual count and the cap value. Tune via `defaults.max_streams: <usize>` if you fan out many rules across many sources.
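
To estimate your stream count before upgrading, count the `(rule, source)` pairs yourself. The sketch below works on parsed rule dicts. The `enabled` key is an assumed name for however a rule is disabled in your config, and treating "breach" as strictly greater than the cap is also an assumption.

```python
def count_streams(rules: list, source_names: list) -> int:
    """Count (rule, source) pairs for enabled rules."""
    total = 0
    for rule in rules:
        if not rule.get("enabled", True):  # assumed key name
            continue
        # A rule without `vl_sources` fans out across every configured source.
        targets = rule["vl_sources"] if "vl_sources" in rule else source_names
        total += len(targets)
    return total


def check_stream_cap(rules: list, source_names: list, max_streams: int = 50) -> int:
    """Raise if the pair count exceeds the cap, reporting both numbers."""
    total = count_streams(rules, source_names)
    if total > max_streams:
        raise ValueError(f"{total} streams exceed max_streams={max_streams}")
    return total


sources = ["prod", "staging"]
rules = [
    {"name": "prod_only_alert", "vl_sources": ["prod"]},
    {"name": "all_envs_alert"},
    {"name": "old_alert", "enabled": False},
]
print(check_stream_cap(rules, sources))
```

With 40 fan-out rules, adding a second source doubles the count from 40 to 80, which is the kind of jump the cap is meant to surface at load time.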

## 3. Prometheus Migration

### Removed: `valerter_victorialogs_up{rule_name}`

This per-rule gauge was replaced by a per-source gauge. Reachability is a property of the **source**, not the rule (every rule that tails the same backend reports the same up/down state).

### Added: `valerter_vl_source_up{vl_source}`

One value per configured source, regardless of how many rules tail it. Initialized to 0 at startup; flipped to 1 on tail connect success and back to 0 on permanent failure or stream error. The label key is `vl_source` (not `rule_name`), and the cardinality drops from `|rules|` to `|sources|`.

#### PromQL migration examples

```promql
# v1.x (per-rule): valerter_victorialogs_up{rule_name="nginx-5xx"} == 0
# v2.0.0 (per-source): valerter_vl_source_up{vl_source="prod"} == 0

# v1.x (any rule down): min(valerter_victorialogs_up) == 0
# v2.0.0 (any source): min(valerter_vl_source_up) == 0
```

### `vl_source` label added to every per-rule metric

Affected counters: `valerter_alerts_sent_total`, `valerter_alerts_throttled_total`, `valerter_alerts_passed_total`, `valerter_alerts_failed_total`, `valerter_email_recipient_errors_total`, `valerter_lines_discarded_total`, `valerter_logs_matched_total`, `valerter_notify_errors_total`, `valerter_parse_errors_total`, `valerter_reconnections_total`, `valerter_rule_panics_total`, `valerter_rule_errors_total`.

Affected gauge / histogram: `valerter_last_query_timestamp`, `valerter_query_duration_seconds`.

Dashboards and alerts that grouped by `rule_name` alone keep working but now get an extra `vl_source` dimension. PromQL using `sum by (rule_name) (...)` still rolls up correctly across sources. `valerter_queue_size` stays unlabeled (the queue is shared, not per-source).
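
The roll-up behaviour can be illustrated with a toy aggregation. This sketch mimics what `sum by (...)` does over a handful of invented samples; it is not a PromQL engine, and the rule names and values are made up.

```python
from collections import defaultdict

# Invented samples of one counter: (label set, value).
samples = [
    ({"rule_name": "nginx-5xx", "vl_source": "prod"}, 7.0),
    ({"rule_name": "nginx-5xx", "vl_source": "staging"}, 3.0),
    ({"rule_name": "disk_full", "vl_source": "prod"}, 1.0),
]


def sum_by(samples: list, *labels: str) -> dict:
    """Sum values per distinct combination of the requested labels."""
    totals = defaultdict(float)
    for label_set, value in samples:
        key = tuple(label_set.get(label, "") for label in labels)
        totals[key] += value
    return dict(totals)


print(sum_by(samples, "rule_name"))               # sources merged per rule
print(sum_by(samples, "rule_name", "vl_source"))  # one entry per pair
```

Grouping by `rule_name` alone merges the two `nginx-5xx` series into one total, while adding `vl_source` keeps them apart.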

### Alert rule migration example

```yaml
# v1.x
- alert: ValerterVictoriaLogsDown
  expr: valerter_victorialogs_up == 0
  for: 5m

# v2.0.0
- alert: ValerterVictoriaLogsSourceDown
  expr: valerter_vl_source_up == 0
  for: 5m
  annotations:
    summary: "Source {{ $labels.vl_source }} unreachable"
```

## 4. Notifier Output Changes

### Mattermost footer

The footer now carries 4 segments instead of 3:

```
v1.x: valerter | <rule> | <timestamp>
v2.0.0: valerter | <rule> | <source> | <timestamp>
```

If you parse the footer string downstream, update the split logic.
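
A downstream parser that has to cope with both formats during a rolling upgrade might look like the sketch below. It assumes segments are separated by `|` exactly as shown above and that neither rule names nor timestamps contain that character; the sample timestamp is invented.

```python
from typing import Optional


def parse_footer(footer: str) -> dict:
    """Parse a 3-segment (v1.x) or 4-segment (v2.0.0) Mattermost footer."""
    parts = [part.strip() for part in footer.split("|")]
    source: Optional[str] = None
    if len(parts) == 4:
        _, rule, source, timestamp = parts
    elif len(parts) == 3:
        _, rule, timestamp = parts
    else:
        raise ValueError(f"unexpected footer format: {footer!r}")
    return {"rule": rule, "source": source, "timestamp": timestamp}


print(parse_footer("valerter | nginx-5xx | 2026-04-16T10:00:00Z"))
print(parse_footer("valerter | nginx-5xx | prod | 2026-04-16T10:00:00Z"))
```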

### Default webhook payload

The default webhook payload (used when `body_template` is omitted) gained a top-level `vl_source` field:

```json
{
  "alert_name": "<notifier_name>",
  "rule_name": "...",
  "vl_source": "prod",
  "title": "...",
  "body": "...",
  "timestamp": "<ISO8601>",
  "log_timestamp": "<ISO8601>",
  "log_timestamp_formatted": "DD/MM/YYYY HH:MM:SS TZ"
}
```
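
Webhook consumers that must accept payloads from both v1.x and v2.0.0 instances can read the new field defensively. This is a minimal sketch; the function name, the fallback value, and the sample payload values are assumptions, while the field names come from the payload above.

```python
import json


def alert_route(payload_json: str, fallback_source: str = "unknown") -> tuple:
    """Return (rule_name, vl_source), tolerating v1.x payloads without vl_source."""
    payload = json.loads(payload_json)
    return payload["rule_name"], payload.get("vl_source", fallback_source)


# Trimmed, invented payloads for illustration.
v1_payload = '{"alert_name": "ops-webhook", "rule_name": "error_logs", "title": "t", "body": "b"}'
v2_payload = '{"alert_name": "ops-webhook", "rule_name": "error_logs", "vl_source": "prod", "title": "t", "body": "b"}'

print(alert_route(v1_payload))
print(alert_route(v2_payload))
```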

### Templates

The `{{ vl_source }}` template variable is available everywhere `{{ rule_name }}` is: layer 1 templates (`title`, `body`, `email_body_html`), `throttle.key`, and notifier-level layer 2 contexts (`subject_template`, `body_template`). Always non-empty, equal to the source name currently processing the event.

If an event field is literally named `vl_source`, the synthetic value wins (matches the `rule_name` collision policy).
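
The collision policy amounts to writing the synthetic values into the render context last. The sketch below shows that ordering with plain dicts; it illustrates the documented behaviour and is not Valerter's implementation, and the function name is hypothetical.

```python
def build_render_context(event_fields: dict, rule_name: str, vl_source: str) -> dict:
    """Merge event fields with synthetic values; synthetic values win on collision."""
    context = dict(event_fields)
    # Assigned after the event fields so they override same-named keys.
    context["rule_name"] = rule_name
    context["vl_source"] = vl_source
    return context


event = {"_msg": "upstream timed out", "vl_source": "spoofed_by_log_line"}
context = build_render_context(event, rule_name="nginx-5xx", vl_source="prod")
print(context["vl_source"])
```

This ordering means a log line cannot redirect routing or throttling by carrying its own `vl_source` field.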

## 5. Rollback

If something goes wrong after the upgrade, you can roll back to **v1.2.1** with no state migration required. The Prometheus metric labels are additive at the storage layer, except for the removed `valerter_victorialogs_up` gauge (which simply stops being produced when v2.0.0 runs).

### Debian / Ubuntu

```bash
# Pin v1.2.1
curl -LO https://github.com/fxthiry/valerter/releases/download/v1.2.1/valerter_1.2.1_amd64.deb
sudo dpkg -i valerter_1.2.1_amd64.deb
sudo systemctl restart valerter
```

### Static binary

```bash
curl -LO https://github.com/fxthiry/valerter/releases/download/v1.2.1/valerter-linux-x86_64.tar.gz
tar -xzf valerter-linux-x86_64.tar.gz
./valerter --validate -c /etc/valerter/config.yaml
```

You will need to revert the v2.0.0 config rewrite (the v1.x binary will reject the map shape). Keep a `config.yaml.v1` backup before you upgrade.

### Notes

- No on-disk state to migrate: throttle buckets are in-memory only.
- Prometheus historical data written with the new `vl_source` label remains valid. Series produced by v1.x simply lack the label, which Prometheus treats the same as an empty label value.
- The v1.x `valerter_victorialogs_up` time series stops growing under v2.0.0 and resumes under v1.2.1.

## See Also

- [`CHANGELOG.md`](CHANGELOG.md): full v2.0.0 release notes
- [`examples/multi-source/`](examples/multi-source/): complete working multi-source reference
- [`docs/configuration.md`](docs/configuration.md): full configuration reference
- [`docs/metrics.md`](docs/metrics.md): Prometheus metric catalog
11 changes: 8 additions & 3 deletions README.md
@@ -49,6 +49,7 @@ See [Cisco Switches example](examples/cisco-switches/) for a complete implementa

## Features

- **One Valerter for every VictoriaLogs you run.** Tail prod, staging, per-region or per-tenant backends from a single instance; pin rules to a specific source or fan out across all of them, with isolated reconnects, per-source metrics, and a `vl_source` label everywhere
- **Multi-channel notifications** — Webhook (PagerDuty, Slack, Discord), Email SMTP, Mattermost, Telegram
- **Full log context** — Alerts include the actual log line and extracted fields
- **Intelligent throttling** — Avoid alert spam with per-key rate limiting
@@ -93,12 +94,13 @@ Example configuration:

```yaml
victorialogs:
-  url: "http://victorialogs:9428"
+  default:
+    url: "http://victorialogs:9428" # replace with your VictoriaLogs host

notifiers:
  mattermost-ops:
    type: mattermost
-    webhook_url: "https://mattermost.example.com/hooks/your-webhook-id"
+    webhook_url: "https://mattermost.example.com/hooks/your-webhook-id" # replace with your real webhook

defaults:
  throttle:
@@ -113,7 +115,7 @@ templates:

rules:
  - name: "error_logs"
-    query: '_msg:~"(error|failed|critical)"'
+    query: '_msg:~"(error|failed|critical)"' # adjust to match the events you care about
    parser:
      regex: '(?P<message>.*)'
    notify:
@@ -122,6 +124,8 @@ rules:
- "mattermost-ops"
```

> **Upgrading from v1.x?** The config schema and Prometheus metrics changed in v2.0.0. See [MIGRATION.md](MIGRATION.md) for the full guide.

## Documentation

- **[Getting Started](docs/getting-started.md)** — Installation and first setup
@@ -131,6 +135,7 @@ rules:
- **[Performance](docs/performance.md)** — Benchmarks and capacity planning
- **[Architecture](docs/architecture.md)** — How Valerter works
- **[Examples](examples/)** — Real-world configurations
- **[Multi-source example](examples/multi-source/)** — Tail several VictoriaLogs backends from one Valerter instance

## Contributing
