10 changes: 10 additions & 0 deletions .github/workflows/ci.yml
@@ -120,6 +120,14 @@ jobs:
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install Python dependencies
        run: pip install -r requirements_dev.txt --quiet

      - name: Set up pnpm
        uses: pnpm/action-setup@v4

@@ -134,6 +142,8 @@ jobs:

      - name: Run unit tests
        run: pnpm exec vitest run --project unit
        env:
          PYTHON_BIN: python3

  test-storybook:
    name: Test / Storybook
104 changes: 104 additions & 0 deletions README.md
@@ -34,6 +34,7 @@
- [How datapoints appear](#how-datapoints-appear)
- [Cards in practice](#cards-in-practice)
- [History chart and page features](#history-chart-and-page-features)
- [Trend analysis](#trend-analysis)
- [Anomaly detection](#anomaly-detection)
- [Using automations to create useful analytical datapoints](#using-automations-to-create-useful-analytical-datapoints)
- [WebSocket API](#websocket-api)
@@ -472,6 +473,109 @@ The chart `+` action can create a datapoint at the inspected time. The dialog ca

---

## Trend analysis

Trend analysis overlays a computed curve on top of the raw sensor data in the chart. Each method answers a different question about your data, so choosing the right one depends on what you are investigating.

Enable trend lines from the analysis panel for each target row. The trend window selector controls how much history the smoothing methods use to compute each point.

### Linear trend

A straight line fitted to all visible points using least-squares regression.

**What it shows:** The overall direction of the data — whether the value is rising, falling, or flat across the whole window.

**Use it when:**

- You want to confirm a slow long-term drift (e.g. sensor calibration drift, gradual battery discharge)
- You are comparing the slope between two time windows to detect a change in behavior
- The data is noisy but you only care about the broad direction, not local variation

**Avoid it when:** The signal is clearly non-linear (curved, periodic, or mean-reverting). A straight line will misrepresent those patterns.
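
As a rough illustration of what a least-squares line reports, the sketch below fits a slope and intercept to `(time_ms, value)` pairs and converts the slope to units per hour. The helper name `fit_line` and the sample readings are hypothetical; this is not necessarily how the integration computes its trend.

```python
def fit_line(points):
    """Least-squares line through (time_ms, value) pairs.

    Returns (slope_per_ms, intercept), where the intercept is the fitted
    value at the first timestamp.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    origin = points[0][0]  # shift times so the sums stay small
    xs = [t - origin for t, _ in points]
    ys = [v for _, v in points]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all timestamps are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical battery readings: one sample per hour, dropping 0.5 % each hour.
HOUR_MS = 3_600_000
readings = [(i * HOUR_MS, 100.0 - 0.5 * i) for i in range(6)]
slope_per_ms, start_level = fit_line(readings)
slope_per_hour = slope_per_ms * HOUR_MS
```

Comparing `slope_per_hour` between two time windows is one way to check whether a drift has sped up or slowed down.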

### Rolling average

A sliding-window mean that replaces each point with the average of all points within the preceding time window.

**What it shows:** The local level of the data, smoothed to remove high-frequency noise. The resulting curve lags the true signal — the tighter the window, the less lag; the wider the window, the smoother the result.

**Use it when:**

- You want to see the general level of a noisy sensor (temperature, humidity, energy)
- You are trying to compare two smoothed series to spot divergence
- The window length roughly matches the natural timescale of the change you are investigating (e.g. a 1h window for heating dynamics, a 24h window for daily patterns)

**Avoid it when:** You need responsiveness to recent changes. Because the window weights all points equally, a sharp step up will only be fully reflected in the average after the window has fully moved past the step.
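
The lag described above can be seen in a minimal sketch of a trailing time-window mean. The function name `rolling_average`, the assumption that `window_ms` is positive, and the step data are all illustrative; the integration's own rolling average may differ in detail.

```python
def rolling_average(points, window_ms):
    """Trailing mean over (t - window_ms, t] for each (time_ms, value) pair.

    Assumes points are sorted by time and window_ms is positive.
    """
    result = []
    start = 0       # index of the oldest point still inside the window
    running = 0.0   # sum of the values currently inside the window
    for i, (t, v) in enumerate(points):
        running += v
        # Drop points that have fallen out of the trailing window.
        while points[start][0] <= t - window_ms:
            running -= points[start][1]
            start += 1
        result.append((t, running / (i - start + 1)))
    return result


MINUTE_MS = 60_000
# Hypothetical step: the value jumps from 0 to 10 at minute 5, one sample a minute.
step = [(i * MINUTE_MS, 0.0 if i < 5 else 10.0) for i in range(10)]
smoothed = rolling_average(step, 3 * MINUTE_MS)
```

With a three-minute window the average starts moving at the step but only reaches the new level once every sample in the window comes from after it.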

### Exponential moving average (EMA)

A weighted average where recent points contribute more than older ones. The `alpha` parameter controls responsiveness: values near 1 track the signal closely with little smoothing; values near 0 produce a heavily smoothed curve that responds slowly.

The window selector maps to alpha values tuned for typical HA data cadences:
`30m → 0.97`, `1h → 0.92`, `2h → 0.88`, `3h → 0.84`, `6h → 0.75`, `24h → 0.50`, `7d → 0.25`, `14d → 0.15`, `21d → 0.10`, `28d → 0.07`.

**What it shows:** The local level of the data, like rolling average, but with less lag. A step change begins appearing in the EMA immediately and closes the remaining gap by a factor of `alpha` with each new sample; a rolling average of equivalent width also starts to move at once, but does not fully reflect the step until the window has moved past the old values.

**Use it when:**

- You want smoothing similar to rolling average but with faster response to real changes
- You are investigating whether a recent change represents a new pattern or a transient spike
- The data has an irregular update cadence (EMA is computed point-to-point, so it does not require evenly spaced samples; with a fixed `alpha`, the effective smoothing time then varies with the gaps between samples)

**Avoid it when:** You need a precise, interpretable window like "the average over the last hour". EMA is recursive and has no hard time boundary, so its output at any point blends all past data with exponentially decaying weight.
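
To make the effect of `alpha` concrete, here is a small sketch of the EMA recurrence applied to a step, using two of the alpha values listed above. The function name `ema` and the sample series are illustrative only.

```python
def ema(values, alpha):
    """Exponential moving average: out[i] = alpha * v[i] + (1 - alpha) * out[i-1]."""
    if not values:
        return []
    a = max(0.0, min(1.0, alpha))  # keep alpha inside [0, 1]
    out = [values[0]]              # seed with the first sample
    for v in values[1:]:
        out.append(a * v + (1 - a) * out[-1])
    return out


# Hypothetical step from 0 to 10 at the third sample.
step = [0.0, 0.0, 10.0, 10.0, 10.0]
fast = ema(step, 0.92)  # alpha listed for the 1h setting
slow = ema(step, 0.25)  # alpha listed for the 7d setting
```

With `alpha = 0.92` the first sample after the step already sits at 9.2; with `alpha = 0.25` it sits at 2.5 and is still below 6 two samples later.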

### Polynomial trend (quadratic)

A quadratic (degree-2) curve fitted globally to all visible points using least-squares regression.

**What it shows:** The overall shape of the data — whether it is arcing upward, bending back down, or following a U or inverted-U curve. A linear trend can only say "up" or "down"; the polynomial trend can also say "accelerating" or "decelerating".

**Use it when:**

- You suspect a non-linear drift — for example a battery whose discharge rate changes over time, or a room that heats quickly then tapers off
- You want to see whether a recovery is complete or still in progress
- Seasonal effects within the window create a visible curve

**Avoid it when:**

- The data is periodic or highly variable — the polynomial fit covers the entire window and will be distorted by extreme values at either end
- You only need a directional signal; use linear trend instead as it is easier to interpret
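
The "accelerating or decelerating" reading comes from the sign of the quadratic coefficient. The sketch below fits `y = a + b*x + c*x^2` by solving the least-squares normal equations with Cramer's rule, with time rescaled to `[0, 1]`. The name `fit_quadratic` and the sample data are hypothetical, and this is an illustration rather than the integration's exact code.

```python
def fit_quadratic(points):
    """Return (a, b, c) for y = a + b*x + c*x**2, with x rescaled to [0, 1]."""
    if len(points) < 3:
        raise ValueError("need at least three points")
    origin = points[0][0]
    scale = (points[-1][0] - origin) or 1.0
    s = [0.0] * 5  # s[k] = sum of x**k
    t = [0.0] * 3  # t[k] = sum of x**k * y
    for time, value in points:
        x = (time - origin) / scale
        for k in range(5):
            s[k] += x ** k
        for k in range(3):
            t[k] += x ** k * value

    def det3(m):
        return (
            m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
        )

    base = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    d = det3(base)
    if abs(d) < 1e-12:
        raise ValueError("points do not determine a quadratic")
    coeffs = []
    for col in range(3):
        m = [row[:] for row in base]
        for r in range(3):
            m[r][col] = t[r]  # Cramer's rule: swap in the right-hand side
        coeffs.append(det3(m) / d)
    return tuple(coeffs)


# Hypothetical room temperature that heats quickly, then tapers off.
samples = [(0, 18.0), (1, 19.5), (2, 20.5), (3, 21.0), (4, 21.0)]
a, b, c = fit_quadratic(samples)
```

A negative `c` with a positive `b`, as here, matches the "rises quickly then tapers off" shape; a positive `c` would indicate the rise is speeding up.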

### LOWESS (Locally Weighted Scatterplot Smoothing)

A non-parametric smoother that computes a weighted local linear regression at each point, using only nearby data within a bandwidth window. The tricube weight function gives maximum influence to very close neighbors and smoothly reduces weight toward the bandwidth boundary.

**What it shows:** The underlying shape of the data without assuming any global functional form. LOWESS can follow curves, plateaus, transitions, and reversals that would require a high-degree polynomial to approximate analytically.

**Use it when:**

- The signal has a complex or unknown shape — for example temperature that rises, plateaus during occupancy, then drops overnight
- You want a visually clean, intuitive curve that roughly follows the "center" of the data at every local region
- You are investigating whether a specific period deviates from the local pattern (compare the LOWESS curve to the raw signal)
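- You want to tune locality with the window selector: each setting maps to a fraction of the visible time span (for example `1h → 0.1`, `24h → 0.3`), so shorter settings track rapid changes and longer settings give a broad global shape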

**Avoid it when:**

- The series is very short (fewer than 5–10 points) — local regression needs enough neighbors to be meaningful
- You need a mathematically interpretable output; LOWESS is empirical and does not produce slope or intercept values
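
For readers who want to see the weighting in action, the sketch below implements the tricube kernel and a single local linear fit evaluated at one point. The names `tricube_weight` and `local_linear` are hypothetical, and centring the x values on the evaluation point is a choice made here for numerical stability, not something taken from the integration.

```python
def tricube_weight(distance, bandwidth):
    """Tricube kernel: 1 at distance 0, falling smoothly to 0 at the bandwidth."""
    if bandwidth <= 0:
        return 0.0
    u = abs(distance) / bandwidth
    if u >= 1.0:
        return 0.0
    return (1.0 - u ** 3) ** 3


def local_linear(points, x0, bandwidth):
    """Weighted straight-line fit around x0, evaluated at x0."""
    sw = swx = swy = swxx = swxy = 0.0
    for x, y in points:
        w = tricube_weight(x - x0, bandwidth)
        dx = x - x0  # centring on x0 keeps the sums small
        sw += w
        swx += w * dx
        swy += w * y
        swxx += w * dx * dx
        swxy += w * dx * y
    if sw == 0.0:
        return None  # no neighbours inside the bandwidth
    denom = sw * swxx - swx * swx
    if abs(denom) < 1e-12:
        return swy / sw  # too few distinct neighbours: use the weighted mean
    slope = (sw * swxy - swx * swy) / denom
    return (swy - slope * swx) / sw  # intercept, i.e. the fitted value at dx = 0


# Hypothetical samples lying on the line y = 2x + 1.
line = [(float(x), 2.0 * x + 1.0) for x in range(7)]
fitted = local_linear(line, 3.0, 2.5)
```

Repeating `local_linear` at every output time produces the smoothed curve; a smaller `bandwidth` uses fewer neighbours per fit and follows the data more closely.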

### Rate of change

Computes the per-hour rate of change between each point and a lookback comparison. In point-to-point mode, each point is compared to the immediately preceding one. In windowed mode (e.g. 1h), each point is compared to the nearest point that is at least one window-width earlier.

**What it shows:** How fast the value is changing, expressed in units per hour. A flat original series produces a rate near zero. A sharp spike appears as a large positive or negative value.

**Use it when:**

- You want to confirm whether a temperature is rising or falling fast enough to be significant
- You are investigating an abrupt event — an open window, a power surge, a pump starting — that shows up as a spike in rate of change
- You are comparing rate-of-change between two periods to detect whether the dynamics have changed (e.g. heating slower than it used to be)
- Point-to-point mode is useful for fine-grained detection; windowed mode reduces noise from rapid oscillations

**Avoid it when:** The sensor updates irregularly or has long gaps — rate of change over a large gap can produce misleadingly large or small values. Use a windowed mode with a window wider than typical gaps to reduce this.
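
The two modes described above can be sketched as follows. The function name `rate_of_change`, the `window_ms=None` convention for point-to-point mode, and the choice to skip points without enough history are assumptions of this example.

```python
HOUR_MS = 3_600_000


def rate_of_change(points, window_ms=None):
    """Per-hour rate of change for sorted (time_ms, value) pairs.

    window_ms=None compares each point with its predecessor; otherwise each
    point is compared with the latest point at least window_ms earlier.
    """
    result = []
    ref = 0
    for i in range(1, len(points)):
        t, v = points[i]
        if window_ms is None:
            ref = i - 1
        else:
            if points[0][0] > t - window_ms:
                continue  # not enough history yet for a windowed comparison
            # Advance to the latest point that is still a full window back.
            while ref + 1 < i and points[ref + 1][0] <= t - window_ms:
                ref += 1
        t0, v0 = points[ref]
        dt = t - t0
        if dt <= 0:
            continue  # duplicate timestamp: rate is undefined
        result.append((t, (v - v0) / dt * HOUR_MS))
    return result


# Hypothetical temperature samples every 30 minutes.
HALF_HOUR = 1_800_000
temps = [(0, 20.0), (HALF_HOUR, 20.5), (2 * HALF_HOUR, 21.5), (3 * HALF_HOUR, 21.5)]
point_to_point = rate_of_change(temps)
windowed = rate_of_change(temps, window_ms=HOUR_MS)
```

On this data the point-to-point rates are 1.0, 2.0 and 0.0 units per hour, while the one-hour windowed rates are 1.5 and 1.0, which shows how the windowed mode evens out short bursts.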

---

## Anomaly detection

Anomaly detection is designed to help you spot suspicious patterns in time series without having to inspect every line manually.
130 changes: 130 additions & 0 deletions custom_components/hass_datapoints/anomaly_detection.py
@@ -45,7 +45,10 @@ def _persistence_flat_fraction(sensitivity: str) -> float:
# ---------------------------------------------------------------------------

_TREND_WINDOWS_MS: dict[str, int] = {
    "30m": 1_800_000,
    "1h": 3_600_000,
    "2h": 7_200_000,
    "3h": 10_800_000,
    "6h": 21_600_000,
    "24h": 86_400_000,
    "7d": 604_800_000,
@@ -142,11 +145,138 @@ def _build_linear_trend(pts: list) -> list:
    ]


_EMA_ALPHAS = {
    "30m": 0.97,
    "1h": 0.92,
    "2h": 0.88,
    "3h": 0.84,
    "6h": 0.75,
    "24h": 0.5,
    "7d": 0.25,
    "14d": 0.15,
    "21d": 0.1,
    "28d": 0.07,
}

_LOWESS_FRACTIONS = {
    "30m": 0.05,
    "1h": 0.1,
    "2h": 0.13,
    "3h": 0.16,
    "6h": 0.2,
    "24h": 0.3,
    "7d": 0.4,
    "14d": 0.55,
    "21d": 0.7,
    "28d": 0.85,
}


def _build_ema(pts: list, alpha: float) -> list:
    if len(pts) < 2:
        return []
    a = max(0.0, min(1.0, alpha))
    result = [[pts[0][0], pts[0][1]]]
    for i in range(1, len(pts)):
        ema = a * pts[i][1] + (1 - a) * result[-1][1]
        result.append([pts[i][0], ema])
    return result


def _build_polynomial_trend(pts: list) -> list:
    if len(pts) < 3:
        return []
    origin = pts[0][0]
    scale = (pts[-1][0] - origin) or 1.0
    s0 = s1 = s2 = s3 = s4 = 0.0
    t0 = t1 = t2 = 0.0
    for time, value in pts:
        x = (time - origin) / scale
        x2 = x * x
        s0 += 1
        s1 += x
        s2 += x2
        s3 += x2 * x
        s4 += x2 * x2
        t0 += value
        t1 += x * value
        t2 += x2 * value
    det = s0 * (s2 * s4 - s3 * s3) - s1 * (s1 * s4 - s3 * s2) + s2 * (s1 * s3 - s2 * s2)
    if not math.isfinite(det) or abs(det) < 1e-12:
        return []
    a = (
        t0 * (s2 * s4 - s3 * s3) - s1 * (t1 * s4 - s3 * t2) + s2 * (t1 * s3 - s2 * t2)
    ) / det
    b = (
        s0 * (t1 * s4 - s3 * t2) - t0 * (s1 * s4 - s3 * s2) + s2 * (s1 * t2 - t1 * s2)
    ) / det
    c = (
        s0 * (s2 * t2 - t1 * s3) - s1 * (s1 * t2 - t1 * s2) + t0 * (s1 * s3 - s2 * s2)
    ) / det
    return [
        [time, a + b * ((time - origin) / scale) + c * ((time - origin) / scale) ** 2]
        for time, _ in pts
    ]


def _build_lowess(pts: list, bandwidth: float) -> list:
    if len(pts) < 2:
        return []
    MAX_INPUT = 2000
    MAX_OUTPUT = 300
    n = len(pts)

    def subsample(total: int, max_count: int) -> list:
        if total <= max_count:
            return list(range(total))
        return [round(i / (max_count - 1) * (total - 1)) for i in range(max_count)]

    input_idx = subsample(n, MAX_INPUT)
    output_idx = subsample(n, MAX_OUTPUT)

    result = []
    for oi in output_idx:
        xi = pts[oi][0]
        sum_w = sum_wx = sum_wy = sum_wxx = sum_wxy = 0.0
        for k in range(len(input_idx)):
            d = abs(pts[input_idx[k]][0] - xi)
            if d >= bandwidth:
                continue
            norm_dist = d / bandwidth
            u = 1 - norm_dist**3
            w = u**3
            if w <= 0:
                continue
            xj, yj = pts[input_idx[k]][0], pts[input_idx[k]][1]
            sum_w += w
            sum_wx += w * xj
            sum_wy += w * yj
            sum_wxx += w * xj * xj
            sum_wxy += w * xj * yj
        denom = sum_w * sum_wxx - sum_wx * sum_wx
        if not math.isfinite(denom) or abs(denom) < 1e-12:
            result.append([xi, sum_wy / sum_w if sum_w > 0 else pts[oi][1]])
            continue
        slope = (sum_w * sum_wxy - sum_wx * sum_wy) / denom
        intercept = (sum_wy - slope * sum_wx) / sum_w
        result.append([xi, intercept + slope * xi])
    return result


def _build_trend_pts(pts: list, method: str, trend_window: str) -> list:
    if len(pts) < 2:
        return []
    if method == "linear_trend":
        return _build_linear_trend(pts)
    if method == "ema":
        return _build_ema(pts, _EMA_ALPHAS.get(trend_window, 0.5))
    if method == "polynomial_trend":
        return _build_polynomial_trend(pts)
    if method == "lowess":
        fraction = _LOWESS_FRACTIONS.get(trend_window, 0.3)
        span = (pts[-1][0] - pts[0][0]) if len(pts) >= 2 else 0
        bandwidth = fraction * span if span > 0 else fraction
        return _build_lowess(pts, bandwidth)
    return _build_rolling_average(pts, _trend_window_ms(trend_window))

