8 changes: 8 additions & 0 deletions docs/BACKLOG.md
@@ -8,6 +8,14 @@
_(nothing active — pick the next batch from below)_

## Deferred / future refinements
- [ ] **Community albums — semantic-similarity + device-tier enhancements.** P13e-1 detects communities over
the **entity** graph only (every-device). Fold **semantic-similarity edges** (looser-threshold vector kNN)
into the same label-propagation graph on capable tiers for richer thematic grouping, gated like the
Suggested albums; possibly tune cluster bounds (`minSize`/`maxSize`/`maxGroupSize`) per device tier.
*(From P13e-1.)*
- [ ] **Community albums — richer labels + caching.** `clusterLabel` resolves dominant **tag → uploader → site
→ title**; consider playlist names, LLM-named groups (capable tiers), and caching/persisting discovered
albums (today they are recomputed at query time, like Suggested). *(From P13e-1.)*
- [ ] **GraphRAG — search/filter conversations.** The chat list (P13d-2b) shows every active chat; once a user
accumulates many, add a search field (by title/preview) and/or pinning. *(From P13d-2b.)*
- [ ] **GraphRAG — bulk chat management + swipe gestures.** Multi-select archive/delete and a swipe-to-archive
2 changes: 1 addition & 1 deletion docs/GRAPH-SPEC.md
@@ -260,7 +260,7 @@ gracefully when `GraphStore.isAvailable` is false.
| **Proactive grouping** (P10c-d) | **d-1 (live, every device):** a distinct **Duplicates** auto-album in Collections→Albums (exact-hash `duplicateOf`), with bulk **Clean up** (keep oldest) + **Review** → the cleanup screen. **d-2 (graph, live):** **Suggested** similarity albums — query-time vector clusters (pairwise cosine + connected components in `near_duplicate_clustering.dart`, exact pairs excluded) with one-tap **Save as collection**. The richer community-detection / label-propagation auto-albums + "Rediscover" stay **P13** (below). |
| **Tag suggestions** (P10c-c-2, **live**) | `GraphQueryService.coOccurringTags`: tags on items sharing a deterministic signal with this one (`postedBy`/`inPlaylist`/`taggedWith`/`coDownloadedWith`), minus the item's own tags, ranked in Dart (`cooccurrence_ranking.dart`) by distinct supporting items. Surfaced as tappable chips in the metadata editor. Pure Datalog — every device. |
| **Interactive graph viz** (P10c-e/f) | **e (render, live):** `GraphQueryService.neighborhood(id)` (an item's direct entity + duplicate/co-download edges) → a force-directed `graphview` render with pan/zoom + a type legend, reached via item-detail "View in graph". Deterministic edges — no embedder. **f (live):** tap a media node → its item, tap an entity node → expand its media (and long-press → open hub), edge-type **legend filters**, expansion capped (`:limit`). |
-| **Graph-clustered auto-albums** (P13) | **community detection / label propagation** over the similarity + entity graph. |
+| **Graph-clustered auto-albums** (P13) | **community detection / label propagation** over the similarity + entity graph. **Realized P13e-1** over the **entity** graph (shared uploader/playlist/tag + co-download) via pure deterministic label propagation (`community_clustering.dart`) — every-device, no embedder; surfaced as the **"Discovered"** album section. Folding in semantic-similarity edges is deferred (BACKLOG). |
| **Centrality "Rediscover"** (P13) | `PageRank` / betweenness × `lastAccessedAt` to resurface central-but-stale items. |
| **Path / bridge discovery** (P13) | shortest-path / connectivity between two items or entities. |
| **Local GraphRAG "Ask your library"** (P13) | hybrid retrieval (vector + graph re-rank) feeds the on-device LLM (see `AI-SPEC.md`). |
8 changes: 8 additions & 0 deletions docs/VERIFICATION.md
@@ -1025,6 +1025,14 @@ entries, or verify after P11c lands.)*
- [ ] **RAM co-residency (the P12d-2 carry-over):** on real **low/mid** hardware, a full generated, cited
answer runs while the Cozo **HNSW index is live** — no OOM, crash, or jank; repeated turns stay stable.

### P13e-1 — Community-detection auto-albums *(install `app-arm64-v8a-debug.apk`; needs a real library)*
- [ ] On a library with shared signals (same channels / playlists / tags / co-downloaded batches), Collections
→ **Albums** shows a **"Discovered"** section of coherent multi-signal groups, each labeled by its
dominant shared signal ("Around 'recipes'", "Mostly <channel>", …).
- [ ] Works on a **low-end device** (no generation model / embedder needed — entity graph only).
- [ ] Tapping a discovered album opens its items; **Save** creates a normal collection containing them.
- [ ] The Discovered section is **absent** when the graph index is unavailable (e.g. unsupported ABI).

### P13 (later subphases)
- [ ] **Transcription / summarization / translation / OCR** each work (capability-gated) and write
results back to the item.
15 changes: 12 additions & 3 deletions docs/design/P13-PLAN.md
@@ -292,15 +292,24 @@ Incapable / low tiers fall back to an ephemeral **retrieval-only** answer (d-3).
generation + the live HNSW index together within memory budget (verified on real hardware). ✓ (CI parts) ·
APK owed (RAM co-residency on real low/mid hardware)

-### `[ ]` P13e — Advanced graph analytics & viz *(graph; split into 3 PRs)*
+### `[~]` P13e — Advanced graph analytics & viz *(graph; split into 3 PRs)*
The richer graph payoff beyond P10's Duplicates + Suggested-similarity albums (GRAPH-SPEC §7). Runs via
`GraphStore.runScript` / `GraphQueryService`; device-universal (deterministic graph algorithms, no LLM).

-#### `[ ]` P13e-1 — Community-detection auto-albums *(graph; APK)*
+#### `[~]` P13e-1 — Community-detection auto-albums *(graph; APK)*
- **Label-propagation / community detection** over the similarity + entity graph → richer auto-albums,
surfaced in Collections beside the existing auto-albums with one-tap **Save as collection**.
- **Status:** implemented (CI-green; APK spot-check owed). Runs over the **entity graph** (shared
uploader/playlist/tag + co-download) — **every-device, no embedder** (maintainer call; semantic-similarity +
tier enhancements → BACKLOG). New `lib/core/graph/community_clustering.dart` (pure **deterministic label
propagation**, mirrors `near_duplicate_clustering.dart`; prunes over-generic buckets); `cozo_query.dart`
`entityMembershipScript()` + `coDownloadPairsScript()`; `GraphQueryService.communityClusters()`;
`clusteredAlbumsProvider` + `clusterLabel` (dominant **tag → uploader → site → title**) reusing the
`SuggestedAlbum` model + `/suggested-album` screen; a **"Discovered"** section in Collections → Albums. **No
schema, no deps.** Tests: clusterer (determinism, web-merge, bucket pruning, min/max, dominant tag), the two
scripts, `communityClusters` decode, the provider (hydrate/label/empty), and the Discovered section.
- **Exit / review:** clusters are coherent on a real library; degrades to nothing when the graph is
-unavailable; saving a cluster creates a normal collection.
+unavailable; saving a cluster creates a normal collection. ✓ (CI parts) · APK owed

#### `[ ]` P13e-2 — Centrality "Rediscover" *(graph; APK)*
- **PageRank / betweenness × `lastAccessedAt`** to resurface central-but-stale items; surfaced as a
152 changes: 152 additions & 0 deletions lib/core/graph/community_clustering.dart
@@ -0,0 +1,152 @@
/// Pure community detection for the "Discovered" auto-albums (P13e-1). No
/// Flutter, engine, or AI imports — `GraphQueryService` feeds it the decoded
/// entity memberships + co-download pairs and it returns id communities, so the
/// whole thing is unit-testable. The looser, thematic cousin of the tight
/// similarity clustering in `near_duplicate_clustering.dart`.
library;

/// A detected community: its member ids plus, when one stands out, the dominant
/// shared **tag** across members (used for labeling).
class Community {
  const Community({required this.items, this.dominantTag});

  final List<String> items;
  final String? dominantTag;
}

/// Groups items into communities via **deterministic label propagation** over
/// the entity graph: items sharing an entity bucket ([memberships]; `group` is a
/// type-prefixed key like `u:<id>` / `p:<id>` / `t:<tag>`) are connected, plus
/// the direct [pairs] edges (co-download). Buckets with more than [maxGroupSize]
/// members are dropped as **too generic** (a mega-tag/uploader would merge
/// unrelated items — the "discard blobs" rule from the similarity clusterer).
///
/// Only communities with `minSize ≤ size ≤ maxSize` are returned, largest-first;
/// ids keep first-seen order. Label propagation is sequential (each node adopts
/// the most-frequent neighbour label, ties broken by the lexicographically
/// smallest label) over a fixed node order, and stops once a full sweep changes
/// nothing or after [maxIterations] sweeps, so the result is deterministic.
List<Community> detectCommunities({
  required List<({String item, String group})> memberships,
  required List<({String a, String b})> pairs,
  int minSize = 3,
  int maxSize = 30,
  int maxGroupSize = 50,
  int maxIterations = 20,
}) {
  // Stable node order = first appearance (memberships, then pairs).
  final order = <String>[];
  final seen = <String>{};
  void note(String x) {
    if (seen.add(x)) order.add(x);
  }

  final byGroup = <String, List<String>>{};
  for (final m in memberships) {
    note(m.item);
    (byGroup[m.group] ??= <String>[]).add(m.item);
  }
  for (final p in pairs) {
    note(p.a);
    note(p.b);
  }

  // Undirected adjacency among items.
  final adj = <String, Set<String>>{};
  void link(String a, String b) {
    if (a == b) return;
    (adj[a] ??= <String>{}).add(b);
    (adj[b] ??= <String>{}).add(a);
  }

  for (final members in byGroup.values) {
    // Singletons contribute no edge; oversized buckets are too generic.
    if (members.length < 2 || members.length > maxGroupSize) continue;
    for (var i = 0; i < members.length; i++) {
      for (var j = i + 1; j < members.length; j++) {
        link(members[i], members[j]);
      }
    }
  }
  for (final p in pairs) {
    link(p.a, p.b);
  }

  // Only items with at least one edge take part, in first-seen order.
  final nodes = [
    for (final n in order)
      if (adj.containsKey(n)) n,
  ];
  if (nodes.isEmpty) return const [];

  // Seed each node with its own id as its label, then sweep in node order until
  // a full pass changes nothing (or the iteration cap is reached).
  final label = {for (final n in nodes) n: n};
  for (var iter = 0; iter < maxIterations; iter++) {
    var changed = false;
    for (final n in nodes) {
      final counts = <String, int>{};
      for (final nb in adj[n]!) {
        final l = label[nb]!;
        counts[l] = (counts[l] ?? 0) + 1;
      }
      if (counts.isEmpty) continue;
      var best = label[n]!;
      var bestCount = -1;
      counts.forEach((l, c) {
        if (c > bestCount || (c == bestCount && l.compareTo(best) < 0)) {
          best = l;
          bestCount = c;
        }
      });
      if (best != label[n]) {
        label[n] = best;
        changed = true;
      }
    }
    if (!changed) break;
  }

  // Tags carried by each item (for the dominant-tag label seed).
  final tagsByItem = <String, List<String>>{};
  for (final m in memberships) {
    if (m.group.startsWith('t:')) {
      (tagsByItem[m.item] ??= <String>[]).add(m.group.substring(2));
    }
  }

  final byLabel = <String, List<String>>{};
  for (final n in nodes) {
    (byLabel[label[n]!] ??= <String>[]).add(n);
  }

  final communities = <Community>[
    for (final members in byLabel.values)
      if (members.length >= minSize && members.length <= maxSize)
        Community(
          items: members,
          dominantTag: _dominantTag(members, tagsByItem),
        ),
  ]..sort((a, b) => b.items.length.compareTo(a.items.length));
  return communities;
}

/// The tag shared by the most members (support ≥ 2), ties broken by the
/// lexicographically smallest tag; `null` if none stands out.
String? _dominantTag(
  List<String> members,
  Map<String, List<String>> tagsByItem,
) {
  final counts = <String, int>{};
  for (final id in members) {
    for (final t in tagsByItem[id] ?? const <String>[]) {
      counts[t] = (counts[t] ?? 0) + 1;
    }
  }
  String? best;
  var bestCount = 0;
  counts.forEach((t, c) {
    if (c > bestCount ||
        (c == bestCount && best != null && t.compareTo(best!) < 0)) {
      best = t;
      bestCount = c;
    }
  });
  return bestCount >= 2 ? best : null; // support ≥ 2 to be "dominant"
}
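
Review note: the propagation step in this file can be exercised on toy data. The following is a minimal standalone sketch, not the PR's code. `buildAdjacency`, `propagateLabels`, `toyLabels`, and the ids `a` to `y` are hypothetical names chosen for illustration; the bucket pruning and the tie-break are re-implemented from the doc comment on `detectCommunities`, on the assumption that it describes the intended behaviour.

```dart
/// Hypothetical helper (not part of the PR): connects every pair of items that
/// share a bucket, skipping singletons and buckets above [maxGroupSize].
Map<String, Set<String>> buildAdjacency(
  Map<String, List<String>> buckets, {
  int maxGroupSize = 50,
}) {
  final adj = <String, Set<String>>{};
  for (final members in buckets.values) {
    if (members.length < 2 || members.length > maxGroupSize) continue;
    for (final a in members) {
      for (final b in members) {
        if (a != b) (adj[a] ??= <String>{}).add(b);
      }
    }
  }
  return adj;
}

/// Hypothetical helper: sequential label propagation over a fixed node order.
/// Each node adopts its most frequent neighbour label; ties go to the
/// lexicographically smallest label.
Map<String, String> propagateLabels(
  Map<String, Set<String>> adj, {
  int maxIterations = 20,
}) {
  final nodes = adj.keys.toList();
  final label = {for (final n in nodes) n: n};
  for (var iter = 0; iter < maxIterations; iter++) {
    var changed = false;
    for (final n in nodes) {
      final counts = <String, int>{};
      for (final nb in adj[n]!) {
        final l = label[nb]!;
        counts[l] = (counts[l] ?? 0) + 1;
      }
      if (counts.isEmpty) continue;
      var best = label[n]!;
      var bestCount = -1;
      for (final e in counts.entries) {
        if (e.value > bestCount ||
            (e.value == bestCount && e.key.compareTo(best) < 0)) {
          best = e.key;
          bestCount = e.value;
        }
      }
      if (best != label[n]) {
        label[n] = best;
        changed = true;
      }
    }
    if (!changed) break; // a full sweep changed nothing: converged
  }
  return label;
}

/// Toy input: two tight groups plus one oversized bucket that gets pruned.
Map<String, String> toyLabels() => propagateLabels(
  buildAdjacency({
    't:recipes': ['a', 'b', 'c'],
    'u:chef': ['b', 'c', 'd'],
    'p:mix': ['x', 'y'],
    't:everything': ['a', 'b', 'c', 'd', 'x', 'y'], // pruned: 6 > 5
  }, maxGroupSize: 5),
);
```

In this toy run, items `a` to `d` end up sharing one label and `x`, `y` share another. Raising `maxGroupSize` above 6 would let the `t:everything` bucket add edges between all six items, which is the merge the pruning rule is meant to prevent.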
17 changes: 17 additions & 0 deletions lib/core/graph/cozo_query.dart
@@ -62,6 +62,23 @@ String allEmbeddingsScript() => '?[id, v] := *embedding{id, v}';
String allDuplicatePairsScript() =>
    '?[a, b] := *duplicateOf{mediaId: a, otherId: b}';

/// Global entity-membership for community detection (P13e-1), as `[mediaId,
/// kind, key]` rows: `kind` ∈ `u | p | t` (uploader / playlist / tag), `key` the
/// shared entity id/tag. The Dart clusterer composes a `"$kind:$key"` bucket and
/// connects items sharing one. **Site is excluded** (too coarse — it would merge
/// everything from a platform). Linear in edges (one row per membership, not per
/// pair). Pure Datalog — no vector syntax.
String entityMembershipScript() =>
    '?[mediaId, kind, key] := *postedBy{mediaId, uploaderId: key}, kind = "u"\n'
    '?[mediaId, kind, key] := *inPlaylist{mediaId, playlistId: key}, kind = "p"\n'
    '?[mediaId, kind, key] := *taggedWith{mediaId, tag: key}, kind = "t"';

/// Every co-download pair as `[a, b]` (one direction per stored row). Co-download
/// is inherently pairwise, so it joins the community graph as a direct item–item
/// edge rather than an entity bucket.
String coDownloadPairsScript() =>
    '?[a, b] := *coDownloadedWith{mediaId: a, otherId: b}';

/// Tags co-occurring with item `$id`: tags on the items that share a
/// deterministic signal with it (same uploader/playlist/tag/co-download),
/// excluding the tags `$id` already carries. Emits one `[other, tag]` row per
37 changes: 37 additions & 0 deletions lib/core/graph/graph_query_service.dart
@@ -1,3 +1,4 @@
import 'package:grabbit/core/graph/community_clustering.dart';
import 'package:grabbit/core/graph/cooccurrence_ranking.dart';
import 'package:grabbit/core/graph/cozo_query.dart';
import 'package:grabbit/core/graph/graph_store.dart';
@@ -162,6 +163,42 @@ class GraphQueryService {
    );
  }

  /// Thematic **communities** over the entity graph (P13e-1) — items linked
  /// through a web of shared uploader/playlist/tag signals + co-download edges,
  /// grouped by deterministic label propagation. Every-device (pure Datalog +
  /// Dart; no embedder, unlike [similarityClusters]). `[]` when unavailable.
  Future<List<Community>> communityClusters({
    int minSize = 3,
    int maxSize = 30,
    int maxGroupSize = 50,
  }) async {
    if (!_store.isAvailable) return const [];
    final memberships = [
      for (final r in decodeRows(
        await _store.runScript(entityMembershipScript()),
      ))
        if (r['mediaId'] case final Object id)
          if (r['kind'] case final Object kind)
            if (r['key'] case final Object key)
              (item: id.toString(), group: '$kind:$key'),
    ];
    if (memberships.isEmpty) return const [];
    final pairs = [
      for (final r in decodeRows(
        await _store.runScript(coDownloadPairsScript()),
      ))
        if (r['a'] case final Object a)
          if (r['b'] case final Object b) (a: a.toString(), b: b.toString()),
    ];
    return detectCommunities(
      memberships: memberships,
      pairs: pairs,
      minSize: minSize,
      maxSize: maxSize,
      maxGroupSize: maxGroupSize,
    );
  }
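
Review note: the bucket key composed in `communityClusters` is easier to reason about with a concrete row set. A hedged sketch follows, assuming the decoded rows arrive as `Map<String, Object?>` (the actual shape returned by `decodeRows` is not shown in this diff). `Membership`, `toMemberships`, and `sampleMemberships` are hypothetical names for illustration only.

```dart
/// Hypothetical alias for the record shape `detectCommunities` consumes.
typedef Membership = ({String item, String group});

/// Hypothetical helper: turns `[mediaId, kind, key]` rows into memberships,
/// dropping any row with a null column instead of stringifying it.
List<Membership> toMemberships(List<Map<String, Object?>> rows) => [
  for (final r in rows)
    if (r['mediaId'] case final Object id)
      if (r['kind'] case final Object kind)
        if (r['key'] case final Object key)
          (item: id.toString(), group: '$kind:$key'),
];

/// Sample rows (made up): a tag and an uploader that share the text "42",
/// plus one incomplete row.
List<Membership> sampleMemberships() => toMemberships([
  {'mediaId': 'm1', 'kind': 't', 'key': '42'},
  {'mediaId': 'm2', 'kind': 'u', 'key': 42},
  {'mediaId': 'm3', 'kind': 'p', 'key': null}, // dropped: null key
]);
```

The type prefix keeps the tag `42` (`t:42`) and the uploader id `42` (`u:42`) in separate buckets, so `m1` and `m2` are not linked by accident, and the incomplete row never reaches the clusterer.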

  /// The immediate graph neighborhood of media item [id] — its connected
  /// entities + directly-linked media — for the graph-view render (P10c-e).
  /// `[]` when the store is unavailable. Pure deterministic edges; no embedder.
100 changes: 100 additions & 0 deletions lib/features/library/presentation/clustered_albums_provider.dart
@@ -0,0 +1,100 @@
import 'package:flutter_riverpod/flutter_riverpod.dart';
import 'package:grabbit/core/db/database.dart';
import 'package:grabbit/core/db/database_provider.dart';
import 'package:grabbit/core/graph/graph_query_provider.dart';
import 'package:grabbit/features/library/presentation/suggested_albums_provider.dart';

// Hand-written (returns Drift `MediaItem` rows): thematic community clusters over
// the entity graph the user can save as a collection (P13e-1). Reuses the
// `SuggestedAlbum` model + screen.

/// "Discovered" albums from on-device **entity-graph community detection**
/// (P13e-1), hydrated to [MediaItem]s. Every-device — needs only the graph
/// store, no embedder (unlike [suggestedAlbumsProvider]). `[]` when the graph is
/// unavailable or nothing clusters.
final clusteredAlbumsProvider = FutureProvider<List<SuggestedAlbum>>((
  ref,
) async {
  final communities = await ref
      .watch(graphQueryServiceProvider)
      .communityClusters();
  if (communities.isEmpty) return const [];

  final db = ref.watch(appDatabaseProvider);
  final allIds = {for (final c in communities) ...c.items};
  final found = await (db.select(
    db.mediaItems,
  )..where((t) => t.id.isIn(allIds.toList()))).get();
  final byId = {for (final m in found) m.id: m};

  // Batched signals for labeling (one query each, no N+1).
  final uploaderById = {
    for (final r in await (db.select(
      db.mediaMetadata,
    )..where((t) => t.itemId.isIn(allIds.toList()))).get())
      r.itemId: r.uploader,
  };

  final albums = <SuggestedAlbum>[];
  for (final community in communities) {
    // Members may have been deleted since the graph was built.
    final items = [
      for (final id in community.items)
        if (byId[id] case final MediaItem m) m,
    ];
    // Re-check the size after hydration; 3 mirrors the default `minSize`.
    if (items.length < 3) continue;
    albums.add(
      SuggestedAlbum(
        label: clusterLabel(
          items,
          dominantTag: community.dominantTag,
          uploaderById: uploaderById,
        ),
        items: items,
      ),
    );
  }
  return albums;
});

/// Labels a community by its **dominant shared signal** (P13e-1): the tag most
/// members share, else the most common uploader, else the most common site, else
/// the newest item's title (the Suggested-album style). Pure + unit-testable.
/// Expects a non-empty [items]: the title fallback uses `reduce`.
String clusterLabel(
  List<MediaItem> items, {
  String? dominantTag,
  Map<String, String?> uploaderById = const {},
}) {
  if (dominantTag != null && dominantTag.trim().isNotEmpty) {
    return "Around '${dominantTag.trim()}'";
  }
  final uploader = _mostCommon([for (final it in items) uploaderById[it.id]]);
  if (uploader != null) return 'Mostly $uploader';
  final site = _mostCommon([for (final it in items) it.site]);
  if (site != null) return 'Mostly $site';

  final rep = items.reduce((a, b) => a.createdAt.isAfter(b.createdAt) ? a : b);
  final t = rep.title.trim();
  final short = t.length > 40 ? '${t.substring(0, 40).trimRight()}…' : t;
  return "Like '$short'";
}

/// The most frequent non-blank value with support ≥ 2 (ties → lexicographically
/// smallest); `null` if none stands out.
String? _mostCommon(List<String?> values) {
  final counts = <String, int>{};
  for (final v in values) {
    final s = v?.trim();
    if (s != null && s.isNotEmpty) counts[s] = (counts[s] ?? 0) + 1;
  }
  String? best;
  var bestCount = 0;
  counts.forEach((s, c) {
    if (c > bestCount ||
        (c == bestCount && best != null && s.compareTo(best!) < 0)) {
      best = s;
      bestCount = c;
    }
  });
  return bestCount >= 2 ? best : null; // a single occurrence isn't "common"
}
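
Review note: the fallback order in `clusterLabel` can be illustrated without Drift. This is a minimal sketch under stated assumptions: `mostCommon` and `labelFor` are hypothetical stand-ins that take plain strings instead of `MediaItem` rows, and the title fallback is passed in already shortened. It is not the PR's implementation.

```dart
/// Hypothetical stand-in for `_mostCommon`: the most frequent non-blank value
/// with at least [minSupport] occurrences, ties going to the lexicographically
/// smallest value; `null` if nothing qualifies.
String? mostCommon(Iterable<String?> values, {int minSupport = 2}) {
  final counts = <String, int>{};
  for (final v in values) {
    final s = v?.trim();
    if (s != null && s.isNotEmpty) counts[s] = (counts[s] ?? 0) + 1;
  }
  String? best;
  var bestCount = 0;
  for (final e in counts.entries) {
    final wins = e.value > bestCount ||
        (e.value == bestCount && best != null && e.key.compareTo(best) < 0);
    if (wins) {
      best = e.key;
      bestCount = e.value;
    }
  }
  return bestCount >= minSupport ? best : null;
}

/// Hypothetical stand-in for `clusterLabel`: tag, then uploader, then site,
/// then the supplied title.
String labelFor({
  String? dominantTag,
  required List<String?> uploaders,
  required List<String?> sites,
  required String fallbackTitle,
}) {
  final tag = dominantTag?.trim();
  if (tag != null && tag.isNotEmpty) return "Around '$tag'";
  final uploader = mostCommon(uploaders);
  if (uploader != null) return 'Mostly $uploader';
  final site = mostCommon(sites);
  if (site != null) return 'Mostly $site';
  return "Like '$fallbackTitle'";
}
```

With uploaders `['Chef', 'Chef', 'Other']` and no dominant tag, `labelFor` returns `Mostly Chef`. With two distinct uploaders and two distinct sites, no value reaches the support threshold of 2, so the label falls through to the title form.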