Skip to content

[Rule] LongestCommonSubsequence to MaximumIndependentSet #109

@zazabap

Description

@zazabap

Source: LongestCommonSubsequence
Target: MaximumIndependentSet
Motivation: Connects string problems to graph theory; enables solving LCS via MaxIS solvers and reveals structural equivalence between subsequence ordering and graph independence.
Reference: Apostolico & Guerra, 1987; Baxter et al., 2004

Reduction Algorithm

Notation:

  • Source: $k$ strings $s_1, \ldots, s_k$ over alphabet $\Sigma$, with lengths $n_1, \ldots, n_k$
  • Target: undirected graph $G = (V, E)$

Step 1 — Construct match nodes $V$:

A node $v = (p_1, p_2, \ldots, p_k)$ is created for each $k$-tuple of positions such that $s_1[p_1] = s_2[p_2] = \cdots = s_k[p_k]$ (all strings have the same character at their respective positions).

$$V = {(p_1, \ldots, p_k) : s_1[p_1] = s_2[p_2] = \cdots = s_k[p_k]}$$

Step 2 — Construct conflict edges $E$:

Two nodes $u = (a_1, \ldots, a_k)$ and $v = (b_1, \ldots, b_k)$ are connected by an edge if they conflict — i.e., they cannot both appear in a valid common subsequence. A conflict occurs when the position differences are not consistently ordered across all strings:

$${u, v} \in E \iff \text{NOT}\ (\forall i: a_i < b_i) \ \text{AND NOT}\ (\forall i: a_i > b_i)$$

In other words, there exist indices $i, j$ such that $a_i &lt; b_i$ but $a_j &gt; b_j$ (a "crossing"), or $a_i = b_i$ for some $i$ (same position used twice).

Variable mapping: Each LCS character corresponds to a selected node; each MaxIS vertex in the independent set corresponds to a matched position tuple.

Objective: $\text{MaxIS}(G) = \text{LCS}(s_1, \ldots, s_k)$. An independent set of size $L$ gives a common subsequence of length $L$, and vice versa.

Size Overhead

Target metric (code name) Polynomial (using symbols above)
num_vertices $\leq \prod_{i=1}^{k} n_i$ (number of character-match $k$-tuples)
num_edges $\leq \binom{

For $k = 2$ strings of length $n$: at most $n^2$ vertices and $O(n^4)$ edges.

Validation Method

Closed-loop testing: solve LCS by brute-force (enumerate subsequences of shortest string), solve the reduced MaxIS instance, and verify that both give the same optimal value. Test on instances with varying alphabet sizes and string counts ($k = 2, 3, 4$).

Example

Source instance: $k = 2$, $s_1$ = ABAC, $s_2$ = BACA, alphabet $\Sigma = {A, B, C}$.

Step 1 — Match nodes:

Node $s_1$ pos $s_2$ pos Character
$v_0$ 0 1 A
$v_1$ 0 3 A
$v_2$ 1 0 B
$v_3$ 2 1 A
$v_4$ 2 3 A
$v_5$ 3 2 C

Step 2 — Conflict edges (9 edges):

Edge Reason
$v_0 - v_1$ same $s_1$ position (0)
$v_0 - v_2$ crossing: $s_1$: $0&lt;1$, $s_2$: $1&gt;0$
$v_0 - v_3$ same $s_2$ position (1)
$v_1 - v_2$ crossing: $s_1$: $0&lt;1$, $s_2$: $3&gt;0$
$v_1 - v_3$ crossing: $s_1$: $0&lt;2$, $s_2$: $3&gt;1$
$v_1 - v_4$ same $s_2$ position (3)
$v_1 - v_5$ crossing: $s_1$: $0&lt;3$, $s_2$: $3&gt;2$
$v_3 - v_4$ same $s_1$ position (2)
$v_4 - v_5$ crossing: $s_1$: $2&lt;3$, $s_2$: $3&gt;2$

MaxIS solution: ${v_2, v_3, v_5}$ (size 3, unique).

Node Position Char
$v_2$ $s_1[1], s_2[0]$ B
$v_3$ $s_1[2], s_2[1]$ A
$v_5$ $s_1[3], s_2[2]$ C

Extracted LCS: BAC (length 3). Verified: B-A-C is a subsequence of both ABAC (positions 1,2,3) and BACA (positions 0,1,2).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions