Skip to content

[Rule] LongestCommonSubsequence to ILP #110

@zazabap

Description

@zazabap

Source: LongestCommonSubsequence
Target: ILP
Motivation: Enables solving LCS instances via ILP solvers; the match-pair formulation naturally encodes subsequence ordering as linear constraints.
Reference: Lancia et al., 2001; general ILP formulations for sequence alignment in Althaus et al., 2006

Reduction Algorithm

Notation:

  • Source: $k = 2$ strings $s_1$ (length $n_1$) and $s_2$ (length $n_2$) over alphabet $\Sigma$
  • Target: ILP with binary variables, linear constraints, and a maximize objective

For general $k$ strings: the formulation extends by introducing match $k$-tuple variables and pairwise ordering constraints across all $\binom{k}{2}$ string pairs.

Variable mapping:

For each pair of positions $(j_1, j_2)$ where $s_1[j_1] = s_2[j_2]$, create a binary variable:

$$m_{j_1, j_2} \in {0, 1}$$

meaning "position $j_1$ of $s_1$ is matched to position $j_2$ of $s_2$ in the common subsequence."

Total variables: $|M|$ where $M = {(j_1, j_2) : s_1[j_1] = s_2[j_2]}$, bounded by $n_1 \cdot n_2$.

Constraints:

  1. Each position in $s_1$ matched at most once: for each $j_1 \in {0, \ldots, n_1-1}$,
    $$\sum_{j_2 : (j_1,j_2) \in M} m_{j_1, j_2} \leq 1$$

  2. Each position in $s_2$ matched at most once: for each $j_2 \in {0, \ldots, n_2-1}$,
    $$\sum_{j_1 : (j_1,j_2) \in M} m_{j_1, j_2} \leq 1$$

  3. Order preservation (no crossings): for all $(j_1, j_2), (j_1', j_2') \in M$ with $j_1 < j_1'$ and $j_2 > j_2'$,
    $$m_{j_1, j_2} + m_{j_1', j_2'} \leq 1$$

Objective: maximize $\sum_{(j_1,j_2) \in M} m_{j_1, j_2}$

Solution extraction: The matched positions with $m_{j_1,j_2} = 1$, sorted by $j_1$, give the characters of the LCS.

Size Overhead

Target metric (code name) Polynomial (using symbols above)
num_vars $\leq n_1 \cdot n_2$ (number of character-match pairs)
num_constraints $\leq n_1 + n_2 + \binom{

Validation Method

Closed-loop testing: solve LCS by brute-force, solve the reduced ILP, and verify that both give the same optimal objective. Test with varying alphabet sizes and string lengths, including edge cases (identical strings, no common characters, single-character alphabet).

Example

Source instance: $s_1$ = ABAC, $s_2$ = BACA, alphabet $\Sigma = {A, B, C}$.

Variables (6 match pairs):

Variable $s_1$ pos $s_2$ pos Character
$m_{0,1}$ 0 1 A
$m_{0,3}$ 0 3 A
$m_{1,0}$ 1 0 B
$m_{2,1}$ 2 1 A
$m_{2,3}$ 2 3 A
$m_{3,2}$ 3 2 C

Constraints (13 total):

Assignment for $s_1$: $m_{0,1} + m_{0,3} \leq 1$, $m_{2,1} + m_{2,3} \leq 1$ (plus 2 trivial single-variable constraints).

Assignment for $s_2$: $m_{0,1} + m_{2,1} \leq 1$, $m_{0,3} + m_{2,3} \leq 1$ (plus 2 trivial).

No-crossing: $m_{0,1} + m_{1,0} \leq 1$, $m_{0,3} + m_{1,0} \leq 1$, $m_{0,3} + m_{2,1} \leq 1$, $m_{0,3} + m_{3,2} \leq 1$, $m_{2,3} + m_{3,2} \leq 1$.

Objective: maximize $m_{0,1} + m_{0,3} + m_{1,0} + m_{2,1} + m_{2,3} + m_{3,2}$

Optimal ILP solution: objective = 3, with $m_{1,0} = m_{2,1} = m_{3,2} = 1$ (all others 0).

Extracted LCS: $s_1[1] = s_2[0]$ = B, $s_1[2] = s_2[1]$ = A, $s_1[3] = s_2[2]$ = C → BAC (length 3), matching the brute-force LCS.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions