The Logic Network Generator transforms Reactome pathway data into directed logic networks suitable for perturbation analysis and pathway flow studies. The system decomposes complex biochemical structures (complexes and entity sets) into individual components and creates a network where edges represent biochemical transformations.
Reactome Neo4j Database (Biological Pathway Data)
  ↓ Neo4j Queries
reaction_connections_{pathway_id}.csv (Connections between reactions: preceding → following)
  ↓ Decomposition (Break complexes/sets into components)
decomposed_uid_mapping_{pathway_id}.csv (Maps hashes to individual physical entities: proteins, etc.)
  ↓ Hungarian Algorithm (Optimal input/output pairing)
best_matches_{pathway_id}.csv (Pairs of input/output combinations within reactions)
  ↓ Logic Network Generation (Create transformation edges; position-aware UUID assignment)
pathway_logic_network.csv (source_id → target_id edges with AND/OR logic annotations)
  ↓ UUID Mapping Export
uuid_to_reactome_{pathway_id}.csv (Maps UUIDs back to Reactome database IDs)
In Reactome, a :PhysicalEntity represents any biological molecule or complex:
- Simple molecules (ATP, water)
- Proteins (individual gene products)
- Complexes (protein complexes like Complex(A,B,C))
- Entity sets (alternative molecules like EntitySet(IsoformA, IsoformB))
Complex structures are broken down into individual components:
Input: Complex(ProteinA, ProteinB, EntitySet(ATP, GTP))
↓ decomposition
Output:
- Combination 1: ProteinA, ProteinB, ATP
- Combination 2: ProteinA, ProteinB, GTP
This creates all possible molecular combinations through cartesian product, preserving biological alternatives.
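The decomposition described above can be sketched with `itertools.product`. This is a minimal illustration, not the project's `get_decomposed_uid_mapping()`: the tuple-based entity encoding and the `decompose` helper are assumptions made only for this example.

```python
from itertools import product

# Hypothetical encoding for this sketch only: a leaf is a string,
# ("complex", [...]) needs all members, ("set", [...]) offers alternatives.
def decompose(entity):
    """Return every flat combination of leaf entities for `entity`."""
    if isinstance(entity, str):
        return [(entity,)]
    kind, members = entity
    expanded = [decompose(member) for member in members]
    if kind == "set":
        # Alternatives: each member's combinations stand on their own.
        return [combo for options in expanded for combo in options]
    if kind == "complex":
        # All members present: cartesian product over each member's options.
        return [sum(parts, ()) for parts in product(*expanded)]
    raise ValueError(f"unknown entity kind: {kind}")

example = ("complex", ["ProteinA", "ProteinB", ("set", ["ATP", "GTP"])])
combinations = decompose(example)
for combo in combinations:
    print(combo)
```

Because the function recurses, nested sets inside complexes (and complexes inside sets) expand the same way.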
A single biological reaction in Reactome may represent multiple transformations after decomposition:
Biological Reaction (Reactome ID: 12345):
Inputs: Complex(A,B), ATP
Outputs: Complex(A,B,P), ADP
After decomposition and best matching:
Virtual Reaction 1 (UID: uuid-1, Reactome ID: 12345):
input_hash: "hash-of-[A,B,ATP]"
output_hash: "hash-of-[A,B,P,ADP]"
Virtual Reaction 2 (UID: uuid-2, Reactome ID: 12345):
input_hash: "hash-of-[A,B,ATP]"
output_hash: "hash-of-[A,P,B,ADP]"
...
Each virtual reaction gets a unique UID (UUID v4) while preserving the link to the original Reactome reaction ID.
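A minimal sketch of how virtual reactions could be recorded, assuming each one is a plain dictionary. The `make_virtual_reactions` helper and its field names are hypothetical and chosen to mirror the example above; the real `create_reaction_id_map()` may differ.

```python
import uuid

def make_virtual_reactions(reactome_id, matched_pairs):
    """Create one record per (input_hash, output_hash) pair."""
    return [
        {
            "uid": str(uuid.uuid4()),    # fresh UUID v4 per virtual reaction
            "reactome_id": reactome_id,  # link back to the biological reaction
            "input_hash": input_hash,
            "output_hash": output_hash,
        }
        for input_hash, output_hash in matched_pairs
    ]

pairs = [
    ("hash-of-[A,B,ATP]", "hash-of-[A,B,P,ADP]"),
    ("hash-of-[A,B,ATP]", "hash-of-[A,P,B,ADP]"),
]
virtual_reactions = make_virtual_reactions(12345, pairs)
```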
CRITICAL: Edges represent transformations WITHIN reactions, not connections BETWEEN reactions.
Reaction: ATP + Water → ADP + Phosphate
Creates 4 edges (cartesian product):
ATP → ADP
ATP → Phosphate
Water → ADP
Water → Phosphate
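The four edges above are the cartesian product of the reaction's inputs and outputs. A short sketch, where `transformation_edges` is a hypothetical helper name rather than a function from the codebase:

```python
from itertools import product

def transformation_edges(inputs, outputs):
    """Pair every input with every output of a single reaction."""
    return list(product(inputs, outputs))

edges = transformation_edges(["ATP", "Water"], ["ADP", "Phosphate"])
for source, target in edges:
    print(f"{source} -> {target}")
```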
Reactions connect implicitly through shared physical entities:
Reaction 1: A → B (creates edge where B is target)
Reaction 2: B → C (creates edge where B is source)
Result: Pathway flow A → B → C (B connects the reactions)
Self-loops are minimized using position-aware UUIDs. When the same entity connects reactions, the union-find algorithm ensures entities in the same connected component share UUIDs, creating intentional self-loops that represent pathway flow, while entities at disconnected positions get different UUIDs.
The system uses position-aware UUIDs to uniquely identify entities at different pathway positions:
Example:
Reaction1 → gene1 → Reaction2
Reaction3 → gene1 → Reaction2
Result: gene1 gets UUID_A (connected component)
But elsewhere:
Reaction100 → gene1 → Reaction101
Result: gene1 gets UUID_B (different position)
Key Properties:
- Entities in same connected component share UUIDs (union-find algorithm)
- Entities at disconnected positions get different UUIDs
- Registry tracks: (entity_dbId, reaction_uuid, role) → entity_uuid
- Results in 0% self-loops in real pathways while maintaining connectivity
See UUID_DESIGN.md for detailed design.
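The idea of sharing a UUID within a connected component can be illustrated with a small union-find structure. This is a sketch under assumptions: the `PositionRegistry` class, its method names, and the dbId `111` are invented for the example and are not taken from `_get_or_create_entity_uuid()` or UUID_DESIGN.md.

```python
import uuid

class PositionRegistry:
    """Union-find over (entity_dbId, reaction_id, role) positions."""

    def __init__(self):
        self._parent = {}
        self._uuid_by_root = {}

    def _find(self, key):
        self._parent.setdefault(key, key)
        while self._parent[key] != key:
            # Path halving keeps later lookups short.
            self._parent[key] = self._parent[self._parent[key]]
            key = self._parent[key]
        return key

    def connect(self, key_a, key_b):
        """Record that two positions of the same entity are adjacent."""
        root_a, root_b = self._find(key_a), self._find(key_b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def entity_uuid(self, key):
        """Return the UUID shared by the key's component.

        In this sketch, call connect() for all positions first.
        """
        root = self._find(key)
        if root not in self._uuid_by_root:
            self._uuid_by_root[root] = str(uuid.uuid4())
        return self._uuid_by_root[root]

registry = PositionRegistry()
# gene1 (hypothetical dbId 111) leaves Reaction1 and Reaction3, enters Reaction2.
registry.connect((111, "Reaction1", "output"), (111, "Reaction2", "input"))
registry.connect((111, "Reaction3", "output"), (111, "Reaction2", "input"))
# Elsewhere in the pathway the same gene sits at a disconnected position.
registry.connect((111, "Reaction100", "output"), (111, "Reaction101", "input"))

uuid_a = registry.entity_uuid((111, "Reaction1", "output"))
uuid_a_again = registry.entity_uuid((111, "Reaction3", "output"))
uuid_b = registry.entity_uuid((111, "Reaction100", "output"))
```

Here `uuid_a` and `uuid_a_again` are the same value because both positions reach Reaction2, while `uuid_b` belongs to a separate component.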
The logic network assigns AND/OR relationships based on how many reactions produce the same physical entity:
OR Relationship (Multiple sources):
R1: Glycolysis → ATP
R2: Oxidative Phosphorylation → ATP
R3: ATP → Energy
For R3: ATP can come from R1 OR R2
Edges: R1→ATP (OR), R2→ATP (OR)
Then: ATP→R3 (AND - ATP is required)
AND Relationship (Single source):
R1: Glucose → Glucose-6-Phosphate
R2: Glucose-6-Phosphate → ...
Only one source produces Glucose-6-Phosphate
Edge: R1→G6P (AND - required)
Rule:
- Multiple preceding reactions → OR (alternatives)
- Single preceding reaction → AND (required)
- All inputs to reactions are AND (required)
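The rule above reduces to counting the distinct reactions that produce an entity. A minimal sketch, where `and_or_for_entity` is a hypothetical helper and not the signature of `_determine_edge_properties()`:

```python
def and_or_for_entity(producing_reactions):
    """Return "or" when several reactions produce the entity, else "and"."""
    return "or" if len(set(producing_reactions)) > 1 else "and"

# Producers per entity, taken from the two examples above.
producers = {
    "ATP": ["R1", "R2"],            # Glycolysis and Oxidative Phosphorylation
    "Glucose-6-Phosphate": ["R1"],  # single source
}
logic = {entity: and_or_for_entity(rs) for entity, rs in producers.items()}
```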
Purpose: Query Reactome Neo4j database
Key Functions:
- get_reaction_connections(): Get preceding/following reaction pairs
- get_catalysts_for_reaction(): Get catalyst relationships
- get_positive/negative_regulators_for_reaction(): Get regulatory relationships
Output: Raw Reactome data as DataFrames
Purpose: Decompose complexes and sets into components
Key Functions:
- get_decomposed_uid_mapping(): Main decomposition orchestrator
- Handles complexes (using itertools.product for combinations)
- Handles entity sets (using itertools.product for alternatives)
- Recursively decomposes nested structures
Output: decomposed_uid_mapping with all molecular combinations
Purpose: Pair input/output combinations optimally
Algorithm: Hungarian algorithm (optimal assignment)
Input: Input combinations and output combinations from same reaction
Output: best_matches DataFrame with optimal pairings
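To show what "optimal pairing" means, the sketch below finds the minimum-cost one-to-one pairing by exhaustive search over permutations. It is a stand-in, not the Hungarian algorithm itself: it returns the same optimum on small inputs but scales factorially. The cost function (number of entities not shared by an input and an output combination) is an assumption for illustration; the source does not state which cost the project uses.

```python
from itertools import permutations

def overlap_cost(inputs, outputs):
    """Assumed cost: number of entities not shared by the two combinations."""
    return len(set(inputs) ^ set(outputs))

def best_pairing(input_combos, output_combos):
    """Exhaustively find the minimum-cost one-to-one pairing."""
    if len(input_combos) != len(output_combos):
        raise ValueError("this sketch only handles equally sized lists")
    best = min(
        permutations(range(len(output_combos))),
        key=lambda perm: sum(
            overlap_cost(input_combos[i], output_combos[j])
            for i, j in enumerate(perm)
        ),
    )
    return [(input_combos[i], output_combos[j]) for i, j in enumerate(best)]

inputs = [("A", "ATP"), ("B", "GTP")]
outputs = [("B", "GDP"), ("A", "ADP")]
matches = best_pairing(inputs, outputs)
```

For larger inputs, a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` solves the same assignment problem on a cost matrix.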
Purpose: Generate the final logic network with position-aware UUIDs
Key Functions:
- create_pathway_logic_network(): Main orchestrator
- _get_or_create_entity_uuid(): Union-find UUID assignment
- _assign_uuids(): Position-aware UUID generation
- create_reaction_id_map(): Create virtual reactions from best_matches
- extract_inputs_and_outputs(): Create transformation edges
- _determine_edge_properties(): Assign AND/OR logic
- _add_pathway_connections(): Add edges with cartesian product
- append_regulators(): Add catalyst/regulator edges
- export_uuid_to_reactome_mapping(): Export UUID→dbId mapping
Output:
- Logic network DataFrame with edges and logic annotations
- UUID to Reactome ID mapping for entity tracking
Purpose: Command-line interface for generating pathways
Usage:
# Single pathway
poetry run python bin/create-pathways.py --pathway-id 69620
# Multiple pathways
poetry run python bin/create-pathways.py --pathway-list pathways.tsv
Purpose: Create human-readable mapping of database IDs to names
- Root Inputs: Physical entities that only appear as sources (pathway starting points)
- Intermediate Entities: Appear as both sources and targets (connect reactions)
- Terminal Outputs: Physical entities that only appear as targets (pathway endpoints)
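These three roles follow directly from set operations on the edge list. A small sketch, assuming edges are `(source_id, target_id)` tuples; `classify_nodes` is a hypothetical helper:

```python
def classify_nodes(edges):
    """Split node IDs into roots, intermediates, and terminals."""
    sources = {source for source, _ in edges}
    targets = {target for _, target in edges}
    return {
        "root_inputs": sources - targets,       # only ever a source
        "intermediates": sources & targets,     # both source and target
        "terminal_outputs": targets - sources,  # only ever a target
    }

roles = classify_nodes([("A", "B"), ("B", "C")])
```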
- Main edges: Transformation edges within reactions
  - edge_type: "input" (single source, AND) or "output" (multiple sources, OR)
  - pos_neg: "pos" (positive transformation)
  - and_or: "and" (required) or "or" (alternative)
- Regulatory edges: Catalysts and regulators
  - edge_type: "catalyst" or "regulator"
  - pos_neg: "pos" (positive regulation) or "neg" (negative regulation)
  - and_or: Empty (not applicable to regulation)
- Directed: Edges have direction (source → target)
- Acyclic: No cycles in main transformation edges (within individual reactions)
- Bipartite-like: Entities and reactions connect through transformations
- Minimal self-loops: Position-aware UUIDs minimize self-loops while preserving pathway connectivity
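The self-loop property can be checked on any edge list with a few lines. This sketch assumes edges are `(source_id, target_id)` tuples of entity UUIDs; `self_loop_fraction` is a hypothetical helper, not part of the test suite.

```python
def self_loop_fraction(edges):
    """Fraction of edges whose source and target IDs are identical."""
    if not edges:
        return 0.0
    loops = sum(1 for source, target in edges if source == target)
    return loops / len(edges)

clean = [("uuid-a", "uuid-b"), ("uuid-b", "uuid-c")]
with_loop = [("uuid-a", "uuid-b"), ("uuid-b", "uuid-b")]
```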
- Unit Tests (tests/test_logic_network_generator.py)
  - Individual helper functions
  - Position-aware UUID assignment with union-find
  - Edge property determination
- Integration Tests (tests/test_edge_direction_integration.py)
  - Multi-reaction pathways
  - End-to-end data flow
- Semantic Tests (tests/test_transformation_semantics.py)
  - Cartesian product correctness
  - Edge direction validation
  - Transformation logic
- Invariant Tests (tests/test_network_invariants.py)
  - No self-loops
  - Root inputs only as sources
  - Terminal outputs only as targets
  - AND/OR logic consistency
- Logic Tests (tests/test_and_or_logic.py)
  - Multiple sources → OR
  - Single source → AND
  - User requirement validation
- Validation Tests (tests/test_input_validation.py)
  - Empty DataFrame handling
  - Missing column detection
  - Error message clarity
- 73+ tests total (100% passing for core unit tests)
- Covers position-aware UUIDs, core functionality, edge semantics, network properties, and comprehensive validation
- Run tests with:
poetry run pytest tests/ -v
- Problem: A biological reaction may have multiple input/output combinations after decomposition
- Solution: Create multiple "virtual reactions" representing each combination
- Benefit: Clean mapping from combinations to transformations
- Problem: How to represent transformation within a reaction with multiple inputs/outputs?
- Solution: Every input connects to every output (cartesian product)
- Rationale: Biochemically accurate - all reactants contribute to all products
- Problem: How do reactions connect in the network?
- Solution: Through shared physical entities (molecule appears as target in R1, source in R2)
- Benefit: Natural representation - pathways flow through molecules, not abstract connections
- User Requirement: Multiple sources should be OR, inputs to reactions should be AND
- Implementation: Count preceding reactions - if >1 then OR, otherwise AND
- Rationale: Matches biological intuition (alternatives vs requirements)
- Files are cached: reaction_connections_{id}.csv, decomposed_uid_mapping_{id}.csv, best_matches_{id}.csv
- Subsequent runs reuse cached data
- Position-aware UUIDs tracked in entity_uuid_registry (regenerated each run for consistency)
- UUID→dbId mappings exported to uuid_to_reactome_{id}.csv
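The caching behaviour described above follows a common "load if present, otherwise compute and save" pattern. The sketch below uses the standard-library `csv` module for self-containment; the project itself works with DataFrames, and `load_or_compute` and `fake_query` are hypothetical names. Note that values read back from CSV are strings.

```python
import csv
import tempfile
from pathlib import Path

def load_or_compute(cache_path, fieldnames, compute_rows):
    """Reuse rows from cache_path if it exists, else compute and store them."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        with cache_path.open(newline="") as handle:
            return [dict(row) for row in csv.DictReader(handle)]
    rows = compute_rows()
    with cache_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows

calls = []

def fake_query():
    """Stand-in for an expensive Neo4j query."""
    calls.append(1)
    return [{"preceding": "R1", "following": "R2"}]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "reaction_connections_69620.csv"
    first = load_or_compute(path, ["preceding", "following"], fake_query)
    second = load_or_compute(path, ["preceding", "following"], fake_query)
```

The second call reads the cached file, so the stand-in query runs only once.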
- Decomposition uses itertools.product (efficient for combinatorics)
- Hungarian algorithm is O(n³) but pathways are typically small (<1000 reactions)
- Pandas operations are vectorized where possible
- Small pathway (10-20 reactions): <1 second
- Medium pathway (100-200 reactions): 1-5 seconds
- Large pathway (500+ reactions): 5-30 seconds
- Main README: ../README.md - Quick start guide and features
- Position-Aware UUIDs: UUID_DESIGN.md - Why and how UUIDs are assigned per pathway position
- Design Decisions: DESIGN_DECISIONS.md - Intentional behaviors that look surprising
- Examples: ../examples/README.md - Usage patterns and troubleshooting
- Reactome Database: https://reactome.org/