Deterministic prime factorisation for Python using Miller-Rabin primality testing and a multi-stage factorisation pipeline supporting Trial Division, Pollard's p-1, Pollard's Rho (Brent), ECM, Quadratic Sieve, and a GNFS adapter for very large inputs.
factorise provides:
- A library API for prime decomposition of signed integers.
- Deterministic primality checks for all
n < 2^64. - Bounded retry/iteration controls to avoid unbounded runtime.
- A configurable multi-stage pipeline that escalates from fast to powerful algorithms.
- A CLI for operational use and debugging.
- A hybrid engine that adaptively routes each cofactor to the optimal algorithm.
Multi-stage factorisation pipeline order:
validate_int
-> is_prime (Miller-Rabin)
-> Pipeline (stages in order)
1. Trial Division (small primes, O(π(b)))
2. Pollard p-1 (smooth factors, p-1 method)
3. Pollard's Rho (Brent variant, general-purpose)
4. ECM (Elliptic Curve Method, medium factors)
5. Quadratic Sieve (medium-to-large, up to ~100 digits)
6. GNFS (external adapter, very large inputs)
-> factorise()
Key capabilities:
factorise(n, config=None)returns a structuredFactorisationResult.is_prime(n)is available as a standalone function.- Optional reproducibility via
seed/FACTORISE_SEED. - CLI with validated log levels and ANSI output.
HybridFactorisationEnginefor adaptive algorithm selection by input size.
# Install
pip install factorise
# Using the library (requires Python 3.10+)
python -c "from factorise import factorise; print(factorise(123456789).expression())"
# Using the CLI
factorise 123456789 --verbosefactorise/
__init__.py Public exports and version.
core.py Algorithms, validation, domain exceptions.
config.py Configuration dataclasses with validation.
pipeline.py Multi-stage pipeline, FactorStage interface.
hybrid.py Adaptive hybrid engine with threshold routing.
cli.py CLI command, display, logging, signal handling.
utils.py Shared utilities (prime sieve).
py.typed PEP 561 marker.
stages/
__init__.py Package initialiser.
trial_division.py Trial Division stage (wheel-optimised).
improved_pm1.py Pollard p-1 stage (progressive bounds).
pollard_rho.py Pollard-Rho (Brent) stage.
ecm.py Elliptic Curve Method stage.
ecm_two_pass.py Two-pass ECM stage.
ecm_shared.py Shared elliptic-curve arithmetic.
quadratic_sieve.py Quadratic Sieve stage.
siqs.py Self-Initializing Quadratic Sieve stage.
qs_shared.py Shared QS utilities (factor base, Gaussian elimination).
gnfs_optimized.py Pure-Python GNFS stage for 60-128 bit inputs.
tests/
conftest.py Shared test constants.
test_core_config.py Config, model, and validation behavior.
test_core_primality.py Primality behavior and edge cases.
test_core_factorisation.py Factorisation semantics and determinism.
test_core_pollard.py Pollard-Brent and flattening behavior.
test_core_edge_cases.py Internal algorithm edge paths.
test_pipeline.py Multi-stage pipeline tests.
test_cli.py CLI user behavior.
test_cli_errors.py CLI error and logging-mode behavior.
benchmarks/
timing.py Timing benchmarks.
memory.py Allocation/memory benchmarks.
stress.py Process-based stress checks and CI gate.
inputs.py Shared benchmark datasets.
.github/workflows/ci.yml
Lint, typecheck, tests, stress gate, security audit, build/release checks.
The pipeline (FactorisationPipeline) coordinates multiple factorisation algorithms,
each as a FactorStage. Stages are tried in order until one finds a factor;
the pipeline then recurses on both the factor and co-factor until everything
is prime.
Finds small prime factors by testing divisibility against a fixed list of small primes (up to 7919 by default). Very fast for numbers with small factors.
Finds factors p where p-1 is smooth (has only small prime factors). Uses
progressive smoothness bounds and multiple bases. Effective as an intermediate
stage between trial division and Pollard's Rho.
General-purpose factorisation using Brent's improvement to Pollard's Rho algorithm. Batches GCD computations for throughput. The primary workhorse for small-to-medium composites.
Modern general-purpose factorisation using elliptic curve operations. Most effective for finding factors in the 10-40 digit range. Runs a configurable number of curves (each with a different random curve).
Fast for medium-to-large inputs up to ~80 bits. This simplified implementation is suitable for educational purposes and medium-sized inputs.
Practical choice for 60-110 digit composites in pure Python. Auto-computes the smoothness bound and factor base.
Full in-repo adapter wrapping external GNFS tools (msieve, CADO-NFS) with strict isolation: input validation, timeout handling, output parsing, and failure isolation. Silently skipped if the binary is not on PATH. For very large inputs (~80+ bits).
PipelineConfig controls the pipeline:
from factorise import PipelineConfig, FactorisationPipeline
config = PipelineConfig(
trial_division_bound=10_000,
pm1_bound=10**6,
ecm_curves=20,
gnfs_timeout=600,
gnfs_binary="msieve",
max_iterations=10_000_000,
max_retries=20,
batch_size=128,
seed=None,
stage_order=(
"trial_division",
"pollard_pminus1",
"pollard_rho",
"ecm",
"quadratic_sieve",
"gnfs",
),
)
pipeline = FactorisationPipeline(config)HybridConfig controls the adaptive engine:
from factorise import HybridConfig, HybridFactorisationEngine
config = HybridConfig()
engine = HybridFactorisationEngine(config)
result = engine.attempt(123456789)Environment variables (FACTORISE_*):
FACTORISE_TRIAL_DIVISION_BOUND— trial division prime ceiling.FACTORISE_PM1_BOUND— Pollard p-1 smoothness bound.FACTORISE_ECM_CURVES— number of ECM curves to try.FACTORISE_GNFS_TIMEOUT— GNFS subprocess timeout in seconds.FACTORISE_GNFS_BINARY— GNFS binary name/path.FACTORISE_BATCH_SIZE,FACTORISE_MAX_ITERATIONS,FACTORISE_MAX_RETRIES— Pollard-Brent parameters.FACTORISE_SEED— deterministic seed.
from factorise import factorise, is_prime, FactorisationResult
from factorise import FactoriserConfig
from factorise import FactorisationPipeline, PipelineConfig
from factorise import StageResult, StageStatus, FactorStage
from factorise import HybridConfig, HybridFactorisationEngine, hybrid_factorise
from factorise import PerfectPowerResult, find_perfect_power, has_carmichael_property
from factorise import ensure_integer_input
# Basic usage
result = factorise(123456789)
print(result.factors) # [3, 3607, 3803]
print(result.powers) # {3: 2, 3607: 1, 3803: 1}
print(result.expression()) # '3^2 * 3607 * 3803'
# Direct pipeline usage
pipeline = FactorisationPipeline()
stage_result = pipeline.attempt(123456789)
print(stage_result.status) # StageStatus.SUCCESS
print(stage_result.factor) # 3
# Primality testing
is_prime(97) # True
# Perfect-power detection
pp = find_perfect_power(125)
print(pp.base, pp.exponent) # 5 3
# Hybrid engine (adaptive algorithm selection)
engine = HybridFactorisationEngine(HybridConfig())
result = engine.attempt(123456789)factorise 123456789
factorise 123456789 --verbose
factorise 123456789 --log-level INFONote: The CLI does not accept negative integers directly because the leading
- is interpreted as an option flag. Use the library API (factorise(-n)) for
negative inputs.
Primary test commands:
pytest tests/ -v
pytest --cov=factorise --cov-fail-under=90 tests/
pytest -v benchmarks/stress.py::test_stress_correctness
pytest -q tests/test_core_primality.py
pytest -q tests/test_cli_errors.py
pytest -q tests/test_pipeline.pyTest strategy:
- Unit tests: correctness, edge cases, API behavior.
- Pipeline tests: multi-stage integration, stage ordering, fallback behavior.
- Integration tests: CLI paths and error handling.
- Property tests: invariants across broad integer ranges.
- Stress tests: deterministic correctness at scale.
- Test modules are organized by domain to reduce navigation friction.
just lint
just format
just type-checkOr directly:
ruff check factorise/ tests/ benchmarks/
ruff format factorise/ tests/ benchmarks/
mypy factorise/ tests/ benchmarks/Pre-commit:
pre-commit install
pre-commit run --all-filesCI enforces these checks and publishes JUnit artifacts for test jobs.
Documentation:
- Algorithm notes are maintained as curated narrative docs under
docs/.
Design choices:
- Immutable config (
FactoriserConfig,PipelineConfig,HybridConfig) for reproducible and testable behavior. - Explicit domain failure (
FactorisationError) for exhausted compute budgets. FactorStageabstract interface for composable, replaceable stages.StageResultstructured output for observability and debugging.- Explicit configuration-driven stage construction (no global registry).
- Generator-based recursive splitting to keep memory bounded.
Scalability and safety:
max_iterationsandmax_retriesconstrain worst-case runtime.- Trial division and perfect-square fast paths reduce heavy-path pressure.
- GNFS adapter with strict input validation, timeout, and output parsing.
- No unsafe shell execution; external tool inputs are sanitized.
Logging model:
- Library logging uses the standard-library
loggingmodule under the"factorise"logger name. - CLI enables logging and validates allowed levels (
DEBUG,INFO,WARNING,ERROR). - All logs use a structured
key=valueformat for machine-parseable observability. - Stage-level logging includes:
stage,n,factor,status,reason,elapsed_ms.
Stage result statuses:
SUCCESS: stage found a non-trivial factor.FAILURE: stage ran but could not find a factor.SKIPPED: stage was not applicable (input too small/large, binary not found, etc.).
Troubleshooting:
TypeErroron input: ensure plainintvalues (nobool,float,str).FactorisationError: increase retry/iteration budget or set deterministic seed.- GNFS always skipped: ensure
msieveorcado-nfsis on PATH. - Wrong factors: enable DEBUG logging to see which stage found each factor.
- Add full in-repo GNFS implementation.
- Add parallel ECM curve execution.
- Add generated API reference docs if external integration demand increases.
- Add periodic benchmark trend checks in CI for regression detection.
- Work in-place; avoid parallel implementations.
- Keep imports, naming, and docstrings consistent with Google-style conventions.
- Add tests for behavior changes, especially error paths and edge cases.
- Run
just ci(orjust ci-full) before opening a PR. - Keep comments high-value: explain intent, assumptions, and tradeoffs.
- Release flow on version tags uses trusted publishing (OIDC) and produces:
- package artifacts (sdist/wheel)
dist/SHA256SUMSintegrity filedist/sbom.cdx.jsonCycloneDX SBOM artifact
See CONTRIBUTING.md and SECURITY.md for policy details.