factorise

Deterministic prime factorisation for Python using Miller-Rabin primality testing and a multi-stage factorisation pipeline supporting Trial Division, Pollard's p-1, Pollard's Rho (Brent), ECM, Quadratic Sieve, and a GNFS adapter for very large inputs.

1. Project Overview

factorise provides:

A library API for prime decomposition of signed integers.
Deterministic primality checks for all n < 2^64.
Bounded retry/iteration controls to avoid unbounded runtime.
A configurable multi-stage pipeline that escalates from fast to powerful algorithms.
A CLI for operational use and debugging.
A hybrid engine that adaptively routes each cofactor to the optimal algorithm.

Multi-stage factorisation pipeline order:

validate_int
  -> is_prime (Miller-Rabin)
  -> Pipeline (stages in order)
       1. Trial Division  (small primes, O(π(b)))
       2. Pollard p-1   (smooth factors, p-1 method)
       3. Pollard's Rho (Brent variant, general-purpose)
       4. ECM           (Elliptic Curve Method, medium factors)
       5. Quadratic Sieve (medium-to-large, up to ~100 digits)
       6. GNFS          (external adapter, very large inputs)
  -> factorise()

Key capabilities:

factorise(n, config=None) returns a structured FactorisationResult.
is_prime(n) is available as a standalone function.
Optional reproducibility via seed / FACTORISE_SEED.
CLI with validated log levels and ANSI output.
HybridFactorisationEngine for adaptive algorithm selection by input size.

2. Quick Start

# Install
pip install factorise

# Using the library (requires Python 3.10+)
python -c "from factorise import factorise; print(factorise(123456789).expression())"

# Using the CLI
factorise 123456789 --verbose

3. Repository Structure

factorise/
  __init__.py     Public exports and version.
  core.py         Algorithms, validation, domain exceptions.
  config.py       Configuration dataclasses with validation.
  pipeline.py     Multi-stage pipeline, FactorStage interface.
  hybrid.py       Adaptive hybrid engine with threshold routing.
  cli.py          CLI command, display, logging, signal handling.
  utils.py        Shared utilities (prime sieve).
  py.typed        PEP 561 marker.
  stages/
    __init__.py         Package initialiser.
    trial_division.py   Trial Division stage (wheel-optimised).
    improved_pm1.py     Pollard p-1 stage (progressive bounds).
    pollard_rho.py      Pollard-Rho (Brent) stage.
    ecm.py              Elliptic Curve Method stage.
    ecm_two_pass.py     Two-pass ECM stage.
    ecm_shared.py       Shared elliptic-curve arithmetic.
    quadratic_sieve.py  Quadratic Sieve stage.
    siqs.py             Self-Initializing Quadratic Sieve stage.
    qs_shared.py        Shared QS utilities (factor base, Gaussian elimination).
    gnfs_optimized.py   Pure-Python GNFS stage for 60-128 bit inputs.

tests/
  conftest.py                Shared test constants.
  test_core_config.py        Config, model, and validation behavior.
  test_core_primality.py     Primality behavior and edge cases.
  test_core_factorisation.py Factorisation semantics and determinism.
  test_core_pollard.py       Pollard-Brent and flattening behavior.
  test_core_edge_cases.py    Internal algorithm edge paths.
  test_pipeline.py           Multi-stage pipeline tests.
  test_cli.py                CLI user behavior.
  test_cli_errors.py         CLI error and logging-mode behavior.

benchmarks/
  timing.py       Timing benchmarks.
  memory.py         Allocation/memory benchmarks.
  stress.py         Process-based stress checks and CI gate.
  inputs.py         Shared benchmark datasets.

.github/workflows/ci.yml
  Lint, typecheck, tests, stress gate, security audit, build/release checks.

4. Multi-Stage Pipeline

The pipeline (FactorisationPipeline) coordinates multiple factorisation algorithms, each as a FactorStage. Stages are tried in order until one finds a factor; the pipeline then recurses on both the factor and co-factor until everything is prime.

Stage: Trial Division

Finds small prime factors by testing divisibility against a fixed list of small primes (up to 7919 by default). Very fast for numbers with small factors.

Stage: Pollard p-1

Finds factors p where p-1 is smooth (has only small prime factors). Uses progressive smoothness bounds and multiple bases. Effective as an intermediate stage between trial division and Pollard's Rho.

Stage: Pollard's Rho (Brent)

General-purpose factorisation using Brent's improvement to Pollard's Rho algorithm. Batches GCD computations for throughput. The primary workhorse for small-to-medium composites.

Stage: ECM (Elliptic Curve Method)

Modern general-purpose factorisation using elliptic curve operations. Most effective for finding factors in the 10-40 digit range. Runs a configurable number of curves (each with a different random curve).

Stage: Quadratic Sieve

Fast for medium-to-large inputs up to ~80 bits. This simplified implementation is suitable for educational purposes and medium-sized inputs.

Stage: SIQS (Self-Initializing Quadratic Sieve)

Practical choice for 60-110 digit composites in pure Python. Auto-computes the smoothness bound and factor base.

Stage: GNFS (General Number Field Sieve)

Full in-repo adapter wrapping external GNFS tools (msieve, CADO-NFS) with strict isolation: input validation, timeout handling, output parsing, and failure isolation. Silently skipped if the binary is not on PATH. For very large inputs (~80+ bits).

5. Configuration

PipelineConfig controls the pipeline:

from factorise import PipelineConfig, FactorisationPipeline

config = PipelineConfig(
    trial_division_bound=10_000,
    pm1_bound=10**6,
    ecm_curves=20,
    gnfs_timeout=600,
    gnfs_binary="msieve",
    max_iterations=10_000_000,
    max_retries=20,
    batch_size=128,
    seed=None,
    stage_order=(
        "trial_division",
        "pollard_pminus1",
        "pollard_rho",
        "ecm",
        "quadratic_sieve",
        "gnfs",
    ),
)
pipeline = FactorisationPipeline(config)

HybridConfig controls the adaptive engine:

from factorise import HybridConfig, HybridFactorisationEngine

config = HybridConfig()
engine = HybridFactorisationEngine(config)
result = engine.attempt(123456789)

Environment variables (FACTORISE_*):

FACTORISE_TRIAL_DIVISION_BOUND — trial division prime ceiling.
FACTORISE_PM1_BOUND — Pollard p-1 smoothness bound.
FACTORISE_ECM_CURVES — number of ECM curves to try.
FACTORISE_GNFS_TIMEOUT — GNFS subprocess timeout in seconds.
FACTORISE_GNFS_BINARY — GNFS binary name/path.
FACTORISE_BATCH_SIZE, FACTORISE_MAX_ITERATIONS, FACTORISE_MAX_RETRIES — Pollard-Brent parameters.
FACTORISE_SEED — deterministic seed.

6. API Reference

Library

from factorise import factorise, is_prime, FactorisationResult
from factorise import FactoriserConfig
from factorise import FactorisationPipeline, PipelineConfig
from factorise import StageResult, StageStatus, FactorStage
from factorise import HybridConfig, HybridFactorisationEngine, hybrid_factorise
from factorise import PerfectPowerResult, find_perfect_power, has_carmichael_property
from factorise import ensure_integer_input

# Basic usage
result = factorise(123456789)
print(result.factors)       # [3, 3607, 3803]
print(result.powers)        # {3: 2, 3607: 1, 3803: 1}
print(result.expression())  # '3^2 * 3607 * 3803'

# Direct pipeline usage
pipeline = FactorisationPipeline()
stage_result = pipeline.attempt(123456789)
print(stage_result.status)  # StageStatus.SUCCESS
print(stage_result.factor)  # 3

# Primality testing
is_prime(97)  # True

# Perfect-power detection
pp = find_perfect_power(125)
print(pp.base, pp.exponent)  # 5 3

# Hybrid engine (adaptive algorithm selection)
engine = HybridFactorisationEngine(HybridConfig())
result = engine.attempt(123456789)

CLI

factorise 123456789
factorise 123456789 --verbose
factorise 123456789 --log-level INFO

Note: The CLI does not accept negative integers directly because the leading - is interpreted as an option flag. Use the library API (factorise(-n)) for negative inputs.

7. Testing

Primary test commands:

pytest tests/ -v
pytest --cov=factorise --cov-fail-under=90 tests/
pytest -v benchmarks/stress.py::test_stress_correctness
pytest -q tests/test_core_primality.py
pytest -q tests/test_cli_errors.py
pytest -q tests/test_pipeline.py

Test strategy:

Unit tests: correctness, edge cases, API behavior.
Pipeline tests: multi-stage integration, stage ordering, fallback behavior.
Integration tests: CLI paths and error handling.
Property tests: invariants across broad integer ranges.
Stress tests: deterministic correctness at scale.
Test modules are organized by domain to reduce navigation friction.

8. Linting / Formatting / Type Checking

just lint
just format
just type-check

Or directly:

ruff check factorise/ tests/ benchmarks/
ruff format factorise/ tests/ benchmarks/
mypy factorise/ tests/ benchmarks/

Pre-commit:

pre-commit install
pre-commit run --all-files

CI enforces these checks and publishes JUnit artifacts for test jobs.

Documentation:

Algorithm notes are maintained as curated narrative docs under docs/.

9. Architecture Notes

Design choices:

Immutable config (FactoriserConfig, PipelineConfig, HybridConfig) for reproducible and testable behavior.
Explicit domain failure (FactorisationError) for exhausted compute budgets.
FactorStage abstract interface for composable, replaceable stages.
StageResult structured output for observability and debugging.
Explicit configuration-driven stage construction (no global registry).
Generator-based recursive splitting to keep memory bounded.

Scalability and safety:

max_iterations and max_retries constrain worst-case runtime.
Trial division and perfect-square fast paths reduce heavy-path pressure.
GNFS adapter with strict input validation, timeout, and output parsing.
No unsafe shell execution; external tool inputs are sanitized.

10. Logging / Troubleshooting

Logging model:

Library logging uses the standard-library logging module under the "factorise" logger name.
CLI enables logging and validates allowed levels (DEBUG, INFO, WARNING, ERROR).
All logs use a structured key=value format for machine-parseable observability.
Stage-level logging includes: stage, n, factor, status, reason, elapsed_ms.

Stage result statuses:

SUCCESS: stage found a non-trivial factor.
FAILURE: stage ran but could not find a factor.
SKIPPED: stage was not applicable (input too small/large, binary not found, etc.).

Troubleshooting:

TypeError on input: ensure plain int values (no bool, float, str).
FactorisationError: increase retry/iteration budget or set deterministic seed.
GNFS always skipped: ensure msieve or cado-nfs is on PATH.
Wrong factors: enable DEBUG logging to see which stage found each factor.

11. Future Enhancements

Add full in-repo GNFS implementation.
Add parallel ECM curve execution.
Add generated API reference docs if external integration demand increases.
Add periodic benchmark trend checks in CI for regression detection.

12. Contribution Standards

Work in-place; avoid parallel implementations.
Keep imports, naming, and docstrings consistent with Google-style conventions.
Add tests for behavior changes, especially error paths and edge cases.
Run just ci (or just ci-full) before opening a PR.
Keep comments high-value: explain intent, assumptions, and tradeoffs.
Release flow on version tags uses trusted publishing (OIDC) and produces:
- package artifacts (sdist/wheel)
- dist/SHA256SUMS integrity file
- dist/sbom.cdx.json CycloneDX SBOM artifact

See CONTRIBUTING.md and SECURITY.md for policy details.

Name		Name	Last commit message	Last commit date
Latest commit History 39 Commits
.github		.github
benchmarks		benchmarks
docs		docs
factorise		factorise
tests		tests
.editorconfig		.editorconfig
.env.example		.env.example
.gitattributes		.gitattributes
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
SECURITY.md		SECURITY.md
cleanup.sh		cleanup.sh
justfile		justfile
pyproject.toml		pyproject.toml
setup.sh		setup.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

factorise

1. Project Overview

2. Quick Start

3. Repository Structure

4. Multi-Stage Pipeline

Stage: Trial Division

Stage: Pollard p-1

Stage: Pollard's Rho (Brent)

Stage: ECM (Elliptic Curve Method)

Stage: Quadratic Sieve

Stage: SIQS (Self-Initializing Quadratic Sieve)

Stage: GNFS (General Number Field Sieve)

5. Configuration

6. API Reference

Library

CLI

7. Testing

8. Linting / Formatting / Type Checking

9. Architecture Notes

10. Logging / Troubleshooting

11. Future Enhancements

12. Contribution Standards

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

factorise

1. Project Overview

2. Quick Start

3. Repository Structure

4. Multi-Stage Pipeline

Stage: Trial Division

Stage: Pollard p-1

Stage: Pollard's Rho (Brent)

Stage: ECM (Elliptic Curve Method)

Stage: Quadratic Sieve

Stage: SIQS (Self-Initializing Quadratic Sieve)

Stage: GNFS (General Number Field Sieve)

5. Configuration

6. API Reference

Library

CLI

7. Testing

8. Linting / Formatting / Type Checking

9. Architecture Notes

10. Logging / Troubleshooting

11. Future Enhancements

12. Contribution Standards

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages