pdb_dpi — Diffraction Precision Index Calculator

A Python script implementing the Diffraction Precision Index (DPI) for macromolecular crystal structures, following Cruickshank (1999) with the per-atom extension of Gurusaran et al. (2014).

Background

The DPI gives an estimate of the coordinate precision of a refined crystal structure without running a full covariance matrix inversion. Two formulae are provided:

R-based (Cruickshank 1999, Eq. 24)

σ(x, B_avg) = √(Nᵢ / p)     × C^(−1/3) × R      × d_min

R_free-based (Cruickshank 1999, Eq. 31)

σ(x, B_avg) = √(Nᵢ / n_obs) × C^(−1/3) × R_free × d_min

Isotropic position error:

σ(r, B_avg) = √3 × σ(x, B_avg)

Per-atom precision (Helliwell 2023, Eq. 2)

σ(x, Bᵢ) = σ(x, B_avg) × (Z_avg / Zᵢ) × √(Bᵢ / B_avg)

This gives physically reasonable values (e.g. √(120/60) = 1.41× for a high-B atom), unlike the exponential form which can produce absurdly large corrections.

Symbol	Meaning
Nᵢ	Number of non-H atoms in refinement
p	N_obs − N_params (degrees of freedom)
n_obs	Total observed reflections used in refinement
C	Data completeness (0–1)
R	R_work (conventional R-factor)
R_free	Free R-factor
d_min	Maximum resolution (Å)
B_avg	Average B-factor of working atoms
Bᵢ	B-factor of atom i
Zᵢ	Atomic number of atom i

Important: The R_free formula uses n_obs (total observed reflections) in the denominator, not N_free (the free-set size). This is explicit in Cruickshank (1999) Table 3, where the column heading for the R_free rows reads "(Nᵢ/n_obs)^½".

Installation

No external dependencies are required — the project uses only the Python standard library.

Clone and run:

git clone https://github.com/LifeHasOrder/pdb_dpi.git
cd pdb_dpi
python dpi_calculator.py --help

To run the formula validation tests (no network required):

pip install pytest
pytest test_dpi_calculator.py -v

To also run the end-to-end PDB download tests (requires internet access):

pytest --run-network test_dpi_calculator.py -v

Usage

Download from RCSB and compute DPI

python dpi_calculator.py --pdb-id 1NLS

Use a local PDB or mmCIF file

python dpi_calculator.py --file my_structure.pdb
python dpi_calculator.py --file my_structure.cif

Supply refinement statistics from a Phenix log

python dpi_calculator.py --file my_structure.pdb --phenix-log refine.log

Override parsed parameters manually

python dpi_calculator.py --file my_structure.pdb \
    --resolution 1.5 --r-work 0.18 --r-free 0.22 \
    --n-obs 30000 --completeness 0.97

Select which formula to use

# R-based only
python dpi_calculator.py --pdb-id 1NLS --method R

# R_free-based only
python dpi_calculator.py --pdb-id 1NLS --method Rfree

Save per-atom CSV and annotated PDB

python dpi_calculator.py --pdb-id 1NLS --out-dir results/
# Writes: results/1NLS_dpi_r.csv, results/1NLS_dpi_rfree.csv
#         results/1NLS_dpi_r_annotated.pdb, …

Full option list

python dpi_calculator.py --help

Outputs

File	Contents
`<prefix>_dpi_r.csv`	Per-atom σ(x) and σ(r) using R-based formula
`<prefix>_dpi_rfree.csv`	Per-atom σ(x) and σ(r) using R_free-based formula
`<prefix>_dpi_r_annotated.pdb`	Input PDB with B-column replaced by σ(x)
`<prefix>_dpi_rfree_annotated.pdb`	Same with R_free-based values

Validation Against Cruickshank (1999) Table 3

PDB ID Mapping

The following table maps Cruickshank (1999) Table 3 proteins to their confirmed RCSB PDB IDs:

Protein	PDB ID	Reference	End-to-end status
Concanavalin A	1NLS	Deacon et al. (1997)	✅ Should match
HEW lysozyme (ground)	193L	Vaney et al. (1996)	✅ Should match
HEW lysozyme (space)	194L	Vaney et al. (1996)	✅ Should match
γB-crystallin	1GCS	Tickle et al. (1998a)	✅ Should match
βB2-crystallin	2BB2	Tickle et al. (1998a)	✅ Should match
β-purothionin	1BHP	Stec, Rao et al. (1995)	✅ Should match
α₁-purothionin	2PLH	Rao et al. (1995)	✅ Should match
EM lysozyme	1JUG	Guss et al. (1997)	✅ Should match
Azurin II	1ARN	Dodd et al. (1995)	⚠️ Known mismatch (see below)
Ribonuclease A with RI	1DFJ	Kobe & Deisenhofer (1995)	✅ Should match
Fab HyHEL-5 with HEWL	3HFL	Cohen et al. (1996)	⚠️ OBSOLETE on RCSB (cannot download)
Immunoglobulin (Table 1)	1BWW	Usón et al. (1999)	⚠️ Table 1 only; known mismatch

Known Mismatches

Azurin II (1ARN): Cruickshank used values from Dodd et al. (1995) (R=0.188, R_free=0.207, n_obs=12162). The PDB entry was deposited in 2000, after the Cruickshank paper was published, and may reflect a re-refinement. End-to-end tests for 1ARN are marked xfail.

Fab HyHEL-5 with HEWL (3HFL): This PDB entry is OBSOLETE on RCSB and has been superseded by a higher-resolution structure. The entry cannot be downloaded from RCSB. The formula-only test (see below) still validates the calculation using the Table 3 intermediate values directly.

Fab HyHEL-5 R_free DPI discrepancy: Table 3 prints 0.69 Å for the R_free DPI of Fab HyHEL-5, which appears to be a transcription error (the same value as the RNase A row immediately above it). Computing from the tabulated intermediate values ((Nᵢ/n_obs)^½ = 0.607, C^(−1/3) = 1.111, R_free = 0.288, d_min = 2.65 Å) gives ~0.89 Å. The formula test uses 0.89 Å and the discrepancy is documented.

Immunoglobulin (1BWW): The Usón et al. manuscript was "in preparation" when Cruickshank published. This structure appears only in Table 1 (not Table 3). The deposited model has better R-factors than Cruickshank's paper values. End-to-end tests for 1BWW are marked xfail.

Formula-Only Validation Table (Tier 1)

The following table reproduces the published DPI values using only the intermediate factors listed in the paper. All values computed by this script are within 1 % of the values obtained from the tabulated intermediates. Where the printed final DPI in Table 3 differs by more than 1 % from the computed value (due to rounding of the intermediate factors to 3 decimal places), the computed value is used in the formula tests.

Protein	Nᵢ	n_obs	(Nᵢ/p)^½	C^(−1/3)	R	d_min (Å)	σ(r,R) printed	σ(r,R) calc
Concanavalin A	2130	116712	0.148	1.099	0.128	0.94	0.034	0.034
HEW lysozyme (ground)	1145	24111	0.242	1.048	0.184	1.33	0.11	0.1075 ¹
HEW lysozyme (space)	1141	21542	0.259	1.040	0.183	1.40	0.12	0.120
γB-crystallin	1708	26151	0.297	1.032	0.180	1.49	0.14	0.1424 ¹
βB2-crystallin	1558	18583	0.356	1.032	0.184	2.10	0.25	0.2459 ¹
β-purothionin	439	4966	0.370	1.050	0.198	1.70	0.22	0.2265 ¹
EM lysozyme	1068	8308	0.514	1.040	0.169	1.90	0.30	0.297
Azurin II	1012	12162	0.353	1.174	0.188	1.90	0.26	0.2564 ¹
Ribonuclease A+RI	4416	18859	1.922	1.145	0.194	2.50	1.85	1.849

Protein	(Nᵢ/n_obs)^½	R_free	σ(r,Rfree) printed	σ(r,Rfree) calc
Concanavalin A	0.135	0.148	0.036	0.036
HEW lysozyme (ground)	0.218	0.226	0.12	0.119
HEW lysozyme (space)	0.230	0.226	0.13	0.131
γB-crystallin	0.256	0.204	0.14	0.139
βB2-crystallin	0.290	0.200	0.22	0.2174 ¹
β-purothionin	0.297	0.281	0.26	0.258
EM lysozyme	0.359	0.229	0.28	0.281
Azurin II	0.288	0.207	0.23	0.231
Ribonuclease A+RI	0.484	0.286	0.69	0.686
Fab HyHEL-5	0.607	0.288	~~0.69~~ 0.89 ²	0.892

¹ Computed value differs from printed value by >1% due to rounding of the intermediate factors in the paper. The formula test uses the computed value.

² The printed value 0.69 appears to be a transcription error (same as the RNase A row above). Computing from the intermediate values gives ~0.89.

Running the End-to-End PDB Tests (Tier 2)

End-to-end tests download the actual PDB mmCIF files from RCSB and run the full pipeline (parse → calculate → compare). They are marked @pytest.mark.network and skipped by default:

# Run all tests including PDB downloads
pytest --run-network test_dpi_calculator.py -v

# Run only the end-to-end tests
pytest --run-network -m network test_dpi_calculator.py -v

Notes and Caveats

N_params auto-estimation

When N_params is not reported in the file (common for older PDB entries and Phenix output), the script estimates it as 4 × N_atoms (isotropic B, i.e. x, y, z, B per atom). Override with --n-params if you know the true value.

Alternate conformers

Only atoms with alt_loc = '' or 'A' are included by default to avoid inflating Nᵢ and skewing B_avg.

HETATM records and metal ions

By default, HETATM records (ligands, modified residues, metal ions) are excluded from the atom count. Metal ions such as Cu²⁺ (Azurin II), Fe (haem proteins), Zn (zinc finger proteins), etc. are always stored as HETATM. If your structure has an active-site metal that was included in Cruickshank's original count, pass --include-hetatm on the command line:

python dpi_calculator.py --pdb-id 1ARN --include-hetatm

or set include_hetatm=True when constructing DPICalculator directly. When testing structures from Cruickshank Table 3 that have metal ions, using include_hetatm=True more closely matches the original atom counts.

Completeness factor

If completeness is missing it defaults to 1.0 (C^(−1/3) = 1.0), which means the DPI is slightly underestimated for incomplete datasets.

References

Cruickshank, D. W. J. (1999). Acta Cryst. D55, 583–601. https://doi.org/10.1107/S0907444999001250
Blow, D. M. (2002). Acta Cryst. D58, 792–797.
Gurusaran, M. et al. (2014). IUCrJ 1, 74–81. https://doi.org/10.1107/S2052252513031461
Kumar, K. S. D. et al. (2015). J. Appl. Cryst. 48, 939–942.
Helliwell, J. R. (2023). Curr. Res. Struct. Biol. 6, 100111.

Name		Name	Last commit message	Last commit date
Latest commit History 37 Commits
docs		docs
.gitignore		.gitignore
README.md		README.md
conftest.py		conftest.py
dpi_calculator.py		dpi_calculator.py
requirements.txt		requirements.txt
test_dpi_calculator.py		test_dpi_calculator.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

pdb_dpi — Diffraction Precision Index Calculator

Background

Installation

Usage

Download from RCSB and compute DPI

Use a local PDB or mmCIF file

Supply refinement statistics from a Phenix log

Override parsed parameters manually

Select which formula to use

Save per-atom CSV and annotated PDB

Full option list

Outputs

Validation Against Cruickshank (1999) Table 3

PDB ID Mapping

Known Mismatches

Formula-Only Validation Table (Tier 1)

Running the End-to-End PDB Tests (Tier 2)

Notes and Caveats

N_params auto-estimation

Alternate conformers

HETATM records and metal ions

Completeness factor

References

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

pdb_dpi — Diffraction Precision Index Calculator

Background

Installation

Usage

Download from RCSB and compute DPI

Use a local PDB or mmCIF file

Supply refinement statistics from a Phenix log

Override parsed parameters manually

Select which formula to use

Save per-atom CSV and annotated PDB

Full option list

Outputs

Validation Against Cruickshank (1999) Table 3

PDB ID Mapping

Known Mismatches

Formula-Only Validation Table (Tier 1)

Running the End-to-End PDB Tests (Tier 2)

Notes and Caveats

N_params auto-estimation

Alternate conformers

HETATM records and metal ions

Completeness factor

References

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages