Skip to content

Add ring info queries and 20 targeted molecular descriptors#53

Draft
sharifhsn wants to merge 14 commits intordkit-rs:mainfrom
sharifhsn:pr/descriptors-ring-info
Draft

Add ring info queries and 20 targeted molecular descriptors#53
sharifhsn wants to merge 14 commits intordkit-rs:mainfrom
sharifhsn:pr/descriptors-ring-info

Conversation

@sharifhsn
Copy link

@sharifhsn sharifhsn commented Mar 15, 2026

Note: This PR is stacked on #50. Unique files: ring_info.{h,cc,rs}, descriptors.{h,cc,rs} modifications, src/descriptors.rs additions, test_ring_info.rs, test_extended_descriptors.rs.

Ring info

Direct ring queries on ROMol without a separate lifetime-bound struct:

let mol = ROMol::from_smiles("c1ccc2ccccc2c1").unwrap(); // naphthalene

assert_eq!(mol.num_rings(), 2);
assert!(mol.is_atom_in_ring_of_size(0, 6));
assert_eq!(mol.atom_ring_sizes(0), vec![6]);

20 targeted descriptors

Direct method calls on ROMol instead of Properties::compute_properties():

let mol = ROMol::from_smiles("c1ccccc1").unwrap();

// Lipinski filter in 4 calls — no HashMap, no computing 36 unused descriptors
let passes = mol.calc_amw() <= 500.0
    && mol.calc_clog_p() <= 5.0
    && mol.calc_num_hbd() <= 5
    && mol.calc_num_hba() <= 10;

Full list: calc_exact_mw, calc_amw, calc_mol_formula, calc_num_heavy_atoms, calc_fraction_csp3, calc_labute_asa, calc_tpsa, calc_clog_p, calc_num_hbd, calc_num_hba, calc_num_rotatable_bonds, calc_num_amide_bonds, calc_num_heteroatoms, calc_num_aromatic_rings, calc_num_aliphatic_rings, calc_num_saturated_rings, calc_num_heterocycles, calc_num_aromatic_heterocycles, calc_num_spiro_atoms, calc_num_bridgehead_atoms.

🤖 Generated with Claude Code

sharifhsn and others added 14 commits March 13, 2026 22:37
RDKit 2025.09.x uses constexpr virtual destructors and override
functions in its headers (e.g. Geometry/point.h), which require
C++20. Building with -std=c++17 causes compilation failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes mismatched_lifetime_syntaxes warning by making the elided
lifetime on Atom<'_> explicit, matching the &mut self borrow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove unused lifetime parameter 'a from PeriodicTableOps impl,
and replace names.len() != 0 with !names.is_empty().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The 'version' key is deprecated in current rustfmt. Replace with
'style_edition = "2021"' in both workspace and rdkit-sys configs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bind get_periodic_table() to a local variable instead of using it
as a temporary in tail expressions. In Rust 2024, temporaries in
tail expressions are dropped before locals, which would drop the
PeriodicTable UniquePtr before the CxxString it borrows.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update edition to 2024 in both Cargo.toml files and rustfmt configs.
Apply 2024 style formatting: types before functions/macros in imports,
and long assert_eq! macro calls wrapped across multiple lines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add get_atom_with_idx_const that takes const shared_ptr<ROMol>& and
returns const Atom&, using RDKit's const overload of getAtomWithIdx.
This maps to Pin<&Atom> in CXX, enabling &self access on the Rust side.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AtomRef<'a> holds Pin<&'a Atom> (const) and exposes all read-only
atom methods. ROMol::atom_ref(&self) returns AtomRef, eliminating
the need to clone molecules just to read atom properties.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Verify AtomRef returns identical values for all read-only methods,
property getters work, multiple AtomRefs coexist (no &mut needed),
and iteration over all atoms works via &self.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Benchmarks for the featurization hot path with 7 drug-like molecules:
- atom_ref (const, &self): ~7μs for 7 properties across all atoms
- atom_with_idx (&mut self): ~7μs same, but requires &mut
- clone + featurize (&ROMol callers): ~46μs — 6.5x slower due to clone
Shows AtomRef eliminates the clone tax for read-only workflows.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- thiserror 1 -> 2 (derive macro API unchanged for our usage)
- env_logger 0.9/0.10 -> 0.11 (init() API unchanged)
- which 4 -> 8 (which::which() API unchanged)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove inflated clone+featurize and redundant one-property benchmarks.
Keep: atom_ref vs atom_mut (regression guard), clone cost, parse baseline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
crates.io/crate/ → crates.io/crates/ (missing plural)

Closes rdkit-rs#49

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ring info: num_rings, is_atom/bond_in_ring_of_size, atom/bond_ring_sizes.
Direct methods on ROMol avoid the need for a separate RingInfo lifetime.

Descriptors: calc_exact_mw, calc_amw, calc_mol_formula, calc_num_heavy_atoms,
calc_fraction_csp3, calc_labute_asa, calc_tpsa, calc_clog_p, calc_num_hbd,
calc_num_hba, calc_num_rotatable_bonds, calc_num_amide_bonds,
calc_num_heteroatoms, calc_num_aromatic/aliphatic/saturated_rings,
calc_num_heterocycles, calc_num_aromatic_heterocycles,
calc_num_spiro_atoms, calc_num_bridgehead_atoms.

These targeted calls avoid the overhead of Properties::compute_properties()
which computes all 40+ descriptors into a HashMap.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@sharifhsn sharifhsn marked this pull request as draft March 15, 2026 04:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant