Add configurable fingerprints and MACCS keys#52
Draft
sharifhsn wants to merge 14 commits intordkit-rs:mainfrom
Draft
Add configurable fingerprints and MACCS keys#52sharifhsn wants to merge 14 commits intordkit-rs:mainfrom
sharifhsn wants to merge 14 commits intordkit-rs:mainfrom
Conversation
RDKit 2025.09.x uses constexpr virtual destructors and override functions in its headers (e.g. Geometry/point.h), which require C++20. Building with -std=c++17 causes compilation failures. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes mismatched_lifetime_syntaxes warning by making the elided lifetime on Atom<'_> explicit, matching the &mut self borrow. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove unused lifetime parameter 'a from PeriodicTableOps impl, and replace names.len() != 0 with !names.is_empty(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The 'version' key is deprecated in current rustfmt. Replace with 'style_edition = "2021"' in both workspace and rdkit-sys configs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bind get_periodic_table() to a local variable instead of using it as a temporary in tail expressions. In Rust 2024, temporaries in tail expressions are dropped before locals, which would drop the PeriodicTable UniquePtr before the CxxString it borrows. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update edition to 2024 in both Cargo.toml files and rustfmt configs. Apply 2024 style formatting: types before functions/macros in imports, and long assert_eq! macro calls wrapped across multiple lines. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add get_atom_with_idx_const that takes const shared_ptr<ROMol>& and returns const Atom&, using RDKit's const overload of getAtomWithIdx. This maps to Pin<&Atom> in CXX, enabling &self access on the Rust side. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AtomRef<'a> holds Pin<&'a Atom> (const) and exposes all read-only atom methods. ROMol::atom_ref(&self) returns AtomRef, eliminating the need to clone molecules just to read atom properties. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Verify AtomRef returns identical values for all read-only methods, property getters work, multiple AtomRefs coexist (no &mut needed), and iteration over all atoms works via &self. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Benchmarks for the featurization hot path with 7 drug-like molecules: - atom_ref (const, &self): ~7μs for 7 properties across all atoms - atom_with_idx (&mut self): ~7μs same, but requires &mut - clone + featurize (&ROMol callers): ~46μs — 6.5x slower due to clone Shows AtomRef eliminates the clone tax for read-only workflows. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- thiserror 1 -> 2 (derive macro API unchanged for our usage) - env_logger 0.9/0.10 -> 0.11 (init() API unchanged) - which 4 -> 8 (which::which() API unchanged) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove inflated clone+featurize and redundant one-property benchmarks. Keep: atom_ref vs atom_mut (regression guard), clone cost, parse baseline. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
crates.io/crate/ → crates.io/crates/ (missing plural) Closes rdkit-rs#49 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Morgan, RDK, and Pattern fingerprints now accept custom parameters (radius, nBits, path length, size). Adds MACCS 166-bit keys. Fingerprint::new() now correctly truncates to actual bit count, fixing padding for non-power-of-64 fingerprints like MACCS. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Configurable fingerprints
The existing
morgan_fingerprint()hardcodes radius=3, nBits=2048. Every ML pipeline needs to customize these:Also adds:
rdk_fingerprint_with_params(min_path, max_path, fp_size)pattern_fingerprint_with_params(fp_size)maccs_fingerprint()— 167-bit MACCS keysFingerprint::len(),is_empty()Bug fix:
Fingerprint::new()now truncates to the actual bit count. Previously, non-power-of-64 fingerprints (like MACCS 167-bit) included padding from u64 block boundaries. This is backward-compatible for existing 2048-bit fingerprints (2048 is a multiple of 64).🤖 Generated with Claude Code