Skip to content

233 feature implementation of pcr#254

Open
inarteroger wants to merge 24 commits into
mainfrom
233-feature-implementation-of-pcr
Open

233 feature implementation of pcr#254
inarteroger wants to merge 24 commits into
mainfrom
233-feature-implementation-of-pcr

Conversation

@inarteroger
Copy link
Copy Markdown
Collaborator

Adding Principal Component Regression (PCR) algorithm.

@inarteroger inarteroger requested a review from paucablop April 13, 2026 18:44
@inarteroger inarteroger self-assigned this Apr 13, 2026
@inarteroger inarteroger added the enhancement New feature or request label Apr 13, 2026
@inarteroger inarteroger linked an issue Apr 13, 2026 that may be closed by this pull request
@paucablop paucablop requested a review from Copilot April 13, 2026 18:45
@paucablop paucablop added this to the v1.0.0 milestone Apr 13, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new Principal Component Regression (PCR) estimator to chemotools.models, with a comprehensive sklearn-compatibility test suite.

Changes:

  • Introduces PrincipalComponentRegression (PCA + LinearRegression) as a sklearn-style estimator/transformer.
  • Exposes PCR from chemotools.models via __init__.py.
  • Adds extensive tests comparing predictions/attributes/scores vs an equivalent sklearn pipeline.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File Description
chemotools/models/_principal_component_regression.py New PCR estimator implementation (fit/transform/predict + sklearn param validation).
chemotools/models/__init__.py Exports PrincipalComponentRegression from the chemotools.models public API.
tests/models/test_principal_component_regression.py New tests for sklearn estimator compliance and parity with a PCA+LinearRegression pipeline.
Comments suppressed due to low confidence (1)

chemotools/models/init.py:16

  • chemotools.models emits a FutureWarning on every import, but this PR adds PrincipalComponentRegression to this module. That means users importing PCR via the new public API (from chemotools.models import PrincipalComponentRegression) will now see a warning message that only discusses PLSRegression, which is confusing and noisy. Consider moving the warning so it only triggers when PLSRegression is imported/used, or updating the warning text/scope to reflect the expanded contents of chemotools.models.
from chemotools.models._cross_decomposition import PLSRegression
from ._principal_component_regression import PrincipalComponentRegression

__all__ = ["PLSRegression", "PrincipalComponentRegression"]

# Show deprecation notice on module import
warnings.warn(
    "chemotools.models.PLSRegression extends sklearn's PLSRegression with "
    "explained_x_variance_ratio_ and explained_y_variance_ratio_ attributes. "
    "This feature is being contributed to scikit-learn (see PR #32722). "
    "Once available in sklearn, this module may be deprecated. "
    "Track progress at: https://github.com/scikit-learn/scikit-learn/pull/32722",

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +126 to +132
# Train PCA model
self.pca_ = PCA(n_components=self.n_components).fit(X)
x_scores = self.pca_.transform(X)

# Train linear regression model
self.lr_ = LinearRegression().fit(x_scores, y)

Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The estimator exposes a copy parameter, but it is only used in validate_data and is not passed through to the underlying PCA(copy=...) (nor documented). This makes copy behavior differ from sklearn's PCA(copy=...) and from what users might expect in a PCR wrapper. Consider either propagating copy to PCA (and possibly LinearRegression) or renaming/removing it to avoid implying sklearn-PCA semantics.

Copilot uses AI. Check for mistakes.
Comment on lines +3 to +5
from chemotools.models._principal_component_regression import (
PrincipalComponentRegression,
)
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests import PrincipalComponentRegression from the private module chemotools.models._principal_component_regression, even though the PR adds it to chemotools.models.__init__ and __all__. To test the supported public API (and avoid coupling tests to internal module paths), import it from chemotools.models instead.

Suggested change
from chemotools.models._principal_component_regression import (
PrincipalComponentRegression,
)
from chemotools.models import PrincipalComponentRegression

Copilot uses AI. Check for mistakes.
-------
x_scores : np.ndarray of shape (n_samples,n_components)
The transformed data.
"""
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

transform() calls self.pca_.transform(X) without verifying the estimator is fitted or validating feature count/dtype. This will raise an AttributeError instead of a scikit-learn NotFittedError, and skips validate_data(..., reset=False) checks that predict() already performs. Add check_is_fitted(self, ["pca_", "lr_"]) (or at least "pca_") and validate X similarly to predict() before transforming.

Suggested change
"""
"""
check_is_fitted(self, ["pca_", "lr_"])
X = validate_data(
self,
X,
ensure_2d=True,
reset=False,
copy=self.copy,
dtype=np.float64,
)

Copilot uses AI. Check for mistakes.
Comment on lines +18 to +26
"""
Description

Parameters
----------
n_components : int, default = 2
The number of components used to calculate the PCA model
# add comments on parameter constraints

Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class docstring has several issues that will surface in generated docs: (1) it states n_components default is 2 but __init__ defaults to None; (2) the example import uses chemotools.decomposition which does not exist in this repo (should be chemotools.models); and (3) it contains a leftover placeholder line # add comments on parameter constraints. Please align the docstring with the actual public API and remove the placeholder.

Copilot uses AI. Check for mistakes.

def test_y_dimensions(self):
"""
Test that y dimesions predicted is the right one.
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several new test names/docstrings contain spelling mistakes that reduce readability/searchability (e.g., "dimesions" in the docstring). Please correct the spelling ("dimensions").

Suggested change
Test that y dimesions predicted is the right one.
Test that y dimensions predicted is the right one.

Copilot uses AI. Check for mistakes.
Comment on lines +77 to +80
def test_attributs(self):
"""
Test all the attributes are presents
"""
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several new identifiers have typos that make the test suite harder to navigate (e.g., test_attributs). Please rename to test_attributes to match the wording used elsewhere in the repo (see tests/models/test_cross_decomposition.py).

Copilot uses AI. Check for mistakes.
err_msg=f"Attribute {attr} should match sklearn exactly",
)

def test_tranform_as_sklearn(self):
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in test name test_tranform_as_sklearn makes it harder to discover and is inconsistent with the test_same_transform_as_sklearn naming used elsewhere (e.g., PLS tests). Consider renaming it to test_transform_as_sklearn (or test_same_transform_as_sklearn).

Suggested change
def test_tranform_as_sklearn(self):
def test_transform_as_sklearn(self):

Copilot uses AI. Check for mistakes.
Comment on lines +73 to +89
_parameter_constraints: dict = {
"n_components": [
Interval(Integral, 0, None, closed="left"),
Interval(RealNotInt, 0, 1, closed="neither"),
StrOptions({"mle"}),
None,
],
"copy": ["boolean"],
}

def __init__(
self,
n_components: int | None = None,
copy: bool = True,
):
self.n_components = n_components
self.copy = copy
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter constraints indicate n_components may be an int, a float in (0, 1), the string "mle", or None, but the type hint in __init__ restricts it to int | None. This mismatch will confuse users/type-checkers and doesn't reflect the actual accepted values passed through to sklearn.decomposition.PCA. Consider widening the annotation (and updating the docstring) to match the supported types.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

Status: Review

Development

Successfully merging this pull request may close these issues.

feature: implementation of PCR

3 participants