233 feature implementation of pcr by inarteroger · Pull Request #254 · paucablop/chemotools

inarteroger · 2026-04-13T18:44:20Z

Adding Principal Component Regression (PCR) algorithm.

Copilot

Pull request overview

Adds a new Principal Component Regression (PCR) estimator to chemotools.models, with a comprehensive sklearn-compatibility test suite.

Changes:

Introduces PrincipalComponentRegression (PCA + LinearRegression) as a sklearn-style estimator/transformer.
Exposes PCR from chemotools.models via __init__.py.
Adds extensive tests comparing predictions/attributes/scores vs an equivalent sklearn pipeline.
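
For orientation, the "equivalent sklearn pipeline" that the tests compare against could look roughly like the sketch below. This is not code from the PR: the synthetic data, the choice of two components, and all variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Synthetic data (assumption: any 2D X and 1D y would serve here).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 6))
y = X @ rng.normal(size=6) + 0.1 * rng.normal(size=30)

# Reference: PCA followed by ordinary least squares on the scores.
reference = make_pipeline(PCA(n_components=2), LinearRegression()).fit(X, y)

# The same two steps written out by hand, as the PR's fit() excerpt does.
pca = PCA(n_components=2).fit(X)
lr = LinearRegression().fit(pca.transform(X), y)
manual_pred = lr.predict(pca.transform(X))

# Both routes should agree up to floating-point error.
print(np.allclose(reference.predict(X), manual_pred))
```

A parity test for the new estimator would presumably compare its predictions to `reference.predict(X)` in the same way.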

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File	Description
`chemotools/models/_principal_component_regression.py`	New PCR estimator implementation (fit/transform/predict + sklearn param validation).
`chemotools/models/__init__.py`	Exports `PrincipalComponentRegression` from the `chemotools.models` public API.
`tests/models/test_principal_component_regression.py`	New tests for sklearn estimator compliance and parity with a PCA+LinearRegression pipeline.

Comments suppressed due to low confidence (1)

chemotools/models/__init__.py:16

chemotools.models emits a FutureWarning on every import, but this PR adds PrincipalComponentRegression to this module. That means users importing PCR via the new public API (from chemotools.models import PrincipalComponentRegression) will now see a warning message that only discusses PLSRegression, which is confusing and noisy. Consider moving the warning so it only triggers when PLSRegression is imported/used, or updating the warning text/scope to reflect the expanded contents of chemotools.models.

from chemotools.models._cross_decomposition import PLSRegression
from ._principal_component_regression import PrincipalComponentRegression

__all__ = ["PLSRegression", "PrincipalComponentRegression"]

# Show deprecation notice on module import
warnings.warn(
    "chemotools.models.PLSRegression extends sklearn's PLSRegression with "
    "explained_x_variance_ratio_ and explained_y_variance_ratio_ attributes. "
    "This feature is being contributed to scikit-learn (see PR #32722). "
    "Once available in sklearn, this module may be deprecated. "
    "Track progress at: https://github.com/scikit-learn/scikit-learn/pull/32722",


Copilot · 2026-04-13T18:49:04Z

+        # Train PCA model
+        self.pca_ = PCA(n_components=self.n_components).fit(X)
+        x_scores = self.pca_.transform(X)
+
+        # Train linear regression model
+        self.lr_ = LinearRegression().fit(x_scores, y)
+


The estimator exposes a copy parameter, but it is only used in validate_data and is not passed through to the underlying PCA(copy=...) (nor documented). This makes copy behavior differ from sklearn's PCA(copy=...) and from what users might expect in a PCR wrapper. Consider either propagating copy to PCA (and possibly LinearRegression) or renaming/removing it to avoid implying sklearn-PCA semantics.
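
A minimal sketch of the "propagate copy to PCA" option follows. `PCRSketch` is a hypothetical class written for this note, not the estimator in the PR, and it skips the parameter and input validation the real class performs.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression


class PCRSketch(RegressorMixin, BaseEstimator):
    """Hypothetical PCR wrapper; not the class from this PR."""

    def __init__(self, n_components=None, copy=True):
        self.n_components = n_components
        self.copy = copy

    def fit(self, X, y):
        X = np.asarray(X, dtype=np.float64)
        # Forward `copy` so it carries sklearn's PCA(copy=...) meaning.
        self.pca_ = PCA(n_components=self.n_components, copy=self.copy)
        # Use fit_transform rather than fit(X).transform(X): with copy=False
        # PCA may overwrite X during fit, which would corrupt a later transform.
        x_scores = self.pca_.fit_transform(X)
        self.lr_ = LinearRegression().fit(x_scores, y)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=np.float64)
        return self.lr_.predict(self.pca_.transform(X))


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = rng.normal(size=20)
X_before = X.copy()

model = PCRSketch(n_components=2, copy=True).fit(X, y)
inplace_model = PCRSketch(n_components=2, copy=False).fit(X.copy(), y)
```

With `copy=True` the caller's array is left untouched; with `copy=False` the array handed to `fit` may be modified in place, which is why the second call above passes a throwaway copy.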

Copilot · 2026-04-13T18:49:04Z

+from chemotools.models._principal_component_regression import (
+    PrincipalComponentRegression,
+)


The tests import PrincipalComponentRegression from the private module chemotools.models._principal_component_regression, even though the PR adds it to chemotools.models.__init__ and __all__. To test the supported public API (and avoid coupling tests to internal module paths), import it from chemotools.models instead.

Suggested change

-from chemotools.models._principal_component_regression import (
-    PrincipalComponentRegression,
-)
+from chemotools.models import PrincipalComponentRegression

Copilot · 2026-04-13T18:49:05Z

+        -------
+        x_scores : np.ndarray of shape (n_samples,n_components)
+            The transformed data.
+        """


transform() calls self.pca_.transform(X) without verifying the estimator is fitted or validating feature count/dtype. This will raise an AttributeError instead of a scikit-learn NotFittedError, and skips validate_data(..., reset=False) checks that predict() already performs. Add check_is_fitted(self, ["pca_", "lr_"]) (or at least "pca_") and validate X similarly to predict() before transforming.

Suggested change

-        """
+        """
+        check_is_fitted(self, ["pca_", "lr_"])
+        X = validate_data(
+            self,
+            X,
+            ensure_2d=True,
+            reset=False,
+            copy=self.copy,
+            dtype=np.float64,
+        )
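
To show what the fitted-state guard changes in practice, here is a small self-contained sketch. `GuardedTransformSketch` is a hypothetical class for illustration only, and the `validate_data` call from the suggestion is left out to keep the example short.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.decomposition import PCA
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted


class GuardedTransformSketch(BaseEstimator):
    """Hypothetical transformer used only to show the fitted-state guard."""

    def fit(self, X, y=None):
        self.pca_ = PCA(n_components=1).fit(X)
        return self

    def transform(self, X):
        # Raises NotFittedError with a readable message if fit() was skipped.
        check_is_fitted(self, ["pca_"])
        return self.pca_.transform(X)


X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 3.0]])

# Calling transform() on an unfitted instance should hit the guard.
try:
    GuardedTransformSketch().transform(X)
    raised = False
except NotFittedError:
    raised = True

# After fit(), transform() returns one score column per component.
scores = GuardedTransformSketch().fit(X).transform(X)
print(raised, scores.shape)
```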

Copilot · 2026-04-13T18:49:05Z

+    """
+    Description
+
+    Parameters
+    ----------
+    n_components : int, default = 2
+        The number of components used to calculate the PCA model
+        # add comments on parameter constraints
+


The class docstring has several issues that will surface in generated docs: (1) it states n_components default is 2 but __init__ defaults to None; (2) the example import uses chemotools.decomposition which does not exist in this repo (should be chemotools.models); and (3) it contains a leftover placeholder line # add comments on parameter constraints. Please align the docstring with the actual public API and remove the placeholder.

Copilot · 2026-04-13T18:49:05Z

+
+    def test_y_dimensions(self):
+        """
+        Test that y dimesions predicted is the right one.


Several new test names/docstrings contain spelling mistakes that reduce readability/searchability (e.g., "dimesions" in the docstring). Please correct the spelling ("dimensions").

Suggested change

-        Test that y dimesions predicted is the right one.
+        Test that y dimensions predicted is the right one.

Copilot · 2026-04-13T18:49:05Z

+    def test_attributs(self):
+        """
+        Test all the attributes are presents
+        """


Several new identifiers have typos that make the test suite harder to navigate (e.g., test_attributs). Please rename to test_attributes to match the wording used elsewhere in the repo (see tests/models/test_cross_decomposition.py).

Copilot · 2026-04-13T18:49:05Z

+                err_msg=f"Attribute {attr} should match sklearn exactly",
+            )
+
+    def test_tranform_as_sklearn(self):


Typo in test name test_tranform_as_sklearn makes it harder to discover and is inconsistent with the test_same_transform_as_sklearn naming used elsewhere (e.g., PLS tests). Consider renaming it to test_transform_as_sklearn (or test_same_transform_as_sklearn).

Suggested change

-    def test_tranform_as_sklearn(self):
+    def test_transform_as_sklearn(self):

Copilot · 2026-04-13T18:49:06Z

+    _parameter_constraints: dict = {
+        "n_components": [
+            Interval(Integral, 0, None, closed="left"),
+            Interval(RealNotInt, 0, 1, closed="neither"),
+            StrOptions({"mle"}),
+            None,
+        ],
+        "copy": ["boolean"],
+    }
+
+    def __init__(
+        self,
+        n_components: int | None = None,
+        copy: bool = True,
+    ):
+        self.n_components = n_components
+        self.copy = copy


The parameter constraints indicate n_components may be an int, a float in (0, 1), the string "mle", or None, but the type hint in __init__ restricts it to int | None. This mismatch will confuse users/type-checkers and doesn't reflect the actual accepted values passed through to sklearn.decomposition.PCA. Consider widening the annotation (and updating the docstring) to match the supported types.
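
A widened annotation could look like the sketch below. The alias name `NComponents` and the helper `is_valid_n_components` are hypothetical; the helper only mirrors the quoted constraints in plain Python to make the accepted values concrete and is not a substitute for sklearn's own parameter validation.

```python
from numbers import Integral, Real
from typing import Literal, Optional, Union

# Hypothetical alias covering the four kinds of value listed in
# _parameter_constraints: int, float in (0, 1), "mle", or None.
NComponents = Optional[Union[int, float, Literal["mle"]]]


def is_valid_n_components(value: NComponents) -> bool:
    """Mirror the quoted constraints in plain Python (illustrative only)."""
    if value is None:
        return True
    if isinstance(value, str):
        return value == "mle"  # StrOptions({"mle"})
    if isinstance(value, Integral):
        return value >= 0  # Interval(Integral, 0, None, closed="left")
    if isinstance(value, Real):
        return 0 < value < 1  # Interval(RealNotInt, 0, 1, closed="neither")
    return False


for candidate in (None, "mle", 0, 3, 0.5, -1, 1.0, "auto"):
    print(repr(candidate), is_valid_n_components(candidate))
```

The `__init__` signature would then read `n_components: NComponents = None`, with the docstring listing the same four cases.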

inarteroger added 23 commits February 17, 2026 18:38

feat:initial implementation of principal component regression

b9a672b

fix: formatting

bdb5c1d

fix: formatting

a165aed

test: integration test

1eadd72

fix: primary function

59635a1

fix: primary function

7cd1f86

fix: primary function

edc0fa4

fix: major function

606e60c

fix: major function

fd4f76f

fix: major function

d46d4e4

fix: major function

fd4c6b5

fix: major function and writing the test

5abb5eb

fix: major function and writing the test

c353993

fix: major function and writing the test

12cfcc1

fix: major function and writing the test

2a57db7

fix: major function and test

c48398e

fix: format

ce32cc2

fix: errors

f9526e2

fix: sklearn definition

485b586

fix: add scale as input argument for PCA

d0bb9bd

fix: adding scale as input in PCA function

16e04cf

fix: final corrections

7b9bcc8

fix: format

daae041

inarteroger requested a review from paucablop April 13, 2026 18:44

inarteroger self-assigned this Apr 13, 2026

inarteroger added the enhancement New feature or request label Apr 13, 2026

inarteroger linked an issue Apr 13, 2026 that may be closed by this pull request

feature: implementation of PCR #233

Open

paucablop requested a review from Copilot April 13, 2026 18:45

paucablop added this to CHEMOTOOLS Apr 13, 2026

paucablop added this to the v1.0.0 milestone Apr 13, 2026

paucablop moved this to Review in CHEMOTOOLS Apr 13, 2026

Copilot started reviewing on behalf of paucablop April 13, 2026 18:45

Copilot AI reviewed Apr 13, 2026


fix: avoid nested estimator

7ee12cb

inarteroger wants to merge 24 commits into main from 233-feature-implementation-of-pcr


3 participants
