Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
120 commits
Select commit Hold shift + click to select a range
3882236
wip: [44] test push: untracked VSCode settings; added requirements an…
IruNikZe Dec 17, 2023
14d2748
wip: [44] attempt to remove `settings.json` from project
IruNikZe Dec 23, 2023
9c10929
feat: [44] added `settings.json`-template for also including `ruff` (…
IruNikZe Dec 23, 2023
4db90e4
wip: [44] git-ignored `.vscode`-folder again to decouple user setting…
IruNikZe Dec 23, 2023
935062c
fix: [44] fixed wrong type hint; isort; black formatting; trim traili…
IruNikZe Dec 23, 2023
ae352b3
feat: [44] added ruff linting checks; added requirements for development
IruNikZe Dec 23, 2023
67faaf3
feat: [44] added finite difference computation for Whittaker smoothers
IruNikZe Dec 23, 2023
d13c5ce
feat: [44] added more banded linear algebra functions with `scipy`s L…
IruNikZe Dec 23, 2023
0041382
feat: [44] added centralized Whittaker smoother class
IruNikZe Dec 23, 2023
c3dff48
feat: [44] unified whittaker-smoother-like methods with performance e…
IruNikZe Dec 23, 2023
995a964
style: [44] made docstrings of whittaker smoothers black-compatible; …
IruNikZe Dec 23, 2023
cc325e3
Merge branch 'main' into 44-improve-airpls-and-arpls-performance-spar…
IruNikZe Dec 23, 2023
0437cc7
wip: [44] figured out pentapy problems and saved working status for f…
IruNikZe Dec 25, 2023
f37ffe7
refactor: [44] changed paradigm in Whittaker smoothing to work with m…
IruNikZe Dec 27, 2023
8f3b56f
fix: [44] added skip for pentapy tests
IruNikZe Dec 27, 2023
b9fd2e0
refactor: [44] temporarily disabled long computation of squared finit…
IruNikZe Dec 27, 2023
fa0479a
test push
MothNik Apr 18, 2024
25b5682
fix: removed `maturin` from dev `requirements`
MothNik Apr 20, 2024
11f4937
feat: re-added improved version of banded LU-decomposition; added tests
MothNik Apr 20, 2024
d9bb39a
feat: parallelized `pytest`
MothNik Apr 20, 2024
556b92f
feat: added wrappers for a dataclass-based banded Cholesky decomposit…
MothNik Apr 20, 2024
064afcf
refactor: implemented fast computation of both squared forward finite…
MothNik Apr 21, 2024
097d4f0
refactor: renamed decomposition model to solver model; added PentaPy …
MothNik Apr 22, 2024
bc2198b
feat: added new banded storage conversion function; added submodule h…
MothNik Apr 22, 2024
67815f9
refactor: removed Cholesky support
MothNik Apr 22, 2024
8cfa2f5
feat: added submodule header; organized submodule; added automated sm…
MothNik Apr 22, 2024
f7f6638
refactor: major refactoring of Whittaker smoother base class; removed…
MothNik Apr 22, 2024
6098848
refactor: went for more secure way of specifying the penalty weight
MothNik Apr 22, 2024
254f906
refactor: replaced intermediate solver distributed method by a dictio…
MothNik Apr 22, 2024
7b5385f
style: went for using `ndarray`-methods
MothNik May 1, 2024
51261d3
feat: added fixed lambda to models
MothNik May 1, 2024
a857ed2
refactor/feat: added automatic smoothing based on log marginal likeli…
MothNik May 1, 2024
8569b6f
fix: fixed missing early returns
MothNik May 1, 2024
70145e0
fix: removed unused import
MothNik May 1, 2024
bdc7c34
fix: simplified weight checker; fixed broken multi-signal-single-weig…
MothNik May 1, 2024
2664245
refactor: made log determinant sign computation less floating point d…
MothNik May 10, 2024
c72de74
tests: added parallelizable, parametrized tests for utility functions…
MothNik May 10, 2024
50ec397
feat: included doctests into `pytest` tests
MothNik May 10, 2024
22476c3
tests: wrote separate doctested utility function for banded log-deter…
MothNik May 10, 2024
d208e0d
fix: removed now wrong Whittaker implementation details
MothNik May 10, 2024
428948c
fix: made error message compatible to the questionable Sklearn standa…
MothNik May 10, 2024
e3dbf2f
fix: fixed original signal overwrite
MothNik May 10, 2024
a72900a
refactor: updated Whittaker-like baseline corrections with new Whitta…
MothNik May 10, 2024
5e41a03
refactor: adapted to new internal whittaker implementation
MothNik May 10, 2024
d4a2f1d
refactor: updated Whittaker functionality tests; reverted internal co…
MothNik May 10, 2024
664e66e
refactor: moved `pentapy` import check into dedicated `_runtime`-module
MothNik May 10, 2024
ff17817
fix: removed `# else nothing`
MothNik May 10, 2024
67694d9
refactor: raise error when weights are not provided for log marginal …
MothNik May 10, 2024
200e45e
fix: fixed type hints breaking Python 3.9 compatibility
MothNik May 10, 2024
45c2c38
fix: removed dead branch
MothNik May 10, 2024
fc206a1
refactor/fix: removed dead functions; excluded hard-to-produce-edge-c…
MothNik May 10, 2024
bf9c5c8
refactor: lowered method name
MothNik May 10, 2024
5a1f735
feat: added coverage to tests
MothNik May 10, 2024
a8ecb7d
tests: added parametrized parallelizable tests for the lambda-value o…
MothNik May 10, 2024
365c5dd
tests: added utility functions for model testing
MothNik May 10, 2024
9140e92
style: made class heritage more structured
MothNik May 11, 2024
199eac4
refactor: split up whittaker base to make it more modular, flexible, …
MothNik May 11, 2024
a3b5af6
refactor: split up whittaker base to make it more modular, flexible, …
MothNik May 11, 2024
1c709d3
tests/refactor: simplified model tests
MothNik May 11, 2024
63b1217
tests: added clearer comments to model tests
MothNik May 11, 2024
5ec6487
tests/refactor: removed `numpy` from model tests
MothNik May 11, 2024
70e9322
tests/refactor: renamed `Numeric` to `RealNumeric`
MothNik May 11, 2024
3fbbd7a
tests/refactor: slightly reduced type aliases
MothNik May 11, 2024
cb97bf4
tests/fix: fixed wrong description in docstring
MothNik May 11, 2024
e57ab4a
refactor: made weight iterator less error prone and better testable
MothNik May 11, 2024
fcb957b
tests/feat/refactor: added fixture for automated lambda estimation of…
MothNik May 11, 2024
0037c71
fix: fixed fatal repetition of upper Cholesky to LU banded storage co…
MothNik May 11, 2024
1841b8e
refactor: issue warning on too high difference order; added more info…
MothNik May 11, 2024
9ab87ee
tests/refactor: added NaN-values to noise-level to make test flexible
MothNik May 11, 2024
9bb29fe
fix: removed dead 1D-weight generator branch
MothNik May 11, 2024
4ed784f
tests/fix: removed dead direct call of fixture that caused an error
MothNik May 11, 2024
c48445c
refactor: made warning an explicit `UserWarning`
MothNik May 11, 2024
16ef45c
fix: fixed internally broken wrong use of unchecked weights in weight…
MothNik May 11, 2024
3d54add
doc: added TODO for why if statement was excluded from coverage
MothNik May 12, 2024
629cef4
test/docs: commented model tests clearer
MothNik May 12, 2024
a5a5864
tests: added input checker tests
MothNik May 12, 2024
85563c1
refactor: went from index-based to mask-based approach for performance
MothNik May 13, 2024
1e11bc7
feat: added true spectrum to the test dataset for the automated Whitt…
MothNik May 13, 2024
de17316
refactor: went for optimization scheme that can better be ported to l…
MothNik May 19, 2024
435d6b8
refactor: replaced subtraction by not-equals-comparison
MothNik May 19, 2024
160d3c3
tests/refactor: split up `utils` in functions and models
MothNik May 19, 2024
48da8b5
test/refactor/feat: implemented auto log lambda utility functions bas…
MothNik May 19, 2024
b924a7d
test/feat: implemented full tests for whittaker base
MothNik May 19, 2024
a7f07e7
doc: fixed type in docstring
MothNik May 19, 2024
fc8f698
test/cov: improved test coverage in Whittaker base by excluding edge …
MothNik May 19, 2024
dce0117
test: added test for `_datacopied`
MothNik May 19, 2024
01c032f
feat: added type utility module
MothNik May 20, 2024
d6e5d02
refactor: added defaults to auto-lambda-specs-model
MothNik May 20, 2024
b34eca1
refactor: made types consistent for whittaker base and estimator
MothNik May 20, 2024
2bea38b
feat: added author to `whittaker_base`
MothNik May 20, 2024
9394025
feat: enriched `__init__` of `smooth` by docstring; improved linting
MothNik May 20, 2024
c408040
refactor: adapted docstrings of WhittakerSmooth; unified type hints; …
MothNik May 20, 2024
0699a7c
refactor: made `whittaker_base` a private utility
MothNik May 20, 2024
6ef864a
refactor: made `banded_linalg` private for now
MothNik May 20, 2024
907baa4
refactor: made utility `models` and `types` private
MothNik May 20, 2024
face185
refactor: made `finite_differences` a private utility module for now
MothNik May 20, 2024
6885f7c
style: made variable names of Whittaker base more concise with specia…
MothNik May 20, 2024
e78823e
test/feat: tested weight generator more thoroughly
MothNik May 20, 2024
78610f4
feat: added author name; minor adaption in variable names
MothNik May 20, 2024
4f1eb69
doc: unified docstrings of Whittaker-like classes
MothNik May 20, 2024
02f59f2
Merge branch 'main' into 44-improve-airpls-and-arpls-performance-spar…
MothNik May 20, 2024
bb28499
feat: added `line_profiler` to dev requirements
MothNik May 20, 2024
c224915
doc: updated documentation of `WhittakerSmooth`
MothNik May 20, 2024
395c7c0
test/doc: clarified doctest origin
MothNik May 20, 2024
dd4cc7a
test/feat/refactor: added preliminary global/local noise estimation f…
MothNik May 20, 2024
77b9f6e
test/refactor: included `power` failure into wrong input test of nois…
MothNik May 20, 2024
f859bab
test/feat/refactor: added central finite differences test; made finit…
MothNik May 21, 2024
7f13f9b
feat: exposed noise level estimation where it can be useful
MothNik May 21, 2024
b042e60
refactor/docs: added kwargs for extrapolator; fixed wrong docstrings …
MothNik May 21, 2024
e6e8405
refactor: renamed `window_size` to `window_length`
MothNik May 23, 2024
c60c8e3
doc: added proper Notes for weight selection strategies of `Whittaker…
MothNik May 24, 2024
4a1be7e
refactor: added `psutil`-install for `pytest-xdist`
MothNik May 24, 2024
109be51
Revert "refactor: renamed `window_size` to `window_length`"
MothNik May 24, 2024
124cf50
refactor: renamed `window_length` in docstring to `window_size`
MothNik May 24, 2024
5697fd9
style: [44]
MothNik Jun 24, 2024
fc8572f
style: [44]
MothNik Jun 24, 2024
45b8224
style: [44]
MothNik Jun 24, 2024
36cbf29
refactor: [44]
MothNik Jun 24, 2024
e905250
feat: [44]
MothNik Jun 24, 2024
3826871
feat: [44]
MothNik Jun 24, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -103,7 +103,7 @@ celerybeat.pid

# Environments
.env
.venv
.venv*
env/
venv/
ENV/
Expand Down
8 changes: 0 additions & 8 deletions .vscode/settings.json

This file was deleted.

14 changes: 14 additions & 0 deletions .vscode/settings_template.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
{
"python.testing.pytestArgs": [
"tests"
],
"python.testing.unittestEnabled": false,
"python.testing.pytestEnabled": true,
"[python]": {
"editor.defaultFormatter": "ms-python.black-formatter"
},
"python.defaultInterpreterPath": "Enter interpreter path here for debugging",
"ruff.interpreter": [
"Enter interpreter path here for linting"
]
}
21 changes: 21 additions & 0 deletions chemotools/_runtime/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
"""
This submodule checks for the presence of the required software packages at runtime.

The following optional packages are checked for:
- `pentapy` for solving pentadiagonal systems of equations for the Whittaker-Henderson
smoothing algorithm.

"""

### Imports ###

# if possible, pentapy is imported since it provides a more efficient implementation
# of solving pentadiagonal systems of equations, but the package is not in the
# dependencies, so ``chemotools`` needs to be made aware of whether it is available
PENTAPY_AVAILABLE: bool = False
try:
import pentapy as pp # noqa: F401

PENTAPY_AVAILABLE: bool = True
except ImportError:
pass
24 changes: 15 additions & 9 deletions chemotools/augmentation/spectrum_scale.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
from typing import Optional

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, OneToOneFeatureMixin
from sklearn.base import BaseEstimator, OneToOneFeatureMixin, TransformerMixin
from sklearn.utils.validation import check_is_fitted

from chemotools.utils.check_inputs import check_input
Expand All @@ -17,15 +19,15 @@ class SpectrumScale(OneToOneFeatureMixin, BaseEstimator, TransformerMixin):

random_state : int, default=None
The random state to use for the random number generator.

Attributes
----------
n_features_in_ : int
The number of features in the input data.

_is_fitted : bool
Whether the transformer has been fitted to data.

Methods
-------
fit(X, y=None)
Expand All @@ -35,15 +37,18 @@ class SpectrumScale(OneToOneFeatureMixin, BaseEstimator, TransformerMixin):
Transform the input data by scaling the spectrum.
"""


def __init__(self, scale: int = 0.0, random_state: int = None):
def __init__(
self,
scale: float = 0.0,
random_state: Optional[int] = None,
):
self.scale = scale
self.random_state = random_state

def fit(self, X: np.ndarray, y=None) -> "SpectrumScale":
"""
Fit the transformer to the input data.

Parameters
----------
X : np.ndarray of shape (n_samples, n_features)
Expand Down Expand Up @@ -97,7 +102,9 @@ def transform(self, X: np.ndarray, y=None) -> np.ndarray:

# Check that the number of features is the same as the fitted data
if X_.shape[1] != self.n_features_in_:
raise ValueError(f"Expected {self.n_features_in_} features but got {X_.shape[1]}")
raise ValueError(
f"Expected {self.n_features_in_} features but got {X_.shape[1]}"
)

# Calculate the scaled spectrum
for i, x in enumerate(X_):
Expand All @@ -106,6 +113,5 @@ def transform(self, X: np.ndarray, y=None) -> np.ndarray:
return X_.reshape(-1, 1) if X_.ndim == 1 else X_

def _scale_spectrum(self, x) -> np.ndarray:
scaling_factor = self._rng.uniform(low=1-self.scale, high=1+self.scale)
scaling_factor = self._rng.uniform(low=1 - self.scale, high=1 + self.scale)
return np.multiply(x, scaling_factor)

175 changes: 122 additions & 53 deletions chemotools/baseline/_air_pls.py
Original file line number Diff line number Diff line change
@@ -1,34 +1,74 @@
"""
This module contains the ``AirPLS`` transformer, which performs baseline correction on
data according to the Whittaker-Henderson formulation of Penalized Least Squares which
was modified by the introduction of weights that are updated iteratively to improve the
baseline identification.

References
----------
It's based on the algorithms described in [1]_ and [2]_ where an implementational
adaption of [2]_ was required to make it numerically stable ([3]_).

.. [1] Z.-M. Zhang, S. Chen, and Y.-Z. Liang, "Baseline correction using adaptive
iteratively reweighted penalized least squares", Analyst 135 (5), 1138-1146 (2010)
.. [2] G. Biessy, "Revisiting Whittaker-Henderson smoothing", arXiv:2306.06932 (2023)
.. [3] https://math.stackexchange.com/q/4819039/1261538

"""

# Authors:
# Pau Cabaneros
# Niklas Zell <nik.zoe@web.de>


### Imports ###


import logging
from typing import Union

import numpy as np
from scipy.sparse import csc_matrix, eye, diags
from scipy.sparse.linalg import spsolve
from sklearn.base import BaseEstimator, TransformerMixin, OneToOneFeatureMixin
from sklearn.base import BaseEstimator, OneToOneFeatureMixin, TransformerMixin
from sklearn.utils.validation import check_is_fitted

from chemotools.utils._whittaker_base import WhittakerLikeSolver
from chemotools.utils.check_inputs import check_input

logger = logging.getLogger(__name__)

### Main Class ###

class AirPls(OneToOneFeatureMixin, BaseEstimator, TransformerMixin):

# TODO: is polynomial_order actually differences and if so, is the description correct?
class AirPls(
OneToOneFeatureMixin,
BaseEstimator,
TransformerMixin,
WhittakerLikeSolver,
):
"""
This class implements the AirPLS (Adaptive Iteratively Reweighted Penalized Least Squares) algorithm for baseline
correction of spectra data. AirPLS is a common approach for removing the baseline from spectra, which can be useful
in various applications such as spectroscopy and chromatography.
This class implements the Adaptive Iteratively Reweighted Penalized Least Squares
a.k.a AirPLS algorithm for baseline correction of spectra data. AirPLS is a common
approach for removing the baseline from spectra, which can be useful in various
applications such as spectroscopy and chromatography.

Parameters
----------
lam : float, optional default=1e2
The lambda parameter controls the smoothness of the baseline. Increasing the value of lambda results in
a smoother baseline.
lam : float or int, optional default=1e2
The lambda parameter that controls the smoothness of the baseline. Higher values
will result in a smoother baseline.

polynomial_order : int, optional default=1
The polynomial order determines the degree of the polynomial used to fit the baseline. A value of 1 corresponds
The degree of the polynomial used to fit the baseline. A value of 1 corresponds
to a linear fit, while higher values correspond to higher-order polynomials.
Higher values will result in a smoother baseline.
Currently, values ``>= 3`` are highly discouraged due to numerical instability
that might obscure the smoothing effect.

nr_iterations : int, optional default=15
The number of iterations used to calculate the baseline. Increasing the number of iterations can improve the
accuracy of the baseline correction, but also increases the computation time.
The number of iterations used to calculate the baseline. Increasing the number
of iterations can improve the accuracy of the baseline correction at the cost of
computation time.

Methods
-------
Expand All @@ -40,33 +80,43 @@ class AirPls(OneToOneFeatureMixin, BaseEstimator, TransformerMixin):

_calculate_whittaker_smooth(x, w)
Calculate the Whittaker smooth of a given input vector x, with weights w.

_calculate_air_pls(x)
Calculate the AirPLS baseline of a given input vector x.

References
----------
- Z.-M. Zhang, S. Chen, and Y.-Z. Liang, Baseline correction using adaptive iteratively reweighted penalized least
squares. Analyst 135 (5), 1138-1146 (2010).
It's based on the algorithms described in [1]_ and [2]_ where an implementational
adaption of [2]_ was required to make it numerically stable ([3]_).

.. [1] Z.-M. Zhang, S. Chen, and Y.-Z. Liang, "Baseline correction using adaptive
iteratively reweighted penalized least squares", Analyst 135 (5), 1138-1146
(2010)
.. [2] G. Biessy, "Revisiting Whittaker-Henderson smoothing", arXiv:2306.06932
(2023)
.. [3] https://math.stackexchange.com/q/4819039/1261538

"""

# TODO: polynomial order is actually differences
def __init__(
self,
lam: int = 100,
lam: Union[float, int] = 100,
polynomial_order: int = 1,
nr_iterations: int = 15,
):
self.lam = lam
self.polynomial_order = polynomial_order
self.nr_iterations = nr_iterations
self.lam: Union[float, int] = lam
self.polynomial_order: int = polynomial_order
self.nr_iterations: int = nr_iterations

def fit(self, X: np.ndarray, y=None) -> "AirPls":
"""Fit the AirPls baseline correction estimator to the input data.

Parameters
----------
X : array-like of shape (n_samples, n_features)
The input data.
The input data. It is internally promoted to ``np.float64`` to avoid loss of
precision.

y : array-like of shape (n_samples,), optional (default=None)
The target values.
Expand All @@ -75,9 +125,25 @@ def fit(self, X: np.ndarray, y=None) -> "AirPls":
-------
self : AirPls
Returns the instance itself.

"""
# Check that X is a 2D array and has only finite values
X = self._validate_data(X)
X = BaseEstimator._validate_data( # type: ignore
self,
X,
reset=True,
ensure_2d=True,
force_all_finite=True,
dtype=WhittakerLikeSolver._WhittakerLikeSolver__dtype, # type: ignore
)

# the internal solver is set up
self._setup_for_fit(
num_data=X.shape[1],
differences=self.polynomial_order,
lam=self.lam,
child_class_name=self.__class__.__name__,
)

return self

Expand All @@ -96,60 +162,63 @@ def transform(self, X: np.ndarray, y=None) -> np.ndarray:
-------
X_ : array-like of shape (n_samples, n_features)
The transformed data with the baseline removed.

"""

# Check that the estimator is fitted
check_is_fitted(self, "n_features_in_")

# Check that X is a 2D array and has only finite values
X = check_input(X)
X = check_input(
X,
dtype=WhittakerLikeSolver._WhittakerLikeSolver__dtype, # type: ignore
)
X_ = X.copy()

# Check that the number of features is the same as the fitted data
if X_.shape[1] != self.n_features_in_:
# NOTE: ``n_features_in_`` is set in ``BaseEstimator._validate_data`` when
# ``reset`` is True
if X_.shape[1] != self.n_features_in_: # type: ignore
raise ValueError(
f"Expected {self.n_features_in_} features but got {X_.shape[1]}"
f"Expected {self.n_features_in_} features but got {X_.shape[1]}" # type: ignore # noqa: E501
)

# Calculate the air pls smooth
for i, x in enumerate(X_):
X_[i] = x - self._calculate_air_pls(x)

return X_.reshape(-1, 1) if X_.ndim == 1 else X_

def _calculate_whittaker_smooth(self, x, w):
X = np.array(x)
m = X.size
E = eye(m, format="csc")
for i in range(self.polynomial_order):
E = E[1:] - E[:-1]
W = diags(w, 0, shape=(m, m))
A = csc_matrix(W + (self.lam * E.T @ E))
B = csc_matrix(W @ X.T).toarray().ravel()
background = spsolve(A, B)
return np.array(background)
return X_

def _calculate_air_pls(self, x):
m = x.shape[0]
w = np.ones(m)

for i in range(1, self.nr_iterations):
z = self._calculate_whittaker_smooth(x, w)
# FIXME: this initial weighting strategy might not yield the best results
w = np.ones_like(x)
# FIXME: this initialisation will will fail for many signals and produce a
# zero-baseline
z = np.zeros_like(x)
dssn_thresh = max(1e-3 * np.abs(x).sum(), 1e-308) # to avoid 0 equalities

# FIXME: work on full Arrays and use internal loop of ``whittaker_solve``
for i in range(0, self.nr_iterations - 1):
# the baseline is fitted using the Whittaker smoother framework
z, _ = self._solve_single_b_fixed_lam(rhs_b=x, weights=w)
d = x - z
dssn = np.abs(d[d < 0].sum())

if dssn < 0.001 * np.abs(x).sum():
# the algorithm is stopped if the threshold is reached
if dssn <= dssn_thresh:
break

if i == self.nr_iterations - 1:
break

w[d >= 0] = 0
w[d < 0] = np.exp(i * np.abs(d[d < 0]) / dssn)

negative_d = d[d < 0]
if negative_d.size > 0:
w[0] = np.exp(i * negative_d.max() / dssn)
# the weights are updated
below_base_indics = d < 0
w[~below_base_indics] = 0.0
exp_mult = i + 1
w[below_base_indics] = np.exp(exp_mult * np.abs(d[d < 0]) / dssn)

d_negative = d[below_base_indics]
if d_negative.size > 0:
# FIXME: this might easily yield a weight of 1 if the maximum of the
# negative_d is very close to zero
w[0] = np.exp(exp_mult * d_negative.max() / dssn)

w[-1] = w[0]

Expand Down
Loading