fair-forge is a Python toolkit for developing and evaluating group-aware machine learning methods — models that take sensitive attributes (such as sex or race) into account during training in order to produce fairer outcomes.
The library provides:
- Fairness metrics: group-aware metrics that measure disparities across demographic groups (e.g. equalized odds gaps, demographic parity ratios). Standard metrics like accuracy can be used directly from scikit-learn, and fair-forge can turn any such metric into a group-aware variant via `as_group_metric()`.
- Fairness-aware methods: these come in two flavours: preprocessing methods like `Upsampler`, which modify the training data and can be combined with any scikit-learn estimator via `GroupPipeline`; and in-processing methods like `Reweighting` (Kamiran & Calders, 2012), which alter the training dynamics directly (e.g. by adjusting sample weights).
- An evaluation pipeline: a single `evaluate()` call that handles train/test splitting, preprocessing, training multiple methods, and computing all metrics, returning a tidy Polars DataFrame of results.
- Example datasets: including the Adult income dataset, ready to use out of the box.
fair-forge is designed around scikit-learn compatibility: methods follow the estimator API, standard scikit-learn metrics and models can be used directly, and preprocessing transforms work with scikit-learn Pipelines via metadata routing. This means you can mix and match fair-forge components with the broader scikit-learn ecosystem without any adapter code.
```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

import fair_forge as ff

# Load the Adult dataset with "Sex" as the sensitive attribute
adult = ff.load_adult("Sex")

# Compare a standard model against a fairness-aware variant
methods = {
    "LR": LogisticRegression(),
    "Reweighting": ff.Reweighting(LogisticRegression()),
}

# Evaluate with accuracy and the minimum accuracy over groups ("robust accuracy")
results = ff.evaluate(
    dataset=adult,
    methods=methods,
    metrics=[accuracy_score],
    group_metrics=ff.as_group_metric((accuracy_score,), ff.MetricAgg.MIN),
)
print(results)
```

For a complete walkthrough, see `examples/full_example.ipynb`.
This library requires at least Python 3.12. Install it from PyPI:

```
pip install fair-forge
```

or from GitHub:

```
pip install git+https://github.com/wearepal/fair-forge.git
```

If you want to use the neural-network-based methods, you need to add the `nn` extras:

```
pip install 'fair-forge[nn]'
```

or

```
pip install 'fair-forge[nn] @ git+https://github.com/wearepal/fair-forge.git'
```

fair-forge provides two main components: metrics and methods. Besides these, there are various utility functions for common tasks, as well as a few example datasets.
The core data type used in fair-forge is the NumPy array: all methods and metrics expect NumPy arrays as input. If your data is in a different form, it is usually easy to convert it to NumPy arrays:
- Pandas: `to_numpy()`
- Polars: `to_numpy()`
- PyTorch: `numpy()` (on a detached CPU tensor)
- TensorFlow: `numpy()` (on an eager tensor)
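For instance, converting a pandas DataFrame (a minimal sketch; the column names here are made up):

```python
import numpy as np
import pandas as pd

# A made-up DataFrame with one feature column and one label column
df = pd.DataFrame({"age": [25.0, 32.0, 47.0], "label": [0, 1, 1]})

# Convert to the dtypes used in fair-forge's type hints
X = df[["age"]].to_numpy(dtype=np.float32)
y = df["label"].to_numpy(dtype=np.int32)
```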
There are group-aware metrics and non-group-aware metrics. The non-group-aware metrics are callables with this function signature:

```python
import numpy as np
from numpy.typing import NDArray

type Float = float | np.float16 | np.float32 | np.float64

def tpr(y_true: NDArray[np.int32], y_pred: NDArray[np.int32]) -> Float: ...
```

In other words, a non-group-aware metric accepts two NumPy arrays, one with the true labels and one with the predicted labels, and returns a single `Float`. This API is chosen such that any metric from scikit-learn can be used, for example, accuracy.
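To make this concrete, here is a toy implementation of such a true-positive-rate metric (illustrative only; this is not fair-forge's own code):

```python
import numpy as np
from numpy.typing import NDArray

def tpr(y_true: NDArray[np.int32], y_pred: NDArray[np.int32]) -> float:
    # Fraction of truly positive examples that were predicted positive
    positives = y_true == 1
    return float(np.mean(y_pred[positives] == 1))

y_true = np.array([1, 1, 0, 1], dtype=np.int32)
y_pred = np.array([1, 0, 0, 1], dtype=np.int32)
tpr(y_true, y_pred)  # 2 of the 3 positives were predicted positive
```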
Group-aware metrics take an additional, keyword-only parameter: the group labels:

```python
def cv(
    y_true: NDArray[np.int32],
    y_pred: NDArray[np.int32],
    *,
    groups: NDArray[np.int32],
) -> Float: ...
```

A very important function is `fair_forge.as_group_metric()`. It takes a non-group-aware metric and turns it into one or more group-aware metrics. This works by first computing the metric value per group; the per-group values are then aggregated in different ways, for example by taking the minimum or the ratio of the values. Here is how one would construct a robust-accuracy metric (the minimum accuracy across all groups):
```python
import fair_forge as ff
from sklearn.metrics import accuracy_score

# Construct a metric for the minimum accuracy over all groups
(robust_accuracy,) = ff.as_group_metric(
    (accuracy_score,), agg=ff.MetricAgg.MIN
)

# Use it as a group-aware metric
robust_accuracy(y_true=y_true, y_pred=y_pred, groups=groups)
```

The group-aware vs non-group-aware distinction also exists for the methods provided by this library. The non-group-aware methods simply follow the scikit-learn API for an estimator (inheriting from `BaseEstimator` adds some mixin methods which are needed):
```python
from typing import Self

import numpy as np
from numpy.typing import NDArray
from sklearn.base import BaseEstimator

class Method(BaseEstimator):
    def fit(self, X: NDArray[np.float32], y: NDArray[np.int32]) -> Self:
        ...

    def predict(self, X: NDArray[np.float32]) -> NDArray[np.int32]:
        ...
```

These methods can be used like normal scikit-learn estimators.
On the other hand, we have the group-based methods, which take an additional, keyword-only parameter: the group labels:

```python
class GroupMethod(BaseEstimator):
    def fit(
        self, X: NDArray[np.float32], y: NDArray[np.int32], *, groups: NDArray[np.int32]
    ) -> Self:
        ...

    def predict(self, X: NDArray[np.float32]) -> NDArray[np.int32]:
        ...
```

These methods can use the group information at training time to produce fairer models.
Besides methods that output a machine learning model, there are also methods that transform the data. These have a `transform` method instead of a `predict` method:

```python
class MyGroupBasedTransform(BaseEstimator):
    def fit(
        self, X: NDArray[np.float32], y: NDArray[np.int32], *, groups: NDArray[np.int32]
    ) -> Self:
        ...

    def transform(self, X: NDArray[np.float32]) -> NDArray[np.float32]:
        ...

    def fit_transform(
        self, X: NDArray[np.float32], y: NDArray[np.int32], *, groups: NDArray[np.int32]
    ) -> NDArray[np.float32]:
        ...
```

(Unfortunately, you have to implement `fit_transform` manually, because the default implementation would not pass on the `groups` parameter.)
Such transformation methods can then be combined with non-group-aware methods with scikit-learn's `Pipeline`:

```python
from sklearn import config_context
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Pipeline will only forward the `groups` argument if we
# set `enable_metadata_routing` to `True`.
with config_context(enable_metadata_routing=True):
    estimator = LinearSVC(random_state=42, max_iter=100)
    transform = MyGroupBasedTransform()
    # We need to explicitly request here that the transformation's
    # `fit` method gets the `groups` argument.
    transform.set_fit_request(groups=True)
    pipeline = Pipeline([("transform", transform), ("estimator", estimator)])
    # This will call `fit_transform` on MyGroupBasedTransform
    pipeline.fit(train_x, train_y, groups=train_groups)
    preds = pipeline.predict(test_x)
```

Alternatively, you can use `ff.GroupPipeline`, which handles this specific case directly, without the need for metadata routing. However, `ff.GroupPipeline` is limited to exactly one transformation and one estimator.
fair-forge provides many useful components for running experiments and collecting results:
- example datasets (like Adult)
- train-test splitting
- facilities for running multiple methods and evaluating them with multiple metrics
For more information on this, see the documentation.