Loss Functions

class fairret.loss.base.FairnessLoss

Bases: abc.ABC, torch.nn.modules.module.Module

Abstract base class for fairness losses, also referred to as fairrets.

abstract forward(pred, sens, *stat_args, pred_as_logit=True, **stat_kwargs)

Abstract method that should be implemented by subclasses to calculate the loss.

Parameters:
  • pred (torch.Tensor) – Predictions of shape \((N, 1)\), as binary classification or regression is assumed.

  • sens (torch.Tensor) – Sensitive features of shape \((N, S)\) with S the number of sensitive features.

  • *stat_args (Any) – All arguments used by the statistic that this loss minimizes.

  • pred_as_logit (bool) – Whether the pred tensor should be interpreted as logits. Though most losses will simply take the sigmoid of pred if pred_as_logit is True, some losses benefit from improved numerical stability if they handle the conversion themselves.

  • **stat_kwargs (Any) – All keyword arguments used by the statistic that this loss computes.

Returns:

The calculated loss as a scalar tensor.

Return type:

torch.Tensor

Violation-based Losses

class fairret.loss.violation.ViolationLoss

Bases: fairret.loss.base.FairnessLoss

Abstract base class for fairness losses that penalize the violation vector of a fairness constraint. The violation vector is computed as the gap between the statistics per sensitive feature and a target statistic.

Each subclass must implement the penalize_violation method.
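The division of labour between forward() and penalize_violation() can be illustrated with a plain-Python sketch. Everything here is a hypothetical stand-in rather than the library's code: `ViolationLossSketch`, `AbsSumSketch` and `group_statistics` are made-up names, the statistic is assumed to be the positive rate, predictions are a flat list instead of an \((N, 1)\) tensor, and the gap is assumed to be a plain difference (the library may scale or normalize it differently).

```python
from abc import ABC, abstractmethod


def group_statistics(pred, sens):
    """Positive rate per sensitive feature: sum_i sens[i][s] * pred[i] / sum_i sens[i][s]."""
    n_feats = len(sens[0])
    stats = []
    for s in range(n_feats):
        weight = sum(row[s] for row in sens)
        stats.append(sum(row[s] * p for row, p in zip(sens, pred)) / weight)
    return stats


class ViolationLossSketch(ABC):
    """Hypothetical illustration of the pattern, not the library class."""

    @abstractmethod
    def penalize_violation(self, violation):
        ...

    def forward(self, pred, sens, target_statistic=None):
        if target_statistic is None:
            # Assumed default: the overall statistic, here the mean prediction.
            target_statistic = sum(pred) / len(pred)
        # Assumed form of the gap: a plain difference per sensitive feature.
        violation = [stat - target_statistic for stat in group_statistics(pred, sens)]
        return self.penalize_violation(violation)


class AbsSumSketch(ViolationLossSketch):
    """Concrete subclass: only the penalty has to be supplied."""

    def penalize_violation(self, violation):
        return sum(abs(v) for v in violation)


pred = [0.9, 0.7, 0.2, 0.4]              # probabilities of the positive class
sens = [[1, 0], [1, 0], [0, 1], [0, 1]]  # one-hot membership of two groups
loss = AbsSumSketch().forward(pred, sens)
```

The base class owns the computation of the violation vector, so a subclass only decides how that vector is collapsed into a scalar.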

__init__(statistic)
Parameters:

statistic (fairret.statistic.base.Statistic) – The statistic that should be used to calculate the violation vector. Preferably, a LinearFractionalStatistic is provided, as this allows for a straightforward calculation of the target statistic as the overall statistic.

abstract penalize_violation(violation)

Penalize the fairness violation.

Parameters:

violation (torch.Tensor) – The violation vector, i.e. the vector of gaps between the statistics per sensitive feature and the target statistic.

Returns:

A scalar tensor.

Return type:

torch.Tensor

forward(pred, sens, *stat_args, pred_as_logit=True, target_statistic=None, **stat_kwargs)

Calculate the violation vector in relation to the target_statistic and penalize this violation using the penalize_violation() method implemented by the subclass.

Parameters:
  • pred (torch.Tensor) – Predictions of shape \((N, 1)\), as binary classification or regression is assumed.

  • sens (torch.Tensor) – Sensitive features of shape \((N, S)\) with S the number of sensitive features.

  • *stat_args (Any) – All arguments used by the statistic that this loss minimizes.

  • pred_as_logit (bool) – Whether the pred tensor should be interpreted as logits. Though most losses will simply take the sigmoid of pred if pred_as_logit is True, some losses benefit from improved numerical stability if they handle the conversion themselves.

  • target_statistic (torch.Tensor | None) – The target statistic as a scalar tensor. If not provided for a LinearFractionalStatistic, the overall statistic will be used by default.

  • **stat_kwargs (Any) – All keyword arguments used by the statistic that this loss computes.

Returns:

The calculated loss as a scalar tensor.

Return type:

torch.Tensor

class fairret.loss.violation.NormLoss

Bases: fairret.loss.violation.ViolationLoss

Fairness loss that penalizes the p-norm of the violation vector.
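As a rough illustration of the penalty, the p-norm of a violation vector can be written in a few lines of plain Python. The helper `p_norm` is a hypothetical name used only for this sketch; it is not the library's implementation, which operates on tensors.

```python
def p_norm(violation, p=1):
    """(sum_i |v_i|^p)^(1/p) for a finite order p >= 1."""
    return sum(abs(v) ** p for v in violation) ** (1.0 / p)


violation = [0.3, -0.4]        # illustrative gaps for two sensitive features
l1 = p_norm(violation, p=1)    # sum of absolute gaps
l2 = p_norm(violation, p=2)    # Euclidean length of the violation vector
```

With p=1 every gap contributes in proportion to its size, while larger p puts more weight on the largest gap.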

__init__(statistic, p=1)
Parameters:
  • statistic (fairret.statistic.base.Statistic) – The statistic that should be used to calculate the violation vector. Preferably, a LinearFractionalStatistic is provided, as this allows for a straightforward calculation of the target statistic as the overall statistic.

  • p (int) – The order of the norm. Default is 1.

penalize_violation(violation)

Penalize the fairness violation.

Parameters:

violation (torch.Tensor) – The violation vector, i.e. the vector of gaps between the statistics per sensitive feature and the target statistic.

Returns:

A scalar tensor.

Return type:

torch.Tensor

class fairret.loss.violation.LSELoss

Bases: fairret.loss.violation.ViolationLoss

Fairness loss that penalizes the log-sum-exp of the violation vector. The log-sum-exp is a smooth approximation of the maximum function, hence it approximates the maximum violation (or its \(\Vert \cdot \Vert_\infty\) norm).
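The relation between log-sum-exp and the maximum can be checked numerically. The sketch below uses a hypothetical helper `log_sum_exp` in plain Python; the scaling factor `t` is introduced purely to show how the approximation tightens and is not claimed to be a parameter of the library.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum_i exp(v_i)): shift by the maximum first."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


violation = [0.05, 0.30, 0.10]
lse = log_sum_exp(violation)

# Scaling by t before and dividing by t afterwards sharpens the approximation.
t = 100.0
sharp = log_sum_exp([t * v for v in violation]) / t
```

For n entries the value always lies between max(v) and max(v) + log(n), which is why it acts as a smooth upper bound on the largest violation.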

penalize_violation(violation)

Penalize the fairness violation.

Parameters:

violation (torch.Tensor) – The violation vector, i.e. the vector of gaps between the statistics per sensitive feature and the target statistic.

Returns:

A scalar tensor.

Return type:

torch.Tensor

Projection-based Losses

class fairret.loss.projection.ProjectionLoss

Bases: fairret.loss.base.FairnessLoss

Abstract base class for fairness losses that penalize the statistical distance between a set of predictions and the fair projection of those predictions. The fair projection satisfies the linear fairness constraint corresponding to a LinearFractionalStatistic that is fixed to a target value (such as the overall statistic).

The projections are computed using cvxpy. Hence, any subclass is expected to implement the statistical distance between distributions in both cvxpy and PyTorch by implementing the cvxpy_distance() method and the torch_distance() method respectively.

Optionally, the torch_distance_with_logits() method can be overridden to provide a more numerically stable handling of predictions that are provided as logits. If left unimplemented, torch_distance() will be called instead, after applying the sigmoid function to the predictions.

Note

We use ‘statistical distance’ in a broad sense here, and do not require that the distance is a metric. See https://en.wikipedia.org/wiki/Statistical_distance for more information.

__init__(statistic, force_proj_normalized=True, proj_eps=0., **solver_kwargs)
Parameters:
  • statistic (fairret.statistic.linear_fractional.LinearFractionalStatistic) – The LinearFractionalStatistic that defines the fairness constraint. The projection is computed through convex optimization, so the constraint should be linear. This is achieved by requiring the LinearFractionalStatistic value of every sensitive feature to equal the overall statistic.

  • force_proj_normalized (bool) – Whether to force the projected distribution to be normalized. This might not be the case if the optimization does not converge to a solution that satisfies the normalization constraint. Hence, setting this to True will renormalize the projected distribution to sum to 1.

  • proj_eps (float) – Every probability value in the projected distribution is clamped to the interval [proj_eps, 1 - proj_eps]. Default is 0. Setting this to a small positive value helps prevent numerical instability if the optimization is not done to convergence.

  • solver_kwargs (Any) –

    Any keyword arguments to be passed to the cvxpy solver. The default configuration is:

    {
        'solver': 'SCS',
        'warm_start': True,
        'max_iters': 10,
        'ignore_dpp': True
    }
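The effect of proj_eps and force_proj_normalized on a projection that did not fully converge can be sketched in plain Python. The helper `clamp_and_normalize` is hypothetical, and the order of the two steps (clamp first, then renormalize each row) is an assumption made for this illustration; the sketch also assumes each row keeps positive mass after clamping.

```python
def clamp_and_normalize(proj, proj_eps=0.0, force_proj_normalized=True):
    """Clamp each probability to [proj_eps, 1 - proj_eps], then rescale rows to sum to 1."""
    out = []
    for row in proj:
        row = [min(max(p, proj_eps), 1.0 - proj_eps) for p in row]
        if force_proj_normalized:
            total = sum(row)  # assumed positive after clamping
            row = [p / total for p in row]
        out.append(row)
    return out


# A solver stopped after few iterations may return slightly invalid rows.
raw = [[-0.01, 1.02], [0.40, 0.59]]
clean = clamp_and_normalize(raw, proj_eps=1e-6)
```

After this step every row is a valid distribution over the negative and positive class, so distances involving logarithms or divisions stay finite.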
    

abstract cvxpy_distance(pred, proj)

Compute the statistical distance between pred and proj in cvxpy. Used for the convex optimization problem.

Parameters:
  • pred (cp.Parameter) – The predicted distribution in shape (N,2). As we assume binary classification, the first column is the probability of the negative class and the second column is the probability of the positive class.

  • proj (cp.Variable) – The projected distribution in shape (N,2). As we assume binary classification, the first column is the probability of the negative class and the second column is the probability of the positive class.

Returns:

The statistical distance in shape (1,).

Return type:

cp.Expression

abstract torch_distance(pred, proj)

Compute the statistical distance between pred and proj in PyTorch. Used for computing the gradient of the distance between the predictions and the projection (with respect to the predictions).

Parameters:
  • pred (torch.Tensor) – The predicted distribution in shape (N,1). As we assume binary classification, this is the probability of the positive class.

  • proj (torch.Tensor) – The projected distribution in shape (N,2). As we assume binary classification, the first column is the probability of the negative class and the second column is the probability of the positive class.

Returns:

The statistical distance as a scalar tensor.

Return type:

torch.Tensor

torch_distance_with_logits(pred, proj)

A more numerically stable alternative method to torch_distance(), where pred is assumed to be logits.

Parameters:
  • pred (torch.Tensor) – The predicted distribution as logits, in shape (N,1). As we assume binary classification, this is the logit of the probability of the positive class.

  • proj (torch.Tensor) – The projected distribution in shape (N,2) as probabilities. As we assume binary classification, the first column is the probability of the negative class and the second column is the probability of the positive class.

Returns:

The statistical distance as a scalar tensor.

Return type:

torch.Tensor
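Why a logit-aware method can be more stable is easiest to see on an extreme logit. The sketch below is plain Python with hypothetical helpers `sigmoid` and `log_sigmoid`; it illustrates the general numerical issue and is not the library's code.

```python
import math


def sigmoid(x):
    """Sigmoid evaluated in a branch-wise form that never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def log_sigmoid(x):
    """Stable log(sigmoid(x)) = min(x, 0) - log(1 + exp(-|x|))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


logit = -800.0
prob = sigmoid(logit)          # underflows to exactly 0.0
try:
    naive = math.log(prob)     # log of the probability, computed in two steps
except ValueError:
    naive = float("-inf")      # log(0) is not representable as a finite number
stable = log_sigmoid(logit)    # computed directly from the logit, stays finite
```

Converting to probabilities first discards the information carried by the logit, whereas working on the logit directly keeps the log-probability finite.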

forward(pred, sens, *stat_args, pred_as_logit=True, **stat_kwargs)

Calculate the fairness loss by projecting the predictions onto the fair set and computing the statistical distance between the predictions and the projection.

Parameters:
  • pred (torch.Tensor) – Predictions of shape \((N, 1)\), as binary classification or regression is assumed.

  • sens (torch.Tensor) – Sensitive features of shape \((N, S)\) with S the number of sensitive features.

  • *stat_args (Any) – All arguments used by the statistic that this loss minimizes.

  • pred_as_logit (bool) – Whether the pred tensor should be interpreted as logits. Though most losses will simply take the sigmoid of pred if pred_as_logit is True, some losses benefit from improved numerical stability if they handle the conversion themselves.

  • **stat_kwargs (Any) – All keyword arguments used by the statistic that this loss computes.

Returns:

The calculated loss as a scalar tensor.

Return type:

torch.Tensor

class fairret.loss.projection.KLProjectionLoss

Bases: fairret.loss.projection.ProjectionLoss

Fairness loss that penalizes the Kullback-Leibler divergence between the predicted distribution and the fair projection of the predictions.
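For reference, the Kullback-Leibler divergence between two-column (negative, positive) distributions can be sketched in plain Python. The helper names are hypothetical, and both the direction KL(proj || pred) and the summation over samples are assumptions of this sketch, since the text above does not fix them.

```python
import math


def binary_kl(p, q):
    """KL(p || q) for distributions given as (negative, positive) pairs."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kl_distance(proj, pred):
    # Assumed direction and reduction: KL(proj || pred), summed over samples.
    return sum(binary_kl(r, h) for r, h in zip(proj, pred))


pred = [[0.2, 0.8], [0.7, 0.3]]   # predicted distributions, shape (N, 2)
proj = [[0.3, 0.7], [0.6, 0.4]]   # illustrative "fair" projections, shape (N, 2)

forward_kl = kl_distance(proj, pred)
reverse_kl = kl_distance(pred, proj)
```

The two directions give different values, which is one reason the divergence is a statistical distance only in the broad sense and not a metric.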

cvxpy_distance(pred, proj)

Compute the statistical distance between pred and proj in cvxpy. Used for the convex optimization problem.

Parameters:
  • pred (cp.Parameter) – The predicted distribution in shape (N,2). As we assume binary classification, the first column is the probability of the negative class and the second column is the probability of the positive class.

  • proj (cp.Variable) – The projected distribution in shape (N,2). As we assume binary classification, the first column is the probability of the negative class and the second column is the probability of the positive class.

Returns:

The statistical distance in shape (1,).

Return type:

cp.Expression

torch_distance(pred, proj)

Compute the statistical distance between pred and proj in PyTorch. Used for computing the gradient of the distance between the predictions and the projection (with respect to the predictions).

Parameters:
  • pred (torch.Tensor) – The predicted distribution in shape (N,1). As we assume binary classification, this is the probability of the positive class.

  • proj (torch.Tensor) – The projected distribution in shape (N,2). As we assume binary classification, the first column is the probability of the negative class and the second column is the probability of the positive class.

Returns:

The statistical distance as a scalar tensor.

Return type:

torch.Tensor

torch_distance_with_logits(pred, proj)

A more numerically stable alternative method to torch_distance(), where pred is assumed to be logits.

Parameters:
  • pred (torch.Tensor) – The predicted distribution as logits, in shape (N,1). As we assume binary classification, this is the logit of the probability of the positive class.

  • proj (torch.Tensor) – The projected distribution in shape (N,2) as probabilities. As we assume binary classification, the first column is the probability of the negative class and the second column is the probability of the positive class.

Returns:

The statistical distance as a scalar tensor.

Return type:

torch.Tensor

class fairret.loss.projection.JensenShannonProjectionLoss

Bases: fairret.loss.projection.ProjectionLoss

Fairness loss that penalizes the Jensen-Shannon divergence between the predicted distribution and the fair projection of the predictions.
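A minimal plain-Python sketch of the Jensen-Shannon divergence for a single sample is given below. The helpers `kl` and `jensen_shannon` are hypothetical names, natural logarithms are assumed, and how the library reduces over samples is not shown here.

```python
import math


def kl(p, q):
    """KL(p || q), skipping terms where p is zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Average KL divergence of p and q to their midpoint distribution."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


pred = [0.2, 0.8]   # (negative, positive) probabilities
proj = [0.3, 0.7]

js = jensen_shannon(pred, proj)
js_extreme = jensen_shannon([1.0, 0.0], [0.0, 1.0])  # fully disjoint distributions
```

Unlike the Kullback-Leibler divergence, this quantity is symmetric in its arguments and bounded above by log(2), even when one distribution assigns zero probability to a class.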

cvxpy_distance(pred, proj)

Compute the statistical distance between pred and proj in cvxpy. Used for the convex optimization problem.

Parameters:
  • pred (cp.Parameter) – The predicted distribution in shape (N,2). As we assume binary classification, the first column is the probability of the negative class and the second column is the probability of the positive class.

  • proj (cp.Variable) – The projected distribution in shape (N,2). As we assume binary classification, the first column is the probability of the negative class and the second column is the probability of the positive class.

Returns:

The statistical distance in shape (1,).

Return type:

cp.Expression

torch_distance(pred, proj)

Compute the statistical distance between pred and proj in PyTorch. Used for computing the gradient of the distance between the predictions and the projection (with respect to the predictions).

Parameters:
  • pred (torch.Tensor) – The predicted distribution in shape (N,1). As we assume binary classification, this is the probability of the positive class.

  • proj (torch.Tensor) – The projected distribution in shape (N,2). As we assume binary classification, the first column is the probability of the negative class and the second column is the probability of the positive class.

Returns:

The statistical distance as a scalar tensor.

Return type:

torch.Tensor

class fairret.loss.projection.TotalVariationProjectionLoss

Bases: fairret.loss.projection.ProjectionLoss

Fairness loss that penalizes the Total Variation Distance between the predicted distribution and the fair projection of the predictions.
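For a single sample, the total variation distance is half the L1 distance between the two distributions. The helper below is a hypothetical plain-Python sketch, not the library's tensor implementation, and it does not show how values are reduced over samples.

```python
def total_variation(p, q):
    """Half the sum of absolute differences between two distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


pred = [0.2, 0.8]   # (negative, positive) probabilities
proj = [0.3, 0.7]

tv = total_variation(pred, proj)
```

In the binary case this reduces to the absolute difference between the two positive-class probabilities.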

cvxpy_distance(pred, proj)

Compute the statistical distance between pred and proj in cvxpy. Used for the convex optimization problem.

Parameters:
  • pred (cp.Parameter) – The predicted distribution in shape (N,2). As we assume binary classification, the first column is the probability of the negative class and the second column is the probability of the positive class.

  • proj (cp.Variable) – The projected distribution in shape (N,2). As we assume binary classification, the first column is the probability of the negative class and the second column is the probability of the positive class.

Returns:

The statistical distance in shape (1,).

Return type:

cp.Expression

torch_distance(pred, proj)

Compute the statistical distance between pred and proj in PyTorch. Used for computing the gradient of the distance between the predictions and the projection (with respect to the predictions).

Parameters:
  • pred (torch.Tensor) – The predicted distribution in shape (N,1). As we assume binary classification, this is the probability of the positive class.

  • proj (torch.Tensor) – The projected distribution in shape (N,2). As we assume binary classification, the first column is the probability of the negative class and the second column is the probability of the positive class.

Returns:

The statistical distance as a scalar tensor.

Return type:

torch.Tensor

class fairret.loss.projection.ChiSquaredProjectionLoss

Bases: fairret.loss.projection.ProjectionLoss

Fairness loss that penalizes the Chi-Squared Distance between the predicted distribution and the fair projection of the predictions.
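As an illustration, the Pearson chi-squared divergence for one sample can be sketched in plain Python. The helper name is hypothetical, and the choice of which distribution appears in the denominator (here the prediction) is an assumption of this sketch, not something stated above.

```python
def chi_squared(p, q):
    """Pearson chi-squared divergence: sum_k (p_k - q_k)^2 / q_k."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))


pred = [0.2, 0.8]   # (negative, positive) probabilities
proj = [0.3, 0.7]

# Assumed direction: deviations of the projection, weighted by the prediction.
chi2 = chi_squared(proj, pred)
```

Because of the division, the same absolute deviation costs more where the reference probability is small, so the reference must stay strictly positive.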

cvxpy_distance(pred, proj)

Compute the statistical distance between pred and proj in cvxpy. Used for the convex optimization problem.

Parameters:
  • pred (cp.Parameter) – The predicted distribution in shape (N,2). As we assume binary classification, the first column is the probability of the negative class and the second column is the probability of the positive class.

  • proj (cp.Variable) – The projected distribution in shape (N,2). As we assume binary classification, the first column is the probability of the negative class and the second column is the probability of the positive class.

Returns:

The statistical distance in shape (1,).

Return type:

cp.Expression

torch_distance(pred, proj)

Compute the statistical distance between pred and proj in PyTorch. Used for computing the gradient of the distance between the predictions and the projection (with respect to the predictions).

Parameters:
  • pred (torch.Tensor) – The predicted distribution in shape (N,1). As we assume binary classification, this is the probability of the positive class.

  • proj (torch.Tensor) – The projected distribution in shape (N,2). As we assume binary classification, the first column is the probability of the negative class and the second column is the probability of the positive class.

Returns:

The statistical distance as a scalar tensor.

Return type:

torch.Tensor

class fairret.loss.projection.SquaredEuclideanProjectionLoss

Bases: fairret.loss.projection.ProjectionLoss

Fairness loss that penalizes the Squared Euclidean Distance between the predicted distribution and the fair projection of the predictions.
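The squared Euclidean distance between two-column distributions is the simplest of these distances. The plain-Python helper below is a hypothetical sketch for a single sample and leaves open how the library reduces over samples.

```python
def squared_euclidean(p, q):
    """Sum of squared differences between two distributions."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


pred = [0.2, 0.8]   # (negative, positive) probabilities
proj = [0.3, 0.7]

sq = squared_euclidean(pred, proj)
```

In the binary case both columns differ by the same amount, so the value equals twice the squared difference of the positive-class probabilities.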

cvxpy_distance(pred, proj)

Compute the statistical distance between pred and proj in cvxpy. Used for the convex optimization problem.

Parameters:
  • pred (cp.Parameter) – The predicted distribution in shape (N,2). As we assume binary classification, the first column is the probability of the negative class and the second column is the probability of the positive class.

  • proj (cp.Variable) – The projected distribution in shape (N,2). As we assume binary classification, the first column is the probability of the negative class and the second column is the probability of the positive class.

Returns:

The statistical distance in shape (1,).

Return type:

cp.Expression

torch_distance(pred, proj)

Compute the statistical distance between pred and proj in PyTorch. Used for computing the gradient of the distance between the predictions and the projection (with respect to the predictions).

Parameters:
  • pred (torch.Tensor) – The predicted distribution in shape (N,1). As we assume binary classification, this is the probability of the positive class.

  • proj (torch.Tensor) – The projected distribution in shape (N,2). As we assume binary classification, the first column is the probability of the negative class and the second column is the probability of the positive class.

Returns:

The statistical distance as a scalar tensor.

Return type:

torch.Tensor