PBM

class humancompatible.train.dual_optim.PBM(m: int = None, penalty_mult: float = 0.1, gamma: float = 0.1, delta: float = 1.0, penalty_update: str = 'dimin_adapt', *, pbf: str = 'quadratic_logarithmic', init_duals: float | Tensor = None, init_penalties: float | Tensor = None, dual_range: Tuple[float, float] = (0.0001, 100.0), penalty_range: Tuple[float, float] = (0.1, 2.0), device=None, primal_update_process_length=1)

A dual optimizer that performs the dual maximization step according to the Penalty-Barrier Method (PBM) update rule. It creates and updates the dual variables. Reference: https://doi.org/10.48550/arXiv.2605.18618

Note

Natively, this method supports only inequality constraints (see reference). However, an equality constraint h(x) = 0 is easy to rewrite as an inequality constraint:

\[g(x) = |h(x)| \leq 0\]

We suggest using a small positive tolerance on the right-hand side instead of 0, because |h(x)| ≤ 0 holds only when h(x) is exactly 0.
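The transformation above can be sketched in a few lines. This is a minimal illustration in plain Python, not part of the library: the helper name `equality_as_inequality` and the tolerance name `tol` are hypothetical, and the default tolerance is an arbitrary choice.

```python
def equality_as_inequality(h_value: float, tol: float = 1e-3) -> float:
    """Return g = |h| - tol, which is <= 0 exactly when |h| <= tol.

    Hypothetical helper: turns an equality constraint h(x) = 0 into the
    inequality form g(x) <= 0 expected by an inequality-only method.
    """
    return abs(h_value) - tol


# A residual inside the tolerance band counts as satisfied (g <= 0) ...
g_ok = equality_as_inequality(0.0005)
# ... while a larger residual counts as violated (g > 0).
g_bad = equality_as_inequality(0.02)
```

With tensors, the same idea would presumably apply elementwise to the constraint vector before it is passed to the optimizer.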

Parameters:
  • m (int) – Number of constraints (determines the number of dual variables to create)

  • penalty_mult (float) – Multiplier for penalty update (K1 or K2). For K2 (adaptive penalty update), values close to 1 correspond to a high “momentum”.

  • gamma (float) – Multiplier for dual parameter update. Values close to 1 correspond to a high “momentum”.

  • delta (float) – Violation/satisfaction parameter for penalty update; values > 1 make the penalties decrease faster on violated constraints and vice versa.

  • penalty_update (str) – Penalty update strategy; must be one of dimin, dimin_dual, dimin_adapt, const. Defaults to dimin_adapt.

  • pbf (str) – Penalty-Barrier Function to use. Must be one of quadratic_logarithmic, quadratic_reciprocal.

  • init_duals (float | Tensor) – Initial values for the dual variables. Defaults to dual lower bound for all.

  • init_penalties (float | Tensor) – Initial values for the penalty variables. Defaults to the penalty upper bound for all.

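  • dual_range (Tuple[float, float]) – Safeguarding range for dual variables; they will be clamped to this range.

  • penalty_range (Tuple[float, float]) – Safeguarding range for penalty variables; they will be clamped to this range.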
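The documented defaults (duals start at the dual lower bound, penalties at the penalty upper bound) and the safeguarding clamp can be illustrated as follows. This is a sketch using plain Python lists instead of tensors; `clamp` and `init_values` are hypothetical helpers, and clamping user-supplied initial values is an assumption, not confirmed library behaviour.

```python
from typing import List, Optional, Sequence, Tuple, Union


def clamp(value: float, lo: float, hi: float) -> float:
    """Restrict a value to the closed interval [lo, hi]."""
    return max(lo, min(hi, value))


def init_values(
    m: int,
    init: Optional[Union[float, Sequence[float]]],
    default: float,
    bounds: Tuple[float, float],
) -> List[float]:
    """Build m initial values, falling back to `default` when init is None."""
    lo, hi = bounds
    if init is None:
        values = [default] * m
    elif isinstance(init, (int, float)):
        values = [float(init)] * m  # broadcast a scalar to all m constraints
    else:
        values = [float(v) for v in init]
        if len(values) != m:
            raise ValueError("expected one initial value per constraint")
    # Assumption: initial values are safeguarded in the same way as updates.
    return [clamp(v, lo, hi) for v in values]


dual_range = (0.0001, 100.0)   # constructor default
penalty_range = (0.1, 2.0)     # constructor default

duals = init_values(3, None, dual_range[0], dual_range)            # lower bound
penalties = init_values(3, None, penalty_range[1], penalty_range)  # upper bound
clamped = init_values(2, [1e-9, 500.0], dual_range[0], dual_range)
```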

add_constraint_group(m: int, penalty_mult: float = None, penalty_update: str = None, delta: float = None, pbf: str = None, init_duals: float | Tensor = None, init_penalties: float | Tensor = None, *, momentum: float = None, primal_update_process_length: int = 1) -> None

Adds an additional group of dual variables with separate hyperparameters and barrier functions.

Parameters:
  • m (int) – Number of constraints in this group (determines the number of dual variables to add)

  • penalty_mult (float) – Multiplier for penalty update (K1 or K2). If None, inherits from parent. For adaptive penalty update, values close to 1 correspond to high “momentum”.

  • penalty_update (str) – Penalty update strategy; must be one of dimin, dimin_dual, dimin_adapt, const. If None, defaults to dimin.

  • delta (float) – Violation/satisfaction parameter for penalty update. If None, inherits from parent.

  • pbf (str) – Penalty-Barrier Function to use. Must be one of quadratic_logarithmic, quadratic_reciprocal.

  • init_duals (float | Tensor) – Initial values for the dual variables in this group. Defaults to dual lower bound for all.

  • init_penalties (float | Tensor) – Initial values for the penalty variables in this group. Defaults to penalty upper bound for all.

  • momentum (float) – Multiplier for dual parameter update in this group. Values close to 1 correspond to high “momentum”. If None, inherits from parent.

  • primal_update_process_length (int) – Length of the primal update process for this group. If 1 (default), uses original algorithm variant.
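The "If None, inherits from parent" rule in the parameter list can be pictured with a small resolution function. This is only a sketch of the documented semantics: `PARENT_DEFAULTS` and `resolve_group_options` are hypothetical names, and mapping the group's `momentum` onto the parent's `gamma` is an assumption based on the two parameters sharing a description.

```python
from typing import Any, Dict, Optional

# Hypothetical stand-in for the parent optimizer's settings
# (values taken from the constructor defaults).
PARENT_DEFAULTS: Dict[str, Any] = {"penalty_mult": 0.1, "gamma": 0.1, "delta": 1.0}


def resolve_group_options(
    penalty_mult: Optional[float] = None,
    delta: Optional[float] = None,
    momentum: Optional[float] = None,
    penalty_update: Optional[str] = None,
) -> Dict[str, Any]:
    """Fill unspecified group options as described in the parameter list."""
    return {
        # None -> inherit the parent's value.
        "penalty_mult": PARENT_DEFAULTS["penalty_mult"] if penalty_mult is None else penalty_mult,
        "delta": PARENT_DEFAULTS["delta"] if delta is None else delta,
        # Assumption: the group's momentum corresponds to the parent's gamma.
        "momentum": PARENT_DEFAULTS["gamma"] if momentum is None else momentum,
        # Documented as defaulting to dimin rather than inheriting.
        "penalty_update": "dimin" if penalty_update is None else penalty_update,
    }


options = resolve_group_options(delta=2.0)
```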

property duals: Tensor

Returns all dual variables concatenated from all constraint groups.

Returns:

Dual variables, concatenated into a single tensor.

Return type:

Tensor

forward(loss: Tensor, constraints: Tensor) -> Tensor

Computes the Penalty-Barrier Lagrangian value for the given loss and constraints.

Parameters:
  • loss (torch.Tensor) – Loss (objective function) value.

  • constraints (torch.Tensor) – Tensor of constraint violations.

Returns:

Penalty-Barrier Lagrangian value.

Return type:

torch.Tensor
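For orientation, a Penalty-Barrier Lagrangian in the classical PBM literature has the form f(x) + Σ uᵢ pᵢ φ(gᵢ(x) / pᵢ), with dual variables uᵢ, penalties pᵢ, and a penalty-barrier function φ. The sketch below uses the quadratic-logarithmic φ from that literature. It assumes the library follows the classical form, which this page does not confirm. The function names are hypothetical, and plain floats are used instead of tensors.

```python
import math
from typing import Sequence


def quadratic_logarithmic(t: float) -> float:
    """Classical quadratic-logarithmic penalty-barrier function.

    Quadratic for t >= -1/2, logarithmic below; the two branches agree in
    value and in first and second derivative at t = -1/2.
    """
    if t >= -0.5:
        return t + 0.5 * t * t
    return -0.25 * math.log(-2.0 * t) - 0.375


def pb_lagrangian(
    loss: float,
    constraints: Sequence[float],
    duals: Sequence[float],
    penalties: Sequence[float],
) -> float:
    """Evaluate loss + sum_i u_i * p_i * phi(g_i / p_i) (assumed form)."""
    total = loss
    for g, u, p in zip(constraints, duals, penalties):
        total += u * p * quadratic_logarithmic(g / p)
    return total


# One violated constraint (g = 1) raises the value above the plain loss.
violated = pb_lagrangian(1.0, [1.0], [2.0], [1.0])
# One comfortably satisfied constraint (g = -2) lowers it.
satisfied = pb_lagrangian(1.0, [-2.0], [2.0], [1.0])
```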

forward_update(loss: Tensor, constraints: Tensor) -> Tensor

Evaluates the Penalty-Barrier Lagrangian and updates the dual variables and penalties.

Combines the computation of the Lagrangian and the update of dual variables and penalties in a single step.

Parameters:
  • loss (torch.Tensor) – Loss (objective function) value.

  • constraints (torch.Tensor) – Tensor of constraint violations.

Returns:

Penalty-Barrier Lagrangian value.

Return type:

torch.Tensor

load_state_dict(state_dict: dict[str, Any]) -> None

Loads the optimizer state from a dictionary, including ranges and all constraint groups.

Parameters:

state_dict (dict[str, Any]) – Dictionary containing optimizer state (as returned by state_dict).

Returns:

None

Return type:

None

property penalties: Tensor

Returns all penalty variables concatenated from all constraint groups.

Returns:

Penalties, concatenated into a single tensor.

Return type:

Tensor

state_dict() -> dict[str, Any]

Returns the state of the optimizer as a dictionary, including dual and penalty ranges and all constraint groups.

Returns:

Dictionary containing optimizer state with param groups and configuration.

Return type:

dict[str, Any]

step(constraints: Tensor) -> None

Updates the dual variables and penalties based on the current constraint violations.

Parameters:

constraints (torch.Tensor) – Tensor of constraint violations.

Returns:

None

Return type:

None
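As a rough picture of what a dual update does, the classical PBM rule multiplies each dual variable by the derivative of the penalty-barrier function, uᵢ ← uᵢ · φ′(gᵢ / pᵢ), and then safeguards the result. The sketch below shows that classical rule only. It deliberately omits the `gamma` smoothing and the penalty update, whose exact formulas are not given on this page. `quadratic_logarithmic_grad` and `dual_step` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def quadratic_logarithmic_grad(t: float) -> float:
    """Derivative of the classical quadratic-logarithmic function."""
    if t >= -0.5:
        return 1.0 + t
    return -1.0 / (4.0 * t)


def dual_step(
    constraints: Sequence[float],
    duals: Sequence[float],
    penalties: Sequence[float],
    dual_range: Tuple[float, float] = (0.0001, 100.0),
) -> List[float]:
    """Classical multiplier update followed by clamping to dual_range."""
    lo, hi = dual_range
    updated = []
    for g, u, p in zip(constraints, duals, penalties):
        candidate = u * quadratic_logarithmic_grad(g / p)
        updated.append(min(hi, max(lo, candidate)))  # safeguard
    return updated


# A violated constraint (g = 1) grows its multiplier; a satisfied one
# (g = -2) shrinks it.
new_duals = dual_step([1.0, -2.0], [1.0, 1.0], [1.0, 1.0])
```

Under this rule the multiplier of a violated constraint increases, which makes violating it more expensive in the next primal step, while the multiplier of a satisfied constraint decays toward the lower safeguarding bound.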

update(constraints: Tensor) -> None

Updates the dual variables and penalties based on the current constraint violations.

Parameters:

constraints (torch.Tensor) – Tensor of constraint violations.

Returns:

None

Return type:

None

update_penalties(constraints: Tensor) -> None

Updates penalties according to the specified penalty update strategy for each constraint group.

Parameters:

constraints (torch.Tensor) – Tensor of constraint violations.

Returns:

None

Return type:

None