PBM
- class humancompatible.train.dual_optim.PBM(m: int = None, penalty_mult: float = 0.1, gamma: float = 0.1, delta: float = 1.0, penalty_update: str = 'dimin_adapt', *, pbf: str = 'quadratic_logarithmic', init_duals: float | Tensor = None, init_penalties: float | Tensor = None, dual_range: Tuple[float, float] = (0.0001, 100.0), penalty_range: Tuple[float, float] = (0.1, 2.0), device=None, primal_update_process_length=1)
A dual optimizer that performs the dual maximization task according to the Penalty-Barrier Method (PBM) update rule. Creates and updates dual variables. Reference: https://doi.org/10.48550/arXiv.2605.18618
Note
Natively, this method supports only inequality constraints (see reference). However, an equality constraint h(x) = 0 is easy to rewrite as an inequality constraint:
\[g(x) = |h(x)| \leq 0\]
We suggest using a small tolerance parameter on the right-hand side instead of 0.
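The transformation in the note can be sketched without any framework. This is only an illustration: the helper name `equality_as_inequality` and the default tolerance `tol=1e-3` are assumptions, not part of the library. The tolerance is moved to the left-hand side so that the result still follows the `g(x) <= 0` convention.

```python
def equality_as_inequality(h_value: float, tol: float = 1e-3) -> float:
    """Turn an equality residual h(x) into an inequality value g(x).

    g(x) = |h(x)| - tol, so g(x) <= 0 holds exactly when |h(x)| <= tol.
    `tol` is a hypothetical tolerance chosen for illustration.
    """
    return abs(h_value) - tol


# A residual within the tolerance gives a non-positive (satisfied) value.
satisfied = equality_as_inequality(0.0005)
# A residual outside the tolerance gives a positive (violated) value.
violated = equality_as_inequality(-0.25)
```

With tensors the same expression would be applied elementwise before passing the result as `constraints`.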
- Parameters:
m (int) – Number of constraints (determines the number of dual variables to create)
penalty_mult (float) – Multiplier for penalty update (K1 or K2). For K2 (adaptive penalty update), values close to 1 correspond to a high “momentum”.
gamma (float) – Multiplier for dual parameter update. Values close to 1 correspond to a high “momentum”.
delta (float) – Violation/satisfaction parameter for penalty update; values > 1 make the penalties decrease faster on violated constraints and vice versa.
penalty_update (str) – Penalty update strategy; must be one of `dimin`, `dimin_dual`, `dimin_adapt`, `const`. Defaults to `dimin_adapt`.
pbf (str) – Penalty-Barrier Function to use. Must be one of `quadratic_logarithmic`, `quadratic_reciprocal`.
init_duals (float | Tensor) – Initial values for the dual variables. Defaults to dual lower bound for all.
init_penalties (float | Tensor) – Initial values for the penalty variables. Defaults to the penalty upper bound for all.
dual_range (Tuple[float, float]) – Safeguarding range for dual variables; they will be `clamp`-ed to this range.
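The safeguarding described for dual_range amounts to an elementwise clamp. The following is a minimal plain-Python sketch of that idea, not the library's code; the helper `clamp_to_range` is hypothetical, and only the default bounds `(0.0001, 100.0)` are taken from the signature above.

```python
from typing import List, Tuple


def clamp_to_range(values: List[float], bounds: Tuple[float, float]) -> List[float]:
    """Clamp every value into the closed interval [lo, hi]."""
    lo, hi = bounds
    return [min(max(v, lo), hi) for v in values]


# Default dual_range from the class signature.
dual_range = (0.0001, 100.0)

# Values below the range are raised to the lower bound,
# values above it are cut to the upper bound.
duals = clamp_to_range([-3.0, 0.5, 250.0], dual_range)
# duals == [0.0001, 0.5, 100.0]
```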
- add_constraint_group(m: int, penalty_mult: float = None, penalty_update: str = None, delta: float = None, pbf: str = None, init_duals: float | Tensor = None, init_penalties: float | Tensor = None, *, momentum: float = None, primal_update_process_length: int = 1) None
Adds an additional group of dual variables with separate hyperparameters and barrier functions.
- Parameters:
m (int) – Number of constraints in this group (determines the number of dual variables to add)
penalty_mult (float) – Multiplier for penalty update (K1 or K2). If None, inherits from parent. For adaptive penalty update, values close to 1 correspond to high “momentum”.
penalty_update (str) – Penalty update strategy; must be one of dimin, dimin_dual, dimin_adapt, const. If None, defaults to dimin.
delta (float) – Violation/satisfaction parameter for penalty update. If None, inherits from parent.
pbf (str) – Penalty-Barrier Function to use. Must be one of quadratic_logarithmic, quadratic_reciprocal.
init_duals (float | Tensor) – Initial values for the dual variables in this group. Defaults to dual lower bound for all.
init_penalties (float | Tensor) – Initial values for the penalty variables in this group. Defaults to penalty upper bound for all.
momentum (float) – Multiplier for dual parameter update in this group. Values close to 1 correspond to high “momentum”. If None, inherits from parent.
primal_update_process_length (int) – Length of the primal update process for this group. If 1 (default), uses original algorithm variant.
- property duals: Tensor
Returns all dual variables concatenated from all constraint groups.
- Returns:
Dual variables, concatenated into a single tensor.
- Return type:
Tensor
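As a rough picture of what "concatenated from all constraint groups" means, the sketch below flattens per-group lists into one sequence. It assumes the groups are concatenated in the order they were added, which this reference does not state; the group contents and the helper `concatenate_groups` are made up for illustration.

```python
from itertools import chain
from typing import List

# Hypothetical dual variables: the first group has 2 constraints, the second has 3.
group_duals: List[List[float]] = [[0.1, 0.2], [1.0, 1.5, 2.0]]


def concatenate_groups(groups: List[List[float]]) -> List[float]:
    """Flatten per-group values into one sequence, preserving group order."""
    return list(chain.from_iterable(groups))


all_duals = concatenate_groups(group_duals)
# Under the ordering assumption, indices 0-1 belong to the first group
# and indices 2-4 to the second.
```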
- forward(loss: Tensor, constraints: Tensor) Tensor
Computes the Penalty-Barrier Lagrangian value for the given loss and constraints.
- Parameters:
loss (torch.Tensor) – Loss (objective function) value.
constraints (torch.Tensor) – Tensor of constraint violations.
- Returns:
Penalty-Barrier Lagrangian value.
- Return type:
torch.Tensor
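The value returned by forward can be pictured with a framework-free sketch. It assumes the textbook quadratic-logarithmic penalty-barrier function of Ben-Tal and Zibulevsky and a Lagrangian of the form loss + sum_i u_i * p_i * phi(g_i / p_i), where u_i are the duals and p_i the penalties. Neither formula is stated in this reference, so the library's implementation may differ; the function names are hypothetical.

```python
import math
from typing import Sequence


def quadratic_logarithmic(t: float) -> float:
    """Textbook quadratic-logarithmic penalty-barrier function (assumed form).

    Quadratic for t >= -1/2 and logarithmic below; the two branches agree
    in value, first and second derivative at t = -1/2.
    """
    if t >= -0.5:
        return t + 0.5 * t * t
    return -0.25 * math.log(-2.0 * t) - 0.375


def pb_lagrangian(
    loss: float,
    constraints: Sequence[float],
    duals: Sequence[float],
    penalties: Sequence[float],
) -> float:
    """loss + sum_i u_i * p_i * phi(g_i / p_i), under the assumptions above."""
    barrier = sum(
        u * p * quadratic_logarithmic(g / p)
        for g, u, p in zip(constraints, duals, penalties)
    )
    return loss + barrier


# A violated constraint (g > 0) raises the value above the plain loss,
# a satisfied one (g < 0) lowers it.
violated = pb_lagrangian(1.0, [0.5], [1.0], [1.0])
satisfied = pb_lagrangian(1.0, [-0.5], [1.0], [1.0])
```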
- forward_update(loss: Tensor, constraints: Tensor) Tensor
Evaluates the Penalty-Barrier Lagrangian and updates the dual variables and penalties.
Combines the computation of the Lagrangian and the update of dual variables and penalties in a single step.
- Parameters:
loss (torch.Tensor) – Loss (objective function) value.
constraints (torch.Tensor) – Tensor of constraint violations.
- Returns:
Penalty-Barrier Lagrangian value.
- Return type:
torch.Tensor
- load_state_dict(state_dict: dict[str, Any]) None
Loads the optimizer state from a dictionary, including ranges and all constraint groups.
- Parameters:
state_dict (dict[str, Any]) – Dictionary containing optimizer state (as returned by state_dict).
- Returns:
None
- Return type:
None
- property penalties: Tensor
Returns all penalty variables concatenated from all constraint groups.
- Returns:
Penalties, concatenated into a single tensor.
- Return type:
Tensor
- state_dict() dict[str, Any]
Returns the state of the optimizer as a dictionary, including dual and penalty ranges and all constraint groups.
- Returns:
Dictionary containing optimizer state with param groups and configuration.
- Return type:
dict[str, Any]
- step(constraints: Tensor) None
Updates the dual variables and penalties based on the current constraint violations.
- Parameters:
constraints (torch.Tensor) – Tensor of constraint violations.
- Returns:
None
- Return type:
None
- update(constraints: Tensor) None
Updates the dual variables and penalties based on the current constraint violations.
- Parameters:
constraints (torch.Tensor) – Tensor of constraint violations.
- Returns:
None
- Return type:
None
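For intuition about the dual update performed by step and update, the sketch below uses the classical penalty-barrier multiplier rule u_i <- u_i * phi'(g_i / p_i), followed by clamping to dual_range. This is the textbook rule for the quadratic-logarithmic function, used here as an assumption; the reference does not spell out the library's rule, and the effect of gamma and of the penalty update strategies is deliberately left out. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def pbf_derivative(t: float) -> float:
    """Derivative of the textbook quadratic-logarithmic function (assumed form)."""
    if t >= -0.5:
        return 1.0 + t
    return -1.0 / (4.0 * t)


def update_duals(
    duals: Sequence[float],
    constraints: Sequence[float],
    penalties: Sequence[float],
    dual_range: Tuple[float, float] = (0.0001, 100.0),
) -> List[float]:
    """Multiply each dual by phi'(g / p), then clamp it into dual_range."""
    lo, hi = dual_range
    return [
        min(max(u * pbf_derivative(g / p), lo), hi)
        for u, g, p in zip(duals, constraints, penalties)
    ]


# Violated constraint (g = 0.5): the dual grows.
# Satisfied constraints (g = -0.25 and g = -2.0): the duals shrink.
new_duals = update_duals([1.0, 1.0, 1.0], [0.5, -0.25, -2.0], [1.0, 1.0, 1.0])
# new_duals == [1.5, 0.75, 0.125]
```

Because the multiplier phi' is always positive, the duals stay positive before clamping, which is consistent with a strictly positive lower bound in dual_range.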
- update_penalties(constraints: Tensor) None
Updates penalties according to the specified penalty update strategy for each constraint group.
- Parameters:
constraints (torch.Tensor) – Tensor of constraint violations.
- Returns:
None
- Return type:
None