Bellman Periods
The bellman module turns block models into dynamic stochastic optimization
problems (DSOPs). Its module docstring below defines the meta-model notation
(arrival states, shocks, pre-decision states, controls, and rewards within a
period) used throughout the library, and BellmanPeriod is the central object
consumed by the neural network components and loss functions.
Dynamic Stochastic Optimization Problems (DSOPs) built on Block models.
Bellman timing within a period:
[arrival] + [shock] -> [pre] -> [control] -> [post] -> [arrival’]
- [arrival] s: state on arrival, before any shock
- [shock] e: exogenous random variable
- [pre] m: pre-decision state (the control’s iset)
- [control] c: chosen by the decision rule on m
- [post] post-transition output: the bag of variables realized in the period (m, c, u, b, s’), returned by BellmanPeriod.post_function()
- [arrival’] s’: next-period arrival state
Reward u and discount b are realized between [control] and
[arrival’].
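A minimal sketch of this timing in plain Python, independent of skagent. The toy consumption-saving model, the function name run_period, the key a_next, and the parameters R and beta are illustrative assumptions, not part of the library.

```python
import math


def run_period(s, e, rule, R=1.03, beta=0.96):
    """One period of a toy consumption-saving model (hypothetical helper)."""
    a, y = s["a"], e["y"]   # [arrival] state and [shock]
    m = R * a + y           # [pre] pre-decision state (the control's iset)
    c = rule(m)             # [control] chosen by the decision rule on m
    u = math.log(c)         # reward, realized after the control
    b = beta                # discount, realized in the same window
    a_next = m - c          # [arrival'] next-period arrival state
    # [post] the bag of variables realized in the period
    return {"m": m, "c": c, "u": u, "b": b, "a_next": a_next}


post = run_period({"a": 1.0}, {"y": 0.5}, rule=lambda m: 0.5 * m)
```

The returned dict plays the role of the post-transition bag: it carries the pre-decision state, the control, the reward, the discount, and the next arrival state together.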
State-variable naming (long / short / informal):
- pre-decision: pre_decision_state / pre_state / iset
The Bellman-timing distinction between the post-decision state (a
single timing point) and the post-transition bag is conflated in
post_function for now, and will be split in a future PR.
A _rule is a user-supplied callable on pre-decision variables; a
_function is a callable on arrival states. Module-level df
and vf are the decision and value callables; each accepts a
single Callable or a dict[str, Callable] (df keyed by
control symbol; vf by agent name).
- class skagent.bellman.BellmanPeriod(block, discount_variable, calibration, decision_rules=None)
A class representing a period of a Bellman or Dynamic Stochastic Optimization Problem.
This class wraps a Block model with calibration parameters and decision rules, providing methods for computing transitions, decisions, rewards, and their gradients.
- Parameters:
block (Block) – The underlying block model containing dynamics, shocks, and reward definitions.
discount_variable (str) – A variable name which represents the discount factor for future value streams.
calibration (dict[str, Any]) – Dictionary of calibration parameters for the model.
decision_rules (dict[str, Callable] | None) – Dictionary mapping control variable names to decision rule functions.
- block
The underlying block model.
- Type:
Block
Notes
Future versions may introduce an abstract base class to support different block types beyond DBlock/RBlock.
- compute_controls(df, states, *, shocks=None, parameters=None)
Compute control variable values from a decision function or decision rules.
This generalises decision_function to also accept an external callable with signature df(states, shocks, parameters) -> controls.
- Parameters:
df (Union[dict[str, Callable], Callable]) – A callable decision function, or a dict of decision rules passed through to decision_function.
shocks (dict[str, Any] | None) – Current shock realizations (defaults to empty dict).
parameters (dict[str, Any] | None) – Model parameters (defaults to instance calibration).
- Returns:
Control variable values.
- Return type:
- Raises:
TypeError – If df is neither callable nor a dict.
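The dispatch described above can be sketched as follows. This is not the library's implementation: compute_controls_sketch is a hypothetical stand-in, and the dict branch assumes every rule takes the single pre-decision variable m, already present in states.

```python
from typing import Any, Callable, Optional, Union


def compute_controls_sketch(
    df: Union[dict, Callable],
    states: dict,
    shocks: Optional[dict] = None,
    parameters: Optional[dict] = None,
) -> dict:
    """Hypothetical stand-in showing callable-vs-dict dispatch."""
    shocks = {} if shocks is None else shocks
    parameters = {} if parameters is None else parameters
    if callable(df):
        # External decision function: df(states, shocks, parameters) -> controls
        return df(states, shocks, parameters)
    if isinstance(df, dict):
        # Assumption: each rule is a callable on the pre-decision variable "m"
        return {sym: rule(states["m"]) for sym, rule in df.items()}
    raise TypeError(f"df must be callable or a dict, got {type(df).__name__}")


from_callable = compute_controls_sketch(
    lambda s, e, p: {"c": 0.5 * s["m"]}, {"m": 2.0}
)
from_rules = compute_controls_sketch({"c": lambda m: 0.25 * m}, {"m": 2.0})
```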
- compute_pre_state(control_sym, states, *, shocks=None, parameters=None)
Return the pre-decision state values for control_sym as {var: value}. The variables are those in the control’s information set (iset).
If the pre-decision state variables are already in states or shocks, they are taken from there directly. Otherwise block dynamics are run from arrival states up to the control to produce them.
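A rough sketch of that two-branch lookup in plain Python. The helper pre_state_sketch and the dynamics mapping are hypothetical; the sketch assumes dynamics holds only the equations that precede the control, in evaluation order.

```python
def pre_state_sketch(iset, states, shocks, dynamics):
    """Return {var: value} for each variable in a control's iset."""
    known = {**states, **shocks}
    # Branch 1: every iset variable is already supplied, so take it directly.
    if all(var in known for var in iset):
        return {var: known[var] for var in iset}
    # Branch 2: run the dynamics forward from the arrival states.
    for var, equation in dynamics.items():
        if var not in known:
            known[var] = equation(known)
    return {var: known[var] for var in iset}


# Toy dynamics (assumed): cash-on-hand m = R * a + y with R = 1.03
dynamics = {"m": lambda v: 1.03 * v["a"] + v["y"]}
direct = pre_state_sketch(["m"], {"m": 2.0}, {}, dynamics)
derived = pre_state_sketch(["m"], {"a": 1.0}, {"y": 0.5}, dynamics)
```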
- compute_value(vf, states, *, shocks=None, parameters=None, agent=None)
Compute value-function output at states, parallel to compute_controls().
Accepts a single callable, or a dict {agent: callable} from which agent selects an entry. The selected callable receives arrival states; any pre-decision (iset) computation it needs is the callable’s responsibility.
- decision_function(states, *, shocks=None, parameters=None, decision_rules=None)
Compute control variable values from decision rules.
- Parameters:
- Returns:
Control variable values computed from decision rules.
- Return type:
- get_arrival_states(calibration=None)
Get arrival state variable names for given calibration.
- get_reward_sym(agent=None)
Return the first reward symbol for agent (or any agent if agent is None).
If multiple reward symbols match, only the first is returned. Models with multiple rewards per agent are not currently supported.
- Raises:
ValueError – If no reward variables match the given agent.
- Parameters:
- Return type:
- get_reward_syms(agent=None)
Return all reward symbols for agent (or all agents if agent is None).
- Parameters:
agent (str | None) – If specified, only return reward symbols for this agent.
- Raises:
ValueError – If no reward variables match the given agent.
- Return type:
- grad_pre_state_function(states, wrt, *, shocks=None, parameters=None, control_sym=None, create_graph=False)
Compute gradients of pre-decision state variables with respect to arrival states.
This computes ∂m/∂s for each pre-decision state variable m and each arrival state s specified in wrt. This is needed for the envelope condition in dynamic programming, where the marginal value of an arrival state depends on how that state transforms through the dynamics before reaching the control.
The “pre-decision state” (or “pre-state”) is the state that exists immediately before the control decision is made. For example, cash-on-hand m = Ra + y is the pre-state before consumption c is chosen.
- By the envelope theorem:
V’(s) = u’(c) * ∂m/∂s
where m is the pre-decision state variable that the control depends on.
- For example, in a consumption-saving model with m = a*R + y:
∂m/∂a = R (the return on assets)
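The consumption-saving example can be checked numerically. The sketch below uses central finite differences in plain Python rather than autograd, purely to stay dependency-free; grad_pre_state_fd, pre_fns, and the extra state z are hypothetical names introduced for illustration.

```python
def grad_pre_state_fd(pre_fns, states, wrt, shocks, h=1e-6):
    """Central differences of each pre-state variable w.r.t. each arrival state."""
    grads = {}
    for m_sym, fn in pre_fns.items():
        grads[m_sym] = {}
        for s_sym in wrt:
            up = {**states, s_sym: states[s_sym] + h}
            down = {**states, s_sym: states[s_sym] - h}
            grads[m_sym][s_sym] = (fn(up, shocks) - fn(down, shocks)) / (2 * h)
    return grads


R = 1.03
# Pre-decision state m = a * R + y; it does not depend on the extra state z.
pre_fns = {"m": lambda s, e: R * s["a"] + e["y"]}
grads = grad_pre_state_fd(pre_fns, {"a": 1.0, "z": 3.0}, ["a", "z"], {"y": 0.5})
```

The result has the nested shape {pre_state_var: {state_sym: gradient}}: the entry for a is approximately R, and the entry for z is zero because m does not depend on it.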
- Parameters:
states (dict[str, Any]) – Arrival state variables (with requires_grad=True for gradient computation).
wrt (dict[str, Tensor]) – Dictionary of arrival states to compute gradients with respect to. Keys are variable names, values are tensors with requires_grad=True.
shocks (dict[str, Any] | None) – Shock variables (defaults to empty dict).
parameters (dict[str, Any] | None) – Model parameters (defaults to instance calibration).
control_sym (str | None) – Name of the control variable whose info-set we want gradients for. If None, uses the first control found in the block.
create_graph (bool) – If True, the graph of the derivative is constructed, allowing higher-order derivatives and end-to-end training through the gradient computation.
- Returns:
Nested dictionary of gradients for each pre-state variable and arrival state: {pre_state_var: {state_sym: gradient}}. The gradient is a zero tensor when the pre-state variable does not depend on the arrival state.
- Return type:
- grad_reward_function(states, controls, wrt, *, shocks=None, parameters=None, agent=None, decision_rules=None, create_graph=False)
Compute gradients of reward function with respect to specified variables.
- Parameters:
wrt (dict[str, Tensor]) – Dictionary of variables to compute gradients with respect to. Keys are variable names, values are tensors with requires_grad=True.
shocks (dict[str, Any] | None) – Shock variables (defaults to empty dict).
parameters (dict[str, Any] | None) – Model parameters (defaults to instance calibration).
agent (str | None) – If specified, only compute gradients for rewards belonging to this agent.
decision_rules (dict[str, Callable] | None) – Decision rules of control variables that will _not_ be given to the function.
create_graph (bool) – If True, the graph of the derivative is constructed, allowing higher-order derivatives and end-to-end training through the gradient computation.
- Returns:
Nested dictionary of gradients for each reward symbol and variable: {reward_sym: {var_name: gradient}}. The gradient is a zero tensor when the reward does not depend on the variable.
- Return type:
- grad_transition_function(states, controls, wrt, *, shocks=None, parameters=None, decision_rules=None, create_graph=False)
Compute gradients of transition function with respect to specified variables.
This computes ∂s_{t+1}/∂x for each arrival state s_{t+1} and each variable x specified in wrt. This is needed for Euler equations where the gradient of future states with respect to current controls appears (e.g., ∂a_{t+1}/∂c_t = -1 for the budget constraint a_{t+1} = m_t - c_t).
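As a worked instance of the budget-constraint example, assume (for illustration only) that the pre-decision state is cash-on-hand \(m_t = R a_t + y_t\) and that the control \(c_t\) is held fixed when differentiating with respect to \(a_t\):
\[a_{t+1} = m_t - c_t = R a_t + y_t - c_t\]
\[\frac{\partial a_{t+1}}{\partial c_t} = -1, \qquad \frac{\partial a_{t+1}}{\partial a_t} = R, \qquad \frac{\partial a_{t+1}}{\partial y_t} = 1\]
Under these assumptions, asking for gradients with respect to c and a would give a nested result of the form {a: {c: -1, a: R}}.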
- Parameters:
wrt (dict[str, Tensor]) – Dictionary of variables to compute gradients with respect to. Keys are variable names, values are tensors with requires_grad=True.
shocks (dict[str, Any] | None) – Shock variables (defaults to empty dict).
parameters (dict[str, Any] | None) – Model parameters (defaults to instance calibration).
decision_rules (dict[str, Callable] | None) – Decision rules of control variables that will _not_ be given to the function.
create_graph (bool) – If True, the graph of the derivative is constructed, allowing higher-order derivatives and end-to-end training through the gradient computation.
- Returns:
Nested dictionary of gradients for each arrival state and variable: {state_sym: {var_name: gradient}}. The gradient is a zero tensor when the arrival state does not depend on the variable.
- Return type:
- post_function(states, controls, *, shocks=None, parameters=None, agent=None, decision_rules=None)
Return the full ex post variables for the period.
- Parameters:
controls (dict[str, Any]) – Current control variable values.
shocks (dict[str, Any] | None) – Current shock realizations (defaults to empty dict).
parameters (dict[str, Any] | None) – Model parameters (defaults to instance calibration).
agent (str | None) – Agent identifier (currently unused, reserved for future use).
decision_rules (dict[str, Callable] | None) – Decision rules (defaults to instance decision_rules).
- Returns:
All computed variables from the block transition.
- Return type:
- resolve_discount_factor(post)
Return post[self.discount_variable], raising KeyError with a diagnostic message if the discount variable is missing. Expects the post-transition output returned by post_function().
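A minimal sketch of the same lookup-with-diagnostic pattern as a free function. The name resolve_discount_factor_sketch and the wording of the error message are assumptions; the library's actual message may differ.

```python
def resolve_discount_factor_sketch(post, discount_variable):
    """Look up the discount factor in a post-transition bag (hypothetical)."""
    try:
        return post[discount_variable]
    except KeyError:
        # Re-raise with the available keys so the missing name is easy to spot.
        raise KeyError(
            f"discount variable {discount_variable!r} not found in "
            f"post-transition output; available keys: {sorted(post)}"
        ) from None


beta = resolve_discount_factor_sketch({"m": 1.5, "c": 0.7, "beta": 0.96}, "beta")
```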
- reward_function(states, controls, *, shocks=None, parameters=None, agent=None, decision_rules=None)
Compute reward values for the current period.
- Parameters:
controls (dict[str, Any]) – Current control variable values.
shocks (dict[str, Any] | None) – Current shock realizations (defaults to empty dict).
parameters (dict[str, Any] | None) – Model parameters (defaults to instance calibration).
agent (str | None) – If specified, only return rewards for this agent.
decision_rules (dict[str, Callable] | None) – Decision rules (defaults to instance decision_rules).
- Returns:
Reward values for the period.
- Return type:
- transition_function(states, controls, *, shocks=None, parameters=None, decision_rules=None)
Compute the transition to next-period arrival states.
- Parameters:
controls (dict[str, Any]) – Current control variable values.
shocks (dict[str, Any] | None) – Current shock realizations (defaults to empty dict).
parameters (dict[str, Any] | None) – Model parameters (defaults to instance calibration).
decision_rules (dict[str, Callable] | None) – Decision rules (defaults to instance decision_rules).
- Returns:
Next-period arrival state values.
- Return type:
- skagent.bellman.estimate_bellman_foc_residual(bellman_period, vf, df, states_t, shocks, parameters=None, agent=None)
Compute the first-order condition (FOC) residual from the Bellman equation.
The Bellman equation is:
\[V(s) = \max_c \{ u(s,c,\varepsilon) + \beta E_{\varepsilon'}[V(s')] \}\]
The FOC w.r.t. each control \(c_j\) is:
\[\frac{\partial u}{\partial c_j} + \beta \sum_s \frac{\partial V(s')}{\partial s'_s} \cdot \frac{\partial s'_s}{\partial c_j} = 0\]
Adding a weighted FOC term to the Bellman loss improves convergence (Maliar et al. 2021, equation 14).
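A hedged numerical illustration of the FOC, using a deterministic cake-eating model (s' = s - c, u = log c) that is an assumption of this sketch and not a skagent model. The value function is differentiated directly, here by central finite differences instead of autograd; foc_residual_sketch, vf, and BETA are hypothetical names.

```python
import math

BETA = 0.9
B = 1.0 / (1.0 - BETA)


def vf(s):
    # Exact value function of the toy model, up to an additive constant
    # (the constant does not affect the derivative used below).
    return B * math.log(s)


def foc_residual_sketch(vf, df, s, h=1e-6):
    c = df(s)
    s_next = s - c                      # transition, so ds'/dc = -1
    du_dc = 1.0 / c                     # marginal reward of u(c) = log c
    dv_ds = (vf(s_next + h) - vf(s_next - h)) / (2 * h)  # dV/ds' directly
    return du_dc + BETA * dv_ds * (-1.0)


res_opt = foc_residual_sketch(vf, lambda s: (1.0 - BETA) * s, 2.0)
res_bad = foc_residual_sketch(vf, lambda s: 0.5 * s, 2.0)
```

With the optimal rule c = (1 - BETA) * s the residual is zero up to finite-difference error; consuming half the cake instead gives a clearly nonzero residual.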
Unlike estimate_euler_residual(), which replaces \(V'(s')\) with the envelope condition \(u'(c') \cdot \partial m'/\partial s'\) (where \(m'\) is the next-period pre-decision state), this function differentiates the value callable directly.
- Parameters:
bellman_period (BellmanPeriod) – The Bellman period.
vf (Union[dict[str, Callable], Callable]) – Value function on arrival states; either a single callable or a per-agent dict. See estimate_bellman_residual() for the full contract.
df (Union[dict[str, Callable], Callable]) – Decision callable on arrival states, or dict of decision rules.
states_t (dict[str, Any]) – Current arrival state variables.
shocks (dict[str, Any]) – Shock realizations with {sym}_0 and {sym}_1 keys.
- Returns:
Mapping from each control symbol to its FOC residual tensor.
- Return type:
- skagent.bellman.estimate_bellman_residual(bellman_period, vf, df, states_t, shocks, parameters=None, agent=None)
Compute the Bellman equation residual for given states and shocks.
The Bellman equation is:
\[V(s) = \max_c \{ u(s,c,\varepsilon) + \beta E_{\varepsilon'}[V(s')] \}\]
This function computes the residual:
\[f = V(s) - [u(s,c,\varepsilon) + \beta V(s')]\]
where \(s' = g(s,c,\varepsilon)\) is the next-period arrival state given by the transition function \(g\), and \(V(s')\) is evaluated at a specific future shock realization \(\varepsilon'\).
- Parameters:
bellman_period (BellmanPeriod) – The Bellman period with transitions, rewards, etc. The discount factor is extracted from the post-transition variables via bellman_period.discount_variable.
vf (Union[dict[str, Callable], Callable]) – Value function vf(states_t, shocks_t, parameters) -> tensor on arrival states, or a dict mapping agent name to such a callable for multi-agent models (in which case agent must be specified). Any pre-decision (iset) computation the underlying approximator needs is the callable’s responsibility.
df (Union[dict[str, Callable], Callable]) – Decision callable df(states_t, shocks_t, parameters) -> controls_t on arrival states, or a dict of decision rules keyed by control symbol (callables on the iset).
states_t (dict[str, Any]) – Current arrival state variables.
shocks (dict[str, Any]) – Shock realizations for both periods:
  - {shock_sym}_0: period t shocks (for immediate reward and transitions)
  - {shock_sym}_1: period t+1 shocks (for continuation value evaluation)
parameters (dict[str, Any] | None) – Model parameters for calibration (defaults to empty dict).
- Returns:
Bellman equation residual.
- Return type:
- Raises:
ValueError – If no reward variables are found in the block.
KeyError – If required shock variables are missing from the shocks dict.
Notes
Single-reward, multi-control: this function returns a single residual tensor, evaluated against the first reward variable matching agent. For multi-control models it complements estimate_euler_residual() (which returns one residual per control) and estimate_bellman_foc_residual() (which returns one FOC residual per control by differentiating the value callable).
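To make the residual concrete, here is a small sketch on a deterministic cake-eating model (s' = s - c, u = log c) whose value function is known in closed form. The model, the constants A and B, and the helper bellman_residual_sketch are assumptions for illustration; because the toy model has no shocks, the {shock_sym}_0 and {shock_sym}_1 keys do not appear.

```python
import math

BETA = 0.9
B = 1.0 / (1.0 - BETA)
A = (math.log(1.0 - BETA) + BETA * B * math.log(BETA)) / (1.0 - BETA)


def vf(s):
    # Closed-form value function of the toy model: V(s) = A + B * log(s)
    return A + B * math.log(s)


def bellman_residual_sketch(vf, df, s):
    c = df(s)               # control from the decision callable
    u = math.log(c)         # reward u(c) = log c
    s_next = s - c          # next-period arrival state
    return vf(s) - (u + BETA * vf(s_next))


res_opt = bellman_residual_sketch(vf, lambda s: (1.0 - BETA) * s, 2.0)
res_bad = bellman_residual_sketch(vf, lambda s: 0.5 * s, 2.0)
```

The residual vanishes when both the value function and the decision rule are correct. With the true V and a suboptimal rule it is positive, since V(s) is the maximum over feasible controls.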
- skagent.bellman.estimate_discounted_lifetime_reward(bellman_period, dr, states_0, big_t, shocks_by_t=None, parameters=None, agent=None)
Compute the discounted lifetime reward of a model by simulating it forward for a fixed number of periods (big_t).
Based on Maliar, Maliar, and Winant (2021, JME).
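The forward simulation can be sketched in plain Python for a single sample path. The toy model (m = R * a + y, u = log c, a' = m - c), a constant discount factor, discounting that starts at 1 in the first period, and the name lifetime_reward_sketch are all assumptions of this sketch.

```python
import math


def lifetime_reward_sketch(dr, a0, big_t, y_by_t, R=1.03, beta=0.96):
    """Sum of discounted rewards over big_t simulated periods (hypothetical)."""
    a, total, discount = a0, 0.0, 1.0
    for t in range(big_t):
        m = R * a + y_by_t[t]        # pre-decision state from state and shock
        c = dr(m)                    # control from the decision rule
        total += discount * math.log(c)
        discount *= beta             # compound the discount for the next period
        a = m - c                    # next-period arrival state
    return total


# Consume all cash-on-hand each period, with income e so that u = 1 every period.
total = lifetime_reward_sketch(dr=lambda m: m, a0=0.0, big_t=3, y_by_t=[math.e] * 3)
```

Here y_by_t has length big_t along its first axis, mirroring the shape requirement on shocks_by_t.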
- Parameters:
bellman_period (BellmanPeriod) – The Bellman period object containing the model. The discount factor is extracted from the post-transition variables via bellman_period.discount_variable.
dr (Union[dict[str, Callable], Callable]) – Decision rules (dict of functions), or a decision function that returns the decisions given states, shocks, and parameters.
states_0 (dict[str, Any]) – Initial states as a dictionary mapping symbols to values. Both scalar and vector values are supported.
big_t (int) – Number of time steps to simulate forward.
shocks_by_t (dict[str, Any] | None) – Dictionary mapping shock symbols to arrays of shock values at each time period. The first axis must have length big_t; remaining axes are batch dimensions (e.g., shape (big_t, n_samples)).
parameters (dict[str, Any] | None) – Calibration parameters (defaults to empty dict).
agent (str | None) – Name of reference agent for rewards. If None, all rewards are summed.
- Returns:
The total discounted lifetime reward.
- Return type:
- skagent.bellman.estimate_euler_residual(bellman_period, df, states_t, shocks, parameters=None, agent=None, controls_t=None)
Compute the Euler equation residual for given states and shocks.
The Euler equation is the first-order condition from the Bellman equation, relating marginal rewards across periods. For each control variable \(c_j\), this function computes the residual:
\[f_j = u'(c_{j,t}) + \beta \cdot u'(c_{j,t+1}) \cdot \sum_s \left[ \frac{\partial s_{t+1}}{\partial c_{j,t}} \cdot \frac{\partial m'_j}{\partial s_{t+1}} \right]\]
At optimality \(f_j = 0\) for every control \(j\).
The discount factor \(\beta\) is obtained from the model via bellman_period.discount_variable.
Following Maliar et al. (2021, JME) Definition 2.7, this function uses two independent shock realizations (the all-in-one, or AiO, expectation operator).
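A sketch of the residual formula on a one-control toy model, assuming m = R * a + y, a' = m - c, and u = log c (so \(\partial a'/\partial c = -1\) and \(\partial m'/\partial a' = R\)). The helper euler_residual_sketch and the shock symbol y are hypothetical; only the {sym}_0 / {sym}_1 key convention is taken from the text above.

```python
def euler_residual_sketch(df, a, shocks, R=1.03, beta=0.96):
    """Euler residual for one control in a toy consumption-saving model."""
    m_t = R * a + shocks["y_0"]           # period-t pre-decision state
    c_t = df(m_t)
    a_next = m_t - c_t                    # da'/dc = -1
    m_next = R * a_next + shocks["y_1"]   # dm'/da' = R
    c_next = df(m_next)
    marginal_u = lambda c: 1.0 / c        # u'(c) for u(c) = log c
    return marginal_u(c_t) + beta * marginal_u(c_next) * (-1.0) * R


rule = lambda m: (1.0 - 0.96) * m         # optimal when there is no income
res_zero = euler_residual_sketch(rule, 1.0, {"y_0": 0.0, "y_1": 0.0})
res_shock = euler_residual_sketch(rule, 1.0, {"y_0": 0.0, "y_1": 1.0})
```

Without income the rule c = (1 - beta) * m satisfies the Euler equation exactly, so the residual is zero up to rounding. An unanticipated period t+1 income shock raises next-period consumption and makes the residual positive.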
- Parameters:
bellman_period (BellmanPeriod) – The Bellman period with transitions, rewards, etc. The discount factor is extracted from the post-transition variables via bellman_period.discount_variable.
df (Union[dict[str, Callable], Callable]) – Decision function or dict of decision rules.
states_t (dict[str, Any]) – Current state variables (arrival states).
shocks (dict[str, Any]) – Shock realizations for both periods ({sym}_0 and {sym}_1).
parameters (dict[str, Any] | None) – Model parameters for calibration.
controls_t (dict[str, Any] | None) – Pre-computed period-t controls. When provided, the function skips its internal compute_controls call for period t. This is used by EulerEquationLoss to share the same control tensors between the residual computation and the constraint slack computation.
- Returns:
Mapping from each control symbol to its Euler residual tensor.
- Return type: