Bellman Periods¶

The bellman module turns block models into dynamic stochastic optimization problems (DSOPs). Its module docstring below defines the meta-model notation (arrival states, shocks, pre-decision states, controls, and rewards within a period) used throughout the library, and BellmanPeriod is the central object consumed by the neural network components and loss functions.

Dynamic Stochastic Optimization Problems (DSOPs) built on Block models.

Bellman timing within a period:

[arrival] + [shock] -> [pre] -> [control] -> [post] -> [arrival’]

[arrival]   s   state on arrival, before any shock
[shock]     e   exogenous random variable
[pre]       m   pre-decision state (the control’s iset)
[control]   c   chosen by the decision rule on m
[post]          post-transition output: the bag of variables realized in the period (m, c, u, b, s’), returned by BellmanPeriod.post_function()
[arrival’]  s’  next-period arrival state

Reward u and discount b are realized between [control] and [arrival’].
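
The timing can be traced by hand for a single period. The sketch below assumes a hypothetical consumption-saving model (m = R*a + y, a' = m - c, log utility); run_period, its arguments, and the key a_next are illustrative names and are not part of skagent.

```python
import math

def run_period(a, y, rule, R=1.03, beta=0.96):
    """Walk through one period in Bellman timing order (toy model, not skagent)."""
    m = R * a + y        # [pre]: pre-decision state built from [arrival] a and [shock] y
    c = rule(m)          # [control]: decision rule evaluated on the iset (here just m)
    u = math.log(c)      # reward, realized after the control is chosen
    a_next = m - c       # [arrival']: next-period arrival state
    # [post]: the bag of variables realized in the period
    return {"m": m, "c": c, "u": u, "b": beta, "a_next": a_next}

post = run_period(a=1.0, y=0.5, rule=lambda m: 0.5 * m)
```

The returned dictionary plays the role of the [post] bag: it holds the pre-decision state, the control, the reward, the discount, and the next arrival state together.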

State-variable naming (long / short / informal):

pre-decision: pre_decision_state / pre_state / iset

Bellman timing distinguishes the post-decision state (a single timing point) from the post-transition bag. post_function currently conflates the two; they will be split in a future PR.

A _rule is a user-supplied callable on pre-decision variables; a _function is a callable on arrival states. Module-level df and vf are the decision and value callables; each accepts a single Callable or a dict[str, Callable] (df keyed by control symbol; vf by agent name).

class skagent.bellman.BellmanPeriod(block, discount_variable, calibration, decision_rules=None)¶

A class representing a period of a Bellman or Dynamic Stochastic Optimization Problem.

This class wraps a Block model with calibration parameters and decision rules, providing methods for computing transitions, decisions, rewards, and their gradients.

Parameters:

block (Block) – The underlying block model containing dynamics, shocks, and reward definitions.
discount_variable (str) – A variable name which represents the discount factor for future value streams.
calibration (dict[str, Any]) – Dictionary of calibration parameters for the model.
decision_rules (dict[str, Callable] | None) – Dictionary mapping control variable names to decision rule functions.

block¶

The underlying block model.

Type: Block

discount_variable¶

The name of the discount factor variable.

Type: str

calibration¶

The calibration parameters.

Type: dict[str, Any]

decision_rules¶

The decision rules for control variables.

Type: dict[str, Callable] | None

arrival_states¶

The set of arrival state variable names.

Type: set[str]

Notes

Future versions may introduce an abstract base class to support different block types beyond DBlock/RBlock.

compute_controls(df, states, *, shocks=None, parameters=None)¶

Compute control variable values from a decision function or decision rules.

This generalises decision_function to also accept an external callable with signature df(states, shocks, parameters) -> controls.

Parameters:

df (Union[dict[str, Callable], Callable]) – A callable decision function, or a dict of decision rules passed through to decision_function.
states (dict[str, Any]) – Current state variables.
shocks (dict[str, Any] | None) – Current shock realizations (defaults to empty dict).
parameters (dict[str, Any] | None) – Model parameters (defaults to instance calibration).

Returns:

Control variable values.

Return type:

dict[str, Any]

Raises:

TypeError – If df is neither callable nor a dict.
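
The dispatch described above can be sketched in plain Python. This is a simplified stand-in, not the library's implementation: controls_from is a hypothetical helper, the rules here receive the states directly rather than the control's iset, and parameters defaults to an empty dict instead of an instance calibration.

```python
from typing import Any, Callable, Union

def controls_from(
    df: Union[Callable, dict],
    states: dict,
    shocks: Union[dict, None] = None,
    parameters: Union[dict, None] = None,
) -> dict:
    """Toy dispatch: accept either one decision function or a dict of rules."""
    shocks = {} if shocks is None else shocks
    parameters = {} if parameters is None else parameters
    if callable(df):
        # External decision function: df(states, shocks, parameters) -> controls
        return df(states, shocks, parameters)
    if isinstance(df, dict):
        # Simplification: each rule sees the states directly (assumption).
        return {sym: rule(states) for sym, rule in df.items()}
    raise TypeError("df must be a callable or a dict of decision rules")

by_function = controls_from(lambda s, e, p: {"c": 0.5 * s["m"]}, {"m": 2.0})
by_rules = controls_from({"c": lambda s: 0.5 * s["m"]}, {"m": 2.0})
```

Both calls yield the same controls dictionary, which is the point of accepting either form.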

compute_pre_state(control_sym, states, *, shocks=None, parameters=None)¶

Return the pre-decision state values for control_sym as {var: value}. The variables are those in the control’s information set (iset).

If the pre-decision state variables are already in states or shocks, they are taken from there directly. Otherwise block dynamics are run from arrival states up to the control to produce them.

Parameters:

control_sym (str)
states (dict[str, Any])
shocks (dict[str, Any] | None)
parameters (dict[str, Any] | None)

Return type:

dict[str, Any]
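
The lookup-or-compute behaviour can be illustrated as follows. This is a minimal sketch under stated assumptions: pre_state, the dynamics dictionary, and the constant R are hypothetical, and the toy dynamics contain only the equations that precede the control.

```python
R = 1.03  # hypothetical gross return used by the toy dynamics

def pre_state(iset, states, shocks, dynamics):
    """Toy lookup-or-compute for a control's information set (not skagent code)."""
    known = {**states, **shocks}
    if all(var in known for var in iset):
        # Every iset variable was supplied, so no dynamics need to run.
        return {var: known[var] for var in iset}
    # Otherwise run the dynamics in order, starting from the arrival states.
    for var, equation in dynamics.items():
        known[var] = equation(known)
    return {var: known[var] for var in iset}

dynamics = {"m": lambda v: R * v["a"] + v["y"]}
computed = pre_state(["m"], {"a": 1.0}, {"y": 0.5}, dynamics)  # m derived from a and y
supplied = pre_state(["m"], {"m": 2.0}, {}, dynamics)          # m taken as given
```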

compute_value(vf, states, *, shocks=None, parameters=None, agent=None)¶

Compute value-function output at states, parallel to compute_controls().

Accepts a single callable, or a dict {agent: callable} from which agent selects an entry. The selected callable receives arrival states; any pre-decision (iset) computation it needs is the callable’s responsibility.

Parameters:

vf (Union[dict[str, Callable], Callable])
states (dict[str, Any])
shocks (dict[str, Any] | None)
parameters (dict[str, Any] | None)
agent (str | None)

Return type:

Any
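
A sketch of the per-agent selection, assuming a hypothetical helper select_value_callable. The error raised when agent is missing is an assumption made for illustration; the documentation above does not specify it.

```python
def select_value_callable(vf, agent=None):
    """Toy selection of a value callable from one callable or a per-agent dict."""
    if callable(vf):
        return vf
    if agent is None:
        # Assumption: a dict of value functions needs an agent to choose from.
        raise ValueError("agent must be given when vf is a dict")
    return vf[agent]

# Value callables take arrival states, shocks, and parameters.
vf = {"consumer": lambda states, shocks, parameters: 2.0 * states["a"]}
value = select_value_callable(vf, agent="consumer")({"a": 1.5}, {}, {})
```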

decision_function(states, *, shocks=None, parameters=None, decision_rules=None)¶

Compute control variable values from decision rules.

Parameters:

states (dict[str, Any]) – Current state variables.
shocks (dict[str, Any] | None) – Current shock realizations (defaults to empty dict).
parameters (dict[str, Any] | None) – Model parameters (defaults to instance calibration).
decision_rules (dict[str, Callable] | None) – Decision rules (defaults to instance decision_rules).

Returns:

Control variable values computed from decision rules.

Return type:

dict[str, Any]

get_arrival_states(calibration=None)¶

Get arrival state variable names for given calibration.

Parameters: calibration (dict[str, Any] | None)
Return type: set[str]

get_controls()¶

Get control variables from the block.

Return type: dict[str, Any]

get_reward_sym(agent=None)¶

Return the first reward symbol for agent (or any agent if agent is None).

If multiple reward symbols match, only the first is returned. Models with multiple rewards per agent are not currently supported.

Parameters: agent (str | None)
Raises: ValueError – If no reward variables match the given agent.
Return type: str

get_reward_syms(agent=None)¶

Return all reward symbols for agent (or all agents if agent is None).

Parameters: agent (str | None) – If specified, only return reward symbols for this agent.
Raises: ValueError – If no reward variables match the given agent.
Return type: list[str]

get_shocks()¶

Get shock distributions from the block.

Return type: dict[str, Any]

grad_pre_state_function(states, wrt, *, shocks=None, parameters=None, control_sym=None, create_graph=False)¶

Compute gradients of pre-decision state variables with respect to arrival states.

This computes ∂m/∂s for each pre-decision state variable m and each arrival state s specified in wrt. This is needed for the envelope condition in dynamic programming, where the marginal value of an arrival state depends on how that state transforms through the dynamics before reaching the control.

The “pre-decision state” (or “pre-state”) is the state that exists immediately before the control decision is made. For example, cash-on-hand m = a*R + y is the pre-state before consumption c is chosen.

By the envelope theorem: V’(s) = u’(c) * ∂m/∂s

where m is the pre-decision state variable that the control depends on.

For example, in a consumption-saving model with m = a*R + y: ∂m/∂a = R (the return on assets)

Parameters:

states (dict[str, Any]) – Arrival state variables (with requires_grad=True for gradient computation).
wrt (dict[str, Tensor]) – Dictionary of arrival states to compute gradients with respect to. Keys are variable names, values are tensors with requires_grad=True.
shocks (dict[str, Any] | None) – Shock variables (defaults to empty dict).
parameters (dict[str, Any] | None) – Model parameters (defaults to instance calibration).
control_sym (str | None) – Name of the control variable whose info-set we want gradients for. If None, uses the first control found in the block.
create_graph (bool) – If True, the graph of the derivative is constructed, allowing higher-order derivatives and end-to-end training through the gradient computation.

Returns:

Nested dictionary of gradients for each pre-state variable and arrival state: {pre_state_var: {state_sym: gradient}}. The gradient is a zero tensor when the pre-state variable does not depend on the arrival state.

Return type:

dict[str, dict[str, Tensor]]
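
The shape of the result can be seen in a toy version that replaces autograd with central differences. Everything here is an assumption made for illustration: pre_state_gradients is a hypothetical helper, wrt is a list of names rather than a dict of tensors, and the model is m = a*R + y with an extra arrival state p that m does not depend on.

```python
def pre_state_gradients(states, wrt, shocks, R=1.03, h=1e-6):
    """Central-difference stand-in for autograd: d(pre-state) / d(arrival state)."""
    def m_of(values):
        return values["a"] * R + shocks["y"]   # toy pre-state equation m = a*R + y

    grads = {"m": {}}
    for sym in wrt:
        up = {**states, sym: states[sym] + h}
        down = {**states, sym: states[sym] - h}
        grads["m"][sym] = (m_of(up) - m_of(down)) / (2 * h)
    return grads

grads = pre_state_gradients({"a": 1.0, "p": 2.0}, wrt=["a", "p"], shocks={"y": 0.5})
```

Here grads["m"]["a"] approximates R, and grads["m"]["p"] is zero because m does not depend on p, mirroring the zero-tensor convention described above.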

grad_reward_function(states, controls, wrt, *, shocks=None, parameters=None, agent=None, decision_rules=None, create_graph=False)¶

Compute gradients of reward function with respect to specified variables.

Parameters:

states (dict[str, Any]) – State variables.
controls (dict[str, Any]) – Control variables.
wrt (dict[str, Tensor]) – Dictionary of variables to compute gradients with respect to. Keys are variable names, values are tensors with requires_grad=True.
shocks (dict[str, Any] | None) – Shock variables (defaults to empty dict).
parameters (dict[str, Any] | None) – Model parameters (defaults to instance calibration).
agent (str | None) – If specified, only compute gradients for rewards belonging to this agent.
decision_rules (dict[str, Callable] | None) – Decision rules for control variables that are not given to the function.
create_graph (bool) – If True, the graph of the derivative is constructed, allowing higher-order derivatives and end-to-end training through the gradient computation.

Returns:

Nested dictionary of gradients for each reward symbol and variable: {reward_sym: {var_name: gradient}}. The gradient is a zero tensor when the reward does not depend on the variable.

Return type:

dict[str, dict[str, Tensor]]

grad_transition_function(states, controls, wrt, *, shocks=None, parameters=None, decision_rules=None, create_graph=False)¶

Compute gradients of transition function with respect to specified variables.

This computes ∂s_{t+1}/∂x for each arrival state s_{t+1} and each variable x specified in wrt. This is needed for Euler equations where the gradient of future states with respect to current controls appears (e.g., ∂a_{t+1}/∂c_t = -1 for the budget constraint a_{t+1} = m_t - c_t).

Parameters:

states (dict[str, Any]) – State variables.
controls (dict[str, Any]) – Control variables.
wrt (dict[str, Tensor]) – Dictionary of variables to compute gradients with respect to. Keys are variable names, values are tensors with requires_grad=True.
shocks (dict[str, Any] | None) – Shock variables (defaults to empty dict).
parameters (dict[str, Any] | None) – Model parameters (defaults to instance calibration).
decision_rules (dict[str, Callable] | None) – Decision rules for control variables that are not given to the function.
create_graph (bool) – If True, the graph of the derivative is constructed, allowing higher-order derivatives and end-to-end training through the gradient computation.

Returns:

Nested dictionary of gradients for each arrival state and variable: {state_sym: {var_name: gradient}}. The gradient is a zero tensor when the arrival state does not depend on the variable.

Return type:

dict[str, dict[str, Tensor]]
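
The budget-constraint example can be checked numerically. The sketch below uses central differences instead of autograd; central_diff, transition, and point are hypothetical names, and the transition is the toy budget constraint a' = m - c.

```python
def central_diff(fn, point, sym, h=1e-6):
    """Approximate d fn / d point[sym] with a central difference."""
    up = {**point, sym: point[sym] + h}
    down = {**point, sym: point[sym] - h}
    return (fn(up) - fn(down)) / (2 * h)

# Toy transition: next-period assets from the budget constraint a' = m - c.
transition = {"a": lambda v: v["m"] - v["c"]}
point = {"m": 1.53, "c": 0.7}

# Same nesting as the documented return value: {state_sym: {var_name: gradient}}
grads = {
    state: {var: central_diff(eq, point, var) for var in ("c", "m")}
    for state, eq in transition.items()
}
```

The entry grads["a"]["c"] is approximately -1, matching ∂a_{t+1}/∂c_t = -1.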

post_function(states, controls, *, shocks=None, parameters=None, agent=None, decision_rules=None)¶

Return the full ex post variables for the period.

Parameters:

states (dict[str, Any]) – Current state variables.
controls (dict[str, Any]) – Current control variable values.
shocks (dict[str, Any] | None) – Current shock realizations (defaults to empty dict).
parameters (dict[str, Any] | None) – Model parameters (defaults to instance calibration).
agent (str | None) – Agent identifier (currently unused, reserved for future use).
decision_rules (dict[str, Callable] | None) – Decision rules (defaults to instance decision_rules).

Returns:

All computed variables from the block transition.

Return type:

dict[str, Any]

resolve_discount_factor(post)¶

Return post[self.discount_variable], raising KeyError with a diagnostic message if the discount variable is missing. Expects the post-transition output returned by post_function().

Parameters: post (dict[str, Any])
Return type: Any
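
A minimal sketch of a lookup that fails with a diagnostic message. resolve_discount is a hypothetical stand-alone function, and the wording of the error message is an assumption, not the library's actual text.

```python
def resolve_discount(post, discount_variable):
    """Toy version: look up the discount factor in the post-transition bag."""
    try:
        return post[discount_variable]
    except KeyError:
        # Re-raise with the missing name and the keys that were available.
        raise KeyError(
            f"discount variable {discount_variable!r} not found in post; "
            f"available keys: {sorted(post)}"
        ) from None

beta = resolve_discount({"beta": 0.96, "c": 1.0}, "beta")
```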

reward_function(states, controls, *, shocks=None, parameters=None, agent=None, decision_rules=None)¶

Compute reward values for the current period.

Parameters:

states (dict[str, Any]) – Current state variables.
controls (dict[str, Any]) – Current control variable values.
shocks (dict[str, Any] | None) – Current shock realizations (defaults to empty dict).
parameters (dict[str, Any] | None) – Model parameters (defaults to instance calibration).
agent (str | None) – If specified, only return rewards for this agent.
decision_rules (dict[str, Callable] | None) – Decision rules (defaults to instance decision_rules).

Returns:

Reward values for the period.

Return type:

dict[str, Any]

transition_function(states, controls, *, shocks=None, parameters=None, decision_rules=None)¶

Compute the transition to next-period arrival states.

Parameters:

states (dict[str, Any]) – Current state variables.
controls (dict[str, Any]) – Current control variable values.
shocks (dict[str, Any] | None) – Current shock realizations (defaults to empty dict).
parameters (dict[str, Any] | None) – Model parameters (defaults to instance calibration).
decision_rules (dict[str, Callable] | None) – Decision rules (defaults to instance decision_rules).

Returns:

Next-period arrival state values.

Return type:

dict[str, Any]

skagent.bellman.estimate_bellman_foc_residual(bellman_period, vf, df, states_t, shocks, parameters=None, agent=None)¶

Compute the first-order condition (FOC) residual from the Bellman equation.

The Bellman equation is:

\[V(s) = \max_c \{ u(s,c,\varepsilon) + \beta E_{\varepsilon'}[V(s')] \}\]

The FOC w.r.t. each control \(c_j\) is:

\[\frac{\partial u}{\partial c_j} + \beta \sum_s \frac{\partial V(s')}{\partial s'_s} \cdot \frac{\partial s'_s}{\partial c_j} = 0\]

Adding a weighted FOC term to the Bellman loss improves convergence (Maliar et al. 2021, equation 14).

Unlike estimate_euler_residual(), which replaces \(V'(s')\) with the envelope condition \(u'(c') \cdot \partial m'/\partial s'\) (where \(m'\) is the next-period pre-decision state), this function differentiates the value callable directly.

Parameters:

bellman_period (BellmanPeriod) – The Bellman period.
vf (Union[dict[str, Callable], Callable]) – Value function on arrival states; either a single callable or a per-agent dict. See estimate_bellman_residual() for the full contract.
df (Union[dict[str, Callable], Callable]) – Decision callable on arrival states, or dict of decision rules.
states_t (dict[str, Any]) – Current arrival state variables.
shocks (dict[str, Any]) – Shock realizations with {sym}_0 and {sym}_1 keys.
parameters (dict[str, Any] | None) – Model parameters.
agent (str | None) – Agent identifier.

Returns:

Mapping from each control symbol to its FOC residual tensor.

Return type:

dict[str, Tensor]
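
The FOC can be checked on a problem with a known solution. The sketch below assumes a deterministic cake-eating model (no shocks, u = log(c), s' = w - c), for which V(w) = log(w)/(1 - β) up to a constant and the optimal policy is c = (1 - β)w. All names are hypothetical, and V' is approximated by a central difference rather than autograd.

```python
import math

BETA = 0.96

def value(w):
    # Closed form up to an additive constant, which does not affect V'.
    return math.log(w) / (1.0 - BETA)

def foc_residual(w, c, h=1e-6):
    """du/dc + beta * dV(s')/ds' * ds'/dc for u = log(c) and s' = w - c."""
    du_dc = 1.0 / c
    w_next = w - c
    dv_dnext = (value(w_next + h) - value(w_next - h)) / (2 * h)  # numerical V'(s')
    dnext_dc = -1.0                                               # from s' = w - c
    return du_dc + BETA * dv_dnext * dnext_dc

at_optimum = foc_residual(w=2.0, c=(1.0 - BETA) * 2.0)
off_optimum = foc_residual(w=2.0, c=0.5)
```

The residual vanishes (up to numerical error) at the optimal consumption level and is nonzero elsewhere, which is what makes it usable as a loss term.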

skagent.bellman.estimate_bellman_residual(bellman_period, vf, df, states_t, shocks, parameters=None, agent=None)¶

Computes the Bellman equation residual for given states and shocks.

The Bellman equation is:

\[V(s) = \max_c \{ u(s,c,\varepsilon) + \beta E_{\varepsilon'}[V(s')] \}\]

This function computes the residual:

\[f = V(s) - [u(s,c,\varepsilon) + \beta V(s')]\]

where \(s' = f(s,c,\varepsilon)\) and \(V(s')\) is evaluated at a specific future shock realization \(\varepsilon'\).

Parameters:

bellman_period (BellmanPeriod) – The Bellman period with transitions, rewards, etc. The discount factor is extracted from the post-transition variables via bellman_period.discount_variable.
vf (Union[dict[str, Callable], Callable]) – Value function vf(states_t, shocks_t, parameters) -> tensor on arrival states, or a dict mapping agent name to such a callable for multi-agent models (in which case agent must be specified). Any pre-decision (iset) computation the underlying approximator needs is the callable’s responsibility.
df (Union[dict[str, Callable], Callable]) – Decision callable df(states_t, shocks_t, parameters) -> controls_t on arrival states, or a dict of decision rules keyed by control symbol (callables on the iset).
states_t (dict[str, Any]) – Current arrival state variables.
shocks (dict[str, Any]) – Shock realizations for both periods:
    - {shock_sym}_0: period t shocks (for immediate reward and transitions)
    - {shock_sym}_1: period t+1 shocks (for continuation value evaluation)
parameters (dict[str, Any] | None) – Model parameters for calibration (defaults to empty dict).
agent (str | None) – Agent identifier for rewards.

Returns:

Bellman equation residual.

Return type:

Tensor

Raises:

ValueError – If no reward variables are found in the block.
KeyError – If required shock variables are missing from the shocks dict.

Notes

Single-reward, multi-control: this function returns a single residual tensor, evaluated against the first reward variable matching agent. For multi-control models it complements estimate_euler_residual() (which returns one residual per control) and estimate_bellman_foc_residual() (which returns one FOC residual per control by differentiating the value callable).
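
As a worked illustration of the residual, consider a deterministic cake-eating model (an assumption made here for tractability: no shocks, u = log(c), s' = w - c). Its value function is known in closed form, V(w) = A + log(w)/(1 - β), with optimal policy c = (1 - β)w. The names below are hypothetical and do not use skagent.

```python
import math

BETA = 0.96
# Constant term of the closed-form value function for log-utility cake eating.
A = (math.log(1 - BETA) + BETA * math.log(BETA) / (1 - BETA)) / (1 - BETA)

def value(w):
    return A + math.log(w) / (1 - BETA)

def bellman_residual(w, c):
    """V(s) - [u(s, c) + beta * V(s')] with u = log(c) and s' = w - c."""
    return value(w) - (math.log(c) + BETA * value(w - c))

optimal = bellman_residual(2.0, (1 - BETA) * 2.0)   # policy c = (1 - beta) * w
suboptimal = bellman_residual(2.0, 0.5)             # any other feasible c
```

With the true value function, the residual is zero at the optimal control and strictly positive for any other feasible control, because V(s) is the maximum of the right-hand side.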

skagent.bellman.estimate_discounted_lifetime_reward(bellman_period, dr, states_0, big_t, shocks_by_t=None, parameters=None, agent=None)¶

Compute the discounted lifetime reward of a model by simulating it forward for a fixed number of periods (big_t).

Based on Maliar, Maliar, and Winant (2021, JME).

Parameters:

bellman_period (BellmanPeriod) – The Bellman period object containing the model. The discount factor is extracted from the post-transition variables via bellman_period.discount_variable.
dr (Union[dict[str, Callable], Callable]) – Decision rules (dict of functions), or a decision function that returns the decisions given states, shocks, and parameters.
states_0 (dict[str, Any]) – Initial states as a dictionary mapping symbols to values. Both scalar and vector values are supported.
big_t (int) – Number of time steps to simulate forward.
shocks_by_t (dict[str, Any] | None) – Dictionary mapping shock symbols to arrays of shock values at each time period. The first axis must have length big_t; remaining axes are batch dimensions (e.g., shape (big_t, n_samples)).
parameters (dict[str, Any] | None) – Calibration parameters (defaults to empty dict).
agent (str | None) – Name of reference agent for rewards. If None, all rewards are summed.

Returns:

The total discounted lifetime reward.

Return type:

float | Tensor
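
The forward simulation can be sketched for a scalar toy model. This is not the library's implementation: lifetime_reward is a hypothetical function, the model is m = R*a + y with a' = m - c and log utility, and the discount factor is assumed constant across periods.

```python
import math

def lifetime_reward(rule, a0, big_t, shocks_by_t, R=1.03, beta=0.96):
    """Sum beta**t * u_t over big_t simulated periods of a toy model."""
    if len(shocks_by_t["y"]) != big_t:
        raise ValueError("first axis of each shock array must have length big_t")
    total, discount, a = 0.0, 1.0, a0
    for t in range(big_t):
        m = R * a + shocks_by_t["y"][t]    # pre-decision state
        c = rule(m)                        # control from the decision rule
        total += discount * math.log(c)    # discounted reward for period t
        discount *= beta                   # assumption: constant discount factor
        a = m - c                          # next-period arrival state
    return total

total = lifetime_reward(
    lambda m: 0.5 * m, a0=1.0, big_t=3, shocks_by_t={"y": [0.5, 0.5, 0.5]}
)
```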

skagent.bellman.estimate_euler_residual(bellman_period, df, states_t, shocks, parameters=None, agent=None, controls_t=None)¶

Compute the Euler equation residual for given states and shocks.

The Euler equation is the first-order condition from the Bellman equation, relating marginal rewards across periods. For each control variable \(c_j\), this function computes the residual:

\[f_j = u'(c_{j,t}) + \beta \cdot u'(c_{j,t+1}) \cdot \sum_s \left[ \frac{\partial s_{t+1}}{\partial c_{j,t}} \cdot \frac{\partial m'_j}{\partial s_{t+1}} \right]\]

At optimality \(f_j = 0\) for every control \(j\).

The discount factor \(\beta\) is obtained from the model via bellman_period.discount_variable.

Following Maliar et al. (2021, JME) Definition 2.7, this function uses two independent shock realizations (the all-in-one, or AiO, expectation operator).

Parameters:

bellman_period (BellmanPeriod) – The Bellman period with transitions, rewards, etc. The discount factor is extracted from the post-transition variables via bellman_period.discount_variable.
df (Union[dict[str, Callable], Callable]) – Decision function or dict of decision rules.
states_t (dict[str, Any]) – Current state variables (arrival states).
shocks (dict[str, Any]) – Shock realizations for both periods ({sym}_0 and {sym}_1).
parameters (dict[str, Any] | None) – Model parameters for calibration.
agent (str | None) – Agent identifier for rewards.
controls_t (dict[str, Any] | None) – Pre-computed period-t controls. When provided, the function skips its internal compute_controls call for period t. This is used by EulerEquationLoss to share the same control tensors between the residual computation and the constraint slack computation.

Returns:

Mapping from each control symbol to its Euler residual tensor.

Return type:

dict[str, Tensor]
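
For a single control, the residual formula can be evaluated by hand. The sketch assumes a toy consumption-saving model (m = R*a + y, a' = m - c, log utility), so that ∂a'/∂c = -1 and ∂m'/∂a' = R. euler_residual and its arguments are hypothetical; only the {sym}_0 / {sym}_1 shock naming follows the documentation above.

```python
def euler_residual(rule, a, shocks, R=1.03, beta=0.96):
    """f = u'(c_t) + beta * u'(c_{t+1}) * (da'/dc_t) * (dm'/da') for log utility."""
    m_t = R * a + shocks["y_0"]           # period-t pre-state uses the _0 shock
    c_t = rule(m_t)
    a_next = m_t - c_t                    # budget constraint, so da'/dc_t = -1
    m_next = R * a_next + shocks["y_1"]   # period-(t+1) pre-state uses the _1 shock
    c_next = rule(m_next)
    resid = 1.0 / c_t + beta * (1.0 / c_next) * (-1.0) * R   # dm'/da' = R
    return {"c": resid}                   # one residual per control symbol

resid = euler_residual(lambda m: 0.5 * m, a=1.0, shocks={"y_0": 0.5, "y_1": 0.5})
```

The rule c = 0.5*m is arbitrary rather than optimal, so the residual is nonzero; a decision rule that solves the model would drive it to zero in expectation.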