Models

The skagent.models subpackage contains predefined economic models. For a narrative overview of the benchmark registry and how to call it, see the Benchmark Models guide; for runnable, plotted walkthroughs see the Examples.

Consumer Models

Benchmark Registry

The benchmark registry catalogues discrete-time dynamic programming problems with known closed-form policies (plus a few that are kept for numerical validation). Its public helpers and analytical-policy functions are listed below.

Analytically Solvable Consumption-Savings Models

This module implements a collection of discrete-time consumption-savings dynamic programming problems whose optimal policies are known in closed form. They are well-known benchmark problems from the economics literature.

THEORETICAL FOUNDATION

An entry qualifies for inclusion ONLY if:

(i) The problem is a bona-fide dynamic programming problem.

(ii) The optimal c_t (and any other control) can be written in closed form with no recursive objects left implicit.

Standard Timing Convention (Adopted Throughout)

  • t ∈ {0,1,2,…} : period index

  • A_{t-1} : beginning-of-period assets (arrival state, before interest)

  • y_t : non-capital income (realized in period t)

  • R : gross return on assets (R = 1 + r > 1)

  • m_t = R*A_{t-1} + y_t : cash-on-hand (market resources available for consumption)

  • c_t : consumption (control variable)

  • A_t = m_t - c_t : end-of-period assets (state for next period)

  • H_t = E_t[∑_{s=1}^∞ R^{-s} y_{t+s}] : human wealth (present value of future income)

  • W_t = m_t + H_t : total wealth (cash-on-hand plus human wealth)

  • u(c) : period utility function

  • β : discount factor

  • TVC : lim_{T→∞} E_0[β^T u'(c_T) A_T] = 0 (transversality condition)
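The accounting identities in this convention can be traced through a single period in a few lines of Python. This is a minimal sketch with illustrative numbers; the helper names `step_period` and `human_wealth_constant_income` and all parameter values are assumptions made for the example and are not part of skagent.

```python
def step_period(A_prev, y, c, R):
    """Apply the timing convention for one period.

    A_prev : beginning-of-period assets A_{t-1} (before interest)
    y      : non-capital income realized this period
    c      : consumption chosen this period
    R      : gross return on assets
    """
    m = R * A_prev + y   # cash-on-hand m_t
    A = m - c            # end-of-period assets A_t
    return m, A


def human_wealth_constant_income(y, R):
    """H = sum_{s>=1} R**(-s) * y = y / (R - 1) for a constant income stream."""
    return y / (R - 1.0)


R, y = 1.03, 1.0
m, A = step_period(A_prev=2.0, y=y, c=1.1, R=R)
H = human_wealth_constant_income(y, R)
W = m + H  # total wealth W_t = m_t + H_t
```

With constant income the expectation in H_t is trivial, which is why the sketch can use the geometric-series value y / (R - 1) directly.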

Registry access

skagent.models.benchmarks.list_benchmark_models()

List all discrete-time benchmark models.

Note: Most models have analytical solutions, but some (e.g., U-3 buffer stock) require numerical solution due to borrowing constraints + income uncertainty.

Return type:

Dict[str, str]

skagent.models.benchmarks.get_benchmark_model(model_id)

Get benchmark model by ID (D-1 through D-4, U-1 through U-3).

Returns an independent deep copy of the registered block. The registered blocks are module-level singletons whose shock specs are mutated in place by construct_shocks (the (class, args) tuples are replaced by distribution objects). Returning a copy keeps that mutation local to the caller, so constructing or recalibrating one block never leaks into another caller (or another test) holding “the same” model.

Parameters:

model_id (str)

Return type:

DBlock

skagent.models.benchmarks.get_benchmark_calibration(model_id)

Get benchmark calibration by model ID

Parameters:

model_id (str)

Return type:

Dict[str, Any]

skagent.models.benchmarks.get_analytical_policy(model_id)

Get analytical policy function by model ID.

Raises ValueError if the model does not have an analytical policy (e.g., U-3 buffer stock model requires numerical solution).

Parameters:

model_id (str)

Return type:

Callable

skagent.models.benchmarks.has_analytical_policy(model_id)

Return whether the registry exposes a closed-form policy for model_id.

Most benchmark entries pair their DBlock with an analytical decision function; a few (for example U-3, the buffer-stock model) are registered without one because a borrowing constraint combined with income uncertainty leaves them with no closed form. This predicate lets callers branch on that distinction without catching the ValueError that get_analytical_policy() raises.

Parameters:

model_id (str) – Registry key, e.g. "D-1" or "U-3".

Returns:

True if a closed-form policy is available, False otherwise.

Return type:

bool

Raises:

ValueError – If model_id is not a registered benchmark model.

skagent.models.benchmarks.get_reference_policy(model_id)

Get a numerical reference policy for a model lacking a closed form.

Some benchmarks (currently D-4) have no analytical policy because a binding constraint precludes one, but do have an independent numerical oracle (e.g. value-function iteration) suitable for validating trained policies.

Raises ValueError if the model has no reference policy registered.

Parameters:

model_id (str)

Return type:

Callable

skagent.models.benchmarks.get_test_states(model_id, test_points=10)

Get test states for model validation by model ID

Parameters:
  • model_id (str)

  • test_points (int)

Return type:

Dict[str, Tensor]

skagent.models.benchmarks.get_custom_validation(model_id)

Get custom validation function for model (if it has one)

Parameters:

model_id (str)

Return type:

Optional[Callable]

Validation

skagent.models.benchmarks.validate_analytical_solution(model_id, test_points=10, tolerance=1e-08)

Validate analytical solution satisfies optimality conditions and budget constraints

Parameters:
  • model_id (str)

  • test_points (int)

  • tolerance (float)

Return type:

Dict[str, Any]

skagent.models.benchmarks.euler_equation_test(model_id, test_points=100)

Test Euler equation satisfaction for stochastic analytical solutions

Parameters:
  • model_id (str)

  • test_points (int)

Return type:

Dict[str, Any]

skagent.models.benchmarks.get_analytical_lifetime_reward(model_id, *args, **kwargs)

Get analytical lifetime reward for a benchmark model.

Parameters:
  • model_id (str) – Model identifier (D-1, D-2, D-3, etc.)

  • *args – Arguments to pass to the specific analytical function

  • **kwargs – Arguments to pass to the specific analytical function

Returns:

Analytical lifetime reward value

Return type:

float

Analytical policies

skagent.models.benchmarks.d1_analytical_policy(states, shocks, parameters)

Optimal policy for D-1: finite-horizon log-utility consumption.

With log utility and a deterministic \(T\)-period horizon, the agent solves

\[V_{T-1}(W_{T-1}) = \log W_{T-1}, \qquad V_t(W_t) = \max_{c_t \in (0,\, W_t]} \, \log c_t + \beta \, V_{t+1}\bigl((W_t - c_t)\, R\bigr) \quad (t < T - 1),\]

where periods run over \(t = 0, \dots, T-1\), \(W_t\) is wealth at the start of period \(t\), \(R\) is the gross return, and \(\beta < 1\) is the discount factor. The value function takes the form \(V_t(W) = \alpha_t + B_t \log W\) with \(B_t = (1 - \beta^{T-t})/(1 - \beta)\) and a time-varying additive constant \(\alpha_t\), and the first-order condition \(c_t = W_t / B_t\) gives the remaining-horizon rule

\[c_t \;=\; \frac{1 - \beta}{\,1 - \beta^{T - t}\,} \, W_t.\]

In the terminal period (\(T - t = 1\)) the formula simplifies to \(c_t = W_t\) since \((1-\beta)/(1-\beta) = 1\); the implementation handles this case directly, which also avoids the division by zero that the formula would produce at \(T - t = 0\). As \(T - t \to \infty\), the rule converges to the infinite-horizon constant-MPC policy \(c_t = (1 - \beta) W_t\), the \(\sigma = 1\) special case of D-2.

Parameters:
  • states (dict) – Arrival states. Must contain "W" (wealth); may contain "t" (time index, defaults to 0).

  • shocks (dict) – Unused; the model is deterministic.

  • parameters (dict) – Must contain "DiscFac" (\(\beta\)) and "T" (horizon).

Returns:

{"c": c_optimal} whose dtype matches the input wealth.

Return type:

dict

Raises:

ValueError – If \(\beta \geq 1\), since the closed form is undefined.
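The remaining-horizon rule can be checked numerically. The following is a minimal sketch in plain Python, not the library implementation: the function name `d1_consumption` and the parameter values are assumptions. It simulates the wealth path and records consumption growth, which the Euler equation under log utility pins at \(\beta R\).

```python
def d1_consumption(W, t, beta, T):
    """Remaining-horizon rule c_t = (1 - beta) / (1 - beta**(T - t)) * W."""
    if beta >= 1.0:
        raise ValueError("closed form requires beta < 1")
    remaining = T - t
    if remaining <= 1:
        return W  # terminal period: consume everything
    return (1.0 - beta) / (1.0 - beta ** remaining) * W


beta, R, T = 0.96, 1.03, 5
W = 10.0
path = []
for t in range(T):
    c = d1_consumption(W, t, beta, T)
    path.append(c)
    W = (W - c) * R  # wealth carried into the next period

# Consumption growth between consecutive periods.
growth = [path[i + 1] / path[i] for i in range(T - 1)]
```

Every entry of `growth` should equal \(\beta R\) up to rounding, and wealth after the last period is zero because the terminal rule consumes everything.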

skagent.models.benchmarks.d2_analytical_policy(states, shocks, parameters)

Optimal policy for D-2: infinite-horizon CRRA perfect foresight.

The canonical perfect-foresight consumption problem solves

\[\max_{\{c_t\}} \, \sum_{t=0}^{\infty} \beta^t \, \frac{c_t^{\,1-\sigma}}{1 - \sigma} \quad \text{s.t.} \quad A_t = R\, A_{t-1} + y - c_t,\]

with constant labor income \(y > 0\), gross return \(R > 1\), and the transversality condition \(\lim_{T\to\infty} \beta^T \, u'(c_T)\, A_T = 0\). Equivalently, cash-on-hand \(m_t = R\, A_{t-1} + y\) evolves with end-of-period assets \(A_t = m_t - c_t\). Under return-impatience \((\beta R)^{1/\sigma} < R\), the closed-form policy is linear in total wealth,

\[c_t \;=\; \kappa \, W_t, \qquad \kappa \;=\; \frac{R - (\beta R)^{1/\sigma}}{R}, \qquad W_t \;=\; m_t + H,\]

where \(m_t = R\, A_{t-1} + y\) is cash-on-hand at time \(t\), \(H = y / r\) is human wealth (the present value of the constant future income stream), and \(r = R - 1\). The marginal propensity to consume \(\kappa\) depends only on deep parameters, not on the state.

Parameters:
  • states (dict) – Must contain "a" (arrival assets).

  • shocks (dict) – Unused.

  • parameters (dict) – Must contain "DiscFac" (\(\beta\)), "R", "CRRA" (\(\sigma\)), and "y".

Returns:

{"c": c_optimal}.

Return type:

dict

Raises:

ValueError – If return-impatience is violated, or if \(R \leq 1\).

See also

d3_analytical_policy

Same model with i.i.d. survival risk.
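As a sanity check on the linear rule, the sketch below evaluates it at two consecutive arrival-asset levels and compares consumption growth with \((\beta R)^{1/\sigma}\), the factor implied by the CRRA Euler equation. It is a standalone illustration under assumed parameter values; `d2_consumption` is a name chosen for the example, not the library function.

```python
def d2_consumption(a, beta, R, sigma, y):
    """Linear rule c = kappa * (m + H) with m = R*a + y and H = y / (R - 1)."""
    if R <= 1.0:
        raise ValueError("human wealth y / (R - 1) requires R > 1")
    growth = (beta * R) ** (1.0 / sigma)  # consumption growth factor
    if growth >= R:
        raise ValueError("return-impatience (beta*R)**(1/sigma) < R is violated")
    kappa = (R - growth) / R
    m = R * a + y
    H = y / (R - 1.0)
    return kappa * (m + H)


beta, R, sigma, y = 0.96, 1.03, 2.0, 1.0
a0 = 1.0
c0 = d2_consumption(a0, beta, R, sigma, y)
a1 = R * a0 + y - c0                      # end-of-period assets A_t = m_t - c_t
c1 = d2_consumption(a1, beta, R, sigma, y)
```

The ratio `c1 / c0` matches \((\beta R)^{1/\sigma}\) because total wealth evolves as \(W_{t+1} = R\,(W_t - c_t) = R\,(1 - \kappa)\, W_t\).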

skagent.models.benchmarks.d3_analytical_policy(states, shocks, parameters)

Optimal policy for D-3: Blanchard (1985) discrete-time mortality.

Extends D-2 by giving the agent i.i.d. survival probability \(s \in (0, 1)\) per period. The objective becomes

\[\max \, \sum_{t=0}^{\infty} (s\beta)^t \, \frac{c_t^{\,1-\sigma}}{1 - \sigma},\]

with the same budget constraint as D-2. Mortality is observationally equivalent to scaling the discount factor from \(\beta\) to \(s\beta\), so the linear consumption rule survives:

\[c_t \;=\; \kappa_s \, (m_t + H), \qquad \kappa_s \;=\; \frac{R - (s\beta R)^{1/\sigma}}{R},\]

with \(H = y / r\) as before. The MPC \(\kappa_s > \kappa\) because mortality erodes effective patience: the agent consumes a larger share of wealth each period.

Parameters:
  • states (dict) – Must contain "a" (arrival assets). The "liv" alive indicator is part of the DBlock simulator dynamics but is not read by the analytical policy.

  • shocks (dict) – May contain "live" (Bernoulli survival shock); unused for the analytical policy itself.

  • parameters (dict) – Must contain "DiscFac", "R", "CRRA", "y", and "SurvivalProb" (\(s\)).

Returns:

{"c": c_optimal}.

Return type:

dict

Raises:

ValueError – If mortality-adjusted return-impatience is violated, or if \(R \leq 1\).

References

Blanchard, O.J. (1985). “Debt, deficits, and finite horizons.” Journal of Political Economy, 93(2), 223-247.

See also

d2_analytical_policy

Underlying perfect-foresight model without mortality.
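The equivalence between mortality and a lower discount factor is easy to confirm numerically. This sketch uses an assumed helper `mpc` and illustrative parameter values; it is not taken from skagent.

```python
def mpc(beta, R, sigma, survival=1.0):
    """kappa_s = (R - (s*beta*R)**(1/sigma)) / R; survival=1 recovers D-2."""
    growth = (survival * beta * R) ** (1.0 / sigma)
    if growth >= R:
        raise ValueError("mortality-adjusted return-impatience is violated")
    return (R - growth) / R


beta, R, sigma, s = 0.96, 1.03, 2.0, 0.99
kappa = mpc(beta, R, sigma)                 # no mortality (D-2)
kappa_s = mpc(beta, R, sigma, survival=s)   # with mortality (D-3)
kappa_equiv = mpc(s * beta, R, sigma)       # D-2 with discount factor s*beta
```

Here `kappa_s` exceeds `kappa` and coincides with `kappa_equiv`, which is the sense in which survival risk only rescales \(\beta\).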

skagent.models.benchmarks.u1_analytical_policy(states, shocks, parameters)

Optimal policy for U-1: Hall (1978) random-walk consumption.

With quadratic utility \(u(c) = ac - bc^2/2\) and the knife-edge condition \(\beta R = 1\) (the discount rate equals the interest rate), the Euler equation collapses to the martingale property

\[\mathbb{E}_t[c_{t+1}] = c_t,\]

so consumption follows a random walk regardless of the income process. Hall’s contribution was to derive this implication and confront it with consumption data.

The decision rule consistent with this Euler equation, plus transversality, is the Permanent Income Hypothesis: consume the annuity value of total wealth,

\[c_t \;=\; \frac{r}{R} \, (m_t + H),\]

where \(m_t = R \, A_{t-1} + y_t\) is cash-on-hand, \(H = \mathbb{E}_t y / r\) is the present value of the expected future income stream, and \(r = R - 1\). The martingale property is a consequence of this PIH policy, not the policy itself.

Parameters:
  • states (dict) – Must contain "A" (arrival assets) and "y" (current income realization).

  • shocks (dict) – May contain "eta" (mean-zero income innovation); unused once \(y_t\) is known.

  • parameters (dict) – Must contain "DiscFac", "R", and "y_mean".

Returns:

{"c": c_optimal}.

Return type:

dict

Notes

Logs a warning via the module logger if \(|\beta R - 1| > 10^{-6}\), since the PIH derivation hinges on \(\beta R = 1\) exactly. With high income variance, transversality may also fail.

References

Hall, R.E. (1978). “Stochastic implications of the life cycle-permanent income hypothesis: Theory and evidence.” Journal of Political Economy, 86(6), 971-987.
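To see that the martingale property follows from the PIH rule, the sketch below applies the rule today, then averages next-period consumption over two equally likely mean-zero income innovations. The function name `pih_consumption`, the two-point shock, and all numbers are assumptions chosen for illustration.

```python
def pih_consumption(A_prev, y, R, y_mean):
    """PIH rule c = (r / R) * (m + H), with m = R*A_prev + y and H = y_mean / r."""
    r = R - 1.0
    m = R * A_prev + y
    H = y_mean / r
    return (r / R) * (m + H)


R, y_mean = 1.04, 1.0
A_prev, y_now = 2.0, 1.3
c_now = pih_consumption(A_prev, y_now, R, y_mean)
A_now = R * A_prev + y_now - c_now   # end-of-period assets

# Two equally likely, mean-zero income innovations next period.
innovations = [-0.5, 0.5]
c_next = [pih_consumption(A_now, y_mean + eta, R, y_mean) for eta in innovations]
expected_c_next = sum(c_next) / len(c_next)
```

Realized consumption differs across the two income draws, but its expectation equals `c_now`, which is the random-walk implication.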

skagent.models.benchmarks.u2_analytical_policy(states, shocks, parameters)

Optimal policy for U-2: log utility with permanent income shocks (normalized).

The buffer-stock problem with log utility, geometric random-walk permanent income, and no borrowing constraint admits a closed-form policy in normalized variables. Dividing every level variable by permanent income \(P_t\) yields the lowercase ratios \(m = M/P\), \(c = C/P\), \(a = A/P\), with normalized transition

\[m_{t+1} = \frac{R}{\psi_{t+1}} \, a_t + 1,\]

where \(\psi_{t+1}\) is the mean-one permanent income shock. The constant +1 represents normalized transitory income, which is identically one in U-2; the more general two-shock case (with a stochastic transitory component) is U-3. Under log utility the closed form is

\[c_t \;=\; (1 - \beta) \, (m_t + h), \qquad h \;=\; 1/r,\]

independent of the realized shock path. The MPC \((1 - \beta)\) is the limiting MPC for any unconstrained CRRA agent; log utility (the \(\sigma = 1\) case) makes it exact rather than asymptotic.

Parameters:
  • states (dict) – Must contain "a" (normalized arrival assets).

  • shocks (dict) – May contain "psi" (permanent income shock, defaults to ones).

  • parameters (dict) – Must contain "DiscFac" and "R".

Returns:

{"c": c_optimal} (normalized consumption).

Return type:

dict

Notes

Setting "sigma_psi": 0 makes \(\psi \equiv 1\), so the PIH analytical solution holds exactly. The Control upper bound \(0.1\, m + 2\) is loose enough for the analytical policy \(c \approx 0.04\, m + 1.33\) but rules out Ponzi-scheme solutions that satisfy the Euler equation while violating transversality.

See also

u3_block

Same problem with a binding borrowing constraint and CRRA utility, which has no closed form.
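The approximation \(c \approx 0.04\, m + 1.33\) quoted in the notes can be reproduced from the closed form. The sketch below assumes \(\beta = 0.96\) and \(R = 1.03\), values chosen because they match those coefficients; the source does not state the calibration here, and the function names are illustrative.

```python
def normalized_cash_on_hand(a_prev, R, psi=1.0):
    """m = (R / psi) * a_prev + 1, with normalized income equal to one."""
    return (R / psi) * a_prev + 1.0


def u2_consumption(m, beta, R):
    """c = (1 - beta) * (m + h) with normalized human wealth h = 1 / (R - 1)."""
    h = 1.0 / (R - 1.0)
    return (1.0 - beta) * (m + h)


beta, R = 0.96, 1.03  # assumed values; they reproduce c ~ 0.04*m + 1.33
intercept = u2_consumption(0.0, beta, R)
slope = u2_consumption(1.0, beta, R) - intercept
```

Under these assumed values the policy stays below the bound \(0.1\, m + 2\) for every \(m \geq 0\), since both its slope and its intercept are smaller.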

skagent.models.benchmarks.crra_utility(c, gamma)

CRRA utility: u(c) = c^(1-gamma)/(1-gamma) for gamma != 1, log(c) for gamma == 1
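A scalar sketch of the two-branch definition is given below, together with marginal utility. It is a standalone illustration for positive scalar consumption, not the library function; the names `crra_utility_sketch` and `crra_marginal_utility` are chosen for the example.

```python
import math


def crra_utility_sketch(c, gamma):
    """u(c) = c**(1 - gamma) / (1 - gamma) for gamma != 1, log(c) for gamma == 1."""
    if c <= 0:
        raise ValueError("consumption must be positive")
    if gamma == 1:
        return math.log(c)
    return c ** (1.0 - gamma) / (1.0 - gamma)


def crra_marginal_utility(c, gamma):
    """u'(c) = c**(-gamma); this one formula covers the log case too."""
    return c ** (-gamma)
```

The levels of the two branches do not join continuously as gamma approaches 1, because the power form omits the constant \(-1/(1-\gamma)\). Marginal utility is continuous, which is what the Euler equations used elsewhere in this module depend on.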

Fisher Two-Period Model

Fisher (1930) two-period intertemporal consumption.

The simplest dynamic programming problem with a closed-form solution. An agent receives income \(y_0\) in period 0 and \(y_1\) in period 1, borrows or saves at gross rate \(R\), and chooses consumption \(c_0, c_1\) to maximize

\[u(c_0) + \beta \, u(c_1)\]

subject to the lifetime budget constraint \(c_0 + c_1/R = m_0 + y_1/R\), with \(m_0 = R\, a_{-1} + y_0\). With CRRA utility \(u(c) = c^{1-\sigma}/(1-\sigma)\), the Euler equation \(u'(c_0) = \beta R \, u'(c_1)\) together with the budget constraint gives the closed form

\[c_0 \;=\; \frac{m_0 + y_1/R}{\,1 + (\beta R)^{1/\sigma}/R\,}, \qquad c_1 \;=\; (\beta R)^{1/\sigma} \, c_0.\]

The two-period horizon makes the model an exact analogue of the intertemporal-choice diagram in introductory macroeconomics, while the recursive form is the simplest non-trivial test case for value-function iteration and Euler-equation solvers in skagent.
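The closed form can be verified against the two conditions it was derived from. The sketch below is a plain-Python illustration with assumed parameter values; `fisher_consumption` is a name chosen for the example.

```python
def fisher_consumption(m0, y1, beta, R, sigma):
    """Closed-form (c0, c1) for the two-period CRRA problem."""
    growth = (beta * R) ** (1.0 / sigma)      # c1 / c0 from the Euler equation
    c0 = (m0 + y1 / R) / (1.0 + growth / R)   # from the lifetime budget
    c1 = growth * c0
    return c0, c1


beta, R, sigma = 0.96, 1.03, 2.0
m0, y1 = 2.0, 1.0
c0, c1 = fisher_consumption(m0, y1, beta, R, sigma)

# Both residuals should vanish up to rounding.
budget_gap = (c0 + c1 / R) - (m0 + y1 / R)
euler_gap = c0 ** (-sigma) - beta * R * c1 ** (-sigma)
```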

Notes

The math above uses \(R\) for the gross return; the block parameter key is Rfree.

References

Fisher, I. (1930). The Theory of Interest. New York: Macmillan.

Perfect Foresight Models

Perfect-foresight consumption-savings with stochastic survival and permanent income growth.

Block representation of the canonical perfect-foresight problem with i.i.d. survival probability \(s \in (0, 1)\) and gross permanent income growth \(G\). The agent solves

\[\max \, \sum_{t=0}^{\infty} (s\beta)^t \, \frac{c_t^{\,1-\sigma}}{1 - \sigma} \quad \text{s.t.} \quad m_{t+1} = R \, (m_t - c_t) + y_{t+1}, \qquad y_{t+1} = G \, P_t,\]

with permanent income growing as \(P_{t+1} = G\, P_t\). Two conditions are needed for the closed form: mortality-adjusted return-impatience \((s\beta R)^{1/\sigma} < R\) (so consumption does not explode), and \(R > G\) (so human wealth is finite). Under both, the consumption rule is linear in total wealth \(W_t = m_t + H_t\), with human wealth \(H_t = G\, P_t / (R - G)\) and MPC \(\kappa_s = (R - (s\beta R)^{1/\sigma})/R\). The companion module skagent.models.perfect_foresight_normalized solves the same problem in variables divided by \(P_t\), which collapses the state space and is the form used in the buffer-stock literature.
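The two conditions and the human-wealth formula can be made concrete with a short sketch. The function names and parameter values below are assumptions for illustration, with math symbols used in place of the block parameter keys. The partial sum shows why \(R > G\) is needed: the present value of growing income is a geometric series with ratio \(G/R\).

```python
def pf_level_consumption(m, P, beta, R, G, sigma, s):
    """c = kappa_s * (m + H) with H = G*P / (R - G)."""
    growth = (s * beta * R) ** (1.0 / sigma)
    if growth >= R:
        raise ValueError("mortality-adjusted return-impatience is violated")
    if R <= G:
        raise ValueError("finite human wealth requires R > G")
    kappa_s = (R - growth) / R
    H = G * P / (R - G)
    return kappa_s * (m + H)


def human_wealth_partial_sum(P, R, G, n_terms):
    """Truncated present value sum_{k=1..n} (G/R)**k * P, for comparison."""
    return sum((G / R) ** k * P for k in range(1, n_terms + 1))


beta, R, G, sigma, s = 0.96, 1.03, 1.01, 2.0, 0.99
P = 1.0
H_closed = G * P / (R - G)
H_approx = human_wealth_partial_sum(P, R, G, n_terms=5000)
```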

Notes

The math above uses \(R\), \(G\), and \(s\) for the gross return, permanent income growth, and survival probability; the corresponding block parameter keys are Rfree, PermGroFac, and LivPrb.

References

Carroll, C.D. (2024). Solution Methods for Solving Microeconomic Dynamic Stochastic Optimization Problems. https://llorracc.github.io/SolvingMicroDSOPs/

Perfect Foresight Models (Normalized)

Perfect-foresight consumption-savings in normalized variables.

The same problem as skagent.models.perfect_foresight, but with every level variable divided by permanent income \(P_t\). Lowercase ratios \(m = M/P\), \(c = C/P\), \(a = A/P\) evolve via the effective return \(R_{\text{eff}} = R / G\),

\[b_{t+1} = R_{\text{eff}} \, a_t, \qquad m_{t+1} = b_{t+1} + 1, \qquad a_{t+1} = m_{t+1} - c_{t+1},\]

with normalized income identically equal to one. Normalization reduces the state space from \((M, P)\) to the single ratio \(m\), which matters both for analytical tractability and for neural-network solvers, where the network learns a one-dimensional function \(c(m)\) instead of a two-dimensional \(c(M, P)\).

The closed-form normalized policy is \(c_t = \kappa_s\, (m_t + h)\) with \(\kappa_s = (R - (s\beta R)^{1/\sigma})/R\) and \(h = 1 / (R_{\text{eff}} - 1)\). Although the normalized Bellman discounts future utility by \(s\beta G^{1-\sigma}\) rather than \(s\beta\), the algebra collapses \((s\beta G^{1-\sigma} R_{\text{eff}})^{1/\sigma}\) to \((s\beta R)^{1/\sigma}/G\), so the level and normalized MPCs agree.
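The claim that the level and normalized MPCs agree can be checked directly. The sketch below computes the MPC both ways under assumed parameter values; the function names are illustrative and not part of skagent.

```python
def mpc_level(beta, R, sigma, s):
    """Level MPC: (R - (s*beta*R)**(1/sigma)) / R."""
    return (R - (s * beta * R) ** (1.0 / sigma)) / R


def mpc_normalized(beta, R, G, sigma, s):
    """Same object computed from the normalized problem.

    Uses the effective return R_eff = R / G and the normalized discount
    factor s * beta * G**(1 - sigma).
    """
    R_eff = R / G
    beta_norm = s * beta * G ** (1.0 - sigma)
    return (R_eff - (beta_norm * R_eff) ** (1.0 / sigma)) / R_eff


def normalized_consumption(m, beta, R, G, sigma, s):
    """c = kappa_s * (m + h) with h = 1 / (R_eff - 1)."""
    h = 1.0 / (R / G - 1.0)
    return mpc_level(beta, R, sigma, s) * (m + h)


beta, R, G, sigma, s = 0.96, 1.03, 1.01, 2.0, 0.99
k_level = mpc_level(beta, R, sigma, s)
k_norm = mpc_normalized(beta, R, G, sigma, s)
```

The normalized human wealth \(h = 1/(R_{\text{eff}} - 1)\) equals \(G/(R - G)\), the level formula \(H_t = G\, P_t/(R - G)\) divided by \(P_t\).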

Notes

The math above uses \(R\), \(G\), and \(s\) for the gross return, permanent income growth, and survival probability; the corresponding block parameter keys are Rfree, PermGroFac, and LivPrb.

References

Carroll, C.D. (2024). Solution Methods for Solving Microeconomic Dynamic Stochastic Optimization Problems. https://llorracc.github.io/SolvingMicroDSOPs/

Resource Extraction

Resource Extraction with Optimal Escapement (Reed 1979)

Resource extraction models analyze the optimal management of renewable or depletable resources over time. These models appear in:

  • Fisheries management: Determining sustainable harvest rates for fish populations

  • Forestry: Optimal timber harvesting schedules

  • Environmental economics: Managing renewable natural resources (water, wildlife)

  • Energy: Optimal depletion of oil fields and mineral deposits

  • Finance: Portfolio liquidation and asset drawdown strategies

The core problem involves balancing immediate extraction (profit now) against preserving the resource stock (profit later), accounting for natural growth dynamics and environmental uncertainty.

This implementation follows the model from:

Reed, W.J. (1979). “Optimal escapement levels in stochastic and deterministic harvesting models.” Journal of Environmental Economics and Management, 6(4), 350-363.

Reed showed that under multiplicative environmental shocks and stock-dependent harvesting costs, the optimal policy has a simple “constant escapement” form: maintain a target stock level S* and harvest any surplus above it. This optimal escapement level S* can be computed analytically without solving the full dynamic programming problem, making it an excellent benchmark for testing reinforcement learning algorithms.

Mathematical Model

State: \(x_t\) = resource stock at time \(t\)

Control: \(u_t\) = harvest, constrained by \(0 \leq u_t \leq x_t\)

Dynamics:

\[x_{t+1} = r (x_t - u_t) \epsilon_t\]

where \(r > 1\) is the growth rate, \(\epsilon_t\) is a mean-one log-normal shock, and \((x_t - u_t)\) is the escapement (remaining stock).

Profit:

\[\pi(u_t, x_t) = \left(p - \frac{c_0}{x_t}\right) u_t\]

where \(p\) is price and \(c_0/x_t\) is the stock-dependent unit cost.

Objective: Maximize \(\mathbb{E}\left[\sum_{t=0}^{\infty} \delta^t \pi(u_t, x_t)\right]\)

Optimal Policy

Reed (1979) [Reed1979] proved the optimal policy has constant escapement form:

\[u_t^* = \max(0, x_t - S^*)\]

where the optimal escapement level is:

\[S^* = \frac{c_0 (1 - \delta)}{p (1 - \delta r)}\]

This requires the impatience condition \(\delta r < 1\).

This model provides an excellent benchmark for RL algorithms since \(S^*\) can be computed analytically without dynamic programming.
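A minimal sketch of the constant-escapement policy follows. It computes \(S^*\) from the formula above and simulates a deterministic path with the shock fixed at its mean of one. The function names and the parameter values are assumptions chosen for illustration and are not the module's calibration.

```python
def optimal_escapement(r, p, c_0, delta):
    """S* = c_0 * (1 - delta) / (p * (1 - delta * r)), valid when delta * r < 1."""
    if delta * r >= 1.0:
        raise ValueError("impatience condition delta * r < 1 is violated")
    return c_0 * (1.0 - delta) / (p * (1.0 - delta * r))


def harvest(x, S_star):
    """Constant-escapement rule u = max(0, x - S*)."""
    return max(0.0, x - S_star)


r, p, c_0, delta = 1.1, 5.0, 2.0, 0.9   # illustrative values only
S_star = optimal_escapement(r, p, c_0, delta)

# Deterministic path (shock fixed at its mean of one).
x = 3.0
stocks, harvests = [], []
for _ in range(5):
    u = harvest(x, S_star)
    stocks.append(x)
    harvests.append(u)
    x = r * (x - u)   # x_{t+1} = r * (x_t - u_t) * 1
```

Starting below \(S^*\), the rule harvests nothing until the stock grows past the target. After the first positive harvest, the deterministic stock returns to \(r S^*\) each period.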

References

[Reed1979]

Reed, W.J. (1979). “Optimal escapement levels in stochastic and deterministic harvesting models.” Journal of Environmental Economics and Management, 6(4), 350-363.

skagent.models.resource_extraction.df_u(states, shocks, parameters)

Decision function compatible with DBlock interface.

Parameters:
  • states (dict) – Contains 'x' (current stock)

  • shocks (dict) – Random shocks (not used in optimal policy)

  • parameters (dict) – Model parameters (not used after S* is computed)

Returns:

u – Optimal harvest

Return type:

float

skagent.models.resource_extraction.dr_u(x)

Optimal harvest under constant escapement policy.

Parameters:

x (float or array_like) – Current stock level(s)

Returns:

u – Optimal harvest \(u = \max(0, x - S^*)\)

Return type:

float or ndarray

skagent.models.resource_extraction.make_optimal_decision_rule(parameters)

Compute the optimal constant-escapement policy from Reed (1979).

Reed showed that for the model with multiplicative shocks and stock-dependent costs \(c(x) = c_0/x\), the optimal policy has the form:

\[u^*(x) = \max(0, x - S^*)\]

where \(S^*\) is the optimal escapement level (target stock to maintain). This can be computed analytically from the first-order condition without solving the full dynamic programming problem.

Parameters:

parameters (dict) –

Model parameters including:

  • r : float, growth rate

  • p : float, price per unit

  • c_0 : float, cost parameter

  • DiscFac : float, discount factor

Returns:

  • decision_rule (callable) – Maps stock x to optimal harvest \(u^*(x) = \max(0, x - S^*)\)

  • decision_function (callable) – Compatible with DBlock interface: decision_function(states, shocks, parameters)

Notes

The optimal escapement \(S^*\) satisfies the first-order condition:

\[p - \frac{c_0}{S^*} = \delta r \left(p - \frac{c_0}{r S^*}\right)\]

where \(\delta\) is the discount factor. Rearranging gives:

\[S^* = \frac{c_0 (1 - \delta)}{p (1 - \delta r)}\]

This requires the “impatience condition” \(\delta r < 1\), which ensures the agent prefers extraction over indefinite accumulation.
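The rearrangement can be cross-checked by solving the first-order condition numerically and comparing the root with the closed form. The sketch below uses a simple bisection; the function names, the bracketing interval, and the parameter values are assumptions made for the example.

```python
def foc_residual(S, r, p, c_0, delta):
    """Left minus right side of p - c_0/S = delta*r*(p - c_0/(r*S))."""
    return (p - c_0 / S) - delta * r * (p - c_0 / (r * S))


def solve_escapement_bisection(r, p, c_0, delta, lo=1e-6, hi=1e6, tol=1e-12):
    """Find the root of the first-order condition by bisection."""
    f_lo = foc_residual(lo, r, p, c_0, delta)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = foc_residual(mid, r, p, c_0, delta)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


r, p, c_0, delta = 1.1, 5.0, 2.0, 0.9   # illustrative values only
S_closed = c_0 * (1.0 - delta) / (p * (1.0 - delta * r))
S_numeric = solve_escapement_bisection(r, p, c_0, delta)
```

Bisection works here because, when \(\delta r < 1\), the residual is increasing in \(S\) and changes sign exactly once on the positive half-line.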