The Maliar Training Loop on a Model With No Closed-Form Solution

The companion example trains a network on a fixed grid and checks it against a closed-form policy. This one runs the full Maliar, Maliar, and Winant (2021) algorithm, maliar_training_loop(), on a model that has no closed-form solution: Carroll’s buffer-stock consumption problem (benchmark U-3). Solving such models is the reason the neural-network method exists, so it is the honest setting in which to show the algorithm at work.

What the Maliar algorithm adds over plain gradient descent

A neural Bellman/Euler solver has three separable ingredients. The first two already appear in train_block_nn():

  1. an all-in-one expectation operator: the conditional expectation \(\mathbb{E}[\,\cdot\,]\) in the optimality condition is evaluated with two independent shock copies per state, so the product of the two resulting residuals is an unbiased estimate of the squared expected residual (squaring a single noisy draw would bias it upward by the residual's variance);

  2. stochastic gradient descent on the resulting residual loss.

The Maliar loop adds the third:

  3. an outer loop over the state distribution. After each block of SGD updates it simulates the model forward under the current policy and uses the states it lands on as the next training grid. Training therefore concentrates where the agent actually spends time (its ergodic set) rather than on an arbitrary fixed grid. maliar_training_loop() alternates epochs_per_iteration inner SGD steps with one such forward simulation, for up to max_iterations rounds or until the policy stops moving.
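The bias argument behind the first ingredient can be checked numerically. The sketch below is not skagent code: it uses a toy residual f(e) = 0.5 + e (a hypothetical stand-in for an Euler residual) and Python's standard library only. For independent draws e1 and e2, the mean of f(e1) * f(e2) estimates (E[f])^2, while the mean of f(e1)^2 estimates (E[f])^2 + Var[f].

```python
import random


def residual(shock):
    """Toy residual (hypothetical): its mean over a mean-zero shock is 0.5."""
    return 0.5 + shock


rng = random.Random(0)
n_draws = 200_000
sigma = 1.0  # standard deviation of the shock, chosen for illustration

single_sq = 0.0  # running sum of f(e1) ** 2 (one draw, squared)
product = 0.0    # running sum of f(e1) * f(e2) (two independent draws)
for _ in range(n_draws):
    e1 = rng.gauss(0.0, sigma)
    e2 = rng.gauss(0.0, sigma)  # independent second shock copy
    single_sq += residual(e1) ** 2
    product += residual(e1) * residual(e2)

single_sq /= n_draws
product /= n_draws
target = 0.5 ** 2  # (E[f])^2, the quantity the loss should estimate

print(f"target (E[f])^2:            {target:.3f}")
print(f"two-copy product estimate:  {product:.3f}")
print(f"single-draw squared:        {single_sq:.3f}")
```

The single-draw estimate overshoots the target by roughly sigma^2, the variance of the residual, which is the upward bias the two-copy product avoids.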

The model: buffer-stock saving (U-3)

U-3 is the normalized buffer-stock problem (Carroll 1992, 1997): CRRA utility with risk aversion \(\gamma = 2\), permanent and transitory income shocks, and a borrowing constraint \(c \le m\). In ratios to permanent income the arrival state is normalized assets \(a\), cash-on-hand is \(m = R a / \psi + \theta\) for gross return \(R\), and the control is normalized consumption \(c\). See skagent.models.benchmarks for the block definition and calibration.
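As a worked instance of the transition \(m = R a / \psi + \theta\), the sketch below evaluates cash-on-hand for one arrival state under mean shocks and under a bad transitory draw. The value R = 1.03 is a hypothetical placeholder, not the U-3 calibration, and the function name is introduced here for illustration only.

```python
def cash_on_hand(a, psi, theta, R):
    """Normalized cash-on-hand m = R * a / psi + theta."""
    return R * a / psi + theta


R_EXAMPLE = 1.03  # hypothetical gross return, not taken from the U-3 calibration

# Mean shocks (psi = theta = 1): m = 1.03 * 1 / 1 + 1, about 2.03.
m_mean = cash_on_hand(a=1.0, psi=1.0, theta=1.0, R=R_EXAMPLE)

# A bad transitory draw (theta = 0.5) lowers resources one for one.
m_bad = cash_on_hand(a=1.0, psi=1.0, theta=0.5, R=R_EXAMPLE)

# The borrowing constraint caps consumption at cash-on-hand.
desired_c = 1.8  # hypothetical unconstrained consumption choice
c_mean = min(desired_c, m_mean)  # feasible as is
c_bad = min(desired_c, m_bad)    # capped at m_bad

print(m_mean, m_bad, c_mean, c_bad)
```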

The constraint and income risk interact, so there is no closed-form policy. We instead validate the trained policy against two properties that buffer-stock theory guarantees: consumption never exceeds cash-on-hand (\(0 < c \le m\)), and the agent consumes a shrinking share of its resources as wealth rises, so the average propensity to consume \(c / m\) falls (precautionary saving weakens as a buffer accumulates).

import matplotlib.pyplot as plt
import numpy as np
import torch

import skagent.algos.maliar as maliar
import skagent.bellman as bellman
import skagent.grid as grid
import skagent.loss as loss
from skagent.ann import device
from skagent.models.benchmarks import (
    get_benchmark_calibration,
    get_benchmark_model,
)

SEED = 10077693

Step 1: Load the U-3 buffer-stock model and build a BellmanPeriod

construct_shocks draws the income-shock support that the all-in-one expectation operator samples from during training.

u3_block = get_benchmark_model("U-3")
u3_calibration = get_benchmark_calibration("U-3")

rng = np.random.default_rng(SEED)
u3_block.construct_shocks(u3_calibration, rng=rng)

bp = bellman.BellmanPeriod(u3_block, "DiscFac", u3_calibration)

Step 2: Define the Euler-equation loss with the borrowing constraint

constrained=True replaces the hard complementarity condition of the borrowing constraint with the smooth Fischer-Burmeister equation (Maliar et al. 2021, eq. 25). The constraint itself comes from the control’s upper_bound (\(c \le m\)). The loss depends only on the policy, so the loop trains a plain BlockPolicyNet internally.

euler_loss_fn = loss.EulerEquationLoss(bp, parameters=u3_calibration, constrained=True)
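For intuition, the standard Fischer-Burmeister function is \(\Psi(a, b) = a + b - \sqrt{a^2 + b^2}\), which is zero exactly when \(a \ge 0\), \(b \ge 0\), and \(a b = 0\). The sketch below is a standalone illustration rather than skagent's implementation. Reading a as the slack in the constraint and b as the multiplier-like Euler gap is an assumption made here for the example.

```python
import math


def fischer_burmeister(a, b):
    """Standard Fischer-Burmeister function: zero iff a >= 0, b >= 0, a * b == 0."""
    return a + b - math.sqrt(a * a + b * b)


# Constraint binds: no slack (a = 0), positive multiplier (b > 0).
fb_binding = fischer_burmeister(0.0, 2.0)

# Interior solution: positive slack (a > 0), zero multiplier (b = 0).
fb_interior = fischer_burmeister(3.0, 0.0)

# Complementarity fails: slack and multiplier are both strictly positive.
fb_both_positive = fischer_burmeister(1.0, 1.0)

# Infeasible: negative slack means the constraint is violated.
fb_infeasible = fischer_burmeister(-1.0, 1.0)

print(fb_binding, fb_interior, fb_both_positive, fb_infeasible)
```

Because the function is smooth away from the origin, driving it to zero with gradient descent enforces the complementarity conditions without an explicit case split between the constrained and unconstrained regions.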

Step 3: Run the Maliar training loop

The initial grid only seeds the first iteration; in each round, simulation_steps forward draws then move the training states toward the ergodic set. shock_copies=2 gives each training state a current-period shock draw and one next-period draw; the all-in-one operator’s second, independent next-period draw is generated inside the loss. The product of the residuals from the two independent next-period draws is an unbiased estimate of the squared expected residual, avoiding the upward bias of squaring a single draw.

states_0 = grid.Grid.from_config({"a": {"min": 0.1, "max": 4.0, "count": 200}})

trained_net, final_states = maliar.maliar_training_loop(
    bp,
    euler_loss_fn,
    states_0,
    u3_calibration,
    shock_copies=2,
    max_iterations=40,
    tolerance=1e-6,
    random_seed=SEED,
    simulation_steps=1,
    network_width=32,
)
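To see why the forward simulation matters, the toy sketch below moves the same 200-point initial grid forward under a fixed stand-in policy. Everything in it is an assumption for illustration: the policy c = KAPPA * m replaces the trained network, the values of R and SIGMA are hypothetical, end-of-period assets are taken to be a' = m - c, and no SGD step is performed. It is not how maliar_training_loop() is implemented, only a picture of the states drifting from the seed grid toward the region the simulated agent visits.

```python
import math
import random

R = 1.03      # hypothetical gross return
SIGMA = 0.1   # hypothetical std of the log income shocks
KAPPA = 0.5   # hypothetical consumption share standing in for the trained net


def mean_one_lognormal(rng, sigma):
    """Draw a lognormal shock with mean one."""
    return math.exp(rng.gauss(0.0, sigma) - 0.5 * sigma ** 2)


def stand_in_policy(m):
    """Fixed toy policy: consume a constant share of cash-on-hand."""
    return KAPPA * m


def simulate_forward(assets, policy, rng, steps=1):
    """Advance each asset state `steps` periods under `policy`."""
    for _ in range(steps):
        next_assets = []
        for a in assets:
            psi = mean_one_lognormal(rng, SIGMA)
            theta = mean_one_lognormal(rng, SIGMA)
            m = R * a / psi + theta           # cash-on-hand transition
            next_assets.append(m - policy(m))  # assumed law: a' = m - c
        assets = next_assets
    return assets


rng = random.Random(0)
initial_grid = [0.1 + i * (4.0 - 0.1) / 199 for i in range(200)]

states = initial_grid
for _ in range(40):
    # In the real loop, inner SGD steps would update the policy here.
    states = simulate_forward(states, stand_in_policy, rng, steps=1)

initial_spread = max(initial_grid) - min(initial_grid)
final_spread = max(states) - min(states)
final_mean = sum(states) / len(states)
print(f"spread of states: {initial_spread:.2f} -> {final_spread:.2f}")
print(f"mean of simulated states: {final_mean:.2f}")
```

The number of states is unchanged, but they end up clustered in a much narrower band than the seed grid, which is where the next round of training would concentrate.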

Step 4: Evaluate the trained consumption function

We read the policy over the ergodic set of assets the loop actually trained on, holding the income shocks at their mean (\(\psi = \theta = 1\)) so the slice is deterministic. Then we check the two buffer-stock properties.

decision_fn = trained_net.get_decision_function()

# Evaluate over the ergodic set the loop trained on (the assets the simulated
# agent visits), not a fixed grid: extrapolating past it shows an artifactual,
# untrained downward bend in consumption at high wealth.
n_test = 60
ergodic_a = final_states["a"].detach().to(device)
test_a = torch.linspace(
    float(ergodic_a.min()), float(ergodic_a.max()), n_test, device=device
)
mean_shocks = {
    "psi": torch.ones(n_test, device=device),
    "theta": torch.ones(n_test, device=device),
}

c = decision_fn({"a": test_a}, mean_shocks, u3_calibration)["c"].detach()
R = u3_calibration["R"]
m = R * test_a / mean_shocks["psi"] + mean_shocks["theta"]

# Property 1: the borrowing constraint 0 < c <= m holds everywhere, up to
# the 1e-6 numerical tolerance on the upper bound (the sigmoid-based
# bounding can approach but never exactly reach m).
respects_constraint = bool(torch.all(c > 0) and torch.all(c <= m + 1e-6))

# Property 2: the average propensity to consume c / m falls as the buffer
# grows. The ratio is robust (no differentiation of the network output) and,
# for a concave policy above the constraint, monotone: near the constraint the
# agent consumes nearly all of its resources, and it consumes a shrinking share
# once a buffer accumulates. Compare the lowest-wealth tenth of the grid to the
# highest.
apc = c / m
tenth = max(1, n_test // 10)
apc_low = apc[:tenth].mean().item()
apc_high = apc[-tenth:].mean().item()

print(f"Trained over {final_states.n()} ergodic-set states in the final round.")
print(f"Borrowing constraint 0 < c <= m holds on the test grid: {respects_constraint}")
print(f"APC c/m near the constraint: {apc_low:.2f}")
print(f"APC c/m at high wealth:      {apc_high:.2f}")
print(f"APC falls as the buffer grows (buffer-stock signature): {apc_high < apc_low}")
Output:
Trained over 200 ergodic-set states in the final round.
Borrowing constraint 0 < c <= m holds on the test grid: True
APC c/m near the constraint: 0.75
APC c/m at high wealth:      0.45
APC falls as the buffer grows (buffer-stock signature): True

Step 5: Plot the trained policy and its APC

The upper panel shows consumption against cash-on-hand, with the 45-degree line \(c = m\) marking the borrowing constraint. Over the ergodic set the agent holds a precautionary buffer, so consumption stays strictly below the constraint (the gap to \(c = m\) is that buffer) and rises concavely. The lower panel shows the average propensity to consume \(c / m\) falling as wealth grows.

m_np = m.cpu().numpy()
c_np = c.cpu().numpy()
apc_np = apc.cpu().numpy()

fig, (ax_policy, ax_apc) = plt.subplots(
    2, 1, figsize=(8, 7), sharex=True, height_ratios=[3, 2]
)

ax_policy.plot(m_np, m_np, "k:", linewidth=1.5, label="Constraint $c = m$")
ax_policy.plot(m_np, c_np, "C0-", linewidth=2, label="Trained policy (Maliar loop)")
ax_policy.set_ylabel("Normalized consumption $c$")
ax_policy.set_title("Maliar loop recovers a concave buffer-stock policy (U-3)")
ax_policy.legend()
ax_policy.grid(True, alpha=0.3)

ax_apc.plot(m_np, apc_np, "C2-", linewidth=1.5)
ax_apc.set_xlabel("Cash-on-hand $m$")
ax_apc.set_ylabel("APC $c / m$")
ax_apc.set_ylim(0.0, 1.0)
ax_apc.grid(True, alpha=0.3)

fig.tight_layout()
plt.show()
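[Figure: Maliar loop recovers a concave buffer-stock policy (U-3). Upper panel: normalized consumption c against cash-on-hand m, with the constraint c = m. Lower panel: APC c / m against cash-on-hand m.]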

Total running time of the script: (0 minutes 57.000 seconds)
