Hidden Markov Models for Market Regimes: A Practical Guide (With Python)
The Hidden Markov Model is one of the most powerful tools in quantitative finance. Regime-switching variants have been used by central banks and economists to model business-cycle transitions, and by hedge funds to detect regime changes in near real time. Yet it remains underused by most systematic traders.
This guide is the one I wish existed when I first started implementing HMMs for market regime detection. We'll cover the mathematical foundations, the three fundamental algorithms (forward, Viterbi, Baum-Welch), a complete Python implementation using hmmlearn, and the practical decisions that separate a research toy from a production system.
By the end, you'll understand not just how to run the code, but why each piece works — and where the model can fail.
What Is a Hidden Markov Model?
A Hidden Markov Model is a statistical model in which the system being modeled is a Markov chain with unobservable (hidden) states. You can observe the outputs of the system, but not the states themselves — you have to infer the states from the observations.
The word "hidden" is the key. If you could directly observe whether the market was in a "risk-on low-volatility trend" state or a "risk-off high-volatility chop" state, you wouldn't need a model. The challenge is that market regimes are latent — they influence what you observe (returns, volatility, correlations), but they aren't directly observable.
An HMM has three components:
- Transition matrix A: A k×k matrix where A[i,j] is the probability of transitioning from state i to state j. For a 3-state model, this is a 3×3 matrix with rows summing to 1.
- Emission distributions B: For each hidden state, a probability distribution over observations. In Gaussian HMMs, each state has its own mean (μ) and covariance (Σ).
- Initial state distribution π: The probability of starting in each state at time zero.
The model says: at each time step, the market is in some hidden state. That state generates the observed return according to a Gaussian distribution. Then the market transitions to the next state according to the transition matrix.
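That generative story can be sketched directly in a few lines. The parameter values below (transition matrix, per-state drift and volatility) are purely illustrative numbers, not estimates from data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2-state model: state 0 = calm, state 1 = stressed.
# These numbers are made up for the sketch, not estimated from data.
A = np.array([[0.98, 0.02],        # transition matrix; rows sum to 1
              [0.05, 0.95]])
mu = np.array([0.0005, -0.0010])   # daily mean return per state
sigma = np.array([0.008, 0.025])   # daily vol per state
pi = np.array([0.5, 0.5])          # initial state distribution

T = 1000
states = np.zeros(T, dtype=int)
returns = np.zeros(T)
states[0] = rng.choice(2, p=pi)
returns[0] = rng.normal(mu[states[0]], sigma[states[0]])
for t in range(1, T):
    # transition according to the current state's row of A ...
    states[t] = rng.choice(2, p=A[states[t - 1]])
    # ... then emit a return from that state's Gaussian
    returns[t] = rng.normal(mu[states[t]], sigma[states[t]])
```

Because the two states are persistent (large diagonal entries of A), the simulated series shows the familiar volatility clustering of real return data.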
Hamilton (1989) popularized regime-switching models for economic time series, demonstrating that U.S. real GNP growth switches between two distinct regimes — one with positive average growth, one with negative average growth — closely tracking NBER-dated business cycles. Subsequent work by Ang and Bekaert (2004), Guidolin and Timmermann (2007), and many others extended this framework to equity markets, volatility dynamics, and multi-asset portfolio allocation.
The Three Core Algorithms
Understanding HMMs requires understanding three algorithms. They solve different problems but are tightly related.
1. The Forward Algorithm: What is P(observations | model)?
The forward algorithm computes the probability of observing a particular sequence of returns given the model parameters. It uses dynamic programming to avoid the combinatorial explosion of summing over all possible state sequences.
Define α_t(i) as the probability of being in state i at time t AND having observed the sequence o_1, o_2, ..., o_t. The recursion is:
Initialization: α_1(i) = π_i · b_i(o_1)
Recursion: α_t(j) = [Σ_i α_{t-1}(i) · a_{ij}] · b_j(o_t)
Termination: P(O|λ) = Σ_i α_T(i)
In practice, these probabilities become extremely small for long sequences and underflow to zero in floating point. The solution is to work in log-space, summing log-probabilities with the log-sum-exp trick.
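A minimal log-space forward pass might look like the following (a sketch assuming 1-D Gaussian emissions; `scipy.special.logsumexp` provides the stable summation):

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def forward_log(obs, log_pi, log_A, mu, sigma):
    """Log-space forward algorithm: returns log P(O | model).

    obs: (T,) observations; mu/sigma: (K,) per-state Gaussian parameters.
    """
    # log b_j(o_t) for every (t, j) pair
    log_B = norm.logpdf(obs[:, None], loc=mu[None, :], scale=sigma[None, :])
    log_alpha = log_pi + log_B[0]                        # initialization
    for t in range(1, len(obs)):
        # log-sum-exp over previous states replaces the naive sum of products
        log_alpha = logsumexp(log_alpha[:, None] + log_A, axis=0) + log_B[t]
    return logsumexp(log_alpha)                          # termination
```

Each recursion step is term-by-term the log of α_t(j) = [Σ_i α_{t-1}(i)·a_{ij}]·b_j(o_t), so the result never underflows even for thousands of observations.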
2. The Viterbi Algorithm: What is the most likely state sequence?
The Viterbi algorithm finds the single most likely sequence of hidden states given the observations. It uses the same dynamic programming structure as the forward algorithm, but takes the maximum over previous states rather than summing.
Define δ_t(i) as the probability of the most probable path ending in state i at time t. The recursion:
Initialization: δ_1(i) = π_i · b_i(o_1)
Recursion: δ_t(j) = max_i [δ_{t-1}(i) · a_{ij}] · b_j(o_t)
Termination: q*_T = argmax_i δ_T(i)
Backtrack: q*_t = ψ_{t+1}(q*_{t+1}) for t = T−1, ..., 1, where ψ_t(j) = argmax_i [δ_{t−1}(i) · a_{ij}] records the best predecessor of state j at time t
Viterbi gives you the most likely regime label for each day in your backtest. But be careful: "most likely sequence" ≠ "most likely state at each time point." The sequence probability optimizes jointly over all time steps. If you want the marginal probability of being in each state at a specific time t, you need the smoothed posterior probabilities from the forward-backward algorithm.
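A log-space Viterbi sketch (again assuming 1-D Gaussian emissions; ψ becomes an array of backpointers):

```python
import numpy as np
from scipy.stats import norm

def viterbi_log(obs, log_pi, log_A, mu, sigma):
    """Most likely hidden-state path, computed in log space."""
    T, K = len(obs), len(log_pi)
    log_B = norm.logpdf(obs[:, None], loc=mu[None, :], scale=sigma[None, :])
    delta = log_pi + log_B[0]
    psi = np.zeros((T, K), dtype=int)         # psi[t, j]: best predecessor of j
    for t in range(1, T):
        scores = delta[:, None] + log_A       # scores[i, j] = delta_i + log a_ij
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta.argmax()                 # termination
    for t in range(T - 2, -1, -1):            # backtrack through psi
        path[t] = psi[t + 1, path[t + 1]]
    return path
```

The structure is identical to the forward algorithm except that the log-sum-exp over previous states is replaced by a max, plus bookkeeping for the backtrack.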
3. The Baum-Welch Algorithm: How do we learn the parameters?
Baum-Welch is the Expectation-Maximization (EM) algorithm applied to HMMs. It learns the model parameters (A, B, π) from observed data without knowing the true hidden states.
The E-step computes the expected sufficient statistics using the forward and backward probabilities. The backward variable β_t(i) is the probability of observing o_{t+1}, ..., o_T given that we're in state i at time t:
γ_t(i) = P(q_t = i | O, λ) = α_t(i)·β_t(i) / Σ_j α_t(j)·β_t(j)
ξ_t(i,j) = P(q_t=i, q_{t+1}=j | O, λ) = α_t(i)·a_{ij}·b_j(o_{t+1})·β_{t+1}(j) / P(O|λ)
The M-step re-estimates parameters using these expected counts. Iterate until convergence. Baum-Welch is guaranteed to converge to a local maximum of the likelihood — not necessarily the global maximum, which is why initialization matters.
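To make the E- and M-steps concrete, here is a toy Baum-Welch for a 1-D Gaussian HMM using scaled (rather than log-space) forward/backward recursions. It is a sketch for intuition under those simplifying assumptions, not a substitute for hmmlearn's implementation:

```python
import numpy as np

def baum_welch_1d(obs, n_states=2, n_iter=50):
    """Toy Baum-Welch for a 1-D Gaussian HMM with scaled forward/backward."""
    T = len(obs)
    # Naive initialization: uniform A and pi, means at spread-out quantiles
    A = np.full((n_states, n_states), 1.0 / n_states)
    pi = np.full(n_states, 1.0 / n_states)
    mu = np.quantile(obs, np.linspace(0.1, 0.9, n_states))
    sigma = np.full(n_states, obs.std())
    for _ in range(n_iter):
        # Emission likelihoods b_j(o_t), shape (T, K)
        B = np.exp(-0.5 * ((obs[:, None] - mu) / sigma) ** 2) \
            / (sigma * np.sqrt(2 * np.pi))
        # Scaled forward pass (c_t keeps alpha normalized each step)
        alpha = np.zeros((T, n_states)); c = np.zeros(T)
        alpha[0] = pi * B[0]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[t]
            c[t] = alpha[t].sum(); alpha[t] /= c[t]
        # Backward pass reusing the same scaling constants
        beta = np.ones((T, n_states))
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (B[t + 1] * beta[t + 1])) / c[t + 1]
        # E-step: gamma_t(i) and xi_t(i, j) as expected counts
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)
        xi = alpha[:-1, :, None] * A[None] * (B[1:] * beta[1:])[:, None, :]
        xi /= xi.sum(axis=(1, 2), keepdims=True)
        # M-step: re-estimate parameters from the expected counts
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        mu = (gamma * obs[:, None]).sum(axis=0) / gamma.sum(axis=0)
        sigma = np.sqrt((gamma * (obs[:, None] - mu) ** 2).sum(axis=0)
                        / gamma.sum(axis=0)) + 1e-8
    return pi, A, mu, sigma
```

On data simulated from a two-state model with well-separated means, this recovers the state means to within sampling error; with poorly separated states or unlucky initialization it stalls at a local maximum, which is exactly the sensitivity discussed later.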
Python Implementation
Let's build a complete HMM-based regime detector. We'll use hmmlearn for the core model and yfinance for data.
import numpy as np
import pandas as pd
import yfinance as yf
from hmmlearn.hmm import GaussianHMM
from sklearn.preprocessing import StandardScaler
import warnings
warnings.filterwarnings('ignore')
# ── 1. Fetch data ────────────────────────────────────────────
# .squeeze() collapses the single-ticker column level into a Series
# (recent yfinance versions return MultiIndex columns from download)
spy = yf.download("SPY", start="2010-01-01", end="2025-01-01",
                  auto_adjust=True)["Close"].squeeze()
vix = yf.download("^VIX", start="2010-01-01", end="2025-01-01",
                  auto_adjust=True)["Close"].squeeze()
# ── 2. Feature engineering ────────────────────────────────────
df = pd.DataFrame(index=spy.index)
df['returns'] = spy.pct_change()
df['log_returns'] = np.log(spy / spy.shift(1))
df['realized_vol'] = df['log_returns'].rolling(20).std() * np.sqrt(252)
df['vix'] = vix
df['vol_of_vol'] = df['realized_vol'].rolling(20).std()
df['momentum_5d'] = df['log_returns'].rolling(5).sum()
df['momentum_20d'] = df['log_returns'].rolling(20).sum()
df = df.dropna()
features = ['log_returns', 'realized_vol', 'vix',
'vol_of_vol', 'momentum_5d', 'momentum_20d']
X = df[features].values
# ── 3. Normalize features ─────────────────────────────────────
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# ── 4. Fit Gaussian HMM ───────────────────────────────────────
N_STATES = 3
model = GaussianHMM(
    n_components=N_STATES,
    covariance_type="full",
    n_iter=1000,
    random_state=42,
    tol=1e-4,
)
model.fit(X_scaled)
# ── 5. Decode regimes ─────────────────────────────────────────
hidden_states = model.predict(X_scaled) # Viterbi
state_probs = model.predict_proba(X_scaled) # Smoothed posteriors
df['regime'] = hidden_states
for i in range(N_STATES):
    df[f'prob_state_{i}'] = state_probs[:, i]
# ── 6. Characterize regimes ───────────────────────────────────
for state in range(N_STATES):
    mask = df['regime'] == state
    annualized_ret = df.loc[mask, 'log_returns'].mean() * 252
    annualized_vol = df.loc[mask, 'log_returns'].std() * np.sqrt(252)
    sharpe = annualized_ret / annualized_vol
    avg_vix = df.loc[mask, 'vix'].mean()
    freq = mask.sum() / len(df)
    print(f"State {state}: ret={annualized_ret:.1%} vol={annualized_vol:.1%} "
          f"sharpe={sharpe:.2f} vix={avg_vix:.1f} freq={freq:.1%}")
Running this on SPY data from 2010–2024 typically produces three states that correspond recognizably to market regimes: a bull/trend state (positive drift, low volatility, low VIX), a choppy/transitional state (near-zero drift, elevated volatility), and a crisis/bear state (negative drift, high volatility, high VIX). The integer labels assigned to each state are arbitrary — you need to characterize each state by its statistical properties, not its index.
Feature Engineering: What to Feed the Model
The choice of input features is one of the most important decisions in building an HMM regime detector. The model can only detect regimes along the dimensions that your features capture. Poor feature choice = poor regime detection.
Here's a categorization of feature types from most to least useful in practice:
Tier 1: Core volatility features
- Realized volatility (20-day, 60-day rolling)
- VIX level (implied volatility)
- Vol-of-vol (volatility of the volatility series)
- Return absolute value (|r_t|) as a proxy for instantaneous vol
Tier 2: Trend/momentum features
- Rolling return sums (5-day, 20-day, 60-day)
- Price relative to moving average (ratio of price to 200d MA)
- Slope of linear trend fit to log-price over rolling window
Tier 3: Cross-asset / macro features
- Credit spread (HYG/LQD ratio or actual credit spread data)
- Treasury yield curve slope (10Y-2Y spread)
- Dollar index (DXY)
- Equity-bond correlation (rolling)
A critical practical warning: be very careful about look-ahead bias in your features. A 20-day realized volatility calculated at time t uses data from t-20 to t — that's fine, and the same holds for any trailing window, including a 200-day moving average, provided enough history exists at each point (drop the warm-up period rather than backfilling it). The more insidious leaks come from statistics computed over the full sample: a scaler fit on all of history, a centered rolling window, or a feature retroactively recalculated after data revisions. Every number the model sees at time t must have been computable at time t.
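The distinction is easy to see numerically. Below, a z-score computed with full-sample statistics is compared against a point-in-time version using expanding windows (synthetic series; the 50-observation warm-up is an arbitrary choice):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic series whose volatility doubles halfway through
x = pd.Series(np.r_[rng.normal(0, 1, 500), rng.normal(0, 2, 500)])

# Leaky: the full-sample std already "knows" about the second-half
# vol jump, so early observations are scaled by future information.
z_leaky = (x - x.mean()) / x.std()

# Point-in-time: expanding mean/std use only data up to each t;
# the first 49 values are NaN until the warm-up is satisfied.
z_pit = (x - x.expanding(min_periods=50).mean()) \
        / (x.expanding(min_periods=50).std())
```

In the leaky version the first half of the series looks artificially quiet (its z-scores are shrunk by a standard deviation that includes the future regime), while the point-in-time version stays correctly scaled until the vol jump actually arrives.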
The Number of States: How Many Regimes?
A common question: how many hidden states should I use? The answer depends on what you're trying to accomplish, but here's the practical guidance:
2 states: Simple risk-on/risk-off classification. Good for binary position sizing decisions. Loses the nuance between "trending" and "range-bound" favorable regimes.
3 states: The sweet spot for most applications. Typically discovers bull/trend, choppy/transitional, and crisis/bear states. Enough granularity for meaningful strategy conditioning without overfitting.
4 states: Allows for differentiation between low-vol trend and low-vol range-bound, and between high-vol choppy and high-vol trending (crisis). Useful if you're trading a strategy sensitive to this distinction (e.g., options sellers care a lot about the difference between "quiet chop" and "quiet trend").
5+ states: Usually overfits on historical data and produces unstable regime assignments. Unless you have very strong domain theory about the specific states that should exist, avoid.
To select the number of states formally, use information criteria. The Bayesian Information Criterion (BIC) penalizes model complexity:
# Model selection via BIC
results = []
for n_states in range(2, 7):
    m = GaussianHMM(n_components=n_states, covariance_type="full",
                    n_iter=500, random_state=42)
    m.fit(X_scaled)
    log_likelihood = m.score(X_scaled)  # score() returns the TOTAL log-likelihood
    # Number of free parameters
    n_features = X_scaled.shape[1]
    n_params = (n_states - 1)                                  # initial probs
    n_params += n_states * (n_states - 1)                      # transitions
    n_params += n_states * n_features                          # means
    n_params += n_states * n_features * (n_features + 1) // 2  # covariances
    bic = -2 * log_likelihood + n_params * np.log(len(X_scaled))
    results.append({'n_states': n_states, 'bic': bic, 'll': log_likelihood})
    print(f"n={n_states}: BIC={bic:.0f}, LL={log_likelihood:.0f}")
best_n = min(results, key=lambda x: x['bic'])['n_states']
print(f"Optimal n_states by BIC: {best_n}")
Critical Pitfall: Initialization Sensitivity
Baum-Welch finds a local maximum of the likelihood function, not the global maximum. This means results can differ substantially based on the initialization of the parameters. This is not a theoretical concern — it happens in practice and produces meaningfully different regime assignments.
The solution is multi-start optimization: run Baum-Welch from many different random initializations and take the result with the highest log-likelihood:
best_model = None
best_score = -np.inf
for seed in range(50):  # 50 random restarts
    m = GaussianHMM(n_components=3, covariance_type="full",
                    n_iter=500, random_state=seed)
    try:
        m.fit(X_scaled)
        s = m.score(X_scaled)
        if s > best_score:
            best_score = s
            best_model = m
    except Exception:
        continue
print(f"Best log-likelihood: {best_score:.2f}")
Fifty restarts is a good baseline. For production systems, 100–200 restarts with parallelization gives you higher confidence you've found a good local optimum.
Building an Ensemble
A single HMM is a fragile regime detector. Its regime assignments depend on initialization, the specific features chosen, the training period, and the number of states. Small changes in any of these can produce different regime labelings.
The production approach is an ensemble: run multiple HMMs with different feature sets, different numbers of states, and different training windows. Combine their outputs through probability averaging.
feature_sets = [
    ['log_returns', 'realized_vol', 'vix'],
    ['log_returns', 'realized_vol', 'momentum_20d'],
    ['log_returns', 'vix', 'vol_of_vol', 'momentum_5d'],
    ['realized_vol', 'vix', 'vol_of_vol'],
]
all_favorable_probs = []
for feat_set in feature_sets:
    X_sub = df[feat_set].values
    X_sub_scaled = StandardScaler().fit_transform(X_sub)
    best_m, best_s = None, -np.inf
    for seed in range(30):
        m = GaussianHMM(n_components=3, covariance_type="full",
                        n_iter=500, random_state=seed)
        try:
            m.fit(X_sub_scaled)
            s = m.score(X_sub_scaled)
            if s > best_s:
                best_s, best_m = s, m
        except Exception:
            continue
    probs = best_m.predict_proba(X_sub_scaled)
    # Find the state with the highest average return (= "favorable")
    states = best_m.predict(X_sub_scaled)
    mean_returns = [df.loc[states == s, 'log_returns'].mean() for s in range(3)]
    favorable_state = np.argmax(mean_returns)
    all_favorable_probs.append(probs[:, favorable_state])
# Ensemble: average favorable probability across all models
df['favorable_prob'] = np.mean(all_favorable_probs, axis=0)
print(df[['log_returns', 'regime', 'favorable_prob']].tail(10))
Online vs. Batch: The Look-Ahead Problem
The most insidious source of overfitting in HMM-based trading systems is the look-ahead bias from batch training. When you fit an HMM on the full historical dataset and then use it to label regimes, the model's regime assignments for early dates are informed by data that didn't exist at those times. This makes backtests look much better than they actually are.
The solution for production is an expanding-window walk-forward approach:
- Train the model on data from t=0 to t=T_train
- Generate regime forecasts for t=T_train+1 to t=T_train+30 (or however many days out)
- Advance T_train by one month, retrain, generate new forecasts
- Concatenate all out-of-sample forecasts for your backtest
This is computationally expensive (you're refitting the model ~120 times for a 10-year backtest with monthly refit), but it's the only way to get an honest estimate of what the regime detector would have told you in real time.
Note that Hamilton's (1989) original filter already produces regime probabilities using only information available up to each point in time; Kim (1994) extended this with filtering and smoothing algorithms for state-space models with Markov switching. For live applications, the filtered (not smoothed) probabilities are the theoretically correct quantity.
From Research to Production: What RegimeForecast Does Differently
Building a research-quality HMM is the easy part. Running it as a production system that delivers reliable regime signals every trading day requires several additional engineering decisions:
Data freshness and pipeline reliability: The model is only as good as the data going in. A production regime detector needs automated data pipelines with validation checks, anomaly detection (did VIX move 50% overnight? is that a real data point?), and fallback handling. For more on why this matters for live trading, see how regime-dependent strategy performance breaks down when signals are stale or mis-classified.
Regime label persistence: HMMs can flip between state labels between training runs because the mapping from latent state index to regime label is arbitrary. A regime labeled "state 2" in one training run might correspond to "state 0" in the next run. Production systems need a stable labeling convention based on state characteristics (e.g., the state with highest volatility is always labeled "high-vol") rather than arbitrary index.
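One way to implement a stable convention is to rank decoded states by their in-state volatility and map ranks to fixed names. A sketch on synthetic decoded output (the state indices and returns below are simulated for illustration):

```python
import numpy as np
import pandas as pd

# Synthetic decoded output: raw state indices plus the returns they label
rng = np.random.default_rng(0)
states = rng.integers(0, 3, 1000)
rets = np.where(states == 2,
                rng.normal(0.0, 0.03, 1000),     # one noticeably wilder state
                rng.normal(0.0005, 0.01, 1000))
d = pd.DataFrame({"state": states, "ret": rets})

# Rank states by in-state volatility: the integer index is arbitrary,
# but the volatility ordering is stable across refits
vol_rank = d.groupby("state")["ret"].std().sort_values().index
label_map = dict(zip(vol_rank, ["low_vol", "mid_vol", "high_vol"]))
d["label"] = d["state"].map(label_map)
```

Downstream code then conditions on `"high_vol"` rather than on whichever raw index the latest training run happened to assign.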
Calibration monitoring: Regime probability outputs should be calibrated. A model that says "70% favorable" should be right about 70% of the time when it makes that statement. Reliability diagrams and Brier scores should be computed regularly to validate that the model remains well-calibrated as market conditions evolve.
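Both checks are a few lines each. Below is a sketch of a Brier score and a binned reliability table (the bin count and equal-width binning scheme are arbitrary choices):

```python
import numpy as np

def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    probs, outcomes = np.asarray(probs, float), np.asarray(outcomes, float)
    return float(np.mean((probs - outcomes) ** 2))

def reliability_table(probs, outcomes, n_bins=10):
    """(mean forecast, observed frequency, count) per probability bin."""
    probs, outcomes = np.asarray(probs, float), np.asarray(outcomes, float)
    bins = np.clip((probs * n_bins).astype(int), 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            rows.append((probs[mask].mean(), outcomes[mask].mean(), int(mask.sum())))
    return rows
```

For a well-calibrated model, the observed frequency in each row of the reliability table should sit close to the mean forecast for that bin, and the Brier score should be stable over time.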
Model aging and refit schedules: Market structure changes over time. The HMM trained in 2015 may not accurately represent the regime dynamics of 2025. Implement automated drift detection (monitoring whether observed return distributions continue to match model emission distributions) and refit the model when significant drift is detected.
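One simple drift check is a two-sample Kolmogorov-Smirnov test comparing a recent window of returns against the training sample (the window sizes and the 1% threshold below are arbitrary illustrations, not recommendations):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_rets = rng.normal(0, 0.01, 2000)   # returns the model was fit on
live_rets = rng.normal(0, 0.02, 250)     # recent window: vol has doubled

# Two-sample KS test: has the return distribution drifted away from
# what the model was trained on?
stat, p_value = ks_2samp(train_rets, live_rets)
drift_detected = bool(p_value < 0.01)    # crude trigger; tune in practice
```

A richer check would compare live data against each state's fitted emission distribution rather than the pooled training sample, but even this pooled test catches the gross regime-structure shifts that should trigger a refit.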
The complexity of maintaining a production-quality HMM regime detector is non-trivial. The research is straightforward; the engineering is hard. This is why purpose-built tools that handle the full pipeline — data collection, feature engineering, model fitting, ensemble aggregation, and real-time signal delivery — provide meaningful value over DIY implementations.
References
- Ang, A., & Bekaert, G. (2004). How regimes affect asset allocation. Financial Analysts Journal, 60(2), 86–99.
- Baum, L. E., & Petrie, T. (1966). Statistical inference for probabilistic functions of finite state Markov chains. Annals of Mathematical Statistics, 37(6), 1554–1563.
- Baum, L. E., Petrie, T., Soules, G., & Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics, 41(1), 164–171.
- Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1–38.
- Guidolin, M., & Timmermann, A. (2007). Asset allocation under multivariate regime switching. Journal of Economic Dynamics and Control, 31(11), 3503–3544.
- Hamilton, J. D. (1989). A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica, 57(2), 357–384.
- Kim, C. J. (1994). Dynamic linear models with Markov-switching. Journal of Econometrics, 60(1–2), 1–22.
- Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257–286.
- Rydén, T., Teräsvirta, T., & Åsbrink, S. (1998). Stylized facts of daily return series and the hidden Markov model. Journal of Applied Econometrics, 13(3), 217–244.
- Viterbi, A. J. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13(2), 260–269.
- Welch, L. R. (2003). Hidden Markov models and the Baum-Welch algorithm. IEEE Information Theory Society Newsletter, 53(4), 1–13.