To create better strategies using factor investing, we will go deeper into this topic and explore more mathematical and coding elements.
Factor investing involves selecting securities based on certain attributes - or factors - that are associated with higher returns. By understanding and leveraging these factors, investors can develop strategies that aim to outperform the market.
In QuantJourney, we have already identified over 100 factors, but we still need to:
Compile and encode all new potential factors, drawing from both academic research and practitioner insights.
Develop our own factors based on intuition, research, and unique perspectives. A separate post will show how to build your own factors, replicate or modify existing ones, and test them.
Establish a robust approach to testing these new factors to determine whether they have meaningful relationships with returns. (We will also cover this in a separate post, explaining p-values, t-statistics, and how to use the Bonferroni correction, Family-Wise Error Rate [FWER], False Discovery Rate [FDR], etc.)
Code existing and new factors in QuantJourney, our integrated platform for quantitative analysis, so that we can implement, test, and backtest our strategies in one place.
If you prefer to jump straight to the code, it's available in the second part of this post; reading the mathematical equations is not mandatory.
Before diving into factor construction, it's essential to grasp foundational asset pricing concepts. These principles will help us understand how factors influence asset prices and expected returns.
What Determines Prices and Expected Returns?
In finance, asset prices and expected returns are fundamentally driven by risk and time preferences. In perfect markets - without frictions like taxes, transaction costs, liquidity constraints, or imperfect information - asset pricing can be understood through simple models of state prices and expected returns.
State prices
State prices are a key concept in asset pricing. They represent the price today of receiving a specific payoff in a particular future state of the world. Each future state is associated with uncertainty (different scenarios can play out), and state prices tell us how much we should pay now to receive a unit of payoff in each possible future state.
To simplify this, let's assume there are a few possible future states (e.g., a market upturn, a downturn, or stable performance). Each state has a state price $q(s)$ that indicates how much investors are willing to pay today to receive one unit of payoff in state $s$. The price of an asset is then:
$$P = \sum_{s} q(s)\, X(s)$$
where,
$P$ is the current price of the asset.
$q(s)$ is the state price for state $s$.
$X(s)$ is the payoff of the asset in state $s$.
This means that the price of the asset today is the sum of its payoffs across all possible states, each weighted by its state price. Discounting for time and risk is already embedded in the state prices themselves.
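As a minimal sketch of this pricing rule, the snippet below prices an asset in a three-state economy. The state prices and payoffs are made-up illustrative numbers, not estimates from any market.

```python
import numpy as np

# Hypothetical three-state economy: downturn, stable, upturn (illustrative numbers)
state_prices = np.array([0.40, 0.35, 0.20])  # q(s): price today of 1 unit paid in state s
payoffs = np.array([80.0, 100.0, 130.0])     # X(s): payoff of the asset in state s

# Price = sum over states of q(s) * X(s)
price = float(state_prices @ payoffs)

# A risk-free bond pays 1 in every state, so its price is the sum of the state prices
bond_price = float(state_prices.sum())
risk_free_gross = 1.0 / bond_price

print(f"Asset price: {price:.2f}")
print(f"Risk-free gross return: {risk_free_gross:.4f}")
```

Note that the state prices sum to less than one here, which is what makes the implied risk-free return positive.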
Expected Returns
Understanding expected returns is crucial because it determines how investors are compensated for the risks they take. Let's explore how expected returns are calculated and their significance in asset pricing.
In general, investors demand higher expected returns for assets whose payoffs are more uncertain or that perform poorly in bad states of the world. The link between risk and return can be described using the stochastic discount factor (SDF), denoted as $m_{t+1}$, which reflects both time preferences and risk aversion (more on the SDF below).
The current price $P_t$ of an asset can be written as the expected discounted future payoff:
$$P_t = E_t\left[m_{t+1} X_{t+1}\right]$$
where
$P_t$ is the current price,
$X_{t+1}$ is the future payoff,
$m_{t+1}$ is the SDF (pricing kernel).
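The expected return of an asset is linked to how its payoff covaries with the stochastic discount factor (SDF). Writing the gross return as $R_{t+1} = X_{t+1}/P_t$, the fundamental pricing equation is:
$$1 = E_t\left[m_{t+1} R_{t+1}\right]$$
With the gross risk-free return $R_f = 1/E_t[m_{t+1}]$, this can be rearranged as:
$$E_t[R_{t+1}] - R_f = -R_f \, \mathrm{Cov}_t\left(m_{t+1}, R_{t+1}\right)$$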
In this equation, the stochastic discount factor reflects the market’s risk preferences. Assets that provide higher payoffs in bad states of the world are more valuable and tend to have lower expected returns because they provide insurance during negative outcomes. Conversely, assets that pay off in good states of the world are riskier, and investors will demand higher expected returns for holding them.
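To make the insurance argument concrete, here is a small numerical check under assumed numbers. It uses the standard identity that the expected excess return equals $-R_f \, \mathrm{Cov}(m, R)$, with $R_f = 1/E[m]$. The probabilities, SDF values, and payoffs are hypothetical.

```python
import numpy as np

# Hypothetical three-state economy (all numbers are illustrative assumptions)
prob = np.array([0.25, 0.50, 0.25])      # probability of downturn, stable, upturn
sdf = np.array([1.60, 0.90, 0.40])       # m(s): high in bad states, low in good states
payoff = np.array([80.0, 100.0, 130.0])  # X(s): pays more in good states


def expectation(x):
    """Probability-weighted average across states."""
    return float(prob @ x)


price = expectation(sdf * payoff)    # P = E[m X]
gross_return = payoff / price        # R(s) = X(s) / P
risk_free = 1.0 / expectation(sdf)   # R_f = 1 / E[m]

expected_excess = expectation(gross_return) - risk_free
cov_m_r = expectation(sdf * gross_return) - expectation(sdf) * expectation(gross_return)

print(f"Price: {price:.2f}")
print(f"Expected excess return: {expected_excess:.4f}")
print(f"-R_f * Cov(m, R):       {-risk_free * cov_m_r:.4f}")
```

Because this asset pays most when the SDF is low (good states), its covariance with the SDF is negative and it earns a positive premium over the risk-free return.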
Relation to State Prices
Now that we've discussed state prices and expected returns separately, it's important to see how they are interconnected. The relationship between state prices and the stochastic discount factor (SDF) bridges the gap between theoretical concepts and practical asset pricing.
State prices $q(s)$ and the stochastic discount factor $m_{t+1}$ are closely related. The SDF effectively summarises the state prices, taking into account both the probability of each state $s$ and the investor’s preferences for risk and time. The relation is:
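$$q(s) = P(s)\, m_{t+1}(s)$$
where $P(s)$ is the probability of state $s$ occurring (not to be confused with the asset price $P$) and $m_{t+1}(s)$ is the value of the SDF in state $s$.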
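The sketch below assumes the relation takes its standard form, $q(s) = P(s)\,m(s)$, and checks that pricing with state prices and pricing with the SDF give the same answer. All inputs are hypothetical.

```python
import numpy as np

# Hypothetical inputs for three states: downturn, stable, upturn
prob = np.array([0.20, 0.50, 0.30])          # P(s): state probabilities
state_prices = np.array([0.40, 0.35, 0.20])  # q(s): state prices
payoff = np.array([80.0, 100.0, 130.0])      # X(s): asset payoffs

# SDF implied by state prices and probabilities: m(s) = q(s) / P(s)
sdf = state_prices / prob

# Two equivalent ways to price the asset
price_from_state_prices = float(state_prices @ payoff)  # sum_s q(s) X(s)
price_from_sdf = float(prob @ (sdf * payoff))           # E[m X]

print("Implied SDF per state:", np.round(sdf, 4))
print(f"Price via state prices: {price_from_state_prices:.2f}")
print(f"Price via SDF:          {price_from_sdf:.2f}")
```

The implied SDF is largest in the downturn state: a unit of payoff there costs more relative to its probability than in the other states.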
Let’s now see how State Prices and the SDF integrate into Factor Models
With a grasp of state prices and the SDF, we can delve into how these concepts are applied within factor models. This integration is key to understanding the mechanics behind factor investing.
In factor-based investing, the returns of assets are explained by their exposure to various systematic risk factors, such as market risk, size, value, momentum, etc.
The stochastic discount factor (SDF) and state prices therefore play a key role in linking factor models to asset prices:
Factor Models: Factor models, like the Fama-French three-factor model or multi-factor models, decompose an asset's return into several components related to different risk factors. For example, in the Fama-French model, the factors include:
Market excess return ($R_m - R_f$)
Size factor (SMB: Small minus Big)
Value factor (HML: High minus Low)
State Prices and SDF: Factor models can be seen as special cases of the general state price or SDF approach to asset pricing. Each risk factor is essentially capturing certain states of the world (e.g., "value stocks outperform growth stocks in certain conditions"). The returns on assets can be understood as being priced according to how they perform in these states. The SDF is then a linear function of these factors:
$$m_{t+1} = a - \sum_{k=1}^{K} \lambda_k f_{k,t+1}$$
where $a$ is a constant, $f_{k,t+1}$ are the factor returns, and $\lambda_k$ are the factor risk prices.
Risk Premia and Factor-Based Investing
Risk premia are the rewards investors receive for taking on additional risk. In factor-based investing, identifying and capturing these premia is essential for achieving superior returns. For instance:
The market risk premium compensates for the risk of the overall equity market.
The size premium compensates for the additional risk of small-cap stocks.
The value premium compensates for holding stocks with high book-to-market ratios.
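Thus, the expected excess return in a factor model can be expressed as:
$$E[R_i] - R_f = \sum_{k=1}^{K} \beta_{i,k} \lambda_k$$
where $\beta_{i,k}$ is the exposure of asset $i$ to factor $k$ and $\lambda_k$ is the price of risk (premium) for factor $k$.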
In frictionless markets, these risk premia can be thought of as stemming from risk-neutral probabilities or state prices. Assets that provide better returns in bad states (e.g., downturns or recessions) are priced higher because they act as a form of insurance, resulting in lower expected returns. Conversely, assets that perform poorly in bad states are riskier and have higher expected returns as compensation for this risk.
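As a small worked example, the expected excess return of each asset is the sum of its factor betas times the corresponding factor premia. The premia and betas below are hypothetical values chosen for illustration only.

```python
import numpy as np

# Hypothetical annualised factor risk premia: market, size (SMB), value (HML)
risk_premia = np.array([0.05, 0.02, 0.03])

# Hypothetical factor exposures (rows = assets, columns = factors)
betas = np.array([
    [1.0, 0.0, 0.0],    # market-like asset
    [1.2, 0.8, 0.3],    # small-cap asset with a value tilt
    [0.9, -0.2, -0.4],  # large-cap asset with a growth tilt
])

# Expected excess return per asset: sum_k beta_ik * lambda_k
expected_excess = betas @ risk_premia

for name, value in zip(["Market-like", "Small value", "Large growth"], expected_excess):
    print(f"{name:>12}: {value:.3%}")
```

Negative exposures to a factor with a positive premium lower the expected return, which is why the growth-tilted asset earns less than the market-like one in this sketch.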
Do I really Need this to Code my Factor-Based Strategies?
No, you don't need to explicitly work with these concepts on a day-to-day basis when building or applying factor models.
However, the no-arbitrage condition and the connection between risk and return via the SDF are fundamental to why factor models work in practice.
We will focus on identifying risk premia associated with factors like size, value, and momentum, and will rely heavily on this when creating new factors from academic papers. Therefore, understanding that these factors essentially reflect different states of the world helps justify why factors earn risk premia.
Multi-Factor Models
The basic CAPM is a single-factor model, where market risk is the only factor that determines returns. However, real-world returns are often influenced by multiple factors. Multi-factor models, like the Fama-French three-factor model, extend this by incorporating additional sources of risk:
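$$R_{i,t} - R_{f,t} = \alpha_i + \beta_{i,M}\,(R_{m,t} - R_{f,t}) + \beta_{i,SMB}\,\mathrm{SMB}_t + \beta_{i,HML}\,\mathrm{HML}_t + \varepsilon_{i,t}$$
Here, SMB represents the size factor (small minus big) and HML represents the value factor (high minus low book-to-market).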
Clearly, multi-factor models provide more flexibility and explanatory power, especially when accounting for multiple sources of risk. Some of the key reasons for using multi-factor models are:
Capturing Different Risk Premiums: Factors like size, value, and momentum capture additional risk premiums not explained by the market alone.
Building Mimicking Portfolios: Construct portfolios whose returns replicate the behavior of these factors, isolating each factor's contribution to asset returns.
Using the Arbitrage Pricing Theory (APT): APT holds that asset returns are driven by a small number of systematic factors plus idiosyncratic risk, so expected returns depend on exposures to those factors.
Advanced Models: Incorporate models like the Fama-French Five-Factor Model, the q-Factor Model (Hou, Xue, and Zhang, 2015), and others.
Choosing Weights for a Factor Portfolio
After identifying the factors we want to invest in, the next critical step is determining how to weight the assets in our portfolio. The weighting scheme can significantly impact the portfolio's performance and risk characteristics.
1. Dollar-neutral (e.g., $1 long and $1 short):
Sort the stocks into deciles (or quintiles/terciles) using your signal, then go long the top group and short the bottom group in equal dollar amounts.
Common weighting schemes:
Value-weighted: Weights based on market capitalization.
Capped-value-weighted: Weights are capped to avoid excessive concentration in large firms.
Equal-weighted: Each stock has equal weight.
Risk-weighted: Weights based on risk measures like volatility.
Rank-weighted: Assign weights based on signal rank.
Signal-weighted: Higher signal stocks receive higher weights.
2. Beta-neutral:
Adjust the long and short portfolios such that the overall portfolio beta equals zero.
Hedge market exposure to ensure you're only capturing the factor return.
3. Risk-adjusted factor:
First, construct the factor using one of the above weighting methods.
Then, rescale the portfolio to maintain a constant ex-ante volatility.
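The dollar-neutral and beta-neutral schemes above can be sketched as follows. This is a minimal illustration on a made-up cross-section of eight stocks; the helper `long_short_weights`, the column names, and all numbers are hypothetical.

```python
import pandas as pd

# Hypothetical cross-section of eight stocks on one rebalancing date
data = pd.DataFrame(
    {
        "signal": [2.1, 1.4, 0.9, 0.3, -0.2, -0.8, -1.5, -2.2],
        "market_cap": [50.0, 200.0, 20.0, 80.0, 500.0, 30.0, 10.0, 60.0],
        "beta": [1.3, 1.1, 1.2, 0.9, 1.0, 0.8, 1.4, 0.7],
    },
    index=[f"S{i}" for i in range(1, 9)],
)


def long_short_weights(raw, long_mask, short_mask):
    """Scale positive raw weights so the long leg sums to +1 and the short leg to -1."""
    long_leg = raw.where(long_mask, 0.0)
    short_leg = raw.where(short_mask, 0.0)
    return long_leg / long_leg.sum() - short_leg / short_leg.sum()


# Long the two strongest signals, short the two weakest (top and bottom quartile)
rank = data["signal"].rank(ascending=False)
long_mask = rank <= 2
short_mask = rank > len(data) - 2

# 1. Dollar-neutral portfolios with two different weighting schemes
equal_w = long_short_weights(pd.Series(1.0, index=data.index), long_mask, short_mask)
value_w = long_short_weights(data["market_cap"], long_mask, short_mask)

# 2. Beta-neutral: rescale the short leg so that the portfolio beta is zero
long_beta = (equal_w.where(long_mask, 0.0) * data["beta"]).sum()
short_beta = -(equal_w.where(short_mask, 0.0) * data["beta"]).sum()
hedge_ratio = long_beta / short_beta
beta_neutral_w = equal_w.where(long_mask, equal_w * hedge_ratio)
portfolio_beta = (beta_neutral_w * data["beta"]).sum()

print(pd.DataFrame({"equal": equal_w, "value": value_w, "beta_neutral": beta_neutral_w}))
print(f"Portfolio beta after hedging: {portfolio_beta:.6f}")
```

Rescaling the short leg removes the market beta but means the portfolio is no longer exactly dollar-neutral; in practice the residual exposure is often hedged with an index future instead.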
Example: Python Code for Factor Construction
Below is an example of how to construct a combined momentum and value factor portfolio. We calculate a momentum signal (12-month return excluding the most recent month) and a value signal (Book-to-Price ratio), average their ranks, and go long the better-ranked half of the stocks and short the rest:
```python
import numpy as np

# Assumed inputs (not defined in this post): `returns`, `price` and
# `book_value_per_share` are daily pandas DataFrames (rows = dates, columns = stocks).

# Shift returns by 21 days to exclude the most recent month
shifted_returns = returns.shift(21)

# Calculate momentum (12-month cumulative return excluding the most recent month,
# i.e. 11 months x 21 trading days = 231 days)
momentum = (1 + shifted_returns).rolling(window=231).apply(np.prod, raw=True) - 1

# Calculate value factor (Book-to-Price ratio)
book_to_price = book_value_per_share / price

# Rank stocks by momentum and value factors (rank 1 = strongest signal)
momentum_rank = momentum.rank(axis=1, ascending=False)
value_rank = book_to_price.rank(axis=1, ascending=False)

# Combine ranks by averaging, then lag by one day so that positions
# only use information available before the return is earned
combined_rank = ((momentum_rank + value_rank) / 2).shift(1)

# Determine median of combined ranks on each date
median_rank = combined_rank.median(axis=1)

# Select long and short stocks; axis=0 compares each row with that date's median
long_stocks = returns[combined_rank.le(median_rank, axis=0)]
short_stocks = returns[combined_rank.gt(median_rank, axis=0)]

# Create a combined factor portfolio (long high-ranked stocks, short low-ranked stocks)
portfolio_returns = (long_stocks.mean(axis=1) - short_stocks.mean(axis=1)).dropna()

# Calculate performance metrics (Annualized Sharpe Ratio)
sharpe_ratio = (portfolio_returns.mean() * 252) / (portfolio_returns.std() * np.sqrt(252))
print(f"Annualised Sharpe Ratio: {sharpe_ratio:.2f}")
```
Let’s do another example and implement a multi-factor model. This example simulates asset returns driven by a few factors, estimates the betas, and examines the relationship between the factors and asset returns using a regression-based approach.
Step 1: Generate Simulated Factor and Asset Return Data
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

np.random.seed(42)
n_assets = 5    # number of assets
n_obs = 100     # number of observations
n_factors = 3   # number of factors

# Simulate factor returns (factors could represent market, size, value, etc.)
factor_returns = np.random.normal(0, 1, (n_obs, n_factors))
factor_returns_df = pd.DataFrame(factor_returns, columns=['Factor1', 'Factor2', 'Factor3'])

# Simulate asset returns based on factors (with random factor loadings 'betas')
betas = np.random.normal(0.5, 0.1, (n_assets, n_factors))
epsilon = np.random.normal(0, 0.2, (n_obs, n_assets))  # idiosyncratic risk (epsilon)

# Asset returns as linear combination of factors + idiosyncratic noise
asset_returns = factor_returns @ betas.T + epsilon
asset_returns_df = pd.DataFrame(asset_returns, columns=[f'Asset{i+1}' for i in range(n_assets)])

# View the data
print(factor_returns_df.head())
print(asset_returns_df.head())
```
Step 2: Run OLS Regression to Estimate Betas
With our simulated data ready, we can now use Ordinary Least Squares (OLS) regression to estimate the beta coefficients for each asset, revealing their sensitivity to each factor.
```python
def run_ols(asset_returns_df, factor_returns_df):
    """Regress each asset's returns on the factors and collect the coefficients."""
    betas_ols = []
    X = sm.add_constant(factor_returns_df)  # add constant (alpha)
    for asset in asset_returns_df.columns:
        y = asset_returns_df[asset]
        model = sm.OLS(y, X)
        results = model.fit()
        betas_ols.append(results.params)
        print(f'Regression results for {asset}:')
        print(results.summary())
    return betas_ols


betas_ols = run_ols(asset_returns_df, factor_returns_df)
```
Consider the regression summary printed for Asset5. The three factors explain 95.8% of the variation in its returns, as indicated by the high R-squared value. All factors (Factor1, Factor2, and Factor3) are highly significant, with p-values close to zero and large t-statistics, which is expected because the returns were simulated directly from these factors.
The beta coefficients - 0.6171 for Factor1, 0.6141 for Factor2, and 0.6474 for Factor3 - indicate that Asset5 is about equally sensitive to all three factors. The constant (alpha) is small and not significant, implying no substantial abnormal return beyond what the factors explain. Overall, the model fits well, and the factors are key drivers of Asset5’s returns.
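Because the data are simulated, the estimated betas can also be compared with the true loadings. The sketch below is an alternative to the statsmodels loop, not a reproduction of it: it uses NumPy's `np.linalg.lstsq` to estimate all assets in one call, with a different random generator, seed, and a larger sample (500 observations), so its numbers will not match the summary discussed above.

```python
import numpy as np

# Assumption: a fresh simulation with its own seed, independent of the post's data
rng = np.random.default_rng(0)
n_obs, n_assets, n_factors = 500, 5, 3

factors = rng.normal(0.0, 1.0, (n_obs, n_factors))
true_betas = rng.normal(0.5, 0.1, (n_assets, n_factors))
noise = rng.normal(0.0, 0.2, (n_obs, n_assets))
asset_returns = factors @ true_betas.T + noise

# Design matrix with a column of ones so the first coefficient is alpha
X = np.column_stack([np.ones(n_obs), factors])

# One least-squares solve for all assets: coef has shape (1 + n_factors, n_assets)
coef, *_ = np.linalg.lstsq(X, asset_returns, rcond=None)
alphas = coef[0]
est_betas = coef[1:].T

# R-squared per asset: share of return variance explained by the factors
residuals = asset_returns - X @ coef
r_squared = 1.0 - residuals.var(axis=0) / asset_returns.var(axis=0)

max_beta_error = np.abs(est_betas - true_betas).max()
print("Estimated betas:\n", np.round(est_betas, 3))
print("R-squared per asset:", np.round(r_squared, 3))
print(f"Largest absolute beta error: {max_beta_error:.4f}")
```

With real data there is no `true_betas` to compare against, which is why out-of-sample checks matter so much more there than in a simulation.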
Factor construction doesn’t stop at simple ranking or weighting methods. You can improve your factor models using more advanced techniques, including:
Interaction Terms: Combine multiple factors, such as momentum and value, to create a multi-factor portfolio. For example, a stock with both high momentum and a low Price-to-Book ratio could receive higher weights.
Out-of-sample Testing: Always validate your factor model using out-of-sample data to ensure robustness.
Cross-sectional Regression: Use techniques like Fama-MacBeth regressions to estimate factor premiums across multiple time periods.
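As an illustration of the last technique, here is a simplified two-pass Fama-MacBeth sketch on simulated data. It uses full-sample betas in the first pass (rolling betas are common in practice) and plain standard errors without any autocorrelation adjustment. All parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_periods, n_assets, n_factors = 240, 50, 2

# Hypothetical data-generating process (returns in % per period)
true_premia = np.array([0.6, 0.3])
true_betas = rng.normal(1.0, 0.5, (n_assets, n_factors))
factors = true_premia + rng.normal(0.0, 2.0, (n_periods, n_factors))
returns = factors @ true_betas.T + rng.normal(0.0, 1.0, (n_periods, n_assets))

# Pass 1: time-series regression of each asset on the factors to estimate betas
F = np.column_stack([np.ones(n_periods), factors])
ts_coef = np.linalg.lstsq(F, returns, rcond=None)[0]
est_betas = ts_coef[1:].T  # shape (n_assets, n_factors)

# Pass 2: one cross-sectional regression of returns on betas per period
X = np.column_stack([np.ones(n_assets), est_betas])
lambdas = np.array(
    [np.linalg.lstsq(X, returns[t], rcond=None)[0] for t in range(n_periods)]
)  # shape (n_periods, 1 + n_factors); column 0 is the intercept

# Premium estimate = time-series mean of the period-by-period coefficients
premia = lambdas.mean(axis=0)
std_err = lambdas.std(axis=0, ddof=1) / np.sqrt(n_periods)
t_stats = premia / std_err

for name, p, t in zip(["Intercept", "Factor1", "Factor2"], premia, t_stats):
    print(f"{name:>9}: premium = {p:6.3f}, t-stat = {t:6.2f}")
```

The standard errors come from the time-series variation of the period-by-period estimates, which is the defining feature of the Fama-MacBeth procedure.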
More factors
Here you can find a subset of the more than 100 factors already coded in the QuantJourney Framework, which we will use for building more robust strategies in the coming weeks:
Steps to Build a Factor Model
1. Data Collection - Gather raw market and fundamental data for the assets you want to analyze.
Sources: Stock prices, returns, volume, risk-free rate, fundamentals (e.g., earnings, book value), and any additional factors (macro data, industry trends).
Key Consideration: Ensure the data is clean, free from biases (like survivorship bias or look-ahead bias), and adjusted for corporate actions (splits, dividends).
2. Signal Creation (Factor Definition) - Define the signals or characteristics that will form the basis of your factor model.
Momentum: Stocks with the highest past returns (e.g., 12-month momentum).
Value: Low price-to-book or price-to-earnings ratios.
Volatility: Stocks with low historical volatility.
Profitability: High return on equity (ROE) or gross margins.
Key Consideration: Avoid data mining and overfitting by choosing factors based on both theory (academic literature) and empirical evidence.
3. Signal Transformation and Normalization - Process raw signals to make them comparable across different assets.
Ranking: Rank stocks based on the signal (e.g., top decile for high momentum).
Standardization: Standardize signals (e.g., z-scores) to avoid large discrepancies in scale.
Key Consideration: Use robust techniques (e.g., winsorizing or rank-based scores) so that outliers do not distort your standardized signals.
4. Weighting Schemes - Assign weights to each stock based on the created signals.
Equal-Weighted: Each stock gets the same weight.
Value-Weighted: Larger stocks (based on market capitalization) get higher weights.
Rank-Weighted: Higher-ranked stocks (based on the signal) get higher weights.
Risk-Weighted: Allocate weights based on the inverse of volatility or risk.
Key Consideration: Ensure the chosen method aligns with your overall strategy, whether it's risk-adjusted, size-neutral, or factor-balanced.
5. Portfolio Construction - Create a long/short portfolio based on the factor signals.
Long Positions: Go long on stocks with the strongest signals (e.g., top decile for value or momentum).
Short Positions: Go short on stocks with the weakest signals (e.g., bottom decile).
Dollar-Neutral: Ensure the long and short portfolios are equal in dollar value so that net dollar exposure is zero. Note that this does not by itself make the portfolio beta-neutral.
Key Consideration: Incorporate transaction cost analysis and liquidity constraints when constructing the portfolio.
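Steps 3 to 5 can be sketched end to end on a single cross-section. The example below uses a simulated signal with a few injected outliers; the tickers, the 5%/95% winsorization cut-offs, and the decile choice are assumptions made for illustration. Transaction costs and liquidity are ignored.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n_stocks = 100
tickers = [f"STK{i:03d}" for i in range(n_stocks)]

# Hypothetical raw signal for one date, with three extreme outliers injected
raw_signal = pd.Series(rng.normal(0.0, 1.0, n_stocks), index=tickers)
raw_signal.iloc[:3] = [25.0, -30.0, 40.0]

# Step 3: winsorize at the 5th/95th percentiles, then standardize to z-scores
lower, upper = raw_signal.quantile([0.05, 0.95])
clipped = raw_signal.clip(lower=lower, upper=upper)
z_score = (clipped - clipped.mean()) / clipped.std()

# Assign deciles from ranks (1 = weakest signal, 10 = strongest);
# method="first" breaks the ties created by winsorizing
decile = pd.qcut(z_score.rank(method="first"), 10, labels=False) + 1

# Steps 4 and 5: equal-weighted, dollar-neutral long/short portfolio
longs = decile == 10
shorts = decile == 1
weights = pd.Series(0.0, index=tickers)
weights[longs] = 1.0 / longs.sum()
weights[shorts] = -1.0 / shorts.sum()

print(f"Raw signal range:     [{raw_signal.min():.1f}, {raw_signal.max():.1f}]")
print(f"Clipped signal range: [{clipped.min():.2f}, {clipped.max():.2f}]")
print(f"Longs: {longs.sum()}, shorts: {shorts.sum()}, net exposure: {weights.sum():.2f}")
```

Without the winsorization step, the three injected outliers would dominate the mean and standard deviation used for the z-scores, even though they would not change the rank ordering.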
Conclusion
Understanding the theoretical underpinnings of factor investing enhances our ability to develop robust strategies. By leveraging the QuantJourney Framework and the multitude of pre-coded factors, we can implement sophisticated models that stand up to rigorous testing.