Most quantitative traders evaluate their signals using simple correlation with future returns. This approach has a fundamental flaw: it doesn't distinguish between genuine predictive power and exposure to known risk factors. A signal might show 15% correlation with returns, but if that correlation comes entirely from loading on small-cap stocks during a small-cap rally, you haven't discovered alpha - you've rediscovered the size factor.
Numerai, the hedge fund that crowdsources predictions from data scientists, recently released their Alpha scoring framework. While designed for their tournament, the methodology offers valuable lessons for any systematic trader. They've essentially open-sourced an institutional-grade signal evaluation system, complete with factor models, liquidity adjustments, and risk controls.
What Numerai Built (And Released)
Numerai provides three key datasets that make their Alpha framework reproducible:
Neutralization Matrix (N): 200 factors used to remove systematic exposures
Sample Weights Vector (v): Liquidity and volatility adjustments for each stock
Residual Target: Returns with all factor exposure removed
This level of transparency is rare for a hedge fund - which is why I decided to spend some time on it and write this post. They're essentially sharing their factor model and risk framework with the community.
The Three-Step Framework
Step 1: Factor Neutralization - Separating Alpha from Beta
What it is: Factor neutralization removes systematic exposures from your signal. Instead of predicting raw returns, you predict the component of returns unexplained by known factors.
Why we need it: Raw correlation captures everything - alpha, factor exposures, and noise. A signal that just loads on value stocks during a value rally isn't providing unique insight. Factor neutralization isolates the truly predictive component.
The mathematics:
neutral_signal = signal - (N @ (N.T @ (weighted_signal)))
This projects your signal onto the factor space and subtracts that projection, leaving only the orthogonal component - the part of the signal that known factors cannot explain, which is exactly what we are after. Note that N @ N.T is a true projection only if the columns of N are orthonormal; otherwise, fit a regression as in the implementation below.
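As a minimal sketch of that matrix form (my own toy example, not Numerai's code; it assumes the columns of N are orthonormal and that v holds the per-stock sample weights):

import numpy as np

def project_out_factors(signal, N, v):
    """
    Remove the component of the (weighted) signal that lies in the span
    of the factor columns of N. N @ N.T is a projection only when the
    columns of N are orthonormal.
    """
    weighted_signal = signal * v                  # apply per-stock sample weights
    projection = N @ (N.T @ weighted_signal)      # component explained by the factors
    return weighted_signal - projection           # orthogonal, factor-neutral part

# Toy check with random data
rng = np.random.default_rng(0)
n_stocks, n_factors = 500, 20
N, _ = np.linalg.qr(rng.normal(size=(n_stocks, n_factors)))  # orthonormal columns
signal = rng.normal(size=n_stocks)
v = np.ones(n_stocks)                             # uniform weights for the toy case
neutral = project_out_factors(signal, N, v)
print(np.abs(N.T @ neutral).max())                # ~0: no remaining factor exposure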
Common factors to neutralize:
Size: Market capitalization effects
Value: Book-to-market, P/E ratios
Momentum: Recent price trends
Quality: ROE, debt ratios, earnings stability
Volatility: Historical price volatility
Sector: Industry exposures
Practical implementation:
from sklearn.linear_model import LinearRegression

def neutralize_signal(signal, factors):
    """
    Remove factor exposures from a signal via cross-sectional regression.
    """
    # Fit the factor model
    reg = LinearRegression().fit(factors, signal)
    # Subtract the fitted factor component (including the intercept)
    neutral_signal = signal - reg.predict(factors)
    return neutral_signal, reg.coef_  # return loadings for analysis
Value creation: This step transforms your signal from a factor-contaminated predictor to pure alpha. The difference can be dramatic - signals with 20% correlation to raw returns might have only 5% correlation to factor-neutral returns, but that 5% is tradeable alpha.
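To see the effect, here is a small usage example of neutralize_signal on synthetic data (the numbers are illustrative only): a signal built mostly from factor exposure loses most of its raw correlation once the factors are removed.

import numpy as np

rng = np.random.default_rng(1)
n_stocks, n_factors = 1000, 5
factors = rng.normal(size=(n_stocks, n_factors))
true_alpha = rng.normal(size=n_stocks)
future_returns = factors @ rng.normal(size=n_factors) + 0.2 * true_alpha

# A signal that is mostly factor loading plus a little genuine alpha
signal = factors[:, 0] + 0.3 * true_alpha

neutral_signal, loadings = neutralize_signal(signal, factors)
print(np.corrcoef(signal, future_returns)[0, 1])          # raw correlation (inflated by factors)
print(np.corrcoef(neutral_signal, future_returns)[0, 1])  # factor-neutral correlation (the tradeable part)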
Step 2: Liquidity Weighting - Making Signals Tradeable
What it is: Liquidity weighting adjusts signal strength based on how much capital can actually be deployed in each stock. High-conviction predictions on illiquid stocks get downweighted.
Why we need it: Academic backtests often ignore market impact. A signal might work perfectly on paper but fail when you try to trade $100M (or even $1M or less in small-caps) because your trades move prices. Liquidity weighting makes signals realistic for institutional scale.
Where liquidity data comes from:
Average Daily Volume (ADV): Shares traded per day
Average Dollar Volume: ADV × Price
Bid-ask spreads: Transaction cost proxy
Market impact models: Price movement per dollar traded
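A hedged sketch of how these inputs could be derived from daily OHLCV bars (the column names and the range-based spread proxy are assumptions for illustration, not a specific vendor's schema):

import pandas as pd

def liquidity_inputs(ohlcv: pd.DataFrame, window: int = 20) -> pd.DataFrame:
    """
    Derive simple liquidity measures from daily bars.
    Expects columns 'close', 'high', 'low', 'volume' (assumed schema).
    """
    out = pd.DataFrame(index=ohlcv.index)
    out["adv_shares"] = ohlcv["volume"].rolling(window).mean()
    out["adv_dollars"] = (ohlcv["volume"] * ohlcv["close"]).rolling(window).mean()
    # Crude spread proxy from the daily range, for when quote data is unavailable
    out["range_spread_proxy"] = ((ohlcv["high"] - ohlcv["low"]) / ohlcv["close"]).rolling(window).mean()
    return out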
The mathematics:
import numpy as np

def calculate_liquidity_weights(prices, volumes, method='sqrt_adv'):
    """
    Calculate liquidity-based weights from average dollar volume.
    """
    adv_dollars = volumes * prices
    if method == 'sqrt_adv':
        # Square-root rule of thumb for market impact
        weights = np.sqrt(adv_dollars / np.median(adv_dollars))
    elif method == 'linear_adv':
        # Linear scaling with dollar volume
        weights = adv_dollars / np.median(adv_dollars)
    else:
        raise ValueError(f"Unknown method: {method}")
    # Cap extreme weights so a handful of mega-caps don't dominate
    return np.clip(weights, 0.1, 5.0)
Practical considerations:
Market impact: Scales roughly as square root of volume
Capacity constraints: Limit position sizes to a fraction of ADV (see the sketch after this list)
Cross-sectional effects: Compare liquidity within universe, not absolute levels
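Here is a minimal sketch of the capacity-constraint point mentioned above (the 5% participation cap and the function name are illustrative assumptions):

import numpy as np

def cap_positions_to_adv(target_dollars, adv_dollars, max_participation=0.05):
    """
    Cap each position's dollar size at a fraction of that stock's average
    daily dollar volume, so the strategy never plans to be more than
    max_participation of a typical day's trading.
    """
    capacity = max_participation * adv_dollars
    return np.sign(target_dollars) * np.minimum(np.abs(target_dollars), capacity)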
Value creation: This step ensures your signal remains profitable at scale. Many academic strategies break down when applied with real money because they rely on illiquid stocks. Liquidity weighting identifies strategies that work for institutional capital.
Step 3: Volatility Adjustment - Managing Idiosyncratic Risk
What it is: Volatility adjustment reduces weight on stocks with high idiosyncratic (stock-specific) volatility. Stocks that jump around randomly get smaller positions even if the signal is strong.
Why we do this: Idiosyncratic volatility represents unpredictable, uncompensated risk. Two stocks might have the same expected alpha, but the low-volatility stock is more attractive because it delivers that alpha more consistently.
The mathematics:
def calculate_risk_weights(returns, factors, lookback=60):
    """
    Calculate volatility-based risk weights from idiosyncratic volatility.
    returns: (T, n_stocks) matrix of daily returns
    factors: (n_stocks, n_factors) matrix of factor exposures
    """
    # Estimate idiosyncratic volatility from cross-sectional factor regressions
    recent = returns[-lookback:]                                # (lookback, n_stocks)
    betas, *_ = np.linalg.lstsq(factors, recent.T, rcond=None)  # (n_factors, lookback)
    residuals = recent.T - factors @ betas                      # (n_stocks, lookback)
    idiosyncratic_vol = residuals.std(axis=1)
    # Inverse-volatility weighting (softened by the square root)
    risk_weights = 1 / np.sqrt(idiosyncratic_vol / np.median(idiosyncratic_vol))
    return np.clip(risk_weights, 0.2, 3.0)
Types of volatility to consider:
Total volatility: All price movements
Systematic volatility: Factor-driven movements
Idiosyncratic volatility: Stock-specific noise (what we want to avoid)
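A quick sketch of that decomposition for a single stock (the function and inputs are my own illustration): regress the stock's returns on factor returns, then split the variance into the fitted (systematic) part and the residual (idiosyncratic) part.

import numpy as np

def variance_decomposition(stock_returns, factor_returns):
    """
    Split one stock's return variance into systematic and idiosyncratic parts
    via a time-series factor regression.
    stock_returns: (T,) vector; factor_returns: (T, n_factors) matrix.
    """
    betas, *_ = np.linalg.lstsq(factor_returns, stock_returns, rcond=None)
    systematic = factor_returns @ betas
    idiosyncratic = stock_returns - systematic
    # The two variances roughly sum to the total (exactly so if an intercept column is included)
    return systematic.var(), idiosyncratic.var()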
Value creation: This step improves risk-adjusted returns. By avoiding high-volatility stocks, you reduce portfolio turnover and drawdowns while maintaining similar gross returns. The result is higher Sharpe ratios and more stable performance.
Complete Implementation
Here's how the three steps work together:
def risk_aware_score(signal, returns, factors, volumes, prices, future_returns):
    """
    Complete risk-aware signal evaluation.
    returns:        historical (T, n_stocks) return matrix, used for risk estimation
    future_returns: (n_stocks,) vector of next-period returns, used for scoring
    """
    # Step 1: Factor neutralization
    neutral_signal, _ = neutralize_signal(signal, factors)
    # Step 2: Liquidity weighting
    liquidity_weights = calculate_liquidity_weights(prices, volumes)
    # Step 3: Volatility adjustment
    risk_weights = calculate_risk_weights(returns, factors)
    # Combine all adjustments
    combined_weights = liquidity_weights * risk_weights
    final_signal = neutral_signal * combined_weights
    # Normalize to portfolio weights (unit gross exposure)
    portfolio_weights = final_signal / np.sum(np.abs(final_signal))
    # Score against realized future returns
    return np.dot(portfolio_weights, future_returns)
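A usage example on synthetic data, just to show how the pieces wire together (the shapes and numbers are arbitrary):

import numpy as np

rng = np.random.default_rng(42)
T, n_stocks, n_factors = 250, 400, 10

factors = rng.normal(size=(n_stocks, n_factors))          # per-stock factor exposures
returns = rng.normal(scale=0.02, size=(T, n_stocks))      # historical daily returns
future_returns = rng.normal(scale=0.02, size=n_stocks)    # next-period returns for scoring
prices = rng.uniform(5, 500, size=n_stocks)
volumes = rng.uniform(1e4, 5e6, size=n_stocks)
signal = rng.normal(size=n_stocks)                        # your model's predictions

score = risk_aware_score(signal, returns, factors, volumes, prices, future_returns)
print(f"Risk-aware score: {score:.5f}")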
Key Takeaways
1. Correlation isn't alpha: Raw correlation with returns conflates predictive power with factor exposure. Always neutralize known factors first.
2. Liquidity constraints matter: Academic backtests often ignore market impact. Weight your signals by tradeable capacity from day one.
3. Volatility is uncompensated risk: High idiosyncratic volatility destroys risk-adjusted returns. Prefer consistent performers over volatile ones.
4. Test your framework locally: Unlike raw correlation, which only tells you something meaningful out of sample, the risk-adjusted score can be used directly as a validation metric while training and selecting models (see the sketch after this list).
5. Think like an institution: These adjustments reflect how real hedge funds evaluate strategies. Your signals become more valuable when they account for real-world constraints.
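As a minimal model-selection sketch (the Ridge/GradientBoosting pair and the toy target are illustrative assumptions, reusing the arrays from the usage example above), the score can replace plain correlation as the validation metric:

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge

features = rng.normal(size=(n_stocks, 25))   # stand-in feature matrix
train_target = returns[-1]                   # toy target: the most recent cross-section

candidates = {
    "ridge": Ridge(alpha=1.0),
    "gbm": GradientBoostingRegressor(max_depth=3, n_estimators=100),
}

for name, model in candidates.items():
    model.fit(features, train_target)
    prediction = model.predict(features)
    # Select models by the risk-aware score instead of raw correlation
    score = risk_aware_score(prediction, returns, factors, volumes, prices, future_returns)
    print(f"{name}: risk-aware score = {score:.5f}")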
Meta Portfolio Contribution: Rewarding Diversity
Alongside Alpha, Numerai introduced Meta Portfolio Contribution (MPC) - a metric that rewards signals for their unique contribution to the overall tournament portfolio. This addresses a critical problem in crowdsourced alpha: strategy convergence.
The crowding problem: When multiple participants discover the same market inefficiency, their combined trades eliminate the opportunity. MPC incentivizes participants to find orthogonal sources of alpha rather than crowding into the same trades.
How MPC works: It evaluates how much a signal improves the Alpha of the Stake-Weighted Portfolio (SWP) - essentially measuring the marginal contribution of your signal to the ensemble.
The MPC concept also applies to our own multi-strategy portfolios: when combining signals, we should prioritize those that are orthogonal to our existing strategies, because that is where the incremental alpha comes from.
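A conceptual sketch of that marginal-contribution idea (this is not Numerai's exact MPC computation, just the with-versus-without comparison it is based on; the function and its arguments are my own illustration):

import numpy as np

def marginal_contribution(candidate, ensemble_signals, stakes, score_fn):
    """
    How much does adding the candidate signal change the score of the
    stake-weighted ensemble? ensemble_signals: (n_signals, n_stocks),
    stakes: (n_signals,), score_fn: maps a combined signal to a score
    (e.g. risk_aware_score with the other arguments pre-bound).
    """
    swp = np.average(ensemble_signals, axis=0, weights=stakes)
    candidate_stake = np.mean(stakes)        # treat the candidate as an average-sized stake
    swp_with = np.average(
        np.vstack([ensemble_signals, candidate]),
        axis=0,
        weights=np.append(stakes, candidate_stake),
    )
    return score_fn(swp_with) - score_fn(swp)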
How we apply this in QuantJourney
By implementing this framework, we're not just improving a metric - we're fundamentally changing how we think about signal generation:
Better signal discovery: Training models to maximize risk-adjusted scores leads to more robust predictors that work across market regimes.
Realistic backtests: Your historical performance becomes a better estimate of live trading results because you account for execution constraints.
Institutional quality: Your strategies become viable for larger capital bases, making them more valuable and scalable.
Risk management: Built-in risk controls reduce drawdowns and improve consistency, making your strategies more investable.
Competitive advantage: Most retail quants still use simple correlation metrics. This framework puts you closer to institutional standards.
Beyond the Basics - Conclusion
Once this framework is in place, a few extensions are worth considering:
Dynamic factor models: Update factor loadings over time as market structure changes
Transaction cost modeling: Include bid-ask spreads and market impact in your scoring (we recently built this for crypto with our qj_bid_ask library)
Regime-aware weighting: Adjust liquidity and volatility estimates based on market conditions (we have also built this for crypto)
Multi-timeframe signals: Apply the framework across different prediction horizons
You can check their implementation at https://github.com/numerai/numerai-tools/blob/master/numerai_tools/scoring.py