Market Regimes - Further enhancing our Backtesting Framework

Apr 24, 2024

In this post you will read about:

How to code regime identification based on multiple parameters and indicators.
The key steps in applying machine learning to identify market regimes effectively.
Advanced techniques for enhancing market regime identification, including Hidden Markov Models (HMMs), feature selection, dimensionality reduction, model selection, hyperparameter tuning, and model interpretation using SHAP values.

I have been actively developing our Backtesting Framework and have recently integrated a powerful regime identification module into it. By accurately identifying market regimes, investors can make informed decisions and adapt their strategies accordingly, potentially improving their overall performance. The code for the Backtesting Framework, including the regime identification module, is available exclusively to paid subscribers of the blog.

Market regimes are periods during which the market behaves consistently: bullish, bearish, or sideways. Understanding the current market behavior and how to trade in each regime is at least as important as deciding what to trade. Regimes may last from a few days to a few months and show different characteristics across trading environments. One way to label them is:

Bull Volatile
Bull Quiet
Neutral
Bear Quiet
Bear Volatile

One can define more regimes by also taking other factors into account, such as macroeconomics and fundamentals. In each case, the regime label captures the direction of travel of the underlying asset: Bullish, Bearish, or Neutral.

Let’s look at a couple of ways to categorize market regimes:

Macro Economics — This is one of the most frequently discussed aspects in business news, encompassing interest rates, unemployment, GDP growth, PMI, etc. There are numerous market categories and forecasts from this viewpoint e.g. inflationary, deflationary, stagflation, growth, recession, etc., which can provide valuable insights into the overall economic health and potential market direction.

Low Volatility / High Volatility / Trending / Mean Reversion — One effective method for categorizing market regimes involves using indicators like the System Quality Number (SQN) and Average True Range (ATR). These indicators, derived from actual price direction or percentage changes in price, allow for comparisons against other periods to identify repetitive characteristics. By analyzing these indicators, traders can gain a better understanding of the current market conditions and adapt their strategies accordingly.

Fundamentals — Fundamental analysis involves using ratios such as P/E (Price-to-Earnings), growth rates, performance of management teams, industry sectors, and other traditional financial analysis tools to determine whether a company is undervalued, in a growth phase, nearing bankruptcy, or experiencing other significant events. These factors can help identify specific market regimes, as the performance of individual companies and sectors can influence overall market sentiment. However, it's important to note that these conditions are not persistent forever and can change over time, leading to shifts in market regimes.

The full code is available on private GitHub for paid subscribers. Please subscribe to support my work and further development of the complete Trading Framework for investors.

In the first approach, I used the benchmark's behavior to assess the regime with the MarketRegime class:

from typing import Any, Dict

import pandas as pd

class MarketRegime:
	def __init__(self, config: Dict[str, Any]):
		"""
		Initializes the MarketRegime detector with a list of indicators.
		
		Args:
			config (dict): Configuration options for the market regime.
		"""
		self.config = config
		self.indicators = config.get('indicators', ['moving_average_slope', 'breadth_thrust'])
		self.moving_average_slope = config.get('moving_average_slope', 20)
		self.breadth_thrust = config.get('breadth_thrust', 0.5)
		self.regime_data = None
		self.benchmark = None

	def identify_regime(self) -> pd.Series:
		"""
		Identifies the market regime based on the values of specified indicators and the benchmark for each date in the trading range.
		
		Returns:
			A pandas Series with the identified market regime ('bull', 'bear', or 'sideways') for each date.
		"""
		benchmark_data = self.regime_data[self.benchmark]
		
		market_regime_data = pd.Series(index=benchmark_data.index, dtype='object')
		
		for date in benchmark_data.index:
			indicator_values = {}
			for indicator in self.indicators:
				indicator_values[indicator] = self.compute_indicator(indicator, benchmark_data.loc[:date])
			
			# Determine the market regime based on the indicator values
			if indicator_values.get('moving_average_slope', 0) > 0 and indicator_values.get('breadth_thrust', 0) > 0.5:
				regime = 'bull'
			elif indicator_values.get('moving_average_slope', 0) < 0 and indicator_values.get('breadth_thrust', 0) < -0.5:
				regime = 'bear'
			else:
				regime = 'sideways'
			
			market_regime_data.loc[date] = regime
		
		return market_regime_data

	def compute_indicator(self, indicator, regime_data) -> float:
		"""
		Computes the specified indicator based on the market data and configuration options.

		Args:
			indicator (str): String representing the indicator to compute.
			regime_data (pd.DataFrame): pandas DataFrame with market data.

		Returns:
			float: The computed indicator value.
		"""
		if indicator == 'moving_average_slope':
			window = self.config.get('moving_average_period', 20)
			return self.compute_moving_average_slope(regime_data, window)
		elif indicator == 'breadth_thrust':
			threshold = self.config.get('breadth_threshold', 0.5)
			return self.compute_breadth_thrust(regime_data, threshold)
		# Implement other indicators as needed
		else:
			raise ValueError(f"Unknown indicator: {indicator}")

	def compute_moving_average_slope(self, regime_data, window) -> float:
		"""
		Computes the slope of the moving average over a given window.

		Args:
			regime_data (pd.DataFrame): pandas DataFrame with market data.
			window (int): The window size for the moving average.

		Returns:
			float: The slope of the moving average.
		"""
		close_prices = regime_data['Close']
		ma = close_prices.rolling(window=window).mean()
		
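		# Each moving-average value needs `window` prices, and the slope compares two
		# values that are `window - 1` bars apart, so 2 * window - 1 prices are required.
		# With fewer prices, ma.iloc[-window] would be NaN.
		if len(ma) < 2 * window - 1:
			return 0.0  # Return 0.0 if there are not enough data points
		
		ma_slope = (ma.iloc[-1] - ma.iloc[-window]) / window
		return ma_slope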

	def compute_breadth_thrust(self, regime_data, threshold) -> float:
		"""
		Computes the breadth thrust indicator.

		Args:
			regime_data (pd.DataFrame): pandas DataFrame with market data.
			threshold (float): The threshold value for the breadth thrust calculation.

		Returns:
			float: The breadth thrust indicator value.
		"""
		close_prices = regime_data['Close']
		
		# Check if there are at least two data points for the breadth thrust calculation
		if len(close_prices) < 2:
			return 0.0  # Return 0.0 if there are not enough data points
		
		advancing_stocks = (close_prices.iloc[-1] > close_prices.iloc[-2]).sum()
		declining_stocks = (close_prices.iloc[-1] < close_prices.iloc[-2]).sum()
		total_stocks = advancing_stocks + declining_stocks
		breadth_thrust = advancing_stocks / total_stocks if total_stocks > 0 else 0

		if breadth_thrust > threshold:
			return breadth_thrust
		else:
			return -breadth_thrust

So, the MarketRegime class is responsible for identifying the current market regime based on a set of specified indicators. The identify_regime() method is the core of this class, which determines the market regime for each date in the trading range.

First approach

The method starts by extracting the benchmark data from the regime_data attribute using the specified benchmark. It then initializes a pandas Series called market_regime_data to store the identified market regime for each date.

Next, it iterates over each date in the benchmark data index. For each date, it computes the values of the specified indicators using the compute_indicator() method. The indicator values are stored in the indicator_values dictionary.

Based on the computed indicator values, the method determines the market regime.

The first, simple approach classifies the regime as 'bull' when moving_average_slope is positive and breadth_thrust is greater than 0.5. Conversely, when moving_average_slope is negative and breadth_thrust is less than -0.5, the regime is 'bear'; otherwise it is 'sideways'.

The compute_moving_average_slope() method calculates the slope of the moving average over a given window. It computes the moving average of the close prices using the specified window size and then calculates the slope by comparing the last value of the moving average with the value at the start of the window.
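As a rough illustration, the same slope can be computed for every date at once rather than one date at a time. This is a minimal sketch, assuming a plain pandas Series of closing prices; the function name rolling_ma_slope and the sample prices are hypothetical and not part of the framework.

```python
import numpy as np
import pandas as pd


def rolling_ma_slope(close: pd.Series, window: int = 20) -> pd.Series:
    """Slope of the moving average, mirroring (ma[-1] - ma[-window]) / window."""
    ma = close.rolling(window=window).mean()
    # ma.iloc[-window] sits window - 1 rows before ma.iloc[-1]
    return (ma - ma.shift(window - 1)) / window


# Hypothetical prices: a steady climb of 1.0 per bar
close = pd.Series(np.arange(100.0, 110.0))
slope = rolling_ma_slope(close, window=3)
```

The first 2 * window - 2 values are NaN, because the moving average itself needs window prices and the slope then looks back a further window - 1 bars.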

The compute_breadth_thrust() method calculates the breadth thrust indicator. It compares the current close price with the previous close price for each stock and determines the number of advancing and declining stocks. The breadth thrust is calculated as the ratio of advancing stocks to the total number of stocks. If the breadth thrust is above the specified threshold, it returns the breadth thrust value; otherwise, it returns the negative of the breadth thrust value.
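Two things follow from the code above. A single benchmark 'Close' series has only one value per date, so the ratio can only be 0 or 1; breadth is usually measured across many constituents. Also, because values at or below the threshold are returned negated, the default threshold of 0.5 never produces a value below -0.5, which is what the 'bear' rule tests for. The sketch below illustrates both points, assuming a hypothetical DataFrame with one column of closes per constituent; breadth_ratio and signed_breadth are illustrative names, not framework methods.

```python
import pandas as pd


def breadth_ratio(closes: pd.DataFrame) -> float:
    """Share of advancing columns among those that moved on the last bar."""
    if len(closes) < 2:
        return 0.0
    last, prev = closes.iloc[-1], closes.iloc[-2]
    advancing = int((last > prev).sum())
    declining = int((last < prev).sum())
    total = advancing + declining
    return advancing / total if total > 0 else 0.0


def signed_breadth(ratio: float, threshold: float = 0.5) -> float:
    """Same sign convention as compute_breadth_thrust()."""
    return ratio if ratio > threshold else -ratio


# Hypothetical closes for four constituents over two days
closes = pd.DataFrame(
    {"AAA": [10.0, 11.0], "BBB": [20.0, 19.0], "CCC": [30.0, 31.0], "DDD": [40.0, 42.0]}
)
ratio = breadth_ratio(closes)    # 3 advancing, 1 declining
strong = signed_breadth(ratio)   # above the threshold, returned as-is
weak = signed_breadth(0.25)      # below the threshold, returned negated
```

If the 'bear' rule is meant to fire on weak breadth, either the comparison in identify_regime() or the sign convention would need adjusting; which one is a design choice for the framework.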

Identifying market regimes from indicators like these is a very common technique, and in most cases the results are mediocre at best.

Second approach

In the second approach, I built the identify_regime() method on the SQN (System Quality Number) and ATR (Average True Range) indicators.

The SQN here takes the average close-to-close percentage change over the previous 100 days (or a specified window) and then takes the square root of the result. The sign of the average quantifies direction: a positive average change over the past 100 trading days indicates a bullish market, and a negative average indicates a bearish one. The size of the daily changes reflects volatility: larger changes mean a more volatile market, smaller changes a quieter one. Together these give a quantified way to measure market conditions.

import numpy as np
import pandas as pd

def calculate_sqn(price_data, window=100):
    """
    Calculates the System Quality Number (SQN) for a given price series.
    
    Args:
        price_data (pd.Series): The price series data.
        window (int): The window size for calculating the SQN (default: 100).
    
    Returns:
        float: The calculated SQN value.
    """
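    # Daily close-to-close percentage changes
    pct_change = price_data.pct_change()
    # Average change over the trailing window
    rolling_pct_change = pct_change.rolling(window=window).mean()
    # Note: np.sqrt returns NaN (with a RuntimeWarning) when the rolling mean is negative
    sqn = np.sqrt(rolling_pct_change.iloc[-1])
    return sqn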

def calculate_atr(high_data, low_data, close_data, window=14):
    """
    Calculates the Average True Range (ATR) for a given price series.
    
    Args:
        high_data (pd.Series): The high price series data.
        low_data (pd.Series): The low price series data.
        close_data (pd.Series): The close price series data.
        window (int): The window size for calculating the ATR (default: 14).
    
    Returns:
        pd.Series: The calculated ATR series.
    """
    tr = pd.concat([high_data - low_data, 
                    abs(high_data - close_data.shift(1)), 
                    abs(low_data - close_data.shift(1))], axis=1).max(axis=1)
    atr = tr.rolling(window=window).mean()
    return atr

def identify_regime(price_data, sqn_threshold=1.5, atr_threshold=2.0):
    """
    Identifies the market regime based on SQN and ATR values.
    
    Args:
        price_data (pd.DataFrame): The price data containing 'High', 'Low', and 'Close' columns.
        sqn_threshold (float): The threshold for SQN to determine the regime (default: 1.5).
        atr_threshold (float): The threshold for ATR to determine the regime (default: 2.0).
    
    Returns:
        str: The identified market regime ('Bullish', 'Bearish', or 'Sideways').
    """
    sqn = calculate_sqn(price_data['Close'])
    atr = calculate_atr(price_data['High'], price_data['Low'], price_data['Close'])
    
    if sqn > sqn_threshold and atr.iloc[-1] > atr_threshold:
        regime = 'Bullish'
    elif sqn < -sqn_threshold and atr.iloc[-1] > atr_threshold:
        regime = 'Bearish'
    else:
        regime = 'Sideways'
    
    return regime

The calculate_sqn() function calculates the SQN by computing the percentage change of the price data, calculating the rolling mean of the percentage change over the specified window size, and taking the square root of the last value of the rolling mean.
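One detail is worth checking here: the square root of a negative number is NaN in NumPy, so a negative rolling mean cannot yield a negative SQN. The formulation usually attributed to Van Tharp instead scales the mean by the standard deviation, SQN = sqrt(N) * mean / std, which keeps the sign. The sketch below shows that variant; market_sqn and the synthetic prices are assumptions for illustration, not the framework's code.

```python
import numpy as np
import pandas as pd


def market_sqn(close: pd.Series, window: int = 100) -> float:
    """sqrt(window) * mean(returns) / std(returns) over the last `window` returns."""
    returns = close.pct_change().dropna().iloc[-window:]
    if len(returns) < window or returns.std() == 0:
        return float("nan")
    return float(np.sqrt(window) * returns.mean() / returns.std())


# Hypothetical data: alternating +/-1% moves around a small daily drift
noise = np.tile([0.01, -0.01], 100)
up = pd.Series(100 * np.cumprod(1 + noise + 0.002))
down = pd.Series(100 * np.cumprod(1 + noise - 0.002))

sqn_up = market_sqn(up)      # positive drift gives a positive SQN
sqn_down = market_sqn(down)  # negative drift gives a negative SQN
```

With this scaling, thresholds such as 1.5 are expressed in units of standard errors of the mean return, which makes positive and negative readings comparable.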

The calculate_atr() function calculates the Average True Range (ATR) for a given price series. The ATR is a measure of market volatility and is used in conjunction with the SQN to determine the market regime.
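Because ATR is expressed in price units, a fixed threshold such as 2.0 means something different for a 20-dollar and a 400-dollar instrument. A minimal sketch of one common workaround, dividing ATR by the close so that the threshold becomes a fraction of price; atr_percent and the sample bars are hypothetical and not part of the framework.

```python
import pandas as pd


def atr_percent(high: pd.Series, low: pd.Series, close: pd.Series,
                window: int = 14) -> pd.Series:
    """ATR divided by the close, so values are comparable across price levels."""
    prev_close = close.shift(1)
    true_range = pd.concat(
        [high - low, (high - prev_close).abs(), (low - prev_close).abs()], axis=1
    ).max(axis=1)
    return true_range.rolling(window=window).mean() / close


# Hypothetical bars: the same relative range at two different price levels
cheap = pd.DataFrame({"High": [21.0] * 20, "Low": [19.0] * 20, "Close": [20.0] * 20})
pricey = cheap * 20  # 420 / 380 / 400

atr_cheap = atr_percent(cheap["High"], cheap["Low"], cheap["Close"])
atr_pricey = atr_percent(pricey["High"], pricey["Low"], pricey["Close"])
```

Both series settle at the same value even though the raw ATR differs by a factor of 20.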

The identify_regime() function combines the SQN and ATR values to classify the market regime as 'Bullish', 'Bearish', or 'Sideways' based on specified thresholds.

However, it's important to note that the SQN is a lagging (trailing) indicator: it relies on historical data and may not predict future market movements with high accuracy. It should not be treated as a "holy grail" indicator that will reliably predict significant market moves.

Third approach

In this approach, we'll explore how to improve market regime identification using various machine learning techniques, primarily with classes already written in the Backtesting Framework. In this text, however, I will not go into the limitations and challenges, such as the importance of having sufficient and representative training data, the risk of overfitting, or the need for ongoing model monitoring and updates.

Understanding Market Regimes with Hidden Markov Models:

Hidden Markov Models (HMMs) have emerged as a powerful tool for modeling financial markets, particularly in the context of regime identification. HMMs are probabilistic models that assume the existence of hidden states, which represent different market regimes, and observable emissions, which are the actual market data. The true market regime is not directly observable but can be inferred from the available data using HMMs.

A Hidden Markov Model is a stochastic process driven by an underlying stochastic process that is not observable. It belongs to the family of Markov models and inherits the Markov property: future states depend only on the current state. Let y_t be an unobservable Markov process that takes one of N finite states, and let x_t be the observable data; then y_t is the hidden state corresponding to x_t, and for every x_t we infer the corresponding y_t. The HMM is an unsupervised machine learning algorithm: at each time point a data point is observed whose distribution depends on the current hidden state, and no state labels are needed for training.
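In symbols, with hidden state y_t, observation x_t, and N states, a standard way to write the model is shown below. The names A (transition matrix), b_j (emission density), and \pi (initial distribution) are the conventional textbook symbols, not identifiers taken from the framework.

```latex
\begin{aligned}
A_{ij} &= P(y_{t+1} = j \mid y_t = i), \qquad \sum_{j=1}^{N} A_{ij} = 1, \\
b_j(x_t) &= p(x_t \mid y_t = j), \\
\pi_i &= P(y_1 = i), \\
p(x_{1:T}, y_{1:T}) &= \pi_{y_1}\, b_{y_1}(x_1) \prod_{t=2}^{T} A_{y_{t-1} y_t}\, b_{y_t}(x_t).
\end{aligned}
```

Regime identification then amounts to finding the state sequence y_{1:T} that makes this joint probability as large as possible for the observed x_{1:T}.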

Because HMMs capture the latent structure and the transition dynamics between market states, they are well-suited for regime analysis. By learning the transition probabilities between regimes and the emission probabilities of each regime, HMMs can identify the sequence of market regimes based on the observed data. This allows for a more accurate representation of market dynamics and provides valuable insights for decision-making.
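To make the inference step concrete, here is a minimal sketch of Viterbi decoding for a two-state Gaussian HMM using only NumPy. The parameters (transition matrix, means, standard deviations) are hand-picked assumptions for illustration; in practice they would be estimated from data, for example with the Baum-Welch algorithm, and this is not the framework's implementation.

```python
import numpy as np


def gaussian_log_pdf(x: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Log density of each observation under each state's normal distribution."""
    z = (x[:, None] - mean[None, :]) / std[None, :]
    return -0.5 * z**2 - np.log(std[None, :]) - 0.5 * np.log(2 * np.pi)


def viterbi(x: np.ndarray, start: np.ndarray, trans: np.ndarray,
            mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Most likely hidden-state path given the observations."""
    log_b = gaussian_log_pdf(x, mean, std)   # shape (T, N)
    log_a = np.log(trans)
    n_obs, n_states = log_b.shape
    score = np.log(start) + log_b[0]
    back = np.zeros((n_obs, n_states), dtype=int)
    for t in range(1, n_obs):
        cand = score[:, None] + log_a        # cand[i, j]: come from i, go to j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_b[t]
    # Trace the best path backwards from the best final state
    path = np.empty(n_obs, dtype=int)
    path[-1] = score.argmax()
    for t in range(n_obs - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


# Hypothetical daily returns: five calm up days, then five turbulent days
returns = np.array([0.004, 0.006, 0.005, 0.003, 0.007,
                    -0.030, 0.020, -0.040, -0.025, -0.035])
start = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1],    # state 0 ("quiet bull") tends to persist
                  [0.1, 0.9]])   # state 1 ("volatile bear") tends to persist
mean = np.array([0.005, -0.02])
std = np.array([0.005, 0.03])

states = viterbi(returns, start, trans, mean, std)
```

Note that the single +2% day inside the turbulent stretch stays in state 1: the high persistence in the transition matrix makes a one-day switch less likely than a large move within the volatile regime.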

Step 1: Feature Selection

The first step in enhancing market regime identification is to select the most relevant features from the available market data. Feature selection plays a crucial role in reducing dimensionality, eliminating noise, and focusing on the most informative variables. In the code provided, the FeatureSelector class offers several methods for feature selection:

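import logging
from typing import List

import numpy as np
import pandas as pd

logger = logging.getLogger(__name__)

class FeatureSelector: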
    """
    A class to perform feature selection based on importance, correlation, and regression coefficients.
    Implements methods to select important features using Random Forest, Lasso, and correlation.
    """
    def __init__(self, data: pd.DataFrame, target: str):
        """
        Initializes the FeatureSelector with the input data and target variable.
        
        Args:
            data (pd.DataFrame): The input data containing features and target variable.
            target (str): The name of the target variable.
        """
        self.data = data
        self.target = target
        self.features = data.drop(columns=[target]).columns

    def select_features_by_importance(self, model, threshold: float = 0.1) -> List[str]:
        """
        Select important features based on feature importance scores from a model.
        This method selects features with importance scores above the threshold, if provided.
        
        Example:
            selector = FeatureSelector(data, target)
            model = RandomForestClassifier(n_estimators=100, random_state=42)
            important_features = selector.select_features_by_importance(model, threshold=0.1)
            
        Types of selectors:
            - RandomForest: Ensemble model that calculates feature importance based on decision trees.
            - XGBoost: Gradient boosting model that calculates feature importance based on decision trees.
            - LightGBM: Gradient boosting model that calculates feature importance based on decision trees.
            
        Args:
            model: The model used to calculate feature importances.
            threshold: The threshold value to select important features.
            
        Returns:
            important_features: A list of important features selected based on the threshold.
        """
        try:
            model.fit(self.data[self.features], self.data[self.target])
            importances = model.feature_importances_
            important_features = [feature for feature, importance in zip(self.features, importances) if importance > threshold]
            logger.info(f"Selected {len(important_features)} important features by importance.")
            return important_features
        except Exception as e:
            logger.error(f"Error in selecting features by importance: {e}")
            raise

    def select_features_by_correlation(self, threshold: float = 0.8) -> List[str]:
        """
        Select important features based on correlation coefficients, which are above the threshold.
        This method removes one of the highly correlated features, keeping only one.
        
        Example:
            selector = FeatureSelector(data, target)
            important_features = selector.select_features_by_correlation(threshold=0.8)
            
        Types of selectors:
            - Pearson: Linear correlation coefficient between two variables.
            - Spearman: Nonlinear rank correlation coefficient between two variables.
            - Kendall: Rank correlation coefficient based on the number of concordant and discordant pairs.
        
        Args:
            threshold: The correlation threshold value to select important features.
        
        Returns:
            important_features: A list of important features selected based on the correlation threshold.
        """
        try:
            # Use absolute values so strong negative correlations count as redundant too
            corr_matrix = self.data[self.features].corr().abs()
            upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
            to_drop = [column for column in upper.columns if any(upper[column] > threshold)]
            important_features = [feature for feature in self.features if feature not in to_drop]
            logger.info(f"Selected {len(important_features)} important features by correlation.")
            return important_features
        except Exception as e:
            logger.error(f"Error in selecting features by correlation: {e}")
            raise

    def select_features_by_regression(self, model, threshold: float = 0.1) -> List[str]:
        """
        Select important features based on regression coefficients from a linear model.
        This method selects features with coefficients above the threshold, if provided.
        
        Example:
            selector = FeatureSelector(data, target)
            model = Lasso(alpha=0.1)
            important_features = selector.select_features_by_regression(model, threshold=0.1)
            
        Types of Regression:
            - Lasso: L1 regularization that can lead to sparsity in coefficients.
            - Ridge: L2 regularization that can shrink coefficients.
            - ElasticNet: Combination of L1 and L2 regularization.
            
        Args:
            model: The linear regression model used to calculate coefficients.
            threshold: The threshold value to select important features.
        
        Returns:
            important_features: A list of important features selected based on the threshold.
        """
        try:
            model.fit(self.data[self.features], self.data[self.target])
            coef = model.coef_
            important_features = [feature for feature, value in zip(self.features, coef) if abs(value) > threshold]
            logger.info(f"Selected {len(important_features)} important features by regression.")
            return important_features
        except Exception as e:
            logger.error(f"Error in selecting features by regression: {e}")
            raise

All these classes and methods have been added to the Backtesting Framework and will soon be shared with paid subscribers.
If you're interested or wish to participate in the journey of building our Backtesting Framework, please do subscribe. Thank you.

The FeatureSelector class provides three methods:

Select features by importance - uses a model that exposes feature importances, such as a Random Forest classifier, to rank features by importance score. Features with importance scores above a specified threshold are selected.
Select features by correlation - identifies highly correlated features and removes one of them, keeping only one feature from each correlated pair. This helps in reducing redundancy and multicollinearity.
Select features by regression - employs linear models, such as Lasso or Ridge, to select features based on their regression coefficients. Features whose absolute coefficients exceed a specified threshold are considered important.

By applying these feature selection techniques, we can identify the most relevant variables that contribute to distinguishing between different market regimes. For example, in the unit test FEATURE_SELECTION, we demonstrate how to use the FeatureSelector class to select important features from a dataset containing price and volume data for the SPY ETF.
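The correlation-based selection can be sketched on its own with pandas and NumPy. This is a minimal illustration on synthetic data, assuming absolute correlations are compared against the threshold; drop_correlated and the column names are hypothetical and not part of the framework or its unit test.

```python
import numpy as np
import pandas as pd


def drop_correlated(features: pd.DataFrame, threshold: float = 0.8) -> list:
    """Keep the first feature of every pair whose |correlation| exceeds threshold."""
    corr = features.corr().abs()
    # Upper triangle only, so each pair is inspected once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return [col for col in features.columns if col not in to_drop]


# Hypothetical features: two of them carry almost the same information
rng = np.random.default_rng(42)
ret = rng.normal(0.0, 0.01, 500)
data = pd.DataFrame({
    "return_1d": ret,
    "return_1d_pct": ret * 100 + rng.normal(0.0, 0.01, 500),  # near-duplicate
    "volume_change": rng.normal(0.0, 0.2, 500),
})
kept = drop_correlated(data)
```

Here the near-duplicate column is dropped and the first feature of the pair is kept, so column order decides which of two redundant features survives.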

Step 2: Dimensionality Reduction

After selecting the most relevant features, the next step is to reduce the dimensionality of the data while preserving the essential information. Dimensionality reduction techniques help in capturing the underlying structure of the data and improving computational efficiency. The DimensionalityReducer class in the code provides several methods for dimensionality reduction:

Principal Component Analysis (PCA) - identifies the principal components that explain most of the variance in the data. By selecting a subset of the top principal components, we can effectively reduce the dimensionality while retaining the most significant information.
Independent Component Analysis (ICA) - separates the data into independent components, which can be useful for identifying hidden factors driving market regimes.
Autoencoders - neural networks that learn a compressed representation of the input data. By training an autoencoder to reconstruct the original data from the compressed representation, we can obtain a lower-dimensional representation that captures the essential features. The results so far are not satisfactory, but it's worth experimenting with them a bit longer; we will do more on that in the following weeks.
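As an illustration of the first technique, PCA can be sketched with a singular value decomposition in plain NumPy. This is a minimal sketch on synthetic data, not the DimensionalityReducer class itself; pca_reduce and the feature matrix are assumptions for illustration.

```python
import numpy as np


def pca_reduce(x: np.ndarray, n_components: int):
    """Project centered data onto its top principal components via SVD."""
    centered = x - x.mean(axis=0)
    _, singular, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular**2 / np.sum(singular**2)   # variance share per component
    return centered @ vt[:n_components].T, explained[:n_components]


# Hypothetical feature matrix: three columns driven by one common factor
rng = np.random.default_rng(7)
factor = rng.normal(size=(300, 1))
x = factor @ np.array([[1.0, 0.8, -0.6]]) + 0.05 * rng.normal(size=(300, 3))

reduced, explained = pca_reduce(x, n_components=1)
```

Because the three columns share one driver, a single component retains nearly all of the variance, which is the situation in which PCA pays off most.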

Quant Journey
