How to make more profit from Crypto - smart crypto bots tuning with Optuna
A 5x improvement in Sharpe ratio (0.27 → 1.39) while keeping a similar drawdown - interested in how to do it?
TL;DR: In June Alex trained 20+ people to build 24/7 crypto bots with great feedback. Meanwhile, I spent that time probing different crypto strategies and trying various optimisations for our bots. Here I summarise what works and show how a unimodality-first workflow plus purged walk-forward with Optuna can significantly improve our risk-adjusted returns without curve-fitting - what to tune, how to evaluate, and where automation actually helps.
👀 We’ll be covering this in our next Crypto Bot Training later this month. Save your seat here: Crypto Bot Training
Full code from this post: https://github.com/QuantJourneyOrg/qj_public_code/blob/main/optuna_crypto_bot.py
What are we even optimising?
Optimising a crypto bot means fine-tuning the various rules that decide when and how it trades. These rules are hyperparameters - settings you choose before running the bot (unlike learned parameters in ML). The goal is to select hyperparameter values that lead to the best possible performance in live trading.
Here's a breakdown of the main categories of hyperparameters in a typical crypto bot:
Signal parameters: These control how the bot generates buy or sell signals, e.g. indicator windows like EMA(12) for a short-term exponential moving average or EMA(26) for a longer one in a MACD strategy; RSI thresholds (e.g., buy below 30, sell above 70) to detect overbought/oversold conditions; volatility lookbacks (e.g., 14 periods for ATR to measure market swings); or regime filters (e.g., using ADX to distinguish trending vs. ranging markets).
Execution/risk: These manage how trades are entered, sized, and exited to control risk. Examples include position size (e.g., risking 1% of capital per trade), stop loss / take profit levels (e.g., 2% below entry for SL, 5% above for TP), trailing stops (e.g., moving SL up as price rises), or signal confirmation bars (e.g., wait 2 bars after a signal to confirm). They are crucial in crypto's volatile environment, where a single bad trade can wipe out gains.
Market plumbing: These handle the operational side, like which assets to trade (pair universe, e.g., only BTC/USDT or a basket of top 10 altcoins), the timeframe (e.g., 5-minute vs. daily charts), fee/slippage models (accounting for exchange costs like 0.1% fees plus price impact), or minimum trade count requirements (to ensure statistical significance). Ignoring these can make backtests look great but fail in reality due to hidden costs.
So, to build a better strategy, we can optimise these hyperparameters by testing different combinations and selecting those that maximise a target metric - a quantifiable measure of success, e.g.:
Out-of-sample Sharpe ratio (risk-adjusted return, calculated as (average return - risk-free rate) / volatility; higher is better, aiming for >1 in crypto).
CAGR (compound annual growth rate) with a drawdown constraint (e.g., maximise growth while keeping max drawdown <20%).
P&L per unit of turnover (profit/loss divided by trading volume, to favour efficient strategies) - this is what you usually check at the end of each day.
Or a custom score blending multiple factors, like Sharpe minus a penalty for high turnover.
The key is that optimisation isn't about maximising in-sample performance (which leads to curve-fitting); it's about finding robust settings that generalise to unseen data and thus to real trading.
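To make the target concrete, here is a minimal sketch of how such metrics can be blended into a single score. The per-bar returns, the hourly timeframe, and the weights (0.5 for drawdown, 0.1 for turnover) are all illustrative placeholders, not recommendations:

import numpy as np
import pandas as pd

def blended_score(returns: pd.Series, bars_per_year: int = 24 * 365) -> float:
    """Toy target metric: annualised Sharpe minus penalties for drawdown and turnover."""
    mu, sd = returns.mean(), returns.std(ddof=1) or 1e-12
    sharpe = mu / sd * np.sqrt(bars_per_year)          # annualised Sharpe (risk-free rate ~ 0)
    equity = (1 + returns).cumprod()
    max_dd = 1 - (equity / equity.cummax()).min()      # maximum drawdown on the equity curve
    turnover_penalty = 0.1 * returns.ne(0).mean()      # crude proxy: fraction of bars with exposure
    return sharpe - 0.5 * max_dd - turnover_penalty

# Usage with random placeholder returns (swap in your strategy's bar returns):
rets = pd.Series(np.random.default_rng(0).normal(0.0002, 0.01, 5000))
print(round(blended_score(rets), 3))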
What a good evaluation must include (for crypto/time-series)
Years back, I thought evaluating hyperparameter choices was just about running multiple backtests and picking the one with the highest score - but it's not. Especially in crypto, where markets are noisy, non-stationary (trends change over time), and prone to black swans like flash crashes, we need a more careful approach. A solid evaluation framework prevents fooling yourself with overly optimistic results.
Here's what it must incorporate:
Point-in-time data (no peeking at future candles or revised history): Always simulate as if trading in real-time, using only data available up to the decision moment. Peeking ahead (data leakage) inflates results unrealistically - e.g., don't use future prices to calculate indicators (see the small sketch after this list).
Fees + slippage modeled realistically (maker/taker, spread, partial fills): Crypto exchanges charge fees (e.g., 0.075% on Binance futures), and slippage occurs due to bid-ask spreads or market impact (worse in low-liquidity pairs). Include these to avoid strategies that look profitable but get eaten by costs; model partial fills for large orders.
Walk-forward or purged CV (never random K-Fold): Time-series data has temporal dependencies, so random cross-validation mixes past and future, causing leakage. Use walk-forward optimisation (train on past, test on future, roll forward) or purged cross-validation (remove overlapping data points) to mimic real deployment.
Enough trades (avoid “lucky” results with 5 trades in a quarter): Require a minimum number of trades (e.g., 30+ per test period) for statistical reliability. Few trades can yield high scores by chance, but they're not robust - crypto bots need volume to average out noise.
Stability across regimes (chop, trend, high vol, low vol): Crypto markets cycle through phases (e.g., bull trends, sideways chop, high-vol crashes). Test performance in diverse conditions; a bot that shines in trends but bleeds in chop isn't stable. Use regime detection or stratified testing to ensure adaptability.
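As a small illustration of the point-in-time rule, here is a sketch of computing a signal only from already-closed bars; the helper name and the placeholder prices are mine, and the shift(1) is what prevents the indicator from acting on the bar it was computed on:

import pandas as pd

def leak_free_signal(close: pd.Series, fast: int = 12, slow: int = 26) -> pd.Series:
    """Long/flat signal from an EMA cross, shifted so bar t only uses data up to bar t-1."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    raw = (ema_fast > ema_slow).astype(int)   # 1 = fast above slow, 0 = otherwise
    return raw.shift(1).fillna(0)             # decide on bar t-1's values, act on bar t

# Usage with placeholder prices:
prices = pd.Series(range(100, 200), dtype=float)
signal = leak_free_signal(prices)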
Before jumping into advanced optimisation tools like Optuna, it's key to grasp foundational concepts like unimodality in hyperparameters. Unimodality describes a situation where the performance metric (e.g., Sharpe ratio) improves as you adjust a parameter in one direction, reaches a single peak, and then declines as you continue adjusting it further. This is common in trading bots because parameters often have a "sweet spot" - too low or too high leads to suboptimal results, like overfitting or missing signals.
We cover this upfront because recognising unimodality allows you to use simpler, more efficient methods first, saving computational resources, reducing overfitting risks, and building intuition about your bot's behaviour before resorting to heavier tools. This approach aligns with practical bot development, where compute is limited and understanding "why" a parameter works matters more than blind automation.
Before Optuna: exploit “unimodality” and simple search
Many hyperparameters (or "knobs") in crypto bots exhibit unimodality: when you plot performance against the parameter value, it forms a hill shape - increasing the value boosts results up to a peak, then they drop off. Examples include regularisation strength (too little causes overfitting, too much underfits), lookback lengths (too short is noisy, too long misses recent changes), or thresholds (too low triggers too many trades, too high too few). Exploiting this property lets you optimise efficiently without complex algorithms.
This optimization history plot visualises the objective value (e.g., a custom Sharpe-drawdown score) across sequential trials during a simple search like coordinate descent or sweeps. Blue dots represent each trial's performance, while the orange line tracks the cumulative best value, showing how the optimizer quickly converges to a plateau before occasional improvements. In crypto bot context, this helps spot if your unimodal methods are efficient - early stagnation might indicate weak parameter interactions, while late jumps (as seen here around trial 2000) suggest exploring more trials or advanced tools like Optuna. It's a key diagnostic to avoid over-optimizing noisy data.
This hyperparameter importances bar chart quantifies each parameter's relative impact on the objective value, computed via methods like fANOVA in Optuna (or approximations in simple sweeps). For crypto bots, this reveals priorities - focus tuning on high-importance params like exits (SL/TP) to handle volatility, while low ones like cooldown can use defaults to save compute. Use it post-optimisation to interpret "why" certain configs work, reducing overfitting risks.
Finally, this slice plot displays how the objective value varies across the range of each hyperparameter, with dots colored by trial number (darker blue for later trials) to show exploration over time. Columns represent individual params (e.g., ema_fast from 10-40, sl_pct log-scaled ~0.003-0.03), revealing unimodal "sweet spots" like higher objective clusters around ema_slow 50-100 or tp_pct ~0.01-0.05. In bot development, it builds intuition on parameter sensitivity—dense high-value areas indicate robust ranges for live trading, while sparse/low ones warn of fragility in crypto regimes. Compare with importances to prioritize refinements.
For unimodal knobs, start simple:
1-D sweeps (coarse → refine): Test a grid of values along one dimension (e.g., EMA slow from 5 to 50 in steps of 5), plot the results, and zoom in on the peak area. This is fast, visual, and builds intuition - e.g., see how Sharpe rises then falls.
Ternary / golden-section search: These are efficient algorithms to find the peak with fewer evaluations (logarithmic complexity, O(log N)). Ternary divides the search space into thirds and discards the worse part iteratively; golden-section uses the golden ratio for similar efficiency. Ideal for continuous parameters like thresholds.
Coordinate descent (optimise one knob at a time, cycle a few times): Fix all but one parameter, optimise that one (via sweep or search), then move to the next. Repeat 2-3 cycles. It's "good enough" for weakly interacting parameters and much faster than full grid search.
Reserve heavy auto-tuning (like Optuna) for cases with strong interactions between knobs, discrete choices (e.g., on/off flags), or multi-objective trade-offs (e.g., balancing return and risk). Starting simple prevents wasting compute on problems that don't need sophistication.
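For a feel of how coordinate descent looks in practice, here is a minimal sketch; score_fn, the grids, and the toy objective are placeholders standing in for your own backtest score and parameter ranges:

def coordinate_descent(score_fn, grids: dict, start: dict, cycles: int = 3) -> dict:
    """Optimise one knob at a time over its grid, holding the others fixed; repeat a few cycles."""
    params = dict(start)
    for _ in range(cycles):
        for name, grid in grids.items():
            params[name] = max(grid, key=lambda v: score_fn({**params, name: v}))
    return params

# Usage with a toy unimodal score (peak near ema_fast=15, ema_slow=80):
toy_score = lambda p: -((p["ema_fast"] - 15) ** 2) - 0.01 * (p["ema_slow"] - 80) ** 2
grids = {"ema_fast": range(5, 41), "ema_slow": range(30, 161, 5)}
print(coordinate_descent(toy_score, grids, start={"ema_fast": 12, "ema_slow": 50}))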
Let’s check when Optuna shines
Once you’ve squeezed the low-hanging fruit with unimodal sweeps (1-D grids, ternary/golden search, light coordinate descent), it’s time for Optuna. Optuna is an open-source hyperparameter optimizer that explores your search space intelligently - faster and usually better than grid/random search - without needing massive compute. It’s framework-agnostic (Keras, scikit-learn, PyTorch, custom code) and uses a define-by-run API, so you can construct dynamic spaces (e.g., conditional branches per strategy) with minimal boilerplate.
Optuna automatically finds optimal hyperparameter values based on an optimisation target. It is mainly designed for machine learning but can be used for any non-ML task as long as you can define an objective function.
How Optuna Works (in 3 steps)
You define an objective - a function that returns a scalar score (higher is better).
Optuna suggests parameters - e.g., EMA windows, SL/TP, cooldown.
It iterates - learning from results to propose better trials next (unlike grid/random which keep guessing blindly).
That’s it. No rocket science—just smarter search.
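Here is the whole pattern on a toy problem; the quadratic objective is just a stand-in for a backtest score:

import optuna

def objective(trial: optuna.Trial) -> float:
    # 1) Optuna suggests parameters from the ranges you declare
    x = trial.suggest_float("x", -10.0, 10.0)
    n = trial.suggest_int("n", 1, 5)
    # 2) You return a scalar score (higher is better with direction="maximize")
    return -((x - 2.0) ** 2) - 0.1 * n

# 3) Optuna iterates, learning from past trials to propose better ones
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params, study.best_value)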
Why Optuna is great for bots
Smart samplers (TPE, multivariate): Handle mixed spaces (integers like EMA periods, floats like thresholds, categorical like strategy types). TPE models "good" vs. "bad" regions based on past trials, focusing on promising areas - great for high-dimensional bot params.
Pruning (stop bad trials early): During evaluation (e.g., across CV folds), if a config underperforms midway, Optuna kills it early - a big saving when every trial runs long backtests.
Multi-objective search: Optimise multiple metrics at once, e.g., maximise Sharpe while minimising drawdown or turnover; Optuna finds the Pareto front of non-dominated configs using NSGA-II.
Persistent storage: Store studies in SQLite/Postgres; resume, compare, and audit runs.
When to use it
Strong interactions (e.g., EMA windows × regime filter × SL/TP).
Conditional spaces (choose among strategies with different parameter sets).
Trade-offs matter (e.g., Sharpe ↑ while MDD ↓ and turnover ↓).
Optuna is the bridge between manual tuning and full automation, giving you robust configs without over-engineering.
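A quick illustration of such a conditional (define-by-run) space: the chosen strategy decides which knobs even exist. The strategy names and toy scores below are placeholders standing in for real backtests:

import optuna

def objective(trial: optuna.Trial) -> float:
    # Conditional search space: parameters depend on the sampled strategy
    strategy = trial.suggest_categorical("strategy", ["ema_cross", "rsi_reversion"])
    if strategy == "ema_cross":
        fast = trial.suggest_int("ema_fast", 5, 40)
        slow = trial.suggest_int("ema_slow", 30, 160)
        score = -abs(fast * 3 - slow)                 # toy stand-in for a backtest score
    else:
        buy_th = trial.suggest_int("rsi_buy", 10, 40)
        sell_th = trial.suggest_int("rsi_sell", 60, 90)
        score = -abs(70 - sell_th) - abs(30 - buy_th)
    return float(score)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)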
A minimal, realistic Optuna pipeline (crypto-ready)
We’ll optimise a simple EMA-crossover bot (long only): buy when the fast EMA crosses above the slow EMA, exit on reverse cross or SL/TP. This “minimal” pipeline is still realistic: it uses next-bar execution, bar-level performance, fees & slippage, and purged & embargoed walk-forward with Optuna.
See the results first:
Base EMA metrics: {'sharpe': 0.27, 'mdd': 0.26, 'trades': 189}
Tuned EMA metrics: {'sharpe': 1.39, 'mdd': 0.26, 'trades': 76}
Walk-forward OOS (stitched) metrics: {'sharpe': 0.61, 'mdd': 0.31, 'trades': 42}
Walk-forward OOS score: 0.45
1) Fetch data (Yahoo Finance)
Note: intraday history on YF is limited (e.g., 15m ~60 days). For a quick demo, use 1h or 1d. Switch SYMBOL/INTERVAL as needed.
import numpy as np
import pandas as pd
import yfinance as yf
from dataclasses import dataclass
SYMBOL = "BTC-USD" # e.g., "ETH-USD"
INTERVAL = "1h" # try "1d", "1h", or "15m"
PERIOD = "730d" # for 1h/1d it's fine; shorter for intraday
df = yf.download(SYMBOL, period=PERIOD, interval=INTERVAL,
auto_adjust=True, progress=False)
df = df.rename(columns={"Close": "close"}).dropna()
df.index = pd.to_datetime(df.index) # ensure DateTimeIndex
assert not df.empty, "No data returned — adjust SYMBOL/INTERVAL/PERIOD."
2) Bars-per-year helper (24/7 crypto)
def bars_per_year(interval: str) -> int:
if interval.endswith("m"): # minutes
m = int(interval[:-1]); return int((60/m)*24*365)
if interval.endswith("h"): # hours
h = int(interval[:-1]); return int((24/h)*365)
return 365 # "1d" default
BARS_PER_YEAR = bars_per_year(INTERVAL)
This helper function estimates how many data bars occur in a year given a string interval like “15m”, “1h”, or “1d”. If the interval ends with “m”, it treats it as minutes, computes how many bars fit into an hour, then into a day, then into a year. If it ends with “h”, it calculates bars per day based on hours, then multiplies by 365. If neither suffix matches, it assumes daily data and returns 365 bars per year (which is okay for crypto).
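A quick sanity check of the helper; the expected values follow directly from the arithmetic above:

print(bars_per_year("15m"))  # 4 bars/hour * 24 * 365 = 35040
print(bars_per_year("1h"))   # 24 * 365 = 8760
print(bars_per_year("1d"))   # 365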
3) Bar-level EMA backtest (next-bar exec, fees, slippage)
We compute equity per bar, execute on the next bar, and apply a small multiplicative cost at entries/exits. Metrics use bar-level returns (Sharpe annualised with BARS_PER_YEAR) and MDD on the equity curve.
Note: Signals are generated on bar t–1 and executed on bar t (next-bar execution) with fees and slippage; metrics are computed on bar-level returns and annualised by bars/year.
FEE_BPS = 6 # defaults; tune per exchange tier
SLIP_BPS = 2
PENALTY = 0.001 # unified soft penalty for too-few trades
MIN_TRADES = 30
@dataclass
class BTConfig:
ema_fast: int
ema_slow: int
sl_pct: float
tp_pct: float
fee_bps: float = FEE_BPS
slip_bps: float = SLIP_BPS
min_bars_between: int = 0
def _apply_cost(equity: float, fee_bps: float, slip_bps: float) -> float:
cost = (fee_bps + slip_bps) * 1e-4
return equity * (1.0 - cost)
def run_backtest_ema_barlevel(df: pd.DataFrame, cfg: BTConfig, bars_per_year: int):
px = df["close"].to_numpy()
n = len(px)
if n < max(cfg.ema_fast, cfg.ema_slow) + 5:
return {"sharpe": 0.0, "mdd": 1.0, "trades": 0}
ema_fast = pd.Series(px).ewm(span=cfg.ema_fast, adjust=False).mean().to_numpy()
ema_slow = pd.Series(px).ewm(span=cfg.ema_slow, adjust=False).mean().to_numpy()
fee = cfg.fee_bps * 1e-4
slip = cfg.slip_bps * 1e-4
equity, position = 1.0, 0
last_bar, entry_px = -10**9, np.nan
trades_completed = 0
eq_series = [equity]
start = max(cfg.ema_fast, cfg.ema_slow) + 1 # execute next bar
for i in range(start, n-1):
# signals computed on i-1 vs i-2
long_up = (ema_fast[i-2] <= ema_slow[i-2]) and (ema_fast[i-1] > ema_slow[i-1])
long_down = (ema_fast[i-2] >= ema_slow[i-2]) and (ema_fast[i-1] < ema_slow[i-1])
# SL/TP vs entry
if position == 1 and not np.isnan(entry_px):
r_from_entry = (px[i-1] - entry_px) / entry_px
stop = r_from_entry <= -cfg.sl_pct
take = r_from_entry >= cfg.tp_pct
else:
stop = take = False
exit_now = (position == 1) and (long_down or stop or take)
enter_now = (position == 0) and long_up and (i - last_bar) >= cfg.min_bars_between
# execute exits first
if exit_now:
equity = _apply_cost(equity, cfg.fee_bps, cfg.slip_bps)
position, last_bar, entry_px = 0, i, np.nan
trades_completed += 1
# then entries
if enter_now:
equity = _apply_cost(equity, cfg.fee_bps, cfg.slip_bps)
position, last_bar, entry_px = 1, i, px[i-1]
# bar PnL
r_bar = (px[i] / px[i-1] - 1.0) if position == 1 else 0.0
equity *= (1.0 + r_bar)
eq_series.append(equity)
eq = np.array(eq_series, float)
if len(eq) < 5 or trades_completed < 1:
return {"sharpe": 0.0, "mdd": 1.0, "trades": trades_completed}
r = eq[1:] / eq[:-1] - 1.0
mu = float(np.mean(r)); sd = float(np.std(r, ddof=1) or 1e-12)
sharpe = float(mu / sd * np.sqrt(bars_per_year))
roll_max = np.maximum.accumulate(eq)
mdd = float(1.0 - np.min(eq / roll_max))
return {"sharpe": sharpe, "mdd": mdd, "trades": int(trades_completed)}
This code defines a simple EMA-crossover backtest. The BTConfig dataclass stores strategy parameters like EMA periods, stop-loss, take-profit, fees, slippage, and trade spacing. Then run_backtest_ema_barlevel computes two exponential moving averages (fast and slow) from the closing prices and uses crossover conditions, evaluated on the previous bars, to generate entry/exit signals.
When a long signal occurs and the strategy is flat, it opens a position on the next bar; when in a position, it exits on an opposite crossover, stop-loss, or take-profit trigger. A small multiplicative fee-plus-slippage cost is applied at each entry and exit, equity is compounded bar by bar, and the bar-level return series is turned into performance metrics: Sharpe ratio (annualised by bars per year) and maximum drawdown on the equity curve. The function returns a dictionary with Sharpe ratio, max drawdown, and number of completed trades, falling back to default values if too few bars or trades are available.
4) Unimodality pre-step: 1-D sweep (fast) → ternary (slow)
Exploit unimodality first; it’s cheap and builds intuition.
def score_dict_to_scalar(m):
return m["sharpe"] - 0.5*m["mdd"] - PENALTY * max(0, MIN_TRADES - m["trades"])
def sweep_ema_fast(df, ema_slow=50, sl=0.03, tp=0.06, rng=range(5, 41), bpy=365*24):
scores = []
for fast in rng:
if fast >= ema_slow:
scores.append(-1e9); continue
m = run_backtest_ema_barlevel(df, BTConfig(fast, ema_slow, sl, tp),
bars_per_year=bpy)
scores.append(score_dict_to_scalar(m))
best_fast = rng[int(np.argmax(scores))]
return best_fast, scores
def ternary_search_int(df, name, lo, hi, base_cfg, iters=18, bpy=365*24):
def eval_with(v):
params = {**base_cfg.__dict__}
params[name] = int(round(v))
cfg = base_cfg.__class__(**params)
if cfg.ema_fast >= cfg.ema_slow: return -1e9
return score_dict_to_scalar(run_backtest_ema_barlevel(df, cfg, bars_per_year=bpy))
a, b = float(lo), float(hi)
for _ in range(iters):
m1, m2 = a + (b-a)/3, b - (b-a)/3
if eval_with(m1) < eval_with(m2): a = m1
else: b = m2
return int(round((a+b)/2))
This code performs a 1-D sweep over ema_fast values while holding ema_slow fixed (50 by default), using a temporary config. It loops through a range (5-40), runs the backtest for each value, computes the custom score (Sharpe minus 0.5 × drawdown minus a small penalty for too few trades), and stores the results. best_fast is the value maximising the score; plotting the scores usually reveals the unimodal hill. This builds intuition quickly before refinement.
The second function performs a ternary search to find the optimal value of a single integer parameter in a backtest configuration. It defines an inner eval_with function that creates a modified copy of base_cfg with the parameter name set to the candidate value, runs the backtest, and returns the same combined objective (Sharpe minus 0.5 × MDD minus the trade-count penalty). The ternary-search loop repeatedly splits the current range into thirds, evaluates the two interior points m1 and m2, and discards the side with the lower score, narrowing the range. After a fixed number of iterations, it returns the midpoint of the final range as the best parameter value, rounded to an integer.
Quick demo run:
base = BTConfig(
ema_fast=12, ema_slow=50, # starting guess
sl_pct=0.03, tp_pct=0.06, # 3% SL, 6% TP
fee_bps=6, slip_bps=2, # realistic fees/slippage
min_bars_between=5
)
print("Base EMA metrics:", run_backtest_ema_barlevel(df, base, BARS_PER_YEAR))
best_fast, _ = sweep_ema_fast(df, ema_slow=50, sl=0.03, tp=0.06, bpy=BARS_PER_YEAR)
tmp = BTConfig(best_fast, 50, 0.03, 0.06, base.fee_bps, base.slip_bps, base.min_bars_between)
best_slow = ternary_search_int(df, "ema_slow", lo=max(best_fast+5, 30), hi=160,
base_cfg=tmp, bpy=BARS_PER_YEAR)
cd_cfg = BTConfig(best_fast, best_slow, 0.03, 0.06, base.fee_bps, base.slip_bps, base.min_bars_between)
print("Coord-descent candidate:", cd_cfg)
The starting values sl_pct=0.03 and tp_pct=0.06 match typical short-term BTC/ETH swing sizes on 1h data, which makes them easy to adapt.
That alone often gets you 70–90% of the gains, fast:
Finding EMA lengths that fit the volatility profile of BTC/ETH
Setting SL/TP to avoid getting whipsawed or holding losers too long
Avoiding overtrading by spacing trades (min_bars_between)
5) Purged & embargoed walk-forward (≈ 20 lines)
Rule of thumb: set embargo ≈ your max lookback / label horizon.
def make_purged_embargo_splits(n, n_splits=6, min_train_frac=0.5,
test_len=None, purge=0, embargo=0):
assert 0 < min_train_frac < 1 and n > 0
min_train = int(n * min_train_frac)
rem = max(0, n - min_train)
fold = test_len if test_len is not None else max(1, rem // n_splits)
splits = []
for k in range(n_splits):
tr_end = min_train + k * fold
te_start = tr_end + embargo
te_end = min(n, te_start + fold)
if te_end <= te_start: break
tr_end_purged = max(0, tr_end - purge)
train = np.arange(0, tr_end_purged, dtype=int)
test = np.arange(te_start, te_end, dtype=int)
if train.size and test.size:
splits.append((train, test))
return splits
SPLITS = make_purged_embargo_splits(len(df), n_splits=6, min_train_frac=0.5,
test_len=None, purge=0, embargo=24)
This code defines a function to create purged and embargoed walk-forward splits for time-series data, ensuring no information leakage between training and testing periods by inserting an embargo buffer. It calculates the minimum training size from the fraction provided, divides the remaining data into folds, and generates train/test index arrays while skipping the embargo period. The SPLITS variable applies this function to the dataframe length with 6 splits and a 24-bar embargo. The walkforward_score_ema function shown further below evaluates a configuration by re-optimising on each training window, running the backtest on the corresponding test split, computing the custom score that balances Sharpe ratio, maximum drawdown, and a penalty for low trade counts, and returning the average score across folds.
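A quick way to sanity-check the splits is to print the fold boundaries and confirm the gap between each train end and test start (a small sketch using the SPLITS built above):

for k, (train_idx, test_idx) in enumerate(SPLITS):
    gap = test_idx[0] - train_idx[-1] - 1   # bars skipped by purge + embargo
    print(f"fold {k}: train [0..{train_idx[-1]}], test [{test_idx[0]}..{test_idx[-1]}], gap = {gap}")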
(Optional) Walk-forward score with re-opt on train → score on test:
def optimize_on_train_ema(train_df, base_cfg, bpy):
best_fast, _ = sweep_ema_fast(train_df, ema_slow=base_cfg.ema_slow,
sl=base_cfg.sl_pct, tp=base_cfg.tp_pct, bpy=bpy)
tmp = BTConfig(best_fast, base_cfg.ema_slow, base_cfg.sl_pct, base_cfg.tp_pct,
base_cfg.fee_bps, base_cfg.slip_bps, base_cfg.min_bars_between)
ema_slow = ternary_search_int(train_df, "ema_slow",
lo=max(tmp.ema_fast+5, 30), hi=160, base_cfg=tmp, bpy=bpy)
return BTConfig(tmp.ema_fast, ema_slow, tmp.sl_pct, tmp.tp_pct,
tmp.fee_bps, tmp.slip_bps, tmp.min_bars_between)
def walkforward_score_ema(df, splits, base_cfg, bpy, reoptimize=True):
scores = []
for tr_idx, te_idx in splits:
tr, te = df.iloc[tr_idx], df.iloc[te_idx]
cfg = optimize_on_train_ema(tr, base_cfg, bpy) if reoptimize else base_cfg
m = run_backtest_ema_barlevel(te, cfg, bars_per_year=bpy)
scores.append(score_dict_to_scalar(m))
return float(np.mean(scores))
print("WFO score (re-opt=True):", walkforward_score_ema(df, SPLITS, base, BARS_PER_YEAR, True))
6) Single-objective Optuna (maximize Sharpe – 0.5·MDD – penalty)
Each trial proposes EMA/SL/TP/cooldown; for every split we re-opt on train (fast unimodality step) and score on test. Uses TPE + pruning + SQLite.
import optuna
def optuna_single_objective(df, splits, symbol="BTC-USD", interval="1h", bpy=365*24,
storage="sqlite:///studies/btc_ema_single.db"):
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)
pruner = optuna.pruners.MedianPruner(n_warmup_steps=1)
study = optuna.create_study(direction="maximize", sampler=sampler, pruner=pruner,
storage=storage, study_name=f"{symbol.lower()}_{interval}_ema_single",
load_if_exists=True)
def objective(trial: optuna.Trial):
ema_fast = trial.suggest_int("ema_fast", 5, 40)
ema_slow = trial.suggest_int("ema_slow", 30, 160)
if ema_fast >= ema_slow: raise optuna.TrialPruned()
sl_pct = trial.suggest_float("sl_pct", 0.003, 0.03, log=True)
tp_pct = trial.suggest_float("tp_pct", 0.005, 0.05, log=True)
cooldown = trial.suggest_int("min_bars_between", 0, 12)
base_cfg = BTConfig(ema_fast, ema_slow, sl_pct, tp_pct, FEE_BPS, SLIP_BPS, cooldown)
fold_scores, trades_acc = [], []
for i, (tr_idx, te_idx) in enumerate(splits):
tr, te = df.iloc[tr_idx], df.iloc[te_idx]
tuned = optimize_on_train_ema(tr, base_cfg, bpy) # re-opt on TRAIN
m = run_backtest_ema_barlevel(te, tuned, bars_per_year=bpy) # TEST
s = score_dict_to_scalar(m)
fold_scores.append(s); trades_acc.append(m["trades"])
trial.report(float(np.mean(fold_scores)), step=i)
if trial.should_prune(): raise optuna.TrialPruned()
trial.set_user_attr("trades", float(np.mean(trades_acc))) # optional for hard constraints
return float(np.mean(fold_scores))
study.optimize(objective, n_trials=200, timeout=3600)
print("Best SO value:", study.best_value)
print("Best SO params:", study.best_params)
return study
# Run it (optional)
# so_study = optuna_single_objective(df, SPLITS, symbol=SYMBOL, interval=INTERVAL, bpy=BARS_PER_YEAR)
This code imports Optuna and sets up a TPE sampler with a seed for reproducibility and multivariate handling, along with a median pruner to stop underperforming trials early after one warmup step. It creates an optimization study aimed at maximizing the objective, using persistent SQLite storage for resumability and loading existing studies if available. The objective function suggests hyperparameters for EMA periods, stop-loss and take-profit percentages (on log scale), and cooldown, prunes invalid trials where fast EMA exceeds slow, builds a config, evaluates it across walk-forward test folds with progressive reporting for pruning, and returns the average custom score. Finally, it runs 200 trials with a 1-hour timeout and prints the best value and parameters found.
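Once a study is stored, the plots discussed earlier (optimisation history, parameter importances, slice) can be generated straight from it. A small sketch, assuming the single-objective study above has actually been run and plotly is installed:

import optuna
from optuna.visualization import (plot_optimization_history,
                                  plot_param_importances, plot_slice)

study = optuna.load_study(study_name=f"{SYMBOL.lower()}_{INTERVAL}_ema_single",
                          storage="sqlite:///studies/btc_ema_single.db")
plot_optimization_history(study).show()   # objective value per trial + running best
plot_param_importances(study).show()      # fANOVA-style importances per hyperparameter
plot_slice(study).show()                  # objective vs. each parameter's range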
Pattern Recap:
TPE (multivariate) → handles parameter interactions well.
MedianPruner → stops weak configs early.
Tight ranges + log scale → reduce variance & overfit risk.
TL;DR of this section
Execution realism: signals at t-1, orders at t, with fees/slippage.
Metrics on bars: Sharpe/MDD computed from bar-level equity; annualise with BARS_PER_YEAR.
Unimodality first: 1-D sweep + ternary saves compute and reduces overfit risk.
WFO: re-opt on train → score on test across purged & embargoed splits.
Optuna: use TPE + pruning + persistent storage; objective = Sharpe – 0.5·MDD – penalty.
If you want the trade-off made explicit, run a multi-objective study that maximises Sharpe while minimising MDD - that is what the next snippet does with NSGA-II:
mstudy = optuna.create_study(
directions=["maximize", "minimize"],
sampler=optuna.samplers.NSGAIISampler(seed=42),
pruner=optuna.pruners.MedianPruner(),
storage="sqlite:///studies/btc_ema_mo.db", # change to eth_ for ETH-USD
study_name=f"{SYMBOL.lower()}_{INTERVAL}_ema_mo",
load_if_exists=True,
)
def objective_mo(trial):
# same param space as above
cfg = BTConfig(
trial.suggest_int("ema_fast", 5, 40),
trial.suggest_int("ema_slow", 30, 150),
trial.suggest_float("sl_pct", 0.003, 0.03, log=True),
trial.suggest_float("tp_pct", 0.005, 0.05, log=True),
fee_bps=6, slip_bps=2,
min_bars_between=trial.suggest_int("min_bars_between", 0, 12)
)
    # prune invalid combos (fast EMA must stay below slow EMA), then average across test folds
    if cfg.ema_fast >= cfg.ema_slow:
        raise optuna.TrialPruned()
    sharpe, mdd = [], []
    for tr_idx, te_idx in SPLITS:
        m = run_backtest_ema_barlevel(df.iloc[te_idx], cfg, bars_per_year=BARS_PER_YEAR)
        sharpe.append(m["sharpe"]); mdd.append(m["mdd"])
    return float(np.mean(sharpe)), float(np.mean(mdd))
mstudy.optimize(objective_mo, n_trials=300)
pareto = mstudy.best_trials # List of Pareto-optimal trials
for t in pareto:
print(f"Pareto trial: value {t.values}, params {t.params}") # Example for ETH-USD: Multiple entries, e.g., value [1.1, 0.15], params {'ema_fast': 12, ...}
This code creates a multi-objective study in Optuna to simultaneously maximise Sharpe ratio and minimise maximum drawdown, using an NSGA-II sampler with a seed and a median pruner, along with persistent storage. The objective_mo function suggests the same hyperparameters as before, prunes invalid EMA configurations, builds the BTConfig, runs backtests on each test fold to collect Sharpe and MDD values, and returns their averages as a tuple for Pareto optimization. It then optimizes over 300 trials, retrieves the list of best (non-dominated) trials from the Pareto front, and prints their values and parameters for selection. This approach helps identify trade-off solutions rather than a single optimum.
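After the multi-objective run you still have to pick one config. A simple approach is to filter the Pareto front by a drawdown budget and then take the highest Sharpe; a sketch, with the 0.20 MDD cap as an arbitrary example:

MAX_MDD = 0.20  # example risk budget, not a recommendation

feasible = [t for t in mstudy.best_trials if t.values[1] <= MAX_MDD]
if feasible:
    chosen = max(feasible, key=lambda t: t.values[0])   # values = (sharpe, mdd)
    print("Chosen config:", chosen.params, "->", chosen.values)
else:
    print("No Pareto point satisfies the drawdown budget; relax MAX_MDD or widen the search.")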
7) Constraints (e.g., require ≥ 30 trades)
In crypto, low trade counts can lead to overfitting on rare events like flash crashes; constraints ensure robustness. Two ways:
(A) Penalty inside the objective (simple, version-agnostic):
score = m["sharpe"] - 0.5*m["mdd"] - 0.0001*max(0, 30 - m["trades"]) # Integrate into objective functions above
This single-line code computes a penalized score by starting with the Sharpe ratio, subtracting half the maximum drawdown for risk adjustment, and applying a small penalty if the number of trades falls below 30 to discourage low-activity strategies. The penalty uses max(0, 30 - trades) to only activate when trades are insufficient, scaled by 0.0001 to avoid overpowering the main metrics. It can be easily integrated into any objective function to softly enforce the constraint without altering Optuna's core setup.
(B) Hard constraints (Optuna 3+; needs a sampler that supports constraints_func, e.g., TPESampler, NSGAIISampler, or BoTorchSampler):
def constraints(trial):
    # <= 0 means feasible; > 0 means violated
    min_trades_violation = max(0, 30 - trial.user_attrs.get("trades", 0))
    return (min_trades_violation,)

# constraints_func is passed to the sampler, not to create_study
sampler = optuna.samplers.TPESampler(constraints_func=constraints)
study = optuna.create_study(direction="maximize", sampler=sampler)
def objective(trial):
cfg = ... # As before
# after computing metrics (avg trades across folds):
trial.set_user_attr("trades", np.mean(trades_list)) # Assume trades_list collected
return score
This code defines a constraints function that checks whether the average number of trades (stored as a user attribute) meets or exceeds 30, returning a violation value greater than 0 if not, which marks the trial as infeasible. The constraints function is passed to a constraints-aware sampler (here TPESampler) rather than to create_study, so only configurations satisfying the hard constraint are treated as optimal. In the objective function, after computing metrics across folds, it sets the 'trades' user attribute with the mean trade count for the constraint evaluation. This enforces strict requirements but needs a sampler that supports constraints, such as TPE, NSGA-II, or BoTorch.
For the future, one can also add:
Incorporate Bayesian Alternatives or Hybrids: Optuna's TPE is solid, but for even smarter sampling in high-dimensional spaces, consider integrating libraries like SMAC3 or Hyperopt (if not already). For crypto's regime shifts, adaptive samplers that weigh recent data more (e.g., via recency-biased priors) can help.
Robustness Testing Beyond Walk-Forward: Add Monte Carlo simulations (resample trades with noise) or synthetic data generation (e.g., using GANs for fake volatility spikes) to stress-test against black swans. Also, ensemble multiple optimized configs (e.g., average signals from top 3 Pareto points) for smoother equity curves.
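As a taste of the Monte Carlo idea, here is a minimal sketch that block-bootstraps bar returns and looks at the spread of Sharpe ratios; the placeholder returns, block size, and helper name are illustrative assumptions:

import numpy as np

def bootstrap_sharpe(returns: np.ndarray, bars_per_year: int, n_sims: int = 1000,
                     block: int = 24, seed: int = 0) -> np.ndarray:
    """Resample contiguous blocks of bar returns and recompute Sharpe for each simulated path."""
    rng = np.random.default_rng(seed)
    n = len(returns)
    sharpes = []
    for _ in range(n_sims):
        starts = rng.integers(0, n - block, size=n // block)
        path = np.concatenate([returns[s:s + block] for s in starts])
        sharpes.append(path.mean() / (path.std(ddof=1) + 1e-12) * np.sqrt(bars_per_year))
    return np.array(sharpes)

# Usage with placeholder returns (swap in your strategy's bar returns):
rets = np.random.default_rng(1).normal(0.0002, 0.01, 5000)
dist = bootstrap_sharpe(rets, bars_per_year=24 * 365)
print("5th-95th percentile Sharpe:", np.percentile(dist, [5, 95]).round(2))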
Happy trading!
Jakub