Backtest Results

WhaleClaw doesn’t ask you to trust a black box. The signal pipeline has been backtested with walk-forward validation — the gold standard for strategy verification.

588 Signals Analyzed
+101.4R Total Return
46% Win Rate
6 Profitable KOLs

The backtest covers the end-to-end signal pipeline:

  1. Data collection — Raw KOL messages from Telegram (Jul 2025 – present, ~4,500+ messages)
  2. Signal parsing — GPT-4o-mini extraction of entry, stop-loss, and target from each message
  3. KOL filtering — Only signals from KOLs with positive walk-forward performance
  4. Trade simulation — Each signal tested against actual BTC hourly OHLC data

This isn’t a hypothetical model. These are real messages from real traders, parsed and validated against real market data.
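The trade-simulation step can be pictured with a minimal sketch. The names `Signal`, `Bar`, and `simulate_r` are hypothetical and not taken from WhaleClaw's code. The sketch assumes the position is already open at the signal's entry price, and that a bar touching both the stop and the target counts as a loss, since hourly bars do not reveal which level was hit first. The actual pipeline may handle fills and ties differently.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Signal:
    direction: str  # "long" or "short"
    entry: float
    stop: float
    target: float


@dataclass(frozen=True)
class Bar:
    high: float
    low: float


def simulate_r(signal: Signal, bars: Iterable[Bar]) -> Optional[float]:
    """Return the trade result in R, or None if neither level is reached.

    Assumption: the position is open at signal.entry before the first bar.
    """
    risk = abs(signal.entry - signal.stop)
    if risk == 0:
        raise ValueError("entry and stop must differ")
    reward_r = abs(signal.target - signal.entry) / risk

    for bar in bars:
        if signal.direction == "long":
            stop_hit = bar.low <= signal.stop
            target_hit = bar.high >= signal.target
        else:
            stop_hit = bar.high >= signal.stop
            target_hit = bar.low <= signal.target
        # Checking the stop first makes a bar that touches both levels a loss,
        # which is the conservative reading of ambiguous hourly data.
        if stop_hit:
            return -1.0
        if target_hit:
            return reward_r
    return None


long_signal = Signal("long", entry=100.0, stop=98.0, target=104.0)
hourly_bars = [Bar(high=101.0, low=99.5), Bar(high=104.5, low=100.0)]
result = simulate_r(long_signal, hourly_bars)
```

Here the target is 4 points away and the stop is 2 points away, so a trade that reaches the target scores +2R and one that reaches the stop scores -1R.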

Walk-forward testing is the industry standard for avoiding overfitting:

Dataset: Jul 2025 – Present (588 parsed signals)
┌──────────────────────────────┐
│ TRAIN (60%)                  │ ← Find which KOLs are profitable
│ Jul 2025 – ~Feb 2026         │
├──────────────────────────────┤
│ TEST (40%)                   │ ← Validate on unseen data
│ ~Feb 2026 – Present          │
└──────────────────────────────┘

Why this matters: Many trading systems look great on historical data but fail in real time (overfitting). Walk-forward testing splits the data — you find the strategy on one portion and prove it works on data the model has never seen.

The 6 profitable KOLs were identified in the training set and confirmed profitable on the test set. This is the validation that matters.
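As a rough illustration of the split described above, the sketch below sorts signals by time, selects KOLs with positive total R on the first 60%, and keeps only those that are also positive on the remaining 40%. The dictionary keys (`time`, `kol`, `r`), the function names, and the sample data are assumptions made for the example. The selection rule of "total R above zero" is a simplification that the source does not spell out. Note that this is a single chronological split; a fuller walk-forward scheme would typically repeat it over rolling windows.

```python
from collections import defaultdict


def chronological_split(signals, train_fraction=0.6):
    """Split signals into an earlier train part and a later test part."""
    ordered = sorted(signals, key=lambda s: s["time"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def total_r_by_kol(signals):
    """Sum the R result of every signal per KOL."""
    totals = defaultdict(float)
    for s in signals:
        totals[s["kol"]] += s["r"]
    return dict(totals)


def confirmed_kols(signals, train_fraction=0.6):
    """KOLs that are profitable on train AND still profitable on test."""
    train, test = chronological_split(signals, train_fraction)
    train_r = total_r_by_kol(train)
    test_r = total_r_by_kol(test)
    selected = {kol for kol, r in train_r.items() if r > 0}
    return sorted(kol for kol in selected if test_r.get(kol, 0.0) > 0)


# Made-up sample data: 10 signals, so 6 land in train and 4 in test.
sample = [
    {"time": 1, "kol": "A", "r": 2.0},
    {"time": 2, "kol": "B", "r": 2.0},
    {"time": 3, "kol": "A", "r": -1.0},
    {"time": 4, "kol": "B", "r": 2.0},
    {"time": 5, "kol": "C", "r": -1.0},
    {"time": 6, "kol": "C", "r": -1.0},
    {"time": 7, "kol": "A", "r": 2.0},
    {"time": 8, "kol": "B", "r": -1.0},
    {"time": 9, "kol": "B", "r": -1.0},
    {"time": 10, "kol": "C", "r": 3.0},
]
kept = confirmed_kols(sample)
```

In this toy data, B looks good in training but loses on the test set, and C is positive on the test set but was never selected, so only A survives both checks.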

The backtest identified 6 KOLs whose signals consistently produced positive returns across both training and test periods:

| KOL | Style | Win Rate | Total R | Signals |
|---|---|---|---|---|
| Tareeq | Macro | 52% | +28.6R | High volume, consistent |
| Woods | Scalper | 48% | +22.1R | Fast entries, tight stops |
| Eliz | Macro | 51% | +19.4R | Swing trades, wide targets |
| Muzzagin | Scalper | 44% | +14.8R | High frequency, positive expectancy |
| Binance Killers | Mixed | 43% | +10.2R | Structured signals with levels |
| Vivian | Macro | 47% | +6.3R | Selective, high conviction |

These 6 form the core tier-1 KOLs that carry the highest weighting in the consensus algorithm. The other 122+ KOLs provide directional context but with lower individual influence.

A 46% win rate might sound low. In trading, it’s not. What matters is the size of the average win relative to the average loss, both measured in R (multiples of the amount risked on a trade):

  • Average winning trade: +2.2R
  • Average losing trade: -1.0R
  • Expectancy per trade: +0.21R

You lose more often than you win, but your wins are more than twice the size of your losses. Summed over 588 signals, that adds up to +101.4R.
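The relationship between win rate, average win, and average loss can be written as a small calculation. The function names below are hypothetical, and the formula is the textbook two-outcome expectancy, which treats every trade as either a full win or a full loss. Plugging in the rounded figures quoted above gives roughly +0.47R, while dividing +101.4R by 588 signals gives roughly +0.17R per signal. Neither reproduces the quoted +0.21R exactly, so the published figure presumably reflects details this simple formula ignores (for example slippage, rounding, or trades closed between stop and target). Treat the sketch as an illustration of the arithmetic only.

```python
def expectancy_r(win_rate, avg_win_r, avg_loss_r):
    """Expected R per trade when each trade is a full win or a full loss."""
    return win_rate * avg_win_r - (1 - win_rate) * avg_loss_r


def breakeven_win_rate(avg_win_r, avg_loss_r):
    """Win rate at which the two-outcome expectancy is exactly zero."""
    return avg_loss_r / (avg_win_r + avg_loss_r)


# Rounded figures from the text: 46% win rate, +2.2R wins, -1.0R losses.
simple_expectancy = expectancy_r(0.46, 2.2, 1.0)
needed_win_rate = breakeven_win_rate(2.2, 1.0)
per_signal_average = 101.4 / 588
```

With +2.2R wins and -1.0R losses, the break-even win rate is about 31%, which is why a 46% win rate can still be profitable.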

The +101.4R total return is the cumulative R across all 588 signals from the 6 profitable KOLs. It accounts for:

  • Slippage (conservative estimates)
  • Losing streaks (the longest was 8 consecutive losses)
  • Market conditions (trending and ranging periods)
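For readers who want to see how a cumulative R figure and a longest losing streak come out of a list of per-trade results, here is a minimal sketch. The function names and the sample results are made up for illustration and are not WhaleClaw data. A result below zero is assumed to count as a loss.

```python
from itertools import accumulate


def equity_curve_r(results):
    """Running total of R after each trade, in trade order."""
    return list(accumulate(results))


def longest_losing_streak(results):
    """Length of the longest run of consecutive losing trades."""
    longest = current = 0
    for r in results:
        current = current + 1 if r < 0 else 0
        longest = max(longest, current)
    return longest


# Made-up per-trade results in R.
results = [2.0, -1.0, -1.0, 3.0, -1.0, -1.0, -1.0, 2.0]
curve = equity_curve_r(results)
streak = longest_losing_streak(results)
```

The last value of `curve` is the cumulative R for the sample, and `streak` is the worst run of consecutive losses a trader would have had to sit through.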

The backtest validates the core signal pipeline — that AI can extract tradeable signals from KOL messages and that certain KOLs consistently produce edge.

WhaleClaw goes further by:

  1. Weighting the 6 core KOLs more heavily (based on these results)
  2. Adding 122+ additional KOLs for broader consensus context
  3. Layering in orderflow, structure, and macro for multi-source confirmation
  4. Running continuously (not batch) for real-time signal updates

The backtest is the foundation. The live system is the full product built on top of it.

  • Past performance doesn’t guarantee future results. Market regimes change. KOLs go through cold streaks.
  • The backtest covers BTC only. We don’t test or trade altcoins.
  • Signal parsing isn’t perfect. Some messages are ambiguous. The AI gets ~95% of signals correct; ~5% have some parsing error.
  • Walk-forward is the best test, not a perfect one. Real-time trading has slippage, emotional factors, and timing differences.

We share these limitations because trust is built on transparency, not on cherry-picked numbers.

See how KOLs are selected

Three-filter process: performance tracking, AI analysis, and human review.

KOL Selection →