Backtest Results
WhaleClaw doesn’t ask you to trust a black box. The signal pipeline has been backtested with walk-forward validation, a widely used method for checking that a strategy holds up on data it was not tuned on.
Summary
What was tested
The backtest covers the end-to-end signal pipeline:
- Data collection: raw messages from key opinion leaders (KOLs) on Telegram (Jul 2025 – present, 4,500+ messages)
- Signal parsing: GPT-4o-mini extracts the entry, stop-loss, and target from each message
- KOL filtering: only signals from KOLs with positive walk-forward performance are kept
- Trade simulation: each signal is tested against actual BTC hourly OHLC data
This isn’t a hypothetical model. These are real messages from real traders, parsed and validated against real market data.
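As a rough illustration of the trade-simulation step, the sketch below replays one parsed signal against a sequence of hourly bars and reports the result in R (multiples of the risked amount). The `Signal` and `Bar` structures, the assumption that the position is already filled at the entry price, and the rule that the stop wins when one bar touches both levels are all illustrative assumptions, not WhaleClaw's actual implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Signal:
    """A parsed trade idea (hypothetical structure for illustration)."""
    direction: str  # "long" or "short"
    entry: float
    stop: float
    target: float


@dataclass(frozen=True)
class Bar:
    """One hourly OHLC bar."""
    open: float
    high: float
    low: float
    close: float


def simulate_r(signal: Signal, bars: Iterable[Bar]) -> Optional[float]:
    """Return the trade result in R, or None if neither level is reached.

    Assumes the position is already filled at signal.entry before the
    first bar. If one bar touches both stop and target, the stop is
    assumed to be hit first, because hourly bars do not reveal the
    intrabar order (a conservative assumption).
    """
    risk = abs(signal.entry - signal.stop)
    if risk == 0:
        raise ValueError("entry and stop must differ")
    reward_r = abs(signal.target - signal.entry) / risk

    for bar in bars:
        if signal.direction == "long":
            hit_stop = bar.low <= signal.stop
            hit_target = bar.high >= signal.target
        else:
            hit_stop = bar.high >= signal.stop
            hit_target = bar.low <= signal.target
        if hit_stop:
            return -1.0
        if hit_target:
            return reward_r
    return None


# Illustrative prices only: a long that reaches its target on the second bar.
example_signal = Signal("long", entry=100_000, stop=99_000, target=102_200)
example_bars = [
    Bar(100_000, 100_500, 99_500, 100_200),
    Bar(100_200, 102_300, 100_100, 102_000),
]
example_result = simulate_r(example_signal, example_bars)
```

Expressing every outcome in R makes trades with different stop distances comparable, which is what the per-KOL totals later on this page rely on.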
Walk-forward methodology
Walk-forward testing is a standard technique for detecting overfitting:
Dataset: Jul 2025 – Present (588 parsed signals)
┌──────────────────────────────┐
│ TRAIN (60%)                  │ ← Find which KOLs are profitable
│ Jul 2025 – ~Feb 2026         │
├──────────────────────────────┤
│ TEST (40%)                   │ ← Validate on unseen data
│ ~Feb 2026 – Present          │
└──────────────────────────────┘
Why this matters: Many trading systems look great on historical data but fail in real time (overfitting). Walk-forward testing splits the data chronologically: you find the strategy on the earlier portion and check that it still works on later data the model has never seen.
The 6 profitable KOLs were identified in the training set and confirmed profitable on the test set. This is the validation that matters.
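The selection logic described here can be sketched as follows: sort trades by time, cut at 60%, pick KOLs with a positive total R in the training portion, and keep only those that are also positive in the test portion. The tuple layout, function names, and sample trades are hypothetical. This shows a single split as described on this page; a rolling walk-forward would repeat the same step over successive windows.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (timestamp, kol_name, result_in_R); this layout is an illustrative assumption.
Trade = Tuple[int, str, float]


def chronological_split(
    trades: List[Trade], train_frac: float = 0.6
) -> Tuple[List[Trade], List[Trade]]:
    """Split trades by time so the test set is strictly later than the train set."""
    ordered = sorted(trades, key=lambda t: t[0])
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]


def total_r_by_kol(trades: List[Trade]) -> Dict[str, float]:
    """Sum the R results of each KOL."""
    totals: Dict[str, float] = defaultdict(float)
    for _, kol, result_r in trades:
        totals[kol] += result_r
    return dict(totals)


def confirmed_kols(trades: List[Trade], train_frac: float = 0.6) -> Set[str]:
    """KOLs selected on the training data that stay profitable on the test data."""
    train, test = chronological_split(trades, train_frac)
    train_totals = total_r_by_kol(train)
    test_totals = total_r_by_kol(test)
    # Selection uses training data only; the test set is never consulted here.
    candidates = {kol for kol, total in train_totals.items() if total > 0}
    return {kol for kol in candidates if test_totals.get(kol, 0.0) > 0}


# Made-up sample: "B" is profitable in the test period but was not selected
# in training, so it is not counted as confirmed.
sample_trades: List[Trade] = [
    (1, "A", 2.0), (2, "B", -1.0), (3, "A", -1.0), (4, "C", 2.0),
    (5, "B", -1.0), (6, "C", 1.0), (7, "A", 2.0), (8, "C", -1.0),
    (9, "B", 3.0), (10, "A", -1.0),
]
confirmed = confirmed_kols(sample_trades)
```

The important design point is the direction of information flow: candidates are chosen from the training window alone, and the test window is used only to confirm or reject them.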
The 6 profitable KOLs
The backtest identified 6 KOLs whose signals consistently produced positive returns across both training and test periods:
| KOL | Style | Win rate | Total R | Signal profile |
|---|---|---|---|---|
| Tareeq | Macro | 52% | +28.6R | High volume, consistent |
| Woods | Scalper | 48% | +22.1R | Fast entries, tight stops |
| Eliz | Macro | 51% | +19.4R | Swing trades, wide targets |
| Muzzagin | Scalper | 44% | +14.8R | High frequency, positive expectancy |
| Binance Killers | Mixed | 43% | +10.2R | Structured signals with levels |
| Vivian | Macro | 47% | +6.3R | Selective, high conviction |
These 6 form the core tier-1 KOLs that carry the highest weighting in the consensus algorithm. The other 122+ KOLs provide directional context but with lower individual influence.
What the numbers mean
Win rate: 46%
A 46% win rate might sound low. In trading, it’s not. What matters is the size of the average win relative to the average loss, measured in R-multiples (profit or loss expressed as a multiple of the amount risked on the trade):
- Average winning trade: +2.2R
- Average losing trade: -1.0R
- Expectancy per trade: +0.21R
You lose more often than you win, but your wins are more than twice the size of your losses. Over 588 signals, that adds up to +101.4R.
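To make the relationship between win rate and payoff concrete, here is a minimal sketch of the usual expectancy calculation. The function names are hypothetical, and the inputs are round illustrative values chosen to show that a sub-50% win rate can still be profitable. They are not the backtest's data.

```python
from typing import Sequence


def expectancy(results_r: Sequence[float]) -> float:
    """Average result per trade, in R, computed from individual trade results."""
    if not results_r:
        raise ValueError("need at least one trade")
    return sum(results_r) / len(results_r)


def expectancy_from_stats(win_rate: float, avg_win_r: float, avg_loss_r: float) -> float:
    """Expectancy = win_rate * avg_win - (1 - win_rate) * |avg_loss|."""
    return win_rate * avg_win_r - (1.0 - win_rate) * abs(avg_loss_r)


# Illustrative values only: winning 40% of the time at +2R against -1R losses
# still yields a positive average result per trade.
illustrative_trades = [2.0, -1.0, -1.0, 2.0, -1.0]
per_trade = expectancy(illustrative_trades)
from_stats = expectancy_from_stats(win_rate=0.4, avg_win_r=2.0, avg_loss_r=-1.0)
```

Both routes give the same number when the summary statistics are computed from the same trades, so either can be used to sanity-check a reported expectancy.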
+101.4R total return
This is the cumulative R across all 588 signals from the 6 profitable KOLs. It accounts for:
- Slippage (conservative estimates)
- Losing streaks (the longest was 8 consecutive losses)
- Market conditions (trending and ranging periods)
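The two figures mentioned in this section, cumulative R and the longest losing streak, are simple to compute from a list of per-trade results. The sketch below uses hypothetical function names and a made-up trade sequence. The per-KOL totals are copied from the table earlier on this page.

```python
from typing import Dict, Sequence


def cumulative_r(results_r: Sequence[float]) -> float:
    """Total R is the plain sum of per-trade results (R does not compound)."""
    return sum(results_r)


def longest_losing_streak(results_r: Sequence[float]) -> int:
    """Length of the longest run of consecutive losing trades."""
    longest = current = 0
    for result in results_r:
        if result < 0:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


# Per-KOL totals as listed in the table on this page.
kol_total_r: Dict[str, float] = {
    "Tareeq": 28.6,
    "Woods": 22.1,
    "Eliz": 19.4,
    "Muzzagin": 14.8,
    "Binance Killers": 10.2,
    "Vivian": 6.3,
}
combined_total = round(sum(kol_total_r.values()), 1)  # 101.4

# Made-up sequence to exercise the streak counter.
made_up_results = [1.0, -1.0, -1.0, 2.0, -1.0, -1.0, -1.0, 0.5]
streak = longest_losing_streak(made_up_results)  # 3
```

The six per-KOL totals sum to the +101.4R headline figure, which is a quick consistency check anyone can reproduce from the table.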
How this maps to WhaleClaw
The backtest validates the core signal pipeline: that AI can extract tradeable signals from KOL messages and that certain KOLs consistently produce edge.
WhaleClaw goes further by:
- Weighting the 6 core KOLs more heavily (based on these results)
- Adding 122+ additional KOLs for broader consensus context
- Layering in orderflow, structure, and macro for multi-source confirmation
- Running continuously (not batch) for real-time signal updates
The backtest is the foundation. The live system is the full product built on top of it.
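One way to picture "weighting the core KOLs more heavily" is a weighted directional vote. The sketch below is purely illustrative: the weights, the function name, and the scoring rule are assumptions made for this example and do not describe WhaleClaw's actual consensus algorithm.

```python
from typing import Iterable, Set, Tuple

# Hypothetical weights, chosen only for illustration.
CORE_WEIGHT = 3.0
OTHER_WEIGHT = 1.0


def consensus_score(votes: Iterable[Tuple[str, int]], core_kols: Set[str]) -> float:
    """Weighted average of directional votes, in the range [-1, 1].

    Each vote is (kol_name, direction), where direction is +1 for long
    and -1 for short. Core KOLs count CORE_WEIGHT times as much as others.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for kol, direction in votes:
        weight = CORE_WEIGHT if kol in core_kols else OTHER_WEIGHT
        weighted_sum += weight * direction
        total_weight += weight
    return weighted_sum / total_weight if total_weight else 0.0


# One core KOL is long while two other (placeholder) KOLs are short:
# the core vote outweighs them and the score stays mildly positive.
score = consensus_score(
    [("Tareeq", +1), ("other_kol_1", -1), ("other_kol_2", -1)],
    core_kols={"Tareeq"},
)
```

Under this toy rule a single core KOL can outweigh two non-core KOLs, which matches the idea that the other KOLs add context while carrying less individual influence.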
Limitations (we’re honest about these)
- Past performance doesn’t guarantee future results. Market regimes change. KOLs go through cold streaks.
- The backtest covers BTC only. We don’t test or trade altcoins.
- Signal parsing isn’t perfect. Some messages are ambiguous. The AI gets ~95% of signals correct; ~5% have some parsing error.
- Walk-forward is a strong test, not a perfect one. Live trading adds slippage beyond the modeled estimates, emotional factors, and timing differences.
We share these limitations because trust is built on transparency, not on cherry-picked numbers.