Technical Deep Dive

MathShard AI

A graph-based soccer prediction engine using streaming Elo ratings, multinomial logistic regression, and disciplined bankroll management.

Growth Mode ROI: +32.6%
Model Accuracy: 54.4%
High-Conf Accuracy: 80%
Leagues Covered: 5

Four-Layer Architecture

MathShard processes match data through four distinct layers: sourcing, algorithms, execution, and strategy. Each layer is isolated and testable.

Layer 1: Data Sourcing
Raw match data, odds, fixtures
StatsBomb xG, football-data.co.uk, Bet365 Odds, JSONL Format
Layer 2: Algorithms
Elo ratings, feature engineering, ML models
ELO_UPDATE, ROLLING_MEAN, EWMA, LOGISTIC_FIT, SOFTMAX_PREDICT, CALIBRATION
Layer 3: Execution
Graph DAG, streaming compute, API server
Graph DSL, DAG Executor, REST API, Walk-Forward Backtest
Layer 4: Strategy
Bankroll management, risk controls, execution
Growth Mode, ACCA Fun, Kelly Criterion, Circuit Breakers

Data Sourcing

Match results, expected goals (xG), and bookmaker odds flow through a unified JSONL pipeline.

StatsBomb (xG, shots, passes) → ETL Pipeline (statsbomb-etl) → JSONL (matches_agg.jsonl) → Odds Merge (Bet365, Pinnacle) → Table (column store)
data/epl/matches_agg.jsonl JSONL (one record, pretty-printed for readability; outcome: 0=Home, 1=Draw, 2=Away)
{
  "match_id": "12345",
  "date": "2024-08-17",
  "home_team_id": "Arsenal",
  "away_team_id": "Chelsea",
  "home_score": 2,
  "away_score": 1,
  "home_xg": 1.8,
  "away_xg": 0.9,
  "outcome": 0
}
StatsBomb xG
Event-level expected goals data from open StatsBomb datasets, aggregated per match into home and away xG totals.
shots, passes, xG
football-data.co.uk
Historical match results and closing odds from major bookmakers. 20+ seasons of EPL, La Liga, Serie A, Bundesliga, Ligue 1.
results, odds, CSV
Live Odds API
Real-time odds from betting exchanges and sportsbooks. Used for edge calculation and arbitrage detection.
Bet365, Pinnacle, real-time
JSONL Format
Newline-delimited JSON for streaming ingestion. One match per line, schema-validated, append-only.
streaming, immutable, sortable

Algorithms

Streaming Elo ratings form the backbone. Features flow through a DAG of operations into multinomial logistic regression for 3-way prediction.

Elo Update Formula
Elo' = Elo + K × (S - E)
Key insight: A simple 2-feature Elo model (elo_diff + bias) outperforms complex xG models. Simplicity wins in noisy domains.

Core Operations

Operation | Type | Description
ELO_UPDATE | Feature | Streaming Elo ratings with home advantage, optional xG hybrid mode
ROLLING_MEAN | Feature | Rolling window statistics per team (form indicators)
EWMA | Feature | Exponentially weighted moving average (recency bias)
DIFF | Transform | Element-wise difference (home - away)
STACK_COLUMNS | Transform | Concatenate features into matrix X
LOGISTIC_FIT | Model | Multinomial logistic regression with L2 regularization
SOFTMAX_PREDICT | Model | Softmax probabilities → p_home, p_draw, p_away
CALIBRATION | Post | Temperature scaling for probability calibration (T=0.27)
ops/elo.go Go
// EloState holds streaming Elo ratings for all teams
type EloState struct {
    K       float64                       // K-factor (default: 20)
    HomeAdv float64                       // Home advantage in Elo points
    InitElo float64                       // Initial Elo for new teams
    Ratings map[string]map[string]float64 // scope โ†’ team โ†’ elo

    // Layer 2 upgrades
    ShrinkageLambda float64  // Early-season shrinkage
    AdaptiveK0      float64  // Adaptive K-factor base
    TeamHomeAdv     map[...]  // Team-specific home advantage
}

// Update modes: "goals" (default), "xg", "hybrid"
// Hybrid: S = (1-blend)*S_result + blend*S_xg
Expected Score (Elo)
E = 1 / (1 + 10^((Elo_away - Elo_home - HomeAdv) / 400))

Execution Engine

Models are defined as JSON DAGs. The executor topologically sorts nodes and runs operations in dependency order.

Graph JSON (DAG definition) → Validate (cycle detection) → Topo Sort (execution order) → Execute (op dispatch) → Output (predictions)
models/elo_baseline.json JSON
{
  "graph_id": "soccer_3way_elo_baseline",
  "nodes": [
    {
      "id": "elo",
      "op": "ELO_UPDATE",
      "in": ["matches"],
      "out": ["elo_pre_home", "elo_pre_away"],
      "params": { "k": 20.0, "home_adv": 100.0 }
    },
    {
      "id": "elo_diff",
      "op": "DIFF",
      "in": ["elo_pre_home", "elo_pre_away"],
      "out": ["elo_diff"]
    },
    {
      "id": "fit",
      "op": "LOGISTIC_FIT",
      "in": ["X", "outcome"],
      "out": ["W"],
      "params": { "num_classes": 3, "l2": 0.01 }
    }
  ],
  "outputs": ["p_home", "p_draw", "p_away"]
}
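One way the validate and sort steps could work is Kahn's algorithm, sketched below. It derives dependencies by matching each node's "in" names against other nodes' "out" names and treats unmatched inputs (such as "matches") as externally supplied. The `Node` type and `TopoSort` function are hypothetical; the actual executor is not shown in the source.

```go
package main

import (
	"errors"
	"fmt"
)

// Node holds the graph fields needed for ordering (hypothetical type).
type Node struct {
	ID  string
	Op  string
	In  []string
	Out []string
}

// TopoSort returns node IDs in dependency order using Kahn's algorithm.
// It fails if two nodes produce the same output or if a cycle exists.
func TopoSort(nodes []Node) ([]string, error) {
	producer := map[string]int{} // output name -> index of producing node
	for i, n := range nodes {
		for _, o := range n.Out {
			if _, dup := producer[o]; dup {
				return nil, fmt.Errorf("output %q is produced twice", o)
			}
			producer[o] = i
		}
	}
	indeg := make([]int, len(nodes))
	children := make([][]int, len(nodes))
	for i, n := range nodes {
		for _, in := range n.In {
			if p, ok := producer[in]; ok { // inputs with no producer are external
				children[p] = append(children[p], i)
				indeg[i]++
			}
		}
	}
	var queue []int
	for i := range nodes {
		if indeg[i] == 0 {
			queue = append(queue, i)
		}
	}
	var order []string
	for len(queue) > 0 {
		i := queue[0]
		queue = queue[1:]
		order = append(order, nodes[i].ID)
		for _, c := range children[i] {
			indeg[c]--
			if indeg[c] == 0 {
				queue = append(queue, c)
			}
		}
	}
	if len(order) != len(nodes) {
		return nil, errors.New("graph contains a cycle")
	}
	return order, nil
}

func main() {
	nodes := []Node{
		{ID: "elo_diff", Op: "DIFF", In: []string{"elo_pre_home", "elo_pre_away"}, Out: []string{"elo_diff"}},
		{ID: "elo", Op: "ELO_UPDATE", In: []string{"matches"}, Out: []string{"elo_pre_home", "elo_pre_away"}},
	}
	order, err := TopoSort(nodes)
	fmt.Println(order, err) // [elo elo_diff] <nil>
}
```

Cycle detection falls out of the sort itself: any node left with unresolved dependencies after the queue drains must sit on a cycle.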

API Endpoints

Method | Endpoint | Description
POST | /api/v1/predict | Predict single match outcome
GET | /api/slate | Weekly predictions with edge vs odds
POST | /api/v1/backtest | Walk-forward backtesting
GET | /api/season-odds | Monte Carlo season simulation
POST | /api/v1/graph/run | Execute arbitrary computation graph

Betting Strategy

Two complementary systems: Growth Mode for serious alpha extraction, ACCA Fun for controlled entertainment betting.

Primary: Growth Mode v1
+32.6% ROI
Type: Singles only
Leagues: EPL + La Liga
Allocation: Core 65% / Agg 35%
Max Bet: 1.5% / 2.5%
Weekly Cap: 18% exposure
Edge Threshold: 6% (Core) / 8% (Agg)
Secondary: ACCA Fun
+72.6% ROI
Type: Doubles only
Tickets/Week: 5 doubles
Weekly Budget: 30% bankroll
Leg Quality: A+ (≥60%) or A (≥55%)
Throttle: 50% after 3 losses
Survival Floor: $300 (branch dies)

Risk Controls (4-Tier System)

Tier 1: Soft Brake
At -15% aggressive-sleeve drawdown, reduce max bet to 1.5%. Release when the drawdown recovers to -8%.
Tier 2: Kill Switch
At -20% aggressive-sleeve drawdown, pause the aggressive sleeve for 14 days.
Tier 3: Circuit Breaker
At -18% total portfolio drawdown, pause ALL betting for 7 days.
Tier 4: Minimum Threshold
If the aggressive sleeve falls below 10% of bankroll, keep it disabled.
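The four tiers can be expressed as a small state check, sketched below under stated assumptions: drawdowns are negative fractions (-0.15 means -15%), the normal aggressive max bet is 2.5% (reading "1.5% / 2.5%" as Core / Agg), and only Tier 1 carries state because it releases at a different level than it engages. Type and field names are hypothetical.

```go
package main

import "fmt"

// RiskState remembers whether the Tier 1 soft brake is engaged.
type RiskState struct {
	SoftBrake bool
}

// Decision lists the controls that apply after one evaluation.
type Decision struct {
	AggMaxBet    float64 // max bet fraction for the aggressive sleeve
	PauseAggDays int     // Tier 2: days to pause the aggressive sleeve
	PauseAllDays int     // Tier 3: days to pause all betting
	AggDisabled  bool    // Tier 4: aggressive sleeve stays off
}

// Evaluate applies the four tiers. aggDD and totalDD are drawdowns as
// negative fractions; aggShare is the aggressive sleeve's share of bankroll.
func (s *RiskState) Evaluate(aggDD, totalDD, aggShare float64) Decision {
	d := Decision{AggMaxBet: 0.025} // assumed normal aggressive cap

	// Tier 1: engage at -15%, hold until recovery to -8% (hysteresis).
	if aggDD <= -0.15 {
		s.SoftBrake = true
	} else if aggDD >= -0.08 {
		s.SoftBrake = false
	}
	if s.SoftBrake {
		d.AggMaxBet = 0.015
	}
	// Tier 2: kill switch for the aggressive sleeve.
	if aggDD <= -0.20 {
		d.PauseAggDays = 14
	}
	// Tier 3: portfolio-wide circuit breaker.
	if totalDD <= -0.18 {
		d.PauseAllDays = 7
	}
	// Tier 4: sleeve too small to run.
	if aggShare < 0.10 {
		d.AggDisabled = true
	}
	return d
}

func main() {
	s := &RiskState{}
	fmt.Printf("%+v\n", s.Evaluate(-0.16, -0.05, 0.35)) // brake engages
	fmt.Printf("%+v\n", s.Evaluate(-0.10, -0.05, 0.35)) // brake still held
	fmt.Printf("%+v\n", s.Evaluate(-0.07, -0.05, 0.35)) // brake released
}
```

The gap between the -15% trigger and the -8% release prevents the brake from toggling on and off while the drawdown hovers near a single threshold.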

Performance Metrics

Validated on 160 out-of-sample 2025/26 EPL matches using walk-forward backtesting.

Accuracy by Confidence

≥75% Confidence: 80%
≥65% Confidence: 62%
≥55% Confidence: 58%
All Predictions: 54.4%
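A breakdown like the one above can be computed by filtering predictions on a confidence floor. The sketch assumes "confidence" means the highest of the three predicted probabilities, which the source does not define, and it runs on made-up toy data rather than the backtest results. Names are hypothetical.

```go
package main

import "fmt"

// Pred pairs predicted probabilities with the actual result.
type Pred struct {
	Probs   [3]float64 // p_home, p_draw, p_away
	Outcome int        // actual: 0=Home, 1=Draw, 2=Away
}

// argmax returns the most likely class and its probability.
func argmax(p [3]float64) (int, float64) {
	best := 0
	for i := 1; i < len(p); i++ {
		if p[i] > p[best] {
			best = i
		}
	}
	return best, p[best]
}

// AccuracyAt returns the hit rate and count of predictions whose top
// probability is at least minConf. It returns (0, 0) if none qualify.
func AccuracyAt(preds []Pred, minConf float64) (float64, int) {
	hits, n := 0, 0
	for _, p := range preds {
		cls, conf := argmax(p.Probs)
		if conf < minConf {
			continue
		}
		n++
		if cls == p.Outcome {
			hits++
		}
	}
	if n == 0 {
		return 0, 0
	}
	return float64(hits) / float64(n), n
}

func main() {
	// Toy data for illustration only.
	preds := []Pred{
		{Probs: [3]float64{0.80, 0.12, 0.08}, Outcome: 0},
		{Probs: [3]float64{0.60, 0.22, 0.18}, Outcome: 2},
		{Probs: [3]float64{0.20, 0.20, 0.60}, Outcome: 2},
	}
	for _, c := range []float64{0.75, 0.55} {
		acc, n := AccuracyAt(preds, c)
		fmt.Printf("conf>=%.2f: %.3f over %d predictions\n", c, acc, n)
	}
}
```

Reporting the count next to each rate matters because high-confidence buckets are small, so their accuracy figures rest on few matches.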

ROI vs Bookmakers

Strategy | ROI | Notes
Model picks (Growth Mode) | +32.6% | 4-month validation (Aug-Dec 2025)
Model picks (basic) | +6.56% | All model picks vs Bet365
Always home | -8% | Baseline strategy
Random | -10% | House edge
Key insight: Draws are hard. The model never predicts a draw, so every drawn match is a miss (0% draw accuracy). All alpha comes from correctly identifying home and away winners at high confidence.