MathShard AI | Technical Deep Dive

System Overview

Four-Layer Architecture

MathShard processes match data through four distinct layers: sourcing, algorithms, execution, and strategy. Each layer is isolated and testable.

📊

Layer 1: Data Sourcing

Raw match data, odds, fixtures

StatsBomb xG, football-data.co.uk, Bet365 Odds, JSONL Format

🧮

Layer 2: Algorithms

Elo ratings, feature engineering, ML models

ELO_UPDATE, ROLLING_MEAN, EWMA, LOGISTIC_FIT, SOFTMAX_PREDICT, CALIBRATION

⚡

Layer 3: Execution

Graph DAG, streaming compute, API server

Graph DSL, DAG Executor, REST API, Walk-Forward Backtest

💰

Layer 4: Strategy

Bankroll management, risk controls, execution

Growth Mode, ACCA Fun, Kelly Criterion, Circuit Breakers

Layer 1

Data Sourcing

Match results, expected goals (xG), and bookmaker odds flow through a unified JSONL pipeline.

StatsBomb (xG, shots, passes) → ETL Pipeline (statsbomb-etl) → JSONL (matches_agg.jsonl) → Odds Merge (Bet365, Pinnacle) → Table (column store)

data/epl/matches_agg.jsonl (one record shown; outcome is 0 = Home win, 1 = Draw, 2 = Away win)

{"match_id": "12345", "date": "2024-08-17", "home_team_id": "Arsenal", "away_team_id": "Chelsea", "home_score": 2, "away_score": 1, "home_xg": 1.8, "away_xg": 0.9, "outcome": 0}

📊

StatsBomb xG

Event-level expected goals (xG) data from the open StatsBomb datasets, aggregated per match into home and away xG totals.

shots, passes, xG

📈

football-data.co.uk

Historical match results and closing odds from major bookmakers, covering 20+ seasons of the EPL, La Liga, Serie A, Bundesliga, and Ligue 1.

results, odds, CSV

🎰

Live Odds API

Real-time odds from betting exchanges and sportsbooks, used for edge calculation and arbitrage detection.

Bet365, Pinnacle, real-time

📄

JSONL Format

Newline-delimited JSON for streaming ingestion: one match per line, schema-validated, append-only.

streaming, immutable, sortable

Layer 2

Algorithms

Streaming Elo ratings form the backbone. Features flow through a DAG of operations into a multinomial logistic regression that produces 3-way (home/draw/away) predictions.

Elo Update Formula

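$$\mathrm{Elo}' = \mathrm{Elo} + K\,(S - E)$$

where K is the K-factor, S is the team's actual score from the match, and E is its expected score.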

💡

Key insight: A simple 2-feature Elo model (elo_diff + bias) outperforms complex xG models. Simplicity wins in noisy domains.

Core Operations

Operation	Type	Description
`ELO_UPDATE`	Feature	Streaming Elo ratings with home advantage, optional xG hybrid mode
`ROLLING_MEAN`	Feature	Rolling window statistics per team (form indicators)
`EWMA`	Feature	Exponentially weighted moving average (recency bias)
`DIFF`	Transform	Element-wise difference (home - away)
`STACK_COLUMNS`	Transform	Concatenate features into matrix X
`LOGISTIC_FIT`	Model	Multinomial logistic regression with L2 regularization
`SOFTMAX_PREDICT`	Model	Softmax probabilities → p_home, p_draw, p_away
`CALIBRATION`	Post	Temperature scaling for probability calibration (T=0.27)
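The Go sketch below shows how SOFTMAX_PREDICT and CALIBRATION might combine for the 2-feature model (elo_diff + bias). It computes one logit per class, divides the logits by the temperature T, and normalizes with a numerically stable softmax. Two assumptions apply: temperature scaling uses the standard convention of dividing logits by T, and the weight values are hypothetical placeholders rather than fitted MathShard coefficients.

```go
package main

import (
	"fmt"
	"math"
)

// predict3Way returns [p_home, p_draw, p_away] for one match.
// W holds one (bias, weight-on-elo_diff) pair per class; values used below are hypothetical.
// Assumes standard temperature scaling: logits are divided by T before the softmax.
func predict3Way(W [3][2]float64, eloDiff, T float64) [3]float64 {
	var z [3]float64
	for k := 0; k < 3; k++ {
		z[k] = (W[k][0] + W[k][1]*eloDiff) / T
	}
	// Subtract the max logit before exponentiating for numerical stability.
	maxZ := math.Max(z[0], math.Max(z[1], z[2]))
	var p [3]float64
	sum := 0.0
	for k := range z {
		p[k] = math.Exp(z[k] - maxZ)
		sum += p[k]
	}
	for k := range p {
		p[k] /= sum
	}
	return p
}

func main() {
	// Hypothetical weights: the home logit rises with elo_diff, the away logit falls.
	W := [3][2]float64{{0.10, 0.004}, {0.0, 0.0}, {-0.10, -0.004}}
	p := predict3Way(W, 100, 0.27)
	fmt.Printf("p_home=%.3f p_draw=%.3f p_away=%.3f\n", p[0], p[1], p[2])
}
```

With T below 1, as in the table's T=0.27, dividing by T spreads the logits apart and sharpens the probabilities. With T above 1, the probabilities would flatten instead.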

ops/elo.go

// EloState holds streaming Elo ratings for all teams
type EloState struct {
    K       float64                       // K-factor (default: 20)
    HomeAdv float64                       // Home advantage in Elo points
    InitElo float64                       // Initial Elo for new teams
    Ratings map[string]map[string]float64 // scope → team → elo

    // Layer 2 upgrades
    ShrinkageLambda float64  // Early-season shrinkage
    AdaptiveK0      float64  // Adaptive K-factor base
    TeamHomeAdv     map[...]  // Team-specific home advantage
}

// Update modes: "goals" (default), "xg", "hybrid"
// Hybrid: S = (1-blend)*S_result + blend*S_xg
        

Expected Score (Elo)

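$$E = \frac{1}{1 + 10^{(\mathrm{Elo}_{\mathrm{away}} - \mathrm{Elo}_{\mathrm{home}} - \mathrm{HomeAdv})/400}}$$

Here E is the home team's expected score; the away team's expected score is 1 - E.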
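The update and expected-score formulas combine into a single per-match step. The Go sketch below covers only the default "goals" mode. It assumes the conventional scores S = 1, 0.5, 0 for a home win, draw, and home loss, and a zero-sum update in which the away team moves by the opposite amount. It leaves out the shrinkage, adaptive-K, team-specific home advantage, and xG/hybrid options listed in EloState. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// expectedHome is the home team's expected score; homeAdv is in Elo points.
func expectedHome(eloHome, eloAway, homeAdv float64) float64 {
	return 1.0 / (1.0 + math.Pow(10, (eloAway-eloHome-homeAdv)/400))
}

// updateElo applies Elo' = Elo + K*(S - E) to both teams after one match.
// Assumed: S = 1 / 0.5 / 0 from the home side's result (plain "goals" mode).
func updateElo(eloHome, eloAway, k, homeAdv float64, homeGoals, awayGoals int) (float64, float64) {
	s := 0.5 // draw
	if homeGoals > awayGoals {
		s = 1.0
	} else if homeGoals < awayGoals {
		s = 0.0
	}
	delta := k * (s - expectedHome(eloHome, eloAway, homeAdv))
	// Zero-sum: the away rating moves by the opposite amount.
	return eloHome + delta, eloAway - delta
}

func main() {
	// k = 20 and home_adv = 100, as in models/elo_baseline.json.
	h, a := updateElo(1500, 1500, 20, 100, 2, 1)
	fmt.Printf("home=%.2f away=%.2f\n", h, a) // home=1507.20 away=1492.80
}
```

With 100 points of home advantage, the home side of two evenly rated teams is expected to score about 0.64. Its win therefore earns only about 7.2 of the 20 available points.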

Layer 3

Execution Engine

Models are defined as JSON DAGs. The executor topologically sorts nodes and runs operations in dependency order.

Graph JSON (DAG definition) → Validate (cycle detection) → Topo Sort (execution order) → Execute (op dispatch) → Output (predictions)

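models/elo_baseline.json (excerpt: the STACK_COLUMNS node that assembles X and the SOFTMAX_PREDICT node that emits p_home, p_draw, p_away are not shown)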

{
  "graph_id": "soccer_3way_elo_baseline",
  "nodes": [
    {
      "id": "elo",
      "op": "ELO_UPDATE",
      "in": ["matches"],
      "out": ["elo_pre_home", "elo_pre_away"],
      "params": { "k": 20.0, "home_adv": 100.0 }
    },
    {
      "id": "elo_diff",
      "op": "DIFF",
      "in": ["elo_pre_home", "elo_pre_away"],
      "out": ["elo_diff"]
    },
    {
      "id": "fit",
      "op": "LOGISTIC_FIT",
      "in": ["X", "outcome"],
      "out": ["W"],
      "params": { "num_classes": 3, "l2": 0.01 }
    }
  ],
  "outputs": ["p_home", "p_draw", "p_away"]
}
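The Validate and Topo Sort stages can be sketched with Kahn's algorithm. A node depends on whichever node produces each of its inputs. Any node still unscheduled when the queue empties must lie on a cycle. The Go code below illustrates the idea and is not MathShard's executor. It assumes each output name has exactly one producer, and it treats inputs that no node produces (such as matches or outcome) as external data.

```go
package main

import "fmt"

// Node mirrors the id/op/in/out fields of a graph JSON node.
type Node struct {
	ID  string
	Op  string
	In  []string
	Out []string
}

// topoSort returns node IDs in dependency order, or an error on a cycle.
// Assumes one producer per output name; unproduced inputs are external data.
func topoSort(nodes []Node) ([]string, error) {
	producer := map[string]string{} // output name -> producing node ID
	for _, n := range nodes {
		for _, o := range n.Out {
			if p, dup := producer[o]; dup {
				return nil, fmt.Errorf("output %q produced by both %s and %s", o, p, n.ID)
			}
			producer[o] = n.ID
		}
	}
	indeg := map[string]int{}
	children := map[string][]string{}
	for _, n := range nodes {
		for _, in := range n.In {
			if p, ok := producer[in]; ok { // external inputs add no edge
				children[p] = append(children[p], n.ID)
				indeg[n.ID]++
			}
		}
	}
	var queue, order []string
	for _, n := range nodes {
		if indeg[n.ID] == 0 {
			queue = append(queue, n.ID)
		}
	}
	for len(queue) > 0 {
		id := queue[0]
		queue = queue[1:]
		order = append(order, id)
		for _, c := range children[id] {
			indeg[c]--
			if indeg[c] == 0 {
				queue = append(queue, c)
			}
		}
	}
	if len(order) != len(nodes) {
		return nil, fmt.Errorf("cycle detected: %d node(s) unscheduled", len(nodes)-len(order))
	}
	return order, nil
}

func main() {
	// The three nodes above, plus two hypothetical nodes (stack, predict) that supply
	// the X input and the p_* outputs the JSON references but does not define.
	nodes := []Node{
		{ID: "fit", Op: "LOGISTIC_FIT", In: []string{"X", "outcome"}, Out: []string{"W"}},
		{ID: "elo", Op: "ELO_UPDATE", In: []string{"matches"}, Out: []string{"elo_pre_home", "elo_pre_away"}},
		{ID: "elo_diff", Op: "DIFF", In: []string{"elo_pre_home", "elo_pre_away"}, Out: []string{"elo_diff"}},
		{ID: "stack", Op: "STACK_COLUMNS", In: []string{"elo_diff"}, Out: []string{"X"}},
		{ID: "predict", Op: "SOFTMAX_PREDICT", In: []string{"X", "W"}, Out: []string{"p_home", "p_draw", "p_away"}},
	}
	order, err := topoSort(nodes)
	if err != nil {
		panic(err)
	}
	fmt.Println(order) // [elo elo_diff stack fit predict]
}
```

Deriving edges from input and output names means node order in the JSON does not matter. Here "fit" is listed first but still runs after the node that produces X.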
        

API Endpoints

Method	Endpoint	Description
`POST`	`/api/v1/predict`	Predict single match outcome
`GET`	`/api/slate`	Weekly predictions with edge vs odds
`POST`	`/api/v1/backtest`	Walk-forward backtesting
`GET`	`/api/season-odds`	Monte Carlo season simulation
`POST`	`/api/v1/graph/run`	Execute arbitrary computation graph

Layer 4

Betting Strategy

Two complementary systems: Growth Mode for serious alpha extraction, and ACCA Fun (accumulator bets) for controlled entertainment betting.

Primary

Growth Mode v1

+32.6% ROI

Type: Singles only
Leagues: EPL + La Liga
Allocation: Core 65% / Agg 35%
Max Bet: 1.5% (Core) / 2.5% (Agg)
Weekly Cap: 18% exposure
Edge Threshold: 6% (Core) / 8% (Agg)
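One way to read the Growth Mode parameters is as a staking rule. Compute the model's edge against the offered decimal odds, and skip any bet below the sleeve's edge threshold. Otherwise, stake a Kelly fraction capped by the sleeve's max bet and by the room left under the 18% weekly cap. This Go sketch assumes edge = p × odds − 1, and it uses a hypothetical quarter-Kelly multiplier. The actual Kelly fraction and the way the weekly cap is enforced are not specified here, so treat both as assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// Sleeve holds per-sleeve limits from the Growth Mode card (fractions of bankroll).
type Sleeve struct {
	Name    string
	EdgeMin float64 // minimum edge required to bet
	MaxBet  float64 // max stake as a fraction of bankroll
}

// stakeFraction returns the fraction of bankroll to stake on a single.
// p: model probability; odds: decimal odds; kellyFrac: hypothetical Kelly multiplier;
// weeklyRoom: unused part of the weekly exposure cap (assumed enforcement).
func stakeFraction(s Sleeve, p, odds, kellyFrac, weeklyRoom float64) float64 {
	edge := p*odds - 1 // assumed edge definition: expected profit per unit staked
	if odds <= 1 || edge < s.EdgeMin {
		return 0
	}
	kelly := edge / (odds - 1) // full Kelly fraction for a single binary bet
	return math.Min(kellyFrac*kelly, math.Min(s.MaxBet, weeklyRoom))
}

func main() {
	core := Sleeve{Name: "Core", EdgeMin: 0.06, MaxBet: 0.015}
	agg := Sleeve{Name: "Agg", EdgeMin: 0.08, MaxBet: 0.025}
	// Model gives 55% at odds of 2.10: edge = 0.55*2.10 - 1 = 0.155.
	for _, s := range []Sleeve{core, agg} {
		fmt.Printf("%s stake: %.4f of bankroll\n", s.Name, stakeFraction(s, 0.55, 2.10, 0.25, 0.18))
	}
}
```

With these inputs, quarter-Kelly suggests about 3.5% of bankroll. Both sleeves are therefore held to their max bet (1.5% and 2.5%), which shows how the caps dominate when the model sees a large edge.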

Secondary

ACCA Fun

+72.6% ROI

Type: Doubles only
Tickets/Week: 5 doubles
Weekly Budget: 30% of bankroll
Leg Quality: A+ (≥60%) or A (≥55%)
Throttle: 50% after 3 losses
Survival Floor: $300 (branch dies)
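The ACCA Fun rules translate into a few lines of logic. This Go sketch rests on four interpretive assumptions. The ≥60%/≥55% grades apply to the model's probability for the picked outcome. A double wins with the product of its legs' probabilities (independent legs). The weekly budget is split evenly across the 5 tickets. "Throttle 50% after 3 losses" means halving stakes after 3 consecutive losses.

```go
package main

import "fmt"

// legGrade maps a leg's model probability to the ACCA Fun quality grade.
func legGrade(p float64) string {
	switch {
	case p >= 0.60:
		return "A+"
	case p >= 0.55:
		return "A"
	default:
		return "" // not eligible as a leg
	}
}

// doubleStake returns the stake for one double ticket, or 0 if it is not allowed.
// Assumed: even split of the weekly budget, throttle after 3 consecutive losses.
func doubleStake(bankroll, p1, p2 float64, consecutiveLosses int) float64 {
	const (
		weeklyBudget   = 0.30  // 30% of bankroll per week
		ticketsPerWeek = 5     // 5 doubles per week
		survivalFloor  = 300.0 // below this the branch is stopped
	)
	if bankroll < survivalFloor || legGrade(p1) == "" || legGrade(p2) == "" {
		return 0
	}
	stake := bankroll * weeklyBudget / ticketsPerWeek
	if consecutiveLosses >= 3 {
		stake *= 0.5 // throttle to 50%
	}
	return stake
}

func main() {
	p1, p2 := 0.62, 0.57
	// Assumes independent legs, so the double's win probability is p1*p2.
	fmt.Printf("grades %s/%s, double win prob %.3f, stake %.2f\n",
		legGrade(p1), legGrade(p2), p1*p2, doubleStake(1000, p1, p2, 0))
}
```

Even two A-grade legs combine to roughly a one-in-three ticket. This is why a weekly budget and a survival floor, rather than per-bet edge, bound the risk here.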

Risk Controls (4-Tier System)

🔔

Tier 1: Soft Brake

When the aggressive sleeve's drawdown reaches -15%, reduce its max bet to 1.5%. Release the brake once the drawdown recovers to -8%.

⚠️

Tier 2: Kill Switch

When the aggressive sleeve's drawdown reaches -20%, pause that sleeve for 14 days.

🛑

Tier 3: Circuit Breaker

When total portfolio drawdown reaches -18%, pause ALL betting for 7 days.

📉

Tier 4: Minimum Threshold

If the aggressive sleeve falls below 10% of total bankroll, keep it disabled.
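The four tiers can be evaluated as a function of the current drawdowns. In this Go sketch, drawdown is assumed to be measured from each pool's peak as a negative fraction. The soft brake treats -15%/-8% as a hysteresis band, so it must remember whether it is currently engaged. Pause durations are returned rather than scheduled. The type and field names are hypothetical.

```go
package main

import "fmt"

// RiskInput describes the portfolio at evaluation time (drawdowns are <= 0, from peak; assumed).
type RiskInput struct {
	AggDrawdown   float64 // aggressive sleeve drawdown, e.g. -0.16
	TotalDrawdown float64 // whole-portfolio drawdown
	AggShare      float64 // aggressive sleeve value / total bankroll
	BrakeEngaged  bool    // soft-brake state carried over from the previous check
}

// RiskDecision is what the controls allow until the next check.
type RiskDecision struct {
	BrakeEngaged bool
	AggMaxBet    float64 // aggressive max bet as a fraction of bankroll
	AggPauseDays int     // Tier 2 pause for the aggressive sleeve
	AllPauseDays int     // Tier 3 pause for all betting
	AggDisabled  bool    // Tier 4
}

func evaluateRisk(in RiskInput) RiskDecision {
	d := RiskDecision{AggMaxBet: 0.025} // normal aggressive max bet
	// Tier 1: engage at -15%, stay engaged until drawdown recovers to -8%.
	switch {
	case in.AggDrawdown <= -0.15:
		d.BrakeEngaged = true
	case in.BrakeEngaged && in.AggDrawdown < -0.08:
		d.BrakeEngaged = true
	}
	if d.BrakeEngaged {
		d.AggMaxBet = 0.015
	}
	// Tier 2: kill switch for the aggressive sleeve.
	if in.AggDrawdown <= -0.20 {
		d.AggPauseDays = 14
	}
	// Tier 3: portfolio-wide circuit breaker.
	if in.TotalDrawdown <= -0.18 {
		d.AllPauseDays = 7
	}
	// Tier 4: keep a sleeve that has shrunk below 10% of bankroll disabled.
	d.AggDisabled = in.AggShare < 0.10
	return d
}

func main() {
	d := evaluateRisk(RiskInput{AggDrawdown: -0.16, TotalDrawdown: -0.05, AggShare: 0.30})
	fmt.Printf("%+v\n", d)
}
```

The hysteresis gap between -15% and -8% keeps the brake from flipping on and off while the drawdown hovers near a single threshold.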

Results

Performance Metrics

Validated on 160 out-of-sample 2025/26 EPL matches using walk-forward backtesting.
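Walk-forward backtesting predicts each match using only matches played before it. Here is a minimal Go sketch of that loop. It uses a hypothetical Model interface standing in for the graph executor and a toy base-rate model, and it assumes matches are sorted by date.

```go
package main

import "fmt"

// Match is a minimal record; Outcome is 0 = Home, 1 = Draw, 2 = Away.
type Match struct {
	HomeTeam, AwayTeam string
	Outcome            int
}

// Model is a hypothetical interface standing in for the graph executor.
type Model interface {
	Fit(history []Match)
	Predict(m Match) [3]float64
}

// walkForward predicts matches[start:] in order, fitting on matches[:i] each time,
// so no prediction sees its own result or any later one. Assumes date-sorted input.
func walkForward(model Model, matches []Match, start int) [][3]float64 {
	var preds [][3]float64
	for i := start; i < len(matches); i++ {
		model.Fit(matches[:i])
		preds = append(preds, model.Predict(matches[i]))
	}
	return preds
}

// baseRates is a toy model that predicts historical home/draw/away frequencies.
type baseRates struct{ p [3]float64 }

func (b *baseRates) Fit(history []Match) {
	if len(history) == 0 {
		return // no data yet: keep the previous (initially zero) estimate
	}
	var counts [3]float64
	for _, m := range history {
		counts[m.Outcome]++
	}
	for k := range counts {
		b.p[k] = counts[k] / float64(len(history))
	}
}

func (b *baseRates) Predict(Match) [3]float64 { return b.p }

func main() {
	ms := []Match{{"A", "B", 0}, {"C", "D", 2}, {"B", "C", 0}, {"D", "A", 1}}
	for i, p := range walkForward(&baseRates{}, ms, 2) {
		fmt.Printf("match %d: %.2f\n", i+2, p)
	}
}
```

Refitting from scratch at every step keeps the sketch simple. A streaming feature such as ELO_UPDATE would instead update incrementally after each result.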

Accuracy by Confidence

Confidence	Accuracy
≥75%	80%
≥65%	62%
≥55%	58%
All predictions	54.4%
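Accuracy-by-confidence figures like these can be reproduced from a list of predictions. Take the arg-max outcome as the pick and its probability as the confidence, then compute the hit rate among predictions at or above each threshold. This Go sketch assumes that definition of confidence. The three predictions in main are invented for illustration.

```go
package main

import "fmt"

// Pred pairs a 3-way probability vector with the realized outcome (0/1/2).
type Pred struct {
	P       [3]float64
	Outcome int
}

// pick returns the arg-max class and its probability (assumed "confidence").
func pick(p [3]float64) (int, float64) {
	best := 0
	for k := 1; k < 3; k++ {
		if p[k] > p[best] {
			best = k
		}
	}
	return best, p[best]
}

// accuracyAtLeast returns the hit rate and count for picks with confidence >= minConf.
func accuracyAtLeast(preds []Pred, minConf float64) (float64, int) {
	hits, n := 0, 0
	for _, pr := range preds {
		cls, conf := pick(pr.P)
		if conf < minConf {
			continue
		}
		n++
		if cls == pr.Outcome {
			hits++
		}
	}
	if n == 0 {
		return 0, 0
	}
	return float64(hits) / float64(n), n
}

func main() {
	preds := []Pred{ // illustrative data only
		{[3]float64{0.78, 0.15, 0.07}, 0},
		{[3]float64{0.20, 0.25, 0.55}, 1},
		{[3]float64{0.45, 0.30, 0.25}, 0},
	}
	for _, t := range []float64{0.75, 0.65, 0.55, 0} {
		acc, n := accuracyAtLeast(preds, t)
		fmt.Printf(">=%.0f%%: %.1f%% of %d\n", t*100, acc*100, n)
	}
}
```

Reporting the count alongside each rate matters. Higher thresholds keep fewer matches, so their accuracy estimates rest on smaller samples.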

ROI vs Bookmakers

Strategy	ROI	Notes
Model picks (Growth Mode)	+32.6%	4-month validation (Aug-Dec 2025)
Model picks (basic)	+6.56%	All model picks vs Bet365
Always home	-8%	Baseline strategy
Random	-10%	House edge
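The ROI figures are presumably total profit divided by total amount staked, which is the usual definition. Here is a minimal Go version of that calculation at decimal odds. The ROI definition is an assumption, and the three bets in main are invented for illustration.

```go
package main

import "fmt"

// Bet is one settled wager at decimal odds.
type Bet struct {
	Stake, Odds float64
	Won         bool
}

// roi returns total profit divided by total amount staked (assumed ROI definition).
func roi(bets []Bet) float64 {
	staked, profit := 0.0, 0.0
	for _, b := range bets {
		staked += b.Stake
		if b.Won {
			profit += b.Stake * (b.Odds - 1)
		} else {
			profit -= b.Stake
		}
	}
	if staked == 0 {
		return 0
	}
	return profit / staked
}

func main() {
	bets := []Bet{{1, 2.10, true}, {1, 1.80, false}, {1, 2.50, true}} // illustrative only
	fmt.Printf("ROI: %+.1f%%\n", roi(bets)*100) // ROI: +53.3%
}
```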

🎯

Key insight: Draws are hard. The model never predicts a draw, so every drawn match counts as a miss (0% accuracy on draws). All alpha comes from correctly identifying home and away winners at high confidence.