ML4T Book 3rd Edition

Overview

Machine Learning for Algorithmic Trading (3rd Edition) by Stefan Jansen — the third edition, due June 2026, with 27 chapters in 6 parts covering everything from data foundations to production deployment.

Adds five entirely new chapters (Ch 16–17 Strategy Simulation & Portfolio Construction, Ch 22–24 RAG/KG/Agents) and an entirely new Part 6 (Production).

6-Part Structure

Part | Theme | Chapters
1 — Foundation | Data & Strategy Setup | Ch 1–6
2 — Features | Feature Engineering | Ch 7–10
3 — Models | ML Pipeline & Synthesis | Ch 11–15
4 — Strategy | Backtest to Execution | Ch 16–20
5 — Advanced AI | RL, RAG & Agents | Ch 21–24
6 — Production | Deploy & Monitor | Ch 25–27

Part 1 — Foundation (Ch 1–6)

Ch | Title | Highlights
1 | The Process Is Your Edge | 2-layer ML4T workflow; regime detection with GMM (Risk-On vs Risk-Off volatility ratio = 1.3x); defining the evidence boundary
2 | The Financial Data Universe | 8 asset classes; bitemporal PIT; storage: Parquet 3.4x compression, Polars ASOF 3.8x faster
3 | Market Microstructure | LOB reconstruction (NASDAQ ITCH 423M msg/day, 97.6% cancel rate); dollar bars come out best (JB=84.7 vs 3,838 for time bars)
4 | Fundamental & Alternative Data | bitemporal SEC EDGAR pipeline; 3-stage entity resolution; published factors lose ~58% post-publication
5 | Synthetic Financial Data | TimeGAN TSTR=1.70; Tail-GAN VaR error 102%→11.5%; Diffusion-TS KS=0.06; GReaT/distilgpt2 AUC=0.84
6 | Strategy Research Framework | 3-layer metrics; 5 leakage types; walk-forward CV; baseline checkpoint; run logging + DSR

Part 2 — Features (Ch 7–10)

Ch | Title | Highlights
7 | Defining the Learning Task | label engineering; fold-by-fold feature-label evaluation; search accounting; correlation→causality
8 | Financial Feature Engineering | 3 filters (horizon/driver/role); price-derived, cross-instrument, and contextual families; SPY-TLT regime conditioning: 17% IC swing
9 | Model-Based Feature Extraction | diagnostics, spectral, volatility, uncertainty, regime, and cross-sectional features; walk-forward fitting
10 | Text Feature Engineering | lexical→static embeddings→sequential→Transformers; financial NLP workflow; PIT-safe text features

Part 3 — Models (Ch 11–15)

Ch | Title | Highlights
11 | The ML Pipeline | Ridge/LASSO/Elastic Net; Ridge 1.5x ICIR vs OLS; conformal prediction (CQR+ACI 88.1%); SHAP diagnostics
12 | Advanced Models for Tabular Data | XGBoost, LightGBM, CatBoost; GBM beats linear in 7-8/9 case studies; TabM competitive; TreeSHAP
13 | Deep Learning for Time Series | N-BEATS, PatchTST, iTransformer, TFT; DL rarely beats the GBM baseline; linear models beat Transformers (Zeng 2022)
14 | Latent Factor Models | PCA, IPCA, RP-PCA, CAE, adversarial SDF; factor zoo problem (400+ factors, 65% failed replication); CAE IC +0.073
15 | Causal Machine Learning | DML; BSTS event impact; PCMCI/NOTEARS causal discovery; predictive vs causal signal

Part 4 — Strategy (Ch 16–20)

Ch | Title | Highlights
16 | Strategy Simulation | backtest = falsification; 6 failure modes; DSR; IC champion ≠ Sharpe champion; cadence mediates IC→Sharpe
17 | Portfolio Construction | equal weight is hard to beat (DeMiguel 2009); Kelly criterion; HRP (no matrix inversion); no universal allocator winner
18 | Transaction Costs | cost taxonomy; square-root impact model; Almgren-Chriss optimal execution; TCA feedback loop; alpha-to-go
19 | Risk Management | VaR/CVaR; drawdown path risk; factor decomposition; stress testing; GARCH/EWMA adaptive controls; kill switches
20 | Strategy Synthesis | 9 case study verdicts; NASDAQ-100 IC=0.008 but Sharpe=4.22; GBM wins 6/9; median holdout decay ~50%

Part 5 — Advanced AI (Ch 21–24)

Ch | Title | Highlights
21 | Reinforcement Learning | MDP formulation; DQN→PPO→SAC; optimal execution; market making; deep hedging (pfhedge); IRL
22 | RAG for Financial Research | hallucination → RAG as the remedy; structure-aware parsing; hybrid retrieval + BM25; KG-guided: +24% accuracy, -85% tokens
23 | Knowledge Graphs | graphs justified for multi-hop queries; LLM extraction pipeline; Graph RAG; institutional crowding features
24 | Autonomous Agents | ReAct/ToT/Reflexion; explicit state + memory schema; tool contracts; multi-agent forecasting; Warden security pattern

Part 6 — Production (Ch 25–27)

Ch | Title | Highlights
25 | Live Trading Systems | unified backtest↔live framework; IBKR/Alpaca/QuantConnect; 11-state order state machine; pipeline verification
26 | MLOps & Governance | technical vs statistical failure distinction; PSI/KS/SHAP drift; shadow mode; circuit breakers; MLflow/DVC/Feast
27 | The Systematic Edge | process = the durable edge; quant career archetypes (T-shaped); quantum/DeFi/AI-ethics frontiers; learning-system design

Key Numbers from the Book

  • 9 case studies: ETF, US Equities, NASDAQ-100, CME Futures, S&P 500 Options, Crypto Perps, FX, Commodities, Firm Characteristics
  • GBM wins 6/9 case studies downstream (Sharpe); linear Ridge wins on assets with highly correlated features
  • Median holdout decay ~50% across strategies
  • Backtest Sharpe: gross 1.76 → net -62.61 (NASDAQ-100 intraday case study) — cost assumptions matter enormously

Chapter Details

Machine Learning for Algorithmic Trading (3rd Edition) by Stefan Jansen — the third edition, due June 2026, covering 27 chapters across 6 parts from data foundations to live production deployment.

Adds five entirely new chapters (Strategy Simulation Ch 16, Portfolio Construction Ch 17, RAG Ch 22, Knowledge Graphs Ch 23, Autonomous Agents Ch 24) and an entirely new Part 6 (Production).

6-Part Structure

Part | Theme | Chapters | Notebooks
1 — Foundation | Data & Strategy Setup | Ch 1–6 | ~59
2 — Features | Feature Engineering | Ch 7–10 | ~42
3 — Models | ML Pipeline & Synthesis | Ch 11–15 | ~88
4 — Strategy | Backtest to Execution | Ch 16–20 | ~60
5 — Advanced AI | RL, RAG & Agents | Ch 21–24 | ~36
6 — Production | Deploy & Monitor | Ch 25–27 | ~31

Part 1 — Foundation

Ch 1 — The Process Is Your Edge

The ML4T workflow as a 2-layer system: a stable data infrastructure plus an iterative research loop. Evidence boundary separates what can be tested from what must be assumed. Regime detection using GMM on AQR factor data produces a 1.3x volatility ratio between Risk-On and Risk-Off regimes. Causal inference and GenAI are integrated into the workflow as augmentation tools, not replacements for statistical rigor.

Learning Objectives:

  • Distinguish structural breaks, regimes, data drift, concept drift, and online detection, and explain why static trading models degrade in changing markets.
  • Explain the ML4T Workflow as a research-to-production system, including its data infrastructure foundation, scoping invariants, iterative research modules, and feedback loops from live trading back to research.
  • Define the evidence boundary between exploration and confirmation, and explain how trial logging, sealed holdouts, and selection-aware evaluation preserve research integrity.
  • Describe how causal inference and generative AI fit within a disciplined trading workflow, including the main benefits they provide and the new failure modes they introduce.
  • Apply regime thinking, implementability checks, and monitoring logic to diagnose strategy vulnerabilities and to adapt workflow discipline across independent and institutional settings.
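
The GMM-based regime detection described above can be sketched in a few lines. This is a minimal illustration on synthetic returns, not the book's AQR-factor setup: the rolling-volatility feature, window length, and seeds are assumptions made for the example (the book reports the ~1.3x ratio on real factor data).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic daily returns: a calm (Risk-On) and a turbulent (Risk-Off) stretch.
calm = rng.normal(0.0005, 0.006, 750)
turb = rng.normal(-0.0005, 0.015, 250)
returns = np.concatenate([calm, turb])

# Feature: rolling 21-day volatility; fit a 2-component GMM on it.
vol = np.lib.stride_tricks.sliding_window_view(returns, 21).std(axis=1)
gmm = GaussianMixture(n_components=2, random_state=0).fit(vol.reshape(-1, 1))
labels = gmm.predict(vol.reshape(-1, 1))

# Volatility ratio between the two inferred regimes (synthetic data, so the
# number will differ from the book's 1.3x).
vols = [vol[labels == k].mean() for k in (0, 1)]
ratio = max(vols) / min(vols)
```

In practice the fitted model would be applied in a walk-forward fashion so that each day's regime label uses only past data.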

Ch 2 — The Financial Data Universe

Eight asset classes (equities, ETFs, fixed income, commodities, FX, crypto, options, derivatives). PIT correctness and bitemporal storage as core data engineering constraints. Storage benchmarks: Parquet achieves 3.4x compression vs CSV; DuckDB excels for SQL analytics; Polars ASOF joins run 3.8x faster than pandas.

Learning Objectives:

  • Distinguish among market, fundamental, and alternative data, and explain how dataset definitions shape what each source means in research and trading applications.
  • Compare the observability, conventions, and engineering constraints of major asset classes, and identify how market structure changes what can be measured and modeled.
  • Apply a financial data quality framework to diagnose common failure modes, especially point-in-time violations, survivorship bias, corporate action errors, and identifier mismatches.
  • Conduct vendor due diligence across data quality, legal and compliance, and technical and commercial dimensions.
  • Choose storage and query architectures that fit research and production needs, including when to use partitioned files, embedded analytical databases, or server-based systems.
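
The as-of join pattern behind PIT correctness can be illustrated with pandas `merge_asof` (the chapter benchmarks the faster Polars equivalent against it); the trade and quote rows below are hypothetical toy data.

```python
import pandas as pd

# An as-of join attaches to each trade the most recent quote known at or
# before the trade time (backward search) — the point-in-time discipline
# Ch 2 requires. Both frames must be sorted on the join key.
quotes = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-02 09:30:00",
                          "2024-01-02 09:30:05",
                          "2024-01-02 09:30:12"]),
    "bid": [99.9, 100.0, 100.1],
})
trades = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-02 09:30:03",
                          "2024-01-02 09:30:13"]),
    "price": [100.0, 100.2],
})
merged = pd.merge_asof(trades, quotes, on="ts", direction="backward")
```

The Polars version (`join_asof`) follows the same semantics with better performance on large tables.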

Ch 3 — Market Microstructure

LOB reconstruction from NASDAQ TotalView-ITCH (423M messages/day, 97.6% cancellation rate, 41% within 500ms). Bar sampling comparison: dollar bars achieve JB=84.7 vs 3,838 for time bars on NVDA — dollar bars are the recommended default for ML workflows. Lee-Ready trade classification: 96% accuracy vs 84% for tick test alone.

Learning Objectives:

  • Explain how liquidity, order types, market design, and intraday trading regimes shape observed market data and execution quality.
  • Distinguish among major market data products, including L1, L2, L3, TAQ, and enriched bar datasets, and choose data that matches a research or trading objective.
  • Parse message-based exchange data and reconstruct a venue-local limit order book while enforcing core lifecycle and accounting invariants.
  • Interpret key order-book measures and empirical microstructure patterns, while recognizing the limits of visible single-venue data.
  • Build and compare time-, activity-, and information-driven bars, including when trade-direction classification and Lee-Ready alignment are required.
  • Apply intraday data-quality and sessionization checks that prevent sequencing, timestamp, and calendar errors from contaminating downstream analysis.
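
A minimal sketch of dollar-bar construction, assuming a simple cumulative-dollar threshold rule; the `dollar_bars` helper, the synthetic trade tape, and the $5M bar size are illustrative choices, not the book's implementation.

```python
import numpy as np
import pandas as pd

def dollar_bars(prices, sizes, bar_value):
    """Group trades into bars, each containing roughly bar_value traded dollars."""
    dollars = np.asarray(prices) * np.asarray(sizes)
    bar_id = (dollars.cumsum() // bar_value).astype(int)  # bar index per trade
    df = pd.DataFrame({"price": prices, "bar": bar_id})
    # OHLC per bar via first/max/min/last of trade prices.
    return df.groupby("bar")["price"].agg(["first", "max", "min", "last"])

rng = np.random.default_rng(1)
prices = 100 + rng.normal(0, 0.1, 1000).cumsum()
sizes = rng.integers(1, 500, 1000)
bars = dollar_bars(prices, sizes, bar_value=5_000_000)
```

Because bars close on traded value rather than clock time, more bars form when activity is high, which is what pushes bar returns closer to normality (the JB comparison above).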

Ch 4 — Fundamental and Alternative Data

Bitemporal pipeline from SEC EDGAR for point-in-time correctness. Three-stage entity resolution: deterministic (LEI/CIK/FIGI) → probabilistic (string similarity) → embedding-based. Published return predictors lose ~58% of performance post-publication (McLean & Pontiff 2016). SEC 10-K NLP pipeline: MD&A (Item 7) + Risk Factors (Item 1A).

Learning Objectives:

  • Explain why point-in-time correctness and entity consistency are the core engineering constraints for fundamental and alternative data.
  • Implement bitemporal storage and as-of query patterns for revision-prone financial datasets.
  • Build a point-in-time corporate fundamentals pipeline from SEC EDGAR and XBRL filing histories.
  • Design time-valid entity, security, and contract mapping workflows using deterministic, probabilistic, and embedding-based resolution methods with appropriate QA gates.
  • Apply point-in-time alignment rules to macro, commodity, and on-chain datasets, including release timestamps, vintages, contract mapping, and finality policies.
  • Evaluate alternative datasets for incremental signal, data quality, legal and compliance risk, and commercial or engineering feasibility.
  • Extract, clean, and store SEC filing text as an auditable point-in-time corpus for downstream NLP feature engineering.
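
The bitemporal idea — keeping both the period a value refers to and the date it became known — can be sketched with a toy restatement example; the `as_of` helper and the EPS numbers are hypothetical.

```python
import pandas as pd

# Each row records a value for a fiscal period (valid_date) together with
# when we learned it (knowledge_date). A restatement appends a new row with
# a later knowledge_date; the original row is never overwritten.
facts = pd.DataFrame({
    "ticker": ["XYZ", "XYZ"],
    "valid_date": pd.to_datetime(["2024-03-31", "2024-03-31"]),
    "knowledge_date": pd.to_datetime(["2024-05-01", "2024-08-15"]),
    "eps": [1.10, 0.95],  # restated downward in August
})

def as_of(df, when):
    """Latest value per (ticker, valid_date) known at time `when`."""
    known = df[df["knowledge_date"] <= pd.Timestamp(when)]
    return (known.sort_values("knowledge_date")
                 .groupby(["ticker", "valid_date"]).last())

june_view = as_of(facts, "2024-06-30")   # sees the original 1.10
sept_view = as_of(facts, "2024-09-30")   # sees the restated 0.95
```

A backtest run "as of" June must see 1.10, even though 0.95 is the "correct" number today — that is exactly the leakage bitemporal storage prevents.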

Ch 5 — Synthetic Financial Data

Classical baselines (bootstrap, GBM, GARCH) as benchmarks. GAN variants: TimeGAN TSTR ratio 1.70; Tail-GAN VaR error 102%→11.5%; Sig-CWGAN TSTR 0.97. Diffusion-TS: KS statistic 0.06, TSTR 1.00, 2.6x volatility ratio between regimes. LLM tabular generation: GReaT/distilgpt2, TSTR AUC-ROC 0.84.

Learning Objectives:

  • Explain why trading research is path-limited and how adaptive search and multiple testing can inflate apparent backtest performance.
  • Use classical simulation baselines, including bootstrap and stochastic volatility models, as interpretable benchmarks for synthetic data generation.
  • Select a synthetic-data approach that matches the data structure and downstream objective, including learned generators for time series and tabular financial data.
  • Diagnose generated data using stylized-fact, dependence, and task-based evaluation methods, including Train-Synthetic-Test-Real comparisons.
  • Assess privacy and generator-specific risks, including leakage, bias amplification, overfitting to the generator, and limited scenario novelty.
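
Train-Synthetic-Test-Real (TSTR) evaluation can be sketched as follows. The "synthetic" data here is a hand-built stand-in for a fitted generator (it captures only one of the two real drivers, a typical generator failure mode); in practice it would come from TimeGAN, Diffusion-TS, or similar.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
n = 2000
X_real = rng.normal(size=(n, 4))
y_real = (X_real[:, 0] + X_real[:, 1] + rng.normal(0, 0.5, n) > 0).astype(int)

# Imperfect "synthetic" data: the generator captured only one driver.
X_syn = rng.normal(size=(n, 4))
y_syn = (X_syn[:, 0] + rng.normal(0, 0.5, n) > 0).astype(int)

X_hold, y_hold = X_real[n // 2:], y_real[n // 2:]  # held-out REAL data

def holdout_auc(X_train, y_train):
    model = LogisticRegression().fit(X_train, y_train)
    return roc_auc_score(y_hold, model.predict_proba(X_hold)[:, 1])

# TSTR ratio: train-on-synthetic AUC over train-on-real AUC, both tested on
# real data. A ratio near 1 means synthetic data trains models as well as real.
tstr_ratio = holdout_auc(X_syn, y_syn) / holdout_auc(X_real[: n // 2], y_real[: n // 2])
```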

Ch 6 — Strategy Research Framework

Three-layer metric framework: model diagnostics / signal diagnostics / strategy outcomes. Five forms of data leakage. Walk-forward CV with temporal buffers. Baseline checkpoint (timing, coverage, trading-intensity sanity). Four-level trial taxonomy for run logging. Deflated Sharpe Ratio (DSR) as search-aware inference.

Learning Objectives:

  • Place a strategy idea on the strategy map by linking it to a strategy family, a plausible source of edge, and the dominant feasibility constraints and failure modes.
  • Define a versioned trading setup in decision-time terms: what is tradable, when decisions are made, what information is admissible, how scores become positions, and which constraints and costs are treated as material.
  • Define “better” economically and keep model diagnostics, signal diagnostics, and strategy outcomes in distinct roles during research and evaluation.
  • Design a time-series evaluation protocol that preserves chronology, prevents overlap leakage, and separates model selection from final performance estimation.
  • Establish a narrow baseline checkpoint with timing, coverage, and trading-intensity sanity checks before expanding the search space.
  • Keep search auditable, reproducible, and countable using a simple trial taxonomy and automatic run logging.
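
Walk-forward splits with a temporal buffer (embargo) between the training window and the test window can be generated in a few lines; `walk_forward_splits` and its parameters are illustrative, not a library API.

```python
import numpy as np

def walk_forward_splits(n, n_folds, test_size, buffer):
    """Yield (train_idx, test_idx) pairs. An embargo of `buffer` observations
    separates the end of each expanding training window from its test window,
    so overlapping labels cannot leak across the boundary."""
    for k in range(n_folds):
        test_end = n - (n_folds - 1 - k) * test_size
        test_start = test_end - test_size
        train_end = test_start - buffer
        if train_end <= 0:
            continue
        yield np.arange(train_end), np.arange(test_start, test_end)

splits = list(walk_forward_splits(n=1000, n_folds=4, test_size=100, buffer=10))
```

The buffer should be at least as long as the label horizon; otherwise a label computed near the end of training still "sees" test-period prices.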

Part 2 — Features

Ch 7 — Defining the Learning Task

Label engineering: fixed-horizon vs event-style constructions, overlap diagnosis, break-even cost checks. Feature-label evaluation fold by fold. Search accounting and multiple-testing adjustments. Mechanism plausibility to distinguish stable signal from confounded proxies.

Learning Objectives:

  • Build split-aware preprocessing pipelines that produce stable, auditable inputs for label and feature computation.
  • Define execution-consistent labels, including fixed-horizon and event-style constructions, and diagnose overlap, resolution behavior, and implied trading intensity.
  • Evaluate feature-label bundles fold by fold using appropriate diagnostics for continuous and discrete targets, including stability, shape, and feasibility.
  • Screen candidates for implementation feasibility using turnover, break-even cost, and liquidity or capacity checks.
  • Account for search bias by defining searched sets, separating exploration from confirmation, and applying appropriate multiple-testing adjustments to fold-level summaries.
  • Use mechanism plausibility checks to distinguish potentially stable signal channels from confounded proxies, timing artifacts, and aggregation effects.
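
A fixed-horizon label plus a break-even cost screen might look like this; the 5-bar horizon and the 10 bps round-trip cost are assumptions for the example, and the prices are synthetic.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
prices = pd.Series(100 * np.exp(rng.normal(0, 0.01, 500).cumsum()))

H = 5  # label horizon in bars
# Fixed-horizon log-return label; the final H rows get NaN, not a peeked value.
label = np.log(prices.shift(-H) / prices)

# Break-even screen: is the typical labeled move larger than an assumed
# round-trip trading cost? If not, even a perfect classifier loses money.
round_trip_cost = 0.0010  # 10 bps, illustrative
median_move = label.abs().median()
feasible = median_move > round_trip_cost
```

Overlap diagnosis follows directly: with H=5, adjacent labels share 4 of 5 return bars, which inflates naive significance tests.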

Ch 8 — Financial Feature Engineering

Three filters: horizon alignment, driver hypothesis (persistence/reversion/risk compensation/predictable-clock), role separation (signal vs state variable). Price-derived families: trend/momentum, reversal, volatility (Parkinson/Garman-Klass/Yang-Zhang, 5-14x efficiency gain), liquidity, microstructure. Cross-instrument: SPY-TLT correlation conditioning momentum IC with 17-percentage-point swing across regimes. Contextual: fundamentals, calendar (sin/cos), macro state. Degrees-of-freedom discipline: one knob at a time.

Learning Objectives:

  • Translate a trading hypothesis into a documented feature specification using horizon alignment, driver hypothesis, and role separation.
  • Choose a feature’s reference frame, representation, and aggregation to match the economic claim and execution horizon, and distinguish hypothesis-changing choices from noise-control choices.
  • Distinguish signal features from state variables and identify when each should be used marginally, as an interaction, or as a conditioning variable.
  • Design representative feature specifications across price-derived, structural and cross-instrument, and contextual data families, with explicit timing assumptions and failure modes.
  • Combine signals with state variables using gating, scaling, and conditional variants, and evaluate whether the interaction adds incremental information.
  • Apply point-in-time discipline to slow-moving and revised data, including reporting lags, event timing, and vintage-aware availability rules.
  • Control feature-search degrees of freedom using one-knob-at-a-time exploration, within-family deduplication, and multiple-testing-aware triage.
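
Two of the families above — range-based (Parkinson) volatility and cyclical sin/cos calendar encodings — can be sketched on synthetic high/low data; the 21-day window and the data itself are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
idx = pd.date_range("2024-01-01", periods=252, freq="B")
high = pd.Series(101 + rng.normal(0, 0.5, 252).cumsum(), index=idx)
low = high - rng.uniform(0.5, 2.0, 252)  # low strictly below high

# Parkinson estimator: uses the high-low range instead of close-to-close,
# which is where the efficiency gain the chapter cites comes from.
park_var = (np.log(high / low) ** 2) / (4 * np.log(2))
park_vol = np.sqrt(park_var.rolling(21).mean())

# Cyclical calendar encoding: day-of-year mapped onto a circle so that
# Dec 31 and Jan 1 are neighbours rather than opposite extremes.
doy = idx.dayofyear
cal = pd.DataFrame({
    "doy_sin": np.sin(2 * np.pi * doy / 365.25),
    "doy_cos": np.cos(2 * np.pi * doy / 365.25),
}, index=idx)
```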

Ch 9 — Model-Based Feature Extraction

Model-based features extracted from fitted procedures rather than raw price series. Families: diagnostics/stationarity, spectral/signal transforms, volatility (GARCH), uncertainty, regime (HMM), cross-sectional/panel. Key discipline: all fitting must happen within training windows (walk-forward) to preserve PIT correctness.

Learning Objectives:

  • Distinguish direct features from model-based features and judge when a fitted procedure adds useful information beyond raw series.
  • Use fitted procedures to extract forecasts, filtered states, residuals, conditional volatility, regime probabilities, and cross-sectional rankings.
  • Design a compact, interpretable set of model-based features from diagnostics, signal transforms, volatility models, uncertainty, and regime families.
  • Enforce point-in-time correctness by fitting and selecting models within training windows, using filtered rather than smoothed outputs.
  • Transform asset-level temporal outputs into cross-sectional, benchmark-adjusted, pairwise, and universe-level features.
  • Distinguish between exploratory time-series methods that are useful for research diagnosis and deployable features that meet PIT requirements.
  • Use uncertainty and regime outputs primarily as conditioning features, and recognize when they should not be treated as direct signals.
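
The PIT discipline can be shown with the simplest model-based volatility feature: a filtered (not smoothed) EWMA conditional variance, shifted so the value at t is known before t. The RiskMetrics-style lambda of 0.94 is a conventional choice; the leaky z-score is included only as a contrast.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
r = pd.Series(rng.normal(0, 0.01, 500))

# Filtered (PIT-safe) conditional volatility: the EWMA at t uses returns up
# to t only, and the shift makes the feature at t depend on data before t.
lam = 0.94
ewma_var = (r ** 2).ewm(alpha=1 - lam, adjust=False).mean()
pit_vol = np.sqrt(ewma_var.shift(1))

# A leaky alternative for contrast: standardizing with the FULL-sample std
# uses future information and would fail a point-in-time audit.
leaky_z = r / r.std()
```

The same filtered-versus-smoothed distinction applies to GARCH and HMM outputs: deployable features must come from the filtered path.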

Ch 10 — Text Feature Engineering

Evolution: lexical/TF-IDF → Word2Vec/GloVe static embeddings → LSTM/GRU sequential → Transformer contextual embeddings. Self-attention resolves polysemy and long-range dependence. Modern workflow: pre-trained checkpoint → domain adaptation → task fine-tuning. PIT-safe timestamps using model cutoffs and aggregation rules.

Learning Objectives:

  • Distinguish lexical features, static embeddings, sequential models, and Transformers in terms of the information each representation preserves and loses.
  • Explain how Transformer self-attention produces contextual embeddings and why this resolves key limitations of earlier NLP methods, including polysemy and long-range dependence.
  • Apply a practical financial NLP workflow that combines pre-trained checkpoints, domain adaptation when needed, and task fine-tuning for classification or extraction tasks.
  • Design text-derived features such as sentiment, narrative surprise, or structured event signals using point-in-time-safe timestamps, model cutoffs, and aggregation rules.
  • Evaluate text-derived signals using horizon-aware diagnostics, coverage-aware analysis, and event-time alignment rather than benchmark accuracy alone.
  • Use token-level attribution and related diagnostics to audit, debug, and stress-test NLP features before deployment.

Part 3 — Models

Ch 11 — The ML Pipeline

Ridge (L2), LASSO (L1), Elastic Net as principled regularization for high-dimensional, correlated financial features. Ridge achieves 1.5x ICIR improvement over OLS at optimal regularization on ETF case study. Conformal prediction: CQR+ACI progressively closes conditional coverage gap during high-volatility periods (82.3%→88.1% for 90% target). SHAP four-layer protocol: sign consistency, magnitude plausibility, stability, regime-conditional analysis.

Learning Objectives:

  • Choose between regression and classification formulations based on how predictions will be translated into trading decisions.
  • Fit leakage-safe regularized linear models, including Ridge, LASSO, Elastic Net, and logistic regression, using point-in-time preprocessing and standardization.
  • Tune and evaluate linear models with walk-forward validation, temporal buffers, and nested cross-validation to reduce selection bias.
  • Interpret model behavior with SHAP-based diagnostics to assess feature importance, economic plausibility, and stability across refits.
  • Construct and evaluate conformal prediction intervals or prediction sets, and monitor where coverage degrades under non-stationary market conditions.
  • Use cross-case-study evidence to judge when linear models provide a strong baseline and when weak linear signal motivates more flexible models.
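
The leakage-safe linear-baseline pattern — standardization fit inside each training window, expanding walk-forward refits, IC per fold — can be sketched on synthetic data; the alpha, window sizes, and weak-signal setup are illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
n, p = 1200, 30
X = rng.normal(size=(n, p))
# Weak linear signal buried in noise, roughly the regime Ch 11 describes.
y = X[:, :3].sum(axis=1) * 0.05 + rng.normal(0, 1.0, n)

# The pipeline refits the scaler on each training window only, so test-period
# means and variances never leak into preprocessing.
model = make_pipeline(StandardScaler(), Ridge(alpha=10.0))
ics = []
for start in range(600, n - 100, 100):  # expanding walk-forward refits
    model.fit(X[:start], y[:start])
    pred = model.predict(X[start:start + 100])
    ics.append(np.corrcoef(pred, y[start:start + 100])[0, 1])
mean_ic = float(np.mean(ics))
```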

Ch 12 — Advanced Models for Tabular Data

XGBoost (regularized objective, second-order approximation), LightGBM (GOSS, leaf-wise growth), CatBoost (ordered target statistics). GBMs beat linear baselines in 7-8/9 primary-label comparisons. TabM (rank-1 adapters) beats GBM on several case studies. Optuna TPE with pruning can halve computation. TreeSHAP interaction decomposition: momentum is regime-conditional (collapses above 90th-percentile volatility).

Learning Objectives:

  • Explain how boosting differs from bagging and why sequential error correction makes GBMs effective for financial tabular data.
  • Select among XGBoost, LightGBM, and CatBoost based on categorical structure, compute environment, latency needs, and dataset size.
  • Choose appropriate GBM objectives and constraints for financial tasks, including pointwise regression, learning to rank, and monotonic constraints.
  • Tune GBMs efficiently with Optuna using pruning, multi-objective search, and time-series-aware validation.
  • Use TreeSHAP to analyze feature effects, interactions, instability, and drift in deployed tree-based models.
  • Evaluate when tabular deep learning alternatives such as TabPFN, TabM, and TabR are worth considering relative to GBMs.
  • Interpret cross-case-study evidence to decide when nonlinear tree models earn their added complexity relative to linear baselines.

Ch 13 — Deep Learning for Time Series

LSTM/GRU limitations: sequential bottleneck, gradient degradation. N-BEATS: basis expansion for trend+seasonality. Critical finding (Zeng 2022): linear models outperform Transformers 20-50% across LTSF benchmarks — Transformers largely ignore temporal order. PatchTST, iTransformer, TFT as post-critique architectures. Foundation models: TSFMs underperform tree-based on return prediction but show promise for volatility/VaR. Cross-dataset verdict: DL rarely outperforms strong tabular baselines; crypto perps is clearest DL win.

Learning Objectives:

  • Explain why recurrent sequence models became a computational and optimization bottleneck for long-context forecasting tasks.
  • Compare the main temporal modeling philosophies — decomposition-based, attention-based, state-space, and strong linear baselines — and explain when each is most appropriate.
  • Use strong baselines and diagnostics, including linear models and walk-forward evaluation, to judge whether sequence-model complexity is warranted.
  • Distinguish the design logic of modern time-series Transformer variants, including PatchTST, iTransformer, and TFT, and relate those choices to multivariate structure, covariates, and forecast horizon.
  • Decide when a financial prediction problem should be framed as direct panel regression with sequential inputs rather than multi-step time-series forecasting.
  • Evaluate time-series foundation model adaptation modes for financial applications, including the implications of transfer mismatch and pretraining contamination.
  • Apply practical uncertainty estimation methods, including MC Dropout and deep ensembles, to support risk-aware trading decisions.
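
The "strong linear baseline" discipline (in the spirit of the Zeng 2022 critique) can be sketched as a linear-on-lags model against a repeat-last naive forecast on a synthetic AR(1) series; any sequence model would have to beat both before its complexity is warranted. The AR coefficient, lookback, and split are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(11)
# Synthetic AR(1) series with moderate memory.
n, phi = 3000, 0.5
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()

L, H = 24, 1  # lookback window and forecast horizon
windows = np.lib.stride_tricks.sliding_window_view(x, L)[:-H]
targets = x[L + H - 1:]
split = 2000
lin = LinearRegression().fit(windows[:split], targets[:split])
mse_lin = np.mean((lin.predict(windows[split:]) - targets[split:]) ** 2)
mse_naive = np.mean((windows[split:, -1] - targets[split:]) ** 2)  # repeat-last
```

If a Transformer cannot beat `mse_lin` out of sample, its extra machinery is not earning its keep on that series.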

Ch 14 — Latent Factor Models

Factor zoo problem: 400+ published factors, 65% failed replication (Hou, Xue, Zhang). PCA → IPCA (time-varying characteristic betas) → RP-PCA (pricing-error penalties) → CAE (nonlinear beta mapping) → adversarial SDF (no-arbitrage minimax). Yield curve: 3 PCA factors explain 95-99% variance. Equity latent factors: best IC ~0.073-0.074 but t-stat below Harvey-Liu-Zhu t>3.0 threshold.

Learning Objectives:

  • Distinguish covariance-explaining attribution factors from priced factors, and explain why that distinction matters for prediction, risk decomposition, and trading applications.
  • Implement PCA on asset returns, interpret principal components as latent risk dimensions or eigenportfolios, and diagnose key practical issues including covariance noise, component selection, and loading instability.
  • Explain how IPCA and RP-PCA extend PCA by introducing time-varying characteristic-based betas and pricing-error penalties, and evaluate when these extensions are preferable to plain variance maximization.
  • Implement and evaluate Conditional Autoencoders using walk-forward validation, ensemble averaging, and interpretability diagnostics such as SHAP, while recognizing their main failure modes.
  • Explain how adversarial SDF estimation enforces no-arbitrage restrictions, how its objective differs from CAE reconstruction, and when direct pricing-error minimization is likely to add value.
  • Compare latent factor methods across datasets and modeling objectives, and choose among PCA, IPCA, RP-PCA, CAE, and SDF approaches based on dimensionality, economic goal, and evaluation design.
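
The yield-curve result can be reproduced on synthetic curves: when three factors (level, slope, curvature) generate the data, three principal components recover nearly all the variance. The factor shapes, scales, and noise level below are assumptions made for the demo.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)
maturities = np.array([0.25, 1, 2, 5, 10, 30])
m = maturities / maturities.max()

# Synthetic curves driven by the three classic factors plus small noise.
n = 1000
level = rng.normal(0, 1.0, (n, 1))          # parallel shifts
slope = rng.normal(0, 0.5, (n, 1))          # short vs long end
curvature = rng.normal(0, 0.2, (n, 1))      # belly of the curve
curves = (level
          + slope * (1 - 2 * m)
          + curvature * (m * (1 - m) * 4)
          + rng.normal(0, 0.02, (n, len(m))))

pca = PCA(n_components=3).fit(curves)
explained = pca.explained_variance_ratio_.sum()
```

Real curves give the 95-99% figure cited above; the residual is idiosyncratic noise plus measurement error rather than a missing fourth factor.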

Ch 15 — Causal Machine Learning

DAGs for causal question formulation. Double Machine Learning (DML) for continuous treatment effect estimation with high-dimensional confounders. Bayesian Structural Time-Series (BSTS) for event impact via counterfactual baselines. Causal discovery: PCMCI, NOTEARS, VAR-LiNGAM. Distinguishing predictive signal from causal effect is a stability predictor.

Learning Objectives:

  • Define a causal research question in terms of treatment, outcome, estimand, and counterfactual, and use DAGs to encode assumptions and identify confounders.
  • Apply validation and refutation tools, including placebo tests, sensitivity analysis, and subset-stability checks, to assess credibility of causal estimates.
  • Use Double Machine Learning (DML) to estimate causal effects of continuous treatments in the presence of high-dimensional confounders.
  • Use Bayesian Structural Time-Series (BSTS) to estimate the impact of discrete events by constructing data-driven counterfactual baselines.
  • Use causal discovery methods such as PCMCI, NOTEARS, and VAR-LiNGAM to generate candidate structures and interpret their limitations.
  • Distinguish predictive signal from causal effect, and interpret cross-dataset evidence with attention to confounding and stability.
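
The partialling-out form of DML can be sketched with scikit-learn alone (dedicated libraries such as DoubleML wrap this pattern): residualize outcome and treatment on the confounders using out-of-fold predictions, then regress residual on residual. The data-generating process and learner choices are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(8)
n = 2000
X = rng.normal(size=(n, 5))            # confounders
T = X[:, 0] + rng.normal(0, 1, n)      # treatment depends on confounders
theta = 0.5                            # true causal effect (known here)
Y = theta * T + X[:, 0] ** 2 + rng.normal(0, 1, n)  # nonlinear confounding

# Out-of-fold nuisance predictions avoid the overfitting bias that would
# arise from residualizing with in-sample fits.
rf = RandomForestRegressor(n_estimators=50, random_state=0)
res_y = Y - cross_val_predict(rf, X, Y, cv=3)
res_t = T - cross_val_predict(rf, X, T, cv=3)
theta_hat = LinearRegression().fit(res_t.reshape(-1, 1), res_y).coef_[0]
```

A naive regression of Y on T alone would absorb the X0-squared confounding into the coefficient; the orthogonalized estimate stays near the true 0.5.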

Part 4 — Strategy

Ch 16 — Strategy Simulation

Backtest as falsification, not verification. Six failure modes: lookahead, survivorship, data snooping, unrealistic execution, cost underestimation, regime fragility. Non-ML baseline Sharpe 0.76 fails to beat 60/40. DSR, White’s Reality Check, Rademacher Anti-Serum (RAS). Key cross-dataset finding: IC champion ≠ Sharpe champion in most case studies; rebalancing cadence mediates IC-to-Sharpe translation more than model choice.

Learning Objectives:

  • Formalize a backtest as an explicit trading protocol covering signal timing, execution, rebalancing, sizing, costs, constraints, data availability, and benchmark choice.
  • Distinguish vectorized and event-driven backtesting in terms of protocol semantics, state dependence, and appropriate use cases rather than treating one style as universally superior.
  • Build and interpret a transparent non-ML baseline strategy that provides a stable reference point for later model comparisons.
  • Evaluate a strategy using a core reporting stack that includes gross and net performance, drawdowns, turnover, baseline comparison, cost sensitivity, and regime-sliced diagnostics.
  • Assess whether a reported Sharpe ratio is credible by separating fixed-strategy estimation error from search-aware inference and applying tools such as confidence intervals, Reality Check logic, and the Deflated Sharpe Ratio.
  • Explain why prediction quality and trading quality can diverge, and why IC alone is insufficient for selecting deployable strategies.
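
A hedged sketch of the Deflated Sharpe Ratio in the Bailey and Lopez de Prado formulation: deflate the observed Sharpe by the expected maximum Sharpe of N noise strategies before computing the probabilistic Sharpe. The inputs (trial count, cross-trial Sharpe variance, skew, kurtosis) are illustrative and would come from the trial log in practice.

```python
import numpy as np
from scipy.stats import norm

def deflated_sharpe(sr, n_obs, n_trials, var_sr, skew=0.0, kurt=3.0):
    """Probability that the observed per-period Sharpe `sr` exceeds the
    expected maximum Sharpe of `n_trials` pure-noise strategies."""
    gamma = 0.5772156649  # Euler-Mascheroni constant
    # Expected max Sharpe under the null, given trial count and the
    # cross-trial variance of estimated Sharpe ratios.
    sr0 = np.sqrt(var_sr) * ((1 - gamma) * norm.ppf(1 - 1 / n_trials)
                             + gamma * norm.ppf(1 - 1 / (n_trials * np.e)))
    num = (sr - sr0) * np.sqrt(n_obs - 1)
    den = np.sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
    return norm.cdf(num / den)

# The same daily Sharpe looks far weaker once 100 trials are admitted.
dsr_few = deflated_sharpe(sr=0.1, n_obs=252, n_trials=2, var_sr=0.01)
dsr_many = deflated_sharpe(sr=0.1, n_obs=252, n_trials=100, var_sr=0.01)
```

This is why the trial taxonomy and run logging in Ch 6 matter: without an honest `n_trials`, the deflation cannot be computed.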

Ch 17 — Portfolio Construction

Fundamental Law of Active Management: IC=0.03 still useful with sufficient breadth. Equal-weight famously hard to beat (DeMiguel, Garlappi, Uppal 2009). Kelly criterion → fractional Kelly (half/quarter sizing). HRP: agglomerative clustering + recursive bisection, avoids matrix inversion. No universal winner across allocators — depends on trading environment.

Learning Objectives:

  • Formalize portfolio construction in terms of expected returns, covariance, constraints, leverage, and rebalancing choices.
  • Identify the allocator-specific evaluation metrics that complement the Chapter 16 backtest report, especially benchmark-relative performance, concentration, diversification, and implementation stability.
  • Explain why simple baselines such as equal weight, inverse volatility, and related heuristic allocators remain demanding benchmarks.
  • Apply mean-variance optimization with shrinkage, realistic constraints, and turnover-aware regularization.
  • Interpret Kelly sizing, especially fractional Kelly, as a log-growth principle for translating signal strength into position size.
  • Build and evaluate hierarchical allocations that prioritize diversification stability over direct covariance-matrix inversion.
  • Compare allocators under a common research protocol while limiting allocator-selection bias and other forms of overfitting.
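
Under the continuous-time approximation, Kelly sizing reduces to one line, f* = mu / sigma^2, with fractional Kelly scaling it down; the 6% expected excess return and 15% volatility below are illustrative numbers.

```python
def kelly_fraction(mu, sigma, fraction=0.5):
    """Continuous-time Kelly weight mu / sigma**2, scaled by `fraction`.
    Fractional Kelly (half or quarter) is used in practice because
    estimation error in mu makes full Kelly dangerously aggressive."""
    return fraction * mu / sigma ** 2

full = kelly_fraction(0.06, 0.15, fraction=1.0)   # ~2.67x leverage
half = kelly_fraction(0.06, 0.15, fraction=0.5)   # ~1.33x
```

The asymmetry in the growth curve is the argument for fractions: betting half of Kelly gives up a modest share of log growth, while betting double Kelly drives expected growth to zero.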

Ch 18 — Transaction Costs

Cost taxonomy: explicit (commissions, financing, borrow, taxes) / implicit (spread, slippage, impact) / capacity costs. Range: <1 bps liquid ETFs to >100 bps illiquid options. Square-root impact model has strong empirical support. TWAP, VWAP, adaptive participation, Almgren-Chriss optimal execution. Alpha-to-go: fast-decaying signals may lose most value before positions are fully established.

Learning Objectives:

  • Identify where transaction costs enter the ML4T workflow, from factor evaluation and backtesting to portfolio construction, risk management, and production monitoring.
  • Distinguish explicit, implicit, and capacity-related trading costs and map each component to the relevant modeling choice.
  • Explain why execution costs vary with market regime, intraday liquidity, volatility, and execution urgency.
  • Choose and calibrate baseline backtest cost models, from spread-based assumptions to linear and square-root impact models, using conservative research defaults when direct execution data is unavailable.
  • Compare common execution approaches, including TWAP, VWAP, adaptive participation, and Almgren-Chriss-style optimal execution, in terms of impact, timing risk, and signal decay.
  • Use transaction cost analysis to decompose realized costs, diagnose model misspecification, and recalibrate ex ante assumptions.
  • Apply break-even turnover, minimum required edge, alpha-to-go, capacity analysis, and precommitted kill criteria to decide whether a strategy remains economically viable after costs.
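
A minimal cost model combining the half-spread with square-root impact; the constant k and all inputs are illustrative defaults and would be calibrated from TCA data in practice.

```python
def trade_cost_bps(order_shares, adv_shares, daily_vol, spread_bps, k=1.0):
    """One-way cost estimate in basis points: half the quoted spread plus
    square-root market impact k * sigma * sqrt(order / ADV)."""
    impact_bps = k * daily_vol * (order_shares / adv_shares) ** 0.5 * 1e4
    return spread_bps / 2 + impact_bps

# Trading 1% of ADV in a 2%-daily-vol name with a 4 bps quoted spread.
cost = trade_cost_bps(order_shares=10_000, adv_shares=1_000_000,
                      daily_vol=0.02, spread_bps=4.0)
```

The concavity is the key property: quadrupling the order only doubles the impact term, which is what makes schedule-based execution (TWAP/VWAP, Almgren-Chriss) worthwhile.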

Ch 19 — Risk Management

Seven risk categories: market, factor, leverage, concentration, liquidity/capacity, model, operational. VaR/CVaR + regime-conditional estimates. Drawdown: Ulcer Index integrates depth and duration. Factor decomposition: market beta increases in volatile regimes when it’s most costly. Adaptive controls: GARCH/EWMA targeting, STVU. Graduated kill switches: watch at 5%→terminate at 30% drawdown.

Learning Objectives:

  • Measure tail risk with VaR and CVaR, including regime-conditional estimates and liquidity-aware interpretation.
  • Evaluate path risk using drawdown depth, drawdown duration, recovery time, and related path-dependent metrics.
  • Decompose portfolio risk into market, factor, sector, geographic, and macro exposures to distinguish intended from unintended bets.
  • Design and interpret historical, hypothetical, and reverse stress tests that challenge return, cost, volatility, and correlation assumptions together.
  • Build adaptive risk controls, including volatility targeting, exposure caps, and position-level exits, using only information available at decision time.
  • Specify kill switches, drift monitoring, and governance artifacts that turn a backtested strategy into a deployable trading system.
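
Historical VaR and CVaR can be computed directly from a return sample; the fat-tailed synthetic returns and 95% confidence level below are illustrative.

```python
import numpy as np

def var_cvar(returns, alpha=0.95):
    """Historical VaR and CVaR at confidence `alpha`, as positive loss numbers.
    CVaR averages the losses beyond VaR, so it always sits at or above it."""
    losses = -np.asarray(returns)
    var = np.quantile(losses, alpha)
    cvar = losses[losses >= var].mean()
    return var, cvar

rng = np.random.default_rng(9)
rets = rng.standard_t(df=4, size=5000) * 0.01  # fat-tailed daily returns
var95, cvar95 = var_cvar(rets, alpha=0.95)
```

The gap between CVaR and VaR widens with tail fatness, which is why the chapter pairs both with regime-conditional estimates rather than relying on a single unconditional number.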

Ch 20 — Strategy Synthesis Nine case study verdicts: advance (US firm characteristics, FX), iterate (ETFs, NASDAQ-100), reframe (CME, S&P options, crypto). Key finding: NASDAQ-100 has weakest IC (0.008) but highest Sharpe (4.22). Median holdout Sharpe decay ~50% across studies. GBM is downstream champion in 6/9 studies. Cost-survival tiers: US firm characteristics survives above 100 bps; S&P options is negative at zero friction.

Learning Objectives:

  • Explain why the information coefficient is a useful entry metric for financial signals but does not translate directly into strategy performance.
  • Distinguish signal quality, portfolio translation, cost survival, and temporal stability as separate stages in strategy evaluation.
  • Compare how major model families perform after the full pipeline, and identify when robustness matters more than peak in-sample performance.
  • Diagnose holdout disappointment using distinct failure modes, including prediction decay, translation decay, and structural break.
  • Evaluate trading strategies under realistic implementation constraints, including instrument-appropriate cost models, capacity limits, and regime sensitivity.
  • Identify the highest-return next steps after a first research pass, including label redesign, ensembling, feature engineering, and iteration.
  • Apply a practitioner workflow that moves from data and diagnostics through signal generation, strategy construction, and validation with iteration.
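The information coefficient in the first objective is just a rank correlation between predictions and forward returns; a dependency-free, no-ties Spearman sketch (illustrative, not the book's evaluation code):

```python
def rank(xs):
    """Map each value to its 0-based rank (ties ignored for simplicity)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for pos, i in enumerate(order):
        r[i] = float(pos)
    return r

def rank_ic(preds, fwd_returns):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rp, rr = rank(preds), rank(fwd_returns)
    n = len(rp)
    mp, mr = sum(rp) / n, sum(rr) / n
    cov = sum((a - mp) * (b - mr) for a, b in zip(rp, rr))
    sp = sum((a - mp) ** 2 for a in rp) ** 0.5
    sr = sum((b - mr) ** 2 for b in rr) ** 0.5
    return cov / (sp * sr)
```

A high IC is only the entry gate: portfolio translation, cost survival, and temporal stability are evaluated separately, which is how NASDAQ-100 can pair the weakest IC with the highest Sharpe.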

Part 5 — Advanced AI

Ch 21 — Reinforcement Learning RL’s comparative advantage: execution, market making, hedging (not alpha discovery). MDP formulation: state space, continuous action spaces, reward engineering. PPO for execution (modest improvement over TWAP), SAC for market making. Deep Hedging via pfhedge: no-transaction bands emerge from cost-aware policies. Inverse RL for reward inference from order flow. Key risk: simulation-to-reality gap (non-stationarity, impact reflexivity).

Learning Objectives:

  • Formulate execution, market making, and derivatives hedging problems as partially observed Markov Decision Processes with economically coherent state, action, reward, and constraint design.
  • Match value-based and actor-critic RL methods to financial tasks based on action-space structure, sample-efficiency needs, and stability requirements.
  • Benchmark RL execution policies against TWAP and Almgren-Chriss-style schedules in controlled simulated and crypto-data settings, and interpret apparent gains with appropriate caution.
  • Compare deep hedging results with delta hedging and Whalley-Wilmott-style benchmarks under transaction costs using P&L distributions and tail-risk metrics.
  • Distinguish inverse reinforcement learning from behavior cloning and explain what reward inference can and cannot recover from observed trading behavior.
  • Diagnose the simulation-to-reality risks that govern deployability, including non-stationarity, reward hacking, market impact, partial observability, latency, and benchmark mismatch.
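The TWAP and Almgren-Chriss benchmarks above differ only in the urgency parameter: the standard linear-impact AC trajectory recovers TWAP as risk aversion goes to zero. A sketch under those textbook assumptions (function name and discretization are ours):

```python
import math

def almgren_chriss_schedule(total_shares, n_slices, kappa):
    """Per-slice sell quantities from the Almgren-Chriss remaining-inventory
    curve x(t) = X * sinh(kappa * (T - t)) / sinh(kappa * T).
    kappa -> 0 gives equal slices (TWAP); larger kappa front-loads trading."""
    T = n_slices
    remaining = [total_shares * math.sinh(kappa * (T - t)) / math.sinh(kappa * T)
                 for t in range(T + 1)]
    return [remaining[t] - remaining[t + 1] for t in range(T)]
```

An RL execution policy is then benchmarked against both the kappa=0 and calibrated-kappa schedules before any claimed improvement is taken seriously.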

Ch 22 — RAG for Financial Research Hallucination is unacceptable in finance → RAG as architectural response. Structure-aware parsing (LlamaParse, Docling, Marker) vs naive fixed-size chunking. Domain-specific embeddings (Voyage AI finance, Fin-E5): FinMTEB benchmark shows consistent gap vs general models. Hybrid retrieval: semantic + BM25 via Reciprocal Rank Fusion. Re-ranking with cross-encoders. KG-guided retrieval: +24% correctness, -85% token consumption vs page-window retrieval (FinReflectKG-MultiHop). Retrieve-extract-compute-narrate for numeric questions.

Learning Objectives:

  • Explain why hallucination makes ungrounded LLM use unacceptable in finance and why retrieval-augmented generation is the core architectural response.
  • Design a financial RAG pipeline from document ingestion through retrieval and grounded generation, including structure-aware parsing, chunking, metadata, embeddings, and citation support.
  • Compare generic and domain-specific embedding models and evaluate retrieval quality on a target corpus using practical retrieval metrics and latency trade-offs.
  • Build a retrieval stack that combines semantic search, lexical search, metadata filtering, and re-ranking to improve precision and recall on financial documents.
  • Use constraint-based prompting, citation checks, and tool-verified computation to make generated answers more faithful, auditable, and numerically reliable.
  • Diagnose RAG failures by separating retrieval, context, synthesis, computation, and abstention errors, and apply targeted evaluation methods to improve each component.
  • Distinguish when to use RAG versus fine-tuning for financial applications, and explain how RAG functions as one tool within broader agentic workflows.
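Reciprocal Rank Fusion, used above to merge semantic and BM25 rankings, needs only rank positions, never raw scores. A minimal sketch with hypothetical document ids (k=60 is the conventional default from the original RRF paper):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids: score(d) = sum over lists
    of 1 / (k + rank_of_d); return ids sorted by fused score, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical retrieval results over a filings corpus:
semantic = ["10-K_item7", "earnings_call", "10-K_item1A"]
bm25     = ["10-K_item7", "press_release", "earnings_call"]
fused = reciprocal_rank_fusion([semantic, bm25])
```

Because RRF ignores score scales, it combines cosine similarities and BM25 scores without any normalization step.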

Ch 23 — Knowledge Graphs Graph justified for: multi-hop dependency queries, structural crowding analysis, temporal relationship evolution. Not justified for: single-entity lookups, narrative synthesis, sparse graphs. Five-stage LLM extraction pipeline with governance-first approach. Three-timestamp model (event/disclosure/extraction time) — disclosure time is the PIT visibility gate. GNNs: fraud detection production-ready; alpha generation experimental. Start with hand-crafted graph features.

Learning Objectives:

  • Distinguish financial questions that genuinely require graph structure from those better served by tabular databases.
  • Design a compact, typed, and auditable financial knowledge graph with stable entity identity, finite relationship types, and provenance contracts.
  • Build and validate LLM-assisted extraction pipelines that convert disclosures into replayable graph objects while enforcing governance-first quality controls.
  • Explain how Graph RAG differs from vector retrieval and implement safe relational query workflows using constrained Cypher generation.
  • Transform graph structure into leakage-aware machine learning features, including topology, crowding, concentration, and temporal dynamics.
  • Evaluate explicit knowledge graphs, statistical financial networks, and learned graph representations pragmatically against out-of-sample metrics and transaction costs.
  • Apply a three-timestamp framework and disclosure-time cutoff rules to prevent temporal leakage in graph queries and feature generation.
  • Make sound engineering choices about graph databases, ontology scope, query safety, and schema evolution for production financial workflows.
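The disclosure-time cutoff from the three-timestamp model reduces to a single filter at query time; field and type names in this sketch are illustrative:

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Edge:
    """A graph relationship stamped with the three-timestamp model."""
    src: str
    rel: str
    dst: str
    event_time: date        # when the relationship became true
    disclosure_time: date   # when it became publicly knowable (the PIT gate)
    extraction_time: date   # when the pipeline captured it

def visible_edges(edges, as_of: date):
    """Replay the graph as it was knowable at `as_of`: filter on
    disclosure time, not event time, to avoid lookahead leakage."""
    return [e for e in edges if e.disclosure_time <= as_of]
```

Filtering on `event_time` instead would leak relationships that were true but not yet disclosed, the exact temporal-leakage mode the chapter warns about.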

Ch 24 — Autonomous Agents ReAct (auditable loops) → Tree of Thoughts (parallel hypothesis exploration) → Reflexion (post-run critique). Explicit three-tier memory: working / session / persistent. Tool contracts as primary quality determinant. Context engineering: expose only phase-appropriate tools and PIT-consistent evidence. Warden security pattern: policy proxy with allowlists. Multi-agent forecasting: Neyman extremization + Platt calibration. Scope: read-only research agents (L1 decision support), not order execution.

Learning Objectives:

  • Explain when agentic workflows add value in finance and when conventional statistical or rules-based pipelines remain the better choice.
  • Distinguish the roles of ReAct, Tree of Thoughts, and Reflexion, and choose appropriate reasoning budgets and compositions for evidence-driven financial tasks.
  • Design explicit agent state and memory schemas that support provenance, checkpointing, replay, schema evolution, and post-outcome evaluation.
  • Specify robust tool contracts, structured outputs, source policies, and context-engineering rules for read-only research and forecasting agents.
  • Compare framework styles and define a migration path from notebook prototypes to operational forecasting services without sacrificing visibility and control.
  • Build a single-agent evidence-first research workflow with quality gates, abstention behavior, and replayable artifacts.
  • Design and evaluate multi-agent forecasting pipelines using specialist diversity, aggregation, calibration, baselines, and ablation analysis.
  • Define the operational, statistical, and security controls required to make financial-agent outputs decision-grade, including point-in-time integrity, contamination-aware testing, observability, policy gates, and human approval boundaries.
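A minimal sketch of the Warden-style policy proxy, assuming a hypothetical read-only tool set and dispatcher (the names are ours, not the chapter's API):

```python
# Read-only research tools only; execution tools never appear here.
ALLOWED_TOOLS = {"search_filings", "read_document", "compute_metric"}

def guarded_call(tool_name, args, dispatch):
    """Policy proxy: every agent tool call passes through an allowlist
    before reaching the real dispatcher. Denials return a structured
    record so they can be logged and audited."""
    if tool_name not in ALLOWED_TOOLS:
        return {"status": "denied", "tool": tool_name}
    return {"status": "ok", "result": dispatch[tool_name](**args)}
```

Keeping the allowlist outside the model's context means a prompt-injected tool request is refused by the proxy, not negotiated with the LLM.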

Part 6 — Production

Ch 25 — Live Trading Systems Technical divergence between backtest and live is the primary self-inflicted failure mode. Unified framework: same strategy code in ml4t-backtest and ml4t-live. Brokers and platforms: IBKR (SmartRouting, no PFOF), Alpaca (commission-free REST API), QuantConnect (LEAN engine). Order lifecycle: 11-state machine with 23 valid transitions. Pipeline verification: feed identical inputs through both systems and compare at each stage. Crypto case study: LightGBM classifier deployed to OKX with prediction-flip exits.

Learning Objectives:

  • Explain why technical divergence between research and production is a primary failure mode in live trading, and how a unified framework reduces that risk.
  • Design a dual-mode, event-driven trading architecture in which deterministic strategy logic runs unchanged in backtest, paper, and live execution.
  • Compare broker, exchange, and managed-platform deployment paths and evaluate them in terms of asset coverage, execution quality, operational burden, and control.
  • Model order handling as an explicit state machine that supports partial fills, cancellations, rejections, reconciliation, and idempotent crash recovery.
  • Verify technical parity across the full pipeline, from raw data and features to predictions, sizing decisions, and generated orders.
  • Plan a staged live rollout using pre-flight checks, shadow or paper trading, kill switches, reconciliation procedures, and awareness of venue and jurisdictional constraints.
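An explicit transition table is the core of the order state machine described above; this reduced sketch uses illustrative states and transitions, not the chapter's full 11-state / 23-transition table:

```python
VALID_TRANSITIONS = {
    "NEW":              {"SUBMITTED", "REJECTED"},
    "SUBMITTED":        {"ACKED", "REJECTED"},
    "ACKED":            {"PARTIALLY_FILLED", "FILLED", "CANCELLED"},
    "PARTIALLY_FILLED": {"PARTIALLY_FILLED", "FILLED", "CANCELLED"},
    "FILLED":           set(),   # terminal
    "CANCELLED":        set(),   # terminal
    "REJECTED":         set(),   # terminal
}

class Order:
    def __init__(self):
        self.state = "NEW"

    def transition(self, new_state: str) -> None:
        """Reject any move not in the explicit table, so broker callbacks
        arriving out of order fail loudly instead of corrupting state."""
        if new_state not in VALID_TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
```

The self-loop on `PARTIALLY_FILLED` models repeated partial fills; idempotent crash recovery replays broker events through the same table.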

Ch 26 — MLOps and Governance Technical failure (pipeline divergence) vs statistical failure (model decay) — requires different diagnostics. Three drift types: data drift (PSI/KS), feature drift (SHAP monitoring), concept drift (ADWIN/DDM). Shadow mode evaluation before champion-challenger promotion. Minimum effect size: 0.2-0.3 Sharpe improvement required for promotion. Four-level circuit breakers: trade / strategy / portfolio / system. MLOps stack: Feast (feature store), DVC (data versioning), MLflow (model registry, SR 11-7 compliance).

Learning Objectives:

  • Distinguish technical pipeline divergence from statistical performance decay and choose the corresponding diagnostic and remediation response.
  • Build a live-monitoring framework that combines data-integrity gates, rolling performance metrics, backtest-to-live realization ratios, and execution-quality tracking.
  • Apply drift diagnostics to production artifacts, including PSI, K-S, SHAP-based feature monitoring, and online change-detection algorithms.
  • Design a safe model-update workflow using shadow mode, champion-challenger evaluation, explicit promotion criteria, and tested rollback procedures.
  • Implement multi-level circuit breakers across trade, strategy, portfolio, and system layers, with clear recovery and resume criteria.
  • Evaluate and right-size the supporting MLOps stack, including feature stores, data versioning and lineage, model registries, and observability tooling.
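Of the drift diagnostics above, PSI is the simplest to implement from scratch; a dependency-free sketch, with the usual rule-of-thumb thresholds noted in the docstring:

```python
import math

def population_stability_index(expected, actual, n_bins=10):
    """PSI between a reference (training) sample and a live sample,
    binned on the reference range. Rule of thumb: < 0.10 stable,
    0.10-0.25 watch, > 0.25 investigate."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / n_bins for i in range(n_bins + 1)]
    edges[0], edges[-1] = float("-inf"), float("inf")  # catch out-of-range live values

    def frac(sample, i):
        n = sum(1 for x in sample if edges[i] <= x < edges[i + 1])
        return max(n / len(sample), 1e-6)  # floor avoids log(0)

    return sum((frac(actual, i) - frac(expected, i))
               * math.log(frac(actual, i) / frac(expected, i))
               for i in range(n_bins))
```

Identical distributions give a PSI of zero; a shifted live distribution blows through the 0.25 alert threshold, which is what a data-drift gate keys on.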

Ch 27 — The Systematic Edge Process is the durable edge. Five quant archetypes: researcher, trader, developer, portfolio manager, risk manager. Quantamental roles (systematic + fundamental) as the dominant industry trend. T-shaped expertise. Frontiers: quantum computing (mid-2030s for meaningful advantage), DeFi (live alpha today from on-chain data, AMMs), AI ethics (EU AI Act now a compliance requirement). Burnout as professional risk. Four career failure modes: over-specialization, underestimating soft skills, ignoring regulation, perpetual learning without application.


Cross-Dataset Key Numbers

| Metric | Value | Source |
| --- | --- | --- |
| Gross Sharpe (NASDAQ-100 intraday) | +1.76 | Ch 16 |
| Net Sharpe (NASDAQ-100 intraday) | -62.61 | Ch 16 |
| Median holdout decay | ~50% | Ch 20 |
| GBM wins (downstream Sharpe) | 6/9 case studies | Ch 20 |
| DSR adjustments: materially change conclusions | several candidates | Ch 16 |
| US firm char: validation Sharpe | +3.03 | Ch 20 |
| US firm char: holdout Sharpe | +2.52 | Ch 20 |
| FX: only study where holdout > validation | — | Ch 20 |