ML4T Book 3rd Edition
🇹🇭 Thai
Machine Learning for Algorithmic Trading (3rd Edition) by Stefan Jansen — the third edition of the book, due June 2026, with 27 chapters in 6 parts covering everything from data foundations to production deployment.
Adds five entirely new chapters (Ch 16 Strategy Simulation, Ch 17 Portfolio Construction, Ch 22–24 RAG/KG/Agents) and an entirely new Part 6 (Production).
6-Part Structure
| Part | Theme | Chapters |
|---|---|---|
| 1 — Foundation | Data & Strategy Setup | Ch 1–6 |
| 2 — Features | Feature Engineering | Ch 7–10 |
| 3 — Models | ML Pipeline & Synthesis | Ch 11–15 |
| 4 — Strategy | Backtest to Execution | Ch 16–20 |
| 5 — Advanced AI | RL, RAG & Agents | Ch 21–24 |
| 6 — Production | Deploy & Monitor | Ch 25–27 |
Part 1 — Foundation (Ch 1–6)
| Ch | Title | Highlights |
|---|---|---|
| 1 | The Process Is Your Edge | 2-layer ML4T workflow; regime detection with GMM (Risk-On vs Risk-Off volatility ratio = 1.3x); defines the evidence boundary |
| 2 | The Financial Data Universe | 8 asset classes; bitemporal PIT; storage: Parquet 3.4x compression, Polars ASOF 3.8x faster |
| 3 | Market Microstructure | LOB reconstruction (NASDAQ ITCH 423M msg/day, 97.6% cancel rate); dollar bars perform best (JB=84.7 vs 3,838 for time bars) |
| 4 | Fundamental & Alternative Data | bitemporal SEC EDGAR pipeline; 3-stage entity resolution; published factors lose ~58% of performance post-publication |
| 5 | Synthetic Financial Data | TimeGAN TSTR=1.70; Tail-GAN VaR error 102%→11.5%; Diffusion-TS KS=0.06; GReaT/distilgpt2 AUC=0.84 |
| 6 | Strategy Research Framework | 3-layer metrics; 5 leakage types; walk-forward CV; baseline checkpoint; run logging + DSR |
Part 2 — Features (Ch 7–10)
| Ch | Title | Highlights |
|---|---|---|
| 7 | Defining the Learning Task | label engineering; feature-label evaluation fold-by-fold; search accounting; correlation→causality |
| 8 | Financial Feature Engineering | 3 filters (horizon/driver/role); price-derived, cross-instrument, contextual families; SPY-TLT regime conditioning: 17pp IC swing |
| 9 | Model-Based Feature Extraction | diagnostics, spectral, volatility, uncertainty, regime, cross-sectional features; walk-forward fitting |
| 10 | Text Feature Engineering | lexical→static embeddings→sequential→Transformers; financial NLP workflow; PIT-safe text features |
Part 3 — Models (Ch 11–15)
| Ch | Title | Highlights |
|---|---|---|
| 11 | The ML Pipeline | Ridge/LASSO/Elastic Net; Ridge 1.5x ICIR vs OLS; conformal prediction (CQR+ACI 88.1%); SHAP diagnostics |
| 12 | Advanced Models for Tabular Data | XGBoost, LightGBM, CatBoost; GBM beats linear in 7–8/9 case studies; TabM competitive; TreeSHAP |
| 13 | Deep Learning for Time Series | N-BEATS, PatchTST, iTransformer, TFT; DL rarely beats GBM baseline; linear model beats Transformer (Zeng 2022) |
| 14 | Latent Factor Models | PCA, IPCA, RP-PCA, CAE, adversarial SDF; factor zoo problem (400+ factors, 65% fail replication); CAE IC +0.073 |
| 15 | Causal Machine Learning | DML; BSTS event impact; PCMCI/NOTEARS causal discovery; predictive vs causal signal |
Part 4 — Strategy (Ch 16–20)
| Ch | Title | Highlights |
|---|---|---|
| 16 | Strategy Simulation | backtest = falsification; 6 failure modes; DSR; IC champion ≠ Sharpe champion; cadence mediates IC→Sharpe |
| 17 | Portfolio Construction | equal-weight hard to beat (DeMiguel 2009); Kelly criterion; HRP (no matrix inversion); no universal allocator winner |
| 18 | Transaction Costs | cost taxonomy; square-root impact model; Almgren-Chriss optimal execution; TCA feedback loop; alpha-to-go |
| 19 | Risk Management | VaR/CVaR; drawdown path risk; factor decomposition; stress testing; GARCH/EWMA adaptive controls; kill switches |
| 20 | Strategy Synthesis | 9 case study verdicts; NASDAQ-100 IC=0.008 but Sharpe=4.22; GBM wins 6/9; median holdout decay ~50% |
Part 5 — Advanced AI (Ch 21–24)
| Ch | Title | Highlights |
|---|---|---|
| 21 | Reinforcement Learning | MDP formulation; DQN→PPO→SAC; optimal execution; market making; deep hedging (pfhedge); IRL |
| 22 | RAG for Financial Research | hallucination → RAG solution; structure-aware parsing; hybrid retrieval + BM25; KG-guided +24% accuracy -85% tokens |
| 23 | Knowledge Graphs | graphs justified for multi-hop queries; LLM extraction pipeline; Graph RAG; institutional crowding features |
| 24 | Autonomous Agents | ReAct/ToT/Reflexion; explicit state + memory schema; tool contracts; multi-agent forecasting; Warden security pattern |
Part 6 — Production (Ch 25–27)
| Ch | Title | Highlights |
|---|---|---|
| 25 | Live Trading Systems | unified backtest↔live framework; IBKR/Alpaca/QuantConnect; order state machine 11 states; pipeline verification |
| 26 | MLOps & Governance | technical vs statistical failure distinction; PSI/KS/SHAP drift; shadow mode; circuit breakers; MLflow/DVC/Feast |
| 27 | The Systematic Edge | process = durable edge; quant career archetypes (T-shaped); quantum/DeFi/AI ethics frontiers; learning system design |
Key Numbers from the Book
- 9 case studies: ETF, US Equities, NASDAQ-100, CME Futures, S&P500 Options, Crypto Perps, FX, Commodities, Firm Characteristics
- GBM wins 6/9 case studies downstream (Sharpe); linear (Ridge) wins in assets with highly correlated features
- Median holdout decay ~50% across strategies
- Backtest Sharpe: gross 1.76 → net -62.61 (NASDAQ-100 intraday case study) — cost assumptions matter enormously
Related
- ML4T Platform — the platform this book is part of
- ML4T Book 2nd Edition — the 2020 edition (23 chapters)
- Algorithmic Trading — domain concept
- ML4T Trading Approaches — analysis of trading approaches from ML4T
🇬🇧 English
Machine Learning for Algorithmic Trading (3rd Edition) by Stefan Jansen — the third edition, due June 2026, covering 27 chapters across 6 parts from data foundations to live production deployment.
Adds five entirely new chapters (Strategy Simulation Ch 16, Portfolio Construction Ch 17, RAG Ch 22, Knowledge Graphs Ch 23, Autonomous Agents Ch 24) and an entirely new Part 6 (Production).
6-Part Structure
| Part | Theme | Chapters | Notebooks |
|---|---|---|---|
| 1 — Foundation | Data & Strategy Setup | Ch 1–6 | ~59 |
| 2 — Features | Feature Engineering | Ch 7–10 | ~42 |
| 3 — Models | ML Pipeline & Synthesis | Ch 11–15 | ~88 |
| 4 — Strategy | Backtest to Execution | Ch 16–20 | ~60 |
| 5 — Advanced AI | RL, RAG & Agents | Ch 21–24 | ~36 |
| 6 — Production | Deploy & Monitor | Ch 25–27 | ~31 |
Part 1 — Foundation
Ch 1 — The Process Is Your Edge The ML4T workflow as a 2-layer system: a stable data infrastructure plus an iterative research loop. Evidence boundary separates what can be tested from what must be assumed. Regime detection using GMM on AQR factor data produces a 1.3x volatility ratio between Risk-On and Risk-Off regimes. Causal inference and GenAI are integrated into the workflow as augmentation tools, not replacements for statistical rigor.
Learning Objectives:
- Distinguish structural breaks, regimes, data drift, concept drift, and online detection, and explain why static trading models degrade in changing markets.
- Explain the ML4T Workflow as a research-to-production system, including its data infrastructure foundation, scoping invariants, iterative research modules, and feedback loops from live trading back to research.
- Define the evidence boundary between exploration and confirmation, and explain how trial logging, sealed holdouts, and selection-aware evaluation preserve research integrity.
- Describe how causal inference and generative AI fit within a disciplined trading workflow, including the main benefits they provide and the new failure modes they introduce.
- Apply regime thinking, implementability checks, and monitoring logic to diagnose strategy vulnerabilities and to adapt workflow discipline across independent and institutional settings.
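The regime-detection idea can be sketched with a two-component GMM; the synthetic returns, the (return, |return|) feature pair, and the component count below are illustrative, not the book's AQR-factor pipeline.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)
# Synthetic daily returns: a calm stretch followed by a stressed stretch
calm = rng.normal(0.0005, 0.008, 750)
stressed = rng.normal(-0.001, 0.020, 250)
returns = np.concatenate([calm, stressed])

# Fit a 2-component GMM on (return, |return|) and label each day's regime
X = np.column_stack([returns, np.abs(returns)])
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gmm.predict(X)

# Volatility ratio between the high- and low-volatility regimes
vols = sorted(returns[labels == k].std() for k in range(2))
ratio = vols[1] / vols[0]
print(f"regime volatility ratio: {ratio:.1f}x")
```

On real factor returns the same pattern applies, but the model would be fit on a training window only and regimes assigned out of sample to keep the feature point-in-time safe.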
Ch 2 — The Financial Data Universe Eight asset classes (equities, ETFs, fixed income, commodities, FX, crypto, options, derivatives). PIT correctness and bitemporal storage as core data engineering constraints. Storage benchmarks: Parquet achieves 3.4x compression vs CSV; DuckDB excels for SQL analytics; Polars ASOF joins run 3.8x faster than pandas.
Learning Objectives:
- Distinguish among market, fundamental, and alternative data, and explain how dataset definitions shape what each source means in research and trading applications.
- Compare the observability, conventions, and engineering constraints of major asset classes, and identify how market structure changes what can be measured and modeled.
- Apply a financial data quality framework to diagnose common failure modes, especially point-in-time violations, survivorship bias, corporate action errors, and identifier mismatches.
- Conduct vendor due diligence across data quality, legal and compliance, and technical and commercial dimensions.
- Choose storage and query architectures that fit research and production needs, including when to use partitioned files, embedded analytical databases, or server-based systems.
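The as-of join pattern behind PIT-safe alignment can be sketched with pandas `merge_asof` (the book benchmarks the faster Polars equivalent); the quotes and trades below are toy data.

```python
import pandas as pd

# Each trade picks up the most recent quote known at or before its timestamp,
# never a later one: the point-in-time-correct direction.
quotes = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-02 09:30:00",
                          "2024-01-02 09:30:05",
                          "2024-01-02 09:30:12"]),
    "bid": [99.9, 100.0, 100.1],
})
trades = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-02 09:30:03",
                          "2024-01-02 09:30:07",
                          "2024-01-02 09:30:15"]),
    "price": [100.0, 100.05, 100.2],
})
merged = pd.merge_asof(trades, quotes, on="ts", direction="backward")
print(merged)
```

Both frames must be sorted on the join key; `direction="backward"` is what makes the lookup lookahead-free.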
Ch 3 — Market Microstructure LOB reconstruction from NASDAQ TotalView-ITCH (423M messages/day, 97.6% cancellation rate, 41% within 500ms). Bar sampling comparison: dollar bars achieve JB=84.7 vs 3,838 for time bars on NVDA — dollar bars are the recommended default for ML workflows. Lee-Ready trade classification: 96% accuracy vs 84% for tick test alone.
Learning Objectives:
- Explain how liquidity, order types, market design, and intraday trading regimes shape observed market data and execution quality.
- Distinguish among major market data products, including L1, L2, L3, TAQ, and enriched bar datasets, and choose data that matches a research or trading objective.
- Parse message-based exchange data and reconstruct a venue-local limit order book while enforcing core lifecycle and accounting invariants.
- Interpret key order-book measures and empirical microstructure patterns, while recognizing the limits of visible single-venue data.
- Build and compare time-, activity-, and information-driven bars, including when trade-direction classification and Lee-Ready alignment are required.
- Apply intraday data-quality and sessionization checks that prevent sequencing, timestamp, and calendar errors from contaminating downstream analysis.
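A minimal dollar-bar sampler, assuming trade-level prices and volumes; the threshold is arbitrary here, whereas in practice it would be calibrated to the instrument's typical daily dollar volume.

```python
def dollar_bars(prices, volumes, threshold):
    """Group sequential trades into bars of roughly `threshold` dollars traded."""
    bars, acc, start = [], 0.0, 0
    for i, (p, v) in enumerate(zip(prices, volumes)):
        acc += p * v
        if acc >= threshold:
            chunk = prices[start:i + 1]
            bars.append({"open": chunk[0], "high": max(chunk),
                         "low": min(chunk), "close": chunk[-1]})
            acc, start = 0.0, i + 1
    return bars

prices = [100, 101, 99, 102, 103, 101, 100, 104]
volumes = [10, 20, 15, 5, 30, 10, 25, 5]
bars = dollar_bars(prices, volumes, threshold=3000)
print(f"{len(bars)} bars, first: {bars[0]}")
```

Because each bar closes after a fixed amount of value has traded, busy periods produce more bars and quiet periods fewer, which is why dollar-bar returns look closer to normal than time-bar returns.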
Ch 4 — Fundamental and Alternative Data Bitemporal pipeline from SEC EDGAR for point-in-time correctness. Three-stage entity resolution: deterministic (LEI/CIK/FIGI) → probabilistic (string similarity) → embedding-based. Published return predictors lose ~58% of performance post-publication (McLean & Pontiff 2016). SEC 10-K NLP pipeline: MD&A (Item 7) + Risk Factors (Item 1A).
Learning Objectives:
- Explain why point-in-time correctness and entity consistency are the core engineering constraints for fundamental and alternative data.
- Implement bitemporal storage and as-of query patterns for revision-prone financial datasets.
- Build a point-in-time corporate fundamentals pipeline from SEC EDGAR and XBRL filing histories.
- Design time-valid entity, security, and contract mapping workflows using deterministic, probabilistic, and embedding-based resolution methods with appropriate QA gates.
- Apply point-in-time alignment rules to macro, commodity, and on-chain datasets, including release timestamps, vintages, contract mapping, and finality policies.
- Evaluate alternative datasets for incremental signal, data quality, legal and compliance risk, and commercial or engineering feasibility.
- Extract, clean, and store SEC filing text as an auditable point-in-time corpus for downstream NLP feature engineering.
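The bitemporal as-of pattern can be sketched in pandas: each fact carries both the period it describes and the date it became known, and queries filter on knowledge time. The ticker, dates, and restated EPS values are invented for illustration.

```python
import pandas as pd

# Bitemporal fundamentals: each row carries the fiscal period it describes
# (period_end) and the date the figure became known (known_at, e.g. filing date)
facts = pd.DataFrame({
    "ticker": ["XYZ"] * 3,
    "period_end": pd.to_datetime(["2024-03-31"] * 3),
    "known_at": pd.to_datetime(["2024-05-01", "2024-08-15", "2025-02-01"]),
    "eps": [1.10, 1.05, 1.07],  # original figure, then two restatements
})

def as_of(df, asof_date):
    """Latest value per (ticker, period_end) that was known by asof_date."""
    visible = df[df["known_at"] <= pd.Timestamp(asof_date)]
    return (visible.sort_values("known_at")
                   .groupby(["ticker", "period_end"], as_index=False).last())

print(as_of(facts, "2024-06-01")["eps"].iloc[0])  # original figure
print(as_of(facts, "2024-09-01")["eps"].iloc[0])  # first restatement
```

A backtest running "as of" mid-2024 sees 1.10, not the restated figures, which is exactly the property a PIT-correct fundamentals pipeline must guarantee.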
Ch 5 — Synthetic Financial Data Classical baselines (bootstrap, GBM, GARCH) as benchmarks. GAN variants: TimeGAN TSTR ratio 1.70; Tail-GAN VaR error 102%→11.5%; Sig-CWGAN TSTR 0.97. Diffusion-TS: KS statistic 0.06, TSTR 1.00, 2.6x volatility ratio between regimes. LLM tabular generation: GReaT/distilgpt2, TSTR AUC-ROC 0.84.
Learning Objectives:
- Explain why trading research is path-limited and how adaptive search and multiple testing can inflate apparent backtest performance.
- Use classical simulation baselines, including bootstrap and stochastic volatility models, as interpretable benchmarks for synthetic data generation.
- Select a synthetic-data approach that matches the data structure and downstream objective, including learned generators for time series and tabular financial data.
- Diagnose generated data using stylized-fact, dependence, and task-based evaluation methods, including Train-Synthetic-Test-Real comparisons.
- Assess privacy and generator-specific risks, including leakage, bias amplification, overfitting to the generator, and limited scenario novelty.
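Train-Synthetic-Test-Real can be sketched with scikit-learn; here a second draw from the same toy process stands in for a generator's output, so the ratio should land near 1 (a real generator would be plugged in at that point).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_data(n):
    X = rng.normal(size=(n, 3))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, y

X_real, y_real = make_data(500)    # real training data
X_synth, y_synth = make_data(500)  # stand-in for a generator's output
X_test, y_test = make_data(500)    # held-out real data

# TSTR: train on synthetic, test on real; compare against train-real/test-real
acc_trtr = accuracy_score(y_test, LogisticRegression().fit(X_real, y_real).predict(X_test))
acc_tstr = accuracy_score(y_test, LogisticRegression().fit(X_synth, y_synth).predict(X_test))
print(f"TSTR ratio: {acc_tstr / acc_trtr:.2f}")
```

A ratio near 1 means the synthetic data preserves the task-relevant signal; ratios well below 1 indicate the generator lost it, and ratios well above 1 are a leakage warning sign.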
Ch 6 — Strategy Research Framework Three-layer metric framework: model diagnostics / signal diagnostics / strategy outcomes. Five forms of data leakage. Walk-forward CV with temporal buffers. Baseline checkpoint (timing, coverage, trading-intensity sanity). Four-level trial taxonomy for run logging. Deflated Sharpe Ratio (DSR) as search-aware inference.
Learning Objectives:
- Place a strategy idea on the strategy map by linking it to a strategy family, a plausible source of edge, and the dominant feasibility constraints and failure modes.
- Define a versioned trading setup in decision-time terms: what is tradable, when decisions are made, what information is admissible, how scores become positions, and which constraints and costs are treated as material.
- Define “better” economically and keep model diagnostics, signal diagnostics, and strategy outcomes in distinct roles during research and evaluation.
- Design a time-series evaluation protocol that preserves chronology, prevents overlap leakage, and separates model selection from final performance estimation.
- Establish a narrow baseline checkpoint with timing, coverage, and trading-intensity sanity checks before expanding the search space.
- Keep search auditable, reproducible, and countable using a simple trial taxonomy and automatic run logging.
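The walk-forward protocol with a temporal buffer can be sketched in a few lines; the window sizes here are arbitrary, and real usage would index timestamps rather than integers.

```python
def walk_forward_splits(n, train_size, test_size, buffer):
    """Return (train, test) index blocks with an embargo gap between them."""
    splits, start = [], 0
    while start + train_size + buffer + test_size <= n:
        train = list(range(start, start + train_size))
        test_start = start + train_size + buffer
        splits.append((train, list(range(test_start, test_start + test_size))))
        start += test_size
    return splits

splits = walk_forward_splits(n=20, train_size=8, test_size=4, buffer=2)
for train, test in splits:
    print(f"train {train[0]}-{train[-1]} -> test {test[0]}-{test[-1]}")
```

The buffer drops observations between train and test so that overlapping labels (e.g. multi-day forward returns) cannot leak across the boundary.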
Part 2 — Features
Ch 7 — Defining the Learning Task Label engineering: fixed-horizon vs event-style constructions, overlap diagnosis, break-even cost checks. Feature-label evaluation fold by fold. Search accounting and multiple-testing adjustments. Mechanism plausibility to distinguish stable signal from confounded proxies.
Learning Objectives:
- Build split-aware preprocessing pipelines that produce stable, auditable inputs for label and feature computation.
- Define execution-consistent labels, including fixed-horizon and event-style constructions, and diagnose overlap, resolution behavior, and implied trading intensity.
- Evaluate feature-label bundles fold by fold using appropriate diagnostics for continuous and discrete targets, including stability, shape, and feasibility.
- Screen candidates for implementation feasibility using turnover, break-even cost, and liquidity or capacity checks.
- Account for search bias by defining searched sets, separating exploration from confirmation, and applying appropriate multiple-testing adjustments to fold-level summaries.
- Use mechanism plausibility checks to distinguish potentially stable signal channels from confounded proxies, timing artifacts, and aggregation effects.
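A fixed-horizon labeler with a break-even cost filter, as a sketch (the prices and cost threshold are illustrative): forward returns that fail to clear round-trip costs are labeled flat rather than forced into a direction.

```python
import numpy as np

def fixed_horizon_labels(prices, horizon, cost_bps):
    """Label forward returns +1/-1 only when they clear round-trip costs, else 0."""
    prices = np.asarray(prices, float)
    fwd = prices[horizon:] / prices[:-horizon] - 1.0
    threshold = 2 * cost_bps / 1e4  # round-trip cost as a decimal return
    labels = np.zeros(len(fwd), dtype=int)
    labels[fwd > threshold] = 1
    labels[fwd < -threshold] = -1
    return labels

prices = [100, 100.3, 99.7, 101.0, 100.9, 100.1]
labels = fixed_horizon_labels(prices, horizon=2, cost_bps=10)
print(labels)
```

With a horizon longer than the sampling interval, adjacent labels overlap; that is the overlap problem the fold-by-fold evaluation and embargo buffers are designed to handle.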
Ch 8 — Financial Feature Engineering Three filters: horizon alignment, driver hypothesis (persistence/reversion/risk compensation/predictable-clock), role separation (signal vs state variable). Price-derived families: trend/momentum, reversal, volatility (Parkinson/Garman-Klass/Yang-Zhang, 5-14x efficiency gain), liquidity, microstructure. Cross-instrument: SPY-TLT correlation conditioning momentum IC with 17-percentage-point swing across regimes. Contextual: fundamentals, calendar (sin/cos), macro state. Degrees-of-freedom discipline: one knob at a time.
Learning Objectives:
- Translate a trading hypothesis into a documented feature specification using horizon alignment, driver hypothesis, and role separation.
- Choose a feature’s reference frame, representation, and aggregation to match the economic claim and execution horizon, and distinguish hypothesis-changing choices from noise-control choices.
- Distinguish signal features from state variables and identify when each should be used marginally, as an interaction, or as a conditioning variable.
- Design representative feature specifications across price-derived, structural and cross-instrument, and contextual data families, with explicit timing assumptions and failure modes.
- Combine signals with state variables using gating, scaling, and conditional variants, and evaluate whether the interaction adds incremental information.
- Apply point-in-time discipline to slow-moving and revised data, including reporting lags, event timing, and vintage-aware availability rules.
- Control feature-search degrees of freedom using one-knob-at-a-time exploration, within-family deduplication, and multiple-testing-aware triage.
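The Parkinson estimator from the volatility family follows directly from its definition, sigma^2 = mean(ln(H/L)^2) / (4 ln 2); the OHLC values below are invented.

```python
import numpy as np

def parkinson_vol(high, low):
    """Parkinson range-based volatility estimator (per-bar; annualize separately)."""
    hl = np.log(np.asarray(high, float) / np.asarray(low, float)) ** 2
    return np.sqrt(hl.mean() / (4 * np.log(2)))

# Range estimators use intrabar highs/lows, extracting more information per bar
# than close-to-close returns (the source of the quoted efficiency gain)
high = [101.2, 102.0, 100.8, 101.5]
low = [99.5, 100.1, 99.2, 100.0]
vol_pk = parkinson_vol(high, low)
print(f"Parkinson vol per bar: {vol_pk:.4f}")
```

Garman-Klass and Yang-Zhang follow the same recipe with extra open/close terms to handle drift and overnight gaps.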
Ch 9 — Model-Based Feature Extraction Model-based features extracted from fitted procedures rather than raw price series. Families: diagnostics/stationarity, spectral/signal transforms, volatility (GARCH), uncertainty, regime (HMM), cross-sectional/panel. Key discipline: all fitting must happen within training windows (walk-forward) to preserve PIT correctness.
Learning Objectives:
- Distinguish direct features from model-based features and judge when a fitted procedure adds useful information beyond raw series.
- Use fitted procedures to extract forecasts, filtered states, residuals, conditional volatility, regime probabilities, and cross-sectional rankings.
- Design a compact, interpretable set of model-based features from diagnostics, signal transforms, volatility models, uncertainty, and regime families.
- Enforce point-in-time correctness by fitting and selecting models within training windows, using filtered rather than smoothed outputs.
- Transform asset-level temporal outputs into cross-sectional, benchmark-adjusted, pairwise, and universe-level features.
- Distinguish between exploratory time-series methods that are useful for research diagnosis and deployable features that meet PIT requirements.
- Use uncertainty and regime outputs primarily as conditioning features, and recognize when they should not be treated as direct signals.
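The filtered-not-smoothed rule can be illustrated with an EWMA conditional-volatility feature: each value depends only on past returns, so it is PIT-safe by construction. The regime-shift data below is synthetic.

```python
import numpy as np

def ewma_vol_feature(returns, lam=0.94):
    """Filtered EWMA conditional volatility: each value uses only past returns."""
    returns = np.asarray(returns, float)
    var = np.empty_like(returns)
    var[0] = returns[0] ** 2
    for t in range(1, len(returns)):
        var[t] = lam * var[t - 1] + (1 - lam) * returns[t - 1] ** 2
    return np.sqrt(var)

rng = np.random.default_rng(1)
# Synthetic regime shift: volatility triples halfway through the sample
rets = np.concatenate([rng.normal(0, 0.01, 100), rng.normal(0, 0.03, 100)])
vol = ewma_vol_feature(rets)
print(f"mean filtered vol: {vol[:100].mean():.4f} (calm) vs {vol[100:].mean():.4f} (stressed)")
```

A smoothed estimate would use the full sample and look back "through" the regime break, which is exactly the leakage the chapter warns against.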
Ch 10 — Text Feature Engineering Evolution: lexical/TF-IDF → Word2Vec/GloVe static embeddings → LSTM/GRU sequential → Transformer contextual embeddings. Self-attention resolves polysemy and long-range dependence. Modern workflow: pre-trained checkpoint → domain adaptation → task fine-tuning. PIT-safe timestamps using model cutoffs and aggregation rules.
Learning Objectives:
- Distinguish lexical features, static embeddings, sequential models, and Transformers in terms of the information each representation preserves and loses.
- Explain how Transformer self-attention produces contextual embeddings and why this resolves key limitations of earlier NLP methods, including polysemy and long-range dependence.
- Apply a practical financial NLP workflow that combines pre-trained checkpoints, domain adaptation when needed, and task fine-tuning for classification or extraction tasks.
- Design text-derived features such as sentiment, narrative surprise, or structured event signals using point-in-time-safe timestamps, model cutoffs, and aggregation rules.
- Evaluate text-derived signals using horizon-aware diagnostics, coverage-aware analysis, and event-time alignment rather than benchmark accuracy alone.
- Use token-level attribution and related diagnostics to audit, debug, and stress-test NLP features before deployment.
Part 3 — Models
Ch 11 — The ML Pipeline Ridge (L2), LASSO (L1), Elastic Net as principled regularization for high-dimensional, correlated financial features. Ridge achieves 1.5x ICIR improvement over OLS at optimal regularization on ETF case study. Conformal prediction: CQR+ACI progressively closes conditional coverage gap during high-volatility periods (82.3%→88.1% for 90% target). SHAP four-layer protocol: sign consistency, magnitude plausibility, stability, regime-conditional analysis.
Learning Objectives:
- Choose between regression and classification formulations based on how predictions will be translated into trading decisions.
- Fit leakage-safe regularized linear models, including Ridge, LASSO, Elastic Net, and logistic regression, using point-in-time preprocessing and standardization.
- Tune and evaluate linear models with walk-forward validation, temporal buffers, and nested cross-validation to reduce selection bias.
- Interpret model behavior with SHAP-based diagnostics to assess feature importance, economic plausibility, and stability across refits.
- Construct and evaluate conformal prediction intervals or prediction sets, and monitor where coverage degrades under non-stationary market conditions.
- Use cross-case-study evidence to judge when linear models provide a strong baseline and when weak linear signal motivates more flexible models.
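The case for Ridge on correlated financial features can be sketched on synthetic data (the feature structure and alpha are illustrative): shrinkage stabilizes coefficients that OLS lets blow up on a near-collinear design.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(7)
n, p = 300, 20
# Highly correlated features: one common driver plus small feature-specific noise
common = rng.normal(size=(n, 1))
X = common + 0.1 * rng.normal(size=(n, p))
y = common[:, 0] + rng.normal(scale=2.0, size=n)  # weak, noisy signal

X_tr, X_te, y_tr, y_te = X[:200], X[200:], y[:200], y[200:]
ols = LinearRegression().fit(X_tr, y_tr)
ridge = Ridge(alpha=10.0).fit(X_tr, y_tr)
print(f"coef norm: OLS {np.linalg.norm(ols.coef_):.2f}, Ridge {np.linalg.norm(ridge.coef_):.2f}")
print(f"test R^2: OLS {ols.score(X_te, y_te):.3f}, Ridge {ridge.score(X_te, y_te):.3f}")
```

In a real pipeline, alpha would be tuned with walk-forward validation rather than fixed, and features standardized point-in-time.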
Ch 12 — Advanced Models for Tabular Data XGBoost (regularized objective, second-order approximation), LightGBM (GOSS, leaf-wise growth), CatBoost (ordered target statistics). GBMs beat linear baselines in 7-8/9 primary-label comparisons. TabM (rank-1 adapters) beats GBM on several case studies. Optuna TPE with pruning can halve computation. TreeSHAP interaction decomposition: momentum regime-conditional (collapses above 90th-percentile volatility).
Learning Objectives:
- Explain how boosting differs from bagging and why sequential error correction makes GBMs effective for financial tabular data.
- Select among XGBoost, LightGBM, and CatBoost based on categorical structure, compute environment, latency needs, and dataset size.
- Choose appropriate GBM objectives and constraints for financial tasks, including pointwise regression, learning to rank, and monotonic constraints.
- Tune GBMs efficiently with Optuna using pruning, multi-objective search, and time-series-aware validation.
- Use TreeSHAP to analyze feature effects, interactions, instability, and drift in deployed tree-based models.
- Evaluate when tabular deep learning alternatives such as TabPFN, TabM, and TabR are worth considering relative to GBMs.
- Interpret cross-case-study evidence to decide when nonlinear tree models earn their added complexity relative to linear baselines.
Ch 13 — Deep Learning for Time Series LSTM/GRU limitations: sequential bottleneck, gradient degradation. N-BEATS: basis expansion for trend+seasonality. Critical finding (Zeng 2022): linear models outperform Transformers 20-50% across LTSF benchmarks — Transformers largely ignore temporal order. PatchTST, iTransformer, TFT as post-critique architectures. Foundation models: TSFMs underperform tree-based on return prediction but show promise for volatility/VaR. Cross-dataset verdict: DL rarely outperforms strong tabular baselines; crypto perps is clearest DL win.
Learning Objectives:
- Explain why recurrent sequence models became a computational and optimization bottleneck for long-context forecasting tasks.
- Compare the main temporal modeling philosophies — decomposition-based, attention-based, state-space, and strong linear baselines — and explain when each is most appropriate.
- Use strong baselines and diagnostics, including linear models and walk-forward evaluation, to judge whether sequence-model complexity is warranted.
- Distinguish the design logic of modern time-series Transformer variants, including PatchTST, iTransformer, and TFT, and relate those choices to multivariate structure, covariates, and forecast horizon.
- Decide when a financial prediction problem should be framed as direct panel regression with sequential inputs rather than multi-step time-series forecasting.
- Evaluate time-series foundation model adaptation modes for financial applications, including the implications of transfer mismatch and pretraining contamination.
- Apply practical uncertainty estimation methods, including MC Dropout and deep ensembles, to support risk-aware trading decisions.
Ch 14 — Latent Factor Models Factor zoo problem: 400+ published factors, 65% failed replication (Hou, Xue, Zhang). PCA → IPCA (time-varying characteristic betas) → RP-PCA (pricing-error penalties) → CAE (nonlinear beta mapping) → adversarial SDF (no-arbitrage minimax). Yield curve: 3 PCA factors explain 95-99% variance. Equity latent factors: best IC ~0.073-0.074 but t-stat below Harvey-Liu-Zhu t>3.0 threshold.
Learning Objectives:
- Distinguish covariance-explaining attribution factors from priced factors, and explain why that distinction matters for prediction, risk decomposition, and trading applications.
- Implement PCA on asset returns, interpret principal components as latent risk dimensions or eigenportfolios, and diagnose key practical issues including covariance noise, component selection, and loading instability.
- Explain how IPCA and RP-PCA extend PCA by introducing time-varying characteristic-based betas and pricing-error penalties, and evaluate when these extensions are preferable to plain variance maximization.
- Implement and evaluate Conditional Autoencoders using walk-forward validation, ensemble averaging, and interpretability diagnostics such as SHAP, while recognizing their main failure modes.
- Explain how adversarial SDF estimation enforces no-arbitrage restrictions, how its objective differs from CAE reconstruction, and when direct pricing-error minimization is likely to add value.
- Compare latent factor methods across datasets and modeling objectives, and choose among PCA, IPCA, RP-PCA, CAE, and SDF approaches based on dimensionality, economic goal, and evaluation design.
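The latent-factor starting point, PCA on a return panel, can be sketched on synthetic data with a known 3-factor structure; the panel dimensions and noise level are arbitrary.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
n_days, n_assets = 500, 10
# Synthetic return panel driven by 3 latent factors plus idiosyncratic noise
F = rng.normal(size=(n_days, 3))    # factor realizations
B = rng.normal(size=(3, n_assets))  # factor loadings
R = F @ B + 0.3 * rng.normal(size=(n_days, n_assets))

pca = PCA().fit(R)
explained = pca.explained_variance_ratio_[:3].sum()
print(f"variance explained by first 3 PCs: {explained:.1%}")
```

This recovers the covariance-explaining factors; IPCA, RP-PCA, CAE, and the adversarial SDF each modify the objective because explaining variance is not the same as pricing returns.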
Ch 15 — Causal Machine Learning DAGs for causal question formulation. Double Machine Learning (DML) for continuous treatment effect estimation with high-dimensional confounders. Bayesian Structural Time-Series (BSTS) for event impact via counterfactual baselines. Causal discovery: PCMCI, NOTEARS, VAR-LiNGAM. Distinguishing predictive signal from causal effect is a stability predictor.
Learning Objectives:
- Define a causal research question in terms of treatment, outcome, estimand, and counterfactual, and use DAGs to encode assumptions and identify confounders.
- Apply validation and refutation tools, including placebo tests, sensitivity analysis, and subset-stability checks, to assess credibility of causal estimates.
- Use Double Machine Learning (DML) to estimate causal effects of continuous treatments in the presence of high-dimensional confounders.
- Use Bayesian Structural Time-Series (BSTS) to estimate the impact of discrete events by constructing data-driven counterfactual baselines.
- Use causal discovery methods such as PCMCI, NOTEARS, and VAR-LiNGAM to generate candidate structures and interpret their limitations.
- Distinguish predictive signal from causal effect, and interpret cross-dataset evidence with attention to confounding and stability.
Part 4 — Strategy
Ch 16 — Strategy Simulation Backtest as falsification, not verification. Six failure modes: lookahead, survivorship, data snooping, unrealistic execution, cost underestimation, regime fragility. Non-ML baseline Sharpe 0.76 fails to beat 60/40. DSR, White’s Reality Check, Rademacher Anti-Serum (RAS). Key cross-dataset finding: IC champion ≠ Sharpe champion in most case studies; rebalancing cadence mediates IC-to-Sharpe translation more than model choice.
Learning Objectives:
- Formalize a backtest as an explicit trading protocol covering signal timing, execution, rebalancing, sizing, costs, constraints, data availability, and benchmark choice.
- Distinguish vectorized and event-driven backtesting in terms of protocol semantics, state dependence, and appropriate use cases rather than treating one style as universally superior.
- Build and interpret a transparent non-ML baseline strategy that provides a stable reference point for later model comparisons.
- Evaluate a strategy using a core reporting stack that includes gross and net performance, drawdowns, turnover, baseline comparison, cost sensitivity, and regime-sliced diagnostics.
- Assess whether a reported Sharpe ratio is credible by separating fixed-strategy estimation error from search-aware inference and applying tools such as confidence intervals, Reality Check logic, and the Deflated Sharpe Ratio.
- Explain why prediction quality and trading quality can diverge, and why IC alone is insufficient for selecting deployable strategies.
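The Deflated Sharpe Ratio can be sketched from the Bailey and Lopez de Prado formulas using only the standard library; the trial counts, trial-Sharpe variance, and sample length below are illustrative inputs, not values from the book.

```python
import math
from statistics import NormalDist

def deflated_sharpe(sr_hat, n_trials, var_sr, T, skew=0.0, kurt=3.0):
    """DSR: probability the observed (per-period) Sharpe exceeds the expected
    maximum Sharpe of n_trials pure-noise strategies."""
    nd = NormalDist()
    gamma = 0.5772156649015329  # Euler-Mascheroni constant
    # Expected maximum Sharpe under the null, given the search breadth
    sr0 = math.sqrt(var_sr) * ((1 - gamma) * nd.inv_cdf(1 - 1 / n_trials)
                               + gamma * nd.inv_cdf(1 - 1 / (n_trials * math.e)))
    denom = math.sqrt(1 - skew * sr_hat + (kurt - 1) / 4 * sr_hat ** 2)
    return nd.cdf((sr_hat - sr0) * math.sqrt(T - 1) / denom)

# The same observed Sharpe is far less credible after a wide search
dsr_few = deflated_sharpe(0.1, n_trials=2, var_sr=0.01, T=252)
dsr_many = deflated_sharpe(0.1, n_trials=100, var_sr=0.01, T=252)
print(f"DSR after 2 trials: {dsr_few:.2f}, after 100 trials: {dsr_many:.2f}")
```

This is why the run logging from Chapter 6 matters: without a count of trials, `n_trials` cannot be filled in honestly.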
Ch 17 — Portfolio Construction Fundamental Law of Active Management: IC=0.03 still useful with sufficient breadth. Equal-weight famously hard to beat (DeMiguel, Garlappi, Uppal 2009). Kelly criterion → fractional Kelly (half/quarter sizing). HRP: agglomerative clustering + recursive bisection, avoids matrix inversion. No universal winner across allocators — depends on trading environment.
Learning Objectives:
- Formalize portfolio construction in terms of expected returns, covariance, constraints, leverage, and rebalancing choices.
- Identify the allocator-specific evaluation metrics that complement the Chapter 16 backtest report, especially benchmark-relative performance, concentration, diversification, and implementation stability.
- Explain why simple baselines such as equal weight, inverse volatility, and related heuristic allocators remain demanding benchmarks.
- Apply mean-variance optimization with shrinkage, realistic constraints, and turnover-aware regularization.
- Interpret Kelly sizing, especially fractional Kelly, as a log-growth principle for translating signal strength into position size.
- Build and evaluate hierarchical allocations that prioritize diversification stability over direct covariance-matrix inversion.
- Compare allocators under a common research protocol while limiting allocator-selection bias and other forms of overfitting.
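Fractional Kelly sizing reduces to one line under log-normal assumptions (optimal leverage mu/sigma^2); the edge and volatility figures below are made up.

```python
def kelly_fraction(mu, sigma, fraction=0.5):
    """Continuous-time Kelly leverage mu/sigma^2, scaled by a fractional-Kelly
    multiplier to trade growth for drawdown control."""
    return fraction * mu / sigma ** 2

# Annualized edge of 5% with 15% volatility: full Kelly is about 2.22x leverage
full = kelly_fraction(0.05, 0.15, fraction=1.0)
half = kelly_fraction(0.05, 0.15, fraction=0.5)
print(f"full Kelly {full:.2f}x, half Kelly {half:.2f}x")
```

Because mu is estimated with large error, full Kelly routinely over-bets; half or quarter Kelly gives up a little growth for a large reduction in drawdown risk.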
Ch 18 — Transaction Costs Cost taxonomy: explicit (commissions, financing, borrow, taxes) / implicit (spread, slippage, impact) / capacity costs. Range: <1 bps liquid ETFs to >100 bps illiquid options. Square-root impact model has strong empirical support. TWAP, VWAP, adaptive participation, Almgren-Chriss optimal execution. Alpha-to-go: fast-decaying signals may lose most value before positions are fully established.
Learning Objectives:
- Identify where transaction costs enter the ML4T workflow, from factor evaluation and backtesting to portfolio construction, risk management, and production monitoring.
- Distinguish explicit, implicit, and capacity-related trading costs and map each component to the relevant modeling choice.
- Explain why execution costs vary with market regime, intraday liquidity, volatility, and execution urgency.
- Choose and calibrate baseline backtest cost models, from spread-based assumptions to linear and square-root impact models, using conservative research defaults when direct execution data is unavailable.
- Compare common execution approaches, including TWAP, VWAP, adaptive participation, and Almgren-Chriss-style optimal execution, in terms of impact, timing risk, and signal decay.
- Use transaction cost analysis to decompose realized costs, diagnose model misspecification, and recalibrate ex ante assumptions.
- Apply break-even turnover, minimum required edge, alpha-to-go, capacity analysis, and precommitted kill criteria to decide whether a strategy remains economically viable after costs.
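The square-root impact model is a one-liner; the constant Y is an empirical calibration (a value near 1 is a common research default), and the order size, ADV, and volatility inputs below are illustrative.

```python
import math

def square_root_impact_bps(order_size, adv, sigma_daily, y=1.0):
    """Square-root impact: cost ~ Y * sigma_daily * sqrt(order_size / ADV), in bps."""
    return y * sigma_daily * math.sqrt(order_size / adv) * 1e4

# Trading 1% of ADV in a name with 2% daily volatility costs roughly 20 bps
impact = square_root_impact_bps(order_size=1e5, adv=1e7, sigma_daily=0.02)
print(f"estimated impact: {impact:.1f} bps")
```

The concavity is the key property: doubling the order size raises impact by only ~41%, which is what makes order splitting and scheduled execution worthwhile.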
Ch 19 — Risk Management Seven risk categories: market, factor, leverage, concentration, liquidity/capacity, model, operational. VaR/CVaR + regime-conditional estimates. Drawdown: Ulcer Index integrates depth and duration. Factor decomposition: market beta increases in volatile regimes when it’s most costly. Adaptive controls: GARCH/EWMA targeting, STVU. Graduated kill switches: watch at 5%→terminate at 30% drawdown.
Learning Objectives:
- Measure tail risk with VaR and CVaR, including regime-conditional estimates and liquidity-aware interpretation.
- Evaluate path risk using drawdown depth, drawdown duration, recovery time, and related path-dependent metrics.
- Decompose portfolio risk into market, factor, sector, geographic, and macro exposures to distinguish intended from unintended bets.
- Design and interpret historical, hypothetical, and reverse stress tests that challenge return, cost, volatility, and correlation assumptions together.
- Build adaptive risk controls, including volatility targeting, exposure caps, and position-level exits, using only information available at decision time.
- Specify kill switches, drift monitoring, and governance artifacts that turn a backtested strategy into a deployable trading system.
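The tail-risk and path-risk metrics above reduce to a few lines of NumPy. This is a minimal historical sketch, not the chapter's regime-conditional estimator; the Ulcer Index shows how drawdown depth and duration fold into one number.

```python
import numpy as np

def var_cvar(returns, alpha=0.95):
    """Historical VaR and CVaR (expected shortfall) at level alpha,
    returned as positive loss numbers."""
    losses = -np.asarray(returns, dtype=float)
    var = np.quantile(losses, alpha)
    cvar = losses[losses >= var].mean()
    return var, cvar

def ulcer_index(equity):
    """Ulcer Index: root-mean-square percentage drawdown, so both
    deep and long drawdowns raise the score."""
    equity = np.asarray(equity, dtype=float)
    peak = np.maximum.accumulate(equity)
    drawdown_pct = 100.0 * (equity - peak) / peak
    return float(np.sqrt(np.mean(drawdown_pct ** 2)))
```

By construction CVaR is at least VaR, and an equity curve that never leaves its running peak scores an Ulcer Index of zero.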
Ch 20 — Strategy Synthesis Nine case study verdicts: advance (US firm characteristics, FX), iterate (ETFs, NASDAQ-100), reframe (CME, S&P options, crypto). Key finding: NASDAQ-100 has weakest IC (0.008) but highest Sharpe (4.22). Median holdout Sharpe decay ~50% across studies. GBM is downstream champion in 6/9 studies. Cost-survival tiers: US firm characteristics survives above 100 bps; S&P options is negative at zero friction.
Learning Objectives:
- Explain why the information coefficient is a useful entry metric for financial signals but does not translate directly into strategy performance.
- Distinguish signal quality, portfolio translation, cost survival, and temporal stability as separate stages in strategy evaluation.
- Compare how major model families perform after the full pipeline, and identify when robustness matters more than peak in-sample performance.
- Diagnose holdout disappointment using distinct failure modes, including prediction decay, translation decay, and structural break.
- Evaluate trading strategies under realistic implementation constraints, including instrument-appropriate cost models, capacity limits, and regime sensitivity.
- Identify the highest-return next steps after a first research pass, including label redesign, ensembling, feature engineering, and iteration.
- Apply a practitioner workflow that moves from data and diagnostics through signal generation, strategy construction, and validation with iteration.
Part 5 — Advanced AI
Ch 21 — Reinforcement Learning RL’s comparative advantage: execution, market making, hedging (not alpha discovery). MDP formulation: state space, continuous action spaces, reward engineering. PPO for execution (modest improvement over TWAP), SAC for market making. Deep Hedging via pfhedge: no-transaction bands emerge from cost-aware policies. Inverse RL for reward inference from order flow. Key risk: simulation-to-reality gap (non-stationarity, impact reflexivity).
Learning Objectives:
- Formulate execution, market making, and derivatives hedging problems as partially observed Markov Decision Processes with economically coherent state, action, reward, and constraint design.
- Match value-based and actor-critic RL methods to financial tasks based on action-space structure, sample-efficiency needs, and stability requirements.
- Benchmark RL execution policies against TWAP and Almgren-Chriss-style schedules in controlled simulated and crypto-data settings, and interpret apparent gains with appropriate caution.
- Compare deep hedging results with delta hedging and Whalley-Wilmott-style benchmarks under transaction costs using P&L distributions and tail-risk metrics.
- Distinguish inverse reinforcement learning from behavior cloning and explain what reward inference can and cannot recover from observed trading behavior.
- Diagnose the simulation-to-reality risks that govern deployability, including non-stationarity, reward hacking, market impact, partial observability, latency, and benchmark mismatch.
Ch 22 — RAG for Financial Research Hallucination is unacceptable in finance → RAG as architectural response. Structure-aware parsing (LlamaParse, Docling, Marker) vs naive fixed-size chunking. Domain-specific embeddings (Voyage AI finance, Fin-E5): FinMTEB benchmark shows consistent gap vs general models. Hybrid retrieval: semantic + BM25 via Reciprocal Rank Fusion. Re-ranking with cross-encoders. KG-guided retrieval: +24% correctness, -85% token consumption vs page-window retrieval (FinReflectKG-MultiHop). Retrieve-extract-compute-narrate for numeric questions.
Learning Objectives:
- Explain why hallucination makes ungrounded LLM use unacceptable in finance and why retrieval-augmented generation is the core architectural response.
- Design a financial RAG pipeline from document ingestion through retrieval and grounded generation, including structure-aware parsing, chunking, metadata, embeddings, and citation support.
- Compare generic and domain-specific embedding models and evaluate retrieval quality on a target corpus using practical retrieval metrics and latency trade-offs.
- Build a retrieval stack that combines semantic search, lexical search, metadata filtering, and re-ranking to improve precision and recall on financial documents.
- Use constraint-based prompting, citation checks, and tool-verified computation to make generated answers more faithful, auditable, and numerically reliable.
- Diagnose RAG failures by separating retrieval, context, synthesis, computation, and abstention errors, and apply targeted evaluation methods to improve each component.
- Distinguish when to use RAG versus fine-tuning for financial applications, and explain how RAG functions as one tool within broader agentic workflows.
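Reciprocal Rank Fusion, the mechanism named above for combining semantic and BM25 rankings, is simple enough to sketch directly; `k = 60` is the conventional smoothing constant from the RRF literature.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse multiple ranked lists (e.g., semantic + lexical) via RRF.

    rankings: list of ranked lists of document ids, best first.
    Each list contributes 1 / (k + rank) per document; documents
    ranked well by several retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked second by both retrievers can outscore one ranked first by only one of them, which is exactly the behavior hybrid retrieval wants before the cross-encoder re-ranking stage.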
Ch 23 — Knowledge Graphs Graph justified for: multi-hop dependency queries, structural crowding analysis, temporal relationship evolution. Not justified for: single-entity lookups, narrative synthesis, sparse graphs. Five-stage LLM extraction pipeline with governance-first approach. Three-timestamp model (event/disclosure/extraction time) — disclosure time is the PIT visibility gate. GNNs: fraud detection production-ready; alpha generation experimental. Start with hand-crafted graph features.
Learning Objectives:
- Distinguish financial questions that genuinely require graph structure from those better served by tabular databases.
- Design a compact, typed, and auditable financial knowledge graph with stable entity identity, finite relationship types, and provenance contracts.
- Build and validate LLM-assisted extraction pipelines that convert disclosures into replayable graph objects while enforcing governance-first quality controls.
- Explain how Graph RAG differs from vector retrieval and implement safe relational query workflows using constrained Cypher generation.
- Transform graph structure into leakage-aware machine learning features, including topology, crowding, concentration, and temporal dynamics.
- Evaluate explicit knowledge graphs, statistical financial networks, and learned graph representations pragmatically against out-of-sample metrics and transaction costs.
- Apply a three-timestamp framework and disclosure-time cutoff rules to prevent temporal leakage in graph queries and feature generation.
- Make sound engineering choices about graph databases, ontology scope, query safety, and schema evolution for production financial workflows.
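The three-timestamp model and the disclosure-time PIT gate can be sketched as a small data structure plus a filter. The field names and example below are illustrative assumptions, not the book's schema; the key rule is that disclosure time, not event time, controls visibility.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class GraphEdge:
    """A relationship carrying the three timestamps: when the event
    happened, when it was disclosed, and when we extracted it."""
    source: str
    target: str
    relation: str
    event_time: date
    disclosure_time: date   # the point-in-time visibility gate
    extraction_time: date

def pit_visible(edges, as_of):
    """Return only edges visible at `as_of`. Filtering on
    disclosure_time (not event_time) prevents look-ahead leakage:
    an event that happened earlier but was disclosed later must
    stay invisible to queries and features as of that date."""
    return [e for e in edges if e.disclosure_time <= as_of]
```

An edge for a January event disclosed in March is correctly excluded from a February query even though the event itself predates it.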
Ch 24 — Autonomous Agents ReAct (auditable loops) → Tree of Thoughts (parallel hypothesis exploration) → Reflexion (post-run critique). Explicit three-tier memory: working / session / persistent. Tool contracts as primary quality determinant. Context engineering: expose only phase-appropriate tools and PIT-consistent evidence. Warden security pattern: policy proxy with allowlists. Multi-agent forecasting: Neyman extremization + Platt calibration. Scope: read-only research agents (L1 decision support), not order execution.
Learning Objectives:
- Explain when agentic workflows add value in finance and when conventional statistical or rules-based pipelines remain the better choice.
- Distinguish the roles of ReAct, Tree of Thoughts, and Reflexion, and choose appropriate reasoning budgets and compositions for evidence-driven financial tasks.
- Design explicit agent state and memory schemas that support provenance, checkpointing, replay, schema evolution, and post-outcome evaluation.
- Specify robust tool contracts, structured outputs, source policies, and context-engineering rules for read-only research and forecasting agents.
- Compare framework styles and define a migration path from notebook prototypes to operational forecasting services without sacrificing visibility and control.
- Build a single-agent evidence-first research workflow with quality gates, abstention behavior, and replayable artifacts.
- Design and evaluate multi-agent forecasting pipelines using specialist diversity, aggregation, calibration, baselines, and ablation analysis.
- Define the operational, statistical, and security controls required to make financial-agent outputs decision-grade, including point-in-time integrity, contamination-aware testing, observability, policy gates, and human approval boundaries.
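The forecast-aggregation step can be sketched as log-odds averaging followed by extremization. This simplified single-factor extremization stands in for the chapter's Neyman extremization plus Platt calibration; the factor `alpha` is an assumed setting for illustration.

```python
import math

def aggregate_forecasts(probs, alpha=2.0):
    """Aggregate specialist probability forecasts.

    Average in log-odds space, then extremize by multiplying the
    mean logit by alpha > 1: independent specialists each see only
    part of the evidence, so their pooled forecast is typically
    under-confident and benefits from being pushed outward.
    """
    logits = [math.log(p / (1.0 - p)) for p in probs]
    mean_logit = sum(logits) / len(logits)
    return 1.0 / (1.0 + math.exp(-alpha * mean_logit))
```

A maximally uncertain panel (all 0.5) stays at 0.5, while a panel leaning one way is pushed beyond its most confident member; calibration (e.g., Platt scaling on past outcomes) would then map these scores back to honest probabilities.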
Part 6 — Production
Ch 25 — Live Trading Systems Technical divergence between backtest and live is the primary self-inflicted failure mode. Unified framework: same strategy code in ml4t-backtest and ml4t-live. Brokers: IBKR (SmartRouting, no PFOF), Alpaca (commission-free REST API), QuantConnect (LEAN engine). Order lifecycle: 11-state machine with 23 valid transitions. Pipeline verification: feed identical inputs through both systems and compare at each stage. Crypto case study: LightGBM classifier deployed to OKX with prediction-flip exits.
Learning Objectives:
- Explain why technical divergence between research and production is a primary failure mode in live trading, and how a unified framework reduces that risk.
- Design a dual-mode, event-driven trading architecture in which deterministic strategy logic runs unchanged in backtest, paper, and live execution.
- Compare broker, exchange, and managed-platform deployment paths and evaluate them in terms of asset coverage, execution quality, operational burden, and control.
- Model order handling as an explicit state machine that supports partial fills, cancellations, rejections, reconciliation, and idempotent crash recovery.
- Verify technical parity across the full pipeline, from raw data and features to predictions, sizing decisions, and generated orders.
- Plan a staged live rollout using pre-flight checks, shadow or paper trading, kill switches, reconciliation procedures, and awareness of venue and jurisdictional constraints.
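The order-lifecycle state machine pattern can be sketched with an explicit transition table. The subset of states and transitions below is illustrative only, not the chapter's full 11-state / 23-transition machine; the point is that every illegal transition fails loudly instead of silently corrupting order state.

```python
# Allowed next-states per state (illustrative subset, an assumption).
VALID_TRANSITIONS = {
    "NEW":              {"SUBMITTED", "REJECTED"},
    "SUBMITTED":        {"ACKED", "REJECTED"},
    "ACKED":            {"PARTIALLY_FILLED", "FILLED", "CANCEL_PENDING"},
    "PARTIALLY_FILLED": {"PARTIALLY_FILLED", "FILLED", "CANCEL_PENDING"},
    "CANCEL_PENDING":   {"CANCELED", "PARTIALLY_FILLED", "FILLED"},
    "FILLED": set(), "CANCELED": set(), "REJECTED": set(),  # terminal
}

class Order:
    """Order whose state may only change along declared edges,
    supporting partial fills, cancels, and rejections."""
    def __init__(self):
        self.state = "NEW"

    def transition(self, new_state):
        if new_state not in VALID_TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
```

Making the table data rather than scattered `if` logic also makes it auditable and easy to replay during reconciliation and crash recovery.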
Ch 26 — MLOps and Governance Technical failure (pipeline divergence) vs statistical failure (model decay) — requires different diagnostics. Three drift types: data drift (PSI/KS), feature drift (SHAP monitoring), concept drift (ADWIN/DDM). Shadow mode evaluation before champion-challenger promotion. Minimum effect size: 0.2-0.3 Sharpe improvement required for promotion. Four-level circuit breakers: trade / strategy / portfolio / system. MLOps stack: Feast (feature store), DVC (data versioning), MLflow (model registry, SR 11-7 compliance).
Learning Objectives:
- Distinguish technical pipeline divergence from statistical performance decay and choose the corresponding diagnostic and remediation response.
- Build a live-monitoring framework that combines data-integrity gates, rolling performance metrics, backtest-to-live realization ratios, and execution-quality tracking.
- Apply drift diagnostics to production artifacts, including PSI, KS tests, SHAP-based feature monitoring, and online change-detection algorithms.
- Design a safe model-update workflow using shadow mode, champion-challenger evaluation, explicit promotion criteria, and tested rollback procedures.
- Implement multi-level circuit breakers across trade, strategy, portfolio, and system layers, with clear recovery and resume criteria.
- Evaluate and right-size the supporting MLOps stack, including feature stores, data versioning and lineage, model registries, and observability tooling.
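The PSI data-drift check mentioned above can be sketched as follows. The quantile binning scheme and the rule-of-thumb thresholds in the comment are common conventions, assumed here rather than taken from the chapter.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a reference (training) sample and a live sample
    of the same feature. Common convention: <0.1 stable,
    0.1-0.25 watch, >0.25 investigate as drift."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # bin edges from reference quantiles so each bin holds ~1/bins
    # of the reference mass
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    # keep live observations inside the reference range
    actual = np.clip(actual, edges[0], edges[-1])
    e_frac = np.histogram(expected, edges)[0] / len(expected)
    a_frac = np.histogram(actual, edges)[0] / len(actual)
    e_frac = np.clip(e_frac, 1e-6, None)  # avoid log(0)
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))
```

An unchanged distribution scores ~0, while a shifted live feature blows past the 0.25 threshold, which is the signal that would trigger the shadow-mode and champion-challenger workflow described above.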
Ch 27 — The Systematic Edge Process is the durable edge. Five quant archetypes: researcher, trader, developer, portfolio manager, risk manager. Quantamental roles (systematic + fundamental) as the dominant industry trend. T-shaped expertise. Frontiers: quantum computing (mid-2030s for meaningful advantage), DeFi (live alpha today from on-chain data, AMMs), AI ethics (EU AI Act now a compliance requirement). Burnout as professional risk. Four career failure modes: over-specialization, underestimating soft skills, ignoring regulation, perpetual learning without application.
Cross-Dataset Key Numbers
| Metric | Value | Source |
|---|---|---|
| Gross Sharpe (NASDAQ-100 intraday) | +1.76 | Ch 16 |
| Net Sharpe (NASDAQ-100 intraday) | -62.61 | Ch 16 |
| Median holdout decay | ~50% | Ch 20 |
| GBM wins (downstream Sharpe) | 6/9 case studies | Ch 20 |
| Candidates where DSR adjustment materially changes conclusions | several | Ch 16 |
| US firm char: validation Sharpe | +3.03 | Ch 20 |
| US firm char: holdout Sharpe | +2.52 | Ch 20 |
| FX: only study where holdout > validation | — | Ch 20 |
Related
- ML4T Platform — platform and Python libraries
- ML4T Book 2nd Edition — 2020 edition (23 chapters)
- Algorithmic Trading — domain concept
- ML4T Trading Approaches — analysis: trading strategy roadmap