ML4T Book 2nd Edition
🇹🇭 ภาษาไทย
Machine Learning for Algorithmic Trading (2nd Edition) by Stefan Jansen
Published July 2020 | Packt Publishing | 858 pages | 23 chapters + appendix
The 3rd Edition is coming (June 2026) as part of the ML4T Platform; it expands to 27 chapters and adds GenAI, causal inference, and production MLOps.
ML4T Workflow (Central Framework)
Data → Features → ML Model → Signal → Backtest → Portfolio → Live
Chapter Structure
Part 1 — Data (Ch 1–5)
| Ch | Title | Key Content |
|---|---|---|
| 1 | ML for Trading – From Idea to Execution | ML4T workflow, use cases, strategy lifecycle |
| 2 | Market and Fundamental Data | ITCH feed, tick→bars, pandas-datareader |
| 3 | Alternative Data for Finance | Categories, evaluation criteria, web scraping |
| 4 | Financial Feature Engineering | Alpha factors, TA-Lib, Kalman filter, Alphalens |
| 5 | Portfolio Optimization and Performance Evaluation | Sharpe, HRP, pyfolio |
Part 2 — ML Foundations (Ch 6–8)
| Ch | Title | Key Content |
|---|---|---|
| 6 | The Machine Learning Process | Bias-variance, cross-validation, purging/embargoing |
| 7 | Linear Models | OLS, ridge, lasso, Fama-French, logistic regression |
| 8 | The ML4T Workflow – From Model to Strategy Backtesting | backtrader, Zipline Pipeline API |
Part 3 — Classical ML (Ch 9–13)
| Ch | Title | Key Content |
|---|---|---|
| 9 | Time-Series Models | ARIMA, GARCH, VAR, cointegration, pairs trading |
| 10 | Bayesian ML | PyMC3, Bayesian Sharpe ratio, rolling regression |
| 11 | Random Forests | Decision trees, RF, long-short Japanese stocks, LightGBM |
| 12 | Boosting | GBM, XGBoost, LightGBM, CatBoost, SHAP |
| 13 | Unsupervised Learning | PCA, clustering, HRP portfolio |
Part 4 — NLP (Ch 14–16)
| Ch | Title | Key Content |
|---|---|---|
| 14 | Text Data for Trading – Sentiment Analysis | spaCy, TF-IDF, naive Bayes |
| 15 | Topic Modeling | LDA (sklearn + Gensim), earnings call topics |
| 16 | Word Embeddings | word2vec, GloVe, doc2vec, BERT intro |
Part 5 — Deep Learning (Ch 17–21)
| Ch | Title | Key Content |
|---|---|---|
| 17 | Deep Learning for Trading | Feedforward NN, TF2, PyTorch, long-short strategy |
| 18 | CNNs | LeNet5, transfer learning, 1D conv for time series |
| 19 | RNNs | LSTM, GRU, multivariate time series, SEC filings |
| 20 | Autoencoders | VAE, conditional autoencoder for asset pricing |
| 21 | GANs | TimeGAN, synthetic financial time series |
Part 6 — RL + Conclusions (Ch 22–23)
| Ch | Title | Key Content |
|---|---|---|
| 22 | Deep Reinforcement Learning | DDQN, OpenAI Gym, custom TradingEnvironment |
| 23 | Conclusions and Next Steps | Key lessons, backtest overfitting, platform comparison |
Appendix: 100+ alpha factors in TA-Lib, WorldQuant formulaic alphas
Key Concepts
| Concept | Meaning |
|---|---|
| IC (Information Coefficient) | Spearman rank correlation between predicted and actual returns |
| Lookahead Bias | Unintentionally using future data |
| Deflated Sharpe Ratio | Sharpe ratio adjusted for multiple testing |
| Purging/Embargoing | Cross-validation technique for time series |
| HRP | Hierarchical Risk Parity: portfolio construction using clustering |
Related
- ML4T Platform — 3rd edition ecosystem
- TradingView MCP: connects Claude Code with TradingView Desktop
- Algorithmic Trading — domain concept
🇬🇧 English
Machine Learning for Algorithmic Trading (2nd Edition) by Stefan Jansen
Published July 2020 | Packt Publishing | 858 pages | 23 chapters + appendix | 400+ notebooks
The 3rd Edition is coming (June 2026) as part of the ML4T Platform; it expands to 27 chapters and adds GenAI, causal inference, and production MLOps.
The ML4T Workflow (Central Framework)
Data → Features → ML Model → Signal → Backtest → Portfolio → Live
Feedback loop: learn from results, feeding them back from the later stages into the earlier ones.
Every chapter applies this workflow to a different ML approach or data type.
Complete Chapter Structure
Part 1 — Data (Ch 1–5)
| Ch | Title | Key Content |
|---|---|---|
| 1 | ML for Trading – From Idea to Execution | ML4T workflow overview, use cases, strategy lifecycle |
| 2 | Market and Fundamental Data | Nasdaq ITCH feed, tick→bars (time/volume/dollar), pandas-datareader, XBRL |
| 3 | Alternative Data for Finance | Categories (individuals/business/sensors/satellites), evaluation criteria, web scraping |
| 4 | Financial Feature Engineering | Alpha factors: momentum, value, volatility, quality; TA-Lib; Kalman filter; Alphalens |
| 5 | Portfolio Optimization and Performance Evaluation | Sharpe ratio, mean-variance, Black-Litterman, Kelly criterion, HRP, pyfolio |
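The tick→bars step listed for Chapter 2 can be illustrated with a small pandas sketch. This is not the book's implementation: the `price` and `size` column names, the `dollar_bars` helper, and the tick values are hypothetical, and bar boundaries are taken at multiples of the running traded value rather than by resetting a counter after each bar. Replacing `price * size` with `size` alone would give volume bars under the same assumptions.

```python
import pandas as pd

def dollar_bars(ticks: pd.DataFrame, bar_size: float) -> pd.DataFrame:
    """Aggregate ticks into OHLCV bars holding roughly `bar_size` of traded value each."""
    dollar_value = ticks["price"] * ticks["size"]
    # Value traded *before* each tick decides its bar, so the tick that
    # crosses a multiple of bar_size still closes the current bar.
    traded_before = dollar_value.cumsum().shift(fill_value=0)
    bar_id = (traded_before // bar_size).astype(int)
    grouped = ticks.groupby(bar_id)
    bars = pd.DataFrame({
        "open": grouped["price"].first(),
        "high": grouped["price"].max(),
        "low": grouped["price"].min(),
        "close": grouped["price"].last(),
        "volume": grouped["size"].sum(),
    })
    bars.index.name = "bar"
    return bars

# Hypothetical ticks, made up for illustration.
ticks = pd.DataFrame({
    "price": [10.0, 10.0, 11.0, 12.0, 10.0, 9.0],
    "size": [50, 50, 100, 100, 30, 100],
})
bars = dollar_bars(ticks, bar_size=1_000.0)
```

Unlike time bars, these bars arrive faster when trading is heavy, which is the usual motivation for sampling by volume or traded value.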
Part 2 — ML Foundations (Ch 6–8)
| Ch | Title | Key Content |
|---|---|---|
| 6 | The Machine Learning Process | Supervised/unsupervised/RL overview; bias-variance tradeoff; cross-validation; purging/embargoing |
| 7 | Linear Models | OLS, ridge, lasso; CAPM→Fama-French factor models; logistic regression; return prediction |
| 8 | The ML4T Workflow – From Model to Strategy Backtesting | Backtest pitfalls (lookahead/survivorship/outlier); backtrader; Zipline Pipeline API |
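Purging and embargoing, listed for Chapter 6, can be sketched for a single fold as follows. This is a simplified illustration rather than the book's code: the function name, its parameters, and the assumption that sample `i` has a label computed from times `i` through `i + label_horizon` are all hypothetical.

```python
import numpy as np

def purged_embargoed_split(n_samples, test_start, test_end, label_horizon, embargo):
    """Return (train_idx, test_idx) for one fold of time-ordered samples.

    Assumes the label of sample i is computed from times i..i+label_horizon.
    """
    idx = np.arange(n_samples)
    test_idx = idx[test_start:test_end]
    # Purge: keep only earlier samples whose label window ends before the test block.
    before = idx[idx + label_horizon < test_start]
    # After the test block, skip samples that overlap the test labels,
    # plus an extra embargo gap.
    after = idx[idx >= test_end + label_horizon + embargo]
    return np.concatenate([before, after]), test_idx

train_idx, test_idx = purged_embargoed_split(
    n_samples=20, test_start=8, test_end=12, label_horizon=2, embargo=1
)
```

In this example samples 6 and 7 are purged because their labels reach into the test block, and samples 12 to 14 are dropped after it, so no training label shares information with a test label.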
Part 3 — Classical ML (Ch 9–13)
| Ch | Title | Key Content |
|---|---|---|
| 9 | Time-Series Models | ARIMA, SARIMAX, ARCH/GARCH, VAR, cointegration, pairs trading backtest |
| 10 | Bayesian ML | PyMC3, MAP/MCMC/variational inference; Bayesian Sharpe ratio; rolling regression for pairs |
| 11 | Random Forests | Decision trees, bagging, RF; long-short Japanese stocks with LightGBM; Alphalens evaluation |
| 12 | Boosting Your Trading Strategy | AdaBoost, GBM, XGBoost, LightGBM, CatBoost; SHAP values; intraday strategy |
| 13 | Unsupervised Learning for Risk Factors | PCA, ICA, t-SNE, UMAP; k-means, hierarchical, DBSCAN clustering; HRP portfolio |
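The pairs trading idea from Chapter 9 can be sketched as a spread z-score with entry thresholds. This is a minimal illustration on synthetic data, not the book's backtest: the helper names, the entry threshold of 2.0, and the simulated prices are assumptions. It also fits the hedge ratio on the full sample, which would introduce lookahead bias in a real backtest, and it skips the cointegration test that would normally come first.

```python
import numpy as np

def spread_zscore(y, x):
    """Return (hedge_ratio, z) for the spread y - (intercept + hedge_ratio * x)."""
    hedge_ratio, intercept = np.polyfit(x, y, deg=1)  # OLS fit of y on x
    spread = y - (intercept + hedge_ratio * x)
    z = (spread - spread.mean()) / spread.std()
    return hedge_ratio, z

def pairs_signal(z, entry=2.0):
    """+1 = long the spread (long y, short x), -1 = short the spread, 0 = flat."""
    return np.where(z > entry, -1, np.where(z < -entry, 1, 0))

# Synthetic pair: x is a random walk, y tracks 2 * x plus stationary noise.
rng = np.random.default_rng(0)
x = 100 + np.cumsum(rng.normal(0, 1, 500))
y = 5 + 2.0 * x + rng.normal(0, 0.5, 500)

hedge_ratio, z = spread_zscore(y, x)
signal = pairs_signal(z)
```

A more realistic version would estimate the hedge ratio and the z-score statistics on a trailing window only, so that each signal uses information available at that time.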
Part 4 — NLP (Ch 14–16)
| Ch | Title | Key Content |
|---|---|---|
| 14 | Text Data for Trading – Sentiment Analysis | NLP pipeline (spaCy, TextBlob); TF-IDF; naive Bayes on news and Yelp data |
| 15 | Topic Modeling | LSI, pLSA, LDA (sklearn + Gensim); earnings call topic modeling |
| 16 | Word Embeddings | word2vec, GloVe, doc2vec; BERT/transformer intro; SEC filings for return prediction |
Part 5 — Deep Learning (Ch 17–21)
| Ch | Title | Key Content |
|---|---|---|
| 17 | Deep Learning for Trading | Feedforward NN, activation functions, dropout, SGD/Adam; TF2 and PyTorch; long-short strategy |
| 18 | CNNs for Financial Time Series | LeNet5, AlexNet, VGG16 transfer learning; 1D convolutions; CNN-TA clustering |
| 19 | RNNs for Multivariate Time Series | LSTM, GRU, bidirectional RNN; S&P500 regression; multivariate macro; SEC filing sentiment |
| 20 | Autoencoders | Feedforward/conv/denoising autoencoders; VAE; conditional autoencoder for asset pricing |
| 21 | GANs for Synthetic Time-Series Data | DCGAN, conditional GAN; TimeGAN (train on synthetic, test on real) |
Part 6 — RL + Conclusions (Ch 22–23)
| Ch | Title | Key Content |
|---|---|---|
| 22 | Deep Reinforcement Learning | MDP, value iteration, Q-learning, DDQN; OpenAI Gym; custom TradingEnvironment |
| 23 | Conclusions and Next Steps | Key lessons: data quality, bias-variance, backtest overfitting, platform comparison |
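Value iteration, one of the Chapter 22 topics, can be shown on a toy MDP. The three-state chain below is a made-up example, and the array layout (`P[a, s, s']` for transition probabilities, `R[a, s]` for expected rewards) is an assumption of this sketch rather than anything taken from the book.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Return (V, policy) for transitions P[a, s, s'] and rewards R[a, s]."""
    V = np.zeros(P.shape[1])
    for _ in range(max_iter):
        # Bellman optimality update: best action value in every state.
        V_new = (R + gamma * (P @ V)).max(axis=0)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R + gamma * (P @ V)  # action values under the final V
    return V, Q.argmax(axis=0)

# Toy chain: states 0 -> 1 -> 2, where state 2 is absorbing.
# Action 0 = stay, action 1 = move right; entering state 2 pays +1.
P = np.array([
    np.eye(3),                              # stay
    [[0, 1, 0], [0, 0, 1], [0, 0, 1]],      # move right
], dtype=float)
R = np.array([
    [0.0, 0.0, 0.0],                        # stay
    [0.0, 1.0, 0.0],                        # move right (reward when leaving state 1)
])

V, policy = value_iteration(P, R, gamma=0.9)
```

Q-learning and DDQN approximate the same optimal action values, but from sampled transitions instead of a known `P` and `R`.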
Appendix — Alpha Factor Library: 100+ factors in TA-Lib (moving averages, momentum, volume, volatility) + WorldQuant formulaic alphas (Alpha001, Alpha054).
Key Concepts Across the Book
| Concept | Description |
|---|---|
| IC (Information Coefficient) | Spearman rank correlation between predicted and actual returns — the primary signal quality metric |
| Lookahead Bias | Accidentally using future information in features — causes unrealistically good backtests |
| Deflated Sharpe Ratio | Sharpe ratio adjusted for multiple testing — guards against backtest overfitting |
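| Purging/Embargoing | Cross-validation technique for time series: purging drops training samples whose label windows overlap the test period, and embargoing drops training samples immediately after it, preventing leakage between train and test |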
| Alpha Factor | A signal expected to predict returns before being arbitraged away |
| HRP | Hierarchical Risk Parity — portfolio construction using clustering instead of matrix inversion |
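The Information Coefficient defined above can be computed per date as a rank correlation across assets. The sketch below assumes a long-format table with hypothetical `date`, `predicted`, and `actual` columns and made-up values. It computes Spearman correlation as the Pearson correlation of ranks.

```python
import pandas as pd

def information_coefficient(df: pd.DataFrame) -> pd.Series:
    """Per-date Spearman rank correlation between predicted and actual returns."""
    ic = {}
    for date, group in df.groupby("date"):
        # Spearman correlation equals the Pearson correlation of the ranks.
        ic[date] = group["predicted"].rank().corr(group["actual"].rank())
    return pd.Series(ic, name="ic")

# Hypothetical predictions and realized returns for four assets on two dates.
df = pd.DataFrame({
    "date": ["2020-01-02"] * 4 + ["2020-01-03"] * 4,
    "predicted": [0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 0.3, 0.4],
    "actual": [0.01, 0.02, 0.03, 0.04, 0.02, 0.01, 0.03, 0.04],
})
ic = information_coefficient(df)
```

On the first date the ranking is perfect (IC of 1.0). On the second date the two lowest-ranked assets are swapped, which lowers the IC to 0.8. Averaging the daily values gives a single summary of signal quality.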
Main Tools Used
- Data & Features: pandas, NumPy, TA-Lib, Quandl, yfinance, Zipline bundles
- ML: scikit-learn, statsmodels, PyMC3, LightGBM, XGBoost, CatBoost
- Deep Learning: TensorFlow 2, PyTorch, Keras
- Backtesting: backtrader, Zipline, Alphalens, pyfolio
- NLP: spaCy, Gensim, TextBlob