Transformers and attention-based networks in quantitative trading: a comprehensive survey
DOI: https://doi.org/10.1145/3677052.3698684
ICAIF '24: 5th ACM International Conference on AI in Finance, Brooklyn, NY, USA, November 2024
ACM Reference Format:
Lucas Coelho e Silva, Gustavo de Freitas Fonseca, and Paulo Andre L. Castro. 2024. Transformers and attention-based networks in quantitative trading: a comprehensive survey. In 5th ACM International Conference on AI in Finance (ICAIF '24), November 14--17, 2024, Brooklyn, NY, USA. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3677052.3698684
Summary of the literature on transformers and attention-based networks for quantitative trading (a minimal illustrative sketch of the recurring encoder-only setup follows the table)
Ref. | Specialization | Research problem | Modeling approach | Market |
---|---|---|---|---|
[3] | Alpha | Predicting log-return of Bitcoin with LOB data. | Transformer, Autoformer, FEDformer, and HFformer architectures. | Crypto |
[5] | Alpha | Prediction of LOB mid-price, LOB mid-price difference, and LOB mid-price movement. | Transformer, Autoformer, Informer, Reformer, and FEDformer architectures, compared against LSTMs. | Crypto |
[18] | Alpha | Modeling signals for high-frequency trading. | Vanilla Transformer without the decoder, coupled with an Exponential Moving Average model applied to the input for representing data on different time scales. | Forex |
[11] | Alpha | Price movement prediction. | Transformer with time embeddings for regression and classification tasks. | Forex |
[12] | Alpha | Price movement prediction. | Transformers trained on price and technical analysis features such as Bollinger Bands and the Relative Strength Index. Benchmarked against LSTMs and a ResNet network. | Forex |
[29] | Alpha | Price forecasting that yields a trading signal based on the predicted movement. | Hybrid approach with BERT and LSTMs. | Futures |
[35] | Alpha | Price forecasting. | Ensemble of sliding-window temporal transformers. | Equity |
[15] | Alpha | Price forecasting. | Canonical transformer. | Futures |
[25] | Alpha | Price movement prediction. | Transformer Encoder Attention. | Equity |
[31] | Alpha | Sentiment analysis for creating features and trading signals. | DistilBart-MNLI-12-1, GPT-1, GPT-2, GPT-3, GPT-4, BERT. | Equity |
[52] | Alpha | Sentiment analysis for creating features and trading signals. | Chinese-GPT, Chinese-FinBERT, Erlangshen-RoBERTa110M-Sentiment. | Equity |
[19] | Alpha | Sentiment analysis for creating features and trading signals. | Pre-training of crypto-specific language models based on BERT. | Crypto |
[48] | Alpha | Learning of trading action and weights (position size). | Momentum Transformer, a hybrid architecture that combines LSTMs with self-attention mechanisms, directly optimizing the Sharpe ratio during training. | Futures |
[50] | Alpha | Learning of trading action and weights (position size). | Transformer-based and U-Net neural networks within a deep RL framework for end-to-end single-stock trading. | Equity & Crypto |
[38] | Risk | Volatility forecasting. | Hybrid approach with Transformers, Multi-Transformers, autoregressive models, and LSTM units. | Equity |
[41] | Risk | Volatility forecasting. | TFT, Informer, Autoformer, and PatchTST, compared against N-BEATSx, NHITS, and HAR. | Equity |
[21] | Risk | Modeling and predicting volatility surfaces. | PINN, ConvLSTM, self-attention ConvLSTM, and physics-informed convolutional transformer architectures. | Options |
[44] | Risk | Hedging of derivatives. | SigFormer: a novel architecture that couples transformers with path signatures. | Options |
[39] | Portfolio | Construction of portfolios via ranking. | Rank Transformer, a model that allocates assets by predicting the rank of instrument efficiency. | Equity & Forex |
[53] | Portfolio | Prediction of stock returns for portfolio composition. | Transfer learning from sentiment analysis applications. Adapted transformer architecture. | Equity |
[42] | Portfolio | Directly output weights for each portfolio component. | Arrangement of convolutions and graph-attention mechanisms, with risk-averaged returns used during training. | Equity |
[40] | Portfolio | Point forecasting of cryptocurrency prices and portfolio composition. | N-BEATS Perceiver, constructed on top of the Perceiver IO architecture. | Crypto |
[7] | Portfolio | Directly output weights for each portfolio component. | Additive attention and self-attention mechanisms with LSTMs and GRUs. | Equity |
[22] | Portfolio | RL for portfolio construction. | Deep RL actor-critic formulation with a transformer variant that implements two-dimensional attention with gating layers. | Equity |
[1] | Execution | Fill-time prediction of limit orders. | Combination of attention layers and convolutions. | Equity |
[27] | Execution | Learning vectorized representations of LOB for market simulations. | Autoencoder neural network design with stacked transformer blocks for capturing temporal correlations. | Equity |
[36] | Execution | Learning LOB representations for anomaly detection. | Autoencoder based on transformers for unsupervised capturing of temporal vector representations. | Equity |
[13] | Execution | Modeling LOB for market making. | Combination of convolutions and attention for extracting features from the LOB. | Equity |
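The table groups many distinct architectures, but a large share of the alpha-oriented studies (e.g., [11], [12], [25]) share one pattern: an encoder-only transformer that reads a fixed window of price and technical features and classifies the next price movement. The PyTorch sketch below is a minimal illustration of that recurring setup, not a reproduction of any surveyed model; the feature count, window length, positional-embedding scheme, and number of movement classes are assumptions made for the example.

```python
# Illustrative sketch only: a generic encoder-only transformer for price-movement
# classification, in the spirit of the alpha studies summarized above. All
# dimensions (features, window length, classes) are hypothetical.
import torch
import torch.nn as nn

class PriceMovementTransformer(nn.Module):
    def __init__(self, n_features=10, d_model=64, n_heads=4, n_layers=2, n_classes=3, max_len=128):
        super().__init__()
        self.input_proj = nn.Linear(n_features, d_model)                 # project raw features to model width
        self.pos_embed = nn.Parameter(torch.zeros(1, max_len, d_model))  # learned positional embeddings
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)                        # e.g., down / flat / up

    def forward(self, x):                                                # x: (batch, time, n_features)
        h = self.input_proj(x) + self.pos_embed[:, : x.size(1)]
        h = self.encoder(h)                                              # self-attention over the window
        return self.head(h[:, -1])                                       # classify from the last time step

# Usage: a batch of 32 windows, 60 bars each, 10 features per bar -> logits over 3 classes.
logits = PriceMovementTransformer()(torch.randn(32, 60, 10))
```

Decoder-only and encoder-decoder variants, as well as the LOB- and sentiment-driven pipelines in the table, replace the input features and the prediction head but keep the same attention backbone.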
References
- Álvaro Arroyo, Álvaro Cartea, Fernando Moreno-Pino, and Stefan Zohren. 2024. Deep attentive survival analysis in limit order books: estimating fill probabilities with convolutional-transformers. Quantitative Finance 24, 1 (Jan. 2024), 35–57. https://doi.org/10.1080/14697688.2023.2286351
- Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural Machine Translation by Jointly Learning to Align and Translate. http://arxiv.org/abs/1409.0473 arXiv:1409.0473 [cs, stat].
- Fazl Barez, Paul Bilokon, Arthur Gervais, and Nikita Lisitsyn. 2023. Exploring the Advantages of Transformers for High-Frequency Trading. http://arxiv.org/abs/2302.13850 arXiv:2302.13850 [cs, q-fin].
- Trevor Bekolay, James Bergstra, Eric Hunsberger, Travis DeWolf, Terrence C. Stewart, Daniel Rasmussen, Xuan Choo, Aaron Russell Voelker, and Chris Eliasmith. 2014. Nengo: a Python tool for building large-scale functional brain models. Frontiers in Neuroinformatics 7 (2014). https://doi.org/10.3389/fninf.2013.00048
- Paul Bilokon and Yitao Qiu. 2023. Transformers versus LSTMs for electronic trading. http://arxiv.org/abs/2309.11400 arXiv:2309.11400 [cs, econ, q-fin].
- Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. http://arxiv.org/abs/2005.14165 arXiv:2005.14165 [cs].
- Hieu K. Cao, Han K. Cao, and Binh T. Nguyen. 2020. DELAFO: An Efficient Portfolio Optimization Using Deep Neural Networks. In Advances in Knowledge Discovery and Data Mining, Hady W. Lauw, Raymond Chi-Wing Wong, Alexandros Ntoulas, Ee-Peng Lim, See-Kiong Ng, and Sinno Jialin Pan (Eds.). Vol. 12084. Springer International Publishing, Cham, 623–635. https://doi.org/10.1007/978-3-030-47426-3_48 Series Title: Lecture Notes in Computer Science.
- Krishna Teja Chitty-Venkata, Murali Emani, Venkatram Vishwanath, and Arun K. Somani. 2022. Neural Architecture Search for Transformers: A Survey. IEEE Access 10 (2022), 108374–108412. https://doi.org/10.1109/ACCESS.2022.3212767
- Kyunghyun Cho, Bart Van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Doha, Qatar, 1724–1734. https://doi.org/10.3115/v1/D14-1179
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. http://arxiv.org/abs/1810.04805 arXiv:1810.04805 [cs].
- Tizian Fischer, Marius Sterling, and Stefan Lessmann. 2024. Fx-spot predictions with state-of-the-art transformer and time embeddings. Expert Systems with Applications 249 (Sept. 2024), 123538. https://doi.org/10.1016/j.eswa.2024.123538
- Przemysław Grądzki and Piotr Wójcik. 2024. Is attention all you need for intraday Forex trading? Expert Systems 41, 2 (Feb. 2024), e13317. https://doi.org/10.1111/exsy.13317
- Hong Guo, Jianwu Lin, and Fanlin Huang. 2023. Market Making with Deep Reinforcement Learning from Limit Order Books. In 2023 International Joint Conference on Neural Networks (IJCNN). IEEE, Gold Coast, Australia, 1–8. https://doi.org/10.1109/IJCNN54540.2023.10191123
- Maosheng Guo, Yu Zhang, and Ting Liu. 2019. Gaussian Transformer: A Lightweight Approach for Natural Language Inference. Proceedings of the AAAI Conference on Artificial Intelligence 33, 01 (July 2019), 6489–6496. https://doi.org/10.1609/aaai.v33i01.33016489
- Wenyang Huang, Tianxiao Gao, Yun Hao, and Xiuqing Wang. 2023. Transformer-based forecasting for intraday trading in the Shanghai crude oil market: Analyzing open-high-low-close prices. Energy Economics 127 (Nov. 2023), 107106. https://doi.org/10.1016/j.eneco.2023.107106
- Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, and João Carreira. 2022. Perceiver IO: A General Architecture for Structured Inputs & Outputs. http://arxiv.org/abs/2107.14795 arXiv:2107.14795 [cs, eess].
- Uday Kamath, Kenneth L. Graham, and Wael Emara. 2022. Transformers for Machine Learning: A Deep Dive (1 ed.). Chapman and Hall/CRC, Boca Raton. https://doi.org/10.1201/9781003170082
- Konstantinos T. Kantoutsis, Adamantia N. Mavrogianni, and Nikolaos P. Theodorakatos. 2024. Transformers in High-Frequency Trading. Journal of Physics: Conference Series 2701, 1 (Feb. 2024), 012134. https://doi.org/10.1088/1742-6596/2701/1/012134
- Gyeongmin Kim, Minsuk Kim, Byungchul Kim, and Heuiseok Lim. 2023. CBITS: Crypto BERT Incorporated Trading System. IEEE Access 11 (2023), 6912–6921. https://doi.org/10.1109/ACCESS.2023.3236032
- Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, and Kurt Keutzer. 2022. Squeezeformer: An Efficient Transformer for Automatic Speech Recognition. Advances in Neural Information Processing Systems 35 (Dec. 2022), 9361–9373. https://proceedings.neurips.cc/paper_files/paper/2022/hash/3ccf6da39eeb8fefc8bbb1b0124adbd1-Abstract-Conference.html
- Soohan Kim, Seok-Bae Yun, Hyeong-Ohk Bae, Muhyun Lee, and Youngjoon Hong. 2024. Physics-informed convolutional transformer for predicting volatility surface. Quantitative Finance 24, 2 (Feb. 2024), 203–220. https://doi.org/10.1080/14697688.2023.2294799
- Tae Wan Kim and Matloob Khushi. 2020. Portfolio Optimization with 2D Relative-Attentional Gated Transformer. In 2020 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE). IEEE, Gold Coast, Australia, 1–6. https://doi.org/10.1109/CSDE50874.2020.9411635
- Nikita Kitaev, Lukasz Kaiser, and Anselm Levskaya. 2020. Reformer: The Efficient Transformer. In International Conference on Learning Representations. ICLR, Online, N/A. https://doi.org/10.48550/arXiv.2001.04451
- Junghwan Lee, Chen Xu, and Yao Xie. 2024. Transformer Conformal Prediction for Time Series. http://arxiv.org/abs/2406.05332 arXiv:2406.05332 [cs].
- Yawei Li, Shuqi Lv, Xinghua Liu, and Qiuyue Zhang. 2022. Incorporating Transformers and Attention Networks for Stock Movement Prediction. Complexity 2022 (Feb. 2022), 1–10. https://doi.org/10.1155/2022/7739087
- Yinheng Li, Shaofei Wang, Han Ding, and Hang Chen. 2023. Large Language Models in Finance: A Survey. In 4th ACM International Conference on AI in Finance. ACM, Brooklyn NY USA, 374–382. https://doi.org/10.1145/3604237.3626869
- Yuanzhe Li, Yue Wu, and Peng Yang. 2024. SimLOB: Learning Representations of Limited Order Book for Financial Market Simulation. http://arxiv.org/abs/2406.19396 arXiv:2406.19396 [cs].
- Bryan Lim, Sercan Ö. Arık, Nicolas Loeff, and Tomas Pfister. 2021. Temporal Fusion Transformers for interpretable multi-horizon time series forecasting. International Journal of Forecasting 37, 4 (2021), 1748–1764. https://doi.org/10.1016/j.ijforecast.2021.03.012
- Chang Liu, Jie Yan, Feiyue Guo, and Min Guo. 2022. Forecasting the Market with Machine Learning Algorithms: An Application of NMC-BERT-LSTM-DQN-X Algorithm in Quantitative Trading. ACM Transactions on Knowledge Discovery from Data 16, 4 (Aug. 2022), 1–22. https://doi.org/10.1145/3488378
- Yang Liu, Yao Zhang, Yixin Wang, Feng Hou, Jin Yuan, Jiang Tian, Yang Zhang, Zhongchao Shi, Jianping Fan, and Zhiqiang He. 2024. A Survey of Visual Transformers. IEEE Transactions on Neural Networks and Learning Systems 35, 6 (June 2024), 7478–7498. https://doi.org/10.1109/TNNLS.2022.3227717
- Alejandro Lopez-Lira and Yuehua Tang. 2023. Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models. http://arxiv.org/abs/2304.07619 arXiv:2304.07619 [cs, q-fin].
- Ambarish Moharil, Joaquin Vanschoren, Prabhant Singh, and Damian Tamburri. 2024. Towards efficient AutoML: a pipeline synthesis approach leveraging pre-trained transformers for multimodal data. Machine Learning 113 (July 2024), 7011–7053. https://doi.org/10.1007/s10994-024-06568-1
- Rishi K. Narang. 2013. Inside the black box: a simple guide to quantitative and high-frequency trading (second edition ed.). John Wiley & Sons, Inc, Hoboken, New Jersey.
- Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. 2023. A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. In International Conference on Learning Representations. ICLR, Online. https://doi.org/10.48550/arXiv.2211.14730
- Kenniy Olorunnimbe and Herna Viktor. 2024. Ensemble of temporal Transformers for financial time series. Journal of Intelligent Information Systems 62 (March 2024), 1087–1111. https://doi.org/10.1007/s10844-024-00851-2
- Cédric Poutré, Didier Chételat, and Manuel Morales. 2024. Deep unsupervised anomaly detection in high-frequency markets. The Journal of Finance and Data Science 10 (Dec. 2024), 100129. https://doi.org/10.1016/j.jfds.2024.100129
- Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving Language Understanding by Generative Pre-Training. https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
- Eduardo Ramos-Pérez, Pablo J. Alonso-González, and José Javier Núñez-Velázquez. 2021. Multi-Transformer: A New Neural Network-Based Architecture for Forecasting S&P Volatility. Mathematics 9, 15 (July 2021), 1794. https://doi.org/10.3390/math9151794
- Shosuke Sakagawa and Naoki Mori. 2022. Neural Ranking Strategy for Portfolio Construction Using Transformers. In 2022 13th International Congress on Advanced Applied Informatics Winter (IIAI-AAI-Winter). IEEE, Phuket, Thailand, 95–100. https://doi.org/10.1109/IIAI-AAI-Winter58034.2022.00029
- Attilio Sbrana and Paulo André Lima De Castro. 2023. N-BEATS Perceiver: A Novel Approach for Robust Cryptocurrency Portfolio Forecasting. Computational Economics 64 (Sept. 2023), 1047–1081. https://doi.org/10.1007/s10614-023-10470-8
- Hugo Gobato Souto and Amir Moradi. 2024. Can transformers transform financial forecasting? China Finance Review International ahead-of-print (June 2024). https://doi.org/10.1108/CFRI-01-2024-0032
- Jifeng Sun, Wentao Fu, Jianwu Lin, Yong Jiang, and Shu-Tao Xia. 2022. Deep Portfolio Optimization Modeling based on Conv-Transformers with Graph Attention Mechanism. In 2022 International Joint Conference on Neural Networks (IJCNN). IEEE, Padua, Italy, 01–08. https://doi.org/10.1109/IJCNN55064.2022.9892317
- Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to Sequence Learning with Neural Networks. http://arxiv.org/abs/1409.3215 arXiv:1409.3215 [cs].
- Anh Tong, Thanh Nguyen-Tang, Dongeun Lee, Toan M Tran, and Jaesik Choi. 2023. SigFormer: Signature Transformers for Deep Hedging. In 4th ACM International Conference on AI in Finance. ACM, Brooklyn NY USA, 124–132. https://doi.org/10.1145/3604237.3626841
- Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. 2023. LLaMA: Open and Efficient Foundation Language Models. http://arxiv.org/abs/2302.13971 arXiv:2302.13971 [cs].
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems, Vol. 30. Curran Associates, Inc., Long Beach, CA, USA, 5998–6008. https://proceedings.neurips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
- Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick Von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush. 2020. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Association for Computational Linguistics, Online, 38–45. https://doi.org/10.18653/v1/2020.emnlp-demos.6
- Kieran Wood, Sven Giegerich, Stephen Roberts, and Stefan Zohren. 2022. Trading with the Momentum Transformer: An Intelligent and Interpretable Architecture. http://arxiv.org/abs/2112.08534 arXiv:2112.08534 [cs, q-fin, stat].
- Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. 2021. Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. In Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P. S. Liang, and J. Wortman Vaughan (Eds.). Vol. 34. Curran Associates, Inc., Online, 22419–22430. https://proceedings.neurips.cc/paper_files/paper/2021/file/bcc0d400288793e8bdcd7c19a8ac0c2b-Paper.pdf
- Bing Yang, Ting Liang, Jian Xiong, and Chong Zhong. 2023. Deep reinforcement learning based on transformer and U-Net framework for stock trading. Knowledge-Based Systems 262 (Feb. 2023), 110211. https://doi.org/10.1016/j.knosys.2022.110211
- Ailing Zeng, Muxi Chen, Lei Zhang, and Qiang Xu. 2023. Are Transformers Effective for Time Series Forecasting? Proceedings of the AAAI Conference on Artificial Intelligence 37, 9 (June 2023), 11121–11128. https://doi.org/10.1609/aaai.v37i9.26317
- Haohan Zhang, Fengrui Hua, Chengjin Xu, Hao Kong, Ruiting Zuo, and Jian Guo. 2024. Unveiling the Potential of Sentiment: Can Large Language Models Predict Chinese Stock Price Movements? http://arxiv.org/abs/2306.14222 arXiv:2306.14222 [cs, q-fin].
- Zhaofeng Zhang, Banghao Chen, Shengxin Zhu, and Nicolas Langrené. 2024. From attention to profit: quantitative trading strategy based on transformer. http://arxiv.org/abs/2404.00424 arXiv:2404.00424 [cs, q-fin].
- Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. 2021. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. AAAI, Online, 11106–11115. https://doi.org/10.1609/aaai.v35i12.17325
- Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. 2022. FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting. In Proceedings of the 39th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 162), Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato (Eds.). PMLR, Online, 27268–27286. https://proceedings.mlr.press/v162/zhou22g.html