This book provides an overview of recent advances in representation learning theory, algorithms, and applications for natural language processing (NLP), ranging from word embeddings to pre-trained language models. It is divided into four parts. Part I presents representation learning techniques for multiple language entries, including words, sentences and documents, as well as pre-training techniques. Part II introduces representation techniques related to NLP, including graphs, cross-modal entries, and robustness. Part III introduces representation techniques for knowledge closely related to NLP, including entity-based world knowledge, sememe-based linguistic knowledge, legal domain knowledge and biomedical domain knowledge. Lastly, Part IV discusses the remaining challenges and future research directions.
The theories and algorithms of representation learning presented can also benefit other related domains such as machine learning, social network analysis, semantic Web, information retrieval, data mining and computational biology. This book is intended for advanced undergraduate and graduate students, post-doctoral fellows, researchers, lecturers, and industrial engineers, as well as anyone interested in representation learning and natural language processing.
Compared to the first edition, the second edition (1) provides a more detailed introduction to representation learning in Chapter 1; (2) adds four new chapters that introduce pre-trained language models, robust representation learning, legal knowledge representation learning and biomedical knowledge representation learning; (3) updates all chapters with recent advances in representation learning; and (4) corrects some errors in the first edition. New content amounts to approximately 50% or more relative to the first edition.
This is an open access book.
About the Authors: Zhiyuan Liu is an Associate Professor at the Department of Computer Science and Technology at Tsinghua University, China. His research interests include pretrained language models, knowledge graphs and social computation, and he has published more than 120 papers at leading conferences and in respected journals, with over 28,000 Google Scholar citations. He has received several awards and honors, including Excellent Doctoral Dissertation awards from Tsinghua University and the Chinese Association for Artificial Intelligence, and was named one of the MIT Technology Review Innovators Under 35 China (MIT TR-35 China). He has served as area chair for various conferences, including ACL, EMNLP and COLING.
Yankai Lin is an Assistant Professor at Gaoling School of Artificial Intelligence, Renmin University of China. His research interests include pretrained language models and knowledge-guided natural language processing. He has published more than 50 papers at leading conferences, including ACL, EMNLP, IJCAI, AAAI and NeurIPS, with over 8,000 Google Scholar citations. He was named an Academic Rising Star of Tsinghua University and a Baidu Scholar. He has served as area chair for EMNLP and ACL ARR.
Maosong Sun is a professor at the Department of Computer Science and Technology and the executive vice-dean of the Institute for Artificial Intelligence, Tsinghua University. His research interests include natural language processing, artificial intelligence, computational humanities and social sciences. He was a project chief scientist of the National Key Basic Research and Development Program (973 Program) of China. He has published over 200 papers at leading academic conferences and in respected journals, with over 30,000 Google Scholar citations. He is the director of the Tsinghua University-National University of Singapore Joint Research Center on Next Generation Search Technologies, and the editor-in-chief of the Journal of Chinese Information Processing. He received the National Outstanding Practitioner Award from the State Commission for Language Affairs, People's Republic of China in 2007, and the National Excellent Scientific and Technological Practitioner Award from the China Association for Science and Technology in 2016. He became a Member of the Academia Europaea in 2020 and a Fellow of the Association for Computational Linguistics (ACL) in 2022.