Chapter 1: Extracting the Data
Chapter Goal: Understanding the potential data sources for building NLP applications that deliver business benefits, and ways to extract text data, with examples.
No of pages: 23
Sub - Topics:
1. Data extraction through API
2. Reading HTML page, HTML parsing
3. Reading PDF files in Python
4. Reading word document
5. Regular expressions using Python
6. Handling strings using Python
7. Web scraping
Chapter 2: Exploring and Processing the Text Data
Chapter Goal: Data is never clean. This chapter gives in-depth knowledge of how to clean and process text data. It covers topics such as cleaning, tokenizing, and normalizing text data.
No of pages: 22
Sub - Topics:
1. Text preprocessing methods
2. Data cleaning - punctuation removal, stopwords removal, spelling correction
3. Lexicon normalization - stemming and lemmatization
4. Tokenization
5. Dealing with emoticons and emojis
6. Exploratory data analysis
7. End to end text processing pipeline implementation
Chapter 3: Text to Features
Chapter Goal: One of the important tasks with text data is transforming it into a form that machines and algorithms can understand, using different feature engineering methods (basic to advanced).
No of pages: 40
Sub - Topics:
1. One hot encoding
2. Count vectorizer
3. N-grams
4. Co-occurrence matrix
5. Hashing vectorizer
6. TF-IDF
7. Word embeddings - word2vec, fastText
8. GloVe embeddings
9. ELMo
10. Universal Sentence Encoder
11. Understanding Transformers such as BERT and GPT
12. Open AIs
Chapter 4: Implementing Advanced NLP
Chapter Goal: Understanding and building advanced NLP techniques to solve business problems, from text similarity to speech recognition and language translation.
No of pages: 25
Sub - Topics:
1. Noun phrase extraction
2. Text similarity
3. Parts of speech tagging
4. Information extraction - NER - entity recognition
5. Topic modeling
6. Machine learning for NLP -
a. Text classification
7. Sentiment analysis
8. Word sense disambiguation
9. Speech recognition and speech to text
10. Text to speech
11. Language detection and translation
Chapter 5: Deep Learning for NLP
Chapter Goal: Unlocking the power of deep learning on text data. Solving a few real-time applications of deep learning in NLP.
No of pages: 55
Sub - Topics:
1. Fundamentals of deep learning
2. Information retrieval using word embeddings
3. Text classification using deep learning approaches (CNN, RNN, LSTM, Bi-directional LSTM)
4. Natural language generation - predicting the next word or sequence of words using LSTM
5. Text summarization using LSTM encoder and decoder.
6. Sentence comparison using SentenceBERT
7. Understanding GPT
8. Comparison between BERT, RoBERTa, DistilBERT, XLNet
Chapter 6: Industrial Application with End to End Implementation
Chapter Goal: Solving real-time NLP applications with end to end implementation using Python, from framing and understanding the business problem to deploying the model.
No of pages: 90
Sub - Topics:
1. Consumer complaint classification
2. Customer reviews sentiment prediction
3. Data stitching using text similarity and record linkage
4. Text summarization for subject notes
5. Document clustering
6.
About the Author:
Akshay Kulkarni is an AI and machine learning evangelist and thought leader. He has consulted with Fortune 500 and global enterprises to drive AI and data science-led strategic transformations. He has rich experience building and scaling AI and machine learning businesses and creating significant client impact. Akshay is currently Manager-Data Science & AI at Publicis Sapient, where he is part of strategy and transformation interventions through AI. He manages high-priority growth initiatives around data science, works on AI engagements, and applies state-of-the-art techniques. Akshay is a Google Developers Expert-Machine Learning and a published author of books on NLP and deep learning. He is a regular speaker at major AI and data science conferences, including Strata, O'Reilly AI Conf, and GIDS. In 2019, he was featured as one of the Top "40 under 40 Data Scientists" in India. In his spare time, he enjoys reading, writing, coding, and helping aspiring data scientists. He lives in Bangalore with his family.
Adarsha Shivananda is Lead Data Scientist at Indegene's Product and Technology team, where he leads a group of analysts who enable predictive analytics and AI features for all of their healthcare software products. They handle multi-channel activities for pharma products and solve real-time problems encountered by pharma sales reps. Adarsha aims to build a pool of exceptional data scientists within the organization and to solve greater healthcare problems through training programs and staying ahead of the curve. His core expertise involves machine learning, deep learning, recommendation systems, and statistics. Adarsha has worked on data science projects across multiple domains using different technologies and methodologies. Previously, he was part of Tredence Analytics and IQVIA. He lives in Bangalore and loves to read and teach data science.