About the Book
PART I - Understanding Machine Learning
Chapter 1: Machine Learning Basics
Chapter Goal: This chapter familiarizes readers with the basics of machine learning and the industry-standard workflows followed for machine learning processes, and expands on the different types of machine learning and deep learning algorithms.
No of pages: 50 - 60
Sub - Topics:
1. Brief on machine learning, definitions and concepts
2. Industry standard for data mining processes - CRISP-DM and adoption in ML
3. Brief on data processing, visualization, feature extraction/engineering concepts
4. Types of learning algorithms - supervised, unsupervised, reinforcement learning
5. Advanced models - time series, deep learning
6. Model building and validation concepts
7. Applications of machine learning

Chapter 2: The Python Machine Learning Ecosystem
Chapter Goal: This chapter introduces readers to the Python language and the entire ecosystem built around machine learning with Python tools, frameworks and libraries. An overview and code samples are given for each tool to depict its usage and effectiveness.
No of pages: 50 - 60
Sub - Topics:
1. Brief on Python
2. Why is Python effective for machine learning and data science
3. Brief overview of the Python ecosystem followed by data scientists (includes the Anaconda distribution)
4. Reproducible research with IPython
5. Data processing and computing with pandas, NumPy, SciPy
6. Statistical learning with statsmodels
7. ML frameworks - scikit-learn, PyML, etc.
8. NLP frameworks - NLTK, pattern, spaCy
9. DL frameworks - Theano, TensorFlow, Keras
PART II - The Machine Learning Pipeline
Chapter 3: Processing, wrangling and visualizing data
Sub - Topics:
1. Data retrieval mechanisms (crawling, databases, APIs, etc.)
2. Data processing (handling various forms of data - SQL, JSON, XML, images)
3. Data attributes and features (numeric, categorical, etc.)
4. Data wrangling (cleaning, handling missing values, normalizing data)
5. Data summarization
6. Data visualization (bar, histogram, boxplot, line, scatter, etc.)
Chapter 4: Feature Engineering and Selection
Chapter Goal: This chapter focuses on the next stage in the ML pipeline: feature extraction, engineering and selection. Readers will learn about both basic and advanced feature engineering methods for different data formats including numeric, text and images. We will also focus on methods for effective feature selection.
No of pages: 50 - 60
Sub - Topics:
1. Features - understanding your data
2. Basic feature engineering
3. Extracting features from numeric, categorical variables
4. Extracting features from date/timestamp variables
5. Extracting basic features from textual data (bag of words)
6. Advanced feature engineering
7. Extracting complex features from textual data (word vectorization, TF-IDF, topic models)
8. Extracting features from images (pixels, edge detection, shapes)
9. Time series features
10. Feature scaling and standardization
11. Feature selection techniques
12. Using forward/backward selection techniques
13. Using machine learning models like random forests
14. Other methods
Chapter 5: Building, tuning and deploying models
Chapter Goal: This chapter focuses on the final stage in the ML pipeline where readers will learn how to fit and build models on data features, how to optimize and tune model
About the Author: Dipanjan Sarkar is a Data Scientist at Intel, on a mission to make the world more connected and productive. He primarily works on data science, analytics, business intelligence, application development, and building large-scale intelligent systems. He holds a Master of Technology degree in Information Technology with specializations in Data Science and Software Engineering from the International Institute of Information Technology, Bangalore. He is also an avid supporter of self-learning, especially through Massive Open Online Courses, and holds a Data Science Specialization from Johns Hopkins University on Coursera.
Dipanjan has been an analytics practitioner for several years, specializing in statistical, predictive, and text analytics. Having a passion for data science and education, he is a Data Science Mentor at Springboard, helping people upskill in areas like data science and machine learning. Dipanjan has also authored several books on R, Python, Machine Learning and Analytics, including Text Analytics with Python (Apress, 2016). Besides this, he occasionally reviews technical books and acts as a course beta tester for Coursera. Dipanjan's interests include learning about new technology, financial markets, disruptive start-ups, data science and, more recently, artificial intelligence and deep learning.
Raghav Bali has a master's degree (gold medalist) in Information Technology from the International Institute of Information Technology, Bangalore. He is a Data Scientist at Intel, where he works on analytics, business intelligence, and application development to develop scalable machine learning-based solutions. He has also worked as an analyst and developer in domains such as ERP, finance, and BI with some of the leading organizations in the world. Raghav is a technology enthusiast who loves reading and playing around with new gadgets and technologies. He has also authored several books on R, Machine Learning and Analytics. He is a shutterbug, capturing moments when he isn't busy solving problems.
Tushar Sharma has a master's degree from the International Institute of Information Technology, Bangalore. He works as a Data Scientist with Intel. His work involves developing analytical solutions at scale using enormous volumes of infrastructure data. In his previous role, he worked in the financial domain, developing scalable machine learning solutions for major financial organizations. He is proficient in Python, R and Big Data frameworks like Spark and Hadoop. Apart from work, Tushar enjoys watching movies and playing badminton, and is an avid reader. He has also authored a book on R and social media analytics.