Introduction
Goal: The problem of dataset cleaning and why better design is needed
Who this book is for
Chapter 1: Basic Data Types
Goal: Understanding data types
Nominal, ordinal, interval, ratio, other
How/why to choose specific representations
Chapter 2: Planning Your Data Collection
Goal: Preventive action to avoid data creation errors
Anticipating your required analysis
The goals of descriptive statistics and visualizations
The goals of relationship statistics and visualizations
Independent and dependent variables
Chapter 3: Dataset Structures
Goal: Understanding how to structure/store data
Types of datasets
.csv, SQL, Excel, Web, JSON
Sharing data (open formats)
Managing datasets
Chapter 4: Data Collection Issues
Goal: Understanding how to collect data
Understanding and avoiding bias
Sampling
Chapter 5: Examples and Use Cases
Goal: Illustrate good and not-so-good datasets
Chapter 6: Tools for Dataset Cleaning
Goal: Still need some data cleanup? Here's some help
Data cleaning using R, Python, commercial tools (e.g., Tableau)
Annotated References
Goal: Include helpful data design and cleaning references
About the Author: Harry J. Foxwell is a professor at George Mason University, where he teaches graduate data analytics courses in the Department of Information Sciences and Technology and designed the data analytics curricula for his courses. He draws on decades of experience as Principal System Engineer for Oracle and for other major IT companies to help his students understand the concepts, tools, and practices of big data projects. He is co-author of several books on operating systems administration. A US Army combat veteran, he served in Vietnam as a Platoon Sergeant in the First Infantry Division. He lives in Fairfax, Virginia, with his wife Eileen and two bothersome cats.