Computational approaches that draw on large datasets to investigate public health
information are an important tool for institutions seeking to identify strategies
for improving public health. The art in such approaches, for example
in health research, lies in managing the trade-offs between two perspectives:
first, inference and second, prediction. Many techniques from statistical methods
(SM) and machine learning (ML) may, in principle, be used for both perspectives.
However, SM has a well-established focus on inference: it builds probabilistic
models, which allow us to quantify our confidence in
the magnitude of an effect. Simulation-based validation approaches can be used
in conjunction with SM to explicitly verify assumptions and redefine the specified
model, if necessary. On the other hand, ML uses general-purpose algorithms
to find patterns that best predict the outcome and makes minimal assumptions
about the data-generating process, and it may be more effective in a number of situations.
My work employs both SM- and ML-based computational approaches to
investigate particular public health problems. Chapter One provides philosophical
background and compares the application of the two approaches in public health.
Chapter Two describes and implements penalized Cox proportional hazards models
for time-to-event data with time-varying covariates. Chapter Three applies traditional
survival models and machine learning algorithms to predict survival times of cancer
patients while incorporating information on time-varying covariates.
Chapter Four discusses and implements various approaches for computing predictions
and effects for generalized linear (mixed) models. Finally, Chapter Five
implements and compares various statistical models for handling univariate and
multivariate binary outcomes for water, sanitation and hygiene (WaSH) data.