Session 1 – 01:20:00 – Poisson and Extensions
This session begins with the Poisson GLM, covering its assumptions, likelihood, and interpretation. We then highlight the limitations of the Poisson distribution, particularly with respect to overdispersion and zero inflation. To address these issues, we introduce and compare extensions including the Negative Binomial, the Generalised Poisson, and the Conway–Maxwell–Poisson (COMP).
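As a reference point for this session, the Poisson GLM and the source of overdispersion can be written compactly. The notation below ($\mu_i$, $\mathbf{x}_i$, $\boldsymbol{\beta}$, $\theta$) is assumed for illustration, and the Negative Binomial is shown in its common quadratic-variance (NB2) parameterisation, which may differ from the one used in the session.

```latex
% Poisson GLM with log link (notation assumed for illustration)
Y_i \sim \mathrm{Poisson}(\mu_i), \qquad \log \mu_i = \mathbf{x}_i^{\top}\boldsymbol{\beta}

% Probability mass function and log-likelihood
P(Y_i = y) = \frac{e^{-\mu_i}\,\mu_i^{y}}{y!}, \qquad
\ell(\boldsymbol{\beta}) = \sum_{i=1}^{n} \bigl( y_i \log \mu_i - \mu_i - \log y_i! \bigr)

% Poisson forces the variance to equal the mean
\mathrm{E}(Y_i) = \mathrm{Var}(Y_i) = \mu_i

% Negative Binomial (NB2): extra parameter \theta > 0 allows variance > mean
\mathrm{Var}(Y_i) = \mu_i + \frac{\mu_i^{2}}{\theta}
```

Overdispersion means the observed variance exceeds $\mu_i$; as $\theta \to \infty$ the NB2 variance reduces to the Poisson case.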
Session 2 – 01:20:00 – Model Validation
In this session, we turn to residual diagnostics, introducing simulated residuals and formal tests for model misfit. Participants will learn how to use the DHARMa package in R to detect overdispersion and zero inflation. The session concludes with a practical exercise on validating fitted models using real data.
Session 3 – 01:20:00 – Zero-Inflated Models
This session introduces the theoretical basis of zero-inflated models. We describe the mixture interpretation, in which zeros can arise either structurally or through the sampling process. We then present the Zero-Inflated Poisson (ZIP) and Zero-Inflated Negative Binomial (ZINB) models in detail, discussing their assumptions and mathematical formulation. Participants learn how to fit zero-inflated models in R using the pscl::zeroinfl and glmmTMB functions.
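The mixture interpretation can be sketched for the ZIP case as follows. Here $\pi$ denotes the probability of a structural zero and $\mu$ the Poisson mean; both symbols are assumed for illustration, and in practice each may depend on covariates through its own link function.

```latex
% Zero-Inflated Poisson: mixture of a point mass at zero and a Poisson count
P(Y = 0) = \pi + (1 - \pi)\, e^{-\mu}

P(Y = y) = (1 - \pi)\, \frac{e^{-\mu}\,\mu^{y}}{y!}, \qquad y = 1, 2, \dots

% Resulting mean and variance (variance exceeds the mean whenever \pi > 0)
\mathrm{E}(Y) = (1 - \pi)\,\mu, \qquad
\mathrm{Var}(Y) = (1 - \pi)\,\mu\,(1 + \pi\mu)
```

The first term in $P(Y = 0)$ corresponds to structural zeros and the second to zeros produced by the count process itself.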
Session 4 – 01:20:00 – Hurdle and Truncated Models
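In this session, we introduce hurdle models and explain their conceptual differences from zero-inflated models. We focus on the Zero-Altered Poisson and Zero-Altered Negative Binomial, which model all zeros through a single binary component and the positive counts through a zero-truncated distribution, whereas the ZIP and ZINB allow zeros to arise from both the structural and the count process.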
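For comparison with the zero-inflated formulation, a Zero-Altered Poisson can be written as below. The symbols are assumed for illustration: $\pi$ is here the probability of observing a zero (from the binary component) and $\mu$ is the parameter of the zero-truncated Poisson.

```latex
% Zero-Altered (hurdle) Poisson: binary part for zero vs. non-zero,
% zero-truncated Poisson for the positive counts
P(Y = 0) = \pi

P(Y = y) = (1 - \pi)\, \frac{e^{-\mu}\,\mu^{y} / y!}{1 - e^{-\mu}}, \qquad y = 1, 2, \dots
```

Dividing by $1 - e^{-\mu}$ renormalises the Poisson probabilities over the positive integers, so the count component cannot generate zeros.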
Session 4 – 01:00:00 – Exploratory Data Analysis for Machine Learning
Using visual and numerical summaries, this session examines how to perform exploratory data analysis (EDA) in a machine learning context. Relationships between variables, outliers, and data distributions are explored using tools such as ggplot2.
Session 5 – 01:00:00 – Model Fitting Frameworks in R
This session introduces the use of the caret and tidymodels frameworks for fitting machine learning models in R. Topics include data splitting, resampling, defining workflows, and generating predictions.
Session 6 – 01:00:00 – Baseline Models and Performance Metrics
This session focuses on constructing simple models and introducing key performance metrics. These include RMSE, MAE, accuracy, and AUC, with discussion of how and when to use each metric depending on the learning task.
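The metrics named above have the following standard definitions. The notation is assumed for illustration: $y_i$ is the observed value, $\hat{y}_i$ the prediction, $n$ the number of observations, and $s(\cdot)$ a classifier score.

```latex
% Regression metrics
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^{2}}, \qquad
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert

% Classification metrics
\mathrm{Accuracy} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{ y_i = \hat{y}_i \}

% AUC as a ranking probability (ignoring ties): a randomly chosen positive
% case X^{+} receives a higher score than a randomly chosen negative case X^{-}
\mathrm{AUC} = P\bigl( s(X^{+}) > s(X^{-}) \bigr)
```

RMSE penalises large errors more heavily than MAE because the errors are squared before averaging; AUC depends only on the ranking of scores, not on a classification threshold.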