Session 1 – 01:00:00 – Introduction to Machine Learning
This session introduces the core ideas of machine learning, including how it differs from traditional statistical modelling. Topics include the difference between supervised and unsupervised learning, the contrast between classification and regression, and typical applications across various domains.
Session 2 – 01:00:00 – The Machine Learning Workflow
The focus in this session is on the structure of a typical machine learning pipeline. This includes steps such as data splitting, preprocessing, model training, validation, and performance evaluation, with emphasis on reproducibility and good workflow practices.
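As a minimal sketch of the first workflow step, a reproducible train/test split can be written in base R (using the built-in mtcars dataset as a stand-in for course data):

```r
# Reproducible 80/20 train/test split in base R, illustrated on the
# built-in mtcars dataset.
set.seed(42)                                  # fix the RNG for reproducibility
n <- nrow(mtcars)
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# Fit on the training set only; evaluate on the held-out test set.
fit  <- lm(mpg ~ wt + hp, data = train)
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))
```

Fitting only on `train` and scoring only on `test` is the point: the held-out error estimates how the model will behave on data it has not seen.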
Session 3 – 01:00:00 – Data Preprocessing in R
This session explores key steps in preparing data for machine learning. Topics include dealing with missing values, scaling and transforming variables, encoding categorical predictors, and building preprocessing pipelines using the recipes package.
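The course builds these steps with the recipes package; as a hedged base-R sketch of the same ideas (on a small made-up data frame):

```r
# Base-R versions of three preprocessing steps: imputation, scaling,
# and one-hot encoding. The data frame here is illustrative only.
df <- data.frame(
  height  = c(150, 160, NA, 180),
  species = factor(c("a", "b", "a", "c"))
)

# 1. Impute the missing value with the column mean.
df$height[is.na(df$height)] <- mean(df$height, na.rm = TRUE)

# 2. Centre and scale the numeric predictor.
df$height_z <- as.numeric(scale(df$height))

# 3. One-hot encode the categorical predictor (dropping the intercept
#    so every level gets its own indicator column).
dummies <- model.matrix(~ species - 1, data = df)
```

In a real pipeline these transformations must be learned from the training data and then applied unchanged to the test data, which is exactly what a recipes workflow enforces.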
Session 4 – 01:00:00 – Exploratory Data Analysis for Machine Learning
This session examines how to perform exploratory data analysis (EDA) in a machine learning context using visual and numerical summaries. Relationships between variables, outliers, and data distributions are explored with tools such as ggplot2.
Session 5 – 01:00:00 – Model Fitting Frameworks in R
This session introduces the use of the caret and tidymodels frameworks for fitting machine learning models in R. Topics include data splitting, resampling, defining workflows, and generating predictions.
Session 6 – 01:00:00 – Baseline Models and Performance Metrics
This session focuses on constructing simple models and introducing key performance metrics. These include RMSE, MAE, accuracy, and AUC, with discussion of how and when to use each metric depending on the learning task.
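The session's core metrics are simple enough to write directly in base R, which makes their differences concrete (RMSE and MAE apply to regression, accuracy to classification):

```r
# Core performance metrics in base R.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
mae  <- function(actual, predicted) mean(abs(actual - predicted))
accuracy <- function(actual, predicted) mean(actual == predicted)

y     <- c(3.0, 5.0, 2.5)
y_hat <- c(2.5, 5.0, 3.5)
rmse(y, y_hat)   # squares the errors, so large misses dominate
mae(y, y_hat)    # every unit of error counts equally
```

Comparing the two on the same predictions shows why the choice matters: RMSE exceeds MAE whenever the errors are uneven, so it is the stricter metric when large mistakes are costly.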
Session 7 – 01:00:00 – Linear Regression for Prediction
This session focuses on linear regression as a machine learning method. Model assumptions and limitations are reviewed, and a predictive modelling approach is demonstrated using continuous outcomes.
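A minimal sketch of linear regression used predictively, on the built-in mtcars data (the new car below is hypothetical):

```r
# Fit a linear model and use it to predict a new observation.
fit <- lm(mpg ~ wt + hp, data = mtcars)
coef(fit)                                  # interpretable slope estimates

# Predict fuel efficiency for a new (hypothetical) car, with a
# prediction interval to quantify uncertainty about the outcome.
new_car <- data.frame(wt = 3.0, hp = 150)
pred <- predict(fit, newdata = new_car, interval = "prediction")
```

The `interval = "prediction"` argument returns lower and upper bounds alongside the point estimate, which is the predictive framing of the session: not just a fitted line, but a statement about plausible new outcomes.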
Session 8 – 01:00:00 – Regularised Regression: Ridge, Lasso, and Elastic Net
The session introduces the concept of overfitting and presents regularisation methods as solutions. Ridge, lasso, and elastic net regression are implemented using the glmnet package and compared in terms of predictive performance.
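The course implements these models with glmnet; as a sketch of the idea behind ridge regression specifically, the penalised estimate has a closed form that can be computed in base R:

```r
# Ridge regression by hand: beta_hat = (X'X + lambda * I)^{-1} X'y.
# (Illustrative only; the course uses glmnet, which also covers lasso
# and elastic net, for which no closed form exists.)
X <- scale(as.matrix(mtcars[, c("wt", "hp", "disp")]))   # standardise predictors
y <- mtcars$mpg - mean(mtcars$mpg)                        # centre the outcome

ridge <- function(X, y, lambda) {
  p <- ncol(X)
  solve(crossprod(X) + lambda * diag(p), crossprod(X, y))
}

beta_ols   <- ridge(X, y, lambda = 0)      # lambda = 0 recovers least squares
beta_ridge <- ridge(X, y, lambda = 100)    # larger lambda shrinks coefficients
```

Increasing the penalty shrinks the coefficient vector towards zero, trading a little bias for lower variance, which is exactly the mechanism by which regularisation combats overfitting.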
Session 9 – 01:00:00 – Tree-Based Regression Models
Regression trees, random forests, and gradient boosting machines are explored as non-linear alternatives to linear models. The session demonstrates how tree-based models are constructed and how they handle complex interactions.
Session 10 – 01:00:00 – Evaluating and Comparing Regression Models
A variety of regression models are compared using resampling, cross-validation, and diagnostic plots. Emphasis is placed on selecting models based on appropriate error metrics.
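The cross-validation loop that frameworks like caret and tidymodels automate can be sketched directly in base R:

```r
# 5-fold cross-validation for a regression model, by hand.
set.seed(7)
k     <- 5
folds <- sample(rep(1:k, length.out = nrow(mtcars)))   # random fold labels

cv_rmse <- sapply(1:k, function(i) {
  train <- mtcars[folds != i, ]                        # fit on k-1 folds
  test  <- mtcars[folds == i, ]                        # evaluate on the rest
  fit   <- lm(mpg ~ wt + hp, data = train)
  sqrt(mean((test$mpg - predict(fit, newdata = test))^2))
})
mean(cv_rmse)   # average held-out error across folds
```

Repeating this loop for each candidate model and comparing the averaged errors is the basic recipe for model comparison by resampling.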
Session 11 – 01:00:00 – Hyperparameter Tuning for Regression Models
This session introduces hyperparameter tuning using grid and random search. The tune and caret packages are used to demonstrate how tuning can improve model accuracy and reduce overfitting.
Session 12 – 01:00:00 – Example: Predicting Housing Prices
A worked example is presented, demonstrating a regression pipeline from raw data through to prediction. Techniques from earlier sessions are applied to a dataset on housing prices, including preprocessing, model fitting, evaluation, and comparison.
Session 13 – 01:00:00 – Logistic Regression for Classification
This session introduces logistic regression for binary outcomes, including model interpretation and probability estimation. The extension to multiclass classification is also discussed.
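A minimal base-R sketch of binary logistic regression, predicting transmission type (`am`) from weight in the built-in mtcars data:

```r
# Logistic regression for a binary outcome with glm().
fit   <- glm(am ~ wt, data = mtcars, family = binomial)
probs <- predict(fit, type = "response")    # estimated P(am = 1)
preds <- as.integer(probs > 0.5)            # classify at a 0.5 threshold
mean(preds == mtcars$am)                    # training accuracy
```

The model returns probabilities, not labels; the 0.5 threshold shown here is a choice, and later sessions on imbalanced data revisit why a different cut-off is sometimes preferable.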
Session 14 – 01:00:00 – Classification Trees and Random Forests
Classification trees and random forests are explored for categorical outcomes. Topics include model structure, visual interpretation, variable importance, and performance evaluation.
Session 15 – 01:00:00 – k-Nearest Neighbours and Naïve Bayes Classifiers
Two contrasting classification approaches are introduced: k-nearest neighbours (a distance-based method) and naïve Bayes (a probabilistic model).
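The distance-based idea behind k-NN is simple enough to hand-roll in base R (in practice the `knn()` function from the class package is the usual tool):

```r
# A hand-rolled k-nearest-neighbours classifier, for illustration only.
knn_predict <- function(train_x, train_y, new_x, k = 3) {
  d <- sqrt(colSums((t(train_x) - new_x)^2))   # Euclidean distance to each row
  neighbours <- train_y[order(d)[1:k]]         # labels of the k closest points
  names(which.max(table(neighbours)))          # majority vote
}

# Classify an iris flower from its four standard measurements.
x <- as.matrix(iris[, 1:4])
knn_predict(x, iris$Species, new_x = c(5.0, 3.4, 1.5, 0.2), k = 5)
```

There is no training step at all: the "model" is the data itself, which is the key contrast with naïve Bayes, where class-conditional probabilities are estimated up front.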
Session 16 – 01:00:00 – Boosted Trees for Classification with xgboost
Boosting is introduced as an ensemble technique to improve model performance. The xgboost package is used to demonstrate how gradient boosting builds strong classifiers from weak learners.
Session 17 – 01:00:00 – Evaluating Classifiers and Handling Imbalanced Data
This session covers evaluation metrics for classifiers, including ROC curves, precision-recall, and F1-score. It also introduces strategies for dealing with imbalanced classes, such as upsampling, downsampling, and SMOTE.
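Precision, recall, and F1 fall straight out of the confusion matrix, as this base-R sketch shows (the label vectors are made up, with the positive class coded 1):

```r
# Precision, recall, and F1 from true/false positives and negatives.
actual    <- c(1, 1, 1, 0, 0, 0, 0, 1)
predicted <- c(1, 0, 1, 0, 0, 1, 0, 1)

tp <- sum(predicted == 1 & actual == 1)   # true positives
fp <- sum(predicted == 1 & actual == 0)   # false positives
fn <- sum(predicted == 0 & actual == 1)   # false negatives

precision <- tp / (tp + fp)   # of the predicted positives, how many were right
recall    <- tp / (tp + fn)   # of the actual positives, how many were found
f1        <- 2 * precision * recall / (precision + recall)
```

Unlike accuracy, none of these three metrics rewards a classifier for piling up true negatives, which is why they are the metrics of choice when classes are imbalanced.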
Session 18 – 01:00:00 – Example: Species Classification
A classification example is presented using a species trait dataset. The session integrates data preprocessing, model fitting, evaluation, and visualisation to reinforce the techniques introduced throughout the day.
Session 19 – 01:00:00 – Clustering with k-Means and Hierarchical Methods
Unsupervised learning is introduced through clustering methods. k-means and hierarchical clustering are implemented, with interpretation supported by distance metrics and silhouette scores.
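Both clustering approaches are available in base R; a minimal sketch on the iris measurements:

```r
# k-means and hierarchical clustering on the iris measurements.
x <- scale(iris[, 1:4])           # distance-based methods need scaled inputs

set.seed(3)
km <- kmeans(x, centers = 3, nstart = 25)   # k-means with 3 clusters
table(km$cluster, iris$Species)             # compare clusters to known species

hc     <- hclust(dist(x), method = "ward.D2")  # hierarchical, Ward linkage
groups <- cutree(hc, k = 3)                    # cut the dendrogram into 3 groups
```

The cross-tabulation against species is only possible because iris happens to have labels; in genuine unsupervised work there is no such answer key, which is why internal measures like silhouette scores matter.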
Session 20 – 01:00:00 – Dimensionality Reduction with PCA and t-SNE
This session focuses on techniques for reducing data dimensionality while preserving structure. Principal component analysis (PCA) and t-distributed stochastic neighbour embedding (t-SNE) are used to visualise and summarise multivariate data.
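PCA is built into base R via `prcomp()`; a minimal sketch (t-SNE requires an add-on package such as Rtsne and is not shown here):

```r
# Principal component analysis on the iris measurements.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Proportion of total variance captured by each component.
prop_var <- pca$sdev^2 / sum(pca$sdev^2)

head(pca$x[, 1:2])   # first two principal component scores, ready to plot
```

Plotting the first two score columns gives a two-dimensional view of the data that preserves as much variance as any linear projection can, which is the sense in which PCA "summarises" multivariate data.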
Session 21 – 01:00:00 – Association Rule Learning with arules
The focus shifts to association rule learning, particularly for market basket analysis. The arules package is used to extract frequent itemsets and generate interpretable rules based on support, confidence, and lift.
Session 22 – 01:00:00 – Comparing and Selecting Models
Strategies for model selection are discussed, including the bias-variance trade-off, cross-validation, and selection based on performance metrics. Examples compare models side by side to inform selection decisions.
Session 23 – 01:00:00 – Addressing Missing Values and Outliers
This session covers techniques for dealing with common data quality issues. Demonstrations show how to identify and handle missing data and outliers in ways that preserve model validity.
Session 24 – 01:00:00 – Example: Unsupervised Techniques in Action
A full example of unsupervised learning is presented, incorporating clustering, dimensionality reduction, and visualisation. The workflow highlights the exploratory nature of unsupervised analysis.
Session 25 – 01:00:00 – Support Vector Machines for Classification
Support vector machines (SVMs) are introduced, including their use of kernels to separate classes in high-dimensional spaces. The session includes practical implementation for classification tasks.
Session 26 – 01:00:00 – Neural Networks Using R
This session explores the structure and function of neural networks. A neural network is built using the nnet package to demonstrate how these models learn patterns in data.
Session 27 – 01:00:00 – Model Interpretability Techniques
The session focuses on methods for interpreting machine learning models. Variable importance plots, partial dependence plots, and SHAP values are demonstrated, emphasising the need for transparency in complex models.
Session 28 – 01:00:00 – Time Series Data in Machine Learning
Approaches for applying machine learning to time series data are explored. Topics include feature extraction, lagged variables, and an introduction to time series forecasting using packages such as tsibble and fable.
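Turning a series into a supervised learning problem with lagged predictors can be sketched in base R (tsibble and fable provide tidier tooling for the same idea), here on the built-in AirPassengers series:

```r
# Build lagged features so a standard regression model can be fitted
# to a time series (a simple autoregression).
y <- as.numeric(AirPassengers)             # built-in monthly series

make_lags <- function(y, k = 3) {
  n <- length(y)
  data.frame(
    target = y[(k + 1):n],                 # value to predict
    lag1   = y[k:(n - 1)],                 # previous month
    lag2   = y[(k - 1):(n - 2)],           # two months back
    lag3   = y[1:(n - 3)]                  # three months back
  )
}

df  <- make_lags(y)
fit <- lm(target ~ lag1 + lag2 + lag3, data = df)
```

Note that with time-ordered data, the random splits and cross-validation folds used earlier in the course must be replaced by time-respecting splits, so the model is never trained on the future.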
Session 29 – 01:00:00 – Example: Advanced Model Techniques
An extended demonstration showcases a modelling pipeline that combines advanced techniques such as boosting, regularisation, and tuning. The example draws together content from earlier sessions in an applied workflow.
Session 30 – 01:00:00 – Course Summary and Recap
The course concludes with a structured summary of the main themes and methods covered. Additional code snippets are used to aid understanding, and time is given to reviewing concepts and addressing frequently asked questions.