IMLRPR

Introduction to Machine Learning

Learn machine learning in R with practical, hands-on instruction. Covers supervised and unsupervised models, interpretability, and evaluation in 40 hours.

  • Duration: 40 Hours
  • Next Date: Available 13 October
  • Format: Recorded, on-demand

£450 Registration Fee

5.0

from 200+ reviews

Course Description

This 40-hour course provides a comprehensive introduction to machine learning using the R programming language, with a focus on practical model development, evaluation, and interpretation. Designed for applied researchers, the course covers both supervised and unsupervised learning, including regression, classification, clustering, and dimensionality reduction. Participants will learn to implement models using modern R packages such as tidymodels, caret, glmnet, and xgboost, and to assess model performance using appropriate metrics and validation techniques. Regularisation, ensemble methods, interpretability tools, and strategies for handling imbalanced data and missing values are introduced through live coding demonstrations. By the end of the course, participants will understand the strengths and limitations of common machine learning methods and be able to construct, tune, and interpret models for a wide range of applied data problems using R.

What You’ll Learn

  • Supervised learning techniques, including linear regression, logistic regression, decision trees, random forests, and boosted models.
  • Regularisation methods such as ridge regression, lasso, and elastic net for improving model generalisability.
  • Unsupervised learning methods including k-means clustering, hierarchical clustering, PCA, and t-SNE for pattern discovery and dimensionality reduction.
  • Implementation of machine learning workflows using R packages such as tidymodels, caret, glmnet, and xgboost.
  • Model evaluation and comparison using cross-validation, performance metrics (e.g. RMSE, AUC, F1-score), and residual analysis.
  • Handling common data challenges including missing values, outliers, class imbalance, and feature engineering.
  • Hyperparameter tuning and model selection using grid search and resampling-based validation.
  • Introduction to model interpretability tools such as variable importance plots, partial dependence plots, and SHAP values.

Course Format

Flexible Learning Structure

Learn through a carefully structured mix of lecture recordings and guided exercises that you can pause, revisit, and complete at your own pace—ideal for busy professionals or those balancing multiple commitments.

Access Anytime, Anywhere

All course content is available on-demand, making it accessible across all time zones without the need to attend live sessions or adjust your schedule.

Independent Exploration with Support

Engage deeply with course topics through self-directed study, with the option to reach out to instructors via email for clarification or deeper discussion.

Comprehensive Learning Resources

Gain full access to the same high-quality materials provided in live sessions, including code, datasets, and presentation slides—all available to download and keep. Please note recordings can only be streamed.

Work With Your Own Data, On Your Terms

Apply what you learn directly to your own data projects as you go, allowing for a personalized and immediately practical learning experience.

Continued Guidance and Resource Access

Receive 30 days of post-enrolment email support and unrestricted access to all session recordings during that time, so you can review and reinforce your learning as needed.

Who Should Attend / Intended Audiences

This course is aimed at applied scientists, data analysts, postgraduate students, and early-career researchers with a basic background in using R and RStudio, such as importing data, working with data frames, running simple functions, and creating basic plots. Participants are expected to understand fundamental statistical concepts, including mean, variance, correlation, and linear regression. No prior experience with machine learning is required, as all methods will be introduced from first principles. The course emphasizes interpretation and practical application over mathematical derivation, making it accessible to those comfortable working with data in R.

Not required but helpful: Experience with data wrangling (e.g., using dplyr or tidyr), basic plotting with ggplot2, and reading model output.
 

Equipment and Software Requirements

A laptop or desktop computer with a functioning installation of R and RStudio is required. Both R and RStudio are free, open-source programs compatible with Windows, macOS, and Linux systems.

While not essential, using a large monitor—or ideally a dual-monitor setup—can significantly enhance your learning experience by allowing you to view course materials and work in R simultaneously.

All necessary R packages will be introduced and installed during the workshop.

Dr. Rafael Molina Venegas

Rafael is a phylogenetic plant ecologist whose research sits at the intersection of community ecology, macroecology, phylogenetics, and human well-being. His scientific career revolves around three interconnected themes: (1) understanding the ecological and evolutionary mechanisms that jointly shape species assemblages at both community and macroecological scales, (2) the development, improvement, and critical assessment of phylogenetic methods, and (3) exploring the links between biodiversity and human well-being. While diverse, these research lines are unified by a strong phylogenetic perspective, with plants as his primary passion and study system.

 

Education & Career
• PhD in Integrative Biology from the Department of Plant Biology and Ecology, Universidad de Sevilla, Spain.
• Affiliated with the GloCEE – Global Change Ecology and Evolution Group, within the Department of Life Sciences at the Universidad de Alcalá, Alcalá de Henares, Spain. He is also listed as a researcher at the Estación Biológica de Doñana (CSIC) in institutional listings and on Google Scholar.

 

Research Focus
Rafael’s work centres on:
• Ecological and evolutionary drivers of species assemblages at multiple spatial scales
• Development and refinement of phylogenetic methods for ecology and biodiversity research
• Exploring how biodiversity underpins ecosystem services and human well-being
• Plant phylogenetics as a cross-cutting framework for ecological and evolutionary research

 

Current Projects
• Investigating community assembly and macroecological biodiversity patterns in plants
• Testing and improving methods in comparative phylogenetics and eco-phylogenetics
• Assessing the role of biodiversity and phylogenetic diversity in supporting human well-being and ecosystem services

 

Professional Consultancy & Teaching
Rafael contributes his expertise in phylogenetic ecology and comparative analyses to academic and applied research projects. He also supervises and trains postgraduate students in biodiversity science, macroecology, and phylogenetic methods, and regularly contributes to collaborative, interdisciplinary projects at the interface of ecology, evolution, and conservation.

 

Dr. Ignacio Morales-Castilla

Ignacio is a biogeographer and macroecologist whose research focuses on the spatial and temporal distribution of biodiversity. His work integrates ecological, evolutionary, and biogeographical perspectives to better understand how species assemble into communities and how biodiversity patterns emerge across scales. His research program aims to: (1) disentangle the relative roles of evolution and ecology as drivers of community structure, (2) investigate how different aspects of species’ niches are evolutionarily conserved, and (3) improve models of biotic interactions and species distributions by incorporating phylogenetic, functional, and geographic information.

 

Education & Career
• PhD in Ecology from the University of Alcalá (Universidad de Alcalá), Alcalá de Henares, Spain.
• Affiliated with the GloCEE – Global Change Ecology and Evolution Group, in the Department of Life Sciences at the Universidad de Alcalá, Alcalá de Henares, Spain, where he is currently a Beatriz Galindo Fellow.

 

Research Focus
Ignacio’s work centres on:
• Disentangling ecological and evolutionary processes that shape community structure
• Niche evolution and the degree of evolutionary conservatism across traits and taxa
• Integrating phylogenetic and functional data into species distribution models
• Advancing models of biotic interactions and biodiversity under global change

 

Current Projects
• Developing integrative models of species distributions that account for evolutionary history and functional traits
• Exploring the role of evolutionary constraints in shaping biodiversity patterns across large spatial and temporal scales
• Assessing biodiversity responses to global change by linking ecology, evolution, and biogeography

 

Professional Consultancy & Teaching
Ignacio contributes expertise in macroecology, biodiversity modelling, and biogeography to interdisciplinary projects. He trains students and researchers in phylogenetic and functional approaches to ecology, and is engaged in advancing reproducible, open science practices within biodiversity research.

 

Session 1 – 01:00:00 – Introduction to Machine Learning
This session introduces the core ideas of machine learning, including how it differs from traditional statistical modelling. Topics include the difference between supervised and unsupervised learning, the contrast between classification and regression, and typical applications across various domains.

Session 2 – 01:00:00 – The Machine Learning Workflow
The focus in this session is on the structure of a typical machine learning pipeline. This includes steps such as data splitting, preprocessing, model training, validation, and performance evaluation, with emphasis on reproducibility and good workflow practices.
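
As a rough illustration of the pipeline steps listed above, the following base R sketch splits a dataset, fits a model on the training rows, and evaluates it on the held-out rows. The dataset (mtcars), the predictors, and the 75/25 split are assumptions chosen for illustration, not course materials.

```r
# Minimal sketch of a split / train / evaluate workflow in base R.
# Dataset, predictors, and split ratio are illustrative assumptions.
set.seed(42)                                   # fixed seed for reproducibility

n         <- nrow(mtcars)
train_idx <- sample(n, size = floor(0.75 * n)) # 75% of rows for training
train     <- mtcars[train_idx, ]
test      <- mtcars[-train_idx, ]

fit  <- lm(mpg ~ wt + hp, data = train)        # fit on the training rows only
pred <- predict(fit, newdata = test)           # predict the held-out rows

rmse <- sqrt(mean((test$mpg - pred)^2))        # held-out root mean squared error
rmse
```

Keeping the test rows out of model fitting is what makes the final error an honest estimate of performance on new data.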

Session 3 – 01:00:00 – Data Preprocessing in R
This session explores key steps in preparing data for machine learning. Topics include dealing with missing values, scaling and transforming variables, encoding categorical predictors, and building preprocessing pipelines using the recipes package.

Session 4 – 01:00:00 – Exploratory Data Analysis for Machine Learning
Using visual and numerical summaries, this session examines how to perform exploratory data analysis (EDA) in a machine learning context. Relationships between variables, outliers, and data distributions are explored using tools such as ggplot2.

Session 5 – 01:00:00 – Model Fitting Frameworks in R
This session introduces the use of the caret and tidymodels frameworks for fitting machine learning models in R. Topics include data splitting, resampling, defining workflows, and generating predictions.

Session 6 – 01:00:00 – Baseline Models and Performance Metrics
This session focuses on constructing simple models and introducing key performance metrics. These include RMSE, MAE, accuracy, and AUC, with discussion of how and when to use each metric depending on the learning task.
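
For orientation, three of the metrics named above can be written in a few lines of base R. The helper names (`rmse`, `mae`, `accuracy`) and the toy vectors are hypothetical and chosen only for this sketch; AUC is omitted because it needs ranked class probabilities rather than hard predictions.

```r
# Hypothetical helper functions for common metrics (not from any package).
rmse     <- function(truth, estimate) sqrt(mean((truth - estimate)^2))
mae      <- function(truth, estimate) mean(abs(truth - estimate))
accuracy <- function(truth, estimate) mean(truth == estimate)

# Regression example: errors are 1, 0, -2, 1
truth_num <- c(3, 5, 2, 8)
pred_num  <- c(2, 5, 4, 7)
rmse(truth_num, pred_num)   # square root of the mean squared error
mae(truth_num, pred_num)    # mean absolute error

# Classification example: three of four labels match
truth_cls <- c("a", "b", "a", "b")
pred_cls  <- c("a", "b", "b", "b")
accuracy(truth_cls, pred_cls)
```

RMSE penalises large errors more heavily than MAE, which is one reason the two can rank models differently.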

Session 7 – 01:00:00 – Linear Regression for Prediction
This session focuses on linear regression as a machine learning method. Model assumptions and limitations are reviewed, and a predictive modelling approach is demonstrated using continuous outcomes.

Session 8 – 01:00:00 – Regularised Regression: Ridge, Lasso, and Elastic Net
The session introduces the concept of overfitting and presents regularisation methods as solutions. Ridge, lasso, and elastic net regression are implemented using the glmnet package and compared in terms of predictive performance.
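
The shrinkage idea behind ridge regression can be sketched with its closed-form solution in base R. This is not the glmnet implementation used in the session: glmnet scales its penalty differently, so the lambda values here are not comparable to glmnet's. The function name `ridge_coef`, the dataset, and the predictors are assumptions for illustration.

```r
# Closed-form ridge sketch: beta = (X'X + lambda * I)^(-1) X'y
# Predictors are standardised and the response centred, so no intercept is needed.
X <- scale(as.matrix(mtcars[, c("wt", "hp", "disp")]))
y <- mtcars$mpg - mean(mtcars$mpg)

ridge_coef <- function(X, y, lambda) {
  p <- ncol(X)
  solve(crossprod(X) + lambda * diag(p), crossprod(X, y))
}

b_ols   <- ridge_coef(X, y, lambda = 0)    # lambda = 0 reproduces least squares
b_ridge <- ridge_coef(X, y, lambda = 10)   # a positive lambda shrinks coefficients

cbind(ols = as.numeric(b_ols), ridge = as.numeric(b_ridge))
sum(b_ridge^2) < sum(b_ols^2)              # the ridge coefficient vector is shorter
```

Lasso and elastic net have no closed form of this kind and are fitted iteratively, which is part of why a dedicated package is used in practice.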

Session 9 – 01:00:00 – Tree-Based Regression Models
Regression trees, random forests, and gradient boosting machines are explored as non-linear alternatives to linear models. The session demonstrates how tree-based models are constructed and how they handle complex interactions.

Session 10 – 01:00:00 – Evaluating and Comparing Regression Models
A variety of regression models are compared using resampling, cross-validation, and diagnostic plots. Emphasis is placed on selecting models based on appropriate error metrics.

Session 11 – 01:00:00 – Hyperparameter Tuning for Regression Models
This session introduces hyperparameter tuning using grid and random search. The tune and caret packages are used to demonstrate how tuning can improve model accuracy and reduce overfitting.

Session 12 – 01:00:00 – Example: Predicting Housing Prices
A worked example is presented, demonstrating a regression pipeline from raw data through to prediction. Techniques from earlier sessions are applied to a dataset on housing prices, including preprocessing, model fitting, evaluation, and comparison.

Session 13 – 01:00:00 – Logistic Regression for Classification
This session introduces logistic regression for binary outcomes, including model interpretation and probability estimation. The extension to multiclass classification is also discussed.

Session 14 – 01:00:00 – Classification Trees and Random Forests
Classification trees and random forests are explored for categorical outcomes. Topics include model structure, visual interpretation, variable importance, and performance evaluation.

Session 15 – 01:00:00 – k-Nearest Neighbours and Naïve Bayes Classifiers
Two contrasting classification approaches are introduced: k-nearest neighbours (a distance-based method) and naïve Bayes (a probabilistic model).

Session 16 – 01:00:00 – Boosted Trees for Classification with xgboost
Boosting is introduced as an ensemble technique to improve model performance. The xgboost package is used to demonstrate how gradient boosting builds strong classifiers from weak learners.

Session 17 – 01:00:00 – Evaluating Classifiers and Handling Imbalanced Data
This session covers evaluation metrics for classifiers, including ROC curves, precision-recall, and F1-score. It also introduces strategies for dealing with imbalanced classes, such as upsampling, downsampling, and SMOTE.
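
Precision, recall, and F1-score can be computed directly from the counts of true positives, false positives, and false negatives. The sketch below uses made-up labels with "pos" as the minority class; the data are hypothetical and only illustrate the arithmetic.

```r
# Toy labels: 3 positives and 5 negatives (illustrative data only).
truth <- factor(c("pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg"),
                levels = c("pos", "neg"))
pred  <- factor(c("pos", "pos", "neg", "pos", "neg", "neg", "neg", "neg"),
                levels = c("pos", "neg"))

tp <- sum(pred == "pos" & truth == "pos")   # true positives
fp <- sum(pred == "pos" & truth == "neg")   # false positives
fn <- sum(pred == "neg" & truth == "pos")   # false negatives

precision <- tp / (tp + fp)                 # share of predicted positives that are correct
recall    <- tp / (tp + fn)                 # share of actual positives that are found
f1        <- 2 * precision * recall / (precision + recall)

c(precision = precision, recall = recall, f1 = f1, accuracy = mean(pred == truth))
```

Here accuracy is higher than F1 because the majority class is predicted well, which is the usual reason accuracy alone can mislead on imbalanced data.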

Session 18 – 01:00:00 – Example: Species Classification
A classification example is presented using a species trait dataset. The session integrates data preprocessing, model fitting, evaluation, and visualisation to reinforce the techniques introduced throughout the day.

Session 19 – 01:00:00 – Clustering with k-Means and Hierarchical Methods
Unsupervised learning is introduced through clustering methods. k-means and hierarchical clustering are implemented, with interpretation supported by distance metrics and silhouette scores.
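
Both clustering methods are available in base R. The sketch below, which assumes the built-in iris measurements and three clusters purely for illustration, fits each method and cross-tabulates the resulting cluster labels. Silhouette scores are not shown because they require an additional package.

```r
# Clustering sketch on standardised measurements (dataset and k = 3 are assumptions).
set.seed(1)
X <- scale(iris[, 1:4])                     # put variables on a common scale

km <- kmeans(X, centers = 3, nstart = 25)   # k-means with 25 random restarts

hc          <- hclust(dist(X), method = "ward.D2")  # hierarchical clustering on Euclidean distances
hc_clusters <- cutree(hc, k = 3)                    # cut the dendrogram into 3 groups

# Compare the two partitions; cluster numbers are arbitrary labels.
table(kmeans = km$cluster, hierarchical = hc_clusters)
```

Because cluster numbering is arbitrary, agreement between methods shows up as large counts concentrated in a few cells rather than along the diagonal.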

Session 20 – 01:00:00 – Dimensionality Reduction with PCA and t-SNE
This session focuses on techniques for reducing data dimensionality while preserving structure. Principal component analysis (PCA) and t-distributed stochastic neighbour embedding (t-SNE) are used to visualise and summarise multivariate data.
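
A minimal PCA sketch in base R is shown below, assuming the built-in iris measurements as example data. t-SNE is not shown because it requires an additional package.

```r
# PCA sketch with prcomp (dataset is an illustrative assumption).
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

var_explained <- pca$sdev^2 / sum(pca$sdev^2)  # proportion of variance per component
round(cumsum(var_explained), 3)                # cumulative proportion

scores <- pca$x[, 1:2]                         # coordinates on the first two components
plot(scores, col = iris$Species, pch = 19,
     xlab = "PC1", ylab = "PC2")
```

Scaling the variables first matters when they are measured in different units, since PCA otherwise favours the variables with the largest variance.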

Session 21 – 01:00:00 – Association Rule Learning with arules
The focus shifts to association rule learning, particularly for market basket analysis. The arules package is used to extract frequent itemsets and generate interpretable rules based on support, confidence, and lift.

Session 22 – 01:00:00 – Comparing and Selecting Models
Strategies for model selection are discussed, including the bias-variance trade-off, cross-validation, and selection based on performance metrics. Examples compare models side-by-side to inform selection decisions.
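
Side-by-side comparison with cross-validation can be sketched in base R as follows. The helper `cv_rmse`, the choice of 5 folds, the dataset, and the two candidate formulas are all assumptions made for illustration. The course itself uses dedicated packages for resampling.

```r
# k-fold cross-validation sketch for comparing two linear models.
set.seed(123)
k     <- 5
folds <- sample(rep(1:k, length.out = nrow(mtcars)))  # random fold label per row

# Hypothetical helper: mean held-out RMSE across folds for a given formula.
cv_rmse <- function(formula, data, folds) {
  resp <- all.vars(formula)[1]                 # name of the response variable
  errs <- sapply(sort(unique(folds)), function(f) {
    fit  <- lm(formula, data = data[folds != f, ])       # train on the other folds
    pred <- predict(fit, newdata = data[folds == f, ])   # predict the held-out fold
    sqrt(mean((data[[resp]][folds == f] - pred)^2))
  })
  mean(errs)
}

cv_results <- c(
  simple = cv_rmse(mpg ~ wt, mtcars, folds),
  larger = cv_rmse(mpg ~ wt + hp + qsec, mtcars, folds)
)
cv_results
```

Using the same fold assignment for every candidate keeps the comparison fair, since each model is scored on identical held-out rows.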

Session 23 – 01:00:00 – Addressing Missing Values and Outliers
This session covers techniques for dealing with common data quality issues. Demonstrations show how to identify and handle missing data and outliers in ways that preserve model validity.

Session 24 – 01:00:00 – Example: Unsupervised Techniques in Action
A full example of unsupervised learning is presented, incorporating clustering, dimensionality reduction, and visualisation. The workflow highlights the exploratory nature of unsupervised analysis.

Session 25 – 01:00:00 – Support Vector Machines for Classification
Support vector machines (SVMs) are introduced, including their use of kernels to separate classes in high-dimensional spaces. The session includes practical implementation for classification tasks.

Session 26 – 01:00:00 – Neural Networks Using R
This session explores the structure and function of neural networks. A neural network is built using the nnet package to demonstrate how these models learn patterns in data.

Session 27 – 01:00:00 – Model Interpretability Techniques
The session focuses on methods for interpreting machine learning models. Variable importance plots, partial dependence plots, and SHAP values are demonstrated, emphasising the need for transparency in complex models.

Session 28 – 01:00:00 – Time Series Data in Machine Learning
Approaches for applying machine learning to time series data are explored. Topics include feature extraction, lagged variables, and an introduction to time series forecasting using packages such as tsibble and fable.

Session 29 – 01:00:00 – Example: Advanced Model Techniques
An extended demonstration showcases a modelling pipeline that combines advanced techniques such as boosting, regularisation, and tuning. The example draws together content from earlier sessions in an applied workflow.

Session 30 – 01:00:00 – Course Summary and Recap
The course concludes with a structured summary of the main themes and methods covered. Additional code snippets are used to aid understanding, and time is given to reviewing concepts and addressing frequently asked questions.

Testimonials

PRStats offers a great lineup of courses on statistical and analytical methods that are super relevant for ecologists and biologists. My lab and I have taken several of their courses—like Bayesian mixing models, time series analysis, and machine/deep learning—and we've found them very informative and directly useful for our work. I often recommend PRStats to my students and colleagues as a great way to brush up on or learn new R-based statistical skills.

Rolando O. Santos

PhD Assistant Professor, Florida International University

Courses attended

SIMM05, IMDL03, ITSA02, GEEE01 and MOVE07

Frequently asked questions

Everything you need to know about the product and billing.

When will I receive instructions on how to join?

You’ll receive an email on the Friday before the course begins, with full instructions on how to join via Zoom. Please ensure you have Zoom installed in advance.

Do I need administrator rights on my computer?

Yes — administrator access is recommended, as you may need to install software during the course. If you don’t have admin rights, please contact us before the course begins and we’ll provide a list of software to install manually.

I’m attending the course live — will I also get access to the session recordings?

Yes. All participants will receive access to the recordings for 30 days after the course ends.

I can’t attend every live session — can I join some sessions live and catch up on others later?

Absolutely. You’re welcome to join the live sessions you can and use the recordings for those you miss. We do encourage attending live if possible, as it gives you the chance to ask questions and interact with the instructor. You’re also welcome to send questions by email after the sessions.

I’m in a different time zone and plan to follow the course via recordings. When will these be available?

We aim to upload recordings on the same day, but occasionally they may be available the following day.

I can’t attend live — how can I ask questions?

You can email the instructor with any questions. For more complex topics, we’re happy to arrange a short Zoom call at a time that works for both of you.

Will I receive a certificate?

Yes. All participants receive a digital certificate of attendance, which includes the course title, number of hours, course dates, and the instructor’s name.

Still have questions?

Can’t find the answer you’re looking for? Please chat to our friendly team.
