ADVRPR

Advancing in R

Comprehensive on-demand course in data wrangling, visualisation, GLMs, mixed models, and model selection using R. Ideal for researchers in ecology, biology, and social sciences.

  • Duration: 40 Hours
  • Format: Recorded, on-demand

£450 Registration Fee

5.0 from 200+ reviews

Course Description

This course is designed to provide attendees with a comprehensive understanding of statistical modelling and its applications in various fields, such as ecology, biology, sociology, agriculture, and health. We cover all foundational aspects of modelling, including all coding aspects, ranging from data wrangling, visualisation and exploratory data analysis, to generalized linear mixed models, assessing goodness-of-fit and carrying out model comparison.

What You’ll Learn

During the course, we will cover the following:

  • Data wrangling.
  • Data manipulation.
  • Data visualisation.
  • Generalized linear models.
  • Mixed models.
  • Model selection and model simplification.

Course Format

Flexible Learning Structure

Learn through a carefully structured mix of lecture recordings and guided exercises that you can pause, revisit, and complete at your own pace—ideal for busy professionals or those balancing multiple commitments.

Access Anytime, Anywhere

All course content is available on-demand, making it accessible across all time zones without the need to attend live sessions or adjust your schedule.

Independent Exploration with Support

Engage deeply with course topics through self-directed study, with the option to reach out to instructors via email for clarification or deeper discussion.

Comprehensive Learning Resources

Gain full access to the same high-quality materials provided in live sessions, including code, datasets, and presentation slides—all available to download and keep. Please note recordings can only be streamed.

Work With Your Own Data, On Your Terms

Apply what you learn directly to your own data projects as you go, allowing for a personalized and immediately practical learning experience.

Continued Guidance and Resource Access

Receive 30 days of post-enrolment email support and unrestricted access to all session recordings during that time, so you can review and reinforce your learning as needed.

Who Should Attend / Intended Audiences

This course is aimed at anyone interested in using R for data science or statistical analysis. R is a widely used language across academic research, as well as in the public and private sectors, due to its flexibility and extensive range of statistical and graphical capabilities. To get the most out of the course, participants should have a basic understanding of statistical concepts, including generalised linear regression models, hypothesis testing, and statistical significance. Familiarity with R is also expected, particularly the ability to import and export data, manipulate data frames, fit basic statistical models, and produce simple exploratory and diagnostic plots.

Equipment and Software requirements

A laptop or desktop computer with a functioning installation of R and RStudio is required. Both R and RStudio are free, open-source programs compatible with Windows, macOS, and Linux systems.

While not essential, using a large monitor—or ideally a dual-monitor setup—can significantly enhance your learning experience by allowing you to view course materials and work in R simultaneously.

All necessary R packages will be introduced and installed during the workshop.

Dr. Rafael De Andrade Moral

Rafael is a statistician working at the intersection of ecological science, environmental research, and applied statistical modelling. His work focuses on developing and applying statistical and mathematical tools to understand ecological dynamics, improve wildlife management strategies, and support sustainable agricultural and environmental practices. With a strong foundation in both biology and statistics, Rafael’s research spans areas such as hierarchical modelling, population dynamics, and the integration of ecological theory with real-world data.
Rafael holds a PhD in Statistics from the University of São Paulo, building on an undergraduate background in Biology. He is currently an Associate Professor of Statistics at Maynooth University, Ireland, where he also leads the Theoretical and Statistical Ecology Group — a multidisciplinary research hub dedicated to advancing quantitative ecology.
In addition to his academic work, Rafael is deeply invested in science communication and innovative teaching. He produces educational music videos and statistical parodies, using creative media to make statistical concepts more engaging and accessible to students and the public alike.

 

Education & Career
• PhD in Statistics – University of São Paulo
• BSc in Biology
• Associate Professor of Statistics – Maynooth University
• Director – Theoretical and Statistical Ecology Group

 

Research Focus
Rafael’s research is rooted in ecological and environmental statistics, particularly:
• Statistical modelling of species distributions and abundance
• Applications of Bayesian and hierarchical models in wildlife and agricultural contexts
• Integrative approaches combining field data, simulation, and theory to inform policy and conservation
• Methodological innovation in data-poor or complex ecological systems

 

Current Projects
• Statistical methods for population modelling and biodiversity monitoring
• Quantitative frameworks for wildlife management under uncertainty
• Modelling ecological responses to climate and land-use changes
• Public outreach through creative science communication in Statistics

 

Professional Activities
Rafael collaborates widely with ecologists, conservationists, and agricultural scientists, providing expert statistical input on study design, modelling, and data analysis. He also supervises postgraduate research across interdisciplinary projects in quantitative ecology.

 

Teaching & Skills
• Teaches courses in statistical modelling, environmental statistics, and data analysis in R
• Promotes engaging and inclusive teaching practices, including music-based educational content
• Advocates for open science, reproducibility, and the integration of theory with application

 

Links
ResearchGate
Google Scholar
ORCID
GitHub

Session 1 – 01:20:00 – Reading in data.
We will begin by reading data into R using tools such as readr and readxl. Almost all types of data can be read into R, and here we will consider many of the main types, such as csv, xlsx, sav, etc. We will also consider how to control how data are parsed, e.g., so that they are read as dates, numbers, strings, etc.

Session 2 – 01:20:00 – Wrangling with dplyr.
We will next cover the very powerful dplyr R package. This package supplies a number of so-called “verbs” — select, rename, slice, filter, mutate, arrange, etc. — each of which focuses on a key data manipulation tool, such as selecting or changing variables. All of these verbs can be chained together using “pipes” (represented by %>%). Together, these create powerful data wrangling pipelines that take raw data as input and return cleaned data as output. Here, we will also learn about the key concept of “tidy data”, which is roughly where each row of a data frame is an observation, and each column is a variable.
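As a rough illustration of such a pipeline, the sketch below chains several dplyr verbs with %>%. The data frame raw_df and its columns are hypothetical and invented purely for this example; they are not taken from the course materials.

```r
library(dplyr)

# Hypothetical raw data, invented for illustration only
raw_df <- data.frame(
  site    = c("A", "A", "B", "B", "C"),
  Count   = c(12, NA, 7, 30, 5),
  area_m2 = c(10, 10, 20, 20, 5),
  stringsAsFactors = FALSE
)

clean_df <- raw_df %>%
  rename(count = Count) %>%               # rename a variable
  filter(!is.na(count)) %>%               # drop rows with a missing count
  mutate(density = count / area_m2) %>%   # create a new variable
  select(site, count, density) %>%        # keep only the variables we need
  arrange(desc(density))                  # sort rows, highest density first

print(clean_df)
```

Each verb takes a data frame and returns a data frame, which is what allows the steps to be chained in this way.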

Session 3 – 01:20:00 – Summarizing data.
The summarize and group_by tools in dplyr can be used to great effect to summarize data using descriptive statistics, either overall or separately for each group.
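A minimal sketch of a grouped summary is shown below. The data frame mass_df and its values are hypothetical, made up only to show the pattern.

```r
library(dplyr)

# Hypothetical measurements, invented for illustration only
mass_df <- data.frame(
  species = c("a", "a", "b", "b", "b"),
  mass    = c(2, 4, 10, 12, 14),
  stringsAsFactors = FALSE
)

summary_df <- mass_df %>%
  group_by(species) %>%          # one group per species
  summarize(
    n         = n(),             # number of observations in the group
    mean_mass = mean(mass),      # group mean
    sd_mass   = sd(mass)         # group standard deviation
  )

print(summary_df)
```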

Session 4 – 01:30:00 – Merging and joining data frames.
There are multiple ways to combine data frames, with the simplest being “bind” operations, which are effectively horizontal or vertical concatenations. Much more powerful are the SQL-like “join” operations. Here, we will consider the inner_join, left_join, right_join, full_join operations. In this section, we will also consider how to use purrr to read in and automatically merge large sets of files.
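To give a flavour of how the four join operations differ, the sketch below joins two small data frames on a shared key. Both data frames (animals and species) and the key species_code are hypothetical and exist only for this example.

```r
library(dplyr)

# Two hypothetical tables sharing the key column species_code
animals <- data.frame(
  id           = 1:4,
  species_code = c("fox", "owl", "fox", "elk"),
  stringsAsFactors = FALSE
)
species <- data.frame(
  species_code = c("fox", "owl", "bat"),
  diet         = c("omnivore", "carnivore", "insectivore"),
  stringsAsFactors = FALSE
)

# Keep only rows whose key appears in both tables
inner <- inner_join(animals, species, by = "species_code")
# Keep every row of animals; unmatched rows get NA for diet
left  <- left_join(animals, species, by = "species_code")
# Keep every row of species; unmatched rows get NA for id
right <- right_join(animals, species, by = "species_code")
# Keep every row of both tables
full  <- full_join(animals, species, by = "species_code")

sapply(list(inner = inner, left = left, right = right, full = full), nrow)
```

Comparing the row counts of the four results is a quick way to check which keys failed to match.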

Session 5 – 01:30:00 – Pivoting data.
Sometimes we need to change data frames from “long” to “wide” formats, or vice versa. The R package tidyr provides the tools pivot_longer and pivot_wider for doing this.
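The round trip between the two formats can be sketched as follows. The data frame wide_df and its column names are hypothetical.

```r
library(tidyr)

# Hypothetical wide data: one row per site, one column per year
wide_df <- data.frame(
  site  = c("A", "B"),
  y2021 = c(5, 8),
  y2022 = c(7, 6),
  stringsAsFactors = FALSE
)

# Wide to long: one row per site-year combination
long_df <- pivot_longer(
  wide_df,
  cols      = c(y2021, y2022),
  names_to  = "year",
  values_to = "count"
)

# Long back to wide: one column per year again
back_to_wide <- pivot_wider(
  long_df,
  names_from  = year,
  values_from = count
)

print(long_df)
print(back_to_wide)
```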

Session 6 – 01:00:00 – What is data visualization.
Data visualization is a means to explore and understand our data and should be a major part of any data analysis. Here, we briefly discuss why data visualization is so important and what the major principles behind it are.

Session 7 – 01:00:00 – Introducing ggplot.
Though there are many options for visualization in R, ggplot is simply the best. Here, we briefly introduce the major principles behind how ggplot works, namely how it is a layered grammar of graphics.
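The layered idea can be illustrated with a short sketch using the ggplot2 package and R's built-in mtcars data set (chosen here only for convenience; it is not necessarily a data set used in the course). The plot starts from data and aesthetic mappings, and each subsequent + adds a layer or a setting.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +        # data and aesthetic mappings
  geom_point(aes(colour = factor(cyl))) +          # layer 1: points, coloured by group
  geom_smooth(method = "lm", formula = y ~ x) +    # layer 2: fitted line with uncertainty band
  labs(x = "Weight (1000 lbs)",
       y = "Miles per gallon",
       colour = "Cylinders")                       # axis and legend labels

print(p)
```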

Session 8 – 01:00:00 – Visualizing univariate data.
Here, we cover a set of major tools for visualizing distributions over single variables: histograms, density plots, barplots, Tukey boxplots. In each case, we will explore how to plot multiple groups of data simultaneously using different colours and also using facet plots.

Session 9 – 01:00:00 – Scatterplots.
Scatterplots and their variants are used to visualize bivariate data. Here, in addition to covering how to visualize multiple groups using colours and facets, we will also cover how to provide marginal plots on the scatterplots, labels to points, and how to obtain linear and nonlinear smoothing of the plots.

Session 10 – 01:00:00 – More plot types.
Having already covered the most widely used general purpose plots, we now turn to a range of other major plot types: frequency polygons, area plots, line plots, uncertainty plots, violin plots, and geospatial mapping. Each of these is an important and widely used type of plot, and knowing them will expand your repertoire.

Session 10 – 01:00:00 – Fine control of plots.
Thus far, we will have mostly used the defaults for plot styles and layouts. Here, we will introduce how to modify things like the limits and scales on the axes, the positions and nature of the axis ticks, the colour palettes that are used, and the different types of ggplot themes that are available.

Session 11 – 01:00:00 – Plots for publications and presentations.
Thus far, we have primarily focused on data visualization as a means of interactively exploring data. Often, however, we also want to present our plots in, for example, published articles or slide presentations. It is simple to save a plot in different file formats and then insert it into a document. However, a much more efficient way of doing this is to use RMarkdown to run the R code and automatically insert the resulting figure into, for example, a Word document, PDF document, or HTML page. In addition, we will also cover how to make labelled grids of subplots like those found in many scientific articles.

Session 12 – 01:20:00 – The general linear model.
We begin by providing an overview of the normal (as in normal distribution) general linear model, including its use with categorical predictor variables. Although this model is not the focus of the course, it is the foundation on which generalized linear models are based and so must be understood before moving on to generalized linear models.

Session 13 – 01:20:00 – Binary logistic regression.
Our first generalized linear model is the binary logistic regression model, for use when modelling binary outcome data. We will present the assumed theoretical model behind logistic regression, implement it using R’s glm, and then show how to interpret its results, perform predictions, and (nested) model comparisons.
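The workflow described here can be sketched with simulated data as follows. The data, the variable names, and the true coefficient values (-0.5 and 1.2) are all assumptions made for this illustration only.

```r
set.seed(101)

# Simulate a binary outcome whose log odds depend linearly on x
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))
sim_df <- data.frame(x = x, y = y)

# Fit a null model and a model with x as a predictor
fit_null <- glm(y ~ 1, family = binomial(link = "logit"), data = sim_df)
fit_x    <- glm(y ~ x, family = binomial(link = "logit"), data = sim_df)

# Coefficients are on the log odds scale; exponentiate for an odds ratio
coef(fit_x)
odds_ratio <- exp(coef(fit_x)["x"])

# Predicted probabilities at chosen values of x
new_df <- data.frame(x = c(-1, 0, 1))
pred   <- predict(fit_x, newdata = new_df, type = "response")

# Nested model comparison via a deviance-based chi-square test
anova(fit_null, fit_x, test = "Chisq")
```

Using type = "response" returns predictions as probabilities rather than log odds.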

Session 14 – 01:20:00 – Binomial logistic regression.
Here, we show how the binary logistic regression can be extended to deal with data on discrete proportions. We will also present alternative link functions to the logit, such as the probit and complementary log-log links.
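As a sketch of how this looks in R, the example below fits the same binomial model under three link functions. The dose-response data frame dose_df is hypothetical, with values invented for illustration.

```r
# Hypothetical data: number of successes out of a fixed number of trials
dose_df <- data.frame(
  dose      = c(1, 2, 3, 4, 5),
  n_trials  = rep(40, 5),
  n_success = c(2, 8, 19, 30, 37)
)

# The response is a two-column matrix of successes and failures
fit_logit <- glm(cbind(n_success, n_trials - n_success) ~ dose,
                 family = binomial(link = "logit"), data = dose_df)

# Refit the same model with alternative link functions
fit_probit  <- update(fit_logit, family = binomial(link = "probit"))
fit_cloglog <- update(fit_logit, family = binomial(link = "cloglog"))

# Compare the three fits; coefficients are on different scales for each link
AIC(fit_logit, fit_probit, fit_cloglog)
```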

Session 15 – 01:30:00 – Categorical logistic regression.
Categorical logistic regression, also known as multinomial logistic regression, is for modelling polychotomous data, i.e. data taking more than two categorically distinct values. Categorical logistic regression is based on an extension of the binary logistic regression case.

Session 16 – 01:30:00 – Poisson regression.
Poisson regression is a widely used technique for modelling count data, i.e., data where the variable denotes the number of times an event has occurred.

Session 16 – 01:10:00 – Measuring model fit.
Here, the concept of the conditional probability of the observed data, or of future data, is of vital importance. This is intimately related to, though distinct from, the concept of likelihood and the likelihood function, which is in turn related to the concept of the log likelihood or deviance of a model. We also show how these concepts are related to residual sums of squares, root mean square error (RMSE), and deviance residuals.

Session 17 – 01:10:00 – Nested model comparison.
In this section, we cover how to do nested model comparison in general linear models, generalized linear models, and their mixed effects (multilevel) counterparts. First, we precisely define what is meant by a nested model. Then we show how nested model comparison can be accomplished in general linear models with F tests, which we will also discuss in relation to R^2 and adjusted R^2. In generalized linear models, we can accomplish nested model comparison using deviance based chi-square tests via Wilks’s theorem.
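Both kinds of test can be sketched with simulated data, as below. All variables and true parameter values are assumptions made for this illustration. The last few lines reproduce the deviance-based test by hand to show where the chi-square statistic comes from.

```r
set.seed(7)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)

# General linear model: F test between nested models
y_norm   <- 1 + 2 * x1 + rnorm(n)        # x2 has no true effect here
lm_small <- lm(y_norm ~ x1)
lm_big   <- lm(y_norm ~ x1 + x2)
anova(lm_small, lm_big)                  # F test for adding x2

# Generalized linear model: deviance-based chi-square test
y_count   <- rpois(n, lambda = exp(0.5 + 0.8 * x1))
glm_small <- glm(y_count ~ 1,  family = poisson)
glm_big   <- glm(y_count ~ x1, family = poisson)
lrt <- anova(glm_small, glm_big, test = "Chisq")
print(lrt)

# The same test by hand: the drop in deviance is compared to a
# chi-square distribution with df equal to the number of added parameters
dev_diff <- deviance(glm_small) - deviance(glm_big)
df_diff  <- df.residual(glm_small) - df.residual(glm_big)
p_value  <- pchisq(dev_diff, df = df_diff, lower.tail = FALSE)
```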

Session 18 – 01:10:00 – Overdispersion models.
We begin with the quasi-likelihood approach for both the Poisson and binomial models. We then cover negative binomial regression: the negative binomial model is, like the Poisson regression model, used for unbounded count data, but it is less restrictive than Poisson regression, specifically because it can accommodate over-dispersed data. Finally, we cover beta-binomial regression: the beta-binomial model is an over-dispersed alternative to the binomial.
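A minimal sketch of the count-data case is given below, using simulated over-dispersed counts. The data and parameter values are assumptions for illustration, and glm.nb from the MASS package is one possible way to fit a negative binomial regression, not necessarily the tool used in the course. The beta-binomial case is not shown.

```r
library(MASS)   # provides glm.nb for negative binomial regression

set.seed(42)
n  <- 300
x  <- runif(n)
mu <- exp(1 + 1.5 * x)

# Simulated counts with extra-Poisson variation: variance = mu + mu^2 / size
y <- rnbinom(n, mu = mu, size = 2)
count_df <- data.frame(x = x, y = y)

fit_pois  <- glm(y ~ x, family = poisson,      data = count_df)
fit_quasi <- glm(y ~ x, family = quasipoisson, data = count_df)
fit_nb    <- glm.nb(y ~ x, data = count_df)

# Estimated dispersion; values well above 1 indicate overdispersion
dispersion <- summary(fit_quasi)$dispersion

# Quasi-Poisson keeps the Poisson point estimates but inflates standard errors
cbind(poisson = coef(fit_pois), quasipoisson = coef(fit_quasi))

# The negative binomial is a full likelihood model, so AIC is available
AIC(fit_pois, fit_nb)
```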

Session 19 – 01:10:00 – Zero inflated models.
Zero-inflated count data are data with more zero counts than can be accounted for by a Poisson or negative binomial model. Zero-inflated Poisson and negative binomial models are types of latent variable models.

Session 20 – 01:10:00 – Random effects models.
The defining feature of multilevel models is that they are models of models. We begin by using a binomial random effects model to illustrate this. Specifically, we show how multilevel models are models of the variability in models of different clusters or groups of data.

Session 21 – 01:10:00 – Normal random effects models.
Normal, as in normal distribution, random effects models are the key to understanding the more general and widely used linear mixed effects models. Here, we also cover the key concepts of statistical shrinkage and intraclass correlation.

Session 22 – 00:45:00 – Out of sample predictive performance: cross validation and information criteria.
Here, we describe how to measure out-of-sample predictive performance, which measures how well a model can generalize to new data. This is arguably the gold standard for evaluating statistical models. A practical means to measure out-of-sample predictive performance is cross-validation, especially leave-one-out cross-validation. Leave-one-out cross-validation can, in relatively simple models, be approximated by the Akaike Information Criterion (AIC), which can be exceptionally simple to calculate. We will discuss how to interpret AIC values, and describe other related information criteria, some of which will be used in more detail in later sections.
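To show how simple the calculation is, the sketch below computes AIC by hand as minus twice the log likelihood plus twice the number of estimated parameters, and checks it against R's AIC function. It uses R's built-in mtcars data set, chosen here only for convenience.

```r
# Two candidate models for fuel efficiency
fit_wt    <- lm(mpg ~ wt,      data = mtcars)
fit_wt_hp <- lm(mpg ~ wt + hp, data = mtcars)

# AIC by hand: -2 * log likelihood + 2 * number of estimated parameters
ll <- logLik(fit_wt)
k  <- attr(ll, "df")    # intercept, slope, and residual standard deviation
aic_by_hand <- -2 * as.numeric(ll) + 2 * k

# The built-in function gives the same value
AIC(fit_wt)

# Lower AIC indicates better estimated out-of-sample predictive performance
AIC(fit_wt, fit_wt_hp)
```

Only differences in AIC between models fitted to the same data are meaningful; the absolute values are not.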

Session 23 – 00:45:00 – Linear mixed effects models.
Next, we turn to multilevel linear models, also known as linear mixed effects models. We specifically deal with the cases of varying intercept and/or varying slope linear regression models.
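As a sketch, varying intercept and varying slope models could be fitted as shown below. This assumes the lme4 package and its bundled sleepstudy data set; neither is named in the course description, so treat both as illustrative choices.

```r
library(lme4)   # assumed package; provides lmer and the sleepstudy data

# Varying intercepts: each subject has its own baseline reaction time
fit_intercept <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Varying intercepts and slopes: the effect of Days also differs by subject
fit_slope <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

fixef(fit_slope)            # population-level intercept and slope
VarCorr(fit_slope)          # between-subject standard deviations and correlation
head(ranef(fit_slope)$Subject)   # subject-level deviations from the fixed effects

# Nested comparison of the two models (refitted by maximum likelihood)
anova(fit_intercept, fit_slope)
```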

Session 24 – 00:45:00 – Multilevel models for nested data.
Here, we will consider multilevel linear models for nested, as in groups of groups, data. As an example, we will look at multilevel linear models applied to data from students within classes that are themselves within different schools, and where we model the variability of effects across the classes and across the schools.

Session 25 – 00:45:00 – Multilevel models for crossed data.
In some multilevel models, each observation occurs in multiple groups, but these groups are not nested. For example, animals may be members of different species and in different locations, but the species are not subsets of locations, nor vice versa. These are known as crossed or multiclass data structures.

Session 26 – 00:45:00 – Group level predictors.
In multilevel regression models, predictor variables are sometimes associated with individuals and sometimes with the groups to which those individuals belong. In this section, we consider how to handle these two situations.

Session 27 – 00:45:00 – Generalized linear mixed models (GLMMs).
Here, we extend the linear mixed model to the exponential family of distributions and showcase an example using the Poisson GLMM. We also cover how to accommodate overdispersion through individual-level random effects.

Session 28 – 00:45:00 – Bayesian multilevel models.
All the models that we have considered can be handled, often more easily, using Bayesian models. Here, we provide a brief introduction to Bayesian models and show how to fit examples of the models that we have considered using Bayesian methods and the brms R package.

Session 29 – 00:45:00 – Variable selection.
Variable selection is a type of nested model comparison. It is also one of the most widely used model selection methods, and variable selection of some kind is done routinely in almost all data analysis. In particular, we cover stepwise regression (and its limitations), all-subsets methods, ridge regression, Lasso, and elastic nets.

Session 30 – 00:45:00 – Model averaging.
Rather than selecting one model from a set of candidates, it is arguably always better to perform model averaging, using all the candidate models, weighted by their predictive performance. We show how to perform model averaging using information criteria.

Testimonials

PRStats offers a great lineup of courses on statistical and analytical methods that are super relevant for ecologists and biologists. My lab and I have taken several of their courses—like Bayesian mixing models, time series analysis, and machine/deep learning—and we've found them very informative and directly useful for our work. I often recommend PRStats to my students and colleagues as a great way to brush up on or learn new R-based statistical skills.

Rolando O. Santos

PhD Assistant Professor, Florida International University

Courses attended

SIMM05, IMDL03, ITSA02, GEEE01 and MOVE07

Frequently asked questions

When will I receive instructions on how to join?

You’ll receive an email on the Friday before the course begins, with full instructions on how to join via Zoom. Please ensure you have Zoom installed in advance.

Do I need administrator rights on my computer?

Yes — administrator access is recommended, as you may need to install software during the course. If you don’t have admin rights, please contact us before the course begins and we’ll provide a list of software to install manually.

I’m attending the course live — will I also get access to the session recordings?

Yes. All participants will receive access to the recordings for 30 days after the course ends.

I can’t attend every live session — can I join some sessions live and catch up on others later?

Absolutely. You’re welcome to join the live sessions you can and use the recordings for those you miss. We do encourage attending live if possible, as it gives you the chance to ask questions and interact with the instructor. You’re also welcome to send questions by email after the sessions.

I’m in a different time zone and plan to follow the course via recordings. When will these be available?

We aim to upload recordings on the same day, but occasionally they may be available the following day.

I can’t attend live — how can I ask questions?

You can email the instructor with any questions. For more complex topics, we’re happy to arrange a short Zoom call at a time that works for both of you.

Will I receive a certificate?

Yes. All participants receive a digital certificate of attendance, which includes the course title, number of hours, course dates, and the instructor’s name.

Still have questions?

Can’t find the answer you’re looking for? Please chat to our friendly team.

4th August 2035 - 6th August 2035
Recorded, United Kingdom