MSMS06

Model Selection and Model Simplification

This two-day hands-on course teaches researchers how to select, compare, and evaluate statistical models in R using cross-validation, information criteria (AIC, AICc, BIC), and regularization methods. Participants will learn model averaging, mixed effects model selection, and best practices for transparent, theory-driven analysis.

  • Duration: 2 Days, 6 hours per day
  • Next Date: December 9-10, 2025
  • Format: Live Online
TIME ZONE

UK (GMT) local time. All sessions will be recorded and made available to ensure accessibility for attendees across different time zones.

Registration Fee: £300

Register Now


Rated 5.0 from 200+ reviews

Course Description

This two-day course provides practical training in statistical model building, evaluation, comparison, and selection for empirical researchers. Participants will learn principled approaches to choosing among competing models, handling multiple predictors, and accounting for model uncertainty. The course covers cross-validation, information criteria (AIC, AICc, BIC), variable selection methods including regularization (ridge, lasso, elastic net), and model averaging using Akaike weights. Special attention is given to mixed effects model selection, the problems with stepwise methods, and the critical distinction between prediction, explanation, and causal inference. Through hands-on examples with real research data, participants will develop practical workflows in R for comparing models, making model-averaged predictions, and reporting results appropriately. By the end of the course, participants will be able to move beyond automatic model selection and apply thoughtful, theory-driven approaches to their own research.

What You’ll Learn

During the course we will cover the following:

  • Understand the bias-variance tradeoff, overfitting, and why in-sample fit can be misleading.
  • Use cross-validation and information criteria (AIC, AICc, BIC) to evaluate out-of-sample predictive performance.
  • Compare nested and non-nested models appropriately using likelihood ratio tests, F-tests, and information criteria.
  • Recognize the multiple comparisons problem and distinguish between exploratory and confirmatory analysis.
  • Implement variable selection methods including stepwise regression, all-subsets selection, and regularization approaches (ridge, lasso, elastic net).
  • Calculate Akaike weights and make model-averaged predictions that account for model uncertainty.
  • Use confidence sets of models to report when multiple models receive substantial support.
  • Apply model selection methods specifically to mixed effects models, including understanding the crucial REML vs. ML distinction and strategies for comparing fixed and random effects structures.
  • Distinguish between prediction, explanation, and causal inference goals and how this affects model selection.
  • Report model selection results honestly and appropriately in publications.
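As a taste of the workflow, a simple information-criterion comparison in base R might look like the following sketch (illustrated here with the built-in mtcars data, which is not necessarily one of the course datasets):

```r
# Compare three candidate models of fuel efficiency using built-in data
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)
m3 <- lm(mpg ~ wt + hp + disp, data = mtcars)

# AIC() and BIC() are base-R generics; lower values indicate better
# expected out-of-sample performance within this candidate set
aic_tab <- AIC(m1, m2, m3)
bic_tab <- BIC(m1, m2, m3)
print(aic_tab[order(aic_tab$AIC), ])
```

Note that AIC values are only meaningful relative to other models fitted to the same data; the absolute numbers carry no interpretation on their own.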

Course Format

Interactive Learning Format

Each day features a balanced combination of lectures and hands-on practical exercises, with time set aside, where possible, for discussing participants’ own data.

Global Accessibility

All live sessions are recorded and made available on the same day, ensuring accessibility for participants across different time zones.

Collaborative Discussions

Open discussion sessions provide an opportunity for participants to explore specific research questions and engage with instructors and peers.

Comprehensive Course Materials

All code, datasets, and presentation slides used during the course will be shared with participants by the instructor.

Personalized Data Engagement

Participants are encouraged to bring their own data for discussion and practical application during the course.

Post-Course Support

Participants will receive continued support via email for 30 days following the course, along with on-demand access to session recordings for the same period.

Who Should Attend / Intended Audiences

This course is designed for empirical researchers who work with regression models and face decisions about which variables to include or which models best represent their data. If you need to compare competing hypotheses or theoretical models using observational data, this course will provide practical tools and principled frameworks for model selection and comparison.

Familiarity with R is required. You should be comfortable loading data, running basic regression models, and installing packages. No programming expertise is needed, but you should be able to follow R code and adapt examples to your own data.

A solid foundation in basic statistics is expected. You should understand linear regression, p-values, confidence intervals, and hypothesis testing. Familiarity with generalized linear models (GLMs) or mixed effects models is helpful but not required—we will briefly review these as needed.

Equipment and Software Requirements

A laptop or desktop computer with a functioning installation of R and RStudio is required. Both R and RStudio are free, open-source programs compatible with Windows, macOS, and Linux systems.

A working webcam is recommended to support interactive elements of the course. We encourage participants to keep their cameras on during live Zoom sessions to foster a more engaging and collaborative environment.

While not essential, using a large monitor – or ideally a dual-monitor setup – can significantly enhance your learning experience by allowing you to view course materials and work in RStudio simultaneously.

All necessary R packages will be introduced and installed during the workshop. A comprehensive list of required packages will also be shared with participants ahead of the course to allow for optional pre-installation.

Download R Download RStudio Download Zoom

Dr. Mark Andrews

Mark is a psychologist and statistician whose work lies at the intersection of cognitive science, Bayesian data analysis, and applied statistics. His research focuses on developing and testing Bayesian models of human cognition, with a particular emphasis on language processing and memory. He also works extensively on the theory and application of Bayesian statistical methods in the social and behavioural sciences, bridging methodological advances with real-world research challenges.

Since 2015, Mark has co-led a programme of intensive workshops on Bayesian data analysis for social scientists, funded by the UK’s Economic and Social Research Council (ESRC). These workshops have trained hundreds of researchers in the practical application of Bayesian methods, particularly through R and modern statistical packages.

 

Education & Career
• PhD in Psychology, Cornell University, New York (Cognitive Science, Bayesian Models of Cognition)
• MA in Psychology, Cornell University, New York
• BA (Hons) in Psychology, National University of Ireland
• Senior Lecturer in Psychology, Nottingham Trent University, England

 

Research Focus
Mark’s work centres on:
• Bayesian models of human cognition, especially in language processing and memory
• General Bayesian data analysis methods for the social and behavioural sciences
• Comparative studies of Bayesian vs. classical approaches to inference and model comparison
• Promoting reproducibility and transparent statistical practice in psychological research

 

Current Projects
• Developing Bayesian cognitive models of memory and linguistic comprehension
• Exploring Bayesian approaches to regression, multilevel, and mixed-effects models in psychology and social science research
• Co-leading ESRC-funded workshops on Bayesian data analysis for applied researchers

 

Professional Consultancy & Teaching
Mark provides expert training and advice in Bayesian data analysis for academic and applied research projects. His teaching portfolio includes courses and workshops on:
• Bayesian linear and generalized linear models
• Multilevel and mixed-effects models
• Cognitive modelling with Bayesian methods
• Applied statistics in R for psychologists and social scientists

He is also an advocate of open science and is experienced in communicating complex statistical methods to diverse audiences.

 

Teaching & Skills
• Instructor in Bayesian statistics, time series modelling, and machine learning
• Strong advocate for reproducibility, open-source tools, and accessible education
• Skilled in R, Stan, JAGS, and statistical computing for large datasets
• Experienced mentor and workshop leader at all academic levels

 


Session 1 – 02:00:00 – The Model Selection Problem
This session establishes why model selection matters for empirical research. We explore the fundamental tension between model complexity and predictive accuracy through the bias-variance tradeoff, using concrete examples to illustrate overfitting and its consequences. A key theme is distinguishing between prediction, explanation, and causal inference as fundamentally different research goals that require different modeling approaches. We review standard model fit measures and discuss why in-sample fit can be misleading, setting the stage for principled model comparison methods.
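The overfitting theme of this session can be previewed with a minimal base-R sketch (assuming simulated data, not course material): fitting polynomials of increasing degree to data whose true relationship is linear shows that in-sample R-squared can only go up, even when the extra terms capture pure noise.

```r
# Demonstrate that in-sample fit always improves with model complexity:
# the true relationship is linear, yet higher-degree polynomials
# mechanically achieve higher (or equal) in-sample R^2.
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50, sd = 2)   # linear signal plus noise

r2 <- sapply(1:8, function(d) {
  summary(lm(y ~ poly(x, d)))$r.squared
})
print(round(r2, 3))  # non-decreasing sequence of in-sample R^2 values
```

This is exactly why in-sample fit is misleading as a selection criterion: it rewards complexity unconditionally.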

Break – 01:00:00

Session 2 – 02:00:00 – Out-of-Sample Prediction and Information Criteria
This session introduces methods for evaluating models based on their out-of-sample predictive performance. We begin with cross-validation as the conceptual gold standard, covering both k-fold and leave-one-out approaches and their practical implementation. We then turn to information criteria as computationally efficient alternatives, explaining AIC, AICc (particularly important for small samples), and BIC. The session emphasizes understanding what these criteria measure, how to interpret differences between models, and when each approach is most appropriate. Hands-on exercises demonstrate these methods with real research data.
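The ideas in this session can be sketched in base R with a hand-rolled k-fold cross-validation of mean squared prediction error, plus AICc computed from the standard small-sample correction AICc = AIC + 2k(k+1)/(n−k−1). This is an illustrative sketch using built-in data, not the course's own implementation:

```r
# k-fold cross-validation of out-of-sample mean squared error
set.seed(42)
kfold_mse <- function(formula, data, k = 5) {
  folds <- sample(rep(1:k, length.out = nrow(data)))
  mse <- sapply(1:k, function(i) {
    fit  <- lm(formula, data = data[folds != i, ])
    pred <- predict(fit, newdata = data[folds == i, ])
    mean((data[folds == i, all.vars(formula)[1]] - pred)^2)
  })
  mean(mse)
}

# AICc: small-sample corrected AIC, from base-R quantities
aicc <- function(fit) {
  k <- attr(logLik(fit), "df")
  n <- nobs(fit)
  AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
}

m_small <- lm(mpg ~ wt, data = mtcars)
m_big   <- lm(mpg ~ wt + hp + disp + drat + qsec, data = mtcars)

print(c(cv_small = kfold_mse(mpg ~ wt, mtcars),
        cv_big   = kfold_mse(mpg ~ wt + hp + disp + drat + qsec, mtcars)))
print(c(aicc_small = aicc(m_small), aicc_big = aicc(m_big)))
```

With n = 32 and several predictors, AICc and AIC can disagree noticeably, which is why the small-sample correction matters.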

Break – 01:00:00

Session 3 – 02:00:00 – Model Comparison Frameworks
This session examines different frameworks for comparing statistical models. We distinguish between nested and non-nested model comparisons, explaining when likelihood ratio tests and F-tests are appropriate versus when information criteria are needed. A critical topic is the multiple comparisons problem: how testing many models increases the risk of spurious findings. We discuss the distinction between exploratory and confirmatory analysis and emphasize the importance of honest reporting when model selection has been performed. Practical exercises compare nested and non-nested models on real datasets.
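The nested/non-nested distinction is easy to see in base R. A rough sketch (using built-in data for illustration): nested lm fits can be compared with an F-test via anova(), whereas non-nested models cannot, and information criteria still apply.

```r
# Nested comparison: the smaller model is a special case of the larger,
# so an F-test via anova() is valid.
nested_small <- lm(mpg ~ wt, data = mtcars)
nested_big   <- lm(mpg ~ wt + hp, data = mtcars)
print(anova(nested_small, nested_big))   # F-test: is hp worth adding?

# Non-nested comparison: neither model contains the other, so no
# F-test or likelihood ratio test exists; information criteria do.
nonnested_a <- lm(mpg ~ wt, data = mtcars)
nonnested_b <- lm(mpg ~ hp, data = mtcars)
print(AIC(nonnested_a, nonnested_b))
```

Note that if many such comparisons are run, the reported p-values no longer have their nominal interpretation — the multiple comparisons issue the session discusses.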

Session 4 – 02:00:00 – Variable Selection Methods
This session addresses the common research problem of choosing among many potential predictor variables. We examine automated selection methods including stepwise regression and all-subsets selection, with particular attention to the well-documented problems with stepwise approaches. The session then introduces regularization methods (ridge regression, lasso, elastic net) as modern alternatives that handle collinearity and perform variable selection through penalization. Hands-on exercises compare these different approaches on the same dataset to illustrate how much results can vary and help participants understand which method suits which situation.
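As a minimal sketch of the regularization workflow (assuming the glmnet package is installed; variable choice is illustrative only):

```r
# Lasso with glmnet: cross-validated choice of the penalty lambda,
# then inspection of which coefficients survive the penalty.
library(glmnet)

x <- as.matrix(mtcars[, c("wt", "hp", "disp", "drat", "qsec")])
y <- mtcars$mpg

set.seed(1)
cv_fit <- cv.glmnet(x, y, alpha = 1)   # alpha = 1 is the lasso
print(coef(cv_fit, s = "lambda.1se")) # sparse coefficient vector

# alpha = 0 gives ridge regression; 0 < alpha < 1 gives the elastic net
```

Coefficients shrunk exactly to zero are effectively deselected, which is how the lasso performs variable selection and prediction in one step.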

Break – 01:00:00

Session 5 – 02:00:00 – Model Averaging
This session introduces multi-model inference as an alternative to selecting a single “best” model. We explore how Akaike weights quantify the relative support for competing models in a candidate set. The session focuses particularly on model-averaged predictions, which provide a principled way to account for model uncertainty when making forecasts. We discuss confidence sets of models as a way to honestly report when multiple models receive substantial support. Practical examples demonstrate workflows for calculating model-averaged predictions and interpreting results, with attention to what should and shouldn’t be averaged.
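The core computation is compact enough to sketch in base R (candidate set and new observation below are invented for illustration): Akaike weights are obtained by rescaling AIC differences, w_i ∝ exp(−Δ_i/2), and a model-averaged prediction weights each model's prediction by its w_i.

```r
# Akaike weights: convert AIC differences into relative model support
# that sums to 1 across the candidate set.
models <- list(
  wt       = lm(mpg ~ wt,           data = mtcars),
  wt_hp    = lm(mpg ~ wt + hp,      data = mtcars),
  wt_hp_am = lm(mpg ~ wt + hp + am, data = mtcars)
)
aic   <- sapply(models, AIC)
delta <- aic - min(aic)                           # differences from best model
w     <- exp(-delta / 2) / sum(exp(-delta / 2))   # Akaike weights
print(round(sort(w, decreasing = TRUE), 3))

# Model-averaged prediction: weight each model's prediction by its w
newcar <- data.frame(wt = 3, hp = 150, am = 1)
preds  <- sapply(models, predict, newdata = newcar)
print(sum(w * preds))
```

Predictions average naturally this way; averaging regression coefficients across models is more delicate, which is part of the session's discussion of what should and shouldn't be averaged.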

Break – 01:00:00

Session 6 – 02:00:00 – Mixed Effects Model Selection
This session addresses the special considerations that arise when selecting among mixed effects models. Mixed models present unique challenges for model selection because they involve both fixed and random effects, and standard model comparison procedures must be applied carefully. We cover the crucial distinction between REML and ML estimation and when each should be used for model comparison. The session examines strategies for comparing fixed effects structures, random effects structures, and combinations of both. We discuss practical workflows for stepwise simplification of mixed models, the use of information criteria with mixed models, and common pitfalls in mixed model selection. Hands-on examples demonstrate these principles using multilevel data.
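The REML/ML distinction can be sketched as follows (assuming the lme4 package is installed; the sleepstudy data ships with lme4 and is used here purely for illustration): models that differ in their fixed effects must be compared with ML fits, while REML fits are appropriate when comparing random effects structures with an identical fixed part.

```r
# REML vs ML with lme4: fixed-effects comparisons require ML fits.
library(lme4)

m_ml_small <- lmer(Reaction ~ 1    + (1 | Subject), sleepstudy, REML = FALSE)
m_ml_big   <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy, REML = FALSE)
print(anova(m_ml_small, m_ml_big))  # likelihood ratio test on the fixed effect

# Random-effects comparison: same fixed part, so REML (the default)
# is appropriate for each fit.
m_int   <- lmer(Reaction ~ Days + (1 | Subject),    sleepstudy)
m_slope <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
```

A convenient safeguard worth knowing: lme4's anova() method refits REML models with ML automatically before a likelihood ratio test, but relying on that silently is one of the pitfalls the session flags.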

Testimonials

PRStats offers a great lineup of courses on statistical and analytical methods that are super relevant for ecologists and biologists. My lab and I have taken several of their courses—like Bayesian mixing models, time series analysis, and machine/deep learning—and we've found them very informative and directly useful for our work. I often recommend PRStats to my students and colleagues as a great way to brush up on or learn new R-based statistical skills.

Rolando O. Santos

PhD Assistant Professor, Florida International University

Courses attended

SIMM05, IMDL03, ITSA02, GEEE01 and MOVE07


Frequently asked questions

Everything you need to know about joining and attending the course.

When will I receive instructions on how to join?

You’ll receive an email on the Friday before the course begins, with full instructions on how to join via Zoom. Please ensure you have Zoom installed in advance.

Do I need administrator rights on my computer?

Yes — administrator access is recommended, as you may need to install software during the course. If you don’t have admin rights, please contact us before the course begins and we’ll provide a list of software to install manually.

I’m attending the course live — will I also get access to the session recordings?

Yes. All participants will receive access to the recordings for 30 days after the course ends.

I can’t attend every live session — can I join some sessions live and catch up on others later?

Absolutely. You’re welcome to join the live sessions you can and use the recordings for those you miss. We do encourage attending live if possible, as it gives you the chance to ask questions and interact with the instructor. You’re also welcome to send questions by email after the sessions.

I’m in a different time zone and plan to follow the course via recordings. When will these be available?

We aim to upload recordings on the same day, but occasionally they may be available the following day.

I can’t attend live — how can I ask questions?

You can email the instructor with any questions. For more complex topics, we’re happy to arrange a short Zoom call at a time that works for both of you.

Will I receive a certificate?

Yes. All participants receive a digital certificate of attendance, which includes the course title, number of hours, course dates, and the instructor’s name.


Still have questions?

Can’t find the answer you’re looking for? Please chat to our friendly team.

9 December 2025 – 10 December 2025
Delivered remotely (United Kingdom, GMT)