Machine Learning for Evolutionary Genomics (MLEG01)

Name: Machine Learning for Evolutionary Genomics (MLEG01)
Start: 2026-09-14T00:00:00+01:00
End: 2026-09-23T23:59:59+01:00
Location: Delivered remotely (United Kingdom)

Machine Learning for Evolutionary Genomics: learn predictive modelling, data analysis, and AI methods for genomic data.

Duration: 25 hours
Next Date: September 14-18, 2026
Format: Live Online Format

TIME ZONE

Sweden (GMT+2) local time - All sessions will be recorded and made available to ensure accessibility for attendees across different time zones.

Registration Fee: £400

Course Description

Machine learning is rapidly transforming ecology, evolutionary biology, population genomics, and metagenomics by enabling researchers to extract patterns from increasingly large and complex datasets. This course provides a practical introduction to both classical and modern machine learning approaches, with a particular focus on biological applications. Participants will gain hands-on experience implementing machine learning algorithms in both R and Python while exploring real-world examples from population genomics, microbial ecology, metagenomics, ancient DNA, and microbiome research.

The course covers both supervised and unsupervised learning, dimensionality reduction techniques, deep learning, natural language processing approaches for DNA sequences, and challenges associated with high-dimensional biological data. Emphasis is placed on understanding the strengths, limitations, and biological interpretation of machine learning methods, enabling participants to critically evaluate and apply these approaches in their own research.

What You’ll Learn

The theoretical foundations of supervised and unsupervised machine learning
How to implement machine learning algorithms from scratch in R and Python
Applications of dimensionality reduction methods, including PCA, t-SNE, and UMAP, and why the latter may not always be appropriate for population genomics
The origin and interpretation of the horseshoe effect and triangular PCA patterns in ecological and genomic datasets
How convolutional neural networks can be applied to microbiome source tracking
The curse of dimensionality and the challenges it poses for high-dimensional biological data
Key concepts in microbial ecology, metagenomics, and contamination detection
How DNA sequences can be treated as text and analysed using natural language processing (NLP) approaches including bag of words and Word2Vec models
How to use Random Forests and feed-forward artificial neural networks for gene annotation and introgression detection
Applications of machine learning and deep learning to population genomics and ancient DNA research
Practical skills for applying machine learning methods to ecological and evolutionary biology datasets

Course Format

Flexible Learning Structure

Learn through a carefully structured mix of lecture recordings and guided exercises that you can pause, revisit, and complete at your own pace—ideal for busy professionals or those balancing multiple commitments.

Access Anytime, Anywhere

All course content is available on-demand, making it accessible across all time zones without the need to attend live sessions or adjust your schedule.

Independent Exploration with Support

Engage deeply with course topics through self-directed study, with the option to reach out to instructors via email for clarification or deeper discussion.

Comprehensive Learning Resources

Gain full access to the same high-quality materials provided in live sessions, including code, datasets, and presentation slides—all available to download and keep. Please note recordings can only be streamed.

Work With Your Own Data, On Your Terms

Apply what you learn directly to your own data projects as you go, allowing for a personalized and immediately practical learning experience.

Continued Guidance and Resource Access

Receive 30 days of post-enrolment email support and unrestricted access to all session recordings during that time, so you can review and reinforce your learning as needed.

Who Should Attend / Intended Audiences

This course is intended for ecologists, evolutionary biologists, population geneticists, bioinformaticians, postgraduate students, and early-career researchers interested in applying machine learning to biological data. Participants are expected to have a basic background in R or Python, including running scripts and working with simple datasets.

A foundational understanding of biology and statistics, including concepts such as probability, hypothesis testing, correlation, and linear regression, is recommended. Prior experience with machine learning is not required, as key concepts will be introduced from first principles. Familiarity with genomic, metagenomic, or ecological datasets would be beneficial but is not essential.

Equipment and Software requirements

A laptop or desktop computer with a functioning installation of R / RStudio and Python / Jupyter, which are free tools and can be installed from https://posit.co/download/rstudio-desktop/ and https://jupyter.org/install, respectively. During the course, Google Colab (https://colab.research.google.com/) and Posit Cloud (https://posit.cloud) will be used for practical sessions, which require a Google account.

A working webcam is recommended to support interactive elements of the course. We encourage participants to keep their cameras on during live Zoom sessions to foster a more engaging and collaborative environment.

While not essential, using a large monitor—or ideally a dual-monitor setup—can significantly enhance your learning experience by allowing you to view course materials and work in R simultaneously.

All necessary packages will be introduced and installed during the workshop. A comprehensive list of required packages will also be shared with participants ahead of the course to allow for optional pre-installation.

Dr. Nikolay Oskolkov

Nikolay is a bioinformatician, computational biologist, and data scientist working at the intersection of biology, medicine, statistics, and artificial intelligence. His research focuses on applying mathematical statistics, machine learning, and deep learning methods to complex biological and biomedical datasets, including genomics, transcriptomics, microbiome research, single-cell data, metagenomics, and multi-omics integration.

Nikolay received his PhD in theoretical physics in 2007 and transitioned to the life sciences in 2011. He currently leads the Metabolic Research Group (MRG) within the TARGETWISE project at the National Institute of Research and Innovation in Latvia and holds a teaching position at Lund University, Sweden. He has previously held research positions at the Danish Technical University, the University of North Carolina, Lund University, and the National Bioinformatics Infrastructure Sweden (NBIS/SciLifeLab).

Nikolay has more than 20 years of teaching experience and is widely recognised for his ability to communicate advanced statistical and computational methods to researchers from diverse scientific backgrounds. His expertise spans both frequentist and Bayesian statistics, machine learning, dimensionality reduction, clustering, bioinformatics, and scientific programming in R and Python. He has delivered numerous international workshops, summer schools, and professional training courses in computational biology, genomics, and AI-driven biomedical research.

Education & Career

• PhD in Theoretical Physics (2007)
• Transitioned from theoretical physics to bioinformatics and computational biology in 2011
• Group Leader (PI), Metabolic Research Group, TARGETWISE Project, Latvia
• Former researcher and bioinformatician at Lund University and NBIS/SciLifeLab, Sweden
• Author of more than 60 peer-reviewed scientific publications with extensive international collaborations in computational biology and biomedical research

Research Focus

Nikolay’s work centres on extracting biological insight from large-scale, high-dimensional datasets using advanced statistical and machine learning approaches. His research interests include:

• Machine learning and deep learning for biomedical and omics data
• Multi-omics integration and systems biology
• Single-cell transcriptomics and dimensionality reduction methods
• Population genomics and evolutionary biology
• Microbiome, environmental DNA, and ancient DNA analysis
• Statistical modelling and Bayesian approaches for complex biological systems
• AI applications in precision medicine and drug discovery

Current Projects

• Development of machine learning methods for multi-omics data integration and drug discovery in metabolic diseases
• AI-driven approaches for genomics and computational biology
• Statistical and computational methods for ancient and environmental DNA research
• Machine learning analysis workflows for single-cell and population genomics datasets
• Research on metabolic diseases through integrative bioinformatics and systems biology approaches

Professional Consultancy

Nikolay provides expert consultancy in biological and biomedical data analysis, supporting academic researchers, healthcare scientists, and industry teams. His consultancy expertise includes:

• Bioinformatics and computational biology
• Medical genomics and precision medicine
• Single-cell and multi-omics data analysis
• Metagenomics and population genomics
• Frequentist and Bayesian statistical modelling
• Machine learning and deep learning applications
• Scientific programming in R, Python, Bash, and C++
• Study design, data analysis pipelines, and reproducible research workflows

Teaching & Skills

• More than 20 years of teaching experience in statistics, machine learning, and computational biology
• Teaches topics including machine learning, deep learning, Bayesian statistics, dimensionality reduction, clustering, single-cell analysis, genomics, and bioinformatics
• Instructor for international courses and workshops through organisations including Instats, Physalia, NBIS SciLifeLab, TARGETWISE, and RaukR
• Strong advocate for rigorous statistical thinking, reproducible research, and accessible scientific education
• Experienced in translating advanced computational methods into practical tools for life scientists and healthcare researchers

Session 1 – 02:30:00 – Introduction to Machine Learning

Overview of supervised and unsupervised machine learning, key concepts and terminology.

Break – 01:00:00

Session 2 – 02:30:00 – Coding Machine Learning algorithms

Implementing selected algorithms, such as K-means, Markov Chain Monte Carlo (MCMC), and artificial neural networks (ANNs), from scratch in R and Python.
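
To give a flavour of what "from scratch" can mean for K-means, here is a minimal Python sketch of Lloyd's algorithm using only the standard library. It is an illustration written for this page, not course material: the function name `kmeans`, its parameters, and the toy points are all assumptions.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples (illustrative sketch)."""
    rng = random.Random(seed)
    # Initialise centroids by sampling k distinct data points.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: give each point the index of its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical toy data: two well-separated groups of 2D points.
toy_points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
toy_labels, toy_centroids = kmeans(toy_points, k=2)
print(toy_labels)
print(toy_centroids)
```

The sketch alternates the two steps that define the algorithm (assign points to the nearest centroid, then recompute centroids) and stops once the assignments no longer change.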

Session 3 – 02:30:00 – High-dimensional population genomics data

The curse of dimensionality, data sparsity, and their impact on machine learning and population genomic inference; PCA, t-SNE, and UMAP; strengths and limitations of each method.
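
One common way to illustrate the curse of dimensionality is distance concentration: as the number of dimensions grows, the nearest and farthest neighbours of a point become almost equally far away. The Python sketch below is an illustration written for this page rather than course material. It assumes points drawn uniformly from the unit hypercube, and the function name `distance_contrast` is hypothetical.

```python
import math
import random


def distance_contrast(n_points, n_dims, seed=0):
    """Relative gap between the farthest and nearest neighbour of one query point."""
    rng = random.Random(seed)
    # Draw points uniformly from the unit hypercube [0, 1]^n_dims (an assumption
    # made for illustration; real genomic data are not uniformly distributed).
    data = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    query, others = data[0], data[1:]
    dists = [math.dist(query, p) for p in others]
    return (max(dists) - min(dists)) / min(dists)


# The contrast typically shrinks as the dimensionality increases.
for d in (2, 10, 100, 1000):
    print(d, round(distance_contrast(200, d), 2))
```

A small contrast means that "nearest neighbour" carries little information, which is one reason distance-based methods can struggle with high-dimensional data.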

Break – 01:00:00

Session 4 – 02:30:00 – Dimensionality reduction for population genomics

Why UMAP may be suboptimal for population genomics applications; the mathematics of PCA; interpretation of principal components; the horseshoe effect and triangular PCA patterns in population genomics and microbial ecology.

Session 5 – 02:30:00 – Microbial ecology and metagenomics

Microbial community analysis, environmental metagenomics, contamination challenges, and data quality considerations.

Break – 01:00:00

Session 6 – 02:30:00 – Deep Learning for microbiome source tracking

Introduction to convolutional neural networks (CNNs); architecture and training; applications to human microbiome source tracking.

Session 7 – 02:30:00 – DNA as text: natural language processing for genomics

Representing DNA sequences as language; k-mers, embeddings, and sequence classification; foundations of NLP for genomics and metagenomics; bag of words and Word2Vec models.
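
As a rough illustration of the "DNA as text" idea, the Python sketch below splits sequences into overlapping k-mers ("words") and builds a bag-of-words count matrix over all 4^k possible k-mers. It is a minimal example written for this page, not course material; the function names `kmers` and `bag_of_words` and the two toy sequences are assumptions.

```python
from collections import Counter
from itertools import product


def kmers(seq, k):
    """Return all overlapping k-mers ("words") of a DNA sequence."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def bag_of_words(seqs, k):
    """Count k-mers per sequence over the fixed vocabulary of all 4**k k-mers."""
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    rows = []
    for seq in seqs:
        counts = Counter(kmers(seq, k))
        # k-mers containing other symbols (e.g. N) are not in the vocabulary
        # and are therefore dropped from the count vector.
        rows.append([counts[w] for w in vocab])
    return vocab, rows


# Hypothetical toy sequences; each row of X is one sequence's k-mer count vector.
vocab, X = bag_of_words(["ACGTAC", "TTTTGG"], k=2)
for word, count in zip(vocab, X[0]):
    if count:
        print(word, count)
```

Each row of the resulting matrix can then be passed to any standard classifier, in the same way that word counts are used for text classification.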

Break – 01:00:00

Session 8 – 02:30:00 – Machine Learning for genomic annotation and introgression detection

Implementing Random Forests and feed-forward neural networks for genomic feature detection; applications to gene annotation and identifying introgressed genomic regions.

Session 9 – 02:30:00 – Machine Learning in ancient DNA research

Characteristics of ancient DNA datasets; challenges associated with degradation and contamination; missing data and low coverage challenges in ancient DNA; feature engineering approaches.

Break – 01:00:00

Session 10 – 02:30:00 – Deep Learning applications in ancient DNA and evolutionary biology

Deep learning approaches for ancient-status inference; current applications, limitations, and future directions in ecology and evolutionary biology.

Testimonials

PR Stats offers a great lineup of courses on statistical and analytical methods that are super relevant for ecologists and biologists. My lab and I have taken several of their courses—like Bayesian mixing models, time series analysis, and machine/deep learning—and we've found them very informative and directly useful for our work. I often recommend PR Stats to my students and colleagues as a great way to brush up on or learn new R-based statistical skills.

Rolando O. Santos

PhD Assistant Professor, Florida International University

Courses attended

SIMM05, IMDL03, ITSA02, GEEE01 and MOVE07

PR Stats provided excellent training in stable isotope analysis through the SIMMPR course, which was incredibly valuable for my research. I was fortunate to attend the course through a generous fee waiver, which directly supported my work and enabled me to develop skills that contributed to my recent publication on reservoir food webs in Sri Lanka. I’m very grateful for the opportunity and support, and would highly recommend their courses to others working in ecological research.

Subodha Silva

Aquatic Ecology Researcher

Courses attended

SIMMPR

PR Stats has become an invaluable part of developing my skills in advanced statistical and spatial analysis. Through training in areas such as Bayesian statistics and Species Distribution Modelling, I’ve gained both practical expertise and exposure to leading experts in the field. The impact on my research has been significant, with at least four of my published papers directly influenced by PR Stats courses. My most recent work benefitted from modelling advice on sample design and model accuracy evaluation and can be seen here.

Carlos P.E. Bedson

Quantitative Spatial Ecology, Ecology and Environment Research Centre, Manchester Metropolitan University, United Kingdom

Courses attended

ADVR08, ENMR03, BMIN02, ISBD01, BADA01, SDMB06

Frequently asked questions

When will I receive instructions on how to join?

You’ll receive an email on the Friday before the course begins, with full instructions on how to join via Zoom. Please ensure you have Zoom installed in advance.

Do I need administrator rights on my computer?

Yes — administrator access is recommended, as you may need to install software during the course. If you don’t have admin rights, please contact us before the course begins and we’ll provide a list of software to install manually.

I’m attending the course live — will I also get access to the session recordings?

Yes. All participants will receive access to the recordings for 30 days after the course ends.

I can’t attend every live session — can I join some sessions live and catch up on others later?

Absolutely. You’re welcome to join the live sessions you can and use the recordings for those you miss. We do encourage attending live if possible, as it gives you the chance to ask questions and interact with the instructor. You’re also welcome to send questions by email after the sessions.

I’m in a different time zone and plan to follow the course via recordings. When will these be available?

We aim to upload recordings on the same day, but occasionally they may be available the following day.

I can’t attend live — how can I ask questions?

You can email the instructor with any questions. For more complex topics, we’re happy to arrange a short Zoom call at a time that works for both of you.

Will I receive a certificate?

Yes. All participants receive a digital certificate of attendance, which includes the course title, number of hours, course dates, and the instructor’s name.

Still have questions?

Can’t find the answer you’re looking for? Please chat to our friendly team.

Get in touch

£400.00

14 September 2026 - 23 September 2026

Delivered remotely (United Kingdom), Western European Time Zone, United Kingdom
