£400 Registration Fee
Course Description
Machine learning is rapidly transforming ecology, evolutionary biology, population genomics, and metagenomics by enabling researchers to extract patterns from increasingly large and complex datasets. This course provides a practical introduction to both classical and modern machine learning approaches, with a particular focus on biological applications. Participants will gain hands-on experience implementing machine learning algorithms in both R and Python while exploring real-world examples from population genomics, microbial ecology, metagenomics, ancient DNA, and microbiome research.
The course covers both supervised and unsupervised learning, dimensionality reduction techniques, deep learning, natural language processing approaches for DNA sequences, and challenges associated with high-dimensional biological data. Emphasis is placed on understanding the strengths, limitations, and biological interpretation of machine learning methods, enabling participants to critically evaluate and apply these approaches in their own research.
What You’ll Learn
- The theoretical foundations of supervised and unsupervised machine learning
- How to implement machine learning algorithms from scratch in R and Python
- Applications of dimensionality reduction methods, including PCA, t-SNE, and UMAP, and why UMAP may not always be appropriate for population genomics
- The origin and interpretation of the horseshoe effect and triangular PCA patterns in ecological and genomic datasets
- How convolutional neural networks can be applied to microbiome source tracking
- The curse of dimensionality and the challenges it poses for high-dimensional biological data
- Key concepts in microbial ecology, metagenomics, and contamination detection
- How DNA sequences can be treated as text and analysed using natural language processing (NLP) approaches including bag of words and Word2Vec models
- How to use Random Forests and feed-forward artificial neural networks for gene annotation and introgression detection
- Applications of machine learning and deep learning to population genomics and ancient DNA research
- Practical skills for applying machine learning methods to ecological and evolutionary biology datasets
Course Format
Flexible Learning Structure
Learn through a carefully structured mix of lecture recordings and guided exercises that you can pause, revisit, and complete at your own pace—ideal for busy professionals or those balancing multiple commitments.
Access Anytime, Anywhere
All course content is available on-demand, making it accessible across all time zones without the need to attend live sessions or adjust your schedule.
Independent Exploration with Support
Engage deeply with course topics through self-directed study, with the option to reach out to instructors via email for clarification or deeper discussion.
Comprehensive Learning Resources
Gain full access to the same high-quality materials provided in live sessions, including code, datasets, and presentation slides—all available to download and keep. Please note recordings can only be streamed.
Work With Your Own Data, On Your Terms
Apply what you learn directly to your own data projects as you go, allowing for a personalized and immediately practical learning experience.
Continued Guidance and Resource Access
Receive 30 days of post-enrolment email support and unrestricted access to all session recordings during that time, so you can review and reinforce your learning as needed.
Who Should Attend / Intended Audiences
This course is intended for ecologists, evolutionary biologists, population geneticists, bioinformaticians, postgraduate students, and early-career researchers interested in applying machine learning to biological data. Participants are expected to have a basic background in R or Python, including running scripts and working with simple datasets.
A foundational understanding of biology and statistics, including concepts such as probability, hypothesis testing, correlation, and linear regression, is recommended. Prior experience with machine learning is not required, as key concepts will be introduced from first principles. Familiarity with genomic, metagenomic, or ecological datasets would be beneficial but is not essential.
Equipment and Software requirements
A laptop or desktop computer with a working installation of R / RStudio and Python / Jupyter. Both are free and can be installed from https://posit.co/download/rstudio-desktop/ and https://jupyter.org/install, respectively. During the course, Google Colab (https://colab.research.google.com/) and Posit Cloud (https://posit.cloud) will be used for practical sessions, which require a Google account.
A working webcam is recommended to support interactive elements of the course. We encourage participants to keep their cameras on during live Zoom sessions to foster a more engaging and collaborative environment.
While not essential, using a large monitor—or ideally a dual-monitor setup—can significantly enhance your learning experience by allowing you to view course materials and work in R simultaneously.
All necessary packages will be introduced and installed during the workshop. A comprehensive list of required packages will also be shared with participants ahead of the course to allow for optional pre-installation.
Dr. Nikolay Oskolkov
Nikolay is a bioinformatician, computational biologist, and data scientist working at the intersection of biology, medicine, statistics, and artificial intelligence. His research focuses on applying mathematical statistics, machine learning, and deep learning methods to complex biological and biomedical datasets, including genomics, transcriptomics, microbiome research, single-cell data, metagenomics, and multi-omics integration.
Nikolay received a PhD in theoretical physics in 2007 and transitioned to the life sciences in 2011. He currently leads the Metabolic Research Group (MRG) within the TARGETWISE project at the National Institute of Research and Innovation in Latvia and holds a teaching position at Lund University, Sweden. He has previously held research positions at the Danish Technical University, the University of North Carolina, Lund University, and the National Bioinformatics Infrastructure Sweden (NBIS/SciLifeLab).
Nikolay has more than 20 years of teaching experience and is widely recognised for his ability to communicate advanced statistical and computational methods to researchers from diverse scientific backgrounds. His expertise spans both frequentist and Bayesian statistics, machine learning, dimensionality reduction, clustering, bioinformatics, and scientific programming in R and Python. He has delivered numerous international workshops, summer schools, and professional training courses in computational biology, genomics, and AI-driven biomedical research.
Education & Career
- PhD in Theoretical Physics (2007)
- Transitioned from theoretical physics to bioinformatics and computational biology in 2011
- Group Leader (PI), Metabolic Research Group, TARGETWISE Project, Latvia
- Former researcher and bioinformatician at Lund University and NBIS/SciLifeLab, Sweden
- Author of more than 60 peer-reviewed scientific publications with extensive international collaborations in computational biology and biomedical research
Research Focus
Nikolay’s work centres on extracting biological insight from large-scale, high-dimensional datasets using advanced statistical and machine learning approaches. His research interests include:
- Machine learning and deep learning for biomedical and omics data
- Multi-omics integration and systems biology
- Single-cell transcriptomics and dimensionality reduction methods
- Population genomics and evolutionary biology
- Microbiome, environmental DNA, and ancient DNA analysis
- Statistical modelling and Bayesian approaches for complex biological systems
- AI applications in precision medicine and drug discovery
Current Projects
- Development of machine learning methods for multi-omics data integration and drug discovery in metabolic diseases
- AI-driven approaches for genomics and computational biology
- Statistical and computational methods for ancient and environmental DNA research
- Machine learning analysis workflows for single-cell and population genomics datasets
- Research on metabolic diseases through integrative bioinformatics and systems biology approaches
Professional Consultancy
Nikolay provides expert consultancy in biological and biomedical data analysis, supporting academic researchers, healthcare scientists, and industry teams. His consultancy expertise includes:
- Bioinformatics and computational biology
- Medical genomics and precision medicine
- Single-cell and multi-omics data analysis
- Metagenomics and population genomics
- Frequentist and Bayesian statistical modelling
- Machine learning and deep learning applications
- Scientific programming in R, Python, Bash, and C++
- Study design, data analysis pipelines, and reproducible research workflows
Teaching & Skills
- More than 20 years of teaching experience in statistics, machine learning, and computational biology
- Teaches topics including machine learning, deep learning, Bayesian statistics, dimensionality reduction, clustering, single-cell analysis, genomics, and bioinformatics
- Instructor for international courses and workshops through organisations including Instats, Physalia, NBIS SciLifeLab, TARGETWISE, and RaukR
- Strong advocate for rigorous statistical thinking, reproducible research, and accessible scientific education
- Experienced in translating advanced computational methods into practical tools for life scientists and healthcare researchers
Session 1 – 02:30:00 – Introduction to Machine Learning
Overview of supervised and unsupervised machine learning, key concepts and terminology.
Break – 01:00:00
Session 2 – 02:30:00 – Coding Machine Learning algorithms
Implementing selected algorithms such as K-means, Markov Chain Monte Carlo (MCMC), and an artificial neural network (ANN) from scratch in R and Python.
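To give a flavour of what "from scratch" can mean here, the following is a minimal sketch of K-means (Lloyd's algorithm) using only the Python standard library. It is an illustration rather than the course's own code: the function name `kmeans`, its parameters, and the toy data are assumptions made for this example.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps.

    `points` is a list of equal-length numeric tuples. The name and
    signature are illustrative, not taken from the course materials.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k data points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave a centroid in place if its cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Toy data: two well-separated groups of three points each.
toy = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
labels, centroids = kmeans(toy, k=2)
print(labels)
```

On the toy data the first three points should share one label and the last three the other; which group is labelled 0 depends on the random initialisation.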
Session 3 – 02:30:00 – High-dimensional population genomics data
The curse of dimensionality, data sparsity, and their impact on machine learning and population genomic inference; PCA, t-SNE, and UMAP; strengths and limitations of each method.
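One common way to illustrate the curse of dimensionality is distance concentration: as the number of dimensions grows, the nearest and farthest neighbours of a point become almost equally far away. The sketch below is a hypothetical illustration on uniform random data, not an example from the course; the function name `distance_contrast` and the sample sizes are assumptions.

```python
import math
import random


def distance_contrast(n_points, n_dims, seed=0):
    """Relative contrast (d_max - d_min) / d_min from one query point.

    Points are drawn uniformly from the unit hypercube. A small value
    means near and far neighbours are hard to tell apart.
    """
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    query = pts[0]
    dists = [math.dist(query, p) for p in pts[1:]]
    return (max(dists) - min(dists)) / min(dists)


# The contrast is expected to shrink as dimensionality increases.
for d in (2, 10, 100, 1000):
    print(d, round(distance_contrast(200, d), 3))
```

Genomic data are not uniform random points, so this only shows the general geometric effect that motivates dimensionality reduction before clustering or nearest-neighbour methods.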
Break – 01:00:00
Session 4 – 02:30:00 – Dimensionality reduction for population genomics
Why UMAP may be suboptimal for population genomics applications; the mathematics of PCA; interpretation of principal components; the horseshoe effect and triangular PCA patterns in population genomics and microbial ecology.
Session 5 – 02:30:00 – Microbial ecology and metagenomics
Microbial community analysis, environmental metagenomics, contamination challenges, and data quality considerations.
Break – 01:00:00
Session 6 – 02:30:00 – Deep Learning for microbiome source tracking
Introduction to convolutional neural networks (CNNs); architecture and training; applications to human microbiome source tracking.
Session 7 – 02:30:00 – DNA as text: natural language processing for genomics
Representing DNA sequences as language; k-mers, embeddings, and sequence classification; foundations of NLP for genomics and metagenomics; bag of words and Word2Vec models.
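As a rough sketch of the "DNA as text" idea, a sequence can be split into overlapping k-mers ("words") and each sequence represented by its k-mer counts (a bag of words). The code below uses only the Python standard library; the function names `kmers` and `bag_of_words` and the example sequences are assumptions for illustration, not the course's implementation.

```python
from collections import Counter


def kmers(seq, k):
    """Split a sequence into overlapping k-mers (the 'words')."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def bag_of_words(seqs, k):
    """Return a sorted k-mer vocabulary and a count matrix.

    Row i of the matrix holds the counts of each vocabulary k-mer
    in sequence i; word order within a sequence is discarded.
    """
    counts = [Counter(kmers(s, k)) for s in seqs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[w] for w in vocab] for c in counts]
    return vocab, matrix


vocab, matrix = bag_of_words(["ACGT", "ACGA"], k=2)
print(vocab)   # ['AC', 'CG', 'GA', 'GT']
print(matrix)  # [[1, 1, 0, 1], [1, 1, 1, 0]]
```

The resulting count matrix can be passed to any standard classifier. Word2Vec-style embeddings go a step further by learning a dense vector for each k-mer from its neighbouring k-mers; that typically relies on a dedicated library and is not shown here.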
Break – 01:00:00
Session 8 – 02:30:00 – Machine Learning for genomic annotation and introgression detection
Implementing Random Forests and feed-forward neural networks for genomic feature detection; applications to gene annotation and identifying introgressed genomic regions.
Session 9 – 02:30:00 – Machine Learning in ancient DNA research
Characteristics of ancient DNA datasets; challenges associated with degradation and contamination; missing data and low coverage challenges in ancient DNA; feature engineering approaches.
Break – 01:00:00
Session 10 – 02:30:00 – Deep Learning applications in ancient DNA and evolutionary biology
Deep learning approaches for ancient-status inference; current applications, limitations, and future directions in ecology and evolutionary biology.
Frequently asked questions
When will I receive instructions on how to join?
You’ll receive an email on the Friday before the course begins, with full instructions on how to join via Zoom. Please ensure you have Zoom installed in advance.
Do I need administrator rights on my computer?
I’m attending the course live — will I also get access to the session recordings?
I can’t attend every live session — can I join some sessions live and catch up on others later?
I’m in a different time zone and plan to follow the course via recordings. When will these be available?
I can’t attend live — how can I ask questions?
Will I receive a certificate?