Dates: August 10-14, 2026
Time: 9:30 AM to 4:00 PM, every day

Overview

The Machine Learning in Python summer camp offers an overview of Machine Learning using the Python programming language. Students will work in pairs and small groups on worksheets and Jupyter notebooks, interspersed with brief lectures and instructor-led live-coding segments.

Prerequisites: Participants should already be familiar with Python programming fundamentals, e.g., loops, conditional execution, importing modules, and calling functions. Participants should bring a laptop with Anaconda already installed.

Learning Objectives

  1. Describe the machine learning pipeline and core concepts
    Explain the stages of the pipeline (data collection, cleaning, training, testing, and validation) and key terminology in plain language.

  2. Differentiate major types of machine learning tasks
    Distinguish between common task types (e.g., classification, clustering) and their appropriate use cases. 

  3. Explain common machine learning algorithms and techniques at a high level
    Provide intuitive, plain-language explanations of algorithms such as k-nearest neighbours, decision trees (including random forests), k-means/DBSCAN, neural networks, and dimensionality reduction techniques. Students will be able to compare these methods by identifying key trade-offs (e.g., interpretability, performance, data requirements).

  4. Apply machine learning methods using Python
    Implement elements of the machine learning pipeline using Python tools and structured datasets in a guided environment. 

  5. Evaluate model performance and reflect on limitations
    Assess model performance (e.g., validation strategies), identify issues such as overfitting and data leakage, and articulate ethical considerations including data sovereignty and inherent limitations of AI. 
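
The pipeline stages and validation ideas named in these objectives can be sketched end to end in a few lines. This is a minimal illustration assuming scikit-learn (included with Anaconda); the dataset and model here are stand-ins, not the camp's actual materials.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Data collection: a built-in dataset stands in for gathered data
X, y = load_iris(return_X_y=True)

# Cleaning: drop rows with missing values (none here, but the step
# belongs in every pipeline)
mask = ~np.isnan(X).any(axis=1)
X, y = X[mask], y[mask]

# Training/testing split, then fit the model on the training portion
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Testing on held-out data, plus cross-validation on the training set
print("test accuracy:", model.score(X_test, y_test))
print("5-fold CV accuracy:", cross_val_score(model, X_train, y_train, cv=5).mean())
```

Keeping the test set untouched until the final score is what makes the accuracy estimate honest; scoring on training data would hide overfitting.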

Schedule

Day 1:  Landscape and Appropriateness of Machine Learning; k-Nearest Neighbours

This session introduces the broader landscape of machine learning, including when it is (and is not) an appropriate tool. Students explore key terminology and the machine learning pipeline through a hands-on introduction to the k-nearest neighbours algorithm.

By the end of Day 1, students will be able to:

  • Articulate applications, limitations, and ethical considerations of machine learning 

  • Describe the stages of the machine learning pipeline (data collection, cleaning, training, testing, and validation) 

  • Explain in plain language how the k-nearest neighbours algorithm works 
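
The k-nearest neighbours idea can be sketched from scratch in a few lines: to label a new point, find the k closest training points and take a majority vote. The toy points and labels below are illustrative only.

```python
import math
from collections import Counter

# Two small clusters of labelled training points (illustrative data)
train_points = [(1.0, 1.0), (1.2, 0.8), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_labels = ["A", "A", "B", "B", "B"]

def knn_predict(query, k=3):
    # Distance from the query to every training point
    dists = [
        (math.dist(query, p), label)
        for p, label in zip(train_points, train_labels)
    ]
    # Keep the k nearest and vote on their labels
    nearest = sorted(dists)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((1.1, 0.9)))  # near the "A" cluster
print(knn_predict((4.0, 4.0)))  # near the "B" cluster
```

The same algorithm is available as scikit-learn's KNeighborsClassifier; writing it by hand first makes the "nearest points vote" intuition concrete.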

Day 2:  Data Acquisition, Data Sovereignty, and Tree-Based Models

This session focuses on data as the foundation of machine learning, including how data is collected, common challenges, and issues of data sovereignty and ethics. Students are introduced to decision trees and random forests as intuitive, interpretable models.

By the end of Day 2, students will be able to:

  • Describe the data acquisition process and common challenges in collecting real-world data 

  • Explain ethical considerations related to data use, including data sovereignty 

  • Describe how decision trees and random forests work at a high level 

  • Train and interpret a decision tree model in a guided environment 
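
Training and then reading a decision tree might look like the sketch below, assuming scikit-learn; the iris dataset stands in for whatever data the camp actually uses.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)

# A shallow tree stays interpretable: each split is a readable rule
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X_train, y_train)

# Interpretation: print the learned rules as plain text
print(export_text(tree, feature_names=list(iris.feature_names)))
print("held-out accuracy:", tree.score(X_test, y_test))
```

A random forest (RandomForestClassifier) trains many such trees on random subsets and averages their votes, trading some of this readability for accuracy.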

Day 3:  Unsupervised Learning, Clustering, and Model Validation

Students are introduced to unsupervised learning through clustering techniques, including k-means and DBSCAN. The session also covers model validation concepts and common pitfalls such as data leakage and overfitting, along with strategies to avoid them.

By the end of Day 3, students will be able to:

  • Differentiate between supervised and unsupervised learning 

  • Explain how k-means and DBSCAN clustering algorithms work at a high level 

  • Apply clustering algorithms in a scaffolded coding environment 

  • Describe model validation strategies and identify risks such as data leakage 

  • Articulate key steps in data cleaning and common issues with real-world datasets 
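
The two clustering algorithms can be compared on synthetic data in a short sketch, assuming scikit-learn. Note that neither algorithm sees any labels; the blob parameters below are illustrative choices.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

# Synthetic unlabelled data: three well-separated blobs
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.6, random_state=0)

# k-means: you choose k up front, and every point joins some cluster
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("k-means found", len(set(kmeans.labels_)), "clusters")

# DBSCAN: no k needed; density decides the clusters, and sparse
# points are marked as noise with the label -1
db = DBSCAN(eps=0.8, min_samples=5).fit(X)
print("DBSCAN found", len(set(db.labels_) - {-1}), "clusters")
```

The contrast is the point of the exercise: k-means needs the cluster count in advance and forces every point into a cluster, while DBSCAN infers the count from density and can flag outliers as noise.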

Day 4:  Introduction to Neural Networks

This session introduces neural networks as a powerful and widely used class of machine learning models. Students build an intuitive understanding of how neural networks function and begin implementing them in practice.

By the end of Day 4, students will be able to:

  • Describe in plain language what a neural network is and how it works 

  • Explain key components of neural networks (e.g., layers, weights, activation functions) at a high level 

  • Follow a guided process to train a basic neural network for a classification task (identifying handwritten digits)
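
A guided digit-classification exercise of this kind might look like the following, assuming scikit-learn; the layer size and iteration count are illustrative choices, not the camp's actual settings.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# 8x8 grayscale images of handwritten digits, flattened to 64 features
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling the inputs helps gradient-based training converge
scaler = StandardScaler().fit(X_train)

# One hidden layer of 32 units; the weights are learned by backpropagation
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=0)
net.fit(scaler.transform(X_train), y_train)

print("held-out accuracy:", net.score(scaler.transform(X_test), y_test))
```

Even this small network reaches high accuracy on the 8x8 digits, which makes it a good first encounter with layers, weights, and activation functions before the larger architectures of Day 5.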

Day 5:  Advanced Neural Networks, Dimensionality Reduction, and Limitations of AI

The final session builds on neural networks and introduces more advanced ideas, including dimensionality reduction with PCA. The course concludes with a critical discussion of the limitations and broader impacts of AI systems. We will also briefly explore recent advances in AI, including generative models, building on our prior learning to develop a clearer understanding of large language models.

By the end of Day 5, students will be able to:

  • Describe more advanced neural network architectures (such as convolutional and recurrent networks, in contrast with the feedforward networks of Day 4) and how their training differs from earlier models 

  • Explain the purpose of dimensionality reduction and how PCA works at a high level 

  • Apply PCA in a guided setting to explore and transform data 

  • Articulate key limitations of AI systems and their societal implications
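
Applying PCA in a guided setting can be as short as the sketch below, assuming scikit-learn; the digits dataset is an illustrative choice for data to compress and explore.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# 1797 digit images, each described by 64 pixel values
X, y = load_digits(return_X_y=True)
print("original shape:", X.shape)  # (1797, 64)

# Project the 64 dimensions down to the 2 directions of greatest variance
pca = PCA(n_components=2)
X2 = pca.fit_transform(X)

print("reduced shape:", X2.shape)  # (1797, 2)
print("variance explained by 2 components:",
      pca.explained_variance_ratio_.sum())
```

The two components capture only part of the variance, which is itself instructive: dimensionality reduction trades information for a representation small enough to plot and reason about.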