Do you trust your model? Despite their widespread adoption and impressive performance, modern machine learning models have a crucial flaw: it is extremely difficult to discern when and how they fail. This shortcoming has given rise to the field of trustworthy machine learning, which aims to make these systems safe, responsible, and understandable.
This course will explore tools and methods for analyzing the machine learning pipeline and assessing its trustworthiness (or lack thereof) from the perspective of datasets, models, and predictions. A tentative schedule of these topics can be found at the bottom of this page.
Instructor: Eric Wong (exwong@cis)
Class: Tues 1:45-3:15pm Eastern, DRLB 4C6 / Thurs 1:45-3:15pm Eastern, CHEM 514
Website: https://www.cis.upenn.edu/~exwong/debugml/
Ed discussion: Self sign-up link
Mask policy: Masks are required.
Students from all majors and degree levels are welcome. There are no specific course requirements, but a background in machine learning at an introductory course level is expected, as well as basic programming experience for the course project.
Grading will be based on 80% course project (15% proposal + 20% progress report + 25% final report + 20% presentation) and 20% participation (5% readings + 15% discussion). There will be no homework or exams.
This class will combine lectures and discussions. The lectures will typically cover the core groundwork, followed by a student-led in-depth discussion based on assigned readings. Readings and lecture materials will be posted on the schedule.
Project
As part of this course, students will inspect and debug machine learning pipelines for deficiencies in settings of their choice. All parts of the pipeline are fair game, including data collection, training algorithms, models and architectures, the resulting predictions, and even the debugging tools themselves. This can take the form of an audit (identifying the shortcomings of a fixed pipeline) or a patch/update (changing the pipeline to fix a problem). Example projects at various stages of the pipeline include the following:
- Datasets:
  - Are there biases, spurious correlations, or underrepresented subpopulations? For example, does US census data have any blind spots or misleading correlations? (A minimal sketch of such an audit appears after this list.)
  - Where do these problems stem from, and how do they impact downstream predictions?
  - Can we fix the data or the collection procedure to mitigate these issues?
- Methods and architectures:
  - Do ML algorithms for fixing models via training (e.g., for fairness, privacy, adversarial robustness, or security) actually achieve their goals?
  - Can you pinpoint or characterize the failures of modern architectures (such as large language models)?
  - Can you construct counterexamples or subpopulations that exemplify the failure modes of these models and algorithms, or guarantee that such failure modes don't exist?
- Interpretability and predictions:
  - How faithful are explainability methods to the actual model predictions?
  - Are the types of explanations we can generate aligned with what practitioners need? For example, do analysis tools for diagnosing health conditions give doctors useful and meaningful information?
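To make the dataset-audit idea concrete, here is a minimal sketch of the kind of check such a project might start from: flagging underrepresented subgroups and features that correlate suspiciously with the label. This is an illustration, not part of the course materials; the column names (`group`, `label`, `leaky_feature`), the toy data, and the thresholds are all hypothetical, and it assumes only `numpy` and `pandas`.

```python
# Minimal dataset-audit sketch (hypothetical column names and thresholds).
# Flags (1) subgroups that make up too small a share of the data and
# (2) numeric features so correlated with the label that they may be
# spurious shortcuts rather than genuine signal.

import numpy as np
import pandas as pd


def audit_representation(df: pd.DataFrame, group_col: str, min_share: float = 0.05) -> pd.Series:
    """Return subgroups whose share of the data falls below min_share."""
    shares = df[group_col].value_counts(normalize=True)
    return shares[shares < min_share]


def audit_label_correlations(df: pd.DataFrame, label_col: str, threshold: float = 0.9) -> pd.Series:
    """Return numeric features whose absolute correlation with the label exceeds threshold."""
    numeric = df.select_dtypes(include=[np.number]).drop(columns=[label_col])
    corrs = numeric.corrwith(df[label_col]).abs().sort_values(ascending=False)
    return corrs[corrs > threshold]


if __name__ == "__main__":
    # Toy data standing in for a real dataset such as census records.
    rng = np.random.default_rng(0)
    n = 1000
    df = pd.DataFrame({
        "group": rng.choice(["A", "B", "C"], size=n, p=[0.70, 0.28, 0.02]),
        "income": rng.normal(50, 10, size=n),
    })
    df["label"] = (df["income"] > 55).astype(int)
    # A near-duplicate of the label, mimicking leakage from data collection.
    df["leaky_feature"] = df["label"] + rng.normal(0, 0.1, size=n)

    print("Underrepresented subgroups:\n", audit_representation(df, "group"))
    print("Suspicious label correlations:\n", audit_label_correlations(df, "label"))
```

A real audit would go further, e.g. intersectional subgroups, correlations conditioned on subgroup, and scrutiny of the collection procedure itself, but even checks this simple often surface the blind spots and misleading correlations the census example above alludes to.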
Tentative schedule and topics
The schedule and topics can change based on students’ interests and as time permits. If you don’t see something you’d like to learn about, send me an email.
| Date | Topic | Notes |
| --- | --- | --- |
| August 30 | Overview | Slides · Lecture notes · Notebook · Supplementary reading - Problems in health care |
| **Failure modes** | | |
| September 1 | Bias | Types of bias · Slides · Lecture notes · Notebook · "The Trouble with Bias" (NeurIPS 2017 keynote by Kate Crawford) · Supplementary reading - Suresh & Guttag 2019 |
| September 6 | Bias | Assigned reading - Bolukbasi et al. 2016 · Supplementary reading - Arteaga et al. 2019 |
| September 8 | Out of distribution | Covariate, label & concept shifts · Slides · Lecture notes · Assigned reading - Rabanser et al. 2019 |
| September 13 | Out of distribution | Measuring distribution shift · Assigned reading - Riegar et al. 2019 |
| September 15 | Adversarial | Adversarial attacks · Slides · Lecture notes · Assigned reading - Beery et al. 2018 |
| September 20 | No class | |
| September 22 | Adversarial | Data poisoning, backdoors, Byzantine faults · Assigned reading - Li et al. 2020 · Supplementary reading - Rice et al. 2021 · Supplementary reading - Robey et al. 2022 |
| September 27 | Adversarial | Model stealing & membership inference · Assigned reading - Nguyen et al. 2014 · Assigned reading - Sinha et al. 2017 · Supplementary reading - Tramer et al. 2016 · Supplementary reading - Jagielski et al. 2020 |
| September 29 | Explainability | Data visualization, feature visualization, & interpretable models · Slides · Lecture notes · Assigned reading - Javanmard et al. 2020 · Assigned reading - Shamir et al. 2021 · Assigned reading - Rudin 2019 |
| **Debugging models** | | |
| October 4 | Explainability | Local & global explanations · Project proposal due (Proposal guidelines) · Assigned reading - Wei et al. 2022 · Assigned reading - Hassani et al. 2022 · Assigned reading - Dombrowski et al. 2019 · Supplementary reading - Woods et al. 2019 |
| October 6 | Fall term break | |
| October 11 | Explainability | Example-based & model visualizations · Assigned reading - Nguyen et al. 2016 · Assigned reading - Slack et al. 2020 · Assigned reading - Ye & Durrett 2022 · Supplementary reading - Jeanneret et al. 2022 |
| October 13 | Verification | Complete & incomplete · Lecture notes · Assigned reading - Reddi et al. 2014 · Assigned reading - Elazar et al. 2020 · Assigned reading - Jacovi et al. 2021 |
| October 18 | Verification | Specifications and properties · Assigned reading - Swayamdipta et al. 2020 · Assigned reading - Ruan et al. 2022 · Assigned reading - Zhang et al. 2021 · Supplementary reading - Gowel et al. 2019 |
| October 20 | Scientific discovery | Finding correlations · Slides · Lecture notes · Assigned reading - Liu et al. 2018 · Assigned reading - Nori et al. 2019 · Assigned reading - Kleinberg et al. 2018 |
| October 25 | Scientific discovery | Influence functions & data models · Assigned reading - Singla et al. 2021 · Assigned reading - Xiao et al. 2020 · Assigned reading - Li et al. 2015 · Supplementary reading - Yang et al. 2022 |
| **ML repair** | | |
| October 27 | Robust learning | Robust training & overfitting, provable defenses (bound propagation & smoothing) · Lecture notes · Assigned reading - Chhabra et al. 2022 · Assigned reading - Guo et al. 2022 · Assigned reading - Ye & Durrett 2022 · Supplementary reading - Carlini et al. 2022 · Supplementary reading - Salman et al. 2021 |
| November 1 | Robust learning | Distributional robustness (domain generalization, Group DRO, IRM, JTT) · Assigned reading - Liang & Zou 2022 · Assigned reading - Carter et al. 2020 · Supplementary reading - Sagawa et al. 2019 |
| November 3 | Data interventions | Data balancing, source selection, pruning hard examples · Checkpoint due (Checkpoint guidelines) · Assigned reading - Recht et al. 2019 · Assigned reading - Mariani et al. 2018 · Assigned reading - Ribeiro et al. 2022 · Supplementary reading - Idrissi et al. 2021 |
| November 8 | Election day | Reading group only · Assigned reading - Muller et al. 2021 · Assigned reading - Lipton et al. 2018 · Assigned reading - Baek et al. 2022 |
| November 10 | Data interventions | Data augmentations (classical, subgroups & generative) · Slides · Lecture notes · Assigned reading - Meng et al. 2022 · Assigned reading - Schwartz et al. 2022 · Supplementary reading - Sorscher et al. 2022 |
| November 15 | Model adjustments | Model editing and fine-tuning · Assigned reading - Muller et al. 2021 · Assigned reading - Jiang et al. 2021 · Supplementary reading - Muller et al. 2021 |
| November 17 | Model adjustments | Model patching & repair · Supplementary reading - Barratt et al. 2020 · Supplementary reading - Sotoudeh et al. 2021 · Supplementary reading - Liu et al. 2022 |
| November 22 | Ethics & implications | |
| November 24 | Thanksgiving | |
| November 29 | NeurIPS | |
| December 1 | NeurIPS | |
| December 6 | Presentations | Presentation guidelines |
| December 8 | Presentations | |
| December 13 | Reading period | |
| December 15 | Final examinations | |
| December 22 | Term ends | Final report due (Final report guidelines) |
There is no official textbook for this course, but you may find the following references to be useful: