instructor: Joe Devietti
when: Monday/Wednesday 12-1:30pm
where: Towne 305
contact: email, Canvas
office hours:
Graphics Processing Units (GPUs) have become extremely popular and are used to accelerate an increasingly diverse set of non-graphics workloads. This seminar will examine modern GPU architectures, the programming models used to write general-purpose code for GPUs, and the complexities of programming such highly parallel architectures. There will be a special emphasis on concurrency correctness issues as they relate to GPUs, including GPU memory consistency models and GPU concurrency bugs. Graduate-level coursework in computer architecture (e.g., CIS 5710) will be very helpful.
No textbooks are required; links to all readings will be provided at this website.
There will be no exams.
Submit homework via Canvas.
The class project can be done in groups of up to 3. The project is open-ended: it should be something related to GPUs but the specifics are up to you. Choosing a project that incorporates your interests (research or otherwise) is a great idea! Here are some project ideas:
This schedule is subject to change.
Date | Topic | Presenter |
---|---|---|
Wed 30 Aug | Intro | Joe |
Mon 4 Sep | no class - Labor Day | |
Wed 6 Sep | General-Purpose Graphics Processor Architectures (accessible via Penn VPN), Chapters 1 & 2 | Joe |
Mon 11 Sep | ” Sections 3.1 - 3.3 | Joe |
Wed 13 Sep | ” Sections 3.4 - 3.6 | Joe |
Mon 18 Sep | ” Chapter 4 | Joe |
Wed 20 Sep | Real-world GPU design | Joe |
Mon 25 Sep | no class - Yom Kippur | |
Wed 27 Sep | CUDA Programming Guide | Joe |
Mon 2 Oct | GEMM and HW1 | Joe |
Wed 4 Oct | CUDA topics, Roofline Model | Joe |
Mon 9 Oct | A Primer on Memory Consistency and Cache Coherence, Chapter 3 (SC) | Joe |
Wed 11 Oct | MCM Primer (Chapter 4, TSO) | Joe |
Mon 16 Oct | MCM Primer (Chapter 5, XC) | Joe |
Wed 18 Oct | Dynamic Warp Formation | Katelyn & Nathan |
Mon 23 Oct | The Dual-Path Execution Model for Efficient GPU Control Flow | Chengjun & Shuhan |
Wed 25 Oct | Dynamic Warp Subdivision for Integrated Branch and Memory Divergence Tolerance | Dvisha & Prateek & Siddhant |
Mon 30 Oct | Heterogeneous-Race-Free Memory Models | Paul & Zihao |
Wed 1 Nov | GPU concurrency: Weak Behaviours and Programming Assumptions (slides) | Dvisha & Katelyn & Ryan |
Mon 6 Nov | A Formal Analysis of the NVIDIA PTX Memory Consistency Model | Chengjun & Harish & Shuhan |
Wed 8 Nov | Cache Coherence for GPU Architectures | Zhiyao & Zhilei |
Mon 13 Nov | Cache-Conscious Wavefront Scheduling | Harish & Nathan & Ryan |
Wed 15 Nov | SIMR: Single Instruction Multiple Request Processing for Energy-Efficient Data Center Microservices | Linus & Xitong & Yinda |
Mon 20 Nov | Understanding The Security of Discrete GPUs | Linus & Zhiyao |
Wed 22 Nov | no class - Thanksgiving | |
Mon 27 Nov | GPUfs: integrating a file system with GPUs | Prateek & Siddhant & Zihao |
Wed 29 Nov | GPUnet: Networking Abstractions for GPU Programs | Xitong & Yinda |
Mon 4 Dec | gpucc: An Open-Source GPGPU Compiler | Paul & Zhilei |
Wed 6 Dec | Project Presentations | |
Mon 11 Dec | Project Presentations | |