instructor: Joe Devietti
when: Monday/Wednesday 12-1:30pm
where: Towne 305
contact: email, canvas
office hours:
Graphics Processing Units (GPUs) have become extremely popular and are used to accelerate an increasingly diverse set of non-graphics workloads. This seminar will examine modern GPU architectures, the programming models used to write general-purpose code for GPUs, and the complexities of programming such highly parallel architectures. There will be a special emphasis on concurrency correctness issues as they relate to GPUs, including GPU memory consistency models and GPU concurrency bugs. Graduate-level coursework in computer architecture (e.g., CIS 5710) will be very helpful.
No textbooks are required; links to all readings will be provided at this website.
There will be no exams.
Submit homework via Canvas.
The class project can be done in groups of up to 3. The project is open-ended: it should be something related to GPUs but the specifics are up to you. Choosing a project that incorporates your interests (research or otherwise) is a great idea! Here are some project ideas:
This schedule is subject to change.
Date | Topic | Presenter |
---|---|---|
Wed 28 Aug | Intro | Joe |
Mon 2 Sep | no class - Labor Day | |
Wed 4 Sep | no class - Joe traveling | |
Mon 9 Sep | General-Purpose Graphics Processor Architectures (accessible via Penn VPN, also on “Files” section of Canvas), Chapters 1 & 2 | Joe |
Wed 11 Sep | ” Sections 3.1 - 3.3 | Joe |
Mon 16 Sep | ” Sections 3.4 - 3.6 | Joe |
Wed 18 Sep | ” Chapter 4 | Joe |
Mon 23 Sep | Contemporary GPUs | Joe |
Wed 25 Sep | CUDA Programming Guide | Joe |
Mon 30 Sep | GEMM and HW1 | Joe |
Wed 2 Oct | CUDA topics, Roofline Model | Joe |
Mon 7 Oct | A Primer on Memory Consistency and Cache Coherence, Chapter 3 (SC) | Joe |
Wed 9 Oct | MCM Primer (Chapter 4, TSO) | Joe |
Mon 14 Oct | MCM Primer (Chapter 5, XC) | Joe |
Wed 16 Oct | GPU concurrency: Weak Behaviours and Programming Assumptions slides | Akash & Arnav |
Mon 21 Oct | A Formal Analysis of the NVIDIA PTX Memory Consistency Model | Joe |
Wed 23 Oct | Cache Coherence for GPU Architectures | Pratyush & Arnav |
Mon 28 Oct | Dynamic Warp Formation | Crystal & Akash |
Wed 30 Oct | The Dual-Path Execution Model for Efficient GPU Control Flow | John & Crystal |
Mon 4 Nov | Cache-Conscious Wavefront Scheduling | Tarunyaa & Rui |
Wed 6 Nov | SIMR: Single Instruction Multiple Request Processing for Energy-Efficient Data Center Microservices | Ian & Paul & Runlong |
Mon 11 Nov | Understanding The Security of Discrete GPUs | Tarunyaa & Runlong & Sal |
Wed 13 Nov | GPU Memory Exploitation for Fun and Profit | Pratyush & Ian |
Mon 18 Nov | GPUfs: integrating a file system with GPUs | Rui & Andy |
Wed 20 Nov | GPUnet: Networking Abstractions for GPU Programs | Kidus & Paul & Robbie |
Mon 25 Nov | gpucc: An Open-Source GPGPU Compiler | Vikram & John |
Wed 27 Nov | no class - Thanksgiving | |
Mon 2 Dec | Accel-Sim: An Extensible Simulation Framework for Validated GPU Modeling | Vikram & Robbie & Sal |
Wed 4 Dec | In-Datacenter Performance Analysis of a Tensor Processing Unit | Andy & Kidus |
Mon 9 Dec | Project Presentations | |