CIS 5050: Software Systems (Fall 2024)
Overview
This course provides an introduction to fundamental concepts of distributed systems, and the design principles for building large-scale computational systems.
We will study some of the key building blocks – such as synchronization primitives, group communication protocols, and replication techniques – that form the foundation of modern distributed systems, such as cloud-computing platforms or the Internet. We will also look at some real-world examples of distributed systems, such as GFS, MapReduce, Spark, and Dynamo, and we will gain some hands-on experience with building and running distributed systems.
CIS 5050 is one of the core courses in the MSE program, as well as an option for the WPE-I requirement for PhD students.
Logistics
Instructor:
Linh Thi Xuan Phan
Office hours: Mondays 12:00-1:00pm (Levine 576)
When and where: Mondays/Wednesdays 10:15-11:45am,
DRL A8.
Teaching assistants and office hours:
Note: Online office hours will be conducted via OHQ.
Course policies
Course textbook:
Distributed Systems: Principles and Paradigms, 4th edition (by M. van Steen and A. Tanenbaum).
You can get a digital version of this book for free; hardcopies will be available, e.g., from Amazon soon.
Additional material will be drawn from selected research publications.
Prerequisites:
The course requires undergraduate-level operating systems and networking knowledge, such as CIS 3800 and NETS 2120 (or the equivalence). You must also be proficient in C or C++ programming.
Workload:
The course will involve three substantial programming assignments, a group project, and two midterms. Both the programming assignments and the project involve a considerable amount of programming in C/C++, and the project requires the ability to work with your classmates in teams.
Grading:
Your letter grade will be based on the individual programming assignments (35%), the group project (30%), the
midterm exams (30%), and participation (5%).
Attendance and other policies:
Class attendance is mandatory and will count towards your participation score. More details on attendance and key course policies can be found here.
Resources
We will be using Ed Discussion for all course-related discussions.
Homework assignments and project are available for download from the assignments page. You can submit your solutions online via GradeScope.
Special sessions
The goal of the special sessions is to provide you with tools and resources that might be useful for the assignments and project. See the special sessions page for more details.
Tentative schedule
Date |
Topic |
Details |
Reading |
Remarks |
Aug 28 |
Introduction
[pdf]
|
Course overview Policies |
Chapter 1 |
HW0 released |
Sep 2 |
Labor Day (no class) |
Sep 4 |
Processes and threads
[pdf]
[video]
|
Basic concepts The UNIX model Implementation in the kernel |
Chapter 3.1 (Sections 1+2) |
HW0 due; HW1 released |
Sep 9 |
System calls
[pdf]
[video 1]
[video 2]
|
System calls The file API Kernel entry/exit |
|
|
Sep 11 |
Concurrency control
[pdf]
[video 1]
[video 2]
|
Synchronization primitives Race conditions, critical sections Deadlock and starvation |
|
|
Sep 16 |
Synchronization
[pdf]
[video]
| Semaphores Classical synchronization problems Monitors and condition variables |
[Hoare monitors] [Mesa monitors] |
|
Sep 18 |
Communication
[pdf]
[video]
|
Sockets Socket programming Handling multiple connections |
Chapters 4.1+4.3 |
HW1 due; HW2 released |
Sep 23+25 |
Remote Procedure Calls
[pdf]
[video 1]
[video 2]
|
Programming model Stub code; marshalling; binding Handling failures |
Chapters 4.2+8.3 |
HW2MS1 due (9/27) |
Sep 30 |
Naming
[pdf]
[video 1]
[video 2]
|
Kinds of names; name spaces The Domain Name System; Akamai; DNSSEC |
Chapter 6 |
|
Oct 2 |
Clock synchronization
[pdf]
[video 1]
[video 2]
[video 3]
|
Logical clocks NTP and Berkeley algorithms Lamport and vector clocks |
Chapters 5.1+5.2 |
|
Oct 3-6 |
Fall break |
Oct 7 |
Clock synchronization (cont)
Note: See slides on Oct 2.
|
Logical clocks NTP and Berkeley algorithms Lamport and vector clocks |
Chapters 5.1+5.2 |
HW2MS2+3 due (on 10/9) |
Oct 7 |
Last day to drop |
|
Oct 9+14 |
Group communication
[pdf]
[video 1]
[video 2]
|
Reliable multicast IP multicast FIFO, causal and total ordering |
Chapter 8.4 |
|
Oct 16 |
First midterm exam |
Oct 21 |
Replication
[pdf]
[video]
|
Primary/backup protocols Quorum protocols Sequential and causal consistency Client-centric models |
Chapter 7 |
HW3 released (10/17); Project released |
Oct 23 |
Bigtable and Project
[pdf]
[video]
|
Bigtable case study Project overview |
[Bigtable] |
|
Oct 28 |
Fault tolerance
[pdf]
|
2PC and 3PC Logging and recovery Chandy-Lamport algorithm |
Chapters 8.5+8.6; |
|
Oct 30 |
State-machine replication
|
Failure models The Consensus problem Paxos |
Chapters 8.1+8.2; [Paxos] |
Project proposal due (10/30); HW3 due (11/01) |
Nov 4 |
Last day to withdraw |
Nov 4+6 |
Non-crash Fault Tolerance
|
The Byzantine Generals problem Impossibility results Solutions |
[BFT] |
|
Nov 11 |
Distributed coordination
|
Distributed mutual exclusion Leader election Bully algorithm; token ring |
Chapter 5.3+5.4 |
|
Nov 13 |
Distributed file systems
|
NFS Coda Disconnected operation |
Chapter 2.3.3; [Coda] |
|
Nov 18 |
Google File System
|
Google cluster architecture Reading and writing in GFS Consistency and fault tolerance |
[Cluster] [GFS] |
|
Nov 20 |
MapReduce
|
MapReduce programming model System architecture |
[MapReduce] |
|
Nov 25 |
Spark
|
Differences to MapReduce RDDs Case study: PageRank |
[RDD] [Spark] |
|
Nov 27 |
No class (Friday schedule) |
Nov 28- Dec 1 |
Thanksgiving Break |
Dec 2+4 |
DHTs and Dynamo
|
Distributed hash tables The CAP dilemma Amazon Dynamo |
[Dynamo] |
|
Dec 9 |
Second midterm exam |
Dec 10-11 |
Reading days |
Dec 12-16 |
Project demos and reports |
|