CIS 5050: Software Systems (Spring 2025)
Overview

This course introduces the fundamental concepts of distributed systems and the design principles for building large-scale computational systems.

We will study some of the key building blocks – such as synchronization primitives, group communication protocols, and replication techniques – that form the foundation of modern distributed systems, including cloud-computing platforms and the Internet. We will also look at real-world examples of distributed systems, such as GFS, MapReduce, Spark, and Dynamo, and gain hands-on experience with building and running distributed systems.

CIS 5050 is one of the core courses in the MSE program, as well as an option for the WPE-I requirement for PhD students.

Logistics

Instructor:
Linh Thi Xuan Phan
Office hours: Thursdays 12:00-1:00pm (Levine 576)

When and where:
Tuesdays/Thursdays 10:15-11:45am, LRSM AUD

Teaching assistants and office hours:

Xinran Liu (Lead TA) OH: Mondays + Wednesdays 10:00am-12:00pm (Levine 612)
Erik Wei OH: Mondays + Wednesdays 12:00-1:00pm (Levine 612) + Fridays 3:30-4:30pm (Africk Lab AGH 104)
Lang Qin OH: Mondays 1:30-3:30pm + Wednesdays 1:30-3:30pm (Online)
Abby Eisenklam OH: Mondays 4:30-6:30pm
Benjamin Le OH: Mondays 8:00-9:00pm + Fridays 9:00-10:00am (Online)
Michael Yao OH: Tuesdays 9:00-10:00am + Thursdays 2:00-3:00pm (Levine 612)
Sriram Josyula OH: Tuesdays 12:00-1:30pm + Thursdays 12:00-1:30pm (Levine 5th floor bump space)
Rohan Moniz OH: Tuesdays 12:30-1:30pm
Emma Jin OH: Tuesdays 2:00-3:30pm + Thursdays 8:30am-10:00am
Joseph Cho OH: Tuesdays 4:00-5:30pm (Levine 612) + Sundays 6:00-8:00pm (Online)
Harshwardhan Yadav OH: Tuesdays 7:00-9:00pm (Levine 601 bump space) + Wednesdays 2:00-4:00pm (Levine 612)
Kunli Zhang OH: Wednesdays 12:30-2:00pm
Sahil Parekh OH: Wednesdays + Fridays 2:00-4:00pm (Levine 601 bump space)
Charis Gao OH: Wednesdays 3:30-5:30pm (Levine 501 bump space)
Ashwin Alaparthi OH: Wednesdays 7:00-9:00pm (Levine 601 bump space)
Amay Tripathi OH: Thursdays 3:30-5:00pm (Levine 501 bump space)
Samarth Chandrawat OH: Thursdays 5:00-7:00pm (Online)
Tianrui Xia OH: Fridays 9:00-11:00am
Joseph Zhang OH: Saturdays 9:00am-10:00am (Online)
Austin Yao OH: Sundays 11:00am-1:00pm (Online)

Note: Online office hours will be conducted via OHQ. Office hours without a listed location will also be conducted via OHQ until a location is assigned.

Course policies

Course textbook:
Distributed Systems: Principles and Paradigms, 4th edition (by M. van Steen and A. Tanenbaum). You can get a digital version of this book for free; hard copies will soon be available, e.g., from Amazon. Additional material will be drawn from selected research publications.

Prerequisites:
The course requires undergraduate-level knowledge of operating systems and networking, such as CIS 4480 (formerly CIS 3800) and NETS 2120 (or the equivalent). You must also be proficient in C or C++ programming.

Workload:
The course will involve three substantial programming assignments, a group project, and two midterms. Both the programming assignments and the project involve a considerable amount of programming in C/C++, and the project requires the ability to work with your classmates in teams.

Grading:
Your letter grade will be based on the individual programming assignments (35%), the group project (30%), the midterm exams (30%), and participation (5%).

Attendance and other policies:
Class attendance is mandatory and will count towards your participation score. More details on attendance and key course policies can be found here.


Resources

We will be using Ed Discussion for all course-related discussions.

Homework assignments and the project are available for download from the assignments page. You can submit your solutions online via GradeScope.

Special sessions

The goal of the special sessions is to provide you with tools and resources that might be useful for the assignments and project. See the special sessions page for more details.

Tentative schedule

| Date | Topic | Details | Reading | Remarks |
| Jan 16 | Introduction | Course overview; Policies | Chapter 1 | HW0 released |
| Jan 21 | Processes and threads | Basic concepts; The UNIX model; Implementation in the kernel | Chapter 3.1 (Sections 1+2) | HW1 released |
| Jan 23 | System calls | System calls; The file API; Kernel entry/exit | | HW0 due |
| Jan 28 | Concurrency control | Synchronization primitives; Race conditions, critical sections; Deadlock and starvation | | |
| Jan 30 | Synchronization | Semaphores; Classical synchronization problems; Monitors and condition variables | [Hoare monitors] [Mesa monitors] | |
| Feb 4 | Communication | Sockets; Socket programming; Handling multiple connections | Chapters 4.1+4.3 | HW1 due; HW2 released |
| Feb 6+11 | Remote Procedure Calls | Programming model; Stub code; marshalling; binding; Handling failures | Chapters 4.2+8.3 | HW2MS1 due (2/11) |
| Feb 13 | Naming | Kinds of names; name spaces; The Domain Name System; Akamai; DNSSEC | Chapter 6 | |
| Feb 18+20 | Clock synchronization | Logical clocks; NTP and Berkeley algorithms; Lamport and vector clocks | Chapters 5.1+5.2 | |
| Feb 24 | Last day to drop | | | HW2MS2+3 due |
| Feb 25+27 | Group communication | Reliable multicast; IP multicast; FIFO, causal and total ordering | Chapter 8.4 | HW3 released |
| Mar 4 | Replication | Primary/backup protocols; Quorum protocols; Sequential and causal consistency; Client-centric models | Chapter 7 | Project released |
| Mar 6 | First midterm exam | | | |
| Mar 8-16 | Spring break | | | |
| Mar 18 | Bigtable and Project | Bigtable case study; Project overview | [Bigtable] | |
| Mar 20 | Fault tolerance | 2PC and 3PC; Logging and recovery; Chandy-Lamport algorithm | Chapters 8.5+8.6 | HW3 due |
| Mar 25+27 | State-machine replication | Failure models; The Consensus problem; Paxos | Chapters 8.1+8.2; [Paxos] | |
| Mar 31 | Last day to withdraw | | | |
| Apr 1+3 | Non-crash Fault Tolerance | The Byzantine Generals problem; Impossibility results; Solutions | [BFT] | |
| Apr 8 | Distributed file systems | NFS; Coda; Disconnected operation | Chapter 2.3.3; [Coda] | |
| Apr 10 | Google File System | Google cluster architecture; Reading and writing in GFS; Consistency and fault tolerance | [Cluster] [GFS] | |
| Apr 15 | MapReduce | MapReduce programming model; System architecture | [MapReduce] | |
| Apr 17 | Spark | Differences to MapReduce; RDDs; Case study: PageRank | [RDD] [Spark] | |
| Apr 22+24 | DHTs and Dynamo | Distributed hash tables; The CAP dilemma; Amazon Dynamo | [Dynamo] | |
| Apr 29 | Second midterm exam | | | |
| May 1-2 | Reading days | | | |
| May 5-9 | Project demos and reports (exact time is TBD) | | | |
Web site contact: Linh Thi Xuan Phan