CIS 5050: Software Systems (Fall 2025)

Overview

This course provides an introduction to the fundamental concepts of distributed systems and the design principles for building large-scale computational systems.

We will study some of the key building blocks – such as synchronization primitives, group communication protocols, and replication techniques – that form the foundation of modern distributed systems, such as cloud-computing platforms or the Internet. We will also look at some real-world examples of distributed systems, such as GFS, MapReduce, Spark, and Dynamo, and we will gain some hands-on experience with building and running distributed systems.

CIS 5050 is one of the core courses in the MSE program, as well as an option for the WPE-I requirement for PhD students.

Logistics

Instructor:
Linh Thi Xuan Phan
Office hours: TBA (Levine 576)

When and where:
Mondays/Wednesdays 10:15-11:45am, Towne 100

Teaching assistants and office hours: TBD

Course policies

Course textbook:
Distributed Systems: Principles and Paradigms, 4th edition (by M. van Steen and A. Tanenbaum). You can get a digital version of this book for free; hard copies of the previous edition are available, e.g., from Amazon. Additional material will be drawn from selected research publications.

Prerequisites:
The course requires undergraduate-level knowledge of operating systems and networking, such as CIS 4480 (formerly CIS 3800) and NETS 2120 (or CIS 5530), or the equivalent. You must also be proficient in C or C++ programming.

Workload:
The course will involve three substantial programming assignments, a group project, and two midterms. Both the programming assignments and the project involve a considerable amount of programming in C/C++, and the project requires the ability to work with your classmates in teams.

Grading:
Your letter grade will be based on the individual programming assignments (35%), the group project (30%), the midterm exams (30%), and participation (5%).

Attendance and other policies:
Class attendance is mandatory and will count towards your participation score. More details on attendance and key course policies can be found here.

Resources

We will be using Ed Discussion for all course-related discussions.

Homework assignments and the project are available for download from the assignments page. You can submit your solutions online via Gradescope.

Special sessions

The goal of the special sessions is to provide you with tools and resources that might be useful for the assignments and project. See the special sessions page for more details.

Tentative schedule

Topic	Details	Reading
Introduction	Course overview · Policies	Chapter 1
Processes and threads	Basic concepts · The UNIX model · Implementation in the kernel	Chapter 3.1 (Sections 1+2)
System calls	System calls · The file API · Kernel entry/exit
Concurrency control	Synchronization primitives · Race conditions, critical sections · Deadlock and starvation
Synchronization	Semaphores · Classical synchronization problems · Monitors and condition variables	[Hoare monitors] [Mesa monitors]
Communication	Sockets · Socket programming · Handling multiple connections	Chapters 4.1+4.3
Remote Procedure Calls	Programming model · Stub code; marshalling; binding · Handling failures	Chapters 4.2+8.3
Naming	Kinds of names; name spaces · The Domain Name System; Akamai; DNSSEC	Chapter 6
Clock synchronization	Logical clocks · NTP and Berkeley algorithms · Lamport and vector clocks	Chapters 5.1+5.2
Replication	Primary/backup protocols · Quorum protocols · Sequential and causal consistency · Client-centric models	Chapter 7
First midterm exam
Group communication	Reliable multicast · IP multicast · FIFO, causal and total ordering	Chapter 8.4
Bigtable and Project	Bigtable case study · Project overview	[Bigtable]
Fault tolerance	2PC and 3PC · Logging and recovery · Chandy-Lamport algorithm	Chapters 8.5+8.6
State-machine replication	Failure models · The Consensus problem · Paxos	Chapters 8.1+8.2; [Paxos]
Non-crash Fault Tolerance	The Byzantine Generals problem · Impossibility results · Solutions	[BFT]
Distributed file systems	NFS · Coda · Disconnected operation	Chapter 2.3.3; [Coda]
Google File System	Google cluster architecture · Reading and writing in GFS · Consistency and fault tolerance	[Cluster] [GFS]
MapReduce	MapReduce programming model · System architecture	[MapReduce]
Spark	Differences to MapReduce · RDDs · Case study: PageRank	[RDD] [Spark]
DHTs and Dynamo	Distributed hash tables · The CAP dilemma · Amazon Dynamo	[Dynamo]
Special topics	Evolution of DynamoDB · Predictability, Scalability, Availability, and Consistency
Second midterm exam
Reading days
Project demos and reports

Web site contact: Linh Thi Xuan Phan