Caleb Stanford

I am a PhD student in computer science at the University of Pennsylvania. My advisor is Rajeev Alur. My CV can be viewed online here.

I graduated from Brown University in May 2016 (ScB Mathematics – Computer Science).


My research interests include: (1) programming languages and systems for data stream processing; (2) formal verification; and (3) logical foundations of computing.

I attend the Penn PL Club.

Current research

Numerous specialized software platforms now exist for processing large quantities of data and responding in real time. Such stream processing systems are popular because they allow the programmer to specify the computation in an intuitive way (e.g., as a high-level query, as a sequence of stream transformations, or as a dataflow graph), and the system will deploy and parallelize the computation automatically. Distributed deployment is especially critical for many (but not all) applications. Popular modern stream processing systems include Apache Spark Streaming and Apache Flink.

My long-term goal is to improve the state of the art in stream processing systems through better programming abstractions, better implementation techniques, and better formal specification. Good systems should be reliable, i.e., they should have a clear underlying semantics free of unexpected behavior. They should be expressive enough to describe common programming patterns, including computations that are stateful, quantitative, and/or sensitive to the ordering of the input data. Finally, they should allow for high-level programming of the desired computation whenever the efficient low-level implementation can be inferred automatically.

Existing systems are highly optimized for performance and generally provide fault-tolerance guarantees. Orthogonal to these concerns, I’m interested in the requirements above, which sit more at the programming-language level. For instance, most systems are not reliable in the presence of nondeterministic behavior caused by out-of-order data or by unsound parallelization, and such nondeterminism is neither immediately visible to the programmer nor caught at runtime. SQL-based query languages are supported by many platforms, but they may lack one or more dimensions of expressiveness (stateful computation, quantitative computation, or order-sensitive computation). Finally, stream processing systems are usually not fully high-level, because they often require tuning system settings and levels of parallelization by hand in order to achieve the desired performance.
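To make the point about unsound parallelization concrete, here is a minimal sketch in plain Python (not the API of Spark Streaming, Flink, or any other system). The event shape, the query, and every function name are assumptions chosen for illustration. The query is order-sensitive: for each key, it reports the largest rise between consecutive values. Splitting the stream round-robin across workers silently changes the answer, while partitioning by key preserves per-key order and does not.

```python
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, int]  # hypothetical (key, value) event


def max_increase_per_key(events: List[Event]) -> Dict[str, int]:
    """Order-sensitive query: per key, the largest rise between consecutive values."""
    last: Dict[str, int] = {}
    best: Dict[str, int] = {}
    for key, value in events:
        best.setdefault(key, 0)
        if key in last:
            best[key] = max(best[key], value - last[key])
        last[key] = value
    return best


def merge(partials: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Combine per-worker results by taking the maximum for each key."""
    merged: Dict[str, int] = {}
    for partial in partials:
        for key, value in partial.items():
            merged[key] = max(merged.get(key, 0), value)
    return merged


def run_round_robin(events: List[Event], workers: int) -> Dict[str, int]:
    """Unsound: events of one key land on different workers, so
    consecutive values of that key are never compared."""
    shards = [events[i::workers] for i in range(workers)]
    return merge(max_increase_per_key(shard) for shard in shards)


def run_by_key(events: List[Event], workers: int) -> Dict[str, int]:
    """Sound for this query: every event of a key goes to the same worker,
    in its original order."""
    shards: List[List[Event]] = [[] for _ in range(workers)]
    for key, value in events:
        shards[sum(key.encode()) % workers].append((key, value))
    return merge(max_increase_per_key(shard) for shard in shards)


EVENTS: List[Event] = [
    ("a", 1), ("b", 10), ("a", 5), ("b", 12), ("a", 2), ("a", 9),
]

if __name__ == "__main__":
    print("sequential: ", max_increase_per_key(EVENTS))
    print("round robin:", run_round_robin(EVENTS, workers=2))
    print("by key:     ", run_by_key(EVENTS, workers=2))
```

On this input, the sequential run and the key-partitioned run both report 7 for key "a", while the round-robin run reports 4. Nothing fails or warns in the round-robin case, which is the kind of invisible nondeterminism described above.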


Drafts and Submissions

Other Projects


Caleb Stanford DBLP

Caleb Stanford Google Scholar

Caleb Stanford ORCID page


castan at cis upenn edu
