Caleb Stanford

I am a PhD student in computer science at the University of Pennsylvania. My advisor is Rajeev Alur. Here is my CV.

I graduated from Brown University in May 2016 (ScB Mathematics – Computer Science).

Research

My research interests include: (1) programming languages and systems for data stream processing; (2) formal verification; and (3) logical foundations of computing.

I attend the Penn PL Club:

.

Current research

Numerous specialized software platforms now exist for processing large quantities of data and responding in real time. Such stream processing systems are popular because they allow the programmer to specify the computation in an intuitive way (e.g., as a high-level query, as a sequence of stream transformations, or as a dataflow graph), and the system will deploy and parallelize the computation automatically. Distributed deployment is especially critical for many (but not all) applications. Popular modern stream processing systems include Apache Spark Streaming and Apache Flink.

My long-term goal is to improve the state-of-the-art in stream processing systems through better programming abstractions, better implementation techniques, and better formal specification. Good systems should be reliable, i.e. they should have a clear underlying semantics free of unexpected behavior. They should be expressive enough to describe common programming patterns, including computations that are stateful, quantitative, and/or sensitive to the ordering of the input data. Finally, they should allow for high-level programming of the desired computation whenever the efficient low-level implementation can be inferred automatically.

Existing systems are highly optimized for performance and generally provide fault-tolerance guarantees. Orthogonal to this, I’m interested in the above requirements (which are more at the programming language level). For instance, most systems are not reliable with respect to nondeterministic behavior due to out-of-order data or to unsound parallelization — and such nondeterminism is neither immediately visible to the programmer, nor caught at runtime. SQL-based query languages are supported by many platforms, but they may lack one or more dimensions of expressiveness (stateful computation, quantitative computation, or order-sensitive computation). Finally, stream processing systems are usually not fully high-level because they often require tuning system settings and levels of parallelization by hand in order to achieve the desired performance.

Other Projects

• Formal verification of properties of knowledge (spring 2016)

My undergraduate capstone project at Brown was on verifying the solutions to several epistemic logic puzzles in the Alloy programming language (alternate link). The abstract is here. You can also download the Alloy files and run them yourself; they are here.

I attended the Complexity Across Disciplines Undergraduate Math REU in summer 2015. The work was in graph theory and combinatorics related to computational biology. We studied a particular operation, called a ‘context-directed reversal’, on signed permutations — permutations of $[1,\ldots,n]$ where each element is additionally given a sign.

The main result classifies exactly which signed permutations are sortable by context-directed reversals. We are unable to provide a formula for the number of such signed permutations, but we relate signed permutations to a subclass of graphs, and provide a formula for the number of graphs sortable by an analogous graph operation. We prove that asymptotically it is 1/3 of all graphs.

Here are some slides on our work and a final poster which we presented at the Joint Math Meetings (JMM) in January 2016, winning an Outstanding Presentation Award.

Exposition

• Online, no-regret machine learning on large sets of experts (spring 2018)

Classical algorithms for online, no-regret learning from expert advice (e.g. randomized weighted majority) work efficiently only for a small number of experts. Specifically, the streaming algorithm in this setting takes O(n) time to process each data item, where n is the number of experts. For a project in CIS 625 (Computational Learning Theory), we discussed some of the literature that attempts to extend the ideas of no-regret learning to efficient algorithms for large or even infinite classes of experts. To see our view on two very different approaches, see the project report.

• SPEED computational complexity estimation (fall 2016)

For a project in CIS 673 (Computer-Aided Verification), I gave a presentation on the paper SPEED: precise and efficient static estimation of program computational complexity. You can read my take on the paper in my project report, or take a look at the slides. See also the SPEED webpage.

• Model theory notes (spring 2015)

With a group of 8 other students, I ran a group independent study project (GISP) on model theory, in the spring semester of 2015. The website for this class is still accessible here.

Here are some of the notes I wrote for the class:

(The notes are not entirely free of errors.)

Contact

castan at cis upenn edu

Last updated .