Publications by Todd Millstein

Adding data provenance support to Apache Spark.

Matteo Interlandi, Ari Ekmekji, Kshitij Shah, Muhammad Ali Gulzar, Sai Deep Tetali, Todd Millstein

VLDB J

October 2018

Debugging data processing logic in data-intensive scalable computing (DISC) systems is a difficult and time-consuming effort. Today's DISC systems offer very little tooling for debugging programs, and as a result, programmers spend countless hours collecting evidence (e.g.

Optimizing Interactive Development of Data-Intensive Applications.

Matteo Interlandi, Sai Deep Tetali, Muhammad Ali Gulzar, Joseph Noor, Tyson Condie, Todd Millstein

Proc ACM Symp Cloud Comput

October 2016

Modern Data-Intensive Scalable Computing (DISC) systems are designed to process data through batch jobs that execute programs (e.g., queries) compiled from a high-level language.

BigDebug: Debugging Primitives for Interactive Big Data Processing in Spark.

Muhammad Ali Gulzar, Matteo Interlandi, Seunghyun Yoo, Sai Deep Tetali, Tyson Condie, Todd Millstein

Proc Int Conf Softw Eng

May 2016

Developers use cloud computing platforms to process large quantities of data in parallel when developing big data analytics. Debugging the massively parallel computations that run in today's data centers is time-consuming and error-prone. To address this challenge, we design a set of interactive, real-time debugging primitives for big data processing in Apache Spark, the next-generation data-intensive scalable cloud computing platform.

Titian: Data Provenance Support in Spark.

Matteo Interlandi, Kshitij Shah, Sai Deep Tetali, Muhammad Ali Gulzar, Seunghyun Yoo, Todd Millstein

Proceedings VLDB Endowment

November 2015

Debugging data processing logic in Data-Intensive Scalable Computing (DISC) systems is a difficult and time-consuming effort. Today's DISC systems offer very little tooling for debugging programs, and as a result, programmers spend countless hours collecting evidence (e.g., from log files) and performing trial-and-error debugging. To aid this effort, we built Titian, a library that enables data provenance (tracking data through transformations) in Apache Spark.
