How much of data analytics can we automate? Data scientists spend many hours manipulating and cleaning data, writing careful and scalable data analysis programs, and debugging analyses. My group explores the intersection of AI and data science, working towards a world where intelligent systems can automatically perform many of the data analytics tasks we currently expect humans to do.

News/Updates

3/26/19 New course: CMSC 13600 Introduction To Data Engineering

3/4/19 New paper on Deep RL for materialized view selection https://arxiv.org/abs/1903.01363 (Blog post)

1/23/19 New paper on “Band-limited Convolutions” for adaptive resource control during model training. Stay tuned for a preprint! (Blog Post)

1/13/19 Going to CIDR, will present a new paper on visual data management. https://arxiv.org/abs/1812.07607

1/09/19 Updated Paper https://arxiv.org/abs/1808.03196 (Blog Post)

12/15/18 NeurIPS Invited Workshop Talk: http://mlforsystems.org/

Current Projects

We are always looking for exceptional undergraduates, graduate students, and post-docs! Email me. skr @ cs . uchicago

Deep Query Optimization. What is the role of machine learning in the design and implementation of a modern database system? This question has sparked considerable introspection in the data management community, and the epicenter of this debate is the core database problem of query optimization, where the database system finds the best physical execution path for an SQL query. Read our recent research paper on this.
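To make "finding the best physical execution path" concrete, here is a minimal sketch of classical cost-based join ordering. It is not the method from our paper: the table names, cardinalities, selectivities, and cost model are all hypothetical, and the search is a brute-force enumeration of left-deep join orders. The research question above is where machine learning fits into a search like this one.

```python
from itertools import permutations

# Hypothetical table cardinalities and join selectivities (illustrative only).
CARDINALITY = {"orders": 1000, "customers": 100, "items": 5000}
SELECTIVITY = {
    frozenset({"orders", "customers"}): 0.01,
    frozenset({"orders", "items"}): 0.001,
    frozenset({"customers", "items"}): 1.0,  # no join predicate: cross product
}

def plan_cost(order):
    """Sum of estimated intermediate result sizes for a left-deep join order."""
    joined = [order[0]]
    rows = CARDINALITY[order[0]]
    cost = 0.0
    for table in order[1:]:
        # Combine the selectivities of every predicate linking the new table
        # to the tables that have already been joined.
        sel = 1.0
        for prev in joined:
            sel *= SELECTIVITY[frozenset({prev, table})]
        rows = rows * CARDINALITY[table] * sel
        cost += rows
        joined.append(table)
    return cost

def best_plan(tables):
    """Exhaustively pick the cheapest left-deep order under the toy cost model."""
    return min(permutations(tables), key=plan_cost)

best = best_plan(list(CARDINALITY))
print(best, plan_cost(best))
```

Under these made-up statistics, orders that join customers with items first pay for a cross product, so the search prefers joining orders with customers before items. Exhaustive enumeration grows factorially with the number of tables, which is one reason the search strategy itself is a natural target for learning.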

A Formal Theory of Data Science: Datasets often undergo a sequence of rule-based transformations before they are used in any data analytics. Many data scientists craft these rules by hand after observing representative samples of data. However, as with any type of decision rule, the chosen transformations might be overly specific to the errors that manifest in the sample, or conversely, may have unforeseen side effects on unseen data. We seek to formalize this process and understand when data transformation scripts can be expected to generalize.
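As a toy illustration of the generalization problem described above (not a result from this project), consider a hand-written cleaning rule that looks perfect on the sample it was designed from and then fails on unseen records. The rule `clean_zip`, the helper `accuracy`, and all data values are hypothetical.

```python
import re

def clean_zip(value):
    """Hand-written rule (hypothetical): keep the first five digits of a ZIP code."""
    digits = re.sub(r"\D", "", value)
    return digits[:5]

def accuracy(rule, pairs):
    """Fraction of (raw, expected) pairs that the rule transforms correctly."""
    return sum(rule(raw) == expected for raw, expected in pairs) / len(pairs)

# The sample the rule was written against: every error it contains is handled.
sample = [("60637", "60637"), ("60637-1234", "60637"), (" 94720", "94720")]

# Unseen data: a dropped leading zero (the rule is too specific to fix it)
# and an address fragment (the rule has a side effect and grabs the wrong digits).
unseen = [("02139", "02139"), ("2139", "02139"), ("PO Box 12, 60615", "60615")]

print("sample accuracy:", accuracy(clean_zip, sample))
print("unseen accuracy:", accuracy(clean_zip, unseen))
```

The gap between the two accuracies is the kind of quantity a formal theory would aim to bound before the rule is deployed.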

DeepLens, A Visual Data Management System: New results in deep learning give us tools to add structure to natural data like images and video. We explore using these tools to answer semantic queries on images and video. Read our paper published at CIDR 2019.

Optimizing Machine Learning Programs. Training Machine Learning models at scale is a core application for many distributed systems. We explore adaptive optimization, auto-scaling, and model compression.

Publications

How do we build systems that continuously learn from real-world interactions? Real-world reinforcement, imitation learning, and control

Richard Shin, Roy Fox, Sanjay Krishnan, Dawn Song, Ion Stoica. Parametrized Hierarchical Procedures For Neural Programming. ICLR 2018.

Vatsal Patel*, Sanjay Krishnan, Aimee Goncalves, Carolyn Chen, Walter Doug Boyd, Ken Goldberg. Using Intermittent Synchronization to Compensate for Rhythmic Body Motion During Autonomous Surgical Cutting and Debridement. Under Review 2018.

Sanjay Krishnan*, Roy Fox*, Ion Stoica, Ken Goldberg. DDCO: Discovery of Deep Continuous Options for Robot Learning from Demonstrations. CoRL 2017. (pdf)

Roy Fox*, Sanjay Krishnan*, Ken Goldberg, Ion Stoica. Multi-Level Discovery of Deep Options. (arxiv)

Sanjay Krishnan, Eugene Wu, Michael Franklin, Ken Goldberg. BoostClean: Automated Error Detection and Repair for Machine Learning. Preprint Available. 2017.

Tejas Kannan, Sanjay Krishnan. Exploring the Sensitivity of Policy Gradients to Observation Noise. RLDM 2017.

Sanjay Krishnan, Animesh Garg, Richard Liaw, Brijen Thananjeyan, Lauren Miller, Florian T. Pokorny, Ken Goldberg. SWIRL: A Sequential Windowed Inverse Reinforcement Learning Algorithm for Robot Tasks With Delayed Rewards. Under Review IJRR (Request Copy).

Brijen Thananjeyan, Animesh Garg, Sanjay Krishnan, Carolyn Chen, Lauren Miller, Ken Goldberg. Multilateral Surgical Pattern Cutting in 2D Orthotropic Gauze with Deep Reinforcement Learning Policies for Tensioning. ICRA 2017. read

Michael Laskey, Caleb Chuck, Jonathan Lee, Jeffrey Mahler, Sanjay Krishnan, Kevin Jamieson, Anca Dragan, Ken Goldberg. Comparing Human-Centric and Robot-Centric Sampling for Robot Deep Learning from Demonstrations. ICRA 2017. read

Sanjay Krishnan, Animesh Garg, Richard Liaw, Brijen Thananjeyan, Lauren Miller, Florian T. Pokorny, Ken Goldberg. SWIRL: A Sequential Windowed Inverse Reinforcement Learning Algorithm for Robot Tasks With Delayed Rewards. WAFR 2016. arxiv

How Do We Visualize/Interpret Data?

Sanjay Krishnan, Eugene Wu. Arachnida: A Transformation-Oriented Explanation Engine. Under Review. 2018

Sanjay Krishnan, Eugene Wu. PALM: Machine Learning Explanations For Iterative Debugging. HILDA 2017. (pdf)

Mo Zhou, Alison Cliff, Sanjay Krishnan, Brandie Nonnecke, Camille Crittenden, Kanji Uchino, Ken Goldberg. M-CAFE 1.0: Motivating and Prioritizing Ongoing Student Feedback During MOOCs and Large on-Campus Courses using Collaborative Filtering. Proceedings of the 16th Annual ACM Conference on Information Technology Education, SIGITE 15, Chicago, September, 2015. (pdf)

Mo Zhou, Alison Cliff, Allen Huang, Sanjay Krishnan, Brandie Nonnecke, Kanji Uchino, Sam Joseph, Armando Fox, and Ken Goldberg. M-CAFE: Managing MOOC Student Feedback with Collaborative Filtering. In Learning@Scale 2015. (pdf)

Jay Patel, Gil Gershoni, Sanjay Krishnan, Matti Nelimarkka, Brandie Nonnecke, Ken Goldberg. A Case Study in Mobile-Optimized vs. Responsive Web Application Design. In Mobile HCI 2015. (pdf)

Sanjay Krishnan, Jay Patel, Michael J. Franklin, and Ken Goldberg. Social Influence Bias in Recommender Systems: A Methodology for Learning, Analyzing, and Mitigating Bias in Ratings. Under Review: ACM Conference on Recommender Systems (RecSys). Foster City, CA, USA. Oct 2014 (pdf)

Sanjay Krishnan, Ken Goldberg, Yuko Okubo, Kanji Uchino. Using a Social Media Platform to Explore How Social Media Can Enhance Primary and Secondary Learning. The Sixth Conference of MIT’s Learning International Networks Consortium. June 2013 (pdf)

Sanjay Krishnan, Ken Goldberg. Distributed Spectral Dimensionality Reduction for Visualizing Textual Data. ICML Workshop on Spectral Learning Methods, Atlanta, GA, June 2013. (pdf)

How Do We Clean Data?

Sanjay Krishnan, Jiannan Wang, Michael J. Franklin, Ken Goldberg, Tim Kraska, Tova Milo, Eugene Wu. SampleClean: Fast and Reliable Analytics on Dirty Data. IEEE Data Engineering Bulletin. 2015 (pdf)

Xu Chu, Ihab Ilyas, Sanjay Krishnan, Jiannan Wang. Data Cleaning: Overview and Emerging Challenges. SIGMOD 2016. (read)

Sanjay Krishnan, Jiannan Wang, Michael J. Franklin, Ken Goldberg, Tim Kraska. PrivateClean: Data Cleaning and Differential Privacy. SIGMOD 2016. (pdf)

Sanjay Krishnan, Eugene Wu, Michael Franklin, Ken Goldberg, Jiannan Wang. ActiveClean: An Interactive Data Cleaning Framework For Machine Learning. SIGMOD 2016 Demo. (pdf)

Sanjay Krishnan, Jiannan Wang, Michael J. Franklin, Ken Goldberg, Eugene Wu. ActiveClean: Interactive Data Cleaning For Statistical Modeling. VLDB 2016. (pdf)

Sanjay Krishnan, Daniel Haas, Eugene Wu, Michael Franklin. Towards Reliable Interactive Data Cleaning: A User Survey and Recommendations. HILDA 2016. (pdf)

Daniel Haas, Sanjay Krishnan, Jiannan Wang, Michael J. Franklin, Eugene Wu. Wisteria: Nurturing Scalable Data Cleaning Infrastructure. VLDB 2015 Demo. (pdf)

Jiannan Wang, Sanjay Krishnan, Michael Franklin, Ken Goldberg, Tim Kraska, Tova Milo. A Sample-and-Clean Framework for Fast and Accurate Query Processing on Dirty Data. In SIGMOD, Jun. 2014 (pdf)

How Do We Structure Data? Robotics Trajectory Segmentation/Cleaning

Sanjay Krishnan, Animesh Garg, Sachin Patil, Colin Lea, Gregory Hager, Pieter Abbeel, Ken Goldberg. Transition State Clustering: Unsupervised Surgical Task Segmentation For Robot Learning. IJRR 2018. (read)

Caleb Chuck, Michael Laskey, Sanjay Krishnan, Ruta Joshi, Ken Goldberg. Statistical Data Cleaning for Deep Learning of Automation Tasks from Demonstrations. CASE 2017.

Adithya Murali, Animesh Garg, Sanjay Krishnan, Florian T. Pokorny, Pieter Abbeel, Trevor Darrell, Ken Goldberg. TSC-DL: Unsupervised Trajectory Segmentation of Multi-Modal Surgical Demonstrations with Deep Learning. (read)

Sanjay Krishnan, Animesh Garg, Sachin Patil, Colin Lea, Gregory Hager, Pieter Abbeel, Ken Goldberg. Transition State Clustering: Unsupervised Surgical Task Segmentation For Robot Learning. International Symposium on Robotics Research (ISRR). 2015. (read)

How Do We Store and Update Data?

Sanjay Krishnan, Jiannan Wang, Michael J. Franklin, Ken Goldberg, and Tim Kraska. Stale View Cleaning: Getting Fresh Answers from Stale Materialized Views. In VLDB 2015. (pdf) (arxiv)

Liwen Sun, Sanjay Krishnan, Reynold S. Xin and Michael J. Franklin. A Partitioning Framework for Aggressive Data Skipping. VLDB 2014. (pdf)

Liwen Sun, Michael J. Franklin, Sanjay Krishnan, Reynold S. Xin. Fine-grained Partitioning for Aggressive Data Skipping. SIGMOD 2014. (pdf)

How Do We Collect Data? Sensors, Actuators, Crowds

Vatsal Patel, Sanjay Krishnan, Aimee Goncalves. SPRK: A Low-Cost Stewart Platform For Motion Study In Surgical Robotics. Under Review ISMR 2018.

Daniel Seita, Sanjay Krishnan, Roy Fox, Stephen McKinley, John F. Canny, Ken Goldberg. Fast and Reliable Autonomous Surgical Debridement with Cable-Driven Robots Using a Two-Phase Calibration Procedure. Under Review ICRA 2018.

Yeounoh Chung, Sanjay Krishnan, Tim Kraska. A Data Quality Metric (DQM): How to Estimate the Number of Undetected Errors in Data Sets. VLDB 2017. (pdf)

Brandie Nonnecke, Sanjay Krishnan, Jay Patel, Mo Zhou, Laura Byaruhanga, Dorothy Masinde, Maria Elena Meneses, Alejandro Martin del Campo, Camille Crittenden, Ken Goldberg. DevCAFE 1.0: A Participatory Platform for Assessing Development Initiatives in the Field. IEEE Global Humanitarian Technology Conference (GHTC). 2015 (Best Paper) (pdf)

Jeffrey Mahler, Sanjay Krishnan, Michael Laskey, Siddarth Sen, Adithyavairavan Murali, Ben Kehoe, Sachin Patil, Jiannan Wang, Mike Franklin, Pieter Abbeel, Ken Goldberg. Learning Accurate Kinematic Control of Cable-Driven Surgical Robots Using Data Cleaning and Gaussian Process Regression. CASE 2014. (read)