My research group studies corrupted, missing, or otherwise uncertain data in database and information retrieval systems. We are currently working on systems that give certifiable accuracy guarantees over partially complete databases, methods for evaluating query accuracy in corrupted databases, and automatic detection of data leakage.


5/01/2020 Two SIGMOD papers on Multi-query Execution and Approximate Query Processing

12/04/2020 New CIDR Paper on Edge Analytics pdf

8/04/2020 Redesigned online data science course

7/17/2020 Adam Dziedzic Graduates!

Recent Publications

(Approximate Query Processing) Xi Liang, Stavros Sintos, Zechao Shang, Sanjay Krishnan. Combining Sampling and Aggregation (Nearly) Optimally. SIGMOD 2021 pdf

(Resource-Efficient Analytics) Dixin Tang, Zechao Shang, Aaron J. Elmore, and Sanjay Krishnan. Resource-efficient Shared Query Execution via Exploiting Time Slackness. SIGMOD 2021

(Edge Analytics) John Paparrizos, Chunwei Liu, Bruno Barbarioli, James Hwang, Ikrudya Edian, Aaron Elmore, Mike Franklin, and Sanjay Krishnan. VergeDB: A Database for IoT Analytics on Edge Devices. CIDR 2021 pdf

(Approximate Query Processing) Xi Liang, Zechao Shang, Aaron J. Elmore, Sanjay Krishnan, Mike Franklin. Fast and Reliable Missing Data Contingency Analysis with Predicate-Constraints. SIGMOD 2020

(Resource-Efficient Analytics) Dixin Tang, Zechao Shang, Aaron J. Elmore, Sanjay Krishnan, Mike Franklin. Thrifty Query Execution via Incrementability. SIGMOD 2020. pdf

Past Selected Publications

Learning-based Systems

Zongheng Yang, Eric Liang, Amog Kamsetty, Chenggang Wu, Yan Duan, Xi Chen, Pieter Abbeel, Joseph M. Hellerstein, Sanjay Krishnan, and Ion Stoica. Deep Unsupervised Cardinality Estimation. VLDB 2020. pdf

Vanlin Sathya, Adam Dziedzic, Monisha Ghosh, and Sanjay Krishnan. Machine Learning based detection of multiple Wi-Fi BSSs for LTE-U CSAT. ICNC 2020 pdf

Sanjay Krishnan, Zongheng Yang, Ken Goldberg, Joe Hellerstein, and Ion Stoica. Learning to optimize join queries with deep reinforcement learning. 2018. pdf

Richard Shin, Roy Fox, Sanjay Krishnan, Dawn Song, Ion Stoica. Parametrized Hierarchical Procedures For Neural Programming. ICLR 2018. pdf

Robust Analysis on Dirty Data

Xu Chu, Ihab Ilyas, Sanjay Krishnan, Jiannan Wang. Data Cleaning: Overview and Emerging Challenges. SIGMOD 2016. (read)

Sanjay Krishnan, Jiannan Wang, Michael J. Franklin, Ken Goldberg, Tim Kraska, Tova Milo, Eugene Wu. SampleClean: Fast and Reliable Analytics on Dirty Data. IEEE Data Engineering Bulletin 2015 (pdf)

Sanjay Krishnan, Jay Patel, Michael J. Franklin, and Ken Goldberg. Social Influence Bias in Recommender Systems: A Methodology for Learning, Analyzing, and Mitigating Bias in Ratings. Under Review: ACM Conference on Recommender Systems (RecSys). Foster City, CA, USA. Oct 2014 (pdf)

Resource-Efficient Databases

(Resource-Efficient Analytics, Query Optimization) Zechao Shang, Xi Liang, Dixin Tang, Cong Ding, Aaron J. Elmore, Sanjay Krishnan, Mike Franklin. CrocodileDB: Efficient Database Execution through Intelligent Deferment. CIDR 2020. pdf

Sanjay Krishnan, Jiannan Wang, Michael J. Franklin, Ken Goldberg, and Tim Kraska. Stale View Cleaning: Getting Fresh Answers from Stale Materialized Views. In VLDB 2015. (pdf) (arxiv)

Liwen Sun, Michael J. Franklin, Sanjay Krishnan, Reynold S. Xin. Fine-grained partitioning for aggressive data skipping. SIGMOD 2014. (pdf)