About Me
I'm a Postdoctoral Scholar at the University of California, Berkeley, advised by Prof. Aditya Parameswaran. Before that, I was a PhD student at USC's Infolab from August 2019 to February 2024, advised by Prof. Cyrus Shahabi. My research broadly focuses on data and AI systems: building accurate, reliable, and efficient data-centric systems, with a penchant for theoretically understanding the use of machine learning in such systems.
Publications
My work primarily appears at data systems and machine learning venues (SIGMOD, VLDB, ICML, ICLR, etc.).
- Semantic Data Processing with Holistic Data Understanding. Preprint.
- Can AI Agents Answer Your Data Questions? A Benchmark for Data Agents. Preprint.
- Arming Data Agents with Tribal Knowledge. Preprint.
- Featurized-Decomposition Join: Low-Cost Semantic Joins with Guarantees. Preprint.
- Multi-Objective Agentic Rewrites for Unstructured Data Processing. Preprint.
- RAG Without the Lag: Enabling "What-If" Analysis for Retrieval-Augmented Generation Pipelines. Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems, CHI '26.
- Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First. Proceedings of the 16th Conference on Innovative Data Systems Research, CIDR '26.
- Task Cascades for Efficient Unstructured Data Processing. Proceedings of the 2026 International Conference on Management of Data, SIGMOD '26.
- Cut Costs, Not Accuracy: LLM-Powered Data Processing with Guarantees. Proceedings of the 2026 International Conference on Management of Data, SIGMOD '26.
- NUDGE: Lightweight Non-Parametric Fine-Tuning of Embeddings for Retrieval. Proceedings of the 13th International Conference on Learning Representations, ICLR '25.
- Querying Templatized Document Collections with Large Language Models. Proceedings of the 41st IEEE International Conference on Data Engineering, ICDE '25.
- LLM-Powered Proactive Data Systems. IEEE Data Engineering Bulletin, Issue on LLMs-meets-data, March 2025.
- Theoretical Analysis of Learned Database Operations under Distribution Shift through Distribution Learnability. Proceedings of the 41st International Conference on Machine Learning, ICML '24.
- Towards Establishing Guaranteed Error for Learned Database Operations. Proceedings of the 12th International Conference on Learning Representations, ICLR '24.
- BiasBuster: a Neural Approach for Accurate Estimation of Population Statistics using Biased Location Data. Proceedings of the 25th Conference on Mobile Data Management, MDM '24.
- On Distribution Dependent Sub-Logarithmic Query Time of Learned Indexing. Proceedings of the 40th International Conference on Machine Learning, ICML '23.
- NeuroSketch: Fast and Approximate Evaluation of Range Aggregate Queries with Neural Networks. Proceedings of the 2023 International Conference on Management of Data, SIGMOD '23.
- A Neural Approach to Spatio-Temporal Data Release with User-Level Differential Privacy. Proceedings of the 2023 International Conference on Management of Data, SIGMOD '23.
- Supporting Pandemic Preparedness with Privacy Enhancing Technology. Proceedings of the 5th IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications, TPS-ISA '23.
- A Neural Database for Answering Aggregate Queries on Incomplete Relational Data. IEEE Transactions on Knowledge and Data Engineering, TKDE '23.
- A Neural Database for Differentially Private Spatial Range Queries. Proceedings of the VLDB Endowment, Volume 15, 2022.
- Towards Accurate Spatiotemporal Covid-19 Risk Scores Using High Resolution Real-World Mobility Data. ACM Transactions on Spatial Algorithms and Systems, 2022.
- Estimating Spread of Contact-Based Contagions in a Population Through Sub-Sampling. Proceedings of the VLDB Endowment, Volume 14, 2021.
- Secure Dynamic Skyline Queries Using Result Materialization. 2021 IEEE 37th International Conference on Data Engineering, ICDE '21.
- Finding Average Regret Ratio Minimizing Set in Database. 2019 IEEE 35th International Conference on Data Engineering, ICDE '19.