Sepanta Zeighami
Postdoctoral Scholar
UC Berkeley
Advisors
- Aditya Parameswaran (Postdoc)
- Cyrus Shahabi (PhD)
Current Mentees
- Yourun Sun (undergrad)
- Andrew Cheng (undergrad)
- Sasha Singh (undergrad)
Past Mentees
- Quentin Romero Lauro (undergrad → CEO @ Inspector, YC 2025)
- Nikhil & Vinay Rao (high school → undergrads @ UC Berkeley)
- Kameron Shahabi (undergrad → PhD @ University of Washington)
- Jonathan Qin (undergrad → Centiva Capital)
About Me
I'm a Postdoctoral Scholar at the University of California, Berkeley, advised by Aditya Parameswaran. I received my PhD from the University of Southern California, advised by Cyrus Shahabi.
Research. My research broadly focuses on building accurate, reliable, and efficient data and AI systems. These days I'm excited about using AI agents to reliably and efficiently perform high-level data tasks on heterogeneous data (including both structured and unstructured data sources): figuring out what tools they need, ensuring databases support them efficiently, and designing methods to improve their reliability and enable verification.
Impact and Recognition. My work has been deployed in systems such as LlamaIndex and open-source projects such as DocETL, and has inspired optimizations deployed in industry products. It has also received several awards and recognitions, including an oral presentation at ICML’24 and best paper awards at CHI’26 and MDM’24. I was also selected as a distinguished reviewer for SIGMOD’26.
Service. I regularly review for conferences in my core research areas: databases (SIGMOD, VLDB, etc.) and machine learning (NeurIPS, ICLR, etc.). I also occasionally review for other related conferences and journals, including PODS, Scientific Reports, and FAccT. I'm helping organize the DASHSys workshop at VLDB’26.
Publications
My work primarily appears at data systems and machine learning venues (SIGMOD, VLDB, ICML, ICLR, etc.).
- Can AI Agents Answer Your Data Questions? A Benchmark for Data Agents. Preprint.
- Semantic Data Processing with Holistic Data Understanding. Under revision at Proceedings of the VLDB Endowment Volume, VLDB '27.
- Arming Data Agents with Tribal Knowledge. Under revision at Proceedings of the VLDB Endowment Volume, VLDB '26.
- Featurized-Decomposition Join: Low-Cost Semantic Joins with Guarantees. To appear in Proceedings of the VLDB Endowment Volume, VLDB '26.
- Multi-Objective Agentic Rewrites for Unstructured Data Processing. To appear in Proceedings of the VLDB Endowment Volume, VLDB '26.
- Bolt-on, Verifiable Provenance for LLM-Powered Data Processing. To appear in Proceedings of the VLDB Endowment Volume, VLDB '26.
- RAG Without the Lag: Enabling "What-If" Analysis for Retrieval-Augmented Generation Pipelines. Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems, CHI '26.
- Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First. Proceedings of the 16th Conference on Innovative Data Systems Research, CIDR '26.
- Task Cascades for Efficient Unstructured Data Processing. Proceedings of the 2026 International Conference on Management of Data, SIGMOD '26.
- Cut Costs, Not Accuracy: LLM-Powered Data Processing with Guarantees. Proceedings of the 2026 International Conference on Management of Data, SIGMOD '26.
- NUDGE: Lightweight Non-Parametric Fine-Tuning of Embeddings for Retrieval. Proceedings of the 13th International Conference on Learning Representations, ICLR '25.
- Querying Templatized Document Collections with Large Language Models. Proceedings of the 41st IEEE International Conference on Data Engineering, ICDE '25.
- LLM-Powered Proactive Data Systems. IEEE Data Engineering Bulletin, Issue on LLMs-meets-data, March 2025.
- Theoretical Analysis of Learned Database Operations under Distribution Shift through Distribution Learnability. Proceedings of the 41st International Conference on Machine Learning, ICML '24.
- Towards Establishing Guaranteed Error for Learned Database Operations. Proceedings of the 12th International Conference on Learning Representations, ICLR '24.
- BiasBuster: a Neural Approach for Accurate Estimation of Population Statistics using Biased Location Data. Proceedings of the 25th Conference on Mobile Data Management, MDM '24.
- On Distribution Dependent Sub-Logarithmic Query Time of Learned Indexing. Proceedings of the 40th International Conference on Machine Learning, ICML '23.
- NeuroSketch: Fast and Approximate Evaluation of Range Aggregate Queries with Neural Networks. Proceedings of the 2023 International Conference on Management of Data, SIGMOD '23.
- A Neural Approach to Spatio-Temporal Data Release with User-Level Differential Privacy. Proceedings of the 2023 International Conference on Management of Data, SIGMOD '23.
- A Neural Database for Answering Aggregate Queries on Incomplete Relational Data. Transactions on Knowledge and Data Engineering, TKDE '23.
- A Neural Database for Differentially Private Spatial Range Queries. Proceedings of the VLDB Endowment Volume 15, 2022, VLDB '22.
- Towards Accurate Spatiotemporal Covid-19 Risk Scores Using High Resolution Real-World Mobility Data. ACM Transactions on Spatial Algorithms and Systems, 2022.
- Estimating Spread of Contact-Based Contagions in a Population Through Sub-Sampling. Proceedings of the VLDB Endowment Volume 14, 2021, VLDB '21.
- Secure Dynamic Skyline Queries Using Result Materialization. 2021 IEEE 37th International Conference on Data Engineering, ICDE '21.
- Finding Average Regret Ratio Minimizing Set in Database. 2019 IEEE 35th International Conference on Data Engineering, ICDE '19.