Kexin Rong

I am an Assistant Professor in the School of Computer Science at Georgia Tech. My lab, part of the Georgia Tech database group, studies systems and algorithms that improve the computational and human efficiency of large-scale data analytics. I also spend time at VMware Research Group as an affiliated researcher.

I am broadly interested in building systems and tools that help democratize data science, i.e., making it easier for non-experts to make sense of and leverage increasingly large volumes of data by making the process more efficient and more accessible.

Previously, I received my Ph.D. in CS from Stanford (advised by Peter Bailis and Philip Levis) and my B.S. in CS from Caltech.

I am actively looking for master's and PhD students. If you are a GT student interested in working with me, please check out this page.

Email / Google Scholar / Bio / CV / Lab Website

Updates

[June 2024] Received a SIGMOD 2024 Distinguished PC Award.
[Apr 2024] Received an Amazon Research Award for optimizing layout designs in data analytics systems.
[Apr 2024] Received an NSF award to reimagine video retrieval with hand-drawn sketches.
[Oct 2023] Received an NSF award to build a person-focused open knowledge graph.
[Oct 2023] Congrats to Hantian Zhang for winning the Chih Foundation Graduate Student Research Publication Awards!
[Aug 2023] Congrats to Peng Li for winning the best research paper award at VLDB'23!
[Aug 2023] Recognized as a distinguished reviewer for PVLDB Vol16.
[June 2023] The EECS Rising Stars 2023 Workshop will be hosted at Georgia Tech. Apply by July 10!
[Oct 2022] Thanks Bosch Research for supporting our work!
[Aug 2022] I am honored to have received the Catherine M. and James E. Allchin Early Career Professorship in the College of Computing.
[Jun 2022] Following Datadog, TimescaleDB has also published a blog post about using our work, ASAP, to smooth their time series visualizations.
[Jun 2022] I am honored to have received an Honorable Mention for the 2022 SIGMOD Jim Gray Doctoral Dissertation Award.

PhD Students

Rajveer Bachkaniwala (w/ Ada Gavrilovska)
Jie Jeff Xu
Hongbin Zhong

Alumni

Hantian Zhang (PhD 2024, Google; w/ Xu Chu)
Renzhi Wu (PhD 2024, ByteDance; w/ Xu Chu)
Peng Li (PhD 2023, ByteDance; w/ Xu Chu)

Publications and Preprints

SketchQL: Video Moment Querying with a Visual Query Interface
Renzhi Wu*, Pramod Chunduri*, Ali Payani, Xu Chu, Joy Arulraj, Kexin Rong
SIGMOD 2025.
Inshrinkerator: Compressing Deep Learning Training Checkpoints via Dynamic Quantization
Amey Agrawal, Sameer Reddy, Satwik Bhattamishra, Venkata Prabhakara Sarath Nookala, Vidushi Vashishth, Kexin Rong, Alexey Tumanov
SoCC 2024.
Lotus: Characterization of Machine Learning Preprocessing Pipelines via Framework and Hardware Profiling
Rajveer Bachkaniwala, Harshith Lanka, Kexin Rong, Ada Gavrilovska
IISWC 2024. (Best Paper Finalist)
[code]
SketchQL Demonstration: Zero-shot Video Moment Querying with Sketches
Renzhi Wu, Pramod Chunduri, Dristi Shah, Ashmitha Julius Aravind, Ali Payani, Xu Chu, Joy Arulraj, Kexin Rong
VLDB 2024 Demo.
Demonstration of VCR: A Tabular Data Slicing Approach to Understanding Object Detection Model Performance
Jie Jeff Xu, Saahir Dhanani, Jorge Piazentin Ono, Wenbin He, Liu Ren, Kexin Rong
VLDB 2024 Demo.
FALCON: Fair Active Learning using Multi-armed Bandits
Ki Hyun Tae, Hantian Zhang, Jaeyoung Park, Kexin Rong, Steven Euijong Whang
VLDB 2024.
[code]
Dynamic Data Layout Optimization with Worst-case Guarantees
Kexin Rong, Paul Liu, Sarah Ashok Sonje, Moses Charikar
ICDE 2024.
[slides] [code]
Scaling a Declarative Cluster Manager Architecture with Query Optimization Techniques
Kexin Rong, Mihai Budiu, Athinagoras Skiadopoulos, Lalith Suresh, Amy Tai
VLDB 2023.
[slides] [code]
DiffPrep: Differentiable Data Preprocessing Pipeline Search for Learning over Tabular Data
Peng Li, Zhiyi Chen, Xu Chu, Kexin Rong
SIGMOD 2023.
[slides] [code]
Interactive Demonstration of EVA
Gaurav Tarlok Kakkar, Aryan Rajoria, Myna Prasanna Kalluraya, Ashmita Raju, Jiashen Cao, Kexin Rong, Joy Arulraj
VLDB 2023 Demo.
[code]
Improving Computational and Human Efficiency in Large-Scale Data Analytics
Kexin Rong
PhD Thesis 2021. (SIGMOD Doctoral Dissertation Award Honorable Mention)
Approximate Partition Selection for Big-Data Workloads using Summary Statistics
Kexin Rong, Yao Lu, Peter Bailis, Srikanth Kandula, Philip Levis
VLDB 2020.
[talk]
Rehashing Kernel Evaluation in High Dimensions
Paris Siminelakis*, Kexin Rong*, Peter Bailis, Moses Charikar, Philip Levis
ICML 2019. (Long talk)
[blog] [code] [supplementary]
CrossTrainer: Practical Domain Adaptation with Loss Reweighting
Justin Chen, Edward Gan, Kexin Rong, Sahaana Suri, Peter Bailis
SIGMOD DEEM Workshop 2019.
Locality-Sensitive Hashing for Earthquake Detection: A Case Study of Scaling Data-Driven Science
Kexin Rong, Clara Yoon, Karianne Bergen, Hashem Elezabi, Peter Bailis, Philip Levis, Gregory Beroza
VLDB 2018.
[blog] [video] [code] [seismology paper]
MacroBase: Prioritizing Attention in Fast Data
Firas Abuzaid, Peter Bailis, Jialin Ding, Edward Gan, Samuel Madden, Deepak Narayanan, Kexin Rong, Sahaana Suri
ACM TODS 2018. "Best of SIGMOD 2017" Special Issue.
ASAP: Prioritizing Attention via Time Series Smoothing
Kexin Rong, Peter Bailis
VLDB 2017.
[Datadog blog] [Timescale blog] [blog] [demo] [talk] [slides] [code]
Prioritizing Attention in Fast Data: Principles and Promise
Peter Bailis, Edward Gan, Kexin Rong, Sahaana Suri
CIDR 2017.
MacroBase: Prioritizing Attention in Fast Data
Peter Bailis, Edward Gan, Samuel Madden, Deepak Narayanan, Kexin Rong, Sahaana Suri
SIGMOD 2017. (Invited to ACM TODS "Best of SIGMOD 2017" Special Issue)
[website] [code]
Demonstration: MacroBase, A Fast Data Analysis Engine
Peter Bailis, Edward Gan, Kexin Rong, Sahaana Suri
SIGMOD 2017 Demo.

Teaching

Spring 2025: Emerging Database Technologies (CS4440 A)
Fall 2024: Database Systems Concepts and Design (CS6400 A)

Recent Talks

From Raw to Ready: Optimizing Data Curation for Machine Learning
VMware Research, July 2024, San Francisco Bay Area
[abstract] [slides]
Towards a Human-Centric Approach to Machine Learning Lifecycle Management
UCSD Database Seminar, May 2023, Virtual
[abstract]
Learned Indexing and Sampling for Improving Query Performance in Big-Data Analytics
Stanford MLSys Seminar, April 2022, Virtual
[abstract] [slides]
