Kexin Rong

I am an Assistant Professor in the School of Computer Science at Georgia Tech. My lab, part of the Georgia Tech database group, studies systems and algorithms that improve the computational and human efficiency of large-scale data analytics. I am also an affiliated researcher at VMware Research Group.

I am broadly interested in building systems and tools that help democratize data science, i.e., enabling non-experts to make sense of and leverage increasingly large volumes of data by making the process more efficient and more accessible.

Previously, I received my Ph.D. in CS from Stanford (advised by Peter Bailis and Philip Levis) and my B.S. in CS from Caltech.

I am actively looking for master's and PhD students. If you are a GT student interested in working with me, please check out this page.

Email / Google Scholar / CV / GitHub / Lab Website

Publications and Preprints
Scaling a Declarative Cluster Manager Architecture with Query Optimization Techniques (Technical Report)
Kexin Rong, Mihai Budiu, Athinagoras Skiadopoulos, Lalith Suresh, Amy Tai
Preprint, 2022
Improving Computational and Human Efficiency in Large-Scale Data Analytics
Kexin Rong
PhD Thesis, 2021 (SIGMOD Doctoral Dissertation Award Honorable Mention)
Approximate Partition Selection for Big-Data Workloads using Summary Statistics
Kexin Rong, Yao Lu, Peter Bailis, Srikanth Kandula, Philip Levis
VLDB, 2020
[talk]

A system that leverages summary statistics to select weighted, partition-level samples to approximate analytical queries on big-data clusters.

Rehashing Kernel Evaluation in High Dimensions
Paris Siminelakis*, Kexin Rong*, Peter Bailis, Moses Charikar, Philip Levis
ICML, 2019 (Long talk)
[blog] [code] [supplementary]

LSH-based sketching and importance sampling algorithms to accelerate kernel evaluation in high dimensions.

Locality-Sensitive Hashing for Earthquake Detection: A Case Study of Scaling Data-Driven Science
Kexin Rong, Clara Yoon, Karianne Bergen, Hashem Elezabi, Peter Bailis, Philip Levis, Gregory Beroza
VLDB, 2018
[blog] [video] [code] [seismology paper]

An unsupervised, end-to-end earthquake detection pipeline based on pairwise similarity search on seismic waveforms.

ASAP: Prioritizing Attention via Time Series Smoothing
Kexin Rong, Peter Bailis
VLDB, 2017
[Datadog blog] [Timescale blog] [blog] [demo] [talk] [slides] [code]

An automatic smoothing algorithm for time series visualization that removes short-term fluctuations while preserving large-scale deviations.

MacroBase: Prioritizing Attention in Fast Data
Peter Bailis, Edward Gan, Samuel Madden, Deepak Narayanan, Kexin Rong, Sahaana Suri
SIGMOD, 2017 (Invited to the ACM TODS "Best of SIGMOD 2017" Special Issue)
[website] [code] [journal paper] [vision paper] [demo paper]

A data analytics engine that highlights and aggregates important and unusual behavior in high-volume fast data streams.

Teaching
Fall 2022: Human-in-the-loop Data Analytics (CS 8803-MDS)
