High-quality data is crucial for successful machine learning models, yet data curation remains a significant challenge in terms of both efficiency and quality. Data preprocessing, the transformation of raw data into usable formats, consumes a disproportionate amount of computational and human resources. Data scientists report spending 60-80% of their time on data cleaning rather than analysis, and preprocessing can account for up to 65% of ML pipeline time for tasks like image classification. Moreover, data quality issues, such as biases in the dataset, can lead to unfairness in downstream ML models. In this talk, I will present our lab's recent work on streamlining the data acquisition and preprocessing stages of the ML lifecycle. I will cover three topics: automatic data preprocessing techniques that reduce manual effort, profiling tools for identifying computational bottlenecks in preprocessing, and fairness-aware data collection methods that mitigate biases at the source.