
Overview

An important learning objective of this course is to get hands-on research experience. Therefore, as part of this course, you need to complete a semester-long research project. Each project team can have up to 3 students, and we expect work proportional to team size. To help you make consistent progress towards the final presentation, we have set up a few milestones over the course of the semester. Research projects will mainly be evaluated based on completeness and clarity of presentation.

Timeline

Here’s a rough timeline of the key milestones:

  • Week 3: Finalize project team
  • Week 5: Project Proposal
  • Week 10: Project Update
  • Week 15: Peer Review
  • Week 16: Final Presentation

FAQs

What counts as a research project?

The primary requirement is that the project 1) is related to course topics and 2) contains some element of research (i.e., something new). In other words, reimplementing a piece of software that someone else proposed, without any significant extension or modification, does not count as a research project.

There are many forms of novelty: a novel problem and solution, a novel solution to an existing problem, a novel application of existing solutions, a novel implementation and evaluation, or even a cool new dataset. To help you determine whether your project idea is of a reasonable scope, all teams are required to meet with Kexin at least once prior to the project proposal deadline.

If you are a graduate student or have an existing research project, “reusing” that project for this course’s topics is encouraged.

How are projects evaluated?

One of the learning objectives of the class is for you to get hands-on experience with conducting research. Research projects can vary greatly in scope and complexity and are also highly dependent on your background and skills. Therefore, from a grading perspective, we focus more on the “completeness” of the project, namely:

  • Is the problem/hypothesis well-defined and motivated?
  • Is the related work section thorough?
  • Does the evaluation have the appropriate metrics/experiments for testing the main hypothesis?
  • Is the writing overall clear and easy to follow for a technical expert in the field?

We do not evaluate the project based on the “interestingness” of the ideas.

What are different types of projects?

Projects can come in different flavors:

  • Research project: identify a new problem or task, propose or extend a solution, evaluate and report findings. The solution can be a new system, a new tool, a new interface or a new algorithm.
  • Benchmarking and analysis: extensive evaluation of algorithms, data structures, and systems that are of wide interest. The novelty in benchmarking papers comes from 1) new insights about the strengths and weaknesses of existing methods, or 2) new ways to evaluate existing methods, such as by curating new datasets and scenarios.
  • Reproduce and extend: there are many papers that describe an idea in theoretical terms, or that implement their ideas in a different context (e.g., under assumptions that no longer hold on new hardware). Thoroughly understanding a paper (or collection of papers) and reproducing the main ideas in a new context can often lead to new findings and extensions.

I need help with research problems.

Identifying a research problem can be challenging (but also rewarding!), especially if you have not done so before. Don’t know where to start? Have a fuzzy idea? Want some feedback on your current idea? Please come to office hours and we are here to help!

Unsolicited Project Ideas

The list below offers examples of possible projects. Keep in mind that this list is not exhaustive, and you are fully encouraged to come up with a project topic that interests you personally. In fact, a common source of ideas is to take your experience from another domain, and combine it with ideas from the class. Another approach is to take concepts from the papers we read, and apply them to another domain.

If any of the projects listed matches your interests, you are welcome to come discuss them during the instructor’s OH.

Evaluate Copilot in recommending data preparation steps

  • Nearest neighbor paper: Auto-Suggest: Learning-to-Recommend Data Preparation Steps Using Data Science Notebooks
  • Objective: Examine the efficiency and effectiveness of GitHub Copilot in suggesting data preparation steps by comparing its performance to the methods proposed in the Auto-Suggest paper.
  • Both GitHub Copilot and Auto-Suggest models are trained on publicly available code (e.g., the paper trained models over 4M Jupyter notebooks crawled on GitHub). How well does a generic AI-based auto-complete tool like GitHub Copilot suggest data preparation steps? Is it comparable to a more specialized model such as the one trained in the Auto-Suggest paper?
  • Related reading: Assessing the Quality of GitHub Copilot’s Code Generation
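One way to make the comparison concrete is a small evaluation harness that scores each tool's ranked suggestions against ground-truth next steps using top-k accuracy, a standard metric for this kind of recommendation task. The sketch below is purely illustrative: the operator names and rankings are made up, and a real study would extract both from actual notebooks and Copilot completions.

```python
# Hypothetical evaluation harness (all data here is illustrative): given the
# ground-truth next data-prep operator for each notebook cell and a tool's
# ranked suggestions, compute top-k accuracy.

def top_k_accuracy(ground_truth, suggestions, k=3):
    """Fraction of cases where the true operator appears in the top-k suggestions."""
    hits = sum(1 for truth, ranked in zip(ground_truth, suggestions)
               if truth in ranked[:k])
    return hits / len(ground_truth)

# Toy example: true next step per cell, and a tool's ranked guesses for each.
truth = ["dropna", "merge", "pivot"]
copilot_ranked = [["fillna", "dropna"], ["merge", "concat"], ["groupby", "melt"]]

print(top_k_accuracy(truth, copilot_ranked, k=2))  # 2 of 3 hits -> ~0.667
```

Running the same harness over both Copilot's and Auto-Suggest's outputs on a shared test set would give a directly comparable number per tool and per value of k.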

Impact of data cleaning on visualization recommendations

  • Nearest neighbor paper: Lux: always-on visualization recommendations for exploratory dataframe workflows
  • Objective: Investigate how different data cleaning methods impact visualization recommendations
  • Lux automatically recommends “interesting” visualizations from your pandas dataframe. However, it does not currently handle dirty data. Do different data cleaning methods have an effect on the visualizations recommended by Lux? If so, how and to what extent?
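A starting point for this project is to observe that different cleaning strategies can shift the very column statistics Lux's recommendations are driven by. The sketch below uses pandas only (Lux itself attaches to dataframes via `import lux`, omitted here); the toy dataframe and the two strategies compared are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Minimal sketch: apply two cleaning strategies to a toy dataframe with
# missing values and an outlier, and compare the resulting column statistics.
# (Lux attaches to pandas via `import lux`; omitted so the sketch runs standalone.)
df = pd.DataFrame({"price": [10.0, 12.0, np.nan, 11.0, 200.0],
                   "units": [1, 2, 3, np.nan, 5]})

dropped = df.dropna()                               # strategy 1: drop incomplete rows
imputed = df.fillna(df.median(numeric_only=True))   # strategy 2: median imputation

print(dropped["price"].mean())  # outlier dominates the 3 surviving rows
print(imputed["price"].mean())  # all 5 rows kept; a different distribution
```

The experiment would then record which charts Lux recommends (e.g., via `df.recommendation`) under each cleaned variant and quantify how often, and how, the recommendation sets diverge.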

Data preprocessing transferability

Sketch-based labeling interface for time series anomaly detection

Cross-modality labeling interfaces

Similarity search and beyond