Final Report and Code
DUE MONDAY, DECEMBER 1 AT 11:59PM.
Table of contents
- Overview
- Part 1: Final Report
- Part 2: Code Submission
- Part 3: Team Dynamics Assessment Form
- Submission and Grading
Overview
The final report is the culmination of your semester-long project. This report should be a self-contained, polished document that presents your complete work, including your methodology, findings, and analysis.
You are strongly encouraged to build upon your Proposal and Milestone Report. For example, the “Evolving Methodology” section from your milestone should form the foundation of your Design/Methodology section, and your “Preliminary Results” should be expanded into your full Evaluation/Results section.
Page Limit: Maximum 6 pages using the provided LaTeX template (excluding references). There is no minimum length. Failing to use the template or exceeding the page limit will result in point deductions.
Part 1: Final Report
Option 1: Hybrid Vector Search
Your report should be structured like a short research paper, with a focus on a detailed description of your solution and an evaluation of its performance.
- Introduction
- Introduce the problem of hybrid search and its importance.
- Briefly describe naive solutions (pre-filtering, post-filtering) and their limitations (e.g., accuracy vs. performance trade-offs); a concrete sketch of these two baselines appears after this outline.
- Provide a high-level overview of your proposed solution (no more than 1 or 2 paragraphs).
- Summarize your main contributions and key results.
- Design and Implementation
- If appropriate, provide a detailed architectural overview of your hybrid search system. Include a clear diagram showing the flow from query input to final results, with all major components labeled.
- Describe the core algorithms used for indexing, filtering, and combining results. Feel free to use pseudocode for complex algorithms. If you are referencing a solution from an existing paper/blog, clearly attribute the source and discuss any differences between your actual implementation and the paper’s description.
- Discuss key implementation details and choices, including the libraries used (e.g., Faiss) and the components you built from scratch.
- Experimental Setup
- Environment: Describe your evaluation environment, including the hardware, software, dataset characteristics, and query workload.
- Metrics: Clearly define the metrics used to evaluate your system (e.g., search latency, recall@k, memory usage, index build time); a minimal recall@k computation is included in the sketch after this outline.
- Baselines: Describe the baseline solutions you compare against and how they are implemented.
- Evaluation
- Results: Present your final results using clean, well-labeled graphs and tables. This section should provide a thorough comparison of your system against the baselines across your proposed query workload.
- Analyze your results. In which scenarios does your system perform best, and why? When does it struggle?
- Discuss the limitations of your approach and potential trade-offs (e.g., between speed and accuracy).
- Lessons Learned
- Design Evolution/Failed Attempts: Describe how your approach evolved throughout the project. What initial ideas, implementations, or optimizations did you try that did not work out or did not yield the expected results?
- Takeaways: Briefly summarize the project’s key takeaways. What were the most important lessons or surprises you encountered during implementation and evaluation?
- Future Improvements: Based on what you learned, suggest potential avenues for future improvements or research.
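To make the two naive baselines from the Introduction and the recall@k metric concrete, here is a minimal sketch. It assumes dense vectors stored in a NumPy array and a boolean metadata mask; the function names (pre_filter_search, post_filter_search, recall_at_k) are illustrative rather than part of any required library, and a real system would use an ANN index such as Faiss instead of brute-force search.

Example (illustrative sketch only):

import numpy as np

def pre_filter_search(query, vectors, mask, k):
    """Pre-filtering: restrict to rows passing the metadata predicate, then search."""
    candidate_ids = np.flatnonzero(mask)
    dists = np.linalg.norm(vectors[candidate_ids] - query, axis=1)
    return candidate_ids[np.argsort(dists)[:k]]

def post_filter_search(query, vectors, mask, k, overfetch=4):
    """Post-filtering: search everything, over-fetch, then drop rows failing the predicate."""
    dists = np.linalg.norm(vectors - query, axis=1)
    nearest = np.argsort(dists)[:k * overfetch]   # over-fetch so some results survive filtering
    kept = [i for i in nearest if mask[i]]
    return np.array(kept[:k])                     # may still return fewer than k results

def recall_at_k(retrieved, ground_truth, k):
    """Fraction of the true top-k filtered neighbors present in the retrieved set."""
    return len(set(retrieved[:k]) & set(ground_truth[:k])) / k

# Toy usage: 1,000 random vectors, a predicate that ~20% of rows satisfy, one query.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 64)).astype(np.float32)
mask = rng.random(1000) < 0.2
query = rng.normal(size=64).astype(np.float32)

truth = pre_filter_search(query, vectors, mask, k=10)    # exact filtered top-10 (ground truth here)
approx = post_filter_search(query, vectors, mask, k=10)
print("recall@10 of post-filtering:", recall_at_k(approx, truth, 10))

Because the pre-filtering baseline searches the filtered subset exhaustively, it doubles as ground truth in this toy example, which makes the over-fetch trade-off of post-filtering directly visible.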
Option 2: Replicating Research
Your report should be structured like a replication study, clearly presenting the original work, your process, and a direct comparison of your results to the paper’s claims.
- Introduction
- Introduce the paper, its core intellectual contributions, and their importance in its domain.
- Provide a clear statement of the specific result or claim you chose to replicate.
- Explain why this particular result is significant to the paper’s overall thesis.
- Summarize your main findings from the replication attempt.
- Methodology
- Clearly describe the methodology reported in the original paper.
- Next, describe the methodology you implemented. If your approach diverged from the paper’s description (due to ambiguity or other challenges), explain the differences and justify your decisions.
- Discuss any challenges you encountered in interpreting or implementing the paper’s approach.
- Experimental Setup
- Environment: Describe your evaluation environment, including the hardware, software, datasets, and parameters used.
- Note any significant differences from the experimental setup in the original paper.
- Clearly define the metrics used in your evaluation and how they correspond to those in the original paper.
- Evaluation
- Results: Present the results you obtained from your implementation alongside the original results from the paper. Use comparative figures and tables with matched axes and scales wherever possible to allow for direct comparison (see the plotting sketch after this outline).
- Analysis: Critically analyze the outcome of your replication. Do your results align with the paper’s findings? In what aspects do they match or differ?
- If your results differ, provide a detailed discussion of potential reasons (e.g., subtle differences in environment, unstated assumptions in the paper, implementation differences).
- Lessons Learned
- Discuss what you learned about the paper, its approach, and the replication process itself. What insights would be valuable for others attempting to understand or build upon this work?
- Future Improvements: Based on your experience, suggest potential improvements to either the original work or the replication methodology.
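If it helps, here is a minimal plotting sketch for the kind of side-by-side comparison with matched axes described above. Every value below is a placeholder; substitute your measured numbers and the values transcribed from the original paper, and adjust labels and the output file name to your experiment.

Example (illustrative sketch only; all values are placeholders):

import matplotlib.pyplot as plt

batch_sizes = [1, 8, 32, 128]               # x-axis shared by both panels
paper_numbers = [1.0, 6.8, 21.0, 55.0]      # transcribed from the original paper
our_numbers = [0.9, 6.1, 19.5, 48.0]        # measured by your implementation

fig, (ax_paper, ax_ours) = plt.subplots(1, 2, sharex=True, sharey=True, figsize=(8, 3))
ax_paper.plot(batch_sizes, paper_numbers, marker="o")
ax_paper.set_title("Original paper")
ax_ours.plot(batch_sizes, our_numbers, marker="o")
ax_ours.set_title("Our replication")
for ax in (ax_paper, ax_ours):
    ax.set_xlabel("Batch size")
    ax.set_ylabel("Throughput (queries/s)")
fig.tight_layout()
fig.savefig("replication_comparison.png", dpi=200)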
Part 2: Code Submission
Submit well-organized, documented code that allows us to understand your implementation and verify the work described in your report.
Submission format: Upload all project code to Canvas using either a zip file or a link to your GitHub repository. If sharing a link, ensure the repository is public or that course staff have access.
Large files: If your data files are large (>100MB), do not include them directly. Instead, provide a script or instructions to download/generate them.
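A short script along the following lines is sufficient; the URL, destination path, and checksum are placeholders for wherever your data is actually hosted (or replace the download with a generation step).

Example (illustrative sketch only):

import hashlib
import urllib.request
from pathlib import Path

DATA_URL = "https://example.com/datasets/embeddings.npy"   # placeholder: your hosted dataset
DEST = Path("data/embeddings.npy")
EXPECTED_SHA256 = "<checksum of the published file>"        # optional integrity check

def download():
    DEST.parent.mkdir(parents=True, exist_ok=True)
    if not DEST.exists():
        print(f"Downloading {DATA_URL} -> {DEST}")
        urllib.request.urlretrieve(DATA_URL, DEST)
    # Print the checksum so users can compare it against EXPECTED_SHA256.
    print("sha256:", hashlib.sha256(DEST.read_bytes()).hexdigest())

if __name__ == "__main__":
    download()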
Documentation Requirements
Include a README file that provides a clear map connecting each major section of your report to the corresponding code implementation. List the main files you created or modified, organized by the components described in your report.
Example:
Section 2.2 (Metadata Filtering):
- src/filtering/filter_engine.py: Core filtering logic and query parsing
- src/filtering/predicate_pushdown.py: Optimization for early filtering
Section 2.3 (Hybrid Search Algorithm):
- src/search/hybrid_search.py: Main hybrid search implementation
- src/search/result_merger.py: Combines and ranks results from vector and metadata components
Section 4 (Evaluation):
- experiments/run_experiments.py: Main evaluation script
- experiments/metrics.py: Recall@k and latency measurement
- experiments/generate_plots.py: Creates figures shown in the report
For projects that reference or modify existing codebases, clearly identify which parts are from the original codebase versus your contributions.
Example:
Original Codebase: https://github.com/original/vector-search
Section 2.1 (Modified Indexing):
- faiss_extensions/hnsw_filtered.py: NEW FILE - Added filtering-aware HNSW implementation
- original_code/indexing/base_index.py: MODIFIED lines 45-120 to support metadata integration
Section 4 (Evaluation):
- experiments/hybrid_search_eval.py: NEW FILE - Evaluation comparing our approach to baselines
- experiments/baseline_comparison.py: NEW FILE - Implements pre/post-filtering baselines
What We’ll Review
We will not run your code or grade it on functionality. However, we will:
- Verify correspondence with report: Do the files you point to contain the implementations you describe?
- Check documentation quality: Is your README clear? Can we understand your code structure and find the main implementations?
- Check code attribution: Are external sources properly cited?
- Spot-check key implementations: We may look at a few core files to verify they align with your report’s descriptions.
Part 3: Team Dynamics Assessment Form
Generally, teams work constructively together to complete projects, but it is important for us to know whether that was the case for your project. If contributions were substantially imbalanced, we will adjust individual project grades to compensate.
To help us assess this, each team member must fill out the team dynamics assessment form. We will not return grades to your team until every team member has submitted the form.
Submission and Grading
What to submit:
- Final report (PDF)
- Code with README (zip file or GitHub repository link)
In addition, ALL team members must individually submit the team dynamics assessment form.
Grading: This assignment is worth 15% of your grade:
- Report (10%): Evaluated on effort, clarity, completeness, and technical depth
- Code and Documentation (5%): Evaluated on documentation quality and whether the implementation corresponds to the report
Note that your code will not be graded on functionality. The README and code serve as documentation to help us understand and verify the work described in your report.