Evaluation Plan

ASSIGNMENT 7. DUE FRIDAY, NOVEMBER 18 AT 5:00PM.

Part 1: Articulate Your Thesis
Part 2: Derive Your Claim
Part 3: Design Your Evaluation
Part 4: Write Your Evaluation
Submission and Grading

This assignment centers on your plan for the rest of the semester. Submit it in addition to your Weekly Progress Report.

To persuade others that your idea is correct, you’ll need to show an expert that you have evaluated it fairly and correctly. In this assignment, you will develop an evaluation plan for your research project and write it up.

Download the LaTeX template here: CS8803-MDS: Evaluation Plan.

Part 1: Articulate Your Thesis

The first step in planning an evaluation is to articulate the main thesis of your work. (Remember from Assignment 1, Project Proposal, that the main thesis of the work is likely embedded in the topic sentence of your bit flip paragraph.) Go back and reflect on that statement — tweak it if necessary based on what you’ve learned from your project so far.

Write out your thesis at the top of your submission.

Part 2: Derive Your Claim

Theses typically imply a claim. For example, “x > y”-type (“X is better than Y”) theses imply a claim that x is in fact better/faster/more performant/more usable than y, and “∃ x”-type (“there exists an X”) theses imply a claim that x, which could not exist before, now does with your system. Discuss with your team the claim implied by your thesis.

Write down your claim.

Part 3: Design Your Evaluation

Now, you need to work from your claim to design a specific evaluation plan. How do you prove what you have claimed? This evaluation plan typically specifies:

Dependent Variable (DV): what is your dependent variable? (This is the variable you measure as the outcome, such as accuracy on a test set.)
Independent Variable (IV): what is your independent variable? (This is the variable you manipulate for comparison to create conditions, such as the algorithm or the interface used.)
Task: what is the specific task that is being performed in order to measure the DV? (This might include executing a benchmark, a known ML classification task, or a specific sequence of behaviors that a user must perform.)
Threats: what are the factors that might influence your outcome? For example, in what situations might your result hold or not hold? What biases might creep in that you need to make sure to account for?
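The four elements above can be written down as a small structured record before you draft any prose. The following is only an illustrative sketch: the class name, field names, and example values are all hypothetical and are not part of the assignment template.

```python
from dataclasses import dataclass, field


@dataclass
class EvaluationDesign:
    """One evaluation design: what is measured, what is varied, and on what task."""
    dependent_variable: str    # the outcome you measure
    independent_variable: str  # the factor you manipulate
    conditions: list[str]      # the levels of the IV being compared
    task: str                  # what is performed to produce the DV
    threats: list[str] = field(default_factory=list)

    def summary(self) -> str:
        """Render the design as a short plain-text checklist."""
        lines = [
            f"DV: {self.dependent_variable}",
            f"IV: {self.independent_variable} ({' vs. '.join(self.conditions)})",
            f"Task: {self.task}",
            "Threats:",
        ]
        lines += [f"  - {t}" for t in self.threats]
        return "\n".join(lines)


# Hypothetical example values, not taken from any real project.
design = EvaluationDesign(
    dependent_variable="accuracy on a held-out test set",
    independent_variable="algorithm",
    conditions=["baseline", "proposed"],
    task="a known ML classification task",
    threats=[
        "test set overlaps with training data",
        "hyperparameters tuned for only one condition",
    ],
)
print(design.summary())
```

Filling in a record like this makes gaps visible early: an empty threats list or a vague task description is a sign the design needs more work.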

You don’t need to reinvent the wheel here. Papers in your related work often establish an evaluation paradigm that you can import into your paper. In fact, this is often preferred: because the approach is already in the literature, you don’t need to convince a reader that it is valid. So, go review the evaluations used in your related work and use those to develop a few possible models. Then, share those models with your team and work together to develop a variant that works well for your project.

Based on your project’s setup, your model might look slightly different from what is laid out above. If you believe this to be the case, talk to your TA about it.

Next, run the following unit test on your proposed design: does it directly test the thesis you articulated above? Imagine a few possible outcomes from your evaluation. Depending on how it comes out, does it directly prove or disprove your thesis, or only obliquely shed light on whether your thesis is correct?
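For an “x > y”-type thesis with a numeric DV, one way to imagine possible outcomes is to feed made-up measurements through the analysis you plan to run and check that each outcome maps to a clear verdict. The sketch below assumes a percentile bootstrap on the difference in means. That choice of analysis, the function names, and every number are hypothetical placeholders, so substitute whatever your related work uses.

```python
import random
from statistics import mean


def bootstrap_diff_ci(treatment, control, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(treatment) - mean(control)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    diffs = []
    for _ in range(n_resamples):
        # Resample each condition with replacement, keeping its original size.
        t = [rng.choice(treatment) for _ in treatment]
        c = [rng.choice(control) for _ in control]
        diffs.append(mean(t) - mean(c))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def verdict(treatment, control):
    """Map an outcome to what it would say about an 'x > y' thesis."""
    lo, hi = bootstrap_diff_ci(treatment, control)
    if lo > 0:
        return "supports thesis"
    if hi < 0:
        return "contradicts thesis"
    return "inconclusive"


# Imagined outcomes (made-up DV values), one list per condition.
better = [0.90, 0.91, 0.92, 0.93]
worse = [0.70, 0.71, 0.72, 0.73]
print(verdict(better, worse))
print(verdict(worse, better))
print(verdict(better, better))
```

If no plausible outcome could produce "contradicts thesis", or if every outcome lands on "inconclusive", the design is probably testing the thesis only obliquely.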

Write out the DV, IV, Task and Threats for your evaluation design. Summarize your explanation of why that design directly tests your thesis.

Part 4: Write Your Evaluation

Your goal is now to write up your evaluation plan as it would appear in a published paper. Ideally, you will be able to reuse and update this for your final paper submission. Having a clear sense of what the evaluation will look like helps ensure that you are vectoring toward the goals you need to reach.

You won’t have final results ready at this point, so either make up the results you might reasonably expect to see or use any pilot data that you currently have. Include any graphs that you plan to include in your final writeup. You will, of course, be able to update this for your final submission as your project progresses, including changing your evaluation design and updating your results.

In a full paper, typically the Evaluation section is 2000-3000 words. However, that length includes a detailed analysis of the results, and at this point, you will only have pilot data at best. So, for this assignment, the requirement is 1000-1500 words for a 2-3 person team and at least 800 words for a 1-person team, mostly on methods and a bit of results scaffolding.

Submission and Grading

Submit an evaluation PDF with (1) your thesis, (2) your claim, (3) your evaluation design, and (4) your evaluation writeup.

Your evaluation will be graded on the following criteria:

Claim: is the thesis a correct articulation of the project, and does your claim derive from the thesis? (5pt)
Evaluation design: does the design of the evaluation correctly evaluate the thesis and follow through on the claim? (10pt)
Clarity: does the writeup clearly and correctly describe the design to an expert in the area? (5pt)