A rubric that TAs cannot apply consistently isn't a rubric — it's a wish. The unit packets define what the binary decisions are. This protocol defines how a team of TAs makes those decisions reliably across sections, sessions, and the term.
- ① **Calibration session** (start of every term). TAs grade the same 20–30 sample answers; outliers are coached before they ever grade a real student.
- ② **Anchor cards** (at every grading station). Pass / Not-yet / Edge-case examples for the most common items in the unit. The card is consulted when there's hesitation.
- ③ **Spot-check audit** (weekly). Coordinator regrades a sample of each TA's items; inconsistencies surface immediately, not at term's end when nothing can be done about them.
- ④ **Escalation queue** (for edge cases). TAs do not adjudicate ambiguous answers alone. Items go to a queue the coordinator reviews end-of-day.
Specifications grading lives or dies on this protocol. A rubric system without TA calibration produces the same variance as traditional point-allocation grading, just with a different vocabulary. The protocol is the part that makes "binary, atomic judgments" actually binary and actually atomic when applied by a team.
Held within the first two weeks of the term, before any TA grades a real practical. 90–120 minutes. Scheduled as paid TA time. Repeated for each unit if TA personnel change between units.
| Time | Activity | What happens |
|---|---|---|
| 0:00–0:15 | Walkthrough | Coordinator walks through the rubric structure, the anchor cards, and the downstream consequences of the bundle thresholds. Why the discipline matters. |
| 0:15–0:35 | Sample R1 (identification) | TAs independently grade ~10 sample R1 answers. Decisions written, not discussed. Coordinator collects. |
| 0:35–0:55 | Sample R1 review | Coordinator tallies decisions on whiteboard. Items where TAs disagreed are discussed first; items where everyone agreed are confirmed second. Outliers are not embarrassed — the goal is the standard, not the person. |
| 0:55–1:15 | Sample R2 + R3 | Same exercise for ID + Function and Histology samples. R2 and R3 carry the most subjective judgments and benefit most from calibration. |
| 1:15–1:30 | Anchor card walkthrough & Q&A | Coordinator points to the anchor card sections that resolve the disagreements that just surfaced. TAs leave with the cards in hand. |
The point of the calibration session is to surface disagreement and resolve it before it costs students. The standard can shift — coordinator may decide an edge case differently after hearing TA reasoning — but it shifts once, on the record, before grading begins. After the session, the standard is the standard.
If one TA's pattern is clearly different from the rest (consistently more lenient or stricter), the coordinator follows up one-on-one within the week. The conversation is about the rubric, not the person. Most outliers are misunderstanding one specific item or one specific rule; clarification fixes it. If a TA cannot calibrate to the cohort standard after coaching, they grade with the coordinator's check on every item until they can.
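The outlier check described above can be sketched as a majority-agreement tally: compute the cohort's majority decision per sample item, then flag any TA whose agreement rate falls below a cutoff. This is a minimal sketch, not part of the protocol; the decision labels, data shapes, and 0.8 threshold are assumptions:

```python
from collections import Counter

def flag_outliers(decisions, threshold=0.8):
    """Flag TAs whose calibration-session decisions agree with the
    cohort majority on fewer than `threshold` of their sample items.

    decisions: {ta_name: {item_id: "pass" | "not-yet"}}
    Returns a list of (ta_name, agreement_rate) for flagged TAs.
    """
    # Majority decision per item across all TAs (ties resolved arbitrarily;
    # a tied item is itself worth discussing in the review round).
    items = {item for per_ta in decisions.values() for item in per_ta}
    majority = {
        item: Counter(
            per_ta[item] for per_ta in decisions.values() if item in per_ta
        ).most_common(1)[0][0]
        for item in items
    }

    flagged = []
    for ta, per_ta in decisions.items():
        agree = sum(per_ta[i] == majority[i] for i in per_ta)
        rate = agree / len(per_ta)
        if rate < threshold:
            flagged.append((ta, rate))
    return flagged
```

For example, a TA who sides against the cohort majority on most items surfaces immediately, while ordinary scattered disagreement does not trip the flag.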
Anchor cards are the in-the-moment reference TAs consult when there is hesitation. They live at every grading station throughout the practical. They exist because memory is unreliable under load — no TA grading 30 students in 90 minutes will remember every synonym rule.
Each unit packet (cardiovascular, nervous system, musculoskeletal) includes anchor card pages designed to be torn out, laminated, and placed at grading stations. A complete anchor card set covers, for each rubric type:
Anchor cards are versioned. When the escalation queue resolves a new edge case (Pillar ④), the coordinator decides whether to add a new card entry, amend an existing example, or leave the cards unchanged with the decision recorded only in the log.
This is how the system improves. Every term should produce a small number of new anchor card entries; the absence of new entries means nobody is encountering edge cases — or, more likely, nobody is escalating them. Either way, it's a signal worth investigating.
Anchor cards work best laminated, hole-punched, and on a single ring at the grading station. TAs flip through them quickly. Loose-leaf cards get lost; spiral-bound cards don't lay flat. Whatever format your program adopts, prioritize one question: can the TA find the right card in 5 seconds? That's the design constraint.
Calibration at the start of the term is necessary but not sufficient. Standards drift. Sympathy accumulates. Fatigue erodes. The audit is the mechanism that catches drift before it becomes a term-wide problem.
Each week, the coordinator regrades approximately 5–10% of each TA's items from that week's lab sessions; selection is stratified rather than purely random. The regrade is then compared against the TA's original decisions:
| Pattern | What it means and how to respond |
|---|---|
| All decisions match coordinator's regrade | Calibration is holding. Brief acknowledgment to the TA; no change. |
| One or two disagreements, scattered | Normal variance. Note the items; revisit at the next coaching touch-point. |
| Pattern of leniency on one rubric type | TA may need a refresher on that rubric's discipline. Schedule a 10-minute one-on-one within the week. |
| Pattern of strictness on one rubric type | Same response. Strictness is no more virtuous than leniency — both are deviations from the standard. |
| Pattern of under-escalating | TA is adjudicating ambiguity instead of escalating. Reinforce: when in doubt, circle and escalate. |
| Pattern of over-escalating | TA is escalating items the anchor cards already resolve. Walk through the relevant cards. |
Audit results are returned to TAs by the start of the following week's lab session. Feedback is brief, specific, and rubric-focused (not personality-focused). Example:
"Thanks for the work this week. One pattern I want to flag: on the tricuspid valve item, you marked four students as pass when they said 'lets blood through.' That's a not-yet per the R2 rubric — the valve's function is preventing backflow, not permitting flow. Have a look at anchor card E2 and let me know if you have questions before next session."
It is not a performance review. It is not a basis for TA discipline (unless persistent and unaddressed after coaching). It is the same thing the calibration session is — a mechanism for keeping the rubric the rubric, applied by a team that gets tired.
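The weekly 5–10% sample described above can be drawn with a small stratified sampler. A minimal sketch, assuming stratification by rubric type within each TA's items (the exact strata are not specified in the protocol), with a fixed seed for a reproducible audit trail:

```python
import random

def draw_audit_sample(items, fraction=0.075, seed=None):
    """Draw a weekly spot-check sample, stratified so every
    (TA, rubric type) bucket contributes at least one item.

    items: list of dicts with at least "ta" and "rubric" keys.
    fraction: share of each bucket to regrade (5-10% per the protocol).
    """
    rng = random.Random(seed)
    # Group items into (TA, rubric type) buckets.
    strata = {}
    for it in items:
        strata.setdefault((it["ta"], it["rubric"]), []).append(it)

    sample = []
    for bucket in strata.values():
        # At least one item per bucket, even when fraction rounds to zero.
        k = max(1, round(len(bucket) * fraction))
        sample.extend(rng.sample(bucket, k))
    return sample
```

Stratifying this way guards against a random draw that happens to skip the very rubric type where a TA is drifting.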
Some answers are genuinely ambiguous and will not appear on any anchor card the first time they show up. The escalation queue ensures these are decided once, by the coordinator, with consistency across the cohort — not 12 different ways by 12 different TAs.
A simple spreadsheet (or notebook) maintained by the coordinator. Each row contains: date, unit, rubric type, the student answer (or summary), the anchor-card item it relates to (if any), the coordinator's decision, and the rationale (one sentence). The log serves three purposes: each edge case is decided once and applied cohort-wide, the rationale is on record if a student appeals, and resolved cases feed the end-of-term anchor-card update.
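A log row of this shape maps directly onto a small record type. A sketch, assuming a CSV file stands in for the "simple spreadsheet"; the field names mirror the columns listed above:

```python
import csv
from dataclasses import astuple, dataclass
from datetime import date

@dataclass
class EscalationEntry:
    """One row of the escalation log."""
    day: date
    unit: str            # e.g. "cardiovascular"
    rubric_type: str     # e.g. "R2"
    answer: str          # student answer, or a short summary
    anchor_card: str     # related anchor-card item, "" if none
    decision: str        # "pass" or "not-yet"
    rationale: str       # one sentence

def append_entry(path, entry):
    """Append one resolved edge case to the CSV log."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(astuple(entry))
```

Keeping the rationale to a single mandatory field is the design point: a row without a rationale cannot be promoted to an anchor card later.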
| Belongs in queue | Does NOT belong (handle in the moment) |
|---|---|
| Genuinely novel ambiguous answer not anticipated by any anchor card | Answer that matches an anchor card example — just apply the card |
| Spelling case where the rule is silent | Spelling case the rule covers explicitly |
| Function statement that's partially correct in a way no anchor case addresses | Function statement that obviously misses the rubric requirement |
| Disputed dissection technique observation (e.g. unusual approach that worked) | Standard technique observation covered by the 4-point or 5-point checklist |
| Anything where two TAs at adjacent stations disagree | Anything where the rubric is clear and the TA's hesitation is just speed-related |
A healthy escalation rate is roughly 2–5% of items in any practical. Below 1% suggests TAs are adjudicating things they shouldn't. Above 8% suggests the anchor cards need to resolve more cases (or the rubric itself has a gap that needs addressing). The coordinator should track this rate per TA and per unit.
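The rate bands above can be encoded as a small health check. A sketch; the "borderline" label for rates between the healthy band and the alarm thresholds is an assumption, not part of the protocol:

```python
def escalation_health(escalated, graded):
    """Classify an escalation rate against the protocol's bands:
    roughly 2-5% healthy, below 1% and above 8% are warning signs.
    Returns (rate, assessment).
    """
    rate = escalated / graded
    if rate < 0.01:
        return rate, "under-escalating: TAs may be adjudicating alone"
    if rate > 0.08:
        return rate, "over-escalating: anchor cards or rubric may have gaps"
    return rate, ("healthy" if 0.02 <= rate <= 0.05
                  else "borderline: keep watching")
```

Run per TA and per unit, as the protocol asks, this turns the 2–5% rule of thumb into a check that can sit next to the weekly audit.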
One page. The full operational rhythm of the calibration system, designed for the wall above the coordinator's desk.
| Week | Calibration activity | Notes |
|---|---|---|
| Pre-term (Week 0) | Sample answers prepared for first unit | 20–30 per rubric type, 40/40/20 mix |
| Week 1 | Calibration session ① with TAs | 90–120 minutes, paid time, before any grading |
| Weeks 2–N | Spot-check audit ③ each week | 5–10% of each TA's items, returned by next session |
| Mid-term | Mini-calibration on emerging escalation log items | 30–45 min, only if log volume warrants it |
| New unit start | Re-run calibration session ① for new unit's anchor cards | Shorter (60 min) if same TA team; full session if any new TAs |
| Post-term | Review escalation log; promote items to anchor cards for next term | 2–3 hours; produces the v0.x → v0.(x+1) update |
| Metric | Target / signal |
|---|---|
| Escalation rate per TA per unit | Target 2–5%; investigate outliers in either direction |
| Spot-check disagreement rate | Target <5% of regraded items; rising rate signals drift |
| Same-answer consistency across TAs | Sampled occasionally by giving two TAs the same item; should be near 100% |
| Student appeal rate | Should fall over time as the system stabilizes; rising appeals signal a rubric problem worth investigating |
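The four metrics and their targets can be checked mechanically at term's end. A sketch; the 0.95 cutoff standing in for "near 100%" consistency, and the use of a week-over-week appeal trend (positive meaning rising), are assumptions:

```python
def health_flags(metrics):
    """Check the term-health metrics against their targets.

    metrics keys: "escalation_rate", "spotcheck_disagreement",
    "same_answer_consistency", "appeal_trend" (all floats;
    appeal_trend > 0 means appeals are rising).
    Returns a list of warning strings; empty means healthy.
    """
    flags = []
    if not 0.02 <= metrics["escalation_rate"] <= 0.05:
        flags.append("escalation rate outside the 2-5% band")
    if metrics["spotcheck_disagreement"] >= 0.05:
        flags.append("spot-check disagreement at or above 5%: possible drift")
    if metrics["same_answer_consistency"] < 0.95:
        flags.append("same-answer consistency well short of 100%")
    if metrics["appeal_trend"] > 0:
        flags.append("student appeals rising: possible rubric problem")
    return flags
```

Any non-empty result is the cue the next paragraph describes: step back and redesign rather than push harder.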
Three early warning signs: appeals rising; TAs grading the same session very differently in spot-checks; the escalation log either empty (TAs adjudicating alone) or overflowing (rubric or anchor cards have gaps). Any one of these is a cue to step back and redesign rather than push harder on the existing system.