🎯 TA calibration protocol — the operational companion to the rubric system. Print 8.5×11 portrait. The piece that makes specifications grading work at scale.
University A&P Lab · Operational Companion
TA Calibration Protocol
Overview
v0.1 · Page 1 of 6

A rubric that TAs cannot apply consistently isn't a rubric — it's a wish. The unit packets define what the binary decisions are. This protocol defines how a team of TAs makes those decisions reliably across sections, sessions, and the term.

The four pillars

① Calibration session

Start of every term. TAs grade the same 20–30 sample answers; outliers are coached before they ever grade a real student.

② Anchor cards

At every grading station. Pass / Not-yet / Edge-case examples for the most common items in the unit. The card is consulted when there's hesitation.

③ Spot-check audit

Weekly. Coordinator regrades a sample of each TA's items; inconsistencies surface immediately, not at term's end when nothing can be done about them.

④ Escalation queue

For edge cases. TAs do not adjudicate ambiguous answers alone. Items go to a queue the coordinator reviews end-of-day.

What the protocol prevents

The bottom line

Specifications grading lives or dies on this protocol. A rubric system without TA calibration produces the same variance as traditional point-allocation grading, just with a different vocabulary. The protocol is the part that makes "binary, atomic judgments" actually binary and actually atomic when applied by a team.

TA Calibration Protocol · Pillar ①
Start-of-Term Calibration Session
v0.1 · Page 2 of 6

Held within the first two weeks of the term, before any TA grades a real practical. 90–120 minutes. Scheduled as paid TA time. Repeated for each unit if TA personnel change between units.

Materials

Agenda (90 minutes)

Time | Activity | What happens
0:00–0:15 | Walkthrough | Coordinator walks through the rubric structure, the anchor cards, and the downstream consequences of the bundle thresholds. Why the discipline matters.
0:15–0:35 | Sample R1 (identification) | TAs independently grade ~10 sample R1 answers. Decisions written, not discussed. Coordinator collects.
0:35–0:55 | Sample R1 review | Coordinator tallies decisions on whiteboard. Items where TAs disagreed are discussed first; items where everyone agreed are confirmed second. Outliers are not embarrassed — the goal is the standard, not the person.
0:55–1:15 | Sample R2 + R3 | Same exercise for ID + Function and Histology samples. R2 and R3 carry the most subjective judgments and benefit most from calibration.
1:15–1:30 | Anchor card walkthrough & Q&A | Coordinator points to the anchor card sections that resolve the disagreements that just surfaced. TAs leave with the cards in hand.

The discipline of disagreement

The point of the calibration session is to surface disagreement and resolve it before it costs students. The standard can shift — the coordinator may decide an edge case differently after hearing TA reasoning — but it shifts once, on the record, before grading begins. After the session, the standard is the standard.

Outlier handling

If one TA's pattern is clearly different from the rest (consistently more lenient or stricter), the coordinator follows up one-on-one within the week. The conversation is about the rubric, not the person. Most outliers are misunderstanding one specific item or one specific rule; clarification fixes it. If a TA cannot calibrate to the cohort standard after coaching, they grade with the coordinator's check on every item until they can.
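The outlier check can be sketched in a few lines. This is an illustrative helper, not part of the packet: it compares each TA's decisions on the shared sample set against the cohort's majority call and flags anyone whose agreement falls below a chosen threshold (the 0.8 cutoff here is an assumption, not a protocol value).

```python
from collections import Counter

def cohort_consensus(decisions: dict[str, list[str]]) -> list[str]:
    """Majority decision ('P' or 'NY') per sample item across all TAs."""
    n_items = len(next(iter(decisions.values())))
    return [
        Counter(ta[i] for ta in decisions.values()).most_common(1)[0][0]
        for i in range(n_items)
    ]

def agreement_rates(decisions: dict[str, list[str]]) -> dict[str, float]:
    """Fraction of sample items where each TA matches the cohort consensus."""
    consensus = cohort_consensus(decisions)
    return {
        name: sum(d == c for d, c in zip(graded, consensus)) / len(consensus)
        for name, graded in decisions.items()
    }

def outliers(decisions: dict[str, list[str]], threshold: float = 0.8) -> list[str]:
    """TAs whose agreement with the consensus falls below the threshold."""
    return [name for name, rate in agreement_rates(decisions).items()
            if rate < threshold]
```

Whether a flagged TA is "more lenient" or "stricter" then falls out of inspecting which direction their disagreements run, which is the substance of the one-on-one follow-up.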

TA Calibration Protocol · Pillar ②
Anchor Cards at Every Grading Station
v0.1 · Page 3 of 6

Anchor cards are the in-the-moment reference TAs consult when there is hesitation. They live at every grading station throughout the practical. They exist because memory is unreliable under load — no TA grading 30 students in 90 minutes will remember every synonym rule.

What an anchor card contains

Each unit packet (cardiovascular, nervous system, musculoskeletal) includes anchor card pages designed to be torn out, laminated, and placed at grading stations. A complete anchor card set covers, for each rubric type:

How to use the cards

  1. If the answer is unambiguous, decide and move on. Do not consult the card — speed matters during a 90-second station.
  2. If you hesitate, look at the card. Most edge cases are on it. If your case matches an example, follow the example.
  3. If your case isn't on the card, escalate. Circle the item on the score sheet; do not assign a decision. The escalation queue handles it (Pillar ④).
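The three steps reduce to a tiny decision function. A hypothetical sketch, for illustration only: the two inputs model the TA's state of mind and the card lookup, which the protocol of course leaves to the human.

```python
from typing import Optional

ESCALATE = "ESCALATE"

def station_decision(ta_call: Optional[str], card_match: Optional[str]) -> str:
    """Mirror the three-step anchor-card procedure.

    ta_call: 'P' or 'NY' when the answer is unambiguous to the TA,
             None when the TA hesitates.
    card_match: the decision from a matching anchor-card example,
                None when no example covers the case.
    """
    if ta_call is not None:      # step 1: unambiguous, decide and move on
        return ta_call
    if card_match is not None:   # step 2: hesitation, follow the card example
        return card_match
    return ESCALATE              # step 3: circle the item for the queue
```

The key property the function makes explicit: a TA never outputs a grade that came from neither their own confident judgment nor a card example.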

Maintaining the cards over the term

Anchor cards are versioned. When the escalation queue resolves a new edge case (Pillar ④), the coordinator decides whether to:

This is how the system improves. Every term should produce a small number of new anchor card entries; the absence of new entries means nobody is encountering edge cases — or, more likely, nobody is escalating them. Either way, it's a signal worth investigating.

A note on physical format

Anchor cards work best laminated, hole-punched, and on a single ring at the grading station. TAs flip through them quickly. Loose-leaf cards get lost; spiral-bound cards don't lay flat. Whatever format your program adopts, prioritize one question: can the TA find the right card in 5 seconds? That's the design constraint.

TA Calibration Protocol · Pillar ③
Weekly Spot-Check Audits
v0.1 · Page 4 of 6

Calibration at the start of the term is necessary but not sufficient. Standards drift. Sympathy accumulates. Fatigue erodes. The audit is the mechanism that catches drift before it becomes a term-wide problem.

What the audit is

Each week, the coordinator regrades approximately 5–10% of each TA's items from that week's lab sessions. Selection is stratified:

What the coordinator looks for

Pattern | What it means
All decisions match coordinator's regrade | Calibration is holding. Brief acknowledgment to the TA; no change.
One or two disagreements, scattered | Normal variance. Note the items; revisit at the next coaching touch-point.
Pattern of leniency on one rubric type | TA may need a refresher on that rubric's discipline. Schedule a 10-minute one-on-one within the week.
Pattern of strictness on one rubric type | Same response. Strictness is no more virtuous than leniency — both are deviations from the standard.
Pattern of under-escalating | TA is adjudicating ambiguity instead of escalating. Reinforce: when in doubt, circle and escalate.
Pattern of over-escalating | TA is escalating items the anchor cards already resolve. Walk through the relevant cards.
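The leniency/strictness patterns in the table can be partially automated from the week's regrades. A sketch, with illustrative thresholds (the 30% rate and 5-item minimum are assumptions, not protocol values):

```python
def audit_summary(records: list[tuple[str, str, str]]) -> dict:
    """records: (rubric_type, ta_decision, coordinator_decision) triples.

    Tallies, per rubric type, how many regraded items disagreed in the
    lenient direction (TA 'P', coordinator 'NY') or the strict direction
    (TA 'NY', coordinator 'P').
    """
    summary: dict = {}
    for rubric, ta, coord in records:
        s = summary.setdefault(rubric, {"n": 0, "lenient": 0, "strict": 0})
        s["n"] += 1
        if ta == "P" and coord == "NY":
            s["lenient"] += 1
        elif ta == "NY" and coord == "P":
            s["strict"] += 1
    return summary

def flag_patterns(summary: dict, min_items: int = 5, rate: float = 0.3) -> list:
    """Flag rubric types where disagreements cluster in one direction."""
    flags = []
    for rubric, s in summary.items():
        if s["n"] < min_items:
            continue  # too few regrades to call it a pattern
        if s["lenient"] / s["n"] >= rate:
            flags.append((rubric, "leniency"))
        if s["strict"] / s["n"] >= rate:
            flags.append((rubric, "strictness"))
    return flags
```

Scattered one-off disagreements fall below the rate threshold and stay un-flagged, matching the "normal variance" row above.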

How feedback is delivered

Audit results are returned to TAs by the start of the following week's lab session. Feedback is brief, specific, and rubric-focused (not personality-focused). Example:

"Thanks for the work this week. One pattern I want to flag: on the tricuspid valve item, you marked four students as pass when they said 'lets blood through.' That's a not-yet per the R2 rubric — the valve's function is preventing backflow, not permitting flow. Have a look at anchor card E2 and let me know if you have questions before next session."

What the audit is not

It is not a performance review. It is not a basis for TA discipline (unless persistent and unaddressed after coaching). It is the same thing the calibration session is — a mechanism for keeping the rubric the rubric, applied by a team that gets tired.

TA Calibration Protocol · Pillar ④
Escalation Queue for Edge Cases
v0.1 · Page 5 of 6

Some answers are genuinely ambiguous and will not appear on any anchor card the first time they show up. The escalation queue ensures these are decided once, by the coordinator, with consistency across the cohort — not 12 different ways by 12 different TAs.

How escalation works during a practical

  1. TA encounters an ambiguous item. They consult the anchor card. If the card doesn't resolve it, they circle the item on the score sheet — do not assign a P or NY.
  2. The student is told briefly. Sample script: "I'm going to have the coordinator look at this one. You'll see it on your returned score sheet." No further discussion; the practical continues.
  3. End of session: TA delivers all circled items to coordinator. A short note explains what was ambiguous. Examples: "Student wrote 'viseral' but I think they meant 'visceral pleura' — not on card." "Student named the structure but added an incorrect detail — not sure if this passes."
  4. Coordinator decides within 24 hours. The decision is recorded against the student's score and added to the term's escalation log.
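The end-of-day resolution step can feed a machine-readable log directly. A sketch of one way to do it; the field names are illustrative, mirroring the row layout described under "The escalation log" below:

```python
import csv
import os
from dataclasses import dataclass, asdict

@dataclass
class EscalationRecord:
    # Illustrative field names; one instance is one row of the log.
    day: str               # date of the practical
    unit: str              # e.g. "cardiovascular"
    rubric_type: str       # e.g. "R2"
    answer_summary: str    # the student answer, or a summary
    anchor_card_item: str  # related card item, "" if none
    decision: str          # "P" or "NY"
    rationale: str         # one sentence

def append_to_log(path: str, record: EscalationRecord) -> None:
    """Append one resolved escalation to a CSV log; write a header on first use."""
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(asdict(record)))
        if is_new:
            writer.writeheader()
        writer.writerow(asdict(record))
```

A spreadsheet or notebook serves exactly as well; the point is only that every resolution lands in one place with a rationale attached.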

The escalation log

A simple spreadsheet (or notebook) maintained by the coordinator. Each row contains: date, unit, rubric type, the student answer (or summary), the anchor-card item it relates to (if any), the coordinator's decision, and the rationale (one sentence). The log serves three purposes:

What does and does not belong in the queue

Belongs in queue | Does NOT belong (handle in the moment)
Genuinely novel ambiguous answer not anticipated by any anchor card | Answer that matches an anchor card example — just apply the card
Spelling case where the rule is silent | Spelling case the rule covers explicitly
Function statement that's partially correct in a way no anchor case addresses | Function statement that obviously misses the rubric requirement
Disputed dissection technique observation (e.g. unusual approach that worked) | Standard technique observation covered by the 4-point or 5-point checklist
Anything where two TAs at adjacent stations disagree | Anything where the rubric is clear and the TA's hesitation is just speed-related
A target rate

A healthy escalation rate is roughly 2–5% of items in any practical. Below 1% suggests TAs are adjudicating things they shouldn't. Above 8% suggests the anchor cards need to resolve more cases (or the rubric itself has a gap that needs addressing). The coordinator should track this rate per TA and per unit.
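The banding can be written down directly. A sketch; the band edges come straight from this paragraph, while the labels for the in-between ranges are my own wording:

```python
def escalation_health(escalated: int, total_items: int) -> str:
    """Classify an escalation rate against the protocol's bands:
    healthy at 2-5%, suspect below 1% or above 8%."""
    if total_items <= 0:
        raise ValueError("no items graded")
    rate = escalated / total_items
    if rate < 0.01:
        return "too low: TAs may be adjudicating edge cases alone"
    if rate > 0.08:
        return "too high: anchor cards or the rubric likely have gaps"
    if 0.02 <= rate <= 0.05:
        return "healthy"
    return "borderline: worth watching"
```

Run per TA and per unit, as the paragraph prescribes, this turns a gut feeling about escalation volume into a number the coordinator can track week over week.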

TA Calibration Protocol · Coordinator Quick-Reference
The Term at a Glance
v0.1 · Page 6 of 6

One page. The full operational rhythm of the calibration system, designed for the wall above the coordinator's desk.

Term timeline

Week | Calibration activity | Notes
Pre-term (Week 0) | Sample answers prepared for first unit | 20–30 per rubric type, 40/40/20 mix
Week 1 | Calibration session ① with TAs | 90–120 minutes, paid time, before any grading
Weeks 2–N | Spot-check audit ③ each week | 5–10% of each TA's items, returned by next session
Mid-term | Mini-calibration on emerging escalation log items | 30–45 min, only if log volume warrants it
New unit start | Re-run calibration session ① for new unit's anchor cards | Shorter (60 min) if same TA team; full session if any new TAs
Post-term | Review escalation log; promote items to anchor cards for next term | 2–3 hours; produces the v0.x → v0.(x+1) update

Daily rhythm during a practical week

Health metrics to track

Metric | Target / signal
Escalation rate per TA per unit | Target 2–5%; investigate outliers in either direction
Spot-check disagreement rate | Target <5% of regraded items; rising rate signals drift
Same-answer consistency across TAs | Sampled occasionally by giving two TAs the same item; should be near 100%
Student appeal rate | Should fall over time as the system stabilizes; rising appeals signal a rubric problem worth investigating
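The same-answer consistency metric is the simplest of the four to compute once paired gradings exist. A sketch; how items get double-graded (which items, which TA pairs) is left to the coordinator:

```python
def consistency(pairs: list[tuple[str, str]]) -> float:
    """Fraction of double-graded items where both TAs made the same call.

    pairs: (decision_ta1, decision_ta2) for each item graded by two TAs.
    The quick-reference expects this near 1.0 (100%).
    """
    if not pairs:
        raise ValueError("no paired gradings to compare")
    return sum(a == b for a, b in pairs) / len(pairs)
```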

What "the system is working" looks like

If the protocol is failing

Three early warning signs: appeals rising; TAs grading the same session very differently in spot-checks; the escalation log either empty (TAs adjudicating alone) or overflowing (rubric or anchor cards have gaps). Any one of these is a cue to step back and redesign rather than push harder on the existing system.