Using Code Katas to Evaluate Engineers

A small take-home that shows how someone really works, then a conversation about it.

A whiteboard puzzle tells you how someone performs under artificial pressure. A small take-home kata, submitted as their best work, tells you how they actually write code when no one is watching the clock. This play describes how to use a kata to evaluate an engineer's problem-solving and clean, test-driven coding, and how to frame it so it opens a conversation rather than slamming a gate.

When to use this play

Use it when you want to evaluate how a candidate genuinely works, not how they survive a timed quiz. It suits roles where clean, tested, maintainable code is the daily job. Frame the exercise as the candidate's best work and the seed for an in-depth conversation, not as a pass-or-fail filter, and the signal you get back will be far richer.

How to run it

1. Assign a small, self-contained kata. Pick a problem that can be solved as a library with no UI and that has enough depth to reveal craft. The candidate solves it on their own time and submits it as their best work.

2. Require a README and a Git history. Ask for a README with one or two commands to build and run the project, and ask them to use a Git repository with frequent commits. The commit history is part of what you are evaluating; it shows the problem-solving process unfolding, not just the destination.

3. Let them choose the stack. Any language or stack is fine, as long as they tell you what it is and the README explains how to run it. Forcing a stack tests familiarity with your tools, not their underlying craft.

4. Review against the rubric, then talk about it. Score the submission, then bring the candidate in for a conversation about the decisions they made. The conversation is where a good kata earns its keep.
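Steps 1 and 2 are easier to picture with a concrete submission. The sketch below assumes the Roman Numerals kata and Python; the module name `roman` and the functions `to_roman` and `from_roman` are hypothetical, not a required interface. Library code and `unittest` tests live in one file with no UI, so, assuming the file is saved as `roman.py`, the README's run instruction could be the single command `python -m unittest roman`.

```python
"""roman.py: a hypothetical Roman Numerals kata, written as a library with no UI."""
import unittest

# Value/symbol pairs in descending order, including the subtractive forms
# (CM, CD, XC, XL, IX, IV), so a greedy pass yields canonical numerals.
_NUMERALS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(number: int) -> str:
    """Convert an integer in the standard range 1..3999 to a Roman numeral."""
    if not 1 <= number <= 3999:
        raise ValueError(f"out of range: {number}")
    parts = []
    for value, symbol in _NUMERALS:
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)


def from_roman(numeral: str) -> int:
    """Convert a canonical Roman numeral back to an integer."""
    total, index = 0, 0
    for value, symbol in _NUMERALS:
        while numeral.startswith(symbol, index):
            total += value
            index += len(symbol)
    # Reject leftover characters, empty input, and non-canonical forms
    # such as "IIII" by requiring an exact round trip.
    if index != len(numeral) or total == 0 or to_roman(total) != numeral:
        raise ValueError(f"not a canonical numeral: {numeral!r}")
    return total


class RomanNumeralTests(unittest.TestCase):
    def test_simple_and_subtractive_forms(self):
        self.assertEqual(to_roman(3), "III")
        self.assertEqual(to_roman(4), "IV")
        self.assertEqual(to_roman(1994), "MCMXCIV")

    def test_round_trip_over_whole_range(self):
        for n in range(1, 4000):
            self.assertEqual(from_roman(to_roman(n)), n)

    def test_rejects_invalid_input(self):
        with self.assertRaises(ValueError):
            to_roman(0)
        with self.assertRaises(ValueError):
            from_roman("IIII")
```

Rejecting non-canonical input by round-tripping through `to_roman` is one design choice among several: it keeps the parser short at the cost of a second conversion, which is the kind of trade-off the follow-up conversation can probe.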

The evaluation rubric

Score each submission on:

Ease of building and running tests — how quickly you can get the project running and the tests green from the README alone.
Project organization — whether the layout makes the code easy to understand.
Naming and clarity — whether names communicate intent.
Clean code at every level — both at the module and class level and inside individual functions and methods.
Minimal duplication — whether the code stays DRY without contorting itself.
Originality — that the work is genuinely theirs and not plagiarized.

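To keep scoring consistent across submissions, it can help to record the rubric in a small shared structure. The Python sketch below is one hypothetical way to do that: the `Scorecard` class, the 1 to 5 scale, and the `discussion_points` helper are assumptions, not part of this play. It deliberately has no pass/fail verdict; its output is a list of the weakest areas to raise in the follow-up conversation.

```python
from dataclasses import dataclass, field

# The six rubric criteria from this play.
CRITERIA = (
    "Ease of building and running tests",
    "Project organization",
    "Naming and clarity",
    "Clean code at every level",
    "Minimal duplication",
    "Originality",
)


@dataclass
class CriterionScore:
    score: int  # assumed scale: 1 (weak) to 5 (strong)
    note: str   # evidence from the code, plus a question for the conversation


@dataclass
class Scorecard:
    candidate: str
    scores: dict[str, CriterionScore] = field(default_factory=dict)

    def rate(self, criterion: str, score: int, note: str) -> None:
        """Record a score for one rubric criterion."""
        if criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {criterion}")
        if not 1 <= score <= 5:
            raise ValueError("score must be between 1 and 5")
        self.scores[criterion] = CriterionScore(score, note)

    def missing(self) -> list[str]:
        """Criteria nobody has scored yet."""
        return [c for c in CRITERIA if c not in self.scores]

    def discussion_points(self, limit: int = 3) -> list[str]:
        """Lowest-scored criteria first, as prompts for the follow-up conversation."""
        ranked = sorted(self.scores.items(), key=lambda item: item[1].score)
        return [f"{name}: {s.note}" for name, s in ranked[:limit]]


card = Scorecard("candidate-a")
card.rate("Minimal duplication", 2, "Same validation copied into three modules; ask why")
card.rate("Naming and clarity", 4, "Names reveal intent")
for point in card.discussion_points():
    print(point)
```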
Good library-only katas

These have enough depth to reveal craft and need no UI:

Mars Rover — model a rover on a grid responding to movement commands.
Bowling Scoring — score a game, including the strike and spare edge cases.
Roman Numerals — convert to and from Arabic numerals.
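For a sense of the depth these katas have, here is a minimal sketch of the Bowling Scoring kata in Python with its tests alongside. It assumes standard ten-pin scoring and a valid, complete game passed as a list of pins knocked down per roll; the function name `score` and that input shape are illustrative choices, not a required interface. Input validation is left out to keep the sketch short.

```python
"""A minimal Bowling Scoring kata sketch (standard ten-pin scoring)."""
import unittest


def score(rolls: list[int]) -> int:
    """Score a complete, valid ten-frame game from pins knocked down per roll."""
    total = 0
    i = 0  # index of the first roll in the current frame
    for _frame in range(10):
        if rolls[i] == 10:  # strike: 10 plus the next two rolls
            total += 10 + rolls[i + 1] + rolls[i + 2]
            i += 1
        elif rolls[i] + rolls[i + 1] == 10:  # spare: 10 plus the next roll
            total += 10 + rolls[i + 2]
            i += 2
        else:  # open frame: just the pins knocked down
            total += rolls[i] + rolls[i + 1]
            i += 2
    return total


class BowlingTests(unittest.TestCase):
    def test_gutter_game(self):
        self.assertEqual(score([0] * 20), 0)

    def test_all_ones(self):
        self.assertEqual(score([1] * 20), 20)

    def test_one_spare(self):
        self.assertEqual(score([5, 5, 3] + [0] * 17), 16)

    def test_one_strike(self):
        self.assertEqual(score([10, 3, 4] + [0] * 16), 24)

    def test_perfect_game(self):
        self.assertEqual(score([10] * 12), 300)

    def test_spare_in_tenth_frame_earns_bonus_roll(self):
        self.assertEqual(score([0] * 18 + [5, 5, 7]), 17)
```

Looping over exactly ten frames, rather than over every roll, is what keeps the tenth frame's bonus rolls from being scored as extra frames. A choice like that is worth asking the candidate to explain.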

Common traps

Treating it as a pass-or-fail gate. The kata's value is the conversation it enables. Reducing it to a binary throws away most of the signal.
Mandating a stack. Forcing your language tests tool familiarity, which is easy to learn, instead of craft, which is what you actually want to assess.
Ignoring the commit history. A single squashed commit hides the problem-solving process. The incremental history is where you see how they think.
Scoring only correctness. A solution that passes its tests but is unreadable, undocumented, or hard to run fails the parts of the rubric that predict day-to-day collaboration.
Skipping the follow-up conversation. Without it, you are guessing at the reasoning behind the code instead of asking the person who wrote it.

Signals it's working

You can clone the repo, read the README, and get the tests passing in a couple of commands.
The commit history reads like a thought process, with small, related changes.
The follow-up conversation goes deep because the candidate has real decisions to defend and explain.
You come away understanding how the person works, not just whether they produced a correct answer.