Use this when you feel yourself bouncing between theories instead of narrowing the problem.
1. Reproduce the issue clearly
- What exact input or sequence causes the problem?
- Can you reproduce it locally, in staging, or only in production?
- What should have happened instead?
2. Narrow the failure boundary
- Is the problem in the client, API, background job, database, or integration edge?
- What is the last point where behavior still looks correct?
- What changed recently near that boundary?
3. Inspect evidence before guessing
- logs
- error messages
- recent deploys
- payloads or request params
- feature flags or configuration
Do not collect everything. Collect the evidence that changes your next decision.
4. Test one hypothesis at a time
- What is the most likely explanation right now?
- What single check would strengthen or kill that theory?
- What did you learn from the result?
If you need help at this point, ask a better question instead of dumping the whole situation on someone else. Use How to Ask Better Technical Questions at Work.
5. Verify the fix completely
- Did the original failure stop happening?
- Did you test the nearby paths that could still break?
- Does the fix introduce a new risk somewhere else?
6. Leave behind a better default
After the issue is fixed, ask:
- should there be a test for this?
- should the logging be clearer?
- should the rollout or alerting be improved?
- should the team document this pattern?
Strong debugging is not just finding the bug. It is reducing the chance of the same confusion happening again.