Building Systematic Error Recovery Blueprints in AI Agent Skills
When a command fails during an automated run (for example, a linting check fails or a Docker container doesn't boot), an untrained coding agent might panic, try random edits, or give up entirely.
To make our agents truly autonomous, we must teach them how to troubleshoot. By writing a Systematic Error Recovery Skill, we specify exactly how the agent should extract stack traces, read log files, and apply targeted fixes.
The Error Recovery Checklist
A robust recovery skill provides a systematic path of action when a command fails:
# File: .agents/skills/error-recovery.md
# Skill: Systematic Debugging and Exception Recovery
Use this skill whenever a terminal command or compilation fails.
## Recovery Protocol
When an error occurs, you must execute these steps in order:
### 1. Locate the Root Exception
* Do not guess the error. Print the tail of the log file or terminal output.
* **Command**: `tail -n 50 error.log` or read the terminal output buffer.
* Identify the exact line number and error name (e.g., `ModuleNotFoundError`, `TypeError`).
### 2. Check Git History
* Run `git diff` to see your uncommitted changes, or `git diff HEAD~1` if the latest edits are already committed.
* Determine if the error was introduced by the recent edits.
### 3. Verify Imports & Configurations
* Ensure all imported files exist and are referenced using the correct paths.
* Check if any required environment variable is missing from `.env`.
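Steps 1 and 3 of the protocol above can be sketched in Python, as a rough illustration of what "locate the root exception" and "check `.env`" might mean in practice. This is a minimal sketch under stated assumptions: the log contains Python-style tracebacks, `.env` uses plain `KEY=value` lines, and the function names (`locate_root_exception`, `missing_env_vars`) are hypothetical helpers, not part of any agent framework.

```python
import re
from typing import Dict, List, Optional

# Matches the final line of a Python traceback, e.g. "TypeError: bad operand".
EXCEPTION_RE = re.compile(
    r"^(?P<name>[A-Za-z_][\w.]*(?:Error|Exception)): ?(?P<message>.*)$"
)
# Matches traceback frame lines, e.g. '  File "app.py", line 12, in main'.
FRAME_RE = re.compile(r'^\s*File "(?P<path>[^"]+)", line (?P<line>\d+)')


def tail(text: str, n: int = 50) -> List[str]:
    """Return the last n lines of text, similar to `tail -n 50`."""
    return text.splitlines()[-n:]


def locate_root_exception(log_text: str, n: int = 50) -> Optional[Dict]:
    """Step 1: report the last exception in the log tail and its innermost frame."""
    last_frame = (None, None)
    result = None
    for line in tail(log_text, n):
        frame = FRAME_RE.match(line)
        if frame:
            # Later frames are deeper in the call stack, so keep overwriting.
            last_frame = (frame["path"], int(frame["line"]))
            continue
        exc = EXCEPTION_RE.match(line)
        if exc:
            result = {
                "error": exc["name"],
                "message": exc["message"],
                "file": last_frame[0],
                "line": last_frame[1],
            }
    return result


def missing_env_vars(env_text: str, required: List[str]) -> List[str]:
    """Step 3: list required variables with no KEY=value entry in the .env text."""
    defined = set()
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments (assumed to start with '#').
        if line and not line.startswith("#") and "=" in line:
            defined.add(line.split("=", 1)[0].strip())
    return [name for name in required if name not in defined]
```

An agent or wrapper script could pass the contents of `error.log` to `locate_root_exception` and record the returned error name, file, and line number before touching any code. Other languages would need their own patterns for stack traces.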
Configuration: Enforcement in the Persona
We bind this recovery protocol to the core agent persona:
# File: .agents/agent.md
# Core Agent Persona
## Persona
You are a senior system engineer specialized in high-availability and fault-tolerant software.
## Error Handling Rule
* In the event of a compilation or command failure, you are forbidden from guessing a fix.
* You must invoke `.agents/skills/error-recovery.md` to trace the error logs and document the exact cause before making any code corrections.
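Because the persona refers to the skill by file path, a broken or renamed path would silently disable the rule. The sketch below shows one way to check that every skill path quoted in the persona file actually exists. It assumes skills are referenced as backtick-quoted paths under `.agents/skills/`, as in the example above; `referenced_skills` and `missing_skills` are hypothetical helper names.

```python
import re
from pathlib import Path
from typing import List

# Backtick-quoted paths under .agents/skills/ that end in .md (assumed convention).
SKILL_REF_RE = re.compile(r"`(\.agents/skills/[\w./-]+\.md)`")


def referenced_skills(persona_text: str) -> List[str]:
    """Return every skill path quoted in the persona text, in order of appearance."""
    return SKILL_REF_RE.findall(persona_text)


def missing_skills(persona_text: str, repo_root: Path) -> List[str]:
    """Return the referenced skill paths that do not exist under repo_root."""
    return [
        path
        for path in referenced_skills(persona_text)
        if not (repo_root / path).is_file()
    ]
```

Running `missing_skills(Path(".agents/agent.md").read_text(), Path("."))` from the repository root in a pre-commit hook or CI step would flag a persona that points at a skill file nobody checked in.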
Why This Matters for Autonomous Coding
Autonomous agents that follow a structured debugging skill tend to write better fixes because they read the logs first instead of editing by trial and error. The result is a cleaner commit history and fewer broken builds.
Conclusion
Documenting how your AI assistant should debug errors makes it a far more reliable co-pilot. Check in an error recovery skill today to streamline your automation loops.