Building Systematic Error Recovery Blueprints in AI Agent Skills


When a command fails during an automated run (for example, a linting check fails or a Docker container doesn't boot), an untrained coding agent might panic, try random edits, or give up entirely.

To make our agents truly autonomous, we must teach them how to troubleshoot. By writing a Systematic Error Recovery Skill, we specify exactly how the agent should extract stack traces, read log files, and apply targeted fixes.


๐Ÿ—๏ธ The Error Recovery Checklist

A robust recovery skill gives the agent a fixed, systematic course of action whenever a command fails:

```markdown
# File: .agents/skills/error-recovery.md
# Skill: Systematic Debugging and Exception Recovery

Use this skill whenever a terminal command or compilation fails.

## Recovery Protocol

When an error occurs, you must execute these steps in order:

### 1. Locate the Root Exception
* Do not guess the error. Print the tail of the log file or terminal output.
  * **Command**: `tail -n 50 error.log` or read the terminal output buffer.
* Identify the exact file, line number, and exception type (e.g., `ModuleNotFoundError`, `TypeError`).

### 2. Check Git History
* Run `git diff HEAD` to see all uncommitted changes, both staged and unstaged.
* Determine whether the error was introduced by the recent edits.

### 3. Verify Imports & Configurations
* Ensure all imported files exist and are referenced using the correct paths.
* Check whether any required environment variable is missing from `.env`.
```
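The three protocol steps can also be scripted, so the same evidence is collected the same way on every failure. The Python below is a minimal sketch and not part of the skill file itself: the helper names `tail`, `find_root_exception`, `changed_files`, and `missing_env_vars` are hypothetical, the traceback parsing assumes Python-style tracebacks whose final line names an exception ending in `Error` or `Exception`, and the `.env` parser only handles simple `KEY=value` lines (optionally prefixed with `export`).

```python
from __future__ import annotations

import re
import subprocess
from collections import deque
from pathlib import Path

# Matches a traceback frame line such as:  File "app/main.py", line 12, in <module>
FRAME_RE = re.compile(r'^\s*File "(?P<path>[^"]+)", line (?P<line>\d+)')
# Matches a final traceback line such as: ModuleNotFoundError: No module named 'x'
EXCEPTION_RE = re.compile(r"^(?P<name>[A-Za-z_][\w.]*(?:Error|Exception)):\s*(?P<message>.*)$")


def tail(path: str, n: int = 50) -> list[str]:
    """Step 1: return the last n lines of a log file, like `tail -n 50`."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]


def find_root_exception(lines: list[str]) -> dict | None:
    """Step 1: report the last exception and the innermost frame of its traceback."""
    last_frame = None
    root = None
    for line in lines:
        if line.startswith("Traceback"):
            last_frame = None  # a new traceback starts; forget frames from older ones
            continue
        frame = FRAME_RE.match(line)
        if frame:
            last_frame = (frame["path"], int(frame["line"]))
            continue
        exc = EXCEPTION_RE.match(line.strip())
        if exc:
            root = {"name": exc["name"], "message": exc["message"], "location": last_frame}
    return root


def changed_files() -> list[str]:
    """Step 2: paths with uncommitted changes (staged or unstaged) relative to HEAD."""
    out = subprocess.run(
        ["git", "diff", "--name-only", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return [p for p in out.stdout.splitlines() if p]


def missing_env_vars(required: list[str], env_file: str = ".env") -> list[str]:
    """Step 3: names from `required` that are not assigned in the .env file."""
    defined: set[str] = set()
    path = Path(env_file)
    if path.exists():
        for raw in path.read_text(encoding="utf-8").splitlines():
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key = line.split("=", 1)[0].strip()
            if key.startswith("export "):
                key = key[len("export "):].strip()
            defined.add(key)
    return [name for name in required if name not in defined]
```

For example, `find_root_exception(tail("error.log"))` yields the exception name, its message, and the innermost `(file, line)` frame of the last traceback in the log, which is the evidence step 1 asks the agent to cite before it cross-checks `changed_files()` against that location.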

๐Ÿ“ Configuration: Enforcement in the Persona

We bind this recovery protocol to the core agent persona:

```markdown
# File: .agents/agent.md
# Core Agent Persona

## Persona
You are a senior systems engineer specializing in high-availability, fault-tolerant software.

## Error Handling Rule
* If a compilation or command fails, you are forbidden from guessing a fix.
* You must invoke `.agents/skills/error-recovery.md` to trace the error logs and document the exact root cause before making any code corrections.
```
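The persona rule relies on the model obeying its instructions. If your agent runs inside a harness you control, the same rule could also be enforced in code. The sketch below is hypothetical: `RecoveryGate`, `GuessedFixError`, `record_cause`, and `apply_fix` are illustrative names, not part of any agent framework. After a command exits non-zero, edits are refused until a root cause has been recorded, and only a successful re-run of that same command closes the failure.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass, field


class GuessedFixError(RuntimeError):
    """Raised when a fix is attempted before the failure has a documented cause."""


@dataclass
class RecoveryGate:
    """Hypothetical guard for the 'no guessing' rule in .agents/agent.md."""

    failed_command: list[str] | None = None
    root_cause: str | None = None
    log: list[str] = field(default_factory=list)

    def run(self, cmd: list[str]) -> subprocess.CompletedProcess:
        """Run a command; a non-zero exit opens a failure that needs a diagnosis."""
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            self.failed_command = cmd
            self.root_cause = None  # a new failure invalidates any earlier diagnosis
            self.log.append(f"FAILED: {' '.join(cmd)} (exit {result.returncode})")
        elif cmd == self.failed_command:
            # The command that failed now passes, so the failure is closed.
            self.failed_command = None
            self.root_cause = None
        return result

    def record_cause(self, cause: str) -> None:
        """Document the root cause found by following error-recovery.md."""
        if not cause.strip():
            raise ValueError("root cause must be a non-empty description")
        self.root_cause = cause
        self.log.append(f"CAUSE: {cause}")

    def apply_fix(self, description: str) -> None:
        """Allow an edit only if no failure is open or its cause is documented."""
        if self.failed_command is not None and self.root_cause is None:
            raise GuessedFixError("trace the logs and record the root cause before editing")
        self.log.append(f"FIX: {description}")
```

Keying the reset to the same command means an unrelated passing command (say, `ls`) cannot silently clear the failure; only re-running the command that failed, and seeing it pass, closes the loop.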

🚀 Why This Matters for Autonomous Coding

Autonomous agents that follow a structured debugging skill tend to produce more targeted fixes, because they read the logs before editing instead of iterating by trial and error. The result is a cleaner commit history and fewer changes that break the build.


💬 Conclusion

Documenting how your AI assistant should debug errors makes it a far more reliable co-pilot. Check in an error recovery skill today to streamline your automation loops.