Post-Mortem: How a Hermes Agent Background Loop Burned 353M Tokens in 20 Hours (and How to Protect Yourself)

While our team was asleep, a background task on a development machine running Hermes Agent spent 20 hours and 25 minutes in a silent loop.

It made 5,501 API calls, consumed 353.4 million tokens (352M input, 1.39M output), and pushed our weekly Kimi API quota to 98%.

This was not an active developer session. No one was at the keyboard, and the agent was not working on a user-directed project. The culprit was a routine background cron job, the Curator, which got stuck in an error-retry loop.

This post-mortem details the mechanical failures that allowed this loop to run unchecked and provides concrete steps to secure your Hermes setup against runaway API bills.

Anatomy of the Loop: Session `20260606_155226_xxxxxx`

Hermes uses a background daemon called the Curator to index terminal history, extract developer skills, and update configuration profiles. By default, it runs every 30 minutes.

On June 6 at 15:52, the Curator started its routine run. It hit a read error - likely triggered by an oversized skills file.

Instead of failing gracefully, the agent attempted self-correction:

1. It analyzed the file-read error.
2. It called a tool to locate or modify the target file.
3. The file operation failed again.
4. It analyzed the new failure and retried.

This cycle repeated for over 20 hours. Because the job ran as a detached background worker, there was no interactive terminal output, and the user-facing /stop command was unavailable.

By the time we checked the logs, the session profile looked like this:

Metric	Value
Model	`kimi-k2.6` (via `kimi-coding` provider)
Duration	20 hours, 25 minutes
Total API Calls	5,501
Input Tokens	352,000,000
Output Tokens	1,390,000 (1.39 million)
Turn Completion Events	0
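
To put the table in per-call terms, here is a small back-of-the-envelope calculation. It uses only the figures reported above; the variable names are ours.

```python
# Figures taken from the session table above.
duration_s = 20 * 3600 + 25 * 60   # 20 h 25 min = 73,500 s
api_calls = 5_501
input_tokens = 352_000_000
output_tokens = 1_390_000

seconds_per_call = duration_s / api_calls
input_per_call = input_tokens / api_calls
output_per_call = output_tokens / api_calls

print(f"{seconds_per_call:.1f} s per call")                 # 13.4 s per call
print(f"{input_per_call:,.0f} input tokens per call")       # 63,988 input tokens per call
print(f"{output_per_call:,.0f} output tokens per call")     # 253 output tokens per call
```

Each call sent roughly 64,000 input tokens and produced only a few hundred output tokens. That lopsided ratio would be consistent with a large context being re-sent on every retry, although the logs summarized here do not confirm it.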

Why Did the Safety Rails Fail?

This loop was made possible by a combination of two configuration workarounds commonly used by developers.

1. The Hardcoded Loop Limit in `agent/curator.py`

Inside the Curator agent’s initialization script (agent/curator.py), the loop boundary was set to a massive value:

# agent/curator.py (around line 1721)
agent = AIAgent(
    name="Curator",
    max_iterations=9999,  # The loop threshold
    ...
)

Setting max_iterations=9999 is a shortcut to ensure complex agents do not stop midway through a long task. However, in an unsupervised background process, 9,999 iterations is functionally unlimited. At 15 seconds per call, the agent had enough headroom to loop continuously for almost 42 hours, about twice as long as this incident actually ran.
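
The headroom figure is easy to check. The sketch below assumes one API call per iteration, which the source does not state explicitly, and uses the 15-second figure from the paragraph above.

```python
MAX_ITERATIONS = 9_999
ASSUMED_SECONDS_PER_CALL = 15      # figure used in the text above
CALLS_MADE = 5_501                 # from the session table

# Total wall-clock time the limit would allow before stopping the agent.
hours_of_headroom = MAX_ITERATIONS * ASSUMED_SECONDS_PER_CALL / 3600

# How much of the iteration limit the runaway session had consumed
# (assumption: one API call per iteration).
fraction_used = CALLS_MADE / MAX_ITERATIONS

print(f"{hours_of_headroom:.1f} hours of headroom")   # 41.7 hours of headroom
print(f"{fraction_used:.0%} of the limit used")       # 55% of the limit used
```

Under those assumptions, the session had used only a little over half of its iteration limit when we caught it.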

2. The Budget Override Chain Reaction

To prevent active terminal sessions from cutting off mid-code, we had previously removed max_session_tokens from our YAML profiles (located in ~/.hermes/config.yaml and profile YAMLs under ~/.hermes/profiles/).

However, Hermes has hardcoded fallbacks inside hermes_cli/config.py:

# hermes_cli/config.py (around line 748)
max_session_tokens = 500000
max_session_cost_usd = 10.0

When the YAML profiles did not define a budget, the engine fell back to these hardcoded values (500K tokens), causing active terminal coding sessions to crash with “Session budget exceeded” errors. To resolve this, we patched the fallback constants in hermes_cli/config.py directly:

-max_session_tokens = 500000
-max_session_cost_usd = 10.0
+max_session_tokens = 0       # 0 = unlimited
+max_session_cost_usd = 0.0

This unblocked our interactive sessions. But because the background Curator cron job also relied on these fallback constants, we unknowingly removed the last safety rail stopping background workers.
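
The coupling is easier to see in a simplified model. The sketch below is not Hermes's actual resolution logic, which we have not reproduced here. `resolve_token_budget` and the two profile dictionaries are hypothetical stand-ins for "look in the profile, otherwise use the fallback constant".

```python
from typing import Optional


def resolve_token_budget(profile: dict, fallback: int) -> Optional[int]:
    """Return the effective token cap, or None when the session is uncapped."""
    value = profile.get("max_session_tokens", fallback)
    return None if value == 0 else value  # 0 is treated as "unlimited"


# Neither profile defines max_session_tokens, so both depend on the fallback.
interactive_profile: dict = {}
curator_profile: dict = {}
profiles = (interactive_profile, curator_profile)

# Before the patch: both kinds of session inherit the 500K cap.
before = [resolve_token_budget(p, 500_000) for p in profiles]

# After the patch: both become uncapped, including the unattended Curator.
after = [resolve_token_budget(p, 0) for p in profiles]

print(before)  # [500000, 500000]
print(after)   # [None, None]
```

A single shared fallback means that any change made for the benefit of interactive sessions silently applies to background workers too.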

A Guide to Securing Your Hermes Environment

To prevent background runaways while keeping your interactive terminal sessions unrestricted, apply the following controls:

1. Restrict Background Workers Separately from User Sessions

Do not rely on global session limits to protect background daemons. Background tasks must have their own execution budgets configured in the YAML configs and enforced in code.

First, define a max_api_calls key in your global configuration (~/.hermes/config.yaml):

# Add this to your global config
curator:
  max_api_calls: 100   # Enforce termination after 100 API calls

Second, update the fallback schema definitions in hermes_cli/config.py to register this setting:

# In hermes_cli/config.py (add to default curator schema dictionary)
"curator": {
    "max_api_calls": 100,
    ...
}

Third, modify the Curator’s instantiation in agent/curator.py to read this parameter instead of the hardcoded 9999 value:

# In agent/curator.py
def get_max_api_calls():
    # Read custom config value, default to 100 if undefined
    return config.get("curator.max_api_calls", 100)

agent = AIAgent(
    name="Curator",
    max_iterations=get_max_api_calls(),  # Replaced 9999
    ...
)

If the Curator hits a file-read loop or API error cascade, it will stop after 100 iterations instead of 9,999, capping the token cost of any single run.
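
If you would rather enforce the cap independently of `max_iterations`, a small guard object is one option. This is a minimal sketch: `CallBudget`, `BudgetExceeded`, and `run_worker` are hypothetical names of ours, not part of Hermes, and `step` stands in for whatever performs one API call.

```python
from typing import Callable


class BudgetExceeded(RuntimeError):
    """Raised when a background worker tries to exceed its call budget."""


class CallBudget:
    def __init__(self, max_calls: int) -> None:
        # Deliberately reject 0 or negative values: for an unattended worker,
        # "0 = unlimited" is exactly the setting that caused this incident.
        if max_calls <= 0:
            raise ValueError("background budgets must be a positive number")
        self.max_calls = max_calls
        self.used = 0

    def charge(self) -> None:
        """Account for one API call, or raise if the budget is spent."""
        if self.used >= self.max_calls:
            raise BudgetExceeded(f"call budget of {self.max_calls} exhausted")
        self.used += 1


def run_worker(step: Callable[[], bool], budget: CallBudget) -> int:
    """Call step() until it reports success; return the number of calls used."""
    while True:
        budget.charge()          # charge before every call, success or failure
        if step():
            return budget.used
```

With `CallBudget(100)`, a `step` that always fails raises `BudgetExceeded` on the 101st attempt, after exactly 100 calls have been charged.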

2. Wrap Background Daemons in OS-Level Timeouts

A cron job intended to index logs and update developer skills should never run indefinitely. Wrap the cron execution command in a process-level timeout.

If you invoke Hermes background processes via system cron, modify the crontab entries to use the timeout utility:

# Force kill the process if it runs longer than 5 minutes (300 seconds)
*/30 * * * * timeout 300s hermes curator run

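If the agent gets stuck in a loop, timeout sends it SIGTERM after 5 minutes, which caps the token drain of any single run. If the process might ignore SIGTERM, GNU timeout's -k option follows up with SIGKILL after a grace period.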
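
If the Curator is launched from a Python wrapper rather than directly from cron, the standard library offers a similar guard. The sketch below uses `subprocess.run` with its `timeout` parameter; `run_with_timeout` is a hypothetical helper name of ours.

```python
import subprocess
from typing import Sequence


def run_with_timeout(cmd: Sequence[str], timeout_s: float) -> bool:
    """Run cmd; return True if it finished in time, False if it was killed."""
    try:
        subprocess.run(list(cmd), timeout=timeout_s, check=False)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run() kills and reaps the child before re-raising.
        # Note: only the direct child is killed; any processes it spawned
        # may survive, so this is weaker than killing a whole process group.
        return False
```

A wrapper could then call `run_with_timeout(["hermes", "curator", "run"], 300)` and log or alert when it returns `False`.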

3. Audit Background Tasks Regularly

Use the following commands to monitor active daemons and verify background workers:

List registered cron triggers:
```
hermes cron list
```
Check the active status of background tasks:
```
hermes curator status
```
Test background execution without making active edits:
```
hermes curator run --dry-run
```

If hermes curator status shows a task has been active for more than 10 minutes, inspect the log output immediately:

tail -n 100 ~/.hermes/logs/curator/curator.log

4. Create a Loop Detection Skill

Document the post-mortem parameters in a skill file (e.g. cronjob-safety) and save a detailed reference file to references/curator-incident-2026-06-06.md. The goal is that when Hermes reviews its own operational skills, it can recognize loop patterns, inspect its running processes, and self-terminate or restart the gateway if it finds anomalies.
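
A skill file is guidance for the model, so it can help to pair it with a mechanical check. The sketch below shows one possible heuristic: fingerprint each failed tool call and stop when the same fingerprint keeps recurring. `RepeatDetector` and its parameters are hypothetical, and the normalization rule is an assumption about what repeated errors look like, not something taken from Hermes logs.

```python
import re
from collections import Counter, deque
from typing import Deque, Tuple


class RepeatDetector:
    """Flag a loop when the same failure recurs within a sliding window."""

    def __init__(self, window: int = 20, threshold: int = 5) -> None:
        self.threshold = threshold
        self.recent: Deque[Tuple[str, str]] = deque(maxlen=window)

    @staticmethod
    def _normalize(error: str) -> str:
        # Collapse digit runs so timestamps, offsets, and byte counts
        # do not make otherwise identical errors look different.
        return re.sub(r"\d+", "#", error.lower()).strip()

    def observe(self, tool: str, error: str) -> bool:
        """Record one failed tool call; return True if it looks like a loop."""
        fingerprint = (tool, self._normalize(error))
        self.recent.append(fingerprint)
        return Counter(self.recent)[fingerprint] >= self.threshold
```

A worker would call `observe()` after each failed tool call and abort the run the first time it returns `True`, rather than asking the model to analyze the same failure again.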

5. Set Spend Limits at the Provider Level

Configuration files can be overridden, and code updates can be rolled back during version upgrades. The final line of defense must be at the API provider level:

- Use distinct API keys for background automation vs. interactive desktop coding.
- Set hard monthly spending limits on the provider dashboard (e.g., Kimi, OpenAI, or OpenRouter) for the automation key.
- Enable billing alert notifications to receive immediate email or SMS warnings when spend spikes.
