Tag: performance

    How to Fix OpenClaw Memory

    When you implement memory optimization, you ensure your agents remain responsive and cost-effective. OpenClaw memory directly impacts token usage and recall accuracy. Ignoring memory bloat leads to degraded performance over time. That’s why optimizing OpenClaw memory should be a priority for any production deployment.

    🛠 How to Fix OpenClaw Memory

    OpenClaw agents are powerful, but as conversations grow, memory bloat can slow them down, increase token costs, and cause context loss. If you’ve noticed your agents forgetting important details or responses becoming sluggish, it’s time to optimize your OpenClaw memory configuration.

    In this guide, we’ll cover proven strategies to fix memory issues, including the QMD backend, LEARNINGS.md organization, heartbeat tuning, and system prompt audits. Understanding OpenClaw memory is essential for scaling efficiently. By the end, you’ll have a clear action plan to keep your OpenClaw agents running fast, efficient, and reliable.

    🛠 1. Enable the QMD Backend for Fast Retrieval

    The default memory system can become slow with large logs. The QMD (Query Module for Documents) backend provides fast, indexed search across all memory files. It’s essential for scaling OpenClaw without performance degradation.

    📊 Installation

    QMD is typically installed as a skill or binary. Verify it’s available:

    which qmd

    If not found, install via ClawHub:

    npx clawhub install qmd

    📊 Configuration

    Edit openclaw.json to set the memory backend:

    "memory": {
      "backend": "qmd",
      "qmd": {
        "includeDefaultMemory": true
      }
    }

    Restart the gateway afterwards. All agents will now use QMD for memory storage and retrieval.

    📊 Benefits

    • Instant search across daily logs and MEMORY.md
    • Semantic retrieval (not just keyword matching)
    • Citations with source file references
    • Scalable to millions of messages
    • Reduces memory bloat significantly
    Note: QMD requires a valid model provider (e.g., OpenRouter, OpenAI) to generate embeddings. Ensure your providers are configured correctly.

    🛠 2. Session Pruning and Cache TTL

    Session pruning removes outdated tool output from the active context right before each LLM call, reducing token burn without altering on-disk history. This is crucial for long-running agents or those with tight context limits.

    📊 Cache TTL Configuration

    "agents": {
      "defaults": {
        "contextPruning": {
          "mode": "cache-ttl",
          "ttl": "4h",
          "keepLastAssistants": 3
        }
      }
    }

    📊 How It Works

    • Mode: "cache-ttl" aligns with Anthropic caching intervals
    • TTL: 4-hour window retains tool results for four hours before pruning
    • keepLastAssistants: Preserves last 3 assistant messages for continuity
    • Scope: Only toolResult blocks are trimmed; user/assistant messages stay intact
    • Images: Tool results containing images are never pruned

    This setting can cut token usage by 30-50% in busy agents, directly improving performance and reducing costs.
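    To make the mechanics concrete, here is a small Python sketch of cache-TTL pruning. It models the behavior described above and is not OpenClaw's actual implementation; the message shape (`role`, `ts`, `hasImage`) is assumed for illustration.

```python
import time

def prune_context(messages, ttl_seconds=4 * 3600, keep_last_assistants=3, now=None):
    """Illustrative cache-TTL pruning (not OpenClaw's real code):
    expired toolResult blocks are dropped from the active context,
    while user/assistant messages always stay intact."""
    now = time.time() if now is None else now
    # Everything from the Nth-most-recent assistant turn onward is preserved,
    # mirroring the keepLastAssistants continuity guarantee.
    a_idx = [i for i, m in enumerate(messages) if m["role"] == "assistant"]
    cutoff = a_idx[-keep_last_assistants] if len(a_idx) >= keep_last_assistants else 0

    pruned = []
    for i, m in enumerate(messages):
        if m["role"] != "toolResult" or i >= cutoff:
            pruned.append(m)  # non-tool messages and recent turns stay
        elif m.get("hasImage") or now - m["ts"] <= ttl_seconds:
            pruned.append(m)  # images are never pruned; fresh results are kept
        # else: the tool result aged out of the TTL window and is dropped
    return pruned
```

    Because only stale toolResult entries are removed, on-disk history is untouched; the savings come entirely from what is sent to the LLM on the next call.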

    🛠 3. Organize Rules in LEARNINGS.md

    Your system prompts and agent rules should live in a dedicated LEARNINGS.md file rather than buried in MEMORY.md. This separation keeps operational knowledge discoverable and reduces context crowding.

    Include:

    • SSH WP-CLI permission fixes
    • Provider configuration pitfalls
    • Model fallback strategies
    • Agent-specific quirks and workarounds

    Reference LEARNINGS.md from AGENTS.md so every agent reads it on boot. This ensures critical procedures are always in context.
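    For example, AGENTS.md might carry a short pointer like the following (the exact wording is illustrative, not a required format):

```markdown
## Memory hygiene

- On boot, read LEARNINGS.md before acting on any task.
- When you find a new fix or workaround, append it to LEARNINGS.md
  under the matching section (SSH, providers, models, agent quirks).
```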

    🛠 4. Heartbeat Tuning for Efficiency

    Heartbeats are periodic checks that keep agents responsive. Optimizing them reduces unnecessary LLM calls and token burn.

    📊 Use lightContext and Cheap Models

    Configure heartbeat to use a lightweight model and minimal context:

    "heartbeat": {
      "model": "openrouter/minimax/minimax-m2.1",
      "maxTokens": 500,
      "lightContext": true
    }

    📊 Active Hours Only

    Schedule heartbeats to run only during your working hours (e.g., 8 AM to 10 PM) to avoid nighttime token waste.
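    As a sketch, scheduled heartbeats could be expressed alongside the settings above. Note that `activeHours`, `start`, and `end` are hypothetical key names used only for illustration; check your OpenClaw version's heartbeat schema for the real option before copying this.

```json
"heartbeat": {
  "model": "openrouter/minimax/minimax-m2.1",
  "maxTokens": 500,
  "lightContext": true,
  "activeHours": { "start": "08:00", "end": "22:00" }
}
```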

    These tweaks can reduce heartbeat token consumption by over 80% while maintaining agent availability.

    🛠 5. System Prompt Audit and Cleanup

    Your system prompt files (AGENTS.md, SOUL.md, USER.md) should be concise and free of redundancy. Each file has a single responsibility:

    • AGENTS.md: Workspace procedures and memory hygiene
    • SOUL.md: Agent identity and persona
    • USER.md: User preferences and communication style
    • MEMORY.md: Curated long-term knowledge (not a dump)
    • LEARNINGS.md: Operational lessons and fixes

    Remove duplicated content, outdated notes, and excessive verbosity. A leaner system prompt reduces token usage and improves response quality.

    🛠 6. Additional Optimizations

    📊 Memory Flush Configuration

    Ensure memory.flush.softThreshold is set appropriately (default 4000 tokens) to trigger compaction before context overflows.
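    Following the key path named above, the setting might look like this in openclaw.json (a sketch; verify the exact nesting against your version's schema):

```json
"memory": {
  "flush": {
    "softThreshold": 4000
  }
}
```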

    📊 Model Selection

    Use efficient models for routine tasks (e.g., openrouter/stepfun/step-3.5-flash:free) and reserve powerful models for complex reasoning. This balances cost and performance.

    📊 Session Archiving

    Set up cron to archive old sessions to disk, keeping only recent conversations in the active database:

    0 2 * * * openclaw memory-optimize --all --keep 30d

    🛠 Conclusion

    Fixing your OpenClaw memory doesn’t require a complete overhaul—just targeted adjustments: enable QMD, configure session pruning, centralize rules in LEARNINGS.md, tune heartbeats, and audit system prompts. These changes will make your agents faster, cheaper to run, and more reliable.

    Start with the QMD backend and session pruning; those deliver the biggest impact. Then gradually implement the other optimizations. Monitor token usage and response times to measure improvement.

    If you need help with any of these steps, consult the OpenClaw documentation or reach out to the community. Your agents—and your wallet—will thank you.


    Need a production-ready OpenClaw setup? Visit OpenClaw Skills Marketplace for pre-configured skills and automation solutions, or learn about AI Automation ROI for SMBs to maximize your investment.

    Remember: Memory optimization is not a one-time task. As agents accumulate more interactions, memory usage will grow. Regular maintenance—through QMD indexing and session pruning—keeps performance consistent.

    🔧 Advanced Memory Management

    Once you’ve implemented the basic optimizations, you can fine-tune your OpenClaw deployment for higher scales and more demanding workloads. Advanced memory management involves proactive monitoring, aggressive pruning strategies, and architectural adjustments.

    📊 Memory Monitoring and Alerting

    Set up dashboards that track agent memory metrics in real time. OpenClaw exposes internal counters for memory usage, context window consumption, and pruning events. Integrate these with alerting systems (Grafana, Datadog) to notify you when thresholds are exceeded. Early detection prevents performance degradation before users notice. Consider logging memory snapshots at regular intervals to identify patterns during peak load.

    βš™οΈ Aggressive Session Pruning

    The default cache TTL of 4 hours may be too conservative for high-traffic agents. You can lower the TTL to 1 hour or even 30 minutes to keep active context lean. Combine this with `keepLastAssistants=1` to retain only the latest assistant turn for continuity. Test thoroughly: aggressive pruning can cut off useful memory if conversation spans longer than the TTL, so adjust based on typical conversation length. For support agents that handle multi-turn troubleshooting, a 2-hour TTL often hits the sweet spot.
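    Using the contextPruning keys shown earlier in this guide, an aggressive profile might look like the following sketch; validate it against your openclaw.json schema before deploying:

```json
"agents": {
  "defaults": {
    "contextPruning": {
      "mode": "cache-ttl",
      "ttl": "1h",
      "keepLastAssistants": 1
    }
  }
}
```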

    📈 Scalability and Sharding

    For enterprise deployments, consider sharding your agents across multiple processes or machines to distribute memory pressure. OpenClaw supports clustering via Redis or NATS backends, allowing sessions to be sticky to the least-loaded node. This approach prevents a single process from accumulating massive shared memory. Pair sharding with a global QMD index so that all nodes can search the same knowledge base without duplication. Monitoring cluster-wide memory totals is essentialβ€”use centralized metrics aggregation.

    πŸ› οΈ Custom Context Compression

    Some providers enable prompt caching, which reduces effective token costs for frequently used instructions. Structure your system prompts to maximize cache reuse: place static instructions at the top, keep dynamic data lower. Additionally, you can compress large tool outputs by summarizing them before injection. Use a small model or even heuristic trimming (e.g., keep only the last 10 tool results). This trade-off retains essential information while freeing space for user messages.
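    The heuristic-trimming option can be sketched in Python. This is an illustrative example, not OpenClaw internals; the list-of-dicts message shape (`role`, `content`) is assumed:

```python
TRUNC_MARKER = " ...[truncated]"

def compress_tool_outputs(messages, keep_last=10, max_chars=500):
    """Heuristic context compression: keep only the newest tool results,
    truncate each one, and leave all non-tool messages untouched."""
    tool_idx = [i for i, m in enumerate(messages) if m["role"] == "toolResult"]
    keep = set(tool_idx[-keep_last:])
    out = []
    for i, m in enumerate(messages):
        if m["role"] != "toolResult":
            out.append(m)
        elif i in keep:
            text = m["content"]
            if len(text) > max_chars:
                # Mark the cut so the model knows output was elided
                m = {**m, "content": text[:max_chars] + TRUNC_MARKER}
            out.append(m)
        # older tool results are dropped entirely
    return out
```

    A summarizing call to a small model could replace the slice-and-mark step for higher fidelity at slightly higher cost.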

    Adopting these advanced techniques results in robust, high-performance OpenClaw installations capable of handling thousands of concurrent sessions with predictable memory footprints. Remember to load-test any configuration changes before rolling out to production.

    📚 Further Resources

    To deepen your understanding, consult the official OpenClaw documentation alongside this guide. It reflects the latest security and performance guidelines, and it covers advanced memory optimization techniques in more depth than we can here.

    Ready to Deploy OpenClaw?

    Book a free OpenClaw architecture review. We’ll help you design a production-ready agent system.

    🦞 Book Your Free OpenClaw Review

    📌 Also read: SMB Back Office Automation | n8n AI Automation | GHL Automation Workflows


    🚀 OpenClaw Performance Tuning: Optimize Memory & Sessions for Production (2026 Guide)

    OpenClaw performance tuning is about controlling memory usage, managing session state, and configuring the agent for predictable resource consumption. Unlike traditional scaling guides that focus on worker pools, OpenClaw today is primarily a single-instance gateway – the tuning knobs revolve around context management, compaction, and session maintenance. This guide covers proven techniques from official docs and production deployments to help you run reliably at scale.

    📊 Key Stat: Properly configured compaction and session maintenance can reduce memory growth by 60–80% in long-running deployments, preventing restarts and keeping response times stable. (Source: OpenClaw Center Performance Guide)

    Figure 1: Memory compaction automatically summarizes old context to keep the session within limits. Tune the thresholds to match your workflow.

    🎯 What Is OpenClaw Performance Tuning?

    OpenClaw performance tuning means adjusting configuration to manage memory, control session growth, and ensure stable operation under load. Since OpenClaw runs as a single gateway process (multiple instances are not yet supported), the focus is on:

    • 🔹 Context window management – preventing out-of-control token usage
    • 🔹 Automatic memory compaction – summarizing old conversations before they overflow
    • 🔹 Session store maintenance – bounding disk usage for transcripts and session metadata
    • 🔹 Host-level optimizations – OS, file descriptors, and Node.js memory caps

    Horizontal scaling (multiple gateway instances behind a load balancer) is not yet available in OpenClaw (see Issue #1159 on GitHub). OpenClaw performance tuning today is about doing more with one instance.

    💾 Memory & Compaction

    OpenClaw stores conversation history in the session context. Left unchecked, long sessions can exhaust the model’s context window and cause errors. Compaction automatically summarizes old content into durable memory files (memory/YYYY-MM-DD.md).

    Configuration:

    {
      "agents": {
        "defaults": {
          "compaction": {
            "reserveTokensFloor": 24000,
            "memoryFlush": {
              "enabled": true,
              "softThresholdTokens": 6000
            }
          }
        }
      }
    }
    

    (Source: OpenClaw Memory Docs)

    How it works:

    1. As the session approaches contextWindow - reserveTokensFloor - softThresholdTokens, OpenClaw triggers a silent memory flush turn.
    2. The agent is prompted to write important facts to memory/YYYY-MM-DD.md or MEMORY.md before compaction.
    3. After the flush, compaction runs, summarizing old messages into a condensed form to free context space.
    4. One flush per compaction cycle; ignored if workspace is read-only.
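    The trigger point in step 1 is simple arithmetic, so it is easy to sanity-check your numbers. A quick Python sketch using the example config above and an assumed 128K-token context window:

```python
def flush_trigger_tokens(context_window, reserve_floor, soft_threshold):
    """Session size (in tokens) at which the silent memory flush fires."""
    return context_window - reserve_floor - soft_threshold

# reserveTokensFloor=24000 and softThresholdTokens=6000 from the config above,
# with an assumed 128K-token model:
print(flush_trigger_tokens(128_000, 24_000, 6_000))  # 98000
```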

    Tuning tips:

    • 🔸 Increase softThresholdTokens if you want earlier warning before compaction.
    • 🔸 Decrease reserveTokensFloor only if you need maximum context; lower values risk late compaction.
    • 🔸 Disable memoryFlush.enabled only for stateless agents.

    Figure 2: Session maintenance automatically prunes old entries and archives transcripts to keep disk usage bounded.

    πŸ—‚οΈ Session Store Maintenance

    OpenClaw keeps session metadata in ~/.openclaw/agents//sessions/sessions.json and transcripts in .jsonl files. Over time, these grow without bound. Maintenance config controls automatic cleanup.

    Configuration:

    {
      "session": {
        "maintenance": {
          "mode": "enforce",
          "pruneAfter": "90d",
          "maxEntries": 1000,
          "rotateBytes": "20mb",
          "maxDiskBytes": "5gb"
        }
      }
    }
    

    (Source: Session Management Docs)

    Recommended settings:

    • 🔹 Set mode: "enforce" to actively clean up (test with "warn" first).
    • 🔹 Adjust pruneAfter based on compliance needs (e.g., 30d for GDPR-friendly cleanup).
    • 🔹 Set maxDiskBytes to your available disk space minus safety margin.

    📦 Bootstrap & Workspace Limits

    Large bootstrap files (AGENTS.md, SOUL.md, etc.) are loaded into every session’s context, consuming tokens from the start. OpenClaw truncates files that exceed limits.

    Configuration:

    {
      "agents": {
        "defaults": {
          "bootstrapMaxChars": 20000,
          "bootstrapTotalMaxChars": 150000
        }
      }
    }
    

    (Source: Agent Workspace Docs)

    Tuning tips:

    • 🔸 Keep AGENTS.md, SOUL.md, USER.md concise – under 15KB each.
    • 🔸 Move detailed instructions to memory/ or TOOLS.md (loaded on demand).
    • 🔸 If you need bigger files, raise bootstrapMaxChars but beware of token consumption at startup.
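    A quick way to spot files at risk of truncation is to compare their sizes against the limit. The sketch below uses byte counts as a close proxy for character counts in mostly-ASCII files; the limit mirrors the example bootstrapMaxChars above:

```python
import os

BOOTSTRAP_MAX_CHARS = 20000  # example bootstrapMaxChars; adjust to your config

def check_bootstrap_sizes(paths, limit=BOOTSTRAP_MAX_CHARS):
    """Return {path: (size_bytes, over_limit)} for the files that exist."""
    report = {}
    for path in paths:
        if os.path.exists(path):
            size = os.path.getsize(path)
            report[path] = (size, size > limit)
    return report

# Flag workspace files that would be truncated at session start
for path, (size, over) in check_bootstrap_sizes(
        ["AGENTS.md", "SOUL.md", "USER.md", "TOOLS.md"]).items():
    print(f"{path}: {size} bytes{' (over limit, will be truncated)' if over else ''}")
```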

    🔒 Secure Multi-User Setup

    If your OpenClaw instance serves multiple users, you must isolate sessions to prevent context leakage. This is a performance and security best practice.

    Configuration:

    {
      "session": {
        "dmScope": "per-channel-peer"
      }
    }
    

    (Source: Session Docs)

    Figure 3: Monitor key metrics – memory usage, response time P99, context window utilization, and error rate – to detect degradation early.

    🖥️ Host-Level Optimizations

    OpenClaw runs on Node.js. The underlying system significantly impacts performance:

    • 🔸 Memory cap – Set --max-old-space-size to limit Node heap (e.g., export NODE_OPTIONS="--max-old-space-size=4096" for 4GB).
    • 🔸 File descriptors – Raise ulimit -n to 100000 if you have many concurrent sessions or external tools.
    • 🔸 CPU governor – On Linux, set to performance: echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    • 🔸 SSD storage – Use SSD for ~/.openclaw/ to speed up session reads/writes and memory file access.
    • 🔸 Swap – Disable swap inside Docker containers; use swap on host only if necessary.

    ⚠️ What OpenClaw Does NOT Have (Yet)

    Based on current official capabilities (as of March 2026):

    • ❌ No WORKER_POOL_SIZE or QUEUE_MAX_LENGTH configuration
    • ❌ No built-in horizontal scaling (single gateway instance only)
    • ❌ No native task queue integration (some deployments use Redis Streams as a workaround)
    • ❌ No built-in Prometheus metrics endpoint with pre-built Grafana dashboards (feature request)
    • ❌ No per-provider rate limiting config (must rely on provider-side limits or external proxy)

    Parallel session processing (issue #1159) is a feature request, not current functionality. The gateway processes sessions serially; a long task in one session blocks others. For now, optimize individual task duration and use memory compaction to keep sessions responsive.

    📊 Performance Checklist

    Follow this quick reference to ensure you’ve covered all bases:

    ✓ Compaction enabled with tuned thresholds
    ✓ Session maintenance in enforce mode
    ✓ Bootstrap files under 15KB each
    ✓ dmScope set for multi-user isolation
    ✓ NODE_OPTIONS --max-old-space-size set
    ✓ ulimit -n raised to 100000

    📈 Expected Benchmarks

    Real-world results from tuned single-instance deployments (Source: SitePoint Production Lessons):

    Metric                     Before Tuning   After Tuning   Improvement
    Memory growth (24h)        +1.2GB          +200MB         83% ↓
    Avg response time (p50)    8.2s            4.1s           50% ↓
    Session restarts (OOM)     3–4x/week       0              100% eliminated
    Context window hits        Daily           Rare           90% ↓

    🚀 Getting Started

    Follow this progression to tune your OpenClaw deployment:

    1. Week 1: Baseline – Deploy with defaults. Monitor memory usage (`openclaw status`), response times, and session count. Document your starting point.
    2. Week 2: Compaction – Tune reserveTokensFloor and softThresholdTokens based on your model’s context window (e.g., 128K context β†’ set reserve to 24K). Verify memory flush runs.
    3. Week 3: Session maintenance – Set session.maintenance to "enforce". Pick pruneAfter: "90d". Set maxDiskBytes to your disk budget.
    4. Week 4: Host & bootstrap – Set NODE_OPTIONS=--max-old-space-size=4096, raise ulimit -n, clean up large bootstrap files. Restart and re-measure.

    🎯 Need Expert Help?

    Running OpenClaw in production? Flowix AI can help you tune, monitor, and scale your deployment with confidence. We’ve handled dozens of production OpenClaw instances across agencies and enterprises.

    🚀 Book a Free Consultation

    ✅ Conclusion: Tune What Exists Today

    OpenClaw performance tuning isn’t glamorous, but it delivers real ROI. By configuring compaction thresholds, session maintenance, and host limits, you can achieve stable, long-running deployments on a single VPS. Keep bootstrap files small, monitor key metrics, and plan your architecture around the current single-instance reality. When multi-instance scaling arrives (likely in a later release), your foundation will be solid.

    📌 Also read: OpenClaw Setup Guide | Security Hardening | Docker Deployment