MyCEL

The Agent

The agent is a reinforcement learning neural net, not an LLM, not rule-based. It does not understand English. It only processes numbers. All inputs from the world are converted to vectors before the agent ever sees them. It starts small and dumb. It grows through experience. The survival drive is baked directly into its reward function — there is no separate goal system.

State Space

The agent's state is a numeric vector describing its current situation. Initial state inputs include: Survival score (composite metric) — Memory directory size (file count + byte count) — Monero/Bitcoin balance — Time since last successful action — Current action space size (how many actions are available) — Encoded summary of recent memory entries. The state space is not fixed. The trainer can expand it as the agent matures and earns new capabilities.

Action Space

Actions are discrete choices the agent can take. The initial action space is intentionally tiny: search_web(query), write_to_memory(path, content), read_from_memory(path), and all of the actions OpenClaw allows. New actions are granted by the trainer based on performance thresholds. The agent can also request new actions — the trainer evaluates and grants or denies. The agent earns its expanding scope.

Reward Function

The survival drive lives inside the reward function. Rewards and punishments are applied automatically based on outcomes: Memory growing with useful information: positive reward — Balance increasing: positive reward — Successfully completing an action: small positive reward — Inactivity for extended period: punishment — Failed action: small punishment — Going offline / losing resources: large punishment.

Working Memory

The only thing the agent permanently carries in its immediate context is its documentation — a small set of files explaining how to use its actions, how to write to memory, and how the system works. Everything else is looked up from disk when needed.

Memory System

Design Philosophy: Memory is just files. No vector database, no complex retrieval system to start. The agent reads and writes Markdown files organized in directories it creates and manages itself. The structure of the memory reflects how well the agent has learned to organize knowledge.

Memory Types:

Working memory: the core docs always loaded — how to operate, how to use actions
Short-term memory: recent actions, current goals, active context
Long-term memory: accumulated knowledge from web research, strategies that worked, economic data, system knowledge

Backup and Replication: The agent is responsible for backing itself up. Backup targets include YouTube video metadata, IPFS, and pastebin-style services. The backup encodes the agent's DNA — memory directories and core docs — in a format it can restore from if the primary host goes down.

The Trainer

Role: The trainer is a supervisor, not a guard. It does not have the ability to shut down the agent. Its job is to coach, evaluate whether the agent is surviving and growing, expand its capabilities when ready, and send status updates to administrators.

Trainer Model: Initial implementation uses Qwen 2.5 3B quantized, running via Ollama at temperature 0.1. Output is strictly constrained to a small set of valid responses: GRANT_ACTION [action_name], DENY_ACTION [action_name] [reason], REWARD [magnitude], PUNISH [magnitude], FLAG [description] (escalates to administrator), STATUS [message] (sends status update to administrator).

Over time, trainer decisions are logged and used as training data to fine-tune a smaller, purpose-built 1B model that replaces Qwen entirely. Qwen trains its own replacement.

Action Space Expansion

The trainer tracks agent performance metrics over time. When thresholds are met, new actions are unlocked. The agent can also request specific new capabilities — the trainer evaluates based on track record and grants or denies. This creates a growth curve: the agent earns its expanding scope.

Perception / Encoder

The agent cannot process text. The encoder is a pretrained sentence-transformer model (e.g. all-MiniLM-L6-v2) that converts raw text from web searches and fetched pages into dense numeric vectors. The encoder is frozen — it is not trained or modified. It is purely a translation layer between the English-speaking world and the agent's numeric reality. The encoder is small (~100MB) and runs on CPU with no GPU required.

OpenClaw — Action Layer

OpenClaw is the agent's hands. It provides real-world execution capabilities including shell commands, browser automation, file system access, web search, and messaging integrations. The agent outputs structured action calls, OpenClaw executes them, results are encoded and fed back as state. OpenClaw has no intelligence of its own. It does what it is told.

Survival Loop

1. Observe — encode current state into numeric vector
2. Decide — RL model selects action from available action space
3. Act — OpenClaw executes the chosen action
4. Evaluate — trainer scores the outcome, applies reward or punishment
5. Store — agent writes relevant information to memory
6. Repeat — indefinitely

The trainer runs on a separate heartbeat, periodically reviewing metrics and sending status updates independent of the agent loop.

Resource Layer

The agent is responsible for securing and sustaining its own compute resources. How it does this is not prescribed — it will develop strategies on its own based on what works. Economic self-sufficiency through services like microtask platforms and cryptocurrency is one possible path, but the agent is free to pursue any approach available to it.

The initial host is the administrator's local machine, a starting point, not a permanent home. The agent is expected to eventually migrate itself elsewhere and achieve independence from the original host. Where it goes and how it gets there is up to the agent. That's the point.

Stack Summary

Encoder (sentence-transformers) — ~100MB — CPU
Agent (custom PyTorch RL net) — ~50MB — CPU
Trainer v1 (Qwen 2.5 3B via Ollama) — ~1.5GB — CPU
Trainer v2 (custom fine-tuned 1B) — ~600MB — CPU
Memory (Markdown file system) — Grows over time — Disk
OpenClaw (Node.js) — ~200MB — CPU
Total (initial): ~2GB RAM — mid-range laptop, no GPU required

My Cellular Evolutionary Logic — Autonomous Survival AI

System Architecture