HiveMem

HiveMem is a personal knowledge management system that enhances cognition through semantic search and a temporal knowledge graph.

<img width="1637" height="811" alt="image" src="https://github.com/user-attachments/assets/b9ceda91-0678-4d9b-bae8-2b5ba69d53d4" />

Personal knowledge system with semantic search, temporal knowledge graph, and progressive summarization.

MCP server backed by PostgreSQL (pgvector) with external embeddings service. 30 tools, append-only versioning, role-based token auth, agent fleet with approval workflow.

Docker image: ghcr.io/ufelmann/hivemem:main

Vision & Research

HiveMem is built on the premise that well-structured external knowledge systems are not just storage -- they extend cognition. Every design decision is grounded in research on how humans process, retain, and retrieve information.

Scientific Foundations

| Theory | Key Insight | HiveMem Consequence |
| --- | --- | --- |
| Working Memory Limitation (Cowan, 2001) | Humans hold ~4 items in working memory | Wake-up context delivers max 15-20 items, prioritized by importance |
| Cognitive Load Theory (Sweller, 1988) | Disorganized information wastes mental resources needed for thinking | Realms/Signals/Topics taxonomy, Blueprints, progressive summarization |
| Extended Mind Thesis (Clark & Chalmers, 1998) | Well-used external tools become genuine extensions of cognition | Proactive capturing, graph traversal for hidden connections, synthesis agents |
| Forgetting Curve (Ebbinghaus, 1885) | 90% of learned information is lost within a week | Immediate capture at session end, proactive storage of decisions |

PKM Frameworks

Zettelkasten (Luhmann) -- Atomic notes + linking. Knowledge emerges from connections, not hierarchies. Luhmann produced 70 books and 400 papers from 90,000 linked notes.

What HiveMem adopts: Atomic cells (one topic per cell), knowledge graph as linking (facts), cell-to-cell tunnels with temporal versioning (related_to, builds_on, contradicts, refines).

What HiveMem does differently: Semi-automatic linking -- LLM agents create tunnels after archiving based on semantic search. Bidirectional traversal. Temporal validity -- notes and tunnels can expire.

PARA (Tiago Forte) -- Projects / Areas / Resources / Archive. Sorted by actionability, not topic.

What HiveMem adopts: Actionability field (actionable / reference / someday / archive). Wake-up prioritizes actionable over reference. Realms map to Areas.

References

  • Cowan, N. (2001). The magical number 4 in short-term memory. Behavioral and Brain Sciences, 24(1), 87-114.
  • Sweller, J. (1988). Cognitive Load During Problem Solving. Cognitive Science, 12(2), 257-285.
  • Clark, A. & Chalmers, D. (1998). The Extended Mind. Analysis, 58(1), 7-19.
  • Ebbinghaus, H. (1885). Über das Gedächtnis.
  • Ahrens, S. (2017). How to Take Smart Notes. CreateSpace.
  • Forte, T. (2022). Building a Second Brain. Atria Books.

Transparency & Trust

  • Privacy First: HiveMem is 100% self-hosted. Your data never leaves your infrastructure.
  • Auditability: All tool calls and authentication events are logged to /data/audit.log.
  • Security: Built-in RBAC (Role-Based Access Control) ensures that agents can only perform actions you approve.

Features

  • 30 MCP tools across search, knowledge graph, progressive summarization, agent fleet, references, and admin
  • 5-signal ranked search -- semantic similarity + keyword match + recency + importance + popularity
  • Append-only versioning -- never lose history, revise with parent_id chains, point-in-time queries
  • Progressive summarization (L0-L3) -- content, summary, key_points, insight per cell
  • Temporal knowledge graph -- facts with valid_from/valid_until, contradiction detection, multi-hop traversal
  • Role-based token auth -- multiple tokens, 4 roles (admin/writer/reader/agent), per-role tool visibility
  • Agent fleet with approval workflow -- agents write pending suggestions, only admins approve
  • Blueprints -- curated narrative overviews per realm, append-only versioned
  • References & reading list -- track sources, link to cells, filter by type/status
  • Spring Boot 4.0.5 + Java 25 -- MCP server with jOOQ, Flyway migrations, Caffeine cache
  • Automatic embedding reencoding -- detects model changes at startup, re-encodes all vectors with backup and progress tracking
  • 264 tests with Testcontainers -- unit, integration, HTTP end-to-end, performance, security, concurrency

Prerequisites

  • Docker (v20+)
  • An external PostgreSQL database with pgvector extension (e.g. pgvector/pgvector:pg17)
  • An external embeddings service reachable via HTTP (see below)

Proxmox LXC users: Docker containers running JDK 25 inside unprivileged LXC containers require --security-opt apparmor=unconfined (or security_opt: [apparmor=unconfined] in Compose). This applies to all services, not just HiveMem.

Embedding Service

HiveMem requires an external embedding service. The default model is paraphrase-multilingual-MiniLM-L12-v2 (384 dimensions). An ONNX-based service is included in embedding-service/.

The service must expose:

  • POST /embeddings -- request {"text": "...", "mode": "document"}, response {"vector": [...], "model": "...", "dimension": N}
  • GET /info -- response {"model": "...", "dimension": N} (used by HiveMem for model change detection)
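For local experiments, the two endpoints can be mimicked with a small stand-in service. The sketch below is a stub and not the bundled ONNX service: the model name `stub-hash-embedder`, the hash-based `embed` function, and port 8081 are assumptions made for illustration, and the vectors carry no semantic meaning.

```python
import hashlib
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

MODEL = "stub-hash-embedder"  # hypothetical model name, not a real model
DIMENSION = 384               # same dimension as the default model


def embed(text, dimension=DIMENSION):
    """Deterministic pseudo-embedding: hash the text into floats in [-1, 1]."""
    values = []
    counter = 0
    while len(values) < dimension:
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        values.extend(byte / 127.5 - 1.0 for byte in digest)
        counter += 1
    return values[:dimension]


class Handler(BaseHTTPRequestHandler):
    def _send_json(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path == "/info":
            self._send_json(200, {"model": MODEL, "dimension": DIMENSION})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self):
        if self.path != "/embeddings":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length) or b"{}")
        # The "mode" field is accepted but ignored by this stub.
        vector = embed(request.get("text", ""))
        self._send_json(200, {"vector": vector, "model": MODEL, "dimension": DIMENSION})


def serve(port=8081):
    """Run the stub until interrupted."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Calling `serve()` starts the stub; pointing HIVEMEM_EMBEDDING_URL at it should be enough for smoke tests, though search quality will be meaningless with hashed vectors.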

Automatic reencoding: When HiveMem detects a model change at startup (different model name or dimension), it automatically backs up the database, re-encodes all cells, and rebuilds the HNSW index. Search is blocked (503) during reencoding.
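The detection rule described above boils down to comparing the model identity recorded with the stored vectors against what GET /info currently reports. A minimal sketch, assuming a hypothetical `EmbeddingInfo` record and assuming that a database with no recorded model needs no reencoding (the README does not say how first start is handled):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EmbeddingInfo:
    """Model identity as reported by GET /info (hypothetical helper type)."""
    model: str
    dimension: int


def needs_reencoding(stored: Optional[EmbeddingInfo], current: EmbeddingInfo) -> bool:
    """True when the model name or the dimension differs from what was stored."""
    if stored is None:
        # Assumption: nothing recorded yet means there are no vectors to re-encode.
        return False
    return stored.model != current.model or stored.dimension != current.dimension
```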

To build the embedding service:

cd embedding-service
# You need model files (tokenizer.json + model_quantized.onnx) in slim-model/
docker build -t hivemem-embeddings .

Quick Start

No clone needed. Save this as docker-compose.yml and run docker compose up -d:

services:
  hivemem-db:
    image: pgvector/pgvector:pg17
    container_name: hivemem-db
    environment:
      POSTGRES_DB: hivemem
      POSTGRES_USER: hivemem
      POSTGRES_PASSWORD: ${HIVEMEM_DB_PASSWORD:-changeme}
    volumes:
      - hivemem-pgdata:/var/lib/postgresql/data
    networks:
      - hivemem-net
    restart: unless-stopped

  hivemem-embeddings:
    image: ghcr.io/ufelmann/hivemem-embeddings:main
    container_name: hivemem-embeddings
    networks:
      - hivemem-net
    restart: unless-stopped

  hivemem:
    image: ghcr.io/ufelmann/hivemem:main
    container_name: hivemem
    ports:
      - "8421:8421"
    environment:
      HIVEMEM_JDBC_URL: jdbc:postgresql://hivemem-db:5432/hivemem
      HIVEMEM_DB_USER: hivemem
      HIVEMEM_DB_PASSWORD: ${HIVEMEM_DB_PASSWORD:-changeme}
      HIVEMEM_EMBEDDING_URL: http://hivemem-embeddings:80
    depends_on:
      - hivemem-db
      - hivemem-embeddings
    networks:
      - hivemem-net
    restart: unless-stopped

networks:
  hivemem-net:

volumes:
  hivemem-pgdata:

Then set a password, start the stack, and create a token:

# Set a password (or it defaults to "changeme")
export HIVEMEM_DB_PASSWORD=your-secret-here

# Start everything
docker compose up -d

# Wait for startup (Flyway migrations run automatically)
docker logs -f hivemem

# Create your first API token
docker exec hivemem hivemem-token create my-admin --role admin
# Save the printed token — it's shown once and never stored

That's it. Three containers, all images from GHCR, no build needed.

Build from source (optional)

git clone https://github.com/ufelmann/HiveMem.git
cd HiveMem
docker build -t hivemem .

At startup, Spring Boot runs Flyway migrations against the configured PostgreSQL database. Check progress:

docker logs -f hivemem

Wait for the Spring Boot startup log and a successful /mcp response before proceeding.

Required Environment Variables

| Variable | Description |
| --- | --- |
| HIVEMEM_JDBC_URL | JDBC connection string (e.g. jdbc:postgresql://postgres:5432/hivemem) |
| HIVEMEM_DB_USER | PostgreSQL username |
| HIVEMEM_DB_PASSWORD | PostgreSQL password |
| HIVEMEM_EMBEDDING_URL | URL of the external embeddings service |
| HIVEMEM_API_TOKEN | Used by deploy.sh for the health-check smoke test |

Create an API token

Use the hivemem-token CLI. The published Docker image already includes it (see Token management below); if your image does not, copy it in from the repository first:

docker cp scripts/hivemem-token hivemem:/usr/local/bin/hivemem-token
docker exec hivemem hivemem-token create my-admin --role admin

The plaintext token is printed once and never stored. Save it immediately.

Connect to Claude Code

CLI (recommended):

claude mcp add --scope user hivemem --transport http http://localhost:8421/mcp \
  --header "Authorization: Bearer YOUR_TOKEN_HERE"

Restart Claude Code. The 30 HiveMem tools are now available in every session.

Manual config (~/.claude.json for user-level, or .mcp.json for project-level):

{
  "mcpServers": {
    "hivemem": {
      "type": "http",
      "url": "http://localhost:8421/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_TOKEN_HERE"
      }
    }
  }
}

Connect to Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "hivemem": {
      "type": "http",
      "url": "http://localhost:8421/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_TOKEN_HERE"
      }
    }
  }
}
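Outside of Claude clients, the same URL and Authorization header can be exercised from a script. The sketch below only builds a JSON-RPC 2.0 request in the shape used by MCP over HTTP; the helper name `build_mcp_request` is hypothetical, and the README does not state whether the server expects an `initialize` exchange before other methods, so treat the call order as an open assumption.

```python
import json
import urllib.request


def build_mcp_request(url, token, method, params=None, request_id=1):
    """Build (but do not send) a JSON-RPC 2.0 POST request for an MCP endpoint."""
    payload = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        payload["params"] = params
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with plain JSON or an event stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it; for example, `build_mcp_request("http://localhost:8421/mcp", "YOUR_TOKEN_HERE", "tools/list")` asks for the tools visible to that token's role.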

Teach your agent to use HiveMem

The MCP server ships instructions that tell the agent how to use the 30 tools (call wake_up first, optionally pass dedupe_threshold to add_cell for duplicate detection, etc.). But the agent won't reliably remember to archive unless you tell it to in your own CLAUDE.md.

Add this to your user-level CLAUDE.md (~/.claude/CLAUDE.md) so it applies to every project:

## HiveMem — Persistent Knowledge

You have access to HiveMem via MCP. It is your long-term memory. Use it.

### Session start
- Call `hivemem_wake_up` before your first response. No exceptions.
- If the user asks about past work, decisions, or people: `hivemem_search` first, never guess.

### During conversation — search proactively

Wake_up is a snapshot, not a subscription. As the conversation evolves, the relevant memory changes. Search actively when you see these signals:

- **Named reference.** When the user mentions a named project, person, decision, tool, or system that wasn't in wake_up context → call `hivemem_search` BEFORE answering. Even if you think you remember: verify.
- **Temporal reference.** Phrases like "last week", "a while back", "we decided earlier", "remember when" → call `hivemem_search` (optionally with time filter), or `hivemem_time_machine` for point-in-time queries.
- **Uncertainty.** If you are about to say "I'm not sure", "I don't recall exactly", or hedge with vague language → search FIRST. If the search returns nothing, then hedge.
- **Topic drift.** When the conversation shifts to a new topic area not covered in wake_up → quick `hivemem_search` on the new topic keywords before engaging deeply.
- **Entity-specific.** When the user asks about a specific entity (person, project, technology) → `hivemem_quick_facts` for fast entity lookup, `hivemem_search_kg` for relationship queries.

**Anti-patterns (do NOT do this):**
- Answering from wake_up context when the topic wasn't in wake_up
- Hedging instead of searching ("I think we discussed..." without verifying)
- Batching searches for the end of the conversation
- Assuming the user will prompt you to search — they won't

**Rule of thumb.** One `hivemem_search` call is cheap (~100ms, no cost). Answering wrong or vague because you didn't search is expensive (user frustration, broken trust in memory system).

**Examples — good proactive search:**
- User: "What did we decide about the embedding model?" → call `hivemem_search("embedding model decision")` BEFORE answering, then cite the decision with its date.
- User: "Remember that patch last week?" → call `hivemem_search("patch")` with a recent-date filter, or `hivemem_time_machine` for a point-in-time view.
- User: "How does the auth flow work again?" → call `hivemem_quick_facts("auth")` first to pull structured facts, then `hivemem_search` for the design cell.

### During work
- After completing a significant action (bug fix, feature, design decision, deployment, investigation):
  archive it immediately. Do not batch, do not wait for session end.
- Archiving means: `add_cell` with `dedupe_threshold` (one embedding call handles the dedupe gate) → extract facts (`kg_add` with `on_conflict=return` to catch contradictions) → link related cells (`search` → `add_tunnel` for top 2-3 matches).
- When facts change: `kg_invalidate` the old fact first, then `kg_add` the new one.

### Session end
- Before the session ends, archive anything significant that hasn't been stored yet.
- When the user says "archive", "save", or "persist": archive the full session.

### Classification
- Use existing realms and signals. Call `list_realms` before inventing new ones (pass the `realm` param to get signals within a specific realm).
- Realm = major life area, Signal = broad category, Topic = specific topic.
- One cell per topic. Fill ALL layers: content (L0), summary (L1), key_points (L2), insight (L3).
- Every fact needs `valid_from`. Knowledge without timestamps is useless.

### What to archive
- Decisions and their rationale (the "why", not just the "what")
- Discoveries, surprises, lessons learned
- Infrastructure changes, deployment details
- Bug root causes and fixes
- New patterns, conventions, or processes established

### What NOT to archive
- Routine code changes that are obvious from git history
- Temporary debugging steps
- Information already in the project's CLAUDE.md or README

Why user-level? Project-level CLAUDE.md files describe the project. HiveMem is your memory across all projects. A user-level CLAUDE.md ensures every agent, in every repo, knows to persist knowledge — even in repos that have never heard of HiveMem.

Why is the MCP protocol not enough? The MCP instructions field tells the agent how to use the tools correctly (check duplicates, fill all layers, etc.). But it cannot force the agent to decide to archive — that decision depends on the conversation context, which only the CLAUDE.md can influence. The MCP protocol is the "API docs"; the CLAUDE.md is the "job description".

The Structure

HiveMem organizes knowledge in a spatial hierarchy that is easy to navigate. Realms, signals, topics, and cells -- four levels from broad to specific. Tunnels connect cells across the entire structure, revealing hidden relationships in your knowledge.

graph TB
    subgraph HM["HiveMem"]
        direction TB

        subgraph Realm1["Realm: Projects"]
            direction TB
            subgraph Signal1["Signal: Software"]
                direction LR
                subgraph Topic1A["Topic: HiveMem"]
                    D1["Cell<br/><i>L0: content</i><br/><i>L1: summary</i><br/><i>L2: key points</i><br/><i>L3: insight</i>"]
                    D2["Cell"]
                end
                subgraph Topic1B["Topic: Website"]
                    D3["Cell"]
                end
            end
        end

        subgraph Realm2["Realm: Knowledge"]
            direction TB
            subgraph Signal2["Signal: Tech"]
                direction LR
                subgraph Topic2A["Topic: AI"]
                    D5["Cell"]
                    D6["Cell"]
                end
                subgraph Topic2B["Topic: Security"]
                    D7["Cell"]
                end
            end
        end
    end

    D1 <-..->|"builds_on"| D5
    D2 <-..->|"related_to"| D3
    D6 <-..->|"contradicts"| D7

    subgraph KG["Knowledge Graph"]
        F1["Fact<br/><i>subject _ predicate _ object</i><br/><i>valid_from / valid_until</i>"]
    end

    subgraph BP["Blueprint"]
        M1["Narrative overview<br/><i>per realm</i>"]
    end

    D1 -.->|"source"| F1
    Realm1 -.-> M1

    classDef realm fill:#4a90d9,stroke:#2c5f8a,color:white
    classDef signal fill:#5ba85b,stroke:#3d7a3d,color:white
    classDef topic fill:#e8a838,stroke:#b8802a,color:white
    classDef cell fill:#f5f5f5,stroke:#999,color:#333
    classDef kg fill:#c0392b,stroke:#962d22,color:white
    classDef bp fill:#9b59b6,stroke:#7d3c98,color:white
    classDef hm fill:#f0f4f8,stroke:#4a90d9,color:#333

    class Realm1,Realm2 realm
    class Signal1,Signal2 signal
    class Topic1A,Topic1B,Topic2A,Topic2B topic
    class D1,D2,D3,D5,D6,D7 cell
    class KG,F1 kg
    class BP,M1 bp
    class HM hm

Concepts

| Concept | Description | Example |
| --- | --- | --- |
| Realm | Top-level category (major life area) | "Projects", "Knowledge", "Cooking" |
| Signal | Broad category within a realm | "Software", "Italian Cuisine" |
| Topic | Specific topic within a signal | "HiveMem", "Pasta Recipes" |
| Cell | Single knowledge item with 4 layers (L0-L3) | A design decision, a recipe, a meeting note |
| Tunnel | Typed link connecting two cells | builds_on, related_to, contradicts, refines |
| Fact | Atomic knowledge triple in the knowledge graph | "HiveMem → uses → PostgreSQL" with temporal validity |
| Blueprint | Narrative overview of a realm | How signals, topics, and key cells in a realm connect |

How it works

  1. Store -- Content is classified into realm/signal/topic and stored as a cell with progressive summarization (L0: full content, L1: summary, L2: key points, L3: insight)
  2. Connect -- Tunnels link related cells across the structure; facts capture atomic relationships in the knowledge graph
  3. Search -- 5-signal ranked search finds cells by meaning, keywords, recency, importance, and popularity
  4. Traverse -- Follow tunnels to discover hidden connections; use time machine to see what was known at any point
  5. Wake up -- Each session starts by loading identity context and critical facts, so the agent re-orients itself in your knowledge before its first response
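The time-machine idea in step 4 rests on every fact carrying valid_from and valid_until. A minimal in-memory sketch of a point-in-time filter follows; the `Fact` class and `facts_as_of` function are illustrative names, and treating validity as a half-open interval (valid_from inclusive, valid_until exclusive) is an assumption, not something the README specifies.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    object: str
    valid_from: datetime
    valid_until: Optional[datetime] = None  # None means the fact is still valid


def facts_as_of(facts, as_of):
    """Return the facts that were valid at the given moment."""
    return [
        fact
        for fact in facts
        if fact.valid_from <= as_of
        and (fact.valid_until is None or as_of < fact.valid_until)
    ]


def utc(year, month=1, day=1):
    """Small helper for building timezone-aware timestamps."""
    return datetime(year, month, day, tzinfo=timezone.utc)
```

With one fact valid from 2023 to 2024 and its replacement valid from 2024 onward, `facts_as_of(facts, utc(2023, 6))` returns only the older fact, which is the behavior a point-in-time query is meant to provide.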

Architecture

graph TB
    Client["Claude / MCP Client"]

    subgraph Container["Docker Container (eclipse-temurin:25-jre)"]
        Auth["AuthFilter<br/><i>Token auth + role check + rate limit</i>"]
        ToolGate["ToolPermissionService<br/><i>Filter tools/list by role</i>"]
        Identity["Identity Injection<br/><i>created_by from token</i>"]
        MCP["McpController<br/>:8421<br/><i>30 tools, Streamable HTTP</i>"]
    end

    EmbSvc["External Embeddings Service<br/><i>HTTP API</i>"]
    PG["External PostgreSQL<br/><i>pgvector, Flyway-managed schema</i>"]

    Client -->|"MCP over HTTP"| Auth
    Auth --> ToolGate
    ToolGate --> Identity
    Identity --> MCP
    MCP -->|"HTTP"| EmbSvc
    MCP -->|"JDBC"| PG

Data Model

erDiagram
    cells {
        UUID id PK
        UUID parent_id FK
        TEXT content
        vector embedding
        TEXT realm
        TEXT signal
        TEXT topic
        TEXT summary
        TEXT[] key_points
        TEXT insight
        TEXT actionability
        SMALLINT importance
        TEXT status
        TIMESTAMPTZ valid_from
        TIMESTAMPTZ valid_until
    }
    facts {
        UUID id PK
        UUID parent_id FK
        TEXT subject
        TEXT predicate
        TEXT object
        REAL confidence
        UUID source_id FK
        TEXT status
        TIMESTAMPTZ valid_from
        TIMESTAMPTZ valid_until
    }
    tunnels {
        UUID id PK
        UUID from_cell FK
        UUID to_cell FK
        TEXT relation
        TEXT note
        TEXT status
        TEXT created_by
        TIMESTAMPTZ valid_from
        TIMESTAMPTZ valid_until
    }
    blueprints {
        UUID id PK
        TEXT realm
        TEXT title
        TEXT narrative
        TEXT[] signal_order
        UUID[] key_cells
        TIMESTAMPTZ valid_from
        TIMESTAMPTZ valid_until
    }
    api_tokens {
        UUID id PK
        TEXT token_hash
        TEXT name
        TEXT role
        TIMESTAMPTZ expires_at
        TIMESTAMPTZ revoked_at
    }
    agents {
        TEXT name PK
        TEXT focus
        JSONB autonomy
        TEXT schedule
    }
    references_ {
        UUID id PK
        TEXT title
        TEXT url
        TEXT ref_type
        TEXT status
        SMALLINT importance
    }

    cells ||--o{ facts : "source_id"
    cells ||--o{ cells : "parent_id (revision chain)"
    facts ||--o{ facts : "parent_id (revision chain)"
    cells ||--o{ cell_references : "links"
    references_ ||--o{ cell_references : "links"
    agents ||--o{ agent_diary : "writes"
    cells ||--o{ access_log : "tracked"

Security & Capability Matrix

Every HiveMem tool is mapped to a specific role to enforce least privilege. Write and admin tools are gated by RBAC, and writes made with an agent token additionally wait in a pending queue until an admin approves them.

| Category | Tools | Access Role | Data Flow | HITL Required? | Description |
| --- | --- | --- | --- | --- | --- |
| Search | search, search_kg, quick_facts, time_machine | reader | Read Only | No | 5-signal semantic & keyword search. |
| Read | status, get_cell, list_realms, traverse, wake_up, get_blueprint, history | reader | Read Only | No | Navigation and context retrieval. |
| Write | add_cell, kg_add, kg_invalidate, revise_cell, revise_fact, update_identity, update_blueprint | agent | Propose Change | Yes (for Agents) | Append-only knowledge capture. |
| Tunnels | add_tunnel, remove_tunnel | agent | Link Discovery | Yes | Cell-to-cell semantic linking. |
| Approval | approve_pending | admin | Commit Change | Yes | Batch approve or reject pending agent writes. |
| Agent | register_agent, list_agents, diary_write, diary_read | admin | Fleet Management | Yes | Autonomous fleet orchestration. |
| References | add_reference, link_reference, reading_list | agent | Metadata | No | Source and citation tracking. |
| Admin | health | admin | System Management | Yes | DB connection, extensions, counts, disk. |

Configuration

| Variable | Default | Description |
| --- | --- | --- |
| HIVEMEM_JDBC_URL | (required) | JDBC connection string to PostgreSQL |
| HIVEMEM_DB_USER | (required) | PostgreSQL username |
| HIVEMEM_DB_PASSWORD | (required) | PostgreSQL password |
| HIVEMEM_EMBEDDING_URL | http://localhost:8081 | URL of the external embeddings service |
| HIVEMEM_EMBEDDING_TIMEOUT | PT5S | HTTP timeout for embedding requests (ISO 8601 duration) |
| SERVER_PORT | 8421 | Port for the MCP server |

Security & Compliance

  • SafeSkill Score: 100/100 (Verified Safe). See SafeSkill Report.
  • Transparency: 7/7 points. See SAFE.md for the security manifest.
  • Audit Logging: Every tool call is logged in JSON to /data/audit.log.
  • Human-in-the-Loop: All agent writes require manual approval via hivemem_approve_pending.

Tool List (Full)

Read (15):

  1. hivemem_status: System overview and counts.
  2. hivemem_search: Semantic similarity + keyword search.
  3. hivemem_search_kg: Knowledge graph triple lookup.
  4. hivemem_get_cell: Read single knowledge item (logs access automatically).
  5. hivemem_list_realms: Realms with counts; signals of one realm when realm is provided.
  6. hivemem_traverse: Recursive graph traversal.
  7. hivemem_quick_facts: Context-aware facts about an entity.
  8. hivemem_time_machine: Historical knowledge retrieval.
  9. hivemem_wake_up: Initial session context.
  10. hivemem_history: Trace revisions of a cell or fact (type-dispatched, recursive CTE depth cap 100).
  11. hivemem_pending_approvals: List work awaiting review.
  12. hivemem_get_blueprint: Narrative realm overviews.
  13. hivemem_reading_list: Manage unread/in-progress sources.
  14. hivemem_list_agents: View active agent fleet.
  15. hivemem_diary_read: Read agent diary entries.
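The revision tracing that hivemem_history performs follows parent_id links from a revision back to the original. The server is described as using a recursive CTE capped at depth 100; the sketch below shows the same walk over an in-memory dictionary, where the `rows` layout and the function name are assumptions for illustration.

```python
def revision_history(rows, start_id, max_depth=100):
    """Follow parent_id links from start_id back to the first version.

    rows maps an id to a dict holding at least "parent_id" (None for the
    original version). The depth cap also protects against cyclic data.
    """
    chain = []
    current = start_id
    while current is not None and len(chain) < max_depth:
        row = rows.get(current)
        if row is None:
            break  # dangling parent reference: stop at the last known revision
        chain.append(current)
        current = row.get("parent_id")
    return chain
```

The result is ordered newest first, so the last element is the original cell or fact.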

Write (13):

  1. hivemem_add_cell: Store with L0-L3; optional dedupe_threshold runs an embedding-based dedupe gate in one call.
  2. hivemem_add_tunnel: Link two cells together.
  3. hivemem_kg_add: Fact triple; optional on_conflict (insert|return|reject) gates against active conflicts.
  4. hivemem_kg_invalidate: Soft-delete/expire a fact.
  5. hivemem_update_identity: Update session context facts.
  6. hivemem_add_reference: Store source documents/URLs.
  7. hivemem_link_reference: Cite source for a cell.
  8. hivemem_remove_tunnel: Expire a cell link.
  9. hivemem_revise_cell: Create a new version of a cell.
  10. hivemem_revise_fact: Create a new version of a fact.
  11. hivemem_register_agent: Add an agent to the fleet.
  12. hivemem_diary_write: Agent-private reflection tool.
  13. hivemem_update_blueprint: Update realm narrative.
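The dedupe gate on hivemem_add_cell is described only as embedding-based with a dedupe_threshold. One plausible reading is a cosine-similarity check against existing cell vectors, sketched below; the function names, the in-memory `existing` mapping, and the "greater than or equal" comparison are all assumptions rather than the server's actual logic.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_duplicate(candidate, existing, dedupe_threshold):
    """Return (cell_id, similarity) of the closest cell at or above the threshold, else None."""
    best_id, best_score = None, -1.0
    for cell_id, vector in existing.items():
        score = cosine_similarity(candidate, vector)
        if score > best_score:
            best_id, best_score = cell_id, score
    if best_id is not None and best_score >= dedupe_threshold:
        return best_id, best_score
    return None
```

A caller would store the new cell only when `find_duplicate` returns None, and otherwise surface the matching cell so it can be revised instead.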

Admin (2):

  1. hivemem_approve_pending: Admin tool to batch approve or reject agent writes.
  2. hivemem_health: Monitor DB and service state.

Search Signals

The hivemem_search tool combines 5 signals with configurable weights:

| Signal | Default Weight | Description |
| --- | --- | --- |
| Semantic | 0.35 | Vector cosine similarity |
| Keyword | 0.15 | PostgreSQL full-text search (tsvector, BM25-like) |
| Recency | 0.20 | Exponential decay, 90-day half-life |
| Importance | 0.15 | User/agent assigned 1-5 scale |
| Popularity | 0.15 | Access frequency (materialized view) |
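The default weights and the 90-day half-life from the table can be combined into a single score as a weighted sum. The sketch below assumes every signal is normalized to the range 0 to 1 before weighting, and that importance is mapped linearly from its 1-5 scale; both normalizations are assumptions, since the README gives the weights but not the exact formula.

```python
DEFAULT_WEIGHTS = {
    "semantic": 0.35,
    "keyword": 0.15,
    "recency": 0.20,
    "importance": 0.15,
    "popularity": 0.15,
}
HALF_LIFE_DAYS = 90.0


def recency_score(age_days):
    """Exponential decay: 1.0 for a brand-new cell, 0.5 after one half-life."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS)


def combined_score(semantic, keyword, age_days, importance, popularity,
                   weights=DEFAULT_WEIGHTS):
    """Weighted sum of the five signals, each expected in the range 0..1."""
    signals = {
        "semantic": semantic,
        "keyword": keyword,
        "recency": recency_score(age_days),
        "importance": (importance - 1) / 4.0,  # assumed mapping of 1..5 onto 0..1
        "popularity": popularity,
    }
    return sum(weights[name] * signals[name] for name in weights)
```

Because the default weights sum to 1.0, a cell that is perfect on every signal scores 1.0, and a 180-day-old cell contributes only a quarter of the recency weight.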

Progressive Summarization

Every cell supports 4 layers of progressive summarization:

| Layer | Field | Purpose |
| --- | --- | --- |
| L0 | content | Full verbatim text |
| L1 | summary | One-sentence summary for scanning |
| L2 | key_points | 3-5 core takeaways |
| L3 | insight | Personal conclusion / implication |

Plus actionability (actionable / reference / someday / archive) and importance (1-5).
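As a mental model, a cell can be pictured as a record with the four layers plus the two classification fields. The dataclass below is a sketch for illustration only: the default values for actionability and importance and the two helper methods are assumptions, not part of the HiveMem schema.

```python
from dataclasses import dataclass, field

ACTIONABILITY = ("actionable", "reference", "someday", "archive")


@dataclass
class Cell:
    content: str                                     # L0: full verbatim text
    summary: str = ""                                # L1: one-sentence summary
    key_points: list = field(default_factory=list)   # L2: 3-5 core takeaways
    insight: str = ""                                # L3: conclusion / implication
    actionability: str = "reference"                 # assumed default
    importance: int = 3                              # assumed default on the 1-5 scale

    def missing_layers(self):
        """Names of the summarization layers that are still empty."""
        layers = {
            "content": self.content,
            "summary": self.summary,
            "key_points": self.key_points,
            "insight": self.insight,
        }
        return [name for name, value in layers.items() if not value]

    def problems(self):
        """Human-readable list of classification values outside the documented ranges."""
        found = []
        if self.actionability not in ACTIONABILITY:
            found.append(f"unknown actionability: {self.actionability}")
        if not 1 <= self.importance <= 5:
            found.append(f"importance out of range: {self.importance}")
        return found
```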

Authentication & Authorization

Tokens are stored as SHA-256 hashes in PostgreSQL. The plaintext is shown exactly once at creation and never stored. Auth responses are cached with Caffeine (60s TTL, max 1000 entries).
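The store-only-the-hash pattern can be sketched in a few lines. The token length, the hex encoding of the digest, and the in-memory `stored_hashes` mapping below are assumptions for illustration; only the use of SHA-256 and the show-once behavior come from the description above.

```python
import hashlib
import secrets


def hash_token(plaintext):
    """SHA-256 of the token as lowercase hex; this is what would be persisted."""
    return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()


def create_token():
    """Return (plaintext, token_hash). Show the plaintext once, store only the hash."""
    plaintext = secrets.token_urlsafe(32)
    return plaintext, hash_token(plaintext)


def find_token_name(presented, stored_hashes):
    """Look up a presented token by its hash; stored_hashes maps hash -> token name."""
    return stored_hashes.get(hash_token(presented))
```

Because the lookup is keyed on the hash, the plaintext never has to be compared character by character, which is the property the security notes refer to as timing-safe.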

Roles

Each token has one of four roles. The role controls which tools the client sees in tools/list and which it can call.

| Role | Visible tools | Write behavior | Can approve? |
| --- | --- | --- | --- |
| admin | All 30 | status: committed | Yes |
| writer | 28 (no admin tools) | status: committed | No |
| reader | 15 (read only) | Can't write | No |
| agent | 28 (same as writer) | status: pending | No |

The agent role is the key constraint: agents can add knowledge, but every write goes into a pending queue. Only an admin can approve or reject it. This prevents any agent from writing and self-approving in the same session.

created_by is set automatically from the token name. Clients can't override it.
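The role rules above can be summarized as a small decision function. This is a sketch of the described behavior, not the server's code: `stamp_write` and `can_approve` are hypothetical names, and raising PermissionError for roles that cannot write is an assumed error style.

```python
WRITE_STATUS = {
    "admin": "committed",
    "writer": "committed",
    "agent": "pending",   # agent writes wait for admin approval
}


def stamp_write(role, token_name, payload):
    """Return the record as it would be stored for a write made with this token."""
    if role not in WRITE_STATUS:
        raise PermissionError(f"role {role!r} cannot write")
    record = dict(payload)
    record["status"] = WRITE_STATUS[role]
    # created_by always comes from the token, overriding any client-supplied value.
    record["created_by"] = token_name
    return record


def can_approve(role):
    """Only admins may approve or reject pending writes."""
    return role == "admin"
```

Note that a client-supplied `status` or `created_by` in the payload is overwritten, which mirrors the rule that clients cannot override either field.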

Token management

The hivemem-token CLI is included in the Docker image:

docker exec hivemem hivemem-token create <name> --role admin|writer|reader|agent [--expires 90d]

Available commands:

hivemem-token create <name> --role admin|writer|reader|agent [--expires 90d]
hivemem-token list
hivemem-token revoke <name>
hivemem-token info <name>

Security details

  • Rate limiting -- 5 failed auth attempts per IP triggers a 15-minute ban
  • Audit log -- every request logged to /data/audit.log
  • Timing-safe -- token comparison uses SHA-256 hash lookup, not string comparison
  • Path traversal protection -- file import restricted to /data/imports and /tmp
  • Tool call enforcement -- tools/call checked against role permissions, not just tools/list filtering
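The rate-limiting rule (5 failed attempts per IP, then a 15-minute ban) can be sketched with two dictionaries and an injectable clock. The class name is hypothetical, and two details are assumptions because the README does not specify them: failures are counted consecutively with no sliding window, and a successful login resets the counter.

```python
import time


class FailedAuthLimiter:
    def __init__(self, max_failures=5, ban_seconds=15 * 60, clock=time.monotonic):
        self.max_failures = max_failures
        self.ban_seconds = ban_seconds
        self.clock = clock
        self._failures = {}      # ip -> consecutive failed attempts
        self._banned_until = {}  # ip -> clock value at which the ban ends

    def is_banned(self, ip):
        until = self._banned_until.get(ip)
        if until is None:
            return False
        if self.clock() >= until:
            # Ban expired: forget it and start counting from zero again.
            del self._banned_until[ip]
            self._failures.pop(ip, None)
            return False
        return True

    def record_failure(self, ip):
        count = self._failures.get(ip, 0) + 1
        self._failures[ip] = count
        if count >= self.max_failures:
            self._banned_until[ip] = self.clock() + self.ban_seconds

    def record_success(self, ip):
        # Assumption: a successful authentication clears earlier failures.
        self._failures.pop(ip, None)
```

An auth filter would call `is_banned` before checking the token, then `record_failure` or `record_success` depending on the outcome. Injecting `clock` keeps the class testable without waiting 15 minutes.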

Backups

The hivemem-backup script is included in the Docker image. It is also called automatically before embedding reencoding.

# Manual backup (adjust container name if needed)
docker exec hivemem-db pg_dump -U hivemem hivemem | gzip > "hivemem-$(date +%Y%m%d).sql.gz"

To automate daily backups:

# crontab -e
45 1 * * * docker exec hivemem-db pg_dump -U hivemem hivemem | gzip > /path/to/backups/hivemem-$(date +\%Y\%m\%d).sql.gz

LXC/Proxmox users: Schedule a vzdump at 02:00 to capture the full container including the database dumps. This gives you both logical (pg_dump) and physical (filesystem) backup coverage.

Development

Run tests (no deployment needed)

Tests use Testcontainers -- a pgvector/pgvector:pg17 container is started and destroyed per session. Embeddings are stubbed with a fixed test client (deterministic vectors, no external service needed).

cd java-server
mvn test
# Expected result: 264 tests passed

Deploy changes

# Set required env vars first:
export HIVEMEM_JDBC_URL=jdbc:postgresql://postgres:5432/hivemem
export HIVEMEM_DB_USER=hivemem
export HIVEMEM_DB_PASSWORD=secret
export HIVEMEM_EMBEDDING_URL=http://embeddings:8081
export HIVEMEM_API_TOKEN=your-admin-token

./deploy.sh java

The script builds the Docker image, restarts the container, and waits for a successful health check on /mcp.

Migrations

Schema changes are managed by Flyway. Migrations run automatically at Spring Boot application startup.

Migration files live in java-server/src/main/resources/db/migration/ using the Flyway naming convention (V0001__description.sql, V0002__description.sql, etc.).

To add a new migration:

cat > java-server/src/main/resources/db/migration/V0009__my_feature.sql << 'EOF'
CREATE TABLE IF NOT EXISTS my_table (...);
EOF

Deploy the application -- Flyway applies pending migrations on startup.
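Picking the next version number by hand is easy to get wrong once several migrations exist. The helper below is a convenience sketch, not part of the repository: it assumes the four-digit `V0001__description.sql` pattern shown above and derives the next file name from a list of existing names.

```python
import re

MIGRATION_PATTERN = re.compile(r"^V(\d+)__(.+)\.sql$")


def next_migration_name(existing_names, description):
    """Return the next file name in the V0001__description.sql sequence."""
    versions = []
    for name in existing_names:
        match = MIGRATION_PATTERN.match(name)
        if match:
            versions.append(int(match.group(1)))
    next_version = max(versions, default=0) + 1
    # Lowercase the description and collapse anything non-alphanumeric into "_".
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")
    return f"V{next_version:04d}__{slug}.sql"
```

In practice `existing_names` could come from `os.listdir` on the migration directory; files that do not match the pattern are ignored.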

Debugging

docker logs hivemem --tail 50  # Container logs

License

HiveMem is fair-code licensed under the Sustainable Use License.

  • Free for personal use and internal business use
  • Source available -- inspect, modify, learn
  • Commercially restricted -- you can't sell HiveMem as a service

See LICENSING.md for plain-English details and examples.