HiveMem
HiveMem is a personal knowledge management system that enhances cognition through semantic search and a temporal knowledge graph.
<img width="1637" height="811" alt="image" src="https://github.com/user-attachments/assets/b9ceda91-0678-4d9b-bae8-2b5ba69d53d4" />
Personal knowledge system with semantic search, temporal knowledge graph, and progressive summarization.
MCP server backed by PostgreSQL (pgvector) with external embeddings service. 30 tools, append-only versioning, role-based token auth, agent fleet with approval workflow.
Docker image: ghcr.io/ufelmann/hivemem:main
Vision & Research
HiveMem is built on the premise that well-structured external knowledge systems are not just storage -- they extend cognition. Every design decision is grounded in research on how humans process, retain, and retrieve information.
Scientific Foundations
| Theory | Key Insight | HiveMem Consequence |
|---|---|---|
| Working Memory Limitation (Cowan, 2001) | Humans hold ~4 items in working memory | Wake-up context delivers max 15-20 items, prioritized by importance |
| Cognitive Load Theory (Sweller, 1988) | Disorganized information wastes mental resources needed for thinking | Realms/Signals/Topics taxonomy, Blueprints, progressive summarization |
| Extended Mind Thesis (Clark & Chalmers, 1998) | Well-used external tools become genuine extensions of cognition | Proactive capturing, graph traversal for hidden connections, synthesis agents |
| Forgetting Curve (Ebbinghaus, 1885) | 90% of learned information is lost within a week | Immediate capture at session end, proactive storage of decisions |
PKM Frameworks
Zettelkasten (Luhmann) -- Atomic notes + linking. Knowledge emerges from connections, not hierarchies. Luhmann produced 70 books and 400 papers from 90,000 linked notes.
What HiveMem adopts: Atomic cells (one topic per cell), knowledge graph as linking (facts), cell-to-cell tunnels with temporal versioning (related_to, builds_on, contradicts, refines).
What HiveMem does differently: Semi-automatic linking -- LLM agents create tunnels after archiving based on semantic search. Bidirectional traversal. Temporal validity -- notes and tunnels can expire.
PARA (Tiago Forte) -- Projects / Areas / Resources / Archive. Sorted by actionability, not topic.
What HiveMem adopts: Actionability field (actionable / reference / someday / archive). Wake-up prioritizes actionable over reference. Realms map to Areas.
References
- Cowan, N. (2001). The magical number 4 in short-term memory. Behavioral and Brain Sciences, 24(1), 87-114.
- Sweller, J. (1988). Cognitive Load During Problem Solving. Cognitive Science, 12(2), 257-285.
- Clark, A. & Chalmers, D. (1998). The Extended Mind. Analysis, 58(1), 7-19.
- Ebbinghaus, H. (1885). Über das Gedächtnis.
- Ahrens, S. (2017). How to Take Smart Notes. CreateSpace.
- Forte, T. (2022). Building a Second Brain. Atria Books.
Transparency & Trust
- Privacy First: HiveMem is 100% self-hosted. Your data never leaves your infrastructure.
- Auditability: All tool calls and authentication events are logged to /data/audit.log.
- Security: Built-in RBAC (Role-Based Access Control) ensures that agents can only perform actions you approve.
Features
- 30 MCP tools across search, knowledge graph, progressive summarization, agent fleet, references, and admin
- 5-signal ranked search -- semantic similarity + keyword match + recency + importance + popularity
- Append-only versioning -- never lose history, revise with parent_id chains, point-in-time queries
- Progressive summarization (L0-L3) -- content, summary, key_points, insight per cell
- Temporal knowledge graph -- facts with valid_from/valid_until, contradiction detection, multi-hop traversal
- Role-based token auth -- multiple tokens, 4 roles (admin/writer/reader/agent), per-role tool visibility
- Agent fleet with approval workflow -- agents write pending suggestions, only admins approve
- Blueprints -- curated narrative overviews per realm, append-only versioned
- References & reading list -- track sources, link to cells, filter by type/status
- Spring Boot 4.0.5 + Java 25 -- MCP server with jOOQ, Flyway migrations, Caffeine cache
- Automatic embedding reencoding -- detects model changes at startup, re-encodes all vectors with backup and progress tracking
- 264 tests with Testcontainers -- unit, integration, HTTP end-to-end, performance, security, concurrency
Prerequisites
- Docker (v20+)
- An external PostgreSQL database with pgvector extension (e.g. pgvector/pgvector:pg17)
- An external embeddings service reachable via HTTP (see below)
Proxmox LXC users: Docker containers running JDK 25 inside unprivileged LXC containers require --security-opt apparmor=unconfined (or security_opt: [apparmor=unconfined] in Compose). This applies to all services, not just HiveMem.
Embedding Service
HiveMem requires an external embedding service. The default model is paraphrase-multilingual-MiniLM-L12-v2 (384 dimensions). An ONNX-based service is included in embedding-service/.
The service must expose:
- POST /embeddings -- {"text": "...", "mode": "document"} → {"vector": [...], "model": "...", "dimension": N}
- GET /info -- {"model": "...", "dimension": N} (used by HiveMem for model change detection)
Automatic reencoding: When HiveMem detects a model change at startup (different model name or dimension), it automatically backs up the database, re-encodes all cells, and rebuilds the HNSW index. Search is blocked (503) during reencoding.
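The contract above can be sketched from the client side. This is a minimal illustration, not HiveMem's actual client (which is part of the Java server): the helper names `embedding_request`, `parse_embedding`, `model_changed`, and `embed` are hypothetical, and the 5-second timeout simply mirrors the documented `PT5S` default.

```python
import json
import urllib.request


def embedding_request(text: str, mode: str = "document") -> bytes:
    """Build the JSON body for POST /embeddings."""
    return json.dumps({"text": text, "mode": mode}).encode("utf-8")


def parse_embedding(body: bytes) -> list[float]:
    """Extract the vector and check it against the reported dimension."""
    data = json.loads(body)
    vector = data["vector"]
    if len(vector) != data["dimension"]:
        raise ValueError("vector length does not match reported dimension")
    return vector


def model_changed(stored: dict, current: dict) -> bool:
    """True when the GET /info response differs in model name or dimension."""
    return (
        stored["model"] != current["model"]
        or stored["dimension"] != current["dimension"]
    )


def embed(base_url: str, text: str, timeout: float = 5.0) -> list[float]:
    """Call the embeddings service over HTTP (needs a running service)."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/embeddings",
        data=embedding_request(text),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_embedding(response.read())
```

A replacement service only has to satisfy these two endpoints; a change in the reported model name or dimension is what would trigger reencoding.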
To build the embedding service:
cd embedding-service
# You need model files (tokenizer.json + model_quantized.onnx) in slim-model/
docker build -t hivemem-embeddings .
Quick Start
No clone needed. Save this as docker-compose.yml and run docker compose up -d:
services:
  hivemem-db:
    image: pgvector/pgvector:pg17
    container_name: hivemem-db
    environment:
      POSTGRES_DB: hivemem
      POSTGRES_USER: hivemem
      POSTGRES_PASSWORD: ${HIVEMEM_DB_PASSWORD:-changeme}
    volumes:
      - hivemem-pgdata:/var/lib/postgresql/data
    networks:
      - hivemem-net
    restart: unless-stopped
  hivemem-embeddings:
    image: ghcr.io/ufelmann/hivemem-embeddings:main
    container_name: hivemem-embeddings
    networks:
      - hivemem-net
    restart: unless-stopped
  hivemem:
    image: ghcr.io/ufelmann/hivemem:main
    container_name: hivemem
    ports:
      - "8421:8421"
    environment:
      HIVEMEM_JDBC_URL: jdbc:postgresql://hivemem-db:5432/hivemem
      HIVEMEM_DB_USER: hivemem
      HIVEMEM_DB_PASSWORD: ${HIVEMEM_DB_PASSWORD:-changeme}
      HIVEMEM_EMBEDDING_URL: http://hivemem-embeddings:80
    depends_on:
      - hivemem-db
      - hivemem-embeddings
    networks:
      - hivemem-net
    restart: unless-stopped
networks:
  hivemem-net:
volumes:
  hivemem-pgdata:
# Set a password (or it defaults to "changeme")
export HIVEMEM_DB_PASSWORD=your-secret-here
# Start everything
docker compose up -d
# Wait for startup (Flyway migrations run automatically)
docker logs -f hivemem
# Create your first API token
docker exec hivemem hivemem-token create my-admin --role admin
# Save the printed token — it's shown once and never stored
That's it. Three containers, all images from GHCR, no build needed.
Build from source (optional)
git clone https://github.com/ufelmann/HiveMem.git
cd HiveMem
docker build -t hivemem .
At startup, Spring Boot runs Flyway migrations against the configured PostgreSQL database. Check progress:
docker logs -f hivemem
Wait for the Spring Boot startup log and a successful /mcp response before proceeding.
Required Environment Variables
| Variable | Description |
|---|---|
| HIVEMEM_JDBC_URL | JDBC connection string (e.g. jdbc:postgresql://postgres:5432/hivemem) |
| HIVEMEM_DB_USER | PostgreSQL username |
| HIVEMEM_DB_PASSWORD | PostgreSQL password |
| HIVEMEM_EMBEDDING_URL | URL of the external embeddings service |
| HIVEMEM_API_TOKEN | Used by deploy.sh for the health-check smoke test |
Create an API token
Use the hivemem-token CLI. It ships in the published Docker image (see Token management below); if your image does not contain it, copy it into the container first:
docker cp scripts/hivemem-token hivemem:/usr/local/bin/hivemem-token
docker exec hivemem hivemem-token create my-admin --role admin
The plaintext token is printed once and never stored. Save it immediately.
Connect to Claude Code
CLI (recommended):
claude mcp add --scope user hivemem --transport http http://localhost:8421/mcp \
--header "Authorization: Bearer YOUR_TOKEN_HERE"
Restart Claude Code. The 30 HiveMem tools are now available in every session.
Manual config (~/.claude.json for user-level, or .mcp.json for project-level):
{
  "mcpServers": {
    "hivemem": {
      "type": "http",
      "url": "http://localhost:8421/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_TOKEN_HERE"
      }
    }
  }
}
Connect to Claude Desktop
Add to claude_desktop_config.json:
{
  "mcpServers": {
    "hivemem": {
      "type": "http",
      "url": "http://localhost:8421/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_TOKEN_HERE"
      }
    }
  }
}
Teach your agent to use HiveMem
The MCP server ships instructions that tell the agent how to use the 30 tools (call wake_up first, optionally pass dedupe_threshold to add_cell for duplicate detection, etc.). But the agent won't reliably remember to archive unless you tell it to in your own CLAUDE.md.
Add this to your user-level CLAUDE.md (~/.claude/CLAUDE.md) so it applies to every project:
## HiveMem — Persistent Knowledge
You have access to HiveMem via MCP. It is your long-term memory. Use it.
### Session start
- Call `hivemem_wake_up` before your first response. No exceptions.
- If the user asks about past work, decisions, or people: `hivemem_search` first, never guess.
### During conversation — search proactively
Wake_up is a snapshot, not a subscription. As the conversation evolves, the relevant memory changes. Search actively when you see these signals:
- **Named reference.** When the user mentions a named project, person, decision, tool, or system that wasn't in wake_up context → call `hivemem_search` BEFORE answering. Even if you think you remember: verify.
- **Temporal reference.** Phrases like "last week", "a while back", "we decided earlier", "remember when" → call `hivemem_search` (optionally with time filter), or `hivemem_time_machine` for point-in-time queries.
- **Uncertainty.** If you are about to say "I'm not sure", "I don't recall exactly", or hedge with vague language → search FIRST. If the search returns nothing, then hedge.
- **Topic drift.** When the conversation shifts to a new topic area not covered in wake_up → quick `hivemem_search` on the new topic keywords before engaging deeply.
- **Entity-specific.** When the user asks about a specific entity (person, project, technology) → `hivemem_quick_facts` for fast entity lookup, `hivemem_search_kg` for relationship queries.
**Anti-patterns (do NOT do this):**
- Answering from wake_up context when the topic wasn't in wake_up
- Hedging instead of searching ("I think we discussed..." without verifying)
- Batching searches for the end of the conversation
- Assuming the user will prompt you to search — they won't
**Rule of thumb.** One `hivemem_search` call is cheap (~100ms, no cost). Answering wrong or vague because you didn't search is expensive (user frustration, broken trust in memory system).
**Examples — good proactive search:**
- User: "What did we decide about the embedding model?" → call `hivemem_search("embedding model decision")` BEFORE answering, then cite the decision with its date.
- User: "Remember that patch last week?" → call `hivemem_search("patch")` with a recent-date filter, or `hivemem_time_machine` for a point-in-time view.
- User: "How does the auth flow work again?" → call `hivemem_quick_facts("auth")` first to pull structured facts, then `hivemem_search` for the design cell.
### During work
- After completing a significant action (bug fix, feature, design decision, deployment, investigation):
archive it immediately. Do not batch, do not wait for session end.
- Archiving means: `add_cell` with `dedupe_threshold` (one embedding call handles the dedupe gate) → extract facts (`kg_add` with `on_conflict=return` to catch contradictions) → link related cells (`search` → `add_tunnel` for top 2-3 matches).
- When facts change: `kg_invalidate` the old fact first, then `kg_add` the new one.
### Session end
- Before the session ends, archive anything significant that hasn't been stored yet.
- When the user says "archive", "save", or "persist": archive the full session.
### Classification
- Use existing realms and signals. Call `list_realms` before inventing new ones (pass the `realm` param to get signals within a specific realm).
- Realm = major life area, Signal = broad category, Topic = specific topic.
- One cell per topic. Fill ALL layers: content (L0), summary (L1), key_points (L2), insight (L3).
- Every fact needs `valid_from`. Knowledge without timestamps is useless.
### What to archive
- Decisions and their rationale (the "why", not just the "what")
- Discoveries, surprises, lessons learned
- Infrastructure changes, deployment details
- Bug root causes and fixes
- New patterns, conventions, or processes established
### What NOT to archive
- Routine code changes that are obvious from git history
- Temporary debugging steps
- Information already in the project's CLAUDE.md or README
Why user-level? Project-level CLAUDE.md files describe the project. HiveMem is your memory across all projects. A user-level CLAUDE.md ensures every agent, in every repo, knows to persist knowledge — even in repos that have never heard of HiveMem.
Why is the MCP protocol not enough? The MCP instructions field tells the agent how to use the tools correctly (check duplicates, fill all layers, etc.). But it cannot force the agent to decide to archive — that decision depends on the conversation context, which only the CLAUDE.md can influence. The MCP protocol is the "API docs"; the CLAUDE.md is the "job description".
The Structure
HiveMem organizes knowledge in a spatial hierarchy that is easy to navigate. Realms, signals, topics, and cells -- four levels from broad to specific. Tunnels connect cells across the entire structure, revealing hidden relationships in your knowledge.
graph TB
subgraph HM["HiveMem"]
direction TB
subgraph Realm1["Realm: Projects"]
direction TB
subgraph Signal1["Signal: Software"]
direction LR
subgraph Topic1A["Topic: HiveMem"]
D1["Cell<br/><i>L0: content</i><br/><i>L1: summary</i><br/><i>L2: key points</i><br/><i>L3: insight</i>"]
D2["Cell"]
end
subgraph Topic1B["Topic: Website"]
D3["Cell"]
end
end
end
subgraph Realm2["Realm: Knowledge"]
direction TB
subgraph Signal2["Signal: Tech"]
direction LR
subgraph Topic2A["Topic: AI"]
D5["Cell"]
D6["Cell"]
end
subgraph Topic2B["Topic: Security"]
D7["Cell"]
end
end
end
end
D1 <-..->|"builds_on"| D5
D2 <-..->|"related_to"| D3
D6 <-..->|"contradicts"| D7
subgraph KG["Knowledge Graph"]
F1["Fact<br/><i>subject _ predicate _ object</i><br/><i>valid_from / valid_until</i>"]
end
subgraph BP["Blueprint"]
M1["Narrative overview<br/><i>per realm</i>"]
end
D1 -.->|"source"| F1
Realm1 -.-> M1
classDef realm fill:#4a90d9,stroke:#2c5f8a,color:white
classDef signal fill:#5ba85b,stroke:#3d7a3d,color:white
classDef topic fill:#e8a838,stroke:#b8802a,color:white
classDef cell fill:#f5f5f5,stroke:#999,color:#333
classDef kg fill:#c0392b,stroke:#962d22,color:white
classDef bp fill:#9b59b6,stroke:#7d3c98,color:white
classDef hm fill:#f0f4f8,stroke:#4a90d9,color:#333
class Realm1,Realm2 realm
class Signal1,Signal2 signal
class Topic1A,Topic1B,Topic2A,Topic2B topic
class D1,D2,D3,D5,D6,D7 cell
class KG,F1 kg
class BP,M1 bp
class HM hm
Concepts
| Concept | Description | Example |
|---|---|---|
| Realm | Top-level category | "Projects", "Knowledge", "Cooking" |
| Signal | Broad category within a realm | "Software", "Italian Cuisine" |
| Topic | Specific topic within a signal | "HiveMem", "Pasta Recipes" |
| Cell | Single knowledge item with 4 layers (L0-L3) | A design decision, a recipe, a meeting note |
| Tunnel | Passage connecting two cells | builds_on, related_to, contradicts, refines |
| Fact | Atomic knowledge triple in the knowledge graph | "HiveMem → uses → PostgreSQL" with temporal validity |
| Blueprint | Narrative overview of a realm | How signals, topics, and key cells in a realm connect |
How it works
- Store -- Content is classified into realm/signal/topic and stored as a cell with progressive summarization (L0: full content, L1: summary, L2: key points, L3: insight)
- Connect -- Tunnels link related cells across the structure; facts capture atomic relationships in the knowledge graph
- Search -- 5-signal ranked search finds cells by meaning, keywords, recency, importance, and popularity
- Traverse -- Follow tunnels to discover hidden connections; use time machine to see what was known at any point
- Wake up -- Each session starts with identity context and critical facts, so the agent begins already oriented in your knowledge instead of rediscovering it
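The append-only revision chain and point-in-time lookup behind the Traverse step can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not HiveMem's SQL: the class and method names are hypothetical, and it assumes that revising a cell closes the previous version's valid_until at the moment the new version's valid_from begins.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class CellVersion:
    id: str
    parent_id: Optional[str]      # previous version in the revision chain
    content: str
    valid_from: datetime
    valid_until: Optional[datetime] = None  # None means "still current"


class CellLog:
    """Versions are only ever appended; nothing is deleted or overwritten."""

    def __init__(self) -> None:
        self.versions: list[CellVersion] = []

    def add(self, content: str, at: datetime) -> CellVersion:
        version = CellVersion(str(uuid.uuid4()), None, content, at)
        self.versions.append(version)
        return version

    def revise(self, old: CellVersion, content: str, at: datetime) -> CellVersion:
        old.valid_until = at  # assumption: the old version expires here
        version = CellVersion(str(uuid.uuid4()), old.id, content, at)
        self.versions.append(version)
        return version

    def as_of(self, at: datetime) -> list[CellVersion]:
        """Point-in-time view: versions whose validity interval contains `at`."""
        return [
            v for v in self.versions
            if v.valid_from <= at and (v.valid_until is None or at < v.valid_until)
        ]

    def history(self, version: CellVersion) -> list[CellVersion]:
        """Walk parent_id links from the given version back to the original."""
        by_id = {v.id: v for v in self.versions}
        chain = []
        current: Optional[CellVersion] = version
        while current is not None:
            chain.append(current)
            current = by_id.get(current.parent_id) if current.parent_id else None
        return chain
```

Because old versions stay in the log, a query for an earlier timestamp returns what was known then, while the parent_id chain answers how a cell reached its current state.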
Architecture
graph TB
Client["Claude / MCP Client"]
subgraph Container["Docker Container (eclipse-temurin:25-jre)"]
Auth["AuthFilter<br/><i>Token auth + role check + rate limit</i>"]
ToolGate["ToolPermissionService<br/><i>Filter tools/list by role</i>"]
Identity["Identity Injection<br/><i>created_by from token</i>"]
MCP["McpController<br/>:8421<br/><i>30 tools, Streamable HTTP</i>"]
end
EmbSvc["External Embeddings Service<br/><i>HTTP API</i>"]
PG["External PostgreSQL<br/><i>pgvector, Flyway-managed schema</i>"]
Client -->|"MCP over HTTP"| Auth
Auth --> ToolGate
ToolGate --> Identity
Identity --> MCP
MCP -->|"HTTP"| EmbSvc
MCP -->|"JDBC"| PG
Data Model
erDiagram
cells {
UUID id PK
UUID parent_id FK
TEXT content
vector embedding
TEXT realm
TEXT signal
TEXT topic
TEXT summary
TEXT[] key_points
TEXT insight
TEXT actionability
SMALLINT importance
TEXT status
TIMESTAMPTZ valid_from
TIMESTAMPTZ valid_until
}
facts {
UUID id PK
UUID parent_id FK
TEXT subject
TEXT predicate
TEXT object
REAL confidence
UUID source_id FK
TEXT status
TIMESTAMPTZ valid_from
TIMESTAMPTZ valid_until
}
tunnels {
UUID id PK
UUID from_cell FK
UUID to_cell FK
TEXT relation
TEXT note
TEXT status
TEXT created_by
TIMESTAMPTZ valid_from
TIMESTAMPTZ valid_until
}
blueprints {
UUID id PK
TEXT realm
TEXT title
TEXT narrative
TEXT[] signal_order
UUID[] key_cells
TIMESTAMPTZ valid_from
TIMESTAMPTZ valid_until
}
api_tokens {
UUID id PK
TEXT token_hash
TEXT name
TEXT role
TIMESTAMPTZ expires_at
TIMESTAMPTZ revoked_at
}
agents {
TEXT name PK
TEXT focus
JSONB autonomy
TEXT schedule
}
references_ {
UUID id PK
TEXT title
TEXT url
TEXT ref_type
TEXT status
SMALLINT importance
}
cells ||--o{ facts : "source_id"
cells ||--o{ cells : "parent_id (revision chain)"
facts ||--o{ facts : "parent_id (revision chain)"
cells ||--o{ cell_references : "links"
references_ ||--o{ cell_references : "links"
agents ||--o{ agent_diary : "writes"
cells ||--o{ access_log : "tracked"
Security & Capability Matrix
Every HiveMem tool is mapped to a role to enforce least privilege. Write operations and admin functions are protected by RBAC; writes made with an agent token are additionally held as pending until an admin approves them.
| Category | Tools | Access Role | Data Flow | HITL Required? | Description |
|---|---|---|---|---|---|
| Search | search, search_kg, quick_facts, time_machine | reader | Read Only | No | 5-signal semantic & keyword search. |
| Read | status, get_cell, list_realms, traverse, wake_up, get_blueprint, history | reader | Read Only | No | Navigation and context retrieval. |
| Write | add_cell, kg_add, kg_invalidate, revise_cell, revise_fact, update_identity, update_blueprint | agent | Propose Change | Yes (for Agents) | Append-only knowledge capture. |
| Tunnels | add_tunnel, remove_tunnel | agent | Link Discovery | Yes | Cell-to-cell semantic linking. |
| Approval | approve_pending | admin | Commit Change | Yes | Batch approve or reject pending agent writes. |
| Agent | register_agent, list_agents, diary_write, diary_read | admin | Fleet Management | Yes | Autonomous fleet orchestration. |
| References | add_reference, link_reference, reading_list | agent | Metadata | No | Source and citation tracking. |
| Admin | health | admin | System Management | Yes | DB connection, extensions, counts, disk. |
Configuration
| Variable | Default | Description |
|---|---|---|
| HIVEMEM_JDBC_URL | (required) | JDBC connection string to PostgreSQL |
| HIVEMEM_DB_USER | (required) | PostgreSQL username |
| HIVEMEM_DB_PASSWORD | (required) | PostgreSQL password |
| HIVEMEM_EMBEDDING_URL | http://localhost:8081 | URL of the external embeddings service |
| HIVEMEM_EMBEDDING_TIMEOUT | PT5S | HTTP timeout for embedding requests (ISO 8601 duration) |
| SERVER_PORT | 8421 | Port for the MCP server |
Security & Compliance
- SafeSkill Score: 100/100 (Verified Safe). See SafeSkill Report.
- Transparency: 7/7 points. See SAFE.md for the security manifest.
- Audit Logging: Every tool call is logged in JSON to /data/audit.log.
- Human-in-the-Loop: All agent writes require manual approval via hivemem_approve_pending.
Tool List (Full)
Read (15):
- hivemem_status: System overview and counts.
- hivemem_search: Semantic similarity + keyword search.
- hivemem_search_kg: Knowledge graph triple lookup.
- hivemem_get_cell: Read single knowledge item (logs access automatically).
- hivemem_list_realms: Realms with counts; signals of one realm when realm is provided.
- hivemem_traverse: Recursive graph traversal.
- hivemem_quick_facts: Context-aware facts about an entity.
- hivemem_time_machine: Historical knowledge retrieval.
- hivemem_wake_up: Initial session context.
- hivemem_history: Trace revisions of a cell or fact (type-dispatched, recursive CTE depth cap 100).
- hivemem_pending_approvals: List work awaiting review.
- hivemem_get_blueprint: Narrative realm overviews.
- hivemem_reading_list: Manage unread/in-progress sources.
- hivemem_list_agents: View active agent fleet.
- hivemem_diary_read: Read agent diary entries.
Write (13):
- hivemem_add_cell: Store with L0-L3; optional dedupe_threshold runs an embedding-based dedupe gate in one call.
- hivemem_add_tunnel: Link two cells together.
- hivemem_kg_add: Fact triple; optional on_conflict (insert|return|reject) gates against active conflicts.
- hivemem_kg_invalidate: Soft-delete/expire a fact.
- hivemem_update_identity: Update session context facts.
- hivemem_add_reference: Store source documents/URLs.
- hivemem_link_reference: Cite source for a cell.
- hivemem_remove_tunnel: Expire a cell link.
- hivemem_revise_cell: Create a new version of a cell.
- hivemem_revise_fact: Create a new version of a fact.
- hivemem_register_agent: Add an agent to the fleet.
- hivemem_diary_write: Agent-private reflection tool.
- hivemem_update_blueprint: Update realm narrative.
Admin (2):
- hivemem_approve_pending: Admin tool to batch approve or reject agent writes.
- hivemem_health: Monitor DB and service state.
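The on_conflict option of hivemem_kg_add can be illustrated with a small in-memory model. This is a sketch under assumptions, not the server's logic: it assumes a conflict means an active fact (no valid_until) with the same subject and predicate but a different object, and the `FactStore` class, its return shape, and the example triple are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    object: str
    valid_from: str                    # ISO 8601 timestamp
    valid_until: Optional[str] = None  # None means the fact is still active


class FactStore:
    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def active_conflicts(self, subject: str, predicate: str, obj: str) -> list[Fact]:
        # Assumed definition: same subject and predicate, different object, still active.
        return [
            f for f in self.facts
            if f.valid_until is None
            and f.subject == subject
            and f.predicate == predicate
            and f.object != obj
        ]

    def kg_add(self, subject: str, predicate: str, obj: str,
               valid_from: str, on_conflict: str = "insert") -> dict:
        conflicts = self.active_conflicts(subject, predicate, obj)
        if conflicts and on_conflict == "return":
            # Report the conflict and let the caller decide; nothing is written.
            return {"inserted": False, "conflicts": conflicts}
        if conflicts and on_conflict == "reject":
            raise ValueError("conflicts with an active fact")
        self.facts.append(Fact(subject, predicate, obj, valid_from))
        return {"inserted": True, "conflicts": conflicts}

    def kg_invalidate(self, fact: Fact, at: str) -> None:
        fact.valid_until = at  # soft-delete: the row stays, only validity ends
```

With on_conflict set to return, the caller sees the conflicting fact, invalidates it, and then adds the replacement, which matches the "invalidate first, then add" order recommended earlier.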
Search Signals
The hivemem_search tool combines 5 signals with configurable weights:
| Signal | Default Weight | Description |
|---|---|---|
| Semantic | 0.35 | Vector cosine similarity |
| Keyword | 0.15 | PostgreSQL full-text search (tsvector, BM25-like) |
| Recency | 0.20 | Exponential decay, 90-day half-life |
| Importance | 0.15 | User/agent assigned 1-5 scale |
| Popularity | 0.15 | Access frequency (materialized view) |
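As a worked example of how the default weights could combine, the sketch below computes a weighted sum with a 90-day half-life for recency. It is an illustration, not the server's ranking query: it assumes every signal is already normalized to the range 0 to 1, and the linear mapping of the 1-5 importance scale is an assumption of this sketch.

```python
# Default weights from the table above; they sum to 1.0.
WEIGHTS = {
    "semantic": 0.35,
    "keyword": 0.15,
    "recency": 0.20,
    "importance": 0.15,
    "popularity": 0.15,
}
HALF_LIFE_DAYS = 90.0


def recency_score(age_days: float) -> float:
    """Exponential decay: the score halves every HALF_LIFE_DAYS."""
    return 0.5 ** (max(age_days, 0.0) / HALF_LIFE_DAYS)


def importance_score(importance: int) -> float:
    """Assumed normalization: map the 1-5 scale linearly onto 0-1."""
    return (importance - 1) / 4


def combined_score(semantic: float, keyword: float, age_days: float,
                   importance: int, popularity: float,
                   weights: dict = WEIGHTS) -> float:
    signals = {
        "semantic": semantic,
        "keyword": keyword,
        "recency": recency_score(age_days),
        "importance": importance_score(importance),
        "popularity": popularity,
    }
    return sum(weights[name] * value for name, value in signals.items())
```

For a 90-day-old cell with semantic similarity 0.8, keyword score 0.5, importance 3, and popularity 0.2, this gives 0.28 + 0.075 + 0.10 + 0.075 + 0.03 = 0.56.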
Progressive Summarization
Every cell supports 4 layers of progressive summarization:
| Layer | Field | Purpose |
|---|---|---|
| L0 | content | Full verbatim text |
| L1 | summary | One-sentence summary for scanning |
| L2 | key_points | 3-5 core takeaways |
| L3 | insight | Personal conclusion / implication |
Plus actionability (actionable / reference / someday / archive) and importance (1-5).
Authentication & Authorization
Tokens are stored as SHA-256 hashes in PostgreSQL. The plaintext is shown exactly once at creation and never stored. Auth responses are cached with Caffeine (60s TTL, max 1000 entries).
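The hash-only storage scheme can be sketched as follows. This is a minimal in-memory model, not HiveMem's implementation: the token format, the hex encoding of the digest, and the `TokenStore` class are assumptions, and the Caffeine cache is left out.

```python
import hashlib
import secrets
from typing import Optional


def new_token() -> str:
    """Generate a random plaintext token (the format is an assumption)."""
    return secrets.token_urlsafe(32)


def token_hash(token: str) -> str:
    """SHA-256 digest of the token, hex-encoded."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


class TokenStore:
    """Keeps only hashes; the plaintext exists solely in create()'s return value."""

    def __init__(self) -> None:
        self._by_hash: dict[str, dict] = {}

    def create(self, name: str, role: str) -> str:
        token = new_token()
        self._by_hash[token_hash(token)] = {"name": name, "role": role}
        return token  # shown once to the caller, never stored

    def authenticate(self, token: str) -> Optional[dict]:
        # Lookup by hash: the presented token is hashed and used as a key,
        # so plaintext tokens are never compared character by character.
        return self._by_hash.get(token_hash(token))
```

Since only digests are kept, a leaked table reveals no usable tokens, and a lost token cannot be recovered; it has to be revoked and reissued.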
Roles
Each token has one of four roles. The role controls which tools the client sees in tools/list and which it can call.
| Role | Visible tools | Write behavior | Can approve? |
|---|---|---|---|
| admin | All 30 | status: committed | Yes |
| writer | 28 (no admin tools) | status: committed | No |
| reader | 15 (read only) | Can't write | No |
| agent | 28 (same as writer) | status: pending | No |
The agent role is the key constraint: agents can add knowledge, but every write goes into a pending queue. Only an admin can approve or reject it. This prevents any agent from writing and self-approving in the same session.
created_by is set automatically from the token name. Clients can't override it.
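The role rules can be expressed as a small lookup. The sketch below is illustrative rather than the actual ToolPermissionService: tool names are taken from the Tool List section without the hivemem_ prefix, and the function names and return convention are hypothetical.

```python
from typing import Optional

READ_TOOLS = frozenset({
    "status", "search", "search_kg", "get_cell", "list_realms", "traverse",
    "quick_facts", "time_machine", "wake_up", "history", "pending_approvals",
    "get_blueprint", "reading_list", "list_agents", "diary_read",
})
WRITE_TOOLS = frozenset({
    "add_cell", "add_tunnel", "kg_add", "kg_invalidate", "update_identity",
    "add_reference", "link_reference", "remove_tunnel", "revise_cell",
    "revise_fact", "register_agent", "diary_write", "update_blueprint",
})
ADMIN_TOOLS = frozenset({"approve_pending", "health"})

VISIBLE = {
    "admin": READ_TOOLS | WRITE_TOOLS | ADMIN_TOOLS,
    "writer": READ_TOOLS | WRITE_TOOLS,
    "agent": READ_TOOLS | WRITE_TOOLS,
    "reader": READ_TOOLS,
}
# Status stamped on a write, per role. Readers never reach a write tool.
WRITE_STATUS = {"admin": "committed", "writer": "committed", "agent": "pending"}


def visible_tools(role: str) -> frozenset:
    """What tools/list would show for this role."""
    return VISIBLE[role]


def authorize_call(role: str, tool: str) -> Optional[str]:
    """Check a tools/call; return the write status, or None for non-write tools."""
    if tool not in VISIBLE.get(role, frozenset()):
        raise PermissionError(f"role {role!r} may not call {tool!r}")
    if tool in WRITE_TOOLS:
        return WRITE_STATUS[role]
    return None
```

The same table drives both listing and calling, so hiding a tool from tools/list and rejecting it in tools/call cannot drift apart.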
Token management
The hivemem-token CLI is included in the Docker image:
docker exec hivemem hivemem-token create <name> --role admin|writer|reader|agent [--expires 90d]
Available commands:
hivemem-token create <name> --role admin|writer|reader|agent [--expires 90d]
hivemem-token list
hivemem-token revoke <name>
hivemem-token info <name>
Security details
- Rate limiting -- 5 failed auth attempts per IP triggers a 15-minute ban
- Audit log -- every request logged to /data/audit.log
- Timing-safe -- token comparison uses SHA-256 hash lookup, not string comparison
- Path traversal protection -- file import restricted to /data/imports and /tmp
- Tool call enforcement -- tools/call checked against role permissions, not just tools/list filtering
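The rate-limiting rule (5 failed attempts per IP, 15-minute ban) can be sketched as below. This is a minimal model under assumptions, not the AuthFilter code: the class name is hypothetical, time is passed in explicitly as seconds to keep it testable, and resetting the counter on a successful login is an assumption of this sketch.

```python
class FailedAuthLimiter:
    """Ban an IP for a fixed period after too many failed auth attempts."""

    def __init__(self, max_failures: int = 5, ban_seconds: float = 15 * 60) -> None:
        self.max_failures = max_failures
        self.ban_seconds = ban_seconds
        self._failures: dict[str, int] = {}
        self._banned_until: dict[str, float] = {}

    def is_banned(self, ip: str, now: float) -> bool:
        until = self._banned_until.get(ip)
        if until is None:
            return False
        if now >= until:
            del self._banned_until[ip]  # ban expired
            return False
        return True

    def record_failure(self, ip: str, now: float) -> None:
        count = self._failures.get(ip, 0) + 1
        if count >= self.max_failures:
            self._banned_until[ip] = now + self.ban_seconds
            self._failures.pop(ip, None)  # start fresh after the ban
        else:
            self._failures[ip] = count

    def record_success(self, ip: str) -> None:
        # Assumption: a successful login clears the failure counter.
        self._failures.pop(ip, None)
```

A request filter would call is_banned before checking the token, then record_failure or record_success depending on the outcome.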
Backups
The hivemem-backup script is included in the Docker image. It is also called automatically before embedding reencoding.
# Manual backup (adjust container name if needed)
docker exec hivemem-db pg_dump -U hivemem hivemem | gzip > "hivemem-$(date +%Y%m%d).sql.gz"
To automate daily backups:
# crontab -e
45 1 * * * docker exec hivemem-db pg_dump -U hivemem hivemem | gzip > /path/to/backups/hivemem-$(date +\%Y\%m\%d).sql.gz
LXC/Proxmox users: Schedule a vzdump at 02:00 to capture the full container including the database dumps. This gives you both logical (pg_dump) and physical (filesystem) backup coverage.
Development
Run tests (no deployment needed)
Tests use Testcontainers -- a pgvector/pgvector:pg17 container is started and destroyed per session. Embeddings are stubbed with a fixed test client (deterministic vectors, no external service needed).
cd java-server
mvn test
264 tests passed
Deploy changes
# Set required env vars first:
export HIVEMEM_JDBC_URL=jdbc:postgresql://postgres:5432/hivemem
export HIVEMEM_DB_USER=hivemem
export HIVEMEM_DB_PASSWORD=secret
export HIVEMEM_EMBEDDING_URL=http://embeddings:8081
export HIVEMEM_API_TOKEN=your-admin-token
./deploy.sh java
The script builds the Docker image, restarts the container, and waits for a successful health check on /mcp.
Migrations
Schema changes are managed by Flyway. Migrations run automatically at Spring Boot application startup.
Migration files live in java-server/src/main/resources/db/migration/ using the Flyway naming convention (V0001__description.sql, V0002__description.sql, etc.).
To add a new migration:
cat > java-server/src/main/resources/db/migration/V0009__my_feature.sql << 'EOF'
CREATE TABLE IF NOT EXISTS my_table (...);
EOF
Deploy the application -- Flyway applies pending migrations on startup.
Debugging
docker logs hivemem --tail 50 # Container logs
License
HiveMem is fair-code licensed under the Sustainable Use License.
- Free for personal use and internal business use
- Source available -- inspect, modify, learn
- Commercially restricted -- you can't sell HiveMem as a service
See LICENSING.md for plain-English details and examples.