Neo4j Integration Guide
Neo4j is the heart of our episodic memory system, storing conversation history, user interactions, and temporal relationships in a graph database. This guide explains why we chose Neo4j and how to work with it effectively.
🎯 Why Neo4j for Episodic Memory?
The Problem with Traditional Databases
Traditional relational databases (like MySQL, PostgreSQL) are excellent for structured data, but they struggle with:
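- Complex Relationships: Many-to-many relationships need junction tables, and each extra hop adds another JOIN
- Temporal Queries: Walking a sequence of events (what came before or after) needs self-joins or recursive queries
- Schema Changes: Adding a new relationship type usually means a new table or a migration
- Graph Traversals: Finding data several hops away requires long, costly chains of JOINs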
Why Neo4j Excels
Neo4j is a graph database designed specifically for relationship-heavy data:
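- Native Graph Structure: Relationships are stored as first-class records, not reconstructed with JOINs
- Cypher Query Language: A declarative, pattern-based language for graph queries
- Flexible Schema: New node labels and relationship types can be added without migrations
- Performance: A traversal follows stored relationships directly, so its cost depends on how much of the graph it visits rather than on the total database size
- Temporal Support: Native date, datetime, and duration types, with functions for time-based filtering and arithmetic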
🏗️ Graph Data Model
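Core Node Types
The graph is built around three core node types: User, Session, and EpisodicMemory. Memories can also link to Concept and Tag nodes (see Semantic Relationships).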
Node Properties
User Node
(:User {
  id: "user-123",
  username: "john_doe",
  createdAt: datetime("2024-01-01T00:00:00Z"),
  // Property values must be primitives or arrays, so preferences are stored as flat properties
  language: "en",
  timezone: "UTC",
  theme: "dark"
})
Session Node
(:Session {
  id: "session-456",
  userId: "user-123",
  startedAt: datetime("2024-01-15T10:00:00Z"),
  endedAt: datetime("2024-01-15T11:30:00Z"),
  platform: "web",
  userAgent: "Mozilla/5.0...",
  // Metadata is stored as top-level properties (nested maps are not allowed)
  ipAddress: "192.168.1.100",
  location: "San Francisco, CA"
})
EpisodicMemory Node
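(:EpisodicMemory {
  id: "memory-789",
  content: "User asked about React hooks and state management",
  // Stored as a datetime so time-window comparisons such as datetime() - duration('P7D') work
  timestamp: datetime("2024-01-15T10:30:00Z"),
  importance: 0.8,
  source: "chat",
  type: "conversation",
  // Metadata fields are stored as top-level properties (nested maps are not allowed)
  conversationTurn: 5,
  topic: "React",
  sentiment: "positive",
  confidence: 0.9
})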
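Neo4j property values must be primitives (strings, numbers, booleans, temporal or spatial values) or arrays of a single primitive type; a nested map, for example a `metadata` object arriving from a chat pipeline, is rejected on write. A minimal sketch of a hypothetical `flattenProperties` helper that turns nested objects into a single-level map before it is sent as a query parameter, assuming an underscore separator is acceptable in property names:

```typescript
// Values this sketch passes through as Neo4j properties.
type Primitive = string | number | boolean;
type PropertyValue = Primitive | Primitive[];

// Hypothetical helper: flattens nested plain objects into one level, joining
// keys with "_", e.g. { analysis: { sentiment: "positive" } } -> { analysis_sentiment: "positive" }.
function flattenProperties(
  input: Record<string, unknown>,
  prefix = "",
  out: Record<string, PropertyValue> = {}
): Record<string, PropertyValue> {
  for (const [key, value] of Object.entries(input)) {
    const name = prefix ? `${prefix}_${key}` : key;
    if (value === null || value === undefined) {
      continue; // Neo4j does not store null properties, so skip them
    }
    if (value instanceof Date) {
      out[name] = value.toISOString(); // ISO 8601 string; wrap in datetime() in Cypher if needed
    } else if (Array.isArray(value)) {
      // Arrays must contain a single primitive type
      const kinds = new Set(value.map((v) => typeof v));
      const primitiveOnly = Array.from(kinds).every(
        (k) => k === "string" || k === "number" || k === "boolean"
      );
      if (!primitiveOnly || kinds.size > 1) {
        throw new Error(`Property "${name}" must be an array of one primitive type`);
      }
      out[name] = value as Primitive[];
    } else if (typeof value === "object") {
      // Recurse into nested maps, prefixing their keys
      flattenProperties(value as Record<string, unknown>, name, out);
    } else if (typeof value === "string" || typeof value === "number" || typeof value === "boolean") {
      out[name] = value;
    } else {
      throw new Error(`Property "${name}" has unsupported type ${typeof value}`);
    }
  }
  return out;
}

// Usage: a hypothetical nested metadata payload flattened before writing
const metadata = flattenProperties({
  conversationTurn: 5,
  topic: "React",
  analysis: { sentiment: "positive", confidence: 0.9 },
});
// -> { conversationTurn: 5, topic: "React", analysis_sentiment: "positive", analysis_confidence: 0.9 }
```

Rejecting mixed or nested arrays early gives a clear application error instead of a database type error in the middle of a write.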
Relationship Types
Temporal Relationships
// memory1 links back to the memory that came before it
(memory1:EpisodicMemory)-[:PREVIOUS]->(memory2:EpisodicMemory)
// memory1 links forward to the memory that came after it
(memory1:EpisodicMemory)-[:NEXT]->(memory2:EpisodicMemory)
// Memory in same conversation
(memory1:EpisodicMemory)-[:IN_SAME_CONVERSATION]->(memory2:EpisodicMemory)
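The PREVIOUS/NEXT chain has to be derived from the memories' timestamps. A minimal sketch, assuming memories are linked within one session and that a hypothetical `buildSequencePairs` helper runs in application code; the Cypher sends all pairs as one list parameter and uses `MERGE` so re-running it does not create duplicate links:

```typescript
interface MemoryStub {
  id: string;
  timestamp: string; // ISO 8601 with offset, e.g. "2024-01-15T10:30:00Z"
}

interface SequencePair {
  earlierId: string;
  laterId: string;
}

// Hypothetical helper: orders memories by time and pairs each one with its successor.
function buildSequencePairs(memories: MemoryStub[]): SequencePair[] {
  const ordered = memories
    .slice()
    .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
  const pairs: SequencePair[] = [];
  for (let i = 1; i < ordered.length; i++) {
    pairs.push({ earlierId: ordered[i - 1].id, laterId: ordered[i].id });
  }
  return pairs;
}

// Links every pair in both directions, matching the PREVIOUS/NEXT patterns above.
const LINK_SEQUENCE_CYPHER = `
UNWIND $pairs AS pair
MATCH (earlier:EpisodicMemory {id: pair.earlierId})
MATCH (later:EpisodicMemory {id: pair.laterId})
MERGE (earlier)-[:NEXT]->(later)
MERGE (later)-[:PREVIOUS]->(earlier)
`;

// Input order does not matter; pairs follow timestamp order (788 -> 789 -> 790)
const pairs = buildSequencePairs([
  { id: "memory-790", timestamp: "2024-01-15T10:31:00Z" },
  { id: "memory-788", timestamp: "2024-01-15T10:29:00Z" },
  { id: "memory-789", timestamp: "2024-01-15T10:30:00Z" },
]);
// e.g. await session.run(LINK_SEQUENCE_CYPHER, { pairs });
```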
Hierarchical Relationships
// User owns session
(user:User)-[:OWNS]->(session:Session)
// Session contains memory
(session:Session)-[:CONTAINS]->(memory:EpisodicMemory)
// Memory belongs to user
(memory:EpisodicMemory)-[:BELONGS_TO]->(user:User)
Semantic Relationships
// Related memories
(memory1:EpisodicMemory)-[:RELATED_TO {strength: 0.7}]->(memory2:EpisodicMemory)
// Memory references concept
(memory:EpisodicMemory)-[:REFERENCES]->(concept:Concept)
// Memory has tag
(memory:EpisodicMemory)-[:HAS_TAG]->(tag:Tag)
🔍 Cypher Query Examples
Basic Memory Retrieval
Get All Memories for a User
MATCH (u:User {id: "user-123"})-[:OWNS]->(s:Session)-[:CONTAINS]->(m:EpisodicMemory)
RETURN m.content, m.timestamp, m.importance
ORDER BY m.timestamp DESC
Get Recent Memories
MATCH (u:User {id: "user-123"})-[:OWNS]->(s:Session)-[:CONTAINS]->(m:EpisodicMemory)
WHERE m.timestamp > datetime() - duration('P7D') // Last 7 days
RETURN m
ORDER BY m.timestamp DESC
LIMIT 10
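The 7-day window and the limit are hard-coded above; in application code they are better passed as parameters. A minimal sketch, assuming a hypothetical `recentMemoriesQuery` helper that returns Cypher plus parameters. `duration()` accepts an ISO 8601 duration string such as `P7D`, and `toInteger($limit)` is used because the JavaScript driver sends plain numbers as floats, which `LIMIT` rejects:

```typescript
interface CypherQuery {
  cypher: string;
  params: Record<string, unknown>;
}

// Hypothetical helper: builds the "recent memories" query with a parameterized time window.
function recentMemoriesQuery(userId: string, days: number, limit: number): CypherQuery {
  if (!Number.isInteger(days) || days <= 0) {
    throw new Error("days must be a positive integer");
  }
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new Error("limit must be a positive integer");
  }
  return {
    cypher: `
      MATCH (u:User {id: $userId})-[:OWNS]->(:Session)-[:CONTAINS]->(m:EpisodicMemory)
      WHERE m.timestamp > datetime() - duration($window)
      RETURN m
      ORDER BY m.timestamp DESC
      LIMIT toInteger($limit)`,
    // The window is expressed as an ISO 8601 duration string
    params: { userId, window: `P${days}D`, limit },
  };
}

const q = recentMemoriesQuery("user-123", 7, 10);
// q.params -> { userId: "user-123", window: "P7D", limit: 10 }
```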
Search Memories by Content
MATCH (u:User {id: "user-123"})-[:OWNS]->(s:Session)-[:CONTAINS]->(m:EpisodicMemory)
WHERE m.content CONTAINS "React" OR m.content CONTAINS "hooks"
RETURN m.content, m.timestamp, m.importance
ORDER BY m.importance DESC
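`CONTAINS` is case-sensitive, and chaining one `OR` clause per term does not scale to user-supplied search terms. A minimal sketch, assuming a hypothetical `contentSearchQuery` helper, that passes all terms as one list parameter and matches them case-insensitively with `toLower()`. Wrapping the property in `toLower()` means Neo4j cannot use a text index on `m.content` for this filter:

```typescript
// Hypothetical helper: builds a parameterized, case-insensitive "any term matches" search.
function contentSearchQuery(userId: string, rawTerms: string[]) {
  // Normalize: trim, lowercase, drop empty strings and duplicates
  const terms = Array.from(
    new Set(rawTerms.map((t) => t.trim().toLowerCase()).filter((t) => t.length > 0))
  );
  if (terms.length === 0) {
    throw new Error("at least one non-empty search term is required");
  }
  const cypher = `
    MATCH (u:User {id: $userId})-[:OWNS]->(:Session)-[:CONTAINS]->(m:EpisodicMemory)
    WHERE any(term IN $terms WHERE toLower(m.content) CONTAINS term)
    RETURN m.content, m.timestamp, m.importance
    ORDER BY m.importance DESC`;
  return { cypher, params: { userId, terms } };
}

const search = contentSearchQuery("user-123", ["React", " hooks ", "react"]);
// search.params.terms -> ["react", "hooks"]
```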
Advanced Relationship Queries
Find Related Memories
MATCH (m1:EpisodicMemory {id: "memory-789"})-[:RELATED_TO]->(m2:EpisodicMemory)
RETURN m2.content, m2.timestamp
ORDER BY m2.timestamp DESC
Get Memory Context (Surrounding Memories)
MATCH (m:EpisodicMemory {id: "memory-789"})
OPTIONAL MATCH (m)-[:PREVIOUS]->(prev:EpisodicMemory)
OPTIONAL MATCH (m)-[:NEXT]->(next:EpisodicMemory)
RETURN prev.content as previous, m.content as current, next.content as next
Find Memories by Tag
MATCH (u:User {id: "user-123"})-[:OWNS]->(s:Session)-[:CONTAINS]->(m:EpisodicMemory)-[:HAS_TAG]->(t:Tag {name: "React"})
RETURN m.content, m.timestamp
ORDER BY m.timestamp DESC
Complex Analysis Queries
Memory Importance Over Time
MATCH (u:User {id: "user-123"})-[:OWNS]->(s:Session)-[:CONTAINS]->(m:EpisodicMemory)
WHERE m.timestamp > datetime() - duration('P30D')
RETURN date(m.timestamp) as date, avg(m.importance) as avg_importance, count(m) as memory_count
ORDER BY date DESC
Find Conversation Patterns
MATCH (u:User {id: "user-123"})-[:OWNS]->(s:Session)-[:CONTAINS]->(m:EpisodicMemory)
WHERE m.type = "conversation"
// Aggregate per session: count and average importance in one step
WITH s, count(m) AS conversation_length, avg(m.importance) AS avg_importance
WHERE conversation_length > 5 // Sessions with more than 5 conversation memories
RETURN s.id, conversation_length, avg_importance
ORDER BY conversation_length DESC
🚀 Practical Implementation
Setting Up Neo4j
1. Installation
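# Using Docker (recommended). Pin a specific version tag instead of "latest"
# for reproducible setups; the named volume keeps data if the container is recreated.
docker run -d \
--name neo4j \
-p 7474:7474 -p 7687:7687 \
-v neo4j_data:/data \
-e NEO4J_AUTH=neo4j/your_password \
-e NEO4J_PLUGINS='["apoc"]' \
neo4j:latest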
# Or download Neo4j Desktop from https://neo4j.com/download/
2. Environment Configuration
# packages/server/.env
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your_password
NEO4J_DATABASE=neo4j
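The connection test in the next step reads these variables with non-null assertions (`!`), so a missing value only surfaces later as a confusing driver error. A minimal sketch of a hypothetical `loadNeo4jConfig` helper that validates them up front, assuming `neo4j` (Neo4j's default database name) as the fallback when `NEO4J_DATABASE` is unset:

```typescript
interface Neo4jConfig {
  uri: string;
  username: string;
  password: string;
  database: string;
}

// Hypothetical helper: validates the NEO4J_* variables and reports every missing one at once.
function loadNeo4jConfig(env: Record<string, string | undefined>): Neo4jConfig {
  const required = ["NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD"];
  const missing = required.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing Neo4j environment variables: ${missing.join(", ")}`);
  }
  const uri = env.NEO4J_URI as string;
  // The JavaScript driver accepts bolt:// and neo4j://, plus their +s / +ssc TLS variants
  if (!/^(bolt|neo4j)(\+s|\+ssc)?:\/\//.test(uri)) {
    throw new Error(`NEO4J_URI has an unsupported scheme: ${uri}`);
  }
  return {
    uri,
    username: env.NEO4J_USERNAME as string,
    password: env.NEO4J_PASSWORD as string,
    database: env.NEO4J_DATABASE || "neo4j", // fall back to the default database
  };
}

// Usage in the server: const config = loadNeo4jConfig(process.env);
```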
3. Connection Test
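import neo4j from 'neo4j-driver';
// One driver per application: it manages a connection pool
const driver = neo4j.driver(
  process.env.NEO4J_URI!,
  neo4j.auth.basic(process.env.NEO4J_USERNAME!, process.env.NEO4J_PASSWORD!)
);
// Fails fast if the server is unreachable or the credentials are wrong
// (top-level await requires an ES module context)
await driver.verifyConnectivity();
// Run a trivial query against the configured database
const session = driver.session({ database: process.env.NEO4J_DATABASE });
try {
  const result = await session.run('RETURN 1 AS test');
  // Cypher integers arrive as neo4j Integer objects; toNumber() converts them
  console.log('Neo4j connected:', result.records[0].get('test').toNumber());
} finally {
  await session.close();
}
// Keep the driver open for the application's lifetime; call driver.close() on shutdown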
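A freshly started container can take several seconds before the Bolt port accepts connections, so the first connection attempt may fail even when the setup is correct. A minimal sketch of a generic retry wrapper with exponential backoff; the `withRetry` name and the attempt and delay defaults are assumptions, and in practice it would wrap a call such as `driver.verifyConnectivity()`:

```typescript
// Hypothetical helper: retries an async operation, doubling the wait after each failure.
async function withRetry<T>(
  operation: () => Promise<T>,
  attempts = 5,
  initialDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        const delay = initialDelayMs * 2 ** (attempt - 1); // 500, 1000, 2000, ... ms
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  // All attempts failed: surface the last error to the caller
  throw lastError;
}

// Usage: await withRetry(() => driver.verifyConnectivity());
```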
Memory Operations
Store Episodic Memory
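// Assumed shape of the input; metadata values must be primitives or arrays of primitives
interface EpisodicMemoryData {
  id: string;
  userId: string;
  sessionId: string;
  content: string;
  timestamp: string; // ISO 8601, e.g. "2024-01-15T10:30:00Z"
  importance: number;
  source: string;
  type: string;
  metadata: Record<string, string | number | boolean>;
}
async function storeEpisodicMemory(memoryData: EpisodicMemoryData) {
  const session = driver.session({ database: process.env.NEO4J_DATABASE });
  try {
    const result = await session.run(`
      MATCH (u:User {id: $userId})
      MATCH (s:Session {id: $sessionId})
      CREATE (m:EpisodicMemory {
        id: $id,
        content: $content,
        timestamp: datetime($timestamp),
        importance: $importance,
        source: $source,
        type: $type
      })
      // A map cannot be stored as one property, so copy the metadata keys onto the node
      SET m += $metadata
      CREATE (s)-[:CONTAINS]->(m)
      CREATE (m)-[:BELONGS_TO]->(u)
      RETURN m
    `, {
      userId: memoryData.userId,
      sessionId: memoryData.sessionId,
      id: memoryData.id,
      content: memoryData.content,
      timestamp: memoryData.timestamp,
      importance: memoryData.importance,
      source: memoryData.source,
      type: memoryData.type,
      metadata: memoryData.metadata
    });
    // No record means the user or session was not found, so nothing was created
    if (result.records.length === 0) {
      throw new Error(`User ${memoryData.userId} or session ${memoryData.sessionId} not found`);
    }
    return result.records[0].get('m');
  } finally {
    await session.close();
  }
}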
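The store function passes its input straight to Cypher without checks. A minimal sketch of a hypothetical `validateEpisodicMemory` function that could run first, assuming importance is a score between 0 and 1 (as the examples suggest, not a documented rule) and requiring an explicit time zone offset so `datetime()` does not fall back to the server's default zone:

```typescript
// Minimal input shape for this sketch; mirrors fields the store function sends.
interface MemoryInput {
  id: string;
  userId: string;
  sessionId: string;
  content: string;
  timestamp: string;
  importance: number;
}

// Hypothetical validator: returns a list of problems (empty means the input looks storable).
function validateEpisodicMemory(input: MemoryInput): string[] {
  const problems: string[] = [];
  for (const key of ["id", "userId", "sessionId", "content"] as const) {
    if (input[key].trim() === "") problems.push(`${key} must not be empty`);
  }
  // ISO 8601 date-time with Z or +hh:mm offset, and it must parse to a real date
  const isoWithOffset = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})$/;
  if (!isoWithOffset.test(input.timestamp) || Number.isNaN(Date.parse(input.timestamp))) {
    problems.push("timestamp must be ISO 8601 with a time zone offset");
  }
  // Assumption: importance is a score in [0, 1]; NaN also fails this check
  if (!(input.importance >= 0 && input.importance <= 1)) {
    problems.push("importance must be between 0 and 1");
  }
  return problems;
}

const problems = validateEpisodicMemory({
  id: "memory-789",
  userId: "user-123",
  sessionId: "session-456",
  content: "User asked about React hooks and state management",
  timestamp: "2024-01-15T10:30:00Z",
  importance: 0.8,
});
// problems -> [] (the example memory passes)
```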
Search Memories
async function searchMemories(userId: string, query: string, limit: number = 10) {
  const session = driver.session({ database: process.env.NEO4J_DATABASE });
  try {
    const result = await session.run(`
      MATCH (u:User {id: $userId})-[:OWNS]->(s:Session)-[:CONTAINS]->(m:EpisodicMemory)
      WHERE m.content CONTAINS $query // case-sensitive substring match
      RETURN m.content, m.timestamp, m.importance
      ORDER BY m.importance DESC, m.timestamp DESC
      LIMIT $limit
    `, { userId, query, limit: neo4j.int(limit) }); // LIMIT needs an integer; plain JS numbers are sent as floats
    return result.records.map(record => ({
      content: record.get('m.content'),
      timestamp: record.get('m.timestamp'), // neo4j DateTime object; toString() gives ISO 8601
      importance: record.get('m.importance')
    }));
  } finally {
    await session.close();
  }
}
Create Memory Relationships
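// Cypher does not accept a plain parameter as a relationship type, so only allow known types
const MEMORY_RELATIONSHIP_TYPES = ['RELATED_TO', 'PREVIOUS', 'NEXT', 'IN_SAME_CONVERSATION'] as const;
type MemoryRelationshipType = typeof MEMORY_RELATIONSHIP_TYPES[number];
async function createMemoryRelationship(
  memoryId1: string,
  memoryId2: string,
  relationshipType: MemoryRelationshipType,
  strength: number = 1.0
) {
  // Runtime check too: callers may pass unchecked strings (e.g. from a request body)
  if (!MEMORY_RELATIONSHIP_TYPES.includes(relationshipType)) {
    throw new Error(`Unsupported relationship type: ${relationshipType}`);
  }
  const session = driver.session({ database: process.env.NEO4J_DATABASE });
  try {
    await session.run(`
      MATCH (m1:EpisodicMemory {id: $memoryId1})
      MATCH (m2:EpisodicMemory {id: $memoryId2})
      CREATE (m1)-[:${relationshipType} {strength: $strength}]->(m2)
    `, { memoryId1, memoryId2, strength });
  } finally {
    await session.close();
  }
}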
📊 Performance Optimization
Indexing Strategy
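// Uniqueness constraints on ids (each also creates an index for id lookups)
CREATE CONSTRAINT user_id_unique IF NOT EXISTS FOR (u:User) REQUIRE u.id IS UNIQUE;
CREATE CONSTRAINT session_id_unique IF NOT EXISTS FOR (s:Session) REQUIRE s.id IS UNIQUE;
CREATE CONSTRAINT memory_id_unique IF NOT EXISTS FOR (m:EpisodicMemory) REQUIRE m.id IS UNIQUE;
// Range indexes for filtering and sorting
CREATE INDEX memory_timestamp_index IF NOT EXISTS FOR (m:EpisodicMemory) ON (m.timestamp);
CREATE INDEX memory_importance_index IF NOT EXISTS FOR (m:EpisodicMemory) ON (m.importance);
// Text index: supports CONTAINS and ENDS WITH searches on content
CREATE TEXT INDEX memory_content_text_index IF NOT EXISTS FOR (m:EpisodicMemory) ON (m.content);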
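Schema statements like these can be applied at application startup. The driver runs one Cypher statement per call, so a semicolon-separated script cannot be sent as a single query. A minimal sketch, assuming a hypothetical `applySchema` function that receives a `run` callback (for example `(cypher) => session.run(cypher)` from neo4j-driver) and relies on `IF NOT EXISTS` so repeated runs are harmless; the statement list is illustrative:

```typescript
// Each entry is sent separately; IF NOT EXISTS makes repeated runs a no-op.
const SCHEMA_STATEMENTS: string[] = [
  "CREATE CONSTRAINT memory_id_unique IF NOT EXISTS FOR (m:EpisodicMemory) REQUIRE m.id IS UNIQUE",
  "CREATE INDEX memory_timestamp_index IF NOT EXISTS FOR (m:EpisodicMemory) ON (m.timestamp)",
  "CREATE TEXT INDEX memory_content_text_index IF NOT EXISTS FOR (m:EpisodicMemory) ON (m.content)",
];

// Hypothetical helper: applies statements in order; `run` is injected so any session works.
async function applySchema(
  run: (cypher: string) => Promise<unknown>,
  statements: string[] = SCHEMA_STATEMENTS
): Promise<number> {
  let applied = 0;
  for (const statement of statements) {
    await run(statement); // stops at the first failing statement
    applied++;
  }
  return applied;
}

// Usage with neo4j-driver:
// const session = driver.session();
// try { await applySchema((cypher) => session.run(cypher)); } finally { await session.close(); }
```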
Query Optimization Tips
1. Use Specific Node Labels
// Good: Specific label
MATCH (u:User {id: "user-123"})
// Bad: No label, so every node in the database must be scanned
MATCH (n {id: "user-123"})
2. Filter Before Sorting and Limiting
// Good: WHERE narrows the rows before ORDER BY and LIMIT run
MATCH (u:User {id: "user-123"})-[:OWNS]->(s:Session)-[:CONTAINS]->(m:EpisodicMemory)
WHERE m.timestamp > datetime() - duration('P7D')
RETURN m
ORDER BY m.timestamp DESC
LIMIT 10
// Bad: every memory the user has is sorted before LIMIT keeps 10
MATCH (u:User {id: "user-123"})-[:OWNS]->(s:Session)-[:CONTAINS]->(m:EpisodicMemory)
RETURN m
ORDER BY m.timestamp DESC
LIMIT 10
3. Use EXPLAIN and PROFILE
// EXPLAIN shows the planned operators without running the query
EXPLAIN MATCH (u:User {id: "user-123"})-[:OWNS]->(s:Session)-[:CONTAINS]->(m:EpisodicMemory)
RETURN m
// PROFILE runs the query and reports rows and database hits per operator
PROFILE MATCH (u:User {id: "user-123"})-[:OWNS]->(s:Session)-[:CONTAINS]->(m:EpisodicMemory)
RETURN m
🔧 Troubleshooting
Common Issues
1. Connection Problems
# Check if Neo4j is running
docker ps | grep neo4j
# Check logs
docker logs neo4j
# Test connection (cypher-shell is bundled in the Neo4j image)
docker exec -it neo4j cypher-shell -u neo4j -p your_password
2. Query Performance Issues
// Check query plan
EXPLAIN MATCH (u:User {id: "user-123"})-[:OWNS]->(s:Session)-[:CONTAINS]->(m:EpisodicMemory)
RETURN m
// Check indexes
SHOW INDEXES
// Check constraints
SHOW CONSTRAINTS
3. Memory Issues
// Check memory-related configuration settings
// (newer Neo4j versions replace dbms.listConfig() with SHOW SETTINGS)
CALL dbms.listConfig() YIELD name, value
WHERE name CONTAINS "memory"
RETURN name, value
// Check node counts
MATCH (n) RETURN labels(n) as label, count(n) as count
ORDER BY count DESC
🎯 Best Practices
1. Data Modeling
- Keep nodes simple: Store only essential properties, as primitives or arrays of primitives (nested maps are not allowed)
- Use relationships for structure: Don't duplicate data in properties
- Plan for growth: Create constraints and indexes on lookup properties early, before data volume makes queries slow
2. Query Writing
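- Start specific, then traverse: Anchor each query on a node found through an indexed property, then expand along relationships
- Use parameters: Always use parameters for user input; labels and relationship types generally cannot be passed as parameters, so check them against an allowlist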
- Test with EXPLAIN: Understand query execution plans
3. Performance
- Create appropriate indexes: Index frequently queried properties
- Monitor query performance: Use PROFILE to identify bottlenecks
- Batch operations: Send many rows as one list parameter and process them with UNWIND instead of issuing one query per item
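For the batching advice, large inputs are usually split into chunks so each transaction stays a manageable size, with `UNWIND` expanding each chunk into rows. A minimal sketch; the `chunk` helper, the 1,000-row batch size, and the tag-linking query are illustrative assumptions:

```typescript
// Hypothetical helper: splits an array into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

interface TagRow {
  memoryId: string;
  tag: string;
}

// One query per batch: UNWIND turns the list parameter into rows inside one transaction.
const TAG_MEMORIES_CYPHER = `
UNWIND $rows AS row
MATCH (m:EpisodicMemory {id: row.memoryId})
MERGE (t:Tag {name: row.tag})
MERGE (m)-[:HAS_TAG]->(t)
`;

const rows: TagRow[] = [
  { memoryId: "memory-789", tag: "React" },
  { memoryId: "memory-789", tag: "hooks" },
];
const batches = chunk(rows, 1000); // 1,000 rows per transaction is an arbitrary starting point

// With neo4j-driver 5.x, each batch could be written as:
// for (const batch of batches) {
//   await session.executeWrite((tx) => tx.run(TAG_MEMORIES_CYPHER, { rows: batch }));
// }
```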
4. Maintenance
- Regular backups: Set up automated backup procedures
- Monitor disk usage: Graph databases can grow quickly
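- Keep statistics current: Neo4j refreshes planner statistics automatically; after a large bulk load, CALL db.resampleOutdatedIndexes() schedules resampling of outdated indexes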
🚀 Next Steps
Now that you understand Neo4j integration, explore:
- Pinecone Vector Database - Semantic memory with vectors
- Ollama Embeddings - Local text embeddings
- Memory Examples - Practical usage scenarios
- Troubleshooting Guide - Common issues and solutions
Ready to learn about vector databases? Check out the Pinecone Integration Guide!