Ollama Embeddings Guide
Ollama provides local, privacy-focused text embeddings for our memory system. This guide explains how embeddings work, why we chose Ollama, and how to set up and use local embeddings effectively.
🎯 What are Text Embeddings?
The Magic of Converting Text to Numbers
Text embeddings are numerical representations of text that capture semantic meaning. Think of them as a way to convert human language into a format that computers can understand and compare.
// Human-readable text
const text = "I love machine learning";
// Computer-readable vector (simplified; real nomic-embed-text vectors have 768 dimensions)
const embedding = [0.1, 0.3, 0.7, 0.2, 0.9, 0.4, 0.6, 0.8, ...];
Why Embeddings Matter
- Semantic Understanding: "car" and "automobile" have similar embeddings
- Cross-Lingual: with a multilingual model, "hello" and "hola" map to nearby vectors
- Context Awareness: "bank" (financial) and "bank" (river) have different embeddings
- Similarity Search: Find related content even when the wording differs (see the toy example below)
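To make this concrete, here is a toy comparison using tiny hand-made 4-dimensional vectors. Real embeddings have hundreds of dimensions, and the numbers below are invented purely for illustration; the same cosine-similarity idea is used later in this guide to compare real embeddings.

// Illustrative only: tiny made-up vectors standing in for real embeddings
const car = [0.9, 0.1, 0.8, 0.2];
const automobile = [0.85, 0.15, 0.75, 0.25];
const banana = [0.1, 0.9, 0.05, 0.7];

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, x) => sum + x * x, 0));
  const magB = Math.sqrt(b.reduce((sum, x) => sum + x * x, 0));
  return dot / (magA * magB);
}

console.log(cosine(car, automobile)); // close to 1, near-synonyms
console.log(cosine(car, banana));     // noticeably lower, unrelated concepts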
🏠 Why Local Embeddings with Ollama?
The Problem with External APIs
Traditional embedding services have drawbacks:
- Privacy Concerns: Your data goes to external servers
- Cost: Pay per API call, can get expensive
- Latency: Network requests add delay
- Dependency: Relies on external service availability
- Data Control: No control over how your data is processed
Ollama's Advantages
- Privacy First: Everything runs on your machine
- Cost Effective: One-time setup, no per-request fees
- Fast: No network latency
- Reliable: No external dependencies
- Customizable: Use different models as needed
🚀 Setting Up Ollama
1. Installation
# Install Ollama (Linux install script; on macOS or Windows, download from the Ollama website)
curl -fsSL https://ollama.ai/install.sh | sh
# Verify installation
ollama --version
2. Start Ollama Service
# Start Ollama server
ollama serve
# In another terminal, verify it's running
curl http://localhost:11434/api/tags
3. Install Embedding Model
# Install nomic-embed-text (recommended for our system)
ollama pull nomic-embed-text
# Verify model is available
ollama list
4. Test the Model
# Embedding models are not chat models, so `ollama run` will not produce embeddings.
# Test embedding generation through the REST API instead:
curl http://localhost:11434/api/embeddings -d '{"model": "nomic-embed-text", "prompt": "Hello, world!"}'
# The response is a JSON object with an "embedding" array of 768 numbers
🔧 Implementation
Basic Embedding Service
class OllamaEmbeddingService {
private baseUrl: string;
private model: string;
constructor(baseUrl: string = 'http://localhost:11434', model: string = 'nomic-embed-text') {
this.baseUrl = baseUrl;
this.model = model;
}
async generateEmbedding(text: string): Promise<number[]> {
try {
const response = await fetch(`${this.baseUrl}/api/embeddings`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: this.model,
prompt: text
})
});
if (!response.ok) {
throw new Error(`Ollama API error: ${response.status} ${response.statusText}`);
}
const data = await response.json();
return data.embedding;
} catch (error) {
console.error('Error generating embedding:', error);
throw error;
}
}
}
Batch Embedding Generation
class BatchEmbeddingService extends OllamaEmbeddingService {
private batchSize: number;
private delay: number;
constructor(batchSize: number = 5, delay: number = 100) {
super();
this.batchSize = batchSize;
this.delay = delay;
}
async generateBatchEmbeddings(texts: string[]): Promise<number[][]> {
const embeddings: number[][] = [];
for (let i = 0; i < texts.length; i += this.batchSize) {
const batch = texts.slice(i, i + this.batchSize);
// Process batch in parallel
const batchPromises = batch.map(text => this.generateEmbedding(text));
const batchEmbeddings = await Promise.all(batchPromises);
embeddings.push(...batchEmbeddings);
// Add delay between batches to avoid overwhelming Ollama
if (i + this.batchSize < texts.length) {
await new Promise(resolve => setTimeout(resolve, this.delay));
}
}
return embeddings;
}
}
Caching for Performance
class CachedEmbeddingService extends BatchEmbeddingService {
// Extends BatchEmbeddingService so preloadEmbeddings can reuse generateBatchEmbeddings;
// the cache is protected so the troubleshooting subclasses later in this guide can manage it
protected cache: Map<string, number[]>;
private maxCacheSize: number;
constructor(maxCacheSize: number = 1000) {
super();
this.cache = new Map();
this.maxCacheSize = maxCacheSize;
}
async generateEmbedding(text: string): Promise<number[]> {
// Check cache first
if (this.cache.has(text)) {
return this.cache.get(text)!;
}
// Generate new embedding
const embedding = await super.generateEmbedding(text);
// Cache the result
this.cacheEmbedding(text, embedding);
return embedding;
}
private cacheEmbedding(text: string, embedding: number[]): void {
// Remove oldest entries if cache is full
if (this.cache.size >= this.maxCacheSize) {
const firstKey = this.cache.keys().next().value;
if (firstKey !== undefined) {
this.cache.delete(firstKey);
}
}
this.cache.set(text, embedding);
}
// Preload common embeddings
async preloadEmbeddings(texts: string[]): Promise<void> {
const uncachedTexts = texts.filter(text => !this.cache.has(text));
if (uncachedTexts.length > 0) {
const embeddings = await this.generateBatchEmbeddings(uncachedTexts);
uncachedTexts.forEach((text, index) => {
this.cache.set(text, embeddings[index]);
});
}
}
}
📊 Embedding Quality and Performance
Model Comparison
// Different models for different use cases
const EMBEDDING_MODELS = {
// Best for general text (768 dimensions)
'nomic-embed-text': {
dimensions: 768,
useCase: 'General text embedding',
performance: 'High',
size: '274MB'
},
// Smaller, faster model (384 dimensions)
'all-minilm': {
dimensions: 384,
useCase: 'Fast embedding for simple text',
performance: 'Very High',
size: '80MB'
},
// Multilingual support
'multilingual-e5-large': {
dimensions: 1024,
useCase: 'Multilingual text',
performance: 'High',
size: '1.1GB'
}
};
Quality Testing
async function testEmbeddingQuality() {
const embeddingService = new OllamaEmbeddingService();
// Test similar concepts
const similarTexts = [
"I love programming",
"I enjoy coding",
"Programming is fun"
];
const embeddings = await embeddingService.generateBatchEmbeddings(similarTexts);
// Calculate similarities
for (let i = 0; i < embeddings.length; i++) {
for (let j = i + 1; j < embeddings.length; j++) {
const similarity = cosineSimilarity(embeddings[i], embeddings[j]);
console.log(`Similarity between "${similarTexts[i]}" and "${similarTexts[j]}": ${similarity}`);
}
}
}
function cosineSimilarity(vecA: number[], vecB: number[]): number {
const dotProduct = vecA.reduce((sum, a, i) => sum + a * vecB[i], 0);
const magnitudeA = Math.sqrt(vecA.reduce((sum, a) => sum + a * a, 0));
const magnitudeB = Math.sqrt(vecB.reduce((sum, b) => sum + b * b, 0));
if (magnitudeA === 0 || magnitudeB === 0) return 0; // Guard against zero vectors
return dotProduct / (magnitudeA * magnitudeB);
}
Performance Monitoring
class MonitoredEmbeddingService extends OllamaEmbeddingService {
private metrics = {
totalRequests: 0,
totalTime: 0,
averageTime: 0,
errors: 0
};
async generateEmbedding(text: string): Promise<number[]> {
const startTime = Date.now();
this.metrics.totalRequests++;
try {
const embedding = await super.generateEmbedding(text);
const duration = Date.now() - startTime;
this.metrics.totalTime += duration;
this.metrics.averageTime = this.metrics.totalTime / this.metrics.totalRequests;
return embedding;
} catch (error) {
this.metrics.errors++;
throw error;
}
}
getMetrics() {
return {
...this.metrics,
errorRate: this.metrics.errors / this.metrics.totalRequests,
requestsPerSecond: this.metrics.totalRequests / (this.metrics.totalTime / 1000)
};
}
}
🔧 Configuration and Optimization
Environment Configuration
# packages/server/.env
OLLAMA_BASE_URL=http://localhost:11434
MEMORY_EMBEDDING_MODEL=nomic-embed-text
MEMORY_EMBEDDING_DIMENSIONS=768
MEMORY_EMBEDDING_BATCH_SIZE=5
MEMORY_EMBEDDING_DELAY_MS=100
MEMORY_EMBEDDING_CACHE_SIZE=1000
Service Configuration
interface EmbeddingConfig {
baseUrl: string;
model: string;
dimensions: number;
batchSize: number;
delay: number;
cacheSize: number;
timeout: number;
}
const defaultConfig: EmbeddingConfig = {
baseUrl: process.env.OLLAMA_BASE_URL || 'http://localhost:11434',
model: process.env.MEMORY_EMBEDDING_MODEL || 'nomic-embed-text',
dimensions: parseInt(process.env.MEMORY_EMBEDDING_DIMENSIONS || '768'),
batchSize: parseInt(process.env.MEMORY_EMBEDDING_BATCH_SIZE || '5'),
delay: parseInt(process.env.MEMORY_EMBEDDING_DELAY_MS || '100'),
cacheSize: parseInt(process.env.MEMORY_EMBEDDING_CACHE_SIZE || '1000'),
timeout: 30000
};
Memory System Integration
class MemoryEmbeddingService {
private embeddingService: CachedEmbeddingService;
private config: EmbeddingConfig;
constructor(config: EmbeddingConfig = defaultConfig) {
this.config = config;
this.embeddingService = new CachedEmbeddingService(config.cacheSize);
}
async generateMemoryEmbedding(memory: {
content: string;
concept?: string;
category?: string;
tags?: string[];
}): Promise<number[]> {
// Combine different parts of memory for better embedding
const combinedText = [
memory.content,
memory.concept,
memory.category,
...(memory.tags || [])
].filter(Boolean).join(' ');
return this.embeddingService.generateEmbedding(combinedText);
}
async generateQueryEmbedding(query: string): Promise<number[]> {
return this.embeddingService.generateEmbedding(query);
}
async generateBatchMemoryEmbeddings(memories: Array<{ content: string; concept?: string; category?: string; tags?: string[] }>): Promise<number[][]> {
const texts = memories.map(memory =>
[memory.content, memory.concept, memory.category, ...(memory.tags || [])]
.filter(Boolean)
.join(' ')
);
return this.embeddingService.generateBatchEmbeddings(texts);
}
}
🎯 Real-World Examples
Example 1: Knowledge Base Embeddings
// Store knowledge with embeddings
const knowledgeItems = [
{
content: "React hooks allow you to use state and lifecycle features in functional components",
concept: "React Hooks",
category: "frontend",
tags: ["React", "hooks", "functional-components"]
},
{
content: "useState is a React hook that lets you add state to functional components",
concept: "useState Hook",
category: "frontend",
tags: ["React", "useState", "state-management"]
}
];
const embeddingService = new MemoryEmbeddingService();
// Generate embeddings for all knowledge items
const embeddings = await embeddingService.generateBatchMemoryEmbeddings(knowledgeItems);
// Store in Pinecone with embeddings
for (let i = 0; i < knowledgeItems.length; i++) {
await storeSemanticMemory({
id: `knowledge-${i}`,
...knowledgeItems[i],
embedding: embeddings[i]
});
}
Example 2: User Query Processing
// Process user query
const userQuery = "How do I manage state in React components?";
// Generate query embedding
const queryEmbedding = await embeddingService.generateQueryEmbedding(userQuery);
// Search for similar knowledge
const similarKnowledge = await searchSemanticMemory(queryEmbedding, {
userId: 'user-123',
threshold: 0.7,
limit: 5
});
// Results will include both React hooks knowledge items
// even though the query uses different wording
Example 3: Learning Progress Tracking
// Track what user has learned
const learningProgress = [
{
content: "User successfully implemented useState hook in their React component",
concept: "useState Implementation",
category: "learning-progress",
tags: ["React", "useState", "learning", "progress"]
},
{
content: "User asked about useEffect hook and implemented it correctly",
concept: "useEffect Learning",
category: "learning-progress",
tags: ["React", "useEffect", "learning", "progress"]
}
];
// Generate embeddings for learning progress
const progressEmbeddings = await embeddingService.generateBatchMemoryEmbeddings(learningProgress);
// Store learning progress
for (let i = 0; i < learningProgress.length; i++) {
await storeSemanticMemory({
id: `progress-${Date.now()}-${i}`,
...learningProgress[i],
embedding: progressEmbeddings[i]
});
}
🚨 Troubleshooting
Common Issues
1. Ollama Not Running
# Check if Ollama is running
curl http://localhost:11434/api/tags
# If not running, start it
ollama serve
2. Model Not Found
# Check available models
ollama list
# Install missing model
ollama pull nomic-embed-text
3. Connection Timeout
// Add timeout to requests
const response = await fetch(`${this.baseUrl}/api/embeddings`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ model: this.model, prompt: text }),
signal: AbortSignal.timeout(30000) // 30 second timeout
});
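If timeouts keep happening under load, wrapping calls in a small retry helper is a common mitigation. This is a sketch, not part of the service above; the helper name is made up, and the attempt count and delays are arbitrary.

// Hypothetical helper: retry an embedding call a few times with increasing delays
async function embedWithRetry(
  service: OllamaEmbeddingService,
  text: string,
  attempts: number = 3
): Promise<number[]> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await service.generateEmbedding(text);
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        // Wait 500ms, 1000ms, ... before the next try
        await new Promise(resolve => setTimeout(resolve, 500 * attempt));
      }
    }
  }
  throw lastError;
}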
4. Memory Issues
// Monitor memory usage
class MemoryAwareEmbeddingService extends CachedEmbeddingService {
private maxMemoryUsage = 1000; // Max number of cached embeddings
private cleanupCache(): void {
if (this.cache.size > this.maxMemoryUsage) {
// Remove oldest 20% of cache
const toRemove = Math.floor(this.cache.size * 0.2);
const keys = Array.from(this.cache.keys()).slice(0, toRemove);
keys.forEach(key => this.cache.delete(key));
}
}
}
Performance Issues
1. Slow Embedding Generation
// Use smaller model for faster generation
const fastEmbeddingService = new OllamaEmbeddingService(
'http://localhost:11434',
'all-minilm' // Smaller, faster model
);
2. High Memory Usage
// Implement cache limits
class LimitedCacheEmbeddingService extends CachedEmbeddingService {
private cacheLimit = 500; // Named differently from the base class's private maxCacheSize
private enforceCacheLimit(): void {
if (this.cache.size > this.cacheLimit) {
const keys = Array.from(this.cache.keys());
const toRemove = keys.slice(0, this.cache.size - this.cacheLimit);
toRemove.forEach(key => this.cache.delete(key));
}
}
}
🎯 Best Practices
1. Model Selection
- nomic-embed-text: Best for general use (768 dimensions)
- all-minilm: Fastest for simple text (384 dimensions)
- multilingual-e5-large: Best for multilingual content (1024 dimensions); see the model-switching sketch below
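Because the model is just a constructor argument on OllamaEmbeddingService, switching models is a one-line change. A sketch, using the model names from the comparison table above (availability in the Ollama library may vary):

// Sketch: choose the embedding model per workload
const generalService = new OllamaEmbeddingService('http://localhost:11434', 'nomic-embed-text');
const fastService = new OllamaEmbeddingService('http://localhost:11434', 'all-minilm');
const multilingualService = new OllamaEmbeddingService('http://localhost:11434', 'multilingual-e5-large');

// Note: vectors from different models have different dimensions and are not
// comparable with each other, so store and query with the same model.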
2. Performance Optimization
- Use caching: Avoid regenerating same embeddings
- Batch processing: Group multiple requests together
- Appropriate delays: Don't overwhelm Ollama with requests
- Monitor resources: Keep track of memory and CPU usage
3. Quality Assurance
- Test embeddings: Verify similarity scores make sense
- Monitor performance: Track generation times and error rates
- Regular updates: Keep Ollama and models updated
- Backup strategy: Have fallback models available (see the sketch below)
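One way to realize the backup strategy is a small wrapper that falls back to a second model when the primary fails. A sketch only; the class name and fallback choice are illustrative.

// Sketch: fall back to a smaller model if the primary one errors out
class FallbackEmbeddingService {
  private primary = new OllamaEmbeddingService('http://localhost:11434', 'nomic-embed-text');
  private fallback = new OllamaEmbeddingService('http://localhost:11434', 'all-minilm');

  async generateEmbedding(text: string): Promise<number[]> {
    try {
      return await this.primary.generateEmbedding(text);
    } catch (error) {
      console.warn('Primary embedding model failed, falling back:', error);
      // Caution: fallback vectors have 384 dimensions, not 768, so they must be
      // stored separately or re-embedded once the primary model recovers
      return this.fallback.generateEmbedding(text);
    }
  }
}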
4. Security
- Local only: Never expose Ollama to external networks
- Input validation: Sanitize all text inputs (a sketch follows this list)
- Resource limits: Prevent excessive resource usage
- Access control: Secure the Ollama service
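For input validation, a small guard in front of the embedding call keeps empty or oversized inputs out of Ollama. A minimal sketch with an arbitrary length limit (tune it to your model's context window):

// Sketch: validate and trim text before sending it to the embedding service
const MAX_EMBEDDING_INPUT_CHARS = 8000; // arbitrary limit chosen for illustration

function prepareEmbeddingInput(text: string): string {
  const cleaned = text.trim();
  if (cleaned.length === 0) {
    throw new Error('Cannot embed empty text');
  }
  // Truncate overly long inputs instead of failing outright
  return cleaned.length > MAX_EMBEDDING_INPUT_CHARS
    ? cleaned.slice(0, MAX_EMBEDDING_INPUT_CHARS)
    : cleaned;
}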
🚀 Next Steps
Now that you understand Ollama embeddings, explore:
- Memory Examples - Practical usage scenarios
- Troubleshooting Guide - Common issues and solutions
- Memory System Overview - Complete system understanding
- Neo4j Integration - Episodic memory with graphs
Ready to see it all in action? Check out the Memory Examples for practical scenarios!