Architecture
This page explains how Clear AI v2 is structured, how modules work together, and the technical decisions behind the design.
High-Level Overview
Clear AI v2 follows a layered architecture where your agents sit on top of a comprehensive shared library:
┌──────────────────────────────────────────────┐
│           Your AI Agents (Future)            │
│    Orchestrator, Planner, Executor, etc.     │
└──────────────────────┬───────────────────────┘
                       │
┌──────────────────────┴───────────────────────┐
│          Clear AI v2 Shared Library          │
│        (19 Production-Ready Modules)         │
├──────────────────────────────────────────────┤
│ Conversational  │  Context &  │  Workflows   │
│  Intelligence   │   Memory    │              │
├─────────────────┼─────────────┼──────────────┤
│ Infrastructure  │ Foundation  │ Tools & API  │
│ (LLM, Tokens,   │ (Types,     │ (MCP Tools,  │
│  Observability) │  Utils)     │  REST API)   │
└─────────────────┴─────────────┴──────────────┘
                       │
┌──────────────────────┴───────────────────────┐
│              External Services               │
│  • OpenAI / Groq / Ollama (LLMs)             │
│  • Neo4j (Graph DB for memory)               │
│  • Pinecone (Vector DB for memory)           │
│  • MongoDB (Document storage)                │
│  • Langfuse (Observability platform)         │
└──────────────────────────────────────────────┘
Module Organization
The shared library is organized into 5 categories with 19 modules:
1. Conversational Intelligence (5 modules)
Enables natural, multi-turn conversations with users.
conversational/
├── response/       # ResponseBuilder - structured responses
├── intent/         # IntentClassifier - detect user intent
├── confidence/     # ConfidenceScorer - uncertainty quantification
├── progress/       # ProgressTracker - multi-step task tracking
└── conversation/   # ConversationUtils - entity extraction, helpers
Purpose: Allow AI to ask questions, show progress, express uncertainty, understand follow-ups.
Key Features:
- 4 response types (answer, question, progress, acknowledgment)
- 5 intent types (query, question, clarification, confirmation, follow-up)
- Confidence calculation and uncertainty thresholds
- Time estimation for long-running tasks
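
As a rough illustration of how these pieces fit together, here is a minimal sketch. The class names come from the module listing above, but the import path and the method signatures (classify, score, answer, question) are assumptions, not the verified API:

// Hypothetical usage sketch - import path and method names are assumptions.
import { IntentClassifier, ConfidenceScorer, ResponseBuilder } from '@clear-ai/shared/conversational';

const classifier = new IntentClassifier();
const scorer = new ConfidenceScorer();
const builder = new ResponseBuilder();

// Detect which of the 5 intent types the message expresses.
const intent = await classifier.classify('Show me contaminated shipments');

// Quantify uncertainty; below the threshold the agent should ask instead of guess.
const confidence = scorer.score({ intent, matchedTools: 1 });

const response = confidence >= 0.7
  ? builder.answer('Found 23 contaminated shipments', { confidence })
  : builder.question('Which facility should I check?');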
2. Context & Memory (3 modules)
Manage conversation context and persistent memory.
context/
├── manager.ts              # ContextManager - main interface
├── message.ts              # MessageHandler - message operations
├── state/                  # StateManager - conversation phases
└── compression/            # Smart compression strategies
    ├── compressor.ts       # ContextCompressor - orchestration
    ├── prioritizer.ts      # MessagePrioritizer - importance scoring
    ├── entity-extractor.ts # EntityExtractor - find key entities
    └── summarizer.ts       # MessageSummarizer - LLM-based compression

memory/
├── manager.ts              # MemoryManager - orchestrates both systems
├── neo4j.ts                # Neo4jMemory - episodic (conversation flow)
├── pinecone.ts             # PineconeMemory - semantic (searchable facts)
└── embeddings.ts           # EmbeddingService - Ollama & OpenAI adapters
Purpose: Handle long conversations efficiently, remember past interactions.
Key Features:
- 3 compression strategies (sliding window, prioritization, summarization)
- Automatic compression when token limit approached
- Episodic memory for conversation flow (Neo4j graph)
- Semantic memory for knowledge retrieval (Pinecone vectors)
- Entity extraction and preservation during compression
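
A minimal sketch of the intended usage, assuming constructor options and method names that may differ from the actual API:

// Illustrative sketch - import paths, options, and method names are assumptions.
import { ContextManager } from '@clear-ai/shared/context';
import { MemoryManager } from '@clear-ai/shared/memory';

const context = new ContextManager({
  maxTokens: 8000,                      // compress when the conversation nears this budget
  compressionStrategy: 'summarization', // or 'sliding-window' / 'prioritization'
});
const memory = new MemoryManager();     // Neo4j (episodic) + Pinecone (semantic)

await context.addMessage({ role: 'user', content: 'Show me contaminated shipments' });

// Pull related facts from semantic memory before building the prompt.
const related = await memory.search('contaminated shipments', { topK: 5 });

// Compression runs automatically once the token limit is approached,
// preserving extracted entities in the compressed summary.
const messages = await context.getMessages();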
3. Workflows (2 modules)
Build complex, multi-step processes with conditional logic.
workflow/
├── graph/
│   └── builder.ts      # GraphBuilder - fluent API for graphs
├── execution/
│   └── executor.ts     # WorkflowExecutor - run graphs
└── checkpoint/
    └── manager.ts      # CheckpointManager - save/resume state
Purpose: Define and execute complex business logic as state machines.
Key Features:
- LangGraph-style state graphs
- Conditional branching based on state
- Checkpointing for resumable workflows
- Execution metadata (time, steps, status)
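
The fluent API might look roughly like this; node names, state shape, and method names are illustrative assumptions, not the verified interface:

// Illustrative sketch - builder/executor method names are assumptions.
import { GraphBuilder, WorkflowExecutor } from '@clear-ai/shared/workflow';

interface ShipmentState {
  query: string;
  needsClarification?: boolean;
  results?: unknown[];
}

// Placeholder node functions - replace with real logic.
const classify = async (s: ShipmentState) => ({ ...s, needsClarification: false });
const clarify = async (s: ShipmentState) => s;
const execute = async (s: ShipmentState) => ({ ...s, results: [] });

const graph = new GraphBuilder<ShipmentState>()
  .addNode('classify', classify)
  .addNode('clarify', clarify)
  .addNode('execute', execute)
  .addConditionalEdge('classify', (s) => (s.needsClarification ? 'clarify' : 'execute'))
  .setEntryPoint('classify')
  .build();

// Checkpointing lets a long-running workflow be resumed after a crash.
const executor = new WorkflowExecutor(graph, { checkpointing: true });
const result = await executor.run({ query: 'contaminated shipments' });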
4. Infrastructure (4 modules)
Core infrastructure for production AI systems.
tokens/
├── counter.ts          # TokenCounter - accurate counting (tiktoken)
└── budget.ts           # TokenBudget - budget enforcement

llm/
├── provider.ts         # LLMProvider - unified interface
└── adapters/
    ├── openai.ts       # OpenAI adapter
    ├── groq.ts         # Groq adapter
    └── ollama.ts       # Ollama adapter

config/
└── loader.ts           # loadConfig - environment management

observability/
└── langfuse.ts         # LangfuseTracer - production tracing
Purpose: Provide reliable, cost-controlled, observable AI infrastructure.
Key Features:
- Multi-model token counting (GPT, Claude, Llama, etc.)
- Per-operation token budgets with enforcement
- Cost estimation before execution
- Automatic fallback between providers
- Distributed tracing with Langfuse
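
Putting the infrastructure pieces together might look like this; the provider/fallback options and method names are assumptions:

// Illustrative sketch - constructor options and method names are assumptions.
import { LLMProvider } from '@clear-ai/shared/llm';
import { TokenCounter, TokenBudget } from '@clear-ai/shared/tokens';

const llm = new LLMProvider({
  provider: 'groq',                // primary provider
  fallbacks: ['openai', 'ollama'], // tried in order if the primary fails
});

const counter = new TokenCounter();
const budget = new TokenBudget({ maxTokensPerOperation: 4_000 });

const prompt = 'Summarize the last 20 shipment records.';
const promptTokens = counter.count(prompt, 'gpt-4');

// Enforce the budget (and estimate cost) before any call is made.
budget.assertWithinBudget(promptTokens);

const completion = await llm.complete({ prompt, maxTokens: 500 });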
5. Foundation (5 modules)
Fundamental building blocks used by all other modules.
types/ # TypeScript interfaces for everything
validation/ # Zod schemas for runtime validation
utils/ # 10+ utility modules
├── template.ts         # Parameter interpolation
├── statistics.ts       # Statistical functions
├── retry.ts            # Exponential backoff
├── circuit-breaker.ts  # Failure protection
├── logger.ts           # Structured logging
└── ...
tools/ # MCP tools (Shipments, Facilities, etc.)
api/ # REST API with MongoDB
Purpose: Provide common functionality and domain-specific tools.
Key Features:
- Strict TypeScript types for everything
- Runtime validation with Zod
- Circuit breaker pattern for resilience
- MCP-compliant tool implementations
- RESTful API for waste management domain
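
For example, the retry and circuit-breaker utilities could be combined like this (function and option names are assumptions):

// Illustrative sketch - utility signatures are assumptions.
import { retry } from '@clear-ai/shared/utils/retry';
import { CircuitBreaker } from '@clear-ai/shared/utils/circuit-breaker';

async function fetchShipmentsFromApi(): Promise<unknown[]> {
  // placeholder for a real HTTP call (e.g. via axios)
  return [];
}

const breaker = new CircuitBreaker({ failureThreshold: 5, resetTimeoutMs: 30_000 });

// Exponential backoff inside, circuit breaking outside: repeated failures
// open the circuit and fail fast instead of hammering a broken service.
const shipments = await breaker.execute(() =>
  retry(() => fetchShipmentsFromApi(), { retries: 3, backoffMs: 250 })
);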
Data Flow
Here's how data flows through a typical conversation:
sequenceDiagram
participant User
participant Agent
participant Intent as IntentClassifier
participant Context as ContextManager
participant LLM as LLMProvider
participant Memory as MemoryManager
participant Tools
participant Tracer as LangfuseTracer
User->>Agent: "Show me contaminated shipments"
Agent->>Tracer: Start trace
Agent->>Intent: Classify message
Intent-->>Agent: Intent: 'query'
Agent->>Context: Add message
Context->>Context: Check token limit
Context-->>Agent: OK (enough tokens)
Agent->>Memory: Search related context
Memory-->>Agent: Previous queries about shipments
Agent->>Tools: Execute shipments tool
Tools-->>Agent: 23 results
Agent->>Agent: Calculate confidence (85%)
Agent->>Context: Add response
Agent->>Tracer: End trace
Agent-->>User: "Found 23 contaminated shipments"
User->>Agent: "What about from FacilityA?"
Agent->>Intent: Classify message
Intent-->>Agent: Intent: 'followup'
Agent->>Context: Get last query context
Context-->>Agent: Previous query was about shipments
Agent->>Agent: Filter results by FacilityA
Agent-->>User: "8 contaminated shipments from FacilityA"
Key Design Decisions
1. TypeScript-First
Decision: Use strict TypeScript with ES modules
Rationale:
- Catch errors at compile time, not runtime
- Better IDE support and autocomplete
- Self-documenting code with types
- Modern module system (ES modules)
Trade-off: Slightly more verbose, requires compilation step
2. Dependency Injection
Decision: Support constructor injection for testing
Rationale:
- Enable unit testing without real services
- Swap implementations easily (e.g., OpenAI ↔ Ollama)
- Better modularity
Example:
// Production: real services
const memory = new MemoryManager();

// Testing: mocked services
const testMemory = new MemoryManager({
  mockNeo4j: mockDriver,
  mockPinecone: mockClient,
});
3. Test-Driven Development
Decision: Write tests before implementation (TDD)
Rationale:
- Ensure all code is testable
- Document expected behavior
- Catch regressions early
- Build confidence in changes
Result: 724 unit tests + 45 integration tests = 100% pass rate
4. Modular Architecture
Decision: Small, focused modules with single responsibilities
Rationale:
- Easy to understand and maintain
- Use only what you need
- Clear separation of concerns
- Simpler testing
Metric: Average 110 lines of code per file
5. Multi-Provider Support
Decision: Abstract LLM providers behind unified interface
Rationale:
- Avoid vendor lock-in
- Provide reliability through fallback
- Enable cost optimization
- Support local/private deployments
Trade-off: Slightly more complex setup
6. Configurable Embeddings
Decision: Support multiple embedding providers (Ollama, OpenAI)
Rationale:
- Privacy: Use local Ollama for sensitive data
- Cost: Ollama is free, OpenAI costs per request
- Flexibility: Switch based on needs
Default: Ollama (privacy-focused, free)
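
Switching providers might look like this; the option and model names below are illustrative assumptions:

// Illustrative sketch - option names are assumptions.
import { EmbeddingService } from '@clear-ai/shared/memory';

// Local, free, privacy-preserving (the default):
const local = new EmbeddingService({ provider: 'ollama', model: 'nomic-embed-text' });

// Hosted, billed per request:
const hosted = new EmbeddingService({ provider: 'openai', model: 'text-embedding-3-small' });

const vector = await local.embed('contaminated shipment from FacilityA');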
7. Langfuse Integration
Decision: Built-in observability with Langfuse
Rationale:
- Essential for production debugging
- See exact prompts and responses
- Track costs and performance
- Industry standard for LLM observability
Trade-off: Adds an external dependency (though it is optional)
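
The LangfuseTracer is assumed to delegate to the Langfuse SDK; raw usage of that SDK looks roughly like this:

// Raw Langfuse SDK usage for reference - the library's LangfuseTracer is assumed to wrap calls like these.
import { Langfuse } from 'langfuse';

const langfuse = new Langfuse({
  publicKey: process.env.LANGFUSE_PUBLIC_KEY,
  secretKey: process.env.LANGFUSE_SECRET_KEY,
  baseUrl: process.env.LANGFUSE_BASEURL,
});

const trace = langfuse.trace({ name: 'conversation', userId: 'user-123' });
const generation = trace.generation({
  name: 'answer-query',
  model: 'gpt-4o-mini',
  input: 'Show me contaminated shipments',
});
generation.end({ output: 'Found 23 contaminated shipments' });

await langfuse.flushAsync(); // ensure events are delivered before the process exits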
Technology Stack
Core Technologies
- TypeScript 5.x: Language with strict typing
- Node.js 22+: Runtime environment
- Yarn Berry 4.x: Package manager with PnP
- Jest 30.x: Testing framework
- Zod 4.x: Schema validation
AI & LLM
- OpenAI SDK: GPT-3.5, GPT-4 models
- Groq SDK: Fast Llama and Mixtral models
- Ollama: Local model inference
- tiktoken: Accurate token counting
- Langfuse: LLM observability
Databases
- Neo4j 6.x: Graph database for episodic memory
- Pinecone 6.x: Vector database for semantic memory
- MongoDB 8.x: Document database for tools/API
Infrastructure
- Express 4.x: REST API framework
- Axios 1.x: HTTP client
- dotenv: Environment configuration
Performance Considerations
Token Efficiency
Challenge: Long conversations consume many tokens, increasing costs.
Solution:
- Context compression (saves 70-80% of tokens)
- Sliding window strategy for simple cases
- Intelligent summarization for complex conversations
- Token budgets to prevent overruns
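
As a back-of-the-envelope illustration of the sliding-window strategy (the library's ContextCompressor automates this; the code below only uses tiktoken directly):

// Standalone illustration of the sliding-window idea using tiktoken.
import { encoding_for_model } from 'tiktoken';

interface Message { role: 'system' | 'user' | 'assistant'; content: string }

function slidingWindow(messages: Message[], maxTokens: number): Message[] {
  const enc = encoding_for_model('gpt-4');
  const kept: Message[] = [];
  let used = 0;

  // Walk backwards so the most recent messages are always preserved.
  for (let i = messages.length - 1; i >= 0; i--) {
    const tokens = enc.encode(messages[i].content).length;
    if (used + tokens > maxTokens) break;
    kept.unshift(messages[i]);
    used += tokens;
  }

  enc.free(); // tiktoken allocates WASM memory that must be released
  return kept;
}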
Memory Efficiency
Challenge: Neo4j and Pinecone queries can be slow.
Solution:
- Lazy loading (only query when needed)
- Caching recent results
- Batch operations where possible
- Make memory systems optional (disable if not needed)
Latency Optimization
Challenge: Multiple service calls add latency.
Solution:
- Parallel execution where possible
- Streaming responses for LLM calls
- Progress updates during long operations
- Conditional logic to skip unnecessary steps
Security Considerations
API Keys
- Stored in .env file (never committed)
- Validated on startup
- Rotatable without code changes
Data Privacy
- Support for local Ollama (no data leaves your machine)
- Optional memory systems (disable for sensitive data)
- No logging of user data by default
Input Validation
- Zod schemas validate all inputs
- Type checking at compile and runtime
- Sanitization of user inputs
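
For example, a tool input can be validated with a Zod schema before it ever reaches the database (the schema below is illustrative, not the library's actual schema):

// Illustrative schema - field names are examples only.
import { z } from 'zod';

const ShipmentQuerySchema = z.object({
  facilityId: z.string().min(1),
  contaminated: z.boolean().default(false),
  limit: z.number().int().positive().max(100).default(20),
});

type ShipmentQuery = z.infer<typeof ShipmentQuerySchema>;

// parse() throws on malformed input; safeParse() returns a result object instead.
const query: ShipmentQuery = ShipmentQuerySchema.parse({
  facilityId: 'FacilityA',
  contaminated: true,
});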
Scalability
The architecture supports scaling in multiple ways:
Horizontal Scaling
- Stateless design: No server-side session state
- Database-backed memory: Multiple instances share Neo4j/Pinecone
- Load balancing: Distribute requests across instances
Vertical Scaling
- Efficient token usage: Context compression reduces memory needs
- Lazy loading: Only load what's needed
- Streaming: Process large responses incrementally
Cost Scaling
- Token budgets: Prevent runaway costs
- Provider fallback: Use cheaper providers when appropriate
- Local models: Zero marginal cost with Ollama
Testing Strategy
Unit Tests (724 tests)
- Test each module in isolation
- Mock external dependencies
- Fast execution (less than 3 seconds)
- Run on every change
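
A typical unit test follows the dependency-injection pattern shown earlier; the store() method and mock shapes below are assumptions:

// Illustrative unit test - MemoryManager method names and mock shapes are assumptions.
import { MemoryManager } from '@clear-ai/shared/memory';

describe('MemoryManager', () => {
  it('stores a conversation turn without touching real services', async () => {
    const mockDriver = { run: jest.fn().mockResolvedValue({ records: [] }) };
    const mockClient = { upsert: jest.fn().mockResolvedValue(undefined) };

    const memory = new MemoryManager({ mockNeo4j: mockDriver, mockPinecone: mockClient });

    await memory.store({ role: 'user', content: 'Show me contaminated shipments' });

    expect(mockDriver.run).toHaveBeenCalled();
  });
});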
Integration Tests (45 tests)
- Test with real services (OpenAI, Neo4j, etc.)
- Verify end-to-end flows
- Slower execution (~5 seconds)
- Run before releases
Test Coverage
- 100% pass rate maintained
- Every module fully tested
- TDD approach throughout
- Continuous integration ready
What's Next?
Now that you understand the architecture:
- Conversational AI Modules
- Context & Memory Modules
- Workflow Modules
- Infrastructure Modules
- Foundation Modules
Questions about architecture? Check the Development Guide or specific module docs.