LegalFab System Architecture

Version: 1.4
Last Updated: January 2026


Platform Overview

LegalFab is an AI-powered legal technology platform built on a metadata-driven architecture. The platform provides unified access to distributed data assets, intelligent automation through AI agents, and compliance capabilities for regulated legal practices.

Platform Components

| Component | Purpose | Key Capabilities |
| --- | --- | --- |
| Knowledge Fabric | Data integration and intelligence layer | Persistent Knowledge Graph, Entity Resolution, 200+ MCP Connectors, Search Sessions |
| Studio | Creation and execution environment | Agent Builder, Widgets, Datasets, Chain of Agents, Operational Modes (0-4) |
| Dialog | Conversational intelligence interface | Natural Language Understanding, Intelligent Routing, Cross-Platform Context, Long-Term Memory |
| Schema Management | Business domain definition and control | Domain Discovery, Schema Registry, Schema Validation |
| AI & LLM Layer | Intelligent processing and inference | LightLLM Gateway, Output Consistency, Model Provenance |
| AML Compliance | Regulatory compliance module | Rule Engine, BPM Workflows, Screening, Case Management |
| DevOps Infrastructure | Deployment and operations | CI/CD Pipelines, Monitoring, Security Operations |

Architecture Overview

┌─────────────────────────────────────────────────────────────────────────┐
│                        PRESENTATION LAYER                               │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌────────────┐   │
│  │   Web UI     │  │  API Gateway │  │    Dialog    │  │  Messaging │   │
│  │              │  │  (REST)      │  │   Interface  │  │  Mini-App  │   │
│  └──────────────┘  └──────────────┘  └──────────────┘  └────────────┘   │
│                              │                                          │
│                   [Authentication, Rate Limiting, Input Validation]     │
├─────────────────────────────────────────────────────────────────────────┤
│                          APPLICATION LAYER                              │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │                          DIALOG                                 │    │
│  │  ┌───────────┐  ┌───────────────┐  ┌───────────┐  ┌──────────┐  │    │
│  │  │    NLU    │  │    Dialog     │  │  Function │  │ Context  │  │    │
│  │  │  Engine   │  │    Manager    │  │  Router   │  │ Manager  │  │    │
│  │  └───────────┘  └───────────────┘  └───────────┘  └──────────┘  │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │                          STUDIO                                 │    │
│  │  ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌──────────────┐  │    │
│  │  │  Agent    │  │  Widgets  │  │ Datasets  │  │  Chain of    │  │    │
│  │  │  Builder  │  │           │  │           │  │  Agents      │  │    │
│  │  └───────────┘  └───────────┘  └───────────┘  └──────────────┘  │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │                     AML COMPLIANCE MODULE                       │    │
│  │  ┌───────────┐  ┌───────────────┐  ┌───────────┐  ┌──────────┐  │    │
│  │  │   Rule    │  │  BPM          │  │ Screening │  │  Case    │  │    │
│  │  │  Engine   │  │  Workflows    │  │  Service  │  │ Manager  │  │    │
│  │  └───────────┘  └───────────────┘  └───────────┘  └──────────┘  │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌────────────┐ ┌────────────┐   │
│  │  Search  │ │ Lineage  │ │ Quality  │ │ Discovery  │ │ Governance │   │
│  │ Service  │ │ Service  │ │ Service  │ │  Service   │ │  Service   │   │
│  └──────────┘ └──────────┘ └──────────┘ └────────────┘ └────────────┘   │
│                              │                                          │
│                   [Service-to-Service AuthN/AuthZ, mTLS]                │
├─────────────────────────────────────────────────────────────────────────┤
│                      KNOWLEDGE FABRIC LAYER                             │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │                    KNOWLEDGE GRAPH                              │    │
│  │  ┌──────────────┐  ┌──────────────────┐  ┌──────────────────┐   │    │
│  │  │ Entity Store │  │ Relationship     │  │  Query Engine    │   │    │
│  │  │   (Nodes)    │  │ Store (Edges)    │  │  (Traversal)     │   │    │
│  │  └──────────────┘  └──────────────────┘  └──────────────────┘   │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │                  ENTITY RESOLUTION ENGINE                       │    │
│  │  ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌─────────────┐   │    │
│  │  │ Blocking  │  │ Matching  │  │ Clustering│  │ Golden      │   │    │
│  │  │ Service   │  │ Service   │  │ Service   │  │ Record Mgmt │   │    │
│  │  └───────────┘  └───────────┘  └───────────┘  └─────────────┘   │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                              │                                          │
│                   [Encryption at Rest, Access Control Lists]            │
├─────────────────────────────────────────────────────────────────────────┤
│                        AI & LLM LAYER                                   │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │  ┌────────────┐  ┌──────────────┐  ┌────────────────────────┐   │    │
│  │  │  LightLLM  │  │   Provider   │  │  Prompt Management     │   │    │
│  │  │  Gateway   │  │  Abstraction │  │  & Template Engine     │   │    │
│  │  └────────────┘  └──────────────┘  └────────────────────────┘   │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                              │                                          │
│                   [Input Filtering, Output Validation, PII Detection]   │
├─────────────────────────────────────────────────────────────────────────┤
│                       CONNECTIVITY LAYER                                │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌────────────┐   │
│  │  Database    │  │    API       │  │     MCP      │  │   Event    │   │
│  │  Connectors  │  │  Connectors  │  │  Connectors  │  │  Streams   │   │
│  └──────────────┘  └──────────────┘  └──────────────┘  └────────────┘   │
│                              │                                          │
│               [Credential Vault, Secure Connections, Data Sampling]     │
├─────────────────────────────────────────────────────────────────────────┤
│                       SOURCE SYSTEMS                                    │
│  ┌──────────┐ ┌───────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐  │
│  │Databases │ │  Document │ │  APIs    │ │  Legal   │ │   External   │  │
│  │          │ │  Stores   │ │          │ │  Systems │ │   Sources    │  │
│  └──────────┘ └───────────┘ └──────────┘ └──────────┘ └──────────────┘  │
│                              │                                          │
│                [Customer-Managed, Customer Credentials]                 │
└─────────────────────────────────────────────────────────────────────────┘

Knowledge Fabric

The Knowledge Fabric serves as the foundational data integration and intelligence layer for the LegalFab platform. It implements a metadata-driven architecture that provides unified access to distributed data assets while leaving source data in place.

Core Capabilities

| Capability | Description |
| --- | --- |
| Persistent Knowledge Graph | Corporate memory with schema-bounded extraction, source provenance, and Knowledge Tree structure |
| Entity Resolution | Cross-source entity matching using blocking, matching, and clustering algorithms with golden record management |
| 200+ MCP Connectors | Federated queries across databases, SaaS applications, legal systems, and corporate registries |
| Search Sessions | Iterative exploration with session graphs and accumulated context |
| Data Observability | Quality monitoring, freshness tracking, and automated alerts |
| Discovery Service | Automated identification and cataloging of data assets |
| Active Metadata | Continuous metadata analysis, profiling, and enrichment |
| Data Lineage | End-to-end tracking of data flow from source to consumption |

Knowledge Graph Model

The Knowledge Graph stores entities as nodes and relationships as edges, enabling complex traversal queries and network analysis.

Entity Types:

| Entity | Description | Use Cases |
| --- | --- | --- |
| Person | Individual clients, contacts, beneficial owners | KYC, risk assessment, relationship mapping |
| Organization | Corporate clients, counterparties, related entities | Corporate structure, ownership analysis |
| Matter | Legal matters, cases, engagements | Matter management, conflict checking |
| Document | Contracts, filings, correspondence | Document management, search |
| Address | Physical and registered addresses | Location analysis, verification |
| Identifier | Tax IDs, registration numbers, LEIs | Cross-reference, verification |

Relationship Types:

| Relationship | Description |
| --- | --- |
| OWNS | Ownership stake between entities |
| CONTROLS | Control relationship (voting, management) |
| RELATED_TO | Personal or business relationship |
| EMPLOYS | Employment relationship |
| REPRESENTS | Legal representation relationship |
| LOCATED_AT | Entity-address relationship |
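
As a minimal sketch of the node-and-edge model described above, the following Python snippet keeps a handful of entities in memory and walks OWNS edges to estimate an indirect ownership stake. The `Graph` class, the `share` edge property, the `effective_ownership` helper, and the sample data are all hypothetical illustrations and are not LegalFab APIs.

```python
from collections import defaultdict


class Graph:
    """Toy in-memory graph: typed nodes plus directed, typed edges."""

    def __init__(self):
        self.nodes = {}                 # node id -> entity type
        self.edges = defaultdict(list)  # source id -> [(relationship, target id, properties)]

    def add_node(self, node_id, entity_type):
        self.nodes[node_id] = entity_type

    def add_edge(self, src, rel, dst, **props):
        self.edges[src].append((rel, dst, props))

    def effective_ownership(self, owner, target, stake=1.0, seen=frozenset()):
        """Sum the stake `owner` holds in `target` over every OWNS path."""
        seen = seen | {owner}
        total = 0.0
        for rel, dst, props in list(self.edges[owner]):
            if rel != "OWNS" or dst in seen:
                continue  # skip other relationship types and ownership cycles
            share = stake * props.get("share", 0.0)
            if dst == target:
                total += share
            else:
                total += self.effective_ownership(dst, target, share, seen)
        return total


g = Graph()
g.add_node("alice", "Person")
g.add_node("holdco", "Organization")
g.add_node("opco", "Organization")
g.add_node("addr-1", "Address")
g.add_edge("alice", "OWNS", "holdco", share=0.5)
g.add_edge("holdco", "OWNS", "opco", share=0.6)
g.add_edge("alice", "OWNS", "opco", share=0.1)
g.add_edge("opco", "LOCATED_AT", "addr-1")

# Indirect stake (0.5 * 0.6) plus direct stake (0.1)
print(round(g.effective_ownership("alice", "opco"), 2))  # prints 0.4
```

A production graph store would run this kind of traversal in its query engine; the recursion here only shows why ownership analysis needs path traversal rather than a single-hop lookup.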

Entity Resolution

The Entity Resolution Engine identifies duplicate and related records across connected sources to build unified entity profiles (Golden Records).

Resolution Pipeline:

Source Records → Blocking → Candidate Pairs → Matching → Clusters → Golden Records
                    │             │              │           │
              (Key generation) (Comparison)  (Scoring)  (Merge rules)
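
The pipeline stages can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the engine's actual algorithm: the postcode blocking key, the `difflib` name similarity, the 0.8 threshold, the union-find clustering, and the "longest name wins" merge rule are all hypothetical choices, as are the sample records.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

records = [
    {"id": "crm-1", "name": "Acme Holdings Ltd", "postcode": "EC1A 1BB"},
    {"id": "pms-7", "name": "ACME Holdings Limited", "postcode": "EC1A 1BB"},
    {"id": "reg-3", "name": "Apex Trading LLP", "postcode": "EC1A 1BB"},
    {"id": "crm-9", "name": "Acme Holdings Ltd", "postcode": "M1 2AB"},
]


def block_key(rec):
    # Blocking: only records that share a key become candidate pairs.
    return rec["postcode"].replace(" ", "").upper()


def similarity(a, b):
    # Matching: score a candidate pair on case-folded names.
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def resolve(records, threshold=0.8):
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)

    # Clustering: union-find over pairs that score above the threshold.
    parent = {rec["id"]: rec["id"] for rec in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for block in blocks.values():
        for a, b in combinations(block, 2):
            if similarity(a, b) >= threshold:
                parent[find(a["id"])] = find(b["id"])

    clusters = defaultdict(list)
    for rec in records:
        clusters[find(rec["id"])].append(rec)

    # Merge rule (assumed): keep the longest name, remember every source id.
    return [
        {"name": max((r["name"] for r in members), key=len),
         "sources": sorted(r["id"] for r in members)}
        for members in clusters.values()
    ]


for golden in resolve(records):
    print(golden)
```

Note that `crm-9` is never compared with `crm-1` because the two records fall into different blocks. That is the usual trade-off of blocking: far fewer comparisons in exchange for possible missed matches when the blocking key itself differs.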

Studio

The Studio provides the creation and execution environment for AI agents, widgets, datasets, and workflows. The platform supports five operational modes enabling organizations to balance automation with human oversight.

Core Capabilities

| Capability | Description |
| --- | --- |
| Agent Creation | Domain-driven flow with schema selection, natural language definition, and testing |
| Widgets | Agents with visual interface components (charts, tables, custom views) |
| Datasets | Structured data collections with schema enforcement and access controls |
| Business Domain Discovery | Extract schemas from documents to define entity structures |
| Chain of Agents | Orchestrate multiple agents in sequential, parallel, or hierarchical patterns |
| Operational Modes | Five modes from traditional platform (Mode 0) to fully automated with audit (Mode 4) |
| Text-to-Pipeline | Natural language pipeline generation with DSL output and iterative refinement |
| Testing Framework | Test agents and workflows in sandboxed environments |

Text-to-Pipeline

Users describe desired workflows in natural language, and the system generates a structured DSL representation:

| Stage | Description |
| --- | --- |
| Intent Analysis | Parse natural language to identify data sources, operations, and conditions |
| DSL Generation | Generate structured pipeline definition with steps and dependencies |
| Validation | Verify permissions, tool availability, and type compatibility |
| Visual Preview | Display interactive flow diagram with step details |
| Refinement | Iterate using natural language commands to modify the pipeline |
| Execution | Run immediately, schedule, or save as reusable template |
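
To make the DSL Generation and Validation stages more concrete, here is a hedged sketch of what a generated pipeline definition and its validation might look like. The dictionary layout, the step and tool names, and the `validate` function are assumptions made for illustration; the document does not specify the actual DSL syntax.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical DSL output for "screen a new client and summarize the findings".
pipeline = {
    "name": "new-client-screening",
    "steps": [
        {"id": "fetch_client", "tool": "crm.lookup", "depends_on": []},
        {"id": "screen", "tool": "screening.check", "depends_on": ["fetch_client"]},
        {"id": "registry", "tool": "registry.search", "depends_on": ["fetch_client"]},
        {"id": "summarize", "tool": "llm.summarize", "depends_on": ["screen", "registry"]},
    ],
}


def validate(pipeline, available_tools):
    """Return (execution_order, errors) for a pipeline definition."""
    errors = []
    ids = {step["id"] for step in pipeline["steps"]}
    for step in pipeline["steps"]:
        if step["tool"] not in available_tools:
            errors.append(f"{step['id']}: tool '{step['tool']}' is not available")
        for dep in step["depends_on"]:
            if dep not in ids:
                errors.append(f"{step['id']}: unknown dependency '{dep}'")
    if errors:
        return [], errors
    # Dependencies must form a DAG; a topological sort yields a valid run order.
    graph = {step["id"]: set(step["depends_on"]) for step in pipeline["steps"]}
    try:
        return list(TopologicalSorter(graph).static_order()), []
    except CycleError as exc:
        return [], [f"dependency cycle: {exc.args[1]}"]


tools = {"crm.lookup", "screening.check", "registry.search", "llm.summarize"}
order, errors = validate(pipeline, tools)
print(order, errors)
```

Permission and type-compatibility checks, which the Validation stage also lists, are omitted here; they would slot into the same loop as additional error conditions.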

Agent Architecture

Agents are the fundamental execution unit in Studio. Each agent has:

| Component | Description |
| --- | --- |
| Definition | Name, version, description, category |
| Inputs | Input parameters with schemas and validation rules |
| Workflow | Execution steps, tool calls, conditional logic |
| Outputs | Result definitions with transformation rules |

Chain of Agents

Chains enable complex workflows by orchestrating multiple agents:

| Pattern | Description | Use Case |
| --- | --- | --- |
| Sequential | Agents execute in order, passing results | Document review pipeline |
| Parallel | Agents execute concurrently | Multi-source research |
| Conditional | Agent selection based on runtime conditions | Risk-based routing |
| Loop | Repeated execution until condition met | Iterative refinement |
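
The four patterns map onto familiar control-flow constructs. The sketch below treats an agent as any Python callable that takes a payload and returns a result; that simplification, the function names, and the toy agents are assumptions for illustration rather than the Studio runtime's interface.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential(agents, payload):
    """Run agents in order, feeding each result into the next agent."""
    for agent in agents:
        payload = agent(payload)
    return payload


def parallel(agents, payload):
    """Run agents concurrently on the same input; results keep agent order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda agent: agent(payload), agents))


def conditional(predicate, if_true, if_false, payload):
    """Pick one agent at runtime based on the payload."""
    return (if_true if predicate(payload) else if_false)(payload)


def loop(agent, payload, done, max_iterations=10):
    """Re-run an agent until `done` holds or the iteration cap is reached."""
    for _ in range(max_iterations):
        if done(payload):
            break
        payload = agent(payload)
    return payload


# Toy agents (hypothetical).
def extract(doc):
    return {**doc, "clauses": [c for c in doc["text"].split(". ") if c]}


def flag(doc):
    return {**doc, "flagged": [c for c in doc["clauses"] if "indemnity" in c.lower()]}


review = sequential([extract, flag], {"text": "Term is two years. Indemnity is uncapped"})
research = parallel([lambda q: f"registry:{q}", lambda q: f"sanctions:{q}"], "Acme")
route = conditional(lambda c: c["risk"] >= 70, lambda c: "EDD", lambda c: "CDD", {"risk": 85})
halved = loop(lambda n: n // 2, 40, done=lambda n: n < 5)
print(review["flagged"], research, route, halved)
```

The `max_iterations` cap in `loop` is a deliberate design choice: a loop pattern driven by a model-evaluated condition should always have a hard stop.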

Operational Modes

The platform supports five operational modes enabling organizations to configure automation levels:

| Mode | Name | Description |
| --- | --- | --- |
| 0 | Traditional Platform | No AI agents; manual investigation and analysis |
| 1 | AI-Assisted Manual | Agents in suggest-only mode; all decisions require human approval |
| 2 | Routine Automation | Agents handle routine tasks; humans focus on analysis and decisions |
| 3 | Autonomous with Escalation | Full automation with escalation on exceptions |
| 4 | Fully Automated with Audit | End-to-end automation with post-investigation human audits |
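
One way to read the mode table is as a gating policy applied to each task. The following sketch encodes that reading; the `disposition` function, its return labels, and the `is_routine` / `is_exception` flags are assumptions for illustration, since the document does not define how tasks are classified.

```python
from enum import IntEnum


class Mode(IntEnum):
    TRADITIONAL = 0
    AI_ASSISTED = 1
    ROUTINE_AUTOMATION = 2
    AUTONOMOUS_WITH_ESCALATION = 3
    FULLY_AUTOMATED_WITH_AUDIT = 4


def disposition(mode, is_routine=False, is_exception=False):
    """Decide how a task is handled under a given operational mode."""
    if mode == Mode.TRADITIONAL:
        return "manual"                      # no agents involved
    if mode == Mode.AI_ASSISTED:
        return "suggest_then_approve"        # every decision needs a human
    if mode == Mode.ROUTINE_AUTOMATION:
        return "auto" if is_routine else "human_decision"
    if mode == Mode.AUTONOMOUS_WITH_ESCALATION:
        return "escalate" if is_exception else "auto"
    return "auto_with_audit"                 # humans review after the fact


print(disposition(Mode.ROUTINE_AUTOMATION, is_routine=True))
print(disposition(Mode(3), is_exception=True))
```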

Dialog

The Dialog component serves as the central conversational intelligence layer, enabling natural language interaction across all platform components.

Core Capabilities

| Capability | Description |
| --- | --- |
| Natural Language Understanding | Intent classification, entity extraction, context analysis |
| Intelligent Routing | Routes queries to Knowledge Fabric, Studio, Marketplace, or Exchange |
| Multi-Level Query Processing | Handles complex queries requiring multiple platform components |
| Cross-Platform Continuity | Maintains context between web and messaging applications |
| Long-Term Memory | Preserves conversation history across sessions |
| Document Processing | Handles document uploads within conversations |
| Intelligent Caching | Two-layer caching (exact match + semantic) for performance |
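
The two-layer caching idea can be illustrated with a small sketch: an exact-match layer keyed on the normalized query, backed by a similarity layer for near-duplicates. A real semantic layer would most likely compare embedding vectors; the token-overlap (Jaccard) score used here is a deliberately simple stand-in, and the `TwoLayerCache` class and its 0.8 threshold are hypothetical.

```python
import hashlib


class TwoLayerCache:
    def __init__(self, threshold=0.8):
        self.exact = {}      # normalized-query hash -> response
        self.semantic = []   # [(token set, response)]
        self.threshold = threshold

    @staticmethod
    def _key(query):
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    @staticmethod
    def _tokens(query):
        return frozenset(query.lower().split())

    def put(self, query, response):
        self.exact[self._key(query)] = response
        self.semantic.append((self._tokens(query), response))

    def get(self, query):
        """Return (response, layer), where layer is 'exact', 'semantic', or 'miss'."""
        hit = self.exact.get(self._key(query))
        if hit is not None:
            return hit, "exact"
        tokens = self._tokens(query)
        for cached_tokens, response in self.semantic:
            union = tokens | cached_tokens
            # Jaccard overlap stands in for embedding similarity (assumption).
            if union and len(tokens & cached_tokens) / len(union) >= self.threshold:
                return response, "semantic"
        return None, "miss"


cache = TwoLayerCache()
cache.put("who owns acme holdings ltd", "Alice (40%)")
print(cache.get("Who owns  Acme Holdings Ltd"))
print(cache.get("who owns acme holdings ltd today"))
print(cache.get("list open matters"))
```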

Dialog State Machine

| State | Description |
| --- | --- |
| IDLE | Waiting for user input |
| PROCESSING | Analyzing user request via NLU |
| ROUTING | Determining appropriate platform components |
| EXECUTING | Calling platform functions |
| RESPONDING | Generating user response |
| CLARIFYING | Requesting additional information |
| ERROR | Handling errors and recovery |
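
The table lists the states but not the transitions between them. The sketch below fills in one plausible transition map to show how such a machine can be enforced in code; the `TRANSITIONS` table is an assumption and may differ from the transitions the Dialog component actually allows.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    PROCESSING = auto()
    ROUTING = auto()
    EXECUTING = auto()
    RESPONDING = auto()
    CLARIFYING = auto()
    ERROR = auto()


# Assumed transition map; the source document does not enumerate transitions.
TRANSITIONS = {
    State.IDLE: {State.PROCESSING},
    State.PROCESSING: {State.ROUTING, State.CLARIFYING, State.ERROR},
    State.ROUTING: {State.EXECUTING, State.CLARIFYING, State.ERROR},
    State.EXECUTING: {State.RESPONDING, State.ERROR},
    State.RESPONDING: {State.IDLE},
    State.CLARIFYING: {State.PROCESSING},   # resume once the user answers
    State.ERROR: {State.RESPONDING, State.IDLE},
}


def advance(current, target):
    """Move to `target`, rejecting transitions the map does not allow."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


state = State.IDLE
for nxt in (State.PROCESSING, State.ROUTING, State.EXECUTING, State.RESPONDING, State.IDLE):
    state = advance(state, nxt)
print(state.name)  # prints IDLE
```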

Function Routing

| Component | Query Types |
| --- | --- |
| Knowledge Fabric | Search, entity lookup, graph traversal |
| Studio | Agent invocation, pipeline execution, creation assistance |
| Marketplace | Asset search, details, acquisition |
| Exchange | Data sharing, collaboration, access requests |

Schema Management

The Schema Management system provides a unified approach to defining, discovering, and managing business domain schemas across the platform.

Core Capabilities

| Capability | Description |
| --- | --- |
| Business Domain Discovery | Extract domain concepts from user documents |
| Schema Registry | Centralized storage with versioning and access control |
| Schema Validation | Ensure data conformance across all platform components |
| Schema Binding | Link schemas to agents, extractors, and MCP connectors |

Schema Usage Across Platform

| Component | Schema Role |
| --- | --- |
| Studio Agents | Input/output validation, data transformation control |
| OSINT Extractors | Structure external data according to domain model |
| MCP Connectors | Ensure data consistency across tool integrations |
| Knowledge Graph | Entity and relationship type definitions |
| Pipelines | Data flow validation between processing steps |

Document-to-Schema Discovery

| Stage | Description |
| --- | --- |
| Document Analysis | Extract text, structure, and metadata from user documents |
| Concept Extraction | Identify entities, attributes, and relationships |
| Schema Generation | Create structured schema definitions |
| User Refinement | Interactive review and modification |
| Publication | Register schema in central registry |

AI & LLM Layer

The AI & LLM Layer provides intelligent processing capabilities through a provider-agnostic gateway with comprehensive output consistency and quality controls.

Core Capabilities

| Capability | Description |
| --- | --- |
| LightLLM Gateway | Unified interface for multiple LLM providers |
| Output Consistency | Schema-validated extraction and ontology-based execution |
| Model Provenance | Complete tracking of model versions and configurations |
| A/B Testing | Controlled model updates with performance comparison |
| Quality Assurance | Feedback loops, accuracy monitoring, reasoning chain transparency |
| Provider Abstraction | Swap providers without code changes |
| Prompt Management | Template library with version control |

LLM Output Consistency

| Control | Description |
| --- | --- |
| Schema-Validated Extraction | All outputs validated against user-defined schemas |
| Ontology-Based Execution | Responses grounded in domain ontology |
| Reasoning Chain Transparency | Full reasoning paths logged for audit |
| Deterministic Components | Separation of deterministic vs. probabilistic processing |
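
Schema-validated extraction amounts to a deterministic gate placed after a probabilistic step: the model's raw output is parsed and checked before anything downstream consumes it. The sketch below shows the shape of such a gate using only the standard library. The `PERSON_SCHEMA` format (field name to Python type) and the `validate_extraction` helper are simplified, hypothetical stand-ins for whatever schema language the Schema Registry actually uses.

```python
import json

# Hypothetical user-defined schema: field name -> required Python type.
PERSON_SCHEMA = {"name": str, "date_of_birth": str, "pep": bool}


def validate_extraction(raw_output, schema):
    """Parse model output and return (data, errors); data is None on failure."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return None, [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["expected a JSON object"]
    errors = [f"missing field '{key}'" for key in schema if key not in data]
    errors += [f"unexpected field '{key}'" for key in data if key not in schema]
    errors += [
        f"field '{key}' should be {expected.__name__}"
        for key, expected in schema.items()
        if key in data and not isinstance(data[key], expected)
    ]
    return (data, []) if not errors else (None, errors)


good = '{"name": "Alice Example", "date_of_birth": "1980-01-01", "pep": false}'
bad = '{"name": "Alice Example", "pep": "no", "nickname": "Al"}'
print(validate_extraction(good, PERSON_SCHEMA))
print(validate_extraction(bad, PERSON_SCHEMA))
```

Rejecting unexpected fields as well as missing ones keeps the model from silently widening the schema, which matters when the validated output feeds the Knowledge Graph.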

Provider Support

| Provider Type | Examples | Integration |
| --- | --- | --- |
| Commercial APIs | OpenAI, Anthropic, Google | API key authentication |
| Self-Hosted | Llama, Mistral, custom models | Private endpoint |
| Enterprise | Azure OpenAI, AWS Bedrock | Cloud provider auth |
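
Provider abstraction usually means that callers depend on one small interface while concrete providers are registered behind it. The following sketch shows that shape with stub providers that make no network calls. The `LLMProvider` protocol, `StubProvider`, `Gateway`, and the model names are hypothetical and do not represent the LightLLM Gateway's actual interface.

```python
from typing import Protocol


class LLMProvider(Protocol):
    model: str

    def complete(self, prompt: str) -> str: ...


class StubProvider:
    """Stand-in for a real client; returns a canned string instead of calling an API."""

    def __init__(self, model):
        self.model = model

    def complete(self, prompt):
        return f"{self.model} answer to: {prompt}"


class Gateway:
    def __init__(self):
        self._providers = {}
        self._active = None

    def register(self, name: str, provider: LLMProvider):
        self._providers[name] = provider
        if self._active is None:
            self._active = name  # first registered provider is the default

    def use(self, name):
        if name not in self._providers:
            raise KeyError(f"unknown provider: {name}")
        self._active = name

    def complete(self, prompt):
        provider = self._providers[self._active]
        # Attach provenance so each response records which model produced it.
        return {
            "text": provider.complete(prompt),
            "provenance": {"provider": self._active, "model": provider.model},
        }


gateway = Gateway()
gateway.register("commercial", StubProvider("model-a"))
gateway.register("self_hosted", StubProvider("model-b"))

first = gateway.complete("Summarize the engagement letter")
gateway.use("self_hosted")  # swap providers; calling code is unchanged
second = gateway.complete("Summarize the engagement letter")
print(first["provenance"], second["provenance"])
```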

AML Compliance Module

The AML (anti-money laundering) Compliance module supports customer due diligence (CDD), enhanced due diligence (EDD), and related regulatory compliance processes.

Core Capabilities

| Capability | Description |
| --- | --- |
| Rule Engine | Risk scoring and compliance decision rules |
| BPM Workflows | CDD, EDD, and investigation process orchestration |
| Screening Service | Sanctions, PEP, and adverse media checking |
| Case Management | Compliance case tracking and documentation |
| Reporting | Regulatory reporting and analytics |

Risk Assessment

The Rule Engine calculates risk scores based on configurable factors:

| Risk Category | Factors |
| --- | --- |
| Geographic | Country risk, jurisdiction, sanctions exposure |
| Industry | Sector risk, regulatory intensity |
| Product/Service | Complexity, cash intensity, cross-border |
| Customer Type | Individual, corporate, PEP status |
| Behavioral | Transaction patterns, anomalies |
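
A common way to combine category-level factors is a weighted average followed by banding. The sketch below assumes that approach; the weights, the 0-100 scale, the band thresholds, and the function names are illustrative values chosen for the example and are not the Rule Engine's configuration.

```python
# Assumed weights in percent; they must sum to 100.
WEIGHTS = {
    "geographic": 30,
    "industry": 15,
    "product_service": 20,
    "customer_type": 20,
    "behavioral": 15,
}


def risk_score(factors):
    """Weighted average of per-category scores, each on a 0-100 scale."""
    missing = WEIGHTS.keys() - factors.keys()
    if missing:
        raise ValueError(f"missing risk categories: {sorted(missing)}")
    return sum(WEIGHTS[name] * factors[name] for name in WEIGHTS) / 100


def risk_band(score):
    """Map a score to a band using assumed thresholds."""
    if score >= 70:
        return "high"
    if score >= 40:
        return "medium"
    return "low"


client = {
    "geographic": 80,
    "industry": 40,
    "product_service": 60,
    "customer_type": 90,
    "behavioral": 20,
}
score = risk_score(client)
print(score, risk_band(score))  # prints 63.0 medium
```

Keeping the weights in configuration rather than code is what makes the factors "configurable": a compliance team can retune them without redeploying the scoring logic.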

BPM Workflows

| Workflow | Purpose | Trigger |
| --- | --- | --- |
| Client Onboarding | Initial CDD for new clients | Client intake |
| Periodic Review | Regular risk reassessment | Time-based |
| Triggered Review | Event-driven reassessment | Risk event |
| Investigation | Detailed inquiry process | Alert escalation |
| SAR Escalation | Suspicious activity reporting | Investigation finding |

Network Architecture

Network Segmentation

| Zone | Purpose | Components |
| --- | --- | --- |
| Edge/DMZ | External entry point | Load balancers, WAF, API Gateway |
| Application | Service execution | Studio, Services, AI Layer |
| Data | Persistent storage | Knowledge Graph, Document Store |
| Management | Operations | Monitoring, Logging, Admin tools |

Connectivity Patterns

| Pattern | Description | Use Case |
| --- | --- | --- |
| Direct TLS | Encrypted connection over internet | Cloud-hosted sources |
| VPN Tunnel | Site-to-site encrypted tunnel | On-premises sources |
| Private Link | Cloud provider private connectivity | Same-cloud sources |
| Agent-Based | Customer-deployed agent connects outbound | Air-gapped environments |

Data Flow Model

LegalFab operates on a metadata-first principle. Source data remains in place while metadata flows through the platform.

| Data Type | Handling | Storage |
| --- | --- | --- |
| Structural Metadata | Extracted and stored | Knowledge Graph |
| Statistical Profiles | Computed via aggregation | Knowledge Graph |
| Sample Data | Optional, user-controlled | Ephemeral cache |
| Query Results | Pass-through federation | Never persisted |
| Source Credentials | Encrypted storage | Secure Vault |

Security Architecture

Defense-in-Depth Model

┌─────────────────────────────────────────────────────────────────────┐
│                    PERIMETER SECURITY                               │
│    DDoS Protection │ WAF │ Rate Limiting │ Geographic Filtering     │
├─────────────────────────────────────────────────────────────────────┤
│                    NETWORK SECURITY                                 │
│    VPC Isolation │ Network Segmentation │ Private Endpoints         │
├─────────────────────────────────────────────────────────────────────┤
│                    APPLICATION SECURITY                             │
│    Input Validation │ Output Encoding │ CSRF Protection │ CSP       │
├─────────────────────────────────────────────────────────────────────┤
│                    IDENTITY & ACCESS                                │
│    OAuth 2.0/OIDC │ RBAC │ ABAC │ MFA │ Session Management          │
├─────────────────────────────────────────────────────────────────────┤
│                    DATA SECURITY                                    │
│    Encryption │ Tokenization │ Data Masking │ Classification        │
├─────────────────────────────────────────────────────────────────────┤
│                    INFRASTRUCTURE SECURITY                          │
│    Hardened Images │ Patch Management │ Container Security          │
└─────────────────────────────────────────────────────────────────────┘

Cryptographic Standards

| Purpose | Algorithm | Key Length |
| --- | --- | --- |
| Data at Rest | AES-256-GCM | 256-bit |
| Data in Transit | TLS 1.3 | 256-bit |
| Key Encryption | RSA-OAEP | 2048-bit minimum |
| Digital Signatures | ECDSA P-256 | 256-bit |
| Hashing | SHA-256 | N/A |

Integration Points

External System Integration

| System Type | Integration Method | Data Exchange |
| --- | --- | --- |
| Practice Management | API / Database connector | Matters, clients, time entries |
| CRM Systems | API / Webhook | Contacts, organizations, activities |
| Document Management | API / File system | Documents, metadata |
| Billing Systems | API / Database | Invoices, payments, AR |
| Corporate Registries | API | Company filings, ownership |
| Screening Providers | API | Sanctions, PEP, adverse media |

MCP Protocol

The Model Context Protocol (MCP) provides a standardized interface for tool integration:

| Component | Purpose |
| --- | --- |
| Tool Registry | Catalog of available tools and capabilities |
| Schema Definition | Input/output contracts for each tool |
| Authentication | Tool-specific credential management |
| Execution | Secure tool invocation with timeout handling |
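
The registry, contract, and timeout concerns listed above can be illustrated with a small in-process sketch. This is not the MCP SDK or wire protocol: the `ToolRegistry` class, its method names, the "required argument names" contract, and the two sample tools are hypothetical simplifications.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


class ToolRegistry:
    def __init__(self):
        self._tools = {}  # tool name -> (callable, required argument names)

    def register(self, name, func, required_args):
        self._tools[name] = (func, frozenset(required_args))

    def list_tools(self):
        return sorted(self._tools)

    def invoke(self, name, args, timeout=2.0):
        """Check the input contract, then run the tool with a time limit."""
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        func, required = self._tools[name]
        missing = required - set(args)
        if missing:
            raise ValueError(f"{name}: missing arguments {sorted(missing)}")
        pool = ThreadPoolExecutor(max_workers=1)
        try:
            return pool.submit(func, **args).result(timeout=timeout)
        except FutureTimeout:
            raise TimeoutError(f"{name} exceeded {timeout}s") from None
        finally:
            # Stop waiting; note that the worker thread itself is not cancelled.
            pool.shutdown(wait=False)


def registry_search(company):
    return {"company": company, "status": "active"}


def slow_tool():
    time.sleep(0.3)
    return "done"


registry = ToolRegistry()
registry.register("registry.search", registry_search, ["company"])
registry.register("slow.tool", slow_tool, [])
print(registry.list_tools())
print(registry.invoke("registry.search", {"company": "Acme"}))
```

A thread-based timeout only stops the caller from waiting. Tools that must be forcibly terminated would need process isolation or cooperative cancellation instead.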

Deployment Models

LegalFab is available in multiple deployment configurations to meet varying security, compliance, and operational requirements.

Deployment Options

| Model | Description | Use Case |
| --- | --- | --- |
| SaaS Multi-Tenant | Shared infrastructure, isolated data | Standard deployment, cost-effective |
| Dedicated Cloud Tenant | Dedicated infrastructure in LegalFab cloud | Enterprise, enhanced isolation |
| Customer Cloud | Deploy in customer’s cloud tenant (AWS, Azure, GCP) | Data sovereignty, infrastructure control |
| On-Premises | Fully customer-managed within customer data centers | Maximum control, air-gapped environments |
| Hybrid | SaaS control plane, on-premises data plane | Balance of convenience and control |

SaaS Multi-Tenant

| Aspect | Details |
| --- | --- |
| Infrastructure | Shared compute, isolated data stores |
| Data Isolation | Logical tenant isolation with encryption |
| Compliance | SOC 2, GDPR compliant infrastructure |
| Updates | Automatic platform updates |
| Best For | Standard requirements, rapid deployment |

Dedicated Cloud Tenant

| Aspect | Details |
| --- | --- |
| Infrastructure | Dedicated resources within LegalFab cloud |
| Data Isolation | Physical separation of compute and storage |
| Compliance | Enhanced compliance posture |
| Updates | Coordinated update windows |
| Best For | Enterprise customers requiring dedicated resources |

Customer Cloud Deployment

| Aspect | Details |
| --- | --- |
| Infrastructure | Customer’s own cloud tenant (AWS, Azure, GCP) |
| Data Control | Customer maintains full infrastructure control |
| Region Selection | Deploy in preferred cloud region |
| Management | LegalFab manages application, customer manages infrastructure |
| Best For | Data sovereignty, existing cloud investments |

On-Premises Deployment

| Aspect | Details |
| --- | --- |
| Infrastructure | Customer data centers |
| Data Control | Complete data custody |
| Network | Air-gapped capability available |
| Management | Customer-managed with LegalFab support |
| Best For | Strict regulatory requirements, maximum control |

Data Residency Configuration

| Region Option | Data Location | LLM Processing |
| --- | --- | --- |
| UK-Only | UK data centers | UK-based providers or on-premises |
| EU-Only | EU data centers (commonly Ireland) | EU-based providers |
| US-Only | US data centers | US-based providers |
| Customer-Specified | Any supported region | Region-aligned providers |
| On-Premises | Customer data centers | Self-hosted models |

LLM Provider Configuration by Deployment

| Deployment | LLM Options |
| --- | --- |
| SaaS | Cloud providers (OpenAI, Anthropic, Google) |
| Dedicated Tenant | Cloud providers with dedicated keys |
| Customer Cloud | Cloud providers or self-hosted in customer cloud |
| On-Premises | Self-hosted models (Llama, Mistral) for complete data control |
| Hybrid | Cloud for non-sensitive, on-premises for sensitive |

Deployment Feature Comparison

Feature SaaS Dedicated Customer Cloud On-Premises
Knowledge Fabric
Studio (Agents, Widgets, Datasets)
Dialog Interface
Multi-Tenancy
LLM Integration
Custom Region Limited N/A
Self-Hosted LLMs

For detailed information, see the individual component documents: Knowledge Fabric, Studio, Dialog, AI-LLM, AML Compliance, and API Security.