LegalFab System Architecture

Version: 1.4 Last Updated: January 2026

Platform Overview

LegalFab is an AI-powered legal technology platform built on a metadata-driven architecture. The platform provides unified access to distributed data assets, intelligent automation through AI agents, and compliance capabilities for regulated legal practices.

Platform Components

Component	Purpose	Key Capabilities
Knowledge Fabric	Data integration and intelligence layer	Persistent Knowledge Graph, Entity Resolution, 200+ MCP Connectors, Search Sessions
Studio	Creation and execution environment	Agent Builder, Widgets, Datasets, Chain of Agents, Operational Modes (0-4)
Dialog	Conversational intelligence interface	Natural Language Understanding, Intelligent Routing, Cross-Platform Context, Long-Term Memory
Schema Management	Business domain definition and control	Domain Discovery, Schema Registry, Schema Validation
AI & LLM Layer	Intelligent processing and inference	LightLLM Gateway, Output Consistency, Model Provenance
AML Compliance	Regulatory compliance module	Rule Engine, BPM Workflows, Screening, Case Management
DevOps Infrastructure	Deployment and operations	CI/CD Pipelines, Monitoring, Security Operations

Architecture Overview

┌─────────────────────────────────────────────────────────────────────────┐
│                        PRESENTATION LAYER                               │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌────────────┐   │
│  │   Web UI     │  │  API Gateway │  │    Dialog    │  │  Messaging │   │
│  │              │  │  (REST)      │  │   Interface  │  │  Mini-App  │   │
│  └──────────────┘  └──────────────┘  └──────────────┘  └────────────┘   │
│                              │                                          │
│                   [Authentication, Rate Limiting, Input Validation]     │
├─────────────────────────────────────────────────────────────────────────┤
│                          APPLICATION LAYER                              │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │                          DIALOG                                 │    │
│  │  ┌───────────┐  ┌───────────────┐  ┌───────────┐  ┌──────────┐  │    │
│  │  │    NLU    │  │    Dialog     │  │  Function │  │ Context  │  │    │
│  │  │  Engine   │  │    Manager    │  │  Router   │  │ Manager  │  │    │
│  │  └───────────┘  └───────────────┘  └───────────┘  └──────────┘  │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │                          STUDIO                                 │    │
│  │  ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌──────────────┐  │    │
│  │  │  Agent    │  │  Widgets  │  │ Datasets  │  │  Chain of    │  │    │
│  │  │  Builder  │  │           │  │           │  │  Agents      │  │    │
│  │  └───────────┘  └───────────┘  └───────────┘  └──────────────┘  │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │                     AML COMPLIANCE MODULE                       │    │
│  │  ┌───────────┐  ┌───────────────┐  ┌───────────┐  ┌──────────┐  │    │
│  │  │   Rule    │  │  BPM          │  │ Screening │  │  Case    │  │    │
│  │  │  Engine   │  │  Workflows    │  │  Service  │  │ Manager  │  │    │
│  │  └───────────┘  └───────────────┘  └───────────┘  └──────────┘  │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌────────────┐ ┌────────────┐   │
│  │  Search  │ │ Lineage  │ │ Quality  │ │ Discovery  │ │ Governance │   │
│  │ Service  │ │ Service  │ │ Service  │ │  Service   │ │  Service   │   │
│  └──────────┘ └──────────┘ └──────────┘ └────────────┘ └────────────┘   │
│                              │                                          │
│                   [Service-to-Service AuthN/AuthZ, mTLS]                │
├─────────────────────────────────────────────────────────────────────────┤
│                      KNOWLEDGE FABRIC LAYER                             │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │                    KNOWLEDGE GRAPH                              │    │
│  │  ┌──────────────┐  ┌──────────────────┐  ┌──────────────────┐   │    │
│  │  │ Entity Store │  │ Relationship     │  │  Query Engine    │   │    │
│  │  │   (Nodes)    │  │ Store (Edges)    │  │  (Traversal)     │   │    │
│  │  └──────────────┘  └──────────────────┘  └──────────────────┘   │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │                  ENTITY RESOLUTION ENGINE                       │    │
│  │  ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌─────────────┐   │    │
│  │  │ Blocking  │  │ Matching  │  │ Clustering│  │ Golden      │   │    │
│  │  │ Service   │  │ Service   │  │ Service   │  │ Record Mgmt │   │    │
│  │  └───────────┘  └───────────┘  └───────────┘  └─────────────┘   │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                              │                                          │
│                   [Encryption at Rest, Access Control Lists]            │
├─────────────────────────────────────────────────────────────────────────┤
│                        AI & LLM LAYER                                   │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │  ┌────────────┐  ┌──────────────┐  ┌────────────────────────┐   │    │
│  │  │  LightLLM  │  │   Provider   │  │  Prompt Management     │   │    │
│  │  │  Gateway   │  │  Abstraction │  │  & Template Engine     │   │    │
│  │  └────────────┘  └──────────────┘  └────────────────────────┘   │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                              │                                          │
│                   [Input Filtering, Output Validation, PII Detection]   │
├─────────────────────────────────────────────────────────────────────────┤
│                       CONNECTIVITY LAYER                                │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌────────────┐   │
│  │  Database    │  │    API       │  │     MCP      │  │   Event    │   │
│  │  Connectors  │  │  Connectors  │  │  Connectors  │  │  Streams   │   │
│  └──────────────┘  └──────────────┘  └──────────────┘  └────────────┘   │
│                              │                                          │
│               [Credential Vault, Secure Connections, Data Sampling]     │
├─────────────────────────────────────────────────────────────────────────┤
│                       SOURCE SYSTEMS                                    │
│  ┌──────────┐ ┌───────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐  │
│  │Databases │ │  Document │ │  APIs    │ │  Legal   │ │   External   │  │
│  │          │ │  Stores   │ │          │ │  Systems │ │   Sources    │  │
│  └──────────┘ └───────────┘ └──────────┘ └──────────┘ └──────────────┘  │
│                              │                                          │
│                [Customer-Managed, Customer Credentials]                 │
└─────────────────────────────────────────────────────────────────────────┘

Knowledge Fabric

The Knowledge Fabric serves as the foundational data integration and intelligence layer for the LegalFab platform. It implements a metadata-driven architecture that provides unified access to distributed data assets while leaving source data in place.

Core Capabilities

Capability	Description
Persistent Knowledge Graph	Corporate memory with schema-bounded extraction, source provenance, and Knowledge Tree structure
Entity Resolution	Cross-source entity matching using blocking, matching, and clustering algorithms with golden record management
200+ MCP Connectors	Federated queries across databases, SaaS applications, legal systems, and corporate registries
Search Sessions	Iterative exploration with session graphs and accumulated context
Data Observability	Quality monitoring, freshness tracking, and automated alerts
Discovery Service	Automated identification and cataloging of data assets
Active Metadata	Continuous metadata analysis, profiling, and enrichment
Data Lineage	End-to-end tracking of data flow from source to consumption

Knowledge Graph Model

The Knowledge Graph stores entities as nodes and relationships as edges, enabling complex traversal queries and network analysis.

Entity Types:

Entity	Description	Use Cases
Person	Individual clients, contacts, beneficial owners	KYC, risk assessment, relationship mapping
Organization	Corporate clients, counterparties, related entities	Corporate structure, ownership analysis
Matter	Legal matters, cases, engagements	Matter management, conflict checking
Document	Contracts, filings, correspondence	Document management, search
Address	Physical and registered addresses	Location analysis, verification
Identifier	Tax IDs, registration numbers, LEIs	Cross-reference, verification

Relationship Types:

Relationship	Description
OWNS	Ownership stake between entities
CONTROLS	Control relationship (voting, management)
RELATED_TO	Personal or business relationship
EMPLOYS	Employment relationship
REPRESENTS	Legal representation relationship
LOCATED_AT	Entity-address relationship

Entity Resolution

The Entity Resolution Engine identifies duplicate and related records across connected sources to build unified entity profiles (Golden Records).

Resolution Pipeline:

Source Records → Blocking → Candidate Pairs → Matching → Clusters → Golden Records
                    │             │              │           │
              (Key generation) (Comparison)  (Scoring)  (Merge rules)

Studio

The Studio provides the creation and execution environment for AI agents, widgets, datasets, and workflows. The platform supports five operational modes enabling organizations to balance automation with human oversight.

Core Capabilities

Capability	Description
Agent Creation	Domain-driven flow with schema selection, natural language definition, and testing
Widgets	Agents with visual interface components (charts, tables, custom views)
Datasets	Structured data collections with schema enforcement and access controls
Business Domain Discovery	Extract schemas from documents to define entity structures
Chain of Agents	Orchestrate multiple agents in sequential, parallel, or hierarchical patterns
Operational Modes	Five modes from traditional platform (Mode 0) to fully automated with audit (Mode 4)
Text-to-Pipeline	Natural language pipeline generation with DSL output and iterative refinement
Testing Framework	Test agents and workflows in sandboxed environments

Text-to-Pipeline

Users describe desired workflows in natural language, and the system generates a structured DSL representation:

Stage	Description
Intent Analysis	Parse natural language to identify data sources, operations, and conditions
DSL Generation	Generate structured pipeline definition with steps and dependencies
Validation	Verify permissions, tool availability, and type compatibility
Visual Preview	Display interactive flow diagram with step details
Refinement	Iterate using natural language commands to modify the pipeline
Execution	Run immediately, schedule, or save as reusable template

Agent Architecture

Agents are the fundamental execution unit in Studio. Each agent has:

Component	Description
Definition	Name, version, description, category
Inputs	Input parameters with schemas and validation rules
Workflow	Execution steps, tool calls, conditional logic
Outputs	Result definitions with transformation rules

Chain of Agents

Chains enable complex workflows by orchestrating multiple agents:

Pattern	Description	Use Case
Sequential	Agents execute in order, passing results	Document review pipeline
Parallel	Agents execute concurrently	Multi-source research
Conditional	Agent selection based on runtime conditions	Risk-based routing
Loop	Repeated execution until condition met	Iterative refinement

Operational Modes

The platform supports five operational modes enabling organizations to configure automation levels:

Mode	Name	Description
0	Traditional Platform	No AI agents; manual investigation and analysis
1	AI-Assisted Manual	Agents in suggest-only mode; all decisions require human approval
2	Routine Automation	Agents handle routine tasks; humans focus on analysis and decisions
3	Autonomous with Escalation	Full automation with escalation on exceptions
4	Fully Automated with Audit	End-to-end automation with post-investigation human audits

Dialog

The Dialog component serves as the central conversational intelligence layer, enabling natural language interaction across all platform components.

Core Capabilities

Capability	Description
Natural Language Understanding	Intent classification, entity extraction, context analysis
Intelligent Routing	Routes queries to Knowledge Fabric, Studio, Marketplace, or Exchange
Multi-Level Query Processing	Handles complex queries requiring multiple platform components
Cross-Platform Continuity	Maintains context between web and messaging applications
Long-Term Memory	Preserves conversation history across sessions
Document Processing	Handles document uploads within conversations
Intelligent Caching	Two-layer caching (exact match + semantic) for performance

Dialog State Machine

State	Description
IDLE	Waiting for user input
PROCESSING	Analyzing user request via NLU
ROUTING	Determining appropriate platform components
EXECUTING	Calling platform functions
RESPONDING	Generating user response
CLARIFYING	Requesting additional information
ERROR	Handling errors and recovery

Function Routing

Component	Query Types
Knowledge Fabric	Search, entity lookup, graph traversal
Studio	Agent invocation, pipeline execution, creation assistance
Marketplace	Asset search, details, acquisition
Exchange	Data sharing, collaboration, access requests

Schema Management

The Schema Management system provides a unified approach to defining, discovering, and managing business domain schemas across the platform.

Core Capabilities

Capability	Description
Business Domain Discovery	Extract domain concepts from user documents
Schema Registry	Centralized storage with versioning and access control
Schema Validation	Ensure data conformance across all platform components
Schema Binding	Link schemas to agents, extractors, and MCP connectors

Schema Usage Across Platform

Component	Schema Role
Studio Agents	Input/output validation, data transformation control
OSINT Extractors	Structure external data according to domain model
MCP Connectors	Ensure data consistency across tool integrations
Knowledge Graph	Entity and relationship type definitions
Pipelines	Data flow validation between processing steps

Document-to-Schema Discovery

Stage	Description
Document Analysis	Extract text, structure, and metadata from user documents
Concept Extraction	Identify entities, attributes, and relationships
Schema Generation	Create structured schema definitions
User Refinement	Interactive review and modification
Publication	Register schema in central registry

AI & LLM Layer

The AI & LLM Layer provides intelligent processing capabilities through a provider-agnostic gateway with comprehensive output consistency and quality controls.

Core Capabilities

Capability	Description
LightLLM Gateway	Unified interface for multiple LLM providers
Output Consistency	Schema-validated extraction and ontology-based execution
Model Provenance	Complete tracking of model versions and configurations
A/B Testing	Controlled model updates with performance comparison
Quality Assurance	Feedback loops, accuracy monitoring, reasoning chain transparency
Provider Abstraction	Swap providers without code changes
Prompt Management	Template library with version control

LLM Output Consistency

Control	Description
Schema-Validated Extraction	All outputs validated against user-defined schemas
Ontology-Based Execution	Responses grounded in domain ontology
Reasoning Chain Transparency	Full reasoning paths logged for audit
Deterministic Components	Separation of deterministic vs. probabilistic processing

Provider Support

Provider Type	Examples	Integration
Commercial APIs	OpenAI, Anthropic, Google	API key authentication
Self-Hosted	Llama, Mistral, custom models	Private endpoint
Enterprise	Azure OpenAI, AWS Bedrock	Cloud provider auth

AML Compliance Module

The AML Compliance module provides comprehensive capabilities for customer due diligence and regulatory compliance.

Core Capabilities

Capability	Description
Rule Engine	Risk scoring and compliance decision rules
BPM Workflows	CDD, EDD, and investigation process orchestration
Screening Service	Sanctions, PEP, and adverse media checking
Case Management	Compliance case tracking and documentation
Reporting	Regulatory reporting and analytics

Risk Assessment

The Rule Engine calculates risk scores based on configurable factors:

Risk Category	Factors
Geographic	Country risk, jurisdiction, sanctions exposure
Industry	Sector risk, regulatory intensity
Product/Service	Complexity, cash intensity, cross-border
Customer Type	Individual, corporate, PEP status
Behavioral	Transaction patterns, anomalies

BPM Workflows

Workflow	Purpose	Trigger
Client Onboarding	Initial CDD for new clients	Client intake
Periodic Review	Regular risk reassessment	Time-based
Triggered Review	Event-driven reassessment	Risk event
Investigation	Detailed inquiry process	Alert escalation
SAR Escalation	Suspicious activity reporting	Investigation finding

Network Architecture

Network Segmentation

Zone	Purpose	Components
Edge/DMZ	External entry point	Load balancers, WAF, API Gateway
Application	Service execution	Studio, Services, AI Layer
Data	Persistent storage	Knowledge Graph, Document Store
Management	Operations	Monitoring, Logging, Admin tools

Connectivity Patterns

Pattern	Description	Use Case
Direct TLS	Encrypted connection over internet	Cloud-hosted sources
VPN Tunnel	Site-to-site encrypted tunnel	On-premises sources
Private Link	Cloud provider private connectivity	Same-cloud sources
Agent-Based	Customer-deployed agent connects outbound	Air-gapped environments

Data Flow Model

LegalFab operates on a metadata-first principle. Source data remains in place while metadata flows through the platform.

Data Type	Handling	Storage
Structural Metadata	Extracted and stored	Knowledge Graph
Statistical Profiles	Computed via aggregation	Knowledge Graph
Sample Data	Optional, user-controlled	Ephemeral cache
Query Results	Pass-through federation	Never persisted
Source Credentials	Encrypted storage	Secure Vault

Security Architecture

Defense-in-Depth Model

┌─────────────────────────────────────────────────────────────────────┐
│                    PERIMETER SECURITY                               │
│    DDoS Protection │ WAF │ Rate Limiting │ Geographic Filtering     │
├─────────────────────────────────────────────────────────────────────┤
│                    NETWORK SECURITY                                 │
│    VPC Isolation │ Network Segmentation │ Private Endpoints         │
├─────────────────────────────────────────────────────────────────────┤
│                    APPLICATION SECURITY                             │
│    Input Validation │ Output Encoding │ CSRF Protection │ CSP       │
├─────────────────────────────────────────────────────────────────────┤
│                    IDENTITY & ACCESS                                │
│    OAuth 2.0/OIDC │ RBAC │ ABAC │ MFA │ Session Management          │
├─────────────────────────────────────────────────────────────────────┤
│                    DATA SECURITY                                    │
│    Encryption │ Tokenization │ Data Masking │ Classification        │
├─────────────────────────────────────────────────────────────────────┤
│                    INFRASTRUCTURE SECURITY                          │
│    Hardened Images │ Patch Management │ Container Security          │
└─────────────────────────────────────────────────────────────────────┘

Cryptographic Standards

Purpose	Algorithm	Key Length
Data at Rest	AES-256-GCM	256-bit
Data in Transit	TLS 1.3	256-bit
Key Encryption	RSA-OAEP	2048-bit minimum
Digital Signatures	ECDSA	P-256
Hashing	SHA-256	N/A

Integration Points

External System Integration

System Type	Integration Method	Data Exchange
Practice Management	API / Database connector	Matters, clients, time entries
CRM Systems	API / Webhook	Contacts, organizations, activities
Document Management	API / File system	Documents, metadata
Billing Systems	API / Database	Invoices, payments, AR
Corporate Registries	API	Company filings, ownership
Screening Providers	API	Sanctions, PEP, adverse media

MCP Protocol

The Model Context Protocol (MCP) provides a standardized interface for tool integration:

Component	Purpose
Tool Registry	Catalog of available tools and capabilities
Schema Definition	Input/output contracts for each tool
Authentication	Tool-specific credential management
Execution	Secure tool invocation with timeout handling

Deployment Models

LegalFab is available in multiple deployment configurations to meet varying security, compliance, and operational requirements.

Deployment Options

Model	Description	Use Case
SaaS Multi-Tenant	Shared infrastructure, isolated data	Standard deployment, cost-effective
Dedicated Cloud Tenant	Dedicated infrastructure in LegalFab cloud	Enterprise, enhanced isolation
Customer Cloud	Deploy in customer’s cloud tenant (AWS, Azure, GCP)	Data sovereignty, infrastructure control
On-Premises	Fully customer-managed within data centers	Maximum control, air-gapped environments
Hybrid	SaaS control plane, on-premises data plane	Balance of convenience and control

SaaS Multi-Tenant

Aspect	Details
Infrastructure	Shared compute, isolated data stores
Data Isolation	Logical tenant isolation with encryption
Compliance	SOC 2, GDPR compliant infrastructure
Updates	Automatic platform updates
Best For	Standard requirements, rapid deployment

Dedicated Cloud Tenant

Aspect	Details
Infrastructure	Dedicated resources within LegalFab cloud
Data Isolation	Physical separation of compute and storage
Compliance	Enhanced compliance posture
Updates	Coordinated update windows
Best For	Enterprise customers requiring dedicated resources

Customer Cloud Deployment

Aspect	Details
Infrastructure	Customer’s own cloud tenant (AWS, Azure, GCP)
Data Control	Customer maintains full infrastructure control
Region Selection	Deploy in preferred cloud region
Management	LegalFab manages application, customer manages infrastructure
Best For	Data sovereignty, existing cloud investments

On-Premises Deployment

Aspect	Details
Infrastructure	Customer data centers
Data Control	Complete data custody
Network	Air-gapped capability available
Management	Customer-managed with LegalFab support
Best For	Strict regulatory requirements, maximum control

Data Residency Configuration

Region Option	Data Location	LLM Processing
UK-Only	UK data centers	UK-based providers or on-premises
EU-Only	EU data centers (Ireland common)	EU-based providers
US-Only	US data centers	US-based providers
Customer-Specified	Any supported region	Region-aligned providers
On-Premises	Customer data centers	Self-hosted models

LLM Provider Configuration by Deployment

Deployment	LLM Options
SaaS	Cloud providers (OpenAI, Anthropic, Google)
Dedicated Tenant	Cloud providers with dedicated keys
Customer Cloud	Cloud providers or self-hosted in customer cloud
On-Premises	Self-hosted models (Llama, Mistral) for complete data control
Hybrid	Cloud for non-sensitive, on-premises for sensitive

Deployment Feature Comparison

Feature	SaaS	Dedicated	Customer Cloud	On-Premises
Knowledge Fabric	✓	✓	✓	✓
Studio (Agents, Widgets, Datasets)	✓	✓	✓	✓
Dialog Interface	✓	✓	✓	✓
Multi-Tenancy	✓	✓	✓	✓
LLM Integration	✓	✓	✓	✓
Custom Region	Limited	✓	✓	N/A
Self-Hosted LLMs	✗	✗	✓	✓

For detailed information see individual component documents: Knowledge Fabric, Studio, Dialog, AI-LLM, AML Compliance, and API Security.