LegalFab Knowledge Fabric

Version: 1.7
Last Updated: February 2026


Component Overview

The Knowledge Fabric serves as the foundational data integration and intelligence layer for the LegalFab platform. It implements a metadata-driven architecture that provides unified access to distributed data assets while leaving source data in place. The Knowledge Fabric is fundamentally an access enabler, not a data repository—it maintains mappings and relationships rather than duplicating data, ensuring a single source of truth while providing unified investigative capabilities.

Core Capabilities:

Capability | Description
Knowledge Graph | Graph-native storage with entity resolution
Entity Resolution | Cross-source entity matching and linking
Connectivity | Direct DB/API connections, MCP connectors
Two-Way Data Flow | Read from and write back to source systems
Discovery Service | Automated identification and cataloging
Active Metadata | Continuous metadata analysis and enrichment
External Sources | OSINT and external data integration
Data Lineage | End-to-end tracking of data flow
MCP Creation | Model Context Protocol connector generation
Monitoring | Health, performance, and security monitoring

Architecture Overview

┌─────────────────────────────────────────────────────────────────────────┐
│                        PRESENTATION LAYER                               │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────────────┐   │
│  │   Search UI  │  │  API Gateway │  │  Platform Integration APIs   │   │
│  └──────────────┘  └──────────────┘  └──────────────────────────────┘   │
│                              │                                          │
│                   [Authentication, Rate Limiting, Input Validation]     │
├─────────────────────────────────────────────────────────────────────────┤
│                          SERVICE LAYER                                  │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌────────────┐ ┌────────────┐   │
│  │  Search  │ │ Lineage  │ │ Quality  │ │ Discovery  │ │ Governance │   │
│  │ Service  │ │ Service  │ │ Service  │ │  Service   │ │  Service   │   │
│  └──────────┘ └──────────┘ └──────────┘ └────────────┘ └────────────┘   │
│                              │                                          │
│                   [Service-to-Service AuthN/AuthZ, mTLS]                │
├─────────────────────────────────────────────────────────────────────────┤
│                      KNOWLEDGE GRAPH LAYER                              │
│  ┌──────────────┐  ┌──────────────────┐  ┌──────────────────────┐       │
│  │ Entity Store │  │ Relationship     │  │    Query Engine      │       │
│  │              │  │ Store            │  │                      │       │
│  └──────────────┘  └──────────────────┘  └──────────────────────┘       │
│                              │                                          │
│                   [Encryption at Rest, Access Control Lists]            │
├─────────────────────────────────────────────────────────────────────────┤
│                     CONNECTIVITY LAYER                                  │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌────────────┐   │
│  │  Database    │  │    API       │  │     MCP      │  │   Event    │   │
│  │  Connectors  │  │  Connectors  │  │  Connectors  │  │  Streams   │   │
│  └──────────────┘  └──────────────┘  └──────────────┘  └────────────┘   │
│                              │                                          │
│               [Credential Vault, Secure Connections, Data Sampling]     │
├─────────────────────────────────────────────────────────────────────────┤
│                       SOURCE SYSTEMS                                    │
│  ┌──────────┐ ┌───────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐  │
│  │Databases │ │  Document │ │  APIs    │ │  Legal   │ │   External   │  │
│  │          │ │  Stores   │ │          │ │  Systems │ │   Sources    │  │
│  └──────────┘ └───────────┘ └──────────┘ └──────────┘ └──────────────┘  │
│                              │                                          │
│                [Customer-Managed, Customer Credentials]                 │
└─────────────────────────────────────────────────────────────────────────┘

Knowledge Graph

The Knowledge Graph serves as the foundation for entity management, relationship tracking, and data integration across the LegalFab platform.

Graph Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                    KNOWLEDGE GRAPH ARCHITECTURE                     │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │                    ENTITY RESOLUTION ENGINE                 │    │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │    │
│  │  │  Matching   │  │  Merging    │  │  Linking    │          │    │
│  │  │  Service    │  │  Service    │  │  Service    │          │    │
│  │  └─────────────┘  └─────────────┘  └─────────────┘          │    │
│  └─────────────────────────────────────────────────────────────┘    │
│                              │                                      │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │                    GRAPH STORAGE LAYER                      │    │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │    │
│  │  │  Entities   │  │Relationships│  │  Properties │          │    │
│  │  │  (Nodes)    │  │   (Edges)   │  │   (Attrs)   │          │    │
│  │  └─────────────┘  └─────────────┘  └─────────────┘          │    │
│  └─────────────────────────────────────────────────────────────┘    │
│                              │                                      │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │                    QUERY & TRAVERSAL                        │    │
│  │  • Graph Queries   • Path Finding   • Pattern Matching      │    │
│  │  • Aggregations    • Subgraph Extraction   • Analytics      │    │
│  └─────────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────┘

Entity Types

Entity Type | Description | Key Attributes
Person | Individual entities | Name variants, identifiers, demographics
Organization | Corporate entities | Legal name, registration, jurisdiction
Matter | Legal engagements | Matter ID, type, status, dates
Document | Legal documents | Type, classification, retention
Address | Physical/mailing addresses | Components, geolocation, validation
Identifier | External IDs | Type, value, issuer, validity

Relationship Types

Relationship | From | To | Properties
OWNS | Person/Organization | Organization | Percentage, start/end dates
CONTROLS | Person/Organization | Organization | Control type, effective date
RELATED_TO | Person | Person | Relationship type
EMPLOYS | Organization | Person | Role, department, dates
REPRESENTS | Organization | Person/Organization | Matter reference
LOCATED_AT | Person/Organization | Address | Address type, validity
HAS_IDENTIFIER | Person/Organization | Identifier | Primary flag
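
The endpoint constraints in the relationship table can be checked mechanically before an edge is written to the graph. The following is a minimal sketch, not the platform's actual data model: the `Entity` and `Relationship` classes and the `is_valid_relationship` helper are hypothetical names introduced for illustration, and only the type rules themselves are taken from the table.

```python
from dataclasses import dataclass, field

# Allowed (from, to) entity types per relationship, transcribed from the table.
RELATIONSHIP_RULES = {
    "OWNS": ({"Person", "Organization"}, {"Organization"}),
    "CONTROLS": ({"Person", "Organization"}, {"Organization"}),
    "RELATED_TO": ({"Person"}, {"Person"}),
    "EMPLOYS": ({"Organization"}, {"Person"}),
    "REPRESENTS": ({"Organization"}, {"Person", "Organization"}),
    "LOCATED_AT": ({"Person", "Organization"}, {"Address"}),
    "HAS_IDENTIFIER": ({"Person", "Organization"}, {"Identifier"}),
}


@dataclass(frozen=True)
class Entity:
    # Hypothetical minimal node: an ID plus one of the documented entity types.
    entity_id: str
    entity_type: str


@dataclass
class Relationship:
    # Hypothetical edge: type, endpoints, and free-form properties.
    rel_type: str
    source: Entity
    target: Entity
    properties: dict = field(default_factory=dict)


def is_valid_relationship(rel: Relationship) -> bool:
    """Return True if the edge type and both endpoint types match the table."""
    rule = RELATIONSHIP_RULES.get(rel.rel_type)
    if rule is None:
        return False  # unknown relationship type
    from_types, to_types = rule
    return (rel.source.entity_type in from_types
            and rel.target.entity_type in to_types)
```

For example, `OWNS` from a Person to an Organization passes, while `EMPLOYS` from a Person to an Organization is rejected because the table defines that edge in the opposite direction.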

Entity Resolution

The Entity Resolution Engine identifies and links records that refer to the same real-world entity across multiple data sources.

Entity Resolution Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                    ENTITY RESOLUTION PIPELINE                       │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Source Records ──▶ [Blocking] ──▶ [Matching] ──▶ [Clustering]      │
│        │                │              │               │            │
│        ▼                ▼              ▼               ▼            │
│  (Raw ingestion)   (Candidate     (Similarity      (Entity          │
│                     pairs)         scoring)        assignment)      │
│                                        │               │            │
│                                        ▼               ▼            │
│                              [Human Review] ──▶ [Golden Record]     │
│                                (Uncertain        (Merged entity)    │
│                                 matches)                │           │
│                                        │                │           │
│                                        ▼                ▼           │
│                              [Audit Trail] ◀────────────┘           │
└─────────────────────────────────────────────────────────────────────┘

Resolution Components

Component | Function | Security Controls
Blocking | Reduces comparison space | Configurable rules, no false negatives
Matching | Calculates similarity scores | Deterministic + probabilistic methods
Clustering | Groups related records | Configurable thresholds
Golden Record | Creates authoritative entity | Merge rules, conflict resolution
Human Review | Handles uncertain matches | Role-based assignment, audit trail
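
The blocking, matching, clustering, and human-review stages can be illustrated with a small sketch. Everything here is an assumption made for illustration: the blocking rule (first letter of the name), the similarity scorer (`difflib.SequenceMatcher`), the two thresholds, and the `resolve` function name are not taken from the LegalFab implementation. Golden-record merging and the audit trail are omitted.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocking_key(record: dict) -> str:
    # Hypothetical blocking rule: first letter of the normalized name.
    return record["name"].strip().lower()[:1]


def similarity(a: dict, b: dict) -> float:
    # Stand-in scorer: character-level ratio between lowercased names.
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def resolve(records, match_threshold=0.85, review_threshold=0.70):
    """Return (clusters, review_queue) for a list of {"id", "name"} records."""
    # Blocking: only records that share a key become candidate pairs.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[blocking_key(rec)].append(idx)

    # Union-find structure used to cluster confidently matched pairs.
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    review_queue = []
    for members in blocks.values():
        for i, j in combinations(members, 2):
            score = similarity(records[i], records[j])
            if score >= match_threshold:
                parent[find(i)] = find(j)  # confident match: same cluster
            elif score >= review_threshold:
                # Uncertain match: leave unmerged and queue for human review.
                review_queue.append((records[i]["id"], records[j]["id"]))

    clusters = defaultdict(list)
    for idx, rec in enumerate(records):
        clusters[find(idx)].append(rec["id"])
    return sorted(sorted(c) for c in clusters.values()), review_queue
```

A single-letter blocking key is deliberately crude and would miss pairs such as "Catherine" and "Katherine"; a production blocking rule would need to be chosen so that true matches always share a block.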

Matching Methods

Method | Use Case | Accuracy
Exact Match | Identifiers, codes | 100% (when available)
Fuzzy Name Match | Person/organization names | Configurable threshold
Phonetic Match | Name variations, misspellings | Soundex, Metaphone
Address Standardization | Location matching | USPS/Royal Mail standards
ML-Based Scoring | Complex entity types | Model-dependent
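
To make the exact, fuzzy, and phonetic methods concrete, the sketch below implements American Soundex and combines it with an identifier comparison and a character-level similarity ratio. The record shape (`name`, `identifier`) and the `match_signals` helper are hypothetical, and `difflib.SequenceMatcher` is used only as a stand-in for whatever fuzzy scorer the platform applies.

```python
from difflib import SequenceMatcher

# Soundex digit for each consonant; vowels, y, h and w have no digit.
_SOUNDEX_CODES = {
    letter: digit
    for digit, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt",
        "4": "l", "5": "mn", "6": "r",
    }.items()
    for letter in letters
}


def soundex(name: str) -> str:
    """American Soundex code (one letter plus three digits) for a single name."""
    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters with the same code
        code = _SOUNDEX_CODES.get(c, "")
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so a repeated code after it counts again
    return (encoded + "000")[:4]


def match_signals(a: dict, b: dict) -> dict:
    """Compare two records shaped like {"name": str, "identifier": str or None}."""
    same_id = bool(a.get("identifier")) and a.get("identifier") == b.get("identifier")
    return {
        "exact": same_id,
        "phonetic": soundex(a["name"]) == soundex(b["name"]),
        "fuzzy": SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio(),
    }
```

In this sketch "Smith" and "Smyth" share the Soundex code `S530`, so the phonetic signal is true even though the spellings differ.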

External Source Integration

The Knowledge Fabric connects to external data sources for entity enrichment and verification.

External Source Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                    EXTERNAL SOURCE INTEGRATION                      │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │                    SOURCE REGISTRY                          │    │
│  │  • Source Catalog   • Credential Store   • Rate Limits      │    │
│  └─────────────────────────────────────────────────────────────┘    │
│                              │                                      │
│         ┌────────────────────┼────────────────────┐                 │
│         ▼                    ▼                    ▼                 │
│  ┌─────────────┐      ┌─────────────┐      ┌─────────────┐          │
│  │  Corporate  │      │  Identity   │      │   OSINT     │          │
│  │  Registries │      │  Providers  │      │  Sources    │          │
│  └─────────────┘      └─────────────┘      └─────────────┘          │
│         │                    │                    │                 │
│         └────────────────────┼────────────────────┘                 │
│                              ▼                                      │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │                    ENRICHMENT ENGINE                        │    │
│  │  • Entity Matching   • Data Fusion   • Provenance Tracking  │    │
│  └─────────────────────────────────────────────────────────────┘    │
│                              │                                      │
│                              ▼                                      │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │                    KNOWLEDGE GRAPH                          │    │
│  └─────────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────┘

External Source Categories

Category | Examples | Data Types
Corporate Registries | Companies House, SEC EDGAR | Incorporation, officers, filings
Identity Verification | ID document validation | Identity confirmation
Beneficial Ownership | PSC registers, UBO databases | Ownership chains
Sanctions Lists | OFAC, OFSI, UN, EU | Designated persons/entities
PEP Databases | Politically exposed persons | Political associations
Adverse Media | News aggregators, media monitors | Negative news, allegations
Court Records | Legal databases | Litigation history
Credit Bureaus | Business credit agencies | Financial standing

OSINT Integration Security

Control | Implementation
Source Validation | Only approved sources in registry
Credential Isolation | Per-source credential management
Rate Limiting | Respect source API limits
Data Minimization | Retrieve only required fields
Caching Policy | Time-limited caching per source
Provenance Tracking | Full lineage from source to graph
Schema Binding | Extracted data aligned to user-controlled schemas

Schema-Controlled Extraction

OSINT extractors use schemas defined by the user to structure incoming data. This ensures external data conforms to the organization’s domain model.

Capability | Description
Schema Assignment | Each extractor bound to target schema
Field Mapping | External fields mapped to schema attributes
Type Conversion | External data converted to schema types
Validation | Extracted data validated against schema constraints
Default Values | Missing fields populated with schema defaults
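
The five capabilities above can be sketched as a single mapping step. This is an illustration under stated assumptions: `FieldSpec`, `ORGANIZATION_SCHEMA`, the external field names (`company_name`, `company_number`, `officers_total`), and the `extract` function are all hypothetical and do not describe the format defined in 09-Schema-Management.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class FieldSpec:
    source_field: str              # field name in the external payload
    convert: Callable[[Any], Any]  # type conversion into the schema type
    required: bool = False
    default: Any = None            # used when an optional field is missing


# Hypothetical target schema for an Organization entity.
ORGANIZATION_SCHEMA = {
    "legal_name": FieldSpec("company_name", str, required=True),
    "registration_number": FieldSpec("company_number", str, required=True),
    "jurisdiction": FieldSpec("jurisdiction", str, default="unknown"),
    "officer_count": FieldSpec("officers_total", int, default=0),
}


def extract(payload: dict, schema: dict) -> dict:
    """Map an external payload onto the schema; raise ValueError if invalid."""
    record, errors = {}, []
    for attribute, spec in schema.items():
        raw = payload.get(spec.source_field)
        if raw is None:
            if spec.required:
                errors.append(f"missing required field: {spec.source_field}")
            else:
                record[attribute] = spec.default
            continue
        try:
            record[attribute] = spec.convert(raw)
        except (TypeError, ValueError):
            errors.append(f"cannot convert {spec.source_field}={raw!r}")
    if errors:
        raise ValueError("; ".join(errors))
    return record  # payload fields not named in the schema are dropped
```

Because only fields named in the schema are copied, unexpected attributes in the external response never reach the graph, which also supports the Data Minimization control.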

For detailed schema management, see 09-Schema-Management.

Enrichment Workflow

Stage | Description | Security Control
Request | Entity submitted for enrichment | Authorization check
Matching | Entity matched against external source | Matching rules applied
Retrieval | Data fetched from external source | Encrypted transport
Fusion | External data merged with existing | Conflict resolution rules
Validation | Enriched data validated | Schema validation
Storage | Enriched entity persisted | Access control inherited
Audit | Enrichment event logged | Full audit trail

External Source Monitoring

Metric | Description | Alert Threshold
Source Availability | External source uptime | < 99% over 24h
Response Latency | External source response time | > 5 seconds
Match Rate | Successful entity matches | < 70% (source-dependent)
Error Rate | Failed enrichment requests | > 5%
Credential Expiry | Days until credential expires | < 30 days
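
As a rough illustration, the alert thresholds above could be evaluated against a metrics snapshot as follows. The metric key names and the `breached_alerts` helper are hypothetical; the threshold values and comparison directions are the ones listed in the table.

```python
import operator

# (comparison, threshold) per metric; an alert fires when comparison(value, threshold) is true.
ALERT_RULES = {
    "source_availability_pct": (operator.lt, 99.0),  # measured over 24h
    "response_latency_s": (operator.gt, 5.0),
    "match_rate_pct": (operator.lt, 70.0),           # source-dependent in practice
    "error_rate_pct": (operator.gt, 5.0),
    "credential_expiry_days": (operator.lt, 30),
}


def breached_alerts(metrics: dict) -> list:
    """Return the sorted names of metrics whose values cross their threshold."""
    breached = []
    for name, value in metrics.items():
        rule = ALERT_RULES.get(name)
        if rule is None:
            continue  # metric without a configured threshold
        compare, threshold = rule
        if compare(value, threshold):
            breached.append(name)
    return sorted(breached)
```

A snapshot with 99.5% availability, 6.2 s latency, and a credential expiring in 12 days would raise the latency and credential alerts but not the availability alert.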

Customer System Connections

The Knowledge Fabric maintains live connections to customer source systems, keeping the graph synchronized with operational data.

Connection Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                    CUSTOMER SYSTEM CONNECTIONS                      │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│                    ┌─────────────────────┐                          │
│                    │   KNOWLEDGE GRAPH   │                          │
│                    └─────────┬───────────┘                          │
│                              │                                      │
│              ┌───────────────┼───────────────┐                      │
│              ▼               ▼               ▼                      │
│  ┌───────────────────┐ ┌──────────────┐ ┌───────────────────┐       │
│  │  SYNC MANAGER     │ │ CHANGE DATA  │ │  MCP GATEWAY      │       │
│  │  (Batch/Schedule) │ │ CAPTURE      │ │  (Real-time)      │       │
│  └─────────┬─────────┘ └──────┬───────┘ └─────────┬─────────┘       │
│            │                  │                   │                 │
│            └──────────────────┼───────────────────┘                 │
│                               │                                     │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │                    CONNECTOR LAYER                          │    │
│  │  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌───────┐  │    │
│  │  │  CRM    │ │ Practice│ │ Document│ │ Finance │ │ Custom│  │    │
│  │  │ Systems │ │ Mgmt    │ │  Mgmt   │ │ Systems │ │  APIs │  │    │
│  │  └─────────┘ └─────────┘ └─────────┘ └─────────┘ └───────┘  │    │
│  └─────────────────────────────────────────────────────────────┘    │
│                               │                                     │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │                    CUSTOMER SYSTEMS                         │    │
│  │  Matter Mgmt │ CRM │ Billing │ Document Mgmt │ HR Systems   │    │
│  └─────────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────┘

Customer System Types

System Type | Integration Method | Data Extracted
Practice Management | API, database | Matters, clients, contacts
CRM Systems | API | Relationships, interactions
Document Management | API | Document metadata, classifications
Billing Systems | API, database | Financial relationships
Email Archives | API | Communication metadata

Connection Security Controls

Control | Implementation
Credential Vault | All connection credentials encrypted at rest
Connection Encryption | TLS 1.2+ required for all connections
IP Allowlisting | Customer can restrict to LegalFab IPs
Read-Only Access | Metadata extraction uses read-only credentials
Query Auditing | All queries to source systems logged
Data Sampling | Only statistical samples, no bulk data

Data Synchronization Security

Control | Description
Conflict Resolution | Configurable rules for conflicting updates
Tombstone Handling | Deleted records tracked, not purged
Version Tracking | All entity versions preserved
Sync Validation | Checksums verify data integrity
Rollback Support | Sync batches can be reversed

Data Flow

The Knowledge Fabric enables access to source systems while maintaining data governance and audit requirements.

Data Flow Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                    DATA FLOW                                        │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│                    ┌─────────────────────┐                          │
│                    │   KNOWLEDGE FABRIC  │                          │
│                    │                     │                          │
│                    └─────────┬───────────┘                          │
│                              │                                      │
│                    ┌─────────┴───────────┐                          │
│                    │   MCP CONNECTORS    │                          │
│                    │                     │                          │
│                    └─────────┬───────────┘                          │
│                              │                                      │
│         ┌────────────────────┼────────────────────┐                 │
│         │                    │                    │                 │
│         ▼                    ▼                    ▼                 │
│  ┌─────────────┐      ┌─────────────┐      ┌─────────────┐          │
│  │  ─────────  │      │  ─────────  │      │  ─────────  │          │
│  │   CRM       │      │  Document   │      │   Data      │          │
│  │             │      │  Mgmt       │      │   Lake      │          │
│  │  ─────────  │      │             │      │             │          │
│  └─────────────┘      └─────────────┘      └─────────────┘          │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Read Operations (Data Access)

Operation | Description
Cross-Source Query | Query across multiple sources simultaneously
Entity Retrieval | Retrieve entity information from authoritative systems
Document Access | Access documents and case files where they’re stored
Relationship Queries | Pull relationship data from existing graph databases
Real-Time Access | Direct data access without ETL or replication

Multi-System Coordination

When an entity exists in multiple systems:

Control | Description
Primary Source Definition | Define authoritative source for each entity type
Multi-Write Option | Optionally write to multiple systems
Master Record Designation | Designate which system holds master record
Conflict Resolution | Handle when same entity has different values across sources
Cascade Control | Control whether updates cascade to related records
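
One way to read the Primary Source Definition and Conflict Resolution controls together is sketched below. It assumes a simple policy that the document does not specify: prefer the value from the authoritative source for the entity type, and otherwise fall back to the most recently updated value. `PRIMARY_SOURCE`, the source names, and `resolve_value` are hypothetical.

```python
from datetime import datetime

# Hypothetical primary-source definitions per entity type.
PRIMARY_SOURCE = {"Person": "crm", "Matter": "practice_mgmt"}


def resolve_value(entity_type: str, candidates: dict):
    """Pick one value from candidates: source name -> (value, last_updated).

    Returns (winning_source, value).
    """
    if not candidates:
        raise ValueError("no candidate values to resolve")
    primary = PRIMARY_SOURCE.get(entity_type)
    if primary in candidates:
        return primary, candidates[primary][0]
    # No value from the authoritative source: fall back to the newest update.
    newest = max(candidates, key=lambda source: candidates[source][1])
    return newest, candidates[newest][0]
```

Returning the winning source alongside the value keeps the decision traceable, which fits the audit requirements stated for data flow.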

What Knowledge Fabric Stores vs. What Remains in Source Systems

Stored in Knowledge Fabric:

Data Type | Purpose
Entity Resolution Mappings | Person A in CRM = Person A in case management
Cross-System Relationships | Person connected to organization across databases
Investigation Annotations | Tags, notes, and classifications
Derived Insights | Agent conclusions and analysis results
Temporal Snapshots | Point-in-time views for investigation history

Remains in Source Systems:

Data Type | Location
Actual Records | Names, addresses, phone numbers in CRM/source
Document Files | Case files and evidence in document management
Transaction Records | Financial data in source systems
Communication Logs | Emails, messages in their native systems
Operational Data | All authoritative business data

This architecture ensures you get a unified investigative view without moving or duplicating operational data. Updates to source systems are reflected immediately in the Knowledge Fabric view.

Customer System Health Monitoring

Health Check | Description | Frequency
Connectivity | Source system reachable | Every 5 minutes
Authentication | Credentials valid | Hourly
Data Freshness | Last successful sync | Continuous
Schema Drift | Source schema changes | Daily
Performance | Sync latency and throughput | Per sync

Connectivity

The Connectivity layer provides secure access to heterogeneous data sources through multiple connection patterns.

Connection Types

Type | Description | Security Controls
Direct Database | JDBC/ODBC connections to relational databases | Encrypted connections, credential vault
API Connections | REST/GraphQL/SOAP integrations | OAuth/API key auth, TLS transport
MCP Connectors | Model Context Protocol for AI integration | Schema validation, permission scoping
Event Streams | Kafka, message queue integrations | mTLS, message encryption
File Systems | Cloud storage, network shares | Access tokens, encryption

Database Connectivity Security

Control | Implementation
Connection Encryption | TLS 1.2+ required for all database connections
Connection Pooling | Managed pools with configurable limits
Query Timeout | Maximum query execution time enforced
Read-Only Mode | Optional read-only connections for metadata extraction
IP Allowlisting | Source system can restrict to LegalFab IPs

API Connectivity Security

Authentication Method | Use Case | Security Properties
OAuth 2.0 | Modern APIs, cloud services | Token-based, scoped, refreshable
API Keys | Simple integrations | Rotatable, rate-limited
mTLS | High-security endpoints | Certificate-based mutual auth
Basic Auth (over TLS) | Legacy systems | Encrypted transport required

MCP Connector Security

The Knowledge Fabric can both consume and generate MCP (Model Context Protocol) connectors. With 200+ data source connectors available, the platform enables federated queries across diverse systems.

MCP Connector Catalog:

Category | Examples
Databases | PostgreSQL, MySQL, MongoDB, Neo4j, Snowflake, BigQuery, ClickHouse, Oracle, SQL Server
SaaS Applications | Salesforce, HubSpot, Slack, Gmail, Google Drive, Jira, GitHub, Airtable, Notion
Cloud Storage | AWS S3, Azure Blob, Google Cloud Storage, Dropbox, OneDrive
Document Management | SharePoint, iManage, NetDocuments, Box
Legal Systems | Aderant, Elite, Clio, PracticePanther
AI/ML | OpenAI, Anthropic Claude, ChromaDB, LanceDB

MCP Connector Configuration:

Configuration | Description
OAuth2 Flow | Secure authentication for SaaS connectors
API Key/Connection String | Database and API authentication
Sync Frequency | Hourly, daily, weekly, or on-demand sync
Enable/Disable | Toggle connectors without deletion
Custom Filters | Include/exclude rules for data indexing

MCP Schema Integration:

MCP connectors utilize user-defined schemas to ensure extracted data aligns with the organization’s domain model.

Capability | Description
Schema Provision | Schema provided to MCP tool for data extraction
Response Mapping | Tool response mapped to schema structure
Contract Enforcement | Tool outputs validated against expected schema
Type Safety | Strong typing prevents data inconsistencies

For detailed schema management, see 09-Schema-Management.

Source System Connectivity Patterns

Pattern | Description | Use Case
Direct TLS | Encrypted connection over internet | Cloud-hosted sources
VPN Tunnel | Site-to-site encrypted tunnel | On-premises sources
Private Link | Cloud provider private connectivity | Same-cloud sources
Agent-Based | Customer-deployed agent connects outbound | Air-gapped environments

MCP Integration Lifecycle Management

LegalFab maintains MCP connectors through proactive monitoring, continuous health tracking, and defined remediation processes to ensure service continuity as third-party platforms evolve.

Proactive API Change Tracking:

Activity | Frequency | Description
Release Monitoring | Continuous | Track vendor release notes, changelogs, deprecation announcements
Breaking Change Alerts | As announced | Flag planned breaking changes for connected platforms
Compatibility Review | Per vendor release cycle | Scheduled testing against new API versions
Pre-Release Testing | Where available | Validation against vendor sandbox/beta environments

Continuous Integration Monitoring:

Capability | Description
Real-Time Health Checks | Automated status monitoring for all active MCP connections
Schema Drift Detection | Identify API response changes and data structure modifications
Authentication Monitoring | Track token expiry, credential validity, OAuth refresh status
Back-Office Dashboard | Centralized connector status with per-integration health indicators
Configurable Alerting | Thresholds adjustable per integration criticality

Error Reporting & Analytics:

Metric | Description
Error Rate by MCP Type | Aggregated error rates per connector category
Error Rate by Time Period | Trend analysis (hourly, daily, weekly, monthly)
Error Classification | Root cause categorization (vendor change, auth expiry, rate limit, network, schema drift)
Degradation Detection | Trend analysis to identify issues before complete failure
Customer Reports | Integration health reports available on request

Remediation Process:

Responsibility | Owner | Description
Issue Detection | LegalFab | Automated monitoring identifies connector issues
Root Cause Analysis | LegalFab | Determine if vendor change, configuration, or platform issue
Connector Updates | LegalFab | Code updates, compatibility fixes, regression testing
Customer Notification | LegalFab | Proactive communication of detected issues and remediation timeline

Remediation SLAs:

Severity | Definition | Response Time
Critical | Complete connector failure, data flow stopped | 24 hours
High | Significant degradation, partial functionality | 72 hours
Medium | Minor issues, workaround available | 7 days
Low | Cosmetic or optimization improvements | Next release cycle

Service Continuity Assurance:

Capability | Description
Graceful Degradation | Clear user messaging when connector unavailable
Retry Logic | Exponential backoff for transient failures
Fallback Caching | Cached data for read operations during outages
Status Transparency | Real-time connector status visible to administrators
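
Retry Logic and Fallback Caching can be combined as in the sketch below. The delay parameters, the `TransientError` class, and both function names are assumptions chosen for illustration; the document specifies only that backoff is exponential and that cached data serves reads during outages.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, HTTP 503)."""


def call_with_backoff(fetch, *, retries=4, base_delay=0.5, max_delay=8.0,
                      jitter=0.1, sleep=time.sleep):
    """Call fetch(), retrying TransientError with exponentially growing delays."""
    for attempt in range(retries + 1):
        try:
            return fetch()
        except TransientError:
            if attempt == retries:
                raise  # retries exhausted: surface the failure to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # A small random jitter avoids synchronized retries across clients.
            sleep(delay + random.uniform(0, delay * jitter))


def read_with_fallback(key, fetch, cache, **backoff_options):
    """Return (value, is_stale); serve the cached value if the source stays down."""
    try:
        value = call_with_backoff(fetch, **backoff_options)
    except TransientError:
        if key in cache:
            return cache[key], True  # stale read during an outage
        raise
    cache[key] = value
    return value, False
```

The `is_stale` flag gives the caller what it needs to show the "clear user messaging" described under Graceful Degradation when a response comes from cache.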

Discovery Service

The Discovery Service automatically identifies and catalogs data assets across connected sources.

Discovery Components

Component | Function | Security Controls
Crawler Engine | Traverses source structures | Depth limits, exclusion patterns
Schema Extractor | Extracts table/column metadata | Read-only access, no data retrieval
Profiler | Computes statistical profiles | Sampling limits, aggregation only
Classifier | Identifies sensitive data patterns | ML-based PII/sensitive detection

Discovery Security Controls

Control | Implementation
Scan Scheduling | Configurable scan windows to minimize source impact
Rate Limiting | Throttled requests to prevent source overload
Sampling Limits | Maximum sample size for profiling (configurable)
Exclusion Rules | Ability to exclude specific schemas/tables
Metadata Only | No bulk data extraction; metadata and statistics only

Discovery Scope Management

Scope Setting | Description | Default
Schema Filter | Include/exclude specific schemas | All accessible
Table Filter | Include/exclude specific tables | All accessible
Column Sampling | Enable/disable column profiling | Enabled
Sample Size | Maximum rows for statistical profiling | 10,000
Scan Depth | Maximum relationship traversal depth | 3 levels
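
The scope settings and their defaults map naturally onto a small configuration object. The sketch below is hypothetical: the `DiscoveryScope` class, its field names, and the use of glob patterns for filters are assumptions, while the default values come from the table.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class DiscoveryScope:
    # Defaults mirror the table: all accessible schemas/tables, profiling on.
    schema_include: list = field(default_factory=lambda: ["*"])
    schema_exclude: list = field(default_factory=list)
    table_include: list = field(default_factory=lambda: ["*"])
    table_exclude: list = field(default_factory=list)
    column_sampling: bool = True
    sample_size: int = 10_000  # maximum rows for statistical profiling
    scan_depth: int = 3        # maximum relationship traversal depth

    def in_scope(self, schema: str, table: str) -> bool:
        """Return True if the table passes both the schema and table filters."""
        def allowed(name, include, exclude):
            return (any(fnmatchcase(name, p) for p in include)
                    and not any(fnmatchcase(name, p) for p in exclude))

        return (allowed(schema, self.schema_include, self.schema_exclude)
                and allowed(table, self.table_include, self.table_exclude))
```

With `DiscoveryScope(schema_exclude=["hr_*"], table_exclude=["*_tmp"])`, the table `billing.invoices` is scanned while `hr_payroll.salaries` and `billing.invoices_tmp` are skipped.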

Discovery Audit Trail

Event | Logged Data | Retention
Scan Initiated | Source, initiator, scope configuration | 1 year
Asset Discovered | Asset type, location, classification | 1 year
Schema Change Detected | Old/new schema, change type | 2 years
Scan Completed | Duration, assets discovered, errors | 1 year
Scan Failed | Error details, partial results | 1 year

Active Metadata

The Active Metadata system provides continuous metadata analysis, enrichment, and intelligence.

Active Metadata Capabilities

Capability | Description
Change Detection | Monitors schema and data changes
Auto-Classification | ML-based sensitive data detection
Relationship Discovery | Identifies entity relationships
Quality Monitoring | Continuous data quality assessment
Usage Analytics | Tracks metadata access patterns

Automated Classification

Classification Categories:

Category | Patterns Detected | Handling
PII | Names, addresses, SSN, phone, email | Restricted access, masking
Legal Privileged | Matter IDs, attorney-client markers | Highly restricted
Financial | Account numbers, billing data | Confidential, encryption
Health | Medical records, diagnoses | Restricted, HIPAA controls
Custom | Organization-defined patterns | Configurable handling

Classification Security:

Control | Implementation
ML Model Security | Models trained on synthetic data only
Classification Audit | All classification decisions logged
Override Controls | Manual classification with approval workflow
Propagation | Classifications propagate through lineage

Lineage Tracking

Lineage Type | Description | Security Use
Technical Lineage | Data flow through systems | Impact analysis
Business Lineage | Business process relationships | Compliance mapping
Column Lineage | Field-level transformations | Sensitive data tracking
Operational Lineage | Runtime execution paths | Audit trail

Lineage Security Controls:

Control | Description
Access Inheritance | Downstream inherits upstream restrictions
Impact Analysis | Identify affected assets on changes
Compliance Evidence | Lineage serves as audit evidence
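
Access Inheritance can be pictured as pushing restriction levels down the lineage graph until nothing changes. The sketch below assumes a simple linear ordering of levels and an edge list of (upstream, downstream) pairs; the level names, asset names, and the `propagate_restrictions` function are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical ordering of restriction levels, least to most restrictive.
LEVELS = ["public", "confidential", "restricted", "highly_restricted"]


def propagate_restrictions(edges, direct_labels):
    """Compute effective levels so each asset is at least as restricted as its upstream.

    edges: iterable of (upstream, downstream) asset pairs.
    direct_labels: asset -> level assigned directly by classification.
    """
    downstream = defaultdict(list)
    assets = set(direct_labels)
    for up, down in edges:
        downstream[up].append(down)
        assets.update((up, down))

    # Unlabeled assets start at the least restrictive level.
    effective = {asset: "public" for asset in assets}
    effective.update(direct_labels)

    queue = deque(direct_labels)
    while queue:
        asset = queue.popleft()
        for child in downstream[asset]:
            if LEVELS.index(effective[asset]) > LEVELS.index(effective[child]):
                effective[child] = effective[asset]  # inherit the stricter level
                queue.append(child)                  # and keep propagating
    return effective
```

Levels only ever increase during propagation, so the loop terminates even if the lineage graph contains cycles. The same traversal also yields the affected set needed for Impact Analysis.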

Persistent Knowledge Graph

The Persistent Knowledge Graph (PKG) serves as “corporate memory,” storing all information extracted from connected sources according to defined schemas. Unlike transient query results, the PKG maintains a durable, indexed knowledge base that accumulates insights over time.

Knowledge Tree Structure

The PKG organizes knowledge in a hierarchical structure that enables both detailed and global context retrieval:

Level | Content | Purpose
Level 1 | Attributes | Entity property information (names, dates, values)
Level 2 | Relations | Entity-entity relationship triples
Level 3 | Keywords | Semantic keyword indexing for search
Level 4 | Communities | Hierarchical clustering for global context

Schema-Bounded Extraction

Entity extraction is constrained through seed schemas that define targeted extraction:

Schema Component | Description | Example
Entity Types (E_types) | Allowed entity categories | Person, Organization, Contract
Relation Types (R_types) | Allowed relationship types | OWNS, REPRESENTS, SIGNED
Attribute Types (A_types) | Allowed entity attributes | Name, Date, Amount, Jurisdiction

Benefits of Schema-Bounded Extraction:

Benefit | Description
Hallucination Prevention | System cannot invent entities outside schema
Focused Extraction | Only relevant information captured
Consistency | Uniform entity structure across sources
Validation | All extractions validated against schema
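
The effect of a seed schema can be shown as a filter over candidate extractions. This is a sketch under assumptions: the candidate structure, the `SeedSchema` class, and the `bound_extraction` function are hypothetical, and the schema values used in the example are the ones from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeedSchema:
    e_types: frozenset  # allowed entity categories
    r_types: frozenset  # allowed relationship types
    a_types: frozenset  # allowed entity attributes


def bound_extraction(candidates, schema):
    """Split candidate extractions into (accepted, rejected) against a seed schema.

    Each candidate is a dict with "subject" and "object" entries shaped like
    {"name": ..., "type": ..., "attributes": {...}} plus a "relation" string.
    """
    accepted, rejected = [], []
    for cand in candidates:
        entities = (cand["subject"], cand["object"])
        ok = (
            cand["relation"] in schema.r_types
            and all(e["type"] in schema.e_types for e in entities)
            and all(attr in schema.a_types
                    for e in entities for attr in e.get("attributes", {}))
        )
        (accepted if ok else rejected).append(cand)
    return accepted, rejected
```

A candidate such as (Person)-[SIGNED]->(Contract) is accepted under the example schema, while (Person)-[LIKES]->(Product) is rejected because neither the relation nor the object type is allowed.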

Source Provenance Model

Each entity and relationship maintains complete provenance linking back to source documents:

Provenance Element Description
Connector ID Source system connector reference
Document Reference Document ID, title, and URL
Precise Location Page, paragraph, character offset, or cell range
Extracted Text Exact text from which entity was derived
Confidence Score Extraction confidence level (0-1)
Extraction Timestamp When the extraction occurred
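
One way to represent these provenance elements is as an immutable record attached to each entity or relationship. The sketch below is illustrative only. The field names, the single free-text `location` field, and the example values (including the URL) are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Provenance:
    connector_id: str
    document_id: str
    document_title: str
    document_url: str
    location: str  # e.g. "page 3, paragraph 2" or a cell range
    extracted_text: str
    confidence: float  # extraction confidence in the range 0-1
    extracted_at: datetime

    def __post_init__(self) -> None:
        # Reject scores outside the documented 0-1 range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


example = Provenance(
    connector_id="dms-connector-01",
    document_id="doc-42",
    document_title="Master Services Agreement",
    document_url="https://dms.example.com/docs/42",
    location="page 3, paragraph 2",
    extracted_text="signed by A. Smith on behalf of Acme Ltd",
    confidence=0.92,
    extracted_at=datetime(2026, 2, 1, 9, 30, tzinfo=timezone.utc),
)
```

Making the record frozen means provenance cannot be altered after extraction, so a correction has to be recorded as a new extraction.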

Incremental Updates

Capability Description
Change Detection Detect modified documents on each sync
Differential Processing Only process changed content
Entity Merging Merge updated information with existing entities
Conflict Resolution Handle conflicting information with rules
Version Tracking Maintain history of entity changes
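
Change detection and differential processing could be implemented by comparing a content hash per document against the hash stored at the previous sync. The source does not state which mechanism LegalFab uses, so the hashing approach and the function names below are assumptions.

```python
import hashlib
from typing import Dict, List


def content_hash(text: str) -> str:
    """SHA-256 digest of the document text, used as a change fingerprint."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_documents(
    current: Dict[str, str],
    previous_hashes: Dict[str, str],
) -> List[str]:
    """Return IDs of documents that are new or whose content has changed."""
    changed = [
        doc_id
        for doc_id, text in current.items()
        if previous_hashes.get(doc_id) != content_hash(text)
    ]
    return sorted(changed)


# Hypothetical sync state: doc-1 was edited, doc-2 is unchanged, doc-3 is new.
previous_hashes = {"doc-1": content_hash("v1"), "doc-2": content_hash("same")}
current = {"doc-1": "v2", "doc-2": "same", "doc-3": "new"}
to_process = changed_documents(current, previous_hashes)
```

Only the documents in `to_process` would be passed to extraction and entity merging. Unchanged documents are skipped.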

Search Sessions

Search Sessions provide an interactive exploration context where users ask questions and accumulate discoveries across multiple query turns.

Session Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                       SEARCH SESSION MODEL                               │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌───────────────────┐     ┌───────────────────────────────────────┐    │
│  │   Project         │────▶│     Persistent Knowledge Graph        │    │
│  │   Configuration   │     │     (Corporate Memory)                │    │
│  └───────────────────┘     └───────────────────────────────────────┘    │
│           │                              │                              │
│           ▼                              ▼                              │
│  ┌───────────────────┐     ┌───────────────────────────────────────┐    │
│  │   Search Session  │────▶│     Session Graph                     │    │
│  │   (User Context)  │     │     (Accumulated Discoveries)         │    │
│  └───────────────────┘     └───────────────────────────────────────┘    │
│           │                              │                              │
│           ▼                              ▼                              │
│  ┌───────────────────┐     ┌───────────────────────────────────────┐    │
│  │   Query Turns     │────▶│     Results with Provenance           │    │
│  │   (NL Queries)    │     │     (Sources, Reasoning Chain)        │    │
│  └───────────────────┘     └───────────────────────────────────────┘    │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Session Graph Accumulation

The Session Graph builds cumulatively across queries, providing context-aware exploration:

Feature Description
Cumulative Discovery Entities discovered in earlier queries available in later ones
Context Preservation Session maintains full context of exploration path
Relationship Building New queries can reference previously discovered entities
History Tracking All queries and findings preserved in session history
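
The accumulation behavior can be sketched as a small in-memory structure that each query turn adds to. The class and method names are hypothetical, and a production session graph would also carry provenance and permissions.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject id, relation type, object id)


class SessionGraph:
    """Accumulates entities, relations, and query history across turns."""

    def __init__(self) -> None:
        self.entities: Dict[str, str] = {}  # entity id -> entity type
        self.relations: Set[Triple] = set()
        self.history: List[str] = []

    def record_turn(
        self, query: str, entities: Dict[str, str], relations: Iterable[Triple]
    ) -> None:
        # Discoveries are only ever added, so earlier findings stay available.
        self.history.append(query)
        self.entities.update(entities)
        self.relations.update(relations)

    def known(self, entity_id: str) -> bool:
        return entity_id in self.entities


session = SessionGraph()
session.record_turn(
    "Who signed the Acme lease?",
    {"p1": "Person", "c1": "Contract"},
    [("p1", "SIGNED", "c1")],
)
session.record_turn(
    "Which organizations does that person represent?",
    {"o1": "Organization"},
    [("p1", "REPRESENTS", "o1")],
)
```

The second turn can refer to `p1` because the first turn already placed it in the session graph.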

Query Turn Processing

Stage Description Security Control
Query Input Accept natural language query Input sanitization
Context Assembly Combine session context with PKG Permission filtering
Reasoning Chain Generate step-by-step reasoning Audit logging
Subgraph Retrieval Extract relevant portion of PKG Access control enforcement
Source Display Show document references Provenance verification
Session Update Add discoveries to session graph Session isolation
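
The permission filtering and access control stages can be pictured as a filter over retrieved triples. In this sketch a relationship is returned only when the caller may read both of its endpoints. The function name and the set-based permission model are assumptions.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject id, relation type, object id)


def filter_subgraph(triples: Iterable[Triple], readable: Set[str]) -> List[Triple]:
    """Keep only triples whose subject and object are both readable by the caller."""
    return [(s, r, o) for (s, r, o) in triples if s in readable and o in readable]


retrieved = [("p1", "SIGNED", "c1"), ("p1", "REPRESENTS", "o1")]
visible = filter_subgraph(retrieved, readable={"p1", "c1"})
```

Dropping the whole triple, rather than redacting one endpoint, avoids revealing that a relationship to a restricted entity exists.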

Data Observability

The Knowledge Fabric integrates data observability capabilities to monitor data health, quality, and pipeline performance.

Observability Integration

Capability Description Purpose
Test Outcomes in Lineage Quality test results displayed on lineage Identify issues in data flow
Freshness Monitoring Track data currency and staleness Ensure data timeliness
Quality Alerts Automated notifications on test failures Proactive issue detection
Pipeline Health Monitor extraction and sync pipelines Operational visibility

Data Quality Metrics

Metric Description Tracking
Completeness Percentage of non-null values Per-field, per-source
Uniqueness Duplicate detection rate Per-entity type
Validity Conformance to schema constraints Per-field validation
Consistency Cross-source agreement Entity resolution metrics
Timeliness Data freshness indicators Last sync, staleness alerts
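
Two of these metrics reduce to simple ratios. The sketch below computes completeness for one field and a duplicate rate for a list of entity keys. The function names and the record layout are hypothetical, and the exact formulas LegalFab applies are not given in this document.

```python
from typing import Any, Dict, Hashable, Sequence


def completeness(records: Sequence[Dict[str, Any]], field: str) -> float:
    """Fraction of records whose value for `field` is present and not None."""
    if not records:
        return 0.0
    non_null = sum(1 for record in records if record.get(field) is not None)
    return non_null / len(records)


def duplicate_rate(keys: Sequence[Hashable]) -> float:
    """Fraction of keys that repeat an earlier key."""
    if not keys:
        return 0.0
    return 1 - len(set(keys)) / len(keys)


records = [
    {"name": "Acme Ltd"},
    {"name": None},
    {"name": "Beta LLP"},
    {"name": "Acme Ltd"},
]
name_completeness = completeness(records, "name")
name_duplicates = duplicate_rate(["a", "b", "a", "c"])
```

Computing each metric per field and per source, as the Tracking column describes, amounts to calling these functions once per (source, field) pair.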

Quality Test Integration

Test Type Description
Schema Tests Validate data structure conformance
Relationship Tests Verify relationship integrity
Business Rule Tests Apply domain-specific validation
Anomaly Detection Statistical outlier identification
Freshness Tests Data currency validation

Observability Alerts

Alert Type Trigger Response
Quality Failure Test result falls below threshold Notification to data owners
Pipeline Error Sync or extraction failure Operations alert
Freshness Warning Data exceeds staleness threshold Reprocess or escalate
Drift Detection Schema or pattern changes Review and validate
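
A minimal sketch of how two of these triggers might be evaluated for one data source follows. The parameter names and the use of a test pass rate as the quality signal are assumptions. Pipeline errors and drift detection are left out.

```python
from datetime import datetime, timedelta
from typing import List


def evaluate_alerts(
    pass_rate: float,
    min_pass_rate: float,
    last_sync: datetime,
    now: datetime,
    max_age: timedelta,
) -> List[str]:
    """Return the alert types that fire for one data source."""
    alerts: List[str] = []
    if pass_rate < min_pass_rate:
        alerts.append("Quality Failure")
    if now - last_sync > max_age:
        alerts.append("Freshness Warning")
    return alerts


now = datetime(2026, 2, 1, 12, 0)
alerts = evaluate_alerts(
    pass_rate=0.85,
    min_pass_rate=0.90,
    last_sync=now - timedelta(hours=30),
    now=now,
    max_age=timedelta(hours=24),
)
```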

Data Insights and KPIs

The Knowledge Fabric provides analytics and reporting to track data governance health.

Key Performance Indicators

KPI Description Target
Documentation Coverage Percentage of entities with descriptions 80%+
Ownership Assignment Percentage of assets with owners 95%+
Classification Coverage Percentage of fields classified 90%+
Lineage Completeness Percentage of assets with full lineage 85%+
Data Quality Score Aggregate quality across sources 90%+

Platform Analytics

Report Description
Asset Inventory Total entities, relationships, sources
Usage Patterns Most accessed entities and queries
Quality Trends Quality metrics over time
Governance Compliance Policy adherence and exceptions
Discovery Activity New entities and relationships found

Graph Database Security

Query Security

Control Description
Input Validation Query parameters validated and sanitized
Injection Prevention Parameterized queries, no string concatenation
Rate Limiting Per-user query rate limits
Resource Limits Query timeout and memory limits
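
The input validation and injection prevention controls can be illustrated by keeping the query text fixed and passing user input only as a bound parameter. The document does not name the graph query language, so the Cypher-style `$name` placeholder, the constant, and the length limit below are all assumptions.

```python
from typing import Dict, Tuple

MAX_NAME_LENGTH = 200  # hypothetical limit

# Fixed query text: user input never becomes part of this string.
ENTITY_BY_NAME = "MATCH (e) WHERE e.name = $name RETURN e LIMIT 100"


def build_entity_query(name: str) -> Tuple[str, Dict[str, str]]:
    """Validate the input and return (query text, bound parameters)."""
    if not name or len(name) > MAX_NAME_LENGTH:
        raise ValueError("name must be 1 to 200 characters long")
    if any(ord(ch) < 32 for ch in name):
        raise ValueError("name must not contain control characters")
    return ENTITY_BY_NAME, {"name": name}


query, params = build_entity_query("Acme' OR 1=1 //")
```

The hostile-looking input ends up in `params` as an ordinary string value, and `query` is identical for every caller.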

Monitoring

The Monitoring system tracks service health, performance, and security events across the Knowledge Fabric.

Health Monitoring

Component Health Checks Frequency
API Services Endpoint availability, response time 30 seconds
Database Connection pool, query latency 1 minute
Connectors Source connectivity status 5 minutes
Background Jobs Job completion, queue depth 1 minute

Performance Monitoring

Metric Description Alert Threshold
API Latency (P95) 95th percentile response time > 2 seconds
Query Latency Graph query execution time > 5 seconds
Discovery Duration Time to complete scan > configured window
Error Rate Failed requests percentage > 1%
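
The P95 latency and error rate thresholds can be checked as sketched below. The nearest-rank percentile method is one common definition and is an assumption here, since the document does not say how the percentile is computed. The function names are hypothetical.

```python
from typing import Sequence


def percentile_nearest_rank(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the value at rank ceil(pct * n / 100)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range 1-100")
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling, avoids float error
    return ordered[rank - 1]


def latency_alert(latencies_s: Sequence[float], threshold_s: float = 2.0) -> bool:
    """True when P95 API latency exceeds the threshold (2 seconds in the table)."""
    return percentile_nearest_rank(latencies_s, 95) > threshold_s


def error_rate_alert(failed: int, total: int, threshold: float = 0.01) -> bool:
    """True when the share of failed requests exceeds the threshold (1% in the table)."""
    return total > 0 and failed / total > threshold


samples = [0.2] * 18 + [2.5, 3.0]  # 20 hypothetical response times in seconds
p95 = percentile_nearest_rank(samples, 95)
```

With 20 samples the 95th percentile is the 19th value in sorted order, so two slow requests are enough to trip the latency alert even though the median is low.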

Security Monitoring

Monitor Detection Response
Authentication Failures Multiple failed login attempts Account lockout, alert
Anomalous Access Unusual data access patterns Alert, investigation
Credential Usage Unexpected credential access Alert, audit review
Configuration Changes Security setting modifications Audit, approval verification

Security Alert Categories:

Category Examples Response SLA
Critical Breach indicators, data exfiltration 15 minutes
High Multiple auth failures, privilege escalation 1 hour
Medium Policy violations, configuration drift 4 hours
Low Certificate expiration, best practice deviation 24 hours
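
The response SLAs translate directly into deadlines. The sketch below uses the four SLA values from the table above. The function names are hypothetical.

```python
from datetime import datetime, timedelta

RESPONSE_SLA = {
    "Critical": timedelta(minutes=15),
    "High": timedelta(hours=1),
    "Medium": timedelta(hours=4),
    "Low": timedelta(hours=24),
}


def response_deadline(category: str, detected_at: datetime) -> datetime:
    """Latest time by which a response is due for an alert of this category."""
    return detected_at + RESPONSE_SLA[category]


def sla_breached(category: str, detected_at: datetime, now: datetime) -> bool:
    return now > response_deadline(category, detected_at)


detected = datetime(2026, 2, 1, 12, 0)
deadline = response_deadline("Critical", detected)
```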

SIEM Integration

Integration Method Format Use Case
Syslog CEF, RFC 5424 Traditional SIEM
Webhook JSON Cloud-native SIEM
API Pull REST/JSON Custom integration
Event Stream Kafka High-volume environments
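
For the Syslog and Webhook methods, the same event can be rendered as a CEF line or as a JSON payload. The sketch below follows the standard CEF header layout (version, vendor, product, device version, event class ID, name, severity, extension). The vendor and product strings, the event class ID, and the JSON field names are hypothetical and do not describe LegalFab's actual output.

```python
import json
from typing import Dict

VENDOR, PRODUCT, VERSION = "LegalFab", "Knowledge Fabric", "1.7"  # illustrative values


def _escape_header(value: str) -> str:
    # CEF header fields escape backslash and pipe.
    return value.replace("\\", "\\\\").replace("|", "\\|")


def _escape_extension(value: str) -> str:
    # CEF extension values escape backslash and equals sign.
    return value.replace("\\", "\\\\").replace("=", "\\=")


def to_cef(event_class_id: str, name: str, severity: int, extension: Dict[str, str]) -> str:
    if not 0 <= severity <= 10:
        raise ValueError("CEF severity must be between 0 and 10")
    header = "|".join(
        ["CEF:0"]
        + [_escape_header(v) for v in (VENDOR, PRODUCT, VERSION, event_class_id, name)]
        + [str(severity)]
    )
    ext = " ".join(f"{key}={_escape_extension(val)}" for key, val in extension.items())
    return f"{header}|{ext}"


def to_webhook_json(event_class_id: str, name: str, severity: int, extension: Dict[str, str]) -> str:
    payload = {"event": event_class_id, "name": name, "severity": severity, **extension}
    return json.dumps(payload, sort_keys=True)


cef_line = to_cef("auth-failure", "Multiple failed logins", 7, {"suser": "jdoe", "msg": "a=b"})
```

Newlines inside extension values are not handled in this sketch and would need escaping before the line is sent over syslog.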

Authentication and Access Control

Authentication Mechanisms

Method Use Case Security Properties
OAuth 2.0 + OIDC User authentication Federated identity, token-based
API Keys Machine-to-machine communication Scoped permissions, rotatable
Mutual TLS (mTLS) Service-to-service authentication Certificate-based verification
SAML 2.0 Enterprise SSO integration Federated identity

Session Management

Parameter Value Rationale
Access Token Lifetime 15 minutes Limits exposure window
Refresh Token Lifetime 8 hours Balances security with usability
Idle Timeout 30 minutes Prevents abandoned session exploitation
Absolute Timeout 24 hours Forces re-authentication
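
The idle and absolute timeouts combine as an "either expires the session" rule. The sketch below uses the values from the table above. The function name and the choice to treat the exact boundary as expired are assumptions.

```python
from datetime import datetime, timedelta

ACCESS_TOKEN_LIFETIME = timedelta(minutes=15)
REFRESH_TOKEN_LIFETIME = timedelta(hours=8)
IDLE_TIMEOUT = timedelta(minutes=30)
ABSOLUTE_TIMEOUT = timedelta(hours=24)


def session_expired(created_at: datetime, last_activity: datetime, now: datetime) -> bool:
    """A session ends when it has been idle too long or has simply existed too long."""
    idle_expired = now - last_activity >= IDLE_TIMEOUT
    absolute_expired = now - created_at >= ABSOLUTE_TIMEOUT
    return idle_expired or absolute_expired


start = datetime(2026, 2, 1, 9, 0)
```

A user who stays active still has to re-authenticate once the absolute timeout is reached, because activity resets only the idle clock.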

Authorization Model

Role-Based Access Control (RBAC):

Role Permissions Typical Assignment
Viewer Read catalog, search, view lineage Business analysts
Contributor Viewer + add descriptions, tags Data stewards
Editor Contributor + modify classifications Data engineers
Admin Full access including governance policies Platform administrators
Auditor Read-only access to all data including audit logs Compliance officers
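
The cumulative roles (Viewer, Contributor, Editor, Admin) can be modeled as growing permission sets, with Auditor as a separate read-only set. The permission strings below are hypothetical, and modeling Admin without audit-log access is a simplification chosen for this sketch, not a statement of the platform's behavior.

```python
from typing import Dict, FrozenSet

VIEWER: FrozenSet[str] = frozenset({"catalog:read", "search:execute", "lineage:view"})
CONTRIBUTOR = VIEWER | {"description:add", "tag:add"}
EDITOR = CONTRIBUTOR | {"classification:modify"}
ADMIN = EDITOR | {"governance_policy:manage"}
AUDITOR = VIEWER | {"audit_log:read"}  # read-only, no write permissions

ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "Viewer": VIEWER,
    "Contributor": CONTRIBUTOR,
    "Editor": EDITOR,
    "Admin": ADMIN,
    "Auditor": AUDITOR,
}


def is_allowed(role: str, permission: str) -> bool:
    """Unknown roles get no permissions (deny by default)."""
    return permission in ROLE_PERMISSIONS.get(role, frozenset())
```

For example, `is_allowed("Editor", "tag:add")` is true because Editor includes everything Contributor has, while `is_allowed("Auditor", "tag:add")` is false.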

Credential Management

Source System Credentials

Control Description
Encrypted Storage AES-256 with HSM-backed key management
Just-in-Time Access Credentials retrieved only at connection time
Scoped Retrieval Only authorized connectors access specific credentials
Audit Trail All credential access logged with requester identity
Separation of Duties Different roles for creation, modification, and usage

Supported Credential Types

Type Storage Method Rotation Support
Username/Password Encrypted vault Manual or automated
API Keys Encrypted vault Automated
OAuth Tokens Encrypted vault with refresh Automatic refresh
Certificates Certificate store Automated with CA integration

Data Protection

Data Classification Framework

Level Description Handling Requirements
Public Non-sensitive data Standard controls
Internal Business confidential Authentication required
Confidential Sensitive business data Encryption, access logging
Restricted PII, privileged legal data Enhanced encryption, masking
Highly Restricted Credentials, privileged matter data HSM encryption, privileged access

Data Masking

Technique Use Case
Full Masking Highly sensitive fields
Partial Masking Identification fields
Format Preserving Testing scenarios
Tokenization Cross-system correlation
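
Three of these techniques are sketched below using only the Python standard library. Tokenization is shown as a keyed HMAC-SHA256 digest, which is one possible approach and an assumption here: the same value and key always give the same token, which is what makes cross-system correlation possible. Format-preserving masking is omitted because it requires a dedicated algorithm. Function names and the example key are hypothetical.

```python
import hashlib
import hmac


def full_mask(value: str, mask_char: str = "*") -> str:
    """Replace every character, keeping only the original length."""
    return mask_char * len(value)


def partial_mask(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Mask all but the last `visible` characters."""
    if len(value) <= visible:
        return mask_char * len(value)
    return mask_char * (len(value) - visible) + value[-visible:]


def tokenize(value: str, key: bytes) -> str:
    """Deterministic keyed token: equal inputs map to equal tokens under one key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"example-key-do-not-use"  # in practice the key would come from a managed key store
masked = partial_mask("123-45-6789")
token_a = tokenize("client-001", key)
token_b = tokenize("client-001", key)
```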

Audit Logging

Logged Events

Event Category Logged Data Retention
Authentication User ID, timestamp, IP, result 1 year
Authorization Subject, resource, action, decision 1 year
Data Access User ID, asset ID, fields accessed 1 year
Configuration Changes Setting, old/new value, changed by 2 years
Credential Access Credential ID, accessor, purpose 2 years
Discovery Events Scan scope, results, errors 1 year
Connectivity Events Connection attempts, status, duration 1 year
Entity Resolution Match decisions, merge events, source attribution 2 years
External Enrichment Source queried, entity enriched, data fused 1 year

Log Security

Control Implementation
Integrity Cryptographic hash chain makes tampering detectable
Confidentiality Logs encrypted at rest
Access Control Auditor role required; no delete capability
Retention Configurable per regulation (default 1 year; 2 years for configuration changes, credential access, and entity resolution events)
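
The hash-chain integrity control can be sketched as follows: each log entry stores the hash of the previous entry, so altering any earlier record invalidates every hash after it. The entry layout, the all-zero genesis value, and the function names are assumptions for illustration.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: Dict[str, Any]) -> str:
    # Canonical JSON so the same record always serializes identically.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain: List[Dict[str, Any]], record: Dict[str, Any]) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify_chain(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every hash and check that each entry links to the one before it."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


log: List[Dict[str, Any]] = []
append_entry(log, {"event": "authentication", "user": "jdoe", "result": "success"})
append_entry(log, {"event": "data_access", "user": "jdoe", "asset": "doc-42"})
```

Verification detects edits and reordering. Detecting truncation of the most recent entries additionally requires anchoring the latest hash somewhere outside the log.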