MHLE White Paper
Technical White Paper

Multi-Headed Learning Engine

A comprehensive technical analysis of the architecture, design patterns, and system topology powering the next generation of AI-driven cognitive learning.

Version: 2.0
Classification: Internal / Confidential
Date: February 2026
Status: Production
01

Executive Summary

The Multi-Headed Learning Engine (MHLE) is an enterprise-grade, subscription-based SaaS platform that leverages multiple artificial intelligence providers to deliver multi-perspective cognitive analysis for serious learners, researchers, and professionals.

MHLE represents a paradigm shift in educational technology by applying simultaneous, multi-perspective AI analysis to user-submitted content. Rather than relying on a single AI model's interpretation, the platform orchestrates responses from multiple providers—OpenAI, Anthropic Claude, Google Gemini, and Perplexity AI—to deliver richer, more nuanced insights.

The platform is architected as a modular, full-stack application built on Flask (Python) with a PostgreSQL persistence layer, Redis-backed rate limiting, and a React/TypeScript frontend. This document provides a comprehensive analysis of the system's architecture, data flows, security posture, and scalability characteristics.

20+ API Modules
4 AI Providers
30+ Data Models
3 Subscription Tiers
02

System Overview & Design Philosophy

MHLE is built on four foundational design principles that guide every architectural decision across the platform.

Modular Architecture

Every major feature is encapsulated in its own Blueprint module with isolated routes, models, and service logic, enabling independent development and testing.

Provider Agnosticism

The AI service layer abstracts provider-specific implementations behind a unified interface, allowing seamless switching or failover between OpenAI, Claude, Gemini, and Perplexity.
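As a rough illustration of the unified interface and failover described above, the sketch below tries providers in order and normalizes the first successful response. The `Provider` wrapper, `analyze_with_failover`, and the two stand-in callables are hypothetical names for this example, not MHLE's actual service layer.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Provider:
    """Hypothetical wrapper: a provider name plus a callable that takes a prompt."""
    name: str
    complete: Callable[[str], str]


class AllProvidersFailed(RuntimeError):
    pass


def analyze_with_failover(prompt: str, providers: Sequence[Provider]) -> dict:
    """Try each provider in order and return the first successful response."""
    errors = {}
    for provider in providers:
        try:
            text = provider.complete(prompt)
        except Exception as exc:  # a real service would catch narrower errors
            errors[provider.name] = str(exc)
            continue
        # Normalize every provider's output into one response shape.
        return {"provider": provider.name, "text": text, "errors": errors}
    raise AllProvidersFailed(f"no provider succeeded: {errors}")


def _failing(prompt: str) -> str:
    """Stand-in for a provider that is currently unavailable."""
    raise TimeoutError("simulated outage")


def _echo(prompt: str) -> str:
    """Stand-in for a healthy provider."""
    return f"analysis of: {prompt}"


result = analyze_with_failover(
    "photosynthesis notes",
    [Provider("openai", _failing), Provider("claude", _echo)],
)
```

Because callers only see the normalized dictionary, swapping or reordering providers does not change the calling code.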

Tiered Access Control

A subscription-based access model (Free, Pro, Enterprise) enforces granular feature gates and usage quotas at the middleware layer, ensuring fair resource allocation.

Security-First Design

JWT authentication, bcrypt password hashing, rate limiting, CORS policies, and comprehensive security headers form a multi-layered defense posture.

Core Capabilities

The platform delivers a comprehensive suite of learning and research tools, each built as an independent module that integrates seamlessly into the broader ecosystem:

Capability | Description | Tier
Multi-Perspective Analysis | Simultaneous AI analysis through multiple analytical lenses | All
Content Ingestion | Text, PDF, audio transcription, and image analysis with HEIC support | All
Course Management | Syllabus parsing, skeleton generation, study recommendations | All
Wicked Problem Simulations | AI-generated complex scenarios with Reviewer 2 critique | Pro+
Knowledge Graph | Visual concept mapping with semantic clustering and relationship analysis | Pro+
Semantic Search | Vector embedding-based content retrieval across notes | All
Weekly Pulse | Automated claim extraction and internet-based verification | All
Learning Artifacts | AI-generated study materials, visual aids, and practice questions | Enterprise
Podcast Generator | Script generation and TTS-HD audio output from learning content | Pro+
Portfolio System | Professional artifact portfolio with public "Living Resume" page | Pro+
Knowledge Synthesis | AI-generated academic papers integrating cross-note concepts | Enterprise
Graph Comparison | Cross-user knowledge graph comparison and access control | Enterprise
03

High-Level Architecture

MHLE employs a layered architecture pattern that separates concerns across presentation, application logic, service orchestration, and data persistence.

Figure 1 — System Architecture Overview
[Diagram, top to bottom]
Presentation Layer: React / TypeScript, Vite Build, Jinja2 Templates, Static Assets, Admin UI, D3.js Viz
(connected to the application layer via REST API / JWT)
Application Layer (Flask API Gateway): Auth, Ingest, Notes, Courses, Analysis, Search, Subscriptions
Service Orchestration Layer: AI Service, Cost Tracker, Background Worker, Graph Layout, Quota Middleware, Email Svc
Data Persistence Layer: PostgreSQL (Neon), SQLAlchemy ORM, Redis (Upstash)
External Services: Stripe, Mailjet, AI APIs
Figure 1: Four-layer architecture with clear separation of concerns between presentation, application, service, and persistence layers.
04

Backend Architecture

The backend is built on Flask using the Application Factory pattern with Blueprint-based modular routing, enabling independent feature development and clear separation of concerns.

Application Factory Pattern

The application employs Flask's Application Factory pattern via create_app(), which initializes the application instance, configures extensions (CORS, rate limiter, SQLAlchemy), registers all blueprints, and establishes database connections. This pattern supports multiple configurations for testing, staging, and production environments.

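    import os

    from flask import Flask
    from flask_cors import CORS
    from flask_limiter import Limiter
    from flask_limiter.util import get_remote_address
    from flask_sqlalchemy import SQLAlchemy

    # Extension instances are created once at module level and bound to the
    # app inside the factory. Their exact construction is assumed here.
    db = SQLAlchemy()
    limiter = Limiter(key_func=get_remote_address)

    def create_app():
        app = Flask(__name__, static_folder='static')
        app.config['SQLALCHEMY_DATABASE_URI'] = os.environ.get('DATABASE_URL')
        app.config['SQLALCHEMY_ENGINE_OPTIONS'] = {
            'pool_pre_ping': True,   # test each connection before use
            'pool_recycle': 300,     # recycle connections after 300 seconds
            'pool_size': 3,
            'max_overflow': 5,
        }
        CORS(app)
        limiter.init_app(app)
        db.init_app(app)
        # Register 20+ Blueprint modules...
        return app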

Blueprint Module Registry

Each functional domain is encapsulated as a Flask Blueprint, providing route isolation, independent middleware chains, and clean import boundaries. The system registers over 20 blueprints at startup:

Figure 2 — Blueprint Module Architecture
[Diagram: Flask Core at the hub, with blueprint spokes auth_bp, ingest_bp, notes_bp, courses_bp, analyze_bp, knowledge_graph_bp, simulations_bp, search_bp, subscriptions_bp, referrals_bp, portfolio_bp, podcast_bp, learning_artifacts_bp, admin_bp, marketing_admin_bp, surveys_bp, partner_bp]
Figure 2: Hub-and-spoke Blueprint architecture with 20+ modules organized by domain — Core (cyan), AI/Analysis (green), Business (amber), Admin (rose).

Key Backend Services

AI Service Layer

Centralized orchestration of multi-provider AI calls with automatic failover, response normalization, and cost tracking per request.

Background Worker

Thread-based job processor with semaphore concurrency control (max 2 concurrent), batch processing, and rate limiting for long-running AI analysis tasks.
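The concurrency control described above might look roughly like the following sketch, which uses a bounded semaphore to cap running jobs at two. The function names are hypothetical; the batch size of 5 and the 0.5-second pause are taken from the figures quoted in the Scalability section, and the per-batch loop is only a placeholder for whatever the real worker does between batches.

```python
import threading
import time
from typing import Callable, List

MAX_CONCURRENT_JOBS = 2  # "max 2 concurrent" from the description above
_job_slots = threading.BoundedSemaphore(MAX_CONCURRENT_JOBS)


def run_job(note_ids: List[str], analyze: Callable[[str], None],
            batch_size: int = 5, delay_s: float = 0.5) -> None:
    """Process notes in batches while holding one of the two job slots."""
    with _job_slots:  # blocks while two jobs are already running
        for start in range(0, len(note_ids), batch_size):
            batch = note_ids[start:start + batch_size]
            for note_id in batch:
                analyze(note_id)
                time.sleep(delay_s)  # simple pacing between AI calls
            # A real worker might commit results here, once per batch.


def submit(note_ids: List[str], analyze: Callable[[str], None],
           **kwargs) -> threading.Thread:
    """Start a job on a daemon thread and return the thread handle."""
    thread = threading.Thread(target=run_job, args=(note_ids, analyze),
                              kwargs=kwargs, daemon=True)
    thread.start()
    return thread
```

Submitting more than two jobs is safe in this sketch: the extra threads simply wait on the semaphore until a slot frees up.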

Quota Middleware

Request-level enforcement of subscription tier limits across notes, courses, AI calls, and feature access with graceful upgrade prompts.
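A minimal sketch of such a quota check follows. The limits in `TIER_LIMITS` are invented for illustration (the actual quotas are not stated in this document), and `check_quota` is a hypothetical helper; the error payload reuses the response convention shown in the API section.

```python
from typing import Optional

# Hypothetical limits for illustration only; None means unlimited.
TIER_LIMITS = {
    "free": {"notes": 25, "ai_calls": 50},
    "pro": {"notes": 500, "ai_calls": 2000},
    "enterprise": {"notes": None, "ai_calls": None},
}


def check_quota(tier: str, resource: str, used: int) -> Optional[dict]:
    """Return None if the request may proceed, else an error payload."""
    limit = TIER_LIMITS[tier][resource]
    if limit is None or used < limit:
        return None
    return {
        "status": "error",
        "error": f"{resource} limit of {limit} reached on the {tier} tier",
        "upgrade_required": True,  # lets the client show an upgrade prompt
    }
```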

Cost Tracker

Comprehensive per-request cost accounting across all AI providers with token-level granularity, category tagging, and admin reporting.

Graph Layout Service

Server-side computation of knowledge graph layouts with semantic clustering, authority-based node sizing, and relationship type filtering.

Provider Health Monitor

Singleton health checker with lazy client initialization for each AI provider, enabling intelligent routing and automatic degradation.

05

Data Model & Persistence Layer

The data layer is built on PostgreSQL (Neon-backed) with SQLAlchemy ORM, using UUIDs as primary keys and JSON fields for flexible schema extensions.

Figure 3 — Core Entity Relationship Diagram
[Diagram: entities and fields]
User: id : UUID (PK), email : String (unique), subscription_tier : Enum, stripe_customer_id : String, is_admin : Boolean
Course: id : UUID (PK), user_id : UUID (FK), title : String, syllabus_text : Text, course_skeleton : JSON
Note: id : UUID (PK), user_id : UUID (FK), course_id : UUID (FK), raw_content : Text, lenses : JSON, vector_embedding : Array
NoteRelationship: source_note_id : UUID (FK), target_note_id : UUID (FK), relationship_type : String, strength : Float, confidence_score : Float
Simulation: user_id : UUID (FK), scenario_text : Text, student_solution : Text, reviewer_critique : Text
APICost: provider : Enum, category : Enum, input_tokens : Integer, cost_usd : Float
UsageEvent: usage_type : Enum, count : Integer, metadata : JSON
WeeklyPulse: claims_extracted : Integer, claims_verified : Integer, verification_results : JSON, status : Enum
ReferralCode: code : String (unique), total_signups : Integer, total_conversions : Integer
Figure 3: Core entity relationships. User is the central entity with one-to-many relationships to Courses, Notes, Simulations, API Costs, Usage Events, and Referral Codes. Notes form a many-to-many graph via NoteRelationship.

Database Configuration

Connection Pooling: SQLAlchemy is configured with pool_pre_ping for connection health checks, a pool size of 3 with 5 overflow connections, and a 300-second recycle interval to prevent stale connections on Neon's serverless PostgreSQL infrastructure.
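One consequence of these settings is worth working through. SQLAlchemy's default queue pool allows at most pool_size + max_overflow connections per engine, and each server worker process holds its own pool. The helper below is a hypothetical capacity-planning aid; the worker count is an assumption, since this document only says "multiple workers".

```python
def max_db_connections(workers: int, pool_size: int = 3,
                       max_overflow: int = 5) -> int:
    """Upper bound on connections opened by all worker processes combined."""
    return workers * (pool_size + max_overflow)


# Example: four worker processes (an assumed figure) could open up to
# 4 * (3 + 5) connections against the database at peak.
peak = max_db_connections(workers=4)
```

The result should stay below the connection limit of the database plan in use.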

Key Design Decisions

06

AI Integration Architecture

The platform's defining feature is its multi-headed AI analysis engine, which orchestrates calls to four distinct AI providers through a unified service abstraction layer.

Figure 4 — AI Provider Integration Flow
[Diagram, top to bottom]
User Content Submission
AI Service Orchestrator: Provider Health Check, Cost Tracking, Failover
OpenAI: GPT-4o, GPT-4o-mini, Whisper, TTS-HD, Embeddings (text-embedding-3)
Anthropic Claude: Claude 3.5 Sonnet, Visual Frameworks, Process Flow Diagrams
Google Gemini: Gemini 1.5 Flash, Multi-modal Analysis, Image Understanding
Perplexity AI: Internet-Grounded Search, Claim Verification, Real-Time Fact Checking
Cost Tracker & Response Aggregator: Token Accounting, Provider Metrics, Error Logging
Figure 4: Multi-provider AI orchestration with centralized cost tracking and provider health monitoring. Each provider serves specialized roles.

Provider Specialization

Provider | Primary Use Cases | Key Models
OpenAI | Multi-lens analysis, epistemology tagging, audio transcription, text-to-speech, vector embeddings | GPT-4o, GPT-4o-mini, Whisper, TTS-HD, text-embedding-3-small
Anthropic Claude | Visual framework generation, process flow diagrams, learning artifact creation | Claude 3.5 Sonnet
Google Gemini | Multi-modal content analysis, image understanding, supplementary analysis | Gemini 1.5 Flash
Perplexity AI | Internet-grounded fact verification, claim verification, Weekly Pulse checks | Sonar models

Cost Tracking Architecture

Every AI API call is instrumented through the CostTracker service, which records provider, model, token counts (input/output), calculated cost in USD, category, success status, and response duration. This data feeds into the admin dashboard's financial analytics, enabling precise unit economics tracking per user, per feature, and per provider.
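The fields listed above map naturally onto a small record type. The sketch below is illustrative only: `CostRecord`, `record_call`, the model name, and the prices in `PRICES` are all hypothetical, and `Decimal` is used here to avoid floating-point drift even though the entity diagram lists `cost_usd` as a Float.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical (input, output) prices in USD per million tokens.
# Real prices vary by provider, model, and date.
PRICES = {
    ("openai", "example-model"): (Decimal("2.50"), Decimal("10.00")),
}


@dataclass(frozen=True)
class CostRecord:
    provider: str
    model: str
    category: str
    input_tokens: int
    output_tokens: int
    cost_usd: Decimal
    success: bool
    duration_ms: int


def record_call(provider: str, model: str, category: str,
                input_tokens: int, output_tokens: int,
                success: bool, duration_ms: int) -> CostRecord:
    """Price one API call from its token counts and return the record."""
    in_price, out_price = PRICES[(provider, model)]
    cost = (in_price * input_tokens + out_price * output_tokens) / Decimal(1_000_000)
    return CostRecord(provider, model, category, input_tokens,
                      output_tokens, cost, success, duration_ms)


rec = record_call("openai", "example-model", "analysis", 1200, 300, True, 840)
```

Summing `cost_usd` grouped by user, category, or provider would then give the per-user, per-feature, and per-provider views mentioned above.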

Provider Health Monitoring: The AIProviderHealth singleton uses lazy initialization to create provider clients on first use, reducing startup time. It checks availability and latency for each provider, enabling intelligent routing decisions when a provider experiences degradation.
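To make the lazy-initialization idea concrete, here is a minimal sketch under stated assumptions: each provider is represented by a factory that builds a zero-argument "ping" callable, and `ProviderHealth` is a hypothetical stand-in for the `AIProviderHealth` singleton, whose real interface is not shown in this document.

```python
import time
from typing import Callable, Dict, Optional

Ping = Callable[[], None]


class ProviderHealth:
    """Sketch of a lazily initialized, latency-aware provider registry."""

    def __init__(self, factories: Dict[str, Callable[[], Ping]]):
        self._factories = factories          # name -> builder of a ping callable
        self._clients: Dict[str, Ping] = {}  # filled in on first use

    def _client(self, name: str) -> Ping:
        if name not in self._clients:        # build the client only once
            self._clients[name] = self._factories[name]()
        return self._clients[name]

    def latency(self, name: str) -> Optional[float]:
        """Return ping latency in seconds, or None if the provider is down."""
        start = time.perf_counter()
        try:
            self._client(name)()
        except Exception:
            return None
        return time.perf_counter() - start

    def fastest(self) -> Optional[str]:
        """Name of the healthy provider with the lowest latency, if any."""
        timings = {name: self.latency(name) for name in self._factories}
        healthy = {n: t for n, t in timings.items() if t is not None}
        return min(healthy, key=healthy.get) if healthy else None
```

A single module-level instance of this class would give the singleton behavior; no client is constructed until a provider is first checked.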

07

Frontend Architecture

The frontend employs a hybrid rendering strategy, combining React/TypeScript SPA components with server-rendered Jinja2 templates for optimal performance across different page types.

Technology Stack

React 18, TypeScript, Vite, D3.js, Jinja2, CSS Modules, Inter Font, JetBrains Mono

Rendering Strategy

Page Type | Rendering | Rationale
Main Application | React SPA | Complex interactivity: split-screen editor, real-time AI analysis, dynamic state management
Landing Page | Server-Rendered HTML | SEO optimization, fast initial load, no JavaScript dependency
Admin Dashboard | Jinja2 + Vanilla JS | Data-heavy tables, D3.js visualizations, minimal client-side routing needed
Knowledge Graph | React + D3.js | Force-directed graph layout, real-time node interaction, complex SVG rendering
Public Portfolio | Server-Rendered HTML | Shareable URLs, SEO-friendly, minimal interactivity required
Growth Hub | Jinja2 Templates | Gamification UI, leaderboards, referral tracking widgets

UI Design System

The interface follows a cohesive design language built around a deep navy background palette with purple accent tones, optimized for extended reading sessions:

Color Palette

Deep Navy (#0f1729) background, off-white (#e2e8f0) text, purple (#8b5cf6) accents, and slate grey (#64748b) secondary elements for reduced eye strain.

Typography

Inter for body text providing excellent readability, JetBrains Mono for code blocks and technical content with ligature support.

Split-Screen Layout

Primary workspace divides between a note/content editor and AI analysis results panel, enabling side-by-side comparison of source material and insights.

Guided Tour System

Custom 15-step interactive walkthrough (zero external dependencies) that auto-launches on first login, covering all major features.

08

Security Architecture

MHLE implements a defense-in-depth security model with multiple overlapping layers of protection across authentication, authorization, transport, and application security.

Figure 5 — Security Layers
[Diagram, outermost layer to innermost]
Transport Security: HTTPS / TLS / CORS / security headers
Rate Limiting: 200/day, 50/hour per IP (Redis-backed)
JWT Authentication: token verification / session management
Authorization: subscription tier / feature gates / quota enforcement
Protected resources: AI Analysis, User Data, Knowledge Graph, Admin Panel
Application-level controls: bcrypt hashing, input validation, XSS protection, CSRF prevention
Figure 5: Concentric security model — requests must pass through transport security, rate limiting, authentication, and authorization before reaching protected resources.

Security Controls Summary

Control | Implementation | Purpose
Authentication | JWT tokens with configurable expiry, bcrypt password hashing | Identity verification and session management
Authorization | Decorator-based role checks (@require_auth, @require_feature) | Tier-based feature gating and resource ownership verification
Rate Limiting | Flask-Limiter with Redis backend (200/day, 50/hour defaults) | Abuse prevention and fair resource allocation
Security Headers | X-Content-Type-Options, X-Frame-Options, X-XSS-Protection | Browser-level attack surface reduction
CORS | Flask-CORS with configurable origins | Cross-origin request control
Input Validation | Server-side validation on all endpoints with size limits (75MB max upload) | Injection prevention and resource protection
Password Reset | Time-limited tokens via Mailjet transactional email | Secure account recovery flow
Request Logging | Comprehensive request log with bot detection and probe identification | Threat intelligence and traffic analysis
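The decorator-based authorization named in the table could be sketched as follows. This is a framework-neutral illustration: the real `@require_feature` signature is not shown in this document, so the `min_tier` argument, the explicit `user` parameter (a Flask version would more likely read the user from the request context), and the `Forbidden` exception are all assumptions.

```python
import functools

TIER_RANK = {"free": 0, "pro": 1, "enterprise": 2}


class Forbidden(Exception):
    """Raised when the caller's tier does not include the feature."""


def require_feature(min_tier: str):
    """Hypothetical gate: reject callers whose tier is below min_tier."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(user, *args, **kwargs):
            if TIER_RANK[user["subscription_tier"]] < TIER_RANK[min_tier]:
                raise Forbidden(f"{func.__name__} requires the {min_tier} tier")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator


@require_feature("pro")
def generate_podcast(user, note_id):
    # Placeholder body; only the gating behavior matters in this sketch.
    return f"podcast for {note_id}"
```

Keeping the check in a decorator means each route declares its required tier in one line, next to the handler it protects.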
09

API Design & Route Topology

The RESTful API follows a versioned, resource-oriented design with consistent response structures across all 20+ blueprint modules.

API Versioning

Most primary API endpoints are versioned under the /api/v1/ prefix, providing a stable contract for frontend consumers while allowing non-breaking evolution of the API surface. The exceptions, as the route summary shows, are administrative endpoints (/admin/api/), referral endpoints (/api/referrals/), podcast endpoints (/api/podcast), and onboarding endpoints (/onboarding/api).

Route Module Summary

Module | Prefix | Endpoints | Auth Required
Authentication | /api/v1/auth | 7 | Partial
Content Ingestion | /api/v1/ingest | 4 | Yes
Notes Management | /api/v1/notes | 3 | Yes
Course Management | /api/v1/courses | 8 | Yes
AI Analysis | /api/v1/analyze | 2 | Yes
Knowledge Graph | /api/v1/knowledge-graph | 8 | Yes (Pro+)
Simulations | /api/v1/simulate | 5 | Yes
Semantic Search | /api/v1/search | 3 | Yes
Subscriptions | /api/v1/subscriptions | 5 | Partial
Learning Artifacts | /api/v1/learning-artifacts | 6 | Yes
Podcast | /api/podcast | 5 | Yes
Portfolio | /api/v1/portfolio | 6 | Partial
Weekly Pulse | /api/v1/pulse | 4 | Yes
Usage Tracking | /api/v1/usage | 2 | Yes
Referrals | /api/referrals | 12 | Partial
Surveys | /api/v1/surveys | 10 | Partial
Admin | /admin/api | 40+ | Admin Only
Onboarding | /onboarding/api | 7 | Yes

Response Format Convention

    // Success Response
    {
      "status": "success",
      "data": { ... },
      "count": 42
    }

    // Error Response
    {
      "status": "error",
      "error": "Human-readable error message",
      "upgrade_required": true
    }
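A pair of small helpers is one way to keep every endpoint on this convention. The function names are hypothetical, and treating `count` and `upgrade_required` as optional keys is an assumption based on the two examples above.

```python
from typing import Any, Optional


def success(data: Any, count: Optional[int] = None) -> dict:
    """Build a success envelope; count is included only when given."""
    body = {"status": "success", "data": data}
    if count is not None:
        body["count"] = count
    return body


def error(message: str, upgrade_required: bool = False) -> dict:
    """Build an error envelope; the upgrade flag is included only when set."""
    body = {"status": "error", "error": message}
    if upgrade_required:
        body["upgrade_required"] = True
    return body
```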
10

Scalability & Performance

The architecture addresses scalability across compute, storage, and AI processing dimensions through connection pooling, background processing, and intelligent caching.

Connection Pooling

SQLAlchemy pool with pre-ping health checks, 3-connection base pool, 5 overflow connections, and 300-second recycling for Neon's serverless PostgreSQL.

Background Processing

Thread-based worker with a global semaphore (max 2 concurrent jobs), batch processing of 5 notes per cycle, and a 0.5-second pause between AI calls.

Redis Rate Limiting

Upstash Redis for rate limiter state storage with automatic TTL-based expiration, ensuring consistent limit enforcement across requests.
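For intuition, the sketch below implements a fixed-window counter with the 50/hour and 200/day defaults quoted in the Security section. It is not Flask-Limiter's implementation: an in-memory dictionary stands in for Redis, the class name is hypothetical, and old windows are never evicted here, whereas Redis keys would expire through their TTL.

```python
import time
from typing import Dict, Tuple

# (max requests, window length in seconds)
LIMITS = [(50, 3600), (200, 86400)]


class FixedWindowLimiter:
    def __init__(self, limits=LIMITS, clock=time.time):
        self._limits = limits
        self._clock = clock
        # Key: (ip, window length, window index). Stand-in for Redis keys.
        self._counters: Dict[Tuple[str, int, int], int] = {}

    def allow(self, ip: str) -> bool:
        """Count the request and return True unless a limit is exhausted."""
        now = int(self._clock())
        keys = [(ip, window, now // window) for _, window in self._limits]
        if any(self._counters.get(key, 0) >= limit
               for key, (limit, _) in zip(keys, self._limits)):
            return False
        for key in keys:
            self._counters[key] = self._counters.get(key, 0) + 1
        return True
```

Storing the counters in a shared store rather than in process memory is what keeps the limits consistent when several worker processes serve the same client.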

Request Performance

Automated slow request logging (threshold: 500ms) with per-request timing instrumentation for performance regression detection.

Content Limits

75MB maximum upload size, pagination on knowledge graph queries (200 nodes default, 500 in lite mode), and batch processing caps (100 notes, 1000 pairs).

Production Server

Gunicorn WSGI server configured with multiple workers, socket-based reuse, and graceful restart capabilities for zero-downtime deployments.

Performance Monitoring

The application instruments every request lifecycle with timing data, logging slow requests above 500ms to enable proactive performance optimization. Combined with comprehensive request logging (including bot detection and probe identification), the system provides full observability into traffic patterns and performance characteristics.
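The slow-request check can be illustrated with a small timing decorator. This is a framework-neutral sketch: a Flask application would more likely use request hooks, and the logger name and the `timed` helper are hypothetical. Only the 500 ms threshold comes from the text above.

```python
import functools
import logging
import time

SLOW_REQUEST_MS = 500  # threshold stated above
logger = logging.getLogger("mhle.performance")  # hypothetical logger name


def timed(threshold_ms: float = SLOW_REQUEST_MS):
    """Log a warning when the wrapped handler runs longer than threshold_ms."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                if elapsed_ms > threshold_ms:
                    logger.warning("slow request: %s took %.0f ms",
                                   func.__name__, elapsed_ms)
        return wrapper
    return decorator
```

Measuring in a `finally` block means handlers that raise are timed as well, which matters because failing requests are often the slow ones.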

11

Deployment Architecture

MHLE supports multiple deployment targets with environment-specific configuration management and infrastructure-as-code principles.

Figure 6 — Deployment Topology
[Diagram, request flow]
Internet / CDN -> Reverse Proxy / HTTPS Termination
Behind the proxy: Gunicorn Workers (Flask App + Background Worker), Static Assets (CSS / JS / Images / Fonts), Admin Dashboard (Jinja2 + D3.js Analytics)
Backing services: PostgreSQL, Redis, Stripe API, AI Providers, Mailjet
Figure 6: Production deployment topology showing request flow from internet through reverse proxy to application workers, static assets, and backing services.

Environment Configuration

The application uses environment variables for all sensitive configuration, following the Twelve-Factor App methodology. Key configuration categories include:

Category | Variables | Purpose
Database | DATABASE_URL | PostgreSQL connection string (Neon)
Security | SESSION_SECRET, JWT_SECRET_KEY | Token signing and session encryption
AI Providers | OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY, PERPLEXITY_API_KEY | Provider authentication
Payments | STRIPE_SECRET_KEY, STRIPE_WEBHOOK_SECRET | Stripe integration credentials
Caching | REDIS_URL | Upstash Redis connection for rate limiting
Email | MAILJET_API_KEY, MAILJET_SECRET_KEY | Transactional email delivery
Feature Flags | ENABLE_LEARNING_ARTIFACTS | Runtime feature toggle controls
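A configuration loader in the Twelve-Factor style might validate these variables at startup, as in the sketch below. Which variables are mandatory, the accepted truthy strings for the feature flag, and the `load_config` helper itself are assumptions made for illustration.

```python
import os
from typing import Mapping

# Assumed to be mandatory in this sketch; the real policy may differ.
REQUIRED_VARS = ("DATABASE_URL", "SESSION_SECRET", "JWT_SECRET_KEY")


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Fail fast on missing secrets; treat the feature flag as optional."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    flag = env.get("ENABLE_LEARNING_ARTIFACTS", "false").lower()
    return {
        "database_url": env["DATABASE_URL"],
        "session_secret": env["SESSION_SECRET"],
        "jwt_secret_key": env["JWT_SECRET_KEY"],
        "redis_url": env.get("REDIS_URL"),  # optional in this sketch
        "enable_learning_artifacts": flag in ("1", "true", "yes"),
    }
```

Passing the environment in as a mapping keeps the loader easy to exercise in tests without touching the real process environment.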
12

Future Architecture Considerations

As MHLE scales, several architectural evolutions are planned to address growing user demands, operational complexity, and feature expansion.

Message Queue Integration

Migrating from thread-based background workers to a dedicated message queue (e.g., Celery with Redis broker) for improved job reliability, retry logic, and horizontal scaling of AI processing.

Vector Database

Transitioning semantic search from in-application vector storage to a dedicated vector database (e.g., Pinecone, pgvector) for improved similarity search performance at scale.
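To see why a dedicated index helps, consider what in-application similarity search typically amounts to: a linear scan that scores every stored embedding against the query. The sketch below shows that brute-force approach with cosine similarity; it is a generic illustration, not MHLE's actual search code, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query: Sequence[float], notes: Dict[str, Sequence[float]],
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Brute-force scan: score every note, then keep the top_k matches."""
    scored = [(note_id, cosine(query, vec)) for note_id, vec in notes.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
```

The cost of this scan grows linearly with the number of notes, which is the scaling pressure that approximate nearest-neighbor indexes in a vector database are designed to relieve.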

Microservice Decomposition

Extracting high-load services (AI orchestration, knowledge graph processing, podcast generation) into independent microservices for isolated scaling and deployment.

Real-Time Collaboration

Adding WebSocket support for real-time collaborative note-taking, live knowledge graph updates, and instant AI analysis result streaming.

Advanced Caching Layer

Implementing multi-tier caching with edge caching for static content, Redis for session/API data, and in-memory caching for frequently accessed AI analysis results.

Observability Stack

Deploying structured logging, distributed tracing (OpenTelemetry), and metrics collection for comprehensive system observability and SLA monitoring.