Technology

Technical architecture and AI models

System Architecture
High-level overview of Faxi's components and data flow
Marketing Website (Next.js): Hero • Use Cases • Demo • Metrics Dashboard
    ↓ HTTP/REST
Backend API (Express.js): Demo Endpoints • Metrics API • Webhook Handlers
AI Processing Pipeline:
  • Vision AI: OCR • Handwriting
  • Annotation Detector: Checkmarks • Circles
  • Intent Extractor: Action Classification
MCP Servers: Email • Shopping • AI Chat • Payment • Appointments
Infrastructure:
  • PostgreSQL: Data Storage
  • Redis: Queue & Cache
  • S3: File Storage
  • Telnyx: Fax API

Data Flow

  1. User sends a fax or uploads an image via the marketing website demo
  2. Backend API receives the request and queues a processing job in Redis
  3. AI pipeline analyzes the image: Vision AI extracts text, Annotation Detector finds marks, Intent Extractor determines the action
  4. MCP servers execute actions (send email, place order, book appointment, etc.)
  5. Results are stored in PostgreSQL, files are saved to S3, and a confirmation fax is sent via Telnyx
  6. Metrics are aggregated and displayed on the dashboard for monitoring and analysis
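
The steps above can be sketched as a small in-memory simulation. This is only an illustration of the queue-then-process flow: plain arrays stand in for Redis and PostgreSQL, and every name here (`FaxJob`, `enqueueFax`, `analyzeFax`, `processNextJob`) is hypothetical rather than part of Faxi's actual API.

```typescript
// Hypothetical shapes; none of these names come from Faxi's codebase.
type Intent = "email" | "shopping" | "appointment";

interface FaxJob {
  id: number;
  imageRef: string; // e.g. a storage key for the uploaded fax image
}

interface JobResult {
  jobId: number;
  intent: Intent;
  status: "completed";
}

// Step 2: the API enqueues a job (an array stands in for the Redis queue).
const jobQueue: FaxJob[] = [];
let nextJobId = 1;

function enqueueFax(imageRef: string): FaxJob {
  const job: FaxJob = { id: nextJobId++, imageRef };
  jobQueue.push(job);
  return job;
}

// Step 3: placeholder for the AI pipeline; a real one would call the models.
function analyzeFax(job: FaxJob): Intent {
  return job.imageRef.indexOf("order") >= 0 ? "shopping" : "email";
}

// Steps 4-5: record the outcome (an array stands in for PostgreSQL).
const storedResults: JobResult[] = [];

function processNextJob(): JobResult | undefined {
  const job = jobQueue.shift();
  if (job === undefined) {
    return undefined; // nothing queued
  }
  const intent = analyzeFax(job);
  const result: JobResult = { jobId: job.id, intent, status: "completed" };
  storedResults.push(result);
  return result;
}

enqueueFax("uploads/order-form.png");
const firstResult = processNextJob();
console.log(firstResult);
```

Decoupling the request from the processing through a queue lets the API respond immediately while slower model calls run in a worker.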
AI Models & Techniques
State-of-the-art AI powering accurate fax interpretation

Multi-Model AI Pipeline

Faxi's AI pipeline combines several specialized models. Each model handles one task (text extraction, annotation detection, or intent classification), and their outputs are combined to improve overall accuracy. This ensemble approach is intended to keep performance stable across diverse fax formats and handwriting styles.

Core AI Models

👁️

Vision AI (GPT-4 Vision)

Optical Character Recognition and Visual Analysis

Accuracy: 92%

Extracts text from fax images including both printed and handwritten content. Uses advanced computer vision to understand document structure, identify form fields, and recognize Japanese characters with high accuracy.

Techniques Used:
Multimodal deep learning, Transformer architecture, Handwriting recognition, Form field detection, Layout analysis

Annotation Detector

Visual Annotation Recognition

Accuracy: 88%

Identifies hand-drawn marks on faxes such as checkmarks, circles, arrows, and underlines. Associates annotations with nearby text to understand user intent (e.g., circled product = selected item).
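
One simple way to associate a mark with nearby text is to pick the text region whose center is closest to the mark's center. The sketch below assumes both the detector and the OCR step emit pixel bounding boxes; the types (`Box`, `TextRegion`, `DetectedMark`), the `nearestText` helper, and the distance cutoff are illustrative assumptions, not Faxi's actual implementation.

```typescript
// Hypothetical shapes for detector and OCR output; not Faxi's actual types.
interface Box {
  x: number; // left edge in pixels
  y: number; // top edge in pixels
  width: number;
  height: number;
}

interface TextRegion {
  text: string;
  box: Box;
}

interface DetectedMark {
  kind: "checkmark" | "circle" | "arrow" | "underline";
  box: Box;
}

function centerOf(box: Box): { cx: number; cy: number } {
  return { cx: box.x + box.width / 2, cy: box.y + box.height / 2 };
}

// Return the text region whose center is closest to the mark's center,
// or undefined if nothing lies within maxDistance pixels.
function nearestText(
  mark: DetectedMark,
  regions: TextRegion[],
  maxDistance: number
): TextRegion | undefined {
  const m = centerOf(mark.box);
  let best: TextRegion | undefined = undefined;
  let bestDistance = maxDistance;
  for (const region of regions) {
    const r = centerOf(region.box);
    const dx = r.cx - m.cx;
    const dy = r.cy - m.cy;
    const distance = Math.sqrt(dx * dx + dy * dy);
    if (distance <= bestDistance) {
      best = region;
      bestDistance = distance;
    }
  }
  return best;
}

const sampleRegions: TextRegion[] = [
  { text: "Green tea 500g", box: { x: 100, y: 100, width: 200, height: 30 } },
  { text: "Rice 5kg", box: { x: 100, y: 200, width: 200, height: 30 } },
];
const circleMark: DetectedMark = {
  kind: "circle",
  box: { x: 90, y: 190, width: 220, height: 50 },
};

const selectedRegion = nearestText(circleMark, sampleRegions, 80);
console.log(selectedRegion ? selectedRegion.text : "no match"); // prints "Rice 5kg"
```

Center distance is a deliberately crude rule. An underline, for example, would more plausibly attach to the text directly above it, so a production system would likely use per-mark-type rules.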

Techniques Used:
Convolutional neural networks, Edge detection algorithms, Shape recognition, Confidence scoring, Bounding box regression
🤖

Intent Classifier (Claude)

Natural Language Understanding and Action Extraction

Accuracy: 95%

Analyzes extracted text and annotations to determine what action the user wants to perform. Classifies intents (email, shopping, appointment, etc.) and extracts relevant parameters with high confidence.
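
Because a language model returns free-form text, the classifier's output is usually validated before anything acts on it. The following is a minimal sketch of that validation step, assuming the model is prompted to reply with JSON; the schema (`intent`, `confidence`, `params`) and the `parseIntent` helper are assumptions made for illustration.

```typescript
// Hypothetical intent schema; the field names are illustrative only.
type IntentName = "email" | "shopping" | "appointment";

interface ClassifiedIntent {
  intent: IntentName;
  confidence: number; // expected in [0, 1]
  params: { [key: string]: string };
}

const KNOWN_INTENTS: IntentName[] = ["email", "shopping", "appointment"];

// Parse a model response and reject anything that does not match the schema.
function parseIntent(raw: string): ClassifiedIntent | undefined {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return undefined; // not valid JSON
  }
  if (typeof data !== "object" || data === null) {
    return undefined;
  }
  const record = data as { [key: string]: unknown };
  const confidence = record["confidence"];
  const params = record["params"];

  // Accept only intents from the known list.
  let matched: IntentName | undefined = undefined;
  for (const known of KNOWN_INTENTS) {
    if (record["intent"] === known) {
      matched = known;
    }
  }
  if (matched === undefined) {
    return undefined;
  }
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) {
    return undefined;
  }
  if (typeof params !== "object" || params === null) {
    return undefined;
  }

  // Keep only string-valued parameters.
  const cleanParams: { [key: string]: string } = {};
  const source = params as { [key: string]: unknown };
  for (const key of Object.keys(source)) {
    const value = source[key];
    if (typeof value === "string") {
      cleanParams[key] = value;
    }
  }
  return { intent: matched, confidence, params: cleanParams };
}

const parsedIntent = parseIntent(
  '{"intent":"shopping","confidence":0.93,"params":{"item":"Rice 5kg"}}'
);
console.log(parsedIntent);
```

Returning `undefined` for anything malformed gives the caller a single place to fall back to a clarification request instead of executing a guess.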

Techniques Used:
Large language models, Few-shot learning, Context-aware parsing, Entity extraction, Semantic analysis

Processing Pipeline

  1. Image Preprocessing: Enhance image quality, remove noise, correct skew and rotation. Techniques: Gaussian blur, adaptive thresholding, morphological operations.
  2. Vision Analysis: Extract text regions and identify visual elements. Techniques: OCR, layout detection, handwriting recognition.
  3. Annotation Detection: Find and classify hand-drawn marks. Techniques: shape detection, pattern matching, spatial analysis.
  4. Intent Extraction: Understand user intent and extract parameters. Techniques: NLP, entity recognition, context analysis.
  5. Confidence Scoring: Assess reliability of each component. Techniques: ensemble methods, uncertainty quantification, validation checks.
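
To make the chaining of the five steps above concrete, here is a minimal sketch in which each stage returns its output together with a confidence value. The stage functions are stand-ins with placeholder confidences, and taking the minimum across stages is just one possible aggregation rule, assumed here for illustration; the source does not specify how Faxi combines scores.

```typescript
// Hypothetical stage contract: each stage returns its output plus a confidence.
interface StageResult<T> {
  value: T;
  confidence: number; // in [0, 1]; the numbers below are placeholders
}

interface PipelineReport {
  intent: string;
  stageConfidences: number[];
  overallConfidence: number;
}

// Stand-ins for steps 2-4; real stages would call the models described above.
function visionStage(imageRef: string): StageResult<string> {
  return { value: "Order: Rice 5kg (from " + imageRef + ")", confidence: 0.92 };
}

function annotationStage(text: string): StageResult<string[]> {
  const marks = text.length > 0 ? ["circle"] : [];
  return { value: marks, confidence: 0.88 };
}

function intentStage(text: string, marks: string[]): StageResult<string> {
  const isOrder = text.indexOf("Order") === 0 && marks.length > 0;
  return { value: isOrder ? "shopping" : "unknown", confidence: 0.95 };
}

// Step 5: one simple aggregation rule is to take the weakest stage,
// since an error in any stage can invalidate the final result.
function runPipeline(imageRef: string): PipelineReport {
  const vision = visionStage(imageRef);
  const marks = annotationStage(vision.value);
  const intent = intentStage(vision.value, marks.value);
  const stageConfidences = [vision.confidence, marks.confidence, intent.confidence];
  let overall = 1;
  for (const c of stageConfidences) {
    overall = Math.min(overall, c);
  }
  return { intent: intent.value, stageConfidences, overallConfidence: overall };
}

const pipelineReport = runPipeline("fax-001.png");
console.log(pipelineReport.intent, pipelineReport.overallConfidence); // prints "shopping 0.88"
```

Multiplying the stage confidences is a common alternative when the stages are treated as independent; it yields a more pessimistic overall score than the minimum.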

Key Innovations

🎯 Context-Aware Processing

AI understands the relationship between text and annotations. A circle around a product name indicates selection, while an arrow points to important information.

🔄 Iterative Refinement

Models work together iteratively. Vision AI output informs annotation detection, which in turn helps intent classification achieve higher accuracy.

🌐 Multilingual Support

Specialized handling for Japanese characters (Kanji, Hiragana, Katakana) alongside English, with cultural context awareness for proper interpretation.
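
As a small illustration of script-aware handling, the sketch below classifies each character of an OCR line by its Unicode block (Hiragana U+3040 to U+309F, Katakana U+30A0 to U+30FF, CJK Unified Ideographs U+4E00 to U+9FFF). It is a simplified assumption of what a preprocessing check might look like: kanji in the CJK extension blocks and half-width katakana are not covered, and the function names are hypothetical.

```typescript
type ScriptName = "hiragana" | "katakana" | "kanji" | "latin" | "other";

// Classify a single character by the Unicode block of its first code unit.
function scriptOf(ch: string): ScriptName {
  const code = ch.charCodeAt(0);
  if (code >= 0x3040 && code <= 0x309f) return "hiragana";
  if (code >= 0x30a0 && code <= 0x30ff) return "katakana";
  if (code >= 0x4e00 && code <= 0x9fff) return "kanji";
  if ((code >= 0x41 && code <= 0x5a) || (code >= 0x61 && code <= 0x7a)) return "latin";
  return "other";
}

// Count how many characters of each script appear in a line of OCR output.
function scriptCounts(text: string): Record<ScriptName, number> {
  const counts = { hiragana: 0, katakana: 0, kanji: 0, latin: 0, other: 0 };
  for (let i = 0; i < text.length; i++) {
    counts[scriptOf(text.charAt(i))] += 1;
  }
  return counts;
}

// "I will order two coffees"
const sampleCounts = scriptCounts("コーヒーを2つ注文します");
console.log(sampleCounts);
// katakana: 4, hiragana: 5, kanji: 2, other: 1, latin: 0
```

A mix profile like this could be used, for example, to decide which recognition prompt or post-processing rules to apply to a line.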

📊 Confidence Calibration

Each prediction includes a calibrated confidence score. Low-confidence results trigger a clarification request instead of an automatic action, which reduces the risk of executing an incorrect action on the user's behalf.
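
The routing described above can be sketched as a threshold check. The two threshold values and the `decide` function are hypothetical; in practice they would be tuned on validation data. The `manual_review` branch reflects the human-in-the-loop fallback the page mentions for edge cases.

```typescript
// Hypothetical thresholds; real values would be tuned on validation data.
const EXECUTE_THRESHOLD = 0.85;
const REVIEW_THRESHOLD = 0.4;

type Decision =
  | { kind: "execute"; intent: string }
  | { kind: "clarify"; question: string }
  | { kind: "manual_review" };

// Route a classified intent based on its calibrated confidence.
function decide(intent: string, confidence: number): Decision {
  if (confidence >= EXECUTE_THRESHOLD) {
    return { kind: "execute", intent };
  }
  if (confidence >= REVIEW_THRESHOLD) {
    // Ask the user to confirm by return fax before doing anything.
    return {
      kind: "clarify",
      question: "Did you want to: " + intent + "? Please reply to confirm.",
    };
  }
  // Too uncertain even to ask a sensible question; hand off to a person.
  return { kind: "manual_review" };
}

console.log(decide("shopping", 0.95).kind); // prints "execute"
console.log(decide("shopping", 0.6).kind); // prints "clarify"
console.log(decide("shopping", 0.1).kind); // prints "manual_review"
```

Thresholds like these are only meaningful if the scores are calibrated, that is, if predictions made with confidence 0.85 are correct roughly 85% of the time.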

Overall Performance

  • Overall Accuracy: 90%+
  • Avg Processing Time: <5s
  • Intent Classification: 95%+
  • Supported Use Cases: 10+

For Technical Evaluators

Our AI pipeline leverages transfer learning from pre-trained foundation models, fine-tuned on domain-specific data including Japanese handwriting samples and fax artifacts. We employ ensemble methods to combine predictions, use active learning to continuously improve accuracy, and implement robust error handling with human-in-the-loop fallbacks for edge cases. The system is designed for production deployment with monitoring, A/B testing, and continuous model updates.

Technology Stack
Modern, scalable technologies powering the Faxi platform

Frontend

Next.js 14

React framework with App Router for server-side rendering and optimal performance

⚛️

React 18

Modern UI library with hooks and concurrent features for responsive interfaces

📘

TypeScript

Type-safe JavaScript for robust code and better developer experience

🎨

Tailwind CSS

Utility-first CSS framework for rapid UI development and consistent styling

📊

Recharts

Composable charting library for interactive data visualizations

Framer Motion

Animation library for smooth, performant UI transitions

Backend

🚂

Express.js

Fast, minimalist web framework for Node.js handling API requests

🟢

Node.js

JavaScript runtime for scalable server-side applications

🐘

PostgreSQL

Robust relational database for storing users, jobs, and metrics

🔴

Redis

In-memory data store for job queues and caching

☁️

AWS S3

Object storage for fax images and generated documents

AI & Machine Learning

🤖

Claude (Anthropic)

Advanced language model for intent extraction and natural language understanding

👁️

GPT-4 Vision

Multimodal AI for OCR, handwriting recognition, and visual analysis

🧠

Custom ML Models

Specialized models for annotation detection and form field recognition

Infrastructure

📠

Telnyx

Cloud communications platform for sending and receiving faxes

Vercel

Deployment platform for Next.js with edge network and automatic scaling

🐳

Docker

Containerization for consistent development and production environments

Why This Stack?

  • Performance: Server-side rendering and edge deployment for fast load times
  • Scalability: Horizontal scaling with containerization and cloud infrastructure
  • Reliability: Type safety, robust error handling, and comprehensive testing
  • Developer Experience: Modern tooling with hot reload and TypeScript support
  • AI-First: Integration with leading AI models for state-of-the-art accuracy
MCP Integration
Model Context Protocol servers extend Faxi's capabilities to interact with external services

What is MCP?

Model Context Protocol (MCP) is an open standard that enables AI systems to securely connect with external data sources and tools. Faxi uses MCP servers to extend functionality beyond basic fax processing, allowing users to interact with email, shopping, appointments, and more—all through their familiar fax machine.
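
MCP messages are JSON-RPC 2.0, and a client invokes a server-side tool with the `tools/call` method. The sketch below builds such a request as a plain object. The tool name `send_email` and its arguments are hypothetical examples, not tools that Faxi is known to expose, and a real client would normally use an MCP SDK rather than hand-building messages.

```typescript
// Shape of an MCP tool invocation (JSON-RPC 2.0, method "tools/call").
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: { [key: string]: string };
  };
}

let requestCounter = 0;

// Build a request with a fresh id so responses can be matched to calls.
function buildToolCall(
  toolName: string,
  args: { [key: string]: string }
): ToolCallRequest {
  requestCounter += 1;
  return {
    jsonrpc: "2.0",
    id: requestCounter,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// "send_email" is a hypothetical tool name used only for illustration.
const emailCall = buildToolCall("send_email", {
  to: "family@example.com",
  subject: "Hello from my fax machine",
});
console.log(JSON.stringify(emailCall, null, 2));
```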

Available MCP Servers

  • Email
  • Shopping
  • AI Chat
  • Payment
  • Appointments

Extensibility

The MCP architecture makes Faxi extensible without changes to the core system. Organizations can develop custom MCP servers to integrate with their own systems, such as healthcare records, inventory management, and CRM platforms. This allows Faxi to adapt to a wide range of use cases while keeping a simple fax interface for users.

🏥

Healthcare

Integrate with EHR systems for appointment booking and prescription refills

🏛️

Government

Connect to public services for permit applications and benefit enrollment

🏢

Enterprise

Build custom integrations for internal workflows and legacy systems
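
The extension idea can be illustrated with a tool registry: new capabilities are added by registering handlers, and the dispatch code stays unchanged. This is a simplified stand-in that does not use the MCP SDK or wire protocol; `createRegistry`, `book_appointment`, and the argument names are all hypothetical.

```typescript
// Simplified stand-in for a server's tool table; it does not use the MCP SDK.
type ToolArgs = { [key: string]: string };
type ToolHandler = (args: ToolArgs) => string;

interface ToolRegistry {
  register(toolName: string, handler: ToolHandler): void;
  call(toolName: string, args: ToolArgs): string;
  list(): string[];
}

function createRegistry(): ToolRegistry {
  const handlers: { [toolName: string]: ToolHandler } = {};
  return {
    register(toolName, handler) {
      handlers[toolName] = handler;
    },
    call(toolName, args) {
      // Own-property check so inherited keys like "toString" are not callable.
      if (!Object.prototype.hasOwnProperty.call(handlers, toolName)) {
        throw new Error("Unknown tool: " + toolName);
      }
      return handlers[toolName](args);
    },
    list() {
      return Object.keys(handlers).sort();
    },
  };
}

// A healthcare integration adds a tool without touching the dispatch code.
const clinicRegistry = createRegistry();
clinicRegistry.register("book_appointment", (args) => {
  return "Booked " + args["date"] + " for " + args["patient"];
});

console.log(
  clinicRegistry.call("book_appointment", { date: "2025-01-15", patient: "Tanaka" })
); // prints "Booked 2025-01-15 for Tanaka"
```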

Why MCP Matters

  • Standardized: Open protocol ensures compatibility and interoperability
  • Secure: Built-in authentication and authorization mechanisms
  • Scalable: Add new capabilities without modifying core system
  • Future-proof: Adapt to new services and technologies as they emerge