Introduction

Welcome to MatrixHub—an open-source, self-hosted AI model registry engineered for large-scale enterprise inference. It serves as a drop-in private replacement for Hugging Face, purpose-built to accelerate vLLM and SGLang workloads.

:::tip Live Demo Try MatrixHub instantly at demo.matrixhub.ai—no setup required.

Sign in with the public demo credentials: username admin, password changeme. The demo is for evaluation only and may be reset at any time. :::

Why MatrixHub?

MatrixHub streamlines the transition from public model hubs to production-grade infrastructure:

🚀 Zero-Wait Distribution

Eliminate bandwidth bottlenecks with a "Pull-once, serve-all" cache, enabling 10Gbps+ speeds across 100+ GPU nodes simultaneously.

🔐 Air-Gapped Delivery

Securely ferry models into isolated networks while maintaining a native HF_ENDPOINT experience for researchers—no internet required.

📦 Private AI Model Registry

Centralize fine-tuned weights with Tag locking and CI/CD integration to guarantee absolute consistency from development to production.

🌍 Global Multi-Region Sync

Automate asynchronous, resumable replication between data centers for high availability and low-latency local access.

Core Features

🚀 High-Performance Distribution

Transparent HF Proxy: Switch to private hosting with zero code changes by simply redirecting your endpoint.
On-Demand Caching: Automatically localizes public models upon the first request to slash redundant traffic.
Inference Native: Native support for P2P distribution, OCI artifacts, and NetLoader for direct-to-GPU weight streaming.

🛡️ Enterprise Governance & Security

RBAC & Multi-Tenancy: Project-based isolation with granular permissions and seamless LDAP/SSO integration.
Audit & Compliance: Full traceability with comprehensive logs for every upload, download, and configuration change.
Integrity Protection: Built-in malware scanning and content signing to ensure models remain untampered.

🌍 Scalable Infrastructure

Storage Agnostic: Compatible with local file systems, NFS, and S3-compatible backends (MinIO, AWS, etc.).
Reliable Replication: Policy-driven, chunked transfers ensure data consistency even over unstable global networks.
Cloud-Native Design: Optimized for Kubernetes with official Helm charts and horizontal scaling capabilities.

Key Use Cases

1. Intranet Inference Acceleration

Accelerate model distribution across internal GPU clusters with intelligent caching that turns multiple downloads into a single fetch.

2. Air-Gapped Environments

Deploy models in isolated networks (government, defense, finance) with secure transport and full data residency guarantees.

3. Enterprise Asset Management

Manage enterprise model versions with CI/CD integration, ensuring training → testing → production consistency.

4. Multi-Region Sync

Replicate models across global data centers with automatic resumption on network interruptions.

Getting Started

MatrixHub is easy to deploy using Docker Compose or Kubernetes. The entire infrastructure is open source and free for the community.

👉 Ready to get started? Head over to the Blog to read the DeepSeek v4 walkthrough and usage examples.

Why MatrixHub?​

🚀 Zero-Wait Distribution​

🔐 Air-Gapped Delivery​

📦 Private AI Model Registry​

🌍 Global Multi-Region Sync​

Core Features​

🚀 High-Performance Distribution​

🛡️ Enterprise Governance & Security​

🌍 Scalable Infrastructure​

Key Use Cases​

1. Intranet Inference Acceleration​

2. Air-Gapped Environments​

3. Enterprise Asset Management​

4. Multi-Region Sync​

Getting Started​