Examples

· 2 min read

Real-world examples of using MatrixHub.

Common use cases

Intranet vLLM cluster distribution

  • Scenario: A production intranet runs a vLLM inference cluster with 100 GPU servers. Model files can be huge (a 70B model can exceed 130 GB), so having every machine pull directly from the public Hugging Face Hub is slow and may trigger outbound bandwidth throttling.
  • Flow overview:
    1. Single access point: set the HF_ENDPOINT environment variable on all vLLM nodes to the internal MatrixHub endpoint.
    2. Pull once, cache for all: when the first node requests a model, MatrixHub pulls it from the public network and persists it locally; subsequent nodes hit the intranet cache directly.

As a user, I want to point the Hugging Face download endpoint at MatrixHub so that, once the first request has cached a model, later downloads inside the same network are much faster.
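On a single node, this redirection is just an environment variable. A minimal sketch, assuming the MatrixHub address from the steps below; the exact endpoint path and the model ID are illustrative and depend on your deployment:

```shell
# Point Hugging Face downloads at the intranet MatrixHub endpoint.
# The address and model ID are examples; substitute values from your
# own MatrixHub deployment (the endpoint path may differ).
export HF_ENDPOINT=http://x.x.x.x:3001

# Any tool that honors HF_ENDPOINT (huggingface-cli, huggingface_hub,
# vLLM's built-in downloader) now resolves through MatrixHub.
huggingface-cli download Qwen/Qwen2.5-7B-Instruct --local-dir ./models/qwen2.5-7b
```

Because HF_ENDPOINT is read by the standard Hugging Face tooling, no application code on the nodes has to change.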

Steps

  1. Visit the MatrixHub address http://x.x.x.x:3001 and open the login page.

  2. Log in as the admin user and open the model repository list.

  3. Click the top-right user menu, then go to Platform Settings and Repository Management.

  4. Create a target repository: select Hugging Face as the provider, set the repository name to hf, enter the target URL https://hf-mirror.com, enable remote certificate verification, and click OK.

  5. Go to Project Management and open the project list page.

  6. Click Create Project: set the project name to qwen, set it to Public, enable Proxy, select the hf repository, set the proxy organization to Qwen, and click OK.

  7. Pull the model.

     • First node: about 3m37.318s
     • Second node: about 0m8.500s

  8. View the model information in MatrixHub.
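The pull-and-serve portion of the steps above can be sketched on a vLLM node as follows. This is a sketch under stated assumptions: the endpoint address comes from the steps above, while the model ID, local directory, and served model name are placeholders to adapt to your cluster:

```shell
# Route downloads through the intranet MatrixHub instance.
export HF_ENDPOINT=http://x.x.x.x:3001

# Time the pull to compare a cold download (first node, fetched from
# the upstream mirror) with a warm one (later nodes, served from the
# intranet cache).
time huggingface-cli download Qwen/Qwen2.5-7B-Instruct --local-dir /models/qwen

# Serve the locally cached weights with vLLM.
vllm serve /models/qwen --served-model-name qwen2.5-7b
```

Running the same `time huggingface-cli download` on a second node is how the timings above (about 3m37s cold versus about 8.5s cached) were compared.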