Examples

· 2 min read

Real-world examples of using MatrixHub.

Common use cases

Intranet vLLM cluster distribution

  • Scenario: A production intranet runs a vLLM inference cluster with 100 GPU servers. Model files can be huge (a 70B model can exceed 130 GB), so having every machine pull directly from the public Hugging Face Hub is slow and may trigger outbound bandwidth throttling.
  • Flow overview:
    1. Single access point: set the HF_ENDPOINT environment variable on all vLLM nodes to the internal MatrixHub endpoint.
    2. Pull once, cache for all: when the first node requests a model, MatrixHub pulls it from the public network and persists it locally; subsequent nodes hit the intranet cache directly.

As a user, I want to point the Hugging Face download endpoint at MatrixHub so that, once the first request has cached a model, later downloads inside the same network are much faster.
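On a single node, this redirection is just an environment variable. A minimal sketch, assuming the MatrixHub address from the steps below; the exact endpoint path and the model ID are illustrative and depend on your deployment:

```shell
# Point Hugging Face downloads at the intranet MatrixHub endpoint.
# The address and model ID are examples; substitute values from your
# own MatrixHub deployment (the endpoint path may differ).
export HF_ENDPOINT=http://x.x.x.x:3001

# Any tool that honors HF_ENDPOINT (huggingface-cli, huggingface_hub,
# vLLM's built-in downloader) now resolves through MatrixHub.
huggingface-cli download Qwen/Qwen2.5-7B-Instruct --local-dir ./models/qwen2.5-7b
```

Because HF_ENDPOINT is read by the standard Hugging Face tooling, no application code on the nodes has to change.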

Steps

  1. Visit the MatrixHub address http://x.x.x.x:3001 and open the login page.

  2. Log in as the admin user and open the model repository list.

  3. Click the top-right user menu, then go to Platform Settings and Repository Management.

  4. Create a target repository: select Hugging Face as the provider, set the repository name to hf, enter the target URL https://hf-mirror.com, enable remote certificate verification, and click OK.

  5. Go to Project Management and open the project list page.

  6. Click Create Project: set the project name to qwen, set it to Public, enable Proxy, select the hf repository, set the proxy organization to Qwen, and click OK.

  7. Pull the model.

     • First node: about 3m37.318s
     • Second node: about 0m8.500s

  8. View the model information in MatrixHub.
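The pull-and-serve portion of the steps above can be sketched on a vLLM node as follows. This is a sketch under stated assumptions: the endpoint address comes from the steps above, while the model ID, local directory, and served model name are placeholders to adapt to your cluster:

```shell
# Route downloads through the intranet MatrixHub instance.
export HF_ENDPOINT=http://x.x.x.x:3001

# Time the pull to compare a cold download (first node, fetched from
# the upstream mirror) with a warm one (later nodes, served from the
# intranet cache).
time huggingface-cli download Qwen/Qwen2.5-7B-Instruct --local-dir /models/qwen

# Serve the locally cached weights with vLLM.
vllm serve /models/qwen --served-model-name qwen2.5-7b
```

Running the same `time huggingface-cli download` on a second node is how the timings above (about 3m37s cold versus about 8.5s cached) were compared.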