Run any HuggingFace model locally. 500K+ models at your fingertips.
HFL (HuggingFace Local) is a CLI + API server that lets you run HuggingFace models locally. While Ollama offers ~500 curated models, HFL gives you access to the 500,000+ models on the HuggingFace Hub.
```bash
# Install
pip install hfl

# Pull a model
hfl pull microsoft/Phi-3-mini-4k-instruct-gguf

# Chat interactively
hfl run microsoft/Phi-3-mini-4k-instruct-gguf

# Start the API server (OpenAI- and Ollama-compatible)
hfl serve --model microsoft/Phi-3-mini-4k-instruct-gguf
```
| Feature | HFL | Ollama |
|---|---|---|
| Available models | 500,000+ | ~500 |
| Source | HuggingFace Hub | Ollama Library |
| OpenAI API compatible | ✓ | ✓ |
| Ollama API compatible | ✓ | ✓ |
| TTS support | ✓ | ✗ |
| Multiple backends | llama.cpp, transformers, vLLM | llama.cpp only |
| License verification | Automatic (5 levels) | ✗ |
| EU AI Act compliance | Built-in | ✗ |
| GGUF auto-conversion | ✓ | ✗ |
| i18n (EN/ES) | ✓ | ✗ |
HFL exposes both OpenAI- and Ollama-compatible APIs, so it works as a drop-in replacement with existing tools:
```
# OpenAI-compatible
POST /v1/chat/completions
POST /v1/completions
GET  /v1/models
POST /v1/audio/speech

# Ollama-compatible
POST /api/generate
POST /api/chat
GET  /api/tags
POST /api/tts
```
Works with: Open WebUI, Chatbox, Continue.dev, and any OpenAI/Ollama-compatible client.
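Because the server speaks the standard OpenAI chat-completions schema, any HTTP client can talk to it. A minimal stdlib sketch of building such a request (the `localhost:8000` base URL is an assumption; match it to however you invoked `hfl serve`):

```python
import json
import urllib.request

# Assumed local endpoint -- adjust host/port to your `hfl serve` settings.
BASE_URL = "http://localhost:8000"

# Standard OpenAI chat-completions payload; HFL accepts the same schema.
payload = {
    "model": "microsoft/Phi-3-mini-4k-instruct-gguf",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
}

req = urllib.request.Request(
    f"{BASE_URL}/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```

The same payload works unchanged against any other OpenAI-compatible backend, which is what makes HFL a drop-in replacement.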
| Command | Description |
|---|---|
| `hfl pull` | Download a model from the HuggingFace Hub |
| `hfl run` | Interactive chat with a model |
| `hfl serve` | Start the API server |
| `hfl list` | List local models |
| `hfl search` | Search the HuggingFace Hub |
| `hfl inspect` | Show model details |
| `hfl rm` | Remove a model |
| `hfl alias` | Create model aliases |
| `hfl login` / `hfl logout` | Manage HF authentication |
| `hfl version` | Show version info |
| `hfl compliance-report` | Generate a legal compliance report |
**Backends.** llama.cpp for GGUF (CPU/GPU), transformers for safetensors, and vLLM for production GPU serving with real async streaming.
**Smart routing.** Multi-backend with sticky routing; if one engine fails, HFL automatically retries with the next.
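The failover behavior can be pictured roughly like this. This is an illustrative sketch only, not HFL's internal API; the engines are modeled as plain callables and `EngineError` is a hypothetical name:

```python
# Illustrative failover loop: try each backend in order, fall through on error.
class EngineError(Exception):
    pass

def generate_with_fallback(engines, prompt):
    last_err = None
    for engine in engines:  # engines are ordered by preference
        try:
            return engine(prompt)
        except EngineError as err:
            last_err = err  # remember the failure, try the next engine
    raise last_err or EngineError("no engines configured")

def broken(prompt):
    raise EngineError("llama.cpp backend unavailable")

def working(prompt):
    return f"echo: {prompt}"

print(generate_with_fallback([broken, working], "hi"))  # → echo: hi
```

"Sticky" routing would additionally remember which engine succeeded for a given model and try it first on subsequent requests.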
**Memory management.** LRU eviction with real-time RAM/GPU memory tracking and non-recursive concurrent loading.
**Compliance.** 5-level license classification, EU AI Act notices, provenance logging, and AI disclaimers.

**Production-ready.** Rate limiting, API key auth, health probes, Prometheus metrics, SLO monitoring, structured logging.

**Text-to-speech.** Bark and Coqui XTTS-v2 engines behind OpenAI-compatible endpoints.
Python 3.10+ • FastAPI • Typer • Rich • Pydantic
llama-cpp-python • transformers • vLLM • Bark • Coqui TTS
1900 tests • 90%+ coverage • mypy • ruff • CI/CD