A CLI tool for managing HuggingFace model deployments across multiple cloud providers.
HF-Cloud can be used in two equivalent ways:

- Standalone CLI install: commands are prefixed with `hf-cloud ...`
- Hugging Face Hub CLI extension install: commands are prefixed with `hf cloud ...`

Preferred method: install and run it as an HF CLI extension (`hf cloud ...`). More details about HF CLI extensions here.

In the examples below, replace `<cli>` with either `hf-cloud` or `hf cloud`.
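Because the two install modes differ only in the command prefix, scripts can abstract the prefix away. A minimal sketch in Python (the `cli` helper below is hypothetical, not part of hf-cloud):

```python
import subprocess

def cli(*args: str, use_extension: bool = True) -> list:
    """Build an hf-cloud argv with either command prefix."""
    prefix = ["hf", "cloud"] if use_extension else ["hf-cloud"]
    return [*prefix, *args]

cmd = cli("sagemaker", "status", "my-gpt2-endpoint")
# cmd == ["hf", "cloud", "sagemaker", "status", "my-gpt2-endpoint"]
# subprocess.run(cmd, check=True)  # uncomment once hf-cloud is installed
```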
This project is packaged as a Python `hf` CLI extension. It can be discovered and installed with the following commands:

```bash
# Search for available extensions
hf extensions search

# Install this extension from GitHub
hf extensions install ehcalabres/hf-cloud

# Run extension commands
hf cloud --help
hf cloud sagemaker ls
```

Alternatively, install the standalone CLI from source:

```bash
# Clone the repository
git clone https://github.com/ehcalabres/hf-cloud.git
cd hf-cloud

# Install base CLI
pip install -e .

# Install with SageMaker support
pip install -e ".[sagemaker]"

# Install with Vertex AI support
pip install -e ".[vertex]"

# Install with all providers
pip install -e ".[all]"

# Install with development dependencies
pip install -e ".[dev]"
```

HF-Cloud can scaffold a reusable SKILL.md for coding agents.
```bash
# Preview generated skill content
<cli> skills preview

# Install skill in the project-level central directory (.agents/skills/hf-cloud)
<cli> skills add

# Install to a custom destination directory
<cli> skills add --dest ./my-skills

# Link into specific assistant directories
<cli> skills add --codex
<cli> skills add --claude --cursor

# Overwrite existing skill files/symlinks
<cli> skills add --force
```

You can configure default settings for your cloud provider. This step is optional: you can also pass these settings as command-line arguments during deployment, or rely on the default configuration from your environment.
```bash
<cli> providers configure sagemaker
```

Deploy a model:

```bash
<cli> sagemaker deploy gpt2 \
  --name my-gpt2-endpoint \
  --instance-type ml.g5.xlarge \
  --region us-east-1
```

Check the deployment status:

```bash
<cli> sagemaker status my-gpt2-endpoint
```

Test inference:

```bash
<cli> sagemaker invoke my-gpt2-endpoint \
  --input "Once upon a time"
```

List deployments across all providers:

```bash
<cli> ls
```

Estimate the minimum viable instance for a model:

```bash
<cli> sagemaker estimate meta-llama/Llama-2-7b-hf
```
The same estimate is available for Vertex AI:

```bash
<cli> vertex estimate meta-llama/Llama-2-7b-hf
```

Delete a deployment:

```bash
<cli> sagemaker delete my-gpt2-endpoint
```

| Command | Description |
|---|---|
| `<cli> [PROVIDER] deploy <model>` | Deploy a model to the specified provider |
| `<cli> [PROVIDER] ls` | List deployments for the specified provider |
| `<cli> [PROVIDER] describe <id>` | Show deployment details |
| `<cli> [PROVIDER] status <id>` | Check deployment status |
| `<cli> [PROVIDER] logs <id>` | View deployment logs |
| `<cli> [PROVIDER] invoke <id>` | Test inference |
| `<cli> [PROVIDER] estimate <model>` | Estimate minimum viable instance for a model |
| `<cli> [PROVIDER] delete <id>` | Delete deployment |
| `<cli> ls` | List all deployments (all providers) |
| `<cli> providers ls` | List available providers |
| `<cli> providers configure <provider>` | Configure provider credentials |
| `<cli> skills preview` | Preview generated SKILL.md for AI assistants |
| `<cli> skills add` | Install SKILL.md and optionally symlink to assistant folders |
Provider support status:

- AWS SageMaker - Fully implemented
- Google Cloud Vertex AI - Fully implemented
- Azure ML - Work in progress
```bash
# Configure (optional)
<cli> providers configure sagemaker

# Deploy
<cli> sagemaker deploy gpt2 \
  --name my-gpt2-endpoint \
  --instance-type ml.g5.xlarge \
  --region us-east-1

# Check status
<cli> sagemaker status my-gpt2-endpoint

# Invoke
<cli> sagemaker invoke my-gpt2-endpoint --input "Hello world"

# Estimate minimum viable instance
<cli> sagemaker estimate meta-llama/Llama-2-7b-hf

# Show all compatible instances
<cli> sagemaker estimate meta-llama/Llama-2-7b-hf --all

# Delete
<cli> sagemaker delete my-gpt2-endpoint
```
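The idea behind the `estimate` commands can be sketched in Python. Note this is an illustrative assumption: both the memory formula and the instance table below are simplified stand-ins, not hf-cloud's actual algorithm or a complete instance catalog.

```python
from typing import Optional

GIB = 1024 ** 3

def estimated_memory_gib(n_params: int, bytes_per_param: int = 2,
                         overhead: float = 1.2) -> float:
    """fp16 weights plus ~20% headroom for activations and KV cache."""
    return n_params * bytes_per_param * overhead / GIB

# Simplified GPU-memory table (GiB) for a few SageMaker instance types.
INSTANCE_GPU_GIB = {
    "ml.g5.xlarge": 24,      # 1x NVIDIA A10G
    "ml.g5.12xlarge": 96,    # 4x NVIDIA A10G
    "ml.p4d.24xlarge": 320,  # 8x NVIDIA A100 40GB
}

def minimum_viable(n_params: int) -> Optional[str]:
    """Smallest listed instance whose GPU memory fits the estimate."""
    need = estimated_memory_gib(n_params)
    for name, mem in sorted(INSTANCE_GPU_GIB.items(), key=lambda kv: kv[1]):
        if mem >= need:
            return name
    return None

print(minimum_viable(7_000_000_000))  # a 7B model fits on ml.g5.xlarge (24 GiB)
```

Under these assumptions a 7B model needs roughly 15.6 GiB, so the smallest single-GPU instance suffices, while a 70B model would only fit the multi-GPU option.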
```bash
# Configure (optional)
<cli> providers configure vertex

# Deploy
<cli> vertex deploy gpt2 \
  --name my-gpt2-endpoint \
  --machine-type n1-standard-4 \
  --location us-central1

# Deploy with GPU
<cli> vertex deploy meta-llama/Llama-2-7b-hf \
  --name my-llama-endpoint \
  --machine-type n1-standard-8 \
  --accelerator-type NVIDIA_TESLA_T4 \
  --accelerator-count 1

# Check status
<cli> vertex status my-gpt2-endpoint

# Invoke
<cli> vertex invoke my-gpt2-endpoint --input "Hello world"

# Estimate minimum viable machine/GPU configuration
<cli> vertex estimate meta-llama/Llama-2-7b-hf

# JSON output for scripting
<cli> vertex estimate meta-llama/Llama-2-7b-hf --json

# Delete
<cli> vertex delete my-gpt2-endpoint
```

- Python 3.9+
You will also need authentication set up for each cloud provider you use, so that hf-cloud can access your account. Current provider-specific requirements:
- AWS SageMaker: AWS credentials configured via AWS CLI or environment variables.
- Azure ML: Azure CLI logged in or service principal set up.
- Google Cloud Vertex AI: Google Cloud SDK authenticated or service account key set up.
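One quick way to see which provider credentials your environment already exposes is to check the standard environment variables each vendor SDK reads. The provider-to-variable mapping below is an illustrative assumption, not an hf-cloud API:

```python
import os

# Standard vendor SDK credential variables (provider labels are illustrative).
PROVIDER_ENV = {
    "sagemaker": ("AWS_ACCESS_KEY_ID", "AWS_PROFILE"),
    "vertex": ("GOOGLE_APPLICATION_CREDENTIALS",),
    "azure": ("AZURE_CLIENT_ID",),
}

def configured_providers(env=None):
    """Return providers for which at least one credential variable is set."""
    env = os.environ if env is None else env
    return [p for p, keys in PROVIDER_ENV.items() if any(k in env for k in keys)]

print(configured_providers())  # e.g. ['sagemaker'] if AWS credentials are exported
```

Note that credentials can also come from files (`~/.aws/credentials`, `gcloud` application-default login), so an empty result here does not necessarily mean authentication is missing.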
```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
ruff check src/
```