Hugging Face Models

A Hugging Face model is any model that can be loaded via the Hugging Face Transformers Python library.

note

For Hugging Face models, "singular weights" means a single weights file (named either pytorch_model.bin or model.safetensors), while "sharded weights" means multiple weights files. If you are uploading sharded weights (typically only done for very large models such as LLMs), be sure they follow the standardized format:

  • .bin shards must be named like pytorch_model-{number}-of-{total}.bin, accompanied by a pytorch_model.bin.index.json file.
  • .safetensors shards must be named like model-{number}-of-{total}.safetensors, accompanied by a model.safetensors.index.json file.

Virtually all models on the Hugging Face Hub already follow this format.

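Before compressing and uploading a model folder, you can sanity-check that its weight files follow the layout described above. The sketch below is illustrative only (the check_shard_layout helper is not part of Chariot or huggingface_hub); it matches shard names with a regular expression and confirms the corresponding index file is present.

# A minimal, hypothetical layout check -- not a Chariot or Hugging Face API
import re

# Standard shard-name patterns and the index file each one requires
SHARD_FORMATS = [
    (re.compile(r"pytorch_model-\d+-of-\d+\.bin$"), "pytorch_model.bin.index.json"),
    (re.compile(r"model-\d+-of-\d+\.safetensors$"), "model.safetensors.index.json"),
]

def check_shard_layout(filenames):
    """Return True if the weight files follow the standardized format."""
    for pattern, index_name in SHARD_FORMATS:
        shards = [f for f in filenames if pattern.match(f)]
        if shards:
            # Every set of shards must be accompanied by its index file
            return index_name in filenames
    # No shards found: singular weights are also acceptable
    return "pytorch_model.bin" in filenames or "model.safetensors" in filenames

files = [
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
    "model.safetensors.index.json",
    "config.json",
]
print(check_shard_layout(files))  # True

Running this against a downloaded model directory (e.g. via os.listdir) catches a missing index file before you spend time uploading a tarball that Chariot cannot serve.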
Importing From Hugging Face

If your instance of Chariot has internet access, you can use the huggingface-hub Python client library to download PyTorch-based models directly from Hugging Face and then upload them into Chariot. The following example shows how:

# Install the huggingface-hub package if you haven't already
! pip install huggingface-hub

# Download a model from Hugging Face
from huggingface_hub import snapshot_download

# Replace 'microsoft/DialoGPT-medium' with the model you want to download
model_name = "microsoft/DialoGPT-medium"
snapshot_download(repo_id=model_name, local_dir=f"./{model_name}")

# Compress the downloaded folder into a gzipped tarball
import tarfile

with tarfile.open(f"{model_name}.tar.gz", "w:gz") as tar:
    tar.add(model_name)

# Upload the tarball to Chariot
from chariot.client import connect
from chariot.models import import_model, ArtifactType, TaskType

connect()

model = import_model(
    name=model_name,
    project_name="My Project",
    version="1.0.0",
    summary="Model imported from Hugging Face",
    task_type=TaskType.CONVERSATIONAL,
    artifact_type=ArtifactType.HUGGINGFACE,
    model_path=f"./{model_name}.tar.gz",
)

Large Language Models (LLMs)

Hugging Face hosts many large language models that require special considerations for uploading and serving. For detailed guidance on working with LLMs:

  • Uploading LLMs: Requirements, optimization tips, and step-by-step instructions for uploading LLMs to Chariot
  • Inference Servers: Serving LLMs with vLLM and Hugging Face Pipelines, including quantization and inference parameters