Uploading LLMs
Most LLMs that you can find on the Hugging Face Hub under the "Text Generation" task can be uploaded to Chariot. Currently, only models up to about 13B parameters will fit on a single consumer-grade GPU, and even that requires quantization (see Inference Servers).
LLM Task Types
Chariot has two task types associated with LLMs:
- Text Generation: Tasks consist of completing sentences or prompts provided by a user. For example, given a prompt of "The Cat in the", the model may predict something like "Hat."
- Conversational: These tasks are chat models where the interface is a conversation between a user and an AI assistant.
Every Text Generation model on the Hugging Face Hub can be uploaded as a Text Generation model in Chariot. If your model is an "instruct model"—also known as a "chat model"—it can also be uploaded as a Conversational model in Chariot. Before uploading a model as Conversational, verify that it really is an instruct model; otherwise, you will get an error at inference time.
For example, Llama 3 8B is a Text Generation model, and Llama 3 8B Instruct is the instruction-tuned variant. The former should be uploaded as a Text Generation task and the latter uploaded as a Conversational task.
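One way to check before uploading is to look for a chat template in the downloaded repo's tokenizer_config.json: instruct/chat models generally ship one, while base models do not. A minimal sketch of that heuristic (the helper name is illustrative, not part of Chariot, and some repos store the template elsewhere, so treat a negative result as a prompt to check the model card):

```python
import json
from pathlib import Path


def is_chat_model(model_dir: str) -> bool:
    """Heuristic: instruct/chat models usually declare a chat_template
    in tokenizer_config.json; base models usually do not."""
    cfg = Path(model_dir) / "tokenizer_config.json"
    if not cfg.is_file():
        return False
    return "chat_template" in json.loads(cfg.read_text())
```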
Upload Process
Uploading an LLM is identical to uploading any Hugging Face NLP model: download the model repo (which should contain the weights and other auxiliary files), then, from inside that folder, convert its contents into a tar.gz file:

```shell
tar -zcvf ../{tar file name}.tgz * && cd ..
```
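If you prefer to script the packaging step, the same archive can be built with Python's standard-library tarfile module (a sketch; the function name and paths are illustrative):

```python
import tarfile
from pathlib import Path


def package_model(model_dir: str, out_path: str) -> None:
    """Tar the repo contents so files sit at the archive root,
    matching `tar -zcvf ../{tar file name}.tgz *` run from inside the folder."""
    with tarfile.open(out_path, "w:gz") as tar:
        for f in sorted(Path(model_dir).iterdir()):
            tar.add(f, arcname=f.name)
```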
You can then upload that tar.gz file via the Chariot UI or SDK.
You can often delete redundant weights from the downloaded repo, because Hugging Face Hub repos frequently publish the same weights in multiple formats. Prefer the .safetensors weights: unlike pickle-based .bin files, they cannot execute arbitrary code when loaded. To save space, delete all weights that aren't safetensors, but be sure to keep the model.safetensors.index.json file.
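That cleanup can be scripted; as a sketch, the suffixes below cover the common duplicate weight formats (PyTorch .bin/.pth, TensorFlow .h5, Flax .msgpack), but check your repo's actual contents before deleting anything:

```python
from pathlib import Path

# Non-safetensors weight formats commonly duplicated in Hub repos (assumption:
# your repo follows the usual Hugging Face naming conventions).
REDUNDANT_SUFFIXES = {".bin", ".pth", ".h5", ".msgpack"}


def prune_redundant_weights(model_dir: str) -> list[str]:
    """Delete duplicate weight files, keeping .safetensors shards,
    model.safetensors.index.json, and all config/tokenizer files."""
    removed = []
    for f in Path(model_dir).iterdir():
        if f.suffix in REDUNDANT_SUFFIXES:
            f.unlink()
            removed.append(f.name)
    return sorted(removed)
```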
Uploading via the SDK can take a long time for large models like LLMs. To monitor upload progress, enable debug logging at the start of your session.
SDK Upload Example
The following example shows how to enable debug logging when you begin your session:

```python
import logging

logging.basicConfig(level=logging.DEBUG)

from chariot.client import connect
from chariot.models import import_model, ArtifactType, TaskType

connect()

model = import_model(
    name="meta-llama/Meta-Llama-3-8B-Instruct",
    project_name="Project Name",
    version="1.0.0",
    summary="Meta-Llama-3-8B-Instruct model from huggingface hub",
    task_type=TaskType.CONVERSATIONAL,
    artifact_type=ArtifactType.HUGGINGFACE,
    model_path="./llama-3-8b-instruct.tar.gz",
)
```