How to convert HF (safetensors) 🤗 model to gguf
A quick guide with code
Do you want to convert a Hugging Face model to GGUF format?
I was struggling with the same problem a few days ago. I had fine-tuned a Llama 7B model, and it was saved in the safetensors format. I wanted a GGUF model, so I searched a lot and found a solution.
What is the safetensors format?
Safetensors is a format designed for safe tensor storage: unlike pickle, loading a safetensors file cannot execute arbitrary code, and it remains very fast thanks to zero-copy, memory-mapped reads.
To delve deeper into this topic, please visit this page.
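To make this concrete, here is a minimal sketch (not from the original article) of writing and reading tensors with the safetensors library; the tensor names and shapes are invented for illustration.
# Minimal sketch: saving and loading tensors with the safetensors library
# (assumes: pip install safetensors torch; tensor names are made up)
import torch
from safetensors.torch import save_file, load_file

tensors = {
    "embedding.weight": torch.zeros((1024, 768)),
    "lm_head.weight": torch.zeros((768, 1024)),
}
save_file(tensors, "model.safetensors")

# Loading is fast because tensors are memory-mapped rather than unpickled
loaded = load_file("model.safetensors")
print(loaded["embedding.weight"].shape)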
What is a GGUF format?
If you are exploring the LLM field, you are probably aware of how llama.cpp helps with faster inference.
On August 21, 2023, GGUF (GPT-Generated Unified Format) was unveiled as the successor to GGML (GPT-Generated Model Language). This release marks a notable advancement in language model file formats, offering improved capabilities for the storage and processing of expansive language models like GPT.
Crafted by contributors within the AI community, led by Georgi Gerganov, the originator of GGML, GGUF stands out as a significant development tailored to the requirements of large-scale AI models. Despite its roots in GGML, GGUF appears to be an independent initiative, gaining prominence through its application in scenarios involving Facebook's (Meta's) LLaMA (Large Language Model Meta AI) models, reinforcing its significance in the evolving landscape of AI.
To learn more about GGUF and GGML, please visit this article.
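If you later want to peek inside a gguf file, the llama.cpp repository also ships a small gguf Python package that can read its metadata. A hedged sketch (the file name is a placeholder, and attribute names may differ slightly between versions of the package):
# Hedged sketch: inspecting a gguf file's metadata with the gguf package
# (assumes: pip install gguf; "finetuned-2.gguf" is a placeholder file name)
from gguf import GGUFReader

reader = GGUFReader("finetuned-2.gguf")

# GGUF stores typed key/value metadata fields alongside the tensor data
for key in reader.fields:
    print(key)

# Each tensor entry records its name, shape and quantization type
for tensor in reader.tensors:
    print(tensor.name, tensor.shape, tensor.tensor_type)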
How to convert models to GGUF format?
I wanted to convert safetensors to GGUF, and I searched a lot. My aim was to use the model through llama.cpp, whose README says: "The main goal of llama.cpp is to run the LLaMA model using 4-bit integer quantization on a MacBook."
But we can also use llama.cpp to run models through frameworks like LangChain.
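For instance, once you have a gguf file, you can load it through LangChain's LlamaCpp wrapper. This is a minimal sketch, not part of the original workflow; it assumes llama-cpp-python and langchain-community are installed, and the model path is a placeholder.
# Minimal sketch: running a local gguf model through LangChain's LlamaCpp wrapper
# (assumes: pip install llama-cpp-python langchain-community)
from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path="finetuned-2.gguf",  # placeholder path to your gguf file
    n_ctx=2048,                     # context window size
    temperature=0.7,
)
print(llm.invoke("Explain GGUF in one sentence."))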
Let's start with the code part.
I assume you are using Colab, but if you are running this locally it's not much different. I will add comments where you can make changes for a local setup.
# Make sure you have git-lfs installed (https://git-lfs.com)
!git lfs install
# Clone your model from Huggingface
!git clone https://huggingface.co/finetunedmodelpath
# Clone llama.cpp's repository. They provide code to convert models into gguf.
!git clone https://github.com/ggerganov/llama.cpp.git
After cloning the model and the llama.cpp repository, install llama.cpp's requirements.
# if Colab
!pip install -r /content/llama.cpp/requirements.txt

# if local, cd into the cloned repo and run the following line
# (you can create a venv as well)
pip install -r requirements.txt
After installing all the requirements, run the following code.
# for Colab
# first argument: path to convert.py; second argument: path of the merged model
!python /content/llama.cpp/convert.py /content/finetuned-2_merged \
  --outfile finetuned-2.gguf \
  --outtype q8_0
# --outfile: the gguf file name you want to assign
# --outtype q8_0: quantize to 8-bit
Quantizing speeds up inference and shrinks the file size, but it reduces the precision of the model. It helps if you have limited compute.
This will take a while; for me it took around 15-20 minutes.
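If you would rather keep higher precision at conversion time and quantize separately afterwards, convert.py also accepted f16 and f32 output types. A hedged sketch (file names are placeholders; on newer llama.cpp checkouts the script is convert_hf_to_gguf.py and the binary is called llama-quantize):
# Hedged alternative: convert at 16-bit first, then quantize with llama.cpp's quantize tool
!python /content/llama.cpp/convert.py /content/finetuned-2_merged \
  --outfile finetuned-2-f16.gguf \
  --outtype f16

# Build llama.cpp, then quantize the f16 file down to 4-bit
!cd /content/llama.cpp && make
!/content/llama.cpp/quantize /content/finetuned-2-f16.gguf /content/finetuned-2-q4_k_m.gguf Q4_K_M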
After the gguf model is created, push it to the Hugging Face Hub.
# Pass your hf-token as environment variable
!export HUGGING_FACE_HUB_TOKEN=<paste-your-own-token>
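Note that in a notebook every ! command runs in its own subshell, so !export will not persist the variable for later cells. Setting it from Python is more reliable; a minimal sketch (paste your own token):
# Set the token from Python so later cells and huggingface_hub can see it
import os
os.environ["HUGGING_FACE_HUB_TOKEN"] = "<paste-your-own-token>"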
You can get the access token from here. Make sure to use a token with write permission, because we are pushing the model to a repo.
# This will push the model to HF repository
from huggingface_hub import HfApi
api = HfApi()
model_id = "your hf repo name"
api.create_repo(model_id, exist_ok=True, repo_type="model")
api.upload_file(
    path_or_fileobj="finetuned-2.gguf",
    path_in_repo="finetuned-2.gguf",
    repo_id=model_id,
)
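To sanity-check the upload, you can pull the gguf file back down from the Hub and run it with llama-cpp-python. A hedged sketch (not part of the original steps; the repo and file names are the placeholders used above):
# Hedged sketch: download the gguf from the Hub and run a quick completion
# (assumes: pip install llama-cpp-python huggingface_hub)
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(repo_id="your hf repo name", filename="finetuned-2.gguf")
llm = Llama(model_path=model_path)
out = llm("Q: What is GGUF? A:", max_tokens=32)
print(out["choices"][0]["text"])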
Thank you for reading!
If you like my work, you can support me here: Support my work
I do welcome constructive criticism and alternative viewpoints. If you have any thoughts or feedback on this analysis, please feel free to share them in the comments section below.
For more such content make sure to subscribe to my Newsletter here