GGML model format for Large Language Models

ggml is a library that provides operations for running machine learning models. The ggml model format has gone through several versions. As of 8/20/2023, there are ggjt v1, ggjt v2 and ggjt v3.
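Because several format versions exist, it can help to check which one a given file uses before loading it. The following is a minimal sketch, not the library's own loader. It assumes the legacy header layout used by llama.cpp: a little-endian 32-bit magic number, followed by a 32-bit version field for the versioned formats. The magic values and the function name detect_format are assumptions for illustration and should be checked against the ggml or llama.cpp source.

```python
import struct

# Assumed magic numbers (ASCII tags packed into a 32-bit integer).
# Verify these against the ggml / llama.cpp source before relying on them.
MAGICS = {
    0x67676D6C: "ggml",  # oldest format, assumed to carry no version field
    0x67676D66: "ggmf",  # versioned
    0x67676A74: "ggjt",  # versioned (v1, v2, v3)
}


def detect_format(path):
    """Return (format_name, version) for a model file; version is None if unversioned.

    Hypothetical helper: it reads only the first 8 bytes and does not
    validate the rest of the file.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 4:
        raise ValueError("file too short to contain a magic number")
    (magic,) = struct.unpack("<I", header[:4])
    name = MAGICS.get(magic)
    if name is None:
        raise ValueError(f"unrecognized magic 0x{magic:08x}")
    if name == "ggml":
        return name, None
    if len(header) < 8:
        raise ValueError("file too short to contain a version field")
    (version,) = struct.unpack("<I", header[4:8])
    return name, version
```

Calling detect_format on a model file would, under these assumptions, return a pair such as ("ggjt", 3) for a ggjt v3 file, which a loader could use to reject versions it does not support.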

llama.cpp is a project that uses ggml to run LLaMA, a large language model (like GPT) by Meta.

whisper.cpp is a project that uses ggml to run Whisper, a speech recognition model by OpenAI.

ggml’s distinguishing feature is efficient operation on CPUs. Traditionally, this sort of work is done on GPUs, but GPUs with large amounts of memory are specialized and extremely expensive hardware. ggml achieves acceptable speed on commodity hardware.

There is a script to convert a LLaMA model from the Hugging Face PyTorch format to the ggml format.