The conflict between Ollama and llama.cpp
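This seems to have been going on for almost a month now. It started when someone opened an issue on Ollama reporting that running OpenAI's newly released gpt-oss 20b GGUF model in Ollama kept failing, and nobody could find the cause.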
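After a long and messy discussion in which nobody from Ollama stepped in to explain, ggerganov (the author of ggml and a GGUF contributor) lost patience and posted the following: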
Since none of the maintainers here seem to care enough to explain the actual reason for ollama to not support the HF GGUF models, while the root cause is pretty obvious, I will help explain it:
Before the model was released, the ollama devs decided to fork the ggml inference engine in order to implement gpt-oss support (#11672). In the process, they did not coordinate the changes with the upstream maintainers of ggml. As a result, the ollama implementation is not only incompatible with the vast majority of gpt-oss GGUFs that everyone else uses, but is also significantly slower and unoptimized. On the bright side, they were able to announce day-1 support for gpt-oss and get featured in the major announcements on the release day.
Now after the model has been released, the blogs and marketing posts have circled the internet and the dust has settled, it's time for ollama to throw out their ggml fork and copy the upstream implementation (#11823). For a few days, you will struggle and wonder why none of the GGUFs work, wasting your time to figure out what is going on, without any help or even with some wrong information. But none of this matters, because soon the upstream version of ggml will be merged and ollama will once again be fast and compatible.
Hope this helps.
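That said, Ollama has used llama.cpp in its underlying architecture all along, yet has never given llama.cpp enough credit. The two projects target rather different users, so I really don't understand why Ollama would do this. If you want to watch the drama unfold, go read the issue.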
Author: JoyceCloud