Releases: YellowRoseCx/koboldcpp-rocm
KoboldCPP-v1.85.yr0-ROCm
ROCm backend changes
- This release includes two build files to try if one doesn't work for you; the only difference is in the GPU kernel files that are included:
  - `koboldcpp_rocm.exe` is built with files more similar to how v1.79.yr1-ROCm was compiled.
  - `koboldcpp_rocm_b2.exe` is built with the same files as the previous version.
- Support has been added for experimental HIPGraph usage. (Disabled by default, no performance increase yet.)
- HIP virtual memory management has been added, but is disabled until upstream fixes land (ggml-org#11405).
koboldcpp-rocm-1.85
Now with 5% more kobo edition
New Features:
- NEW: Added Server-Sided (networked) save slots! You can now specify a database file when launching KoboldCpp using `--savedatafile`. Then, you will be able to save and load persistent stories over the network to that KoboldCpp server, and access them from any other browser or device connected to it over the network. This can also be combined with `--password` to require an API key to save/load the stories.
- Added the ability to switch models, settings and configs at runtime! This also allows for remote model swapping (see the launch sketch after this list). Credits to @esolithe for the original reference implementation.
  - Launch with `--admin` to enable this feature, and also provide `--admindir` containing `.kcpps` launch configs.
  - Optionally, provide `--adminpassword` to secure admin functions.
  - You will be able to swap between any model's config at runtime from the Admin panel in Lite. You can prepare .kcpps configs for different layers, backends, models, etc.
  - KoboldCpp will then terminate the current instance and relaunch with the new config.
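To illustrate, here is a minimal launch sketch combining the networked save slots with admin mode; the file paths, password values, and model name are illustrative, not defaults:

```bash
# Serve a model with networked save slots and runtime config switching.
# saves.db is created/used as the server-side story database.
python koboldcpp.py --model /models/mymodel.gguf \
    --savedatafile saves.db --password mysecretkey \
    --admin --admindir /configs --adminpassword adminsecret
```

Any browser or device on the network can then save and load stories to this server, and the Admin panel in Lite can swap between any `.kcpps` configs placed in `/configs`.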
- Added Top-N Sigma sampler (credit @EquinoxPsychosis). Note that this sampler can only be combined with Top-K, Temperature, and XTC.
- Added `--exportconfig`, allowing users to export any set of launch arguments as a .kcpps config file from the command line (example below). This file can also be used subsequently for model switching in admin mode.
- Minor refactors for TFS and rep pen by @Reithan.
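As a hedged sketch of `--exportconfig` (assuming the flag takes an output filename; the model path and settings here are illustrative):

```bash
# Export these launch arguments as a reusable .kcpps config file
python koboldcpp.py --model /models/mymodel.gguf --contextsize 8192 \
    --gpulayers 40 --exportconfig mymodel.kcpps
```

The resulting `mymodel.kcpps` can then be placed in the `--admindir` folder for runtime model switching.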
- CLIP vision embeddings can now be reused between multiple requests, so they won't have to be reprocessed if the images don't change.
- Context shifting is now disabled when using mrope (used in Qwen2VL), as it does not work correctly.
- The chat completions adapter now defaults to AutoGuess. Set it to "Alpaca" for the old behavior instead.
- You can now set the maximum resolution accepted by vision mmprojs with `--visionmaxres`. Images larger than that will be downscaled before processing.
- You can now set a length limit for TTS using `--ttsmaxlen` at launch; this limits the number of TTS tokens allowed to be generated (range 512 to 4096). Each second of audio is about 75 tokens, so the 4096-token cap corresponds to roughly 55 seconds of audio. Both limits are combined in the sketch below.
- Added support for using aria2c and wget for model downloading if detected on the system (credits @henk717).
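A minimal launch sketch for the two new limits, with illustrative model and mmproj paths:

```bash
# Cap vision input at 1024px and TTS output at 2048 tokens (~27s of audio)
python koboldcpp.py --model /models/mymodel.gguf --mmproj /models/mmproj.gguf \
    --visionmaxres 1024 --ttsmaxlen 2048
```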
- It's also now possible to specify multiple URLs when loading multipart models online with `--model [url1] [url2]...` (CLI only), which allows KoboldCpp to download multiple model file URLs (example below).
- Added automatic recovery in admin mode: if switching to a faulty config fails, KoboldCpp will attempt to roll back to the original known-good config.
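A sketch of multipart downloading; the URLs are placeholders for the actual shard files:

```bash
# Download and load a two-part split GGUF directly from URLs
python koboldcpp.py --model \
    https://example.com/model-00001-of-00002.gguf \
    https://example.com/model-00002-of-00002.gguf
```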
- Added cloudflared tunnel download for aarch64 (thanks @FlippFuzz). Also, allowed SSL combined with remote tunnels.
Kobold Lite:
- NEW: Added deepseek instruct template, and added support for reasoning/thinking template tags. You can configure thinking rendering behavior from Context > Tokens > Thinking
- NEW: Finally allows specifying individual start and end instruct tags instead of combining them. Toggle this in Settings > Toggle End Tags.
- NEW: Multi-pass websearch added. This allows you to specify a template that is used to generate the search query.
- Added improved thinking support: displaying thoughts, force-injecting `<think>` tokens in AI replies, and filtering out old thoughts in subsequent generations.
- Reworked and improved the load/save UI, adding 2 extra local slots and 8 extra remote save slots.
- Top-N sigma support
- Added customization options for assistant jailbreak prompt
- Refactored 3rd party scenario loader (thanks @Desaroll)
- Fixed websearch button visibility
- Improved instruct formatting in classic UI
- Fixed some LaTeX and markdown edge cases
- Upped max length slider to 1024 if detected context is larger than 4096.
- Added a websearch toggle button
- TTS now allows downloading the audio output as a file when testing it, instead of just playing the sound.
- Some regex parsing fixes
- Added admin panel
- Multiple other fixes and improvements
Fixes:
- Merged fixes and improvements from upstream
- Fixed .kcppt templates backend override not working
- Updated clinfo binary for Windows.
- Fixed MoE experts override not working for Deepseek
- Fixed multiple loader bugs when using the AutoGuess adapter.
- Fixed images failing to generate when using the AutoGuess adapter.
- Removed TTS caching as it was not very good.
- Fixed a bug with TTS that could cause a crash.
KoboldCPP-v1.83.1.yr1-ROCm
Pretty sure I figured it out: CMake flag fix.
KoboldCPP-v1.82.4.yr0-ROCm
Apparently there's starting to be trouble again with the "unofficially supported" ROCm GPUs. I'm trying to look into it when I'm at home and able to. If the regular koboldcpp_rocm.exe doesn't work for you, please try the rocm-5.7 version.
KoboldCPP-v1.82.1.yr0-ROCm
Merge remote-tracking branch 'upstream/concedo'
KoboldCPP-v1.82.yr0-ROCm
Merge remote-tracking branch 'upstream/concedo'
KoboldCPP-v1.81.1.yr0-ROCm
Merge remote-tracking branch 'upstream/concedo'
KoboldCPP-v1.80.3.yr0-ROCm
Update cmake-rocm-windows.yml
KoboldCPP-v1.79.1.yr1-ROCm
attempt 6700xt fix for cmake-rocm-windows.yml
KoboldCPP-v1.79.1.yr0-ROCm
attempt 6700xt fix for cmake-rocm-windows.yml
KoboldCPP-v1.78.yr0-ROCm
koboldcpp-rocm-1.78
- NEW: Added support for Flux and Stable Diffusion 3.5 models: Image generation has been updated with new arch support (thanks to stable-diffusion.cpp) with additional enhancements. You can use either fp16 or fp8 safetensor models, or the GGUF models. Supports all-in-one models (bundled T5XXL, Clip-L/G, VAE) or loading them individually.
- Grab an all-in-one flux model here: https://huggingface.co/Comfy-Org/flux1-dev/blob/main/flux1-dev-fp8.safetensors
- Alternatively, we have a ready-to-use `.kcppt` template that will set up and download everything you need here: https://huggingface.co/koboldcpp/kcppt/resolve/main/Flux1-Dev.kcppt
- Large image handling is also more consistent with VAE tiling; 1024x1024 should work nicely for SDXL and Flux.
- You can specify the new image gen components by loading them with `--sdt5xxl`, `--sdclipl` and `--sdclipg` (for SD3.5); they work with URL resources as well (see the sketch after this list).
- Note: FP16 Flux needs over 20GB of VRAM to work. If you have less VRAM, you should use the quantized GGUFs, or select Compress Weights when loading the Flux model. SD3.5 Medium is more forgiving.
- As before, it can be used with the bundled StableUI at http://localhost:5001/sdui/
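As an illustration, here is a hedged launch sketch loading the image gen components individually; all file paths are placeholders, and `--sdmodel` (the flag for the main image model) plus `--sdvae` are assumed from prior releases:

```bash
# Load a quantized Flux diffusion model with its text encoders and VAE
python koboldcpp.py --model /models/textmodel.gguf \
    --sdmodel /models/flux1-dev-Q8_0.gguf \
    --sdt5xxl /models/t5xxl_fp16.safetensors \
    --sdclipl /models/clip_l.safetensors \
    --sdvae /models/ae.safetensors
```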
- Debug mode prints penalties for XTC
- Added a new flag `--nofastforward`, which forces full prompt reprocessing on every request. It can potentially give more repeatable/reliable/consistent results in some cases.
- CLBlast support is still retained, but has been further downgraded to "compatibility mode" and is no longer recommended (use Vulkan instead). CLBlast GPU offload must now maintain a duplicate copy of the layers in RAM as well, as it now piggybacks off the CPU backend.
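A minimal sketch of the new flag; the model path is illustrative:

```bash
# Force full prompt reprocessing on every request
python koboldcpp.py --model /models/mymodel.gguf --nofastforward
```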
- Added the common identity provider endpoint `/.well-known/serviceinfo` (Haidra-Org/AI-Horde#466, PygmalionAI/aphrodite-engine#807, theroyallab/tabbyAPI#232).
- Reverted some changes that reduced speed in HIPBLAS.
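Once a server is running, the endpoint can be queried directly; this assumes the default port of 5001:

```bash
# Fetch service identity metadata from a running instance
curl http://localhost:5001/.well-known/serviceinfo
```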
- Fixed a bug where bad logprobs JSON was output when logits were `-Infinity`
- Updated Kobold Lite, multiple fixes and improvements
- Added support for custom CSS styles
- Added support for generating larger images (select BigSquare in image gen settings)
- Fixed some streaming issues when connecting to Tabby backend
- Better world info length limiting (capped at 50% of max context before appending to memory)
- Added support for Clip Skip for local image generation.
- Merged fixes and improvements from upstream
To use, download and run koboldcpp_rocm.exe, which is a one-file pyinstaller.
If you're using Linux, clone the repo and build in a terminal with `make LLAMA_HIPBLAS=1 -j`.
Run it from the command line with the desired launch parameters (see `--help`), or manually select the model in the GUI.
Once loaded, you can connect like this (or use the full KoboldAI client): http://localhost:5001
For more information, be sure to run the program from the command line with the `--help` flag.
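For Linux, a hedged end-to-end sketch of the steps above; the model path is a placeholder:

```bash
# Clone, build with HIP/ROCm support, then launch and connect
git clone https://github.com/YellowRoseCx/koboldcpp-rocm
cd koboldcpp-rocm
make LLAMA_HIPBLAS=1 -j
python koboldcpp.py --model /models/mymodel.gguf --port 5001
# Then open http://localhost:5001 in a browser
```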
Release notes from: https://github.com/LostRuins/koboldcpp/releases/tag/v1.78