
ollama-vulkan-0.12.10-2.1 RPM for x86_64

From openSUSE Tumbleweed for x86_64

Name: ollama-vulkan Distribution: openSUSE Tumbleweed
Version: 0.12.10 Vendor: openSUSE
Release: 2.1 Build date: Mon Nov 10 20:34:43 2025
Group: Unspecified Build host: reproducible
Size: 46457400 Source RPM: ollama-0.12.10-2.1.src.rpm
Packager: https://bugs.opensuse.org
Url: https://ollama.com
Summary: Ollama Module using Vulkan
Ollama plugin module using Vulkan.

Provides

Requires

License

MIT

Changelog

* Mon Nov 10 2025 Egbert Eich <eich@suse.com>
  - Consolidate spec file to build for CPU or GPUs from NVIDIA (CUDA)
    and AMD (ROCm). Both are presently disabled on openSUSE; ROCm
    will be available on Tumbleweed soon.
  - Split Vulkan, CUDA and ROCm into separate packages. The
    Vulkan, CUDA and ROCm modules are recommended.
* Fri Nov 07 2025 Glen Masgai <glen.masgai@gmail.com>
  - Update to version 0.12.10
    * Fixed errors when running qwen3-vl:235b and
      qwen3-vl:235b-instruct
    * Enable flash attention for Vulkan (currently needs to be built
      from source)
    * Add Vulkan memory detection for Intel GPU using DXGI+PDH
    * Ollama will now return tool call IDs from the /api/chat API
    * Fixed hanging due to CPU discovery
    * Ollama will now show login instructions when switching to a
      cloud model in interactive mode
    * Fix reading stale VRAM data
    * 'ollama run' now works with embedding models
* Sun Nov 02 2025 Glen Masgai <glen.masgai@gmail.com>
  - Fixed issue with duplicated libraries (/usr/lib, /usr/lib64)
  - Update to version 0.12.9
    * Fix performance regression on CPU-only systems
  - Update to version 0.12.8
    * qwen3-vl performance improvements, including flash attention
      support by default
    * qwen3-vl will now output less leading whitespace in the
      response when thinking
    * Fixed issue where deepseek-v3.1 thinking could not be disabled
      in Ollama's new app
    * Fixed issue where qwen3-vl would fail to interpret images with
      transparent backgrounds
    * Ollama will now stop running a model before removing it via
      ollama rm
    * Fixed issue where prompt processing would be slower on
      Ollama's engine
  - Update to version 0.12.7
    * New model: Qwen3-VL: Qwen3-VL is now available in all parameter
      sizes ranging from 2B to 235B
    * New model: MiniMax-M2: a 230 Billion parameter model built for
      coding & agentic workflows available on Ollama's cloud
    * Model load failures now include more information on Windows
    * Fixed embedding results being incorrect when running
      embeddinggemma
    * Fixed gemma3n on Vulkan backend
    * Increased time allocated for ROCm to discover devices
    * Fixed truncation error when generating embeddings
    * Fixed request status code when running cloud models
    * The OpenAI-compatible /v1/embeddings endpoint now supports the
      encoding_format parameter (see the sketch after this list)
    * Ollama will now parse tool calls that don't conform to
      {"name": name, "arguments": args} (thanks @rick-github!)
    * Fixed prompt processing reporting in the llama runner
    * Increase speed when scheduling models
    * Fixed issue where FROM <model> would not inherit RENDERER or
      PARSER commands
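    A minimal Python sketch of the new encoding_format parameter on the
    OpenAI-compatible embeddings endpoint (assuming a local Ollama
    server on the default port 11434; the model tag is illustrative):

      import requests

      resp = requests.post("http://localhost:11434/v1/embeddings", json={
          "model": "embeddinggemma",      # illustrative embedding model tag
          "input": "The quick brown fox",
          "encoding_format": "float",     # "base64" is the other OpenAI value
      })
      resp.raise_for_status()
      vector = resp.json()["data"][0]["embedding"]
      print(len(vector))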
* Sun Oct 26 2025 Yoshio Sato <vasua.ukraine@gmail.com>
  - Require groups video and render instead of providing them while
    competing with the system-group-hardware package
* Sat Oct 18 2025 Glen Masgai <glen.masgai@gmail.com>
  - Update vendored golang.org/x/net/html to v0.46.0
  - Update to version 0.12.6
    * Experimental Vulkan support
    * Ollama's app now supports searching when running DeepSeek-V3.1,
      Qwen3 and other models that support tool calling.
    * Flash attention is now enabled by default for Gemma 3,
      improving performance and memory utilization
    * Fixed issue where Ollama would hang while generating responses
    * Fixed issue where qwen3-coder would act in raw mode when using
      /api/generate or ollama run qwen3-coder <prompt>
    * Fixed qwen3-embedding providing invalid results
    * Ollama will now evict models correctly when num_gpu is set
    * Fixed issue where tool_index with a value of 0 would not be
      sent to the model
  - Add ollama user to render group
* Sat Oct 11 2025 Eyad Issa <eyadlorenzo@gmail.com>
  - Update vendored golang.org/x/net/html to v0.45.0
    [boo#1251413] [CVE-2025-47911] [boo#1241757] [CVE-2025-22872]
  - Update to version 0.12.5:
    * Fixed issue where "think": false would show an error instead of
      being silently ignored
    * Fixed deepseek-r1 output issues
  - Update to version 0.12.4:
    * Flash attention is now enabled by default for Qwen 3 and Qwen 3
      Coder
    * Fixed an issue where keep_alive in the API would accept
      different values for the /api/chat and /api/generate endpoints
    * Fixed tool calling rendering with qwen3-coder
    * More reliable and accurate VRAM detection
    * OLLAMA_FLASH_ATTENTION can now be overridden to 0 for models
      that have flash attention enabled by default
    * Fixed crash where templates were not correctly defined
* Sat Oct 04 2025 Eyad Issa <eyadlorenzo@gmail.com>
  - Update to version 0.12.3:
    * New models: DeepSeek-V3.1-Terminus, Kimi K2-Instruct-0905
    * Fixed issue where tool calls provided as stringified JSON
      would not be parsed correctly
    * ollama push will now provide a URL to follow to sign in
    * Fixed issues where qwen3-coder would output unicode characters
      incorrectly
    * Fix issue where loading a model with /load would crash
  - Update to version 0.12.2:
    * A new web search API is now available in Ollama
    * Models with Qwen3's architecture including MoE now run in
      Ollama's new engine
    * Fixed issue where built-in tools for gpt-oss were not being
      rendered correctly
    * Support multi-regex pretokenizers in Ollama's new engine
    * Ollama's new engine can now load tensors by matching a prefix
      or suffix
  - Update to version 0.12.1:
    * New model: Qwen3 Embedding: state of the art open embedding
      model by the Qwen team
    * Qwen3-Coder now supports tool calling
    * Fixed issue where Gemma3 QAT models would not output correct
      tokens
    * Fix issue where & characters in Qwen3-Coder would not be parsed
      correctly when function calling
    * Fixed issues where ollama signin would not work properly
  - Update to version 0.12.0:
    * Cloud models are now available in preview
    * Models with the Bert architecture now run on Ollama's engine
    * Models with the Qwen 3 architecture now run on Ollama's engine
    * Fixed issue where models would not be imported correctly with
      ollama create
    * Ollama will skip parsing the initial <think> if provided in
      the prompt for /api/generate
  - Update to version 0.11.11:
    * Improved memory usage when using gpt-oss
    * Fixed error that would occur when attempting to import
      safetensor files
    * Improved memory estimates for hybrid and recurrent models
    * Fixed error that would occur when batch size was greater
      than context length
    * Flash attention & KV cache quantization validation fixes
    * Add dimensions field to embed requests
    * Enable new memory estimates in Ollama's new engine by default
    * Ollama will no longer load split vision models in the Ollama engine
* Tue Sep 09 2025 Eyad Issa <eyadlorenzo@gmail.com>
  - Update to version 0.11.10:
    * Added support for EmbeddingGemma, a new open embedding model
  - Update to version 0.11.9:
    * Improved performance via overlapping GPU and CPU computations
  - Update to version 0.11.8:
    * gpt-oss now has flash attention enabled by default for systems
      that support it
    * Improved load times for gpt-oss
* Mon Aug 25 2025 Eyad Issa <eyadlorenzo@gmail.com>
  - Update to version 0.11.7:
    * DeepSeek-V3.1 is now available to run via Ollama.
    * Fixed issue where multiple models would not be loaded on
      CPU-only systems
    * Ollama will now work with models that skip outputting the
      initial <think> tag (e.g. DeepSeek-V3.1)
    * Fixed issue where text would be emitted when there is no
      opening <think> tag from a model
    * Fixed issue where tool calls containing { or } would not be
      parsed correctly
  - Update to version 0.11.6:
    * Improved performance when using flash attention
    * Fixed boundary case when encoding text using BPE
  - Update to version 0.11.5:
    * Performance improvements for the gpt-oss models
    * Improved memory management for scheduling models on GPUs,
      leading to better VRAM utilization, model performance and less
      out of memory errors. These new memory estimations can be
      enabled with OLLAMA_NEW_ESTIMATES=1 ollama serve and will soon
      be enabled by default.
    * Improved multi-GPU scheduling and reduced VRAM allocation when
      using more than 2 GPUs
    * Fix error when parsing bad harmony tool calls
    * OLLAMA_FLASH_ATTENTION=1 will also enable flash attention for
      pure-CPU models
    * Fixed OpenAI-compatible API not supporting reasoning_effort
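    A minimal Python sketch of passing reasoning_effort through the
    OpenAI-compatible chat endpoint (assuming a local Ollama server on
    the default port 11434; the model tag is illustrative):

      import requests

      resp = requests.post("http://localhost:11434/v1/chat/completions", json={
          "model": "gpt-oss:20b",          # illustrative model tag
          "messages": [{"role": "user", "content": "Explain flash attention."}],
          "reasoning_effort": "low",       # previously rejected, now accepted
      })
      resp.raise_for_status()
      print(resp.json()["choices"][0]["message"]["content"])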
* Thu Aug 07 2025 Eyad Issa <eyadlorenzo@gmail.com>
  - Update to version 0.11.4:
    * openai: allow for content and tool calls in the same message
    * openai: when converting role=tool messages, propagate the tool
      name
    * openai: always provide reasoning
    * Bug fixes
* Wed Aug 06 2025 Eyad Issa <eyadlorenzo@gmail.com>
  - Update to version 0.11.0:
    * New model: OpenAI gpt-oss 20B and 120B
    * Quantization - MXFP4 format
* Tue Aug 05 2025 Eyad Issa <eyadlorenzo@gmail.com>
  - Update to version 0.10.1:
    * No notable changes.
  - Update to version 0.10.0:
    * ollama ps will now show the context length of loaded models
    * Improved performance in gemma3n models by 2-3x
    * Parallel request processing now defaults to 1
    * Fixed issue where tool calling would not work correctly with
      granite3.3 and mistral-nemo models
    * Fixed issue where Ollama's tool calling would not work
      correctly if a tool's name was part of another one, such as
      add and get_address
    * Improved performance when using multiple GPUs by 10-30%
    * Ollama's OpenAI-compatible API will now support WebP images
    * Fixed issue where ollama show would report an error
    * ollama run will more gracefully display errors
* Thu Jul 03 2025 Eyad Issa <eyadlorenzo@gmail.com>
  - Update to version 0.9.5:
    * No notable changes.
  - Update to version 0.9.4:
    * The directory in which models are stored can now be modified.
    * Tool calling with empty parameters will now work correctly
    * Fixed issue when quantizing models with the Gemma 3n
    architecture
  - Update to version 0.9.3:
    * Ollama now supports Gemma 3n
    * Ollama will now limit context length to what the model was
    trained against to avoid strange overflow behavior
  - Update to version 0.9.2:
    * Fixed issue where tool calls without parameters would not be
      returned correctly
    * Fixed "does not support generate" errors
    * Fixed issue where some special tokens would not be tokenized
    properly for some model architectures
* Tue Jun 17 2025 Eyad Issa <eyadlorenzo@gmail.com>
  - Update to version 0.9.1:
    * Tool calling reliability and performance have been improved for
      the following models: Magistral, Llama 4, Mistral and
      DeepSeek-R1-2508
    * Magistral now supports disabling thinking mode
    * Error messages that previously showed POST predict will now be
      more informative
* Sat May 31 2025 Eyad Issa <eyadlorenzo@gmail.com>
  - Update to version 0.9.0:
    * Ollama now has the ability to enable or disable thinking.
      This gives users the flexibility to choose the model’s thinking
      behavior for different applications and use cases.
  - Update to version 0.8.0:
    * Ollama will now stream responses with tool calls
    * Logs will now include better memory estimate debug information
      when running models in Ollama's engine.
  - Update to version 0.7.1:
    * Improved model memory management to allocate sufficient memory
      to prevent crashes when running multimodal models in certain
      situations
    * Enhanced memory estimation for models to prevent unintended
      memory offloading
    * ollama show will now show ... when data is truncated
    * Fixed crash that would occur with qwen2.5vl
    * Fixed crash on Nvidia's CUDA for llama3.2-vision
    * Support for Alibaba's Qwen 3 and Qwen 2 architectures in
      Ollama's new multimodal engine
* Fri May 23 2025 Wolfgang Engel <wolfgang.engel@suse.com>
  - Cleanup part in spec file where build for SLE-15-SP6 and above
    is defined to make if condition more robust
* Wed May 21 2025 Wolfgang Engel <wolfgang.engel@suse.com>
  - Allow to build for Package Hub for SLE-15-SP7
    (openSUSE:Backports:SLE-15-SP7) with g++-12/gcc-12
    by checking for sle_version >= 150600 in spec file (bsc#1243438)
* Sat May 17 2025 Eyad Issa <eyadlorenzo@gmail.com>
  - Update to version 0.7.0:
    * Ollama now supports multimodal models via Ollama’s new engine,
      starting with new vision multimodal models:
      ~ Meta Llama 4
      ~ Google Gemma 3
      ~ Qwen 2.5 VL
    * Ollama now supports providing WebP images as input to
      multimodal models
    * Improved performance of importing safetensors models via
      ollama create
    * Various bug fixes and performance enhancements
* Tue May 06 2025 Eyad Issa <eyadlorenzo@gmail.com>
  - Update to version 0.6.8:
    * Performance improvements for Qwen 3 MoE models on NVIDIA and
      AMD GPUs
    * Fixed a memory leak that occurred when providing images as
      input
    * ollama show will now correctly label older vision models such
      as llava
    * Reduced out of memory errors by improving worst-case memory
      estimations
    * Fix issue that resulted in a context canceled error
  - Update to version 0.6.7:
    * New model: Qwen 3
    * New model: Phi 4 reasoning and Phi 4 mini reasoning
    * New model: Llama 4
    * Increased default context window to 4096 tokens
    * Fixed issue where image paths would not be recognized with ~
      when being provided to ollama run
    * Improved output quality when using JSON mode in certain
      scenarios
    * Fixed issue where model would be stuck in the Stopping...
      state
  - Use source url (https://en.opensuse.org/SourceUrls)
* Thu Apr 24 2025 Eyad Issa <eyadlorenzo@gmail.com>
  - Update to version 0.6.6:
    * New model: IBM Granite 3.3
    * New model: DeepCoder
    * New, faster model downloading: OLLAMA_EXPERIMENT=client2
      ollama serve will run Ollama using a new downloader with
      improved performance and reliability when running ollama pull
    * Fixed memory leak issues when running Gemma 3, Mistral Small
      3.1 and other models on Ollama
    * Improved performance of ollama create when importing models
      from Safetensors
    * Ollama will now allow tool function parameters with either a
      single type or an array of types
    * Fixed certain out-of-memory issues caused by not reserving
      enough memory at startup
    * Fixed nondeterministic model unload order
    * Included the items and $defs fields to properly handle array
      types in the API
    * OpenAI-Beta headers are now included in the CORS safelist
    * Fixed issue where model tensor data would be corrupted when
      importing models from Safetensors
* Sat Apr 19 2025 Eyad Issa <eyadlorenzo@gmail.com>
  - Add ollama to the video group
  - Update to version 0.6.5:
    * Add support for mistral-small
    * Fix issues with spm tokenizer for Gemma 3 models
    * Add checks for values falling out of sliding window cache
    * Improve file descriptor management for tensors and
      Pull operations
    * Add gfx1200 & gfx1201 GPU support on Linux
    * Optimize sliding window attention and KV cache implementations
    * Implement loading tensors in 32KiB chunks for better performance
    * Add autotemplate for gemma3 models
    * Add benchmarking for ollama server performance
    * Fix file handling in /proc/cpuinfo discovery
    * Support heterogeneous KV cache layer sizes in memory estimation
    * Fix debug logging for memory estimates
    * Improve error handling for empty logits and tensor data reading
    * Return model capabilities from the show endpoint
* Tue Mar 25 2025 me@levitati.ng
  - Update to version 0.6.2:
    * Multiple images are now supported in Gemma 3
    * Fixed issue where running Gemma 3 would consume a large amount
      of system memory
    * ollama create --quantize now works when converting Gemma 3
      from safetensors
    * Fixed issue where /save would not work if running a model
      with / in the name
    * Add support for AMD Strix Halo GPUs
* Tue Mar 18 2025 Bernhard Wiedemann <bwiedemann@suse.com>
  - Only require git-core
* Fri Mar 14 2025 Eyad Issa <eyadlorenzo@gmail.com>
  - Update BuildRequires to go1.24
  - Update to version 0.6.0:
    * New model: Gemma 3
    * Fixed error that would occur when running
      snowflake-arctic-embed and snowflake-arctic-embed2 models
    * Various performance improvements and bug fixes
* Wed Mar 12 2025 Eyad Issa <eyadlorenzo@gmail.com>
  - Update to version 0.5.13:
    * New models: Phi-4-Mini, Granite-3.2-Vision, Command R7B Arabic
    * The default context length can now be set with a new
      OLLAMA_CONTEXT_LENGTH environment variable. For example, to set
      the default context length to 8K, use:
      OLLAMA_CONTEXT_LENGTH=8192 ollama serve
    * Fixed issue where bf16 GGUF files could not be imported
    * Ollama is now able to accept requests from Visual Studio Code
      and Cursor by allowing origins beginning with vscode-file://
    * Various performance improvements and bug fixes
* Thu Feb 27 2025 eyadlorenzo@gmail.com
  - Update to version 0.5.12:
    * New model: Perplexity R1 1776
    * The OpenAI-compatible API will now return tool_calls if the
      model called a tool
    * Performance on certain Intel Xeon processors should now be
      restored
    * Fixed permission denied issues after installing Ollama on Linux
    * Fixed issue where additional CPU libraries were included in the
      arm64 Linux install
    * The progress bar will no longer flicker when running ollama
      pull
    * Fixed issue where running a model would fail on Linux if Ollama
      was installed in a path with UTF-8 characters
    * X-Stainless-Timeout will now be accepted as a header in the
      OpenAI API endpoints
* Sat Feb 15 2025 Eyad Issa <eyadlorenzo@gmail.com>
  - Use Ninja instead of Make and update the build script to
    match the new version
  - Update to version 0.5.11:
    * No notable changes for Linux
  - Update to version 0.5.10:
    * Fixed issue on multi-GPU Windows and Linux machines where
      memory estimations would be incorrect
  - Update to version 0.5.9:
    * New model: DeepScaleR
    * New model: OpenThinker
  - Update to version 0.5.8:
    * Ollama will now use AVX-512 instructions where available for
      additional CPU acceleration
    * Fixed indexing error that would occur when downloading a model
      with ollama run or ollama pull
    * Fixes cases where download progress would reverse
* Mon Jan 27 2025 Adrian Schröter <adrian@suse.de>
  - Make ollama configurable by the admin via /etc/sysconfig/ollama
    (boo#1236008)
  - cleanup reproducible.patch
* Thu Jan 16 2025 Eyad Issa <eyadlorenzo@gmail.com>
  - Removed 01-build-verbose.patch: embedded GOFLAG into .spec file
  - Disabled reproducible.patch: should be not needed, as .gz is
    not produced anymore
  - Update to version 0.5.7:
    * Fixed issue that occurred when using two FROM commands in a
      Modelfile
    * Support importing Command R and Command R+ architectures
      from safetensors
  - Update to version 0.5.6:
    * Fixed errors that would occur when running ollama create on
      Windows and when using absolute paths
  - Update to version 0.5.5:
    * New models:
      ~ Phi-4
      ~ Command R7B
      ~ DeepSeek-V3
      ~ OLMo 2
      ~ Dolphin 3
      ~ SmallThinker
      ~ Granite 3.1 Dense
      ~ Granite 3.1 MoE
    * The /api/create API endpoint that powers ollama create has
      been changed to improve conversion time and also accept a JSON
      object.
    * Fixed runtime error that would occur when filling the model's
      context window
    * Fixed crash that would occur when quotes were used in /save
    * Fixed errors that would occur when sending x-stainless headers
      from OpenAI clients
  - Update to version 0.5.4:
    * New model: Falcon3
    * Fixed issue where providing null to format would result in
      an error
  - Update to version 0.5.3:
    * Fixed runtime errors on older Intel Macs
    * Fixed issue where setting the format field to "" would cause
      an error
  - Update to version 0.5.2:
    * New model: EXAONE 3.5
    * Fixed issue where whitespace would get trimmed from prompt
      when images were provided
    * Improved memory estimation when scheduling models
    * OLLAMA_ORIGINS will now check hosts in a case insensitive
      manner
* Thu Dec 12 2024 Bernhard Wiedemann <bwiedemann@suse.com>
  - Add reproducible.patch for deterministic .gz creation (boo#1047218)
* Sat Dec 07 2024 Eyad Issa <eyadlorenzo@gmail.com>
  - Update to version 0.5.1:
    * Fixed issue where Ollama's API would generate JSON output when
      specifying "format": null
    * Fixed issue where passing --format json to ollama run would
      cause an error
  - Update to version 0.5.0:
    * New models:
      ~ Llama 3.3: a new state of the art 70B model.
      ~ Snowflake Arctic Embed 2: Snowflake's frontier embedding
      model.
    * Ollama now supports structured outputs, making it possible to
      constrain a model's output to a specific format defined by a
      JSON schema. The Ollama Python and JavaScript libraries have
      been updated to support structured outputs, together with
      Ollama's OpenAI-compatible API endpoints (see the sketch after
      this list).
    * Fixed error importing model vocabulary files
    * Experimental: new flag to set KV cache quantization to 4-bit
      (q4_0), 8-bit (q8_0) or 16-bit (f16). This reduces VRAM
      requirements for longer context windows.
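    A minimal Python sketch of structured outputs, constraining a chat
    response to a JSON schema (assuming a local Ollama server on the
    default port 11434; the model tag and schema are illustrative):

      import json
      import requests

      schema = {
          "type": "object",
          "properties": {
              "name": {"type": "string"},
              "age": {"type": "integer"},
          },
          "required": ["name", "age"],
      }
      resp = requests.post("http://localhost:11434/api/chat", json={
          "model": "llama3.3",             # illustrative model tag
          "messages": [{"role": "user",
                        "content": "Describe a fictional person as JSON."}],
          "format": schema,                # constrain the reply to this schema
          "stream": False,
      })
      resp.raise_for_status()
      person = json.loads(resp.json()["message"]["content"])
      print(person["name"], person["age"])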
  - Update to version 0.4.7:
    * Enable index tracking for tools - openai api support (#7888)
    * llama: fix typo and formatting in readme (#7876)
    * readme: add SpaceLlama, YouLama, and DualMind to community
      integrations (#7216)
* Sat Nov 30 2024 Eyad Issa <eyadlorenzo@gmail.com>
  - Update to version 0.4.6:
    * New model: QwQ: an experimental research model by the Qwen
      team, focused on advancing AI reasoning capabilities.
    * Tool calls will now be included in streaming responses
    * Ollama will now provide an error when submitting SVG images
    * Image tokens will no longer be counted in token counts when
      running a text-only model
  - Update to version 0.4.5:
    * The Ollama Python Library has been updated
    * Fixed issue where HTTPS_PROXY and HTTP_PROXY environment
      variables would have no effect
    * Ollama will now accept X-Stainless-Retry-Count used by many
      OpenAI API clients
    * Fix issue where importing certain GGUF files would result in
      the incorrect quantization level
    * ollama push will now print the uploaded model URL on
      ollama.com
  - Update to version 0.4.4:
    * Marco-o1: An open large reasoning model for real-world
      solutions by the Alibaba International Digital Commerce Group
      (AIDC-AI).
    * Fixed issue where Ollama would freeze when processing requests
      in parallel (e.g. when using code completion tools)
    * Redirecting output to a file no longer outputs progress bars
      or spinners
  - Update to version 0.4.3:
    * New model: Tülu 3 is a leading instruction following model
      family, offering fully open-source data, code, and recipes by
      the Allen Institute for AI.
    * New model: Mistral Large: a new version of Mistral Large with
      improved Long Context, Function Calling and System Prompt
      support.
    * Improved performance issues that occurred in Ollama versions
      0.4.0-0.4.2
    * Fixed issue that would cause granite3-dense to generate empty
      responses
    * Fixed crashes and hanging caused by KV cache management
* Sat Nov 16 2024 Eyad Issa <eyadlorenzo@gmail.com>
  - Update to version 0.4.2:
    * runner.go: Propagate panics back to the user.
    * runner.go: Increase survivability of main processing loop
    * build: fix arm container image (#7674)
    * add line numbers for parser errors (#7326)
    * chore(deps): bump golang.org/x dependencies (#7655)
    * runner.go: Don't trim whitespace from inputs
    * runner.go: Enforce NUM_PARALLEL directly in the runner
    * cmd: preserve exact bytes when displaying template/system layers (#7586)
    * fix(mllama): sync backend between batches
    * runner.go: Fix off-by-one for num predicted
    * CI: give windows lint more time (#7635)
    * Jetpack support for Go server (#7217)
    * doc: capture numeric group requirement (#6941)
    * docs: Capture docker cgroup workaround (#7519)
    * runner.go: Make KV entry accounting more robust
    * readme: add aichat terminal app to community integrations (#7418)
    * api: fix typos in Go Doc comments (#7620)
    * readme: add GoLamify to community integrations (#7521)
    * readme: add browser extension that enables using Ollama for interacting with web pages (#5827)
    * docs: add mentions of Llama 3.2 (#7517)
    * api: fix typo in python ClientFromEnvironment docs (#7604)
    * readme: add llama3.2-vision to model list (#7580)
* Mon Nov 11 2024 Eyad Issa <eyadlorenzo@gmail.com>
  - Add patch 01-build-verbose.patch to add the -v option
    to go build
  - Update to version 0.4.1:
    * runner.go: Check for zero length images
    * docs: update langchainpy.md with proper model name (#7527)
    * Set macos min version for all architectures (#7579)
    * win: remove preview title from installer (#7529)
    * Workaround buggy P2P ROCm copy on windows (#7466)
    * Debug logging for nvcuda init (#7532)
    * Align rocm compiler flags (#7467)
    * Be explicit for gpu library link dir (#7560)
    * docs: OLLAMA_NEW_RUNNERS no longer exists
    * runner.go: Remove unused arguments
    * sched: Lift parallel restriction for multimodal models except mllama
* Thu Nov 07 2024 adrian@suse.de
  - Update to version 0.4.0:
    * Update README.md (#7516)
    * One corrupt manifest should not wedge model operations (#7515)
    * prompt: Use a single token when estimating mllama context size
    * readme: add Hexabot to the list of community integrations
    * Quiet down debug log of image payload (#7454)
* Wed Nov 06 2024 Eyad Issa <eyadlorenzo@gmail.com>
  - Update to version 0.4.0-rc8:
    * CI: Switch to v13 macos runner (#7498)
    * CI: matrix strategy fix (#7496)
    * Sign windows arm64 official binaries (#7493)
    * readme: add TextCraft to community integrations (#7377)
    * nvidia libs have inconsistent ordering (#7473)
    * CI: omit unused tools for faster release builds (#7432)
    * llama: Improve error handling
    * runner.go: Only allocate 1 element embedding batches for mllama
    * refactor kv estimation
    * mllama cross attention
    * Add basic mllama integration tests (#7455)
    * runner.go: Don't set cross attention before sending embeddings
    * Give unicode test more time to run (#7437)
* Fri Nov 01 2024 Eyad Issa <eyadlorenzo@gmail.com>
  - Remove enable-lto.patch
  - Update to version 0.4.0-rc6:
    * Refine default thread selection for NUMA systems (#7322)
    * runner.go: Better abstract vision model integration
    * Soften windows clang requirement (#7428)
    * Remove submodule and shift to Go server - 0.4.0  (#7157)
    * Move windows app out of preview (#7347)
    * windows: Support alt install paths, fit and finish (#6967)
    * add more tests for getting the optimal tiled canvas (#7411)
    * Switch windows to clang (#7407)
    * tests: Add test for Unicode processing
    * runner.go: Better handle return NULL values from llama.cpp
    * add mllama image processing to the generate handler (#7384)
    * Bump to latest Go 1.22 patch (#7379)
    * Fix deepseek deseret regex (#7369)
    * Better support for AMD multi-GPU on linux (#7212)
    * Fix unicode output on windows with redirect to file (#7358)
    * Fix incremental build file deps (#7361)
    * Improve dependency gathering logic (#7345)
    * fix #7247 - invalid image input (#7249)
    * integration: harden embedding test (#7306)
    * default to "FROM ." if a Modelfile isn't present (#7250)
    * Fix rocm windows build and clean up dependency gathering (#7305)
    * runner.go: Merge partial unicode characters before sending
    * readme: add Ollama for Swift to the community integrations (#7295)
    * server: allow vscode-webview origin (#7273)
    * image processing for llama3.2 (#6963)
    * llama: Decouple patching script from submodule (#7139)
    * llama: add compiler tags for cpu features (#7137)
* Wed Oct 30 2024 Alessandro de Oliveira Faria <cabelo@opensuse.org>
  - Update to version 0.3.14:
    * New Models
      + Granite 3 MoE: The IBM Granite 1B and 3B models are the
      first mixture of experts (MoE) Granite models from IBM
      designed for low latency usage.
      + Granite 3 Dense: The IBM Granite 2B and 8B models are
      designed to support tool-based use cases and support for
      retrieval augmented generation (RAG), streamlining code
      generation, translation and bug fixing.
* Sat Oct 12 2024 eyadlorenzo@gmail.com
  - Update to version 0.3.13:
    * New safety models:
      ~ Llama Guard 3: a series of models by Meta, fine-tuned for
      content safety classification of LLM inputs and responses.
      ~ ShieldGemma: ShieldGemma is a set of instruction-tuned models
      from Google DeepMind for evaluating the safety of text
      prompt input and text output responses against a set of
      defined safety policies.
    * Fixed issue where ollama pull would leave connections when
      encountering an error
    * ollama rm will now stop a model if it is running prior to
      deleting it
* Sat Sep 28 2024 Alessandro de Oliveira Faria <cabelo@opensuse.org>
  - Update to version 0.3.12:
    * Llama 3.2: Meta's Llama 3.2 goes small with 1B and 3B
      models.
    * Qwen 2.5 Coder: The latest series of Code-Specific Qwen
      models, with significant improvements in code generation,
      code reasoning, and code fixing.
    * Ollama now supports ARM Windows machines
    * Fixed rare issue where Ollama would report a missing .dll
      file on Windows
    * Fixed performance issue for Windows without GPUs
* Fri Sep 20 2024 adrian@suse.de
  - Update to version 0.3.11:
    * llm: add solar pro (preview) (#6846)
    * server: add tool parsing support for nemotron-mini (#6849)
    * make patches git am-able
    * CI: dist directories no longer present (#6834)
    * CI: clean up naming, fix tagging latest (#6832)
    * CI: set platform build build_linux script to keep buildx happy (#6829)
    * readme: add Agents-Flex to community integrations (#6788)
    * fix typo in import docs (#6828)
    * readme: add vim-intelligence-bridge to Terminal section (#6818)
    * readme: add Obsidian Quiz Generator plugin to community integrations (#6789)
    * Fix incremental builds on linux (#6780)
    * Use GOARCH for build dirs (#6779)
    * Optimize container images for startup (#6547)
    * examples: updated requirements.txt for privategpt example
    * examples: polish loganalyzer example (#6744)
    * readme: add ollama_moe to community integrations (#6752)
    * runner: Flush pending responses before returning
    * add "stop" command (#6739)
    * refactor show ouput
    * readme: add QodeAssist to community integrations (#6754)
    * Verify permissions for AMD GPU (#6736)
    * add *_proxy for debugging
    * docs: update examples to use llama3.1 (#6718)
    * Quiet down dockers new lint warnings (#6716)
    * catch when model vocab size is set correctly (#6714)
    * readme: add crewAI to community integrations (#6699)
    * readme: add crewAI with mesop to community integrations
* Tue Sep 17 2024 adrian@suse.de
  - Update to version 0.3.10:
    * openai: align chat temperature and frequency_penalty options with completion (#6688)
    * docs: improve linux install documentation (#6683)
    * openai: don't scale temperature or frequency_penalty (#6514)
    * readme: add Archyve to community integrations (#6680)
    * readme: add Plasmoid Ollama Control to community integrations (#6681)
    * Improve logging on GPU too small (#6666)
    * openai: fix "presence_penalty" typo and add test (#6665)
    * Fix gemma2 2b conversion (#6645)
    * Document uninstall on windows (#6663)
    * Revert "Detect running in a container (#6495)" (#6662)
    * llm: make load time stall duration configurable via OLLAMA_LOAD_TIMEOUT
    * Introduce GPU Overhead env var (#5922)
    * Detect running in a container (#6495)
    * readme: add AiLama to the list of community integrations (#4957)
    * Update gpu.md: Add RTX 3050 Ti and RTX 3050 Ti (#5888)
    * server: fix blob download when receiving a 200 response  (#6656)
    * readme: add Gentoo package manager entry to community integrations (#5714)
    * Update install.sh:Replace "command -v" with encapsulated functionality (#6035)
    * readme: include Enchanted for Apple Vision Pro (#4949)
    * readme: add lsp-ai to community integrations (#5063)
    * readme: add ollama-php library to community integrations (#6361)
    * readme: add vnc-lm discord bot community integration (#6644)
    * llm: use json.hpp from common (#6642)
    * readme: add confichat to community integrations (#6378)
    * docs: add group to manual Linux isntructions and verify service is running (#6430)
    * readme: add gollm to the list of community libraries (#6099)
    * readme: add Cherry Studio to community integrations (#6633)
    * readme: add Go fun package (#6421)
    * docs: fix spelling error (#6391)
    * install.sh: update instructions to use WSL2 (#6450)
    * readme: add claude-dev to community integrations (#6630)
    * readme: add PyOllaMx project (#6624)
    * llm: update llama.cpp commit to 8962422 (#6618)
    * Use cuda v11 for driver 525 and older (#6620)
    * Log system memory at info (#6617)
    * readme: add Painting Droid community integration (#5514)
    * readme: update Ollama4j link and add link to Ollama4j Web UI (#6608)
    * Fix sprintf to snprintf (#5664)
    * readme: add PartCAD tool to readme for generating 3D CAD models using Ollama (#6605)
    * Reduce docker image size (#5847)
    * readme: add OllamaFarm project (#6508)
    * readme: add go-crew and Ollamaclient projects (#6583)
    * docs: update faq.md for OLLAMA_MODELS env var permissions (#6587)
    * fix(cmd): show info may have nil ModelInfo (#6579)
    * docs: update GGUF examples and references (#6577)
    * Add findutils to base images (#6581)
    * remove any unneeded build artifacts
    * doc: Add Nix and Flox to package manager listing (#6074)
    * update the openai docs to explain how to set the context size (#6548)
    * fix(test): do not clobber models directory
    * add llama3.1 chat template (#6545)
    * update deprecated warnings
    * validate model path
    * throw an error when encountering unsupport tensor sizes (#6538)
    * Move ollama executable out of bin dir (#6535)
    * update templates to use messages
    * more tokenizer tests
    * add safetensors to the modelfile docs (#6532)
    * Fix import image width (#6528)
    * Update manual instructions with discrete ROCm bundle (#6445)
    * llm: fix typo in comment (#6530)
    * adjust image sizes
    * clean up convert tokenizer
    * detect chat template from configs that contain lists
    * update the import docs (#6104)
    * server: clean up route names for consistency (#6524)
    * Only enable numa on CPUs (#6484)
    * gpu: Group GPU Library sets by variant (#6483)
    * update faq
    * passthrough OLLAMA_HOST path to client
    * convert safetensor adapters into GGUF (#6327)
    * gpu: Ensure driver version set before variant (#6480)
    * llm: Align cmake define for cuda no peer copy (#6455)
    * Fix embeddings memory corruption (#6467)
    * llama3.1
    * convert gemma2
    * create bert models from cli
    * bert
    * Split rocm back out of bundle (#6432)
    * CI: remove directories from dist dir before upload step (#6429)
    * CI: handle directories during checksum (#6427)
    * Fix overlapping artifact name on CI
    * Review comments
    * Adjust layout to bin+lib/ollama
    * Remove Jetpack
    * Add windows cuda v12 + v11 support
    * Enable cuda v12 flags
    * Add cuda v12 variant and selection logic
    * Report GPU variant in log
    * Add Jetson cuda variants for arm
    * Wire up ccache and pigz in the docker based build
    * Refactor linux packaging
    * server: limit upload parts to 16 (#6411)
    * Fix white space.
    * Reset NumCtx.
    * Override numParallel only if unset.
    * fix: chmod new layer to 0o644 when creating it
    * fix: Add tooltip to system tray icon
    * only skip invalid json manifests
    * skip invalid manifest files
    * fix noprune
    * add `CONTRIBUTING.md` (#6349)
    * Fix typo and improve readability (#5964)
    * server: reduce max connections used in download (#6347)
    * update chatml template format to latest in docs (#6344)
    * lint
    * Update openai.md to remove extra checkbox (#6345)
    * llama3.1 memory
* Thu Aug 15 2024 Eyad Issa <eyadlorenzo@gmail.com>
  - Update to version 0.3.6:
    * Fixed issue where /api/embed would return an error instead of
      loading the model when the input field was not provided.
    * ollama create can now import Phi-3 models from Safetensors
    * Added progress information to ollama create when importing GGUF
      files
    * Ollama will now import GGUF files faster by minimizing file
      copies
  - Update to version 0.3.5:
    * Fixed issue where temporary files would not be cleaned up
    * Fix rare error when Ollama would start up due to invalid model
      data
* Sun Aug 11 2024 Alessandro de Oliveira Faria <cabelo@opensuse.org>
  - Update to version 0.3.4:
    * New embedding models
    - BGE-M3: a large embedding model from BAAI distinguished for
      its versatility in Multi-Functionality, Multi-Linguality, and
      Multi-Granularity.
    - BGE-Large: a large embedding model trained in English.
    - Paraphrase-Multilingual: A multilingual embedding model
      trained on parallel data for 50+ languages.
    * New embedding API with batch support
    - Ollama now supports a new API endpoint /api/embed for
      embedding generation (see the sketch after this list):
    * This API endpoint supports new features:
    - Batches: generate embeddings for several documents in
      one request
    - Normalized embeddings: embeddings are now normalized,
      improving similarity results
    - Truncation: a new truncate parameter that will error if
      set to false
    - Metrics: responses include load_duration, total_duration and
      prompt_eval_count metrics
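    A minimal Python sketch of the batched /api/embed endpoint
    (assuming a local Ollama server on the default port 11434; the
    model tag is illustrative):

      import requests

      resp = requests.post("http://localhost:11434/api/embed", json={
          "model": "bge-m3",               # illustrative embedding model tag
          "input": ["first document", "second document"],  # one request, many inputs
      })
      resp.raise_for_status()
      data = resp.json()
      print(len(data["embeddings"]))       # one normalized vector per input
      print(data["total_duration"], data["prompt_eval_count"])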
* Sat Aug 03 2024 eyadlorenzo@gmail.com
  - Update to version 0.3.3:
    * The /api/embed endpoint now returns statistics: total_duration,
      load_duration, and prompt_eval_count
    * Added usage metrics to the /v1/embeddings OpenAI compatibility
      API
    * Fixed issue where /api/generate would respond with an empty
      string if provided a context
    * Fixed issue where /api/generate would return an incorrect
      value for context
    * /show modelfile will now render MESSAGE commands correctly
  - Update to version 0.3.2:
    * Fixed issue where ollama pull would not resume download
      progress
    * Fixed issue where phi3 would report an error on older versions
* Tue Jul 30 2024 Adrian Schröter <adrian@suse.de>
  - Update to version 0.3.1:
    * Added support for the min_p sampling option (see the sketch
      after this list)
    * Lowered number of requests required when downloading models
      with ollama pull
    * ollama create will now autodetect required stop parameters
      when importing certain models
    * Fixed issue where /save would cause parameters to be saved
      incorrectly.
    * OpenAI-compatible API will now return a finish_reason of
      tool_calls if a tool call occurred.
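    A minimal Python sketch of the min_p sampling option on
    /api/generate (assuming a local Ollama server on the default port
    11434; the model tag and threshold are illustrative):

      import requests

      resp = requests.post("http://localhost:11434/api/generate", json={
          "model": "llama3.1",             # illustrative model tag
          "prompt": "Write a haiku about autumn.",
          "options": {"min_p": 0.05},      # drop tokens much less likely than the top token
          "stream": False,
      })
      resp.raise_for_status()
      print(resp.json()["response"])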
* Mon Jul 29 2024 Adrian Schröter <adrian@suse.de>
  - fix build on leap 15.6
  - exclude builds on 32bit due to build failures
* Sun Jul 28 2024 Eyad Issa <eyadlorenzo@gmail.com>
  - Update to version 0.3.0:
    * Ollama now supports tool calling with popular models such
      as Llama 3.1. This enables a model to answer a given prompt
      using tool(s) it knows about, making it possible for models to
      perform more complex tasks or interact with the outside world
      (see the sketch after this list).
    * New models:
      ~ Llama 3.1
      ~ Mistral Large 2
      ~ Firefunction v2
      ~ Llama-3-Groq-Tool-Use
    * Fixed duplicate error message when running ollama create
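    A minimal Python sketch of tool calling via /api/chat (assuming a
    local Ollama server on the default port 11434; the model tag and
    the get_weather tool are illustrative):

      import requests

      tools = [{
          "type": "function",
          "function": {
              "name": "get_weather",       # hypothetical tool
              "description": "Get the current weather for a city",
              "parameters": {
                  "type": "object",
                  "properties": {"city": {"type": "string"}},
                  "required": ["city"],
              },
          },
      }]
      resp = requests.post("http://localhost:11434/api/chat", json={
          "model": "llama3.1",             # illustrative model tag
          "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
          "tools": tools,
          "stream": False,
      })
      resp.raise_for_status()
      for call in resp.json()["message"].get("tool_calls", []):
          print(call["function"]["name"], call["function"]["arguments"])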
* Wed Jul 24 2024 adrian@suse.de
  - Update to version 0.2.8:
    * api embed docs (#5282)
    * convert: capture `head_dim` for mistral (#5818)
    * Update llama.cpp submodule commit to `d94c6e0c` (#5805)
    * server: collect nested tool call objects when parsing (#5824)
    * Remove no longer supported max vram var
    * Refine error reporting for subprocess crash
    * Remove out of space test temporarily (#5825)
    * llm: consider `head_dim` in llama arch (#5817)
    * Adjust windows ROCm discovery
    * add patch for tekken (#5807)
    * preserve last assistant message (#5802)
    * Fix generate test flakyness (#5804)
    * server: validate template (#5734)
    * OpenAI: Function Based Testing (#5752)
    * adjust openai chat msg processing (#5729)
    * fix parsing tool calls
    * server: check for empty tools array too (#5779)
    * always provide content even if empty (#5778)
    * server: only parse tool calls if tools are provided (#5771)
    * Fix context exhaustion integration test for small gpus
    * Refine scheduler unit tests for reliability
* Thu Jul 18 2024 Eyad Issa <eyadlorenzo@gmail.com>
  - Fixed issue with shared libraries
* Thu Jul 18 2024 Eyad Issa <eyadlorenzo@gmail.com>
  - Added %check section
  - Use -v when building
  - Update to version 0.2.6:
    * New model: MathΣtral, a 7B model designed for math
      reasoning and scientific discovery by Mistral AI.
    * Fixed issue where uppercase roles such as USER would no longer
      work in the chat endpoints
    * Fixed issue where empty system message would be included in the
      prompt
* Sun Jul 14 2024 eyadlorenzo@gmail.com
  - Update to version 0.2.5:
    * Fixed issue where a model's SYSTEM message would not be applied
  - Update to version 0.2.4:
    * Fixed issue where context, load_duration and total_duration
      fields would not be set in the /api/generate endpoint.
    * Ollama will no longer error when loading models larger than
      system memory if disk space is available
  - Update to version 0.2.3:
    * Fix issue where system prompt would not be applied
  - Update to version 0.2.2:
    * Fixed errors that occurred when using Ollama with Nvidia V100
      GPUs
    * glm4 models will no longer fail to load from out of memory
      errors
    * Fixed error that would occur when running deepseek-v2 and
      deepseek-coder-v2 models
    * Fixed a series of out of memory issues when using Nvidia
      GPUs
    * Fixed a series of errors that would occur when using multiple
      Radeon GPUs
  - Update to version 0.2.1:
    * Fixed issue where setting OLLAMA_NUM_PARALLEL would cause
      models to be reloaded after each request
  - Update to version 0.2.0:
    * Ollama 0.2.0 is now available with concurrency support.
      This unlocks 2 specific features:
      ~ Ollama can now serve multiple requests at the same time
      ~ Ollama now supports loading different models at the same time
    * New models: GLM-4: A strong multi-lingual general language
      model with competitive performance to Llama 3.
    * New models: CodeGeeX4: A versatile model for AI software
      development scenarios, including code completion.
    * New models: Gemma 2: Improved output quality and base text
      generation models now available
    * Ollama will now show a better error if a model architecture
      isn't supported
    * Improved handling of quotes and spaces in Modelfile FROM lines
    * Ollama will now return an error if the system does not have
      enough memory to run a model on Linux
* Sun Jul 07 2024 Eyad Issa <eyadlorenzo@gmail.com>
  - Update to version 0.1.48:
    * Fixed issue where Gemma 2 would continuously output when
      reaching context limits
    * Fixed out of memory and core dump errors when running Gemma 2
    * /show info will now show additional model information in
      ollama run
    * Fixed issue where ollama show would result in an error on
      certain vision models
  - Update to version 0.1.47:
    * Added support for Google Gemma 2 models (9B and 27B)
    * Fixed issues with ollama create when importing from Safetensors
* Mon Jun 24 2024 Eyad Issa <eyadlorenzo@gmail.com>
  - Update to version 0.1.46:
    * Docs (#5149)
    * fix: quantization with template
    * Fix use_mmap parsing for modelfiles
    * Refine mmap default logic on linux
    * Bump latest fedora cuda repo to 39
* Sat Jun 22 2024 Eyad Issa <eyadlorenzo@gmail.com>
  - Update to version 0.1.45:
    * New models: DeepSeek-Coder-V2: A 16B & 236B open-source
      Mixture-of-Experts code language model that achieves
      performance comparable to GPT4-Turbo in code-specific tasks.
    * ollama show <model> will now show model information such as
      context window size
    * Model loading on Windows with CUDA GPUs is now faster
    * Setting seed in the /v1/chat/completions OpenAI compatibility
      endpoint no longer changes temperature
    * Enhanced GPU discovery and multi-gpu support with concurrency
    * Introduced a workaround for AMD Vega RX 56 SDMA support on
      Linux
    * Fix memory prediction for deepseek-v2 and deepseek-coder-v2
      models
    * api/show endpoint returns extensive model metadata
    * GPU configuration variables are now reported in ollama serve
    * Update Linux ROCm to v6.1.1
* Tue Jun 18 2024 Eyad Issa <eyadlorenzo@gmail.com>
  - Added documentation files to .spec
  - Update to version 0.1.44:
    * Fixed issue where unicode characters such as emojis would not
      be loaded correctly when running ollama create
    * Fixed certain cases where Nvidia GPUs would not be detected and
      reported as compute capability 1.0 devices
  - Update to version 0.1.43:
    * New import.md guide for converting and importing models to
      Ollama
    * Fixed issue where embedding vectors resulting from
      /api/embeddings would not be accurate
    * JSON mode responses will no longer include invalid escape
      characters
    * Removing a model will no longer show incorrect File not found
      errors
    * Fixed issue where running ollama create would result in an
      error on Windows with certain file formatting
  - Update to version 0.1.42:
    * New models: Qwen 2: a new series of large language models
      from Alibaba group
    * ollama pull is now faster if it detects a model is already
      downloaded
    * ollama create will now automatically detect prompt templates
      for popular model architectures such as Llama, Gemma, Phi and
      more.
    * Ollama can now be accessed from local apps built with Electron
      and Tauri, as well as from apps being developed in local HTML files
    * Update welcome prompt in Windows to llama3
    * Fixed issues where /api/ps and /api/tags would show invalid
      timestamps in responses
  - Update to version 0.1.41:
    * Fixed issue on Windows 10 and 11 with Intel CPUs with
      integrated GPUs where Ollama would encounter an error
* Sat Jun 01 2024 Eyad Issa <eyadlorenzo@gmail.com>
  - Update to version 0.1.40:
    * New model: Codestral: Codestral is Mistral AI’s first-ever code
    model designed for code generation tasks.
    * New model: IBM Granite Code: now in 3B and 8B parameter sizes.
    * New model: Deepseek V2: A Strong, Economical, and Efficient
    Mixture-of-Experts Language Model
    * Fixed out of memory and incorrect token issues when running
    Codestral on 16GB Macs
    * Fixed issue where full-width characters (e.g. Japanese,
    Chinese, Russian) were deleted at end of the line when using
    ollama run
* Wed May 29 2024 Eyad Issa <eyadlorenzo@gmail.com>
  - Update to version 0.1.39:
    * New model: Cohere Aya 23: A new state-of-the-art, multilingual
    LLM covering 23 different languages.
    * New model: Mistral 7B 0.3: A new version of Mistral 7B with
    initial support for function calling.
    * New model: Phi-3 Medium: a 14B parameters, lightweight,
    state-of-the-art open model by Microsoft.
    * New model: Phi-3 Mini 128K and Phi-3 Medium 128K: versions of
    the Phi-3 models that support a context window size of 128K
    * New model: Granite code: A family of open foundation models by
    IBM for Code Intelligence
    * It is now possible to import and quantize Llama 3 and its
    finetunes from Safetensors format to Ollama.
    * Full changelog at
    https://github.com/ollama/ollama/releases/tag/v0.1.39
* Wed May 22 2024 Eyad Issa <eyadlorenzo@gmail.com>
  - Added 15.6 build
* Thu May 16 2024 Eyad Issa <eyadlorenzo@gmail.com>
  - Update to version 0.1.38:
    * New model: Falcon 2: A new 11B parameters causal decoder-only
      model built by TII and trained over 5T tokens.
    * New model: Yi 1.5: A new high-performing version of Yi, now
      licensed as Apache 2.0. Available in 6B, 9B and 34B sizes.
    * Added ollama ps command
    * Added /clear command
    * Fixed issue where switching loaded models on Windows would take
      several seconds
    * Running /save will no longer abort the chat session if an
      incorrect name is provided
    * The /api/tags API endpoint will now correctly return an empty
      list [] instead of null if no models are provided
* Sun May 12 2024 Eyad Issa <eyadlorenzo@gmail.com>
  - Update to version 0.1.37:
    * Fixed issue where models with uppercase characters in the name
      would not show with ollama list
    * Fixed usage string for ollama create
    * Fix finish_reason being "" instead of null in the
      OpenAI-compatible chat API.
* Sun May 12 2024 Eyad Issa <eyadlorenzo@gmail.com>
  - Use obs_scm service instead of the deprecated tar_scm
  - Use zstd for vendor tarball compression
* Sun May 12 2024 Eyad Issa <eyadlorenzo@gmail.com>
  - Update to version 0.1.36:
    * Fixed exit status 0xc0000005 error with AMD graphics cards on Windows
    * Fixed rare out of memory errors when loading a model to run with CPU
  - Update to version 0.1.35:
    * New models: Llama 3 ChatQA: A model from NVIDIA based on Llama
      3 that excels at conversational question answering (QA) and
      retrieval-augmented generation (RAG).
    * Quantization: ollama create can now quantize models when
      importing them using the --quantize or -q flag
    * Fixed issue where inference subprocesses wouldn't be cleaned up
      on shutdown.
    * Fixed a series of out of memory errors when loading models on
      multi-GPU systems
    * Ctrl+J characters will now properly add newlines in ollama run
    * Fixed issues when running ollama show for vision models
    * OPTIONS requests to the Ollama API will no longer result in
      errors
    * Fixed issue where partially downloaded files wouldn't be
      cleaned up
    * Added a new done_reason field in responses describing why
      generation stopped
    * Ollama will now more accurately estimate how much memory
      is available on multi-GPU systems especially when running
      different models one after another
  - Update to version 0.1.34:
    * New model: Llava Llama 3
    * New model: Llava Phi 3
    * New model: StarCoder2 15B Instruct
    * New model: CodeGemma 1.1
    * New model: StableLM2 12B
    * New model: Moondream 2
    * Fixed issues with LLaVa models where they would respond
      incorrectly after the first request
    * Fixed out of memory errors when running large models such as
      Llama 3 70B
    * Fixed various issues with Nvidia GPU discovery on Linux and
      Windows
    * Fixed a series of Modelfile errors when running ollama create
    * Fixed no slots available error that occurred when cancelling a
      request and then sending follow up requests
    * Improved AMD GPU detection on Fedora
    * Improved reliability when using the experimental
      OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED flags
    * ollama serve will now shut down quickly, even if a model is
      loading
  - Update to version 0.1.33:
    * New model: Llama 3
    * New model: Phi 3 Mini
    * New model: Moondream
    * New model: Llama 3 Gradient 1048K
    * New model: Dolphin Llama 3
    * New model: Qwen 110B
    * Fixed issues where the model would not terminate, causing the
      API to hang.
    * Fixed a series of out of memory errors on Apple Silicon Macs
    * Fixed out of memory errors when running Mixtral architecture
      models
    * Added experimental concurrency features:
      ~ OLLAMA_NUM_PARALLEL: Handle multiple requests simultaneously
      for a single model
      ~ OLLAMA_MAX_LOADED_MODELS: Load multiple models simultaneously
* Tue Apr 23 2024 rrahl0@disroot.org
  - Update to version 0.1.32:
    * scale graph based on gpu count
    * Support unicode characters in model path (#3681)
    * darwin: no partial offloading if required memory greater than system
    * update llama.cpp submodule to `7593639` (#3665)
    * fix padding in decode
    * Revert "cmd: provide feedback if OLLAMA_MODELS is set on non-serve command (#3470)" (#3662)
    * Added Solar example at README.md (#3610)
    * Update langchainjs.md (#2030)
    * Added MindsDB information (#3595)
    * examples: add more Go examples using the API (#3599)
    * Update modelfile.md
    * Add llama2 / torch models for `ollama create` (#3607)
    * Terminate subprocess if receiving `SIGINT` or `SIGTERM` signals while model is loading (#3653)
    * app: gracefully shut down `ollama serve` on windows (#3641)
    * types/model: add path helpers (#3619)
    * update llama.cpp submodule to `4bd0f93` (#3627)
    * types/model: make ParseName variants less confusing (#3617)
    * types/model: remove (*Digest).Scan and Digest.Value (#3605)
    * Fix rocm deps with new subprocess paths
    * mixtral mem
    * Revert "types/model: remove (*Digest).Scan and Digest.Value (#3589)"
    * types/model: remove (*Digest).Scan and Digest.Value (#3589)
    * types/model: remove DisplayLong (#3587)
    * types/model: remove MarshalText/UnmarshalText from Digest (#3586)
    * types/model: init with Name and Digest types (#3541)
    * server: provide helpful workaround hint when stalling on pull (#3584)
    * partial offloading
    * refactor tensor query
    * api: start adding documentation to package api (#2878)
    * examples: start adding Go examples using api/ (#2879)
    * Handle very slow model loads
    * fix: rope
    * Revert "build.go: introduce a friendlier way to build Ollama (#3548)" (#3564)
    * build.go: introduce a friendlier way to build Ollama (#3548)
    * update llama.cpp submodule to `1b67731` (#3561)
    * ci: use go-version-file
    * Correct directory reference in macapp/README (#3555)
    * cgo quantize
    * no blob create if already exists
    * update generate scripts with new `LLAMA_CUDA` variable, set `HIP_PLATFORM` to avoid compiler errors (#3528)
    * Docs: Remove wrong parameter for Chat Completion (#3515)
    * no rope parameters
    * add command-r graph estimate
    * Fail fast if mingw missing on windows
    * use an older version of the mac os sdk in release (#3484)
    * Add test case for context exhaustion
    * CI missing archive
    * fix dll compress in windows building
    * CI subprocess path fix
    * Fix CI release glitches
    * update graph size estimate
    * Fix macOS builds on older SDKs (#3467)
    * cmd: provide feedback if OLLAMA_MODELS is set on non-serve command (#3470)
    * feat: add OLLAMA_DEBUG in ollama server help message (#3461)
    * Revert options as a ref in the server
    * default head_kv to 1
    * fix metal gpu
    * Bump to b2581
    * Refined min memory from testing
    * Release gpu discovery library after use
    * Safeguard for noexec
    * Detect too-old cuda driver
    * Integration test improvements
    * Apply 01-cache.diff
    * Switch back to subprocessing for llama.cpp
    * Simplify model conversion (#3422)
    * fix generate output
    * update memory calculations
    * refactor model parsing
    * Add chromem-go to community integrations (#3437)
    * Update README.md (#3436)
    * Community Integration: CRAG Ollama Chat (#3423)
    * Update README.md (#3378)
    * Community Integration: ChatOllama (#3400)
    * Update 90_bug_report.yml
    * Add gemma safetensors conversion (#3250)
    * CI automation for tagging latest images
    * Bump ROCm to 6.0.2 patch release
    * CI windows gpu builds
    * Update troubleshooting link
    * fix: trim quotes on OLLAMA_ORIGINS
  - add set_version to automatically switch over to the newer version
* Tue Apr 16 2024 bwiedemann@suse.com
  - Update to version 0.1.31:
    * Backport MacOS SDK fix from main
    * Apply 01-cache.diff
    * fix: workflows
    * stub stub
    * mangle arch
    * only generate on changes to llm subdirectory
    * only generate cuda/rocm when changes to llm detected
    * Detect arrow keys on windows (#3363)
    * add license in file header for vendored llama.cpp code (#3351)
    * remove need for `$VSINSTALLDIR` since build will fail if `ninja` cannot be found (#3350)
    * change `github.com/jmorganca/ollama` to `github.com/ollama/ollama` (#3347)
    * malformed markdown link (#3358)
    * Switch runner for final release job
    * Use Rocky Linux Vault to get GCC 10.2 installed
    * Revert "Switch arm cuda base image to centos 7"
    * Switch arm cuda base image to centos 7
    * Bump llama.cpp to b2527
    * Fix ROCm link in `development.md`
    * adds ooo to community integrations (#1623)
    * Add cliobot to ollama supported list (#1873)
    * Add Dify.AI to community integrations (#1944)
    * enh: add ollero.nvim to community applications (#1905)
    * Add typechat-cli to Terminal apps (#2428)
    * add new Web & Desktop link in readme for alpaca webui (#2881)
    * Add LibreChat to Web & Desktop Apps (#2918)
    * Add Community Integration: OllamaGUI (#2927)
    * Add Community Integration: OpenAOE (#2946)
    * Add Saddle (#3178)
    * tlm added to README.md terminal section. (#3274)
    * Update README.md (#3288)
    * Update README.md (#3338)
    * Integration tests conditionally pull
    * add support for libcudart.so for CUDA devices (adds Jetson support)
    * llm: prevent race appending to slice (#3320)
    * Bump llama.cpp to b2510
    * Add Testcontainers into Libraries section (#3291)
    * Revamp go based integration tests
    * rename `.gitattributes`
    * Bump llama.cpp to b2474
    * Add docs for GPU selection and nvidia uvm workaround
    * doc: faq gpu compatibility (#3142)
    * Update faq.md
    * Better tmpdir cleanup
    * Update faq.md
    * update `faq.md`
    * dyn global
    * llama: remove server static assets (#3174)
    * add `llm/ext_server` directory to `linguist-vendored` (#3173)
    * Add Radeon gfx940-942 GPU support
    * Wire up more complete CI for releases
    * llm,readline: use errors.Is instead of simple == check (#3161)
    * server: replace blob prefix separator from ':' to '-' (#3146)
    * Add ROCm support to linux install script (#2966)
    * .github: fix model and feature request yml (#3155)
    * .github: add issue templates (#3143)
    * fix: clip memory leak
    * Update README.md
    * add `OLLAMA_KEEP_ALIVE` to environment variable docs for `ollama serve` (#3127)
    * Default Keep Alive environment variable (#3094); a request sketch follows this list
    * Use stdin for term discovery on windows
    * Update ollama.iss
    * restore locale patch (#3091)
    * token repeat limit for prediction requests (#3080)
    * Fix iGPU detection for linux
    * add more docs for the modelfile message command (#3087)
    * warn when json format is expected but not mentioned in prompt (#3081)
    * Adapt our build for imported server.cpp
    * Import server.cpp as of b2356
    * refactor readseeker
    * Add docs explaining GPU selection env vars
    * chore: fix typo (#3073)
    * fix gpu_info_cuda.c compile warning (#3077)
    * use `-trimpath` when building releases (#3069)
    * relay load model errors to the client (#3065)
    * Update troubleshooting.md
    * update llama.cpp submodule to `ceca1ae` (#3064)
    * convert: fix shape
    * Avoid rocm runner and dependency clash
    * fix `03-locale.diff`
    * Harden for deps file being empty (or short)
    * Add ollama executable peer dir for rocm
    * patch: use default locale in wpm tokenizer (#3034)
    * only copy deps for `amd64` in `build_linux.sh`
    * Rename ROCm deps file to avoid confusion (#3025)
    * add `macapp` to `.dockerignore`
    * add `bundle_metal` and `cleanup_metal` functions to `gen_darwin.sh`
    * tidy cleanup logs
    * update llama.cpp submodule to `77d1ac7` (#3030)
    * disable gpu for certain model architectures and fix divide-by-zero on memory estimation
    * Doc how to set up ROCm builds on windows
    * Finish unwinding idempotent payload logic
    * update llama.cpp submodule to `c2101a2` (#3020)
    * separate out `isLocalIP`
    * simplify host checks
    * add additional allowed hosts
    * Update docs `README.md` and table of contents
    * add allowed host middleware and remove `workDir` middleware (#3018)
    * decode ggla
    * convert: fix default shape
    * fix: allow importing a model from name reference (#3005)
    * update llama.cpp submodule to `6cdabe6` (#2999)
    * Update api.md
    * Revert "adjust download and upload concurrency based on available bandwidth" (#2995)
    * cmd: tighten up env var usage sections (#2962)
    * default terminal width, height
    * Refined ROCm troubleshooting docs
    * Revamp ROCm support
    * update go to 1.22 in other places (#2975)
    * docs: Add LLM-X to Web Integration section (#2759)
    * fix some typos (#2973)
    * Convert Safetensors to an Ollama model (#2824)
    * Allow setting max vram for workarounds
    * cmd: document environment variables for serve command
    * Add Odin Runes, a Feature-Rich Java UI for Ollama, to README (#2440)
    * Update api.md
    * Add NotesOllama to Community Integrations (#2909)
    * Added community link for Ollama Copilot (#2582)
    * use LimitGroup for uploads
    * adjust group limit based on download speed
    * add new LimitGroup for dynamic concurrency
    * refactor download run
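      The `OLLAMA_KEEP_ALIVE` items above set the server-wide default for how
      long a model stays loaded after a request. A minimal client-side sketch
      (Python, standard library only), assuming a local server on the default
      port 11434 and an already-pulled model; the model name, prompt, and
      five-minute value are illustrative, and the per-request `keep_alive`
      field is expected to override the server default.

        import json
        import urllib.request

        # One non-streaming generate request against a local Ollama server,
        # asking it to keep the model loaded for five minutes afterwards.
        payload = {
            "model": "llama2",   # any model that has already been pulled locally
            "prompt": "Hello",
            "stream": False,
            "keep_alive": "5m",  # per-request override of the keep-alive default
        }
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            print(json.loads(resp.read())["response"])
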
* Wed Mar 06 2024 computersemiexpert@outlook.com
  - Update to version 0.1.28:
    * Fix embeddings load model behavior (#2848)
    * Add Community Integration: NextChat (#2780)
    * prepend image tags (#2789)
    * fix: print usedMemory size right (#2827)
    * bump submodule to `87c91c07663b707e831c59ec373b5e665ff9d64a` (#2828)
    * Add ollama user to video group
    * Add env var so podman will map cuda GPUs
* Tue Feb 27 2024 Jan Engelhardt <jengelh@inai.de>
  - Edit description, answer _what_ the package is and use nominal
    phrase. (https://en.opensuse.org/openSUSE:Package_description_guidelines)
* Fri Feb 23 2024 Loren Burkholder <computersemiexpert@outlook.com>
  - Added the Ollama package
  - Included a systemd service

Files

/usr/lib64/ollama/libggml-vulkan.so

