| Name: python312-tokenizers | Distribution: openSUSE Tumbleweed |
| Version: 0.21.4 | Vendor: openSUSE |
| Release: 1.1 | Build date: Tue Jul 29 17:12:29 2025 |
| Group: Unspecified | Build host: reproducible |
| Size: 7617437 | Source RPM: python-tokenizers-0.21.4-1.1.src.rpm |
| Packager: https://bugs.opensuse.org |
| Url: https://github.com/huggingface/tokenizers |
| Summary: Provides an implementation of today's most used tokenizers |
Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.
* Train new vocabularies and tokenize, using today's most used tokenizers.
* Extremely fast (both training and tokenization), thanks to the Rust implementation. Takes less than 20 seconds to tokenize a GB of text on a server's CPU.
* Easy to use, but also extremely versatile.
* Designed for research and production.
* Normalization comes with alignment tracking. It's always possible to get the part of the original sentence that corresponds to a given token.
* Does all the pre-processing: truncate, pad, and add the special tokens your model needs.
License: Apache-2.0
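To make the feature list above concrete, here is a minimal usage sketch of the Python API this package provides, following the upstream quick-tour: it trains a small BPE vocabulary and encodes a sentence, with offsets giving the alignment back to the original text. The corpus file name "corpus.txt" is a placeholder assumption, not a file shipped by this package.

    from tokenizers import Tokenizer
    from tokenizers.models import BPE
    from tokenizers.pre_tokenizers import Whitespace
    from tokenizers.trainers import BpeTrainer

    # Build a BPE tokenizer with a whitespace pre-tokenizer.
    tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
    tokenizer.pre_tokenizer = Whitespace()

    # Train a new vocabulary; "corpus.txt" is a placeholder for any text file.
    trainer = BpeTrainer(special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"])
    tokenizer.train(files=["corpus.txt"], trainer=trainer)

    # Encode a sentence; offsets map each token back to the original string.
    encoding = tokenizer.encode("Hello, how are you?")
    print(encoding.tokens)
    print(encoding.offsets)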
* Tue Jul 29 2025 John Paul Adrian Glaubitz <adrian.glaubitz@suse.com>
- Update to 0.21.4
* No change, the 0.21.3 release failed, this is just a re-release
- from version 0.21.3
* Clippy fixes
* Fixed an introduced backward breaking change in our Rust APIs.
- from version 0.21.2
* Update the release builds following 0.21.1
* Replace lazy_static with std::sync::LazyLock (stabilized in Rust 1.80)
* Fix no-onig no-wasm builds
* Fix typos in strings and comments
* Fix type notation of merges in BPE Python binding
* Bump http-proxy-middleware from 2.0.6 to 2.0.9 in /tokenizers/examples/unstable_wasm/www
* Fix data path in test_continuing_prefix_trainer_mismatch
* clippy by @ArthurZucker
* Update pyo3 and rust-numpy dependencies for no-gil/free-threading compatibility
* Use ApiBuilder::from_env() in from_pretrained function
* Upgrade onig, to get it compiling with GCC 15
* Itertools upgrade
* Bump webpack-dev-server from 4.10.0 to 5.2.1 in /tokenizers/examples/unstable_wasm/www
* Bump brace-expansion from 1.1.11 to 1.1.12 in /bindings/node
* Fix features blending into a paragraph
* Adding throughput to benches to have a more consistent measure across
* Upgrading dependencies
* [docs] Whitespace
* Hotfixing the stub
* Bpe clones
* Fixed Length Pre-Tokenizer
* Consolidated optimization ahash dary compact str
* Breaking: Fix training with special tokens
* Wed Mar 19 2025 Lucas Mulling <lucas.mulling@suse.com>
- Update to 0.21.1:
* Update dev version and pyproject.toml
* Add feature flag hint to README.md
* Upgrade to PyO3 0.23
* Fixing the README.md
* Fix typo in Split docstrings
* Fix typos
* Update documentation of Rust feature
* Fix panic in DecodeStream::step due to incorrect index usage
* Fixing the stream by removing the read_index altogether
* Fixing NormalizedString append when normalized is empty
* Update metadata as Python3.7 and Python3.8 support was dropped
* Add rustls-tls feature
- Remove define skip_python313 1
* Wed Mar 05 2025 Christian Goll <cgoll@suse.com>
- Disable the Python 3.13 build
* Thu Jan 09 2025 Andreas Schwab <schwab@suse.de>
- Enable build on riscv64
* Wed Dec 18 2024 Soc Virnyl Estela <uncomfyhalomacro@opensuse.org>
- Update to version 0.21.0:
* More cache options.
* Disable caching for long strings.
* Testing ABI3 wheels to reduce number of wheels
* Adding an API for decode streaming (a usage sketch follows this entry).
* Decode stream python
* Fix encode_batch and encode_batch_fast to accept ndarrays again
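As a rough illustration of the decode-streaming API mentioned in the 0.21.0 entry above, here is a hedged sketch based on the upstream release notes; the model name and input text are placeholder assumptions, and the exact signatures should be verified against the installed version.

    from tokenizers import Tokenizer
    from tokenizers.decoders import DecodeStream

    # Placeholder model; Tokenizer.from_pretrained fetches tokenizer.json from the Hub.
    tokenizer = Tokenizer.from_pretrained("bert-base-uncased")

    stream = DecodeStream(skip_special_tokens=True)
    # Feed token ids one at a time; step() returns the newly decoded text,
    # or None when more ids are needed before anything can be emitted.
    for token_id in tokenizer.encode("Hello world").ids:
        chunk = stream.step(tokenizer, token_id)
        if chunk is not None:
            print(chunk, end="")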
* Thu Nov 07 2024 Soc Virnyl Estela <uncomfyhalomacro@opensuse.org>
- Select only Rust tier 1 architectures.
- Update registry.tar.zst dependencies
- Update version to 0.20.3:
* fix pylist
* [MINOR:TYP] Fix docstrings
- Updates from 0.20.2:
* Bump cookie and express in /tokenizers/examples/unstable_wasm/www
* Fix off-by-one error in tokenizer::normalizer::Range::len
* Arg name correction: auth_token -> token
* Unsound call of set_var
* Add safety comments
* PyO3 0.22
- Updates from 0.20.1:
* Update README.md
* fix benchmark file link
* [ignore_merges] Fix offsets
* Bump body-parser and express in /tokenizers/examples/unstable_wasm/www
* Bump serve-static and express in /tokenizers/examples/unstable_wasm/www
* Bump send and express in /tokenizers/examples/unstable_wasm/www
* Bump webpack from 5.76.0 to 5.95.0 in /tokenizers/examples/unstable_wasm/www
* Fix documentation build
* style: simplify string formatting for readability
* Sun Nov 03 2024 Soc Virnyl Estela <uncomfyhalomacro@opensuse.org>
- Experiment with cargo vendor home registry. See documentation:
https://github.com/openSUSE-Rust/obs-service-cargo/blob/master/README.md#cargo-vendor-home-registry
* Mon Sep 23 2024 Simon Lees <sflees@suse.de>
- Don't use macros for Requires
* Fri Aug 30 2024 Simon Lees <sflees@suse.de>
- Update package name back to "huggingface-hub" to match PyPI
* Tue Aug 27 2024 Guang Yee <gyee@suse.com>
- Update package name "huggingface-hub" to "huggingface_hub"
* Tue Aug 20 2024 Simon Lees <sflees@suse.de>
- Fix testsuite on 15.6
* Sun Aug 18 2024 Soc Virnyl Estela <obs@uncomfyhalomacro.pl>
- Replace vendor tarball with a zstd-compressed vendor tarball
- Force gcc version on Leap. Thanks @marv7000 for your zed.spec
- Use `CARGO_*` environment variables to force generation of full debuginfo and avoid stripping.
- Enable cargo test in %check.
- Update to version 0.20.0:
* remove enforcement of non special when adding tokens
* [BREAKING CHANGE] Ignore added_tokens (both special and not) in the decoder
* Make USED_PARALLELISM atomic
* Fixing for clippy 1.78
* feat(ci): add trufflehog secrets detection
* Switch from cached_download to hf_hub_download in tests
* Fix "dictionnary" typo
* make sure we don't warn on empty tokens
* Enable dropout = 0.0 as an equivalent to none in BPE
* Revert "[BREAKING CHANGE] Ignore added_tokens (both special and not) …
* Add bytelevel normalizer to fix decode when adding tokens to BPE
* Fix clippy + feature test management.
* Bump spm_precompiled to 0.1.3
* Add benchmark vs tiktoken
* Fixing the benchmark.
* Tiny improvement
* Enable fancy regex
* Fixing release CI strict (taken from safetensors).
* Adding some serialization testing around the wrapper.
* Add-legacy-tests
* Adding a few tests for decoder deserialization.
* Better serialization error
* Add test normalizers
* Improve decoder deserialization
* Using serde (serde_pyo3) to get str and repr easily.
* Merges cannot handle tokens containing spaces.
* Fix doc about split
* Support None to reset pre_tokenizers and normalizers, and index sequences
* Fix strip python type
* Tests + Deserialization improvement for normalizers.
* add deserialize for pre tokenizers
* Perf improvement 16% by removing offsets.
* Wed Jul 03 2024 Christian Goll <cgoll@suse.com>
- Initial commit of the Rust-based python-tokenizers package
/usr/lib64/python3.12/site-packages/tokenizers
/usr/lib64/python3.12/site-packages/tokenizers-0.21.4.dist-info
/usr/lib64/python3.12/site-packages/tokenizers-0.21.4.dist-info/INSTALLER
/usr/lib64/python3.12/site-packages/tokenizers-0.21.4.dist-info/METADATA
/usr/lib64/python3.12/site-packages/tokenizers-0.21.4.dist-info/RECORD
/usr/lib64/python3.12/site-packages/tokenizers-0.21.4.dist-info/REQUESTED
/usr/lib64/python3.12/site-packages/tokenizers-0.21.4.dist-info/WHEEL
/usr/lib64/python3.12/site-packages/tokenizers/__init__.py
/usr/lib64/python3.12/site-packages/tokenizers/__init__.pyi
/usr/lib64/python3.12/site-packages/tokenizers/__pycache__
/usr/lib64/python3.12/site-packages/tokenizers/__pycache__/__init__.cpython-312.opt-1.pyc
/usr/lib64/python3.12/site-packages/tokenizers/__pycache__/__init__.cpython-312.pyc
/usr/lib64/python3.12/site-packages/tokenizers/decoders
/usr/lib64/python3.12/site-packages/tokenizers/decoders/__init__.py
/usr/lib64/python3.12/site-packages/tokenizers/decoders/__init__.pyi
/usr/lib64/python3.12/site-packages/tokenizers/decoders/__pycache__
/usr/lib64/python3.12/site-packages/tokenizers/decoders/__pycache__/__init__.cpython-312.opt-1.pyc
/usr/lib64/python3.12/site-packages/tokenizers/decoders/__pycache__/__init__.cpython-312.pyc
/usr/lib64/python3.12/site-packages/tokenizers/implementations
/usr/lib64/python3.12/site-packages/tokenizers/implementations/__init__.py
/usr/lib64/python3.12/site-packages/tokenizers/implementations/__pycache__
/usr/lib64/python3.12/site-packages/tokenizers/implementations/__pycache__/__init__.cpython-312.opt-1.pyc
/usr/lib64/python3.12/site-packages/tokenizers/implementations/__pycache__/__init__.cpython-312.pyc
/usr/lib64/python3.12/site-packages/tokenizers/implementations/__pycache__/base_tokenizer.cpython-312.opt-1.pyc
/usr/lib64/python3.12/site-packages/tokenizers/implementations/__pycache__/base_tokenizer.cpython-312.pyc
/usr/lib64/python3.12/site-packages/tokenizers/implementations/__pycache__/bert_wordpiece.cpython-312.opt-1.pyc
/usr/lib64/python3.12/site-packages/tokenizers/implementations/__pycache__/bert_wordpiece.cpython-312.pyc
/usr/lib64/python3.12/site-packages/tokenizers/implementations/__pycache__/byte_level_bpe.cpython-312.opt-1.pyc
/usr/lib64/python3.12/site-packages/tokenizers/implementations/__pycache__/byte_level_bpe.cpython-312.pyc
/usr/lib64/python3.12/site-packages/tokenizers/implementations/__pycache__/char_level_bpe.cpython-312.opt-1.pyc
/usr/lib64/python3.12/site-packages/tokenizers/implementations/__pycache__/char_level_bpe.cpython-312.pyc
/usr/lib64/python3.12/site-packages/tokenizers/implementations/__pycache__/sentencepiece_bpe.cpython-312.opt-1.pyc
/usr/lib64/python3.12/site-packages/tokenizers/implementations/__pycache__/sentencepiece_bpe.cpython-312.pyc
/usr/lib64/python3.12/site-packages/tokenizers/implementations/__pycache__/sentencepiece_unigram.cpython-312.opt-1.pyc
/usr/lib64/python3.12/site-packages/tokenizers/implementations/__pycache__/sentencepiece_unigram.cpython-312.pyc
/usr/lib64/python3.12/site-packages/tokenizers/implementations/base_tokenizer.py
/usr/lib64/python3.12/site-packages/tokenizers/implementations/bert_wordpiece.py
/usr/lib64/python3.12/site-packages/tokenizers/implementations/byte_level_bpe.py
/usr/lib64/python3.12/site-packages/tokenizers/implementations/char_level_bpe.py
/usr/lib64/python3.12/site-packages/tokenizers/implementations/sentencepiece_bpe.py
/usr/lib64/python3.12/site-packages/tokenizers/implementations/sentencepiece_unigram.py
/usr/lib64/python3.12/site-packages/tokenizers/models
/usr/lib64/python3.12/site-packages/tokenizers/models/__init__.py
/usr/lib64/python3.12/site-packages/tokenizers/models/__init__.pyi
/usr/lib64/python3.12/site-packages/tokenizers/models/__pycache__
/usr/lib64/python3.12/site-packages/tokenizers/models/__pycache__/__init__.cpython-312.opt-1.pyc
/usr/lib64/python3.12/site-packages/tokenizers/models/__pycache__/__init__.cpython-312.pyc
/usr/lib64/python3.12/site-packages/tokenizers/normalizers
/usr/lib64/python3.12/site-packages/tokenizers/normalizers/__init__.py
/usr/lib64/python3.12/site-packages/tokenizers/normalizers/__init__.pyi
/usr/lib64/python3.12/site-packages/tokenizers/normalizers/__pycache__
/usr/lib64/python3.12/site-packages/tokenizers/normalizers/__pycache__/__init__.cpython-312.opt-1.pyc
/usr/lib64/python3.12/site-packages/tokenizers/normalizers/__pycache__/__init__.cpython-312.pyc
/usr/lib64/python3.12/site-packages/tokenizers/pre_tokenizers
/usr/lib64/python3.12/site-packages/tokenizers/pre_tokenizers/__init__.py
/usr/lib64/python3.12/site-packages/tokenizers/pre_tokenizers/__init__.pyi
/usr/lib64/python3.12/site-packages/tokenizers/pre_tokenizers/__pycache__
/usr/lib64/python3.12/site-packages/tokenizers/pre_tokenizers/__pycache__/__init__.cpython-312.opt-1.pyc
/usr/lib64/python3.12/site-packages/tokenizers/pre_tokenizers/__pycache__/__init__.cpython-312.pyc
/usr/lib64/python3.12/site-packages/tokenizers/processors
/usr/lib64/python3.12/site-packages/tokenizers/processors/__init__.py
/usr/lib64/python3.12/site-packages/tokenizers/processors/__init__.pyi
/usr/lib64/python3.12/site-packages/tokenizers/processors/__pycache__
/usr/lib64/python3.12/site-packages/tokenizers/processors/__pycache__/__init__.cpython-312.opt-1.pyc
/usr/lib64/python3.12/site-packages/tokenizers/processors/__pycache__/__init__.cpython-312.pyc
/usr/lib64/python3.12/site-packages/tokenizers/tokenizers.abi3.so
/usr/lib64/python3.12/site-packages/tokenizers/tools
/usr/lib64/python3.12/site-packages/tokenizers/tools/__init__.py
/usr/lib64/python3.12/site-packages/tokenizers/tools/__pycache__
/usr/lib64/python3.12/site-packages/tokenizers/tools/__pycache__/__init__.cpython-312.opt-1.pyc
/usr/lib64/python3.12/site-packages/tokenizers/tools/__pycache__/__init__.cpython-312.pyc
/usr/lib64/python3.12/site-packages/tokenizers/tools/__pycache__/visualizer.cpython-312.opt-1.pyc
/usr/lib64/python3.12/site-packages/tokenizers/tools/__pycache__/visualizer.cpython-312.pyc
/usr/lib64/python3.12/site-packages/tokenizers/tools/visualizer-styles.css
/usr/lib64/python3.12/site-packages/tokenizers/tools/visualizer.py
/usr/lib64/python3.12/site-packages/tokenizers/trainers
/usr/lib64/python3.12/site-packages/tokenizers/trainers/__init__.py
/usr/lib64/python3.12/site-packages/tokenizers/trainers/__init__.pyi
/usr/lib64/python3.12/site-packages/tokenizers/trainers/__pycache__
/usr/lib64/python3.12/site-packages/tokenizers/trainers/__pycache__/__init__.cpython-312.opt-1.pyc
/usr/lib64/python3.12/site-packages/tokenizers/trainers/__pycache__/__init__.cpython-312.pyc
/usr/share/doc/packages/python312-tokenizers
/usr/share/doc/packages/python312-tokenizers/README.md
/usr/share/licenses/python312-tokenizers
/usr/share/licenses/python312-tokenizers/LICENSE