Commit Graph

39 Commits

Author SHA1 Message Date
Max Bachmann bbb898475f release v2.4.0 2022-07-29 13:54:49 +02:00
Hugo Le Moine 9886cba1b9 Fixed broken link 2022-02-11 12:24:20 +01:00
Max Bachmann 0f23bdbe5e Finalize v2.0.0 2022-02-09 01:06:31 +01:00
Max Bachmann 4eeeb0bc6a validate lists passed to Editops/Opcodes 2022-01-25 06:28:48 +01:00
Max Bachmann a90d6a736b add multiprocessing to cdist 2021-09-26 21:44:23 +02:00
Max Bachmann a9e7bd703f add back legacy python support (#122) 2021-09-11 12:25:31 +02:00
Max Bachmann 1aed654d4f improve performance of partial_ratio (#121) 2021-09-10 02:08:08 +02:00
Max Bachmann c41abbfe1c Update documentation to clone submodule 2021-08-31 13:51:37 +02:00
Max Bachmann 9fd6d08655 add levenshtein_editops 2021-08-21 03:08:50 +02:00
Kwuang Tang b9933eaed7 Add explanation of different output with fuzzywuzzy (#115) 2021-08-14 23:41:31 +02:00
Max Bachmann 3e1776ccd4 update benchmarks 2021-03-23 07:50:04 +01:00
Max Bachmann bfd968b606 drop Python2.7 support 2021-03-08 21:11:04 +01:00
Max Bachmann 2d80120b21 add more benchmarks to documentation 2021-03-07 17:50:39 +01:00
Max Bachmann 5383d286b2 Release v1.1.0 (#75)
## Changed
- string_metric.normalized_levenshtein now supports all weights
- when different weights are used for insertion and deletion, the strings can no longer be swapped inside the Levenshtein implementation, so different weights for insertion and deletion are now supported.
- replace C++ implementation with a Cython implementation. This has the following advantages:
  - The implementation is less error-prone, since a lot of the complex things are done by Cython
  - slightly faster than the current implementation (up to 10% for some parts)
  - about 33% smaller binary size
  - reduced compile time
- Added **kwargs argument to process.extract/extractOne/extract_iter that is passed to the scorer
- Add max argument to hamming distance
- Add support for whole Unicode range to utils.default_process

## Performance
- replaced Wagner Fischer usage in the normal Levenshtein distance with a bitparallel implementation
2021-02-21 19:42:36 +01:00
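The note about insertion and deletion weights in the v1.1.0 entry can be illustrated with a small, self-contained sketch. It is a plain Wagner-Fischer implementation written for illustration only, not the library's code; the function name `weighted_levenshtein` and the weight order (insertion, deletion, substitution) are assumptions made for this example.

```python
def weighted_levenshtein(s1, s2, weights=(1, 1, 1)):
    """Cost of turning s1 into s2 (illustrative sketch, not the library code).

    weights is assumed to be (insertion, deletion, substitution).
    """
    insert_cost, delete_cost, replace_cost = weights

    # prev[j] holds the cost of turning the processed prefix of s1 into s2[:j].
    # For the empty prefix that is j insertions.
    prev = [j * insert_cost for j in range(len(s2) + 1)]

    for ch1 in s1:
        # Turning a non-empty prefix of s1 into "" takes one more deletion.
        curr = [prev[0] + delete_cost]
        for j, ch2 in enumerate(s2, start=1):
            substitute = prev[j - 1] + (0 if ch1 == ch2 else replace_cost)
            curr.append(min(
                prev[j] + delete_cost,      # delete ch1
                curr[j - 1] + insert_cost,  # insert ch2
                substitute,                 # keep or replace
            ))
        prev = curr

    return prev[-1]


# With insertion != deletion the result depends on the argument order,
# which is why the two strings cannot simply be swapped internally.
forward = weighted_levenshtein("ab", "abc", weights=(1, 2, 1))   # one insertion
backward = weighted_levenshtein("abc", "ab", weights=(1, 2, 1))  # one deletion
print(forward, backward)
```

With uniform weights the distance is symmetric, so an implementation may swap its inputs (for example to iterate over the shorter string); once insertion and deletion costs differ, that shortcut changes the result.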
Max Bachmann 0e6466d835 rename master branch to main 2021-02-14 15:00:57 +01:00
Max Bachmann 375c13e436 Release v1.0.0 (#68)
- all normalized string_metrics can now be used as scorer for process.extract/extractOne
- Implementation of the C++ Wrapper completely refactored to make it easier to add more scorers, processors and string matching algorithms in the future.
- increased test coverage, which already helped to fix some bugs and will help to prevent regressions in the future
- improved docstrings of functions

- Added bitparallel implementation of the Levenshtein distance for the weights (1,1,1) and (1,1,2).
- Added specialized implementation of the Levenshtein distance for cases with a small maximum edit distance, which is even faster than the bitparallel implementation.
- Improved performance of `fuzz.partial_ratio`
-> Since `fuzz.ratio` and `fuzz.partial_ratio` are used in most scorers, this improves the overall performance.
- Improved performance of `process.extract` and `process.extractOne`

- the `rapidfuzz.levenshtein` module is now deprecated and will be removed in v2.0.0
  These functions are now placed in `rapidfuzz.string_metric`. `distance`, `normalized_distance`, `weighted_distance` and `weighted_normalized_distance` are combined into `levenshtein` and `normalized_levenshtein`.

- added normalized version of the hamming distance in `string_metric.normalized_hamming`
- added process.extract_iter as a generator that yields the similarity of all elements with a similarity >= score_cutoff

- fixed multiple bugs in extractOne when used with a scorer that is not from RapidFuzz
- fixed bug in `token_ratio`
- fixed bug in result normalisation causing zero division
2021-02-12 16:48:10 +01:00
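The bitparallel Levenshtein distance mentioned in the v1.0.0 entry can be sketched in pure Python for the uniform weights (1,1,1). The following is a minimal illustration of the well-known Myers/Hyyrö bit-vector technique, not the library's implementation; the function name `levenshtein_bitparallel` is hypothetical, and Python's arbitrary-size integers stand in for the fixed-width machine words a real implementation would use.

```python
def levenshtein_bitparallel(s1, s2):
    """Uniform-cost Levenshtein distance using bit vectors (illustrative sketch)."""
    m = len(s1)
    if m == 0:
        return len(s2)

    mask = (1 << m) - 1   # keep only the m bits that represent s1
    last = 1 << (m - 1)   # bit of the last row of the DP matrix

    # peq[ch] has bit i set when s1[i] == ch.
    peq = {}
    for i, ch in enumerate(s1):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    vp = mask  # vertical positive deltas: the first column is 0, 1, 2, ...
    vn = 0     # vertical negative deltas
    dist = m   # value of the bottom-left cell

    for ch in s2:
        pm = peq.get(ch, 0)
        # d0 marks cells whose diagonal delta is zero.
        d0 = ((((pm & vp) + vp) ^ vp) | pm | vn) & mask
        hp = (vn | ~(d0 | vp)) & mask  # horizontal delta +1
        hn = d0 & vp                   # horizontal delta -1

        # Follow the bottom row of the matrix one column to the right.
        if hp & last:
            dist += 1
        if hn & last:
            dist -= 1

        # Shift in the +1 delta of the top row, then derive the new column.
        hp = ((hp << 1) | 1) & mask
        hn = (hn << 1) & mask
        vp = (hn | ~(d0 | hp)) & mask
        vn = hp & d0

    return dist


print(levenshtein_bitparallel("kitten", "sitting"))
```

Each character of `s2` is handled with a constant number of word operations instead of one inner loop iteration per character of `s1`, which is where the speedup over the row-by-row Wagner-Fischer approach comes from.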
Benjamin-1111 fa2eca63aa fix links pointing to the rapidfuzz repository (#69)
Co-authored-by: BenjaminBachmann <63192116+BenjaminBachmann@users.noreply.github.com>
Co-authored-by: Max Bachmann <kontakt@maxbachmann.de>
2021-02-10 12:24:48 +01:00
maxbachmann 91694448cf update return types of processors 2020-11-16 17:40:31 +01:00
maxbachmann 3712ba4a87 update installation guide 2020-09-30 13:25:31 +02:00
Max Bachmann dc635e0046 add newline 2020-05-27 14:18:54 +02:00
maxbachmann bbf2de840e add documentation 2020-05-27 14:16:12 +02:00
maxbachmann 044fd229a9 fix performance degradation and use the same interface everywhere 2020-04-09 09:32:29 +02:00
maxbachmann f8580465d3 start work on a rust version of rapidfuzz 2020-04-08 12:44:57 +02:00
maxbachmann cae67851e5 update workflow badge 2020-04-04 20:08:01 +02:00
maxbachmann 13d313cd9f add conda badge 2020-04-04 17:30:51 +02:00
maxbachmann 839e19a359 release 0.6.0 2020-04-04 06:15:37 +02:00
maxbachmann 84e7b2283a adjust ci build 2020-04-01 00:39:46 +02:00
maxbachmann 54609c7508 add benchmark 2020-03-30 21:40:20 +02:00
Pablo Marti 91f549c169 README.md: Fix a couple of typos 2020-03-30 14:09:52 +02:00
maxbachmann 93b7b1cc4a add section about PyInstaller (#9) 2020-03-29 15:25:09 +02:00
maxbachmann 0b81415484 release version 0.3.0
- when score_cutoff is used, the function is now guaranteed to return 0 if the result is < score_cutoff
- each function now has a preprocess argument to specify whether strings should be preprocessed
- the default preprocessing now lowercases and trims the strings
- extract and extractOne now accept a custom processor method instead of utils.default_processor

- QRatio was removed since it now does exactly the same as the normal ratio
2020-03-26 17:23:06 +01:00
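The score_cutoff guarantee and the preprocess argument described in the 0.3.0 entry can be sketched as follows. This is a stand-in built on the standard library's `difflib.SequenceMatcher`, not the scorer the library uses; the names `default_process` and `ratio` and their signatures are assumptions made for this example.

```python
from difflib import SequenceMatcher


def default_process(s):
    """Hypothetical default preprocessing: lowercase and trim."""
    return s.lower().strip()


def ratio(s1, s2, score_cutoff=0, preprocess=True):
    """Similarity in the range 0..100 (stand-in scorer, not the library's)."""
    if preprocess:
        s1, s2 = default_process(s1), default_process(s2)

    score = 100 * SequenceMatcher(None, s1, s2).ratio()

    # Guarantee: anything below the cutoff is reported as 0.
    return score if score >= score_cutoff else 0


print(ratio("  Hello ", "hello"))                    # preprocessing makes these equal
print(ratio("  Hello ", "hello", preprocess=False))  # compared as-is
print(ratio("abcd", "abxy", score_cutoff=60))        # below the cutoff, so 0
```

Returning exactly 0 below the cutoff lets callers filter matches with a single comparison and lets a scorer stop early once it knows the cutoff cannot be reached.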
maxbachmann e9e5732653 lowercase strings before processing 2020-03-22 14:45:08 +01:00
maxbachmann 439db23e28 add token_set and token_sort methods 2020-03-20 18:19:59 +01:00
maxbachmann 628e434fbe add partial_ratio to interface 2020-03-20 16:26:12 +01:00
maxbachmann 3a46dbdc89 add limit argument to extract 2020-03-19 11:51:50 +01:00
maxbachmann 734f2a1d0e update paths for the rhasspy organisation 2020-03-18 21:36:02 +01:00
maxbachmann e157e11fa7 complete basic implementation of rapidfuzz 2020-03-18 21:34:32 +01:00
maxbachmann dabd187847 fix readme 2020-02-29 15:45:55 +01:00
maxbachmann d0f2de09e1 initialise c++ version 2020-02-29 15:45:15 +01:00