Commit Graph

8 Commits

Author SHA1 Message Date
Max Bachmann 05f907bf2b
add distance support to process.*
## Changed
- added processor support to `levenshtein` and `hamming`
- added distance support to extract/extractOne/extract_iter

## Fixes
- incorrect results of `normalized_hamming` and `normalized_levenshtein` when used with `utils.default_process` as processor
2021-03-29 19:09:22 +02:00
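The processor support above means a processor such as `utils.default_process` has to be applied to both strings before the metric is computed. Below is a minimal pure-Python sketch of that ordering for the Hamming distance and its normalized form. It is not RapidFuzz's implementation. `default_process_sketch`, `hamming` and `normalized_hamming` are hypothetical stand-ins, and the 0-100 score scale and the `ValueError` on unequal lengths are assumptions made for this sketch.

```python
from typing import Callable, Optional

Processor = Optional[Callable[[str], str]]


def default_process_sketch(s: str) -> str:
    # Hypothetical stand-in for utils.default_process: map every
    # non-alphanumeric character to a space, lowercase, trim whitespace.
    return "".join(c if c.isalnum() else " " for c in s).lower().strip()


def hamming(s1: str, s2: str, processor: Processor = None) -> int:
    # Apply the processor to BOTH strings before comparing them.
    if processor is not None:
        s1, s2 = processor(s1), processor(s2)
    if len(s1) != len(s2):
        raise ValueError("s1 and s2 must have the same length")
    # Count the positions whose characters differ.
    return sum(a != b for a, b in zip(s1, s2))


def normalized_hamming(s1: str, s2: str, processor: Processor = None) -> float:
    # Process once, then normalize by the processed length, so the distance
    # and the denominator refer to the same strings.
    if processor is not None:
        s1, s2 = processor(s1), processor(s2)
    dist = hamming(s1, s2)  # raises ValueError on a length mismatch
    if not s1:
        return 100.0  # two empty strings are identical
    return 100.0 - 100.0 * dist / len(s1)
```

With this sketch, `normalized_hamming("Hello!", "hallo!", processor=default_process_sketch)` compares `hello` with `hallo` and returns 80.0, while the unprocessed call compares all six characters and returns roughly 66.67.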
Max Bachmann c6eebb70a5 fix incorrect ref counting 2021-03-03 16:08:42 +01:00
Max Bachmann e8102a4e87 Fix result conversion in process.extract 2021-02-23 14:58:44 +01:00
Max Bachmann 5383d286b2
Release v1.1.0 (#75)
## Changed
- string_metric.normalized_levenshtein now supports all weights
- when different weights are used for Insertion and Deletion, the strings are no longer swapped inside the Levenshtein implementation, so different weights for Insertion and Deletion are now supported.
- replaced the C++ implementation with a Cython implementation. This has the following advantages:
  - the implementation is less error-prone, since Cython handles many of the complex parts
  - slightly faster than the previous implementation (up to 10% for some parts)
  - about 33% smaller binary size
  - reduced compile time
- added a `**kwargs` argument to process.extract/extractOne/extract_iter that is forwarded to the scorer
- added a `max` argument to the Hamming distance
- added support for the whole Unicode range to `utils.default_process`

## Performance
- replaced the Wagner-Fischer implementation of the normal Levenshtein distance with a bitparallel implementation
2021-02-21 19:42:36 +01:00
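The weighted Levenshtein changes and the Wagner-Fischer algorithm mentioned above can be illustrated with a plain dynamic-programming sketch. `weighted_levenshtein` is a hypothetical name, and the weight order (insertion, deletion, substitution) is an assumption for this sketch. It shows why the strings cannot simply be swapped once insertion and deletion costs differ: reversing the arguments turns every insertion into a deletion.

```python
from typing import Tuple


def weighted_levenshtein(s1: str, s2: str,
                         weights: Tuple[int, int, int] = (1, 1, 1)) -> int:
    # Wagner-Fischer DP; weights assumed to be (insertion, deletion, substitution).
    insert_cost, delete_cost, replace_cost = weights
    # Row 0: turning "" into s2[:j] needs j insertions.
    prev = [j * insert_cost for j in range(len(s2) + 1)]
    for i in range(1, len(s1) + 1):
        # Column 0: turning s1[:i] into "" needs i deletions.
        cur = [i * delete_cost] + [0] * len(s2)
        for j in range(1, len(s2) + 1):
            sub = 0 if s1[i - 1] == s2[j - 1] else replace_cost
            cur[j] = min(
                prev[j] + delete_cost,     # delete s1[i-1]
                cur[j - 1] + insert_cost,  # insert s2[j-1]
                prev[j - 1] + sub,         # match or substitute
            )
        prev = cur
    return prev[-1]
```

With `weights=(1, 2, 1)`, turning `"a"` into `"ab"` costs 1 (one insertion), but turning `"ab"` into `"a"` costs 2 (one deletion), so a swap would change the result.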
Max Bachmann 375c13e436 Release v1.0.0 (#68)
- all normalized string_metrics can now be used as scorers for process.extract/extractOne
- the implementation of the C++ wrapper was completely refactored to make it easier to add more scorers, processors and string matching algorithms in the future
- increased test coverage, which has already helped to fix some bugs and will help to prevent regressions in the future
- improved docstrings of functions

- Added bitparallel implementation of the Levenshtein distance for the weights (1,1,1) and (1,1,2).
- Added a specialized implementation of the Levenshtein distance for cases with a small maximum edit distance, which is even faster than the bitparallel implementation.
- Improved performance of `fuzz.partial_ratio`
-> Since `fuzz.ratio` and `fuzz.partial_ratio` are used in most scorers, this improves the overall performance.
- Improved performance of `process.extract` and `process.extractOne`
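The bitparallel Levenshtein distance for the weights (1,1,1) can be sketched with Hyyrö's formulation of Myers' bit-vector algorithm. This is a teaching sketch, not RapidFuzz's code: `bitparallel_levenshtein` is a hypothetical name, and it relies on Python's arbitrary-precision integers to hold all of `s1` in one bit vector, whereas a C or C++ implementation would work on 64-bit words and split longer strings into blocks. Bit i of `vp`/`vn` records whether row i+1 of the current DP column is one larger/smaller than row i, so each column update takes a handful of bitwise operations instead of len(s1) cell updates.

```python
def bitparallel_levenshtein(s1: str, s2: str) -> int:
    # Uniform-weight Levenshtein distance via bit vectors (Myers/Hyyrö).
    m = len(s1)
    if m == 0:
        return len(s2)
    # peq[c]: bit i is set where s1[i] == c.
    peq = {}
    for i, c in enumerate(s1):
        peq[c] = peq.get(c, 0) | (1 << i)
    mask = (1 << m) - 1
    last = 1 << (m - 1)
    vp, vn = mask, 0  # column 0 is 0, 1, ..., m: every vertical delta is +1
    score = m         # D[m][0]
    for c in s2:
        eq = peq.get(c, 0)
        xv = eq | vn
        xh = (((eq & vp) + vp) ^ vp) | eq
        hp = vn | ~(xh | vp)  # horizontal +1 deltas
        hn = vp & xh          # horizontal -1 deltas
        # Track D[m][j] through the horizontal delta of the last row.
        if hp & last:
            score += 1
        elif hn & last:
            score -= 1
        # Row 0 grows by 1 per column, so shift in a +1 delta.
        hp = (hp << 1) | 1
        hn <<= 1
        vp = (hn | ~(xv | hp)) & mask
        vn = (hp & xv) & mask
    return score
```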

- the `rapidfuzz.levenshtein` module is now deprecated and will be removed in v2.0.0
  These functions are now placed in `rapidfuzz.string_metric`. `distance`, `normalized_distance`, `weighted_distance` and `weighted_normalized_distance` are combined into `levenshtein` and `normalized_levenshtein`.

- added normalized version of the hamming distance in `string_metric.normalized_hamming`
- added `process.extract_iter`, a generator that yields the similarity of all elements whose similarity is >= score_cutoff
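A generator like `process.extract_iter` can be sketched in a few lines. `extract_iter_sketch` and `ratio_sketch` are hypothetical names, the `(choice, score, index)` tuple shape is an assumption, and the scorer uses Python's `difflib` as a stand-in rather than RapidFuzz's `fuzz.ratio`.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, Iterator, Tuple


def ratio_sketch(s1: str, s2: str) -> float:
    # Stand-in scorer on a 0-100 scale based on difflib (not fuzz.ratio).
    return 100.0 * SequenceMatcher(None, s1, s2).ratio()


def extract_iter_sketch(
    query: str,
    choices: Iterable[str],
    scorer: Callable[[str, str], float] = ratio_sketch,
    score_cutoff: float = 0.0,
) -> Iterator[Tuple[str, float, int]]:
    # Lazily score each choice and yield only those reaching score_cutoff;
    # a caller can stop iterating without scoring the remaining choices.
    for index, choice in enumerate(choices):
        score = scorer(query, choice)
        if score >= score_cutoff:
            yield choice, score, index
```

For example, iterating `extract_iter_sketch("apple", ["apple", "apply", "banana"], score_cutoff=70)` yields `apple` (score 100) and `apply` (score 80) and skips `banana`.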

- fixed multiple bugs in extractOne when used with a scorer that is not from RapidFuzz
- fixed a bug in `token_ratio`
- fixed a bug in result normalisation that caused a division by zero
2021-02-12 16:48:10 +01:00
Max Bachmann 426fbb24e9
implement process.extractOne in C++ (#53)
* start to simplify complexion

* start implementation

* add extractOne to C++

* fix a couple of bugs in the implementation

* start addressing performance issues
2020-11-15 20:18:46 +01:00
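One common design for an `extractOne`-style search, assumed here rather than taken from the C++ code, is to feed the best score found so far back to the scorer as `score_cutoff`, so a cutoff-aware scorer could abandon candidates that cannot win, and to stop at a perfect score. `extract_one_sketch` and `ratio_with_cutoff` are hypothetical names, and the scorer uses `difflib` as a stand-in.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, Optional, Tuple


def ratio_with_cutoff(s1: str, s2: str, score_cutoff: float = 0.0) -> float:
    # Hypothetical scorer: computes the full 0-100 ratio, then reports 0 when
    # it falls below score_cutoff (a real scorer could stop computing early).
    score = 100.0 * SequenceMatcher(None, s1, s2).ratio()
    return score if score >= score_cutoff else 0.0


def extract_one_sketch(
    query: str,
    choices: Iterable[str],
    scorer: Callable[..., float] = ratio_with_cutoff,
    score_cutoff: float = 0.0,
) -> Optional[Tuple[str, float, int]]:
    best: Optional[Tuple[str, float, int]] = None
    for index, choice in enumerate(choices):
        score = scorer(query, choice, score_cutoff=score_cutoff)
        # Keep the first choice with the highest score.
        if score >= score_cutoff and (best is None or score > best[1]):
            best = (choice, score, index)
            score_cutoff = score  # later choices must beat this score
            if score == 100.0:
                break  # nothing can beat a perfect match
    return best
```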
maxbachmann 10946dfac0 add python 2.7 support 2020-08-22 23:06:05 +02:00
maxbachmann 5763318312 add unit tests 2020-05-24 10:42:36 +02:00