Commit Graph

13 Commits

Author SHA1 Message Date
Michał Górny 1df09fb54c tests: handle missing pandas gracefully
Pandas is not yet ready for Python 3.11.  Use pytest.importorskip()
to skip that one regression test that requires it when it's not
available to unblock rapidfuzz on py3.11 on Gentoo.
2022-09-18 15:37:39 +02:00
Max Bachmann 531512e0fa
add pure python implementation 2022-06-28 23:24:20 +02:00
Max Bachmann 460d291a38 improve performance and memory usage of editops 2021-12-11 19:54:14 +01:00
maxbachmann 53172e66b3 fix return value of extract for the querz None 2021-08-18 11:22:52 +02:00
maxbachmann 1bfea8f462 add support back for set objects in extract 2021-08-17 10:56:33 +02:00
Max Bachmann 05f907bf2b
add distance support to process.*
## Changed
- added processor support to `levenshtein` and `hamming`
- added distance support to extract/extractOne/extract_iter

## Fixes
- incorrect results of `normalized_hamming` and `normalized_levenshtein` when used with `utils.default_process` as processor
2021-03-29 19:09:22 +02:00
Max Bachmann c6eebb70a5 fix incorrect ref counting 2021-03-03 16:08:42 +01:00
Max Bachmann e8102a4e87 Fix result conversion process.extract 2021-02-23 14:58:44 +01:00
Max Bachmann 5383d286b2
Release v1.1.0 (#75)
## Changed
- string_metric.normalized_levenshtein supports now all weights
- when different weights are used for Insertion and Deletion the strings can not be swapped inside the Levenshtein implementation anymore. So different weights for Insertion and Deletion are now supported.
- replace C++ implementation with a Cython implementation. This has the following advantages:
  - The implementation is less error prone, since a lot of the complex things are done by Cython
  - slighly faster than the current implementation (up to 10% for some parts)
  - about 33% smaller binary size
  - reduced compile time
- Added **kwargs argument to process.extract/extractOne/extract_iter that is passed to the scorer
- Add max argument to hamming distance
- Add support for whole Unicode range to utils.default_process

## Performance
- replaced Wagner Fischer usage in the normal Levenshtein distance with a bitparallel implementation
2021-02-21 19:42:36 +01:00
Max Bachmann 375c13e436 Release v1.0.0 (#68)
- all normalized string_metrics can now be used as scorer for process.extract/extractOne
- Implementation of the C++ Wrapper completely refactored to make it easier to add more scorers, processors and string matching algorithms in the future.
- increased test coverage, that already helped to fix some bugs and help to prevent regressions in the future
- improved docstrings of functions

- Added bitparallel implementation of the Levenshtein distance for the weights (1,1,1) and (1,1,2).
- Added specialized implementation of the Levenshtein distance for cases with a small maximum edit distance, that is even faster, than the bitparallel implementation.
- Improved performance of `fuzz.partial_ratio`
-> Since `fuzz.ratio` and `fuzz.partial_ratio` are used in most scorers, this improves the overall performance.
- Improved performance of `process.extract` and `process.extractOne`

- the `rapidfuzz.levenshtein` module is now deprecated and will be removed in v2.0.0
  These functions are now placed in `rapidfuzz.string_metric`. `distance`, `normalized_distance`, `weighted_distance` and `weighted_normalized_distance` are combined into `levenshtein` and `normalized_levenshtein`.

- added normalized version of the hamming distance in `string_metric.normalized_hamming`
- process.extract_iter as a generator, that yields the similarity of all elements, that have a similarity >= score_cutoff

- multiple bugs in extractOne when used with a scorer, thats not from RapidFuzz
- fixed bug in `token_ratio`
- fixed bug in result normalisation causing zero division
2021-02-12 16:48:10 +01:00
Max Bachmann 426fbb24e9
implement process.extractOne in C++ (#53)
* start to simplify complexion

* start implementation

* add extractOne to C++

* fix a couple of bugs in the implementation

* start adressing performance issues
2020-11-15 20:18:46 +01:00
maxbachmann 10946dfac0 add python 2.7 support 2020-08-22 23:06:05 +02:00
maxbachmann 5763318312
add unit tests 2020-05-24 10:42:36 +02:00