Commit Graph

60 Commits

Author SHA1 Message Date
Max Bachmann 29abf8fd2b release 2.0.12 2022-06-22 12:06:45 +02:00
Max Bachmann a9887c123b Release v2.0.11 2022-04-23 23:29:25 +02:00
Max Bachmann 5371f81cab Release v2.0.10 2022-04-18 21:59:43 +02:00
Max Bachmann 82fe8ef02d consider float imprecision in score_cutoff 2022-04-15 01:08:11 +02:00
Max Bachmann c92eeebad1 fix incorrect score_cutoff handling in token_set_ratio and token_ratio 2022-04-07 23:49:42 +02:00
Max Bachmann 87234fc798 release v2.0.7 2022-03-13 15:17:59 +01:00
Max Bachmann b36e107911 release v2.0.6 2022-03-06 20:44:15 +01:00
Max Bachmann 0c2f360b3c add missing type hints 2022-03-06 19:13:27 +01:00
Max Bachmann ec96fd00a4 move Jaro/JaroWinkler into separate package 2022-03-06 17:02:02 +01:00
Max Bachmann 1132612455 fix population of sys.modules 2022-03-06 13:24:01 +01:00
Max Bachmann 5a82119374 fix integer overflow inside hashmap 2022-02-25 18:52:23 +01:00
Max Bachmann d75bcf12e2 remove debug information 2022-02-21 11:03:20 +01:00
Max Bachmann 790467e7a1 backtrace segfaults in CI 2022-02-19 16:46:15 +01:00
Max Bachmann 97d6638e98 Release v2.0.2 2022-02-12 20:28:31 +01:00
Max Bachmann 4444f3411f Fix Indel.normalized_similarity 2022-02-11 16:02:00 +01:00
Max Bachmann dd26483b5f properly link to subprojects 2022-01-17 16:53:58 +01:00
Max Bachmann 241e7fd583 rename algorithm.edit_based to distance 2022-01-12 23:22:34 +01:00
Max Bachmann 551bb22dfc start adding new scorer modules 2022-01-06 23:44:33 +01:00
Max Bachmann 00856eb082 cleanup Python2.7 specifics 2021-12-30 22:57:40 +01:00
Max Bachmann a225b2e7ef apply some missing changes 2021-12-19 16:11:22 +01:00
Max Bachmann e6008d0a4f replace setuptools with scikit-build 2021-12-19 15:50:40 +01:00
Max Bachmann 7edf52150a fix manifest 2021-11-07 19:46:21 +01:00
Max Bachmann 0afb49d28f move submodules into common location 2021-11-06 20:29:21 +01:00
Max Bachmann f70d667648 fix cython build error on cython 2021-11-05 20:56:09 +01:00
Max Bachmann 67245dddb9 Add C-Api for preprocessor functions 2021-11-05 16:28:55 +01:00
Max Bachmann 333138fdad start adding c-api 2021-10-23 20:25:13 +02:00
Max Bachmann a90d6a736b add multiprocessing to cdist 2021-09-26 21:44:23 +02:00
Max Bachmann f25fe290f7 move cdist to separate module 2021-09-15 03:50:30 +02:00
Max Bachmann 56f062b063 add cdist implementation 2021-09-10 13:37:40 +02:00
maxbachmann 6d28d34d8d cleanup kwargs handling in the process module 2021-09-01 00:09:36 +02:00
Max Bachmann a87786c770 disable assertions in release build 2021-08-21 03:51:08 +02:00
Max Bachmann 30ec2f92ae add more supported types (#101) 2021-05-23 22:09:03 +02:00
Max Bachmann 0d84a8b933 ignore some compiler warnings for cython 2021-03-20 06:35:35 +01:00
Max Bachmann 90cc67be00 fix bug in LCS implementation 2021-03-20 03:46:02 +01:00
Max Bachmann d62cef5f86 strip debug symbols from Linux binaries 2021-03-10 13:54:39 +01:00
Max Bachmann 53b8e3bd61 update build mechanism 2021-03-07 17:45:24 +01:00
Max Bachmann c6eebb70a5 fix incorrect ref counting 2021-03-03 16:08:42 +01:00
Max Bachmann 5383d286b2 Release v1.1.0 (#75)
## Changed
- string_metric.normalized_levenshtein now supports all weights
- when different weights are used for Insertion and Deletion, the Levenshtein implementation no longer swaps the two strings, so different Insertion and Deletion weights are now supported.
- replace C++ implementation with a Cython implementation. This has the following advantages:
  - The implementation is less error prone, since a lot of the complex things are done by Cython
  - slightly faster than the current implementation (up to 10% for some parts)
  - about 33% smaller binary size
  - reduced compile time
- Added **kwargs argument to process.extract/extractOne/extract_iter that is passed to the scorer
- Add max argument to hamming distance
- Add support for the whole Unicode range to utils.default_process

## Performance
- replaced Wagner-Fischer usage in the normal Levenshtein distance with a bitparallel implementation
2021-02-21 19:42:36 +01:00
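The v1.1.0 notes mention replacing Wagner-Fischer with a bit-parallel algorithm for the uniform-weight Levenshtein distance. The sketch below is not RapidFuzz's Cython/C++ code; it is a minimal Python illustration of the well-known Myers/Hyyrö bit-vector technique, assuming weights (1,1,1). The function names `wagner_fischer` and `levenshtein_bitparallel` are hypothetical, and the Wagner-Fischer version is included only as a reference to check against.

```python
def wagner_fischer(s1: str, s2: str) -> int:
    """Reference O(len(s1) * len(s2)) dynamic-programming Levenshtein distance."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (c1 != c2)))  # substitution
        prev = curr
    return prev[-1]


def levenshtein_bitparallel(s1: str, s2: str) -> int:
    """Bit-parallel (Myers/Hyyrö) Levenshtein distance, weights (1, 1, 1).

    Each DP column is stored as two bit vectors of vertical deltas
    (+1 in vp, -1 in vn), so every character of s2 costs a constant
    number of integer operations. Python ints are unbounded, so s1 may
    be longer than 64 characters in this sketch.
    """
    m = len(s1)
    if m == 0:
        return len(s2)
    # peq[c] has bit i set wherever s1[i] == c
    peq = {}
    for i, c in enumerate(s1):
        peq[c] = peq.get(c, 0) | (1 << i)
    mask = (1 << m) - 1
    last = 1 << (m - 1)
    vp, vn, dist = mask, 0, m  # column 0: D[i][0] = i, all deltas +1
    for c in s2:
        eq = peq.get(c, 0)
        xv = eq | vn
        xh = (((eq & vp) + vp) ^ vp) | eq
        hp = vn | ~(xh | vp)
        hn = vp & xh
        # bit m-1 holds the horizontal delta of the last row
        if hp & last:
            dist += 1
        elif hn & last:
            dist -= 1
        hp = ((hp << 1) | 1) & mask  # row 0 always grows by +1
        hn = (hn << 1) & mask
        vp = (hn | ~(xv | hp)) & mask
        vn = hp & xv
    return dist


print(levenshtein_bitparallel("kitten", "sitting"))  # 3
```

The bit-parallel version processes each character of s2 with a fixed number of word operations instead of filling a full DP row. A native implementation would typically work in 64-bit words and split longer patterns into blocks.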
Max Bachmann 375c13e436 Release v1.0.0 (#68)
- all normalized string_metrics can now be used as scorer for process.extract/extractOne
- Implementation of the C++ Wrapper completely refactored to make it easier to add more scorers, processors and string matching algorithms in the future.
- increased test coverage, which has already helped fix some bugs and will help prevent regressions in the future
- improved docstrings of functions

- Added bitparallel implementation of the Levenshtein distance for the weights (1,1,1) and (1,1,2).
- Added a specialized implementation of the Levenshtein distance for cases with a small maximum edit distance, which is even faster than the bitparallel implementation.
- Improved performance of `fuzz.partial_ratio`
-> Since `fuzz.ratio` and `fuzz.partial_ratio` are used in most scorers, this improves the overall performance.
- Improved performance of `process.extract` and `process.extractOne`

- the `rapidfuzz.levenshtein` module is now deprecated and will be removed in v2.0.0
  These functions are now placed in `rapidfuzz.string_metric`. `distance`, `normalized_distance`, `weighted_distance` and `weighted_normalized_distance` are combined into `levenshtein` and `normalized_levenshtein`.

- added normalized version of the hamming distance in `string_metric.normalized_hamming`
- added `process.extract_iter`, a generator that yields the similarity of all elements with a similarity >= score_cutoff

- fixed multiple bugs in extractOne when used with a scorer that is not from RapidFuzz
- fixed bug in `token_ratio`
- fixed bug in result normalisation causing zero division
2021-02-12 16:48:10 +01:00
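The v1.0.0 notes describe `process.extract_iter` as a generator that only yields elements whose similarity reaches `score_cutoff`, and add a normalized Hamming metric. A minimal Python sketch of both ideas follows; `extract_iter_sketch` and `normalized_hamming` are hypothetical names, and the (choice, score, index) tuple shape and the 0 to 100 scaling are assumptions, not RapidFuzz's actual API.

```python
from typing import Callable, Iterable, Iterator, Tuple


def normalized_hamming(s1: str, s2: str) -> float:
    """Hamming similarity on an assumed 0..100 scale: 100 * (1 - dist / len)."""
    if len(s1) != len(s2):
        raise ValueError("Hamming distance needs strings of equal length")
    if not s1:
        return 100.0
    dist = sum(a != b for a, b in zip(s1, s2))
    return 100.0 * (1 - dist / len(s1))


def extract_iter_sketch(
    query: str,
    choices: Iterable[str],
    scorer: Callable[[str, str], float],
    score_cutoff: float = 0.0,
) -> Iterator[Tuple[str, float, int]]:
    """Lazily yield (choice, score, index) for every choice scoring >= score_cutoff."""
    for index, choice in enumerate(choices):
        score = scorer(query, choice)
        if score >= score_cutoff:
            yield choice, score, index


matches = extract_iter_sketch("abcd", ["abcd", "abcf", "wxyz"],
                              normalized_hamming, score_cutoff=70)
print(list(matches))  # [('abcd', 100.0, 0), ('abcf', 75.0, 1)]
```

Because the function is a generator, nothing is scored until the caller iterates, and choices below the cutoff are dropped without building an intermediate list.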
Max Bachmann 67b02ff967 add C++11 support 2020-11-21 18:25:47 +01:00
Max Bachmann 426fbb24e9 implement process.extractOne in C++ (#53)
* start to simplify complexion

* start implementation

* add extractOne to C++

* fix a couple of bugs in the implementation

* start addressing performance issues
2020-11-15 20:18:46 +01:00
Max Bachmann 8e6a80d777 always use c++ implementation of default_process
So far the C++ implementation was only used when the user
- did not provide a processor
- provided the processors True/False/None
However, it was not used when utils.default_process was passed as
the processor, which made this case a lot slower. When the user provides
utils.default_process, all fuzz functions now directly use the C++
implementation.
2020-11-08 21:04:19 +01:00
maxbachmann 789941dc40 replace difflib 2020-09-29 00:18:24 +02:00
maxbachmann 10946dfac0 add python 2.7 support 2020-08-22 23:06:05 +02:00
Max Bachmann a780018db6 add auto deployment 2020-08-14 14:39:53 +02:00
maxbachmann 15c6dbb6fb reduce string copies and tarball size 2020-05-22 13:28:38 +02:00
maxbachmann 46cf20aa4e remove intermediate python function to improve performance 2020-05-12 08:56:28 +02:00
maxbachmann 096d3b584f move cpp into submodule 2020-04-13 08:50:35 +02:00
maxbachmann 044fd229a9 fix performance degradation and use the same interface everywhere 2020-04-09 09:32:29 +02:00
maxbachmann f0adc8da49 start fixing performance issues 2020-04-09 00:32:38 +02:00