Commit Graph

33 Commits

Author SHA1 Message Date
Max Bachmann f25fe290f7 move cdist to separate module 2021-09-15 03:50:30 +02:00
Max Bachmann 56f062b063 add cdist implementation 2021-09-10 13:37:40 +02:00
maxbachmann 6d28d34d8d cleanup kwargs handling in the process module 2021-09-01 00:09:36 +02:00
Max Bachmann a87786c770 disable assertions in release build 2021-08-21 03:51:08 +02:00
Max Bachmann 30ec2f92ae add more supported types (#101) 2021-05-23 22:09:03 +02:00
Max Bachmann 0d84a8b933 ignore some compiler warnings for cython 2021-03-20 06:35:35 +01:00
Max Bachmann 90cc67be00 fix bug in LCS implementation 2021-03-20 03:46:02 +01:00
Max Bachmann d62cef5f86 strip debug symbols from Linux binaries 2021-03-10 13:54:39 +01:00
Max Bachmann 53b8e3bd61 update build mechanism 2021-03-07 17:45:24 +01:00
Max Bachmann c6eebb70a5 fix incorrect ref counting 2021-03-03 16:08:42 +01:00
Max Bachmann 5383d286b2 Release v1.1.0 (#75)
## Changed
- string_metric.normalized_levenshtein now supports all weights
- when different weights are used for Insertion and Deletion, the strings are no longer swapped inside the Levenshtein implementation, so different weights for Insertion and Deletion are now supported.
- replace C++ implementation with a Cython implementation. This has the following advantages:
  - The implementation is less error prone, since a lot of the complex things are done by Cython
  - slightly faster than the current implementation (up to 10% for some parts)
  - about 33% smaller binary size
  - reduced compile time
- Added `**kwargs` argument to process.extract/extractOne/extract_iter that is passed to the scorer
- Add max argument to hamming distance
- Add support for whole Unicode range to utils.default_process

## Performance
- replaced Wagner-Fischer usage in the normal Levenshtein distance with a bitparallel implementation
2021-02-21 19:42:36 +01:00
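The weight handling described in the v1.1.0 notes above can be illustrated with a minimal pure-Python sketch of the classic Wagner-Fischer recurrence. This is not the library's Cython code; the function name `weighted_levenshtein` and the `(insertion, deletion, substitution)` ordering of the `weights` tuple are assumptions made for this example. It shows why the two strings cannot simply be swapped once insertion and deletion cost different amounts.

```python
def weighted_levenshtein(s1, s2, weights=(1, 1, 1)):
    """Cost of turning s1 into s2 (illustrative sketch, not library code).

    weights is assumed to be (insertion, deletion, substitution).
    """
    insert_cost, delete_cost, replace_cost = weights

    # previous[j] holds the cost of turning the processed prefix of s1 into s2[:j].
    # For the empty prefix, that is j insertions.
    previous = [j * insert_cost for j in range(len(s2) + 1)]

    for i, ch1 in enumerate(s1, start=1):
        # Turning s1[:i] into the empty string takes i deletions.
        current = [i * delete_cost]
        for j, ch2 in enumerate(s2, start=1):
            current.append(min(
                previous[j] + delete_cost,       # drop ch1
                current[j - 1] + insert_cost,    # add ch2
                previous[j - 1] + (0 if ch1 == ch2 else replace_cost),
            ))
        previous = current

    return previous[-1]


# With deletion twice as expensive as insertion, direction matters:
grow = weighted_levenshtein("abc", "abcde", weights=(1, 2, 1))    # two insertions
shrink = weighted_levenshtein("abcde", "abc", weights=(1, 2, 1))  # two deletions
```

Here `grow` and `shrink` differ, which is the reason an implementation that swaps its arguments (for example to iterate over the shorter string) is only valid when insertion and deletion share the same weight.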
Max Bachmann 375c13e436 Release v1.0.0 (#68)
- all normalized string_metrics can now be used as scorer for process.extract/extractOne
- completely refactored the implementation of the C++ Wrapper to make it easier to add more scorers, processors and string matching algorithms in the future.
- increased test coverage, which already helped to fix some bugs and will help to prevent regressions in the future
- improved docstrings of functions

- Added bitparallel implementation of the Levenshtein distance for the weights (1,1,1) and (1,1,2).
- Added specialized implementation of the Levenshtein distance for cases with a small maximum edit distance, which is even faster than the bitparallel implementation.
- Improved performance of `fuzz.partial_ratio`
-> Since `fuzz.ratio` and `fuzz.partial_ratio` are used in most scorers, this improves the overall performance.
- Improved performance of `process.extract` and `process.extractOne`

- the `rapidfuzz.levenshtein` module is now deprecated and will be removed in v2.0.0
  These functions are now placed in `rapidfuzz.string_metric`. `distance`, `normalized_distance`, `weighted_distance` and `weighted_normalized_distance` are combined into `levenshtein` and `normalized_levenshtein`.

- added normalized version of the hamming distance in `string_metric.normalized_hamming`
- added process.extract_iter as a generator that yields the similarity of all elements that have a similarity >= score_cutoff

- fixed multiple bugs in extractOne when used with a scorer that is not from RapidFuzz
- fixed bug in `token_ratio`
- fixed bug in result normalisation causing zero division
2021-02-12 16:48:10 +01:00
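The bitparallel approach mentioned in the v1.0.0 notes above, for the uniform weights (1,1,1), can be sketched in pure Python. The block below follows the well-known Myers/Hyyrö bit-vector formulation; it is an illustration under that assumption, not the project's C++ code, and the function name `bitparallel_levenshtein` is hypothetical. Python integers are arbitrary precision, so the sketch uses one big integer where a native implementation would use machine words.

```python
def bitparallel_levenshtein(s1, s2):
    """Unit-cost Levenshtein distance using bit vectors (illustrative sketch)."""
    m = len(s1)
    if m == 0:
        return len(s2)

    full = (1 << m) - 1      # m low bits set; used to emulate a fixed-width word
    last = 1 << (m - 1)      # bit that corresponds to the last character of s1

    # For each character, a bit mask of the positions where it occurs in s1.
    pattern = {}
    for i, ch in enumerate(s1):
        pattern[ch] = pattern.get(ch, 0) | (1 << i)

    vp, vn = full, 0         # vertical positive / negative delta vectors
    dist = m                 # distance between s1 and the empty prefix of s2

    for ch in s2:
        x = pattern.get(ch, 0)
        d0 = ((((x & vp) + vp) ^ vp) | x | vn) & full   # diagonal zero deltas
        hp = (vn | ~(d0 | vp)) & full                   # horizontal +1 deltas
        hn = d0 & vp                                    # horizontal -1 deltas

        # The last row tells how the distance changes for this column.
        if hp & last:
            dist += 1
        if hn & last:
            dist -= 1

        hp = ((hp << 1) | 1) & full
        hn = (hn << 1) & full
        vp = (hn | ~(d0 | hp)) & full
        vn = hp & d0

    return dist
```

Each character of `s2` is handled with a constant number of word operations instead of one inner loop iteration per character of `s1`, which is where the speedup over a cell-by-cell matrix comes from when `s1` fits into a machine word.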
Max Bachmann 67b02ff967 add C++11 support 2020-11-21 18:25:47 +01:00
Max Bachmann 426fbb24e9 implement process.extractOne in C++ (#53)
* start to simplify complexion

* start implementation

* add extractOne to C++

* fix a couple of bugs in the implementation

* start addressing performance issues
2020-11-15 20:18:46 +01:00
Max Bachmann 8e6a80d777 always use C++ implementation of default_process
So far the C++ implementation was only used when the user
- did not provide a processor
- provided the processors True/False/None
However it did not get used when the processor utils.default_process
was used. This made this case a lot slower. When the user provides
utils.default_process all fuzz functions directly use the C++
implementation now.
2020-11-08 21:04:19 +01:00
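The dispatch rule described in the commit message above can be pictured with a small sketch. Everything here is hypothetical: `uses_native_path` is an invented helper, and the `default_process` shown is only a stand-in written for this example (lowercase, non-alphanumeric characters replaced by spaces, ends trimmed), not the library's implementation.

```python
def default_process(text):
    """Illustrative stand-in, not the library's code: lowercase the text,
    turn non-alphanumeric characters into spaces, and trim the ends."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text)
    return cleaned.lower().strip()


def uses_native_path(processor=None):
    """Return True when preprocessing can stay in native code (hypothetical helper)."""
    # No processor, or one of the flags True/False/None: already handled natively.
    if processor is None or isinstance(processor, bool):
        return True
    # Case added by this commit: the default processor is recognised by identity,
    # so no Python-level callback is needed for it either.
    if processor is default_process:
        return True
    # Any other callable has to be invoked from Python for every string.
    return False
```

Under these assumptions, `uses_native_path(default_process)` is true while `uses_native_path(str.lower)` is false, which mirrors the behaviour change the message describes.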
maxbachmann 789941dc40 replace difflib 2020-09-29 00:18:24 +02:00
maxbachmann 10946dfac0 add python 2.7 support 2020-08-22 23:06:05 +02:00
Max Bachmann a780018db6 add auto deployment 2020-08-14 14:39:53 +02:00
maxbachmann 15c6dbb6fb reduce string copies and tarball size 2020-05-22 13:28:38 +02:00
maxbachmann 46cf20aa4e remove intermediate python function to improve performance 2020-05-12 08:56:28 +02:00
maxbachmann 096d3b584f move cpp into submodule 2020-04-13 08:50:35 +02:00
maxbachmann 044fd229a9 fix performance degradation and use the same interface everywhere 2020-04-09 09:32:29 +02:00
maxbachmann f0adc8da49 start fixing performance issues 2020-04-09 00:32:38 +02:00
maxbachmann 4da4234f73 fix string view usage 2020-04-05 02:48:44 +02:00
maxbachmann 0c7ee10415 replace std::wstring_view with boost::wstring_view to add C++14 support 2020-04-03 14:38:34 +02:00
maxbachmann 18528aed03 implement extractOne using C API 2020-04-01 00:14:56 +02:00
maxbachmann 7ee4808cf9 start replacing pybind11 with the python C API 2020-03-31 19:42:48 +02:00
maxbachmann 028db547d1 reduce template usage to a minimum 2020-03-31 15:16:03 +02:00
maxbachmann 510a0f190e automatically build python Wheels (#5) 2020-03-22 23:12:51 +01:00
maxbachmann 097365692a make levenshtein work with string_view and wstring_view 2020-03-21 19:41:34 +01:00
maxbachmann 74af424dd4 add conversions between iterables and list 2020-03-19 00:48:47 +01:00
maxbachmann e157e11fa7 complete basic implementation of rapidfuzz 2020-03-18 21:34:32 +01:00
maxbachmann cb84c0521c add setup.py to build python version 2020-02-29 18:17:00 +01:00