41 Commits

Author SHA1 Message Date
Max Bachmann c686122dcf move capi back into rapidfuzz 2022-10-30 00:15:58 +02:00
Max Bachmann cc48dcfa29 add simd support (#271) 2022-10-01 15:58:37 +02:00
maxbachmann 42c51de46f make cmake and ninja more optional 2022-07-18 00:28:34 +02:00
layday a300fef845 Fix package data inclusion rules (#236) 2022-07-09 12:12:27 +02:00
maxbachmann 729113d047 change src layout from rapidfuzz to src/rapidfuzz 2022-07-04 19:53:28 +02:00
Max Bachmann de0f6d8af3 add fallback implementation back to wheel 2022-06-23 12:54:43 +02:00
Max Bachmann 1c583e8118 add tests to sdist 2022-06-09 13:37:27 +02:00
Max Bachmann 83d0a77f2a allow usage of system installed libs (#213)
system-installed versions of `rapidfuzz-cpp`, `jarowinkler-cpp` and `taskflow` are now used if they are available in a compatible version
2022-04-17 20:21:34 +02:00
Max Bachmann 2300763331 do not install cmake subprojects when building the project 2022-03-13 15:05:19 +01:00
Max Bachmann 1132612455 fix population of sys.modules 2022-03-06 13:24:01 +01:00
Max Bachmann cd8af8cad2 allow generating cython files (#194) 2022-02-12 18:58:10 +01:00
Max Bachmann 567141402d fix type hints (#193) 2022-02-11 20:50:52 +01:00
Max Bachmann 6ece2b94de update external libraries 2022-01-23 02:16:11 +01:00
Max Bachmann a861fc980a fix missing symbol 2022-01-20 07:14:29 +01:00
Max Bachmann dd26483b5f properly link to subprojects 2022-01-17 16:53:58 +01:00
Max Bachmann eff74a6b4f split python and cython into separate directories 2022-01-07 00:38:28 +01:00
Max Bachmann 551bb22dfc start adding new scorer modules 2022-01-06 23:44:33 +01:00
Max Bachmann 73f99a4ee9 fix manifest 2022-01-02 15:30:26 +01:00
Max Bachmann 239911bac9 cythonize while installing 2021-12-30 19:46:02 +01:00
Max Bachmann 1371bd93b1 standardize RapidFuzz C-Api 2021-12-30 14:11:53 +01:00
Max Bachmann e6008d0a4f replace setuptools with scikit-build 2021-12-19 15:50:40 +01:00
Max Bachmann 7edf52150a fix manifest 2021-11-07 19:46:21 +01:00
Max Bachmann cb9481f414 start using c-api 2021-10-24 12:47:42 +02:00
Max Bachmann 333138fdad start adding c-api 2021-10-23 20:25:13 +02:00
Max Bachmann a90d6a736b add multiprocessing to cdist 2021-09-26 21:44:23 +02:00
layday 8542bea635 Move `py.typed`
`py.typed` should be placed in the package folder (see PEP 561).
2021-09-15 12:00:34 +03:00
Dan Hess 742743b382 Introduce stub files for typing library 2021-07-04 10:41:52 -08:00
Max Bachmann 53b8e3bd61 update build mechanism 2021-03-07 17:45:24 +01:00
Max Bachmann 5383d286b2 Release v1.1.0 (#75)
## Changed
- string_metric.normalized_levenshtein now supports all weights
- when different weights are used for insertion and deletion, the strings are no longer swapped inside the Levenshtein implementation, so different weights for insertion and deletion are now supported.
- replace C++ implementation with a Cython implementation. This has the following advantages:
  - The implementation is less error-prone, since a lot of the complex things are done by Cython
  - slightly faster than the current implementation (up to 10% for some parts)
  - about 33% smaller binary size
  - reduced compile time
- Added **kwargs argument to process.extract/extractOne/extract_iter that is passed to the scorer
- Add max argument to hamming distance
- Add support for whole Unicode range to utils.default_process

## Performance
- replaced Wagner-Fischer usage in the normal Levenshtein distance with a bitparallel implementation
2021-02-21 19:42:36 +01:00
Max Bachmann 375c13e436 Release v1.0.0 (#68)
- all normalized string_metrics can now be used as scorer for process.extract/extractOne
- Implementation of the C++ wrapper was completely refactored to make it easier to add more scorers, processors and string matching algorithms in the future.
- increased test coverage, which has already helped to fix some bugs and will help to prevent regressions in the future
- improved docstrings of functions

- Added bitparallel implementation of the Levenshtein distance for the weights (1,1,1) and (1,1,2).
- Added specialized implementation of the Levenshtein distance for cases with a small maximum edit distance, which is even faster than the bitparallel implementation.
- Improved performance of `fuzz.partial_ratio`
-> Since `fuzz.ratio` and `fuzz.partial_ratio` are used in most scorers, this improves the overall performance.
- Improved performance of `process.extract` and `process.extractOne`

- the `rapidfuzz.levenshtein` module is now deprecated and will be removed in v2.0.0
  These functions are now placed in `rapidfuzz.string_metric`. `distance`, `normalized_distance`, `weighted_distance` and `weighted_normalized_distance` are combined into `levenshtein` and `normalized_levenshtein`.

- added normalized version of the hamming distance in `string_metric.normalized_hamming`
- process.extract_iter as a generator that yields the similarity of all elements that have a similarity >= score_cutoff

- fixed multiple bugs in extractOne when used with a scorer that is not from RapidFuzz
- fixed bug in `token_ratio`
- fixed bug in result normalisation causing a division by zero
2021-02-12 16:48:10 +01:00
maxbachmann 789941dc40 replace difflib 2020-09-29 00:18:24 +02:00
maxbachmann 10946dfac0 add python 2.7 support 2020-08-22 23:06:05 +02:00
maxbachmann eae941a647 further reduce tarball size 2020-06-27 12:49:32 +02:00
maxbachmann cceb7cb2ea do not include boost 2020-05-22 18:26:43 +02:00
maxbachmann 3137df9e96 remove boost::optional dependency 2020-05-22 14:38:13 +02:00
maxbachmann 15c6dbb6fb reduce string copies and tarball size 2020-05-22 13:28:38 +02:00
maxbachmann e4006839fc add missing files to tarball 2020-04-15 23:17:35 +02:00
maxbachmann 4da4234f73 fix string view usage 2020-04-05 02:48:44 +02:00
maxbachmann ab8e98bc2d add missing files to tarball 2020-04-04 06:51:45 +02:00
maxbachmann e1b5f323b8 add license to package 2020-03-23 13:10:37 +01:00
maxbachmann e157e11fa7 complete basic implementation of rapidfuzz 2020-03-18 21:34:32 +01:00