RapidFuzz

Commit Graph

Author	SHA1	Message	Date
Max Bachmann	5ecd72eb39	extend duration	2021-09-24 02:24:06 +02:00
Max Bachmann	d0ec89e9f9	cleanup cdist implementation	2021-09-23 22:41:10 +02:00
Max Bachmann	a9e7bd703f	add back legacy python support (#122)	2021-09-11 12:25:31 +02:00
Max Bachmann	56f062b063	add cdist implementation	2021-09-10 13:37:40 +02:00
Max Bachmann	1aed654d4f	improve performance of partial_ratio (#121)	2021-09-10 02:08:08 +02:00
maxbachmann	0362eddd18	Fix out of bounds access in levenshtein_editops	2021-08-31 23:07:30 +02:00
Max Bachmann	9fd6d08655	add levenshtein_editops	2021-08-21 03:08:50 +02:00
Max Bachmann	e3e04da293	use keyword only arguments	2021-08-19 23:02:00 +02:00
maxbachmann	53172e66b3	fix return value of extract for the query None	2021-08-18 11:22:52 +02:00
maxbachmann	1bfea8f462	add support back for set objects in extract	2021-08-17 10:56:33 +02:00
maxbachmann	370190b088	fix inconsistent handling of empty strings (see #110)	2021-08-17 10:55:48 +02:00
Max Bachmann	30ec2f92ae	add more supported types (#101)	2021-05-23 22:09:03 +02:00
Max Bachmann	05f907bf2b	add distance support to process.* ## Changed - added processor support to `levenshtein` and `hamming` - added distance support to extract/extractOne/extract_iter ## Fixes - incorrect results of `normalized_hamming` and `normalized_levenshtein` when used with `utils.default_process` as processor	2021-03-29 19:09:22 +02:00
Max Bachmann	853681f7cf	fix bug in mbleven implementation	2021-03-20 12:04:12 +01:00
Max Bachmann	2d80120b21	add more benchmarks to documentation	2021-03-07 17:50:39 +01:00
Max Bachmann	c6eebb70a5	fix incorrect ref counting	2021-03-03 16:08:42 +01:00
Max Bachmann	e8102a4e87	Fix result conversion process.extract	2021-02-23 14:58:44 +01:00
Max Bachmann	5383d286b2	Release v1.1.0 (#75) ## Changed - string_metric.normalized_levenshtein now supports all weights - when different weights are used for Insertion and Deletion the strings cannot be swapped inside the Levenshtein implementation anymore. So different weights for Insertion and Deletion are now supported. - replace C++ implementation with a Cython implementation. This has the following advantages: - The implementation is less error prone, since a lot of the complex things are done by Cython - slightly faster than the current implementation (up to 10% for some parts) - about 33% smaller binary size - reduced compile time - Added **kwargs argument to process.extract/extractOne/extract_iter that is passed to the scorer - Add max argument to hamming distance - Add support for whole Unicode range to utils.default_process ## Performance - replaced Wagner-Fischer usage in the normal Levenshtein distance with a bitparallel implementation	2021-02-21 19:42:36 +01:00
Max Bachmann	88a86a1028	deactivate bitparallel LCS. The algorithm to find the longest common subsequence after calculating it in bitparallel appears to have a bug. Deactivate it until this bug is fixed	2021-02-19 15:20:31 +01:00
Max Bachmann	7139004214	fix uninitialized variable	2021-02-17 23:08:56 +01:00
Max Bachmann	375c13e436	Release v1.0.0 (#68) - all normalized string_metrics can now be used as scorer for process.extract/extractOne - Implementation of the C++ Wrapper completely refactored to make it easier to add more scorers, processors and string matching algorithms in the future. - increased test coverage, which already helped to fix some bugs and helps to prevent regressions in the future - improved docstrings of functions - Added bitparallel implementation of the Levenshtein distance for the weights (1,1,1) and (1,1,2). - Added specialized implementation of the Levenshtein distance for cases with a small maximum edit distance, that is even faster than the bitparallel implementation. - Improved performance of `fuzz.partial_ratio` -> Since `fuzz.ratio` and `fuzz.partial_ratio` are used in most scorers, this improves the overall performance. - Improved performance of `process.extract` and `process.extractOne` - the `rapidfuzz.levenshtein` module is now deprecated and will be removed in v2.0.0. These functions are now placed in `rapidfuzz.string_metric`. `distance`, `normalized_distance`, `weighted_distance` and `weighted_normalized_distance` are combined into `levenshtein` and `normalized_levenshtein`. - added normalized version of the hamming distance in `string_metric.normalized_hamming` - process.extract_iter as a generator that yields the similarity of all elements that have a similarity >= score_cutoff - multiple bugs in extractOne when used with a scorer that's not from RapidFuzz - fixed bug in `token_ratio` - fixed bug in result normalisation causing zero division	2021-02-12 16:48:10 +01:00
Max Bachmann	cc5fa23c32	fix custom processors in fuzz.*	2020-12-13 16:55:45 +01:00
Max Bachmann	426fbb24e9	implement process.extractOne in C++ (#53) * start to simplify complexion * start implementation * add extractOne to C++ * fix a couple of bugs in the implementation * start addressing performance issues	2020-11-15 20:18:46 +01:00
maxbachmann	10946dfac0	add python 2.7 support	2020-08-22 23:06:05 +02:00
maxbachmann	5763318312	add unit tests	2020-05-24 10:42:36 +02:00

25 Commits
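
The v1.1.0 entry (5383d286b2) notes that once insertion and deletion carry different weights, the two strings can no longer be swapped inside the Levenshtein implementation. The sketch below may help show why. It is a plain Wagner-Fischer dynamic program in pure Python, not RapidFuzz's code; the function name `weighted_levenshtein` and the weight order (insertion, deletion, substitution) are assumptions made for this illustration.

```python
def weighted_levenshtein(s1, s2, weights=(1, 1, 1)):
    """Cost of turning s1 into s2 with a Wagner-Fischer table.

    `weights` is assumed to be (insertion, deletion, substitution);
    this helper is hypothetical and not part of any library.
    """
    insert_cost, delete_cost, replace_cost = weights

    # previous[j] = cost of turning the current prefix of s1 into s2[:j].
    # For the empty prefix of s1, that is j insertions.
    previous = [j * insert_cost for j in range(len(s2) + 1)]

    for i, ch1 in enumerate(s1, start=1):
        # Turning s1[:i] into the empty string takes i deletions.
        current = [i * delete_cost]
        for j, ch2 in enumerate(s2, start=1):
            current.append(min(
                previous[j] + delete_cost,        # drop ch1
                current[j - 1] + insert_cost,     # add ch2
                previous[j - 1] + (0 if ch1 == ch2 else replace_cost),
            ))
        previous = current

    return previous[-1]


# With uniform weights the argument order does not matter.
uniform_forward = weighted_levenshtein("kitten", "sitting")
uniform_backward = weighted_levenshtein("sitting", "kitten")

# With insertion=1 and deletion=2 the direction changes the cost:
# "ab" -> "abc" needs one insertion, "abc" -> "ab" needs one deletion.
grow = weighted_levenshtein("ab", "abc", weights=(1, 2, 3))
shrink = weighted_levenshtein("abc", "ab", weights=(1, 2, 3))
```

Swapping the strings is only safe if the insertion and deletion weights are swapped along with them, which is why an implementation that reorders its inputs internally (for example, to iterate over the shorter string) has to account for asymmetric weights.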