RapidFuzz

Commit Graph

Author	SHA1	Message	Date
Max Bachmann	29abf8fd2b	release 2.0.12	2022-06-22 12:06:45 +02:00
Max Bachmann	a9887c123b	Release v2.0.11	2022-04-23 23:29:25 +02:00
Max Bachmann	5371f81cab	Release v2.0.10	2022-04-18 21:59:43 +02:00
Max Bachmann	82fe8ef02d	consider float imprecision in score_cutoff	2022-04-15 01:08:11 +02:00
Max Bachmann	c92eeebad1	fix incorrect score_cutoff handling in token_set_ratio and token_ratio	2022-04-07 23:49:42 +02:00
Max Bachmann	87234fc798	release v2.0.7	2022-03-13 15:17:59 +01:00
Max Bachmann	b36e107911	release v2.0.6	2022-03-06 20:44:15 +01:00
Max Bachmann	0c2f360b3c	add missing type hints	2022-03-06 19:13:27 +01:00
Max Bachmann	ec96fd00a4	move Jaro/JaroWinkler into separate package	2022-03-06 17:02:02 +01:00
Max Bachmann	1132612455	fix population of sys.modules	2022-03-06 13:24:01 +01:00
Max Bachmann	5a82119374	fix integer overflow inside hashmap	2022-02-25 18:52:23 +01:00
Max Bachmann	d75bcf12e2	remove debug information	2022-02-21 11:03:20 +01:00
Max Bachmann	790467e7a1	backtrace segfaults in CI	2022-02-19 16:46:15 +01:00
Max Bachmann	97d6638e98	Release v2.0.2	2022-02-12 20:28:31 +01:00
Max Bachmann	4444f3411f	Fix Indel.normalized_similarity	2022-02-11 16:02:00 +01:00
Max Bachmann	dd26483b5f	properly link to subprojects	2022-01-17 16:53:58 +01:00
Max Bachmann	241e7fd583	rename algorithm.edit_based to distance	2022-01-12 23:22:34 +01:00
Max Bachmann	551bb22dfc	start adding new scorer modules	2022-01-06 23:44:33 +01:00
Max Bachmann	00856eb082	cleanup Python2.7 specifics	2021-12-30 22:57:40 +01:00
Max Bachmann	a225b2e7ef	apply some missing changes	2021-12-19 16:11:22 +01:00
Max Bachmann	e6008d0a4f	replace setuptools with scikit-build	2021-12-19 15:50:40 +01:00
Max Bachmann	7edf52150a	fix manifest	2021-11-07 19:46:21 +01:00
Max Bachmann	0afb49d28f	move submodules into common location	2021-11-06 20:29:21 +01:00
Max Bachmann	f70d667648	fix cython build error on cython	2021-11-05 20:56:09 +01:00
Max Bachmann	67245dddb9	Add C-Api for preprocessor functions	2021-11-05 16:28:55 +01:00
Max Bachmann	333138fdad	start adding c-api	2021-10-23 20:25:13 +02:00
Max Bachmann	a90d6a736b	add multiprocessing to cdist	2021-09-26 21:44:23 +02:00
Max Bachmann	f25fe290f7	move cdist to separate module	2021-09-15 03:50:30 +02:00
Max Bachmann	56f062b063	add cdist implementation	2021-09-10 13:37:40 +02:00
maxbachmann	6d28d34d8d	cleanup kwargs handling in the process module	2021-09-01 00:09:36 +02:00
Max Bachmann	a87786c770	disable assertions in release build	2021-08-21 03:51:08 +02:00
Max Bachmann	30ec2f92ae	add more supported types (#101)	2021-05-23 22:09:03 +02:00
Max Bachmann	0d84a8b933	ignore some compiler warnings for cython	2021-03-20 06:35:35 +01:00
Max Bachmann	90cc67be00	fix bug in LCS implementation	2021-03-20 03:46:02 +01:00
Max Bachmann	d62cef5f86	strip debug symbols from Linux binaries	2021-03-10 13:54:39 +01:00
Max Bachmann	53b8e3bd61	update build mechanism	2021-03-07 17:45:24 +01:00
Max Bachmann	c6eebb70a5	fix incorrect ref counting	2021-03-03 16:08:42 +01:00
Max Bachmann	5383d286b2	Release v1.1.0 (#75) ## Changed - string_metric.normalized_levenshtein now supports all weights - when different weights are used for Insertion and Deletion the strings cannot be swapped inside the Levenshtein implementation anymore. So different weights for Insertion and Deletion are now supported. - replace C++ implementation with a Cython implementation. This has the following advantages: - The implementation is less error prone, since a lot of the complex things are done by Cython - slightly faster than the current implementation (up to 10% for some parts) - about 33% smaller binary size - reduced compile time - Added **kwargs argument to process.extract/extractOne/extract_iter that is passed to the scorer - Add max argument to hamming distance - Add support for whole Unicode range to utils.default_process ## Performance - replaced Wagner Fischer usage in the normal Levenshtein distance with a bitparallel implementation	2021-02-21 19:42:36 +01:00
Max Bachmann	375c13e436	Release v1.0.0 (#68) - all normalized string_metrics can now be used as scorer for process.extract/extractOne - Implementation of the C++ Wrapper completely refactored to make it easier to add more scorers, processors and string matching algorithms in the future. - increased test coverage, that already helped to fix some bugs and help to prevent regressions in the future - improved docstrings of functions - Added bitparallel implementation of the Levenshtein distance for the weights (1,1,1) and (1,1,2). - Added specialized implementation of the Levenshtein distance for cases with a small maximum edit distance, that is even faster than the bitparallel implementation. - Improved performance of `fuzz.partial_ratio` -> Since `fuzz.ratio` and `fuzz.partial_ratio` are used in most scorers, this improves the overall performance. - Improved performance of `process.extract` and `process.extractOne` - the `rapidfuzz.levenshtein` module is now deprecated and will be removed in v2.0.0 These functions are now placed in `rapidfuzz.string_metric`. `distance`, `normalized_distance`, `weighted_distance` and `weighted_normalized_distance` are combined into `levenshtein` and `normalized_levenshtein`. - added normalized version of the hamming distance in `string_metric.normalized_hamming` - process.extract_iter as a generator, that yields the similarity of all elements, that have a similarity >= score_cutoff - multiple bugs in extractOne when used with a scorer, that's not from RapidFuzz - fixed bug in `token_ratio` - fixed bug in result normalisation causing zero division	2021-02-12 16:48:10 +01:00
Max Bachmann	67b02ff967	add C++11 support	2020-11-21 18:25:47 +01:00
Max Bachmann	426fbb24e9	implement process.extractOne in C++ (#53) * start to simplify complexion * start implementation * add extractOne to C++ * fix a couple of bugs in the implementation * start addressing performance issues	2020-11-15 20:18:46 +01:00
Max Bachmann	8e6a80d777	always use C++ implementation of default_process So far the C++ implementation was only used when the user - did not provide a processor - provided the processors True/False/None However it did not get used when the processor utils.default_process was used. This made this case a lot slower. When the user provides utils.default_process all fuzz functions directly use the C++ implementation now.	2020-11-08 21:04:19 +01:00
maxbachmann	789941dc40	replace difflib	2020-09-29 00:18:24 +02:00
maxbachmann	10946dfac0	add python 2.7 support	2020-08-22 23:06:05 +02:00
Max Bachmann	a780018db6	add auto deployment	2020-08-14 14:39:53 +02:00
maxbachmann	15c6dbb6fb	reduce string copies and tarball size	2020-05-22 13:28:38 +02:00
maxbachmann	46cf20aa4e	remove intermediate python function to improve performance	2020-05-12 08:56:28 +02:00
maxbachmann	096d3b584f	move cpp into submodule	2020-04-13 08:50:35 +02:00
maxbachmann	044fd229a9	fix performance degradation and use the same interface everywhere	2020-04-09 09:32:29 +02:00
maxbachmann	f0adc8da49	start fixing performance issues	2020-04-09 00:32:38 +02:00

60 Commits