RapidFuzz

Commit Graph

Author	SHA1	Message	Date
Max Bachmann	f25fe290f7	move cdist to separate module	2021-09-15 03:50:30 +02:00
Max Bachmann	56f062b063	add cdist implementation	2021-09-10 13:37:40 +02:00
maxbachmann	6d28d34d8d	cleanup kwargs handling in the process module	2021-09-01 00:09:36 +02:00
Max Bachmann	a87786c770	disable assertions in release build	2021-08-21 03:51:08 +02:00
Max Bachmann	30ec2f92ae	add more supported types (#101)	2021-05-23 22:09:03 +02:00
Max Bachmann	0d84a8b933	ignore some compiler warnings for cython	2021-03-20 06:35:35 +01:00
Max Bachmann	90cc67be00	fix bug in LCS implementation	2021-03-20 03:46:02 +01:00
Max Bachmann	d62cef5f86	strip debug symbols from Linux binaries	2021-03-10 13:54:39 +01:00
Max Bachmann	53b8e3bd61	update build mechanism	2021-03-07 17:45:24 +01:00
Max Bachmann	c6eebb70a5	fix incorrect ref counting	2021-03-03 16:08:42 +01:00
Max Bachmann	5383d286b2	Release v1.1.0 (#75) ## Changed - string_metric.normalized_levenshtein now supports all weights - when different weights are used for Insertion and Deletion the strings cannot be swapped inside the Levenshtein implementation anymore. So different weights for Insertion and Deletion are now supported. - replace C++ implementation with a Cython implementation. This has the following advantages: - The implementation is less error prone, since a lot of the complex things are done by Cython - slightly faster than the current implementation (up to 10% for some parts) - about 33% smaller binary size - reduced compile time - Added **kwargs argument to process.extract/extractOne/extract_iter that is passed to the scorer - Add max argument to hamming distance - Add support for whole Unicode range to utils.default_process ## Performance - replaced Wagner-Fischer usage in the normal Levenshtein distance with a bitparallel implementation	2021-02-21 19:42:36 +01:00
Max Bachmann	375c13e436	Release v1.0.0 (#68) - all normalized string_metrics can now be used as scorer for process.extract/extractOne - Implementation of the C++ Wrapper completely refactored to make it easier to add more scorers, processors and string matching algorithms in the future. - increased test coverage, which already helped to fix some bugs and helps to prevent regressions in the future - improved docstrings of functions - Added bitparallel implementation of the Levenshtein distance for the weights (1,1,1) and (1,1,2). - Added specialized implementation of the Levenshtein distance for cases with a small maximum edit distance, which is even faster than the bitparallel implementation. - Improved performance of `fuzz.partial_ratio` -> Since `fuzz.ratio` and `fuzz.partial_ratio` are used in most scorers, this improves the overall performance. - Improved performance of `process.extract` and `process.extractOne` - the `rapidfuzz.levenshtein` module is now deprecated and will be removed in v2.0.0. These functions are now placed in `rapidfuzz.string_metric`. `distance`, `normalized_distance`, `weighted_distance` and `weighted_normalized_distance` are combined into `levenshtein` and `normalized_levenshtein`. - added normalized version of the hamming distance in `string_metric.normalized_hamming` - process.extract_iter as a generator that yields the similarity of all elements that have a similarity >= score_cutoff - multiple bugs in extractOne when used with a scorer that's not from RapidFuzz - fixed bug in `token_ratio` - fixed bug in result normalisation causing zero division	2021-02-12 16:48:10 +01:00
Max Bachmann	67b02ff967	add C++11 support	2020-11-21 18:25:47 +01:00
Max Bachmann	426fbb24e9	implement process.extractOne in C++ (#53) * start to simplify complexion * start implementation * add extractOne to C++ * fix a couple of bugs in the implementation * start addressing performance issues	2020-11-15 20:18:46 +01:00
Max Bachmann	8e6a80d777	always use C++ implementation of default_process So far the C++ implementation was only used when the user - did not provide a processor - provided the processors True/False/None However it did not get used when the processor utils.default_process was used. This made this case a lot slower. When the user provides utils.default_process all fuzz functions directly use the C++ implementation now.	2020-11-08 21:04:19 +01:00
maxbachmann	789941dc40	replace difflib	2020-09-29 00:18:24 +02:00
maxbachmann	10946dfac0	add python 2.7 support	2020-08-22 23:06:05 +02:00
Max Bachmann	a780018db6	add auto deployment	2020-08-14 14:39:53 +02:00
maxbachmann	15c6dbb6fb	reduce string copies and tarball size	2020-05-22 13:28:38 +02:00
maxbachmann	46cf20aa4e	remove intermediate python function to improve performance	2020-05-12 08:56:28 +02:00
maxbachmann	096d3b584f	move cpp into submodule	2020-04-13 08:50:35 +02:00
maxbachmann	044fd229a9	fix performance degradation and use the same interface everywhere	2020-04-09 09:32:29 +02:00
maxbachmann	f0adc8da49	start fixing performance issues	2020-04-09 00:32:38 +02:00
maxbachmann	4da4234f73	fix string view usage	2020-04-05 02:48:44 +02:00
maxbachmann	0c7ee10415	replace std::wstring_view with boost::wstring_view to add C++14 support	2020-04-03 14:38:34 +02:00
maxbachmann	18528aed03	implement extractOne using C API	2020-04-01 00:14:56 +02:00
maxbachmann	7ee4808cf9	start replacing pybind11 with the python C API	2020-03-31 19:42:48 +02:00
maxbachmann	028db547d1	reduce template usage to a minimum	2020-03-31 15:16:03 +02:00
maxbachmann	510a0f190e	automatically build python Wheels (#5)	2020-03-22 23:12:51 +01:00
maxbachmann	097365692a	make levenshtein work with string_view and wstring_view	2020-03-21 19:41:34 +01:00
maxbachmann	74af424dd4	add conversions between iterables and list	2020-03-19 00:48:47 +01:00
maxbachmann	e157e11fa7	complete basic implementation of rapidfuzz	2020-03-18 21:34:32 +01:00
maxbachmann	cb84c0521c	add setup.py to build python version	2020-02-29 18:17:00 +01:00

33 Commits