RapidFuzz/tests/test_utils.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import unittest

from rapidfuzz import utils


class UtilsTest(unittest.TestCase):
    def test_fullProcess(self):
        mixed_strings = [
            "Lorem Ipsum is simply dummy text of the printing and typesetting industry.",
            "C'est la vie",
            u"Ça va?",
            u"Cães danados",
            u"¬Camarões assados",
            u"a¬ሴ€耀",
            u"Á"
        ]
        mixed_strings_proc = [
            "lorem ipsum is simply dummy text of the printing and typesetting industry",
            "c est la vie",
            u"ça va",
            u"cães danados",
            u"camarões assados",
            u"a ሴ 耀",
            u"á"
        ]

        for string, proc_string in zip(mixed_strings, mixed_strings_proc):
            self.assertEqual(
                utils.default_process(string),
                proc_string)
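

# Sketch (an assumption, not rapidfuzz's implementation): the expected
# values in test_fullProcess are consistent with a simple rule. Replace
# every non-alphanumeric character with a space, lowercase the result, and
# strip leading/trailing whitespace. `_reference_process` and
# `ReferenceProcessSketchTest` are hypothetical names used only to
# illustrate that rule; utils.default_process may behave differently on
# inputs not covered by this file.
def _reference_process(s):
    # str.isalnum() is Unicode-aware, so letters such as "ç", "ሴ" and "耀"
    # are kept, while symbols such as "¬", "€", "'" and "?" become spaces.
    return u"".join(c if c.isalnum() else u" " for c in s).lower().strip()


class ReferenceProcessSketchTest(unittest.TestCase):
    def test_reference_rule(self):
        # A subset of the (input, expected) pairs from test_fullProcess.
        cases = [
            (u"C'est la vie", u"c est la vie"),
            (u"Ça va?", u"ça va"),
            (u"¬Camarões assados", u"camarões assados"),
            (u"a¬ሴ€耀", u"a ሴ 耀"),
            (u"Á", u"á"),
        ]
        for raw, expected in cases:
            self.assertEqual(_reference_process(raw), expected)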

if __name__ == '__main__':
    unittest.main()