Find parts of long text or data, allowing for some changes/typos.
Tal Einat b56657093c use substitutions_only_has_near_matches_ngrams_byteslike when available 2014-05-17 14:27:38 +03:00

===============================
fuzzysearch
===============================

.. image:: https://badge.fury.io/py/fuzzysearch.png
    :target: http://badge.fury.io/py/fuzzysearch

.. image:: https://travis-ci.org/taleinat/fuzzysearch.png?branch=master
        :target: https://travis-ci.org/taleinat/fuzzysearch

.. image:: https://coveralls.io/repos/taleinat/fuzzysearch/badge.png
        :target: https://coveralls.io/r/taleinat/fuzzysearch

.. image:: https://pypip.in/d/fuzzysearch/badge.png
        :target: https://crate.io/packages/fuzzysearch?version=latest

fuzzysearch is useful for finding approximate subsequence matches.

* Free software: MIT license
* Documentation: http://fuzzysearch.rtfd.org.

Features
--------

* Fuzzy sub-sequence search: Find parts of a sequence which match a given
  sub-sequence up to a given maximum Levenshtein distance.
* Set individual limits for the number of substitutions, insertions and/or
  deletions allowed for a near-match.
* Includes optimized implementations for specific use-cases, e.g. only allowing
  substitutions in near-matches.
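
To make the distance limit in the first feature concrete, here is a minimal
pure-Python sketch of approximate sub-sequence matching under a maximum
Levenshtein distance. It is only an illustration: the function name
``near_match_ends`` is hypothetical, it is not part of fuzzysearch, and it
makes no claim about how the library's optimized implementations work.

.. code:: python

    def near_match_ends(subsequence, sequence, max_l_dist):
        """Return (end, dist) pairs for near-matches of subsequence.

        Hypothetical helper for illustration only. A pair (end, dist) means
        that some slice of sequence ending at index ``end`` (exclusive) is
        within Levenshtein distance ``dist <= max_l_dist`` of subsequence.
        """
        m = len(subsequence)
        # prev[i]: lowest edit distance between subsequence[:i] and any
        # slice of sequence that ends at the previous position.
        prev = list(range(m + 1))
        results = []
        for end, item in enumerate(sequence, start=1):
            # An empty prefix matches anywhere at zero cost, which lets a
            # near-match start at any position in the sequence.
            cur = [0]
            for i in range(1, m + 1):
                cost = 0 if subsequence[i - 1] == item else 1
                cur.append(min(
                    prev[i - 1] + cost,  # exact match or substitution
                    prev[i] + 1,         # extra item in the sequence
                    cur[i - 1] + 1,      # subsequence item skipped
                ))
            if cur[m] <= max_l_dist:
                results.append((end, cur[m]))
            prev = cur
        return results

    print(near_match_ends('PATTERN', 'aaaPATERNaaa', 1))  # [(9, 1)]

This sketch reports only where a near-match ends and how far it is from the
sub-sequence; it does not recover start indices.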

Simple Example
--------------
You can usually just use the ``find_near_matches()`` utility function, which
chooses a suitable fuzzy search implementation according to the given
parameters:

.. code:: python

    >>> from fuzzysearch import find_near_matches
    >>> find_near_matches('PATTERN', 'aaaPATERNaaa', max_l_dist=1)
    [Match(start=3, end=9, dist=1)]
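
As a small follow-up sketch, the ``start`` and ``end`` attributes shown in the
result above can be used to slice the matched text out of the original
sequence. This assumes fuzzysearch is installed and that ``end`` is an
exclusive index, as the output above suggests; the variable names are
illustrative only.

.. code:: python

    from fuzzysearch import find_near_matches

    sequence = 'aaaPATERNaaa'
    matches = find_near_matches('PATTERN', sequence, max_l_dist=1)

    # Slice the original sequence with each match's start/end indices.
    matched_texts = [sequence[m.start:m.end] for m in matches]
    print(matched_texts)

For the single match shown above, the slice ``sequence[3:9]`` is
``'PATERN'``.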

Advanced Example
----------------
If needed, you can choose a specific search implementation, such as
``find_near_matches_with_ngrams()``:

.. code:: python

    >>> sequence = '''\
    GACTAGCACTGTAGGGATAACAATTTCACACAGGTGGACAATTACATTGAAAATCACAGATTGGTCACACACACA
    TTGGACATACATAGAAACACACACACATACATTAGATACGAACATAGAAACACACATTAGACGCGTACATAGACA
    CAAACACATTGACAGGCAGTTCAGATGATGACGCCCGACTGATACTCGCGTAGTCGTGGGAGGCAAGGCACACAG
    GGGATAGG'''
    >>> subsequence = 'TGCACTGTAGGGATAACAAT'  # distance 1
    >>> max_distance = 2

    >>> from fuzzysearch import find_near_matches_with_ngrams
    >>> find_near_matches_with_ngrams(subsequence, sequence, max_distance)
    [Match(start=3, end=24, dist=1)]

License
-------
.. include:: LICENSE
   :literal: