Find parts of long text or data, allowing for some changes/typos.

===========
fuzzysearch
===========

.. image:: https://img.shields.io/pypi/v/fuzzysearch.svg?style=flat
    :target: https://pypi.python.org/pypi/fuzzysearch
    :alt: Latest Version

.. image:: https://img.shields.io/travis/taleinat/fuzzysearch.svg?branch=master
    :target: https://travis-ci.org/taleinat/fuzzysearch/branches
    :alt: Build & Tests Status

.. image:: https://img.shields.io/coveralls/taleinat/fuzzysearch.svg?branch=master
    :target: https://coveralls.io/r/taleinat/fuzzysearch?branch=master
    :alt: Test Coverage

.. image:: https://img.shields.io/pypi/dm/fuzzysearch.svg?style=flat
    :target: https://pypi.python.org/pypi/fuzzysearch
    :alt: Downloads

.. image:: https://img.shields.io/pypi/wheel/fuzzysearch.svg?style=flat
    :target: https://pypi.python.org/pypi/fuzzysearch
    :alt: Wheels

.. image:: https://img.shields.io/pypi/pyversions/fuzzysearch.svg?style=flat
    :target: https://pypi.python.org/pypi/fuzzysearch
    :alt: Supported Python versions

.. image:: https://img.shields.io/pypi/implementation/fuzzysearch.svg?style=flat
    :target: https://pypi.python.org/pypi/fuzzysearch
    :alt: Supported Python implementations

.. image:: https://img.shields.io/pypi/l/fuzzysearch.svg?style=flat
    :target: https://pypi.python.org/pypi/fuzzysearch/
    :alt: License

**Easy fuzzy search that just works, fast!**

.. code:: python

    >>> from fuzzysearch import find_near_matches
    >>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1)
    [Match(start=3, end=9, dist=1)]

* approximate sub-string searches

* single, simple function to use

  * chooses the fastest available search mechanism based on the given input

* uses the Levenshtein distance metric with configurable parameters

  * separately configure the maximum allowed Levenshtein distance and the
    maximum numbers of substitutions, deletions and insertions

* optional, highly optimized C and Cython implementations

* extensively tested

* free software: `MIT license <LICENSE>`_
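
For reference, the Levenshtein distance between two strings is the minimum
number of single-character substitutions, deletions and insertions needed to
turn one into the other. The following is a minimal pure-Python sketch of the
classic dynamic-programming computation; the function name ``levenshtein`` is
hypothetical, and this is not fuzzysearch's own implementation.

.. code:: python

    def levenshtein(a: str, b: str) -> int:
        """Return the Levenshtein distance between strings a and b."""
        # At the start of row i, prev[j] is the distance between a[:i-1] and b[:j].
        prev = list(range(len(b) + 1))
        for i, char_a in enumerate(a, start=1):
            curr = [i]  # distance from a[:i] to the empty string
            for j, char_b in enumerate(b, start=1):
                curr.append(min(
                    prev[j] + 1,                       # delete char_a
                    curr[j - 1] + 1,                   # insert char_b
                    prev[j - 1] + (char_a != char_b),  # substitute (free if equal)
                ))
            prev = curr
        return prev[-1]


    print(levenshtein('PATTERN', 'PATERN'))   # 1
    print(levenshtein('PATTERN', 'PATERRN'))  # 2

Only two rows of the table are kept at a time, so memory use grows with the
length of ``b`` rather than with the product of both lengths.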

For more info, see the `documentation <http://fuzzysearch.rtfd.org>`_.

Installation
------------

.. code::

    $ pip install fuzzysearch

Installation will succeed even if building the optional C and Cython
extensions fails; in that case, pure-Python fallbacks are used.

Usage
-----
Just call ``find_near_matches()`` with the sub-sequence you're looking for,
the sequence to search, and the matching parameters:

.. code:: python

    >>> from fuzzysearch import find_near_matches
    >>> # search for 'PATTERN' with a maximum Levenshtein distance of 1
    >>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1)
    [Match(start=3, end=9, dist=1)]

.. code:: python

    >>> sequence = '''\
    ... GACTAGCACTGTAGGGATAACAATTTCACACAGGTGGACAATTACATTGAAAATCACAGATTGGTCACACACACA
    ... TTGGACATACATAGAAACACACACACATACATTAGATACGAACATAGAAACACACATTAGACGCGTACATAGACA
    ... CAAACACATTGACAGGCAGTTCAGATGATGACGCCCGACTGATACTCGCGTAGTCGTGGGAGGCAAGGCACACAG
    ... GGGATAGG'''
    >>> subsequence = 'TGCACTGTAGGGATAACAAT'  # distance = 1
    >>> find_near_matches(subsequence, sequence, max_l_dist=2)
    [Match(start=3, end=24, dist=1)]
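
Conceptually, an approximate sub-string search asks, for each position in the
sequence, how closely the sub-sequence can match some stretch of text ending
there. The sketch below uses the textbook dynamic-programming approach for this
(often attributed to Sellers): it is the Levenshtein recurrence with the top
row set to zero, so a match may begin anywhere in the sequence. It is an
illustration only, not the algorithm ``find_near_matches()`` uses: it reports
end positions and distances, but does not recover start positions or merge
overlapping matches. The function name ``near_match_ends`` is hypothetical.

.. code:: python

    def near_match_ends(subsequence, sequence, max_l_dist):
        """Return (end, dist) pairs such that some sequence[start:end] is within
        max_l_dist of subsequence; dist is the smallest such distance."""
        m = len(subsequence)
        # column[i]: smallest distance between subsequence[:i] and any
        # substring of sequence ending at the current position
        column = list(range(m + 1))
        results = []
        if column[m] <= max_l_dist:
            results.append((0, column[m]))
        for end, char in enumerate(sequence, start=1):
            new_column = [0]  # an empty prefix matches anywhere at no cost
            for i in range(1, m + 1):
                new_column.append(min(
                    column[i] + 1,          # skip char in the sequence (insertion)
                    new_column[i - 1] + 1,  # skip subsequence[i-1] (deletion)
                    column[i - 1] + (subsequence[i - 1] != char),  # match/substitute
                ))
            column = new_column
            if column[m] <= max_l_dist:
                results.append((end, column[m]))
        return results


    print(near_match_ends('PATTERN', '---PATERN---', max_l_dist=1))  # [(9, 1)]

The reported end position 9 agrees with ``Match(start=3, end=9, dist=1)`` in
the first usage example.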

Matching Criteria
-----------------
The search function supports four possible match criteria, which may be
supplied in any combination:

* maximum Levenshtein distance (*max_l_dist*)

* maximum number of substitutions (*max_substitutions*)

* maximum number of deletions (*max_deletions*; "delete" = skip a character
  in the sub-sequence)

* maximum number of insertions (*max_insertions*; "insert" = skip a character
  in the sequence)

Omitting a criterion means there is no limit for it. For this reason, one must
always supply *max_l_dist*, all of the other criteria, or both.

.. code:: python

    >>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1)
    [Match(start=3, end=9, dist=1)]

    >>> # this will not match, since max_deletions is set to zero
    >>> find_near_matches('PATTERN', '---PATERN---', max_l_dist=1, max_deletions=0)
    []

    >>> # note that a deletion + insertion may be combined to match a substitution;
    >>> # the reported Levenshtein distance is still 1
    >>> find_near_matches('PATTERN', '---PAT-ERN---', max_deletions=1, max_insertions=1, max_substitutions=0)
    [Match(start=3, end=10, dist=1)]

    >>> # ... but deletion + insertion may also match other, non-substitution differences
    >>> find_near_matches('PATTERN', '---PATERRN---', max_deletions=1, max_insertions=1, max_substitutions=0)
    [Match(start=3, end=10, dist=2)]
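
To illustrate how the separate limits interact, here is a small, hypothetical
checker (``matches_within_limits`` is not part of fuzzysearch) that decides
whether the sub-sequence can be aligned against one fixed candidate stretch of
the sequence without exceeding the given limits. As in the criteria list, a
deletion skips a character of the sub-sequence, an insertion skips a character
of the sequence, and an omitted limit means no limit. For simplicity, it counts
every edit toward *max_l_dist*, which is an assumption and may not mirror
exactly how the library accounts for combined deletions and insertions.

.. code:: python

    from functools import lru_cache
    from typing import Optional


    def matches_within_limits(subsequence: str, candidate: str,
                              max_l_dist: Optional[int] = None,
                              max_substitutions: Optional[int] = None,
                              max_deletions: Optional[int] = None,
                              max_insertions: Optional[int] = None) -> bool:
        """Return True if all of `subsequence` can be aligned to all of `candidate`
        without exceeding any of the given limits (None means "no limit")."""
        # No alignment ever needs more edits than this, so it stands in for "unlimited".
        unlimited = len(subsequence) + len(candidate)

        def limit(value: Optional[int]) -> int:
            return unlimited if value is None else value

        max_total, max_subs = limit(max_l_dist), limit(max_substitutions)
        max_dels, max_ins = limit(max_deletions), limit(max_insertions)

        @lru_cache(maxsize=None)
        def align(i: int, j: int, subs: int, dels: int, ins: int) -> bool:
            # i, j: positions in subsequence and candidate; subs/dels/ins: edits used so far
            if subs + dels + ins > max_total:
                return False
            if i == len(subsequence) and j == len(candidate):
                return True
            if i < len(subsequence) and j < len(candidate):
                if subsequence[i] == candidate[j] and align(i + 1, j + 1, subs, dels, ins):
                    return True  # characters match at no cost
                if subs < max_subs and align(i + 1, j + 1, subs + 1, dels, ins):
                    return True  # substitution
            if i < len(subsequence) and dels < max_dels and align(i + 1, j, subs, dels + 1, ins):
                return True  # deletion: skip a character in the sub-sequence
            if j < len(candidate) and ins < max_ins and align(i, j + 1, subs, dels, ins + 1):
                return True  # insertion: skip a character in the sequence
            return False

        return align(0, 0, 0, 0, 0)


    print(matches_within_limits('PATTERN', 'PATERN', max_l_dist=1))                   # True
    print(matches_within_limits('PATTERN', 'PATERN', max_l_dist=1, max_deletions=0))  # False

The second call fails because 'PATERN' is one character shorter than
'PATTERN', and only a deletion can consume a sub-sequence character without
consuming a sequence character.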