===============================
fuzzysearch
===============================

.. image:: https://badge.fury.io/py/fuzzysearch.png
    :target: http://badge.fury.io/py/fuzzysearch

.. image:: https://travis-ci.org/taleinat/fuzzysearch.png?branch=master
    :target: https://travis-ci.org/taleinat/fuzzysearch

.. image:: https://pypip.in/d/fuzzysearch/badge.png
    :target: https://crate.io/packages/fuzzysearch?version=latest

fuzzysearch is useful for finding approximate subsequence matches.

* Free software: MIT license
* Documentation: http://fuzzysearch.rtfd.org.

Features
--------

* Fuzzy subsequence search: find the parts of a sequence that match a given
  subsequence up to a given maximum Levenshtein distance.

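The Levenshtein distance mentioned above is the minimum number of
single-item insertions, deletions and substitutions needed to turn one
sequence into another. A minimal pure-Python sketch of the metric follows.
The ``levenshtein()`` helper is a hypothetical name used only for
illustration: it is not part of the fuzzysearch API, and fuzzysearch does
not necessarily compute distances this way internally.

.. code:: python

    def levenshtein(a, b):
        """Return the Levenshtein (edit) distance between sequences a and b.

        Illustrative helper only; not part of fuzzysearch.
        """
        # previous[j] is the distance between the prefix of a processed so
        # far and b[:j]; initially that prefix is empty, so it takes j
        # insertions to build b[:j].
        previous = list(range(len(b) + 1))
        for i, item_a in enumerate(a, start=1):
            # Turning a[:i] into an empty sequence takes i deletions.
            current = [i]
            for j, item_b in enumerate(b, start=1):
                cost = 0 if item_a == item_b else 1
                current.append(min(
                    previous[j] + 1,         # delete item_a
                    current[j - 1] + 1,      # insert item_b
                    previous[j - 1] + cost,  # substitute, or keep a match
                ))
            previous = current
        return previous[-1]

For example, ``levenshtein('PATTERN', 'PATERN')`` returns ``1``, because
deleting one ``T`` is enough to turn the first string into the second.
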
Simple Example
--------------

You can usually just use the ``find_near_matches()`` utility function, which
chooses a suitable fuzzy search implementation according to the given
parameters:

.. code:: python

    >>> from fuzzysearch import find_near_matches
    >>> find_near_matches('PATTERN', 'aaaPATERNaaa', max_l_dist=1)
    [Match(start=3, end=9, dist=1)]

Advanced Example
----------------

If needed, you can choose a specific search implementation, such as
``find_near_matches_with_ngrams()``:

.. code:: python

    >>> sequence = '''\
    GACTAGCACTGTAGGGATAACAATTTCACACAGGTGGACAATTACATTGAAAATCACAGATTGGTCACACACACA
    TTGGACATACATAGAAACACACACACATACATTAGATACGAACATAGAAACACACATTAGACGCGTACATAGACA
    CAAACACATTGACAGGCAGTTCAGATGATGACGCCCGACTGATACTCGCGTAGTCGTGGGAGGCAAGGCACACAG
    GGGATAGG'''
    >>> subsequence = 'TGCACTGTAGGGATAACAAT'  # distance 1
    >>> max_distance = 2

    >>> from fuzzysearch import find_near_matches_with_ngrams
    >>> find_near_matches_with_ngrams(subsequence, sequence, max_distance)
    [Match(start=3, end=24, dist=1)]