Danrich Parrol b9bbb0d450 Fix segfault in Damerau-Levenstein C code.
If one of the characters had a value of 128 or above, this would be
treated as a signed char and would result in an array lookup with a
negative index. The somewhat contrived test case given here --
comparing a space with a non-breaking space -- reproduces the
segmentation fault prior to the fix.

This also makes a Clang warning go away. Thanks, compiler! :-)
2015-02-03 22:04:23 -08:00

=========
jellyfish
=========

.. image:: https://travis-ci.org/sunlightlabs/jellyfish.svg?branch=master
    :target: https://travis-ci.org/sunlightlabs/jellyfish

.. image:: https://coveralls.io/repos/sunlightlabs/jellyfish/badge.png?branch=master
    :target: https://coveralls.io/r/sunlightlabs/jellyfish

.. image:: https://pypip.in/version/jellyfish/badge.svg
    :target: https://pypi.python.org/pypi/jellyfish

.. image:: https://pypip.in/format/jellyfish/badge.svg
    :target: https://pypi.python.org/pypi/jellyfish


Jellyfish is a Python library for approximate and phonetic matching of strings.

jellyfish is a project of Sunlight Labs (c) 2014.
All code is released under a BSD-style license; see LICENSE for details.

Written by James Turk <jturk@sunlightfoundation.com> and Michael Stephens.

See https://github.com/sunlightlabs/jellyfish/graphs/contributors for contributors.

Source is available at http://github.com/sunlightlabs/jellyfish.

Included Algorithms
===================

String comparison:

  * Levenshtein Distance
  * Damerau-Levenshtein Distance
  * Jaro Distance
  * Jaro-Winkler Distance
  * Match Rating Approach Comparison
  * Hamming Distance
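
To give a feel for what two of these measures compute, here is a minimal pure-Python sketch of Levenshtein and Hamming distance. It is an illustration only, not jellyfish's implementation. The names ``levenshtein`` and ``hamming`` are hypothetical helpers, and the library's own functions may treat edge cases (such as strings of unequal length) differently.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn ``a`` into ``b`` (illustrative sketch)."""
    # previous[j] is the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning i characters into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,          # delete ca
                current[j - 1] + 1,       # insert cb
                previous[j - 1] + cost,   # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ.

    Assumption: this sketch follows the classic definition and rejects
    strings of different lengths.
    """
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(ca != cb for ca, cb in zip(a, b))
```

Under this sketch, ``levenshtein('jellyfish', 'smellyfish')`` is 2: one substitution and one insertion.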

Phonetic encoding:

  * American Soundex
  * Metaphone
  * NYSIIS (New York State Identification and Intelligence System)
  * Match Rating Codex

Example Usage
=============

>>> import jellyfish
>>> jellyfish.levenshtein_distance('jellyfish', 'smellyfish')
2
>>> jellyfish.jaro_distance('jellyfish', 'smellyfish')
0.89629629629629637
>>> jellyfish.damerau_levenshtein_distance('jellyfish', 'jellyfihs')
1

>>> jellyfish.metaphone('Jellyfish')
'JLFX'
>>> jellyfish.soundex('Jellyfish')
'J412'
>>> jellyfish.nysiis('Jellyfish')
'JALYF'
>>> jellyfish.match_rating_codex('Jellyfish')
'JLLFSH'