🎐 A Python library for approximate and phonetic matching of strings.


Overview

jellyfish is a Python library for approximate and phonetic matching of strings.

Source: https://github.com/jamesturk/jellyfish

Documentation: https://jamesturk.github.io/jellyfish/

Issues: https://github.com/jamesturk/jellyfish/issues


Included Algorithms

String comparison:

  • Levenshtein Distance
  • Damerau-Levenshtein Distance
  • Jaccard Index
  • Jaro Similarity
  • Jaro-Winkler Similarity
  • Match Rating Approach Comparison
  • Hamming Distance
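To illustrate what the first of these metrics measures, here is a minimal pure-Python sketch of Levenshtein distance using the classic Wagner-Fischer dynamic-programming table. It is an illustrative stand-in, not jellyfish's own (Rust) implementation, and the function name `levenshtein` is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn ``a`` into ``b`` (illustrative sketch)."""
    # prev holds the previous row of the DP table:
    # prev[j] is the distance between the current prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur[j] = min(
                prev[j] + 1,         # delete ca
                cur[j - 1] + 1,      # insert cb
                prev[j - 1] + cost,  # substitute ca -> cb (free if equal)
            )
        prev = cur
    return prev[-1]
```

Keeping only two rows makes memory use proportional to `len(b)` rather than `len(a) * len(b)`.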

Phonetic encoding:

  • American Soundex
  • Metaphone
  • NYSIIS (New York State Identification and Intelligence System)
  • Match Rating Codex
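Phonetic encoders map similar-sounding names to the same short code. The sketch below follows the commonly published American Soundex rules: keep the first letter, map consonants to digits, collapse adjacent equal codes (also across H and W), let vowels and Y separate repeats, and pad or truncate to four characters. It is a hypothetical illustration, and jellyfish's `soundex` may handle edge cases (non-ASCII input, empty strings) differently.

```python
# Standard Soundex digit groups
_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name: str) -> str:
    """American Soundex code such as 'R163' (illustrative sketch)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    # the first letter's own code still suppresses an identical code right after it
    last = _CODES.get(first, "")
    for c in letters[1:]:
        code = _CODES.get(c)
        if code is not None:
            if code != last:
                digits.append(code)
            last = code
        elif c not in "HW":
            # vowels (and Y) separate letters, so the same code may repeat after them
            last = ""
        # H and W are skipped without resetting `last`
    return (first + "".join(digits) + "000")[:4]
```

For example, `soundex('Robert')` and `soundex('Rupert')` both give `'R163'`.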

Example Usage

>>> import jellyfish
>>> jellyfish.levenshtein_distance('jellyfish', 'smellyfish')
2
>>> jellyfish.jaro_similarity('jellyfish', 'smellyfish')
0.89629629629629637
>>> jellyfish.damerau_levenshtein_distance('jellyfish', 'jellyfihs')
1
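The Jaro similarity in the example above counts characters that match within a sliding window, then penalizes matches that appear in a different order. A minimal sketch of the textbook definition follows; the function name `jaro` and the empty-string conventions are assumptions, and jellyfish's `jaro_similarity` is not guaranteed to agree bit-for-bit.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical (illustrative sketch)."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    # characters only "match" if they are equal and no farther apart than this
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    s1_matched = [False] * len(s1)
    s2_matched = [False] * len(s2)
    matches = 0
    for i, c in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not s2_matched[j] and s2[j] == c:
                s1_matched[i] = s2_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # transpositions: half the number of matched characters that are out of order
    s1_seq = [c for c, m in zip(s1, s1_matched) if m]
    s2_seq = [c for c, m in zip(s2, s2_matched) if m]
    transpositions = sum(a != b for a, b in zip(s1_seq, s2_seq)) / 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3
```

For 'jellyfish' and 'smellyfish' there are 8 matching characters and no transpositions, so the score is (8/9 + 8/10 + 8/8) / 3 ≈ 0.8963.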

>>> jellyfish.metaphone('Jellyfish')
'JLFX'
>>> jellyfish.soundex('Jellyfish')
'J412'
>>> jellyfish.nysiis('Jellyfish')
'JALYF'
>>> jellyfish.match_rating_codex('Jellyfish')
'JLLFSH'
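The Jaccard Index from the algorithm list is not shown in the examples. In general it is |A ∩ B| / |A ∪ B| for two token sets A and B. The sketch below assumes character bigrams as the tokens; that tokenization, and the helper names `char_ngrams` and `jaccard`, are illustrative assumptions, and jellyfish's own `jaccard_similarity` may split strings differently.

```python
from __future__ import annotations


def char_ngrams(s: str, n: int = 2) -> set[str]:
    """All contiguous substrings of length n; a non-empty string shorter than n yields itself."""
    if len(s) < n:
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a: str, b: str, n: int = 2) -> float:
    """Jaccard index of the character n-gram sets of a and b (illustrative sketch)."""
    sa, sb = char_ngrams(a, n), char_ngrams(b, n)
    if not sa and not sb:
        return 1.0  # convention: two empty strings are treated as identical
    return len(sa & sb) / len(sa | sb)
```

'jellyfish' and 'smellyfish' share 7 of their 10 distinct bigrams, so `jaccard('jellyfish', 'smellyfish')` is 0.7.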