switch to functions.md

parent 3a06a61d0c
commit b27f556c2e
@@ -1,10 +1,12 @@
String Comparison
=================
# Functions

These methods are all measures of the difference (aka `edit distance`) between two strings.
Jellyfish provides a variety of functions for string comparison, phonetic encoding, and stemming.

Levenshtein Distance
--------------------
## String Comparison

These methods are all measures of the difference (aka edit distance) between two strings.

### Levenshtein Distance

``` python
def levenshtein_distance(s1: str, s2: str)
@@ -18,8 +20,7 @@ For example: ``levenshtein_distance('berne', 'born') == 2`` representing the tra

See the [Levenshtein distance article at Wikipedia](http://en.wikipedia.org/wiki/Levenshtein_distance) for more details.
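For illustration, the distance can be computed with the classic dynamic-programming table. The sketch below is a minimal pure-Python version, not jellyfish's implementation, and `levenshtein_sketch` is a hypothetical name. It keeps only two rows of the table, because each cell depends only on the row above and on the cell to its left.

```python
def levenshtein_sketch(s1: str, s2: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions that turn s1 into s2 (hypothetical helper)."""
    # prev[j] is the distance between the current prefix of s1 and s2[:j]
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]  # distance from s1[:i] to the empty string
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(
                prev[j] + 1,               # delete c1
                curr[j - 1] + 1,           # insert c2
                prev[j - 1] + (c1 != c2),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein_sketch("berne", "born"))  # 2
```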

Damerau-Levenshtein Distance
----------------------------
### Damerau-Levenshtein Distance

``` python
def damerau_levenshtein_distance(s1: str, s2: str)
@@ -34,8 +35,7 @@ though ``damerau_levenshtein_distance('fish', 'ifsh') == 1`` as this counts as a

See the [Damerau-Levenshtein distance article at Wikipedia](http://en.wikipedia.org/wiki/Damerau-Levenshtein_distance) for more details.
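The following is a minimal pure-Python sketch of the unrestricted Damerau-Levenshtein distance. It uses the standard dynamic program that remembers where each character last occurred, so a swap of two adjacent characters can be charged as a single edit. It illustrates the technique under that assumption and is not jellyfish's code. The name `damerau_levenshtein_sketch` is hypothetical. This unrestricted form can give smaller values than the simpler "optimal string alignment" variant: for example, 2 rather than 3 for `'ca'` vs `'abc'`.

```python
def damerau_levenshtein_sketch(s1: str, s2: str) -> int:
    """Unrestricted Damerau-Levenshtein distance (hypothetical helper)."""
    inf = len(s1) + len(s2)  # at least as large as any real distance
    # d[i + 1][j + 1] is the distance between s1[:i] and s2[:j];
    # row 0 and column 0 are sentinels holding `inf`.
    d = [[inf] * (len(s2) + 2) for _ in range(len(s1) + 2)]
    for i in range(len(s1) + 1):
        d[i + 1][1] = i
    for j in range(len(s2) + 1):
        d[1][j + 1] = j
    last_row = {}  # 1-based row in s1 where each character was last seen
    for i in range(1, len(s1) + 1):
        last_match_col = 0  # last column in this row where s1[i-1] matched
        for j in range(1, len(s2) + 1):
            k_row = last_row.get(s2[j - 1], 0)
            l_col = last_match_col
            if s1[i - 1] == s2[j - 1]:
                cost = 0
                last_match_col = j
            else:
                cost = 1
            d[i + 1][j + 1] = min(
                d[i][j] + cost,   # match or substitution
                d[i + 1][j] + 1,  # insertion
                d[i][j + 1] + 1,  # deletion
                # transposition, plus any characters deleted/inserted between
                d[k_row][l_col] + (i - k_row - 1) + 1 + (j - l_col - 1),
            )
        last_row[s1[i - 1]] = i
    return d[len(s1) + 1][len(s2) + 1]


print(damerau_levenshtein_sketch("fish", "ifsh"))  # 1
```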

Hamming Distance
----------------
### Hamming Distance

``` python
def hamming_distance(s1: str, s2: str)
@@ -50,8 +50,7 @@ considers extra characters as differing. For example ``hamming_distance('abc',

See the [Hamming distance article at Wikipedia](http://en.wikipedia.org/wiki/Hamming_distance) for more details.
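The note above says that extra characters count as differing. Going by that, a minimal sketch of this length-tolerant Hamming distance could look like the following. `hamming_sketch` is a hypothetical name, and jellyfish's exact edge-case behaviour is assumed here rather than confirmed.

```python
from itertools import zip_longest


def hamming_sketch(s1: str, s2: str) -> int:
    """Count positions where s1 and s2 differ (hypothetical helper).

    The textbook definition requires equal lengths; here every character
    past the end of the shorter string also counts as a difference.
    """
    # zip_longest pads the shorter string with None, which never equals a char
    return sum(c1 != c2 for c1, c2 in zip_longest(s1, s2))


print(hamming_sketch("abc", "abd"))    # 1
print(hamming_sketch("abc", "abcde"))  # 2
```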

Jaro Similarity
----------------
### Jaro Similarity

``` python
def jaro_similarity(s1: str, s2: str)
@@ -66,8 +65,7 @@ Jaro distance is a string-edit distance that gives a floating point response in
Prior to 0.8.1 this function was named jaro_distance. That name is still available, but is no longer recommended.
It will be replaced in 1.0 with a correct version.

Jaro-Winkler Similarity
-----------------------
### Jaro-Winkler Similarity

``` python
def jaro_winkler_similarity(s1: str, s2: str)
@@ -84,8 +82,7 @@ Jaro-Winkler is a modification/improvement to Jaro distance, like Jaro it gives

See the [Jaro-Winkler distance article at Wikipedia](http://en.wikipedia.org/wiki/Jaro-Winkler_distance) for more details.
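Jaro-Winkler is computed from the Jaro score, so the two similarities can be sketched together. This minimal illustration uses the standard formulas: a matching window of `max(len) // 2 - 1`, transpositions counted in halves, and a Winkler prefix scale of 0.1 over at most four leading characters. The function names and the `prefix_scale` parameter are hypothetical. Jellyfish's own implementation may differ in edge cases such as empty strings, and this sketch applies the prefix bonus unconditionally, which not every implementation does.

```python
def jaro_sketch(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1] (hypothetical helper)."""
    if not s1 or not s2:
        return 0.0  # assumption: any empty input scores 0
    # characters only "match" if they are at most this far apart
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, c in enumerate(s1):
        lo, hi = max(0, i - window), min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # walk both sets of matched characters in order; each mismatch is
    # half a transposition
    half_transpositions = 0
    j = 0
    for i, c in enumerate(s1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if c != s2[j]:
                half_transpositions += 1
            j += 1
    t = half_transpositions // 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


def jaro_winkler_sketch(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro score boosted by the common prefix length (at most 4)."""
    jaro = jaro_sketch(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1[:4], s2[:4]):
        if c1 != c2:
            break
        prefix += 1
    return jaro + prefix * prefix_scale * (1 - jaro)


print(round(jaro_sketch("MARTHA", "MARHTA"), 4))          # 0.9444
print(round(jaro_winkler_sketch("MARTHA", "MARHTA"), 4))  # 0.9611
```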

Match Rating Approach (comparison)
----------------------------------
### Match Rating Approach (comparison)

``` python
def match_rating_comparison(s1, s2)
@@ -97,3 +94,85 @@ The Match rating approach algorithm is an algorithm for determining whether or n
pronounced similarly. Strings are first encoded using :py:func:`match_rating_codex` then compared according to the MRA algorithm.

See the [Match Rating Approach article at Wikipedia](http://en.wikipedia.org/wiki/Match_rating_approach) for more details.

## Phonetic Encoding

These algorithms convert a string to a normalized phonetic encoding, that is, a representation of how the word is pronounced. Each takes a single string and returns its code.


### American Soundex

``` python
def soundex(s: str)
```

Calculate the American Soundex of the string s.

Soundex is an algorithm to convert a word (typically a name) to a four-character code in the form
'A123', where 'A' is the first letter of the name and the digits encode the consonant sounds that follow, with similar-sounding consonants sharing a digit.

For example ``soundex('Ann') == soundex('Anne') == 'A500'`` and
``soundex('Rupert') == soundex('Robert') == 'R163'``.

See the [Soundex article at Wikipedia](http://en.wikipedia.org/wiki/Soundex) for more details.
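The coding rules can be sketched in a few lines. This sketch assumes the common American Soundex rules:

- Keep the first letter.
- Map B, F, P, V to 1; C, G, J, K, Q, S, X, Z to 2; D, T to 3; L to 4; M, N to 5; R to 6.
- Collapse adjacent letters that share a digit. This also applies across H and W, but not across vowels.
- Drop vowels, H, W and Y.
- Pad or truncate the result to four characters.

`soundex_sketch` is a hypothetical name, and this is not jellyfish's implementation.

```python
# Letter-to-digit groups of American Soundex; vowels, H, W and Y get no digit
_SOUNDEX_DIGITS = {
    letter: digit
    for letters, digit in (
        ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
        ("L", "4"), ("MN", "5"), ("R", "6"),
    )
    for letter in letters
}


def soundex_sketch(s: str) -> str:
    """Four-character American Soundex code (hypothetical helper)."""
    letters = [c for c in s.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    # the first letter's own digit also suppresses an identical digit after it
    prev = _SOUNDEX_DIGITS.get(first, "")
    for c in letters[1:]:
        digit = _SOUNDEX_DIGITS.get(c, "")
        if digit:
            if digit != prev:
                digits.append(digit)
            prev = digit
        elif c not in "HW":
            # a vowel (or Y) separates letters, so a repeated digit is kept
            prev = ""
        # H and W are skipped without resetting `prev`
    return (first + "".join(digits) + "000")[:4]


print(soundex_sketch("Rupert"), soundex_sketch("Robert"))  # R163 R163
print(soundex_sketch("Ann"), soundex_sketch("Anne"))       # A500 A500
```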


### Metaphone

``` python
def metaphone(s: str)
```

Calculate the metaphone code for the string s.

The metaphone algorithm was designed as an improvement on Soundex. It transforms a word into a
string consisting of '0BFHJKLMNPRSTWXY' where '0' is pronounced 'th' and 'X' is a '[sc]h' sound.

For example ``metaphone('Klumpz') == metaphone('Clumps') == 'KLMPS'``.

See the [Metaphone article at Wikipedia](http://en.wikipedia.org/wiki/Metaphone) for more details.


### NYSIIS

``` python
def nysiis(s: str)
```

Calculate the NYSIIS code for the string s.

The NYSIIS algorithm was developed by the New York State Identification and Intelligence System. It transforms a word into a phonetic code. Like Soundex and Metaphone, it is primarily intended for use on names (as they would be pronounced in English).
For example ``nysiis('John') == nysiis('Jan') == 'JAN'``.

See the [NYSIIS article at Wikipedia](http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System) for more details.

### Match Rating Approach (codex)

``` python
def match_rating_codex(s: str)
```

Calculate the match rating approach value (also called PNI) for the string s.

The Match rating approach algorithm is an algorithm for determining whether or not two names are
pronounced similarly. The algorithm consists of an encoding function (similar to soundex or nysiis)
which is implemented here as well as :py:func:`match_rating_comparison` which does the actual comparison.

See the [Match Rating Approach article at Wikipedia](http://en.wikipedia.org/wiki/Match_rating_approach) for more details.
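The encode-then-compare structure described above can be sketched as follows. The rules here are assumptions taken from the usual description of MRA, such as the Wikipedia article linked above:

- The codex drops non-leading vowels and repeated letters, and keeps only the first and last three characters of long codes.
- The comparison strips characters that match position by position, first from the left and then from the right.
- It then checks `6 - unmatched` against a threshold that depends on the combined codex length.

The function names are hypothetical, and this is not jellyfish's implementation. Returning `None` when the codex lengths differ by three or more is a choice of this sketch.

```python
from itertools import zip_longest
from typing import Optional

VOWELS = "AEIOU"


def mra_codex_sketch(name: str) -> str:
    """Match Rating Approach encoding (hypothetical helper)."""
    codex = []
    prev = None
    for i, c in enumerate(name.upper()):
        # keep a character unless it is a space, a non-leading vowel,
        # or a repeat of the character immediately before it
        if c != " " and (i == 0 or c not in VOWELS) and c != prev:
            codex.append(c)
        prev = c
    if len(codex) > 6:
        # long codes keep only their first three and last three characters
        codex = codex[:3] + codex[-3:]
    return "".join(codex)


def mra_comparison_sketch(s1: str, s2: str) -> Optional[bool]:
    """True if the names are rated similar, None if not comparable."""
    c1, c2 = mra_codex_sketch(s1), mra_codex_sketch(s2)
    if abs(len(c1) - len(c2)) >= 3:
        return None  # lengths too different: no comparison is made
    total = len(c1) + len(c2)
    if total <= 4:
        min_rating = 5
    elif total <= 7:
        min_rating = 4
    elif total <= 11:
        min_rating = 3
    else:
        min_rating = 2
    # pass 1, left to right: drop characters equal at the same position
    rest1, rest2 = [], []
    for a, b in zip_longest(c1, c2):
        if a != b:
            if a is not None:
                rest1.append(a)
            if b is not None:
                rest2.append(b)
    # pass 2, right to left over what is left: count unmatched characters
    unmatched1 = unmatched2 = 0
    for a, b in zip_longest(reversed(rest1), reversed(rest2)):
        if a != b:
            if a is not None:
                unmatched1 += 1
            if b is not None:
                unmatched2 += 1
    return 6 - max(unmatched1, unmatched2) >= min_rating


print(mra_codex_sketch("Byrne"), mra_codex_sketch("Boern"))  # BYRN BRN
print(mra_comparison_sketch("Byrne", "Boern"))               # True
```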

## Stemming

### Porter Stemmer

``` python
def porter_stem(s: str)
```

Reduce the string s to its stem using the common Porter stemmer.

Stemming is the process of reducing a word to its root form, for example 'stemmed' to 'stem'.

Martin Porter's algorithm is a widely used stemming algorithm for English text.

See the [official homepage for the Porter Stemming Algorithm](http://tartarus.org/martin/PorterStemmer/) for more details.
@@ -1,70 +0,0 @@
Phonetic Encoding
=================

These algorithms convert a string to a normalized phonetic encoding, that is, a representation of how the word is pronounced. Each takes a single string and returns its code.


American Soundex
----------------

``` python
def soundex(s: str)
```

Calculate the American Soundex of the string s.

Soundex is an algorithm to convert a word (typically a name) to a four-character code in the form
'A123', where 'A' is the first letter of the name and the digits encode the consonant sounds that follow, with similar-sounding consonants sharing a digit.

For example ``soundex('Ann') == soundex('Anne') == 'A500'`` and
``soundex('Rupert') == soundex('Robert') == 'R163'``.

See the [Soundex article at Wikipedia](http://en.wikipedia.org/wiki/Soundex) for more details.


Metaphone
---------

``` python
def metaphone(s: str)
```

Calculate the metaphone code for the string s.

The metaphone algorithm was designed as an improvement on Soundex. It transforms a word into a
string consisting of '0BFHJKLMNPRSTWXY' where '0' is pronounced 'th' and 'X' is a '[sc]h' sound.

For example ``metaphone('Klumpz') == metaphone('Clumps') == 'KLMPS'``.

See the [Metaphone article at Wikipedia](http://en.wikipedia.org/wiki/Metaphone) for more details.


NYSIIS
------

``` python
def nysiis(s: str)
```

Calculate the NYSIIS code for the string s.

The NYSIIS algorithm was developed by the New York State Identification and Intelligence System. It transforms a word into a phonetic code. Like Soundex and Metaphone, it is primarily intended for use on names (as they would be pronounced in English).
For example ``nysiis('John') == nysiis('Jan') == 'JAN'``.

See the [NYSIIS article at Wikipedia](http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System) for more details.

Match Rating Approach (codex)
-----------------------------

``` python
def match_rating_codex(s: str)
```

Calculate the match rating approach value (also called PNI) for the string s.

The Match rating approach algorithm is an algorithm for determining whether or not two names are
pronounced similarly. The algorithm consists of an encoding function (similar to soundex or nysiis)
which is implemented here as well as :py:func:`match_rating_comparison` which does the actual comparison.

See the [Match Rating Approach article at Wikipedia](http://en.wikipedia.org/wiki/Match_rating_approach) for more details.
@@ -1,17 +0,0 @@
Stemming
========

Porter Stemmer
--------------

``` python
def porter_stem(s: str)
```

Reduce the string s to its stem using the common Porter stemmer.

Stemming is the process of reducing a word to its root form, for example 'stemmed' to 'stem'.

Martin Porter's algorithm is a widely used stemming algorithm for English text.

See the [official homepage for the Porter Stemming Algorithm](http://tartarus.org/martin/PorterStemmer/) for more details.
@@ -46,7 +46,5 @@ extra_css:
- assets/extra.css
nav:
- 'index.md'
- 'phonetic.md'
- 'comparison.md'
- 'stemming.md'
- 'functions.md'
- 'changelog.md'