switch to functions.md

James Turk 2021-11-10 14:54:19 -05:00
parent 3a06a61d0c
commit b27f556c2e
4 changed files with 95 additions and 105 deletions


@ -1,10 +1,12 @@
String Comparison
=================
# Functions
These methods are all measures of the difference (aka `edit distance`) between two strings.
Jellyfish provides a variety of functions for string comparison, phonetic encoding, and stemming.
Levenshtein Distance
--------------------
## String Comparison
These methods are all measures of the difference (aka edit distance) between two strings.
### Levenshtein Distance
``` python
def levenshtein_distance(s1: str, s2: str)
@ -18,8 +20,7 @@ For example: ``levenshtein_distance('berne', 'born') == 2`` representing the tra
See the [Levenshtein distance article at Wikipedia](http://en.wikipedia.org/wiki/Levenshtein_distance) for more details.
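As an illustration of what this distance measures, here is a minimal pure-Python sketch of the classic dynamic-programming approach. The name `levenshtein_sketch` is hypothetical and this is not Jellyfish's implementation; it only shows how insertions, deletions, and substitutions are counted.

```python
def levenshtein_sketch(s1: str, s2: str) -> int:
    """Illustrative edit distance; hypothetical helper, not part of Jellyfish."""
    # previous[j] is the distance between the s1 prefix handled so far and s2[:j].
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]  # turning s1[:i] into the empty string takes i deletions
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            current.append(min(
                previous[j] + 1,         # delete c1
                current[j - 1] + 1,      # insert c2
                previous[j - 1] + cost,  # substitute c1 with c2 (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein_sketch("berne", "born"))
```

Keeping only two rows of the table at a time is a common memory-saving choice; a full matrix would give the same result.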
Damerau-Levenshtein Distance
----------------------------
### Damerau-Levenshtein Distance
``` python
def damerau_levenshtein_distance(s1: str, s2: str)
@ -34,8 +35,7 @@ though ``damerau_levenshtein_distance('fish', 'ifsh') == 1`` as this counts as a
See the [Damerau-Levenshtein distance article at Wikipedia](http://en.wikipedia.org/wiki/Damerau-Levenshtein_distance) for more details.
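To show how counting a transposition as a single edit changes the result, the sketch below implements the simpler "optimal string alignment" variant in pure Python. This is an assumption made for brevity: it is not Jellyfish's implementation, the name `osa_distance_sketch` is hypothetical, and this restricted variant can differ from the unrestricted Damerau-Levenshtein distance on some inputs (for example `'ca'` versus `'abc'`).

```python
def osa_distance_sketch(s1: str, s2: str) -> int:
    """Optimal string alignment distance; hypothetical helper, not part of Jellyfish."""
    rows, cols = len(s1) + 1, len(s2) + 1
    # d[i][j] is the distance between s1[:i] and s2[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            # Two adjacent characters swapped: count the swap as one edit.
            if (i > 1 and j > 1
                    and s1[i - 1] == s2[j - 2]
                    and s1[i - 2] == s2[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance_sketch("fish", "ifsh"))
```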
Hamming Distance
----------------
### Hamming Distance
``` python
def hamming_distance(s1: str, s2: str)
@ -50,8 +50,7 @@ considers extra characters as differing. For example ``hamming_distance('abc',
See the [Hamming distance article at Wikipedia](http://en.wikipedia.org/wiki/Hamming_distance) for more details.
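The behaviour described above, where extra characters count as differing, can be sketched in a few lines of pure Python. The helper name `hamming_sketch` is hypothetical and the code is an illustration under that assumption, not Jellyfish's implementation.

```python
from itertools import zip_longest


def hamming_sketch(s1: str, s2: str) -> int:
    """Count differing positions; hypothetical helper, not part of Jellyfish."""
    # zip_longest pads the shorter string with None, which never equals a
    # character, so every extra character counts as one difference.
    return sum(1 for a, b in zip_longest(s1, s2) if a != b)


print(hamming_sketch("karolin", "kathrin"))
print(hamming_sketch("abc", "abcde"))
```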
Jaro Similarity
----------------
### Jaro Similarity
``` python
def jaro_similarity(s1: str, s2: str)
@ -66,8 +65,7 @@ Jaro distance is a string-edit distance that gives a floating point response in
Prior to 0.8.1 this function was named jaro_distance. That name is still available, but is no longer recommended.
It will be replaced in 1.0 with a correct version.
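For readers who want to see how a Jaro score is assembled, the following pure-Python sketch follows the textbook definition (matches within a sliding window, then half the number of out-of-order matches as transpositions). The name `jaro_sketch` is hypothetical, returning 0.0 for empty input is a choice made here, and Jellyfish's own results may differ in edge cases.

```python
def jaro_sketch(s1: str, s2: str) -> float:
    """Textbook Jaro similarity; hypothetical helper, not part of Jellyfish."""
    if not s1 or not s2:
        return 0.0  # assumption: empty input is treated as no similarity
    # Characters match only if equal and no further apart than this window.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        start = max(0, i - window)
        stop = min(len(s2), i + window + 1)
        for j in range(start, stop):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Compare matched characters in order; each out-of-order pair is half a transposition.
    m1 = [c for c, flag in zip(s1, matched1) if flag]
    m2 = [c for c, flag in zip(s2, matched2) if flag]
    transpositions = sum(a != b for a, b in zip(m1, m2)) // 2
    return (matches / len(s1)
            + matches / len(s2)
            + (matches - transpositions) / matches) / 3


print(round(jaro_sketch("MARTHA", "MARHTA"), 4))
```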
Jaro-Winkler Similarity
-----------------------
### Jaro-Winkler Similarity
``` python
def jaro_winkler_similarity(s1: str, s2: str)
@ -84,8 +82,7 @@ Jaro-Winkler is a modification/improvement to Jaro distance, like Jaro it gives
See the [Jaro-Winkler distance article at Wikipedia](http://en.wikipedia.org/wiki/Jaro-Winkler_distance) for more details.
Match Rating Approach (comparison)
----------------------------------
### Match Rating Approach (comparison)
``` python
def match_rating_comparison(s1, s2)
@ -97,3 +94,85 @@ The Match rating approach algorithm is an algorithm for determining whether or n
pronounced similarly. Strings are first encoded using `match_rating_codex`, then compared according to the MRA algorithm.
See the [Match Rating Approach article at Wikipedia](http://en.wikipedia.org/wiki/Match_rating_approach) for more details.
## Phonetic Encoding
These algorithms convert a word to a normalized phonetic encoding, a representation of its pronunciation. Each takes a single string and returns a coded representation.
### American Soundex
``` python
def soundex(s: str)
```
Calculate the American Soundex of the string s.
Soundex is an algorithm to convert a word (typically a name) to a four-character code in the form
'A123' where 'A' is the first letter of the name and the three digits represent similar sounds.
For example ``soundex('Ann') == soundex('Anne') == 'A500'`` and
``soundex('Rupert') == soundex('Robert') == 'R163'``.
See the [Soundex article at Wikipedia](http://en.wikipedia.org/wiki/Soundex) for more details.
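The encoding described above can be sketched in pure Python as follows. This is a simplified illustration of the commonly described American Soundex rules, with a hypothetical name `soundex_sketch`; it is not Jellyfish's implementation and may differ from it on unusual input.

```python
# Letter groups that share a digit; vowels, H, W, and Y have no code.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex_sketch(s: str) -> str:
    """Simplified American Soundex; hypothetical helper, not part of Jellyfish."""
    letters = [c for c in s.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]                     # the first letter is kept as-is
    last_code = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue                        # H and W do not separate equal codes
        code = _CODES.get(ch, "")
        if code and code != last_code:
            result += code                  # skip repeats of the same code
        last_code = code                    # a vowel resets the repeat check
    return (result + "000")[:4]             # pad or truncate to four characters


print(soundex_sketch("Robert"), soundex_sketch("Rupert"))
```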
### Metaphone
``` python
def metaphone(s: str)
```
Calculate the metaphone code for the string s.
The metaphone algorithm was designed as an improvement on Soundex. It transforms a word into a
string drawn from the characters '0BFHJKLMNPRSTWXY', where '0' is pronounced 'th' and 'X' is a '[sc]h' sound.
For example ``metaphone('Klumpz') == metaphone('Clumps') == 'KLMPS'``.
See the [Metaphone article at Wikipedia](http://en.wikipedia.org/wiki/Metaphone) for more details.
### NYSIIS
``` python
def nysiis(s: str)
```
Calculate the NYSIIS code for the string s.
The NYSIIS algorithm was developed by the New York State Identification and Intelligence System. It transforms a word into a phonetic code. Like Soundex and Metaphone, it is primarily intended for use on names (as they would be pronounced in English).
For example ``nysiis('John') == nysiis('Jan') == 'JAN'``.
See the [NYSIIS article at Wikipedia](http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System) for more details.
### Match Rating Approach (codex)
``` python
def match_rating_codex(s: str)
```
Calculate the match rating approach value (also called PNI) for the string s.
The Match Rating Approach is an algorithm for determining whether or not two names are
pronounced similarly. It consists of an encoding function (similar to soundex or nysiis),
which is implemented here, and `match_rating_comparison`, which does the actual comparison.
See the [Match Rating Approach article at Wikipedia](http://en.wikipedia.org/wiki/Match_rating_approach) for more details.
## Stemming
### Porter Stemmer
``` python
def porter_stem(s: str)
```
Reduce the string s to its stem using the common Porter stemmer.
Stemming is the process of reducing a word to its root form, for example 'stemmed' to 'stem'.
Martin Porter's algorithm is widely used for stemming and works for many purposes.
See the [official homepage for the Porter Stemming Algorithm](http://tartarus.org/martin/PorterStemmer/) for more details.


@ -1,70 +0,0 @@
Phonetic Encoding
=================
These algorithms convert a string to a normalized phonetic encoding, converting a word to a representation of its pronunciation. Each takes a single string and returns a coded representation.
American Soundex
----------------
``` python
def soundex(s: str)
```
Calculate the American Soundex of the string s.
Soundex is an algorithm to convert a word (typically a name) to a four digit code in the form
'A123' where 'A' is the first letter of the name and the digits represent similar sounds.
For example ``soundex('Ann') == soundex('Anne') == 'A500'`` and
``soundex('Rupert') == soundex('Robert') == 'R163'``.
See the [Soundex article at Wikipedia](http://en.wikipedia.org/wiki/Soundex) for more details.
Metaphone
---------
``` python
def metaphone(s: str)
```
Calculate the metaphone code for the string s.
The metaphone algorithm was designed as an improvement on Soundex. It transforms a word into a
string consisting of '0BFHJKLMNPRSTWXY' where '0' is pronounced 'th' and 'X' is a '[sc]h' sound.
For example ``metaphone('Klumpz') == metaphone('Clumps') == 'KLMPS'``.
See the [Metaphone article at Wikipedia](http://en.wikipedia.org/wiki/Metaphone) for more details.
NYSIIS
------
``` python
def nysiis(s: str)
```
Calculate the NYSIIS code for the string s.
The NYSIIS algorithm is an algorithm developed by the New York State Identification and Intelligence System. It transforms a word into a phonetic code. Like soundex and metaphone it is primarily intended for use on names (as they would be pronounced in English).
For example ``nysiis('John') == nysiis('Jan') == 'JAN'``.
See the [NYSIIS article at Wikipedia](http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System) for more details.
Match Rating Approach (codex)
-----------------------------
``` python
def match_rating_codex(s: str)
```
Calculate the match rating approach value (also called PNI) for the string s.
The Match rating approach algorithm is an algorithm for determining whether or not two names are
pronounced similarly. The algorithm consists of an encoding function (similar to soundex or nysiis)
which is implemented here as well as :py:func:`match_rating_comparison` which does the actual comparison.
See the [Match Rating Approach article at Wikipedia](http://en.wikipedia.org/wiki/Match_rating_approach) for more details.


@ -1,17 +0,0 @@
Stemming
========
Porter Stemmer
--------------
``` python
def porter_stem(s: str)
```
Reduce the string s to its stem using the common Porter stemmer.
Stemming is the process of reducing a word to its root form, for example 'stemmed' to 'stem'.
Martin Porter's algorithm is a common algorithm used for stemming that works for many purposes.
See the [official homepage for the Porter Stemming Algorithm](http://tartarus.org/martin/PorterStemmer/) for more details.


@ -46,7 +46,5 @@ extra_css:
- assets/extra.css
nav:
- 'index.md'
- 'phonetic.md'
- 'comparison.md'
- 'stemming.md'
- 'functions.md'
- 'changelog.md'