Ines Montani
|
012f4820cb
|
Keep infixes of punctuation + hyphens as one token (see #801)
|
2017-02-02 16:22:40 +01:00 |
Ines Montani
|
1219a5f513
|
Add = to tokenizer prefixes
|
2017-02-02 16:21:11 +01:00 |
Ines Montani
|
116c675c3c
|
Merge pull request #742 from oroszgy/hu_tokenizer_fix
Improved Hungarian tokenizer
|
2017-01-14 23:52:44 +01:00 |
Gyorgy Orosz
|
63037e79af
|
Fixed hyphen handling in the Hungarian tokenizer.
|
2017-01-14 16:30:11 +01:00 |
Gyorgy Orosz
|
1be5da1ac6
|
Fixed Hungarian tokenizer for numbers
|
2017-01-14 15:51:59 +01:00 |
Ines Montani
|
0894b8c0ef
|
Don't split tokens with digits and "/" infixes (resolves #740)
|
2017-01-12 22:58:26 +01:00 |
Ines Montani
|
eef94e3ee2
|
Split off period after two or more uppercase letters (fixes #483)
|
2017-01-08 22:28:25 +01:00 |
Ines Montani
|
347c4a2d06
|
Reorganise and reformat global tokenizer prefixes, suffixes and infixes
|
2017-01-08 20:37:39 +01:00 |
Ines Montani
|
eaa3b1319d
|
Fix formatting
|
2016-12-18 15:36:53 +01:00 |
Ines Montani
|
e47ee94761
|
Split punctuation into its own file
|
2016-12-08 19:46:43 +01:00 |