2341 Commits

Author SHA1 Message Date
ines 654fe447b1 Add Swedish tokenizer tests (see #807) 2017-02-05 11:47:07 +01:00
ines 6715615d55 Add missing EXC variable and combine tokenizer exceptions 2017-02-05 11:42:52 +01:00
Ines Montani 30a52d576b Merge pull request #807 from magnusburton/master. Added swedish lemma rules and more verb contractions 2017-02-05 11:34:19 +01:00
Magnus Burton 19c0ce745a Added swedish lemma rules 2017-02-04 17:53:32 +01:00
Michael Wallin d25556bf80 [issue 805] Fix issue 2017-02-04 16:22:21 +02:00
Michael Wallin 35100c8bdd [issue 805] Add regression test and the required fixture 2017-02-04 16:21:34 +02:00
ines 0ab353b0ca Add line breaks to Finnish stop words for better readability 2017-02-04 13:40:25 +01:00
Michael Wallin 1a1952afa5 [finnish] Add initial tests for tokenizer 2017-02-04 13:54:10 +02:00
Michael Wallin f9bb25d1cf [finnish] Reformat and correct stop words 2017-02-04 13:54:10 +02:00
Michael Wallin 73f66ec570 Add preliminary support for Finnish 2017-02-04 13:54:10 +02:00
Ines Montani 65d6202107 Merge pull request #802 from Tpt/fr-tokenizer. Adds more French tokenizer exceptions 2017-02-03 10:52:20 +01:00
Tpt 75a74857bb Adds more French tokenizer exceptions 2017-02-03 13:45:18 +04:00
Ines Montani afc6365388 Update regression test for #801 to match current expected behaviour 2017-02-02 16:23:05 +01:00
Ines Montani 012f4820cb Keep infixes of punctuation + hyphens as one token (see #801) 2017-02-02 16:22:40 +01:00
Ines Montani 1219a5f513 Add = to tokenizer prefixes 2017-02-02 16:21:11 +01:00
Ines Montani ff04748eb6 Add missing emoticon 2017-02-02 16:21:00 +01:00
Ines Montani 13a4ab37e0 Add regression test for #801 2017-02-02 15:33:52 +01:00
Matvey Ezhov 32a22291bc Small `Doc.count_by` documentation update. Current example doesn't work 2017-01-31 19:18:45 +03:00
Ines Montani e4875834fe Fix formatting 2017-01-31 15:19:33 +01:00
Ines Montani c304834e45 Add missing import 2017-01-31 15:18:30 +01:00
Ines Montani e6465b9ca3 Parametrize test cases and mark as xfail 2017-01-31 15:14:42 +01:00
latkins e4c84321a5 Added regression test for Issue #792. 2017-01-31 13:47:42 +00:00
Matthew Honnibal 6c665b81df Fix redundant == TAG in from_array conditional 2017-01-31 00:46:21 +11:00
Ines Montani 19501f3340 Add regression test for #775 2017-01-25 13:16:52 +01:00
Ines Montani 209c37bbcf Exclude "shell" and "Shell" from English tokenizer exceptions (resolves #775) 2017-01-25 13:15:02 +01:00
Raphaël Bournhonesque 1be9c0e724 Add fr tokenization unit tests 2017-01-24 10:57:37 +01:00
Raphaël Bournhonesque 1faaf698ca Add infixes and abbreviation exceptions (fr) 2017-01-24 10:57:37 +01:00
Raphaël Bournhonesque cf8474401b Remove unused import statement 2017-01-24 10:57:37 +01:00
Raphaël Bournhonesque 902f136f18 Add support for elision in French 2017-01-24 10:57:37 +01:00
Ines Montani 55c9c62abc Use relative import 2017-01-23 21:27:49 +01:00
Ines Montani 0967eb07be Add regression test for #768 2017-01-23 21:25:46 +01:00
Ines Montani 6baa98f774 Merge pull request #769 from raphael0202/spacy-768. Allow zero-width 'infix' token 2017-01-23 21:24:33 +01:00
Raphaël Bournhonesque dce8f5515e Allow zero-width 'infix' token 2017-01-23 18:28:01 +01:00
Ines Montani 5f6f48e734 Add regression test for #759 2017-01-20 15:11:48 +01:00
Ines Montani 09ecc39b4e Fix multi-line string of NUM_WORDS (resolves #759) 2017-01-20 15:11:48 +01:00
Magnus Burton 69eab727d7 Added loops to handle contractions with verbs 2017-01-19 14:08:52 +01:00
Matthew Honnibal be26085277 Fix missing import. Closes #755 2017-01-19 22:03:52 +11:00
Ines Montani 7e36568d5b Fix title to accommodate sputnik 2017-01-17 00:51:09 +01:00
Ines Montani d704cfa60d Fix typo 2017-01-16 21:30:33 +01:00
Ines Montani 64e142f460 Update about.py 2017-01-16 14:23:08 +01:00
Matthew Honnibal e889cd698e Increment version 2017-01-16 14:01:35 +01:00
Matthew Honnibal e7f8e13cf3 Make Token hashable. Fixes #743 2017-01-16 13:27:57 +01:00
Matthew Honnibal 2c60d0cb1e Test #743: Tokens unhashable. 2017-01-16 13:27:26 +01:00
Matthew Honnibal 48c712f1c1 Merge branch 'master' of ssh://github.com/explosion/spaCy 2017-01-16 13:18:06 +01:00
Matthew Honnibal 7ccf490c73 Increment version 2017-01-16 13:17:58 +01:00
Ines Montani 50878ef598 Exclude "were" and "Were" from tokenizer exceptions and add regression test (resolves #744) 2017-01-16 13:10:38 +01:00
Ines Montani e053c7693b Fix formatting 2017-01-16 13:09:52 +01:00
Ines Montani 116c675c3c Merge pull request #742 from oroszgy/hu_tokenizer_fix. Improved Hungarian tokenizer 2017-01-14 23:52:44 +01:00
Gyorgy Orosz 92345b6a41 Further numeric test. 2017-01-14 22:44:19 +01:00
Gyorgy Orosz b4df202bfa Better error handling 2017-01-14 22:24:58 +01:00