ines | 654fe447b1 | Add Swedish tokenizer tests (see #807) | 2017-02-05 11:47:07 +01:00
ines | 6715615d55 | Add missing EXC variable and combine tokenizer exceptions | 2017-02-05 11:42:52 +01:00
Ines Montani | 30a52d576b | Merge pull request #807 from magnusburton/master | 2017-02-05 11:34:19 +01:00
  Added swedish lemma rules and more verb contractions
Magnus Burton | 19c0ce745a | Added swedish lemma rules | 2017-02-04 17:53:32 +01:00
Michael Wallin | d25556bf80 | [issue 805] Fix issue | 2017-02-04 16:22:21 +02:00
Michael Wallin | 35100c8bdd | [issue 805] Add regression test and the required fixture | 2017-02-04 16:21:34 +02:00
ines | 0ab353b0ca | Add line breaks to Finnish stop words for better readability | 2017-02-04 13:40:25 +01:00
Michael Wallin | 1a1952afa5 | [finnish] Add initial tests for tokenizer | 2017-02-04 13:54:10 +02:00
Michael Wallin | f9bb25d1cf | [finnish] Reformat and correct stop words | 2017-02-04 13:54:10 +02:00
Michael Wallin | 73f66ec570 | Add preliminary support for Finnish | 2017-02-04 13:54:10 +02:00
Ines Montani | 65d6202107 | Merge pull request #802 from Tpt/fr-tokenizer | 2017-02-03 10:52:20 +01:00
  Adds more French tokenizer exceptions
Tpt | 75a74857bb | Adds more French tokenizer exceptions | 2017-02-03 13:45:18 +04:00
Ines Montani | afc6365388 | Update regression test for #801 to match current expected behaviour | 2017-02-02 16:23:05 +01:00
Ines Montani | 012f4820cb | Keep infixes of punctuation + hyphens as one token (see #801) | 2017-02-02 16:22:40 +01:00
Ines Montani | 1219a5f513 | Add = to tokenizer prefixes | 2017-02-02 16:21:11 +01:00
Ines Montani | ff04748eb6 | Add missing emoticon | 2017-02-02 16:21:00 +01:00
Ines Montani | 13a4ab37e0 | Add regression test for #801 | 2017-02-02 15:33:52 +01:00
Matvey Ezhov | 32a22291bc | Small `Doc.count_by` documentation update | 2017-01-31 19:18:45 +03:00
  Current example doesn't work
Ines Montani | e4875834fe | Fix formatting | 2017-01-31 15:19:33 +01:00
Ines Montani | c304834e45 | Add missing import | 2017-01-31 15:18:30 +01:00
Ines Montani | e6465b9ca3 | Parametrize test cases and mark as xfail | 2017-01-31 15:14:42 +01:00
latkins | e4c84321a5 | Added regression test for Issue #792. | 2017-01-31 13:47:42 +00:00
Matthew Honnibal | 6c665b81df | Fix redundant == TAG in from_array conditional | 2017-01-31 00:46:21 +11:00
Ines Montani | 19501f3340 | Add regression test for #775 | 2017-01-25 13:16:52 +01:00
Ines Montani | 209c37bbcf | Exclude "shell" and "Shell" from English tokenizer exceptions (resolves #775) | 2017-01-25 13:15:02 +01:00
Raphaël Bournhonesque | 1be9c0e724 | Add fr tokenization unit tests | 2017-01-24 10:57:37 +01:00
Raphaël Bournhonesque | 1faaf698ca | Add infixes and abbreviation exceptions (fr) | 2017-01-24 10:57:37 +01:00
Raphaël Bournhonesque | cf8474401b | Remove unused import statement | 2017-01-24 10:57:37 +01:00
Raphaël Bournhonesque | 902f136f18 | Add support for elision in French | 2017-01-24 10:57:37 +01:00
Ines Montani | 55c9c62abc | Use relative import | 2017-01-23 21:27:49 +01:00
Ines Montani | 0967eb07be | Add regression test for #768 | 2017-01-23 21:25:46 +01:00
Ines Montani | 6baa98f774 | Merge pull request #769 from raphael0202/spacy-768 | 2017-01-23 21:24:33 +01:00
  Allow zero-width 'infix' token
Raphaël Bournhonesque | dce8f5515e | Allow zero-width 'infix' token | 2017-01-23 18:28:01 +01:00
Ines Montani | 5f6f48e734 | Add regression test for #759 | 2017-01-20 15:11:48 +01:00
Ines Montani | 09ecc39b4e | Fix multi-line string of NUM_WORDS (resolves #759) | 2017-01-20 15:11:48 +01:00
Magnus Burton | 69eab727d7 | Added loops to handle contractions with verbs | 2017-01-19 14:08:52 +01:00
Matthew Honnibal | be26085277 | Fix missing import | 2017-01-19 22:03:52 +11:00
  Closes #755
Ines Montani | 7e36568d5b | Fix title to accommodate sputnik | 2017-01-17 00:51:09 +01:00
Ines Montani | d704cfa60d | Fix typo | 2017-01-16 21:30:33 +01:00
Ines Montani | 64e142f460 | Update about.py | 2017-01-16 14:23:08 +01:00
Matthew Honnibal | e889cd698e | Increment version | 2017-01-16 14:01:35 +01:00
Matthew Honnibal | e7f8e13cf3 | Make Token hashable. Fixes #743 | 2017-01-16 13:27:57 +01:00
Matthew Honnibal | 2c60d0cb1e | Test #743: Tokens unhashable. | 2017-01-16 13:27:26 +01:00
Matthew Honnibal | 48c712f1c1 | Merge branch 'master' of ssh://github.com/explosion/spaCy | 2017-01-16 13:18:06 +01:00
Matthew Honnibal | 7ccf490c73 | Increment version | 2017-01-16 13:17:58 +01:00
Ines Montani | 50878ef598 | Exclude "were" and "Were" from tokenizer exceptions and add regression test (resolves #744) | 2017-01-16 13:10:38 +01:00
Ines Montani | e053c7693b | Fix formatting | 2017-01-16 13:09:52 +01:00
Ines Montani | 116c675c3c | Merge pull request #742 from oroszgy/hu_tokenizer_fix | 2017-01-14 23:52:44 +01:00
  Improved Hungarian tokenizer
Gyorgy Orosz | 92345b6a41 | Further numeric test. | 2017-01-14 22:44:19 +01:00
Gyorgy Orosz | b4df202bfa | Better error handling | 2017-01-14 22:24:58 +01:00