Ines Montani
|
da10a049a6
|
Add unicode declarations
|
2017-01-05 13:09:48 +01:00 |
Ines Montani
|
58adae8774
|
Remove unused file
|
2017-01-05 13:09:22 +01:00 |
Ines Montani
|
c6e5a5349d
|
Move regression test for #360 into own file
|
2017-01-04 00:49:31 +01:00 |
Ines Montani
|
8279993a6f
|
Modernize and merge tokenizer tests for punctuation
|
2017-01-04 00:49:20 +01:00 |
Ines Montani
|
550630df73
|
Update tokenizer tests for contractions
|
2017-01-04 00:48:42 +01:00 |
Ines Montani
|
109f202e8f
|
Update conftest fixture
|
2017-01-04 00:48:21 +01:00 |
Ines Montani
|
ee6b49b293
|
Modernize tokenizer tests for emoticons
|
2017-01-04 00:47:59 +01:00 |
Ines Montani
|
f09b5a5dfd
|
Modernize tokenizer tests for infixes
|
2017-01-04 00:47:42 +01:00 |
Ines Montani
|
59059fed27
|
Move regression test for #351 to own file
|
2017-01-04 00:47:11 +01:00 |
Ines Montani
|
667051375d
|
Modernize tokenizer tests for whitespace
|
2017-01-04 00:46:35 +01:00 |
Ines Montani
|
aafc894285
|
Modernize tokenizer tests for contractions
Use @pytest.mark.parametrize.
|
2017-01-03 23:02:21 +01:00 |
Ines Montani
|
1d237664af
|
Add lowercase lemma to tokenizer exceptions
|
2017-01-03 23:02:21 +01:00 |
Ines Montani
|
84a87951eb
|
Fix typos
|
2017-01-03 18:27:43 +01:00 |
Ines Montani
|
35b39f53c3
|
Reorganise English tokenizer exceptions (as discussed in #718)
Add logic to generate exceptions that follow a consistent pattern (like
verbs and pronouns) and allow certain tokens to be excluded explicitly.
|
2017-01-03 18:26:09 +01:00 |
Ines Montani
|
fb9d3bb022
|
Revert "Merge remote-tracking branch 'origin/master'"
This reverts commit d3b181cdf1, reversing
changes made to b19cfcc144.
|
2017-01-03 18:21:36 +01:00 |
Ines Montani
|
461cbb99d8
|
Revert "Reorganise English tokenizer exceptions (as discussed in #718)"
This reverts commit b19cfcc144.
|
2017-01-03 18:21:29 +01:00 |
Ines Montani
|
d3b181cdf1
|
Merge remote-tracking branch 'origin/master'
# Conflicts:
# spacy/en/tokenizer_exceptions.py
|
2017-01-03 18:20:01 +01:00 |
Ines Montani
|
b19cfcc144
|
Reorganise English tokenizer exceptions (as discussed in #718)
Add logic to generate exceptions that follow a consistent pattern (like
verbs and pronouns) and allow certain tokens to be excluded explicitly.
|
2017-01-03 18:17:57 +01:00 |
Ines Montani
|
1bd53bbf89
|
Fix typos (resolves #718)
|
2017-01-03 11:26:21 +01:00 |
Matthew Honnibal
|
fde53be3b4
|
Move whole token match inside _split_affixes.
|
2016-12-30 17:11:50 -06:00 |
Matthew Honnibal
|
3ba7c167a8
|
Fix URL tests
|
2016-12-30 17:10:08 -06:00 |
Matthew Honnibal
|
9936a1b9b5
|
Merge branch 'tokenization_w_exception_patterns' of https://github.com/oroszgy/spaCy.hu into oroszgy-tokenization_w_exception_patterns
|
2016-12-30 14:53:40 -06:00 |
Matthew Honnibal
|
3e8d9c772e
|
Test interaction of token_match and punctuation
Check that the new token_match function applies after punctuation is split off.
|
2016-12-31 00:52:17 +11:00 |
Matthew Honnibal
|
623d94e14f
|
Whitespace
|
2016-12-31 00:30:28 +11:00 |
Petter Hohle
|
f112e7754e
|
Add PART to tag map
16 of the 17 PoS tags in the UD tag set are added; PART is missing.
|
2016-12-28 18:39:01 +01:00 |
Matthew Honnibal
|
f62db78dc3
|
Increment version
|
2016-12-27 21:11:22 +01:00 |
Matthew Honnibal
|
cade536d1e
|
Merge branch 'master' of ssh://github.com/explosion/spaCy
|
2016-12-27 21:04:10 +01:00 |
Matthew Honnibal
|
ce4539dafd
|
Allow the vocabulary to grow to 10,000, to prevent cold-start problem.
|
2016-12-27 21:03:45 +01:00 |
Ines Montani
|
ad3669cef5
|
Merge pull request #703 from magnusburton/master
Added Swedish abbreviations
|
2016-12-27 01:01:49 +01:00 |
Ines Montani
|
78f754dd9a
|
Merge pull request #705 from oroszgy/hu_tokenizer
Initial support for Hungarian
|
2016-12-27 00:48:13 +01:00 |
Ines Montani
|
8785706039
|
Reformat stop words for better readability
|
2016-12-24 00:58:40 +01:00 |
Gyorgy Orosz
|
45e045a87b
|
Unicode/UTF8 compatibility for Python2
|
2016-12-24 00:21:00 +01:00 |
Gyorgy Orosz
|
72b61b6d03
|
Typo fix.
|
2016-12-24 00:10:29 +01:00 |
Gyorgy Orosz
|
3a9be4d485
|
Updated token exception handling mechanism to allow the usage of arbitrary functions as token exception matchers.
|
2016-12-23 23:49:34 +01:00 |
Ines Montani
|
1436b9f15a
|
Fix formatting and consistency
|
2016-12-23 21:36:01 +01:00 |
Ines Montani
|
1d64527727
|
Update Spanish tokenizer
Remove reflexive pronouns as they're part of an open class, fix
mistakes and add exceptions
|
2016-12-23 21:36:01 +01:00 |
Ines Montani
|
7f411fd01c
|
Remove exceptions containing whitespace / no special chars
|
2016-12-23 14:30:06 +01:00 |
Magnus Burton
|
fdf4776262
|
Added Swedish abbreviations
|
2016-12-22 22:45:18 +01:00 |
Gyorgy Orosz
|
d9c59c4751
|
Maintaining backward compatibility.
|
2016-12-21 23:30:49 +01:00 |
Gyorgy Orosz
|
1748549aeb
|
Added exception pattern mechanism to the tokenizer.
|
2016-12-21 23:16:19 +01:00 |
Gyorgy Orosz
|
35aa54765d
|
Hungarian module is exposed in spacy.
|
2016-12-21 20:45:36 +01:00 |
Gyorgy Orosz
|
ab2f6ea46c
|
Removed data files from tests.
|
2016-12-21 20:22:09 +01:00 |
Ines Montani
|
3c87c71d43
|
Add tokenizer exceptions for a.m. and p.m. in Spanish
|
2016-12-21 18:19:10 +01:00 |
Ines Montani
|
78e63dc7d0
|
Update tokenizer exceptions for English
|
2016-12-21 18:06:34 +01:00 |
Ines Montani
|
702d1eed93
|
Update tokenizer exceptions for German
|
2016-12-21 18:06:27 +01:00 |
Ines Montani
|
d60380418e
|
Update tokenizer exceptions for Spanish
|
2016-12-21 18:06:17 +01:00 |
Ines Montani
|
920fa0fed2
|
Add DET_LEMMA constant
|
2016-12-21 18:05:41 +01:00 |
Ines Montani
|
8978806ea6
|
Allow Vocab to load without serializer_freqs
|
2016-12-21 18:05:23 +01:00 |
Ines Montani
|
be8ed811f6
|
Remove trailing whitespace
|
2016-12-21 18:04:41 +01:00 |
Ines Montani
|
926e19184a
|
Merge pull request #695 from magnusburton/master
Added Swedish morph rules
|
2016-12-21 01:06:00 +01:00 |