Jim Geovedi
|
c97f5ae0bb
|
updated tokenizer exceptions
|
2017-07-26 19:12:52 +07:00 |
Jim Geovedi
|
73f6ac9d9b
|
added hyphen
|
2017-07-24 15:56:31 +07:00 |
Jim Geovedi
|
68454c40bf
|
added missing import
|
2017-07-24 14:12:34 +07:00 |
Jim Geovedi
|
eaf9cbd708
|
curse of copy & paste
|
2017-07-24 14:11:51 +07:00 |
Jim Geovedi
|
7aad6718bc
|
enable tokenizer exceptions
|
2017-07-24 14:11:10 +07:00 |
Jim Geovedi
|
ad56c9179a
|
added tokenizer exceptions list
|
2017-07-24 14:10:16 +07:00 |
Jim Geovedi
|
c1f3fe99fe
|
updated punctuation rules
|
2017-07-24 13:57:21 +07:00 |
Jim Geovedi
|
37fa2c8c80
|
punctuation rules
|
2017-07-24 06:17:18 +07:00 |
Jim Geovedi
|
082e94ac1c
|
added infix rules
|
2017-07-24 06:17:07 +07:00 |
Jim Geovedi
|
d0ec484725
|
reverted
|
2017-07-24 06:16:29 +07:00 |
Jim Geovedi
|
0e590c711f
|
added prefix & suffix rules
|
2017-07-23 23:46:40 +07:00 |
Jim Geovedi
|
ba922e30e8
|
added ampere hour unit
|
2017-07-23 23:46:18 +07:00 |
Jim Geovedi
|
3b17eba27b
|
added frequency units
|
2017-07-23 23:10:52 +07:00 |
Jim Geovedi
|
d5fd32a572
|
added known currencies
|
2017-07-23 22:56:48 +07:00 |
Jim Geovedi
|
f6f15678fb
|
added lex_attrs
|
2017-07-23 22:55:22 +07:00 |
Jim Geovedi
|
bed8162d00
|
added tokenizer_exceptions
|
2017-07-23 22:55:05 +07:00 |
Jim Geovedi
|
b80c35bc9a
|
added norm_exceptions
|
2017-07-23 22:54:49 +07:00 |
Jim Geovedi
|
b5de329ea3
|
added norm_exceptions
|
2017-07-23 22:54:19 +07:00 |
Jim Geovedi
|
082e9ade46
|
fixed typo
|
2017-07-23 21:30:34 +07:00 |
Jim Geovedi
|
e2efeb186e
|
added stopwords
|
2017-07-23 20:52:37 +07:00 |
Jim Geovedi
|
da98676839
|
use template
|
2017-07-23 20:51:31 +07:00 |
Jim Geovedi
|
c2b4dd7809
|
start working on Indonesian language
|
2017-07-23 20:50:56 +07:00 |
Ines Montani
|
c91642efd5
|
Port over changes from #1168
|
2017-07-01 11:43:54 +02:00 |
Jim Regan
|
d81ceb0cd5
|
Merge branch 'develop' into polish
|
2017-06-26 22:42:27 +01:00 |
Jim O'Regan
|
2f84c73585
|
a start
|
2017-06-26 22:40:04 +01:00 |
Jim O'Regan
|
28d7f0a672
|
reference
|
2017-06-26 22:38:28 +01:00 |
Matthew Honnibal
|
91e52543ef
|
Merge pull request #1118 from Gregory-Howard/patch-2
Update _tokenizer_exceptions_list (adding cities)
|
2017-06-20 11:16:07 +02:00 |
Tpt
|
7745b3ae04
|
Adds noun chunks to French syntax iterators
|
2017-06-12 15:29:58 +02:00 |
Grégory Howard
|
cd974b32b7
|
Update _tokenizer_exceptions_list (adding cities)
|
2017-06-09 17:58:18 +02:00 |
Matthew Honnibal
|
55d0621532
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2017-06-04 15:53:25 -05:00 |
Matthew Honnibal
|
e28f90b672
|
Fix syntax iterators
|
2017-06-04 15:51:50 -05:00 |
Ines Montani
|
112c5787eb
|
Merge pull request #1101 from oroszgy/hu_tokenizer_fix
More robust Hungarian tokenizer.
|
2017-06-04 22:37:51 +02:00 |
ines
|
9254a3dd78
|
Import and add Spanish syntax iterators
|
2017-06-04 21:42:15 +02:00 |
Matthew Honnibal
|
7ca215bc26
|
Resolve lex_attr_getters conflict
|
2017-06-03 16:12:01 -05:00 |
ines
|
4c643d74c5
|
Add norm exceptions to other Language classes
|
2017-06-03 22:29:21 +02:00 |
ines
|
fa7e576c57
|
Change order of exception dicts
|
2017-06-03 21:52:06 +02:00 |
Matthew Honnibal
|
3f5c85d8de
|
Reorder setting of lex attrs, to avoid clobbering
|
2017-06-03 14:47:55 -05:00 |
Matthew Honnibal
|
aeb7520133
|
Make norm use lower-case
|
2017-06-03 14:47:38 -05:00 |
Matthew Honnibal
|
de3954843e
|
Populate norm exceptions with lower-case
|
2017-06-03 14:47:12 -05:00 |
ines
|
e47eef5e03
|
Update German tokenizer exceptions and tests
|
2017-06-03 21:07:44 +02:00 |
ines
|
0d6fa8b241
|
Add German norm exceptions
|
2017-06-03 20:54:18 +02:00 |
ines
|
5bd311c77e
|
Fix update of norm exceptions
|
2017-06-03 20:54:09 +02:00 |
ines
|
746653880c
|
Add English norm exceptions to lex_attrs
|
2017-06-03 20:27:28 +02:00 |
ines
|
095eeeb12f
|
Update English tokenizer exceptions and add norms
|
2017-06-03 20:27:16 +02:00 |
ines
|
e5d426406a
|
Add base norm exceptions
|
2017-06-03 20:27:05 +02:00 |
ines
|
2f1025a94c
|
Port over Spanish changes from #1096
|
2017-06-02 19:09:58 +02:00 |
Gyorgy Orosz
|
f0c3b09242
|
More robust Hungarian tokenizer.
|
2017-05-31 22:28:40 +02:00 |
Gyorgy Orosz
|
8c0b4b850e
|
Fixed emoji handling for Hungarian
|
2017-05-30 21:34:46 +02:00 |
ines
|
84189c1cab
|
Add 'xx' language ID for multi-language support
Allows models to specify their language ID as 'xx'.
|
2017-05-28 00:58:59 +02:00 |
ines
|
33e332e67c
|
Remove unused export
|
2017-05-28 00:57:59 +02:00 |