4041 Commits

Author SHA1 Message Date
Matthew Honnibal 010a7309ff Merge pull request #1402 from explosion/feature/fix-matcher-operators
💫 Fix Matcher variable-length operators
2017-10-16 17:53:19 +02:00
Matthew Honnibal c29927d2e7 Fix matcher test 2017-10-16 17:22:18 +02:00
Matthew Honnibal a928ae2f35 Merge branch 'develop' into feature/fix-matcher-operators 2017-10-16 13:38:36 +02:00
Matthew Honnibal 56aa42cc5d Fix and document matcher operator 'shadowing' behaviour 2017-10-16 13:38:20 +02:00
Matthew Honnibal 748d525801 Add more matcher operator tests 2017-10-16 13:38:01 +02:00
Matthew Honnibal 0433181658 Document operator semantics in Matcher docstring 2017-10-16 12:06:33 +02:00
ines 9d6c8eaa49 Update base norm exceptions with more unicode characters
e.g. unicode variations of punctuation used in Chinese
2017-10-14 14:58:52 +02:00
ines 3516aa0cea Port over changes from #1389 2017-10-14 13:32:55 +02:00
ines cd6a29dce7 Port over changes from #1294 2017-10-14 13:28:46 +02:00
ines 38c756fd85 Port over changes from #1287 2017-10-14 13:16:21 +02:00
ines 612224c10d Port over changes from #1157 2017-10-14 13:11:39 +02:00
ines 9b3f8f9ec3 Fix formatting and add comment on languages 2017-10-14 13:11:18 +02:00
ines a4d974d97b Port over URL pattern changes from #1411 2017-10-14 12:58:07 +02:00
ines 09aed58140 Port over changes from #1333 and add comments 2017-10-14 12:52:59 +02:00
Matthew Honnibal cf6da9301a Update lemmatizer test 2017-10-12 22:50:52 +02:00
Matthew Honnibal 9b90d235d1 Fix tag check in lemmatizer 2017-10-12 22:50:43 +02:00
Matthew Honnibal dc01acd821 Escape encoding in validate function 2017-10-12 22:23:21 +02:00
Matthew Honnibal 27b927259a Add locale_escape compat function 2017-10-12 22:22:04 +02:00
ines 9c6de3dcfa Merge branch 'develop' into feature/cli-validate 2017-10-12 21:44:28 +02:00
Matthew Honnibal 462caf835a Fix SBD test 2017-10-12 21:18:22 +02:00
ines fff1028391 Add validate CLI command 2017-10-12 20:05:06 +02:00
Matthew Honnibal 908f44c3fe Disable history features by default 2017-10-12 14:56:11 +02:00
Matthew Honnibal a955843684 Increase default number of epochs 2017-10-12 13:13:01 +02:00
Matthew Honnibal cecfcc7711 Set default hyper params back to 'slow' settings 2017-10-12 13:12:26 +02:00
Ines Montani 37aa523a8e Merge pull request #1408 from explosion/feature/dot-underscore
💫 Custom attributes via Doc._, Token._ and Span._
2017-10-11 18:35:56 +02:00
ines 8ce6f96180 Don't make copies of language data components 2017-10-11 15:34:55 +02:00
ines 51519251c2 Fix underscore method test 2017-10-11 13:34:19 +02:00
ines c6ae49e8bf Fix formatting 2017-10-11 13:34:11 +02:00
ines 453c47ca24 Add German lemmatizer tests 2017-10-11 13:27:26 +02:00
ines 15fe0fd82d Fix tests 2017-10-11 13:27:18 +02:00
ines 6dd14dc342 Add lookup lemmas to tokens without POS tags 2017-10-11 13:27:10 +02:00
ines 9620c1a640 Add lemma_lookup to Language defaults 2017-10-11 13:26:05 +02:00
ines 9fd471372a Add lookup lemmatizer to lemmatizer as lookup() method 2017-10-11 13:25:51 +02:00
ines e0ff145a8b Merge branch 'develop' into feature/dot-underscore 2017-10-11 11:57:05 +02:00
ines c1d6d43c83 Merge branch 'develop' into feature/lemmatizer 2017-10-11 11:56:35 +02:00
Matthew Honnibal 17c467e0ab Avoid clobbering existing lemmas 2017-10-11 03:33:06 -05:00
Matthew Honnibal 807e109f2b Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-11 02:47:59 -05:00
Matthew Honnibal 6e552c9d83 Prune number of non-projective labels more aggressively 2017-10-11 02:46:44 -05:00
Matthew Honnibal 76fe24f44d Improve embedding defaults 2017-10-11 09:44:17 +02:00
Matthew Honnibal 188f620046 Improve parser defaults 2017-10-11 09:43:48 +02:00
Matthew Honnibal acba2e1051 Fix metadata in training 2017-10-11 08:55:52 +02:00
Matthew Honnibal 74c2c6a58c Add default name and lang to meta 2017-10-11 08:49:12 +02:00
Matthew Honnibal 3814a161e6 Avoid clobbering preset lemmas 2017-10-11 08:41:03 +02:00
Matthew Honnibal fd47f8e89f Fix failing test 2017-10-11 08:38:34 +02:00
Matthew Honnibal 462b2e26b4 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-11 08:23:04 +02:00
Matthew Honnibal a6ac4699eb Allow Morphology class to setup tokens
Add Morphology.assign_untagged() C-method, and call it from
Doc.push_back() when a token is created. This gives a place
to allow the Morphology class to initialize token data.
2017-10-11 03:24:14 +02:00
Matthew Honnibal 3b527fa52b Call morphology.assign_untagged when pushing token to Doc 2017-10-11 03:23:57 +02:00
Matthew Honnibal c15d8278cb Avoid lemmatizing inappropriate tags in English lemmatizer 2017-10-11 03:23:23 +02:00
Matthew Honnibal d528b6e36d Add assign_untagged method in Morphology 2017-10-11 03:22:49 +02:00
Matthew Honnibal 2c118ab3a6 Add tests for Doc creation 2017-10-11 03:21:23 +02:00