6711 Commits

Author SHA1 Message Date
ines a69f4e56e5 Remove outdated aside 2017-10-14 12:52:07 +02:00
ines bb6ecb82e5 Ensure long file paths in code examples break if needed 2017-10-14 12:51:52 +02:00
ines bfd9506f1d Update extensions docs and add resources 2017-10-13 00:18:13 +02:00
ines 5f5d6897e8 Increment version 2017-10-13 00:18:02 +02:00
ines 9fd68334ab Add validate command docs 2017-10-12 23:36:48 +02:00
Matthew Honnibal cf6da9301a Update lemmatizer test 2017-10-12 22:50:52 +02:00
Matthew Honnibal 9b90d235d1 Fix tag check in lemmatizer 2017-10-12 22:50:43 +02:00
Matthew Honnibal dc01acd821 Escape encoding in validate function 2017-10-12 22:23:21 +02:00
Matthew Honnibal 27b927259a Add locale_escape compat function 2017-10-12 22:22:04 +02:00
Matthew Honnibal e72603f39f Merge pull request #1416 from explosion/feature/cli-validate
💫 Add "validate" command to CLI
2017-10-12 21:45:20 +02:00
Matthew Honnibal cb0e727c54 Merge pull request #1415 from IamJeffG/fix-alpha-example-train-ner-standalone
Bugfix example script train_ner_standalone.py, fails after training
2017-10-12 21:44:28 +02:00
ines 9c6de3dcfa Merge branch 'develop' into feature/cli-validate 2017-10-12 21:44:28 +02:00
Jeffrey Gerard 5ba970b495 minor cleanup 2017-10-12 12:34:46 -07:00
Matthew Honnibal 462caf835a Fix SBD test 2017-10-12 21:18:22 +02:00
Jeffrey Gerard 39d3cbfdba Bugfix example script train_ner_standalone.py, fails after training 2017-10-12 11:39:12 -07:00
ines fff1028391 Add validate CLI command 2017-10-12 20:05:06 +02:00
Matthew Honnibal 908f44c3fe Disable history features by default 2017-10-12 14:56:11 +02:00
Matthew Honnibal a955843684 Increase default number of epochs 2017-10-12 13:13:01 +02:00
Matthew Honnibal cecfcc7711 Set default hyper params back to 'slow' settings 2017-10-12 13:12:26 +02:00
Ines Montani 37aa523a8e Merge pull request #1408 from explosion/feature/dot-underscore
💫 Custom attributes via Doc._, Token._ and Span._
2017-10-11 18:35:56 +02:00
Matthew Honnibal 40dbc85ffa Merge pull request #1413 from explosion/feature/lemmatizer
💫 Integrate lookup lemmatization (9+ languages)
2017-10-11 17:54:36 +02:00
ines 8ce6f96180 Don't make copies of language data components 2017-10-11 15:34:55 +02:00
ines eac9e99086 Update docs on adding lemmatization to languages 2017-10-11 14:21:15 +02:00
ines 51519251c2 Fix underscore method test 2017-10-11 13:34:19 +02:00
ines c6ae49e8bf Fix formatting 2017-10-11 13:34:11 +02:00
ines 453c47ca24 Add German lemmatizer tests 2017-10-11 13:27:26 +02:00
ines 15fe0fd82d Fix tests 2017-10-11 13:27:18 +02:00
ines 6dd14dc342 Add lookup lemmas to tokens without POS tags 2017-10-11 13:27:10 +02:00
ines 9620c1a640 Add lemma_lookup to Language defaults 2017-10-11 13:26:05 +02:00
ines 9fd471372a Add lookup lemmatizer to lemmatizer as lookup() method 2017-10-11 13:25:51 +02:00
ines e0ff145a8b Merge branch 'develop' into feature/dot-underscore 2017-10-11 11:57:05 +02:00
ines c1d6d43c83 Merge branch 'develop' into feature/lemmatizer 2017-10-11 11:56:35 +02:00
Matthew Honnibal 17c467e0ab Avoid clobbering existing lemmas 2017-10-11 03:33:06 -05:00
Matthew Honnibal 807e109f2b Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-11 02:47:59 -05:00
Matthew Honnibal 6e552c9d83 Prune number of non-projective labels more aggressively 2017-10-11 02:46:44 -05:00
Matthew Honnibal 76fe24f44d Improve embedding defaults 2017-10-11 09:44:17 +02:00
Matthew Honnibal 188f620046 Improve parser defaults 2017-10-11 09:43:48 +02:00
Matthew Honnibal acba2e1051 Fix metadata in training 2017-10-11 08:55:52 +02:00
Matthew Honnibal 74c2c6a58c Add default name and lang to meta 2017-10-11 08:49:12 +02:00
Matthew Honnibal 3814a161e6 Avoid clobbering preset lemmas 2017-10-11 08:41:03 +02:00
Matthew Honnibal fd47f8e89f Fix failing test 2017-10-11 08:38:34 +02:00
Matthew Honnibal 462b2e26b4 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-11 08:23:04 +02:00
Matthew Honnibal a6ac4699eb Allow Morphology class to setup tokens
Add Morphology.assign_untagged() C-method, and call it from
Doc.push_back() when a token is created. This gives a place
to allow the Morphology class to initialize token data.
2017-10-11 03:24:14 +02:00
Matthew Honnibal 3b527fa52b Call morphology.assign_untagged when pushing token to Doc 2017-10-11 03:23:57 +02:00
Matthew Honnibal c15d8278cb Avoid lemmatizing inappropriate tags in English lemmatizer 2017-10-11 03:23:23 +02:00
Matthew Honnibal d528b6e36d Add assign_untagged method in Morphology 2017-10-11 03:22:49 +02:00
Matthew Honnibal 2c118ab3a6 Add tests for Doc creation 2017-10-11 03:21:23 +02:00
ines f4ae6763b9 Fix consistency of imports from spacy.tokens in examples 2017-10-11 02:30:40 +02:00
ines 820bf85075 Move LookupLemmatizer to spacy.lemmatizer 2017-10-11 02:25:13 +02:00
ines 417d45f5d0 Add lemmatizer data as variable on language data
Don't create lookup lemmatizer within Language class and just pass in
the data so it can be set on Token creation
2017-10-11 02:24:58 +02:00