9023 Commits

Author SHA1 Message Date
Matthew Honnibal 34cab8cc49 Update morphology API 2018-09-25 20:53:24 +02:00
Matthew Honnibal 9998d9b9ff Start testing morphology class 2018-09-25 20:38:08 +02:00
Matthew Honnibal 4b7e772f5d Implement the is_animacy_feature etc functions 2018-09-25 17:28:34 +02:00
Matthew Honnibal 6fe7c72560 Reorder morphology enum, and add begin and end markers 2018-09-25 17:28:13 +02:00
Matthew Honnibal 8308c1525e Fix exception loading 2018-09-25 15:18:21 +02:00
Matthew Honnibal e4d8f86d7f Merge branch 'develop' into feature/lemmatizer 2018-09-25 11:09:22 +02:00
Matthew Honnibal b42c123e5d Fix regression introduced by 1759abf1e 2018-09-25 11:08:58 +02:00
Matthew Honnibal 500898907b Fix regression in parser.begin_training() 2018-09-25 11:08:31 +02:00
Matthew Honnibal c2357d3ba0 Fix morphologizer class 2018-09-25 10:58:13 +02:00
Matthew Honnibal e6dde97295 Add function to make morphologizer model 2018-09-25 10:57:59 +02:00
Matthew Honnibal be8cf39e16 Fix morphology 2018-09-25 10:57:33 +02:00
Matthew Honnibal a3d2e616d5 Restore previous morphology stuff 2018-09-25 00:35:59 +02:00
Matthew Honnibal 3bba8e9245 Update structs 2018-09-24 23:58:08 +02:00
Matthew Honnibal 6ae645c4ef WIP on supporting morphology features 2018-09-24 23:57:41 +02:00
Matthew Honnibal ac5742223a Draft class to predict morphological tags 2018-09-24 23:14:06 +02:00
Matthew Honnibal b10d0cce05 Add MultiSoftmax class
Add a new class for the Tagger model, MultiSoftmax. This allows softmax
prediction of multiple classes on the same output layer, e.g. one
variable with 3 classes, another with 4 classes. This makes a layer with
7 output neurons, which we softmax into two distributions.
2018-09-24 17:35:28 +02:00
Matthew Honnibal 052c45dc2f Add as_int and as_string methods to StringStore 2018-09-24 15:25:20 +02:00
Matthew Honnibal 1759abf1e5 Fix bug in sentence starts for non-projective parses
The set_children_from_heads function assumed parse trees were
projective. However, non-projective parses may be passed in during
deserialization, or after deprojectivising. This caused incorrect
sentence boundaries to be set for non-projective parses. Close #2772.
2018-09-19 14:50:06 +02:00
Matthew Honnibal 48fd36bf05 Fix test for issue 27772 2018-09-19 14:47:27 +02:00
Matthew Honnibal 6cd920e088 Add xfail test for deprojectivization SBD bug 2018-09-19 14:00:31 +02:00
Matthew Honnibal 99a6011580 Avoid adding empty layer in model, to keep models backwards compatible 2018-09-14 22:51:58 +02:00
Matthew Honnibal c046392317 Trigger on_data hooks in parser model 2018-09-14 20:51:21 +02:00
Matthew Honnibal 5afd98dff5 Add a stepping function, for changing batch sizes or learning rates 2018-09-14 18:37:16 +02:00
Matthew Honnibal 27c00f4f22 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2018-09-14 12:30:57 +02:00
Matthew Honnibal f32b52e611 Fix bug that caused deprojectivisation to run multiple times 2018-09-14 12:12:54 +02:00
Matthew Honnibal 8f2a6367e9 Fix usage of PyTorch BiLSTM in ud_train 2018-09-13 22:54:59 +00:00
Matthew Honnibal afeddfff26 Fix PyTorch BiLSTM 2018-09-13 22:54:34 +00:00
Matthew Honnibal a26fe8e7bb Small hack in Language.update to make torch work 2018-09-13 22:51:52 +00:00
Matthew Honnibal 445b81ce3f Support bilstm_depth argument in ud-train 2018-09-13 19:30:22 +02:00
Matthew Honnibal b43643a953 Support bilstm_depth option in parser 2018-09-13 19:29:49 +02:00
Matthew Honnibal 45032fe9e1 Support option of BiLSTM in Tok2Vec (requires pytorch) 2018-09-13 19:28:35 +02:00
Matthew Honnibal 3eb9f3e2b8 Fix defaults for ud-train 2018-09-13 18:05:48 +02:00
Matthew Honnibal 59cf533879 Improve ud-train script. Make config optional 2018-09-13 14:24:08 +02:00
Matthew Honnibal 3e3a309764 Fix tagger 2018-09-13 14:14:38 +02:00
Matthew Honnibal da7650e84b Fix maximum doc length in ud_train script 2018-09-13 14:10:25 +02:00
Matthew Honnibal a95eea4c06 Fix multi-task objective for parser 2018-09-13 14:08:55 +02:00
Matthew Honnibal 21321cd6cf Add tok2vec property to parser model 2018-09-13 14:08:43 +02:00
Matthew Honnibal d6aa60139d Fix tagger training on GPU 2018-09-13 14:05:37 +02:00
Matthew Honnibal b2cb1fc67d Merge matcher tests 2018-09-06 01:39:53 +02:00
Suraj Krishnan Rajan 356af7b0a1 Fix tests 2018-09-06 01:39:36 +02:00
Matthew Honnibal 4d2d7d5866 Fix new feature flags 2018-08-27 02:12:39 +02:00
Matthew Honnibal 598dbf1ce0 Fix character-based tokenization for Japanese 2018-08-27 01:51:38 +02:00
Matthew Honnibal 3763e20afc Pass subword_features and conv_depth params 2018-08-27 01:51:15 +02:00
Matthew Honnibal 8051136d70 Support subword_features and conv_depth params in Tok2Vec 2018-08-27 01:50:48 +02:00
Matthew Honnibal 9c33d4d1df Add more hyper-parameters to spacy ud-train
* subword_features: Controls whether subword features are used in the
word embeddings. True by default (specifically, prefix, suffix and word
shape). Should be set to False for languages like Chinese and Japanese.

* conv_depth: Depth of the convolutional layers. Defaults to 4.
2018-08-27 01:48:46 +02:00
Matthew Honnibal 51a9efbf3b Add draft Binder class 2018-08-22 13:12:51 +02:00
Matthew Honnibal f0e6be689a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2018-08-16 17:18:19 +02:00
Matthew Honnibal 5ce459d2ee Fix error in vocab 2018-08-16 17:18:09 +02:00
Ines Montani aeb49eb625 Update version [ci skip] 2018-08-16 16:56:02 +02:00
Ines Montani a0eacd3293 Merge branch 'master' into develop 2018-08-16 16:55:05 +02:00