spaCy

Commit Graph

Author	SHA1	Message	Date
Matthew Honnibal	e4d8f86d7f	Merge branch 'develop' into feature/lemmatizer	2018-09-25 11:09:22 +02:00
Matthew Honnibal	b42c123e5d	Fix regression introduced by `1759abf1e`	2018-09-25 11:08:58 +02:00
Matthew Honnibal	500898907b	Fix regression in parser.begin_training()	2018-09-25 11:08:31 +02:00
Matthew Honnibal	c2357d3ba0	Fix morphologizer class	2018-09-25 10:58:13 +02:00
Matthew Honnibal	e6dde97295	Add function to make morphologizer model	2018-09-25 10:57:59 +02:00
Matthew Honnibal	be8cf39e16	Fix morphology	2018-09-25 10:57:33 +02:00
Matthew Honnibal	a3d2e616d5	Restore previous morphology stuff	2018-09-25 00:35:59 +02:00
Matthew Honnibal	3bba8e9245	Update structs	2018-09-24 23:58:08 +02:00
Matthew Honnibal	6ae645c4ef	WIP on supporting morphology features	2018-09-24 23:57:41 +02:00
Matthew Honnibal	ac5742223a	Draft class to predict morphological tags	2018-09-24 23:14:06 +02:00
Matthew Honnibal	b10d0cce05	Add MultiSoftmax class. Add a new class for the Tagger model, MultiSoftmax. This allows softmax prediction of multiple classes on the same output layer, e.g. one variable with 3 classes, another with 4 classes. This makes a layer with 7 output neurons, which we softmax into two distributions.	2018-09-24 17:35:28 +02:00
Matthew Honnibal	052c45dc2f	Add as_int and as_string methods to StringStore	2018-09-24 15:25:20 +02:00
Matthew Honnibal	1759abf1e5	Fix bug in sentence starts for non-projective parses. The set_children_from_heads function assumed parse trees were projective. However, non-projective parses may be passed in during deserialization, or after deprojectivising. This caused incorrect sentence boundaries to be set for non-projective parses. Close #2772.	2018-09-19 14:50:06 +02:00
Matthew Honnibal	48fd36bf05	Fix test for issue 27772	2018-09-19 14:47:27 +02:00
Matthew Honnibal	6cd920e088	Add xfail test for deprojectivization SBD bug	2018-09-19 14:00:31 +02:00
Matthew Honnibal	99a6011580	Avoid adding empty layer in model, to keep models backwards compatible	2018-09-14 22:51:58 +02:00
Matthew Honnibal	c046392317	Trigger on_data hooks in parser model	2018-09-14 20:51:21 +02:00
Matthew Honnibal	5afd98dff5	Add a stepping function, for changing batch sizes or learning rates	2018-09-14 18:37:16 +02:00
Matthew Honnibal	27c00f4f22	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2018-09-14 12:30:57 +02:00
Matthew Honnibal	f32b52e611	Fix bug that caused deprojectivisation to run multiple times	2018-09-14 12:12:54 +02:00
Matthew Honnibal	8f2a6367e9	Fix usage of PyTorch BiLSTM in ud_train	2018-09-13 22:54:59 +00:00
Matthew Honnibal	afeddfff26	Fix PyTorch BiLSTM	2018-09-13 22:54:34 +00:00
Matthew Honnibal	a26fe8e7bb	Small hack in Language.update to make torch work	2018-09-13 22:51:52 +00:00
Matthew Honnibal	445b81ce3f	Support bilstm_depth argument in ud-train	2018-09-13 19:30:22 +02:00
Matthew Honnibal	b43643a953	Support bilstm_depth option in parser	2018-09-13 19:29:49 +02:00
Matthew Honnibal	45032fe9e1	Support option of BiLSTM in Tok2Vec (requires pytorch)	2018-09-13 19:28:35 +02:00
Matthew Honnibal	3eb9f3e2b8	Fix defaults for ud-train	2018-09-13 18:05:48 +02:00
Matthew Honnibal	59cf533879	Improve ud-train script. Make config optional	2018-09-13 14:24:08 +02:00
Matthew Honnibal	3e3a309764	Fix tagger	2018-09-13 14:14:38 +02:00
Matthew Honnibal	da7650e84b	Fix maximum doc length in ud_train script	2018-09-13 14:10:25 +02:00
Matthew Honnibal	a95eea4c06	Fix multi-task objective for parser	2018-09-13 14:08:55 +02:00
Matthew Honnibal	21321cd6cf	Add tok2vec property to parser model	2018-09-13 14:08:43 +02:00
Matthew Honnibal	d6aa60139d	Fix tagger training on GPU	2018-09-13 14:05:37 +02:00
Matthew Honnibal	b2cb1fc67d	Merge matcher tests	2018-09-06 01:39:53 +02:00
Suraj Krishnan Rajan	356af7b0a1	Fix tests	2018-09-06 01:39:36 +02:00
Matthew Honnibal	4d2d7d5866	Fix new feature flags	2018-08-27 02:12:39 +02:00
Matthew Honnibal	598dbf1ce0	Fix character-based tokenization for Japanese	2018-08-27 01:51:38 +02:00
Matthew Honnibal	3763e20afc	Pass subword_features and conv_depth params	2018-08-27 01:51:15 +02:00
Matthew Honnibal	8051136d70	Support subword_features and conv_depth params in Tok2Vec	2018-08-27 01:50:48 +02:00
Matthew Honnibal	9c33d4d1df	Add more hyper-parameters to spacy ud-train. subword_features: Controls whether subword features are used in the word embeddings. True by default (specifically, prefix, suffix and word shape). Should be set to False for languages like Chinese and Japanese; conv_depth: Depth of the convolutional layers. Defaults to 4.	2018-08-27 01:48:46 +02:00
Matthew Honnibal	51a9efbf3b	Add draft Binder class	2018-08-22 13:12:51 +02:00
Matthew Honnibal	5ce459d2ee	Fix error in vocab	2018-08-16 17:18:09 +02:00
Matthew Honnibal	00febda2e3	Improve alignment around quotes	2018-08-16 01:04:34 +02:00
Matthew Honnibal	66a3f2ba21	Lower-case text before alignment	2018-08-16 00:42:36 +02:00
Matthew Honnibal	595c893791	Expose noise_level option in train CLI	2018-08-16 00:41:44 +02:00
Matthew Honnibal	8365226bf3	Fix lookup of symbols in vocab.	2018-08-15 23:43:34 +02:00
Matthew Honnibal	b9f0588580	Set version to v2.1.0a1	2018-08-15 17:22:39 +02:00
Matthew Honnibal	e968016417	Note link between issues #2671 and #2675	2018-08-15 17:18:28 +02:00
Matthew Honnibal	63bdc734ba	Skip flakey test	2018-08-15 16:56:55 +02:00
Matthew Honnibal	ce512e1d47	Fix #2671 : Incorrect match ID on some patterns	2018-08-15 16:19:08 +02:00
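
Note on b10d0cce05 (MultiSoftmax): the commit message describes one output layer whose neurons are split into groups, with a separate softmax applied to each group. The following is a minimal standalone sketch of that idea in plain Python. It is not the code from the commit; the function name multi_softmax and the sizes parameter are hypothetical, chosen only for illustration.

```python
import math
from typing import List, Sequence


def multi_softmax(logits: Sequence[float], sizes: Sequence[int]) -> List[List[float]]:
    """Split a flat output vector into groups and softmax each group on its own.

    `sizes` gives the number of classes per variable (hypothetical parameter);
    every size is assumed to be at least 1.
    """
    if sum(sizes) != len(logits):
        raise ValueError("group sizes must sum to the number of output neurons")
    distributions = []
    start = 0
    for size in sizes:
        group = logits[start:start + size]
        peak = max(group)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in group]
        total = sum(exps)
        distributions.append([e / total for e in exps])
        start += size
    return distributions


# Seven output neurons: one variable with 3 classes, another with 4.
dists = multi_softmax([0.5, 1.0, -0.2, 2.0, 0.0, 0.1, -1.0], sizes=[3, 4])
```

Each inner list in dists sums to 1, so the 7 neurons yield two independent distributions, as in the example from the commit message.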

5303 Commits
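
Note on 1759abf1e5 (non-projective parses): the commit message says the bug came from assuming parse trees are projective. As a rough illustration of what that assumption means, the sketch below tests projectivity directly from a list of head indices. It is not spaCy's set_children_from_heads; the function names are hypothetical, and the encoding (absolute head indices, with a root token pointing to itself) is an assumption made for this example.

```python
from typing import Sequence


def descends_from(heads: Sequence[int], token: int, ancestor: int) -> bool:
    """Return True if `ancestor` is `token` itself or lies on its path to the root."""
    for _ in range(len(heads)):  # bounded walk, so a malformed cycle cannot loop forever
        if token == ancestor:
            return True
        if heads[token] == token:  # reached a root without meeting the ancestor
            return False
        token = heads[token]
    return False


def is_projective(heads: Sequence[int]) -> bool:
    """A parse is projective if every token between a head and its dependent
    is itself a descendant of that head."""
    for dep, head in enumerate(heads):
        if dep == head:  # root token, no incoming arc to check
            continue
        low, high = sorted((dep, head))
        for between in range(low + 1, high):
            if not descends_from(heads, between, head):
                return False
    return True
```

For example, is_projective([1, 1, 1]) holds because both arcs join adjacent tokens, while is_projective([2, 3, 2, 2]) does not, since the arc from token 3 to token 1 spans the root at position 2.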