Commit Graph

8990 Commits

Author SHA1 Message Date
Matthew Honnibal 3e3a309764 Fix tagger 2018-09-13 14:14:38 +02:00
Matthew Honnibal da7650e84b Fix maximum doc length in ud_train script 2018-09-13 14:10:25 +02:00
Matthew Honnibal a95eea4c06 Fix multi-task objective for parser 2018-09-13 14:08:55 +02:00
Matthew Honnibal 21321cd6cf Add tok2vec property to parser model 2018-09-13 14:08:43 +02:00
Matthew Honnibal d6aa60139d Fix tagger training on GPU 2018-09-13 14:05:37 +02:00
Matthew Honnibal b2cb1fc67d Merge matcher tests 2018-09-06 01:39:53 +02:00
Suraj Krishnan Rajan 356af7b0a1 Fix tests 2018-09-06 01:39:36 +02:00
Matthew Honnibal 4d2d7d5866 Fix new feature flags 2018-08-27 02:12:39 +02:00
Matthew Honnibal 598dbf1ce0 Fix character-based tokenization for Japanese 2018-08-27 01:51:38 +02:00
Matthew Honnibal 3763e20afc Pass subword_features and conv_depth params 2018-08-27 01:51:15 +02:00
Matthew Honnibal 8051136d70 Support subword_features and conv_depth params in Tok2Vec 2018-08-27 01:50:48 +02:00
Matthew Honnibal 9c33d4d1df Add more hyper-parameters to spacy ud-train
* subword_features: Controls whether subword features are used in the
word embeddings. True by default (specifically, prefix, suffix and word
shape). Should be set to False for languages like Chinese and Japanese.

* conv_depth: Depth of the convolutional layers. Defaults to 4.
2018-08-27 01:48:46 +02:00
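For readers unfamiliar with the subword features named in the commit above (prefix, suffix and word shape), the following is a minimal Python sketch of what such features can look like. The function names `word_shape` and `subword_features`, and the choice of a 1-character prefix and 3-character suffix, are assumptions made for illustration; this is not spaCy's implementation.

```python
def word_shape(text):
    """Map each character to a coarse class: X (upper), x (lower), d (digit).

    Characters that are neither letters nor digits are kept unchanged.
    Hypothetical helper for illustration only.
    """
    out = []
    for ch in text:
        if ch.isdigit():
            out.append("d")
        elif ch.isalpha():
            out.append("X" if ch.isupper() else "x")
        else:
            out.append(ch)
    return "".join(out)


def subword_features(text):
    """Return illustrative subword features for a single word.

    The prefix and suffix lengths (1 and 3) are assumed values.
    """
    return {
        "prefix": text[:1],
        "suffix": text[-3:],
        "shape": word_shape(text),
    }


if __name__ == "__main__":
    print(subword_features("Apple2018"))
    print(subword_features("日本語"))
```

In this sketch, a word written in a script without letter case collapses to a uniform shape, and a short word's suffix is just the word itself, which gives some intuition for why the commit recommends turning these features off for Chinese and Japanese.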
Matthew Honnibal 51a9efbf3b Add draft Binder class 2018-08-22 13:12:51 +02:00
Matthew Honnibal f0e6be689a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2018-08-16 17:18:19 +02:00
Matthew Honnibal 5ce459d2ee Fix error in vocab 2018-08-16 17:18:09 +02:00
Ines Montani aeb49eb625 Update version [ci skip] 2018-08-16 16:56:02 +02:00
Ines Montani a0eacd3293 Merge branch 'master' into develop 2018-08-16 16:55:05 +02:00
Ines Montani c0fa9903f4 Update model directory JS [ci skip]
Prevent the default release URL from being overwritten and add license type
2018-08-16 16:54:50 +02:00
Ines Montani 03f661fefb Add Greek to models directory [ci skip] 2018-08-16 16:51:56 +02:00
Matthew Honnibal 00febda2e3 Improve alignment around quotes 2018-08-16 01:04:34 +02:00
Matthew Honnibal 66a3f2ba21 Lower-case text before alignment 2018-08-16 00:42:36 +02:00
Matthew Honnibal 595c893791 Expose noise_level option in train CLI 2018-08-16 00:41:44 +02:00
Matthew Honnibal 8365226bf3 Fix lookup of symbols in vocab. 2018-08-15 23:43:34 +02:00
Matthew Honnibal b9f0588580 Set version to v2.1.0a1 2018-08-15 17:22:39 +02:00
Matthew Honnibal e968016417 Note link between issues #2671 and #2675 2018-08-15 17:18:28 +02:00
Matthew Honnibal 63bdc734ba Skip flakey test 2018-08-15 16:56:55 +02:00
Matthew Honnibal ce512e1d47 Fix #2671: Incorrect match ID on some patterns 2018-08-15 16:19:08 +02:00
Matthew Honnibal f12b9190f6 Xfail test for issue #2671 2018-08-15 15:55:31 +02:00
Matthew Honnibal 7cfa665ce6 Add failing test for issue 2671: Incorrect rule ID returned from matcher 2018-08-15 15:54:33 +02:00
Matthew Honnibal 1b2a5869ab Set version to v2.1.0a2.dev0 2018-08-15 15:38:52 +02:00
Matthew Honnibal 5080760288 Add extra comment on 'add label' in parser 2018-08-15 15:37:24 +02:00
Matthew Honnibal 6e749d3c70 Skip flakey parser test 2018-08-15 15:37:04 +02:00
Ines Montani fd9d175a53 Update live code [ci skip] 2018-08-15 15:28:48 +02:00
Matthew Honnibal 48ed1ca29d Add branch option to push-tag script 2018-08-15 03:16:43 +02:00
Matthew Honnibal 6ea981c839 Add converter for jsonl NER data 2018-08-14 14:04:32 +02:00
Matthew Honnibal a9fb6d5511 Fix docs2jsonl function 2018-08-14 14:03:48 +02:00
Matthew Honnibal ea2edd1e2c Merge branch 'feature/docs_to_json' into develop 2018-08-14 13:23:42 +02:00
Matthew Honnibal 6ec236ab08 Fix label-clobber bug in parser.begin_training()
The parser.begin_training() method was rewritten in v2.1. The rewrite
introduced a regression, where if you added labels prior to
begin_training(), these labels were discarded. This patch fixes that.
2018-08-14 13:20:19 +02:00
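The kind of regression described in the commit above can be sketched with a toy class. Everything here (`ToyParser`, `begin_training_buggy`, `begin_training_fixed`) is hypothetical and only mirrors the behaviour the message describes; it is not the spaCy parser code.

```python
class ToyParser:
    """Hypothetical stand-in for a component that keeps a list of labels."""

    def __init__(self):
        self.labels = []

    def add_label(self, label):
        # Only add labels that are not already known.
        if label not in self.labels:
            self.labels.append(label)

    def begin_training_buggy(self, data_labels):
        # Regression pattern: the label list is rebuilt from the training
        # data alone, so labels added earlier are discarded.
        self.labels = sorted(set(data_labels))

    def begin_training_fixed(self, data_labels):
        # Fixed pattern: labels from the data are merged into the
        # existing list, so earlier add_label() calls survive.
        for label in data_labels:
            self.add_label(label)


if __name__ == "__main__":
    buggy = ToyParser()
    buggy.add_label("custom")
    buggy.begin_training_buggy(["nsubj", "dobj"])
    print(buggy.labels)

    fixed = ToyParser()
    fixed.add_label("custom")
    fixed.begin_training_fixed(["nsubj", "dobj"])
    print(fixed.labels)
```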
Matthew Honnibal 02c5c114d0 Fix usage of deprecated freqs.txt in init-model 2018-08-14 13:19:15 +02:00
Matthew Honnibal 2a5a61683e Add function to get train format from Doc objects
Our JSON training format is annoying to work with, and we've wanted to
retire it for some time. In the meantime, we can at least add some
missing functions to make it easier to live with.

This patch adds a function that generates the JSON format from a list
of Doc objects, one per paragraph. This should be a convenient way to handle
a lot of data conversions: whatever format you have the source
information in, you can use it to set up a Doc object. This approach
should offer better future-proofing as well. Hopefully, we can steadily
rewrite code that is sensitive to the current data-format, so that it
instead goes through this function. Then when we change the data format,
we won't have such a problem.
2018-08-14 13:13:10 +02:00
Matthew Honnibal 4336397ecb Update develop from master 2018-08-14 03:04:28 +02:00
Matthew Honnibal 13fa550b36 Merge branch 'master' of https://github.com/explosion/spaCy 2018-08-14 02:32:01 +02:00
Ioannis Daras fe94e696d3 Optimize Greek language support (#2658) 2018-08-14 02:31:32 +02:00
Wojciech Łukasiewicz 3953e967a0 Use correct variable name in the examples (#2664)
* correct naming

* add contributor agreement
2018-08-13 22:21:24 +02:00
Matthew Honnibal 85000ea13b Increment version to 2.0.13.dev2 2018-08-10 00:43:55 +02:00
Matthew Honnibal c4ac981e6d Try again to filter warnings 2018-08-10 00:42:54 +02:00
Matthew Honnibal ae7fc42a41 Increment version to v2.0.13.dev1 2018-08-10 00:14:31 +02:00
Matthew Honnibal 7be9118be3 Require numpy>=1.15.0 to avoid the RuntimeWarning 2018-08-10 00:14:13 +02:00
Matthew Honnibal 19f5046934 Undoing warning suppression, as it doesn't really work 2018-08-10 00:13:34 +02:00
Matthew Honnibal 3fb828352d Set version to 2.0.13.dev0 2018-08-09 23:49:34 +02:00