2866 Commits

Author SHA1 Message Date
Matthew Honnibal a66e2f2f53 * Fix gather_freqs.py 2016-02-04 20:22:54 +01:00
Matthew Honnibal 5dc6cffc67 * Fix gather_freqs.py 2016-02-04 20:21:58 +01:00
Henning Peters fc19a4a153 Merge branch 'master' of github.com:honnibal/spaCy 2016-02-04 17:37:33 +01:00
Matthew Honnibal 48ce09687d * Skip pickling the vocab in the tests 2016-02-04 15:51:19 +01:00
Matthew Honnibal 419edfab50 * Use generic flags for the new attributes until they're added 2016-02-04 15:50:54 +01:00
Matthew Honnibal c4017a06d9 * Add placeholders for the new flags in attrs and symbols 2016-02-04 15:49:45 +01:00
Matthew Honnibal e5c96c969f * Wire up new attributes 2016-02-04 13:04:58 +01:00
Matthew Honnibal 9703ccc3de * Remove unused import 2016-02-04 13:04:33 +01:00
Matthew Honnibal 11810be33e * Add Python hooks for is_bracket/is_quote/is_left_punct/is_right_punct 2016-02-04 13:04:16 +01:00
Matthew Honnibal fe611132f0 * Add stubs for is_bracket/is_quote/is_left_punct/is_right_punct functions 2016-02-04 13:03:04 +01:00
Matthew Honnibal ee975d36d0 * Add stubs to test is_bracket/is_quote/is_left_punct/is_right_punct functions 2016-02-04 13:02:25 +01:00
Henning Peters e7ec06cea2 Merge branch 'master' of github.com:honnibal/spaCy 2016-02-03 12:20:36 +01:00
Matthew Honnibal f9e765cae7 * Add pipe() method to tokenizer 2016-02-03 02:32:37 +01:00
Matthew Honnibal 4cbad510ff * Fix calculation of head for spans with punctuation. 2016-02-03 02:32:21 +01:00
Matthew Honnibal 84b247ef83 * Add a .pipe method, that takes a stream of input, operates on it, and streams the output. Internally, the stream may be buffered, to allow multi-threading. 2016-02-03 02:10:58 +01:00
Matthew Honnibal fcfc17a164 Merge branch 'master' into rethinc2 2016-02-02 23:05:34 +01:00
Matthew Honnibal 1a2ee73e98 * Add missing pos and tag attributes to API 2016-02-02 23:00:53 +01:00
Matthew Honnibal f204daf27b * Add error warning that a gold tag is unrecognised 2016-02-02 22:59:59 +01:00
Matthew Honnibal 99b8906100 * Accept punct_labels as an argument to the scorer 2016-02-02 22:59:06 +01:00
Matthew Honnibal e2ed6251d7 * Fancy up the CLI for the conll train script 2016-02-02 22:58:06 +01:00
Matthew Honnibal 59123443e2 * Check for presence/absence of the different models in Language.end_training 2016-02-02 22:49:55 +01:00
Matthew Honnibal 7cbff48ace * Set the German lemma rules to be an empty JSON object 2016-02-02 22:30:51 +01:00
Matthew Honnibal d0f06c5cc4 * Add missing tags to the German tag map 2016-02-02 22:30:22 +01:00
Matthew Honnibal bf5a7cc598 * Update train_pos_tagger example 2016-02-02 22:30:00 +01:00
Matthew Honnibal a676d66807 * Update the CoNLL train script, to get working on other languages 2016-02-02 22:29:34 +01:00
Matthew Honnibal c9aa91041d * Don't expect openmp in options 2016-02-02 13:50:25 +01:00
Henning Peters 7d4d803ff6 Merge branch 'master' of github.com:honnibal/spaCy 2016-02-01 13:33:46 +01:00
Matthew Honnibal 9e9d4c8706 * Fix stupid error in Language.batch 2016-02-01 09:49:32 +01:00
Matthew Honnibal e3db39dd21 * Fix compiler warning about signed/unsigned comparison 2016-02-01 09:08:07 +01:00
Matthew Honnibal 98fbdf2856 * Add Language.batch() method, to support multi-threaded jobs 2016-02-01 09:01:13 +01:00
Matthew Honnibal b3802562d6 Merge branch 'rethinc2' of https://github.com/honnibal/spaCy into rethinc2 2016-02-01 08:59:24 +01:00
Matthew Honnibal 4b08a3fafd * Fix merge conflict 2016-02-01 08:58:18 +01:00
Matthew Honnibal 5188f6d9d8 * Fix parseC function 2016-02-01 08:48:48 +01:00
Matthew Honnibal bcf8f7ba40 * Add a parse_batch method to Parser, that releases the GIL around a batch of documents. 2016-02-01 08:34:55 +01:00
Matthew Honnibal bd47cb3290 Merge branch 'rethinc2' of https://github.com/honnibal/spaCy into rethinc2 2016-02-01 08:33:52 +01:00
Matthew Honnibal 80caba28c7 Merge branch 'master' of ssh://github.com/honnibal/spaCy into rethinc2 2016-02-01 08:33:26 +01:00
Matthew Honnibal d5579cd0d8 Merge branch 'rethinc2' of https://github.com/honnibal/spaCy into rethinc2 2016-02-01 03:08:49 +01:00
Matthew Honnibal 490ba65398 * Use openmp in parser 2016-02-01 03:08:42 +01:00
Matthew Honnibal cb78d91ec5 * Fix ArcEager.set_valid 2016-02-01 03:07:37 +01:00
Matthew Honnibal 9c34ca9e5d * Add _stack to mod_names 2016-02-01 03:00:53 +01:00
Matthew Honnibal 28e5ad62bc * Pass a StateC pointer into the transition and validation methods in the parser, so that the GIL can be released over a batch of documents 2016-02-01 03:00:15 +01:00
Matthew Honnibal a47f00901b * Pass a StateC pointer into the transition and validation methods in the parser, so that the GIL can be released over a batch of documents 2016-02-01 02:58:14 +01:00
Matthew Honnibal daaad66448 * Now fully proxied 2016-02-01 02:37:08 +01:00
Matthew Honnibal 7a0e3bb9c1 * Continue proxying. Some problem currently 2016-02-01 02:22:21 +01:00
Matthew Honnibal 2169bbb7ea * Shadow StateClass with StateC, to start proxying 2016-02-01 01:16:14 +01:00
Matthew Honnibal 2fa228458e * Add _state file, which StateClass will proxy to 2016-02-01 01:09:21 +01:00
Matthew Honnibal bc0f0d284c * Require different thinc version 2016-01-30 20:29:24 +01:00
Matthew Honnibal 6bb007d16e * Make set_parse nogil 2016-01-30 20:27:52 +01:00
Matthew Honnibal 9410e74c92 * Switch parser to use nogil functions 2016-01-30 20:27:07 +01:00
Matthew Honnibal 10877a7791 * Update for thinc 5.0, including changing cost from int to weight_t, and updating the tagger and parser 2016-01-30 14:31:36 +01:00