Commit Graph

147 Commits

Author SHA1 Message Date
Wolfgang Seeker 690c5acabf adjust train.py to train both english and german models 2016-03-03 15:21:00 +01:00
Wolfgang Seeker 3448cb40a4 integrated pseudo-projective parsing into parser
- nonproj.pyx holds a class PseudoProjectivity which currently holds
  all functionality to implement Nivre & Nilsson 2005's pseudo-projective
  parsing using the HEAD decoration scheme
- changed lefts/rights in Token to account for possible non-projective
  structures
2016-03-01 10:09:08 +01:00
Wolfgang Seeker 56b7210e82 moved nonproj.py to syntax/nonproj.pyx 2016-02-25 15:08:49 +01:00
Matthew Honnibal 1b41f868d2 * Check for errors in parser, and parallelise the left-over batch 2016-02-06 10:06:30 +01:00
Matthew Honnibal 165ca28b80 * Set is_parsed flag in Parser.pipe 2016-02-05 19:51:44 +01:00
Matthew Honnibal bdd579db0a * Set is_parsed flag in Parser.pipe 2016-02-05 19:50:11 +01:00
Matthew Honnibal b04c9aad71 * Fix off-by-one in Parser.pipe 2016-02-05 19:37:50 +01:00
Matthew Honnibal 048dfe35aa * cimport cython.parallel 2016-02-05 12:20:42 +01:00
Matthew Honnibal 8a13cebdcc * Update for modified thinc interface 2016-02-05 11:44:39 +01:00
Matthew Honnibal 84b247ef83 * Add a .pipe method, that takes a stream of input, operates on it, and streams the output. Internally, the stream may be buffered, to allow multi-threading. 2016-02-03 02:10:58 +01:00
Matthew Honnibal b3802562d6 Merge branch 'rethinc2' of https://github.com/honnibal/spaCy into rethinc2 2016-02-01 08:59:24 +01:00
Matthew Honnibal 4b08a3fafd * Fix merge conflict 2016-02-01 08:58:18 +01:00
Matthew Honnibal 5188f6d9d8 * Fix parseC function 2016-02-01 08:48:48 +01:00
Matthew Honnibal bcf8f7ba40 * Add a parse_batch method to Parser, that releases the GIL around a batch of documents. 2016-02-01 08:34:55 +01:00
Matthew Honnibal 490ba65398 * Use openmp in parser 2016-02-01 03:08:42 +01:00
Matthew Honnibal 28e5ad62bc * Pass a StateC pointer into the transition and validation methods in the parser, so that the GIL can be released over a batch of documents 2016-02-01 03:00:15 +01:00
Matthew Honnibal a47f00901b * Pass a StateC pointer into the transition and validation methods in the parser, so that the GIL can be released over a batch of documents 2016-02-01 02:58:14 +01:00
Matthew Honnibal daaad66448 * Now fully proxied 2016-02-01 02:37:08 +01:00
Matthew Honnibal 7a0e3bb9c1 * Continue proxying. Some problem currently 2016-02-01 02:22:21 +01:00
Matthew Honnibal 9410e74c92 * Switch parser to use nogil functions 2016-01-30 20:27:07 +01:00
Matthew Honnibal 10877a7791 * Update for thinc 5.0, including changing cost from int to weight_t, and updating the tagger and parser 2016-01-30 14:31:36 +01:00
Matthew Honnibal 84c5dfbfc3 * Clean up debugging python list 2016-01-19 20:10:32 +01:00
Matthew Honnibal 65c5bc4988 * Add add_label method, to allow users to register new entity types and dependency labels. 2016-01-19 19:11:02 +01:00
Matthew Honnibal 3dc398b727 * Fix merge conflict in requirements.txt 2016-01-16 16:20:49 +01:00
Matthew Honnibal c025a0c64b * Check for KeyboardInerrupt in parser.__call__ 2016-01-16 16:18:44 +01:00
Matthew Honnibal aec130af56 Use util.Package class for io
Previous Sputnik integration caused API change: Vocab, Tagger, etc
were loaded via a from_package classmethod, that required a
sputnik.Package instance. This forced users to first create a
sputnik.Sputnik() instance, in order to acquire a Package via
sp.pool().

Instead I've created a small file-system shim, util.Package, which
allows classes to have a .load() classmethod, that accepts either
util.Package objects, or strings. We can later gut the internals
of this and make it a proxy for Sputnik if we need more functionality
that should live in the Sputnik library.

Sputnik is now only used to download and install the data, in
spacy.en.download
2015-12-29 18:00:48 +01:00
Matthew Honnibal 6f47074214 * Make constructor of ParserModel and TaggerModel the same as AveragedPerceptron, for each pickling. 2015-11-07 18:25:17 +11:00
Matthew Honnibal 888c05a7fa * Fix variable naming in StepwiseState, for thinc 4.0 2015-11-07 11:02:44 +11:00
Matthew Honnibal fc2185bfe3 * Fix variable naming in StepwiseState, for thinc 4.0 2015-11-07 10:48:31 +11:00
Matthew Honnibal 954442a807 * Fix variable naming in StepwiseState, for thinc 4.0 2015-11-07 10:30:45 +11:00
Matthew Honnibal 19136b0e7d * Add better debug message for illegal move 2015-11-07 05:34:37 +11:00
Matthew Honnibal 3c162dcac3 * Refactor away from the _ml module, to use thinc 4.0. Still some work needs to be done, e.g. to add __reduce__ to the models, more testing, etc. 2015-11-07 03:24:30 +11:00
Matthew Honnibal b9991fbd20 * Update to use thinc 3.0 2015-11-06 00:25:59 +11:00
Matthew Honnibal 68f479e821 * Rename Doc.data to Doc.c 2015-11-04 00:15:14 +11:00
Matthew Honnibal 20fd36a0f7 * Very scrappy, likely buggy first-cut pickle implementation, to work on Issue #125: allow pickle for Apache Spark. The current implementation sends stuff to temp files, and does almost nothing to ensure all modifiable state is actually preserved. The Language() instance is a deep tree of extension objects, and if pickling during training, some of the C-data state is hard to preserve. 2015-10-13 13:44:41 +11:00
Matthew Honnibal 86c888667f * Merge in changes from de branch 2015-09-06 19:49:28 +02:00
Matthew Honnibal 5edac11225 * Wrap self.parse in nogil, and break if an invalid move is predicted. The invalid break is a work-around that papers over likely bugs, but we can't easily break in the nogil block, and otherwise we'll get an infinite loop. Need to set this as an error flag. 2015-09-06 04:15:00 +02:00
Matthew Honnibal a3d5e6c0dd * Reform constructor and save/load workflow in parser model 2015-08-26 19:19:01 +02:00
Matthew Honnibal bf38b3b883 * Hack on l/r reversal bug 2015-08-10 05:58:43 +02:00
Matthew Honnibal 6116413b47 * Fix label prediction in StepwiseState 2015-08-10 05:05:31 +02:00
Matthew Honnibal 9de98f5a6f * Add Parser.stepthrough method, with context manager 2015-08-10 00:08:46 +02:00
Matthew Honnibal 9c090945e0 * Add Parser.predict method, and clean up Parser.get_state 2015-08-09 02:29:58 +02:00
Matthew Honnibal 04fccfb984 * Fix get_state for parser prediction 2015-08-09 02:11:22 +02:00
Matthew Honnibal 55fde0e240 * Fix get_state 2015-08-09 01:45:30 +02:00
Matthew Honnibal f0f4fa9838 * Fix Parser.get_state 2015-08-09 01:40:13 +02:00
Matthew Honnibal 18331dca89 * Add continue_for argument to parser 'partial' function, which is now renamed to get_state 2015-08-09 01:31:54 +02:00
Matthew Honnibal 9de218b7ba * Fix Parser.partial function 2015-08-08 23:45:18 +02:00
Matthew Honnibal 3af938365f * Add function partial to Parser 2015-08-08 23:32:15 +02:00
Matthew Honnibal 823ef4a00b * Remove profile declarations 2015-07-25 18:13:06 +02:00
Matthew Honnibal aa28e2e01d * Release the GIL around parse function 2015-07-24 04:53:27 +02:00