Commit Graph

98 Commits

Author SHA1 Message Date
Matthew Honnibal 2fd97c71cc Revert "Don't try to pickle matcher."
This reverts commit 97bd0c9d00.
2016-10-17 16:49:43 +02:00
Matthew Honnibal 97bd0c9d00 Don't try to pickle matcher. 2016-10-17 16:38:40 +02:00
Matthew Honnibal 049c937540 Have the matcher return character offsets, to handle the match better. 2016-10-17 15:58:57 +02:00
Matthew Honnibal 6cbdc94959 Lots of updates to Matcher, to make entity handling sane. 2016-10-17 15:23:31 +02:00
Matthew Honnibal 90baa9c7e6 Revert "Changes to matcher.pyx for new StringStore scheme"
This reverts commit 3ff09614e0.
2016-09-30 20:20:13 +02:00
Matthew Honnibal 3ff09614e0 Changes to matcher.pyx for new StringStore scheme 2016-09-30 19:56:48 +02:00
Matthew Honnibal fd65cf6cbb Finish refactoring data loading 2016-09-24 20:26:17 +02:00
Matthew Honnibal 83e364188c Mostly finished loading refactoring. Design is in place, but doesn't work yet. 2016-09-24 15:42:01 +02:00
Matthew Honnibal eaf4065480 Expose the _patterns private member 2016-09-24 11:20:42 +02:00
Matthew Honnibal 55f1f7edaf Don't automatically write new entities into the Doc in the Matcher. This fixes a long-standing wart, but introduces a *backwards incompatibility.* 2016-09-24 01:16:45 +02:00
Matthew Honnibal 58e83fe34b Initial, limited support for quantified patterns in Matcher, and tracking of ent_id attribute in Token and Span. The quantifiers need a lot more testing, and there are some known problems. The main known problem is that the zero-plus and one-plus quantifiers won't work if a token can match both the quantified pattern expression AND the tail of the match. 2016-09-21 14:54:55 +02:00
Matthew Honnibal 67ce96c9c9 * Make patterns argument to Matcher class optional 2016-04-17 21:32:24 +02:00
Wolfgang Seeker e6945c4d0e bugfix: uppercase attr values before looking them up 2016-04-15 15:46:31 +02:00
Matthew Honnibal 108aca0e50 * Make Matcher use attrs from the attrs.pyx file, rather than having an incomplete function doing the mapping. 2016-04-14 10:37:39 +02:00
Matthew Honnibal 7119e77fb6 * Fix Matcher.pipe 2016-02-05 19:46:02 +01:00
Matthew Honnibal 1ef84a0557 * Merge master into rethinc2 2016-02-05 12:55:59 +01:00
Matthew Honnibal 9703ccc3de * Remove unused import 2016-02-04 13:04:33 +01:00
Matthew Honnibal 84b247ef83 * Add a .pipe method, that takes a stream of input, operates on it, and streams the output. Internally, the stream may be buffered, to allow multi-threading. 2016-02-03 02:10:58 +01:00
Henning Peters 235f094534 untangle data_path/via 2016-01-16 12:23:45 +01:00
Henning Peters 846fa49b2a distinct load() and from_package() methods 2016-01-16 10:00:57 +01:00
Henning Peters 788f734513 refactored data_dir->via, add zip_safe, add spacy.load() 2016-01-15 18:01:02 +01:00
Henning Peters bc229790ac integrate with sputnik 2016-01-13 19:46:17 +01:00
Matthew Honnibal eaf2ad59f1 * Fix use of mock Package object 2015-12-31 04:13:15 +01:00
Matthew Honnibal 029136a007 * Fix resource loading for Matcher 2015-12-31 02:45:12 +01:00
Matthew Honnibal a6ba43ecaf * Fix errors in packaging revision 2015-12-29 18:37:26 +01:00
Matthew Honnibal aec130af56 Use util.Package class for io
Previous Sputnik integration caused API change: Vocab, Tagger, etc
were loaded via a from_package classmethod, that required a
sputnik.Package instance. This forced users to first create a
sputnik.Sputnik() instance, in order to acquire a Package via
sp.pool().

Instead I've created a small file-system shim, util.Package, which
allows classes to have a .load() classmethod, that accepts either
util.Package objects, or strings. We can later gut the internals
of this and make it a proxy for Sputnik if we need more functionality
that should live in the Sputnik library.

Sputnik is now only used to download and install the data, in
spacy.en.download
2015-12-29 18:00:48 +01:00
Henning Peters 8359bd4d93 strip data/ from package, friendlier Language invocation, make data_dir backward/forward-compatible 2015-12-18 09:52:55 +01:00
Henning Peters 9027cef3bc access model via sputnik 2015-12-07 06:01:28 +01:00
Matthew Honnibal 68f479e821 * Rename Doc.data to Doc.c 2015-11-04 00:15:14 +11:00
Matthew Honnibal 6727a46bb5 * Fix Issue #118: Matcher behaves unpredictably when matches overlap. 2015-10-19 16:45:32 +11:00
Matthew Honnibal c99285b8b9 * Clean up C++ usage in spacy/matcher.pyx 2015-10-18 17:20:50 +11:00
Matthew Honnibal 20fd36a0f7 * Very scrappy, likely buggy first-cut pickle implementation, to work on Issue #125: allow pickle for Apache Spark. The current implementation sends stuff to temp files, and does almost nothing to ensure all modifiable state is actually preserved. The Language() instance is a deep tree of extension objects, and if pickling during training, some of the C-data state is hard to preserve. 2015-10-13 13:44:41 +11:00
Matthew Honnibal 85ce36ab11 * Refactor symbols, so that frequency rank can be derived from the orth id of a word. 2015-10-13 13:44:39 +11:00
Matthew Honnibal 801d55a6d9 * Fix phrase matcher 2015-10-09 02:00:45 +11:00
Matthew Honnibal 1def5a6cbe * Fix print statements in matcher 2015-09-08 15:38:19 +02:00
Matthew Honnibal 86c888667f * Merge in changes from de branch 2015-09-06 19:49:28 +02:00
Matthew Honnibal d2fc104a26 * Begin merge of Gazetteer and DE branches 2015-09-06 19:45:15 +02:00
Matthew Honnibal 6427a3fcac * Temporarily import flag attributes in matcher 2015-09-06 17:53:12 +02:00
Matthew Honnibal 430affc347 * Fix missing n_patterns property in Matcher class. Fix from_dir method 2015-08-26 19:17:02 +02:00
Matthew Honnibal 6f1743692a * Work on language-independent refactoring 2015-08-23 20:49:18 +02:00
Matthew Honnibal cad0cca4e3 * Tmp 2015-08-22 22:04:34 +02:00
Matthew Honnibal 9f65879991 * Fix shape attr bug, and fix handling of false positive matches 2015-08-06 17:28:14 +02:00
Matthew Honnibal 383dfabd67 * Fix matcher setting of entities 2015-08-06 16:27:01 +02:00
Matthew Honnibal cd7d1682cd * Fix loading of gazetteer.json file 2015-08-06 16:08:25 +02:00
Matthew Honnibal 5737115e1e * Work on gazetteer matching 2015-08-06 14:33:21 +02:00
Matthew Honnibal 9c1724ecae * Gazetteer stuff working, now need to wire up to API 2015-08-06 00:35:40 +02:00
Matthew Honnibal 5bc0e83f9a * Reimplement matching in Cython, instead of Python. 2015-08-05 01:05:54 +02:00
Matthew Honnibal 4c87a696b3 * Add draft dfa matcher, in Python. Passing tests. 2015-08-04 15:55:28 +02:00