3081 Commits

Author SHA1 Message Date
Matthew Honnibal 99de44d864 Changes to Doc and Token for new string store scheme 2016-09-30 20:00:21 +02:00
Matthew Honnibal 78f19baafa Fix report of ParserStateError 2016-09-30 19:59:22 +02:00
Matthew Honnibal 0442e0ab1e Changes to transition systems for new StringStore scheme 2016-09-30 19:58:51 +02:00
Matthew Honnibal 22d4752d64 Changes to strings.pyx for new StringStore scheme 2016-09-30 19:58:09 +02:00
Matthew Honnibal 4f794b215a Changes to iterators.pyx for new StringStore scheme 2016-09-30 19:57:49 +02:00
Matthew Honnibal 95f8cfd745 Changes to morphology.pyx for new StringStore scheme 2016-09-30 19:57:10 +02:00
Matthew Honnibal 3ff09614e0 Changes to matcher.pyx for new StringStore scheme 2016-09-30 19:56:48 +02:00
Matthew Honnibal eceeaefe53 Fix defaults for Parser and Entity, adding a blank= argument. 2016-09-30 19:56:06 +02:00
Matthew Honnibal d61feffe24 Require new preshed 2016-09-30 18:41:01 +02:00
Matthew Honnibal 8423e8627f Work on Issue #285: intern strings into document-specific pools, to address streaming data memory growth. StringStore.__getitem__ now raises KeyError when it can't find the string. Use StringStore.intern() to get the old behaviour. Still need to hunt down all uses of StringStore.__getitem__ in library and do testing, but logic looks good. 2016-09-30 10:14:47 +02:00
Matthew Honnibal d3dc5718b2 Fix syntax error in Doc 2016-09-28 11:39:49 +02:00
Matthew Honnibal 1b520e7bab Improve docstrings for Doc object 2016-09-28 11:15:13 +02:00
Matthew Honnibal 81a47c01d8 Fix test for empty sentence string. 2016-09-27 19:21:22 +02:00
Matthew Honnibal 4cbf0d3bb6 Handle errors when no valid actions are available, pointing users to the issue tracker. 2016-09-27 19:19:53 +02:00
Matthew Honnibal 430473bd98 Raise errors when no actions are available, re Issue #429 2016-09-27 19:09:37 +02:00
Matthew Honnibal fc4a7ad794 Test and fix Issue #411: IndexError when .sents property is used on empty string. 2016-09-27 18:49:14 +02:00
Matthew Honnibal 3d370b7d45 Add test for Issue #445, fixed in 3cb4d455d, with improved lemmatizer logic 2016-09-27 18:39:46 +02:00
Matthew Honnibal a2f3510d6d Fix lemmatizer 2016-09-27 17:47:05 +02:00
Matthew Honnibal 07776d8096 Fix pos name conflict in lemmatize 2016-09-27 17:35:58 +02:00
Matthew Honnibal 35cd953f9e Fix pos name conflict with morphology 2016-09-27 14:16:22 +02:00
Matthew Honnibal 8e7df3c4ca Expect the parser data, if parser.load() is called. 2016-09-27 14:02:12 +02:00
Matthew Honnibal bb4f201ad2 Pass morphological features from tag map into the lemmatizer. 2016-09-27 14:01:43 +02:00
Matthew Honnibal 40509e8bca Tweak the new is_base_form logic, because we can expect the 'pos' key in the morphology we're passed. 2016-09-27 14:01:16 +02:00
Matthew Honnibal 9c8ac91d72 Add test for Issue #435 2016-09-27 13:52:38 +02:00
Matthew Honnibal 3cb4d455d2 Pass lemmatizer morphological features, so that rules are sensitive to base/inflected distinction, which is how the WordNet data is designed. See Issue #435 2016-09-27 13:52:11 +02:00
Matthew Honnibal e233328d38 Fix Issue #371: Lexeme objects were unhashable. 2016-09-27 13:22:30 +02:00
Matthew Honnibal e382e48d9f Temporarily patch handling of default templates for tagger. Need to move these to language_data. 2016-09-27 13:21:28 +02:00
Matthew Honnibal a44763af0e Fix Issue #469: Incorrectly cased root label in noun chunk iterator 2016-09-27 13:13:01 +02:00
Matthew Honnibal b14b9b096b Return None if /deps directory not present, instead of trying to load the parser. 2016-09-26 18:48:03 +02:00
Matthew Honnibal e07b9665f7 Don't expect parser model 2016-09-26 18:09:33 +02:00
Matthew Honnibal ee6fa106da Fix parser features 2016-09-26 17:57:32 +02:00
Matthew Honnibal e607e4b598 Fix parser loading 2016-09-26 17:51:11 +02:00
Matthew Honnibal 0b2d7ae9d6 Fix Entity creation 2016-09-26 15:41:22 +02:00
Matthew Honnibal 2debc4e0a2 Add .blank() method to Parser. Start housing default dep labels and entity types within the Defaults class. 2016-09-26 11:57:54 +02:00
Matthew Honnibal 722199acb8 Add spacy.blank() method, that doesn't load data. Don't try to load data if path is falsey 2016-09-26 11:07:46 +02:00
Matthew Honnibal ae202e7a60 Fix init_model.py 2016-09-25 15:58:51 +02:00
Matthew Honnibal e56653f848 Add language data for German 2016-09-25 15:44:45 +02:00
Matthew Honnibal 7db956133e Move tokenizer data for German into spacy.de.language_data 2016-09-25 15:37:33 +02:00
Matthew Honnibal 95aaea0d3f Refactor so that the tokenizer data is read from Python data, rather than from disk 2016-09-25 14:49:53 +02:00
Matthew Honnibal d7e9acdcdf Add English language data, so that the tokenizer doesn't require the data download 2016-09-25 14:49:00 +02:00
Matthew Honnibal 82b8cc5efb Whitespace 2016-09-24 22:17:01 +02:00
Matthew Honnibal fd58f7655a Python 3 compatible basestring 2016-09-24 22:16:43 +02:00
Matthew Honnibal 082e95b19e Python 3 compatible basestring 2016-09-24 22:09:21 +02:00
Matthew Honnibal f19af6cb2c Python 3 compatible basestring 2016-09-24 22:08:43 +02:00
Matthew Honnibal 3ed4cdfe32 Handle pathlib.Path objects in CFile 2016-09-24 22:01:46 +02:00
Matthew Honnibal df88690177 Fix encoding of path variable 2016-09-24 21:13:15 +02:00
Matthew Honnibal af847e07fc Fix usage of pathlib for Python3 -- turning paths to strings. 2016-09-24 21:05:27 +02:00
Matthew Honnibal 453683aaf0 Fix spacy/vocab.pyx 2016-09-24 20:50:31 +02:00
Matthew Honnibal d310dc73ef Fix bin/init_model.py after refactoring 2016-09-24 20:38:18 +02:00
Matthew Honnibal fd65cf6cbb Finish refactoring data loading 2016-09-24 20:26:17 +02:00