Commit Graph

180 Commits

Author SHA1 Message Date
Matthew Honnibal add9a33782 Return False for vocab.has_vector 2017-06-04 14:26:14 -05:00
ines 05fe6758a7 Set lexeme attributes for tokenizer special cases 2017-06-03 19:44:39 +02:00
ines 41a6adf1f6 Initialise Vocab length correctly 2017-06-02 10:57:25 +02:00
ines 53b82f972a Add strings to Vocab in init, instead of StringStore 2017-06-02 10:57:06 +02:00
ines 023f38bdd4 Fix return value of Vocab.from_bytes 2017-06-02 10:56:40 +02:00
Matthew Honnibal 307d615c5f Fix serialization for tagger when tag_map has changed 2017-06-01 12:18:36 -05:00
Matthew Honnibal 9805e0e369 Fix vocab pickling 2017-05-31 08:25:01 -05:00
Matthew Honnibal a131981f3b Work on vectors 2017-05-30 23:34:50 +02:00
Matthew Honnibal 9bf22a94aa Fix tag set serialisation 2017-05-29 17:52:36 -05:00
Matthew Honnibal 920887f4e4 Specify order of vocab deserialization 2017-05-29 13:04:40 +02:00
Matthew Honnibal 6b019b0540 Update to/from bytes methods 2017-05-29 10:14:20 +02:00
Matthew Honnibal 6dad4117ad Work on serialization for models 2017-05-29 01:37:57 +02:00
Matthew Honnibal 2edd96ce47 Draft Vocab to/from disk/bytes 2017-05-28 23:34:12 +02:00
Matthew Honnibal fe11564b8e Finish stringstore change. Also xfail vectors tests 2017-05-28 15:10:22 +02:00
Matthew Honnibal fe4a746300 Accommodate symbols in new string scheme 2017-05-28 13:03:16 +02:00
Matthew Honnibal a5606c3eda Work on changing StringStore to return hashes. 2017-05-28 12:36:27 +02:00
Matthew Honnibal 39293ab2ee Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-28 11:46:57 +02:00
Matthew Honnibal 15f6efc127 Remove vectors from vocab 2017-05-28 11:45:32 +02:00
ines c8543c8237 Fix formatting and docstrings and remove deprecated function 2017-05-28 00:22:40 +02:00
ines 251346b59f Fix typos and formatting 2017-05-21 14:18:46 +02:00
ines d82ae9a585 Change "function" to "callable" in docs 2017-05-21 13:17:40 +02:00
ines f0cc642bb9 Update docstrings and API docs for Vocab 2017-05-20 14:00:41 +02:00
Matthew Honnibal 793430aa7a Get spaCy train command working with neural network
* Integrate models into pipeline
* Add basic serialization (maybe incorrect)
* Fix pickle on vocab
2017-05-17 12:04:50 +02:00
Matthew Honnibal 9e167b7bb6 Strip serializer from code 2017-05-09 17:28:50 +02:00
ines e1efd589c3 Fix json imports and use ujson 2017-04-15 12:13:34 +02:00
ines c05ec4b89a Add compat functions and remove old workarounds
Add ensure_path util function to handle checking instance of path
2017-04-15 12:11:16 +02:00
ines d24589aa72 Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
ines 561f2a3eb4 Use consistent formatting for docstrings 2017-04-15 11:59:21 +02:00
Matthew Honnibal d013aba7b5 Merge branch 'master' of https://github.com/explosion/spaCy 2017-03-17 18:30:53 +01:00
Matthew Honnibal 854cfce7cf Make vocabs more compatible across versions
Previously, symbols were inserted into the string-store
before strings were loaded. This meant that adding a symbol
would invalidate saved models. We now make sure that strings
are loaded faithfully, so that compatibility is maintained.
2017-03-17 18:29:04 +01:00
Matthew Honnibal 1cc841e600 Merge branch 'master' of https://github.com/explosion/spaCy 2017-03-17 08:18:11 -05:00
Matthew Honnibal 4bfc55b532 Auto-add words to vocab when loading vectors
When calling vocab.load_vectors_from_bin_loc, ensure that missing
entries are added to the vocab. Otherwise, loading vectors into an
empty vocab object resulted in no vectors being added.
2017-03-17 08:15:59 -05:00
Matthew Honnibal 4382f175b3 Squelch compiler warnings 2017-03-11 12:44:43 -06:00
Matthew Honnibal d814892805 Hackish pickle support for Vocab. 2017-03-07 20:25:12 +01:00
ines aa92d4e9b5 Fix unicode regex for Python 2 (see #834) 2017-02-16 23:49:54 +01:00
ines 85d249d451 Revert "Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)""
This reverts commit ea05f78660.
2017-02-16 23:26:25 +01:00
ines ea05f78660 Revert "Merge pull request #836 from raphael0202/load_vectors (closes #834)"
This reverts commit 7d8c9eee7f, reversing
changes made to f6b69babcc.
2017-02-16 15:27:12 +01:00
Raphaël Bournhonesque e17dc2db75 Remove useless import 2017-02-16 12:10:24 +01:00
Raphaël Bournhonesque 3fd2742649 load_vectors should accept arbitrary space characters as word tokens
Fix bug #834
2017-02-16 12:08:30 +01:00
Daniel Hershcovich 99eb494a82 Fix #737: support loading word vectors with " " as a word 2017-01-12 17:00:14 +02:00
Daniel Hershcovich 8e603cc917 Avoid "True if ... else False" 2017-01-11 11:18:22 +02:00
Matthew Honnibal cade536d1e Merge branch 'master' of ssh://github.com/explosion/spaCy 2016-12-27 21:04:10 +01:00
Matthew Honnibal ce4539dafd Allow the vocabulary to grow to 10,000, to prevent cold-start problem. 2016-12-27 21:03:45 +01:00
Ines Montani 8978806ea6 Allow Vocab to load without serializer_freqs 2016-12-21 18:05:23 +01:00
Ines Montani be8ed811f6 Remove trailing whitespace 2016-12-21 18:04:41 +01:00
Matthew Honnibal 6ee1df93c5 Set tag_map to None if it's not seen in the data by vocab 2016-12-18 16:51:10 +01:00
Matthew Honnibal 1e0f566d95 Fix #656, #624: Support arbitrary token attributes when adding special-case rules. 2016-11-25 12:43:24 +01:00
Matthew Honnibal f123f92e0c Fix #617: Vocab.load() required Path. Should work with string as well. 2016-11-10 22:48:48 +01:00
Matthew Honnibal b86f8af0c1 Fix doc strings 2016-11-01 12:25:36 +01:00
Matthew Honnibal 6036ec7c77 Fix vector norm when loading lexemes. 2016-10-23 19:40:18 +02:00