Commit Graph

72 Commits

Author SHA1 Message Date
Roman Domrachev 3e21680814 Use safer method to get string without hit 2017-11-14 22:58:46 +03:00
Roman Domrachev a33d5a068d Try to hold origin data instead of restore it 2017-11-14 22:40:03 +03:00
Roman Domrachev 91e2fa6561 Clean all caches 2017-11-14 21:15:04 +03:00
Roman caae77f72d
Update strings.pyx 2017-11-14 19:44:40 +03:00
Roman Domrachev 870defa815 Swap keys in proper place
Remove unnecessary clear of the hits
2017-11-14 17:56:30 +03:00
Roman Domrachev 86ca434c93 Merge github.com:explosion/spaCy 2017-11-14 17:46:22 +03:00
Roman Domrachev a2745b0e84 StringStore now actually cleaned
Do not lose docs in ref tracking
2017-11-14 17:45:50 +03:00
Matthew Honnibal c9251d79e3
Edit comment 2017-11-11 18:38:32 +01:00
Roman Domrachev 3c600adf23 Try to fix StringStore clean up (see #1506) 2017-11-11 03:11:27 +03:00
ines d96e72f656 Tidy up rest 2017-10-27 21:07:59 +02:00
Matthew Honnibal 66e2eb8f39 Clean up remnant of frozen in StringStore 2017-10-16 19:34:41 +02:00
Matthew Honnibal 3e037054c8 Remove obsolete is_frozen functionality from StringStore 2017-10-16 19:23:10 +02:00
Matthew Honnibal aefef6fd28 Prevent strings from being lost during from_disk and from_bytes 2017-08-19 22:42:17 +02:00
Matthew Honnibal b1469d3360 Fix string serialisation 2017-05-31 13:43:44 +02:00
ines 414193e9ba Update docs to reflect StringStore changes 2017-05-28 18:19:11 +02:00
Matthew Honnibal 7996d21717 Fixes for new StringStore 2017-05-28 11:09:27 -05:00
Matthew Honnibal fe4a746300 Accomodate symbols in new string scheme 2017-05-28 13:03:16 +02:00
Matthew Honnibal a5606c3eda Work on changing StringStore to return hashes. 2017-05-28 12:36:27 +02:00
Matthew Honnibal d8bb5bb959 Implement StringStore serialization, and update tests 2017-05-22 12:38:00 +02:00
ines 2c5cfe8bbf Update docstrings and API docs for StringStore 2017-05-21 14:18:58 +02:00
ines d24589aa72 Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
ines 561f2a3eb4 Use consistent formatting for docstrings 2017-04-15 11:59:21 +02:00
Matthew Honnibal 5de7e712b7 Add support for pickling StringStore. 2017-03-07 17:15:18 +01:00
Matthew Honnibal 62fc6b1afa Use 32 bit hashes for OOV, re Issue #589, Issue #285 2016-11-01 13:27:13 +01:00
Matthew Honnibal b86f8af0c1 Fix doc strings 2016-11-01 12:25:36 +01:00
Matthew Honnibal b2d43b93d2 Fix Python 3 basestring error 2016-10-24 14:22:51 +02:00
Matthew Honnibal d8134817ff Workaround Issue #285: Allow the StringStore to be 'frozen', in which case strings will be pushed into an OOV map. We can then flush this OOV map, freeing all of the OOV strings. 2016-10-24 13:49:03 +02:00
Matthew Honnibal ca32a1ab01 Revert "Work on Issue #285: intern strings into document-specific pools, to address streaming data memory growth. StringStore.__getitem__ now raises KeyError when it can't find the string. Use StringStore.intern() to get the old behaviour. Still need to hunt down all uses of StringStore.__getitem__ in library and do testing, but logic looks good."
This reverts commit 8423e8627f.
2016-09-30 20:20:22 +02:00
Matthew Honnibal de01e427fd Revert "Changes to strings.pyx for new StringStore scheme"
This reverts commit 22d4752d64.
2016-09-30 20:19:42 +02:00
Matthew Honnibal 22d4752d64 Changes to strings.pyx for new StringStore scheme 2016-09-30 19:58:09 +02:00
Matthew Honnibal 8423e8627f Work on Issue #285: intern strings into document-specific pools, to address streaming data memory growth. StringStore.__getitem__ now raises KeyError when it can't find the string. Use StringStore.intern() to get the old behaviour. Still need to hunt down all uses of StringStore.__getitem__ in library and do testing, but logic looks good. 2016-09-30 10:14:47 +02:00
Henning Peters 6215272786 remove ujson as default non-dev dependency (still works as fallback if installed), because ujson doesn't ship wheels 2016-04-12 11:28:07 +02:00
Stefan Behnel f18805ee1c make StringStore.__contains__() return True for the empty string (which is also contained in iteration) 2016-03-24 15:42:12 +01:00
Stefan Behnel f2cfbfc412 remove internal redundancy and overhead from StringStore 2016-03-24 15:25:27 +01:00
Matthew Honnibal 963fe5258e * Add missing __contains__ method to vocab 2016-03-08 15:49:10 +00:00
Henning Peters b740f20191 hash_string() should not depend on python's internal unicode representation, also fixes https://github.com/spacy-io/sense2vec/issues/5 for py2 2016-03-06 09:19:27 +01:00
Matthew Honnibal 3c162dcac3 * Refactor away from the _ml module, to use thinc 4.0. Still some work needs to be done, e.g. to add __reduce__ to the models, more testing, etc. 2015-11-07 03:24:30 +11:00
Matthew Honnibal 72abbb43fb * Add type declarations in strings.pyx 2015-11-06 00:47:26 +11:00
Matthew Honnibal b18204cd52 * Fix StringStore._realloc, re Issue #155 2015-11-05 11:28:26 +00:00
Matthew Honnibal 65934b7cd4 * Enforce import of ujson in strings.pyx, because otherwise it's too slow 2015-11-04 00:32:02 +11:00
Matthew Honnibal 2348a08481 * Load/dump strings with a json file, instead of the hacky strings file we were using. 2015-10-22 21:13:03 +11:00
Matthew Honnibal 0cee928467 * Allow StringStore to be pickled, to start addressing Issue #125 2015-10-13 13:44:41 +11:00
Matthew Honnibal dfbcff2ff1 * Revert codecs/io change to strings.pyx, as it seemed to cause an error? Will investigate. 2015-10-10 15:54:55 +11:00
Matthew Honnibal 2153067958 * Fix use of io in strings.pyx 2015-10-10 15:03:12 +11:00
Matthew Honnibal 30de4135c9 * Fix merge problem 2015-10-10 14:22:32 +11:00
Matthew Honnibal 83dccf0fd7 * Use io module insteads of deprecated codecs module 2015-10-10 14:13:01 +11:00
alvations 8caedba42a caught more codecs.open -> io.open 2015-09-30 20:20:09 +02:00
Matthew Honnibal 6f1743692a * Work on language-independent refactoring 2015-08-23 20:49:18 +02:00
Matthew Honnibal cad0cca4e3 * Tmp 2015-08-22 22:04:34 +02:00
Matthew Honnibal d42fe2e694 * Add unicode_literals to strings.pyx 2015-07-28 16:15:53 +02:00