Roman Domrachev
|
3e21680814
|
Use safer method to get string without hit
|
2017-11-14 22:58:46 +03:00 |
Roman Domrachev
|
a33d5a068d
|
Try to hold origin data instead of restore it
|
2017-11-14 22:40:03 +03:00 |
Roman Domrachev
|
91e2fa6561
|
Clean all caches
|
2017-11-14 21:15:04 +03:00 |
Roman
|
caae77f72d
|
Update strings.pyx
|
2017-11-14 19:44:40 +03:00 |
Roman Domrachev
|
870defa815
|
Swap keys in proper place
Remove unnecessary clear of the hits
|
2017-11-14 17:56:30 +03:00 |
Roman Domrachev
|
86ca434c93
|
Merge github.com:explosion/spaCy
|
2017-11-14 17:46:22 +03:00 |
Roman Domrachev
|
a2745b0e84
|
StringStore now actually cleaned
Do not lose docs in ref tracking
|
2017-11-14 17:45:50 +03:00 |
Matthew Honnibal
|
c9251d79e3
|
Edit comment
|
2017-11-11 18:38:32 +01:00 |
Roman Domrachev
|
3c600adf23
|
Try to fix StringStore clean up (see #1506)
|
2017-11-11 03:11:27 +03:00 |
ines
|
d96e72f656
|
Tidy up rest
|
2017-10-27 21:07:59 +02:00 |
Matthew Honnibal
|
66e2eb8f39
|
Clean up remnant of frozen in StringStore
|
2017-10-16 19:34:41 +02:00 |
Matthew Honnibal
|
3e037054c8
|
Remove obsolete is_frozen functionality from StringStore
|
2017-10-16 19:23:10 +02:00 |
Matthew Honnibal
|
aefef6fd28
|
Prevent strings from being lost during from_disk and from_bytes
|
2017-08-19 22:42:17 +02:00 |
Matthew Honnibal
|
b1469d3360
|
Fix string serialisation
|
2017-05-31 13:43:44 +02:00 |
ines
|
414193e9ba
|
Update docs to reflect StringStore changes
|
2017-05-28 18:19:11 +02:00 |
Matthew Honnibal
|
7996d21717
|
Fixes for new StringStore
|
2017-05-28 11:09:27 -05:00 |
Matthew Honnibal
|
fe4a746300
|
Accommodate symbols in new string scheme
|
2017-05-28 13:03:16 +02:00 |
Matthew Honnibal
|
a5606c3eda
|
Work on changing StringStore to return hashes.
|
2017-05-28 12:36:27 +02:00 |
Matthew Honnibal
|
d8bb5bb959
|
Implement StringStore serialization, and update tests
|
2017-05-22 12:38:00 +02:00 |
ines
|
2c5cfe8bbf
|
Update docstrings and API docs for StringStore
|
2017-05-21 14:18:58 +02:00 |
ines
|
d24589aa72
|
Clean up imports, unused code, whitespace, docstrings
|
2017-04-15 12:05:47 +02:00 |
ines
|
561f2a3eb4
|
Use consistent formatting for docstrings
|
2017-04-15 11:59:21 +02:00 |
Matthew Honnibal
|
5de7e712b7
|
Add support for pickling StringStore.
|
2017-03-07 17:15:18 +01:00 |
Matthew Honnibal
|
62fc6b1afa
|
Use 32 bit hashes for OOV, re Issue #589, Issue #285
|
2016-11-01 13:27:13 +01:00 |
Matthew Honnibal
|
b86f8af0c1
|
Fix doc strings
|
2016-11-01 12:25:36 +01:00 |
Matthew Honnibal
|
b2d43b93d2
|
Fix Python 3 basestring error
|
2016-10-24 14:22:51 +02:00 |
Matthew Honnibal
|
d8134817ff
|
Workaround Issue #285: Allow the StringStore to be 'frozen', in which case strings will be pushed into an OOV map. We can then flush this OOV map, freeing all of the OOV strings.
|
2016-10-24 13:49:03 +02:00 |
Matthew Honnibal
|
ca32a1ab01
|
Revert "Work on Issue #285: intern strings into document-specific pools, to address streaming data memory growth. StringStore.__getitem__ now raises KeyError when it can't find the string. Use StringStore.intern() to get the old behaviour. Still need to hunt down all uses of StringStore.__getitem__ in library and do testing, but logic looks good."
This reverts commit 8423e8627f.
|
2016-09-30 20:20:22 +02:00 |
Matthew Honnibal
|
de01e427fd
|
Revert "Changes to strings.pyx for new StringStore scheme"
This reverts commit 22d4752d64.
|
2016-09-30 20:19:42 +02:00 |
Matthew Honnibal
|
22d4752d64
|
Changes to strings.pyx for new StringStore scheme
|
2016-09-30 19:58:09 +02:00 |
Matthew Honnibal
|
8423e8627f
|
Work on Issue #285: intern strings into document-specific pools, to address streaming data memory growth. StringStore.__getitem__ now raises KeyError when it can't find the string. Use StringStore.intern() to get the old behaviour. Still need to hunt down all uses of StringStore.__getitem__ in library and do testing, but logic looks good.
|
2016-09-30 10:14:47 +02:00 |
Henning Peters
|
6215272786
|
remove ujson as default non-dev dependency (still works as fallback if installed), because ujson doesn't ship wheels
|
2016-04-12 11:28:07 +02:00 |
Stefan Behnel
|
f18805ee1c
|
make StringStore.__contains__() return True for the empty string (which is also contained in iteration)
|
2016-03-24 15:42:12 +01:00 |
Stefan Behnel
|
f2cfbfc412
|
remove internal redundancy and overhead from StringStore
|
2016-03-24 15:25:27 +01:00 |
Matthew Honnibal
|
963fe5258e
|
* Add missing __contains__ method to vocab
|
2016-03-08 15:49:10 +00:00 |
Henning Peters
|
b740f20191
|
hash_string() should not depend on python's internal unicode representation, also fixes https://github.com/spacy-io/sense2vec/issues/5 for py2
|
2016-03-06 09:19:27 +01:00 |
Matthew Honnibal
|
3c162dcac3
|
* Refactor away from the _ml module, to use thinc 4.0. Still some work needs to be done, e.g. to add __reduce__ to the models, more testing, etc.
|
2015-11-07 03:24:30 +11:00 |
Matthew Honnibal
|
72abbb43fb
|
* Add type declarations in strings.pyx
|
2015-11-06 00:47:26 +11:00 |
Matthew Honnibal
|
b18204cd52
|
* Fix StringStore._realloc, re Issue #155
|
2015-11-05 11:28:26 +00:00 |
Matthew Honnibal
|
65934b7cd4
|
* Enforce import of ujson in strings.pyx, because otherwise it's too slow
|
2015-11-04 00:32:02 +11:00 |
Matthew Honnibal
|
2348a08481
|
* Load/dump strings with a json file, instead of the hacky strings file we were using.
|
2015-10-22 21:13:03 +11:00 |
Matthew Honnibal
|
0cee928467
|
* Allow StringStore to be pickled, to start addressing Issue #125
|
2015-10-13 13:44:41 +11:00 |
Matthew Honnibal
|
dfbcff2ff1
|
* Revert codecs/io change to strings.pyx, as it seemed to cause an error? Will investigate.
|
2015-10-10 15:54:55 +11:00 |
Matthew Honnibal
|
2153067958
|
* Fix use of io in strings.pyx
|
2015-10-10 15:03:12 +11:00 |
Matthew Honnibal
|
30de4135c9
|
* Fix merge problem
|
2015-10-10 14:22:32 +11:00 |
Matthew Honnibal
|
83dccf0fd7
|
* Use io module instead of deprecated codecs module
|
2015-10-10 14:13:01 +11:00 |
alvations
|
8caedba42a
|
caught more codecs.open -> io.open
|
2015-09-30 20:20:09 +02:00 |
Matthew Honnibal
|
6f1743692a
|
* Work on language-independent refactoring
|
2015-08-23 20:49:18 +02:00 |
Matthew Honnibal
|
cad0cca4e3
|
* Tmp
|
2015-08-22 22:04:34 +02:00 |
Matthew Honnibal
|
d42fe2e694
|
* Add unicode_literals to strings.pyx
|
2015-07-28 16:15:53 +02:00 |