spaCy

Commit Graph

Author	SHA1	Message	Date
adrianeboyd	d107afcffb	Raise error for inplace resize with new vector dim (#5228 ) Raise an error if there is an attempt to resize the vectors in place with a different vector dimension.	2020-04-02 10:43:13 +02:00
adrianeboyd	963bd890c1	Modify Vector.resize to work with cupy and improve resizing (#5216 ) * Modify Vector.resize to work with cupy Modify `Vectors.resize` to work with cupy. Modify behavior when resizing to a different vector dimension so that individual vectors are truncated or extended with zeros instead of having the original values filled into the new shape without regard for the original axes. * Update spacy/tests/vocab_vectors/test_vectors.py Co-Authored-By: Matthew Honnibal <honnibal+gh@gmail.com> Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-03-29 13:51:20 +02:00
adrianeboyd	d88a377bed	Remove Vectors.from_glove (#5209 )	2020-03-26 10:45:47 +01:00
adrianeboyd	0c47a53b5e	Use int only in key2row for better performance (#4990 ) Cast all keys and rows to `int` in `vectors.key2row` for more efficient access and serialization.	2020-02-16 17:19:41 +01:00
Matthew Honnibal	50f89cb85d	Make vectors.find() return keys in correct order (#4691 ) * Make vectors.find() return keys in correct order * Update spacy/vectors.pyx	2019-11-21 16:58:32 +01:00
Matthew Honnibal	9489c5f6b2	Clip most_similar to range [-1, 1] (fixes #4506 ) (#4507 ) * Clip most_similar to range [-1, 1] * Add/fix vectors tests * Fix test	2019-10-22 20:10:42 +02:00
Sofie Van Landeghem	d5d55312b2	prevent division by zero in most_similar method (#4488 )	2019-10-21 12:04:46 +02:00
Daniel King	e646956176	Most similar bug (#4446 ) * Add batch size indexing * Don't sort if n == 1 * Add test for most similar vectors issue * Change > to >=	2019-10-16 23:18:55 +02:00
adrianeboyd	d53a8d9313	Consider batch_size when sorting similar vectors (#4388 )	2019-10-07 13:38:35 +02:00
Ben Taylor	1db79a33cb	most_similar() return the k most similar vectors (#4364 ) * most_similar return n-most similar vectors * updated most_similar comment * add bintay contributor agreement * sign bintay contributor agreement * fix most_similar documentation typo * fixed error in prune_vectors * updated prune_vectors test	2019-10-03 14:09:44 +02:00
Ines Montani	da9a869d3f	Update vectors name docs [ci skip]	2019-09-26 16:21:32 +02:00
Ines Montani	2c5dd4d602	Update Vectors.find docs [ci skip]	2019-03-16 17:10:57 +01:00
Ines Montani	7ba3a5d95c	💫 Make serialization methods consistent (#3385 ) * Make serialization methods consistent exclude keyword argument instead of random named keyword arguments and deprecation handling * Update docs and add section on serialization fields	2019-03-10 19:16:45 +01:00
Ines Montani	296446a1c8	Tidy up and improve docs and docstrings (#3370 ) * tidy up and adjust Cython code to code style * improve docstrings and make calling `help()` nicer * add URLs to new docs pages to docstrings wherever possible, mostly to user-facing objects * fix various typos and inconsistencies in docs Types of change: enhancement, docs	2019-03-08 11:42:26 +01:00
Matthew Honnibal	449b889454	Fix KeyError in Vectors.most_similar. Fixes #2648	2018-12-10 16:19:18 +01:00
Matthew Honnibal	90aec6d2f6	Fix vectors for reserved words. Closes #2871	2018-12-10 16:09:49 +01:00
Ines Montani	f37863093a	💫 Replace ujson, msgpack and dill/pickle/cloudpickle with srsly (#3003 ) Remove hacks and wrappers, keep code in sync across our libraries and move spaCy a few steps closer to only depending on packages with binary wheels 🎉 See here: https://github.com/explosion/srsly Serialization is hard, especially across Python versions and multiple platforms. After dealing with many subtle bugs over the years (encodings, locales, large files) our libraries like spaCy and Prodigy have steadily grown a number of utility functions to wrap the multiple serialization formats we need to support (especially json, msgpack and pickle). These wrapping functions ended up duplicated across our codebases, so we wanted to put them in one place. At the same time, we noticed that having a lot of small dependencies was making maintenance harder, and making installation slower. To solve this, we've made srsly standalone, by including the component packages directly within it. This way we can provide all the serialization utilities we need in a single binary wheel. srsly currently includes forks of the following packages: ujson, msgpack, msgpack-numpy, cloudpickle * WIP: replace json/ujson with srsly * Replace ujson in examples Use regular json instead of srsly to make code easier to read and follow * Update requirements * Fix imports * Fix typos * Replace msgpack with srsly * Fix warning	2018-12-03 01:28:22 +01:00
Ines Montani	3141e04822	💫 New system for error messages and warnings (#2163 ) * Add spacy.errors module * Update deprecation and user warnings * Replace errors and asserts with new error message system * Remove redundant asserts * Fix whitespace * Add messages for print/util.prints statements * Fix typo * Fix typos * Move CLI messages to spacy.cli._messages * Add decorator to display error code with message An implementation like this is nice because it only modifies the string when it's retrieved from the containing class – so we don't have to worry about manipulating tracebacks etc. * Remove unused link in spacy.about * Update errors for invalid pipeline components * Improve error for unknown factories * Add displaCy warnings * Update formatting consistency * Move error message to spacy.errors * Update errors and check if doc returned by component is None	2018-04-03 15:50:31 +02:00
Suraj Rajan	1cdbb7c97c	[2032] - Changed python set to cpp stl set (#2170 ) Description: Changed python set to cpp stl set. CPP stl set works better due to the logarithmic run time of its methods. Finding minimum in the cpp set is done in constant time as opposed to the worst case linear runtime of python set. Operations such as find, count, insert, delete are also done in either constant or logarithmic time thus making cpp set a better option to manage vectors. Reference : http://www.cplusplus.com/reference/set/set/ Types of change: Enhancement for `Vectors` for faster initialising of word vectors(fasttext)	2018-03-31 13:28:25 +02:00
Ines Montani	a609a1ca29	Merge pull request #2152 from explosion/feature/tidy-up-dependencies 💫 Tidy up dependencies	2018-03-29 14:35:09 +02:00
Matthew Honnibal	8308bbc617	Get msgpack and msgpack_numpy via Thinc, to avoid potential version conflicts	2018-03-29 00:14:55 +02:00
Matthew Honnibal	95a9615221	Fix loading of multiple pre-trained vectors This patch addresses #1660, which was caused by keying all pre-trained vectors with the same ID when telling Thinc how to refer to them. This meant that if multiple models were loaded that had pre-trained vectors, errors or incorrect behaviour resulted. The vectors class now includes a .name attribute, which defaults to: {nlp.meta['lang']_nlp.meta['name']}.vectors The vectors name is set in the cfg of the pipeline components under the key pretrained_vectors. This replaces the previous cfg key pretrained_dims. In order to make existing models compatible with this change, we check for the pretrained_dims key when loading models in from_disk and from_bytes, and add the cfg key pretrained_vectors if we find it.	2018-03-28 16:02:59 +02:00
Matthew Honnibal	8cefc58abc	Fix Vectors pickling	2018-03-14 16:59:37 +01:00
Claudiu-Vlad Ursache	e28de12cbd	Ensure files opened in `from_disk` are closed Fixes [issue 1706](https://github.com/explosion/spaCy/issues/1706).	2018-02-13 20:49:43 +01:00
Matthew Honnibal	29897ed1b3	Allow vector loading to work on 1d data files. Fixes #1831	2018-01-22 19:18:26 +01:00
Matthew Honnibal	1a1cca6052	Fix vectors.resize() on Py3. Closes #1539	2018-01-14 14:48:51 +01:00
Matthew Honnibal	36b47e3fa6	Fix (and test) vector pickling	2017-12-07 09:53:30 +01:00
Matthew Honnibal	b712de774e	Fix vectors pickling	2017-12-05 12:45:24 +01:00
Matthew Honnibal	a5ea0fdf5a	Fix #1518 : vocab.vectors.resize() didn't work	2017-11-08 22:18:37 +01:00
Matthew Honnibal	225cc249c9	Pass string path to numpy, to fix #1479	2017-11-05 14:42:46 +01:00
Matthew Honnibal	fdb4b8e456	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-11-01 02:07:17 +01:00
Matthew Honnibal	c48dd0e1d3	Fix vector pruning	2017-11-01 02:06:58 +01:00
ines	5683fd65ed	Update docstrings	2017-11-01 00:42:39 +01:00
Matthew Honnibal	c16310d156	Update vectors with find method	2017-11-01 00:34:55 +01:00
ines	2ad2f09d12	Update docstrings and simplify most_similar	2017-11-01 00:18:08 +01:00
ines	ba2e6c8c6f	Update docstrings and formatting	2017-10-31 23:23:34 +01:00
Matthew Honnibal	d90a22afe6	Fix loading previous vectors models	2017-10-31 19:58:35 +01:00
Matthew Honnibal	997a61557a	Add vectors.n_keys property	2017-10-31 19:30:52 +01:00
Matthew Honnibal	77d8f5de9a	Revise and simplify Vectors class	2017-10-31 18:25:08 +01:00
Matthew Honnibal	9c11ee4a1c	WIP on vectors fixes	2017-10-31 11:22:56 +01:00
Matthew Honnibal	368fdb389a	WIP on refactoring and fixing vectors	2017-10-31 02:00:26 +01:00
Matthew Honnibal	4112a991ec	Fix vector pruning	2017-10-30 19:44:40 +01:00
Explosion Bot	d0cf12c8c7	Fix off-by-one error in vectors	2017-10-30 16:22:03 +01:00
Explosion Bot	ab5d5ed880	Fix vectors.add()	2017-10-30 16:08:09 +01:00
Explosion Bot	72aea8f105	Update vectors.add() to allow setting keys to rows	2017-10-30 10:03:08 +01:00
ines	5167a0cce2	Tidy up Vectors and docs	2017-10-27 19:45:19 +02:00
Matthew Honnibal	cfae54c507	Make change to Vectors.__init__	2017-10-20 14:19:04 +02:00
Matthew Honnibal	92ac9316b5	Fix initialization of vectors, to address serialization problem	2017-10-20 13:59:24 +02:00
Matthew Honnibal	df488274b1	Fix deserialization of vectors	2017-10-16 20:55:00 +02:00
Matthew Honnibal	d90cc917fa	Merge vectors.pyx doc strings	2017-10-01 17:05:54 -05:00

65 Commits