spaCy

Commit Graph

Author	SHA1	Message	Date
Matthew Honnibal	4e16f9e435	* Move tests underneath spacy/	2015-10-26 00:07:31 +11:00
Matthew Honnibal	3a6e48e814	Merge pull request #149 from chrisdubois/pickle-patch Add __reduce__ to Tokenizer so that English pickles.	2015-10-25 15:30:31 +11:00
Chris DuBois	dac8fe7bdb	Add __reduce__ to Tokenizer so that English pickles. - Add tests to test_pickle and test_tokenizer that save to tempfiles.	2015-10-23 22:24:03 -07:00
Matthew Honnibal	ff4fe524ee	* Fix exception for python 2	2015-10-23 01:56:13 +02:00
Matthew Honnibal	341a3e85cd	* Upd downloaded data version	2015-10-23 00:56:57 +02:00
Matthew Honnibal	f18fd8c659	* Fix language.py for change in StringStore load API	2015-10-23 03:48:12 +11:00
Matthew Honnibal	23855db3ca	Merge branch 'master' of ssh://github.com/honnibal/spaCy into develop	2015-10-23 03:46:09 +11:00
Matthew Honnibal	4f13849065	Merge pull request #145 from henningpeters/master better error reporting, cleanup	2015-10-23 03:45:47 +11:00
Matthew Honnibal	3be94be0c0	Merge pull request #148 from maxirmx/master Utf8 encoding for lemma_rules.json	2015-10-22 21:46:28 +11:00
Matthew Honnibal	c86bda8d1a	* Fix import of uget	2015-10-22 21:13:56 +11:00
Matthew Honnibal	2348a08481	* Load/dump strings with a json file, instead of the hacky strings file we were using.	2015-10-22 21:13:03 +11:00
Matthew Honnibal	9baf0abd59	* Save vocab after training.	2015-10-22 21:09:14 +11:00
maxirmx	f07e4accd7	Fixing encoding issue #4	2015-10-21 20:45:56 +03:00
maxirmx	fcbfff043f	Fixing encoding issue #3	2015-10-21 15:52:34 +03:00
maxirmx	fe9d2e2c4e	Fixing encode issue #2	2015-10-21 15:36:21 +03:00
maxirmx	e4a1726f77	Fixing encoding issue UTF-8	2015-10-21 14:16:37 +03:00
Andreas Grivas	93ada458e2	added __repr__ that prints text in ipython for doc, token, and span objects	2015-10-21 14:11:46 +03:00
Henning Peters	ccffd2ef53	fixed extract directory	2015-10-21 07:59:34 +02:00
Henning Peters	da4c9cee06	assert filename match	2015-10-20 19:33:59 +02:00
Henning Peters	4f703f0cb4	better error reporting, cleanup	2015-10-20 19:11:29 +02:00
Matthew Honnibal	9cdea6e450	* Import uget correctly	2015-10-19 08:32:41 +02:00
Matthew Honnibal	6727a46bb5	* Fix Issue #118 : Matcher behaves unpredictably when matches overlap.	2015-10-19 16:45:32 +11:00
Matthew Honnibal	135062d23c	* Fix error with merged text when merged region did not have trailing whitespace	2015-10-19 15:47:04 +11:00
Henning Peters	bfde91fa49	add custom download tool (uget), replace wget with uget	2015-10-18 12:35:04 +02:00
Matthew Honnibal	9839cd2c0b	* Fix whitespace_ calculation in Token	2015-10-18 17:21:11 +11:00
Matthew Honnibal	c99285b8b9	* Clean up C++ usage in spacy/matcher.pyx	2015-10-18 17:20:50 +11:00
Matthew Honnibal	a7e6c5ac8f	* Fix Issue #122 : Incorrect calculation of children after Doc.merge()	2015-10-18 17:17:27 +11:00
Matthew Honnibal	3ba66f2dc7	* Add string length cap in Tokenizer.__call__	2015-10-16 04:54:16 +11:00
Matthew Honnibal	6e0f985afc	* Fix token.conjuncts	2015-10-15 03:49:45 +11:00
Matthew Honnibal	2e0104ac81	* Fix token.conjuncts	2015-10-15 03:47:45 +11:00
Matthew Honnibal	b8f3345a82	* Fix token.conjuncts method	2015-10-15 03:36:01 +11:00
Matthew Honnibal	23818f89b8	* Fix token.conjuncts method	2015-10-15 03:34:57 +11:00
Matthew Honnibal	7a15d1b60c	* Add Python 2/3 compatibility fix for copy_reg	2015-10-13 20:04:40 +11:00
Matthew Honnibal	329ae57520	* Fix whitespace attachment thing	2015-10-13 09:46:38 +02:00
Matthew Honnibal	37919eac82	* Fix whitespace attachment in simpler way. Leaves problem with setting left/right children.	2015-10-13 18:23:24 +11:00
Matthew Honnibal	c70eb776ae	* Fix whitespace attachment, so that left/right children are consistent with head.	2015-10-13 15:58:22 +11:00
Matthew Honnibal	531182f937	* Fix Model.__reduce__	2015-10-13 15:14:38 +11:00
Matthew Honnibal	6c227a6c1f	* Fix Model.__reduce__	2015-10-13 15:10:04 +11:00
Matthew Honnibal	358c82595c	* Fix NAMES list in spacy/parts_of_speech.pyx	2015-10-13 14:18:45 +11:00
Matthew Honnibal	c1fdc487bc	Merge branch 'attrs'	2015-10-13 14:03:41 +11:00
Matthew Honnibal	e886e6a406	* Inc version	2015-10-13 13:46:17 +11:00
Matthew Honnibal	20fd36a0f7	* Very scrappy, likely buggy first-cut pickle implementation, to work on Issue #125 : allow pickle for Apache Spark. The current implementation sends stuff to temp files, and does almost nothing to ensure all modifiable state is actually preserved. The Language() instance is a deep tree of extension objects, and if pickling during training, some of the C-data state is hard to preserve.	2015-10-13 13:44:41 +11:00
Matthew Honnibal	f8de403483	* Work on pickling Vocab instances. The current implementation is not correct, but it may serve to see whether this approach is workable. Pickling is necessary to address Issue #125	2015-10-13 13:44:41 +11:00
Matthew Honnibal	85e7944572	* Start trying to pickle Vocab	2015-10-13 13:44:41 +11:00
Matthew Honnibal	5ca57bd859	* Ensure Morphology can be pickled, to address Issue #125 .	2015-10-13 13:44:41 +11:00
Matthew Honnibal	0cee928467	* Allow StringStore to be pickled, to start addressing Issue #125	2015-10-13 13:44:41 +11:00
Matthew Honnibal	41012907a8	* Fix variable name	2015-10-13 13:44:40 +11:00
Matthew Honnibal	e70368d157	* Use lower case strings for dependency label names in symbols enum	2015-10-13 13:44:40 +11:00
Matthew Honnibal	7b4af3d1e7	* Fix parts_of_speech now that symbols list has been reformed	2015-10-13 13:44:40 +11:00
Matthew Honnibal	37b909b6b6	* Use the symbols file in vocab instead of the symbols subfiles like attrs.pxd	2015-10-13 13:44:40 +11:00

1192 Commits
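
Several of the commits above (dac8fe7bdb, 20fd36a0f7, 0cee928467) add pickle support through `__reduce__`, and 20fd36a0f7 notes that the first cut routes state through temp files. The following is a minimal sketch of that general pattern, not spaCy's actual code: `ToyStringStore`, its `dump`/`load` methods, and `_rebuild_store` are hypothetical names invented for illustration. It also assumes the file contents are carried inside the pickle as bytes, so that the result can be loaded on another machine.

```python
import json
import os
import tempfile


class ToyStringStore:
    """Hypothetical stand-in for an object that can only save itself to a file."""

    def __init__(self, strings=()):
        self.strings = list(strings)

    def dump(self, path):
        # Write the strings to disk as JSON.
        with open(path, "w", encoding="utf8") as f:
            json.dump(self.strings, f)

    @classmethod
    def load(cls, path):
        # Rebuild a store from a JSON file written by dump().
        with open(path, "r", encoding="utf8") as f:
            return cls(json.load(f))

    def __reduce__(self):
        # Serialize through a temp file, then carry the file's bytes in the pickle.
        fd, path = tempfile.mkstemp(suffix=".json")
        os.close(fd)
        try:
            self.dump(path)
            with open(path, "rb") as f:
                payload = f.read()
        finally:
            os.remove(path)
        # pickle will call _rebuild_store(payload) when loading.
        return (_rebuild_store, (payload,))


def _rebuild_store(payload):
    # Write the bytes back to a temp file and reuse the normal load() path.
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(payload)
        return ToyStringStore.load(path)
    finally:
        os.remove(path)
```

With this in place, `pickle.loads(pickle.dumps(store))` should return a new `ToyStringStore` holding the same strings. As the commit message for 20fd36a0f7 itself cautions, an approach like this preserves only what `dump` writes out; any other modifiable state is lost.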