spaCy

Commit Graph

Author	SHA1	Message	Date
Matthew Honnibal	0962ffc095	* Fix issue #37 : missing check_flag attribute from Token class	2015-03-26 15:06:26 +01:00
Matthew Honnibal	dbe26f5793	* Add children and subtree methods to Token, which are generators to assist parse-tree navigation.	2015-03-03 04:18:41 -05:00
Matthew Honnibal	cae077b583	* Work on fixing orphaned Token objects bug	2015-02-16 15:20:31 -05:00
Matthew Honnibal	7572e31f5e	* Pass ownership of C data to Token instances if Tokens object is being garbage-collected, but Token instances are staying alive.	2015-02-11 18:05:06 -05:00
Matthew Honnibal	ab8bb047d0	* Fix negative index for __getitem__	2015-02-07 12:58:46 -05:00
Matthew Honnibal	c7d8644149	* Fix regression on 'prob' attr of Token.	2015-02-03 03:32:18 +11:00
Matthew Honnibal	de772088e6	* Use parse tree for sbd in Tokens.sents	2015-02-02 12:17:32 +11:00
Matthew Honnibal	7de00c5a79	* Try not holding a reference to Pool, since that seems to confuse the GC	2015-01-31 22:10:22 +11:00
Matthew Honnibal	018e0bfa24	* Bug fixes to parse navigation	2015-01-31 16:37:13 +11:00
Matthew Honnibal	77d62d0179	* Large refactor of Token objects, making them much thinner. This is to support fast parse-tree navigation.	2015-01-31 13:42:58 +11:00
Matthew Honnibal	251dbf24d7	* Fix uninitialised variable error	2015-01-30 20:46:34 +11:00
Matthew Honnibal	1a7a1c2771	* Fix Issue #16 : tokens recurse when printing	2015-01-30 19:47:50 +11:00
Matthew Honnibal	b68f563c2f	* Fix Issue #14 : Improve parsing API	2015-01-30 18:04:41 +11:00
Matthew Honnibal	e6c3d3471f	* Tweak documentation for Tokens, and hide constructor as __cinit__	2015-01-27 18:57:52 +11:00
Matthew Honnibal	12b034e3ef	* Move POS tag definitions to parts_of_speech.pxd	2015-01-25 16:31:07 +11:00
Matthew Honnibal	7431c133d8	* Add error if try to access head and not is_parsed	2015-01-25 15:33:54 +11:00
Matthew Honnibal	a97bed9359	* Fix POS and dependency label tag names. Add parse and string navigation functions.	2015-01-24 17:29:04 +11:00
Matthew Honnibal	76cd024095	* Add whitespace property to Token	2015-01-24 07:41:21 +11:00
Matthew Honnibal	5fd72bc220	* Have 'string' refer to the whitespace-padded string	2015-01-24 07:32:38 +11:00
Matthew Honnibal	fda94271af	* Rename NORM1 and NORM2 attrs to lower and norm	2015-01-24 06:17:03 +11:00
Matthew Honnibal	5ed8b2b98f	* Rename sic to orth	2015-01-23 02:08:25 +11:00
Matthew Honnibal	a27b23cc8f	* Have SBD return start/end indices	2015-01-22 22:24:44 +11:00
Matthew Honnibal	9cd0b6b3e9	* Various tweaks to Tokens class	2015-01-22 02:05:37 +11:00
Matthew Honnibal	d6ac60e91c	* Bug fixes to sentences method, and improved vector transport for tokens	2015-01-21 18:56:32 +11:00
Matthew Honnibal	f149259bf5	* Fix negative indices in tokens	2015-01-20 01:16:29 +11:00
Matthew Honnibal	b65b0c07bf	* Messily hook up vector in tokens	2015-01-19 19:59:55 +11:00
Matthew Honnibal	6c7e44140b	* Work on word vectors, and other stuff	2015-01-17 16:21:17 +11:00
Matthew Honnibal	802867e96a	* Revise interface to Token. Strings now have attribute names like norm1_	2015-01-15 03:51:47 +11:00
Matthew Honnibal	7d3c40de7d	* Tests passing after refactor. API has obvious warts, particularly in Token and Lexeme	2015-01-15 00:33:16 +11:00
Matthew Honnibal	0930892fc1	* Tmp. Working on refactor. Compiles, must hook up lexical feats.	2015-01-14 00:03:48 +11:00
Matthew Honnibal	46da3d74d2	* Tmp. Refactoring, introducing a Lexeme PyObject.	2015-01-12 11:23:44 +11:00
Matthew Honnibal	ce2edd6312	* Tmp commit. Refactoring to create a Python Lexeme class.	2015-01-12 10:26:22 +11:00
Matthew Honnibal	3f1944d688	* Make PyPy work	2015-01-05 17:54:38 +11:00
Matthew Honnibal	c1ef3febee	* Embedsignature in tokens.pyx	2014-12-30 21:22:00 +11:00
Matthew Honnibal	fe2a5e0370	* Work on docstrings	2014-12-27 21:46:04 +11:00
Matthew Honnibal	bb80937544	* Upd docstrings	2014-12-27 18:45:16 +11:00
Matthew Honnibal	b8b65903fc	* Tmp	2014-12-24 17:42:00 +11:00
Matthew Honnibal	ab61673edd	* Fix api of array method	2014-12-23 15:18:48 +11:00
Matthew Honnibal	73f200436f	* Tests passing except for morphology/lemmatization stuff	2014-12-23 11:40:32 +11:00
Matthew Honnibal	4c4aa2c5c9	* Work on train	2014-12-22 07:25:43 +11:00
Matthew Honnibal	4c6ce7ee84	* Update tokens.pyx as part of reorg	2014-12-20 07:03:26 +11:00
Matthew Honnibal	9d3ca13909	* Start work on parse-tree iteration classes	2014-12-20 03:48:10 +11:00
Matthew Honnibal	87e9487d76	* Work on parser	2014-12-17 21:10:12 +11:00
Matthew Honnibal	9959a64f7b	* Working morphology and lemmatisation. POS tagging quite fast.	2014-12-10 08:09:32 +11:00
Matthew Honnibal	accdbe989b	* Remove Tokens.extend method	2014-12-09 17:09:23 +11:00
Matthew Honnibal	495e1c7366	* Use fused type in Tokens.push_back, simplifying the use of the cache	2014-12-09 16:50:01 +11:00
Matthew Honnibal	99bbbb6feb	* Work on morphological processing	2014-12-08 21:12:15 +11:00
Matthew Honnibal	ef4398b204	* Rearrange POS stuff, so that language-specific stuff can live in language-specific modules	2014-12-07 23:52:41 +11:00
Matthew Honnibal	9f17467c2e	* Fix EMPTY_TOKEN	2014-12-07 22:07:41 +11:00
Matthew Honnibal	e27b912ef9	* Remove need for confusing _data pointer to be stored on Tokens	2014-12-05 16:31:30 +11:00

94 Commits