spaCy

Commit Graph

Author	SHA1	Message	Date
Matthew Honnibal	b68f563c2f	* Fix Issue #14 : Improve parsing API	2015-01-30 18:04:41 +11:00
Matthew Honnibal	e6c3d3471f	* Tweak documentation for Tokens, and hide constructor as __cinit__	2015-01-27 18:57:52 +11:00
Matthew Honnibal	12b034e3ef	* Move POS tag definitions to parts_of_speech.pxd	2015-01-25 16:31:07 +11:00
Matthew Honnibal	7431c133d8	* Add error if try to access head and not is_parsed	2015-01-25 15:33:54 +11:00
Matthew Honnibal	a97bed9359	* Fix POS and dependency label tag names. Add parse and string navigation functions.	2015-01-24 17:29:04 +11:00
Matthew Honnibal	76cd024095	* Add whitespace property to Token	2015-01-24 07:41:21 +11:00
Matthew Honnibal	5fd72bc220	* Have 'string' refer to the whitespace-padded string	2015-01-24 07:32:38 +11:00
Matthew Honnibal	fda94271af	* Rename NORM1 and NORM2 attrs to lower and norm	2015-01-24 06:17:03 +11:00
Matthew Honnibal	5ed8b2b98f	* Rename sic to orth	2015-01-23 02:08:25 +11:00
Matthew Honnibal	a27b23cc8f	* Have SBD return start/end indices	2015-01-22 22:24:44 +11:00
Matthew Honnibal	9cd0b6b3e9	* Various tweaks to Tokens class	2015-01-22 02:05:37 +11:00
Matthew Honnibal	d6ac60e91c	* Bug fixes to sentences method, and improved vector transport for tokens	2015-01-21 18:56:32 +11:00
Matthew Honnibal	f149259bf5	* Fix negative indices in tokens	2015-01-20 01:16:29 +11:00
Matthew Honnibal	b65b0c07bf	* Messily hook up vector in tokens	2015-01-19 19:59:55 +11:00
Matthew Honnibal	6c7e44140b	* Work on word vectors, and other stuff	2015-01-17 16:21:17 +11:00
Matthew Honnibal	802867e96a	* Revise interface to Token. Strings now have attribute names like norm1_	2015-01-15 03:51:47 +11:00
Matthew Honnibal	7d3c40de7d	* Tests passing after refactor. API has obvious warts, particularly in Token and Lexeme	2015-01-15 00:33:16 +11:00
Matthew Honnibal	0930892fc1	* Tmp. Working on refactor. Compiles, must hook up lexical feats.	2015-01-14 00:03:48 +11:00
Matthew Honnibal	46da3d74d2	* Tmp. Refactoring, introducing a Lexeme PyObject.	2015-01-12 11:23:44 +11:00
Matthew Honnibal	ce2edd6312	* Tmp commit. Refactoring to create a Python Lexeme class.	2015-01-12 10:26:22 +11:00
Matthew Honnibal	3f1944d688	* Make PyPy work	2015-01-05 17:54:38 +11:00
Matthew Honnibal	c1ef3febee	* Embedsignature in tokens.pyx	2014-12-30 21:22:00 +11:00
Matthew Honnibal	fe2a5e0370	* Work on docstrings	2014-12-27 21:46:04 +11:00
Matthew Honnibal	bb80937544	* Upd docstrings	2014-12-27 18:45:16 +11:00
Matthew Honnibal	b8b65903fc	* Tmp	2014-12-24 17:42:00 +11:00
Matthew Honnibal	ab61673edd	* Fix api of array method	2014-12-23 15:18:48 +11:00
Matthew Honnibal	73f200436f	* Tests passing except for morphology/lemmatization stuff	2014-12-23 11:40:32 +11:00
Matthew Honnibal	4c4aa2c5c9	* Work on train	2014-12-22 07:25:43 +11:00
Matthew Honnibal	4c6ce7ee84	* Update tokens.pyx as part of reorg	2014-12-20 07:03:26 +11:00
Matthew Honnibal	9d3ca13909	* Start work on parse-tree iteration classes	2014-12-20 03:48:10 +11:00
Matthew Honnibal	87e9487d76	* Work on parser	2014-12-17 21:10:12 +11:00
Matthew Honnibal	9959a64f7b	* Working morphology and lemmatisation. POS tagging quite fast.	2014-12-10 08:09:32 +11:00
Matthew Honnibal	accdbe989b	* Remove Tokens.extend method	2014-12-09 17:09:23 +11:00
Matthew Honnibal	495e1c7366	* Use fused type in Tokens.push_back, simplifying the use of the cache	2014-12-09 16:50:01 +11:00
Matthew Honnibal	99bbbb6feb	* Work on morphological processing	2014-12-08 21:12:15 +11:00
Matthew Honnibal	ef4398b204	* Rearrange POS stuff, so that language-specific stuff can live in language-specific modules	2014-12-07 23:52:41 +11:00
Matthew Honnibal	9f17467c2e	* Fix EMPTY_TOKEN	2014-12-07 22:07:41 +11:00
Matthew Honnibal	e27b912ef9	* Remove need for confusing _data pointer to be stored on Tokens	2014-12-05 16:31:30 +11:00
Matthew Honnibal	1c9253701d	* Introduce a TokenC struct, to handle token indices, pos tags and sense tags	2014-12-05 15:56:14 +11:00
Matthew Honnibal	564082e48e	* Hack Token class to take lex.dense inplace of the old lex.norm. This needs to be fixed...	2014-12-04 20:51:29 +11:00
Matthew Honnibal	69bb022204	* Add as_array and count_by method	2014-12-04 20:46:55 +11:00
Matthew Honnibal	d70d31aa45	* Introduce first attempt at const-ness	2014-12-03 15:44:25 +11:00
Matthew Honnibal	e170faf5b0	* Hack Tokens to work without tagger.pyx	2014-12-03 11:05:15 +11:00
Matthew Honnibal	522bb0346e	* Work on get_array method of Tokens	2014-12-02 23:48:05 +11:00
Matthew Honnibal	4ecbe8c893	* Complete refactor of Tagger features, to use a generic list of context names.	2014-11-05 20:45:29 +11:00
Matthew Honnibal	3733444101	* Generalize tagger code, in preparation for NER and supersense tagging.	2014-11-05 03:42:14 +11:00
Matthew Honnibal	954c970415	* Add __iter__ method to tokens	2014-11-04 01:07:08 +11:00
Matthew Honnibal	ae52f9f38c	* Remove vocab10k from tokens	2014-11-03 00:23:20 +11:00
Matthew Honnibal	b186a66bae	* Rename Token.lex_pos to Token.postype, and Token.lex_supersense to Token.sensetype	2014-10-31 17:44:39 +11:00
Matthew Honnibal	ac88893232	* Fix Token after lexeme changes	2014-10-30 15:30:52 +11:00

82 Commits