spaCy

Commit Graph

Author	SHA1	Message	Date
Matthew Honnibal	f15deaad5b	* Upd docs	2014-12-09 16:08:01 +11:00
Matthew Honnibal	1ccabc806e	* Work on lemmatization	2014-12-09 16:06:18 +11:00
Matthew Honnibal	2a6bd2818f	* Load the lexicon before we check flag values	2014-12-09 15:18:43 +11:00
Matthew Honnibal	302e09018b	* Work on fixing special-cases, reading them in as JSON objects so that they can specify lemmas	2014-12-09 14:48:01 +11:00
Matthew Honnibal	cda9ea9a4a	* Add test to make sure iterating over the lexicon isnt broken	2014-12-08 21:12:51 +11:00
Matthew Honnibal	99bbbb6feb	* Work on morphological processing	2014-12-08 21:12:15 +11:00
Matthew Honnibal	7b68f911cf	* Add WordNet lemmatizer	2014-12-08 01:39:13 +11:00
Matthew Honnibal	c20dd79748	* Fiddle with const correctness and comments	2014-12-08 00:03:55 +11:00
Matthew Honnibal	b031c7c430	* Remove language-general context module	2014-12-07 23:53:01 +11:00
Matthew Honnibal	ef4398b204	* Rearrange POS stuff, so that language-specific stuff can live in language-specific modules	2014-12-07 23:52:41 +11:00
Matthew Honnibal	327383e38a	* Remove unused code in tagger.pyx	2014-12-07 22:16:17 +11:00
Matthew Honnibal	8f2f319c57	* Add a couple more contractions tests	2014-12-07 22:08:04 +11:00
Matthew Honnibal	9f17467c2e	* Fix EMPTY_TOKEN	2014-12-07 22:07:41 +11:00
Matthew Honnibal	3819a88e1b	* Add support for tag dictionary, and fix error-code for predict method	2014-12-07 22:07:16 +11:00
Matthew Honnibal	f00afe12c4	* Load POS tagger in load() function if path exists	2014-12-07 22:05:57 +11:00
Matthew Honnibal	677e111ee7	* Revise tokenization rules to match PTB. Rules are pretty messy around periods, need better support for these.	2014-12-07 22:04:47 +11:00
Matthew Honnibal	5fe5e6e66b	* Move context functions to header, inlining them.	2014-12-07 21:59:04 +11:00
Matthew Honnibal	91e8d9ea1c	* Compile context.pyx and tagger.pyx modules	2014-12-07 15:29:54 +11:00
Matthew Honnibal	5caabec789	* Link in tagger, to work on integrating POS tagging	2014-12-07 15:29:41 +11:00
Matthew Honnibal	0c7aeb9de7	* Begin revising tagger, focussing on POS tagging	2014-12-07 15:29:04 +11:00
Matthew Honnibal	f5c4f2eb52	* Revise context, focussing on POS tagging for now	2014-12-07 15:28:22 +11:00
Matthew Honnibal	e27b912ef9	* Remove need for confusing _data pointer to be stored on Tokens	2014-12-05 16:31:30 +11:00
Matthew Honnibal	1c9253701d	* Introduce a TokenC struct, to handle token indices, pos tags and sense tags	2014-12-05 15:56:14 +11:00
Matthew Honnibal	187372c7f3	* Allow the lexicon to create lexemes using an external memory pool, so that it can decide to make some lexemes temporary, rather than cached	2014-12-05 03:29:50 +11:00
Matthew Honnibal	75b8dfb348	* Remove upper_pc from lexeme.pyx	2014-12-04 22:14:34 +11:00
Matthew Honnibal	a14f9eaf63	* Add index.pyx to setup	2014-12-04 22:14:11 +11:00
Matthew Honnibal	49f3780ff5	* Fiddle with lexeme attrs	2014-12-04 21:22:38 +11:00
Matthew Honnibal	564082e48e	* Hack Token class to take lex.dense inplace of the old lex.norm. This needs to be fixed...	2014-12-04 20:51:29 +11:00
Matthew Honnibal	69bb022204	* Add as_array and count_by method	2014-12-04 20:46:55 +11:00
Matthew Honnibal	e1b1f45cc9	* Add STEM attribute to lexeme	2014-12-04 20:46:20 +11:00
Matthew Honnibal	d7952634ca	* Make the string-store serve const pointers to Utf8Str	2014-12-03 16:01:47 +11:00
Matthew Honnibal	7e04c22f8f	* const added to Lexicon interface. Seems to work.	2014-12-03 15:58:17 +11:00
Matthew Honnibal	d70d31aa45	* Introduce first attempt at const-ness	2014-12-03 15:44:25 +11:00
Matthew Honnibal	d0d812c548	* Hack setup.py to exclude tagger stuff	2014-12-03 11:06:57 +11:00
Matthew Honnibal	4560ada85b	* Add typedef for attr_t. Change flag_t to flags_t	2014-12-03 11:06:31 +11:00
Matthew Honnibal	e600f7b327	* Move String struct stuff into the utf8string module, from spacy.lang	2014-12-03 11:06:00 +11:00
Matthew Honnibal	e170faf5b0	* Hack Tokens to work without tagger.pyx	2014-12-03 11:05:15 +11:00
Matthew Honnibal	b463a7eb86	* Make flag-setting a language-specific thing	2014-12-03 11:04:32 +11:00
Matthew Honnibal	71b009e323	* Fix bug in refactored StringStore.__getitem__	2014-12-03 11:02:24 +11:00
Matthew Honnibal	14097311ae	* Make StringStore.__getitem__ accept unicode-typed keys.	2014-12-03 01:33:20 +11:00
Matthew Honnibal	522bb0346e	* Work on get_array method of Tokens	2014-12-02 23:48:05 +11:00
Matthew Honnibal	8c2938fe01	* Rename Lexicon._dict to Lexicon._map	2014-12-02 23:46:59 +11:00
Matthew Honnibal	2ee8a1e61f	* Make intro chattier, explain philosophy better	2014-12-02 15:20:18 +11:00
Matthew Honnibal	ea19850a69	* Add tokenizer section	2014-12-02 04:39:12 +11:00
Matthew Honnibal	3430d5f629	* Revise intro copy. Add NLTK comparison	2014-12-01 22:55:13 +11:00
Matthew Honnibal	33dfb4933c	* Remove taggers from Language class. Work on doc strings	2014-11-26 19:53:55 +11:00
Matthew Honnibal	cf55b48ba6	* Switch to predict label on shift. Big increase in accuracy.	2014-11-12 23:50:12 +11:00
Matthew Honnibal	8f84e8a78b	* Neaten oracle	2014-11-12 23:38:07 +11:00
Matthew Honnibal	66cb4f96e1	* Upd gitignore	2014-11-12 23:25:27 +11:00
Matthew Honnibal	60c1e78596	* Commit outstanding tests	2014-11-12 23:24:32 +11:00

375 Commits

All Branches