spaCy

Commit Graph

Author	SHA1	Message	Date
Matthew Honnibal	5aaf7a024d	* Move ner features to ner subdir	2014-11-11 21:09:03 +11:00
Matthew Honnibal	ff8989b63c	* Use greedy NER parser	2014-11-11 21:08:35 +11:00
Matthew Honnibal	0d943ab358	* Fixed greedy NER parsing. With static oracle, replicates accuracy from tagger.	2014-11-11 17:17:54 +11:00
Matthew Honnibal	399239760b	* Fix moves for new State struct	2014-11-10 22:16:05 +11:00
Matthew Honnibal	82247169f2	* Implement validation and oracle on pystate, for testing	2014-11-10 22:15:32 +11:00
Matthew Honnibal	3709ed9d6d	* Add curr field to State, to handle entity being built	2014-11-10 22:14:36 +11:00
Matthew Honnibal	10e9e14c4f	* Add tests for NER oracle	2014-11-10 22:13:46 +11:00
Matthew Honnibal	af9ed18cf1	* Bug fixes to NER	2014-11-10 17:39:23 +11:00
Matthew Honnibal	d7b2843643	* Add some tests for ner	2014-11-10 16:29:19 +11:00
Matthew Honnibal	9f2587f5ec	* Work on shift-reduce NER	2014-11-10 16:28:56 +11:00
Matthew Honnibal	f307eb2e36	* Refactor context extraction, and start breaking out gold standards into their own functions	2014-11-09 15:43:07 +11:00
Matthew Honnibal	602f993af9	* Moving tagger to accept multiple correct answers	2014-11-09 15:18:33 +11:00
Matthew Honnibal	10a33ec725	* Upd fabfile for experiments	2014-11-07 04:44:14 +11:00
Matthew Honnibal	f37d896a42	* Upd NER feats. With adadelta learner, getting 76.9 on NER	2014-11-07 04:43:54 +11:00
Matthew Honnibal	a42321bd4e	* Upd shape test	2014-11-07 04:42:54 +11:00
Matthew Honnibal	68d1cdad62	* When encoding POS/NER tags, accept '-' as a missing value	2014-11-07 04:42:31 +11:00
Matthew Honnibal	949a6245f9	* Increase default number of iterations from 5 to 10	2014-11-07 04:42:04 +11:00
Matthew Honnibal	3cab1d9a29	* Refine word_shape feature, by trimming the max sequence length	2014-11-07 04:41:29 +11:00
Matthew Honnibal	b4454cf036	* Add extra context tokens	2014-11-07 04:40:36 +11:00
Matthew Honnibal	50309e6e49	* Fix context vector, importing all features	2014-11-05 22:11:39 +11:00
Matthew Honnibal	07a23768de	* Play with NER feats a bit. Up to 82.00 training on MUC7.	2014-11-05 21:47:17 +11:00
Matthew Honnibal	edf739134c	* Make make quiet by default, and add a vmake option for verbose make	2014-11-05 20:46:29 +11:00
Matthew Honnibal	dbbb914480	* Upd setup	2014-11-05 20:45:44 +11:00
Matthew Honnibal	4ecbe8c893	* Complete refactor of Tagger features, to use a generic list of context names.	2014-11-05 20:45:29 +11:00
Matthew Honnibal	0a8c84625d	* Moving feature context stuff to a generalized place	2014-11-05 19:55:10 +11:00
Matthew Honnibal	3733444101	* Generalize tagger code, in preparation for NER and supersense tagging.	2014-11-05 03:42:14 +11:00
Matthew Honnibal	81da61f3cf	* Remove out-dated POS data test	2014-11-05 02:04:12 +11:00
Matthew Honnibal	0de700b566	* Comment out tests of hyphenation, while we decide what hyphenation policy should be.	2014-11-05 02:03:22 +11:00
Matthew Honnibal	abbe3e44b0	* Move spacy.pos tagger to spacy.tagger, and generalize it so that it can take on other tagging tasks, given a different set of feature templates.	2014-11-05 00:37:59 +11:00
Matthew Honnibal	2420d944cb	* Upd sales copy	2014-11-04 17:01:54 +11:00
Matthew Honnibal	954c970415	* Add __iter__ method to tokens	2014-11-04 01:07:08 +11:00
Matthew Honnibal	f07457a91f	* Remove POS alignment stuff. Now use training data based on raw text, instead of clumsy detokenization stuff	2014-11-04 01:06:43 +11:00
Matthew Honnibal	bea762ec04	* Update tokenization rules	2014-11-04 01:06:00 +11:00
Matthew Honnibal	b8d5881333	* Update sales copy	2014-11-03 13:54:18 +11:00
Matthew Honnibal	ae52f9f38c	* Remove vocab10k from tokens	2014-11-03 00:23:20 +11:00
Matthew Honnibal	11915e5238	* Update tests	2014-11-03 00:23:04 +11:00
Matthew Honnibal	75329e9ef8	* Add Co. abbreviation to tokenization rules	2014-11-03 00:16:20 +11:00
Matthew Honnibal	32fb50dc35	* Remove non_sparse method --- features wanting this can do it easily enough.	2014-11-03 00:15:47 +11:00
Matthew Honnibal	b5ae1471db	* Fiddle with POS tag features	2014-11-03 00:15:03 +11:00
Matthew Honnibal	70ea862703	* Remove vocab10k field, and add flags for gazetteers	2014-11-03 00:13:51 +11:00
Matthew Honnibal	f1c3e17c80	* Work on intro copy	2014-11-03 00:13:19 +11:00
Matthew Honnibal	fa91506073	* Add '' double quote to suffixes file	2014-11-03 00:12:59 +11:00
Matthew Honnibal	493d5ffb50	* Add test for '' in punct	2014-11-02 21:24:09 +11:00
Matthew Honnibal	711ed0f636	* Whitespace	2014-11-02 14:22:32 +11:00
Matthew Honnibal	fcd9490d56	* Add pos_tag method to Language	2014-11-02 14:21:43 +11:00
Matthew Honnibal	99b5cefa88	* Add tests for emoticon tokenization	2014-11-02 13:22:14 +11:00
Matthew Honnibal	23131f21bb	* Add tests for like_url	2014-11-02 13:21:57 +11:00
Matthew Honnibal	dc6c3c0f56	* Add tests for like_number	2014-11-02 13:21:39 +11:00
Matthew Honnibal	829bb2bdbe	* Add mappings to Twitter POS tag corpus	2014-11-02 13:21:19 +11:00
Matthew Honnibal	437cd2217d	* Fix strings i/o, removing use of ujson library in favour of plain text file. Allows better control of codecs.	2014-11-02 13:20:37 +11:00


314 Commits
