Commit Graph

32 Commits

Author SHA1 Message Date
Matthew Honnibal c6707778dd * Fix Issue #51: Handle non-ascii lemmas correctly 2015-04-13 22:28:59 +02:00
Matthew Honnibal 567388e38d * Use values encoded by StringStore in POS tagging, rather than indices into a list of tags 2015-03-26 16:44:45 +01:00
Matthew Honnibal 8cc3524dc9 * Ws 2015-03-26 16:44:41 +01:00
Matthew Honnibal caf046b220 * Hastily add method to apply tags from a list of strings, instead of predicting the tags. 2015-02-23 15:40:17 -05:00
Matthew Honnibal 0b7e769211 * Add POS tags to support SWBD tag set 2015-02-11 14:08:28 -05:00
Matthew Honnibal 312b3a45f3 * Fix issue #19: Allow parsing/pos tagging of empty strings 2015-02-10 10:15:58 -05:00
Matthew Honnibal 5c3513583d * Clear buffered python tokens when modifying the Tokens object. Need to clean this up, and modify via a method on Tokens. 2015-02-09 03:57:10 -05:00
Matthew Honnibal be5536d239 * Fix Issue #22: PRP and PRP$ were mapped to NOUN. Should be PRON. 2015-02-08 18:36:18 -05:00
Matthew Honnibal 56c2ef2982 * Tweak POS features for web text 2015-02-02 11:59:36 +11:00
Matthew Honnibal 024cfd485c * Pass tag_strings as a tuple, to support new Tokens API 2015-01-31 13:43:37 +11:00
Matthew Honnibal 67d6e53a69 * Ensure parser and tagger function correctly when training from missing values, indicated by -1 2015-01-30 14:08:56 +11:00
Matthew Honnibal 12b034e3ef * Move POS tag definitions to parts_of_speech.pxd 2015-01-25 16:31:07 +11:00
Matthew Honnibal 7431c133d8 * Add error if try to access head and not is_parsed 2015-01-25 15:33:54 +11:00
Matthew Honnibal 4e857ab7a6 * Fix bug in POS tagger feature 2015-01-25 02:20:15 +11:00
Matthew Honnibal a97bed9359 * Fix POS and dependency label tag names. Add parse and string navigation functions. 2015-01-24 17:29:04 +11:00
Matthew Honnibal 5ed8b2b98f * Rename sic to orth 2015-01-23 02:08:25 +11:00
Matthew Honnibal 6c7e44140b * Work on word vectors, and other stuff 2015-01-17 16:21:17 +11:00
Matthew Honnibal 0930892fc1 * Tmp. Working on refactor. Compiles, must hook up lexical feats. 2015-01-14 00:03:48 +11:00
Matthew Honnibal 46da3d74d2 * Tmp. Refactoring, introducing a Lexeme PyObject. 2015-01-12 11:23:44 +11:00
Matthew Honnibal ce2edd6312 * Tmp commit. Refactoring to create a Python Lexeme class. 2015-01-12 10:26:22 +11:00
Matthew Honnibal 3f1944d688 * Make PyPy work 2015-01-05 17:54:38 +11:00
Matthew Honnibal 94034f1112 * Fix encoding in lemmatization 2015-01-05 11:54:29 +11:00
Matthew Honnibal 0e4c2ba036 * Fix loading of special morph words 2015-01-03 23:13:00 +11:00
Matthew Honnibal 5d9a096e2f * Some minor clean-up after HastyModel 2014-12-31 19:46:04 +11:00
Matthew Honnibal aafaf58cbe * Refactor _ml.Model, and finish implementing HastyModel so far not worthwhile. 2014-12-31 19:40:59 +11:00
Matthew Honnibal 1a075f77ff * Don't over-ride pre-loaded POS tags, if set by special-cases 2014-12-30 23:26:32 +11:00
Matthew Honnibal bb0b00f819 * Repurporse the Tagger class as a generic Model, wrapping thinc's interface 2014-12-30 21:20:15 +11:00
Matthew Honnibal bb80937544 * Upd docstrings 2014-12-27 18:45:16 +11:00
Matthew Honnibal b8b65903fc * Tmp 2014-12-24 17:42:00 +11:00
Matthew Honnibal b00bc01d8c * All tests now passing for reorg 2014-12-23 13:18:59 +11:00
Matthew Honnibal 73f200436f * Tests passing except for morphology/lemmatization stuff 2014-12-23 11:40:32 +11:00
Matthew Honnibal 61df50b598 * Add English-subclass POS tagger 2014-12-21 20:59:07 +11:00