92 Commits

Author SHA1 Message Date
Matthew Honnibal 897de2d438 * Add 'bitter' property for serializer in English class 2015-07-16 17:47:53 +02:00
Matthew Honnibal ff9ff6f3fa * Ensure unseen words are given low log probability 2015-07-12 01:31:09 +02:00
Matthew Honnibal 6ddb2f5e45 * Restore merge_mwe in English class 2015-07-08 19:35:30 +02:00
Matthew Honnibal 6859f6adac * Restore merge_mwe in English class 2015-07-08 19:34:55 +02:00
Matthew Honnibal e3c53f5ecd * Fix mention of Tokens in docstring 2015-07-08 18:56:27 +02:00
Matthew Honnibal bb522496dd * Rename Tokens to Doc 2015-07-08 18:53:00 +02:00
Matthew Honnibal 4e4fac452b * Refactor __init__ for simplicity. Allow parse=True, tag=True etc flags to be passed at top-level. Do not lazy-load parser. 2015-07-08 12:35:29 +02:00
Matthew Honnibal 1d2deb4616 * Work on refactoring default arguments to English.__init__ 2015-07-07 15:53:25 +02:00
Matthew Honnibal 6788c86b2f * Begin refactor 2015-07-07 14:00:07 +02:00
Matthew Honnibal 58d5ac0944 * Add beam search capabilities to Parser. Rename GreedyParser to Parser. 2015-06-02 00:28:02 +02:00
Matthew Honnibal eba7b34f66 * Add flag to disable loading of word vectors 2015-05-25 01:02:42 +02:00
Jordan Suchow 3a8d9b37a6 Remove trailing whitespace 2015-04-19 13:01:38 -07:00
Matthew Honnibal 42617548af * Disable merge_mwes by default 2015-04-16 04:20:31 +02:00
Matthew Honnibal b8d34531c4 * Add support for units to English.__init__, by loading and applying regular expressions 2015-04-07 04:02:32 +02:00
Matthew Honnibal 801bf14f4f * Clean up handling of dep_strings and ent_strings, using StringStore to encode the label names. 2015-03-26 16:44:45 +01:00
Matthew Honnibal f21ab2d7fb * Fix bug in ugly ent_strings hack on English class 2015-03-26 16:44:45 +01:00
Matthew Honnibal 8057a95f20 * NER seems to be working, scoring 69 F. Need to add decision-history features --- currently only use current word, 2 words context. Need refactoring. 2015-03-26 16:44:44 +01:00
Matthew Honnibal 220ce8bfed * Prepare English class for NER 2015-03-26 16:44:44 +01:00
Matthew Honnibal 179b7eb0a7 * Specify parser transition system in language 2015-03-26 16:44:43 +01:00
Matthew Honnibal 64645a1c2f * Improve docstring on English 2015-02-11 15:13:20 -05:00
Matthew Honnibal a1ed574b7b * Fix default model path for English 2015-01-31 16:38:27 +11:00
Matthew Honnibal c38c62d4a3 * Add docstring to English class 2015-01-27 02:45:21 +11:00
Matthew Honnibal 951d06c824 * Silently don't parse if data is not present 2015-01-25 14:47:38 +11:00
Matthew Honnibal dd56e298e2 * Ensure tagging is applied if parse=True 2015-01-25 02:19:44 +11:00
Matthew Honnibal 94750819cd * Set parse=True by default --- i.e. parse unless told not to. 2015-01-25 01:28:28 +11:00
Matthew Honnibal fda94271af * Rename NORM1 and NORM2 attrs to lower and norm 2015-01-24 06:17:03 +11:00
Matthew Honnibal 5ed8b2b98f * Rename sic to orth 2015-01-23 02:08:25 +11:00
Matthew Honnibal f2a229136c * Fix data_dir=None argument to English class 2015-01-21 18:27:31 +11:00
Matthew Honnibal 6c7e44140b * Work on word vectors, and other stuff 2015-01-17 16:21:17 +11:00
Matthew Honnibal 7d3c40de7d * Tests passing after refactor. API has obvious warts, particularly in Token and Lexeme 2015-01-15 00:33:16 +11:00
Matthew Honnibal 0930892fc1 * Tmp. Working on refactor. Compiles, must hook up lexical feats. 2015-01-14 00:03:48 +11:00
Matthew Honnibal ce2edd6312 * Tmp commit. Refactoring to create a Python Lexeme class. 2015-01-12 10:26:22 +11:00
Matthew Honnibal 7689dccd0f * Remove unused import 2015-01-05 18:48:48 +11:00
Matthew Honnibal f5d41028b5 * Move around data files for test release 2015-01-03 01:59:22 +11:00
Matthew Honnibal aafaf58cbe * Refactor _ml.Model, and finish implementing HastyModel so far not worthwhile. 2014-12-31 19:40:59 +11:00
Matthew Honnibal 30e5805656 * Lazy-load tagger and parser 2014-12-30 23:25:09 +11:00
Matthew Honnibal bb80937544 * Upd docstrings 2014-12-27 18:45:16 +11:00
Matthew Honnibal b8b65903fc * Tmp 2014-12-24 17:42:00 +11:00
Matthew Honnibal 98eb4c0426 * Fix path to parser model 2014-12-23 15:09:09 +11:00
Matthew Honnibal 73f200436f * Tests passing except for morphology/lemmatization stuff 2014-12-23 11:40:32 +11:00
Matthew Honnibal cf8d26c3d2 * POS tagger training working after reorg 2014-12-22 08:54:47 +11:00
Matthew Honnibal 4c4aa2c5c9 * Work on train 2014-12-22 07:25:43 +11:00