Commit Graph

409 Commits

Author SHA1 Message Date
Matthew Honnibal 809ddf7887 * Add index.pxd 2014-12-19 07:23:00 +11:00
Matthew Honnibal 1879abd16a * Set const-correctness for tagger 2014-12-18 20:41:52 +11:00
Matthew Honnibal f72243b156 * Set const-correctness for Feature* array 2014-12-18 20:41:32 +11:00
Matthew Honnibal 6ab7e40590 * Add non-monotonic parsing with cost-sensitive update. 92.26 on Y&M set 2014-12-18 11:33:25 +11:00
Matthew Honnibal 7e0c692daf * Automatically push when the stack is empty 2014-12-18 09:16:10 +11:00
Matthew Honnibal 61142a8eff * Tweak features 2014-12-18 09:15:03 +11:00
Matthew Honnibal e3b123e6e0 * Ignore cpp files from parser 2014-12-18 09:05:51 +11:00
Matthew Honnibal 8446ebfbbb * Work on parser. Up to 92 UAS on YM labels 2014-12-18 09:05:31 +11:00
Matthew Honnibal 55de747bfc * Remove .cpp files 2014-12-18 02:43:13 +11:00
Matthew Honnibal 4448a840f7 * Work on greedy parsing. Scoring about 91.2 2014-12-18 02:42:55 +11:00
Matthew Honnibal 87e9487d76 * Work on parser 2014-12-17 21:10:12 +11:00
Matthew Honnibal 9d7d97978d * Work on greedy parser 2014-12-17 21:09:29 +11:00
Matthew Honnibal d524dd306a * Work on greedy parser 2014-12-17 03:19:43 +11:00
Matthew Honnibal 95ccea03b2 * Work on greedy parser 2014-12-16 22:46:55 +11:00
Matthew Honnibal a432862fde * Add exception type to _arg_max_among in tagger 2014-12-16 09:44:19 +11:00
Matthew Honnibal 9e00798820 * Work on integrating a greedy dependency parser 2014-12-16 08:06:04 +11:00
Matthew Honnibal 24ffc32f2f * Another redraft of index.rst 2014-12-15 16:32:03 +11:00
Matthew Honnibal 77dd7a212a * More thoughts on intro 2014-12-15 09:19:29 +11:00
Matthew Honnibal 792802b2b9 * POS tag memoisation working, with good speed-up 2014-12-12 14:33:51 +11:00
Matthew Honnibal ca54d58638 * Merge setup.py 2014-12-10 15:21:27 +11:00
Matthew Honnibal 9959a64f7b * Working morphology and lemmatisation. POS tagging quite fast. 2014-12-10 08:09:32 +11:00
Matthew Honnibal 7831b06610 * Compile morphology.pyx file 2014-12-10 08:09:13 +11:00
Matthew Honnibal df3be14987 * Add pos_type features to POS tagger 2014-12-10 08:08:55 +11:00
Matthew Honnibal 42973c4b37 * Improve efficiency of tagger, and improve morphological processing 2014-12-10 01:02:04 +11:00
Matthew Honnibal 6b34a2f34b * Move morphological analysis into its own module, morphology.pyx 2014-12-09 21:16:17 +11:00
Matthew Honnibal b962fe73d7 * Make suffixes file use full-power regex, so that we can handle periods properly 2014-12-09 19:04:27 +11:00
Matthew Honnibal accdbe989b * Remove Tokens.extend method 2014-12-09 17:09:23 +11:00
Matthew Honnibal 495e1c7366 * Use fused type in Tokens.push_back, simplifying the use of the cache 2014-12-09 16:50:01 +11:00
Matthew Honnibal 516f0f1e14 * Remove test for loading ad hoc rules format 2014-12-09 16:08:45 +11:00
Matthew Honnibal 6369835306 * Add false positive test for emoticons 2014-12-09 16:08:17 +11:00
Matthew Honnibal f15deaad5b * Upd docs 2014-12-09 16:08:01 +11:00
Matthew Honnibal 1ccabc806e * Work on lemmatization 2014-12-09 16:06:18 +11:00
Matthew Honnibal 2a6bd2818f * Load the lexicon before we check flag values 2014-12-09 15:18:43 +11:00
Matthew Honnibal 302e09018b * Work on fixing special-cases, reading them in as JSON objects so that they can specify lemmas 2014-12-09 14:48:01 +11:00
Matthew Honnibal cda9ea9a4a * Add test to make sure iterating over the lexicon isnt broken 2014-12-08 21:12:51 +11:00
Matthew Honnibal 99bbbb6feb * Work on morphological processing 2014-12-08 21:12:15 +11:00
Matthew Honnibal 7b68f911cf * Add WordNet lemmatizer 2014-12-08 01:39:13 +11:00
Matthew Honnibal c20dd79748 * Fiddle with const correctness and comments 2014-12-08 00:03:55 +11:00
Matthew Honnibal b031c7c430 * Remove language-general context module 2014-12-07 23:53:01 +11:00
Matthew Honnibal ef4398b204 * Rearrange POS stuff, so that language-specific stuff can live in language-specific modules 2014-12-07 23:52:41 +11:00
Matthew Honnibal 327383e38a * Remove unused code in tagger.pyx 2014-12-07 22:16:17 +11:00
Matthew Honnibal 8f2f319c57 * Add a couple more contractions tests 2014-12-07 22:08:04 +11:00
Matthew Honnibal 9f17467c2e * Fix EMPTY_TOKEN 2014-12-07 22:07:41 +11:00
Matthew Honnibal 3819a88e1b * Add support for tag dictionary, and fix error-code for predict method 2014-12-07 22:07:16 +11:00
Matthew Honnibal f00afe12c4 * Load POS tagger in load() function if path exists 2014-12-07 22:05:57 +11:00
Matthew Honnibal 677e111ee7 * Revise tokenization rules to match PTB. Rules are pretty messy around periods, need better support for these. 2014-12-07 22:04:47 +11:00
Matthew Honnibal 5fe5e6e66b * Move context functions to header, inlining them. 2014-12-07 21:59:04 +11:00
Matthew Honnibal 91e8d9ea1c * Compile context.pyx and tagger.pyx modules 2014-12-07 15:29:54 +11:00
Matthew Honnibal 5caabec789 * Link in tagger, to work on integrating POS tagging 2014-12-07 15:29:41 +11:00
Matthew Honnibal 0c7aeb9de7 * Begin revising tagger, focussing on POS tagging 2014-12-07 15:29:04 +11:00