Commit Graph

450 Commits

Author SHA1 Message Date
Matthew Honnibal fb0a641a2d * Don't release the gil around Parser.parse. Does this indicate thread problems? 2015-07-17 23:07:37 +02:00
Matthew Honnibal e29daea85f * Fix bint/int typing problem in TransitionSystem. In C++ bint* means bool*, but in C it means int*. So, type-casting to bint* is unsafe. 2015-07-17 22:37:24 +02:00
Matthew Honnibal 45ae1ce428 * Remove unused declaration in parser 2015-07-16 01:27:11 +02:00
Matthew Honnibal 9a8db9743c * Remove gil from parser.call 2015-07-14 23:47:33 +02:00
Matthew Honnibal 38ca0c33f5 Merge branch 'neuralnet' into refactor
Mostly refactors parser, to use new thinc3.2 Example class.
Aim is to remove use of shared memory, so that we can parallelize
over documents easily.

Conflicts:
	setup.py
	spacy/syntax/parser.pxd
	spacy/syntax/parser.pyx
	spacy/syntax/stateclass.pyx
2015-07-14 14:13:47 +02:00
Matthew Honnibal 6eef0bf9ab * Break up tokens.pyx into tokens/doc.pyx, tokens/token.pyx, tokens/spans.pyx 2015-07-13 20:20:58 +02:00
Matthew Honnibal 55f1042443 * Improve efficiency of L and R features, correcting the non-linear-in-length problem. 2015-07-09 12:17:26 +02:00
Matthew Honnibal 70d2acb579 * Fix edge features 2015-07-09 12:15:01 +02:00
Matthew Honnibal adb868bdad * Add warning for models not found in parser 2015-07-08 20:04:55 +02:00
Matthew Honnibal 05b28ec9eb * Add warning for models not found in parser 2015-07-08 20:02:13 +02:00
Matthew Honnibal ef700401a6 * Add warning for models not found in parser 2015-07-08 20:00:46 +02:00
Matthew Honnibal 6218d8b389 * Add warning for models not found in parser 2015-07-08 19:59:16 +02:00
Matthew Honnibal f6a6c39ce8 * Add warning for models not found in parser 2015-07-08 19:52:30 +02:00
Matthew Honnibal 0ceb1f71c2 * Update parse features 2015-07-08 19:11:36 +02:00
Matthew Honnibal bb522496dd * Rename Tokens to Doc 2015-07-08 18:53:00 +02:00
Matthew Honnibal ff885e8511 * Add ParserFactory convenience function 2015-07-08 12:35:46 +02:00
Matthew Honnibal 52fd80c6c6 * Add experimental supersense features for parsing, based on lookup into wordnet. 2015-07-01 20:12:44 +02:00
Matthew Honnibal e20106fdff * Begin reorganizing neuralnet work 2015-06-30 14:26:32 +02:00
Matthew Honnibal 3bb5876c5a * Inline methods in StateClass 2015-06-29 01:10:14 +02:00
Matthew Honnibal 313a7f87b3 * Inline methods in StateClass 2015-06-29 01:06:28 +02:00
Matthew Honnibal a02fd3af5d * Check valency in L and R feature methods, to make feature calculation faster 2015-06-29 00:27:56 +02:00
Matthew Honnibal 5d870720bc * Check valency in L and R feature methods, to make feature calculation faster 2015-06-29 00:17:29 +02:00
Matthew Honnibal f4986d5d3c * Use new Example class 2015-06-28 22:36:03 +02:00
Matthew Honnibal 735f1af91f * Fix neural net stuff 2015-06-28 11:44:58 +02:00
Matthew Honnibal e7003f1cf3 * Remove hard-coding of vector lengths 2015-06-28 11:37:17 +02:00
Matthew Honnibal 897dd0dd0b * Merge changes, and adjust Example to use memoryview 2015-06-28 11:36:11 +02:00
Matthew Honnibal 9282a8e72c * Prepare for new models to be plugged in by using Example class 2015-06-28 11:02:35 +02:00
Matthew Honnibal 75aeccc064 * Rejig parser interface to use new thinc.api.Example class, in prep of theano model. Comment out beam search 2015-06-28 11:02:34 +02:00
Matthew Honnibal bbef71f213 * Fix min function in fill_context 2015-06-28 10:46:39 +02:00
Matthew Honnibal 142b6f9510 * Revert last changes 2015-06-28 10:44:28 +02:00
Matthew Honnibal b06962f18b * Pad buffers in state 2015-06-28 10:36:14 +02:00
Matthew Honnibal 53be72387c * Hack at fill_context to investigate performance loss 2015-06-28 10:34:28 +02:00
Matthew Honnibal 71a4e876a9 * Fix parse features 2015-06-28 09:27:33 +02:00
Matthew Honnibal 5af500909c * Remove unused directive from parser.pyx 2015-06-28 06:20:21 +02:00
Matthew Honnibal d5b4090705 * Add profile directive 2015-06-28 06:19:33 +02:00
Matthew Honnibal 2b5421e60c * Add profile directive 2015-06-28 06:07:04 +02:00
Matthew Honnibal 8b5de4a411 * Add word / tag / label sets, for use in neural net 2015-06-28 05:46:53 +02:00
Matthew Honnibal ed40a8380e * Remove hard-coding of vector lengths 2015-06-27 04:18:47 +02:00
Matthew Honnibal ebe630cc8d * Enable more features for NN 2015-06-27 04:17:29 +02:00
Matthew Honnibal f8bb43475e * Bridge to Theano working. Very disorganised. Using thinc adb60aba966ed2 2015-06-27 02:39:18 +02:00
Matthew Honnibal 2fe98b8a9a * Prepare for new models to be plugged in by using Example class 2015-06-26 13:51:39 +02:00
Matthew Honnibal 6896455884 * Rejig parser interface to use new thinc.api.Example class, in prep of theano model. Comment out beam search 2015-06-26 06:25:36 +02:00
Matthew Honnibal 02b171ee67 * Bug fixes to edge calculation 2015-06-24 04:28:02 +02:00
Matthew Honnibal 7f9384f53c * Remove deprecated _state module 2015-06-23 17:28:24 +02:00
Matthew Honnibal 6dbe182491 * Fix merge conflicts 2015-06-23 17:28:00 +02:00
Matthew Honnibal 579735a095 * Remove import of _state module 2015-06-23 17:25:08 +02:00
Matthew Honnibal 88f55d136b * Remove deprecated _state module 2015-06-23 17:19:51 +02:00
Matthew Honnibal 9ab9dd2bf7 * Clean up unused orig_arc_eager and tree_arc_eager modules, which were only added for EMNLP experiments 2015-06-23 17:17:33 +02:00
Matthew Honnibal 7ebfe4b983 * Fixes to edge features 2015-06-23 16:32:54 +02:00
Matthew Honnibal 7b125f5a86 * Fixes to edge features 2015-06-23 16:31:01 +02:00
Matthew Honnibal 35c290bee4 * Fix edge features 2015-06-23 15:50:56 +02:00
Matthew Honnibal 221e2e485f * Assign 'ROOT' as label, not 'root' 2015-06-23 15:09:54 +02:00
Matthew Honnibal a7bf7b0626 * Rename sent_start to sent_end, to reflect its new usage in the Break transition 2015-06-23 05:39:43 +02:00
Matthew Honnibal ee3e56f27b * Fix bounds checking on entities 2015-06-23 04:35:08 +02:00
Matthew Honnibal 43ef5ddea5 * Ensure root label is spelled ROOT, for backwards compatibility 2015-06-23 04:14:03 +02:00
Matthew Honnibal 065c2e1d2d * Add some bounds checking around state arrays 2015-06-23 04:13:09 +02:00
Matthew Honnibal f01b3d043e * Add padding to arrays in stateclass. May be papering over a deeper bug. 2015-06-23 03:03:41 +02:00
Matthew Honnibal 69507bc729 * Re-enable Break transition in arc_eager.pyx 2015-06-23 00:03:30 +02:00
Matthew Honnibal ab110be125 * Remove debugging in parser.pyx 2015-06-16 23:37:25 +02:00
Matthew Honnibal 9b13d11ab3 * Fix handling of entities in StateClass 2015-06-16 23:35:21 +02:00
Matthew Honnibal c40a2c661c * Add tree_arc_eager 2015-06-15 08:23:24 +02:00
Matthew Honnibal 5da5cf7084 * Add some more features for S1/S0 2015-06-15 04:07:13 +02:00
Matthew Honnibal 8156a01bca * Fix root label for orig_arc_eager 2015-06-15 02:54:55 +02:00
Matthew Honnibal 21930ede15 * Switch toggle on USE_ROOT_ARC_SEGMENT 2015-06-15 02:54:32 +02:00
Matthew Honnibal 38a6afa484 * Make possibly dubious correction to the unshift oracle 2015-06-15 02:50:00 +02:00
Matthew Honnibal f66228f253 * Add some more features, esp for labels 2015-06-14 21:18:02 +02:00
Matthew Honnibal 3da8e0f317 * Add orig_arc_eager 2015-06-14 20:31:44 +02:00
Matthew Honnibal ea8a103007 * Fix import of TransitionSystem in parser.pyx 2015-06-14 19:01:26 +02:00
Matthew Honnibal e0984ca139 * Fix valency features in StateClass 2015-06-14 17:50:26 +02:00
Matthew Honnibal 763cbd23d5 * Upd stateclass.print_state 2015-06-14 17:44:29 +02:00
Matthew Honnibal bdd07bf000 * Fix Break oracle, but disable the Break transition for now, while we finalize the gold-standard experiments 2015-06-14 17:44:03 +02:00
Matthew Honnibal 399f15fbdf * Add flag to toggle handling of multi-root inputs without the Break transition. Clear up now unused best_valid stuff. 2015-06-14 00:28:37 +02:00
Matthew Honnibal 75289b4761 * Don't refuse to parse single token sentences, in case some transition system needs them, e.g. single word entity. Instead fix error in _init_state. 2015-06-13 22:55:55 +02:00
Matthew Honnibal 77d7e79c7e * Fix r/l and distance features. 2015-06-12 13:06:15 +02:00
Matthew Honnibal 15e177d7a1 * Fixes to unshift/fast-forward strategy. Getting 91.55 greedy on NW dev, gold preproc 2015-06-12 01:50:23 +02:00
Matthew Honnibal afd77a529b * Prepare for break transition, with fast-forwarding. 86.5 on 1k nw gold preproc 2015-06-10 14:08:30 +02:00
Matthew Honnibal 495f528709 * Add support for sentence breaks in stateclass 2015-06-10 12:34:28 +02:00
Matthew Honnibal b7b18c279d * Fix Reduce oracle. Getting 86.35 2015-06-10 11:33:39 +02:00
Matthew Honnibal bb09b5d91a * Fix shifted bit vector in stateclass --- should reflect whether the word has been *unshifted*. 2015-06-10 11:33:09 +02:00
Matthew Honnibal aa9625f688 * Do non-monotonic Unshift. Every word can be shifted at most 1 time. When the Reduce move is used, if S0 has no head, we put the word back on the buffer. Gets 86.4 on nw 1k with gold pre-proc. Break transition not yet implemented for this. 2015-06-10 10:15:56 +02:00
Matthew Honnibal 7bf6b7de3e * Add unshift action to StateClass, and track which moves have been shifted 2015-06-10 10:13:03 +02:00
Matthew Honnibal f7c8069e65 * Fix bug in distance feature 2015-06-10 10:12:17 +02:00
Matthew Honnibal abd07c067a * Inline B and S methods on stateclass 2015-06-10 07:22:33 +02:00
Matthew Honnibal e2f9a80713 * Remove old _state imports 2015-06-10 07:09:17 +02:00
Matthew Honnibal e9aaecc619 * Remove from_struct method from StateClass 2015-06-10 06:58:27 +02:00
Matthew Honnibal 18cc326dc0 * Bug fixes to ner.pyx 2015-06-10 06:57:41 +02:00
Matthew Honnibal e5570c9700 * Set nogil for oracle functions 2015-06-10 06:56:56 +02:00
Matthew Honnibal 4575e7a60f * Fix beam search with new StateClass 2015-06-10 06:33:39 +02:00
Matthew Honnibal 04b1cd9b8c * Greedy parsing working with new StateClass. Beam parsing broken 2015-06-10 04:20:23 +02:00
Matthew Honnibal 6a94b64eca * Remove State* from parser.pyx entirely, switching over to StateClass. Beam parsing still untested. 2015-06-10 02:03:38 +02:00
Matthew Honnibal f14a1526aa * Remove version of fill_context that takes State* 2015-06-10 01:39:07 +02:00
Matthew Honnibal d68c686ec1 * Move StateClass into interface of transition functions 2015-06-10 01:35:28 +02:00
Matthew Honnibal 4b98b3e9c8 * Cost functions now take StateClass argument, instead of State*. 2015-06-10 00:40:43 +02:00
Matthew Honnibal e0cf61f591 * Move StateClass into the interface for is_valid 2015-06-09 23:23:28 +02:00
Matthew Honnibal 0895d454fb * Prepare to switch to using state class, instead of state struct 2015-06-09 21:20:14 +02:00
Matthew Honnibal 2b9629ed62 * Begin adding stateclass to ArcEager 2015-06-09 01:41:09 +02:00
Matthew Honnibal ba10fd8af5 * Add StateClass, to replace/refactor the mess in _state 2015-06-09 01:39:54 +02:00
Matthew Honnibal c7e3dfc1dc * Don't automatically push words when stack is empty, as it messes up beam parsing. Add hash method to beam state. 2015-06-08 14:49:04 +02:00
Matthew Honnibal 6e2564239d * Bug fixes to beam parser. Search still broken on non-gold sentences 2015-06-07 19:12:59 +02:00
Matthew Honnibal 731e5f1e46 * Add get() function in spacy/syntax/Config 2015-06-07 19:09:15 +02:00
Matthew Honnibal 8f142c1838 * Refactor transition system oracles, to split out move and label cost. Preparing to add Unshift move. Will exclude non-monotonic. 2015-06-07 03:21:29 +02:00
Matthew Honnibal 1fee7ade61 * Tweak to ner 2015-06-05 23:48:43 +02:00
Matthew Honnibal 33e70b167f * Remove dead code from ner.pyx 2015-06-05 17:12:47 +02:00
Matthew Honnibal 88ac5c6e98 * Send beam_width < 0 to greedy parser 2015-06-05 17:12:06 +02:00
Matthew Honnibal 0114e7600d * Fix NER oracle 2015-06-05 17:11:26 +02:00
Matthew Honnibal 6bf35cecc3 * Refactor transition system to use classes with staticmethods. 2015-06-05 02:27:17 +02:00
Matthew Honnibal 36a34d544b * Refactoring arc_eager, grouping oracle functions into transitions 2015-06-04 22:43:03 +02:00
Matthew Honnibal 4433396005 * Improve efficiency of dynamic oracle, making beam training faster 2015-06-04 21:15:14 +02:00
Matthew Honnibal 079dad28a7 * Update for faster beam training 2015-06-04 19:32:32 +02:00
Matthew Honnibal a2627b6102 * Fix bug in refactored init_transition 2015-06-03 06:01:26 +02:00
Matthew Honnibal dd0867645d * Remove stray const from State header 2015-06-03 00:10:04 +02:00
Matthew Honnibal 6c47b10a6e * Make optimization to children_in_buffer: stop searching when we would cross a bracket. 2015-06-02 21:05:24 +02:00
Matthew Honnibal a513ec500f * Have oracle functions take a struct instead of a Python object 2015-06-02 20:01:06 +02:00
Matthew Honnibal d1b55310a1 * Refactor _advance_beam function 2015-06-02 18:38:41 +02:00
Matthew Honnibal 0786d9b3c7 * Refactor TransitionSystem, adding set_valid method 2015-06-02 18:38:07 +02:00
Matthew Honnibal a3964957f6 * Add profiling for _state.pyx 2015-06-02 18:36:27 +02:00
Matthew Honnibal e822df0867 * Fix bugs in new greedy/beam parser 2015-06-02 02:01:33 +02:00
Matthew Honnibal 66dfa95847 * Revise greedy_parse/beam_parse ownership goof 2015-06-02 01:34:19 +02:00
Matthew Honnibal 75658b2ed3 * Remove use of new beam.loss property, to maintain compatibility with older versions of thinc for now. 2015-06-02 00:57:09 +02:00
Matthew Honnibal 7c29362d60 * Rename parser class in parser.pxd, now that beam parsing is supported 2015-06-02 00:53:49 +02:00
Matthew Honnibal 58d5ac0944 * Add beam search capabilities to Parser. Rename GreedyParser to Parser. 2015-06-02 00:28:02 +02:00
Matthew Honnibal e09a08bd00 * Add copy_state function 2015-06-01 23:06:30 +02:00
Matthew Honnibal c7876aa8b6 * Add get_valid method 2015-06-01 23:06:00 +02:00
Matthew Honnibal 5e99ff94c8 * Edits to arc eager oracle. Couldn't figure out how the non-monotonic lines made sense. They seem covered by children_in_stack 2015-05-31 15:14:37 +02:00
Matthew Honnibal 6c5632b71c * Roll back proposed change to Break transition while investigating effect 2015-05-31 06:49:52 +02:00
Matthew Honnibal e77940565d * Add length cap to distance feature 2015-05-31 05:25:30 +02:00
Matthew Honnibal fd596351ba * Fix valency features 2015-05-31 05:24:33 +02:00
Matthew Honnibal 76300bbb1b * Use updated JSON format, with sentences below paragraphs. Allows use of gold preprocessing flag. 2015-05-30 01:25:46 +02:00
Matthew Honnibal 8f31d3b864 * Relax constraint on Break transition for non-monotonic parsing. 2015-05-28 23:39:52 +02:00
Matthew Honnibal 4010b9b6d9 * Pass parameter for regularization in parser.pyx 2015-05-27 03:18:50 +02:00
Matthew Honnibal fc75210941 * Move spacy.syntax.conll to spacy.gold 2015-05-24 21:35:02 +02:00
Matthew Honnibal efe7a7d7d6 * Clean unused functions from spacy.syntax.conll 2015-05-24 20:06:46 +02:00
Matthew Honnibal 78487f3e66 * Update parser oracle for missing heads 2015-05-24 20:05:58 +02:00
Matthew Honnibal acd1245ad4 * Remove cruft from conll.pyx --- unused stuff about evaluation, which now lives in spacy.scorer 2015-05-24 17:35:49 +02:00
Matthew Honnibal 20f1d868a3 * Tmp commit. Working on whole document parsing 2015-05-24 02:49:56 +02:00
Matthew Honnibal f2ee9c4feb * Comment out constituency parsing stuff, so that code compiles 2015-05-20 16:55:05 +02:00
Matthew Honnibal 9dfc9c039c * Work on constituency parsing. 2015-05-20 16:02:51 +02:00
Matthew Honnibal ba07b925a7 * Fix compile error in conll.pyx 2015-05-12 22:33:47 +02:00
Matthew Honnibal f1e0272b18 * Disable c-parsing transitions 2015-05-12 22:33:25 +02:00
Matthew Honnibal 03a6626545 * Tmp commit 2015-05-12 20:27:56 +02:00
Matthew Honnibal 9568ebed08 * Fix off-by-one in head reading 2015-05-12 20:27:56 +02:00
Matthew Honnibal d2ac8d8007 * Add ctnt field to State, in preparation for constituency parsing 2015-05-12 20:27:56 +02:00
Matthew Honnibal ab67693393 * Add read_json_file to conll.pyx 2015-05-12 20:27:55 +02:00
Matthew Honnibal aff9359a8d * Update ner.pyx to expect brackets from gold_tuples 2015-05-12 20:27:55 +02:00
Matthew Honnibal 53cf77e1c8 * Bug fix: when non-monotonically correcting a dependency, make sure to delete the old one from the child list 2015-05-12 20:26:41 +02:00
Matthew Honnibal a4e2af54f9 * Add support for l/r edge to add_dep, and move inlined methods into _state.pyx where possible 2015-05-12 20:26:41 +02:00
Matthew Honnibal fb8d50b3d5 Merge branch 'master' of ssh://github.com/honnibal/spaCy 2015-04-30 12:45:15 +02:00
Matthew Honnibal ed8e8c3bd0 * Whitespace 2015-04-29 14:22:47 +02:00
Matthew Honnibal 763ef01575 * Fix two bugs in feature calculation 2015-04-28 23:25:09 +02:00
Matthew Honnibal b3fd48c97b * Fix missing root labels bug identified in Issue #57 2015-04-28 20:45:51 +02:00
Jordan Suchow 3a8d9b37a6 Remove trailing whitespace 2015-04-19 13:01:38 -07:00
Matthew Honnibal 99dbf8a38c * Fix error type in lookup_transition 2015-04-16 01:36:22 +02:00
Matthew Honnibal 9f16848b60 * Add (N0w, N1w) unigram pair to NER features, prompted by failure to detect 'this weekend' 2015-04-15 06:01:18 +02:00
Matthew Honnibal 507048dc45 * Rename StandardError to Exception, for Python 3 compatibility 2015-04-12 07:28:34 +02:00
Matthew Honnibal 1d05e6da00 * Add ne_iob and ne_type features to NER 2015-04-10 19:07:08 +02:00
Matthew Honnibal 4df8a3d90f * Add ne_iob and ne_type attributes to context vector 2015-04-10 05:02:15 +02:00
Matthew Honnibal 8c354c432b * Add ValueError condition to ner_tag reading 2015-04-10 04:59:59 +02:00
Matthew Honnibal 435cccf098 * Add read_conll03_file function to conll.pyx 2015-04-10 04:59:11 +02:00
Matthew Honnibal 99c9ecfc18 * Fix bug in prefix, suffix and word shape features in parser and NER 2015-04-10 03:53:33 +02:00
Matthew Honnibal 5a075ea3fc * Ensure NER moves are available for single-word tokens 2015-04-05 22:30:58 +02:00
Matthew Honnibal a60a366b2c * Support 'punct' dep label in conll.pyx 2015-04-05 22:30:19 +02:00
Matthew Honnibal a3af6b7c3d * Left-Arc from Root, to allow non-monotonic reduce to compete with left-arc when the stack is not empty. 2015-03-27 17:39:16 +01:00
Matthew Honnibal db5a43318c * Improve print_state debug printer 2015-03-27 17:29:58 +01:00
Matthew Honnibal 1705eccbbe * Remove whitespace 2015-03-27 15:22:39 +01:00
Matthew Honnibal 3feb52374c * Break apart a condition, for ease of debug printing 2015-03-27 15:21:38 +01:00
Matthew Honnibal b32f581acb * Fix bug in ArcEager.get_labels 2015-03-27 15:21:06 +01:00
Matthew Honnibal 1320bd19db * Move Span class to own file 2015-03-26 16:45:38 +01:00
Matthew Honnibal e854ba0a13 * Remove support for force_gold flag from GreedyParser, since it's not so useful, and it's clutter 2015-03-26 16:44:47 +01:00
Matthew Honnibal 6a6085f8b9 * Clean up GreedyParser.train function a bit 2015-03-26 16:44:47 +01:00
Matthew Honnibal b3157927e6 * Clean up unused feature templates 2015-03-26 16:44:47 +01:00
Matthew Honnibal 411bf377d4 * Remove dependency on ner_util module 2015-03-26 16:44:47 +01:00
Matthew Honnibal 01c892f583 * Add comment to fill_context 2015-03-26 16:44:47 +01:00
Matthew Honnibal 2741179aff * Important bug fix: Fill token N2w, which was being unfilled, after a bad edit while writing the NER features. 2015-03-26 16:44:47 +01:00
Matthew Honnibal 71648205d9 * Add support for debug feature set. Just use unigrams for this. 2015-03-26 16:44:47 +01:00
Matthew Honnibal 3b70b304b2 * Add words to gold_tuples from gold conll file 2015-03-26 16:44:47 +01:00
Matthew Honnibal 05d6065e2e * Add assertion 2015-03-26 16:44:46 +01:00
Matthew Honnibal 377e9b29b1 * Whitespace 2015-03-26 16:44:46 +01:00
Matthew Honnibal 9f4ad8fdfb * Assign root words the ROOT label via the Break transition. Something is still wrong here... 2015-03-26 16:44:46 +01:00
Matthew Honnibal f729164c01 * Fix bug in label assignment: ensure null-label transitions receive the label 0 2015-03-26 16:44:46 +01:00
Matthew Honnibal 31fad99518 * Use StringStore to encode label names, instead of label_ids 2015-03-26 16:44:45 +01:00
Matthew Honnibal b9b695fb1b * Remove debug word list 2015-03-26 16:44:45 +01:00
Matthew Honnibal 1c843934be * Fix oracle bug in NER. Now getting 77% F on ontonotes 2015-03-26 16:44:44 +01:00
Matthew Honnibal e181c051d5 * Improve features for NER 2015-03-26 16:44:44 +01:00
Matthew Honnibal 8057a95f20 * NER seems to be working, scoring 69 F. Need to add decision-history features --- currently only use current word, 2 words context. Need refactoring. 2015-03-26 16:44:44 +01:00
Matthew Honnibal ae235e07b9 * Refactoring working for parser, but now need to rig up features for NER, and then debug oracle etc. 2015-03-26 16:44:44 +01:00
Matthew Honnibal b3eda03c9c * Tmp 2015-03-26 16:44:44 +01:00
Matthew Honnibal 6b6bce9e7a * Fix label loading for transition system 2015-03-26 16:44:43 +01:00
Matthew Honnibal 5278c7504b * Hacks to conll.pyx. Should clean these up. 2015-03-26 16:44:43 +01:00
Matthew Honnibal f321b2b2eb * Remove TODO comment 2015-03-26 16:44:43 +01:00
Matthew Honnibal fdabd93bfb * Ensure high loss for invalid moves, and fix label reading for arc-eager 2015-03-26 16:44:43 +01:00
Matthew Honnibal 10ed738df2 * Tmp commit 2015-03-26 16:44:43 +01:00
Matthew Honnibal 4f83c9b3d5 * Make costs label-sensitive 2015-03-26 16:44:43 +01:00
Matthew Honnibal 8c883cef58 * Refactored transition system code now compiling. Still need to hook up label oracle, and test 2015-03-26 16:44:43 +01:00
Matthew Honnibal f0159ab4b6 * Add file to hold GoldParse class 2015-03-26 16:44:42 +01:00
Matthew Honnibal 8eadb984cb * Refactor arc_eager to use new TransitionSystem base class. Need to fix oracle 2015-03-26 16:44:42 +01:00
Matthew Honnibal b063001596 * Add base TransitionSystem class. Still need to rethink how non-monotonic labelling will work for best_valid 2015-03-26 16:44:42 +01:00
Matthew Honnibal dc986dbc0b * Work on refactored parser, where TransitionSystem can be easily subclassed 2015-03-26 16:44:42 +01:00
Matthew Honnibal 135756ac3d * Tmp commit of NER refactoring 2015-03-26 16:44:42 +01:00
Matthew Honnibal 0ff078876a * Commit some work on ner.pyx done on the plane 2015-03-26 16:44:41 +01:00
Matthew Honnibal d81b7be6a2 * Merge train.py 2015-03-26 16:44:41 +01:00
Matthew Honnibal 3d0570685c * Add NER transition system 2015-03-26 16:44:41 +01:00
Matthew Honnibal ea90d136e8 * Fix bug in labelled parsing, that caused an 8% drop in labelled accuracy. 2015-02-27 03:56:10 -05:00
Matthew Honnibal 312b3a45f3 * Fix issue #19: Allow parsing/pos tagging of empty strings 2015-02-10 10:15:58 -05:00
Matthew Honnibal 5c3513583d * Clear buffered python tokens when modifying the Tokens object. Need to clean this up, and modify via a method on Tokens. 2015-02-09 03:57:10 -05:00
Matthew Honnibal c55a33d045 * Catch oracle errors 2015-02-02 23:02:04 +11:00
Matthew Honnibal d68678a93e * Add Exception class, OracleError 2015-02-02 11:57:32 +11:00
Matthew Honnibal 88170e6295 * Supply dep_strings as a tuple, for the changed API on Tokens 2015-01-31 13:42:09 +11:00
Matthew Honnibal 0981d68022 * Set a sent_end flag during parsing, for later use 2015-01-31 13:41:46 +11:00
Matthew Honnibal 0f95712189 * Improve accuracy reporting during training 2015-01-30 18:05:06 +11:00
Matthew Honnibal 67d6e53a69 * Ensure parser and tagger function correctly when training from missing values, indicated by -1 2015-01-30 14:08:56 +11:00
Matthew Honnibal ebf7d2fab1 * Use non-joint sbd, for more simplicity and fewer classes 2015-01-29 06:22:03 +11:00
Matthew Honnibal d05c5bf141 * Remove comment 2015-01-29 05:19:27 +11:00
Matthew Honnibal 320b045daa * Oracle now consistent over gold standard derivation 2015-01-29 03:41:58 +11:00
Matthew Honnibal f590382134 * Work on sbd 2015-01-29 03:18:29 +11:00
Matthew Honnibal 1884a7a0be * Attach comment with paper 2015-01-28 03:18:43 +11:00
Matthew Honnibal a2d6b195db * Add messy Break transitions, carefully following the scheme of Dd Zhang et al (2013) 2015-01-28 03:09:45 +11:00
Matthew Honnibal f9ee5d9934 * Build a python list of word strings, for debugging 2015-01-28 01:06:13 +11:00
Matthew Honnibal d819101571 * Improve error message on oracle failure 2015-01-28 00:58:03 +11:00
Matthew Honnibal 7431c133d8 * Add error if try to access head and not is_parsed 2015-01-25 15:33:54 +11:00
Matthew Honnibal a97bed9359 * Fix POS and dependency label tag names. Add parse and string navigation functions. 2015-01-24 17:29:04 +11:00
Matthew Honnibal 5ed8b2b98f * Rename sic to orth 2015-01-23 02:08:25 +11:00
Matthew Honnibal 6c7e44140b * Work on word vectors, and other stuff 2015-01-17 16:21:17 +11:00
Matthew Honnibal aacaf1a0f0 * Fix parser 2015-01-08 01:19:23 +11:00
Matthew Honnibal 9a21127bf7 * Fix parser, which was importing the wrong model 2015-01-08 00:10:15 +11:00
Matthew Honnibal 3f1944d688 * Make PyPy work 2015-01-05 17:54:38 +11:00
Matthew Honnibal ae7c811fd1 * Use Exception instead of StandardError 2015-01-04 01:22:12 +11:00
Matthew Honnibal 5d9a096e2f * Some minor clean-up after HastyModel 2014-12-31 19:46:04 +11:00
Matthew Honnibal aafaf58cbe * Refactor _ml.Model, and finish implementing HastyModel. So far not worthwhile. 2014-12-31 19:40:59 +11:00
Matthew Honnibal 1ffb0229ed * Import tokens in parser.pxd 2014-12-30 21:21:17 +11:00
Matthew Honnibal bb80937544 * Upd docstrings 2014-12-27 18:45:16 +11:00
Matthew Honnibal b8b65903fc * Tmp 2014-12-24 17:42:00 +11:00
Matthew Honnibal 4c4aa2c5c9 * Work on train 2014-12-22 07:25:43 +11:00
Matthew Honnibal b34a1325d3 * Everything compiling after reorg. About to start testing. 2014-12-21 05:42:23 +11:00
Matthew Honnibal e1c1a4b868 * Tmp 2014-12-21 05:36:29 +11:00
Matthew Honnibal ff252dd535 * Clean up 'guess_cache' idea, which didn't work well enough 2014-12-20 03:49:11 +11:00
Matthew Honnibal bed680c632 * Remove commented-out features 2014-12-20 03:47:32 +11:00
Matthew Honnibal 3d178c03ae * Prune the features a bit 2014-12-20 02:46:14 +11:00
Matthew Honnibal 7920ea72b4 * Working parser with the decision memory idea. Disabling that for now, for simplicity 2014-12-20 01:43:15 +11:00
Matthew Honnibal a2f2a48da9 * Add some extra features 2014-12-20 01:42:24 +11:00
Matthew Honnibal 53b8bc1f3c * Work on implementing a trainable cache for the parser. So far, doesn't improve efficiency 2014-12-19 09:30:50 +11:00
Matthew Honnibal f72243b156 * Set const-correctness for Feature* array 2014-12-18 20:41:32 +11:00
Matthew Honnibal 6ab7e40590 * Add non-monotonic parsing with cost-sensitive update. 92.26 on Y&M set 2014-12-18 11:33:25 +11:00
Matthew Honnibal 7e0c692daf * Automatically push when the stack is empty 2014-12-18 09:16:10 +11:00
Matthew Honnibal 61142a8eff * Tweak features 2014-12-18 09:15:03 +11:00
Matthew Honnibal 8446ebfbbb * Work on parser. Up to 92 UAS on YM labels 2014-12-18 09:05:31 +11:00
Matthew Honnibal 55de747bfc * Remove .cpp files 2014-12-18 02:43:13 +11:00
Matthew Honnibal 4448a840f7 * Work on greedy parsing. Scoring about 91.2 2014-12-18 02:42:55 +11:00
Matthew Honnibal 9d7d97978d * Work on greedy parser 2014-12-17 21:09:29 +11:00
Matthew Honnibal d524dd306a * Work on greedy parser 2014-12-17 03:19:43 +11:00
Matthew Honnibal 95ccea03b2 * Work on greedy parser 2014-12-16 22:46:55 +11:00