Commit Graph

40 Commits

Author SHA1 Message Date
Matthew Honnibal 84e66ca6d4 WIP on stringstore change. 27 failures 2017-05-28 14:06:40 +02:00
Matthew Honnibal f51e6a6c16 Adjust lexeme sizing for attr_t being 64 bit 2017-05-28 12:51:09 +02:00
Matthew Honnibal 3ea98e2043 Remove vector member from lexeme 2017-05-28 11:46:24 +02:00
Matthew Honnibal 793430aa7a Get spaCy train command working with neural network 2017-05-17 12:04:50 +02:00
    * Integrate models into pipeline
    * Add basic serialization (maybe incorrect)
    * Fix pickle on vocab
Matthew Honnibal 58e83fe34b Initial, limited support for quantified patterns in Matcher, and tracking of ent_id attribute in Token and Span. The quantifiers need a lot more testing, and there are some known problems. The main known problem is that the zero-plus and one-plus quantifiers won't work if a token can match both the quantified pattern expression AND the tail of the match. 2016-09-21 14:54:55 +02:00
Wolfgang Seeker 03fb498dbe introduce lang field for LexemeC to hold language id 2016-03-10 13:01:34 +01:00
    put noun_chunk logic into iterators.py for each language separately
Matthew Honnibal 9ec7b9c454 * Clean up unused Constituent struct. 2015-11-03 23:48:21 +11:00
Matthew Honnibal 1e99fcd413 * Rename .repvec to .vector in C API 2015-11-03 23:47:59 +11:00
Matthew Honnibal 7ac6cacc26 * Remove const qualifier on LexemeC.repvec 2015-09-15 14:42:51 +10:00
Matthew Honnibal c2307fa9ee * More work on language-generic parsing 2015-08-28 02:02:33 +02:00
Matthew Honnibal 1d7f2d3abc * Hack on morphology structs 2015-08-26 19:18:36 +02:00
Matthew Honnibal 815bda201d * Remove UniStr struct 2015-07-22 13:39:17 +02:00
Matthew Honnibal 128b6d9714 * Move Utf8Str struct to strings module, as that's the only place it's relevant 2015-07-20 12:06:41 +02:00
Matthew Honnibal 4dddc8a69b * Fix type declarations for attr_t. Remove unused id_t. 2015-07-18 22:39:57 +02:00
Matthew Honnibal 95e57c2780 * Remove unnecessary key and id properties from Utf8String. 2015-07-17 01:40:18 +02:00
Matthew Honnibal aa82caf8f5 * Add TokenC.spacy attr 2015-07-13 19:48:07 +02:00
Matthew Honnibal 1d3a592edf * Remove the senses attr from LexemeC, to keep data compatibility 2015-07-08 19:24:44 +02:00
Matthew Honnibal e23d1582a2 * Add supersense data to Lexeme objects. Add simple has_sense method to check the flag. 2015-07-01 18:50:37 +02:00
Matthew Honnibal a7bf7b0626 * Rename sent_start to sent_end, to reflect its new usage in the Break transition 2015-06-23 05:39:43 +02:00
Matthew Honnibal 8ee7c541f1 * Update Constituent definition 2015-05-20 16:03:26 +02:00
Matthew Honnibal 03a6626545 * Tmp commit 2015-05-12 20:27:56 +02:00
Matthew Honnibal d2ac8d8007 * Add ctnt field to State, in preparation for constituency parsing 2015-05-12 20:27:56 +02:00
Matthew Honnibal d634038eb6 * Add l_edge and r_edge props in TokenC for tracking the parse-yield of the token 2015-05-12 20:26:41 +02:00
Jordan Suchow 3a8d9b37a6 Remove trailing whitespace 2015-04-19 13:01:38 -07:00
Matthew Honnibal 8057a95f20 * NER seems to be working, scoring 69 F. Need to add decision-history features --- currently only use current word, 2 words context. Need refactoring. 2015-03-26 16:44:44 +01:00
Matthew Honnibal b3eda03c9c * Tmp 2015-03-26 16:44:44 +01:00
Matthew Honnibal 135756ac3d * Tmp commit of NER refactoring 2015-03-26 16:44:42 +01:00
Matthew Honnibal b139aa92ba * Start setting out how NER will be implemented in the data model 2015-03-26 16:44:41 +01:00
Matthew Honnibal 75f9b7d6bf * Add L2 norm field to LexemeC struct 2015-02-07 08:43:17 -05:00
Matthew Honnibal 08ca5c8970 * Add sent_end flag to TokenC struct 2015-01-31 13:44:16 +11:00
Matthew Honnibal 12b034e3ef * Move POS tag definitions to parts_of_speech.pxd 2015-01-25 16:31:07 +11:00
Matthew Honnibal fda94271af * Rename NORM1 and NORM2 attrs to lower and norm 2015-01-24 06:17:03 +11:00
Matthew Honnibal 5ed8b2b98f * Rename sic to orth 2015-01-23 02:08:25 +11:00
Matthew Honnibal 45264e356b * Rename vec to repvec 2015-01-22 02:04:24 +11:00
Matthew Honnibal 6c7e44140b * Work on word vectors, and other stuff 2015-01-17 16:21:17 +11:00
Matthew Honnibal 46da3d74d2 * Tmp. Refactoring, introducing a Lexeme PyObject. 2015-01-12 11:23:44 +11:00
Matthew Honnibal ce2edd6312 * Tmp commit. Refactoring to create a Python Lexeme class. 2015-01-12 10:26:22 +11:00
Matthew Honnibal b8b65903fc * Tmp 2014-12-24 17:42:00 +11:00
Matthew Honnibal e1c1a4b868 * Tmp 2014-12-21 05:36:29 +11:00
Matthew Honnibal 780cbd68b1 * Move all struct definitions to structs.pxd, to avoid circular dependencies 2014-12-20 06:51:33 +11:00