spaCy

Commit Graph

Author	SHA1	Message	Date
Matthew Honnibal	84e66ca6d4	WIP on stringstore change. 27 failures	2017-05-28 14:06:40 +02:00
Matthew Honnibal	f51e6a6c16	Adjust lexeme sizing for attr_t being 64 bit	2017-05-28 12:51:09 +02:00
Matthew Honnibal	3ea98e2043	Remove vector member from lexeme	2017-05-28 11:46:24 +02:00
Matthew Honnibal	793430aa7a	Get spaCy train command working with neural network: integrate models into pipeline; add basic serialization (maybe incorrect); fix pickle on vocab	2017-05-17 12:04:50 +02:00
Matthew Honnibal	58e83fe34b	Initial, limited support for quantified patterns in Matcher, and tracking of ent_id attribute in Token and Span. The quantifiers need a lot more testing, and there are some known problems. The main known problem is that the zero-plus and one-plus quantifiers won't work if a token can match both the quantified pattern expression AND the tail of the match.	2016-09-21 14:54:55 +02:00
Wolfgang Seeker	03fb498dbe	introduce lang field for LexemeC to hold language id; put noun_chunk logic into iterators.py for each language separately	2016-03-10 13:01:34 +01:00
Matthew Honnibal	9ec7b9c454	* Clean up unused Constituent struct.	2015-11-03 23:48:21 +11:00
Matthew Honnibal	1e99fcd413	* Rename .repvec to .vector in C API	2015-11-03 23:47:59 +11:00
Matthew Honnibal	7ac6cacc26	* Remove const qualifier on LexemeC.repvec	2015-09-15 14:42:51 +10:00
Matthew Honnibal	c2307fa9ee	* More work on language-generic parsing	2015-08-28 02:02:33 +02:00
Matthew Honnibal	1d7f2d3abc	* Hack on morphology structs	2015-08-26 19:18:36 +02:00
Matthew Honnibal	815bda201d	* Remove UniStr struct	2015-07-22 13:39:17 +02:00
Matthew Honnibal	128b6d9714	* Move Utf8Str struct to strings module, as that's the only place it's relevant	2015-07-20 12:06:41 +02:00
Matthew Honnibal	4dddc8a69b	* Fix type declarations for attr_t. Remove unused id_t.	2015-07-18 22:39:57 +02:00
Matthew Honnibal	95e57c2780	* Remove unnecessary key and id properties from Utf8String.	2015-07-17 01:40:18 +02:00
Matthew Honnibal	aa82caf8f5	* Add TokenC.spacy attr	2015-07-13 19:48:07 +02:00
Matthew Honnibal	1d3a592edf	* Remove the senses attr from LexemeC, to keep data compatibility	2015-07-08 19:24:44 +02:00
Matthew Honnibal	e23d1582a2	* Add supersense data to Lexeme objects. Add simple has_sense method to check the flag.	2015-07-01 18:50:37 +02:00
Matthew Honnibal	a7bf7b0626	* Rename sent_start to sent_end, to reflect its new usage in the Break transition	2015-06-23 05:39:43 +02:00
Matthew Honnibal	8ee7c541f1	* Update Constituent definition	2015-05-20 16:03:26 +02:00
Matthew Honnibal	03a6626545	* Tmp commit	2015-05-12 20:27:56 +02:00
Matthew Honnibal	d2ac8d8007	* Add ctnt field to State, in preparation for constituency parsing	2015-05-12 20:27:56 +02:00
Matthew Honnibal	d634038eb6	* Add l_edge and r_edge props in TokenC for tracking the parse-yield of the token	2015-05-12 20:26:41 +02:00
Jordan Suchow	3a8d9b37a6	Remove trailing whitespace	2015-04-19 13:01:38 -07:00
Matthew Honnibal	8057a95f20	* NER seems to be working, scoring 69 F. Need to add decision-history features --- currently only use current word, 2 words context. Need refactoring.	2015-03-26 16:44:44 +01:00
Matthew Honnibal	b3eda03c9c	* Tmp	2015-03-26 16:44:44 +01:00
Matthew Honnibal	135756ac3d	* Tmp commit of NER refactoring	2015-03-26 16:44:42 +01:00
Matthew Honnibal	b139aa92ba	* Start setting out how NER will be implemented in the data model	2015-03-26 16:44:41 +01:00
Matthew Honnibal	75f9b7d6bf	* Add L2 norm field to LexemeC struct	2015-02-07 08:43:17 -05:00
Matthew Honnibal	08ca5c8970	* Add sent_end flag to TokenC struct	2015-01-31 13:44:16 +11:00
Matthew Honnibal	12b034e3ef	* Move POS tag definitions to parts_of_speech.pxd	2015-01-25 16:31:07 +11:00
Matthew Honnibal	fda94271af	* Rename NORM1 and NORM2 attrs to lower and norm	2015-01-24 06:17:03 +11:00
Matthew Honnibal	5ed8b2b98f	* Rename sic to orth	2015-01-23 02:08:25 +11:00
Matthew Honnibal	45264e356b	* Rename vec to repvec	2015-01-22 02:04:24 +11:00
Matthew Honnibal	6c7e44140b	* Work on word vectors, and other stuff	2015-01-17 16:21:17 +11:00
Matthew Honnibal	46da3d74d2	* Tmp. Refactoring, introducing a Lexeme PyObject.	2015-01-12 11:23:44 +11:00
Matthew Honnibal	ce2edd6312	* Tmp commit. Refactoring to create a Python Lexeme class.	2015-01-12 10:26:22 +11:00
Matthew Honnibal	b8b65903fc	* Tmp	2014-12-24 17:42:00 +11:00
Matthew Honnibal	e1c1a4b868	* Tmp	2014-12-21 05:36:29 +11:00
Matthew Honnibal	780cbd68b1	* Move all struct definitions to structs.pxd, to avoid circular dependencies	2014-12-20 06:51:33 +11:00

40 Commits