spaCy

Commit Graph

Author	SHA1	Message	Date
Matthew Honnibal	ae78c9e3ce	* Implement character-based codec, so that we can do word/char backoff	2015-07-19 22:03:39 +02:00
Matthew Honnibal	cd1d047cb8	* Delete out-dated HuffmanCodec comment	2015-07-19 18:28:14 +02:00
Matthew Honnibal	b8086067d5	* Build Huffman codec from unsorted inputs	2015-07-19 17:58:44 +02:00
Matthew Honnibal	317cbbc015	* Serialization round trip now working with decent API, but with rough spots in the organisation and requiring vocabulary to be fixed ahead of time.	2015-07-19 15:18:17 +02:00
Matthew Honnibal	6b13e7227c	* Remove duplicate get_lex_attr method from doc.pyx	2015-07-18 22:46:07 +02:00
Matthew Honnibal	e49c7f1478	* Update oov check in tokenizer	2015-07-18 22:45:28 +02:00
Matthew Honnibal	cfd842769e	* Allow infix tokens to be variable length	2015-07-18 22:45:00 +02:00
Matthew Honnibal	5b4c78bbb2	* Use an AttributeCodec based on orth for words. Still no oov handling mechanism.	2015-07-18 22:43:18 +02:00
Matthew Honnibal	82d84b0f2b	* Index lexemes by orth, instead of a lexemes vector. Breaks the mechanism for deciding not to own LexemeC structs during parsing. Need to reinstate this.	2015-07-18 22:42:15 +02:00
Matthew Honnibal	4dddc8a69b	* Fix type declarations for attr_t. Remove unused id_t.	2015-07-18 22:39:57 +02:00
Matthew Honnibal	ced59ab9ea	* Make minor efficiency improvement in Doc.__iter__	2015-07-18 04:10:53 +02:00
Matthew Honnibal	cd91914dd8	* Fix hard-coded length	2015-07-18 04:09:56 +02:00
Matthew Honnibal	b1d74ce60d	* Remove unused joint.pyx and joint.pxd files	2015-07-17 23:31:44 +02:00
Matthew Honnibal	c27514512b	* Remove cruft ner/ directory	2015-07-17 23:24:32 +02:00
Matthew Honnibal	f8d6d319f4	* Remove cruft module	2015-07-17 23:23:05 +02:00
Matthew Honnibal	fb0a641a2d	* Don't release the gil around Parser.parse. Does this indicate thread problems?	2015-07-17 23:07:37 +02:00
Matthew Honnibal	e29daea85f	* Fix bint/int typing problem in TransitionSystem. In C++ bint* means bool, but in C it means int. So, type-casting to bint* is unsafe.	2015-07-17 22:37:24 +02:00
Matthew Honnibal	cf0c788892	* Tests passing on round-trip pack/unpack on basic example	2015-07-17 21:20:48 +02:00
Matthew Honnibal	44f39a876f	* Add a blank attrs.pyx	2015-07-17 16:40:42 +02:00
Matthew Honnibal	c2c83120d4	* Remove codec property from Vocab	2015-07-17 16:40:11 +02:00
Matthew Honnibal	dfdf19f6a9	* Draft a from_orth method for Doc	2015-07-17 16:39:54 +02:00
Matthew Honnibal	9e3f17051b	* Move to ORTH instead of ID for encoding lexemes. Basic tests of the codec wrappers now passing	2015-07-17 16:38:29 +02:00
Matthew Honnibal	15ff739996	* Fix passing of ID attribute in string store	2015-07-17 14:49:42 +02:00
Matthew Honnibal	95e57c2780	* Remove unnecessary key and id properties from Utf8String.	2015-07-17 01:40:18 +02:00
Matthew Honnibal	234c7e440a	* Add spacy/serialize/__init__ files	2015-07-17 01:37:33 +02:00
Matthew Honnibal	db9dfd2e23	* Major refactor of serialization. Nearly complete now.	2015-07-17 01:27:54 +02:00
Matthew Honnibal	c8282f9934	* Work on serialization. Needs more reorganisation	2015-07-16 19:56:02 +02:00
Matthew Honnibal	d8458d6a25	* Fix attr_id_t import in Spans	2015-07-16 19:55:21 +02:00
Matthew Honnibal	897de2d438	* Add 'bitter' property for serializer in English class	2015-07-16 17:47:53 +02:00
Matthew Honnibal	fb54052ae0	* Work on serializer design	2015-07-16 17:46:46 +02:00
Matthew Honnibal	a6f401580d	* Add from_array function to Doc.	2015-07-16 17:46:11 +02:00
Matthew Honnibal	2a5d050134	* Give codec loading back to Vocab.	2015-07-16 17:45:42 +02:00
Matthew Honnibal	8bf0f65f1c	* Remove dead code in strings.pyx	2015-07-16 17:35:53 +02:00
Matthew Honnibal	a9c3863665	* Fix inefficiency in StringStore.dump function	2015-07-16 17:34:32 +02:00
Matthew Honnibal	b59d271510	* Move serialization functionality into Serializer class	2015-07-16 11:23:48 +02:00
Matthew Honnibal	30be4f15da	* Import attrs from spacy.attrs, not spacy.typedefs	2015-07-16 11:23:25 +02:00
Matthew Honnibal	6c99e5f4aa	* Move serialization into Serializer class, with __call__ and train() api	2015-07-16 11:22:35 +02:00
Matthew Honnibal	e2133d990e	* Move serialization functionality out into a Serializer object	2015-07-16 11:21:44 +02:00
Matthew Honnibal	a6d040bd11	* Import Lexeme attrs from spacy.attrs, not spacy.typedefs	2015-07-16 11:20:08 +02:00
Matthew Honnibal	45ae1ce428	* Remove unused declaration in parser	2015-07-16 01:27:11 +02:00
Matthew Honnibal	efa80096f1	* Upd attrs id list	2015-07-16 01:26:54 +02:00
Matthew Honnibal	01fab6bb90	* Improve de/serialize functions	2015-07-16 01:26:35 +02:00
Matthew Honnibal	0e07c1ed2a	* draft de/serialization functions in doc.pyx	2015-07-16 01:16:33 +02:00
Matthew Honnibal	9d956b07e9	* Fix import of attrs in doc.pyx, and update the get_token_attr function.	2015-07-16 01:15:34 +02:00
Matthew Honnibal	65251e7625	* Remove redundant attr_id_t from typedefs.pxd	2015-07-16 00:58:51 +02:00
Matthew Honnibal	9a8db9743c	* Remove gil from parser.call	2015-07-14 23:47:33 +02:00
Matthew Honnibal	38ca0c33f5	Merge branch 'neuralnet' into refactor Mostly refactors parser, to use new thinc3.2 Example class. Aim is to remove use of shared memory, so that we can parallelize over documents easily. Conflicts: setup.py spacy/syntax/parser.pxd spacy/syntax/parser.pyx spacy/syntax/stateclass.pyx	2015-07-14 14:13:47 +02:00
Matthew Honnibal	935ac53ee3	* Extend count_by method	2015-07-14 03:20:09 +02:00
Matthew Honnibal	3b5baa660f	* Fix tokenizer	2015-07-14 00:10:51 +02:00
Matthew Honnibal	2ae0b439b2	* Fix space check in gold.pyx	2015-07-14 00:10:27 +02:00

878 Commits