1412 Commits

Author SHA1 Message Date
Matthew Honnibal d27899658e * Import classes in spacy.tokens.__init__ 2015-07-13 19:48:55 +02:00
Matthew Honnibal aa82caf8f5 * Add TokenC.spacy attr 2015-07-13 19:48:07 +02:00
Matthew Honnibal dba6b47d4e * Refactor monster tokens.pyx file, into a tokens/ subpackage. Try to break the cycle between Doc and Token, and remove the need to pass around a unicode string reference 2015-07-13 19:20:48 +02:00
Matthew Honnibal 5b0a7190c9 * Round-trip for serialization finally working. Needs a lot of optimization. 2015-07-13 18:39:38 +02:00
Matthew Honnibal edd371246c * Make huffman coder take BitArray in encode/decode. Add __iter__ method to BitArray. 2015-07-13 17:33:33 +02:00
Matthew Honnibal af5cc926a4 * Add codec property to Vocab, to use the Huffman encoding 2015-07-13 13:55:14 +02:00
Matthew Honnibal 77385d5580 * Make .pxd file for huffman codec 2015-07-13 13:54:51 +02:00
Matthew Honnibal 0628e0e2a8 * Add tests for huffman encoding 2015-07-13 12:58:07 +02:00
Matthew Honnibal 083b6ea7ae * Clean up encoder a bit. Now ready for integration into Vocab. 2015-07-13 12:57:22 +02:00
Matthew Honnibal 8d0f1d98da * Draft docstring for HuffmanCache 2015-07-13 12:01:18 +02:00
Matthew Honnibal 281f1faefb * Nearly finished huffman coder 2015-07-12 23:48:46 +02:00
Matthew Honnibal e1a25fba32 * Work on huffman coder 2015-07-12 19:58:05 +02:00
Matthew Honnibal 3fb9de2d13 * Remove vector[bint], in favor of simple Code struct. 2015-07-12 17:58:27 +02:00
Matthew Honnibal aa7bfd932b * Work on compressor 2015-07-12 16:03:43 +02:00
Matthew Honnibal 14eafcab15 * Refactor to use vector[bint] 2015-07-12 05:27:47 +02:00
Matthew Honnibal 6a6e852a39 * Refactor huffman coding stuff into class 2015-07-12 05:06:36 +02:00
Matthew Honnibal aad96fdb5c * Improve efficiency of huffman coding 2015-07-12 01:31:37 +02:00
Matthew Honnibal ff9ff6f3fa * Ensure unseen words are given low log probability 2015-07-12 01:31:09 +02:00
Matthew Honnibal 9d3b0d83de * Refactor huffman coding 2015-07-11 22:27:43 +02:00
Matthew Honnibal 8d29406cd6 * Rename span.right to span.rights 2015-07-11 22:15:04 +02:00
Matthew Honnibal da9f358166 * Fix span getting 2015-07-11 21:41:41 +02:00
Matthew Honnibal 11e8f2ffb4 * Huffman codes working 2015-07-11 20:01:10 +02:00
Matthew Honnibal cb6fc81909 * Work on huffman coding. 2015-07-11 15:23:35 +02:00
Matthew Honnibal 4c9b77fe95 * Begin working on serialization code 2015-07-11 10:57:30 +02:00
Matthew Honnibal 11a380e00f * Draft v0.89 update notes 2015-07-10 19:41:42 +02:00
Matthew Honnibal 53d1f5b2eb * Rename Span.head to Span.root. 2015-07-09 17:30:58 +02:00
Matthew Honnibal c0255ed7d8 * Allow slice indexing in Doc.__getitem__, returning a Span object 2015-07-09 15:15:32 +02:00
Matthew Honnibal 7d2964f673 * Test that whitespace is not assigned a tag 2015-07-09 13:31:40 +02:00
Matthew Honnibal b5223c4824 * Add whitespace to specials.json 2015-07-09 13:31:12 +02:00
Matthew Honnibal 89a91ad726 * Add SPACE part-of-speech tag, and train tagger to assign it. Also train tagger not to make whitespace an entity 2015-07-09 13:30:41 +02:00
Matthew Honnibal f95da0bd52 * Allow tests to read model dir from SPACY_DATA environment variable 2015-07-09 12:18:02 +02:00
Matthew Honnibal 55f1042443 * Improve efficiency of L and R features, correcting the non-linear-in-length problem. 2015-07-09 12:17:26 +02:00
Matthew Honnibal 70d2acb579 * Fix edge features 2015-07-09 12:15:01 +02:00
Matthew Honnibal 8a7bbd5850 * Announce v0.88 2015-07-09 12:12:45 +02:00
Matthew Honnibal 703ca40420 * Inc version 2015-07-08 20:07:23 +02:00
Matthew Honnibal adb868bdad * Add warning for models not found in parser 2015-07-08 20:04:55 +02:00
Matthew Honnibal 05b28ec9eb * Add warning for models not found in parser 2015-07-08 20:02:13 +02:00
Matthew Honnibal ef700401a6 * Add warning for models not found in parser 2015-07-08 20:00:46 +02:00
Matthew Honnibal 6218d8b389 * Add warning for models not found in parser 2015-07-08 19:59:16 +02:00
Matthew Honnibal f6a6c39ce8 * Add warning for models not found in parser 2015-07-08 19:52:30 +02:00
Matthew Honnibal 78db7e32f7 * Remove has_sense method from Lexeme declaration 2015-07-08 19:41:20 +02:00
Matthew Honnibal 6ddb2f5e45 * Restore merge_mwe in English class 2015-07-08 19:35:30 +02:00
Matthew Honnibal 6859f6adac * Restore merge_mwe in English class 2015-07-08 19:34:55 +02:00
Matthew Honnibal 3c270fc8ff * Remove has_sense method from Lexeme 2015-07-08 19:28:29 +02:00
Matthew Honnibal b64c843861 * Remove senses attr 2015-07-08 19:26:24 +02:00
Matthew Honnibal 1d3a592edf * Remove the senses attr from LexemeC, to keep data compatibility 2015-07-08 19:24:44 +02:00
Matthew Honnibal 0ceb1f71c2 * Update parse features 2015-07-08 19:11:36 +02:00
Matthew Honnibal 2e51b5027a * Alias Doc to Tokens, for backwards compatibility 2015-07-08 18:59:35 +02:00
Matthew Honnibal 462301d9e6 * Fix reference to Tokens in documentation 2015-07-08 18:58:25 +02:00
Matthew Honnibal e3c53f5ecd * Fix mention of Tokens in docstring 2015-07-08 18:56:27 +02:00