Commit Graph

1686 Commits

Author SHA1 Message Date
Matthew Honnibal 62a01dd41d * Fix issue #92: lexemes.bin read error on 32-bit platforms. 2015-09-08 14:23:58 +02:00
Matthew Honnibal 55ed3b3a63 Merge pull request #85 from NSchrading/master: Add a script to generate the specials.json file 2015-09-07 09:05:19 +10:00
jxs8172 85f01c5e16 Add contributor agreement. Add exception to 'it' so that 'its' and 'Its' isn't generated (its =/= it's) 2015-08-24 18:20:06 -04:00
Matthew Honnibal 25f29232ca Merge pull request #86 from vsolovyov/fix-c-ext-in-setuppy: Correctly pass link_args in c_ext() in setup.py 2015-08-24 20:18:49 +10:00
Vsevolod Solovyov bbdb973398 Add contributor agreement for vsolovyov 2015-08-24 13:09:23 +03:00
Vsevolod Solovyov 39cfe28f33 Correctly pass link_args in c_ext() in setup.py 2015-08-24 12:52:05 +03:00
jxs8172 5876248109 Add missing we've and hardcoded 's and 'S 2015-08-21 22:57:47 -04:00
jxs8172 a5e0a0073b Add a script to generate the specials.json file, to take care of handling uppercase and missing apostrophe contractions 2015-08-21 22:39:33 -04:00
Matthew Honnibal bb910cff92 * Fix Python3 problem in align_raw 2015-07-28 16:06:53 +02:00
Matthew Honnibal dcafb181b9 * Fix Python3 problem in align_raw 2015-07-28 15:52:10 +02:00
Matthew Honnibal c609ea18f0 * Increment version in download script 2015-07-28 15:22:17 +02:00
Matthew Honnibal 9c4d0aae62 * Switch to better Python2/3 compatible unicode handling 2015-07-28 14:45:37 +02:00
Matthew Honnibal 7606d9936f * Python3 correction for GoldParse 2015-07-28 14:44:53 +02:00
Matthew Honnibal ddc1a5cfe5 * Fix training under python3 2015-07-28 14:09:30 +02:00
Matthew Honnibal a8bbd7312c * Hackishly patch long dependencies problem 2015-07-28 00:14:29 +02:00
Matthew Honnibal bb583f7f09 * Hackishly patch long dependencies problem 2015-07-27 23:14:33 +02:00
Matthew Honnibal b96bf9b8cc Merge branch 'master' of ssh://github.com/honnibal/spaCy 2015-07-27 22:57:48 +02:00
Matthew Honnibal aa7a964a4f * Add a type declaration for doc.from_array 2015-07-27 22:57:22 +02:00
Matthew Honnibal 9034f8a1cf * Update test_docs 2015-07-27 22:15:19 +02:00
Matthew Honnibal 25a8774f42 * Fix regression in packer 2015-07-27 21:53:38 +02:00
Matthew Honnibal 174ed1ad20 * Tighten the frequency filter in init_model 2015-07-27 21:44:51 +02:00
Matthew Honnibal 1601e488ee * Fix bug in decoding non-ascii characters 2015-07-27 21:43:58 +02:00
Matthew Honnibal 6deb1e84b6 * Upd serialization tests 2015-07-27 21:25:48 +02:00
Matthew Honnibal 6a95409cd2 * Fix type on bits 2015-07-27 21:16:49 +02:00
Matthew Honnibal a296d72b54 * Fix en/attrs 2015-07-27 21:16:33 +02:00
Matthew Honnibal 45460f505c * Fix data type on read32 in BitArray 2015-07-27 21:12:13 +02:00
Matthew Honnibal 3d43f49f69 * Revert prev change 2015-07-27 10:58:15 +02:00
Matthew Honnibal 6b586cdad4 * Change lexemes.bin format. Add a header specifying size of LexemeC and number of lexemes, and don't have the redundant orth information. 2015-07-27 08:31:51 +02:00
Matthew Honnibal 6047f2aa35 * Fix path to freqs.txt 2015-07-27 02:22:35 +02:00
Matthew Honnibal 4a0f40ec2d * Ensure data is packaged in vocab 2015-07-27 02:14:36 +02:00
Matthew Honnibal af6ed18f2a * Ensure we don't use orth_encode on OOV words. 2015-07-27 02:12:01 +02:00
Matthew Honnibal 912511f0aa * Update prebuild command, for shell bug 2015-07-27 01:52:04 +02:00
Matthew Honnibal b532f4eaa2 * Ensure serialize is packaged. 2015-07-27 01:51:37 +02:00
Matthew Honnibal 8535d872e8 * Set is_oov property in get_flags 2015-07-27 01:51:24 +02:00
Matthew Honnibal 0f4d0d51ab * Test is_oov property 2015-07-27 01:50:34 +02:00
Matthew Honnibal 8e4c69ee8c * Add is_oov property, and fix up handling of attributes 2015-07-27 01:50:06 +02:00
Matthew Honnibal fc268f03eb * Assert against null pointer exceptions in vocab 2015-07-27 01:00:10 +02:00
Matthew Honnibal 2b5cde87fd * Add prebuild command, to test clean builds 2015-07-26 22:40:04 +02:00
Matthew Honnibal 0368889d6c * Support gzipped frequencies in init_model 2015-07-26 22:39:22 +02:00
Matthew Honnibal 62da5eb338 * Inc version 2015-07-26 22:22:54 +02:00
Matthew Honnibal b997b1122b * Mark test_io as requiring the model 2015-07-26 21:36:22 +02:00
Matthew Honnibal 0f093fdb30 * Fix get_by_orth for py3 2015-07-26 19:26:41 +02:00
Matthew Honnibal ceeda5a739 * Fix get_by_orth for py3 2015-07-26 18:39:27 +02:00
Matthew Honnibal 5c9b8d05e4 * Upd test_docs 2015-07-26 17:41:13 +02:00
Matthew Honnibal 609f729cc5 * Fix infix test 2015-07-26 17:32:55 +02:00
Matthew Honnibal 3cfe3d8c1c * Revert bad infix change 2015-07-26 17:32:37 +02:00
Matthew Honnibal 460b4c3207 * Add more infix tests 2015-07-26 17:30:34 +02:00
Matthew Honnibal bd608559bc * Fix infix-period tokenization 2015-07-26 17:14:52 +02:00
Matthew Honnibal 94f314c271 * Fix tokenization of email addresses. 2015-07-26 16:38:08 +02:00
Matthew Honnibal 48a4d15264 * Test token properties 2015-07-26 16:37:39 +02:00