Commit Graph

1708 Commits

Author SHA1 Message Date
Matthew Honnibal 5c3c962038 * Add html to gazetteer 2015-08-06 16:34:51 +02:00
Matthew Honnibal 10d869d102 * Don't allow conjunction between NPs in base NP chunks 2015-08-06 16:31:53 +02:00
Matthew Honnibal 8b8df851ca * Fix print statement in test_merge 2015-08-06 16:28:31 +02:00
Matthew Honnibal 383dfabd67 * Fix matcher setting of entities 2015-08-06 16:27:01 +02:00
Matthew Honnibal 91a94e152b * Make initial gazetteer 2015-08-06 16:10:04 +02:00
Matthew Honnibal 2767979135 * Update matcher tests 2015-08-06 16:09:28 +02:00
Matthew Honnibal 59c3bf60a6 * Ensure entity recognizer doesn't over-write preset types 2015-08-06 16:09:08 +02:00
Matthew Honnibal cd7d1682cd * Fix loading of gazetteer.json file 2015-08-06 16:08:25 +02:00
Matthew Honnibal 9c667b7f15 * Set a value in attrs.pxd on the first flag, to reduce bugs 2015-08-06 16:08:04 +02:00
Matthew Honnibal c263577424 * Fix lower attribute in lexeme.pxd 2015-08-06 16:07:41 +02:00
Matthew Honnibal 3ecacb9635 * Copy gazetteer file in init_model 2015-08-06 16:07:23 +02:00
Matthew Honnibal faf75dfcb9 * Update matcher tests 2015-08-06 14:33:35 +02:00
Matthew Honnibal 5737115e1e * Work on gazetteer matching 2015-08-06 14:33:21 +02:00
Matthew Honnibal 9c1724ecae * Gazetteer stuff working, now need to wire up to API 2015-08-06 00:35:40 +02:00
Matthew Honnibal 47db3067a0 * Compile spacy.matcher 2015-08-05 23:48:11 +02:00
Matthew Honnibal 5bc0e83f9a * Reimplement matching in Cython, instead of Python. 2015-08-05 01:05:54 +02:00
Matthew Honnibal 4c87a696b3 * Add draft dfa matcher, in Python. Passing tests. 2015-08-04 15:55:28 +02:00
Matthew Honnibal eb7138c761 * Add attr relation in base NP detection 2015-08-01 00:34:40 +02:00
Matthew Honnibal 4988356cf0 * Fix dependency type bug from merged tokens 2015-08-01 00:33:24 +02:00
Matthew Honnibal af84669306 * Add smart-quote possessive marker to tokenizer 2015-07-30 05:12:48 +02:00
Matthew Honnibal 78a9068319 * Fix spacy attr on merged tokens 2015-07-30 04:25:58 +02:00
Matthew Honnibal 430e2edb96 * Fix noun_chunks issue 2015-07-30 03:51:50 +02:00
Matthew Honnibal 9590968fc1 * Fix negative indices in Span 2015-07-30 02:30:24 +02:00
Matthew Honnibal 74d8cb3980 * Add noun_chunks iterator, and fix left/right child setting in Doc.merge 2015-07-30 02:29:49 +02:00
Matthew Honnibal d153f18969 * Fix negative indices on spans 2015-07-29 22:36:03 +02:00
Matthew Honnibal 320836e346 * Move string description further down for token, and highlight that it includes trailing whitespace 2015-07-28 21:05:08 +02:00
Matthew Honnibal d17a15ae66 * Add test to check parse is being deserialized properly 2015-07-28 21:04:00 +02:00
Matthew Honnibal b5132bed7d * Set left and right children when loading parse from byte string 2015-07-28 21:03:18 +02:00
Matthew Honnibal 6609fcf4b2 * Make mem and vocab python-visible in Doc 2015-07-28 20:46:59 +02:00
Matthew Honnibal d42fe2e694 * Add unicode_literals to strings.pyx 2015-07-28 16:15:53 +02:00
Matthew Honnibal bb910cff92 * Fix Python3 problem in align_raw 2015-07-28 16:06:53 +02:00
Matthew Honnibal dcafb181b9 * Fix Python3 problem in align_raw 2015-07-28 15:52:10 +02:00
Matthew Honnibal c609ea18f0 * Increment version in download script 2015-07-28 15:22:17 +02:00
Matthew Honnibal 9c4d0aae62 * Switch to better Python2/3 compatible unicode handling 2015-07-28 14:45:37 +02:00
Matthew Honnibal 7606d9936f * Python3 correction for GoldParse 2015-07-28 14:44:53 +02:00
Matthew Honnibal ddc1a5cfe5 * Fix training under python3 2015-07-28 14:09:30 +02:00
Matthew Honnibal a8bbd7312c * Hackishly patch long dependencies problem 2015-07-28 00:14:29 +02:00
Matthew Honnibal bb583f7f09 * Hackishly patch long dependencies problem 2015-07-27 23:14:33 +02:00
Matthew Honnibal b96bf9b8cc Merge branch 'master' of ssh://github.com/honnibal/spaCy 2015-07-27 22:57:48 +02:00
Matthew Honnibal aa7a964a4f * Add a type declaration for doc.from_array 2015-07-27 22:57:22 +02:00
Matthew Honnibal 9034f8a1cf * Update test_docs 2015-07-27 22:15:19 +02:00
Matthew Honnibal 25a8774f42 * Fix regression in packer 2015-07-27 21:53:38 +02:00
Matthew Honnibal 174ed1ad20 * Tighten the frequency filter in init_model 2015-07-27 21:44:51 +02:00
Matthew Honnibal 1601e488ee * Fix bug in decoding non-ascii characters 2015-07-27 21:43:58 +02:00
Matthew Honnibal 6deb1e84b6 * Upd serialization tests 2015-07-27 21:25:48 +02:00
Matthew Honnibal 6a95409cd2 * Fix type on bits 2015-07-27 21:16:49 +02:00
Matthew Honnibal a296d72b54 * Fix en/attrs 2015-07-27 21:16:33 +02:00
Matthew Honnibal 45460f505c * Fix data type on read32 in BitArray 2015-07-27 21:12:13 +02:00
Matthew Honnibal 3d43f49f69 * Revert prev change 2015-07-27 10:58:15 +02:00
Matthew Honnibal 6b586cdad4 * Change lexemes.bin format. Add a header specifying size of LexemeC and number of lexemes, and don't have the redundant orth information. 2015-07-27 08:31:51 +02:00