Commit Graph

31 Commits

Author SHA1 Message Date
Matthew Honnibal 6161d2529a Merge branch 'master' of ssh://github.com/honnibal/spaCy 2015-11-03 13:36:30 +11:00
Matthew Honnibal f7dd377575 * Adjust conjuncts iterator in Token 2015-11-03 13:23:22 +11:00
Andreas Grivas d418f00eb1 fixed error when printing unicode 2015-11-02 20:23:18 +02:00
Andreas Grivas 93ada458e2 added __repr__ that prints text in ipython for doc, token, and span objects 2015-10-21 14:11:46 +03:00
Matthew Honnibal 9839cd2c0b * Fix whitespace_ calculation in Token 2015-10-18 17:21:11 +11:00
Matthew Honnibal 6e0f985afc * Fix token.conjuncts 2015-10-15 03:49:45 +11:00
Matthew Honnibal 2e0104ac81 * Fix token.conjuncts 2015-10-15 03:47:45 +11:00
Matthew Honnibal b8f3345a82 * Fix token.conjuncts method 2015-10-15 03:36:01 +11:00
Matthew Honnibal 23818f89b8 * Fix token.conjuncts method 2015-10-15 03:34:57 +11:00
Matthew Honnibal 94bafc1417 * Rename ATTR_IDS to attrs.IDS. Rename ATTR_NAMES to attrs.NAMES. Rename UNIV_POS_IDS to parts_of_speech.IDS 2015-10-10 17:57:29 +11:00
Matthew Honnibal f7283a5067 * Fix vectors bugs for OOV words 2015-09-22 02:10:25 +02:00
Matthew Honnibal 44aecba701 * Fix Token.has_vector and Lexeme.has_vector 2015-09-22 01:43:16 +02:00
Matthew Honnibal 596fde8daa * Add has_vector attribute to Token and Lexeme 2015-09-21 19:52:43 +10:00
Matthew Honnibal f32927efbf * Raise exceptions if attempt to access parse, but data is not installed. This partly but not fully addresses Issue #97. Still need exceptions on the various Token attributes that access the parse tree, e.g. token.head, token.lefts, token.rights, etc. Exceptions should be centralized, too. 2015-09-21 18:35:40 +10:00
Matthew Honnibal 388062ae01 * Fix repvec_length problem 2015-09-21 18:10:51 +10:00
Matthew Honnibal 193f127f81 * Fix ugly py_check_flag and py_set_flag functions in Lexeme 2015-09-15 13:06:18 +10:00
Matthew Honnibal 65dc0d1dfb * Extend word vectors support, with .similarity() function, vector_norm property, and rename repvec to vector. Keep repvec name as well for now for backwards compatibility. 2015-09-14 17:49:58 +10:00
Matthew Honnibal c08f10083c * Add test and test_with_ws attributes. 2015-09-13 10:27:42 +10:00
Matthew Honnibal 86c888667f * Merge in changes from de branch 2015-09-06 19:49:28 +02:00
Matthew Honnibal d2fc104a26 * Begin merge of Gazetteer and DE branches 2015-09-06 19:45:15 +02:00
Matthew Honnibal 7e4fea67d3 * Fix bug in token subtree, introduced by duplication of L/R code in Stateclass. Need to consolidate the two methods. 2015-09-06 10:48:36 +02:00
Matthew Honnibal 6f1743692a * Work on language-independent refactoring 2015-08-23 20:49:18 +02:00
Matthew Honnibal 01be34d55a * Whitespace 2015-08-08 23:37:44 +02:00
Matthew Honnibal 8e4c69ee8c * Add is_oov property, and fix up handling of attributes 2015-07-27 01:50:06 +02:00
Matthew Honnibal 6bb96c122d * Host IS_ flags in attrs.pxd, and add properties for them on Token and Lexeme objects 2015-07-26 16:37:16 +02:00
Matthew Honnibal 0bb839d299 * Fix string coercion for Python 3 2015-07-24 03:49:30 +02:00
Matthew Honnibal 30be4f15da * Import attrs from spacy.attrs, not spacy.typedefs 2015-07-16 11:23:25 +02:00
Matthew Honnibal 81aa4e6dcc * Go back to having token reference doc, instead of complicated gymnastics. Rename the attr 'doc', to expose it in the API 2015-07-14 00:10:11 +02:00
Matthew Honnibal 8214b74eec * Restore _py_tokens cache, to handle orphan tokens. 2015-07-13 22:28:10 +02:00
Matthew Honnibal 6eef0bf9ab * Break up tokens.pyx into tokens/doc.pyx, tokens/token.pyx, tokens/spans.pyx 2015-07-13 20:20:58 +02:00
Matthew Honnibal dba6b47d4e * Refactor monster tokens.pyx file, into a tokens/ subpackage. Try to break the cycle between Doc and Token, and remove the need to pass around a unicode string reference 2015-07-13 19:20:48 +02:00