64 Commits

Author SHA1 Message Date
Matthew Honnibal 66766c1454 Restore SP tag to English tag_map, until models migrate 2017-10-24 17:05:00 +02:00
ines 8492d5be6d Always make lemmatizer return a list of lemmas, not a set 2017-10-24 16:00:56 +02:00
Matthew Honnibal 49895fbef6 Rename 'SP' special tag to '_SP'
Renaming the tag with an underscore lets us add it to the tag map
without worrying that we'll change the sequence of tags, which throws
off the tag-to-ID mapping. For instance, if we inserted a 'SP' tag,
the "VERB" tag is pushed to a different class ID, and the model is all
messed up.
2017-10-20 14:01:12 +02:00
Matthew Honnibal 506cf2eb13 Remove cpdef enum, to avoid too much code generation 2017-10-20 14:00:23 +02:00
ines 6dd14dc342 Add lookup lemmas to tokens without POS tags 2017-10-11 13:27:10 +02:00
Matthew Honnibal 17c467e0ab Avoid clobbering existing lemmas 2017-10-11 03:33:06 -05:00
Matthew Honnibal d528b6e36d Add assign_untagged method in Morphology 2017-10-11 03:22:49 +02:00
Matthew Honnibal 72bbcc0871 Handle lemmatization for unknown string IDs 2017-09-24 05:01:31 -05:00
Matthew Honnibal b78cc318c3 Fix loading of morphology exceptions 2017-06-04 16:34:32 -05:00
Matthew Honnibal 805495af27 Fix off-by-one in number of tags 2017-06-03 13:29:23 -05:00
Matthew Honnibal 11840ff5dd Store tag map before normalizing props 2017-05-29 17:53:48 -05:00
Matthew Honnibal fe11564b8e Finish stringstore change. Also xfail vectors tests 2017-05-28 15:10:22 +02:00
Matthew Honnibal 84e66ca6d4 WIP on stringstore change. 27 failures 2017-05-28 14:06:40 +02:00
ines d24589aa72 Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
ines 561f2a3eb4 Use consistent formatting for docstrings 2017-04-15 11:59:21 +02:00
Matthew Honnibal c748907a66 Fix errors in previous commit 2017-03-25 22:25:01 +01:00
Matthew Honnibal 850d35dcb3 Make morphology use int attributes internally
The morphology class was calling the lemmatizer inconsistently,
with some string-valued attributes. This caused Issue #903.
2017-03-25 21:49:10 +01:00
Raphaël Bournhonesque f332bf05be Remove unused import statements 2017-03-21 21:08:54 +01:00
Roman Inflianskas 66e1109b53 Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
Matthew Honnibal 95a52005df Revert "Fix Issue #683: Add 'SP' to tag_map, if it's not there already, within the Morphology class."
This reverts commit 40e71586d6.
2017-01-09 09:55:55 -06:00
Matthew Honnibal 40e71586d6 Fix Issue #683: Add 'SP' to tag_map, if it's not there already, within the Morphology class. 2016-12-18 23:44:05 +01:00
Matthew Honnibal 813249f826 Work on morphology class. Still not fully consistent with rest of library. 2016-12-18 17:35:22 +01:00
Matthew Honnibal 837a5d4100 Update morphology class so that exceptions can be added one-by-one, and so that arbitrary attributes can be referenced. 2016-12-18 16:49:46 +01:00
Matthew Honnibal e6fc4afb04 Whitespace 2016-12-18 15:48:00 +01:00
Matthew Honnibal 57c4341453 Refactor loading of morphology exceptions, adding a method add_special_case. 2016-12-18 14:59:44 +01:00
Ines Montani 8350d65695 Change morphology and lemmatizer API
Take morphology features as object instead of keyword arguments
2016-12-07 21:12:49 +01:00
Matthew Honnibal 1fb09c3dc1 Fix morphology tagger 2016-11-04 19:19:09 +01:00
Matthew Honnibal 6e37ba1d82 Fix #602, #603 --- Broken build 2016-11-04 09:54:24 +01:00
Matthew Honnibal 293c79c09a Fix #595: Lemmatization was incorrect for base forms, because morphological analyser wasn't adding morphology properly. 2016-11-04 00:29:07 +01:00
Matthew Honnibal 07776d8096 Fix pos name conflict in lemmatize 2016-09-27 17:35:58 +02:00
Matthew Honnibal bb4f201ad2 Pass morphological features from tag map into the lemmatizer. 2016-09-27 14:01:43 +02:00
Matthew Honnibal 7abe653223 * Fix imports 2016-01-19 03:36:51 +01:00
Matthew Honnibal 590f38bdb2 * Add hacky solution to Issue #220. Currently specials.json only supports literal patterns, which doesn't allow us to pre-tag whitespace with the correct token, SP, as a rule. The data-driven approach should be easy but for some reason fails here. Adding a hard code in Morphology isn't a good solution, but we do want to fix the behaviour right away, and don't want to wait for an architecturally better solution. 2016-01-19 03:35:20 +01:00
Matthew Honnibal 9d1b2a103a * Fix capitalization in lemmatizer 2015-11-06 05:44:35 +11:00
Matthew Honnibal 5b2af4864f * When lemmatizing non-noun, non-verb, non-adj words, output lower-case 2015-11-06 00:45:09 +11:00
Matthew Honnibal dde9e1357c * Add todo to morphology.lemmatize 2015-11-03 18:54:35 +11:00
Matthew Honnibal 833eb35c57 * Fix tag assignment in doc.from_array 2015-11-03 18:45:54 +11:00
Matthew Honnibal 5ca57bd859 * Ensure Morphology can be pickled, to address Issue #125. 2015-10-13 13:44:41 +11:00
Matthew Honnibal 278e12f7e8 * Add morphology symbols to morphology. May need to remove these as an enum. 2015-10-13 13:44:40 +11:00
Matthew Honnibal 74c0853471 * Rename ATTR_IDS to attrs.IDS. Rename ATTR_NAMES to attrs.NAMES. Rename UNIV_POS_IDS to parts_of_speech.IDS 2015-10-13 13:44:39 +11:00
Matthew Honnibal 2d9e5bf566 * Allow punctuation to be lemmatized 2015-10-09 19:02:42 +11:00
Matthew Honnibal b3a70e6375 * Clean up unnecessary try/except block 2015-10-08 14:34:11 +11:00
Matthew Honnibal 85c3fec1d1 * Fix morphology loading 2015-09-10 14:52:23 +02:00
Matthew Honnibal 31ccf494e6 Merge branch 'develop' of https://github.com/honnibal/spaCy into develop 2015-09-09 14:33:38 +02:00
Matthew Honnibal 0b527fbdc8 * Set POS tag in morphology 2015-09-09 14:30:24 +02:00
Matthew Honnibal 2be3620333 * Save morphological analyses in a cache 2015-09-08 15:39:24 +02:00
Matthew Honnibal 9eae9837c4 * Fix morphology look up 2015-09-06 17:53:39 +02:00
Matthew Honnibal 534e3dda3c * More work on language independent parsing 2015-08-28 03:44:54 +02:00
Matthew Honnibal c2307fa9ee * More work on language-generic parsing 2015-08-28 02:02:33 +02:00
Matthew Honnibal 86c4a8e3e2 * Work on new morphology organization 2015-08-27 23:11:51 +02:00
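
Note on commit 49895fbef6 (Rename 'SP' special tag to '_SP'): the message says that inserting an 'SP' tag pushes "VERB" to a different class ID, while the underscore-prefixed name does not. The sketch below illustrates one way that can happen. It assumes class IDs are assigned by position in the alphabetically sorted tag names, which the commit message does not state outright. The helper `tag_ids` and the three-tag set are hypothetical and used only for illustration.

```python
def tag_ids(tag_names):
    """Map each tag to a class ID by its position in sorted order.

    Assumption: IDs follow the sorted order of tag names. This is an
    illustrative scheme, not confirmed by the commit log.
    """
    return {tag: i for i, tag in enumerate(sorted(tag_names))}


# Hypothetical, abbreviated tag set.
base = ["ADJ", "NOUN", "VERB"]

before = tag_ids(base)
with_sp = tag_ids(base + ["SP"])              # 'SP' sorts before 'VERB'
with_underscore_sp = tag_ids(base + ["_SP"])  # '_' sorts after 'A'-'Z' in ASCII

print(before["VERB"], with_sp["VERB"], with_underscore_sp["VERB"])
```

Under this assumption, a model trained against the `before` mapping would see "VERB" under a different ID once 'SP' is added, whereas '_SP' is appended after every uppercase tag and leaves the existing IDs untouched.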