2392 Commits

Author SHA1 Message Date
Ines Montani a2526e66d8 Fix formatting, naming and unicode declaration 2017-01-12 16:51:13 +01:00
Ines Montani 052cdff07d Modernise vector similarity tests 2017-01-12 16:51:13 +01:00
Ines Montani bd20ec0a6a Add get_cosine util function 2017-01-12 16:51:13 +01:00
Ines Montani 51ef75f629 Fix regression test for #615 and remove unnecessary imports 2017-01-12 16:51:12 +01:00
Ines Montani aeb747e10c Adjust formatting 2017-01-12 16:51:12 +01:00
Ines Montani 8e3e58a7e6 Modernise and merge lexeme vocab tests 2017-01-12 16:51:12 +01:00
Ines Montani c3d4516fc2 Move test for #361 to regression tests 2017-01-12 16:51:12 +01:00
Daniel Hershcovich 99eb494a82 Fix #737: support loading word vectors with " " as a word 2017-01-12 17:00:14 +02:00
Ines Montani 7cb3d74426 Modernise span tests and don't depend on models 2017-01-12 15:30:49 +01:00
Ines Montani 92e3d8b3ee Modernise vocab API tests and remove old xfailing tests 2017-01-12 15:27:46 +01:00
Ines Montani 7ea87684cd Rename test_vocab.py to test_vocab_api.py 2017-01-12 15:12:21 +01:00
Ines Montani 0da2ee5c68 Merge flag features tests into orth tests in tests root 2017-01-12 15:12:00 +01:00
Ines Montani 03c136cfd3 Remove StringStore tests from vocab tests 2017-01-12 15:11:15 +01:00
Ines Montani d7bd57abdf Modernise add vectors vocab test 2017-01-12 15:09:49 +01:00
Ines Montani 89525ef345 Use consistent test names 2017-01-12 15:09:21 +01:00
Ines Montani f8803808ce Remove old unused tests and conftest files 2017-01-12 15:09:05 +01:00
Ines Montani 4d0bfebcd9 Move Pragmatic Segmenter test cases (currently unused) to parser tests 2017-01-12 15:08:02 +01:00
Ines Montani 26d018d874 Add tests for StringStore 2017-01-12 15:07:31 +01:00
Ines Montani 9b6784bab5 Add fixture for StringStore 2017-01-12 15:05:40 +01:00
Ines Montani 99d66d613a Modernise tests for merging spans and don't depend on models 2017-01-12 12:26:26 +01:00
Ines Montani fa8f67596d Remove unused old test 2017-01-12 12:26:08 +01:00
Ines Montani 359f73a96b Move test for #54 to regression tests 2017-01-12 12:25:51 +01:00
Ines Montani 3f3a46722c Remove unused conftest 2017-01-12 12:25:24 +01:00
Ines Montani c2406e92bc Allow setting ents in get_doc 2017-01-12 12:25:10 +01:00
Ines Montani c5914c6fe5 Fix and pass regression test for #736 2017-01-12 11:48:56 +01:00
Matthew Honnibal 4e48862fa8 Remove print statement 2017-01-12 11:25:39 +01:00
Matthew Honnibal d1d8214767 Increment version 2017-01-12 11:21:57 +01:00
Matthew Honnibal fba67fa342 Fix Issue #736: Times were being tokenized with incorrect string values. 2017-01-12 11:21:01 +01:00
Ines Montani a6790b6694 Rename tags to pos in get_doc and allow adding tags to tokens 2017-01-12 11:18:36 +01:00
Ines Montani 1add8ace67 Merge lemmatizer tests 2017-01-12 11:16:53 +01:00
Ines Montani 3bc082abdf Modernise morph exceptions test and don't depend on models 2017-01-12 11:14:29 +01:00
Ines Montani ec7739b76e Add regression test for #736 2017-01-12 11:12:44 +01:00
Ines Montani 6c1c564891 Move language-specific tests out of redundant tokenizer directories 2017-01-12 02:17:18 +01:00
Ines Montani 8fecedac3a Tidy up 2017-01-12 02:16:37 +01:00
Ines Montani ae7edd30e7 Move text file back to tokenizer tests directory 2017-01-12 02:10:23 +01:00
Ines Montani ffcaba9017 Remove old and/or redundant tests 2017-01-12 02:10:18 +01:00
Ines Montani 19c4132097 Modernise space attachment parser tests and don't depend on models 2017-01-12 01:54:44 +01:00
Ines Montani 69778924c8 Modernise and merge parser tests and don't depend on models 2017-01-12 01:07:29 +01:00
Ines Montani 178c147612 Modernise nonprojectivity tests and don't depend on models 2017-01-12 01:06:36 +01:00
Ines Montani 1a3984742c Modernise sentence boundary detection tests and don't depend on models (where possible) 2017-01-11 23:53:08 +01:00
Ines Montani 0cdb6ea61d Remove old unused pickle test 2017-01-11 23:52:28 +01:00
Ines Montani c9671329dc Move test for #309 to regression tests 2017-01-11 23:52:13 +01:00
Ines Montani d0e37b5670 Modernise parser tests and don't depend on models 2017-01-11 21:30:27 +01:00
Ines Montani 342cb41782 Add apply_transition_sequence util function to utils 2017-01-11 21:30:14 +01:00
Ines Montani 09807addff Add en_parser fixture 2017-01-11 21:29:59 +01:00
Ines Montani 55d151aa61 Modernise Doc parse tree navigation tests and don't depend on models 2017-01-11 21:14:15 +01:00
Ines Montani 7262421bb2 Use consistent test names 2017-01-11 19:00:52 +01:00
Ines Montani 33800c9367 Rename "tokens" tests to "doc" 2017-01-11 18:59:01 +01:00
Ines Montani 3a9c6a9563 Remove old unused files 2017-01-11 18:58:38 +01:00
Ines Montani 8e962de39f Remove old word vector tests 2017-01-11 18:55:08 +01:00
Ines Montani e027936920 Modernise Doc noun chunks tests 2017-01-11 18:54:56 +01:00
Ines Montani 439f396acd Modernise Doc array tests and don't depend on models 2017-01-11 18:54:46 +01:00
Ines Montani 05447be884 Modernise test for adding entities 2017-01-11 18:54:24 +01:00
Ines Montani 6e883f4c00 Modernise Doc API tests and don't depend on models 2017-01-11 18:05:36 +01:00
Ines Montani 8bf3bb5c44 Make words optional for get_doc 2017-01-11 18:05:10 +01:00
Ines Montani 928db7e419 Fix StringIO import for Python 3 2017-01-11 14:07:48 +01:00
Ines Montani 69998f216b Rename test_tokens_api.py to test_doc_api.py 2017-01-11 13:58:56 +01:00
Ines Montani d94dea1b18 Merge token tests into token API tests 2017-01-11 13:57:02 +01:00
Ines Montani eb23424ab0 Modernise token API tests and don't depend on loading models 2017-01-11 13:56:54 +01:00
Ines Montani c682b8ca90 Merge conftests into one cohesive file 2017-01-11 13:56:32 +01:00
Ines Montani 909f24d7df Add test utils and get_doc helper function
Create Doc object from given vocab, words and annotations to allow
tests not to depend on loading the models.
2017-01-11 13:55:33 +01:00
Matthew Honnibal e12c90e03f Merge branch 'master' of ssh://github.com/explosion/spaCy 2017-01-11 13:03:51 +01:00
Matthew Honnibal 12cd27b821 Amend 8ae8b443f: Handle comparison with None tokens. 2017-01-11 13:03:32 +01:00
Daniel Hershcovich 8e603cc917 Avoid "True if ... else False" 2017-01-11 11:18:22 +02:00
Matthew Honnibal 44e2b0100d Support TAG attribute in doc.from_array 2017-01-10 22:47:07 +01:00
Ines Montani 3e6e1f0251 Tidy up regression tests 2017-01-10 19:24:10 +01:00
Magnus Burton aad23ab0b4 Supplemented with capitalized Swedish exceptions 2017-01-10 16:07:20 +01:00
Ines Montani 869963c3c4 Mark extensive prefix/suffix tests as slow 2017-01-10 15:57:35 +01:00
Ines Montani 487e020ebe Add simple test for surrounding brackets 2017-01-10 15:57:26 +01:00
Ines Montani 0ba5cf51d2 Assert length first 2017-01-10 15:57:00 +01:00
Ines Montani 2185d31907 Adjust names and formatting 2017-01-10 15:56:35 +01:00
Ines Montani e10d4ca964 Remove semi-redundant URLs and punctuation for faster testing 2017-01-10 15:54:25 +01:00
Ines Montani 3a3cb2c90c Add unicode declaration 2017-01-10 15:53:15 +01:00
Matthew Honnibal 0f9b8a00a5 Unbreak data download 2017-01-09 23:40:26 +01:00
Matthew Honnibal 8ae8b443f1 Add richcmp method to Token. Closes #631 2017-01-09 19:30:31 +01:00
Matthew Honnibal 64f747cb65 Token comparison test 2017-01-09 19:12:00 +01:00
Matthew Honnibal 18c3c2d05c Add tests for token comparison, re Issue #631 2017-01-09 19:09:59 +01:00
Matthew Honnibal 97a1286129 Revert changes to tagger and parser for thinc 6 2017-01-09 10:08:34 -06:00
Matthew Honnibal 95a52005df Revert "Fix Issue #683: Add 'SP' to tag_map, if it's not there already, within the Morphology class."
This reverts commit 40e71586d6.
2017-01-09 09:55:55 -06:00
Ines Montani 363f09e68c Merge pull request #726 from magnusburton/master
Added Swedish abbreviations as token exceptions
2017-01-09 14:58:15 +01:00
Matthew Honnibal 42cd598f57 Use correct fixtures in URL tokenizer 2017-01-09 14:10:40 +01:00
Matthew Honnibal d9a77ddf14 Return None for data path if it doesn't exist 2017-01-09 14:10:05 +01:00
Matthew Honnibal e4862d1dab Merge branch 'develop' 2017-01-09 13:36:01 +01:00
Ines Montani aa876884f0 Revert "Revert "Merge remote-tracking branch 'origin/master'""
This reverts commit fb9d3bb022.
2017-01-09 13:28:13 +01:00
Ines Montani d5c72c40eb Remove old tests for old website example code 2017-01-08 22:28:53 +01:00
Ines Montani eef94e3ee2 Split off period after two or more uppercase letters (fixes #483) 2017-01-08 22:28:25 +01:00
Ines Montani a89a6000e5 Remove unused import 2017-01-08 22:17:37 +01:00
Ines Montani 5d28664fc5 Don't test Hungarian for numbers and hyphens for now
Reinvestigate behaviour of case affixes given reorganised tokenizer
patterns.
2017-01-08 20:45:40 +01:00
Ines Montani 53362b6b93 Reorganise Hungarian prefixes/suffixes/infixes
Use global prefixes and suffixes for non-language-specific rules,
import list of alpha unicode characters and adjust regexes.
2017-01-08 20:40:33 +01:00
Ines Montani 347c4a2d06 Reorganise and reformat global tokenizer prefixes, suffixes and infixes 2017-01-08 20:37:39 +01:00
Ines Montani 0dec90e9f7 Use global abbreviation data languages and remove duplicates 2017-01-08 20:36:00 +01:00
Ines Montani 7c3cb2a652 Add global abbreviations data 2017-01-08 20:34:03 +01:00
Ines Montani de5aa92bc2 Handle deprecated tokenizer prefix data 2017-01-08 20:33:28 +01:00
Ines Montani abb09782f9 Move sun.txt to original location and fix path to not break parser tests 2017-01-08 20:32:54 +01:00
Ines Montani cab39c59c5 Add missing contractions to English tokenizer exceptions
Inspired by
https://github.com/kootenpv/contractions/blob/master/contractions/__init__.py
2017-01-05 19:59:06 +01:00
Ines Montani a23504fe07 Move abbreviations below other exceptions 2017-01-05 19:58:07 +01:00
Ines Montani 7d2cf934b9 Generate he/she/it correctly with 's instead of 've 2017-01-05 19:57:00 +01:00
Ines Montani 8328925e1f Add newlines to long German text 2017-01-05 18:13:30 +01:00
Ines Montani 55b46d7cf6 Add tokenizer tests for German 2017-01-05 18:11:25 +01:00
Ines Montani 5bb4081f52 Remove redundant test_tokenizer.py for English 2017-01-05 18:11:11 +01:00
Ines Montani 8216ba599b Add tests for longer and mixed English texts 2017-01-05 18:11:04 +01:00
Ines Montani 65f937d5c6 Move basic contraction tests to test_contractions.py 2017-01-05 18:09:53 +01:00
Ines Montani bbe7cab3a1 Move non-English-specific tests back to general tokenizer tests 2017-01-05 18:09:29 +01:00
Ines Montani 038002d616 Reformat HU tokenizer tests and adapt to general style
Improve readability of test cases and add conftest.py with fixture
2017-01-05 18:06:44 +01:00
Ines Montani bc911322b3 Move ") to emoticons (see Tweebo challenge test) 2017-01-05 18:05:38 +01:00
Ines Montani 637f785036 Add general sanity tests for all tokenizers 2017-01-05 16:25:38 +01:00
Ines Montani c5f2dc15de Move English tokenizer tests to directory /en 2017-01-05 16:25:04 +01:00
Ines Montani 8b45363b4d Modernize and merge general tokenizer tests 2017-01-05 13:17:05 +01:00
Ines Montani 02cfda48c9 Modernize and merge tokenizer tests for string loading 2017-01-05 13:16:55 +01:00
Ines Montani a11f684822 Modernize and merge tokenizer tests for whitespace 2017-01-05 13:16:33 +01:00
Ines Montani 8b284fc6f1 Modernize and merge tokenizer tests for text from file 2017-01-05 13:15:52 +01:00
Ines Montani 2c2e878653 Modernize and merge tokenizer tests for punctuation 2017-01-05 13:14:16 +01:00
Ines Montani 8a74129cdf Modernize and merge tokenizer tests for prefixes/suffixes/infixes 2017-01-05 13:13:12 +01:00
Ines Montani 0e65dca9a5 Modernize and merge tokenizer tests for exception and emoticons 2017-01-05 13:11:31 +01:00
Ines Montani 34c47bb20d Fix formatting 2017-01-05 13:10:51 +01:00
Ines Montani 2e72683baa Add missing docstrings 2017-01-05 13:10:21 +01:00
Ines Montani da10a049a6 Add unicode declarations 2017-01-05 13:09:48 +01:00
Ines Montani 58adae8774 Remove unused file 2017-01-05 13:09:22 +01:00
Ines Montani c6e5a5349d Move regression test for #360 into own file 2017-01-04 00:49:31 +01:00
Ines Montani 8279993a6f Modernize and merge tokenizer tests for punctuation 2017-01-04 00:49:20 +01:00
Ines Montani 550630df73 Update tokenizer tests for contractions 2017-01-04 00:48:42 +01:00
Ines Montani 109f202e8f Update conftest fixture 2017-01-04 00:48:21 +01:00
Ines Montani ee6b49b293 Modernize tokenizer tests for emoticons 2017-01-04 00:47:59 +01:00
Ines Montani f09b5a5dfd Modernize tokenizer tests for infixes 2017-01-04 00:47:42 +01:00
Ines Montani 59059fed27 Move regression test for #351 to own file 2017-01-04 00:47:11 +01:00
Ines Montani 667051375d Modernize tokenizer tests for whitespace 2017-01-04 00:46:35 +01:00
Ines Montani aafc894285 Modernize tokenizer tests for contractions
Use @pytest.mark.parametrize.
2017-01-03 23:02:21 +01:00
Ines Montani 1d237664af Add lowercase lemma to tokenizer exceptions 2017-01-03 23:02:21 +01:00
Ines Montani 84a87951eb Fix typos 2017-01-03 18:27:43 +01:00
Ines Montani 35b39f53c3 Reorganise English tokenizer exceptions (as discussed in #718)
Add logic to generate exceptions that follow a consistent pattern (like
verbs and pronouns) and allow certain tokens to be excluded explicitly.
2017-01-03 18:26:09 +01:00
Ines Montani fb9d3bb022 Revert "Merge remote-tracking branch 'origin/master'"
This reverts commit d3b181cdf1, reversing
changes made to b19cfcc144.
2017-01-03 18:21:36 +01:00
Ines Montani 461cbb99d8 Revert "Reorganise English tokenizer exceptions (as discussed in #718)"
This reverts commit b19cfcc144.
2017-01-03 18:21:29 +01:00
Ines Montani d3b181cdf1 Merge remote-tracking branch 'origin/master'
# Conflicts:
#	spacy/en/tokenizer_exceptions.py
2017-01-03 18:20:01 +01:00
Ines Montani b19cfcc144 Reorganise English tokenizer exceptions (as discussed in #718)
Add logic to generate exceptions that follow a consistent pattern (like
verbs and pronouns) and allow certain tokens to be excluded explicitly.
2017-01-03 18:17:57 +01:00
Ines Montani 1bd53bbf89 Fix typos (resolves #718) 2017-01-03 11:26:21 +01:00
Matthew Honnibal fde53be3b4 Move whole token match inside _split_affixes. 2016-12-30 17:11:50 -06:00
Matthew Honnibal 3ba7c167a8 Fix URL tests 2016-12-30 17:10:08 -06:00
Matthew Honnibal 9936a1b9b5 Merge branch 'tokenization_w_exception_patterns' of https://github.com/oroszgy/spaCy.hu into oroszgy-tokenization_w_exception_patterns 2016-12-30 14:53:40 -06:00
Magnus Burton 56e2219b65 Added Swedish city abbreviations 2016-12-30 21:17:34 +01:00
Magnus Burton e935c950d8 Added months and days as abbreviations for Swedish 2016-12-30 21:08:44 +01:00
Matthew Honnibal 3e8d9c772e Test interaction of token_match and punctuation
Check that the new token_match function applies after punctuation is split off.
2016-12-31 00:52:17 +11:00
Matthew Honnibal 74b921f394 Merge branch 'master' of ssh://github.com/explosion/spaCy into develop 2016-12-30 14:38:27 +01:00
Matthew Honnibal 623d94e14f Whitespace 2016-12-31 00:30:28 +11:00
Matthew Honnibal af81ac8bb0 Use thinc 6.0 2016-12-29 11:58:42 +01:00
Petter Hohle f112e7754e Add PART to tag map
16 of the 17 PoS tags in the UD tag set are added; PART is missing.
2016-12-28 18:39:01 +01:00
Matthew Honnibal f62db78dc3 Increment version 2016-12-27 21:11:22 +01:00
Matthew Honnibal cade536d1e Merge branch 'master' of ssh://github.com/explosion/spaCy 2016-12-27 21:04:10 +01:00
Matthew Honnibal ce4539dafd Allow the vocabulary to grow to 10,000, to prevent cold-start problem. 2016-12-27 21:03:45 +01:00
Ines Montani ad3669cef5 Merge pull request #703 from magnusburton/master
Added Swedish abbreviations
2016-12-27 01:01:49 +01:00
Ines Montani 78f754dd9a Merge pull request #705 from oroszgy/hu_tokenizer
Initial support for Hungarian
2016-12-27 00:48:13 +01:00
Ines Montani 8785706039 Reformat stop words for better readability 2016-12-24 00:58:40 +01:00
Gyorgy Orosz 45e045a87b Unicode/UTF8 compatibility for Python2 2016-12-24 00:21:00 +01:00
Gyorgy Orosz 72b61b6d03 Typo fix. 2016-12-24 00:10:29 +01:00
Gyorgy Orosz 3a9be4d485 Updated token exception handling mechanism to allow the usage of arbitrary functions as token exception matchers. 2016-12-23 23:49:34 +01:00
Ines Montani 1436b9f15a Fix formatting and consistency 2016-12-23 21:36:01 +01:00
Ines Montani 1d64527727 Update Spanish tokenizer
Remove reflexive pronouns as they're part of an open class, fix
mistakes and add exceptions
2016-12-23 21:36:01 +01:00
Ines Montani 7f411fd01c Remove exceptions containing whitespace / no special chars 2016-12-23 14:30:06 +01:00
Magnus Burton fdf4776262 Added Swedish abbreviations 2016-12-22 22:45:18 +01:00
Gyorgy Orosz d9c59c4751 Maintaining backward compatibility. 2016-12-21 23:30:49 +01:00
Gyorgy Orosz 1748549aeb Added exception pattern mechanism to the tokenizer. 2016-12-21 23:16:19 +01:00
Gyorgy Orosz 35aa54765d Hungarian module is exposed in spacy. 2016-12-21 20:45:36 +01:00
Gyorgy Orosz ab2f6ea46c Removed data files from tests. 2016-12-21 20:22:09 +01:00
Ines Montani 3c87c71d43 Add tokenizer exceptions for a.m. and p.m. in Spanish 2016-12-21 18:19:10 +01:00
Ines Montani 78e63dc7d0 Update tokenizer exceptions for English 2016-12-21 18:06:34 +01:00
Ines Montani 702d1eed93 Update tokenizer exceptions for German 2016-12-21 18:06:27 +01:00
Ines Montani d60380418e Update tokenizer exceptions for Spanish 2016-12-21 18:06:17 +01:00
Ines Montani 920fa0fed2 Add DET_LEMMA constant 2016-12-21 18:05:41 +01:00
Ines Montani 8978806ea6 Allow Vocab to load without serializer_freqs 2016-12-21 18:05:23 +01:00
Ines Montani be8ed811f6 Remove trailing whitespace 2016-12-21 18:04:41 +01:00
Ines Montani 926e19184a Merge pull request #695 from magnusburton/master
Added Swedish morph rules
2016-12-21 01:06:00 +01:00
Gyorgy Orosz 3d5306acb9 Added further testcases. 2016-12-20 23:49:35 +01:00
Gyorgy Orosz 23956e72ff Improved partial support for tokenizing Hungarian numbers 2016-12-20 23:36:59 +01:00
Gyorgy Orosz 6add156075 Refactored language data structure 2016-12-20 22:28:20 +01:00
Gyorgy Orosz 366b3f8685 Merge branch 'master' into hu_tokenizer 2016-12-20 20:53:31 +01:00
Gyorgy Orosz c035928156 Partial Hungarian number tokenization is added. 2016-12-20 20:46:20 +01:00
JM 70ff0639b5 Fixed missing vec_path declaration that was failing if 'add_vectors' was set
Added vec_path variable declaration to avoid accessing it before assignment in case 'add_vectors' is in overrides.
2016-12-20 18:21:05 +01:00
Magnus Burton 48dcc9f647 Added morph rules 2016-12-20 13:18:41 +01:00
Magnus Burton db5a077d2b Initial commit for Swedish 2016-12-20 11:05:06 +01:00
Matthew Honnibal 3f5747a9b2 Merge branch 'master' of ssh://github.com/explosion/spaCy 2016-12-18 23:44:22 +01:00
Matthew Honnibal 40e71586d6 Fix Issue #683: Add 'SP' to tag_map, if it's not there already, within the Morphology class. 2016-12-18 23:44:05 +01:00
Matthew Honnibal fa1d23e10d Merge branch 'master' of https://github.com/explosion/spaCy 2016-12-18 23:32:03 +01:00
Matthew Honnibal f38eb25fe1 Fix test for word vector 2016-12-18 23:31:55 +01:00
Matthew Honnibal 4e68abebc4 Merge branch 'master' of ssh://github.com/explosion/spaCy 2016-12-18 23:19:45 +01:00
Matthew Honnibal 5a6328a5a4 Increment version 2016-12-18 23:19:19 +01:00
Matthew Honnibal 13a0b31279 Another tweak to GloVe path hackery. 2016-12-18 23:12:49 +01:00
Matthew Honnibal 2c6228565e Fix vector loading re glove hack 2016-12-18 23:06:44 +01:00
Matthew Honnibal 618b50a064 Fix issue #684: GloVe vectors not loaded in spacy.en.English. 2016-12-18 22:46:31 +01:00
Matthew Honnibal 404019ad2f Fix issue #672: ent_iob_ was a string, not unicode, due to missing unicode_literals statement. 2016-12-18 22:33:53 +01:00
Matthew Honnibal 2ef9d53117 Untested fix for issue #684: GloVe vectors hack should be inserted in English, not in spacy.load. 2016-12-18 22:29:31 +01:00
Matthew Honnibal c065359459 Fix path-override bug in spacy.load 2016-12-18 22:15:29 +01:00
Matthew Honnibal 813249f826 Work on morphology class. Still not fully consistent with rest of library. 2016-12-18 17:35:22 +01:00
Matthew Honnibal 3679fb43a3 Fix loading of lemmatizer 2016-12-18 17:34:09 +01:00
Matthew Honnibal 3980f1b0cb Ignore more morphology attributes in deprecated mode of intify_attrs 2016-12-18 17:33:46 +01:00
Matthew Honnibal 7a98ee5e5a Merge language data change 2016-12-18 17:03:52 +01:00
Matthew Honnibal e4c951c153 Merge branch 'organize-language-data' of ssh://github.com/explosion/spaCy into organize-language-data 2016-12-18 17:01:08 +01:00
Ines Montani b99d683a93 Fix formatting 2016-12-18 16:58:28 +01:00
Ines Montani b11d8cd3db Merge remote-tracking branch 'origin/organize-language-data' into organize-language-data 2016-12-18 16:57:12 +01:00
Ines Montani d1c1d3f9cd Fix tokenizer test 2016-12-18 16:55:32 +01:00
Ines Montani 753068f1d5 Use base language data as default 2016-12-18 16:55:25 +01:00
Ines Montani bcc1d50d09 Remove trailing whitespace 2016-12-18 16:54:52 +01:00