2196 Commits

Author SHA1 Message Date
Ines Montani 55d151aa61 Modernise Doc parse tree navigation tests and don't depend on models 2017-01-11 21:14:15 +01:00
Ines Montani 7262421bb2 Use consistent test names 2017-01-11 19:00:52 +01:00
Ines Montani 33800c9367 Rename "tokens" tests to "doc" 2017-01-11 18:59:01 +01:00
Ines Montani 3a9c6a9563 Remove old unused files 2017-01-11 18:58:38 +01:00
Ines Montani 8e962de39f Remove old word vector tests 2017-01-11 18:55:08 +01:00
Ines Montani e027936920 Modernise Doc noun chunks tests 2017-01-11 18:54:56 +01:00
Ines Montani 439f396acd Modernise Doc array tests and don't depend on models 2017-01-11 18:54:46 +01:00
Ines Montani 05447be884 Modernise test for adding entities 2017-01-11 18:54:24 +01:00
Ines Montani 6e883f4c00 Modernise Doc API tests and don't depend on models 2017-01-11 18:05:36 +01:00
Ines Montani 8bf3bb5c44 Make words optional for get_doc 2017-01-11 18:05:10 +01:00
Ines Montani 928db7e419 Fix StringIO import for Python 3 2017-01-11 14:07:48 +01:00
Ines Montani 69998f216b Rename test_tokens_api.py to test_doc_api.py 2017-01-11 13:58:56 +01:00
Ines Montani d94dea1b18 Merge token tests into token API tests 2017-01-11 13:57:02 +01:00
Ines Montani eb23424ab0 Modernise token API tests and don't depend on loading models 2017-01-11 13:56:54 +01:00
Ines Montani c682b8ca90 Merge conftests into one cohesive file 2017-01-11 13:56:32 +01:00
Ines Montani 909f24d7df Add test utils and get_doc helper function
Create Doc object from given vocab, words and annotations to allow
tests not to depend on loading the models.
2017-01-11 13:55:33 +01:00
Matthew Honnibal e12c90e03f Merge branch 'master' of ssh://github.com/explosion/spaCy 2017-01-11 13:03:51 +01:00
Matthew Honnibal 12cd27b821 Amend 8ae8b443f: Handle comparison with None tokens. 2017-01-11 13:03:32 +01:00
Daniel Hershcovich 8e603cc917 Avoid "True if ... else False" 2017-01-11 11:18:22 +02:00
Matthew Honnibal 44e2b0100d Support TAG attribute in doc.from_array 2017-01-10 22:47:07 +01:00
Ines Montani 3e6e1f0251 Tidy up regression tests 2017-01-10 19:24:10 +01:00
Ines Montani 869963c3c4 Mark extensive prefix/suffix tests as slow 2017-01-10 15:57:35 +01:00
Ines Montani 487e020ebe Add simple test for surrounding brackets 2017-01-10 15:57:26 +01:00
Ines Montani 0ba5cf51d2 Assert length first 2017-01-10 15:57:00 +01:00
Ines Montani 2185d31907 Adjust names and formatting 2017-01-10 15:56:35 +01:00
Ines Montani e10d4ca964 Remove semi-redundant URLs and punctuation for faster testing 2017-01-10 15:54:25 +01:00
Ines Montani 3a3cb2c90c Add unicode declaration 2017-01-10 15:53:15 +01:00
Matthew Honnibal 0f9b8a00a5 Unbreak data download 2017-01-09 23:40:26 +01:00
Matthew Honnibal 8ae8b443f1 Add richcmp method to Token. Closes #631 2017-01-09 19:30:31 +01:00
Matthew Honnibal 64f747cb65 Token comparison test 2017-01-09 19:12:00 +01:00
Matthew Honnibal 18c3c2d05c Add tests for token comparison, re Issue #631 2017-01-09 19:09:59 +01:00
Matthew Honnibal 97a1286129 Revert changes to tagger and parser for thinc 6 2017-01-09 10:08:34 -06:00
Matthew Honnibal 95a52005df Revert "Fix Issue #683: Add 'SP' to tag_map, if it's not there already, within the Morphology class."
This reverts commit 40e71586d6.
2017-01-09 09:55:55 -06:00
Ines Montani 363f09e68c Merge pull request #726 from magnusburton/master
Added Swedish abbreviations as token exceptions
2017-01-09 14:58:15 +01:00
Matthew Honnibal 42cd598f57 Use correct fixtures in URL tokenizer 2017-01-09 14:10:40 +01:00
Matthew Honnibal d9a77ddf14 Return None for data path if it doesn't exist 2017-01-09 14:10:05 +01:00
Matthew Honnibal e4862d1dab Merge branch 'develop' 2017-01-09 13:36:01 +01:00
Ines Montani aa876884f0 Revert "Revert "Merge remote-tracking branch 'origin/master'""
This reverts commit fb9d3bb022.
2017-01-09 13:28:13 +01:00
Ines Montani d5c72c40eb Remove old tests for old website example code 2017-01-08 22:28:53 +01:00
Ines Montani eef94e3ee2 Split off period after two or more uppercase letters (fixes #483) 2017-01-08 22:28:25 +01:00
Ines Montani a89a6000e5 Remove unused import 2017-01-08 22:17:37 +01:00
Ines Montani 5d28664fc5 Don't test Hungarian for numbers and hyphens for now
Reinvestigate behaviour of case affixes given reorganised tokenizer
patterns.
2017-01-08 20:45:40 +01:00
Ines Montani 53362b6b93 Reorganise Hungarian prefixes/suffixes/infixes
Use global prefixes and suffixes for non-language-specific rules,
import list of alpha unicode characters and adjust regexes.
2017-01-08 20:40:33 +01:00
Ines Montani 347c4a2d06 Reorganise and reformat global tokenizer prefixes, suffixes and infixes 2017-01-08 20:37:39 +01:00
Ines Montani 0dec90e9f7 Use global abbreviation data languages and remove duplicates 2017-01-08 20:36:00 +01:00
Ines Montani 7c3cb2a652 Add global abbreviations data 2017-01-08 20:34:03 +01:00
Ines Montani de5aa92bc2 Handle deprecated tokenizer prefix data 2017-01-08 20:33:28 +01:00
Ines Montani abb09782f9 Move sun.txt to original location and fix path to not break parser tests 2017-01-08 20:32:54 +01:00
Ines Montani cab39c59c5 Add missing contractions to English tokenizer exceptions
Inspired by
https://github.com/kootenpv/contractions/blob/master/contractions/__init__.py
2017-01-05 19:59:06 +01:00
Ines Montani a23504fe07 Move abbreviations below other exceptions 2017-01-05 19:58:07 +01:00