Commit Graph

2832 Commits

Author SHA1 Message Date
Matthew Honnibal 17c9fffb9e Fix naked except 2017-04-16 15:28:16 -05:00
ines 5610fdcc06 Get language name first if no model path exists
Makes sure spaCy fails early if no tokenizer exists, and allows
printing a better error message.
2017-04-16 22:16:47 +02:00
ines ad168ba88c Set model name to empty string if path override exists
Required for parse_package_meta, which composes path of data_path and
model_name (needs to be fixed in the future)
2017-04-16 22:15:51 +02:00
ines 97647c46cd Add docstring and todo note 2017-04-16 22:14:45 +02:00
ines 5c5f8c0a72 Check if full string is found in lang classes first
This allows users to set arbitrary strings. (Otherwise, custom lang
class "my_custom_class" would always load Burmese "my" tokenizer if one
was available.)
2017-04-16 22:14:38 +02:00
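The lookup order described in this commit can be sketched roughly as follows. This is a minimal illustration, not spaCy's actual code: `LANG_CLASSES` and `resolve_lang` are hypothetical names, and the fallback to the part before the first underscore is an assumption inferred from the "my_custom_class" example in the message.

```python
# Hypothetical registry for illustration; the real mapping in spaCy may differ.
LANG_CLASSES = {"en": "English", "de": "German", "my": "Burmese"}


def resolve_lang(name, registry=LANG_CLASSES):
    """Return the registry entry for `name`, preferring an exact match."""
    # 1. Try the full string first, so arbitrary custom names win.
    if name in registry:
        return registry[name]
    # 2. Assumed fallback: use the part before the first underscore,
    #    e.g. "en_default" -> "en". Returns None if nothing matches.
    prefix = name.split("_", 1)[0]
    return registry.get(prefix)
```

With this order, registering "my_custom_class" as its own entry returns that entry instead of falling through to the "my" prefix.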
ines 13d30b6c01 xfail lemmatizer test that's causing problems (see #546) 2017-04-16 21:18:39 +02:00
Matthew Honnibal 4931c56afc Increment version 2017-04-16 13:59:38 -05:00
ines 6145b7c153 Remove redundant Path 2017-04-16 20:53:25 +02:00
Matthew Honnibal fa89613444 Merge branch 'master' of https://github.com/explosion/spaCy 2017-04-16 13:42:56 -05:00
ines 1f9f867c70 Remove unused util function 2017-04-16 20:37:45 +02:00
ines 7670c745b6 Update spacy.load() and fix path checks 2017-04-16 20:37:45 +02:00
ines d3759dfb32 Fix docstring 2017-04-16 20:37:45 +02:00
ines ed7e19ad68 Remove unused import 2017-04-16 20:37:45 +02:00
ines 0084466a66 Remove unused utf8open util and replace os.path with ensure_path 2017-04-16 20:37:45 +02:00
Matthew Honnibal 89a4f262fc Fix training methods 2017-04-16 13:00:37 -05:00
Matthew Honnibal 6a4221a6de Allow lemma to be set from Python. Re #973 2017-04-16 18:07:53 +02:00
Matthew Honnibal 137b210bcf Restore use of FTRL training 2017-04-16 18:02:42 +02:00
ines d10bd0eaf9 Fix formatting 2017-04-16 13:42:34 +02:00
ines 8191e33cf1 Update link error message with info on permissions 2017-04-16 13:32:31 +02:00
ines a3ddbc0444 Add note about --force flag to error message 2017-04-16 13:14:36 +02:00
ines e3de035814 Add meta validation to check for required settings
Complain if no "lang", "name" or "version" is found (those settings are
used in directory / package names). Package will still build without,
but it'll inevitably fail somewhere down the line.
2017-04-16 13:13:17 +02:00
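A check of this kind might look like the sketch below. The three required keys come from the commit message; the function name `missing_meta_keys` and the choice to treat empty values as missing are assumptions made for illustration.

```python
# Settings named in the commit message as required for directory / package names.
REQUIRED_KEYS = ("lang", "name", "version")


def missing_meta_keys(meta, required=REQUIRED_KEYS):
    """Return the required settings that are absent or empty in `meta`."""
    # Assumption: an empty string counts as missing, since it cannot be
    # used to build a directory or package name.
    return [key for key in required if not meta.get(key)]
```

A caller could print a warning listing the returned keys and still continue the build, which matches the "package will still build without" behaviour described above.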
ines a7574b7572 Add more options to read in meta data in package command
Add meta option to supply path to meta.json. If no meta path is set,
check if meta.json exists in input directory and use it. Otherwise,
prompt for details on the command line.
2017-04-16 13:06:02 +02:00
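The three-step fallback in this message (explicit path, then meta.json in the input directory, then a prompt) could be sketched as below. `load_meta` and its `prompt` callback are hypothetical names; the actual package command may structure this differently.

```python
import json
from pathlib import Path


def load_meta(input_dir, meta_path=None, prompt=None):
    """Resolve package meta data using the order described in the commit."""
    # 1. An explicitly supplied path to meta.json takes priority.
    if meta_path is not None:
        return json.loads(Path(meta_path).read_text(encoding="utf8"))
    # 2. Otherwise use meta.json from the input directory, if present.
    candidate = Path(input_dir) / "meta.json"
    if candidate.is_file():
        return json.loads(candidate.read_text(encoding="utf8"))
    # 3. Otherwise ask for the details. `prompt` is a hypothetical callback
    #    that returns a dict, standing in for command-line prompts.
    if prompt is None:
        raise ValueError("No meta.json found and no prompt callback given")
    return prompt()
```

Passing the prompt in as a callback keeps the function testable without real console input.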
ines 13c8a42d2b Fix typos 2017-04-16 13:03:58 +02:00
ines 31fa73293a Move read_json out to own util function 2017-04-16 13:03:28 +02:00
Matthew Honnibal 45464d065e Remove print statement 2017-04-15 16:11:43 +02:00
Matthew Honnibal c76cb8af35 Fix training for new labels 2017-04-15 16:11:26 +02:00
Matthew Honnibal 4884b2c113 Refix StepwiseState 2017-04-15 16:00:28 +02:00
Matthew Honnibal e6ee7e130f Fix parse package meta 2017-04-15 13:38:53 +02:00
Matthew Honnibal 1a98e48b8e Fix StepwiseState 2017-04-15 13:35:01 +02:00
ines 0739ae7b76 Tidy up and fix formatting and imports 2017-04-15 13:05:15 +02:00
ines fefe6684cd Fix symlink function to check for Windows 2017-04-15 12:17:27 +02:00
ines 35fb4febe2 Fix whitespace 2017-04-15 12:13:45 +02:00
ines e1efd589c3 Fix json imports and use ujson 2017-04-15 12:13:34 +02:00
ines 958b12dec8 Use pathlib instead of os.path 2017-04-15 12:13:00 +02:00
ines 956dc36785 Move functions to deprecated 2017-04-15 12:12:31 +02:00
ines c05ec4b89a Add compat functions and remove old workarounds
Add ensure_path util function to handle checking instance of path
2017-04-15 12:11:16 +02:00
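An `ensure_path` helper of the kind named here might look like the following. Only the name and its purpose (checking whether a value is already a path) come from the commit message; the body is an assumption, written for Python 3 only, whereas the original also had to handle Python 2 string types.

```python
from pathlib import Path


def ensure_path(path):
    """Return `path` as a pathlib.Path if it is a string, else unchanged."""
    # Strings are converted; Path objects (and values such as None)
    # are passed through untouched so callers can accept either form.
    if isinstance(path, str):
        return Path(path)
    return path
```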
ines 26445ee304 Add compat module for Python2/3 and platform compatibility 2017-04-15 12:07:02 +02:00
ines d24589aa72 Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
ines 561f2a3eb4 Use consistent formatting for docstrings 2017-04-15 11:59:21 +02:00
Matthew Honnibal d13f0a7017 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-04-14 23:54:57 +02:00
Matthew Honnibal 354458484c WIP on add_label bug during NER training
Currently when a new label is introduced to NER during training,
it causes the labels to be read in in an unexpected order. This
invalidates the model.
2017-04-14 23:52:17 +02:00
Matthew Honnibal 33ba5066eb Refactor Language.end_training, making new save_to_directory method 2017-04-14 23:51:24 +02:00
ines 84341c2975 Only compile list of models if data_path exists 2017-04-14 16:48:02 +02:00
Gyorgy Orosz dd3244c08a Made json dump to produce unicode strings in py2 2017-04-13 23:30:47 +02:00
Gyorgy Orosz a9469c8173 Fixed typo 2017-04-13 15:24:14 +02:00
ines 41037f0f07 Remove unused imports 2017-04-13 13:52:11 +02:00
ines 1b92c8d5d5 Use unicode paths on Windows/Python 2 and catch other errors (resolves #970)
try/except here is quite dirty, but it'll at least make sure users see
an error message that explains what's going on
2017-04-10 17:49:51 +02:00
Matthew Honnibal 49e2de900e Add costs property to StepwiseState, to show which moves are gold. 2017-04-10 11:37:04 +02:00
Matthew Honnibal e26577b202 Increment version 2017-04-07 18:45:06 +02:00
Matthew Honnibal 40bf7ecf27 Increment version 2017-04-07 18:44:20 +02:00
Matthew Honnibal 1dca7eeb03 Add unicode declaration on new regression test 2017-04-07 18:09:23 +02:00
ines 887827fc6a Merge branch 'develop' 2017-04-07 17:36:23 +02:00
ines 444dd511c5 Fix xpassing URL test case 2017-04-07 17:36:05 +02:00
ines bf0f15e762 Add / to tokenizer infixes (resolves #891) 2017-04-07 17:30:44 +02:00
ines 00b9011a49 Fix whitespace 2017-04-07 17:29:59 +02:00
ines f9869e4dc5 Merge branch 'master' into develop 2017-04-07 17:23:40 +02:00
Matthew Honnibal 4a6204dbad Merge remote-tracking branch 'origin/develop' 2017-04-07 17:20:09 +02:00
Matthew Honnibal 0513c43bf0 Merge branch 'master' of https://github.com/explosion/spaCy 2017-04-07 17:07:10 +02:00
Matthew Honnibal cc36c308f4 Fix noun_chunk rules around coordination
Closes #693.
2017-04-07 17:06:40 +02:00
Matthew Honnibal ab846256cf Merge pull request #966 from recognai/master
Prepare Spanish language for training models, including configuration, rich-UD tag map and tests
2017-04-07 16:12:29 +02:00
Matthew Honnibal 83dca920d4 Rename test #913 -> #957, comment
Make test for #957 reference correct bug. Add comment.

Previous commit closes #957.
2017-04-07 15:54:25 +02:00
Matthew Honnibal be204ed714 Merge branch 'master' of https://github.com/explosion/spaCy 2017-04-07 15:50:14 +02:00
Matthew Honnibal e7b1ee9efd Switch to regex module for URL identification
The URL detection regex was failing on input such as 0.1.2.3, as this
input triggered excessive back-tracking in the builtin re module.
The solution was to switch to the regex module, which behaves better.

Closes #913.
2017-04-07 15:47:36 +02:00
Matthew Honnibal 5887383fc0 Add test for Issue #913: Hang from bad regex 2017-04-07 15:47:27 +02:00
ines 7ea1673072 Fix whitespace 2017-04-07 13:28:48 +02:00
ines 255650dbc2 Add conllu2json converter from explosion/spacy-dev-resources/#11 2017-04-07 13:05:12 +02:00
ines 789ce8a45e Add convert command 2017-04-07 13:04:17 +02:00
ines 9952d3b08a Fix whitespace 2017-04-07 13:02:05 +02:00
ines 47ddce6eb7 Remove unused variable 2017-04-07 13:01:48 +02:00
ines dcf8ab0c47 Merge branch 'develop' 2017-04-07 12:00:09 +02:00
ines 75f9b4c6e2 Fix whitespace 2017-04-07 10:22:18 +02:00
oeg c693d40791 feature(model): Add support for creating the Spanish model, including rich tagset, configuration, and basic tests 2017-04-06 18:48:45 +02:00
oeg 010293fb2f fix(typo): Fixes typo in method calling PseudoProjectivity.deprojectivize, failing with new train cli 2017-04-06 17:33:15 +02:00
ines 808cd6cf7f Add missing tags to verbs (resolves #948) 2017-04-03 18:12:52 +02:00
ines ad8bf1829f Import and combine Portuguese tokenizer exceptions (see #943) 2017-04-01 10:37:42 +02:00
Ines Montani f8b2d9c3b7 Merge pull request #943 from mamoit/master
Portuguese improvements
2017-04-01 10:32:00 +02:00
ines 3b667a24d4 Remove whitespace 2017-04-01 10:21:08 +02:00
ines e71a1f4bd0 Fix download commands in error messages (see #946) 2017-04-01 10:20:57 +02:00
ines 42382d5692 Fix download commands in error messages (see #946) 2017-04-01 10:19:32 +02:00
ines d4a59c254b Remove whitespace 2017-04-01 10:19:01 +02:00
Matthew Honnibal 51882ee2b8 Fix check for setting ent_id in merge 2017-03-31 19:32:01 +02:00
Miguel Almeida 4fde64c4ea Portuguese contractions and some abbreviations 2017-03-31 15:52:55 +01:00
Miguel Almeida 465b240bcb Review Portuguese stop words
Mainly to review typos and add missing masculines/feminines
2017-03-31 13:00:47 +01:00
Matthew Honnibal fc3900e5b2 Allow ent_id to be set in Token 2017-03-31 14:00:14 +02:00
Matthew Honnibal 9720103428 Improve attribute handling in doc.merge(). Still unsatisfying 2017-03-31 13:59:58 +02:00
Matthew Honnibal cfff4e0f61 Improve test 2017-03-31 13:59:32 +02:00
Matthew Honnibal 1bb7b4ca71 Add comment 2017-03-31 13:59:19 +02:00
Matthew Honnibal 725249c59a Add merge_phrase callback in matcher.pyx 2017-03-31 13:58:59 +02:00
Matthew Honnibal e854f28304 Add test for Issue #758
Issue #758 occurs when no actions are available for a single token
doc after merging.
2017-03-31 13:26:25 +02:00
Miguel Almeida c1d020b0a6 Remove "ista" from Portuguese stop words 2017-03-31 12:26:13 +01:00
Miguel Almeida 17a1e7a119 Add Portuguese numbers and ordinals 2017-03-31 12:21:01 +01:00
Matthew Honnibal 47a3ef06a6 Unhack deprojectivization, moving it into pipeline
Previously the deprojectivize() call was attached to the transition
system, and only called for German. Instead it should be a separate
process, called after the parser. This makes it available for any
language. Closes #898.
2017-03-31 12:31:50 +02:00
Joshua Reeter 564daf6dec Issue #934 symlink should not convert paths as_posix under windows. 2017-03-30 23:47:45 -05:00
Bruno P. Kinoshita c2d48974bc Fix typos in Portuguese stop words 2017-03-30 21:59:18 +13:00
Matthew Honnibal 0fefdfcbda Merge pull request #935 from ericzhao28/master
Add option to use label=ent_type in doc.merge arguments (Bug fix for issue #862)
2017-03-30 02:51:24 +02:00
ines 4759fd437d Merge branch 'master' into develop 2017-03-29 10:37:13 +02:00
ines 7e4befec88 Add Hebrew to init and setup.py 2017-03-29 10:34:57 +02:00
Grégory Howard 9c2996b27f correction of package.py (encoding on open instead of write) 2017-03-29 09:11:02 +02:00
Eric Zhao aafdf6ffb8 Add option to use label kwarg to determine ent_type in doc.merge 2017-03-28 23:35:03 -07:00
ines 7198cf1c8a Remove unused import 2017-03-26 20:56:05 +02:00
ines 7ceaa1614b Add experimental model init command 2017-03-26 20:51:40 +02:00
Matthew Honnibal 83ba6c247c Fix init of Language without model 2017-03-26 16:46:00 +02:00
Matthew Honnibal fa107f95f6 Remove unused train_config command 2017-03-26 09:28:59 -05:00
Matthew Honnibal df83921f0a Increment version 2017-03-26 09:27:32 -05:00
Matthew Honnibal 92ac3af21d Merge branch 'master' of https://github.com/explosion/spaCy 2017-03-26 09:26:59 -05:00
Matthew Honnibal a9b1f23c7d Enable regression loss for parser 2017-03-26 09:26:30 -05:00
ines c00d997924 Merge branch 'develop' 2017-03-26 15:57:00 +02:00
Matthew Honnibal 2efdbc08ff Make training work with directories 2017-03-26 08:46:44 -05:00
ines 007a2492bd Remove train_config command for now 2017-03-26 15:40:50 +02:00
ines b297fab062 Update error message for missing commands 2017-03-26 15:40:02 +02:00
ines 7f95023fc0 Fix formatting 2017-03-26 15:37:37 +02:00
ines 5901c8f7f0 Update spacy train CLI documentation 2017-03-26 15:33:48 +02:00
Matthew Honnibal 9dcb58aaaf Merge CLI changes 2017-03-26 07:30:45 -05:00
Matthew Honnibal 6b7f7a2060 Connect parser L1 option to train CLI 2017-03-26 07:24:07 -05:00
Matthew Honnibal ed2b106f4d Fix circular import in lemmatizer 2017-03-26 07:17:07 -05:00
Matthew Honnibal dec5571bf3 Update train CLI 2017-03-26 07:16:52 -05:00
ines 53cf2f1c0e Make dev data optional 2017-03-26 11:48:17 +02:00
Matthew Honnibal 5eac089fbe Merge branch 'master' into develop 2017-03-26 04:45:43 -05:00
ines 0fc56e2544 Update flag and defaults 2017-03-26 11:42:11 +02:00
Matthew Honnibal 2f63806ddb Update config when adding label. Re #910 2017-03-25 22:35:44 +01:00
Matthew Honnibal b94286de30 Fix regression test 2017-03-25 22:35:07 +01:00
Matthew Honnibal c748907a66 Fix errors in previous commit 2017-03-25 22:25:01 +01:00
Matthew Honnibal 4f400fa486 Prevent lemmatization of base nouns
Update lemmatizer's base-form check, for change in morphology class.
Closes #903.
2017-03-25 21:51:12 +01:00
Matthew Honnibal 850d35dcb3 Make morphology use int attributes internally
The morphology class was calling the lemmatizer inconsistently,
with some string-valued attributes. This caused Issue #903.
2017-03-25 21:49:10 +01:00
Matthew Honnibal 4454c1b23f Block lemmatization of base-form adjectives
Fixes check that an adjective is a base form (as opposed to a
comparative or superlative), so that it's not lemmatized.
e.g. inner -!> inn. Closes #912.
2017-03-25 21:29:57 +01:00
ines 97814f8da6 Update Windows Python 2 link workaround to use helper functions 2017-03-25 14:04:27 +01:00
ines fdec758113 Add is_windows and is_python2 utility functions 2017-03-25 14:04:02 +01:00
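Helpers with these names could be written along the lines below. The function names come from the commit message; the bodies and the optional parameters (added so the checks can be exercised on any platform) are assumptions, not the code from the commit.

```python
import sys


def is_windows(platform=None):
    """Return True if `platform` (default: sys.platform) is Windows."""
    platform = sys.platform if platform is None else platform
    # sys.platform is "win32" on Windows; "darwin" and "cygwin" do not
    # start with "win", so they are not matched.
    return platform.startswith("win")


def is_python2(version_info=None):
    """Return True if `version_info` (default: sys.version_info) is 2.x."""
    version_info = sys.version_info if version_info is None else version_info
    return version_info[0] == 2
```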
Ines Montani 09837158e4 Merge pull request #921 from solresol/master
Possible solution to #909
2017-03-25 13:51:55 +01:00
Greg Baker b7f714b498 Possible solution to #909 2017-03-25 21:36:38 +11:00
Ines Montani 97cb4d5e3c Merge branch 'master' into master 2017-03-25 10:03:47 +01:00
Iddo Berger da135bd823 add hebrew tokenizer 2017-03-24 18:27:44 +03:00
Matthew Honnibal f40fbc3710 Add test for Issue #910: Resuming entity training 2017-03-23 23:38:57 +01:00
Matthew Honnibal 9c9cd99144 Merge branch 'master' of https://github.com/explosion/spaCy 2017-03-23 11:11:24 +01:00
ines 0035fd9efe Add spacy train work in progress 2017-03-23 11:08:41 +01:00
ines d5ebf583a4 Fix formatting 2017-03-23 11:08:30 +01:00
ines 3f20efe165 Merge branch 'develop'
# Conflicts:
#	spacy/util.py
2017-03-22 17:14:15 +01:00
Ines Montani f86a3a92d5 Merge pull request #899 from raphael0202/duplicate_keys
Remove duplicate keys in [en|fi] language data dicts
2017-03-22 10:20:11 +01:00
Ines Montani 87a2c85e1b Merge pull request #900 from raphael0202/unused_imports
Remove unused import statements
2017-03-22 10:10:43 +01:00
ines ce065e5d65 Fix imports 2017-03-22 10:02:14 +01:00
Andrew Poliakov 07199c3e8b Fix infinite recursion in spacy.info 2017-03-22 11:43:22 +03:00
Raphaël Bournhonesque f332bf05be Remove unused import statements 2017-03-21 21:08:54 +01:00
ines c3a9f73896 Fix writing to file 2017-03-21 12:35:22 +01:00
ines d74aa428ad Fix path 2017-03-21 12:26:00 +01:00
ines 83a999ea83 Change default license from MIT to CC 2017-03-21 12:24:43 +01:00
ines ae46647560 Fix brackets 2017-03-21 12:21:42 +01:00
ines 3e134b5b2b Make sure paths in copytree and rmtree are strings 2017-03-21 12:15:33 +01:00
ines cf0094187e Fetch MANIFEST.in from GitHub as well 2017-03-21 11:32:38 +01:00
ines 09b24bc5a9 Add docs for package command 2017-03-21 11:19:21 +01:00
ines 3f4e3fda1d Update command and fetch file templates from GitHub
While feature is still experimental, this allows files to be modified
without having to ship a new version of spaCy.
2017-03-21 11:17:36 +01:00
ines 5230ed5b98 Move directory check and overwriting/creating dirs to own function 2017-03-21 02:06:53 +01:00