8889 Commits

Author SHA1 Message Date
Paul O'Leary McCann b36f6eabfb Add note that Unidic is required for Japanese (#3017)
This addresses #3001. -POLM
2018-12-06 15:14:10 +01:00
Gavriel Loria ae5601beae Initialize trues to 0.0 in training example (#3004)
* added contributor agreement

* if there are no true positives, precision should be 0.0
2018-12-03 01:33:22 +01:00
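The reasoning in the commit above (no true positives means a precision of 0.0) can be sketched as follows. This is a minimal illustration, not the code from the spaCy training example; the function name `precision` and its parameters are assumptions made for this sketch.

```python
def precision(true_positives, false_positives):
    """Fraction of predicted items that were correct.

    Hypothetical helper, not part of spaCy. Counts start from an exact
    zero, so a run with no true positives reports a precision of 0.0.
    """
    predicted = true_positives + false_positives
    if predicted == 0:
        # Nothing was predicted at all: avoid dividing by zero.
        return 0.0
    return float(true_positives) / predicted
```

Seeding the counter with a tiny non-zero value instead of 0.0 is a common way to dodge the division by zero, but it would report a slightly positive precision even when every prediction was wrong.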
Justin DuJardin 33fca8672f fix issue compiling the latest spacy on MacOS 10.3.6 (#2998) 2018-12-02 05:51:11 +01:00
Matthew Honnibal bbaca991ba Set version to v2.0.18 2018-12-01 03:35:09 +01:00
Matthew Honnibal 05b2336ffa Try again to fix OSX build 2018-12-01 03:12:21 +01:00
Matthew Honnibal e1a4b0d7f7 Set version to v2.0.18.dev1 2018-12-01 03:12:12 +01:00
Matthew Honnibal 413530b269 Set version to 2.0.18 2018-12-01 03:00:27 +01:00
Matthew Honnibal 24d52876e1 Set version to v2.0.18.dev0 2018-12-01 02:38:04 +01:00
Matthew Honnibal 4895b2e830 Merge branch 'master' of https://github.com/explosion/spaCy 2018-12-01 02:37:21 +01:00
Matthew Honnibal 3f16af123e Try to fix OSX build error 2018-12-01 02:36:56 +01:00
Matthew Honnibal 61abb1ef70 Remove msgpack dependency, to try to fix #2995 2018-12-01 02:36:41 +01:00
Ines Montani add6469225 Add "new in v2.0.12" note to Span.ents (closes #2986) 2018-11-30 20:50:55 +01:00
Ines Montani c9bdeafbc7 Don't run weird failing test for now 2018-11-30 16:13:40 +01:00
wxv 06820ef6e7 Fix is_ascii documentation and create contributor file (#2988)
Proposed in #2933
2018-11-30 15:57:58 +01:00
Sofie 585de273cd Fix small typo bug in French regexp + relevant unit test (#2980)
* additional unit test for new entr word not in other lists

* bugfix - unit test works

* use _latin_lower instead of alpha_lower for french

* revert back to ALPHA_LOWER (following the code for languages)

* contributor agreement
2018-11-29 20:16:13 +01:00
Ben Batorsky 658f7e0dc8 OntoNotes url fix (#2981)
The website for OntoNotes 5 is https://catalog.ldc.upenn.edu/LDC2013T19; the named entity section currently has it as https://catalog.ldc.upenn.edu/ldc2013T19.
2018-11-29 19:34:30 +01:00
Adam Schwalm 00566949de Fix bug where Vocab.prune_vector did not use 'batch_size' (#2977)
Fixes #2976
2018-11-28 19:49:33 +01:00
Ines Montani 58757c5684 Update README.rst 2018-11-26 20:56:17 +01:00
Matthew Honnibal 9e2ff2f583 Fix regex pin to harmonize with conda (#2964) 2018-11-26 19:28:54 +01:00
Ines Montani 968aff2f6a Update tests for pytest 4.x (#2965)

## Description
- [x] Replace marks in params for pytest 4.0 compat ([see here](https://docs.pytest.org/en/latest/deprecations.html#marks-in-pytest-mark-parametrize))
- [x] Un-xfail passing tests (some fixes in a recent update resolved a bunch of issues, but tests were apparently never updated here)

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-11-26 18:14:57 +01:00
Ines Montani c80c20e1ec Sort languages alphabetically [ci skip] 2018-11-26 15:37:53 +01:00
Marc Puig 98fe1ab259 Catalan Language Support (#2940)
* Catalan language Support

* Adding Catalan to documentation
2018-11-26 15:25:47 +01:00
Ines Montani 1844bc238a Update universe [ci skip] 2018-11-26 14:16:22 +01:00
Ines Montani 048416f265 Fix formatting 2018-11-26 13:27:41 +01:00
Shawn Cicoria 7601ae0cff fixes symbolic link on py3 and windows (#2949)
* fixes symbolic link on py3 and windows
during setup of spacy using command
python -m spacy link en_core_web_sm en
closes #2948

* Update spacy/compat.py

Co-Authored-By: cicorias <cicorias@users.noreply.github.com>
2018-11-24 15:34:23 +01:00
Ines Montani 696acb0f92 Fix typo [ci skip] 2018-11-24 15:20:57 +01:00
Ines Montani 02fc73ca53 💫 Create random IDs for SVGs to prevent ID clashes (#2927)
Resolves #2924.

## Description
Fixes a problem where multiple visualizations in Jupyter notebooks would have clashing arc IDs, resulting in weirdly positioned arc labels. Generates a random ID prefix so that even identical parses won't receive the same IDs, for consistency (even if the effect of an ID clash isn't noticeable here).

### Types of change
bug fix

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-11-15 11:40:10 +01:00
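The random ID prefix described in the commit above might look roughly like this. It is a sketch under stated assumptions: the helper name `make_arc_ids`, the `arrow-` label, and the use of `uuid.uuid4()` are illustrative choices, not necessarily what the displaCy renderer does.

```python
import uuid


def make_arc_ids(n_arcs):
    """Return one ID per arc, sharing a prefix unique to this rendering."""
    # Assumption: a random hex string serves as the prefix; the actual
    # implementation may generate its prefix differently.
    prefix = uuid.uuid4().hex
    return ["arrow-{}-{}".format(prefix, i) for i in range(n_arcs)]
```

Two calls produce disjoint ID sets, so two SVGs embedded in the same notebook page cannot reference each other's arc paths, even when the parses are identical.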
mauryaland 87ce435aff Check if the word is in one of the regular lists specific to each POS (#2886) 2018-11-14 15:58:43 +01:00
Ines Montani dfcc8f02af Fix image [ci skip]
Twitter URL doesn't work on live site
2018-11-14 01:01:33 +01:00
Ines Montani 1aa91e926f Minor formatting changes [ci skip] 2018-11-13 23:59:59 +01:00
Francisco Aranda be99f1cac5 Include universe spec for spacy-wordnet component (#2919)
* feat: include universe spec for spacy-wordnet component

* chore: include spaCy contributor agreement
2018-11-13 23:54:46 +01:00
Daniel Hershcovich d3d419ecc0 Allow input text of length up to max_length, inclusive (#2922) 2018-11-13 16:46:29 +01:00
mikelibg 75e7d503b7 Removed space in docs + added contributor info (#2909)
* - removed unneeded space in documentation

* - added contributor info
2018-11-08 14:18:25 +01:00
Ines Montani 11db4d2f27 Add script to validate universe json [ci skip] 2018-11-06 12:50:41 +01:00
Ines Montani a9fda638a9 Add spacy-raspberry to universe (closes #2889) 2018-11-06 12:45:50 +01:00
Ines Montani c235ddf44f Add spacy-js to universe [ci-skip] 2018-11-06 12:45:03 +01:00
Matthew Honnibal db08b168a3 Set version to 2.0.17 2018-10-29 23:22:18 +01:00
Matthew Honnibal e2ae25d6f5 Try setting older regex version, to align with conda 2018-10-29 13:39:00 +01:00
Matthew Honnibal a2745d310e Revert "Update regex version"
This reverts commit 62358dd867.
2018-10-28 16:38:56 +01:00
Matthew Honnibal 62358dd867 Update regex version 2018-10-28 16:27:50 +01:00
Matthew Honnibal d4fa9af56f Set version to 2.0.17.dev0 2018-10-28 16:15:26 +01:00
Matthew Honnibal 5a4aeb96b7 Add example showing a fix-up rule for space entities 2018-10-28 16:06:00 +01:00
Matthew Honnibal b2e2bba8b0 Fix missing comma 2018-10-28 00:09:16 +02:00
Wannaphong Phatthiyaphaibun 2d2765fd8a Change PyThaiNLP Url (#2876) 2018-10-27 14:46:07 +02:00
Matthew Honnibal 9447739027 Merge branch 'master' of https://github.com/explosion/spaCy 2018-10-27 00:50:48 +02:00
Matthew Honnibal ad068f51be Fix out-of-bounds access in NER training
The helper method state.B(1) gets the index of the first token of the
buffer, or -1 if no such token exists. Normally this is safe because we
pass this to functions like state.safe_get(), which returns an empty
token. Here we used it directly as an array index, which is not okay!

This error may have been the cause of out-of-bounds access errors during
training. Similar errors may still be around, so must be hunted down.
Hunting this one down took a long time...I printed out values across
training runs and diffed, looking for points of divergence between
runs, when no randomness should be allowed.
2018-10-27 00:46:30 +02:00
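The failure mode described in the commit above, a -1 sentinel used directly as an array index, can be illustrated with a small Python analogue. The names `first_buffer_index` and `safe_get` are hypothetical stand-ins for the `state.B(1)` and `state.safe_get()` helpers mentioned in the message, and their behaviour here is assumed from that description only. In compiled code such an index can read outside the array; in Python it silently wraps around to the last element, which is just as wrong.

```python
def first_buffer_index(buffer):
    """Index of the first token in the buffer, or -1 if the buffer is empty."""
    return buffer[0] if buffer else -1


def safe_get(tokens, index, empty=None):
    """Return tokens[index], or `empty` when the index is out of range."""
    if index < 0 or index >= len(tokens):
        return empty
    return tokens[index]


tokens = ["Apple", "is", "hiring"]
index = first_buffer_index([])   # empty buffer, so index is -1
unsafe = tokens[index]           # Python wraps around to the last token
safe = safe_get(tokens, index)   # the guard returns the empty value
```

Routing every sentinel-returning lookup through a guard like `safe_get` keeps the "no such token" case explicit instead of relying on callers to remember the check.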
Grivaz 57f274b693 raise error when setting overlapping entities as doc.ents (#2880) 2018-10-26 23:29:16 +02:00
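A check of the kind this commit describes could be sketched as below. This is an assumption-laden illustration, not spaCy's implementation: spans are modelled as plain `(start, end)` token offsets with an exclusive end, and `check_no_overlap` is a hypothetical name.

```python
def check_no_overlap(spans):
    """Raise ValueError if any two (start, end) token spans overlap.

    Spans are half-open: `end` is one past the last token in the span.
    """
    seen = set()
    for start, end in spans:
        for i in range(start, end):
            if i in seen:
                raise ValueError(
                    "Token {} is in more than one entity".format(i)
                )
            seen.add(i)
```

Adjacent spans such as `(0, 2)` and `(2, 4)` pass, while `(0, 3)` and `(2, 4)` raise because token 2 would belong to two entities.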
Bram Vanroy 071789467e Documentation improvement regarding joblib and SO (#2867)
Some documentation improvements

## Description
1. Fixed the dead URL to joblib
2. Fixed Stack Overflow brand name (with space)

### Types of change
Documentation

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-10-24 15:19:17 +02:00
Roman 5766d09a5b Redundant ')' in the Stop words' example (#2856)
2018-10-18 10:21:16 +02:00
Ines Montani c6a320cad4 Update version [ci skip] 2018-10-15 16:42:35 +02:00