Commit Graph

5151 Commits

Author SHA1 Message Date
ines 526be40823 Add test for 46d8a66 2018-06-29 14:33:12 +02:00
ines f08c871adf Fix typo in Language.from_disk 2018-06-29 14:32:16 +02:00
Matthew Honnibal 46d8a66fef Fix tokenizer serialization if token_match is None 2018-06-29 14:24:46 +02:00
Matthew Honnibal e0860bcfb3 Fix bug when docs are empty 2018-06-29 13:56:29 +02:00
Matthew Honnibal a4d2b0c293 Fix bug when docs are empty 2018-06-29 13:44:25 +02:00
Matthew Honnibal c83fccfe2a Fix output of best model 2018-06-25 23:05:56 +02:00
Matthew Honnibal 5a65418c40 Fix handling of unseen labels in tagger 2018-06-25 22:28:59 +02:00
Matthew Honnibal 5b56aad4c2 Fix handling of unseen labels in tagger 2018-06-25 22:24:54 +02:00
Matthew Honnibal 3aabf621a3 Fix handling of unknown tags in tagger update 2018-06-25 22:01:02 +02:00
Matthew Honnibal 69c900f003 Fix init-model if no vectors provided 2018-06-25 18:26:02 +02:00
Matthew Honnibal 664f89327a Fix init-model if no vectors provided 2018-06-25 17:58:45 +02:00
Matthew Honnibal c4698f5712 Don't collate model unless training succeeds 2018-06-25 16:36:42 +02:00
Matthew Honnibal 24dfbb8a28 Fix model collation 2018-06-25 14:35:24 +02:00
Matthew Honnibal 62237755a4 Import shutil 2018-06-25 13:40:17 +02:00
Matthew Honnibal a040fca99e Import json into cli.train 2018-06-25 11:50:37 +02:00
Matthew Honnibal 2c703d99c2 Fix collation of best models 2018-06-25 01:21:34 +02:00
Matthew Honnibal 9d6a1c57f2 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2018-06-24 23:40:06 +02:00
Matthew Honnibal 2c80b7c013 Collate best model after training 2018-06-24 23:39:52 +02:00
ines 778e5f4da3 Merge branch 'master' into develop 2018-06-11 00:38:04 +02:00
himkt 57311d5d47 replace janome with mecab in the documentation and the test (#2415)
* Add links to Reddit data (see #2401)

* replace janome with mecab in the documentation and the test

* add the assignment
2018-06-11 00:33:13 +02:00
Nour Shalabi a169b79092 Additions to Arabic stop words. (#2422)
* Additions to Arabic stop words.

* Create nourshalabi.md
2018-06-08 02:33:23 +02:00
ines a0017e4909 Merge branch 'master' into develop 2018-05-30 14:10:47 +02:00
ines b8ef9c1000 Fix model names in conftest (see #2379) 2018-05-30 14:10:20 +02:00
ines 4a62486340 Merge branch 'master' into develop 2018-05-30 13:01:01 +02:00
Maciej c7d53348d7 Fix bug in CLI iob and ner converter (#2392) (fixes #2385)
* issue_2385 add tests for iob_to_biluo converter function

* issue_2385 fix and modify iob_to_biluo function to accept either iob or biluo tags in cli.converter

* issue_2385 add test to fix b char bug

* add contributor agreement

* fill contributor agreement
2018-05-30 12:28:44 +02:00
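A minimal sketch of the kind of conversion this commit describes, where input tags may be either IOB or BILUO. The function name `iob_to_biluo_sketch` is hypothetical, and the sketch assumes every entity starts with a `B-` tag; it is not the code from `cli.converter`.

```python
def iob_to_biluo_sketch(tags):
    """Convert IOB tags to BILUO; tags that are already BILUO pass through."""
    biluo = []
    for i, tag in enumerate(tags):
        # partition() splits only on the first "-", so labels that themselves
        # begin with "B" or contain "-" are not confused with the prefix.
        prefix, _, label = tag.partition("-")
        if prefix in ("O", "L", "U"):
            biluo.append(tag)
            continue
        if prefix not in ("B", "I"):
            raise ValueError("Unexpected tag: {}".format(tag))
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        next_prefix, _, next_label = next_tag.partition("-")
        # The entity continues only if the next tag is inside the same label.
        continues = next_prefix in ("I", "L") and next_label == label
        if prefix == "B":
            biluo.append(("B-" if continues else "U-") + label)
        else:
            biluo.append(("I-" if continues else "L-") + label)
    return biluo
```

For example, `["B-PER", "I-PER", "O", "B-LOC"]` becomes `["B-PER", "L-PER", "O", "U-LOC"]`, while a sequence that is already BILUO is returned unchanged.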
ines 3c3a175018 Merge branch 'master' into develop 2018-05-28 18:37:09 +02:00
ansgar-t 9732988951 escape html in displacy.render (#2378) (closes #2361)
## Description
Fix for issue #2361:
replace &, <, >, " with &amp;, &lt;, &gt;, &quot; before rendering svg

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [ ] I ran the tests, and all new and existing tests passed.
(As discussed in the comments to #2361)
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-05-28 18:36:41 +02:00
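The escaping described in this commit can be sketched as below. The helper name `escape_html_sketch` is hypothetical and this is not necessarily how `displacy.render` does it; the one detail that matters is that `&` is replaced first, so the entities produced by the later replacements are not escaped a second time.

```python
def escape_html_sketch(text):
    """Replace &, <, >, and " with HTML entities (hypothetical helper)."""
    text = text.replace("&", "&amp;")  # must come first
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    text = text.replace('"', "&quot;")
    return text
```

The standard library's `html.escape` covers the same four characters and, with its default `quote=True`, the single quote as well.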
ines f7103babd9 Only overwrite warnings filter if set explicitly (resolves #2369)
This way, pre-defined warning filters are respected and users are still able to use the fine-grained warning settings if they like.
2018-05-26 18:44:15 +02:00
ines 330c039106 Merge branch 'master' into develop 2018-05-26 18:30:52 +02:00
James Messinger 4515e96e90 Better formatting for `spacy train` CLI (#2357)
* Better formatting for `spacy train` CLI

Changed to use fixed-spaces rather than tabs to align table headers and data.

### Before:
```
Itn.    P.Loss  N.Loss  UAS     NER P.  NER R.  NER F.  Tag %   Token %
0       4618.857        2910.004        76.172  79.645  67.987  88.732  88.261  100.000 4436.9  6376.4
1       4671.972        3764.812        74.481  78.046  62.374  82.680  88.377  100.000 4672.2  6227.1
2       4742.756        3673.473        71.994  77.380  63.966  84.494  90.620  100.000 4298.0  5983.9
```

### After:
```
Itn.  Dep Loss  NER Loss  UAS     NER P.  NER R.  NER F.  Tag %   Token %  CPU WPS  GPU WPS
0     4618.857  2910.004  76.172  79.645  67.987  88.732  88.261  100.000  4436.9   6376.4
1     4671.972  3764.812  74.481  78.046  62.374  82.680  88.377  100.000  4672.2   6227.1
2     4742.756  3673.473  71.994  77.380  63.966  84.494  90.620  100.000  4298.0   5983.9
```

* Added contributor file
2018-05-25 13:08:45 +02:00
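A rough sketch of aligning columns with spaces instead of tabs, as this commit describes. The function name `format_table_sketch` and the two-space gap are assumptions for illustration; the actual `spacy train` code may compute its widths differently.

```python
def format_table_sketch(headers, rows, gap=2):
    """Pad every column to its widest cell so headers and data line up."""
    cells = [[str(h) for h in headers]] + [[str(v) for v in row] for row in rows]
    widths = [max(len(line[i]) for line in cells) for i in range(len(headers))]
    separator = " " * gap
    lines = []
    for line in cells:
        padded = separator.join(cell.ljust(width) for cell, width in zip(line, widths))
        lines.append(padded.rstrip())  # drop trailing padding on the last column
    return "\n".join(lines)
```

Unlike tabs, the padding depends only on the cell contents, so a wide value such as `4618.857` cannot push later columns out of alignment.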
Aristo Rinjuang 432ede04af adding more words and rephrasing (#2351)
* adding more words and rephrasing

* adding a contributor

* tokenizer bugs solved
2018-05-24 11:40:57 +02:00
Jani Monoses ec62cadf4c Updates to Romanian support (#2354)
* Add back Romanian in conftest

* Romanian lex_attr

* More tokenizer exceptions for Romanian

* Add tests for some Romanian tokenizer exceptions
2018-05-24 11:40:00 +02:00
Matthew Honnibal 5d281cf302 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2018-05-22 20:50:59 +02:00
Matthew Honnibal ce458c2428 Fix spacy requirement constraint in package template 2018-05-22 20:50:46 +02:00
Ines Montani 862da5e793 Support pipeline factories via entry points (#2348) 2018-05-22 18:29:45 +02:00
Matthew Honnibal d5af38f80c Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2018-05-21 17:42:55 +02:00
Matthew Honnibal ee33de8652 Fix unpickling of NER parser 2018-05-21 17:42:40 +02:00
ines f9dbcac8e4 Merge branch 'master' into develop 2018-05-21 02:29:29 +02:00
cclauss f7dcaa1f6b Simplify is_config() and normalize_string_keys() (#2305)
* Simplify is_config() and normalize_string_keys()

* Use __in__ to avoid the nested _ands_ and _ors_.
* Dict comprehension directly tracks with the doc string

* Keep more basic loop in normalize_string_keys

* Whitespace
2018-05-21 01:54:35 +02:00
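The two ideas in this commit, membership tests instead of nested and/or chains and a plain loop for key normalization, might look roughly like this. Both functions are simplified, hypothetical stand-ins (only the Python-version flags are modeled), not the bodies of spaCy's `is_config()` and `normalize_string_keys()`.

```python
import sys


def is_config_sketch(python2=None, python3=None):
    """Return True if each given flag matches the running interpreter.

    A flag left as None means "don't care", so `flag in (None, actual)`
    replaces a nested chain of and/or checks.
    """
    is_python2 = sys.version_info[0] == 2
    is_python3 = sys.version_info[0] == 3
    return python2 in (None, is_python2) and python3 in (None, is_python3)


def normalize_string_keys_sketch(old):
    """Return a copy of the dict with bytes keys decoded to text."""
    new = {}
    for key, value in old.items():
        if isinstance(key, bytes):
            new[key.decode("utf8")] = value
        else:
            new[key] = value
    return new
```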
Ines Montani cae4457c38 💫 Add .similarity warnings for no vectors and option to exclude warnings (#2197)
* Add logic to filter out warning IDs via environment variable

Usage: SPACY_WARNING_EXCLUDE=W001,W007

* Add warnings for empty vectors

* Add warning if no word vectors are used in .similarity methods

For example, if only tensors are available in small models – should hopefully clear up some confusion around this

* Capture warnings in tests

* Rename SPACY_WARNING_EXCLUDE to SPACY_WARNING_IGNORE
2018-05-21 01:22:38 +02:00
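A minimal sketch of filtering warnings by ID through the `SPACY_WARNING_IGNORE` environment variable named in this commit. The helper names `ignored_warning_ids` and `maybe_warn`, and the message format, are hypothetical; spaCy's own implementation may differ.

```python
import os
import warnings


def ignored_warning_ids(env=None):
    """Parse a comma-separated list of warning IDs, e.g. "W001,W007"."""
    env = os.environ if env is None else env
    raw = env.get("SPACY_WARNING_IGNORE", "")
    return {part.strip() for part in raw.split(",") if part.strip()}


def maybe_warn(warning_id, message, env=None):
    """Emit the warning unless its ID is excluded; return whether it fired."""
    if warning_id in ignored_warning_ids(env):
        return False
    warnings.warn("[{}] {}".format(warning_id, message), UserWarning)
    return True
```

With `SPACY_WARNING_IGNORE=W001,W007` set, a call such as `maybe_warn("W001", "...")` returns `False` without emitting anything.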
Matthew Honnibal b096b22c20 Merge pull request #2247 from skrcode/1480
1480 - Implement Fast-Text vectors with subword features
2018-05-21 01:16:21 +02:00
Matthew Honnibal f3b4f6a4ec Merge setup.py 2018-05-20 23:21:00 +02:00
Ines Montani d4cc736b7c 💫 Improve model downloads: check for existing install, customise pip and use requests library again (#2346)
* Go back to using requests instead of urllib (closes #2320)

Fewer dependencies are good, but this one was simply causing too many other problems around SSL verification and Python 2/3 compatibility. requests is a popular enough package that it's okay for spaCy to depend on it – and this will hopefully make model downloads less flakey.

* Only download model if not installed (see #1456)

Use #egg=model==version to allow pip to check for existing installations. The download is only started if no installation matching the package/version is found. Fixes a long-standing inconvenience.

* Pass additional options to pip when installing model (resolves #1456)

Treat all additional arguments passed to the download command as pip options to allow user to customise the command. For example:

python -m spacy download en --user

* Add CLI option to enable installing model package dependencies

* Revert "Add CLI option to enable installing model package dependencies"

This reverts commit 9336ffe695.

* Update documentation
2018-05-20 20:26:56 +02:00
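How the `#egg=model==version` fragment and the pass-through pip options could fit together is sketched below. The function `build_pip_command_sketch` and the example URL are hypothetical; the sketch only assembles the argument list and does not run pip or reproduce spaCy's actual download code.

```python
import sys


def build_pip_command_sketch(download_url, name, version, pip_args=()):
    """Assemble a pip install command for a model package (illustrative only).

    The "#egg=name==version" suffix tells pip which package and version the
    URL provides, so it can check for an existing matching installation.
    """
    target = "{}#egg={}=={}".format(download_url, name, version)
    # Extra arguments from the command line are forwarded to pip unchanged.
    return [sys.executable, "-m", "pip", "install"] + list(pip_args) + [target]


# Hypothetical usage, with "--user" forwarded as in the commit's example.
command = build_pip_command_sketch(
    "https://example.com/en_core_web_sm-2.0.0.tar.gz",
    "en_core_web_sm",
    "2.0.0",
    pip_args=["--user"],
)
```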
Matthew Honnibal 3eb446e0a5 Require thinc 6.11.1 and prepare for release to spacy-nightly 2018-05-20 19:00:34 +02:00
Matthew Honnibal bdc23dd8c1 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2018-05-20 18:59:24 +02:00
ines 5401c55c75 Merge branch 'master' into develop 2018-05-20 16:49:40 +02:00
ines b59e3b157f Don't require attrs argument in Doc.retokenize and allow both ints and unicode (resolves #2304) 2018-05-20 15:15:37 +02:00
ines 5768df4f09 Add SimpleFrozenDict util to use as default function argument 2018-05-20 15:13:37 +02:00
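A frozen dict is useful as a default argument because an ordinary `{}` default is shared between calls and can be mutated by accident. A rough sketch of the idea follows; the class `FrozenDictSketch` and the function `merge_attrs_sketch` are hypothetical and are not spaCy's `SimpleFrozenDict` or `Doc.retokenize`.

```python
class FrozenDictSketch(dict):
    """A dict that refuses in-place modification (illustrative only)."""

    def _read_only(self, *args, **kwargs):
        raise NotImplementedError("This dict is read-only; copy it first.")

    __setitem__ = _read_only
    __delitem__ = _read_only
    pop = _read_only
    popitem = _read_only
    clear = _read_only
    update = _read_only
    setdefault = _read_only


def merge_attrs_sketch(span, attrs=FrozenDictSketch()):
    """The default is safe to share: it cannot be changed between calls."""
    merged = dict(attrs)  # work on a mutable copy
    merged["span"] = span
    return merged
```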
Matthew Honnibal 7431e9c87f Fix parser for GPU 2018-05-19 17:24:34 +00:00
Matthew Honnibal 401213fb1f Only warn about unnamed vectors if non-zero sized. 2018-05-19 18:51:55 +02:00