Commit Graph

8794 Commits

Author SHA1 Message Date
Matthew Honnibal 664f89327a Fix init-model if no vectors provided 2018-06-25 17:58:45 +02:00
Matthew Honnibal c4698f5712 Don't collate model unless training succeeds 2018-06-25 16:36:42 +02:00
Matthew Honnibal 24dfbb8a28 Fix model collation 2018-06-25 14:35:24 +02:00
Matthew Honnibal 62237755a4 Import shutil 2018-06-25 13:40:17 +02:00
Matthew Honnibal a040fca99e Import json into cli.train 2018-06-25 11:50:37 +02:00
Matthew Honnibal 2c703d99c2 Fix collation of best models 2018-06-25 01:21:34 +02:00
Matthew Honnibal 9d6a1c57f2 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2018-06-24 23:40:06 +02:00
Matthew Honnibal 2c80b7c013 Collate best model after training 2018-06-24 23:39:52 +02:00
Matthew Honnibal 5435b071b9 Add make clean command 2018-06-24 23:39:34 +02:00
ines 778e5f4da3 Merge branch 'master' into develop 2018-06-11 00:38:04 +02:00
himkt 57311d5d47 replace janome with mecab in the documentation and the test (#2415)
* Add links to Reddit data (see #2401)

* replace janome with mecab in the documentation and the test

* add the assignment
2018-06-11 00:33:13 +02:00
ines effb55d591 Adjust formatting [ci skip] 2018-06-11 00:29:13 +02:00
Nathan Breit ba6d2cf393 Add EpiTator to Universe (#2429) 2018-06-11 00:24:13 +02:00
Daniel Ruf d6d688914f chore: cache dependencies (#2418)
* chore: cache dependencies

* chore: add CLA
2018-06-11 00:22:41 +02:00
himkt 1a568f2e08 fix wrong documentations (#2423) 2018-06-11 00:21:06 +02:00
Bohdan Moskalevskyi d66292f767 fix UD data file extensions (#2425)
* fix UD data files extension

* add contributor agreement for msklvsk
2018-06-08 14:26:11 +02:00
Nour Shalabi a169b79092 Additions to Arabic stop words. (#2422)
* Additions to Arabic stop words.

* Create nourshalabi.md
2018-06-08 02:33:23 +02:00
Matthew Honnibal 12f09313b1 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2018-06-02 17:10:34 +02:00
Matthew Honnibal 4f19fe0f3a Add Makefile 2018-06-02 17:10:15 +02:00
Ines Montani 3f2e3cbd27 Add links to Reddit data (see #2401) 2018-05-31 16:22:43 +02:00
ines a0017e4909 Merge branch 'master' into develop 2018-05-30 14:10:47 +02:00
ines b8ef9c1000 Fix model names in conftest (see #2379) 2018-05-30 14:10:20 +02:00
ines 0baaf836cf Update formatting [ci skip] 2018-05-30 13:32:49 +02:00
ines 3913e18201 Add self-attentive-parser to universe (see #59) 2018-05-30 13:31:28 +02:00
ines 4a62486340 Merge branch 'master' into develop 2018-05-30 13:01:01 +02:00
Maciej c7d53348d7 Fix bug in CLI iob and ner converter (#2392) (fixes #2385)
* issue_2385 add tests for iob_to_biluo converter function

* issue_2385 fix and modify iob_to_biluo function to accept either iob or biluo tags in cli.converter

* issue_2385 add test to fix b char bug

* add contributor agreement

* fill contributor agreement
2018-05-30 12:28:44 +02:00
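
Note on c7d53348d7: the commit above concerns converting IOB tags to BILUO tags. The following is only a minimal sketch of that conversion, not the code from the commit. The function name `iob_to_biluo_sketch` is hypothetical, and the sketch assumes well-formed IOB input only (it does not handle input that is already BILUO, which the commit says the real function accepts).

```python
def iob_to_biluo_sketch(tags):
    """Convert a list of IOB tags (B-X, I-X, O) to BILUO tags.

    Hypothetical helper for illustration; assumes well-formed IOB input.
    """
    biluo = []
    for i, tag in enumerate(tags):
        if tag == "O":
            biluo.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        # The entity continues only if the next tag is I- with the same label.
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == "I-" + label
        if prefix == "B":
            # A single-token entity becomes U-, otherwise it stays B-.
            biluo.append(("B-" if continues else "U-") + label)
        else:
            # An I- tag that ends the entity becomes L-.
            biluo.append(("I-" if continues else "L-") + label)
    return biluo
```

For example, `["B-PER", "I-PER", "O", "B-LOC"]` maps to `["B-PER", "L-PER", "O", "U-LOC"]`.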
ines 605c663a4c Fix HTML merger examples (see #2390) 2018-05-30 12:22:32 +02:00
ines 3c3a175018 Merge branch 'master' into develop 2018-05-28 18:37:09 +02:00
ansgar-t 9732988951 escape html in displacy.render (#2378) (closes #2361)
## Description
Fix for issue #2361 :
replace &, <, >, " with &amp; , &lt; , &gt; , &quot; in before rendering svg

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [ ] I ran the tests, and all new and existing tests passed.
(As discussed in the comments to #2361)
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-05-28 18:36:41 +02:00
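
Note on 9732988951: the commit above describes replacing the four characters `&`, `<`, `>` and `"` with HTML entities before the SVG is rendered. A minimal sketch of that kind of escaping follows; the function name `escape_html_sketch` is hypothetical and this is not necessarily how the commit implements it.

```python
def escape_html_sketch(text):
    """Replace characters that have special meaning in HTML/SVG markup.

    Hypothetical helper for illustration only.
    """
    # "&" has to be replaced first, otherwise the ampersands introduced
    # by the later replacements would be escaped a second time.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    text = text.replace('"', "&quot;")
    return text
```

The standard library's `html.escape` (Python 3) covers the same characters and additionally escapes single quotes by default.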
ines d0b16aa014 Update list of languages 2018-05-26 18:56:26 +02:00
ines f7103babd9 Only overwrite warnings filter if set explicitly (resolves #2369)
This way, pre-defined warning filters are respected and users are still able to use the fine-grained warning settings if they like.
2018-05-26 18:44:15 +02:00
ines 330c039106 Merge branch 'master' into develop 2018-05-26 18:30:52 +02:00
Samuel Pouyt d85494bfae Added agrement (#2374) 2018-05-26 18:19:08 +02:00
Samuel Pouyt 5f988b8e9c Update _custom.jade (#2372)
It seems based on the doc and trying out that the `en` or `[lang]` is missing from the `spacy model-init`
2018-05-26 18:17:12 +02:00
ines d84a830d79 Merge branch 'master' of https://github.com/explosion/spaCy 2018-05-26 17:57:05 +02:00
ines fb923b31ea Fix bad HTML example (see #2376) and turn it into section on matcher + components
Avoid problems caused by merging while matching (e.g. index errors). Creating a Matcher component also better reflects the recommended best practices.
2018-05-26 17:57:02 +02:00
James Messinger 4515e96e90 Better formatting for `spacy train` CLI (#2357)
* Better formatting for `spacy train` CLI

Changed to use fixed-spaces rather than tabs to align table headers and data.

### Before:
```
Itn.    P.Loss  N.Loss  UAS     NER P.  NER R.  NER F.  Tag %   Token %
0       4618.857        2910.004        76.172  79.645  67.987  88.732  88.261  100.000 4436.9  6376.4
1       4671.972        3764.812        74.481  78.046  62.374  82.680  88.377  100.000 4672.2  6227.1
2       4742.756        3673.473        71.994  77.380  63.966  84.494  90.620  100.000 4298.0  5983.9
```

### After:
```
Itn.  Dep Loss  NER Loss  UAS     NER P.  NER R.  NER F.  Tag %   Token %  CPU WPS  GPU WPS
0     4618.857  2910.004  76.172  79.645  67.987  88.732  88.261  100.000  4436.9   6376.4
1     4671.972  3764.812  74.481  78.046  62.374  82.680  88.377  100.000  4672.2   6227.1
2     4742.756  3673.473  71.994  77.380  63.966  84.494  90.620  100.000  4298.0   5983.9
```

* Added contributor file
2018-05-25 13:08:45 +02:00
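
Note on 4515e96e90: the commit above aligns the table headers and data with fixed spaces instead of tabs. One way to get that effect, sketched under assumptions: each cell is padded to a fixed column width and the columns are joined with two spaces. The helper name `format_row` and the widths are hypothetical, not taken from the commit.

```python
def format_row(values, widths):
    """Left-align each value in its column and join columns with two spaces.

    Hypothetical helper; the widths are chosen by the caller.
    """
    cells = [str(value).ljust(width) for value, width in zip(values, widths)]
    # Strip trailing padding so lines do not end in whitespace.
    return "  ".join(cells).rstrip()


widths = [4, 8, 8]
header = format_row(["Itn.", "Dep Loss", "NER Loss"], widths)
row = format_row(["0", "4618.857", "2910.004"], widths)
print(header)
print(row)
```

Because every cell is padded to the same width, the columns stay aligned regardless of how a terminal renders tab stops.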
Shantam Raj 592834183a corrected spelling (#2359)
changed **interpretted** to **interpreted**
2018-05-24 13:29:52 +02:00
ines 8adb967e0c Fix from source quickstart instructions for Windows
See: https://stackoverflow.com/a/50478036/6400719
2018-05-24 12:42:16 +02:00
Aristo Rinjuang 432ede04af adding more words and rephrasing (#2351)
* adding more words and rephrasing

* adding a contributor

* tokenizer bugs solved
2018-05-24 11:40:57 +02:00
Jani Monoses ec62cadf4c Updates to Romanian support (#2354)
* Add back Romanian in conftest

* Romanian lex_attr

* More tokenizer exceptions for Romanian

* Add tests for some Romanian tokenizer exceptions
2018-05-24 11:40:00 +02:00
Matthew Honnibal 5d281cf302 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2018-05-22 20:50:59 +02:00
Matthew Honnibal ce458c2428 Fix spacy requirement constraint in package template 2018-05-22 20:50:46 +02:00
Ines Montani 862da5e793 Support pipeline factories via entry points (#2348) 2018-05-22 18:29:45 +02:00
Matthew Honnibal 94ad2d66b6 Require thinc 6.11.2 2018-05-21 19:26:28 +02:00
Matthew Honnibal d5af38f80c Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2018-05-21 17:42:55 +02:00
Matthew Honnibal ee33de8652 Fix unpickling of NER parser 2018-05-21 17:42:40 +02:00
Shantam Raj 1a4682dd0b Update _training.jade (#2340)
* Update _training.jade

Correcting grammar. Replacing "The" with "To".

* Create armsp.md

* Update armsp.md
2018-05-21 11:09:33 +02:00
ines f9dbcac8e4 Merge branch 'master' into develop 2018-05-21 02:29:29 +02:00
cclauss f7dcaa1f6b Simplify is_config() and normalize_string_keys() (#2305)
* Simplify is_config() and normalize_string_keys()

* Use __in__ to avoid the nested _ands_ and _ors_.
* Dict comprehension directly tracks with the doc string

* Keep more basic loop in normalize_string_keys

* Whitespace
2018-05-21 01:54:35 +02:00
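
Note on f7dcaa1f6b: the commit above mentions two simplifications, using `in` instead of nested and/or conditions, and a dict comprehension for normalizing keys (the message also says a more basic loop was kept in normalize_string_keys in the end). A generic sketch of both ideas follows; the names `setting_matches` and `keys_to_str` are hypothetical and this is not the code from the commit.

```python
def setting_matches(expected, actual):
    """True if no expectation was given (None) or it equals the actual value.

    Equivalent to `expected is None or expected == actual` for booleans.
    """
    return expected in (None, actual)


def keys_to_str(data):
    """Return a copy of `data` with bytes keys decoded to str."""
    return {
        key.decode("utf8") if isinstance(key, bytes) else key: value
        for key, value in data.items()
    }
```

The membership test keeps each condition on one short line, and the comprehension states the key transformation in a single expression.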