1680 Commits

Author SHA1 Message Date
Ines Montani 52728d8fa3 Merge branch 'develop' into master-tmp 2020-06-20 15:52:00 +02:00
Adriane Boyd 931d80de72 Warning for sudachipy 0.4.5 (#5611) 2020-06-19 12:43:41 +02:00
Ines Montani 6d712f3e06 Merge pull request #5599 from adrianeboyd/docs/v2.3.0-minor 2020-06-16 13:49:25 -07:00
Adriane Boyd 02369f91d3 Fix spacy convert argument 2020-06-16 20:41:17 +02:00
Adriane Boyd f0fd77648f Change example title to Dr.
Change example title to Dr. so the current model does exclude the title
in the initial example.
2020-06-16 20:36:21 +02:00
Adriane Boyd a6abdfbc3c Fix numpy.zeros() dtype for Doc.from_array 2020-06-16 20:35:45 +02:00
Adriane Boyd 9aff317ca7 Update POS in tagging example 2020-06-16 20:26:57 +02:00
Adriane Boyd 457babfa0c Update alignment example for new gold.align 2020-06-16 20:22:03 +02:00
Ines Montani 41003a5117 Update Binder version [ci skip] 2020-06-16 17:41:23 +02:00
Ines Montani fd89f44c0c Update Binder URL [ci skip] 2020-06-16 17:34:26 +02:00
Ines Montani 44af53bdd9 Add pkuseg warnings and auto-format [ci skip] 2020-06-16 17:13:35 +02:00
Ines Montani a9e5b840ee Fix typos and auto-format [ci skip] 2020-06-16 16:38:45 +02:00
Ines Montani e9d3e177f0 Merge branch 'master' into v2.3.x 2020-06-16 16:31:38 +02:00
Ines Montani bb54f54369 Fix model accuracy table [ci skip] 2020-06-16 16:10:12 +02:00
Adriane Boyd d5110ffbf2 Documentation updates for v2.3.0 (#5593)
* Update website models for v2.3.0

* Add docs for Chinese word segmentation

* Tighten up Chinese docs section

* Merge branch 'master' into docs/v2.3.0 [ci skip]

* Merge branch 'master' into docs/v2.3.0 [ci skip]

* Auto-format and update version

* Update matcher.md

* Update languages and sorting

* Typo in landing page

* Infobox about token_match behavior

* Add meta and basic docs for Japanese

* POS -> TAG in models table

* Add info about lookups for normalization

* Updates to API docs for v2.3

* Update adding norm exceptions for adding languages

* Add --omit-extra-lookups to CLI API docs

* Add initial draft of "What's New in v2.3"

* Add new in v2.3 tags to Chinese and Japanese sections

* Add tokenizer to migration section

* Add new in v2.3 flags to init-model

* Typo

* More what's new in v2.3

Co-authored-by: Ines Montani <ines@ines.io>
2020-06-16 15:37:35 +02:00
Sofie Van Landeghem c0f4a1e43b train is from-config by default (#5575)
* verbose and tag_map options

* adding init_tok2vec option and only changing the tok2vec that is specified

* adding omit_extra_lookups and verifying textcat config

* wip

* pretrain bugfix

* add replace and resume options

* train_textcat fix

* raw text functionality

* improve UX when KeyError or when input data can't be parsed

* avoid unnecessary access to goldparse in TextCat pipe

* save performance information in nlp.meta

* add noise_level to config

* move nn_parser's defaults to config file

* multitask in config - doesn't work yet

* scorer offering both F and AUC options, need to be specified in config

* add textcat verification code from old train script

* small fixes to config files

* clean up

* set default config for ner/parser to allow create_pipe to work as before

* two more test fixes

* small fixes

* cleanup

* fix NER pickling + additional unit test

* create_pipe as before
2020-06-12 02:02:07 +02:00
Martino Mensio de00f967ce adding spacy-universal-sentence-encoder (#5534)
* adding spacy-universal-sentence-encoder

* update affiliation

* updated code example
2020-06-08 20:26:30 +02:00
Sofie Van Landeghem 4d1ba6feb4 add tag variant for 2.3 (#5542) 2020-06-04 19:16:33 +02:00
Ines Montani 810fce3bb1 Merge branch 'develop' into master-tmp 2020-06-03 14:36:59 +02:00
svlandeg 5f0a91cf37 fix conv-depth parameter 2020-05-29 09:56:29 +02:00
Rajat 8b8efa1b42 update spacy universe with my project (#5497)
* added contextualSpellCheck in spacy universe meta

* removed extra formatting by code

* updated with permanent links

* run json linter used by spacy

* filled SCA

* updated the description
2020-05-25 11:30:23 +02:00
Ines Montani 262d306eaa unicode -> str consistency 2020-05-24 17:23:00 +02:00
Ines Montani 5d3806e059 unicode -> str consistency 2020-05-24 17:20:58 +02:00
Sofie Van Landeghem ae1c179f3a Remove the nested quote 2020-05-23 17:58:19 +02:00
Jannis aa53ce6996 Documentation Typo Fix (#5492)
* Fix typo

Change 'realize' to 'realise'

* Add contributor agreement
2020-05-22 19:50:26 +02:00
Matthew Honnibal f6078d866a Merge pull request #5121 from adrianeboyd/bugfix/revert-token-match
Revert token_match priority changes from #4374 and extend token match options
2020-05-22 14:42:51 +02:00
Ines Montani 65c7e82de2 Auto-format and remove 2.3 feature [ci skip] 2020-05-22 13:50:30 +02:00
Adriane Boyd e4a1b5dab1 Rename to url_match
Rename to `url_match` and update docs.
2020-05-22 12:41:03 +02:00
Adriane Boyd 730fa493a4 Merge remote-tracking branch 'upstream/master' into bugfix/revert-token-match 2020-05-22 12:18:00 +02:00
Ines Montani ee027de032 Update universe and display of videos [ci skip] 2020-05-21 21:54:23 +02:00
Ines Montani 53da6bd672 Add course to landing [ci skip] 2020-05-21 20:45:33 +02:00
Ines Montani 24f72c669c Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
Kevin Lu c7c4cd5fe1 Changed pyate code example in universe.json 2020-05-20 09:11:32 -07:00
Kevin Lu 0a5b140235 Update universe.json 2020-05-19 20:12:21 -07:00
Sofie Van Landeghem 0d94737857 Feature toggle_pipes (#5378)
* make disable_pipes deprecated in favour of the new toggle_pipes

* rewrite disable_pipes statements

* update documentation

* remove bin/wiki_entity_linking folder

* one more fix

* remove deprecated link to documentation

* few more doc fixes

* add note about name change to the docs

* restore original disable_pipes

* small fixes

* fix typo

* fix error number to W096

* rename to select_pipes

* also make changes to the documentation

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-05-18 22:27:10 +02:00
Ines Montani f333c2a011 Merge pull request #5386 from svlandeg/fix/nel-docs 2020-05-10 12:00:09 +02:00
Travis Hoppe d4cc18b746 Added author information for NLPre (#5414)
* Add author links for NLPre and update category

* Add contributor statement
2020-05-08 11:28:54 +02:00
adrianeboyd 4a15b559ba Clarify Token.pos as UPOS (#5419) 2020-05-08 10:36:25 +02:00
adrianeboyd a2345618f1 Fix Token API docs from #5375 (#5418) 2020-05-08 10:25:02 +02:00
Adriane Boyd 565e0eef73 Add tokenizer option for token match with affixes
To fix the slow tokenizer URL (#4374) and allow `token_match` to take
priority over prefixes and suffixes by default, introduce a new
tokenizer option for a token match pattern that's applied after prefixes
and suffixes but before infixes.
2020-05-05 10:35:33 +02:00
Adriane Boyd 792c8af8cf Merge remote-tracking branch 'upstream/master' into bugfix/revert-token-match 2020-05-05 09:25:57 +02:00
svlandeg ebaed7dcfa Few more updates to the EL documentation 2020-04-30 10:17:06 +02:00
adrianeboyd bdff76dede Various updates/additions to CLI scripts (#5362)
* `debug-data`: determine coverage of provided vectors

* `evaluate`: support `blank:lg` model to make it possible to just evaluate
tokenization

* `init-model`: add option to truncate vectors to N most frequent vectors
from word2vec file

* `train`:

  * if training on GPU, only run evaluation/timing on CPU in the first
    iteration

  * if training is aborted, exit with a non-0 exit status
2020-04-29 12:56:46 +02:00
Sofie Van Landeghem cfdaf99b80 Fix passing of component configuration (#5374)
* add kwargs to to_disk methods in docs - otherwise crashes on 'exclude' argument

* add fix and test for Issue 5137
2020-04-29 12:56:17 +02:00
Ines Montani 63885c1836 Remove u string and auto-format [ci skip] 2020-04-29 12:54:57 +02:00
Sofie Van Landeghem f67343295d Update NEL examples and documentation (#5370)
* simplify creation of KB by skipping dim reduction

* small fixes to train EL example script

* add KB creation and NEL training example scripts to example section

* update descriptions of example scripts in the documentation

* moving wiki_entity_linking folder from bin to projects

* remove test for wiki NEL functionality that is being moved
2020-04-29 12:53:53 +02:00
adrianeboyd a6e521cd79 Add is_sent_end token property (#5375)
Reconstruction of the original PR #4697 by @MiniLau.

Removes unused `SENT_END` symbol and `IS_SENT_END` from `Matcher` schema
because the Matcher is only going to be able to support `IS_SENT_START`.
2020-04-29 12:53:16 +02:00
Ines Montani a77754120d Merge pull request #5177 from nlptechbook/patch-5 2020-04-29 12:52:21 +02:00
Ines Montani 1cbb272a6b Update website/meta/universe.json 2020-04-29 12:51:44 +02:00
Ines Montani 732629b0dd Update website/meta/universe.json 2020-04-29 12:51:37 +02:00