Commit Graph

11044 Commits

Author SHA1 Message Date
Priscilla de Abreu Lopes 39e79fcc86 Bugfix/dep matcher issue 4590 (#4601)
* add contributor agreement for prilopes

* add test for issue #4590

* fix on_match params for DependencyMatcher (#4590)
2019-11-07 12:01:06 +01:00
Ines Montani 09cec3e41b Replace function registries with catalogue (#4584)
* Replace functions registries with catalogue

* Update __init__.py

* Fix test

* Revert unrelated flag [ci skip]
2019-11-07 11:45:22 +01:00
adrianeboyd 0f8678c0b1 Fix DocBin.merge() example (#4599) 2019-11-07 11:26:48 +01:00
walterhenry 5563c42ef5 Fixed typo: Added space between "recognize" and "various" (#4600) 2019-11-06 23:06:36 +01:00
Ines Montani 828ef27a32 Add warnings about 3.8 (resolves #4593) [ci skip] 2019-11-05 18:30:11 +01:00
Ines Montani fed53b1552 Update README.md 2019-11-05 18:26:47 +01:00
Ines Montani 83381018d3 Add load_from_docbin example [ci skip]
TODO: upload the file somewhere
2019-11-05 11:52:43 +01:00
Sofie Van Landeghem 4ec7623288 Fix conllu script (#4579)
* force extensions to avoid clash between example scripts

* fix arg order and default file encoding

* add example config for conllu script

* newline

* move extension definitions to main function

* few more encodings fixes
2019-11-04 20:31:26 +01:00
Matthew Honnibal 4e43c0ba93 Fix multiprocessing for as_tuples=True (#4582) 2019-11-04 20:29:03 +01:00
Ines Montani 4b95587ad4 Update universe.json [ci skip] 2019-11-04 13:55:55 +01:00
Yash Patadia 0c396aeed4 add dframcy to universe.json (#4580) 2019-11-04 13:53:23 +01:00
Ines Montani 3ec231f7e1 Reorganise install_requires 2019-11-04 02:39:28 +01:00
Ines Montani cf4ec88b38 Use latest wasabi 2019-11-04 02:38:45 +01:00
Ines Montani d82630d7c1 Revert "Update azure-pipelines.yml"
This reverts commit ed1060cf59.
2019-11-03 17:48:54 +01:00
Ines Montani ed1060cf59 Update azure-pipelines.yml 2019-11-03 17:48:26 +01:00
Ines Montani 6ec119d976 Add error in debug-data if no dev docs are available (see #4575) 2019-11-02 16:08:11 +01:00
adrianeboyd 56ad3a3988 Add LAS per dependency to Scorer (#4560) 2019-10-31 21:18:16 +01:00
Matthew Honnibal de98d66f87 Set version to v2.2.2 2019-10-31 15:53:31 +01:00
Matthw Honnibal 55f2241d72 Merge branch 'master' of https://github.com/explosion/spaCy 2019-10-31 15:37:52 +01:00
Ines Montani df4c9ae3dc Fix formatting [ci skip] 2019-10-31 15:10:25 +01:00
Ines Montani 59358d9b71 Remove box-decoration-break from entities in displacy (#4564) 2019-10-31 15:09:43 +01:00
Matthw Honnibal 8b9954d1b7 Set version to v2.2.2.dev5 2019-10-31 15:06:19 +01:00
Ines Montani 2c107f02a4 Auto-format [ci skip] 2019-10-31 15:01:56 +01:00
Matthew Honnibal e82306937e Put Tok2Vec refactor behind feature flag (#4563)
* Add back pre-2.2.2 tok2vec

* Add simple tok2vec tests

* Add simple tok2vec tests

* Reformat

* Fix CharacterEmbed in new tok2vec

* Fix legacy tok2vec

* Resolve circular imports

* Fix test for Python 2
2019-10-31 15:01:15 +01:00
Ines Montani 828108a57f Update README.md [ci skip] 2019-10-31 13:23:25 +01:00
Ines Montani 5e9849b60f Auto-format [ci skip] 2019-10-30 19:27:18 +01:00
Ines Montani afe4a428f7 Fix pipeline analysis on remove pipe (#4557)
Validate *after* component is removed, not before
2019-10-30 19:04:17 +01:00
Matthew Honnibal 6b874ef096 Set version to v2.2.2.dev4 2019-10-30 17:36:20 +01:00
Ines Montani 85f2b04c45 Support span._. in component decorator attrs (#4555)
* Support span._. in component decorator attrs

* Adjust error [ci skip]
2019-10-30 17:19:36 +01:00
Ines Montani 4e1de85e43 Update syntax iterators [ci skip] 2019-10-30 14:31:40 +01:00
Ines Montani 726c5dd306 Update universe.json [ci skip] 2019-10-30 13:29:00 +01:00
Neel Kamath 6c036ab57d Add "spaCy Server" to spaCy Universe (#4553)
* Add "spaCy Server" to spaCy Universe

* Accept the spaCy Contributor Agreement
2019-10-30 13:20:46 +01:00
Nipun Sadvilkar 2a5e71232b project: pySBD - Python Sentence Boundary Disambiguation (#4455)
*   project: pySBD - Python Sentence Boundary Disambiguation

* 📝  Update links and description

* 🐛  Fix missing comma

* Update universe.json

pysbd as a spacy component through entrypoints

* 🚨  Fix universe.json

* 📝  Update code_example
2019-10-30 12:13:29 +01:00
Matthew Honnibal c2f5f9f572 Set version to v2.2.2.dev3 2019-10-29 16:37:58 +01:00
Sofie Van Landeghem 33ba9ff464 set encodings explicitly to utf8 (#4551) 2019-10-29 13:16:55 +01:00
Matthew Honnibal 9e210fa7fd Fix tok2vec structure after model registry refactor (#4549)
The model registry refactor of the Tok2Vec function broke loading models
trained with the previous function, because the model tree was slightly
different. Specifically, the new function wrote:

    concatenate(norm, prefix, suffix, shape)

To build the embedding layer. In the previous implementation, I had used
the operator overloading shortcut:

    ( norm | prefix | suffix | shape )

This actually gets mapped to a binary association, giving something
like:

    concatenate(norm, concatenate(prefix, concatenate(suffix, shape)))

This is a different tree, so the layers iterate differently and we
loaded the weights wrongly.
2019-10-28 23:59:03 +01:00
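
The tree mismatch described in 9e210fa7fd can be illustrated with a toy model. This is a minimal sketch, not Thinc's code: the `Layer` class, its `walk` method, and the `names` helper are hypothetical stand-ins invented for illustration. Note that Python evaluates `a | b | c | d` left to right, so this toy produces a left-nested tree, while the commit message describes the nesting only as "something like" the right-nested form. Either way, the chained operator yields extra intermediate `concatenate` nodes that the flat call does not have, so a depth-first walk visits a different sequence of layers.

```python
class Layer:
    """Toy stand-in for a model layer (hypothetical, not Thinc's class)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def __or__(self, other):
        # A binary operator can only ever combine two operands,
        # so each `|` creates its own two-child concatenate node.
        return concatenate(self, other)

    def walk(self):
        # Depth-first traversal: the order in which a loader would
        # pair saved weights with layers in this toy setup.
        yield self
        for child in self.children:
            yield from child.walk()


def concatenate(*layers):
    """Build a single node with any number of children."""
    return Layer("concatenate", layers)


def names(model):
    return [layer.name for layer in model.walk()]


norm, prefix, suffix, shape = (
    Layer(n) for n in ("norm", "prefix", "suffix", "shape")
)

flat = concatenate(norm, prefix, suffix, shape)  # one node, four children
nested = norm | prefix | suffix | shape          # three nested binary nodes

flat_order = names(flat)
nested_order = names(nested)
print(flat_order)
print(nested_order)
```

The flat tree has five nodes and the chained one has seven, so weights saved from one layout cannot be assigned position by position to the other.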
Matthew Honnibal bade60fe64 Set version to v2.2.2.dev1 2019-10-28 19:09:34 +01:00
Matthew Honnibal b1505380ff Fix training with vectors 2019-10-28 18:06:38 +01:00
Matthew Honnibal a927b3a21e Put new alignment behind flag for v2.2.2 release (#4541)
* Xfail new tokenization test

* Put new alignment behind feature flag

* Move USE_ALIGN to top of the file [ci skip]


Co-authored-by: Ines Montani <ines@ines.io>
2019-10-28 16:12:32 +01:00
Ines Montani a90025b277 Fix serialization of extension attr values in DocBin (#4540) 2019-10-28 16:02:13 +01:00
tamuhey df293f3894 modified gold.align to handle space tokens (#4537)
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2019-10-28 15:44:28 +01:00
adrianeboyd f2bfaa1b38 Filter subtoken matches in merge_subtokens() (#4539)
The `Matcher` in `merge_subtokens()` returns all possible subsequences
of `subtok`, so for sequences of two or more subtoks it's necessary to
filter the matches so that the retokenizer is only merging the longest
matches with no overlapping spans.
2019-10-28 15:40:28 +01:00
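
The filtering step described in f2bfaa1b38 can be sketched in plain Python. This is an illustration under stated assumptions, not the code from the commit: matches are modeled as hypothetical `(start, end)` token offsets with an exclusive end, and `filter_longest` is a name made up for this example. spaCy itself provides a comparable helper, `spacy.util.filter_spans`, which works on `Span` objects.

```python
def filter_longest(matches):
    """Keep the longest matches that do not overlap.

    `matches` is a list of (start, end) token offsets, end exclusive.
    """
    # Longest first; ties go to the match that starts earliest.
    ordered = sorted(matches, key=lambda m: (m[1] - m[0], -m[0]), reverse=True)
    seen = set()  # token indices already covered by a kept match
    kept = []
    for start, end in ordered:
        if not any(i in seen for i in range(start, end)):
            kept.append((start, end))
            seen.update(range(start, end))
    return sorted(kept)


# A run of three matching tokens (2, 3, 4) yields every subsequence,
# plus one unrelated single-token match at position 7.
matches = [(2, 3), (3, 4), (4, 5), (2, 4), (3, 5), (2, 5), (7, 8)]
filtered = filter_longest(matches)
print(filtered)
```

Only the full run and the unrelated match survive, so a retokenizer would be asked to merge each token at most once.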
Matthew Honnibal d5509e0989 Support Mish activation (requires Thinc 7.3) (#4536)
* Add arch for MishWindowEncoder

* Support mish in tok2vec and conv window >=2

* Pass new tok2vec settings from parser

* Syntax error

* Fix tok2vec setting

* Fix registration of MishWindowEncoder

* Fix receptive field setting

* Fix mish arch

* Pass more options from parser

* Support more tok2vec options in pretrain

* Require thinc 7.3

* Add docs [ci skip]

* Require thinc 7.3.0.dev0 to run CI

* Run black

* Fix typo

* Update Thinc version


Co-authored-by: Ines Montani <ines@ines.io>
2019-10-28 15:16:33 +01:00
Ines Montani 96bb8f2187 Add regression test for #4528 [ci skip] 2019-10-28 14:36:03 +01:00
Matthew Honnibal 02e8adf2c2 Add the spacy_lookups_data to pex file 2019-10-28 14:03:35 +01:00
Ines Montani c5e41247e8 Tidy up and auto-format 2019-10-28 12:43:55 +01:00
Ines Montani 92018b9cd4 Tidy up and auto-format 2019-10-28 12:36:23 +01:00
Matthew Honnibal f0ec7bcb79 Flag to ignore examples with mismatched raw/gold text (#4534)
* Flag to ignore examples with mismatched raw/gold text

After #4525, we're seeing some alignment failures on our OntoNotes data. I think we actually have fixes for most of these cases.

In general it's better to fix the data, but it seems good to allow the GoldCorpus class to just skip cases where the raw text doesn't
match up to the gold words. I think previously we were silently ignoring these cases.

* Try to fix test on Python 2.7
2019-10-28 11:40:12 +01:00
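
The skip behaviour described in f0ec7bcb79 might look roughly like the sketch below. Everything here is an assumption for illustration: the whitespace-insensitive comparison, the helper `text_matches_words`, the generator `iter_examples`, and the flag name `ignore_misaligned` are not taken from the commit, and the real GoldCorpus check may differ.

```python
def text_matches_words(raw_text, gold_words):
    """Return True if the gold words spell out the raw text.

    Assumption: whitespace is ignored on both sides.
    """
    def squash(text):
        return "".join(text.split())

    return squash(raw_text) == squash("".join(gold_words))


def iter_examples(examples, ignore_misaligned=False):
    """Yield (raw_text, gold_words) pairs, skipping or rejecting mismatches."""
    for raw_text, gold_words in examples:
        if not text_matches_words(raw_text, gold_words):
            if ignore_misaligned:
                continue  # drop the example instead of failing
            raise ValueError("Raw text does not match gold words: %r" % raw_text)
        yield raw_text, gold_words


examples = [
    ("I like London.", ["I", "like", "London", "."]),
    ("She said hello", ["She", "says", "hello"]),  # gold differs from raw
]
usable = list(iter_examples(examples, ignore_misaligned=True))
print(len(usable))
```

With the flag off, the same input raises an error, which makes the mismatch visible rather than silently ignored.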
Matthew Honnibal 795699015c Clarify parser model CPU/GPU code (#4535)
The previous version worked with previous thinc, but only
because some thinc ops happened to have gpu/cpu compatible
implementations. It's better to call the right Ops instance.
2019-10-27 23:43:09 +01:00
Matthw Honnibal 46eecdcb70 Remove print 2019-10-27 22:24:19 +01:00