11718 Commits

Author SHA1 Message Date
Sofie Van Landeghem 6f7e7d88b9
remove cause without apostrophe from norm exceptions (#6636) 2021-01-06 12:30:30 +08:00
Sofie Van Landeghem 87562e470d
fix backticks in docs (#6635) 2020-12-27 22:12:37 +01:00
Sofie Van Landeghem 8df5b7f513
fix documentation of 'path' in tokenizer.to_disk (#6634) 2020-12-27 22:01:06 +01:00
Yosi cf52510631
Add Amharic አማርኛ Language support (#6583)
* Add Amharic to space

* clean up

* Add some PRON_LEMMA

* add Tigrinya support

* remove text_noun_chunks

* Tigrinya Support

* added some more details for ti

* fix unit test

* add amharic char range

* changes from review

* amharic and tigrinya share same unicode block

* get rid of _amharic/_tigrinya in char_classes

Co-authored-by: Josiah Solomon <jsolomon@meteorcomm.com>
2020-12-22 16:50:34 +01:00
Tim Gates 292c1d6a73
docs: fix simple typo, speficied -> specified (#6611)
There is a small typo in spacy/cli/info.py.

Should read `specified` rather than `speficied`.
2020-12-22 09:14:10 +01:00
Gareth Sparks efc229c3f4
Doc.char_span arg: alignment_mode (#6591)
Currently labeled "mode", actually "alignment_mode"
2020-12-18 09:54:56 +01:00
Ines Montani 7c9a2f298c
Merge pull request #6578 from jenojp/master [ci skip] 2020-12-16 17:31:55 +11:00
Ines Montani d8aa113d16
Merge pull request #6566 from rafguns/cite-zenodo [ci skip] 2020-12-16 16:40:50 +11:00
Ines Montani 4feef6bf9f
Update citation 2020-12-16 15:59:57 +11:00
Jeno Pizarro a6fe35a0f9
Update universe.json 2020-12-15 21:53:20 -05:00
Jeno Pizarro 343a44abe9 Merge branch 'master' of https://github.com/explosion/spaCy 2020-12-15 21:49:46 -05:00
Thomas Bird f6e4378942
Add SCA for @thomasbird (#6576) 2020-12-15 20:59:47 +01:00
Raf Guns ec876c9713 Merge branch 'master' of https://github.com/explosion/spaCy into cite-zenodo 2020-12-14 22:03:58 +01:00
Raf Guns db2a34d610 Update CITATION to Zenodo 2020-12-14 22:01:24 +01:00
Raf Guns a90ca0e1fb Add contributor agreement 2020-12-14 22:01:14 +01:00
Ines Montani 1d4b1dea25 Update contributing guide and issue template [ci skip] 2020-12-11 13:39:26 +11:00
Ines Montani 37c5d7e826
Merge pull request #6542 from adrianeboyd/chore/prepare-v2.3.5
Set version to v2.3.5
2020-12-11 10:33:18 +11:00
Ines Montani fb43a30a71
Merge pull request #6545 from svlandeg/feature/discussions [ci skip] 2020-12-11 10:20:35 +11:00
Ines Montani 76cfd89dea Update site.json 2020-12-11 10:19:42 +11:00
Ines Montani c9b67b02f8 Update issue templates 2020-12-11 10:05:47 +11:00
Ines Montani 43a69eecb7 Update site.json 2020-12-11 10:05:21 +11:00
Ines Montani 73896fcbc8 Update README.md 2020-12-11 10:05:19 +11:00
Ines Montani 25186fa431
Merge pull request #6543 from adrianeboyd/docs/install-v2
Docs and extras updates for v2.3.5
2020-12-11 09:53:53 +11:00
svlandeg 4afcd9567e refer to GH discussions 2020-12-10 20:56:12 +01:00
svlandeg d156b423ae remove gitter and reddit links 2020-12-10 20:41:02 +01:00
svlandeg 5afa567767 replace gitter with discussions in 101 2020-12-10 20:17:36 +01:00
svlandeg ae1ccf2b04 update link to discussion forum 2020-12-10 20:02:49 +01:00
svlandeg 52cdb12d26 add GH discussions to readme 2020-12-10 19:58:43 +01:00
Adriane Boyd 27bb75e2a0 Docs and extras updates for v2.3.5
* Update install instructions for updated packages

* Add `cuda110` and `cuda111` extras, remove upper `cupy` pins (only
compatible with `thinc>=7.4.4`)
2020-12-10 15:34:34 +01:00
Adriane Boyd 7b277661f6 Set version to v2.3.5 2020-12-10 13:32:10 +01:00
Koichi Yasuoka 0afb54ac93
JapaneseTokenizer.pipe added (#6515)
* JapaneseTokenizer.pipe added

For [spacymoji](https://spacy.io/universe/project/spacymoji) with `Japanese()`.

* DummyTokenizer.pipe added instead
2020-12-08 20:02:23 +01:00
Adriane Boyd df4891bed1
Remove blis python version constraints (#6522)
* Remove blis version constraints

After updating the blis sdist in v0.7.4, remove python version
constraints for blis build and install dependencies.

* Install sdist with --prefer-binary for python 3.5

* Fix duplicate sdist install steps

* Fix sdist install step types

* Fix blis pins in requirements.txt

* Remove wheel hack for python 3.5 from CI
2020-12-08 15:25:19 +01:00
Ines Montani 4e77349106
Merge pull request #6524 from adrianeboyd/bugfix/entity-ruler-subsequent
Fix subsequent pipe detection in EntityRuler
2020-12-08 22:17:28 +11:00
Adriane Boyd 6c221d4841 Fix subsequent pipe detection in EntityRuler
Fix subsequent pipe detection to detect the position of the current
object by comparing the component itself rather than from the factory
name.
2020-12-08 10:01:30 +01:00
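
The idea behind the fix in 6c221d4841 can be illustrated with a minimal sketch. This is not spaCy's actual code: the pipeline is modeled as a plain list of `(name, component)` pairs, and `Ruler`, `subsequent_by_name`, and `subsequent_by_identity` are hypothetical names chosen for illustration. It shows why looking up the current component by its factory name can miss it when the component was added under a custom name, whereas comparing the object itself still finds the right position.

```python
class Ruler:
    """Hypothetical stand-in for a pipeline component."""
    factory = "entity_ruler"  # assumed factory name, for illustration only


def subsequent_by_name(pipeline, factory_name):
    # Fragile: assumes the component is registered under its factory name.
    names = [name for name, _ in pipeline]
    if factory_name not in names:
        return []
    return pipeline[names.index(factory_name) + 1:]


def subsequent_by_identity(pipeline, component):
    # Robust: locate this exact object, whatever name it was added under.
    for i, (_, proc) in enumerate(pipeline):
        if proc is component:
            return pipeline[i + 1:]
    return []


first, second = Ruler(), Ruler()
pipeline = [("rules_a", first), ("ner", object()), ("rules_b", second)]

# Neither ruler is named "entity_ruler", so the name lookup finds nothing.
by_name = [n for n, _ in subsequent_by_name(pipeline, Ruler.factory)]
# The identity lookup returns what follows each specific ruler.
after_first = [n for n, _ in subsequent_by_identity(pipeline, first)]
after_second = [n for n, _ in subsequent_by_identity(pipeline, second)]
print(by_name, after_first, after_second)
```

Comparing with `is` also keeps two rulers in the same pipeline distinct, which a name-based lookup cannot do when both come from the same factory.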
Ines Montani b87793a89a
Merge pull request #6523 from adrianeboyd/bugfix/remove-use-chars
Remove non-working --use-chars from train CLI
2020-12-08 09:30:48 +01:00
Adriane Boyd 5ceac425ee Remove non-working --use-chars from train CLI
Remove the non-working `--use-chars` option from the train CLI. The
implementation of the option across component types and the CLI settings
could be fixed, but the `CharacterEmbed` model does not work on GPU in
v2 so it's better to remove it.
2020-12-08 08:30:00 +01:00
Adriane Boyd dcecc75270
Improve blis and numpy build dependencies (#6455)
* Fix blis build dependencies

* Add blis with python_version constraints to pyproject.toml
* Add blis to setup_requires

* Remove --only-binary from CI

* Reduce number of builds to speed up CI

* Add hack to install wheel for python 3.5 in linux

* Remove os spec from CI

* Remove detailed numpy build constraints

* Remove detailed numpy build constraints from `pyproject.toml` because
  it is too difficult to maintain for many architectures
  * These constraints are more a reflection of what is available on
    pypi as binary wheels rather than any real build requirements that
    it is necessary for users to follow when building from source
  * Users building their own binary packages will need to enforce the
    constraints that make sense in their environments, e.g., the `conda`
    compatible numpy pins

* Keep the build constraints in `build-constraints.txt` for use with our
  builds
  * Our builds with wheelwright are built against the earliest
    compatible binary versions of numpy on pypi
  * These constraints are documented within the distribution

* Revert "Remove os spec from CI"

This reverts commit 7489476688.
2020-12-08 14:29:34 +08:00
Adriane Boyd e931d3f72b
Move max_length to nlp.make_doc() (#6512)
Move max_length check to `nlp.make_doc()` so that it's also checked
for `nlp.pipe()`.
2020-12-08 14:24:02 +08:00
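
The design choice in e931d3f72b can be sketched as follows. This is a simplified illustration, not spaCy's implementation: `TinyPipeline` is a hypothetical class, and its whitespace split stands in for real tokenization. Placing the length check in `make_doc()` means every entry point that builds a document goes through it, so `pipe()` is covered as well as a direct call.

```python
class TinyPipeline:
    """Hypothetical stand-in for a language pipeline with a length limit."""

    def __init__(self, max_length=10):
        self.max_length = max_length

    def make_doc(self, text):
        # The single place where the limit is enforced.
        if len(text) > self.max_length:
            raise ValueError(
                f"Text of length {len(text)} exceeds maximum of {self.max_length}"
            )
        return text.split()  # placeholder for real tokenization

    def __call__(self, text):
        # Single-text entry point: builds the doc through make_doc().
        return self.make_doc(text)

    def pipe(self, texts):
        # Streaming entry point: also builds each doc through make_doc().
        for text in texts:
            yield self.make_doc(text)


nlp = TinyPipeline(max_length=5)
short_doc = nlp("a b")
try:
    list(nlp.pipe(["ab", "far too long"]))
    pipe_raised = False
except ValueError:
    pipe_raised = True
print(short_doc, pipe_raised)
```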
Sofie Van Landeghem 52fa46dd58
tested EL scripts with 2.3.4 (#6517) 2020-12-07 20:46:38 +01:00
Adriane Boyd 53c0fb7431
Only set NORM on Token in retokenizer (#6464)
* Only set NORM on Token in retokenizer

Instead of setting `NORM` on both the token and lexeme, set `NORM` only
on the token.

The retokenizer tries to set all possible attributes with
`Token/Lexeme.set_struct_attr` so that it doesn't have to enumerate
which attributes are available for each. `NORM` is the only attribute
that's stored on both and for most cases it doesn't make sense to set
the global norms based on an individual retokenization. For lexeme-only
attributes like `IS_STOP` there's no way to avoid the global side
effects, but I think that `NORM` would be better only on the token.

* Fix test
2020-11-30 09:35:42 +08:00
Adriane Boyd 03ae77e603
Add SPACY as a Matcher attribute (#6463) 2020-11-30 09:34:50 +08:00
Adriane Boyd 3a5cc5f8b4 Set version to v2.3.4 2020-11-26 08:48:52 +01:00
Adriane Boyd e0f5646a4a
Restore cleanup_beam method (#6446) 2020-11-25 13:21:48 +01:00
Jacob Bortell fe9009911a Update rule-based-matching.md (#6421)
* Update rule-based-matching.md

Clarified case-sensitivity of dictionary-referencing attributes (POS/TAG/DEP/etc).

Clarified "Type" column header to "Value Type"

* Update rule-based-matching.md

Improved clarity of wording
2020-11-24 16:20:19 +01:00
Jacob Bortell 992723dfac
Add jabortell to the contributors (#6422)
* Add jabortell to the contributors

* Update jabortell.md

Added tick to applicable statement
2020-11-24 16:15:31 +01:00
Adriane Boyd afd744bc05
Update Travis CI pip install steps (#6440) 2020-11-24 14:10:16 +01:00
Adriane Boyd 573f5c863f
Fix tag map clobbering in spacy train (#6437)
Fix bug from #5768 where the tag map is clobbered if a custom tag map
isn't provided.
2020-11-24 13:13:16 +01:00
Adriane Boyd ce18fc6588 Set version to v2.3.3 2020-11-24 10:03:45 +01:00
Adriane Boyd cd61d264ef Set version to v2.3.3.dev0 2020-11-23 13:51:59 +01:00
Sofie Van Landeghem 2af31a8c8d
Bugfix textcat reproducibility on GPU (#6411)
* add seed argument to ParametricAttention layer

* bump thinc to 7.4.3

* set thinc version range

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-11-23 12:29:35 +01:00