16171 Commits

Author SHA1 Message Date
Matthew Honnibal dd47fbb45f Remove 'apple' extra 2024-10-01 22:24:25 +02:00
DomHudson a61a1d43cf
[Documentation] Replace broken URL in _serialization.mdx (#13641) 2024-09-30 17:45:50 +02:00
Matthew Honnibal 2f1e7ed09a Lint 2024-09-14 11:36:27 +02:00
Matthew Honnibal e2dc9b79e1 Format 2024-09-14 11:29:40 +02:00
Matthew Honnibal 3c3d75015b Set version to v3.7.7 2024-09-14 11:27:32 +02:00
Matthew Honnibal 50aa3b5cbe Merge branch 'master' of https://github.com/explosion/spaCy 2024-09-14 11:09:44 +02:00
Matthew Honnibal 8266031454 Merge numpy version update 2024-09-14 11:08:35 +02:00
Matthew Honnibal 8dcc4b8daf Skip running tests on PRs 2024-09-14 11:07:23 +02:00
Matthew Honnibal 3a635d2c94 Try skipping 686 2024-09-14 00:12:49 +02:00
Matthew Honnibal a0ce61f55a Fix thinc pin 2024-09-13 14:21:03 +02:00
Matthew Honnibal 83b4015b36 Remove aarch 2024-09-13 12:35:50 +02:00
Matthew Honnibal 419bfaf6e7 Update cibuildwheel 2024-09-13 10:44:48 +02:00
Matthew Honnibal 69ecb85fad Set version to v3.8.1 2024-09-13 10:43:40 +02:00
Matthew Honnibal b427597fc8 Set version to v3.8.0 2024-09-11 21:32:26 +02:00
Matthew Honnibal 1869a197c9 Try enabling macos-14 for arm builds 2024-09-11 16:06:57 +02:00
Matthew Honnibal c068e1de1b Fix dependencies 2024-09-11 15:57:52 +02:00
Matthew Honnibal 184e508d9c Update numpy pin 2024-09-11 15:57:17 +02:00
William Mattingly 30f1f33e78
Added Date spaCy to universe (#13415) [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2024-09-10 14:29:03 +02:00
William Mattingly f1a5ff9dba
added spacy whisper to universe (#13418) [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2024-09-10 14:28:00 +02:00
William Mattingly c80dacd046
added spacy annoy to universe (#13416) [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2024-09-10 14:26:21 +02:00
William Mattingly 7fbbb2002a
updated universe for number spacy (#13424) [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2024-09-10 14:25:23 +02:00
William Mattingly 89c1774d43
added bagpipes-spacy to universe (#13425) [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2024-09-10 14:24:06 +02:00
thjbdvlt 081e4e385d
universe-project-presque (#13515) [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2024-09-10 14:21:41 +02:00
thjbdvlt 0190e669c5
universe-package-quelquhui (#13514) [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2024-09-10 14:17:33 +02:00
Oren Halvani 54dc4ee8fb
Added: Constituent-Treelib to: universe.json (#13432) [ci skip]
Co-authored-by: Halvani <>
2024-09-10 14:13:36 +02:00
William Mattingly 5a7ad5572c
added gliner-spacy to universe (#13417) [ci skip]
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Ines Montani <ines@ines.io>
2024-09-10 14:12:52 +02:00
marinelay b18cc94451
Delete unnecessary method (#13441)
Co-authored-by: marinelay <marinelay@gmail.com>
2024-09-09 20:57:13 +02:00
Matthew Honnibal 4cc3ebe74e Format 2024-09-09 20:56:01 +02:00
Matthew Honnibal a019315534 Fix memory zones 2024-09-09 13:49:41 +02:00
Matthew Honnibal 59ac7e6bdb Format 2024-09-09 11:22:52 +02:00
Matthew Honnibal b65491b641 Set version to v3.8.0.dev0 2024-09-09 11:20:23 +02:00
Matthew Honnibal 1b8d560d0e
Support 'memory zones' for user memory management (#13621)
Add a context manager nlp.memory_zone(), which will begin
memory_zone() blocks on the vocab, string store, and potentially
other components.

Example usage:

```
with nlp.memory_zone():
    for doc in nlp.pipe(texts):
        do_something(doc)
# do_something(doc) <-- Invalid
```

Once the memory_zone() block expires, spaCy will free any shared
resources that were allocated for the text-processing that occurred
within the memory_zone. If you create Doc objects within a memory
zone, it's invalid to access them once the memory zone is expired.

The purpose of this is that spaCy creates and stores Lexeme objects
in the Vocab that can be shared between multiple Doc objects. It also
interns strings. Normally, spaCy can't know when all Doc objects using
a Lexeme are out-of-scope, so new Lexemes accumulate in the vocab,
causing memory pressure.

Memory zones solve this problem by telling spaCy "okay none of the
documents allocated within this block will be accessed again". This
lets spaCy free all new Lexeme objects and other data that were
created during the block.
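
As a rough illustration of the idea (not spaCy's actual implementation), the sketch below uses a hypothetical `ToyStringStore` class. It assumes that strings added inside a zone are tracked separately from permanent ones and dropped when the zone exits. All names here are invented for the example.

```python
from contextlib import contextmanager


class ToyStringStore:
    """Toy interning table. Hypothetical; NOT spaCy's StringStore."""

    def __init__(self):
        self._permanent = set()  # strings that live as long as the store
        self._transient = None   # strings added inside the active zone, or None

    def add(self, text):
        # Strings that are already permanent stay permanent.
        if text in self._permanent:
            return text
        if self._transient is not None:
            self._transient.add(text)  # freed when the zone exits
        else:
            self._permanent.add(text)
        return text

    def __contains__(self, text):
        in_zone = self._transient is not None and text in self._transient
        return text in self._permanent or in_zone

    def __len__(self):
        return len(self._permanent) + len(self._transient or ())

    @contextmanager
    def memory_zone(self):
        # Assumption for this sketch: zones do not nest.
        if self._transient is not None:
            raise RuntimeError("memory zones cannot be nested in this sketch")
        self._transient = set()
        try:
            yield self
        finally:
            # Drop everything that was added while the zone was active.
            self._transient = None


store = ToyStringStore()
store.add("the")  # added outside a zone, so it is kept

with store.memory_zone():
    for word in "the quick brown fox".split():
        store.add(word)
    size_inside = len(store)

size_after = len(store)
```

In this toy version, anything that still refers to "quick" after the block would point at data the store no longer holds, which mirrors why accessing a Doc after its zone has expired is invalid.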

The mechanism is general, so memory_zone() context managers can be
added to other components that could benefit from them, e.g. pipeline
components.

I experimented with adding memory zone support to the tokenizer as well,
for its cache. However, this seems unnecessarily complicated. It makes
more sense to just stick a limit on the cache size. This lets spaCy
benefit from the efficiency advantage of the cache better, because
we can maintain a (bounded) cache even if only small batches of
documents are being processed.
2024-09-09 11:19:39 +02:00
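
The size-limited tokenizer cache mentioned at the end of the commit message above could look roughly like the sketch below. This is an assumption-laden illustration: the commit only says the cache size is limited, so the least-recently-used eviction policy, the `BoundedCache` class, and the `toy_tokenize` helper are all hypothetical and are not taken from spaCy.

```python
from collections import OrderedDict


class BoundedCache:
    """Hypothetical size-limited cache with LRU eviction (an assumed policy)."""

    def __init__(self, max_size):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        # Evict the least recently used entries once over the limit.
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)

    def __len__(self):
        return len(self._data)


def toy_tokenize(text, cache):
    """Whitespace tokenizer that reuses cached results when available."""
    cached = cache.get(text)
    if cached is None:
        cached = text.split()
        cache.put(text, cached)
    return cached


cache = BoundedCache(max_size=2)
for chunk in ["a b", "c d", "e f"]:
    toy_tokenize(chunk, cache)
```

Because the cache never grows past `max_size`, it can stay alive across many small batches without needing to be cleared at the end of each memory zone.
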
ykyogoku 608f65ce40
add Tibetan (#13510) 2024-09-09 11:18:03 +02:00
Muzaffer Cikay acbf2a428f
Add Kurdish Kurmanji language (#13561)
* Add Kurdish Kurmanji language

* Add lex_attrs
2024-09-09 11:15:40 +02:00
Mark Liberko 55db9c2e87
Added gd language folder (#13570)
Implemented a foundational Scottish Gaelic (gd) language option with tokenizer_exceptions and stop_words files.
2024-09-09 11:14:09 +02:00
Matthew Honnibal 319e02545c Set version to 3.7.6 2024-08-20 12:16:08 +02:00
Matthew Honnibal a8accc3396
Use cibuildwheel to build wheels (#13603)
* Add workflow files for cibuildwheel

* Add config for cibuildwheel

* Set version for experimental prerelease

* Try updating cython

* Skip 32-bit windows builds

* Revert "Try updating cython"

This reverts commit c1b794ab5c.

* Try to import cibuildwheel settings from previous setup
2024-08-20 12:15:05 +02:00
Ines Montani 8cda27aefa Add case study [ci skip] 2024-06-26 09:41:23 +02:00
Matthew Honnibal f78e5ce732 Disable extra CI 2024-06-21 14:32:00 +02:00
Sofie Van Landeghem a6d0fc3602
Remove typing-extensions from requirements (#13516) 2024-05-31 19:20:46 +02:00
Sofie Van Landeghem 82fc2ecfa5
Bump version to 3.7.5 (#13493) 2024-05-15 12:11:33 +02:00
Sofie Van Landeghem c195ca4f9c
fix docs for MorphAnalysis.__contains__ (#13433) 2024-05-02 16:46:41 +02:00
Sofie Van Landeghem d3a232f773
Update LICENSE to include 2024 (#13472) 2024-04-30 09:17:59 +02:00
Sofie Van Landeghem ecd85d2618
Update Typer pin and GH actions (#13471)
* update gh actions

* pin typer upperbound to 1.0.0
2024-04-29 13:28:46 +02:00
Alex Strick van Linschoten 045cd43c3f
Fix typos in docs (#13466)
* fix typos

* prettier formatting

---------

Co-authored-by: svlandeg <svlandeg@github.com>
2024-04-29 11:10:17 +02:00
Sofie Van Landeghem 74836524e3
Bump to v5 (#13470) 2024-04-29 10:36:31 +02:00
Sofie Van Landeghem 6d6c10ab9c
Fix CI (#13469)
* Remove hardcoded architecture setting

* update classifiers to include Python 3.12
2024-04-29 10:18:07 +02:00
Sofie Van Landeghem 2e2334632b
Fix use_gold_ents behaviour for EntityLinker (#13400)
* fix type annotation in docs

* only restore entities after loss calculation

* restore entities of sample in initialization

* rename overfitting function

* fix EL scorer

* Relax test

* fix formatting

* Update spacy/pipeline/entity_linker.py

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>

* rename to _ensure_ents

* further rename

* allow for scorer to be None

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2024-04-16 12:00:22 +02:00
Joe Schiff 2e96797696
Convert properties to decorator syntax (#13390) 2024-04-16 11:51:14 +02:00
Sofie Van Landeghem f5e85fa05a
allow weasel 0.4.x (#13409) 2024-04-04 12:55:08 +02:00