Add stub files for main cython classes (#8427)

* Add stub files for main API classes

* Add contributor agreement for ezorita

* Update types for ndarray and hash()

* Fix __getitem__ and __iter__

* Add attributes of Doc and Token classes

* Overload type hints for Span.__getitem__

* Fix type hint overload for Span.__getitem__

Co-authored-by: Luca Dorigo <dorigoluca@gmail.com>
Eduard Zorita 2021-08-07 12:30:03 +02:00 committed by GitHub
parent 56d4d87aeb
commit 439f30faad
GPG Key ID: 4AEE18F83AFDEB23
11 changed files with 881 additions and 0 deletions

106
.github/contributors/ezorita.md vendored Normal file

@ -0,0 +1,106 @@
# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement
1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.
## Contributor Details
| Field | Entry |
|------------------------------- | -------------------- |
| Name | Eduard Zorita |
| Company name (if applicable) | |
| Title or role (if applicable) | |
| Date | 06/17/2021 |
| GitHub username | ezorita |
| Website (optional) | |

61
spacy/lexeme.pyi Normal file

@ -0,0 +1,61 @@
from typing import (
    Union,
    Any,
)
from thinc.types import Floats1d
from .tokens import Doc, Span, Token
from .vocab import Vocab

class Lexeme:
    def __init__(self, vocab: Vocab, orth: int) -> None: ...
    def __richcmp__(self, other: Lexeme, op: int) -> bool: ...
    def __hash__(self) -> int: ...
    def set_attrs(self, **attrs: Any) -> None: ...
    def set_flag(self, flag_id: int, value: bool) -> None: ...
    def check_flag(self, flag_id: int) -> bool: ...
    def similarity(self, other: Union[Doc, Span, Token, Lexeme]) -> float: ...
    @property
    def has_vector(self) -> bool: ...
    @property
    def vector_norm(self) -> float: ...
    vector: Floats1d
    rank: int
    sentiment: float
    @property
    def orth_(self) -> str: ...
    @property
    def text(self) -> str: ...
    lower: int
    norm: int
    shape: int
    prefix: int
    suffix: int
    cluster: int
    lang: int
    prob: float
    lower_: str
    norm_: str
    shape_: str
    prefix_: str
    suffix_: str
    lang_: str
    flags: int
    @property
    def is_oov(self) -> bool: ...
    is_stop: bool
    is_alpha: bool
    is_ascii: bool
    is_digit: bool
    is_lower: bool
    is_upper: bool
    is_title: bool
    is_punct: bool
    is_space: bool
    is_bracket: bool
    is_quote: bool
    is_left_punct: bool
    is_right_punct: bool
    is_currency: bool
    like_url: bool
    like_num: bool
    like_email: bool
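
The stub above mixes two kinds of declarations: members written as `@property` and members written as plain annotated attributes. A type checker treats a property without a setter as read-only, while an annotated attribute is assignable. The sketch below illustrates that difference at runtime with a hypothetical `ToyLexeme` class; it is not spaCy's `Lexeme` and only assumes that the split in the stub follows this read-only versus writable convention.

```python
class ToyLexeme:
    """Hypothetical stand-in used only to illustrate the two declaration styles."""

    def __init__(self, text: str) -> None:
        self._text = text
        # Plain attribute: corresponds to `is_stop: bool` in a stub.
        self.is_stop: bool = False

    @property
    def text(self) -> str:
        # Property without a setter: corresponds to
        # `@property def text(self) -> str: ...` in a stub.
        return self._text


lex = ToyLexeme("the")
lex.is_stop = True  # allowed: plain attribute

try:
    lex.text = "a"  # rejected: the property defines no setter
    read_only = False
except AttributeError:
    read_only = True
```

With stubs written this way, a checker can flag an assignment such as `lex.text = "a"` before the code runs.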

41
spacy/matcher/matcher.pyi Normal file

@ -0,0 +1,41 @@
from typing import Any, List, Dict, Tuple, Optional, Callable, Union, Iterator, Iterable
from ..vocab import Vocab
from ..tokens import Doc, Span

class Matcher:
    def __init__(self, vocab: Vocab, validate: bool = ...) -> None: ...
    def __reduce__(self) -> Any: ...
    def __len__(self) -> int: ...
    def __contains__(self, key: str) -> bool: ...
    def add(
        self,
        key: str,
        patterns: List[List[Dict[str, Any]]],
        *,
        on_match: Optional[
            Callable[[Matcher, Doc, int, List[Tuple[Any, ...]]], Any]
        ] = ...,
        greedy: Optional[str] = ...
    ) -> None: ...
    def remove(self, key: str) -> None: ...
    def has_key(self, key: Union[str, int]) -> bool: ...
    def get(
        self, key: Union[str, int], default: Optional[Any] = ...
    ) -> Tuple[Optional[Callable[[Any], Any]], List[List[Dict[Any, Any]]]]: ...
    def pipe(
        self,
        docs: Iterable[Tuple[Doc, Any]],
        batch_size: int = ...,
        return_matches: bool = ...,
        as_tuples: bool = ...,
    ) -> Union[
        Iterator[Tuple[Tuple[Doc, Any], Any]], Iterator[Tuple[Doc, Any]], Iterator[Doc]
    ]: ...
    def __call__(
        self,
        doclike: Union[Doc, Span],
        *,
        as_spans: bool = ...,
        allow_missing: bool = ...,
        with_alignments: bool = ...
    ) -> Union[List[Tuple[int, int, int]], List[Span]]: ...
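
The `on_match` annotation above describes a callback that receives the matcher, the document, the index of the current match, and the full list of matches. As a rough illustration of that callback shape, the sketch below uses a hypothetical `ToyMatcher` that matches single words in a list of strings. The class, the `Match` and `OnMatch` aliases, and the word-equality matching are all assumptions made for the example, not spaCy's implementation.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Match = Tuple[str, int, int]  # (key, start, end); hypothetical layout
OnMatch = Callable[["ToyMatcher", List[str], int, List[Match]], Any]


class ToyMatcher:
    """Hypothetical matcher: each rule matches one exact word."""

    def __init__(self) -> None:
        self._rules: Dict[str, Tuple[str, Optional[OnMatch]]] = {}

    def add(self, key: str, word: str, *, on_match: Optional[OnMatch] = None) -> None:
        self._rules[key] = (word, on_match)

    def __call__(self, words: List[str]) -> List[Match]:
        matches: List[Match] = []
        for key, (word, _) in self._rules.items():
            for i, candidate in enumerate(words):
                if candidate == word:
                    matches.append((key, i, i + 1))
        # Invoke each rule's callback once per match, passing the match index.
        for i, (key, _, _) in enumerate(matches):
            callback = self._rules[key][1]
            if callback is not None:
                callback(self, words, i, matches)
        return matches


seen: List[Match] = []


def record(matcher: ToyMatcher, words: List[str], i: int, matches: List[Match]) -> None:
    # Same four-argument shape as the `on_match` annotation in the stub.
    seen.append(matches[i])


matcher = ToyMatcher()
matcher.add("GREETING", "hello", on_match=record)
result = matcher(["hello", "world", "hello"])
```

Typing the callback as a `Callable` with four positional parameters lets a checker reject handlers with the wrong arity when they are passed to `add`.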

22
spacy/strings.pyi Normal file

@ -0,0 +1,22 @@
from typing import Optional, Iterable, Iterator, Union, Any
from pathlib import Path

def get_string_id(key: str) -> int: ...

class StringStore:
    def __init__(
        self, strings: Optional[Iterable[str]] = ..., freeze: bool = ...
    ) -> None: ...
    def __getitem__(self, string_or_id: Union[bytes, str, int]) -> Union[str, int]: ...
    def as_int(self, key: Union[bytes, str, int]) -> int: ...
    def as_string(self, key: Union[bytes, str, int]) -> str: ...
    def add(self, string: str) -> int: ...
    def __len__(self) -> int: ...
    def __contains__(self, string: str) -> bool: ...
    def __iter__(self) -> Iterator[str]: ...
    def __reduce__(self) -> Any: ...
    def to_disk(self, path: Union[str, Path]) -> None: ...
    def from_disk(self, path: Union[str, Path]) -> StringStore: ...
    def to_bytes(self, **kwargs: Any) -> bytes: ...
    def from_bytes(self, bytes_data: bytes, **kwargs: Any) -> StringStore: ...
    def _reset_and_load(self, strings: Iterable[str]) -> None: ...
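
In the stub above, `__getitem__` returns `Union[str, int]`, so callers have to narrow the result themselves, whereas `as_int` and `as_string` each promise a single type. The sketch below shows that trade-off with a hypothetical `ToyStringStore`. It assigns sequential IDs instead of hashes, which is an assumption made to keep the example small and is not how a real string store works.

```python
from typing import Dict, List, Union


class ToyStringStore:
    """Hypothetical two-way string/ID table with sequential IDs."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._strings: List[str] = []

    def add(self, string: str) -> int:
        if string not in self._ids:
            self._ids[string] = len(self._strings)
            self._strings.append(string)
        return self._ids[string]

    def __getitem__(self, string_or_id: Union[str, int]) -> Union[str, int]:
        # Return type depends on the argument type, hence the Union.
        if isinstance(string_or_id, str):
            return self._ids[string_or_id]
        return self._strings[string_or_id]

    def as_int(self, key: Union[str, int]) -> int:
        return key if isinstance(key, int) else self._ids[key]

    def as_string(self, key: Union[str, int]) -> str:
        return key if isinstance(key, str) else self._strings[key]


store = ToyStringStore()
apple_id = store.add("apple")

looked_up = store["apple"]  # a checker sees Union[str, int] here
narrowed = looked_up if isinstance(looked_up, int) else -1

direct = store.as_int("apple")  # a checker sees int, no narrowing needed
```

The dedicated accessors are the simpler choice when the caller already knows which direction of the lookup it wants.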


@ -0,0 +1,17 @@
from typing import Dict, Any, Union, List, Tuple
from .doc import Doc
from .span import Span
from .token import Token

class Retokenizer:
    def __init__(self, doc: Doc) -> None: ...
    def merge(self, span: Span, attrs: Dict[Union[str, int], Any] = ...) -> None: ...
    def split(
        self,
        token: Token,
        orths: List[str],
        heads: List[Union[Token, Tuple[Token, int]]],
        attrs: Dict[Union[str, int], List[Any]] = ...,
    ) -> None: ...
    def __enter__(self) -> Retokenizer: ...
    def __exit__(self, *args: Any) -> None: ...

180
spacy/tokens/doc.pyi Normal file

@ -0,0 +1,180 @@
from typing import (
    Callable,
    Protocol,
    Iterable,
    Iterator,
    Optional,
    Union,
    Tuple,
    List,
    Dict,
    Any,
    overload,
)
from cymem.cymem import Pool
from thinc.types import Floats1d, Floats2d, Ints2d
from .span import Span
from .token import Token
from ._dict_proxies import SpanGroups
from ._retokenize import Retokenizer
from ..lexeme import Lexeme
from ..vocab import Vocab
from .underscore import Underscore
from pathlib import Path
import numpy

class DocMethod(Protocol):
    def __call__(self: Doc, *args: Any, **kwargs: Any) -> Any: ...

class Doc:
    vocab: Vocab
    mem: Pool
    spans: SpanGroups
    max_length: int
    length: int
    sentiment: float
    cats: Dict[str, float]
    user_hooks: Dict[str, Callable[..., Any]]
    user_token_hooks: Dict[str, Callable[..., Any]]
    user_span_hooks: Dict[str, Callable[..., Any]]
    tensor: numpy.ndarray
    user_data: Dict[str, Any]
    has_unknown_spaces: bool
    @classmethod
    def set_extension(
        cls,
        name: str,
        default: Optional[Any] = ...,
        getter: Optional[Callable[[Doc], Any]] = ...,
        setter: Optional[Callable[[Doc, Any], None]] = ...,
        method: Optional[DocMethod] = ...,
        force: bool = ...,
    ) -> None: ...
    @classmethod
    def get_extension(
        cls, name: str
    ) -> Tuple[
        Optional[Any],
        Optional[DocMethod],
        Optional[Callable[[Doc], Any]],
        Optional[Callable[[Doc, Any], None]],
    ]: ...
    @classmethod
    def has_extension(cls, name: str) -> bool: ...
    @classmethod
    def remove_extension(
        cls, name: str
    ) -> Tuple[
        Optional[Any],
        Optional[DocMethod],
        Optional[Callable[[Doc], Any]],
        Optional[Callable[[Doc, Any], None]],
    ]: ...
    def __init__(
        self,
        vocab: Vocab,
        words: Optional[List[str]] = ...,
        spaces: Optional[List[bool]] = ...,
        user_data: Optional[Dict[Any, Any]] = ...,
        tags: Optional[List[str]] = ...,
        pos: Optional[List[str]] = ...,
        morphs: Optional[List[str]] = ...,
        lemmas: Optional[List[str]] = ...,
        heads: Optional[List[int]] = ...,
        deps: Optional[List[str]] = ...,
        sent_starts: Optional[List[Union[bool, None]]] = ...,
        ents: Optional[List[str]] = ...,
    ) -> None: ...
    @property
    def _(self) -> Underscore: ...
    @property
    def is_tagged(self) -> bool: ...
    @property
    def is_parsed(self) -> bool: ...
    @property
    def is_nered(self) -> bool: ...
    @property
    def is_sentenced(self) -> bool: ...
    def has_annotation(
        self, attr: Union[int, str], *, require_complete: bool = ...
    ) -> bool: ...
    @overload
    def __getitem__(self, i: int) -> Token: ...
    @overload
    def __getitem__(self, i: slice) -> Span: ...
    def __iter__(self) -> Iterator[Token]: ...
    def __len__(self) -> int: ...
    def __unicode__(self) -> str: ...
    def __bytes__(self) -> bytes: ...
    def __str__(self) -> str: ...
    def __repr__(self) -> str: ...
    @property
    def doc(self) -> Doc: ...
    def char_span(
        self,
        start_idx: int,
        end_idx: int,
        label: Union[int, str] = ...,
        kb_id: Union[int, str] = ...,
        vector: Optional[Floats1d] = ...,
        alignment_mode: str = ...,
    ) -> Span: ...
    def similarity(self, other: Union[Doc, Span, Token, Lexeme]) -> float: ...
    @property
    def has_vector(self) -> bool: ...
    vector: Floats1d
    vector_norm: float
    @property
    def text(self) -> str: ...
    @property
    def text_with_ws(self) -> str: ...
    ents: Tuple[Span, ...]
    def set_ents(
        self,
        entities: List[Span],
        *,
        blocked: Optional[List[Span]] = ...,
        missing: Optional[List[Span]] = ...,
        outside: Optional[List[Span]] = ...,
        default: str = ...
    ) -> None: ...
    @property
    def noun_chunks(self) -> Iterator[Span]: ...
    @property
    def sents(self) -> Iterator[Span]: ...
    @property
    def lang(self) -> int: ...
    @property
    def lang_(self) -> str: ...
    def count_by(
        self, attr_id: int, exclude: Optional[Any] = ..., counts: Optional[Any] = ...
    ) -> Dict[Any, int]: ...
    def from_array(self, attrs: List[int], array: Ints2d) -> Doc: ...
    @staticmethod
    def from_docs(
        docs: List[Doc],
        ensure_whitespace: bool = ...,
        attrs: Optional[
            Union[Tuple[Union[str, int], ...], List[Union[int, str]]]
        ] = ...,
    ) -> Doc: ...
    def get_lca_matrix(self) -> Ints2d: ...
    def copy(self) -> Doc: ...
    def to_disk(
        self, path: Union[str, Path], *, exclude: Iterable[str] = ...
    ) -> None: ...
    def from_disk(
        self, path: Union[str, Path], *, exclude: Union[List[str], Tuple[str, ...]] = ...
    ) -> Doc: ...
    def to_bytes(
        self, *, exclude: Union[List[str], Tuple[str, ...]] = ...
    ) -> bytes: ...
    def from_bytes(
        self, bytes_data: bytes, *, exclude: Union[List[str], Tuple[str, ...]] = ...
    ) -> Doc: ...
    def to_dict(
        self, *, exclude: Union[List[str], Tuple[str, ...]] = ...
    ) -> Dict[str, Any]: ...
    def from_dict(
        self, msg: Dict[str, Any], *, exclude: Union[List[str], Tuple[str, ...]] = ...
    ) -> Doc: ...
    def extend_tensor(self, tensor: Floats2d) -> None: ...
    def retokenize(self) -> Retokenizer: ...
    def to_json(self, underscore: Optional[List[str]] = ...) -> Dict[str, Any]: ...
    def to_utf8_array(self, nr_char: int = ...) -> Ints2d: ...
    @staticmethod
    def _get_array_attrs() -> Tuple[Any, ...]: ...
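
The `DocMethod` protocol above types extension methods as callables that receive the document first, followed by arbitrary arguments. The sketch below shows one way such a callable protocol can be used with a small registry. `ToyDoc`, `ToyDocMethod`, and `call_extension` are hypothetical names, and the protocol here spells out the document as an explicit `doc` parameter rather than annotating `self`, which is a deliberate simplification and not the stub's exact form.

```python
from typing import Any, Dict, List, Protocol


class ToyDoc:
    """Hypothetical document holding a list of words and a method registry."""

    _methods: Dict[str, "ToyDocMethod"] = {}

    def __init__(self, words: List[str]) -> None:
        self.words = words

    @classmethod
    def set_extension(cls, name: str, method: "ToyDocMethod") -> None:
        cls._methods[name] = method

    def call_extension(self, name: str, *args: Any, **kwargs: Any) -> Any:
        # Pass the document itself as the first argument of the registered callable.
        return self._methods[name](self, *args, **kwargs)


class ToyDocMethod(Protocol):
    # Any callable taking the document first, then arbitrary arguments.
    def __call__(self, doc: ToyDoc, *args: Any, **kwargs: Any) -> Any: ...


def count_word(doc: ToyDoc, word: str) -> int:
    return doc.words.count(word)


ToyDoc.set_extension("count_word", count_word)
doc = ToyDoc(["a", "b", "a"])
n = doc.call_extension("count_word", "a")
```

A callable `Protocol` is useful here because a plain `Callable[...]` cannot express "first argument is the document, the rest is unconstrained".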


@ -0,0 +1,20 @@
from typing import Any, Dict, Iterator, List, Union
from ..vocab import Vocab

class MorphAnalysis:
    def __init__(
        self, vocab: Vocab, features: Union[Dict[str, str], str] = ...
    ) -> None: ...
    @classmethod
    def from_id(cls, vocab: Vocab, key: Any) -> MorphAnalysis: ...
    def __contains__(self, feature: str) -> bool: ...
    def __iter__(self) -> Iterator[str]: ...
    def __len__(self) -> int: ...
    def __hash__(self) -> int: ...
    def __eq__(self, other: MorphAnalysis) -> bool: ...
    def __ne__(self, other: MorphAnalysis) -> bool: ...
    def get(self, field: Any) -> List[str]: ...
    def to_json(self) -> str: ...
    def to_dict(self) -> Dict[str, str]: ...
    def __str__(self) -> str: ...
    def __repr__(self) -> str: ...

124
spacy/tokens/span.pyi Normal file

@ -0,0 +1,124 @@
from typing import Callable, Protocol, Iterator, Optional, Union, Tuple, Any, overload
from thinc.types import Floats1d, Ints2d, FloatsXd
from .doc import Doc
from .token import Token
from .underscore import Underscore
from ..lexeme import Lexeme
from ..vocab import Vocab

class SpanMethod(Protocol):
    def __call__(self: Span, *args: Any, **kwargs: Any) -> Any: ...

class Span:
    @classmethod
    def set_extension(
        cls,
        name: str,
        default: Optional[Any] = ...,
        getter: Optional[Callable[[Span], Any]] = ...,
        setter: Optional[Callable[[Span, Any], None]] = ...,
        method: Optional[SpanMethod] = ...,
        force: bool = ...,
    ) -> None: ...
    @classmethod
    def get_extension(
        cls, name: str
    ) -> Tuple[
        Optional[Any],
        Optional[SpanMethod],
        Optional[Callable[[Span], Any]],
        Optional[Callable[[Span, Any], None]],
    ]: ...
    @classmethod
    def has_extension(cls, name: str) -> bool: ...
    @classmethod
    def remove_extension(
        cls, name: str
    ) -> Tuple[
        Optional[Any],
        Optional[SpanMethod],
        Optional[Callable[[Span], Any]],
        Optional[Callable[[Span, Any], None]],
    ]: ...
    def __init__(
        self,
        doc: Doc,
        start: int,
        end: int,
        label: int = ...,
        vector: Optional[Floats1d] = ...,
        vector_norm: Optional[float] = ...,
        kb_id: Optional[int] = ...,
    ) -> None: ...
    def __richcmp__(self, other: Span, op: int) -> bool: ...
    def __hash__(self) -> int: ...
    def __len__(self) -> int: ...
    def __repr__(self) -> str: ...
    @overload
    def __getitem__(self, i: int) -> Token: ...
    @overload
    def __getitem__(self, i: slice) -> Span: ...
    def __iter__(self) -> Iterator[Token]: ...
    @property
    def _(self) -> Underscore: ...
    def as_doc(self, *, copy_user_data: bool = ...) -> Doc: ...
    def get_lca_matrix(self) -> Ints2d: ...
    def similarity(self, other: Union[Doc, Span, Token, Lexeme]) -> float: ...
    @property
    def vocab(self) -> Vocab: ...
    @property
    def sent(self) -> Span: ...
    @property
    def ents(self) -> Tuple[Span, ...]: ...
    @property
    def has_vector(self) -> bool: ...
    @property
    def vector(self) -> Floats1d: ...
    @property
    def vector_norm(self) -> float: ...
    @property
    def tensor(self) -> FloatsXd: ...
    @property
    def sentiment(self) -> float: ...
    @property
    def text(self) -> str: ...
    @property
    def text_with_ws(self) -> str: ...
    @property
    def noun_chunks(self) -> Iterator[Span]: ...
    @property
    def root(self) -> Token: ...
    def char_span(
        self,
        start_idx: int,
        end_idx: int,
        label: int = ...,
        kb_id: int = ...,
        vector: Optional[Floats1d] = ...,
    ) -> Span: ...
    @property
    def conjuncts(self) -> Tuple[Token, ...]: ...
    @property
    def lefts(self) -> Iterator[Token]: ...
    @property
    def rights(self) -> Iterator[Token]: ...
    @property
    def n_lefts(self) -> int: ...
    @property
    def n_rights(self) -> int: ...
    @property
    def subtree(self) -> Iterator[Token]: ...
    start: int
    end: int
    start_char: int
    end_char: int
    label: int
    kb_id: int
    ent_id: int
    ent_id_: str
    @property
    def orth_(self) -> str: ...
    @property
    def lemma_(self) -> str: ...
    label_: str
    kb_id_: str
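
The two `@overload` declarations of `__getitem__` above tell a type checker that integer indexing yields a single element while slicing yields another span. As a minimal runnable sketch of the same pattern, the hypothetical `ToySpan` below wraps a list of strings; unlike a stub, a regular module also needs the un-decorated implementation that follows the overloads.

```python
from typing import List, Union, overload


class ToySpan:
    """Hypothetical span over a list of words, used to illustrate @overload."""

    def __init__(self, words: List[str]) -> None:
        self.words = words

    @overload
    def __getitem__(self, i: int) -> str: ...
    @overload
    def __getitem__(self, i: slice) -> "ToySpan": ...
    def __getitem__(self, i: Union[int, slice]) -> Union[str, "ToySpan"]:
        # Runtime implementation shared by both overloads.
        if isinstance(i, slice):
            return ToySpan(self.words[i])
        return self.words[i]

    def __len__(self) -> int:
        return len(self.words)


span = ToySpan(["a", "b", "c"])
first = span[0]   # a checker infers str
tail = span[1:]   # a checker infers ToySpan
```

Without the overloads, both expressions would be typed as `Union[str, ToySpan]` and every caller would need an `isinstance` check.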


@ -0,0 +1,24 @@
from typing import Any, Dict, Iterable
from .doc import Doc
from .span import Span

class SpanGroup:
    def __init__(
        self,
        doc: Doc,
        *,
        name: str = ...,
        attrs: Dict[str, Any] = ...,
        spans: Iterable[Span] = ...
    ) -> None: ...
    def __repr__(self) -> str: ...
    @property
    def doc(self) -> Doc: ...
    @property
    def has_overlap(self) -> bool: ...
    def __len__(self) -> int: ...
    def append(self, span: Span) -> None: ...
    def extend(self, spans: Iterable[Span]) -> None: ...
    def __getitem__(self, i: int) -> Span: ...
    def to_bytes(self) -> bytes: ...
    def from_bytes(self, bytes_data: bytes) -> SpanGroup: ...

208
spacy/tokens/token.pyi Normal file

@ -0,0 +1,208 @@
from typing import (
    Callable,
    Protocol,
    Iterator,
    Optional,
    Union,
    Tuple,
    Any,
)
from thinc.types import Floats1d, FloatsXd
from .doc import Doc
from .span import Span
from .morphanalysis import MorphAnalysis
from ..lexeme import Lexeme
from ..vocab import Vocab
from .underscore import Underscore

class TokenMethod(Protocol):
    def __call__(self: Token, *args: Any, **kwargs: Any) -> Any: ...

class Token:
    i: int
    doc: Doc
    vocab: Vocab
    @classmethod
    def set_extension(
        cls,
        name: str,
        default: Optional[Any] = ...,
        getter: Optional[Callable[[Token], Any]] = ...,
        setter: Optional[Callable[[Token, Any], None]] = ...,
        method: Optional[TokenMethod] = ...,
        force: bool = ...,
    ) -> None: ...
    @classmethod
    def get_extension(
        cls, name: str
    ) -> Tuple[
        Optional[Any],
        Optional[TokenMethod],
        Optional[Callable[[Token], Any]],
        Optional[Callable[[Token, Any], None]],
    ]: ...
    @classmethod
    def has_extension(cls, name: str) -> bool: ...
    @classmethod
    def remove_extension(
        cls, name: str
    ) -> Tuple[
        Optional[Any],
        Optional[TokenMethod],
        Optional[Callable[[Token], Any]],
        Optional[Callable[[Token, Any], None]],
    ]: ...
    def __init__(self, vocab: Vocab, doc: Doc, offset: int) -> None: ...
    def __hash__(self) -> int: ...
    def __len__(self) -> int: ...
    def __unicode__(self) -> str: ...
    def __bytes__(self) -> bytes: ...
    def __str__(self) -> str: ...
    def __repr__(self) -> str: ...
    def __richcmp__(self, other: Token, op: int) -> bool: ...
    @property
    def _(self) -> Underscore: ...
    def nbor(self, i: int = ...) -> Token: ...
    def similarity(self, other: Union[Doc, Span, Token, Lexeme]) -> float: ...
    def has_morph(self) -> bool: ...
    morph: MorphAnalysis
    @property
    def lex(self) -> Lexeme: ...
    @property
    def lex_id(self) -> int: ...
    @property
    def rank(self) -> int: ...
    @property
    def text(self) -> str: ...
    @property
    def text_with_ws(self) -> str: ...
    @property
    def prob(self) -> float: ...
    @property
    def sentiment(self) -> float: ...
    @property
    def lang(self) -> int: ...
    @property
    def idx(self) -> int: ...
    @property
    def cluster(self) -> int: ...
    @property
    def orth(self) -> int: ...
    @property
    def lower(self) -> int: ...
    @property
    def norm(self) -> int: ...
    @property
    def shape(self) -> int: ...
    @property
    def prefix(self) -> int: ...
    @property
    def suffix(self) -> int: ...
    lemma: int
    pos: int
    tag: int
    dep: int
    @property
    def has_vector(self) -> bool: ...
    @property
    def vector(self) -> Floats1d: ...
    @property
    def vector_norm(self) -> float: ...
    @property
    def tensor(self) -> Optional[FloatsXd]: ...
    @property
    def n_lefts(self) -> int: ...
    @property
    def n_rights(self) -> int: ...
    @property
    def sent(self) -> Span: ...
    sent_start: bool
    is_sent_start: Optional[bool]
    is_sent_end: Optional[bool]
    @property
    def lefts(self) -> Iterator[Token]: ...
    @property
    def rights(self) -> Iterator[Token]: ...
    @property
    def children(self) -> Iterator[Token]: ...
    @property
    def subtree(self) -> Iterator[Token]: ...
    @property
    def left_edge(self) -> Token: ...
    @property
    def right_edge(self) -> Token: ...
    @property
    def ancestors(self) -> Iterator[Token]: ...
    def is_ancestor(self, descendant: Token) -> bool: ...
    def has_head(self) -> bool: ...
    head: Token
    @property
    def conjuncts(self) -> Tuple[Token, ...]: ...
    ent_type: int
    ent_type_: str
    @property
    def ent_iob(self) -> int: ...
    @classmethod
    def iob_strings(cls) -> Tuple[str, ...]: ...
    @property
    def ent_iob_(self) -> str: ...
    ent_id: int
    ent_id_: str
    ent_kb_id: int
    ent_kb_id_: str
    @property
    def whitespace_(self) -> str: ...
    @property
    def orth_(self) -> str: ...
    @property
    def lower_(self) -> str: ...
    norm_: str
    @property
    def shape_(self) -> str: ...
    @property
    def prefix_(self) -> str: ...
    @property
    def suffix_(self) -> str: ...
    @property
    def lang_(self) -> str: ...
    lemma_: str
    pos_: str
    tag_: str
    def has_dep(self) -> bool: ...
    dep_: str
    @property
    def is_oov(self) -> bool: ...
    @property
    def is_stop(self) -> bool: ...
    @property
    def is_alpha(self) -> bool: ...
    @property
    def is_ascii(self) -> bool: ...
    @property
    def is_digit(self) -> bool: ...
    @property
    def is_lower(self) -> bool: ...
    @property
    def is_upper(self) -> bool: ...
    @property
    def is_title(self) -> bool: ...
    @property
    def is_punct(self) -> bool: ...
    @property
    def is_space(self) -> bool: ...
    @property
    def is_bracket(self) -> bool: ...
    @property
    def is_quote(self) -> bool: ...
    @property
    def is_left_punct(self) -> bool: ...
    @property
    def is_right_punct(self) -> bool: ...
    @property
    def is_currency(self) -> bool: ...
    @property
    def like_url(self) -> bool: ...
    @property
    def like_num(self) -> bool: ...
    @property
    def like_email(self) -> bool: ...

78
spacy/vocab.pyi Normal file

@ -0,0 +1,78 @@
from typing import (
    Callable,
    Iterator,
    Optional,
    Union,
    Tuple,
    List,
    Dict,
    Any,
)
from thinc.types import Floats1d, FloatsXd
from .strings import StringStore
from .lexeme import Lexeme
from .lookups import Lookups
from .tokens import Doc, Span
from pathlib import Path

def create_vocab(
    lang: Optional[str], defaults: Any, vectors_name: Optional[str] = ...
) -> Vocab: ...

class Vocab:
    def __init__(
        self,
        lex_attr_getters: Optional[Dict[str, Callable[[str], Any]]] = ...,
        strings: Optional[Union[List[str], StringStore]] = ...,
        lookups: Optional[Lookups] = ...,
        oov_prob: float = ...,
        vectors_name: Optional[str] = ...,
        writing_system: Dict[str, Any] = ...,
        get_noun_chunks: Optional[Callable[[Union[Doc, Span]], Iterator[Span]]] = ...,
    ) -> None: ...
    @property
    def lang(self) -> str: ...
    def __len__(self) -> int: ...
    def add_flag(
        self, flag_getter: Callable[[str], bool], flag_id: int = ...
    ) -> int: ...
    def __contains__(self, key: str) -> bool: ...
    def __iter__(self) -> Iterator[Lexeme]: ...
    def __getitem__(self, id_or_string: Union[str, int]) -> Lexeme: ...
    @property
    def vectors_length(self) -> int: ...
    def reset_vectors(
        self, *, width: Optional[int] = ..., shape: Optional[int] = ...
    ) -> None: ...
    def prune_vectors(self, nr_row: int, batch_size: int = ...) -> Dict[str, float]: ...
    def get_vector(
        self,
        orth: Union[int, str],
        minn: Optional[int] = ...,
        maxn: Optional[int] = ...,
    ) -> FloatsXd: ...
    def set_vector(self, orth: Union[int, str], vector: Floats1d) -> None: ...
    def has_vector(self, orth: Union[int, str]) -> bool: ...
    lookups: Lookups
    def to_disk(
        self, path: Union[str, Path], *, exclude: Union[List[str], Tuple[str, ...]] = ...
    ) -> None: ...
    def from_disk(
        self, path: Union[str, Path], *, exclude: Union[List[str], Tuple[str, ...]] = ...
    ) -> Vocab: ...
    def to_bytes(
        self, *, exclude: Union[List[str], Tuple[str, ...]] = ...
    ) -> bytes: ...
    def from_bytes(
        self, bytes_data: bytes, *, exclude: Union[List[str], Tuple[str, ...]] = ...
    ) -> Vocab: ...

def pickle_vocab(vocab: Vocab) -> Any: ...
def unpickle_vocab(
    sstore: StringStore,
    vectors: Any,
    morphology: Any,
    data_dir: Any,
    lex_attr_getters: Any,
    lookups: Any,
    get_noun_chunks: Any,
) -> Vocab: ...