mirror of https://github.com/explosion/spaCy.git
Add stub files for main cython classes (#8427)
* Add stub files for main API classes
* Add contributor agreement for ezorita
* Update types for ndarray and hash()
* Fix __getitem__ and __iter__
* Add attributes of Doc and Token classes
* Overload type hints for Span.__getitem__
* Fix type hint overload for Span.__getitem__

Co-authored-by: Luca Dorigo <dorigoluca@gmail.com>
parent 56d4d87aeb
commit 439f30faad

@@ -0,0 +1,106 @@

# spaCy contributor agreement

This spaCy Contributor Agreement (**"SCA"**) is based on the
[Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf).
The SCA applies to any contribution that you make to any product or project
managed by us (the **"project"**), and sets out the intellectual property rights
you grant to us in the contributed materials. The term **"us"** shall mean
[ExplosionAI GmbH](https://explosion.ai/legal). The term
**"you"** shall mean the person or entity identified below.

If you agree to be bound by these terms, fill in the information requested
below and include the filled-in version with your first pull request, under the
folder [`.github/contributors/`](/.github/contributors/). The name of the file
should be your GitHub username, with the extension `.md`. For example, the user
example_user would create the file `.github/contributors/example_user.md`.

Read this agreement carefully before signing. These terms and conditions
constitute a binding legal agreement.

## Contributor Agreement

1. The term "contribution" or "contributed materials" means any source code,
object code, patch, tool, sample, graphic, specification, manual,
documentation, or any other material posted or submitted by you to the project.

2. With respect to any worldwide copyrights, or copyright applications and
registrations, in your contribution:

    * you hereby assign to us joint ownership, and to the extent that such
    assignment is or becomes invalid, ineffective or unenforceable, you hereby
    grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge,
    royalty-free, unrestricted license to exercise all rights under those
    copyrights. This includes, at our option, the right to sublicense these same
    rights to third parties through multiple levels of sublicensees or other
    licensing arrangements;

    * you agree that each of us can do all things in relation to your
    contribution as if each of us were the sole owners, and if one of us makes
    a derivative work of your contribution, the one who makes the derivative
    work (or has it made) will be the sole owner of that derivative work;

    * you agree that you will not assert any moral rights in your contribution
    against us, our licensees or transferees;

    * you agree that we may register a copyright in your contribution and
    exercise all ownership rights associated with it; and

    * you agree that neither of us has any duty to consult with, obtain the
    consent of, pay or render an accounting to the other for any use or
    distribution of your contribution.

3. With respect to any patents you own, or that you can license without payment
to any third party, you hereby grant to us a perpetual, irrevocable,
non-exclusive, worldwide, no-charge, royalty-free license to:

    * make, have made, use, sell, offer to sell, import, and otherwise transfer
    your contribution in whole or in part, alone or in combination with or
    included in any product, work or materials arising out of the project to
    which your contribution was submitted, and

    * at our option, to sublicense these same rights to third parties through
    multiple levels of sublicensees or other licensing arrangements.

4. Except as set out above, you keep all right, title, and interest in your
contribution. The rights that you grant to us under these terms are effective
on the date you first submitted a contribution to us, even if your submission
took place before the date you sign these terms.

5. You covenant, represent, warrant and agree that:

    * each contribution that you submit is and shall be an original work of
    authorship and you can legally grant the rights set out in this SCA;

    * to the best of your knowledge, each contribution will not violate any
    third party's copyrights, trademarks, patents, or other intellectual
    property rights; and

    * each contribution shall be in compliance with U.S. export control laws and
    other applicable export and import laws. You agree to notify us if you
    become aware of any circumstance which would make any of the foregoing
    representations inaccurate in any respect. We may publicly disclose your
    participation in the project, including the fact that you have signed the SCA.

6. This SCA is governed by the laws of the State of California and applicable
U.S. Federal law. Any choice of law rules will not apply.

7. Please place an “x” on one of the applicable statements below. Please do NOT
mark both statements:

    * [x] I am signing on behalf of myself as an individual and no other person
    or entity, including my employer, has or will have rights with respect to my
    contributions.

    * [ ] I am signing on behalf of my employer or a legal entity and I have the
    actual authority to contractually bind that entity.

## Contributor Details

| Field                          | Entry         |
| ------------------------------ | ------------- |
| Name                           | Eduard Zorita |
| Company name (if applicable)   |               |
| Title or role (if applicable)  |               |
| Date                           | 06/17/2021    |
| GitHub username                | ezorita       |
| Website (optional)             |               |

@@ -0,0 +1,61 @@

from typing import (
    Union,
    Any,
)
from thinc.types import Floats1d
from .tokens import Doc, Span, Token
from .vocab import Vocab

class Lexeme:
    def __init__(self, vocab: Vocab, orth: int) -> None: ...
    def __richcmp__(self, other: Lexeme, op: int) -> bool: ...
    def __hash__(self) -> int: ...
    def set_attrs(self, **attrs: Any) -> None: ...
    def set_flag(self, flag_id: int, value: bool) -> None: ...
    def check_flag(self, flag_id: int) -> bool: ...
    def similarity(self, other: Union[Doc, Span, Token, Lexeme]) -> float: ...
    @property
    def has_vector(self) -> bool: ...
    @property
    def vector_norm(self) -> float: ...
    vector: Floats1d
    rank: int
    sentiment: float
    @property
    def orth_(self) -> str: ...
    @property
    def text(self) -> str: ...
    lower: int
    norm: int
    shape: int
    prefix: int
    suffix: int
    cluster: int
    lang: int
    prob: float
    lower_: str
    norm_: str
    shape_: str
    prefix_: str
    suffix_: str
    lang_: str
    flags: int
    @property
    def is_oov(self) -> bool: ...
    is_stop: bool
    is_alpha: bool
    is_ascii: bool
    is_digit: bool
    is_lower: bool
    is_upper: bool
    is_title: bool
    is_punct: bool
    is_space: bool
    is_bracket: bool
    is_quote: bool
    is_left_punct: bool
    is_right_punct: bool
    is_currency: bool
    like_url: bool
    like_num: bool
    like_email: bool
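
Judging by its imports, this hunk is the `Lexeme` stub (presumably `spacy/lexeme.pyi`). As a minimal sketch of what these annotations cover (an editor's illustration, not part of the diff; assumes spaCy v3 and a blank English pipeline):

```python
import spacy
from spacy.lexeme import Lexeme

nlp = spacy.blank("en")  # vocabulary-only pipeline; no trained model needed

# Vocab lookup hands back a Lexeme; with the stub in place, a type checker
# knows orth_ is a str and the is_*/like_* flags are bools.
lex: Lexeme = nlp.vocab["apple"]
print(lex.orth_, lex.is_alpha, lex.like_num)
```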

@@ -0,0 +1,41 @@

from typing import Any, List, Dict, Tuple, Optional, Callable, Union, Iterator, Iterable
from ..vocab import Vocab
from ..tokens import Doc, Span

class Matcher:
    def __init__(self, vocab: Vocab, validate: bool = ...) -> None: ...
    def __reduce__(self) -> Any: ...
    def __len__(self) -> int: ...
    def __contains__(self, key: str) -> bool: ...
    def add(
        self,
        key: str,
        patterns: List[List[Dict[str, Any]]],
        *,
        on_match: Optional[
            Callable[[Matcher, Doc, int, List[Tuple[Any, ...]]], Any]
        ] = ...,
        greedy: Optional[str] = ...
    ) -> None: ...
    def remove(self, key: str) -> None: ...
    def has_key(self, key: Union[str, int]) -> bool: ...
    def get(
        self, key: Union[str, int], default: Optional[Any] = ...
    ) -> Tuple[Optional[Callable[[Any], Any]], List[List[Dict[Any, Any]]]]: ...
    def pipe(
        self,
        docs: Iterable[Tuple[Doc, Any]],
        batch_size: int = ...,
        return_matches: bool = ...,
        as_tuples: bool = ...,
    ) -> Union[
        Iterator[Tuple[Tuple[Doc, Any], Any]], Iterator[Tuple[Doc, Any]], Iterator[Doc]
    ]: ...
    def __call__(
        self,
        doclike: Union[Doc, Span],
        *,
        as_spans: bool = ...,
        allow_missing: bool = ...,
        with_alignments: bool = ...
    ) -> Union[List[Tuple[int, int, int]], List[Span]]: ...
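
A short usage sketch against the stubbed `Matcher` API (likely `spacy/matcher/matcher.pyi`). The `Union` return type of `__call__` reflects that the result shape depends on `as_spans`. Illustration only, assuming spaCy v3 with a blank English pipeline:

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# patterns is a List[List[Dict[str, Any]]]: one list of token specs per pattern
matcher.add("GREETING", [[{"LOWER": "hello"}, {"LOWER": "world"}]])

doc = nlp("Hello world!")
for match_id, start, end in matcher(doc):  # default: List[Tuple[int, int, int]]
    print(nlp.vocab.strings[match_id], doc[start:end].text)

spans = matcher(doc, as_spans=True)  # same call typed as List[Span]
```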

@@ -0,0 +1,22 @@

from typing import Optional, Iterable, Iterator, Union, Any
from pathlib import Path

def get_string_id(key: str) -> int: ...

class StringStore:
    def __init__(
        self, strings: Optional[Iterable[str]] = ..., freeze: bool = ...
    ) -> None: ...
    def __getitem__(self, string_or_id: Union[bytes, str, int]) -> Union[str, int]: ...
    def as_int(self, key: Union[bytes, str, int]) -> int: ...
    def as_string(self, key: Union[bytes, str, int]) -> str: ...
    def add(self, string: str) -> int: ...
    def __len__(self) -> int: ...
    def __contains__(self, string: str) -> bool: ...
    def __iter__(self) -> Iterator[str]: ...
    def __reduce__(self) -> Any: ...
    def to_disk(self, path: Union[str, Path]) -> None: ...
    def from_disk(self, path: Union[str, Path]) -> StringStore: ...
    def to_bytes(self, **kwargs: Any) -> bytes: ...
    def from_bytes(self, bytes_data: bytes, **kwargs: Any) -> StringStore: ...
    def _reset_and_load(self, strings: Iterable[str]) -> None: ...
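
The `Union[str, int]` on `StringStore.__getitem__` (presumably `spacy/strings.pyi`) reflects that the store maps in both directions, string to hash and back. A small sketch, not part of the commit:

```python
from spacy.strings import StringStore

stringstore = StringStore(["apple", "orange"])

coffee_hash = stringstore.add("coffee")      # interning returns the 64-bit hash
assert stringstore[coffee_hash] == "coffee"  # int -> str
assert stringstore["coffee"] == coffee_hash  # str -> int
```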

@@ -0,0 +1,17 @@

from typing import Dict, Any, Union, List, Tuple
from .doc import Doc
from .span import Span
from .token import Token

class Retokenizer:
    def __init__(self, doc: Doc) -> None: ...
    def merge(self, span: Span, attrs: Dict[Union[str, int], Any] = ...) -> None: ...
    def split(
        self,
        token: Token,
        orths: List[str],
        heads: List[Union[Token, Tuple[Token, int]]],
        attrs: Dict[Union[str, int], List[Any]] = ...,
    ) -> None: ...
    def __enter__(self) -> Retokenizer: ...
    def __exit__(self, *args: Any) -> None: ...
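
`Retokenizer.__enter__` returning `Retokenizer` (this looks like `spacy/tokens/_retokenize.pyi`) is what makes the usual context-manager idiom type-check. A sketch under the same assumptions as above:

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("I live in New York")

# Merges are queued and applied when the context manager exits.
with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[3:5], attrs={"LEMMA": "New York"})
print([t.text for t in doc])  # ['I', 'live', 'in', 'New York']
```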

@@ -0,0 +1,180 @@

from typing import (
    Callable,
    Protocol,
    Iterable,
    Iterator,
    Optional,
    Union,
    Tuple,
    List,
    Dict,
    Any,
    overload,
)
from cymem.cymem import Pool
from thinc.types import Floats1d, Floats2d, Ints2d
from .span import Span
from .token import Token
from ._dict_proxies import SpanGroups
from ._retokenize import Retokenizer
from ..lexeme import Lexeme
from ..vocab import Vocab
from .underscore import Underscore
from pathlib import Path
import numpy

class DocMethod(Protocol):
    def __call__(self: Doc, *args: Any, **kwargs: Any) -> Any: ...

class Doc:
    vocab: Vocab
    mem: Pool
    spans: SpanGroups
    max_length: int
    length: int
    sentiment: float
    cats: Dict[str, float]
    user_hooks: Dict[str, Callable[..., Any]]
    user_token_hooks: Dict[str, Callable[..., Any]]
    user_span_hooks: Dict[str, Callable[..., Any]]
    tensor: numpy.ndarray
    user_data: Dict[str, Any]
    has_unknown_spaces: bool
    @classmethod
    def set_extension(
        cls,
        name: str,
        default: Optional[Any] = ...,
        getter: Optional[Callable[[Doc], Any]] = ...,
        setter: Optional[Callable[[Doc, Any], None]] = ...,
        method: Optional[DocMethod] = ...,
        force: bool = ...,
    ) -> None: ...
    @classmethod
    def get_extension(
        cls, name: str
    ) -> Tuple[
        Optional[Any],
        Optional[DocMethod],
        Optional[Callable[[Doc], Any]],
        Optional[Callable[[Doc, Any], None]],
    ]: ...
    @classmethod
    def has_extension(cls, name: str) -> bool: ...
    @classmethod
    def remove_extension(
        cls, name: str
    ) -> Tuple[
        Optional[Any],
        Optional[DocMethod],
        Optional[Callable[[Doc], Any]],
        Optional[Callable[[Doc, Any], None]],
    ]: ...
    def __init__(
        self,
        vocab: Vocab,
        words: Optional[List[str]] = ...,
        spaces: Optional[List[bool]] = ...,
        user_data: Optional[Dict[Any, Any]] = ...,
        tags: Optional[List[str]] = ...,
        pos: Optional[List[str]] = ...,
        morphs: Optional[List[str]] = ...,
        lemmas: Optional[List[str]] = ...,
        heads: Optional[List[int]] = ...,
        deps: Optional[List[str]] = ...,
        sent_starts: Optional[List[Union[bool, None]]] = ...,
        ents: Optional[List[str]] = ...,
    ) -> None: ...
    @property
    def _(self) -> Underscore: ...
    @property
    def is_tagged(self) -> bool: ...
    @property
    def is_parsed(self) -> bool: ...
    @property
    def is_nered(self) -> bool: ...
    @property
    def is_sentenced(self) -> bool: ...
    def has_annotation(
        self, attr: Union[int, str], *, require_complete: bool = ...
    ) -> bool: ...
    @overload
    def __getitem__(self, i: int) -> Token: ...
    @overload
    def __getitem__(self, i: slice) -> Span: ...
    def __iter__(self) -> Iterator[Token]: ...
    def __len__(self) -> int: ...
    def __unicode__(self) -> str: ...
    def __bytes__(self) -> bytes: ...
    def __str__(self) -> str: ...
    def __repr__(self) -> str: ...
    @property
    def doc(self) -> Doc: ...
    def char_span(
        self,
        start_idx: int,
        end_idx: int,
        label: Union[int, str] = ...,
        kb_id: Union[int, str] = ...,
        vector: Optional[Floats1d] = ...,
        alignment_mode: str = ...,
    ) -> Span: ...
    def similarity(self, other: Union[Doc, Span, Token, Lexeme]) -> float: ...
    @property
    def has_vector(self) -> bool: ...
    vector: Floats1d
    vector_norm: float
    @property
    def text(self) -> str: ...
    @property
    def text_with_ws(self) -> str: ...
    ents: Tuple[Span]
    def set_ents(
        self,
        entities: List[Span],
        *,
        blocked: Optional[List[Span]] = ...,
        missing: Optional[List[Span]] = ...,
        outside: Optional[List[Span]] = ...,
        default: str = ...
    ) -> None: ...
    @property
    def noun_chunks(self) -> Iterator[Span]: ...
    @property
    def sents(self) -> Iterator[Span]: ...
    @property
    def lang(self) -> int: ...
    @property
    def lang_(self) -> str: ...
    def count_by(
        self, attr_id: int, exclude: Optional[Any] = ..., counts: Optional[Any] = ...
    ) -> Dict[Any, int]: ...
    def from_array(self, attrs: List[int], array: Ints2d) -> Doc: ...
    @staticmethod
    def from_docs(
        docs: List[Doc],
        ensure_whitespace: bool = ...,
        attrs: Optional[Union[Tuple[Union[str, int]], List[Union[int, str]]]] = ...,
    ) -> Doc: ...
    def get_lca_matrix(self) -> Ints2d: ...
    def copy(self) -> Doc: ...
    def to_disk(
        self, path: Union[str, Path], *, exclude: Iterable[str] = ...
    ) -> None: ...
    def from_disk(
        self, path: Union[str, Path], *, exclude: Union[List[str], Tuple[str]] = ...
    ) -> Doc: ...
    def to_bytes(self, *, exclude: Union[List[str], Tuple[str]] = ...) -> bytes: ...
    def from_bytes(
        self, bytes_data: bytes, *, exclude: Union[List[str], Tuple[str]] = ...
    ) -> Doc: ...
    def to_dict(self, *, exclude: Union[List[str], Tuple[str]] = ...) -> bytes: ...
    def from_dict(
        self, msg: bytes, *, exclude: Union[List[str], Tuple[str]] = ...
    ) -> Doc: ...
    def extend_tensor(self, tensor: Floats2d) -> None: ...
    def retokenize(self) -> Retokenizer: ...
    def to_json(self, underscore: Optional[List[str]] = ...) -> Dict[str, Any]: ...
    def to_utf8_array(self, nr_char: int = ...) -> Ints2d: ...
    @staticmethod
    def _get_array_attrs() -> Tuple[Any]: ...
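
This hunk covers `Doc` (presumably `spacy/tokens/doc.pyi`), including the `__getitem__`/`__iter__` fixes named in the commit message. A sketch of the distinction those `@overload` pairs encode, illustration only, assuming spaCy v3:

```python
import spacy
from spacy.tokens import Doc, Span, Token

nlp = spacy.blank("en")
doc = Doc(
    nlp.vocab,
    words=["Give", "it", "back", "!"],
    spaces=[True, True, False, False],
)

# The overloads let a type checker infer Token for int indexing
# and Span for slicing, with no casts.
token: Token = doc[0]
span: Span = doc[0:3]
print(token.text, "|", span.text, "|", len(doc))
```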

@@ -0,0 +1,20 @@

from typing import Any, Dict, Iterator, List, Union
from ..vocab import Vocab

class MorphAnalysis:
    def __init__(
        self, vocab: Vocab, features: Union[Dict[str, str], str] = ...
    ) -> None: ...
    @classmethod
    def from_id(cls, vocab: Vocab, key: Any) -> MorphAnalysis: ...
    def __contains__(self, feature: str) -> bool: ...
    def __iter__(self) -> Iterator[str]: ...
    def __len__(self) -> int: ...
    def __hash__(self) -> int: ...
    def __eq__(self, other: MorphAnalysis) -> bool: ...
    def __ne__(self, other: MorphAnalysis) -> bool: ...
    def get(self, field: Any) -> List[str]: ...
    def to_json(self) -> str: ...
    def to_dict(self) -> Dict[str, str]: ...
    def __str__(self) -> str: ...
    def __repr__(self) -> str: ...
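
A sketch of the stubbed `MorphAnalysis` surface (likely `spacy/tokens/morphanalysis.pyi`); the feature-string constructor form follows the spaCy docs, and this is an illustration rather than part of the diff:

```python
import spacy
from spacy.tokens import MorphAnalysis

nlp = spacy.blank("en")

morph = MorphAnalysis(nlp.vocab, "Number=Sing|Person=3")
assert "Number=Sing" in morph        # __contains__ takes a feature string
assert morph.get("Person") == ["3"]  # get() returns a list of values
print(morph.to_dict())               # {'Number': 'Sing', 'Person': '3'}
```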

@@ -0,0 +1,124 @@

from typing import Callable, Protocol, Iterator, Optional, Union, Tuple, Any, overload
from thinc.types import Floats1d, Ints2d, FloatsXd
from .doc import Doc
from .token import Token
from .underscore import Underscore
from ..lexeme import Lexeme
from ..vocab import Vocab

class SpanMethod(Protocol):
    def __call__(self: Span, *args: Any, **kwargs: Any) -> Any: ...

class Span:
    @classmethod
    def set_extension(
        cls,
        name: str,
        default: Optional[Any] = ...,
        getter: Optional[Callable[[Span], Any]] = ...,
        setter: Optional[Callable[[Span, Any], None]] = ...,
        method: Optional[SpanMethod] = ...,
        force: bool = ...,
    ) -> None: ...
    @classmethod
    def get_extension(
        cls, name: str
    ) -> Tuple[
        Optional[Any],
        Optional[SpanMethod],
        Optional[Callable[[Span], Any]],
        Optional[Callable[[Span, Any], None]],
    ]: ...
    @classmethod
    def has_extension(cls, name: str) -> bool: ...
    @classmethod
    def remove_extension(
        cls, name: str
    ) -> Tuple[
        Optional[Any],
        Optional[SpanMethod],
        Optional[Callable[[Span], Any]],
        Optional[Callable[[Span, Any], None]],
    ]: ...
    def __init__(
        self,
        doc: Doc,
        start: int,
        end: int,
        label: int = ...,
        vector: Optional[Floats1d] = ...,
        vector_norm: Optional[float] = ...,
        kb_id: Optional[int] = ...,
    ) -> None: ...
    def __richcmp__(self, other: Span, op: int) -> bool: ...
    def __hash__(self) -> int: ...
    def __len__(self) -> int: ...
    def __repr__(self) -> str: ...
    @overload
    def __getitem__(self, i: int) -> Token: ...
    @overload
    def __getitem__(self, i: slice) -> Span: ...
    def __iter__(self) -> Iterator[Token]: ...
    @property
    def _(self) -> Underscore: ...
    def as_doc(self, *, copy_user_data: bool = ...) -> Doc: ...
    def get_lca_matrix(self) -> Ints2d: ...
    def similarity(self, other: Union[Doc, Span, Token, Lexeme]) -> float: ...
    @property
    def vocab(self) -> Vocab: ...
    @property
    def sent(self) -> Span: ...
    @property
    def ents(self) -> Tuple[Span]: ...
    @property
    def has_vector(self) -> bool: ...
    @property
    def vector(self) -> Floats1d: ...
    @property
    def vector_norm(self) -> float: ...
    @property
    def tensor(self) -> FloatsXd: ...
    @property
    def sentiment(self) -> float: ...
    @property
    def text(self) -> str: ...
    @property
    def text_with_ws(self) -> str: ...
    @property
    def noun_chunks(self) -> Iterator[Span]: ...
    @property
    def root(self) -> Token: ...
    def char_span(
        self,
        start_idx: int,
        end_idx: int,
        label: int = ...,
        kb_id: int = ...,
        vector: Optional[Floats1d] = ...,
    ) -> Span: ...
    @property
    def conjuncts(self) -> Tuple[Token]: ...
    @property
    def lefts(self) -> Iterator[Token]: ...
    @property
    def rights(self) -> Iterator[Token]: ...
    @property
    def n_lefts(self) -> int: ...
    @property
    def n_rights(self) -> int: ...
    @property
    def subtree(self) -> Iterator[Token]: ...
    start: int
    end: int
    start_char: int
    end_char: int
    label: int
    kb_id: int
    ent_id: int
    ent_id_: str
    @property
    def orth_(self) -> str: ...
    @property
    def lemma_(self) -> str: ...
    label_: str
    kb_id_: str
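
`Span.__getitem__` (this appears to be `spacy/tokens/span.pyi`) gets the same overload treatment as `Doc`, which is the overload fix the commit message calls out. Sketch, same assumptions as the earlier examples:

```python
import spacy
from spacy.tokens import Span, Token

nlp = spacy.blank("en")
doc = nlp("Welcome to the bank of the Thames")

span = doc[3:7]            # "bank of the Thames"
token: Token = span[0]     # int index -> Token
subspan: Span = span[1:3]  # slice     -> Span (indices relative to the span)
print(token.text, "|", subspan.text)
```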

@@ -0,0 +1,24 @@

from typing import Any, Dict, Iterable
from .doc import Doc
from .span import Span

class SpanGroup:
    def __init__(
        self,
        doc: Doc,
        *,
        name: str = ...,
        attrs: Dict[str, Any] = ...,
        spans: Iterable[Span] = ...
    ) -> None: ...
    def __repr__(self) -> str: ...
    @property
    def doc(self) -> Doc: ...
    @property
    def has_overlap(self) -> bool: ...
    def __len__(self) -> int: ...
    def append(self, span: Span) -> None: ...
    def extend(self, spans: Iterable[Span]) -> None: ...
    def __getitem__(self, i: int) -> Span: ...
    def to_bytes(self) -> bytes: ...
    def from_bytes(self, bytes_data: bytes) -> SpanGroup: ...
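
`SpanGroup` (presumably `spacy/tokens/span_group.pyi`) is what backs the `Doc.spans` mapping typed as `SpanGroups` in the `Doc` stub above. A brief sketch, not part of the commit:

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("The quick brown fox jumps")

# Assigning a list of Spans to doc.spans stores them as a SpanGroup.
doc.spans["highlights"] = [doc[0:2], doc[2:4]]
group = doc.spans["highlights"]
print(len(group), group[0].text, group.has_overlap)
```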

@@ -0,0 +1,208 @@

from typing import (
    Callable,
    Protocol,
    Iterator,
    Optional,
    Union,
    Tuple,
    Any,
)
from thinc.types import Floats1d, FloatsXd
from .doc import Doc
from .span import Span
from .morphanalysis import MorphAnalysis
from ..lexeme import Lexeme
from ..vocab import Vocab
from .underscore import Underscore

class TokenMethod(Protocol):
    def __call__(self: Token, *args: Any, **kwargs: Any) -> Any: ...

class Token:
    i: int
    doc: Doc
    vocab: Vocab
    @classmethod
    def set_extension(
        cls,
        name: str,
        default: Optional[Any] = ...,
        getter: Optional[Callable[[Token], Any]] = ...,
        setter: Optional[Callable[[Token, Any], None]] = ...,
        method: Optional[TokenMethod] = ...,
        force: bool = ...,
    ) -> None: ...
    @classmethod
    def get_extension(
        cls, name: str
    ) -> Tuple[
        Optional[Any],
        Optional[TokenMethod],
        Optional[Callable[[Token], Any]],
        Optional[Callable[[Token, Any], None]],
    ]: ...
    @classmethod
    def has_extension(cls, name: str) -> bool: ...
    @classmethod
    def remove_extension(
        cls, name: str
    ) -> Tuple[
        Optional[Any],
        Optional[TokenMethod],
        Optional[Callable[[Token], Any]],
        Optional[Callable[[Token, Any], None]],
    ]: ...
    def __init__(self, vocab: Vocab, doc: Doc, offset: int) -> None: ...
    def __hash__(self) -> int: ...
    def __len__(self) -> int: ...
    def __unicode__(self) -> str: ...
    def __bytes__(self) -> bytes: ...
    def __str__(self) -> str: ...
    def __repr__(self) -> str: ...
    def __richcmp__(self, other: Token, op: int) -> bool: ...
    @property
    def _(self) -> Underscore: ...
    def nbor(self, i: int = ...) -> Token: ...
    def similarity(self, other: Union[Doc, Span, Token, Lexeme]) -> float: ...
    def has_morph(self) -> bool: ...
    morph: MorphAnalysis
    @property
    def lex(self) -> Lexeme: ...
    @property
    def lex_id(self) -> int: ...
    @property
    def rank(self) -> int: ...
    @property
    def text(self) -> str: ...
    @property
    def text_with_ws(self) -> str: ...
    @property
    def prob(self) -> float: ...
    @property
    def sentiment(self) -> float: ...
    @property
    def lang(self) -> int: ...
    @property
    def idx(self) -> int: ...
    @property
    def cluster(self) -> int: ...
    @property
    def orth(self) -> int: ...
    @property
    def lower(self) -> int: ...
    @property
    def norm(self) -> int: ...
    @property
    def shape(self) -> int: ...
    @property
    def prefix(self) -> int: ...
    @property
    def suffix(self) -> int: ...
    lemma: int
    pos: int
    tag: int
    dep: int
    @property
    def has_vector(self) -> bool: ...
    @property
    def vector(self) -> Floats1d: ...
    @property
    def vector_norm(self) -> float: ...
    @property
    def tensor(self) -> Optional[FloatsXd]: ...
    @property
    def n_lefts(self) -> int: ...
    @property
    def n_rights(self) -> int: ...
    @property
    def sent(self) -> Span: ...
    sent_start: bool
    is_sent_start: Optional[bool]
    is_sent_end: Optional[bool]
    @property
    def lefts(self) -> Iterator[Token]: ...
    @property
    def rights(self) -> Iterator[Token]: ...
    @property
    def children(self) -> Iterator[Token]: ...
    @property
    def subtree(self) -> Iterator[Token]: ...
    @property
    def left_edge(self) -> Token: ...
    @property
    def right_edge(self) -> Token: ...
    @property
    def ancestors(self) -> Iterator[Token]: ...
    def is_ancestor(self, descendant: Token) -> bool: ...
    def has_head(self) -> bool: ...
    head: Token
    @property
    def conjuncts(self) -> Tuple[Token]: ...
    ent_type: int
    ent_type_: str
    @property
    def ent_iob(self) -> int: ...
    @classmethod
    def iob_strings(cls) -> Tuple[str]: ...
    @property
    def ent_iob_(self) -> str: ...
    ent_id: int
    ent_id_: str
    ent_kb_id: int
    ent_kb_id_: str
    @property
    def whitespace_(self) -> str: ...
    @property
    def orth_(self) -> str: ...
    @property
    def lower_(self) -> str: ...
    norm_: str
    @property
    def shape_(self) -> str: ...
    @property
    def prefix_(self) -> str: ...
    @property
    def suffix_(self) -> str: ...
    @property
    def lang_(self) -> str: ...
    lemma_: str
    pos_: str
    tag_: str
    def has_dep(self) -> bool: ...
    dep_: str
    @property
    def is_oov(self) -> bool: ...
    @property
    def is_stop(self) -> bool: ...
    @property
    def is_alpha(self) -> bool: ...
    @property
    def is_ascii(self) -> bool: ...
    @property
    def is_digit(self) -> bool: ...
    @property
    def is_lower(self) -> bool: ...
    @property
    def is_upper(self) -> bool: ...
    @property
    def is_title(self) -> bool: ...
    @property
    def is_punct(self) -> bool: ...
    @property
    def is_space(self) -> bool: ...
    @property
    def is_bracket(self) -> bool: ...
    @property
    def is_quote(self) -> bool: ...
    @property
    def is_left_punct(self) -> bool: ...
    @property
    def is_right_punct(self) -> bool: ...
    @property
    def is_currency(self) -> bool: ...
    @property
    def like_url(self) -> bool: ...
    @property
    def like_num(self) -> bool: ...
    @property
    def like_email(self) -> bool: ...
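
The `Token` stub (likely `spacy/tokens/token.pyi`) mirrors spaCy's convention of hash-valued attributes (`orth: int`) alongside human-readable `*_` variants (`orth_: str`). A sketch; the lexical flags used here need no trained model:

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("Send 100 dollars to alice@example.com")

for token in doc:
    # i, orth_, and the like_* flags are all typed by the stub
    print(token.i, token.orth_, token.like_num, token.like_email)
```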

@@ -0,0 +1,78 @@

from typing import (
    Callable,
    Iterator,
    Optional,
    Union,
    Tuple,
    List,
    Dict,
    Any,
)
from thinc.types import Floats1d, FloatsXd
from . import Language
from .strings import StringStore
from .lexeme import Lexeme
from .lookups import Lookups
from .tokens import Doc, Span
from pathlib import Path

def create_vocab(
    lang: Language, defaults: Any, vectors_name: Optional[str] = ...
) -> Vocab: ...

class Vocab:
    def __init__(
        self,
        lex_attr_getters: Optional[Dict[str, Callable[[str], Any]]] = ...,
        strings: Optional[Union[List[str], StringStore]] = ...,
        lookups: Optional[Lookups] = ...,
        oov_prob: float = ...,
        vectors_name: Optional[str] = ...,
        writing_system: Dict[str, Any] = ...,
        get_noun_chunks: Optional[Callable[[Union[Doc, Span]], Iterator[Span]]] = ...,
    ) -> None: ...
    @property
    def lang(self) -> Language: ...
    def __len__(self) -> int: ...
    def add_flag(
        self, flag_getter: Callable[[str], bool], flag_id: int = ...
    ) -> int: ...
    def __contains__(self, key: str) -> bool: ...
    def __iter__(self) -> Iterator[Lexeme]: ...
    def __getitem__(self, id_or_string: Union[str, int]) -> Lexeme: ...
    @property
    def vectors_length(self) -> int: ...
    def reset_vectors(
        self, *, width: Optional[int] = ..., shape: Optional[int] = ...
    ) -> None: ...
    def prune_vectors(self, nr_row: int, batch_size: int = ...) -> Dict[str, float]: ...
    def get_vector(
        self,
        orth: Union[int, str],
        minn: Optional[int] = ...,
        maxn: Optional[int] = ...,
    ) -> FloatsXd: ...
    def set_vector(self, orth: Union[int, str], vector: Floats1d) -> None: ...
    def has_vector(self, orth: Union[int, str]) -> bool: ...
    lookups: Lookups
    def to_disk(
        self, path: Union[str, Path], *, exclude: Union[List[str], Tuple[str]] = ...
    ) -> None: ...
    def from_disk(
        self, path: Union[str, Path], *, exclude: Union[List[str], Tuple[str]] = ...
    ) -> Vocab: ...
    def to_bytes(self, *, exclude: Union[List[str], Tuple[str]] = ...) -> bytes: ...
    def from_bytes(
        self, bytes_data: bytes, *, exclude: Union[List[str], Tuple[str]] = ...
    ) -> Vocab: ...

def pickle_vocab(vocab: Vocab) -> Any: ...
def unpickle_vocab(
    sstore: StringStore,
    vectors: Any,
    morphology: Any,
    data_dir: Any,
    lex_attr_getters: Any,
    lookups: Any,
    get_noun_chunks: Any,
) -> Vocab: ...
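
Finally, the `Vocab` stub (presumably `spacy/vocab.pyi`). A sketch of the typed lookup round-trip that `__getitem__` and `__contains__` describe, illustration only under the same spaCy v3 assumptions:

```python
import spacy

nlp = spacy.blank("en")
vocab = nlp.vocab

lexeme = vocab["coffee"]   # str lookup creates/returns a Lexeme
same = vocab[lexeme.orth]  # int (hash) lookup is typed too
assert "coffee" in vocab and same.text == "coffee"
print(len(vocab), lexeme.text)
```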