Commit Graph

179 Commits

Author SHA1 Message Date
Ines Montani cdd418b93e Auto-format [ci skip] 2019-03-11 17:10:50 +01:00
Matthew Honnibal b0b990e405 Fix token.conjuncts (closes #795) (#3392)
* Implement conjuncts method

* Add span.conjuncts property

* Un-xfail token.conjuncts tests

* Update docs for token.conjuncts and span.conjuncts

* Fix merge error in token.conjuncts
2019-03-11 17:05:45 +01:00
Ines Montani 25cb764e64 Document new API [ci skip] 2019-03-11 15:23:53 +01:00
Ines Montani ebcf2bb1c3 Add Doc.lang and Doc.lang_ 2019-03-11 14:21:40 +01:00
Matthew Honnibal 98acf5ffe4 💫 Allow passing of config parameters to specific pipeline components (#3386)
* Add component_cfg kwarg to begin_training

* Document component_cfg arg to begin_training

* Update docs and auto-format

* Support component_cfg across Language

* Format

* Update docs and docstrings [ci skip]

* Fix begin_training
2019-03-10 23:36:47 +01:00
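The idea behind `component_cfg` in 98acf5ffe4 (config parameters routed to specific pipeline components by name) can be sketched in plain Python. This only illustrates the routing pattern and is not spaCy's implementation: `run_pipeline`, the two component functions, and their parameters are hypothetical names invented for the example.

```python
def lowercase(text, enabled=True):
    """Hypothetical component: lowercases the text unless disabled."""
    return text.lower() if enabled else text


def truncate(text, max_len=100):
    """Hypothetical component: keeps at most max_len characters."""
    return text[:max_len]


def run_pipeline(components, text, component_cfg=None):
    """Call each (name, func) pair in order, passing per-component kwargs."""
    component_cfg = component_cfg or {}
    for name, func in components:
        # Only the settings registered under this component's name reach it.
        kwargs = component_cfg.get(name, {})
        text = func(text, **kwargs)
    return text


components = [("lowercase", lowercase), ("truncate", truncate)]
result = run_pipeline(
    components, "Hello World", component_cfg={"truncate": {"max_len": 5}}
)
print(result)
```

Components without an entry in `component_cfg` simply run with their defaults, so callers only spell out the settings they want to change.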
Ines Montani 7ba3a5d95c 💫 Make serialization methods consistent (#3385)
* Make serialization methods consistent

Use a single `exclude` keyword argument instead of arbitrarily named keyword arguments, and handle deprecation of the old arguments.

* Update docs and add section on serialization fields
2019-03-10 19:16:45 +01:00
Ines Montani 0426689db8 💫 Improve Doc.to_json and add Doc.is_nered (#3381)
* Use default return instead of else

* Add Doc.is_nered to indicate if entities have been set

* Add properties in Doc.to_json if they were set, not if they're available

This way, if a processed Doc exports "pos": None, it means that the tag was explicitly unset. If it exports "ents": [], it means that entity annotations are available but that this document doesn't contain any entities. Before, this would have been unclear and problematic for training.
2019-03-10 15:24:34 +01:00
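The distinction drawn in 0426689db8 between "set" and "available" can be illustrated with a small sentinel-based sketch. This is not the code of `Doc.to_json`; `to_json_sketch` and `_UNSET` are hypothetical names used only to show why an explicit `None` or `[]` carries different information than a missing key.

```python
_UNSET = object()  # sentinel: distinguishes "never set" from "set to None"


def to_json_sketch(text, pos=_UNSET, ents=_UNSET):
    """Export only the annotations that were explicitly set."""
    data = {"text": text}
    if pos is not _UNSET:
        # An explicit None is kept: it means the tag was deliberately unset.
        data["pos"] = pos
    if ents is not _UNSET:
        # An empty list is kept: annotations exist, but there are no entities.
        data["ents"] = list(ents)
    return data


print(to_json_sketch("Hello"))
print(to_json_sketch("Hello", ents=[]))
print(to_json_sketch("Hello", pos=None))
```

A consumer such as a training loop can then treat a missing key as "no annotation available" and a present key as a real annotation, even if its value is empty.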
Ines Montani 76764fcf59 💫 Improve converters and training data file formats (#3374)
* Populate converter argument info automatically

* Add conversion option for msgpack

* Update docs

* Allow reading training data from JSONL
2019-03-08 23:15:23 +01:00
Ines Montani 296446a1c8 Tidy up and improve docs and docstrings (#3370)

## Description
* tidy up and adjust Cython code to code style
* improve docstrings and make calling `help()` nicer
* add URLs to new docs pages to docstrings wherever possible, mostly to user-facing objects
* fix various typos and inconsistencies in docs

### Types of change
enhancement, docs

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-08 11:42:26 +01:00
Ines Montani fa7314b221 Clarify train_path and dev_path format (see #3366) [ci skip] 2019-03-07 12:23:27 +01:00
Ines Montani e9babd9973 Update hyperparameters section (see #3352) 2019-03-06 14:40:30 +01:00
Ines Montani 5eadf61327 Update pretraining docs on file format (closes #3354) 2019-03-04 16:30:13 +00:00
Ines Montani 1d4ba7678f Auto-format [ci skip] 2019-02-27 12:07:35 +01:00
Matthew Honnibal f1d77eb140 💫 Improve handling of missing NER tags (closes #2603) (#3341)
* Improve handling of missing NER tags

GoldParse can accept missing NER tags, if entities is provided
in BILUO format (rather than as spans). Missing tags can be provided
as None values.

Fix bug that occurred when first tag was a None value. Closes #2603.

* Document specification of missing NER tags.
2019-02-27 12:06:32 +01:00
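What "missing NER tags as None values" in f1d77eb140 means for BILUO sequences can be sketched without spaCy. The helpers below are hypothetical and are not how `GoldParse` works internally; they only show one way to read a BILUO sequence in which some positions, including the first, are unknown.

```python
def biluo_to_spans(tags):
    """Collect (start, end, label) spans, treating None as 'unknown'."""
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag is None or tag == "O":
            # Unknown or outside: any half-open entity is dropped.
            start = None
            continue
        prefix, _, label = tag.partition("-")
        if prefix == "U":
            spans.append((i, i + 1, label))
            start = None
        elif prefix == "B":
            start = i
        elif prefix == "L" and start is not None:
            spans.append((start, i + 1, label))
            start = None
        # "I" tags just continue the current entity.
    return spans


def missing_positions(tags):
    """Indices where no NER annotation was provided."""
    return [i for i, tag in enumerate(tags) if tag is None]


tags = [None, "B-PERSON", "L-PERSON", "O", None, "U-GPE"]
print(biluo_to_spans(tags))
print(missing_positions(tags))
```

Keeping the missing positions separate from `"O"` matters because `"O"` asserts that a token is not an entity, while `None` makes no claim at all.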
Matthew Honnibal 4a3371acd5 Make doc[0].is_sent_start == True (closes #2869) (#3340)
* Make doc[0] have sent_start True. Closes #2869

* Document that doc[0].is_sent_start defaults True.
2019-02-27 11:17:17 +01:00
Ines Montani d0b3af9222 Fix remaining inaccuracies in API docs (closes #2329) 2019-02-24 22:21:25 +01:00
Ines Montani 62b558ab72 💫 Support lexical attributes in retokenizer attrs (closes #2390) (#3325)
* Fix formatting and whitespace

* Add support for lexical attributes (closes #2390)

* Document lexical attribute setting during retokenization

* Assign variable outside of nested loop
2019-02-24 21:13:51 +01:00
Ines Montani df19e2bff6 💫 Allow setting of custom attributes during retokenization (closes #3314) (#3324)

## Description

This PR adds the ability to override custom extension attributes during merging. This only works for attributes that are writable, i.e. attributes registered with a default value like `default=False` or attributes that have both a getter *and* a setter implemented.

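```python
import spacy
from spacy.tokens import Token

# Assumption: any English pipeline works here; a blank one keeps the example small.
nlp = spacy.blank("en")

# Register a writable custom attribute with a default value.
Token.set_extension("is_musician", default=False)

doc = nlp("I like David Bowie.")
with doc.retokenize() as retokenizer:
    # Custom extension attributes go under the "_" key.
    attrs = {"LEMMA": "David Bowie", "_": {"is_musician": True}}
    retokenizer.merge(doc[2:4], attrs=attrs)

assert doc[2].text == "David Bowie"
assert doc[2].lemma_ == "David Bowie"
assert doc[2]._.is_musician
```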

### Types of change
enhancement

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-24 18:38:47 +01:00
Ines Montani 1ea1bc98e7 Document regex utilities [ci skip] 2019-02-24 18:34:10 +01:00
Ines Montani 46ec5cdccc Update TextCategorizer docs 2019-02-24 13:11:57 +01:00
Ines Montani c03cb1cc63 Improve built-in component API docs 2019-02-24 13:11:49 +01:00
Ines Montani 250e88ef55 Fix docs example (see #2728) 2019-02-21 14:22:06 +01:00
Ines Montani 04b4df0ec9 Remove n_threads 2019-02-17 22:25:42 +01:00
Ines Montani e597110d31 💫 Update website (#3285)

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 19:31:19 +01:00
ines 808f7ee417 Update API documentation 2017-10-03 14:27:22 +02:00
ines d15775c3ad Fix typos and commands in alpha docs 2017-08-21 13:40:11 +02:00
ines 3c33003078 Port over typo corrections from #1245 2017-08-20 12:00:17 +02:00
ines 1261b01e46 Update Doc.char_span docs 2017-08-19 16:34:32 +02:00
ines 5cb0200e63 Document new Span.to_array() method 2017-08-19 12:45:28 +02:00
ines 471eed4126 Add example to Span.merge() 2017-08-19 12:45:16 +02:00
ines 404d3067b8 Document new Doc.char_span() method 2017-08-19 12:45:00 +02:00
ines d53cbf369f Document as_tuples kwarg on Language.pipe() 2017-08-19 12:44:50 +02:00
ines 6a37c93311 Update argument type 2017-08-19 12:44:33 +02:00
ines 4731d50220 Add break utility for long nowrap items (e.g. code) 2017-08-19 12:44:23 +02:00
ines 0aba11b64b Update package command docs 2017-08-14 16:45:44 +02:00
ines a29f132ffd Change python -m spacy to spacy
Reflects latest change to entry point or auto-alias
2017-08-14 13:04:48 +02:00
ines f085b88f9d Add TextCategorizer API docs stub 2017-07-22 17:56:33 +02:00
ines ab1a4e8b3c Add Tensorizer API docs stub 2017-07-22 17:56:25 +02:00
ines d2a7e5b8e5 Add GoldParse.cats attribute 2017-07-22 17:55:35 +02:00
ines 23d976ed00 Add Doc.cats attribute and missing v2 tag 2017-07-22 17:55:14 +02:00
Ines Montani 1ddbeddca2 Fix typo 2017-07-22 15:00:58 +02:00
Vetea 8e20cf6368 Update doc.jade
Just remove a duplicate 'doc ='
2017-06-08 10:35:58 +02:00
ines 9f55c0d4f6 Add Vectors class 2017-06-05 13:33:11 +02:00
ines e204788c30 Add docs for util.load_model_from_path 2017-06-05 13:18:22 +02:00
ines efc37ea3de Update train CLI 2017-06-04 23:45:14 +02:00
ines 3419ecbfdd Update docs on model shortcut links 2017-06-04 13:55:00 +02:00
ines b0225183c2 Update displaCy defaults 2017-06-03 13:27:06 +02:00
ines c60431357d Port over docs typo corrections 2017-06-03 11:31:30 +02:00
ines 1bebc6392c Add source files to pipeline components 2017-06-01 17:38:06 +02:00
ines 706cec6d58 Move annotation specs up 2017-06-01 13:02:43 +02:00