369 Commits

Author SHA1 Message Date
Björn Böing 205c73a589 Update tokenizer and doc init example (#3939)
* Fix Doc.to_json hyperlink

* Update tokenizer and doc init examples

* Change "matchin rules" to "punctuation rules"

* Auto-format
2019-07-10 10:16:48 +02:00
Björn Böing 04982ccc40 Update pretrain to prevent unintended overwriting of weight fil… (#3902)
* Update pretrain to prevent unintended overwriting of weight files for #3859

* Add '--epoch-start' to pretrain docs

* Add missing pretrain arguments to bash example

* Update doc tag for v2.1.5
2019-07-09 21:48:30 +02:00
Joshua Smith 2eb925bd05 Added an argument to `EntityRuler` constructor to pass attrs to… (#3919)
* Preserve flags in EntityRuler

The EntityRuler (explosion/spaCy#3526) does not preserve
overwrite flags (or `ent_id_sep`) when serialized.  This
commit adds support for serialization/deserialization preserving
overwrite and ent_id_sep flags.

* add signed contributor agreement

* flake8 cleanup

mostly blank line issues.

* mark test from the issue as needing a model

The test from the issue needs some language model for serialization
but the test wasn't originally marked correctly.

* Adds `phrase_matcher_attr` to allow args to PhraseMatcher

This is an added arg to pass to the `PhraseMatcher`. For example,
this allows creation of a case insensitive phrase matcher when the
`EntityRuler` is created.  References explosion/spaCy#3822

* remove unneeded model loading

The model didn't need to be loaded, and I replaced it with
a change that doesn't require it (using existing fixtures)

* updated docstring for new argument

* updated docs to reflect new argument to the EntityRuler constructor

* change tempdir handling to be compatible with python 2.7

* return conflicted code to entityruler

Some stuff got cut out because of merge conflicts, this
returns that code for the phrase_matcher_attr.

* fixed typo in the code added back after conflicts

* flake8 compliance

When I deconflicted the branch there were some flake8 issues
introduced. This resolves the spacing problems.

* test changes:  attempts to fix flaky test in python3.5

These tests seem to be a little flaky in 3.5, so I changed the check to avoid
the comparisons that seem to fail sometimes.
2019-07-09 20:09:17 +02:00
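The `phrase_matcher_attr` change in the commit above lets the `EntityRuler` match phrases on an attribute other than the verbatim text, for example a lowercased form. The following is a minimal plain-Python sketch of that idea, not spaCy's implementation; `find_phrases` and its `key` parameter are hypothetical names used only for illustration.

```python
def find_phrases(tokens, phrases, key=lambda token: token):
    """Return (start, end) token offsets where any phrase occurs.

    `key` stands in for the attribute being compared: the identity
    function compares verbatim text, `str.lower` compares case-insensitively.
    Hypothetical helper, not part of spaCy.
    """
    normalized = [key(token) for token in tokens]
    matches = []
    for phrase in phrases:
        target = [key(token) for token in phrase]
        size = len(target)
        if size == 0:
            continue  # an empty phrase would match everywhere, so skip it
        for start in range(len(normalized) - size + 1):
            if normalized[start:start + size] == target:
                matches.append((start, start + size))
    return sorted(matches)


tokens = ["I", "visited", "new", "york"]
phrases = [["New", "York"]]
exact = find_phrases(tokens, phrases)                   # verbatim comparison
lowered = find_phrases(tokens, phrases, key=str.lower)  # case-insensitive comparison
```

With verbatim comparison the phrase is not found, while comparing on the lowercased form finds it at tokens 2 to 4.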
Guillaume Claret d7a519a922 Typo (#3865)
* Typo

* Add contributor agreement
2019-06-20 10:31:19 +02:00
Björn Böing ebf5a04d6c Update pretrain docs and add unsupported loss_func error (#3860)
* Add error to `get_vectors_loss` for unsupported loss function of `pretrain`

* Add missing "--loss-func" argument to pretrain docs. Update pretrain plac annotations to match docs.

* Add missing quotation marks
2019-06-20 10:30:44 +02:00
Ines Montani 81c12640ab Auto-format [ci skip] 2019-06-16 14:33:20 +02:00
Greg Werner 9041a72d7f Update tokenizer.md for construction example (#3790)
* Update tokenizer.md for construction example

Self-contained example. The example should say what `nlp` is so that it works as is.

* Update CONTRIBUTOR_AGREEMENT.md

* Restore contributor agreement

* Adjust construction examples
2019-06-16 14:32:56 +02:00
BreakBB d8573ee715 Update error raising for CLI pretrain to fix #3840 (#3843)
* Add check for empty input file to CLI pretrain

* Raise error if JSONL is not a dict or contains neither `tokens` nor `text` key

* Skip empty values for correct pretrain keys and log a counter as warning

* Add tests for CLI pretrain core function make_docs.

* Add a short hint for the `tokens` key to the CLI pretrain docs

* Add success message to CLI pretrain

* Update model loading to fix the tests

* Skip empty values and do not create docs out of it
2019-06-16 13:22:57 +02:00
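The input checks listed in the commit above (empty input, records that are not dicts, records with neither a `tokens` nor a `text` key, and empty values that are skipped and counted) can be sketched in plain Python as follows. This is an illustration under assumptions, not the code from `spacy pretrain`; `validate_jsonl` is a hypothetical name, and treating blank lines as absent is an assumption of this sketch.

```python
import json


def validate_jsonl(lines):
    """Parse JSONL lines and return (records, skipped_count).

    Hypothetical helper mirroring the checks described in the commit
    message; it is not spaCy's implementation.
    """
    lines = [line for line in lines if line.strip()]  # assumption: ignore blank lines
    if not lines:
        raise ValueError("Input file is empty")
    records, skipped = [], 0
    for line in lines:
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError("Each JSONL line must be a dict")
        if "tokens" not in record and "text" not in record:
            raise ValueError("Record needs a 'tokens' or 'text' key")
        # Prefer tokens when present; fall back to text otherwise.
        value = record.get("tokens") or record.get("text")
        if not value:
            skipped += 1  # empty value: skip it and count it for a warning
            continue
        records.append(record)
    return records, skipped


records, skipped = validate_jsonl([
    '{"text": "hello there."}',
    '{"tokens": ["hello", "there", "."]}',
    '{"text": ""}',
])
```

Here two records are kept and one empty record is counted as skipped.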
Motoki Wu 9c064e6ad9 Add resume logic to spacy pretrain (#3652)
* Added ability to resume training

* Add to readme

* Remove duplicate entry
2019-06-12 13:29:23 +02:00
Nipun Sadvilkar 1f13005751 Incorrect Token attribute ent_iob_ description (#3800)
* Incorrect Token attribute ent_iob_ description

* Add spaCy contributor agreement
2019-05-31 16:50:45 +02:00
Ramanan Balakrishnan 26c37c5a4d fix all references to BILUO annotation format (#3797) 2019-05-31 12:19:19 +02:00
Ines Montani 7634812172 Document Language.evaluate 2019-05-24 14:06:36 +02:00
Ines Montani 45e6855550 Update Language.update docs 2019-05-24 14:06:26 +02:00
Ines Montani b78a8dc1d2 Update Scorer and add API docs 2019-05-24 14:06:04 +02:00
Ines Montani 321c9f5acc Fix lex_id docs (closes #3743) 2019-05-16 23:15:58 +02:00
Ines Montani f96af8526a Merge branch 'spacy.io' [ci skip] 2019-05-11 23:03:56 +02:00
Ines Montani 7534f7cb44 Fix return value of Language.update (closes #3692) 2019-05-11 18:40:19 +02:00
devforfu 21af12eb53 Make "text" key in JSONL format optional when "tokens" key is provided (#3721)
* Fix issue with forcing text key when it is not required

* Extending the docs to reflect the new behavior
2019-05-11 15:41:29 +02:00
Ines Montani 6cfa1e1f47 Fix DependencyParser.predict docs (resolves #3561) 2019-05-11 15:37:54 +02:00
Ines Montani 25f5592d57 Improve Token.prob and Lexeme.prob docs (resolves #3701) 2019-05-11 15:23:41 +02:00
Ines Montani 65b55f1aaa Add version tag to `--base-model` argument (closes #3720) 2019-05-10 14:06:47 +02:00
Ines Montani 505c9e0e19 Add util.filter_spans helper (#3686) 2019-05-08 02:33:40 +02:00
Ines Montani ec0d840ab5 Document early stopping 2019-04-22 14:31:32 +02:00
Ines Montani 1d567913f9 Update spacy evaluate example 2019-04-22 14:28:42 +02:00
Ines Montani 7917ce2f73 Make flag shortcut consistent and document 2019-04-22 14:23:44 +02:00
Ines Montani 52658c80d5 Allow jupyter=False to override Jupyter mode (closes #3598) 2019-04-22 14:18:32 +02:00
Motoki Wu 8e2cef49f3 Add save after `--save-every` batches for `spacy pretrain` (#3510)
When using `spacy pretrain`, the model is saved only after every epoch. But each epoch can be very big since `pretrain` is used for language modeling tasks. So I added a `--save-every` option in the CLI to save after every `--save-every` batches.

## Description

To test...

Save this file to `sample_sents.jsonl`

```
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
{"text": "hello there."}
```

Then pass `--save-every 2` when pretraining.

```bash
spacy pretrain sample_sents.jsonl en_core_web_md here -nw 1 -bs 1 -i 10 --save-every 2
```

It should save the model to the `here/` folder after every 2 batches. Models saved during an epoch have `.temp` appended to the save name.

At the end of training, you should see these files (`ls here/`):

```bash
config.json     model2.bin      model5.bin      model8.bin
log.jsonl       model2.temp.bin model5.temp.bin model8.temp.bin
model0.bin      model3.bin      model6.bin      model9.bin
model0.temp.bin model3.temp.bin model6.temp.bin model9.temp.bin
model1.bin      model4.bin      model7.bin
model1.temp.bin model4.temp.bin model7.temp.bin
```
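The naming scheme visible in the listing above (one `modelN.bin` per epoch plus one `modelN.temp.bin` for saves made during the epoch) can be sketched as follows. This is only an illustration under assumptions, not the `spacy pretrain` code: `checkpoint_names` is a hypothetical helper, and saving whenever the batch count is a multiple of `save_every` is an assumption of this sketch.

```python
def checkpoint_names(n_epochs, n_batches, save_every=None):
    """List the model files that would exist after training.

    Hypothetical helper. Intermediate saves within one epoch reuse the
    same `.temp` name, which matches the single temp file per epoch in
    the listing above.
    """
    names = []
    for epoch in range(n_epochs):
        wrote_temp = False
        for batch in range(1, n_batches + 1):
            if save_every and batch % save_every == 0:
                wrote_temp = True  # the temp file is overwritten on each save
        if wrote_temp:
            names.append(f"model{epoch}.temp.bin")
        names.append(f"model{epoch}.bin")  # regular save at the end of the epoch
    return names


print(checkpoint_names(2, 16, save_every=2))
# ['model0.temp.bin', 'model0.bin', 'model1.temp.bin', 'model1.bin']
```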

### Types of change

This is a new feature to `spacy pretrain`.

🌵 **Unfortunately, I haven't been able to test this because compiling from source is not working (cythonize error).** 

```
Processing matcher.pyx
[Errno 2] No such file or directory: '/Users/mwu/github/spaCy/spacy/matcher.pyx'
Traceback (most recent call last):
  File "/Users/mwu/github/spaCy/bin/cythonize.py", line 169, in <module>
    run(args.root)
  File "/Users/mwu/github/spaCy/bin/cythonize.py", line 158, in run
    process(base, filename, db)
  File "/Users/mwu/github/spaCy/bin/cythonize.py", line 124, in process
    preserve_cwd(base, process_pyx, root + ".pyx", root + ".cpp")
  File "/Users/mwu/github/spaCy/bin/cythonize.py", line 87, in preserve_cwd
    func(*args)
  File "/Users/mwu/github/spaCy/bin/cythonize.py", line 63, in process_pyx
    raise Exception("Cython failed")
Exception: Cython failed
Traceback (most recent call last):
  File "setup.py", line 276, in <module>
    setup_package()
  File "setup.py", line 209, in setup_package
    generate_cython(root, "spacy")
  File "setup.py", line 132, in generate_cython
    raise RuntimeError("Running cythonize failed")
RuntimeError: Running cythonize failed
```

Edit: Fixed after deleting all `.cpp` files: `find spacy -name "*.cpp" | xargs rm`

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-04-22 14:10:16 +02:00
Ines Montani 5289dd1356 Fix formatting 2019-04-13 17:58:26 +02:00
Santiago Castro 86e4b68aa9 Fix website docs for Vectors.from_glove (#3565)
* Fix website docs for Vectors.from_glove

* Add myself as a contributor
2019-04-10 15:23:27 +02:00
Bharat Raghunathan 72820896d4 Fix typo in web docs cli.md (#3559) 2019-04-09 11:40:03 +02:00
pierremonico 0d26bfe677 Removes duplicate in table (#3550)
* Removes duplicate in table

Just fixing typos.

* Remove newline


Co-authored-by: Ines Montani <ines@ines.io>
2019-04-08 10:30:42 +02:00
Samuel Kane 06a1846379 fix(util): fix decaying function output (#3495)
* fix(util): fix decaying function output

* fix(util): better test and adhere to code standards

* fix(util): correct variable name, pytestify test, update website text
2019-03-28 13:24:47 +01:00
Bharat Raghunathan 1db3e47509 DOC: Update tokenizer docs to include default value for batch_size in pipe (#3492) 2019-03-28 12:48:02 +01:00
Ines Montani 1e5b917d75 Fix formatting [ci skip] 2019-03-23 16:45:50 +01:00
Matthew Honnibal 6c783f8045 Bug fixes and options for TextCategorizer (#3472)
* Fix code for bag-of-words feature extraction

The _ml.py module had a redundant copy of a function to extract unigram
bag-of-words features, except one had a bug that set values to 0.
Another function allowed extraction of bigram features. Replace all three
with a new function that supports arbitrary ngram sizes and also allows
control of which attribute is used (e.g. ORTH, LOWER, etc).

* Support 'bow' architecture for TextCategorizer

This allows efficient ngram bag-of-words models, which are better when
the classifier needs to run quickly, especially when the texts are long.
Pass architecture="bow" to use it. The extra arguments ngram_size and
attr are also available, e.g. ngram_size=2 means unigram and bigram
features will be extracted.

* Fix size limits in train_textcat example

* Explain architectures better in docs
2019-03-23 16:44:44 +01:00
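The commit above describes one function that extracts bag-of-words features for arbitrary n-gram sizes, where `ngram_size=2` yields both unigrams and bigrams, with control over which attribute is used. A minimal plain-Python sketch of that behaviour follows; it is not the `_ml.py` code, and `ngram_counts` and its `lower` flag (standing in for choosing `LOWER` over `ORTH`) are hypothetical.

```python
from collections import Counter


def ngram_counts(tokens, ngram_size=1, lower=False):
    """Count all n-grams of length 1..ngram_size over a token list.

    Hypothetical helper: `lower=True` stands in for extracting a
    lowercased attribute instead of the verbatim text.
    """
    if lower:
        tokens = [token.lower() for token in tokens]
    counts = Counter()
    for size in range(1, ngram_size + 1):
        for start in range(len(tokens) - size + 1):
            counts[tuple(tokens[start:start + size])] += 1
    return counts


features = ngram_counts(["New", "York", "new"], ngram_size=2, lower=True)
```

For this input the unigram `("new",)` is counted twice, and the bigrams `("new", "york")` and `("york", "new")` once each.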
Ines Montani 06bf130890 💫 Add better and serializable sentencizer (#3471)
* Add better serializable sentencizer component

* Replace default factory

* Add tests

* Tidy up

* Pass test

* Update docs
2019-03-23 15:45:02 +01:00
Ines Montani dac8f8ff99 Update Span.__init__ docs (see #3445) [ci skip] 2019-03-20 17:24:17 +01:00
Matthew Honnibal 62afa64a8d Expose batch size and length caps on CLI for pretrain (#3417)
Add and document CLI options for batch size, max doc length, min doc length for `spacy pretrain`.

Also improve CLI output.

Closes #3216 

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-16 21:38:45 +01:00
Ines Montani 2c5dd4d602 Update Vectors.find docs [ci skip] 2019-03-16 17:10:57 +01:00
Ines Montani cecc31b765 Don't auto-slugify accordion links [ci skip] 2019-03-12 15:30:49 +01:00
Ines Montani cdd418b93e Auto-format [ci skip] 2019-03-11 17:10:50 +01:00
Matthew Honnibal b0b990e405 Fix token.conjuncts (closes #795) (#3392)
* Implement conjuncts method

* Add span.conjuncts property

* Un-xfail token.conjuncts tests

* Update docs for token.conjuncts and span.conjuncts

* Fix merge error in token.conjuncts
2019-03-11 17:05:45 +01:00
Ines Montani 25cb764e64 Document new API [ci skip] 2019-03-11 15:23:53 +01:00
Ines Montani ebcf2bb1c3 Add Doc.lang and Doc.lang_ 2019-03-11 14:21:40 +01:00
Matthew Honnibal 98acf5ffe4 💫 Allow passing of config parameters to specific pipeline components (#3386)
* Add component_cfg kwarg to begin_training

* Document component_cfg arg to begin_training

* Update docs and auto-format

* Support component_cfg across Language

* Format

* Update docs and docstrings [ci skip]

* Fix begin_training
2019-03-10 23:36:47 +01:00
Ines Montani 7ba3a5d95c 💫 Make serialization methods consistent (#3385)
* Make serialization methods consistent

Use an `exclude` keyword argument instead of randomly named keyword arguments, with deprecation handling

* Update docs and add section on serialization fields
2019-03-10 19:16:45 +01:00
Ines Montani 0426689db8 💫 Improve Doc.to_json and add Doc.is_nered (#3381)
* Use default return instead of else

* Add Doc.is_nered to indicate if entities have been set

* Add properties in Doc.to_json if they were set, not if they're available

This way, if a processed Doc exports "pos": None, it means that the tag was explicitly unset. If it exports "ents": [], it means that entity annotations are available but that this document doesn't contain any entities. Before, this would have been unclear and problematic for training.
2019-03-10 15:24:34 +01:00
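The distinction drawn in the commit above, exporting a property when it was set rather than whenever it is available, can be illustrated with a sentinel value. This is a plain-Python sketch of the idea only, not `Doc.to_json`; the `export` function and the `_UNSET` sentinel are hypothetical.

```python
_UNSET = object()  # sentinel meaning "this annotation was never set"


def export(text, ents=_UNSET, pos=_UNSET):
    """Build a JSON-like dict that only includes annotations that were set.

    Hypothetical helper. An explicit empty list or None is kept, because
    it carries information; an annotation that was never set is omitted.
    """
    data = {"text": text}
    if ents is not _UNSET:
        data["ents"] = list(ents)
    if pos is not _UNSET:
        data["pos"] = pos
    return data


never_annotated = export("hello there.")        # no "ents" key at all
no_entities = export("hello there.", ents=[])   # "ents": [] means none were found
```

A consumer can then tell a document without entity annotations apart from an annotated document that contains no entities.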
Ines Montani 76764fcf59 💫 Improve converters and training data file formats (#3374)
* Populate converter argument info automatically

* Add conversion option for msgpack

* Update docs

* Allow reading training data from JSONL
2019-03-08 23:15:23 +01:00
Ines Montani 296446a1c8 Tidy up and improve docs and docstrings (#3370)
## Description
* tidy up and adjust Cython code to code style
* improve docstrings and make calling `help()` nicer
* add URLs to new docs pages to docstrings wherever possible, mostly to user-facing objects
* fix various typos and inconsistencies in docs

### Types of change
enhancement, docs

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-08 11:42:26 +01:00
Ines Montani fa7314b221 Clarify train_path and dev_path format (see #3366) [ci skip] 2019-03-07 12:23:27 +01:00
Ines Montani e9babd9973 Update hyperparameters section (see #3352) 2019-03-06 14:40:30 +01:00
Ines Montani 5eadf61327 Update pretraining docs on file format (closes #3354) 2019-03-04 16:30:13 +00:00
Ines Montani 1d4ba7678f Auto-format [ci skip] 2019-02-27 12:07:35 +01:00
Matthew Honnibal f1d77eb140 💫 Improve handling of missing NER tags (closes #2603) (#3341)
* Improve handling of missing NER tags

GoldParse can accept missing NER tags, if entities is provided
in BILUO format (rather than as spans). Missing tags can be provided
as None values.

Fix bug that occurred when first tag was a None value. Closes #2603.

* Document specification of missing NER tags.
2019-02-27 12:06:32 +01:00
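The commit above allows entities in BILUO format to contain `None` for tokens whose tag is unknown, including the first token. A minimal plain-Python sketch of reading such a sequence follows; it is not the `GoldParse` implementation, and `biluo_to_spans` is a hypothetical helper written for illustration.

```python
def biluo_to_spans(tags):
    """Convert BILUO tags to (start, end, label) spans.

    Hypothetical helper. None marks a token with a missing tag; those
    indices are returned separately instead of being treated as "O".
    """
    spans, missing, start = [], [], None
    for index, tag in enumerate(tags):
        if tag is None:
            missing.append(index)
            start = None  # an unknown tag interrupts any open entity
        elif tag == "O":
            start = None
        elif tag.startswith("U-"):
            spans.append((index, index + 1, tag[2:]))
        elif tag.startswith("B-"):
            start = index
        elif tag.startswith("L-") and start is not None:
            spans.append((start, index + 1, tag[2:]))
            start = None
        # "I-" tags simply continue the currently open entity
    return spans, missing


spans, missing = biluo_to_spans([None, "B-PERSON", "L-PERSON", "O", "U-GPE"])
```

In this example the first token is reported as missing rather than as outside any entity, and the two entities are still recovered.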
Matthew Honnibal 4a3371acd5 Make doc[0].is_sent_start == True (closes #2869) (#3340)
* Make doc[0] have sent_start True. Closes #2869

* Document that doc[0].is_sent_start defaults True.
2019-02-27 11:17:17 +01:00
Ines Montani d0b3af9222 Fix remaining inaccuracies in API docs (closes #2329) 2019-02-24 22:21:25 +01:00
Ines Montani 62b558ab72 💫 Support lexical attributes in retokenizer attrs (closes #2390) (#3325)
* Fix formatting and whitespace

* Add support for lexical attributes (closes #2390)

* Document lexical attribute setting during retokenization

* Assign variable outside of nested loop
2019-02-24 21:13:51 +01:00
Ines Montani df19e2bff6 💫 Allow setting of custom attributes during retokenization (closes #3314) (#3324)
## Description

This PR adds the ability to override custom extension attributes during merging. This will only work for attributes that are writable, i.e. attributes registered with a default value like `default=False` or attributes that have both a getter *and* a setter implemented.

```python
Token.set_extension('is_musician', default=False)

doc = nlp("I like David Bowie.")
with doc.retokenize() as retokenizer:
    attrs = {"LEMMA": "David Bowie", "_": {"is_musician": True}}
    retokenizer.merge(doc[2:4], attrs=attrs)

assert doc[2].text == "David Bowie"
assert doc[2].lemma_ == "David Bowie"
assert doc[2]._.is_musician
```

### Types of change
enhancement

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-24 18:38:47 +01:00
Ines Montani 1ea1bc98e7 Document regex utilities [ci skip] 2019-02-24 18:34:10 +01:00
Ines Montani 46ec5cdccc Update TextCategorizer docs 2019-02-24 13:11:57 +01:00
Ines Montani c03cb1cc63 Improve built-in component API docs 2019-02-24 13:11:49 +01:00
Ines Montani 250e88ef55 Fix docs example (see #2728) 2019-02-21 14:22:06 +01:00
Ines Montani 04b4df0ec9 Remove n_threads 2019-02-17 22:25:42 +01:00
Ines Montani e597110d31 💫 Update website (#3285)
## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 19:31:19 +01:00
ines 808f7ee417 Update API documentation 2017-10-03 14:27:22 +02:00
ines d15775c3ad Fix typos and commands in alpha docs 2017-08-21 13:40:11 +02:00
ines 3c33003078 Port over typo corrections from #1245 2017-08-20 12:00:17 +02:00
ines 1261b01e46 Update Doc.char_span docs 2017-08-19 16:34:32 +02:00
ines 5cb0200e63 Document new Span.to_array() method 2017-08-19 12:45:28 +02:00
ines 471eed4126 Add example to Span.merge() 2017-08-19 12:45:16 +02:00
ines 404d3067b8 Document new Doc.char_span() method 2017-08-19 12:45:00 +02:00
ines d53cbf369f Document as_tuples kwarg on Language.pipe() 2017-08-19 12:44:50 +02:00
ines 6a37c93311 Update argument type 2017-08-19 12:44:33 +02:00
ines 4731d50220 Add break utility for long nowrap items (e.g. code) 2017-08-19 12:44:23 +02:00
ines 0aba11b64b Update package command docs 2017-08-14 16:45:44 +02:00
ines a29f132ffd Change python -m spacy to spacy
Reflects latest change to entry point or auto-alias
2017-08-14 13:04:48 +02:00
ines f085b88f9d Add TextCategorizer API docs stub 2017-07-22 17:56:33 +02:00
ines ab1a4e8b3c Add Tensorizer API docs stub 2017-07-22 17:56:25 +02:00
ines d2a7e5b8e5 Add GoldParse.cats attribute 2017-07-22 17:55:35 +02:00
ines 23d976ed00 Add Doc.cats attribute and missing v2 tag 2017-07-22 17:55:14 +02:00
Ines Montani 1ddbeddca2 Fix typo 2017-07-22 15:00:58 +02:00
Vetea 8e20cf6368 Update doc.jade
Just remove a duplicate 'doc ='
2017-06-08 10:35:58 +02:00
ines 9f55c0d4f6 Add Vectors class 2017-06-05 13:33:11 +02:00
ines e204788c30 Add docs for util.load_model_from_path 2017-06-05 13:18:22 +02:00
ines efc37ea3de Update train CLI 2017-06-04 23:45:14 +02:00
ines 3419ecbfdd Update docs on model shortcut links 2017-06-04 13:55:00 +02:00
ines b0225183c2 Update displaCy defaults 2017-06-03 13:27:06 +02:00
ines c60431357d Port over docs typo corrections 2017-06-03 11:31:30 +02:00
ines 1bebc6392c Add source files to pipeline components 2017-06-01 17:38:06 +02:00
ines 706cec6d58 Move annotation specs up 2017-06-01 13:02:43 +02:00
ines 77dca25c7f Update Language API docs 2017-06-01 11:51:31 +02:00
ines f86289566a Update new in v2 section and add note on Matcher acceptors 2017-05-30 13:53:06 +02:00
ines b5bfab8699 Add description 2017-05-29 15:27:16 +02:00
ines 567485a818 Fix and document model loading with pipeline and overrides 2017-05-29 14:10:10 +02:00
ines 00b2094dc3 Fix typos, long integers and tests 2017-05-29 01:09:52 +02:00
ines 606879b217 Update hash strings examples 2017-05-28 19:42:44 +02:00
ines c7b57ea314 Update docs and change integer IDs to hash values 2017-05-28 19:25:34 +02:00
ines 0ea31d1e31 Add under construction note to pipeline components 2017-05-28 18:44:07 +02:00
ines 414193e9ba Update docs to reflect StringStore changes 2017-05-28 18:19:11 +02:00
ines 69bda9aed7 Update text, examples, typos, wording and formatting 2017-05-28 16:41:01 +02:00
ines eb5a8be9ad Update language overview and add section on 'xx' lang class 2017-05-28 01:15:44 +02:00
ines eb703f7656 Update API docs 2017-05-28 00:32:43 +02:00
ines c1983621fb Update util functions for model loading 2017-05-28 00:22:40 +02:00
ines 70afcfec3e Update defaults and example 2017-05-26 14:04:31 +02:00
ines 1b982f0838 Update train command and add docs on hyperparameters 2017-05-26 14:02:38 +02:00
ines 1b9c6ded71 Update API docs and add "source" button to GH source 2017-05-26 13:40:32 +02:00
ines d48530835a Update API docs and fix typos 2017-05-26 12:43:16 +02:00
ines ea9474f71c Add version tag mixin to label new features 2017-05-26 12:42:36 +02:00
ines 353f0ef8d7 Use disable argument (list) for serialization 2017-05-26 12:33:54 +02:00
ines 0f48fb1f97 Rename processing text to production use and remove linear feature scheme 2017-05-25 00:10:33 +02:00
ines 8b86b08bed Update usage workflows 2017-05-24 11:59:08 +02:00
ines 66088851dc Add Doc.to_disk() and Doc.from_disk() methods 2017-05-24 11:58:17 +02:00
ines 10afb3c796 Tidy up and merge usage pages 2017-05-24 00:37:47 +02:00
ines 697d3d7cb3 Fix links to CLI docs 2017-05-24 00:36:38 +02:00
ines a38393e2f6 Update annotation docs 2017-05-23 23:16:17 +02:00
ines 786af87ffb Update IOB docs 2017-05-23 23:15:50 +02:00
ines c8bde2161c Add kwargs to spacy.load 2017-05-23 23:14:02 +02:00
ines 0a8a2d2f6d Remove tip infoboxes from annotation docs 2017-05-23 23:13:51 +02:00
ines e6acd3bbf2 Fix matcher tests and matcher docs 2017-05-23 11:36:02 +02:00
ines f497cf60b2 Update formatting 2017-05-23 11:32:25 +02:00
ines a23f487b06 Tidy up displaCy and add "manual" option
Also don't require title in EntityRenderer
2017-05-22 18:48:20 +02:00
ines dddad5bf26 Update util.prints docs 2017-05-22 13:54:52 +02:00
ines d5a6a9a6a9 Use string values for attrs in Matcher docs 2017-05-22 13:54:45 +02:00
ines 54f04a9fe0 Update API docs with changes in spacy.gold and spacy.language 2017-05-22 12:29:30 +02:00
ines fc3ec733ea Reduce complexity in CLI
Remove now redundant model command and move plac annotations to cli
files
2017-05-22 12:28:58 +02:00
ines 2c5cfe8bbf Update docstrings and API docs for StringStore 2017-05-21 14:18:58 +02:00
ines 251346b59f Fix typos and formatting 2017-05-21 14:18:46 +02:00
ines 075f5ff87a Update docstrings and API docs for GoldParse 2017-05-21 13:53:46 +02:00
ines 465a1dd710 Add BILUO scheme to annotation docs 2017-05-21 13:53:34 +02:00
ines c9f04f3cd0 Add note on automated processes to download command 2017-05-21 13:23:39 +02:00
ines 8ab59515b2 Fix typo and use consistent description for from_bytes 2017-05-21 13:18:39 +02:00
ines c5a653fa48 Update docstrings and API docs for Tokenizer 2017-05-21 13:18:14 +02:00
ines d82ae9a585 Change "function" to "callable" in docs 2017-05-21 13:17:40 +02:00
ines ee3fdffffb Move attributes and remove deprecated methods 2017-05-21 01:18:31 +02:00
ines 1cb2c86f9a Update CLI docs 2017-05-21 01:13:05 +02:00
ines 272a8981c3 Add model tag to spacy.load API docs 2017-05-21 01:12:43 +02:00
ines 3871157d84 Update spacy.util documentation 2017-05-21 01:12:09 +02:00
ines da12aee0c1 Update spacy.load with note on get_lang_class 2017-05-21 00:19:26 +02:00
ines 27de0834b2 Update docstrings and API docs for Lexeme 2017-05-20 15:13:42 +02:00
ines 7ed8a92ed1 Update docstrings and API docs for Token 2017-05-20 15:13:33 +02:00
ines 4ed6a36622 Update docstrings and API docs for Matcher 2017-05-20 14:43:10 +02:00
ines 39f36539f6 Update docstrings and API docs for Matcher 2017-05-20 14:32:34 +02:00
ines c00ff257be Update docstrings and API docs for Matcher 2017-05-20 14:26:10 +02:00
ines 463e3cc80f Remove resize_vectors and vectors_length 2017-05-20 14:02:14 +02:00
ines f0cc642bb9 Update docstrings and API docs for Vocab 2017-05-20 14:00:41 +02:00
Matthew Honnibal a93276bb78 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-05-20 13:55:12 +02:00
Matthew Honnibal ce9234f593 Update Matcher API 2017-05-20 13:54:53 +02:00
ines 8b14476253 Fix typo 2017-05-20 13:00:13 +02:00
ines 6557ff9e85 Update example 2017-05-20 13:00:07 +02:00
ines fea4925f41 Reorganise API docs navigation 2017-05-20 12:59:57 +02:00
ines b2678372c7 Add API docs for top-level spaCy functions
i.e. spacy.load(), spacy.info(), spacy.explain()
2017-05-20 12:59:44 +02:00
ines 797f10ab16 Update formatting 2017-05-20 12:59:16 +02:00
ines e10c48210d Update Matcher API and workflow to reflect new API
on_match is now the second positional argument, to easily allow a
variable number of patterns while keeping the method clean and readable.
2017-05-20 12:59:03 +02:00
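The design choice in the commit above, making `on_match` the second positional argument so that any number of patterns can follow, can be sketched with a small stand-in class. `PatternRegistry` is hypothetical and only illustrates the signature; it is not the `Matcher` implementation.

```python
class PatternRegistry:
    """Hypothetical stand-in that stores patterns and callbacks by key."""

    def __init__(self):
        self.patterns = {}
        self.callbacks = {}

    def add(self, key, on_match, *patterns):
        # Because on_match comes second, every remaining positional
        # argument can be collected as a pattern.
        if not patterns:
            raise ValueError("At least one pattern is required")
        self.patterns.setdefault(key, []).extend(patterns)
        self.callbacks[key] = on_match


registry = PatternRegistry()
# Two patterns are registered under one key; None means no callback.
registry.add("GREETING", None, [{"LOWER": "hello"}], [{"LOWER": "hi"}])
```

Callers pass one pattern or several without wrapping them in an extra list, and the callback stays in a fixed position.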
ines eb521af267 Fix formatting 2017-05-20 12:58:15 +02:00
ines 7973912114 Update CLI docs 2017-05-20 12:58:05 +02:00
ines 5163a4513e Update API docs 2017-05-20 01:43:48 +02:00
ines e3256e7406 Update Matcher API docs 2017-05-20 01:38:34 +02:00
ines 0cabf9e13f Fix model tag 2017-05-20 01:38:14 +02:00
ines fe5d8819ea Update Matcher docstrings and API docs 2017-05-19 21:47:06 +02:00
ines c8580da686 Update "requires model" tags 2017-05-19 20:24:46 +02:00
ines c3e903e4c2 Update examples and API docs 2017-05-19 19:59:02 +02:00
ines e9e62b01b0 Update docstrings and API docs for Token 2017-05-19 18:47:56 +02:00
ines 62ceec4fc6 Update docstrings and API docs for Span 2017-05-19 18:47:46 +02:00
ines 23f9a3ccc8 Update docstrings and API docs for Doc 2017-05-19 18:47:39 +02:00
ines 2c8c9dc0c9 Update docstrings and API docs for Language 2017-05-19 18:47:24 +02:00
ines 0791f0aae6 Update docstrings and API docs for Span class 2017-05-19 00:31:31 +02:00
ines 5b68579eb8 Use returns/yields instead of return/yield 2017-05-19 00:02:34 +02:00
ines b687ad109d Update docstrings and API docs for Doc class 2017-05-18 23:59:44 +02:00
ines d42bc16868 Update docstrings and API docs for Language class 2017-05-18 23:57:38 +02:00
ines b87066ff10 Update docstrings and API docs for Doc class 2017-05-18 22:17:41 +02:00
ines 476b8209fe Update docs with new Jupyter auto-detection 2017-05-18 14:58:17 +02:00
ines 02a4841e7b Move CLI docs to API reference 2017-05-17 12:04:03 +02:00
ines d7244ae72d Add docs on collapse_punct option 2017-05-15 13:51:33 +02:00
ines c33bdeb564 Use uppercase for entity types 2017-05-15 01:24:57 +02:00
ines cf7e5ed534 Use American spelling for "visualizers"
Kinda sucks because we normally use British spelling, but it just looks
weird and confusing otherwise... same with tokenizer and all other
library internals. So this is sort of the "official policy" for now.
2017-05-14 23:29:36 +02:00
ines fe5a5086e1 Fix typo 2017-05-14 23:27:56 +02:00
ines 1ae07da18f Add API docs for spacy.displacy (see #1058) 2017-05-14 19:31:23 +02:00
ines b462076d80 Merge load_lang_class and get_lang_class 2017-05-14 01:31:10 +02:00
ines 1465c6c221 Add API docs for util functions 2017-05-13 21:23:12 +02:00
ines 19879cb693 Update alpha support docs 2017-05-12 15:57:49 +02:00
ines 63d79947c8 Update title in navigation 2017-05-12 15:40:43 +02:00
ines 531ee1373b Rename "Language models" to "Languages" in API 2017-05-12 15:38:56 +02:00
ines fac3566aac Add descriptions to POS tagging scheme 2017-05-03 20:11:02 +02:00
ines 1570b83ee5 Add spacy.explain() note to NER annotation scheme 2017-05-03 20:11:02 +02:00
ines 219369bb7d Add detailed docs for dependency label annotations 2017-05-03 20:11:02 +02:00
ines f9384b0fbd Update alpha languages and add aside for tokenizer dependencies 2017-05-03 09:58:31 +02:00
Yasuaki Uechi 0e7a9b9fac Add Japanese to 'Alpha support' section 2017-05-03 13:56:45 +09:00
ines 034ec5710b Fix typo and add Norwegian to alpha languages 2017-04-27 11:24:21 +02:00
ines 375edf0bb5 Add list of models and include French 2017-04-26 20:50:27 +02:00
ines ddd5194088 Update Language docs and docstrings 2017-04-17 01:52:13 +02:00
ines aad80a291f Add save_to_directory method to API docs 2017-04-17 01:40:34 +02:00
ines 13df2d6a60 Add documentation for spaCy's JSON format 2017-03-26 15:56:15 +02:00
ines a5fc5fb0db Add Hebrew to list of alpha languages 2017-03-25 10:22:46 +01:00
ines 9600cd1b9e Fix download commands 2017-03-25 10:22:05 +01:00
ines d25f17f139 Add Bengali to list of languages (see #865) 2017-03-01 15:59:21 +01:00
ines 2b07ab7db4 Add feature scheme to API docs (see #857, #739) 2017-02-24 18:26:32 +01:00
Ines Montani 49a102aff3 Merge pull request #841 from jondoughty/patch-1
Updated Token class documentation
2017-02-16 23:47:51 +01:00
Jon Doughty 12a8757343 Update token.jade 2017-02-16 10:55:33 -08:00
nycmonkey 8946a2a496 Fix typo in IOB integer to letter map
ent_iob value for an ent.iob_ value of 'B' should be 3, not B
2017-02-16 13:49:57 -05:00
ines a44da8fb34 Update language models and alpha support overview 2017-02-04 13:49:05 +01:00