Commit Graph

276 Commits

Author SHA1 Message Date
John Stewart 2d15859d2a Fixed spaCy+Keras example (#2763)
* bug fixes in keras example

* created contributor agreement
2018-09-15 13:06:39 +02:00
Matthew Honnibal 4336397ecb Update develop from master 2018-08-14 03:04:28 +02:00
Matthew Honnibal f762d52b24 Add example for Issue #2627 2018-08-05 13:33:52 +02:00
ines 4339f64128 Merge branch 'master' into develop 2018-07-19 16:15:03 +02:00
ines d489ffb78b Fix formatting [ci skip] 2018-07-19 13:22:25 +02:00
himkt 57311d5d47 replace janome with mecab in the documentation and the test (#2415)
* Add links to Reddit data (see #2401)

* replace janome with mecab in the documentation and the test

* add the assignment
2018-06-11 00:33:13 +02:00
Ines Montani 3f2e3cbd27 Add links to Reddit data (see #2401) 2018-05-31 16:22:43 +02:00
Matthew Honnibal 546dd99cdf Merge master into develop -- mostly Arabic and website 2018-05-15 18:14:28 +02:00
Matt Upson 9a1d3b63fb Add missing default to .set_extension (#2297)
Failing to set a default, method, or getter results in a ValueError:

ValueError: [E083] Error setting extension: only one of `default`, `method`, or `getter` (plus optional `setter`) is allowed. Got: 0
2018-05-04 18:47:01 +02:00
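Note on 9a1d3b63fb: the rule quoted in that error message can be sketched in plain Python. The helper name `check_extension_args` is hypothetical and this is not spaCy's implementation. It only mirrors the stated constraint that exactly one of `default`, `method`, or `getter` must be supplied, and as a simplification it treats `None` as "not supplied".

```python
def check_extension_args(default=None, method=None, getter=None, setter=None):
    """Hypothetical validator for the rule quoted in the commit message.

    Simplification: an argument left as None counts as "not supplied".
    """
    # Count how many of the mutually exclusive options were given.
    nr_defined = sum(arg is not None for arg in (default, method, getter))
    if nr_defined != 1:
        raise ValueError(
            "Error setting extension: only one of `default`, `method`, or "
            "`getter` (plus optional `setter`) is allowed. Got: %d" % nr_defined
        )
    # `setter` is optional and does not affect the count.
    return True


# Supplying a default satisfies the rule.
check_extension_args(default=False)
```

Calling `check_extension_args()` with no arguments raises a `ValueError` ending in `Got: 0`, which is the situation the commit fixes by adding a default.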
Matthew Honnibal 2c4a6d66fa Merge master into develop. Big merge, many conflicts -- need to review 2018-04-29 14:49:26 +02:00
Ines Montani 49cee4af92 💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)
* Integrate Python kernel via Binder

* Add live model test for languages with examples

* Update docs and code examples

* Adjust margin (if not bootstrapped)

* Add binder version to global config

* Update terminal and executable code mixins

* Pass attributes through infobox and section

* Hide v-cloak

* Fix example

* Take out model comparison for now

* Add meta text for compat

* Remove chart.js dependency

* Tidy up and simplify JS and port big components over to Vue

* Remove chartjs example

* Add Twitter icon

* Add purple stylesheet option

* Add utility for hand cursor (special cases only)

* Add transition classes

* Add small option for section

* Add thumb object for small round thumbnail images

* Allow unset code block language via "none" value

(workaround to still allow unset language to default to DEFAULT_SYNTAX)

* Pass through attributes

* Add syntax highlighting definitions for Julia, R and Docker

* Add website icon

* Remove user survey from navigation

* Don't hide GitHub icon on small screens

* Make top navigation scrollable on small screens

* Remove old resources page and references to it

* Add Universe

* Add helper functions for better page URL and title

* Update site description

* Increment versions

* Update preview images

* Update mentions of resources

* Fix image

* Fix social images

* Fix problem with cover sizing and floats

* Add divider and move badges into heading

* Add docstrings

* Reference converting section

* Add section on converting word vectors

* Move converting section to custom section and fix formatting

* Remove old fastText example

* Move extensions content to own section

Keep weird ID to not break permalinks for now (we don't want to rewrite URLs if not absolutely necessary)

* Use better component example and add factories section

* Add note on larger model

* Use better example for non-vector

* Remove similarity in context section

Only works via small models with tensors so has always been kind of confusing

* Add note on init-model command

* Fix lightning tour examples and make executable if possible

* Add spacy train CLI section to train

* Fix formatting and add video

* Fix formatting

* Fix textcat example description (resolves #2246)

* Add dummy file to try resolve conflict

* Delete dummy file

* Tidy up [ci skip]

* Ensure sufficient height of loading container

* Add loading animation to universe

* Update Thebelab build and use better startup message

* Fix asset versioning

* Fix typo [ci skip]

* Add note on project idea label
2018-04-29 02:06:46 +02:00
Matthew Honnibal cca7e7ad11 Merge branch 'master' of https://github.com/explosion/spaCy 2018-03-29 20:27:06 +02:00
Matthew Honnibal 68ad366935 Improve train_new_entity_type example 2018-03-29 20:26:41 +02:00
ines 07b8c255a5 Update example with note to install requests 2018-03-28 12:46:27 +02:00
Matthew Honnibal 1f7229f40f Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
This reverts commit c9ba3d3c2d, reversing
changes made to 92c26a35d4.
2018-03-27 19:23:02 +02:00
Justin DuJardin 4eeb178856 Add example using TensorBoard standalone projector
- the tensorboard standalone project expects a different set of files than the plugin to TensorFlow.
2018-03-25 21:50:13 -07:00
ines 4ec2809eb5 Port over TensorBoard example 2018-03-24 17:15:48 +01:00
Matthew Honnibal 00557c5fdd Add example of NER multitask objective 2018-01-21 19:46:37 +01:00
avinash b379c9d7d3 typos corrected 2018-01-03 16:54:22 +05:30
mpuels 1e8147aec7 fix: Add missing period in train data 2017-12-13 10:51:05 +01:00
mpuels ee4d6fdd40 Fix typo in comment 2017-12-09 13:14:57 +01:00
ines 726fb2d0b5 Use fewer iterations by default to avoid overfitting on blank model (resolves #1632) 2017-11-23 15:27:12 +01:00
ines ec08996000 Add note on tags matching tokenization (see #1613) 2017-11-20 15:12:47 +01:00
ines 1a38575de3 Make example Python 2 compatible (see #1617) 2017-11-20 13:57:51 +01:00
ines 7d5afadf5e Update vectors_loc description 2017-11-17 14:57:11 +01:00
ines c57e05bec1 Make sure nr_dim is an int
In some languages (e.g. Dutch), the nr_dim is extracted as a byte string, causing an error down the line.
2017-11-17 14:56:27 +01:00
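Note on c57e05bec1: a minimal sketch of the kind of conversion that commit describes, assuming the vectors file starts with a header line of the form `nr_row nr_dim` and is read in binary mode. The function name `parse_header` is hypothetical. The example script itself may do this differently.

```python
def parse_header(header_line):
    """Return (nr_row, nr_dim) as ints from a 'nr_row nr_dim' header line."""
    # A file opened in binary mode yields bytes, so decode before splitting.
    if isinstance(header_line, bytes):
        header_line = header_line.decode("utf8")
    nr_row, nr_dim = header_line.split()
    # Convert explicitly so later code (array shapes, split counts) gets ints.
    return int(nr_row), int(nr_dim)


nr_row, nr_dim = parse_header(b"2000000 300\n")
```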
yogendrasoni 334ed433b2 rstrip line before rsplit
loading english fast text giving error because line contains new line at the end and rsplit is splitting it incorrectly
2017-11-15 13:55:08 +05:30
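Note on 334ed433b2: the effect of stripping before splitting can be shown with a small sketch. It assumes each line has the form `word v1 ... vN` and may end with a trailing space and newline. The function name `parse_vector_line` and the sample line are made up for illustration.

```python
def parse_vector_line(line, nr_dim):
    """Split 'word v1 ... vN' into (word, [floats]); nr_dim is N."""
    # Strip trailing whitespace first so rsplit only counts real separators.
    pieces = line.rstrip().rsplit(" ", nr_dim)
    word = pieces[0]
    vector = [float(value) for value in pieces[1:]]
    return word, vector


raw = "the 0.1 0.2 0.3 \n"
# Without rstrip, the trailing " \n" uses up one of the nr_dim splits,
# so the first vector value stays glued to the word.
unstripped = raw.rsplit(" ", 3)
word, vector = parse_vector_line(raw, 3)
```

Here `unstripped[0]` is `"the 0.1"` while `word` is `"the"`. Splitting from the right with a fixed count also keeps words that themselves contain spaces in one piece.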
Matthew Honnibal f0e28e8ae5 Make fasttext reader accommodate whitespace 2017-11-12 12:07:13 +01:00
ines f36fab39b0 Don't rename component in intent parser example (resolves #1551)
Otherwise, the default saved model won't know that it's supposed to create spaCy's 'parser'.
2017-11-10 23:35:38 +01:00
Ines Montani 1a23a0f87e Remove broken link (resolves #1541) 2017-11-10 12:28:39 +01:00
ines 3597a29c24 Update fastText vectors example (see #1525)
Add option to specify language, and add note on "lang" being required to save out model
2017-11-09 14:54:39 +01:00
ines 33b84f4c39 Change clear_vectors to reset_vectors (resolves #1516) 2017-11-08 18:11:23 +01:00
ines 89bd40b821 Fix print statement in textcat training example (resolves #1515) 2017-11-08 17:17:40 +01:00
ines a09c096d3c Get docs ready for v2.0.0 2017-11-07 12:00:43 +01:00
ines 173b1551af Update examples 2017-11-07 01:22:30 +01:00
ines 1b1c9105b4 Update example compatibility statements 2017-11-07 01:11:45 +01:00
ines 8fb48b9b91 Update and document new util functions 2017-11-07 00:22:43 +01:00
Matthew Honnibal d7016d4050 Update intent parser example 2017-11-06 23:31:11 +01:00
ines fe498b3d5e Update training examples to use "simple style" 2017-11-06 23:14:04 +01:00
ines c646365e2f Port over changes and add note on compat (see #1445) 2017-11-06 13:58:34 +01:00
ines 2dca9e71a1 Add notes on catastrophic forgetting (see #1496) 2017-11-06 13:17:02 +01:00
Matthew Honnibal 717e8124fb Update Keras sentiment analysis example 2017-11-05 17:11:00 +01:00
Matthew Honnibal cfb83c231c Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-11-04 23:08:19 +01:00
Matthew Honnibal ba0201de07 Update multiprocessing example 2017-11-04 23:07:57 +01:00
ines 70a9504560 Add inbetween print statement 2017-11-04 23:06:55 +01:00
Matthew Honnibal e033162a1d Update tagger training example 2017-11-01 21:49:08 +01:00
ines 8f1d3fc3ee Update textcat example 2017-11-01 17:09:22 +01:00
Matthew Honnibal dad8f09fba Fix print statements in text classifier example 2017-11-01 16:34:31 +01:00
ines bfe17b7df1 Fix begin_training if get_gold_tuples is None 2017-11-01 13:14:31 +01:00
ines 0ca152a015 Fix syntax error 2017-11-01 00:43:28 +01:00
ines 4b196fdf7f Fix formatting 2017-11-01 00:43:22 +01:00
ines 33af6ac69a Use even smaller example size
100 was still too much, so try 20 instead
2017-10-30 19:46:45 +01:00
ines f02b0af821 Fix path and use smaller example size
500 was too large and caused laggy rendering
2017-10-30 19:44:35 +01:00
ines 18dde7869a Update training data docs and add vocab JSONL 2017-10-30 19:40:05 +01:00
ines b5643d8575 Update intent parser docs and add to usage docs 2017-10-27 04:49:05 +02:00
ines 9dfca0f2f8 Add example for custom intent parser 2017-10-27 03:55:11 +02:00
ines 4d272e25ee Fix examples 2017-10-27 03:55:04 +02:00
ines 44f83b35bc Update pipeline component examples to use plac 2017-10-27 02:58:14 +02:00
ines af28ca1ba0 Move example to pipeline directory 2017-10-27 02:00:01 +02:00
ines 1d69a46cd4 Update multi-processing example and add to docs 2017-10-27 01:58:55 +02:00
ines 4eabaafd66 Update docstring and example 2017-10-27 01:50:44 +02:00
ines ed69bd69f4 Update parallel tagging example 2017-10-27 01:48:52 +02:00
ines 096a80170d Remove old example files 2017-10-27 01:48:39 +02:00
ines a7b9074b4c Update textcat training example and docs 2017-10-27 00:48:45 +02:00
ines b61866a2e4 Update textcat example 2017-10-27 00:32:19 +02:00
ines f81cc0bd1c Fix usage of disable_pipes 2017-10-27 00:31:30 +02:00
ines b7b285971f Update examples README 2017-10-26 18:47:11 +02:00
ines cc2917c9e8 Update fastText example and add to examples in docs 2017-10-26 18:47:02 +02:00
ines db843735d3 Remove outdated examples 2017-10-26 18:46:25 +02:00
ines daed7ff8fe Update information extraction examples 2017-10-26 18:46:11 +02:00
ines bca5372fb1 Clean up examples 2017-10-26 17:32:59 +02:00
ines f57043e6fe Update docstring 2017-10-26 16:29:08 +02:00
ines b90e958975 Update tagger and parser examples and add to docs 2017-10-26 16:27:42 +02:00
ines f1529463a8 Update tagger training example 2017-10-26 16:19:02 +02:00
ines e44bbb5361 Remove old example 2017-10-26 16:12:41 +02:00
ines 421c3837e8 Fix formatting 2017-10-26 16:11:25 +02:00
ines 4d896171ae Use plac annotations for arguments 2017-10-26 16:11:20 +02:00
ines c3b681e5fb Use plac annotations for arguments and add n_iter 2017-10-26 16:11:05 +02:00
ines bc2c92f22d Use plac annotations for arguments 2017-10-26 16:10:56 +02:00
ines b5c74dbb34 Update parser training example 2017-10-26 15:15:37 +02:00
ines 586b9047fd Use create_pipe instead of importing the entity recognizer 2017-10-26 15:15:26 +02:00
ines d425ede7e9 Fix example 2017-10-26 15:15:08 +02:00
ines 9d58673aaf Update train_ner example for spaCy v2.0 2017-10-26 14:24:12 +02:00
ines e904075f35 Remove stray print statements 2017-10-26 14:24:00 +02:00
ines c30258c3a2 Remove old example 2017-10-26 14:23:52 +02:00
ines 615c315d70 Update train_new_entity_type example to use disable_pipes 2017-10-25 14:56:53 +02:00
ines 2b8e7c45e0 Use better training data JSON example 2017-10-24 16:00:56 +02:00
ines 9bf5751064 Pretty-print JSON 2017-10-24 12:22:17 +02:00
ines 6675755005 Add training data JSON example 2017-10-24 12:05:10 +02:00
Jeroen Bobbeldijk 84c6c20d1c Fix #1444: fix pipeline logic and wrong parameter in update call 2017-10-22 15:18:36 +02:00
Jeffrey Gerard 5ba970b495 minor cleanup 2017-10-12 12:34:46 -07:00
Jeffrey Gerard 39d3cbfdba Bugfix example script train_ner_standalone.py, fails after training 2017-10-12 11:39:12 -07:00
ines f4ae6763b9 Fix consistency of imports from spacy.tokens in examples 2017-10-11 02:30:40 +02:00
Matthew Honnibal e0a9b02b67 Merge Span._ and Span.as_doc methods 2017-10-09 22:00:15 -05:00
ines 6679117000 Add pipeline component examples 2017-10-10 04:26:06 +02:00
Matthew Honnibal e79fc41ff8 Merge pull request #1391 from explosion/feature/multilabel-textcat
💫 Fix multi-label support for text classification
2017-10-09 04:22:31 +02:00
Matthew Honnibal 563f46f026 Fix multi-label support for text classification
The TextCategorizer class is supposed to support multi-label
text classification, and allow training data to contain missing
values.

For this to work, the gradient of the loss should be 0 when labels
are missing. Instead, there was no way to actually denote "missing"
in the GoldParse class, and so the TextCategorizer class treated
the label set within gold.cats as complete.

To fix this, we change GoldParse.cats to be a dict instead of a list.
The GoldParse.cats dict should map to floats, with 1. denoting
'present' and 0. denoting 'absent'. Gradients are zeroed for categories
absent from the gold.cats dict. A nice bonus is that you can also set
values between 0 and 1 for partial membership. You can also set numeric
values, if you're using a text classification model that uses an
appropriate loss function.

Unfortunately this is a breaking change; although the functionality
was only recently introduced and hasn't been properly documented
yet. I've updated the example script accordingly.
2017-10-05 18:43:02 -05:00
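Note on 563f46f026: the masking idea in that message can be sketched without spaCy. The sketch assumes a squared-error-style loss, so the gradient for a labelled category is `score - target`. The function name `textcat_gradient` and the label names are hypothetical, and the actual TextCategorizer code may differ.

```python
def textcat_gradient(scores, cats):
    """Per-label gradient of the loss with respect to the predicted score.

    scores: dict mapping label -> predicted score
    cats:   dict mapping label -> target float (1.0 present, 0.0 absent);
            labels not in this dict are treated as missing.
    """
    gradient = {}
    for label, score in scores.items():
        if label in cats:
            # Labelled category: gradient of 0.5 * (score - target) ** 2.
            gradient[label] = score - cats[label]
        else:
            # Missing annotation: contribute no gradient.
            gradient[label] = 0.0
    return gradient


scores = {"SPORTS": 0.8, "POLITICS": 0.3, "TECH": 0.6}
cats = {"SPORTS": 1.0, "POLITICS": 0.0}  # "TECH" is left unannotated
grads = textcat_gradient(scores, cats)
```

With these inputs, `grads["TECH"]` is `0.0` because the label is missing from `cats`, while `SPORTS` and `POLITICS` receive non-zero gradients. A target such as `0.5` would express partial membership in the same way.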
Matthew Honnibal 056b08c0df Delete obsolete nn_text_class example 2017-10-05 18:27:10 +02:00
Matthew Honnibal f1b86dff8c Update textcat example 2017-10-04 15:12:28 +02:00
Matthew Honnibal 79a94bc166 Update textcat example 2017-10-04 14:55:30 +02:00