Commit Graph

276 Commits

Author SHA1 Message Date
ines 4b196fdf7f Fix formatting 2017-11-01 00:43:22 +01:00
ines 33af6ac69a Use even smaller example size
100 was still too much, so try 20 instead
2017-10-30 19:46:45 +01:00
ines f02b0af821 Fix path and use smaller example size
500 was too large and caused laggy rendering
2017-10-30 19:44:35 +01:00
ines 18dde7869a Update training data docs and add vocab JSONL 2017-10-30 19:40:05 +01:00
ines b5643d8575 Update intent parser docs and add to usage docs 2017-10-27 04:49:05 +02:00
ines 9dfca0f2f8 Add example for custom intent parser 2017-10-27 03:55:11 +02:00
ines 4d272e25ee Fix examples 2017-10-27 03:55:04 +02:00
ines 44f83b35bc Update pipeline component examples to use plac 2017-10-27 02:58:14 +02:00
ines af28ca1ba0 Move example to pipeline directory 2017-10-27 02:00:01 +02:00
ines 1d69a46cd4 Update multi-processing example and add to docs 2017-10-27 01:58:55 +02:00
ines 4eabaafd66 Update docstring and example 2017-10-27 01:50:44 +02:00
ines ed69bd69f4 Update parallel tagging example 2017-10-27 01:48:52 +02:00
ines 096a80170d Remove old example files 2017-10-27 01:48:39 +02:00
ines a7b9074b4c Update textcat training example and docs 2017-10-27 00:48:45 +02:00
ines b61866a2e4 Update textcat example 2017-10-27 00:32:19 +02:00
ines f81cc0bd1c Fix usage of disable_pipes 2017-10-27 00:31:30 +02:00
ines b7b285971f Update examples README 2017-10-26 18:47:11 +02:00
ines cc2917c9e8 Update fastText example and add to examples in docs 2017-10-26 18:47:02 +02:00
ines db843735d3 Remove outdated examples 2017-10-26 18:46:25 +02:00
ines daed7ff8fe Update information extraction examples 2017-10-26 18:46:11 +02:00
ines bca5372fb1 Clean up examples 2017-10-26 17:32:59 +02:00
ines f57043e6fe Update docstring 2017-10-26 16:29:08 +02:00
ines b90e958975 Update tagger and parser examples and add to docs 2017-10-26 16:27:42 +02:00
ines f1529463a8 Update tagger training example 2017-10-26 16:19:02 +02:00
ines e44bbb5361 Remove old example 2017-10-26 16:12:41 +02:00
ines 421c3837e8 Fix formatting 2017-10-26 16:11:25 +02:00
ines 4d896171ae Use plac annotations for arguments 2017-10-26 16:11:20 +02:00
ines c3b681e5fb Use plac annotations for arguments and add n_iter 2017-10-26 16:11:05 +02:00
ines bc2c92f22d Use plac annotations for arguments 2017-10-26 16:10:56 +02:00
ines b5c74dbb34 Update parser training example 2017-10-26 15:15:37 +02:00
ines 586b9047fd Use create_pipe instead of importing the entity recognizer 2017-10-26 15:15:26 +02:00
ines d425ede7e9 Fix example 2017-10-26 15:15:08 +02:00
ines 9d58673aaf Update train_ner example for spaCy v2.0 2017-10-26 14:24:12 +02:00
ines e904075f35 Remove stray print statements 2017-10-26 14:24:00 +02:00
ines c30258c3a2 Remove old example 2017-10-26 14:23:52 +02:00
ines 615c315d70 Update train_new_entity_type example to use disable_pipes 2017-10-25 14:56:53 +02:00
ines 2b8e7c45e0 Use better training data JSON example 2017-10-24 16:00:56 +02:00
ines 9bf5751064 Pretty-print JSON 2017-10-24 12:22:17 +02:00
ines 6675755005 Add training data JSON example 2017-10-24 12:05:10 +02:00
Jeroen Bobbeldijk 84c6c20d1c Fix #1444: fix pipeline logic and wrong parameter in update call 2017-10-22 15:18:36 +02:00
Jeffrey Gerard 5ba970b495 minor cleanup 2017-10-12 12:34:46 -07:00
Jeffrey Gerard 39d3cbfdba Bugfix example script train_ner_standalone.py, fails after training 2017-10-12 11:39:12 -07:00
ines f4ae6763b9 Fix consistency of imports from spacy.tokens in examples 2017-10-11 02:30:40 +02:00
Matthew Honnibal e0a9b02b67 Merge Span._ and Span.as_doc methods 2017-10-09 22:00:15 -05:00
ines 6679117000 Add pipeline component examples 2017-10-10 04:26:06 +02:00
Matthew Honnibal e79fc41ff8 Merge pull request #1391 from explosion/feature/multilabel-textcat
💫 Fix multi-label support for text classification
2017-10-09 04:22:31 +02:00
Matthew Honnibal 563f46f026 Fix multi-label support for text classification
The TextCategorizer class is supposed to support multi-label
text classification, and allow training data to contain missing
values.

For this to work, the gradient of the loss should be 0 when labels
are missing. Instead, there was no way to actually denote "missing"
in the GoldParse class, and so the TextCategorizer class treated
the label set within gold.cats as complete.

To fix this, we change GoldParse.cats to be a dict instead of a list.
The GoldParse.cats dict should map to floats, with 1. denoting
'present' and 0. denoting 'absent'. Gradients are zeroed for categories
missing from the gold.cats dict. A nice bonus is that you can also set
values between 0 and 1 for partial membership. You can also set numeric
values, if you're using a text classification model that uses an
appropriate loss function.

Unfortunately this is a breaking change, although the functionality
was only recently introduced and hasn't been properly documented
yet. I've updated the example script accordingly.
2017-10-05 18:43:02 -05:00
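
The masking rule described in the message above can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `masked_textcat_gradient`, the label names, and the squared-error gradient (`score - target`) are hypothetical and are not taken from spaCy's TextCategorizer. Only the idea of a dict mapping labels to floats, with zero gradient for labels missing from the dict, comes from the message.

```python
def masked_textcat_gradient(scores, gold_cats, labels):
    """Per-label gradient of a squared-error loss (hypothetical helper).

    Labels that do not appear in gold_cats are treated as missing,
    so their gradient is zeroed and they do not affect the update.
    """
    gradient = []
    for label, score in zip(labels, scores):
        if label in gold_cats:
            # Target is a float: 1.0 = present, 0.0 = absent,
            # values in between = partial membership.
            gradient.append(score - float(gold_cats[label]))
        else:
            # Missing annotation: no learning signal for this label.
            gradient.append(0.0)
    return gradient


labels = ["POSITIVE", "NEGATIVE", "SPAM"]   # illustrative label set
scores = [0.9, 0.2, 0.7]                    # illustrative model outputs
gold_cats = {"POSITIVE": 1.0, "NEGATIVE": 0.0}  # "SPAM" is unannotated
grad = masked_textcat_gradient(scores, gold_cats, labels)
```

With a list of labels there is no way to tell "absent" from "not annotated"; the dict makes that distinction explicit, which is why the unannotated `SPAM` entry gets a zero gradient here.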
Matthew Honnibal 056b08c0df Delete obsolete nn_text_class example 2017-10-05 18:27:10 +02:00
Matthew Honnibal f1b86dff8c Update textcat example 2017-10-04 15:12:28 +02:00
Matthew Honnibal 79a94bc166 Update textcat example 2017-10-04 14:55:30 +02:00
Matthew Honnibal cbb1fbef80 Update train_ner_standalone example 2017-10-03 18:49:38 +02:00
Matthew Honnibal 38286b6f07 Add example loading Fast Text vectors 2017-10-01 23:40:02 +02:00
Matthew Honnibal f92ab03dc8 Rename phrase matcher example 2017-09-20 22:51:58 +02:00
Matthew Honnibal 01858e9b59 Fix PhraseMatcher example 2017-09-20 22:51:41 +02:00
Matthew Honnibal 027a5d8b75 Update train_ner_standalone example 2017-09-15 10:36:46 +02:00
Matthew Honnibal 683d81bb49 Update example for adding entity type 2017-09-14 16:15:59 +02:00
Matthew Honnibal c16ef0a85c Clarify train textcat example 2017-07-29 21:59:27 +02:00
Matthew Honnibal 54a539a113 Finish text classifier example 2017-07-23 00:34:12 +02:00
Matthew Honnibal 2bc7d87c70 Add example for training text classifier 2017-07-22 20:15:32 +02:00
ines 992559bf9a Fix formatting and remove unused imports 2017-06-01 12:47:18 +02:00
Matthew Honnibal 5c30466c95 Update NER training example 2017-05-31 13:42:12 +02:00
akYoung c158cdb1da Corrections for model test example
The sentences of test data in sentence entailment example should be generated with integers limited to vocab_size.
2017-05-03 22:41:23 +08:00
Matthew Honnibal 2da16adcc2 Add dropout option for parser and NER
Dropout can now be specified in the `Parser.update()` method via
the `drop` keyword argument, e.g.

    nlp.entity.update(doc, gold, drop=0.4)

This will randomly drop 40% of features, and multiply the value of the
others by 1. / 0.4. This may be useful for generalising from small data
sets.

This commit also patches the examples/training/train_new_entity_type.py
example, to use dropout and fix the output (previously it did not output
the learned entity).
2017-04-27 13:18:39 +02:00
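
As a rough illustration of feature dropout of the kind the message above describes, here is a minimal pure-Python sketch. It is not spaCy's or thinc's implementation: the function name `inverted_dropout` is hypothetical, and the rescaling factor `1 / (1 - drop)` is the textbook "inverted dropout" convention, assumed here rather than confirmed against the code in that commit.

```python
import random


def inverted_dropout(features, drop, rng):
    """Zero each feature with probability `drop` and rescale the rest.

    Hypothetical helper: kept values are divided by (1 - drop) so the
    expected value of each feature is unchanged (assumed convention).
    """
    if not 0.0 <= drop < 1.0:
        raise ValueError("drop must be in [0, 1)")
    keep = 1.0 - drop
    return [x / keep if rng.random() >= drop else 0.0 for x in features]


rng = random.Random(0)                 # seeded for repeatability
features = [1.0, 2.0, 3.0, 4.0]        # illustrative feature values
dropped = inverted_dropout(features, drop=0.4, rng=rng)
unchanged = inverted_dropout(features, drop=0.0, rng=rng)
```

Each element of `dropped` is either `0.0` or the original value divided by `0.6`, and `drop=0.0` leaves the features untouched.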
Matthew Honnibal 0605b95f2e Merge branch 'master' of https://github.com/explosion/spaCy 2017-04-18 13:48:00 +02:00
Matthew Honnibal 2f84626417 Fix train_new_entity_type example 2017-04-18 13:47:36 +02:00
Ines Montani e7ae3b7cc2 Fix formatting and typo (closes #967) 2017-04-16 23:56:12 +02:00
Ines Montani 734b0a4e4a Update train_new_entity_type.py 2017-04-16 23:42:16 +02:00
ines 264af6cd17 Add documentation 2017-04-16 20:37:46 +02:00
ines c7adca58a9 Tidy up example and only save/test if output_directory is not None 2017-04-16 16:55:01 +02:00
Matthew Honnibal 40e3024241 Move standalone NER training script into examples directory 2017-04-15 16:13:42 +02:00
Matthew Honnibal b9c26aae11 Remove neptune refs from new train example 2017-04-15 16:13:17 +02:00
Matthew Honnibal c729d72fc6 Add new example for training new entity types 2017-04-15 16:11:06 +02:00
Matthew Honnibal a7626bd7fd Tmp commit to example 2017-04-15 15:43:14 +02:00
Matthew Honnibal 97b83c74dc WIP on training example 2017-04-14 23:54:27 +02:00
Kumaran Rajendhiran 3f55d6afae Update README 2017-04-05 16:59:52 +05:30
Kumaran Rajendhiran 47d7137c83 Set max_length to 100 for demo and evaluate 2017-04-05 16:48:35 +05:30
Kumaran Rajendhiran 10e8dcdfdb Remove not needed parameters from function 2017-04-05 16:20:47 +05:30
Matthew Honnibal 07726cf0a6 Add example of standalone NER training 2017-03-19 15:01:38 +01:00
Matthew Honnibal f028f8ad28 Remove unfinished examples 2017-02-18 11:04:41 +01:00
Matthew Honnibal c031c677cc Remove unused model_dir option
As noted in #845, the `model_dir` argument was not being used. I've removed it for now, although it would be good to have this option restored and working.
2017-02-18 10:38:22 +01:00
Matthew Honnibal 16ce7409e4 Merge branch 'master' of https://github.com/explosion/spaCy 2017-01-31 13:27:34 -06:00
Matthew Honnibal 80aa4e114b Fix x keras deep learning example 2017-01-31 13:27:13 -06:00
Matthew Honnibal ab70f6e18d Update NER training example 2017-01-27 12:27:10 +01:00
Ines Montani 853130bcf8 Update installation instructions (see #727) 2017-01-14 22:12:42 +01:00
Matthew Honnibal 5a319060b9 Merge branch 'master' of https://github.com/explosion/spaCy 2016-12-20 16:26:57 -06:00
Matthew Honnibal 7793e2ad82 Fix use of dropout in sentiment analysis LSTM example 2016-12-20 16:26:38 -06:00
Christos Savvopoulos c19b83f6ae use model_dir inside of load_model 2016-12-12 20:23:24 +00:00
Christos Savvopoulos 93cf4af701 actually commit load_ner.py 2016-12-12 20:13:33 +00:00
Christos Savvopoulos ad54a929f8 train_ner should save vocab; add load_ner example 2016-12-12 20:09:49 +00:00
Matthew Honnibal d0c999e0ad Add config.py for paddle example 2016-11-20 23:24:51 +01:00
Matthew Honnibal d75fe7c19a Update paddle example 2016-11-20 21:45:08 +01:00
Matthew Honnibal 1ef541ddff Add train.sh for paddle 2016-11-20 21:44:33 +01:00
Matthew Honnibal 001abe2b9d Update config.py 2016-11-20 03:45:51 +01:00
Matthew Honnibal 409a18bd42 Add paddle sentiment example 2016-11-20 03:35:23 +01:00
Matthew Honnibal e7eac08819 Work on paddle example 2016-11-20 03:29:36 +01:00
Matthew Honnibal 1ed40682a3 Set vectors in chainer example 2016-11-19 18:42:58 -06:00
Matthew Honnibal b701a08249 Fix embedding in chainer sentiment example 2016-11-19 19:05:37 +01:00
Matthew Honnibal 8a2de46fcb Fix GPU usage in chainer example 2016-11-19 10:58:00 -06:00
Matthew Honnibal 4c84aae571 Merge branch 'master' of https://github.com/explosion/spaCy 2016-11-19 02:41:17 -06:00
Matthew Honnibal 3195c52741 Add WIP Chainer sentiment analysis code. 2016-11-19 09:27:59 +01:00
Matthew Honnibal ff5ab75f5e Add partial embedding updates to Parikh model, fix dropout, other corrections. 2016-11-18 06:32:12 -06:00
Matthew Honnibal 718e66a7b9 Minibatch the forward pass. The output argmax is incorrect... 2016-11-16 06:15:28 -06:00
Matthew Honnibal 8f053fd943 Add flag to toggle GPU to DyNet code 2016-11-16 05:51:00 -06:00
Matthew Honnibal 3a31c3a961 Merge branch 'master' of https://github.com/explosion/spaCy 2016-11-16 05:49:42 -06:00
Kyle P. Johnson d105771a07 Add setup directions for data dir
This script's data needs are not intuitive. I have added a note explaining that (a) it expects pos/neg polarity data, (b) the structure of the data dir (train/test), and (c) a standard resource for such polarity data.
2016-11-13 10:08:16 -08:00
Kyle P. Johnson c8d3694e2d Change lex.repvec to lex.vector
For preventing the AttributeError: `File "spacy/lexeme.pyx", line 159, in spacy.lexeme.Lexeme.repvec.__get__ (spacy/lexeme.cpp:5016)
AttributeError: lex.repvec has been renamed to lex.vector`
2016-11-13 09:54:42 -08:00
Matthew Honnibal 389e8b700e Fix conflict 2016-11-13 08:52:20 -06:00
Matthew Honnibal 12a7b05360 Merge branch 'master' of https://github.com/explosion/spaCy 2016-11-13 08:49:07 -06:00
Matthew Honnibal ef76c28d70 Update dynet example to use minibatching 2016-11-13 08:48:43 -06:00
Matthew Honnibal fb8acc1dfb Merge pull request #628 from chenb67/master
Remove theano dependency from parikh model + small bug fix
2016-11-14 01:28:22 +11:00
Chen Buskilla 738f38e8d6 remove theano dependency, using keras backend functions 2016-11-13 15:06:01 +02:00
Chen Buskilla a592075720 fix parikh entailment test methods bug with settings 2016-11-13 14:53:55 +02:00
Matthew Honnibal ae681aa555 Work on DyNet example 2016-11-13 13:45:21 +01:00
Matthew Honnibal 89df91846c Fix entailment example, and add a flag for BiRNN encoding. 2016-11-12 11:43:37 -06:00
Paul Spiegelhalter edf77a9dae added import of build_model 2016-11-11 15:13:12 -08:00
Paul Spiegelhalter 0d7031a8f1 syntax error on two functions 2016-11-11 15:12:03 -08:00
Matthew Honnibal ca996fc01a Add BiRNN for entailment
Hastily add bidirectional RNN to entailment example
2016-11-12 01:15:01 +11:00
Matthew Honnibal 1ef62f39ef Update README.md 2016-11-01 13:30:10 +11:00
Matthew Honnibal 967412fb85 Minor edit 2016-11-01 13:22:36 +11:00
Ines Montani 589fc73910 Update README.md 2016-11-01 03:19:15 +01:00
Matthew Honnibal 18aab4f71e Merge branch 'master' of ssh://github.com/explosion/spaCy 2016-11-01 03:05:49 +01:00
Matthew Honnibal 6cf989ad26 Make the README more concise 2016-11-01 13:05:17 +11:00
Matthew Honnibal 45ebab4677 Rename inventory count example 2016-11-01 02:30:22 +01:00
Ines Montani 274cc0f08f Update README.md 2016-11-01 02:13:54 +01:00
Matthew Honnibal 0b7af54219 Rename entailment example 2016-11-01 01:52:11 +01:00
Matthew Honnibal 1b9c6240a7 Rename entailment example 2016-11-01 01:51:54 +01:00
Matthew Honnibal 58f7be93ee Draft readme for NLI example 2016-11-01 01:46:55 +01:00
Ines Montani 6b30475725 Add README.md to examples 2016-11-01 01:14:04 +01:00
Matthew Honnibal de32b6e5b8 Add code for Keras NLI example 2016-10-31 23:54:28 +01:00
kendricktan ba8841234a Fixed training examples
Changes:
1. train_ner won't crash if no data directory is found
2. Fixed train_tagger expected spacy.gold.GoldParse, got list
2016-10-24 16:09:23 +10:00
kendricktan 9877f3298f updated training examples to v1.1.2 2016-10-24 11:53:33 +10:00
Matthew Honnibal 105aaadc07 Make deep_learning_keras example use sentences 2016-10-23 23:17:41 +02:00
Matthew Honnibal 1ae3bde58f Fix deep learning example code 2016-10-20 21:32:26 +02:00
kendricktan f77b3dc677 Fixed train_parser examples when model_dir isn't None 2016-10-20 23:40:51 +10:00
kendricktan d817d57219 Fixed train_ner examples when model_dir isn't None 2016-10-20 21:09:07 +10:00
Matthew Honnibal 213027a1a1 Fix deep learning example 2016-10-20 04:39:54 +02:00
Matthew Honnibal 5378949326 Fix example 2016-10-20 03:42:34 +02:00
Matthew Honnibal d17546681c Fix deep learning tutorial 2016-10-20 03:21:56 +02:00
Matthew Honnibal 4c27958990 Fix bugs in deep_learning_keras example. 2016-10-20 02:49:14 +02:00
Matthew Honnibal ca89fd0919 Update Keras deep learning tutorial 2016-10-19 19:37:09 +02:00
Matthew Honnibal f60cefc048 Add first draft of spaCy+keras integration example. 2016-10-19 14:43:13 +02:00
Matthew Honnibal c36e8676aa Move old examples 2016-10-16 21:56:32 +02:00
Matthew Honnibal 3fba897e0f Update train_parser example 2016-10-16 21:41:14 +02:00
Matthew Honnibal f787cd29fe Refactor the pipeline classes to make them more consistent, and remove the redundant blank() constructor. 2016-10-16 21:34:57 +02:00
Matthew Honnibal 4e9727b474 Use new words keyword argument in Doc. 2016-10-16 18:16:25 +02:00
Matthew Honnibal 2508117553 Make train_parser example a bit simpler. 2016-10-16 17:58:37 +02:00
Matthew Honnibal 4574fe87c6 Add example for training parser 2016-10-16 17:05:55 +02:00
Matthew Honnibal 01b42c531f Update train_tagger script 2016-10-16 16:10:23 +02:00
Matthew Honnibal e5151056cf Fix NER training example 2016-10-16 11:41:20 +02:00
Henning Peters 470cdf5bf9 remove deprecated LOCAL_DATA_DIR 2016-04-05 11:25:54 +02:00