spaCy/examples
Matthew Honnibal 6c783f8045 Bug fixes and options for TextCategorizer (#3472)
* Fix code for bag-of-words feature extraction

The _ml.py module had a redundant copy of a function to extract unigram
bag-of-words features, except one had a bug that set values to 0.
Another function allowed extraction of bigram features. Replace all three
with a new function that supports arbitrary ngram sizes and also allows
control of which attribute is used (e.g. ORTH, LOWER, etc).

* Support 'bow' architecture for TextCategorizer

This allows efficient ngram bag-of-words models, which are better when
the classifier needs to run quickly, especially when the texts are long.
Pass architecture="bow" to use it. The extra arguments ngram_size and
attr are also available, e.g. ngram_size=2 means unigram and bigram
features will be extracted.

* Fix size limits in train_textcat example

* Explain architectures better in docs
2019-03-23 16:44:44 +01:00
..
information_extraction Test and update examples [ci skip] 2019-03-16 14:15:49 +01:00
keras_parikh_entailment Merge branch 'master' into develop 2018-12-18 13:48:10 +01:00
notebooks 💫 Replace ujson, msgpack and dill/pickle/cloudpickle with srsly (#3003) 2018-12-03 01:28:22 +01:00
pipeline Test and update examples [ci skip] 2019-03-16 14:15:49 +01:00
training Bug fixes and options for TextCategorizer (#3472) 2019-03-23 16:44:44 +01:00
README.md Get docs ready for v2.0.0 2017-11-07 12:00:43 +01:00
deep_learning_keras.py Tidy up references to n_threads and fix default 2019-03-15 16:24:26 +01:00
vectors_fast_text.py Auto-format examples 2018-12-02 04:26:26 +01:00
vectors_tensorboard.py Auto-format examples 2018-12-02 04:26:26 +01:00

README.md

spaCy examples

The examples are Python scripts with well-behaved command line interfaces. For more detailed usage guides, see the documentation.

To see the available arguments, you can use the --help or -h flag:

$ python examples/training/train_ner.py --help

While we try to keep the examples up to date, they are not currently exercised by the test suite, as some of them require significant data downloads or take time to train. If you find that an example is no longer running, please tell us! We know there's nothing worse than trying to figure out what you're doing wrong, and it turns out your code was never the problem.