spaCy/spacy
Matthew Honnibal a338c6f8f6 Fix JSON segmentation bug that affected French
Fix a bug in the JSON streaming code that GoldCorpus uses. Escaped
slashes were being handled incorrectly. This bug caused low scores for
French in the early v2.1.0 alphas, because most of the data was not
being read in.

Fittingly, the document that triggered the bug was a Wikipedia article about
Perl. Parsing perl remains difficult!
2018-12-08 10:41:24 +01:00
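
The kind of bug described in the commit message above can be illustrated with a minimal sketch. This is not the code in gold.pyx. It is a hypothetical splitter, `split_json_objects`, written on the assumption that the stream is a sequence of top-level JSON objects and that the problem involved backslash escapes inside string values. If a segmenter treats every quote preceded by a backslash as escaped, a string ending in an escaped backslash never closes, and everything after it is swallowed. Tracking the escape state one character at a time avoids that.

```python
import json


def split_json_objects(text):
    """Yield each top-level JSON object in `text` as a parsed value.

    Hypothetical helper for illustration only; not spaCy's implementation.
    """
    depth = 0          # current nesting depth of curly braces
    in_string = False  # True while inside a JSON string literal
    escaped = False    # True when the previous character was an unconsumed backslash
    start = None       # index where the current top-level object began
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                # This character is consumed by the preceding backslash.
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                yield json.loads(text[start : i + 1])


# First object: a string ending in an escaped backslash.
# Second object: a string containing an unbalanced brace.
stream = '{"text": "C:\\\\"} {"text": "Perl {"}'
docs = list(split_json_objects(stream))
```

With the escape flag in place, both objects are recovered. A check that only looks at the single preceding character would misread the closing quote of the first string and lose the second object.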
cli Move dropout and batch sizes out of global scope in train cmd 2018-12-07 20:54:35 +01:00
data
displacy 💫 New JSON helpers, training data internals & CLI rewrite (#2932) 2018-11-30 20:16:14 +01:00
lang 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
syntax Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2018-12-07 00:12:22 +00:00
tests Fix pickle tests 2018-12-06 20:46:36 +01:00
tokens Fix removal of dill (for srsly) 2018-12-06 18:46:09 +01:00
__init__.pxd
__init__.py Tidy up and format remaining files 2018-11-30 17:43:08 +01:00
__main__.py 💫 New JSON helpers, training data internals & CLI rewrite (#2932) 2018-11-30 20:16:14 +01:00
_align.pyx Improve alignment around quotes 2018-08-16 01:04:34 +02:00
_ml.py Fix build error from bad import 2018-12-06 15:12:39 +01:00
about.py Set version back to 2.1.0a4 2018-12-03 02:03:26 +01:00
attrs.pxd Fix LANG symbol 2018-02-17 18:10:50 +01:00
attrs.pyx Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
compat.py 💫 Replace ujson, msgpack and dill/pickle/cloudpickle with srsly (#3003) 2018-12-03 01:28:22 +01:00
errors.py replace user-facing references to "sbd" with "sentencizer" (#2985) 2018-11-30 21:22:40 +01:00
glossary.py 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
gold.pxd
gold.pyx Fix JSON segmentation bug that affected French 2018-12-08 10:41:24 +01:00
language.py 💫 Replace ujson, msgpack and dill/pickle/cloudpickle with srsly (#3003) 2018-12-03 01:28:22 +01:00
lemmatizer.py 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
lexeme.pxd
lexeme.pyx 💫 Add .similarity warnings for no vectors and option to exclude warnings (#2197) 2018-05-21 01:22:38 +02:00
matcher.pyx 💫 Port master changes over to develop (#2979) 2018-11-29 16:30:29 +01:00
morphology.pxd Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
morphology.pyx Fix lemmatization 2018-07-05 13:56:02 +02:00
parts_of_speech.pxd
parts_of_speech.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
pipeline.pxd Fix names of pipeline components 2017-10-26 12:38:23 +02:00
pipeline.pyx Remove cytoolz usage from spaCy 2018-12-03 02:19:12 +01:00
scorer.py 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
strings.pxd Try to fix StringStore clean up (see #1506) 2017-11-11 03:11:27 +03:00
strings.pyx 💫 Replace ujson, msgpack and dill/pickle/cloudpickle with srsly (#3003) 2018-12-03 01:28:22 +01:00
structs.pxd
symbols.pxd Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
symbols.pyx Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
tokenizer.pxd
tokenizer.pyx 💫 Small efficiency fixes to tokenizer (#2587) 2018-07-24 23:35:54 +02:00
typedefs.pxd
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py Remove cytoolz usage from spaCy 2018-12-03 02:19:12 +01:00
vectors.pyx 💫 Replace ujson, msgpack and dill/pickle/cloudpickle with srsly (#3003) 2018-12-03 01:28:22 +01:00
vocab.pxd 💫 Small efficiency fixes to tokenizer (#2587) 2018-07-24 23:35:54 +02:00
vocab.pyx Fix dill usage in vocab 2018-12-06 18:53:16 +01:00