spaCy/extra/example_data/ner_example_data/ner-token-per-line-with-pos...


Updates/bugfixes for NER/IOB converters (#4186)

* Updates/bugfixes for NER/IOB converters
* Converter formats `ner` and `iob` use autodetect to choose a converter if possible
* `iob2json` is reverted to handle sentence-per-line data like `word1|pos1|ent1 word2|pos2|ent2`
* Fix bug in `merge_sentences()` so the second sentence in each batch isn't skipped
* `conll_ner2json` is made more general so it can handle more formats with whitespace-separated columns
* Supports all formats where the first column is the token and the final column is the IOB tag; if present, the second column is the POS tag
* As in CoNLL 2003 NER, blank lines separate sentences, `-DOCSTART- -X- O O` separates documents
* Add option for segmenting sentences (new flag `-s`)
* Parser-based sentence segmentation with a provided model, otherwise with sentencizer (new option `-b` to specify model)
* Can group sentences into documents with `n_sents` as long as sentence segmentation is available
* Only applies automatic segmentation when there are no existing delimiters in the data
* Provide info about settings applied during conversion, with warnings and suggestions if settings conflict or might not be optimal
* Add tests for common formats
* Add '(default)' back to docs for -c auto
* Add document count back to output
* Revert changes to converter output message
* Use explicit tabs in convert CLI test data
* Adjust/add messages for n_sents=1 default
* Add sample NER data to training examples
* Update README
* Add links in docs to example NER data
* Define msg within converters
2019-08-29 10:04:01 +00:00
When WRB O
Sebastian NNP B-PERSON
Thrun NNP I-PERSON
started VBD O
working VBG O
on IN O
self NN O
- HYPH O
driving VBG O
cars NNS O
at IN O
Google NNP B-ORG
in IN O
2007 CD B-DATE
, , O
few JJ O
people NNS O
outside RB O
of IN O
the DT O
company NN O
took VBD O
him PRP O
seriously RB O
. . O

“ '' O
I PRP O
can MD O
tell VB O
you PRP O
very RB O
senior JJ O
CEOs NNS O
of IN O
major JJ O
American JJ B-NORP
car NN O
companies NNS O
would MD O
shake VB O
my PRP$ O
hand NN O
and CC O
turn VB O
away RB O
because IN O
I PRP O
was VBD O
nt RB O
worth JJ O
talking VBG O
to IN O
, , O
” '' O
said VBD O
Thrun NNP B-PERSON
, , O
in IN O
an DT O
interview NN O
with IN O
Recode NNP B-ORG
earlier RBR B-DATE
this DT I-DATE
week NN I-DATE
. . O
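
The column layout described in the commit message (token first, IOB tag last, optional POS tag second, blank lines between sentences, `-DOCSTART-` lines between documents) can be read with a few lines of Python. The following is a minimal sketch under those stated rules, not spaCy's `conll_ner2json` implementation; the names `Token`, `read_token_per_line`, `iob_to_spans`, and `SAMPLE` are hypothetical and chosen only for illustration. `SAMPLE` reuses a few rows from the data above, split by a blank line purely to show sentence handling.

```python
from typing import List, NamedTuple, Optional, Tuple


class Token(NamedTuple):
    text: str
    pos: Optional[str]  # None when the data has no POS column
    iob: str


def read_token_per_line(text: str) -> List[List[List[Token]]]:
    """Parse whitespace-separated columns into documents -> sentences -> tokens."""
    docs: List[List[List[Token]]] = []
    sents: List[List[Token]] = []
    sent: List[Token] = []
    for line in text.splitlines():
        cols = line.split()
        if not cols or cols[0] == "-DOCSTART-":
            # A blank line or a -DOCSTART- line closes the current sentence.
            if sent:
                sents.append(sent)
                sent = []
            # A -DOCSTART- line additionally closes the current document.
            if cols and sents:
                docs.append(sents)
                sents = []
            continue
        if len(cols) < 2:
            raise ValueError(f"expected at least token and IOB tag: {line!r}")
        # First column is the token, last is the IOB tag;
        # the second column is the POS tag only if a third column exists.
        pos = cols[1] if len(cols) > 2 else None
        sent.append(Token(cols[0], pos, cols[-1]))
    if sent:
        sents.append(sent)
    if sents:
        docs.append(sents)
    return docs


def iob_to_spans(sent: List[Token]) -> List[Tuple[int, int, str]]:
    """Return (start, end, label) spans with an exclusive end index.

    Assumption: a stray I- tag without a preceding B- starts a new span.
    """
    spans: List[Tuple[int, int, str]] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tok in enumerate(sent):
        tag = tok.iob
        # An open span ends at O, at any B- tag, or at an I- tag with another label.
        if label is not None and (
            tag == "O" or tag.startswith("B-") or tag[2:] != label
        ):
            spans.append((start, i, label))
            start = label = None
        if tag != "O" and label is None:
            start, label = i, tag[2:]
    if label is not None:
        spans.append((start, len(sent), label))
    return spans


SAMPLE = """\
When WRB O
Sebastian NNP B-PERSON
Thrun NNP I-PERSON
started VBD O

said VBD O
Thrun NNP B-PERSON
, , O
in IN O
an DT O
interview NN O
with IN O
Recode NNP B-ORG
earlier RBR B-DATE
this DT I-DATE
week NN I-DATE
. . O
"""

for sentence in read_token_per_line(SAMPLE)[0]:
    for begin, end, ent_label in iob_to_spans(sentence):
        print(ent_label, " ".join(t.text for t in sentence[begin:end]))
# PERSON Sebastian Thrun
# PERSON Thrun
# ORG Recode
# DATE earlier this week
```

Splitting on any whitespace means the same sketch accepts tab-separated and space-separated files, and the same reader works for two-column data (token and IOB tag only), where `pos` is left as `None`.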