spaCy/website/api/_annotation/_biluo.jade

//- 💫 DOCS > API > ANNOTATION > BILUO

+table([ "Tag", "Description" ])
    +row
        +cell #[code #[span.u-color-theme B] EGIN]
        +cell The first token of a multi-token entity.

    +row
        +cell #[code #[span.u-color-theme I] N]
        +cell An inner token of a multi-token entity.

    +row
        +cell #[code #[span.u-color-theme L] AST]
        +cell The final token of a multi-token entity.

    +row
        +cell #[code #[span.u-color-theme U] NIT]
        +cell A single-token entity.

    +row
        +cell #[code #[span.u-color-theme O] UT]
        +cell A non-entity token.

+aside("Why BILUO, not IOB?")
    |  There are several coding schemes for encoding entity annotations as
    |  token tags.  These coding schemes are equally expressive, but not
    |  necessarily equally learnable.
    |  #[+a("http://www.aclweb.org/anthology/W09-1119") Ratinov and Roth]
    |  showed that the minimal #[strong Begin], #[strong In], #[strong Out]
    |  scheme was more difficult to learn than the #[strong BILUO] scheme that
    |  we use, which explicitly marks boundary tokens.

p
    |  spaCy translates the character offsets into this scheme, in order to
    |  decide the cost of each action given the current state of the entity
    |  recogniser. The costs are then used to calculate the gradient of the
    |  loss, to train the model. The exact algorithm is a pastiche of
    |  well-known methods, and is not currently described in any single
    |  publication. The model is a greedy transition-based parser guided by a
    |  linear model whose weights are learned using the averaged perceptron
    |  loss, via the #[+a("http://www.aclweb.org/anthology/C12-1059") dynamic oracle]
    |  imitation learning strategy. The transition system is equivalent to the
    |  BILOU tagging scheme.
Update API documentation 2017-10-03 12:27:22 +00:00			`//- 💫 DOCS > API > ANNOTATION > BILUO`

			`+table([ "Tag", "Description" ])`
			`+row`
			`+cell #[code #[span.u-color-theme B] EGIN]`
			`+cell The first token of a multi-token entity.`

			`+row`
			`+cell #[code #[span.u-color-theme I] N]`
			`+cell An inner token of a multi-token entity.`

			`+row`
			`+cell #[code #[span.u-color-theme L] AST]`
			`+cell The final token of a multi-token entity.`

			`+row`
			`+cell #[code #[span.u-color-theme U] NIT]`
			`+cell A single-token entity.`

			`+row`
			`+cell #[code #[span.u-color-theme O] UT]`
			`+cell A non-entity token.`

			`+aside("Why BILUO, not IOB?")`
			`\| There are several coding schemes for encoding entity annotations as`
			`\| token tags. These coding schemes are equally expressive, but not`
			`\| necessarily equally learnable.`
			`\| #[+a("http://www.aclweb.org/anthology/W09-1119") Ratinov and Roth]`
			`\| showed that the minimal #[strong Begin], #[strong In], #[strong Out]`
			`\| scheme was more difficult to learn than the #[strong BILUO] scheme that`
			`\| we use, which explicitly marks boundary tokens.`

			`p`
			`\| spaCy translates the character offsets into this scheme, in order to`
			`\| decide the cost of each action given the current state of the entity`
			`\| recogniser. The costs are then used to calculate the gradient of the`
			`\| loss, to train the model. The exact algorithm is a pastiche of`
			`\| well-known methods, and is not currently described in any single`
			`\| publication. The model is a greedy transition-based parser guided by a`
			`\| linear model whose weights are learned using the averaged perceptron`
			`\| loss, via the #[+a("http://www.aclweb.org/anthology/C12-1059") dynamic oracle]`
			`\| imitation learning strategy. The transition system is equivalent to the`
			`\| BILOU tagging scheme.`