spaCy/website/usage/_processing-pipelines/_custom-components.jade

//- 💫 DOCS > USAGE > PROCESSING PIPELINES > CUSTOM COMPONENTS

p
    |  A component receives a #[code Doc] object and can modify it – for example,
    |  by using the current weights to make a prediction and set some annotation
    |  on the document. By adding a component to the pipeline, you'll get access
    |  to the #[code Doc] at any point #[strong during processing] – instead of
    |  only being able to modify it afterwards.

+aside-code("Example").
    def my_component(doc):
        # do something to the doc here
        return doc

+table(["Argument", "Type", "Description"])
    +row
        +cell #[code doc]
        +cell #[code Doc]
        +cell The #[code Doc] object processed by the previous component.

    +row("foot")
        +cell returns
        +cell #[code Doc]
        +cell The #[code Doc] object processed by this pipeline component.

p
    |  Custom components can be added to the pipeline using the
    |  #[+api("language#add_pipe") #[code add_pipe]] method. Optionally, you
    |  can either specify a component to add it #[strong before or after], tell
    |  spaCy to add it #[strong first or last] in the pipeline, or define a
    |  #[strong custom name]. If no name is set and no #[code name] attribute
    |  is present on your component, the function name is used.

+code-exec.
    import spacy

    def my_component(doc):
        print("After tokenization, this doc has %s tokens." % len(doc))
        if len(doc) &lt; 10:
            print("This is a pretty short document.")
        return doc

    nlp = spacy.load('en_core_web_sm')
    nlp.add_pipe(my_component, name='print_info', first=True)
    print(nlp.pipe_names) # ['print_info', 'tagger', 'parser', 'ner']
    doc = nlp(u"This is a sentence.")

p
    |  Of course, you can also wrap your component as a class to allow
    |  initialising it with custom settings and hold state within the component.
    |  This is useful for #[strong stateful components], especially ones which
    |  #[strong depend on shared data]. In the following example, the custom
    |  component #[code EntityMatcher] can be initialised with #[code nlp] object,
    |  a terminology list and an entity label. Using the
    |  #[+api("phrasematcher") #[code PhraseMatcher]], it then matches the terms
    |  in the #[code Doc] and adds them to the existing entities.

+aside("Rule-based entities vs. model", "💡")
    |  For complex tasks, it's usually better to train a statistical entity
    |  recognition model. However, statistical models require training data, so
    |  for many situations, rule-based approaches are more practical. This is
    |  especially true at the start of a project: you can use a rule-based
    |  approach as part of a data collection process, to help you "bootstrap" a
    |  statistical model.

+code-exec.
    import spacy
    from spacy.matcher import PhraseMatcher
    from spacy.tokens import Span

    class EntityMatcher(object):
        name = 'entity_matcher'

        def __init__(self, nlp, terms, label):
            patterns = [nlp(text) for text in terms]
            self.matcher = PhraseMatcher(nlp.vocab)
            self.matcher.add(label, None, *patterns)

        def __call__(self, doc):
            matches = self.matcher(doc)
            for match_id, start, end in matches:
                span = Span(doc, start, end, label=match_id)
                doc.ents = list(doc.ents) + [span]
            return doc

    nlp = spacy.load('en_core_web_sm')
    terms = (u'cat', u'dog', u'tree kangaroo', u'giant sea spider')
    entity_matcher = EntityMatcher(nlp, terms, 'ANIMAL')

    nlp.add_pipe(entity_matcher, after='ner')
    print(nlp.pipe_names)  # the components in the pipeline

    doc = nlp(u"This is a text about Barack Obama and a tree kangaroo")
    print([(ent.text, ent.label_) for ent in doc.ents])

+h(3, "custom-components-factories") Adding factories

p
    |  When spaCy loads a model via its #[code meta.json], it will iterate over
    |  the #[code "pipeline"] setting, look up every component name in the
    |  internal factories and call
    |  #[+api("language#create_pipe") #[code nlp.create_pipe]] to initialise the
    |  individual components, like the tagger, parser or entity recogniser. If
    |  your model uses custom components, this won't work – so you'll have to
    |  tell spaCy #[strong where to find your component]. You can do this by
    |  writing to the #[code Language.factories]:

+code.
    from spacy.language import Language
    Language.factories['entity_matcher'] = lambda nlp, **cfg: EntityMatcher(nlp, **cfg)

p
    |  You can also ship the above code and your custom component in your
    |  packaged model's #[code __init__.py], so it's executed when you load your
    |  model. The #[code **cfg] config parameters are passed all the way down
    |  from #[+api("spacy#load") #[code spacy.load]], so you can load the model
    |  and its components with custom settings:

+code.
    nlp = spacy.load('your_custom_model', terms=(u'tree kangaroo'), label='ANIMAL')

+infobox("Important note", "⚠️")
    |  When you load a model via its shortcut or package name, like
    |  #[code en_core_web_sm], spaCy will import the package and then call its
    |  #[code load()] method. This means that custom code in the model's
    |  #[code __init__.py] will be executed, too. This is #[strong not the case]
    |  if you're loading a model from a path containing the model data. Here,
    |  spaCy will only read in the #[code meta.json]. If you want to use custom
    |  factories with a model loaded from a path, you need to add them to
    |  #[code Language.factories] #[em before] you load the model.
-												Update pipelines docs and add user hooks to custom components

											
										
										
											2017-10-07 13:27:28 +00:00
+								//- 💫 DOCS > USAGE > PROCESSING PIPELINES > CUSTOM COMPONENTS
 								p
-												Update pipeline component usage docs

											
										
										
											2017-10-10 02:24:39 +00:00
+								    |  A component receives a #[code Doc] object and can modify it – for example,
 								    |  by using the current weights to make a prediction and set some annotation
 								    |  on the document. By adding a component to the pipeline, you'll get access
 								    |  to the #[code Doc] at any point #[strong during processing] – instead of
 								    |  only being able to modify it afterwards.
-												Update pipelines docs and add user hooks to custom components

											
										
										
											2017-10-07 13:27:28 +00:00
 								+aside-code("Example").
 								    def my_component(doc):
 								        # do something to the doc here
 								        return doc
 								+table(["Argument", "Type", "Description"])
 								    +row
 								        +cell #[code doc]
 								        +cell #[code Doc]
 								        +cell The #[code Doc] object processed by the previous component.
 								    +row("foot")
 								        +cell returns
 								        +cell #[code Doc]
 								        +cell The #[code Doc] object processed by this pipeline component.
 								p
 								    |  Custom components can be added to the pipeline using the
 								    |  #[+api("language#add_pipe") #[code add_pipe]] method. Optionally, you
-												Update pipeline component usage docs

											
										
										
											2017-10-10 02:24:39 +00:00
+								    |  can either specify a component to add it #[strong before or after], tell
 								    |  spaCy to add it #[strong first or last] in the pipeline, or define a
 								    |  #[strong custom name]. If no name is set and no #[code name] attribute
 								    |  is present on your component, the function name is used.
-												Update pipelines docs and add user hooks to custom components

											
										
										
											2017-10-07 13:27:28 +00:00
-												💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)

* Integrate Python kernel via Binder

* Add live model test for languages with examples

* Update docs and code examples

* Adjust margin (if not bootstrapped)

* Add binder version to global config

* Update terminal and executable code mixins

* Pass attributes through infobox and section

* Hide v-cloak

* Fix example

* Take out model comparison for now

* Add meta text for compat

* Remove chart.js dependency

* Tidy up and simplify JS and port big components over to Vue

* Remove chartjs example

* Add Twitter icon

* Add purple stylesheet option

* Add utility for hand cursor (special cases only)

* Add transition classes

* Add small option for section

* Add thumb object for small round thumbnail images

* Allow unset code block language via "none" value

(workaround to still allow unset language to default to DEFAULT_SYNTAX)

* Pass through attributes

* Add syntax highlighting definitions for Julia, R and Docker

* Add website icon

* Remove user survey from navigation

* Don't hide GitHub icon on small screens

* Make top navigation scrollable on small screens

* Remove old resources page and references to it

* Add Universe

* Add helper functions for better page URL and title

* Update site description

* Increment versions

* Update preview images

* Update mentions of resources

* Fix image

* Fix social images

* Fix problem with cover sizing and floats

* Add divider and move badges into heading

* Add docstrings

* Reference converting section

* Add section on converting word vectors

* Move converting section to custom section and fix formatting

* Remove old fastText example

* Move extensions content to own section

Keep weird ID to not break permalinks for now (we don't want to rewrite URLs if not absolutely necessary)

* Use better component example and add factories section

* Add note on larger model

* Use better example for non-vector

* Remove similarity in context section

Only works via small models with tensors so has always been kind of confusing

* Add note on init-model command

* Fix lightning tour examples and make excutable if possible

* Add spacy train CLI section to train

* Fix formatting and add video

* Fix formatting

* Fix textcat example description (resolves #2246)

* Add dummy file to try resolve conflict

* Delete dummy file

* Tidy up [ci skip]

* Ensure sufficient height of loading container

* Add loading animation to universe

* Update Thebelab build and use better startup message

* Fix asset versioning

* Fix typo [ci skip]

* Add note on project idea label

											
										
										
											2018-04-29 00:06:46 +00:00
+								+code-exec.
 								    import spacy
-												Update pipelines docs and add user hooks to custom components

											
										
										
											2017-10-07 13:27:28 +00:00
+								    def my_component(doc):
 								        print("After tokenization, this doc has %s tokens." % len(doc))
 								        if len(doc) &lt; 10:
 								            print("This is a pretty short document.")
 								        return doc
-												💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)

* Integrate Python kernel via Binder

* Add live model test for languages with examples

* Update docs and code examples

* Adjust margin (if not bootstrapped)

* Add binder version to global config

* Update terminal and executable code mixins

* Pass attributes through infobox and section

* Hide v-cloak

* Fix example

* Take out model comparison for now

* Add meta text for compat

* Remove chart.js dependency

* Tidy up and simplify JS and port big components over to Vue

* Remove chartjs example

* Add Twitter icon

* Add purple stylesheet option

* Add utility for hand cursor (special cases only)

* Add transition classes

* Add small option for section

* Add thumb object for small round thumbnail images

* Allow unset code block language via "none" value

(workaround to still allow unset language to default to DEFAULT_SYNTAX)

* Pass through attributes

* Add syntax highlighting definitions for Julia, R and Docker

* Add website icon

* Remove user survey from navigation

* Don't hide GitHub icon on small screens

* Make top navigation scrollable on small screens

* Remove old resources page and references to it

* Add Universe

* Add helper functions for better page URL and title

* Update site description

* Increment versions

* Update preview images

* Update mentions of resources

* Fix image

* Fix social images

* Fix problem with cover sizing and floats

* Add divider and move badges into heading

* Add docstrings

* Reference converting section

* Add section on converting word vectors

* Move converting section to custom section and fix formatting

* Remove old fastText example

* Move extensions content to own section

Keep weird ID to not break permalinks for now (we don't want to rewrite URLs if not absolutely necessary)

* Use better component example and add factories section

* Add note on larger model

* Use better example for non-vector

* Remove similarity in context section

Only works via small models with tensors so has always been kind of confusing

* Add note on init-model command

* Fix lightning tour examples and make excutable if possible

* Add spacy train CLI section to train

* Fix formatting and add video

* Fix formatting

* Fix textcat example description (resolves #2246)

* Add dummy file to try resolve conflict

* Delete dummy file

* Tidy up [ci skip]

* Ensure sufficient height of loading container

* Add loading animation to universe

* Update Thebelab build and use better startup message

* Fix asset versioning

* Fix typo [ci skip]

* Add note on project idea label

											
										
										
											2018-04-29 00:06:46 +00:00
+								    nlp = spacy.load('en_core_web_sm')
-												Fix typo in code example (resolves #1556)

											
										
										
											2017-11-13 07:29:16 +00:00
+								    nlp.add_pipe(my_component, name='print_info', first=True)
-												Update pipelines docs and add user hooks to custom components

											
										
										
											2017-10-07 13:27:28 +00:00
+								    print(nlp.pipe_names) # ['print_info', 'tagger', 'parser', 'ner']
 								    doc = nlp(u"This is a sentence.")
 								p
 								    |  Of course, you can also wrap your component as a class to allow
 								    |  initialising it with custom settings and hold state within the component.
 								    |  This is useful for #[strong stateful components], especially ones which
-												💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)

* Integrate Python kernel via Binder

* Add live model test for languages with examples

* Update docs and code examples

* Adjust margin (if not bootstrapped)

* Add binder version to global config

* Update terminal and executable code mixins

* Pass attributes through infobox and section

* Hide v-cloak

* Fix example

* Take out model comparison for now

* Add meta text for compat

* Remove chart.js dependency

* Tidy up and simplify JS and port big components over to Vue

* Remove chartjs example

* Add Twitter icon

* Add purple stylesheet option

* Add utility for hand cursor (special cases only)

* Add transition classes

* Add small option for section

* Add thumb object for small round thumbnail images

* Allow unset code block language via "none" value

(workaround to still allow unset language to default to DEFAULT_SYNTAX)

* Pass through attributes

* Add syntax highlighting definitions for Julia, R and Docker

* Add website icon

* Remove user survey from navigation

* Don't hide GitHub icon on small screens

* Make top navigation scrollable on small screens

* Remove old resources page and references to it

* Add Universe

* Add helper functions for better page URL and title

* Update site description

* Increment versions

* Update preview images

* Update mentions of resources

* Fix image

* Fix social images

* Fix problem with cover sizing and floats

* Add divider and move badges into heading

* Add docstrings

* Reference converting section

* Add section on converting word vectors

* Move converting section to custom section and fix formatting

* Remove old fastText example

* Move extensions content to own section

Keep weird ID to not break permalinks for now (we don't want to rewrite URLs if not absolutely necessary)

* Use better component example and add factories section

* Add note on larger model

* Use better example for non-vector

* Remove similarity in context section

Only works via small models with tensors so has always been kind of confusing

* Add note on init-model command

* Fix lightning tour examples and make excutable if possible

* Add spacy train CLI section to train

* Fix formatting and add video

* Fix formatting

* Fix textcat example description (resolves #2246)

* Add dummy file to try resolve conflict

* Delete dummy file

* Tidy up [ci skip]

* Ensure sufficient height of loading container

* Add loading animation to universe

* Update Thebelab build and use better startup message

* Fix asset versioning

* Fix typo [ci skip]

* Add note on project idea label

											
										
										
											2018-04-29 00:06:46 +00:00
+								    |  #[strong depend on shared data]. In the following example, the custom
 								    |  component #[code EntityMatcher] can be initialised with #[code nlp] object,
 								    |  a terminology list and an entity label. Using the
 								    |  #[+api("phrasematcher") #[code PhraseMatcher]], it then matches the terms
 								    |  in the #[code Doc] and adds them to the existing entities.
 								+aside("Rule-based entities vs. model", "💡")
 								    |  For complex tasks, it's usually better to train a statistical entity
 								    |  recognition model. However, statistical models require training data, so
 								    |  for many situations, rule-based approaches are more practical. This is
 								    |  especially true at the start of a project: you can use a rule-based
 								    |  approach as part of a data collection process, to help you "bootstrap" a
 								    |  statistical model.
 								+code-exec.
 								    import spacy
 								    from spacy.matcher import PhraseMatcher
 								    from spacy.tokens import Span
 								    class EntityMatcher(object):
 								        name = 'entity_matcher'
 								        def __init__(self, nlp, terms, label):
 								            patterns = [nlp(text) for text in terms]
 								            self.matcher = PhraseMatcher(nlp.vocab)
 								            self.matcher.add(label, None, *patterns)
-												Update pipelines docs and add user hooks to custom components

											
										
										
											2017-10-07 13:27:28 +00:00
-												Another small fix to component docs
											
										
										
											2017-11-23 10:47:20 +00:00
+								        def __call__(self, doc):
-												💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)

* Integrate Python kernel via Binder

* Add live model test for languages with examples

* Update docs and code examples

* Adjust margin (if not bootstrapped)

* Add binder version to global config

* Update terminal and executable code mixins

* Pass attributes through infobox and section

* Hide v-cloak

* Fix example

* Take out model comparison for now

* Add meta text for compat

* Remove chart.js dependency

* Tidy up and simplify JS and port big components over to Vue

* Remove chartjs example

* Add Twitter icon

* Add purple stylesheet option

* Add utility for hand cursor (special cases only)

* Add transition classes

* Add small option for section

* Add thumb object for small round thumbnail images

* Allow unset code block language via "none" value

(workaround to still allow unset language to default to DEFAULT_SYNTAX)

* Pass through attributes

* Add syntax highlighting definitions for Julia, R and Docker

* Add website icon

* Remove user survey from navigation

* Don't hide GitHub icon on small screens

* Make top navigation scrollable on small screens

* Remove old resources page and references to it

* Add Universe

* Add helper functions for better page URL and title

* Update site description

* Increment versions

* Update preview images

* Update mentions of resources

* Fix image

* Fix social images

* Fix problem with cover sizing and floats

* Add divider and move badges into heading

* Add docstrings

* Reference converting section

* Add section on converting word vectors

* Move converting section to custom section and fix formatting

* Remove old fastText example

* Move extensions content to own section

Keep weird ID to not break permalinks for now (we don't want to rewrite URLs if not absolutely necessary)

* Use better component example and add factories section

* Add note on larger model

* Use better example for non-vector

* Remove similarity in context section

Only works via small models with tensors so has always been kind of confusing

* Add note on init-model command

* Fix lightning tour examples and make excutable if possible

* Add spacy train CLI section to train

* Fix formatting and add video

* Fix formatting

* Fix textcat example description (resolves #2246)

* Add dummy file to try resolve conflict

* Delete dummy file

* Tidy up [ci skip]

* Ensure sufficient height of loading container

* Add loading animation to universe

* Update Thebelab build and use better startup message

* Fix asset versioning

* Fix typo [ci skip]

* Add note on project idea label

											
										
										
											2018-04-29 00:06:46 +00:00
+								            matches = self.matcher(doc)
 								            for match_id, start, end in matches:
 								                span = Span(doc, start, end, label=match_id)
 								                doc.ents = list(doc.ents) + [span]
-												Update pipelines docs and add user hooks to custom components

											
										
										
											2017-10-07 13:27:28 +00:00
+								            return doc
-												💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)

* Integrate Python kernel via Binder

* Add live model test for languages with examples

* Update docs and code examples

* Adjust margin (if not bootstrapped)

* Add binder version to global config

* Update terminal and executable code mixins

* Pass attributes through infobox and section

* Hide v-cloak

* Fix example

* Take out model comparison for now

* Add meta text for compat

* Remove chart.js dependency

* Tidy up and simplify JS and port big components over to Vue

* Remove chartjs example

* Add Twitter icon

* Add purple stylesheet option

* Add utility for hand cursor (special cases only)

* Add transition classes

* Add small option for section

* Add thumb object for small round thumbnail images

* Allow unset code block language via "none" value

(workaround to still allow unset language to default to DEFAULT_SYNTAX)

* Pass through attributes

* Add syntax highlighting definitions for Julia, R and Docker

* Add website icon

* Remove user survey from navigation

* Don't hide GitHub icon on small screens

* Make top navigation scrollable on small screens

* Remove old resources page and references to it

* Add Universe

* Add helper functions for better page URL and title

* Update site description

* Increment versions

* Update preview images

* Update mentions of resources

* Fix image

* Fix social images

* Fix problem with cover sizing and floats

* Add divider and move badges into heading

* Add docstrings

* Reference converting section

* Add section on converting word vectors

* Move converting section to custom section and fix formatting

* Remove old fastText example

* Move extensions content to own section

Keep weird ID to not break permalinks for now (we don't want to rewrite URLs if not absolutely necessary)

* Use better component example and add factories section

* Add note on larger model

* Use better example for non-vector

* Remove similarity in context section

Only works via small models with tensors so has always been kind of confusing

* Add note on init-model command

* Fix lightning tour examples and make excutable if possible

* Add spacy train CLI section to train

* Fix formatting and add video

* Fix formatting

* Fix textcat example description (resolves #2246)

* Add dummy file to try resolve conflict

* Delete dummy file

* Tidy up [ci skip]

* Ensure sufficient height of loading container

* Add loading animation to universe

* Update Thebelab build and use better startup message

* Fix asset versioning

* Fix typo [ci skip]

* Add note on project idea label

											
										
										
											2018-04-29 00:06:46 +00:00
+								    nlp = spacy.load('en_core_web_sm')
 								    terms = (u'cat', u'dog', u'tree kangaroo', u'giant sea spider')
 								    entity_matcher = EntityMatcher(nlp, terms, 'ANIMAL')
-												Update pipeline component usage docs

											
										
										
											2017-10-10 02:24:39 +00:00
-												💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)

* Integrate Python kernel via Binder

* Add live model test for languages with examples

* Update docs and code examples

* Adjust margin (if not bootstrapped)

* Add binder version to global config

* Update terminal and executable code mixins

* Pass attributes through infobox and section

* Hide v-cloak

* Fix example

* Take out model comparison for now

* Add meta text for compat

* Remove chart.js dependency

* Tidy up and simplify JS and port big components over to Vue

* Remove chartjs example

* Add Twitter icon

* Add purple stylesheet option

* Add utility for hand cursor (special cases only)

* Add transition classes

* Add small option for section

* Add thumb object for small round thumbnail images

* Allow unset code block language via "none" value

(workaround to still allow unset language to default to DEFAULT_SYNTAX)

* Pass through attributes

* Add syntax highlighting definitions for Julia, R and Docker

* Add website icon

* Remove user survey from navigation

* Don't hide GitHub icon on small screens

* Make top navigation scrollable on small screens

* Remove old resources page and references to it

* Add Universe

* Add helper functions for better page URL and title

* Update site description

* Increment versions

* Update preview images

* Update mentions of resources

* Fix image

* Fix social images

* Fix problem with cover sizing and floats

* Add divider and move badges into heading

* Add docstrings

* Reference converting section

* Add section on converting word vectors

* Move converting section to custom section and fix formatting

* Remove old fastText example

* Move extensions content to own section

Keep weird ID to not break permalinks for now (we don't want to rewrite URLs if not absolutely necessary)

* Use better component example and add factories section

* Add note on larger model

* Use better example for non-vector

* Remove similarity in context section

Only works via small models with tensors so has always been kind of confusing

* Add note on init-model command

* Fix lightning tour examples and make excutable if possible

* Add spacy train CLI section to train

* Fix formatting and add video

* Fix formatting

* Fix textcat example description (resolves #2246)

* Add dummy file to try resolve conflict

* Delete dummy file

* Tidy up [ci skip]

* Ensure sufficient height of loading container

* Add loading animation to universe

* Update Thebelab build and use better startup message

* Fix asset versioning

* Fix typo [ci skip]

* Add note on project idea label

											
										
										
											2018-04-29 00:06:46 +00:00
+								    nlp.add_pipe(entity_matcher, after='ner')
 								    print(nlp.pipe_names)  # the components in the pipeline
-												Update pipeline component usage docs

											
										
										
											2017-10-10 02:24:39 +00:00
-												💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)

* Integrate Python kernel via Binder

* Add live model test for languages with examples

* Update docs and code examples

* Adjust margin (if not bootstrapped)

* Add binder version to global config

* Update terminal and executable code mixins

* Pass attributes through infobox and section

* Hide v-cloak

* Fix example

* Take out model comparison for now

* Add meta text for compat

* Remove chart.js dependency

* Tidy up and simplify JS and port big components over to Vue

* Remove chartjs example

* Add Twitter icon

* Add purple stylesheet option

* Add utility for hand cursor (special cases only)

* Add transition classes

* Add small option for section

* Add thumb object for small round thumbnail images

* Allow unset code block language via "none" value

(workaround to still allow unset language to default to DEFAULT_SYNTAX)

* Pass through attributes

* Add syntax highlighting definitions for Julia, R and Docker

* Add website icon

* Remove user survey from navigation

* Don't hide GitHub icon on small screens

* Make top navigation scrollable on small screens

* Remove old resources page and references to it

* Add Universe

* Add helper functions for better page URL and title

* Update site description

* Increment versions

* Update preview images

* Update mentions of resources

* Fix image

* Fix social images

* Fix problem with cover sizing and floats

* Add divider and move badges into heading

* Add docstrings

* Reference converting section

* Add section on converting word vectors

* Move converting section to custom section and fix formatting

* Remove old fastText example

* Move extensions content to own section

Keep weird ID to not break permalinks for now (we don't want to rewrite URLs if not absolutely necessary)

* Use better component example and add factories section

* Add note on larger model

* Use better example for non-vector

* Remove similarity in context section

Only works via small models with tensors so has always been kind of confusing

* Add note on init-model command

* Fix lightning tour examples and make excutable if possible

* Add spacy train CLI section to train

* Fix formatting and add video

* Fix formatting

* Fix textcat example description (resolves #2246)

* Add dummy file to try resolve conflict

* Delete dummy file

* Tidy up [ci skip]

* Ensure sufficient height of loading container

* Add loading animation to universe

* Update Thebelab build and use better startup message

* Fix asset versioning

* Fix typo [ci skip]

* Add note on project idea label

											
										
										
											2018-04-29 00:06:46 +00:00
+								    doc = nlp(u"This is a text about Barack Obama and a tree kangaroo")
 								    print([(ent.text, ent.label_) for ent in doc.ents])
-												Update pipeline component usage docs

											
										
										
											2017-10-10 02:24:39 +00:00
-												💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)

* Integrate Python kernel via Binder

* Add live model test for languages with examples

* Update docs and code examples

* Adjust margin (if not bootstrapped)

* Add binder version to global config

* Update terminal and executable code mixins

* Pass attributes through infobox and section

* Hide v-cloak

* Fix example

* Take out model comparison for now

* Add meta text for compat

* Remove chart.js dependency

* Tidy up and simplify JS and port big components over to Vue

* Remove chartjs example

* Add Twitter icon

* Add purple stylesheet option

* Add utility for hand cursor (special cases only)

* Add transition classes

* Add small option for section

* Add thumb object for small round thumbnail images

* Allow unset code block language via "none" value

(workaround to still allow unset language to default to DEFAULT_SYNTAX)

* Pass through attributes

* Add syntax highlighting definitions for Julia, R and Docker

* Add website icon

* Remove user survey from navigation

* Don't hide GitHub icon on small screens

* Make top navigation scrollable on small screens

* Remove old resources page and references to it

* Add Universe

* Add helper functions for better page URL and title

* Update site description

* Increment versions

* Update preview images

* Update mentions of resources

* Fix image

* Fix social images

* Fix problem with cover sizing and floats

* Add divider and move badges into heading

* Add docstrings

* Reference converting section

* Add section on converting word vectors

* Move converting section to custom section and fix formatting

* Remove old fastText example

* Move extensions content to own section

Keep weird ID to not break permalinks for now (we don't want to rewrite URLs if not absolutely necessary)

* Use better component example and add factories section

* Add note on larger model

* Use better example for non-vector

* Remove similarity in context section

Only works via small models with tensors so has always been kind of confusing

* Add note on init-model command

* Fix lightning tour examples and make excutable if possible

* Add spacy train CLI section to train

* Fix formatting and add video

* Fix formatting

* Fix textcat example description (resolves #2246)

* Add dummy file to try resolve conflict

* Delete dummy file

* Tidy up [ci skip]

* Ensure sufficient height of loading container

* Add loading animation to universe

* Update Thebelab build and use better startup message

* Fix asset versioning

* Fix typo [ci skip]

* Add note on project idea label

											
										
										
											2018-04-29 00:06:46 +00:00
+								+h(3, "custom-components-factories") Adding factories
-												Update pipeline component usage docs

											
										
										
											2017-10-10 02:24:39 +00:00
 								p
-												💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)

* Integrate Python kernel via Binder

* Add live model test for languages with examples

* Update docs and code examples

* Adjust margin (if not bootstrapped)

* Add binder version to global config

* Update terminal and executable code mixins

* Pass attributes through infobox and section

* Hide v-cloak

* Fix example

* Take out model comparison for now

* Add meta text for compat

* Remove chart.js dependency

* Tidy up and simplify JS and port big components over to Vue

* Remove chartjs example

* Add Twitter icon

* Add purple stylesheet option

* Add utility for hand cursor (special cases only)

* Add transition classes

* Add small option for section

* Add thumb object for small round thumbnail images

* Allow unset code block language via "none" value

(workaround to still allow unset language to default to DEFAULT_SYNTAX)

* Pass through attributes

* Add syntax highlighting definitions for Julia, R and Docker

* Add website icon

* Remove user survey from navigation

* Don't hide GitHub icon on small screens

* Make top navigation scrollable on small screens

* Remove old resources page and references to it

* Add Universe

* Add helper functions for better page URL and title

* Update site description

* Increment versions

* Update preview images

* Update mentions of resources

* Fix image

* Fix social images

* Fix problem with cover sizing and floats

* Add divider and move badges into heading

* Add docstrings

* Reference converting section

* Add section on converting word vectors

* Move converting section to custom section and fix formatting

* Remove old fastText example

* Move extensions content to own section

Keep weird ID to not break permalinks for now (we don't want to rewrite URLs if not absolutely necessary)

* Use better component example and add factories section

* Add note on larger model

* Use better example for non-vector

* Remove similarity in context section

Only works via small models with tensors so has always been kind of confusing

* Add note on init-model command

* Fix lightning tour examples and make excutable if possible

* Add spacy train CLI section to train

* Fix formatting and add video

* Fix formatting

* Fix textcat example description (resolves #2246)

* Add dummy file to try resolve conflict

* Delete dummy file

* Tidy up [ci skip]

* Ensure sufficient height of loading container

* Add loading animation to universe

* Update Thebelab build and use better startup message

* Fix asset versioning

* Fix typo [ci skip]

* Add note on project idea label

											
										
										
											2018-04-29 00:06:46 +00:00
+								    |  When spaCy loads a model via its #[code meta.json], it will iterate over
 								    |  the #[code "pipeline"] setting, look up every component name in the
 								    |  internal factories and call
 								    |  #[+api("language#create_pipe") #[code nlp.create_pipe]] to initialise the
 								    |  individual components, like the tagger, parser or entity recogniser. If
 								    |  your model uses custom components, this won't work – so you'll have to
 								    |  tell spaCy #[strong where to find your component]. You can do this by
 								    |  writing to the #[code Language.factories]:
-												Update pipeline component usage docs

											
										
										
											2017-10-10 02:24:39 +00:00
 								+code.
-												💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)

* Integrate Python kernel via Binder

* Add live model test for languages with examples

* Update docs and code examples

* Adjust margin (if not bootstrapped)

* Add binder version to global config

* Update terminal and executable code mixins

* Pass attributes through infobox and section

* Hide v-cloak

* Fix example

* Take out model comparison for now

* Add meta text for compat

* Remove chart.js dependency

* Tidy up and simplify JS and port big components over to Vue

* Remove chartjs example

* Add Twitter icon

* Add purple stylesheet option

* Add utility for hand cursor (special cases only)

* Add transition classes

* Add small option for section

* Add thumb object for small round thumbnail images

* Allow unset code block language via "none" value

(workaround to still allow unset language to default to DEFAULT_SYNTAX)

* Pass through attributes

* Add syntax highlighting definitions for Julia, R and Docker

* Add website icon

* Remove user survey from navigation

* Don't hide GitHub icon on small screens

* Make top navigation scrollable on small screens

* Remove old resources page and references to it

* Add Universe

* Add helper functions for better page URL and title

* Update site description

* Increment versions

* Update preview images

* Update mentions of resources

* Fix image

* Fix social images

* Fix problem with cover sizing and floats

* Add divider and move badges into heading

* Add docstrings

* Reference converting section

* Add section on converting word vectors

* Move converting section to custom section and fix formatting

* Remove old fastText example

* Move extensions content to own section

Keep weird ID to not break permalinks for now (we don't want to rewrite URLs if not absolutely necessary)

* Use better component example and add factories section

* Add note on larger model

* Use better example for non-vector

* Remove similarity in context section

Only works via small models with tensors so has always been kind of confusing

* Add note on init-model command

* Fix lightning tour examples and make excutable if possible

* Add spacy train CLI section to train

* Fix formatting and add video

* Fix formatting

* Fix textcat example description (resolves #2246)

* Add dummy file to try resolve conflict

* Delete dummy file

* Tidy up [ci skip]

* Ensure sufficient height of loading container

* Add loading animation to universe

* Update Thebelab build and use better startup message

* Fix asset versioning

* Fix typo [ci skip]

* Add note on project idea label

											
										
										
											2018-04-29 00:06:46 +00:00
+								    from spacy.language import Language
 								    Language.factories['entity_matcher'] = lambda nlp, **cfg: EntityMatcher(nlp, **cfg)
-												Update pipelines docs and add user hooks to custom components

											
										
										
											2017-10-07 13:27:28 +00:00
 								p
-												💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)

* Integrate Python kernel via Binder

* Add live model test for languages with examples

* Update docs and code examples

* Adjust margin (if not bootstrapped)

* Add binder version to global config

* Update terminal and executable code mixins

* Pass attributes through infobox and section

* Hide v-cloak

* Fix example

* Take out model comparison for now

* Add meta text for compat

* Remove chart.js dependency

* Tidy up and simplify JS and port big components over to Vue

* Remove chartjs example

* Add Twitter icon

* Add purple stylesheet option

* Add utility for hand cursor (special cases only)

* Add transition classes

* Add small option for section

* Add thumb object for small round thumbnail images

* Allow unset code block language via "none" value

(workaround to still allow unset language to default to DEFAULT_SYNTAX)

* Pass through attributes

* Add syntax highlighting definitions for Julia, R and Docker

* Add website icon

* Remove user survey from navigation

* Don't hide GitHub icon on small screens

* Make top navigation scrollable on small screens

* Remove old resources page and references to it

* Add Universe

* Add helper functions for better page URL and title

* Update site description

* Increment versions

* Update preview images

* Update mentions of resources

* Fix image

* Fix social images

* Fix problem with cover sizing and floats

* Add divider and move badges into heading

* Add docstrings

* Reference converting section

* Add section on converting word vectors

* Move converting section to custom section and fix formatting

* Remove old fastText example

* Move extensions content to own section

Keep weird ID to not break permalinks for now (we don't want to rewrite URLs if not absolutely necessary)

* Use better component example and add factories section

* Add note on larger model

* Use better example for non-vector

* Remove similarity in context section

Only works via small models with tensors so has always been kind of confusing

* Add note on init-model command

* Fix lightning tour examples and make excutable if possible

* Add spacy train CLI section to train

* Fix formatting and add video

* Fix formatting

* Fix textcat example description (resolves #2246)

* Add dummy file to try resolve conflict

* Delete dummy file

* Tidy up [ci skip]

* Ensure sufficient height of loading container

* Add loading animation to universe

* Update Thebelab build and use better startup message

* Fix asset versioning

* Fix typo [ci skip]

* Add note on project idea label

											
										
										
											2018-04-29 00:06:46 +00:00
+								    |  You can also ship the above code and your custom component in your
 								    |  packaged model's #[code __init__.py], so it's executed when you load your
 								    |  model. The #[code **cfg] config parameters are passed all the way down
 								    |  from #[+api("spacy#load") #[code spacy.load]], so you can load the model
 								    |  and its components with custom settings:
-												Update pipelines docs and add user hooks to custom components

											
										
										
											2017-10-07 13:27:28 +00:00
-												💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)

* Integrate Python kernel via Binder

* Add live model test for languages with examples

* Update docs and code examples

* Adjust margin (if not bootstrapped)

* Add binder version to global config

* Update terminal and executable code mixins

* Pass attributes through infobox and section

* Hide v-cloak

* Fix example

* Take out model comparison for now

* Add meta text for compat

* Remove chart.js dependency

* Tidy up and simplify JS and port big components over to Vue

* Remove chartjs example

* Add Twitter icon

* Add purple stylesheet option

* Add utility for hand cursor (special cases only)

* Add transition classes

* Add small option for section

* Add thumb object for small round thumbnail images

* Allow unset code block language via "none" value

(workaround to still allow unset language to default to DEFAULT_SYNTAX)

* Pass through attributes

* Add syntax highlighting definitions for Julia, R and Docker

* Add website icon

* Remove user survey from navigation

* Don't hide GitHub icon on small screens

* Make top navigation scrollable on small screens

* Remove old resources page and references to it

* Add Universe

* Add helper functions for better page URL and title

* Update site description

* Increment versions

* Update preview images

* Update mentions of resources

* Fix image

* Fix social images

* Fix problem with cover sizing and floats

* Add divider and move badges into heading

* Add docstrings

* Reference converting section

* Add section on converting word vectors

* Move converting section to custom section and fix formatting

* Remove old fastText example

* Move extensions content to own section

Keep weird ID to not break permalinks for now (we don't want to rewrite URLs if not absolutely necessary)

* Use better component example and add factories section

* Add note on larger model

* Use better example for non-vector

* Remove similarity in context section

Only works via small models with tensors so has always been kind of confusing

* Add note on init-model command

* Fix lightning tour examples and make excutable if possible

* Add spacy train CLI section to train

* Fix formatting and add video

* Fix formatting

* Fix textcat example description (resolves #2246)

* Add dummy file to try resolve conflict

* Delete dummy file

* Tidy up [ci skip]

* Ensure sufficient height of loading container

* Add loading animation to universe

* Update Thebelab build and use better startup message

* Fix asset versioning

* Fix typo [ci skip]

* Add note on project idea label

											
										
										
											2018-04-29 00:06:46 +00:00
+								+code.
 								    nlp = spacy.load('your_custom_model', terms=(u'tree kangaroo'), label='ANIMAL')
 								+infobox("Important note", "⚠️")
 								    |  When you load a model via its shortcut or package name, like
 								    |  #[code en_core_web_sm], spaCy will import the package and then call its
 								    |  #[code load()] method. This means that custom code in the model's
 								    |  #[code __init__.py] will be executed, too. This is #[strong not the case]
 								    |  if you're loading a model from a path containing the model data. Here,
 								    |  spaCy will only read in the #[code meta.json]. If you want to use custom
 								    |  factories with a model loaded from a path, you need to add them to
 								    |  #[code Language.factories] #[em before] you load the model.