Commit Graph

8657 Commits

Author SHA1 Message Date
Cory Hurst 446f5ec41b Silent keyword in info function in init (#2459)
* Pass through "silent" kwarg to the wrapper in the spacy module init.
reference issue  #2196

* contributor agreement
2018-06-18 12:24:21 +02:00
Nipun Sadvilkar 741ba80bd5 Train model command n_iteration 20 -> 30 (#2454)
In the source code (`train.py`), the default number of iterations is 30.
2018-06-18 11:57:08 +02:00
ines 53a2bc8c8d Only scroll sidebar item into view if needed [ci skip] 2018-06-12 10:58:50 +02:00
ines 65713a6593 Increment versions [ci skip] 2018-06-12 10:49:50 +02:00
Ines Montani 968f6f0bda 💫 Document Cython API (#2433)
## Description

This PR adds the most relevant documentation of spaCy's Cython API.

(Todo for when we publish this: rewrite `/api/#section-cython` and `/api/#cython` to `/api/cython#conventions`.)

### Types of change
docs

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-06-11 17:47:46 +02:00
GolanLevy 72d7e80f94 adding a missing apostrophe (#2436) 2018-06-11 17:47:24 +02:00
ines effb55d591 Adjust formatting [ci skip] 2018-06-11 00:29:13 +02:00
Nathan Breit ba6d2cf393 Add EpiTator to Universe (#2429) 2018-06-11 00:24:13 +02:00
Daniel Ruf d6d688914f chore: cache dependencies (#2418)
* chore: cache dependencies

* chore: add CLA
2018-06-11 00:22:41 +02:00
himkt 1a568f2e08 fix wrong documentations (#2423) 2018-06-11 00:21:06 +02:00
Bohdan Moskalevskyi d66292f767 fix UD data file extensions (#2425)
* fix UD data files extension

* add contributor agreement for msklvsk
2018-06-08 14:26:11 +02:00
Nour Shalabi a169b79092 Additions to Arabic stop words. (#2422)
* Additions to Arabic stop words.

* Create nourshalabi.md
2018-06-08 02:33:23 +02:00
Ines Montani 3f2e3cbd27 Add links to Reddit data (see #2401) 2018-05-31 16:22:43 +02:00
ines b8ef9c1000 Fix model names in conftest (see #2379) 2018-05-30 14:10:20 +02:00
ines 0baaf836cf Update formatting [ci skip] 2018-05-30 13:32:49 +02:00
ines 3913e18201 Add self-attentive-parser to universe (see #59) 2018-05-30 13:31:28 +02:00
Maciej c7d53348d7 Fix bug in CLI iob and ner converter (#2392) (fixes #2385)
* issue_2385 add tests for iob_to_biluo converter function

* issue_2385 fix and modify iob_to_biluo function to accept either iob or biluo tags in cli.converter

* issue_2385 add test to fix b char bug

* add contributor agreement

* fill contributor agreement
2018-05-30 12:28:44 +02:00
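The conversion this commit touches can be illustrated with a short sketch. This is not the code from the PR: the function name `iob_to_biluo_sketch` is hypothetical, and the sketch assumes tags of the form `O` or `<prefix>-<label>`. It leaves `L-` and `U-` tags untouched, so input that is already in BILUO form passes through unchanged.

```python
def iob_to_biluo_sketch(tags):
    """Convert a list of IOB tags to BILUO (illustrative sketch only)."""
    converted = []
    for i, tag in enumerate(tags):
        # Treat the end of the sequence like an "O" tag
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # "O" and tags that are already BILUO-specific are kept as they are
        if tag == "O" or tag[:2] in ("L-", "U-"):
            converted.append(tag)
            continue
        # Split by position, so letters inside the label are never rewritten
        prefix, label = tag[:2], tag[2:]
        continues = next_tag in ("I-" + label, "L-" + label)
        if prefix == "B-":
            # A "B-" tag with no continuation is a single-token entity
            converted.append(tag if continues else "U-" + label)
        elif prefix == "I-":
            # An "I-" tag with no continuation closes the entity
            converted.append(tag if continues else "L-" + label)
        else:
            raise ValueError("Unexpected tag: {}".format(tag))
    return converted


print(iob_to_biluo_sketch(["B-PER", "I-PER", "O", "B-LOC"]))
```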
ines 605c663a4c Fix HTML merger examples (see #2390) 2018-05-30 12:22:32 +02:00
ansgar-t 9732988951 escape html in displacy.render (#2378) (closes #2361)
## Description
Fix for issue #2361 :
replace &, <, >, " with &amp;, &lt;, &gt;, &quot; before rendering the SVG

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [ ] I ran the tests, and all new and existing tests passed.
(As discussed in the comments to #2361)
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-05-28 18:36:41 +02:00
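The escaping described in this commit can be sketched in a few lines. This is a minimal illustration, not the code from the PR: the function name `escape_html_sketch` is hypothetical, and only the four characters named in the description are handled. Python's standard library offers `html.escape` for the same purpose (with its default `quote=True` it also escapes single quotes).

```python
def escape_html_sketch(text):
    """Replace &, <, > and " with HTML entities (illustrative sketch only)."""
    # "&" must be replaced first, otherwise the "&" of the other
    # entities would be escaped a second time
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    text = text.replace('"', "&quot;")
    return text


print(escape_html_sketch('a < b & "c"'))
```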
Samuel Pouyt d85494bfae Added agreement (#2374) 2018-05-26 18:19:08 +02:00
Samuel Pouyt 5f988b8e9c Update _custom.jade (#2372)
Based on the docs and on trying it out, the `en` or `[lang]` argument seems to be missing from the `spacy model-init` command.
2018-05-26 18:17:12 +02:00
ines d84a830d79 Merge branch 'master' of https://github.com/explosion/spaCy 2018-05-26 17:57:05 +02:00
ines fb923b31ea Fix bad HTML example (see #2376) and turn it into section on matcher + components
Avoid problems caused by merging while matching (e.g. index errors). Creating a Matcher component also better reflects the recommended best practices.
2018-05-26 17:57:02 +02:00
James Messinger 4515e96e90 Better formatting for `spacy train` CLI (#2357)
* Better formatting for `spacy train` CLI

Changed to use fixed-spaces rather than tabs to align table headers and data.

### Before:
```
Itn.    P.Loss  N.Loss  UAS     NER P.  NER R.  NER F.  Tag %   Token %
0       4618.857        2910.004        76.172  79.645  67.987  88.732  88.261  100.000 4436.9  6376.4
1       4671.972        3764.812        74.481  78.046  62.374  82.680  88.377  100.000 4672.2  6227.1
2       4742.756        3673.473        71.994  77.380  63.966  84.494  90.620  100.000 4298.0  5983.9
```

### After:
```
Itn.  Dep Loss  NER Loss  UAS     NER P.  NER R.  NER F.  Tag %   Token %  CPU WPS  GPU WPS
0     4618.857  2910.004  76.172  79.645  67.987  88.732  88.261  100.000  4436.9   6376.4
1     4671.972  3764.812  74.481  78.046  62.374  82.680  88.377  100.000  4672.2   6227.1
2     4742.756  3673.473  71.994  77.380  63.966  84.494  90.620  100.000  4298.0   5983.9
```

* Added contributor file
2018-05-25 13:08:45 +02:00
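The difference between the two tables comes down to padding each cell to a fixed width instead of separating cells with tabs. A minimal sketch of that idea follows. The helper `format_row` and the column widths are assumptions made for illustration, not the code used by `spacy train`.

```python
def format_row(values, widths, gap=2):
    """Pad each cell to a fixed width and join cells with spaces."""
    cells = [str(value).ljust(width) for value, width in zip(values, widths)]
    # Strip the padding after the last cell to avoid trailing whitespace
    return (" " * gap).join(cells).rstrip()


# Hypothetical widths for the first four columns of the table
widths = [4, 8, 8, 6]
print(format_row(["Itn.", "Dep Loss", "NER Loss", "UAS"], widths))
print(format_row([0, "4618.857", "2910.004", "76.172"], widths))
```

Because every cell is padded to the width of its column, headers and data stay aligned regardless of how the terminal renders tab stops.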
Shantam Raj 592834183a corrected spelling (#2359)
changed **interpretted** to **interpreted**
2018-05-24 13:29:52 +02:00
ines 8adb967e0c Fix from source quickstart instructions for Windows
See: https://stackoverflow.com/a/50478036/6400719
2018-05-24 12:42:16 +02:00
Aristo Rinjuang 432ede04af adding more words and rephrasing (#2351)
* adding more words and rephrasing

* adding a contributor

* tokenizer bugs solved
2018-05-24 11:40:57 +02:00
Jani Monoses ec62cadf4c Updates to Romanian support (#2354)
* Add back Romanian in conftest

* Romanian lex_attr

* More tokenizer exceptions for Romanian

* Add tests for some Romanian tokenizer exceptions
2018-05-24 11:40:00 +02:00
Shantam Raj 1a4682dd0b Update _training.jade (#2340)
* Update _training.jade

Correcting grammar. Replacing "The" with "To".

* Create armsp.md

* Update armsp.md
2018-05-21 11:09:33 +02:00
cclauss f7dcaa1f6b Simplify is_config() and normalize_string_keys() (#2305)
* Simplify is_config() and normalize_string_keys()

* Use __in__ to avoid the nested _ands_ and _ors_.
* Dict comprehension directly tracks with the doc string

* Keep more basic loop in normalize_string_keys

* Whitespace
2018-05-21 01:54:35 +02:00
ines ff1082d8e4 Add version tag in CLI docs [ci skip] 2018-05-21 01:17:49 +02:00
Ines Montani d4cc736b7c 💫 Improve model downloads: check for existing install, customise pip and use requests library again (#2346)
* Go back to using requests instead of urllib (closes #2320)

Fewer dependencies are good, but this one was simply causing too many other problems around SSL verification and Python 2/3 compatibility. requests is a popular enough package that it's okay for spaCy to depend on it – and this will hopefully make model downloads less flakey.

* Only download model if not installed (see #1456)

Use #egg=model==version to allow pip to check for existing installations. The download is only started if no installation matching the package/version is found. Fixes a long-standing inconvenience.

* Pass additional options to pip when installing model (resolves #1456)

Treat all additional arguments passed to the download command as pip options to allow user to customise the command. For example:

python -m spacy download en --user

* Add CLI option to enable installing model package dependencies

* Revert "Add CLI option to enable installing model package dependencies"

This reverts commit 9336ffe695.

* Update documentation
2018-05-20 20:26:56 +02:00
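The way extra arguments are handed to pip can be sketched as follows. This is a hedged illustration, not spaCy's implementation: the function name `build_pip_install_command`, the package name `en_model` and the URL are placeholders. Only the `#egg=model==version` fragment and the pass-through of user options come from the commit message above.

```python
import sys


def build_pip_install_command(download_url, pip_args=()):
    """Build a pip command that forwards user-supplied options (sketch)."""
    # "python -m pip" runs pip for the interpreter that is currently in use
    return [sys.executable, "-m", "pip", "install"] + list(pip_args) + [download_url]


# Placeholder URL: the "#egg=name==version" fragment tells pip which
# package and version the archive provides
url = "https://example.com/en_model-1.0.0.tar.gz#egg=en_model==1.0.0"
print(build_pip_install_command(url, ["--user"]))
```

The resulting list could be passed to `subprocess.call`. The sketch only builds the command and does not run it.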
ines b59e3b157f Don't require attrs argument in Doc.retokenize and allow both ints and unicode (resolves #2304) 2018-05-20 15:15:37 +02:00
ines 5768df4f09 Add SimpleFrozenDict util to use as default function argument 2018-05-20 15:13:37 +02:00
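The reason for wanting a frozen dict as a default argument is that a plain `{}` default is shared between calls, so any in-place change leaks into later calls. A minimal sketch of the idea, assuming a hypothetical class `FrozenDictSketch` and a hypothetical function `render` (neither is spaCy's code):

```python
class FrozenDictSketch(dict):
    """Dict that refuses in-place changes (illustrative sketch only)."""

    def _blocked(self, *args, **kwargs):
        raise NotImplementedError("this dict is read-only")

    # Route every mutating method to the same error
    __setitem__ = __delitem__ = _blocked
    pop = popitem = clear = update = setdefault = _blocked


def render(text, options=FrozenDictSketch()):
    # Copy before modifying, so the shared default is never changed
    opts = dict(options)
    opts.setdefault("lang", "en")
    return text, opts


print(render("hello"))
```

Reading from the default works like reading from any dict. An accidental write raises an error instead of silently changing the default for every later call.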
Matthew Honnibal 581d318971 Fix conftest 2018-05-15 00:54:45 +02:00
Tahar Zanouda 00417794d3 Add Arabic language (#2314)
* added support for Arabic lang

* added Arabic language support

* updated conftest
2018-05-15 00:27:19 +02:00
Jani Monoses 0e08e49e87 Lemmatizer ro (#2319)
* Add Romanian lemmatizer lookup table.

Adapted from http://www.lexiconista.com/datasets/lemmatization/
by replacing cedillas with commas (ș and ț).

The original dataset is licensed under the Open Database License.

* Fix one blatant issue in the Romanian lemmatizer

* Romanian examples file

* Add ro_tokenizer in conftest

* Add Romanian lemmatizer test
2018-05-12 15:20:04 +02:00
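The cedilla-to-comma replacement mentioned in this commit can be sketched with a translation table. This is an assumption about how such a cleanup might be done, not the script used for the lookup table. The function name `fix_romanian_diacritics` is hypothetical.

```python
# Map the cedilla forms to the comma-below forms used in Romanian
CEDILLA_TO_COMMA = str.maketrans({
    "\u015e": "\u0218",  # S with cedilla -> S with comma below
    "\u015f": "\u0219",  # s with cedilla -> s with comma below
    "\u0162": "\u021a",  # T with cedilla -> T with comma below
    "\u0163": "\u021b",  # t with cedilla -> t with comma below
})


def fix_romanian_diacritics(text):
    """Replace cedilla characters with their comma-below equivalents."""
    return text.translate(CEDILLA_TO_COMMA)


print(fix_romanian_diacritics("\u015fi \u0163ar\u0103"))
```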
vishnumenon ae3719ece5 Fix the code for FACILITY entities (#2324)
* Fix the code for FACILITY entities

As far as I can tell, the default models all use "FAC" rather than "FACILITY"

* Added my Contributor Agreement

* Rename vishnumenon to vishnumenon.md
2018-05-12 15:19:17 +02:00
Jani Monoses 42b34832e4 Update Romanian stopword list (#2316)
* Contributor agreement for janimo

* Update Romanian stopword list

Include the correct spellings of all the words already in the repo
that are using cedillas (ş and ţ) instead of commas (ș and ț).

Add another unrelated spelling fix.

See https://github.com/stopwords-iso/stopwords-ro/pull/1 and
https://github.com/stopwords-iso/stopwords-ro/pull/2
2018-05-10 12:16:56 +02:00
Lucas Abbade 18af53014f Adding my contributor agreement (#2315)
* Create LRAbbade.md

* Update LRAbbade.md
2018-05-09 21:25:05 +02:00
Lucas Abbade be7fdc59d1 Update lex_attrs.py (#2307)
* Update lex_attrs.py

Fixed spelling mistakes of some numbers (according to Brazilian Portuguese).

* Update lex_attrs.py

As requested, I've included the correct spelling for both Brazilian Portuguese and Portuguese Portuguese.

I would advise, however, that the two be separated in the future. Brazilian Portuguese is a very different language from the original one: although most of the writing is unified, the way people talk in the two countries is radically different. Keeping both languages as one may lead to bigger issues in the future, especially when it comes to spell checking.
2018-05-09 20:49:31 +02:00
mauryaland 5368ba028a Update stop_words.py for French language (#2310)
* Add contraction forms of some common stopwords

All the stopwords added contain the apostrophe " ' " or " ’ ".

* Adds contributor agreement mauryaland

* Update mauryaland.md
2018-05-09 12:04:38 +02:00
ines 7a3599c21a Fix formatting and consistency 2018-05-07 23:02:11 +02:00
ines 37facf9b4d Add config for no-response [ci skip] 2018-05-07 22:04:54 +02:00
ines ac25bc4016 Add docs section on sentence segmentation [ci skip] 2018-05-07 21:25:20 +02:00
ines 14148cd147 Fix formatting and wording 2018-05-07 21:24:35 +02:00
ines f803da609f Add scattertext [ci skip] 2018-05-07 19:10:23 +02:00
ines a685fff875 Merge branch 'master' of https://github.com/explosion/spaCy 2018-05-07 18:58:57 +02:00
ines e2241c797c Add lock-threads configuration [ci skip] 2018-05-07 18:54:22 +02:00
B! 414f5270b3 B Cavello's signed Contributor Agreement v2 (#2302)
This time hopefully created in the right spot. (Sorry about that!)
2018-05-07 17:48:54 +02:00