388 Commits

Author SHA1 Message Date
Paul O'Leary McCann 7dd21b66d5 Extras require mecab (#3024)
* Add note that Unidic is required for Japanese

This addresses #3001. -POLM

* Add extras_require for mecab with old version

Related to issue #3018.

* mecab → ja

Co-Authored-By: polm <polm@dampfkraft.com>
2018-12-08 06:34:49 +01:00
Justin DuJardin 33fca8672f fix issue compiling the latest spacy on MacOS 10.3.6 (#2998) 2018-12-02 05:51:11 +01:00
Matthew Honnibal 05b2336ffa Try again to fix OSX build 2018-12-01 03:12:21 +01:00
Matthew Honnibal 4895b2e830 Merge branch 'master' of https://github.com/explosion/spaCy 2018-12-01 02:37:21 +01:00
Matthew Honnibal 3f16af123e Try to fix OSX build error 2018-12-01 02:36:56 +01:00
Matthew Honnibal 61abb1ef70 Remove msgpack dependency, to try to fix #2995 2018-12-01 02:36:41 +01:00
Matthew Honnibal 9e2ff2f583 Fix regex pin to harmonize with conda (#2964) 2018-11-26 19:28:54 +01:00
Matthew Honnibal e2ae25d6f5 Try setting older regex version, to align with conda 2018-10-29 13:39:00 +01:00
Matthew Honnibal a2745d310e Revert "Update regex version"
This reverts commit 62358dd867.
2018-10-28 16:38:56 +01:00
Matthew Honnibal 62358dd867 Update regex version 2018-10-28 16:27:50 +01:00
Ines Montani fd750ec3bf Fix msgpack-numpy version pin 2018-10-15 14:18:38 +02:00
Ines Montani 051a6b73eb Update Thinc version pin 2018-10-15 01:40:28 +02:00
Matthew Honnibal 7202abdfa9 Fix specifiers for GPU 2018-10-15 00:08:44 +02:00
Matthew Honnibal b305b24c24 Require thinc 6.10.6 2018-10-14 23:28:41 +02:00
Matthew Honnibal 6e6f6be3f5 Update requirements and setup.py 2018-10-14 23:06:46 +02:00
Ines Montani 9ebe607f82 Add wheel to setup_requires 2018-10-14 16:38:48 +02:00
Ines Montani 2e675d9523 Update murmurhash pin 2018-10-14 16:37:38 +02:00
Matthew Honnibal f784e42ffe Try older version of regex 2018-10-03 00:23:40 +02:00
Matthew Honnibal e4fd2ccd07 Try previous version of regex 2018-10-02 23:37:17 +02:00
Matthew Honnibal 9937ff93e5 Update regex version dependency 2018-10-02 19:43:59 +02:00
Matthew Honnibal 05b6103a0c Try to fix version pin for msgpack-numpy 2018-09-28 14:07:00 +02:00
Matthew Honnibal 276aa83d1a Require older msgpack-numpy 2018-09-27 15:34:24 +02:00
Matthew Honnibal 7be9118be3 Require numpy>=1.15.0 to avoid the RuntimeWarning 2018-08-10 00:14:13 +02:00
Matthew Honnibal cabce07ba6 Fix thinc version requirement 2018-07-21 15:56:33 +02:00
Matthew Honnibal a723fafea3 Require thinc 6.10.3.dev1 2018-07-21 12:49:09 +02:00
ines 95641f4026 Only install pathlib backport on Python < 3.4 2018-07-20 21:08:29 +02:00
Matthew Honnibal adde3826e2 Build against thinc 6.10.3.dev0 2018-07-20 13:34:54 +02:00
Ines Montani d4cc736b7c 💫 Improve model downloads: check for existing install, customise pip and use requests library again (#2346)
* Go back to using requests instead of urllib (closes #2320)

Fewer dependencies are good, but this one was simply causing too many other problems around SSL verification and Python 2/3 compatibility. requests is a popular enough package that it's okay for spaCy to depend on it – and this will hopefully make model downloads less flakey.

* Only download model if not installed (see #1456)

Use #egg=model==version to allow pip to check for existing installations. The download is only started if no installation matching the package/version is found. Fixes a long-standing inconvenience.

* Pass additional options to pip when installing model (resolves #1456)

Treat all additional arguments passed to the download command as pip options to allow the user to customise the command. For example:

python -m spacy download en --user

* Add CLI option to enable installing model package dependencies

* Revert "Add CLI option to enable installing model package dependencies"

This reverts commit 9336ffe695.

* Update documentation
2018-05-20 20:26:56 +02:00
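
The download commit above describes two mechanisms: an `#egg=model==version` fragment on the download target so pip can check for an existing install, and forwarding any extra command-line arguments to pip. A minimal sketch of how such a command might be assembled follows. The helper name `build_pip_command`, the placeholder URL, and the model name are hypothetical and are not taken from spaCy's source.

```python
import sys


def build_pip_command(download_url, model_name, version, pip_args=()):
    """Assemble a pip install command for a model archive (hypothetical helper)."""
    # Per the commit message, the #egg=name==version fragment is what lets
    # pip check whether a matching package is already installed.
    target = "{}#egg={}=={}".format(download_url, model_name, version)
    # Extra CLI arguments (for example --user) are forwarded to pip unchanged.
    return [sys.executable, "-m", "pip", "install"] + list(pip_args) + [target]


cmd = build_pip_command(
    "https://example.com/en_model-1.0.0.tar.gz",  # placeholder URL, an assumption
    "en_model",                                   # placeholder model name
    "1.0.0",
    pip_args=["--user"],
)
print(" ".join(cmd[1:]))
```

The sketch only builds the argument list; running it (for example with `subprocess.call(cmd)`) is left out so the example has no side effects.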
Matthew Honnibal abf8b16d71 Add doc.retokenize() context manager (#2172)
This patch takes a step towards #1487 by introducing the
doc.retokenize() context manager, to handle merging spans, and soon
splitting tokens.

The idea is to do merging and splitting like this:

with doc.retokenize() as retokenizer:
    for start, end, label in matches:
        retokenizer.merge(doc[start : end], attrs={'ent_type': label})

The retokenizer accumulates the merge requests, and applies them
together at the end of the block. This will allow retokenization to be
more efficient, and much less error prone.

A retokenizer.split() function will then be added, to handle splitting a
single token into multiple tokens. These methods take `Span` and `Token`
objects; if the user wants to go directly from offsets, they can append
to the .merges and .splits lists on the retokenizer.

The doc.merge() method's behaviour remains unchanged, so this patch
should be 100% backwards compatible (modulo bugs). Internally,
doc.merge() fixes up the arguments (to handle the various deprecated styles),
opens the retokenizer, and makes the single merge.

We can later start making deprecation warnings on direct calls to doc.merge(),
to migrate people to use of the retokenize context manager.
2018-04-03 14:10:35 +02:00
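
The retokenize commit above describes a context manager that accumulates merge requests and applies them together when the block exits. The toy sketch below illustrates that accumulate-then-apply pattern on a plain list of strings. It is not spaCy's implementation: `ToyRetokenizer`, the free function `retokenize`, and the index-based `merge(start, end)` signature are all hypothetical, and the sketch assumes the requested spans do not overlap.

```python
from contextlib import contextmanager


class ToyRetokenizer:
    """Hypothetical stand-in that only records merge requests."""

    def __init__(self):
        self.merges = []

    def merge(self, start, end):
        # Record the request; the token list is not modified yet.
        self.merges.append((start, end))


@contextmanager
def retokenize(tokens):
    retokenizer = ToyRetokenizer()
    yield retokenizer
    # On a clean exit from the block, apply all merges at once.
    # Working right to left keeps the earlier indices valid.
    for start, end in sorted(retokenizer.merges, reverse=True):
        tokens[start:end] = [" ".join(tokens[start:end])]


tokens = ["New", "York", "is", "in", "New", "York", "State"]
with retokenize(tokens) as retokenizer:
    retokenizer.merge(0, 2)
    retokenizer.merge(4, 7)
print(tokens)
```

Deferring the work to the end of the block means every request can refer to the original token indices, which is one reason batching of this kind tends to be less error prone than merging one span at a time.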
Matthew Honnibal 8308bbc617 Get msgpack and msgpack_numpy via Thinc, to avoid potential version conflicts 2018-03-29 00:14:55 +02:00
ines 366c98a94b Remove requests dependency 2018-03-28 12:46:18 +02:00
ines ce6071ca89 Remove ftfy dependency and update docs 2018-03-28 12:09:42 +02:00
ines 6d2c85f428 Drop six and related hacks as a dependency 2018-03-28 10:45:25 +02:00
ines f5f4de98d1 Version-lock msgpack-python (see #2015) 2018-02-22 16:02:32 +01:00
ines 002ee80ddf Add html5lib to setup.py to fix six error (see #1924) 2018-02-02 20:32:08 +01:00
Matthew Honnibal 2e449c1fbf Fix compiler flags, addressing #1591 2018-01-14 14:34:36 +01:00
Matthew Honnibal 04a92bd75e Pin msgpack-numpy requirement 2017-12-06 03:24:24 +01:00
Hugo aa898ab4e4 Drop support for EOL Python 2.6 and 3.3 2017-11-26 19:46:24 +02:00
Matthew Honnibal 716ccbb71e Require thinc 6.10.1 2017-11-15 14:59:34 +01:00
Matthew Honnibal 314f5b9cdb Require thinc 6.10.0 2017-10-28 18:20:10 +00:00
Matthew Honnibal 64e4ff7c4b Merge 'tidy-up' changes into branch. Resolve conflicts 2017-10-28 13:16:06 +02:00
ines 7946464742 Remove spacy.tagger (now in pipeline) 2017-10-27 19:45:04 +02:00
Matthew Honnibal 531142a933 Merge remote-tracking branch 'origin/develop' into feature/better-parser 2017-10-27 12:34:48 +00:00
Matthew Honnibal 642eb28c16 Don't compile with OpenMP by default 2017-10-27 10:16:58 +00:00
Matthew Honnibal 90d1d9b230 Remove obsolete parser code 2017-10-26 13:22:45 +02:00
Matthew Honnibal 79fcf8576a Compile with march=native 2017-10-18 21:46:34 +02:00
Matthew Honnibal 2eb0fe4957 Fix setup.py 2017-10-03 21:40:04 +02:00
Matthew Honnibal b49cc8153a Require correct thinc 2017-09-26 10:00:18 -05:00
ines 68f66aebf8 Use pkg_resources instead of pip for is_package (resolves #1293) 2017-09-16 20:27:59 +02:00
Matthew Honnibal 07cdbd1219 Require thinc 6.8.1, for Windows 2017-09-15 22:47:53 +02:00