Matthew Honnibal
abf8b16d71
Add doc.retokenize() context manager (#2172)
This patch takes a step towards #1487 by introducing the
doc.retokenize() context manager, to handle merging spans, and soon
splitting tokens.
The idea is to do merging and splitting like this:
    with doc.retokenize() as retokenizer:
        for start, end, label in matches:
            retokenizer.merge(doc[start : end], attrs={'ent_type': label})
The retokenizer accumulates the merge requests, and applies them
together at the end of the block. This will allow retokenization to be
more efficient, and much less error prone.
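The accumulate-then-apply behaviour can be illustrated with a toy sketch. This is not spaCy's implementation: `ToyDoc` and `ToyRetokenizer` are hypothetical names, tokens are plain strings, merges are given as (start, end) offsets rather than `Span` objects, and the requested ranges are assumed not to overlap. It only shows why deferring the merges to the end of the block keeps offsets computed against the original tokens valid.

```python
class ToyRetokenizer:
    """Hypothetical stand-in: collects merge requests, applies them on exit."""

    def __init__(self, doc):
        self.doc = doc
        self.merges = []  # list of (start, end) token offsets, end exclusive

    def merge(self, start, end):
        # Only record the request; the token list is untouched for now.
        self.merges.append((start, end))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is not None:
            return False  # an error occurred in the block: apply nothing
        # Apply right-to-left so earlier offsets are not shifted by later merges.
        for start, end in sorted(self.merges, reverse=True):
            self.doc.tokens[start:end] = [" ".join(self.doc.tokens[start:end])]
        return False


class ToyDoc:
    """Hypothetical document holding a flat list of string tokens."""

    def __init__(self, tokens):
        self.tokens = list(tokens)

    def retokenize(self):
        return ToyRetokenizer(self)


doc = ToyDoc(["I", "like", "New", "York", "and", "San", "Francisco"])
matches = [(2, 4), (5, 7)]  # offsets computed against the original tokens
with doc.retokenize() as retokenizer:
    for start, end in matches:
        retokenizer.merge(start, end)
print(doc.tokens)
```

Merging eagerly inside the loop would shift the second pair of offsets after the first merge; collecting the requests and applying them in one pass avoids that class of bug.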
A retokenizer.split() function will then be added, to handle splitting a
single token into multiple tokens. These methods take `Span` and `Token`
objects; if the user wants to go directly from offsets, they can append
to the .merges and .splits lists on the retokenizer.
The doc.merge() method's behaviour remains unchanged, so this patch
should be 100% backwards compatible (modulo bugs). Internally,
doc.merge() fixes up the arguments (to handle the various deprecated styles),
opens the retokenizer, and makes the single merge.
We can later start emitting deprecation warnings on direct calls to doc.merge(),
to migrate people to the retokenize context manager.
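A minimal sketch of the wrapper pattern described here, a legacy method that opens the context manager, makes a single merge, and could later emit a deprecation warning. Everything in it is an assumption for illustration: `ToyDoc` is hypothetical, tokens are plain strings, and the warning text is invented; it does not reproduce spaCy's actual doc.merge() argument handling.

```python
import warnings
from contextlib import contextmanager


class ToyDoc:
    """Hypothetical document; tokens are plain strings."""

    def __init__(self, tokens):
        self.tokens = list(tokens)

    @contextmanager
    def retokenize(self):
        merges = []  # (start, end) requests collected inside the block
        yield merges
        # Runs only if the block finished without raising.
        for start, end in sorted(merges, reverse=True):
            self.tokens[start:end] = [" ".join(self.tokens[start:end])]

    def merge(self, start, end):
        # Legacy entry point: warn, then delegate a single merge.
        warnings.warn(
            "ToyDoc.merge() is deprecated; use ToyDoc.retokenize() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        with self.retokenize() as merges:
            merges.append((start, end))


doc = ToyDoc(["New", "York", "is", "big"])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    doc.merge(0, 2)
print(doc.tokens)
print(caught[0].category.__name__)
```

Because the old method delegates to the new code path, both entry points share one implementation, and the warning can be switched on without changing behaviour.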
2018-04-03 14:10:35 +02:00
Matthew Honnibal
8308bbc617
Get msgpack and msgpack_numpy via Thinc, to avoid potential version conflicts
2018-03-29 00:14:55 +02:00
ines
366c98a94b
Remove requests dependency
2018-03-28 12:46:18 +02:00
ines
ce6071ca89
Remove ftfy dependency and update docs
2018-03-28 12:09:42 +02:00
ines
6d2c85f428
Drop six and related hacks as a dependency
2018-03-28 10:45:25 +02:00
ines
f5f4de98d1
Version-lock msgpack-python (see #2015)
2018-02-22 16:02:32 +01:00
ines
002ee80ddf
Add html5lib to setup.py to fix six error (see #1924)
2018-02-02 20:32:08 +01:00
Matthew Honnibal
2e449c1fbf
Fix compiler flags, addressing #1591
2018-01-14 14:34:36 +01:00
Matthew Honnibal
04a92bd75e
Pin msgpack-numpy requirement
2017-12-06 03:24:24 +01:00
Hugo
aa898ab4e4
Drop support for EOL Python 2.6 and 3.3
2017-11-26 19:46:24 +02:00
Matthew Honnibal
716ccbb71e
Require thinc 6.10.1
2017-11-15 14:59:34 +01:00
Matthew Honnibal
314f5b9cdb
Require thinc 6.10.0
2017-10-28 18:20:10 +00:00
Matthew Honnibal
64e4ff7c4b
Merge 'tidy-up' changes into branch. Resolve conflicts
2017-10-28 13:16:06 +02:00
ines
7946464742
Remove spacy.tagger (now in pipeline)
2017-10-27 19:45:04 +02:00
Matthew Honnibal
531142a933
Merge remote-tracking branch 'origin/develop' into feature/better-parser
2017-10-27 12:34:48 +00:00
Matthew Honnibal
642eb28c16
Don't compile with OpenMP by default
2017-10-27 10:16:58 +00:00
Matthew Honnibal
90d1d9b230
Remove obsolete parser code
2017-10-26 13:22:45 +02:00
Matthew Honnibal
79fcf8576a
Compile with march=native
2017-10-18 21:46:34 +02:00
Matthew Honnibal
2eb0fe4957
Fix setup.py
2017-10-03 21:40:04 +02:00
Matthew Honnibal
b49cc8153a
Require correct thinc
2017-09-26 10:00:18 -05:00
ines
68f66aebf8
Use pkg_resources instead of pip for is_package (resolves #1293)
2017-09-16 20:27:59 +02:00
Matthew Honnibal
07cdbd1219
Require thinc 6.8.1, for Windows
2017-09-15 22:47:53 +02:00
Matthew Honnibal
96a4a9070b
Compile _beam_utils
2017-08-18 21:56:19 +02:00
Matthew Honnibal
f9ae86b01c
Fix requirement
2017-08-18 20:56:53 +02:00
Matthew Honnibal
69bcacdc09
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-18 20:47:13 +02:00
Matthew Honnibal
de7f3509d2
Compile CFile, for vector loading
2017-08-18 20:46:41 +02:00
Matthew Honnibal
426f84937f
Resolve conflicts when merging new beam parsing stuff
2017-08-18 13:38:32 -05:00
Matthew Honnibal
60d8111245
Require thinc 6.8.1
2017-08-15 03:12:26 -05:00
Matthew Honnibal
52c180ecf5
Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
This reverts commit ea8de11ad5, reversing changes made to 08e443e083.
2017-08-14 13:00:23 +02:00
Matthew Honnibal
b353e4d843
Work on parser beam training
2017-08-12 14:47:45 -05:00
ines
495e042429
Add entry point-style auto alias for "spacy"
Simplest way to run commands as spacy xxx instead of python -m spacy
xxx, while avoiding environment conflicts
2017-08-09 12:17:30 +02:00
Matthew Honnibal
ff7418b0d9
Update requirements
2017-07-25 18:58:15 +02:00
Matthew Honnibal
b4cdd05466
Add vectors.pyx in setup
2017-06-05 12:45:29 +02:00
Matthew Honnibal
c811790095
Register vectors.pyx in setup
2017-06-05 12:32:22 +02:00
ines
152dc018a6
Remove syntax iterators from setup.py
2017-06-05 12:30:22 +02:00
Matthew Honnibal
a4dcc96c54
Require thinc bugfix
2017-06-05 04:02:52 -05:00
ines
71954d5fe7
Update Thinc version
2017-06-03 10:32:53 +02:00
ines
f45cd174bf
Update Thinc version
2017-06-02 18:48:16 +02:00
Matthew Honnibal
ae8010b526
Move weight serialization to Thinc
2017-06-01 02:56:12 -05:00
Matthew Honnibal
2e364f7ecd
Require msgpack
2017-05-29 13:47:29 +02:00
ines
3cc6fe1484
Add pip to requirements.txt and setup.py
2017-05-17 12:04:03 +02:00
Matthew Honnibal
48de4ed49f
Require thinc 6.6, and compile the nn_parser module
2017-05-14 01:20:28 +02:00
Matthew Honnibal
825c6403d8
Remove serializer
2017-05-09 17:28:30 +02:00
ines
564939391a
Remove spacy.orth
2017-05-09 01:21:47 +02:00
ines
229b8c3974
Tidy up
2017-05-07 18:36:35 +02:00
ines
a793174ae9
Use setuptools.find_packages()
2017-05-03 20:11:02 +02:00
Yasuaki Uechi
c8f83aeb87
Add basic Japanese support
2017-05-03 13:56:21 +09:00
Ines Montani
7da9cefd25
Merge pull request #1022 from luvogels/master
Initial support for Norwegian Bokmål
2017-04-27 11:16:06 +02:00
Ines Montani
417f430d23
Relax version constraint
2017-04-20 15:39:24 +02:00
Gyorgy Orosz
4a06a2572c
Using ftfy for handling broken encoded strings.
2017-04-20 13:34:51 +02:00