Commit Graph

9630 Commits

Author SHA1 Message Date
Matthew Honnibal c5f947f194 Fix regex deprecation warnings 2019-02-21 11:56:47 +01:00
Matthew Honnibal 7f02464494 Set version to v2.1.0a8.dev0 2019-02-21 11:42:23 +01:00
Matthew Honnibal f31dbec528 More fixes for #3112 2019-02-21 11:10:10 +01:00
Matthew Honnibal e485241003 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2019-02-21 10:33:35 +01:00
Matthew Honnibal 582be8746c Update multi_processing example 2019-02-21 10:33:16 +01:00
Matthew Honnibal 80195bc2d1 Fix issue #3288 (#3308) 2019-02-21 09:48:53 +01:00
Matthew Honnibal a137e8b418 Fix Pipe.to_bytes() when model uninitialized
Closes #3289
2019-02-21 09:42:02 +01:00
Matthew Honnibal 6574e4f2d3 Fix issue #3112 part 1 2019-02-21 09:27:38 +01:00
Matthew Honnibal b21481eeca Load token_match regex with .match, not .search 2019-02-21 09:09:03 +01:00
Sofie 9a478b6db8 Clean up of char classes, few tokenizer fixes and faster default French tokenizer (#3293)
* splitting up latin unicode interval

* removing hyphen as infix for French

* adding failing test for issue 1235

* test for issue #3002 which now works

* partial fix for issue #2070

* keep the hyphen as infix for French (as it was)

* restore french expressions with hyphen as infix (as it was)

* added succeeding unit test for Issue #2656

* Fix issue #2822 with custom Italian exception

* Fix issue #2926 by allowing numbers right before infix /

* splitting up latin unicode interval

* removing hyphen as infix for French

* adding failing test for issue 1235

* test for issue #3002 which now works

* partial fix for issue #2070

* keep the hyphen as infix for French (as it was)

* restore french expressions with hyphen as infix (as it was)

* added succeeding unit test for Issue #2656

* Fix issue #2822 with custom Italian exception

* Fix issue #2926 by allowing numbers right before infix /

* remove duplicate

* remove xfail for Issue #2179 fixed by Matt

* adjust documentation and remove reference to regex lib
2019-02-20 22:10:13 +01:00
Ines Montani 9696cf16c1 Merge branch 'master' into develop 2019-02-20 21:31:27 +01:00
Matthew Honnibal 0d1ca15b13 💫 Fix bugs in matcher extensions. Closes #1971 (#3301)
* Fix matching on extension attrs and predicates

* Fix detection of match_id when using extension attributes. The match
ID is stored as the last entry in the pattern. We were checking for this
with nr_attr == 0, which didn't account for extension attributes.

* Fix handling of predicates. The wrong count was being passed through,
so even patterns that didn't have a predicate were being checked.

* Fix regex pattern

* Fix matcher set value test
2019-02-20 21:30:39 +01:00
Ines Montani f73d01aa32 Update netlify.toml [ci skip] 2019-02-20 14:33:32 +01:00
Ines Montani da5edbe434 Tidy up 2019-02-20 14:33:23 +01:00
Michael Liberman 386cec1979 - Json fix in comment (#3294) 2019-02-19 18:01:35 +01:00
Ines Montani 3b667787a9 Add xfailing test for #3289 2019-02-18 16:45:04 +01:00
Ines Montani 57ae71ea95 Add docs on serializing the pipeline (see #3289) [ci skip] 2019-02-18 14:13:29 +01:00
Ines Montani 91f260f2c4 Add another test for #1971 2019-02-18 13:36:20 +01:00
Ines Montani f30aac324c Update test_issue1971.py 2019-02-18 13:36:15 +01:00
Ines Montani 38e4422c0d Improve matcher example (resolves #3287) 2019-02-18 13:26:37 +01:00
Ines Montani 660cfe44c5 Fix formatting 2019-02-18 13:26:22 +01:00
Ines Montani 8fa26ca97e Fix tensor shape in test for #3288 2019-02-18 11:01:54 +01:00
Ines Montani c32290557f Add xfailing test for #3288 2019-02-18 10:59:31 +01:00
Ines Montani c5476bd75b Update languages.json 2019-02-18 10:03:35 +01:00
Ines Montani 3fdcdec6a0 Merge branch 'master' into develop 2019-02-18 10:03:32 +01:00
Roshni Biswas e09f1347fa updates for Bengali language (#3286)
* Update morph_rules.py

* contributor agreement for roshni-b

* created example sentences
2019-02-18 10:02:28 +01:00
Ines Montani 212ff359ef Fix links [ci skip] 2019-02-17 22:25:50 +01:00
Ines Montani 04b4df0ec9 Remove n_threads 2019-02-17 22:25:42 +01:00
Ines Montani 4c7ab7620a Update README.md 2019-02-17 22:16:17 +01:00
Ines Montani 8a8523d8c1 Update README.md 2019-02-17 21:59:52 +01:00
Ines Montani e597110d31 💫 Update website (#3285)

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 19:31:19 +01:00
Ines Montani 043e8186f3 Merge branch 'master' into develop 2019-02-17 17:51:17 +01:00
Marc Puig 51268e9f21 Typo error fixed (#3284) 2019-02-17 17:51:02 +01:00
Ines Montani 3af0b2dd1c Add xfailing test for #1971 [ci skip] 2019-02-17 13:04:47 +01:00
Ines Montani 19a002bfd3 Merge branch 'master' into develop 2019-02-17 12:22:54 +01:00
Ines Montani 1e252b129c Auto-format 2019-02-17 12:22:07 +01:00
Roshni Biswas e26d923726 Update morph_rules.py (#3283) 2019-02-17 12:21:47 +01:00
Matthew Honnibal 7d4a52a4d0 Set version to v2.1.0a7 2019-02-16 17:48:34 +01:00
Matthew Honnibal 07617b6b7f Set version to v2.1.0a7.dev12 2019-02-16 17:30:29 +01:00
Matthew Honnibal 808ae7521b Require thinc 7.0.1 2019-02-16 17:29:57 +01:00
Matthew Honnibal 1dc314bada Set version to v2.1.0a7.dev11 2019-02-16 17:02:49 +01:00
Matthew Honnibal eea3001b98 Depend on thinc 7.0.1.dev2 2019-02-16 17:02:30 +01:00
Matthew Honnibal 2ef227c313 Set version to v2.1.0a7.dev1 2019-02-16 16:22:46 +01:00
Matthew Honnibal f456b673d4 Require thinc 7.0.1.dev1 2019-02-16 16:22:26 +01:00
Matthew Honnibal 22923b9cb1 Set version to v2.1.0a7.dev9 2019-02-16 15:47:19 +01:00
Matthew Honnibal 11e826ac3b Require thinc v7.0.1.dev0 2019-02-16 15:47:02 +01:00
Matthew Honnibal e0c91a4c8d Set version to 2.1.0a7 2019-02-16 14:43:38 +01:00
Matthew Honnibal 92b6bd2977 Refinements to retokenize.split() function (#3282)
* Change retokenize.split() API for heads

* Pass lists as values for attrs in split

* Fix test_doc_split filename

* Add error for mismatched tokens after split

* Raise error if new tokens don't match text

* Fix doc test

* Fix error

* Move deps under attrs

* Fix split tests

* Fix retokenize.split
2019-02-15 17:32:31 +01:00
Matthew Honnibal 2dbc61bc26 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2019-02-15 14:03:54 +01:00
Ines Montani 1aa57690dc Add xfailing test for orth mismatch in retokenizer.split 2019-02-15 13:55:04 +01:00