3058 Commits

Author SHA1 Message Date
Henning Peters b8f63071eb add lang registration facility 2016-03-25 18:54:45 +01:00
Matthew Honnibal 9cd21ad5b5 Merge pull request #284 from olegzd/olegzd/example/inventoryCount
Added reloadable English() example for inventory counting
2016-03-25 09:48:47 +11:00
Matthew Honnibal 4a37fdcee1 Merge pull request #287 from wbwseeker/deproj_sentbnd_bug
add function to Token for setting head and dep (and dep_)
2016-03-25 09:47:45 +11:00
Stefan Behnel f18805ee1c make StringStore.__contains__() return True for the empty string (which is also contained in iteration) 2016-03-24 15:42:12 +01:00
Stefan Behnel f2cfbfc412 remove internal redundancy and overhead from StringStore 2016-03-24 15:25:27 +01:00
Wolfgang Seeker d65ef41d08 make error messages language independent 2016-03-24 11:47:09 +01:00
Henning Peters 963570aa49 Merge branch 'master' of github.com:spacy-io/spaCy 2016-03-24 11:19:47 +01:00
Henning Peters a7d7ea3afa first idea for supporting multiple langs in download script 2016-03-24 11:19:43 +01:00
Wolfgang Seeker 5080077097 revert init_model.py back to pre-german state (because it makes more sense)
simplify token.n_rights and token.n_lefts
2016-03-21 16:10:25 +01:00
Matthew Honnibal a862edc0e6 Merge pull request #296 from elyase/patch-2
make use of log_smooth_count
2016-03-19 06:50:30 +11:00
Yaser Martinez Palenzuela 3c210f45fa make use of log_smooth_count 2016-03-17 12:19:52 +01:00
Wolfgang Seeker 5e2e8e951a add baseclass DocIterator for iterators over documents
add classes for English and German noun chunks

the respective iterators are set for the document when created by the parser
as they depend on the annotation scheme of the parsing model
2016-03-16 15:53:35 +01:00
Matthew Honnibal 80134eb12d Merge branch 'master' of https://github.com/spacy-io/spaCy 2016-03-15 19:14:50 +00:00
Matthew Honnibal eaccbcda0f Fix bug in pos_tag.py script 2016-03-16 06:04:14 +11:00
Wolfgang Seeker 2ae253ef5b changed head.__set__ to make it simpler 2016-03-14 13:43:48 +01:00
Henning Peters 8f870854c4 move bootstrap script to gist 2016-03-14 11:32:20 +01:00
Henning Peters c12d3dd200 add __init__.py to empty package dirs 2016-03-14 11:28:03 +01:00
Henning Peters 54f3447b5f cleanup 2016-03-14 01:46:33 +01:00
Henning Peters 8ef5b6e126 cleanup 2016-03-13 19:52:13 +01:00
Henning Peters 1fe29c6919 cleanup 2016-03-13 18:12:32 +01:00
Henning Peters 9f628688ce cleanup 2016-03-12 14:31:39 +01:00
Henning Peters 49f499ca1c cleanup 2016-03-12 14:30:24 +01:00
Henning Peters 5701686272 cleanup 2016-03-12 13:47:10 +01:00
Wolfgang Seeker 46e3f979f1 add function for setting head and label to token
change PseudoProjectivity.deprojectivize to use these functions
2016-03-11 17:31:06 +01:00
Matthew Honnibal b37571063a Merge pull request #286 from gushecht/patch-2
added batch_size as keyword argument
2016-03-11 09:46:36 +11:00
Gus Hecht feefe64ab2 added batch_size as keyword argument
There's probably a better default value....
2016-03-10 14:16:34 -08:00
Wolfgang Seeker 03fb498dbe introduce lang field for LexemeC to hold language id
put noun_chunk logic into iterators.py for each language separately
2016-03-10 13:01:34 +01:00
Oleg Zdornyy a774131671 Added reloadable English() example for inv. count 2016-03-09 19:35:55 -08:00
Wolfgang Seeker bc9c62e279 replace Language functions with corresponding orth functions
implement punctuation functions in orth
2016-03-09 18:07:37 +01:00
Wolfgang Seeker d9312bc9ea add new files npchunks.{pyx,pxd} to hold noun phrase chunk generators 2016-03-09 16:18:48 +01:00
Matthew Honnibal 1508528c8c * Increment version 2016-03-08 15:58:45 +00:00
Matthew Honnibal 963fe5258e * Add missing __contains__ method to vocab 2016-03-08 15:49:10 +00:00
Matthew Honnibal 478aa21cb0 * Remove broken __reduce__ method on vocab 2016-03-08 15:48:21 +00:00
Matthew Honnibal 20235bde00 Merge pull request #282 from henningpeters/switch_vectors
initial proposal for ability to switch vectors
2016-03-09 01:39:41 +11:00
Henning Peters 5b3b3ebc8e upgrade to latest sputnik 2016-03-08 15:30:17 +01:00
Henning Peters eb7ae61b1c cleanup api 2016-03-08 12:59:18 +01:00
Henning Peters b740f20191 hash_string() should not depend on python's internal unicode representation, also fixes https://github.com/spacy-io/sense2vec/issues/5 for py2 2016-03-06 09:19:27 +01:00
Henning Peters aa4d964c14 cleanup api 2016-03-05 17:51:32 +01:00
Henning Peters 931c07a609 initial proposal for separate vector package 2016-03-04 11:09:06 +01:00
Wolfgang Seeker 7adbd7a785 replace Counter with normal dict 2016-03-03 21:36:27 +01:00
Wolfgang Seeker 1ae487a4f6 add backwards compatibility with python 2.6 2016-03-03 21:18:12 +01:00
Wolfgang Seeker 9d1e6de4a0 make a proper list from zip iterator 2016-03-03 19:51:01 +01:00
Wolfgang Seeker 49f9d1c085 change test_nonproj.py to not use zip inside numpy.asarray 2016-03-03 19:42:09 +01:00
Wolfgang Seeker 72b8df0684 turned PseudoProjectivity into a normal python class 2016-03-03 19:05:08 +01:00
Matthew Honnibal fcaa0ad7ce Merge pull request #280 from wbwseeker/german_parser
German parser
2016-03-04 03:27:42 +11:00
Wolfgang Seeker 690c5acabf adjust train.py to train both english and german models 2016-03-03 15:21:00 +01:00
Matthew Honnibal 9d51e4d13c Delete gather_freqs.py
This script was in a broken state, and should be unnecessary. The functionality is subsumed by `get_freqs.py`
2016-03-02 00:42:55 +11:00
Matthew Honnibal ae2b479312 Merge pull request #278 from elyase/patch-1
replace codecs.open with io.open
2016-03-02 00:41:23 +11:00
Yaser Martinez Palenzuela 1a93d7f725 replace codecs.open with io.open 2016-03-01 14:10:11 +01:00
Wolfgang Seeker 3448cb40a4 integrated pseudo-projective parsing into parser
- nonproj.pyx holds a class PseudoProjectivity which currently holds
  all functionality to implement Nivre & Nilsson 2005's pseudo-projective
  parsing using the HEAD decoration scheme
- changed lefts/rights in Token to account for possible non-projective
  structures
2016-03-01 10:09:08 +01:00