spaCy

Commit Graph

Author	SHA1	Message	Date
Matthew Honnibal	4ff92184f1	Improve train_ud script	2017-01-09 09:53:46 -06:00
Matthew Honnibal	c1ef07788c	Update train_ud.py Create deps folder if it doesn't exist.	2017-01-09 10:55:44 +11:00
Matthew Honnibal	46e98ec029	Move init_model.py script from repo. These meta-tools should live elsewhere	2016-12-18 14:03:40 +01:00
dafnevk	cdf5dcc40a	fixed bug in init_model so that it runs for dutch	2016-12-13 14:33:44 +01:00
Matthew Honnibal	c7889492f9	Fix model saving error for Python 3	2016-11-25 18:04:30 -06:00
Matthew Honnibal	22189e60db	Use unicode literals in train_ud	2016-11-25 17:45:45 -06:00
Matthew Honnibal	da5f0cce36	Fix train_ud script, which trains models from the Universal Dependencies format.	2016-11-25 11:19:33 -06:00
Matthew Honnibal	314bc8d34f	Fix train script for 1.0	2016-11-25 08:57:37 -06:00
Matthew Honnibal	bd1bfcca61	Update train.py	2016-10-13 03:23:48 +02:00
Matthew Honnibal	ea23b64cc8	Refactor training, with new spacy.train module. Defaults still a little awkward.	2016-10-09 12:24:24 +02:00
Matthew Honnibal	53fbd3dd1c	Fix train.py for v1.0.0-rc1	2016-10-05 01:11:46 +02:00
Matthew Honnibal	ae202e7a60	Fix init_model.py	2016-09-25 15:58:51 +02:00
Matthew Honnibal	af847e07fc	Fix usage of pathlib for Python3 -- turning paths to strings.	2016-09-24 21:05:27 +02:00
Matthew Honnibal	d310dc73ef	Fix bin/init_model.py after refactoring	2016-09-24 20:38:18 +02:00
Matthew Honnibal	8036368d96	* Fix model saving	2016-05-23 12:01:46 +00:00
Matthew Honnibal	35214053fd	* Work around get_lex_attr bug introduced during German parsing	2016-05-23 10:53:00 +00:00
Wolfgang Seeker	dae6bc05eb	define German dummy lemmatizer until morphology is done	2016-05-02 16:04:53 +02:00
Matthew Honnibal	8569dbc2d0	* Add initial stuff for Chinese parsing	2016-04-24 18:44:24 +02:00
Wolfgang Seeker	f9150ccf2a	rename vectors.tgz to vectors.bz2 because it's not compressed with gzip but bzip	2016-04-08 13:38:07 +02:00
Wolfgang Seeker	a8f4e49900	update init_model.py to previous (better) state	2016-03-29 16:12:13 +02:00
Matthew Honnibal	d249e2f7f3	* Improve error message in bin/parser/train.py	2016-03-29 13:04:33 +11:00
Yaser Martinez Palenzuela	3c210f45fa	make use of log_smooth_count	2016-03-17 12:19:52 +01:00
Matthew Honnibal	fcaa0ad7ce	Merge pull request #280 from wbwseeker/german_parser German parser	2016-03-04 03:27:42 +11:00
Wolfgang Seeker	690c5acabf	adjust train.py to train both english and german models	2016-03-03 15:21:00 +01:00
Matthew Honnibal	9d51e4d13c	Delete gather_freqs.py This script was in a broken state, and should be unnecessary. The functionality is subsumed by `get_freqs.py`	2016-03-02 00:42:55 +11:00
Yaser Martinez Palenzuela	1a93d7f725	replace codecs.open with io.open	2016-03-01 14:10:11 +01:00
Wolfgang Seeker	eae35e9b27	add tokenizer files for German, add/change code to train German pos tagger - add files to specify rules for German tokenization - change generate_specials.py to generate from an external file (abbrev.de.tab) - copy gazetteer.json from lang_data/en/ - init_model.py - change doc freq threshold to 0 - add train_german_tagger.py - expects conll09-formatted input	2016-02-18 13:24:20 +01:00
Henning Peters	a89ca6537b	fix cythonize	2016-02-05 16:17:23 +01:00
Henning Peters	3a50448bf3	py3 compatibility	2016-02-05 15:43:50 +01:00
Henning Peters	7627969aba	refactor, listen on setup.py, *.pxd	2016-02-05 15:37:00 +01:00
Matthew Honnibal	5dc6cffc67	* Fix gather_freqs.py	2016-02-04 20:21:58 +01:00
Matthew Honnibal	e2ed6251d7	* Fancy up the CLI for the conll train script	2016-02-02 22:58:06 +01:00
Matthew Honnibal	a676d66807	* Update the CoNLL train script, to get working on other languages	2016-02-02 22:29:34 +01:00
Henning Peters	73674a4afb	try using system-wide headers	2015-12-13 12:51:23 +01:00
Henning Peters	92fabd0114	wrap virtualenv around cythonize	2015-12-13 12:32:22 +01:00
Henning Peters	9662cf04c9	new approach to dependency headers	2015-12-13 11:53:02 +01:00
Matthew Honnibal	6e68b344c1	* Train after parsing, not before.	2015-11-12 04:43:52 +11:00
Matthew Honnibal	4fb038a9eb	* Update conll_train.py script for spaCy v0.97	2015-10-31 00:53:51 +11:00
Matthew Honnibal	cfaa4bde5d	* Add train and parse scripts that use CoNLL formatted data	2015-10-30 12:54:49 +11:00
Matthew Honnibal	2348a08481	* Load/dump strings with a json file, instead of the hacky strings file we were using.	2015-10-22 21:13:03 +11:00
Matthew Honnibal	0ce12e4548	* Import io in get_freqs	2015-10-19 12:56:18 +11:00
Matthew Honnibal	17fffb4c57	* Update get_freqs.py script	2015-10-16 04:33:49 +11:00
Matthew Honnibal	5ff4454177	* Update get_freqs.py script	2015-10-16 04:31:15 +11:00
Matthew Honnibal	a748146dd3	* Update get_freqs.py script	2015-10-16 04:24:50 +11:00
Matthew Honnibal	a29fd79fbc	* Update get_freqs.py script	2015-10-16 04:24:08 +11:00
Matthew Honnibal	e08a4b46a2	* Update get_freqs.py script	2015-10-16 04:20:35 +11:00
Matthew Honnibal	92f750cf8b	* Use a gzipped frequencies file in init_model	2015-10-11 06:59:44 +02:00
Matthew Honnibal	064bd69ad0	* Refactor symbols, so that frequency rank can be derived from the orth id of a word.	2015-10-10 16:03:48 +11:00
Matthew Honnibal	83dccf0fd7	* Use io module insteads of deprecated codecs module	2015-10-10 14:13:01 +11:00
Matthew Honnibal	f35632e2e5	* Remove SBD print statement in train, after SBD evaluation was removed from Scorer	2015-10-09 11:08:58 +02:00

1 2 3 4 5

207 Commits