Commit Graph

49 Commits

Author SHA1 Message Date
Matthew Honnibal fc8d3a112c Add util.env_opt support: Can set hyper params through environment variables. 2017-05-18 04:36:53 -05:00
Matthew Honnibal 793430aa7a Get spaCy train command working with neural network
* Integrate models into pipeline
* Add basic serialization (maybe incorrect)
* Fix pickle on vocab
2017-05-17 12:04:50 +02:00
Matthew Honnibal 89a4f262fc Fix training methods 2017-04-16 13:00:37 -05:00
ines e1efd589c3 Fix json imports and use ujson 2017-04-15 12:13:34 +02:00
ines 958b12dec8 Use pathlib instead of os.path 2017-04-15 12:13:00 +02:00
ines d24589aa72 Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
ines 561f2a3eb4 Use consistent formatting for docstrings 2017-04-15 11:59:21 +02:00
Raphaël Bournhonesque f332bf05be Remove unused import statements 2017-03-21 21:08:54 +01:00
Matthew Honnibal 2611ac2a89 Fix scorer bug for NER, related to ambiguity between missing annotations and misaligned tokens 2017-03-16 09:38:28 -05:00
Matthew Honnibal 3d4e389d23 Whitespace 2017-03-15 09:29:42 -05:00
Matthew Honnibal 159e8c46e1 Merge old training fixes with newer state 2016-11-25 09:16:36 -06:00
Matthew Honnibal cc7e607a8a Fix gold.pyx for 1.0 2016-11-25 08:57:59 -06:00
Matthew Honnibal b86f8af0c1 Fix doc strings 2016-11-01 12:25:36 +01:00
Matthew Honnibal f5fe4f595b Fix json loading, for Python 3. 2016-10-20 21:23:26 +02:00
Matthew Honnibal 52b48b415e Fix GoldParse class 2016-10-16 11:41:36 +02:00
Matthew Honnibal 0317cea0ad Fix GoldParse 2016-10-15 23:55:07 +02:00
Matthew Honnibal a48aa15384 Improve the API for the GoldParse class. 2016-10-15 23:53:29 +02:00
Matthew Honnibal e07fe92b27 Draft a refactored init for the GoldParse class 2016-10-15 22:09:52 +02:00
Matthew Honnibal 86ae665c78 Add function for entity->biluo transformation 2016-10-15 21:51:04 +02:00
Matthew Honnibal 645d99523a Move merge_sents method into spacy.gold 2016-10-13 03:24:29 +02:00
Matthew Honnibal ea23b64cc8 Refactor training, with new spacy.train module. Defaults still a little awkward. 2016-10-09 12:24:24 +02:00
Wolfgang Seeker b6b96b233c don't require read_json_file to expect particular annotations 2016-05-02 15:29:30 +02:00
Wolfgang Seeker 4d7f393fae don't require json-files to have syntactic annotation 2016-04-22 16:32:27 +02:00
Henning Peters 6215272786 remove ujson as default non-dev dependency (still works as fallback if installed), because ujson doesn't ship wheels 2016-04-12 11:28:07 +02:00
Wolfgang Seeker 690c5acabf adjust train.py to train both english and german models 2016-03-03 15:21:00 +01:00
Wolfgang Seeker 3448cb40a4 integrated pseudo-projective parsing into parser
- nonproj.pyx holds a class PseudoProjectivity which currently holds
  all functionality to implement Nivre & Nilsson 2005's pseudo-projective
  parsing using the HEAD decoration scheme
- changed lefts/rights in Token to account for possible non-projective
  structures
2016-03-01 10:09:08 +01:00
Wolfgang Seeker 4b2297d5d4 add class PseudoProjective for pseudo-projective parsing
PseudoProjective() implements the algorithm from Nivre & Nilsson 2005
using their HEAD decoration scheme.
2016-02-24 11:26:25 +01:00
Wolfgang Seeker 8d531c958b replace tests for non-projectivity
- add functions to find non-projective edges
- add test file for non-projectivity functions
2016-02-22 14:40:40 +01:00
Matthew Honnibal 83dccf0fd7 * Use io module instead of deprecated codecs module 2015-10-10 14:13:01 +11:00
alvations 8caedba42a caught more codecs.open -> io.open 2015-09-30 20:20:09 +02:00
Matthew Honnibal 7606d9936f * Python3 correction for GoldParse 2015-07-28 14:44:53 +02:00
Matthew Honnibal f4809e562f * Allow json to be used as a fallback if ujson is not available 2015-07-25 18:11:36 +02:00
Matthew Honnibal 2ae0b439b2 * Fix space check in gold.pyx 2015-07-14 00:10:27 +02:00
Matthew Honnibal 89a91ad726 * Add SPACE part-of-speech tag, and train tagger to assign it. Also train tagger not to make whitespace an entity 2015-07-09 13:30:41 +02:00
Matthew Honnibal 43ef5ddea5 * Ensure root label is spelled ROOT, for backwards compatibility 2015-06-23 04:14:03 +02:00
Matthew Honnibal 46fb24e9fd * Add cycle-checking code in gold.pyx 2015-06-23 00:02:22 +02:00
Matthew Honnibal b643cb3d5c * Allow training documents to be filtered in gold.pyx 2015-06-12 02:42:08 +02:00
Matthew Honnibal 00a0dfcb59 * Avoid shipping the spacy.munge package 2015-06-08 00:54:13 +02:00
Matthew Honnibal 89b8775887 * Fix output from _min_edit_path when inputs match. 2015-06-06 05:58:53 +02:00
Matthew Honnibal ae653b850a * Remove unused import from gold.pyx 2015-06-03 06:07:15 +02:00
Matthew Honnibal a513ec500f * Have oracle functions take a struct instead of a Python object 2015-06-02 20:01:06 +02:00
Matthew Honnibal 87d6551d19 * Allow gold parse to cut non-projective arcs 2015-05-31 01:11:56 +02:00
Matthew Honnibal 9e39a206da * Fix efficiency of JSON reading, by using ujson instead of stream 2015-05-30 17:54:52 +02:00
Matthew Honnibal 76300bbb1b * Use updated JSON format, with sentences below paragraphs. Allows use of gold preprocessing flag. 2015-05-30 01:25:46 +02:00
Matthew Honnibal b76bbbd12c * Read json files recursively from a directory, instead of requiring a single .json file 2015-05-29 03:52:55 +02:00
Matthew Honnibal 7a2725bca4 * Read input json in a streaming way 2015-05-27 19:13:11 +02:00
Matthew Honnibal 6016ee83a6 * Fix reading of NER in gold.pyx 2015-05-27 03:17:50 +02:00
Matthew Honnibal 3593babd35 * Add functions for Levenshtein distance alignment 2015-05-24 21:50:48 +02:00
Matthew Honnibal fc75210941 * Move spacy.syntax.conll to spacy.gold 2015-05-24 21:35:02 +02:00
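Commit 86ae665c78 added a function for entity-to-BILUO transformation, and 2611ac2a89 mentions the ambiguity between missing annotations and misaligned tokens. The sketch below illustrates that kind of conversion in plain Python. It is not spaCy's code: the function name `offsets_to_biluo`, the `(start_char, end_char)` token format, the whitespace tokenization in the usage lines, and the choice to tag tokens under a misaligned entity span with `"-"` are all assumptions made for illustration.

```python
import re
from typing import List, Tuple


def offsets_to_biluo(
    tokens: List[Tuple[int, int]],
    entities: List[Tuple[int, int, str]],
) -> List[str]:
    """Hypothetical helper: convert character-offset entities to per-token BILUO tags.

    tokens:   (start_char, end_char) for each token, in text order.
    entities: (start_char, end_char, label) for each entity span.
    """
    # Index tokens by their boundary offsets so alignment checks are O(1).
    starts = {start: i for i, (start, _) in enumerate(tokens)}
    ends = {end: i for i, (_, end) in enumerate(tokens)}
    tags = ["O"] * len(tokens)  # tokens outside every entity stay "O"

    for ent_start, ent_end, label in entities:
        first = starts.get(ent_start)
        last = ends.get(ent_end)
        if first is None or last is None:
            # Assumption: the entity does not line up with token boundaries,
            # so mark every overlapping token as unknown ("-") rather than "O".
            for i, (start, end) in enumerate(tokens):
                if start < ent_end and end > ent_start:
                    tags[i] = "-"
            continue
        if first == last:
            tags[first] = f"U-{label}"  # single-token entity: Unit
        else:
            tags[first] = f"B-{label}"  # Begin
            for i in range(first + 1, last):
                tags[i] = f"I-{label}"  # Inside
            tags[last] = f"L-{label}"  # Last
    return tags


# Usage, with naive whitespace tokenization (an assumption for this sketch).
text = "I like London and New York"
token_offsets = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
print(offsets_to_biluo(token_offsets, [(7, 13, "LOC"), (18, 26, "LOC")]))
```

Keeping misaligned spans distinct from `"O"` means a training or scoring step can choose to skip those tokens instead of treating them as confirmed non-entities.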