Matthew Honnibal | 1a7a1c2771 | * Fix Issue #16: tokens recurse when printing | 2015-01-30 19:47:50 +11:00 |
Matthew Honnibal | cb95ef6934 | * Fix download script | 2015-01-30 19:28:43 +11:00 |
Matthew Honnibal | e578bd37bd | * Fix download script | 2015-01-30 18:59:31 +11:00 |
Matthew Honnibal | df52014d12 | * Fix download script | 2015-01-30 18:36:24 +11:00 |
Matthew Honnibal | 0f95712189 | * Improve accuracy reporting during training | 2015-01-30 18:05:06 +11:00 |
Matthew Honnibal | b68f563c2f | * Fix Issue #14: Improve parsing API | 2015-01-30 18:04:41 +11:00 |
Matthew Honnibal | 998b607f65 | * Upd download script, having it download all data if there's no data/ directory, allowing easier compilation from source | 2015-01-30 18:04:01 +11:00 |
Matthew Honnibal | 67d6e53a69 | * Ensure parser and tagger function correctly when training from missing values, indicated by -1 | 2015-01-30 14:08:56 +11:00 |
Matthew Honnibal | 4ff180db74 | * Fix off-by-one error in commit 0a7fceb | 2015-01-30 12:49:33 +11:00 |
Matthew Honnibal | 0a7fcebdf7 | * Fix Issue #12: Incorrect token.idx calculations for some punctuation, in the presence of token cache | 2015-01-30 12:33:38 +11:00 |
Matthew Honnibal | ebf7d2fab1 | * Use non-joint sbd, for more simplicity and fewer classes | 2015-01-29 06:22:03 +11:00 |
Matthew Honnibal | d05c5bf141 | * Remove comment | 2015-01-29 05:19:27 +11:00 |
Matthew Honnibal | 320b045daa | * Oracle now consistent over gold standard derivation | 2015-01-29 03:41:58 +11:00 |
Matthew Honnibal | f590382134 | * Work on sbd | 2015-01-29 03:18:29 +11:00 |
Matthew Honnibal | 1884a7a0be | * Attach comment with paper | 2015-01-28 03:18:43 +11:00 |
Matthew Honnibal | a2d6b195db | * Add messy Break transitions, carefully following the scheme of Dd Zhang et al (2013) | 2015-01-28 03:09:45 +11:00 |
Matthew Honnibal | f9ee5d9934 | * Build a python list of word strings, for debugging | 2015-01-28 01:06:13 +11:00 |
Matthew Honnibal | d819101571 | * Improve error message on oracle failure | 2015-01-28 00:58:03 +11:00 |
Matthew Honnibal | e6c3d3471f | * Tweak documentation for Tokens, and hide constructor as __cinit__ | 2015-01-27 18:57:52 +11:00 |
Matthew Honnibal | c38c62d4a3 | * Add docstring to English class | 2015-01-27 02:45:21 +11:00 |
Matthew Honnibal | d4c99f7dec | * Add attrs.pxd | 2015-01-26 22:22:09 +11:00 |
Matthew Honnibal | d4a493855e | * Fix error msg | 2015-01-25 23:01:30 +11:00 |
Matthew Honnibal | 7f87716cf7 | * Fix download script | 2015-01-25 23:01:10 +11:00 |
Matthew Honnibal | 92fb9257dd | * Add parts-of-speech file | 2015-01-25 22:00:39 +11:00 |
Matthew Honnibal | c1c3dba4cb | * Check whether vector files are present before trying to load them. | 2015-01-25 18:16:48 +11:00 |
Matthew Honnibal | 5049d4c2e6 | * Add parts_of_speech.pyx | 2015-01-25 16:32:26 +11:00 |
Matthew Honnibal | 12b034e3ef | * Move POS tag definitions to parts_of_speech.pxd | 2015-01-25 16:31:07 +11:00 |
Matthew Honnibal | 7431c133d8 | * Add error if try to access head and not is_parsed | 2015-01-25 15:33:54 +11:00 |
Matthew Honnibal | 951d06c824 | * Silently don't parse if data is not present | 2015-01-25 14:47:38 +11:00 |
Matthew Honnibal | 4e857ab7a6 | * Fix bug in POS tagger feature | 2015-01-25 02:20:15 +11:00 |
Matthew Honnibal | dd56e298e2 | * Ensure tagging is applied if parse=True | 2015-01-25 02:19:44 +11:00 |
Matthew Honnibal | 94750819cd | * Set parse=True by default --- i.e. parse unless told not to. | 2015-01-25 01:28:28 +11:00 |
Matthew Honnibal | 71b95202eb | * Add docstring to StringStore | 2015-01-24 20:49:15 +11:00 |
Matthew Honnibal | 6d1c08dafd | * Add docstring to Lexeme | 2015-01-24 20:48:34 +11:00 |
Matthew Honnibal | a97bed9359 | * Fix POS and dependency label tag names. Add parse and string navigation functions. | 2015-01-24 17:29:04 +11:00 |
Matthew Honnibal | 76cd024095 | * Add whitespace property to Token | 2015-01-24 07:41:21 +11:00 |
Matthew Honnibal | 5fd72bc220 | * Have 'string' refer to the whitespace-padded string | 2015-01-24 07:32:38 +11:00 |
Matthew Honnibal | fda94271af | * Rename NORM1 and NORM2 attrs to lower and norm | 2015-01-24 06:17:03 +11:00 |
Matthew Honnibal | 5ed8b2b98f | * Rename sic to orth | 2015-01-23 02:08:25 +11:00 |
Matthew Honnibal | a27b23cc8f | * Have SBD return start/end indices | 2015-01-22 22:24:44 +11:00 |
Matthew Honnibal | d460c28838 | * Rename vec to repvec | 2015-01-22 02:06:22 +11:00 |
Matthew Honnibal | 8b9d913d97 | * Rename vec to repvec | 2015-01-22 02:05:58 +11:00 |
Matthew Honnibal | 9cd0b6b3e9 | * Various tweaks to Tokens class | 2015-01-22 02:05:37 +11:00 |
Matthew Honnibal | 5928d158ce | * Pass the string to Tokens | 2015-01-22 02:04:58 +11:00 |
Matthew Honnibal | 45264e356b | * Rename vec to repvec | 2015-01-22 02:04:24 +11:00 |
Matthew Honnibal | 5e63c606ad | * Rename vec to repvec | 2015-01-22 02:03:54 +11:00 |
Matthew Honnibal | 56e6cf0672 | * Add _string attr to Tokens object | 2015-01-21 18:57:09 +11:00 |
Matthew Honnibal | d6ac60e91c | * Bug fixes to sentences method, and improved vector transport for tokens | 2015-01-21 18:56:32 +11:00 |
Matthew Honnibal | f2a229136c | * Fix data_dir=None argument to English class | 2015-01-21 18:27:31 +11:00 |
Matthew Honnibal | ef49b8c179 | * Add stop-word flag | 2015-01-21 18:22:31 +11:00 |