Matthew Honnibal
|
85ce36ab11
|
* Refactor symbols, so that frequency rank can be derived from the orth id of a word.
|
2015-10-13 13:44:39 +11:00 |
Matthew Honnibal
|
83dccf0fd7
|
* Use io module instead of deprecated codecs module
|
2015-10-10 14:13:01 +11:00 |
alvations
|
8caedba42a
|
caught more codecs.open -> io.open
|
2015-09-30 20:20:09 +02:00 |
Matthew Honnibal
|
1ae55cb63a
|
* Copy tag_map.json in init_model
|
2015-09-12 05:54:02 +02:00 |
Matthew Honnibal
|
5ad4527c42
|
* Rename Deutsch to German
|
2015-09-06 20:18:58 +02:00 |
Matthew Honnibal
|
950ce36660
|
* Update init model
|
2015-09-06 17:51:30 +02:00 |
Matthew Honnibal
|
b6b1e1aa12
|
* Add link for Finnish model
|
2015-08-27 10:26:02 +02:00 |
Matthew Honnibal
|
dc13edd7cb
|
* Refactor init_model to accommodate other languages
|
2015-08-26 19:14:05 +02:00 |
Matthew Honnibal
|
bbf07ac253
|
* Cut down init_model to work on more languages
|
2015-08-24 01:05:20 +02:00 |
Matthew Honnibal
|
3ecacb9635
|
* Copy gazetteer file in init_model
|
2015-08-06 16:07:23 +02:00 |
Matthew Honnibal
|
174ed1ad20
|
* Tighten the frequency filter in init_model
|
2015-07-27 21:44:51 +02:00 |
Matthew Honnibal
|
6047f2aa35
|
* Fix path to freqs.txt
|
2015-07-27 02:22:35 +02:00 |
Matthew Honnibal
|
0368889d6c
|
* Support gzipped frequencies in init_model
|
2015-07-26 22:39:22 +02:00 |
Matthew Honnibal
|
c4f20847da
|
* Fix init_model for travis tests
|
2015-07-26 14:03:30 +02:00 |
Matthew Honnibal
|
09312b9353
|
* Fix init_model for travis tests
|
2015-07-26 13:55:47 +02:00 |
Matthew Honnibal
|
90ad717dc4
|
* Update default freq thresholds in init_model
|
2015-07-26 01:41:17 +02:00 |
Matthew Honnibal
|
6a5e035a48
|
* Ensure data files are copied for tokenizer in init_model
|
2015-07-26 01:36:19 +02:00 |
Matthew Honnibal
|
ab93898ac6
|
* Make heuristics more explicit in init_model
|
2015-07-26 00:22:19 +02:00 |
Matthew Honnibal
|
5c04dcd7c1
|
* Fix init_model
|
2015-07-25 23:33:02 +02:00 |
Matthew Honnibal
|
fd525f0675
|
* Pass OOV probability around
|
2015-07-25 23:29:51 +02:00 |
Matthew Honnibal
|
5b6bf4d4a6
|
* Remove probability cap on lexicon
|
2015-07-25 23:05:51 +02:00 |
Matthew Honnibal
|
c62eb110c0
|
* Fix merge conflict in init_model
|
2015-07-25 23:04:30 +02:00 |
Matthew Honnibal
|
0301472d15
|
* Fix init_model
|
2015-07-25 22:56:35 +02:00 |
Matthew Honnibal
|
8e800adfbc
|
* Fix init_model
|
2015-07-25 22:54:08 +02:00 |
Matthew Honnibal
|
6076213c16
|
* Fix init_model script
|
2015-07-25 22:35:52 +02:00 |
Matthew Honnibal
|
ef448649b3
|
* Add read_freqs function in init_model
|
2015-07-25 22:16:36 +02:00 |
Matthew Honnibal
|
6be3ee311c
|
Py3 compatibility tweak
|
2015-07-23 13:13:15 +02:00 |
Matthew Honnibal
|
d4407d8e2f
|
Py3 compatibility tweak
|
2015-07-23 09:45:15 +02:00 |
Matthew Honnibal
|
da4821fc14
|
* Add cluster words to probs in init_model
|
2015-07-23 09:27:07 +02:00 |
Matthew Honnibal
|
4af2595d99
|
* Fix structure of wordnet directory for init_model
|
2015-07-23 06:35:38 +02:00 |
Matthew Honnibal
|
83c0f0da22
|
* Remove lemmatizer from init_model
|
2015-07-23 02:32:34 +02:00 |
Matthew Honnibal
|
386246db5b
|
* Update init_model, making language resources optional
|
2015-07-22 00:25:14 +02:00 |
Matthew Honnibal
|
af54d05d60
|
* Remove sense stuff from init_model
|
2015-07-14 10:56:17 +02:00 |
Matthew Honnibal
|
62cfcd76fe
|
* Add supersense sets to lexemes, from WordNet. Look-up via lemmatization.
|
2015-07-01 18:48:59 +02:00 |
Matthew Honnibal
|
c8a553fe91
|
* Fix cluster initialization
|
2015-05-31 15:21:28 +02:00 |
Matthew Honnibal
|
c037f80638
|
* Add case expansion to Brown clusters
|
2015-05-31 05:50:50 +02:00 |
Matthew Honnibal
|
5ab0f233a1
|
* Ensure words in Brown clusters make it into the vocab, even if they're not in our probs list
|
2015-05-31 05:46:16 +02:00 |
Matthew Honnibal
|
4489d87550
|
* Add cluster=0 by default in init_model
|
2015-04-29 14:23:13 +02:00 |
Matthew Honnibal
|
693c5a1558
|
* Exclude clusterings for words only seen 1 or 2 times, as their clusters are unreliable
|
2015-04-17 04:44:52 +02:00 |
Matthew Honnibal
|
1629b33082
|
* Fix copying of tokenizer data in init_model
|
2015-04-12 04:45:31 +02:00 |
Matthew Honnibal
|
baff0f8ad8
|
* Add docstring explaining script a bit, and add handling of word vectors
|
2015-04-08 08:20:15 +02:00 |
Matthew Honnibal
|
156b70ed82
|
* Add new script to replace make_lexicon, that does full setup of data
|
2015-04-08 07:46:53 +02:00 |