Matthew Honnibal
|
2805068ca8
|
* Have tokens track tuples that record the start offset and pos tag as well as a lexeme pointer
|
2014-10-14 15:21:03 +11:00 |
Matthew Honnibal
|
71ee921055
|
* Slight cleaning of tokenizer code
|
2014-10-10 19:17:22 +11:00 |
Matthew Honnibal
|
59b41a9fd3
|
* Switch to new data model, tests passing
|
2014-10-10 08:11:31 +11:00 |
Matthew Honnibal
|
08cef75ffd
|
* Switch to using a heap-allocated vector in tokens
|
2014-09-15 03:46:14 +02:00 |
Matthew Honnibal
|
f77b7098c0
|
* Update Tokens to use vector, with bounds checking.
|
2014-09-15 03:22:40 +02:00 |
Matthew Honnibal
|
df24e3708c
|
* Move EnglishTokens stuff to Tokens
|
2014-09-15 01:31:44 +02:00 |
Matthew Honnibal
|
5aa591106b
|
* Fiddle with token features
|
2014-09-12 15:49:36 +02:00 |
Matthew Honnibal
|
073ee0de63
|
* Restore dense_hash_map for cache dictionary. Seems to double efficiency
|
2014-09-12 02:23:51 +02:00 |
Matthew Honnibal
|
1a3222af4b
|
* Moving tokens to use an array internally, instead of a list of Lexeme objects.
|
2014-09-11 16:57:08 +02:00 |
Matthew Honnibal
|
cf412adba8
|
* Refactoring to use Tokens object
|
2014-09-10 18:11:13 +02:00 |
Matthew Honnibal
|
68bae2fec6
|
* More refactoring
|
2014-08-25 16:42:22 +02:00 |
Matthew Honnibal
|
07ecf5d2f4
|
* Fixed group_by, removed idea of general attr_of function.
|
2014-08-22 00:02:37 +02:00 |
Matthew Honnibal
|
a78ad4152d
|
* Broken version being refactored for docs
|
2014-08-20 13:39:39 +02:00 |
Matthew Honnibal
|
01469b0888
|
* Refactor spacy so that chunks return arrays of lexemes, so that there is properly one lexeme per word.
|
2014-08-18 19:14:00 +02:00 |
Matthew Honnibal
|
a895fe5ddb
|
* Update from spacy
|
2014-07-23 17:35:18 +01:00 |
Matthew Honnibal
|
571808a274
|
* Group-by seems to be working
|
2014-07-07 20:27:02 +02:00 |
Matthew Honnibal
|
057c21969b
|
* Refactor for string view features. Working on setting up flags and enums.
|
2014-07-07 16:58:48 +02:00 |
Matthew Honnibal
|
f1bcbd4c4e
|
* Reorganized code to accommodate Tokens class. Need string views before group_by and count_by can be done well.
|
2014-07-07 12:47:21 +02:00 |