Commit Graph

64 Commits

Author SHA1 Message Date
Matthew Honnibal bae59bf92f Remove BiLSTM import 2017-08-18 22:46:59 +02:00
Matthew Honnibal fe90dfc390 Restore changes from nn-beam-parser to spacy/_ml 2017-08-18 22:38:28 +02:00
Matthew Honnibal ce321b0322 Restore changes from nn-beam-parser to spacy/_ml 2017-08-18 22:24:46 +02:00
Matthew Honnibal 931509d96a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-18 21:57:15 +02:00
Matthew Honnibal 263366729e Don't import BiLSTM 2017-08-18 21:56:31 +02:00
Matthew Honnibal 85794c1167 Restore state of _ml.py 2017-08-18 14:55:23 -05:00
Matthew Honnibal 426f84937f Resolve conflicts when merging new beam parsing stuff 2017-08-18 13:38:32 -05:00
Matthew Honnibal 5181e8bedb Fix merge conflict in _ml 2017-08-18 13:35:51 -05:00
Matthew Honnibal 4b1e7bd6d8 Improve tensorizer model 2017-08-16 18:25:20 -05:00
Matthew Honnibal 6259490347 Fix mixture weights in fine_tune 2017-08-14 17:55:18 -05:00
Matthew Honnibal 335fa8b05c Fix gradient in fine_tune 2017-08-14 14:55:47 -05:00
Matthew Honnibal 52c180ecf5 Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
This reverts commit ea8de11ad5, reversing
changes made to 08e443e083.
2017-08-14 13:00:23 +02:00
Matthew Honnibal ac6c25f762 Check SGD is not None in update 2017-08-14 12:09:18 +02:00
Matthew Honnibal 4ab0c8c8e9 Try different drop_layer structure in Tok2Vec 2017-08-12 08:56:57 -05:00
Matthew Honnibal ebe0f7f641 Pass embed size correctly in tagger, and cache embeddings for efficiency 2017-08-12 05:45:20 -05:00
Matthew Honnibal f93f2bed58 Revert use of layer normalization in Tok2Vec 2017-08-09 17:47:03 -05:00
Matthew Honnibal ac2de6dced Switch to ReLu layers in Tok2Vec 2017-08-09 16:41:25 -05:00
Matthew Honnibal 88bf1cf87c Update parser for fine tuning 2017-08-08 15:34:17 -05:00
Matthew Honnibal 5d837c3776 Add mix weights on fine_tune 2017-08-07 06:32:59 -05:00
Matthew Honnibal 3ed203de25 Use LayerNorm and SELU in Tok2Vec 2017-08-06 18:33:18 +02:00
Matthew Honnibal 4a5cc89138 Fix tagger 'fine_tune', to keep private CNN weights 2017-08-06 14:15:48 +02:00
Matthew Honnibal 4cfb7a54e7 Fix tagger 2017-08-06 01:53:31 +02:00
Matthew Honnibal e9ab800e15 Fix tagging model 2017-08-06 01:50:08 +02:00
Matthew Honnibal 468c138ab3 WIP: Add fine-tuning logic to tagger model, re #1182 2017-08-06 01:13:23 +02:00
Matthew Honnibal 523b0df2c9 Update text classification model 2017-07-25 18:57:59 +02:00
Matthew Honnibal 2df563ad24 Remove optimization for textcat that caused loading problem 2017-07-23 14:10:51 +02:00
Matthew Honnibal ded0df5e2f Expose hyper-param as keyword arg 2017-07-22 20:14:37 +02:00
Matthew Honnibal 6ffec9dfea Update _ml, for textcat model 2017-07-22 20:03:40 +02:00
Matthew Honnibal 727481377e Add text-classifer thinc models 2017-07-20 00:17:17 +02:00
Matthew Honnibal 8a17b99b1c Use NORM attribute, not LOWER 2017-06-03 15:30:16 -05:00
Matthew Honnibal b92a89f87b Make it easier to reference embedding tables 2017-05-29 17:53:29 -05:00
Matthew Honnibal c91b121aeb Move serialization functions to util 2017-05-29 10:13:42 +02:00
Matthew Honnibal 1fa2bfb600 Add model_to_bytes and model_from_bytes helpers. Probably belong in thinc. 2017-05-29 09:27:04 +02:00
Matthew Honnibal 6dad4117ad Work on serialization for models 2017-05-29 01:37:57 +02:00
Matthew Honnibal 8de9829f09 Don't overwrite model in initialization, when loading 2017-05-27 15:50:40 -05:00
Matthew Honnibal b27c587800 Fix pieces argument to PrecomputedMaxout 2017-05-25 06:46:59 -05:00
Matthew Honnibal c998776c25 Make single array for features, to reduce GPU copies 2017-05-22 04:51:08 -05:00
Matthew Honnibal 8904814c0e Add missing import 2017-05-21 09:07:56 -05:00
Matthew Honnibal 3b7c108246 Pass tokvecs through as a list, instead of concatenated. Also fix padding 2017-05-20 13:23:32 -05:00
Matthew Honnibal b272890a8c Try to move parser to simpler PrecomputedAffine class. Currently broken -- maybe the previous change 2017-05-20 06:40:10 -05:00
Matthew Honnibal a438cef8c5 Fix significant bug in feature calculation -- off by 1 2017-05-18 06:21:32 -05:00
Matthew Honnibal 711ad5edc4 Cache features in doc2feats 2017-05-18 04:22:20 -05:00
Matthew Honnibal 5211645af3 Get data flowing through pipeline. Needs redesign 2017-05-16 11:21:59 +02:00
Matthew Honnibal a9edb3aa1d Improve integration of NN parser, to support unified training API 2017-05-15 21:53:27 +02:00
Matthew Honnibal 827b5af697 Update draft of parser neural network model
Model is good, but code is messy. Currently requires Chainer, which may cause the build to fail on machines without a GPU.

Outline of the model:

We first predict context-sensitive vectors for each word in the input:

(embed_lower | embed_prefix | embed_suffix | embed_shape)
>> Maxout(token_width)
>> convolution ** 4

This convolutional layer is shared between the tagger and the parser. This prevents the parser from needing tag features.
To boost the representation, we make a "super tag" with POS, morphology and dependency label. The tagger predicts this
by adding a softmax layer onto the convolutional layer --- so, we're teaching the convolutional layer to give us a
representation that's one affine transform from this informative lexical information. This is obviously good for the
parser (which backprops to the convolutions too).

The parser model makes a state vector by concatenating the vector representations for its context tokens. Current
results suggest few context tokens works well. Maybe this is a bug.

The current context tokens:

* S0, S1, S2: Top three words on the stack
* B0, B1: First two words of the buffer
* S0L1, S0L2: Leftmost and second leftmost children of S0
* S0R1, S0R2: Rightmost and second rightmost children of S0
* S1L1, S1L2, S1R2, S1R, B0L1, B0L2: Likewise for S1 and B0

This makes the state vector quite long: 13*T, where T is the token vector width (128 is working well). Fortunately,
there's a way to structure the computation to save some expense (and make it more GPU friendly).

The parser typically visits 2*N states for a sentence of length N (although it may visit more, if it back-tracks
with a non-monotonic transition). A naive implementation would require 2*N (B, 13*T) @ (13*T, H) matrix multiplications
for a batch of size B. We can instead perform one (B*N, T) @ (T, 13*H) multiplication, to pre-compute the hidden
weights for each positional feature wrt the words in the batch. (Note that our token vectors come from the CNN
-- so we can't play this trick over the vocabulary. That's how Stanford's NN parser works --- and why its model
is so big.)

This pre-computation strategy allows a nice compromise between GPU-friendliness and implementation simplicity.
The CNN and the wide lower layer are computed on the GPU, and then the precomputed hidden weights are moved
to the CPU, before we start the transition-based parsing process. This makes a lot of things much easier.
We don't have to worry about variable-length batch sizes, and we don't have to implement the dynamic oracle
in CUDA to train.

Currently the parser's loss function is multilabel log loss, as the dynamic oracle allows multiple states to
be 0 cost. This is defined as:

(exp(score) / Z) - (exp(score) / gZ)

Where gZ is the sum of the scores assigned to gold classes. I'm very interested in regressing on the cost directly,
but so far this isn't working well.

Machinery is in place for beam-search, which has been working well for the linear model. Beam search should benefit
greatly from the pre-computation trick.
2017-05-12 16:09:15 -05:00
Matthew Honnibal bef89ef23d Mergery 2017-05-08 08:29:36 -05:00
Matthew Honnibal 56073a11ef Don't use tags when calculating token vectors 2017-05-08 07:52:24 -05:00
Matthew Honnibal a66a4a4d0f Replace einsums 2017-05-08 14:46:50 +02:00
Matthew Honnibal 807cb2e370 Add PretrainableMaxouts 2017-05-08 14:24:43 +02:00
Matthew Honnibal 2e2268a442 Precomputable hidden now working 2017-05-08 11:36:37 +02:00