mirror of https://github.com/explosion/spaCy.git
precomputable_biaffine: avoid concatenation (#10911)
The `forward` of `precomputable_biaffine` performs matrix multiplication and then `vstack`s the result with padding. This materializes the matrix multiplication output in a temporary array, which is then copied into the concatenated result. This change avoids the temporary by pre-allocating an array that is large enough for the matrix multiplication output plus padding and filling that array in-place. This gave me a small speedup (a bit over 100 WPS, words per second) on de_core_news_lg on M1 Max (after changing thinc-apple-ops to support in-place gemm as BLIS does).
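To illustrate the idea outside of thinc, here is a minimal NumPy sketch. The names, shapes, and the use of plain `np.matmul` are invented for illustration and are not the spaCy/thinc code: instead of computing the product and then concatenating it with the padding row, the full output is allocated once and both parts are written into it.

    import numpy as np

    # Hypothetical sizes, chosen only for this illustration.
    n_tokens, n_in, n_out = 8, 4, 6
    X = np.random.rand(n_tokens, n_in)
    W = np.random.rand(n_out, n_in)
    pad = np.random.rand(n_out)

    # Before: multiply, then vstack with the padding row. vstack allocates a
    # second (n_tokens + 1, n_out) array and copies both inputs into it, so the
    # multiplication result only lives in a short-lived temporary.
    Yf_concat = np.vstack((pad[None, :], X @ W.T))

    # After: preallocate the (n_tokens + 1, n_out) output once and fill it in place.
    Yf_inplace = np.empty((n_tokens + 1, n_out))
    np.matmul(X, W.T, out=Yf_inplace[1:])  # the product is written directly into rows 1..n_tokens
    Yf_inplace[0] = pad                    # row 0 holds the padding vector

    assert np.allclose(Yf_concat, Yf_inplace)

In the commit itself the same pattern uses `model.ops.alloc2f` plus `gemm(..., out=...)`; as the message notes, thinc-apple-ops needed a change to support in-place gemm the way BLIS already does.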
This commit is contained in:
parent 97e8a5041b
commit a83a501195
@@ -22,9 +22,11 @@ def forward(model, X, is_train):
     nP = model.get_dim("nP")
     nI = model.get_dim("nI")
     W = model.get_param("W")
-    Yf = model.ops.gemm(X, W.reshape((nF * nO * nP, nI)), trans2=True)
+    # Preallocate array for layer output, including padding.
+    Yf = model.ops.alloc2f(X.shape[0] + 1, nF * nO * nP, zeros=False)
+    model.ops.gemm(X, W.reshape((nF * nO * nP, nI)), trans2=True, out=Yf[1:])
     Yf = Yf.reshape((Yf.shape[0], nF, nO, nP))
-    Yf = model.ops.xp.vstack((model.get_param("pad"), Yf))
+    Yf[0] = model.get_param("pad")
 
     def backward(dY_ids):
         # This backprop is particularly tricky, because we get back a different