Rework python2js_buffer (#1376)

Hood Chatham 2021-04-09 00:51:20 -04:00 committed by GitHub
parent 05a84ba3e9
commit 03b1928311
17 changed files with 488 additions and 870 deletions

View File

@ -56,6 +56,12 @@ substitutions:
- {{ API }} Added `PyProxy.getBuffer` API to allow direct access to Python
buffers as Javascript TypedArrays.
[1215](https://github.com/pyodide/pyodide/pull/1215)
- {{ API }} The innermost level of a buffer converted to Javascript used to be a
TypedArray if the buffer was contiguous and otherwise an Array. Now the
innermost level will be a TypedArray, with two exceptions: if the buffer format
code is `'?'` it will be an Array of booleans, and if the format code is `'s'`
it will be converted to a string.
[1376](https://github.com/pyodide/pyodide/pull/1376)
- {{ Enhancement }} Javascript `BigInt`s are converted into Python `int` and
Python `int`s larger than 2^53 are converted into `BigInt`.
[1407](https://github.com/pyodide/pyodide/pull/1407)

View File

@ -211,7 +211,7 @@ def get_jsdoc_summary_directive(app):
The output is designed to be input to format_table. The link name
needs to be set up so that :any:`link_name` makes a link to the
actual api docs for this object.
actual API docs for this object.
"""
sig = self.get_sig(obj)
display_name = obj.name

View File

@ -276,11 +276,14 @@ the {any}`PyProxy.toJs` method. By default, the `toJs` method does a recursive "
conversion, to do a shallow conversion use `proxy.toJs(1)`. The `toJs` method
performs the following explicit conversions:
| Python | Javascript |
|------------------|---------------------|
| `list`, `tuple` | `Array` |
| `dict` | `Map` |
| `set` | `Set` |
| Python | Javascript |
|-------------------|---------------------|
| `list`, `tuple` | `Array` |
| `dict` | `Map` |
| `set` | `Set` |
| a buffer* | `TypedArray` |
* Examples of buffers include bytes objects and numpy arrays.
In Javascript, `Map` and `Set` keys are compared using object identity unless
the key is an immutable type (meaning a string, a number, a bigint, a boolean,
@ -289,6 +292,8 @@ compared using deep equality. If a key is encountered in a `dict` or `set` that
would have different semantics in Javascript than in Python, then a
`ConversionError` will be thrown.
See {ref}`buffer_tojs` for the behavior of `toJs` on buffers.
`````{admonition} Memory Leaks and toJs
:class: warning
@ -376,16 +381,37 @@ import numpy as np
numpy_array = np.asarray(array)
```
(buffer_tojs)=
### Converting Python Buffer objects to Javascript
Python objects supporting the [Python Buffer
protocol](https://docs.python.org/3/c-api/buffer.html) are proxied into
Javascript. The data inside the buffer can be accessed via the {any}`PyProxy.toJs` method or
the {any}`PyProxy.getBuffer` method. The `toJs` API copies the buffer into Javascript,
whereas the `getBuffer` method allows low-level access to the WASM memory
backing the buffer. The `getBuffer` API is more powerful but requires care to
use correctly. For simple use cases the `toJs` API should be preferred.
A PyProxy of any Python object supporting the
[Python Buffer protocol](https://docs.python.org/3/c-api/buffer.html) will have
a method called {any}`getBuffer <PyProxy.getBuffer>`. This can be used to retrieve a reference to a
If the buffer is zero or one-dimensional, then `toJs` will in most cases convert
it to a single `TypedArray`. However, in the case that the format of the buffer
is `'s'`, we will convert the buffer to a string and if the format is `'?'` we will
convert it to an Array of booleans.
If the dimension is greater than one, we will convert it to a nested Javascript
array, with the innermost dimension handled in the same way we would handle a 1d array.
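To predict which of these cases applies to a given object, it can help to inspect
its buffer metadata on the Python side before converting. The following is a
minimal sketch using only the standard library. `describe_buffer` is a
hypothetical helper (not part of Pyodide), and the strings it returns only
paraphrase the rules stated above:

```python
import array


def describe_buffer(obj):
    """Summarize how the rules above would treat obj's buffer.

    Hypothetical helper for illustration only; not part of Pyodide.
    """
    with memoryview(obj) as view:
        # The last character of the format string is the type code; any
        # leading character is a byte-order/alignment marker.
        code = view.format[-1]
        if code == "s":
            innermost = "string"
        elif code == "?":
            innermost = "Array of booleans"
        else:
            innermost = "TypedArray"
        return {"ndim": view.ndim, "shape": view.shape, "innermost": innermost}


print(describe_buffer(b"abc"))
print(describe_buffer(array.array("d", [1.0, 2.0])))
# A 2 x 3 buffer of booleans built by casting six zero bytes.
print(describe_buffer(memoryview(bytes(6)).cast("?", (2, 3))))
```

A buffer with `ndim` greater than one would become a nested array whose
innermost level is the type reported under `innermost`.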
An example of a case where you would not want to use the `toJs` method is when
the buffer is bitmapped image data. If for instance you have a 3d buffer shaped
1920 x 1080 x 4, then `toJs` will be extremely slow. In this case you could use
{any}`PyProxy.getBuffer`. On the other hand, if you have a 3d buffer shaped 1920
x 4 x 1080, the performance of `toJs` will most likely be satisfactory.
Typically the innermost dimension won't matter for performance.
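The difference between the two shapes can be made concrete by counting how many
innermost arrays a nested conversion has to create. This is a rough sketch that
assumes the cost is dominated by the number of innermost arrays, which is one
reading of the paragraph above rather than a measured result;
`innermost_array_count` is a hypothetical helper:

```python
from math import prod


def innermost_array_count(shape):
    """Number of innermost 1d arrays needed for a buffer of this shape.

    Every dimension except the last contributes to the count; the last
    dimension only determines the length of each innermost array.
    """
    return prod(shape[:-1])


print(innermost_array_count((1920, 1080, 4)))  # 2073600 arrays of length 4
print(innermost_array_count((1920, 4, 1080)))  # 7680 arrays of length 1080
```

Under that assumption the first layout needs roughly 270 times as many
innermost arrays as the second, even though both hold the same number of
elements.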
The {any}`PyProxy.getBuffer` method can be used to retrieve a reference to a
Javascript typed array that points to the data backing the Python object,
combined with other metadata about the buffer format. The metadata is suitable
for use with a Javascript ndarray library if one is present. For instance, if
you load the Javascript [ndarray](https://github.com/scijs/ndarray)
package, you can do:
you load the Javascript [ndarray](https://github.com/scijs/ndarray) package, you
can do:
```js
let proxy = pyodide.globals.get("some_numpy_ndarray");
let buffer = proxy.getBuffer();

View File

@ -40,8 +40,23 @@ def test_typed_arrays(selenium):
)
def test_python2js_numpy_dtype(selenium_standalone):
selenium = selenium_standalone
@pytest.mark.parametrize("order", ("C", "F"))
@pytest.mark.parametrize(
"dtype",
(
"int8",
"uint8",
"int16",
"uint16",
"int32",
"uint32",
"int64",
"uint64",
"float32",
"float64",
),
)
def test_python2js_numpy_dtype(selenium, order, dtype):
selenium.load_package("numpy")
selenium.run("import numpy as np")
@ -56,59 +71,37 @@ def test_python2js_numpy_dtype(selenium_standalone):
for k in range(2):
assert (
selenium.run_js(
f"return pyodide.globals.get('x').toJs()[{i}][{j}][{k}]"
f"return Number(pyodide.globals.get('x').toJs()[{i}][{j}][{k}])"
)
== expected_result[i][j][k]
)
for order in ("C", "F"):
for dtype in (
"int8",
"uint8",
"int16",
"uint16",
"int32",
"uint32",
"int64",
"uint64",
"float32",
"float64",
):
selenium.run(
f"""
x = np.arange(8, dtype=np.{dtype})
x = x.reshape((2, 2, 2))
x = x.copy({order!r})
"""
)
assert_equal()
classname = selenium.run_js(
"return pyodide.globals.get('x').toJs()[0][0].constructor.name"
)
if order == "C" and dtype not in ("uint64", "int64"):
# Here we expect a TypedArray subclass, such as Uint8Array, but
# not a plain-old Array
assert classname.endswith("Array")
assert classname != "Array"
else:
assert classname == "Array"
selenium.run(
"""
x = x.byteswap().newbyteorder()
"""
)
assert_equal()
classname = selenium.run_js(
"return pyodide.globals.get('x').toJs()[0][0].constructor.name"
)
if order == "C" and dtype in ("int8", "uint8"):
# Here we expect a TypedArray subclass, such as Uint8Array, but
# not a plain-old Array -- but only for single byte types where
# endianness doesn't matter
assert classname.endswith("Array")
assert classname != "Array"
else:
assert classname == "Array"
selenium.run(
f"""
x = np.arange(8, dtype=np.{dtype})
x = x.reshape((2, 2, 2))
x = x.copy({order!r})
"""
)
assert_equal()
classname = selenium.run_js(
"return pyodide.globals.get('x').toJs()[0][0].constructor.name"
)
# We expect a TypedArray subclass, such as Uint8Array, but not a plain-old
# Array
assert classname.endswith("Array")
assert classname != "Array"
selenium.run(
"""
x = x.byteswap().newbyteorder()
"""
)
assert_equal()
classname = selenium.run_js(
"return pyodide.globals.get('x').toJs()[0][0].constructor.name"
)
assert classname.endswith("Array")
assert classname != "Array"
assert selenium.run("np.array([True, False])") == [True, False]
@ -126,13 +119,9 @@ def test_py2js_buffer_clear_error_flag(selenium):
)
def test_python2js_numpy_scalar(selenium_standalone):
selenium = selenium_standalone
selenium.load_package("numpy")
selenium.run("import numpy as np")
for dtype in (
@pytest.mark.parametrize(
"dtype",
(
"int8",
"uint8",
"int16",
@ -143,33 +132,38 @@ def test_python2js_numpy_scalar(selenium_standalone):
"uint64",
"float32",
"float64",
):
selenium.run(
f"""
x = np.{dtype}(1)
),
)
def test_python2js_numpy_scalar(selenium, dtype):
selenium.load_package("numpy")
selenium.run("import numpy as np")
selenium.run(
f"""
x = np.{dtype}(1)
"""
)
assert (
selenium.run_js(
"""
return pyodide.globals.get('x') == 1
"""
)
assert (
selenium.run_js(
"""
return pyodide.globals.get('x') == 1
is True
)
selenium.run(
"""
x = x.byteswap().newbyteorder()
"""
)
assert (
selenium.run_js(
"""
)
is True
)
selenium.run(
"""
x = x.byteswap().newbyteorder()
"""
)
assert (
selenium.run_js(
"""
return pyodide.globals.get('x') == 1
"""
)
is True
return pyodide.globals.get('x') == 1
"""
)
is True
)
def test_runpythonasync_numpy(selenium_standalone):

View File

@ -149,115 +149,6 @@ EM_JS_NUM(int, hiwire_init, (), {
} else {
Module.BigInt = Number;
}
/**
* Determine type and endianness of data from format. This is a helper
* function for converting buffers from Python to Javascript, used in
* PyProxyBufferMethods and in `toJs` on a buffer.
*
* To understand this function it will be helpful to look at the tables here:
* https://docs.python.org/3/library/struct.html#format-strings
*
* @arg format {String} A Python format string (caller must convert it to a
* Javascript string).
* @arg errorMessage {String} Extra stuff to append to an error message if
* thrown. Should be a complete sentence.
* @returns A pair, an appropriate TypedArray constructor and a boolean which
* is true if the format suggests a big endian array.
* @private
*/
Module.processBufferFormatString = function(formatStr, errorMessage = "")
{
if (formatStr.length > 2) {
throw new Error(
"Expected format string to have length <= 2, " +
`got '${formatStr}'.` + errorMessage);
}
let formatChar = formatStr.slice(-1);
let alignChar = formatStr.slice(0, -1);
let bigEndian;
switch (alignChar) {
case "!":
case ">":
bigEndian = true;
break;
case "<":
case "@":
case "=":
case "":
bigEndian = false;
break;
default:
throw new Error(`Unrecognized alignment character ${ alignChar }.` +
errorMessage);
}
let arrayType;
switch (formatChar) {
case 'b':
arrayType = Int8Array;
break;
case 's':
case 'p':
case 'c':
case 'B':
case '?':
arrayType = Uint8Array;
break;
case 'h':
arrayType = Int16Array;
break;
case 'H':
arrayType = Uint16Array;
break;
case 'i':
case 'l':
case 'n':
arrayType = Int32Array;
break;
case 'I':
case 'L':
case 'N':
case 'P':
arrayType = Uint32Array;
break;
case 'q':
// clang-format off
if (globalThis.BigInt64Array === undefined) {
// clang-format on
throw new Error("BigInt64Array is not supported on this browser." +
errorMessage);
}
arrayType = BigInt64Array;
break;
case 'Q':
// clang-format off
if (globalThis.BigUint64Array === undefined) {
// clang-format on
throw new Error("BigUint64Array is not supported on this browser." +
errorMessage);
}
arrayType = BigUint64Array;
break;
case 'f':
arrayType = Float32Array;
break;
case 'd':
arrayType = Float64Array;
break;
case "e":
// clang-format off
throw new Error(
"Javascript has no Float16 support. Consider converting the data to " +
"float32 before using it from JavaScript. If you are using a webgl " +
"float16 texture then just use `getBuffer('u8')`.");
// clang-format on
default:
throw new Error(`Unrecognized format character '${formatChar}'.` +
errorMessage);
}
return [ arrayType, bigEndian ];
};
return 0;
});
@ -335,51 +226,6 @@ EM_JS_REF(JsRef, hiwire_string_ascii, (const char* ptr), {
return Module.hiwire.new_value(AsciiToString(ptr));
});
EM_JS_REF(JsRef, hiwire_bytes, (char* ptr, int len), {
let bytes = new Uint8ClampedArray(Module.HEAPU8.buffer, ptr, len);
return Module.hiwire.new_value(bytes);
});
EM_JS_REF(JsRef, hiwire_int8array, (i8 * ptr, int len), {
let array = new Int8Array(Module.HEAPU8.buffer, ptr, len);
return Module.hiwire.new_value(array);
})
EM_JS_REF(JsRef, hiwire_uint8array, (u8 * ptr, int len), {
let array = new Uint8Array(Module.HEAPU8.buffer, ptr, len);
return Module.hiwire.new_value(array);
})
EM_JS_REF(JsRef, hiwire_int16array, (i16 * ptr, int len), {
let array = new Int16Array(Module.HEAPU8.buffer, ptr, len);
return Module.hiwire.new_value(array);
})
EM_JS_REF(JsRef, hiwire_uint16array, (u16 * ptr, int len), {
let array = new Uint16Array(Module.HEAPU8.buffer, ptr, len);
return Module.hiwire.new_value(array);
})
EM_JS_REF(JsRef, hiwire_int32array, (i32 * ptr, int len), {
let array = new Int32Array(Module.HEAPU8.buffer, ptr, len);
return Module.hiwire.new_value(array);
})
EM_JS_REF(JsRef, hiwire_uint32array, (u32 * ptr, int len), {
let array = new Uint32Array(Module.HEAPU8.buffer, ptr, len);
return Module.hiwire.new_value(array);
})
EM_JS_REF(JsRef, hiwire_float32array, (f32 * ptr, int len), {
let array = new Float32Array(Module.HEAPU8.buffer, ptr, len);
return Module.hiwire.new_value(array);
})
EM_JS_REF(JsRef, hiwire_float64array, (f64 * ptr, int len), {
let array = new Float64Array(Module.HEAPU8.buffer, ptr, len);
return Module.hiwire.new_value(array);
})
EM_JS(void _Py_NO_RETURN, hiwire_throw_error, (JsRef iderr), {
throw Module.hiwire.pop_value(iderr);
});

View File

@ -143,105 +143,6 @@ hiwire_string_utf8(const char* ptr);
JsRef
hiwire_string_ascii(const char* ptr);
/**
* Create a new Javascript Uint8ClampedArray, given a pointer to a buffer and a
* length, in bytes.
*
* The array's data is not copied.
*
* Returns: New reference
*/
JsRef
hiwire_bytes(char* ptr, int len);
/**
* Create a new Javascript Int8Array, given a pointer to a buffer and a
* length, in bytes.
*
* The array's data is not copied.
*
* Returns: New reference
*/
JsRef
hiwire_int8array(i8* ptr, int len);
/**
* Create a new Javascript Uint8Array, given a pointer to a buffer and a
* length, in bytes.
*
* The array's data is not copied.
*
* Returns: New reference
*/
JsRef
hiwire_uint8array(u8* ptr, int len);
/**
* Create a new Javascript Int16Array, given a pointer to a buffer and a
* length, in bytes.
*
* The array's data is not copied.
*
* Returns: New reference
*/
JsRef
hiwire_int16array(i16* ptr, int len);
/**
* Create a new Javascript Uint16Array, given a pointer to a buffer and a
* length, in bytes.
*
* The array's data is not copied.
*
* Returns: New reference
*/
JsRef
hiwire_uint16array(u16* ptr, int len);
/**
* Create a new Javascript Int32Array, given a pointer to a buffer and a
* length, in bytes.
*
* The array's data is not copied.
*
* Returns: New reference
*/
JsRef
hiwire_int32array(i32* ptr, int len);
/**
* Create a new Javascript Uint32Array, given a pointer to a buffer and a
* length, in bytes.
*
* The array's data is not copied.
*
* Returns: New reference
*/
JsRef
hiwire_uint32array(u32* ptr, int len);
/**
* Create a new Javascript Float32Array, given a pointer to a buffer and a
* length, in bytes.
*
* The array's data is not copied.
*
* Returns: New reference
*/
JsRef
hiwire_float32array(f32* ptr, int len);
/**
* Create a new Javascript Float64Array, given a pointer to a buffer and a
* length, in bytes.
*
* The array's data is not copied.
*
* Returns: New reference
*/
JsRef
hiwire_float64array(f64* ptr, int len);
/**
* Create a new Javascript boolean value.
* Return value is true if boolean != 0, false if boolean == 0.

View File

@ -0,0 +1,19 @@
// The point is to make a file that works with Javascript analysis tools like
// JsDoc and LGTM. They want to parse the file as Javascript. Thus, it's key
// that included js files should parse as valid Javascript. `JS_FILE` is a
// specially designed macro to allow us to do this. We need to look like a
// function call to Javascript parsers. The easiest way to get it to parse is to
// make the macro argument look like a Javascript anonymous function, which we
// do with `()=>{`. However, `()=>{` is an invalid C string so the macro needs
// to remove it. We put `()=>{0,0;`; JS_FILE removes everything up to
// the comma and replaces it with a single open brace.
//
#define UNPAIRED_OPEN_BRACE {
#define UNPAIRED_CLOSE_BRACE } // Just here to help text editors pair braces up
#define JS_FILE(func_name, a, args...) \
EM_JS_NUM(int, func_name, (), UNPAIRED_OPEN_BRACE { args return 0; })
// A macro to allow us to add code that is only intended to influence JsDoc
// output, but shouldn't end up in generated code.
#define FOR_JSDOC_ONLY(x)

View File

@ -145,7 +145,7 @@ JsProxy_GetAttr(PyObject* self, PyObject* attr)
if (strcmp(key, "keys") == 0 && hiwire_is_array(JsProxy_REF(self))) {
// Sometimes Python APIs test for the existence of a "keys" function
// to decide whether something should be treated like a dict.
// This mixes badly with the javascript Array.keys api, so pretend that it
// This mixes badly with the javascript Array.keys API, so pretend that it
// doesn't exist. (Array.keys isn't very useful anyways so hopefully this
// won't confuse too many people...)
PyErr_SetString(PyExc_AttributeError, key);

View File

@ -12,6 +12,7 @@
#include "keyboard_interrupt.h"
#include "pyproxy.h"
#include "python2js.h"
#include "python2js_buffer.h"
#define FATAL_ERROR(args...) \
do { \
@ -124,6 +125,7 @@ main(int argc, char** argv)
TRY_INIT(hiwire);
TRY_INIT(docstring);
TRY_INIT(js2python);
TRY_INIT(python2js_buffer);
TRY_INIT_WITH_CORE_MODULE(JsProxy);
TRY_INIT_WITH_CORE_MODULE(pyproxy);
TRY_INIT(keyboard_interrupt);

View File

@ -949,23 +949,9 @@ static PyMethodDef pyproxy_methods[] = {
{ NULL } /* Sentinel */
};
// Some special helper macros to hack it so that "pyproxy.js" parses as a
// javascript file for JsDoc. See comment with explanation there.
#define UNPAIRED_OPEN_BRACE {
#define UNPAIRED_CLOSE_BRACE } // Just here to help text editors pair braces up
#define TEMP_EMJS_HELPER(a, args...) \
EM_JS_NUM(int, pyproxy_init_js, (), UNPAIRED_OPEN_BRACE { args return 0; })
// A macro to allow us to add code that is only intended to influence JsDoc
// output, but shouldn't end up in generated code.
#define FOR_JSDOC_ONLY(x)
#include "include_js_file.h"
#include "pyproxy.js"
#undef TEMP_EMJS_HELPER
#undef UNPAIRED_OPEN_BRACE
#undef UNPAIRED_CLOSE_BRACE
int
pyproxy_init(PyObject* core)
{

View File

@ -1,21 +1,7 @@
// This file to be included from pyproxy.c
//
// The point is to make a file that works with JsDoc. JsDoc will give up if it
// fails to parse the file as javascript. Thus, it's key that this file should
// parse as valid javascript. `TEMP_EMJS_HELPER` is a specially designed macro
// to allow us to do this. We need TEMP_EMJS_HELPER to parse like a javascript
// function call. The easiest way to get it to parse is to make the "argument"
// look like a function call, which we do with `()=>{`. However, `()=>{` is an
// invalid C string so the macro needs to remove it. We put `()=>{0,`,
// TEMP_EMJS_HELPER removes everything up to the comma and replace it with a
// single open brace.
//
// See definition of TEMP_EMJS_HELPER:
// #define TEMP_EMJS_HELPER(a, args...) \
// EM_JS(int, pyproxy_init, (), UNPAIRED_OPEN_BRACE { args return 0; })
// This uses the JS_FILE macro defined in include_js_file.h
// clang-format off
TEMP_EMJS_HELPER(() => {0, /* Magic, see comment */
JS_FILE(pyproxy_init_js, () => {0,0; /* Magic, see include_js_file.h */
Module.PyProxies = {};
// clang-format on

View File

@ -82,19 +82,6 @@ _python2js_unicode(PyObject* x)
}
}
// TODO: Should we use this in explicit conversions?
static JsRef
_python2js_bytes(PyObject* x)
{
char* x_buff;
Py_ssize_t length;
if (PyBytes_AsStringAndSize(x, &x_buff, &length)) {
return NULL;
}
return (JsRef)EM_ASM_INT(
{ return Module.hiwire.new_value(HEAP8.slice($0, $0 + $1)) }, x, length);
}
///////////////////////////////////////////////////////////////////////////////
//
// Container Converters

View File

@ -1,6 +1,7 @@
#define PY_SSIZE_T_CLEAN
#include "Python.h"
#include "error_handling.h"
#include "python2js_buffer.h"
#include "types.h"
@ -11,487 +12,67 @@
// This file handles the conversion of Python buffer objects (which loosely
// represent Numpy arrays) to Javascript.
// Converts everything to nested Javascript arrays, where the scalars are
// standard Javascript numbers (python2js_buffer_recursive)
// There are two methods here:
// 1. Converts everything to nested Javascript arrays, where the scalars are
// standard Javascript numbers (python2js_buffer_recursive)
// 2. Converts everything to nested arrays, where the last contiguous
// dimension is a subarray of a TypedArray that points to the original bytes
// on the WebAssembly (Python) side. This is much faster since it doesn't
// require copying the data, and the data is shared. In the case of a
// one-dimensional array, the result is simply a TypedArray. Unfortunately,
// this requires that the source array is C-contiguous and in native (little)
// endian order. (python2js_shareable_buffer_recursive)
// Unfortunately, this also means that there are different semantics: sometimes
// the array is a copy, and other times it is a shared reference. One should
// write code that doesn't rely on either behavior, but treats this simply as
// the performance optimization that it is.
typedef JsRef(scalar_converter)(char*);
static JsRef
_convert_bool(char* data)
{
char v = *((char*)data);
return hiwire_bool((int)v);
}
static JsRef
_convert_int8(char* data)
{
i8 v = *((i8*)data);
return hiwire_int(v);
}
static JsRef
_convert_uint8(char* data)
{
u8 v = *((u8*)data);
return hiwire_int(v);
}
static JsRef
_convert_int16(char* data)
{
i16 v = *((i16*)data);
return hiwire_int(v);
}
static JsRef
_convert_int16_swap(char* data)
{
i16 v = *((i16*)data);
return hiwire_int(be16toh(v));
}
static JsRef
_convert_uint16(char* data)
{
u16 v = *((u16*)data);
return hiwire_int(v);
}
static JsRef
_convert_uint16_swap(char* data)
{
u16 v = *((u16*)data);
return hiwire_int(be16toh(v));
}
static JsRef
_convert_int32(char* data)
{
i32 v = *((i32*)data);
return hiwire_int(v);
}
static JsRef
_convert_int32_swap(char* data)
{
i32 v = *((i32*)data);
return hiwire_int(be32toh(v));
}
static JsRef
_convert_uint32(char* data)
{
u32 v = *((u32*)data);
return hiwire_int(v);
}
static JsRef
_convert_uint32_swap(char* data)
{
u32 v = *((u32*)data);
return hiwire_int(be32toh(v));
}
static JsRef
_convert_int64(char* data)
{
i64 v = *((i64*)data);
return hiwire_int(v);
}
static JsRef
_convert_int64_swap(char* data)
{
i64 v = *((i64*)data);
return hiwire_int(be64toh(v));
}
static JsRef
_convert_uint64(char* data)
{
u64 v = *((u64*)data);
return hiwire_int(v);
}
static JsRef
_convert_uint64_swap(char* data)
{
u64 v = *((u64*)data);
return hiwire_int(be64toh(v));
}
static JsRef
_convert_float32(char* data)
{
float v = *((float*)data);
return hiwire_double(v);
}
static JsRef
_convert_float32_swap(char* data)
{
union float32_t
{
u32 i;
float f;
} v;
v.f = *((float*)data);
v.i = be32toh(v.i);
return hiwire_double(v.f);
}
static JsRef
_convert_float64(char* data)
{
double v = *((double*)data);
return hiwire_double(v);
}
static JsRef
_convert_float64_swap(char* data)
{
union float64_t
{
u64 i;
double f;
} v;
v.f = *((double*)data);
v.i = be64toh(v.i);
return hiwire_double(v.f);
}
static scalar_converter*
_python2js_buffer_get_converter(Py_buffer* buff)
{
// Uses Python's struct typecodes as defined here:
// https://docs.python.org/3.8/library/array.html
char format;
char swap;
if (buff->format == NULL) {
swap = 0;
format = 'B';
} else {
switch (buff->format[0]) {
case '>':
case '!':
swap = 1;
format = buff->format[1];
break;
case '=':
case '<':
case '@':
swap = 0;
format = buff->format[1];
break;
default:
swap = 0;
format = buff->format[0];
}
}
switch (format) {
case 'c':
case 'b':
return _convert_int8;
case 'B':
return _convert_uint8;
case '?':
return _convert_bool;
case 'h':
if (swap) {
return _convert_int16_swap;
} else {
return _convert_int16;
}
case 'H':
if (swap) {
return _convert_uint16_swap;
} else {
return _convert_uint16;
}
case 'i':
case 'l':
case 'n':
if (swap) {
return _convert_int32_swap;
} else {
return _convert_int32;
}
case 'I':
case 'L':
case 'N':
if (swap) {
return _convert_uint32_swap;
} else {
return _convert_uint32;
}
case 'q':
if (swap) {
return _convert_int64_swap;
} else {
return _convert_int64;
}
case 'Q':
if (swap) {
return _convert_uint64_swap;
} else {
return _convert_uint64;
}
case 'f':
if (swap) {
return _convert_float32_swap;
} else {
return _convert_float32;
}
case 'd':
if (swap) {
return _convert_float64_swap;
} else {
return _convert_float64;
}
default:
return NULL;
}
}
static JsRef
_python2js_buffer_recursive(Py_buffer* buff,
char* ptr,
int dim,
scalar_converter* convert)
{
// This function is basically a manual conversion of `recursive_tolist` in
// Numpy to use the Python buffer interface and output Javascript.
Py_ssize_t i, n, stride;
JsRef jsarray, jsitem;
if (dim >= buff->ndim) {
return convert(ptr);
}
n = buff->shape[dim];
stride = buff->strides[dim];
jsarray = hiwire_array();
for (i = 0; i < n; ++i) {
jsitem = _python2js_buffer_recursive(buff, ptr, dim + 1, convert);
if (jsitem == NULL) {
hiwire_decref(jsarray);
return NULL;
}
hiwire_push_array(jsarray, jsitem);
hiwire_decref(jsitem);
ptr += stride;
}
return jsarray;
}
static JsRef
_python2js_buffer_to_typed_array(Py_buffer* buff)
{
// Uses Python's struct typecodes as defined here:
// https://docs.python.org/3.8/library/array.html
char format;
if (buff->format == NULL) {
format = 'B';
} else {
switch (buff->format[0]) {
case '>':
case '!':
// This path can't handle byte-swapping
return NULL;
case '=':
case '<':
case '@':
format = buff->format[1];
break;
default:
format = buff->format[0];
}
}
Py_ssize_t len = buff->len / buff->itemsize;
switch (format) {
case 'c':
case 'b':
return hiwire_int8array((i8*)buff->buf, len);
case 'B':
return hiwire_uint8array((u8*)buff->buf, len);
case '?':
return NULL;
case 'h':
return hiwire_int16array((i16*)buff->buf, len);
case 'H':
return hiwire_uint16array((u16*)buff->buf, len);
case 'i':
case 'l':
case 'n':
return hiwire_int32array((i32*)buff->buf, len);
case 'I':
case 'L':
case 'N':
return hiwire_uint32array((u32*)buff->buf, len);
case 'q':
case 'Q':
return NULL;
case 'f':
return hiwire_float32array((f32*)buff->buf, len);
case 'd':
return hiwire_float64array((f64*)buff->buf, len);
default:
return NULL;
}
}
enum shareable_enum
{
NOT_SHAREABLE,
CONTIGUOUS,
NOT_CONTIGUOUS
};
static JsRef
_python2js_shareable_buffer_recursive(Py_buffer* buff,
enum shareable_enum shareable,
JsRef idarr,
int ptr,
int dim)
{
Py_ssize_t i, n, stride;
JsRef jsarray, jsitem;
switch (shareable) {
case NOT_CONTIGUOUS:
if (dim >= buff->ndim) {
// The last dimension isn't contiguous, so we need to output one-by-one
return hiwire_get_member_int(idarr, ptr / buff->itemsize);
}
break;
case CONTIGUOUS:
if (dim == buff->ndim - 1) {
// The last dimension is contiguous, so we can output a whole row at a
// time
return hiwire_subarray(
idarr, ptr / buff->itemsize, ptr / buff->itemsize + buff->shape[dim]);
}
break;
default:
break;
}
n = buff->shape[dim];
stride = buff->strides[dim];
jsarray = hiwire_array();
for (i = 0; i < n; ++i) {
jsitem = _python2js_shareable_buffer_recursive(
buff, shareable, idarr, ptr, dim + 1);
if (jsitem == NULL) {
hiwire_decref(jsarray);
return NULL;
}
hiwire_push_array(jsarray, jsitem);
hiwire_decref(jsitem);
ptr += stride;
}
return jsarray;
}
static enum shareable_enum
_python2js_buffer_is_shareable(Py_buffer* buff)
{
if (buff->ndim == 0) {
return NOT_SHAREABLE;
}
char* invalid_codes = ">!qQ?";
for (char* i = buff->format; *i != 0; ++i) {
for (char* j = invalid_codes; *j != 0; ++j) {
if (*i == *j) {
return NOT_SHAREABLE;
}
}
}
for (int i = 0; i < buff->ndim; ++i) {
if (buff->strides[i] <= 0) {
return NOT_SHAREABLE;
}
}
if (buff->itemsize != buff->strides[buff->ndim - 1]) {
return NOT_CONTIGUOUS;
}
// We can use the most efficient method
return CONTIGUOUS;
}
// clang-format off
/**
* A simple helper function that puts the arguments into a Javascript object
* (for readability) and looks up the conversion function, then calls into
* python2js_buffer_recursive.
*/
EM_JS_REF(JsRef, _python2js_buffer_inner, (
void* buf,
Py_ssize_t itemsize,
int ndim,
char* format,
Py_ssize_t* shape,
Py_ssize_t* strides,
Py_ssize_t* suboffsets
), {
// get_converter and _python2js_buffer_recursive defined in python2js_buffer.js
let converter = Module.get_converter(format, itemsize);
let result = Module._python2js_buffer_recursive(buf, 0, {
ndim,
format,
itemsize,
shape,
strides,
suboffsets,
converter,
});
return Module.hiwire.new_value(result);
});
// clang-format on
/**
* Convert a buffer. To get the data out of the Py_buffer without relying on the
* exact memory layout of Py_buffer, we need to do this in C. After pulling the
* data out we call into the EM_JS helper _python2js_buffer_inner, which sets up
* the base case for the recursion and then calls the main js function
* _python2js_buffer_recursive (defined in python2js_buffer.js).
*/
JsRef
_python2js_buffer(PyObject* x)
{
PyObject* memoryview = PyMemoryView_FromObject(x);
if (memoryview == NULL) {
Py_buffer view;
if (PyObject_GetBuffer(x, &view, PyBUF_FULL_RO) == -1) {
return NULL;
}
Py_buffer* buff;
buff = PyMemoryView_GET_BUFFER(memoryview);
enum shareable_enum shareable = _python2js_buffer_is_shareable(buff);
JsRef result;
if (shareable != NOT_SHAREABLE) {
JsRef idarr = _python2js_buffer_to_typed_array(buff);
if (idarr == NULL) {
PyErr_SetString(
PyExc_TypeError,
"Internal error: Invalid type to convert to array buffer.");
return NULL;
}
result =
_python2js_shareable_buffer_recursive(buff, shareable, idarr, 0, 0);
hiwire_decref(idarr);
} else {
scalar_converter* convert = _python2js_buffer_get_converter(buff);
if (convert == NULL) {
Py_DECREF(memoryview);
return NULL;
}
result = _python2js_buffer_recursive(buff, buff->buf, 0, convert);
}
Py_DECREF(memoryview);
// clang-format off
JsRef result = _python2js_buffer_inner(
view.buf,
view.itemsize,
view.ndim,
view.format,
view.shape,
view.strides,
view.suboffsets
);
// clang-format on
PyBuffer_Release(&view);
return result;
}
#include "include_js_file.h"
#include "python2js_buffer.js"

View File

@ -18,4 +18,7 @@
JsRef
_python2js_buffer(PyObject* x);
errcode
python2js_buffer_init();
#endif /* PYTHON2JS_BUFFER_H */

View File

@ -0,0 +1,281 @@
JS_FILE(python2js_buffer_init, () => {
0, 0; /* Magic, see include_js_file.h */
/**
* Determine type and endianness of data from format. This is a helper
* function for converting buffers from Python to Javascript, used in
* PyProxyBufferMethods and in `toJs` on a buffer.
*
* To understand this function it will be helpful to look at the tables here:
* https://docs.python.org/3/library/struct.html#format-strings
*
* @arg format {String} A Python format string (caller must convert it to a
* Javascript string).
* @arg errorMessage {String} Extra stuff to append to an error message if
* thrown. Should be a complete sentence.
* @returns A pair, an appropriate TypedArray constructor and a boolean which
* is true if the format suggests a big endian array.
* @private
*/
Module.processBufferFormatString = function(formatStr, errorMessage = "") {
if (formatStr.length > 2) {
throw new Error("Expected format string to have length <= 2, " +
`got '${formatStr}'.` + errorMessage);
}
let formatChar = formatStr.slice(-1);
let alignChar = formatStr.slice(0, -1);
let bigEndian;
switch (alignChar) {
case "!":
case ">":
bigEndian = true;
break;
case "<":
case "@":
case "=":
case "":
bigEndian = false;
break;
default:
throw new Error(`Unrecognized alignment character ${alignChar}.` +
errorMessage);
}
let arrayType;
switch (formatChar) {
case 'b':
arrayType = Int8Array;
break;
case 's':
case 'p':
case 'c':
case 'B':
case '?':
arrayType = Uint8Array;
break;
case 'h':
arrayType = Int16Array;
break;
case 'H':
arrayType = Uint16Array;
break;
case 'i':
case 'l':
case 'n':
arrayType = Int32Array;
break;
case 'I':
case 'L':
case 'N':
case 'P':
arrayType = Uint32Array;
break;
case 'q':
// clang-format off
if (globalThis.BigInt64Array === undefined) {
// clang-format on
throw new Error("BigInt64Array is not supported on this browser." +
errorMessage);
}
arrayType = BigInt64Array;
break;
case 'Q':
// clang-format off
if (globalThis.BigUint64Array === undefined) {
// clang-format on
throw new Error("BigUint64Array is not supported on this browser." +
errorMessage);
}
arrayType = BigUint64Array;
break;
case 'f':
arrayType = Float32Array;
break;
case 'd':
arrayType = Float64Array;
break;
case "e":
throw new Error("Javascript has no Float16 support.");
default:
throw new Error(`Unrecognized format character '${formatChar}'.` +
errorMessage);
}
return [ arrayType, bigEndian ];
};
/**
* Convert a 1-dimensional contiguous buffer to Javascript.
*
* In this case we can just slice the memory out of the wasm HEAP.
* @param {number} ptr A pointer to the start of the buffer in wasm memory
* @param {number} stride The size of the entries in bytes
* @param {number} n The number of entries
* @returns A new ArrayBuffer with the appropriate data in it (not a view of
* the WASM heap)
* @private
*/
Module.python2js_buffer_1d_contiguous = function(ptr, stride, n) {
"use strict";
let byteLength = stride * n;
// Note: slice here is a copy (as opposed to subarray which is not)
return HEAP8.slice(ptr, ptr + byteLength).buffer;
};
/**
* Convert a 1d noncontiguous buffer to Javascript.
*
* Since the buffer is not contiguous we have to copy it in chunks.
* @param {number} ptr The WASM memory pointer to the start of the buffer.
* @param {number} stride The stride in bytes between each entry.
* @param {number} suboffset The suboffset from the Python Buffer protocol.
* Negative if no suboffsets. (see
* https://docs.python.org/3/c-api/buffer.html#c.Py_buffer.suboffsets)
* @param {number} n The number of entries.
* @param {number} itemsize The size in bytes of each entry.
* @returns A new ArrayBuffer with the appropriate data in it (not a view of
* the WASM heap)
* @private
*/
Module.python2js_buffer_1d_noncontiguous = function(ptr, stride, suboffset, n,
itemsize) {
"use strict";
let byteLength = itemsize * n;
// Make new memory of the appropriate size
let buffer = new Uint8Array(byteLength);
for (let i = 0; i < n; ++i) {
let curptr = ptr + i * stride;
if (suboffset >= 0) {
curptr = HEAP32[curptr / 4] + suboffset;
}
buffer.set(HEAP8.subarray(curptr, curptr + itemsize), i * itemsize);
}
return buffer.buffer;
};
/**
* Convert an ndarray to a nested Javascript array, the main function.
*
* This is called by _python2js_buffer_inner (defined in python2js_buffer.c).
* There are two layers of setup that need to be done to get the base case of
* the recursion right.
*
* The last dimension of the array is handled by the appropriate 1d array
* converter: python2js_buffer_1d_contiguous or
* python2js_buffer_1d_noncontiguous.
*
* @param {number} ptr The pointer into the buffer
* @param {number} curdim What dimension are we currently working on? 0 <=
* curdim < ndim.
* @param {Object} bufferData All of the data out of the Py_buffer, plus the
* converter function: ndim, format, itemsize, shape (a ptr), strides (a ptr),
* suboffsets (a ptr), converter.
* @returns A nested Javascript array, the result of the conversion.
* @private
*/
Module._python2js_buffer_recursive = function(ptr, curdim, bufferData) {
"use strict";
// When indexing HEAP32 we need to divide the pointer by 4
let n = HEAP32[bufferData.shape / 4 + curdim];
let stride = HEAP32[bufferData.strides / 4 + curdim];
let suboffset = -1;
// clang-format off
if (bufferData.suboffsets !== 0) {
suboffset = HEAP32[bufferData.suboffsets / 4 + curdim];
}
if (curdim === bufferData.ndim - 1) {
// Last dimension, use appropriate 1d converter
let arraybuffer;
if (stride === bufferData.itemsize && suboffset < 0) {
arraybuffer = Module.python2js_buffer_1d_contiguous(ptr, stride, n);
} else {
arraybuffer = Module.python2js_buffer_1d_noncontiguous(
ptr, stride, suboffset, n, bufferData.itemsize);
}
return bufferData.converter(arraybuffer);
}
// clang-format on
let result = [];
for (let i = 0; i < n; ++i) {
// See:
// https://docs.python.org/3/c-api/buffer.html#pil-style-shape-strides-and-suboffsets
let curPtr = ptr + i * stride;
if (suboffset >= 0) {
curPtr = HEAP32[curPtr / 4] + suboffset;
}
result.push(
Module._python2js_buffer_recursive(curPtr, curdim + 1, bufferData));
}
return result;
};
/**
* Get the appropriate converter function.
*
* The converter function takes an ArrayBuffer and returns an appropriate
* TypedArray. If the buffer is big endian, the converter will convert the
* data to little endian.
*
* The converter function does something special if the format character is
* "?" or "s". If it's "?" we return an array of booleans, if it's "s" we
* return a string.
*
* @param {number} format A pointer to the buffer's format string in wasm
* memory (decoded with UTF8ToString).
* @param {number} itemsize Should be one of 1, 2, 4, 8. Used for big endian
* conversion.
* @returns A converter function ArrayBuffer => TypedArray (or an Array of
* booleans for "?" and a string for "s")
*/
Module.get_converter = function(format, itemsize) {
"use strict";
let formatStr = UTF8ToString(format);
let [ArrayType, bigEndian] = Module.processBufferFormatString(formatStr);
let formatChar = formatStr.slice(-1);
// clang-format off
switch (formatChar) {
case "s":
let decoder = new TextDecoder("utf8");
return (buff) => decoder.decode(buff);
case "?":
return (buff) => Array.from(new Uint8Array(buff), x => !!x);
}
// clang-format on
// Single-byte items have no byte order, so they never need swapping.
if (!bigEndian || itemsize === 1) {
// clang-format off
return buff => new ArrayType(buff);
// clang-format on
}
let getFuncName;
let setFuncName;
switch (itemsize) {
case 2:
getFuncName = "getUint16";
setFuncName = "setUint16";
break;
case 4:
getFuncName = "getUint32";
setFuncName = "setUint32";
break;
case 8:
getFuncName = "getFloat64";
setFuncName = "setFloat64";
break;
default:
// clang-format off
throw new Error(`Unexpected size ${ itemsize }`);
// clang-format on
}
function swapFunc(buff) {
let dataview = new DataView(buff);
let getFunc = dataview[getFuncName].bind(dataview);
let setFunc = dataview[setFuncName].bind(dataview);
for (let byte = 0; byte < dataview.byteLength; byte += itemsize) {
// Get value as little endian, set back as big endian.
setFunc(byte, getFunc(byte, true), false);
}
return buff;
}
// clang-format off
return buff => new ArrayType(swapFunc(buff));
// clang-format on
};
});
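
The noncontiguous copy and the big endian conversion above can be tried outside of Pyodide. The following is a minimal sketch, not code from this commit: `heap`, `gatherStrided` and `swap16` are hypothetical names, a plain `Uint8Array` stands in for the wasm heap, and suboffsets are left out. It gathers three big endian 16 bit items that sit 4 bytes apart, then reverses the byte order of each item with a `DataView`, the same way `swapFunc` does.

```javascript
// Hypothetical stand-in for the wasm heap: 16 bytes of fake memory.
const heap = new Uint8Array(16);

// Store three big endian uint16 values with a stride of 4 bytes, so that
// 2 bytes of padding follow each 2 byte item.
const heapView = new DataView(heap.buffer);
[0x0102, 0x0304, 0x0506].forEach((value, i) => {
  heapView.setUint16(i * 4, value, false);
});

// Copy n items of itemsize bytes, spaced stride bytes apart, into a fresh
// ArrayBuffer. This mirrors the noncontiguous 1d case without suboffsets.
function gatherStrided(heap, ptr, stride, n, itemsize) {
  const out = new Uint8Array(n * itemsize);
  for (let i = 0; i < n; ++i) {
    const cur = ptr + i * stride;
    out.set(heap.subarray(cur, cur + itemsize), i * itemsize);
  }
  return out.buffer;
}

// Reverse the byte order of every 2 byte item in place: read each item as
// little endian, then write the same number back as big endian.
function swap16(buffer) {
  const view = new DataView(buffer);
  for (let byte = 0; byte < view.byteLength; byte += 2) {
    view.setUint16(byte, view.getUint16(byte, true), false);
  }
  return buffer;
}

// packed holds the 6 gathered bytes; values is a Uint16Array view of them
// after the swap.
const packed = gatherStrided(heap, 0, 4, 3, 2);
const values = new Uint16Array(swap16(packed));
```

On a little endian host, `values` then holds the three numbers that were originally stored in big endian order. Because the swap happens in place on the copied `ArrayBuffer`, the fake heap itself is left unchanged.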

View File

@ -48,7 +48,7 @@ try:
pass
def then(self, onfulfilled: Callable, onrejected: Callable) -> "Promise":
"""The ``Promise.then`` api, wrapped to manage the lifetimes of the
"""The ``Promise.then`` API, wrapped to manage the lifetimes of the
handlers.
Only available if the wrapped Javascript object has a "then" method.
@ -57,7 +57,7 @@ try:
"""
def catch(self, onrejected: Callable) -> "Promise":
"""The ``Promise.catch`` api, wrapped to manage the lifetimes of the
"""The ``Promise.catch`` API, wrapped to manage the lifetimes of the
handler.
Only available if the wrapped Javascript object has a "then" method.
@ -66,7 +66,7 @@ try:
"""
def finally_(self, onfinally: Callable) -> "Promise":
"""The ``Promise.finally`` api, wrapped to manage the lifetimes of
"""The ``Promise.finally`` API, wrapped to manage the lifetimes of
the handler.
Only available if the wrapped Javascript object has a "then" method.

View File

@ -667,7 +667,7 @@ globalThis.loadPyodide = async function(config = {}) {
* ``name``. This module can then be imported from Python using the standard
* Python import system. If another module by the same name has already been
* imported, this won't have much effect unless you also delete the imported
* module from ``sys.modules``. This calls the ``pyodide_py`` api
* module from ``sys.modules``. This calls the ``pyodide_py`` API
* :func:`pyodide.register_js_module`.
*
* @param {string} name Name of the Javascript module to add
@ -683,7 +683,7 @@ globalThis.loadPyodide = async function(config = {}) {
* :func:`pyodide.register_js_module`. If a Javascript module with that name
* does not already exist, will throw an error. Note that if the module has
* already been imported, this won't have much effect unless you also delete
* the imported module from ``sys.modules``. This calls the ``pyodide_py`` api
* the imported module from ``sys.modules``. This calls the ``pyodide_py`` API
* :func:`pyodide.unregister_js_module`.
*
* @param {string} name Name of the Javascript module to remove