diff --git a/docs/project/changelog.md b/docs/project/changelog.md index 181c84b70..6e0a9f81d 100644 --- a/docs/project/changelog.md +++ b/docs/project/changelog.md @@ -56,6 +56,12 @@ substitutions: - {{ API }} Added `PyProxy.getBuffer` API to allow direct access to Python buffers as Javascript TypedArrays. [1215](https://github.com/pyodide/pyodide/pull/1215) +- {{ API }} The innermost level of a buffer converted to Javascript used to be a + TypedArray if the buffer was contiguous and otherwise an Array. Now the + innermost level will be a TypedArray unless the buffer format code is '?', in + which case it will be an Array of booleans, or 's', in which case it will be + converted to a string. + [1376](https://github.com/pyodide/pyodide/pull/1376) - {{ Enhancement }} Javascript `BigInt`s are converted into Python `int` and Python `int`s larger than 2^53 are converted into `BigInt`. [1407](https://github.com/pyodide/pyodide/pull/1407) diff --git a/docs/sphinx_pyodide/sphinx_pyodide/jsdoc.py b/docs/sphinx_pyodide/sphinx_pyodide/jsdoc.py index 5129a46c7..d8b8501cc 100644 --- a/docs/sphinx_pyodide/sphinx_pyodide/jsdoc.py +++ b/docs/sphinx_pyodide/sphinx_pyodide/jsdoc.py @@ -211,7 +211,7 @@ def get_jsdoc_summary_directive(app): The output is designed to be input to format_table. The link name needs to be set up so that :any:`link_name` makes a link to the - actual api docs for this object. + actual API docs for this object. """ sig = self.get_sig(obj) display_name = obj.name diff --git a/docs/usage/type-conversions.md b/docs/usage/type-conversions.md index f28a8512b..9a5466442 100644 --- a/docs/usage/type-conversions.md +++ b/docs/usage/type-conversions.md @@ -276,11 +276,14 @@ the {any}`PyProxy.toJs` method. By default, the `toJs` method does a recursive conversion, to do a shallow conversion use `proxy.toJs(1)`.
The `toJs` method performs the following explicit conversions: -| Python | Javascript | -|------------------|---------------------| -| `list`, `tuple` | `Array` | -| `dict` | `Map` | -| `set` | `Set` | +| Python | Javascript | +|-------------------|---------------------| +| `list`, `tuple` | `Array` | +| `dict` | `Map` | +| `set` | `Set` | +| a buffer* | `TypedArray` | + +* Examples of buffers include bytes objects and numpy arrays. In Javascript, `Map` and `Set` keys are compared using object identity unless the key is an immutable type (meaning a string, a number, a bigint, a boolean, @@ -289,6 +292,8 @@ compared using deep equality. If a key is encountered in a `dict` or `set` that would have different semantics in Javascript than in Python, then a `ConversionError` will be thrown. +See {ref}`buffer_tojs` for the behavior of `toJs` on buffers. + `````{admonition} Memory Leaks and toJs :class: warning @@ -376,16 +381,37 @@ import numpy as np numpy_array = np.asarray(array) ``` +(buffer_tojs)= ### Converting Python Buffer objects to Javascript +Python objects supporting the [Python Buffer +protocol](https://docs.python.org/3/c-api/buffer.html) are proxied into +Javascript. The data inside the buffer can be accessed via the {any}`PyProxy.toJs` method or +the {any}`PyProxy.getBuffer` method. The `toJs` API copies the buffer into Javascript, +whereas the `getBuffer` method allows low-level access to the WASM memory +backing the buffer. The `getBuffer` API is more powerful but requires care to +use correctly. For simple use cases, the `toJs` API should be preferred. -A PyProxy of any Python object supporting the -[Python Buffer protocol](https://docs.python.org/3/c-api/buffer.html) will have -a method called {any}`getBuffer `. This can be used to retrieve a reference to a +If the buffer is zero- or one-dimensional, then `toJs` will in most cases convert +it to a single `TypedArray`.
However, if the format of the buffer +is `'s'`, we will convert the buffer to a string, and if the format is `'?'`, we will +convert it to an Array of booleans. + +If the buffer has more than one dimension, we will convert it to a nested Javascript +array, with the innermost dimension handled in the same way as a 1d array. + +An example of a case where you would not want to use the `toJs` method is when +the buffer is bitmapped image data. If, for instance, you have a 3d buffer shaped +1920 x 1080 x 4, then `toJs` will be extremely slow, because it has to create +1920 x 1080 (about two million) `TypedArray`s of length 4. In this case you could use +{any}`PyProxy.getBuffer`. On the other hand, if the buffer is shaped 1920 x 4 x 1080, `toJs` creates only 7680 innermost arrays +and will most likely be fast enough: the cost depends mostly on the number of innermost arrays, not on their length. + +The {any}`PyProxy.getBuffer` method can be used to retrieve a reference to a Javascript typed array that points to the data backing the Python object, combined with other metadata about the buffer format. The metadata is suitable for use with a Javascript ndarray library if one is present.
For instance, if -you load the Javascript [ndarray](https://github.com/scijs/ndarray) -package, you can do: +you load the Javascript [ndarray](https://github.com/scijs/ndarray) package, you +can do: ```js let proxy = pyodide.globals.get("some_numpy_ndarray"); let buffer = proxy.getBuffer(); diff --git a/packages/numpy/test_numpy.py b/packages/numpy/test_numpy.py index 7a2f913f4..7ac568f1a 100644 --- a/packages/numpy/test_numpy.py +++ b/packages/numpy/test_numpy.py @@ -40,8 +40,23 @@ def test_typed_arrays(selenium): ) -def test_python2js_numpy_dtype(selenium_standalone): - selenium = selenium_standalone +@pytest.mark.parametrize("order", ("C", "F")) +@pytest.mark.parametrize( + "dtype", + ( + "int8", + "uint8", + "int16", + "uint16", + "int32", + "uint32", + "int64", + "uint64", + "float32", + "float64", + ), +) +def test_python2js_numpy_dtype(selenium, order, dtype): selenium.load_package("numpy") selenium.run("import numpy as np") @@ -56,59 +71,37 @@ def test_python2js_numpy_dtype(selenium_standalone): for k in range(2): assert ( selenium.run_js( - f"return pyodide.globals.get('x').toJs()[{i}][{j}][{k}]" + f"return Number(pyodide.globals.get('x').toJs()[{i}][{j}][{k}])" ) == expected_result[i][j][k] ) - for order in ("C", "F"): - for dtype in ( - "int8", - "uint8", - "int16", - "uint16", - "int32", - "uint32", - "int64", - "uint64", - "float32", - "float64", - ): - selenium.run( - f""" - x = np.arange(8, dtype=np.{dtype}) - x = x.reshape((2, 2, 2)) - x = x.copy({order!r}) - """ - ) - assert_equal() - classname = selenium.run_js( - "return pyodide.globals.get('x').toJs()[0][0].constructor.name" - ) - if order == "C" and dtype not in ("uint64", "int64"): - # Here we expect a TypedArray subclass, such as Uint8Array, but - # not a plain-old Array - assert classname.endswith("Array") - assert classname != "Array" - else: - assert classname == "Array" - selenium.run( - """ - x = x.byteswap().newbyteorder() - """ - ) - assert_equal() - classname = selenium.run_js( - 
"return pyodide.globals.get('x').toJs()[0][0].constructor.name" - ) - if order == "C" and dtype in ("int8", "uint8"): - # Here we expect a TypedArray subclass, such as Uint8Array, but - # not a plain-old Array -- but only for single byte types where - # endianness doesn't matter - assert classname.endswith("Array") - assert classname != "Array" - else: - assert classname == "Array" + selenium.run( + f""" + x = np.arange(8, dtype=np.{dtype}) + x = x.reshape((2, 2, 2)) + x = x.copy({order!r}) + """ + ) + assert_equal() + classname = selenium.run_js( + "return pyodide.globals.get('x').toJs()[0][0].constructor.name" + ) + # We expect a TypedArray subclass, such as Uint8Array, but not a plain-old + # Array + assert classname.endswith("Array") + assert classname != "Array" + selenium.run( + """ + x = x.byteswap().newbyteorder() + """ + ) + assert_equal() + classname = selenium.run_js( + "return pyodide.globals.get('x').toJs()[0][0].constructor.name" + ) + assert classname.endswith("Array") + assert classname != "Array" assert selenium.run("np.array([True, False])") == [True, False] @@ -126,13 +119,9 @@ def test_py2js_buffer_clear_error_flag(selenium): ) -def test_python2js_numpy_scalar(selenium_standalone): - selenium = selenium_standalone - - selenium.load_package("numpy") - selenium.run("import numpy as np") - - for dtype in ( +@pytest.mark.parametrize( + "dtype", + ( "int8", "uint8", "int16", @@ -143,33 +132,38 @@ def test_python2js_numpy_scalar(selenium_standalone): "uint64", "float32", "float64", - ): - selenium.run( - f""" - x = np.{dtype}(1) + ), +) +def test_python2js_numpy_scalar(selenium, dtype): + + selenium.load_package("numpy") + selenium.run("import numpy as np") + selenium.run( + f""" + x = np.{dtype}(1) + """ + ) + assert ( + selenium.run_js( """ + return pyodide.globals.get('x') == 1 + """ ) - assert ( - selenium.run_js( - """ - return pyodide.globals.get('x') == 1 + is True + ) + selenium.run( + """ + x = x.byteswap().newbyteorder() + """ + ) + assert ( 
+ selenium.run_js( """ - ) - is True - ) - selenium.run( - """ - x = x.byteswap().newbyteorder() - """ - ) - assert ( - selenium.run_js( - """ - return pyodide.globals.get('x') == 1 - """ - ) - is True + return pyodide.globals.get('x') == 1 + """ ) + is True + ) def test_runpythonasync_numpy(selenium_standalone): diff --git a/src/core/hiwire.c b/src/core/hiwire.c index 09714b4ee..26ce47563 100644 --- a/src/core/hiwire.c +++ b/src/core/hiwire.c @@ -149,115 +149,6 @@ EM_JS_NUM(int, hiwire_init, (), { } else { Module.BigInt = Number; } - - /** - * Determine type and endianness of data from format. This is a helper - * function for converting buffers from Python to Javascript, used in - * PyProxyBufferMethods and in `toJs` on a buffer. - * - * To understand this function it will be helpful to look at the tables here: - * https://docs.python.org/3/library/struct.html#format-strings - * - * @arg format {String} A Python format string (caller must convert it to a - * Javascript string). - * @arg errorMessage {String} Extra stuff to append to an error message if - * thrown. Should be a complete sentence. - * @returns A pair, an appropriate TypedArray constructor and a boolean which - * is true if the format suggests a big endian array. 
- * @private - */ - Module.processBufferFormatString = function(formatStr, errorMessage = "") - { - if (formatStr.length > 2) { - throw new Error( - "Expected format string to have length <= 2, " + - `got '${formatStr}'.` + errorMessage); - } - let formatChar = formatStr.slice(-1); - let alignChar = formatStr.slice(0, -1); - let bigEndian; - switch (alignChar) { - case "!": - case ">": - bigEndian = true; - break; - case "<": - case "@": - case "=": - case "": - bigEndian = false; - break; - default: - throw new Error(`Unrecognized alignment character ${ alignChar }.` + - errorMessage); - } - let arrayType; - switch (formatChar) { - case 'b': - arrayType = Int8Array; - break; - case 's': - case 'p': - case 'c': - case 'B': - case '?': - arrayType = Uint8Array; - break; - case 'h': - arrayType = Int16Array; - break; - case 'H': - arrayType = Uint16Array; - break; - case 'i': - case 'l': - case 'n': - arrayType = Int32Array; - break; - case 'I': - case 'L': - case 'N': - case 'P': - arrayType = Uint32Array; - break; - case 'q': - // clang-format off - if (globalThis.BigInt64Array === undefined) { - // clang-format on - throw new Error("BigInt64Array is not supported on this browser." + - errorMessage); - } - arrayType = BigInt64Array; - break; - case 'Q': - // clang-format off - if (globalThis.BigUint64Array === undefined) { - // clang-format on - throw new Error("BigUint64Array is not supported on this browser." + - errorMessage); - } - arrayType = BigUint64Array; - break; - case 'f': - arrayType = Float32Array; - break; - case 'd': - arrayType = Float64Array; - break; - case "e": - // clang-format off - throw new Error( - "Javascript has no Float16 support. Consider converting the data to " + - "float32 before using it from JavaScript. 
If you are using a webgl " + - "float16 texture then just use `getBuffer('u8')`."); - // clang-format on - default: - throw new Error(`Unrecognized format character '${formatChar}'.` + - errorMessage); - } - return [ arrayType, bigEndian ]; - }; - return 0; }); @@ -335,51 +226,6 @@ EM_JS_REF(JsRef, hiwire_string_ascii, (const char* ptr), { return Module.hiwire.new_value(AsciiToString(ptr)); }); -EM_JS_REF(JsRef, hiwire_bytes, (char* ptr, int len), { - let bytes = new Uint8ClampedArray(Module.HEAPU8.buffer, ptr, len); - return Module.hiwire.new_value(bytes); -}); - -EM_JS_REF(JsRef, hiwire_int8array, (i8 * ptr, int len), { - let array = new Int8Array(Module.HEAPU8.buffer, ptr, len); - return Module.hiwire.new_value(array); -}) - -EM_JS_REF(JsRef, hiwire_uint8array, (u8 * ptr, int len), { - let array = new Uint8Array(Module.HEAPU8.buffer, ptr, len); - return Module.hiwire.new_value(array); -}) - -EM_JS_REF(JsRef, hiwire_int16array, (i16 * ptr, int len), { - let array = new Int16Array(Module.HEAPU8.buffer, ptr, len); - return Module.hiwire.new_value(array); -}) - -EM_JS_REF(JsRef, hiwire_uint16array, (u16 * ptr, int len), { - let array = new Uint16Array(Module.HEAPU8.buffer, ptr, len); - return Module.hiwire.new_value(array); -}) - -EM_JS_REF(JsRef, hiwire_int32array, (i32 * ptr, int len), { - let array = new Int32Array(Module.HEAPU8.buffer, ptr, len); - return Module.hiwire.new_value(array); -}) - -EM_JS_REF(JsRef, hiwire_uint32array, (u32 * ptr, int len), { - let array = new Uint32Array(Module.HEAPU8.buffer, ptr, len); - return Module.hiwire.new_value(array); -}) - -EM_JS_REF(JsRef, hiwire_float32array, (f32 * ptr, int len), { - let array = new Float32Array(Module.HEAPU8.buffer, ptr, len); - return Module.hiwire.new_value(array); -}) - -EM_JS_REF(JsRef, hiwire_float64array, (f64 * ptr, int len), { - let array = new Float64Array(Module.HEAPU8.buffer, ptr, len); - return Module.hiwire.new_value(array); -}) - EM_JS(void _Py_NO_RETURN, hiwire_throw_error, (JsRef iderr), 
{ throw Module.hiwire.pop_value(iderr); }); diff --git a/src/core/hiwire.h b/src/core/hiwire.h index 0b03c1984..49a8730f7 100644 --- a/src/core/hiwire.h +++ b/src/core/hiwire.h @@ -143,105 +143,6 @@ hiwire_string_utf8(const char* ptr); JsRef hiwire_string_ascii(const char* ptr); -/** - * Create a new Javascript Uint8ClampedArray, given a pointer to a buffer and a - * length, in bytes. - * - * The array's data is not copied. - * - * Returns: New reference - */ -JsRef -hiwire_bytes(char* ptr, int len); - -/** - * Create a new Javascript Int8Array, given a pointer to a buffer and a - * length, in bytes. - * - * The array's data is not copied. - * - * Returns: New reference - */ -JsRef -hiwire_int8array(i8* ptr, int len); - -/** - * Create a new Javascript Uint8Array, given a pointer to a buffer and a - * length, in bytes. - * - * The array's data is not copied. - * - * Returns: New reference - */ -JsRef -hiwire_uint8array(u8* ptr, int len); - -/** - * Create a new Javascript Int16Array, given a pointer to a buffer and a - * length, in bytes. - * - * The array's data is not copied. - * - * Returns: New reference - */ -JsRef -hiwire_int16array(i16* ptr, int len); - -/** - * Create a new Javascript Uint16Array, given a pointer to a buffer and a - * length, in bytes. - * - * The array's data is not copied. - * - * Returns: New reference - */ -JsRef -hiwire_uint16array(u16* ptr, int len); - -/** - * Create a new Javascript Int32Array, given a pointer to a buffer and a - * length, in bytes. - * - * The array's data is not copied. - * - * Returns: New reference - */ -JsRef -hiwire_int32array(i32* ptr, int len); - -/** - * Create a new Javascript Uint32Array, given a pointer to a buffer and a - * length, in bytes. - * - * The array's data is not copied. - * - * Returns: New reference - */ -JsRef -hiwire_uint32array(u32* ptr, int len); - -/** - * Create a new Javascript Float32Array, given a pointer to a buffer and a - * length, in bytes. 
- * - * The array's data is not copied. - * - * Returns: New reference - */ -JsRef -hiwire_float32array(f32* ptr, int len); - -/** - * Create a new Javascript Float64Array, given a pointer to a buffer and a - * length, in bytes. - * - * The array's data is not copied. - * - * Returns: New reference - */ -JsRef -hiwire_float64array(f64* ptr, int len); - /** * Create a new Javascript boolean value. * Return value is true if boolean != 0, false if boolean == 0. diff --git a/src/core/include_js_file.h b/src/core/include_js_file.h new file mode 100644 index 000000000..2e6317c2c --- /dev/null +++ b/src/core/include_js_file.h @@ -0,0 +1,19 @@ +// The point is to make a file that works with Javascript analysis tools like +// JsDoc and LGTM. They want to parse the file as Javascript, so it's key that +// included js files parse as valid Javascript. `JS_FILE` is a specially +// designed macro that allows this. The macro invocation has to look like a +// function call to Javascript parsers. The easiest way to get it to parse is to +// make the macro argument look like a Javascript anonymous function, which we +// do with `()=>{`. However, `()=>{` is not valid C, so the macro needs to +// remove it. We write `()=>{0,0;`; JS_FILE discards the `()=>{0` argument and +// replaces it with a single open brace. +// + +#define UNPAIRED_OPEN_BRACE { +#define UNPAIRED_CLOSE_BRACE } // Just here to help text editors pair braces up +#define JS_FILE(func_name, a, args...) \ + EM_JS_NUM(int, func_name, (), UNPAIRED_OPEN_BRACE { args return 0; }) + +// A macro to allow us to add code that is only intended to influence JsDoc +// output, but shouldn't end up in generated code.
+#define FOR_JSDOC_ONLY(x) diff --git a/src/core/jsproxy.c b/src/core/jsproxy.c index be5dba855..93c1fb584 100644 --- a/src/core/jsproxy.c +++ b/src/core/jsproxy.c @@ -145,7 +145,7 @@ JsProxy_GetAttr(PyObject* self, PyObject* attr) if (strcmp(key, "keys") == 0 && hiwire_is_array(JsProxy_REF(self))) { // Sometimes Python APIs test for the existence of a "keys" function // to decide whether something should be treated like a dict. - // This mixes badly with the javascript Array.keys api, so pretend that it + // This mixes badly with the javascript Array.keys API, so pretend that it // doesn't exist. (Array.keys isn't very useful anyways so hopefully this // won't confuse too many people...) PyErr_SetString(PyExc_AttributeError, key); diff --git a/src/core/main.c b/src/core/main.c index d3e235bbc..32c3e848a 100644 --- a/src/core/main.c +++ b/src/core/main.c @@ -12,6 +12,7 @@ #include "keyboard_interrupt.h" #include "pyproxy.h" #include "python2js.h" +#include "python2js_buffer.h" #define FATAL_ERROR(args...) \ do { \ @@ -124,6 +125,7 @@ main(int argc, char** argv) TRY_INIT(hiwire); TRY_INIT(docstring); TRY_INIT(js2python); + TRY_INIT(python2js_buffer); TRY_INIT_WITH_CORE_MODULE(JsProxy); TRY_INIT_WITH_CORE_MODULE(pyproxy); TRY_INIT(keyboard_interrupt); diff --git a/src/core/pyproxy.c b/src/core/pyproxy.c index af86d869f..3404fc792 100644 --- a/src/core/pyproxy.c +++ b/src/core/pyproxy.c @@ -949,23 +949,9 @@ static PyMethodDef pyproxy_methods[] = { { NULL } /* Sentinel */ }; -// Some special helper macros to hack it so that "pyproxy.js" parses as a -// javascript file for JsDoc. See comment with explanation there. -#define UNPAIRED_OPEN_BRACE { -#define UNPAIRED_CLOSE_BRACE } // Just here to help text editors pair braces up -#define TEMP_EMJS_HELPER(a, args...) \ - EM_JS_NUM(int, pyproxy_init_js, (), UNPAIRED_OPEN_BRACE { args return 0; }) - -// A macro to allow us to add code that is only intended to influence JsDoc -// output, but shouldn't end up in generated code. 
-#define FOR_JSDOC_ONLY(x) - +#include "include_js_file.h" #include "pyproxy.js" -#undef TEMP_EMJS_HELPER -#undef UNPAIRED_OPEN_BRACE -#undef UNPAIRED_CLOSE_BRACE - int pyproxy_init(PyObject* core) { diff --git a/src/core/pyproxy.js b/src/core/pyproxy.js index a8555e387..1070d2c24 100644 --- a/src/core/pyproxy.js +++ b/src/core/pyproxy.js @@ -1,21 +1,7 @@ // This file to be included from pyproxy.c -// -// The point is to make a file that works with JsDoc. JsDoc will give up if it -// fails to parse the file as javascript. Thus, it's key that this file should -// parse as valid javascript. `TEMP_EMJS_HELPER` is a specially designed macro -// to allow us to do this. We need TEMP_EMJS_HELPER to parse like a javascript -// function call. The easiest way to get it to parse is to make the "argument" -// look like a function call, which we do with `()=>{`. However, `()=>{` is an -// invalid C string so the macro needs to remove it. We put `()=>{0,`, -// TEMP_EMJS_HELPER removes everything up to the comma and replace it with a -// single open brace. -// -// See definition of TEMP_EMJS_HELPER: -// #define TEMP_EMJS_HELPER(a, args...) \ -// EM_JS(int, pyproxy_init, (), UNPAIRED_OPEN_BRACE { args return 0; }) - +// This uses the JS_FILE macro defined in include_js_file.h // clang-format off -TEMP_EMJS_HELPER(() => {0, /* Magic, see comment */ +JS_FILE(pyproxy_init_js, () => {0,0; /* Magic, see include_js_file.h */ Module.PyProxies = {}; // clang-format on diff --git a/src/core/python2js.c b/src/core/python2js.c index c9136ebc8..d1a71fd17 100644 --- a/src/core/python2js.c +++ b/src/core/python2js.c @@ -82,19 +82,6 @@ _python2js_unicode(PyObject* x) } } -// TODO: Should we use this in explicit conversions? 
-static JsRef -_python2js_bytes(PyObject* x) -{ - char* x_buff; - Py_ssize_t length; - if (PyBytes_AsStringAndSize(x, &x_buff, &length)) { - return NULL; - } - return (JsRef)EM_ASM_INT( - { return Module.hiwire.new_value(HEAP8.slice($0, $0 + $1)) }, x, length); -} - /////////////////////////////////////////////////////////////////////////////// // // Container Converters diff --git a/src/core/python2js_buffer.c b/src/core/python2js_buffer.c index 5ef41bbac..4941e36e6 100644 --- a/src/core/python2js_buffer.c +++ b/src/core/python2js_buffer.c @@ -1,6 +1,7 @@ #define PY_SSIZE_T_CLEAN #include "Python.h" +#include "error_handling.h" #include "python2js_buffer.h" #include "types.h" @@ -11,487 +12,67 @@ // This file handles the conversion of Python buffer objects (which loosely // represent Numpy arrays) to Javascript. +// Converts the buffer to nested Javascript arrays whose innermost level is a +// TypedArray, an Array of booleans ('?') or a string ('s') (python2js_buffer_recursive) -// There are two methods here: - -// 1. Converts everything to nested Javascript arrays, where the scalars are -// standard Javascript numbers (python2js_buffer_recursive) - -// 2. Converts everything to nested arrays, where the last contiguous -// dimension is a subarray of a TypedArray that points to the original bytes -// on the WebAssembly (Python) side. This is much faster since it doesn't -// require copying the data, and the data is shared. In the case of a -// one-dimensional array, the result is simply a TypedArray. Unfortunately, -// this requires that the source array is C-contiguous and in native (little) -// endian order. (python2js_shareable_buffer_recursive) - -// Unfortunately, this also means that there are different semantics: sometimes -// the array is a copy, and other times it is a shared reference. 
One should -// write code that doesn't rely on either behavior, but treats this simply as
- -typedef JsRef(scalar_converter)(char*); - -static JsRef -_convert_bool(char* data) -{ - char v = *((char*)data); - return hiwire_bool((int)v); -} - -static JsRef -_convert_int8(char* data) -{ - i8 v = *((i8*)data); - return hiwire_int(v); -} - -static JsRef -_convert_uint8(char* data) -{ - u8 v = *((u8*)data); - return hiwire_int(v); -} - -static JsRef -_convert_int16(char* data) -{ - i16 v = *((i16*)data); - return hiwire_int(v); -} - -static JsRef -_convert_int16_swap(char* data) -{ - i16 v = *((i16*)data); - return hiwire_int(be16toh(v)); -} - -static JsRef -_convert_uint16(char* data) -{ - u16 v = *((u16*)data); - return hiwire_int(v); -} - -static JsRef -_convert_uint16_swap(char* data) -{ - u16 v = *((u16*)data); - return hiwire_int(be16toh(v)); -} - -static JsRef -_convert_int32(char* data) -{ - i32 v = *((i32*)data); - return hiwire_int(v); -} - -static JsRef -_convert_int32_swap(char* data) -{ - i32 v = *((i32*)data); - return hiwire_int(be32toh(v)); -} - -static JsRef -_convert_uint32(char* data) -{ - u32 v = *((u32*)data); - return hiwire_int(v); -} - -static JsRef -_convert_uint32_swap(char* data) -{ - u32 v = *((u32*)data); - return hiwire_int(be32toh(v)); -} - -static JsRef -_convert_int64(char* data) -{ - i64 v = *((i64*)data); - return hiwire_int(v); -} - -static JsRef -_convert_int64_swap(char* data) -{ - i64 v = *((i64*)data); - return hiwire_int(be64toh(v)); -} - -static JsRef -_convert_uint64(char* data) -{ - u64 v = *((u64*)data); - return hiwire_int(v); -} - -static JsRef -_convert_uint64_swap(char* data) -{ - u64 v = *((u64*)data); - return hiwire_int(be64toh(v)); -} - -static JsRef -_convert_float32(char* data) -{ - float v = *((float*)data); - return hiwire_double(v); -} - -static JsRef -_convert_float32_swap(char* data) -{ - union float32_t - { - u32 i; - float f; - } v; - - v.f = *((float*)data); - v.i = be32toh(v.i); - return hiwire_double(v.f); -} - -static JsRef -_convert_float64(char* data) -{ - double v = *((double*)data); - 
return hiwire_double(v); -} - -static JsRef -_convert_float64_swap(char* data) -{ - union float64_t - { - u64 i; - double f; - } v; - - v.f = *((double*)data); - v.i = be64toh(v.i); - return hiwire_double(v.f); -} - -static scalar_converter* -_python2js_buffer_get_converter(Py_buffer* buff) -{ - // Uses Python's struct typecodes as defined here: - // https://docs.python.org/3.8/library/array.html - - char format; - char swap; - if (buff->format == NULL) { - swap = 0; - format = 'B'; - } else { - switch (buff->format[0]) { - case '>': - case '!': - swap = 1; - format = buff->format[1]; - break; - case '=': - case '<': - case '@': - swap = 0; - format = buff->format[1]; - break; - default: - swap = 0; - format = buff->format[0]; - } - } - - switch (format) { - case 'c': - case 'b': - return _convert_int8; - case 'B': - return _convert_uint8; - case '?': - return _convert_bool; - case 'h': - if (swap) { - return _convert_int16_swap; - } else { - return _convert_int16; - } - case 'H': - if (swap) { - return _convert_uint16_swap; - } else { - return _convert_uint16; - } - case 'i': - case 'l': - case 'n': - if (swap) { - return _convert_int32_swap; - } else { - return _convert_int32; - } - case 'I': - case 'L': - case 'N': - if (swap) { - return _convert_uint32_swap; - } else { - return _convert_uint32; - } - case 'q': - if (swap) { - return _convert_int64_swap; - } else { - return _convert_int64; - } - case 'Q': - if (swap) { - return _convert_uint64_swap; - } else { - return _convert_uint64; - } - case 'f': - if (swap) { - return _convert_float32_swap; - } else { - return _convert_float32; - } - case 'd': - if (swap) { - return _convert_float64_swap; - } else { - return _convert_float64; - } - default: - return NULL; - } -} - -static JsRef -_python2js_buffer_recursive(Py_buffer* buff, - char* ptr, - int dim, - scalar_converter* convert) -{ - // This function is basically a manual conversion of `recursive_tolist` in - // Numpy to use the Python buffer interface and 
output Javascript. - - Py_ssize_t i, n, stride; - JsRef jsarray, jsitem; - - if (dim >= buff->ndim) { - return convert(ptr); - } - - n = buff->shape[dim]; - stride = buff->strides[dim]; - - jsarray = hiwire_array(); - - for (i = 0; i < n; ++i) { - jsitem = _python2js_buffer_recursive(buff, ptr, dim + 1, convert); - if (jsitem == NULL) { - hiwire_decref(jsarray); - return NULL; - } - hiwire_push_array(jsarray, jsitem); - hiwire_decref(jsitem); - - ptr += stride; - } - - return jsarray; -} - -static JsRef -_python2js_buffer_to_typed_array(Py_buffer* buff) -{ - // Uses Python's struct typecodes as defined here: - // https://docs.python.org/3.8/library/array.html - - char format; - if (buff->format == NULL) { - format = 'B'; - } else { - switch (buff->format[0]) { - case '>': - case '!': - // This path can't handle byte-swapping - return NULL; - case '=': - case '<': - case '@': - format = buff->format[1]; - break; - default: - format = buff->format[0]; - } - } - - Py_ssize_t len = buff->len / buff->itemsize; - switch (format) { - case 'c': - case 'b': - return hiwire_int8array((i8*)buff->buf, len); - case 'B': - return hiwire_uint8array((u8*)buff->buf, len); - case '?': - return NULL; - case 'h': - return hiwire_int16array((i16*)buff->buf, len); - case 'H': - return hiwire_uint16array((u16*)buff->buf, len); - case 'i': - case 'l': - case 'n': - return hiwire_int32array((i32*)buff->buf, len); - case 'I': - case 'L': - case 'N': - return hiwire_uint32array((u32*)buff->buf, len); - case 'q': - case 'Q': - return NULL; - case 'f': - return hiwire_float32array((f32*)buff->buf, len); - case 'd': - return hiwire_float64array((f64*)buff->buf, len); - default: - return NULL; - } -} - -enum shareable_enum -{ - NOT_SHAREABLE, - CONTIGUOUS, - NOT_CONTIGUOUS -}; - -static JsRef -_python2js_shareable_buffer_recursive(Py_buffer* buff, - enum shareable_enum shareable, - JsRef idarr, - int ptr, - int dim) -{ - Py_ssize_t i, n, stride; - JsRef jsarray, jsitem; - - switch (shareable) { 
- case NOT_CONTIGUOUS: - if (dim >= buff->ndim) { - // The last dimension isn't contiguous, so we need to output one-by-one - return hiwire_get_member_int(idarr, ptr / buff->itemsize); - } - break; - case CONTIGUOUS: - if (dim == buff->ndim - 1) { - // The last dimension is contiguous, so we can output a whole row at a - // time - return hiwire_subarray( - idarr, ptr / buff->itemsize, ptr / buff->itemsize + buff->shape[dim]); - } - break; - default: - break; - } - - n = buff->shape[dim]; - stride = buff->strides[dim]; - - jsarray = hiwire_array(); - - for (i = 0; i < n; ++i) { - jsitem = _python2js_shareable_buffer_recursive( - buff, shareable, idarr, ptr, dim + 1); - if (jsitem == NULL) { - hiwire_decref(jsarray); - return NULL; - } - hiwire_push_array(jsarray, jsitem); - hiwire_decref(jsitem); - - ptr += stride; - } - - return jsarray; -} - -static enum shareable_enum -_python2js_buffer_is_shareable(Py_buffer* buff) -{ - if (buff->ndim == 0) { - return NOT_SHAREABLE; - } - - char* invalid_codes = ">!qQ?"; - for (char* i = buff->format; *i != 0; ++i) { - for (char* j = invalid_codes; *j != 0; ++j) { - if (*i == *j) { - return NOT_SHAREABLE; - } - } - } - - for (int i = 0; i < buff->ndim; ++i) { - if (buff->strides[i] <= 0) { - return NOT_SHAREABLE; - } - } - - if (buff->itemsize != buff->strides[buff->ndim - 1]) { - return NOT_CONTIGUOUS; - } - - // We can use the most efficient method - return CONTIGUOUS; -} +// clang-format off +/** + * A simple helper function that puts the arguments into a Javascript object + * (for readability) and looks up the conversion function, then calls into + * python2js_buffer_recursive. 
+ */ +EM_JS_REF(JsRef, _python2js_buffer_inner, ( + void* buf, + Py_ssize_t itemsize, + int ndim, + char* format, + Py_ssize_t* shape, + Py_ssize_t* strides, + Py_ssize_t* suboffsets +), { + // get_converter and _python2js_buffer_recursive defined in python2js_buffer.js + let converter = Module.get_converter(format, itemsize); + let result = Module._python2js_buffer_recursive(buf, 0, { + ndim, + format, + itemsize, + shape, + strides, + suboffsets, + converter, + }); + return Module.hiwire.new_value(result); +}); +// clang-format on +/** + * Convert a buffer. To get the data out of the Py_buffer without relying on the + * exact memory layout of Py_buffer, we need to do this in C. After pulling the + * data out we call into the EM_JS helper _python2js_buffer_inner, which sets up + * the base case for the recursion and then calls the main js function + * _python2js_buffer_recursive (defined in python2js_buffer.js). + */ JsRef _python2js_buffer(PyObject* x) { - PyObject* memoryview = PyMemoryView_FromObject(x); - if (memoryview == NULL) { + Py_buffer view; + if (PyObject_GetBuffer(x, &view, PyBUF_FULL_RO) == -1) { return NULL; } - - Py_buffer* buff; - buff = PyMemoryView_GET_BUFFER(memoryview); - - enum shareable_enum shareable = _python2js_buffer_is_shareable(buff); - JsRef result; - - if (shareable != NOT_SHAREABLE) { - JsRef idarr = _python2js_buffer_to_typed_array(buff); - if (idarr == NULL) { - PyErr_SetString( - PyExc_TypeError, - "Internal error: Invalid type to convert to array buffer."); - return NULL; - } - - result = - _python2js_shareable_buffer_recursive(buff, shareable, idarr, 0, 0); - hiwire_decref(idarr); - } else { - scalar_converter* convert = _python2js_buffer_get_converter(buff); - if (convert == NULL) { - Py_DECREF(memoryview); - return NULL; - } - - result = _python2js_buffer_recursive(buff, buff->buf, 0, convert); - } - - Py_DECREF(memoryview); - + // clang-format off + JsRef result = _python2js_buffer_inner( + view.buf, + view.itemsize, + 
+    view.ndim,
+    view.format,
+    view.shape,
+    view.strides,
+    view.suboffsets
+  );
+  // clang-format on
+  PyBuffer_Release(&view);
   return result;
 }
+
+#include "include_js_file.h"
+#include "python2js_buffer.js"
diff --git a/src/core/python2js_buffer.h b/src/core/python2js_buffer.h
index cd1fc1386..2c51cb9f5 100644
--- a/src/core/python2js_buffer.h
+++ b/src/core/python2js_buffer.h
@@ -18,4 +18,7 @@ JsRef _python2js_buffer(PyObject* x);
+errcode
+python2js_buffer_init();
+
 #endif /* PYTHON2JS_BUFFER_H */
diff --git a/src/core/python2js_buffer.js b/src/core/python2js_buffer.js
new file mode 100644
index 000000000..38d1ae541
--- /dev/null
+++ b/src/core/python2js_buffer.js
@@ -0,0 +1,281 @@
+JS_FILE(python2js_buffer_init, () => {
+  0, 0; /* Magic, see include_js_file.h */
+
+  /**
+   * Determine type and endianness of data from format. This is a helper
+   * function for converting buffers from Python to Javascript, used in
+   * PyProxyBufferMethods and in `toJs` on a buffer.
+   *
+   * To understand this function it will be helpful to look at the tables here:
+   * https://docs.python.org/3/library/struct.html#format-strings
+   *
+   * @arg formatStr {String} A Python format string (caller must convert it to a
+   * Javascript string).
+   * @arg errorMessage {String} Extra stuff to append to an error message if
+   * thrown. Should be a complete sentence.
+   * @returns A pair, an appropriate TypedArray constructor and a boolean which
+   * is true if the format suggests a big endian array.
+   * @private
+   */
+  Module.processBufferFormatString = function(formatStr, errorMessage = "") {
+    if (formatStr.length > 2) {
+      throw new Error("Expected format string to have length <= 2, " +
+                      `got '${formatStr}'.` + errorMessage);
+    }
+    let formatChar = formatStr.slice(-1);
+    let alignChar = formatStr.slice(0, -1);
+    let bigEndian;
+    switch (alignChar) {
+      case "!":
+      case ">":
+        bigEndian = true;
+        break;
+      case "<":
+      case "@":
+      case "=":
+      case "":
+        bigEndian = false;
+        break;
+      default:
+        throw new Error(`Unrecognized alignment character ${alignChar}.` +
+                        errorMessage);
+    }
+    let arrayType;
+    switch (formatChar) {
+      case 'b':
+        arrayType = Int8Array;
+        break;
+      case 's':
+      case 'p':
+      case 'c':
+      case 'B':
+      case '?':
+        arrayType = Uint8Array;
+        break;
+      case 'h':
+        arrayType = Int16Array;
+        break;
+      case 'H':
+        arrayType = Uint16Array;
+        break;
+      case 'i':
+      case 'l':
+      case 'n':
+        arrayType = Int32Array;
+        break;
+      case 'I':
+      case 'L':
+      case 'N':
+      case 'P':
+        arrayType = Uint32Array;
+        break;
+      case 'q':
+        // clang-format off
+        if (globalThis.BigInt64Array === undefined) {
+          // clang-format on
+          throw new Error("BigInt64Array is not supported on this browser." +
+                          errorMessage);
+        }
+        arrayType = BigInt64Array;
+        break;
+      case 'Q':
+        // clang-format off
+        if (globalThis.BigUint64Array === undefined) {
+          // clang-format on
+          throw new Error("BigUint64Array is not supported on this browser." +
+                          errorMessage);
+        }
+        arrayType = BigUint64Array;
+        break;
+      case 'f':
+        arrayType = Float32Array;
+        break;
+      case 'd':
+        arrayType = Float64Array;
+        break;
+      case "e":
+        throw new Error("Javascript has no Float16 support.");
+      default:
+        throw new Error(`Unrecognized format character '${formatChar}'.` +
+                        errorMessage);
+    }
+    return [ arrayType, bigEndian ];
+  };
+
+  /**
+   * Convert a 1-dimensional contiguous buffer to Javascript.
+   *
+   * In this case we can just slice the memory out of the wasm HEAP.
+   * @param {number} ptr A pointer to the start of the buffer in wasm memory
+   * @param {number} stride The size of the entries in bytes
+   * @param {number} n The number of entries
+   * @returns A new ArrayBuffer with the appropriate data in it (not a view of
+   * the WASM heap)
+   * @private
+   */
+  Module.python2js_buffer_1d_contiguous = function(ptr, stride, n) {
+    "use strict";
+    let byteLength = stride * n;
+    // Note: slice here is a copy (as opposed to subarray which is not)
+    return HEAP8.slice(ptr, ptr + byteLength).buffer;
+  };
+
+  /**
+   * Convert a 1d noncontiguous buffer to Javascript.
+   *
+   * Since the buffer is not contiguous we have to copy it in chunks.
+   * @param {number} ptr The WASM memory pointer to the start of the buffer.
+   * @param {number} stride The stride in bytes between each entry.
+   * @param {number} suboffset The suboffset from the Python Buffer protocol.
+   * Negative if no suboffsets. (see
+   * https://docs.python.org/3/c-api/buffer.html#c.Py_buffer.suboffsets)
+   * @param {number} n The number of entries.
+   * @param {number} itemsize The size in bytes of each entry.
+   * @returns A new ArrayBuffer with the appropriate data in it (not a view of
+   * the WASM heap)
+   * @private
+   */
+  Module.python2js_buffer_1d_noncontiguous = function(ptr, stride, suboffset, n,
+                                                      itemsize) {
+    "use strict";
+    let byteLength = itemsize * n;
+    // Make new memory of the appropriate size
+    let buffer = new Uint8Array(byteLength);
+    for (let i = 0; i < n; ++i) {
+      let curptr = ptr + i * stride;
+      if (suboffset >= 0) {
+        curptr = HEAP32[curptr / 4] + suboffset;
+      }
+      buffer.set(HEAP8.subarray(curptr, curptr + itemsize), i * itemsize);
+    }
+    return buffer.buffer;
+  };
+
+  /**
+   * Convert an ndarray to a nested Javascript array, the main function.
+   *
+   * This is called by _python2js_buffer_inner (defined in python2js_buffer.c).
+   * There are two layers of setup that need to be done to get the base case of
+   * the recursion right.
+   *
+   * The last dimension of the array is handled by the appropriate 1d array
+   * converter: python2js_buffer_1d_contiguous or
+   * python2js_buffer_1d_noncontiguous.
+   *
+   * @param {number} ptr The pointer into the buffer
+   * @param {number} curdim What dimension are we currently working on? 0 <=
+   * curdim < ndim.
+   * @param {Object} bufferData All of the data out of the Py_buffer, plus the
+   * converter function: ndim, format, itemsize, shape (a ptr), strides (a ptr),
+   * suboffsets (a ptr), converter.
+   * @returns A nested Javascript array, the result of the conversion.
+   * @private
+   */
+  Module._python2js_buffer_recursive = function(ptr, curdim, bufferData) {
+    "use strict";
+    // When indexing HEAP32 we need to divide the pointer by 4
+    let n = HEAP32[bufferData.shape / 4 + curdim];
+    let stride = HEAP32[bufferData.strides / 4 + curdim];
+    let suboffset = -1;
+    // clang-format off
+    if (bufferData.suboffsets !== 0) {
+      suboffset = HEAP32[bufferData.suboffsets / 4 + curdim];
+    }
+    if (curdim === bufferData.ndim - 1) {
+      // Last dimension, use appropriate 1d converter
+      let arraybuffer;
+      if (stride === bufferData.itemsize && suboffset < 0) {
+        arraybuffer = Module.python2js_buffer_1d_contiguous(ptr, stride, n);
+      } else {
+        arraybuffer = Module.python2js_buffer_1d_noncontiguous(
+          ptr, stride, suboffset, n, bufferData.itemsize);
+      }
+      return bufferData.converter(arraybuffer);
+    }
+    // clang-format on
+
+    let result = [];
+    for (let i = 0; i < n; ++i) {
+      // See:
+      // https://docs.python.org/3/c-api/buffer.html#pil-style-shape-strides-and-suboffsets
+      let curPtr = ptr + i * stride;
+      if (suboffset >= 0) {
+        curPtr = HEAP32[curPtr / 4] + suboffset;
+      }
+      result.push(
+        Module._python2js_buffer_recursive(curPtr, curdim + 1, bufferData));
+    }
+    return result;
+  };
+
+  /**
+   * Get the appropriate converter function.
+   *
+   * The converter function takes an ArrayBuffer and returns an appropriate
+   * TypedArray. If the buffer is big endian, the converter will convert the
+   * data to little endian.
+   *
+   * The converter function does something special if the format character is
+   * "?" or "s". If it's "?" we return an array of booleans, if it's "s" we
+   * return a string.
+   *
+   * @param {number} format Pointer to the format string of the buffer.
+   * @param {number} itemsize Should be one of 1, 2, 4, 8. Used for big endian
+   * conversion.
+   * @returns A converter function ArrayBuffer => TypedArray
+   */
+  Module.get_converter = function(format, itemsize) {
+    "use strict";
+    let formatStr = UTF8ToString(format);
+    let [ArrayType, bigEndian] = Module.processBufferFormatString(formatStr);
+    let formatChar = formatStr.slice(-1);
+    // clang-format off
+    switch (formatChar) {
+      case "s":
+        let decoder = new TextDecoder("utf8");
+        return (buff) => decoder.decode(buff);
+      case "?":
+        return (buff) => Array.from(new Uint8Array(buff), x => !!x);
+    }
+    // clang-format on
+
+    if (!bigEndian || itemsize === 1) {
+      // clang-format off
+      return buff => new ArrayType(buff);
+      // clang-format on
+    }
+    let getFuncName;
+    let setFuncName;
+    switch (itemsize) {
+      case 2:
+        getFuncName = "getUint16";
+        setFuncName = "setUint16";
+        break;
+      case 4:
+        getFuncName = "getUint32";
+        setFuncName = "setUint32";
+        break;
+      case 8:
+        getFuncName = "getFloat64";
+        setFuncName = "setFloat64";
+        break;
+      default:
+        // clang-format off
+        throw new Error(`Unexpected size ${ itemsize }`);
+        // clang-format on
+    }
+    function swapFunc(buff) {
+      let dataview = new DataView(buff);
+      let getFunc = dataview[getFuncName].bind(dataview);
+      let setFunc = dataview[setFuncName].bind(dataview);
+      for (let byte = 0; byte < dataview.byteLength; byte += itemsize) {
+        // Get value as little endian, set back as big endian.
+        setFunc(byte, getFunc(byte, true), false);
+      }
+      return buff;
+    }
+    // clang-format off
+    return buff => new ArrayType(swapFunc(buff));
+    // clang-format on
+  };
+});
diff --git a/src/pyodide-py/_pyodide/_core.py b/src/pyodide-py/_pyodide/_core.py
index 32d5a25b4..60562ebed 100644
--- a/src/pyodide-py/_pyodide/_core.py
+++ b/src/pyodide-py/_pyodide/_core.py
@@ -48,7 +48,7 @@ try:
             pass
 
         def then(self, onfulfilled: Callable, onrejected: Callable) -> "Promise":
-            """The ``Promise.then`` api, wrapped to manage the lifetimes of the
+            """The ``Promise.then`` API, wrapped to manage the lifetimes of the
             handlers.
 
             Only available if the wrapped Javascript object has a "then" method.
@@ -57,7 +57,7 @@ try:
             """
 
         def catch(self, onrejected: Callable) -> "Promise":
-            """The ``Promise.catch`` api, wrapped to manage the lifetimes of the
+            """The ``Promise.catch`` API, wrapped to manage the lifetimes of the
             handler.
 
             Only available if the wrapped Javascript object has a "then" method.
@@ -66,7 +66,7 @@ try:
             """
 
         def finally_(self, onfinally: Callable) -> "Promise":
-            """The ``Promise.finally`` api, wrapped to manage the lifetimes of
+            """The ``Promise.finally`` API, wrapped to manage the lifetimes of
             the handler.
 
             Only available if the wrapped Javascript object has a "then" method.
diff --git a/src/pyodide.js b/src/pyodide.js
index c43284d15..3d6e14013 100644
--- a/src/pyodide.js
+++ b/src/pyodide.js
@@ -667,7 +667,7 @@ globalThis.loadPyodide = async function(config = {}) {
    * ``name``. This module can then be imported from Python using the standard
    * Python import system. If another module by the same name has already been
    * imported, this won't have much effect unless you also delete the imported
-   * module from ``sys.modules``. This calls the ``pyodide_py`` api
+   * module from ``sys.modules``. This calls the ``pyodide_py`` API
    * :func:`pyodide.register_js_module`.
    *
    * @param {string} name Name of the Javascript module to add
@@ -683,7 +683,7 @@ globalThis.loadPyodide = async function(config = {}) {
    * :func:`pyodide.register_js_module`. If a Javascript module with that name
    * does not already exist, will throw an error. Note that if the module has
    * already been imported, this won't have much effect unless you also delete
-   * the imported module from ``sys.modules``. This calls the ``pyodide_py`` API
+   * the imported module from ``sys.modules``. This calls the ``pyodide_py`` API
    * :func:`pyodide.unregister_js_module`.
    *
    * @param {string} name Name of the Javascript module to remove
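The loop in `python2js_buffer_1d_noncontiguous` in `python2js_buffer.js` above is a strided gather: item `i` starts `i * stride` bytes after `ptr`, and each item's `itemsize` bytes are packed into fresh memory. The following is a minimal standalone sketch of that idea, not Pyodide code. `copyStrided1d` is a hypothetical name, a plain `Uint8Array` called `heap` stands in for Emscripten's `HEAP8`, suboffsets are left out, and a little-endian host is assumed when reading the result back as `Int32Array`.

```javascript
// Sketch (not Pyodide code): gather n items of `itemsize` bytes, spaced
// `stride` bytes apart starting at byte offset `ptr`, into a new packed
// ArrayBuffer. `heap` stands in for Emscripten's HEAP8; no suboffsets.
function copyStrided1d(heap, ptr, stride, n, itemsize) {
  const out = new Uint8Array(n * itemsize);
  for (let i = 0; i < n; ++i) {
    const src = ptr + i * stride; // byte offset of item i (stride may be negative)
    // subarray makes a view without copying; set copies those bytes into `out`.
    out.set(heap.subarray(src, src + itemsize), i * itemsize);
  }
  return out.buffer;
}

// Every other int32 of [0, 1, 2, 3, 4, 5], like NumPy's a[::2]:
// the stride is 8 bytes (two items) and the itemsize is 4 bytes.
const heap = new Uint8Array(new Int32Array([0, 1, 2, 3, 4, 5]).buffer);
console.log(Array.from(new Int32Array(copyStrided1d(heap, 0, 8, 3, 4)))); // [ 0, 2, 4 ]
```

When `stride === itemsize` the items are adjacent and the loop amounts to one contiguous copy, which is the case the diff routes to `HEAP8.slice` in `python2js_buffer_1d_contiguous`.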
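`Module._python2js_buffer_recursive` in the diff handles one dimension per call and only copies data at the last dimension. The sketch below mirrors that structure under simplifying assumptions. The hypothetical `toNested` takes `shape` and `strides` (in bytes) as ordinary arrays instead of pointers into `HEAP32`. It ignores suboffsets, requires `ndim >= 1`, and always returns a TypedArray at the innermost level, leaving out the `"s"` and `"?"` special cases that `get_converter` handles.

```javascript
// Sketch (not Pyodide code): recursively convert an n-d strided buffer into
// nested arrays, one dimension per call, with a TypedArray at the innermost
// level. shape/strides are plain arrays (strides in bytes); assumes
// ndim >= 1 and no suboffsets.
function toNested(heap, ptr, curdim, info) {
  const n = info.shape[curdim];
  const stride = info.strides[curdim];
  if (curdim === info.shape.length - 1) {
    // Base case: pack this row into fresh memory and wrap it.
    const out = new Uint8Array(n * info.itemsize);
    for (let i = 0; i < n; ++i) {
      const src = ptr + i * stride;
      out.set(heap.subarray(src, src + info.itemsize), i * info.itemsize);
    }
    return new info.ArrayType(out.buffer);
  }
  // Recursive case: one nested entry per index along this dimension.
  const result = [];
  for (let i = 0; i < n; ++i) {
    result.push(toNested(heap, ptr + i * stride, curdim + 1, info));
  }
  return result;
}

// A C-ordered 2x3 int16 array, and its transpose as the same memory read with
// shape and strides swapped.
const heap = new Uint8Array(new Int16Array([1, 2, 3, 4, 5, 6]).buffer);
const info = { itemsize: 2, ArrayType: Int16Array };
const rows = toNested(heap, 0, 0, { ...info, shape: [2, 3], strides: [6, 2] });
const cols = toNested(heap, 0, 0, { ...info, shape: [3, 2], strides: [2, 6] });
console.log(rows.map((r) => Array.from(r))); // [ [ 1, 2, 3 ], [ 4, 5, 6 ] ]
console.log(cols.map((c) => Array.from(c))); // [ [ 1, 4 ], [ 2, 5 ], [ 3, 6 ] ]
```

The transpose needs no data movement, only different `shape` and `strides`. This is why the innermost row of a transposed array goes through the non-contiguous gather even though the underlying memory is contiguous.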
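`Module.get_converter` handles big-endian formats by rewriting each item in place through a `DataView` before wrapping the buffer in the TypedArray. Here is a standalone sketch of that byte swap; `swapToLittleEndian` is a hypothetical name. It reads each item as big-endian and writes it back as little-endian, which performs the same byte reversal as the opposite order used in `swapFunc`. Unlike the `getFloat64`/`setFloat64` pair in the diff, 8-byte items here go through `getBigUint64`/`setBigUint64`. This is an assumption about what is safer rather than part of this PR: the ECMAScript spec lets engines choose the NaN bit pattern a float store writes, so a float round trip may alter raw bits. The sketch requires an engine with BigInt `DataView` methods.

```javascript
// Sketch (not Pyodide code): byte-swap big-endian items in place so that a
// TypedArray on a little-endian host (such as WASM) sees the right values.
// Assumes an engine with BigInt DataView methods for 8-byte items.
function swapToLittleEndian(buff, itemsize) {
  if (itemsize === 1) {
    return buff; // single bytes have no byte order
  }
  // Integer accessors of matching width move raw bits without reinterpreting them.
  const accessors = {
    2: ["getUint16", "setUint16"],
    4: ["getUint32", "setUint32"],
    8: ["getBigUint64", "setBigUint64"],
  };
  const names = accessors[itemsize];
  if (names === undefined) {
    throw new Error(`Unexpected size ${itemsize}`);
  }
  const [getName, setName] = names;
  const view = new DataView(buff);
  for (let byte = 0; byte < view.byteLength; byte += itemsize) {
    // Read the item as big-endian (false), write it back as little-endian (true).
    view[setName](byte, view[getName](byte, false), true);
  }
  return buff;
}

// Big-endian bytes of the uint16 values 1 and 258 (0x0102).
const be = new Uint8Array([0x00, 0x01, 0x01, 0x02]).buffer;
console.log(Array.from(new Uint16Array(swapToLittleEndian(be, 2)))); // [ 1, 258 ]
```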