Maybe it was just me, but this felt a bit overly pedantic. Understanding the internals of wasm-bindgen is important for understanding how Rust handles strings in WASM, but I was expecting a higher-level discussion of how strings are passed to WASM.
Due to the lack of native strings in WebAssembly, different Wasm compilers use different memory layouts and string encodings. For example, AssemblyScript uses UCS-2 for the sake of compatibility with JavaScript. This forces you to work carefully with memory bounds and string length estimation, because the host's native string encoding can differ from the guest's.
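To make the bounds/length issue concrete, here is a rough sketch of what the JS host side has to do to hand a string to a UTF-16-based guest like AssemblyScript. The exported alloc function and the ptr-only return are made up for illustration; real toolchains each define their own allocation and ownership conventions:

    // Sketch: copy a JS string into a Wasm module's linear memory as UTF-16LE.
    // alloc is a hypothetical export that reserves size bytes and returns a pointer.
    function writeUtf16(instance: WebAssembly.Instance, str: string): number {
      const alloc = instance.exports.alloc as (size: number) => number;
      const memory = instance.exports.memory as WebAssembly.Memory;

      const byteLength = str.length * 2;  // one UTF-16 code unit = 2 bytes
      const ptr = alloc(byteLength);

      // Create the view after alloc: growing memory detaches older buffers.
      const view = new DataView(memory.buffer, ptr, byteLength);
      for (let i = 0; i < str.length; i++) {
        view.setUint16(i * 2, str.charCodeAt(i), /* littleEndian */ true);
      }
      return ptr;  // the guest still needs the length (or a header) to know the bounds
    }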
AS, like JS, interprets strings as UTF-16LE in the following methods: String.prototype.codePointAt, String.prototype.toUpperCase/toLowerCase, String.prototype.localeCompare, String.prototype.normalize, String.fromCodePoint, and Array.from(str). In all other cases strings are interpreted as UCS-2.
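A quick way to see the difference in plain JS/TS (U+1F600 is stored as a surrogate pair, i.e. two UTF-16 code units):

    const s = "😀";                    // U+1F600, a surrogate pair in UTF-16
    console.log(s.length);             // 2      -> length counts UCS-2 code units
    console.log(s.charCodeAt(0));      // 55357  (0xD83D): just the high surrogate
    console.log(s.codePointAt(0));     // 128512 (0x1F600): the full code point (UTF-16 aware)
    console.log(Array.from(s).length); // 1      -> iteration is code-point aware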
They do in all the observable JS APIs, but behind the scenes each JS engine has a number of optimizations to deal with the fact that most JS and JSON source comes off the wire as UTF-8 or ASCII.
> Actually, humans generally think in terms of graphemes, which may or may not be composed of multiple Unicode code points (irrespective of the normalization form being used).
Is it unusual for a VM not to have a string (or at least a bytes) type? I have little experience in the space, but it seems clunky. Curious why WASM went this direction.
WASM has only just reached MVP. The design goal of the MVP seems to be a small spec with the minimum set of requirements.
I guess that is why all four major browsers could adopt it at almost the same time, without many proposal ping-pongs between Google, Mozilla, Apple, and Microsoft.
This article, and proposals for reference types, though, make it sound like working around the lack of strings/bytes/chars might have been more work in the end. Sort of like a unicycle isn't really a bicycle MVP :)
Right, but it is better than having to wait more years for browsers to reach consensus. I'm not sure, but specifying a WASM-native string representation might favor one particular JS engine over the others, depending on its existing JS string implementation. That could cause disagreements.
The problem is that WebAssembly exists for other languages to be compiled into.
And those other languages don't agree what a string looks like.
So do you have a Rust string? Or a C string? Or a C# string? Or one of the other representations?
By just giving you a bunch of bytes to play with, and letting different languages use them differently, WebAssembly stays out of the way and lets the compiler decide how it wants to make strings.
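For example, the host side of that contract can look roughly like this (TypeScript; the alloc and take_string exports are hypothetical, every toolchain defines its own convention):

    // The "just bytes" contract: encode the string however the guest expects
    // (UTF-8 here, as a Rust- or C-compiled module typically would) and pass a
    // raw pointer + length. alloc and take_string are hypothetical exports.
    async function callWithString(wasmBytes: BufferSource, text: string) {
      const { instance } = await WebAssembly.instantiate(wasmBytes);
      const alloc = instance.exports.alloc as (size: number) => number;
      const takeString = instance.exports.take_string as (ptr: number, len: number) => void;
      const memory = instance.exports.memory as WebAssembly.Memory;

      const utf8 = new TextEncoder().encode(text);  // JS string -> UTF-8 bytes
      const ptr = alloc(utf8.length);
      new Uint8Array(memory.buffer, ptr, utf8.length).set(utf8);

      takeString(ptr, utf8.length);  // the module interprets those bytes as *its* string type
    }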
Can you elaborate? I'm not well-versed in WASM internals and I might be missing something.
I mean, any integer with 8+ bits can be treated like an octet, but memory layout is very different (e.g. an array of i8 and an array of i32 look very different even if they represent the same values).
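For instance, the same three values take 3 bytes as i8 but 12 bytes as i32 (shown little-endian, as Wasm memory is; typed arrays follow the host, which is little-endian almost everywhere):

    // Same logical values, very different layout in linear memory.
    const buf = new ArrayBuffer(16);
    new Int8Array(buf, 0, 3).set([1, 2, 3]);   // bytes 0..2:  01 02 03
    new Int32Array(buf, 4, 3).set([1, 2, 3]);  // bytes 4..15: 01 00 00 00 02 00 00 00 03 00 00 00
    console.log(new Uint8Array(buf));          // prints the raw byte layout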
https://hacks.mozilla.org/2019/08/webassembly-interface-type...