javascript binary data
overview
JavaScript has a convoluted mishmash of classes for storing and manipulating binary (byte-based) data. In addition, Node.JS added a convenient extension built on top of the standard binary data classes for use in Node.JS apps, but it is not part of standard JavaScript and is not available in browsers. This text aims to sort out the different binary data structures available in modern JavaScript as of 2023, and proposes some best practices.
only static length blocks of binary data, not dynamic length
All binary data manipulation discussed here is done with classes that operate on static length blocks of memory. That is, you either allocate a block with a fixed length in bytes, or you are handed a block by some third party code. You cannot increase or decrease the size of that block. If you really need a different size, you have to allocate a new block, copy the contents of the old block into it, and then drop every reference to the old block so it can be garbage collected. Note: the ArrayBuffer spec now includes a resize() function, but it hasn't been implemented everywhere yet, and it only applies to buffers created with a maxByteLength option. resize() technically adds the ability to change the size of the memory, but it doesn't feel dynamic the way a string or an array does. See the short discussion at the end of this text on dynamic storage of binary data.
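As a rough sketch of the allocate-and-copy approach described above: the helper name `reallocate` is made up for this illustration, and the `resize()` part only runs where the engine supports resizable buffers.

```javascript
// Hypothetical helper (not a built-in): returns a NEW ArrayBuffer of
// newByteLength bytes holding as much of the old contents as fits.
function reallocate(oldBuffer, newByteLength) {
  const newBuffer = new ArrayBuffer(newByteLength);
  const bytesToCopy = Math.min(oldBuffer.byteLength, newByteLength);
  // Uint8Array views let us copy byte-for-byte between the two blocks.
  new Uint8Array(newBuffer).set(new Uint8Array(oldBuffer, 0, bytesToCopy));
  return newBuffer;
}

const small = new ArrayBuffer(4);
new Uint8Array(small).set([1, 2, 3, 4]);

const bigger = reallocate(small, 8);
console.log(bigger.byteLength);      // 8
console.log(new Uint8Array(bigger)); // bytes 1, 2, 3, 4 followed by four zeros

// Where resizable buffers are supported, resize() only works up to the
// maxByteLength chosen at construction time.
if (typeof ArrayBuffer.prototype.resize === 'function') {
  const resizable = new ArrayBuffer(4, { maxByteLength: 16 });
  resizable.resize(8);
  console.log(resizable.byteLength); // 8
}
```

The original `small` buffer is untouched by `reallocate`; it simply becomes garbage once nothing refers to it.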
the ArrayBuffer and its friends
All static binary data in JavaScript is stored in an ArrayBuffer, which is then read from and written to via the DataView, Uint8Array and TextEncoder/Decoder classes. This diagram shows how JavaScript's built-in binary data classes fit together. Notice that there are multiple ways to read from and write to an ArrayBuffer, and it is the developer's job to pick the right one for each case.
classDiagram
ArrayBuffer <-- TypedArray
ArrayBuffer <-- DataView
class ArrayBuffer {
byteLength: number
}
TypedArray <|-- Uint8Array
TypedArray <|-- Uint32Array
class TypedArray {
buffer: ArrayBuffer
byteLength: number
}
class DataView {
buffer: ArrayBuffer
byteLength: number
getUint8(byteOffset)
setUint8(byteOffset, value)
getInt32(byteOffset, isLittleEndian=false)
setInt32(byteOffset, value, isLittleEndian=false)
}
class Uint8Array {
index[]: number
}
class Uint32Array {
index[]: number
}
classDiagram
class TextEncoder {
encoding:string = 'utf-8'
encode(s:string):Uint8Array
encodeInto(source:string, dest:Uint8Array)
}
class TextDecoder {
encoding:string = 'utf-8'
decode(buffer:Uint8Array|DataView|ArrayBuffer):string
}
class Blob {
arrayBuffer()
stream()
}
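The two text classes in the diagram can be exercised with a short round trip. This is only an illustrative sketch; the sample string and the 16-byte target size are arbitrary choices.

```javascript
const encoder = new TextEncoder(); // always produces UTF-8
const decoder = new TextDecoder(); // defaults to 'utf-8'

// encode() allocates a new Uint8Array sized to fit the encoded string.
const encoded = encoder.encode('h\u00e9llo'); // "héllo"
console.log(encoded.length); // 6 (the accented letter takes two bytes in UTF-8)

// encodeInto() writes into an existing Uint8Array and reports how many
// UTF-16 code units it read and how many bytes it wrote.
const target = new Uint8Array(16);
const { read, written } = encoder.encodeInto('h\u00e9llo', target);
console.log(read, written); // 5 6

// decode() accepts a Uint8Array, a DataView, or an ArrayBuffer.
const text = decoder.decode(target.subarray(0, written));
console.log(text === 'h\u00e9llo'); // true
```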
class | explanation |
---|---|
ArrayBuffer | This is the core JS memory class: a simple raw byte buffer where all JavaScript binary data gets stored. The data held inside it can't be accessed directly. Instead, the buffer must first be wrapped by a DataView or by one of the TypedArray descendants such as Uint8Array, which then allows reading and writing within the buffer. (TextEncoder/Decoder in turn work through a Uint8Array.) Once created, the buffer cannot be shrunk or expanded. This may seem needlessly complicated, but its purpose is to separate the possibly large block of data from the code that processes it, which allows that one block of data to be shared, viewed, and updated in multiple ways throughout your application. |
DataView | allows access to the underlying buffer via numerous getXxxNN(byteOffset, isLittleEndian=false) and setXxxNN(byteOffset, value, isLittleEndian=false) methods. Note that in order to access the underlying buffer as a string, you'll need to use TextDecoder and TextEncoder. |
TypedArray | abstract base class that wraps an ArrayBuffer (it cannot be instantiated by itself) |
⮤Uint8Array | an array-like class that allows reading and writing the wrapped ArrayBuffer one byte at a time using most of the functionality of JavaScript arrays. When instantiating one of these, if you don't point it to an existing ArrayBuffer, a new ArrayBuffer will be created for you and assigned to the buffer member (which it inherits from TypedArray). |
⮤Uint32Array | Uint32Array and the other XxxxNNArray classes are of limited use. They are arrays that treat their buffer member as a sequence of integers or floats of various sizes, signed or unsigned, using the platform's default endianness. The lack of endian control makes them a poor fit for data that is exchanged with files, networks, or other machines. (Always use DataView when reading or writing 16-bit, 32-bit, or 64-bit integers and floats, since it lets you control the endianness.) |
TextEncoder | allows writing strings to a Uint8Array as UTF-8, which is the only encoding it supports |
TextDecoder | allows reading encoded strings from a Uint8Array, a DataView, or even an ArrayBuffer directly, using a specified encoding, with utf-8 being the default |
Blob | this is an additional class that encapsulates a static length block of binary data similar to the ArrayBuffer, but can deliver its internal data either as an ArrayBuffer (asynchronously, via arrayBuffer(), after which you can use DataView and the TypedArray descendants) or as a stream (via stream()), which has other benefits. This is mainly used to deliver binary data to the browser for things like dynamic images and videos. Blob isn't the focus of this document, so I'll not say anything more about it. |
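To make the "one block of data, several views" idea from the table concrete, here is a small sketch. The byte values are arbitrary, and the last line is deliberately platform dependent.

```javascript
const sharedMemory = new ArrayBuffer(8);

// Two different views over the SAME 8 bytes; neither copies the data.
const bytes = new Uint8Array(sharedMemory);
const view = new DataView(sharedMemory);

// Write a 32-bit value as big-endian at byte offset 0 ...
view.setUint32(0, 0x11223344, false);
// ... and the same value as little-endian at byte offset 4.
view.setUint32(4, 0x11223344, true);

// The byte view immediately reflects what the DataView wrote.
console.log(Array.from(bytes, (b) => b.toString(16)));
// [ '11', '22', '33', '44', '44', '33', '22', '11' ]

// Reading back with the matching endianness recovers the value.
console.log(view.getUint32(0, false).toString(16)); // 11223344
console.log(view.getUint32(4, true).toString(16));  // 11223344

// A Uint32Array uses the platform's byte order, so the result of this
// read depends on the machine running the code.
const words = new Uint32Array(sharedMemory);
console.log(words[0].toString(16)); // 44332211 on little-endian hardware
```

This is the practical reason for preferring DataView for multi-byte values: the endianness is stated in the code rather than inherited from the hardware.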
node.js also has its own way (the Buffer class)
The classes mentioned above are all you need to manipulate static length binary data in Node.JS. However, Node includes an additional way of handling binary data in the form of the Buffer class. Buffer inherits from Uint8Array and therefore has all the functionality of Uint8Array, plus it adds the functionality of DataView and TextEncoder/Decoder, all in one convenient class. While Node can still use the TypedArray, DataView and TextEncoder/Decoder classes, the Buffer class is often more convenient to use.
classDiagram
ArrayBuffer <-- TypedArray
ArrayBuffer <-- DataView
class ArrayBuffer {
byteLength: number
}
TypedArray <|-- Uint8Array
TypedArray <|-- Uint32Array
class TypedArray {
buffer: ArrayBuffer
byteLength: number
}
class DataView {
buffer: ArrayBuffer
byteLength: number
getUint8(byteOffset)
setUint8(byteOffset, value)
getInt32(byteOffset, isLittleEndian=false)
setInt32(byteOffset, value, isLittleEndian=false)
}
class Uint8Array {
index[]: number
}
class Uint32Array {
index[]: number
}
Uint8Array <|-- Buffer
class Buffer {
readInt8(byteOffset)
writeInt8(value, byteOffset)
readUInt32BE(byteOffset)
writeUInt32BE(value, byteOffset)
toString(encoding='utf8', start=0, end=lastByte)
write(string, offset=0, length=all, encoding='utf8')
}
%%% class Buffer {class:'green'}
classDiagram
class TextEncoder {
encoding:string = 'utf-8'
encode(s:string):Uint8Array
encodeInto(source:string, dest:Uint8Array)
}
class TextDecoder {
encoding:string = 'utf-8'
decode(buffer:Uint8Array|DataView|ArrayBuffer):string
}
class Blob {
arrayBuffer()
stream()
}
class | explanation |
---|---|
Buffer | This is the main class used in Node.JS for manipulating binary data. (Buffers are accepted and returned by many fs library functions.) This class includes (via inheritance) the functionality of the standard Uint8Array, and duplicates the features offered by DataView and the TextEncoder/Decoder classes. You may want to avoid Buffer given that it is not available in browsers. But if you can't live without the convenience that Buffer provides (and I wouldn't blame you, it's easier to use than the combination of Uint8Array, DataView & TextEncoder/Decoder), you may be interested in using Node's Buffer for Node applications and a third party implementation of Buffer for browsers, like this one: Buffer for Browsers. …And yes, it is confusing that the ArrayBuffer encapsulated by the Buffer class is also called buffer. …And yes, it is confusing that Buffer spells its functions as UIntXYZ, while DataView and the TypedArrays spell their functions and classes as UintXYZ (lowercase 'i'). …And yes, it is confusing that TextEncoder/Decoder spells its encodings with dashes, like utf-8, but Buffer spells them without dashes, like utf8. I wouldn't have named things in these inconsistent ways, but it is what it is. |
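As a Node-only sketch of how Buffer folds the three standard roles into one object, and how the standard classes can still be pointed at the same memory (the values written here are arbitrary):

```javascript
// Node.JS only: Buffer is a global in Node and does not exist in browsers.
const buf = Buffer.alloc(8);

// Buffer bundles the DataView-style accessors ...
buf.writeUInt32BE(0xdeadbeef, 0);
// ... and the TextEncoder-style string writing into one object.
buf.write('hi', 4, 'utf8');

console.log(buf.readUInt32BE(0).toString(16)); // deadbeef
console.log(buf.toString('utf8', 4, 6));       // hi

// Because Buffer extends Uint8Array, the standard classes work on the
// same memory. Pass byteOffset and byteLength, since a Buffer may cover
// only part of its underlying ArrayBuffer.
console.log(buf instanceof Uint8Array); // true
const dv = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
console.log(dv.getUint32(0, false).toString(16));          // deadbeef
console.log(new TextDecoder().decode(buf.subarray(4, 6))); // hi
```

Note the argument order: Buffer's write methods take the value first and the offset second, whereas DataView's set methods take the offset first.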
dynamic length binary data
binary data in strings?
Technically, JavaScript can store binary data in strings dynamically. Unlike some other languages where strings are terminated with null characters, JavaScript treats character 0x00 like any other. It's therefore quite tempting to put binary data into a string, because then you have an easy-to-use dynamic data structure for binary data! However, I do not recommend doing this. While it may seem to work and be ever so convenient, there are numerous tricky caveats involved. JavaScript treats the contents of a string as text, so whenever the string is encoded or decoded (for example as UTF-8 when it is written to a file or sent over the network), the bytes are transformed to follow the rules of Unicode character encoding. Additionally, JavaScript strings are sequences of 16-bit code units, so each byte of binary data you put into a string may be stored as 2 bytes, which can double your memory usage. The main reason to avoid strings, though, is to stay away from the strange character encoding behavior that your binary data will be inadvertently subjected to. An excellent example of this is clearly described here: Binary data in the browser: Untangling an encoding mess with JavaScript Typed Arrays
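One way to see the problem is to push arbitrary bytes through a UTF-8 decode and encode round trip. This sketch assumes the string route goes through UTF-8; other conversion paths mangle data differently, but the effect is similar.

```javascript
// Four arbitrary bytes; 0xff and 0xfe are never valid in UTF-8.
const original = new Uint8Array([0x00, 0x41, 0xff, 0xfe]);

// Treat the bytes as text, then turn the text back into bytes.
const asString = new TextDecoder('utf-8').decode(original);
const roundTripped = new TextEncoder().encode(asString);

console.log(original.length);     // 4
console.log(roundTripped.length); // 8
console.log(Array.from(roundTripped, (b) => b.toString(16)));
// [ '0', '41', 'ef', 'bf', 'bd', 'ef', 'bf', 'bd' ]
```

Each invalid byte was silently replaced by the Unicode replacement character (U+FFFD), which encodes to the three bytes ef bf bd, so the original 0xff and 0xfe are gone for good.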
array of number: a dynamic length binary storage option
The closest thing JavaScript has to dynamic length storage of binary data is an array of numbers, where your own code keeps each number within the range 0 to 255. For example: let byteArray = [0,10,255]
The static length binary storage classes can also convert their contents to and from a number array like so:
examples of converting from a number array containing bytes to Uint8Array and Buffer
// an array of bytes ...
let byteArray = [0,10,255];
byteArray.push(99); // <-- adding a byte dynamically
// ... to Uint8Array
let uInt8Array = Uint8Array.from(byteArray);
console.log(uInt8Array); // Uint8Array(4) [0, 10, 255, 99]
// ... to Buffer (in Node.JS)
let buffer = Buffer.from(byteArray);
console.log(buffer); // <Buffer 00 0a ff 63> aka [0, 10, 255, 99]
examples of converting from the above uInt8Array and buffer back to number arrays containing bytes.
// ... from uInt8Array
let byteArray2 = [...uInt8Array.values()];
console.log(byteArray2); // [0, 10, 255, 99]
// ... from buffer (in Node.JS)
let byteArray3 = [...buffer.values()]; // new name: byteArray2 is already declared above
console.log(byteArray3); // [0, 10, 255, 99]
conclusion
If you are writing a library to be used both in the browser and in Node, the best way to deal with binary data is to have all of your functions take and return ArrayBuffer. That way, the developer using your functions can access the data however they see fit (via Uint8Array, DataView, Blob, and, in Node, Buffer). After that, it's up to you how you access the data inside your functions, keeping in mind that the Buffer object is built into Node but not the browser. Lastly, if you want to store binary data in a dynamic length data structure, arrays of numbers are the only viable option, as storing binary data in strings is problematic.
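A minimal sketch of what such a library boundary might look like. Both `invertBytes` and `toArrayBuffer` are hypothetical names invented for this example, not part of any library.

```javascript
// Hypothetical library function: takes an ArrayBuffer and returns a NEW
// ArrayBuffer with every byte inverted. Internally it may use any view.
function invertBytes(input) {
  const src = new Uint8Array(input);
  const out = new Uint8Array(src.length);
  for (let i = 0; i < src.length; i++) {
    out[i] = ~src[i] & 0xff;
  }
  return out.buffer;
}

// Hypothetical helper: copies exactly the bytes a view covers into a fresh
// ArrayBuffer. This matters for Node's Buffer, whose .buffer can be a
// larger shared pool that the Buffer only occupies part of.
function toArrayBuffer(view) {
  return view.buffer.slice(view.byteOffset, view.byteOffset + view.byteLength);
}

// Caller side: wrap the result in whichever class is convenient.
const result = invertBytes(new Uint8Array([0x00, 0x0f, 0xff]).buffer);
console.log(new Uint8Array(result));           // bytes 255, 240, 0
console.log(new DataView(result).getUint8(1)); // 240

// In Node, a Buffer can be passed in after trimming it to its own bytes.
if (typeof Buffer !== 'undefined') {
  const fromNode = invertBytes(toArrayBuffer(Buffer.from([1, 2, 3])));
  console.log(Buffer.from(fromNode)); // <Buffer fe fd fc>
}
```

Keeping ArrayBuffer at the boundary means the library code never mentions Buffer, so the same source can run unchanged in the browser.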