I am using the native JavaScript WebSocket in the browser, and we have an application hosted on AWS where every request goes through API Gateway.
In some cases the request data grows to around 60 KB, and then my WebSocket connection closes automatically. In the AWS documentation I found the following explanation of this issue:
https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-known-issues.html
API Gateway supports message payloads up to 128 KB with a maximum frame size of 32 KB. If a message exceeds 32 KB, you must split it into multiple frames, each 32 KB or smaller. If a larger message is received, the connection is closed with code 1009.
I tried to find out how to split a message into multiple frames using the native JavaScript WebSocket, but could not find any configuration related to frames in the documentation or anywhere else.
I did find something related to message fragmentation, but it appears to be a custom solution that I would need to implement on both the frontend and the backend:
https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API/Writing_WebSocket_servers#message_fragmentation
As far as I know, you cannot do this using the JS AWS SDK's postToConnection API. The best you can do is write your own poor man's fragmentation and send the chunks as independent messages:
// Split a buffer into chunks of at most sizeInBytes bytes.
const splitInChunks =
  (sizeInBytes: number) =>
  (buffer: Buffer): Buffer[] => {
    const size = Buffer.byteLength(buffer);
    let start = 0;
    let end = sizeInBytes;
    const chunks: Buffer[] = [];
    do {
      chunks.push(buffer.subarray(start, end));
      start += sizeInBytes;
      end += sizeInBytes;
    } while (start < size);
    return chunks;
  };
Where sizeInBytes must be smaller than 32 KB. Then you iterate over the chunks:
await Promise.all(
  chunks.map(c =>
    apiGatewayClient.postToConnection({
      Data: JSON.stringify(c),
      ConnectionId: myConnectionId,
    })
  )
);
This may run into rate limits depending on the number of chunks, so consider sending the requests serially rather than in parallel.
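For instance, a minimal sketch of a serial loop (assuming the same client object and connection id as above, and an SDK version whose postToConnection returns a promise):
for (const chunk of chunks) {
  // Await each send before starting the next one to stay within rate limits.
  await apiGatewayClient.postToConnection({
    Data: JSON.stringify(chunk),
    ConnectionId: myConnectionId,
  });
}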
Final remark: Buffer.prototype.subarray is very efficient because it does not reallocate memory: the new chunks point at the same memory space as the original buffer. Think pointer arithmetic in C.
I am using Webpack, compiling a bundled JS file.
The problem
I have a Worker that I offload the hashing work to, passing it a file and its filesize. I previously did not use a Worker; however, when Chrome reacted badly to hashing a large file, I assumed the main thread was being blocked by the hashing mechanism (this could be a false assumption).
The code works correctly for small files. However, for large files, once reaching the part where the final hash is generated, Chrome shows this error:
Firefox is a bit more helpful and shows this message:
Error: Uncaught, unspecified "error" event. (out of memory)
However, the piping of data should alleviate this issue. fileReaderStream reads data in chunks of 1 MB.
The code
import Crypto from 'crypto'
import fileReaderStream from 'filereader-stream'
import concat from 'concat-stream'
import progress from 'progress-stream'

self.onmessage = (event) => {
  switch (event.data.topic) {
    case 'hash': {
      var file = event.data.file;
      var filesize = event.data.filesize;
      // Progress reporters placed before and after the hash transform.
      let p1 = progress({ length: filesize, time: 100 /* ms */ });
      let p2 = progress({ length: filesize, time: 100 /* ms */ });
      p1.on('progress', function(progress) {
        console.log('p1', progress);
      });
      p2.on('progress', function(progress) {
        console.log('p2', progress);
      });
      let md5 = Crypto.createHash('md5');
      console.log("START HASH");
      // Stream the file through the hash in 1 MB chunks.
      var reader = fileReaderStream(file);
      reader.pipe(p1).pipe(md5).pipe(p2).pipe(concat((data) => {
        console.log("DONE HASH");
        console.log(data);
      }));
      break;
    }
  }
}
Small file example (5,248 KB)
Large file example (643 MB)
Additional Information
Screenshot of memory usage. It takes up 3 GB in a few seconds.
So it could be worth using a different library if this one is poorly implemented with regard to memory management.
This JavaScript library was developed at Stanford - https://bitwiseshiftleft.github.io/sjcl/
You may also want to consider using a more secure hashing algorithm than MD5 due to its vulnerability to collisions via the birthday attack.
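For illustration, a minimal sketch of incremental hashing in a worker with SJCL, assuming a build that includes the optional arrayBuffer codec (sjcl.codec.arrayBuffer is a plugin, not part of the core file):
importScripts('sjcl.js'); // assumed build with sjcl.codec.arrayBuffer included

self.onmessage = (event) => {
  const file = event.data.file;
  const chunkSize = 1024 * 1024; // 1 MB, as with fileReaderStream
  const hasher = new sjcl.hash.sha256();
  let offset = 0;

  const readNext = () => {
    const reader = new FileReader();
    reader.onload = () => {
      // Feed each chunk to the hasher instead of buffering the whole file.
      hasher.update(sjcl.codec.arrayBuffer.toBits(reader.result));
      offset += chunkSize;
      if (offset < file.size) {
        readNext();
      } else {
        self.postMessage({ digest: sjcl.codec.hex.fromBits(hasher.finalize()) });
      }
    };
    reader.readAsArrayBuffer(file.slice(offset, offset + chunkSize));
  };
  readNext();
};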
I use forge as the encryption library for both the gulp script (which performs encryption) and the front-end one (where in-browser decryption happens).
The computer is an i5-6200U with 16 GB RAM and takes about 10 seconds for either symmetric encryption or decryption of a 15 MB JSON file.
My real issue is the decryption time being too long for users (multiple files to load and decrypt take 30s+ on this system).
I'm certainly missing some key element (buffer or... whatever my lack of experience in the domain might miss). Is there something obviously wrong in the following code? Thanks for your attention.
Obtaining the data
function load(url) {
  return new Promise((resolve, reject) => {
    var xhr = new XMLHttpRequest();
    xhr.onload = function (event) {
      resolve(xhr.response);
    };
    // 'onerror' is the actual XHR error handler ('onreject' does not exist).
    xhr.onerror = function (err) {
      reject(err);
    };
    xhr.open('GET', url);
    xhr.send();
  });
}
Decrypting the data
load('data/dicom.json').then(bytes => {
  const tIn = new Date().getTime();
  const forge = getForge();
  const pwd = "aStringPassword";
  const iv = getInitVector();
  const salt = getSalt();
  const key = forge.pkcs5.pbkdf2(pwd, salt, 100, 16);
  var decipher = forge.cipher.createDecipher('AES-CBC', key);
  decipher.start({iv: iv});
  decipher.update(forge.util.createBuffer(bytes)); // was 'bytez' (typo)
  decipher.finish();
  const clear = decipher.output.getBytes();
  const tOut = new Date().getTime();
  console.log(`decrypted in ${(tOut - tIn) / 1000}s`); // 10s for 15MB json file
  return clear;
});
Forge up to at least 0.7.1 uses strings for its internal buffer implementation. (This code pre-dates modern buffer APIs and future Forge versions will use newer APIs.) This has some consequences when processing large inputs. As output string buffers get larger during processing the internal JavaScript VM can slow down just doing string processing. One way to avoid this is to use streaming capabilities of the Forge API such that string buffer manipulations use larger data chunks. The input can be processed in chunks with update() and the output manually built during this process. Getting the output chunks with getBytes() will clear the output buffer and allow the Forge internals to operate more efficiently. Building your own output with those chunks does not have the same performance impact.
A test was written to check decrypting large buffers with a single update() call, many update() calls, and with native node APIs. As the input size increases from 1M to 20M, the slowdown of a single update() call vs native node APIs goes from ~8x to well over 50x! But if you use streaming processing the slowdown is only ~4.6x and not noticeably dependent on input size! For your 15M input size this equates to ~0.75s vs ~10.31s. For comparison node is ~0.15s and the WebCrypto API is likely similar. (Timing from an i7-4790K.)
A test was also written to see how the chunk size affected the results. When processing large inputs it seems ~64k is about optimal using node.js. This could be different depending on the JavaScript VM and other factors. The key takeaway is that using streaming with any chunk size (even 1M!) offers improvements that avoid linear buffer slowdowns as input size increases.
An example with improved and more constant performance:
const decipher = forge.cipher.createDecipher('AES-CBC', key);
decipher.start({iv: iv});
const length = bytes.length;
const chunkSize = 1024 * 64;
let index = 0;
let clear = '';
do {
  // Draining the output with getBytes() on every iteration keeps Forge's
  // internal string buffer small.
  clear += decipher.output.getBytes();
  const buf = forge.util.createBuffer(bytes.substr(index, chunkSize));
  decipher.update(buf);
  index += chunkSize;
} while(index < length);
const result = decipher.finish();
assert(result);
clear += decipher.output.getBytes();
A secondary issue with the code is that you want to avoid doing CPU intensive code on the main JS thread. The streaming API will allow you to run each update() call via setImmediate() (if available) or setTimeout(). This will allow the user to interact with the browser while processing is going on. If you can also stream the input data fetch then you could start processing while data is coming over the network. Updating the original code to do this is left as an exercise for the reader. A smaller chunk size might help UI interactivity in this situation.
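For illustration, a minimal sketch of that idea (not the author's exercise solution): each update() is scheduled with setTimeout so the main thread can breathe between chunks.
function decryptChunked(decipher, bytes, chunkSize, done) {
  let index = 0;
  let clear = '';
  function step() {
    clear += decipher.output.getBytes();
    decipher.update(forge.util.createBuffer(bytes.substr(index, chunkSize)));
    index += chunkSize;
    if (index < bytes.length) {
      setTimeout(step, 0); // yield to the event loop between chunks
    } else {
      if (!decipher.finish()) throw new Error('decryption failed');
      clear += decipher.output.getBytes();
      done(clear);
    }
  }
  step();
}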
Lastly it should be noted that native APIs are likely to always be higher performance than Forge. The current WebCrypto API does not offer a streaming API but its performance may be high enough that it may not be an issue in this use case. It's worth trying and seeing what works best.
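For comparison, a rough one-shot sketch of the same decryption via WebCrypto, assuming salt, iv and the ciphertext are typed arrays, and reproducing the Forge defaults used above (PBKDF2 with SHA-1, a 16-byte key, 100 iterations):
async function webCryptoDecrypt(pwd, salt, iv, ciphertext) {
  const baseKey = await crypto.subtle.importKey(
    'raw', new TextEncoder().encode(pwd), 'PBKDF2', false, ['deriveKey']);
  const key = await crypto.subtle.deriveKey(
    { name: 'PBKDF2', salt: salt, iterations: 100, hash: 'SHA-1' },
    baseKey, { name: 'AES-CBC', length: 128 }, false, ['decrypt']);
  // Resolves to an ArrayBuffer containing the plaintext.
  return crypto.subtle.decrypt({ name: 'AES-CBC', iv: iv }, key, ciphertext);
}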
Also note you should check the decipher.finish() return value.
Encryption has the same buffer issues with large inputs and can use the same pattern as the code above.
For those reading this in the future: newer web APIs and Forge improvements may have greatly changed performance results.
No clue so far, but the code from the benchmark page, shown below, runs way slower in my gulp script:
(source available here)
/*** encrypt */
var input = forge.util.createBuffer("plaintext");
var cipher = forge.aes.startEncrypting(key, iv);
cipher.update(input);
var status = cipher.finish();
var ciphertext = cipher.output.data;
Test run via the web page: 35 ms; the same data in my gulp script: 165 ms. No clue why, so far.
We are trying to build an app to broadcast live audio to multiple subscribers. The server (written in Go) accepts PCM data in chunks, and a client using pyaudio is able to tap into the microphone and send this data using the code below. We have tested this and it works: the audio plays in any browser with the subscriber URL.
import pyaudio
import requests
import time
p = pyaudio.PyAudio()
# frames per buffer ?
CHUNK = 1024
# 16 bits per sample ?
FORMAT = pyaudio.paInt16
# 44.1k sampling rate ?
RATE = 44100
# number of channels
CHANNELS = 1
STREAM = p.open(
format=FORMAT,
channels=CHANNELS,
rate=RATE,
input=True,
frames_per_buffer=CHUNK
)
print "initialized stream"
def get_chunks(stream):
while True:
try:
chunk = stream.read(CHUNK,exception_on_overflow=False)
yield chunk
except IOError as ioe:
print "error %s" % ioe
url = "https://<server-host>/stream/publish/<uuid>/"
s = requests.session()
s.headers.update({'Content-Type': "audio/x-wav;codec=pcm"})
resp = s.post(url, data=get_chunks(STREAM))
But we need browser, iOS and Android clients that do the same thing as the client above. We are able to fetch the audio from the mic using the getUserMedia API in the browser, but are unable to send this audio to the server the way the Python code above does. Can someone point us in the right direction?
This is about a year old now, so I am sure you've moved on, but I think the approach to use from the browser is to stream the data over a WebSocket rather than over HTTP.
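As a rough sketch of that direction (it assumes the Go server also exposes a WebSocket endpoint on the publish path, which the question does not state): capture mic audio with getUserMedia, convert it to 16-bit PCM like the pyaudio client, and push the chunks over a WebSocket. ScriptProcessorNode is deprecated but still widely supported; AudioWorklet is the modern replacement.
const ws = new WebSocket('wss://<server-host>/stream/publish/<uuid>/');
ws.binaryType = 'arraybuffer';

navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
  const ctx = new AudioContext({ sampleRate: 44100 }); // match RATE above
  const source = ctx.createMediaStreamSource(stream);
  const processor = ctx.createScriptProcessor(1024, 1, 1); // match CHUNK above

  processor.onaudioprocess = (e) => {
    // Convert float samples [-1, 1] to 16-bit signed PCM like the pyaudio client.
    const floats = e.inputBuffer.getChannelData(0);
    const pcm = new Int16Array(floats.length);
    for (let i = 0; i < floats.length; i++) {
      const s = Math.max(-1, Math.min(1, floats[i]));
      pcm[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
    }
    if (ws.readyState === WebSocket.OPEN) ws.send(pcm.buffer);
  };

  source.connect(processor);
  processor.connect(ctx.destination);
});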
I'm using PeerJS, but I think this problem may be about WebRTC in general; hope you can help me out:
I'm trying to write a simple peer-to-peer file sharing. I'm using serialisation: "none" for PeerJS connection DataChannel, as I'm sending just pure ArrayBuffers.
Everything is fine with files around 10 MB, but I have problems sending bigger files (30+ MB). For example, after sending around the first 10-20 chunks of a 900 MB zip file, the connection between peers starts throwing Connection is not open. You should listen for the "open" event before sending messages. (on the sender side).
My setup:
A file is dragged to drag & drop; the sender uses FileReader to read it as ArrayBuffer in chunks of 64x1024 bytes (no difference with 16x1024), and as soon as each chunk is read it's sent via peer.send(ChunkArrayBuffer).
The receiver creates a blob from each received chunk; after the transmission finishes it creates a complete blob out of those and gives a link to the user (a sketch of this receiver logic is shown below, after the connection settings).
My peer connection settings:
var con = peer.connect(peerid, {
label: "file",
reliable: true,
serialization: "none"
})
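For reference, a minimal sketch of the receiver logic described above, using the connection object con just defined (the end-of-transfer signal is hypothetical and up to your own protocol):
const received = [];
con.on('data', (chunk) => {
  received.push(chunk); // one ArrayBuffer per received chunk
});
// Called once a hypothetical "transfer done" control message arrives:
function finishTransfer(mime) {
  const fileBlob = new Blob(received, { type: mime });
  const a = document.createElement('a');
  a.href = URL.createObjectURL(fileBlob);
  a.download = 'received-file';
  a.textContent = 'Download';
  document.body.appendChild(a);
}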
My sending function:
function sliceandsend(file, sendfunction) {
var fileSize = file.size;
var name = file.name;
var mime = file.type;
var chunkSize = 64 * 1024; // bytes
var offset = 0;
function readchunk() {
var r = new FileReader();
var blob = file.slice(offset, chunkSize + offset);
r.onload = function(evt) {
if (!evt.target.error) {
offset += chunkSize;
console.log("sending: " + (offset / fileSize) * 100 + "%");
if (offset >= fileSize) {
con.send(evt.target.result); ///final chunk
console.log("Done reading file " + name + " " + mime);
return;
}
else {
con.send(evt.target.result);
}
} else {
console.log("Read error: " + evt.target.error);
return;
}
readchunk();
};
r.readAsArrayBuffer(blob);
}
readchunk();
}
Any ideas what could cause this?
Update: Setting a 50 ms timeout between chunk transmissions helped a bit; loading of the 900 MB file reached 6% (instead of 1-2% previously) before the errors started. Maybe it's some kind of limit on simultaneous operations through the datachannel, or an overflow of some kind of datachannel buffer?
Update1: Here's my PeerJS connection object with DataChannel object inside it:
Good News everyone!
It was a DataChannel buffer overflow problem, thanks to this article: http://viblast.com/blog/2015/2/25/webrtc-bufferedamount/
bufferedAmount is a property of the DataChannel (DC) object which, in the latest Chrome version, shows the amount of data (in bytes) currently in the buffer; when it exceeds 16 MB, the DC is silently closed.
Therefore anyone who encounters this problem needs to implement a buffering mechanism at the application level that watches this property and holds back messages if needed. Also be aware that in versions of Chrome prior to 37
the same property shows the quantity (not the size) of messages; moreover it's broken under Windows, where it shows 0. However, with v<37 the DC is not closed on overflow; only an exception is thrown, which can also be caught to detect buffer overflow.
I made an edit in the peer.js unminified code for myself; here you can see both methods in one function (for more of the source code you can look at https://github.com/peers/peerjs/blob/master/dist/peer.js#L217):
DataConnection.prototype._trySend = function(msg) {
var self = this;
function buffering() {
self._buffering = true;
setTimeout(function() {
// Try again.
self._buffering = false;
self._tryBuffer();
}, 100);
return false;
}
if (self._dc.bufferedAmount > 15728640) {
return buffering(); ///custom buffering if > 15MB is buffered in DC
} else {
try {
this._dc.send(msg);
} catch (e) {
return buffering(); ///custom buffering if DC exception caught
}
return true;
}
}
Also opened an issue on PeerJS GitHub: https://github.com/peers/peerjs/issues/291
Have a look at Transfer a file.
That page shows how to transfer a file via WebRTC datachannels.
To accomplish this in an interoperable way, the file is split into chunks which are then transferred via the datachannel. The datachannel is reliable and ordered by default, which is well-suited to file transfers.
Although it doesn't use peerjs, it can be adapted to use peerjs, and the code is easy to follow and works without any issues.
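As an aside, modern browsers expose bufferedAmountLowThreshold on RTCDataChannel, which makes sender-side backpressure simpler than the polling approach above. A minimal sketch (the dataChannel accessor on the PeerJS connection is an assumption about your PeerJS version):
const dc = con.dataChannel; // underlying RTCDataChannel (assumed accessor)
const HIGH_WATER = 8 * 1024 * 1024; // pause when 8 MB is queued
dc.bufferedAmountLowThreshold = 1024 * 1024; // resume once below 1 MB

function sendChunk(chunk, next) {
  dc.send(chunk);
  if (dc.bufferedAmount > HIGH_WATER) {
    // Let the browser drain the buffer before reading the next chunk.
    dc.onbufferedamountlow = () => {
      dc.onbufferedamountlow = null;
      next();
    };
  } else {
    next();
  }
}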
I'm writing a (client-side) web browser app that downloads a huge number of chunks from many locations and joins them to build a blob. That blob is then saved to the local filesystem as a common file. The way I'm doing this is by means of ArrayBuffer objects and a blob.
var blob = new Blob([ArrayBuffer1, ArrayBuffer2, ArrayBuffer3, ...], {type: mimetype})
This works OK for small and medium-sized files (up to about 700 MB), but the browser crashes with larger files. I understand that RAM has its limits. The thing is that I need to build the blob in order to generate a file, but I want to allow users to download files much larger than that size (imagine, for instance, files of about 8 GB).
How can I build the blob while avoiding size limits? LocalStorage is more limited than RAM, so I do not know what to use or how to do it.
It looks like you are just concatenating arrays of data together? Why not append the ArrayBuffers into one giant blob on disk instead? You'd have to iterate and append each ArrayBuffer one at a time, seeking to the end of the FileWriter before each append. And for reading only portions of your giant blob back, you take a slice of the blob, to avoid crashing the browser.
Appending Function
function appendToFile(fPath,data,callback){
fs.root.getFile(fPath, {
create: false
}, function(fileEntry) {
fileEntry.createWriter(function(writer) {
writer.onwriteend = function(e) {
callback();
};
writer.seek(writer.length);
var blob = new Blob([data]);
writer.write(blob);
}, errorHandler);
}, errorHandler);
}
Again to avoid reading the entire blob back, only read portions/chunks of your giant blob when generating the file you mention.
Partial Read Function
function getPartialBlobFromFile(fPath,start,stop,callback){
fs.root.getFile(fPath, {
create: false
}, function(fileEntry){
fileEntry.file(function(file){
var reader = new FileReader();
reader.onloadend = function(evt){
if(evt.target.readyState == FileReader.DONE){
callback(evt.target.result);
}
};
stop = Math.min(stop,file.size);
reader.readAsArrayBuffer(file.slice(start,stop));
}, errorHandler)
}, errorHandler);
}
You may have to keep indexes, perhaps in a header section of your giant BLOB - I would need to know more before I could give more precise feedback.
Update - avoiding quota limits: Temporary vs. Persistent
In response to your comments below:
It appears that you are running into storage quota issues because you are using temporary storage. The following snippet is borrowed from Google (found here):
Temporary storage is shared among all web apps running in the browser. The shared pool can be up to half of the available disk space. Storage already used by apps is included in the calculation of the shared pool; that is to say, the calculation is based on (available storage space + storage being used by apps) * .5.
Each app can have up to 20% of the shared pool. As an example, if the total available disk space is 50 GB, the shared pool is 25 GB, and the app can have up to 5 GB. This is calculated from 20% (up to 5 GB) of half (up to 25 GB) of the available disk space (50 GB).
To avoid this limit you'll have to switch to persistent storage, which allows a quota up to the available free space on the disk. To do this, use the following to initialize the filesystem instead of the temporary storage request:
// Request 5 MB; the success callback receives the granted quota in bytes.
navigator.webkitPersistentStorage.requestQuota(1024 * 1024 * 5,
  function(gB) {
    window.requestFileSystem(PERSISTENT, gB, onInitFs, errorHandler);
  }, function(e) {
    console.log('Error', e);
  });
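For completeness, a hedged sketch of the two helpers the snippets above reference but never define:
// Hypothetical definitions of the helpers referenced above.
var fs; // FileSystem handle used by appendToFile/getPartialBlobFromFile
function onInitFs(fileSystem) {
  fs = fileSystem; // keep a reference for the append/read helpers
}
function errorHandler(e) {
  console.error('FileSystem error:', e);
}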