Extracting gzip data in Javascript with Pako - encoding issues - javascript

I am trying to run what I expect is a very common use case:
I need to download a gzip file (of complex JSON datasets) from Amazon S3, and decompress(gunzip) it in Javascript. I have everything working correctly except the final 'inflate' step.
I am using Amazon Gateway, and have confirmed that the Gateway is properly transferring the compressed file (used Curl and 7-zip to verify the resulting data is coming out of the API). Unfortunately, when I try to inflate the data in Javascript with Pako, I am getting errors.
Here is my code (note: response.data is the binary data transferred from AWS):
apigClient.dataGet(params, {}, {})
.then( (response) => {
console.log(response); //shows response including header and data
const result = pako.inflate(new Uint8Array(response.data), { to: 'string' });
// ERROR HERE: 'buffer error'
}).catch ( (itemGetError) => {
console.log(itemGetError);
});
I also tried splitting the binary input into an array of character codes by adding the following before the inflate call:
const charData = response.data.split('').map(function(x){return x.charCodeAt(0); });
const binData = new Uint8Array(charData);
const result = pako.inflate(binData, { to: 'string' });
//ERROR: incorrect header check
I suspect I have some sort of issue with the encoding of the data and I am not getting it into the proper format for Uint8Array to be meaningful.
Can anyone point me in the right direction to get this working?
For clarity:
As the code above is listed, I get a buffer error. If I drop the Uint8Array and just try to process 'response.data', I get the error 'incorrect header check', which is what makes me suspect that the encoding/format of my data is the issue.
The original file was compressed in Java using GZIPOutputStream with UTF-8 and then stored as a static file (i.e. randomname.gz).
The file is transferred through the AWS Gateway as binary, so it is exactly the same coming out as the original file, so 'curl --output filename.gz {URLtoS3Gateway}' === downloaded file from S3.
I had the same basic issue when I used the gateway to encode the binary data as 'base64', but did not try a whole lot around that effort, as it seems easier to work with the "real" binary data than to add the base64 encode/decode in the middle. If that is a needed step, I can add it back in.
I have also tried some of the example processing found halfway through this issue: https://github.com/nodeca/pako/issues/15, but that didn't help (I might be misunderstanding the binary format v. array v base64).

I was able to figure out my own problem. It was related to the format of the data being read in by Javascript (either Javascript itself or the Angular HttpClient implementation). I was reading in a "binary" format, but it was not the same as that recognized/used by pako. When I read the data in as base64, and then converted to binary with 'atob', I was able to get it working. Here is what I actually have implemented (starting at fetching from the S3 file storage).
1) Build AWS API Gateway that will read a previously stored *.gz file from S3.
Create a standard "get" API request to S3 that supports binary.
(http://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-payload-encodings-configure-with-console.html)
Make sure the Gateway will recognize the input type by setting 'Binary types' (application/gzip worked for me, but others like application/binary-octet and image/png should work for other types of files besides *.gz). NOTE: that setting is under the main API selections list on the left of the API config screen.
Set the 'Content Handling' to "Convert to text(if needed)" by selecting the API Method/{GET} -> Integration Request Box and updating the 'Content Handling' item. (NOTE: the example in the link above recommends "passthrough". DON'T use that as it will pass the unreadable binary format.) This is the step that actually converts from binary to base64.
At this point you should be able to download a base64 version of your binary file via the URL (test in browser or with Curl).
2) I then had the API Gateway generate the SDK and used the respective apiGClient.{get} call.
3) Within the call, translate the base64->binary->Uint8 and then decompress/inflate it. My code for that:
apigClient.myDataGet(params, {}, {})
.then( (response) => {
// HttpClient result is in response.data
// convert the incoming base64 -> binary
const strData = atob(response.data);
// split it into an array rather than a "string"
const charData = strData.split('').map(function(x){return x.charCodeAt(0); });
// convert to binary
const binData = new Uint8Array(charData);
// inflate
const result = pako.inflate(binData, { to: 'string' });
console.log(result);
}).catch ( (itemGetError) => {
console.log(itemGetError);
});
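The base64-to-bytes step above can also be written as a small helper, together with a quick check that the decoded bytes really start like a gzip file. This is only a sketch: base64ToBytes and looksLikeGzip are hypothetical helper names, not part of pako or the generated SDK. If the check fails, the data was probably not decoded correctly, which is a likely cause of pako's 'incorrect header check'.

```javascript
// Decode a base64 string into raw bytes (atob exists in browsers and in Node 16+).
function base64ToBytes(b64) {
  const binary = atob(b64); // one character per byte, char codes 0-255
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
}

// A gzip stream always begins with the two magic bytes 0x1f 0x8b.
function looksLikeGzip(bytes) {
  return bytes.length >= 2 && bytes[0] === 0x1f && bytes[1] === 0x8b;
}

// "H4sI" is the base64 form of 1f 8b 08, the usual start of a gzip file.
const bytes = base64ToBytes('H4sI');
console.log(looksLikeGzip(bytes)); // true
// With real data, the next step would be: pako.inflate(bytes, { to: 'string' })
```

Checking the magic bytes before inflating makes it easier to tell a transport/encoding problem apart from a genuinely corrupt archive.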

Related

Swift 5 - How to convert LONGBLOB/Buffer into Data

I am currently working on a project for school.
I have written an API using Express connected to a MySQL database, and now I am writing the iOS app.
My problem is that I need to save profile pictures. So I saved the PNG data of the picture in a LONGBLOB column in the database, and I want to recreate the image as a UIImage.
To do that I am trying to convert the buffer into Data.
The API returns a buffer created this way:
let buffer = Buffer.from(ppData.data, 'binary').toString('base64');
And on the iOS side I tried:
guard let data = dict["data"] as? Data else {return nil}
Where dict["data"] is the buffer returned by the API.
But it always ends up in the "else" branch.
What am I doing wrong?
Edit:
As suggested in the comments, I decoded the Base64-encoded string. The data now decodes, but creating a UIImage from it fails without any details. What I tried is:
let image = UIImage(from: base64DecodedData)
For example:
guard let strData = dict["data"] as? String else {
return nil
}
guard let data = Data(base64Encoded: strData, options: .ignoreUnknownCharacters) else {
return nil
}
guard let picture = UIImage(data: data) else {
return nil
}
Thanks.
The mistake was not in the Swift code but in my API and database structure. After reading some MySQL and Node.js documentation, I switched from LONGBLOB (which is totally oversized) to MEDIUMTEXT.
Also, in the API I was creating a buffer as if the data were binary when it was actually a base64-encoded string, so I removed this line:
let buffer = Buffer.from(ppData.data, 'binary').toString('base64');
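To illustrate why re-encoding an already base64-encoded string breaks the client-side decode, here is a minimal Node.js sketch. The eight bytes are just the standard PNG file signature standing in for a real picture, and the variable names are made up for the example.

```javascript
// First eight bytes of any PNG file, standing in for the stored picture.
const pngBytes = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Encode exactly once before storing or sending.
const encodedOnce = pngBytes.toString('base64');

// Encoding the already-encoded text again is the mistake to avoid.
const encodedTwice = Buffer.from(encodedOnce, 'binary').toString('base64');

// A single decode recovers the bytes only from the single-encoded value.
const fromOnce = Buffer.from(encodedOnce, 'base64');
const fromTwice = Buffer.from(encodedTwice, 'base64');

console.log(fromOnce.equals(pngBytes));  // true
console.log(fromTwice.equals(pngBytes)); // false
```

In the double-encoded case a single decode on the client yields the base64 text rather than image bytes, which would explain an image initializer failing without further detail.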

How to deserialize dumped BSON with arbitrarily many documents in JavaScript?

I have a BSON file that comes from a mongodump of a database. Let's assume the database is todo and the collection is items. Now I want to load the data offline into my RN app. Since the collection may contain arbitrarily many documents (let's say 2 currently), I want a method that parses the file however many documents it contains.
I have tried the following methods:
Use external bsondump executable.
We can convert the file to JSON using an external command
bsondump --outFile items.json items.bson
But I am developing a mobile app, so invoking a third-party executable via a shell command is not ideal. Plus, the output consists of several lines, each a one-line JSON object, so it is technically not a valid JSON file, and parsing it afterwards is not graceful.
Use deserialize in js-bson library
According to the js-bson documentation, we can do
const bson = require('bson')
const fs = require('fs')
bson.deserialize(fs.readFileSync(PATH_HERE))
But this raises an error
Error: buffer length 173 must === bson size 94
and by adding this option,
bson.deserialize(fs.readFileSync(PATH_HERE), {
allowObjectSmallerThanBufferSize: true
})
the error is resolved but only returns the first document. Because the documentation doesn't mention that this function can only parse 1-document collection, I wonder if there is some option that enables multiple document reading.
Use deserializeStream in js-bson
let docs = []
bson.deserializeStream(fs.readFileSync(PATH_HERE), 0, 2, docs, 0)
But this method requires the document count as a parameter (2 here).
Use bson-stream library
I am actually using react-native-fetch-blob instead of fs, and according to their documentation, the stream object does not have a pipe method, which is the one-and-only method demonstrated in bson-stream doc. So although this method does not require the number of documents, I am confused how to use it.
// fs
const BSONStream = require('bson-stream');
fs.createReadStream(PATH_HERE).pipe(new BSONStream()).on('data', callback);
// RNFetchBlob
const RNFetchBlob = require('react-native-fetch-blob');
RNFetchBlob.fs.readStream(PATH_HERE, ENCODING)
.then(stream => {
stream.open();
stream.can_we_pipe_here(new BSONStream())
stream.onData(callback)
});
Also, I'm not sure what ENCODING should be above.
I have read the source code of js-bson and have figured out a way to solve the problem. I think it's better to keep a detailed record here:
Approach 1
Split documents by ourselves, and feed the documents to parser one-by-one.
BSON internal format
Let's say the .json dump of our todo/items.bson is
{_id: "someid#1", content: "Launch a manned rocket to the sun"}
{_id: "someid#2", content: "Wash my underwear"}
Which clearly violates the JSON syntax because there isn't an outer object wrapping things together.
The internal BSON is of similar shape, but it seems BSON allows this kind of multi-object stuffing in one file.
Then, for each document, the four leading bytes give the length of the document as a little-endian 32-bit integer, counting the prefix itself and the suffix. The suffix is simply a 0 byte.
The final BSON file resembles
LLLLDDDDDDD0LLLLDDD0LLLLDDDDDDDDDDDDDDDDDDDDDD0...
where L is length, D is binary data, 0 is literally 0.
The algorithm
Therefore, we can develop a simple algorithm to get the document length, do the bson.deserialize with allowObjectSmallerThanBufferSize which will get a first document from buffer start, then slice off this document and repeat.
About encoding
One extra thing I ran into is encoding in the React Native context. The libraries that deal with React Native persistent storage all seem to lack support for reading the raw buffer from a file. The closest option is base64, which is a string representation of any binary file. We then use Buffer to convert the base64 string to a buffer and feed it into the algorithm above.
The code
deserialize.js
const BSON = require('bson');
function _getNextObjectSize(buffer) {
// BSON stores the document size as a little-endian int32 in the first four bytes
return buffer[0] | (buffer[1] << 8) | (buffer[2] << 16) | (buffer[3] << 24);
}
function deserialize(buffer, options) {
let _buffer = buffer;
let _result = [];
while (_buffer.length > 0) {
let nextSize = _getNextObjectSize(_buffer);
if (_buffer.length < nextSize) {
throw new Error("Corrupted BSON file: the last object is incomplete.");
}
else if (_buffer[nextSize - 1] !== 0) {
throw new Error(`Corrupted BSON file: the ${_result.length + 1}-th object does not end with 0.`);
}
let obj = BSON.deserialize(_buffer, {
...options,
allowObjectSmallerThanBufferSize: true,
promoteBuffers: true // Since BSON support raw buffer as data type, this config allows
// these buffers as is, which is valid in JS object but not in JSON
});
_result.push(obj);
_buffer = _buffer.slice(nextSize);
}
return _result;
}
module.exports = deserialize;
App.js
import RNFetchBlob from 'rn-fetch-blob';
const deserialize = require('./deserialize.js');
const Buffer = require('buffer/').Buffer;
RNFetchBlob.fs.readFile('...', 'base64')
.then(b64Data => Buffer.from(b64Data, 'base64'))
.then(bufferData => deserialize(bufferData))
.then(jsData => {/* Do anything here */})
Approach 2
The above method reads the files as a whole. Sometimes when we have a very large .bson file, the app may crash. Of course one can change the readFile to readStream above and add various checks to determine if the current chunk contains an ending of a document. This can be troublesome, and we are actually re-writing the bson-stream library!
So instead, we can create a RNFetchBlob file stream, and another bson-stream parsing stream. This brings us back to the attempt #4 in the question.
After reading the source code, I found that the BSON parsing stream inherits from a Node.js Transform stream. Instead of piping, we can manually forward chunks and events from onData and onEnd to on('data') and on('end').
Since bson-stream does not support passing options to underlying bson library calls, one may want to tweak the library source code a little in their own projects.
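For reference, the chunk-boundary bookkeeping mentioned under Approach 2 could look roughly like the sketch below. BsonDocumentSplitter and fakeDoc are hypothetical names invented for this example; it relies on Node's global Buffer (in React Native the Buffer from the 'buffer/' package would be assumed instead), and each Buffer it returns could then be passed to BSON.deserialize.

```javascript
// Collects arbitrary chunks and hands back every complete BSON document seen so far.
class BsonDocumentSplitter {
  constructor() {
    this.pending = Buffer.alloc(0); // bytes that do not yet form a whole document
  }

  // Feed one chunk; returns an array of Buffers, one per complete document.
  push(chunk) {
    this.pending = Buffer.concat([this.pending, chunk]);
    const docs = [];
    // At least the 4-byte little-endian length prefix is needed to continue.
    while (this.pending.length >= 4) {
      const size = this.pending.readInt32LE(0);
      if (size < 5) throw new Error('Corrupted BSON: invalid document size ' + size);
      if (this.pending.length < size) break; // document continues in a later chunk
      docs.push(this.pending.subarray(0, size));
      this.pending = this.pending.subarray(size);
    }
    return docs;
  }
}

// Fake "document" for the demo: length prefix, filler payload, trailing 0 byte.
function fakeDoc(payloadLength) {
  const doc = Buffer.alloc(4 + payloadLength + 1, 0x41);
  doc.writeInt32LE(doc.length, 0);
  doc[doc.length - 1] = 0;
  return doc;
}

// Feed two documents in 7-byte chunks that ignore document boundaries.
const file = Buffer.concat([fakeDoc(3), fakeDoc(10)]);
const splitter = new BsonDocumentSplitter();
const found = [];
for (let i = 0; i < file.length; i += 7) {
  found.push(...splitter.push(file.subarray(i, i + 7)));
}
console.log(found.map((d) => d.length)); // [ 8, 15 ]
```

When the chunks arrive as base64 strings, each one has to be decoded to bytes before being pushed, so the stream's buffer size would need to be chosen so that every chunk decodes on its own (an assumption to verify against the file library in use).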

NodeJS base64 image encoding not quite working

I am using an API to get a user's profile photo from the O365 cloud. According to the docs, the response contains
"The binary data of the requested photo."
I would like to use this image to be displayed by Data URI format. Ex:-
"
ANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4
//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU
5ErkJggg=="
where everything after data:image/png;base64,.... is image data in Base64.
I am unable to get Base64 encoding working for the image data I am getting from the API. I am not sure whether I am consuming the raw binary image data correctly and converting it correctly to Base64.
To verify:
I uploaded my data URI to an online editor and it never parses my image. However, if I upload an image to another tool to get its Base64 data and then paste that URI into the first editor, it displays correctly. So I am guessing my Base64 conversion isn't correct.
CODE in nodejs:-
let base64ImgTry1 = Buffer.from('binary-data-from-api').toString('base64')
//OR
var base64ImgTry2 = new Buffer('binary-data-from-api','binary').toString('base64');
let imgURI_1 = 'data:image/png;base64,' +base64ImgTry1
let imgURI_2 = 'data:image/png;base64,' +base64ImgTry2
Neither imgURI_1 nor imgURI_2 works. I am not sure if I am going wrong in consuming binary-data-from-api.
I also tried this NPM library https://www.npmjs.com/package/image-data-uri
in which I used the method encode(data, mediaType), where data was the direct response from the API.
https://www.site24x7.com/tools/datauri-to-image.html

Can I fetch a Readable Stream then convert to JSON client side?

I'm hoping to use a Google Sheets CSV as a data source. I'd like to fetch the CSV data on the client, then convert into JSON.
I'm able to fetch the CSV, which returns a ReadableStream. However, I'm not sure how to correctly read that response and convert into JSON.
I've found an npm package which should help convert the CSV data to JSON, but I'm having a bit of a hard time working with the stream.
Example: https://jsfiddle.net/21jeq1h5/3/
Can anyone point me in the right direction to use the ReadableStream?
Since CSV is simply text, the solution is to use the response.text() method of the fetch() API.
https://developer.mozilla.org/en-US/docs/Web/API/Body/text
Once the text is onboard, it is as simple as parsing the CSV out of the file. If you want objects as an output it is imperative the headers are included in the CSV (which yours are).
I've included the code snippet below. It won't run on SO because SO sets the origin to null on AJAX requests. So I've also included a link to a working codepen solution.
fetch('https://docs.google.com/spreadsheets/d/e/KEY&single=true&output=csv')
.then(response => response.text())
.then(transform);
function transform(str) {
let data = str.split('\n').map(i=>i.split(','));
let headers = data.shift();
let output = data.map(d => { const obj = {}; headers.forEach((h, i) => { obj[h] = d[i]; }); return obj; });
console.log(output);
}
Pen
https://codepen.io/randycasburn/pen/xjzzvW?editors=0012
Edit
I should add that if you truly want this in a JSON string (per your question), you can run
const json = JSON.stringify(output);
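The split-based parsing above can be made slightly more tolerant of real-world exports. The following is only a sketch under stated assumptions: csvToObjects is a hypothetical helper name, the sample text is made up, and, like the original, it does not handle quoted fields that contain commas or newlines.

```javascript
// Convert simple CSV text (no quoted fields) into an array of objects.
function csvToObjects(text) {
  const rows = text
    .split(/\r?\n/)                        // accept both LF and CRLF line endings
    .filter((line) => line.trim() !== '')  // drop blank lines such as a trailing newline
    .map((line) => line.split(','));
  const headers = rows.shift() || [];
  // Pair each header with the cell in the same column.
  return rows.map((cells) =>
    Object.fromEntries(headers.map((h, i) => [h, cells[i]]))
  );
}

const sample = 'name,qty\r\napple,3\r\npear,5\r\n';
console.log(JSON.stringify(csvToObjects(sample)));
// [{"name":"apple","qty":"3"},{"name":"pear","qty":"5"}]
```

Splitting on /\r?\n/ avoids stray carriage returns in the last column if the export happens to use CRLF line endings.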

AWS S3 browser upload using HTTP POST gives invalid signature

I'm working on a website where the users should be able to upload video files to AWS. In order to avoid unnecessary traffic I would like the user to upload directly to AWS (and not through the API server). In order to not expose my secret key in the JavaScript I'm trying to generate a signature in the API. It does, however, tell me when I try to upload, that the signature does not match.
For signature generation I have been using http://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-UsingHTTPPOST.html
On the backend I'm running C#.
I generate the policy using
string policy = $@"{{""expiration"":""{expiration}"",""conditions"":[{{""bucket"":""dennisjakobsentestbucket""}},[""starts-with"",""$key"",""""],{{""acl"":""private""}},[""starts-with"",""$Content-Type"",""""],{{""x-amz-algorithm"":""AWS4-HMAC-SHA256""}}]}}";
which generates the following
{"expiration":"2016-11-27T13:59:32Z","conditions":[{"bucket":"dennisjakobsentestbucket"},["starts-with","$key",""],{"acl":"private"},["starts-with","$Content-Type",""],{"x-amz-algorithm":"AWS4-HMAC-SHA256"}]}
based on http://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-HTTPPOSTConstructPolicy.html (I base64 encode the policy). I have tried to keep it very simple, just as a starting point.
For generating the signature, I use code found on the AWS site.
static byte[] HmacSHA256(String data, byte[] key)
{
String algorithm = "HmacSHA256";
KeyedHashAlgorithm kha = KeyedHashAlgorithm.Create(algorithm);
kha.Key = key;
return kha.ComputeHash(Encoding.UTF8.GetBytes(data));
}
static byte[] GetSignatureKey(String key, String dateStamp, String regionName, String serviceName)
{
byte[] kSecret = Encoding.UTF8.GetBytes(("AWS4" + key).ToCharArray());
byte[] kDate = HmacSHA256(dateStamp, kSecret);
byte[] kRegion = HmacSHA256(regionName, kDate);
byte[] kService = HmacSHA256(serviceName, kRegion);
byte[] kSigning = HmacSHA256("aws4_request", kService);
return kSigning;
}
Which I use like this:
byte[] signingKey = GetSignatureKey(appSettings["aws:SecretKey"], dateString, appSettings["aws:Region"], "s3");
byte[] signature = HmacSHA256(encodedPolicy, signingKey);
where dateString is in the format yyyyMMdd.
I POST information from JavaScript using
let xmlHttpRequest = new XMLHttpRequest();
let formData = new FormData();
formData.append("key", "<path-to-upload-location>");
formData.append("acl", signature.acl); // private
formData.append("Content-Type", "$Content-Type");
formData.append("AWSAccessKeyId", signature.accessKey);
formData.append("policy", signature.policy); //base64 of policy
formData.append("x-amz-credential", signature.credentials); // <accesskey>/20161126/eu-west-1/s3/aws4_request
formData.append("x-amz-date", signature.date);
formData.append("x-amz-algorithm", "AWS4-HMAC-SHA256");
formData.append("Signature", signature.signature);
formData.append("file", file);
xmlHttpRequest.open("post", "http://<bucketname>.s3-eu-west-1.amazonaws.com/");
xmlHttpRequest.send(formData);
I have been using UTF-8 everywhere, as prescribed by AWS. In their examples the signature is in hex format, which I have tried as well.
No matter what I try, I get a 403 error:
The request signature we calculated does not match the signature you provided. Check your key and signing method.
My policy on AWS has "s3:Get*", "s3:Put*"
Am I missing something or does it just work completely different than what I expect?
Edit: The answer below is one of the steps. The other is that AWS distinguishes between uppercase and lowercase hex strings. 0xFF != 0xff in the eyes of AWS. They want the signature in all lowercase.
You are generating the signature using Signature Version 4, but you are constructing the form as though you were using Signature Version 2... well, sort of.
formData.append("AWSAccessKeyId", signature.accessKey);
That's V2. It shouldn't be here at all.
formData.append("x-amz-credential", signature.credentials); // <accesskey>/20161126/eu-west-1/s3/aws4_request
This is V4. Note the redundant submission of the AWS Access Key ID here and above. This one is probably correct, although the examples have capitalization like X-Amz-Credential.
formData.append("x-amz-algorithm", "AWS4-HMAC-SHA256");
That is also correct, except it may need to be X-Amz-Algorithm. (The example seems to imply that capitalization is ignored).
formData.append("Signature", signature.signature);
This one is incorrect. This should be X-Amz-Signature. V4 signatures are hex, so that is what you should have here. V2 signatures are base64.
There's a full V4 example here, which even provides you with an example aws key and secret, date, region, bucket name, etc., that you can use with your code to verify that you indeed get the same response. The form won't actually work but the important question is whether your code can generate the same form, policy, and signature.
For any given request, there is only ever exactly one correct signature; however, for any given policy, there may be more than one valid JSON encoding (due to JSON's flexibility with whitespace) -- but for any given JSON encoding there is only one possible valid base64-encoding of the policy. This means that your code, using the example data, is certified as working correctly if it generates exactly the same form and signature as shown in the example -- and it means that your code is proven invalid if it generates the same form and policy with a different signature -- but there is a third possibility: the test actually proves nothing conclusive about your code if your code generates a different base64 encoding of the policy, because that will necessarily change the signature to not match, yet might still be a valid policy.
Note that Signature V2 is only supported on older S3 regions, while Signature V4 is supported by all S3 regions, so even though you could alternatively fix this by making your entire signing process use V2, that wouldn't be recommended.
Note also that The request signature we calculated does not match the signature you provided. Check your key and signing method does not tell you anything about whether the bucket policy or any users policies allow or deny the request. This error is not a permissions error. It will be thrown prior to the permissions checks, based solely on the validity of the signature, not whether the AWS Access Key id is authorized to perform the requested operation, which is something that is only tested after the signature is validated.
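Putting the points above together (derive the V4 signing key, sign the base64-encoded policy, send the result as lowercase hex in X-Amz-Signature), a minimal Node.js sketch could look like this. It mirrors the key derivation from the question; the function names are hypothetical, and the secret key and policy are placeholders, not real credentials.

```javascript
const crypto = require('crypto');

// HMAC-SHA256 of UTF-8 text; returns a Buffer so results can be chained as keys.
function hmacSha256(key, data) {
  return crypto.createHmac('sha256', key).update(data, 'utf8').digest();
}

// Derive the Signature V4 signing key: date -> region -> service -> "aws4_request".
function getSigningKey(secretKey, dateStamp, region, service) {
  const kDate = hmacSha256('AWS4' + secretKey, dateStamp);
  const kRegion = hmacSha256(kDate, region);
  const kService = hmacSha256(kRegion, service);
  return hmacSha256(kService, 'aws4_request');
}

// Sign the base64-encoded policy and return lowercase hex for X-Amz-Signature.
function signPolicy(policyJson, secretKey, dateStamp, region) {
  const encodedPolicy = Buffer.from(policyJson, 'utf8').toString('base64');
  const signingKey = getSigningKey(secretKey, dateStamp, region, 's3');
  const signature = hmacSha256(signingKey, encodedPolicy).toString('hex');
  return { encodedPolicy, signature };
}

// Placeholder inputs for illustration only.
const { encodedPolicy, signature } = signPolicy(
  '{"expiration":"2016-11-27T13:59:32Z","conditions":[]}',
  'EXAMPLE-SECRET-KEY',
  '20161126',
  'eu-west-1'
);
console.log(encodedPolicy);
console.log(signature); // 64 lowercase hex characters
```

Node's 'hex' encoding already produces lowercase output, which matches the lowercase requirement noted in the question's edit.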
I suggest you create an access key pair with permission to POST only, and send an HTTP request like this:
require 'rest-client'
require 'base64'
require 'openssl'
require 'json'
require 'active_support/all' # provides 3.hours.from_now
class S3Uploader
def initialize
@options = {
aws_access_key_id: "ACCESS_KEY",
aws_secret_access_key: "ACCESS_SECRET",
bucket: "BUCKET",
acl: "private",
expiration: 3.hours.from_now.utc,
max_file_size: 524288000
}
end
def fields
{
:key => key,
:acl => @options[:acl],
:policy => policy,
:signature => signature,
"AWSAccessKeyId" => @options[:aws_access_key_id],
:success_action_status => "201"
}
end
def key
@key ||= "temp/${filename}"
end
def url
"http://#{@options[:bucket]}.s3.amazonaws.com/"
end
def policy
Base64.encode64(policy_data.to_json).delete("\n")
end
def policy_data
{
expiration: @options[:expiration],
conditions: [
["starts-with", "$key", ""],
["content-length-range", 0, @options[:max_file_size]],
{ bucket: @options[:bucket] },
{ acl: @options[:acl] },
{ success_action_status: "201" }
]
}
end
def signature
Base64.encode64(
OpenSSL::HMAC.digest(
OpenSSL::Digest.new("sha1"),
@options[:aws_secret_access_key], policy
)
).delete("\n")
end
end
uploader = S3Uploader.new
puts uploader.fields
puts uploader.url
begin
RestClient.post(uploader.url, uploader.fields.merge(file: File.new('51bb26652134e98eae931fbaa10dc3a1.jpeg'), :multipart => true))
rescue RestClient::ExceptionWithResponse => e
puts e.response
end
