I'm looking for a way to check if two files/documents (PDF, JPG, PNG) are the same.
If a user selects one or more files, I convert the File object to a plain JavaScript object. I keep the size, type and filename, and I create a blob so I can store the object in my Redux store.
When a user selects another file I want to compare this file with the files that have already been added (so I can set the same blob URL).
I can check whether two files have the same name, type and size, but there is a chance that all these properties match and the files still aren't the same, so I would like to check the file path. Unfortunately, that property isn't provided in the File object. Is there a way to get it, or another solution to make sure both files are (not) the same?
No, there is no way to get the real path, but that doesn't matter.
All you have access to is a fakepath, in the form C:\fakepath\yourfilename.ext (from input.value), and sometimes a bit more if you gained access to a directory.
But anyway, you probably don't want to check that two files came from the same place on the hard disk; that has no importance whatsoever, since they could very well have been modified since first access.
What you can do, and probably want to do, is check whether their contents are the same.
For this, you can compare their byte content:
inp1.onchange = inp2.onchange = e => {
const file1 = inp1.files[0];
const file2 = inp2.files[0];
if(!file1 || !file2) return;
compare(file1, file2)
.then(res => console.log('are same ? ', res));
};
function compare(file1, file2) {
// they don't have the same size, they are different
if(file1.size !== file2.size)
return Promise.resolve(false);
// load both as ArrayBuffers
return Promise.all([
readAsArrayBuffer(file1),
readAsArrayBuffer(file2)
]).then(([buf1, buf2]) => {
// create views over our ArrayBuffers
const arr1 = new Uint8Array(buf1);
const arr2 = new Uint8Array(buf2);
return !arr1.some((val, i) =>
arr2[i] !== val // search for diffs
);
});
}
function readAsArrayBuffer(file) {
// we could also have used a FileReader,
// but Response is conveniently already Promise based
return new Response(file).arrayBuffer();
}
<input type="file" id="inp1">
<input type="file" id="inp2">
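As a side note, in current browsers the readAsArrayBuffer helper can be simplified, since File inherits Blob.prototype.arrayBuffer() (a minimal alternative, assuming you don't need to support older browsers):
function readAsArrayBuffer(file) {
  // File extends Blob, and Blob exposes a Promise-based arrayBuffer() method
  return file.arrayBuffer();
}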
Now, you say that you don't have access to the original Files anymore and that you can only store serializable data. In this case, a less performant solution is to generate a hash from your Files.
This can be done on the front end, thanks to the SubtleCrypto API, but since this operation is quite slow for big files, you may want to generate the hash systematically on the server instead, and only compute it on the front end when the sizes are the same:
// a fake storage object like OP has
const store = [
{ /* a UTF-8 text file whose content is `hello world` */
name: "helloworld.txt",
size: 11,
hash: "b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9" // generated from server
}
];
// the same file as the one we fakely stored
const sameFile = new File(['hello world'], 'same-file.txt');
// a file the same size as the one we stored (needs deep check)
const sameSizeButDifferentContent = new File(['random text'], 'differentcontent.txt');
inp.onchange = e => tryToStore(inp.files[0]);
tryToStore(sameFile); // false
tryToStore(sameSizeButDifferentContent);
// hash: "a4e082f56a58e0855a6abbf2f4ebd08895ff85ea80e634e02b210def84b557dd"
function tryToStore(file) {
  checkShouldStore(file)
    .then(result => {
      console.log('should store', file.name, result)
      if(result) {
        store.push(result);
        // this is just for demo, in your case you would do it on the server
        if(!result.hash)
          generateHash(file).then(h => result.hash = h);
      }
    });
}
async function checkShouldStore(file) {
const {name, size} = file;
const toStore = {name, size, file}; // create a wrapper object
// first check against the sizes (fast checking)
const sameSizes = store.filter(obj => obj.size === file.size);
// only if some files have the same size
if(sameSizes.length) {
// then we generate a hash directly
const hash = await generateHash(file);
if(sameSizes.some(obj => obj.hash === hash)) {
return false; // is already in our store
}
toStore.hash = hash; // save the hash so we don't have to generate it on server
}
return toStore;
}
async function generateHash(file) {
// read as ArrayBuffer
const buf = await new Response(file).arrayBuffer();
// generate SHA-256 hash using crypto API
const hash_buf = await crypto.subtle.digest("SHA-256", buf);
// convert to Hex
const hash_arr = [...new Uint8Array(hash_buf)]
.map(v => v.toString(16).padStart(2, "0"));
return hash_arr.join('');
}
<input type="file" id="inp">
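The hash in the fake store above is marked as "generated from server"; for completeness, a minimal Node.js sketch of that server-side part, assuming the uploaded file's content is available as a Buffer (the helper name is made up):
// Node.js: produce the same SHA-256 hex digest as the SubtleCrypto code above
const crypto = require('crypto');
function sha256Hex(buffer) {
  return crypto.createHash('sha256').update(buffer).digest('hex');
}
// sha256Hex(Buffer.from('hello world'))
// -> "b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9"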
Related
Right now, I have to precompute the Floyd-Warshall cost and path matrices every time my server loads.
This is for a map which is N by N. We only have a couple of maps, so I think I should precompute these into variables before the server even starts up.
I have 4 variables.
Cost -> Matrix of values.
Path -> Matrix of tuples
TupleVal -> Tuple as a key mapped to a number (Map() object in JS)
IndexVal -> Number as a key mapped to a Tuple (Map() object in JS)
How can I compute these 4 variables ONCE and store them somewhere so that they are relatively easy to retrieve? Should this be done through JSON? If so, how can I write these specific data structures to a JSON file and read them back?
//This is the map I use. A Tuple is converted to a string which maps to number
class ArrayKeyedMap extends Map {
get(array) {
return super.get(this.toKey(array));
}
set(array, value) {
return super.set(this.toKey(array), value);
}
has(array) {
return super.has(this.toKey(array));
}
delete(array) {
return super.delete(this.toKey(array));
}
toKey(array) {
return JSON.stringify(array);
}
}
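(For illustration, a quick usage sketch of that map: keys with the same contents resolve to the same entry.)
const m = new ArrayKeyedMap();
m.set([1, 2], 42);
m.get([1, 2]); // 42, even though the two [1, 2] arrays are different objects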
...
// This is what I return. ForbiddenVals and tupleVal are ArrayKeyedMap objects,
// index is a Map mapping a number to a tuple (x, y),
// path and cost are 2-dimensional arrays of numbers.
return [path, cost, tupleVal, index, ForbiddenVals]
Above are basically the data structures I use. What is the easiest way to compute these values ONCE, so I never have to again unless I make a change to the maps?
Thank you
If you're running Node (e.g. with Express) you can save the JSON to a file with fs:
const fs = require('fs');
const path = require('path');
//resolve a relative path to an absolute one
const cacheDir = path.resolve('./json');
//the name of the json file, can be anything in any directory
const jsonFile = `${cacheDir}/n_x_n.map.json`;
let data;
//create the cache directory if it doesn't exist
if(!fs.existsSync(cacheDir)) {
  fs.mkdirSync(cacheDir, {recursive: true});
}
//if the JSON file does not exist, generate the json and save it to the disk
if(!fs.existsSync(jsonFile)) {
data = genData(); //this is where you generate the values once
fs.writeFile(jsonFile, JSON.stringify(data), (err) => {
if(err) {
console.error('Couldn\'t save JSON', err);
} else {
console.log('Saved JSON');
}
});
} else {
//otherwise load the JSON from the file
data = JSON.parse(fs.readFileSync(jsonFile));
}
//do whatever with the data
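One caveat given the data structures in the question: several of the values are Map objects, and JSON.stringify() turns a Map into "{}", so they need to be converted to something serializable (for example an array of [key, value] entries) before writing, and rebuilt after reading. A rough sketch, with made-up helper names:
//JSON.stringify(new Map()) gives "{}", so convert Maps to [key, value] arrays first
function mapToEntries(map) {
  return [...map.entries()];
}
//rebuild an ArrayKeyedMap; its keys are already JSON strings,
//so set them through Map.prototype.set to avoid stringifying them twice
function entriesToArrayKeyedMap(entries) {
  const m = new ArrayKeyedMap();
  entries.forEach(([k, v]) => Map.prototype.set.call(m, k, v));
  return m;
}
//genData() returns [path, cost, tupleVal, index, ForbiddenVals]
const [p, c, tupleVal, index, forbidden] = genData();
const serializable = {
  path: p,
  cost: c,
  tupleVal: mapToEntries(tupleVal),
  index: mapToEntries(index),
  forbidden: mapToEntries(forbidden)
};
fs.writeFileSync(jsonFile, JSON.stringify(serializable));
//...and after reading the file back:
const loaded = JSON.parse(fs.readFileSync(jsonFile));
const tupleVal2 = entriesToArrayKeyedMap(loaded.tupleVal);
const index2 = new Map(loaded.index); //index maps numbers to tuples, so a plain Map is fine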
If it is just a website, you could use localStorage:
//load the data from localStorage (it is stored as a JSON string)
let data = JSON.parse(localStorage.getItem('json'));
//if there is no data in localStorage, generate the data and save it
if(!data) {
  data = genData();
  localStorage.setItem('json', JSON.stringify(data));
}
//do whatever with the data
Goal: Objects will be pushed to a readable stream and then saved in a separate .csv depending on what channel (Email, Push, In-App) they come from.
Problem: I am unable to separate out the streams into different .pipe() "lines" so that each .csv log receives only its channel-specific event objects. In the current iteration, all of the .csv files created by the write streams are receiving the event objects from all channels.
Questions:
Can I dynamically create the multiple channel "pipe() lines" in the setup() function programmatically or is the current way I am approaching this correct?
Is this manual creation of the "pipe() lines" the reason all of the .csv's are being populated with events? Can this be solved with one "pipe() line" and dynamic routing?
A brief explanation of the code below:
setup() calls makeStreams(), which creates, per channel, an object holding a Readable and a Writable (a rotating-file-system write stream). (setup() is an unnecessary function right now but will hold more setup tasks later.)
pushStream() is called when an inbound event occurs and pushes an object like: {Email: {queryParam:1, queryParam:2, etc.}}. The event is sorted by its top-level key (in this case "Email") and then pushed to the matching readable stream, which in theory should be piped to the correct writable stream.
Unfortunately this isn't the case, it's sending the event object to all of the writable streams. How can I send it to only the correct stream?
CODE:
const Readable = require('stream').Readable
const Json2csvTransform = require('json2csv').Transform;
var rfs = require("rotating-file-stream");
const channelTypes = ['Push Notification', 'Email', 'In-app Message']
var streamArr = setup(channelTypes);
const opts = {};
const transformOpts = {
objectMode: true
};
const json2csv = new Json2csvTransform(opts, transformOpts);
function setup(list) {
console.log("Setting up streams...")
streamArr = makeStreams(list) //makes streams out of each endpoint
return streamArr
}
//Stream Builder for Logging Based Upon Channel Name
function makeStreams(listArray) {
listArray = ['Push Notification', 'Email', 'In-app Message']
var length = listArray.length
var streamObjs = {}
for (var name = 0; name < length; name++) {
var fileName = listArray[name] + '.csv'
const readStream = new Readable({
objectMode: true,
read() {}
})
const writeStream = rfs(fileName, {
size: "50M", // rotate every 50 MegaBytes written
interval: "1d" // rotate daily
//compress: "gzip" // compress rotated files
});
var objName = listArray[name]
var obj = {
instream: readStream,
outstream: writeStream
}
streamObjs[objName] = obj
}
return streamObjs
}
function pushStream(obj) {
var keys = Object.keys(obj)
if (streamArr[keys]) {
streamArr[keys].instream.push(obj[keys])
} else {
console.log("event without a matching channel error")
}
}
//Had to make each pipe line here manually. Can this be improved? Is it the reason all of the files are receiving all events?
streamArr['Email'].instream.pipe(json2csv).pipe(streamArr['Email'].outstream)
streamArr['In-app Message'].instream.pipe(json2csv).pipe(streamArr['In-app Message'].outstream)
streamArr['Push Notification'].instream.pipe(json2csv).pipe(streamArr['Push Notification'].outstream)
module.exports = {
makeStreams,
pushStream,
setup
}
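For what it's worth, an untested sketch of how the "pipe() lines" could be created programmatically: give each channel its own Json2csvTransform and wire it up inside the loop. With the single shared json2csv above, all three readables feed one transform whose output is then piped to all three writables, which is consistent with every file receiving every event; a per-channel transform avoids that, and the three manual pipe lines at the bottom would no longer be needed:
//sketch: one transform per channel, wired up while the streams are created
function makeStreams(listArray) {
  var streamObjs = {}
  listArray.forEach(channelName => {
    const readStream = new Readable({ objectMode: true, read() {} })
    const writeStream = rfs(channelName + '.csv', { size: "50M", interval: "1d" })
    //a dedicated json2csv transform per channel, so nothing is shared between pipelines
    const json2csv = new Json2csvTransform({}, { objectMode: true })
    readStream.pipe(json2csv).pipe(writeStream)
    streamObjs[channelName] = { instream: readStream, outstream: writeStream }
  })
  return streamObjs
}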
This question already has answers here: Get Download URL from file uploaded with Cloud Functions for Firebase.
I have a cloud function that generates a set of resized images for every image uploaded. This is triggered with the onFinalize() hook.
Cloud Function to resize an uploaded image:
export const onImageUpload = functions
.runWith({
timeoutSeconds: 120,
memory: '1GB'
})
.storage
.object()
.onFinalize(async object => {
const bucket = admin.storage().bucket(object.bucket)
const filePath = object.name
const fileName = filePath.split('/').pop()
const bucketDir = dirname(filePath)
const workingDir = join(tmpdir(), 'resizes')
const tmpFilePath = join(workingDir, fileName)
if (fileName.includes('resize#') || !object.contentType.includes('image')) {
return false
}
await fs.ensureDir(workingDir)
await bucket.file(filePath).download({
destination: tmpFilePath
})
const sizes = [
500,
1000
]
const uploadPromises = sizes.map(async size => {
const resizeName = `resize#${size}_${fileName}`
const resizePath = join(workingDir, resizeName)
await sharp(tmpFilePath)
.resize(size, null)
.toFile(resizePath)
return bucket.upload(resizePath, {
destination: join(bucketDir, resizeName)
})
})
// I need to now update my Firestore database with the public URL.
// ...but how do I get that here?
await Promise.all(uploadPromises)
return fs.remove(workingDir)
})
That's all well and good and it works, but I also need to somehow retrieve the public URL for each of these images, in order to write the values into my Firestore.
I can do this on the frontend using getDownloadURL(), but I'm not sure how to do it from within a Cloud Function from the newly generated images.
As I see it, this needs to happen on the backend anyway, as my frontend has no way of knowing when the images have been processed.
Only works on the client:
const storageRef = firebase.storage().ref()
const url = await storageRef.child(`images/${image.name}`).getDownloadURL()
Any ideas?
Answer (with caveats):
This question was technically answered correctly by @sergio below, but I just wanted to point out some additional things that need doing before it can work.
It appears that the 'expires' parameter of getSignedUrl() has to be a number according to TypeScript, so to make it work I had to pass a future date represented as epoch milliseconds, like 3589660800000.
I needed to pass credentials to admin.initializeApp() in order to use this method. You need to generate a service account key in your Firebase admin. See here: https://firebase.google.com/docs/admin/setup?authuser=1
Hope this helps someone else out too.
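For reference, the credential setup mentioned in the second point looks roughly like this (a sketch; the service account file name and bucket are placeholders for whatever you downloaded/configured in the Firebase console):
const admin = require('firebase-admin');
// serviceAccountKey.json: the key file generated in the Firebase console (name is illustrative)
const serviceAccount = require('./serviceAccountKey.json');
admin.initializeApp({
  credential: admin.credential.cert(serviceAccount),
  // optional, lets admin.storage() know which bucket to use
  storageBucket: '<your-project-id>.appspot.com'
});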
I believe the promise returned from bucket.upload() resolves with an array containing a reference to the File, which you can then use to obtain a signed URL.
Something like (not tested):
const data = await bucket.upload(resizePath, { destination: join(bucketDir, resizeName) });
const file = data[0];
const signedUrlData = await file.getSignedUrl({ action: 'read', expires: '03-17-2025'});
const url = signedUrlData[0];
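Applied to the function in the question, the upload step could then become something like this (untested sketch; 'images' is a hypothetical Firestore collection, and expires is given as epoch milliseconds per the caveats above):
const uploadPromises = sizes.map(async size => {
  const resizeName = `resize#${size}_${fileName}`
  const resizePath = join(workingDir, resizeName)
  await sharp(tmpFilePath).resize(size, null).toFile(resizePath)
  // bucket.upload resolves with [File]
  const [file] = await bucket.upload(resizePath, {
    destination: join(bucketDir, resizeName)
  })
  const [url] = await file.getSignedUrl({ action: 'read', expires: 3589660800000 })
  // write the URL to Firestore; 'images' is a made-up collection name
  return admin.firestore().collection('images').add({ name: resizeName, size, url })
})
await Promise.all(uploadPromises)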
I'm building a tool that will clean up a JSON file containing localization strings if they are no longer in use in the source code.
First, I parse the localization file into an array of all the ids that are (or are no longer) used in the source code to get the string value in the right language.
so I have an array looking something like this:
const ids = ['home.title', 'home.description', 'menu.contact', 'menu.social'];
etc. you get the point.
I'm using Node.js's promisified fs readFile and glob to search .js source code files like this:
const jsFiles = await globbing('./**/*.js', {cwd: directory, ignore: './**/*test.js'});
const results = jsFiles.map(async file => {
const filePath = path.join(directory, file);
return readFile(filePath, 'utf8').then((data) => {
// handle match here
}).catch(console.log);
});
I also have Ramda available for fancy list/collection functions, but no other libraries.
So, I will be able to loop through the ids array and, for each item, scan the entire source code for a match with the function above. But that seems a bit overkill: scanning the entire source code ids.length times. The ids array has around 400 ids and the source code is hundreds of large files.
To avoid O(M*N), is there a way to match the entire array with the entire source code, and discard the not matched array items? Or what would be the best practice here?
current solution:
const cleanLocal = async () => {
const localIdList = Object.keys(await getLocalMap());
const matches = [];
localIdList.map(async id => {
const directory = path.join(__dirname, '..');
const jsFiles = await globbing('./**/*.js', {cwd: directory, ignore: './**/*test.js'});
jsFiles.map(async file => {
const filePath = path.join(directory, file);
return readFile(filePath, 'utf8').then((data) => {
if (data.indexOf(id) >= 0) {
console.log(id);
matches.push(id);
}
}).catch(console.log);
});
});
};
You can't avoid the O(M*N) complexity in this case.
However, to improve performance you can switch the order of your operations: first loop over the files and then loop over the array. This is because looping over the files is a costly IO operation, while looping over the array is a fast memory operation.
In your code, you have M memory operations and M*N IO (filesystem) operations.
If you first loop over the files, you would have N IO operations and M*N memory operations.
As it is not possible to avoid O(M*N) in this case, I have only been able to optimize this search function by looping through the source files once and then over the ids for each file, as proposed by @mihai.
The end result looks like this:
const cleanLocal = async () => {
const localIdList = Object.keys(await getLocalMap()); // ids' array
const matches = [];
const directory = path.join(__dirname, '..');
const jsFiles = await globbing('./**/*.js', {cwd: directory, ignore: './**/*test.js'}); // list of files to scan
const results = jsFiles.map(async file => {
const filePath = path.join(directory, file);
return readFile(filePath, 'utf8').then((data) => {
localIdList.map(id => {
if (R.contains(id, data)) { // R = ramda.js
matches.push(id);
}
});
}).catch(console.log);
});
await Promise.all(results);
console.log('matches: ' + R.uniq(matches).length);
console.log('in local.json: ' + localIdList.length);
};
Please let me know if there are any other ways to optimize this.
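One further tweak that stays O(M*N) in the worst case but skips work in practice (a sketch reusing the same variables and helpers): keep the ids in a Set and delete each one as soon as it matches, so later files are only scanned for ids that are still unmatched.
const remaining = new Set(localIdList);
const scans = jsFiles.map(async file => {
  const data = await readFile(path.join(directory, file), 'utf8');
  for (const id of [...remaining]) {
    if (data.includes(id)) {
      matches.push(id);
      remaining.delete(id); // never look for this id again
    }
  }
});
await Promise.all(scans);
// whatever is left in `remaining` was never found, i.e. can be removed from local.json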
I have a function that writes a file to a directory:
response.pipe(fs.createWriteStream(fullPath))
But before that I want to check if the path already exist, and if so, add a suffix, e.g. file_1.txt (and if that exist, create file_2.txt instead...etc):
// Check if the path already exist
let fullPath = "C:/test/file.txt"
let dir = "C:/test/"
let fileName = "file"
let fileExt = ".txt"
if (fs.existsSync(fullPath)) {
// I tried using a while loop but I ended up making it too complicated
...
}
// Write file to disk
response.pipe(fs.createWriteStream(destinationPath))
Question
How do I properly / efficiently do that?
The while loop is the correct way.
// Check if the path already exist
let fullPath = "C:/test/file.txt"
let dir = "C:/test/"
let fileName = "file"
let fileExt = ".txt"
let num = 0;
while (fs.existsSync(fullPath)) {
fullPath = `${dir}${fileName}_${num++}${fileExt}`;
}
After this, fullPath contains the first nonexistent file.
Note that there's a potential race condition. Some other process could create the file after your loop finishes.
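If the race matters, one way to close it (a sketch, reusing dir/fileName/fileExt from above) is to let the filesystem do the existence check atomically: open the write stream with the 'wx' flag, which fails with EEXIST when the file already exists, and retry with the next suffix:
function openUniqueStream(dir, fileName, fileExt, num = 0) {
  const candidate = num === 0
    ? `${dir}${fileName}${fileExt}`
    : `${dir}${fileName}_${num}${fileExt}`;
  // 'wx': open for writing, but fail if the path already exists
  const stream = fs.createWriteStream(candidate, { flags: 'wx' });
  return new Promise((resolve, reject) => {
    stream.on('open', () => resolve(stream));
    stream.on('error', err =>
      err.code === 'EEXIST'
        ? resolve(openUniqueStream(dir, fileName, fileExt, num + 1))
        : reject(err));
  });
}
// usage: openUniqueStream(dir, fileName, fileExt).then(ws => response.pipe(ws));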
If all of your files are named the same thing + "_#" + ".txt", I think one of the most efficient ways to check would be something along these lines:
Get all files from the directory
var fileList = [];
fs.readdir(dir, (err, files) => {
  files.forEach(file => {
    fileList.push(file);
  });
});
You would then sort the array (which could be expensive with a lot of files); the last record would then have the highest number, which you can easily extract (note that a plain lexicographic sort puts "file_10" before "file_2", so compare numerically if the numbers can go past 9).
Another thing you could do is find the file with the latest modification time, using a similar approach with the Stats class from fs.
Get all files from the directory and sort them:
var files = fs.readdirSync(dir)
  .map(function(v) {
    return { name: v };
  })
  .sort(function(a, b) { return a.name.localeCompare(b.name); })
  .map(function(v) { return v.name; });
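Extracting the highest existing suffix from that list could then look like this (a sketch, reusing fileName/fileExt from the question and assuming names of the form file_<n>.txt; it compares numerically, so it doesn't rely on the sort order):
// pull the numeric suffix out of names like "file_12.txt"
const highest = files.reduce((max, name) => {
  const match = name.match(/_(\d+)\.txt$/);
  return match ? Math.max(max, Number(match[1])) : max;
}, 0);
const nextName = `${fileName}_${highest + 1}${fileExt}`; // e.g. "file_13.txt"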
And sorted by modification time:
var files = fs.readdirSync(dir)
.map(function(v) {
return { name:v,
time:fs.statSync(dir + v).mtime.getTime()
};
})
.sort(function(a, b) { return a.time - b.time; })
.map(function(v) { return v.name; });