I am writing a script that, at its core, parses a .csv file for certain columns, stores them in an array, and then writes the contents to another .csv file. I am able to parse the file using fast-csv and have confirmed in the terminal that my array is in the correct format. However, when I attempt to write this array to a .csv file using fast-csv, the contents never appear in the file and no errors are thrown. I have validated that the array is being passed all the way through to the callback. In addition, I have gone so far as to replace that variable in the writeToPath function with a simple array, and still no luck. Any assistance would be appreciated.
const processFile = (fileName, file, cb) => {
    let writeData = []
    let tempArray = []
    csv.fromPath(basePath + file, {ignoreEmpty: false, headers: false})
        .on("data", function(data){
            if (data[0] != ''){
                [startDate, endDate] = fileName
                tempArray[0] = data[0]
                tempArray[1] = data[1]
                tempArray[2] = data[2]
                tempArray[3] = data[3]
                tempArray[4] = data[4]
                tempArray[5] = data[8]
                tempArray[6] = ""
                tempArray[7] = ""
                tempArray[8] = ""
                tempArray[9] = startDate
                tempArray[10] = endDate
                writeData[i] = tempArray
                writeData.shift()
                tempArray = []
                i++
            }
        })
        .on("end", () => {
            console.log('end')
        })
        .on('finish', (() => {
            cb(writeData)
        }));
}
processFile(fileName, file, (csvData) => {
    console.log(csvData)
    csv.writeToPath('./working-files/top.csv', {headers: false}, csvData).on("finish", () => {
        console.log('done')
    })
})
Unfortunately, without any context about the dataset you are using, there is only so much I can suggest. The variables needed to debug this properly would be: the file, the file names used, and whatever 'i' is. If you can update the question with these, I'll be happy to take another look.
I would suggest going back and logging the variables after each step that modifies them; hopefully you'll then get a better picture of what is going wrong.
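As a rough illustration of that kind of logging (a sketch only, reusing the names from the question), it is also worth attaching an error handler to the write stream so a silent failure can't hide, and double-checking the argument order against your fast-csv version, since the docs describe the signature as writeToPath(path, rows, options):

processFile(fileName, file, (csvData) => {
    // confirm the callback actually receives rows before trying to write them
    console.log('rows to write:', csvData.length)

    // documented order is (path, rows, options); check which one your version expects
    csv.writeToPath('./working-files/top.csv', csvData, {headers: false})
        .on('error', (err) => console.error('write failed:', err))
        .on('finish', () => console.log('done'))
})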
I understand this isn't a complete answer and it will probably get removed but I don't have the 50 needed reputation to make a comment.
Related
I'm uploading some Excel files to the server to do something with the data. I'm using xlsx to read and Tabulator to present the uploaded data in the frontend.
This is the loop in which I read the data and push it into an array:
for (const k of raw) {
    let columns: Column[] = []
    // new Array each loop
    let tmpData: any[] = []
    // read data from Excel files
    const wb = read(await k.file.arrayBuffer(), {
        sheets: k.type.sheet
    })
    tmpData = utils.sheet_to_json(wb.Sheets[k.type.sheet], {
        range: k.type.row
    })
    // do something with the data of the files
    tmpData.forEach((item, index) => {
        if (Object.entries(item).length < 2) {
            data.splice(index, 1);
        }
    });
    tmpData.forEach((item) => {
        Object.keys(item).forEach((key) => {
            if (key.includes('EMPTY')) {
                delete item[key]
            }
        })
    })
    // separate the first row to extract columns
    Object.keys(tmpData[0]).forEach((key) => {
        columns.push(new Column(key, key, 'input', true, 300))
    })
    // create columns Array containing the headers of all files
    columnsObject.push(new ColumnsType(columns, k.type.name))
    // push the data into an Array
    data.push(new TablesType(tmpData, k.type.name))
    // troubleshooting
    data.forEach(item => {
        console.log(item.vendor)
    })
    console.log('\n ----- \n')
}
This is an example of the troubleshooting output where the problem occurs:
Ciena
-----
Ciena
PaloAlto
-----
PaloAlto
Infinera
-----
PaloAlto
Infinera
Arista
-----
In the third loop the Ciena object is missing. I already tried it with a traditional for loop, but the same issue occurred with certain combinations of uploaded files.
The data is read just fine for all files; the array is simply throwing entries away.
//do something with the data of the files
tmpData.forEach((item, index) => {
    if (Object.entries(item).length < 2) {
        data.splice(index, 1);
    }
});
Why even write the comment if it is as vague as this? What is that block actually supposed to do?
This is, by the way, also the place where your mistake most likely lies. You're splicing your data array (meaning you're deleting items from it), which would explain how your data array loses some of its elements. As I don't know what this part of your code is actually supposed to do, I can't tell you how to fix it, just that this is most likely the origin of your troubles.
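If the intent is to drop rows with fewer than two populated cells (an assumption on my part, since the comment doesn't say), a safer sketch is to filter tmpData itself before anything gets pushed, rather than splicing the accumulated data array:

// drop sparse rows from the sheet currently being processed, not from the
// already-accumulated `data` array (assumes "fewer than two cells" is the rule)
tmpData = tmpData.filter((item) => Object.entries(item).length >= 2)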
I'm quite new to this so please bear with me.
I'm currently trying to put together an HTML report building tool.
I have 2 html reports that are being generated by 3rd parties.
I'd like to be able to upload them, parse them, save the specific parsed content to variables, and update my template, which is in a folder on the server.
Currently, I'm using express, and node-html-parser.
I have no issues getting the HTML files uploaded to a directory on the server and parsing those files.
My issue comes in when I try to update the variable I want with the string that I want.
const fs = require('fs');
const htmlparse = require('node-html-parser').parse;

var element1
var element2

function datatoString(){
    fs.readFile(__dirname + "/api/upload/" + file1, 'utf8', (err, html) => {
        const root = htmlparse(html);
        head = root.querySelector('head');
        element1 = head.toString();
        console.log("-------------break------------")
        console.log(head.toString()); // This works and shows me my parsed info
    });
    fs.readFile(__dirname + "/api/upload/" + file2, 'utf8', (err, html) => {
        const root = htmlparse(html);
        body = root.querySelector('body');
        element2 = body.toString();
        console.log("-------------break------------")
        console.log(body.toString()); // This works and shows me my parsed info
    });
};
Now, ideally I'd like to call this function in a GET request and have it update the variables. From there, I would use those strings to modify a template HTML file that's sitting in a folder on my server. I'd like to be able to replace HTML elements in the template with those updated variables. Once updated, I'd push the response to download the file.
Every time I try this with an fs.writeFile, it seems to say the variables 'element1' or 'element2' are empty.
I'm not even sure if I can write a local HTML file and save it the same way you'd normally do it with the DOM.
I'm lost at this point. I would assume I'd need to read and then write the template HTML file, but I have no clue how I'd go about editing it. Also, the variables being empty is stumping me. I know it's due to the fact that fs.readFile is asynchronous, but then how would I go about reading and writing files in the manner I am looking for?
Any help would be much appreciated!
You have two possibilities: use fs.readFileSync, which is easy to use, but since it's synchronous it blocks your thread (and makes your server unresponsive while the files are being read). The more elegant solution is to use the Promise version and await it.
const promises = require('fs').promises;
const htmlparse = require('node-html-parser').parse;

let element1, element2;

async function datatoString() {
    let html = await promises.readFile(__dirname + "/api/upload/" + file1, 'utf8');
    let root = htmlparse(html);
    const head = root.querySelector('head');
    element1 = head.toString();
    console.log("-------------break------------")
    console.log(element1);

    html = await promises.readFile(__dirname + "/api/upload/" + file2, 'utf8');
    root = htmlparse(html);
    const body = root.querySelector('body');
    element2 = body.toString();
    console.log("-------------break------------")
    console.log(element2);
}
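As a usage sketch (the route path is made up, and an Express app is assumed from the question), the function can then be awaited inside a GET handler before the template is touched:

app.get('/report', async (req, res) => {
    await datatoString();
    // element1 and element2 are populated at this point and can be
    // spliced into the template before sending the response
    res.send(element1 + element2);
});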
You have two options here. One is to block the thread and wait for each consecutive read to finish before returning from the function.
function datatoString() {
    let element1, element2;
    // readFileSync returns the contents directly, so there is no callback
    element1 = fs.readFileSync(...); // then parse into 'foo'
    element2 = fs.readFileSync(...); // then parse into 'bar'
    return [element1, element2];
}
app.get('/example', (req, res) => {
    ...
    const [element1, element2] = datatoString();
})
The other would be to read both files asynchronously at the same time, and resolve whenever they both finish:
function datatoString() {
    return new Promise((resolve, reject) => {
        let element1, element2;
        fs.readFile(..., () => {
            element1 = 'foo';
            if (element2) resolve([element1, element2]);
        });
        fs.readFile(..., () => {
            element2 = 'bar';
            if (element1) resolve([element1, element2]);
        });
    });
}
app.get('/example', async (req, res) => {
    ...
    const [element1, element2] = await datatoString();
})
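A more compact variant of the parallel approach (a sketch only, reusing file1, file2 and htmlparse from the question) is to let Promise.all do the coordination:

const fsp = require('fs').promises;

async function datatoString() {
    // read both files in parallel, then parse each one
    const [html1, html2] = await Promise.all([
        fsp.readFile(__dirname + "/api/upload/" + file1, 'utf8'),
        fsp.readFile(__dirname + "/api/upload/" + file2, 'utf8'),
    ]);
    return [
        htmlparse(html1).querySelector('head').toString(),
        htmlparse(html2).querySelector('body').toString(),
    ];
}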
I need to remove a substring (that appears only in specific known lines of the file) from a file.
There are simple solutions that read all the file data into a string, remove the substring, and then write the fixed data back to the file.
Here is code I found here:
Node js - Remove string from text file
var data = fs.readFileSync('banlist.txt', 'utf-8');
var newValue = data.replace(new RegExp("STRING_TO_REMOVE"), '');
fs.writeFileSync('banlist.txt', newValue, 'utf-8');
My problem is that the file is huge, up to a billion lines of logs, so I can't read all of the content into memory.
Why not a simple transform stream and replace()? replace() can take a callback as its second parameter, e.g. .replace(/bad1|bad2|bad3/g, filterWords), in case you need to replace words rather than remove them completely.
const fs = require("fs")
const { pipeline, Transform } = require("stream")
const { join } = require("path")
const readFile = fs.createReadStream("./words.txt")
const writeFile = fs.createWriteStream(
join(__dirname, "words-filtered.txt"),
"utf8"
)
const transformFile = new Transform({
transform(chunk, enc, next) {
let c = chunk.toString().replace(/bad/g, "replaced")
this.push(c)
next()
},
})
pipeline(readFile, transformFile, writeFile, (err) => {
if (err) {
console.log(err.message)
}
})
https://nodejs.org/api/fs.html#fs_fs_read_fd_buffer_offset_length_position_callback
Don't read the whole file at once; read a small buffered piece of it and look for your input within that buffered piece, then increment your buffer's starting position and do it again. I would recommend having each buffer start not at the end of the previous buffer, but overlapping it by at least the expected size of the data being sought, so that you don't run into half of your data being at the end of one buffer and the other half at the beginning of the next.
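A rough sketch of that overlapping-buffer idea (file name, chunk size and search string are placeholders, and offsets are byte-based, so this assumes single-byte characters):

const fs = require('fs');

const SEARCH = 'STRING_TO_REMOVE';  // placeholder
const CHUNK_SIZE = 64 * 1024;       // read 64 KiB at a time

const fd = fs.openSync('banlist.txt', 'r');
const buf = Buffer.alloc(CHUNK_SIZE);
let position = 0;

while (true) {
    const bytesRead = fs.readSync(fd, buf, 0, CHUNK_SIZE, position);
    if (bytesRead === 0) break;

    const text = buf.toString('utf8', 0, bytesRead);
    if (text.includes(SEARCH)) {
        console.log('match found near byte offset', position);
    }

    if (bytesRead < CHUNK_SIZE) break; // reached end of file
    // advance, but overlap by SEARCH.length - 1 bytes so a match split
    // across two chunks is still seen by the next read
    position += bytesRead - (SEARCH.length - 1);
}
fs.closeSync(fd);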
You could use a file read stream. However, you would have to find a way to detect whether the read data contains only part of a match.
What you probably want to do is use streams so that you write after partial reads. This example should work for you; you would need to copy the output ".tmp" file over the original to get the exact behavior described in your question. It works by reading a chunk and then looking to see whether it has come across a newline. If so, it processes that line, writes it, and removes it from the buffer. This should help with your memory problem.
var fs = require("fs");
var readStream = fs.createReadStream("./BFFile.txt", { encoding: "utf-8" });
var writeStream = fs.createWriteStream("./BFFile.txt.tmp");
const STRING_TO_REMOVE = "badword";
var buffer = ""
readStream.on("data", (chunk) => {
buffer += chunk;
var indexOfNewLine = buffer.search("\n");
while (indexOfNewLine !== -1) {
var line = buffer.substring(0, indexOfNewLine + 1);
buffer = buffer.substring(indexOfNewLine + 1, buffer.length);
line = line.replace(new RegExp(STRING_TO_REMOVE), "");
writeStream.write(line);
indexOfNewLine = buffer.search("\n");
}
})
readStream.on("end", () => {
buffer = buffer.replace(new RegExp(STRING_TO_REMOVE), "");
writeStream.write(buffer);
writeStream.close();
})
There are a few assumptions with this solution, such as the data being UTF-8, there potentially being only one bad word per line, every line having some text (I didn't test for that), and every line ending with a newline rather than some other line ending.
Here are the docs for streams in Node.
Another thought I had was to use pipe and a transform stream, but that seems like overkill.
You can use this code to do it. I'm using an fs stream; it's designed for reading huge files in small chunks of memory. See the docs.
const fs = require('fs');

const readStream = fs.createReadStream('./XXXXX');
const writeStream = fs.createWriteStream('./XXXXXXX');

readStream.on('data', (chunk) => {
    const data = chunk.toString().replace('STRING_TO_REMOVE', 'XXXXXX');
    writeStream.write(data);
});

readStream.on('end', () => {
    writeStream.close();
});
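One thing to keep in mind with any chunk-wise replace (not specific to this snippet): String.replace with a plain string pattern only replaces the first occurrence in each chunk, so a global regex is safer if the substring can repeat:

// replace every occurrence in the chunk, not just the first one
const data = chunk.toString().replace(/STRING_TO_REMOVE/g, 'XXXXXX');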
There are two things I want to do:
I want to create a new array of objects from an existing object,
And add an incrementing count so each object gets an id of 1, 2, 3, etc.
My issue is that when I write to the file, it writes only one random object and the rest don't show. There are no errors, and all the objects have the same increment value. Please explain what I am doing wrong. Thanks.
Code:
data.json:
{
    "users": [
        {
            "name": "mike",
            "category": [
                {
                    "title": "cook"
                }
            ],
            "store": {
                "location": "uptown",
                "city": "ulis"
            },
            "account": {
                "type": "regular",
                "payment": [
                    { "active": false }
                ]
            }
        }
    ]
}
index.js:
const appData = ('./data.json')
const fs = require('fs');

let newObject = {}

appData.forEach(function(items){
    let x = items
    let numincrement = 1++
    newObject.name = x.name
    newObject.count = numincrement
    newObject.categories = x.categories
    newObject.store = x.store
    newObject.account = x.account

    fs.writeFile('./temp.json', JSON.stringify(newObject, null, 2), 'utf8', function(err, data) {
        // console.log(data)
        if (err) {
            console.log(err)
            return
        } else {
            console.log('created')
        }
    })
})
There are a whole bunch of problems here:
You're just rewriting the same object over and over to the file. fs.writeFile() rewrites the entire file. It does not append to the file. In addition, you cannot append to the JSON format either. So, this code will only ever write one object to the file.
To append new JSON data to what's in the existing file, you would have to read in the existing JSON, parse it to convert it to a Javascript array, then add new items onto the array, then convert back to JSON and write out the file again (see the sketch below). For more efficient appending, you would need a different data format (perhaps comma-delimited lines).
Your loop has all sorts of problems. You're assigning to the same newObject over and over again.
Your numincrement is inside the loop so it will have the same value on every invocation of the loop. You can also just use the index parameter passed to the forEach() callback instead of using your own variable.
If what you're trying to iterate over is the users array in your data, then you may need to be iterating over appData.users, not just appData.
If you really just want to append data to a text file, the JSON is not the easiest format to use. It might be easier to just use comma delimited lines. Then, you can just append new lines to the file. Can't really do that with JSON.
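For the append case, a minimal sketch of that read-parse-append-rewrite cycle might look like this (the file name is taken from the question; the helper name is made up):

const fs = require('fs');

function appendRecord(record) {
    let existing = [];
    try {
        // read whatever array is already in the file
        existing = JSON.parse(fs.readFileSync('./temp.json', 'utf8'));
    } catch (e) {
        // missing or empty file: start from an empty array
    }
    existing.push(record);
    // rewrite the whole file with the extended array
    fs.writeFileSync('./temp.json', JSON.stringify(existing, null, 2), 'utf8');
}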
If you're willing to just overwrite the file with the current data, you can do this:
const appData = require('./data.json').users;
const fs = require('fs');

// create an array of custom objects
let newData = appData.map((item, index) => {
    return {
        name: item.name,
        count: index + 1,
        categories: item.categories,
        store: item.store,
        account: item.account
    };
});

// write out that data to a file as JSON (overwriting existing file)
fs.writeFile('./temp.json', JSON.stringify(newData, null, 2), 'utf8', function(err) {
    if (err) {
        console.log(err);
    } else {
        console.log("data written");
    }
});
I'm looking for a way to check if two files/documents (PDF, JPG, PNG) are the same.
If a user selects one or more files, I convert the File object to a JavaScript object. I keep the size, type and filename, and I create a blob so I can store the object in my redux store.
When a user selects another file I want to compare this file with the files that have already been added (so I can set the same blobURL).
I can check if two files have the same name, type and size, but there is a chance that all these properties match and the files still aren't the same, so I would like to check the file path. Unfortunately, that property isn't provided in the File object. Is there a way to get it, or another solution to make sure both files are (not) the same?
No, there is no way to get the real path, but that doesn't matter.
All you have access to is a fakepath, in the form C:\fakepath\yourfilename.ext (from input.value), and sometimes a bit more if you gained access to a directory.
But anyway, you probably don't want to check that two files came from the same place on the hard disk; this has no importance whatsoever, since they could very well have been modified since first access.
What you can, and probably want to, do however is to check whether their contents are the same.
For this, you can compare their byte content:
inp1.onchange = inp2.onchange = e => {
    const file1 = inp1.files[0];
    const file2 = inp2.files[0];
    if (!file1 || !file2) return;
    compare(file1, file2)
        .then(res => console.log('are same ? ', res));
};

function compare(file1, file2) {
    // they don't have the same size, they are different
    if (file1.size !== file2.size)
        return Promise.resolve(false);

    // load both as ArrayBuffers
    return Promise.all([
        readAsArrayBuffer(file1),
        readAsArrayBuffer(file2)
    ]).then(([buf1, buf2]) => {
        // create views over our ArrayBuffers
        const arr1 = new Uint8Array(buf1);
        const arr2 = new Uint8Array(buf2);
        return !arr1.some((val, i) =>
            arr2[i] !== val // search for diffs
        );
    });
}

function readAsArrayBuffer(file) {
    // we could also have used a FileReader,
    // but Response is conveniently already Promise based
    return new Response(file).arrayBuffer();
}
<input type="file" id="inp1">
<input type="file" id="inp2">
Now, you say that you don't have access to the original Files anymore, and that you can only store serializable data. In this case, one less performant solution is to generate a hash of your Files.
This can be done on the front end thanks to the SubtleCrypto API, but since this operation is quite slow for big files, you may want to generate the hash systematically on the server instead, and only do it on the front end when the sizes are the same:
// a fake storage object like OP has
const store = [
    { /* a utf-8 text file whose content is `hello world` */
        name: "helloworld.txt",
        size: 11,
        hash: "b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9" // generated from server
    }
];

// the same file as the one we fakely stored
const sameFile = new File(['hello world'], 'same-file.txt');
// a file the same size as the one we stored (needs deep check)
const sameSizeButDifferentContent = new File(['random text'], 'differentcontent.txt');

inp.onchange = e => tryToStore(inp.files[0]);

tryToStore(sameFile); // false
tryToStore(sameSizeButDifferentContent);
// hash: "a4e082f56a58e0855a6abbf2f4ebd08895ff85ea80e634e02b210def84b557dd"

function tryToStore(file) {
    checkShouldStore(file)
        .then(result => {
            console.log('should store', file.name, result)
            if (result) {
                store.push(result);
                // this is just for demo, in your case you would do it on the server
                if (!result.hash)
                    generateHash(file).then(h => result.hash = h);
            }
        });
}

async function checkShouldStore(file) {
    const { name, size } = file;
    const toStore = { name, size, file }; // create a wrapper object
    // first check against the sizes (fast checking)
    const sameSizes = store.filter(obj => obj.size === file.size);
    // only if some files have the same size
    if (sameSizes.length) {
        // then we generate a hash directly
        const hash = await generateHash(file);
        if (sameSizes.some(obj => obj.hash === hash)) {
            return false; // is already in our store
        }
        toStore.hash = hash; // save the hash so we don't have to generate it on server
    }
    return toStore;
}

async function generateHash(file) {
    // read as ArrayBuffer
    const buf = await new Response(file).arrayBuffer();
    // generate SHA-256 hash using crypto API
    const hash_buf = await crypto.subtle.digest("SHA-256", buf);
    // convert to Hex
    const hash_arr = [...new Uint8Array(hash_buf)]
        .map(v => v.toString(16).padStart(2, "0"));
    return hash_arr.join('');
}
<input type="file" id="inp">