Fast-CSV modify before loading CSV to MySQL


I am trying to load a CSV file into my MySQL database, but before I do so I need to modify it slightly. The CSV file is pipe-delimited (|).
I have a column in the CSV file called Party:Identification. This column has values such as "a:hello, b:hi c:151 ......", and the list can go on indefinitely.
I only need to get the value for c. I have come up with a method that works for this, but I am stuck on how to modify the value before the file is inserted into the database.
I tried removing all the ":" from the headers and then using .transform to modify the values; however, this only changes the header, not the values in the column. The code is below.
csv.parseFile(req.file.path, {
headers: headers => headers.map(function (header) {
const newHeaders = header.replaceAll(" ", "").replaceAll(":", "")
console.log(newHeaders)
return newHeaders
}),
delimiter: '|'
})
.transform(function(data) {
console.log(data)
PartyIdentification: getPartyID(data.partyIdentification)
})
.on("error", (err) => console.error(err))
.on("finish", function () {
query("LOAD DATA LOCAL INFILE '" +
file +
"' INTO TABLE table " +
" FIELDS TERMINATED BY '|'" +
" LINES TERMINATED BY '\n'" +
" IGNORE 1 ROWS;").then(r =>
console.log(file)
)
})
function getPartyID(str) {
if (str === undefined) return ""
const split = str.split(",")
const value = split.find(val => {
return val.includes("c")
})
if(value === undefined) return ""
return (value.split(":")[1].trim())
}

You can use a regex to parse the value of c:123 in a string:
function getPartyID(str) {
if (str === undefined) return "";
const m = str.match(/\bc:([^ ]*)/);
return m ? m[1] : null;
}
[
"a:hello, b:hi c:151 d:foo",
"a:hello, b:no_c",
].forEach(str => {
console.log(str, '==>', getPartyID(str));
});
Output:
a:hello, b:hi c:151 d:foo ==> 151
a:hello, b:no_c ==> null
Explanation of regex:
\b -- word boundary
c: -- literal text
([^ ]*) -- capture group 1 with value, up to and excluding space
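The pattern above stops only at a space, so a value followed directly by a comma (for example c:151,) would keep the comma. A minimal variant is sketched below, assuming values never contain commas or whitespace; the name getPartyIdStrict is hypothetical and not part of the original answer.

```javascript
// Hypothetical variant of getPartyID: the capture stops at whitespace or a comma.
function getPartyIdStrict(str) {
  if (typeof str !== 'string') return '';
  const m = str.match(/\bc:([^\s,]*)/);
  return m ? m[1] : '';
}

console.log(getPartyIdStrict('a:hello, b:hi, c:151, d:foo')); // 151
console.log(getPartyIdStrict('a:hello, b:hi c:151 d:foo'));   // 151
```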
UPDATE 1: Based on additional question on how to insert modified data into MySQL, here is a solution that does not use INFILE, but instead loads the file into memory (here simulated with const input), modifies the data as needed, and constructs a SQL statement that inserts all the data. IMPORTANT: You likely want to add escapes against SQL injections.
const input = `Name|Party:Identification|Date
Foo|a:hello, b:hi c:151 d:foo|2022-01-11
Bar|a:hola, b:hey c:99 d:bar|2022-01-12
Nix|a:nix, b:ni d:nix|2022-01-13`;
const partIdFieldName = 'Party:Identification';
function getPartyID(str) {
if (str === undefined) return "";
const m = str.match(/\bc:([^ ]*)/);
return m ? m[1] : 0;
}
let partIdIdx = 0;
let data = input.split(/[\r\n]+/).map((row, idx) => {
let cells = row.split('|');
if(idx === 0) {
partIdIdx = cells.indexOf(partIdFieldName);
} else {
cells[partIdIdx] = getPartyID(cells[partIdIdx]);
}
return cells;
});
//console.log('data', '==>', data);
let sql = 'INSERT INTO tbl_name\n' +
// Backticks quote identifiers in MySQL; by default double quotes mark string literals.
' (' + data[0].map(val => '`' + val + '`').join(',') + ')\n' +
'VALUES\n' +
data.slice(1).map(row => {
return ' (' + row.map(val => /^[\d+\.]+$/.test(val)
? val
: '"' + val + '"'
).join(',') + ')'
}).join(',\n') + ';'; // rows in a multi-row INSERT are separated by commas
console.log(sql);
Output:
INSERT INTO tbl_name
(`Name`,`Party:Identification`,`Date`)
VALUES
("Foo",151,"2022-01-11"),
("Bar",99,"2022-01-12"),
("Nix",0,"2022-01-13");
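One way to address the SQL injection warning above is to keep the cell values out of the SQL text entirely. The sketch below assumes a driver such as mysql or mysql2, whose query() accepts ? placeholders for values and ?? placeholders for identifiers; buildInsert is a hypothetical helper, not part of the original answer.

```javascript
// Hypothetical helper: builds a multi-row INSERT that uses placeholders
// instead of concatenating cell values into the SQL text.
function buildInsert(table, header, rows) {
  const rowPlaceholder = '(' + header.map(() => '?').join(',') + ')';
  const sql =
    'INSERT INTO ?? (' + header.map(() => '??').join(',') + ') VALUES ' +
    rows.map(() => rowPlaceholder).join(',');
  // Identifiers first (table, then column names), then every cell in row order.
  const params = [table, ...header, ...rows.flat()];
  return { sql, params };
}

const { sql, params } = buildInsert(
  'tbl_name',
  ['Name', 'PartyId', 'Date'],
  [['Foo', '151', '2022-01-11'], ['Bar', '99', '2022-01-12']]
);
console.log(sql);    // INSERT INTO ?? (??,??,??) VALUES (?,?,?),(?,?,?)
console.log(params);
```

With such a driver the pair would then be passed along as connection.query(sql, params, callback), where connection is whatever connection object the application already holds.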

Don't bother fixing the CSV file before loading; simply toss the unwanted columns as you LOAD it.
This, for example, will load only the 3rd column:
LOAD DATA ...
(@a, @b, c_col, @d, @e, ...)
That is, capture the unwanted columns into @variables that you will then ignore.
If you need to remove the c: before storing into the table, then
LOAD DATA ...
(@a, @b, @c, @d, @e, ...)
SET c_col = mid(@c, 3)
(or whatever expression will work. See also SUBSTRING_INDEX in case it would work better.)
LOAD DATA is plenty fast, even in this wasteful mode. And a lot less coding on your part.
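Applied to the question's data, the statement could be assembled from Node roughly as follows. This is only a sketch: buildLoadData, the table name and the target column are hypothetical, the file path and table name are assumed to be trusted (they are concatenated into the SQL), and the SUBSTRING_INDEX expression matches the text c: anywhere in the field, which is looser than the \b-anchored regex shown earlier.

```javascript
// Hypothetical helper: reads every CSV column into a user variable and
// stores only the value derived from the Party:Identification column.
// Assumes `columnCount` pipe-delimited columns, with the party column at
// zero-based position `partyIndex`.
function buildLoadData(file, table, columnCount, partyIndex, targetColumn) {
  const vars = Array.from({ length: columnCount }, (_, i) =>
    i === partyIndex ? '@party' : '@skip' + i);
  // Text after "c:" up to the next space; empty string when "c:" is absent.
  const valueExpr =
    "IF(LOCATE('c:', @party) > 0, " +
    "SUBSTRING_INDEX(SUBSTRING_INDEX(@party, 'c:', -1), ' ', 1), '')";
  return [
    "LOAD DATA LOCAL INFILE '" + file + "'",
    'INTO TABLE ' + table,
    "FIELDS TERMINATED BY '|'",
    "LINES TERMINATED BY '\\n'",
    'IGNORE 1 ROWS',
    '(' + vars.join(', ') + ')',
    'SET ' + targetColumn + ' = ' + valueExpr + ';',
  ].join('\n');
}

console.log(buildLoadData('/tmp/in.csv', 'parties', 3, 1, 'party_id'));
```

To keep other columns as well, their positions in the variable list would name real table columns instead of @skip variables.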

Related

How do I parse JSON sprinkled unpredictably into a string?

Suppose that I've got a node.js application that receives input in a weird format: strings with JSON arbitrarily sprinkled into them, like so:
This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text
I have a couple guarantees about this input text:
The bits of literal text in between the JSON objects are always free from curly braces.
The top level JSON objects shoved into the text are always object literals, never arrays.
My goal is to split this into an array, with the literal text left alone and the JSON parsed out, like this:
[
"This is a string ",
{"with":"json","in":"it"},
" followed by more text ",
{"and":{"some":["more","json"]}},
" and more text"
]
So far I've written a naive solution that simply counts curly braces to decide where the JSON starts and stops. But this wouldn't work if the JSON contains strings with curly braces in them {"like":"this one } right here"}. I could try to get around that by doing similar quote counting math, but then I also have to account for escaped quotes. At that point it feels like I'm redoing way too much of JSON.parse's job. Is there a better way to solve this problem?

You can check if JSON.parse throws an error to determine if the chunk is a valid JSON object or not. If it throws an error then the unquoted } are unbalanced:
const tests = [
'{"just":"json }}{}{}{{[]}}}}","x":[1,2,3]}',
'Just a string',
'This string has a tricky case: {"like":"this one } right here"}',
'This string {} has a tiny JSON object in it.',
'.{}.',
'This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text',
];
tests.forEach( test => console.log( parse_json_interleaved_string( test ) ) );
function parse_json_interleaved_string ( str ) {
const chunks = [ ];
// Index just past the previous JSON object (0 before any JSON is found)
let last_json_end_index = 0;
let json_index = str.indexOf( '{', last_json_end_index );
for ( ; json_index !== -1; json_index = str.indexOf( '{', last_json_end_index ) ) {
// Push the plain string before the JSON, even if it is a single character
if ( json_index !== last_json_end_index )
chunks.push( str.substring( last_json_end_index, json_index ) );
let json_end_index = str.indexOf( '}', json_index + 1 );
// Find the end of the JSON
while ( true ) {
try {
JSON.parse( str.substring( json_index, json_end_index + 1 ) );
break;
} catch ( e ) {
json_end_index = str.indexOf( '}', json_end_index + 1 );
if ( json_end_index === -1 )
throw new Error( 'Unterminated JSON object in string' );
}
}
// Push JSON
chunks.push( str.substring( json_index, json_end_index + 1 ) );
last_json_end_index = json_end_index + 1;
}
// Push final plain string if any
if ( str.length !== last_json_end_index )
chunks.push( str.substring( last_json_end_index ) );
return chunks;
}
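The function above returns the JSON chunks as strings. To get the mixed array of text and parsed objects that the question asks for, one more pass is needed. The sketch below relies on the question's guarantee that literal text never contains curly braces; parseChunks is a hypothetical helper name.

```javascript
// Hypothetical post-processing step: parse the chunks that are JSON,
// leave the literal text chunks untouched.
function parseChunks(chunks) {
  // Literal text is guaranteed brace-free, so a leading "{" marks JSON.
  return chunks.map(chunk => chunk.startsWith('{') ? JSON.parse(chunk) : chunk);
}

const parsed = parseChunks([
  'This is a string ',
  '{"with":"json","in":"it"}',
  ' and more text',
]);
console.log(parsed);
```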

Here's a comparatively simple brute-force approach: split the whole input string on curly braces, then step through the array in order. Whenever you come across an open brace, find the longest chunk of the array from that starting point that successfully parses as JSON. Rinse and repeat.
This will not work if the input contains invalid JSON and/or unbalanced braces (see the last two test cases below.)
const tryJSON = input => {
try {
return JSON.parse(input);
} catch (e) {
return false;
}
}
const parse = input => {
let output = [];
let chunks = input.split(/([{}])/);
for (let i = 0; i < chunks.length; i++) {
if (chunks[i] === '{') {
// found some possible JSON; start at the last } and backtrack until it works.
for (let j = chunks.lastIndexOf('}'); j > i; j--) {
if (chunks[j] === '}') {
// Does it blend?
let parsed = tryJSON(chunks.slice(i, j + 1).join(""))
if (parsed) {
// it does! Grab the whole thing and skip ahead
output.push(parsed);
i = j;
}
}
}
} else if (chunks[i]) {
// neither JSON nor empty
output.push(chunks[i])
}
}
console.log(output)
return output
}
parse(`{"foo": "bar"}`)
parse(`test{"foo": "b}ar{{[[[{}}}}{}{}}"}`)
parse(`this {"is": "a st}ri{ng"} with {"json": ["in", "i{t"]}`)
parse(`{}`)
parse(`this {"i{s": invalid}`)
parse(`So is {this: "one"}`)

I could try to get around that by doing similar quote counting math, but then I also have to account for escaped quotes. At that point it feels like I'm redoing way too much of JSON.parse's job. Is there a better way to solve this problem?
I don't think so. Your input is pretty far from JSON.
But accounting for all those things isn't that hard.
The following snippet should work:
function construct(str) {
const len = str.length
let lastSavedIndex = -1
let bracketLevel = 0
let inJsonString = false
let lastCharWasEscapeChar = false
let result = []
for(let i = 0; i < len; ++i) {
if(bracketLevel !== 0 && !lastCharWasEscapeChar && str[i] === '"') {
inJsonString = !inJsonString
}
else if (!inJsonString && str[i] === '{') {
if (bracketLevel === 0) {
result.push(str.substring(lastSavedIndex + 1, i))
lastSavedIndex = i - 1
}
++bracketLevel
}
else if (!inJsonString && str[i] === '}') {
--bracketLevel
if (bracketLevel === 0) {
result.push(JSON.parse(str.substring(lastSavedIndex + 1, i + 1)))
lastSavedIndex = i
}
}
else if (inJsonString && str[i] === '\\') {
lastCharWasEscapeChar = !lastCharWasEscapeChar
}
else {
lastCharWasEscapeChar = false
}
}
if(lastSavedIndex !== len -1) {
result.push(str.substring(lastSavedIndex + 1, len))
}
return result
}
const standardText = 'This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text. {"foo": "bar}"}'
const inputTA = document.getElementById('input')
const outputDiv = document.getElementById('output')
function updateOutput() {
outputDiv.innerText =
JSON.stringify(
construct(inputTA.value),
null,
2
)
}
inputTA.oninput = updateOutput
inputTA.value = standardText
updateOutput()
<textarea id="input" rows="5" cols="50"></textarea>
<pre id="output"></pre>

You can use the RegExp /(\s(?=[{]))|\s(?=[\w\s]+[{])/ig with .split() to break the string at each space character that is followed either by an opening curly brace { or by one or more word or space characters and then an opening curly brace. Use .filter() to remove the undefined values from the resulting array, and create a new array for the results. Then, while the split array still has .length: find the index of the element that contains only space characters; .splice() the elements from the beginning of the split array up to that index plus 1; .push() those elements, .join()ed by a space character ' ' with the last space character removed by .replace(), prefixed with an empty string '' if the result array is still empty and with a space character ' ' otherwise; and finally .shift() the next element of the split array, which is the JSON, and .push() it as well.
const str = `This is a string {"with":"json","in":"it"} followed by more text {"and":{"some":["more","json"]}} and more text {"like":"this one } right here"}`;
const formatStringContainingJSON = s => {
const r = /(\s(?=[{]))|\s(?=[\w\s]+[{])/ig;
const matches = s.split(r).filter(Boolean);
const res = [];
while (matches.length) {
const index = matches.findIndex(s => /^\s+$/.test(s));
const match = matches.splice(0, index + 1);
res.push(
`${!res.length ? '' : ' '}${match.join(' ').replace(/\s$/, '')}`
, `${matches.shift()}`
);
};
return res;
}
let result = formatStringContainingJSON(str);
console.log(result);

Here is one approach that iterates char by char. First we create an array from the input and then use reduce() on it. When we detect an opening curly bracket {, we push the current accumulated chunk onto an array of detected results and set a flag on the accumulator object used by reduce(). While this flag is true, we try to parse the chunk as JSON after each character; only when that succeeds do we push the chunk representing the JSON onto the array of detected results and set the flag back to false.
The accumulator of the reduce() method holds the following data:
res: an array with detected results: strings or jsons.
chunk: a string representing the current accumulated chunk of chars.
isJson: a boolean indicating if the current chunk is json or not.
const input = 'This is a string {"with":"json", "in":"it"} followed by more text {"and":{"some":["more","json","data"]}} and more text';
let obj = Array.from(input).reduce(({res, isJson, chunk}, curr) =>
{
if (curr === "{")
{
if (!isJson) res.push(chunk);
chunk = isJson ? chunk + curr : curr;
isJson = true;
}
else if (isJson)
{
try
{
chunk += curr;
JSON.parse(chunk);
// If no error, we found a JSON.
res.push(chunk);
chunk = "";
isJson = false;
}
catch(e) {/* Ignore error */}
}
else
{
chunk += curr;
}
return {res, isJson, chunk};
}, {res:[], isJson:false, chunk:""})
// First stage done, lets debug obtained data.
obj.res.push(obj.chunk);
console.log(obj.res);
// Finally, we map the pieces.
let res = obj.res.map(x => x.match("{") ? JSON.parse(x) : x);
console.log(res);

Obligatory answer: this is an improper format (because of this complication, and the guarantee is a security hole if the parser is improperly designed); it should ideally be redesigned. (Sorry, it had to be said.)
Barring that, you can generate a parser using your favorite parser generator that outputs to javascript as a target language. It might even have a demo grammar for JSON.
However, the glaring security issue is incredibly scary (if any JSON gets past the 'guarantee', suddenly it's a vector). An array interspersed representation seems nicer, with the constraint that assert(text.length == markup.length+1):
'{
"text": ["Hello", "this is red text!"],
"markup": [{"text":"everyone", "color":"red"}]
}'
or even nicer:
'[
{"type":"text", "text":"Hello"},
{"type":"markup", "text":"everyone", "color":"red"}, # or ,"val":{"text":.., "color":..}}
{"type":"text", "text":"this is red text!"},
...
]'
Store compressed ideally. Unserialize without any worries with JSON.parse.
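As a rough illustration of the second layout, the sketch below round-trips a list of typed segments through JSON.stringify and JSON.parse. The segment contents are made up for the example, and the type and text field names simply follow the layout above.

```javascript
// Hypothetical segment list, following the second layout above.
const segments = [
  { type: 'text', text: 'Hello ' },
  { type: 'markup', text: 'everyone', color: 'red' },
  { type: 'text', text: ', this is red text!' },
];

// Serialize for storage or transport; no brace counting is needed to read it back.
const wire = JSON.stringify(segments);
const restored = JSON.parse(wire);

// Rebuild the plain text by concatenating every segment's text.
const plain = restored.map(segment => segment.text).join('');
console.log(plain); // Hello everyone, this is red text!
```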

Checking for null only gets "Cannot read property '1' of null" in nodejs

I am trying to test for a null value after a # in a string. I have tried it various ways, but I always get "Cannot read property '1' of null" when submitting test data. I have ferreted out the errors I can think of, but this one I cannot seem to get around. Please keep in mind I am a beginner at this: I haven't programmed since my COBOL days, and the last time I worked on JavaScript was in the early 2000s.
//start test data, 5 possible strings that may pass through
elt.message = '#1a' //goes through the script good
elt.message = '#12b' // goes through
elt.message = '#123c' //goes through
elt.message = '' //is ignored
elt.message = '# ' //crashes server
//end test data
//First lets test to see if # is in the message. If true then we will parse it and add it to the database.
var str = elt.message;
var substr = '#';
var vtest = str.indexOf(substr) > -1;
if (vtest == 1){
var Vname = elt.author;
console.log('We tested for # and the value is true');
//extracts the number and the letter after the # from incoming chat messages
var test = elt.message; // replace with message text variable.
var pstr = test.match(/#(\d{1,3})([a-zA-Z])/);
if (pstr) {
var numbers = pstr[1];
var character = pstr[2];
var chupp = character.toUpperCase(); //Converts the lowercase to uppercase
}
//Tests to see if neither the question number or the possible answer is left out
//if (pstr[1] !== '' && pstr[2] !== ''){ //doesn't work =(
if (pstr[1] !== null && pstr[2] !== null){ //doesn't work either =(
console.log('we processed the numbers after the #sign and assigned the numbers and letter into variables.')
console.log('The question number is: ' + pstr[1]);
console.log('The letter processed is: ' + pstr[2]);
// Grabs the date and converts it into the YYYYMMDD string.
var dobj = new Date();
var dstr = dobj.toString();
var dsplit = dstr.split(' ');
let currentdate = `${dobj.getMonth() < '9' ? `0${dobj.getMonth() + 1}` :
dobj.getMonth() + 1}`;
currentdate = `${dsplit[3]}${currentdate}${dsplit[2]}`;
console.log(currentdate)//remove when done
//checks to see what the highest question number is in the database
var sel = con.query("SELECT * FROM questions WHERE ClassID = "+ currentdate + " ORDER BY QuesID DESC LIMIT 1", function (err, result){
if (err) throw err;
console.log('Total number of question records: '+result[0].QuesID);
console.log('the script is querying with' + pstr[1]);
console.log('the scripts answer letter is ' + pstr[2]);
if (pstr[2] != '' && pstr[1] <= result[0].QuesID ){
var query = con.query("SELECT * FROM questions WHERE ClassID = " + currentdate + " AND QuesID = " + pstr[1], function (err, result) { // Selects the record based on the Date and the question number variables provided above
if (err) throw err;
console.log('it got past the test')
if (result[0].AnsweredFirst === '' && result[0].AnswerLetter === chupp) { //Test to see if the AnsweredFirst is empty and that the Answer letter matchs with whats on file
console.log('MATCH!');//remove when done
var sql = "UPDATE questions SET AnsweredFirst = '"+ Vname + "' WHERE ClassID = " + currentdate + " AND QuesID = " + pstr[1]; //Updates the record with the first person who answered the question in the AnsweredFirst field
con.query(sql, function (err, result) {
if (err) throw err;
console.log(Vname + " answered question " + pstr[1] + " First!");
});
}
});
}
});
} else {
console.log('Either the question number or the letter was left blank so we are skipping'); //the viewer did not put in a proper number and letter after the # sign
}
} else {
console.log('No signs of # so skipping queries') //if there is no # sign the message is not processed
};
I added the rest of the script to give a better idea. Messages are passed to the server from a chat client.
I'll try moving the block of code into the first if statement. I know it's messy, but honestly I am surprised I got this far.

var pstr = test.match(/#(\d{1,3})([a-zA-Z])/);
means that if no match is found for your regex, then pstr is null
in that case any index of pstr (like pstr[1], pstr[2]) will throw the error you described:
Cannot read property 'n' of null
Solution:
Before using indexes, check if the variable has a value or not
if(pstr !== null) {
// do something with pstr[1]
}
Edit:
And as nnnnnn rightly pointed out, you cannot explicitly store a null value in a string.
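One way to keep the null check in a single place is to wrap the match in a small function and let callers test its return value. This is a minimal sketch; parseAnswer and the shape of the object it returns are hypothetical, while the regex is the one from the question.

```javascript
// Hypothetical helper: returns the question number and answer letter,
// or null when the message does not contain "#<1-3 digits><letter>".
function parseAnswer(message) {
  const match = String(message).match(/#(\d{1,3})([a-zA-Z])/);
  if (match === null) return null; // check before touching match[1]
  return { question: Number(match[1]), letter: match[2].toUpperCase() };
}

console.log(parseAnswer('#12b')); // { question: 12, letter: 'B' }
console.log(parseAnswer('# '));   // null
console.log(parseAnswer(''));     // null
```

The calling code then only needs one test, for example skipping the database queries whenever parseAnswer(elt.message) returns null.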

If your test string does not match your regular expression, pstr is assigned null. In the next if condition, you then try to read the first element of pstr without first checking pstr itself for null:
if (pstr[1] !== null && pstr[2] !== null){ //doesnt work either =(
So you need to either add pstr !== null to the second if, or move that whole conditional branch inside the body of the previous if statement.

Nodejs encoding issue

I'm trying to get data from a request, but the formatting or encoding isn't what I'm looking for.
I've tried to set the encoding using req.setEncoding('utf8')
The string I should be getting is:
import Graphics.Element exposing (..)
import Graphics.Collage exposing (..)
import Color exposing (..)
main : Element
main = collage 500 500 [filled orange (circle (1 + 49)]
What I am actually getting is: import+Graphics.Element+exposing+%28..%29%0D%0Aimport+Graphics.Collage+exposing+%28..%29%0D%0Aimport+Color+exposing+%28..%29%0D%0Amain+%3A+Element%0D%0Amain+%3D+collage+500+500+%5Bfilled+orange+%28circle+%281+%2B+49%29%5D
This is where I read the data and set the encoding:
function onPost () {
// When there is a POST request
app.post('/elmsim.html',function (req, res) {
console.log('POST request received from elmsim')
req.setEncoding('ascii')
req.on('data', function (data) {
// Create new directory
createDir(data, res)
})
})
}
Any help would be great! Thanks

The string you are getting is a URL-encoded string.
Have you tried calling decodeURIComponent on the string?
decodeURIComponent( string )

Luca's answer is correct, but decodeURIComponent alone is not enough here: in this encoding a space arrives as a plus sign and a literal plus sign arrives as '%2B', and decodeURIComponent leaves the plus signs untouched. One way to handle this is to split the string on '%2B', apply decodeURIComponent to each piece, replace the remaining plus signs in each piece with spaces, and then join the pieces back together with '+'.
This is my solution:
function decodeWithPlus(str) {
// Create array separated by the encoded plus sign (%2B)
var splittedstr = str.split('%2B')
// Decode each array element and add to output string separated by '+'
var outs = ''
var first = true
splittedstr.forEach(function (element) {
if (first) {
outs += replaceAll('+', ' ', decodeURIComponent(element))
first = false
}
else {
outs += '+' + replaceAll('+', ' ', decodeURIComponent(element))
}
})
return outs
}
function replaceAll(find, replace, str) {
var outs = ''
for (var i = 0; i < str.length; i++) {
if (str[i] === find) {
outs += replace
}
else {
outs += str[i]
}
}
return outs
}
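A shorter alternative, shown here only as a sketch, is to turn every plus sign into %20 before decoding, so that decodeURIComponent handles both the spaces and the literal plus signs (%2B) in one pass. The name decodeFormComponent is hypothetical.

```javascript
// Decodes a form-encoded string: "+" means a space there,
// while a literal plus sign arrives as "%2B".
function decodeFormComponent(str) {
  // Turn "+" into "%20" first so decodeURIComponent produces spaces,
  // then let it decode every percent escape (including %2B -> "+").
  return decodeURIComponent(str.replace(/\+/g, '%20'));
}

console.log(decodeFormComponent(
  'main+%3D+collage+500+500+%5Bfilled+orange+%28circle+%281+%2B+49%29%5D'));
// main = collage 500 500 [filled orange (circle (1 + 49)]
```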

Writing onto HTML page

For brief context, my program reads in a file and displays it on the HTML page. The code below uses a regular expression to read that file and extract the errors. Instead of using console.log each time and debugging, is there any way I could just write the results onto the HTML page?
When I used:
document.getElementById("").innerHTML
it would just print out the last summary instead of printing out all of the summaries.
I tried using a controller and ng-repeat (AngularJS) to do it, but somehow I did not do it right.
Any ideas -- doesn't have to be in AngularJS???
while ((match = reDiaBtoa.exec(reader.result)) != null) {
if (match.index === reDiaBtoa.lastIndex) {
reDiaBtoa.lastIndex++;
}
// View your result using the match-variable.
// eg match[0] etc.
// extracts the status code: ERROR
if (match[2] === "ERROR" || match[2] === "FATAL" || match[2] === "SEVERE") {
console.log("Time: " + match[1]);
console.log("Thread Name: " + match[3]);
console.log("Source name & line number: " + match[4]);
console.log("Log Message: " + match[5] + '\n');
console.log("-----------------------------------------------------------------");
}
} //end of the while loop ((match = reDiaBtoa.exec.....))

If you use
document.getElementById("someId").innerHTML =
it'll overwrite the existing HTML.
Instead, use
document.getElementById("someId").innerHTML +=

Create a string variable before your while loop and then append to it in your loop:
var outputToDisplay = "";
//While loop
outputToDisplay += "Time: " + match[1];
//etc
//End While
document.getElementById(theId).innerHTML = outputToDisplay;
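Put together, the approach could look like the sketch below. The log format, the pattern reLine and the element id output are all assumptions made for the example, since the question does not show reDiaBtoa or the page markup. The sketch writes with textContent rather than innerHTML so that log text is not interpreted as HTML; a pre element keeps the line breaks visible.

```javascript
// Hypothetical log format and pattern, for illustration only:
// "<time> <LEVEL> [<thread>] <source> - <message>"
const reLine = /^(\S+) (\w+) \[([^\]]+)\] (\S+) - (.*)$/gm;

// Collects one text block per ERROR/FATAL/SEVERE entry.
function summarize(logText) {
  const blocks = [];
  let match;
  while ((match = reLine.exec(logText)) !== null) {
    if (['ERROR', 'FATAL', 'SEVERE'].includes(match[2])) {
      blocks.push(
        'Time: ' + match[1] + '\n' +
        'Thread Name: ' + match[3] + '\n' +
        'Source name & line number: ' + match[4] + '\n' +
        'Log Message: ' + match[5]);
    }
  }
  return blocks.join('\n\n');
}

const summary = summarize(
  '10:00:01 INFO [main] App.java:10 - started\n' +
  '10:00:02 ERROR [worker-1] Db.java:42 - connection lost');

// In the browser, write everything once after the loop has finished.
if (typeof document !== 'undefined') {
  const target = document.getElementById('output'); // hypothetical element id
  if (target) target.textContent = summary;
}
```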

is there a way to exclude certain chars from encodeURIComponent

I am building a query string for my URL and need to exclude certain chars from the encoding.
I want to exclude the "&" and the "=" so that I can make a statement as such:
first=blah&second=blah and so on....
I guess the best way to put it is how do I stop them from being encoded?
some code:
else if (array[i].nodeName == "SELECT") {
if (array[i].id == "multiple") {
var selected = $.map($('#multiple option:selected'),
function (e) {
return $(e).val();
});
$.each(selected, function (index, value) {
name = array[i].name;
values += app + "\&" + key + "=";
});
} else {
name = arr[i].name;
values = arr[i].value;
}
}
key = encodeURIComponent(name);
value = encodeURIComponent(values);
queryString += name + "=" + values + "&";

Is there a way to exclude certain chars from encodeURIComponent?
No. It's a builtin function that takes exactly one argument.
You do need to encode & when it appears in the middle of a key or value so the simplest solution is to encode the individual names and values before combining them. Define
function emit(name, value) {
queryString += (queryString.indexOf("?") >= 0 ? "&" : "?")
+ encodeURIComponent(name) + "=" + encodeURIComponent(value);
}
and then call that function for each name/value pair in multiple selects or once for each other checked input.
else if (array[i].nodeName=="SELECT" ){
if(array[i].id == "multiple"){
var selected = $.map( $('#multiple option:selected'),
function(e){return $(e).val();});
$.each(selected, function(index, value){
emit(array[i].name, value);
});
} else {
emit(arr[i].name, arr[i].value);
}
}
Using encodeURI or similar will not properly encode #, = or other necessary code-points.
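The same idea can be written without shared state: encode each name and each value on its own, then join the pairs with the literal separators. The sketch below assumes the pairs have already been collected into an array; buildQuery is a hypothetical helper name.

```javascript
// Hypothetical helper: encodes each name and value separately, then joins
// the pairs with the literal "=" and "&" separators.
function buildQuery(pairs) {
  return pairs
    .map(([name, value]) => encodeURIComponent(name) + '=' + encodeURIComponent(value))
    .join('&');
}

console.log(buildQuery([['first', 'blah'], ['second', 'a&b=c']]));
// first=blah&second=a%26b%3Dc
```

Only the & and = characters that are part of a name or value get encoded; the separators added by the helper stay literal, which is the result the question is after.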

The name of the function should suggest how it should be used: call it on the pieces of the query string, not the whole query string.
edit — I've tried to create an example based on your code, but I can't figure out what it's trying to do. As it stands it seems to have syntax errors.
