CSV string to array when there is \n in body [duplicate] - javascript

I'm trying to convert a CSV string into an array of arrays or an array of objects. The problem is that the body field in the incoming request contains many \n characters, which cause my row splitting to break in the wrong places and mess up the rest of the code. I'm trying to make the conversion work even with \n in the body.
The string looks like this. Every string value in the incoming request starts with \" and ends with \".
"id,urn,title,body,risk,s.0.id,s.1.id,s.2.id,a.0.id,a.1.id,a.2.id,a.3.id
302,25,\"Secure Data\",\"Banking can save a lot of time but it’s not without risks. Scammers treat your bank account as a golden target –
it can be a quick and untraceable way to get money from you\n\n**TOP TIPS**\n\n**Always read your banks rules.** These tips don’t replace your banks rules - \
in fact we fully support them. If you don’t follow their rules, you may not get your money back if you are defrauded \n\n**Saving passwords or allowing auto-complete.**
Saving passwords in your browser is great for remembering them but if a hacker is able to access your computer, they will also have access to your passwords.
When on your banking site the password box we recommend you don’t enable the auto-complete function – a hacked device means they are able to gain access using this method \n\n**Use a
PIN number on your device.** It’s really important to lock your device when you’re not using it.\",,2,20,52,1,2,3,4"
I have shortened it since there is a lot of content, but the incoming string is basically the above. The long field that breaks my code starts at "Banking can save" and ends at "not using it". Several other records have the same kind of body, and it always arrives wrapped in \" ... \". I have been trying to write a function that separates the content of this CSV string into an array of arrays or an array of objects.
This is what I attempted:
function csv_To_Array(str, delimiter = ",") {
    const header_cols = str.slice(0, str.indexOf("\n")).split(delimiter);
    const row_data = str.slice(str.indexOf("\n") + 1).split("\n");
    const arr = row_data.map(function (row) {
        const values = row.split(delimiter);
        const el = header_cols.reduce(function (object, header, index) {
            object[header] = values[index];
            return object;
        }, {});
        return el;
    });
    // return the array
    return arr;
}
I have also thought about using a regex that would split on a comma or a \n, except that after a \" it would only split again once it finds the next \":
array.split(/,/\n(?!\d)/))

Try this:
csvData = csvData.replace(/(\r\n|\n|\r)/gm, ""); // replace() returns a new string, so keep the result
Once you've used that to replace the new lines, or removed them, this code will help you get started with understanding how to build an array from the new CSV string:
const splitTheArrayAndLogIt = () => {
    const everySingleCharacter = csvData.split(""); // <-- this is a new array
    console.log(everySingleCharacter);
    const splitAtCommas = csvData.split(",");
    console.log(splitAtCommas);
}
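Note that stripping every newline also removes the row separators, so that approach only suits a payload with a single record. An alternative is to walk the string character by character and treat commas and newlines as separators only when they appear outside double quotes. The following is a minimal sketch, assuming the fields use standard CSV quoting (a literal quote inside a quoted field is written as ""). The names parseCsv and csvToObjects are hypothetical and do not come from the question.

```javascript
// Hypothetical helper: split CSV text into rows of fields.
// Commas and newlines inside double-quoted fields are kept as data.
function parseCsv(text, delimiter = ",") {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"') {
        if (text[i + 1] === '"') {
          field += '"'; // "" inside quotes is an escaped quote
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        field += ch; // includes embedded \n and delimiters
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === delimiter) {
      row.push(field);
      field = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text[i + 1] === "\n") i++; // treat \r\n as one break
      row.push(field);
      field = "";
      rows.push(row);
      row = [];
    } else {
      field += ch;
    }
  }
  // Flush the last record when the text does not end with a newline.
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

// Hypothetical helper: use the first row as header names.
function csvToObjects(text, delimiter = ",") {
  const [header, ...data] = parseCsv(text, delimiter);
  return data.map((values) =>
    Object.fromEntries(header.map((name, i) => [name, values[i]]))
  );
}

const sample = 'id,title,body\n302,"Secure Data","line one\n\nline two, with comma"\n';
console.log(csvToObjects(sample));
```

For production data, a maintained CSV library is likely a safer choice than a hand-written parser.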

Related

Javascript object with arrays to search param style query string

I'm looking for a clean way to convert a JavaScript object containing arrays as values to a search-param-compatible query string, serializing one element from each array before moving to the next index.
Using libraries such as querystring or qs, converts the object just fine, but handles each array independently. Passing the resulting string to the server (which I cannot change) causes an error in handling of the items as each previous value is overwritten by the next. Using any kind of array notation in the query string is not supported. The only option I have not tried is a custom sort function, but seems like it would be worse than writing a custom function to parse the object. Any revision to the object that would generate the expected result is welcome as well.
var qs = require("qs")
var jsobj = {
    origString:['abc','123'],
    newString:['abcd','1234'],
    action:'compare'
}
qs.stringify(jsobj,{encode:false})
qs.stringify(jsobj,{encode:false,indices:false})
qs.stringify(jsobj,{encode:false,indices:false,arrayFormat:'repeat'})
Result returned is
"origString=abc&origString=123&newString=abcd&newString=1234&action=compare"
Result desired would be
"origString=abc&newString=abcd&origString=123&newString=1234&action=compare"
I tried reordering your object:
> var jsobj = [{origString: 'abc', newString: 'abcd' }, {origString: '123', newString: '1234' }, {action:'compare'}]
> qs.stringify(jsobj,{encode:false})
'0[origString]=abc&0[newString]=abcd&1[origString]=123&1[newString]=1234&2[action]=compare'
But I don't know if this is a good alternative for your problem.
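If the server did depend on the interleaved order, a small hand-rolled serializer is another option. This is only a sketch: interleaveQuery is a hypothetical name, it emits array values index by index before the scalar values, and it does no URL encoding (mirroring encode:false above).

```javascript
// Hypothetical helper: emit one element from each array per index,
// then the non-array values, joined as a query string (no encoding).
function interleaveQuery(obj) {
  const entries = Object.entries(obj);
  const arrays = entries.filter(([, value]) => Array.isArray(value));
  const longest = Math.max(0, ...arrays.map(([, value]) => value.length));
  const pairs = [];

  for (let i = 0; i < longest; i++) {
    for (const [key, values] of arrays) {
      if (i < values.length) pairs.push(`${key}=${values[i]}`);
    }
  }
  for (const [key, value] of entries) {
    if (!Array.isArray(value)) pairs.push(`${key}=${value}`);
  }
  return pairs.join("&");
}

const jsobj = {
  origString: ["abc", "123"],
  newString: ["abcd", "1234"],
  action: "compare",
};
console.log(interleaveQuery(jsobj));
// origString=abc&newString=abcd&origString=123&newString=1234&action=compare
```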
Chalk this up to a misunderstanding of the application. After spending some more time with the API I realized my mistake and, as posted above by others, order does not matter. I'm not sure why my first several attempts failed, but the question is 'answered'.

Output capitalized names from randomUser.me api?

Forgive me if this isn't the right platform to ask this question. And let me preface that I'm a designer with very little API and javascript experience.
I'm using the randomUser api to generate a json file or url I can input in Invision's Craft tool for Sketch, so I can input real data in my designs. https://screencast.com/t/jAkwUpUja2. However, it gives the names in lowercase instead of title-case/capitalized.
I'm generating the JSON by typing the endpoints I need in the browser: https://screencast.com/t/E8Cmjk5XSSCk
So, is there a way I can force the api to give me capitalized names? Thanks!
EDIT: here is the JSON url: https://randomuser.me/api/?results=20&nat=us&inc=name,gender,picture&format=pretty
Here is the simplest way to capitalize a string with JS, as far as I know:
// let's assume that you have stored the last name as follows:
let last = 'rodney';
To transform the last name, you apply this pattern:
let capitalizedLast = last[0].toUpperCase() + last.slice(1);
last[0] returns the first letter of the string, r.
last.slice(1) gives the rest of the last name, odney.
toUpperCase() transforms the first letter, and + concatenates both parts into the final result.
You just need to iterate over the results from your API and transform the elements you need in that way.
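The iteration step could look roughly like this. It is a sketch under assumptions: the sample objects below are made up, and they assume each entry in the API's results array has a name object with title, first and last properties.

```javascript
// Made-up sample shaped like the assumed `results` array of the API response.
const results = [
  { gender: "male", name: { title: "mr", first: "rodney", last: "smith" } },
  { gender: "female", name: { title: "ms", first: "jane", last: "doe" } },
];

// Upper-case the first character; leave empty or missing values alone.
const capitalize = (text) => (text ? text[0].toUpperCase() + text.slice(1) : text);

// Build new user objects with every part of the name capitalized.
const capitalized = results.map((user) => ({
  ...user,
  name: Object.fromEntries(
    Object.entries(user.name).map(([part, value]) => [part, capitalize(value)])
  ),
}));

console.log(JSON.stringify(capitalized, null, 2));
```

The resulting JSON can then be saved to a file and used in place of the raw API output.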
A quick look at the documentation suggests that there might not be a way to get the API to return capitalized names directly. So you're going to have to write some JavaScript to do the job for you.
This code should print out the data to the console with all names capitalized.
It iterates through the items in the result array, goes through all properties of the item's name property and capitalizes them.
The capitalize function gets the first character of the name, converts it to upper case and appends it to the rest of the name.
function capitalize(text) {
    return (!text || !text.length) ?
        text :
        (text[0].toUpperCase() + text.slice(1));
}
$.get("https://randomuser.me/api/?results=20&nat=us&inc=name,gender,picture&format=pretty",
    function(data) {
        if (!data || !data.results)
            return;
        data.results.forEach(function(user) {
            if (user.name) {
                for (var name in user.name) {
                    user.name[name] = capitalize(user.name[name]);
                }
            }
        });
        console.log(data);
    });
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

Single character signing scheme (minimal security)

Note: I originally posted this to Information Security, but I'm starting to think it might be more relevant here as it's really about determining what I should do with a request rather than securing information.
Situation
System A:
I have a system A that serves requests to users. This server does something, and then redirects the user to system B. During that redirect, server A can give the user a 32-character alphanumeric string of information to pass along to system B. 31 characters of that information are needed, but one can be used as a checksum. This string can more or less be thought of as a request ID.
System B:
When system B receives a request from the user, it can verify that the request (and the ID-like string) are valid by parsing the 31-character string, querying a database, and talking to system A. This system can verify with absolute certainty that the request is valid and has not been tampered with, but it's very computationally expensive.
Attackers:
It is likely that this system will see many attempts to spoof the ID. This is filtered by later checks so I'm not worried about a single character perfectly signing the ID, but I do want to avoid spending any more resources on handling these requests than is needed.
What I Need
I am looking for a checksum/signing scheme that can, with a single character, give me a good idea of whether the request should continue to the verification process or if it should be immediately discarded as invalid. If a message is discarded, I need to be 100% sure that it isn't valid, but it's okay if I keep messages that are invalid. I believe an ideal solution would mean 1/62 invalid requests are kept (attacker has to guess the check character), but as a minimal solution discarding half of all invalid requests would be sufficient.
What I've Tried
I have looked at using the Luhn algorithm (same one that's used for credit cards), but I would like to be able to use a key to generate the character to make it more difficult for an attacker to spoof the checksum.
As a first attempt at creating a signing scheme, I am bitwise xor-ing the 31-byte id with a 31-byte key, summing the resulting bytes, converting to decimal and adding the digits together until the result is less than 62, then mapping it to a character in the set [a-zA-Z0-9] (pseudocode below). The problem is that although I'm pretty sure this won't discard any valid requests, I'm not sure how to determine how often this will let through invalid IDs or whether the key can be recovered from the final value.
Set alphabet to (byte[]) "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";
Set keystring to "aLaklj14sdLK87kxcvalskj7asfrq01";
Create empty byte[] key;
FOR each letter in keystring
    Push (index of letter in alphabet) to key;
Create empty byte[] step1;
FOR each (r, k) in (request, key)
    Push r XOR k to step1;
Set step2 to 0;
FOR each b in step1
    Add (int) b to step2;
WHILE step2 >= 62
    Copy step2 to step3;
    Set step2 to 0;
    Convert step3 to String;
    Split step3 between characters;
    FOR each digit in step3
        Add (int) digit to step2;
END WHILE
RETURN alphabet[step2]
Stated Formally
A deterministic hash function where, given a private key and an input 31 bytes long, yields an output in the set {x | x ∈ ℕ, x < 62}, where guessing the output would be more efficient than calculating the private key. (Bonus points for variable-length input)
This will eventually be implemented in NodeJS/JavaScript but isn't really language-dependent.
Disclaimer: I apologize if this question is too vague and theoretical. Please comment for clarification if it's needed. There are, obviously, ways I could work around this problem, but in this case, I am looking for as direct a solution as possible.
If you want a "deterministic hash function" with a private key, then I believe you can just use sha256 (or any other hash function in your crypto library) with the key appended to the input:
sha256(input+key).toString('hex');
Afterwards, take the last few bits of the hash value, convert it from hex string to integer, divide the integer by 62, get the remainder, and determine the character based on the remainder.
This won't give you perfect 1/62 distribution probability (the hex string should have a uniform distribution for each value but not the remainders after dividing by 62) for each character but should be very close.
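A sketch of this idea in Node.js follows. Note that it swaps the plain sha256(input+key) for an HMAC-SHA256 from Node's built-in crypto module, which is the usual construction for keyed hashing. The function names, alphabet order, and sample values are assumptions for illustration. With 32 bits reduced modulo 62, the bias between characters is negligible.

```javascript
const crypto = require("crypto");

// 62-character alphabet; the order is an arbitrary choice for this sketch.
const ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

// Derive one check character from the id and a secret key.
function checkChar(id, key) {
  const digest = crypto.createHmac("sha256", key).update(id).digest();
  const n = digest.readUInt32BE(digest.length - 4); // last 32 bits of the MAC
  return ALPHABET[n % ALPHABET.length];
}

// Append the check character to a 31-character id.
function sign(id, key) {
  return id + checkChar(id, key);
}

// Cheap pre-filter: false means the request can be discarded immediately.
function looksValid(signed, key) {
  const id = signed.slice(0, -1);
  return signed.slice(-1) === checkChar(id, key);
}

const key = "example-secret-key"; // placeholder, not a real key
const id = "0123456789abcdefghijklmnopqrstu"; // 31 characters
const signed = sign(id, key);
console.log(signed, looksValid(signed, key));
```

A correctly signed id always passes, so no valid request is discarded, while a forged id has to guess one character out of 62.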
One approach would be to create a Blob URL when the user visits the initial document. The Blob URL is unique to the document that created it. The user can then use the Blob URL as a request identifier to server "B". When the user makes the request to "B", revoke the Blob URL.
The Blob URL is unique for each call to URL.createObjectURL(). The user creates the unique identifier, and the Blob URL lives until the document that created it is closed or the URL is revoked. There is minimal opportunity for the Blob URL to be copied from the visitor's browser by anyone other than the user who created it, unless other issues exist on that person's computer.
const requestA = async() => {
    const blob = new Blob();
    const blobURL = URL.createObjectURL(blob);
    const A = await fetch("/path/to/server/A", {
        method:"POST", body:JSON.stringify({id:blobURL})
    });
    const responseA = await A.text();
    // do stuff with response
    return [blobURL, responseA];
}
Server "A" communicates created Blob URL to server "B"
const requestB = async(blobURL) => {
    // reuse the Blob URL created in `requestA` as the request identifier
    const B = await fetch("/path/to/server/B", {
        method:"POST", body:JSON.stringify({id:blobURL})
    });
    const responseB = await B.text();
    // the identifier has been used, so revoke it
    URL.revokeObjectURL(blobURL);
    return responseB;
}
requestA()
    .then(([blobURL, responseA]) => {
        // do stuff with `responseA`
        console.log(responseA);
        // return `requestB` with `blobURL` as parameter
        return requestB(blobURL);
    })
    .then(responseB => console.log(responseB)) // do stuff with `responseB`
    .catch(err => console.error(err));

Firebase OrderByKey with startAt and endAt giving wrong results

I have 3 objects whose keys begin with a date in YYYYMMDD format (the full keys appear in the output below). I am trying to get the data for one month, but I am not getting the desired output.
When I query it like this:
var ref = db.child("-KPXECP6a1pXaM4gEYe0");
ref.orderByKey().startAt("20160901").once("value", function (snapshot) {
    console.log("objects: " + snapshot.numChildren());
    snapshot.forEach(function(childSnapshot) {
        console.log(childSnapshot.key);
    });
});
I get the following output:
objects: 3
20160822-KPl446bbdlaiQx6BOPL
20160901-KPl48ID2FuT3tAVf4DW
20160902-KPl4Fr4O28VpsIkB70Z
When I query this along with endAt:
ref.orderByKey().startAt("20160901").endAt("20160932").once("value", function (snapshot) {
    console.log("objects: " + snapshot.numChildren());
    snapshot.forEach(function(childSnapshot) {
        console.log(childSnapshot.key);
    });
});
I get this:
objects: 0
If I use ~ sign at the end,
ref.orderByKey().startAt("20160901").endAt("20160932~").once("value", function (snapshot) {
    console.log("objects: " + snapshot.numChildren());
    snapshot.forEach(function(childSnapshot) {
        console.log(childSnapshot.key);
    });
});
I get the output:
objects: 3
20160822-KPl446bbdlaiQx6BOPL
20160901-KPl48ID2FuT3tAVf4DW
20160902-KPl4Fr4O28VpsIkB70Z
Is there anything I am missing here?
Wow... this took some time to dig up. Thanks for the jsfiddle, that helped a lot.
TL;DR: ensure that you always have a non-numeric character in your search criteria, e.g. ref.orderByKey().startAt("20160901-").endAt("20160931~").
Longer explanation
In Firebase all keys are stored as strings. But we make it possible for developers to store arrays in the database. In order to allow that we store the array indices as string properties. So ref.set(["First", "Second", "Third"]) is actually stored as:
"0": "First"
"1": "Second"
"2": "Third"
When you get the data back from Firebase, it'll convert this into an array again. But it is important for your current use-case to understand that it is stored as key-value pairs with string keys.
When you execute a query, Firebase tries to detect whether you're querying a numeric range. When it thinks that is your intent, it converts the arguments into numbers and queries against the numeric conversion of the keys on the server.
In your case since you are querying on only a numeric value, it will switch to this numeric query mode. But since your keys are actually all strings, nothing will match.
For this reason I'd recommend that you prefix keys with a constant string. Any valid character will do, I used a - in my tests. This will fool our "is it an array?" check and everything will work the way you want it.
The quicker fix is to ensure that your conditions are non-convertible to a number. In the first snippet I did this by adding a very low range ASCII character to the startAt() and a very high ASCII character to endAt().
Both of these are workarounds for the way Firebase deals with arrays. Unfortunately the API doesn't have a simple way to handle it and requires such a workaround.
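To see why the bounds from the TL;DR select the right keys once they are compared as strings, the range check can be simulated locally. This sketch does not call Firebase at all. It assumes the keys are compared lexicographically as plain strings, and keysInRange is a hypothetical helper.

```javascript
// The three keys from the question.
const keys = [
  "20160822-KPl446bbdlaiQx6BOPL",
  "20160901-KPl48ID2FuT3tAVf4DW",
  "20160902-KPl4Fr4O28VpsIkB70Z",
];

// Hypothetical helper: inclusive range check using plain string comparison.
function keysInRange(allKeys, start, end) {
  return allKeys.filter((key) => key >= start && key <= end).sort();
}

// "-" sorts before letters and digits that follow the date, "~" sorts after them,
// and neither bound can be converted to a number.
const september = keysInRange(keys, "20160901-", "20160931~");
console.log(september);
// [ '20160901-KPl48ID2FuT3tAVf4DW', '20160902-KPl4Fr4O28VpsIkB70Z' ]
```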

optimize search through large js string array?

If I have a large JavaScript string array with over 10,000 elements, how do I quickly search through it?
Right now I have a JavaScript string array that stores job descriptions, and I'm allowing the user to dynamically filter the returned list as they type into an input box.
So say I have a string array like so:
var descArr = ["flipping burgers", "pumping gas", "delivering mail"];
and the user wants to search for: "p"
How would I be able to search a string array that has 10000+ descriptions in it quickly?
Obviously I can't sort the description array since they're descriptions, so binary search is out. And since the user can search by "p" or "pi" or any combination of letters, this partial search means that I can't use associative arrays (i.e. searchDescArray["pumping gas"]) to speed up the search.
Any ideas anyone?
As regular expression engines in current browsers have become extremely fast, how about doing it that way? Instead of an array, pass one gigantic string and separate the entries with a delimiter.
Example:
String "flipping burgers""pumping gas""delivering mail"
Regex: "([^"]*ping[^"]*)"
With the switch /g for global you get all the matches. Make sure the user does not search for your string separator.
You can even add an id into the string with something like:
String "11 flipping burgers""12 pumping gas""13 delivering mail"
Regex: "(\d+) ([^"]*ping[^"]*)"
Example: http://jsfiddle.net/RnabN/4/ (30000 strings, limit results to 100)
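The approach above could be sketched as follows. The helper names are hypothetical, and the search term is escaped so that characters such as "." or "(" typed by the user are matched literally. As noted above, the term itself must not contain the separator.

```javascript
const jobs = ["flipping burgers", "pumping gas", "delivering mail"];

// Join all entries into one string, each wrapped in double quotes.
const haystack = jobs.map((job) => `"${job}"`).join("");

// Escape regex metacharacters in user input.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return every quoted entry that contains `term`.
function searchJoined(joined, term) {
  const pattern = new RegExp(`"([^"]*${escapeRegExp(term)}[^"]*)"`, "g");
  return Array.from(joined.matchAll(pattern), (match) => match[1]);
}

console.log(searchJoined(haystack, "ping"));
// [ 'flipping burgers', 'pumping gas' ]
```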
There's no way to speed up an initial array lookup without making some changes. You can speed up consecutive lookups by caching results and mapping them to patterns dynamically.
1.) Adjust your data format. This makes initial lookups somewhat speedier. Basically, you precache.
var data = {
a : ['Ant farm', 'Ant massage parlor'],
b : ['Bat farm', 'Bat massage parlor']
// etc
}
2.) Setup cache mechanics.
var searchFor = function(str, list, caseSensitive, reduce){
    str = str.replace(/(?:^\s*|\s*$)/g, ''); // trim whitespace
    var found = [];
    // no 'g' flag: a global regex keeps lastIndex between test() calls
    var reg = new RegExp('^\\s?' + str, caseSensitive ? '' : 'i');
    var i = list.length;
    while(i--){
        if(reg.test(list[i])){
            found.push(list[i]);
            reduce && list.splice(i, 1); // remove matched items only
        }
    }
    return found;
}
var lookUp = function(str, caseSensitive){
    str = str.replace(/(?:^\s*|\s*$)/g, ''); // trim whitespace
    if(data[str]) return data[str]; // already cached
    var firstChar = caseSensitive ? str[0] : str[0].toLowerCase();
    var list = data[firstChar];
    if(!list) return (data[str] = []);
    // we cache on data since it's already a caching object.
    return (data[str] = searchFor(str, list, caseSensitive));
}
3.) Use the following script to create a precache object. I suggest you run this once and use JSON.stringify to create a static cache object. (or do this on the backend)
// we need the searchFor function from above, this might take a while
var preCache = function(arr){
    var chars = "abcdefghijklmnopqrstuvwxyz".split('');
    var cache = {};
    var i = chars.length;
    while(i--){
        // reduce is true, so we're destroying the original list here.
        cache[chars[i]] = searchFor(chars[i], arr, false, true);
    }
    return cache;
}
Probably a bit more code than you expected, but optimization and performance don't come for free.
This may not be an answer for you, as I'm making some assumptions about your setup, but if you have server side code and a database, you'd be far better off making an AJAX call back to get the cut down list of results, and using a database to do the filtering (as they're very good at this sort of thing).
As well as the database benefit, you'd also benefit from not outputting this much data (10000 variables) to a web based front end - if you only return those you require, then you'll save a fair bit of bandwidth.
I can't reproduce the problem. I created a naive implementation, and most browsers do the search across 10000 15-character strings in a single-digit number of milliseconds. I can't test in IE6, but I wouldn't expect it to be more than 100 times slower than the fastest browsers, which would still be virtually instant.
Try it yourself: http://ebusiness.hopto.org/test/stacktest8.htm (Note that the creation time is not relevant to the issue, that is just there to get some data to work on.)
One thing you could be doing wrong is trying to render all results. That would be quite a huge job when the user has only entered a single letter or a common letter combination.
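A naive implementation along these lines, with a cap on the number of results handed to the renderer, might look like the sketch below. The function name, the limit of 100, and the generated test data are assumptions, not the code behind the linked test page.

```javascript
// Hypothetical helper: case-insensitive substring filter that stops
// as soon as `limit` matches have been collected.
function filterDescriptions(descriptions, term, limit = 100) {
  const needle = term.toLowerCase();
  const matches = [];
  for (const description of descriptions) {
    if (description.toLowerCase().includes(needle)) {
      matches.push(description);
      if (matches.length >= limit) break; // do not render more than needed
    }
  }
  return matches;
}

// Generated sample data: 10,000 short descriptions.
const base = ["flipping burgers", "pumping gas", "delivering mail"];
const descriptions = Array.from(
  { length: 10000 },
  (_, i) => `${base[i % base.length]} ${i}`
);

console.log(filterDescriptions(descriptions, "p").length);
console.log(filterDescriptions(descriptions, "mail 5", 5));
```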
I suggest trying a ready made JS function, for example the autocomplete from jQuery. It's fast and it has many options to configure.
Check out the jQuery autocomplete demo
Using a Set for large datasets (1M+) is around 3500 times faster than Array .includes()
You must use a Set if you want speed.
I just wrote a node script that needs to look up a string in a 1.3M array.
Using Array's .includes for 10K lookups:
39.27 seconds
Using Set .has for 10K lookups:
0.01084 seconds
Use a Set.
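A small sketch of the difference follows. Keep in mind that Set.prototype.has tests exact membership, so it fits lookups of whole strings rather than the partial matching asked about above. The data is generated for illustration, and no timings are claimed here.

```javascript
// Generated sample data: 100,000 distinct strings.
const items = Array.from({ length: 100000 }, (_, i) => `job-${i}`);

// Building the Set is a one-time cost; lookups afterwards are hash-based.
const itemSet = new Set(items);

const wanted = "job-99999";

// Array.prototype.includes scans the array from the start on every call.
console.log(items.includes(wanted));

// Set.prototype.has does not need to scan the whole collection.
console.log(itemSet.has(wanted));

// Exact match only: a partial string is not found.
console.log(itemSet.has("job-9999"), itemSet.has("job-"));
```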
