Split a CSV with JavaScript

Is there a way to split a CSV string with JavaScript when the separator can also occur as an escaped value? Other regex implementations solve this problem with a lookbehind, but since JavaScript does not support lookbehind, I wonder how I could accomplish this neatly using a regular expression.
A csv line might look like this
"This is\, a value",Hello,4,'This is also\, possible',true
This must be split into (strings containing)
[0] => "This is\, a value"
[1] => Hello
[2] => 4
[3] => 'This is also\, possible'
[4] => true

Instead of trying to split, you can use a global match for everything that is not a comma, with this pattern:
/"[^"]+"|'[^']+'|[^,]+/g
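As a quick check of the pattern above, here is a minimal sketch run against the sample line from the question. The variable names are made up for illustration. It assumes fields are never empty and that escaped commas only occur inside quoted fields, since an empty field yields no match and an unquoted `a\,b` would still be split.

```javascript
// Sample line from the question; String.raw keeps the backslashes literal.
const line = String.raw`"This is\, a value",Hello,4,'This is also\, possible',true`;

// Match quoted chunks first, then fall back to runs of non-comma characters.
const fields = line.match(/"[^"]+"|'[^']+'|[^,]+/g) || [];

console.log(fields.length); // 5
console.log(fields);
```

The quoted alternatives come first in the pattern so that a comma inside quotes is consumed as part of the field rather than ending it.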

For example, you can use this regex:
(.*?[^\\])(,|$)
The regex lazily takes everything (.*?) up to the first comma that is not preceded by a \, or up to the end of the line.
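To collect the fields with this regex, one option is to read the first capture group of every match. A minimal sketch, assuming `String.prototype.matchAll` is available (the names `line` and `fields` are illustrative). It also assumes there are no empty fields, because `[^\\]` must consume at least one character.

```javascript
// Sample line from the question; String.raw keeps the backslashes literal.
const line = String.raw`"This is\, a value",Hello,4,'This is also\, possible',true`;

// Group 1 is the field, group 2 is the separator (or the end of the line).
const pattern = /(.*?[^\\])(,|$)/g;
const fields = Array.from(line.matchAll(pattern), (m) => m[1]);

console.log(fields);
```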

Here's some code that converts CSV to JSON (assuming the first row holds the property names). You can take the first part (array2d) and do other things with it very easily.
// split rows by \r\n. Not sure if all csv has this, but mine did
const rows = rawCsvFile.split("\r\n");
// find all commas, or chunks of text in quotes. If not in quotes, consider it a split point
const splitPointsRegex = /"(""|[^"])+?"|,/g;
const array2d = rows.map((row) => {
let lastPoint = 0;
const cols: string[] = [];
let match: RegExpExecArray | null;
while ((match = splitPointsRegex.exec(row)) !== null) {
if (match[0] === ",") {
cols.push(row.substring(lastPoint, match.index));
lastPoint = match.index + 1;
}
}
cols.push(row.slice(lastPoint));
// remove leading commas, wrapping quotes, and unneeded \r
return cols.map((datum) =>
datum.replace(/^,?"?|"$/g, "")
.replace(/""/g, `\"`)
.replace(/\r/g, "")
);
})
// assuming the first row holds the prop names, create an array of objects keyed by those names
const out = [];
const propsRow = array2d[0];
array2d.forEach((row, i) => {
if (i === 0) { return; }
const addMe: any = {};
row.forEach((datum, j) => {
let parsedData: any;
if (isNaN(Number(datum)) === false) {
parsedData = Number(datum);
} else if (datum === "TRUE") {
parsedData = true;
} else if (datum === "FALSE") {
parsedData = false;
} else {
parsedData = datum;
}
addMe[propsRow[j]] = parsedData;
});
out.push(addMe);
});
console.log(out);

Lookbehind was added to JavaScript in ES2018, but when this was written it worked only in Chrome and Edge, not in Firefox, so check your target browsers:
"abc\\,cde,efg".split(/(?<!\\),/) will result in ["abc\,cde", "efg"].
You will then need to remove all (unescaped) escape characters in a second step.
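A minimal sketch of both steps together, assuming an engine with lookbehind support (recent versions of Node.js have it). The helper name `splitEscaped` is hypothetical, and the unescape step simply drops any backslash that precedes another character.

```javascript
// Hypothetical helper: split on unescaped commas, then strip the escapes.
function splitEscaped(line) {
  return line
    .split(/(?<!\\),/) // step 1: split where the comma is not preceded by "\"
    .map((field) => field.replace(/\\(.)/g, "$1")); // step 2: turn "\x" into "x"
}

console.log(splitEscaped("abc\\,cde,efg")); // [ 'abc,cde', 'efg' ]
```

This sketch does not handle an escaped backslash directly before a separator: the lookbehind only inspects one character, so that comma would be treated as escaped.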

Related

How to attach an id to all instances of a regex match in Javascript?

I'm implementing a search highlight feature. I've been able to use regex to find all instances of a particular keyword in my table, and highlight it. What I'm trying to do now is be able to jump to the next/prev highlight word using Enter and Shift+Enter resp. Here's what I'm currently doing:
highlightSearchKeyword () {
const reviews = document.querySelectorAll('span[data-v-071a96ec=""]')
const formattedKeyword = this.searchKeyWord.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, '\\$&')
let count = 0
reviews.forEach(rev => {
const substring = new RegExp(formattedKeyword, 'gi')
let description = rev.innerText
if (formattedKeyword !== '') {
try {
count += description.match(substring || []).length
} catch (error) {
count += 0
}
}
description = description.replace(substring, match => `<span class="highlight" id="search-${some-id}">${match}</span>`)
rev.innerHTML = description
})
this.highlightedWordCount = count
}
As you can see, I'm replacing each match with a string that surrounds the original match word by a <span> tag. I want to give each instance of a match an id, so that I can do something like:
const elem = document.querySelector(`#search-{some-id}`)
elem.scrollIntoView()
If someone could help me out with this that'd be great.
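You could use the index that forEach passes you (note that every match inside the same element then shares one id, so this only gives unique ids when each element contains at most one match):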
highlightSearchKeyword () {
const reviews = document.querySelectorAll('span[data-v-071a96ec=""]')
const formattedKeyword = this.searchKeyWord.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, '\\$&')
let count = 0
reviews.forEach((rev, index) => { // *** Added `index` parameter
const substring = new RegExp(formattedKeyword, 'gi')
let description = rev.innerText
if (formattedKeyword !== '') {
try {
count += description.match(substring || []).length
} catch (error) {
count += 0
}
}
// *** −−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−vvvvv
description = description.replace(substring, match => `<span class="highlight" id="search-${index}">${match}</span>`)
rev.innerHTML = description
})
this.highlightedWordCount = count
}
Or, if you need to do this more than once and might end up with duplicate IDs that way, keep a variable that you increment as necessary:
// Somewhere in code that `highlightSearchKeyword` closes over:
let nextId = 1;
// Then:
highlightSearchKeyword () {
const reviews = document.querySelectorAll('span[data-v-071a96ec=""]')
const formattedKeyword = this.searchKeyWord.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, '\\$&')
let count = 0
reviews.forEach(rev => {
const substring = new RegExp(formattedKeyword, 'gi')
let description = rev.innerText
if (formattedKeyword !== '') {
try {
count += description.match(substring || []).length
} catch (error) {
count += 0
}
}
// *** −−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−vvvvvvvv
description = description.replace(substring, match => `<span class="highlight" id="search-${nextId++}">${match}</span>`)
rev.innerHTML = description
})
this.highlightedWordCount = count
}
Side note -- Re: new RegExp(formattedKeyword, 'gi'), two things:
If formattedKeyword has any characters in it that have special meaning to regular expressions (like () or $ or ^ or...), the regular expression won't be what you expect it to be. This question's answers have various implementations of an "escape" function for regular expressions you could use to avoid that concern.
Note that it will do a substring match (it'll find nan in banana, for instance). If that's what you want, great, but if not add \b at the beginning and end.
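Both points can be sketched together. The `escapeRegExp` helper below is one common implementation (an assumption here, not something from the question), and `wholeWordRegExp` is a hypothetical name. Note that `\b` only behaves as expected when the keyword starts and ends with a word character.

```javascript
// One common way to escape regex metacharacters in user input.
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Hypothetical helper: build a case-insensitive, whole-word pattern.
const wholeWordRegExp = (keyword) =>
  new RegExp(`\\b${escapeRegExp(keyword)}\\b`, "gi");

const text = "A banana is not a nan.";
console.log(text.match(new RegExp(escapeRegExp("nan"), "gi"))); // [ 'nan', 'nan' ]
console.log(text.match(wholeWordRegExp("nan")));                // [ 'nan' ]
```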

javascript: Replace values in a string based on an array, efficient solution

I have a sample code but I am looking for the most efficient solution. Sure I can loop twice through the array and string but I was wondering if I could just do a prefix search character per character and identify elements to be replaced. My code does not really do any of that since my regex is broken.
const dict = {
'\\iota': 'ι',
'\\nu': 'ν',
'\\omega': 'ω',
'\\\'e': 'é',
'^e': 'ᵉ'
}
const value = 'Ko\\iota\\nu\\omega L\'\\\'ecole'
const replaced = value.replace(/\b\w+\b/g, ($m) => {
console.log($m)
const key = dict[$m]
console.log(key)
return (typeof key !== 'undefined') ? key : $m
})
Your keys are not fully word characters, so \b\w+\b will not match them. Construct the regex from the keys instead:
// https://stackoverflow.com/questions/3446170/escape-string-for-use-in-javascript-regex
const escapeRegExp = string => string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
const dict = {
'\\iota': 'ι',
'\\nu': 'ν',
'\\omega': 'ω',
'\\\'e': 'é',
'^e': 'ᵉ'
}
const value = 'Ko\\iota\\nu\\omega L\'\\\'ecole'
const pattern = new RegExp(Object.keys(dict).map(escapeRegExp).join('|'), 'g');
const replaced = value.replace(pattern, match => dict[match]);
console.log(replaced);
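One thing to watch with a generated alternation: the regex engine takes the first alternative that matches, so if one key is a prefix of another, the longer key should come first. A small sketch with a made-up dictionary (the keys `\o` and `\omega` are chosen only to show the overlap):

```javascript
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Made-up dictionary in which one key is a prefix of another.
const dict = { "\\o": "ο", "\\omega": "ω" };

// Sort keys longest-first so "\omega" is tried before "\o".
const keys = Object.keys(dict).sort((a, b) => b.length - a.length);
const pattern = new RegExp(keys.map(escapeRegExp).join("|"), "g");

const replaced = "\\omega and \\o".replace(pattern, (match) => dict[match]);
console.log(replaced); // ω and ο
```

With the keys in their original order, `\o` would match first and leave a stray `mega` behind.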

How to match two non-consecutive words in a String for React/Javascript search?

I have a search bar which relies on this filter method.
I concatenate all the search strings in a variable concat, and then use either .includes() or .match() as shown below. If searched for multiple words, this only returns a result if the words occur consecutively in concat.
However, I want it to match ANY two words in concat, not just consecutive ones. Is there a way to do this easily?
.filter((frontMatter) => {
var concat =
frontMatter.summary +
frontMatter.title +
frontMatter.abc+
frontMatter.def+
frontMatter.ghi+
frontMatter.jkl;
return concat.toLowerCase().match(searchValue.toLowerCase());
});
I also tried:
.filter((frontMatter) => {
const concat =
frontMatter.summary +
frontMatter.title +
frontMatter.abc+
frontMatter.def+
frontMatter.ghi+
frontMatter.jkl;
return concat.toLowerCase().includes(searchValue.toLowerCase());
});
Thanks!
Everything is explained in the comments of the code.
If you don't care that "deter" matches the word "undetermined"
.filter((frontMatter) => {
// Get the front matter into a string, separated by spaces
const concat = Object.values(frontMatter).join(" ").toLowerCase();
// Look for a string in quotes, if not then just find a word
const regex = /\"([\w\s\\\-]+)\"|([\w\\\-]+)/g;
// Get all the queries
const queries = [...searchValue.toLowerCase().matchAll(regex)].map((arr) => arr[1] || arr[2]);
// Make sure that every query is satisfied
return queries.every((q) => concat.includes(q));
});
If you DO care that "deter" should NOT match the word "undetermined"
.filter((frontMatter) => {
// Get the front matter into a string, separated by spaces
// The prepended and appended spaces are important for the regex later!
const concat = ` ${Object.values(frontMatter).join(" ").toLowerCase()} `;
// Look for a string in quotes, if not then just find a word
const regex = /\"([\w\s\\\-]+)\"|([\w\\\-]+)/g;
// Get all the queries
const queries = [...searchValue.toLowerCase().matchAll(regex)].map((arr) => arr[1] || arr[2]);
// Make sure that every query is satisfied
// [\\s\\.?!_] and [\\s\\.?!_] check for a space or punctuation at the beginning and end of a word
// so that something like "deter" isn't matching inside of "undetermined"
return queries.every((q) => new RegExp(`[\\s\\.?!_]${q}[\\s\\.?!_]`).test(concat));
});
I'd use .reduce to count up the number of matches, and return true if there are at least 2:
const props = ['summary', 'title', 'abc', 'def', 'ghi', 'jkl'];
// ...
.filter((frontMatter) => {
const lowerSearch = searchValue.toLowerCase();
const matchCount = props.reduce(
(a, prop) => a + lowerSearch.includes(frontMatter[prop].toLowerCase()),
0
);
return matchCount >= 2;
})

Typescript parse string into groups of digits as 345-67 and text with both words and digits

My string is in format "[111-11] text here with digits 111, [222-22-22]; 333-33 text here" and I want to parse so that I have the code [111-11], [222-22-22], [333-33] and its corresponding text.
I don't have fixed splitter except for the code xxx-xx or xxx-xx-xx.
I tried in this way but it fails to get digits at desc part. \D will get anything but digits.
let text = "[111-11] text here with digits 111, [222-22-22]; 333-33 text here";
let codes=[];
let result = text.replace(/(\d{3}(-\d{2})+)(\D*)/g,(str, code, c, desc) => {
desc = desc.trim().replace(/[\[\]']+/g,'');
if (code) codes.push({'code':code.trim(),'desc': desc});
return str;
}); //parse and split codes
Finally, I want result in this style:
[{code:'111-11', desc:'text here with digits 111'},
{code:'222-22-22', desc:''},
{code:'333-33', desc:'text here'}]
I really appreciate the help.
You could match the optionally bracketed code and the text that follows it in separate groups, using a positive lookahead for the next code or the end of the string. Then destructure each match and push the wanted object.
const regex = /\[?(\d{3}(-\d\d)+)\]?(.*?)(?=\[?\d{3}(-\d\d)+\]?|$)/gm;
const str = `[111-11] text here with digits 111, [222-22-22]; 333-33 text here`;
var m,
code, desc,
result= [];
while ((m = regex.exec(str)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
({ 1: code, 3: desc } = m);
result.push({ code, desc })
}
console.log(result);
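The same regex can also be driven with `matchAll`, which avoids managing `lastIndex` by hand. This is a sketch rather than part of the answer above: the cleanup step that trims whitespace and leading or trailing `,` and `;` from each description is an added assumption about the desired output.

```javascript
const regex = /\[?(\d{3}(-\d\d)+)\]?(.*?)(?=\[?\d{3}(-\d\d)+\]?|$)/gm;
const str = "[111-11] text here with digits 111, [222-22-22]; 333-33 text here";

const result = Array.from(str.matchAll(regex), (m) => ({
  code: m[1],
  // Assumption: separators and whitespace around the description are noise.
  desc: m[3].replace(/^[\s,;]+|[\s,;]+$/g, ""),
}));

console.log(result);
```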
Another approach:
const Sep = ','
const PatSep = /[,;]/g
// skip leading spaces, then capture the code (ignoring the brackets
// if present), then capture the rest
const PatPart = /^\s*\[?(\d{3}(-\d{2})+)]?(.*)$/
const src =
"[111-11] text here with digits 111, [222-22-22]; 333-33 text here"
const parse = src => {
// normalize the different separators to a single one
const normalized = src.replace (PatSep, Sep)
return normalized.split (Sep).reduce((acc, part) => {
// getting code and desc from part
const [_, code, __, desc] = part.match (PatPart)
// accumulating in desired array
return [
...acc,
{code, desc}
]
}, [])
}
console.log(parse (src))
;)

RegEx to extract all matches from string using RegExp.exec

I'm trying to parse the following kind of string:
[key:"val" key2:"val2"]
where there are arbitrary key:"val" pairs inside. I want to grab the key name and the value.
For those curious I'm trying to parse the database format of task warrior.
Here is my test string:
[description:"aoeu" uuid:"123sth"]
which is meant to highlight that anything can be in a key or value aside from space, no spaces around the colons, and values are always in double quotes.
In node, this is my output:
[deuteronomy][gatlin][~]$ node
> var re = /^\[(?:(.+?):"(.+?)"\s*)+\]$/g
> re.exec('[description:"aoeu" uuid:"123sth"]');
[ '[description:"aoeu" uuid:"123sth"]',
'uuid',
'123sth',
index: 0,
input: '[description:"aoeu" uuid:"123sth"]' ]
But description:"aoeu" also matches this pattern. How can I get all matches back?
Continue calling re.exec(s) in a loop to obtain all the matches:
var re = /\s*([^[:]+):\"([^"]+)"/g;
var s = '[description:"aoeu" uuid:"123sth"]';
var m;
do {
m = re.exec(s);
if (m) {
console.log(m[1], m[2]);
}
} while (m);
Try it with this JSFiddle: https://jsfiddle.net/7yS2V/
str.match(pattern), if pattern has the global flag g, will return all the matches as an array.
For example:
const str = 'All of us except #Emran, #Raju and #Noman were there';
console.log(
str.match(/#\w*/g)
);
// Will log ["#Emran", "#Raju", "#Noman"]
To loop through all matches, you can use the replace function:
var re = /\s*([^[:]+):\"([^"]+)"/g;
var s = '[description:"aoeu" uuid:"123sth"]';
s.replace(re, function(match, g1, g2) { console.log(g1, g2); });
This is a solution
var s = '[description:"aoeu" uuid:"123sth"]';
var re = /\s*([^[:]+):\"([^"]+)"/g;
var m;
while (m = re.exec(s)) {
console.log(m[1], m[2]);
}
This is based on lawnsea's answer, but shorter.
Notice that the `g' flag must be set to move the internal pointer forward across invocations.
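The role of the `g` flag can be seen by watching `lastIndex`, the property that `exec` reads and updates. A small illustration (the sample string is arbitrary):

```javascript
const globalRe = /a/g;
const plainRe = /a/;
const s = "aXa";

// With g, each call resumes from lastIndex and eventually returns null.
console.log(globalRe.exec(s).index, globalRe.lastIndex); // 0 1
console.log(globalRe.exec(s).index, globalRe.lastIndex); // 2 3
console.log(globalRe.exec(s), globalRe.lastIndex);       // null 0

// Without g, lastIndex is ignored and every call finds the first match again,
// so a `while (m = re.exec(s))` loop would never terminate.
console.log(plainRe.exec(s).index, plainRe.lastIndex);   // 0 0
console.log(plainRe.exec(s).index, plainRe.lastIndex);   // 0 0
```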
str.match(/regex/g)
returns all matches as an array.
If, for some mysterious reason, you need the additional information that comes with exec, then as an alternative to the previous answers you could use a recursive function instead of a loop, as follows (which also looks cooler :).
function findMatches(regex, str, matches = []) {
const res = regex.exec(str)
res && matches.push(res) && findMatches(regex, str, matches)
return matches
}
// Usage
const matches = findMatches(/regex/g, str)
As stated in the comments before, it's important to have g at the end of the regex definition to move the pointer forward on each execution.
We are finally beginning to see a built-in matchAll function, see here for the description and compatibility table. It looks like as of May 2020, Chrome, Edge, Firefox, and Node.js (12+) are supported but not IE, Safari, and Opera. Seems like it was drafted in December 2018 so give it some time to reach all browsers, but I trust it will get there.
The built-in matchAll function is nice because it returns an iterable. It also returns capturing groups for every match! So you can do things like
// get the letters before and after "o"
const matches = "stackoverflow".matchAll(/(\w)o(\w)/g);
for (const match of matches) {
console.log("letter before:" + match[1]);
console.log("letter after:" + match[2]);
}
// the iterator above is now exhausted, so call matchAll again to build an array
const arrayOfAllMatches = [..."stackoverflow".matchAll(/(\w)o(\w)/g)];
It also seems that every match object uses the same format as match(). Each object is an array of the match and capturing groups, along with three additional properties: index, input, and groups. So it looks like:
[<match>, <group1>, <group2>, ..., index: <match offset>, input: <original string>, groups: <named capture groups>]
For more information about matchAll there is also a Google developers page. There are also polyfills/shims available.
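Applied to the key/value string from the question, `matchAll` can feed `Object.fromEntries` to build an object in one pass. A sketch assuming an environment that provides both (`Object.fromEntries` arrived in ES2019); the regex is the one used in the earlier answers, and the name `parsed` is illustrative.

```javascript
const re = /\s*([^[:]+):"([^"]+)"/g;
const s = '[description:"aoeu" uuid:"123sth"]';

// Each match contributes one [key, value] pair taken from its capture groups.
const parsed = Object.fromEntries(
  Array.from(s.matchAll(re), (m) => [m[1], m[2]])
);

console.log(parsed); // { description: 'aoeu', uuid: '123sth' }
```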
If you have ES2020
(Meaning your environment, such as Chrome, Node.js, or Firefox, supports ECMAScript 2020 or later, which is where matchAll was standardized)
Use the new yourString.matchAll( /your-regex/g ).
If you don't have ES2020
If you have an older system, here's a function for easy copy and pasting
function findAll(regexPattern, sourceString) {
let output = []
let match
// auto-add global flag while keeping others as-is
let regexPatternWithGlobal = RegExp(regexPattern,[...new Set("g"+regexPattern.flags)].join(""))
while (match = regexPatternWithGlobal.exec(sourceString)) {
// get rid of the string copy
delete match.input
// store the match data
output.push(match)
}
return output
}
example usage:
console.log( findAll(/blah/g,'blah1 blah2') )
outputs:
[ [ 'blah', index: 0 ], [ 'blah', index: 6 ] ]
Based on Agus's function, but I prefer to return just the match values:
var bob = "&gt; bob &lt;";
function matchAll(str, regex) {
var res = [];
var m;
if (regex.global) {
while (m = regex.exec(str)) {
res.push(m[1]);
}
} else {
if (m = regex.exec(str)) {
res.push(m[1]);
}
}
return res;
}
var Amatch = matchAll(bob, /(&.*?;)/g);
console.log(Amatch); // yields: ["&gt;", "&lt;"]
Iterables are nicer:
const matches = (text, pattern) => ({
[Symbol.iterator]: function * () {
const clone = new RegExp(pattern.source, pattern.flags);
let match = null;
do {
match = clone.exec(text);
if (match) {
yield match;
}
} while (match);
}
});
Usage in a loop:
for (const match of matches('abcdefabcdef', /ab/g)) {
console.log(match);
}
Or if you want an array:
[ ...matches('abcdefabcdef', /ab/g) ]
Here is my function to get the matches :
function getAllMatches(regex, text) {
if (regex.constructor !== RegExp) {
throw new Error('not RegExp');
}
var res = [];
var match = null;
if (regex.global) {
while (match = regex.exec(text)) {
res.push(match);
}
}
else {
if (match = regex.exec(text)) {
res.push(match);
}
}
return res;
}
// Example:
var regex = /abc|def|ghi/g;
var res = getAllMatches(regex, 'abcdefghi');
res.forEach(function (item) {
console.log(item[0]);
});
If you're able to use matchAll here's a trick:
Array.from takes a mapping function as its second parameter, so instead of ending up with an array of awkward 'match' results you can project each one to what you really need:
Array.from(str.matchAll(regexp), m => m[0]);
If you have named groups, e.g. (/(?<firstName>[A-Z][a-z]+)/g), you could do this:
Array.from(str.matchAll(regexp), m => m.groups.firstName);
Since ES2020 (String.prototype.matchAll), there's a simpler, better way of getting all the matches, together with information about the capture groups and their index:
const string = 'Mice like to dice rice';
const regex = /.ice/gu;
for(const match of string.matchAll(regex)) {
console.log(match);
}
// ["Mice", index: 0, input: "Mice like to dice rice", groups: undefined]
// ["dice", index: 13, input: "Mice like to dice rice", groups: undefined]
// ["rice", index: 18, input: "Mice like to dice rice", groups: undefined]
It is currently supported in Chrome, Firefox, Opera. Depending on when you read this, check this link to see its current support.
Use this...
var all_matches = your_string.match(re);
console.log(all_matches)
It will return an array of all matches, provided re has the g flag. That works just fine.
But remember that it won't take groups into account; it will just return the full matches.
I would definitely recommend using the String.match() function and creating a relevant RegEx for it. My example uses a list of strings, which is often necessary when scanning user input for keywords and phrases.
// 1) Define keywords
var keywords = ['apple', 'orange', 'banana'];
// 2) Create regex, pass "i" for case-insensitive and "g" for global search
var regex = new RegExp("(" + keywords.join('|') + ")", "ig");
// => /(apple|orange|banana)/gi
// 3) Match it against any string to get all matches
"Test string for ORANGE's or apples were mentioned".match(regex);
// => ["ORANGE", "apple"]
Hope this helps!
This isn't really going to help with your more complex issue but I'm posting this anyway because it is a simple solution for people that aren't doing a global search like you are.
I've simplified the regex in the answer to be clearer (this is not a solution to your exact problem).
var re = /^(.+?):"(.+)"$/
var regExResult = re.exec('description:"aoeu"');
var purifiedResult = purify_regex(regExResult);
// We only want the group matches in the array
function purify_regex(reResult){
// Removes the Regex specific values and clones the array to prevent mutation
let purifiedArray = [...reResult];
// Removes the full match value at position 0
purifiedArray.shift();
// Returns a pure array without mutating the original regex result
return purifiedArray;
}
// purifiedResult= ["description", "aoeu"]
That looks more verbose than it is because of the comments. This is what it looks like without them:
var re = /^(.+?):"(.+)"$/
var regExResult = re.exec('description:"aoeu"');
var purifiedResult = purify_regex(regExResult);
function purify_regex(reResult){
let purifiedArray = [...reResult];
purifiedArray.shift();
return purifiedArray;
}
Note that any groups that do not match will be listed in the array as undefined values.
This solution uses the ES6 spread operator to purify the array of regex specific values. You will need to run your code through Babel if you want IE11 support.
Here's a one line solution without a while loop.
The order is preserved in the resulting list.
The potential downsides are
It clones the regex for every match.
The result is in a different form than expected solutions. You'll need to process them one more time.
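let re = /\s*([^[:]+):\"([^"]+)"/g;
let str = '[description:"aoeu" uuid:"123sth"]';
// the semicolon above matters: without it, the next line would be parsed as a call on the string
(str.match(re) || []).map(e => RegExp(re.source, re.flags).exec(e));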
[ [ 'description:"aoeu"',
'description',
'aoeu',
index: 0,
input: 'description:"aoeu"',
groups: undefined ],
[ ' uuid:"123sth"',
'uuid',
'123sth',
index: 0,
input: ' uuid:"123sth"',
groups: undefined ] ]
My guess is that if there would be edge cases such as extra or missing spaces, this expression with less boundaries might also be an option:
^\s*\[\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*\]\s*$
If you wish to explore, simplify, or modify the expression, it is explained in the top-right panel of regex101.com, where you can also see how it matches against some sample inputs.
Test
const regex = /^\s*\[\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*\]\s*$/gm;
const str = `[description:"aoeu" uuid:"123sth"]
[description : "aoeu" uuid: "123sth"]
[ description : "aoeu" uuid: "123sth" ]
[ description : "aoeu" uuid : "123sth" ]
[ description : "aoeu"uuid : "123sth" ] `;
let m;
while ((m = regex.exec(str)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
// The result can be accessed through the `m`-variable.
m.forEach((match, groupIndex) => {
console.log(`Found match, group ${groupIndex}: ${match}`);
});
}
RegEx Circuit: jex.im visualizes regular expressions.
const re = /^\[(?:(.+?):"(.+?)"\s*)+\]$/g
const matches = [...re.exec('[description:"aoeu" uuid:"123sth"]').entries()]
console.log(matches)
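Basically, this spreads the iterator returned by entries() (called on the match array that exec returns) into a regular array of [index, value] pairs. Note that a repeated capture group only retains its last repetition, so this still returns only the uuid pair.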
Here is my answer:
var str = '[me nombre es] : My name is. [Yo puedo] is the right word';
var reg = /\[(.*?)\]/g;
var a = str.match(reg);
a = a.toString().replace(/[\[\]]/g, "").split(',');
