Javascript split string on space or on quotes to array

var str = 'single words "fixed string of words"';
var astr = str.split(" "); // need fix
I would like the array to be like this:
var astr = ["single", "words", "fixed string of words"];

The accepted answer is not entirely correct. It splits on non-word characters such as . and - (not only on spaces) and leaves the quotes in the results. A better way to do this, one that excludes the quotes, is to use a capturing group:
//The parentheses in the regex create a capturing group inside the quotes
var myRegexp = /[^\s"]+|"([^"]*)"/gi;
var myString = 'single words "fixed string of words"';
var myArray = [];
do {
//Each call to exec returns the next regex match as an array
var match = myRegexp.exec(myString);
if (match != null)
{
//Index 1 in the array is the captured group if it exists
//Index 0 is the matched text, which we use if no captured group exists
myArray.push(match[1] ? match[1] : match[0]);
}
} while (match != null);
myArray will now contain exactly what the OP asked for:
single,words,fixed string of words
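On engines that support String.prototype.matchAll (ES2020), the same capturing-group idea can probably be written more compactly. The sketch below is not part of the original answer, and the helper name splitArgs is hypothetical. It compares the group against undefined instead of testing truthiness, so an empty quoted string ("") should come out as an empty element rather than as the two quote characters.

```javascript
// Hypothetical helper (name is an assumption): split on whitespace,
// keeping double-quoted phrases together and dropping the quotes.
function splitArgs(input) {
  const pattern = /[^\s"]+|"([^"]*)"/g;
  // match[1] is the text inside the quotes; it is undefined for unquoted tokens.
  return Array.from(input.matchAll(pattern), (match) =>
    match[1] !== undefined ? match[1] : match[0]
  );
}

console.log(splitArgs('single words "fixed string of words"'));
```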

str.match(/\w+|"[^"]+"/g)
//single, words, "fixed string of words"

This uses a mix of split and regex matching.
var str = 'single words "fixed string of words"';
var matches = /".+?"/.exec(str);
str = str.replace(/".+?"/, "").replace(/^\s+|\s+$/g, "");
var astr = str.split(" ");
if (matches) {
for (var i = 0; i < matches.length; i++) {
astr.push(matches[i].replace(/"/g, ""));
}
}
This returns the expected result, although a single regexp should be able to do it all.
// ["single", "words", "fixed string of words"]
Update
And this is the improved version of the method proposed by S.Mark
var str = 'single words "fixed string of words"';
var aStr = str.match(/\w+|"[^"]+"/g), i = aStr.length;
while(i--){
aStr[i] = aStr[i].replace(/"/g,"");
}
// ["single", "words", "fixed string of words"]

Here might be a complete solution:
https://github.com/elgs/splitargs

ES6 solution supporting:
Split by space except for inside quotes
Removing quotes but not for backslash escaped quotes
Escaped quote become quote
Can put quotes anywhere
Code:
str.match(/\\?.|^$/g).reduce((p, c) => {
if(c === '"'){
p.quote ^= 1;
}else if(!p.quote && c === ' '){
p.a.push('');
}else{
p.a[p.a.length-1] += c.replace(/\\(.)/,"$1");
}
return p;
}, {a: ['']}).a
Output:
[ 'single', 'words', 'fixed string of words' ]
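To make the escaped-quote behaviour easier to try, the reducer can be wrapped in a function. This is only a sketch: the name splitQuoted and the state property names are assumptions, and it assumes single-line input, because the tokenizing regex does not match newline characters.

```javascript
// Hypothetical wrapper (name is an assumption) around the reducer approach.
function splitQuoted(str) {
  // Tokenize into single characters, keeping a backslash together with the
  // character it escapes. The ^$ alternative lets the empty string produce
  // one empty token instead of a null match result.
  return str.match(/\\?.|^$/g).reduce((state, token) => {
    if (token === '"') {
      // An unescaped quote toggles quoted mode and is dropped.
      state.inQuote = !state.inQuote;
    } else if (!state.inQuote && token === ' ') {
      // A space outside quotes starts a new word.
      state.words.push('');
    } else {
      // Anything else is appended; an escaped character loses its backslash.
      state.words[state.words.length - 1] += token.replace(/\\(.)/, '$1');
    }
    return state;
  }, { words: [''], inQuote: false }).words;
}

console.log(splitQuoted('single words "fixed string of words"'));
console.log(splitQuoted('a "b \\"c\\" d" e'));
```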

This will split it into an array and strip off the surrounding quotes from any remaining string.
const parseWords = (words = '') =>
(words.match(/[^\s"]+|"([^"]*)"/gi) || []).map((word) =>
word.replace(/^"(.+(?="$))"$/, '$1'))

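This solution is intended to work for both double (") and single (') quotes. Single-quoted phrases need their own alternative in the pattern:
Code:
str.match(/[^\s"']+|"([^"]*)"|'([^']*)'/gmi)
// match() with the g flag returns whole matches, so the quotes are kept:
// ["single", "words", "\"fixed string of words\""]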
Here it shows how this regular expression would work: https://regex101.com/r/qa3KxQ/2

Until I found @dallin's answer (this thread: https://stackoverflow.com/a/18647776/1904943), I was having difficulty processing strings with a mix of unquoted and quoted terms / phrases, via JavaScript.
In researching that issue, I ran a number of tests.
As I found it difficult to find this information, I have collated the relevant information (below), which may be useful to others seeking answers on the processing in JavaScript of strings containing quoted words.
let q = 'apple banana "nova scotia" "british columbia"';
Extract [only] quoted words and phrases:
// https://stackoverflow.com/questions/12367126/how-can-i-get-a-substring-located-between-2-quotes
const r = q.match(/"([^']+)"/g);
console.log('r:', r)
// r: Array [ "\"nova scotia\" \"british columbia\"" ]
console.log('r:', r.toString())
// r: "nova scotia" "british columbia"
// ----------------------------------------
// [alternate regex] https://www.regextester.com/97161
const s = q.match(/"(.*?)"/g);
console.log('s:', s)
// s: Array [ "\"nova scotia\"", "\"british columbia\"" ]
console.log('s:', s.toString())
// s: "nova scotia","british columbia"
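If only the text between the quotes is wanted, without the quote characters, one option is to read the capturing group from matchAll. This is a sketch added for illustration; the variable names are assumptions and it relies on String.prototype.matchAll (ES2020).

```javascript
const query = 'apple banana "nova scotia" "british columbia"';
// Group 1 holds the text between each pair of double quotes.
const phrases = Array.from(query.matchAll(/"([^"]*)"/g), (m) => m[1]);
console.log(phrases);
```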
Extract [all] unquoted, quoted words and phrases:
// https://stackoverflow.com/questions/2817646/javascript-split-string-on-space-or-on-quotes-to-array
const t = q.match(/\w+|"[^"]+"/g);
console.log('t:', t)
// t: Array(4) [ "apple", "banana", "\"nova scotia\"", "\"british columbia\"" ]
console.log('t:', t.toString())
// t: apple,banana,"nova scotia","british columbia"
// ----------------------------------------------------------------------------
// https://stackoverflow.com/questions/2817646/javascript-split-string-on-space-or-on-quotes-to-array
// [@dallin's answer (this thread)] https://stackoverflow.com/a/18647776/1904943
var myRegexp = /[^\s"]+|"([^"]*)"/gi;
var myArray = [];
do {
/* Each call to exec returns the next regex match as an array. */
var match = myRegexp.exec(q); // << "q" = my query (string)
if (match != null)
{
/* Index 1 in the array is the captured group if it exists.
* Index 0 is the matched text, which we use if no captured group exists. */
myArray.push(match[1] ? match[1] : match[0]);
}
} while (match != null);
console.log('myArray:', myArray, '| type:', typeof(myArray))
// myArray: Array(4) [ "apple", "banana", "nova scotia", "british columbia" ] | type: object
console.log(myArray.toString())
// apple,banana,nova scotia,british columbia
Work with a set (rather than an array):
// https://stackoverflow.com/questions/28965112/javascript-array-to-set
var mySet = new Set(myArray);
console.log('mySet:', mySet, '| type:', typeof(mySet))
// mySet: Set(4) [ "apple", "banana", "nova scotia", "british columbia" ] | type: object
Iterating over set elements:
mySet.forEach(x => console.log(x));
/* apple
* banana
* nova scotia
* british columbia
*/
// https://stackoverflow.com/questions/16401216/iterate-over-set-elements
const myArrayFromSet = Array.from(mySet);
for (let i=0; i < myArrayFromSet.length; i++) {
console.log(i + ':', myArrayFromSet[i])
}
/*
0: apple
1: banana
2: nova scotia
3: british columbia
*/
Asides
The JavaScript output above is from the Firefox Developer Tools (F12, from a web page). I created a blank HTML file that calls a .js file that I edit with Vim, as my IDE (see: Simple JavaScript IDE).
Based on my tests, the cloned set appears to be a deep copy, but that is only because its elements are strings (primitives). new Set(iterable) copies references, so the clone is shallow (see: Shallow-clone an ES6 Map or Set).
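Whether a copied Set is deep or shallow only becomes observable when its elements are objects. The following sketch (variable names are illustrative, not from the answer) suggests that the Set container is copied while the elements themselves are shared.

```javascript
// Elements are objects here so that shared references become observable.
const original = new Set([{ name: 'apple' }, { name: 'banana' }]);
const clone = new Set(original);

// The containers are independent: adding to the clone leaves the original alone.
clone.add({ name: 'cherry' });

// The elements are shared: mutating one through the clone shows up in the original.
const [firstOfClone] = clone;
firstOfClone.name = 'apricot';
const [firstOfOriginal] = original;

console.log(original.size, clone.size);
console.log(firstOfOriginal.name);
```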

I noticed the disappearing characters, too. I think you can include them - for example, to have it include "+" with the word, use something like "[\w\+]" instead of just "\w".
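As a rough illustration of that suggestion, the character class can be widened to keep a few punctuation characters with the word. Which characters to allow is an assumption here (+, . and -); adjust the class to the input at hand.

```javascript
const input = 'c++ node.js "hello world" a-b';
// "+" and "." are literal inside a character class; "-" is literal when placed last.
const tokens = input.match(/[\w+.-]+|"[^"]+"/g);
console.log(tokens);
```

As with the plain \w+ version, the quoted phrase keeps its surrounding quotes.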

Related

Finding number of overlapping occurrences of a pattern in a string in JavaScript [duplicate]

Let's say I have the string
"12345"
If I .match(/\d{3}/g), I only get one match, "123". Why don't I get [ "123", "234", "345" ]?

The string#match with a global flag regex returns an array of matched substrings. The /\d{3}/g regex matches and consumes (=reads into the buffer and advances its index to the position right after the currently matched character) a 3-digit sequence. Thus, after "eating up" 123, the index is located after 3, and the only substring left for parsing is 45 - no match here.
I think the technique used at regex101.com is also worth considering here: use a zero-width assertion (a positive lookahead with a capturing group) to test all positions inside the input string. After each test, the RegExp.lastIndex (it's a read/write integer property of regular expressions that specifies the index at which to start the next match) is advanced "manually" to avoid an infinite loop.
Note it is a technique implemented in .NET (Regex.Matches), Python (re.findall), PHP (preg_match_all), Ruby (String#scan) and can be used in Java, too.
Here is a demo using matchAll:
var re = /(?=(\d{3}))/g;
console.log( Array.from('12345'.matchAll(re), x => x[1]) );
Here is an ES5 compliant demo:
var re = /(?=(\d{3}))/g;
var str = '12345';
var m, res = [];
while (m = re.exec(str)) {
if (m.index === re.lastIndex) {
re.lastIndex++;
}
res.push(m[1]);
}
console.log(res);
Here is a regex101.com demo
Note that the same can be written with a "regular" consuming \d{3} pattern by manually setting re.lastIndex to m.index+1 after each successful match:
var re = /\d{3}/g;
var str = '12345';
var m, res = [];
while (m = re.exec(str)) {
res.push(m[0]);
re.lastIndex = m.index + 1; // <- Important
}
console.log(res);
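The lookahead technique can be generalized by wrapping an arbitrary pattern's source in (?=(...)). The helper below is a sketch, not part of the answer: the name overlappingMatches is hypothetical, it assumes String.prototype.matchAll is available, and it assumes the pattern contains no numbered backreferences, since the extra outer group would shift their numbering.

```javascript
// Hypothetical helper (name is an assumption): collect overlapping matches
// by testing the pattern inside a zero-width lookahead at every position.
function overlappingMatches(input, re) {
  // matchAll requires the g flag; keep any other flags such as i.
  const flags = re.flags.includes('g') ? re.flags : re.flags + 'g';
  const lookahead = new RegExp('(?=(' + re.source + '))', flags);
  // Each match is empty; the text of interest is in capturing group 1.
  return Array.from(input.matchAll(lookahead), (m) => m[1]);
}

console.log(overlappingMatches('12345', /\d{3}/));
console.log(overlappingMatches('LOLOL', /lol/i));
```

Because every match is zero-width, matchAll advances the position by one after each match, so no manual lastIndex handling is needed.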

You can't do this with a regex alone, but you can get pretty close:
var pat = /(?=(\d{3}))\d/g;
var results = [];
var match;
while ( (match = pat.exec( '1234567' ) ) != null ) {
results.push( match[1] );
}
console.log(results);
In other words, you capture all three digits inside the lookahead, then go back and match one character in the normal way just to advance the match position. It doesn't matter how you consume that character; . works just as well as \d. And if you're really feeling adventurous, you can use just the lookahead and let JavaScript handle the bump-along.
This code is adapted from this answer. I would have flagged this question as a duplicate of that one, but the OP accepted another, lesser answer.

When an expression matches, it usually consumes the characters it matched. So, after the expression matched 123, only 45 is left, which doesn't match the pattern.

To answer the "How", you can manually change the index of the last match (requires a loop) :
var input = '12345',
re = /\d{3}/g,
r = [],
m;
while (m = re.exec(input)) {
re.lastIndex -= m[0].length - 1;
r.push(m[0]);
}
r; // ["123", "234", "345"]
Here is a function for convenience :
function matchOverlap(input, re) {
var r = [], m;
// prevent infinite loops
if (!re.global) re = new RegExp(
re.source, (re+'').split('/').pop() + 'g'
);
while (m = re.exec(input)) {
re.lastIndex -= m[0].length - 1;
r.push(m[0]);
}
return r;
}
Usage examples :
matchOverlap('12345', /\D{3}/) // []
matchOverlap('12345', /\d{3}/) // ["123", "234", "345"]
matchOverlap('12345', /\d{3}/g) // ["123", "234", "345"]
matchOverlap('1234 5678', /\d{3}/) // ["123", "234", "567", "678"]
matchOverlap('LOLOL', /lol/) // []
matchOverlap('LOLOL', /lol/i) // ["LOL", "LOL"]

I would consider not using a regex for this. If you want to split into groups of three you can just loop over the string starting at the offset:
let s = "12345"
let m = Array.from(s.slice(2), (_, i) => s.slice(i, i+3))
console.log(m)

Use (?=(\w{3}))
(3 being the number of letters in the sequence)

javascript regex pattern with an array

I want to create a regex pattern which should be able to search through an array.
Let's say :
var arr = [ "first", "second", "third" ];
var match = text.match(/<arr>/);
which should be able to match only
<first> or <second> or <third> ......
but should ignore
<ffirst> or <dummy>
I need an efficient approach please .
Any help would be great .
Thanks

First you can do array.map to quote all special regex characters.
Then you can do array.join to join the array elements using | and create an instance of RegExp.
Code:
function quoteSpecial(str) { return str.replace(/([\[\]^$|()\\+*?{}=!.])/g, '\\$1'); }
var arr = [ "first", "second", "third", "fo|ur" ];
var re = new RegExp('<(?:' + arr.map(quoteSpecial).join('|') + ')>');
//=> /<(?:first|second|third|fo\|ur)>/
then use this RegExp object:
'<first>'.match(re); // ["<first>"]
'<ffirst>'.match(re); // null
'<dummy>'.match(re); // null
'<second>'.match(re); // ["<second>"]
'<fo|ur>'.match(re); // ["<fo|ur>"]
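To find every allowed tag in a longer text rather than testing one string at a time, the same pattern can be built with the g flag and a capturing group. This is a sketch under assumptions: the names escapeRegExp and findTags are hypothetical, and the word list is assumed to be non-empty (an empty list would produce a pattern that matches "<>").

```javascript
// Escape characters that are special in a regular expression.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper (name is an assumption): return the words from `words`
// that appear in `text` wrapped in angle brackets, in order of appearance.
function findTags(text, words) {
  const alternatives = words.map((word) => escapeRegExp(word)).join('|');
  const re = new RegExp('<(' + alternatives + ')>', 'g');
  return Array.from(text.matchAll(re), (m) => m[1]);
}

console.log(findTags('<first> <ffirst> <dummy> <third> <fo|ur>',
                     ['first', 'second', 'third', 'fo|ur']));
```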

You should search for a specific word from a list using (a|b|c).
The list is made from the arr by joining the values with | char as glue
var arr = [ "first", "second", "third" ];
var match = text.match(new RegExp("<(?:"+arr.join("|")+")>")); //matches <first> <second> and <third>
Note that if your "source" words might contain characters that are special in regular expressions, you might get into trouble, so you may need to escape those characters before joining the array
A good function for doing so can be found here:
function regexpQuote(str, delimiter) {
return String(str)
.replace(new RegExp('[.\\\\+*?\\[\\^\\]$(){}=!<>|:\\' + (delimiter || '') + '-]', 'g'), '\\$&');
}
so in this case you'll have
function escapeArray(arr){
// Wrap the call so that map's index argument is not passed as the delimiter
return arr.map(function (item) {
return regexpQuote(item);
});
}
var arr = [ "first", "second", "third" ];
var pattern = new RegExp("<(?:"+escapeArray(arr).join("|")+")>");
var match = text.match(pattern); //matches <first> <second> and <third>

How to determine if a string matches a pattern type

Given the following pattern types, where 11 and 22 are variable:
#/projects/11
#/projects/11/tasks/22
With Javascript/jQuery, given var url, how can I determine if var url equals either string 1, 2 or neither?
Thanks

You can do it using a single regular expression:
var reg = /^#\/projects\/(\d+)(?:\/tasks\/(\d+))?$/,
str = "#/projects/11/tasks/22",
match = str.match(reg);
if (match && !match[2]) {
// Match on string 1
} else if (match && match[2]) {
// Match on string 2
} else {
// No match
}
The expression I wrote uses sub-expressions to capture the digits; the result would be an array that looks like this:
"#/projects/11/tasks/22".match(reg);
//-> ["#/projects/11/tasks/22", "11", "22"]
"#/projects/11".match(reg);
//-> ["#/projects/11", "11", undefined]
There are many regular expression tutorials online that will help you understand how to solve problems like this one - I'd recommend searching Google for such a tutorial.
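The branching above could be packaged as a small function that reports which pattern matched along with the captured numbers. This is a sketch only: the name classifyUrl and the shape of the returned object are assumptions made for illustration.

```javascript
// Hypothetical helper (name and return shape are assumptions):
// classify a hash URL as a project URL, a task URL, or neither.
function classifyUrl(url) {
  const match = /^#\/projects\/(\d+)(?:\/tasks\/(\d+))?$/.exec(url);
  if (!match) {
    return null; // neither pattern
  }
  // Group 2 is undefined when the optional /tasks/NN part is absent.
  return match[2] === undefined
    ? { type: 'project', projectId: match[1] }
    : { type: 'task', projectId: match[1], taskId: match[2] };
}

console.log(classifyUrl('#/projects/11'));
console.log(classifyUrl('#/projects/11/tasks/22'));
console.log(classifyUrl('#/projects/abc'));
```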

I would look into Regex personally, as it is easy to set up a pattern and test if a string applies to it. Try this: http://www.regular-expressions.info/javascript.html

Here's a more flexible approach you can use for other urls too http://jsfiddle.net/EXRXE/
/*
in: "#/cat/34/item/24"
out: {
cat: "34",
item: "24"
}
*/
function translateUrl(url) {
// strip everything from the beginning that's not a letter
url = url.replace(/^[^a-zA-Z]*/, "");
var parts = url.split("/");
var obj = {};
for(var i=0; i < parts.length; i+=2) {
obj[parts[i]] = parts[i+1]
}
return obj;
}
var url = translateUrl('#/projects/11/tasks/22');
console.log(url);
if (url.projects) {
console.log("Project is " + url.projects);
}
if (url.tasks) {
console.log("Task is " + url.tasks);
}
