Javascript split by spaces but not those in quotes - javascript

The goal is to split a string at the spaces but not split the text data that is in quotes or separate that from the adjacent text.
The input is effectively a string that contains a list of value pairs. If a value contains a space, it is enclosed in quotes. I need a function that returns an array of value-pair elements as per the example below:
Example Input:
'a:0 b:1 moo:"foo bar" c:2'
Expected result:
a:0,b:1,moo:foo bar,c:2 (An array of length 4)
I have checked through a load of other questions but none of them (I found) seem to cope with my issue. Most seem to split at the space within the quotes or they split the 'moo:' and 'foo bar' into separate parts.
Any assistance would be greatly appreciated,
Craig

You can use this regex for split:
var s = 'a:0 b:1 moo:"foo bar" c:2';
var m = s.split(/ +(?=(?:(?:[^"]*"){2})*[^"]*$)/g);
//=> [a:0, b:1, moo:"foo bar", c:2]
It splits on spaces only when they are outside quotes, using a positive lookahead that makes sure there is an even number of quotes after the space.
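The expected result in the question has the quotes removed (moo:foo bar), while the split above keeps them. A minimal sketch of one way to strip them afterwards, assuming a value never needs to contain a literal double quote; the helper name splitOutsideQuotes is made up for illustration:

```javascript
// Hypothetical helper: split on spaces outside double quotes, then drop the quotes.
function splitOutsideQuotes(input) {
  return input
    .split(/ +(?=(?:(?:[^"]*"){2})*[^"]*$)/)
    .map(function (part) {
      // Assumption: quotes only ever delimit values, so removing them is safe.
      return part.replace(/"/g, '');
    });
}

var pairs = splitOutsideQuotes('a:0 b:1 moo:"foo bar" c:2');
console.log(pairs); // ['a:0', 'b:1', 'moo:foo bar', 'c:2']
```

Note that spaces preceding an unbalanced quote are never split, because the lookahead cannot find an even number of quotes after them, so it may be worth validating the input first.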

You could approach it slightly differently and use a Regular Expression to split where spaces are followed by word characters and a colon (rather than a space that's not in a quoted part):
var str = 'a:0 b:1 moo:"foo bar" c:2',
arr = str.split(/ +(?=[\w]+\:)/g);
/* [a:0, b:1, moo:"foo bar", c:2] */
What's this Regex doing?
It looks for a literal match on the space character, then uses a Positive Lookahead to assert that the next part can be matched:
[\w]+ = match any word character [a-zA-Z0-9_] between one and unlimited times.
\: = match the : character once (backslash escaped).
g = global modifier - don't return on first match.
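One trade-off of splitting on "space followed by key:" is that it does not look at the quotes at all. The sketch below compares it with the quote-counting lookahead on a made-up input whose quoted value happens to contain something that looks like a key; the input string is an assumption for illustration, not taken from the question:

```javascript
// Hypothetical input: the quoted value contains "bar:", which looks like a key.
var tricky = 'a:0 moo:"foo bar:baz" c:2';

// Split where a space is followed by word characters and a colon.
var byKey = tricky.split(/ +(?=\w+:)/);

// Split where a space is followed by an even number of double quotes.
var byQuotes = tricky.split(/ +(?=(?:(?:[^"]*"){2})*[^"]*$)/);

console.log(byKey);    // ['a:0', 'moo:"foo', 'bar:baz"', 'c:2']
console.log(byQuotes); // ['a:0', 'moo:"foo bar:baz"', 'c:2']
```

If quoted values can never contain a word followed by a colon, both patterns give the same result and the shorter one is easier to read.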

Any special reason it has to be a regexp?
var str = 'a:0 b:1 moo:"foo bar" c:2';
var parts = [];
var currentPart = "";
var isInQuotes= false;
for (var i = 0; i < str.length; i++) {
var char = str.charAt(i);
if (char === " " && !isInQuotes) {
parts.push(currentPart);
currentPart = "";
} else {
currentPart += char;
}
if (char === '"') {
isInQuotes = !isInQuotes;
}
}
if (currentPart) parts.push(currentPart);
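The same character-by-character idea can be wrapped in a function. This is only a sketch under two assumptions that go slightly beyond the loop above: the quotes themselves are dropped (to match the expected output in the question) and runs of several spaces do not produce empty parts. The function name is hypothetical:

```javascript
// Hypothetical helper: walk the string once, toggling a flag at each double quote.
function splitRespectingQuotes(str) {
  var parts = [];
  var current = '';
  var inQuotes = false;
  for (var i = 0; i < str.length; i++) {
    var ch = str.charAt(i);
    if (ch === '"') {
      inQuotes = !inQuotes; // entering or leaving a quoted value; the quote is not kept
    } else if (ch === ' ' && !inQuotes) {
      if (current) parts.push(current); // skip empty parts from repeated spaces
      current = '';
    } else {
      current += ch;
    }
  }
  if (current) parts.push(current);
  return parts;
}

console.log(splitRespectingQuotes('a:0  b:1 moo:"foo bar" c:2'));
// ['a:0', 'b:1', 'moo:foo bar', 'c:2']
```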

Related

How to slice optional arguments in RegEx?

Actually i have the following RegExp expression:
/^(?:(?:\,([A-Za-z]{5}))?)+$/g
So the accepted input should be something like ,IGORA, but ,IGORA,GIANC,LOLLI is also valid, and in that case I would like to slice the string into 3 groups. In general, the number of groups should equal the number of items in the user input that passes the RegExp test.
I was trying to do something like this in JavaScript, but it returns only the last value:
var str = ',GIANC,IGORA';
var arr = str.match(/^(?:(?:\,([A-Za-z]{5}))?)+$/).slice(1);
alert(arr);
So the output is 'IGORA', while I would like it to be 'GIANC' 'IGORA'.
Here is another example
/^([A-Z]{5})(?:(?:\,([A-Za-z]{2}))?)+$/g
The input must contain at least one 5-character string, but it can also contain further 5-character strings separated by commas, so from the input
IGORA,CIAOA,POPOP
I would have an array of ["IGORA","CIAOA","POPOP"]
You can capture the words in a capturing group surrounded by an optional preceding comma and an optional trailing comma.
You can test the regex here: ,?([A-Za-z]+),?
const pattern = /,?([A-Za-z]+),?/gm;
const str = `,IGORA,GIANC,LOLLI`;
let matches = [];
let m;
// Iterate until no match found
while ((m = pattern.exec(str))) {
// The first captured group is the match
matches.push(m[1]);
}
console.log(matches);
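The reason the original attempt returns only 'IGORA' is that a capturing group inside a repeated group keeps only its last iteration. One possible way around this, sketched below, is to validate the whole string with the anchored pattern first and then collect the items with a second, global pattern. It assumes an environment that supports String.prototype.matchAll (ES2020), and the function name extractCodes is made up for illustration:

```javascript
// Hypothetical helper: validate the whole input, then collect every 5-letter item.
function extractCodes(input) {
  // Step 1: the whole string must be one or more ",XXXXX" items.
  if (!/^(?:,[A-Za-z]{5})+$/.test(input)) {
    return [];
  }
  // Step 2: a global pattern visits each item; group 1 is the item without the comma.
  return Array.from(input.matchAll(/,([A-Za-z]{5})/g), function (m) {
    return m[1];
  });
}

console.log(extractCodes(',IGORA,GIANC,LOLLI')); // ['IGORA', 'GIANC', 'LOLLI']
console.log(extractCodes(',IGORA,GI'));          // []
```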
There are other ways to do this, but I found that one of the simple ways is by using the replace method, as it can replace all instances that match that regex.
For example:
var regex = /,[A-Za-z]{5}/g; // unanchored, so the callback runs once per item
var str = ',GIANC,IGORA';
var arr = [];
str.replace(regex, function(match) {
arr[arr.length] = match;
return match;
});
console.log(arr);
Also, in my code snippet you can see that there is an extra comma at the start of each string; you can solve that by changing line 5 to arr[arr.length] = match.replace(/^,/, '').
Is this what you're looking for?
Explanation:
\b word boundary (starting or ending a word)
\w a word character ([A-Za-z0-9_])
{5} exactly 5 of the previous token
So it matches all 5-character words but not NANANANA
var str = 'IGORA,CIAOA,POPOP,NANANANA';
var arr = str.match(/\b\w{5}\b/g);
console.log(arr); //['IGORA', 'CIAOA', 'POPOP']
If you only wish to select words separated by commas and nothing else, you can test for them like so:
(?<=,\s*|^) preceded by , with any number of trailing space, OR is the first word in list.
(?=,\s*|$) followed by , and any number of trailing spaces OR is last word in list.
In the following code, POPOP and MOMMA are rejected because they are not separated by a comma, and NANANANA fails because it is not 5 characters long.
var str = 'IGORA, CIAOA, POPOP MOMMA, NANANANA, MEOWI';
var arr = str.match(/(?<=,\s*|^)\b\w{5}\b(?=,\s*|$)/g);
console.log(arr); //['IGORA', 'CIAOA', 'MEOWI']
If you can't have any trailing spaces after the comma, just leave out the \s* from both (?<=,\s*|^) and (?=,\s*|$).
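Lookbehind assertions such as (?<=...) are a comparatively recent addition to JavaScript (ES2018), so older engines reject the pattern above. A sketch of an alternative that avoids lookbehind by splitting on commas first and then testing each piece; the function name is hypothetical:

```javascript
// Hypothetical helper: split on commas (with optional spaces), keep exact 5-character words.
function fiveCharItems(list) {
  return list.split(/\s*,\s*/).filter(function (item) {
    return /^\w{5}$/.test(item);
  });
}

console.log(fiveCharItems('IGORA, CIAOA, POPOP MOMMA, NANANANA, MEOWI'));
// ['IGORA', 'CIAOA', 'MEOWI']
```

Here 'POPOP MOMMA' is a single piece after the split and fails the test because of the space, which gives the same result as the lookaround version for this input.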

Finding ++ in Regular Expression

I want to find a ++, --, // or ** sign in a string. Can anyone help me?
var str = document.getElementById('screen').innerHTML;
var res = str.substring(0, str.length);
var patt1 = ++,--,//,**;
var result = str.match(patt1);
if (result)
{
alert("you cant do this :l");
document.getElementById('screen').innerHTML='';
}
This finds doubles of the characters by a backreference:
/([+\/*-])\1/g
[from q. comments]: i know this but when i type var patt1 = /[++]/i; code find + and ++
[++] is a character class, so it matches any single one of the listed characters (here just +). Normally + is the quantifier "1 or more" and needs to be escaped by a leading backslash when it should be a literal, except inside brackets, where it does not have any special meaning.
Characters that do need to be escaped in character classes are e.g. the escape character itself (backslash), the expression delimiter (slash), the closing bracket and the range operator (dash/minus), the latter except at the end of the character class as in my code example.
A character class [] matches one character. With a quantifier, e.g. [abc]{2}, it would match "aa" and "bb", but "ab" as well.
You can use a backreference to a match in parentheses:
/(abc)\1/
Here the \1 refers to the first parentheses (abc). The entire expression would match "abcabc".
To clarify again: We could use a quantifier on the backreference:
/([+\/*-])\1{9}/g
This matches exactly 10 equal characters out of the class, the subpattern itself and 9 backreferences more.
/.../g finds all occurrences due to the modifier global (g).
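To make the difference between the character class and the backreference concrete, here is a small sketch; the sample strings are made up for illustration:

```javascript
var doubled = /([+\/*-])\1/; // one operator from the class, then the same one again
var anyPlus = /[++]/;        // character class: matches a single "+"

console.log(doubled.test('1++2')); // true  (two plus signs in a row)
console.log(doubled.test('8//2')); // true  (two slashes in a row)
console.log(doubled.test('1+-2')); // false (two different operators)
console.log(doubled.test('1+2'));  // false (single operator)
console.log(anyPlus.test('1+2'));  // true  (a single plus is enough)
```

The patterns are used without the g flag here, so repeated calls to test() do not depend on lastIndex.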
Define your pattern like this:
var patt1 = /\+\+|--|\/\/|\*\*/;
Now it should do what you want.
More info about regular expressions: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
You can use:
/\+\+|--|\/\/|\*\*/
as your expression.
Here I have escaped the special characters by using a backslash before each (\).
I've also used .test(str) on the regular expression as all you need is a boolean (true/false) result.
See working example below:
var str = document.getElementById('screen').innerHTML;
var res = str.substring(0, str.length);
var patt1 = /\+\+|--|\/\/|\*\*/;
var result = patt1.test(res);
if (result) {
alert("you cant do this :l");
document.getElementById('screen').innerHTML = '';
}
<div id="screen">
This is some++ text
</div>
Try this:
n+ matches any string that contains at least one n.
n* matches any string that contains zero or more occurrences of n.
Since + and * are special characters, we need to put a backslash before them.
var str = document.getElementById('screen').innerHTML;
var res = str.substring(0, str.length);
var patt1 = /\+\+|--|\/\/|\*\*/;
var result = str.match(patt1);
if (result)
{
alert("you cant do this :l");
document.getElementById('screen').innerHTML='';
}
<div id="screen">2121++</div>

Javascript Regex to split line of log with key value pairs

I have a log like
t=2016-08-03T18:47:26+0000 lvl=dbug msg="Event Received" Service=SomeService
and I want to turn it into a javascript object like
{
t: "2016-08-03T18:47:26+0000",
lvl: "dbug",
msg: "Event Received",
Service: "SomeService"
}
But I am having trouble coming up with a regex that will detect the string "Event Received" in the log line.
I want to split the log line by space but because of the string it is much more difficult.
I am trying to come up with a regex that will detect the fields and parameters so that I can isolate them and split with the equal sign.
I suggest a regex without any lookahead:
var re = /(\w+)=(?:"([^"]*)"|(\S*))/g;
See the regex demo
The point is that the first group ((\w+)) captures the attribute name and the 2nd and 3rd are placed into a non-capturing "container" as alternative branches. Their values can be checked and then either one will be used to fill out the object.
Pattern details:
(\w+) - Group 1 (attribute name) matching 1+ word chars (from [a-zA-Z0-9_] ranges)
= - an equal sign
(?:"([^"]*)"|(\S*)) - a non-capturing "container" group matching either of the two alternatives:
"([^"]*)" - a quote, then Group 2 capturing 0+ chars other than ", and a quote
| - or
(\S*) - Group 3 capturing 0+ non-whitespace symbols.
var rx = /(\w+)=(?:"([^"]*)"|(\S*))/g;
var s = "t=2016-08-03T18:47:26+0000 lvl=dbug msg=\"Event Received\" Service=SomeService";
var obj = {};
var m;
while ((m = rx.exec(s)) !== null) {
// Group 2 is set for quoted values (even empty ones), group 3 otherwise
if (m[2] !== undefined) {
obj[m[1]] = m[2];
} else {
obj[m[1]] = m[3];
}
}
console.log(obj);
You can use this regex to capture various name=value pairs:
/(\w+)=(.*?)(?= \w+=|$)/gm
Code:
var re = /(\w+)=(.*?)(?= \w+=|$)/gm;
var str = 't=2016-08-03T18:47:26+0000 lvl=dbug msg="Event Received" Service=SomeService';
var m;
var result = {};
while ((m = re.exec(str)) !== null) {
if (m.index === re.lastIndex)
re.lastIndex++;
result[m[1]] = m[2];
}
console.log(result);
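With this lookahead-based pattern the quotes stay part of the captured value (msg becomes '"Event Received"'). If they are not wanted, one option is to strip a surrounding pair of quotes from each value, as in the sketch below. The function name parseLogLine is hypothetical, and the sketch assumes that values do not themselves contain a space followed by key=:

```javascript
// Hypothetical helper: turn a key=value log line into an object.
function parseLogLine(line) {
  var re = /(\w+)=(.*?)(?= \w+=|$)/g;
  var result = {};
  var m;
  while ((m = re.exec(line)) !== null) {
    // Remove one pair of surrounding double quotes, if present.
    result[m[1]] = m[2].replace(/^"(.*)"$/, '$1');
  }
  return result;
}

var parsed = parseLogLine('t=2016-08-03T18:47:26+0000 lvl=dbug msg="Event Received" Service=SomeService');
console.log(parsed.msg);     // Event Received
console.log(parsed.Service); // SomeService
```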
Use this pattern:
/^t=([^ ]+) lvl=([^ ]+) msg=(.*?[a-z]") Service=(.*)$/gm
To achieve the expected result, use the code below:
var x = 't=2016-08-03T18:47:26+0000 lvl=dbug msg="Event Received" Service=SomeService';
var y = x.replace(/=/g,':').split(/ (?=\w+:)/); // split only before a key, not inside "Event Received"
var z = '{'+ y+'}';
console.log(z);
http://codepen.io/nagasai/pen/oLPRAy

Javascript Regex for capturing variables in curved brackets

I have a string that is being fed into a JavaScript function and I need to pull the variables out of it.
var str = "heres:a:func('var1', 'var2', 'var3', 2)"
I'm getting close, but would like to do it with one regex.
str.match(/\((.*)\)/)[1].split(/\s*,\s*/)
Results should look like this:
['var1', 'var2', 'var3', 2]
Here's one way to do it:
This does not include the quotes by the way, you can add those optionally if you want.
var pattern = /(\w+)(?!.*\()(?=.*\))/g;
var str = "heres:a:func('var1', 'var2', 'var3', 2)";
var matches = str.match(pattern);
console.log(matches); //['var1','var2','var3','2']
This basically searches for a word character group, and then does a negative and positive lookahead.
Basically
(?!.*\()
says that I want this to NOT be before any number of characters plus a ( character and
(?=.*\))
says that i WANT this to be before any number of characters and a ) character.
Then the capturing group is at the beginning, so you could replace (\w+) with ([\'\w]+) if you wanted to keep the quotes (which I don't think you would, right?).
Edit: To include spaces in your strings, you can do something like this:
var pattern = /([\w]+\s[\w]+|\w+)(?!.*\()(?=.*\))/g
But that will not capture trailing white space, just spaces surrounded by 2 word characters ([a-zA-Z0-9_]). It also allows only 1 space in the word, so if you need several, you'd have to check for that as well. You could modify it to check for any number of word characters or spaces between 2 valid word characters.
For Multiple Spaces:
var pattern = /([\w]+[\s\w]*[\w]+|\w+)(?!.*\()(?=.*\))/g
Includes 1 Space: Regex101 Link
Includes Multiple Spaces: Regex101 Link
Edit2:
And just as a final one, if you REALLY want to add a bunch of spaces throughout, you can do this one:
Includes Multiple Spaces, Multiple Times: Regex101 Link
/([\w]+[\s\w]+[\w]+|\w+)(?!.*\()(?=.*\))/g
This should do it: ('\w+'|\d+). It captures words (\w = alphanumeric characters and underscore) between single quotes, or (|) unquoted numeric values.
See the demo here
For a code example:
var str = "heres:a:func('var1', 'var2', 'var3', 2)"
var reg=new RegExp("('\\w+'|\\d+)", "g");
var i= 0;
var arr = [];
str.replace(reg,function(m,group) {arr[i++]=group})
console.log(arr) gives:
["'var1'", "'var2'", "'var3'", "2"]
As I can't add a comment, I am posting an answer.
var str = "heres:a:func('var1', 'var2', 'var3', 2)"
var args = /\(\s*([^)]+?)\s*\)/.exec(str);
if (args && args[1]) {
args = args[1].split(/\s*,\s*/);
console.log(args);
alert(args);
}
or try it out here:
https://jsfiddle.net/oz3Ljfe1/3/
You can do it with one line, but still 2 regexps: 1) remove everything up to the opening ( and the closing ), then 2) split on the commas with optional whitespace. This way we'll get all the variables, regardless of how many there are.
var str = "heres:a:func('var1', 'var2', 'var3', 2)";
alert(str.replace(/.*\(|\s*\)\s*$/g, '').split(/\s*,\s*/));
Also, you might try a kind of a fancy regex, but it is not that safe (only works if you have correctly formatted data):
var re = /('[^']*?'|\b\d+\b),?(?=(?:(?:[^']*'){2})*[^']*\)$)/g;
var str = 'heres:a:func23(\'var1\', \'var2\', \'var3\', 2, \'2345\')';
var m;
while ((m = re.exec(str)) !== null) {
document.getElementById("res").innerHTML += m[1] + "<br/>";
}
<div id="res"/>
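The result requested in the question, ['var1', 'var2', 'var3', 2], has the quotes removed and the last item as a number, which none of the snippets above produce directly. A sketch that combines the "grab the parentheses, then split" idea with a small conversion step; the function name extractArgs is hypothetical, and it assumes that arguments contain no commas or parentheses of their own:

```javascript
// Hypothetical helper: return the arguments of the first (...) group as an array.
function extractArgs(call) {
  var inner = /\(([^)]*)\)/.exec(call);
  if (!inner || inner[1].trim() === '') {
    return []; // no parentheses, or nothing between them
  }
  return inner[1].split(/\s*,\s*/).map(function (arg) {
    arg = arg.trim();
    var quoted = /^'(.*)'$/.exec(arg);
    if (quoted) {
      return quoted[1]; // single-quoted string: drop the quotes
    }
    // Plain integers and decimals become numbers; anything else is kept as text.
    return /^-?\d+(\.\d+)?$/.test(arg) ? Number(arg) : arg;
  });
}

console.log(extractArgs("heres:a:func('var1', 'var2', 'var3', 2)"));
// ['var1', 'var2', 'var3', 2]
```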

How to Split string with multiple rules in javascript

I have this string for example:
str = "my name is john#doe oh.yeh";
the end result I am seeking is this Array:
strArr = ['my','name','is','john','&#doe','oh','&yeh'];
which means 2 rules apply:
split after each space " " (I know how)
if there are special characters ("." or "#"), then also split there, but add the character "&" before the word with the special character.
I know I can strArr = str.split(" ") for the first rule. but how do I do the other trick?
thanks,
Alon
Assuming the result should be '&doe' and not '&#doe', a simple solution would be to just replace all . and # with ' &' and then split on whitespace:
strArr = str.replace(/[.#]/g, ' &').split(/\s+/)
/\s+/ matches consecutive white spaces instead of just one.
If the result should be '&#doe' and '&.yeh', use the same regex and add a capture:
strArr = str.replace(/([.#])/g, ' &$1').split(/\s+/)
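One edge case of this one-liner is a string that begins with a special character: the inserted space then produces an empty first element. A sketch that wraps the same replace-and-split in a function and filters empty strings out; the function name and the second sample input are assumptions for illustration:

```javascript
// Hypothetical helper: mark "." and "#" with " &", split on whitespace, drop empty pieces.
function splitAndMark(str) {
  return str
    .replace(/([.#])/g, ' &$1')
    .split(/\s+/)
    .filter(Boolean);
}

console.log(splitAndMark('my name is john#doe oh.yeh'));
// ['my', 'name', 'is', 'john', '&#doe', 'oh', '&.yeh']
console.log(splitAndMark('#tag first'));
// ['&#tag', 'first']
```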
You have to use a Regular expression, to match all special characters at once. By "special", I assume that you mean "no letters".
var pattern = /([^ a-z]?)[a-z]+/gi; // Pattern
var str = "my name is john#doe oh.yeh"; // Input string
var strArr = [], match; // output array, temporary var
while ((match = pattern.exec(str)) !== null) { // <-- For each match
strArr.push( (match[1]?'&':'') + match[0]); // <-- Add to array
}
// strArr is now:
// strArr = ['my', 'name', 'is', 'john', '&#doe', 'oh', '&.yeh']
It does not match consecutive special characters; the pattern has to be modified for that. E.g., if you want to include all consecutive special characters, use ([^ a-z]+?).
Also, it does not include a trailing special character. If you want to include that one as well, use [a-z]* and remove !== null.
use split() method. That's what you need:
http://www.w3schools.com/jsref/jsref_split.asp
OK, I see you have already found it, I think:
1) first use split on the whitespace
2) iterate through your array and split the array members again when you find # or .
3) iterate through your array again and apply str.replace("#", "&#") and str.replace(".", "&.") when you find # or .
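The steps above can be sketched roughly as follows. Steps 2 and 3 are folded into one inner loop, and the split uses a lookahead so that the special character stays attached to the piece that follows it; the function name is hypothetical:

```javascript
// Hypothetical helper following the steps: split on spaces, split again at # or ., add "&".
function splitWithMarkers(str) {
  var out = [];
  str.split(/\s+/).forEach(function (word) {
    // Zero-width split before each "#" or ".", e.g. 'john#doe' -> ['john', '#doe'].
    word.split(/(?=[#.])/).forEach(function (piece) {
      if (piece === '') return; // ignore empty pieces from leading/trailing spaces
      out.push(/^[#.]/.test(piece) ? '&' + piece : piece);
    });
  });
  return out;
}

console.log(splitWithMarkers('my name is john#doe oh.yeh'));
// ['my', 'name', 'is', 'john', '&#doe', 'oh', '&.yeh']
```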
I would think a combination of split() and replace() is what you are looking for:
str = "my name is john#doe oh.yeh";
strArr = str.replace(/[^\w\s]/g, ' &'); // a regex literal with the g flag; the string '\W' would only match a literal "W"
strArr = strArr.split(' ');
That should be close to what you asked for.
This works:
array = string.replace(/#|\./g, ' &$&').split(' ');
Take a look at demo here: http://jsfiddle.net/M6fQ7/1/
