Javascript Regex to split line of log with key value pairs - javascript

I have a log like
t=2016-08-03T18:47:26+0000 lvl=dbug msg="Event Received" Service=SomeService
and I want to turn it into a javascript object like
{
t: "2016-08-03T18:47:26+0000",
lvl: "dbug",
msg: "Event Received",
Service: "SomeService"
}
But I am having trouble coming up with a regex that handles the quoted string "Event Received" in the log line.
I want to split the log line on spaces, but the space inside the quoted string makes that much more difficult.
I am trying to come up with a regex that detects the field names and values so that I can isolate them and split each pair at the equal sign.

I suggest a regex without any lookahead:
var re = /(\w+)=(?:"([^"]*)"|(\S*))/g;
The point is that the first group ((\w+)) captures the attribute name and the 2nd and 3rd are placed into a non-capturing "container" as alternative branches. Their values can be checked and then either one will be used to fill out the object.
Pattern details:
(\w+) - Group 1 (attribute name) matching 1+ word chars (from [a-zA-Z0-9_] ranges)
= - an equal sign
(?:"([^"]*)"|(\S*)) - a non-capturing "container" group matching either of the two alternatives:
"([^"]*)" - a quote, then Group 2 capturing 0+ chars other than ", and a quote
| - or
(\S*) - Group 3 capturing 0+ non-whitespace symbols.
var rx = /(\w+)=(?:"([^"]*)"|(\S*))/g;
var s = "t=2016-08-03T18:47:26+0000 lvl=dbug msg=\"Event Received\" Service=SomeService";
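var obj = {}, m;
while ((m = rx.exec(s)) !== null) {
  // Group 2 is defined only when the quoted branch matched (even for an empty ""); otherwise Group 3 holds the value
  obj[m[1]] = m[2] !== undefined ? m[2] : m[3];
}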
console.log(obj);

You can use this regex to capture various name=value pairs:
/(\w+)=(.*?)(?= \w+=|$)/gm
Code:
var re = /(\w+)=(.*?)(?= \w+=|$)/gm;
var str = 't=2016-08-03T18:47:26+0000 lvl=dbug msg="Event Received" Service=SomeService';
var m;
var result = {};
while ((m = re.exec(str)) !== null) {
if (m.index === re.lastIndex)
re.lastIndex++;
result[m[1]] = m[2];
}
console.log(result);

Use this pattern:
/^t=([^ ]+) lvl=([^ ]+) msg=(.*?[a-z]") Service=(.*)$/gm
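As a minimal sketch of how this fixed-field pattern could be applied to one line: the g flag is dropped because a single line is matched, and the variable names (line, entry) are only illustrative. Note that the msg group keeps its surrounding quotes, and the pattern assumes the message ends with a lowercase letter before the closing quote.

```javascript
// Same pattern as above, without the g flag since one line is matched at a time.
var re = /^t=([^ ]+) lvl=([^ ]+) msg=(.*?[a-z]") Service=(.*)$/m;
var line = 't=2016-08-03T18:47:26+0000 lvl=dbug msg="Event Received" Service=SomeService';
var m = re.exec(line);
// Map the four capture groups to the field names hard-coded in the pattern.
var entry = m === null ? null : { t: m[1], lvl: m[2], msg: m[3], Service: m[4] };
console.log(entry);
```

Because the field names and their order are hard-coded, this only fits logs whose lines always have exactly these four fields.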

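A simpler approach replaces every = with : and splits on spaces. Note that this yields a string rather than an object, and the space inside the quoted msg value is split as well: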
var x = 't=2016-08-03T18:47:26+0000 lvl=dbug msg="Event Received" Service=SomeService';
var y = x.replace(/=/g,':').split(' ');
var z = '{'+ y+'}';
console.log(z);
http://codepen.io/nagasai/pen/oLPRAy

Related

Extract hex-code data from custom string

For example we put the following string:
#FF00FFNick#AA00efName
I want to create a pattern to get the following output array
{
[0] = {"#FF00FF", "Nick"},
[1] = {"#AA00ef", "Name"}
}
I wrote the following code:
var reg = /#([a-f\d]{3}){1,2}(.*?)/gi;
alert(str.match(reg));
But the output I get contains only the hex-code substrings. Where is the mistake?
I suggest
/(#(?:[a-f\d]{3}){1,2})([^#]+)/gi
Details:
(#(?:[a-f\d]{3}){1,2}) - Group 1 capturing
# - a hash symbol
(?:[a-f\d]{3}){1,2} - 1 or 2 sequences of hex chars (case insensitive due to i modifier)
([^#]+) - Group 2 capturing 1+ chars other than #.
Demo:
var s = "#FF00FFNick#AA00efName";
var re = /(#(?:[a-f\d]{3}){1,2})([^#]+)/gi;
var res = [], m;
while ((m=re.exec(s)) !== null) {
res.push([m[1], m[2]]);
}
console.log(res);

Matching whole words with Javascript's Regex with a few restrictions

I am trying to create a regex that can extract all words from a given string that only contain alphanumeric characters.
Yes
yes absolutely
#no
*NotThis
orThis--
Good *Bad*
1ThisIsOkay2 ButNotThis2)
Words that should have been extracted: Yes, yes, absolutely, Good, 1ThisIsOkay2
Here is the work I have done thus far:
/(?:^|\b)[a-zA-Z0-9]+(?=\b|$)/g
I found this expression, which works in Ruby (with some tweaking), but I have not been able to convert it to a JavaScript regex.
Use /(?:^|\s)\w+(?!\S)/g to match 1 or more word chars in between start of string/whitespace and another whitespace or end of string:
var s = "Yes\nyes absolutely\n#no\n*NotThis\norThis-- \nGood *Bad*\n1ThisIsOkay2 ButNotThis2)";
var re = /(?:^|\s)\w+(?!\S)/g;
var res = s.match(re).map(function(m) {
return m.trim();
});
console.log(res);
Or another variation:
var s = "Yes\nyes absolutely\n#no\n*NotThis\norThis-- \nGood *Bad*\n1ThisIsOkay2 ButNotThis2)";
var re = /(?:^|\s)(\w+)(?!\S)/g;
var res = [], m;
while ((m=re.exec(s)) !== null) {
res.push(m[1]);
}
console.log(res);
Pattern details:
(?:^|\s) - either start of string or whitespace (consumed, that is why trim() is necessary in Snippet 1)
\w+ - 1 or more word chars (in Snippet 2, captured into Group 1 used to populate the resulting array)
(?!\S) - negative lookahead failing the match if the word chars are followed by a non-whitespace character.
You can do that (where s is your string) to match all the words:
var m = s.split(/\s+/).filter(function(i) { return !/\W/.test(i); });
If you want to proceed to a replacement, you can do that:
var res = s.split(/(\s+)/).map(function(i) { return i.replace(/^\w+$/, "#");}).join('');
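For reference, here is a small sketch of both one-liners run against the sample input from the question. The variable names (words, masked) are only illustrative, and the length check is an added assumption to drop the empty token that split() produces when the string starts or ends with whitespace.

```javascript
var s = "Yes\nyes absolutely\n#no\n*NotThis\norThis-- \nGood *Bad*\n1ThisIsOkay2 ButNotThis2)";

// Split on runs of whitespace, then keep only tokens without any non-word character.
var words = s.split(/\s+/).filter(function (token) {
  return token.length > 0 && !/\W/.test(token);
});
console.log(words); // ["Yes", "yes", "absolutely", "Good", "1ThisIsOkay2"]

// Capturing the separator in the split pattern keeps the whitespace tokens,
// so join('') rebuilds the original layout with whole words replaced.
var masked = s.split(/(\s+)/).map(function (token) {
  return token.replace(/^\w+$/, "#");
}).join("");
console.log(masked);
```

Keep in mind that \w also accepts the underscore, so a token such as foo_bar would pass this filter.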

Regex - ignoring text between quotes / HTML(5) attribute filtering

I have this regular expression, which is supposed to parse a given string into a list of HTML(5) attributes. It currently doesn't do what I need, but I hope that is about to change.
I'm trying to achieve that whenever an occurrence is found, it selects the text until the next occurrence OR the end of the string, as the second match. So if you'd take a look at the current regular expression:
/([a-zA-Z]+|[a-zA-Z]+-[a-zA-Z0-9]+)=["']/g
A string like this: hey="hey world" hey-heyhhhhh3123="Hello world" data-goed="hey"
Would be filtered / matched out like this:
MATCH 1. [0-3] `hey`
MATCH 2. [16-32] `hey-heyhhhhh3123`
MATCH 3. [47-56] `data-goed`
This has to be seen as the attribute-name(s), and now.. we just have to fetch the attribute's value(s). So the mentioned string has to have an outcome like this:
MATCH 1.
1 [0-3] `hey`
2 [6-14] `hey world`
MATCH 2.
1 [16-32] `hey-heyhhhhh3123`
2 [35-45] `Hello world`
MATCH 3.
1 [47-56] `data-goed`
2 [59-61] `hey`
Could anyone help me get this working? It would be appreciated a lot!
You can use
/([^\s=]+)=(?:"([^"\\]*(?:\\.[^"\\]*)*)"|(\S+))/g
Pattern details:
([^\s=]+) - Group 1 capturing 1 or more characters other than whitespace and = symbol
= - an equal sign
(?:"([^"\\]*(?:\\.[^"\\]*)*)"|(\S+)) - a non-capturing group of 2 alternatives (one more '([^'\\]*(?:\\.[^'\\]*)*)' alternative can be added to account for single quoted string literals)
"([^"\\]*(?:\\.[^"\\]*)*)" - a double quoted string literal pattern:
" - a double quote
([^"\\]*(?:\\.[^"\\]*)*) - Group 2 capturing 0+ characters other than \ and ", followed with 0+ sequences of any escaped symbol followed with 0+ characters other than \ and "
" - a closing double quote
| - or
(\S+) - Group 3 capturing one or more non-whitespace characters
JS demo (no single quoted support):
var re = /([^\s=]+)=(?:"([^"\\]*(?:\\.[^"\\]*)*)"|(\S+))/g;
var str = 'hey="hey world" hey-heyhhhhh3123="Hello \\"world\\"" data-goed="hey" more=here';
var res = [], m;
while ((m = re.exec(str)) !== null) {
if (m[3]) {
res.push([m[1], m[3]]);
} else {
res.push([m[1], m[2]]);
}
}
console.log(res);
JS demo (with single quoted literal support)
var re = /([^\s=]+)=(?:"([^"\\]*(?:\\.[^"\\]*)*)"|'([^'\\]*(?:\\.[^'\\]*)*)'|(\S+))/g;
var str = 'pseudoprefix-before=\'hey1"\' data-hey="hey\'hey" more=data and="more \\"here\\""';
var res = [], m;
while ((m = re.exec(str)) !== null) {
  // Exactly one of Groups 2-4 is defined per match; comparing against undefined keeps empty quoted values
  var value = m[2] !== undefined ? m[2] : m[3] !== undefined ? m[3] : m[4];
  res.push([m[1], value]);
}
console.log(res);
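The captured values in the demos above still contain the backslashes of escaped quotes. As a rough sketch, assuming a backslash inside quotes escapes exactly the next character, the matches could be collected into an object and unescaped as follows. parseAttributes is a hypothetical helper name, not something from the question or from any library.

```javascript
// Hypothetical helper: collects name=value pairs into a plain object.
function parseAttributes(input) {
  var re = /([^\s=]+)=(?:"([^"\\]*(?:\\.[^"\\]*)*)"|'([^'\\]*(?:\\.[^'\\]*)*)'|(\S+))/g;
  var attrs = {}, m;
  while ((m = re.exec(input)) !== null) {
    // Group 2 (double quoted) or Group 3 (single quoted); both undefined for bare values.
    var quoted = m[2] !== undefined ? m[2] : m[3];
    // Assumption: \x inside a quoted value stands for the literal character x.
    attrs[m[1]] = quoted !== undefined ? quoted.replace(/\\(.)/g, "$1") : m[4];
  }
  return attrs;
}

var attrs = parseAttributes('hey="hey world" hey-heyhhhhh3123="Hello \\"world\\"" data-goed=\'hey\' more=here');
console.log(attrs);
```

If the same attribute name appears twice, the later value overwrites the earlier one; keep an array of pairs instead when duplicates matter.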

regex to match all words but AND, OR and NOT

In my javascript app I have this random string:
büert AND NOT 3454jhadf üasdfsdf OR technüology AND (bar OR bas)
and I would like to match all words, including special characters and numbers, except the words AND, OR and NOT.
What I tried is this:
/(?!AND|OR|NOT)\b[\u00C0-\u017F\w\d]+/gi
which results in
["büert", "3454jhadf", "asdfsdf", "technüology", "bar", "bas"]
but this one does not match the ü, or any other letter outside the a-z alphabet, at the beginning or at the end of a word because of the \b word boundary.
Removing the \b oddly ends up matching parts of the words I would like to exclude:
/(?!AND|OR|NOT)[\u00C0-\u017F\w\d]+/gi
result is
["büert", "ND", "OT", "3454jhadf", "üasdfsdf", "R", "technüology", "ND", "bar", "R", "bas"]
What is the correct way to match all words, no matter what characters they contain, except the ones I want to exclude?
The issue here has its roots in the fact that \b (and \w, and other shorthand classes) are not Unicode-aware in JavaScript.
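A quick check illustrates this, using one of the words from the question (the variable names are only illustrative):

```javascript
var word = "\u00FCasdfsdf"; // "üasdfsdf"

// \w covers only [A-Za-z0-9_], so the leading "ü" is not a word character.
var startsWithWordChar = /^\w/.test(word);
console.log(startsWithWordChar); // false

// There is no word character on either side of position 0, so \b cannot match there.
// The first boundary sits between "ü" and "a", which is why the "ü" gets dropped.
var asciiOnly = word.match(/\b\w+/g);
console.log(asciiOnly); // ["asdfsdf"]
```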
Now, there are 2 ways to achieve what you want.
1. SPLIT WITH PATTERN(S) YOU WANT TO DISCARD
var re = /\s*\b(?:AND|OR|NOT)\b\s*|[()]/;
var s = "büert AND NOT 3454jhadf üasdfsdf OR technüology AND (bar OR bas)";
var res = s.split(re).filter(Boolean);
document.body.innerHTML += JSON.stringify(res, 0, 4);
// => [ "büert", "3454jhadf üasdfsdf", "technüology", "bar", "bas" ]
Note the use of a non-capturing group (?:...) so as not to include the unwanted words into the resulting array. Also, you need to add all punctuation and other unwanted characters to the character class.
2. MATCH USING CUSTOM BOUNDARIES
You can use groupings with anchors/reverse negated character class in a regex like this:
(^|[^\u00C0-\u017F\w])(?!(?:AND|OR|NOT)(?=[^\u00C0-\u017F\w]|$))([\u00C0-\u017F\w]+)(?=[^\u00C0-\u017F\w]|$)
Capture group 2 will hold the values you need.
JS code demo:
var re = /(^|[^\u00C0-\u017F\w])(?!(?:AND|OR|NOT)(?=[^\u00C0-\u017F\w]|$))([\u00C0-\u017F\w]+)(?=[^\u00C0-\u017F\w]|$)/gi;
var str = 'büert AND NOT 3454jhadf üasdfsdf OR technüology AND (bar OR bas)';
var m;
var arr = [];
while ((m = re.exec(str)) !== null) {
arr.push(m[2]);
}
document.body.innerHTML += JSON.stringify(arr);
or with a block to build the regex dynamically:
var bndry = "[^\\u00C0-\\u017F\\w]";
var re = RegExp("(^|" + bndry + ")" + // starting boundary
"(?!(?:AND|OR|NOT)(?=" + bndry + "|$))" + // restriction
"([\\u00C0-\\u017F\\w]+)" + // match and capture our string
"(?=" + bndry + "|$)" // set trailing boundary
, "g");
var str = 'büert AND NOT 3454jhadf üasdfsdf OR technüology AND (bar OR bas)';
var m, arr = [];
while ((m = re.exec(str)) !== null) {
arr.push(m[2]);
}
document.body.innerHTML += JSON.stringify(arr);
Explanation:
(^|[^\u00C0-\u017F\w]) - our custom boundary (match a string start with ^ or any character outside the [\u00C0-\u017F\w] range)
(?!(?:AND|OR|NOT)(?=[^\u00C0-\u017F\w]|$)) - a restriction on the match: the match is failed if there are AND or OR or NOT followed by string end or characters other than those in the \u00C0-\u017F range or non-word character
([\u00C0-\u017F\w]+) - match word characters ([a-zA-Z0-9_]) or those from the \u00C0-\u017F range
(?=[^\u00C0-\u017F\w]|$) - the trailing boundary, either string end ($) or characters other than those in the \u00C0-\u017F range or non-word character.
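On engines that support the ES2018 regex features (Unicode property escapes with the u flag, and lookbehind), the same custom-boundary idea can be sketched without a hand-picked code point range. This is an alternative under that assumption, not the approach from the answer above; wordChar and words are illustrative names, and the match is case-sensitive here.

```javascript
// Letters, combining marks, digits and underscore count as "word" characters.
var wordChar = "[\\p{L}\\p{M}\\p{N}_]";
var re = new RegExp(
  "(?<!" + wordChar + ")" +                    // leading boundary: no word char before
  "(?!(?:AND|OR|NOT)(?!" + wordChar + "))" +   // reject AND/OR/NOT as whole words
  wordChar + "+",                              // the word itself
  "gu");
var str = 'büert AND NOT 3454jhadf üasdfsdf OR technüology AND (bar OR bas)';
var words = str.match(re);
console.log(words);
```

Since the quantifier is greedy, no trailing boundary is needed: the match always runs to the end of the word.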

Javascript split by spaces but not those in quotes

The goal is to split a string at the spaces but not split the text data that is in quotes or separate that from the adjacent text.
The input is effectively a string that contains a list of value pairs. If the value contains a space it is enclosed in quotes. I need a function that returns an array of value-pair elements as per the example below:
Example Input:
'a:0 b:1 moo:"foo bar" c:2'
Expected result:
a:0,b:1,moo:foo bar,c:2 (An array of length 4)
I have checked through a load of other questions but none of them (I found) seem to cope with my issue. Most seem to split at the space within the quotes or they split the 'moo:' and 'foo bar' into separate parts.
Any assistance would be greatly appreciated,
Craig
You can use this regex for split:
var s = 'a:0 b:1 moo:"foo bar" c:2';
var m = s.split(/ +(?=(?:(?:[^"]*"){2})*[^"]*$)/g);
//=> [a:0, b:1, moo:"foo bar", c:2]
It splits on spaces only if they are outside quotes, using a positive lookahead that makes sure there is an even number of quotes after the space.
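The expected result in the question has no quotes around foo bar. As a small sketch, assuming quotes never need to appear inside a value, they can be stripped after splitting (parts is an illustrative name):

```javascript
var s = 'a:0 b:1 moo:"foo bar" c:2';
// Split on spaces that are followed by an even number of quotes, then drop the quotes.
var parts = s.split(/ +(?=(?:(?:[^"]*"){2})*[^"]*$)/).map(function (part) {
  return part.replace(/"/g, "");
});
console.log(parts); // ["a:0", "b:1", "moo:foo bar", "c:2"]
```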
You could approach it slightly differently and use a Regular Expression to split where spaces are followed by word characters and a colon (rather than a space that's not in a quoted part):
var str = 'a:0 b:1 moo:"foo bar" c:2',
arr = str.split(/ +(?=[\w]+\:)/g);
/* [a:0, b:1, moo:"foo bar", c:2] */
What's this Regex doing?
It looks for a literal match on the space character, then uses a Positive Lookahead to assert that the next part can be matched:
[\w]+ = match any word character [a-zA-Z0-9_] between one and unlimited times.
\: = match the : character once (backslash escaped).
g = global modifier - don't return on first match.
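One difference between the two split patterns shows up when a quoted value itself contains a space followed by word characters and a colon. The input below is a made-up example, not from the question, and the variable names are illustrative:

```javascript
var input = 'a:0 moo:"foo bar:baz" c:2';

// Lookahead for "key:" also fires inside the quotes, because " bar:" looks like a new pair.
var byKey = input.split(/ +(?=\w+:)/);
console.log(byKey); // ["a:0", 'moo:"foo', 'bar:baz"', "c:2"]

// Counting quotes after the space leaves the quoted value intact.
var byQuotes = input.split(/ +(?=(?:(?:[^"]*"){2})*[^"]*$)/);
console.log(byQuotes); // ["a:0", 'moo:"foo bar:baz"', "c:2"]
```

For inputs like the one in the question, where values never contain a colon, both patterns give the same result.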
Any special reason it has to be a regexp?
var str = 'a:0 b:1 moo:"foo bar" c:2';
var parts = [];
var currentPart = "";
var isInQuotes= false;
for (var i = 0; i < str.length; i++) {
var char = str.charAt(i);
if (char === " " && !isInQuotes) {
parts.push(currentPart);
currentPart = "";
} else {
currentPart += char;
}
if (char === '"') {
isInQuotes = !isInQuotes;
}
}
if (currentPart) parts.push(currentPart);
