JavaScript regex symbol occurrence

I have a problem with a regex in JS. I wrote this regular expression:
/^([A-Z]+)\s+([^\s]+)\s+([^\s]+)\s(\[.*\])\s+(.+)$/g
But it gives the wrong result for one example:
WARN 2016-01-19 13:17:32,051 [localhost-startStop-1] Duplicate property values for key Data\ Df : [ Date from] and [ Starting Day]
I want the regex to divide the string into these parts:
WARN
2016-01-19
13:17:32,051
[localhost-startStop-1]
Duplicate property values for key Data\ Df : [ Date from] and [ Starting Day]
Everything is OK except the last two parts. There I got:
[localhost-startStop-1] Duplicate property values for key Data\ Df : [ Date from]
and [ Starting Day]
Why? I want to split that part of the string at the first ] occurrence. I don't know why it takes the second.
PS: Here is the example: https://regex101.com/r/wG5xV6/2
Thanks.

You need to replace the greedy .* (which matches zero or more characters other than a newline, as many as possible) with the lazy .*?, which matches zero or more characters other than a newline, as few as possible:
^([A-Z]+)\s+([^\s]+)\s+([^\s]+)\s(\[.*?\])\s+(.+)$
^^^
See the regex demo
You can also shorten the pattern a bit by replacing [^\s] with \S:
^([A-Z]+)\s+(\S+)\s+(\S+)\s(\[.*?\])\s+(.+)$
Another demo
var re = /^([A-Z]+)\s+(\S+)\s+(\S+)\s(\[.*?\])\s+(.+)$/gm;
var str = 'INFO 2016-01-20 08:03:21,113 [C3P0PooledConnectionPoolManager[identityToken->1bqu9pa9eq1cqr515yzwu7|6c240779]-HelperThread-#0] Connection to \'rander\' established. Notifying listeners...\nWARN 2016-01-19 13:17:32,051 [localhost-startStop-1] Duplicate property values for key Data\\ Df : [ Date from] and [ Starting Day]';
var m; // holds the current match; the g flag lets exec() advance through the string
while ((m = re.exec(str)) !== null) {
document.body.innerHTML += "<pre>"+ JSON.stringify([m[1],m[2],m[3],m[4], m[5]], null, 4) + "</pre>";
}
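To see the difference between the greedy and the lazy quantifier in isolation, the following sketch runs both patterns against the WARN line from the question and compares the fourth capture group. The variable names are illustrative only.

```javascript
// The log line from the question (the backslash is escaped for the string literal).
const line = "WARN 2016-01-19 13:17:32,051 [localhost-startStop-1] " +
  "Duplicate property values for key Data\\ Df : [ Date from] and [ Starting Day]";

// Greedy: .* runs to the end of the line, then backtracks to the LAST ]
// that still lets the rest of the pattern match.
const greedy = /^([A-Z]+)\s+(\S+)\s+(\S+)\s(\[.*\])\s+(.+)$/;
// Lazy: .*? grows one character at a time and stops at the FIRST ]
// that lets the rest of the pattern match.
const lazy = /^([A-Z]+)\s+(\S+)\s+(\S+)\s(\[.*?\])\s+(.+)$/;

const greedyThread = line.match(greedy)[4];
const lazyThread = line.match(lazy)[4];

console.log(greedyThread); // ends with "[ Date from]"
console.log(lazyThread);   // "[localhost-startStop-1]"
```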

You can try this:
^(.*?)(\s*?)(\S*?)(\s*?)(\S*?)(\s*?)(\[.*?\])(\s*)(.*?)$
You can also replace \S with . here.
\S means "not whitespace".
? after a quantifier makes it lazy, so it matches as few characters as possible.
The structure of the line can be expressed as follows:
begin + word + space + word + space + word + space + word + space + word + end
The pattern must stop at the first ], so we use the lazy ? to find it.
If you want to change the format of the line, you can use a replacement string such as
($1)\r($3)\r($5)\r($7)\r($9) or similar.
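As a rough illustration of the replacement idea, the sketch below applies this pattern with String.prototype.replace and puts each odd-numbered group on its own line. Using \n instead of \r and dropping the literal parentheses are assumptions about the desired output format, and the variable names are made up for the example.

```javascript
// The WARN line from the question (backslash escaped for the string literal).
const logLine = "WARN 2016-01-19 13:17:32,051 [localhost-startStop-1] " +
  "Duplicate property values for key Data\\ Df : [ Date from] and [ Starting Day]";

// Groups 1, 3, 5, 7 and 9 hold the five parts; the even groups hold the separators.
const pattern = /^(.*?)(\s*?)(\S*?)(\s*?)(\S*?)(\s*?)(\[.*?\])(\s*)(.*?)$/;

// Keep only the five parts, one per line.
const formatted = logLine.replace(pattern, "$1\n$3\n$5\n$7\n$9");
console.log(formatted);
```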

Related

JavaScript Regex split at first letter?

Since regex solutions differ from case to case, depending on the format of the string, I'm having a hard time finding a solution to my problem.
I have an array containing strings in the format, as an example:
"XX:XX - XX:XX Algorithm and Data Structures"
Here "XX:XX - XX:XX" is the timespan of a lecture, and each X is a digit.
I'm new to regex and am trying to split the string at the first letter that occurs, like so:
let str = "08:15 - 12:50 Algorithm and Data Structures";
let re = //Some regex expression
let result = str.split(re); // Output: ["08:15 - 12:50", "Algorithm and Data Structures"]
I'm thinking it should be something like /[a-Z]/ but I'm not sure at all...
Thanks in advance!

The easiest way is probably to "mark" where you want to split and then split:
const str = '12 34 abcde 45 abcde'.replace(/^([^a-z]+)([a-z])/i, '$1,$2');
// '12 34 ,abcde 45 abcde'
str.split(',')
// [ '12 34 ', 'abcde 45 abcde' ]
This matches, from the start of the string, a run of non-letter characters followed by a letter, and puts a comma between them. Then you split on the comma (this assumes the string contains no other commas).
You can also split directly with a positive look ahead but it might make the regex a bit less readable.
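A related alternative, not taken from the answer above, is to locate the first letter with String.prototype.search and slice around that index. This is a minimal sketch; the helper name splitAtFirstLetter is hypothetical, and it assumes only ASCII letters count as "letters".

```javascript
// Hypothetical helper: splits a string into the part before the first
// ASCII letter and the part starting at that letter.
function splitAtFirstLetter(str) {
  const index = str.search(/[A-Za-z]/); // -1 when there is no letter at all
  if (index === -1) return [str];
  // trimEnd() drops the whitespace that separated the two parts
  return [str.slice(0, index).trimEnd(), str.slice(index)];
}

console.log(splitAtFirstLetter("08:15 - 12:50 Algorithm and Data Structures"));
```

Unlike the comma-marker approach, this does not depend on any character being absent from the input.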

console.log(
"08:15 - 12:50 Algorithm and Data Structures".split(/ ([A-Za-z].*)/).filter(Boolean)
)
or, if it's really always XX:XX - XX:XX, easier to just do:
const splitTimeAndCourse = (input) => {
return [
input.slice(0, "XX:XX - XX:XX".length),
input.slice("XX:XX - XX:XX".length + 1)
]
}
console.log(splitTimeAndCourse("08:15 - 12:50 Algorithm and Data Structures"))

If you have a fixed length of the string where the time is, you can use this regex for example
(^.{0,13})(.*)
Check this here https://regex101.com/r/ANMHy5/1
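A small sketch of how this fixed-length pattern could be applied in JavaScript. Note that the second group starts with the separating space, so it is trimmed here; the variable names are illustrative.

```javascript
const lecture = "08:15 - 12:50 Algorithm and Data Structures";

// Group 1 takes up to the first 13 characters, group 2 takes the rest of the line.
const [, timespan, rest] = lecture.match(/(^.{0,13})(.*)/);

// The rest still begins with the space that followed the timespan.
const title = rest.trimStart();

console.log([timespan, title]);
```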

I know you asked about regex in particular, but here is a way to do this without regex...
Provided your time span is always at the beginning of your string and is always formatted with white space between the numbers, as in XX:XX - XX:XX, you could use a function that splits the string at the white space and joins the first three pieces into one chunk (the time span) and the remaining pieces into a second chunk (the lecture title), then returns the two chunks as an array.
let str = "08:15 - 12:50 Algorithm and Data Structures";
const splitString = (str) => {
// split the string at the white spaces
const strings = str.split(' ')
// define variables
let lecture = '',
timespan = '';
// the first three pieces form the timespan
timespan = `${strings[0]} ${strings[1]} ${strings[2]}`;
// loop over the strings
strings.forEach((str, i) => {
// append every remaining piece to the lecture title
if (i > 2) lecture += `${str} `;
})
// place them into an array and remove white space from end of second string
return [timespan, lecture.trimEnd()]
}
console.log(splitString(str))

For that format, you might also use 2 capture groups instead of using split.
^(\d{1,2}:\d{1,2}\s*-\s*\d{1,2}:\d{1,2})\s+([A-Za-z].*)
The pattern matches:
^ Start of string
(\d{1,2}:\d{1,2}\s*-\s*\d{1,2}:\d{1,2}) Capture group 1, match a timespan-like pattern
\s+ Match 1+ whitespace chars
([A-Za-z].*) Capture group 2, start with a char A-Za-z and match the rest of the line.
Regex demo
let str = "08:15 - 12:50 Algorithm and Data Structures";
let regex = /^(\d{1,2}:\d{1,2}\s*-\s*\d{1,2}:\d{1,2})\s+([A-Za-z].*)/;
let [, ...groups] = str.match(regex);
console.log(groups);
Another option using split is to assert, with a lookbehind (see this link for the support), that there are no chars a-zA-Z to the left, all the way back to the start of the string, then match 1+ whitespace chars and assert a char a-zA-Z to the right.
(?<=^[^a-zA-Z]+)\s+(?=[A-Za-z])
Regex demo
let str = "08:15 - 12:50 Algorithm and Data Structures";
let regex = /(?<=^[^a-zA-Z]+)\s+(?=[A-Za-z])/;
console.log(str.split(regex))

regex to extract numbers starting from second symbol

Sorry for adding one more to the tons of regexp questions, but I can't find anything similar to my needs. I want to output a string that may contain a digit or the letter 'A' as the first symbol and digits only in all other positions. The input is any string, for example:
---INPUT--- -OUTPUT-
A123asdf456 -> A123456
0qw#$56-398 -> 056398
B12376B6f90 -> 12376690
12A12345BCt -> 1212345
What I tried is replace(/[^A\d]/g, '') (I use JS), which almost does the job except the case when there's A in the middle of the string. I tried to use ^ anchor but then the pattern doesn't match other numbers in the string. Not sure what is easier - extract matching characters or remove unmatching.

I think you can do it like this, using a negative lookahead and then replacing with an empty string.
In a non-capturing group (?:, use a negative lookahead (?! to assert that what follows is neither an A at the beginning of the string (^A) nor a digit (\d). If that is the case, match any character with . and repeat the group one or more times.
(?:(?!^A|\d).)+
var pattern = /(?:(?!^A|\d).)+/g;
var strings = [
"A123asdf456",
"0qw#$56-398",
"B12376B6f90",
"12A12345BCt"
];
for (var i = 0; i < strings.length; i++) {
console.log(strings[i] + " ==> " + strings[i].replace(pattern, ""));
}

You can match and capture desired and undesired characters within two different sides of an alternation, then replace those undesired with nothing:
^(A)|\D
JS code:
var inputStrings = [
"A-123asdf456",
"A123asdf456",
"0qw#$56-398",
"B12376B6f90",
"12A12345BCt"
];
console.log(
inputStrings.map(v => v.replace(/^(A)|\D/g, "$1"))
);

You can use the following regex: /(^A|\d)/g (match either an A at the start of the string or a digit, then join the matches)
var arr = ['A123asdf456','0qw#$56-398','B12376B6f90','12A12345BCt', 'A-123asdf456'],
result = arr.map(s => s.match(/(^A|\d)/g).join(''));
console.log(result);

Regex to match all words but the one beginning and ending with special chars

I'm struggling with a regex for Javascript.
Here is a string from which I want to match all words but the one prefixed by \+\s and suffixed by \s\+ :
this-is my + crappy + example
The regex should match :
this-is my + crappy + example
match 1: this-is
match 2: my
match 3: example

You can use the alternation operator in context: place what you want to exclude on the left (saying "throw this away, it's garbage") and place what you want to match in a capturing group on the right side.
\+[^+]+\+|([\w-]+)
Example:
var re = /\+[^+]+\+|([\w-]+)/g,
s = "this-is my + crappy + example",
match,
results = [];
while (match = re.exec(s)) {
results.push(match[1]);
}
console.log(results.filter(Boolean)) //=> [ 'this-is', 'my', 'example' ]
Alternatively, you could replace between the + characters and then match your words.
var s = 'this-is my + crappy + example',
r = s.replace(/\+[^+]*\+/g, '').match(/[\w-]+/g)
console.log(r) //=> [ 'this-is', 'my', 'example' ]

To get the desired output, take the captured group at index 1.
([\w-]+)|\+\s\w+\s\+
Live DEMO
MATCH 1 this-is
MATCH 2 my
MATCH 3 example
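A minimal sketch of how group 1 could be collected in JavaScript, assuming an environment that provides String.prototype.matchAll. Matches of the "+ word +" alternative leave group 1 undefined, so they are skipped; the variable names are illustrative.

```javascript
// Left alternative captures a word; right alternative consumes "+ word +" without capturing.
const wordOrPlusBlock = /([\w-]+)|\+\s\w+\s\+/g;
const sentence = "this-is my + crappy + example";

const words = [];
for (const match of sentence.matchAll(wordOrPlusBlock)) {
  // Group 1 is undefined when the "+ word +" alternative matched.
  if (match[1] !== undefined) words.push(match[1]);
}

console.log(words);
```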

Adding a condition to a regex

Given the Javascript below how can I add a condition to the clause? I would like to add a "space" character after a separator only if a space does not already exist. The current code will result in double-spaces if a space character already exists in spacedText.
var separators = ['.', ',', '?', '!'];
for (var i = 0; i < separators.length; i++) {
var rg = new RegExp("\\" + separators[i], "g");
spacedText = spacedText.replace(rg, separators[i] + " ");
}

'. , ? ! .,?!foo'.replace(/([.,?!])(?! )/g, '$1 ');
//-> ". , ? ! . , ? ! foo"
This replaces every occurrence of one of .,?! that is not followed by a space with itself plus a trailing space.

I would suggest the following regexp to solve your problem:
"Test!Test! Test.Test 1,2,3,4 test".replace(/([!,.?])(?!\s)/g, "$1 ");
// "Test! Test! Test. Test 1, 2, 3, 4 test"
The regexp matches any character in the character class [!,.?] that is not followed by a whitespace character (?!\s). The parentheses around the character class mean that the matched separator is captured in the first group, which is referenced as $1 in the replacement string. See this fiddle for a working example.
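If the separators should stay in an array, as in the question, the same negative-lookahead idea can be built dynamically. This is only a sketch: the helper name addSpaceAfterSeparators is hypothetical, and each separator is escaped so that characters such as . and ? are treated literally.

```javascript
// Hypothetical helper: adds a space after each separator unless whitespace already follows.
function addSpaceAfterSeparators(text, separators) {
  // Escape regex metacharacters so "." or "?" are matched literally.
  const escaped = separators.map(s => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  // (sep1|sep2|...) not followed by whitespace
  const re = new RegExp("(" + escaped.join("|") + ")(?!\\s)", "g");
  return text.replace(re, "$1 ");
}

const separators = [".", ",", "?", "!"];
console.log(addSpaceAfterSeparators("Test!Test! Test.Test 1,2,3,4 test", separators));
```

As with the one-line version, a separator at the very end of the string also gets a trailing space, because nothing follows it.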

You could replace runs of the above characters, including spaces. That way you capture any punctuation and its trailing space and replace both with a single space. Note that this removes the punctuation itself.
"H1a!. A ?. ".replace(/[.,?! ]+/g, " ")
[.,?! ] is a character class. It will match either ., ,, ?, ! or a space, and + makes it match at least once (but multiple times if possible).

spacedText = spacedText.replace(/([\.,!\?])([^\s])/g,"$1 $2")
This means: replace one of these characters ([\.,!\?]) followed by a non-whitespace character ([^\s]) with the match from the first group, a space, and the match from the second group ("$1 $2").

Here is working code:
var nonSpaced= 'Hello World!Which is your favorite number? 10,20,25,30 or other.answer fast.';
var spaced;
var patt = /\b([!\.,\?])+\b/g;
spaced = nonSpaced.replace(patt, '$1 ');
If you console.log the value of spaced, it will be: Hello World! Which is your favorite number? 10, 20, 25, 30 or other. answer fast. Notice that there is only one space character after the ? sign, and there is no extra space after the last full stop.

RegEx needed to split javascript string on "|" but not "\|"

We would like to split a string on instances of the pipe character |, but not if that character is preceded by an escape character, e.g. \|.
For example, we would like to see the following string split into the following components:
1|2|3\|4|5
1
2
3\|4
5
I'm expecting to be able to use the following javascript function, split, which takes a regular expression. What regex would I pass to split? We are cross platform and would like to support current and previous versions (1 version back) of IE, FF, and Chrome if possible.

Instead of a split, do a global match (the same way a lexical analyzer would):
match anything other than \\ or |
or match any escaped char
Something like this:
var str = "1|2|3\\|4|5";
var matches = str.match(/([^\\|]|\\.)+/g);
A quick explanation: ([^\\|]|\\.) matches either any character except '\' and '|' (pattern: [^\\|]) or (pattern: |) any escaped character (pattern: \\.). The + after it tells the engine to match the preceding group one or more times. The g at the end of the regex literal tells the JavaScript regex engine to match the pattern globally instead of just once.

What you're looking for is a "negative look-behind matching regular expression".
This isn't pretty, but it should split the list for you:
var output = input.replace(/(\\)?\|/g, function($0,$1){ return $1 ? $0 : '\n';});
This will take your input string and replace all of the '|' characters NOT immediately preceded by a '\' character with '\n' characters; escaped pipes are left untouched.
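In engines that support lookbehind assertions (added to the language in ES2018), the split can be written directly with a negative lookbehind. This is a sketch under that assumption, so it may not meet the older-browser requirement stated in the question.

```javascript
// "\\|" in the string literal is a literal backslash followed by a pipe.
const escapedInput = "1|2|3\\|4|5";

// Split on every | that is NOT immediately preceded by a backslash.
const pipeParts = escapedInput.split(/(?<!\\)\|/);

console.log(pipeParts);
```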

A regex solution was posted as I was looking into this. So I just went ahead and wrote one without it. I did some simple benchmarks and it is -slightly- faster (I expected it to be slower...).
Without using Regex, if I understood what you desire, this should do the job:
function doSplit(input) {
var output = [];
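var currPos = -1, // start before index 0 so that a leading '|' is found too
prevPos = -1;
while ((currPos = input.indexOf('|', currPos + 1)) != -1) {
// skip pipes that are escaped with a backslash
if (input[currPos-1] == "\\") continue;
// collect the text between the previous unescaped pipe and this one
output.push(input.slice(prevPos + 1, currPos));
prevPos = currPos;
}
// collect whatever follows the last unescaped pipe
output.push(input.slice(prevPos + 1));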
return output;
}
doSplit('1|2|3\\|4|5'); //returns [ '1', '2', '3\\|4', '5' ]
