I have a problem with a regex in JS. I wrote this regular expression:
/^([A-Z]+)\s+([^\s]+)\s+([^\s]+)\s(\[.*\])\s+(.+)$/g
But it gives the wrong result for this example:
WARN 2016-01-19 13:17:32,051 [localhost-startStop-1] Duplicate property values for key Data\ Df : [ Date from] and [ Starting Day]
I want the regex to divide the string into these parts:
WARN
2016-01-19
13:17:32,051
[localhost-startStop-1]
Duplicate property values for key Data\ Df : [ Date from] and [ Starting Day]
Everything is OK except the last two parts. There I get:
[localhost-startStop-1] Duplicate property values for key Data\ Df : [ Date from]
and [ Starting Day]
Why? I want to split that part of the string at the first ] occurrence. I don't know why it takes the second.
PS: Here is the example: https://regex101.com/r/wG5xV6/2
Thanks.
You need to replace the greedy .* (which matches zero or more characters other than a newline, as many as possible) with the lazy .*? (which matches zero or more characters other than a newline, as few as possible):
^([A-Z]+)\s+([^\s]+)\s+([^\s]+)\s(\[.*?\])\s+(.+)$
                                    ^^^
See the regex demo
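To see the difference in isolation, here is a minimal sketch comparing the two quantifiers. The shortened log line is made up for illustration, and only the bracket part of the pattern is kept.

```javascript
// Hypothetical, shortened log line (not taken from the original log).
const line = "WARN [thread-1] key : [ Date from] and [ Starting Day]";

// Greedy: .* grabs as much as possible and backs off only as far as it must.
const greedy = line.match(/(\[.*\])\s+(.+)$/);
// Lazy: .*? stops at the first ] that lets the rest of the pattern match.
const lazy = line.match(/(\[.*?\])\s+(.+)$/);

console.log(greedy[1]); // "[thread-1] key : [ Date from]"
console.log(lazy[1]);   // "[thread-1]"
```

The greedy version cannot stop at the very last ] because \s+ still needs whitespace after it, so it settles on the second-to-last one, which is the behaviour described in the question.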
You can also shorten the pattern a bit by replacing [^\s] with \S:
^([A-Z]+)\s+(\S+)\s+(\S+)\s(\[.*?\])\s+(.+)$
Another demo
var re = /^([A-Z]+)\s+(\S+)\s+(\S+)\s(\[.*?\])\s+(.+)$/gm;
// The backslash in "Data\\ Df" is doubled: a single "\ " inside a JS string literal would silently drop it.
var str = 'INFO 2016-01-20 08:03:21,113 [C3P0PooledConnectionPoolManager[identityToken->1bqu9pa9eq1cqr515yzwu7|6c240779]-HelperThread-#0] Connection to \'rander\' established. Notifying listeners...\nWARN 2016-01-19 13:17:32,051 [localhost-startStop-1] Duplicate property values for key Data\\ Df : [ Date from] and [ Starting Day]';
var m; // declared explicitly so the loop does not create an implicit global
while ((m = re.exec(str)) !== null) {
  // Print the five captured groups of each log line.
  document.body.innerHTML += "<pre>" + JSON.stringify([m[1], m[2], m[3], m[4], m[5]], null, 4) + "</pre>";
}
You can try this:
^(.*?)(\s*?)(\S*?)(\s*?)(\S*?)(\s*?)(\[.*?\])(\s*)(.*?)$
You could also use . in place of \S.
\S means "not a whitespace character".
? after a quantifier makes it lazy, so it matches as little as possible.
The structure of the line can be expressed as follows:
begin + word + space + word + space + word + space + word + space + word + end
The bracketed part must stop at the first ], so we use the lazy ? to find it.
If you want to change the format of the line, you can use a replacement such as
($1)\r($3)\r($5)\r($7)\r($9) or similar.
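As a rough sketch of how the nine-group pattern could be used from JavaScript: the odd-numbered groups hold the five parts and the even-numbered ones hold the whitespace between them. The variable names are made up here, and "\n" is used as the separator instead of the \r shown above.

```javascript
const re = /^(.*?)(\s*?)(\S*?)(\s*?)(\S*?)(\s*?)(\[.*?\])(\s*)(.*?)$/;
// Backslash doubled so that it survives inside the string literal.
const line = "WARN 2016-01-19 13:17:32,051 [localhost-startStop-1] Duplicate property values for key Data\\ Df : [ Date from] and [ Starting Day]";

// Keep only the odd-numbered groups, one per line.
const reformatted = line.replace(re, "$1\n$3\n$5\n$7\n$9");
const parts = reformatted.split("\n");
console.log(parts);
```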
Since regex solutions differ from case to case depending on the format of the string, I'm having a hard time finding a solution to my problem.
I have an array containing strings in the format, as an example:
"XX:XX - XX:XX Algorithm and Data Structures"
where "XX:XX - XX:XX" is the timespan of a lecture and each X is a digit.
I'm new to Regex and trying to split the string at the first letter that occurs, like so:
let str = "08:15 - 12:50 Algorithm and Data Structures";
let re = //Some regex expression
let result = str.split(re); // Output: ["08:15 - 12:50", "Algorithm and Data Structures"]
I'm thinking it should be something like /[a-Z]/ but I'm not sure at all...
Thanks in advance!
The easiest way is probably to "mark" where you want to split and then split:
const str = '12 34 abcde 45 abcde'.replace(/^([^a-z]+)([a-z])/i, '$1,$2');
// '12 34 ,abcde 45 abcde'
str.split(',')
// [ '12 34 ', 'abcde 45 abcde' ]
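This finds the start of the string, followed by a run of non-letter characters and then a letter, and puts a comma right in between. Then you split by the comma. Note that this only works if the string contains no other commas; otherwise pick a marker character that cannot occur in your data.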
You can also split directly with a positive look ahead but it might make the regex a bit less readable.
console.log(
  "08:15 - 12:50 Algorithm and Data Structures".split(/ ([A-Za-z].*)/).filter(Boolean)
)
or, if it's really always XX:XX - XX:XX, easier to just do:
const splitTimeAndCourse = (input) => {
  return [
    input.slice(0, "XX:XX - XX:XX".length),
    input.slice("XX:XX - XX:XX".length + 1)
  ]
}
console.log(splitTimeAndCourse("08:15 - 12:50 Algorithm and Data Structures"))
If you have a fixed length of the string where the time is, you can use this regex for example
(^.{0,13})(.*)
Check this here https://regex101.com/r/ANMHy5/1
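In JavaScript, that fixed-length pattern could be applied roughly like this. The sketch assumes the timespan is always exactly 13 characters long; the second group starts with the separating space, so it is trimmed here.

```javascript
const str = "08:15 - 12:50 Algorithm and Data Structures";

// Group 1: the first (up to) 13 characters; group 2: everything after them.
const [, timespan, rest] = str.match(/(^.{0,13})(.*)/);
const result = [timespan, rest.trim()];

console.log(result); // [ '08:15 - 12:50', 'Algorithm and Data Structures' ]
```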
I know you asked about regex in particular, but here is a way to do this without regex...
Provided your time span is always at the beginning of your string and is always formatted with white space between the numbers, as XX:XX - XX:XX, you could use a function that splits the string at the white space and joins the first three pieces into one chunk (the time span) and the remaining pieces into a second chunk (the lecture title), then returns the two chunks as an array.
let str = "08:15 - 12:50 Algorithm and Data Structures";
const splitString = (str) => {
  // split the string at the white spaces
  const strings = str.split(' ');
  // the first three pieces ("08:15", "-", "12:50") form the timespan
  const timespan = strings.slice(0, 3).join(' ');
  // the remaining pieces form the lecture title
  const lecture = strings.slice(3).join(' ');
  // place them into an array
  return [timespan, lecture];
};
console.log(splitString(str));
For that format, you might also use 2 capture groups instead of using split.
^(\d{1,2}:\d{1,2}\s*-\s*\d{1,2}:\d{1,2})\s+([A-Za-z].*)
The pattern matches:
^ Start of string
(\d{1,2}:\d{1,2}\s*-\s*\d{1,2}:\d{1,2}) Capture group 1, match a timespan like pattern
\s+ Match 1+ whitespace chars
([A-Za-z].*) Capture group 2, start with a char A-Za-z and match the rest of the line.
Regex demo
let str = "08:15 - 12:50 Algorithm and Data Structures";
let regex = /^(\d{1,2}:\d{1,2}\s*-\s*\d{1,2}:\d{1,2})\s+([A-Za-z].*)/;
let [, ...groups] = str.match(regex);
console.log(groups);
Another option using split is to assert with a lookbehind that there are no chars a-zA-Z to the left, all the way back to the start of the string (lookbehind is not supported in every JavaScript engine, so check the support first), then match 1+ whitespace chars and assert a char a-zA-Z to the right.
(?<=^[^a-zA-Z]+)\s+(?=[A-Za-z])
Regex demo
let str = "08:15 - 12:50 Algorithm and Data Structures";
let regex = /(?<=^[^a-zA-Z]+)\s+(?=[A-Za-z])/;
console.log(str.split(regex))
Sorry for adding one more to the tons of regexp questions, but I can't find anything similar to my needs. I want to output a string which can contain a digit or the letter 'A' as the first symbol and digits only in all other positions. Input is any string, for example:
---INPUT--- -OUTPUT-
A123asdf456 -> A123456
0qw#$56-398 -> 056398
B12376B6f90 -> 12376690
12A12345BCt -> 1212345
What I tried is replace(/[^A\d]/g, '') (I use JS), which almost does the job except the case when there's A in the middle of the string. I tried to use ^ anchor but then the pattern doesn't match other numbers in the string. Not sure what is easier - extract matching characters or remove unmatching.
I think you can do it like this, using a negative lookahead and then replacing the match with an empty string.
In a non-capturing group (?:, use a negative lookahead (?! to assert that what follows is neither an A at the beginning of the string (^A) nor a digit (\d). If that is the case, match any character with the dot.
(?:(?!^A|\d).)+
var pattern = /(?:(?!^A|\d).)+/g;
var strings = [
  "A123asdf456",
  "0qw#$56-398",
  "B12376B6f90",
  "12A12345BCt"
];
for (var i = 0; i < strings.length; i++) {
  console.log(strings[i] + " ==> " + strings[i].replace(pattern, ""));
}
You can match and capture desired and undesired characters within two different sides of an alternation, then replace those undesired with nothing:
^(A)|\D
JS code:
var inputStrings = [
  "A-123asdf456",
  "A123asdf456",
  "0qw#$56-398",
  "B12376B6f90",
  "12A12345BCt"
];
console.log(
  inputStrings.map(v => v.replace(/^(A)|\D/g, "$1"))
);
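You can use the following regex: /(^A|\d)/g. It matches either an A at the very start of the string or any digit; joining all the matches gives the result:
var arr = ['A123asdf456','0qw#$56-398','B12376B6f90','12A12345BCt', 'A-123asdf456'],
    // match() returns null when nothing matches, hence the || [] fallback
    result = arr.map(s => (s.match(/(^A|\d)/g) || []).join(''));
console.log(result);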
I'm struggling with a regex for Javascript.
Here is a string from which I want to match all words but the one prefixed by \+\s and suffixed by \s\+ :
this-is my + crappy + example
The regex should match :
this-is my + crappy + example
match 1: this-is
match 2: my
match 3: example
You can use the alternation operator in context, placing what you want to exclude on the left (saying "throw this away, it's garbage") and what you want to match in a capturing group on the right.
\+[^+]+\+|([\w-]+)
Example:
var re = /\+[^+]+\+|([\w-]+)/g,
    s = "this-is my + crappy + example",
    match,
    results = [];
while (match = re.exec(s)) {
  results.push(match[1]);
}
console.log(results.filter(Boolean)) //=> [ 'this-is', 'my', 'example' ]
Alternatively, you could replace between the + characters and then match your words.
var s = 'this-is my + crappy + example',
r = s.replace(/\+[^+]*\+/g, '').match(/[\w-]+/g)
console.log(r) //=> [ 'this-is', 'my', 'example' ]
To get the desired output, take the captured group at index 1 of each match.
([\w-]+)|\+\s\w+\s\+
Live DEMO
MATCH 1 this-is
MATCH 2 my
MATCH 3 example
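In JavaScript, collecting group 1 from that pattern could look roughly like the sketch below. It assumes an engine with String.prototype.matchAll (ES2020); matches that come from the right-hand alternative leave group 1 undefined and are filtered out.

```javascript
const s = "this-is my + crappy + example";

const words = [...s.matchAll(/([\w-]+)|\+\s\w+\s\+/g)]
  .map((m) => m[1])   // group 1 is undefined for the "+ crappy +" match
  .filter(Boolean);   // drop those undefined entries

console.log(words); // [ 'this-is', 'my', 'example' ]
```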
Given the Javascript below how can I add a condition to the clause? I would like to add a "space" character after a separator only if a space does not already exist. The current code will result in double-spaces if a space character already exists in spacedText.
var separators = ['.', ',', '?', '!'];
for (var i = 0; i < separators.length; i++) {
  var rg = new RegExp("\\" + separators[i], "g");
  spacedText = spacedText.replace(rg, separators[i] + " ");
}
'. , ? ! .,?!foo'.replace(/([.,?!])(?! )/g, '$1 ');
//-> ". , ? ! . , ? ! foo"
This means: replace every occurrence of one of .,?! that is not followed by a space with itself plus a space.
I would suggest the following regexp to solve your problem:
"Test!Test! Test.Test 1,2,3,4 test".replace(/([!,.?])(?!\s)/g, "$1 ");
// "Test! Test! Test. Test 1, 2, 3, 4 test"
The regexp matches any character in the character class [!,.?] that is not followed by a whitespace character (?!\s). The parentheses around the character class mean that the matched separator is captured in the first group, $1, which is used in the replacement string. See this fiddle for a working example.
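Both lookahead patterns above also treat a separator at the very end of the text as "not followed by a space", so a trailing space gets appended there. If that is unwanted, one possible tweak (an assumption about the desired behaviour, not part of the answers above) is to add $ to the lookahead. The helper name addSpaces is made up for this sketch.

```javascript
// addSpaces is a hypothetical helper name.
function addSpaces(text) {
  // Insert a space after . , ? ! unless whitespace or the end of the text follows.
  return text.replace(/([.,?!])(?!\s|$)/g, "$1 ");
}

console.log(addSpaces("Hi,there.How are you?"));       // "Hi, there. How are you?"
console.log(addSpaces("Already spaced, right? Yes.")); // unchanged
```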
You could do a replace of all the above characters including a space. That way you capture any run of punctuation together with its trailing space and replace the whole run with a single space. Note that this drops the punctuation itself.
"H1a!. A ?. ".replace(/[.,?! ]+/g, " ")
//-> "H1a A "
[.,?! ] is a character class. It will match either ., ,, ?, ! or a space, and + makes it match at least once (but as many times as possible).
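spacedText = spacedText.replace(/([\.,!\?])([^\s])/g,"$1 $2")
This means: replace one of these characters ([\.,!\?]) followed by a non-whitespace character ([^\s]) with the match from the first group, a space, and the match from the second group ("$1 $2"). The second group has to be put back, otherwise the character after the separator would be deleted.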
Here is working code:
var nonSpaced= 'Hello World!Which is your favorite number? 10,20,25,30 or other.answer fast.';
var spaced;
var patt = /\b([!\.,\?])+\b/g;
spaced = nonSpaced.replace(patt, '$1 ');
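If you console.log the value of spaced, it will be: Hello World! Which is your favorite number? 10, 20, 25, 30 or other. answer fast. Notice that there is only one space after the ? sign and no extra space after the last full stop. That is because the \b on each side requires a word character immediately before and after the punctuation, so punctuation that is already followed by a space or by the end of the string is not matched.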
We would like to split a string on instances of the pipe character |, but not if that character is preceded by an escape character, e.g. \|.
For example, we would like to see the following string split into the following components:
1|2|3\|4|5
1
2
3\|4
5
I'm expecting to be able to use the JavaScript split function, which takes a regular expression. What regex would I pass to split? We are cross platform and would like to support current and previous versions (1 version back) of IE, FF, and Chrome if possible.
Instead of a split, do a global match (the same way a lexical analyzer would):
match anything other than \\ or |
or match any escaped char
Something like this:
var str = "1|2|3\\|4|5";
var matches = str.match(/([^\\|]|\\.)+/g);
A quick explanation: ([^\\|]|\\.) matches either any character except '\' and '|' (pattern: [^\\|]) or (pattern: |) any escaped character (pattern: \\.). The + after it tells it to match the preceding group once or more, so the pattern ([^\\|]|\\.) will be matched one or more times. The g at the end of the regex literal tells the JavaScript regex engine to match the pattern globally instead of matching it just once.
What you're looking for is a "negative look-behind matching regular expression".
This isn't pretty, but it should split the list for you:
var output = input.replace(/(\\)?\|/g, function($0, $1){ return $1 ? $0 : '\n'; });
This takes your input string and replaces every '|' character NOT immediately preceded by a '\' character with a '\n' character; escaped pipes are left as they are. You can then split the result on '\n'.
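In engines that support lookbehind assertions (added to the language in ES2018; as far as I know, Internet Explorer never gained them), the negative look-behind can be written directly in the split pattern. A minimal sketch, assuming a backslash only ever appears as the escape for a pipe:

```javascript
const input = "1|2|3\\|4|5";

// Split on every | that is not immediately preceded by a backslash.
const parts = input.split(/(?<!\\)\|/);

console.log(parts); // [ '1', '2', '3\\|4', '5' ]
```

Unlike the replace approach, this needs no marker character, so it also works when the fields themselves contain newlines.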
A regex solution was posted as I was looking into this. So I just went ahead and wrote one without it. I did some simple benchmarks and it is -slightly- faster (I expected it to be slower...).
Without using Regex, if I understood what you desire, this should do the job:
function doSplit(input) {
  var output = [];
  // Start at -1 so that a pipe at index 0 is also found.
  var currPos = -1,
      prevPos = -1;
  while ((currPos = input.indexOf('|', currPos + 1)) != -1) {
    // Skip pipes that are escaped with a backslash.
    if (input[currPos - 1] == "\\") continue;
    output.push(input.slice(prevPos + 1, currPos));
    prevPos = currPos;
  }
  // Whatever follows the last unescaped pipe is the final field.
  output.push(input.slice(prevPos + 1));
  return output;
}
doSplit('1|2|3\\|4|5'); //returns [ '1', '2', '3\\|4', '5' ]