Currently I'm using replace(/\s*([,.!?:;])[,.!?:;]*\s*/g, '$1 ') to add a space after punctuation, but it doesn't work when the sentence contains three dots.
Example text: "Hello,today is a beautiful day...But tomorrow is,not."
Expected output: "Hello, today is a beautiful day... But tomorrow is, not."
let text = "Hello,today is a beautiful day...But tomorrow is,not.";
text = text.replace(/\s*([,.!?:;])[,.!?:;]*\s*/g, '$1 ')
Gives:
"Hello, today is a beautiful day. But tomorrow is, not. "
Please tell me what regex I can use so that I can get the expected output.
You should match all consecutive punctuation chars into Group 1, not just the first char. Also, it makes sense to exclude a match of the punctuation at the end of the string.
You can use
text.replace(/\s*([,.!?:;]+)(?!\s*$)\s*/g, '$1 ')
Also, it still might be handy to .trim() the result. See the regex demo.
Details
\s* - 0 or more whitespace chars
([,.!?:;]+) - Group 1 ($1): one or more ,, ., !, ?, : or ;
(?!\s*$) - a negative lookahead that fails the match if the punctuation is immediately followed by zero or more whitespace chars and then the end of the string
\s* - 0 or more whitespace chars
See a JavaScript demo:
let text = "Hello,today is a beautiful day...But tomorrow is,not.";
text = text.replace(/\s*([,.!?:;]+)(?!\s*$)\s*/g, '$1 ');
console.log(text);
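To make the difference concrete, here is a small side-by-side sketch of the pattern from the question and the one suggested above, run on the sample sentence. The function names firstCharOnly and wholeRun are made up for illustration.

```javascript
// Pattern from the question: only the first punctuation char is captured,
// so the rest of a run like "..." is dropped.
const firstCharOnly = (s) => s.replace(/\s*([,.!?:;])[,.!?:;]*\s*/g, '$1 ');

// Suggested pattern: the whole run is captured, and punctuation at the
// very end of the string is left alone.
const wholeRun = (s) => s.replace(/\s*([,.!?:;]+)(?!\s*$)\s*/g, '$1 ');

const sample = "Hello,today is a beautiful day...But tomorrow is,not.";

console.log(JSON.stringify(firstCharOnly(sample)));
// "Hello, today is a beautiful day. But tomorrow is, not. "
console.log(JSON.stringify(wholeRun(sample)));
// "Hello, today is a beautiful day... But tomorrow is, not."
```

JSON.stringify is only there to make the trailing space of the first result visible.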
Thanks to Wiktor Stribiżew's suggestions, I came up with a final regex that meets my requirements:
let text = 'Test punctuation:c,c2,c3,d1.D2.D3.Q?.Ec!Sc;Test number:1.200 1,200 2.3.Test:money: $15,000.Test double quote and three dots:I said "Today is a...beautiful,sunny day.".But tomorrow will be a long day...';
text = text.replace(/\.{3,}$|\s*(?:\d[,.]\d|([,.!?:;]+))(?!\s*$)(?!")\s*/g, (m0, m1) => { return m1 ? m1 + ' ' : m0; });
console.log(text); // It will print out: "Test punctuation: c, c2, c3, d1. D2. D3. Q?. Ec! Sc; Test number: 1.200 1,200 2.3. Test: money: $15,000. Test double quote and three dots: I said "Today is a... beautiful, sunny day.". But tomorrow will be a long day..."
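For anyone trying to follow what each branch of that regex does, here is a rough sketch that runs it on a few short inputs, one per branch. The helper name addSpace and the sample inputs are mine, not part of the original code.

```javascript
const re = /\.{3,}$|\s*(?:\d[,.]\d|([,.!?:;]+))(?!\s*$)(?!")\s*/g;

// Group 1 is only set for "real" punctuation runs; every other match
// (trailing ellipsis, digit-separator-digit) is returned unchanged.
const addSpace = (s) => s.replace(re, (m0, m1) => (m1 ? m1 + ' ' : m0));

console.log(addSpace('a,b'));                // a, b      (plain punctuation)
console.log(addSpace('a , b'));              // a, b      (surrounding spaces normalized)
console.log(addSpace('1,200'));              // 1,200     (\d[,.]\d branch, left as is)
console.log(addSpace('a...b'));              // a... b    (whole run captured)
console.log(addSpace('day...'));             // day...    (\.{3,}$ branch, left as is)
console.log(addSpace('He said "no.".Next')); // He said "no.". Next  (no space before a quote)
```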
Bonus: I also converted this to Dart, since I use this feature in a Flutter app as well. In case someone needs it in Dart:
void main() {
String addSpaceAfterPunctuation(String words) {
var regExp = r'\.{3,}$|\s*(?:\d[,.]\d|([,.!?:;]+))(?!\s*$)(?!")\s*';
return words.replaceAllMapped(
RegExp(regExp),
(Match m) {
return m[1] != null ? "${m[1]} " : "${m[0]}";
},
);
}
var text = 'Test punctuation:c,c2,c3,d1.D2.D3.Q?.Ec!Sc;Test number:1.200 1,200 2.3.Test:money: \$15,000.Test double quote and three dots:I said "Today is a...beautiful,sunny day.".But tomorrow will be a long day...';
text = addSpaceAfterPunctuation(text);
print(text); // Print out: Test punctuation: c, c2, c3, d1. D2. D3. Q?. Ec! Sc; Test number: 1.200 1,200 2.3. Test: money: $15,000. Test double quote and three dots: I said "Today is a... beautiful, sunny day.". But tomorrow will be a long day...
}
Related
So I am very new with Regex and I have managed to create a way to check if a specific word exists inside of a string without just being part of another word.
Example:
I am looking for the word "banana".
banana == true, bananarama == false
This is all fine, however a problem occurs when I am looking for words containing Swedish letters (Å,Ä,Ö) with words containing only two letters.
Example:
I am looking for the word "på" in a string looking like this: "på påsk"
and it comes back as negative.
However if I look for the word "påsk" then it comes back positive.
This is the regex I am using:
const doesWordExist = (s, word) => new RegExp('\\b' + word + '\\b', 'i').test(s);
stringOfWords = "Färg på plagg";
console.log(doesWordExist(stringOfWords, "på"))
//Expected result: true
//Actual result: false
However if I were to change the word "på" to a three letter word then it comes back true:
const doesWordExist = (s, word) => new RegExp('\\b' + word + '\\b', 'i').test(s);
stringOfWords = "Färg pås plagg";
console.log(doesWordExist(stringOfWords, "pås"))
//Expected result: true
//Actual result: true
I have been looking around for answers and found a few questions with similar issues involving Swedish letters, but none of them look for only the word in its entirety.
Could anyone explain what I am doing wrong?
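The word boundary \b strictly depends on the characters matched by \w, which is a shorthand character class for [A-Za-z0-9_]. Since å is not in that class, there is no boundary between "å" and the space that follows it, so \bpå\b cannot match. "pås" only works because it ends in s, which is a \w character.
To get similar behaviour you have to re-implement the boundary check yourself, for example like this: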
const swedishCharClass = '[a-zäöå]';
const doesWordExist = (s, word) => new RegExp(
'(?<!' + swedishCharClass + ')' + word + '(?!' + swedishCharClass + ')', 'i'
).test(s);
console.log(doesWordExist("Färg på plagg", "på")); // true
console.log(doesWordExist("Färg pås plagg", "pås")); // true
console.log(doesWordExist("Färg pås plagg", "på")); // false
For more complex alphabets, I'd suggest taking a look at Concrete Javascript Regex for Accented Characters (Diacritics).
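If you would rather not list the letters by hand, one possible variation (a sketch, assuming an engine that supports lookbehind and Unicode property escapes with the u flag) is to treat any Unicode letter, combining mark, digit or underscore as a word character. The names escapeRegExp, WORD_CHAR and doesWordExistUnicode are hypothetical helpers, not part of the answer above.

```javascript
// Escape regex metacharacters so the searched word is matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Assumption: letters, combining marks, digits and "_" all count as word characters.
const WORD_CHAR = '[\\p{L}\\p{M}\\p{N}_]';

const doesWordExistUnicode = (s, word) =>
  new RegExp(`(?<!${WORD_CHAR})${escapeRegExp(word)}(?!${WORD_CHAR})`, 'iu').test(s);

console.log(/\bpå\b/i.test("Färg på plagg"));                // false, \b does not see å as a word char
console.log(doesWordExistUnicode("Färg på plagg", "på"));    // true
console.log(doesWordExistUnicode("på påsk", "på"));          // true
console.log(doesWordExistUnicode("påsk", "på"));             // false
```

Escaping the word also keeps searches for things like "a.b" from being interpreted as a pattern.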
Since regex solutions differ from case to case depending on the format of the string, I'm having a hard time finding a solution to my problem.
I have an array containing strings in a format like this:
"XX:XX - XX:XX Algorithm and Data Structures"
Here "XX:XX - XX:XX" is the timespan of a lecture, and each X is a digit.
I'm new to regex and I'm trying to split the string at the first letter that occurs, like so:
let str = "08:15 - 12:50 Algorithm and Data Structures";
let re = //Some regex expression
let result = str.split(re); // Output: ["08:15 - 12:50", "Algorithm and Data Structures"]
I'm thinking it should be something like /[a-Z]/ but I'm not sure at all...
Thanks in advance!
The easiest way is probably to "mark" where you want to split and then split:
const str = '12 34 abcde 45 abcde'.replace(/^([^a-z]+)([a-z])/i, '$1,$2');
// '12 34 ,abcde 45 abcde'
str.split(',')
// [ '12 34 ', 'abcde 45 abcde' ]
This matches from the start of the string: a run of non-letter characters followed by the first a-z character, and puts a comma right in between. Then you split on the comma.
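One caveat with the comma marker is that it breaks as soon as the input itself contains a comma. A rough sketch of the same idea with a marker that is assumed never to occur in the input (the NUL character here, and the names SENTINEL and splitAtFirstLetter, are my own choices):

```javascript
const SENTINEL = '\u0000'; // assumption: this character never appears in the input

const splitAtFirstLetter = (s) =>
  s
    .replace(/^([^a-z]+)([a-z])/i, `$1${SENTINEL}$2`) // mark the split point
    .split(SENTINEL)                                  // split on the marker
    .map((part) => part.trim());                      // drop the leftover space

console.log(splitAtFirstLetter("08:15 - 12:50 Algorithm and Data Structures"));
// [ '08:15 - 12:50', 'Algorithm and Data Structures' ]
console.log(splitAtFirstLetter("12,5 - 13 Maths, part 2"));
// [ '12,5 - 13', 'Maths, part 2' ]
```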
You can also split directly on a pattern with a capturing group (split keeps the captured text in the result), but it might make the regex a bit less readable.
console.log(
"08:15 - 12:50 Algorithm and Data Structures".split(/ ([A-Za-z].*)/).filter(Boolean)
)
or, if it's really always XX:XX - XX:XX, easier to just do:
const splitTimeAndCourse = (input) => {
return [
input.slice(0, "XX:XX - XX:XX".length),
input.slice("XX:XX - XX:XX".length + 1)
]
}
console.log(splitTimeAndCourse("08:15 - 12:50 Algorithm and Data Structures"))
If the time part of the string always has a fixed length, you can use a regex like this, for example:
(^.{0,13})(.*)
Check this here https://regex101.com/r/ANMHy5/1
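In JavaScript that could look roughly like the sketch below. Note that with the pattern as written the second group keeps the leading space, so the second variant (my own tweak, not from the answer) consumes it with \s* between the groups.

```javascript
const str = "08:15 - 12:50 Algorithm and Data Structures";

// Pattern as given: group 2 starts with the separating space.
const m = str.match(/(^.{0,13})(.*)/);
console.log(JSON.stringify([m[1], m[2]]));
// ["08:15 - 12:50"," Algorithm and Data Structures"]

// Variation: exactly 13 chars, then skip the whitespace before the title.
// Assumes the string really matches; match() returns null otherwise.
const [, time, title] = str.match(/^(.{13})\s*(.*)$/);
console.log(JSON.stringify([time, title]));
// ["08:15 - 12:50","Algorithm and Data Structures"]
```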
I know you asked about regex in particular, but here is a way to do this without regex.
This works provided your time span is always at the beginning of the string and is always formatted with white space between the parts, as XX:XX - XX:XX. You can use a function that splits the string at the white space, joins the first three pieces into one chunk (the time span) and the remaining pieces into a second chunk (the lecture title), and then returns the two chunks as an array.
let str = "08:15 - 12:50 Algorithm and Data Structures";
const splitString = (str) => {
// split the string at the white spaces
const strings = str.split(' ')
// define variables
let lecture = '',
timespan = '';
// loop over the strings
strings.forEach((str, i) => {
// structure the timespan
timespan = `${strings[0]} ${strings[1]} ${strings[2]}`;
// concatenate every string after the first three into the lecture title
if (i > 2) lecture += `${str} `;
})
// place them into an array and remove white space from end of second string
return [timespan, lecture.trimEnd()]
}
console.log(splitString(str))
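The same idea can probably be written a bit shorter with slice and join. This is just a sketch under the same assumption (the first three space-separated pieces form the time span); splitBySpaces is a made-up name.

```javascript
const splitBySpaces = (s) => {
  const parts = s.split(' ');
  return [
    parts.slice(0, 3).join(' '), // "XX:XX", "-", "XX:XX"
    parts.slice(3).join(' '),    // everything else is the lecture title
  ];
};

console.log(splitBySpaces("08:15 - 12:50 Algorithm and Data Structures"));
// [ '08:15 - 12:50', 'Algorithm and Data Structures' ]
```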
For that format, you might also use 2 capture groups instead of using split.
^(\d{1,2}:\d{1,2}\s*-\s*\d{1,2}:\d{1,2})\s+([A-Za-z].*)
The pattern matches:
^ Start of string
(\d{1,2}:\d{1,2}\s*-\s*\d{1,2}:\d{1,2}) Capture group 1, match a timespan like pattern
\s+ Match 1+ whitespace chars
([A-Za-z].*) Capture group 2, start with a char A-Za-z and match the rest of the line.
Regex demo
let str = "08:15 - 12:50 Algorithm and Data Structures";
let regex = /^(\d{1,2}:\d{1,2}\s*-\s*\d{1,2}:\d{1,2})\s+([A-Za-z].*)/;
let [, ...groups] = str.match(regex);
console.log(groups);
Another option using split is a lookbehind (see this link for the support) asserting that there are only characters other than a-zA-Z between the start of the string and the current position, then matching 1+ whitespace chars and asserting a char a-zA-Z to the right.
(?<=^[^a-zA-Z]+)\s+(?=[A-Za-z])
Regex demo
let str = "08:15 - 12:50 Algorithm and Data Structures";
let regex = /(?<=^[^a-zA-Z]+)\s+(?=[A-Za-z])/;
console.log(str.split(regex))
I am using
(/(?<=[.,])(?=[^\s])/mg,' ')
to add spaces after . and , that are not followed by spaces. I want to ignore instances of the word U.S. Could someone help do this?
You can use this regex
\b(U\.S)\b|([,.])(?=\S)
\b(U\.S)\b - Matches U.S as a whole word. The question doesn't say otherwise, so I am assuming word boundaries. (g1)
([.,])(?=\S) - Matches . or , followed by a non space character. (g2)
let str = 'ab.c,de'
let str2 = 'U.S xyzU.S U.S xyz.x'
const replacer = (input)=>{
return input.replace(/\b(U\.S)\b|([,.])(?=\S)/gm, function(match,g1,g2){
return g1 ? g1 : g2+' '
})
}
console.log(replacer(str))
console.log(replacer(str2))
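If more abbreviations need to be protected, the same two-branch idea can be built from a list. A sketch, where the ABBREVIATIONS list and the helper names are hypothetical and the entries are written without the final dot, like U.S above:

```javascript
// Escape regex metacharacters so the dots in the abbreviations are literal.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

const ABBREVIATIONS = ['U.S', 'e.g', 'i.e']; // hypothetical list, extend as needed

const pattern = new RegExp(
  `\\b(${ABBREVIATIONS.map(escapeRegExp).join('|')})\\b|([,.])(?=\\S)`,
  'g'
);

// Branch 1 (abbr) is returned untouched, branch 2 (punct) gets a space appended.
const addSpaces = (s) =>
  s.replace(pattern, (match, abbr, punct) => (abbr ? abbr : punct + ' '));

console.log(addSpaces("The U.S.economy grew,e.g.in 2019."));
// The U.S. economy grew, e.g. in 2019.
```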
I have some problem with regex in JS. I wrote my regular expression:
/^([A-Z]+)\s+([^\s]+)\s+([^\s]+)\s(\[.*\])\s+(.+)$/g
But it gives wrong result with one example:
WARN 2016-01-19 13:17:32,051 [localhost-startStop-1] Duplicate property values for key Data\ Df : [ Date from] and [ Starting Day]
I want the regex to divide the string into the following parts:
WARN
2016-01-19
13:17:32,051
[localhost-startStop-1]
Duplicate property values for key Data\ Df : [ Date from] and [ Starting Day]
Everything is OK except the last 2 parts. There I got:
[localhost-startStop-1] Duplicate property values for key Data\ Df : [ Date from]
and [ Starting Day]
Why? I want to split that part of the string at the first ] occurrence. I don't know why it takes the second.
PS: Here is the example: https://regex101.com/r/wG5xV6/2
Thanks.
You need to restrict the .* (that matches zero or more characters other than a newline, as many as possible) with a lazy dot matching .*? that matches zero or more characters other than a newline, as few as possible:
^([A-Z]+)\s+([^\s]+)\s+([^\s]+)\s(\[.*?\])\s+(.+)$
                                    ^^^
See the regex demo
You can also shorten the pattern a bit by replacing [^\s] with \S:
^([A-Z]+)\s+(\S+)\s+(\S+)\s(\[.*?\])\s+(.+)$
Another demo
var re = /^([A-Z]+)\s+(\S+)\s+(\S+)\s(\[.*?\])\s+(.+)$/gm;
var str = 'INFO 2016-01-20 08:03:21,113 [C3P0PooledConnectionPoolManager[identityToken->1bqu9pa9eq1cqr515yzwu7|6c240779]-HelperThread-#0] Connection to \'rander\' established. Notifying listeners...\nWARN 2016-01-19 13:17:32,051 [localhost-startStop-1] Duplicate property values for key Data\\ Df : [ Date from] and [ Starting Day]';
var m; // declare the match variable instead of relying on an implicit global
while ((m = re.exec(str)) !== null) {
document.body.innerHTML += "<pre>"+ JSON.stringify([m[1],m[2],m[3],m[4], m[5]], 0, 4) + "</pre>";
}
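To see the greedy and lazy behaviour side by side outside the browser, here is a small sketch on the WARN line from the question. String.raw is only used so the backslash in the log line survives; the variable names are mine.

```javascript
const line = String.raw`WARN 2016-01-19 13:17:32,051 [localhost-startStop-1] Duplicate property values for key Data\ Df : [ Date from] and [ Starting Day]`;

const greedy = /^([A-Z]+)\s+(\S+)\s+(\S+)\s(\[.*\])\s+(.+)$/;
const lazy = /^([A-Z]+)\s+(\S+)\s+(\S+)\s(\[.*?\])\s+(.+)$/;

// Greedy .* runs to the end and backtracks only as far as it must,
// so group 4 ends at the last "]" that is still followed by whitespace.
console.log(line.match(greedy)[5]); // and [ Starting Day]

// Lazy .*? grows one char at a time and stops at the first "]" that works.
console.log(line.match(lazy)[4]);   // [localhost-startStop-1]
console.log(line.match(lazy)[5]);   // the full message starting at "Duplicate"
```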
You can try this:
^(.*?)(\s*?)(\S*?)(\s*?)(\S*?)(\s*?)(\[.*?\])(\s*)(.*?)$
You can also replace \S with .
\S means any non-space character.
? after a quantifier makes it lazy, so it matches as little as possible.
The structure of the line can be expressed as follows:
begin + word + space + word + space + word + space + word + space + word + end
The pattern has to stop at the first ], so we use ? to make that part lazy.
If you want to change the format of the line, you can use a replacement such as
($1)\r($3)\r($5)\r($7)\r($9) or similar.
Given the Javascript below how can I add a condition to the clause? I would like to add a "space" character after a separator only if a space does not already exist. The current code will result in double-spaces if a space character already exists in spacedText.
var separators = ['.', ',', '?', '!'];
for (var i = 0; i < separators.length; i++) {
var rg = new RegExp("\\" + separators[i], "g");
spacedText = spacedText.replace(rg, separators[i] + " ");
}
'. , ? ! .,?!foo'.replace(/([.,?!])(?! )/g, '$1 ');
//-> ". , ? ! . , ? ! foo"
This means: replace every occurrence of one of .,?! that is not followed by a space with itself plus a space.
I would suggest the following regexp to solve your problem:
"Test!Test! Test.Test 1,2,3,4 test".replace(/([!,.?])(?!\s)/g, "$1 ");
// "Test! Test! Test. Test 1, 2, 3, 4 test"
The regexp matches any character in the character class [!,.?] that is not followed by a whitespace character (?!\s). The parentheses around the character class mean that the matched separator is captured in group 1, which is referenced as $1 in the replacement string. See this fiddle for a working example.
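One thing to be aware of: (?!\s) also succeeds at the very end of the string, so a sentence ending in a separator gets a trailing space. If that is not wanted, a possible tweak (my own sketch, not part of the answer) is to add the end of the string to the lookahead:

```javascript
// As suggested above: a separator at the end of the string still gets a space.
const withTrailing = (s) => s.replace(/([!,.?])(?!\s)/g, '$1 ');

// Variation: do nothing when the separator is followed by whitespace OR the end.
const withoutTrailing = (s) => s.replace(/([!,.?])(?!\s|$)/g, '$1 ');

console.log(JSON.stringify(withTrailing("Done.")));    // "Done. "
console.log(JSON.stringify(withoutTrailing("Done."))); // "Done."
console.log(withoutTrailing("Test!Test! Test.Test 1,2,3,4 test."));
// Test! Test! Test. Test 1, 2, 3, 4 test.
```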
You could replace all of the above characters together with the space. That way you capture any punctuation and its trailing space and replace both with a single space. Note that this removes the punctuation itself.
"H1a!. A ?. ".replace(/[.,?! ]+/g, " ")
[.,?! ] is a character class. It will match either ., ,, ?, ! or a space, and + makes it match at least once (but multiple times if possible).
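spacedText = spacedText.replace(/([\.,!\?])([^\s])/g, "$1 $2")
This means: replace one of these characters ([\.,!\?]) followed by a non-whitespace character ([^\s]) with the first group, a space, and the second group ("$1 $2"). The second group has to be put back in the replacement, otherwise the character after the separator is lost.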
Here is working code:
var nonSpaced= 'Hello World!Which is your favorite number? 10,20,25,30 or other.answer fast.';
var spaced;
var patt = /\b([!\.,\?])+\b/g;
spaced = nonSpaced.replace(patt, '$1 ');
If you console.log the value of spaced, it will be: Hello World! Which is your favorite number? 10, 20, 25, 30 or other. answer fast. Notice that there is only one space after the ? sign, and no extra space after the last full stop.
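A small caveat with this pattern: because the + sits outside the group, $1 only holds the last character of a run, so something like "..." collapses to a single dot. Moving the + inside the group keeps the whole run. A quick sketch (function names are made up):

```javascript
// + outside the group: $1 is only the LAST punctuation char of the run.
const lastCharOnly = (s) => s.replace(/\b([!.,?])+\b/g, '$1 ');

// + inside the group: $1 is the whole run.
const wholeRun = (s) => s.replace(/\b([!.,?]+)\b/g, '$1 ');

console.log(lastCharOnly("Wait...what")); // Wait. what
console.log(wholeRun("Wait...what"));     // Wait... what
```

Both versions rely on \b, so they only fire when the punctuation sits directly between two word characters.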