I have text in which sentences may not have a space after a dot, like:
See also vadding.Constructions on this term abound.
How can I add a space after a dot that is not before the domain name? The text may have URLs like:
See also vadding.Constructions on this term abound. http://example.com/foo/bar
Match and capture a URL, and match every other dot so that it can be replaced with a dot plus a space:
var re = /((?:https?|ftps?):\/\/\S+)|\.(?!\s)/g;
var str = 'See also vadding.Constructions on this term abound.\nSee also vadding.Constructions on this term abound. http://example.com/foo/bar';
var result = str.replace(re, function(m, g1) {
return g1 ? g1 : ". ";
});
document.body.innerHTML = "<pre>" + result + "</pre>";
The URL regex - (?:https?|ftps?):\/\/\S+ - matches http, https, ftp or ftps, then :// and 1+ non-whitespace characters (\S+). It is one of the basic ones; you can use a more complex one that you can easily find on SO. E.g. see What is a good regular expression to match a URL?.
The approach in more detail:
The ((?:https?|ftps?):\/\/\S+)|\.(?!\s) regex has 2 alternatives: the URL matching part (described above), or (|) the dot matching part (\.(?!\s)).
NOTE that (?!\s) is a negative lookahead that allows matching a dot that is NOT followed by whitespace.
When we run string.replace(), we can pass an anonymous callback function as the second argument; it receives the match and the capture group values as arguments. So, here, we have 1 match value (m) and 1 capture group value g1 (the URL). If the URL was matched, g1 is defined (otherwise it is undefined). return g1 ? g1 : ". "; means we leave group 1 unchanged if it was matched; if it was not, we matched a standalone dot, so we replace it with a dot followed by a space (". ").
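As a rough sketch, the same callback idea can be wrapped in a reusable function. The name addSpaceAfterDots is hypothetical, and the extra |$ in the lookahead is a small variation on the regex above (an assumption, not part of the original answer) so that a dot at the very end of the string does not get a trailing space:

```javascript
// Hypothetical helper; the URL part is the basic pattern from the answer above.
function addSpaceAfterDots(text) {
  // Variation: (?!\s|$) also skips a dot at the very end of the string.
  var re = /((?:https?|ftps?):\/\/\S+)|\.(?!\s|$)/g;
  return text.replace(re, function (match, url) {
    // url is undefined when the dot alternative matched instead of the URL one.
    return url !== undefined ? url : ". ";
  });
}

console.log(addSpaceAfterDots("See also vadding.Constructions on this term abound. http://example.com/foo/bar"));
// → See also vadding. Constructions on this term abound. http://example.com/foo/bar
```

Because \S+ is greedy, punctuation that directly follows a URL is treated as part of the URL; a stricter URL pattern would be needed if that matters.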
You can try using the RegExp /(\.)(?![a-z]{2}\/|[a-z]{3}\/|\s+|$)/g to match a . character that is not followed by two or three lowercase letters and a slash, by whitespace, or by the end of the string:
"See also vadding.Constructions on this term abound. http://example.com/foo/bar"
.replace(/(\.)(?![a-z]{2}\/|[a-z]{3}\/|\s+|$)/g, "$1 ")
Using an idea from #MarcelKohls:
var text = "See also vadding.Constructions on this term abound. http://example.com/foo/bar";
var url_re = /(\bhttps?:\/\/(?:(?:(?!&[^;]+;)|(?=&))[^\s"'<>\]\[)])+\b)/gi;
text = text.split(url_re).map(function(text) {
if (text.match(url_re)) {
return text;
} else {
return text.replace(/\.([^ ])/g, '. $1');
}
}).join('');
document.body.innerHTML = '<pre>' + text + '</pre>';
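A variation on the split idea, as a sketch: when the splitting regex has exactly one capturing group, String.prototype.split places the captured URLs at the odd indices of the result, so the callback can rely on the index instead of re-testing each piece. The simplified URL pattern and the name spaceDotsOutsideUrls are assumptions made for illustration:

```javascript
// Hypothetical helper; the URL pattern is deliberately simplified.
function spaceDotsOutsideUrls(text) {
  var urlRe = /(https?:\/\/\S+)/; // exactly one capturing group
  return text.split(urlRe).map(function (part, i) {
    // Odd indices hold the captured URLs, even indices hold the text between them.
    return i % 2 === 1 ? part : part.replace(/\.(?=\S)/g, ". ");
  }).join("");
}

console.log(spaceDotsOutsideUrls("a.b http://x.com/y c.d"));
// → a. b http://x.com/y c. d
```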
Use this pattern:
/\.(?! )((?:ftp|http)[^ ]+)?/g
I have this function that finds whole words and should replace them. It identifies spaces but should not replace them, ie, not capture them.
function asd (sentence, word) {
str = sentence.replace(new RegExp('(?:^|\\s)' + word + '(?:$|\\s)'), "*****");
return str;
};
Then I have the following strings:
var sentence = "ich mag Äpfel";
var word = "Äpfel";
The result should be something like:
"ich mag *****"
and NOT:
"ich mag*****"
I'm getting the latter.
How can I make it so that it identifies the space but ignores it when replacing the word?
At first this may seem like a duplicate but I did not find an answer to this question, that's why I'm asking it.
Thank you
You should put back the matched whitespace by using a capturing group (rather than a non-capturing one) together with a backreference in the replacement pattern. You may also use a lookahead for the right-hand whitespace boundary, which is handy in case of consecutive matches:
function asd (sentence, word) {
var str = sentence.replace(new RegExp('(^|\\s)' + word + '(?=$|\\s)'), "$1*****");
return str;
};
var sentence = "ich mag Äpfel";
var word = "Äpfel";
console.log(asd(sentence, word));
See the regex demo.
Details
(^|\s) - Group 1 (later referred to with the help of a $1 placeholder in the replacement pattern): a capturing group that matches either start of string or a whitespace
Äpfel - a search word
(?=$|\s) - a positive lookahead that requires the end of string or whitespace immediately to the right of the current location.
NOTE: If the word can contain special regex metacharacters, escape them:
function asd (sentence, word) {
var str = sentence.replace(new RegExp('(^|\\s)' + word.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&') + '(?=$|\\s)'), "$1*****");
return str;
};
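To illustrate why the lookahead helps with consecutive matches, here is a small sketch that adds the g flag so every occurrence is replaced. The names escapeRegExp and maskWord are hypothetical; the pattern is the one described above:

```javascript
// Hypothetical helpers for illustration.
function escapeRegExp(s) {
  return s.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
}

function maskWord(sentence, word) {
  // The lookahead does not consume the trailing whitespace, so that same
  // whitespace can serve as the leading (^|\s) of the next match.
  var re = new RegExp("(^|\\s)" + escapeRegExp(word) + "(?=$|\\s)", "g");
  return sentence.replace(re, "$1*****");
}

console.log(maskWord("Äpfel Äpfel mag ich", "Äpfel")); // → ***** ***** mag ich
console.log(maskWord("ich mag Äpfel", "Äpfel"));       // → ich mag *****
```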
I have a string like this:
var str = " this is a [link][1]
[1]: http://example.com
and this is a [good website][2] in my opinion
[2]: http://goodwebsite.com
[3]: http://example.com/fsadf.jpg
[![this is a photo][3]][3]
and there is some text hare ..! ";
Now I want this:
var newstr = "this is a [link][1]
and this is a [good website][2] in my opinion
[![this is a photo][3]][3]
and there is some text hare ..!
[1]: http://example.com
[2]: http://goodwebsite.com
[3]: http://example.com/fsadf.jpg"
How can I do that?
In reality, that variable str is the value of a textarea ... and I'm trying to create a markdown editor .. So what I want is exactly the same with what SO's textarea does.
Here is my try:
/^(\[[0-9]*]:.*$)/g to select [any digit]: in the first of line
And I think I should create a group for that using () and then replace it with \n\n $1
Try this:
strLinksArray = str.match(/(\[\d+\]\:\s*[^\s\n]+)/g);
strWithoutLinks = str.replace(/(\[\d+\]\:\s*[^\s\n]+)/g, ''); //removed all links
This gives you the links as an array and the string without the links; from there you can make whatever changes you want.
You can use
var re = /^(\[[0-9]*]:)\s*(.*)\r?\n?/gm; // Regex declaration
var str = 'this is a [link][1]\n[1]: http://example.com\nand this is a [good website][2] in my opinion\n[2]: http://goodwebsite.com\n[3]: http://example.com/fsadf.jpg\n[![this is a photo][3]][3]\nand there is some text hare ..!';
var links = []; // Array for the links
var result = str.replace(re, function (m, g1, g2) { // Removing the links
links.push(" " + g1 + " " + g2); // and saving inside callback
return ""; // Removal happens here
});
var to_add = links.join("\n"); // Join the links into a string
document.getElementById("tinput").value = result + "\n\n\n" + to_add; // Display
<textarea id="tinput"></textarea>
See regex demo at regex101.com.
Regex explanation:
^ - start of line (due to the /m modifier)
(\[[0-9]*]:) - Group 1 (referred to as g1 in the replace callback) matching...
\[ - opening square bracket
[0-9]* - zero or more digits
] - closing square bracket
: - a colon
\s* - zero or more whitespace
(.*) - Group 2 matching (g2) zero or more characters other than newline
\r?\n? - one or zero \r followed by one or zero \n
/gm - the g flag makes the search and replace global, and the m flag makes ^ match at the start of each line instead of only at the start of the string
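The same steps can be sketched as a plain function that does not depend on a textarea. The name moveLinkDefinitionsToEnd and the choice to trim trailing whitespace before appending the definitions are assumptions for illustration:

```javascript
// Hypothetical helper built on the regex explained above.
function moveLinkDefinitionsToEnd(markdown) {
  var defs = [];
  var body = markdown.replace(/^(\[[0-9]*]:)\s*(.*)\r?\n?/gm, function (m, label, url) {
    defs.push(label + " " + url); // remember the definition
    return "";                    // and remove it from the body
  });
  // Trim trailing whitespace, then append the definitions after a blank line.
  return body.replace(/\s+$/, "") + "\n\n" + defs.join("\n");
}

var sample = "a [link][1]\n[1]: http://example.com\nmore [x][2]\n[2]: http://b.com";
console.log(moveLinkDefinitionsToEnd(sample));
```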
I need help with regular expression.
Using javascript I am going through each line of a text file and I want to replace any match of [0-9]{6,9} with a '*', but, I don't want to replace numbers with prefix 100. So, a number like 1110022 should be replaced (matched), but 1004567 should not (no match).
I need a single expression that will do the trick (just the matching part). I can’t use ^ or $ because the number can appear in the middle of the line.
I have tried (?!100)[0-9]{6,9}, but it doesn't work.
More examples:
Don't match: 10012345
Match: 1045677
Don't match:
1004567
Don't match: num="10034567" test
Match just the middle number in the line: num="10048876" 1200476, 1008888
Thanks
You need to use a leading word boundary to check if a number starts with some specific digit sequence:
\b(?!100)\d{6,9}
See the regex demo
Here, the 100 is checked right after a word boundary, not inside a number.
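A short sketch of the difference, using the last sample line from the question. Without the boundary, the engine is free to start a match at the second digit of a number, where the lookahead no longer sees 100:

```javascript
var withoutBoundary = /(?!100)[0-9]{6,9}/g;
var withBoundary = /\b(?!100)\d{6,9}/g;
var line = 'num="10048876" 1200476, 1008888';

// The match can start inside a number, e.g. at "0048876".
console.log(line.replace(withoutBoundary, "*")); // → num="1*" *, 1*

// The match must start at the beginning of a number.
console.log(line.replace(withBoundary, "*"));    // → num="10048876" *, 1008888
```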
If you need to replace the matches with just a single asterisk, just use the "*" as a replacement string (see snippet right below).
var re = /\b(?!100)\d{6,9}/g;
var str = 'Don\'t match: 10012345\n\nMatch: 1045677\n\nDon\'t match:\n\n1004567\n\nDon\'t match: num="10034567" test\n\nMatch just the middle number in the line: num="10048876" 1200476, 1008888';
document.getElementById("r").innerHTML = "<pre>" + str.replace(re, '*') + "</pre>";
<div id="r"/>
Or, if you need to replace each digit with *, you need to use a callback function inside a replace:
String.prototype.repeat = function (n, d) {
return --n ? this + (d || '') + this.repeat(n, d) : '' + this
};
var re = /\b(?!100)\d{6,9}/g;
var str = '123456789012 \nDon\'t match: 10012345\n\nMatch: 1045677\n\nDon\'t match:\n\n1004567\n\nDon\'t match: num="10034567" test\n\nMatch just the middle number in the line: num="10048876" 1200476, 1008888';
document.getElementById("r").innerHTML = "<pre>" + str.replace(re, function(m) { return "*".repeat(m.length); }) + "</pre>";
<div id="r"/>
The repeat function is borrowed from BitOfUniverse's answer.
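In environments that support ES2015, String.prototype.repeat is built in, so the custom prototype method may not be needed. A sketch under that assumption (maskDigits is a hypothetical name):

```javascript
// Assumes a runtime with the built-in String.prototype.repeat (ES2015+).
function maskDigits(text) {
  return text.replace(/\b(?!100)\d{6,9}/g, function (m) {
    return "*".repeat(m.length); // one asterisk per matched digit
  });
}

console.log(maskDigits("id 1045677 and 1004567")); // → id ******* and 1004567
```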
I am trying out a regex for a simple problem. My input string is
firstname.ab
And I am trying to output it as
Firstname AB
So the main aim is to capitalize the first letter of the string and replace the dot with a space. So I chose to write two regexes to solve it.
First one: to replace the dot with a space, /\./g
Second one: to capitalize the first letter, /\b\w/g
And my question is: can we do both operations with a single regex?
Thanks in advance !!
You can use a callback function inside the replace:
var str = 'firstname.ab';
var result = str.replace(/^([a-zA-Z])(.*)\.([^.]+)$/, function (match, grp1, grp2, grp3, offset, s) {
return grp1.toUpperCase() + grp2 + " " + grp3.toUpperCase();
});
alert(result);
The grp1, grp2 and grp3 arguments represent the capturing groups in the callback function. grp1 is a leading letter ([a-zA-Z]). Then we capture any number of characters other than a newline ((.*) - if you have line breaks, use [\s\S]*). Then comes the literal dot \., which we do not capture since we want to replace it with a space. Lastly, ([^.]+)$ matches and captures the rest of the string: 1 or more characters other than a literal dot, up to the end.
We can use capturing groups to re-build the input string this way.
var $input = $( '#input' ),
    value = $input.val().split( '.' );
value[0] = value[0].charAt( 0 ).toUpperCase() + value[0].slice( 1 );
value[1] = value[1].toUpperCase();
$input.val( value.join( ' ' ) );
It would be much easier if you simply split the value, process the string in the array, and join them back.
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<input type="text" value="first.ab" id="input">
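The split-and-join idea can also be sketched without jQuery or the DOM. The name formatName is hypothetical, and upper-casing every part after the first is an assumption about inputs that contain more than one dot:

```javascript
// Hypothetical helper: capitalize the first part, upper-case the rest.
function formatName(value) {
  var parts = value.split(".");
  var first = parts[0].charAt(0).toUpperCase() + parts[0].slice(1);
  var rest = parts.slice(1).map(function (p) {
    return p.toUpperCase();
  });
  return [first].concat(rest).join(" ");
}

console.log(formatName("firstname.ab")); // → Firstname AB
```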
Given the Javascript below how can I add a condition to the clause? I would like to add a "space" character after a separator only if a space does not already exist. The current code will result in double-spaces if a space character already exists in spacedText.
var separators = ['.', ',', '?', '!'];
for (var i = 0; i < separators.length; i++) {
var rg = new RegExp("\\" + separators[i], "g");
spacedText = spacedText.replace(rg, separators[i] + " ");
}
'. , ? ! .,?!foo'.replace(/([.,?!])(?! )/g, '$1 ');
//-> ". , ? ! . , ? ! foo"
This means: replace every occurrence of one of .,?! that is not followed by a space with itself plus a space.
I would suggest the following regexp to solve your problem:
"Test!Test! Test.Test 1,2,3,4 test".replace(/([!,.?])(?!\s)/g, "$1 ");
// "Test! Test! Test. Test 1, 2, 3, 4 test"
The regexp matches any character in the character class [!,.?] that is not followed by whitespace ((?!\s)). The parentheses around the character class mean that the matched separator is captured in the first group, which the replacement string refers to as $1. See this fiddle for a working example.
You could replace all of the above characters together with the space. That way you capture any punctuation and its trailing space and replace both with a single space.
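If the separators need to stay in an array, as in the question, the character class can be built from it. This is only a sketch: spaceAfterSeparators is a hypothetical name, and the |$ in the lookahead is an added assumption so that a separator at the very end of the string gets no trailing space:

```javascript
// Hypothetical helper: builds a character class from single-character separators.
function spaceAfterSeparators(text, separators) {
  var cls = separators.map(function (s) {
    return s.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&"); // escape metacharacters
  }).join("");
  var re = new RegExp("([" + cls + "])(?!\\s|$)", "g");
  return text.replace(re, "$1 ");
}

var separators = [".", ",", "?", "!"];
console.log(spaceAfterSeparators("Test!Test! Test.Test 1,2,3,4 test", separators));
// → Test! Test! Test. Test 1, 2, 3, 4 test
```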
"H1a!. A ?. ".replace(/[.,?! ]+/g, " ")
[.,?! ] is a character class. It will match either ., ,, ?, ! or a space, and + makes it match at least once (and as many times as possible).
spacedText = spacedText.replace(/([\.,!\?])([^\s])/g, "$1 $2")
This means: replace one of these characters ([\.,!\?]) followed by a non-whitespace character ([^\s]) with the match from the first group, a space, and the captured non-whitespace character ("$1 $2").
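One thing to keep in mind, shown here as a small sketch: a pattern that consumes the following character can miss back-to-back separators, because the consumed character cannot start the next match. A lookahead does not consume anything, so every separator is handled:

```javascript
var input = "a,,b";

// Consuming version: the second comma is consumed as the "following character".
var consuming = input.replace(/([.,!?])(\S)/g, "$1 $2");

// Lookahead version: each separator is checked on its own.
var lookahead = input.replace(/([.,!?])(?!\s)/g, "$1 ");

console.log(consuming); // → a, ,b
console.log(lookahead); // → a, , b
```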
Here is working code:
var nonSpaced= 'Hello World!Which is your favorite number? 10,20,25,30 or other.answer fast.';
var spaced;
var patt = /\b([!\.,\?])+\b/g;
spaced = nonSpaced.replace(patt, '$1 ');
If you console.log the value of spaced, it will be: Hello World! Which is your favorite number? 10, 20, 25, 30 or other. answer fast. Notice that there is only one space after the ? sign, and there is no extra space after the last full stop.