regex replace string with dollar sign, Weird output - javascript

I am using a javascript function as follows to convert an html blockquote to a markdown blockquote:
function convertBlockquote(str) {
var r = str;
var pat = /<blockquote>\n?([\s\S]*?)\n?<\/blockquote>/mi; //[\s\S] = dotall; ? = non-greedy match
for (var mat; (mat = r.match(pat)) !== null; ) {
mat = mat[1]
.replace(/\n/gm, '\n> ')
.replace(/<p>/igm, '\n> ')
.replace(/<\/p>/igm, '\n> \n> ')
.replace(/(\n> ?){3,}/gm, '\n> \n> ');
r = r.replace(pat, '\n>' + mat + '\n');
}
return r;
}
So if I pass in: <blockquote>Price: $1000 plus tax.</blockquote>
I would expect: > Price: $1000 plus tax.
But I get: > Price: Price: $1000 plus tax.000 plus tax.
Notice how it is replacing $1 in the $1000 part of the string with the entire original string?
How can I escape this or update the function to handle this (and other special chars which might cause similar issue)?

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace:
You can specify a function as the second parameter. In this case, the function will be invoked after the match has been performed. The function's result (return value) will be used as the replacement string. (Note: the above-mentioned special replacement patterns do not apply in this case.)
So you can write
r = r.replace(pat, function () { return '\n>' + mat + '\n'; });

Am I missing something obvious or the implementation looks complicated?
That's how I'd do that:
str=str.replace(/<blockquote>(.*?)<\/blockquote>/g,'> $1')
In case you'd like the multiline quotes to be matched:
str=str.replace(/<blockquote>((.|\n)*?)<\/blockquote>/g,'> $1')

Related

Transform string of text using JavaScript

I am working on a code to transform a string of text into a Sentence case which would also retain Acronyms. I did explore similar posts in StackOverflow, however, I couldn't find the one which suits my requirement.
I have already achieved the transformation of Acronyms and the first letter in the sentence. however, I ran into other issues like some letters in the sentence are still in Uppercase, especially texts in and after Double Quotes (" ") and camelcase texts.
Below is the code I am currently working on, I would need someone to help me Optimize the code and to fix the issues.
String.prototype.toSentenceCase = function() {
var i, j, str, lowers, uppers;
str = this.replace(/(^\w{1}|\.\s*\w{1})/gi, function(txt) {
return txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase();
});
// Certain words such as initialisms or acronyms should be left uppercase
uppers = ['Id', 'Tv', 'Nasa', 'Acronyms'];
for (i = 0, j = uppers.length; i < j; i++)
str = str.replace(new RegExp('\\b' + uppers[i] + '\\b', 'g'),
uppers[i].toUpperCase());
// To remove Special caharacters like ':' and '?'
str = str.replace(/[""]/g,'');
str = str.replace(/[?]/g,'');
str = str.replace(/[:]/g,' - ');
return str;
}
Input: play around: This is a "String" Of text, which needs to be cONVERTED to Sentence Case at the same time keeping the Acronyms as it is like Nasa.
Current Output: Play around - This is a String Of text, which needs to be cONVERTED to Sentence Case at the same time keeping the ACRONYMS as it is like NASA.
Expected Output: Play around - this is a string of text, which needs to be converted to sentence case at the same time keeping the ACRONYMS as it is like NASA.
Here's a runnable version of the initial code (I have slightly modified the input string):
String.prototype.toSentenceCase = function() {
var i, j, str, lowers, uppers;
str = this.replace(/(^\w{1}|\.\s*\w{1})/gi, function(txt) {
return txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase();
});
// Certain words such as initialisms or acronyms should be left uppercase
uppers = ['Id', 'Tv', 'Nasa', 'Acronyms'];
for (i = 0, j = uppers.length; i < j; i++)
str = str.replace(new RegExp('\\b' + uppers[i] + '\\b', 'g'),
uppers[i].toUpperCase());
// To remove Special caharacters like ':' and '?'
str = str.replace(/[""]/g,'');
str = str.replace(/[?]/g,'');
str = str.replace(/[:]/g,' - ');
return str;
}
const input = `play around: This is a "String" Of text, which needs to be cONVERTED to Sentence Case at the same time keeping the Acronyms as it is like Nasa. another sentence. "third" sentence starting with a quote.`
const result = input.toSentenceCase()
console.log(result)
I ran into other issues like some letters in the sentence are still in Uppercase, especially texts in and after Double Quotes (" ") and camelcase texts.
Some letters remain uppercased because you are not calling .toLowerCase() anywhere in your code. Expect in the beginning, but that regex is targetingonly the initial letters of sentences, not other letters.
It can be helpful to first lowercase all letters, and then uppercase some letters (acronyms and initial letters of sentences). So, let's call .toLowerCase() in the beginning:
String.prototype.toSentenceCase = function() {
var i, j, str, lowers, uppers;
str = this.toLowerCase();
// ...
return str;
}
Next, let's take a look at this regex:
/(^\w{1}|\.\s*\w{1})/gi
The parentheses are unnecessary, because the capturing group is not used in the replacer function. The {1} quantifiers are also unnecessary, because by default \w matches only one character. So we can simplify the regex like so:
/^\w|\.\s*\w/gi
This regex finds two matches from the input string:
p
. a
Both matches contain only one letter (\w), so in the replacer function, we can safely call txt.toUpperCase() instead of the current, more complex expression (txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase()). We can also use an arrow function:
String.prototype.toSentenceCase = function() {
var i, j, str, lowers, uppers;
str = this.toLowerCase();
str = str.replace(/^\w|\.\s*\w/gi, (txt) => txt.toUpperCase());
// ...
return str;
}
However, the initial letter of the third sentence is not uppercased because the sentence starts with a quote. Because we are anyway going to remove quotes and question marks, let's do it at the beginning.
Let's also simplify and combine the regexes:
// Before
str = str.replace(/[""]/g,'');
str = str.replace(/[?]/g,'');
str = str.replace(/[:]/g,' - ');
// After
str = str.replace(/["?]/g,'');
str = str.replace(/:/g,' - ');
So:
String.prototype.toSentenceCase = function() {
var i, j, str, lowers, uppers;
str = this;
str = str.toLowerCase();
str = str.replace(/["?]/g,'');
str = str.replace(/:/g,' - ');
str = str.replace(/^\w|\.\s*\w/gi, (txt) => txt.toUpperCase());
// ...
return str;
}
Now the initial letter of the third sentence is correctly uppercased. That's because when we are uppercasing the initial letters, the third sentence doesn't start with a quote anymore (because we have removed the quote).
What's left is to uppercase acronyms. In your regex, you probably want to use the i flag as well for case-insensitive matches.
Instead of using a for loop, it's possible to use a single regex to look for all matches and uppercase them. This allows us to get rid of most of the variables as well. Like so:
String.prototype.toSentenceCase = function() {
var str;
str = this;
str = str.toLowerCase();
str = str.replace(/["?]/g,'');
str = str.replace(/:/g,' - ');
str = str.replace(/^\w|\.\s*\w/gi, (txt) => txt.toUpperCase());
str = str.replace(/\b(id|tv|nasa|acronyms)\b/gi, (txt) => txt.toUpperCase());
return str;
}
And looks like we are now getting correct results!
Three more things, though:
Instead of creating and mutating the str variable, we can modify this and chain the method calls.
It might make sense to rename the txt variables to match variables, since they are regex matches.
Modifying a built-in object's prototype is a bad idea. Creating a new function is a better idea.
Here's the final code:
function convertToSentenceCase(str) {
return str
.toLowerCase()
.replace(/["?]/g, '')
.replace(/:/g, ' - ')
.replace(/^\w|\.\s*\w/gi, (match) => match.toUpperCase())
.replace(/\b(id|tv|nasa|acronyms)\b/gi, (match) => match.toUpperCase())
}
const input = `play around: This is a "String" Of text, which needs to be cONVERTED to Sentence Case at the same time keeping the Acronyms as it is like Nasa. another sentence. "third" sentence starting with a quote.`
const result = convertToSentenceCase(input)
console.log(result)

String.replaceAll using regular expression in both parameters in JS

I need to replace all instances of a substring with a modified version of the substring, can I do something like:
const regex = /[0-9]{4}[A-Z]{3}/g; // format: 0000ABC
myString = myString.replaceAll(regex, regex + " I'm modified");
Abstract example
If myString is
5000ABC, 250XYZ, GEN3000
and I want to modify certain 4 digit - 3 letter patterns, my expected output is
5000ABC I'm modified, 250XYZ, 1000DEF I'm modified, GEN3000
Not sure I understand the output format you want, but guessing from your example it looks like you want to append some string after each match. Here we go:
const input = 'Some text with 1234ABC and 5678XYZ';
const regex = /([0-9]{4}[A-Z]{3})/g; // format: 0000ABC
var result = input.replace(regex, "$1 I'm modified");
console.log('input: ' + input);
console.log('result: ' + result);
Output:
input: Some text with 1234ABC and 5678XYZ
result: Some text with 1234ABC I'm modified and 5678XYZ I'm modified
Explanation:
capture your pattern with parenthesis for later use
use a string .replace() where you can reference the captured pattern with $1, and prefix/append any text you want
The workaround I have at the moment is
const regex = /[0-9]{4}[A-Z]{3}/g;
var matches = myString.match(regex);
if (matches && matches.length > 0) {
for (var match of matches) {
var modified = match + "I'm modified";
myString = myString.replaceAll(match, modified)
}
}

How to get value in $1 in regex to a variable for further manipulation [duplicate]

You can backreference like this in JavaScript:
var str = "123 $test 123";
str = str.replace(/(\$)([a-z]+)/gi, "$2");
This would (quite silly) replace "$test" with "test". But imagine I'd like to pass the resulting string of $2 into a function, which returns another value. I tried doing this, but instead of getting the string "test", I get "$2". Is there a way to achieve this?
// Instead of getting "$2" passed into somefunc, I want "test"
// (i.e. the result of the regex)
str = str.replace(/(\$)([a-z]+)/gi, somefunc("$2"));
Like this:
str.replace(regex, function(match, $1, $2, offset, original) { return someFunc($2); })
Pass a function as the second argument to replace:
str = str.replace(/(\$)([a-z]+)/gi, myReplace);
function myReplace(str, group1, group2) {
return "+" + group2 + "+";
}
This capability has been around since Javascript 1.3, according to mozilla.org.
Using ESNext, quite a dummy links replacer but just to show-case how it works :
let text = 'Visit http://lovecats.com/new-posts/ and https://lovedogs.com/best-dogs NOW !';
text = text.replace(/(https?:\/\/[^ ]+)/g, (match, link) => {
// remove ending slash if there is one
link = link.replace(/\/?$/, '');
return `${link.substr(link.lastIndexOf('/') +1)}`;
});
document.body.innerHTML = text;
Note: Previous answer was missing some code. It's now fixed + example.
I needed something a bit more flexible for a regex replace to decode the unicode in my incoming JSON data:
var text = "some string with an encoded 's' in it";
text.replace(/&#(\d+);/g, function() {
return String.fromCharCode(arguments[1]);
});
// "some string with an encoded 's' in it"
If you would have a variable amount of backreferences then the argument count (and places) are also variable. The MDN Web Docs describe the follwing syntax for sepcifing a function as replacement argument:
function replacer(match[, p1[, p2[, p...]]], offset, string)
For instance, take these regular expressions:
var searches = [
'test([1-3]){1,3}', // 1 backreference
'([Ss]ome) ([A-z]+) chars', // 2 backreferences
'([Mm][a#]ny) ([Mm][0o]r[3e]) ([Ww][0o]rd[5s])' // 3 backreferences
];
for (var i in searches) {
"Some string chars and many m0re w0rds in this test123".replace(
new RegExp(
searches[i]
function(...args) {
var match = args[0];
var backrefs = args.slice(1, args.length - 2);
// will be: ['Some', 'string'], ['many', 'm0re', 'w0rds'], ['123']
var offset = args[args.length - 2];
var string = args[args.length - 1];
}
)
);
}
You can't use 'arguments' variable here because it's of type Arguments and no of type Array so it doesn't have a slice() method.

Remove all dots except the first one from a string

Given a string
'1.2.3.4.5'
I would like to get this output
'1.2345'
(In case there are no dots in the string, the string should be returned unchanged.)
I wrote this
function process( input ) {
var index = input.indexOf( '.' );
if ( index > -1 ) {
input = input.substr( 0, index + 1 ) +
input.slice( index ).replace( /\./g, '' );
}
return input;
}
Live demo: http://jsfiddle.net/EDTNK/1/
It works but I was hoping for a slightly more elegant solution...
There is a pretty short solution (assuming input is your string):
var output = input.split('.');
output = output.shift() + '.' + output.join('');
If input is "1.2.3.4", then output will be equal to "1.234".
See this jsfiddle for a proof. Of course you can enclose it in a function, if you find it necessary.
EDIT:
Taking into account your additional requirement (to not modify the output if there is no dot found), the solution could look like this:
var output = input.split('.');
output = output.shift() + (output.length ? '.' + output.join('') : '');
which will leave eg. "1234" (no dot found) unchanged. See this jsfiddle for updated code.
It would be a lot easier with reg exp if browsers supported look behinds.
One way with a regular expression:
function process( str ) {
return str.replace( /^([^.]*\.)(.*)$/, function ( a, b, c ) {
return b + c.replace( /\./g, '' );
});
}
You can try something like this:
str = str.replace(/\./,"#").replace(/\./g,"").replace(/#/,".");
But you have to be sure that the character # is not used in the string; or replace it accordingly.
Or this, without the above limitation:
str = str.replace(/^(.*?\.)(.*)$/, function($0, $1, $2) {
return $1 + $2.replace(/\./g,"");
});
You could also do something like this, i also don't know if this is "simpler", but it uses just indexOf, replace and substr.
var str = "7.8.9.2.3";
var strBak = str;
var firstDot = str.indexOf(".");
str = str.replace(/\./g,"");
str = str.substr(0,firstDot)+"."+str.substr(1,str.length-1);
document.write(str);
Shai.
Here is another approach:
function process(input) {
var n = 0;
return input.replace(/\./g, function() { return n++ > 0 ? '' : '.'; });
}
But one could say that this is based on side effects and therefore not really elegant.
This isn't necessarily more elegant, but it's another way to skin the cat:
var process = function (input) {
var output = input;
if (typeof input === 'string' && input !== '') {
input = input.split('.');
if (input.length > 1) {
output = [input.shift(), input.join('')].join('.');
}
}
return output;
};
Not sure what is supposed to happen if "." is the first character, I'd check for -1 in indexOf, also if you use substr once might as well use it twice.
if ( index != -1 ) {
input = input.substr( 0, index + 1 ) + input.substr(index + 1).replace( /\./g, '' );
}
var i = s.indexOf(".");
var result = s.substr(0, i+1) + s.substr(i+1).replace(/\./g, "");
Somewhat tricky. Works using the fact that indexOf returns -1 if the item is not found.
Trying to keep this as short and readable as possible, you can do the following:
JavaScript
var match = string.match(/^[^.]*\.|[^.]+/g);
string = match ? match.join('') : string;
Requires a second line of code, because if match() returns null, we'll get an exception trying to call join() on null. (Improvements welcome.)
Objective-J / Cappuccino (superset of JavaScript)
string = [string.match(/^[^.]*\.|[^.]+/g) componentsJoinedByString:''] || string;
Can do it in a single line, because its selectors (such as componentsJoinedByString:) simply return null when sent to a null value, rather than throwing an exception.
As for the regular expression, I'm matching all substrings consisting of either (a) the start of the string + any potential number of non-dot characters + a dot, or (b) any existing number of non-dot characters. When we join all matches back together, we have essentially removed any dot except the first.
var input = '14.1.2';
reversed = input.split("").reverse().join("");
reversed = reversed.replace(\.(?=.*\.), '' );
input = reversed.split("").reverse().join("");
Based on #Tadek's answer above. This function takes other locales into consideration.
For example, some locales will use a comma for the decimal separator and a period for the thousand separator (e.g. -451.161,432e-12).
First we convert anything other than 1) numbers; 2) negative sign; 3) exponent sign into a period ("-451.161.432e-12").
Next we split by period (["-451", "161", "432e-12"]) and pop out the right-most value ("432e-12"), then join with the rest ("-451161.432e-12")
(Note that I'm tossing out the thousand separators, but those could easily be added in the join step (.join(','))
var ensureDecimalSeparatorIsPeriod = function (value) {
var numericString = value.toString();
var splitByDecimal = numericString.replace(/[^\d.e-]/g, '.').split('.');
if (splitByDecimal.length < 2) {
return numericString;
}
var rightOfDecimalPlace = splitByDecimal.pop();
return splitByDecimal.join('') + '.' + rightOfDecimalPlace;
};
let str = "12.1223....1322311..";
let finStr = str.replace(/(\d*.)(.*)/, '$1') + str.replace(/(\d*.)(.*)/, '$2').replace(/\./g,'');
console.log(finStr)
const [integer, ...decimals] = '233.423.3.32.23.244.14...23'.split('.');
const result = [integer, decimals.join('')].join('.')
Same solution offered but using the spread operator.
It's a matter of opinion but I think it improves readability.

JavaScript - string regex backreferences

You can backreference like this in JavaScript:
var str = "123 $test 123";
str = str.replace(/(\$)([a-z]+)/gi, "$2");
This would (quite silly) replace "$test" with "test". But imagine I'd like to pass the resulting string of $2 into a function, which returns another value. I tried doing this, but instead of getting the string "test", I get "$2". Is there a way to achieve this?
// Instead of getting "$2" passed into somefunc, I want "test"
// (i.e. the result of the regex)
str = str.replace(/(\$)([a-z]+)/gi, somefunc("$2"));
Like this:
str.replace(regex, function(match, $1, $2, offset, original) { return someFunc($2); })
Pass a function as the second argument to replace:
str = str.replace(/(\$)([a-z]+)/gi, myReplace);
function myReplace(str, group1, group2) {
return "+" + group2 + "+";
}
This capability has been around since Javascript 1.3, according to mozilla.org.
Using ESNext, quite a dummy links replacer but just to show-case how it works :
let text = 'Visit http://lovecats.com/new-posts/ and https://lovedogs.com/best-dogs NOW !';
text = text.replace(/(https?:\/\/[^ ]+)/g, (match, link) => {
// remove ending slash if there is one
link = link.replace(/\/?$/, '');
return `${link.substr(link.lastIndexOf('/') +1)}`;
});
document.body.innerHTML = text;
Note: Previous answer was missing some code. It's now fixed + example.
I needed something a bit more flexible for a regex replace to decode the unicode in my incoming JSON data:
var text = "some string with an encoded 's' in it";
text.replace(/&#(\d+);/g, function() {
return String.fromCharCode(arguments[1]);
});
// "some string with an encoded 's' in it"
If you would have a variable amount of backreferences then the argument count (and places) are also variable. The MDN Web Docs describe the follwing syntax for sepcifing a function as replacement argument:
function replacer(match[, p1[, p2[, p...]]], offset, string)
For instance, take these regular expressions:
var searches = [
'test([1-3]){1,3}', // 1 backreference
'([Ss]ome) ([A-z]+) chars', // 2 backreferences
'([Mm][a#]ny) ([Mm][0o]r[3e]) ([Ww][0o]rd[5s])' // 3 backreferences
];
for (var i in searches) {
"Some string chars and many m0re w0rds in this test123".replace(
new RegExp(
searches[i]
function(...args) {
var match = args[0];
var backrefs = args.slice(1, args.length - 2);
// will be: ['Some', 'string'], ['many', 'm0re', 'w0rds'], ['123']
var offset = args[args.length - 2];
var string = args[args.length - 1];
}
)
);
}
You can't use 'arguments' variable here because it's of type Arguments and no of type Array so it doesn't have a slice() method.

Categories

Resources