JavaScript: Regex to replace two characters and all their surrounding whitespace

I have a string and would like to replace all instances of the two characters "<" and ">", together with all their surrounding whitespace (no tabs, no newlines, possibly empty), by " < " and " > ", respectively.
Can I do this with a one-liner replace regex expression?
The slow and hard way would be
while (entry.value.indexOf(" <") > -1) {
entry.value = entry.value.replace(" <","<");
}
while (entry.value.indexOf("< ") > -1) {
entry.value = entry.value.replace("< ","<");
}
while (entry.value.indexOf(" >") > -1) {
entry.value = entry.value.replace(" >",">");
}
while (entry.value.indexOf("> ") > -1) {
entry.value = entry.value.replace("> ",">");
}
// a string pattern only replaces the first occurrence, so use global regexes here
entry.value = entry.value.replace(/</g, " < ").replace(/>/g, " > ");
Shortening the whitespace is explained at Regex to replace multiple spaces with a single space, but I cannot assume that there is any whitespace around the two characters.
My use case is saving math expressions in a database to be presented on a website using MathJax. Doing so, one runs into exactly this problem, see http://docs.mathjax.org/en/latest/tex.html#tex-and-latex-in-html-documents.
Typical expressions are
"Let $i$ such that $i<j$..."
"Let $<I>$ be an ideal in..."
(the latter wouldn't even render here in the preview in normal text mode.)

Copying Wiktor's comment here: use /\s*([<>])\s*/g as the pattern and ' $1 ' as the replacement. \s matches any whitespace character (including tabs and newlines), * matches zero or more of them, [<>] matches either < or >, the g flag makes the replace global instead of stopping after the first match, and the parentheses create a capture group so that $1 in the replacement string refers to the matched character.
Some example inputs and outputs:
'<>' // => ' <  > ' (two spaces between the angle brackets)
'<\t\t\n\ \n<' // => ' < < ' (again two spaces)
'>a \t b< ' // => ' > a \t b < '
'a>\n b <c ' // => 'a > b < c '
const a = 'fpo< \n>\naf ja\tb<>\t<><>asd\npfi b.<< > >';
const b = a.replace(/\s*([<>])\s*/g, ' $1 ');
console.log(b);
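The question asks for spaces only (no tabs, no newlines), while \s also matches tabs and newlines. A minimal sketch of a space-only variant follows; the helper name padAngleBrackets is hypothetical and not part of the original answer.

```javascript
// Hypothetical helper (not from the original answer): pads < and > with
// exactly one space on each side, collapsing only literal spaces around them.
function padAngleBrackets(text) {
  // " *" matches zero or more literal spaces, so tabs and newlines are kept.
  return text.replace(/ *([<>]) */g, ' $1 ');
}

console.log(padAngleBrackets('Let $i$ such that $i<j$...'));
console.log(padAngleBrackets('Let $<I>$ be an ideal in...'));
```

If tabs and newlines around the brackets should be collapsed as well, the \s version above is the one to use.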

Related

Regex match apostrophe inside, but not around words, inside a character set

I'm counting how many times different words appear in a text using Regular Expressions in JavaScript. My problem is when I have quoted words: 'word' should be counted simply as word (without the quotes, otherwise they'll behave as two different words), while it's should be counted as a whole word.
(?<=\w)(')(?=\w)
This regex can identify apostrophes inside, but not around words. Problem is, I can't use it inside a character set such as [\w]+.
(?<=\w)(')(?=\w)|[\w]+
Will count it's a 'miracle' of nature as 7 words, instead of 5 (it, ', s becoming 3 different words). Also, the third word should be selected simply as miracle, and not as 'miracle'.
To make things even more complicated, I need to capture diacritics too, so I'm using [A-Za-zÀ-ÖØ-öø-ÿ] instead of \w.
How can I accomplish that?
1) You can simply use /[^\s]+/g regex
const str = `it's a 'miracle' of nature`;
const result = str.match(/[^\s]+/g);
console.log(result.length);
console.log(result);
2) If you are calculating total number of words in a string then you can also use split as:
const str = `it's a 'miracle' of nature`;
const result = str.split(/\s+/);
console.log(result.length);
console.log(result);
3) If you want each word without a quote at the start and at the end, you can do this:
const str = `it's a 'miracle' of nature`;
const result = str.match(/[^\s]+/g).map((s) => {
s = s[0] === "'" ? s.slice(1) : s;
s = s[s.length - 1] === "'" ? s.slice(0, -1) : s;
return s;
});
console.log(result.length);
console.log(result);
You might use an alternation with 2 capture groups, and then check for the values of those groups.
(?<!\S)'(\S+)'(?!\S)|(\S+)
(?<!\S)' Negative lookbehind, assert a whitespace boundary to the left and match '
(\S+) Capture group 1, match 1+ non whitespace chars
'(?!\S) Match ' and assert a whitespace boundary to the right
| Or
(\S+) Capture group 2, match 1+ non whitespace chars
const regex = /(?<!\S)'(\S+)'(?!\S)|(\S+)/g;
const s = "it's a 'miracle' of nature";
Array.from(s.matchAll(regex), m => {
if (m[1]) console.log(m[1])
if (m[2]) console.log(m[2])
});
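Since the question also needs diacritics and a count per word, here is a hedged sketch that combines the letter class from the question with an optional inner apostrophe. The names LETTERS, WORD and countWords are made up for illustration, and lower-casing the keys is an assumption about how words should be grouped.

```javascript
// Letter class taken from the question; everything else is illustrative.
const LETTERS = "A-Za-zÀ-ÖØ-öø-ÿ";
// One or more letters, optionally followed by apostrophe+letters groups,
// so "it's" stays whole while the quotes around 'miracle' are not matched.
const WORD = new RegExp(`[${LETTERS}]+(?:'[${LETTERS}]+)*`, "g");

function countWords(text) {
  const counts = new Map();
  for (const word of text.match(WORD) || []) {
    const key = word.toLowerCase(); // assumption: counting is case-insensitive
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

const counts = countWords("it's a 'miracle' of nature");
console.log(counts.size);        // 5
console.log([...counts.keys()]); // [ "it's", 'a', 'miracle', 'of', 'nature' ]
```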

Javascript Regex, Add space before and after math operators if there is none

I'm trying to make the perfect math parser for my Discord bot.
Currently I have a simple function parser that takes in a string which has a ton of .replace methods to clear up a bunch of junk or formatting things leftover from discord, or just replaces {} with () and such quality of life things...
var parseArgs = args.toLowerCase().replace(/ -o/g, "").replace(/x/g, "*").replace(/[a-z]/g, "")
.replace(/{/g, "(").replace(/}/g, ")").replace(/\[/g, "(").replace(/]/g, ")").replace(/\+=/g, "+")
.replace(/-=/g, "-").replace(/'/g, "").replace(/`/g, "").replace(/"/g, "");
var origArgs = args.toLowerCase().replace(/`/g, "").replace(/ -o/g, "");
const output = parseMath(parseArgs);
This is nice and all, but if you input an equation like this:
!math 1 + 1aaa+aaaa2{55>>2}
The parser will output:
1 + 1+2*(55>>2)
I want it to output:
1 + 1 + 2 * (55 >> 2)
Which is easily parsed by my function, but the equation is sent into the chat, and it's quite ugly.
I'm asking whether there's a simple regex to check whether a math operator (such as + - / * x ( ) >> ^ += -= == ===) sits between numbers without surrounding spaces,
so that 1+2/3(4>>2) and 3>>4===3*4 turn into 1 + 2 / 3 (4 >> 2) and 3 >> 4 === 3 * 4 respectively.
Edit: I see how crappy my replaces are, so I simplified them:
var parseArgs = args.toLowerCase().replace(/x/g, "*").replace(/ -o|[a-z]|"|'|`/g, "")
.replace(/{|\[/g, "(").replace(/}|]/g, ")").replace(/\+=/g, "+").replace(/-=/g, "-");
var origArgs = args.toLowerCase().replace(/ -o|`/g, "");
First remove anything that isn't mathematical (remove anything that isn't a number or a possible operator), then use .replace to match zero or more spaces, followed by any of the operators, then match zero or more spaces again, and replace with the operator with one space on each side:
const parse = (args) => {
const argsWithOnlyMath = args.replace(/[^\d+\-\/*x()>^=]/g, ' ');
const spacedArgs = argsWithOnlyMath
.replace(/\s*(\D+)\s*/g, ' $1 ') // add spaces
.replace(/ +/g, ' ') // ensure no duplicate spaces
.replace(/\( /g, '(') // remove space after (
.replace(/ \)/g, ')'); // remove space before )
console.log(spacedArgs);
};
parse('!math 1 + 1aaa+aaaa2(55>>2)');
parse(' 1+2/3(4>>2) ');
parse('3>>4===3*4');
To also add spaces before ( and after ), just add more .replaces:
const parse = (args) => {
const argsWithOnlyMath = args.replace(/[^\d+\-\/*x()>^=]/g, ' ');
const spacedArgs = argsWithOnlyMath
.replace(/\s*(\D+)\s*/g, ' $1 ') // add spaces
.replace(/\(/g, ' (') // add space before (
.replace(/\)/g, ') ') // add space after )
.replace(/ +/g, ' ') // ensure no duplicate spaces
.replace(/\( /g, '(') // remove space after (
.replace(/ \)/g, ')'); // remove space before )
console.log(spacedArgs);
};
parse('!math 1 + 1aaa+aaaa2(55>>2)');
parse(' 1+2/3(4>>2) *()');
parse('3*()');
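As an alternative to matching any run of non-digits, the operators can be listed explicitly. The sketch below is only an illustration under that assumption: spaceOperators is a hypothetical name, the operator list is taken from the question, and parentheses are deliberately left alone. Longer operators come first in the alternation so that === is not split into == and =.

```javascript
// Hypothetical helper: puts exactly one space around each listed operator.
// Multi-character operators are listed before the single-character class.
const OPERATOR = /\s*(===|==|>>|\+=|-=|[+\-*\/^])\s*/g;

function spaceOperators(expression) {
  return expression.replace(OPERATOR, ' $1 ').trim();
}

console.log(spaceOperators('3>>4===3*4'));  // 3 >> 4 === 3 * 4
console.log(spaceOperators('1+2/3(4>>2)')); // 1 + 2 / 3(4 >> 2)
```

Note that a unary minus, as in -5, would also be spaced by this pattern; handling that would need extra context that this sketch does not attempt.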

Regex remove duplicate adjacent characters in javascript

I've been struggling to get my regex function to work as intended. My goal is to iterate over a string repeatedly (until no match is found) and remove all duplicate, adjacent characters. Aside from checking whether 2 characters (adjacent to each other) are equal, the regex should only remove the match when exactly one of the pair is uppercase.
e.g. the regex should only remove 'Xx' or 'xX'.
My current regex only removes matches where a lowercase character is followed by any uppercase character.
(.)(([a-z]{0})+[A-Z])
How can I implement looking for the same adjacent character and the pattern of looking for an uppercase character followed by an equal lowercase character?
You'd either have to list out all possible combinations, eg
aA|Aa|bB|Bb...
Or implement it more programmatically, without regex:
let str = 'fooaBbAfoo';
outer:
while (true) {
for (let i = 0; i < str.length - 1; i++) {
const thisChar = str[i];
const nextChar = str[i + 1];
if (/[a-z]/i.test(thisChar) && thisChar.toUpperCase() === nextChar.toUpperCase() && thisChar !== nextChar) {
str = str.slice(0, i) + str.slice(i + 2);
continue outer;
}
}
break;
}
console.log(str);
Looking for the same adjacent character: /(.)\1/
Looking for a character followed by the same letter in the other case isn't possible in JavaScript since it doesn't support inline modifiers. If it did, the regex would be /(.)(?!\1)(?i:\1)/, which matches both 'xX' and 'Xx'.
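The loop-based answer above restarts the scan after every removal. A single pass with a stack gives the same result for this kind of pair cancellation; the following is a sketch under that assumption, with removeCasePairs being a hypothetical name.

```javascript
// Hypothetical helper: removes adjacent pairs such as "xX" or "Xx" in one pass.
// Each character either cancels the one on top of the stack or is pushed.
function removeCasePairs(input) {
  const stack = [];
  for (const ch of input) {
    const top = stack[stack.length - 1];
    // Same letter but not the identical character means the case differs.
    const cancels =
      top !== undefined &&
      top !== ch &&
      top.toLowerCase() === ch.toLowerCase();
    if (cancels) {
      stack.pop();
    } else {
      stack.push(ch);
    }
  }
  return stack.join('');
}

console.log(removeCasePairs('fooaBbAfoo')); // foofoo
```

Removing "Bb" exposes "aA", which is then cancelled as well, matching what the repeated loop produces.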

Understanding this regular expression

function compress(source) {
var keys = {};
source.replace(
/([^=&]+)=([^&]*)/g,
function(full, key, value) {
keys[key] =
(keys[key] ? keys[key] + "," : "") + value;
return "";
}
);
var result = [];
for (var key in keys) {
result.push(key + "=" + keys[key]);
}
return result.join("&");
}
alert(compress("foo=1&foo=2&blah=a&blah=b&foo=3"));
I am still confused by /([^=&]+)=([^&]*)/g. What are the + and * used for?
The ^ means NOT these, the + means one or more characters matching, the () are groups. And the * is any amount of matches (0+).
http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
So by looking at it, I'm guessing it's replacing anything that's NOT =&=& or &=& or ==, which is weird.
+ and * are called quantifiers. They determine how many times the element immediately preceding them (a single character, or a set or group written with [] or ()) may repeat.
/ start of regex
( group 1 starts
[^ any character that is not
=& an equals sign or an ampersand
]+ one or more of the above
) group 1 ends
= followed by a literal equals sign
( group 2 starts
[^ any character that is not
& an ampersand
]* zero or more of the above
) group 2 ends
/g end of regex; the g flag matches every key=value pair, not just the first
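To see what the two groups capture, the same pattern can be run with matchAll. This is a small sketch; listPairs is a hypothetical name, and the last pair has an empty value to show why group 2 uses * rather than +.

```javascript
// Same pattern as in the question.
const PAIR = /([^=&]+)=([^&]*)/g;

// Hypothetical helper: returns [key, value] for every key=value pair.
function listPairs(query) {
  return Array.from(query.matchAll(PAIR), (m) => [m[1], m[2]]);
}

console.log(listPairs('foo=1&foo=2&blah='));
// [ [ 'foo', '1' ], [ 'foo', '2' ], [ 'blah', '' ] ]
```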

How to replace css('background-image')

I want to replace the path in css('background-image').
The problem:
for the same variable oldBgImg = this.element.css('background-image')
Firefox returns:
"url("http://mySite/images/file1.png")"
but Chrome returns it without the quotes:
"url(http://mySite/images/file1.png)"
Here is the solution I use. Can you please help me make it simpler?
var oldBgImg = this.element.css('background-image');
// => FF: "url("http://mySite/images/file1.png")"
// Chrome: "url(http://mySite/images/file1.png)"
// According to http://www.w3.org/TR/CSS2/syndata.html#value-def-uri :
// quotes are optional, so Chrome does not use them, but FF does . . .
var n1 = oldBgImg.lastIndexOf("("); n1 += 1; // now points to the char after the "("
var n2 = oldBgImg.lastIndexOf(")"); n2 -= 1; // now points to the char before the ")"
var c1 = oldBgImg.substring(n1, n1 + 1); // test the first Char after the "("
var c2 = oldBgImg.substring(n2, n2 + 1); // test the last char before the ")"
if ( (c1 == "\"") || (c1 == "\'") ) { n1 += 1; }
if ( (c2 == "\"") || (c2 == "\'") ) { n2 -= 1; }
var oldBgImgPath = oldBgImg.substring(n1, n2 + 1); // [ (" ] .. [ ") ]
var n = oldBgImgPath.lastIndexOf("/");
var newBgImgPath = oldBgImgPath.substring(0, n + 1) + "file2.gif";
// if needed, should also add :
// var path = encodeURI(newBgImgPath);
this.element.css('background-image', 'url(' + newBgImgPath + ')');
Notes:
According to http://www.w3.org/TR/CSS2/syndata.html#value-def-uri
one can use single quote or double-quote or no quote sign
I am looking for a general solution that also works for relative paths (without "http", or with "file"); I just want to replace the file name within the URL.
Here's an example of how to do it with regular expressions.
The expression:
("?)(http:.*?)\1\)
The match
url = 'url("http://mySite/images/file1.png")'.match(/("?)(http:.*?)\1\)/)[2];
You can then reconstruct your property.
$(this).css( 'background-image', 'url("' + url + '")' );
This should work on all browsers.
I did it with regular expressions. I use this code:
var re = /url\(['"]?(.+?)[^\/]+['"]?\)/;
var regs = re.exec(oldBgImg);
var newBgImgPath = regs[1] + "file2.png";
I'll explain the RE.
It starts with a /, this will indicate it's a RE.
Then there's url\(. It matches the text url(. ( is escaped because it is a reserved character.
Then there is ['"]?. ['"] matches ' or " and the ? makes it optional.
A ( starts a RE group, that can be referred to.
In .+? the . matches any character except a newline. The + says that there must be at least 1 of them. Finally, the ? makes the + non-greedy, so it matches as few characters as possible while still allowing the whole RE to match.
A ) ends the group.
[^\/] matches any non-/ character. Then there's a + again. It has no ? after it, because we want to match as many non-/ characters (the file name) as we can at the end.
Finally, another optional quote, an escaped ) for the closing bracket in url(...) and a / to end the RE.
Now re.exec(oldBgImg) returns an array with the first element being the whole matched string and the next elements being the matched RE groups (created by () brackets). Then I can just take regs[1], which is the first matched group and contains the pathname.
You could replace the quotes in oldBgImg with nothing like this.
oldBgImg = oldBgImg.replace(/\"/g, "");
That way the URL is always the same no matter what browser retrieved it.
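Putting the ideas above together, here is a hedged sketch of a browser-independent helper that accepts double quotes, single quotes or no quotes, and also handles relative paths. The function name replaceFileName is hypothetical; it only does string work, so the result still has to be passed to .css('background-image', ...) as in the question.

```javascript
// Hypothetical helper: swaps the file name inside a CSS url(...) value.
// Returns null when the value is not a single url(...) (for example "none").
function replaceFileName(cssValue, newFileName) {
  // Group 1: optional opening quote; \1 requires the same quote to close.
  // Group 2: the URL itself, without quotes.
  const match = /^url\(\s*(['"]?)(.*?)\1\s*\)$/.exec(cssValue.trim());
  if (!match) {
    return null;
  }
  const path = match[2];
  // Keep everything up to and including the last "/"; with no "/" at all,
  // lastIndexOf returns -1 and the directory part is empty.
  const directory = path.slice(0, path.lastIndexOf('/') + 1);
  return 'url("' + directory + newFileName + '")';
}

console.log(replaceFileName('url("http://mySite/images/file1.png")', 'file2.gif'));
// url("http://mySite/images/file2.gif")
console.log(replaceFileName('url(images/file1.png)', 'file2.gif'));
// url("images/file2.gif")
```

Always emitting double quotes is a design choice of this sketch; per the CSS specification cited in the question, any of the three forms is valid.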
