Finding the xml escaped dash in Javascript - javascript

I want to use regular expression to find dashes in an html in javascript. The dashes in html pages sometimes may be xml escaped with the string value of –. However, using regular expression to find this string is not working for some reason.
var html = document.getElementsByTagName('html').item(0).innerHTML;
var escapedDash = /–/ig;
var foundEscapedDash = html.match(escapedDash);
alert(foundEscapedDash);
The regular experession, /–/ig does not result in any values. Nor does the regular expression /-/i find the escaped dash –
Does anyone know of a regular expression that can find the escaped dash?

When you set innerHTML to a string with an entity, it converts it to the literal character. For example:
var div = document.createElement('div');
div.innerHTML = '–'
alert(div.innerHTML.length); // 1, not 7 as may be expected
So you need to match the actual character &ndash, and to do that, you can use the unicode literal representation. For "–", it's \u2013.
div.innerHTML.match(/\u2013/ig)
By the way, assuming the dash is the first character of the string, you can find the hex number 0x2013 for yourself with div.innerHTML.charCodeAt(0).toString(16).

Try this:
var str = '–hello world –';
var escapedDash = /(–+)/ig;
var foundEscapedDash = str.match(escapedDash);
alert(foundEscapedDash);

Related

Javascipt regex to get string between two characters except escaped without lookbehind

I am looking for a specific javascript regex without the new lookahead/lookbehind features of Javascript 2018 that allows me to select text between two asterisk signs but ignores escaped characters.
In the following example only the text "test" and the included escaped characters are supposed to be selected according the rules above:
\*jdjdjdfdf*test*dfsdf\*adfasdasdasd*test**test\**sd* (Selected: "test", "test", "test\*")
During my research I found this solution Regex, everything between two characters except escaped characters /(?<!\\)(%.*?(?<!\\)%)/ but it uses negative lookbehinds which is supported in javascript 2018 but I need to support IE11 as well, so this solution doesn't work for me.
Then i found another approach which is almost getting there for me here: Javascript: negative lookbehind equivalent?. I altered the answer of Kamil Szot to fit my needs: ((?!([\\])).|^)(\*.*?((?!([\\])).|^)\*) Unfortuantely it doesn't work when two asterisks ** are in a row.
I have already invested a lot of hours and can't seem to get it right, any help is appreciated!
An example with what i have so far is here: https://www.regexpal.com/?fam=117350
I need to use the regexp in a string.replace call (str.replace(regexp|substr, newSubStr|function); so that I can wrap the found strings with a span element of a specific class.
You can use this regular expression:
(?:\\.|[^*])*\*((?:\\.|[^*])*)\*
Your code should then only take the (only) capture group of each match.
Like this:
var str = "\\*jdjdjdfdf*test*dfsdf\\*adfasdasdasd*test**test\\**sd*";
var regex = /(?:\\.|[^*])*\*((?:\\.|[^*])*)\*/g
var match;
while (match = regex.exec(str)) {
console.log(match[1]);
}
If you need to replace the matches, for instance to wrap the matches in a span tag while also dropping the asterisks, then use two capture groups:
var str = "\\*jdjdjdfdf*test*dfsdf\\*adfasdasdasd*test**test\\**sd*";
var regex = /((?:\\.|[^*])*)\*((?:\\.|[^*])*)\*/g
var result = str.replace(regex, "$1<span>$2</span>");
console.log(result);
One thing to be careful with: when you use string literals in JavaScript tests, escape the backslash (with another backslash). If you don't do that, the string actually will not have a backslash! To really get the backslash in the in-memory string, you need to escape the backslash.
const testStr = `\\*jdjdjdfdf*test*dfsdf\\*adfasdasdasd*test**test\\**sd*`;
const m = testStr.match(/\*(\\.)*t(\\.)*e(\\.)*s(\\.)*t(\\.)*\*/g).map(m => m.substr(1, m.length-2));
console.log(m);
More generic code:
const prepareRegExp = (word, delimiter = '\\*') => {
const escaped = '(\\\\.)*';
return new RegExp([
delimiter,
escaped,
[...word].join(escaped),
escaped,
delimiter
].join``, 'g');
};
const testStr = `\\*jdjdjdfdf*test*dfsdf\\*adfasdasdasd*test**test\\**sd*`;
const m = testStr
.match(prepareRegExp('test'))
.map(m => m.substr(1, m.length-2));
console.log(m);
https://instacode.dev/#Y29uc3QgcHJlcGFyZVJlZ0V4cCA9ICh3b3JkLCBkZWxpbWl0ZXIgPSAnXFwqJykgPT4gewogIGNvbnN0IGVzY2FwZWQgPSAnKFxcXFwuKSonOwogIHJldHVybiBuZXcgUmVnRXhwKFsKICAgIGRlbGltaXRlciwKICAgIGVzY2FwZWQsCiAgICBbLi4ud29yZF0uam9pbihlc2NhcGVkKSwKICAgIGVzY2FwZWQsCiAgICBkZWxpbWl0ZXIKICBdLmpvaW5gYCwgJ2cnKTsKfTsKCmNvbnN0IHRlc3RTdHIgPSBgXFwqamRqZGpkZmRmKnRlc3QqZGZzZGZcXCphZGZhc2Rhc2Rhc2QqdGVzdCoqdGVzdFxcKipzZCpgOwpjb25zdCBtID0gdGVzdFN0cgoJLm1hdGNoKHByZXBhcmVSZWdFeHAoJ3Rlc3QnKSkKCS5tYXAobSA9PiBtLnN1YnN0cigxLCBtLmxlbmd0aC0yKSk7Cgpjb25zb2xlLmxvZyhtKTs=

How to remove a a substring followed by a dynamic string from a string?

I have the following string
search/year/20212022-2/period/value
And I want to get the results as
search/period/value By removing the year/20212022-2/ from the above string. The year is static but 20212022-2 is dynamic.
How could I do this in jQuery?
You can do it easily with a regular expression:
var str = 'search/year/20212022-2/period/value';
var regex = /(year\/[0-9\-]+\/)/g;
console.log(str.replace(regex, ''));
year/20212022-2 can be in any location inside the string. Just want to remove year followed by a / and string and then again a /
In this case use a regex to find /year/ followed by a string of numbers and hyphens to be replaced, like this:
let input = "search/year/20212022-2/period/value";
let output = input.replace(/\/year\/[\d-]+/g, '');
console.log(output);

Using search and replace with regex in javascript

I have a regular expression that I have been using in notepad++ for search&replace to manipulate some text, and I want to incorporate it into my javascript code. This is the regular expression:
Search
(?-s)(.{150,250}\.(\[\d+\])*)\h+ and replace with \1\r\n\x20\x20\x20
In essence creating new paragraphs for every 150-250 words and indenting them.
This is what I have tried in JavaScript. For a text area <textarea name="textarea1" id="textarea1"></textarea>in the HTML. I have the following JavaScript:
function rep1() {
var re1 = new RegExp('(?-s)(.{150,250}\.(\[\d+\])*)\h+');
var re2 = new RegExp('\1\r\n\x20\x20\x20');
var s = document.getElementById("textarea1").value;
s = string.replace(re1, re2);
document.getElementById("textarea1").value = s;
}
I have also tried placing the regular expressions directly as arguments for string.replace() but that doesn't work either. Any ideas what I'm doing wrong?
Several issues:
JavaScript does not support (?-s). You would need to add modifiers separately. However, this is the default setting in JavaScript, so you can just leave it out. If it was your intention to let . also match line breaks, then use [^] instead of . in JavaScript regexes.
JavaScript does not support \h -- the horizontal white space. Instead you could use [^\S\r\n].
When passing a string literal to new RegExp be aware that backslashes are escape characters for the string literal notation, so they will not end up in the regex. So either double them, or else use JavaScript's regex literal notation
In JavaScript replace will only replace the first occurrence unless you provided the g modifier to the regular expression.
The replacement (second argument to replace) should not be a regex. It should be a string, so don't apply new RegExp to it.
The backreferences in the replacement string should be of the $1 format. JavaScript does not support \1 there.
You reference string where you really want to reference s.
This should work:
function rep1() {
var re1 = /(.{150,250}\.(\[\d+\])*)[^\S\r\n]+/g;
var re2 = '$1\r\n\x20\x20\x20';
var s = document.getElementById("textarea1").value;
s = s.replace(re1, re2);
document.getElementById("textarea1").value = s;
}

How to ignore escape characters in javascript?

I have the following string:
var str = '\x27';
I have no control on it, so I cannot write it as '\\x27' for example. Whenever I print it, i get:
'
since 27 is the apostrophe. When I call .length on it, it gives me 1. This is of course correct, but how can I treat it like a not escaped string and have it print literally
\x27
and give me a length of 4?
I'm not sure if you should do what you are trying to do, but this is how it works:
var s = '\x27';
var sEncoded = '\\x' + s.charCodeAt(0).toString(16);
s is a string that contains one character, the apostrophe. The character code as a hexadecimal number is 27.
After the assignment var str = '\x27';, you can't tell where the contents of str came from. There's no way to find out whether a string literal was assigned, or whether the string literal contained an escape sequence. All you have is a string containing a single apostrophe character (Unicode code point U+0027). The original assignment could have been
var str = '\x27'; // or
var str = "'"; // or
var str = String.fromCodePoint(3 * 13);
There's simply no way to tell.
That said, your question looks like an XY problem. Why are you trying to print \x27 in the first place?

Using regex to split double hyphen but not single hyphen

I have an html element id that looks like this:
dp__1-2--1-3
I'm trying to use the JavaScript split() function to lop off and return the final '1-3'
My regex skills are poor but a bit of searching around got me to this point:
var myId = "dp__1-2--1-3";
var myIdPostFix = myId.split(/[\-\-]+/).pop();
Unfortunately that returns me only the '3'.
So my question is how do I split double hyphens but NOT single hyphens?
It's the brackets in the regular expression that keeps it from working. A set will match one of any of the characers in it, so [\-\-] is the same as [\-], i.e. matching a single hyphen.
Just remove the brackets:
var myIdPostFix = myId.split(/--/).pop();
or just use the string '--' instead of a regular expression:
var myIdPostFix = myId.split('--').pop();
split accepts a regular expression or a string as the first argument.
You were very close. You can achieve what you want with:
var myIdPostFix = myId.split("--").pop();

Categories

Resources