Javascript Regex Substitution newline handling in browsers

I created an HTML textarea that can add "[" and "]" at the beginning and end of whatever text has been entered in it.
My problem is that when I enter multiline data into the textarea, the regex is handled differently in Firefox and IE.
Input:
Iam
learning
regex
Expected output (I get this in Firefox):
[Iam]
[learning]
[regex]
Output in IE:
[Iam
][]
[learning
][]
[regex]
The Regex code is here:
(textareaIDelement.value).replace(/(^)(.*)(\n{0,})($)/gm, "[" + "$2" +"]");
I added the (\n{0,}) to the regex to match newlines, but it doesn't have any effect.
Thanks

In IE, the line separator in a textarea's value property is \r\n. In all other major browsers it's \n. A simple solution would be to normalize line separators into \n first. I've also simplified the regex:
textareaIDelement.value.replace(/\r\n/g, "\n").replace(/^(.*)\n*$/gm, "[$1]");
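The normalize-then-replace idea can be checked outside the browser with hard-coded strings. This is only a minimal sketch: bracketLines is a hypothetical helper name, and the two sample strings stand in for what a textarea's value would contain with \r\n and with \n separators.

```javascript
// Hypothetical helper: wraps every line of `text` in square brackets.
function bracketLines(text) {
  return text
    .replace(/\r\n/g, "\n")          // normalize Windows separators first
    .replace(/^(.*)\n*$/gm, "[$1]"); // then wrap each line
}

// Stand-ins for textarea values (assumed inputs, not read from a real page).
const crlfValue = "Iam\r\nlearning\r\nregex";
const lfValue = "Iam\nlearning\nregex";

console.log(bracketLines(crlfValue) === bracketLines(lfValue)); // true
console.log(JSON.stringify(bracketLines(crlfValue)));
// "[Iam]\n[learning]\n[regex]"
```

Because the separators are normalized before the second replace runs, both inputs produce the same result.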

My guess is that Firefox is using a single 0x0A (\n) as the line separator, whereas IE is using the Windows separator 0x0D 0x0A (\r\n).
Depending on the exact semantics of the regex library, it's probably matching both of the Windows characters independently as line separators, hence it detects the end of the line followed by a zero-character line.
(This isn't an actual answer per se, as I'm not massively familiar with exactly how JS processes regex metacharacters, but hopefully it will point you in the right direction.)

Related

Is it safe to just put "\n" instead of "[\r\n]" when writing a JavaScript Regular expression?

Update: my original test involving copy/pasting from a text file into the browser was flawed. I created a new test in JavaScript which verified that carriage return \r is in fact being matched.
The following code logs ['\r', '\r', '\r'] to the console, which verifies that the \r is being matched:
<script>
const CarriageReturn = String.fromCharCode(13); // char code for carriage return is 13
const str = CarriageReturn + CarriageReturn + CarriageReturn;
const matches = str.match(/\r/g);
console.log(matches); // this will output ['\r', '\r', '\r']
</script>
Original Question
The common method suggested by numerous StackOverflow answers and articles across the internet to match a line break in regular expressions is to use the ubiquitous token [\r\n]. It is supposedly meant to ensure compatibility with Windows systems, since Windows uses the carriage return \r and line feed \n together to make a new line, as opposed to just the line feed \n on Unix-based operating systems such as Linux or macOS.
I'm beginning to think JavaScript ignores this distinction and just treats every line break as \n.
Today, I did an experiment where I created a text file with 10 carriage returns, opened up the text file, then copy/pasted the carriage returns into the regular expression tester at https://regex101.com.
When I tested all those carriage returns against the simple regular expression \r, nothing matched. However, using the alternative \n matched all 10 carriage returns.
So my question is, based on my experiment, is it safe to just write \n instead of [\r\n] when matching line breaks in JavaScript?

No, do not replace [\r\n] with \n.
Line ends at http://regex101.com are only \n and that is why you had no match with \r.
In real texts, both carriage return and line feed characters might need matching.
Besides, the dot does not match \r in JavaScript regex.
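The points in this answer can be verified with a few one-liners. This is a small sketch using made-up sample strings; the variable names are illustrative only.

```javascript
// "." excludes line terminators, so it never matches a carriage return.
const dotMatchesCR = /^.$/.test("\r");
const classMatchesCR = /^[\r\n]$/.test("\r");
const classMatchesLF = /^[\r\n]$/.test("\n");
console.log(dotMatchesCR, classMatchesCR, classMatchesLF); // false true true

// Splitting on \n alone leaves a stray \r behind on Windows-style text.
const windowsText = "one\r\ntwo";
const pieces = windowsText.split("\n");
console.log(JSON.stringify(pieces)); // ["one\r","two"]
```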

The text you tested contained only \n by the time the regex saw it, which is why \n matched everything. \r\n is the Windows style of representing new lines, while Unix-based systems use \n. If you are not sure which one you have, you can use this regex: /\r?\n/
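As a rough illustration of /\r?\n/, the sketch below splits text with mixed line endings. splitLines is a hypothetical helper name, and the pattern deliberately does not handle a lone \r used as a separator.

```javascript
// Hypothetical helper: splits on \n, with an optional \r in front of it.
function splitLines(text) {
  return text.split(/\r?\n/);
}

const mixed = "alpha\r\nbeta\ngamma"; // one Windows break, one Unix break
const lines = splitLines(mixed);
console.log(JSON.stringify(lines)); // ["alpha","beta","gamma"]
```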

After doing a different test, it appears JavaScript does make a distinction between \r and \n, but whether a \r ever reaches your regex depends on where the text came from. Here are the cases I tested:
If you generate a carriage return string in JavaScript using String.fromCharCode(13), and try to match it with pattern \r, the pattern will match successfully.
If you type a line break with your keyboard directly into a <textarea> in your browser, it is interpreted by JavaScript as just \n. There will be no matches for \r.
If you copy/paste text containing carriage returns (\r) from a text file into a <textarea> in your browser, your browser will convert all the sequences of \r\n into just \n. So, it will appear as if JavaScript is ignoring the \rs in your text, but it's only because your browser removed them in the process of pasting it into the <textarea>.
I updated my original question with the test I ran to confirm that the \r token is matched when generated with String.fromCharCode(13).

How to replace my current regular expression without using negative lookbehind

I have the following regular expression which matches on all double quotes besides those that are escaped:
i.e:
The regular expression is as follows:
((?<![\\])")
How could I alter this to no longer use the negative lookbehind as it is not supported on some browsers?
Any help is greatly appreciated, thanks!
I wasn't able to get anything currently working

You can match
/\\"|(")/
and keep only captured matches. Being so simple, it should work with most every regex engine.
This matches what you don't want (\\")--to be discarded--and captures what you do want (")--to be kept.
This technique has been referred to by one regex expert as The Greatest Regex Trick Ever. To get to the punch line at the link search for "(at last!)".
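A minimal sketch of "match everything, keep only the captures" in JavaScript follows. unescapedQuoteIndexes is a hypothetical helper name and the sample string is made up for illustration.

```javascript
// Hypothetical helper: returns the positions of double quotes that are
// not preceded by a backslash, without using lookbehind.
function unescapedQuoteIndexes(text) {
  const pattern = /\\"|(")/g;
  const indexes = [];
  let match;
  while ((match = pattern.exec(text)) !== null) {
    // Group 1 is set only when the bare-quote alternative matched.
    if (match[1] !== undefined) {
      indexes.push(match.index);
    }
  }
  return indexes;
}

const sample = 'say "hi \\" there"'; // the text: say "hi \" there"
console.log(unescapedQuoteIndexes(sample)); // [ 4, 16 ]
```

The escaped quote at index 9 is consumed by the first alternative, so it never reaches the capturing group.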

Neither of these may be a completely satisfactory solution.
This regex doesn't match only the unescaped "; the overall match may also include the preceding character, so additional logic is required to check whether the match starts with " and adjust the match position:
(?:^|[^\\])(")
This may be a better choice, but it depends on positive lookahead - which may have the same issue as negative lookbehind.
Version 1a (again requires additional logic)
(?:^|\b)(?=[^\\])(")
Version 2a (depends on positive lookahead)
(?:^|\b|\\\\)(?=[^\\])(")
Assuming you need to also handle escaped slashes followed by escaped quotes (not in the question, but ok):
Version 1a (requires the additional logic):
(?:^|[^\\]|\\\\)(")

Building on this answer, I'd like to add that you may also want to ignore escaped backslashes, and match the closing quote in this string:
"ab\\"
In that case, /\\[\\"]|(")/g is what you're after.
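To see the difference between the two patterns, the sketch below runs both against the string from this answer. capturedQuoteIndexes is a hypothetical helper name; it relies on String.prototype.matchAll, which requires the g flag.

```javascript
// Hypothetical helper: positions where the capturing group (the bare quote) matched.
function capturedQuoteIndexes(pattern, text) {
  const indexes = [];
  for (const match of text.matchAll(pattern)) {
    if (match[1] !== undefined) {
      indexes.push(match.index);
    }
  }
  return indexes;
}

const text = '"ab\\\\"'; // six characters: " a b \ \ "

// Without the extra alternative, the second backslash "escapes" the closing quote.
const simple = capturedQuoteIndexes(/\\"|(")/g, text);
// With it, the backslash pair is consumed together and the closing quote is kept.
const extended = capturedQuoteIndexes(/\\[\\"]|(")/g, text);

console.log(simple);   // [ 0 ]
console.log(extended); // [ 0, 5 ]
```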

Why Javascript string.replace("\n\t","xxx") replaces "\n\t" with "\nxxx"?

I expect to replace "\n\t" with "xxx" in a txt file:
"数字多功能光盘 DVD shùzì"
I do this: str.replace("\n\t","xxx")
The method matches the needed parts, but it leaves the \n and only replaces the \t with 'xxx'. Why?
Why does it work like a charm when I use Ctrl+F in VSCode, but not in code?

First of all, str.replace("a","b") only replaces the first occurrence in JavaScript. To replace all of them, you need to use a regex with g modifier. So, you could try str.replace(/\n\t/g,"xxx") first.
Next, why does it work in VSCode? In VSCode regex, \n matches any line break sequence that is selected in the bottom right-hand corner of VSCode app. It works as \R in PCRE, Java, Onigmo, etc. in this case.
As there can be many line ending sequences, you may consider "converting" the VSCode \n to (?:\r\n|[\r\n\x0B\x0C\x85\u2028\u2029]), which matches any single Unicode line break sequence, and use
s = s.replace(/(?:\r\n|[\r\n\x0B\x0C\x85\u2028\u2029])\t/g, 'xxx')
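The behaviours described in this answer can be compared side by side. This is a sketch with invented sample strings; the variable names are illustrative, and the replacement text "xxx" is taken from the question.

```javascript
const anyLineBreakThenTab = /(?:\r\n|[\r\n\x0B\x0C\x85\u2028\u2029])\t/g;

const unixText = "a\n\tb\n\tc";
const windowsText = "a\r\n\tb\r\n\tc";

// A string pattern replaces only the first occurrence.
const firstOnly = unixText.replace("\n\t", "xxx");
// A plain /\n\t/g leaves the \r of each Windows line break behind.
const strayCR = windowsText.replace(/\n\t/g, "xxx");
// The broader pattern handles both kinds of line break.
const unixAll = unixText.replace(anyLineBreakThenTab, "xxx");
const windowsAll = windowsText.replace(anyLineBreakThenTab, "xxx");

console.log(JSON.stringify(firstOnly));  // "axxxb\n\tc"
console.log(JSON.stringify(strayCR));    // "a\rxxxb\rxxxc"
console.log(JSON.stringify(unixAll));    // "axxxbxxxc"
console.log(JSON.stringify(windowsAll)); // "axxxbxxxc"
```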

Write HTML Special Character into a Variable

$("<h2/>", {"class" : "wi wi"+data.today.code}).text(" " + data.city + data.today.temp.now + "F").appendTo(custom_example);
Hi there, I'm trying to alter the code above to add the degrees icon just before the (F)arenheit marker. I've tried entering + html("°") + but it doesn't work. My JS is pretty rough and I was hoping I could get a quick answer here before I spent too long trying and failing. Thanks!
I want the end result to print something like: Encinitas 65°F

Special characters are characters that must be escaped with a backslash (\), like:
Single quote \'
Double quote \"
Backslash \\
The degree ° is not a special character, you can just write it, as it is.
Edit: If you want to use the unicode of °F, just write: '\u2109'.

Escape Special Characters JavaScript
JavaScript uses the \ (backslash) as an escape character for:
\' single quote
\" double quote
\\ backslash
\n new line
\r carriage return
\t tab
\b backspace
\f form feed
\v vertical tab (IE < 9 treats '\v' as 'v' instead of a vertical tab ('\x0B'). If cross-browser compatibility is a concern, use \x0B instead of \v.)
\0 null character (U+0000 NULL) (only if the next character is not a decimal digit; else it's an octal escape sequence)
Note that the \v and \0 escapes are not allowed in JSON strings.

First of all, the degree character does not need to be escaped, so simply entering "°F" should do the job.
However, if you are in doubt about the encoding of your JS source file, you could use a JavaScript escape sequence. JS escape sequences are quite different from HTML escapes: they do not support decimal values at all. So first you have to convert 176 to hex: b0. The correctly escaped equivalent of "°F" is "\xb0F". It will work too and is more robust with respect to encoding issues in your platform's source editor.
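A quick sketch showing that the escaped forms produce the same string as the literal character. The variable names and the sample city and temperature are only illustrative.

```javascript
const literal = "°F";                            // depends on the file's encoding
const hexEscape = "\xb0F";                       // \x takes exactly two hex digits
const unicodeEscape = "\u00b0F";                 // \u takes exactly four hex digits
const fromCode = String.fromCharCode(176) + "F"; // 176 decimal is b0 hex

console.log(hexEscape === unicodeEscape); // true
console.log(hexEscape === fromCode);      // true
console.log(hexEscape.length);            // 2

// Plain string concatenation is enough for use with .text().
const label = "Encinitas " + 65 + hexEscape;
console.log(label);
```

If the source file is saved and served as UTF-8, the literal form compares equal to the escaped ones as well.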
If you really want to assign HTML code, you need to use the .html() function. But this is mutually exclusive with .text(), so in this case all of your content needs to be HTML rather than plain text. Otherwise an HTML injection vulnerability arises, i.e. you need to properly escape angle brackets and some other symbols in data.city and maybe data.today.temp.now as well.
JS itself has no built-in function to escape HTML, but jQuery provides a trick: $('<div/>').text(data.city).html() will return appropriately escaped HTML. See "HTML-encoding lost when attribute read from input field" for more details.
I would recommend not using .html() unless you really need it, e.g. if you want to apply styles or formatting to only parts of the text.
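If .html() is unavoidable and the jQuery trick is not an option, the escaping can be done by hand. The sketch below is an assumption-laden illustration, not a vetted sanitizer: escapeHtml and HTML_ESCAPES are hypothetical names, and the city value is invented to show what happens with markup in the data.

```javascript
// Hypothetical lookup table for the characters that are significant in HTML.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Hypothetical helper: escapes text so it can be embedded in HTML content.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

const city = "<b>Encinitas</b>"; // untrusted value, made up for the example
const markup = escapeHtml(city) + " 65&deg;F";
console.log(markup); // &lt;b&gt;Encinitas&lt;/b&gt; 65&deg;F
```

Only the data is escaped; the &deg; entity is written by the developer and is left for the browser to interpret.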

Replace + with space character

The original problem:
I send a JSON string with Unicode strings (many different languages and also md5 hashes) from a Java servlet to web clients. I call URLEncoder.encode("my strings", "UTF-8") on the strings before creating the JSON array.
(I'm almost sure something is wrong with this approach too, and I am probably doing one encoding too many.)
Anyway:
In JavaScript I run unescape() to get back the result, but spaces (encoded as +) are not decoded.
So I use .replace(/\+/g,' ') to replace + with space before calling unescape().
But:
leading and trailing + signs are omitted
and
consecutive + signs are replaced by a single space.
Please lend me a hand (or mind) :)

Use this
var string = "+Salvis+Sumeet+Jacob,Srlawrjhkjh+";
var str = string.replace(/[+ ]+/g, " ");
console.log(str);

So I guess
leading and trailing + signs are omitted and consecutive + signs are replaced by a single space.
is what you want to achieve, not the outcome you currently get and want to avoid. If that's the case then
.replace(/\++/g,' ').trim()
will replace every one or more + characters with a single space, then remove leading/trailing space.
"++foo+bar++baz+".replace(/\++/g,' ').trim()
// "foo bar baz"
You may need a String.prototype.trim polyfill for IE8 and older.

unescape() doesn't change + to spaces because that is simply not part of its spec.
The ASCII-space-as-plus-sign encoding is rather non-standard (though widely supported) and dates back to early versions of HTML.
Per the spec for unescape() and escape(), the only things that are changed by unescape() are hexadecimal escape sequences in the form %XX and %uXXXX. escape() replaces Unicode characters outside a small subset of unrestricted characters with such hexadecimal escape sequences; unescape(), naturally, just reverses the operation.
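Since unescape() only understands %XX and %uXXXX, one possible way to decode a value produced by URLEncoder.encode(..., "UTF-8") is to turn + into spaces first and then call decodeURIComponent(), which decodes UTF-8 percent sequences. Note that this swaps in decodeURIComponent() for the unescape() call in the question. The sketch below assumes exactly that input format; decodeFormValue is a hypothetical helper name.

```javascript
// Hypothetical helper: decodes a form-style encoded value.
// Literal plus signs arrive as %2B, so replacing + first is safe.
function decodeFormValue(encoded) {
  return decodeURIComponent(encoded.replace(/\+/g, " "));
}

// Every + becomes its own space, including leading, trailing and repeated ones.
console.log(JSON.stringify(decodeFormValue("+a+b%2Bc++"))); // " a b+c  "

// UTF-8 percent sequences are decoded to the original character (U+2713).
console.log(decodeFormValue("%E2%9C%93+ok"));
```

The spaces are all present in the decoded string; if several of them appear as one when displayed, that is down to HTML whitespace collapsing rather than the replacement itself.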
