Regex won't find '\u2028' unicode characters

Regex won't find '\u2028' unicode characters - javascript

We're having a lot of trouble tracking down the source of \u2028 (Line Separator) in user submitted data which causes the 'unterminated string literal' error in Firefox.
As a result, we're looking at filtering it out before submitting it to the server (and then the database).
After extensive googling and reading of other people's problems, it's clear I have to filter these characters out before submitting to the database.
Before writing the filter, I attempted to search for the character just to ensure it can find it using:
var index = content.search("/\u2028/");
alert("Index: [" + index + "]");
I get -1 as the result everytime, even when I know the character is in the content variable (I've confirmed via a Java jUnit test on the server side).
Assuming that content.replace() would work the same way as search(), is there something I'm doing wrong or anything I'm missing in order to find and strip these line separators?

Your regex syntax is incorrect. You only use the two forward slashes when using a regex literal. It should be just:
var index = content.search("\u2028");
or:
var index = content.search(/\u2028/); // regex literal
But this should really be done on the server, if anywhere. JavaScript sanitization can be trivially bypassed. It's only useful for user convenience, and I don't think accidentally entering line separator is that common.

Related

Regex in Google Apps Script practical issue. Forms doesn't read regex as it should

I hope its just something i'm not doing right.
I've been using a simple script to create a form out of a spreadsheet. The script seems to be working fine. The output form is going to get some inputs from third parties so i can analyze them in my consulting activity.
Creating the form was not a big deal, the structure is good to go. However, after having the form creator script working, i've started working on its validations, and that's where i'm stuck at.
For text validations, i will need to use specific Regexes. Many of the inputs my clients need to give me are going to be places' and/or people's names, therefore, i should only allow them usign A-Z, single spaces, apostrophes and dashes.
My resulting regexes are:
//Regex allowing a **single name** with the first letter capitalized and the occasional use of "apostrophes" or "dashes".
const reg1stName = /^[A-Z]([a-z\'\-])+/
//Should allow (a single name/surname) like Paul, D'urso, Mac'arthur, Saint-Germaine ecc.
//Regex allowing **composite names and places names** with the first letter capitalized and the occasional use of "apostrophes" or "dashes". It must avoid double spaces, however.
const regNamesPlaces = /^[^\s]([A-Z]|[a-z]|\b[\'\- ])+[^\s]$/
//This should allow (names/surnames/places' names) like Giulius Ceasar, Joanne D'arc, Cosimo de'Medici, Cosimo de Medici, Jean-jacques Rousseau, Firenze, Friuli Venezia-giulia, L'aquila ecc.
Further in the script, these Regexes are called as validation pattern for the forms text items, in accordance with each each case.
//Validation for single names
var val1stName = FormApp.createTextValidation()
.setHelpText("Only the person First Name Here! Use only (A-Z), a single apostrophe (') or a single dash (-).")
.requireTextMatchesPattern(reg1stName)
.build();
//Validation for composite names and places names
var valNamesPlaces = FormApp.createTextValidation()
.setHelpText(("Careful with double spaces, ok? Use only (A-Z), a single apostrophe (') or a single dash (-)."))
.requireTextMatchesPattern(regNamesPlaces)
.build();
Further yet, i have a "for" loop that creates the form based on the spreadsheets fields. Up to this point, things are working just fine.
for(var i=0;i<numberRows;i++){
var questionType = data[i][0];
if (questionType==''){
continue;
}
else if(questionType=='TEXTNamesPlaces'){
form.addTextItem()
.setTitle(data[i][1])
.setHelpText(data[i][2])
.setValidation(valNamesPlaces)
.setRequired(false);
}
else if(questionType=='TEXT1stName'){
form.addTextItem()
.setTitle(data[i][1])
.setHelpText(data[i][2])
.setValidation(val1stName)
.setRequired(false);
}
The problem is when i run the script and test the resulting form.
Both validations types get imported just fine (as can be seen in the form's edit mode), but when testing it in preview mode i get an error, as if the Regex wasn't matching (sry the error message is in portuguese, i forgot to translate them as i did with the code up there):
A screenshot of the form in edit mode
A screeshot of the form in preview mode
However, if i manually remove the bars out of this regex "//" it starts working!
A screenshot of the form in edit mode, Regex without bars
A screenshot of the form in preview mode, Regex without bars
What am i doing wrong? I'm no professional dev but in my understanding, it makes no sense to write a Regex without bars.
If this is some Gforms pattern of reading regexes, i still need all of this to be read by the Apps script that creates this form after all. If i even try to pass the regex without the bars there, the script will not be able to read it.
const reg1stName = ^[A-Z]([a-z\'])+
const regNamesPlaces = ^[^\s]([A-Z]|[a-z]|\b[\'\- ])+[^\s]$
//Can't even be saved. Returns: SyntaxError: Unexpected token '^' (line 29, file "Code.gs")
Passing manually all the validations is not an option. Can anybody help me?
Thanks so much

This
/^[A-Z]([a-z\'\-])+/
will not work because the parser is trying to match your / as a string literal.
This
^[A-Z]([a-z\'\-])+
also will not work, because if the name is hyphenated, you will only match up to the hyphen. This will match the 'Some-' in 'Some-Name', for example. Also, perhaps you want a name like 'Saint John' to pass also?
I recommend the following :)
^[A-Z][a-z]*[-\.' ]?[A-Z]?[a-z]*
^ anchors to the start of the string
[A-Z] matches exactly 1 capital letter
[a-z]* matches zero or more lowercase letters (this enables you to match a name like D'Urso)
[-\.' ]? matches zero or 1 instances of - (hyphen), . (period), ' (apostrophe) or a single space (the . (period) needs to be escaped with a backslash because . is special to regex)
[A-Z]? matches zero or 1 capital letter (in case there's a second capital in the name, like D'Urso, St John, Saint-Germaine)

Email Validation RegEx username/local name length check not running

I've debugged for a few hours now and have hit a wall - regex has never been my strongsuit. I have been able to alter the following regex to restrict 255 characters for domain fine, however, in trying to restrict the local/username portion of an email address I'm running into issues implementing a 64 character limit. I've gone through regex101 replacing +s and *s and attempting to understand what each pass is doing - however, even when I add a check against all non-whitespace characters with a limit of 64 it seems like the other checks pass and take precedence - although I'm not sure. Below is my regex currently without any of the 64 character checks that I've broken it with:
var emailCheck = new RegExp(/^((([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+(\.{0,1}([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+)*)|((\x22)((((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(([\x01-\x08\x0b\x0c\x0e-\x1f\x7f]|\x21|[\x23-\x5b]|[\x5d-\x7e]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(\\([\x01-\x09\x0b\x0c\x0d-\x7f]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]))))*(((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(\x22)))#((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]){1,255}([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]){1,255}([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.*$/i);
What I have so far can be seen at http://jsfiddle.net/mtqx0tz1/ , I've made other slight alterations (e.g. not allowing consecutive dots) but for the most part this regex comes from another stack post without the character limits.
Lastly, I'm aware this isn't the 'standard' so to speak and emails are checked server-side, however, I would like to be more safe than sorry...as well as work on some of my regex. Sorry if this question isn't worthy of an actual post - I'm just simply not seeing where in my passes {1,64} is failing. At this point I'm thinking about just sub-stringing the portion of the string up to the # sign and checking length that way...but it would be nice to include it in this statement since all the checks are done here to begin with.

I have used this regex validation and it works good.
The e-mail address is in the variable strIn
try
{
return Regex.IsMatch(strIn,
#"^(?("")("".+?(?<!\\)""#)|(([0-9a-z]((\.(?!\.))|[-!#\$%&'\*\+/=\?\^`\{\}\|~\w])*)(?<=[0-9a-z])#))" +
#"(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(([0-9a-z][-\w]*[0-9a-z]*\.)+[a-z0-9][\-a-z0-9]{0,22}[a-z0-9]))$",
RegexOptions.IgnoreCase, TimeSpan.FromMilliseconds(250));
}
catch (RegexMatchTimeoutException)
{
return false;
}

How to filter emojis from string jquery/javascript?

I'm using the following to exclude emojis/emoticons from a string in php. How do I do the same with javascript or jQuery?
preg_replace('/([0-9|#][\x{20E3}])|[\x{00ae}|\x{00a9}|\x{203C}|\x{2047}|\x{2048}|\x{2049}|\x{3030}|\x{303D}|\x{2139}|\x{2122}|\x{3297}|\x{3299}][\x{FE00}-\x{FEFF}]?|[\x{2190}-\x{21FF}][\x{FE00}-\x{FEFF}]?|[\x{2300}-\x{23FF}][\x{FE00}-\x{FEFF}]?|[\x{2460}-\x{24FF}][\x{FE00}-\x{FEFF}]?|[\x{25A0}-\x{25FF}][\x{FE00}-\x{FEFF}]?|[\x{2600}-\x{27BF}][\x{FE00}-\x{FEFF}]?|[\x{2900}-\x{297F}][\x{FE00}-\x{FEFF}]?|[\x{2B00}-\x{2BF0}][\x{FE00}-\x{FEFF}]?|[\x{1F000}-\x{1F6FF}][\x{FE00}-\x{FEFF}]?/u', '', $text);
This is what I try to do
$('#edit.popup .btn.save').live('click',function(e) {
var item_id = $(this).attr('id');
var edited_text = $('#edit.popup textarea').val().replace(/([0-9|#][\x{20E3}])|[\x{00ae}|\x{00a9}|\x{203C}|\x{2047}|\x{2048}|\x{2049}|\x{3030}|\x{303D}|\x{2139}|\x{2122}|\x{3297}|\x{3299}][\x{FE00}-\x{FEFF}]?|[\x{2190}-\x{21FF}][\x{FE00}-\x{FEFF}]?|[\x{2300}-\x{23FF}][\x{FE00}-\x{FEFF}]?|[\x{2460}-\x{24FF}][\x{FE00}-\x{FEFF}]?|[\x{25A0}-\x{25FF}][\x{FE00}-\x{FEFF}]?|[\x{2600}-\x{27BF}][\x{FE00}-\x{FEFF}]?|[\x{2900}-\x{297F}][\x{FE00}-\x{FEFF}]?|[\x{2B00}-\x{2BF0}][\x{FE00}-\x{FEFF}]?|[\x{1F000}-\x{1F6FF}][\x{FE00}-\x{FEFF}]?/u, '');
$('#grid li.image#' + item_id + ' img').attr('data-text', edited_text);
});
I found this suggestion in another post on Stack Overflow, but it's not working. It's still allowing emojis from ex ios.
.replace(/([\uE000-\uF8FF]|\uD83C[\uDF00-\uDFFF]|\uD83D[\uDC00-\uDDFF])/g, '')
What I try to achieve is to not allow emojis in textfield, and if an emoji is inserted (from ex ios keyboard) it will be replaced by nothing. It works with php. Someone here who can help me out with this?

Based on the answer from mb21, this regex did the job. No loop required!
/[\uD800-\uDBFF][\uDC00-\uDFFF]/g

As pointed out in this answer, JavaScript doesn't support Unicode code points outside the Basic Multilingual Plane (where iOS emojis lie).
I highly recommend reading The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!). Then you'll understand what was meant with:
So some indirect approach is needed. Cf. to JavaScript strings outside of the BMP.
For example, you could look for code points in the range [\uD800-\uDBFF] (high surrogates), and when you find one, check that the next code point in the string is in the range [\uDC00-\uDFFF] (if not, there is a serious data error), interpret the two as a Unicode character, and replace them by whatever you wish to put there. This looks like a job for a simple loop through the string, rather than a regular expression.

JS/XSS: When assigning user-provided strings to variables; is it enough to replace <,>, and string delimiter?

If a server-side script generates the following output:
<script>
var a = 'text1';
var b = 'text2';
var c = 'text3';
</script>
, and the values (in this example "text1", "text2" and "text3") are user supplied (via HTTP GET/POST), is it enough to remove < and > from the input and to replace
'
with
' + "'" + '
in order to be safe from XSS? (This is my main question)
I'm particularly worried about the backslash not being escaped because an attacker could unescape the trailing '. Could that be a potential problem in this context? If the variable assignments were not separated by line breaks, an attacker could supply the values
text1
text2\
;alert(1);//
and end up with working JS code like
<script>
var a = 'text1'; var b = 'text2\'; var c = ';alert(1);//text3';
</script>
But since there are line breaks that shouldn't be a problem either. Am I missing something else?

It would be more secure to JSON encode your data, instead of rolling your own Javascript encoding function. When dealing with web application security, rolling your own is almost always not the answer. A JSON representation would handle the quotes and backslashes and any other special characters.
Most server side languages have a JSON module. Some also have a function specifically for what you're doing such as HttpUtility.JavaScriptStringEncode for the .NET framework.
If you were to roll your own, then it would be better to replace the characters for example like " to \x22, instead of changing single quotes or removing them. Also consider there is a multitude of creative XSS attacks that you'd need to defend against.
The end result, whatever method you use, is your data should remain intact when presented to the user. For example it's no good having O"Neil if someone's name is O'Neil.

Escaping raw, unescaped strings in bookmarklet

I’m trying to write a search engine bookmarklet (for Chrome), but I’m having trouble escaping the string.
For example if the search engine bookmarklet is the following:
javascript:alert("%s"); //%s is the search engine query, passed literally by chrome.
Then running it on the following string will give incorrect results:
c:\zebra
c:zebra instead of c:\zebra
If the character after the slash happens to be an actual escape character, then the results will vary depending on the character.
I’ve tried escaping and unescaping the string, I’ve tried reg-ex’ing it, and replacing the slash with a double-slash, but I cannot figure out a way to get this to work because the first time that the raw string enters the script, it is unescaped, and any operation after that will see it incorrectly.
How can this be handled correctly?

So far I can only make this work in chrome:
javascript: var str = (function(){STARTOFSTRING:/*%s*/ENDOFSTRING:;}).toString().match( /STARTOFSTRING:\/\*([\s\S]*)\*\/ENDOFSTRING:/ )[1]; alert(str);
writing c:\zebra will alert c:\zebra.
Firefox doesn't sustain the comments inside the function body when decompiled, unfortunately.
You also can't write the sequence */ in the string, but everything else should be passed literally, including quotes " ' etc

Develop Reference

JavaScript is the programming language of the Web.

Regex won't find '\u2028' unicode characters - javascript

Related

Regex in Google Apps Script practical issue. Forms doesn't read regex as it should

Email Validation RegEx username/local name length check not running

How to filter emojis from string jquery/javascript?

JS/XSS: When assigning user-provided strings to variables; is it enough to replace <,>, and string delimiter?

Escaping raw, unescaped strings in bookmarklet

Categories

Resources