Why is regex failing when input contains a newline?

I've inherited this JavaScript regex from another developer and now, even though nothing has changed, it doesn't seem to match the required text. Here is the regex:
/^.*(already (active|exists|registered)).*$/i
I need it to match any text that looks like
stuff stuff already exists more stuff etc
It looks perfectly fine to me, it only looks for those 2 words together and should in theory ignore the rest of the string. In my script I check the text like this
var cardUsedRE = /^.*(already (active|exists|registered)).*$/i;
if(cardUsedRE.test(responseText)){
mdiv.className = 'userError';
mdiv.innerHTML = 'The card # has already been registered';
document.getElementById('cardErrMsg').innerHTML = arrowGif;
}
I've stepped through this in Firebug and I've seen it fail on this string:
> Error: <detail>Card number already registered for CLP.\n</detail>
Am I missing something? What is the likely issue with this?

Here's a simplified but functionally-equivalent regex that should handle newlines:
/(already\s+(active|exists|registered))/i
Not sure why you'd ever want to lead with ^.* or end with .*$ unless your goal is specifically to prevent newlines. Otherwise it's just superfluous.
EDIT: I replaced the space with \s+ so it will be more liberal with how it handles whitespace (e.g. one space, two spaces, a tab, etc. should all match).

tl;dr: Use the m (multiline) modifier so that ^ and $ match at line breaks; . still does not match a newline. See the MDC regular expression documentation.
Failing (note the "\n" in the string literal):
var str = "Error: <detail>Card number already registered for CLP.\n</detail>"
str.match(/^.*(already (active|exists|registered)).*$/i)
Working (note the m flag, which gives ^ and $ their "multi-line" behavior):
var str = "Error: <detail>Card number already registered for CLP.\n</detail>"
str.match(/^.*(already (active|exists|registered)).*$/mi)
I would use a simpler form, however: (Adjust for definition of "space".)
var str = "Error: <detail>Card number already registered for CLP.\n</detail>";
str.match(/(?:already\s+(?:active|exists|registered))/i)
Happy coding.
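To see the flags side by side, here is a small sketch that runs the question's pattern against the sample string. The variable names are illustrative only, and the s (dotAll) flag assumes an engine that supports ES2018.

```javascript
// The sample string from the question, with a real newline before </detail>.
const responseText = "Error: <detail>Card number already registered for CLP.\n</detail>";

// No flags besides i: $ only matches at the very end and . stops at the newline.
const original = /^.*(already (active|exists|registered)).*$/i;
// m: ^ and $ also match at line breaks, so the first line can match on its own.
const multiline = /^.*(already (active|exists|registered)).*$/im;
// s: . matches newlines too, so .* can run to the end of the input.
const dotAll = /^.*(already (active|exists|registered)).*$/is;
// No anchors at all: the position of the newline no longer matters.
const unanchored = /already\s+(?:active|exists|registered)/i;

const results = {
  original: original.test(responseText),
  multiline: multiline.test(responseText),
  dotAll: dotAll.test(responseText),
  unanchored: unanchored.test(responseText),
};
console.log(results);
```

Only the first pattern should report false here; the other three are alternative ways around the same problem.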

Related

Regex in Google Apps Script practical issue. Forms doesn't read regex as it should

I hope it's just something I'm not doing right.
I've been using a simple script to create a form out of a spreadsheet. The script seems to be working fine. The output form is going to get some inputs from third parties so I can analyze them in my consulting activity.
Creating the form was not a big deal; the structure is good to go. However, after getting the form creator script working, I started working on its validations, and that's where I'm stuck.
For text validations, I will need to use specific regexes. Many of the inputs my clients need to give me are going to be places' and/or people's names, so I should only allow A-Z, single spaces, apostrophes and dashes.
My resulting regexes are:
//Regex allowing a **single name** with the first letter capitalized and the occasional use of "apostrophes" or "dashes".
const reg1stName = /^[A-Z]([a-z\'\-])+/
//Should allow (a single name/surname) like Paul, D'urso, Mac'arthur, Saint-Germaine etc.
//Regex allowing **composite names and places names** with the first letter capitalized and the occasional use of "apostrophes" or "dashes". It must avoid double spaces, however.
const regNamesPlaces = /^[^\s]([A-Z]|[a-z]|\b[\'\- ])+[^\s]$/
//This should allow (names/surnames/places' names) like Giulius Ceasar, Joanne D'arc, Cosimo de'Medici, Cosimo de Medici, Jean-jacques Rousseau, Firenze, Friuli Venezia-giulia, L'aquila etc.
Further in the script, these regexes are used as validation patterns for the form's text items, according to each case.
//Validation for single names
var val1stName = FormApp.createTextValidation()
.setHelpText("Only the person First Name Here! Use only (A-Z), a single apostrophe (') or a single dash (-).")
.requireTextMatchesPattern(reg1stName)
.build();
//Validation for composite names and places names
var valNamesPlaces = FormApp.createTextValidation()
.setHelpText(("Careful with double spaces, ok? Use only (A-Z), a single apostrophe (') or a single dash (-)."))
.requireTextMatchesPattern(regNamesPlaces)
.build();
Further yet, I have a "for" loop that creates the form based on the spreadsheet's fields. Up to this point, things are working just fine.
for(var i=0;i<numberRows;i++){
var questionType = data[i][0];
if (questionType==''){
continue;
}
else if(questionType=='TEXTNamesPlaces'){
form.addTextItem()
.setTitle(data[i][1])
.setHelpText(data[i][2])
.setValidation(valNamesPlaces)
.setRequired(false);
}
else if(questionType=='TEXT1stName'){
form.addTextItem()
.setTitle(data[i][1])
.setHelpText(data[i][2])
.setValidation(val1stName)
.setRequired(false);
}
The problem appears when I run the script and test the resulting form.
Both validation types get imported just fine (as can be seen in the form's edit mode), but when testing it in preview mode I get an error, as if the regex wasn't matching (sorry, the error message is in Portuguese; I forgot to translate it as I did with the code above):
A screenshot of the form in edit mode
A screenshot of the form in preview mode
However, if I manually remove the slashes ("/ /") from this regex, it starts working!
A screenshot of the form in edit mode, regex without slashes
A screenshot of the form in preview mode, regex without slashes
What am I doing wrong? I'm no professional dev, but in my understanding it makes no sense to write a regex without slashes.
If this is some Google Forms convention for reading regexes, I still need all of this to be read by the Apps Script that creates the form. If I try to pass the regex without the slashes there, the script will not be able to read it.
const reg1stName = ^[A-Z]([a-z\'])+
const regNamesPlaces = ^[^\s]([A-Z]|[a-z]|\b[\'\- ])+[^\s]$
//Can't even be saved. Returns: SyntaxError: Unexpected token '^' (line 29, file "Code.gs")
Entering all the validations manually is not an option. Can anybody help me?
Thanks so much

This
/^[A-Z]([a-z\'\-])+/
will not work because the form treats the pattern as plain text, so the parser tries to match your / delimiters as literal characters.
This
^[A-Z]([a-z\'\-])+
also will not work, because if the name is hyphenated, you will only match up to the hyphen. This will match the 'Some-' in 'Some-Name', for example. Also, perhaps you want a name like 'Saint John' to pass also?
I recommend the following :)
^[A-Z][a-z]*[-\.' ]?[A-Z]?[a-z]*
^ anchors to the start of the string
[A-Z] matches exactly 1 capital letter
[a-z]* matches zero or more lowercase letters (this enables you to match a name like D'Urso)
[-\.' ]? matches zero or 1 instances of - (hyphen), . (period), ' (apostrophe) or a single space (the . (period) needs to be escaped with a backslash because . is special to regex)
[A-Z]? matches zero or 1 capital letter (in case there's a second capital in the name, like D'Urso, St John, Saint-Germaine)
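One way to keep a regex literal in the script and still hand the form a plain pattern is to pass its source text. The sketch below runs in plain JavaScript and only shows what each conversion produces. Passing reg1stName.source to requireTextMatchesPattern is an assumption based on the behaviour described in the question, not something verified against Google Forms. The trailing $ and the sample names are also additions for illustration.

```javascript
// The recommended pattern, with a trailing $ added (an assumption)
// so that the whole name has to match, not just a prefix.
const reg1stName = /^[A-Z][a-z]*[-.' ]?[A-Z]?[a-z]*$/;

// Converting the RegExp object to a string keeps the / delimiters...
const withSlashes = String(reg1stName);
// ...while .source gives only the pattern text between them.
const patternOnly = reg1stName.source;

console.log(withSlashes);
console.log(patternOnly);

// Hypothetical sample inputs: the first four should pass, the last two should not.
const samples = ["Paul", "D'Urso", "Saint-Germaine", "St John", "paul", "Some--Name"];
const verdicts = samples.map((name) => reg1stName.test(name));
console.log(samples, verdicts);
```

In the Apps Script itself, the call would then presumably read .requireTextMatchesPattern(reg1stName.source).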

Find exact match in haystack but only when text is not part of larger string of chars

My google-fu has failed me here, because I'm not sure how to search for an answer without getting generic results for finding a string needle in a haystack in javascript. If this is a duplicate question, just let me know and I'll close this one out.
What I'm trying to do
I'm currently searching through text using indexOf() in javascript to find any occurrence of a user's username that starts with the # char. indexOf is working well enough in most cases for this, but it's failing when a user has a name that is also part of another user's name.
For instance, using indexOf(), I currently get matches for a username "RandomDonkey" when there is text directed at "#RandomDonkeyKong" or "#RandomDonkeyFarmer".
What I'm looking for
I'd like to find the most efficient way to ensure that messages containing (for example) "#RandomDonkeyFarmer" don't cause alerts for the user "RandomDonkey", as only an exact match with no extra chars included with the username for "#RandomDonkey" should be cause for an alert.
What I've considered
I'm no good at writing regex, so I've considered it a possible solution but am not sure how to write it.
I've also considered looking for the match, and then checking that there are no characters other than a space after the last character (assuming that the username doesn't end the string).
Is there a better way to go about this, or would one of those two solutions be the most efficient?
The code I'm currently using and some examples that should pass / fail
var username = 'RandomDonkey';
if(message.toLowerCase().indexOf('#' + username.toLowerCase()) != -1){
alert('this is a direct message');
}//if direct message
else{
alert('this is NOT a direct message');
}
Some messages that should pass:
message = "Hey #randomdonkey what's going on?";
message = "#RandomDonkey what are you up to";
message = "These are silly examples #RandomDonkey";
Some messages that should fail:
message = "#RandomDonkeyKong is not a match for RandomDonkey";
message = "I'm messaging #RandomDonkeyFarmer";
Currently all of these examples pass because of the way that indexOf() works, which is why I'm looking for another method.

I believe regex is indeed the answer. This should work for most cases:
var username = "RandomDonkey"
var text = "hey #RandomDonkey are you something something etc."
var re=new RegExp("#"+username+"\\b", "i")
if (re.test(text)) {
alert(username);
}
Note the doubled backslash: inside a string literal, "\b" on its own is a backspace character, not a word boundary.
This works when usernames can only have word characters in them (so A-Z, a-z, 0-9 and the underscore character _)
To allow usernames with dashes so that this doesn't break, use this in place of "\\b" in the regex: "([^\\w\\-]|$)"
So the regex defining line becomes: var re=new RegExp("#"+username+"([^\\w\\-]|$)", "i")
It looks for a character which isn't a word character or a dash (so anything usernames shouldn't contain), or the end of the string.
The only issues that might arise are if people have special regex chars in their usernames, which should be easily preventable by just prohibiting usernames that don't match ^[\w\-]+$ (one or more letters, numbers, dashes and underscores and nothing more)
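Putting the pieces together, a minimal sketch might look like this. mentionsUser and escapeRegExp are hypothetical helper names, and the escaping step is an extra precaution for usernames that contain regex metacharacters.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True when the message contains "#username" followed by either the end of
// the string or a character that is not a word character or a dash.
function mentionsUser(message, username) {
  const re = new RegExp("#" + escapeRegExp(username) + "([^\\w\\-]|$)", "i");
  return re.test(message);
}

// The example messages from the question.
const messages = [
  "Hey #randomdonkey what's going on?",
  "#RandomDonkey what are you up to",
  "These are silly examples #RandomDonkey",
  "#RandomDonkeyKong is not a match for RandomDonkey",
  "I'm messaging #RandomDonkeyFarmer",
];
const outcomes = messages.map((message) => mentionsUser(message, "RandomDonkey"));
console.log(outcomes);
```

The first three messages should count as direct messages and the last two should not.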

javascript regex invalid quantifier error

I have the following javascript code:
if (url.match(/?rows.*?(?=\&)|.*/g)){
urlset= url.replace(/?rows.*?(?=\&)|.*/g,"rows="+document.getElementById('rowcount').value);
}else{
urlset= url+"&rows="+document.getElementById('rowcount').value;
}
I get the error invalid quantifier at the /?rows.*?.... This same regex works when testing it on http://www.pagecolumn.com/tool/regtest.htm using the test string
?srt=acc_pay&showfileCL=yes&shownotaryCL=yes&showclientCL=no&showborrowerCL=yes&shownotaryStatusCL=yes&showclientStatusCL=yes&showbillCL=yes&showfeeCL=yes&showtotalCL=yes&dir=asc&closingDate=12/01/2011&closingDate2=12/31/2011&sort=notaryname&pageno=0&rows=anything&Start=0','bodytable','xyz')
In this string, the above regex is supposed to match:
rows=anything
I actually don't even need the /? to get it to work, but if I don't put that into my javascript, it acts like it's not even regex... I'm terrible with Regex period, so this one has me pretty confused. And that error is the only one I am getting in Firefox's error console.
EDIT
Using that link I posted above, it seems that the leading / tries to match an actual forward slash instead of just marking the code as the beginning of a regex statement. So the ? is in there so that if it doesn't match the / to anything, it continues anyway.
RESOLUTION
Ok, so in the end, I had to change my regex to this:
/rows=.*(?=\&?)/g
This matched the word "rows=" followed by anything until it hit an ampersand or ran out of text.

You need to escape the first ?, since it has special meaning in a regex.
/\?rows.*?(?=\&)|.*/g
// ^---escaped
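A quick way to confirm this is to build both patterns with the RegExp constructor and catch the error. This is only a sketch: compiles is a hypothetical helper, and the exact error message differs between engines.

```javascript
// Try to compile a pattern and report whether it is syntactically valid.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything else is unexpected
  }
}

// A leading ? has nothing to quantify, so the pattern is rejected.
const unescaped = compiles("?rows.*?(?=&)|.*");
// An escaped \? is a literal question mark, so the pattern compiles.
const escaped = compiles("\\?rows.*?(?=&)|.*");

console.log({ unescaped, escaped });
```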

regtest.htm produces
new RegExp("?rows.*?(?=\&)|.*", "") returned a SyntaxError: invalid quantifier
The value you put into the web site shouldn't have the / delimiters on the regex, so put in ?rows.*?(?=\&)|.* and it shows the same problem. Your JavaScript code should look like
re = /rows.*?(?=\&)|.*/g;
or similar (but that is a pointless regex, as it matches everything). If you can't fix it, please describe what you want to match and show your JavaScript.

You might consider refactoring your code to look something like this:
var url = "sort=notaryname&pageno=0&rows=anything&Start=0"
var rowCount = "foobar";
if (/[\?\&]rows=/.test(url))
{
url = url.replace(/([\?\&]rows=)[^\&]+/g,"$1"+rowCount);
}
console.log(url);
Output
sort=notaryname&pageno=0&rows=foobar&Start=0
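The question's original code also appended the parameter when it was missing. A sketch covering both branches could look like this. setRows is a hypothetical helper name, and the replacement uses a callback so that a $ in the value is not interpreted as a replacement pattern.

```javascript
// Replace the value of an existing rows parameter, or append the parameter
// (with "&", as in the question's else branch) when it is not present.
function setRows(url, rowCount) {
  const pattern = /([?&]rows=)[^&]*/;
  if (pattern.test(url)) {
    return url.replace(pattern, (match, prefix) => prefix + rowCount);
  }
  return url + "&rows=" + rowCount;
}

const replaced = setRows("sort=notaryname&pageno=0&rows=anything&Start=0", "foobar");
const appended = setRows("sort=notaryname&pageno=0", "foobar");

console.log(replaced);
console.log(appended);
```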

Need a regex for acceptable file names

I'm using Fancy Upload 3 and onSelect of a file I need to run a check to make sure the user doesn't have any bad characters in the filename. I'm currently getting people uploading files with hieroglyphics and such in the names.
What I need is to check if the filename only contains:
A-Z
a-z
0-9
_ (underscore)
- (minus)
SPACE
ÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜäëïöü (as single and double byte)
Obviously you can see the difficult thing there. The non-english single and double byte chars.
I've seen this:
[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]
And this:
[\x80-\xA5]
But neither of them fully cover the situation right.
Examples that should work:
fást.zip
abc.zip
ABC.zip
Über.zip
Examples that should NOT work:
∑∑ø∆.zip
¡wow!.zip
•§ªº¶.zip
The following is close, but I'm NO RegEx'pert, not even close.
var filenameReg = /^[A-Za-z0-9-_]|[\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF]+$/;
Thanks in advance.
Solution from Zafer mostly works, but it does not catch all of the other symbols, see below.
Uncaught:
¡£¢§¶ª«ø¨¥®´åß©¬æ÷µç
Caught:
™∞•–≠'"πˆ†∑œ∂ƒ˙∆˚…≥≤˜∫√≈Ω
Regex:
var filenameReg = /^([A-Za-z0-9\-_. ]|[\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF])+$/;

Alternation between two character classes (i.e. [abc]|[def]) can be simplified to a single character class ([abcdef]) -- the first can be read as "(a or b or c) OR (d or e or f)"; the second as "(a or b or c or d or e or f)". What probably tripped up your regular expression is the unescaped dash in the first class -- if you want a literal dash, it should be the last character in the class (or be escaped).
So we'll modify your expression to get it working:
var filenameReg = /^[A-Za-z0-9_\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF-]+$/;
The problem now is that you're not accounting for the file extension, but that is an easy modification (assuming you're always getting .zip files):
var filenameReg = /^[A-Za-z0-9_\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF-]+\.zip$/;
Replace zip with another pattern if the extension differs.

It looks like it is the character ranges that are causing the problem, because they include some unallowable characters in between. Since you already have the list of allowable characters, the best thing would be to just use that directly (with a . added so that the extension is accepted):
var filenameReg = /^[A-Za-z0-9_\-.\ ÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜäëïöü]+$/;
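As a rough check of the explicit-list approach, the sketch below runs it against the examples from the question. Two details are assumptions rather than part of the answer: a . is included in the class so that the extension passes, and the name is normalized to NFC first so that decomposed accents (a letter followed by a combining mark) compare equal to the precomposed characters in the list. It may also help to know that \x takes exactly two hex digits in a JavaScript regex, so \x00A0 is read as \x00 followed by the literal characters A and 0; a four-digit code point needs \u00A0 instead.

```javascript
// Accented characters taken from the list in the question.
const allowedAccents = "ÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜäëïöü";

// Letters, digits, underscore, dash, dot, space, plus the accented characters.
const filenameReg = new RegExp("^[A-Za-z0-9_\\-. " + allowedAccents + "]+$");

// isAcceptableFilename is a hypothetical helper name.
function isAcceptableFilename(name) {
  // Normalize so "a" + combining acute is compared as the single character "á".
  return filenameReg.test(name.normalize("NFC"));
}

const shouldPass = ["fást.zip", "abc.zip", "ABC.zip", "Über.zip"];
const shouldFail = ["∑∑ø∆.zip", "¡wow!.zip", "•§ªº¶.zip"];

console.log(shouldPass.map(isAcceptableFilename));
console.log(shouldFail.map(isAcceptableFilename));
```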

The following should work:
var filenameReg = /^([A-Za-z0-9\-_. ]|[\x00A0-\xD7FF\xF900-\xFDCF\xFDF0-\xFFEF])+$/;
I've escaped the - with a \ and grouped the two expressions; otherwise the + sign doesn't apply to the first expression.
EDIT 1: I've also put . in the expression.

We have different rules for different platforms. But I think you mean long file names in Windows. For that you can use the following regex:
var longFilenames = @"^[^\./:*\?\""<>\|]{1}[^\/:*\?\""<>\|]{0,254}$";
NOTE: Instead of saying which characters are allowed, you need to say which ones are not allowed!
But keep in mind that this is not a 100% complete regex. If you really want to make it complete, you have to add exceptions for reserved names as well.
You can find more information about filename rules here:
http://msdn.microsoft.com/en-us/library/aa365247%28VS.85%29.aspx

How to search csv string and return a match by using a Javascript regex

I'm trying to extract the first user-right from semicolon separated string which matches a pattern.
Users rights are stored in format:
LAA;LA_1;LA_2;LE_3;
String is empty if user does not have any rights.
My best solution so far is to use the following regex in regex.replace statement:
.*?;(LA_[^;]*)?.*
(The question mark at the end of group is for the purpose of matching the whole line in case user has not the right and replace it with empty string to signal that she doesn't have it.)
However, it doesn't work correctly in case the searched right is in the first position:
LA_1;LA_2;LE_3;
It is easy to fix it by just adding a semicolon at the beginning of line before regex replace but my question is, why doesn't the following regex match it?
.*?(?:(?:^|;)(LA_[^;]*))?.*
I have tried numerous other regular expressions to find the solution but so far without success.

I am not sure I get your question right, but in regards to the regular expressions you are using, you are overcomplicating them for no clear reason (at least not to me). You might want something like:
function getFirstRight(rights) {
var m = rights.match(/(^|;)(LA_[^;]*)/)
return m ? m[2] : "";
}

You could just split the string first:
function getFirstRight(rights)
{
return rights.split(";",1)[0] || "";
}

To answer the specific question "why doesn't the following regex match it?", one problem is the mix of this at the beginning:
.*?
eventually followed by:
^|;
Which might be like saying, skip over any extra characters until you reach either the start or a semicolon. But you can't skip over anything and then later arrive at the start (unless it involves newlines in a multiline string).
Something like this works:
.*?(\bLA_[^;]*).*
Meaning, skip over characters until a word boundary followed by "LA_".
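To make the behaviour concrete, this sketch compares the question's second regex with a plain match. getFirstRight mirrors the function from the earlier answer (using a non-capturing group), and the sample strings are the ones from the question.

```javascript
// The second attempt from the question.
const attempt = /.*?(?:(?:^|;)(LA_[^;]*))?.*/;

// With the right in first position, the optional group matches at the start.
const first = "LA_1;LA_2;LE_3;".replace(attempt, "$1");
// Otherwise the lazy .*? matches nothing, the optional group is skipped,
// and the final .* swallows the whole line, so $1 stays empty.
const later = "LAA;LA_1;LA_2;LE_3;".replace(attempt, "$1");

// Matching instead of replacing avoids the problem: the regex engine
// moves the starting position forward until (?:^|;) can succeed.
function getFirstRight(rights) {
  const m = rights.match(/(?:^|;)(LA_[^;]*)/);
  return m ? m[1] : "";
}

console.log(JSON.stringify(first), JSON.stringify(later));
console.log(getFirstRight("LAA;LA_1;LA_2;LE_3;"), getFirstRight("LA_1;LA_2;LE_3;"));
```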
