regular expressions explanation in javascript

Can somebody explain what this regular expression does?
document.cookie.match(/cookieInfo=([^;]*).*$/)[1]
Also it would be great if I can strip out the double quotes I'm seeing in the cookieInfo values. i.e. when cookieInfo="xyz+asd" - I want to strip out the double quotes using the above regular expression.

It's basically saying: grab as many characters as possible that are not semicolons and that follow the string 'cookieInfo='.
Try this to eliminate the double quotes:
document.cookie.match(/cookieInfo="([^;]*)".*$/)[1]

It searches the document.cookie string for cookieInfo=.
Next it grabs all of the characters which are not ; (until it hits the first semicolon).
[...] matches any one character from the set inside the brackets.
[^...] matches any one character that is NOT in the set.
Then it lets the RegEx search through all other characters.
.* any character, 0 or more times.
$ end of string (or in some special cases, end of line).
You could replace " a couple of different ways, but rather than stuffing it into the regex, I'd recommend doing a replace on it after the fact:
var string = document.cookie.match(...)[1],
cleaned_string = string.replace(/^"|"$/g, "");
That second regex says "look at the start of the string and see if there's a ", or look at the end of the string and see if there's a "".
Normally, a RegEx would stop after the first match it found. The g at the end means to keep going for every match it can possibly find in the string that you gave it.
I wouldn't put it in the original RegEx, because playing around with optional quotes can be ugly.
If they're guaranteed to always, always be there, then that's great, but if you assume they are, and you hit one that doesn't have them, then you're going to get a null match.
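To put the two steps above together, here is a minimal sketch. The helper name readCookieInfo is made up for illustration, and it takes the cookie string as a parameter (instead of reading document.cookie directly) so it can run outside a browser. It checks for a null match before touching [1], then strips the optional surrounding quotes.

```javascript
// Hypothetical helper, not part of any library.
// In a browser you would call readCookieInfo(document.cookie).
function readCookieInfo(cookieString) {
  // Capture everything after "cookieInfo=" up to the next semicolon.
  const match = cookieString.match(/cookieInfo=([^;]*)/);
  if (match === null) {
    return null; // cookie not present, avoid indexing into null
  }
  // Strip one leading and one trailing double quote, if present.
  return match[1].replace(/^"|"$/g, "");
}

console.log(readCookieInfo('a=1; cookieInfo="xyz+asd"; b=2')); // xyz+asd
console.log(readCookieInfo("cookieInfo=plain"));               // plain
console.log(readCookieInfo("other=1"));                        // null
```

Because the quotes are removed in a separate step, the same code works whether or not the value is quoted.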

The regular expression matches a string starting with 'cookieInfo=', followed by (and capturing) 0 or more non-semicolon characters, followed by 0 or more 'anythings'.
To strip out the double quotes you can use the regex /"/g and replace every match with an empty string.

Related

Replace a phrase in a string that is being broken up into 2 separate lines [duplicate]

Is there a simple way to ignore the white space in a target string when searching for matches using a regular expression pattern? For example, if my search is for "cats", I would want "c ats" or "ca ts" to match. I can't strip out the whitespace beforehand because I need to find the begin and end index of the match (including any whitespace) in order to highlight that match and any whitespace needs to be there for formatting purposes.

You can stick optional whitespace characters \s* in between every other character in your regex. Although granted, it will get a bit lengthy.
/cats/ -> /c\s*a\s*t\s*s/

While the accepted answer is technically correct, a more practical approach, if possible, is to just strip whitespace out of both the regular expression and the search string.
If you want to search for "my cats", instead of:
myString.match(/m\s*y\s*c\s*a\s*t\s*s\s*/g)
Just do:
myString.replace(/\s+/g,"").match(/mycats/g)
Warning: You can't automate this on the regular expression by just replacing all spaces with empty strings because they may occur in a negation or otherwise make your regular expression invalid.

Addressing Steven's comment to Sam Dufel's answer
Thanks, sounds like that's the way to go. But I just realized that I only want the optional whitespace characters if they follow a newline. So for example, "c\n ats" or "ca\n ts" should match. But wouldn't want "c ats" to match if there is no newline. Any ideas on how that might be done?
This should do the trick:
/c(?:\n\s*)?a(?:\n\s*)?t(?:\n\s*)?s/
See this page for all the different variations of 'cats' that this matches.
You can also solve this using conditionals, but they are not supported in the javascript flavor of regex.
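As a quick check of the pattern above, this small sketch runs it against a few inputs. The sample strings are just illustrations taken from the question.

```javascript
// Whitespace is only tolerated when it follows a newline.
const catsAcrossLines = /c(?:\n\s*)?a(?:\n\s*)?t(?:\n\s*)?s/;

console.log(catsAcrossLines.test("cats"));       // true
console.log(catsAcrossLines.test("c\n ats"));    // true
console.log(catsAcrossLines.test("ca\n   ts"));  // true
console.log(catsAcrossLines.test("c ats"));      // false, no newline before the space
```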

You could put \s* in between every character in your search string, so if you were looking for cats you would use c\s*a\s*t\s*s
It's long but you could build the string dynamically of course.
You can see it working here: http://www.rubular.com/r/zzWwvppSpE
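Building the pattern dynamically could look roughly like the sketch below. The function names escapeRegExp and findIgnoringWhitespace are made up for this example. It escapes each character of the search term, joins them with \s*, and reports the start and end index in the original string, which is what the question needs for highlighting.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: find `needle` in `haystack`, allowing whitespace
// between any two characters of the needle.
function findIgnoringWhitespace(haystack, needle) {
  const chars = Array.from(needle.replace(/\s+/g, ""));
  const pattern = chars.map((ch) => escapeRegExp(ch)).join("\\s*");
  const match = new RegExp(pattern).exec(haystack);
  if (match === null) {
    return null;
  }
  // `end` is exclusive, like the second argument of String.prototype.slice.
  return { start: match.index, end: match.index + match[0].length, text: match[0] };
}

console.log(findIgnoringWhitespace("my c ats are here", "cats"));
// { start: 3, end: 8, text: 'c ats' }
```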

If you only want to allow spaces, then
\bc *a *t *s\b
should do it. To also allow tabs, use
\bc[ \t]*a[ \t]*t[ \t]*s\b
Remove the \b anchors if you also want to find cats within words like bobcats or catsup.
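A small sketch of the difference the \b anchors make (the sample strings are only illustrations):

```javascript
const wholeWord = /\bc *a *t *s\b/; // must start and end at a word boundary
const anywhere = /c *a *t *s/;      // may appear inside a longer word

console.log(wholeWord.test("my ca ts")); // true
console.log(wholeWord.test("bobcats"));  // false
console.log(anywhere.test("bobcats"));   // true
console.log(anywhere.test("catsup"));    // true
```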

This approach can be used to automate this
(the following example is in Python, although it can be ported to any language):
You can strip the whitespace beforehand AND save the positions of the non-whitespace characters, so you can use them later to find the boundaries of the match in the original string, like this:
import re

def regex_search_ignore_space(regex, string):
    no_spaces = ''
    char_positions = []
    for pos, char in enumerate(string):
        if re.match(r'\S', char):  # upper \S matches non-whitespace chars
            no_spaces += char
            char_positions.append(pos)
    match = re.search(regex, no_spaces)
    if not match:
        return match
    # match.start() and match.end() are indices of start and end
    # of the found string in the spaceless string
    # (as we have searched in it).
    start = char_positions[match.start()]  # in the original string
    # match.end() is exclusive, so map the last matched character and
    # add 1 (this assumes the match is not empty).
    end = char_positions[match.end() - 1] + 1  # in the original string
    matched_string = string[start:end]
    # the match WITH spaces is returned.
    return matched_string

with_spaces = 'a li on and a cat'
print(regex_search_ignore_space('lion', with_spaces))
# prints 'li on'
If you want to go further, you can construct a match object and return it instead, which makes the helper handier to use.
The performance of this function can of course also be optimized; this example just shows the path to a solution.
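Since the question is about JavaScript, here is a rough port of the same idea. It is only a sketch: the function name searchIgnoringWhitespace and the shape of the returned object are my own choices, and it assumes the regex is passed without the g flag.

```javascript
// Hypothetical helper: search a whitespace-free copy of `text`, then map
// the match back to positions in the original string.
function searchIgnoringWhitespace(regex, text) {
  let stripped = "";
  const positions = []; // positions[i] = index in `text` of stripped[i]
  for (let i = 0; i < text.length; i++) {
    if (!/\s/.test(text[i])) {
      stripped += text[i];
      positions.push(i);
    }
  }
  const match = regex.exec(stripped);
  if (match === null || match[0].length === 0) {
    return null; // no match, or an empty match that cannot be mapped back
  }
  const start = positions[match.index];
  // The end index is exclusive: map the last matched character, then add 1.
  const end = positions[match.index + match[0].length - 1] + 1;
  return { start, end, text: text.slice(start, end) };
}

console.log(searchIgnoringWhitespace(/lion/, "a li on and a cat"));
// { start: 2, end: 7, text: 'li on' }
```

Returning start and end as well as the text gives the caller what it needs to highlight the match, whitespace included.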

The accepted answer will not work if you are passing a dynamic value (such as the "current value" in an array loop) as the regex test value. You would not be able to add the optional whitespace without getting some really ugly regex.
Konrad Hoffner's solution is therefore better in such cases, as it strips whitespace from both the regex and the test string. The test is then conducted as though neither has any whitespace.

If Statement with .match(regex) in javascript not picking up spaces

Hi guys, I'm trying to check if a user input string contains a space. I'm using http://regexr.com/ to check if my regular expression is correct (FYI, I'm new to regex), and it seems to be correct.
But it doesn't work: the value still gets returned even if there is a space. Is there something wrong with my if statement, or am I missing how regex works?
var regex = /([ ])\w+/g;
if (nameInput.match(regex)||realmInput.match(regex)) {
alert('spaces not allowed');
} else {
//do something else
}
Thanks in Advance

This regex /([ ])\w+/g will match any string which contains a space followed by one or more "word characters". This won't catch, for example, a space at the end of the string that is not followed by anything.
Try using /\s+/g instead. It will match any occurrence of at least one whitespace character (including tabs and line breaks).
Update:
If you wish to match only a single space, this will do the trick: / /g. There's no real need for the brackets and parentheses, and since one space is enough, even the g flag is redundant; it could simply have been / /.
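A minimal sketch of the check, assuming the inputs are plain strings (the helper name containsWhitespace is made up). It uses test() rather than match(), since only a yes/no answer is needed:

```javascript
// Hypothetical helper: true if the value contains any whitespace character.
function containsWhitespace(value) {
  return /\s/.test(value);
}

console.log(containsWhitespace("abc"));   // false
console.log(containsWhitespace("a b"));   // true
console.log(containsWhitespace("abc "));  // true, trailing space is caught

// The regex from the question misses the trailing space:
console.log("abc ".match(/([ ])\w+/g));   // null
```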

Your current regex doesn't match 'abc ' (a word with a space character at the end). If you want to make sure, you can trim your input before the check :).
You can check here https://regex101.com/
The right regex for matching only white space is
/([ ])/g

Unable to find a string matching a regex pattern

While trying to submit a form a javascript regex validation always proves to be false for a string.
Regex:- ^(([a-zA-Z]:)|(\\\\{2}\\w+)\\$?)(\\\\(\\w[\\w].*))+(.jpeg|.JPEG|.jpg|.JPG)$
I have tried following strings against it
abc.jpg,
abc:.jpg,
a:.jpg,
a:asdas.jpg,
What string could possibly match this regex?

This regex won't match against anything because of that $? in the middle of the string.
Apparently using the optional modifier ? on the end string symbol $ is not correct (if you paste it on https://regex101.com/ it will give you an error indeed). If the javascript parser ignores the error and keeps the regex as it is this still means you are going to match an end string in the middle of a string which is supposed to continue.
Unescaped it was supposed to match a \$ (dollar symbol) but as it is written it won't work.
If you want your string to be accepted at any cost you can probably use Firebug or a similar developer tool and edit the string inside the javascript code (this assumes there's no server-side check too, and that it isn't wrong as well). If you ignore the $? then a matching string will be \\\\w\\\\ww.jpg (but since the . is unescaped even \\\\w\\\\ww%jpg is a match)
Of course, I wrote this answer assuming the escaping is indeed the one you showed in the question. If you need to find a matching pattern for the correctly escaped one ^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))+(\.jpeg|\.JPEG|\.jpg|\.JPG)$ then you can use this tool to find one http://fent.github.io/randexp.js/ (though it will find weird matches). A matching pattern is c:\zz.jpg

If you are just looking for a regular expression to match what you got there, go ahead and test this out:
(\w+:?\w*\.(?:jpe?g|JPE?G),?)
That should match what you are looking for. Remove the optional comma at the end if you feel like it, of course.

If you remove escape level, the actual regex is
^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))+(.jpeg|.JPEG|.jpg|.JPG)$
After ^ (start of string) comes the first group (([a-zA-Z]:)|(\\{2}\w+)\$?), which matches either a letter followed by a colon, or two backslashes followed by one or more word characters and an optional literal $. Some of the parentheses inside are needless.
The second part (\\(\w[\w].*))+ matches a backslash, followed by two word characters \w[\w], which looks weird because it's equivalent to \w\w (you don't need a character class for the second \w), followed by any amount of any character. This whole thing is repeated one or more times.
In the last part (.jpeg|.JPEG|.jpg|.JPG) someone probably forgot to escape the dot to match a literal dot; \. should be used. This part can be reduced to \.(JPE?G|jpe?g).
It would match something like
A:\12anything.JPEG
\\1$\anything.jpg
Play with it at regex101. A more readable version could be
^([a-zA-Z]:|\\{2}\w+\$?)(\\\w{2}.*)+\.(jpe?g|JPE?G)$
Also read the explanation on regex101 to understand any pattern, it's helpful!
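To try the simplified pattern from JavaScript, something like this sketch works. Note that backslashes have to be doubled inside string literals, so "A:\\12anything.JPEG" is the string A:\12anything.JPEG. The variable name imagePath is just for this example.

```javascript
// The simplified pattern from above, written as a regex literal.
const imagePath = /^([a-zA-Z]:|\\{2}\w+\$?)(\\\w{2}.*)+\.(jpe?g|JPE?G)$/;

console.log(imagePath.test("A:\\12anything.JPEG"));   // true
console.log(imagePath.test("\\\\1$\\anything.jpg"));  // true
console.log(imagePath.test("c:\\zz.jpg"));            // true

// The strings from the question have no backslash-separated path, so they fail:
console.log(imagePath.test("abc.jpg"));               // false
console.log(imagePath.test("a:asdas.jpg"));           // false
```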

Cannot get a regex to work in JavaScript that allows whitespace and backslash

I have a regular expression as below. It should allow alphabets, digits, round brackets, square brackets, backslash and following punctuation marks: period, comma, semi-colon, full colon, exclamation, percentage and dash.
^[(a-z)(A-Z) .,;:!'%\-(0-9)(\\)\(\)[\]\s]+$
Question : I have tried this regular expression with some text at this online tester: https://regex101.com/r/kO5tW2/2, but it always comes up with no matches. What is causing the expression to fail in above case? To me, the string being tested should come back as valid, but it's not.

Your spec does not mention a question mark. However, the test text you give does include a question mark. You could have tested this easily enough by removing one character at a time from the test text until you got a match, which would have happened when you removed the question mark.
Either add the question mark to the regexp, or remove it from your test text.
Also, you do not need to (and should not) enclose ranges in parentheses.
In the below, I've also removed escaping for characters which do not need to be escaped:
^[a-zA-Z .,;:!'%\-0-9\\()[\]\s?]+$
                              ^
https://regex101.com/r/kO5tW2/4
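Instead of removing one character at a time by hand, you can let a negated version of the character class point at the offending characters. This is only a sketch; the helper name findDisallowed and the sample strings are made up, and the class below is the one from the question's spec (without the question mark).

```javascript
// Validation pattern with the question mark added.
const allowed = /^[a-zA-Z .,;:!'%\-0-9\\()[\]\s?]+$/;

// Hypothetical debugging helper: list every character that the original
// spec (no question mark) does not allow.
function findDisallowed(text) {
  return text.match(/[^a-zA-Z .,;:!'%\-0-9\\()[\]\s]/g) || [];
}

console.log(allowed.test("Is this (valid)? Yes: 100%!")); // true
console.log(allowed.test("no <tags>"));                   // false
console.log(findDisallowed("Is this valid?"));            // [ '?' ]
```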

Try adding m (multiline) modifier to regex
If you have a string consisting of multiple lines, like first line\nsecond line (where \n indicates a line break), it is often desirable to work with lines, rather than the entire string. Therefore, all the regex engines discussed in this tutorial have the option to expand the meaning of both anchors. ^ can then match at the start of the string (before the f in the above string), as well as after each line break (between \n and s). Likewise, $ still matches at the end of the string (after the last e), and also before every line break (between e and \n). Source
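In JavaScript the effect of the m flag can be seen with a short sketch like this, using the example string from the quote:

```javascript
const text = "first line\nsecond line";

// Without m, ^ and $ only match at the very start and end of the string.
console.log(text.match(/^\w+/g));  // [ 'first' ]
console.log(text.match(/\w+$/g));  // [ 'line' ]

// With m, they also match after and before each line break.
console.log(text.match(/^\w+/gm)); // [ 'first', 'second' ]
console.log(text.match(/\w+$/gm)); // [ 'line', 'line' ]
```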

Regular expression for apostrophes/ single quotes with double

I'm currently working with a regular expression (in Javascript) for replacing double quotes with smart quotes:
// ie: "quotation" to “quotation”
Here's the expression I've used for replacing the double quotes:
str = str.replace(/"([A-Za-z ]*)"/ig, "“$1”")
The above works perfectly if the phrase inside the quotes contains no additional punctuation, however, I also need to replace any apostrophes:
// ie: replace "It's raining again!" with “It’s raining again!”
The expression for replacing single quotes/ apostrophes works fine if not encapsulated:
str.replace(/\'\b/g, "’"); // returns it's as it’s correctly
// Using both:
str.replace(/"([A-Za-z ]*)"/ig, "“$1”").replace(/\'\b/g, "’");
// "It's raining again!" returns as "It’s raining again!"
// Ignores double quotes
I know this is because the expression for replacing the double quotes is being matched to letters only, but my limited experience with regular expressions has me flummoxed at how to create a match for quotations that may also contain single quotes!
Any help would be HUGELY appreciated! Thanks in advance.

You can include in quotes all except quotes:
str = str.replace(/"([^"]*)"/ig, "“$1”")
Another option: use non-greedy search:
str = str.replace(/"(.*?)"/ig, "“$1”")
Also, I'm not sure that you need to change only the single quotes that are followed by a word character. Maybe it would be better to change all of them?
replace(/\'/g, "’");

You can search for anything not a ". I would also make a lazy match with ? in case you had something like "Hey," she said, "what's up?" as your str:
str.replace(/"([^"]*?)"/ig, "“$1”").replace(/\'\b/g, "’");

Just to add to the current answers, you are performing a match on [A-Za-z ]* for the double quote replace, which means "match uppercase, lowercase or a space". This won't match It's raining, since your match expression does not contain the single quote.
Follow the advice of matching "anything but another double quote", since with a greedy pattern such as "(.*)" a string like She said "It's raining outside." He said "really?" will result in She said “It's raining outside." He said "really?” (the greedy match will skip past the 'inner' double quotes.)
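Putting the suggestions from these answers together, a sketch could look like this. The function name smartenQuotes is made up, and the curly quotes are written as \u escapes only to keep the example encoding-safe.

```javascript
// Hypothetical helper combining the two replacements.
function smartenQuotes(text) {
  return text
    // "anything but another double quote" between a pair of double quotes
    .replace(/"([^"]*)"/g, "\u201C$1\u201D")
    // apostrophe directly followed by a word character, as in it's
    .replace(/'\b/g, "\u2019");
}

const sample = 'She said "It\'s raining outside." He said "really?"';
console.log(smartenQuotes(sample));
// She said “It’s raining outside.” He said “really?”

// For comparison, a greedy .* swallows the inner quotes:
console.log(sample.replace(/"(.*)"/g, "\u201C$1\u201D"));
// She said “It's raining outside." He said "really?”
```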

It's a good idea to limit the specific characters allowed to the left and right of the quotes, especially if this occurs in an html file. I am using this.
str = str.replace(/([\n >*_-])"([A-Za-z0-9 ÆØÅæøå.,:;!##]*)"([ -.,!<\n])/ig, "$1«$2»$3");
In this way, you avoid replacing quotes inside html-tags like href="http.....
Normally, there is a space to the left of the opening quote, and another to the right of the closing quote. In an html document, it might be a closing bracket, a new line, etc. I have also included the Norwegian characters. :-)
