I have a custom regular expression which I use to detect whole numbers, fractions and floats.
var regEx = new RegExp("^((^[1-9]|(0\.)|(\.))([0-9]+)?((\s|\.)[0-9]+(/[0-9])?)?)$");
var quantity = 'd';
var matched = quantity.match(regEx);
alert(matched);
(The code is also found here: http://jsfiddle.net/aNb3L/ .)
The problem is that for a single letter it matches, and I can't figure out why. But for more letters it fails(which is good).
Disclaimer: I am new to regular expressions, although in http://gskinner.com/RegExr/ it doesn't match a single letter
It's easier to use straight regular expression syntax:
var regEx = /^((^[1-9]|(0\.)|(\.))([0-9]+)?((\s|\.)[0-9]+(\/[0-9])?)?)$/;
When you use the RegExp constructor, you have to double-up on the backslashes. As it is, your code only has single backslashes, so the \. subexpressions are being treated as . — and that's how single non-digit characters are slipping through.
Thus yours would also work this way:
var regEx = new RegExp("^((^[1-9]|(0\\.)|(\\.))([0-9]+)?((\\s|\\.)[0-9]+(/[0-9])?)?)$");
This happens because the string syntax also uses backslash as a quoting mechanism. When your regular expression is first parsed as a string constant, those backslashes are stripped out if you don't double them. When the string is then passed to the regular expression parser, they're gone.
The only time you really need to use the RegExp constructor is when you're building up the regular expression dynamically or when it's delivered to your code via JSON or something.
Well, for a whole number this would be your regex:
/^(0|[1-9]\d*)$/
Then you have to account for the possibility of a float:
/^(0|[1-9]\d*)(.\d+)?$/
Then you have to account for the possibility of a fraction:
/^(0|[1-9]\d*)((.\d+)|(\/[1-9]\d*)?$/
To me this regex is much easier to read than your original, but it's up to you of course.
Related
If I have a string, which is the source of a regular expression:
"For example, I have (.*) string with (\.d+) special bits (but this is just an aside)."
Is there a way to extract the special parts of the regular expression?
In particular, I'm interested in the parts that will give back values when I call string.match(expr);
Regex can be complicated, but if you do a global regex with ([\.\\]([*a-z])\+?), it will capture your individual fields without including the parenthesis per your request. Demo code as put in this fiddle is below as well.
var testString = 'For example, I have (.*) string with (.d+) special bits (but this is just an aside). (\\w+)';
var regex = /([\.\\]([*a-z])\+?)/gi;
var matches_array = testString.match(regex);
//Outputs the following: [".*", ".d+", "\w+"]
Regular expressions are not powerful enough to recognize the language of matching parentheses. (The formal proof uses the equivalence of regular expressions and finite state machines and the fact that there are infinitely many levels of nesting possible.) Thus, matching the first ) after each ( would make (\d+(\.d+)?) return (\d+(\.d+) and matching the last ) after each ( would make (\w+) (\w+) match the entire string.
The correct way to do this is with recursion (which mathematical regular expressions do not allow, but actual implementations such as PCRE do). You can also get a simple expression for non-nested parentheses. Just be careful to parse escape characters: to be fully robust, \( and \\\( are special, but \\( is not.
I am using this regex in JavaScript which is to evaluate if a given string matches some german phone number patterns.
var reg = new RegExp("(?:\+\d+)?\s*(?:\(\d+\)\s*(?:[/–-]\s*)?)?\d+(?:\s*(?:[\s/–-]\s*)?\d+)*");
When using it I get this error:
SyntaxError: invalid quantifier
...eg = new RegExp("(?:\+\d+)\s*(?:\(\d+\)\s*(?:[/–-]\s*))\d+(?:\s*(?:[\s/–-]\s*)\d...
I'm trying hard to learn reading regular expressions, but can not understand them full yet. I did not write this expression by myself and I am struggling to understand it.
Why I am getting this error?
Because you're using a string litteral, you need to escape each backslash:
(?:\\+\\d+)?\\s*(?:\\(\\d+\\)\\s*(?:[/–-]\\s*)?)?\\d+(?:\\s*(?:[\\s/–-]\\s*)?\\d+)*
The other solution would be to use the regex litteral:
var reg = /(?:\+\d+)?\s*(?:\(\d+\)\s*(?:[\/–-]\s*)?)?\d+(?:\s*(?:[\s\/–-]\s*)?\d+)*/;
new RegExp requires a string, and since backslashes already have meaning inside strings, they need to be escaped again.
In your case, though, you're using a static pattern, so you'd be better off with a literal:
var reg = /(?:\+\d+)?\s*(?:\(\d+\)\s*(?:[\/–-]\s*)?)?\d+(?:\s*(?:[\s\/–-]\s*)?\d+)*/;
Just be aware that you need to escape / here ;)
As an additional tip, you can simplify the need to escape things to some extent by doing stuff like [+] for a literal +. I think it looks nicer then \+, but that's just my opinion.
I wrote the following regex:
(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?
Its behaviour can be seen here: http://gskinner.com/RegExr/?34b8m
I wrote the following JavaScript code:
var urlexp = new RegExp(
'^(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?$', 'gi'
);
document.write(urlexp.test("blaaa"))
And it returns true even though the regex was supposed to not allow single words as valid.
What am I doing wrong?
Your problem is that JavaScript is viewing all your escape sequences as escapes for the string. So your regex goes to memory looking like this:
^(https?://)?([da-z.-]+).([a-z]{2,6})(/(w|-)*)*/?$
Which you may notice causes a problem in the middle when what you thought was a literal period turns into a regular expressions wildcard. You can solve this in a couple ways. Using the forward slash regular expression syntax JavaScript provides:
var urlexp = /^(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?$/gi
Or by escaping your backslashes (and not your forward slashes, as you had been doing - that's exclusively for when you're using /regex/mod notation, just like you don't have to escape your single quotes in a double quoted string and vice versa):
var urlexp = new RegExp('^(https?://)?([da-z.-]+)\\.([a-z]{2,6})(/(\\w|-)*)*/?$', 'gi')
Please note the double backslash before the w - also necessary for matching word characters.
A couple notes on your regular expression itself:
[da-z.-]
d is contained in the a-z range. Unless you meant \d? In that case, the slash is important.
(/(\w|-)*)*/?
My own misgivings about the nested Kleene stars aside, you can whittle that alternation down into a character class, and drop the terminating /? entirely, as a trailing slash will be match by the group as you've given it. I'd rewrite as:
(/[\w-]*)*
Though, maybe you'd just like to catch non space characters?
(/[^/\s]*)*
Anyway, modified this way your regular expression winds up looking more like:
^(https?://)?([\da-z.-]+)\.([a-z]{2,6})(/[\w-]*)*$
Remember, if you're going to use string notation: Double EVERY backslash. If you're going to use native /regex/mod notation (which I highly recommend), escape your forward slashes.
Can you please help me. How can I add this regex (?<=^|\s):d(?=$|\s) in javascript RegExp?
e.g
regex = new RegExp("?????" , 'g');
I want to replace the emoticon :d, but only if it is surrounded by spaces (or at an end of the string).
Firstly, as Some1.Kill.The.DJ mentioned, I recommend you use the literal syntax to create the regular expression:
var pattern = /yourPatternHere/g;
It's shorter, easier to read and you avoid complications with escape sequences.
The reason why the pattern does not work is that JavaScript does not support lookbehinds ((?<=...). So you have to find a workaround for that. You won't get around including that character in your pattern:
var pattern = /(?:^|\s):d(?!\S)/g;
Since there is no use in capturing anything in your pattern anyway (because :d is fixed) you are probably only interested in the position of the match. That means, when you find a match, you will have to check whether the first character is a space character (or is not :). If that is the case you have to increment the position by 1. If you know that your input string can never start with a space, you can simply increment any found position if it is not 0.
Note that I simplified your lookahead a bit. That is actually the beauty of lookarounds that you do not have to distinguish between end-of-string and a certain character type. Just use the negative lookahead, and assure that there is no non-space character ahead.
Just for future reference that means you could have simplified your initial pattern to:
(?<!\S):d(?!\S)
(If you were using a regex engine that supports lookbehinds.)
EDIT:
After your comment on the other answer, it's actually a lot easier to use the workaround. Just write back the captured space-character:
string = string.replace(/(^|\s):d(?!\S)/g, "$1emoticonCode");
Where $1 refers to what was matched with (^|\s). I.e. if the match was at the beginning of the string $1 will be empty, and if there was a space before :d, then $1 will contian that space character.
Javascript doesnt support lookbehind i.e(?<=)..
It supports lookahead
Better use
/(?:^|\s)(:d)(?=$|\s)/g
Group1 captures required match
I cannot get this regex string to work in Javascript:
var input = $("input").val();
var hi = "(?<=[^ ])" + input + "(?=[$ ])";
var reg = new RegExp(hi);
alert(reg);
The last line is not working, but it does work when the regex is valid. I put the variable into a second string for the full regex search before passing that one to the regex object. Why isn't this regex query valid? (In case you are wondering, the chars in the brackets are space, zwsp, nbsp, and zwj.)
JavaScript regular expressions do not support look-behind.
They do however, support look-ahead, so if you really need the functionality you can reverse the input and write the expression "backwards". If you want both look-ahead and look-behind at the same time, this gets a little complicated.
As you haven't revealed what you're actually trying to achieve, you may be able to avoid the zero-width matches and just use normal capture groups.