Allow space in regex when validating file

I've got a text box where I want to accept some strings and reject others.
For instance, good strings could include:
GoodString
GoodString88
99GoodString
Some bad strings I did not want to allow were:
Good*String
Good&String
But one thing I wanted to allow is spaces between words, so this should have been good:
Good String
However my regex/js is stating this is NOT a good string - I want to allow it. I'm using the test routine for this and I'm as dumb as you can get with regexes. I don't know why I can never understand these things...
In any event my validation is as follows:
var rx = /^[\w.-]+$/;
if (!rx.test($("#MainContent_txtNewDocumentTitle").val())) {
//code for bad string
}else{
//code for good string
}
What can I do to this:
var rx = /^[\w.-]+$/;
Such that spaces are allowed?

You can use this regex instead to allow space only in middle (not at start/end):
var rx = /^[\w.-]+(?:[ \t]+[\w.-]+)*$/gm;
RegEx Breakup:
^ # line start
[\w.-]+ # match 1 or more of a word character or DOT or hyphen
(?: # start a non-capturing group
[ \t]+ # match one or more space or tab
[\w.-]+ # match 1 or more of a word character or DOT or hyphen
)* # close the non-capturing group. * will allow 0 or more matches of group
$ # line end
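/gm # g for global and m for multiline matches; drop both when validating a single value with test(), since g makes test() stateful (it resumes from lastIndex) and m lets ^ and $ match at line breaks inside the value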
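As a minimal sketch of how this could be wired into the validation from the question: the function name isValidTitle is made up for illustration, and the flags are left off on the assumption that only one text-box value is checked at a time.

```javascript
// Same pattern as above, but without the g and m flags: for a single value,
// g would make test() remember lastIndex between calls, and m would let
// ^ and $ match at line breaks inside the value.
const titlePattern = /^[\w.-]+(?:[ \t]+[\w.-]+)*$/;

// Hypothetical helper; in the question the value comes from a jQuery .val() call.
function isValidTitle(value) {
  return titlePattern.test(value);
}

console.log(isValidTitle("Good String"));  // true
console.log(isValidTitle("GoodString88")); // true
console.log(isValidTitle("Good*String"));  // false
console.log(isValidTitle(" Good String")); // false (leading space)
console.log(isValidTitle("Good String ")); // false (trailing space)
```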

Related

Regular Expression to match text between # and only if # is not preceded by '

Hello I'm trying to find a regular expression that can help me find all matches inside a string when they're inside # and only if # are not preceded by an apostrophe "'".
Basically I need to bold the text just as here when we use double * to bold text like this, but the apostrophe should work as an escape character.
For example
#Hello my name is Noé# should look like Hello my name is Noé
#Hello this has an escape apostrophe '# so I'll match until here# should look like Hello this has an escape apostrophe '# so I'll match until here
Inside a long text there might or might not be several matches:
"Hello I'm a text #I'm bold#, and I need to know how to match my text that's inside two '#, and #I will not match either 'cause I got no end"
So I can print it like
"Hello I'm a text I'm bold, and I need to know how to match my text that's inside two '#, and #I will not match either 'cause I got no end"
If that's not possible with a RegExp I could program a finite state machine, but I was hoping it was possible. Thank you in advance, God bless you!
Note: I will handle the escape characters later; for now I just need to know how to match this
/(?<!')#.*(?<!')#/gim
This was the only thing I could come up with, but honestly, I have no idea how negative look behind works :(, with this regexp it would match wrong. For example, if I type:
"I'm a text #and I should be a match# and this should not #But this should as well# and I'm just some random extra text"
matches from the first # occurrence until the last one, like so:
"I'm a text #and I should be a match# and this should not #But this should as well# and I'm just some random extra text"

I think this should work:
(?<!')#(.*?)(?<!')#
Here you can see the regexp working with your examples: https://regex101.com/r/wnguiA/1
(?<!') is Negative Lookbehind, it tells the regex engine to temporarily step backwards in the string, to check if the text inside the lookbehind can be matched there. (?<!a)b matches a b that is not preceded by an a.
Easier to follow is the (.*?), which matches any character (except for line terminators); adding ? makes the quantifier non-greedy, so it stops at the first occurrence of the token that follows.
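Here is a small sketch of how that pattern could be applied in JavaScript. Wrapping the match in <b> tags and the function name renderBold are assumptions made for illustration; the question does not say what the output markup should be. Lookbehind needs a reasonably recent JavaScript engine.

```javascript
// g is needed here because replace() should rewrite every match, not just the first.
const boldPattern = /(?<!')#(.*?)(?<!')#/g;

// Hypothetical helper: wraps each #...# span in <b> tags, leaving '# untouched.
function renderBold(text) {
  return text.replace(boldPattern, "<b>$1</b>");
}

console.log(renderBold("I'm a text #and I should be a match# and this should not #But this should as well# end"));
// I'm a text <b>and I should be a match</b> and this should not <b>But this should as well</b> end

console.log(renderBold("#escape '# here#"));
// <b>escape '# here</b>
```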

To avoid triggering the negative lookbehind at every position just to check for a ' to the left, you can also match # first and do the assertion after it.
#(?<!'#)(.*?)#(?<!'#)
Another option instead of using the non greedy .*? is to use a negated character class matching any char except #
Then when you encounter # only match it if there is ' before it using a positive lookbehind.
#(?<!'#)([^#\n]*(?:#(?<='#)[^#\n]*)*)#(?<!'#)
#(?<!'#) Match # not directly preceded by '
( Capture group 1
[^#\n]* Optionally match any char except # or a newline
(?: Non capture group
#(?<='#) Match # only when it is directly preceded by '
[^#\n]* Match optional repetitions of any char except # or a newline
)* Close non capture group and optionally repeat it to match all occurrences
) Close group 1
#(?<!'#) Match # not directly preceded by '
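To see what group 1 captures with this last pattern, a short sketch along these lines may help. The sample string is invented for the test, and matchAll assumes a modern engine.

```javascript
// The negated-character-class variant from above, with g so matchAll() can iterate.
const unrolled = /#(?<!'#)([^#\n]*(?:#(?<='#)[^#\n]*)*)#(?<!'#)/g;

// Invented sample: one closed span containing an escaped '#, then an unclosed #.
const sample = "a #x '# y# b #unclosed";

// Collect only the contents of capture group 1 for each match.
const captured = Array.from(sample.matchAll(unrolled), (m) => m[1]);

console.log(captured); // [ "x '# y" ]
```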

JS regex to match a username with specific special characters and no consecutive spaces

I am pretty new to the regex world and I'm stuck on a small task involving a regex.
Before posting a new question I went through some answers, which I was able to understand, but I couldn't crack the solution for my problem.
Appreciate your help on this.
My Scenario is:
Validating the username based on the criteria below
1- First character has to be a-zA-Z0-9_# (either of two special characters(_#) or alphanumeric)
2 - The rest can be any letters, any numbers and -#_ (either of three special characters and alphanumeric).
3 - BUT no consecutive spaces between words.
4- Max size should be 30 characters
My username might contain multiple words separated by a single space. For the first word only _# and alphanumerics are allowed, and from the second word onwards it can contain _-# and alphanumerics.
Trailing spaces at the end of the username need to be ignored.
Examples are: #test, _test, #test123, 123#, test_-#, test -test1, #test -_#test etc...
Appreciate your help on this..
Thanks
Arjun

Here you go:
^(?!.*[ ]{2,})[\w#][-#\w]{0,29}$
See it working on regex101.com.
Condition 3 is ambiguous though, as you're not allowing spaces anyway. \w is a shortcut for [a-zA-Z0-9_], and (?!...) is called a negative lookahead.
Broken down this says:
^ # start of string
(?!.*[ ]{2,}) # neg. lookahead, no consecutive spaces
[\w#] # condition 1
[-#\w]{0,29} # condition 2 and 4
$ # end of string

This might work ^(?=.{1,30}$)(?!.*[ ]{2})[a-zA-Z0-9_#]+(?:[ ][a-zA-Z0-9_#-]+)*$
Note - the check for no consecutive spaces (?! .* [ ]{2} ) is not really
necessary since the regex body only allows a single space between words.
It is left in for posterity, take it out if you want.
Explained
^ # BOS
(?= .{1,30} $ ) # Min 1 character, max 30
(?! .* [ ]{2} ) # No consecutive spaces (not really necessary here)
[a-zA-Z0-9_#]+ # First word only
(?: # Optional other words
[ ]
[a-zA-Z0-9_#-]+
)*
$ # EOS
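The two answers read the requirements differently, which a quick side-by-side test makes visible. This is only a sketch: the names firstAnswer, secondAnswer, and stripTrailingSpaces are made up here, and trimming before testing is one assumed way to "ignore trailing spaces".

```javascript
// Pattern from the first answer: no spaces allowed at all, '-' allowed after the first character.
const firstAnswer = /^(?!.*[ ]{2,})[\w#][-#\w]{0,29}$/;
// Pattern from the second answer: single spaces between words, '-' only from the second word on.
const secondAnswer = /^(?=.{1,30}$)(?!.*[ ]{2})[a-zA-Z0-9_#]+(?:[ ][a-zA-Z0-9_#-]+)*$/;

// Hypothetical helper: drop trailing spaces before validating.
function stripTrailingSpaces(input) {
  return input.replace(/ +$/, "");
}

for (const name of ["#test", "test_-#", "test -test1", "test  x", "_test   "]) {
  const candidate = stripTrailingSpaces(name);
  console.log(JSON.stringify(name), firstAnswer.test(candidate), secondAnswer.test(candidate));
}
// "#test"       true  true
// "test_-#"     true  false
// "test -test1" false true
// "test  x"     false false
// "_test   "    true  true
```

Which pattern fits depends on whether "test_-#" or "test -test1" from the question's examples matters more; neither accepts both.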

Minification: Using regex to remove linebreaks from JavaScript code

For the pure purpose of obfuscation, the first three lines seem to clean up the script pretty nicely from unnecessary enters.
Can anyone tell me what the lines 1 - 4 actually do? Only thing I know from trial and error is that if I comment out the fourth line the site works, if I leave it in place the site breaks.
<?php
header("Content-type: text/javascript; charset=UTF-8");
ob_start("compress");
function compress($buffer)
{
# remove extra or unneccessary new line from javascript
$buffer = preg_replace('/([;])\s+/', '$1', $buffer);
$buffer = preg_replace('/([}])\s+(else)/', '$1else', $buffer);
$buffer = preg_replace('/([}])\s+(var)/', '$1;var', $buffer);
$buffer = preg_replace('/([{};])\s+(\$)/', '$1\$', $buffer);
return $buffer;
}
Is there a better way to remove one or multiple line enters from JavaScript?

Dissection of all four regular expressions
Let's try and dissect each one of the regular expressions.
First regex
$buffer = preg_replace('/([;])\s+/', '$1', $buffer);
Explanation
( # beginning of the first capturing group
[;] # match the literal character ';'
) # ending of the first capturing group
\s+ # one or more whitespace characters (including newlines)
The above regular expression removes any whitespace that occurs immediately following a semicolon. ([;]) is a capturing group, meaning if a match is found, it is stored into a backreference, so we could use it later. For example, if our string was foo; <space><space>, then the expression would match ; and the whitespace characters. The replacement pattern here is $1, which means the entire matched string would be replaced with just a semicolon.
Second regex
$buffer = preg_replace('/([}])\s+(else)/', '$1else', $buffer);
Explanation
( # beginning of the first capturing group
[}] # match the literal character '}'
) # ending of the first capturing group
\s+ # one or more whitespace characters
(else) # match and capture 'else'
The above regex removes any whitespace between a closing curly brace (}) and else. The replacement pattern here is $1else, which means the string with whitespace gets replaced by what was captured by the first capturing group ([}]) (which is just the closing brace) followed by the keyword else. Nothing much to it.
Third regex
$buffer = preg_replace('/([}])\s+(var)/', '$1;var', $buffer);
Explanation
( # beginning of the first capturing group
[}] # match the literal character '}'
) # ending of the first capturing group
\s+ # one or more whitespace characters
(var) # match and capture 'var'
This is almost the same as the previous regex. The differences are the keyword (var instead of else) and the replacement pattern, $1;var, which inserts a semicolon after the closing brace. The semicolon character is optional in JavaScript. But if you want to write multiple statements on a single line, there's no way for the interpreter to know they're multiple statements, so a ; will need to be used to terminate each statement.
Fourth regex
$buffer = preg_replace('/([{};])\s+(\$)/', '$1\$', $buffer);
Explanation
( # beginning of the first capturing group
[{};] # match the literal character '{' or '}' or ';'
) # ending of the first capturing group
\s+ # one or more whitespace characters
( # beginning of the second capturing group
\$ # match the literal character '$'
) # ending of the second capturing group
The replacement pattern here is $1\$, which means the entire matched string would be replaced with what was matched by the first capturing group ([{};]) followed by a literal $ character.
Sidenote
This answer was only meant to explain the four regexes and what they do. The expressions could be improved a lot, but I'm not going into that as it's not the correct approach. As Qtax points out in the comments, you really should use a proper JS minifier to achieve this task. You might want to check out Google's Closure Compiler - it looks pretty neat.
If you're still confused how it works, don't worry. Learning regexes can be difficult in the beginning. I suggest you use this website - http://regularexpressions.info. It is a pretty decent resource for learning regular expressions. If you're looking for a book, you might want to check out Mastering Regular Expressions By Jeffrey Friedl.
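To illustrate why a regex is a fragile minifier, here is a rough JavaScript port of the first rule only (the original code is PHP; the function name stripAfterSemicolon and the sample input are made up for this sketch). The pattern has no idea whether a semicolon sits inside a string literal, so it rewrites the string's contents too.

```javascript
// JavaScript port of the first PHP rule: drop whitespace that follows a semicolon.
// preg_replace replaces every occurrence, so the g flag is needed here.
function stripAfterSemicolon(source) {
  return source.replace(/([;])\s+/g, "$1");
}

const code = 'var a = 1;\nvar msg = "wait;   then go";\n';

console.log(stripAfterSemicolon(code));
// var a = 1;var msg = "wait;then go";
```

The line break is removed as intended, but the spaces inside the quoted message are gone as well, which changes what the script displays.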

JavaScript Regex does not match exact string

In the example below the output is true. It matches cookie, and it also matches cookie14214; I'm guessing that's because cookie is in the string cookie14214. How do I hone in on this match to only get cookie?
var patt1=new RegExp(/(biscuit|cookie)/i);
document.write(patt1.test("cookie14214"));
Is this the best solution?
var patt1=new RegExp(/(^biscuit$|^cookie$)/i);

The answer depends on your allowance of characters surrounding the word cookie. If the word is to appear strictly on a line by itself, then:
var patt1=new RegExp(/^(biscuit|cookie)$/i);
If you want to allow symbols (spaces, ., ,, etc), but not alphanumeric values, try something like:
var patt1=new RegExp(/(?:^|[^\w])(biscuit|cookie)(?:[^\w]|$)/i);
Second regex, explained:
(?: # non-capturing group
^ # beginning-of-string
| [^\w] # OR, a non-word character (anything other than a letter, digit, or underscore)
)
(biscuit|cookie) # match desired text/words
(?: # non-capturing group
[^\w] # a non-word character
| $ # OR, end-of-string
)

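Yes, or use word boundaries. Note that this will match the cookie in great cookie but not in greatcookie (nor in great cookies, because the trailing s is a word character, so there is no boundary after cookie).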
var patt1=new RegExp(/(\bbiscuit\b|\bcookie\b)/i);
If you want to match the exact string cookie, then you don't even need regular expressions, just use ==, since /^cookie$/i.test(s) is basically the same as s.toLowerCase() == "cookie".
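A quick comparison of the three approaches, as a sketch with made-up variable names and inputs:

```javascript
const unanchored = /(biscuit|cookie)/i;    // the pattern from the question
const anchored = /^(biscuit|cookie)$/i;    // whole string must be the word
const bounded = /\b(biscuit|cookie)\b/i;   // word must stand alone inside the string

const inputs = ["cookie", "cookie14214", "great cookie", "greatcookie", "Cookie!"];

for (const input of inputs) {
  console.log(input, unanchored.test(input), anchored.test(input), bounded.test(input));
}
// cookie       true true  true
// cookie14214  true false false
// great cookie true false true
// greatcookie  true false false
// Cookie!      true false true
```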

Javascript multiple regex pattern

I'm trying to exclude some internal IP addresses and some internal IP address formats from viewing certain logos and links on the site. I have multiple ranges of IP addresses (sample given below). Is it possible to write a regex that could match all the IP addresses in the list below using JavaScript?
10.X.X.X
12.122.X.X
12.211.X.X
64.X.X.X
64.23.X.X
74.23.211.92
and 10 more

Quote the periods, replace the X's with \d+, and join them all together with pipes:
const allowedIPpatterns = [
"10.X.X.X",
"12.122.X.X",
"12.211.X.X",
"64.X.X.X",
"64.23.X.X",
"74.23.211.92" //, etc.
];
const allowedRegexStr = '^(?:' +
allowedIPpatterns.
join('|').
replace(/\./g, '\\.').
replace(/X/g, '\\d+') +
')$';
const allowedRegexp = new RegExp(allowedRegexStr);
Then you're all set:
'10.1.2.3'.match(allowedRegexp) // => ['10.1.2.3']
'100.1.2.3'.match(allowedRegexp) // => null
How it works:
First, we have to turn the individual IP patterns into regular expressions matching their intent. One regular expression for "all IPs of the form '12.122.X.X'" is this:
^12\.122\.\d+\.\d+$
^ means the match has to start at the beginning of the string; otherwise, 112.122.X.X IPs would also match.
12 etc: digits match themselves
\.: a period in a regex matches any character at all; we want literal periods, so we put a backslash in front.
\d: shorthand for [0-9]; matches any digit.
+: means "1 or more" - 1 or more digits, in this case.
$: similarly to ^, this means the match has to end at the end of the string.
So, we turn the IP patterns into regexes like that. For an individual pattern you could use code like this:
const regexStr = `^` + ipXpattern.
replace(/\./g, '\\.').
replace(/X/g, '\\d+') +
`$`;
Which just replaces all .s with \. and Xs with \d+ and sticks the ^ and $ on the ends.
(Note the doubled backslashes; both string parsing and regex parsing use backslashes, so wherever we want a literal one to make it past the string parser to the regular expression parser, we have to double it.)
In a regular expression, the alternation this|that matches anything that matches either this or that. So we can check for a match against all the IPs at once if we turn the list into a single regex of the form re1|re2|re3|...|relast.
Then we can do some refactoring to make the regex matcher's job easier; in this case, since all the regexes are going to have ^...$, we can move those constraints out of the individual regexes and put them on the whole thing: ^(10\.\d+\.\d+\.\d+|12\.122\.\d+\.\d+|...)$. The parentheses keep the ^ from being only part of the first pattern and $ from being only part of the last. But since plain parentheses capture as well as group, and we don't need to capture anything, I replaced them with the non-capturing version (?:...).
And in this case we can do the global search-and-replace once on the giant string instead of individually on each pattern. So the result is the code above:
const allowedRegexStr = '^(?:' +
allowedIPpatterns.
join('|').
replace(/\./g, '\\.').
replace(/X/g, '\\d+') +
')$';
That's still just a string; we have to turn it into an actual RegExp object to do the matching:
const allowedRegexp = new RegExp(allowedRegexStr);
As written, this doesn't filter out illegal IPs - for instance, 10.1234.5678.9012 would match the first pattern. If you want to limit the individual byte values to the decimal range 0-255, you can use a more complicated regex than \d+, like this:
(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])
That matches "any one or two digits, or '1' followed by any two digits, or '2' followed by any of '0' through '4' followed by any digit, or '25' followed by any of '0' through '5'". Replacing the \d+ with that turns the full string-munging expression into this:
const allowedRegexStr = '^(?:' +
allowedIPpatterns.
join('|').
replace(/\./g, '\\.').
replace(/X/g, '(?:\\d{1,2}|1\\d{2}|2[0-4]\\d|25[0-5])') +
')$';
And makes the actual regex look much more unwieldy:
^(?:10\.(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])\.(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])\.(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])|12\.122\....
but you don't have to look at it, just match against it. :)
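Putting the pieces together, a self-contained sketch of the range-limited version could look like this. The names OCTET, buildAllowedRegexp, and isAllowed are invented for the example, and the pattern list is just the sample from the question.

```javascript
// One byte of an IPv4 address, limited to 0-255 (same alternation as above).
const OCTET = "(?:\\d{1,2}|1\\d{2}|2[0-4]\\d|25[0-5])";

// Hypothetical helper: turn a list of "10.X.X.X"-style patterns into one anchored RegExp.
function buildAllowedRegexp(patterns) {
  const body = patterns
    .join("|")
    .replace(/\./g, "\\.") // literal dots
    .replace(/X/g, OCTET); // X means "any byte value"
  return new RegExp("^(?:" + body + ")$");
}

const allowed = buildAllowedRegexp([
  "10.X.X.X", "12.122.X.X", "12.211.X.X", "64.X.X.X", "64.23.X.X", "74.23.211.92",
]);

// Hypothetical convenience wrapper returning a boolean.
function isAllowed(ip) {
  return allowed.test(ip);
}

console.log(isAllowed("10.1.2.3"));          // true
console.log(isAllowed("100.1.2.3"));         // false
console.log(isAllowed("10.1234.5678.9012")); // false
console.log(isAllowed("12.122.256.0"));      // false
```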

You could do it in regex, but it's not going to be pretty, especially since JavaScript doesn't even support verbose regexes, which means that it has to be one humongous line of regex without any comments. Furthermore, regexes are ill-suited for matching ranges of numbers. I suspect that there are better tools for dealing with this.
Well, OK, here goes (for the samples you provided):
var myregexp = /\b(?:74\.23\.211\.92|(?:12\.(?:122|211)|64\.23)\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])|(?:10|64)\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9]))\b/g;
As a verbose ("readable") regex:
\b # start of number
(?: # Either match...
74\.23\.211\.92 # an explicit address
| # or
(?: # an address that starts with
12\.(?:122|211) # 12.122 or 12.211
| # or
64\.23 # 64.23
)
\. # .
(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\. # followed by 0..255 and a dot
(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9]) # followed by 0..255
| # or
(?:10|64) # match 10 or 64
\. # .
(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\. # followed by 0..255 and a dot
(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\. # followed by 0..255 and a dot
(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9]) # followed by 0..255
)
\b # end of number
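Since JavaScript has no verbose mode, one way to keep some of that readability is to assemble the pattern from commented string pieces. This is only a sketch; the names octet and internalIp are made up, and the g flag is left off so that test() stays stateless.

```javascript
// A number from 0 to 255, reused for every variable byte.
const octet = "(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])";

const internalIp = new RegExp(
  "\\b(?:" +
    "74\\.23\\.211\\.92" +                          // an explicit address
    "|" +
    "(?:12\\.(?:122|211)|64\\.23)" +                // 12.122, 12.211 or 64.23
    "\\." + octet + "\\." + octet +                 // followed by two octets
    "|" +
    "(?:10|64)" +                                   // 10 or 64
    "\\." + octet + "\\." + octet + "\\." + octet + // followed by three octets
  ")\\b"
);

console.log(internalIp.test("64.23.1.2"));                    // true
console.log(internalIp.test("visitor 12.211.9.9 logged in")); // true
console.log(internalIp.test("11.0.0.1"));                     // false
```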

/^(X|\d{1,3})(\.(X|\d{1,3})){3}$/ should do it.

If you don't actually need to match the "X" character you could use this:
\b(?:\d{1,3}\.){3}\d{1,3}\b
Otherwise I would use the solution cebarrett provided.

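I'm not entirely sure of what you're trying to achieve here (doesn't look like anyone else is either).
However, if it's validation, then here's a solution to validate an IP address that doesn't use RegEx. First, split the input string at the dots and make sure there are four segments. Then, using parseInt on each segment, make sure it is a number between 0 and 255.
function ipValidator(ipAddress) {
    var ipSegments = ipAddress.split('.');
    // An IPv4 address has exactly four dot-separated segments
    if (ipSegments.length !== 4) {
        return 'fail';
    }
    for (var i = 0; i < ipSegments.length; i++) {
        // Always pass the radix so the segment is parsed as decimal
        var value = parseInt(ipSegments[i], 10);
        // Reject segments that do not parse to a number, or fall outside 0-255
        if (isNaN(value) || value < 0 || value > 255) {
            return 'fail';
        }
    }
    return 'match';
}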
Running the following returns 'match':
document.write(ipValidator('10.255.255.125'));
Whereas this will return 'fail':
document.write(ipValidator('10.255.256.125'));
Here's a noted version in a jsfiddle with some examples, http://jsfiddle.net/VGp2p/2/
