JavaScript: match everything except given words

I'm working on a Node.js app and I'm doing route matching.
I need to match all routes, with all their variables, except the ones that begin with
"public", "static", or "files", or the same words with an added "/".
I know I could use an if statement before the regexp to check whether those words are in the URL and skip the regexp if they are, but I don't want to add that nesting, and knowing how to do it with a regexp will come in handy in the future anyway.
I know how to match anything except certain characters, e.g. [^0-9], but I can't use the same approach for words. I googled and found that a lookahead could solve this, but I can't get it to work.
In the end, I'd like to use something like this (in pseudocode),
where the .+ would match only if the pattern does not match any of the given words:
match(/^(?!public|static|files) .+ /gi)
Edit 1:
The format of the URLs would be something like this, with or without slashes:
/controller/action/4/var:something/
I want a regexp that matches this controller - action - id
pattern, but at the same time doesn't match paths like this:
/public/images/4
or
static/files/somefile
In general, I'd like to know how to match a pattern, but only if it doesn't begin with given words,
e.g. something like this, which doesn't work
(match .+, but only if it doesn't start with the words mentioned before):
/^(?!public|static|files).+ /gi

Actually, I'm not having trouble with negative look-aheads. Something like this seems to work just fine, although it's not super extensible.
/^\/(?!public|static|files)([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?\/?(.*)$/i
The 1st capture is the controller, the 2nd is the action, the 3rd is the ID, and the 4th is whatever is left.
See this jsfiddle
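For reference, here is a minimal sketch of how that pattern could be used. parseRoute and strictPattern are names I made up for illustration, not part of the answer. Note that the original lookahead also rejects any first segment that merely starts with a reserved word (e.g. /publications), so the strict variant is one possible way to reject whole segments only.

```javascript
// Pattern from the answer above: a leading slash, then a negative lookahead
// that rejects paths whose first segment starts with a reserved word.
const routePattern = /^\/(?!public|static|files)([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?\/?(.*)$/i;

// Assumed variation (not from the answer): only reject the reserved words
// when they form the whole first segment.
const strictPattern = /^\/(?!(?:public|static|files)(?:\/|$))([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?\/?(.*)$/i;

// parseRoute is a hypothetical helper; it returns null when the path is
// excluded by the lookahead or does not start with a slash.
function parseRoute(path, pattern = routePattern) {
  const m = pattern.exec(path);
  if (m === null) {
    return null;
  }
  const [, controller, action, id, rest] = m;
  return { controller, action, id, rest };
}

console.log(parseRoute('/controller/action/4/var:something/'));
console.log(parseRoute('/public/images/4'));
console.log(parseRoute('/publications/list', strictPattern));
```

Unmatched optional groups come back as undefined, so a router built on this would probably want defaults for action and id.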

Related

JavaScript - regex to check whether the user wrote correctly formatted input

In my CLI, users can specify what they want to use.
A user command can look like this:
include=name1,name2,name3
category=name1,name2
category=name1
In other words, a command always consists of 3 parts:
command name: can only be include or category
=: is in every command
name or names of the things they want to use, separated by ,
How can I test a string so that I always get true for these commands but false for everything else?
I am really bad at regex, but I tried something like this:
/\category|include=\w/.test(str);
to test at least the easiest alternative, which would be category=name1, but without success.
Can someone help me with this?
You were on the right path. Here's a fixed regex:
/^(category|include)=\w+(,\w+)*$/.test(str);
Note:
the parens around the alternative parts
the + after the \w so that you can have several characters
the optional (,\w+)*
the start and end of string marks (^ and $) in order to check the whole string
You can use this regex for your requirement:
/^(category|include)=(\w+(?:,\w+)*)$/
RegEx Demo
(\w+(?:,\w+)*) in the value part after = allows one or more comma-separated words.
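A minimal sketch of how the capturing version could be used to both validate a command and pull out its parts. parseCommand is a hypothetical helper name, and the returned object shape is my assumption, not something from the question.

```javascript
// Same pattern as in the answer above: group 1 is the command name,
// group 2 is the comma-separated list of names.
const commandPattern = /^(category|include)=(\w+(?:,\w+)*)$/;

// parseCommand is a hypothetical helper; it returns null for invalid input.
function parseCommand(str) {
  const m = commandPattern.exec(str);
  if (m === null) {
    return null;
  }
  return { command: m[1], names: m[2].split(',') };
}

console.log(parseCommand('include=name1,name2,name3'));
console.log(parseCommand('category=name1'));
console.log(parseCommand('category=name1,'));
```

A trailing comma or an empty value fails because every comma must be followed by at least one \w character and the $ anchor forces the whole string to match.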

RegEx match only final domain name from any email address

I want to match only the parent domain name from an email address, which might or might not have a subdomain.
So far I have tried this:
new RegExp(/.+@(:?.+\..+)/);
The results:
Input: abc@subdomain.maindomain.com
Output: ["abc@subdomain.maindomain.com", "subdomain.maindomain.com"]
Input: abc@maindomain.com
Output: ["abc@maindomain.com", "maindomain.com"]
I am interested in the second match (the group).
My objective is that in both cases, I want the group to match and give me only maindomain.com
Note: before the down vote, please note that neither have I been able to use existing answers, nor the question matches existing ones.
One simple regex you can use to get only the last 2 parts of the domain name is
/[^.]+\.[^.]+$/
It matches a sequence of non-period characters, followed by a period and another sequence of non-periods, all at the end of the string. This regex doesn't ensure that the domain name comes after an "@". If you want a regex that also does that, you could use lazy matching with "*?":
/@.*?([^.]+\.[^.]+)$/
However, I think that trying to do everything at once tends to make regexes more complicated and harder to read. In this problem I would prefer to do things in two steps: first check that the email has an "@" in it, then take the part after the "@" and pass it to the simple regex, which will extract the domain name.
One advantage of separating things is that some changes are easier. For example, if you want to make sure that your email has only a single "@" in it, that's very easy to do in a separate step but would be tricky to achieve in the "do everything" regex.
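The two-step idea could be sketched like this, assuming the usual "@" separator in email addresses. extractParentDomain is a hypothetical helper name, and treating "the last two labels" as the parent domain is the same simplification the answer makes (it would not be right for suffixes like co.uk).

```javascript
// extractParentDomain is a hypothetical helper illustrating the two steps.
function extractParentDomain(email) {
  // Step 1: require exactly one "@" and take the part after it.
  const parts = email.split('@');
  if (parts.length !== 2) {
    return null;
  }
  // Step 2: keep only the last two dot-separated labels of the host.
  const m = /[^.]+\.[^.]+$/.exec(parts[1]);
  return m === null ? null : m[0];
}

console.log(extractParentDomain('abc@subdomain.maindomain.com'));
console.log(extractParentDomain('abc@maindomain.com'));
console.log(extractParentDomain('a@b@maindomain.com'));
```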
You can use this regex:
/@(?:[^.\s]+\.)*([^.\s]+\.[^.\s]+)$/gm
Use captured group #1 for your result.
It matches @ followed by zero or more instances of non-dot text and a dot, i.e. (?:[^.\s]+\.)*.
([^.\s]+\.[^.\s]+)$ then matches and captures the last 2 components separated by a dot.
RegEx Demo
With the following, maindomain should always contain the maindomain.com part of the string.
var pattern = new RegExp(/(?:[\.@])(\w[\w-]*\w\.\w*)$/);
var str = "abc@subdomain.maindomain.com";
var maindomain = str.match(pattern)[1];
http://codepen.io/anon/pen/RRvWkr
EDIT: tweaked to disallow domains starting with a hyphen, e.g. '-yahoo.com'

Regex to find web addresses in short copy

Given a short piece of copy, I need to match all occurrences of links to websites. To keep things simple, I need to find addresses in this format:
www.aaaaaa.bbbbbb
http://aaaaaa.bbbb
https://aa.bbbb
but I also need to handle longer www/http/https versions:
www.aaaaa.bbbb.ccc.ddd.eeee
etc. So basically the number of subdomains is not known. I came up with this regex:
(www\.([a-zA-Z0-9-_]|\.(?!\s))+)[\s|,|$]|(http(s)?:\/\/(?!\.)([a-zA-Z0-9-_]|\.(?!\s))+)[\s|,|$]
If you test on:
this is some tex with www.somewIebsite.dfd.jhh.hjh inside of it or maybe http://www.ssss.com or maybe https://evenore.com hahaah blah
It works fine except when the address is at the very end. $ seems to work only when there is a \n at the end, and it fails for:
this is some tex with www.somewIebsite.dfd.jhh.hjh
I'm guessing the fix is simple and I'm missing something obvious, so how would I fix it? BTW, I posted the regex here if you want to quickly play around: https://regex101.com/r/eL1bI4/3
The problem is that you placed the end anchor $ inside the character class []:
[\s|,|$]
It is then interpreted literally as a dollar sign, and not as the anchor (the pipe character | is also interpreted literally, it's not needed there). The solution is to move the $ anchor outside:
(?:[\s,]|$)
However, in this case it makes more sense to use a positive lookahead instead of the noncapturing group (you don't want trailing spaces, or commas):
(?=[\s,]|$)
As a result, you end up with the following regex pattern:
(www\.([a-zA-Z0-9-_]|\.(?!\s))+)(?=[\s,]|$)|(http(s)?:\/\/(?!\.)([a-zA-Z0-9-_]|\.(?!\s))+)(?=[\s,]|$)
See the working demo.
The updated version that handles trailing full stops:
(www\.([a-zA-Z0-9-_]|\.(?!\s|\.|$))+)(?=[\s,.]|$)|(http(s)?:\/\/(?!\.)([a-zA-Z0-9-_]|\.(?!\s|\.|$))+)(?=[\s,.]|$)
See the working demo.
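To see the difference in JavaScript, here is a small sketch comparing the original pattern with the lookahead version from this answer (the one without the trailing full stop handling). findLinks is a hypothetical helper name.

```javascript
// Original attempt: inside [...] the "$" and "|" are literal characters,
// so the class can never match the end of the input.
const broken = /(www\.([a-zA-Z0-9-_]|\.(?!\s))+)[\s|,|$]|(http(s)?:\/\/(?!\.)([a-zA-Z0-9-_]|\.(?!\s))+)[\s|,|$]/g;

// Fixed version: the lookahead accepts whitespace, a comma, or end of input
// without consuming it, so matches carry no trailing space or comma.
const fixed = /(www\.([a-zA-Z0-9-_]|\.(?!\s))+)(?=[\s,]|$)|(http(s)?:\/\/(?!\.)([a-zA-Z0-9-_]|\.(?!\s))+)(?=[\s,]|$)/g;

// findLinks is a hypothetical helper; with the g flag, match() returns
// all full matches, or null when there are none.
function findLinks(text, pattern) {
  return text.match(pattern) || [];
}

const atEnd = 'this is some tex with www.somewIebsite.dfd.jhh.hjh';
console.log(findLinks(atEnd, broken));
console.log(findLinks(atEnd, fixed));
```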

regex replace on JSON is removing an Object from Array

I'm trying to improve my understanding of Regex, but this one has me quite mystified.
I started with some text defined as:
var txt = "{\"columns\":[{\"text\":\"A\",\"value\":80},{\"text\":\"B\",\"renderer\":\"gbpFormat\",\"value\":80},{\"text\":\"C\",\"value\":80}]}";
and do a replace as follows:
txt.replace(/\"renderer\"\:(.*)(?:,)/g,"\"renderer\"\:gbpFormat\,");
which results in:
"{"columns":[{"text":"A","value":80},{"text":"B","renderer":gbpFormat,"value":80}]}"
What I expected was for the renderer attribute value to have its quotes removed, which has happened, but the C column is also completely missing! I'd really love for someone to explain how my regex has removed column C.
As an extra bonus, if you could explain how to remove the quotes around any value for renderer (i.e. so I don't have to hard-code the value gbpFormat in the regex) that'd be fantastic.
You are using a greedy quantifier where you need a lazy one. Change this:
"renderer":(.*)(?:,)
              ^---- add the '?' here to make it lazy
to
"renderer":(.*?)(?:,)
Working demo
Your code should be:
txt.replace(/\"renderer\"\:(.*?)(?:,)/g,"\"renderer\"\:gbpFormat\,");
If you are learning regex, take a look at this documentation to learn more about greediness. A nice extract that helps to understand this is:
Watch Out for The Greediness!
Suppose you want to use a regex to match an HTML tag. You know that
the input will be a valid HTML file, so the regular expression does
not need to exclude any invalid use of sharp brackets. If it sits
between sharp brackets, it is an HTML tag.
Most people new to regular expressions will attempt to use <.+>. They
will be surprised when they test it on a string like This is a
<EM>first</EM> test. You might expect the regex to match <EM> and when
continuing after that match, </EM>.
But it does not. The regex will match <EM>first</EM>. Obviously not
what we wanted. The reason is that the plus is greedy. That is, the
plus causes the regex engine to repeat the preceding token as often as
possible. Only if that causes the entire regex to fail, will the regex
engine backtrack. That is, it will go back to the plus, make it give
up the last iteration, and proceed with the remainder of the regex.
Like the plus, the star and the repetition using curly braces are
greedy.
Try like this:
txt = txt.replace(/"renderer":"(.*?)"/g,'"renderer":$1');
The issue in the expression you were using was this part:
(.*)(?:,)
The * quantifier is greedy by default, which means that it gobbles up as much as it can, so it will run up to the last comma in your string. The easiest solution would be to turn it into a non-greedy quantifier by adding a question mark after the asterisk, changing that part of your expression to look like this:
(.*?)(?:,)
For the solution I proposed at the top of this answer, I also removed the part matching the comma, because I think it's easier just to match everything between quotes. As for your bonus question, to replace the matched value instead of having to hardcode gbpFormat, I used a backreference ($1), which will insert the first matched group into the replacement string.
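Here is a small sketch that runs the three variants discussed above against the string from the question, so the effect of greediness is visible side by side. The variable names are mine.

```javascript
const txt = '{"columns":[{"text":"A","value":80},{"text":"B","renderer":"gbpFormat","value":80},{"text":"C","value":80}]}';

// Greedy: (.*) runs to the LAST comma, swallowing the start of column C.
const greedyResult = txt.replace(/"renderer":(.*)(?:,)/g, '"renderer":gbpFormat,');

// Lazy: (.*?) stops at the FIRST comma after the renderer value.
const lazyResult = txt.replace(/"renderer":(.*?)(?:,)/g, '"renderer":gbpFormat,');

// Backreference: match what is between the quotes and reinsert it with $1,
// so the value no longer has to be hard-coded.
const unquotedResult = txt.replace(/"renderer":"(.*?)"/g, '"renderer":$1');

console.log(greedyResult);
console.log(lazyResult);
console.log(unquotedResult);
```

For this input the lazy and backreference versions produce the same string, while the greedy one loses column C.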
Don't manipulate JSON with regexp. It's too likely that you will break it, as you have found, and more importantly there's no need to.
In addition, once you have changed
'{"columns": [..."renderer": "gbpFormat", ...]}'
into
'{"columns": [..."renderer": gbpFormat, ...]}' // remove quotes from gbpFormat
then this is no longer valid JSON. (JSON requires that property values be numbers, quoted strings, booleans, null, objects, or arrays.) So you will not be able to parse it, or send it anywhere and have it interpreted correctly.
Therefore you should parse it to start with, then manipulate the resulting actual JS object:
var object = JSON.parse(txt);
object.columns.forEach(function(column) {
    column.renderer = gbpFormat;
});
If you want to replace any quoted value of the renderer property with the value it names, then you could try
column.renderer = window[column.renderer];
assuming that the value is available in the global namespace.
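A minimal sketch of the parse-then-modify approach that also runs outside a browser. Instead of window, it looks names up in an explicit registry object, which is my own substitution. The renderers object, the gbpFormat implementation, and resolveRenderers are all hypothetical.

```javascript
const txt = '{"columns":[{"text":"A","value":80},{"text":"B","renderer":"gbpFormat","value":80},{"text":"C","value":80}]}';

// Hypothetical registry of renderer functions, keyed by the names that may
// appear in the JSON. The formatting itself is only a placeholder.
const renderers = {
  gbpFormat: function (value) {
    return 'GBP ' + Number(value).toFixed(2);
  }
};

// resolveRenderers is a hypothetical helper: parse first, then swap each
// known renderer name for the function it refers to.
function resolveRenderers(jsonText, registry) {
  const config = JSON.parse(jsonText);
  config.columns.forEach(function (column) {
    const name = column.renderer;
    if (typeof name === 'string' && Object.prototype.hasOwnProperty.call(registry, name)) {
      column.renderer = registry[name];
    }
  });
  return config;
}

const resolved = resolveRenderers(txt, renderers);
console.log(typeof resolved.columns[1].renderer);
console.log(resolved.columns[1].renderer(80));
```

Columns without a renderer property are left alone, and unknown names stay as plain strings rather than becoming undefined.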
This question falls into the category of "I need a regexp, or I wrote one and it's not working, and I'm not really sure why it has to be a regexp, but I heard they can do all kinds of things, so that's just what I imagined I must need." People use regexps to try to do far too many complex matching, splitting, scanning, replacement, and validation tasks, including on complex languages such as HTML, or in this case JSON. There is almost always a better way.
The only time I can imagine wanting to manipulate JSON with regexps is if the JSON is broken somehow, perhaps due to a bug in server code, and it needs to be fixed up in order to be parseable.

JavaScript RegEx to match URLs but exclude images

I need to replace all text links in a string of HTML text by actual clickable links. Works fine with the following RegEx:
/(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gi
I then noticed it also replaces images and already formatted links. I figured I need to exclude links preceded by src=" and > ... I searched a bit and read a lot about negative lookahead in many questions answered here. I tried this (added something right after the first /):
/(^(?!src="|>)\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gi
But this doesn't match any link anymore. I tried several similar statements, without the ^, changing some brackets, etc., but nothing seems to work. I tried putting .{0} between the part I added and \b, to make sure it would only look at what is right in front of the URL and not consider anything farther away.
EDIT: The discussion was getting long, so I decided to update the answer instead.
Trusting that your original regex works, I'm just going to refer to a simplified version through the rest of this answer:
/\b(https?|ftp|file)/gi
Now, you attempted this:
/^(?!src="|>)\b(https?|ftp|file)/gi
 ^
The main error here is marked by a caret: the caret. That forces your regex to match from the beginning of the line, which is why it matched nothing. Let's remove that and move on:
/(?!src="|>)\b(https?|ftp|file)/gi
The main error, this time, is in your conception of lookahead assertions. As I explained in the comments, this assertion is redundant, because you are saying, "Match http or https or ftp or file, as long as none of these are src=" or >." It's so redundant that the sentence almost doesn't make sense to us! What you want, instead, is a lookbehind assertion:
/(?<!src="|>)\b(https?|ftp|file)/gi
   ^
Why? Because you wish to find src=" or > behind the string you potentially wish to match. The problem? JavaScript did not support lookbehind assertions when this was written (they were added in ES2018). So, I suggested an alternative. Admittedly, it was flawed (although not the cause of the HTML breaking, as you brought up). Here it is, fixed:
/(.[^>"]|[^=]")\b(https?|ftp|file)/gi
  ^^^^^^^^^^^^
This is indeed a non-intuitive regex, and warrants explanation. It splits our cases into two. Say we have a two-character set. If the set doesn't end in > or ", then we're not suspicious of it; we're good to go; match any URL that might follow. However, if it does end in > or ", well, the only "forgivable" case is where the first character is not an =. So you see, a bit of logic trickery here.
Now, as for why this might break your HTML. Be sure to use JavaScript's replace, and substitute the first captured group back into the page! If you simply substitute each match with nothingness, you end up "eating up" the two-character sets, which we only meant to investigate, not destroy.
html.replace(/(.[^>"]|[^=]")\b(https?|ftp|file)/gi,
    function(match, $1, $2, offset, original) {
        // $1 is the two-character prefix we only inspected; $2 is the URL scheme
        return $1;
    });
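On engines that implement ES2018 lookbehind (current Node.js and most current browsers), the same idea could be sketched directly, without the two-character prefix trick. This is my own illustration, not part of the answer above: linkify is a hypothetical helper, and treating any URL preceded by =" or > as "already handled" mirrors the logic of the prefix regex.

```javascript
// Negative lookbehind: skip URLs that directly follow =" (attribute values
// such as src="..." or href="...") or > (text of an existing anchor).
const urlPattern = /(?<!="|>)\b(?:https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/gi;

// linkify is a hypothetical helper that wraps each remaining URL in an anchor.
function linkify(html) {
  return html.replace(urlPattern, function (url) {
    return '<a href="' + url + '">' + url + '</a>';
  });
}

console.log(linkify('see http://example.com now'));
console.log(linkify('<img src="http://example.com/a.png">'));
console.log(linkify('<a href="http://example.com">http://example.com</a>'));
```

Because the lookbehind consumes nothing, there is no prefix to put back, and a URL at the very start of the string is matched too.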
I have to go home and haven't tested this yet, but I'd feel more comfortable starting with the easier task of isolating the HTML you don't want to touch.
Do a match to get an array of the stuff you don't want to deal with.
Rip it all out with a split.
Iterate over the split array, replace URLs, and splice the matched items back in.
Join and return.
The only assumption is that your text doesn't end on an anchor or img tag.
function zipperParse(htmlText, matcher) {
    var zipBackInArray = htmlText.match(matcher) || [], // tags to leave untouched
        workingArray = htmlText.split(matcher),         // text between those tags
        i = workingArray.length;
    while (i--) {
        // You got this one covered; assumed here to return the converted text
        workingArray[i] = buildAnchorTagIfURLPresent(workingArray[i]);
        // working backwards makes splice much easier to use here
        if (i > 0) {
            workingArray.splice(i, 0, zipBackInArray.pop());
        }
    }
    return workingArray.join('');
}
var toExclude = /<a[^>]*>[^>]*>|<img[^>]*>/g;
// is supposed to match all img tags and anchor pairs, but doesn't handle tags inside anchors yet
zipperParse(yourHtmlText, toExclude);
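A compact variation on the same zipper idea, as a sketch: when the split pattern contains a capturing group, split() keeps the separators in the result, so no separate match and splice are needed. linkifyText is a hypothetical stand-in for buildAnchorTagIfURLPresent, with a deliberately simple URL pattern, and zipperLinkify is my own name.

```javascript
// Hypothetical stand-in for buildAnchorTagIfURLPresent; the URL pattern is
// intentionally simple and only meant for this illustration.
function linkifyText(text) {
  return text.replace(/\bhttps?:\/\/[^\s<>"]+/gi, function (url) {
    return '<a href="' + url + '">' + url + '</a>';
  });
}

function zipperLinkify(html) {
  // With a capturing group, split() returns the text pieces at even indexes
  // and the excluded tags (anchor pairs, img tags) at odd indexes.
  const pieces = html.split(/(<a[^>]*>[^>]*>|<img[^>]*>)/g);
  return pieces.map(function (piece, index) {
    return index % 2 === 1 ? piece : linkifyText(piece);
  }).join('');
}

console.log(zipperLinkify('go to http://a.example <img src="http://b.example/x.png"> and <a href="http://c.example">c</a>'));
```

Like the original, this does not handle tags nested inside anchors.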
This code works for me. Just replace XXXXXXXXXXXXXXXXXXXXXX with your Google API key. I put it in the functions.php of my WordPress theme. The first thing to do is to see how the Google Maps script URL appears on your site, and then adjust the string being replaced so that it matches.
function remove_script_version( $src ) {
    $parts1 = explode( '?', $src );
    $parts2 = str_replace('//maps.googleapis.com/maps/api/js', '//maps.googleapis.com/maps/api/js?language=es&v=3.31&libraries=places&key=XXXXXXXXXXXXXXXXXXXXXX&ver=3.31', $parts1);
    return $parts2[0];
}
add_filter( 'script_loader_src', 'remove_script_version', 15, 1 );
add_filter( 'style_loader_src', 'remove_script_version', 15, 1 );
