JavaScript Regex vs Java Regex

I have a regex in JavaScript that works great: /:([\w]+):/g
I am converting my JavaScript app to Java, and I know to escape the \ with another \, i.e. /:([\\w]+):/g, yet my tests still return no match for the string "hello :testsmilie: how are you?"
Pattern smiliePattern = Pattern.compile("/:([\\w]+):/g");
Matcher m = smiliePattern.matcher(message);
if(m.find()) {
System.out.println(m.group(0));
}
In JavaScript it returns ":testsmilie:" just fine, so I'm not sure what the difference is. Any help would be much appreciated!

Your regex in Java can simply be:
Pattern.compile(":[^:]+:")
This matches a colon, followed by one or more characters that are not a colon, followed by a colon.
Or, if you want to use \w, you can use:
Pattern.compile(":\\w+:")
Note that you don't need the capturing group (), so to get the result you can just use:
System.out.println(m.group());
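The two patterns are not interchangeable on every input. As a minimal sketch in JavaScript (the second sample string is made up for illustration and is not from the question), the negated class also accepts spaces and digits between the colons, while \w only accepts word characters:

```javascript
// The message from the question, plus a made-up string containing a clock time.
const message = "hello :testsmilie: how are you?";
const tricky = "time: 10:30 and :smile:";

const negated = /:[^:]+:/; // colon, one or more non-colon characters, colon
const wordOnly = /:\w+:/;  // colon, one or more word characters, colon

const onMessage = [message.match(negated)[0], message.match(wordOnly)[0]];
const onTricky = [tricky.match(negated)[0], tricky.match(wordOnly)[0]];

console.log(onMessage); // [ ':testsmilie:', ':testsmilie:' ]
console.log(onTricky);  // [ ': 10:', ':smile:' ]
```

For smiley-style tokens the \w version is probably the safer choice, since it cannot span whitespace.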

You should learn how a JavaScript regex literal is built: the / characters are delimiters around the actual pattern, and g is the global modifier.
In Java the equivalent pattern is :([\\w]+): with no delimiters, and there is no need for a global flag, since you just call .find() repeatedly to get all the matches.
You should also take a look at regex101, which is a good website for testing regexes.
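The same mistake can be reproduced in JavaScript itself, which may make the role of the delimiters easier to see. This is only a sketch: the variable names are illustrative, and the RegExp constructor is used here because, like Pattern.compile, it takes the pattern as a plain string.

```javascript
// The message from the question.
const message = "hello :testsmilie: how are you?";

// Regex literal: the slashes delimit the pattern and g is a flag.
const literal = /:(\w+):/g;

// Constructor form: the string is the pattern itself, flags are passed separately.
const constructed = new RegExp(":(\\w+):", "g");

// Analogue of the Java mistake: delimiters and flag left inside the pattern,
// so the regex looks for a literal "/" before the first colon.
const wrong = new RegExp("/:(\\w+):/g");

const fromLiteral = [...message.matchAll(literal)].map(m => m[0]);
const fromConstructed = [...message.matchAll(constructed)].map(m => m[0]);
const fromWrong = message.match(wrong);

console.log(fromLiteral);     // [ ':testsmilie:' ]
console.log(fromConstructed); // [ ':testsmilie:' ]
console.log(fromWrong);       // null
```

matchAll plays roughly the role that repeated .find() calls play in Java.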

Related

Validate jquery selector syntax using regex

I am trying to write a regex expression in javascript to validate whether a string is a valid jquery selector. This is strictly educational and not a particular requirement in any project of mine
Pattern
/^(\$|Jquery)\(('|")[\.|#]?[a-zA-Z][a-zA-Z0-9!]*('|")\)$/gi
It works fine for the tests below:
$("#id")//true
$('.class')//true
jquery('.class')//true
jquery('div')//true
My problem is that the test on $('#id") also returns true, i.e. it accepts mixed single and double quotes, which is invalid in JS. How can I restrict this? Can we have a conditional regex?
const pattern = /^(\$|Jquery)\(('|")[\.|#]?[a-zA-Z][a-zA-Z0-9!]*('|")\)$/gi;
[
`$("#id")`, //true
`$('.class')`, //true
`jquery('.class')`, //true
`jquery('div')`, //true
].forEach(str => console.log(pattern.test(str)));
You can capture the first quote or doublequote in a group, and require that same group (the same quote or doublequote) at the end, using a backreference:
const re = /^(?:\$|Jquery)\((['"])[\.#]?[a-zA-Z][a-zA-Z0-9!]*\1\)$/gi;
console.log(re.test(`$("#id")`))
console.log(re.test(`$('#id")`))
console.log(re.test(`$("#id')`))
console.log(re.test(`$('#id')`))
There are also a couple other things to fix:
/^\$|Jquery...
meant that any string starting with $ would fulfill the regex. Enclose it in a group instead.
Single quote ' doesn't need escaping - best to remove the backslash.
Rather than
[\.|#]?
if you want to possibly match . or # (and not a pipe), use [\.#]? instead
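One more point worth checking when running these tests: a regex with the g flag remembers lastIndex between test() calls, so reusing one global regex on several strings can give alternating results. The sketch below drops the g flag for that reason; isJqueryCall is a hypothetical helper name, not something from the question.

```javascript
// Backreference \1 must repeat whichever quote was captured by (['"]).
// No g flag: test() on a global regex carries lastIndex over to the next call.
const selectorCall = /^(?:\$|Jquery)\((['"])[.#]?[a-zA-Z][a-zA-Z0-9!]*\1\)$/i;

function isJqueryCall(text) {
  return selectorCall.test(text);
}

const results = [`$("#id")`, `$('.class')`, `jquery('div')`, `$('#id")`, `$("#id')`]
  .map(isJqueryCall);
console.log(results); // [ true, true, true, false, false ]

// The lastIndex effect in isolation: the second call starts searching at index 1.
const globalRe = /^a$/g;
const repeated = [globalRe.test("a"), globalRe.test("a")];
console.log(repeated); // [ true, false ]
```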

Regex: get string between last character occurrence before a comma

I need some help with Regex.
I have this string: \\lorem\ipsum\dolor,\\sit\amet\conseteteur,\\sadipscing\elitr\sed\diam
and want to get the result: ["dolor", "conseteteur", "diam"]. In words: the text between the last backslash and a comma or the end of the string.
I've already figured out a working test, but for some reason it works in neither the Chrome (v44.0.2403.130) nor the IE (v11.0.9600.17905) console. There I'm getting the result: ["\loremipsumdolor,", "\sitametconseteteur,", "\sadipscingelitrseddiam"]
Can you please tell me why the consoles behave differently from the online testers, and how I can achieve the right result?
Thanks in advance.
PS: I've tested a few online regex testers with all the same result. (regex101.com, regexpal.com, debuggex.com, scriptular.com)
The string
'\\lorem\ipsum\dolor,\\sit\amet\conseteteur,\\sadipscing\elitr\sed\diam'
has its backslash escape sequences processed by the interpreter. If you try the following in the browser's console you'll see what happens:
var s = '\\lorem\ipsum\dolor,\\sit\amet\conseteteur,\\sadipscing\elitr\sed\diam'
console.log(s);
// prints '\loremipsumdolor,\sitametconseteteur,\sadipscingelitrseddiam'
To use your original string you have to add additional backslashes; otherwise it becomes a different string, because the interpreter treats any character preceded by a single backslash as an escape sequence.
It works in regex testers probably because they take the input as-is rather than parsing it as a string literal.
Try this (with an extra \ added for each backslash):
str = '\\\\lorem\\ipsum\\dolor,\\\\sit\\amet\\conseteteur,\\\\sadipscing\\elitr\\sed\\diam'
re = /\\([^\\]*)(?:,|$)/g
str.match(re)
// should output ["\dolor,", "\conseteteur,", "\diam"]
UPDATE
You can't prevent the interpreter from processing backslash escapes in ordinary string literals, but ECMAScript 6 adds this functionality as String.raw:
s = String.raw`\\lorem\ipsum\dolor,\\sit\amet\conseteteur,\\sadipscing\elitr\sed\diam`
Remember to use backticks instead of single quotes with String.raw.
It works in the latest Chrome, but I can't speak for other browsers; moderately old ones probably don't implement it.
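The escaping behaviour described above can be checked on a shortened version of the string. This is a minimal sketch; the variable names are illustrative only.

```javascript
// '\i' is not a recognised escape, so the backslash is dropped and only "i" remains.
const single = '\\lorem\ipsum';
// Writing each intended backslash twice keeps them all.
const doubled = '\\\\lorem\\ipsum';
// String.raw leaves the backslashes in the template untouched.
const raw = String.raw`\\lorem\ipsum`;

console.log(single);          // \loremipsum
console.log(doubled);         // \\lorem\ipsum
console.log(doubled === raw); // true
```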
Also, if you want to avoid matching the last backslash you need to:
remove the \\ at the start of your regexp
use + instead of * so the pattern cannot match an empty string at the line end (which would produce an extra match)
use a positive lookahead (?=...) instead of the non-capturing group
like this:
s = String.raw`\\lorem\ipsum\dolor,\\sit\amet\conseteteur,\\sadipscing\elitr\sed\diam`;
re = /([^\\]+)(?=,|$)/g;
s.match(re);
// ["dolor", "conseteteur", "diam"]
You may try this,
string.match(/[^\\,]+(?=,|$)/gm);
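If a regex is not a hard requirement, the same result can be reached with plain string methods. This swaps in a different technique from the answers above and is only a sketch; input and lastSegments are illustrative names.

```javascript
// String.raw keeps every backslash exactly as typed.
const input = String.raw`\\lorem\ipsum\dolor,\\sit\amet\conseteteur,\\sadipscing\elitr\sed\diam`;

// Split on commas, then keep whatever follows the last backslash of each part.
const lastSegments = input
  .split(",")
  .map(part => part.slice(part.lastIndexOf("\\") + 1));

console.log(lastSegments); // [ 'dolor', 'conseteteur', 'diam' ]
```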

Non-capturing groups in Javascript regex

I am matching a string in Javascript against the following regex:
(?:new\s)(.*)(?:[:])
The string I use the function on is "new Tag:var;"
What it is supposed to return is only "Tag", but instead it returns an array containing "new Tag:" as well as the desired result.
I found out that I might need to use a lookbehind instead, but since it is not supported in JavaScript I am a bit lost.
Thank you in advance!
Well, I don't really get why you'd use such a complicated regexp for what you want to extract:
(?:new\\s)(.*)(?:[:])
whereas it can be solved using the following:
s = "new Tag:var;";
var out = s.replace(/new\s([^:]*):.*;/, "$1")
where you have only one capturing group, which is the one you're looking for.
\\s (double escaping) is only needed when creating a RegExp instance from a string.
Also, your regex uses the greedy pattern .*, which may match more than desired.
Make it non-greedy:
(?:new\s)(.*?)(?:[:])
Or, better, use a negated character class:
(?:new\s)([^:]*)(?:[:])
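Whichever pattern is used, match() always puts the full match at index 0, so the usual way to get only the group is to read index 1. A minimal sketch follows; the second input string is made up to show the greedy versus lazy difference.

```javascript
const input = "new Tag:var;";

// match() without the g flag returns [fullMatch, group1, ...], or null.
const result = input.match(/new\s([^:]*):/);
const tag = result ? result[1] : null;
console.log(result[0]); // new Tag:
console.log(tag);       // Tag

// Made-up input with a second colon to show greedy versus lazy matching.
const twoColons = "new Tag:var:x;";
const greedy = twoColons.match(/new\s(.*):/)[1];
const lazy = twoColons.match(/new\s(.*?):/)[1];
console.log(greedy); // Tag:var
console.log(lazy);   // Tag
```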

Difference between Ruby regex and JavaScript regex

I made this regular expression: /.net.(\w*)/
I'm trying to capture the qa in a string like this:
https://xxxxxx.cloudfront.net/qa/club/Slide1.PNG
I'm doing .replace on it like so: location.replace(/.net.(\w*)/, data.newName);
But instead of capturing qa, it captures .net when I run the code in JavaScript.
According to this online regex tool made for Ruby, it captures qa as intended:
http://rubular.com/r/ItrG7BRNRn
What's the difference between Javascript regexes and Ruby regexes, and how can I make my regex work as intended in javascript?
Edit:
I changed my code to this:
var str = `https://xxxxxxxxxx.cloudfront.net/qa/club`;
var re = /\.net\/([^\/]*)\//;
console.log(data2.files[i].location.replace(re,'$1'+ "test"));
And instead of
https://dm7svtk8jb00c.cloudfront.net/test/club
I get this:
https://dm7svtk8jb00c.cloudfrontqatestclub
If I remove the $1 I get https://dm7svtk8jb00c.cloudfronttestclub, which is closer, but I want to keep the slashes.
This would be a better regex:
/\.net\/([^\/]*)\//
Remember that . will match any character, not just the period character. To match only a period you need to escape it with a leading backslash: \.
Also, \w will only match numbers, letters and underscores. You could quite legitimately have a dash in that part of the URL. Therefore you're far better off matching anything that isn't a forward slash.
I am not sure how Ruby works, but JavaScript replace will not just replace the capture group, it replaces the whole matched string. By adding another capture group, you can use $1 to add back in the string you want to keep.
...replace(/(.net.)(\w*)/, "$1" + data.newName);
You have to do it like this:
location.replace(/(\.net.)(\w*)/, '$1' + data.newName)
replace replaces the whole matched substring, not a particular group. Ruby works in exactly the same way:
ruby -e "puts 'https://xxxxxx.cloudfront.net/qa/club/Slide1.PNG'.sub(/.net.(\w*)/, '##')"
https://xxxxxx.cloudfront##/club/Slide1.PNG
ruby -e "puts 'https://xxxxxx.cloudfront.net/qa/club/Slide1.PNG'.sub(/(.net.)(\w*)/, '\\1' + '##')"
https://xxxxxx.cloudfront.net/##/club/Slide1.PNG
There's no difference (at least with the pattern you've provided). In both cases, the expression matches ".net/qa", with qa being the first capture group within the expression. Notice that even in your linked example the entire match is highlighted.
I'd recommend something like this:
location.replace(/(.net.)\w*/, "$1" + data.newName);
Or this, to be a bit safer:
location.replace(/(.net.)\w*/, function(m, a) { return a + data.newName; });
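The difference between the variants can be seen side by side. This is a sketch under a few assumptions: newName stands in for data.newName, its value "test" is made up, and the pattern is tightened to \.net\/ and [^\/]+ rather than the looser . and \w*.

```javascript
const url = "https://xxxxxx.cloudfront.net/qa/club/Slide1.PNG";
const newName = "test"; // hypothetical stand-in for data.newName

// The whole match ".net/qa" is replaced, so the ".net/" prefix is lost.
const lost = url.replace(/\.net\/([^\/]+)/, newName);

// Capture the prefix and put it back with $1.
const kept = url.replace(/(\.net\/)[^\/]+/, "$1" + newName);

// Function form: the return value is inserted literally, so "$" sequences
// inside newName are not interpreted as replacement patterns.
const keptFn = url.replace(/(\.net\/)[^\/]+/, (match, prefix) => prefix + newName);

console.log(lost);   // https://xxxxxx.cloudfronttest/club/Slide1.PNG
console.log(kept);   // https://xxxxxx.cloudfront.net/test/club/Slide1.PNG
console.log(keptFn); // https://xxxxxx.cloudfront.net/test/club/Slide1.PNG
```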
It's not so much a difference between JavaScript's and Ruby's implementations of regular expressions; it's your pattern that needs a bit of work. It's not tight enough.
You can use something like /\.net\/([^\/]+)/, which you can see in action at Rubular.
That returns the characters delimited by / following .net.
Regex patterns are very powerful, but they're also fraught with dangerous side effects that easily open up big holes and cause false positives, which can ruin results unexpectedly. Until you know them well, start simply and test them every imaginable way. And once you think you know them well, keep doing that. Patterns in the code we write where I work are a particular hot button for me: I'm always finding holes in them in our code reviews and requiring them to be tightened until they do exactly what the developer meant, not what they thought they meant.
While the pattern above works, I'd probably do it a bit differently in Ruby. Using the tools made for the job:
require 'uri'
URL = 'https://xxxxxx.cloudfront.net/qa/club/Slide1.PNG'
uri = URI.parse(URL)
path = uri.path # => "/qa/club/Slide1.PNG"
path.split('/')[1] # => "qa"
Or, more succinctly:
URI.parse(URL).path.split('/')[1] # => "qa"
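A rough JavaScript analogue of the Ruby snippet, assuming a runtime that provides the standard URL class (current browsers and Node.js do); address and firstSegment are illustrative names:

```javascript
const address = "https://xxxxxx.cloudfront.net/qa/club/Slide1.PNG";

// Let the URL parser isolate the path, then split it on slashes.
const { pathname } = new URL(address);
const firstSegment = pathname.split("/")[1];

console.log(pathname);     // /qa/club/Slide1.PNG
console.log(firstSegment); // qa
```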

JavaScript Regex Replacement with Reference

I'm struggling a little bit with a JavaScript regex statement - and I can't quite see what's wrong. I've tested in online tools and they suggest it should work so I'm assuming there's something different between C# regex that I'm used to and JavaScript.
The string I'm working with is quite simple:
[a] + [b]
The regex match I'm trying to use is:
/[(?<name>[a-zA-Z0-9])/]
I'm trying to replace the value with the following:
viewModel.$1.control.value()
Which should leave me with:
viewModel.a.control.value() + viewModel.b.control.value()
Unfortunately I'm always getting my initial value printed, suggesting my matching isn't working, but I can't see why. The only obvious thing I tried was switching the escaping of the square brackets between forward slash and backslash.
Can anyone suggest what else might be wrong?
Older JavaScript regex engines have no named groups (they arrived with ES2018). A numbered group works everywhere:
var s = '[a] + [b]';
repl = s.replace(/\[([a-zA-Z0-9])\]/g, 'viewModel.$1.control.value()');
//=> "viewModel.a.control.value() + viewModel.b.control.value()"
Also you need to escape [ and ] in order to match them in a regex.
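For comparison, here is a sketch of both forms, assuming an engine that implements ES2018 named capture groups (an assumption about your runtime). In the replacement string a named group is referenced as $<name>.

```javascript
const expression = "[a] + [b]";

// Numbered group: works in every JavaScript engine.
const numbered = expression.replace(
  /\[([a-zA-Z0-9])\]/g,
  "viewModel.$1.control.value()"
);

// Named group: requires ES2018 support.
const named = expression.replace(
  /\[(?<name>[a-zA-Z0-9])\]/g,
  "viewModel.$<name>.control.value()"
);

console.log(numbered); // viewModel.a.control.value() + viewModel.b.control.value()
console.log(named);    // viewModel.a.control.value() + viewModel.b.control.value()
```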
