JS RegEx: simple match with optional parts - javascript

I think I don't really get RegEx stuff, so I need help matching the following simple pattern:
SOME_TEXT _Syn: SYN_TEXT _Ant: ANT_TEXT
quotes are decorative, X_TEXT is any text (that does not contain _Syn: or _Ant: that are special abbreviation), _Syn or _Ant parts are optional
I need to get SOME_TEXT, SYN_TEXT and ANT_TEXT in array
So for example if _Syn part not present (input is SOME_TEXT _Ant: ANT_TEXT) result should be [SOME_TEXT, '', ANT_TEXT]
Tried different approaches with lazy modifiers but fails to implement it.

/(.*?)(?:_Syn:(.*?))?(?:_Ant:(.*?))?$/
The important parts are the ? after the .* which make them reluctant (not greedy) and the $ at the end that forces the match in spite of all of the optional matches.

Use this regex
var n=str.match(/(SOME|SYN|ANT)_TEXT/g);
n would contain an array of matched strings

Related

Validate jquery selector syntax using regex

I am trying to write a regex expression in javascript to validate whether a string is a valid jquery selector. This is strictly educational and not a particular requirement in any project of mine
Pattern
/^(\$|Jquery)\(('|")[\.|#]?[a-zA-Z][a-zA-Z0-9!]*('|")\)$/gi
It works fine for below tests
$("#id")//true
$('.class')//true
jquery('.class')//true
jquery('div')//true
My problem is that the test on $('#id") also returns true i.e, using mixing single and double quote in js in invalid. How to restrict this. Can we have conditional regex?
const pattern = /^(\$|Jquery)\(('|")[\.|#]?[a-zA-Z][a-zA-Z0-9!]*('|")\)$/gi;
[
`$("#id")`, //true
`$('.class')`, //true
`jquery('.class')`, //true
`jquery('div')`, //true
].forEach(str => console.log(pattern.test(str)));
You can capture the first quote or doublequote in a group, and require that same group (the same quote or doublequote) at the end, using a backreference:
const re = /^(?:\$|Jquery)\((['"])[\.#]?[a-zA-Z][a-zA-Z0-9!]*\1\)$/gi;
console.log(re.test(`$("#id")`))
console.log(re.test(`$('#id")`))
console.log(re.test(`$("#id')`))
console.log(re.test(`$('#id')`))
There are also a couple other things to fix:
/^\$|Jquery...
meant that any string starting with $ would fulfill the regex. Enclose it in a group instead.
Single quote ' doesn't need escaping - best to remove the backslash.
Rather than
[\.|#]?
if you want to possibly match . or # (and not a pipe), use [\.#]? instead

Why would the replace with regex not work even though the regex does?

There may be a very simple answer to this, probably because of my familiarity (or possibly lack thereof) of the replace method and how it works with regex.
Let's say I have the following string: abcdefHellowxyz
I just want to strip the first six characters and the last four, to return Hello, using regex... Yes, I know there may be other ways, but I'm trying to explore the boundaries of what these methods are capable of doing...
Anyway, I've tinkered on http://regex101.com and got the following Regex worked out:
/^(.{6}).+(.{4})$/
Which seems to pass the string well and shows that abcdef is captured as group 1, and wxyz captured as group 2. But when I try to run the following:
"abcdefHellowxyz".replace(/^(.{6}).+(.{4})$/,"")
to replace those captured groups with "" I receive an empty string as my final output... Am I doing something wrong with this syntax? And if so, how does one correct it, keeping my original stance on wanting to use Regex in this manner...
Thanks so much everyone in advance...
The code below works well as you wish
"abcdefHellowxyz".replace(/^.{6}(.+).{4}$/,"$1")
I think that only use ()to capture the text you want, and in the second parameter of replace(), you can use $1 $2 ... to represent the group1 group2.
Also you can pass a function to the second parameter of replace,and transform the captured text to whatever you want in this function.
For more detail, as #Akxe recommend , you can find document on https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace.
You are replacing any substring that matches /^(.{6}).+(.{4})$/, with this line of code:
"abcdefHellowxyz".replace(/^(.{6}).+(.{4})$/,"")
The regex matches the whole string "abcdefHellowxyz"; thus, the whole string is replaced. Instead, if you are strictly stripping by the lengths of the extraneous substrings, you could simply use substring or substr.
Edit
The answer you're probably looking for is capturing the middle token, instead of the outer ones:
var str = "abcdefHellowxyz";
var matches = str.match(/^.{6}(.+).{4}$/);
str = matches[1]; // index 0 is entire match
console.log(str);

RegEx match only final domain name from any email address

I want to match only parent domain name from an email address, which might or might not have a subdomain.
So far I have tried this:
new RegExp(/.+#(:?.+\..+)/);
The results:
Input: abc#subdomain.maindomain.com
Output: ["abc#subdomain.domain.com", "subdomain.maindomain.com"]
Input: abc#maindomain.com
Output: ["abc#maindomain.com", "maindomain.com"]
I am interested in the second match (the group).
My objective is that in both cases, I want the group to match and give me only maindomain.com
Note: before the down vote, please note that neither have I been able to use existing answers, nor the question matches existing ones.
One simple regex you can use to get only the last 2 parts of the domain name is
/[^.]+\.[^.]$/
It matches a sequence of non-period characters, followed by period and another sequence of non-periods, all at the end of the string. This regex doesn't ensure that this domain name happens after a "#". If you want to make a regex that also does that, you could use lazy matching with "*?":
/#.*?([^.]+\.[^.])$/
However,I think that trying to do everything at once tends to make the make regexes more complicated and hard to read. In this problem I would prefer to do things in two steps: First check that the email has an "#" in it. Then you get the part after the "#" and pass it to the simple regex, which will extract the domain name.
One advantage of separating things is that some changes are easier. For example, if you want to make sure that your email only has a single "#" in it its very easy to do in a separate step but would be tricky to achieve in the "do everything" regex.
You can use this regex:
/#(?:[^.\s]+\.)*([^.\s]+\.[^.\s]+)$/gm
Use captured group #1 for your result.
It matches # followed by 0 or more instance of non-DOT text and a DOT i.e. (?:[^.\s]+\.)*.
Using ([^.\s]+\.[^.\s]+)$ it is matching and capturing last 2 components separated by a DOT.
RegEx Demo
With the following maindomain should always return the maindomain.com bit of the string.
var pattern = new RegExp(/(?:[\.#])(\w[\w-]*\w\.\w*)$/);
var str = "abc#subdomain.maindomain.com";
var maindomain = str.match(pattern)[1];
http://codepen.io/anon/pen/RRvWkr
EDIT: tweaked to disallow domains starting with a hyphen i.e - '-yahoo.com'

Non-capturing groups in Javascript regex

I am matching a string in Javascript against the following regex:
(?:new\s)(.*)(?:[:])
The string I use the function on is "new Tag:var;"
What it suppod to return is only "Tag" but instead it returns an array containing "new Tag:" and the desired result as well.
I found out that I might need to use a lookbehind instead but since it is not supported in Javascript I am a bit lost.
Thank you in advance!
Well, I don't really get why you make such a complicated regexp for what you want to extract:
(?:new\\s)(.*)(?:[:])
whereas it can be solved using the following:
s = "new Tag:";
var out = s.replace(/new\s([^:]*):.*;/, "$1")
where you got only one capturing group which is the one you're looking for.
\\s (double escaping) is only needed for creating RegExp instance.
Also your regex is using greedy pattern in .* which may be matching more than desired.
Make it non-greedy:
(?:new\s)(.*?)(?:[:])
OR better use negation:
(?:new\s)([^:]*)(?:[:])

Breaking a String into Chunks based on Pattern

I have one string, that looks like this:
a[abcdefghi,2,3,jklmnopqr]
The beginning "a" is fixed and non-changing, however the content within the brackets is and can follow a pattern. It will always be an alphabetical string, possibly followed by numbers separate by commas or more strings and/or numbers.
I'd like to be able to break it into chunks of the string and any numbers that follow it until the "]" or another string is met.
Probably best explained through examples and expected ideal results:
a[abcdefghi] -> "abcdefghi"
a[abcdefghi,2] -> "abcdefghi,2"
a[abcdefghi,2,3,jklmnopqr] -> "abcdefghi,2,3" and "jklmnopqr"
a[abcdefghi,2,3,jklmnopqr,stuvwxyz] -> "abcdefghi,2,3" and "jklmnopqr" and "stuvwxyz"
a[abcdefghi,2,3,jklmnopqr,1,9,stuvwxyz] -> "abcdefghi,2,3" and "jklmnopqr,1,9" and "stuvwxyz"
a[abcdefghi,1,jklmnopqr,2,stuvwxyz,3,4] -> "abcdefghi,1" and "jklmnopqr,2" and "stuvwxyz,3,4"
Ideally a malformed string would be partially caught (but this is a nice extra):
a[2,3,jklmnopqr,1,9,stuvwxyz] -> "jklmnopqr,1,9" and "stuvwxyz"
I'm using Javascript and I realize a regex won't bring me all the way to the solution I'd like but it could be a big help. The alternative is to do a lot of manually string parsing which I can do but doesn't seem like the best answer.
Advice, tips appreciated.
UPDATE: Yes I did mean alphametcial (A-Za-z) instead of alphanumeric. Edited to reflect that. Thanks for letting me know.
You'd probably want to do this in 2 steps. First, match against:
a\[([^[\]]*)\]
and extract group 1. That'll be the stuff in the square brackets.
Next, repeatedly match against:
[a-z]+(,[0-9]+)*
That'll match things like "abcdefghi,2,3". After the first match you'll need to see if the next character is a comma and if so skip over it. (BTW: if you really meant alphanumeric rather than alphabetic like your examples, use [a-z0-9]*[a-z][a-z0-9]* instead of [a-z]+.)
Alternatively, split the string on commas and reassemble into your word with number groups.
Why wouldn't a regex bring you all the way to a solution?
The following regex works against the given data, but it makes a few assumptions (at least two alphas followed by comma separated single digits).
([a-z]{2,}(?:,\\d)*)
Example:
re = new RegExp('[a-z]{2,}(?:,\\d)*', 'g')
matches = re.exec("a[abcdefghi,2,3,jklmnopqr,1,9,stuvwxyz]")
Assuming you can easily break out the string between the brackets, something like this might be what you're after:
> re = new RegExp('[a-z]+(?:,\\d)*(?:,?)', 'gi')
> while (match = re.exec("abcdefghi,2,3,jklmnopqr,1,9,stuvwxyz")) { print(match[0]) }
abcdefghi,2,3,
jklmnopqr,1,9,
stuvwxyz
This has the advantage of working partially in your malformed case:
> while (match = re.exec("abcdefghi,2,3,jklmnopqr,1,9,stuvwxyz")) { print(match[0]) }
jklmnopqr,1,9,
stuvwxy
The first character class [a-z] can be modified if you meant for it to be truly alphanumeric.

Categories

Resources