How can I shorten this regex for JavaScript?

Basically I just want it to match anything inside (). I tried the . and * but they don't seem to work. Right now my regex looks like:
\(([\\\[\]\-\d\w\s/*\.])+\)
The strings it's going to match are URL routes like:
#!/foo/bar/([a-z])/([\d\w])/(*)
In this example, my regex above matches:
([a-z])
([\d\w])
(*)
BONUS:
How can I make it so that it only matches when it starts with a ( and ends with a )? I thought I used the ^ at the front where it's \( and the $ at the end where it's \) but no luck.
Disregard this bonus. I didn't realize it didn't matter...

Are you worried about nested parentheses? If not, you could set it up to match all characters that aren't a closing paren:
\(([^)]*)\)
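As a minimal sketch of how this pattern could be applied to the route from the question (the variable names are illustrative only, and String.prototype.matchAll assumes an ES2020-capable engine):

```javascript
// Route string taken from the question.
const route = '#!/foo/bar/([a-z])/([\\d\\w])/(*)';

// \( and \) match literal parentheses; [^)]* matches everything up to the
// next closing parenthesis. matchAll requires the g flag.
const parenPattern = /\(([^)]*)\)/g;

// Capture group 1 holds the text between each pair of parentheses.
const innerParts = Array.from(route.matchAll(parenPattern), (m) => m[1]);

console.log(innerParts);
```

Because the class excludes ), each match stops at the first closing parenthesis, so this only behaves as intended when parentheses are not nested.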

Basically I just want it to match anything inside ().
BONUS: How can I make it so that it only matches when it starts with a ( and ends with a )?
Easy peasy.
var re1 = /^\(.*\)$/
// or
var re2 = new RegExp('^\\(.*\\)$');
Edit
Re: @Mike Samuel's comments
Does not match newlines between the parentheses which were explicitly matched by \s in the original.
...
Maybe you should use [\s\S] instead of .
...
If you're going to exclude newlines you should do so intentionally or explicitly.
Note that . matches any single character except the newline character. If you also want to match newlines as part of the "anything" between parentheses, use the [\s\S] character class:
var re3 = /^\([\s\S]*\)$/
// or
var re4 = new RegExp('^\\([\\s\\S]*\\)$');
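The difference between . and [\s\S] can be checked with a short sketch like the following. The sample strings are made up for illustration, and the s (dotAll) flag shown at the end assumes an ES2018-capable engine:

```javascript
// Hypothetical inputs: one without and one with a newline between the parentheses.
const singleLine = '(abc)';
const multiLine = '(abc\ndef)';

const dotPattern = /^\(.*\)$/;       // . does not match a newline
const anyPattern = /^\([\s\S]*\)$/;  // [\s\S] matches any character, newlines included
const dotAllPattern = /^\(.*\)$/s;   // the s flag makes . match newlines too

const dotResults = {
  dotSingle: dotPattern.test(singleLine),
  dotMulti: dotPattern.test(multiLine),
  anyMulti: anyPattern.test(multiLine),
  dotAllMulti: dotAllPattern.test(multiLine),
};

console.log(dotResults);
```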

To negate a match, you use the [^...] construct. Thus, to match anything within parentheses, you would use:
\([^)]+\)
which says "match any string that starts with an open parenthesis, contains one or more characters that are not closing parentheses, and ends with a closing parenthesis."
To match entire lines that match the above construct, just wrap it with ^ and $:
^\([^)]+\)$

I'm not completely sure I understand what you're doing, but try this:
var re = /\/(\([^()]+\))(?=\/|$)/;
Matching the leading slash in addition to the opening paren ensures that the paren is indeed at the beginning. You can't do the same thing at the end because you don't know there will be a trailing slash. And if there is one, you don't want to consume it because it's also the leading slash for the next match attempt.
Instead, you use the lookahead - (?=\/|$) - to match the trailing slash without consuming it. If there is no slash, I assume no other character should be present either--hence the anchor: $.
@patorjk brought up a good point, though: can there be more parentheses between the outermost pair? If there are, the problem is much more complicated. I won't bother trying to expand my regex to deal with nested parens; some regex flavors can handle such things, but not JavaScript. Instead I'll recommend this sloppier regex:
\/(\([\s\S]+?\))(?=\/|$)
I say "sloppy" because it relies on the assumption that the sequences /( and )/ will never appear inside a valid match. As with my first regex, the text that you're interested in (i.e., everything but the leading and trailing slashes) will be captured in group #1.
Notice the non-greedy quantifier, too. With a regular greedy quantifier it will match everything from the first ( to the last ) in one shot. In other words, it'll match ([a-z])/([\d\w])/(*) instead of ([a-z]), ([\d\w]) and (*) as you wanted.
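The effect of the non-greedy quantifier can be seen in a small sketch like this one, which runs both variants over the route from the question (the variable names are illustrative, and the g flag is added here only so that matchAll can collect every match):

```javascript
const routeString = '#!/foo/bar/([a-z])/([\\d\\w])/(*)';

// Non-greedy: +? stops at the first ) that is followed by / or end of string.
const lazyPattern = /\/(\([\s\S]+?\))(?=\/|$)/g;
// Greedy: + runs to the last ) in the string.
const greedyPattern = /\/(\([\s\S]+\))(?=\/|$)/g;

const lazyGroups = Array.from(routeString.matchAll(lazyPattern), (m) => m[1]);
const greedyGroups = Array.from(routeString.matchAll(greedyPattern), (m) => m[1]);

console.log(lazyGroups);   // three separate parenthesized segments
console.log(greedyGroups); // one match spanning all three segments
```

The lookahead leaves the trailing slash unconsumed, which is why the next match can start on that same slash.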

Related

The second pattern of a regex not replacing apostrophe

I'm creating a regex that matches straight apostrophes and replaces them with a curly ones. Sometimes an apostrophe goes in the middle of two characters. Other times goes at the end of a character/word (e.g. ellipsis').
So I have two regexes that handle both situations (separated by an or statement).
However, only the first case is being replaced, not the second. In other words, this:
"Wor'd word'".replace(/(?<=\w)\'(?=\w)|(?<=\w)\'(?=\s)/, '’')
Becomes this:
"Wor’d word'"
This confuses me because both types of apostrophes are matching: https://regexr.com/4td7p
Why is this, and how to fix it?
Update: I figured the problem was that there's no space after the last apostrophe, so I changed the second part of the regex to this: (?<=\w)\'(?!\w) (don't match if there's a character after the apostrophe). But I'm getting the same result.

If you want to match (?<=\w)\' followed by a character and also match (?<=\w)\' not followed by a character, why not just drop the logic after it altogether and just use (?<=\w)'? (no need to escape 's in a regex)
You also need the global flag to replace more than one thing at a time:
console.log(
"Wor'd word'".replace(/(?<=\w)'/g, '’')
);
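To make the role of the global flag concrete, here is a small sketch comparing the two calls. The last variant is an assumption on my part for engines without lookbehind support: it captures the preceding word character and puts it back with $1.

```javascript
const sample = "Wor'd word'";

// Without g, replace stops after the first match.
const firstOnly = sample.replace(/(?<=\w)'/, '’');
// With g, every apostrophe preceded by a word character is replaced.
const allReplaced = sample.replace(/(?<=\w)'/g, '’');
// Lookbehind-free alternative: capture the word character and re-insert it.
const withoutLookbehind = sample.replace(/(\w)'/g, '$1’');

console.log(firstOnly);
console.log(allReplaced);
console.log(withoutLookbehind);
```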

updated
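var str = "Wor'd word' that's a good thing'";
// \b before the apostrophe requires a word character immediately before it,
// so apostrophes at the end of a word are replaced as well
var afterReplace = str.replace(/\b'/g, '’');
console.log(afterReplace);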

RegEx does not match my example

This RegEx could not find example string.
RegEx:
^ALTER\\sTABLE\\sADMIN_\\sADD CONSTRAINT \\s(.*)\\sPRIMARY KEY \\s(\(.*\))\\.([a-zA-Z0-9_]+)
Example:
ALTER TABLE ADMIN_ ADD CONSTRAINT PK_ADMIN_ PRIMARY KEY (RECNOADM);
I am new to regex and tried to complete my RegEx at REGEX101.COM but with no success. What am I missing?
Djorjde

^\s*ALTER\s+TABLE\s+ADMIN_\s+ADD\s+CONSTRAINT\s+(.+)\s+PRIMARY\s+KEY\s*\((.+)\)\s*;\s*$
This expression will match the SQL statement you used as an example, capturing PK_ADMIN_ in the first group and RECNOADM in the second.
My suggestion is to always use \s+ to match the spaces (\s* when they are optional, like the leading or trailing spaces), unless they have to be exactly a single space.
So let's break the regex down:
^ Marks the beginning of the line. You don't want the line to match if there's anything else before.
\s* Optional leading spaces.
ALTER\s+TABLE\s+ADMIN_\s+ADD\s+CONSTRAINT This will match ALTER TABLE ADMIN_ ADD CONSTRAINT, regardless of the spacing used.
\s+(.+)\s+ Then, the next space-bound word(s)** will be captured into the first group. You're accepting any character here! You may want to restrict that to \w+ or the like. Unless you accept an empty group, use the + closure (i.e., one or more), not the * one (i.e., zero or more).
PRIMARY\s+KEY Matches the sequence PRIMARY KEY, again, regardless of the spacing.
\s*\((.+)\) This will capture anything inside the parentheses as the PK in the second capture group.
\s* Means that the opening parenthesis can optionally be preceded by any number of spaces (they are optional in SQL, if I recall correctly).
\(...\) You have to escape the parentheses because they are characters to match, not special characters of the regex.
(.+) Here you capture (between unescaped parentheses) everything between the (escaped) parentheses into a capture group. The second one in this case.
\s*;\s* The sentence has to end with a semicolon, optionally preceded and/or succeeded by any spaces.
$ Marks the end of the line.
In case you want to accept more than one sentence in the same line, you'd remove the ^ and $ zero-width delimiters.
About the escaping, the easiest way here is to simply double every backslash in the expression you built in the editor: ^\\s*ALTER\\s+TABLE\\s+ADMIN_\\s+ADD\\s+CONSTRAINT\\s+(.+)\\s+PRIMARY\\s+KEY\\s*\\((.+)\\)\\s*;\\s*$ However, there are context and/or languages where a more complex escaping may be needed (e.g., the Linux shell)
** Note that in the \s+(.+)\s+ step, the inner expression .+ will take as many characters as possible, as long as the remaining parts also match the string. This is because the closures are greedy by default, meaning that the engine will try to match the longest string possible. That means that, for instance, this entry will match: ALTER TABLE ADMIN_ ADD CONSTRAINT PK_ADMIN_ OR *WHATEVER* YOU "WANT" TO PUT HERE! PRIMARY KEY (RECNOADM);, capturing PK_ADMIN_ OR *WHATEVER* YOU "WANT" TO PUT HERE! in the first group. Hence the importance of restricting the set of accepted characters ;)
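A short sketch of the expression in JavaScript, written with the literal syntax so no backslash doubling is needed. The strictPattern variant with (\w+) and the noisy sample statement are my own illustrations of the restriction suggested above, not part of the original expression:

```javascript
const statement =
  'ALTER TABLE ADMIN_ ADD CONSTRAINT PK_ADMIN_ PRIMARY KEY (RECNOADM);';

// The expression from the answer: (.+) accepts any characters as the name.
const loosePattern =
  /^\s*ALTER\s+TABLE\s+ADMIN_\s+ADD\s+CONSTRAINT\s+(.+)\s+PRIMARY\s+KEY\s*\((.+)\)\s*;\s*$/;
// Assumed variant: (\w+) restricts the constraint name to word characters.
const strictPattern =
  /^\s*ALTER\s+TABLE\s+ADMIN_\s+ADD\s+CONSTRAINT\s+(\w+)\s+PRIMARY\s+KEY\s*\((.+)\)\s*;\s*$/;

const sqlMatch = statement.match(loosePattern);
console.log(sqlMatch[1], sqlMatch[2]); // constraint name, key column

// A statement with extra words where the constraint name should be.
const noisy =
  'ALTER TABLE ADMIN_ ADD CONSTRAINT PK_ADMIN_ OR WHATEVER PRIMARY KEY (RECNOADM);';
const looseAcceptsNoisy = loosePattern.test(noisy);
const strictAcceptsNoisy = strictPattern.test(noisy);
console.log(looseAcceptsNoisy, strictAcceptsNoisy);
```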

Have you tried the following?
^ALTER\sTABLE\sADMIN_\sADD\sCONSTRAINT\s((.*))\sPRIMARY\sKEY\s\((.*)\);
I am wrapping two separate blocks through () in order to identify from the Regex the values inserted if you need to access them too.
In your regex there are a few issues with whitespace: literal spaces are mixed with \s, and the whitespace escape should be \s, not \\s.

In JavaScript you only need to escape backslashes that are part of escape sequences when you're composing a regexp from a string, e.g.:
var r = new RegExp('\\d');
console.log(r.test('2'));
But the additional \ is not part of the regexp and you don't need it when using the literal syntax (or regex101):
var r = /\d/;
console.log(r.test('2'));

how to negate a capture group?

Using a javascript regexp, I would like to find strings like "/foo" or "/foo d/" but not "/foo /"; ie, "annotation character", then either word with no terminating annotation, or multiple words, where the termination comes at the end of the phrase (with no space). Complicating the situation, there are three possible annotation symbols: /, \ and |.
I've tried something like:
/(?:^|\s)([\\\/|])((?:[\w_-]+(?![^\1]+[\w_-]\1))|(?:[\w\s]+[\w](?=\1)))/g
That is, start with space, then annotation, then
word not followed by (anything but annotation) then letter and annotation... or
possibly multiple words, immediately followed by annotation character.
The problem is the [^\1]: inside the square brackets this doesn't read as "anything but the annotation character".
I could repeat the whole phrase three times, one for each annotation character. Any better ideas?

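As you've mentioned, [^\1] doesn't work: inside a character class \1 is not a backreference, so the class does not mean "anything but the annotation character" (in non-Unicode mode it is read as an escape for the character with code 1). In JavaScript, you can negate \1 by using a lookahead: (?:(?!\1).)* . This is not as efficient, but it works.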
Your pattern can be written as:
([\\\/|])([\w\-]+(?:(?:(?!\1).)*[\w\-]\1)?)
Working example at Regex101
\w already contains underscore.
Instead of alternation (a|ab) I'm using an optional group (a(?:b)?) - we always match the first word, with optional further words and tags.
You may still want to include (?:^|\s) at the beginning.
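Here is a minimal sketch of the pattern in use. The helper name annotationOf and the sample strings are hypothetical, and the optional (?:^|\s) prefix is left out for brevity:

```javascript
// Group 1 is the annotation character; (?:(?!\1).)* consumes any character
// that is not that same annotation character.
const annotationPattern = /([\\\/|])([\w\-]+(?:(?:(?!\1).)*[\w\-]\1)?)/;

// Hypothetical helper: returns the text captured after the annotation character.
function annotationOf(text) {
  const m = text.match(annotationPattern);
  return m ? m[2] : null;
}

const singleWord = annotationOf('/foo');       // one word, no terminator
const phrase = annotationOf('/foo d/');        // multi-word, terminated by /
const pipePhrase = annotationOf('|bar baz|');  // same idea with | as annotation
const mixed = annotationOf('/a|b c/');         // | is allowed inside a /.../ phrase

console.log(singleWord, phrase, pipePhrase, mixed);
```

For an input like "/foo /" the optional part fails and the pattern falls back to the single word, so rejecting such input entirely would need an additional check.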

JS regular expression, basic lookahead

I cannot figure out, for the life of me, why this regular expression
^\.(?=a)$
does not match
".a"
anyone know why?
I am going off the information provided here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions

The reason it doesn't work is because the lookahead doesn't actually consume any characters, so your matching position doesn't advance.
^\.(?=a)$
Matches the beginning of line (^ -- this matches) followed by a literal . (\. -- this also matches), and then (without consuming any characters), checks to see if the next character is a literal a ((?=a)). It is, so the lookahead matches. It then asserts that your position is at the end of the string ($). This is not the case, because we're still right after the ., so the match fails.
Another possible matching expression would be
^\.(?=a$)
Which works just as above, but the assertion about the end of the line is contained in the lookahead, so this time, it matches.
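The difference can be checked directly with a sketch like this. The third pattern, which consumes the a instead of looking ahead, is added here only for comparison:

```javascript
const input = '.a';

const anchorOutside = /^\.(?=a)$/; // $ is tested right after the ., before the a
const anchorInside = /^\.(?=a$)/;  // the lookahead checks "a, then end of string"
const consuming = /^\.a$/;         // no lookahead: the a becomes part of the match

const outsideMatches = anchorOutside.test(input);
const insideMatch = input.match(anchorInside);
const consumingMatch = input.match(consuming);

console.log(outsideMatches);    // the lookahead did not advance the position
console.log(insideMatch[0]);    // only the period is part of the match
console.log(consumingMatch[0]); // the whole string
```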

Your regex is only going to match a period that's followed by an 'a', without including 'a' in the match.
Another issue is that you're using $ after a character that's basically being ignored.
Remove the $ and it will work as described.
Bonus: I've enjoyed using this lately http://www.regexpal.com/

Regexp: excluding a word but including non-standard punctuation

I want to find strings that contain words in a particular order, allowing non-standard characters in between the words but excluding a particular word or symbol.
I'm using javascript's replace function to find all instances and put into an array.
So, I want select...from, with anything except 'from' in between the words. Or I can separate select...from from select...from (, as long as I exclude nesting. I think the answer is the same for both, i.e. how do I write: find x and not y within the same regexp?
From the internet, I feel this should work: /\bselect\b^(?!from).*\bfrom\b/gi but this finds no matches.
This works to find all select...from: /\bselect\b[0-9a-zA-Z#\(\)\[\]\s\.\*,%_+-]*?\bfrom\b/gi but modifying it to exclude the parenthesis "(" at the end prevents any matches: /\bselect\b[0-9a-zA-Z#\(\)\[\]\s\.\*,%_+-]*?\bfrom\b\s*^\(/gi
Can anyone tell me how to exclude words and symbols within this regexp?
Many thanks
Emma
Edit: partial string input:
left outer join [stage].[db].[table14] o on p.Project_id = o.project_id
left outer join
(
select
different_id
,sum(costs) - ( sum(brushes) + sum(carpets) + sum(fabric) + sum(other) + sum(chairs)+ sum(apples) ) as overallNumber
from
(
select ace from [stage].db.[table18] J
Javascript:
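var sequel = stringInputAsAbove;
var selects = [];
// With no capture groups, the callback's second argument is the match offset.
var tst = sequel.replace(/\bselect\b[\s\S]*?\bfrom\b/gi, function(a, b) { console.log('match: ' + a); selects.push(b); return a; });
console.log(selects);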
console.log(selects) should print an array of numbers, where each number is the starting character of a select...from. This works for the second regexp I gave in my info, printing: [95, 251]. Your \s\S variation does the same, @stribizhev.
The first example ^(?!from).* should do likewise but returns [].
The third example \s*^\( should return 251 only but returns []. However I have just noticed that the positive expression \s*\( does give 95, so some progress! It's the negatives I'm getting wrong.

Your \bselect\b^(?!from).*\bfrom\b regex doesn't work as expected because:
^ here means the beginning of a line, not negation of the next part, so \bselect\b^ means the word select followed by the beginning of a line. After removing ^ the regex starts to match something, but the result is still not correct.
In multiline text, .* without modification will not match a newline, so the regex will only match select...from within a single line. If you change it to (.|\n)* (as a simple example) it will match across lines, but the result is still not correct.
* is a greedy quantifier, so it will match as much as possible. If you use the reluctant quantifier *?, the regex will match up to the first occurrence of the word from, and it will start to return relatively correct results.
\bselect\b(?!from) means: match the standalone word select when it is not directly followed by from. But from could only directly follow select in something like selectfrom, and \b can never match between the t and the f there (select\bfrom), so (?!from) never rejects anything and is redundant.
In effect you will get a regex very similar to what Stribizhev gave you: \bselect\b(.|\n)*?\bfrom\b
In the third expression you make the same mistake: \bselect\b[0-9a-zA-Z#\(\)\[\]\s\.\*,%_+-]*?\bfrom\b\s*^\( uses ^ as (I assume) a negation, not the beginning of a line. Remove ^ and you will again get a relatively valid result (a match from select through from to the opening parenthesis ( ).
Your second regex works similarly to \bselect\b(.|\n)*?\bfrom\b or \bselect\b[\s\S]*?\bfrom\b.
I wrote "relatively valid result" because I also think that parsing SQL with regex can be very complicated, so I am not sure it will work in every case.
You can also try to use positive lookahead to match just position in text, like:
(?=\bselect\b(?:.|\n)*?\bfrom\b)
In the demo, () was added to the regex only so that the starting index of each match is returned in the groups, which makes it easier to check its validity.
Negation in regex
We use ^ as negation in a character class; for example, [^a-z] means match anything but a lowercase letter, so it will match a digit, symbol, whitespace, etc., but not a letter in the range a to z. But this negation works at the level of a single character. If you use [^from] it will prevent the regex from matching the characters f, r, o and m. Also, [^from]{4} will avoid matching from, but also form, morf, etc.
To exclude a whole word from being matched, you need a negative lookahead like (?!from), which fails if the chosen word from follows the given position. To avoid matching a whole line containing from you could use ^(?!.*from.*).+$.
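The difference between character-level negation and a negative lookahead can be illustrated with a few quick checks (the sample strings here are made up for the purpose):

```javascript
// [^from] excludes the individual characters f, r, o and m, not the word "from".
const fourNonFrom = /[^from]{4}/;
const classChecks = {
  from: fourNonFrom.test('from'), // every character is excluded
  form: fourNonFrom.test('form'), // same letters, so also rejected
  data: fourNonFrom.test('data'), // none of f, r, o, m, so accepted
};

// ^(?!.*from.*).+$ rejects a whole line if "from" appears anywhere in it.
const lineWithoutFrom = /^(?!.*from.*).+$/;
const lineChecks = {
  withoutWord: lineWithoutFrom.test('select a'),
  withWord: lineWithoutFrom.test('select a from t'),
};

console.log(classChecks, lineChecks);
```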
However, in your case you don't need this construction, because if you replace the greedy .*\bfrom with .*?\bfrom it will match up to the first occurrence of this word. What's more, it would cause problems. A regex that places (?![\s\S]*from[\s\S]*) after select will not match anything, because the lookahead is not restricted by anything: it only succeeds if there is no from anywhere after select, but we also want to match from! In effect such a regex tries to match and exclude from at once, and fails. So the (?!.*word.*) construction works much better for excluding a whole line that contains a given word.
So what to do if we don't want to match a word inside a fragment of a match? I think select\b([^f]|f(?!rom))*?\bfrom\b is a good solution. With ([^f]|f(?!rom))*? it will match everything between select and from but can never run past an intervening from; the closing from itself is still matched.
But if you would like to match only a select...from that is not followed by (, then it is a good idea to use something like (?!\(). But in your regex (multiline, using (.|\n)*? or [\s\S]*?) it will cause the match to extend up to the next select...from part, because the reluctant quantifier will change the place where it stops in order to make the whole regex match. In my opinion, a good solution would be to use again:
select\b([^f]|f(?!rom))*?\bfrom\b(?!\s*?\()
which will not overlap an additional select...from and will not match if there is a ( after select...from.
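A rough sketch comparing the two approaches on a shortened, made-up SQL string (not the input from the question). The group is written as non-capturing here, which does not change what is matched:

```javascript
// Hypothetical, simplified SQL with a nested subquery.
const sqlText = 'select a from t join (select b from (select c from u) x) y';

// Reluctant [\s\S]*? plus the lookahead: when "from" is followed by "(",
// the quantifier simply extends to a later "from".
const naivePattern = /\bselect\b[\s\S]*?\bfrom\b(?!\s*\()/gi;
// (?:[^f]|f(?!rom))*? cannot run past a "from", so a rejected select is skipped.
const temperedPattern = /select\b(?:[^f]|f(?!rom))*?\bfrom\b(?!\s*?\()/gi;

const naiveMatches = Array.from(sqlText.matchAll(naivePattern), (m) => m[0]);
const temperedMatches = Array.from(sqlText.matchAll(temperedPattern), (m) => m[0]);
// m.index gives the starting offset of each match, as in the question.
const temperedOffsets = Array.from(sqlText.matchAll(temperedPattern), (m) => m.index);

console.log(naiveMatches);
console.log(temperedMatches, temperedOffsets);
```

As noted above, this is still only a heuristic: SQL with comments, string literals, or identifiers that begin with "from" could defeat it.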
