Regex finding second string - javascript

I'm attempting to get the last word in the following strings.
After about 45 minutes I can't seem to find the right combination of slashes, dashes and brackets.
The closest I've got is
/(?![survey])[a-z]+/gi
It matches the following strings, except that for "required" it returns the match "quired". I'm assuming that's because the letters r and e appear in the word survey.
survey[1][title]
survey[1][required]
survey[2][anotherString]

You're using a character set inside the lookahead, so (?![survey]) only prevents the match from starting with any one of the characters s, u, r, v, e or y, which isn't what you want. Using a plain negative lookahead would be a start:
(?!survey)[a-z]+
But you also want to match the final word, which can be done by matching letters that are followed by \]$ - that is, by a ] and the end of the string:
[a-z]+(?=\]$)
https://regex101.com/r/rLvsY5/1
If you want to be more efficient, match the whole string, but capture what comes between the square brackets in a capturing group - the last repeated captured group will be in the result:
survey(?:\[(\w+)\])+
https://regex101.com/r/rLvsY5/2
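To see the difference between the approaches above, here is a minimal sketch run against the three strings from the question. The helper names lastWordLookahead and lastWordCapture are made up for illustration, and the i flag is added to the lookahead version on the assumption that mixed-case words such as anotherString should match.

```javascript
const inputs = ["survey[1][title]", "survey[1][required]", "survey[2][anotherString]"];

// The original attempt: [survey] is a character set, so the match cannot
// START with s, u, r, v, e or y. "required" therefore starts matching at "q".
const attempt = "survey[1][required]".match(/(?![survey])[a-z]+/gi);

// Lookahead version: letters that are followed by "]" and the end of the string.
function lastWordLookahead(s) {
  const m = s.match(/[a-z]+(?=\]$)/i);
  return m ? m[0] : null;
}

// Repeated-group version: a repeated capture group keeps its last iteration.
function lastWordCapture(s) {
  const m = /survey(?:\[(\w+)\])+/.exec(s);
  return m ? m[1] : null;
}

console.log(attempt);                        // [ 'quired' ]
console.log(inputs.map(lastWordLookahead));  // [ 'title', 'required', 'anotherString' ]
console.log(inputs.map(lastWordCapture));    // [ 'title', 'required', 'anotherString' ]
```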

One way to solve this is to match the full line and capture only the part you need (add the i flag so that mixed-case words such as anotherString match).
survey\[\d+\]\[([a-z]+)\]

Related

Select a character if some character from a list is before the character

I have this regular expression:
/([a-záäéěíýóôöúüůĺľŕřčšťžňď])-$\s*/gmi
This regex selects č- from my text:
sme! a Želiezovce 2015: Spoloíč-
ne pre Európu. Oslávili aj 940.
But I want to select only - (without č) (if some character from the list [a-záäéěíýóôöúüůĺľŕřčšťžňď] is before the -).
In other languages you would use a lookbehind
/(?<=[a-záäéěíýóôöúüůĺľŕřčšťžňď])-$\s*/gmi
This matches -$\s* only if it's preceded by one of the characters in the list.
However, JavaScript engines that predate ES2018 don't have lookbehind, so the workaround there is to use a capturing group for the part of the regular expression after it.
var match = /[a-záäéěíýóôöúüůĺľŕřčšťžňď](-$\s*)/mi.exec(string);
When you use this, match[1] will contain the part of the string beginning with the hyphen.
First, in a regex everything you put in parentheses is captured during matching, so the match array contains the full matched string at index 0, followed by the contents of each capturing group, numbered from left to right.
/[a-záäéěíýóôöúüůĺľŕřčšťžňď](-)$\s*/gmi
Run with exec, this would return something like ["č-", "-"] for your string (the full match also includes any whitespace consumed by \s*), so you can extract the specific data you need from the match.
Also, $ matches at the end of a line when the multiline flag is set, so the trailing \s* only ever consumes the line break and any whitespace that follows it. If you just need the hyphen, you can drop it:
/[a-záäéěíýóôöúüůĺľŕřčšťžňď](-)$/gmi
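A small sketch of the capture-group workaround, using the text from the question. The variable names are illustrative, and the last step (joining the hyphenated word back together) is only an assumption about what the selection is for.

```javascript
const text = "sme! a Želiezovce 2015: Spoloíč-\nne pre Európu. Oslávili aj 940.";
const letters = "a-záäéěíýóôöúüůĺľŕřčšťžňď";

// The letter is matched but left outside the group; the group holds
// the hyphen plus whatever whitespace \s* consumes after the line end.
const re = new RegExp("[" + letters + "](-$\\s*)", "mi");
const m = re.exec(text);

const hyphenPart = m ? m[1] : null;        // the captured "-" and line break
const hyphenIndex = m ? m.index + 1 : -1;  // the match starts one letter earlier

// Assumed use case: remove the hyphen and line break, keep the letter.
const joined = text.replace(new RegExp("([" + letters + "])-$\\s*", "gmi"), "$1");

console.log(JSON.stringify(hyphenPart));
console.log(joined);
```

Capturing the letter and putting it back with $1 is the usual way to emulate a lookbehind in a replace.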

Exclude characters from [ ] group regex, while still looking for characters

I want to match all the words that do not contain letter l. I tried this:
[a-z^l]+
But apparently ^ only negates when it comes right after [. If it was just the letter l, I guess this would do:
[a-km-z]+
Apart from the fact, of course, that it treats a word containing l as two separate words.
But this is not the real concern, the question remains just as in the title:
Q: How do I search for a list of characters, but exclude another list of characters?
You need to use the \b word boundary to make sure that the match doesn't start or end in the middle of a word.
\b[a-km-z]+\b
Alternatively, you can create an exclusion list using lookahead.
\b(?:(?![l])[a-z])+\b
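Both patterns can be compared on a sample sentence. This is only a sketch; the sentence and variable names are made up for illustration.

```javascript
const sentence = "the yellow cat sat on a small mat";

// Range-based version: the class simply skips the letter l.
const byRange = sentence.match(/\b[a-km-z]+\b/g);

// Lookahead-based version: each letter is checked against the exclusion
// list first, so more letters can be excluded by extending [l].
const byLookahead = sentence.match(/\b(?:(?![l])[a-z])+\b/g);

console.log(byRange);      // [ 'the', 'cat', 'sat', 'on', 'a', 'mat' ]
console.log(byLookahead);  // [ 'the', 'cat', 'sat', 'on', 'a', 'mat' ]
```

Without the \b anchors, "yellow" would be reported as the two fragments "ye" and "ow".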

Regexp: excluding a word but including non-standard punctuation

I want to find strings that contain words in a particular order, allowing non-standard characters in between the words but excluding a particular word or symbol.
I'm using javascript's replace function to find all instances and put into an array.
So, I want select...from, with anything except 'from' in between the words. Or I can separate select...from from select...from (, as long as I exclude nesting. I think the answer is the same for both, i.e. how do I write: find x and not y within the same regexp?
From the internet, I feel this should work: /\bselect\b^(?!from).*\bfrom\b/gi but this finds no matches.
This works to find all select...from: /\bselect\b[0-9a-zA-Z#\(\)\[\]\s\.\*,%_+-]*?\bfrom\b/gi but modifying it to exclude the parenthesis "(" at the end prevents any matches: /\bselect\b[0-9a-zA-Z#\(\)\[\]\s\.\*,%_+-]*?\bfrom\b\s*^\(/gi
Can anyone tell me how to exclude words and symbols within this regexp?
Many thanks
Emma
Edit: partial string input:
left outer join [stage].[db].[table14] o on p.Project_id = o.project_id
left outer join
(
select
different_id
,sum(costs) - ( sum(brushes) + sum(carpets) + sum(fabric) + sum(other) + sum(chairs)+ sum(apples) ) as overallNumber
from
(
select ace from [stage].db.[table18] J
Javascript:
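var sequel = stringInputAsAbove;
var selects = [];
// With no capturing groups, the callback's second argument is the match offset.
var tst = sequel.replace(/\bselect\b[\s\S]*?\bfrom\b/gi, function(a,b) { console.log('match: '+a); selects.push(b); return a; });
console.log(selects);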
console.log(selects) should print an array of numbers, where each number is the starting character of a select...from. This works for the second regexp I gave in my info, printing: [95, 251]. Your \s\S variation does the same, @stribizhev.
The first example ^(?!from).* should do likewise but returns [].
The third example \s*^\( should return 251 only but returns []. However I have just noticed that the positive expression \s*\( does give 95, so some progress! It's the negatives I'm getting wrong.
Your \bselect\b^(?!from).*\bfrom\b regex doesn't work as expected because:
- ^ here means the beginning of a line, not negation of the next part, so \bselect\b^ means the word select followed by the beginning of a line. After removing ^ the regex starts to match something, but it is still not correct.
- In multiline text, .* without modification will not match a newline, so the regex will only match select...from within a single line. If you change it to (.|\n)* (as a simple example) it will match across lines, but it is still not correct.
- * is a greedy quantifier, so it will match as much as possible. If you use the reluctant quantifier *?, the regex will match up to the first occurrence of the word from, and it will start to return relatively correct results.
- \bselect\b(?!from) means: match the separate word select when it is not directly followed by from. But \b already rules that out, since there is no word boundary between select and from in selectfrom, so (?!from) does nothing here and is redundant.
In effect you get a regex very similar to the one Stribizhev gave you: \bselect\b(.|\n)*?\bfrom\b
In the third expression you make the same mistake: in \bselect\b[0-9a-zA-Z#\(\)\[\]\s\.\*,%_+-]*?\bfrom\b\s*^\( you use ^ as (I assume) a negation, not as the beginning of a line. Remove ^ and you will again get a relatively valid result (a match from select through from to the opening parenthesis ( ).
Your second regex works similarly to \bselect\b(.|\n)*?\bfrom\b or \bselect\b[\s\S]*?\bfrom\b.
I wrote "relatively valid result" because I also think that parsing SQL with regex can get very complicated, so I am not sure it will work in every case.
You can also try a positive lookahead to match just a position in the text, like:
(?=\bselect\b(?:.|\n)*?\bfrom\b)
(In the demo, an empty group () was added to the regex only so that the start index of each match shows up in the groups, which makes the result easier to check.)
Negation in regex
We use ^ as negation in a character class. For example, [^a-z] means match anything but a letter, so it will match a digit, symbol, whitespace, etc., but not a letter in the range a to z. But this negation works at the level of a single character. If you use [^from] it will prevent the regex from matching the characters f, r, o and m. Likewise, [^from]{4} will avoid matching from, but also form, morf, etc.
To exclude a whole word from matching, you need to use a negative lookahead, like (?!from), which fails if the chosen word follows the given position. To avoid matching a whole line that contains from, you could use ^(?!.*from.*).+$.
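The difference between character-level and word-level negation can be checked with a short sketch. The sample strings here are invented for illustration.

```javascript
// Character-level negation: [^from] rejects each of f, r, o, m individually,
// so any four-letter mix of those letters is rejected, not just "from".
const charLevel = /[^from]{4}/;
const charResults = ["from", "form", "morf", "data"].map((w) => charLevel.test(w));

// Word-level negation: the lookahead rejects a line only if the whole
// word "from" appears somewhere on it.
const lines = "select a\nfrom b\nwhere c";
const linesWithoutFrom = lines.match(/^(?!.*from.*).+$/gm);

console.log(charResults);       // [ false, false, false, true ]
console.log(linesWithoutFrom);  // [ 'select a', 'where c' ]
```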
However, in your case you don't need this construction, because if you replace the greedy .*\bfrom with the reluctant .*?\bfrom, the match stops at the first occurrence of the word. What's more, the lookahead would cause problems: a regex that places (?![\s\S]*from[\s\S]*) right after select will not match anything, because the lookahead is not restricted in any way. It succeeds only if there is no from anywhere after select, but we also want to match from! In effect the regex tries to match and exclude from at once, and fails. The (?!.*word.*) construction is better suited to excluding a whole line that contains a given word.
So what can we do if we don't want to match a word inside a fragment of a match? I think select\b([^f]|f(?!rom))*?\bfrom\b is a good solution. The group ([^f]|f(?!rom))*? matches everything between select and from but can never consume the text from itself, so the match always ends at the first from.
But if you want to match only a select...from that is not followed by (, then it is a good idea to add (?!\(). In your regex, however (multiline, using (.|\n)*? or [\s\S]*?), this causes a match that runs on to the next select...from part, because the reluctant quantifier will move the point where it stops as far as needed to make the whole regex match. In my opinion, a good solution is to use the tempered pattern again:
select\b([^f]|f(?!rom))*?\bfrom\b(?!\s*?\()
which will not run over into an additional select...from and will not match if there is a ( after select...from.
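Here is a sketch that runs both patterns over the partial input from the question. The exact whitespace of the original input is not known, so the offsets printed here will differ from the [95, 251] reported in the question; findOffsets is a made-up helper name.

```javascript
// The partial SQL from the question, with one line break per line (assumed).
const sql = [
  "left outer join [stage].[db].[table14] o on p.Project_id = o.project_id",
  "left outer join",
  "(",
  "select",
  "different_id",
  ",sum(costs) - ( sum(brushes) + sum(carpets) + sum(fabric) + sum(other) + sum(chairs)+ sum(apples) ) as overallNumber",
  "from",
  "(",
  "select ace from [stage].db.[table18] J",
].join("\n");

// Collect the start offset of every match (the regex must have the g flag).
function findOffsets(re, text) {
  const offsets = [];
  for (const m of text.matchAll(re)) offsets.push(m.index);
  return offsets;
}

// Every select...from, using the reluctant quantifier.
const allOffsets = findOffsets(/\bselect\b[\s\S]*?\bfrom\b/gi, sql);

// Only select...from that is not followed by "(", using the tempered group
// so the match cannot run on past the first "from".
const notBeforeParen = findOffsets(/select\b([^f]|f(?!rom))*?\bfrom\b(?!\s*?\()/gi, sql);

console.log(allOffsets, notBeforeParen);
```

The second list should contain only the offset of the inner select, since the outer one is followed by an opening parenthesis.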

Match backwards from a given word with javascript

Using JavaScript, I need to find an occurrence of a phrase in some text and then match everything from it back to the last occurrence of a 5-digit number (or at least that's the best way I know how to describe what I need).
Consider the following text:
24854
Random words
Ending Words
34975
Random words
Ending Words
47593
Random words
Ending Words
Target Word
32302
Random words
Ending Words
Given the above, I'd like my regex to match everything from 47593 to Target Word.
Each match should include both 47593 and Target Word
It needs to be greedy in that there will be multiple matches in my actual text and I need them all returned in an array.
This is what I've tried: .match(/[0-9]{5}[\s\S]+?Target Word/g)
My problem (as always with these) is the new lines. In order to match across multiple lines, I'm using [\s\S] but doing so makes the regex match everything from the first 5 digit number to the first occurrence of Target Word
How can I change this to achieve the desired result? I'm thinking I need to use lookbehind but most examples I've found have been very confusing for me.
You could use negative lookahead,
[0-9]{5}(?:(?![0-9]{5})[\S\s])*?Target\s*Word
The group (?:(?![0-9]{5})[\S\s])*? matches any character (whitespace or not) zero or more times after the first 5-digit number, while the negative lookahead (?![0-9]{5}) checks at every step that the next five characters are not another 5-digit number.
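A quick sketch comparing the original attempt with the lookahead version on the sample text from the question (joined with single line breaks, which is an assumption about the real input):

```javascript
const text = [
  "24854", "Random words", "Ending Words",
  "34975", "Random words", "Ending Words",
  "47593", "Random words", "Ending Words", "Target Word",
  "32302", "Random words", "Ending Words",
].join("\n");

// Original attempt: starts at the FIRST 5-digit number in the text.
const naive = text.match(/[0-9]{5}[\s\S]+?Target Word/g);

// Lookahead version: the match may not step over another 5-digit number,
// so it can only start at the number closest to "Target Word".
const tempered = text.match(/[0-9]{5}(?:(?![0-9]{5})[\S\s])*?Target\s*Word/g);

console.log(naive[0].slice(0, 5));  // 24854
console.log(tempered);              // [ '47593\nRandom words\nEnding Words\nTarget Word' ]
```

Because the g flag is set, match returns every such block in an array when the text contains several.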
If the random words contain no digits at all, you could perhaps use
/(\d{5}\D+?Target Word)/g

Regular expression match 0 or exact number of characters

I want to match an input string in JavaScript with 0 or 2 consecutive dashes, not 1, i.e. not range.
If the string is:
-g:"apple" AND --projectName:"grape": it should match --projectName:"grape".
-g:"apple" AND projectName:"grape": it should match projectName:"grape".
-g:"apple" AND -projectName:"grape": it should not match, i.e. return null.
--projectName:"grape": it should match --projectName:"grape".
projectName:"grape": it should match projectName:"grape".
-projectName:"grape": it should not match, i.e. return null.
To simplify this question considering this example, the RE should match the preceding 0 or 2 dashes and whatever comes next. I will figure out the rest. The question still comes down to matching 0 or 2 dashes.
Using -{0,2} matches 0, 1, 2 dashes.
Using -{2,} matches 2 or more dashes.
Using -{2} matches only 2 dashes.
How to match 0 or 2 occurrences?
Answer
If you split your "word-like" patterns on spaces, you can use this regex and your wanted value will be in the first capturing group:
(?:^|\s)((?:--)?[^\s-]+)
\s is any whitespace character (tab, space, newline...)
[^\s-] is anything except a whitespace-like character or a -
Once again the problem is anchoring the regex so that the relevant part isn't completely optional: here the anchor ^ or a mandatory whitespace \s plays this role.
What we want to do
Basically you want to check if your expression (two dashes) is there or not, so you can use the ? operator:
(?:--)?
"Either two or none", (?:...) is a non capturing group.
Avoiding confusion
You want to match "zero or two dashes", so if this is your entire regex it will always find a match: in an empty string, in --, in -, in foobar... In most of these strings what gets matched is just an empty string, but the regex will still report a match.
This is a common source of misunderstanding, so bear in mind the rule that if everything in your regex is optional, it will always find a match.
If you want to only return a match if your entire string is made of zero or two dashes, you need to anchor the regex:
^(?:--)?$
^ and $ match the beginning and the end of the string, respectively.
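The "everything optional always matches" rule is easy to verify with a sketch like this (the sample strings are the ones mentioned above):

```javascript
const samples = ["", "--", "-", "foobar"];

// Entirely optional pattern: finds a (possibly empty) match in any string.
const unanchored = samples.map((s) => /(?:--)?/.test(s));

// Anchored pattern: the whole string must be zero or two dashes.
const anchored = samples.map((s) => /^(?:--)?$/.test(s));

console.log(unanchored);  // [ true, true, true, true ]
console.log(anchored);    // [ true, true, false, false ]
```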
a(-{2})?(?!-)
This uses "a" as an example: it matches a followed by an optional pair of dashes, as long as no further dash follows.
Edit:
According to your example, this should work
(?<!-)(-{2})?projectName:"[a-zA-Z]*"
Edit 2:
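I think JavaScript has problems with lookbehinds (engines that predate ES2018 don't support them).
Try this, which accepts either the start of the string or a non-dash character in front:
(?:^|[^-])(-{2})?projectName:"[a-zA-Z]*"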
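As a sketch, here is a lookbehind-free variant checked against all six inputs from the question. It is a variation on the pattern above rather than a quote of it: it accepts either the start of the string or a non-dash character in front, and wraps the wanted part in a capturing group. The name matchProject is made up for illustration.

```javascript
// (?:^|[^-]) stands in for a lookbehind: the dashes must be preceded by
// the start of the string or by something that is not a dash.
const re = /(?:^|[^-])((?:--)?projectName:"[a-zA-Z]*")/;

function matchProject(s) {
  const m = re.exec(s);
  return m ? m[1] : null;  // group 1 excludes the preceding character
}

const cases = [
  '-g:"apple" AND --projectName:"grape"',
  '-g:"apple" AND projectName:"grape"',
  '-g:"apple" AND -projectName:"grape"',
  '--projectName:"grape"',
  'projectName:"grape"',
  '-projectName:"grape"',
];

const results = cases.map(matchProject);
console.log(results);
```

The single-dash cases come back as null because the dash can neither be part of the optional pair nor serve as the non-dash character in front of projectName.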
