Exclude characters from [ ] group regex, while still looking for characters - javascript

I want to match all the words that do not contain letter l. I tried this:
[a-z^l]+
But apparently ^ only negates when it comes right after the opening [. If it was just the letter l, I guess this would do:
[a-km-z]+
Apart, of course, from the fact that it just treats words containing l as two separate words.
But this is not the real concern, the question remains just as in title:
Q: How do I search for list of characters, but exclude another list of characters?

You need to use \b word boundary to make sure that the match doesn't start and end within words.
\b[a-km-z]+\b
Alternatively, you can create an exclusion list using lookahead.
\b(?:(?![l])[a-z])+\b
Demo on regex101
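As a rough sketch of both patterns in JavaScript: the sample sentence and the helper name wordsWithout are illustrative assumptions, not part of the answer. The lookahead form extends to an arbitrary exclusion list, which is what the question asks for.

```javascript
// Sample input; the words are illustrative only.
const text = "apple kiwi plum grape melon fig";

// Range-based class: every lowercase letter except "l".
const byRange = text.match(/\b[a-km-z]+\b/g);

// Lookahead-based exclusion: same result, easier to extend.
const byLookahead = text.match(/\b(?:(?!l)[a-z])+\b/g);

// Hypothetical helper: words made of a-z that contain none of `excluded`.
function wordsWithout(input, excluded) {
  // Escape the characters that are special inside a character class.
  const cls = excluded.replace(/[\\\]^-]/g, "\\$&");
  const re = new RegExp("\\b(?:(?![" + cls + "])[a-z])+\\b", "g");
  return input.match(re) || [];
}

console.log(byRange);                  // [ 'kiwi', 'grape', 'fig' ]
console.log(byLookahead);              // [ 'kiwi', 'grape', 'fig' ]
console.log(wordsWithout(text, "lk")); // [ 'grape', 'fig' ]
```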

Related

RegExp Match all text parts except given words

I have a text and I need to match all text parts except given words with regexp
For example if text is ' Something went wrong and I could not do anything ' and given words are 'and' and 'not' then the result must be ['Something went wrong', 'I could', 'do anything']
Please don't advise me to use string.split(), string.replace(), etc. I know several ways to do this with built-in methods. I'm wondering whether there is a regex that can do this when I execute text.match(/regexp/g)
Please note that the regular expression must work in Chrome, Firefox and Safari, going back at least three versions from the current one! At the time of asking this question the current versions are 100.0, 98.0.2 and 15.3 respectively. For example, you cannot use the lookbehind feature in Safari
Please, before answering my question, go to https://regexr.com/ and check your answer! Your regular expression should highlight all parts of the sentence except the given words, including the spaces between words inside a part but excluding the empty space around each part
Before asking this question I did my own search, but these links didn't help me. I also tried the non-accepted answers:
Match everything except for specified strings
Regex: match everything but a specific pattern
Regex to match all words except a given list
Regex to match all words except a given list (2)
Need to find a regular expression for any word except word1 or word2
Matching all words except one
Javascript match eveything except given words
It's possible with only using match and lookaheads in javascript.
/\b(?=\w)(?!(?:and|not)\b).*?(?=\s+(?:and|not)\b|\s*$)/gi
Test on RegExr here
Basically match the start of a word that's not a restricted word
\b(?=\w)(?!(?:and|not)\b)
Then a lazy match till the next whitespaces and restricted word, or the end of the line without including last whitespaces.
.*?(?=\s+(?:and|not)\b|\s*$)
Test Snippet :
const re = /\b(?=\w)(?!(?:and|not)\b).*?(?=\s+(?:and|not)\b|\s*$)/gi
let str = ` Something went wrong and I could not do anything `;
let arr = str.match(re);
console.log(arr);
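If the list of words is not fixed, the same pattern can be assembled at runtime. This is only a sketch: the helper names escapeRegExp and partsExcept are assumptions made for illustration, and it assumes a non-empty word list.

```javascript
// Escape regex metacharacters so each word is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: returns the parts of `text` that lie between the given words.
function partsExcept(text, words) {
  const alt = words.map(escapeRegExp).join("|");
  const re = new RegExp(
    "\\b(?=\\w)(?!(?:" + alt + ")\\b)" +   // start of a word that is not excluded
    ".*?(?=\\s+(?:" + alt + ")\\b|\\s*$)", // lazily up to the next excluded word or the end
    "gi"
  );
  return text.match(re) || [];
}

console.log(partsExcept(" Something went wrong and I could not do anything ", ["and", "not"]));
// [ 'Something went wrong', 'I could', 'do anything' ]
```

Because the excluded words are wrapped in \b, a word such as android is not treated as and.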
See Edit further down.
You can use this regex, which only uses lookahead:
/(?!and|not)\b.*?(?=and|not|$)/g
Explanation:
(?!and|not) - negative look ahead for and or not
\b - match word boundary, to prevent matching nd and ot
.*? - match any char zero or more times, as few as possible
(?=and|not|$) - look ahead for and or not or end of text
If your text has multiple lines you can add the m flag (multiline). Alternatively you can replace dot (.) with [\s\S].
Edit:
I have changed it a little so spaces around the forbidden words are removed:
/(?!and|not)\b\w.*?(?= and| not|$)/g
I have added a \w character match to push the start of the match after the space and added spaces in the look ahead.
Edit2: (to handle multiple spaces around words):
You were very close! All you need is a \s* before the dollar sign and specified words:
/(?!and|not|\s)\b.*?(?=\s*(and|not|$))/g
Updated link: regexr.com

Regex finding second string

I'm attempting to get the last word in the following strings.
After about 45 minutes I can't seem to find the right combination of slashes, dashes and brackets.
The closest I've got is
/(?![survey])[a-z]+/gi
It matches the following strings, except that for "required" it returns the match "quired". I'm assuming that's because the letters r and e are in the word survey.
survey[1][title]
survey[1][required]
survey[2][anotherString]
You're using a character set, which will exclude any of the characters from being the first character in the match, which isn't what you want. Using plain negative lookahead would be a start:
(?!survey)[a-z]+
But you also want to match the final word, which can be done by matching word characters that are followed with \]$ - that is, by a ] and the end of the string:
[a-z]+(?=\]$)
https://regex101.com/r/rLvsY5/1
If you want to be more efficient, match the whole string, but capture what comes between the square brackets in a capturing group - the last repeated captured group will be in the result:
survey(?:\[(\w+)\])+
https://regex101.com/r/rLvsY5/2
One way to solve this is to match the full line and only capture the part you need.
survey\[\d+\]\[([a-z]+)\]
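A small sketch comparing the original attempt with two of the suggestions above. The i flag on the lookahead version is an assumption carried over from the question's gi flags, since anotherString contains an uppercase letter.

```javascript
const inputs = ["survey[1][title]", "survey[1][required]", "survey[2][anotherString]"];

// The original attempt: (?![survey]) only forbids the FIRST character of a
// match from being one of s, u, r, v, e, y.
const attempt = "survey[1][required]".match(/(?![survey])[a-z]+/gi);

// Letters immediately followed by "]" and the end of the string.
const viaLookahead = inputs.map((s) => s.match(/[a-z]+(?=\]$)/i)[0]);

// Whole-string match; group 1 keeps the value from the last repetition.
const viaCapture = inputs.map((s) => s.match(/survey(?:\[(\w+)\])+/)[1]);

console.log(attempt);      // [ 'quired' ]
console.log(viaLookahead); // [ 'title', 'required', 'anotherString' ]
console.log(viaCapture);   // [ 'title', 'required', 'anotherString' ]
```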

Regexp: excluding a word but including non-standard punctuation

I want to find strings that contain words in a particular order, allowing non-standard characters in between the words but excluding a particular word or symbol.
I'm using javascript's replace function to find all instances and put into an array.
So, I want select...from, with anything except 'from' in between the words. Or I can separate select...from from select...from (, as long as I exclude nesting. I think the answer is the same for both, i.e. how do I write: find x and not y within the same regexp?
From the internet, I feel this should work: /\bselect\b^(?!from).*\bfrom\b/gi but this finds no matches.
This works to find all select...from: /\bselect\b[0-9a-zA-Z#\(\)\[\]\s\.\*,%_+-]*?\bfrom\b/gi but modifying it to exclude the parenthesis "(" at the end prevents any matches: /\bselect\b[0-9a-zA-Z#\(\)\[\]\s\.\*,%_+-]*?\bfrom\b\s*^\(/gi
Can anyone tell me how to exclude words and symbols within this regexp?
Many thanks
Emma
Edit: partial string input:
left outer join [stage].[db].[table14] o on p.Project_id = o.project_id
left outer join
(
select
different_id
,sum(costs) - ( sum(brushes) + sum(carpets) + sum(fabric) + sum(other) + sum(chairs)+ sum(apples) ) as overallNumber
from
(
select ace from [stage].db.[table18] J
Javascript:
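var sequel = stringInputAsAbove;
var selects = [];
// With no capture groups in the regex, the second callback argument (b) is the offset of the match.
var tst = sequel.replace(/\bselect\b[\s\S]*?\bfrom\b/gi, function(a,b) { console.log('match: '+a); selects.push(b); return a; });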
console.log(selects);
Console.log(selects) should print an array of numbers, where each number is the starting character of a select...from. This works for the second regexp I gave in my info, printing: [95, 251]. Your \s\S variation does the same, #stribizhev.
The first example ^(?!from).* should do likewise but returns [].
The third example \s*^\( should return 251 only but returns []. However I have just noticed that the positive expression \s*\( does give 95, so some progress! It's the negatives I'm getting wrong.
Your \bselect\b^(?!from).*\bfrom\b regex doesn't work as expected because:
^ here means the beginning of a line, not a negation of the next part, so \bselect\b^ means the word select followed by the beginning of a line. After removing ^ the regex starts to match something (DEMO), but it is still not correct.
In multiline text, .* without modification will not match a newline, so the regex will only match select...from within a single line. If you change it to (.|\n)* (as a simple example) it will match across lines, but it is still not correct.
The * is a greedy quantifier, so it will match as much as possible. If you use the reluctant quantifier *?, the regex will match up to the first occurrence of the word from, and it will start to return relatively correct results.
\bselect\b(?!from) means: match the separate word select when it is not directly followed by from. Because of the \b, select can never be directly followed by the letters of from (that would be selectfrom, which is not two separate words), so (?!from) has no effect and is redundant.
In effect you will get regex very similar to what Stribizhev gave you: \bselect\b(.|\n)*?\bfrom\b
In the third expression you make the same mistake: \bselect\b[0-9a-zA-Z#\(\)\[\]\s\.\*,%_+-]*?\bfrom\b\s*^\( uses ^ as (I assume) a negation, not as the beginning of a line. Remove ^ and you will again get a relatively valid result (a match from select through from to the opening parenthesis ( ).
Your second regex works similar to \bselect\b(.|\n)*?\bfrom\b or \bselect\b[\s\S]*?\bfrom\b.
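To make the points above concrete, here is a minimal sketch on a short, made-up one-line query (the sql string is an assumption for illustration, not the asker's input). It shows that the pattern with ^ in the middle matches nothing, and that the replace callback receives the match offset as its second argument when the regex has no capture groups.

```javascript
// Illustrative input only; not the query from the question.
const sql = "select a from t1 join (select b from (select c from t2))";

// `^` outside a character class is an anchor, so this can never match
// once `select` has already been consumed.
const broken = sql.match(/\bselect\b^(?!from).*\bfrom\b/gi);

// Collect the starting offset of every select...from.
const offsets = [];
sql.replace(/\bselect\b[\s\S]*?\bfrom\b/gi, (match, offset) => {
  offsets.push(offset);
  return match; // leave the text unchanged
});

console.log(broken);  // null
console.log(offsets); // [ 0, 23, 38 ]
```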
I wrote "relatively valid result" because I also think that parsing SQL with regex could be very complicated, so I am not sure it will work in every case.
You can also try to use positive lookahead to match just position in text, like:
(?=\bselect\b(?:.|\n)*?\bfrom\b)
DEMO - the () was added to regex just to return beginning index of match in groups, so it would be easier to check it validity
Negation in regex
We use ^ as negation in a character class; for example, [^a-z] means match anything but a letter, so it will match a digit, symbol, whitespace, etc., but not a letter in the range a to z (Look here). But this negation works at the level of a single character. If you use [^from] it will prevent the regex from matching the characters f, r, o and m (demo). Likewise, [^from]{4} will avoid matching from, but also form, morf, etc.
To exclude a whole word from being matched, you need to use a negative lookahead, like (?!from), which fails to match if the chosen word follows the given position. To avoid matching a whole line containing from you could use ^(?!.*from.*).+$ (demo).
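A short sketch of the difference, using made-up sample words and lines (all inputs here are assumptions for illustration):

```javascript
// [^from] negates single characters, not the word "from":
// "form" is rejected too, because it uses the same letters.
const classTest = ["from", "form", "test"].map((w) => /^[^from]{4}$/.test(w));

// A negative lookahead anchored at the line start rejects whole lines
// that contain "from" anywhere.
const lines = ["select a", "select a from t", "update t"];
const withoutFrom = lines.filter((l) => /^(?!.*from.*).+$/.test(l));

console.log(classTest);   // [ false, false, true ]
console.log(withoutFrom); // [ 'select a', 'update t' ]
```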
However, in your case you don't need this construction, because if you replace the greedy .*\bfrom with the reluctant .*?\bfrom it will match up to the first occurrence of the word. What's more, it would cause problems. Take a look at this regex: it will not match anything, because (?![\s\S]*from[\s\S]*) is not restricted by anything, so it only succeeds if there is no from anywhere after select, but we also want to match from! In effect the regex tries to match and exclude from at once, and fails. So the (?!.*word.*) construction works much better for excluding whole lines that contain a given word.
So what to do if we don't want a word to appear inside a fragment of a match? I think select\b([^f]|f(?!rom))*?\bfrom\b is a good solution. With ([^f]|f(?!rom))*? it will match everything between select and from, but it cannot run past a from.
But if you would like to match only a select...from that is not followed by (, then it is a good idea to use (?!\(). In your regex, however (multiline, with (.|\n)*? or [\s\S]*?), this will cause it to match up to the next select...from part, because the reluctant quantifier will change the place where it stops in order to make the whole regex match. In my opinion, a good solution would be to again use:
select\b([^f]|f(?!rom))*?\bfrom\b(?!\s*?\()
which will not overlap additional select..from and will not match if there is \( after select...from - check it here
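The overlap problem can be seen on a short, made-up query (the sql string and the helper offsetsOf are assumptions for illustration). With [\s\S]*? the lazy quantifier simply stretches to the next from when the lookahead rejects the first one, whereas the ([^f]|f(?!rom))*? form cannot step over a from.

```javascript
// Illustrative input only; not the query from the question.
const sql = "select a from t1 join (select b from (select c from t2))";

// Hypothetical helper: starting offsets of all matches of a global regex.
const offsetsOf = (re) => [...sql.matchAll(re)].map((m) => m.index);

// Lazy "anything": the second match starts at the middle select and
// spans into the innermost select...from.
const spanning = offsetsOf(/\bselect\b[\s\S]*?\bfrom\b(?!\s*\()/gi);

// Tempered version: a select...from followed by "(" is skipped entirely.
const tempered = offsetsOf(/select\b([^f]|f(?!rom))*?\bfrom\b(?!\s*?\()/gi);

console.log(spanning); // [ 0, 23 ]
console.log(tempered); // [ 0, 38 ]
```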

Javascript regex: how to not capture an optional string on the right side

For example /(www\.)?(.+)(\.com)?/.exec("www.something.com") will result in 'something.com' at index 2 of the resulting array (index 1 holds 'www.'). But what if we want to capture only 'something' in a capturing group?
Clarifications:
The above string is just an example - we don't want to assume anything about the suffix string (.com above). It could just as well be orange.
Just this part can be solved in C# by matching from right to left (I don't know of a way of doing that in JS though), but that will end up including www. in the match!
Sure, this problem as such is easily solvable mixing regex with other string methods like replace / substring. But is there a solution with only regex?
(?:www\.)?(.+?)(?:\.com|$)
This will give only something in the groups. Just make the other groups non-capturing. See demo.
https://regex101.com/r/rO0yD8/4
Just removing the last character (?) from the regex does the trick:
https://regex101.com/r/uR0iD2/1
The last ? allows a valid output without the (\.com) matching anything, so the (.+) can match all the characters after the www..
Another option is to replace the greedy quantifier +, which always tries to match as many characters as possible, with the lazy +?, which tries to match as few characters as possible:
(www\.)?(.+?)(\.com)?$
https://regex101.com/r/oY7fE0/2
Note that it is necessary to force a match with the entire string through the end of line anchor ($).
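If you only want to capture "something", use non-capturing groups for the other sections. The middle group also has to be lazy and the pattern anchored at the end, otherwise the greedy (.+) swallows the optional suffix:
/(?:www\.)?(.+?)(?:\.com)?$/.exec("www.something.com")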
The ?: denotes the groups as non-capturing.
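For comparison, a quick sketch running the question's pattern and two of the suggested fixes against the sample string (variable names are illustrative):

```javascript
const input = "www.something.com";

// Question's pattern: the greedy (.+) consumes the suffix, so (\.com)? matches nothing.
const greedy = /(www\.)?(.+)(\.com)?/.exec(input);

// Lazy middle group, with the suffix (or the end of the string) required.
const lazyAlt = /(?:www\.)?(.+?)(?:\.com|$)/.exec(input);

// Lazy middle group, optional suffix, anchored at the end.
const lazyAnchored = /(www\.)?(.+?)(\.com)?$/.exec(input);

console.log(greedy[2]);       // something.com
console.log(lazyAlt[1]);      // something
console.log(lazyAnchored[2]); // something
```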

simple regex to matching multiple word with spaces/multiple space or no spaces

I am trying to match all words with single or multiple spaces. my expression
(\w+\s*)* is not working
edit 1:
Let say i have a sentence in this form
[[do "hi i am bob"]]
[[do "hi i am Bob"]]
now I have to replace this with
cool("hi i am bob") or
cool("hi i am Bob")
I do not care about replacing multiple spaces with single .
I can achieve this for a single word like
\[\[do\"(\w+)\"\]\] and replacing regex cool\(\"$1\") but this does not look like an effective solution and does not match multiple words ....
I apologize for the incomplete question.
Any help will be appreciated.
Find this Regular Expression:
/\[\[do\s+("[\w\s]+")\s*\]\]/
And do the following replacement:
'cool($1)'
The only special thing that's being done here is using character classes to our advantage with
[\w\s]+
Matches one or more word or space characters (a-z, A-Z, 0-9, _, and whitespace). That'll eat up your internal stuff no problem.
'[[do "hi i am Bob"]]'.replace(/\[\[do\s+("[\w\s]+")\s*\]\]/, 'cool($1)')
Spits out
cool("hi i am Bob")
Though - if you want to add punctuation (which you probably will), you should do it like this:
/\[\[do\s+("[^"]+")\s*\]\]/
Which will match any character that's not a double quote, preserving your substring. There are more complicated ones to allow you to deal with escaped quotation marks, but I think that's outside the scope of this question.
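A brief sketch of the difference between the two character classes, with the g flag added so that every occurrence is replaced (the flag and the two-line sample string are assumptions, not part of the answer):

```javascript
// Two sample lines; the second one contains punctuation.
const source = '[[do "hi i am bob"]]\n[[do "Hi, I am Bob!"]]';

// [\w\s]+ stops at punctuation, so the second line is left untouched.
const wordsOnly = source.replace(/\[\[do\s+("[\w\s]+")\s*\]\]/g, "cool($1)");

// [^"]+ accepts anything up to the closing quote.
const anyText = source.replace(/\[\[do\s+("[^"]+")\s*\]\]/g, "cool($1)");

console.log(wordsOnly);
console.log(anyText);
```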
To match "all words with single or multiple spaces", you cannot use \s*, as it will match even no spaces.
On the other hand, it looks like you want to match even "hi", which is one word with no spaces.
You probably want to match one or more words separated by spaces. If so, use regex pattern
(\w+(?:$|\s+))+
or
\w+(\s+\w+)*
I'm not sure, but maybe this is what you're trying to get:
"Hi I am bob".match(/\b\w+\b/g); // ["Hi", "I", "am", "bob"]
Use regex pattern \w+(\s+\w+)* as follows:
m = s.match(/\w+(\s+\w+)*/g);
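A minimal sketch of what this pattern returns, on a made-up sentence (the sample string is an assumption): runs of words separated only by whitespace come back as one match each, and punctuation ends a run.

```javascript
// Sample input with multiple spaces and a comma.
const s = "hi   i am bob, and you";

// One match per run of whitespace-separated words.
const m = s.match(/\w+(\s+\w+)*/g);

console.log(m); // [ 'hi   i am bob', 'and you' ]
```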
Simple. Match all groups of characters that are not white spaces
var str = "Hi I am Bob";
var matches = str.match(/[^ ]+/g); // => ["Hi", "I", "am", "Bob"]
What your regex is doing is:
/([a-zA-Z0-9_]{1,}[ \r\v\n\t\f]{0,}){0,}/
That is, find the first match of one or more of A through Z, both lower and upper case, along with digits and underscore, followed by zero or more whitespace characters, which are:
A space character
A carriage return character
A vertical tab character
A new line character
A tab character
A form feed character
That whole group is then repeated zero or more times, which means the pattern can also match the empty string.
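One visible consequence of the outer {0,} (that is, *) is that the pattern can match nothing at all, so a global match also returns an empty string. A minimal sketch, with a made-up input, next to the same pattern using + for the outer quantifier:

```javascript
const sample = "hi i am";

// Outer * allows a zero-length match, which shows up at the end of the input.
const withStar = sample.match(/(\w+\s*)*/g);

// Outer + requires at least one word, so no empty match is produced.
const withPlus = sample.match(/(\w+\s*)+/g);

console.log(withStar); // [ 'hi i am', '' ]
console.log(withPlus); // [ 'hi i am' ]
```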
\s matches more than just simple spaces, you can put in a literal space, and it will work.
I believe you want:
/(\w+ +\w+)/g
Which finds all matches of one or more of A through Z, both lower and upper case, along with digits and underscore, followed by one or more spaces, then followed by one or more of A through Z, both lower and upper case, along with digits and underscore.
This will match all word-characters separated by spaces.
If you just want to find all clusters of word characters, without punctuation or spaces, then, you would use:
/(\w+)/g
Which will find all word-characters that are grouped together.
var regex=/\w+\s+/g;
Live demo: http://jsfiddle.net/GngWn/
[Update] I was just answering the question, but based on the comments this is more likely what you're looking for:
var regex=/\b\w+\b/g;
\b are word boundaries.
Demo: http://jsfiddle.net/GngWn/2/
[Update2] Your edit makes it a completely different question:
string.replace(/\[\[do "([\s\S]+)"\]\]/,'cool("$1")');
Demo: http://jsfiddle.net/GngWn/3/
