I want to take a string and remove every group of characters enclosed in square brackets, brackets included:
[foo], [foo123bar], and [123bar] should be removed
But any brackets that contain only digits should stay intact:
[1] and [123] should remain
I've tried a couple of things, to no avail:
text = text.replace(/\[^[0-9+]\]/gi, "");
text = text.replace(/\[^[\d]\]/gi, "");
The tool you're looking for is negative lookahead. Here's how you would use it:
text = text.replace(/\[(?!\d+\])[^\[\]]+\]/g, "");
After \[ matches an opening bracket, the negative lookahead (?!\d+\]) asserts that the brackets do not contain only digits.
Then [^\[\]]+ matches one or more characters that are not square brackets, which ensures (for example) that you don't accidentally match "nested" brackets like [[123]].
Finally, \] matches the closing bracket.
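A quick check of the pattern against the examples from the question, as a sketch; the helper name stripBrackets and the extra test string are illustrative only. Note that the spaces between removed groups are left in place.

```javascript
// Remove every [...] group unless its contents are only digits.
const stripBrackets = (text) => text.replace(/\[(?!\d+\])[^\[\]]+\]/g, "");

const sample = "[foo] [foo123bar] [123bar] [1] [123]";
console.log(JSON.stringify(stripBrackets(sample))); // "   [1] [123]"

// Nested brackets are left alone: [^\[\]]+ cannot step over the inner "[".
console.log(stripBrackets("[[123]]")); // [[123]]
```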
You probably need this:
text = text.replace(/\[[^\]]*[^0-9\]][^\]]*\]/g, "");
Explanation: you want to keep the bracketed sequences that contain only digits. Put another way, you want to delete the sequences that 1) are enclosed in brackets, 2) contain no closing bracket in between, and 3) contain at least one non-digit character. The regex above matches an opening bracket (\[), then any run of characters other than a closing bracket ([^\]]; note that the closing bracket has to be escaped), then one character that is neither a digit nor a closing bracket, then another run of characters other than a closing bracket, and finally the closing bracket.
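A minimal sketch of the same idea; removeNonNumeric is a hypothetical name. The inputs are made up to show that a single non-digit anywhere inside the brackets is enough for the group to be removed, while an all-digit group survives.

```javascript
// "[" + non-"]" chars + one char that is neither a digit nor "]" + non-"]" chars + "]"
const removeNonNumeric = (text) => text.replace(/\[[^\]]*[^0-9\]][^\]]*\]/g, "");

console.log(removeNonNumeric("x[foo]y[42]")); // xy[42]

// A letter between digits still counts as a non-digit.
console.log(JSON.stringify(removeNonNumeric("[12a34] [5]"))); // " [5]"
```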
In Python 3:
import re
text = '[foo] [foo123bar] [123bar] [foo123] [1] [123]'
# [^\]] keeps each match inside a single pair of brackets; [^0-9\]] requires at least one non-digit
print(re.sub(r'\[[^\]]*[^0-9\]][^\]]*\]', '', text))
I'm trying to parse a string that always has the format: [firstpart:lastpart] in such a way that I can get "firstpart" and "lastpart" as separate items. The "firstpart" value is always a string, and the "lastpart" value could contain integers and text. The whole string [firstpart:lastpart] could be surrounded by any amount of other text that I don't need, hence the brackets.
I've been trying to modify this:
([^:\s]+):([^:\s]+)
As is, it gets me this:
[firstpart:lastpart
[firstpart
lastpart]
So I just need to remove the opening and closing brackets from the second and third lines.
Is this possible with just a regex? I'm using JavaScript in a TinyMCE plugin, in case that is relevant.
Put \[ and \] at the beginning and end of the regular expression, respectively, and capture the text between them:
console.log(
'foo[firstpart:lastpart]bar'.match(/\[([^:]+):([^:\]]+)\]/)
);
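If you want the two parts as separate variables, one option (a sketch; parsePair is a hypothetical helper, not part of TinyMCE) is to destructure the match array and skip index 0, which holds the whole match:

```javascript
// Hypothetical helper wrapping the regex above.
function parsePair(text) {
  const m = text.match(/\[([^:]+):([^:\]]+)\]/);
  if (m === null) return null; // no [x:y] in the text
  const [, first, last] = m;   // index 0 is the whole "[x:y]" match
  return { first, last };
}

console.log(parsePair("foo[firstpart:lastpart]bar")); // { first: 'firstpart', last: 'lastpart' }
console.log(parsePair("no brackets here"));           // null
```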
You could match the opening and the closing bracket outside of the group:
\[([a-z]+):([a-z0-9]+)]
Note that [^:\s]+ matches any character that is not a colon or whitespace, which covers more than just letters (or letters and digits); in particular, it also matches the brackets. Also escape the opening \[ so it is matched literally; otherwise it would start a character class.
let str = "[firstpart:lastpart]";
console.log(str.match(/\[([a-z]+):([a-z0-9]+)]/i));
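To see why the original pattern kept the brackets, this sketch runs both patterns on a sample string (made up for illustration) and keeps only the capture groups:

```javascript
const str = "some text [firstpart:lastpart] more text";

// [^:\s] accepts "[" and "]", so the brackets end up inside the groups.
const loose = str.match(/([^:\s]+):([^:\s]+)/).slice(1);
console.log(loose); // [ '[firstpart', 'lastpart]' ]

// Matching the brackets outside the groups keeps them out of the captures.
const strict = str.match(/\[([a-z]+):([a-z0-9]+)]/i).slice(1);
console.log(strict); // [ 'firstpart', 'lastpart' ]
```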
I want to ignore the characters inside square brackets, because some of them match my split delimiters.
The string I want to split is:
var str = "K1.1.[Other] + K1.2A.[tcc + K*-=>]";
var split = str.split(/[+|,|*|/||>|<|=|-]+/);
I want the output to be K1.1.[Other] and K1.2A.[tcc + K*-=>].
But the code above also splits on the characters inside the square brackets, which I don't want. Any suggestions on how to solve this?
Split on the following pattern: /\+(?![^\[]*\])/
https://regex101.com/r/NZKaKD/1
Explanation:
\+ - A literal plus sign
(?! ... ) - Negative lookahead (don't match the previous character/group if it is followed by the contents of this block)
[^\[]* - Any number of non-left-square-brackets
\] - A literal right square bracket
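The pattern above only splits on +. Assuming you want the full operator set from your own regex (+ , * / | < > = -), here is a sketch of the same lookahead applied after a character class; swallowing the whitespace around each operator is an extra assumption. Like the original, it assumes the brackets are balanced and not nested.

```javascript
const str = "K1.1.[Other] + K1.2A.[tcc + K*-=>]";

// A run of operator characters (plus surrounding whitespace), but only when the
// lookahead finds no "]" before the next "[", i.e. the operator is not inside [...].
const splitter = /\s*[-+,*\/|<>=]+(?![^\[]*\])\s*/;

const parts = str.split(splitter);
console.log(parts); // [ 'K1.1.[Other]', 'K1.2A.[tcc + K*-=>]' ]
```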
Split on both the plus signs and the brackets, then go through the chunks and rejoin everything between each pair of brackets.
But it's better not to use a regex for this at all.
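A minimal sketch of the non-regex approach, assuming the same separator characters as the question's regex and balanced brackets: walk the string once, track the bracket depth, and treat an operator as a separator only at depth 0. splitOutsideBrackets is a hypothetical name.

```javascript
// Hypothetical helper: split on separator characters only when they are outside [...].
function splitOutsideBrackets(str, separators = "+,*/|<>=-") {
  const parts = [];
  let depth = 0;    // how many "[" are currently open
  let current = ""; // the chunk being built

  for (const ch of str) {
    if (ch === "[") depth++;
    else if (ch === "]" && depth > 0) depth--;

    if (depth === 0 && separators.includes(ch)) {
      // Separator at top level: finish the current chunk, skipping empty ones.
      if (current.trim() !== "") parts.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  if (current.trim() !== "") parts.push(current.trim());
  return parts;
}

console.log(splitOutsideBrackets("K1.1.[Other] + K1.2A.[tcc + K*-=>]"));
// [ 'K1.1.[Other]', 'K1.2A.[tcc + K*-=>]' ]
```

Because it counts depth instead of looking ahead, this version also copes with nested brackets.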
I am trying to split a string in JS on spaces, except when the space is inside a quote. However, an unclosed quote should be kept as part of its token. I'm not skilled in regex wizardry and have been using the regex below:
var list = text.match(/[^\s"]+|"([^"]*)"/g)
However, if I provide input like 'sdfj "sdfjjk', it becomes ["sdfj","sdfjjk"] rather than ["sdfj","\"sdfjjk"].
You can use
var re = /"([^"]*)"|\S+/g;
Using \S (equivalent to [^\s]) simply drops the " from the negated character class, so a stray quote can be part of a token.
By placing the "([^"]*)" pattern before \S+, we make sure quoted substrings are matched whole instead of being torn apart by \S+. This works if the string contains properly paired quoted substrings and only the last quote is unpaired.
Demo:
var re = /"([^"]*)"|\S+/g;
var str = 'sdfj "sdfjjk';
console.log(JSON.stringify(str.match(re)));
Note that to get the captured texts between the quotes, you need to call RegExp#exec in a loop, because String#match with the g flag drops the capture groups.
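A sketch of that exec loop on a sample input with one paired and one unpaired quote; it keeps the inner text for quoted tokens and the raw token otherwise:

```javascript
const re = /"([^"]*)"|\S+/g;
const str = 'abc "def ghi" lmn "opq';

const tokens = [];
let m;
while ((m = re.exec(str)) !== null) {
  // m[1] holds the text between the quotes; it is undefined when \S+ matched instead.
  tokens.push(m[1] !== undefined ? m[1] : m[0]);
}
console.log(tokens); // [ 'abc', 'def ghi', 'lmn', '"opq' ]
```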
UPDATE
I'm not sure what the downvoter objected to, but let me guess. Quotes are usually placed right next to word characters, and even a "wild" (unpaired) quote normally sits right before or after a word.
So, we can utilize word boundaries like this:
"\b[^"]*\b"|\S+
Here, "\b[^"]*\b" matches a " that is followed by a word character, then zero or more characters other than ", and then a " that is preceded by a word character.
Taking this further, we can get to:
\B"\b[^"\n]*\b"\B|\S+
With \B" we require the opening " to be preceded by a non-word character (or the start of the string), and with "\B the closing " must be followed by a non-word character (or the end of the string).
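A small comparison, assuming an input with a stray quote surrounded by spaces (made up for illustration). The plain pattern pairs the stray quote with the next one, while the word-boundary version treats it as a token of its own:

```javascript
const input = 'a " b "c d"';

// Original pattern: any "..." pair, otherwise a run of non-whitespace.
const plain = /"([^"]*)"|\S+/g;

// Word-boundary variant: the opening " must be followed by a word character and the
// closing " preceded by one; \B outside the quotes requires non-word characters there.
const bounded = /\B"\b[^"\n]*\b"\B|\S+/g;

console.log(input.match(plain));   // [ 'a', '" b "', 'c', 'd"' ]
console.log(input.match(bounded)); // [ 'a', '"', 'b', '"c d"' ]
```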
A lot depends on what specific issue you have with your specific input!
Try the following:
(text.match(/".*?"|[^\s]+/g) || []).map(s => s.replace(/^"(.*)"$/, "$1"))
This repeatedly finds either a properly quoted substring (tried first) or another run of non-whitespace characters. The map step removes the quotes around the quoted substrings, and the || [] guards against match returning null when the text contains no tokens.
> text = 'abc "def ghi" lmn "opq'
< ["abc", "def ghi", "lmn", "\"opq"]
What's wrong with this regular expression?
/^[a-zA-Z\d\s&#-\('"]{1,7}$/;
When I enter the following valid input, it fails:
a&'-#"2
Also, the input must not contain two consecutive spaces.
The dash needs to be either escaped (\-) or placed at the end of the character class, or it will signify a range (as in A-Z), not a literal dash:
/^[A-Z\d\s&#('"-]{1,7}$/i
would be a better regex.
N.B.: [#-\(] would have matched #, $, %, &, ' or (.
To address the added requirement of not allowing two consecutive spaces, use a lookahead assertion:
/^(?!.*\s{2})[A-Z\d\s&#('"-]{1,7}$/i
(?!.*\s{2}) means "Assert that it's impossible to match (from the current position) any string followed by two whitespace characters". One caveat: The dot doesn't match newline characters.
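A quick sanity check of the final regex, as a sketch; apart from the input from the question, the test strings are made up:

```javascript
const re = /^(?!.*\s{2})[A-Z\d\s&#('"-]{1,7}$/i;

console.log(re.test(`a&'-#"2`)); // true: the input from the question
console.log(re.test("a  b"));     // false: two consecutive spaces
console.log(re.test("abcdefgh")); // false: 8 characters, more than {1,7} allows
```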
The - (hyphen) has a special meaning inside a character class: it specifies a range. Did you mean to escape it?
/^[a-zA-Z\d\s&#\-\('"]{1,7}$/;
This RegExp matches your input.
You have an unescaped - in the middle of your character class, so you're actually matching every character from # to ( inclusive, namely #, $, %, &, ' and (. Either move the hyphen to the end or escape it with a backslash. Your regex should read:
/^[a-zA-Z\d\s&#\('"-]{1,7}$/
or
/^[a-zA-Z\d\s&#\-\('"]{1,7}$/
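To confirm what the unescaped hyphen does, this sketch enumerates the range by character code and then tests a literal hyphen against both versions of the class:

```javascript
// List every character in the range "#-(" created by the unescaped hyphen.
const from = "#".charCodeAt(0); // 0x23
const to = "(".charCodeAt(0);   // 0x28
let range = "";
for (let c = from; c <= to; c++) range += String.fromCharCode(c);
console.log(range); // #$%&'(

// The hyphen itself (0x2D) is outside that range, so only the escaped class accepts it.
console.log(/^[#-\(]$/.test("-"));  // false
console.log(/^[#\-\(]$/.test("-")); // true
```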
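Remove the ; at the end and escape the hyphen (note that this version uses + instead of {1,7}, so it no longer limits the length):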
^[a-zA-Z\d\s\&\#\-\(\'\"]+$
Your input does not match the regular expression. The problem is the hyphen in your regex. If you move it from its position after the # character to the start of the character class, like so:
/^[-a-zA-Z\d\s&#\('"]{1,7}$/;
everything is fine and dandy.
You can always use Rubular for checking your regular expressions. I use it on a regular (no pun intended) basis.
I have this kind of expression:
var string = "[a][1] [b][2] [c][3] [d .-][] [e][4]";
I would like to match the fourth element, [d .-][], which may contain any characters (letters, numbers, punctuation, etc.) in the first pair of brackets, while the second pair of brackets is empty. The other elements, for example [a][1], may also contain any characters, but they do have a number inside the second pair of brackets.
I tried this:
string.match(/\\[[^]+]\\[ ]/);
but it is too greedy.
Any help would be appreciated.
I would like to match the fourth element [d .-][] which may contain any character (letters, numbers, punctuation, etc.) within the first pair of brackets but the second pair of brackets remains empty
string.match(/\[[^\]]*\]\[\]/)
should do it.
To break it down,
\[ matches a literal left square bracket,
[^\]]* matches any number of characters other than a right square bracket,
\] matches a literal right square bracket, and
\[\] matches the two character sequence [], square brackets with nothing in between.
To answer your question about greediness, though: you can make the greedy match [^]+ non-greedy by adding a question mark: [^]+?. Be aware, however, that [^] does not work in IE. To match any UTF-16 code unit I tend to use [\s\S], which is a bit more verbose but works in all browsers.
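The sketch below contrasts the two approaches on the question's string (quoted so it is valid JavaScript). It suggests that a lazy quantifier alone does not isolate the fourth element: the engine still returns the leftmost match, which starts at the first [, and laziness only makes it stop at the first ][] it reaches.

```javascript
const string = "[a][1] [b][2] [c][3] [d .-][] [e][4]";

// Negated class: the first group cannot contain "]", so a match cannot span earlier elements.
const tight = string.match(/\[[^\]]*\]\[\]/)[0];
console.log(tight); // [d .-][]

// Lazy quantifier: still starts at the leftmost "[" and only stops at the first "][]".
const lazy = string.match(/\[[\s\S]+?\]\[\]/)[0];
console.log(lazy); // [a][1] [b][2] [c][3] [d .-][]
```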