Regex to match string inside square brackets, separated by colon - javascript

I'm trying to parse a string that always has the format: [firstpart:lastpart] in such a way that I can get "firstpart" and "lastpart" as separate items. The "firstpart" value is always a string, and the "lastpart" value could contain integers and text. The whole string [firstpart:lastpart] could be surrounded by any amount of other text that I don't need, hence the brackets.
I've been trying to modify this:
([^:\s]+):([^:\s]+)
As is, it gets me this:
[firstpart:lastpart
[firstpart
lastpart]
So I just need to remove the opening and closing brackets from the second and third results.
Is this possible with just a regex? I'm using JavaScript in a TinyMCE plugin, in case that is relevant.

Put \[ and \] at the beginning and end of the regular expression, respectively, and capture the text between them:
console.log(
'foo[firstpart:lastpart]bar'.match(/\[([^:]+):([^:\]]+)\]/)
);
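A small sketch of reading the two captured groups out of the match array. The sample string and the variable names first and last are illustrative assumptions. match returns null when there is no bracketed pair, so that case is checked first.

```javascript
// Illustrative input: the bracketed pair is surrounded by unrelated text
const text = 'Some text [firstpart:lastpart42] more text';

// Group 1: everything up to the colon; group 2: everything up to ] (or another colon)
const match = text.match(/\[([^:]+):([^:\]]+)\]/);

let first = null;
let last = null;
if (match !== null) {
  // Index 0 is the whole "[firstpart:lastpart42]" match, so skip it
  [, first, last] = match;
}
console.log(first, last); // firstpart lastpart42
```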

You could match the opening and the closing bracket outside of the group:
\[([a-z]+):([a-z0-9]+)]
Note that [^:\s]+ matches any character that is not a colon or whitespace, so it matches more than just letters (for the first part) or letters and digits (for the second part). Also, escape the opening bracket as \[ to match it literally; otherwise it would start a character class.
let str = "[firstpart:lastpart]";
console.log(str.match(/\[([a-z]+):([a-z0-9]+)]/i));
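If the surrounding text may contain more than one bracketed pair, the same pattern can be combined with the g flag and String.prototype.matchAll (ES2020) to collect every pair. A sketch; the sample text is made up for illustration.

```javascript
const text = 'intro [alpha:beta1] middle [Gamma:42] end';

// matchAll requires the g flag; i lets [a-z] accept upper-case letters too.
// Each m is [fullMatch, group1, group2].
const pairs = [...text.matchAll(/\[([a-z]+):([a-z0-9]+)]/gi)].map((m) => [m[1], m[2]]);

console.log(pairs); // [['alpha', 'beta1'], ['Gamma', '42']]
```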

Related

Regex to replace all substrings in text

How can I remove all substrings here:
"{SiteName}: Hello, {FirstName}. lallalalallala
{some other text}
{Hook some text (USD)}"
that start with {Hook? What I tried so far is:
mystring.replaceAll('{Hook [A-Z\-0-9\s]{1,}}', '')
Obviously the regex is not correct, but I can't figure out how to fix it.
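If you're trying to match the whole {Hook ...} substring, curly brackets included, you can use this:
\{Hook[^}]*\}
The regex literally matches {Hook in the original string, then any number of characters other than a closing curly bracket ([^}]*), and then the closing bracket itself (\}). A greedy .+ here would instead run to the last } on the line, not the next one. Use it as a regex literal with the g flag, e.g. mystring.replace(/\{Hook[^}]*\}/g, ''); passing a plain string to replaceAll matches that string literally, not as a pattern.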
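A sketch on the sample text from the question, plus a made-up one-line string showing why the negated class matters when another placeholder follows on the same line.

```javascript
const mystring = `{SiteName}: Hello, {FirstName}. lallalalallala
{some other text}
{Hook some text (USD)}`;

// Remove every {Hook ...} block; [^}]* cannot run past the first closing brace
const cleaned = mystring.replace(/\{Hook[^}]*\}/g, '');
console.log(cleaned);

// Greedy .+ versus the negated class on a line with a second placeholder
const line = '{Hook a} keep {other}';
const greedy = line.replace(/\{Hook.+\}/g, '');     // '' (ran to the last })
const bounded = line.replace(/\{Hook[^}]*\}/g, ''); // ' keep {other}'
console.log(JSON.stringify(greedy), JSON.stringify(bounded));
```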

How to find a char only if another one is not before it

I've got a string
{'lalala'} text before \{'lalala'\} {'lalala'} text after
I want to match the opening bracket { but only if there is no escape character \ before it.
Something like /(?:[^\\])\{/, but that doesn't work for the first bracket, at the very start of the string.
The typical approach is to match the non-\ preceding character (or beginning of string), and then put it back in your replacement logic.
const input = String.raw`{'lalala'} text before \{'lalala'\} {'lalala'} text after`;
// Capture the preceding character (or the empty start of string) and put it back
function replace(str) {
  return str.replace(/(^|[^\\])\{'(\w+)'\}/g,
    (_, chr, word) => chr + word.toUpperCase());
}
console.log(replace(input));
The ^ anchor handles the first case: it anchors a piece of regex to the start of the string (or to the start of a line in multiline mode, with the m flag). Because a 'valid' opening bracket is either at the start of the string or after a non-\ character, we can use the following regex:
/(?:^|[^\\])\{/g
I added the g global flag because we want to match all 'valid' opening brackets. Example use:
console.log("{'lalala'} text before \\{'lalala'\\} {'lalala'} text after".match(/(?:^|[^\\])\{/g))
If you want to use the regex in a replace, you might want to capture the character before the bracket, as that gets replaced as well.
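A sketch of that replace approach, using $1 to write the captured preceding character back. The replacement marker < is an arbitrary choice for illustration. For comparison it also shows a negative lookbehind, (?<!\\)\{, which JavaScript supports since ES2018. The two differ on adjacent brackets such as {{, because the capturing version consumes the character before each match.

```javascript
const text = String.raw`{'lalala'} text before \{'lalala'\} {'lalala'} text after`;

// Capture the character before { (or the empty start-of-string match) and put it back with $1
const captured = text.replace(/(^|[^\\])\{/g, '$1<');

// ES2018+ negative lookbehind: asserts "no backslash before" without consuming anything
const lookbehind = text.replace(/(?<!\\)\{/g, '<');

console.log(captured);   // <'lalala'} text before \{'lalala'\} <'lalala'} text after
console.log(lookbehind); // same output as above

// Adjacent brackets: the second { has no unconsumed character before it to capture
console.log('{{'.replace(/(^|[^\\])\{/g, '$1<')); // <{
console.log('{{'.replace(/(?<!\\)\{/g, '<'));     // <<
```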

Select a character if some character from a list is before the character

I have this regular expression:
/([a-záäéěíýóôöúüůĺľŕřčšťžňď])-$\s*/gmi
This regex selects č- from my text:
sme! a Želiezovce 2015: Spoloíč-
ne pre Európu. Oslávili aj 940.
But I want to select only the - (without the č), provided that a character from the list [a-záäéěíýóôöúüůĺľŕřčšťžňď] comes before it.
In other languages you would use a lookbehind:
/(?<=[a-záäéěíýóôöúüůĺľŕřčšťžňď])-$\s*/gmi
This matches -$\s* only if it's preceded by one of the characters in the list.
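However, JavaScript engines older than ES2018 don't support lookbehind (current engines do, so the regex above works there). The workaround for older engines is to use a capturing group for the part of the regular expression after it:
var match = /[a-záäéěíýóôöúüůĺľŕřčšťžňď](-$\s*)/mi.exec(string);
When you use this, match[1] will contain the part of the string beginning with the hyphen. Leave the g flag off here: with g, string.match returns only the full matches, without the groups.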
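A sketch comparing the capture-group workaround with the ES2018 lookbehind on the sample text from the question (the line break after the hyphen is written as \n). The g flag is left off so that exec and match return the groups of the first match.

```javascript
const text = 'sme! a Želiezovce 2015: Spoloíč-\nne pre Európu. Oslávili aj 940.';

// Workaround: the letter is part of the match, but group 1 starts at the hyphen
const m = /[a-záäéěíýóôöúüůĺľŕřčšťžňď](-$\s*)/mi.exec(text);
console.log(JSON.stringify(m[0]), JSON.stringify(m[1])); // "č-\n" "-\n"

// Lookbehind: the letter is checked but not included in the match
const lb = text.match(/(?<=[a-záäéěíýóôöúüůĺľŕřčšťžňď])-$\s*/mi);
console.log(JSON.stringify(lb[0]), lb.index === m.index + 1); // "-\n" true
```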
First, everything you put in parentheses becomes a capturing group, so the match array contains the full matched string at index 0, followed by the text of each group in left-to-right order.
/[a-záäéěíýóôöúüůĺľŕřčšťžňď](-)$\s*/gmi
Run through exec (or through match with the g flag removed, since with g match returns only the full matches), this returns the full match at index 0 and "-" at index 1 for your string, so you can extract the specific part you need.
Also, $ marks the end of a line because you are using the multiline flag, so the \s* after it can only match the line break itself (plus any whitespace at the start of the next line). If you only want the hyphen, leave it out:
/[a-záäéěíýóôöúüůĺľŕřčšťžňď](-)$/gmi
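A quick check, on a shortened version of the question's text, of what \s* after $ actually matches when the m flag is set:

```javascript
const text = 'Spoloíč-\nne pre Európu.';

const withWs = /[a-záäéěíýóôöúüůĺľŕřčšťžňď](-)$\s*/mi.exec(text);
const withoutWs = /[a-záäéěíýóôöúüůĺľŕřčšťžňď](-)$/mi.exec(text);

console.log(JSON.stringify(withWs[0]));    // "č-\n"  (\s* took the line break)
console.log(JSON.stringify(withoutWs[0])); // "č-"
console.log(withWs[1], withoutWs[1]);      // - -
```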

RegExp for BBCode tags javascript

I have this RegExp, and I don't know what's wrong with it:
tag = new RegExp('(\\['+tag+'=("|'|)(.*?)\1\\])((?:.|\\r?\\n)*?)\\[/'+tag+']','g');
The bbcode tags can have double quotation marks, single quotation marks or no quotation marks.
[tag="teste"]123[/tag]
[tag='teste']123[/tag]
[tag=teste]123[/tag]
Desired output in captures: teste and 123
To match the optional quotation marks, it should be ("|'|), (["|\']*) or ("|\'?)?
What's wrong with the string
First, let's correct the syntax in your string.
You need to define the variable tag:
tag = 'tag';
result = new RegExp( <...> );
You have an unbalanced quote in '("|'|) <...> '; it needs to be escaped as ("|\'|).
Also, escape \1 as \\1: inside a string literal, \1 is a legacy octal escape (or a syntax error in strict mode), not a regex backreference.
So now we have the expression '(\\['+tag+'=("|\'|)(.*?)\\1\\])((?:.|\\r?\\n)*?)\\[/'+tag+']' with the value:
(\[tag=("|'|)(.*?)\1\])((?:.|\r?\n)*?)\[/tag]
What's wrong with the RegEx
Only one thing, really. In ("|\'|)(.*?)\\1 you're using \1 to match the same quotation mark as the opening one. However, \1 refers to the first capturing group (counting opening parentheses from left to right), and ("|'|) is actually the second group. All you need to do is change it to \2.
(\[tag=("|'|)(.*?)\2\])((?:.|\r?\n)*?)\[/tag]
That's it!
Let's add some final suggestions
Instead of .*? I would use [^\]]+ (any characters except "]")
Use the i modifier (case-insensitive match, for "[tag]...[/TaG]")
("|'|) is the same as ("|'?)
Instead of (?:.|\r?\n)*? I would use [\s\S]*? as #nhahtdh suggested
Code:
tag = 'tag';
result = new RegExp('(\\['+tag+'=("|\'?)([^\\]]+)\\2\\])([\\s\\S]*?)\\[/'+tag+']','gi');
Alternative: [EDIT: from info added in comments]
result = new RegExp('\\['+tag+'(?:=("|\'?)([^\\]]+)\\1)?\\]([\\s\\S]*?)\\[/'+tag+']', 'gi');
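A sketch that runs the first suggested RegExp against the three sample tags from the question, plus an upper-case variant to exercise the i flag. The helper name bbcodeRegex is hypothetical, and the tag name is assumed to contain no regex metacharacters, since it is concatenated into the pattern unescaped. A fresh RegExp is built per input because a g-flagged regex keeps its lastIndex between exec calls.

```javascript
// Hypothetical helper wrapping the pattern from the answer
// (groups: 1 = opening tag, 2 = quote or empty, 3 = attribute value, 4 = content)
function bbcodeRegex(tag) {
  return new RegExp('(\\[' + tag + '=("|\'?)([^\\]]+)\\2\\])([\\s\\S]*?)\\[/' + tag + ']', 'gi');
}

const samples = [
  '[tag="teste"]123[/tag]',
  "[tag='teste']123[/tag]",
  '[tag=teste]123[/tag]',
  '[TAG=teste]123[/Tag]',
];

const results = samples.map((s) => {
  const m = bbcodeRegex('tag').exec(s);
  return m ? [m[3], m[4]] : null;
});

console.log(results); // every entry is ['teste', '123']
```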
As for your second question: Although both (["|\']*) and ("|\'?) will match, the latter is the correct way for what you're trying to match. The * looks for 0 to infinite repetitions, and the | is interpreted as literal in a character class. Instead, ("|\'?) matches a single quote, a double quote, or none.

How can I remove all content in brackets except entirely numerical content?

I want to take a string and remove all occurrences of characters within square brackets:
[foo], [foo123bar], and [123bar] should be removed
But I want to keep intact any brackets consisting of only numbers:
[1] and [123] should remain
I've tried a couple of things, to no avail:
text = text.replace(/\[^[0-9+]\]/gi, "");
text = text.replace(/\[^[\d]\]/gi, "");
The tool you're looking for is negative lookahead. Here's how you would use it:
text = text.replace(/\[(?!\d+\])[^\[\]]+\]/g, "");
After \[ locates an opening bracket, the lookahead, (?!\d+\]) asserts that the brackets do not contain only digits.
Then, [^\[\]]+ matches anything that's not square brackets, ensuring (for example) that you don't accidentally match "nested" brackets, like [[123]].
Finally, \] matches the closing bracket.
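Applied to the examples from the question (joined into one string for illustration), the lookahead version keeps only the all-digit groups:

```javascript
const text = '[foo] [foo123bar] [123bar] [1] [123]';

// (?!\d+\]) rejects groups that are digits all the way to the closing bracket;
// [^\[\]]+ keeps each match inside a single pair of brackets
const result = text.replace(/\[(?!\d+\])[^\[\]]+\]/g, '');

console.log(JSON.stringify(result)); // "   [1] [123]"
```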
You probably need this:
text = text.replace(/\[[^\]]*[^0-9\]][^\]]*\]/gi, "");
Explanation: you want to keep those sequences within brackets that contain only numbers. An alternative way to say this is to delete those sequences that are 1) enclosed within brackets, 2) contain no closing bracket and 3) contain at least one non-numeric character. The above regex matches an opening bracket (\[), followed by an arbitrary sequence of characters except the closing bracket ([^\]], note that the closing bracket had to be escaped), then a non-numeric character (also excluding the closing bracket), then an arbitrary sequence of characters except the closing bracket, then the closing bracket.
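A side-by-side sketch of this regex and the lookahead one from the previous answer. They agree on the question's examples but treat nested brackets differently; the [[123]] input is an assumption added only to show the difference.

```javascript
const lookahead = (s) => s.replace(/\[(?!\d+\])[^\[\]]+\]/g, '');
const nonDigit = (s) => s.replace(/\[[^\]]*[^0-9\]][^\]]*\]/gi, '');

const sample = '[foo] [foo123bar] [123bar] [foo123] [1] [123]';
console.log(JSON.stringify(nonDigit(sample)));       // "    [1] [123]"
console.log(lookahead(sample) === nonDigit(sample)); // true

// Nested brackets: the inner [ counts as a non-digit for nonDigit
console.log(lookahead('[[123]]')); // [[123]]
console.log(nonDigit('[[123]]'));  // ]
```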
In Python:
import re
text = '[foo] [foo123bar] [123bar] [foo123] [1] [123]'
# [^\]] keeps each match inside one pair of brackets; [^0-9\]] requires at least one non-digit
print(re.sub(r'\[[^\]]*[^0-9\]][^\]]*\]', '', text))
