Suppose I have a text like this:
This is the title: \ titles always have a colon
This is a regular sentence.
A sentence always ends with a period.
A sentence can
span multiple lines.
A sentence can contain numbers like 123.
The phrase can also contain "text enclosed in double quotes"
or 'text enclosed in single quotes'.
Other symbols that may appear in sentences are
the comma ,
the semicolon ;
the dollar sign $
parentheses ( )
the plus sign +
the minus sign -
and the square brackets[ ].
This is an isolated phrase that the regular expression should not match.
How could I create a regex in JavaScript to match the text between the first colon and the last full stop after the brackets [ ]? (Assume there will be no other colon; if there are any, they will be enclosed in double quotes.)
I've tried using :.* but it matches all lines.
Using the s flag, the following regex captures everything after the colon up to the double line break.
/(?<=:).*?(?=(?:\r?\n)+[^\n]*$)/sg
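A minimal sketch of how that pattern behaves on a shortened version of the sample text. It assumes the isolated phrase is separated from the paragraph by a blank line and that the string has no trailing line break; the variable names are made up for illustration.

```javascript
// Shortened sample; the blank line before the last entry is an assumption.
const sampleText = [
  "This is the title: titles always have a colon",
  "This is a regular sentence.",
  "and the square brackets[ ].",
  "",
  "This is an isolated phrase that should not match.",
].join("\n");

// s flag: "." also matches line breaks. Without the m flag, "$" only
// matches at the very end of the string, so the lookahead stops the
// match just before the line break(s) that precede the final line.
const afterColon = /(?<=:).*?(?=(?:\r?\n)+[^\n]*$)/s;

const found = sampleText.match(afterColon);
console.log(found ? found[0].trim() : null);
```

If the input ends with a trailing line break, the lookahead can succeed later than intended, so trimming the input first may be worth considering.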
I'm trying to parse a string that always has the format: [firstpart:lastpart] in such a way that I can get "firstpart" and "lastpart" as separate items. The "firstpart" value is always a string, and the "lastpart" value could contain integers and text. The whole string [firstpart:lastpart] could be surrounded by any amount of other text that I don't need, hence the brackets.
I've been trying to modify this:
([^:\s]+):([^:\s]+)
As is, it gets me this:
[firstpart:lastpart
[firstpart
lastpart]
So I just need to remove the opening and closing brackets from results 2 and 3.
Is this possible with just a regex? I'm using JavaScript in a TinyMCE plugin, in case that is relevant.
Put \[ and \] at the beginning and end of the regular expression, respectively, and capture the text between them:
console.log(
'foo[firstpart:lastpart]bar'.match(/\[([^:]+):([^:\]]+)\]/)
);
You could match the opening and the closing bracket outside of the group:
\[([a-z]+):([a-z0-9]+)]
Note that [^:\s]+ matches any character other than a colon or whitespace, which is broader than a plain string or a string with integers. Also escape the opening bracket as \[ to match it literally; otherwise it would start a character class.
let str = "[firstpart:lastpart]";
console.log(str.match(/\[([a-z]+):([a-z0-9]+)]/i));
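As a rough sketch, the same pattern with the g flag added can pull every [firstpart:lastpart] pair out of a longer string. The sample input and the names bracketPairs and pairs are assumptions made for illustration.

```javascript
// Hypothetical input with two bracketed pairs surrounded by other text.
const surrounded = "foo [firstpart:lastpart] bar [other:42abc] baz";

// The pattern from the answer above, plus the g flag so matchAll can iterate.
const bracketPairs = /\[([a-z]+):([a-z0-9]+)]/gi;

// Each match is [whole, group1, group2]; skip the whole match and keep the groups.
const pairs = Array.from(surrounded.matchAll(bracketPairs), ([, first, last]) => ({
  first,
  last,
}));

console.log(pairs);
```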
I have this RegExp, and I don't know what's wrong with it:
tag = new RegExp('(\\['+tag+'=("|'|)(.*?)\1\\])((?:.|\\r?\\n)*?)\\[/'+tag+']','g');
The bbcode tags can have double quotation marks, single quotation marks or no quotation marks.
[tag="teste"]123[/tag]
[tag='teste']123[/tag]
[tag=teste]123[/tag]
Desired output in captures: teste and 123
To match the optional quotation marks, it should be ("|'|), (["|\']*) or ("|\'?)?
What's wrong with the string
First, let's correct the syntax in your string
You need to define the var tag
tag = 'tag';
result = new RegExp( <...> );
You have unbalanced quotes in '("|'|) <...> '; the inner single quote needs to be escaped, as in ("|\'|)
Also, escape \1 as \\1
so now we have the expression '(\\['+tag+'=("|\'|)(.*?)\\1\\])((?:.|\\r?\\n)*?)\\[/'+tag+']' with the value:
(\[tag=("|'|)(.*?)\1\])((?:.|\r?\n)*?)\[/tag]
What's wrong with the RegEx
Only one thing, really: in ("|\'|)(.*?)\\1 you're using \1 to match the same quotation mark as the opening one. However, \1 refers to the first capturing group (counting opening parentheses from left to right), but ("|'|) is actually the second set of parentheses, i.e. the second group. All you need to do is change it to \2.
(\[tag=("|'|)(.*?)\2\])((?:.|\r?\n)*?)\[/tag]
That's it!
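A quick check of the corrected expression against the three tag styles from the question. This is only a sketch; the names bbTag, bbPattern and captured are assumptions, and the g flag is left out so that match() returns the capture groups.

```javascript
const bbTag = "tag";

// Group 1: the whole opening tag, group 2: the optional quote,
// group 3: the attribute value, group 4: the tag content.
const bbPattern = new RegExp(
  "(\\[" + bbTag + "=(\"|'|)(.*?)\\2\\])((?:.|\\r?\\n)*?)\\[/" + bbTag + "]"
);

const inputs = ['[tag="teste"]123[/tag]', "[tag='teste']123[/tag]", "[tag=teste]123[/tag]"];

// Collect "value content" for each input.
const captured = inputs.map((input) => {
  const m = input.match(bbPattern);
  return m ? m[3] + " " + m[4] : null;
});

console.log(captured);
```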
Let's add some final suggestions
Instead of .*? I would use [^\]]+ (any characters except "]")
Use the i modifier (case-insensitive match, for "[tag]...[/TaG]")
("|'|) is the same as ("|'?)
Instead of (?:.|\r?\n)*? I would use [\s\S]*? as @nhahtdh suggested
Code:
tag = 'tag';
result = new RegExp('(\\['+tag+'=("|\'?)([^\\]]+)\\2\\])([\\s\\S]*?)\\[/'+tag+']','gi');
Alternative: [EDIT: from info added in comments]
result = new RegExp('\\['+tag+'(?:=("|\'?)([^\\]]+)\\1)?\\]([\\s\\S]*?)\\[/'+tag+']', 'gi');
As for your second question: Although both (["|\']*) and ("|\'?) will match, the latter is the correct way for what you're trying to match. The * looks for 0 to infinite repetitions, and the | is interpreted as literal in a character class. Instead, ("|\'?) matches a single quote, a double quote, or none.
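The difference can be seen by anchoring both candidates and testing a few strings. This is a small sketch; the anchors (^ and $) and the names looseQuote, strictQuote and quoteSamples are added here purely for the comparison.

```javascript
// Character class: any run of ", | or ' (the | is a literal pipe here).
const looseQuote = /^["|']*$/;
// Alternation: exactly one double quote, or an optional single quote.
const strictQuote = /^("|'?)$/;

const quoteSamples = ['"', "'", "", "|", "\"'\""];

const comparison = quoteSamples.map((s) => ({
  sample: s,
  loose: looseQuote.test(s),
  strict: strictQuote.test(s),
}));

console.log(comparison);
```

The last two samples (a lone pipe and a mixed run of quotes) are accepted only by the character class version.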
Given an input text where every space has been replaced by n underscores (_):
Hello_world_?. Hello_other_sentenc3___. World___________.
I want to keep the _ between words, but I want to attach each punctuation mark to the last word of its sentence, with no space between the last word and the punctuation. I want to use the punctuation as the pivot of my regex.
I wrote the following JS-Regex:
str = str.replace(/(_| )*([:punct:])*( |_)/g, "$2$3");
This fails, since it returns :
Hello_world_?. Hello_other_sentenc3_. World_._
Why doesn't it work? How can I delete all the "_" between the last word and the punctuation?
http://jsfiddle.net/9c4z5/
Try the following regex, which makes use of a positive lookahead:
str = str.replace(/_+(?=\.)/g, "");
It replaces all underscores which are immediately followed by a period with the empty string, thus removing them.
If you want to match other punctuation characters than just the period, replace the \. part with an appropriate character class.
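For instance, a sketch with a small character class in the lookahead, run on the sample from the question. The particular set of punctuation characters is an assumption; adjust it to whatever the text actually contains.

```javascript
const underscored = "Hello_world_?. Hello_other_sentenc3___. World___________.";

// Remove runs of underscores only when a listed punctuation mark follows.
// The lookahead does not consume the punctuation, so it stays in place.
const tightened = underscored.replace(/_+(?=[.,;:!?])/g, "");

console.log(tightened);
```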
JavaScript doesn't have :punct: in its regex implementation. I believe you'd have to list out the punctuation characters you care about, perhaps something like this:
str = str.replace(/(_| )+([.,?])/g, "$2");
That is, replace any group of _ or space that is immediately followed by punctuation with just the punctuation.
Demo: http://jsfiddle.net/9c4z5/2/
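To illustrate why the original attempt misbehaved: in JavaScript, [:punct:] is read as an ordinary character class containing the characters ':', 'p', 'u', 'n', 'c' and 't'. The sketch below shows that, then applies the listed-punctuation replacement to the sample input (variable names are illustrative).

```javascript
// Not a POSIX class in JavaScript: just a set of six literal characters.
const posixLike = /[:punct:]/;
const matchesPeriod = posixLike.test("."); // the period is not in the set
const matchesLetterP = posixLike.test("p"); // the letter p is in the set

const spaced = "Hello_world_?. Hello_other_sentenc3___. World___________.";
// Replace underscores/spaces that precede listed punctuation with the punctuation alone.
const joined = spaced.replace(/(_| )+([.,?])/g, "$2");

console.log(matchesPeriod, matchesLetterP, joined);
```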
I'm trying this:
str = "bla [bla]";
str = str.replace(/\\[\\]/g,"");
console.log(str);
And the replace doesn't work, what am I doing wrong?
UPDATE: I'm trying to remove any square brackets in the string,
what's weird is that if I do
replace(/\[/g, '')
replace(/\]/g, '')
it works, but
replace(/\[\]/g, ''); doesn't.
It should be:
str = str.replace(/\[.*?\]/g,"");
You don't need double backslashes (\\) because this is a regex literal, not a string. If you build the regex from a string, you do need the double backslashes.
It was also interpreting the 1 literally (which wasn't matching). Using .*? matches any value between the square brackets.
The new RegExp string build version would be:
str=str.replace(new RegExp("\\[.*?\\]","g"),"");
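A small sketch comparing the two forms, and showing what happens when the string version uses single backslashes by mistake. The variable names are made up for illustration.

```javascript
const fromLiteral = /\[.*?\]/g;
const fromString = new RegExp("\\[.*?\\]", "g");

// Both forms describe the same pattern.
const samePattern = fromLiteral.source === fromString.source;

// With single backslashes the string escape is consumed by the string parser,
// so the regex engine sees [.*?], a character class of '.', '*' and '?'.
const singleEscape = new RegExp("\[.*?\]", "g");

const removedGroup = "bla [bla]".replace(fromString, "");
const unchanged = "bla [bla]".replace(singleEscape, "");

console.log(samePattern, singleEscape.source, JSON.stringify(removedGroup), unchanged);
```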
UPDATE: To remove square brackets only:
str = str.replace(/\[(.*?)\]/g,"$1");
Your above code isn't working, because it's trying to match "[]" (sequentially without anything allowed between). We can get around this by non-greedy group-matching ((.*?)) what's between the square brackets, and using a backreference ($1) for the replacement.
UPDATE 2: To remove multiple square brackets
str = str.replace(/\[+(.*?)\]+/g,"$1");
// bla [bla] [[blaa]] -> bla bla blaa
// bla [bla] [[[blaa] -> bla bla blaa
Note this doesn't match open/close quantities, simply removes all sequential opens and closes. Also if the sequential brackets have separators (spaces etc) it won't match.
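The limitation can be seen with a made-up input in which the nested brackets are separated by spaces. This is only a sketch; the sample strings and variable names are assumptions.

```javascript
const stripPairs = (s) => s.replace(/\[+(.*?)\]+/g, "$1");

// Sequential brackets are handled as described above.
const sequential = stripPairs("bla [bla] [[blaa]]");

// Brackets separated by spaces: the lazy group swallows the inner "[",
// and the trailing "]" is never matched, so both are left behind.
const separated = stripPairs("a [ [b] ] c");

console.log(sequential);
console.log(separated);
```

When the goal is simply to drop every bracket regardless of pairing, removing the two characters individually avoids this case.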
You have to escape the bracket, like \[ and \]. Check out http://regexpal.com/. It's pretty useful :)
To replace all brackets in a string, this should do the job:
str.replace(/\[|\]/g,'');
I hope this helps.
Hristo
Here's a trivial example that worked for me. Escape each square bracket, then enclose both in a bracket expression (a character class) to match every instance.
const stringWithBrackets = '[]]][[]]testing][[]][';
const stringWithReplacedBrackets = stringWithBrackets.replace(/[\[\]]/g, '');
console.log(stringWithReplacedBrackets);
Two backslashes produce a single literal backslash, so you're searching for a backslash followed by a character class consisting of a 1 or a backslash.
Try
str.replace(/\[1\]/g, '');
What exactly are you trying to match?
If you don't escape the brackets, they are considered character classes. This:
/[1\\]/
Matches either a 1 or a backslash. You may want to escape them with one backslash only:
/\[1\]/
But this won't match either, as you don't have a [1] in your string.
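A short sketch of the two readings, using illustrative variable names:

```javascript
// Unescaped brackets: a character class containing '1' and a backslash.
const oneOrBackslash = /[1\\]/;
// Escaped brackets: the literal three-character sequence "[1]".
const literalBracketed = /\[1\]/;

const classChecks = [
  oneOrBackslash.test("1"), // '1' is in the class
  oneOrBackslash.test("\\"), // a single backslash is in the class
  oneOrBackslash.test("["), // '[' is not in the class
];
const literalChecks = [
  literalBracketed.test("x[1]y"), // contains "[1]"
  literalBracketed.test("bla [bla]"), // no "[1]" anywhere
];

console.log(classChecks, literalChecks);
```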
I stumbled on this question while dealing with square bracket escaping within a character class that was designed for use with password validation requiring the presence of special characters.
Note the double escaping:
var regex = new RegExp('[\\]]');
As @rudu mentions, this expression is within a string so it must be double escaped. Note that the quoting type (single/double) is not relevant here.
Here is an example of using square brackets in a character class that tests for all the special characters found on my keyboard:
var regex = new RegExp('[-,_,\',",;,:,!,#,#,$,%,^,&,*,(,),[,\\],\?,{,},|,+,=,<,>,~,`,\\\\,\,,\/,.]', 'g')
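A compact variant as a sketch. Inside a character class a comma is just another literal character, not a separator, so each symbol only needs to appear once. The exact set of symbols below is an assumption for illustration, and hasSpecial is a hypothetical helper name.

```javascript
// Built from a string: backslashes meant for the regex engine are doubled.
const specialChars = new RegExp("[-_'\";:!@#$%^&*()\\[\\]?{}|+=<>~`\\\\,/.]");

// The same class as a regex literal needs only single backslashes.
const specialCharsLiteral = /[-_'";:!@#$%^&*()\[\]?{}|+=<>~`\\,\/.]/;

// Hypothetical helper: true when the candidate contains a listed symbol.
const hasSpecial = (candidate) => specialChars.test(candidate);

console.log(hasSpecial("abc123"), hasSpecial("pa$$word"), hasSpecial("a]b"));
```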
How about the following?
str = "bla [bla]";
str = str.replace(/[[\]]/g,'');
You create a character set with just the two characters you are interested in and do a global replace.
Nobody quite made it simple and correct:
str.replace(/[[\]]/g, '');
Note the use of a character class, with no escape for the open bracket, and a single backslash escape for the close bracket.
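As a sanity check, the sketch below runs three spellings of "either bracket" on the same input: the class with only the closing bracket escaped, the class with both escaped, and the alternation. The sample string and names are illustrative.

```javascript
const messy = "[]]][[]]testing][[]][";

const bracketPatterns = [
  /[[\]]/g, // character class, only "]" escaped
  /[\[\]]/g, // character class, both escaped
  /\[|\]/g, // alternation of two escaped brackets
];

// Apply each pattern to the same input and collect the results.
const stripped = bracketPatterns.map((pattern) => messy.replace(pattern, ""));

console.log(stripped);
```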