RegEx, extract text between keys

Given:
This is some text which could have
line breaks and tabs before and after {code}
and I want them {code} to be replaced {code}in pairs{code} without any issues.
I want:
This is some text which could have
line breaks and tabs before and after <code>
and I want them </code> to be replaced <code>in pairs</code> without any issues.
JsFiddle: http://jsfiddle.net/egrwD/1
Simple working text sample:
var sample1 = 'test test test {code}foo bar{code} {code}good to know{code}';
var regEx1 = new RegExp('(\{code\})(.*?)(\{code\})', 'gi');
var r1 = sample1.replace(regEx1, '<code>$2</code>');
Gives:
test test test <code>foo bar</code> <code>good to know</code>
Non working sample:
var sample2 = 'test test test {code}\tfoo bar{code} {code}\r\ngood to know{code}';
var regEx2 = new RegExp('(\{code\})(.*?)(\{code\})', 'gi');
var r2 = sample2.replace(regEx2, '<code>$2</code>');
Gives:
test test test {code} foo bar{code} {code}
good to know{code}

Looks like you just need to make the pattern match across line breaks (. does not match \r or \n, while [\s\S] matches any character) and use a regex literal. In a string literal, \{ collapses to a plain {, so every backslash would otherwise have to be doubled:
/(\{code\})([\s\S]*?)(\{code\})/gi
http://jsfiddle.net/mattball/QNak5
Note that you don't even need the capturing parentheses around the {code}s:
/\{code\}([\s\S]*?)\{code\}/gi
http://jsfiddle.net/mattball/Jk5cr
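A quick check of the fix against the question's second sample. The s (dotAll) flag, available since ES2018, is shown as an alternative to [\s\S]; it is an addition here, not part of the answer above.

```javascript
// The question's second sample: a tab and a CRLF sit inside the {code} pairs
const sample2 = 'test test test {code}\tfoo bar{code} {code}\r\ngood to know{code}';

// [\s\S] matches any character, including \r and \n, which "." does not
const viaClass = sample2.replace(/\{code\}([\s\S]*?)\{code\}/gi, '<code>$1</code>');

// ES2018+: the s (dotAll) flag lets "." match line terminators as well
const viaDotAll = sample2.replace(/\{code\}(.*?)\{code\}/gis, '<code>$1</code>');

console.log(viaClass === viaDotAll); // true
console.log(JSON.stringify(viaClass));
```

Both versions keep the tab and the line break inside the <code> element; only the surrounding {code} markers are replaced.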

Related

Regex that allows a pattern to start with an optional, specific character, but no other character

How can I write a regex that allows a pattern to start with a specific character, but that character is optional?
For example, I would like to match all instances of the word "hello" where "hello" is either at the very start of the line or preceded by an "!", in which case it does not have to be at the start of the line. So the first three options here should match, but not the last:
hello
!hello
some other text !hello more text
ahello
I'm specifically interested in JavaScript.

Match it with: /^hello|!hello/g
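The ^ will only grab the word "hello" if it's at the very beginning of the string; add the m flag (/^hello|!hello/gm) if it should also match at the beginning of every line.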
The | works as an OR.
var str = "hello\n!hello\n\nsome other text !hello more text\nahello";
var regex = /^hello|!hello/g;
console.log( str.match(regex) );
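Without the m flag, ^ anchors only at the start of the whole string, which the sample text hides because its only line-initial "hello" is also the first line. A quick comparison; the extra last line "hello again" is a hypothetical addition, not part of the question:

```javascript
// The question's sample plus one extra line ("hello again") added for illustration
const str = "hello\n!hello\n\nsome other text !hello more text\nahello\nhello again";

// Without "m", ^ only matches at index 0 of the whole string
const noM = str.match(/^hello|!hello/g);

// With "m", ^ also matches right after every line break
const withM = str.match(/^hello|!hello/gm);

console.log(noM);   // [ 'hello', '!hello', '!hello' ]
console.log(withM); // [ 'hello', '!hello', '!hello', 'hello' ]
```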
Edit:
If you're trying to match the whole line beginning with "hello" or containing "!hello" as suggested in the comment below, then use the following regex:
/^.*(^hello|!hello).*$/gm
var str = "hello\n!hello\n\nsome other text !hello more text\nahello";
var regex = /^.*(^hello|!hello).*$/gm;
console.log(str.match(regex));

Final solution (hopefully)
Looks like collecting the capture groups of every match (String.prototype.matchAll) is only available from ECMAScript 2020. Link 1, Link 2.
As a workaround I've found the following solution:
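const str = `hello
!hello
some other text !hello more text
ahello
this is a test hello !hello
JvdV is saying hello
helloing or helloed =).`;
// replace() is used only for its callback, which receives the capture groups of every match
function collectGroups(regExp, str) {
  const groups = [];
  str.replace(regExp, (fullMatch, group1, group2) => {
    // only one alternative matches, so exactly one group is defined
    groups.push(group1 || group2);
  });
  return groups;
}
// m: ^ matches at the start of every line; \b: "helloing" and "helloed" are skipped
const regex = /^(hello\b)|(?:!)(hello\b)/gm;
const groups = collectGroups(regex, str);
console.log(groups); // [ 'hello', 'hello', 'hello', 'hello' ]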
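For runtimes that do support ES2020, String.prototype.matchAll returns every match together with its capture groups, so the replace() callback trick is not needed. A sketch, assuming ES2020 support (matchAll and ??), using the same text and the /gm regex from the Update:

```javascript
const str = `hello
!hello
some other text !hello more text
ahello
this is a test hello !hello
JvdV is saying hello
helloing or helloed =).`;

// Group 1 = "hello" at a line start, group 2 = "hello" right after "!"
const regex = /(^hello\b)|(?:!)(hello\b)/gm;

// matchAll requires the g flag and yields one match array per hit
const words = [];
for (const match of str.matchAll(regex)) {
  // Whichever alternative matched defines its group; the other is undefined
  words.push(match[1] ?? match[2]);
}

console.log(words); // [ 'hello', 'hello', 'hello', 'hello' ]
```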
/(?=!)?(\bhello\b)/g should do it. Playground.
Example:
const regexp = /(?=!)?(\bhello\b)/g;
const str = `
hello
!hello
some other text !hello more text
ahello
`;
const found = str.match(regexp)
console.log(found)
Explanation:
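(?=!)?
(?=!) positive lookahead for ! (it checks the character after the current position, not the one before it)
? makes the lookahead optional; since a lookahead consumes no characters, an optional one never restricts the match, so in practice this part has no effect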
(\bhello\b): capturing group
\b word boundary ensures that hello is not preceded or followed by a word character (a letter, digit, or underscore)
Note: if you also want to make sure that hello is not followed by !, you could simply add a negative lookahead like so: /(?=!)?(\bhello\b)(?!!)/g.
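To see what the optional lookahead contributes, here is a comparison with a plain /\bhello\b/g on the example text plus one extra line, "this is a test hello", which per the question should not match (the extra line is an illustration added here):

```javascript
// The answer's example text plus one extra line that should NOT match
// ("hello" there is neither at a line start nor preceded by "!")
const str = `
hello
!hello
some other text !hello more text
ahello
this is a test hello
`;

const withOptionalLookahead = str.match(/(?=!)?(\bhello\b)/g);
const plainWordBoundary = str.match(/\bhello\b/g);

// Both patterns find the same four matches, including the one on the extra line
console.log(withOptionalLookahead.length, plainWordBoundary.length); // 4 4
```

Because the results are identical, the \b word boundaries are doing all the work, which is why the pattern has to be anchored differently to reject a "hello" in the middle of a line.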
Update
Thanks to the hint from @JvdV in the comments, I've adapted the regex, which should now meet your requirements:
/(^hello\b)|(?:!)(hello\b)/gm
Playground: https://regex101.com/r/CXXPHK/4 (The explanation can be found on the page as well).
Update 2:
Note that the non-capturing group (?:!) still consumes the !, so it is part of the overall match; match() with the g flag returns whole matches rather than groups, which gives a result like ["hello", "!hello", "!hello", "!hello"], where ! is included. Either way, here is a workaround:
const regex = /(^hello\b)|(?:!)(hello\b)/gm;
const found = (str.match(regex) || []).map(m => m.replace(/^!/, ''));
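If the target runtime supports lookbehind (ES2018), the ! can be checked without becoming part of the match, so the cleanup map() is not needed. A sketch, assuming that support; the alternation inside (?<=^|!) accepts either a line start (thanks to the m flag) or a preceding !:

```javascript
const str = `hello
!hello
some other text !hello more text
ahello
this is a test hello !hello
JvdV is saying hello
helloing or helloed =).`;

// (?<=^|!): the position must be at a line start (m flag) or right after "!".
// Lookbehind is zero-width, so "!" is checked but not included in the match.
const regex = /(?<=^|!)hello\b/gm;
const found = str.match(regex) || [];

console.log(found); // [ 'hello', 'hello', 'hello', 'hello' ]
```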

Javascript get all text in between string

I have string content that gets delivered to me via TCP. This info is only relevant because it means that I do not consistently retrieve the same string. I have a <start> and <stop> separator to ensure that any time I get the data via TCP, I am outputting the full content.
My incoming content looks like so:
<start>Apple Bandana Cadillac<stop>
I want to get everything in between <start> and <stop>. So just Apple Bandana Cadillac.
My script to do this looks like so:
servercsv.on("connection", function(socket){
  let d_basic = "";
  socket.on('data', function(data){
    d_basic += data.toString();
    let d_csvindex = d_basic.indexOf('<stop>');
    while (d_csvindex > -1){
      try {
        let strang = d_basic.substring(0, d_csvindex);
        let dyson = strang.replace(/<start>/g, '');
        let dson = papaparse.parse(dyson);
        myfunction(dson);
      }
      catch(e){ console.log(e); }
      d_basic = d_basic.substring(d_csvindex+1);
      d_csvindex = d_basic.indexOf('<stop>');
    }
  });
});
What this means is that I am getting everything before the <stop> string and outputting it. I have also included the line let dyson = strang.replace(/<start>/g, ''); because I want to remove the <start> text.
However, because this is TCP, I am not guaranteed to get all parts of this string. As a result, I frequently get back stop>Apple Bandana Cadillac<stop> or some variation of it (such as start>Apple Bandana Cadillac<stop>). It is not consistent enough that I can just do strang.replace("start>", "").
Ideally, I would like my separator to select content that is in between <start> and <stop>. Not just <stop>. However, I am unsure how to do so.
Alternatively, I could settle for a regex that finds every partial combination of the <start> and <stop> strings during my while loop and just deletes them, i.e. checks for <, s, t, a, r, t individually and so forth. But I am unsure how to use a regex to delete portions of a whole string.

Assuming you get the full response:
var test = "<start>Apple Bandana Cadillac<stop>";
var testRE = test.match("<start>(.*)<stop>");
testRE[1] //"Apple Bandana Cadillac"
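If there are newlines between <start> and <stop>, . will not match them, so use [\S\s] instead:
var test = "<start>Apple\nBandana Cadillac<stop>";
var testRE = test.match("<start>([\\S\\s]*)<stop>");
testRE[1] //"Apple\nBandana Cadillac"
This uses a regular expression capturing group: testRE[1] holds the text matched by the parenthesized part of the pattern.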
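The loop in the question advances the buffer with d_basic.substring(d_csvindex+1), which skips only the < of <stop>, so the next pass starts with stop>; that alone explains the stop>Apple Bandana Cadillac<stop> fragments. Below is a minimal sketch of buffer framing that looks for both markers and skips the whole <stop> tag. extractFrames is a hypothetical helper name, and the chunks array only simulates TCP splitting the data at arbitrary points.

```javascript
// Hypothetical helper: pulls every complete <start>...<stop> frame out of a
// buffer and returns the leftover text (a frame that is still arriving).
function extractFrames(buffer, startTag = '<start>', stopTag = '<stop>') {
  const frames = [];
  let rest = buffer;
  while (true) {
    const start = rest.indexOf(startTag);
    if (start === -1) break; // no complete start tag yet
    const stop = rest.indexOf(stopTag, start + startTag.length);
    if (stop === -1) break; // frame not finished; wait for more data
    frames.push(rest.slice(start + startTag.length, stop));
    // Skip the whole stop tag, not just its first character
    rest = rest.slice(stop + stopTag.length);
  }
  return { frames, rest };
}

// Simulated TCP chunks: tags and content split at arbitrary points
const chunks = ['<start>Apple Ban', 'dana Cadillac<stop><sta', 'rt>Grapes<stop>'];
let buffer = '';
const received = [];
for (const chunk of chunks) {
  buffer += chunk;
  const { frames, rest } = extractFrames(buffer);
  received.push(...frames);
  buffer = rest;
}

console.log(received); // [ 'Apple Bandana Cadillac', 'Grapes' ]
```

Inside the question's handler, d_basic would play the role of buffer: append data.toString(), call the helper, pass each frame to papaparse.parse, and keep rest for the next 'data' event.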

Try this regex with the replace() method:
/<st.*?>(.*?)(?!<st)/g
Literal: <st
Any char zero or more times, lazily: .*?
Literal: >
Begin capture group: (
Any char zero or more times, lazily: .*?
End capture group: )
Begin negative lookahead: (?!
Literal: <st
End negative lookahead: )
In the demo below, notice that the test example consists of multiple lines and variants of <start> and <stop> (basically anything beginning with <st). In this demo the lazy group is followed only by a zero-width lookahead, so it captures an empty string and replacing with $1 effectively deletes each <st...> tag.
Demo 1
var rgx = /<st.*?>(.*?)(?!<st)/g;
var str = `<start>Apple Bandana Cadillac<stop>
<stop>Grapes Trampoline Ham<stop>
<start>Kebab Matador Pencil<start>`;
var res = str.replace(rgx, `$1`);
console.log(res);
Update
"say I have op>Grapes Trampoline Ham<stop>...still trying to remove all parts of the string <stop>"
/^(.*?>)(.*?)(<.*?)$/gm
A simple explanation will have to do, since a step-by-step breakdown like the one for Demo 1 would take too much time.
This RegEx uses the multiline flag /m, so ^ and $ match at the start and end of each line.
^: begin line.
(.*?>): lazily capture everything up to and including the first literal > [returned as $1]
(.*?): then lazily capture everything up to the next < [returned as $2]
(<.*?): a literal < followed by everything up to the end of the line [returned as $3]
$: end line.
The trick is to replace each whole match with the second capture, $2, which discards $1 and $3.
Demo 2
var rgx = /^(.*?>)(.*?)(<.*?)$/gm;
var str = `<start>Apple Bandana Cadillac<stop>
<stop>Grapes Trampoline Ham<stop>
<start>Kebab Matador Pencil<start>
op>Score False Razor<stop>
`;
var res = str.replace(rgx, `$2`);
console.log(res);

JS regex replace not working

I've got a JS string
var str = '<at id="11:12345678">#robot</at> ping';
I need to remove this part of a string
<at id="11:12345678">#
So I am trying to use
var str = str.replace("<at.+#","");
But nothing changes after execution. Moreover, if I try to use match, it gives me
str.match("<at.+#");
//Result from Chrome console Repl
["<at id="11:12345678">#", index: 0, input: "<at id="11:12345678">#robot</at> ping"]
So the pattern actually works, but replace does nothing.

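Use a regex literal (/.../). When replace() gets a string as its first argument, it searches for that exact text instead of treating it as a pattern, whereas match() converts a string argument into a RegExp, which is why match() found it. Replace "<at.+#" with /<at.+#/.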
var str = '<at id="11:12345678">#robot</at> ping';
str = str.replace(/<at.+#/,"");
console.log(str);
Documentation for replace
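A side-by-side check of how match() and replace() treat the same string argument, plus the two ways of getting real pattern matching in replace():

```javascript
const str = '<at id="11:12345678">#robot</at> ping';

// match() converts a string argument into a RegExp, so this finds the tag
console.log(str.match('<at.+#')[0]); // <at id="11:12345678">#

// replace() treats a string argument as literal text; "<at.+#" never occurs
console.log(str.replace('<at.+#', '') === str); // true

// A regex literal or an explicit RegExp object enables pattern matching
console.log(str.replace(/<at.+#/, ''));             // robot</at> ping
console.log(str.replace(new RegExp('<at.+#'), '')); // robot</at> ping
```

new RegExp(...) is handy when the pattern is built at runtime; the literal form is simpler when it is fixed.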

JavaScript Regex - Splitting a string into an array by the Regex pattern

Given an input field, I'm trying to use a regex to find all the URLs in the text fields and make them links. I want all the information to be retained, however.
So for example, I have an input of "http://google.com hello this is my content" -> I want to split that by the white space AFTER this regex pattern from another Stack Overflow question (regexp = /(ftp|http|https)://(\w+:{0,1}\w*#)?(\S+)(:[0-9]+)?(/|/([\w#!:.?+=&%#!-/]))?/) so that I end up with an array of ['http://google.com', 'hello this is my content'].
Another ex: "hello this is my content http://yahoo.com testing testing http://google.com" -> arr of ['hello this is my content', 'http://yahoo.com', 'testing testing', 'http://google.com']
How can this be done? Any help is much appreciated!

First transform all the groups in your regular expression into non-capturing groups ((?:...)) and then wrap the whole regular expression inside a group, then use it to split the string like this:
var regex = /((?:ftp|http|https):\/\/(?:\w+:{0,1}\w*#)?(?:\S+)(?::[0-9]+)?(?:\/|\/(?:[\w#!:.?+=&%#!-/]))?)/;
var result = str.split(regex);
Example:
var str = "hello this is my content http://yahoo.com testing testing http://google.com";
var regex = /((?:ftp|http|https):\/\/(?:\w+:{0,1}\w*#)?(?:\S+)(?::[0-9]+)?(?:\/|\/(?:[\w#!:.?+=&%#!-/]))?)/;
var result = str.split(regex);
console.log(result);
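str.split(regex) keeps each URL, but it also keeps the spaces around it and yields an empty string wherever a URL starts or ends the input. A small cleanup pass gives exactly the arrays asked for; splitOnUrls is a hypothetical name, and the pattern is the one from this answer, unchanged.

```javascript
const regex = /((?:ftp|http|https):\/\/(?:\w+:{0,1}\w*#)?(?:\S+)(?::[0-9]+)?(?:\/|\/(?:[\w#!:.?+=&%#!-/]))?)/;

// Hypothetical helper: split on URLs, keep them, and clean up the pieces in between
function splitOnUrls(str) {
  return str
    .split(regex) // the outer capturing group keeps each URL in the result
    .map((part) => part.trim()) // drop the spaces around each URL
    .filter((part) => part.length > 0); // drop empty strings at the edges
}

console.log(splitOnUrls('http://google.com hello this is my content'));
// [ 'http://google.com', 'hello this is my content' ]
console.log(splitOnUrls('hello this is my content http://yahoo.com testing testing http://google.com'));
// [ 'hello this is my content', 'http://yahoo.com', 'testing testing', 'http://google.com' ]
```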

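Your RegExp had a few unescaped forward slashes: in a regex literal, a / outside a character class ends the pattern, so each one has to be written as \/.
var str = "hello this is my content http://yahoo.com testing testing http://google.com";
// match() with the g flag returns every URL found (or null, hence the fallback)
var captured = str.match(/(ftp|http|https):\/\/(\w+:{0,1}\w*#)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%#!-/]))?/g) || [];
// Keep the space-separated words that are not one of the captured URLs
var nonCaptured = str.split(' ').filter(v => captured.indexOf(v) === -1);
console.log(nonCaptured, captured);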

Replace string that starts with specific character and ends with specific character in regular expression

I want to replace a section of a string that starts with a specific character and ends with a specific character. Below is a test case.
var reg = /pattern/gi;
var str = "asdfkadsf[xxxxx]bb";
var test = str.replace(reg,"") == "asdfkadsfbb"
console.log(test);

This pattern should work to replace anything between brackets (including the brackets):
var reg = /(\[.*?\])/gi;
var str = "asdfkadsf[xxxxx]bb";
var test = str.replace(reg,"") == "asdfkadsfbb"

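Based on your example, this works (the greedy .* runs from the first [ to the last ], so use .*? if a string can contain several bracketed sections):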
/\[.*]/gi
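The difference between greedy and lazy matching only shows up when there is more than one bracketed section. A quick comparison on a hypothetical input with two sections:

```javascript
// Hypothetical input with two bracketed sections
const str = 'asdfkadsf[xxxxx]bb[yy]cc';

// Greedy: .* runs to the LAST "]", so "bb" in the middle is removed too
console.log(str.replace(/\[.*]/g, ''));      // asdfkadsfcc

// Lazy: .*? stops at the FIRST "]" after each "["
console.log(str.replace(/\[.*?\]/g, ''));    // asdfkadsfbbcc

// Negated class: "anything except ]" between the brackets, same result as lazy
console.log(str.replace(/\[[^\]]*\]/g, '')); // asdfkadsfbbcc
```

The negated-class form states the intent directly and does not depend on the lazy quantifier.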
