JavaScript, Dynamic Regex, Replace everything between and including delimiters - javascript

Text To Remove:
I am trying to remove a line of text from a file.
Everything between <%'/testStart'%> and <%'/testEnd'%>, including the delimiters.
<%'/testStart'%> Some text with other random characters in between <%'/testEnd'%>
JavaScript:
I have tried this with no luck. Well, at one point I had it working with everything hard coded in the RegExp. But I have tried so many ways I can't remember what I did. Basically I think I am just not escaping something properly.
var p = "/test"; //this is dynamic
var start = "<%'" + p + "Start" + "'%>";
var end = "<%'" + p + "End" + "'%>";
var regex = new RegExp("\\" + start + "[^:]\\" + end);
var newData = data.replace(regex,"");
Expected Result:
Completely remove this line.
<%'/testStart'%> Some text with other random characters in between <%'/testEnd'%>
Any help is much appreciated. Thanks,

Match Inside with a Lazy .*?
Replace your inner section "[^:]\\", which matches only a single character other than a colon, with ".*?"
var regex = new RegExp( start + ".*?" + end );
The effect is to match everything up to the end parameter.
Explanation
The star quantifier in .*? is made "lazy" by the ? so that the dot only matches as many characters as needed to allow the next token to match (shortest match). Without the ?, the .* first matches the whole string, then backtracks only as far as needed to allow the next token to match (longest match).
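Because the delimiters are built from a dynamic value, it may be worth escaping them before they go into the RegExp. The following is a minimal sketch, not part of the original answer: escapeRegExp and removeBlock are hypothetical helper names, and [\s\S]*? is swapped in for .*? on the assumption that the block to remove might span several lines.

```javascript
// Escape regex metacharacters so the delimiters are matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: remove everything from the start delimiter
// through the end delimiter, delimiters included.
function removeBlock(data, p) {
  var start = escapeRegExp("<%'" + p + "Start'%>");
  var end = escapeRegExp("<%'" + p + "End'%>");
  // [\s\S]*? is a lazy match that, unlike ., also crosses line breaks.
  var regex = new RegExp(start + "[\\s\\S]*?" + end, "g");
  return data.replace(regex, "");
}

var data = "before\n<%'/testStart'%> Some text <%'/testEnd'%>\nafter";
console.log(JSON.stringify(removeBlock(data, "/test"))); // "before\n\nafter"
```

Note that the line break after the removed block is left in place; if the whole line should disappear, an optional \r?\n could be appended to the pattern.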

Related

Replace all slashes on a line with a match on at the beginning of the same line

I'm trying to replace all slashes on a line with the 3-character block at the beginning of that line (PMC, PAJ, etc. in the example below).
.PMC.89569XX/90051XX/90204XX/89533XX/90554XX/90053XX/90215XX/89874XX/89974XX/90481XX/90221XX/90508XX/90183XX/88526XX/89843XX/88041XX/90446XX/88515XX/89574XX/89847XX/88616XX/90513XX/90015XX/90334XX/89649XX.T00
.PAJ.77998XX/77896XX.T00
.PAG.78116XX/78104XX/77682XX/07616XX/77663XX/77863XX/07634XX/78088XX/77746XX/78148XX.T00
.PKC.22762XX/22358XX/22055XX/22672XX/22684XX/22154XX/22608XX/22768XX/22632XX/22266XX/22714XX/22658XX/22631XX/22288XX/22020XX/22735XX/22269XX/22138XX/22331XX/22387XX/22070XX/22636XX/22629XX/22487XX/22725XX.T00
The desired outcome should be:
PMC.89569XXPMC90051XXPMC90204XXPMC89533XXPMC90554XXPMC90053XXPMC90215XXPMC89874XXPMC89974XXPMC90481XXPMC90221XXPMC90508XXPMC90183XXPMC88526XXPMC89843XXPMC88041XXPMC90446XXPMC88515XXPMC89574XXPMC89847XXPMC88616XXPMC90513XXPMC90015XXPMC90334XXPMC89649XX.T00
I'm not sure how to accomplish this.
This is what I have so far:
(.)([A-Z]{3})(.)(\/)
If you only plan to support ECMAScript 2018 and newer, you may achieve what you need with a single regex:
.replace(/(?<=^\.([^.]+)\..*?)\//g, "$1")
See the regex demo.
Details
(?<=^\.([^.]+)\..*?) - a positive lookbehind that, immediately to left of the current location, requires
^ - start of string
\. - a dot
([^.]+) - Group 1: one or more chars other than a dot
\. - a dot
.*? - any 0+ chars, other than linebreak chars, as few as possible
\/ - a / char.
JS demo:
var strs = ['.PMC.89569XX/90051XX/90204XX/89533XX/90554XX/90053XX/90215XX/89874XX/89974XX/90481XX/90221XX/90508XX/90183XX/88526XX/89843XX/88041XX/90446XX/88515XX/89574XX/89847XX/88616XX/90513XX/90015XX/90334XX/89649XX.T00','.PAJ.77998XX/77896XX.T00','.PAG.78116XX/78104XX/77682XX/07616XX/77663XX/77863XX/07634XX/78088XX/77746XX/78148XX.T00','.PKC.22762XX/22358XX/22055XX/22672XX/22684XX/22154XX/22608XX/22768XX/22632XX/22266XX/22714XX/22658XX/22631XX/22288XX/22020XX/22735XX/22269XX/22138XX/22331XX/22387XX/22070XX/22636XX/22629XX/22487XX/22725XX.T00'];
for (var s of strs) {
console.log(s.replace(/(?<=^\.([^.]+)\..*?)\//g, "$1"));
}
I am not sure you can do it with just one regex, so you will probably have to do it as a two-step process. First, capture the three capital letters that follow the leading dot using the substring() method, then replace all slashes with those three letters. Here is a demo with JS code:
function transformLine(s) {
var repStr = s.substring(1,4);
var replacedStr = s.replace(/\//g, repStr);
return replacedStr.substring(1,replacedStr.length);
}
var lines = [".PMC.89569XX/90051XX/90204XX/89533XX/90554XX/90053XX/90215XX/89874XX/89974XX/90481XX/90221XX/90508XX/90183XX/88526XX/89843XX/88041XX/90446XX/88515XX/89574XX/89847XX/88616XX/90513XX/90015XX/90334XX/89649XX.T00", ".PAJ.77998XX/77896XX.T00", ".PAG.78116XX/78104XX/77682XX/07616XX/77663XX/77863XX/07634XX/78088XX/77746XX/78148XX.T00", ".PKC.22762XX/22358XX/22055XX/22672XX/22684XX/22154XX/22608XX/22768XX/22632XX/22266XX/22714XX/22658XX/22631XX/22288XX/22020XX/22735XX/22269XX/22138XX/22331XX/22387XX/22070XX/22636XX/22629XX/22487XX/22725XX.T00"];
for (var i = 0;i<lines.length;i++) {
console.log("Before: " + lines[i]);
console.log("After: " + transformLine(lines[i])+"\n\n");
}
I've also removed the leading dot, as your expected output does not have it.
Let me know if this works for you.
Edit:
I have updated the code to provide a function that takes a string as input and returns the modified string. Please check the demo.
Edit2: Solving it mostly using regex
This one liner in the function does all the job for you in transforming your line to the required one.
function transformLine(s) {
return s.replace(/\//g, /^.(.{3})/.exec(s)[1]).replace(/^./,'');
}
var lines = [".PMC.89569XX/90051XX/90204XX/89533XX/90554XX/90053XX/90215XX/89874XX/89974XX/90481XX/90221XX/90508XX/90183XX/88526XX/89843XX/88041XX/90446XX/88515XX/89574XX/89847XX/88616XX/90513XX/90015XX/90334XX/89649XX.T00", ".PAJ.77998XX/77896XX.T00", ".PAG.78116XX/78104XX/77682XX/07616XX/77663XX/77863XX/07634XX/78088XX/77746XX/78148XX.T00", ".PKC.22762XX/22358XX/22055XX/22672XX/22684XX/22154XX/22608XX/22768XX/22632XX/22266XX/22714XX/22658XX/22631XX/22288XX/22020XX/22735XX/22269XX/22138XX/22331XX/22387XX/22070XX/22636XX/22629XX/22487XX/22725XX.T00"];
for (var i = 0;i<lines.length;i++) {
console.log("Before: " + lines[i]);
console.log("After: " + transformLine(lines[i])+"\n\n");
}
As you can see here, this line,
return s.replace(/\//g, /^.(.{3})/.exec(s)[1]).replace(/^./,'');
does everything you need. It first extracts the three capital letters using /^.(.{3})/.exec(s)[1], then replaces all slashes with that captured text, and finally removes the first character (the leading dot) using .replace(/^./,'') before returning the string you need.
Let me know if this is what you wanted. Else let me know if you further wanted it in any particular way.
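For environments without lookbehind support, the same result can also be reached with a single replace call that takes a callback. This is only a sketch under the assumption that every line has the shape .CODE.rest; lines that do not match are returned unchanged.

```javascript
function transformLine(s) {
  // Capture the code between the first two dots, then the rest of the line.
  return s.replace(/^\.([^.]+)\.(.*)$/, function (whole, code, rest) {
    // split/join swaps every slash for the code without any
    // special handling of "$" in the replacement text.
    return code + "." + rest.split("/").join(code);
  });
}

console.log(transformLine(".PAJ.77998XX/77896XX.T00"));
// PAJ.77998XXPAJ77896XX.T00
```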

JS conditional RegEx that removes different parts of a string between two delimiters

I have a string of text with HTML line breaks. Some of the <br> immediately follow a number between two delimiters «...» and some do not.
Here's the string:
var str = ("«1»<br>«2»some text<br>«3»<br>«4»more text<br>«5»<br>«6»even more text<br>");
I’m looking for a conditional regex that’ll remove the number and delimiters (ex. «1») as well as the line break itself without removing all of the line breaks in the string.
So for instance, at the beginning of my example string, when the script encounters »<br> it’ll remove everything between and including the first « to the left, to »<br> (ex. «1»<br>). However it would not remove «2»some text<br>.
I’ve had some help removing the entire number/delimiters (ex. «1») using the following:
var regex = new RegExp(UsedKeys.join('|'), 'g');
var nextStr = str.replace(/«[^»]*»/g, " ");
I sure hope that makes sense.
Just to be super clear, when the string is rendered in a browser, I’d like to go from this…
«1»
«2»some text
«3»
«4»more text
«5»
«6»even more text
To this…
«2»some text
«4»more text
«6»even more text
Many thanks!
Maybe I'm missing a subtlety here, if so I apologize. But it seems that you can just replace with the regex: /«\d+»<br>/g. This will replace all occurrences of a number between « & » followed by <br>
var str = "«1»<br>«2»some text<br>«3»<br>«4»more text<br>«5»<br>«6»even more text<br>"
var newStr = str.replace(/«\d+»<br>/g, '')
console.log(newStr)
To match letters and digits (and underscores) you can use \w instead of \d
var str = "«a»<br>«b»some text<br>«hel»<br>«4»more text<br>«5»<br>«6»even more text<br>"
var newStr = str.replace(/«\w+?»<br>/g, '')
console.log(newStr)
This snippet assumes that the input within the brackets will always be a number but I think it solves the problem you're trying to solve.
const str = "«1»<br>«2»some text<br>«3»<br>«4»more text<br>«5»<br>«6»even more text<br>";
console.log(str.replace(/(«(\d+)»<br>)/g, ""));
/(«(\d+)»<br>)/g
«(\d+)» Will match any brackets containing 1 or more digits in a row
If you would prefer to match alphanumeric you could use «(\w+)» or for any characters including symbols you could use «([^»]+)»
<br> Will match the literal <br> tag (the HTML line break)
The g flag matches globally so that it can find every instance of the substring
Basically we are only removing the bracketed numbers if they are immediately followed by a line break.
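To see how the three character classes mentioned above differ, here is a small comparison. The input string is made up for illustration and is not from the question.

```javascript
// Illustrative input: one numeric key, one with an underscore, one with a hyphen.
var sample = "«1»<br>«a_b»<br>«x-y»<br>«2»some text<br>";

var digitsOnly = sample.replace(/«\d+»<br>/g, "");     // removes «1» only
var wordChars = sample.replace(/«\w+»<br>/g, "");      // also removes «a_b»
var anyContent = sample.replace(/«[^»]+»<br>/g, "");   // also removes «x-y»

console.log(digitsOnly); // «a_b»<br>«x-y»<br>«2»some text<br>
console.log(wordChars);  // «x-y»<br>«2»some text<br>
console.log(anyContent); // «2»some text<br>
```

In all three cases «2»some text<br> survives, because the closing » is not immediately followed by <br>.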

Last Character of regex match

I am trying to use a regex to detect a pattern in the current page query string.
For this reason I have the following regex:
var re = new RegExp("([?|&])" + key + "=.*?(&|#|$)", "i");
I am testing it against this input:
http://127.0.0.1:33822/?year=2015&country=Portugal&format=psd
And it works just fine finding the pattern.
What I am really trying to get is the last character index relative to the whole URL.
So for example if I want the index of the last 'd' character:
http://127.0.0.1:33822/?year=2015&country=Portugal&format=ps**d**
In this specific case I would want 60.
Thanks in advance
You can use match.index for that. It will give you where the match exists in the original string.
Then you can use match[0].length to find the last character in the match
var key = 'format';
var re = new RegExp("([?|&])" + key + "=.*?(&|#|$)", "i");
var url = 'http://127.0.0.1:33822/?year=2015&country=Portugal&format=psd';
var match = url.match(re);
console.log(match.index + match[0].length - 1); // 60
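One thing to watch for: when the parameter is not the last one, the original pattern also consumes the trailing & or #, so the computed index points at that delimiter instead of the last character of the value. A possible variation, offered as a sketch, stops the match before the delimiter. lastValueIndex is a hypothetical helper name, and key is assumed to contain no regex metacharacters.

```javascript
function lastValueIndex(url, key) {
  // [^&#]* stops right before the next & or #, so the match ends
  // on the last character of the value itself.
  var re = new RegExp("[?&]" + key + "=[^&#]*", "i");
  var match = re.exec(url);
  if (!match) {
    return -1; // key not present in the query string
  }
  return match.index + match[0].length - 1;
}

var url = 'http://127.0.0.1:33822/?year=2015&country=Portugal&format=psd';
console.log(lastValueIndex(url, 'format')); // 60
console.log(lastValueIndex(url, 'year'));   // 32
```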

Javascript Regex match everything after last occurrence of string

I am trying to match everything after (but not including!) the last occurrence of a string in JavaScript.
The search, for example, is:
[quote="user1"]this is the first quote[/quote]\n[quote="user2"]this is the 2nd quote and some url https://www.google.com/[/quote]\nThis is all the text I\'m wirting about myself.\n\nLook at me ma. Javascript.
Edit: I'm looking to match everything after the last quote block. So I was trying to match everything after the last occurrence of "quote]" ? Idk if this is the best solution but its what i've been trying.
I'll be honest, i suck at this Regex stuff.. here is what i've been trying with the results..
regex = /(quote\].+)(.*)/ig; // Returns null
regex = /.+((quote\]).+)$/ig // Returns null
regex = /( .* (quote\]) .*)$/ig // Returns null
I have made a JSfiddle for anyone to have a play with here:
https://jsfiddle.net/au4bpk0e/
One option would be to match everything up until the last [/quote], and then get anything following it. (example)
/.*\[\/quote\](.*)$/i
This works since .* is inherently greedy, and it will match everything up until the last \[\/quote\].
Based on the string you provided, this would be the first capturing group match:
\nThis is all the text I\'m wirting about myself.\n\nLook at me ma. Javascript.
But since your string contains new lines, and . doesn't match newlines, you could use [\s\S] in place of . in order to match anything.
Updated Example
/[\s\S]*\[\/quote\]([\s\S]*)$/i
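As a quick check, the pattern can be run against the string from the question. The variable names here are only illustrative.

```javascript
var str = '[quote="user1"]this is the first quote[/quote]\n[quote="user2"]this is the 2nd quote and some url https://www.google.com/[/quote]\nThis is all the text I\'m wirting about myself.\n\nLook at me ma. Javascript.';

// The greedy [\s\S]* runs to the last [/quote]; group 1 holds what follows it.
var match = /[\s\S]*\[\/quote\]([\s\S]*)$/i.exec(str);
var textAfterLastQuote = match ? match[1] : null;

console.log(JSON.stringify(textAfterLastQuote));
// "\nThis is all the text I'm wirting about myself.\n\nLook at me ma. Javascript."
```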
You could also avoid regex and use the .lastIndexOf() method along with .slice():
Updated Example
var match = '[\/quote]';
var textAfterLastQuote = str.slice(str.lastIndexOf(match) + match.length);
document.getElementById('res').innerHTML = "Results: " + textAfterLastQuote;
Alternatively, you could also use .split() and then get the last value in the array:
Updated Example
var textAfterLastQuote = str.split('[\/quote]').pop();
document.getElementById('res').innerHTML = "Results: " + textAfterLastQuote;

Using regex to search for keywords at the beginning of words only

I have a searching system that splits the keyword into chunks and searches for it in a string like this:
var regexp_school = new RegExp("(?=.*" + split_keywords[0] + ")(?=.*" + split_keywords[1] + ")(?=.*" + split_keywords[2] + ").*", "i");
I would like to modify this so that I would only search for it at the beginning of the words.
For example if the string is:
"Bbe be eb ebb beb"
And the keyword is: "be eb"
Then I want only these to hit "be ebb eb"
In other words I want to combine the above regexp with this one:
var regexp_school = new RegExp("^" + split_keywords[0], "i");
But I'm not sure what the syntax would look like.
I'm also using the split function to split the keywords, but I don't want to set a length since I don't know how many words there are in the keyword string.
split_keywords = school_keyword.split(" ", 3);
If I leave the 3 out, will it have dynamic length or just length of 1? I tried doing a
alert(split_keywords.lenght);
But didn't get a desired response
You should use the special word boundary character \b to match the beginning of a word. To create the expression for an arbitrary number of keywords, you can generate it in a loop.
var regex = '';
for(var i = split_keywords.length;i--; ) {
// two backslashes are needed to insert `\` literally
regex += "(?=.*\\b" + split_keywords[i] + ")";
}
var regexp_school = new RegExp(regex, "i");
I'm not sure about performance, but you can also consider to use indexOf to test whether a substring is contained in a string.
Update:
If \b does not work for you (because of other "special" characters), and all your words are separated by a white space, you can use
"(?=.*\\s" + split_keywords[i] + ")"
or
"(?=.* " + split_keywords[i] + ")"
But for this to work you have to prepend the text you are searching in with a white space:
" " + textYouSearchIn
or you can write a more complex expression:
"(?=(^|.*\\s)" + split_keywords[i] + ")"
A couple points. First, you need to anchor the regex to the start of the string. Otherwise, if there is no match, there are a LOT of combinations that the regex engine must try before declaring a match failure (it must check all of them, in fact). Second, when splitting the string, use /\s+/ instead of a single space - this prevents getting empty matches in the resulting array in case there are multiple spaces between any keywords. Third, if there are empty strings in the array of keywords, you do not want to add them to the regex. Felix's solution is pretty close to the mark, but does not actually match the string once all the positive lookahead assertions are finished. That said, here's my proposed solution:
var split_keywords = school_keyword.split(/\s+/);
var regex = "^"; // Anchor to start of string.
for (var i = 0, len = split_keywords.length; i < len; ++i) {
if (split_keywords[i]) { // Skip empty keyword strings.
regex += "(?=.*?\\b" + split_keywords[i] + ")";
}
}
regex += ".*$"; // Add ending to actually match the line.
var regexp_school = new RegExp(regex, "i");
I've also changed the greedy quantifier to lazy. This is one case where it is applicable.
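If the keywords come from user input, they may contain characters that are special in a regex. The sketch below wraps the approach above in a function and escapes each keyword first; escapeRegExp and buildKeywordRegex are hypothetical names, not part of either answer.

```javascript
// Escape regex metacharacters so each keyword is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a regex that requires every keyword to appear at the start of some word.
function buildKeywordRegex(keywordString) {
  var keywords = keywordString.split(/\s+/);
  var source = "^"; // anchor to start of string
  for (var i = 0; i < keywords.length; i++) {
    if (keywords[i]) { // skip empty strings from leading/trailing spaces
      source += "(?=.*?\\b" + escapeRegExp(keywords[i]) + ")";
    }
  }
  return new RegExp(source + ".*$", "i");
}

var re = buildKeywordRegex("be eb");
console.log(re.test("Bbe be eb ebb beb")); // true
console.log(re.test("Bbe beb"));           // false: no word starts with "eb"
```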
