Javascript string replace with regex variable manipulation - javascript

How do I replace every number matched by a pattern in a string with that number plus an offset?
Say I want to replace the number in every matching HTML tag with that number plus an offset:
strRegEx = /<ol start="(\d+)">/gi;
strContent = strContent.replace(strRegEx, function() {
/* return $1 + numOffset; */
});

#Tomalak is right: you shouldn't really use regexes with HTML. Use the browser's own HTML DOM or an XML parser instead.
For example, if that tag also had another attribute assigned to it, such as a class, the regex will not match it.
<ol start="#" > does not equal <ol class="foo" start="#">.
Regexes are not a reliable way to do this. Instead, go through the DOM to find the elements you are looking for, check whether they have the attribute, and then update it.
function replaceWithOffset(offset) {
var elements = document.getElementsByTagName("ol");
for(var i = 0; i < elements.length; i++) {
if(elements[i].hasAttribute("start")) {
elements[i].setAttribute("start", parseInt(elements[i].getAttribute("start"), 10) + offset); // radix 10 keeps the parse decimal
}
}
}

A replacement string such as "$1" cannot do arithmetic on the captured number, so one option is to do the work manually, which requires a bit more effort.
Executing a global regex repeatedly with .exec() returns successive matches until none are left, at which point it returns null. You can use that in a while loop and use each returned match to slice the original input and apply your modification manually:
var strContent = "<ol start=\"1\"><ol start=\"2\"><ol start=\"3\"><ol start=\"4\">"
var strRegEx = /<ol start="(\d+)">/g;
var match = null
while (match = strRegEx.exec(strContent)) {
var tag = match[0]
var value = match[1]
var rightMark = match.index + tag.length - 2
var leftMark = rightMark - value.length
strContent = strContent.substr(0, leftMark) + (1 + +value) + strContent.substr(rightMark)
}
console.log(strContent)
Note: as #tomalak said, parsing HTML with regexes is generally a bad idea. But if you're parsing just a piece of content whose precise structure you know beforehand, I don't see any particular issue ...
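For comparison, replace() also accepts a function as its second argument, and that function receives the full match followed by each captured group. A minimal sketch of the question's commented-out idea, assuming the tag has exactly the shape shown in the question; the helper name shiftOlStart is made up for illustration:

```javascript
// Hypothetical helper: shifts the number in every <ol start="N"> by numOffset.
function shiftOlStart(strContent, numOffset) {
  var strRegEx = /<ol start="(\d+)">/gi;
  // The callback gets the whole match first, then the captured digits.
  return strContent.replace(strRegEx, function (match, digits) {
    return '<ol start="' + (parseInt(digits, 10) + numOffset) + '">';
  });
}

var shifted = shiftOlStart('<ol start="1"><ol start="9">', 5);
console.log(shifted); // <ol start="6"><ol start="14">
```

The same caveat applies as above: a tag with extra attributes or different spacing will not match this pattern.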

Related

Replace script tag that has a dynamic version number which keeps changing

I would like to remove or replace this string in an HTML file using javascript.
'<script src="../assets/js/somejs.js?1.0.953"></script>'
Trouble is, the version number "1.0.953" keeps changing so I can't use a simple string.replace
You could achieve this using the following Regex:
\?.*?\"
Or, more specifically
(\d+)\.(\d+)\.(\d+)
For matching the version number.
Utilizing the regex, you can replace the version number with a blank value.
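As a sketch of how that could look in practice, the function below removes the whole script tag whatever follows the question mark. It assumes the tag is written exactly as in the question (double quotes, no other attributes); the name removeVersionedScript is hypothetical:

```javascript
// Hypothetical helper: removes the <script> tag for one file, ignoring its version suffix.
function removeVersionedScript(html, src) {
  // Escape regex metacharacters so the path is matched literally.
  var escaped = src.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // \?[^"]* matches the "?" and whatever version string follows, up to the closing quote.
  var re = new RegExp('<script src="' + escaped + '\\?[^"]*"></script>', 'g');
  return html.replace(re, '');
}

var page = '<p>a</p><script src="../assets/js/somejs.js?1.0.953"></script><p>b</p>';
console.log(removeVersionedScript(page, '../assets/js/somejs.js')); // <p>a</p><p>b</p>
```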
Ok this is messy but it works...
String.prototype.replaceAnyVersionOfScript = function(target, replacement) {
// 1. Build the string we need to replace (target + xxx + ></script>)
var targetLength = target.length;
var targetStartPos = this.indexOf(target);
var dynamicStartPos = targetStartPos + targetLength;
var dynamicEndPos = this.indexOf('></script>', dynamicStartPos) + 10;
var dynamicString = this.substring(dynamicStartPos, dynamicEndPos);
var dynamicStringToReplace = target + dynamicString;
// 2. Now we know what we are looking for. We can replace it.
return this.replace(dynamicStringToReplace, replacement);
};
shellHTML = shellHTML.replaceAnyVersionOfScript('<script src="./assets/js/CloudZoom.js', '');

Split a string of HTML into an array by particular tags

Given this HTML as a string "html", how can I split it into an array where each header <h marks the start of an element?
Begin with this:
<h1>A</h1>
<h2>B</h2>
<p>Foobar</p>
<h3>C</h3>
Result:
["<h1>A</h1>", "<h2>B</h2><p>Foobar</p>", "<h3>C</h3>"]
What I've tried:
I wanted to use String.split() with a regex, but the result splits each <h into its own element. I need to figure out how to capture from the start of one <h until the next <h, then include the first one but exclude the second one.
var html = '<h1>A</h1><h2>B</h2><p>Foobar</p><h3>C</h3>';
var foo = html.split(/(<h)/);
Edit: Regex is not a requirement in any way; it's just the only solution I thought would work for generally splitting HTML strings in this way.
In your example you can use:
/
<h // Match literal <h
(.) // Match any character and save in a group
> // Match literal >
.*? // Match any character zero or more times, non greedy
<\/h // Match literal </h
\1 // Match what was previously captured by (.)
> // Match literal >
/g
var str = '<h1>A</h1><h2>B</h2><p>Foobar</p><h3>C</h3>'
str.match(/<h(.)>.*?<\/h\1>/g); // ["<h1>A</h1>", "<h2>B</h2>", "<h3>C</h3>"]
But please don't parse HTML with regexp; read "RegEx match open tags except XHTML self-contained tags".
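Note that the match above drops the <p> that belongs to the second heading. If the input really is as flat and predictable as the sample in the question, a sketch that produces the requested grouping is to split on a zero-width lookahead, so the string is cut just before each opening heading tag and the tag itself is kept. This is an illustration under that assumption, not a general HTML parser:

```javascript
var html = '<h1>A</h1><h2>B</h2><p>Foobar</p><h3>C</h3>';
// (?=...) matches a position, not characters: split before each <h1>..<h6>.
// \b keeps tags such as <hr> or <head> from counting as headings.
var sections = html.split(/(?=<h[1-6]\b)/i);
console.log(sections);
// ["<h1>A</h1>", "<h2>B</h2><p>Foobar</p>", "<h3>C</h3>"]
```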
From the comments to the question, this seems to be the task:
I'm taking dynamic markdown that I'm scraping from GitHub. Then I want to render it to HTML, but wrap every title element in a ReactJS <WayPoint> component.
The following is a completely library-agnostic, DOM-API based solution.
function waypointify(html) {
var div = document.createElement("div"), nodes;
// parse HTML and convert into an array (instead of NodeList)
div.innerHTML = html;
nodes = [].slice.call(div.childNodes);
// add <waypoint> elements and distribute nodes by headings
div.innerHTML = "";
nodes.forEach(function (node) {
if (!div.lastChild || /^h[1-6]$/i.test(node.nodeName)) {
div.appendChild( document.createElement("waypoint") );
}
div.lastChild.appendChild(node);
});
return div.innerHTML;
}
Doing the same in a modern library with fewer lines of code is absolutely possible; see it as a challenge.
This is what it produces with your sample input:
<waypoint><h1>A</h1></waypoint>
<waypoint><h2>B</h2><p>Foobar</p></waypoint>
<waypoint><h3>C</h3></waypoint>
I'm sure someone could reduce the for loop to put the angle brackets back in but this is how I'd do it.
var html = '<h1>A</h1><h2>B</h2><p>Foobar</p><h3>C</h3>';
//split on ><
var arr = html.split(/></g);
//split removes the >< so we need to determine where to put them back in.
for(var i = 0; i < arr.length; i++){
if(arr[i].substring(0, 1) != '<'){
arr[i] = '<' + arr[i];
}
if(arr[i].slice(-1) != '>'){
arr[i] = arr[i] + '>';
}
}
Alternatively, we could remove the first and last bracket, do the split, and then add the angle brackets back to every piece.
var html = '<h1>A</h1><h2>B</h2><p>Foobar</p><h3>C</h3>';
//remove first and last characters
html = html.substring(1, html.length-1);
//do the split on ><
var arr = html.split(/></g);
//add the brackets back in
for(var i = 0; i < arr.length; i++){
arr[i] = '<' + arr[i] + '>';
}
Oh, of course this will fail with elements that have no content.
Hi, I used this function to convert an HTML string into an array of tags:
static getArrayTagsHtmlString(str){
let htmlSplit = str.split(">")
let arrayElements = []
let nodeElement =""
htmlSplit.forEach((element)=>{
if (element.includes("<")) {
nodeElement = element+">"
}else{
nodeElement = element
}
arrayElements.push(nodeElement)
})
return arrayElements
}
Happy code

Regular expression in Javascript: table of positions instead of table of occurrences

Regular expressions are most powerful. However, the result they return is sometimes useless:
For example:
I want to manage a CSV string using semicolons.
I define a string like:
var data = "John;Paul;Pete;Stuart;George";
If I use the instruction:
var tab = data.match(/;/g)
after which "tab" contains an array of four ";":
tab[0]=";", tab[1]=";", tab[2]=";", tab[3]=";"
This array is not useful in the present case, because I knew it even before using the regular expression.
Indeed, what I want to do is 2 things:
First: suppress the 4th element (not "Stuart" as "Stuart", but "Stuart" as the 4th element).
Second: replace the 3rd element with "Ringo" so as to get back (to where you once belonged!) the following result:
data == "John;Paul;Ringo;George";
In this case, I would greatly prefer to obtain an array giving the positions of semicolons:
tab[0]=4, tab[1]=9, tab[2]=14, tab[3]=21
instead of the useless (in this specific case)
tab[0]=";", tab[1]=";", tab[2]=";", tab[3]=";"
So, here's my question: Is there a way to obtain this numeric array using regular expressions?
To get tab[0]=4, tab[1]=9, tab[2]=14, tab[3]=21, you can do
var tab = [];
var startPos = 0;
var data = "John;Paul;Pete;Stuart;George";
while (true) {
var currentIndex = data.indexOf(";", startPos);
if (currentIndex == -1) {
break;
}
tab.push(currentIndex);
startPos = currentIndex + 1; // continue after this semicolon; without the + 1 the loop never ends
}
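To answer the question as literally asked: yes, a regular expression can report positions too, because every match object returned by exec() carries an index property. A minimal sketch; the helper name findPositions is made up for illustration:

```javascript
// Hypothetical helper: returns the start index of every match of pattern in text.
function findPositions(text, pattern) {
  // Copy the pattern with the g flag so exec() advances through the string.
  var flags = pattern.flags.indexOf('g') === -1 ? pattern.flags + 'g' : pattern.flags;
  var re = new RegExp(pattern.source, flags);
  var positions = [];
  var m;
  while ((m = re.exec(text)) !== null) {
    positions.push(m.index);
    // Guard against zero-width matches, which would otherwise loop forever.
    if (m[0].length === 0) re.lastIndex++;
  }
  return positions;
}

var data = "John;Paul;Pete;Stuart;George";
console.log(findPositions(data, /;/)); // [4, 9, 14, 21]
```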
But if the result wanted is "John;Paul;Ringo;George", you can do
var tab = data.split(';'); // Split the string into an array of strings
tab.splice(3, 1); // Suppress the 4th element
tab[2] = "Ringo"; // Replace the 3rd element by "Ringo"
var str = tab.join(';'); // Join the elements of the array into a string
The second approach is probably the better fit in your case.
String.split
Array.splice
Array.join
You should try a different approach, using split.
tab = data.split(';') will return an array of the form
tab[0]="John", tab[1]="Paul", tab[2]="Pete", tab[3]="Stuart", tab[4]="George"
You should be able to achieve your goal with this array.
Why use a regex to perform this operation? You have a built-in function split, which can split your string based on the delimiter you pass.
var data = "John;Paul;Pete;Stuart;George";
var temp=data.split(';');
temp[0],temp[1]...

how to regex the string between two tokens and return string without the tokens?

Fighting with regex....
I'm using this to find pieces of HTML-string between certain elements:
for (i = 0; i < 2; i += 1) {
target = block[i]; // like BODY or HEAD
regex = RegExp('<' + target + '>(.)+</' + target + '>');
// in case string passed includes breaks/spaces
data = data.replace(/(\r\n|\n|\r)/gm,"").replace(/\s+/g," ")
.match(regex);
entry = data[0].replace(/<!-- [\s\S]*? -->/g, '');
console.log(entry);
}
While this works fine, it returns something like this:
<head>....everything I want ....</head>
Question:
How do I need to modify the regex so that I can still specify the element whose content I need, but get back only the content and not the content plus the tokens (like <head></head>)?
Thanks!
Use the first matching group instead of the whole match.
regex = RegExp('<' + target + '>(.+)</' + target + '>');
and then...
entry = data[1].replace(/<!-- [\s\S]*? -->/g, '');
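Put together, the capture-group idea could look like the sketch below. It departs from the original in two labelled ways: it uses [\s\S] so line breaks need no pre-processing, and a lazy quantifier so it stops at the first closing tag. The name innerContent is hypothetical, and tags that carry attributes will not match:

```javascript
// Hypothetical helper: returns the text between <target> and </target>, or null if absent.
function innerContent(data, target) {
  // Group 1 holds only what sits between the two tokens.
  var regex = new RegExp('<' + target + '>([\\s\\S]+?)</' + target + '>', 'i');
  var match = data.match(regex);
  if (!match) return null; // match() returns null when nothing is found
  // Strip HTML comments from the captured content.
  return match[1].replace(/<!--[\s\S]*?-->/g, '');
}

var doc = '<html><head><title>x</title><!-- c --></head><body>hi</body></html>';
console.log(innerContent(doc, 'head')); // <title>x</title>
console.log(innerContent(doc, 'body')); // hi
```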

Javascript substring() trickery

I have a URL that looks like http://mysite.com/#id/Blah-blah-blah, it's used for Ajax-ey bits. I want to use substring() or substr() to get the id part. ID could be any combination of any length of letters and numbers.
So far I have got:
var hash = window.location.hash;
alert(hash.substring(1)); // remove #
Which removes the front hash, but I'm not a JS coder and I'm struggling a bit. How can I remove all of it except the id part? I don't want anything after and including the final slash either (/Blah-blah-blah).
Thanks!
Jack
Now, this is a case where regular expressions will make sense. Using substring here won't work because of the variable lengths of the strings.
This code assumes that the id part won't contain any slashes.
var hash = "#asdfasdfid/Blah-blah-blah";
hash.match(/#(.+?)\//)[1]; // asdfasdfid
The . matches any character and, together with the +, matches one or more characters.
The ? makes the match non-greedy, so it stops at the first occurrence of a / in the string.
If the id part can contain additional slashes and the final slash is the separator, this regex will do your bidding:
var hash = "#asdf/a/sdfid/Blah-blah-blah";
hash.match(/#(.+?)\/[^\/]*$/)[1]; // asdf/a/sdfid
Just for fun here are versions not using regular expressions.
No slashes in id-part:
var hash = "#asdfasdfid/Blah-blah-blah",
idpart = hash.substring(1, hash.indexOf("/")); // substring takes an end index, so the slash itself is excluded
With slashes in id-part (last slash is separator):
var hash = "#asdf/a/sdfid/Blah-blah-blah",
lastSlash = hash.lastIndexOf("/"), // Finding the last slash
idPart = hash.substring(1, lastSlash);
var hash = window.location.hash;
var matches = hash.match(/#(.+?)\//);
if (matches && matches.length > 1) { // match() returns null when nothing matches
alert(matches[1]);
}
perhaps a regex
window.location.hash.match(/[^#\/]+/)
Use indexOf to determine the position of the / after the id, and then use string.substr(start, length) to get the id value.
var hash = window.location.hash;
var posSlash = hash.indexOf("/", 1);
var id = hash.substr(1, posSlash -1)
You need to include some validation code to check for the absence of /.
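One way that validation might look, as a sketch: the hash is passed in as a parameter (rather than read from window.location) so the function can be tried outside a browser, and a missing slash is treated as "the whole hash is the id", which is an assumption about the desired behaviour. The name idFromHash is made up:

```javascript
// Hypothetical helper: extracts the id from a hash such as "#id/Blah-blah-blah".
function idFromHash(hash) {
  // Reject empty input or anything that does not start with '#'.
  if (!hash || hash.charAt(0) !== '#') return null;
  var posSlash = hash.indexOf('/', 1);
  // No slash found: take everything after '#'.
  var id = posSlash === -1 ? hash.substring(1) : hash.substring(1, posSlash);
  return id.length > 0 ? id : null;
}

console.log(idFromHash('#id/Blah-blah-blah')); // id
console.log(idFromHash('#id'));                // id
console.log(idFromHash(''));                   // null
```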
This is not a good approach, but you can use it if you want:
var relUrl = "http://mysite.com/#id/Blah-blah-blah";
var urlParts = [];
urlParts = relUrl.split("/"); // array is 0-indexed, so
var idpart = urlParts[3]; // your id (with the leading #) is the 4th element
var id = idpart.substring(1); // skip the # and read the rest
The most foolproof way to do it is probably the following:
function getId() {
var m = document.location.href.match(/\/#([^\/&]+)/);
return m && m[1];
}
This code does not assume anything about what comes after the id (if anything). The id it catches is any run of characters other than forward slashes and ampersands.
If you want it to catch only letters and numbers you can change it to the following:
function getId() {
var m = document.location.href.match(/\/#([a-z0-9]+)/i);
return m && m[1];
}
