I have a large string of HTML (and JavaScript). I need to get the text that is inside document.write():
<script>
$('.navigation').html();
window.jQuery || document.write("<script src='//cdn.shopify.com/s/files/1/0967/6522/t/2/assets/jquery.min.js?15152727378558387064'> $('.link').attr('href',url) \x3C/script>")
$('.button').html();
</script>
Currently I am finding the index of document.write then deleting any text before it.
strIndex = scriptHtml.indexOf('document.write(');
scriptHtml = scriptHtml.substr(strIndex);
This leaves me with a string like this:
document.write("<script src='//cdn.shopify.com/s/files/1/0967/6522/t/2/assets/jquery.min.js?15152727378558387064'> $(".link").attr('href',url) \x3C/script>")
$('.button').html();
</script>
I need to find the first bracket in this new string and then find where the matching bracket ends, so that I can get the string inside it.
I have tried some regex but cannot make one that works.
\(([^)]+)\)
The above regex does not work as it will match to:
("<script src='//cdn.shopify.com/s/files/1/0967/6522/t/2/assets/jquery.min.js?15152727378558387064'> $(".link")
as it just searches for an opening and closing bracket without considering how many have been opened.
Has anyone got an idea of how I can get the text I want, or can anyone think of a better way to get the text inside document.write?
Thanks
Regular expressions are simply not the right tool for matching parentheses that can nest, as they lack the mechanisms that would allow you to do this properly (in this case, recursion). See this answer for more information.
That said, in the example code you posted, simply matching the string document.write along with its quote marks will work (assuming you put the whole code into a variable named str):
console.log(str.match(/document\.write\("([^"]*)"\)/)[1]);
However, I strongly advise against this, as there are many, many possible cases in which parsing it this way will fail and accounting for all possibilities is very complex and really depends on how much you know about (or have control of) the possible inputs.
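If a regex is not enough, one alternative is to walk the string and count brackets by hand. The following is only a minimal sketch under some assumptions: extractCallArgument is a hypothetical helper name, the sample string uses a shortened src ('a.js') in place of the real URL, and the scanner only knows about single- and double-quoted strings with backslash escapes. It does not understand comments, regex literals, or template literals, so it will still fail on some inputs.

```javascript
// Hypothetical helper: returns the text between the first "(" at or after
// `start` and its matching ")", or null if the parentheses never balance.
function extractCallArgument(source, start) {
  const open = source.indexOf('(', start);
  if (open === -1) return null;

  let depth = 0;
  let quote = null; // the quote character we are currently inside, if any

  for (let i = open; i < source.length; i++) {
    const ch = source[i];

    if (quote !== null) {
      if (ch === '\\') {
        i++; // skip the character that follows a backslash
      } else if (ch === quote) {
        quote = null; // the string literal is closed
      }
      continue; // brackets inside a string literal are not counted
    }

    if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (ch === '(') {
      depth++;
    } else if (ch === ')') {
      depth--;
      if (depth === 0) return source.slice(open + 1, i);
    }
  }
  return null; // unbalanced input
}

// Shortened stand-in for the markup in the question (assumption: 'a.js').
const sample =
  'window.jQuery || document.write("<script src=\'a.js\'> $(\'.link\').attr(\'href\',url) \\x3C/script>")\n' +
  "$('.button').html();";

const callAt = sample.indexOf('document.write(');
console.log(extractCallArgument(sample, callAt));
```

The returned text still includes the surrounding quote marks of the argument, so those would need to be stripped separately if only the HTML is wanted.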
Related
Firstly, I apologize if my terminology here isn't the most accurate; I'm very much a novice when it comes to programming. A forum I frequent has added a bunch of unnecessary, "glitchy" images and text to the page as part of some promotion, and the result is that the forum is now difficult to use and read. I was able to script out most of it using AdBlock, but there's one last bit that shows up inside the forum elements themselves, and AdBlock wants to remove the whole element (which breaks the forum). This is part of the code in question, with the URLs changed:
<td class="windowbg" valign="middle" width="42%">▓▓▓▓▓
Thread title <span class="smalltext"></span><img src="example.com/forumicon.gif"></td>
As you can see, the ▓ character shows up a bunch of times for no reason. Is there a way to make my browser ignore this character when it's inside of an element? If there's a way to do this using AdBlock, I am not smart enough to see it.
Here's one way to do it, using a NodeIterator:
var iter = document.createNodeIterator( document.body, NodeFilter.SHOW_TEXT );
var node;
while (node = iter.nextNode()) {
    node.textContent = node.textContent.replace( /[\u2580-\u259f]+/g, '' );
}
This is just plain JavaScript code; you can paste it into the Firefox / Chrome JS console to test it. The regexp /[\u2580-\u259f]+/ matches any sequence of characters in the "Block Elements" Unicode block, including U+2593 Dark Shade (▓). You may want to tweak the regexp to match the characters you want to remove. (Tip: If you don't know what the codes for those characters are, copy and paste them into the "UTF8 String" box on this page.)
Ps. If these characters that you want to remove occur only in a certain part of the document, you can make this code a bit more efficient by replacing the root node (document.body above) with the specific DOM node that you want to remove the characters from. To find the nodes you want, you can use e.g. document.getElementById() or, more generally, document.querySelector() (or even document.querySelectorAll() and loop over the results).
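If you want to try the regexp on its own before touching the page, here is a small sketch that applies it to a plain string. The helper name stripBlockElements and the sample text are made up for illustration; the character range is the same one used above.

```javascript
// Hypothetical helper: removes every character in the Unicode
// "Block Elements" range (U+2580 to U+259F) from a string.
function stripBlockElements(text) {
  return text.replace(/[\u2580-\u259f]+/g, '');
}

// Sample text modelled on the table cell in the question (\u2593 is the
// Dark Shade character).
const cell = '\u2593\u2593\u2593\u2593\u2593\nThread title ';

// JSON.stringify makes the remaining whitespace visible in the console.
console.log(JSON.stringify(stripBlockElements(cell)));
```

Characters outside that range are left alone, so the range would need widening if the forum uses other filler characters.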
OK, this one seems pretty simple (and it probably is). I am trying to use jQuery's replaceWith method, but I don't feel like putting all of the replacement HTML into the method call itself (it's about 60 lines of HTML). So I want to put the replacement HTML in a variable named qOneSmall, like so:
var qOneSmall = qOneSmall.html('..........all the html');
but when I try this I get this error back
Uncaught SyntaxError: Unexpected token ILLEGAL
I don't see any reserved words in there..? Any help would be appreciated.
I think the solution is to grab only the element on the page you're interested in. You say you have about 60 lines. If you know exactly what you want to replace, place just that text in a div with id='mySpecialText'. Then use jQuery to find and replace just that.
var replacementText = "....all the HTML";
$("#mySpecialText").text(replacementText);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="mySpecialText">Foo</div>
If you're only looking to replace text then jaj.laney's .text() approach can be used. However, that will not render the string as HTML.
The way you're using .html() likely fails because qSmallOne is not a jQuery object. The method cannot be called on arbitrary variables. You can assign the HTML string to a variable and pass that string to the .html() function like this:
var htmlstring = '<em>emphasis</em> and <strong>strong</strong>';
$('#target').html(htmlstring);
To see the difference between using .html() and .text() you can check out this short fiddle.
Edit after seeing the HTML
So there is a lot going on here. I'm just going to group these things into a list of issues
The HTML Strings
So I actually learned something here. Using the carriage return and tab keys to lay the HTML string out over several lines is breaking the string. The illegal-ness comes from the fact that a quoted string literal cannot contain a raw line break, so the string is never properly terminated: the parser thinks it ends at the first line. Strip out the line breaks in your strings and they're perfectly valid.
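If you'd rather keep the markup readable than squash it onto one line, here are two ways to write a long string across several source lines. This is only a sketch: the markup is a made-up placeholder rather than your actual 60 lines, and the second option assumes an environment that supports ES2015 template literals.

```javascript
// Option 1: one quoted piece per source line, joined into a single string.
const joined = [
  '<div id="question">',
  '  <p>First line</p>',
  '</div>'
].join('\n');

// Option 2: a template literal (backticks), which is allowed to span lines.
const templated = `<div id="question">
  <p>First line</p>
</div>`;

// Both variables now hold the same text.
console.log(joined === templated);
```

Either variable can then be passed to .html() or .replaceWith() as an ordinary string.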
Variable Names
Minor thing, you've got a typo in qSmallOne. Be sure to check your spelling especially when working with these giant variables. A little diligence up front will save a bunch of headache later.
Selecting the Right Target
Your targets for the change in content are IDs that are in the strings in your variables and not in the actual DOM. While it looks like you're handling this, I found it rather confusing. I would use one containing element with a static ID and target that instead (that way you don't have to remember why you're handling multiple IDs for one container in the future).
Using replaceWith() and html()
.replaceWith() is used to replace an element with something else. This includes the element that is being targeted, so you need to be very aware of what you're wanting to replace. .html() may be a better way to go since it replaces the content within the target, not including the target itself.
I've made these updates and forked your fiddle here.
I'm trying to make a crude ad blocker with JavaScript.
The code I currently have:
var pattern = '<iframe(.*?)</iframe>|<object(.*?)</object>';
if (document.body.parentNode.innerHTML.match(pattern))
{
document.body.parentNode.innerHTML =
document.body.parentNode.innerHTML.replace(pattern, '<b>AD BLOCKED</b>');
}
The problem is that the page reloads. Is there a way I can stop the page from reloading? (My main target is adsense)
This does not seem right, since you're just wanting to replace the html on the page. I can't imagine what that will do. To answer your Regex question, though, try this.
var pattern = /<iframe.*<\/iframe>/gi;
document.body.innerHTML =
document.body.innerHTML.replace(pattern, '<strong>bye iframe</strong>');
replace() will swap out all the matches found by the RegExp with the second parameter.
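/<iframe.*<\/iframe>/ is a regular expression matching everything from an opening iframe tag through a closing iframe tag on the same line. Because .* is greedy, it runs to the last closing tag it can reach.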
gi modifies the regex telling it to be global and case-insensitive.
Again, you will probably have some unexpected behavior rewriting the innerHTML of the body, so I'd rethink your approach. Perhaps you could use jQuery to find the tags you don't want and hide or remove them. (example here)
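One thing to watch with the pattern above is greediness. The sketch below compares it with a lazy variant on a made-up string; html, greedy, lazy and blockFrames are names invented for this example, and it works on a plain string rather than on a live page.

```javascript
// Made-up markup with two iframes and a paragraph between them.
const html =
  '<p>keep</p><iframe src="a"></iframe><p>also keep</p>' +
  '<iframe src="b"></iframe><p>end</p>';

// Greedy: ".*" runs to the LAST closing tag on the line.
const greedy = /<iframe.*<\/iframe>/gi;

// Lazy: "*?" stops at the FIRST closing tag; [\s\S] also matches line breaks.
const lazy = /<iframe[\s\S]*?<\/iframe>/gi;

// Hypothetical helper that swaps every match for a placeholder.
function blockFrames(markup, pattern) {
  return markup.replace(pattern, '<b>AD BLOCKED</b>');
}

console.log(blockFrames(html, greedy)); // the middle paragraph disappears too
console.log(blockFrames(html, lazy));   // only the two iframes are replaced
```

Even the lazy version is still pattern matching on HTML text, so removing the elements through the DOM is likely to be more robust.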
There are no inline scripts involved, whatsoever. I have an external file script, which fetches some JSONP from twitter. Let's suppose that a property of the object represented in the returned JSONP was a string that contained somewhere in it the substring "</script>". Could this cause any problems on its own, without getting added to the DOM at all? (It gets scrubbed clean well before that point.)
I can't see why it would, but HTML parsing is notoriously whacky and quirky, so who knows? I know that if you want to have a string literal within an inline script, you need to break it up, like var slashScriptContainingString = 'foo</scr' + 'ipt>bar'; Again, I feel like it should be fine, but just checking to see if anyone knows why it might not be.
<!doctype html>
<script src="file.js"></script>
File.js:
var f = function(twobj) {
console.log(twobj);
doOtherStuffWith(twobj);
}
<script src="https://api.twitter.com/statuses/user_timeline/user.json?callback=f"></script>
Returned JSONP:
f(["this is an object, returned as part of the JSONP response, except it contains a string literal with the substring \"</script>\". Is this a problem? Note: I haven't said anything about injecting this string in the DOM in any way shape or form. I can't think of a reason why it might be, but I'd just like to be sure."]);
No, string literals can contain whatever you want. As long as you are not blindly trying to set the innerHTML of something, a string is just a string. The example you have posted is safe.
The reason that you need to split up your </script> tag in your JavaScript source is that you are missing CDATA blocks. Without them, technically everything in your inline JavaScript needs to be properly escaped for HTML. (< becomes &lt;, etc.) Browsers are nice to you and let it slide, but </script> inside inline JavaScript becomes ambiguous. You should be using CDATA blocks to keep things like this from happening.
<script type="text/javascript">
//<![CDATA[
...code...
//]]>
</script>
See this question for more details: When is a CDATA section necessary within a script tag?
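For the inline case, another option besides splitting the string by hand is to escape the < character when a value is serialized into the script. This is a sketch of that general technique, not something taken from the question: toInlineScriptLiteral is a hypothetical helper name, and it assumes the value can be serialized with JSON.stringify.

```javascript
// Hypothetical helper: serializes a value for embedding in an inline
// <script> block, escaping "<" so "</script>" never appears literally.
function toInlineScriptLiteral(value) {
  return JSON.stringify(value).replace(/</g, '\\u003c');
}

const text = 'foo</script>bar';
const literal = toInlineScriptLiteral(text);

console.log(literal);                      // the escaped source text
console.log(JSON.parse(literal) === text); // it still decodes to the original
```

None of this is needed for the external JSONP file in the question, since the HTML parser never sees its contents.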
Using JavaScript, I generate HTML code, for example adding a function which is started by clicking a link, like:
$('#myDiv').append('click');
So start() should be called if somebody hits the link (click).
TERM could contain a single word, like world or moody's, the generated HTML code would look like:
click
OR
click
As you can see, the 2nd example will not work. So I decided to "escape" the TERM, like so:
$('#myDiv').append('click');
Looking at the HTML source code using Firebug, I see that the following code was generated:
click
That works fine until I actually click the link: the browser (here Firefox) seems to interpret the %27 and tries to fire start('moody's');
Is there a way to escape the term persistently, without the %27 being interpreted until the term is handled in JS? Is there another solution besides using regular expressions to change ' to \'?
Don't try to generate inline JavaScript. That way lies too much pain and maintenance hell. (If you were to go down that route, then you would escape characters in JavaScript strings with \).
Use standard event binding routines instead.
Assuming that $ is jQuery, and not one of the many other libraries that use that unhelpful variable name:
$('#myDiv').append(
    $('<a>').append("click").attr('href', 'A sensible fallback').click(function (e) {
        alert(TERM); // Because I don't have the function you were calling
        e.preventDefault();
    })
);
See also http://jsfiddle.net/TudEw/
escape() is used for url-encoding stuff, not for making it possible to put in a string literal. Your code is seriously flawed for several reasons.
If you want an onclick event, use an onclick event. Do not try to "inject" javascript code with your markup. If you have the "string" in a variable, you should never need to substitute anything in it unless you are generating urls or other restricted terms.
var element = $('<span>click</span>');
element.bind('click', function () { start(TERM); });
$('#myDiv').append(element);
If you don't know what this does, then go back to basics and learn what events and function references in JavaScript mean.
That escape() function is for escaping URLs for passing over a network, not strings. I don't know that there's a built-in function to escape strings for JavaScript, but you can try this one I found online: http://www.willstrohl.com/Blog/EntryId/67/HOW-TO-Escape-Single-Quotes-for-JavaScript-Strings.
Usage: EscapeSingleQuotes(strString)
Edit: Just noticed your note about regular expressions. This solution does use regular expressions, but I think there's nothing wrong with that :-)
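For reference, here is a rough sketch of what such an escaping helper might look like. It is not the code from the linked article: escapeForSingleQuotedLiteral is a hypothetical name, and it only handles backslashes and single quotes (not line breaks or HTML attribute quoting), so binding an event handler is still likely the safer route.

```javascript
// Hypothetical helper: makes a string safe to place between single quotes
// in generated JavaScript source.
function escapeForSingleQuotedLiteral(str) {
  return str
    .replace(/\\/g, '\\\\') // escape backslashes first
    .replace(/'/g, "\\'");  // then escape single quotes
}

const term = "moody's";

// Build the call as source text, the way generated inline code would.
const snippet = "start('" + escapeForSingleQuotedLiteral(term) + "');";

console.log(snippet);
```

The backslashes are escaped first so that the backslash added in front of each quote is not doubled by the second replace.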