How to run regex on webpage with javascript

I'm trying to make a crude ad blocker with JavaScript.
The code I currently have:
var pattern = '<iframe(.*?)</iframe>|<object(.*?)</object>';
if (document.body.parentNode.innerHTML.match(pattern))
{
document.body.parentNode.innerHTML =
document.body.parentNode.innerHTML.replace(pattern, '<b>AD BLOCKED</b>');
}
The problem is that the page reloads. Is there a way I can stop the page from reloading? (My main target is AdSense.)

This does not seem right, since you're just wanting to replace the html on the page. I can't imagine what that will do. To answer your Regex question, though, try this.
var pattern = /<iframe[\s\S]*?<\/iframe>/gi;
document.body.innerHTML =
document.body.innerHTML.replace(pattern, '<strong>bye iframe</strong>');
replace() swaps out every match found by the RegExp with the second parameter.
/<iframe[\s\S]*?<\/iframe>/ is a regular expression matching an iframe tag and everything inside it. The lazy [\s\S]*? stops at the first closing tag and also spans line breaks, whereas a greedy .* would run from the first iframe to the last one on the same line.
gi modifies the regex, telling it to be global and case-insensitive.
Again, you will probably have some unexpected behavior rewriting the innerHTML of the body, so I'd rethink your approach. Perhaps you could use jQuery to find the tags you don't want and hide or remove them. (example here)
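As a minimal sketch of the regex side of this answer, the snippet below runs replace() on a plain string rather than on the live page. The sample markup is made up for illustration. It also shows one thing worth checking in the question's code: the pattern there is a string, and replace() treats a string pattern literally, so nothing is replaced even though innerHTML is still reassigned.

```javascript
// Hypothetical sample markup, not taken from any real page.
const html = '<p>a</p><iframe src="x"></iframe><p>b</p><object data="y"></object>';

// A string pattern is searched for literally, so this leaves the text unchanged.
const literal = html.replace('<iframe(.*?)</iframe>', '<b>AD BLOCKED</b>');

// A RegExp literal with the g and i flags replaces every iframe and object element.
const pattern = /<iframe[\s\S]*?<\/iframe>|<object[\s\S]*?<\/object>/gi;
const blocked = html.replace(pattern, '<b>AD BLOCKED</b>');

console.log(literal === html);
console.log(blocked);
```

Working on a string like this avoids touching the page at all; on a live page, removing the unwanted elements through the DOM is usually the gentler option, as the answer suggests.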

Related

Using grease/tampermonkey (or another extension) to omit certain characters inside web elements

Firstly, I apologize if my terminology here isn't the most accurate; I'm very much a novice when it comes to programming. A forum I frequent has added a bunch of unnecessary, "glitchy" images and text to the page as part of some promotion, but the result is that the forum is now difficult to use and read. I was able to script out most of it using adblock, but there's one last bit that shows up inside the forum elements themselves, and adblock wants to remove the whole element (which breaks the forum). This is part of the code in question, with the URLs changed:
<td class="windowbg" valign="middle" width="42%">&blk34;&blk34;&blk34;&blk34;&blk34;
Thread title <span class="smalltext"></span><img src="example.com/forumicon.gif"></td>
As you can see, the ▓ character shows up a bunch of times for no reason. Is there a way to make my browser ignore this character when it's inside of an element? If there's a way to do this using AdBlock, I am not smart enough to see it.

Here's one way to do it, using a NodeIterator:
var iter = document.createNodeIterator( document.body, NodeFilter.SHOW_TEXT );
var node;
while (node = iter.nextNode()) {
node.textContent = node.textContent.replace( /[\u2580-\u259f]+/g, '' );
}
This is just plain JavaScript code; you can paste it into the Firefox / Chrome JS console to test it. The regexp /[\u2580-\u259f]+/ matches any sequence of characters in the "Block Elements" Unicode block, including U+2593 Dark Shade (▓). You may want to tweak the regexp to match the characters you want to remove. (Tip: If you don't know what the codes for those characters are, copy and paste them into the "UTF8 String" box on this page.)
P.S. If the characters you want to remove occur only in a certain part of the document, you can make this code a bit more efficient by replacing the root node (document.body above) with the specific DOM node that you want to remove the characters from. To find the nodes you want, you can use e.g. document.getElementById() or, more generally, document.querySelector() (or even document.querySelectorAll() and loop over the results).
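The scoped variant described in the P.S. could be sketched roughly as follows. The td.windowbg selector is taken from the markup in the question; the helper names stripBlocks and cleanTextNodes are made up for this example. The DOM part only runs where a document exists, so the string helper can be tried on its own.

```javascript
// Matches runs of characters in the Unicode "Block Elements" range, as in the answer.
const blockElements = /[\u2580-\u259f]+/g;

// Pure string helper (hypothetical name), so the regexp can be checked without a browser.
function stripBlocks(text) {
  return text.replace(blockElements, '');
}

// Apply the helper to every text node under one root element (hypothetical name).
function cleanTextNodes(root) {
  const iter = document.createNodeIterator(root, NodeFilter.SHOW_TEXT);
  let node;
  while ((node = iter.nextNode())) {
    node.textContent = stripBlocks(node.textContent);
  }
}

// Only touch the DOM when one exists, e.g. in a userscript or the JS console.
if (typeof document !== 'undefined') {
  document.querySelectorAll('td.windowbg').forEach(function (cell) {
    cleanTextNodes(cell);
  });
}

console.log(stripBlocks('\u2593\u2593\u2593 Thread title'));
```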

Javascript Getting text from inside parenthesis

I have a large string of HTML (and javascript). I need to get text that is inside document.write()
<script>
$('.navigation').html();
window.jQuery || document.write("<script src='//cdn.shopify.com/s/files/1/0967/6522/t/2/assets/jquery.min.js?15152727378558387064'> $('.link').attr('href',url) \x3C/script>")
$('.button').html();
</script>
Currently I am finding the index of document.write then deleting any text before it.
strIndex = scriptHtml.indexOf('document.write(');
scriptHtml = scriptHtml.substr(strIndex);
This leaves me with a string like this:
document.write("<script src='//cdn.shopify.com/s/files/1/0967/6522/t/2/assets/jquery.min.js?15152727378558387064'> $(".link").attr('href',url) \x3C/script>")
$('.button').html();
</script>
I need to find the first bracket in this new string and then work out where the matching bracket ends, so that I can get the string inside it.
I have tried some regexes but cannot make one that works.
\(([^)]+)\)
The above regex does not work as it will match to:
("<script src='//cdn.shopify.com/s/files/1/0967/6522/t/2/assets/jquery.min.js?15152727378558387064'> $(".link")
as it just searches for an opening and a closing bracket without considering how many have been opened.
Does anyone have an idea of how I can get the text I want, or a better way to get the text inside document.write?
Thanks

Regular expressions are simply not the right tool for matching parentheses that can nest, as they lack the mechanisms that would allow you to do this properly (in this case, recursion). See this answer for more information.
That said, in the example code you posted, simply matching the string document.write along with its quote marks will work (assuming you put the whole code into a variable named str):
console.log(str.match(/document\.write\("([^"]*)"\)/)[1]);
However, I strongly advise against this: there are many possible cases in which parsing it this way will fail, and accounting for all of them is complex and depends on how much you know about (or control) the possible inputs.
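If a regex is not enough, one alternative is a small hand-written scanner that counts parenthesis depth and skips over quoted strings. This is only a sketch: extractParenContents is a hypothetical helper, the sample input is invented, and it does not handle comments, template literals or regex literals.

```javascript
// Returns the text between the "(" at openIndex and its matching ")",
// or null if the parentheses never balance.
function extractParenContents(source, openIndex) {
  let depth = 0;
  let quote = null; // quote character of the string literal we are inside, if any
  for (let i = openIndex; i < source.length; i++) {
    const ch = source[i];
    if (quote) {
      if (ch === '\\') {
        i++; // skip the escaped character
      } else if (ch === quote) {
        quote = null; // end of the string literal
      }
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (ch === '(') {
      depth++;
    } else if (ch === ')') {
      depth--;
      if (depth === 0) {
        return source.slice(openIndex + 1, i);
      }
    }
  }
  return null;
}

// Invented sample; assumes document.write( is present in the input.
const sample = 'window.jQuery || document.write("<b>(nested) text</b>"); other(1)';
const open = sample.indexOf('document.write(') + 'document.write'.length;
const inside = extractParenContents(sample, open);
console.log(inside);
```

Parentheses inside the quoted argument are ignored, which is the case the \(([^)]+)\) regex in the question gets wrong.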

Using variables with jQuery's replaceWith() method

Ok, this one seems pretty simple (and it probably is). I am trying to use jQuery's replaceWith() method, but I don't feel like putting all of the HTML that will replace the HTML on the page into the method call itself (it's about 60 lines of HTML). So I want to put the replacement HTML in a variable named qOneSmall, like so:
var qOneSmall = qOneSmall.html('..........all the html');
but when I try this I get this error back
Uncaught SyntaxError: Unexpected token ILLEGAL
I don't see any reserved words in there..? Any help would be appreciated.

I think the solution is to grab only the element on the page you're interested in. You say you have about 60 lines. If you know exactly what you want to replace, place just that text in a div with id='mySpecialText'. Then use jQuery to find and replace just that.
var replacementText = "....all the HTML";
$("#mySpecialText").text(replacementText);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="mySpecialText">Foo</div>

If you're only looking to replace text, then jaj.laney's .text() approach can be used. However, that will not render the string as HTML.
The way you're using .html() likely fails because qSmallOne is not a jQuery object. The method cannot be called on arbitrary variables. You can assign the HTML string to a variable and pass that string to the .html() function like this:
var htmlstring = '<em>emphasis</em> and <strong>strong</strong>';
$('#target').html(htmlstring);
To see the difference between using .html() and .text() you can check out this short fiddle.
Edit after seeing the HTML
So there is a lot going on here. I'm just going to group these things into a list of issues
The HTML Strings
So I actually learned something here. The raw line breaks in the HTML string are breaking it: a quoted string literal cannot span lines, so the parser treats the string as unterminated at the end of the first line, which is where the ILLEGAL token error comes from. Strip the line breaks out of your strings and they're perfectly valid.
Variable Names
Minor thing: you've got a typo in qSmallOne. Be sure to check your spelling, especially when working with these giant variables. A little diligence up front will save a lot of headaches later.
Selecting the Right Target
Your targets for the change in content are IDs that are in the strings in your variables and not in the actual DOM. While it looks like you're handling this, I found it rather confusing. I would use one containing element with a static ID and target that instead (that way you don't have to remember why you're handling multiple IDs for one container in the future).
Using replaceWith() and html()
.replaceWith() is used to replace an element with something else. This includes the element that is being targeted, so you need to be very aware of what you're wanting to replace. .html() may be a better way to go since it replaces the content within the target, not including the target itself.
I've made these updates and forked your fiddle here.
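For keeping a long block of HTML in a variable without hitting the unterminated-string error, two common options are sketched below. The markup and the #target selector are placeholders, and the legacy variable name is invented for the example. The jQuery call only runs if $ is defined.

```javascript
// Option 1: a template literal (backticks) may span several lines.
const qOneSmall = `
<div class="question">
  <p>Question one</p>
</div>`;

// Option 2: for older browsers, build the string from single-line pieces.
const qOneSmallLegacy = [
  '<div class="question">',
  '  <p>Question one</p>',
  '</div>'
].join('\n');

// Pass the string to .html(); this replaces the content inside #target,
// not the #target element itself.
if (typeof $ !== 'undefined') {
  $('#target').html(qOneSmall);
}

console.log(qOneSmall.trim() === qOneSmallLegacy);
```

Template literals need an ES2015-capable browser; the array join works everywhere.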

escape() doesn't seem to work consistently?

Using JavaScript, I generate HTML code, for example adding a function that starts when a link is clicked, like:
$('#myDiv').append('click');
So start() should be called if somebody hits the link (click).
TERM could contain a single word, like world or moody's; the generated HTML code would look like:
click
OR
click
As you can see, the 2nd example will not work. So I decided to "escape" the TERM, like so:
$('#myDiv').append('click');
Looking at the HTML source code using Firebug, I see that the following code was generated:
click
That works fine until I actually click the link: the browser (here Firefox) seems to interpret the %27 and tries to fire start('moody's');
Is there a way to escape the term persistently, without the %27 being interpreted until the term is handled in JS? Is there another solution besides using regular expressions to change ' to \'?

Don't try to generate inline JavaScript. That way lies too much pain and maintenance hell. (If you were to go down that route, then you would escape characters in JavaScript strings with \).
Use standard event binding routines instead.
Assuming that $ is jQuery, and not one of the many other libraries that use that unhelpful variable name:
$('#myDiv').append(
$('<a>').append("click").attr('href', 'A sensible fallback').click(function (e) {
alert(TERM); // Because I don't have the function you were calling
e.preventDefault();
})
);
See also http://jsfiddle.net/TudEw/

escape() is used for url-encoding stuff, not for making it possible to put in a string literal. Your code is seriously flawed for several reasons.
If you want an onclick event, use an onclick event. Do not try to "inject" javascript code with your markup. If you have the "string" in a variable, you should never need to substitute anything in it unless you are generating urls or other restricted terms.
var element = $('<span>click</span>');
element.on('click', function () { start(TERM); });
$('#myDiv').append(element);
If you don't know what this does, then go back to basics and learn what events and function references mean in JavaScript.
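The reason no escaping is needed with event binding is that the handler closes over the value instead of pasting it into markup. A minimal sketch without jQuery, assuming a stand-in start() (the asker's real function is not shown) and a hypothetical makeClickHandler helper:

```javascript
// Stand-in for the asker's start() function; its real body is not shown in the question.
function start(term) {
  return 'started with: ' + term;
}

// The returned handler closes over term, so the value is never written into
// HTML or a JavaScript string literal and quotes inside it need no escaping.
function makeClickHandler(term) {
  return function () {
    return start(term);
  };
}

const handler = makeClickHandler("moody's");
console.log(handler());

// In a browser, the same handler can be attached with standard event binding.
if (typeof document !== 'undefined') {
  const link = document.createElement('span');
  link.textContent = 'click';
  link.addEventListener('click', handler);
  const container = document.getElementById('myDiv');
  if (container) {
    container.appendChild(link);
  }
}
```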

The escape() function is for escaping URLs for passing over a network, not strings. I don't know of a built-in function to escape strings for JavaScript, but you can try this one I found online: http://www.willstrohl.com/Blog/EntryId/67/HOW-TO-Escape-Single-Quotes-for-JavaScript-Strings.
Usage: EscapeSingleQuotes(strString)
Edit: Just noticed your note about regular expressions. This solution does use regular expressions, but I think there's nothing wrong with that :-)

javascript regular expression

I want to match some links from a web content. I know I can use file_get_contents(url) to do this in php. How about in javascript?
For a regular expression, given markup like
<a href="someurl/something" id="someid">contents</a>
how can I use a JS regular expression to match this (match only once, not greedily)? I tried this:
/^\<a href=\"someurl\/something\" id=\"someid\"\>(+?)\<\/a\>$/
but it doesn't work.
Can someone help?
Thanks!

You should know that parsing HTML with regex is not the optimal way to solve this problem, and if you have access to a live DOM of the page, you should use DOM methods instead. As in, you should use
document.getElementById('someid').innerHTML // this will return 'contents'
instead of a regex.

I'd highly recommend using a library like jQuery to get the element, and then get the contents via a .text() call. It's much simpler and more reliable than trying to parse HTML with regex.

DOM and jQuery suggestions are better but if you still want to use regex then try this:
/^<a href=".*?" id=".*?">(.*?)<\/a>$/
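For reference, a pattern like this is typically used with match(), reading the link text from the first capture group. The sample string below is the markup implied by the question.

```javascript
// Sample input: the anchor markup the question's regex is aimed at.
const html = '<a href="someurl/something" id="someid">contents</a>';

// match() returns null when nothing matches, so guard before reading group 1.
const match = html.match(/^<a href=".*?" id=".*?">(.*?)<\/a>$/);
const text = match ? match[1] : null;

console.log(text);
```

Because of the ^ and $ anchors, this only works when the string consists of exactly one link.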

You might as well create the elements with jQuery
var elements = $(html);
var links = elements.find('a');
links.each(function(i, link){
//Do the regexp matching in here if you wish to search for specific urls only
});
In bigger documents, using the DOM is way quicker than regexping the whole thing as text.

Try this:
try {
boolean foundMatch = subjectString.matches("(?im)<a[^>]*href=(\"[^\"]*\"|'[^']*'|[^\\s>]*)[^>]*>.*?</a>");
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
}
This matches href values in double quotation marks, in single quotes, and unquoted:
<a href="someurl/something" id="someid">contents</a>
<a href='someurl/something' id='someid'>contents</a>
<a href=someurl/something id=someid>contents</a>
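The snippet above is Java rather than JavaScript. A rough JavaScript rendering of the same idea might look like this. It swaps the inline (?im) flags for the i flag on a regex literal, and it adds a capture group for the link text, which is not part of the original pattern.

```javascript
// href value may be double-quoted, single-quoted, or unquoted.
// Group 1 is the href value; group 2 (added for this sketch) is the link text.
const linkPattern = /<a[^>]*href=("[^"]*"|'[^']*'|[^\s>]*)[^>]*>(.*?)<\/a>/i;

const samples = [
  '<a href="someurl/something" id="someid">contents</a>',
  "<a href='someurl/something' id='someid'>contents</a>",
  '<a href=someurl/something id=someid>contents</a>'
];

// Collect the link text from each sample, or null where nothing matches.
const texts = samples.map(function (sample) {
  const m = sample.match(linkPattern);
  return m ? m[2] : null;
});

console.log(texts);
```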
