javascript regular expression - javascript

I want to match some links in the content of a web page. I know I can use file_get_contents(url) to do this in PHP. How about in JavaScript?
For the regular expression, given markup like
<a href="someurl/something" id="someid">contents</a>
how can I use a JS regular expression to match this (match only once, non-greedy)? I tried this:
/^\<a href=\"someurl\/something\" id=\"someid\"\>(+?)\<\/a\>$/
but it doesn't work.
Can someone help?
Thanks!

You should know that parsing HTML with regex is not the optimal way to solve this problem, and if you have access to a live DOM of the page, you should use DOM methods instead. As in, you should use
document.getElementById('someid').innerHTML // this will return 'contents'
instead of a regex.

I'd highly recommend using a library like jQuery to get the element, and then getting the contents via a .text() call. It's much simpler and more reliable than trying to parse HTML with regex.

DOM and jQuery suggestions are better but if you still want to use regex then try this:
/^<a href=".*?" id=".*?">(.*?)<\/a>$/
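As a rough sketch of how a regex like the one above might be applied to a string, assuming the markup has exactly the attribute order shown in the question. The helper name extractLinkText and the sample string are made up for this example. Dropping the ^ and $ anchors lets the pattern find the link inside a larger string, and [^"]* keeps each attribute match inside its own quotes:

```javascript
// Hypothetical helper: returns the text of the first matching link, or null.
function extractLinkText(html) {
  // (.*?) is lazy, so it stops at the first closing </a>.
  var pattern = /<a href="[^"]*" id="[^"]*">(.*?)<\/a>/;
  var match = pattern.exec(html);
  return match ? match[1] : null;
}

// Sample input invented for illustration.
var sample = '<p><a href="someurl/something" id="someid">contents</a> and ' +
  '<a href="x" id="y">more</a></p>';

console.log(extractLinkText(sample)); // "contents"
```

Without the g flag, exec() returns only the first match, which is the "match only once" behaviour asked for.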

You might as well create the elements with jQuery
var elements = $(html);
var links = elements.find('a');
links.each(function(i, link){
//Do the regexp matching in here if you wish to search for specific urls only
});
In bigger documents, using the DOM is way quicker than regexping the whole thing as text.

Try this (note that it is Java, not JavaScript):
try {
boolean foundMatch = subjectString.matches("(?im)<a[^>]*href=(\"[^\"]*\"|'[^']*'|[^\\s>]*)[^>]*>.*?</a>");
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
}
It matches href values in double quotes, in single quotes, and without quotes:
<a href="someurl/something" id="someid">contents</a>
<a href='someurl/something' id='someid'>contents</a>
<a href=someurl/something id=someid>contents</a>
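Since the question asks about JavaScript, here is a minimal sketch of the same idea in JavaScript. It is an adaptation, not the answerer's code; linkPattern and parseLink are names invented for this example. One capture group per quoting style holds the href, and a final group holds the link text:

```javascript
// Matches an <a> tag whose href is double-quoted, single-quoted, or unquoted.
// Groups 1-3 hold the href for each quoting style; group 4 is the link text.
var linkPattern =
  /<a\b[^>]*?\bhref=(?:"([^"]*)"|'([^']*)'|([^\s>]+))[^>]*>([\s\S]*?)<\/a>/i;

// Hypothetical helper: returns { href, text } for the first link, or null.
function parseLink(html) {
  var m = linkPattern.exec(html);
  if (!m) return null;
  // Groups that did not take part in the match are undefined.
  var href = m[1] !== undefined ? m[1] : (m[2] !== undefined ? m[2] : m[3]);
  return { href: href, text: m[4] };
}

console.log(parseLink('<a href="someurl/something" id="someid">contents</a>'));
console.log(parseLink("<a href='someurl/something' id='someid'>contents</a>"));
console.log(parseLink('<a href=someurl/something id=someid>contents</a>'));
```

All three calls should yield the same href and text. This is still a pattern match over text, so unusual markup (for example a > inside an attribute value) can defeat it.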

Related

Using variables with jQuery's replaceWith() method

OK, this one seems pretty simple (and it probably is). I am trying to use jQuery's replaceWith() method, but I don't feel like putting all of the replacement HTML into the method call itself (it's about 60 lines of HTML). So I want to put the replacement HTML in a variable named qOneSmall, like so
var qOneSmall = qOneSmall.html('..........all the html');
but when I try this I get this error back
Uncaught SyntaxError: Unexpected token ILLEGAL
I don't see any reserved words in there..? Any help would be appreciated.
I think the solution is to grab only the element on the page you're interested in. You say you have about 60 lines. If you know exactly what you want to replace, place just that text in a div with id='mySpecialText'. Then use jQuery to find and replace just that.
var replacementText = "....all the HTML";
$("#mySpecialText").text(replacementText);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="mySpecialText">Foo</div>
If you're only looking to replace text, then jaj.laney's .text() approach can be used. However, that will not render the string as HTML.
The way you're using .html() likely fails because qSmallOne is not a jQuery object. The method cannot be called on arbitrary variables. You can assign the HTML string to a variable and pass that string to the .html() function like this:
var htmlstring = '<em>emphasis</em> and <strong>strong</strong>';
$('#target').html(htmlstring);
To see the difference between using .html() and .text() you can check out this short fiddle.
Edit after seeing the HTML
So there is a lot going on here. I'm just going to group these things into a list of issues
The HTML Strings
So I actually learned something here. Pressing Enter inside the HTML string is what breaks it: a string literal in single or double quotes cannot contain a raw line break, so the parser treats the string as unterminated at the end of its first line, hence the ILLEGAL token error. Tabs and spaces are harmless. Strip the line breaks out of your strings and they're perfectly valid.
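A small sketch of three common ways to keep a long HTML string readable without putting a raw line break inside a quoted literal. The markup and variable names here are invented for illustration, not taken from the asker's fiddle:

```javascript
// 1. Concatenate one quoted piece per line.
var viaConcat = '<div id="question">' +
  '<p>First line</p>' +
  '</div>';

// 2. Join an array of lines.
var viaJoin = [
  '<div id="question">',
  '<p>First line</p>',
  '</div>'
].join('');

// 3. Template literal (ES2015+): line breaks are allowed and kept in the value.
var viaTemplate = `<div id="question">
<p>First line</p>
</div>`;

console.log(viaConcat === viaJoin);                        // true
console.log(viaTemplate.replace(/\n/g, '') === viaConcat); // true
```

Any of these values can then be passed to .html() or .replaceWith() as a single string.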
Variable Names
Minor thing, you've got a typo in qSmallOne. Be sure to check your spelling especially when working with these giant variables. A little diligence up front will save a bunch of headache later.
Selecting the Right Target
Your targets for the change in content are IDs that are in the strings in your variables and not in the actual DOM. While it looks like you're handling this, I found it rather confusing. I would use one containing element with a static ID and target that instead (that way you don't have to remember why you're handling multiple IDs for one container in the future).
Using replaceWith() and html()
.replaceWith() is used to replace an element with something else. This includes the element that is being targeted, so you need to be very aware of what you're wanting to replace. .html() may be a better way to go since it replaces the content within the target, not including the target itself.
I've made these updates and forked your fiddle here.

How to run regex on webpage with javascript

I'm trying to make a bit of a crude ad-blocker with javascript
The code I currently have:
var pattern = '<iframe(.*?)</iframe>|<object(.*?)</object>';
if (document.body.parentNode.innerHTML.match(pattern))
{
document.body.parentNode.innerHTML =
document.body.parentNode.innerHTML.replace(pattern, '<b>AD BLOCKED</b>');
}
The problem is that the page reloads. Is there a way I can stop the page from reloading? (My main target is adsense)
This does not seem right, since you're just wanting to replace the html on the page. I can't imagine what that will do. To answer your Regex question, though, try this.
var pattern = /<iframe.*<\/iframe>/gi;
document.body.innerHTML =
document.body.innerHTML.replace(pattern, '<strong>bye iframe</strong>');
replace() will swap out all the matches found by the RegExp with the second parameter.
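/<iframe.*<\/iframe>/ is a regular expression matching everything from an opening <iframe to a closing </iframe>. Because .* is greedy and does not match line breaks, it runs to the last </iframe> on the same line.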
gi modifies the regex telling it to be global and case-insensitive.
Again, you will probably have some unexpected behavior rewriting the innerHTML of the body, so I'd rethink your approach. Perhaps you could use jQuery to find the tags you don't want and hide or remove them. (example here)
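To make the regex behaviour concrete, here is a small sketch on plain strings (sample markup invented for this example). It shows three things: a pattern passed to replace() as a string is treated as literal text rather than as a regex, the greedy .* version removes whatever sits between two iframes on one line, and a lazy [\s\S]*? version stops at the first closing tag and also crosses line breaks:

```javascript
var oneLine = '<iframe src="a"></iframe><p>keep me</p><iframe src="b"></iframe>';
var multiLine = '<iframe\nsrc="c"></iframe>';

// A string pattern is searched for literally, so nothing is replaced here.
var literal = oneLine.replace('<iframe(.*?)</iframe>', '[x]');

// Greedy: .* runs to the last </iframe> on the line and takes the paragraph with it.
var greedy = oneLine.replace(/<iframe.*<\/iframe>/gi, '[x]');

// Lazy and newline-tolerant: stops at the first </iframe> after each <iframe.
var lazyPattern = /<iframe\b[\s\S]*?<\/iframe>/gi;
var lazy = oneLine.replace(lazyPattern, '[x]');
var lazyMultiLine = multiLine.replace(lazyPattern, '[x]');

console.log(literal === oneLine); // true
console.log(greedy);              // [x]
console.log(lazy);                // [x]<p>keep me</p>[x]
console.log(lazyMultiLine);       // [x]
```

This only illustrates the matching; reassigning innerHTML with the result still rebuilds the page's DOM, which is the concern raised in the answer above.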

Array check element ID with wildcard in if statement

I can check an object ID in a array with
if (obj[0].id != "myID")
I would like to do the same with a wildcard, so that
if (obj[0].id != "myID*")
will exclude #myID1, #myID2, #myID3 etc.
I have to stay inside the if statement for this check, I can't call an external function.
If it is not possible, I can use obj[0].className instead of .id :
if (obj[0].className != "myClass")
but every object has several classes in addition of myClass.
jQuery is allowed although I'm not sure it will help.
If you're using jQuery (you've added the tag), why not use the selectors?
$('*:not([id^="myID"])')
This gets all the elements whose id attribute does not start with myID. You can use the same attribute selector in your if statement like so:
if($(obj[0]).is('[id^="myID"]'))
First of all, you can definitely use an id attribute selector like this
if(!$(obj[0]).is("[id^=myID]"))
However, why not assign a class to all those elements instead? That sounds like a much more reasonable approach, allowing
if(!$(obj[0]).hasClass("myClass"))
Using String.prototype.indexOf might be one possible approach:
if (obj[0].id.indexOf('myID') !== 0) {
// ID does not start with 'myID'
}
You can even use regular expressions:
if( !/^myID/.test( obj[0].id ) ) {
}
I can suggest this really good playground for testing your regexp:
http://lea.verou.me/regexplained/
And this talk:
http://www.youtube.com/watch?v=EkluES9Rvak
Regular expressions can be very powerful. Your case may be simple enough to handle with other techniques, but you will find regular expressions really useful for other problems in the future.
You could check that the first 4 characters are myID with .substring():
if(obj[0].id.substring(0,4) != 'myID'){ }
If you wanted to use jQuery it would be really easy to check the id or class:
if(!$(obj[0]).is('[id^=myID]')){ }
or
if(!$(obj[0]).hasClass('myClass')){ }
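The inline checks suggested above can be compared side by side in a short sketch. Plain objects stand in for DOM elements here, and the sample ids are invented; both conditions sit directly in the if statement, so no external function is needed:

```javascript
// Plain objects stand in for DOM elements; only the id property matters.
var objs = [{ id: 'myID1' }, { id: 'myID22' }, { id: 'other' }, { id: 'notmyID' }];

var keptByIndexOf = [];
var keptByRegex = [];

for (var i = 0; i < objs.length; i++) {
  // indexOf returns 0 only when the id starts with the prefix.
  if (objs[i].id.indexOf('myID') !== 0) {
    keptByIndexOf.push(objs[i].id);
  }
  // ^ anchors the match to the start of the id; without the g flag,
  // test() keeps no state between calls.
  if (!/^myID/.test(objs[i].id)) {
    keptByRegex.push(objs[i].id);
  }
}

console.log(keptByIndexOf); // [ 'other', 'notmyID' ]
console.log(keptByRegex);   // [ 'other', 'notmyID' ]
```

Both checks are case-sensitive, so 'myID' and 'myId' are different prefixes.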

strip a div and all child elements from a string with JavaScript replace RegEx

I have a block of HTML stored in a variable called address_form, within that block of HTML I want to remove, or replace, a portion of it. The part I want to replace is a div with an ID of address_container.
There's clearly something wrong with the RegEx I'm passing to the replace function, as it is not working:
var tempStr = address_form.replace('/\<div id=\"#address_container\"\>.*<\/div\>/', '');
I simply want to replace a string, within a string.
Since you've tagged your question with jQuery, then I would suggest you use jQuery to do this task. Something like:
var tempStr = jQuery('<div>').html(address_form).find('#address_container').remove().end().html();
Don't do that, just get the contents of the div and replace the parent of that div with the contents.
So
var tempStr = $('#address_container').html(); // or text()
$('#parent_of_address_container').html(tempStr);
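Your regex is wrong: it is wrapped in quotes, so replace() treats it as a literal string, and an id attribute value normally has no leading #. Use this instead:
address_form.replace(/<div id=["']address_container["']>.*<\/div>/,'');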
From @RidgeRunner:
Correctly matching a DIV element, (which itself may contain other DIV elements), using a single JavaScript regex is impossible. This is because the js regex engine does not support matching nested structures.
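A short sketch of what that limitation looks like in practice, using invented sample markup. A lazy pattern stops at the first closing tag, which belongs to the inner div, while a greedy one can run past the container into a sibling:

```javascript
var nested = '<form><div id="address_container"><div class="row">Street</div></div>' +
  '<input name="zip"></form>';

// Lazy: stops at the FIRST </div>, which closes the inner div, not the container.
var lazyResult = nested.replace(/<div id="address_container">[\s\S]*?<\/div>/, '');
console.log(lazyResult); // <form></div><input name="zip"></form>

var siblings = '<div id="address_container">A</div><div id="other">B</div>';

// Greedy: runs to the LAST </div>, so the unrelated sibling is removed too.
var greedyResult = siblings.replace(/<div id="address_container">[\s\S]*<\/div>/, '');
console.log(greedyResult === ''); // true
```

Neither variant can count opening and closing tags, which is why the DOM-based answers are the safer route when the container may hold nested divs.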

Regular Expression for relative links ONLY

I'm creating a JavaScript that checks for links in the DOM and changes those that are NOT absolute links. Unfortunately I'm not having any luck...
I would like to match only the first type of links below, and add a folder path
link
<a href="http://somesite.net/somepage.html">link</a>
I've used string.replace(/a.+href="([^http]+)"/, 'path'+$1); to no avail...
Can someone help me here? Thanks in advance.
If the regular expression that you've written to solve a problem using just regular expressions starts to look like overkill, then it is probably overkill. Sometimes a simple if statement used in conjunction with regular expressions can do wonders:
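$("a").each(function () {
// Read the attribute as written; this.href would return the already-resolved absolute URL.
var href = $(this).attr("href");
if (href && !/^https?:\/\//i.test(href)) {
$(this).attr("href", "http://example.com/folder/" + href); // etc.
}
});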
You may want to look at the <base> html tag, instead. It allows you to set the path to which all links and images are relative.
http://www.w3schools.com/tags/tag_base.asp
http://www.w3.org/TR/html5/semantics.html#the-base-element
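You've created a character class with the square brackets: [^http] matches any single character other than h, t, or p, not "anything except the string http". Remove them. You want a negative lookaround assertion instead (a lookahead such as (?!http) placed before the URL works here); see the comment below for info on syntax. Not all languages support this regex feature though.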
JavaScript did not support lookbehind until ES2018 (lookahead has always been available). This may help though: http://blog.stevenlevithan.com/archives/mimic-lookbehind-javascript
You can use
string.replace(/(a.+href=)"(?!http)(.+)"/gi, '$1"path/$2"')
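For example's sake, I just made a variable with a couple of links in it. You can easily adapt the .replace() to work with however you get the links.
// sample markup for illustration: one relative, one root-relative, and one absolute link
var content = '<a href="page.html">link</a><a href="/dir/page.html">link</a><a href="http://example.com/page.html">link</a>';
// whatever you want to prefix link with
var base='http://somsite.net';
// \/* drops leading slashes from the path so exactly one slash follows base
content = content.replace(/(href=")(?!https?:\/\/)\/*([^"]*)/gi,'$1'+base+'/$2');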
Thanks everyone.
I was able to replace relative paths ONLY by using the following syntax:
var basepath = "pathto/";
var html = html.replace(/(<(a|img)[^>]+(href|src)=")(?!http)([^"]+)/g, '$1'+basepath+'$4');
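A slightly more defensive variant of that approach, as a sketch only. The function name prefixRelativeUrls and the sample markup are invented here. Unlike (?!http), the lookahead below also skips other schemes such as mailto:, protocol-relative // URLs, and # fragments, and a replacer function is used so that a $ in the base path is not interpreted as a replacement token:

```javascript
// Hypothetical helper: prefixes relative href/src values in <a> and <img> tags.
function prefixRelativeUrls(html, basePath) {
  // Skip URLs that start with a scheme (http:, mailto:, ...), with //, or with #.
  var pattern =
    /(<(?:a|img)\b[^>]*?\b(?:href|src)=")(?![a-z][a-z0-9+.-]*:|\/\/|#)([^"]*)/gi;
  return html.replace(pattern, function (match, prefix, url) {
    return prefix + basePath + url;
  });
}

var input = '<a href="page.html">a</a>' +
  '<a href="https://x.net/p">b</a>' +
  '<img src="img/pic.png">' +
  '<a href="#top">c</a>' +
  '<a href="mailto:a@b.c">d</a>';

console.log(prefixRelativeUrls(input, 'pathto/'));
```

It assumes double-quoted attribute values, as the original pattern does; single-quoted or unquoted attributes would be left untouched.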
