Regular Expression for relative links ONLY - javascript

I'm writing a JavaScript snippet that checks for links in the DOM and changes those that are NOT absolute links. Unfortunately, I'm not having any luck...
I would like to match only the first type of links below, and add a folder path
link
<a href="http://somesite.net/somepage.html">link</a>
I've used string.replace(/a.+href="([^http]+)"/, 'path'+$1); to no avail...
Can someone help me here? Thanks in advance.

If the regular expression that you've written to solve a problem using just regular expressions starts to look like overkill, then it is probably overkill. Sometimes a simple if statement used in conjunction with regular expressions can do wonders:
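$("a").each(function () {
  // Read the raw attribute: the this.href property is always resolved to an absolute URL.
  var href = this.getAttribute("href");
  if (href && !/^https?:\/\//i.test(href)) {
    this.setAttribute("href", "http://example.com/folder/" + href); // etc.
  }
});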

You may want to look at the <base> html tag, instead. It allows you to set the path to which all links and images are relative.
http://www.w3schools.com/tags/tag_base.asp
http://www.w3.org/TR/html5/semantics.html#the-base-element

You've created a character class with the square brackets. Remove them. You want a "negative lookbehind", see comment below for info on syntax. Not all languages support this regex feature though.
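JavaScript didn't support lookbehind until ES2018, and what this case needs is a negative lookahead such as (?!http), which JavaScript does support. For older engines, this may help with lookbehind: http://blog.stevenlevithan.com/archives/mimic-lookbehind-javascript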
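To illustrate the difference between the character class in the question and a negative lookahead, here is a minimal sketch. The names charClass and isRelative are made up for this example, and it treats only http:// and https:// as absolute, so things like mailto: or protocol-relative // URLs would need extra handling.

```javascript
// [^http]+ is a character class: it matches characters that are not h, t or p.
// It says nothing about the string "http" as a whole.
const charClass = /^[^http]+$/;

// (?!https?:\/\/) is a negative lookahead: it succeeds only when the text
// at this position does NOT start with http:// or https://.
const isRelative = (href) => /^(?!https?:\/\/)/i.test(href);

console.log(charClass.test('index.css')); // true: no h, t or p anywhere
console.log(charClass.test('page.html')); // false: contains p, h and t
console.log(isRelative('page.html')); // true
console.log(isRelative('http://somesite.net/somepage.html')); // false
```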

You can use
string.replace(/(a.+href=)"(?!http)(.+)"/gi, '$1"path/$2"')

For example's sake, I just made a variable with a couple of links in it. You can easily adapt the .replace() to work with however you get the links.
var content = 'linklinklink';
// whatever you want to prefix link with
var base='http://somsite.net';
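// \/* drops any leading slashes from the relative path, so no second pass is needed
// (collapsing every run of slashes afterwards would also turn "http://" into "http:/")
content = content.replace(/(href=")(?!https?:\/\/)\/*([^"]*)/gi, '$1' + base + '/$2');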

Thanks everyone.
I was able to replace relative paths ONLY by using the following syntax:
var basepath = "pathto/";
var html = html.replace(/(<(a|img)[^>]+(href|src)=")(?!http)([^"]+)/g, '$1'+basepath+'$4');
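As a rough sketch of how that replacement behaves, the snippet below wraps the same regular expression in a function. The function name addBasePath and the sample markup are assumptions made up for illustration, not part of the original post.

```javascript
// Prefix relative href/src values on <a> and <img> tags with basepath.
// Values that start with "http" are skipped by the negative lookahead.
function addBasePath(html, basepath) {
  return html.replace(
    /(<(a|img)[^>]+(href|src)=")(?!http)([^"]+)/g,
    '$1' + basepath + '$4'
  );
}

const sample =
  '<a href="page.html">a</a> ' +
  '<a href="http://somesite.net/somepage.html">b</a> ' +
  '<img src="pic.png">';

const rewritten = addBasePath(sample, 'pathto/');
console.log(rewritten);
```

Only the first link and the image are rewritten; the absolute link is left untouched.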

Related

Replace part of string with other text?

I have a variable
link = 'dog.jpg'
How do I write code to change link to = dog.webm instead?
I've tried link.text.replace('jpg', 'webm'); but it has no effect?
https://jsfiddle.net/m5Lp6a1z/
You need to make an actual assignment back to the link variable:
link = link.replace('jpg', 'webm');
But actually, a regex replacement targeting only JPEG extensions would probably be safer here:
link = link.replace(/\.jpg$/, '.webm');
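A small sketch of why the anchored regex is safer: with a plain string pattern, replace() changes the first occurrence of "jpg" wherever it appears. The path 'jpgs/dog.jpg' is a made-up example.

```javascript
const path = 'jpgs/dog.jpg';

// A string pattern replaces the first "jpg" it finds, here in the folder name.
const plain = path.replace('jpg', 'webm');

// The regex only matches ".jpg" at the very end of the string.
const anchored = path.replace(/\.jpg$/, '.webm');

console.log(plain);    // webms/dog.jpg
console.log(anchored); // jpgs/dog.webm
```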
If you want to replace text, try this:
let link = 'dog.jpg'
link = link.replace('jpg', 'webm') // replace() returns a new string, so assign it back
You don't need to write link.text.replace; link is already a string.

Find and replace regex JavaScript

I know this is a super easy question, but I can't seem to wrap my head around it. I've got a bunch of URLs in varying languages such as:
www.myurl.com?lang=spa
www.myurl.com?lang=deu
www.myurl.com?lang=por
I need to create buttons to quickly switch from any language extension (spa, por, deu, rus, ukr, etc) to another language. I have the following code so far:
var url = window.location.toString();
window.location = url.replace(/lang=xxx/, 'lang=deu');
I just can't figure out the 3-character wildcard character. I know that I need to do some sort of regular expression or something, I'm just not sure how to go about it. Any help?
Thanks in advance
You can use
([&?]lang=)\w+
This will work with urls like www.myurl.com?foo=bar&lang=por&bar=foo too.
Instead of lang=deu, you'll have to replace with $1deu.
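A minimal sketch of that suggestion, assuming a helper named switchLang (a made-up name) that takes the URL as a string:

```javascript
// Replace the value of the lang parameter, keeping the "?lang=" or "&lang="
// prefix via the first capture group ($1).
function switchLang(url, lang) {
  return url.replace(/([&?]lang=)\w+/, '$1' + lang);
}

console.log(switchLang('www.myurl.com?lang=spa', 'deu'));
// www.myurl.com?lang=deu
console.log(switchLang('www.myurl.com?foo=bar&lang=por&bar=foo', 'deu'));
// www.myurl.com?foo=bar&lang=deu&bar=foo
```

In the page itself this could be used as window.location = switchLang(window.location.toString(), 'deu').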
Try ... or .{3} or \w{3} or even [a-z]{3}, depending on how specific you want to be.
var s = 'www.myurl.com?lang=spa';
s.replace(/lang=[a-z]{3}/, 'lang=deu');
// => "www.myurl.com?lang=deu"
Use /lang=[a-z]{3}/, here's an example:
/lang=[a-z]{3}/

How to run regex on webpage with javascript

I'm trying to make a bit of a crude ad-blocker with javascript
The code I currently have:
var pattern = '<iframe(.*?)</iframe>|<object(.*?)</object>';
if (document.body.parentNode.innerHTML.match(pattern))
{
document.body.parentNode.innerHTML =
document.body.parentNode.innerHTML.replace(pattern, '<b>AD BLOCKED</b>');
}
The problem is that the page reloads. Is there a way I can stop the page from reloading? (My main target is adsense)
This does not seem right, since you're just wanting to replace the html on the page. I can't imagine what that will do. To answer your Regex question, though, try this.
var pattern = /<iframe.*<\/iframe>/gi;
document.body.innerHTML =
document.body.innerHTML.replace(pattern, '<strong>bye iframe</strong>');
replace() will swap out all the matches found by the RegExp with the second parameter.
/<iframe.*<\/iframe>/ is a regular expression matching anything within iframe tags.
gi modifies the regex telling it to be global and case-insensitive.
Again, you will probably see some unexpected behavior when rewriting the innerHTML of the body, so I'd rethink your approach. Perhaps you could use jQuery to find the tags you don't want and hide or remove them.
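One way to remove the elements without rewriting innerHTML is sketched below using plain DOM methods instead of jQuery. The function name removeEmbeds is an assumption, and the selector simply mirrors the iframe and object tags from the question; it is not a complete ad-blocker.

```javascript
// root is assumed to be a Document or Element (anything with querySelectorAll).
// Removes every <iframe> and <object> under it and returns how many were removed.
function removeEmbeds(root) {
  const embeds = root.querySelectorAll('iframe, object');
  embeds.forEach(function (el) {
    el.remove(); // detach the element; the rest of the page is left untouched
  });
  return embeds.length;
}

// Only run automatically when a real DOM is available (i.e. in a browser).
if (typeof document !== 'undefined') {
  removeEmbeds(document);
}
```

Because only the matched elements are detached, scripts and event handlers elsewhere on the page keep working, which is not the case when the whole body's innerHTML is reassigned.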

javascript regular expression

I want to match some links in web content. I know I can use file_get_contents(url) to do this in PHP. How about in JavaScript?
For a regular expression, given markup like
<a href="someurl/something" id="someid">contents</a>
how can I use a JS regular expression to match this (match only once, non-greedily)? I tried this:
/^\<a href=\"someurl\/something\" id=\"someid\"\>(+?)\<\/a\>$/
but it doesn't work.
Can someone help?
Thanks!
You should know that parsing HTML with regex is not the optimal way to solve this problem, and if you have access to a live DOM of the page, you should use DOM methods instead. As in, you should use
document.getElementById('someid').innerHTML // this will return 'contents'
instead of a regex.
I'd highly recommend using a library like jQuery to get the element, and then getting the contents via a .text() call. It's much simpler and more reliable than trying to parse HTML with regex.
DOM and jQuery suggestions are better but if you still want to use regex then try this:
/^<a href=".*?" id=".*?">(.*?)<\/a>$/
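A short sketch of how that pattern could be applied with exec(), and of what "non-greedy" changes. The sample strings are made up to match the question's markup.

```javascript
const html = '<a href="someurl/something" id="someid">contents</a>';

// exec() returns null when nothing matches, otherwise an array whose
// element 1 is the first capture group.
const match = /^<a href=".*?" id=".*?">(.*?)<\/a>$/.exec(html);
const text = match ? match[1] : null;
console.log(text); // contents

// Greedy vs. lazy on a string with two links:
const two = '<a>one</a><a>two</a>';
const greedy = two.match(/<a>(.*)<\/a>/)[1];  // runs to the LAST </a>
const lazy = two.match(/<a>(.*?)<\/a>/)[1];   // stops at the FIRST </a>
console.log(greedy); // one</a><a>two
console.log(lazy);   // one
```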
You might as well create the elements with jQuery
var elements = $(html);
var links = elements.find('a');
links.each(function(i, link){
//Do the regexp matching in here if you wish to search for specific urls only
});
In bigger documents, using the DOM is way quicker than regexping the whole thing as text.
Try this~
try {
boolean foundMatch = subjectString.matches("(?im)<a[^>]*href=(\"[^\"]*\"|'[^']*'|[^\\s>]*)[^>]*>.*?</a>");
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
}
It matches double-quoted, single-quoted, and unquoted attribute values:
<a href="someurl/something" id="someid">contents</a>
<a href='someurl/something' id='someid'>contents</a>
<a href=someurl/something id=someid>contents</a>

REGEX / replace only works once

I'm using REGEX and js replace to dynamically populate a variable in a href. The following code works the first time it is used on the page, but if a different variable is passed to the function, it does not replace ANYTHING.
function change(fone){
$("a[href^='/application']").each(function(){
this.href = this.href.replace(/device=.*/,"device="+ fone);
});
}
The problem is that this.href actually returns a full absolute URL. So even if your HTML is <a href="/foo">, the .href property will return http://mydomain.com/foo.
So your href attribute is being populated with a full absolute URL, and then the a[href^='/application'] selector doesn't match anymore, because the href attribute now starts with the domain name instead of /application.
.href returns a fully qualified URL, e.g. 'http://www.mydomain.com/application/...'. So the selector doesn't work the second time around, since your domain-relative URL has been replaced with a full URL and the selector is looking for things that start with "/application".
Use $(this).attr('href').replace... instead.
Fiddle here: http://jsfiddle.net/pcm5K/3/
As Squeegy says, you're changing the href the first time around so it no longer begins with /application - the second time around it begins with http://.
You can use the jQuery Attribute Contains Selector to get the links, and it's probably also better practice to use a capture group to do the replacement. Like so:
$("a[href*='/application']").each(function(){
this.href = this.href.replace(/(device=)\w*/, "$1" + fone);
});
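The resolution step described above can be sketched outside the browser with the standard URL constructor, which resolves a relative reference against a base much like the .href property does. The domain and the device values are placeholders.

```javascript
// What the attribute contains vs. what the .href property would report.
const attributeValue = '/application?device=old';
const resolved = new URL(attributeValue, 'http://mydomain.com').href;

console.log(resolved); // http://mydomain.com/application?device=old
console.log(resolved.startsWith('/application')); // false: "starts with" no longer matches
console.log(resolved.includes('/application'));   // true: "contains" still matches

// Replacing only the device value, keeping "device=" via the capture group.
const updated = attributeValue.replace(/(device=)\w*/, '$1' + 'new');
console.log(updated); // /application?device=new
```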
You'll need to add the g flag to match all instances of the pattern, so your regular expression will look like this:
/device=.*/g
and your code will look like:
this.href = this.href.replace(/device=.*/g,"device="+ fone);
The reason is that unless all your links start as "/device=." the regex won't work.
You need to use /.*device=.*/
The lack of a global flag is not the problem. It's the backslash in your pattern.
