filtering escaped angle brackets in javascript - javascript

I have a javascript feature that allows users to place arbitrary text strings on a page. I don't want them to be able to insert html or other code, just plain text.
So I figure that stripping out all angle brackets (< >) would do the trick. (I don't care if they have 'broken' html on the page, or that they're not able to put angle brackets in their text.) Then I realized I had to filter escaped angle brackets (&lt; &gt;) and probably others.
What all do I need to filter out, for security? Will removing all angle brackets do the trick?

Will removing all angle brackets do the trick?
Just replace all angle brackets with their escaped form. That way, people can write as much "code" as they like, and it just shows up as plain-text instead.

Make sure that the first thing you do is replace & with &amp;, so that the ampersands introduced by the later replacements are not escaped a second time.
a) For HTML content, escaping < as &lt; should be enough.
b) For attribute values, for example if the text is going into <input name="sendtoserver" value="custom text"/>, you need to escape double quotes as &quot;, and that is all that is strictly necessary. It is still good practice to escape < and > as well.
It depends on the context. If you want to play it safe, have your JavaScript set innerText, which does not need encoding, though you may want to set the CSS to white-space: pre-wrap. This is less error-prone, but also less browser-compatible.
c) On a loosely related note, when you place content inside a script block as a JavaScript string, escape string terminators and backslashes with backslashes. The item that might sneak up on you is </script> (not case sensitive), which ends the script block even inside a string. Escaping </ as <\/ (or simply every / as \/) should be enough.
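The escaping order described above can be sketched as a small helper. The name escapeHtml is hypothetical (it is not a built-in), and escaping the single quote as well is an extra precaution beyond what the answer lists.

```javascript
// Minimal sketch: escape text for HTML content and quoted attribute values.
// The ampersand must go first, otherwise the "&" of "&lt;" would be escaped again.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;"); // assumption: also cover single-quoted attributes
}

const escaped = escapeHtml("<b>\"a\" & 'b'</b>");
console.log(escaped);
```

This only covers HTML content and quoted attribute values; text placed inside a script block or a URL needs its own escaping, as point c) notes.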

Related

regex replace on JSON is removing an Object from Array

I'm trying to improve my understanding of Regex, but this one has me quite mystified.
I started with some text defined as:
var txt = "{\"columns\":[{\"text\":\"A\",\"value\":80},{\"text\":\"B\",\"renderer\":\"gbpFormat\",\"value\":80},{\"text\":\"C\",\"value\":80}]}";
and do a replace as follows:
txt.replace(/\"renderer\"\:(.*)(?:,)/g,"\"renderer\"\:gbpFormat\,");
which results in:
"{"columns":[{"text":"A","value":80},{"text":"B","renderer":gbpFormat,"value":80}]}"
What I expected was for the renderer attribute value to have its quotes removed, which has happened, but the C column is also completely missing! I'd really love for someone to explain how my Regex has removed column C.
As an extra bonus, if you could explain how to remove the quotes around any value for renderer (i.e. so I don't have to hard-code the value gbpFormat in the regex) that'd be fantastic.
You are using a greedy operator while you need a lazy one. Change this:
"renderer":(.*)(?:,)
^---- add here the '?' to make it lazy
To
"renderer":(.*?)(?:,)
Your code should be:
txt.replace(/\"renderer\"\:(.*?)(?:,)/g,"\"renderer\"\:gbpFormat\,");
If you are learning regex, take a look at this documentation to learn more about greediness. A nice extract to understand this is:
Watch Out for The Greediness!
Suppose you want to use a regex to match an HTML tag. You know that
the input will be a valid HTML file, so the regular expression does
not need to exclude any invalid use of sharp brackets. If it sits
between sharp brackets, it is an HTML tag.
Most people new to regular expressions will attempt to use <.+>. They
will be surprised when they test it on a string like This is a
<EM>first</EM> test. You might expect the regex to match <EM> and when
continuing after that match, </EM>.
But it does not. The regex will match <EM>first</EM>. Obviously not
what we wanted. The reason is that the plus is greedy. That is, the
plus causes the regex engine to repeat the preceding token as often as
possible. Only if that causes the entire regex to fail, will the regex
engine backtrack. That is, it will go back to the plus, make it give
up the last iteration, and proceed with the remainder of the regex.
Like the plus, the star and the repetition using curly braces are
greedy.
Try like this:
txt = txt.replace(/"renderer":"(.*?)"/g,'"renderer":$1');
The issue in the expression you were using was this part:
(.*)(?:,)
By default, the * quantifier is greedy, which means that it gobbles up as much as it can, so it will run up to the last comma in your string. The easiest solution is to turn it into a non-greedy (lazy) quantifier by adding a question mark after the asterisk, so that part of your expression looks like this:
(.*?)(?:,)
For the solution I proposed at the top of this answer, I also removed the part matching the comma, because I think it's easier just to match everything between quotes. As for your bonus question, to replace the matched value instead of having to hardcode gbpFormat, I used a backreference ($1), which will insert the first matched group into the replacement string.
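To see the difference side by side, here is a small sketch that runs the greedy pattern from the question and the lazy pattern from this answer against the same input. The variable names are only for illustration.

```javascript
const txt = '{"columns":[{"text":"A","value":80},{"text":"B","renderer":"gbpFormat","value":80},{"text":"C","value":80}]}';

// Greedy: (.*) runs to the LAST comma in the string, so everything up to and
// including "text":"C", is swallowed by the match and replaced.
const greedy = txt.replace(/"renderer":(.*),/g, '"renderer":gbpFormat,');

// Lazy, matching only what sits between the quotes; $1 re-inserts the captured value.
const lazy = txt.replace(/"renderer":"(.*?)"/g, '"renderer":$1');

console.log(greedy); // column C is gone
console.log(lazy);   // column C is still there, quotes removed from the renderer value
```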
Don't manipulate JSON with regexp. It's too likely that you will break it, as you have found, and more importantly there's no need to.
In addition, once you have changed
'{"columns": [..."renderer": "gbpFormat", ...]}'
into
'{"columns": [..."renderer": gbpFormat, ...]}' // remove quotes from gbpFormat
then this is no longer valid JSON. (JSON requires that property values be numbers, quoted strings, objects, arrays, or the literals true, false, and null.) So you will not be able to parse it, or send it anywhere and have it interpreted correctly.
Therefore you should parse it to start with, then manipulate the resulting actual JS object:
var object = JSON.parse(txt);
object.columns.forEach(function(column) {
column.renderer = gbpFormat;
});
If you want to replace any quoted value of the renderer property with the value itself, then you could try
column.renderer = window[column.renderer];
Assuming that the value is available in the global namespace.
This question falls into the category of "I need a regexp, or I wrote one and it's not working, and I'm not really sure why it has to be a regexp, but I heard they can do all kinds of things, so that's just what I imagined I must need." People use regexps to try to do far too many complex matching, splitting, scanning, replacement, and validation tasks, including on complex languages such as HTML, or in this case JSON. There is almost always a better way.
The only time I can imagine wanting to manipulate JSON with regexps is if the JSON is broken somehow, perhaps due to a bug in server code, and it needs to be fixed up in order to be parseable.
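As a rough sketch of the parse-then-modify approach, the following looks renderer names up in an explicit object instead of window. The renderers registry and the body of gbpFormat are assumptions made up for this example; the original post does not show them.

```javascript
const txt = '{"columns":[{"text":"A","value":80},{"text":"B","renderer":"gbpFormat","value":80},{"text":"C","value":80}]}';

// Hypothetical registry of renderer functions, keyed by the name used in the JSON.
const renderers = {
  gbpFormat: (value) => "£" + value.toFixed(2),
};

const config = JSON.parse(txt);
config.columns.forEach((column) => {
  const name = column.renderer;
  // Only swap in a function when the name is a known renderer.
  if (typeof name === "string" && Object.prototype.hasOwnProperty.call(renderers, name)) {
    column.renderer = renderers[name];
  }
});

console.log(config.columns.length, typeof config.columns[1].renderer);
```

A dedicated registry avoids depending on what happens to be in the global namespace, and columns without a renderer are left untouched.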

how to set ASCII code to the button when the page loads

I'm designing chess board in HTML. &#9814 is the code to display the WHITE ROOK.
I'm trying to set the value while page loads and it is displaying it as a string, but the ROOK is not coming on the button
function load() {
document.getElementById('A1').value="&#9814";
}
function load() {
document.getElementById('A1').innerHTML="&#9814;";
}
http://jsfiddle.net/RM5VD/2/
The notation &#9814 or &#9814; has no special meaning in JavaScript; these are just strings of characters, though they can be assigned to the innerHTML property, causing HTML parsing.
The simplest way to use a Unicode character in JavaScript is to insert it as such, though this requires a suitable editor and the use of the UTF-8 character encoding. Example:
document.getElementById('A1').value = '♖';
The next simplest is to use the JavaScript escape notation, namely \u followed by exactly four hexadecimal digits. Since WHITE CHESS ROOK is U+2656 (2656 hex = 9814 decimal), you would use this:
document.getElementById('A1').value="\u2656";
This makes sense only if the element modified displays its value property. For example, <input type=button> shows its value as the label, whereas a button element shows its content, so there you would assign to innerHTML or textContent instead. But this affects just the left-hand side of the assignment, i.e. what you assign the string to.
Beware that font support to chess piece characters like this is rather limited. Moreover, browsers may have their own ideas of the font to be used in buttons. In practice, you should probably use some downloadable font.
You need to rewrite the function like this:
function load() {
document.getElementById('A1').value=String.fromCharCode(9814);
}
It's not clear exactly what kind of element you're modifying, but you may need to modify the innerHTML instead of the value depending on your situation.
The way you have it, it is passing the text as a literal string, not a representation of a single character.
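The notations mentioned in these answers can be compared in plain JavaScript, without a page. This is only a sketch to show which of them produce the single rook character and which stay a literal string.

```javascript
const rookLiteral = "♖";                        // typed directly (file saved as UTF-8)
const rookEscape = "\u2656";                    // JavaScript escape, hex code point
const rookFromCode = String.fromCharCode(9814); // decimal code, as in the HTML reference
const entityText = "&#9814;";                   // just seven ordinary characters in JavaScript

console.log(rookLiteral === rookEscape, rookEscape === rookFromCode); // true true
console.log(entityText.length);                                       // 7
```

Only assigning entityText to innerHTML would hand it to the HTML parser; assigning it to value keeps the seven characters as they are.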

Selecting element by unescaped data attribute

Without going into specifics why I'm doing this... (it should be encoded to begin with, but it's not for reasons outside my control)
Say I have a bit of HTML that looks like this
<tr data-path="files/kissjake's files"">...</tr> so the actual data-path is files/kissjake's files"
How do I go about selecting that <tr> by its data path?
The best I can currently do is when I bring the variables into JS and do any manipulation, I URLEncode it so that I'm always working with the encoded version. jQuery seems smart enough to determine the data-path properly so I'm not worried about that.
The problem is on one step of the code I need to read from a data-path of another location, and then compare them.
Actually selecting this <tr> is what's confusing me.
Here is my coffeescript
oldPriority = $("tr[data-path='#{path}']").attr('data-priority')
If I interpolate the URLEncoded version of the path, it doesn't find the TR. And I can't URLDecode it because then jQuery breaks as there are multiple ' and " conflicting in the path.
I need some way to select any <tr> that matches a particular data-attribute, even if its not encoded in the html to begin with
First, did you mean to have the extra " in there? You will have to escape that, as it's not valid HTML.
<tr data-path="files/kissjake's files"">...</tr>
To select it, you need to escape inside the selector. Here's an example of how that would look:
$("tr[data-path='files/kissjake\\'s files\"']")
Explanation:
\\' is used to escape the ' inside the CSS selector. Since the ' sits inside other single quotes, it must be escaped at the CSS level. The reason there are two backslashes (\\) is that we must escape the backslash itself at the JS level so that one backslash makes it into the selector string.
Simpler example: "John\\'s" yields the string John\'s.
\" is used to escape the double quote contained inside the other double quotes. This one is escaped at the JS level (not the CSS level), so only one backslash is used because we don't need a backslash to actually be inside the string contents.
Simpler example: "Hello \"World\"" yields the string Hello "World".
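The two levels of escaping can be checked directly by looking at what ends up inside each string. The variable names here are only for illustration.

```javascript
// Two backslashes in the source: the JS parser keeps one, which CSS then sees.
const cssLevel = "kissjake\\'s";   // contents: kissjake\'s
// One backslash in the source: the JS parser consumes it, nothing is left for CSS.
const jsLevel = "Hello \"World\""; // contents: Hello "World"

console.log(cssLevel);        // kissjake\'s
console.log(cssLevel.length); // 11
console.log(jsLevel);         // Hello "World"
```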
Update
Since you don't have control over how the HTML is output, and you are doomed to deal with invalid HTML, that means the extra double quote should be ignored. So you can instead do:
$("tr[data-path='files/kissjake\\'s files']")
Just the \\' part to deal with the single quote. The extra double quote should be handled by the browser's lenient HTML parser.
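If the path comes from a variable, the escaping can be done by a small helper rather than by hand. This is a sketch under assumptions: attrSelector is a hypothetical name, it wraps the value in double quotes, and it only escapes backslashes and double quotes (a value containing line breaks would need more work).

```javascript
// Build an attribute selector with the value safely quoted for CSS.
function attrSelector(tag, attr, value) {
  // Inside a double-quoted CSS string only \ and " need a backslash in front.
  const escaped = String(value).replace(/["\\]/g, "\\$&");
  return tag + "[" + attr + '="' + escaped + '"]';
}

const selector = attrSelector("tr", "data-path", "files/kissjake's files");
console.log(selector); // tr[data-path="files/kissjake's files"]
```

The result could then be passed to jQuery, e.g. $(selector).attr('data-priority'). Another option that avoids selector escaping altogether is to filter on the attribute value, e.g. $("tr").filter(function () { return $(this).attr("data-path") === path; }).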
Building off of @Nathan Wall's answer, this will select all <tr> tags with a data-path attribute on them.
$("tr[data-path]");

html entity is not rendered

If I just put in XUL file
<label value="&deg;C"/>
it works fine. However, I need to assign the &deg; value to that label element, and then it doesn't show the degree symbol but the literal value instead.
UPD
sorry guys, I just missed a couple of words here: it doesn't work from within JavaScript. If I assign mylabel.value = degree + "&deg;", this will show the literal value.
It does show degree symbol only if I put above manually in XUL file.
What happens when you use a JavaScript escape, like "\u00B0C", instead of "&deg;C"?
Or when using mylabel.innerHTML instead of mylabel.value? (According to MDC, this should be possible.)
EDIT: you can convert those entities to JavaScript escapes using the Unicode Code Converter.
This makes sense to me. When you express the entity in an attribute value within XML markup, the XML parser interpolates the entity reference and then sets the label value to the result. From Javascript, however, there's no XML parser to do that work for you, and in fact life would be pretty nasty if there were! Note that when you set the value attribute (from Javascript) of an <input type='text'> element, you don't have to worry about having to escape XML entities (or even angle brackets, for that matter). However, you do have to worry about XML entities when you're setting the "value" attribute within XML markup.
Another way to think about it is this: XML entity notation is XML syntax, not Javascript syntax. In Javascript, you can produce special characters using 16-bit Unicode escape sequences, which look like \u followed by a four-digit hex constant. As noted in Marcel Korpel's answer, if you know what Unicode value is produced by the XML entity, then you should be able to use that directly from Javascript. In this case, you could use "\u00B0".
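As a sketch of the point that entities are markup syntax rather than JavaScript syntax: the escape can be used directly, and if entity text arrives from elsewhere, numeric entities can be converted by hand. decodeNumericEntities is a hypothetical helper written for this example; it handles only numeric forms such as &#176; or &#xB0;, not named ones such as &deg;.

```javascript
// Direct use of the JavaScript escape for the degree sign.
const labelText = 21 + "\u00B0C";

// Hypothetical helper: turn numeric character references into characters.
function decodeNumericEntities(text) {
  return text.replace(/&#(x[0-9a-f]+|\d+);/gi, (match, code) => {
    const isHex = code[0].toLowerCase() === "x";
    const codePoint = parseInt(isHex ? code.slice(1) : code, isHex ? 16 : 10);
    return String.fromCodePoint(codePoint);
  });
}

console.log(labelText);                          // 21°C
console.log(decodeNumericEntities("21&#176;C")); // 21°C
```

The resulting string could then be assigned to the label's value from script. Note that String.fromCodePoint throws for numbers that are not valid code points, which this sketch does not guard against.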
It will not work this way. Can you convert it to be like this?
<label>&deg;C</label>

Javascript and CSS, using dashes

I'm starting to learn some javascript and understand that dashes are not permitted when naming identifiers. However, in CSS it's common to use a dash for IDs and classes.
Does using a dash in CSS interfere with javascript interaction somehow? For instance if I were to use getElementById("css-dash-name"). I've tried a few examples using getElementById with dashes as a name for a div ID and it worked, but I'm not sure if that's the case in all other contexts.
Having dashes and underscores in the ID (or class name, if you select by that) won't have any negative effect; it's safe to use them. You just can't do something like:
var some-element = document.getElementById('css-dash-name');
The above example is going to error out because there is a dash in the variable you're assigning the element to.
The following would be fine though since the variable doesn't contain a dash:
var someElement = document.getElementById('css-dash-name');
That naming limitation only exists for the javascript variables themselves.
It's only in the cases where you can access the elements as properties that it makes a difference. For example form fields:
<form>
<input type="text" name="go-figure" />
<input type="button" value="Eat me!" onclick="...">
</form>
In the onclick event you can't access the text box as a property, as the dash is interpreted as minus in Javascript:
onclick="this.form.go-figure.value='Ouch!';"
But you can still access it using a string:
onclick="this.form['go-figure'].value='Ouch!';"
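The same behaviour can be reproduced with a plain object, which may make it clearer what the parser does with the dash. The object and variable names below are made up for this sketch and stand in for the form and its field.

```javascript
// Stand-in for the form: one property with a dash, one called "go".
const form = { "go-figure": { value: "" }, go: 10 };
const figure = 3;

// form.go-figure is parsed as (form.go) - (figure), i.e. a subtraction.
const subtraction = form.go - figure;

// Bracket notation treats the whole name as one property key.
form["go-figure"].value = "Ouch!";

console.log(subtraction);              // 7
console.log(form["go-figure"].value);  // Ouch!
```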
Whenever you have to address a CSS property as a JavaScript variable name, CamelCase is the official way to go.
element.style.backgroundColor = "#FFFFFF";
You will never be in a situation where you have to address an element's ID as a variable name. It will always be in a string, so
document.getElementById("my-id");
will always work.
Using Hyphen (or dash) is OK
I too am currently studying JavaScript, and I suggest you read David Flanagan's book (JavaScript: The Definitive Guide, 5th Edition). As far as I have read, it doesn't warn against the use of a hyphen or dash (-) in IDs and classes (or even the name attribute) in an HTML document.
Just as Parrots already said, hyphens are not allowed in variable names, because the JavaScript interpreter will treat a hyphen as a minus or negative sign; using it in strings, however, is perfectly OK.
Like what Parrots and Guffa said, you can use the following ...
[ ] (square brackets)
'' (single quotation marks or single quotes)
"" (double quotation marks or double quotes)
to tell the JavaScript interpreter that you are declaring strings (the id/class/name of your elements for instance).
Use Hyphen (or dash) — for 'Consistency'
@KP, that would be OK if he is using HTML 4.01 or earlier, but if he is using any version of XHTML (e.g., XHTML 1.0), then that is not possible, because XHTML syntax prohibits uppercase (except the !DOCTYPE, which is the only thing that needs to be declared in uppercase).
@Choy, if you're using HTML 4.01 or earlier, going with either camelCase or PascalCase will not be a problem. However, for consistency with how CSS uses separators (it uses a hyphen or dash), I suggest following its rule. It will be much more convenient for you to code your HTML and CSS alike. Moreover, you don't even have to worry about whether you're using XHTML or HTML.
IDs are allowed to contain hyphens:
ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").
And there is no restriction when using IDs in JavaScript except if you want to refer to elements in the global scope. There you need to use:
window['css-dash-name']
Other answers are correct as far as where you can and can't use hyphens, however at the root of the question, you should consider the idea of not using dashes/hyphens in your variable/class/ID names altogether. It's not standard practice, even if it does work and requires careful coding to make use of it.
Consider using either PascalCase (all words begin with a capital) or camelCase (the first word begins in lowercase, the following words begin in uppercase). These are the two most common, accepted naming conventions.
Different resources will recommend different choices between the two (with the exception of JavaScript, for which camelCase is pretty much always recommended). In the end, being consistent in your approach is the most important part. Using camel or Pascal case will ensure you don't have to worry about special accessors or brackets in your code.
For JavaScript conventions, try this question/discussion:
javascript naming conventions
Here's another great discussion of conventions for CSS, Html elements, etc:
What's the best way to name IDs and classes in CSS and HTML?
It would cause an error in this case:
const fontSize = element.style.font-size;
Because including a hyphen prevents the property from being accessed via the dot operator. The JavaScript parser would see the hyphen as a subtraction operator. The correct way would be:
const fontSize = element.style['font-size'];
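The other option mentioned in this thread is the camelCase form of the property name, which works with the dot operator. A small sketch of the name conversion follows; cssNameToCamelCase is a hypothetical helper written for this example, not a built-in.

```javascript
// Convert a dashed CSS property name to its camelCase JavaScript form.
function cssNameToCamelCase(name) {
  // Each "-x" becomes "X": font-size -> fontSize
  return name.replace(/-([a-z])/g, (match, letter) => letter.toUpperCase());
}

console.log(cssNameToCamelCase("font-size"));        // fontSize
console.log(cssNameToCamelCase("background-color")); // backgroundColor
```

With the converted name, element.style.fontSize can be read with the dot operator and no brackets are needed.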
No, this won't cause an issue. You're accessing the ID as a string (it's enclosed in quotes), so the dash poses no problem. However, I would suggest not using document.getElementById("css-dash-name"), and instead using jQuery, so you can do:
$("#css-dash-name");
Which is much clearer. The jQuery documentation is also quite good. It's a web developer's best friend.
