I have a custom attribute that is being filled from a database. This attribute can contain an embedded single quote like this,
MYATT='Tony\'s Test'
At some pont in my code I use jquery to copy this attribute to a field like this,
$('#MY_DESC').val($(recdata).attr('MYATT'));
MY_DESC is a text field in a dialog box. When I display the dialog box all I see in the field is
Tony\
What I need to see is,
Tony's Test
How can I fix this so I can see the entire string?
Try:
MYATT='Tony's Test'
I didn't bother verifying this with the HTML spec, but the wikipedia entry says:
The ability to "escape" characters in this way allows for the characters < and & (when written as < and &, respectively) to be interpreted as character data, rather than markup. For example, a literal < normally indicates the start of a tag, and & normally indicates the start of a character entity reference or numeric character reference; writing it as & or & or & allows & to be included in the content of elements or the values of attributes. The double-quote character ("), when used to quote an attribute value, must also be escaped as " or " or " when it appears within the attribute value itself. The single-quote character ('), when used to quote an attribute value, must also be escaped as ' or ' (should NOT be escaped as ' except in XHTML documents) when it appears within the attribute value itself. However, since document authors often overlook the need to escape these characters, browsers tend to be very forgiving, treating them as markup only when subsequent text appears to confirm that intent.
In case you won't use double-quotes, put your custom attribute into them :)
If not, I suggest escape the value.
Before setting the value of your text field, you might try running a regular expression against the string to remove all backslashes from the string.
If you do this:
alert($(recdata).attr('MYATT'));
You will see the same result of "Tony\" meaning that the value isn't being properly consumed by the browser. The escaped \' value isn't working in this case.
Do you have the means to edit these values as they are being produced? Can you parse them to include escape values before being rendered?
Related
We have a FTL that has -
<a href='javascript:func("${event.title}");'>Link</a>
When the value has an apostrophe, like Norman'S Birthday - it breaks.
We used js_string to fix it -
Link
But we also had to change the double quotes around $expression to single quotes - this does Not work with double quotes.
Question -
Is there a way to fix this with the original double quotes around the $expression ?
Something like -
<a href='javascript:func("${event.title?js_string}");'>Link</a>
Note the above does not work.
You need two kind of escaping applied here: JavaScript string escaping and HTML escaping. You need both, as the two formats are independent, and the JavaScript is embedded into HTML.
How to do that... the primitive way is event.title?js_string?html, but it's easy to forget to add that ?html, so don't do that. Instead, use auto-escaping (see https://freemarker.apache.org/docs/dgui_quickstart_template.html#dgui_quickstart_template_autoescaping). If you can't use that form of auto-escaping for some reason (like you are using some old FreeMarker version and aren't allowed to upgrade), put the whole template into <#escape x as x?html>...</#escape>. In both cases, you can just write ${event.title?js_string}, and it will just work.
However, if you are using #escape, ensure that the incompatible_improvements setting (see https://freemarker.apache.org/docs/pgui_config_incompatible_improvements.html) is at least 2.3.20, or else ?html doesn't escape '.
Modify the value of event.title so that single quotes are replaces with ' and double quotes are replaces with ", then you won't have to worry at all about which kind of quotes you use for the rest of it.
?js_string should be outside curly brackets {}. Now you should use double quotes " " around href value, if you wanna handle single quotes ' ' inside the string and vice versa. Single and double quote can not be used in same string, so either of them needs to be replaced to other.
solution:
Link
Same expression can be used in JavaScript to handle special characters
I have following C# code in my ASP.NET application:
string script = #"alert('Message head:\n\n" + CompoundErrStr + " message tail.');";
System.Web.UI.ScriptManager.RegisterClientScriptBlock(this, this.GetType(), "Test", script, true);
CompoundErrStr is an error message generated by SQL Server (exception text bubbled up from the stored procedure). If it contains any table column names they are enclosed in single quotes and JavaScript breaks during execution because single quotes are considered a string terminator.
As a fix for single quotes I changed my code to this:
CompoundErrStr = CompoundErrStr.Replace("'", #"\'");
string script = #"alert('Message head:\n\n" + CompoundErrStr + " message tail.');";
System.Web.UI.ScriptManager.RegisterClientScriptBlock(this, this.GetType(), "Test", script, true);
and it now works fine.
However, are there any other special characters that need to be escaped like this? Is there a .Net function that can be used for this purpose? Something similar to HttpServerUtility.HtmlEncode but for JavaScript.
EDIT I use .Net 3.5
Note: for this task you can't (and you shouldn't) use HTML encoders (like HttpServerUtility.HtmlEncode()) because rules for HTML and for JavaScript strings are pretty different. One example: string "Check your Windows folder c:\windows" will be encoded as "Check your Windows folder c:'windows" and it's obviously wrong. Moreover it follows HTML encoding rules then it won't perform any escaping for \, " and '. Simply it's for something else.
If you're targeting ASP.NET Core or .NET 5 then you should use System.Text.Encodings.Web.JavaScriptEncoder class.
If you're targeting .NET 4.x you can use HttpUtility.JavaScriptStringEncode() method.
If you're targeting .NET 3.x and 2.x:
What do you have to encode? Some characters must be escaped (\, " and ') because they have special meaning for JavaScript parser while others may interfere with HTML parsing so should escaped too (if JS is inside an HTML page). You have two options for escaping: JavaScript escape character </kbd> or \uxxxx Unicode code points (note that \uxxxx may be used for them all but it won't work for characters that interferes with HTML parser).
You may do it manually (with search and replace) like this:
string JavaScriptEscape(string text)
{
return text
.Replace("\\", #"\u005c") // Because it's JS string escape character
.Replace("\"", #"\u0022") // Because it may be string delimiter
.Replace("'", #"\u0027") // Because it may be string delimiter
.Replace("&", #"\u0026") // Because it may interfere with HTML parsing
.Replace("<", #"\u003c") // Because it may interfere with HTML parsing
.Replace(">", #"\u003e"); // Because it may interfere with HTML parsing
}
Of course </kbd> should not be escaped if you're using it as escape character! This blind replacement is useful for unknown text (like input from users or text messages that may be translated). Note that if string is enclosed with double quotes then single quotes don't need to be escaped and vice-versa). Be careful to keep verbatim strings on C# code or Unicode replacement will be performed in C# and your client will receive unescaped strings. A note about interfere with HTML parsing: nowadays you seldom need to create a <script> node and to inject it in DOM but it was a pretty common technique and web is full of code like + "</s" + "cript>" to workaround this.
Note: I said blind escaping because if your string contains an escape sequence (like \uxxxx or \t) then it should not be escaped again. For this you have to do some tricks around this code.
If your text comes from user input and it may be multiline then you should also be ready for that or you'll have broken JavaScript code like this:
alert("This is a multiline
comment");
Simply add .Replace("\n", "\\n").Replace("\r", "") to previous JavaScriptEscape() function.
For completeness: there is also another method, if you encode your string Uri.EscapeDataString() then you can decode it in JavaScript with decodeURIComponent() but this is more a dirty trick than a solution.
While the original question mentions .NET 3.5, it should be known to users of 4.0+ that you can use HttpUtility.JavaScriptStringEncode("string")
A second bool parameter specifies whether to include quotation marks (true) or not (false) in the result.
All too easy:
#Html.Raw(myString)
I've researched stackoverflow and find similar results but it is not really what I wanted.
Given an xml string: "<a b=\"c\"></a>" in javascript context, I want to create a regex that will capture the attribute value including the quotation marks.
NOTE: this is similar if you're using single quotation marks.
Currently I have a regular expression tailored to the XML specification:
[_A-Za-z][\w\.\-]*(?:=\"[^\"]*\")?
[_A-Za-z][\w\.\-]* //This will match the attribute name.
(?:=\"[^\"]*\")? //This will match the attribute value.
\"[^\"]*\" //This part concerns me.
My question now is, what if the xml string looks like this:
<shout statement="Hi! \"Richeve\"."></shout>
I know this is a dumb question to ask but I just want to capture rare cases that this scenario might happen (I know the coder can use single quotes on this scenario) but there are cases that we don't know the current value of the attribute given that the attribute value changes dynamically at runtime.
So to make this clearer, the result of that using the correct regex should be:
"Hi! \"Richeve\"."
I hope my question is clear. Thanks for all the help!
PS: Note that the language context is Javascript and I know it is tempting to use lookbehinds but currently lookbehinds are not supported.
PS: I know it is really hard to parse XML but I have an elegant solution to this :) so I just need this small problem to be solved. So this problem only main focus is capturing quotation marked string tokens containing quotation marks inside the string token.
The standard pattern for content with matching delimiters and embedded escaped delimiters goes like this:
"[^"\\]*(?:\\.[^"\\]*)*"
Ignoring the obvious first and last characters in the pattern, here's how the rest of the pattern works:
[^"\\]*: Consume all characters until a delimiter OR backslash (matching Hi! in your example)
(?:\\.[^"\\]*)* Try to consume a single escaped character \\. followed by a series of non delimiter/backslash characters, repeatedly (matching \"Richeve first and then \". next in your example)
That's it.
You can try to use a more generic delimiter approach using (['"]) and back references, or you can just allow for an alternate pattern with single quotes like so:
("[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*')
Here's another description of this technique that might also help (see the section called Strings): http://www.regular-expressions.info/examplesprogrammer.html
Description
I'm pretty really sure embedding double quotes inside a double quoted attribute value is not legal. You could use the unicode equivalent of a double quote \x22 inside the value.
However to answer the question, this expression will:
allow escaped quotes inside attribute values
capture the attribute statement 's value
allow attributes to appear in any order inside the tag
will avoid many of the edge cases which will trip up pattern matching inside html text
doesn't use lookbehinds
<shout\b(?=\s)(?=(?:[^>=]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*?\sstatement=(['"])((?:\\['"]|.)*?)\1(?:\s|\/>|>))(?:[^>=]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>.*?<\/shout>
Example
Pretty Rubular
Ugly RegexPlanet set to Javascript
Sample Text
Note the difficult edge case in the first attribute :)
<shout onmouseover=' statement="He said \"I am Inside the onMouseOver\" " ; if ( 6 > a ) { funRotate(statement) } ; ' statement="Hi! \"Richeve\"." title="sometitle">SomeString</shout>
Matches
Group 0 gets the entire tag from open to close
Group 1 gets the quote surrounding the statement attribute value, this is used to match the closing quote correctly
Group 2 gets the statement attribute value which may include escaped quotes like \" but not including the surrounding quotes
[0][0] = <shout onmouseover=' statement="He said \"I am Inside the onMouseOver\" " ; if ( 6 > a ) { funRotate(statement) } ; ' statement="Hi! \"Richeve\"." title="sometitle">SomeString</shout>
[0][1] = "
[0][2] = Hi! \"Richeve\".
Without going into specifics why I'm doing this... (it should be encoded to begin with, but it's not for reasons outside my control)
Say I have a bit of HTML that looks like this
<tr data-path="files/kissjake's files"">...</tr> so the actual data-path is files/kissjake's files"
How do I go about selecting that <tr> by its data path?
The best I can currently do is when I bring the variables into JS and do any manipulation, I URLEncode it so that I'm always working with the encoded version. jQuery seems smart enough to determine the data-path properly so I'm not worried about that.
The problem is on one step of the code I need to read from a data-path of another location, and then compare them.
Actually selecting this <tr> is what's confusing me.
Here is my coffeescript
oldPriority = $("tr[data-path='#{path}']").attr('data-priority')
If I interpolate the URLEncoded version of the path, it doesn't find the TR. And I can't URLDecode it because then jQuery breaks as there are multiple ' and " conflicting in the path.
I need some way to select any <tr> that matches a particular data-attribute, even if its not encoded in the html to begin with
First, did you mean to have the extra " in there? You will have to escape that, as it's not valid HTML.
<tr data-path="files/kissjake's files"">...</tr>
To select it, you need to escape inside the selector. Here's an example of how that would look:
$("tr[data-path='files/kissjake\\'s files\"']")
Explanation:
\\' is used to escape the ' inside the CSS selector. Since ' is inside other single quotes, it must be escaped at the CSS level. The reason there are two slashes '\` is we must escape a slash so that it makes it into the selector string.
Simpler example: 'John\\'s' yields the string John\'s.
\" is used to escape the double quote which is contained inside the other double quotes. This one is being escaped on the JS level (not the CSS level), so only one slash is used because we don't need a slash to actually be inside the string contents.
Simpler example: 'Hello \"World\"' yields the string Hello "World".
Update
Since you don't have control over how the HTML is output, and you are doomed to deal with invalid HTML, that means the extra double quote should be ignored. So you can instead do:
$("tr[data-path='files/kissjake\\'s files']")
Just the \\' part to deal with the single quote. The extra double quote should be handled by the browser's lenient HTML parser.
Building off of #Nathan Wall's answer, this will select all <tr> tags with a data-path attribute on them.
$("tr[data-path]");
We're allowing users to upload pictures and provide a text description. Users can view this through a pop up box (actually a div ) via javascript. The uploaded text is a parameter to a javascript function. I 'm worried about XSS and also finding issues with HTMLEncode().
We're using HTMLEncode to guard against XSS. Unfortunately, we're finding that HTMLEncode() only replaces '<' and '>'. We also need to replace single and double quotes that people may include. Is there a single function that will do all these special type characters or must we do that manually via .NET string.Replace()?
Unfortunately, we're finding that HTMLEncode() only replaces '<' and '>'.
Assuming you are talking about HttpServerUtility.HtmlEncode, that does encode the double-quote character. It also encodes as character references the range U+0080 to U+00FF, for some reason.
What it doesn't encode is the single quote. Bit of a shame but you can usually work around it by using only double quotes as attribute value delimiters in your HTML/XML. In that case, HtmlEncode is enough to prevent HTML-injection.
However, javascript is in your tags, and HtmlEncode is decidedly not enough to escape content to go in a JavaScript string literal. JavaScript-encoding is a different thing to HTML-encoding, so if that's the reason you're worried about the single quote then you need to employ a JS string encoder instead.
(A JSON encoder is a good start for that, but you would want to ensure it encodes the U+2028 and U+2029 characters which are, annoyingly, valid in JSON but not in JavaScript. Also you might well need some variety of HTML-escaping on top of that, if you have JavaScript in an HTML context. This can get hairy; it's usually better to avoid these problems by hiding the content you want in plain HTML, for example in a hidden input or custom attribute, where you can use standard HTML-escaping, and then read that data from the DOM in JS.)
If the text description is embedded inside a JavaScript string literal, then to prevent XSS, you will need to escape special characters such as quotes, backslashes, and newlines. The HttpUtility.HtmlEncode method is not suitable for this task.
If the JavaScript string literal is in turn embedded inside HTML (for example, in an attribute), then you will need to apply HTML encoding as well, on top of the JavaScript escaping.
You can use Microsoft's Anti-Cross Site Scripting library to perform the necessary escaping and encoding, but I recommend that you try to avoid doing this yourself. For example, if you're using WebForms, consider using an <asp:HiddenField> control: Set its Value property (which will be HTML-encoded automatically) in your server-side code, and access its value property from client-side code.
how about you htmlencode all of the input with this extended function:
private string HtmlEncode(string text)
{
char[] chars = HttpUtility.HtmlEncode(text).ToCharArray();
StringBuilder result = new StringBuilder(text.Length + (int)(text.Length * 0.1));
foreach (char c in chars)
{
int value = Convert.ToInt32(c);
if (value > 127)
result.AppendFormat("&#{0};", value);
else
result.Append(c);
}
return result.ToString();
}
this function will convert all non-english characters, symbols, quotes, etc to html-entities..
try it out and let me know if this helps..
If you're using ASP.NET MVC2 or ASP.NET 4 you can replace <%= with <%: to encode your output. It's safe to use for everything it seems (like HTML Helpers).
There is a good write up of this here: New <%: %> Syntax for HTML Encoding Output in ASP.NET 4 (and ASP.NET MVC 2)