Setting charset for a specific div - javascript

Is it possible to assign a charset to a specific div, so that you can have more than one charset on a page?
I'm currently importing snippets of text into my site via JS, and some of this text requires the UTF-8 charset. To be sure that my text is shown correctly on every page it is included on (sometimes external sites), I force the meta tag into all the sites.
Is it possible to apply this charset to only a specific div, span or something like that?

No, it is not, and it is also entirely unnecessary.
The <meta> element declaring a charset or, better, the equivalent HTTP header is only there to help the browser correctly interpret the HTML text. Once the browser has done so, it constructs a DOM out of it and you may essentially treat the text as having no concrete charset after this point. For all intents and purposes the text exists as text in the DOM, not as a binary representation which must be interpreted by a charset decoder.
When you're adding new content to the DOM via Javascript, the same ideas apply. The browser needs to fetch the new content via HTTP and the content's encoding should be denoted by an HTTP header. The browser can convert the text from the specific encoding to "DOM text" based on that, after which it doesn't matter anymore what encoding it was in.
Therefore, you can perfectly mix and match encodings from different sources being delivered in separate HTTP responses within the same page/DOM without having to worry about a "global" encoding.
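To illustrate this for the original question, here is a minimal sketch (the URL and element id are made up) of pulling a snippet in via JS: the snippet's own HTTP response declares its encoding, the browser decodes it, and from then on it is just text in the DOM regardless of how the surrounding page was encoded.
// Minimal sketch; '/snippets/greeting.txt' and 'snippet' are hypothetical.
// XHR decodes responseText using the charset declared in the response's
// Content-Type header (falling back to UTF-8), so by the time it reaches
// the DOM it is plain text and the page's own charset no longer matters.
var xhr = new XMLHttpRequest();
xhr.open('GET', '/snippets/greeting.txt', true);
xhr.onload = function () {
  document.getElementById('snippet').textContent = xhr.responseText;
};
xhr.send();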

I think you can't use multiple charsets on one page, but using an iframe can solve the problem.

Related

Safely encoding <script> content in ASP.NET Core (ETAGO Problem)

The project I am working on requires user-generated server data to be encoded as JSON and sent down with the HTML document in script tags. At the moment I am doing this with the TagBuilder class, using the InnerHtml.AppendHtml(...) method to write the script content.
I have since discovered that I have to escape / encode the script content, because if the user content for whatever reason contains the text "</script>" somewhere, the HTML parser ends the script tag there (other HTML probably has various side effects as well).
I read this blog post which describes how to handle the situation in a Node.js environment by using the jsesc library. Does anything similar exist for .NET (ideally Core or Standard)?
I wanted to ask before I roll my own as I'm always wary of doing that for security related code.
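For what it's worth, the core of the technique the blog post describes can be sketched in a few lines of JavaScript (the helper name is made up, and a real library such as jsesc covers more cases, e.g. the U+2028/U+2029 line separators):
// Rough sketch of the escaping idea; toSafeScriptJson is a hypothetical name.
// "\/" is a valid escape in both JSON and JavaScript strings, so replacing
// "</" with "<\/" keeps the payload valid while preventing "</script>"
// inside user content from terminating the surrounding script element.
function toSafeScriptJson(value) {
  return JSON.stringify(value).replace(/<\//g, '<\\/');
}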
Edit
Due to time constraints, for the time being I have injected the JSON into an HTML element as an attribute value, and the ASP.NET Core engine automatically encodes attribute values correctly.
Unfortunately it does increase the size of the HTML document a little more than I would like, as double quotes are encoded as &quot;, but it is what it is.
I'm leaving this question open though in case an answer to my original question comes along.
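For completeness, the client side of that attribute workaround is straightforward; a small sketch (the element id and attribute name are made up):
// Hypothetical markup rendered by the server:
//   <div id="page-data" data-model="{ ...attribute-encoded JSON... }"></div>
// The browser decodes the attribute entities (e.g. &quot;) when parsing,
// so dataset.model already holds the original JSON string.
var element = document.getElementById('page-data');
var model = JSON.parse(element.dataset.model);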

Do modern browsers support base64 encoded JS or CSS chunks in HTML like they do for images?

I'm writing an HTML mail viewer which gets MIME source on input (with HTML mail body and related objects attached). I would like to base64 encode all embedded objects (like images which appear in the message). However, I'm not sure if it's gonna work with other types of embedded resources like CSS or JS files.
Although it's not common for emails to have CSS or JS as separate files attached to the message (rather than being included directly in the HTML), this is still possible and I want my mail viewer to be prepared for this situation.
For now, I'm planning to find things like cid:some-content-id in tag attributes in the HTML body and replace all occurrences with the base64-encoded bodies of the corresponding embedded objects (the ones with the same content-id in their headers). In this approach I don't even care what kind of resource I'm dealing with (be it an image or whatever, I just run a regex pattern match). But if it turns out that this method does not work for anything but images, I need to find another solution.
Yes, browsers support data: URLs (which can be base64-encoded) in place of actual files.
<link rel="stylesheet" href="data:text/css;charset=utf-8;base64,Ym9keXtiYWNrZ3JvdW5kOmJsYWNrO2NvbG9yOndoaXRlO30="></link>
<script src="data:application/javascript;charset=utf-8;base64,d2luZG93LmFsZXJ0KCJ0aGlzIGlzIGV4ZWN1dGVkIGZybSBiYXNlNjQiKTs="></script>
<p>This text is styled white from the data-uri loaded css</p>
That said, Javascript is generally not allowed in emails and CSS can simply be copied into <style> tags, so this is not necessary in your situation.
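As a rough sketch of the cid: substitution described in the question (the function name and the attachments lookup are made up), the replacement could look like this:
// `attachments` maps a Content-ID to its MIME type and base64-encoded body,
// e.g. attachments['some-content-id'] = { mimeType: 'image/png', base64: '...' }.
function inlineCidReferences(html, attachments) {
  return html.replace(/cid:([^"'\s>]+)/g, function (match, contentId) {
    var part = attachments[contentId];
    if (!part) return match; // leave unknown references untouched
    return 'data:' + part.mimeType + ';base64,' + part.base64;
  });
}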

Cross Site Scripting: Is restricting the use of < and > tags an effective way to reduce Cross Site Scripting?

If I want to prevent XSS, would restricting the input of special characters such as < and > in all text entry forms be the best way to do it?
I mean, this would prevent the entry of HTML tags such as <script>, <img>, etc. and effectively block XSS.
Would you agree?
No. The best way to prevent it is to ensure that all the information you output onto the page is appropriately encoded.
Some examples of why blocking angle brackets (and other special characters) is insufficient:
https://security.stackexchange.com/questions/36629/cross-site-scripting-without-special-chars
One of the biggest problems with preventing XSS is that a single webpage has many different encoding contexts, some of which may or may not overlap. There's a reason double-encoding is considered inherently dangerous.
Let's see an example. You prohibit < and >, so I can no longer inject an HTML element into your page, right? Well, not quite. For example, if you put the text I entered into an attribute, it will be interpreted differently:
onload="document.write('<script>window.alert("Gotcha!")</script>')"
There are plenty of such opportunities, and each needs its own variant of correct encoding. Even encoding the input as proper HTML text (e.g. turning < into &lt;) may still leave a vulnerability if the text is then read back in JavaScript and used in something like innerHTML, for example.
The same kind of issue occurs with any kind of URL (img src="javascript:alert('I can\'t let you do that, Dave')"), or with embedding user input in any kind of script (\x3C). URLs are especially dangerous, since they involve triple encoding - URL encoding, (X)HTML encoding and possibly JavaScript encoding. I'm not sure if it's even possible to have user input that is safe under those conditions :D
Ideally, you want to limit your area of exposure as much as you can. Do not read from the generated document unless you trust the user (e.g. an admin). Avoid multiple layers of encoding, and always make sure you know exactly where each potentially unsafe piece of input ends up. In XHTML, CDATA sections are a great option, since they make encoding potentially dangerous code easy, but they might be interpreted incorrectly by browsers that don't support XHTML properly. Otherwise, use a proper, documented encoding method - in JS, this would be innerText. Of course, you also need to make sure that your JS itself isn't compromised by user data.
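As a small illustration of that last point (the element id and payload are made up), assigning untrusted data via a text property keeps it inert, whereas innerHTML would parse it as markup:
// The element id 'comment' and the payload are hypothetical.
var untrusted = '<img src=x onerror=alert(1)>';
document.getElementById('comment').textContent = untrusted; // rendered as literal text
// document.getElementById('comment').innerHTML = untrusted; // would run the onerror handler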

Escape HTML tags. Any issue possible with charset encoding?

I have a function to escape HTML tags, to be able to insert text into HTML.
Very similar to:
Can I escape html special chars in javascript?
I know that JavaScript uses Unicode internally, but HTML pages may be encoded in different charsets like UTF-8 or ISO-8859-1, etc.
My question is: is there any issue with this very simple conversion, or should I take the page charset into consideration?
If so, how should I handle that?
PS: For example, the equivalent PHP function (http://php.net/manual/en/function.htmlspecialchars.php) has a parameter to select a charset.
No, JavaScript lives in the Unicode world so encoding issues are generally invisible to it. escapeHtml in the linked question is fine.
The only place I can think of where JavaScript gets to see bytes would be data: URLs (typically hidden beneath base64). So this:
var markup = '<p>Hello, '+escapeHtml(user_supplied_data);
var url = 'data:text/html;base64,'+btoa(markup);
iframe.src = url;
is in principle a bad thing. Although I don't know of any browsers that will guess UTF-7 in this situation, a charset=... parameter should be supplied to ensure that the browser uses the appropriate encoding for the data. (btoa uses ISO-8859-1, for what it's worth.)
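Following the answer's own suggestion, a corrected version of the sketch above would simply declare the charset that btoa effectively produces (ISO-8859-1):
// Same example as above, but with an explicit charset so the browser does
// not have to guess. btoa() encodes code points 0-255, i.e. effectively ISO-8859-1.
var markup = '<p>Hello, ' + escapeHtml(user_supplied_data);
var url = 'data:text/html;charset=ISO-8859-1;base64,' + btoa(markup);
iframe.src = url;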

Opera User-JS: how do I get the raw server response?

I'm writing some user-JS for Opera. It reacts to a request that doesn't have an extension, e.g. /stuff/code/MyFile, or has one not related to JavaScript, e.g. /stuff/code/load.do. The content-type of the response is set to text/html, even though it returns pure JavaScript source (text/javascript). As I don't have access to the server code, I simply have to live with this.
The problem now is that I want to format the source with line numbers and such and display it inside Opera. Therefore, I wrote some user-JS to react to AfterEvent.DOMContentLoaded (I also tried AfterEvent.load, same thing). It reads e.event.target.body.innerHTML to gain access to the body, i.e. the JavaScript code.
That alone would work nicely if the source didn't contain HTML tags or comparison operators (<, >). Since it does, I never get the output I want. Opera seems to have some internal logic to convert the text/html response into its own representation format. For example, a CRLF after an HTML tag is removed, or code between two "matching" < and > (comparison operators!) is crunched together into one single line with ="" appended after each word in there.
And that's where the problem is.
If I request the same URL without my user-JS and then look at the source of the "page", I see clean JavaScript code identical to what the server sent out. And this is what I want to get access to.
If I use innerText instead of innerHTML, Opera strips out the HTML tags, making the file different from the original, too.
I also tried to look at outerHTML, outerText and textContent, but they all have the same problems.
I know that Opera doesn't do anything wrong here. The server says it's text/html and Opera simply does what it usually does with a text/html response.
Therefore, my question is: is there any way to get the untouched response with a user-JS?
There isn't any way to access the pre-parsed markup from JS. The only way to do that would be to use XMLHttpRequest to request the content yourself.
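A minimal sketch of that suggestion (error handling omitted) would be to re-request the current URL and read responseText, which contains the raw server response rather than Opera's re-serialized DOM:
// Re-fetch the same URL and use the untouched response body.
var xhr = new XMLHttpRequest();
xhr.open('GET', document.location.href, true);
xhr.onreadystatechange = function () {
  if (xhr.readyState === 4 && xhr.status === 200) {
    var rawSource = xhr.responseText; // the original JavaScript source
    // ... format with line numbers and display it here ...
  }
};
xhr.send(null);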
