Restrict JavaScript in a subsection of HTML containing a rich-text editor

I'm working on a legacy web app that uses both the Summernote rich-text editor, which can save formatted notes to our server, and a lot of inline JavaScript.
Summernote lets users write HTML/CSS/JS via its "code" view, but doesn't seem to have any built-in protection against XSS. For instance, if you go to their homepage, switch the editor to code view, write <script>alert(0)</script> and switch back to text view, the script executes. We want users to be able to bold content, insert hyperlinks and images, etc., but not to use JS or probably even CSS.
We've just discovered that the developer who originally implemented XSS protection for these editors did a weak job, and we're now trying to shore it up. So far, my options seem to be: (a) creating or sourcing a whitelist that parses the Summernote HTML and only allows certain HTML elements in the field; and/or (b) implementing CSP headers and moving all inline JS/CSS to separate files.
Given that only these editors need to return valid HTML to users (everywhere else on the site we can and do fully sanitize content), is it possible to disable inline JS/CSS in only one section of an HTML page (i.e. the area we load the editor's content into)? Or is that not an option, or liable to be susceptible to workarounds?
Edit: Just as a note, I recognize that the rich-text editor can't prevent XSS if we choose to save content to the server, since it runs on the client side. I meant that I couldn't find resources or suggestions on their site regarding HTML whitelists or example sanitization for various servers.
Edit 2: We will definitely be implementing some sort of whitelist to be on the safe side (possibly CSP as well, though refactoring will be a headache), but I still want to know if this is possible: i.e. between two sections of HTML, is it possible to block all inline JS and CSS?

As noted in this issue, https://github.com/summernote/summernote/issues/1617:
"You have to validate at serverside anyway. It's a funny behavior, I agree, but the user is allowed to do anything in his own browser. The only thing you (you, and not this package) should take care of, is not letting users store harmful code in your database (which is later displayed in other users' browsers)."
So, basically, you are in charge of what users try to put into your database or web page. You need to:
Validate the content on the server side (for example, delete <script> tags along with their content).
Implement client-side sanitization (this doesn't replace server-side validation) to let the user know that scripts aren't allowed. This could be done when switching between code view and preview (see the Summernote event for switching from code view back to the editor).
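The whitelist (allowlist) idea from the question can be sketched in plain JavaScript for a Node.js server. This is a minimal sketch under stated assumptions, not a vetted sanitizer. The tags in the hypothetical BARE_TAGS set and the rule that links may carry only an http(s) href are illustrative choices. Anything else is escaped into visible text, including attributes Summernote itself may add (for example target on links). A maintained library such as DOMPurify or sanitize-html is the safer choice in production.

```javascript
// Hypothetical allowlist: these tags are kept only when written with no attributes.
const BARE_TAGS = new Set(['b', 'strong', 'i', 'em', 'u', 'p', 'br', 'ul', 'ol', 'li']);

function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

function sanitizeRichText(html) {
  // Every character belongs to exactly one token: a complete tag, a run of text, or a lone '<'.
  return html.replace(/<[^>]*>|[^<]+|</g, (token) => {
    if (token[0] !== '<') {
      return token; // text runs contain no '<', so they cannot open a tag
    }
    // A bare opening or closing tag such as <b>, </li> or <br/>.
    const bare = /^<(\/?)([a-z0-9]+)\s*\/?>$/i.exec(token);
    if (bare) {
      const closing = bare[1] === '/';
      const name = bare[2].toLowerCase();
      if (BARE_TAGS.has(name) || (closing && name === 'a')) {
        return `<${closing ? '/' : ''}${name}>`;
      }
    }
    // A link whose only attribute is an http(s) href (assumed policy for this sketch).
    const link = /^<a\s+href\s*=\s*"(https?:\/\/[^\s"'<>]+)"\s*>$/i.exec(token);
    if (link) {
      return `<a href="${link[1]}" rel="nofollow noopener">`;
    }
    // Anything else (script, img, on* handlers, style, comments...) becomes visible text.
    return escapeHtml(token);
  });
}
```

Because unknown markup is escaped rather than deleted, a user who pastes <script> sees it as text instead of silently losing content. The sketch does not try to balance tags, since stray closing tags are not an XSS risk.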

Related

Forbid javascript in a div

At the moment I am working on a website on which administrators will be able to post text.
I'd like to give the user the possibility to use HTML code, but I do not want them to be able to post JavaScript code.
Is there an HTML tag (or workaround) to prohibit JavaScript?
There's no plain HTML tag that blocks inline JS from running.
Among the many workarounds, the most elegant is to disable inline scripts altogether with CSP headers, but this may not be possible depending on your current architecture. You could also consider using a sanitization library to clean up the post content; there are also simple strategies, such as using a regex to find <script> tags.
I suggest reading https://glebbahmutov.com/blog/disable-inline-javascript-for-security/ to get a better sense of how CSP works and what your options are.
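As a rough illustration of the CSP option, a Node.js server can send a policy like the one below. The policy string, the /app.js file and the page body are assumptions for the sketch. The header applies to the whole document, which is why it cannot be limited to the one area that holds posted content. It is also why existing inline JS and CSS would have to move into separate files first.

```javascript
const http = require('http');

// Assumed policy for this sketch; tune the sources to your own site.
const CSP = [
  "default-src 'self'",
  "script-src 'self'",     // no inline <script>, on* handlers or javascript: URLs
  "style-src 'self'",      // no inline <style> blocks or style="" attributes
  "img-src 'self' https:", // still allow images from https URLs in posts
  "object-src 'none'",
  "base-uri 'none'",
].join('; ');

function handler(req, res) {
  res.setHeader('Content-Security-Policy', CSP);
  res.setHeader('Content-Type', 'text/html; charset=utf-8');
  // All page logic has to live in external files such as /app.js for this policy to allow it.
  res.end('<!doctype html><script src="/app.js"></script><div id="posts"></div>');
}

// Call server.listen(8080) to actually serve pages.
const server = http.createServer(handler);
```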
It's also worth reading https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet
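The regex strategy mentioned above is easy to write but also easy to evade, which is what the OWASP cheat sheet catalogues. The sketch below removes <script> blocks and then shows inputs that either pass through untouched or are reassembled by the filter itself. The function name and test strings are illustrative.

```javascript
// A naive blacklist filter of the "regex for <script" kind: deletes <script>...</script> blocks.
function stripScriptTags(html) {
  return html.replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, '');
}

// A few inputs in the spirit of the OWASP evasion cheat sheet.
const attempts = [
  '<script>alert(0)</script>',                  // removed
  '<SCRIPT>alert(0)</SCRIPT>',                  // removed (the regex is case-insensitive)
  '<img src=x onerror=alert(0)>',               // untouched: no <script> tag at all
  '<a href="javascript:alert(0)">click</a>',    // untouched
  '<scr<script></script>ipt>alert(0)</script>', // removing the inner tag assembles a new one
];

for (const input of attempts) {
  console.log(JSON.stringify(input), '->', JSON.stringify(stripScriptTags(input)));
}
```

This is why the answers above lean toward an allowlist of permitted markup (or CSP) rather than a list of forbidden patterns.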

Dynamic HTML content pages like Dropbox and Soundcloud

Check out the source code of Dropbox's main page or any SoundCloud page. You can see they have a lot of scripts going on and little pure HTML content (article, main, p, div). I've been searching, and it seems this way of generating pages is called dynamic content/HTML (correct me if I'm wrong).
So, I think its purpose is to let you edit multiple separate external JavaScript files (if that's the language, since they're scripts) so that the HTML documents they're linked from are generated dynamically.
Another possible purpose would be to have one external document, say a navigation bar, that you place in multiple pages; when you need to update it, you just edit the external document and not each page (hooray!).
Questions:
Is it actually named Dynamic content?
What languages does it require besides HTML, CSS, and JS? Like PHP or ASP (assuming any is necessary at all).
Does creating pages this way affect your website's ranking in Google, negatively or positively? I think when Googlebot reaches the page, all it sees are scripts.
There are two subtly different definitions of the word dynamic, which may be confusing your search for information about this. I'll answer your questions separately for each.
Dynamic as in "generated from content held in a database"
For example, on this page your reputation score was fetched from Stack Overflow's database and injected into the HTML.
Yes, this would be referred to as dynamic content. In contrast to static content, which would just be fixed files, dynamic content would be built up from its parts for each user who requests it.
Your second set of languages (PHP, etc.) are what read from the database and spit out the corresponding HTML.
Google's bot is smart: it can render pages and will see similar content to what you get in a browser. So generating pages dynamically instead of statically won't count against the site for SEO; dynamically generating lots of pages that are very similar might count against it though.
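To make the first sense concrete, here is a minimal Node.js equivalent of what the PHP layer does: look up a record and build HTML from it on each request. The in-memory users map stands in for a real database, and the URL layout (/profile?id=1) is hypothetical.

```javascript
const http = require('http');

// Hypothetical stand-in for a database table; a real app would query MySQL, Postgres, etc.
const users = new Map([
  [1, { name: 'Ada', reputation: 1234 }],
]);

// Escape stored values before injecting them into HTML.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build the HTML for one user from the stored record.
function renderProfile(id) {
  const user = users.get(id);
  if (!user) return '<p>No such user.</p>';
  return `<h1>${escapeHtml(user.name)}</h1><p>Reputation: ${escapeHtml(user.reputation)}</p>`;
}

// Runs for every request, so each visitor gets a page assembled from current data.
function handleRequest(req, res) {
  const id = Number(new URL(req.url, 'http://localhost').searchParams.get('id'));
  res.setHeader('Content-Type', 'text/html; charset=utf-8');
  res.end(`<!doctype html><title>Profile</title>${renderProfile(id)}`);
}

// Call server.listen(8080) to serve e.g. http://localhost:8080/profile?id=1
const server = http.createServer(handleRequest);
```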
Dynamic as in "page content that updates without you having to refresh the whole page"
For example, as you wrote your question Stack Overflow tried to find similar questions and show them to you in case it had already been asked. JavaScript was sending a request to their server and updating part of the page in response.
This would also be referred to as dynamic content. The key difference is that it's JavaScript in the page that's making further calls to the server to fetch more content, which is what you're seeing on the minimalist sites you mention. This used to be called dynamic HTML (DHTML); more modern references are more likely to discuss it in terms of AJAX or "single page website".
Typically you'd have PHP or similar running on the web server, responding to the requests for content.
Again, Google's bot is smart enough to cope with this. That won't necessarily be the case for all search engines though.
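For the second sense, the browser-side half looks roughly like this. The /api/similar endpoint and the shape of its JSON response are hypothetical. fetchImpl is a parameter only so the function can be exercised outside a browser.

```javascript
// Ask the server for data and update only one element, without reloading the page.
async function showSimilarQuestions(target, title, fetchImpl = globalThis.fetch) {
  const url = '/api/similar?title=' + encodeURIComponent(title);
  const response = await fetchImpl(url, { headers: { Accept: 'application/json' } });
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  const questions = await response.json(); // assumed shape: [{ "title": "..." }, ...]
  // textContent (not innerHTML) so server data is never parsed as markup;
  // style the element with white-space: pre-line to show one title per line.
  target.textContent = questions.map((q) => q.title).join('\n');
}

// In a page: showSimilarQuestions(document.getElementById('similar'), titleInput.value);
```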

Is there an option to tell crawlers / bots: "don't use javascript"?

I searched for this subject with no results, so I decided to ask a question. I know there is an option to make pages loaded by AJAX "crawlable", using www.example.com/#!somecontent. But is there an option (e.g. a meta tag or robots.txt directive) that says: "Hey, robots, disable JavaScript!"?
It could be used, for example, by:
1) online JavaScript games, which have a huge amount of JavaScript and nothing special for SEO or for bots to crawl (saving the bots memory and time)
2) sites built with PHP, HTML and CSS (with per-page meta tags, etc.) for robots, with some extra functionality added via AJAX that crawlers and bots don't need to analyze (for example, reloading only the content, but NOT changing the meta tags). In that case bots see the meta tags and the content, while for users you prevent the default action of anchors and only reload the content via AJAX, so the meta tags stay the standard ones.
PS: The question isn't about "you can do it better" or "you can rebuild the application another way". It is about whether we can suggest that bots disable JavaScript.
The quick answer is no, you can not.
The long answer is: if you have more than one version of your content, add a canonical link element to the version that uses JavaScript, pointing to a version that doesn't. You can append a parameter such as no_js=1 to the URL and, on the server side, remove all the parts of your HTML that use JavaScript.
So, yes, with some work you can do it.
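A sketch of the no_js=1 approach in Node.js follows, assuming a single template rendered two ways. The domain, title and /game.js file are placeholders. Search engines treat rel="canonical" as a hint rather than a directive, so this suggests, rather than forces, which version gets indexed.

```javascript
// Escape a value for use inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return value.replace(/&/g, '&amp;').replace(/"/g, '&quot;');
}

function renderPage(requestUrl) {
  const url = new URL(requestUrl, 'https://www.example.com');
  const noJs = url.searchParams.get('no_js') === '1';

  // Every variant names the script-free URL as the canonical one.
  const canonical = new URL(url.href);
  canonical.searchParams.set('no_js', '1');

  const head = [
    '<title>My game</title>',
    '<meta name="description" content="An online JavaScript game">',
    `<link rel="canonical" href="${escapeAttr(canonical.href)}">`,
  ];
  if (!noJs) {
    head.push('<script src="/game.js" defer></script>'); // left out when no_js=1
  }
  return `<!doctype html><html><head>${head.join('')}</head><body><h1>My game</h1></body></html>`;
}
```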

Javascript in MediaWiki

I'd like to use some JavaScript on my wiki pages, but I haven't been able to figure out how. I'm using a hosted solution on Wikia, so I am unable to modify the installation, add extensions or change the settings. But I have admin rights on my wiki, so I can access the MediaWiki namespace and MediaWiki:Common.js.
The JavaScript I want to use (Tangle) will consist of an external script common to a number of pages (but not all pages in the wiki) and some code specific to each page, the kind you would normally put inline in a <script> tag.
The trouble is, MediaWiki sanitizes <script> tags, and I haven't found a way to put them in. I'm trying to make this an editor-friendly setup that will be used across the wiki, so I'm also trying to avoid hacks and find a proper solution.
Update: New problem
Apparently MediaWiki also sanitizes HTML5 data attributes, which Tangle relies on heavily. Any ideas on solving that problem are very welcome.
MediaWiki doesn't allow <script> tags in pages for obvious reasons: if it did, anyone could use them to inject JavaScript into your wiki and e.g. steal login credentials.
There are a couple of things you could do:
Write some generic JavaScript code to extract the parameters from something that is allowed on MediaWiki pages, such as a hidden <div>. Be careful not to introduce security holes when doing that.
Add something like this to MediaWiki:Common.js:
importScript('MediaWiki:Tangle/' + wgPageName + '.js');
Then, whenever a user visits the page "Foo", the page "MediaWiki:Tangle/Foo.js" will be loaded as JavaScript. Of course, that page will only be editable by admins, but that might still be enough for your needs. (You could use the same trick to import JS from pages in other namespaces, but that would open a security hole miles wide.)
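A minimal sketch of the first option (reading parameters from allowed markup) is below. The tangle-params class and the key=value format are a hypothetical convention. The code sticks to ES5 syntax on the assumption that an older hosted MediaWiki may reject newer syntax in MediaWiki:Common.js. Values are parsed as numbers for an allowlist of keys and never passed to eval(), because anyone who can edit the page can edit the div. Only a class and text content are needed, so this also sidesteps the stripped data-* attributes.

```javascript
// Hypothetical wiki markup (a plain <div> with a class is allowed in page text):
//   <div class="tangle-params" style="display:none">width=300; height=200</div>
// Parse "key=value; key=value" text into numbers, accepting only known keys.
function parseParams(text, allowedKeys) {
  var params = {};
  var pairs = text.split(';');
  for (var i = 0; i < pairs.length; i++) {
    var eq = pairs[i].indexOf('=');
    if (eq === -1) continue;
    var key = pairs[i].slice(0, eq).trim();
    var raw = pairs[i].slice(eq + 1).trim();
    var value = Number(raw);
    // Never eval() wiki-editable text; accept only known keys with numeric values.
    if (allowedKeys.indexOf(key) !== -1 && raw !== '' && isFinite(value)) {
      params[key] = value;
    }
  }
  return params;
}

// In MediaWiki:Common.js (browser only):
// Array.prototype.forEach.call(document.querySelectorAll('.tangle-params'), function (div) {
//   var params = parseParams(div.textContent, ['width', 'height']);
//   // ...hand params to the page's Tangle setup...
// });
```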

newbie question about javascript embed code?

I am a JavaScript newbie. I am trying to write a requirements document and need some help describing what I am looking for. We want our application to generate a JavaScript snippet like this:
<script src="http://www.jotform.com/jsform/10511502633"></script>
This will load a web form.
So my question is:
- How does a single script load an entire web form? Is this JSON?
- What is this called? Is this cross-browser JavaScript?
- Can anyone point me in the direction of learning more about what this is?
Thank you for your help!
The JavaScript file is just hosted on an external site. It appears to be dynamically generated, so feel free to use some fancy words ;) But basically, you just include it as if it were on your own site.
You could say "The application will generate the required script tags to include a dynamically generated JavaScript file from an external, third-party site".
Of course, you need to take special precautions for cases when the include won't work because the other site is not reachable (the site is down, DNS does not work, the file has moved to another web server, your application is on an intranet or behind a proxy/firewall...). Why not copy their file and mirror it locally? Or use a reliable content delivery network, like Google's or Amazon's.
There are many names for this type of inclusion, the most common being "widget".
What it actually does:
takes an id of some sort as a parameter
uses the id to fetch specific data (most likely from a database)
generates some JS and HTML based on the id/data
usually this involves iframes of some sort.
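The steps above can be sketched as the server-side code that generates the embed script. The forms.example.com host and the /forms/<id> URL layout are hypothetical; a real service, such as the JotForm one in the question, will differ. Step 2 is left as a comment rather than a real database query.

```javascript
// Hypothetical widget host; not how any particular service actually works.
const WIDGET_HOST = 'https://forms.example.com';

function buildEmbedScript(formId) {
  // 1. Take an id as a parameter, and validate it before it goes near generated code.
  if (!/^\d+$/.test(formId)) {
    throw new Error('invalid form id');
  }
  // 2. A real service would look the form up in its database here.
  const formUrl = `${WIDGET_HOST}/forms/${formId}`;
  // 3. Generate JavaScript that, when run on the host page, inserts an iframe
  //    right after the <script> tag that loaded it.
  return [
    '(function () {',
    '  var script = document.currentScript;',
    "  var frame = document.createElement('iframe');",
    `  frame.src = ${JSON.stringify(formUrl)};`,
    "  frame.style.width = '100%';",
    "  frame.style.height = '500px';",
    "  frame.style.border = '0';",
    '  script.parentNode.insertBefore(frame, script.nextSibling);',
    '})();',
  ].join('\n');
}
```

The host page would then include it much like the snippet in the question, e.g. <script src="https://forms.example.com/jsform/10511502633"></script>. The server answers that URL with the generated code and a JavaScript content type.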
Using a script rather than a plain HTML iframe has multiple advantages:
you can change what is actually delivered to users' browsers without changing the include
you can resize the iframe to fit certain predefined sizes
you can inject the necessary things into the page the widget is included in (of course, you need to make sure this is sanctioned)
We use this all the time and have never regretted it.
If you don't want to build the widget infrastructure yourself, you can always use one of the widget providers, like Widgetbox:
http://www.widgetbox.com/widgets/make/
With those you are up and running in no time.
This is typically called a script include.
Google has lots of these types of items, and even they call them by many names: widgets, custom JavaScript, snippets, custom code, etc. It really depends on who you are writing for... I would go with "cross-platform embeddable JavaScript code", meaning that it would need to load all its dependencies. Also specify which browsers need to be supported and what should happen if the user has JavaScript turned off.
EDIT:
Actually, since we are talking about unique IDs, you will probably need two parts: the user/site-unique "cross-platform embeddable JavaScript code" and whatever server-side code supports it. Basically, this is an API accessed through your own JavaScript widget. Feel free to point to examples in your requirements document; programmers love examples.