Is there an option to tell crawlers/bots "don't use JavaScript"?

I searched for this subject with no results, so I decided to ask a question. I know there is a way to make pages loaded by AJAX "crawlable", using www.example.com/#!somecontent. But is there an option (e.g. a meta tag or a robots.txt directive) that says: "Hey, robots, disable JavaScript!"?
It could be used, for example, by:
1) online JavaScript games, which have a huge amount of JavaScript and nothing special for SEO or for bots to crawl (saving bots memory and time)
2) sites built with PHP, HTML and CSS (with per-page meta tags, etc.) for robots, plus some extra functionality added via AJAX that crawlers and bots don't need to analyze (for example, reloading only the content but NOT changing the meta tags). In that case bots see the meta tags and content, while for users you prevent the default action of anchors and reload only the content via AJAX, so the meta tags stay the standard ones.
PS: The question isn't about "you can do it better" or "you can rebuild the application another way". It is about whether we can suggest to bots that they disable JavaScript.

The quick answer is no, you cannot.
The long answer is: if you have more than one version of your content, add a canonical link element to the version that uses JavaScript, pointing to a version of the page that does not use JavaScript. You can append a parameter such as no_js=1 to the URL and, on the server side, remove all parts of your HTML that use JavaScript.
So, yes, with some work you can do it.
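A minimal sketch of that setup, assuming a JavaScript (Node.js-style) server that builds the page HTML itself. The no_js=1 parameter comes from the answer; renderPage, escapeAttr and the /app.js bundle are hypothetical names used only for illustration.

```javascript
// Sketch only: one page rendered in two variants, selected by ?no_js=1.
// renderPage/escapeAttr and "/app.js" are hypothetical names.
function escapeAttr(value) {
  // Make a string safe inside a double-quoted HTML attribute.
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

function renderPage(requestUrl, contentHtml) {
  const url = new URL(requestUrl);
  const noJs = url.searchParams.get("no_js") === "1";

  // Both variants name the JavaScript-free URL as the canonical one.
  const canonical = new URL(url.href);
  canonical.searchParams.set("no_js", "1");

  const head = [
    `<link rel="canonical" href="${escapeAttr(canonical.href)}">`,
    // The no-JS variant simply never emits script tags.
    noJs ? "" : `<script src="/app.js" defer></script>`,
  ].join("\n");

  return `<!DOCTYPE html>\n<html><head>\n${head}\n</head><body>${contentHtml}</body></html>`;
}
```

A request for https://www.example.com/game?level=2 gets the script plus a canonical link to ...?level=2&no_js=1 (with & escaped as &amp;), while the ?no_js=1 request gets the same content with no script tags. Leaving the scripts out of the template, rather than stripping them from finished HTML, avoids fragile post-processing.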

Related

Forbid javascript in a div

I am currently working on a website where administrators will be able to post text.
I'd like to give them the possibility to use HTML code, but I do not want them to be able to post JavaScript code.
Is there an HTML tag (or a workaround) to prohibit JavaScript?
There's no plain HTML tag that blocks inline JS from running.
Among the many workarounds, the most elegant one is to disable inline scripts altogether using CSP headers, but this may not be possible depending on your current architecture. You could also consider using a sanitization library to clean up the post content; simple strategies like using a regex to find <script tags exist, but they are easy to bypass.
I suggest reading https://glebbahmutov.com/blog/disable-inline-javascript-for-security/ to get a better sense of how CSP works and what your options are.
It's also worth reading https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet
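A minimal sketch of the CSP approach, assuming a Node.js server (the returned function has the shape of Node's http request listener; makeHandler and renderPost are hypothetical names). Because script-src does not list 'unsafe-inline', browsers refuse to run inline <script> blocks and inline event-handler attributes, so script pasted into a post is inert. Note that default-src 'self' also governs styles, since style-src falls back to it, so inline style attributes in posts would be blocked as well; relax style-src if posts need them.

```javascript
// Scripts may only be loaded from this site's own origin; with no
// 'unsafe-inline', inline <script> blocks and on*= handlers are refused.
const CSP_POLICY = [
  "default-src 'self'",
  "script-src 'self'",
  "object-src 'none'",
  "base-uri 'self'",
].join("; ");

// `renderPost` is a hypothetical function returning the stored post HTML.
function makeHandler(renderPost) {
  return function handler(_req, res) {
    res.setHeader("Content-Security-Policy", CSP_POLICY);
    res.setHeader("Content-Type", "text/html; charset=utf-8");
    res.end(`<!DOCTYPE html><html><body>${renderPost()}</body></html>`);
  };
}

// Usage in Node.js (not executed here):
//   require("http").createServer(makeHandler(() => storedHtml)).listen(8080);
```

CSP is a second line of defence: it limits what injected markup can do, but it does not replace sanitizing what administrators submit.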

Restrict Javascript in subsection of HTML containing rich-text editor

I'm working on a legacy web app that uses both the Summernote rich-text editor, which can save formatted notes to our server, and a lot of inline JavaScript.
Summernote permits writing HTML/CSS/JS via its "code" view, but doesn't seem to have any built-in support for preventing XSS. For instance, if you go to their homepage, switch the editor to code view, write <script>alert(0)</script> and switch back to text view, the script executes. We want users to be able to do things like bold content and insert hyperlinks and images, but not use JS or probably even CSS.
We've just discovered that the developer who originally implemented XSS protection for these editors did a fairly weak job, and we are now trying to shore it up. Thus far, it seems my options are: (a) creating/sourcing a whitelist that parses the Summernote HTML and only allows certain HTML elements in the field; and/or (b) implementing CSP headers and moving all inline JS/CSS to separate files.
Given that only these editors need to return valid HTML to users (on the rest of the site we can, and do, fully sanitize content), is it possible to disable inline JS/CSS in only one section of an HTML page (i.e. the area we load the editor's content into)? Or is that not an option, or liable to be susceptible to workarounds?
Edit: Just as a note, I recognize that the rich-text editor can't prevent XSS if we choose to save content to the server, since it runs on the client side. I meant that I couldn't find resources or suggestions on their site regarding HTML whitelists or example sanitization for various servers.
Edit 2: We will definitely implement some sort of whitelist to be on the safe side (possibly CSP as well, but refactoring will be a headache), but I still want to know if this is possible: i.e. between two sections of HTML, is it possible to block all inline JS and CSS?
As noted in this issue: https://github.com/summernote/summernote/issues/1617
You have to validate at server side anyway. It's a funny behavior, I agree, but the user is allowed to do anything in his own browser. The only thing you (you, and not this package) should take care of is not letting users store harmful code in your database (which is later displayed in other users' browsers).
So basically, you are responsible for what users try to put into your database or web page. You need to:
Validate the content on the server side (for example, delete <script> tags together with their content).
Implement client-side sanitization (this does not replace server-side validation) to notify the user that they are not allowed to include scripts. This could be done when switching between code view and preview (see the Summernote event for switching from code view to the editor).
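The client-side notification could be a simple check run when the user leaves code view. A minimal sketch in plain browser JavaScript; looksLikeScript and its patterns are hypothetical, and they are a rough heuristic for user feedback, not a sanitizer. Anything it misses must still be caught by the server.

```javascript
// Heuristic patterns for script-like markup. For user feedback only:
// easy to evade, so the server must still validate everything it stores.
// No "g" flag, so RegExp.prototype.test() is stateless between calls.
const SCRIPT_LIKE_PATTERNS = [
  /<\s*script\b/i,          // <script> elements
  /[\s/"']on[a-z]+\s*=/i,   // inline event handlers, e.g. onerror=, onclick=
  /javascript\s*:/i,        // javascript: URLs in href/src attributes
];

function looksLikeScript(html) {
  return SCRIPT_LIKE_PATTERNS.some((pattern) => pattern.test(html));
}
```

Hook it into whichever Summernote callback fires when switching back from code view (the exact event name is not given here) and show a message when it returns true.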

Fully javascript generated site seo

The index.html contains only a div, into which all the HTML is generated by JavaScript.
I know that one option is to redirect search bots to another HTML page. I read this in an old post, and I want to know whether it is the best approach, plus one or two tips for it (not how to redirect).
The site is built in Tumult Hype, so I can't place content in the HTML.
If you redirect search bots to a different document, this is considered cloaking and may harm your ranking in Google.
Yes, Google is able to execute JS. But you should not generate your site's content dynamically; it will hurt your rankings. One option is to use some kind of prerendering.
Edit: Of course you can generate content dynamically, but the main content should not depend on dynamic JavaScript.
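One reading of "some kind of prerender" is a build step that writes the main content into the static index.html, so crawlers and visitors receive the same markup and no bot-specific redirect is needed. A minimal sketch, assuming the exported page has an empty mount div; the id "app", the content and the prerender function are hypothetical, and how this fits a Tumult Hype export is not covered here.

```javascript
// Build-time sketch: copy the main content into the static HTML so every
// client (bot or human) gets the same markup.
function prerender(templateHtml, mountId, contentHtml) {
  const emptyMount = `<div id="${mountId}"></div>`;
  if (!templateHtml.includes(emptyMount)) {
    throw new Error(`Empty mount point not found: ${emptyMount}`);
  }
  // A replacer function keeps "$&", "$1" etc. in the content literal
  // instead of treating them as replacement patterns.
  return templateHtml.replace(
    emptyMount,
    () => `<div id="${mountId}">${contentHtml}</div>`
  );
}
```

In a build script you could read index.html with Node's fs.readFileSync, pass it through prerender, and write the result back with fs.writeFileSync; the page's JavaScript can still take over the div afterwards.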

Javascript in MediaWiki

I'd like to use some JavaScript on my wiki pages, but I haven't been able to figure out how. I'm using a hosted solution on Wikia, so I am unable to modify the installation, add extensions or hack the settings. But I have admin rights on my wiki, so I can access the MediaWiki namespace and MediaWiki:Common.js.
The JavaScript I want to use (Tangle) will consist of an external script that is common to a number of pages (but not all pages in the wiki) and some code specific to each page, the kind you would normally put inline in a <script> tag.
The trouble is, MediaWiki sanitizes <script> tags, and I haven't been able to find a way to include them. I'm trying to make this into an editor-friendly setup that will be used across the wiki, so I'm also trying to avoid hacks and find a proper solution.
Update: New problem
Apparently MediaWiki also sanitizes the HTML5 data attributes, which Tangle relies on heavily. Any ideas on solving that problem are very welcome.
MediaWiki doesn't allow <script> tags in pages for obvious reasons: if it did, anyone could use them to inject JavaScript into your wiki and e.g. steal login credentials.
There are a couple of things you could do:
Write some generic JavaScript code to extract the parameters from something that is allowed on MediaWiki pages, such as a hidden <div>. Be careful not to introduce security holes when doing that.
Add something like this to MediaWiki:Common.js:
importScript('MediaWiki:Tangle/' + mw.config.get('wgPageName') + '.js');
Then, whenever a user visits the page "Foo", the page "MediaWiki:Tangle/Foo.js" will be loaded as JavaScript. Of course, that page will only be editable by admins, but that might still be enough for your needs. (You could use the same trick to import JS from pages in other namespaces, but that would open a security hole miles wide.)
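The first option could look roughly like this, assuming a hypothetical convention in which editors write page-specific parameters as JSON inside a hidden div (for example <div class="tangle-params" style="display:none">{"cookies": 3}</div>) and the shared script in MediaWiki:Common.js reads them. The class name and parseTangleParams are assumptions, and handing the result to Tangle is left out. The text is parsed as data with JSON.parse, never executed, and only simple keys with number, string or boolean values are kept, which is one way to avoid introducing security holes. It is written in ES5 style in case the wiki's script loader rejects newer syntax.

```javascript
// Parse page-specific parameters from a hidden element's text.
// Hypothetical convention: the text is a JSON object of simple values.
function parseTangleParams(text) {
  var raw;
  try {
    raw = JSON.parse(text); // data only, never eval()
  } catch (e) {
    return {};
  }
  if (raw === null || typeof raw !== "object" || Array.isArray(raw)) {
    return {};
  }
  var params = {};
  Object.keys(raw).forEach(function (key) {
    var value = raw[key];
    // Keys must start with a letter (this also rejects "__proto__").
    var simpleKey = /^[A-Za-z][A-Za-z0-9_]*$/.test(key);
    var simpleValue = ["number", "string", "boolean"].indexOf(typeof value) !== -1;
    if (simpleKey && simpleValue) {
      params[key] = value;
    }
  });
  return params;
}

// Browser-only part, e.g. in MediaWiki:Common.js.
if (typeof document !== "undefined") {
  var paramNodes = document.querySelectorAll(".tangle-params");
  for (var i = 0; i < paramNodes.length; i++) {
    var pageParams = parseTangleParams(paramNodes[i].textContent);
    // Hand pageParams to the page's Tangle setup here (not shown).
  }
}
```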

hide javascript/jquery scripts from html page? [duplicate]

This question already has answers here: How do I hide javascript code in a webpage? (12 answers). Closed 8 years ago.
How do I hide my JavaScript/jQuery scripts in an HTML page (from "View Source" on right-click)? Please give suggestions on how to achieve this.
Thanks.
You can't hide the code: JavaScript is interpreted by the browser, which must parse and execute it.
You may want to obfuscate/minify your code.
Recommended resources:
CompressorRater
YUI Compressor
JSMin
Keep in mind that the goal of JavaScript minification is to reduce download size by removing comments and unnecessary whitespace from your code. Obfuscation also minifies, but it additionally changes identifier names, making your code much harder to understand; in the end, though, obfuscation only gives you a false illusion of privacy.
Your best bet is either to delete the script tags immediately after the DOM tree is loaded, or to create the script tag dynamically in your JavaScript.
Either way, anyone who uses the web developer tools or Firebug will still see the JavaScript. If it is in the browser, it can be seen.
One advantage of creating the script tag dynamically is that the JavaScript will not be loaded if JavaScript is turned off.
With JavaScript turned off, I could still see everything in the HTML, since you wouldn't have been able to delete the script tags.
Update: If you use <script src='...'></script>, you won't see the JavaScript itself, but you do see the file URL, so it is just a matter of pasting that into the address bar to download the JavaScript. If you delete the script tags dynamically, they will still be in View Source but not in Firebug's HTML view; if you create the tag dynamically, Firebug can see it but View Source cannot.
Unfortunately, as I mentioned, Firebug can always see the JavaScript, so it isn't hidden from there.
The only approach I haven't tried, so I don't know what would happen, is downloading the JavaScript with an AJAX call and then running it with eval(). I don't know whether that would show up anywhere.
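The two techniques from this answer could be sketched as follows. The document object is passed in as doc only so the functions can be tried outside a browser; loadScriptDynamically, removeScriptTags and the script path are hypothetical. As the answer says, neither hides anything from developer tools: the script URL still shows up in the network panel and can be downloaded directly.

```javascript
// 1) Create the script tag at runtime instead of writing it in the HTML,
//    so "View Source" (the original HTML response) does not contain it.
function loadScriptDynamically(doc, src) {
  const script = doc.createElement("script");
  script.src = src;
  doc.head.appendChild(script);
  return script;
}

// 2) Remove <script> elements once the DOM is ready. "View Source" still
//    shows them, and code that already ran stays in memory.
function removeScriptTags(doc) {
  const scripts = Array.from(doc.querySelectorAll("script"));
  scripts.forEach((s) => s.remove());
  return scripts.length;
}

// In a browser:
//   loadScriptDynamically(document, "/js/app.js"); // hypothetical path
//   document.addEventListener("DOMContentLoaded", () => removeScriptTags(document));
```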
It's virtually impossible. If someone wants your source and you include it in a page, they will get it.
You can try trapping right-click and all sorts of other hokey tricks, but in the end, if you are running it, anyone with Firefox and a 100 KB download (Firebug) can look at it.
You can't, sorry. No matter what you do, even if you could keep people from viewing the source, users can always use curl or a similar tool to fetch the JavaScript manually.
Try a JavaScript minifier or obfuscator if you want to make it harder for people to read your code. A minifier is a good idea anyhow, since it will make your download smaller and your page load faster. An obfuscator might provide a little bit more obfuscation, but probably isn't worth it in the end.
Firebug can show obfuscated code, curl can fetch DOM elements you removed, and referrer checks can be faked.
The moral? Why even try to hide JavaScript? Include a short copyright notice and author information. If you want to hide it so that, say, an authentication system cannot be hacked, strengthen the server side instead, so that there are no holes in the server that are closed merely through JavaScript. Headers and requests can easily be faked with curl or other tools.
If you really want to hide the code... don't use JavaScript. Use a compiled language of some sort (Java applets, Flash, ActiveX, etc.). (I wouldn't do this, though, because it is not a very good option compared to native JavaScript.)
Not possible.
If you just want to hide your business logic from the user, rather than the client-side manipulation of HTML controls, then you can use server-side programming with AJAX.
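A minimal sketch of that split, assuming a Node.js server and a browser with fetch; the route /api/quote, the pricing rule and the function names are all hypothetical. The browser only ever sees the endpoint and the returned total, while the rule itself (here, a volume discount) stays on the server. Since requests can be forged with curl, the server validates its input instead of trusting the client.

```javascript
// --- Server side (Node.js): the business rule lives only here. ---
// Hypothetical pricing rule for illustration.
function quotePrice(quantity) {
  if (!Number.isInteger(quantity) || quantity < 1) {
    throw new RangeError("quantity must be a positive integer");
  }
  const unitPrice = quantity >= 100 ? 8 : 10; // volume discount stays secret
  return quantity * unitPrice;
}

// Request listener for a hypothetical route /api/quote?qty=N.
function quoteHandler(req, res) {
  const url = new URL(req.url, "http://localhost");
  const qty = Number(url.searchParams.get("qty"));
  res.setHeader("Content-Type", "application/json");
  try {
    res.end(JSON.stringify({ total: quotePrice(qty) }));
  } catch (e) {
    res.statusCode = 400;
    res.end(JSON.stringify({ error: e.message }));
  }
}

// --- Client side (browser): only sees the endpoint and the result. ---
async function fetchQuote(qty) {
  const response = await fetch(`/api/quote?qty=${encodeURIComponent(qty)}`);
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  const data = await response.json();
  return data.total;
}
```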
