I wonder is it even possible: I'd like to redirect my 404 page to . [current catalog], or to .. [level up] if the first one failed.
The idea is if someone tries to reach http://mypage.asd/cat1/cat2/page.html, and there is no cat2 catalog in cat1 - browser would (eventually) get to cat1.
I don't want different 404 pages for different subcatalogs, which would be easier to make - so here I am with my question.
Page is in html with js elements. Tried dynamically changing meta tags with js, but it doesn't work (js is executed onload, meta tags are parsed earlier) and looks awful.
My redirection code - for now - is standard:
<meta http-equiv="refresh" content="3 URL=./">
You could do this on the server side -- intercept the raw http request, and then if the page doesn't exist, load a different page based on arbitrary logic. Depending on what server side platform/framework you're using, this could be achieved in various ways.
You could possible do this using .htaccess if you're running Apache server. See this stackoverflow question for how to do that.
I'll just comment that #anand.trex 's reply is THE answer for such a use case. When redirecting, especially with invalid URLs, for a multitude of reasons it is better to use server-side redirection, like through .htaccess of Apache's server.
Related
My application uses AngularJS for frontend and .NET for the backend.
In my application I have a list view. On clicking each list item, It will fetch a pre rendered HTML page from S3.
I am using angular state.
app.js
...
state('staticpage', {
url: "/staticpage",
templateUrl: function (){
return 'http://xxxxxxx.cloudfront.net/staticpage/staticpage1.html';
},
controller: 'StaticPageCtrl',
title: 'Static Page'
})
StaticPage1.html
<div>
Hello static world 1!
<div>
How do I do SEO here?
Do I really need to do HTML snapshot using PanthomJS or so.
Yes PhantomJS would do the trick or you can use prerender.io with that service you can just use their open source renderer and have your own server.
Another way is to use _escaped_fragment_ meta tag
I hope this helps, if you have any questions add comments and I will update my answer.
Do you know that google renders html pages and executes javascript code in the page and does not need any pre-rendering anymore?
https://webmasters.googleblog.com/2014/05/understanding-web-pages-better.html
And take a look at these :
http://searchengineland.com/tested-googlebot-crawls-javascript-heres-learned-220157
http://wijmo.com/blog/how-to-improve-seo-in-angularjs-applications/
My project front-end also has biult on top of Angular and I decieded to solve SEO issue like this:
I've created an endpiont for all search engines (SE) where all the requests go with _escaped_fragment_ parameter;
I parse a HTTP Request for _escaped_fragment_ GET parameter;
I make cURL request with parsed category and article parameters and get the article content;
Then I render a simpliest (and seo friendly) template for SE with the article content or throw a 404 Not Found Exception if article does not exists;
In total: I do not need to prerender some html pages or use prrender.io, have a nice user interface for my users and Search Engines index my pages very well.
P.S. Do not forget to generate sitemap.xml and include there all urls (with _escaped_fragment_) wich you want to be indexed.
P.P.S. Unfortunately my project's back-end has built on top of php and can not show you suitable example for you. But if you want more explanations do not hesitate to ask.
Firstly you can not assume anything.
Google does say that there bots can very well understand javascript application but that is not true for all scenarios.
Start from using crawl as google feature from the webmaster for your link and see if page is rendered properly. If yes, then you need not read further.
In case, you see just your skeleton HTML, this is because google bot assumes page load complete before it actually completes. To fix this you need an environment where you can recognize that a request is from a bot and you need to return it a prerendered page.
To create such environment, you need to make some changes in code.
Follow the instructions Setting up SEO with Angularjs and Phantomjs
or alternatively just write code in any server side language like PHP to generate prerendered HTML pages of your application.
(Phantomjs is not mandatory)
Create a redirect rule in your server config which detects the bot and redirects the bot to prerendered plain html files (Only thing you need to make sure is that the content of the page you return should match with the actual page content else bots might not consider the content authentic).
It is to be noted that you also need to consider how will you make entries to sitemap.xml dynamically when you have to add pages to your application in future.
In case you are not looking for such overhead and you are lacking time, you can surely follow a managed service like prerender.
Eventually bots will get matured and they would understand your application and you will say goodbye to your SEO proxy infrastructure. This is just for time being.
At this point in time, the question really becomes somewhat subjective, at least with Google -- it really depends on your specific site, like how quickly your pages render, how much content renders after the DOM loads, etc. Certainly (as #birju-shaw mentions) if Google can't read your page at all, you know you need to do something else.
Google has officially deprecated the _escaped_fragment_ approach as of October 14, 2015, but that doesn't mean you might not want to still pre-render.
YMMV on trusting Google (and other crawlers) for reasons stated here, so the only definitive way to find out which is best in your scenario would be to test it out. There could be other reasons you may want to pre-render, but since you mentioned SEO specifically, I'll leave it at that.
If you have a server-side templating system (php, python, etc.) you can implement a solution like prerender.io
If you only have AngularJS-only files hosted on a static server (e.g. amazon s3) => Have a look at the answer in the following post : AngularJS SEO for static webpages (S3 CDN)
yes you need to prerender the page for the bots, prrender.io
can be used and your page must have the
meta tag
<meta name="fragment" content="!">
The problem:
I work on an internal tool that allows users to upload images - and then displays those images back to them and others.
It's a Java/Spring application. I have the benefit of only needing to worry about IE11 exactly and Firefox v38+ (Chrome v43+ would be a nice to have)
After first developing the feature, it seems that users can just create a text file like:
<script>alert("malicious code here!")</script>
and save it as "maliciousImage.jpg" and upload it.
Later, when that image is displayed inside image tags like:
<img src="blah?imgName=foobar" id="someImageID">
actualImage.jpg displays normally, and maliciousImage.jpg displays as a broken link - and most importantly no malicious content is interpreted!
However If the user right-clicks on this broken link, and clicks 'view image'... bad things happen.
the browser does 'content-sniffing' a concept which is new to me, detects that 'maliciousImage.jpg' is actually a text file, and very kindly renders it as HTML without hesitation. Any script tags are passed to the JavaScript interpreter and, as you can imagine, we don't want this.
What I've tried so far
In short, every possible combination of response headers I can think of to prevent the browser from content-sniffing. All the answers I've found here on stackoverflow, and other docs, imply that setting the content-type header should prevent most browsers from content-sniffing, and setting X-content options should prevent some versions of IE.
I'm setting the x-content-type-options to no sniff, and I'm setting the response content type. The docs I've read lead me to believe this should stop content-sniffing.
response.setHeader("X-Content-Type-Options", "nosniff");
response.setContentType("image/jpg");
I'm intercepting the response and these headers are present, but seem to have no effect on how the malicious content is processed...
I've also tried detecting which images are and are not malicious at the point of upload, but I'm quickly realizing this is very much non-trivial...
End goal:
Naturally - any output at all for images that aren't really images (garbled nonsense, an unhandled exception, etc) would be better than executing the text-file as HTML/javascript in the clear, but displaying any malicious HTML as escaped/CDATA'd plain-text would be ideal... though maybe a bit impractical.
So I ended up fixing this problem but forgot to answer my own question:
Step 1: blocking invalid images
To get a quick fix out, I simply added some fairly blunt code that checked if an image was actually an image - during upload and before serving it, using the imageio lib:
import javax.imageio.ImageIO;
//......
Image img = attBO.getImage(imgId);
InputStream x = new ByteArrayInputStream(img.getData());
BufferedImage s;
try {
s = ImageIO.read(x);
s.getWidth();
} catch (Exception e) {
throw new myCustomException("Invalid image");
}
Now, initially i'd hoped that would fix my problem - but in reality it wasn't that simple and just made generating a payload more difficult.
While this would block:
<script>alert("malicious code here!")</script>
It's very possible to generate a valid image that's also an XSS payload - just a little more effort....
Step 2: framework silliness
It turned out there was an entire post-processing workflow that I'd never touched, that did things such as append tokens to response bodies and use additional frameworks to decorate responses with CSS, headers, footers etc.
This meant that, although the controller was explicitly returning image/png, it was being grabbed and placed (as bytes) post processing was taking that bytestream, and wrapping it in a header and footer, to form a fully qualified 'view' - this view would always have the 'content-type' text/html and thus was never displayed correctly.
The crux of this problem was that my controller was directly returning an image, in a RESTful fashion, when the rest of the framework was built to handle controllers returning full fledged views.
So I had to step through this workflow and create exceptions for the controllers in my code that returned something other than worked in a restful fashion.
for example with with site-mesh it was just an exclude(as always, simple fix once I understood the problem...):
<decorators defaultdir="/WEB-INF/decorators">
<excludes>
<pattern>*blah.ctl*</pattern>
</excludes>
<decorator name="foo" page="myDecorator.jsp">
<pattern>*</pattern>
</decorator>
and then some other other bespoke post-invocation interceptors.
Step 3: Content negotiation
Now, I finally got the stage where only image bytecode was being served and no review was being specified or explicitly generated.
A Spring feature called 'content negotiation' kicked in. It tries to reconcile the 'accepts' header of the request, with the 'messageconverters' it has on hand to produce such responses.
Because spring by default doesn't have a messageconverter to produce image/png responses, it was falling back to text/html - and I was still seeing problems.
Now, were I using spring 4, I could've simply added the annotation:
#Produces("image/png")
to my controller - simple fix...
Step 4: Legacy dependencies
but because I only had spring 3.0.5 (and couldn't upgrade it) I had to try other things.
I tried registering new messageconverters but that was a headache or adding a new post-method interceptor to simply change the content-type back to 'image/png' - but that was a hacky headache.
In the end I just exposed the request/reponse in the controller, and wrote my image directly to the response body - circumventing Spring's content-negotiation altogether
....and finally my image was served as an image and displayed as an image - and no injected code was executed!
That sounds odd, because it works perfectly elsewhere. Are you sure the X-Content-Type-Options header is present in the responses?
Here is a demo I built a while back, where I have a file that's a valid html, gif and javascript. As you can see it first loads as an HTML, but then loads itself as an image and as a script (which executes):
http://research.insecurelabs.org/content-sniffing/gifjs.html
However if you load it using the "X-Content-Type-Options: nosniff" header, the script no longer executes:
http://research.insecurelabs.org/content-sniffing/nosniff/gifjs.html
Btw, the image renders properly in FF/IE, but not in Chrome.
Here is a demo, where I attempted what you described:
http://research.insecurelabs.org/content-sniffing/stackexchange.html
First image is without nosniff, and second is with, and it seems to work as intended. Second one does not run the script when opened with "view image".
Edit:
Firefox doesn't seem to support X-Content-Type-Options: nosniff
So, you should also add "Content-disposition: attachment;filename=image.gif" or similar to the images. The image will load normally if loaded through an image tag, but if you open the URL directly, you will force a download instead of showing the image directly in the browser.
Example: http://research.insecurelabs.org/content-sniffing/attachment/
adeneo is pretty much spot-on. You should use whatever image library you want to check if the uploaded file is a valid file for the type it claims to be. Anything the client sends can be manipulated.
I just removed # tag from my url of angular single page app.
I did like.
$locationProvider.html5Mode(true);
And It worked fine.
My problem is when I directly enter any url to the browser it showing a 404 error. And its working fine when I traverse throughout the app through links.
Eg: www.example.com/search
www.example.com/search_result
www.example.com/project_detail?pid=19
All these url's are working fine. But when I directly enter any of the above url's into my browser it showing a 404 error.
Please any thoughts on it.
Thanks in advance.
Well i had a similar problem. The server side implementation included Spring in my case.
Routing on client side ensures that all the url changes are resolved on the client side. However, When you directly enter any such url in the browser, the browser actually goes to the server for retrieving a web page corresponding to the url.
Now in your case, since these are VIRTUAL urls, that are meaningful on the client side, the server throws 404.
You can capture page not found exception at your server side
implementation, and redirect to the default page [route] in your app.
In Spring, we do have handlers for page not found exceptions, so i
guess they'll be available for your server side implementation too.
When using the History API you are saying:
"Here is a new URL. The other JavaScript I have just run has transformed the page into the page you would have got by visiting that URL."
This requires that you write server side code that will build the page in that state for the other URLs. This isn't a trivial thing to do and will usually require a significant amount of work.
However, in exchange for that work you get robustness and performance. When one of those URLs is visited it will:
work even if the JS fails for any reason (such as a dropped network connection or a client (such as a search engine) that doesn't support JS)
load faster than loading the homepage and then transforming it with JS
You need to use rewrite rules. Angular is an single page app, so all your request should go to the same file(index.html). You could do this by creating an .htaccess.
Assuming your main page is index.html.
Something like this (not tested):
RewriteRule ^(.)*$ / [L,QSA]
L flag means that if the rule matches, don't execute the next RewriteRule.
QSA means that the URL query parameters are also passed with the rewrited url.
More info about htaccess: http://httpd.apache.org/docs/2.2/howto/htaccess.html
I am trying to clean up some files on a website, one task being to collate all references to jquery to a singular file.
Yes, it's a large site with multiple developers and some standards have not been followed resulting in the current situation where there are various versions of jquery referenced.
What I have tried to do is create a 301 redirect for these files to point to a single version.
eg: <script type="text/javascript" src="/someurl/js/jquery-1.4.4.min.js"> should end up pointing to /someurl/js/jquery-core.min.js
I have tried to do this but it appears to fail to load the new file and jquery does not exist, my net panel shows that the original file has a 301 on it and I can see the reference to the new one, however the "response" tab is empty.
Is it possible to use a 301 redirect in this way?
Thanks for any suggestions / feedback
p.s I know there are better ways to reference jquery etc but large company process and red tape stand in my way from doing this any other way
When a browser loads the script from the src attribute, it should follow all redirection links, in the same method it retrieves html, images, stylesheets, etc. So the use case you've provided should work.
But since it's not working for you, you have got a couple options to resolve your problem.
Use fiddler or a similar debugging proxy to see what's going on between your browser and the server. Perhaps the 301 is malformed, or perhaps the mime-type is misconfigured, it could be any number of things. Troubleshoot it the same way you'd troubleshoot any other issue where the browser isn't following redirects.
Or... instead of using redirects, you can use mod_rewrite (or a similar server-side URL rewriting tool) to modify the request for a particular version of a script to your canonical version.
I know it's impossible to hide source code but, for example, if I have to link a JavaScript file from my CDN to a web page and I don't want the people to know the location and/or content of this script, is this possible?
For example, to link a script from a website, we use:
<script type="text/javascript" src="http://somedomain.example/scriptxyz.js">
</script>
Now, is possible to hide from the user where the script comes from, or hide the script content and still use it on a web page?
For example, by saving it in my private CDN that needs password to access files, would that work? If not, what would work to get what I want?
Good question with a simple answer: you can't!
JavaScript is a client-side programming language, therefore it works on the client's machine, so you can't actually hide anything from the client.
Obfuscating your code is a good solution, but it's not enough, because, although it is hard, someone could decipher your code and "steal" your script.
There are a few ways of making your code hard to be stolen, but as I said nothing is bullet-proof.
Off the top of my head, one idea is to restrict access to your external js files from outside the page you embed your code in. In that case, if you have
<script type="text/javascript" src="myJs.js"></script>
and someone tries to access the myJs.js file in browser, he shouldn't be granted any access to the script source.
For example, if your page is written in PHP, you can include the script via the include function and let the script decide if it's safe" to return it's source.
In this example, you'll need the external "js" (written in PHP) file myJs.php:
<?php
$URL = $_SERVER['SERVER_NAME'].$_SERVER['REQUEST_URI'];
if ($URL != "my-domain.example/my-page.php")
die("/\*sry, no acces rights\*/");
?>
// your obfuscated script goes here
that would be included in your main page my-page.php:
<script type="text/javascript">
<?php include "myJs.php"; ?>;
</script>
This way, only the browser could see the js file contents.
Another interesting idea is that at the end of your script, you delete the contents of your dom script element, so that after the browser evaluates your code, the code disappears:
<script id="erasable" type="text/javascript">
//your code goes here
document.getElementById('erasable').innerHTML = "";
</script>
These are all just simple hacks that cannot, and I can't stress this enough: cannot, fully protect your js code, but they can sure piss off someone who is trying to "steal" your code.
Update:
I recently came across a very interesting article written by Patrick Weid on how to hide your js code, and he reveals a different approach: you can encode your source code into an image! Sure, that's not bullet proof either, but it's another fence that you could build around your code.
The idea behind this approach is that most browsers can use the canvas element to do pixel manipulation on images. And since the canvas pixel is represented by 4 values (rgba), each pixel can have a value in the range of 0-255. That means that you can store a character (actual it's ascii code) in every pixel. The rest of the encoding/decoding is trivial.
The only thing you can do is obfuscate your code to make it more difficult to read. No matter what you do, if you want the javascript to execute in their browser they'll have to have the code.
Just off the top of my head, you could do something like this (if you can create server-side scripts, which it sounds like you can):
Instead of loading the script like normal, send an AJAX request to a PHP page (it could be anything; I just use it myself). Have the PHP locate the file (maybe on a non-public part of the server), open it with file_get_contents, and return (read: echo) the contents as a string.
When this string returns to the JavaScript, have it create a new script tag, populate its innerHTML with the code you just received, and attach the tag to the page. (You might have trouble with this; innerHTML may not be what you need, but you can experiment.)
If you do this a lot, you might even want to set up a PHP page that accepts a GET variable with the script's name, so that you can dynamically grab different scripts using the same PHP. (Maybe you could use POST instead, to make it just a little harder for other people to see what you're doing. I don't know.)
EDIT: I thought you were only trying to hide the location of the script. This obviously wouldn't help much if you're trying to hide the script itself.
Google Closure Compiler, YUI compressor, Minify, /Packer/... etc, are options for compressing/obfuscating your JS codes. But none of them can help you from hiding your code from the users.
Anyone with decent knowledge can easily decode/de-obfuscate your code using tools like JS Beautifier. You name it.
So the answer is, you can always make your code harder to read/decode, but for sure there is no way to hide.
Forget it, this is not doable.
No matter what you try it will not work. All a user needs to do to discover your code and it's location is to look in the net tab in firebug or use fiddler to see what requests are being made.
From my knowledge, this is not possible.
Your browser has to have access to JS files to be able to execute them. If the browser has access, then browser's user also has access.
If you password protect your JS files, then the browser won't be able to access them, defeating the purpose of having JS in the first place.
I think the only way is to put required data on the server and allow only logged-in user to access the data as required (you can also make some calculations server side). This wont protect your javascript code but make it unoperatable without the server side code
I agree with everyone else here: With JS on the client, the cat is out of the bag and there is nothing completely foolproof that can be done.
Having said that; in some cases I do this to put some hurdles in the way of those who want to take a look at the code. This is how the algorithm works (roughly)
The server creates 3 hashed and salted values. One for the current timestamp, and the other two for each of the next 2 seconds. These values are sent over to the client via Ajax to the client as a comma delimited string; from my PHP module. In some cases, I think you can hard-bake these values into a script section of HTML when the page is formed, and delete that script tag once the use of the hashes is over The server is CORS protected and does all the usual SERVER_NAME etc check (which is not much of a protection but at least provides some modicum of resistance to script kiddies).
Also it would be nice, if the the server checks if there was indeed an authenticated user's client doing this
The client then sends the same 3 hashed values back to the server thru an ajax call to fetch the actual JS that I need. The server checks the hashes against the current time stamp there... The three values ensure that the data is being sent within the 3 second window to account for latency between the browser and the server
The server needs to be convinced that one of the hashes is
matched correctly; and if so it would send over the crucial JS back
to the client. This is a simple, crude "One time use Password"
without the need for any database at the back end.
This means, that any hacker has only the 3 second window period since the generation of the first set of hashes to get to the actual JS code.
The entire client code can be inside an IIFE function so some of the variables inside the client are even more harder to read from the Inspector console
This is not any deep solution: A determined hacker can register, get an account and then ask the server to generate the first three hashes; by doing tricks to go around Ajax and CORS; and then make the client perform the second call to get to the actual code -- but it is a reasonable amount of work.
Moreover, if the Salt used by the server is based on the login credentials; the server may be able to detect who is that user who tried to retreive the sensitive JS (The server needs to do some more additional work regarding the behaviour of the user AFTER the sensitive JS was retreived, and block the person if the person, say for example, did not do some other activity which was expected)
An old, crude version of this was done for a hackathon here: http://planwithin.com/demo/tadr.html That wil not work in case the server detects too much latency, and it goes beyond the 3 second window period
As I said in the comment I left on gion_13 answer before (please read), you really can't. Not with javascript.
If you don't want the code to be available client-side (= stealable without great efforts),
my suggestion would be to make use of PHP (ASP,Python,Perl,Ruby,JSP + Java-Servlets) that is processed server-side and only the results of the computation/code execution are served to the user. Or, if you prefer, even Flash or a Java-Applet that let client-side computation/code execution but are compiled and thus harder to reverse-engine (not impossible thus).
Just my 2 cents.
You can also set up a mime type for application/JavaScript to run as PHP, .NET, Java, or whatever language you're using. I've done this for dynamic CSS files in the past.
I know that this is the wrong time to be answering this question but i just thought of something
i know it might be stressful but atleast it might still work
Now the trick is to create a lot of server side encoding scripts, they have to be decodable(for example a script that replaces all vowels with numbers and add the letter 'a' to every consonant so that the word 'bat' becomes ba1ta) then create a script that will randomize between the encoding scripts and create a cookie with the name of the encoding script being used (quick tip: try not to use the actual name of the encoding script for the cookie for example if our cookie is name 'encoding_script_being_used' and the randomizing script chooses an encoding script named MD10 try not to use MD10 as the value of the cookie but 'encoding_script4567656' just to prevent guessing) then after the cookie has been created another script will check for the cookie named 'encoding_script_being_used' and get the value, then it will determine what encoding script is being used.
Now the reason for randomizing between the encoding scripts was that the server side language will randomize which script to use to decode your javascript.js and then create a session or cookie to know which encoding scripts was used
then the server side language will also encode your javascript .js and put it as a cookie
so now let me summarize with an example
PHP randomizes between a list of encoding scripts and encrypts javascript.js then it create a cookie telling the client side language which encoding script was used then client side language decodes the javascript.js cookie(which is obviously encoded)
so people can't steal your code
but i would not advise this because
it is a long process
It is too stressful
use nwjs i think helpful it can compile to bin then you can use it to make win,mac and linux application
This method partially works if you do not want to expose the most sensible part of your algorithm.
Create WebAssembly modules (.wasm), import them, and expose only your JS, etc... workflow. In this way the algorithm is protected since it is extremely difficult to revert assembly code into a more human readable format.
After having produced the wasm module and imported correclty, you can use your code as you normallt do:
<body id="wasm-example">
<script type="module">
import init from "./pkg/glue_code.js";
init().then(() => {
console.log("WASM Loaded");
});
</script>
</body>