What are the possible ways to create an SEO-friendly URL for linking to an internal web page? I've read multiple forum posts about .htaccess with PHP. Is there any other way to do it client side?
Essentially, is there a client-side way to go from the home page to the about page and have the URL be customDomainName/about instead of customDomainName/about.html? Any links to tutorials that achieve this would be helpful!
A client-side solution is only possible if you build your web page entirely AJAX-based.
This means that on every click, the visitor's browser makes an AJAX request and refreshes the page content. The visitor never actually loads a new page; the browser only updates the content.
Here is a post about modifying the URL: Modify the URL without reloading the page
This approach is complex, though. You also have to handle direct access to the modified URLs with hash fragments.
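Under that fully AJAX-based approach, a sketch using the History API might look like the following. The #content container, the data-internal attribute, and the route mapping are illustrative assumptions, not a definitive implementation:

```javascript
// Minimal sketch of client-side "pretty URL" routing with the History API.
// Assumes page sections live in a #content container and are loaded via fetch.

// Map a real file path to its pretty route, e.g. "/about.html" -> "/about".
function prettyRoute(path) {
  return path.replace(/\.html$/, "").replace(/\/index$/, "/") || "/";
}

// Browser-only part: intercept internal link clicks, swap the content,
// and rewrite the URL bar without a full page load.
if (typeof document !== "undefined") {
  document.addEventListener("click", async (event) => {
    const link = event.target.closest("a[data-internal]");
    if (!link) return;
    event.preventDefault();

    const response = await fetch(link.getAttribute("href"));
    document.querySelector("#content").innerHTML = await response.text();

    // Update the address bar to the pretty URL without reloading.
    history.pushState({}, "", prettyRoute(link.pathname));
  });

  // Handle back/forward navigation by re-fetching content for
  // location.pathname here.
  window.addEventListener("popstate", () => {});
}
```

Note that this only covers in-app navigation; for direct hits on customDomainName/about the server still has to serve something sensible (which is where the .htaccess rewriting you read about comes in).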
Related
We want to get the HTML content of a page on another domain. The following constraints apply:
1- The login page has an "I am not a robot" reCAPTCHA.
2- Loading the page in an iframe is restricted.
3- jQuery's get and load methods cannot be used because of cross-domain restrictions.
With these limitations, is it possible to develop a crawler, or even use some client-side code to get the data?
Thanks
Actually.. NO
But you can take the help of a backend server.
Let the server download the page and send it to the client.
This would solve problems related to CORS restrictions.
Coming to the CAPTCHA part: if the page operations are restricted by the CAPTCHA, then again there isn't much you can do. If it were that easy, CAPTCHAs wouldn't be used in the first place.
I'm attempting to create an app with Node.js (using http.createServer()) which will be a single-page application with requests for data via XMLHttpRequest. To do this I need to be able to differentiate between a user navigating to my domain, AJAX requests, and requests generated by the browser for linked resources.
If the request is from the user I always want to return the index.html page which will handle requesting content but if the request is browser generated or AJAX and is for CSS, Javascript or other linked files I want to serve those files. Is there any way to detect this?
Looking at the request headers for the different file types I saw the referer header appeared when the request for content was generated by the page. I figured that was the solution I was looking for but that header is also set when a user clicks on a link to the page making it useless.
The only other thing which seems to change is the accept header which could sort of work but might not be a catch all solution. Any user requests always seem to have text/html as the preferred return type regardless of which url was entered. I could detect that but I'm pretty sure AJAX requests for html files would also have that accept header which would cause problems.
Is there anything I'm missing here (any headers or properties I can look for)?
Edit: I do not need the solution to protect files and I don't care about users bypassing it with their own requests. My intention is not to hide files or make them secure, but rather to keep any data that is requested within the scope of the app.
For example, if a user navigates to http://example.com/images/someimage.jpg they are instead shown the index.html file which can then show the image in a richer context and include all of the links and functionality to go with it.
TL/DR: I need to detect when someone is trying to access the app to then serve them the index page and have that send them the content they want. I also need to detect when the browser has requested resources (JS, CSS, HTML, images, etc) needed by the app to be able to actually return the resource not the index file.
In terms of the HTTP protocol there is NO difference between a user-generated query and a browser-generated query.
Every query is just... a query.
You can make a query from the command line or from a browser, you can click a link, send some ASCII text via telnet, or ask a proxy to make the query for you; it is never the server's job to identify how the query was initiated.
Take for example a request served by a reverse proxy cache: that query will never reach your server (the response comes from the cache), and the first query, the one that built the cached response, could have been made by a real user or by a browser.
In terms of security, ensuring the user never requests data by himself cannot be done by detecting that the query is a real human click (and search Google for clickjacking if you want to be afraid). Every query a browser can make can also be replayed by the user, every single one; you have no way to prevent that.
Some browser plugins even do pre-fetching: they detect links on the page and make the request before you do it yourself (if it's a GET query).
For AJAX, some libraries like jQuery add an X-Requested-With: XMLHttpRequest header, and most frameworks use this header to detect AJAX mode.
But it is more robust to rely on a location policy (for example, making your AJAX queries with a /format/ajax path segment), which can also be used in other ways (/format/json, /format/html, or /format/csv).
Spending time on location-policy-based routing is certainly more useful.
One thing can make a difference, though: POST queries are not idempotent, which means the browser cannot make a POST query without a real user interaction, because a POST query may alter the state of the session or of the server's data (JS can still make POST queries; this is just the default behavior of browsers). The browser will never automatically replay a POST query, so you could build a website where all user interactions are POST queries (via forms, or via some JS that turns link clicks into POST AJAX queries instead). But I'm not sure that's your real goal.
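The two detection strategies above can be sketched as small helpers in a Node.js handler. The route names here are illustrative assumptions, not a fixed convention:

```javascript
// Strategy 1: the X-Requested-With header. jQuery and many frameworks send
// it on AJAX calls; note that a plain fetch() does NOT send it by default.
function isAjax(req) {
  return (
    (req.headers["x-requested-with"] || "").toLowerCase() === "xmlhttprequest"
  );
}

// Strategy 2: a location policy. The format is part of the URL itself,
// e.g. /users/42/format/json, so no header sniffing is needed.
function requestedFormat(url) {
  const match = /\/format\/(\w+)/.exec(url);
  return match ? match[1] : "html"; // default to a full page
}
```

Inside an http.createServer callback you would branch on these to decide between serving index.html and serving raw data.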
Not technically an answer to the question, but I found a simple solution that does what I want: prefix all app-based requests with a subdomain, e.g. http://data.example.com/. It's then really simple to check the host header for that subdomain: if present, send the resource; otherwise send the index page.
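That host check can be as small as the following (the data. prefix follows the example above; adjust to taste):

```javascript
// Decide whether a request came in on the data subdomain.
// Requests to data.example.com get the raw resource; anything else
// gets index.html.
function isDataRequest(req) {
  const host = (req.headers.host || "").split(":")[0]; // drop any port
  return host.startsWith("data.");
}
```

In the http.createServer callback: `isDataRequest(req) ? serveResource(req, res) : serveIndex(res)`, where the two serve functions are whatever your app already uses.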
I just want to know how to get images from another web page and show them on my website.
The flow is:
Type some page URL in a text box and submit
Collect all images on that web page (not the entire site) and show them on my web page
So, you need to get the images from a page, and the input is the address of that page. Well, you have two solutions:
I. If this is functionality for your site which others will use, then plain JavaScript is not enough, because browsers' privacy policies block getting such data from other pages. What you need in this case is to send the URL to a script on your server, which will download that page, parse it for <img> tags and return the list of image srcs.
How exactly to do this is a pretty complicated question, as it depends on your site's server-side programming language. In any case, such functionality would consist of client-side JavaScript using AJAX techniques plus a server-side script (e.g. PHP). The client script is pretty much straightforward.
On client side your js has to:
1. Get desired URLs
2. Send them to server
3. Wait for server's response (which contains srcs of images on desired page)
4. Create img tags with srcs which you got from server script
Keywords to google here are, for example, AJAX, XMLHttpRequest and JSONP (sorry if you already know that :)
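The four client-side steps above can be sketched with fetch. The /image-srcs endpoint name is an assumption: it stands for whatever server-side script does the actual downloading and parsing.

```javascript
// Build the request URL for step 2 (kept as a pure helper for testability).
function srcListEndpoint(pageUrl) {
  return "/image-srcs?url=" + encodeURIComponent(pageUrl);
}

async function showImagesFrom(pageUrl) {
  // Steps 2-3: send the URL to your server and wait for the list of srcs.
  const srcs = await (await fetch(srcListEndpoint(pageUrl))).json();

  // Step 4: create an <img> tag per src returned by the server.
  for (const src of srcs) {
    const img = document.createElement("img");
    img.src = src;
    document.body.appendChild(img);
  }
}
```

The server is assumed to answer with a JSON array of src strings; swap in JSONP or XML parsing if your server returns those instead.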
On the server side your (php|ruby|python|perl|brainfuck) script has to:
1. Get the page URL sent by the JavaScript at step 2
2. Download the page at that URL
3. Parse it looking for img tags and their srcs
4. Send the list of srcs (as XML, JSONP or any other format) back to the client
II. If you need to get images from other pages only for your personal use, you can write an extension for your browser. This approach doesn't require any server-side scripts.
If you want to scrape other websites with JavaScript, you should create a server-side script which can act as a proxy, or you can use YQL.
Here's my answer for cross domain ajax call with YQL,
Cross Domain Post method ajax call using jquery with xml response
First of all, check for copyright. Copy only if the image is provided by the owner for free use, and read and understand the license of usage.
If the image is free to use, as stated by the owner under the license, then download the image and use it. Also, please don't forget to keep a copy of the license and the URL of the website from which you downloaded the image.
Downloading and then using is suggested so that if the other website shuts down tomorrow, your website remains unaffected.
Last but not least, try to design or shoot your own images. Even if they are not as good as others', at least they are genuine.
What would be a good way to upload the html content of the current page viewed in the browser to another server from a bookmarklet?
Assuming the URL is on a server that requires authentication, I want to avoid fetching the page on the server side; instead I'd like to see if it's possible to get the contents and upload them directly from within the browser.
Thanks in advance for any suggestions
Elisha
Considering that you will most probably have a situation in which the page being viewed in the browser is on a different domain from the domain you want to send the data to, an AJAX request will definitely fail (due to cross-domain restrictions). So doing this server side would be your best bet.
Retrieve the page at location.href with XHR into a string
Create FORM with desired cross-site action
POST data to server
?????
PROFIT!
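Those steps can be sketched as a bookmarklet body. The collector URL and field names are assumptions, and the receiving server has to expect a standard form POST; note the page's rendered HTML is already in the DOM, so no XHR is strictly needed for step 1:

```javascript
// Pure helper: the payload the form will carry (kept separate for testing).
function payloadFields(href, html) {
  return { url: href, html: html };
}

function uploadCurrentPage(collectorUrl) {
  // Step 1: the page the user is looking at, post-JavaScript DOM included.
  const html = "<!DOCTYPE html>\n" + document.documentElement.outerHTML;
  const fields = payloadFields(location.href, html);

  // Steps 2-3: a plain form POST is allowed cross-site (unlike XHR);
  // the bookmarklet just can't read the response.
  const form = document.createElement("form");
  form.method = "POST";
  form.action = collectorUrl;
  for (const [name, value] of Object.entries(fields)) {
    const input = document.createElement("input");
    input.type = "hidden";
    input.name = name;
    input.value = value;
    form.appendChild(input);
  }
  document.body.appendChild(form);
  form.submit();
}
```

Submitting the form navigates away from the page, so a real bookmarklet might target a hidden iframe via the form's target attribute to stay on the page.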
The Google guide Making AJAX Applications Crawlable explains how to format your URL with a hash and ! in order to make your site crawlable. A good example of this is the new Twitter. If you type the URL:
http://twitter.com/dinizz
You will be redirected to:
http://twitter.com/#!/dinizz
I assumed the redirection was done on the server side, because I tried to do it with JavaScript and found that every time I changed the URL the browser reloaded the page. I am trying to do it on the server side with Ruby on Rails, without success.
Any help?
UPDATE: I found another question that address the same problem: How to show Ajax requests in URL?
This can't be sensibly done server side.
What should happen is that a client without JS will request the page, and then get data they can use.
If you redirect server side, then they will request the page, get a redirect to the homepage with a fragment identifier, and then get the default content of the homepage.
You have to do the redirect in JS on the client side.
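A minimal sketch of that client-side redirect follows. A script on the crawlable page (e.g. /dinizz) sends the browser to the hash-bang form; the homepage's JS then reads location.hash and loads the content via AJAX. The path mapping is an assumption based on the Twitter example above:

```javascript
// "/dinizz" -> "/#!/dinizz"
function hashBangUrl(pathname) {
  return "/#!" + pathname;
}

// Inverse mapping, used by the homepage script: "#!/dinizz" -> "/dinizz"
function pathFromHashBang(hash) {
  return hash.startsWith("#!") ? hash.slice(2) : null;
}

// Browser-only: if we landed on a crawlable path, send the browser to the
// hash-bang homepage URL (this does load "/" once; from then on, changing
// only the fragment never reloads the page).
if (typeof window !== "undefined" && window.location.pathname !== "/") {
  window.location.replace(hashBangUrl(window.location.pathname));
}
```

Clients without JS simply stay on the plain /dinizz page, which matches the behavior described above.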