Ajax crawlable application without hashbang - javascript

I am building an Ajax-based website. When the DOM has loaded, an asynchronous HTTP request is made to a server, which answers with JSON text; JavaScript then inserts the JSON data into the DOM.
The Google crawler doesn't read content loaded by JavaScript, so I need to create an HTML snapshot of my page (on the server) and make my server handle requests with a hashbang.
My problem is that I am not using hashbangs in my requests.
My only Ajax request is something like http://www.apiservice.com?get_data=true. How can I tell Google which request to make to get the HTML snapshot of the entire page, and where do I declare it (perhaps by putting the request URL in the sitemap)?
Thank you in advance.

As I understand it, your page is built in two steps: a first request to the server that gets the core HTML/JavaScript, and a second one that gets the additional data to be displayed in your page.
If so, then the first request is the one the crawler should make with the hashbang. It makes a lot of sense to put it in your sitemap. The static HTML page that your server returns should be the complete HTML resulting from the two server calls in your process.
If you do not cache the static HTML page for the crawler and instead generate it dynamically (e.g., using HtmlUnit, see this SO reference), then both steps are executed before the static HTML snapshot is returned. If you do cache it, make sure the cached snapshot is likewise the result of both steps.
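A minimal sketch of the snapshot idea, written in Python purely for illustration. It assumes the (since deprecated) Google AJAX crawling scheme, under which the crawler asks for the page with an `_escaped_fragment_` query parameter; for pages without a hashbang, that scheme relied on a `<meta name="fragment" content="!">` tag in the page. The `<!--DATA-->` placeholder, the function names, and the `fetch_data` callback are hypothetical stand-ins for your own template and your second (JSON) request.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit


def is_crawler_snapshot_request(url: str) -> bool:
    """True when the URL carries the _escaped_fragment_ parameter (even if empty)."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return "_escaped_fragment_" in query


def build_snapshot(core_html: str, data: dict) -> str:
    """Merge the JSON data into the core HTML, i.e. perform both steps on the server."""
    items = "".join(
        f"<li>{escape(str(key))}: {escape(str(value))}</li>"
        for key, value in data.items()
    )
    # "<!--DATA-->" is a hypothetical placeholder marking where the client-side
    # script would normally inject the data.
    return core_html.replace("<!--DATA-->", f"<ul>{items}</ul>")


def handle_request(url: str, core_html: str, fetch_data) -> str:
    """Serve the full snapshot to the crawler and the bare page to everyone else."""
    if is_crawler_snapshot_request(url):
        return build_snapshot(core_html, fetch_data())
    return core_html
```

Regular visitors still get the bare page and load the data with Ajax; only a request carrying the crawler's parameter triggers the server-side merge.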

Related

Build an XHR link on javascript website for python requests

I'm scraping the website Scorebing using requests.
To do so, I explore the website to locate the XHR calls and get a URL like the one used in the following code:
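import json
import requests

# Headers copied from the XHR request captured with Postman (left out of the question).
header = {}
url = 'https://lv.scorebing.com/ajax/score/data?mt=0&nr=1&corner=1'
# The empty JSON body mirrors the captured request.
response = requests.get(url=url, headers=header, data=json.dumps({}))
response.json()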
No problems there. My problem is that if I switch tabs, for example from Corner to Fixture, no new XHR is made. In fact, only "Live Matches" and "Corners" allow this direct XHR connection. I see that some JS scripts are loaded, but I can't get from there to replicating my previous step.
I know I can scrape this using Selenium, and probably with a regular request to the page URL plus BeautifulSoup, but what I don't understand is why some tabs make XHR calls to load data while other similar ones use JS.
I would like to know how to reverse engineer those JS calls in order to get an API similar to the first part.
First, you should know that the XHR (XMLHttpRequest) filter in Chrome's network panel records all the Ajax requests.
What's Ajax?
Ajax is a set of web development techniques that use several web technologies on the client side to create asynchronous web applications.
Ajax can be done with plain JavaScript or with jQuery (jQuery is a JavaScript library, so it is essentially JavaScript, but it offers an API for Ajax).
In your example page, there are many Ajax requests in the source code.
You asked how to reverse engineer those JS calls in order to get an API similar to the first part.
If you really want to do it from the source code alone, you should:
Send a GET request to the page.
Analyze the source code of the page, then iterate over each JavaScript file it references (fetching each one with a GET request as well).
Find all the Ajax requests in those scripts, send GET requests to them, and select the data you need from the responses.
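The steps above could be sketched roughly as follows. This is only an illustration under assumptions: it supposes the endpoints appear in the scripts as quoted string literals starting with `/ajax/` (like the URL in the question), which will not hold for every site, and the helper names are made up. The functions work on text you have already downloaded, for example with `requests.get(...).text`.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScriptSrcParser(HTMLParser):
    """Collects the src attribute of every external <script> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def script_urls(page_url: str, html: str) -> list:
    """Step 2: absolute URLs of the JavaScript files referenced by the page."""
    parser = ScriptSrcParser()
    parser.feed(html)
    return [urljoin(page_url, src) for src in parser.sources]


# Assumption: endpoints are written as quoted literals beginning with /ajax/.
AJAX_PATH = re.compile(r"""["'](/ajax/[^"']+)["']""")


def ajax_endpoints(page_url: str, js_source: str) -> list:
    """Step 3: candidate Ajax URLs found in one script's source code."""
    return sorted({urljoin(page_url, path) for path in AJAX_PATH.findall(js_source)})
```

Each URL returned by `ajax_endpoints` can then be requested the same way as the working `/ajax/score/data` call. URLs that are assembled at runtime from several variables will not be found by a simple pattern like this one.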

How to know if a JS file is loaded on server side?

I'm a developer for the website Friconix, a collection of free icons. Users can include our JS file on their web pages as explained here: https://friconix.com/start/.
I would like to gather statistics on clients using our tools. How can I find out, on the server side, information about the pages (URLs or at least domains) that request our JS file?
It's important to explain that, to reduce loading time, the JS file is not dynamically generated. No PHP file is loaded every time the file is requested; the file is stored as plain text on our server. I wonder whether the right solution is to add something to the .htaccess file.
Since the script is requested from your server every time a user loads a page in the browser, you can track who requests that path and how often.
A simple approach is to use your request log files, where every request for the script is already recorded. You can write a script that reads the log files every so often.
A second approach is to set up a special rule/location in nginx, Apache, or whichever server you are running.
A third approach is to serve the script via a CDN that has these statistics built in (e.g., CloudFront).
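The log-file approach could look something like the sketch below. It assumes the server writes the common "combined" access-log format, in which the Referer is the first quoted field after the status code and response size, and it uses a hypothetical script path `/js/icons.js`; adjust both to your setup. Note that the Referer header can be missing or stripped, so the counts are only approximate.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches: "GET /path HTTP/1.1" 200 5120 "referer" in a combined-format log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)


def referer_domains(lines, script_path):
    """Count, per referring domain, how often script_path was requested."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        # Ignore any query string such as ?v=2 when comparing paths.
        if match.group("path").split("?")[0] != script_path:
            continue
        host = urlsplit(match.group("referer")).hostname
        counts[host or "(no referer)"] += 1
    return counts
```

In practice you would pass an open log file as `lines`, for example `referer_domains(open("access.log"), "/js/icons.js")`, and run it periodically.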
This can be done with a simple REST API call made from the script. When your script loads, it calls the REST API via an Ajax (XHR) request. The request can contain a unique client ID. On the server side, you can implement a simple API that accepts these requests and stores them, extracting the information needed for analytics.
All the information about the client, such as domain and IP, can be gathered from the API request(s) made from the client's page.
Reference - How do I call a JavaScript function on page load?
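The server side of such a beacon API might be sketched like this. It is only an outline under assumptions: the payload field `client_id` and the `UsageStore` class are hypothetical, the store is kept in memory for brevity, and the domain is taken from the `Origin` or `Referer` header that browsers normally attach to such requests (either header may be absent).

```python
from collections import defaultdict
from urllib.parse import urlsplit


class UsageStore:
    """In-memory record of which client IDs were seen on which domain."""

    def __init__(self):
        self.hits = defaultdict(set)  # domain -> set of client IDs

    def record(self, headers: dict, payload: dict) -> str:
        """Store one beacon request and return the domain it was attributed to."""
        source = headers.get("Origin") or headers.get("Referer") or ""
        domain = urlsplit(source).hostname or "(unknown)"
        # "client_id" is a hypothetical field sent by the script.
        self.hits[domain].add(payload.get("client_id", "(anonymous)"))
        return domain
```

A real endpoint would call `record` from its request handler and persist the result in a database instead of a dictionary.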

JMeter does not record nor execute javascript code, the "buttons" will not be rendered in JMeter

I am working on performance testing of my application using JMeter.
I am able to record a test plan successfully. Each HTTP request to the server carries the __OSVSTATE and viewstate attribute values.
While navigating from one page to another, I am able to extract this attribute from the page using a Regular Expression Extractor and use it in the subsequent request.
Some pages in my application send Ajax requests multiple times. For each response, a new __OSVSTATE attribute value is generated and sent in JSON format inside a <script> </script> tag, and it seems this value is used in the next request.
Can someone give me suggestions on how to achieve this in JMeter?
Each record inside the container has an HTML button, which the user clicks to accept it.
In JMeter I record this whole process. After a successful recording, when I run the script again in JMeter, the result tree shows the response only in JSON format and not in an HTML view like the other pages.
I am able to execute the HTTP requests, but there is one request, '/PerformanceProbe/rest/BeaconInternal/WebScreenClientExecutedEvent', which gets executed internally and fails.
If you know the reason behind this, or have any other suggestions or solutions, please share them here.
There are two ways of creating scripts in JMeter for web applications. First, you can create Selenium scripts in JMeter using "JMeter's WebDriver Sampler", which launches a browser and performs the different actions. Second, you can use HTTP samplers, which record network requests; that is, they do not display a browser and work at the request/response level. You are currently using the second method, and that is why you are unable to see the HTML.
The first method is not recommended for high user load because it consumes a lot of memory.
Regarding the failing network requests, you need to make sure that all the parameters and headers are set properly.
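Keeping the parameters correct mostly means carrying the newest __OSVSTATE value forward from each response into the next request. The Python sketch below only illustrates that chaining logic; in JMeter the same pattern would go into a Regular Expression Extractor attached to each Ajax sampler. The response shape used here (a JSON property named "__OSVSTATE" inside the script tag) is an assumption based on the question, so check it against a real response.

```python
import re

# Assumed shape: ... "__OSVSTATE": "some-value" ... somewhere in the response body.
OSVSTATE = re.compile(r'"__OSVSTATE"\s*:\s*"([^"]*)"')


def extract_osvstate(body: str, previous: str) -> str:
    """Return the token found in the response, or keep the previous one."""
    match = OSVSTATE.search(body)
    return match.group(1) if match else previous


def tokens_to_send(responses, initial: str) -> list:
    """For a sequence of responses, list the token each request must carry."""
    token = initial
    sent = []
    for body in responses:
        sent.append(token)                      # token used for this request
        token = extract_osvstate(body, token)   # token for the next request
    return sent
```

Falling back to the previous value mirrors what you would want in the test plan: a response without a new token should not wipe out the one you already hold.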

Dynamically load web-page content

I have a web page whose content must be constructed on the fly. When the user clicks certain parts of the page, it must load information from a file, placed on the server in the same directory as the web page, into a special content <div>.
As far as I understand, with JavaScript I must use Ajax, so I have a question: should I configure the server so that it can handle Ajax requests specifically, or is it just a simple HTTP GET request, which any web server should support anyway?
And my second question: if Ajax is a technology that works only when the server is properly configured, can I do what I need with a simple GET from JavaScript somehow?
Also, if it is easier to use server-side scripting, how can it be done with VBScript?
AJAX requests are very much like usual HTTP requests. So you do not need to configure your server in any special way to make them work.
A usual server should already support at least GET and POST requests.
One thing that might be important for you, however, is that as long as there is no other "protection" for the files, everyone can access them directly, too. So if the AJAX-loaded content contains any kind of sensitive user data, you should put some access control in place!
AJAX is typically used together with server-side scripting rather than instead of it, so it doesn't make sense to ask whether server-side scripting would be easier. Additionally, AJAX is nothing more than GET or POST requests that a script carries out for you asynchronously, allowing you to use the server responses in a document without reloading the entire page.
AJAX in and of itself is not so much a technology as a technique. You can use AJAX, for example, without ever using the ubiquitous XMLHttpRequest object supplied by the browser.
With the jQuery AJAX methods, you can request text, HTML, XML, or JSON from a remote server using both HTTP GET and HTTP POST, and you can load the external data directly into the selected HTML elements of your web page.
And no, you do not need to configure the server in any special way.
I suggest the jQuery library (no server configuration needed; see also Sirko's answer):
http://api.jquery.com/jQuery.ajax/
It will help you load dynamic content.

Precomputing Client-side Javascript Execution

Suppose you were to build a highly functional single-page client-side application that listens to URL changes in order to navigate around the application.
Suppose then that when a user (or search engine bot) loads a page by its URL, instead of delivering the static JavaScript file and hitting the API as normal, we'd like to precompute everything server-side and deliver the DOM along with the JS state.
I am wondering if there are existing tools or techniques for persisting such an execution state to the client.
I know that I could execute the script in something like PhantomJS and output the DOM elements, but then event handlers, controllers and the JS memory state would not be attached properly. I could sniff the user agent and send the precomputed content only to bots, but I am afraid Google would penalize us for this, and we would also lose the speed benefit of sending everything precomputed in the first place.
So you want to compute, server-side, the results of requesting a resource at a specific URL and send them to the client? What is your backend written in?
We have an API running on GAE in Java. Our app is a single-page app, and we use the HTML5 history object so we have to have "real responses" for actual URLs on the front-end.
To handle this we use JSP to pre-cache the data in the page as it's loaded from the server and sent to the client.
On the front end we use Backbone, so we modified Backbone.sync to first look for a local copy of the requested data in the page and, only if it is not there, to request it from the server with an AJAX call.
So, yes, this is pretty much what every site did before AJAX existed. The trick is writing your app so that the data can be local in the page (or even in localStorage) and is requested from the server only when it is not. Then make sure your page is "built" on the server end (we actually populate the data into the HTML elements on the server, so the page doesn't require JS on the client).
If you navigate elsewhere in the app, the data is loaded dynamically and the page doesn't reload.
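The "local copy first, server only as a fallback" pattern described above can be sketched as follows. This is not the Backbone.sync modification itself, just the same idea in Python: `embed_bootstrap`, `DataSource`, and the `fetch` callback are hypothetical names standing in for the JSP pre-caching step and the AJAX call.

```python
import json


def embed_bootstrap(data: dict) -> str:
    """Server side: serialize the precomputed data into the page as JSON."""
    # Escape "</" so the payload cannot close the surrounding script tag early.
    payload = json.dumps(data).replace("</", "<\\/")
    return f'<script id="bootstrap" type="application/json">{payload}</script>'


class DataSource:
    """Client side: serve data from the page if present, else fetch it."""

    def __init__(self, bootstrapped: dict, fetch):
        self._local = dict(bootstrapped)
        self._fetch = fetch          # stands in for the AJAX call
        self.remote_calls = 0

    def get(self, key):
        if key in self._local:
            return self._local[key]  # already delivered with the page
        self.remote_calls += 1
        value = self._fetch(key)
        self._local[key] = value     # remember it for later lookups
        return value
```

With this arrangement the first page view needs no extra round trip, while later navigation inside the app falls through to the fetch path.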
