A co-worker took this URL: https://www.rbi.org.in/Scripts/BS_PressReleaseDisplay.aspx, which has month/year pagination via JavaScript (see the elements on the right), and was able to give me this URL:
https://www.rbi.org.in/Scripts/BS_PressReleaseDisplay.aspx?__EVENTTARGET=&__EVENTARGUMENT=&__VIEWSTATE=%2FwEPDwUKMTg0MTg0MzQ2NmRk1lDKkbV9IbwhES0FyX%2BlSLhp%2FzA%3D&__VIEWSTATEGENERATOR=380F4D6F&__EVENTVALIDATION=%2FwEdAAiUUGGuo52vbcR6TOSGc2%2FnlK%2BXrsQEVyjeDxQ0A4GYXFBwzdjZXczwplb2HKGyLlqLrBfuDtX7nV3nL%2B5njT0xZDpy7WJnvc3tgXY08CYLJD%2BrfdwJAuBoVBISURIXWlx9xf1loRXvygROM%2FA1O%2BNHJounKCGGAHd04zzVhBPZz4BK5Wx46wqhV0iQkxGw1Nhr9A6c&hdnYear=2016&hdnMonth=12&UsrFontCntr%24txtSearch=&UsrFontCntr%24btn=
where I can replace the year after hdnYear and the month after hdnMonth with any year and month, and it will bring me directly to that page. I asked him how he did it, and he said "I used the Network tab in Chrome dev tools." That's about all I could get out of him.
Does anyone know exactly how this is done? For example, I'm now trying to find a similar way to get the actual URL for each page of this site: http://www.ojk.go.id/id/regulasi/otoritas-jasa-keuangan/peraturan-dan-keputusan-dewan-komisioner/Default.aspx by looking at the Network tab as I change pages. There is nothing I can see in there that's similar to the above example.
This is how it was done for the rbi.org.in URL you've mentioned:
Open Chrome and go to the URL you've given
Right click on the page and select Inspect
Click on the Network tab.
Click on one of the year/month links on the website (the pagination you referred to)
In the Network tab, you'll see a list of GET/POST requests being made by the client (i.e., the browser) to the server.
In the Filter box (on the top-left of the Network tab), type in the search filter method:POST.
Click on the entry in the Name column. This will open up more details about the POST request. Scroll down to the section titled Form Data.
Click on the view encoded button in the Form Data section
These are the parameters your friend included in the URL. You'll notice hdnYear and hdnMonth are also listed there. The query string in the URL your friend gave can be obtained by clicking on view source; it is appended to the page URL after a ?.
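The steps above can be sketched in Python as follows. This is only a rough illustration, assuming the server accepts the same fields via GET (as the co-worker's URL suggests for this site). The helper name form_data_to_url and the shortened captured string are made up for the example; the real form data copied from dev tools is much longer.

```python
from urllib.parse import parse_qsl, urlencode

def form_data_to_url(base_url, encoded_form_data, **overrides):
    """Append captured POST form data to base_url as a GET query string."""
    # keep_blank_values=True preserves empty fields such as __EVENTTARGET=
    # Note: dict() keeps only the last value if a field name is repeated.
    fields = dict(parse_qsl(encoded_form_data, keep_blank_values=True))
    fields.update(overrides)  # swap in the year/month you want
    return base_url + "?" + urlencode(fields)

# Hypothetical, shortened form data standing in for the real captured string.
captured = "__EVENTTARGET=&__VIEWSTATE=abc%2F123&hdnYear=2016&hdnMonth=12"

url = form_data_to_url(
    "https://www.rbi.org.in/Scripts/BS_PressReleaseDisplay.aspx",
    captured,
    hdnYear="2015",
    hdnMonth="6",
)
print(url)
```

Whether a given site honours this depends on its backend; some only accept these fields in a POST body.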
Well, I can't really tell you exactly how to reproduce this on the site you're working with, but I can tell you what your co-worker did.
In the page https://www.rbi.org.in/Scripts/BS_PressReleaseDisplay.aspx:
Open the Network tab in dev tools, and clear the log if there's anything there.
Click on a year and month
In the network log, look for BS_PressReleaseDisplay.aspx in the "Name" column and click on it
Inside the Headers tab, go to "Form Data" and click on "view source"
And that's it: those are the URL parameters that your co-worker gave you. You can try doing the same on the site you want to reproduce it on, by clicking on another page and looking for Default.aspx, but you'll have to figure out what each parameter means to find which one is the page number or whatever you're looking for (check the parsed view for easier reading).
Screenshots:
http://prnt.sc/emsl2w
http://prnt.sc/emsm2z
Hope this helps you.
The URL he sent you has URL parameters (a query string) that are read by the server, which then sends you the selected page.
So basically the server picks up the request and reads these parameters, which are most likely passed to a method of some sort that queries a database and returns the result to you.
If you're the owner of the linked website, you can implement such a solution; otherwise you're stuck, since it requires coding on the backend.
I have a small app which calls a URL and scrapes the data returned from it. I now want to do something similar for another site, but this site uses JavaScript and the results are not included in the HTML. I've found a way to retrieve the data by using "stringByEvaluatingJavaScript", but to complicate things, the results I want are displayed on the webpage only after I click a button / function on the website:
i.e. To get to display the results I want, I have to:
1) go to the website. (data is displayed but not what I want)
2) click one of the options on the site. (data I really want is displayed)
The URL of this page never changes, as expected being JavaScript. So I want to know if there's a way to call the page so that when the page is displayed, it is already on the option I want, e.g. "https://example.com/page1?option" etc...
I don't know if this is possible since I don't know JavaScript but technically I think it should be?
Thanks.
I would use the Developer Tools/javascript console on your browser
(Chrome has a pretty good one) to see what the browser sends to the
server when you click on the button, then use that as the basis for
your query. – cowbert
@cowbert's suggestion really did the trick! Upon digging more, I found more results in the Chrome console and one of them actually has the link to the data which is what I need!
Thank you to all who contributed! This is my first post here so if I didn't do something right, please forgive me.
Google calendar invite emails will update after they are sent if the original event has been changed... how does Google achieve this? Is there a general technique for anyone to do this? Or is this only possible because Google owns both gMail/gCalendar and the two systems are integrated behind the scenes outside of SMTP?
My first guess was that they used an iframe or an image that was loaded when the email was opened, but inspecting the source of the gMail page doesn't show any signs of that.
Here's a screenshot of the updated text:
And here's the HTML for that section of the page when reading the email within gMail:
Note:
Inspect Element shows only the markup of the page as it stands after all dynamic operations, including Ajax, have run.
To check the actual source, visit view-source:url.
Now to the question:
That information is updated automatically at run time by JavaScript code.
In the image, you used Inspect Element, which shows the code of the live view, so you saw the updated content.
It is done through JavaScript DOM and text manipulation.
To verify this,
Click on the address bar.
Add view-source: before the URL, so that it looks like view-source:https://url
Then press Ctrl+F (or the corresponding key) to find.
Search for <div id=":8hg", which will show 0 results.
view-source loads the source of the file without any Ajax or JavaScript manipulation.
The div is not present in the source, so we can conclude that it is added dynamically.
When checking in detail,
in the source, we can see a link https://www.google.com/calendar/event?action\u003dVIEW\u0026eid\u003db..... which is stored in an array.
From this link, the content is taken.
(I blacked out some text for privacy).
Based on what the URL returns, the content in the mail is updated.
To verify this,
In the mail, you can see This invitation is out of date
But in the view-source: page, search for This invitation is out of date and it will return 0 results.
So it is clear that Gmail fetches the Calendar details via an API call to the Google Calendar API.
I wonder if, on sending the email, they create an image at some URL and remove it if the event changes; then in the email they have something like
<div id="updated"></div>
<img src="asdfawe" onerror="document.getElementById('updated').innerHTML='some text'"/>
Although I'm not sure they can use the onerror attribute (because email + JS = bad idea). The only other way is to use the alt attribute and some CSS trickery, but I don't see how that could result in the inspected code.
I've been searching the internet for information about how I can detect and show what a URL is about.
Facebook has a good example of what I actually want to achieve:
If you create an update on Facebook and paste in a URL, Facebook will detect some information about it and show a box with some text and often the right picture.
For instance, take: http://www.ebay.com/itm/Mens-Monk-Strap-Loafers-Suede-Lined-Metal-Buckle-Slip-Casual-Dress-Shoes-New-/311170422772 . Then it shows the image of the shoe and the headline.
I've found other services which do this for image services and YouTube, but what I need is mostly information about products, so often URLs from shops. So the user pastes in a URL, and I can detect what that link is about.
Any idea how this can be done?
Is it backend code, like C#, or JavaScript?
Hopefully some of you can point me in the right direction. Thanks in advance!
Facebook scrapes pages for specific metadata in the back-end and uses it to generate the snippets you see, which can either be served along with the initial page load or brought in via JavaScript (front-end). From there it's a matter of using CSS and JS to style the popup to your liking.
Depending on how inter-related your site is with the content you want to display, this can be an easy task or a difficult one.
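As a rough sketch of the back-end part: Facebook's link previews are commonly driven by Open Graph meta tags (property names starting with og:), so the example below assumes that is the metadata meant above. It parses an HTML string with Python's standard library; the class name, the helper extract_preview and the sample HTML are made up for illustration, and fetching the page over the network is left out.

```python
from html.parser import HTMLParser

class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> tags from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        if prop.startswith("og:") and attr.get("content") is not None:
            # Keep the first value seen for each property.
            self.properties.setdefault(prop, attr["content"])

def extract_preview(html):
    """Return a dict of og: properties found in the given HTML text."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties

# Hypothetical page source; a real shop page would be fetched first.
sample_html = """
<html><head>
<meta property="og:title" content="Suede Monk Strap Loafers">
<meta property="og:image" content="https://shop.example/img/loafers.jpg">
<meta name="description" content="ignored: not an og: tag">
</head><body></body></html>
"""

preview = extract_preview(sample_html)
print(preview["og:title"])
print(preview["og:image"])
```

Pages without such tags would need a fallback (for example the title element or the first large image), which is where this gets harder.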
I have a section of a site with multiple categories of Widget. There is a menu with each category name. For anybody with Javascript enabled, clicking a category reveals the content of the category within the page. They can click between categories at will, seeing the DOM updated as needed. The url is also updated using the standard hash/hashbang (if we are being Google-friendly). So for somebody who lands on example.com/widgets, they can navigate around to example.com/widgets#one, example.com/widgets#two, example.com/widgets#three etc.
However, to support user agents without Javascript enabled, following one of these category links must load a new page with the category displayed, so for someone without javascript enabled, they would navigate to example.com/widgets/one, example.com/widgets/two, example.com/widgets/three etc.
My question is: What should happen when somebody with Javascript enabled lands on one of these URLS? What should someone with Javascript enabled be presented with when landing on example.com/widgets/one for example? Should they be redirected to example.com/widgets#one?
Please note that I need a single page site experience for anybody with Javascript enabled, but I want a multi-page site for a user agent without JavaScript. Any answer that doesn't address this fact doesn't answer the question. I am not interested in the merits or problems of hashbangs or single-page-sites vs multi-page-sites.
This is how I would structure it:
Use HistoryJS to manage the URL. JS browsers with pushState get full, correct URLs, and JS browsers without pushState get hashed URLs. Non-JS users go to the full URL as normal with a page reload.
When a user clicks a link:
If they have JS:
All clicks to other pages are handled by a function that prevents the default action, grabs the HREF and passes the URL to an ajax request and updates the URL at the same time. The http response for that ajax request is then parsed and then loaded into the content area.
Non JS:
Page refreshed as normal and loads the whole document.
When a page loads:
With JS: Attach an event handler to all your links to prevent the default so their href is dealt with via Ajax.
Without JS: Nothing. Allow anchors to work as normal.
I think you should definitely have all of your content accessible via a full, correct URL, load it in via Ajax, and then update the URL to reflect the address where you got your content from. That way, when JS isn't running, you don't have to change anything.
Is that what you mean?
Apparently your question already contains the answer. You say:
I need a single page site experience for anybody with Javascript enabled
and then ask:
What should someone with Javascript enabled be presented with when landing on example.com/widgets/one for example? Should they be redirected to example.com/widgets#one?
I'd say yes, they should be redirected. I don't see any other option, given your requirements (and the fact that information about JavaScript capabilities and the hash fragment of the URL are not available on the server side).
If you can accept relaxing the requirements a bit, I see another option. Remember when the web was crowded with framesets, and we landed on a specific frame via AltaVista (Google wasn't around yet!) search? It was common to see a header saying that page was supposed to be displayed as a frame, and a link to take the user to the frameset version.
You could do something similar: when scripting is available, detect that you're at example.com/widgets/one and add a link to the single-page version. I know that's not ideal, but it's better than nothing, and maybe better than a nasty client-side redirect.
Why should you need to redirect them to a different page? The user arrived at the page looking for an answer. He gets the answer even if he has JavaScript enabled. It doesn't matter. The user's query has been fulfilled.
But what would happen if the user lands on example.com/widgets#one? You would need to set up an automatic redirect to example.com/widgets/one in that case. That could be done by checking whether JavaScript is enabled in the onload event and redirecting to the appropriate page.
One way for designing such pages is to design without javascript first.
You can use anchors in the page, so that example.com/widgets#one will be a link to the element with id 'one'.
Once your page works without JavaScript, you can add the JavaScript layer. You can prevent links from being followed by using event.preventDefault (https://developer.mozilla.org/fr/docs/DOM/event.preventDefault), then add the desired JavaScript functionality.
I'm a student studying bioinformatics.
I'm trying to make a crawler that I can give a list of queries to and get the results automatically.
The site I'm interested in is the GEO DataSet site.
www.ncbi.nlm.nih.gov/gds/
If I wish to send a query like 'lung cancer', I can use the following address.
http://www.ncbi.nlm.nih.gov/gds/?term=lung+cancer.
And there are 549 pages showing up.
I can get the results of the first page, but I don't know how to move to the next page.
I mean, how can I move to the next page by changing the URL?
The Next button is linked as "www.ncbi.nlm.nih.gov/gds/?term=lung+cancer#" and I don't think it's the actual URL that button is linked to.
I'm new to JavaScript, but I heard the hash sign (#) is processed by JavaScript.
I wonder if there is something I can do like
"http://www.ncbi.nlm.nih.gov/gds/?term=lung+cancer&page=2"
so that I can move to the second page.
If you use any debugger tool (Firebug for Firefox, WebDeveloper for Chrome) you should be able to monitor the network traffic. If you do that, you'll see that clicking the next button submits a form, sending data via the POST method. However, by concatenating the POST data into a GET string you can also get to the next page. The following URL gives you access to the second page of the result set (warning: really, really long!):
http://www.ncbi.nlm.nih.gov/gds/?term=lung+cancer?term=lung+cancer&EntrezSystem2.PEntrez.Gds.Entrez_PageController.PreviousPageName=results&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Gds_DisplayBar.sPresentation=docsum&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Gds_DisplayBar.sPageSize=20&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Gds_DisplayBar.sSort=none&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Gds_DisplayBar.FFormat=docsum&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Gds_DisplayBar.FSort=&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Gds_DisplayBar.FileFormat=docsum&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Gds_DisplayBar.LastPresentation=docsum&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Gds_DisplayBar.Presentation=docsum&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Gds_DisplayBar.PageSize=20&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Gds_DisplayBar.LastPageSize=20&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Gds_DisplayBar.Sort=&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Gds_DisplayBar.LastSort=&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Gds_DisplayBar.FileSort=&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Gds_DisplayBar.Format=&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Gds_DisplayBar.LastFormat=&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Entrez_Pager.cPage=1&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Entrez_Pager.CurrPage=2&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Gds_ResultsController.ResultCount=10973&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Gds_ResultsController.RunLastQuery=&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Entrez_Pager.cPage=1&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Gds_DisplayBar.sPresentation2=docsum&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Gds_DisplayBar.sPageSize2=20&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Gds_DisplayBar.sSort2=none&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Gds_DisplayBar.FFormat2=docsum&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Gds_DisplayBar.FSort2=&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Entrez_Filters.CurrFilter
=all&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Entrez_Filters.LastFilter=all&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Gds_MultiItemSupl.Taxport.TxView=list&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Gds_MultiItemSupl.Taxport.TxListSize=5&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Gds_MultiItemSupl.RelatedDataLinks.rdDatabase=rddbto&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Gds_MultiItemSupl.RelatedDataLinks.DbName=gds&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Discovery_SearchDetails.SearchDetailsTerm=%22lung+neoplasms%22%5BMeSH+Terms%5D+OR+lung+cancer%5BAll+Fields%5D&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.HistoryDisplay.Cmd=PageChanged&EntrezSystem2.PEntrez.DbConnector.Db=gds&EntrezSystem2.PEntrez.DbConnector.LastDb=gds&EntrezSystem2.PEntrez.DbConnector.Term=lung+cancer&EntrezSystem2.PEntrez.DbConnector.LastTabCmd=&EntrezSystem2.PEntrez.DbConnector.LastQueryKey=1&EntrezSystem2.PEntrez.DbConnector.IdsFromResult=&EntrezSystem2.PEntrez.DbConnector.LastIdsFromResult=&EntrezSystem2.PEntrez.DbConnector.LinkName=&EntrezSystem2.PEntrez.DbConnector.LinkReadableName=&EntrezSystem2.PEntrez.DbConnector.LinkSrcDb=&EntrezSystem2.PEntrez.DbConnector.Cmd=PageChanged&EntrezSystem2.PEntrez.DbConnector.TabCmd=&EntrezSystem2.PEntrez.DbConnector.QueryKey=&p%24a=EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Entrez_Pager.Page&p%24l=EntrezSystem2&p%24st=gds
This complete GET string contains all search parameters like items per page, search terms, display and way more. You should be able to figure out which parameter is used for the offset (cPage and CurrPage are your friends) and then alter it to your needs.
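Here is a small Python sketch of that last step: list the parameters whose names look page-related, then change one of them. The helper names find_params and set_param are made up, and the captured URL is a shortened stand-in keeping only a few of the parameters from the long URL above. Whether changing CurrPage alone is enough for the server is an assumption you would have to check in the browser.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def find_params(url, needle):
    """Return (name, value) pairs whose name contains needle, ignoring case."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return [(k, v) for k, v in pairs if needle.lower() in k.lower()]

def set_param(url, name_suffix, value):
    """Return url with every parameter whose name ends in name_suffix set to value."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    updated = [(k, str(value) if k.endswith(name_suffix) else v) for k, v in pairs]
    return urlunsplit(parts._replace(query=urlencode(updated)))

# Shortened stand-in for the long URL; only a few of its parameters are kept.
captured = (
    "http://www.ncbi.nlm.nih.gov/gds/?term=lung+cancer"
    "&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Entrez_Pager.cPage=1"
    "&EntrezSystem2.PEntrez.Gds.Gds_ResultsPanel.Entrez_Pager.CurrPage=2"
)

# Candidates for the paging parameter, to inspect by eye.
candidates = find_params(captured, "page")
for name, value in candidates:
    print(name, "=", value)

# Build the URL for page 3 by changing only CurrPage.
page3 = set_param(captured, ".CurrPage", 3)
print(page3)
```

On the full URL the same search would also turn up names like PreviousPageName and sPageSize, so the list is a starting point for inspection rather than an answer.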
EDIT: Btw, to find javascript events bound to an HTML element, you can use the bookmarklet found at http://www.sprymedia.co.uk/article/Visual+Event+2