Scraping a dynamically generated webpage with HTML5 <input> field

I want to collect data from this page. I have keywords I want to enter in the search box, which is defined as an HTML5 <input> with an event listener that dynamically changes the page based on the query.
For example, I want a script that inputs the term "hello world" in the search field and then scrapes the dynamically generated content, say the name of the collections that appear. Because of the Same Origin Policy I can't use JavaScript and I've spent the last 3 hours looking into Python but couldn't find anything there.
I can't tell if this is so obvious no one writes/asks about it, or it's a clever way to not let scripts scrape from your site.

Open the page in Chrome's developer tools or Firebug in Firefox, look at the Network tab, and find the AJAX requests the JavaScript makes when you enter text into the input field(s).
Then write a web scraper that replays those requests, using any of:
https://pypi.python.org/pypi/requests
https://pypi.python.org/pypi/spyda
https://pypi.python.org/pypi/scrapy
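
The libraries above are Python, but the idea is the same in any language. Since the rest of this thread is JavaScript, here is a minimal sketch of replaying such a request from Node, outside the browser, where the Same Origin Policy does not apply. The endpoint URL, the "q" parameter name, and the response shape ({ collections: [{ name }] }) are all assumptions made for illustration; substitute whatever the Network tab actually shows.

```javascript
// Hypothetical endpoint: replace with the request URL seen in the Network tab.
const SEARCH_ENDPOINT = "https://example.com/api/search";

// Build the same URL the page's JavaScript would request for a search term.
function buildSearchUrl(term) {
  const url = new URL(SEARCH_ENDPOINT);
  url.searchParams.set("q", term); // "q" is an assumed parameter name
  return url.toString();
}

// Pull collection names out of an assumed response shape:
// { collections: [{ name: "..." }, ...] }
function extractCollectionNames(payload) {
  const collections = Array.isArray(payload.collections)
    ? payload.collections
    : [];
  return collections.map((c) => c.name);
}

// Replay the request outside the browser (Node 18+ ships a global fetch).
async function searchCollections(term) {
  const response = await fetch(buildSearchUrl(term), {
    headers: { Accept: "application/json" },
  });
  if (!response.ok) {
    throw new Error("Search request failed with status " + response.status);
  }
  return extractCollectionNames(await response.json());
}
```

Calling searchCollections("hello world").then(console.log) would then print whatever names the real endpoint returns. If the site needs cookies or extra headers, copy those from the Network tab as well.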

Related

How does Google Calendar update the content of an email AFTER it is sent?

Google Calendar invite emails update after they are sent if the original event has been changed. How does Google achieve this? Is there a general technique anyone can use? Or is this only possible because Google owns both Gmail and Google Calendar, and the two systems are integrated behind the scenes outside of SMTP?
My first guess was that they used an iframe or an image that was loaded when the email was opened, but inspecting the source of the Gmail page doesn't show any signs of that.
Here's a screenshot of the updated text:
And here's the HTML for that section of the page when reading the email within Gmail:

Note:
Inspecting the page (Inspect Element) only shows the markup as it stands after all dynamic operations, including AJAX.
To check the actual source, open view-source:url.
Now, to the question:
That information is updated automatically at run time by JavaScript.
In the image, you used Inspect Element, which shows the code of the live view, so you saw the updated content.
It is done through JavaScript DOM and text manipulation.
To verify this:
Click on the address bar.
Add view-source: before the URL, so that it looks like view-source:https://url
Then press Ctrl+F (or the equivalent shortcut) to find.
Search for <div id=":8hg", which will return 0 results.
view-source loads the source of the file without any AJAX or JavaScript manipulation.
The div is not present in the source, so it must be added dynamically.
Looking closer,
the source contains a link, https://www.google.com/calendar/event?action\u003dVIEW\u0026eid\u003db....., which is stored in an array.
The content is taken from this link.
(I blacked out some text for privacy.)
The content of the mail is updated based on what this URL returns.
To verify this:
In the mail, you can see "This invitation is out of date".
But if you search the view-source: page for "This invitation is out of date", it returns 0 results.
So the calendar details are evidently fetched by Gmail through a call to the Google Calendar API.

I wonder if, on sending the email, they create an image at some URL and then simply remove it if the event changes. The email would then contain something like:
<div id="updated"></div>
<img src="asdfawe" onerror="document.getElementById('updated').innerHTML='some text'"/>
Although I'm not sure they can use the onerror attribute (because email + JS = bad idea). The only other way would be to use the alt attribute and some CSS trickery, but I don't see how that could result in the inspected code.

Access Text Field & Button of any webpage with java

I would like to access any input element on a web page using Java.
For example, say the user has opened a website that contains two text fields, one text area, and a submit button.
What I want is for all of those fields to be filled in by my Java program.
I have a speech-to-text converter, and it works fine.
So if the user opens a site and would like to type some content, speaking it should cause that content to be typed on the web page.
For example:
Post on Facebook.
Search for a friend on Facebook.
Query text on Google without a query string, meaning the text must be typed in the browser and the user must see the text being typed as he speaks.

Using Java or JavaScript is the wrong approach here; you need to modify the browser itself. For example, see
https://wiki.mozilla.org/SpeechAPI

Display inspect element in a div

Has anyone out there displayed the rendered HTML of a page in a div?
I recently developed a simple CMS for page meta tags (it dynamically adds meta tags according to a DB record). All went well until the SEO team wanted proof that it was 'really' rendering the metas. I can prove it to them using the developer tools, but they do not want to manually press F12 and check whether the meta was rendered. They want it displayed directly on screen, e.g. in a div.
And I have no idea where to start. Setting my situation aside, is it possible to grab the data from the developer tools and display it in a div or iframe? Or the view source, maybe?
I have searched for a possible solution but unfortunately can't find one using JavaScript, jQuery, or PHP.

You could offer to make bookmarklets that your SEO team can run, which would show the meta tags' content in JS alerts.
Otherwise as one comment says, they should just press Ctrl+U, Ctrl+F, type "meta", press enter, and get over it.
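
A minimal sketch of what such a bookmarklet could run, in case it helps. Since <meta> is a void element it has no innerHTML, so this reads the attributes instead. The function name describeMetaTags and the "(unnamed)" label are made up for illustration.

```javascript
// Collect one "key: value" line per <meta> tag from a document-like object.
function describeMetaTags(doc) {
  const lines = [];
  doc.querySelectorAll("meta").forEach((meta) => {
    // Meta tags are usually keyed by name, property (Open Graph) or http-equiv.
    const key =
      meta.getAttribute("name") ||
      meta.getAttribute("property") ||
      meta.getAttribute("http-equiv") ||
      "(unnamed)";
    const value = meta.getAttribute("content") || "";
    lines.push(key + ": " + value);
  });
  return lines;
}

// When run in a browser (for example from a bookmarklet), show the result.
if (typeof window !== "undefined" && typeof window.alert === "function") {
  window.alert(describeMetaTags(document).join("\n") || "No meta tags found");
}
```

To turn it into a bookmarklet, the code would need to be wrapped in javascript:(function(){ ... })(); and saved as a bookmark URL. The same list could also be written into a div on the page instead of an alert, if that is what the SEO team prefers.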

Hide WSS 3.0 Webpart Using JavaScript

I am using WSS 3.0 in my application. I am displaying a List as a DataView Webpart. My objective here is to make this webpart visible to a selected group of individuals. As there is no option for Target Audience in WSS 3.0, I went to edit Permissions for List and gave Read permissions only to selected users. This doesn't hide the web part from the page, rather shows an Access Denied message to other users.
Access denied. You do not have permission to perform this action or access this resource.
As I said, I want to hide this web part, that is, make it invisible on the web page to users who do not have permission to view it. Since this message is displayed only to users who lack permission, my approach is to search for the above message in the HTML, identify the parent node, and hide it, thereby hiding the web part.
I am not quite sure how to do this. Any ideas? Thanks in advance!

I'm going to assume you're in a situation where you can add additional web parts to the page and not trying to add JavaScript to the DataView Web Part directly. My suggestion won't work on a separate page if a Designer adds another view of this list.
Upload a blank .js file to your Site Assets. Add a Content Editor Web Part to your page and point it at that file. Add jQuery from a provider or host it yourself, adding the reference in your file. From there, you have three directions in which to work.
First, explore the web part with Internet Explorer's F12 Developer Tools, keeping a particular eye on divs and tables with good unique ids, names, or classes that would solve your problem if hidden. Also keep an eye on the id of the div or table or cell or whatever that contains your access denied text.
Second, (assuming you're new to jQuery) do some jQuery tutorials and then start playing with selecting the above items and, say, changing their background color.
Once you have both of those, you're 90% there: (try to) select the object that would contain the access denied text, and if the innerHTML is present and equals that string, then set display:none for the div or tables to hide your web part.
The third tool you have is editing the page directly with SharePoint Designer: you can toss a div with an id of your choosing around any xsl:template, which might help in your jQuery selecting.
I'm sorry I can't give you the specific code, since I'm not in a position to test it. If that changes, I'll try and give a more detailed response.
Old, misdirected answer: Do either of the answers here work for you? Alternatively, this answer has some great resources to solve your problem. Just change the message to an empty string.

Thanks Aron :D
I found the id for the web part and hard-coded it. That solved the problem, but I was hoping to fetch the id programmatically by searching the innerHTML instead, as I have more than one web part that has to be hidden.
I found a partial solution here:
Hide SharePoint web part using javascript onclick method
I put a CEWP on the page and added the following script in it:
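<script>
// Hide the web part if it shows the "Access denied" message.
function hide()
{
  // "webpartID" is a placeholder for the id found in view source.
  var webPart = document.getElementById("webpartID");
  if (!webPart) { return; }
  // indexOf does a literal match; search() would treat the text as a regex.
  var n = webPart.innerHTML.indexOf("Access denied. You do not have permission to perform this action or access this resource");
  if (n != -1)
  {
    webPart.style.display = "none";
  }
}
// Run hide() once SharePoint has finished loading the page body.
_spBodyOnLoadFunctionNames.push("hide");
</script>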
In my case, I picked up the webpart id from the aspx page or view source for the page.
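
For several web parts, the hard-coded id could probably be replaced by a loop over candidate containers. The following is an untested sketch, not something verified on a WSS 3.0 page. It assumes the web part containers are divs whose id starts with "WebPart"; check the real ids in view source first. The function names are made up for illustration.

```javascript
// Leading part of the message is enough to recognise a denied web part.
var DENIED_TEXT = "Access denied. You do not have permission";

// Hide every element in `candidates` whose markup contains the message.
// Returns the ids of the elements that were hidden.
function hideDeniedWebParts(candidates) {
  var hidden = [];
  for (var i = 0; i < candidates.length; i++) {
    var el = candidates[i];
    if (el.innerHTML.indexOf(DENIED_TEXT) !== -1) {
      el.style.display = "none";
      hidden.push(el.id);
    }
  }
  return hidden;
}

// Assumption: web part containers are divs whose id starts with "WebPart".
// Confirm the actual ids in view source before relying on this.
function findWebPartCandidates(doc) {
  var divs = doc.getElementsByTagName("div");
  var result = [];
  for (var i = 0; i < divs.length; i++) {
    if (divs[i].id && divs[i].id.indexOf("WebPart") === 0) {
      result.push(divs[i]);
    }
  }
  return result;
}

// Entry point to register with SharePoint's body onload list.
function hideAllDenied() {
  hideDeniedWebParts(findWebPartCandidates(document));
}
```

In the CEWP this would be registered the same way as before, with _spBodyOnLoadFunctionNames.push("hideAllDenied");. The loop style is deliberately old-fashioned so it has a chance of running in the older Internet Explorer versions typical of WSS 3.0 sites.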

Lotus Notes hide/show div

I cannot manage to get a small piece of JavaScript working in a Lotus Notes 6.5 email.
I'm building an HTML document and sending it by mail as HTML, and inside it I would like to have some links that hide/show a few divs.
I tried to use document.getElementById, but when I click on the link I get the following error:
"document.getElementById is not a function".
I'm thinking of using document.getElementById(id).style.display='none'; to hide it (if I can get hold of the div).
Any ideas how to show/hide my div?

The HTML engine in Lotus Notes is not anything like you'd get in a browser. I'm fairly certain the error message is correct when it says "document.getElementById is not a function" - there is little to no support for javascript in Notes emails.
If you need to have something hide/show in Notes, you will have to create a Notes form with actions and hide formulas to get the same effect. Then emails can be sent with the form embedded into the email, and when received the email will open that form instead of a typical memo form.
Note, it is unlikely most email clients (Outlook, etc) will support javascript due to the security holes it would open. You might have better luck sending a link to users and then having them open up a Web page or Notes database where you have more control over how things are presented to them.

The root of the problem is that Notes doesn't display HTML*. In order to display an HTML-formatted MIME email (or any other rich text field whose contents are stored as MIME and HTML), the content must first be converted to Notes Rich Text (composite data, or CD) format. The conversion of static HTML has improved a lot over the years, but once the conversion is completed, there is no HTML document to modify. Obviously, your link/action was properly translated to its Notes equivalent, but there are no hooks for DOM methods in the Notes client. JavaScript is pretty much restricted to manipulating field values (through the document.forms[0].LiteralFieldName method of access), swapping images (through the document.images collection) and a small subset of the window object's methods.
*One can view pure web pages in the Notes client, but that uses the IE ActiveX control in the full tab -- it's not available natively for rendering a part of a document.

It may not fit your HTML needs, but this might help you hide/show content:
In a new mail, select the content you want to hide / show
Click on Create / Section
You can also define a name for this section within section's properties
(works in Lotus Notes 8.5)
