I'm building an application and I need to get the HTML source code of a web page in order to parse it (the page is not on my server).
I'm coding in JavaScript and I can't find a way to do it. I know there is a way to do it in Python (with the requests library), and I want basically the same thing in JavaScript.
Does anyone know how to do this?
Thanks
Try this (it returns the markup of the document currently loaded in the browser):
document.documentElement.outerHTML
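Note that document.documentElement.outerHTML only covers the page the script is running in. For a page on another server, a rough equivalent of Python's requests is the fetch API, which is built into browsers and into Node.js 18 and later. What follows is only a sketch: fetchPageSource and its fetchImpl parameter are names made up for this example, and in a browser the request will be blocked unless the remote server allows cross-origin requests (CORS).

```javascript
// Download the raw HTML of a page as a string.
// fetchImpl is a hypothetical parameter so a different fetch
// implementation can be passed in; it defaults to the global fetch.
async function fetchPageSource(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error("Request failed with status " + response.status);
  }
  return response.text(); // the HTML exactly as the server sent it
}

// Usage (example.com is a placeholder URL):
// fetchPageSource("https://example.com").then((html) => console.log(html.length));
```

The returned string is the HTML before any of the page's scripts have run, so it can be handed to a parser such as cheerio, but content that the page builds with JavaScript will not be in it.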
I'm trying to write a web scraper to get some sales leads. The problem is that in modern web design, most websites use JavaScript to modify the DOM (usually with React, Angular, or even just some jQuery). If I scrape a website with the request Node.js package and pass the HTML to cheerio, I'm simply not able to parse the code and get the info I want. Instead, all I can see are some React.js components ¯\_(ツ)_/¯
Any resources on this topic will be helpful, thanks in advance.
That is because the request package does not execute any of the JavaScript on the page. It just downloads the HTML as is. If you want to see the actual page the way a browser does, you would have to write something that executes all of the page's JavaScript until the page reaches the state you want.
Luckily, there are some other options here:
You could look at the developer tools on the website you want to scrape and try to find the XHR requests that fetch the data you need. Then you can call those URLs directly.
You could use a headless browser such as PhantomJS or CasperJS. These tools load the page and run its included JavaScript, so the DOM you scrape is as close as possible to what a browser shows.
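The first option can be sketched as follows. Everything specific here is an assumption: the payload shape (an items array with name and email fields) and the function names extractLeads and fetchLeads are invented for illustration. The real URL and field names have to be read off the network tab of the site in question.

```javascript
// Turn the JSON payload of a (hypothetical) data endpoint into flat records.
// Assumed payload shape: { items: [{ name: "...", email: "..." }, ...] }
function extractLeads(payload) {
  const items = payload && Array.isArray(payload.items) ? payload.items : [];
  return items
    .filter((item) => item && item.email) // skip entries without contact info
    .map((item) => ({ name: item.name || "", email: item.email }));
}

// Call the endpoint the page itself uses instead of scraping rendered HTML.
// fetchImpl defaults to the global fetch (browsers, Node.js 18+).
async function fetchLeads(apiUrl, fetchImpl = fetch) {
  const response = await fetchImpl(apiUrl, {
    headers: { Accept: "application/json" },
  });
  if (!response.ok) {
    throw new Error("Request failed with status " + response.status);
  }
  return extractLeads(await response.json());
}
```

Because the endpoint returns data rather than markup, there is nothing to hand to cheerio in this case. The JSON is already structured.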
I have been searching for this for a long time but haven't been able to figure it out. I have a finished website built in AngularJS following all the best practices, and the server side is PHP CI.
Now what I am supposed to do is get the Open Graph meta tags into the head section.
I could easily manage that with jQuery, of course, but the problem arises when the Facebook scraper crawls the page.
As it is a single-page application, there is only one head, so I can't declare the tags in any other HTML file. And as it's plain HTML, I cannot let PHP render the page.
I have tried to search for the answer and ultimately got to
http://www.michaelbromley.co.uk/blog/171/enable-rich-social-sharing-in-your-angularjs-app
But this is not possible for me to use.
I also read about the facebook opengraph pointer using
<link rel='opengraph' href='DESTINATION URL'>
But it says that all the basic tags need to be present in the page source, and only the additional tags can be obtained from the destination URL.
Is there any way I can solve this problem?
Here is the easiest way besides sharesocial.in:
Follow this link
http://sharelinkgenerator.com/
You will get your work done here.
http://www.sharesocial.in lets us do this directly. It works for any page on any website, and for Facebook, LinkedIn, Google+ and WhatsApp.
Preface: Hi, all! Here goes. First and foremost, I am not familiar with jQuery, so any jQuery code posted will be irrelevant to me until I research it. I'm newly teaching myself JS. I have decent XHTML experience, decent CSS3 familiarity/understanding, and some (very little) application programming experience (C++, Java).
The Situation: I'm designing a website's home page whereon, with JavaScript enabled, links are overwritten and instead of loading a new HTML document/page for each main link, I alter sections of the home page using JS.
The challenge: I want to load new page content without loading an entirely separate page. A similar old HTML solution to such a challenge was to load page frames, right? I want to overwrite an entire container element using content from another separate HTML file on the server. (The external file could be tailored to fit into the main page, but it would be even better if I could pull a SECTION of an external HTML document, perhaps one element).
Can I do this using only JavaScript?
If not, what other scripting could be used in conjunction with JS?
Would I need to implement "AJAX"?
Must I use "HTML DOM"? - Edit: I see that DOM is just integral to the function of HTML etc.
Thank you all for your patience and your expert advice. I <3 StackExchange.
Yes, you can do that using JavaScript and jQuery. Take a look at the load() method; it's exactly what you need:
http://api.jquery.com/load/
Yes, it uses AJAX, but it's really easy to use, so don't worry.
Can I do this using only JavaScript?
No, because the server will need to serve you the content / data that you wish to place in the HTML page. You'll need to combine client-side scripting (JavaScript) with functionality on the web server (see your next question below).
If not, what other scripting could be used in conjunction with JS?
That would be your server-side language, such as PHP, Ruby, or Python.
Would I need to implement "AJAX"?
Yes, what you described is implemented using Ajax. You've mentioned the jQuery library and it indeed has functionality to perform an Ajax call to the server (see $.ajax). The basic idea is: your JavaScript code performs an Ajax call, your web server returns the HTML and then your JavaScript code receives it and places it in the HTML page.
Must I use "HTML DOM"?
Web pages always make use of "HTML DOM". You could read the article about Document Object Model on Wikipedia for more information: you have to use DOM but not by choice - it's part of how web pages work.
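The Ajax round trip described in the answers above (request the file, receive HTML, place it in the page) can be sketched without jQuery. This is a minimal sketch under assumptions: the file name content/about.html and the element ids are placeholders, loadInto is a made-up helper name, and fetch is used in place of jQuery's $.ajax or load().

```javascript
// Fetch an HTML fragment from the server and place it inside a container.
// container is any object with an innerHTML property, normally a DOM element.
async function loadInto(container, url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error("Could not load " + url + " (status " + response.status + ")");
  }
  container.innerHTML = await response.text(); // replaces the old content
  return container;
}

// Usage in a page (placeholder ids and file name):
// document.querySelector("#nav-about").addEventListener("click", (event) => {
//   event.preventDefault(); // keep the browser from following the link
//   loadInto(document.getElementById("main"), "content/about.html");
// });
```

With jQuery the same thing is roughly $("#main").load("content/about.html"). load() also accepts a selector after the URL, as in "content/about.html #section", which inserts only that element of the fetched document.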
I made a program in Java which takes an XML document chosen by the user and presents its content on screen (on a JLabel in a GUI). That works fine. Now I need to turn it into a web-based application. I want to parse a JSON file, instead of an XML file, in JavaScript. I tried to use JSP and JavaScript in combination with HTML5, but I didn't really find the correct way to do it.
So I am wondering whether it is possible to do it this way or whether it is better to use servlets (sending the data to the server side).
Also, I am wondering whether it is better to reuse my existing Java code, with JavaBeans, instead of JavaScript, and then combine servlets with JSP to print the result on screen. I know JavaScript is quicker than Java in web applications, but I can't see a way to connect JavaScript and HTML5 to accomplish this parsing.
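On the JavaScript side, parsing JSON needs no extra library: the built-in JSON.parse turns JSON text into plain objects, which can then be written into the page. A small sketch, assuming a made-up document shape (a title plus a list of entries) and a made-up helper name, describeDocument:

```javascript
// Parse JSON text and build the lines of text to show on screen.
// The { title, entries: [{ label, value }] } shape is an assumption.
function describeDocument(jsonText) {
  const data = JSON.parse(jsonText); // throws SyntaxError on invalid JSON
  const lines = [data.title];
  for (const entry of data.entries) {
    lines.push(entry.label + ": " + entry.value);
  }
  return lines;
}

const sample =
  '{"title":"Order 17","entries":[{"label":"Customer","value":"Ada"},{"label":"Total","value":42}]}';
const lines = describeDocument(sample);
// In a browser, the lines could be put into the page, for example:
// document.getElementById("output").textContent = lines.join("\n");
```

Whether the JSON text comes from a file the user picks or from a servlet response is a separate choice. The parsing step is the same either way.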
Newbie here.
I want to create a site search for my offline website which will be on a CD.
Is it possible to use javascript to index the html pages automagically without me having to copy-and-paste every page into the script?
Thanks in advance for any assistance/guidance.
No, JavaScript running in a browser cannot read arbitrary files from the hard disk on its own.
But you could use another language to automatically generate the index.
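As a rough illustration of generating the index ahead of time: the answer suggests another language, but to stay consistent with the rest of this page the sketch below is in JavaScript, meant to run in a build script outside the browser (for example under Node.js). The function names and the word-to-pages index format are assumptions, and the tag stripping is deliberately crude.

```javascript
// Build a simple inverted index: word -> list of pages containing it.
// pages maps a file name to that file's HTML, already read from disk
// by whatever build script runs this (for example with Node's fs module).
function buildSearchIndex(pages) {
  const index = Object.create(null); // no inherited keys such as "constructor"
  for (const [fileName, html] of Object.entries(pages)) {
    const text = html
      .replace(/<script[\s\S]*?<\/script>/gi, " ") // drop inline scripts
      .replace(/<[^>]+>/g, " ") // crude tag stripping
      .toLowerCase();
    const words = new Set(text.match(/[a-z0-9]+/g) || []);
    for (const word of words) {
      if (!index[word]) index[word] = [];
      index[word].push(fileName);
    }
  }
  return index;
}

// Look up the pages that contain a word (case-insensitive).
function search(index, word) {
  return index[word.toLowerCase()] || [];
}
```

The generated index could then be written out once as a .js file that assigns it to a variable and included on the CD with a script tag, so the search page never has to read files at run time.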