Scrape an external website that requires JavaScript to be triggered - javascript

Since PhantomJS is abandoned, I would like to know if there is an alternative. For example, chrome-webdriver wouldn't be a good solution, as it wouldn't be able to run on a remote host such as Heroku.
So, is it somehow possible to scrape an external website that requires JavaScript to be triggered first? Note that it should be possible to run it from a Node.js application.

I was getting ready to put something together for you, then thought better of it and googled it. Check out this build script; it seems to answer your question exactly.
https://github.com/stomita/heroku-buildpack-phantomjs
Set up a git branch and pull it locally if you have to, but this should work. Basically, you need to download the binary, then remote in and run "heroku run 'phantomjs'" or "heroku run 'bin/phantomjs'".
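Since the question asks for something that can be driven from a Node.js application, here is a minimal sketch of calling such a binary from Node with the built-in child_process module. The helper name runHeadless and the script name render.js are made up for illustration; it assumes the phantomjs binary installed by the build script is on the PATH.

```javascript
const { execFileSync } = require('child_process');

// Run a headless-browser binary synchronously and return whatever it
// prints to stdout. Throws if the binary exits with a non-zero code or
// does not finish within timeoutMs.
function runHeadless(binary, args, timeoutMs = 30000) {
  return execFileSync(binary, args, { encoding: 'utf8', timeout: timeoutMs });
}

// Hypothetical usage: render.js would be a PhantomJS script that opens
// the URL and prints the rendered HTML before exiting.
// const html = runHeadless('phantomjs', ['render.js', 'https://example.com']);
```

The synchronous call keeps the sketch short; in a web server you would probably prefer the asynchronous execFile so that a slow page does not block other requests.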

Related

Run a Python script on a Firebase server that is triggered by an event in an Android app

I would like to run a Python script on the backend Firebase server, triggered by an event in the Android app that we are creating. I do not know how to connect this Python script with my Firebase project, or how it can be triggered by an event from the Android app.
I would like this script to be run from a button press in the Android app. I had a look at some HTTP functions that use Python, but I believe those are used to invoke HTTP functions and not to actually run scripts.
I have looked at Firebase functions, but unfortunately only JavaScript and TypeScript are supported. I would like my backend script to be in Python, as it is easier for my use case. If there is no solution to this and I have to stick to writing my script in JavaScript, then so be it. I would just like to know how I would extract information from the Firestore database using JavaScript.
If anything is not clear, please let me know; I am more than willing to clear things up. Any help would be greatly appreciated!
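On the last point (reading from Firestore in JavaScript), a minimal sketch could look like the following. It assumes db is a Firestore instance, for example the one returned by admin.firestore() from the firebase-admin package once that package is installed and initialized; the function name readCollection and the collection name 'users' are made up for illustration.

```javascript
// Read every document in a collection and return plain objects.
// `db` is assumed to be a Firestore instance (for example from
// admin.firestore() in the firebase-admin package).
async function readCollection(db, collectionName) {
  const snapshot = await db.collection(collectionName).get();
  // Each entry in snapshot.docs exposes the document id and its fields.
  return snapshot.docs.map((doc) => ({ id: doc.id, data: doc.data() }));
}

// Hypothetical usage inside a function triggered by the app:
// const users = await readCollection(admin.firestore(), 'users');
```

Passing db in as a parameter keeps the sketch independent of how the Firebase project is initialized.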

RequireJS - Module name "wget" has not been loaded yet for context

I'm trying to run an HTML file with JavaScript inside of it. In that JavaScript I'm trying to run a program called wget, which downloads information from a website. I used it on CMD and in a batch file to get data from an XML file that is hosted locally on my computer. Now I am trying to run wget from an HTML file (the panel.html for the Twitch panel extension); however, I have been having a hard time just making the thing run.
I have been fiddling around, and the issue I now face is that when I try to run the HTML in the Chrome web browser, the inspector says: Module name "wget" has not been loaded yet for context.
Screenshots (images not included): the HTML, the error from Chrome, and wget installed from cmd.
I tried to read this for hours, and I don't understand it at all. In fact, I don't think the issue in the link is the same as mine, but this is what every search keeps coming up with. I don't understand the whole dynamic thing, or why they are even using the word "dynamic". It just seems like they can't use require because it doesn't work with paths; however, I am not trying to define a path. I just want wget to work from this HTML file. I'm annoyed that I can't find anything on this exact problem. Every problem I have seen like this doesn't have a basic example of var wget = require('wget');
I just need what's in my JavaScript or HTML script tag to work.
I downloaded the require.js file and put it into the HTML as a script tag. From here it should just work. I already downloaded wget from cmd so it's on the computer somewhere. I also put the .exe in the same folder as the .html and the require.js.
Also I read somewhere that another reason this doesn't work is because wget is "loaded" or something like that. In that case can someone tell me how to "load" wget into the HTML or JavaScript first so that this error goes away? The basic wget example I found online is:
Here is the HTML file:
I'm not using a path, I just want wget to work from JavaScript. The wget example shows that it uses require. If I don't need require then please provide an example of how I can use wget in JavaScript without require or how to make this error go away.
I've been trying to figure out the best way to get the status information from my VLC player and put it into an HTML file, so I can use that as a Twitch extension on my Twitch channel. VLC media player exposes a status.xml when you run it as an HTTP server. I can only access localhost:8080/requests/status.xml from a browser, because it has basic authentication where I have to put in my user name, so I use wget to supply my password and download status.xml back to my computer as another copy that isn't set up with authentication. Then I can use that downloaded status.xml to post what music is playing on my VLC player. The problem is that I need wget to pull the information from localhost:8080/requests/status.xml from the HTML file, so that whenever it is run, status.xml gets updated with the new information and the HTML posts the most current thing playing on my VLC player.
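For what it's worth, wget is a command-line program, so RequireJS in a browser page cannot load it; var wget = require('wget') only makes sense inside Node.js. If the goal is just to download status.xml with basic authentication, one option is a small Node script using the built-in http module. This is only a sketch: the function names are made up, and the empty default user name is an assumption you may need to change for your VLC setup.

```javascript
const http = require('http');

// Build the value of an HTTP Basic Authorization header.
function basicAuthHeader(user, password) {
  return 'Basic ' + Buffer.from(`${user}:${password}`).toString('base64');
}

// Download status.xml from VLC's HTTP interface and resolve with its text.
// The URL is the one from the question; the empty user name is an assumption.
function fetchStatusXml(password, user = '', url = 'http://localhost:8080/requests/status.xml') {
  return new Promise((resolve, reject) => {
    const options = { headers: { Authorization: basicAuthHeader(user, password) } };
    const req = http.get(url, options, (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // discard the body so the socket is released
        reject(new Error(`Unexpected status ${res.statusCode}`));
        return;
      }
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(body));
    });
    req.on('error', reject);
  });
}

// Hypothetical usage: save an unauthenticated copy next to the HTML file.
// fetchStatusXml('my-password').then((xml) => require('fs').writeFileSync('status.xml', xml));
```

Running this on a schedule would replace the wget batch file; the HTML page itself would still read the copied status.xml as before.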

Run HTML file via Bash? Possible?

I have an HTML file that contains JavaScript instructions. When run in a browser such as Chrome, it sends an "emit" message to my websockets server and displays the value on that page.
Is there a way to call this same HTML file from a bash script? I want to insert data into a MySQL database, which will ultimately call that HTML file to send an update to the websocket.
Hopefully that makes sense, and hopefully there is a way to do it too :)
If you are only focused on rendering the HTML page (and not interacting with it via buttons or something) you may find this link helpful: Running HTML from Command Line
If you are trying to execute JavaScript instructions on your server without using a browser, I recommend using Node.js with a plain JS script and no HTML.
Otherwise, you can try to run an HTML file with JS instructions inside using something like PhantomJS, but I think that performs worse than using Node properly.
EDIT
It is JavaScript, yes, but I need to "import" the socket.io.js file into the script I have created, and the browser I have to use doesn't support the new JavaScript import methods. So I'm writing it as HTML that calls the JavaScript; otherwise I would use Node.js.
I think you can import the socket.io library in a Node application using npm.
https://www.npmjs.com/package/socket.io
The documentation says that you can use a client inside a Node script too:
Socket.IO enables real-time bidirectional event-based communication. It consists in:
a Node.js server (this repository)
a Javascript client library for the browser (or a Node.js client)
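As a rough sketch of that Node.js client approach, the script below connects, emits one event, and disconnects, so it can be called from bash with node. It assumes the socket.io-client package is installed via npm and that recent versions export the connect function as io; the function name emitOnce, the URL, and the event name are made up for illustration.

```javascript
// `io` is the connect function from the socket.io-client package, e.g.
// const { io } = require('socket.io-client'); (assumed to be installed).
// Connects to the server, emits a single event, then disconnects.
function emitOnce(io, url, eventName, payload) {
  return new Promise((resolve, reject) => {
    const socket = io(url);
    socket.on('connect', () => {
      socket.emit(eventName, payload);
      // Closing right after emit relies on the client flushing buffered
      // packets first; if the server misses the event, close from an
      // acknowledgement callback instead.
      socket.close();
      resolve();
    });
    socket.on('connect_error', (err) => {
      socket.close();
      reject(err);
    });
  });
}

// Hypothetical usage from a script started by bash:
// const { io } = require('socket.io-client');
// emitOnce(io, 'http://localhost:3000', 'update', { value: 42 });
```

Taking io as a parameter is only there to keep the sketch free of a hard dependency; in a real script you would likely require the package at the top and call io directly.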

Do I need to install some packages to run JavaScript on Debian

I have to set up a web server using existing code written by another intern.
I have a web server on Debian; I've already set it up and it works.
When I copied his code onto the server, I noticed that it doesn't work perfectly.
His code contains some JavaScript, and I want to know if I have to download some further packages on my Debian server to let it run properly.
I tried his code before on a WAMP server and didn't have a problem running it; that's why I thought that maybe the reason was the JavaScript present in his code.
I've done some research on Google and found many links about Node.js, but I can't really understand how it would solve my problem.
Thank you for reading my post and also for your answers!!
Perhaps you can try this:
https://packages.debian.org/squeeze/javascript-common
I am interested in Debian too, but still a newbie here...

Any way to run Firefox with GreaseMonkey scripts without a GUI/X session

I need to build a small "monitoring" scraper for a 3rd party website (it's an external website that has stats about our visitors).
Unfortunately, this website is very hard to scrape through the normal "wget" mechanism, because it uses a ton of sophisticated JS, part of it generated by GWT. So my workaround was to create a GreaseMonkey script and then have this script call a PHP page that would log the scraped data. Then as soon as Firefox starts with this webpage-to-scrape, the script goes to work.
This works well, but now I am trying to make it more robust as far as monitoring tools go. I want it to run on the server using a cron job. As far as I understand such things, this requires a DISPLAY variable to be set and for an X session to exist (Firefox is refusing to run for me). Is there any nice way to allow it to run from the batchuser account as a cron job?
I've done something similar to get Selenium running headless on a server. I used Xvfb.
http://en.wikipedia.org/wiki/Xvfb
This article has some tips for using Xvfb with Firefox:
http://semicomplete.com/blog/geekery/xvfb-firefox.html
The best way to do that is to build Firefox in the headless mode: http://hg.mozilla.org/incubator/offscreen
