Loading page in flask web app while scraping another website using selenium - javascript

I am struggling with an application I want to build: a web application that scrapes another website using the selenium package with chromedriver.
I am using Python 3.7 on elementary OS.
Everything works if I don't include the loading page (I haven't deployed it yet, so I can't be sure it works online), but I would like the crawl to run headless with no sandbox while my app shows a waiting page.
I'd also like to render a success HTML template when the job is done.
I searched Stack Overflow for answers, but I couldn't get it to work (sorry if this is a duplicate).
One thing I found is Luiz Aoqui's answer to this question: "Flask is not render_template before executing long function", which seems to have solved the OP's issue.
It didn't solve mine, though.
I don't know JavaScript at all, so this may be very simple if the problem comes from there.
The Python Flask code:
@app.route('/auto_connect/', methods=["GET", "POST"])
def connect():
    if session['mail'] != None:
        if request.method == "POST":
            session['job'] = request.form['job']
            return redirect(url_for('process', fun='auto_connect'))
        return render_template('auto_connect.html')
    return redirect(url_for('login'))

@app.route("/process/<fun>", methods=['GET', 'POST'])
def process(fun, *args):
    if fun == 'auto_connect' or fun == 'auto_apply':
        if request.method == 'GET':
            return render_template('wait.html', fun=fun)
        if request.method == 'POST':
            print('test')
            if fun == 'auto_connect':
                auto_connector(session['mail'], session['password'], session['job'])
                return 'done'
            elif fun == 'auto_apply':
                auto_applyer(session['mail'], session['password'], session['job'], session['location'])
                return 'done'
            else:
                return "error"
    return 'error'
The JS code:
var request = new XMLHttpRequest();
request.open('POST', '/process/'.concat({{fun}}));
request.onload = function() {
    if (request.status === 200 && request.responseText === 'done') {
        // long process finished successfully, redirect user
        window.location = '/success/';
    } else {
        // oops, we got an error from the server
        alert('Something went wrong. FROM server');
    }
};
request.onerror = function() {
    // oops, we got an error trying to talk to the server
    alert('Something went wrong. TO server');
};
request.send();
The loading page shows up, but the scraping does not start.
I expect it to start with the POST request opened in the JS code, which is part of the 'wait.html' template.
PS: below is the Flask debugger output. I put a print at the top of my scraping script; it shows up in the terminal when I do not render the loading page, but doesn't when I do.
127.0.0.1 - - [04/Aug/2019 02:03:20] "GET /auto_connect/ HTTP/1.1" 200 -
127.0.0.1 - - [04/Aug/2019 02:03:22] "POST /auto_connect/ HTTP/1.1" 302 -
127.0.0.1 - - [04/Aug/2019 02:03:22] "GET /process/auto_connect HTTP/1.1" 200 -
Thanks in advance for your answers.

I found the answer to my own question.
I am not a web programmer, so I lack some reflexes, but by inspecting my waiting page I saw an error in the JS script, in the call to concat.
Writing '{{fun}}' in the script instead of {{fun}} fixed it.
I didn't understand this behavior at first, since my fun object is a Python string, but the Jinja2 documentation (https://jinja.palletsprojects.com/en/2.10.x/templates/) made it clear:
{{ ... }} for Expressions to print to the template output
Jinja prints the value into the template output, so in the rendered page it is no longer a string but raw text. Without the quotes, the browser sees concat(auto_connect) and treats auto_connect as an undefined JavaScript identifier.
(In the same way, print(x) returns None in Python 3, by the way.)
If this helps someone, then I'm glad.
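To make the difference visible, here is a minimal sketch of what the browser ends up receiving in each case. It is only an illustration: plain str.replace stands in for Jinja's substitution, and the FUN placeholder and the render helper are made up for this example.

```python
import json

# The line from wait.html, with a placeholder where Jinja's {{fun}} would be.
TEMPLATE_LINE = "request.open('POST', '/process/'.concat(FUN));"


def render(js_text):
    """Paste text in place of the placeholder, as a template engine would."""
    return TEMPLATE_LINE.replace("FUN", js_text)


fun = "auto_connect"
unquoted = render(fun)              # what {{fun}} produced: a bare identifier
quoted = render(f"'{fun}'")         # what '{{fun}}' produced: a JS string
as_json = render(json.dumps(fun))   # a JSON string literal is also valid JS

print(unquoted)
print(quoted)
print(as_json)
```

Only the last two lines are valid requests to /process/auto_connect. In a real template, Jinja's built-in tojson filter ({{ fun|tojson }}) does the same job as json.dumps here and also copes with quotes inside the value.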

Related

How to send user specific data using SSE

I am trying to create a social media page where each user's home page shows a feed of their friends' posts. Whenever a friend creates a post, I should see it in my feed in real time.
For that I am using SSE in Python Flask. Everything was working fine, but after adding a few more users I realised that every post goes to the feed of every logged-in user, which is wrong: I want to see only my friends' posts.
Can anyone help me achieve this? I am sharing the base-level code in Python and JavaScript.
Client side:
var source = new EventSource("http://172.19.0.3:8044/events");
source.addEventListener('user_feeds', function(event) {
    var data = JSON.parse(event.data);
    console.log("Even from server ");
    console.log(data);
}, false);
Server side:
from flask import Blueprint, request
from flask_cors import CORS
from flask_sse import sse
from factory import create_app

app = create_app()
CORS(app)
app.config["REDIS_URL"] = "redis://redis"
input_user_feeds = dict()
app.register_blueprint(sse, url_prefix='/events')
PROMOTION_BLUEPRINT = Blueprint('my_page', __name__, url_prefix='/api/v1/')

@PROMOTION_BLUEPRINT.route('/feeds/<user_id>', methods=["GET"])
def feeds(user_id):
    push_feeds(user_id)
    return "SUCCESS"

@PROMOTION_BLUEPRINT.route('/user_request/<user_id>', methods=["POST"])
def user_request(user_id):
    data = request.json
    add_feeds(user_id, data)
    return "SUCCESS"

def push_feeds(user_id):
    while 1 == 1:
        if user_id in input_user_feeds:
            input_request = input_user_feeds[user_id]
            sse.publish(input_request, type='user_feeds')
            del input_user_feeds[user_id]

def add_feeds(user_id, data):
    input_user_feeds[user_id] = data

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=Config.PORT, debug=True)
I tried doing it with a single user ID, but that is not a good idea either.
It would be helpful if anyone with good knowledge of SSE could help me find a solution.
Thanks in advance.

Requests failing when made to service URL, but not to version url

I'm trying to set up a service on Google App Engine, but am having trouble getting XmlHttp to work consistently with it.
After deploying, the website can be accessed from 2 different URLs: service-dot-project.appspot.com and version-dot-service-dot-project.appspot.com, and for some reason there are inconsistencies between the two.
Here's some demo code that verifiably causes me trouble.
# routes.py
from flask import render_template
from . import app

@app.route("/test", methods=["GET"])
def test():
    return render_template("test.html")

@app.route("/api/test", methods=["GET"])
def api_test():
    return "It Works!"
# templates/test.html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <title>Debug</title>
  </head>
  <body>
    <div id="out"></div>
    <button type="button" onclick="run()">
      Test the thing.
    </button>
    <script>
      function run() {
        let xmlHttp = new XMLHttpRequest();
        xmlHttp.onreadystatechange = () => {
          if (xmlHttp.readyState === 4 && xmlHttp.status === 200)
            document.getElementById("out").innerText = xmlHttp.responseText;
        }
        xmlHttp.open("GET", "/api/test", true);
        xmlHttp.send(null);
      }
    </script>
  </body>
</html>
# service.yaml
runtime: python38
service: name
automatic_scaling:
  min_idle_instances: 1
instance_class: F4
entrypoint: gunicorn -b :$PORT main:app
env_variables:
  ...
When I press the button on the version URL, it works as intended and "It Works!" gets printed into the div above the button. On the service URL (without the version specified), the page itself loads, but pressing the button causes the request to hang for a few seconds before printing this to the console:
GET https://service-dot-project.appspot.com/api/test [HTTP/2 404 Not Found 7912ms]
When testing in a local Flask debugging environment, the problem does not occur.
Is there something that Google App Engine does that I should know about and that may have caused this issue? Is /api a reserved endpoint? The rest of my endpoints work on the service URL; it's only the API endpoints that break. My only app.before_request method fails with a 403, not a 404, so it cannot be the cause.
If you go to https://console.cloud.google.com/appengine/versions
and select the service that is having trouble, is some other version receiving the traffic instead of your desired version?
Also, try going to the logs, find the entry for the 404, expand it, and see which version is throwing that error under protoPayload > versionId.
It seems that the issue is caused by one of the other services running in our project.
Our default service defines the following in its dispatch.yaml:
dispatch:
  - url: "/api*"
    module: otherservice
This intercepts all the /api requests made to myservice-dot-project and routes them to otherservice-dot-project.
The version URL is probably unaffected because there is no version of the default service with the same version number.
The fix is to either change the dispatch URL of the default service or change the URL of the new service's API endpoints.
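As a rough illustration of why /api/test is caught while /test is not, here is a toy model of the rule above. It is an assumption-laden sketch, not how App Engine implements dispatching: real dispatch patterns can also include a host part, and the route helper and the DISPATCH_RULES list are invented for this example. Shell-style matching from the standard library stands in for the real matcher.

```python
from fnmatch import fnmatchcase

# Toy copy of the dispatch rule quoted above: (URL pattern, target service).
DISPATCH_RULES = [("/api*", "otherservice")]


def route(path, requested_service):
    """Return the service that handles `path` in this simplified model."""
    for pattern, target in DISPATCH_RULES:
        if fnmatchcase(path, pattern):  # "*" matches any run of characters
            return target               # first matching rule wins
    return requested_service            # no rule matched: keep the service


print(route("/api/test", "myservice"))
print(route("/test", "myservice"))
```

In this model the first call lands on otherservice, which has no /api/test route and so answers 404, while the second stays on myservice.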

Why does Node.js HTTP server not respond to a request from Python?

I've got a working Node.js HTTP server.
I then wrote a Python program that uses the socket module to connect to that server.
For the time being, please ignore the try and except statements. The connectTO() function simply connects to a server like any other code would, except that it handles some errors. The program then sends the message "hello". Next, in the while loop, it repeatedly waits for an answer and prints it when one arrives.
When I connect to the Node.js HTTP server from Python, I do get the message:
"You have just succesfully connected to the node.js server"
If you look at my code, this means that the s.connect(()) call was successful. My problem is that when a request is sent to the server, it is supposed to send a message back, but it doesn't.
I also tried sending a message to the server, in which case the server sends back the following:
HTTP/1.1 400 Bad Request
So why is the server not responding to the requests? Why is it rejecting them?
Python Client:
from socket import AF_INET, SOCK_STREAM, SOL_SOCKET, SO_REUSEADDR
import threading, socket, time, sys

s = socket.socket(AF_INET,SOCK_STREAM)

def connectTO(host,port):
    connect = False
    count = 0
    totalCount = 0
    while connect!= True:
        try:
            s.connect((host,port))
            connect = True
            print("You have just succesfully connected to the node.js server")
        except OSError:
            count += 1
            totalCount += 1
            if totalCount == 40 and count == 4:
                print("Error: 404. Connection failed repeatedly")
                sys.exit(0)
            elif count == 4:
                print("Connection failed, retrying...")
                count = 0
            else:
                pass

connectTO("IP_OF_NODE.jS_SERVER_GOES_HERE",777)
message = "hello"
s.send(message.encode("utf-8"))
while True:
    try:
        data, addr = s.recvfrom(1024)
        if data == "":
            pass
        else:
            print(data.decode())
    except ConnectionResetError:
        print("it seems like we can't reach the server anymore..")
        print("This could be due to a change in your internet connection or the server.")
        s.close()
Node.js HTTP server:
function onRequest(req, res) {
    var postData = "";
    var pathname = url.parse(req.url).pathname;
    //Inform console of event received
    console.log("Request for "+pathname+" received.");
    //Set the encoding to the equivalent one used in html
    req.setEncoding("utf8");
    //add a listener for whenever info comes in and output full result
    req.addListener("data", function(postDataChunk) {
        postData += postDataChunk;
        console.log("Received POST data chunk: '"+postDataChunk+"'");
    });
    req.addListener("end", function() {
        route(handle, pathname, res, frontPage, postData);
    });
};
http.createServer(onRequest).listen(port,ip);
console.log("Server has started.");
Some of my Research
I should also note that, after some research, it seems that an HTTP server only accepts HTTP requests, but I don't understand most of what's on Wikipedia. Is this the reason why the server is not responding? And how do I fix that while still using the socket module?
There are also a lot of similar questions on Stack Overflow, but none of them helped me solve my problem. One of them describes my issue, and its only answer is about "handshakes". Google did not help much here either, but from what I understand a handshake is simply an exchange between the server and the client that defines what the protocol will be. Could this be what I'm missing, and how do I implement it?
Some of these questions also use modules that I'm not ready to use yet, like websocket. Either that, or they describe a way in which the server connects to the client, which can be done by directly calling Python code or connecting to it from Node.js express. I want the client to be the one connecting to an HTTP server, by means of the socket module in Python. For the sake of future visitors who are looking for something like this, here are some of these questions:
How to connect node.js app with python script?
Python Client to nodeJS Server with Socket.IO
Python connecting to an HTTP server
A blog that also does something similar to what is described above: https://www.sohamkamani.com/blog/2015/08/21/python-nodejs-comm/
Here is an answer that doesn't seem that obvious, but that solves the issue using only the relevant code. People who don't yet know much about servers in general will probably have missed it:
how to use socket fetch webpage use python
You will need to construct an HTTP request.
Example: GET / HTTP/1.1\n\n
Strictly speaking, HTTP lines end with \r\n and an HTTP/1.1 request must include a Host header; some servers tolerate the shorter form above, but not all of them do.
Try this:
from socket import AF_INET, SOCK_STREAM
import socket, sys

s = socket.socket(AF_INET,SOCK_STREAM)

def connectTO(host,port):
    connect = False
    count = 0
    totalCount = 0
    while connect!= True:
        try:
            s.connect((host,port))
            connect = True
            print("You have just succesfully connected to the node.js server")
        except OSError:
            count += 1
            totalCount += 1
            if totalCount == 40 and count == 4:
                print("Error: 404. Connection failed repeatedly")
                sys.exit(0)
            elif count == 4:
                print("Connection failed, retrying...")
                count = 0
            else:
                pass

connectTO("IP_OF_NODE.jS_SERVER_GOES_HERE",777)
# Request line, Host header, then an empty line to end the headers.
# "Connection: close" asks the server to hang up after it has replied.
message = "GET / HTTP/1.1\r\nHost: IP_OF_NODE.jS_SERVER_GOES_HERE\r\nConnection: close\r\n\r\n"
s.send(message.encode("utf-8"))
while True:
    try:
        data = s.recv(1024)
        if not data:
            # An empty read means the server closed the connection.
            break
        print(data.decode())
    except ConnectionResetError:
        print("it seems like we can't reach the server anymore..")
        print("This could be due to a change in your internet connection or the server.")
        break
s.close()
Read this to learn more about HTTP.
Now, I would recommend using this python lib to do what you're trying to do. It makes things much easier. However, if you are 100% set on using raw sockets, then you should make the node server use raw sockets as well. (Assuming you will only be connecting via python). Here is an excellent tutorial
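For anyone who wants to try the raw-socket approach without a Node.js server at hand, here is a self-contained sketch. The assumptions are mine, not the asker's: Python's built-in http.server stands in for the Node.js server, it listens on a free local port, and the HelloHandler class and fetch function are names made up for this example.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Stand-in for the Node.js server: answers every GET with a short body."""

    def do_GET(self):
        body = b"hello from the server"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example's output quiet


def fetch(host, port, path="/"):
    """Send one GET request over a plain TCP socket and return the raw reply."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Connection: close\r\n"
        "\r\n"  # empty line: end of the request headers
    )
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # empty bytes: the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

reply = fetch("127.0.0.1", server.server_address[1])
status_line, _, rest = reply.partition(b"\r\n")
print(status_line.decode())                      # the HTTP status line
print(rest.split(b"\r\n\r\n", 1)[1].decode())    # the response body
server.shutdown()
```

Sending just "hello" to the same server would be rejected, because it is not a valid request line; that is the same kind of rejection as the 400 Bad Request in the question.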

Node.js catch empty POST request body

I am trying to catch a POST request with an empty body before it causes my server to crash. I've seen people using bodyparser but I am using the MVC model and basically I don't have references to app in this .js file.
var resource = req.body;
if(!resource) return res.status(400).send("Your request is missing details.");
I was told to try something like this but it still does not work. When I console.log resource it appears as "{}" even when no body was added in postman, so the null check doesn't work. If anyone has any advice I would appreciate it!
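An empty object ({}) is truthy in JavaScript, so !resource is never true here. Check for an object with no keys instead:
if(Object.keys(req.body).length === 0)
or Object.getOwnPropertyNames(req.body).length === 0
And then add your logic to respond to the user.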

Sending data from JavaScript to Python function locally with AJAX

I am trying to build a website where a user can enter text, which will be picked up via javascript, and sent to a python function where it will be posted to twitter. For the time being, the python function is being stored locally, along with the rest of the site. However, my AJAX isn't too great and I'm having a few issues.
I have written AJAX code which sends a POST request to the python function with the tweet, and the response is the entire python script. No connection is made to the socket my script is listening to. Below is the AJAX function and the python script. Any ideas what's going on?
Thanks in advance for any help!
$(function(){
    $('#PostTweet').on('click', function(e) {
        var tweet = document.getElementById("theTweet").value;
        var len = tweet.length;
        if(len > 140){
            window.alert("Tweet too long. Please remove some characters");
        }else{
            callPython(tweet);
        }
    });
});
function callPython(tweet){
    window.alert("sending");
    $.ajax({
        type: "POST",
        url: "tweet.py",
        data: tweet,
        success: function(response){
            window.alert(response);
        }
    })
}
And the Python Script:
from OAuthSettings import settings
import twitter
from socket import *

consumer_key = settings['consumer_key']
consumer_secret = settings['consumer_secret']
access_token_key = settings['access_token_key']
access_token_secret = settings['access_token_secret']

s = socket()
s.bind(('', 9999))
s.listen(4)
(ns, na) = s.accept()

def PostToTwits(data):
    try:
        api = twitter.Api(
            consumer_key = consumer_key,
            consumer_secret = consumer_secret,
            access_token_key = access_token_key,
            access_token_secret = access_token_secret)
        api.PostUpdate(data)
        makeConnection(s)
    except twitter.TwitterError:
        print 'Post Unsuccessful. Error Occurred'

def makeConnection(s):
    while True:
        print "connected with: " + str(na)
        try:
            data = ns.recv(4096)
            print data
            PostToTwits(data)
        except:
            ns.close()
            s.close()
            break

makeConnection(s)
Your problem is that you are working with pure sockets which know nothing about HTTP protocol. Take a look at Flask or Bottle web micro frameworks to see how to turn python script or function into web endpoint.
You need a web server so that you can make requests from a web browser.
You can use a web framework like Flask or Django, or you can use webpy.
A simple example using webpy, from their website:
import web

urls = (
    '/(.*)', 'hello'
)
app = web.application(urls, globals())

class hello:
    def GET(self, name):
        if not name:
            name = 'World'
        return 'Hello, ' + name + '!'

if __name__ == "__main__":
    app.run()
Then you call the URL (which maps to your Python function) from JavaScript.
You can totally write a simple web server using sockets, and indeed you've done so. But this approach will quickly get tedious for anything beyond a simple exercise.
For example, your code is restricted to handling a single request on a single connection, which goes to the heart of your problem.
The url on the post request is wrong. In your setup there is no notion of a url "tweet.py". That url would actually work if you were also serving the web page where the jquery lives from the same server (but you can't be).
You have to post to "http://localhost:9999" and you can have any path you want after:"http://localhost:9999/foo", "http://localhost:9999/boo". Just make sure you run the python script from the command line first, so the server is listening.
Also the difference between a get and a post request is part of the HTTP protocol which your simple server doesn't know anything about. This mainly means that it doesn't matter what verb you use on the ajax request. Your server listens for all HTTP verb types.
Lastly, I'm not seeing any data being returned to the client. You need to do something like ns.sendall("Some response"). Tutorials for building a simple http server abound and show different ways of sending responses.
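To show the "read the POST body, then send a response" part concretely, here is a minimal sketch in Python 3 using the standard library's http.server in place of raw sockets (a swapped-in technique, not the asker's code). The TweetHandler class, the received list, and the /tweet path are made up for this example; the list stands in for the call that would post to Twitter.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # stand-in for the call that would post to Twitter


class TweetHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The body length comes from the Content-Length request header.
        length = int(self.headers.get("Content-Length", 0))
        tweet = self.rfile.read(length).decode("utf-8")
        received.append(tweet)

        # Always send something back, or the client never gets an answer.
        body = b"Tweet received"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example's output quiet


server = HTTPServer(("127.0.0.1", 0), TweetHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Play the role of the AJAX call: POST the tweet text to the server.
url = f"http://127.0.0.1:{server.server_address[1]}/tweet"
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # no proxy
request = urllib.request.Request(url, data="hello world".encode("utf-8"), method="POST")
with opener.open(request, timeout=5) as response:
    answer = response.read().decode("utf-8")

print(answer)
print(received)
server.shutdown()
```

If the page with the jQuery code is served from a different origin than this server, the browser will additionally expect CORS headers in the response before the success callback sees it.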
