Here is a small websockets client and server POC.
It sends a single hard-coded message string from the (Python) server to the JavaScript client page.
The question is: how can the server send further, ad-hoc messages to the client?
Tiny HTML client page with embedded JavaScript:
<!DOCTYPE html>
<html lang="en">
<body> See console for messages </body>
<script>
  // Create websocket
  const socket = new WebSocket('ws://localhost:8000');
  // Add listener to receive server messages
  socket.addEventListener('open', function (event) {
    socket.send('Connection Established');
  });
  // Add message to browser console
  socket.addEventListener('message', function (event) {
    console.log(event.data);
  });
</script>
</html>
Here is the Python server code:
import asyncio
import websockets

# Create handler for each connection
async def handler(websocket, path):
    await websocket.send("message from websockets server")

# Start websocket server
start_server = websockets.serve(handler, "localhost", 8000)

# Start async code
asyncio.get_event_loop().run_until_complete(start_server)
asyncio.get_event_loop().run_forever()
This successfully sends a hard-coded message from server to client.
You can see the message in the browser console.
At this point the websocket is open.
The main application (not shown) now needs to send messages.
These will be dynamic messages, not hard-coded.
How can the server send later, dynamic messages after this code has run?
I would like to put the socket into a global variable and call a send method on it, but that does not seem possible because the server runs a continuous loop.
You can insert further messages into the Python server code like this:
import asyncio
import random

import websockets

websocket_connections = set()
sock_port = 8000
sock_url = 'localhost'
global_socket = None

async def register(websocket):
    global global_socket
    print('register event received')
    websocket_connections.add(websocket)  # Add this client's socket
    global_socket = websocket             # Keep a reference to the most recent client
    try:
        await websocket.wait_closed()     # Keep the handler alive so the connection stays open
    finally:
        websocket_connections.remove(websocket)

async def poll_log():
    await asyncio.sleep(0.3)  # Settle
    while True:
        await asyncio.sleep(0.3)  # Slow things down
        # Send a dynamic message to the clients after a random delay
        r = random.randint(1, 10)
        if r == 5:  # Only send 10% of the time
            a_msg = "srv -> cli: " + str(random.randint(1, 10000))
            print("sending msg: " + a_msg)
            websockets.broadcast(websocket_connections, a_msg)  # Send to all connected clients

async def main():
    sock_server = websockets.serve(register, sock_url, sock_port)
    await asyncio.sleep(0.3)  # Start-up time
    async with sock_server:
        await poll_log()

if __name__ == "__main__":
    print("Websockets server starting up ...")
    asyncio.run(main())
There is a very helpful example of a complete, full-duplex websockets application here, as part of the websockets 10.4 documentation. It is a good reference for understanding how websockets are used.
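For reference, here is a minimal sketch of that full-duplex shape (this is not the documentation example itself, just a small echo handler; it assumes websockets 10.x and a single-argument handler):
import asyncio
import websockets

async def handler(websocket):
    # Receive messages from this client and reply on the same socket.
    async for incoming in websocket:
        print("cli -> srv: " + incoming)
        await websocket.send("srv -> cli: got " + incoming)

async def main():
    async with websockets.serve(handler, "localhost", 8000):
        await asyncio.Future()  # run forever

asyncio.run(main())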
I have a website I want to automate some actions on but the page is generated by 2 JavaScript files and is defined like this in the html:
<script src="/build/runtime.js"></script><script src="/build/app.js"></script>
runtime.js is about 70 lines and app.js is about 40k lines... I have no idea how to read the code, as I don't know any JavaScript and my Python knowledge is a mere atom more ;)
I'd share the particular site but the page is behind a login. So I've managed to get to the page using 2 different methods but can't find a way to press buttons within this next page generated by the JS.
Method 1 - Requests & BeautifulSoup, but I got stuck on the JS bit, so I switched to method 2.
import requests
from bs4 import BeautifulSoup
# Site & creds
LOGIN_URL = 'https://website.com/login'
USERNAME = 'user'
PASSWORD = 'pass'
# Pretend to be browser
headers = {
'accept': 'text/html,application/xhtml+xml,application/xml',
'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}
# Start session
session = requests.session()
# Get login page
response = session.get(LOGIN_URL, headers=headers, verify=False)
# Get csrf token
soup = BeautifulSoup(response.content, 'html.parser')
csrf_token = (soup.find(id="login_form__token")["value"])
# Set creds with csrf token
payload = {
'login_form[username]': USERNAME,
'login_form[password]': PASSWORD,
'login_form[login]': '',
'login_form[_token]': csrf_token
}
# Login & do something else with cookies I don't understand
response = session.post(LOGIN_URL, data=payload, verify=False)
response = session.get('https://website.com/pageIWant', verify=False)
soup = BeautifulSoup(response.content, 'html.parser')
print(soup.prettify())
Method 2 - Selenium & ChromeDriver
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
options = Options()
options.headless = True
options.add_argument("--window-size=1920,1200")
options.add_argument('--disable-gpu')
options.add_argument('--disable-software-rasterizer')
driver = webdriver.Chrome(options=options, executable_path='chromedriver.exe')
driver.get("https://website.com/login")
driver.find_element_by_id("login_form_username").send_keys('user')
driver.find_element_by_id("login_form_password").send_keys('pass')
driver.find_element_by_id("login_form_login").click()
driver.get("https://website.com/pageIWant")
html = driver.page_source
print(html)
So I thought method 2 would make things easier, but I'm pretty much stuck at the same point. The generated page that I want contains buttons I'd need to press in order to access downstream pages. I've read a lot about accessing elements but can't see anything useful within this 40k lines of JS gibberish. Where is a good place to start?
"Where is a good place to start?"
Regardless of how the page is generated (HTML or JS), ultimately what you have to address in Selenium is the page's live DOM. So "where to start" is inspecting the page's DOM in browser dev tools, and from the DOM, figure out how to find the button elements in Selenium.
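For example, here is a minimal Selenium sketch using an explicit wait; the CSS selector button.some-action is a placeholder for whatever you actually find in dev tools, and the login steps from your method 2 are assumed to have happened first:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
driver.get("https://website.com/pageIWant")

# Wait until the JS has rendered the button into the live DOM, then click it.
button = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, "button.some-action"))
)
button.click()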
I'm building a Chrome extension and I want to run a Python script that is on my PC when a button in the extension (which is basically HTML) is clicked. The Python script uses the Selenium webdriver to scrape data from a website and store it in a log file.
You basically use nativeMessaging. It allows you to create a communication bridge between your extension and an external process (such as Python).
nativeMessaging works by installing a host on your machine that communicates to and from the Chrome extension through stdin and stdout. For example:
Host in Python
This is how you write your nativeMessaging host in Python. I have included the full example from the docs, but trimmed the code to make it easier to understand.
host.py
This is basically an echo server: it reads from stdin and writes to stdout, making sure both are treated as binary streams.
#!/usr/bin/env python
import struct
import sys
import os, msvcrt

# Set the I/O to O_BINARY to avoid modifications from input/output streams.
msvcrt.setmode(sys.stdin.fileno(), os.O_BINARY)
msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)

# Helper function that sends a message to the webapp.
def send_message(message):
    # Write message size.
    sys.stdout.write(struct.pack('I', len(message)))
    # Write the message itself.
    sys.stdout.write(message)
    sys.stdout.flush()

# Loop that reads messages from the webapp.
def read_thread_func():
    while 1:
        # Read the message length (first 4 bytes).
        text_length_bytes = sys.stdin.read(4)
        if len(text_length_bytes) == 0:
            sys.exit(0)
        # Unpack message length as 4 byte integer.
        text_length = struct.unpack('i', text_length_bytes)[0]
        # Read the text (JSON object) of the message.
        text = sys.stdin.read(text_length).decode('utf-8')
        send_message('{"echo": %s}' % text)

def Main():
    read_thread_func()
    sys.exit(0)

if __name__ == '__main__':
    Main()
host.json
This defines the Python messaging host. Make sure the extension ID in allowed_origins is the ID of your extension.
{
  "name": "com.google.chrome.example.echo",
  "description": "Chrome Native Messaging API Example Host",
  "path": "host.bat",
  "type": "stdio",
  "allowed_origins": [
    "chrome-extension://knldjmfmopnpolahpmmgbagdohdnhkik/"
  ]
}
host.bat
This runs host.py with the Python interpreter.
@echo off
python "%~dp0/host.py" %*
install_host.bat
You run this once, to register your host in the Windows registry.
REG ADD "HKCU\Software\Google\Chrome\NativeMessagingHosts\com.google.chrome.example.echo" /ve /t REG_SZ /d "%~dp0host.json" /f
Chrome Extension
manifest.json
Add the permission for nativeMessaging:
{
  "permissions": [
    "nativeMessaging"
  ]
}
communication.js
In order to connect to the python host, you need to do the following:
const hostName = "com.google.chrome.example.echo";
let port = chrome.runtime.connectNative(hostName);
port.onMessage.addListener(onNativeMessage);
port.onDisconnect.addListener(onDisconnected);
To send a message to your Python host, just post a JSON object to the port.
const message = {"text": "Hello World"};
if (port) {
  port.postMessage(message);
}
To know the error when it disconnects:
function onDisconnected() {
  port = null;
  console.error(`Failed to connect: "${chrome.runtime.lastError.message}"`);
}
This full example is in the docs; I just renamed some things for clarity. It's available for Windows/Unix: https://chromium.googlesource.com/chromium/src/+/master/chrome/common/extensions/docs/examples/api/nativeMessaging
I'm trying to do an HTTPS GET with basic authentication using Python. I'm very new to Python and the guides seem to use different libraries to do things (http.client, httplib and urllib). Can anyone show me how it's done, and how to tell which standard library to use?
In Python 3 the following will work. I am using the lower-level http.client from the standard library. Also check out section 2 of RFC 2617 for details of basic authorization. This code won't check that the certificate is valid, but it will set up an HTTPS connection. See the http.client docs on how to do that.
from http.client import HTTPSConnection
from base64 import b64encode

# Authorization token: we need to base64-encode it
# and then decode it to ascii, as Python 3 stores it as a byte string
def basic_auth(username, password):
    token = b64encode(f"{username}:{password}".encode('utf-8')).decode("ascii")
    return f'Basic {token}'

username = "user_name"
password = "password"

# This sets up the https connection
c = HTTPSConnection("www.google.com")
# then send the request with the basic auth header
headers = { 'Authorization' : basic_auth(username, password) }
c.request('GET', '/', headers=headers)
# get the response back
res = c.getresponse()
# at this point you could check the status etc
# this gets the page text
data = res.read()
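If you do want certificate validation, a minimal sketch (reusing the basic_auth() helper above) is to pass a default SSL context, which verifies against the system CA bundle:
import ssl
from http.client import HTTPSConnection

context = ssl.create_default_context()  # enables certificate and hostname verification
c = HTTPSConnection("www.google.com", context=context)
c.request('GET', '/', headers={'Authorization': basic_auth(username, password)})
res = c.getresponse()
data = res.read()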
Use the power of Python and lean on one of the best libraries around: requests
import requests
r = requests.get('https://my.website.com/rest/path', auth=('myusername', 'mybasicpass'))
print(r.text)
The variable r (the requests response) has a lot more attributes that you can use. The best thing is to pop into the interactive interpreter and play around with it, and/or read the requests docs.
ubuntu#hostname:/home/ubuntu$ python3
Python 3.4.3 (default, Oct 14 2015, 20:28:29)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> r = requests.get('https://my.website.com/rest/path', auth=('myusername', 'mybasicpass'))
>>> dir(r)
['__attrs__', '__bool__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__nonzero__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_content', '_content_consumed', 'apparent_encoding', 'close', 'connection', 'content', 'cookies', 'elapsed', 'encoding', 'headers', 'history', 'iter_content', 'iter_lines', 'json', 'links', 'ok', 'raise_for_status', 'raw', 'reason', 'request', 'status_code', 'text', 'url']
>>> r.content
b'{"battery_status":0,"margin_status":0,"timestamp_status":null,"req_status":0}'
>>> r.text
'{"battery_status":0,"margin_status":0,"timestamp_status":null,"req_status":0}'
>>> r.status_code
200
>>> r.headers
CaseInsensitiveDict({'x-powered-by': 'Express', 'content-length': '77', 'date': 'Fri, 20 May 2016 02:06:18 GMT', 'server': 'nginx/1.6.3', 'connection': 'keep-alive', 'content-type': 'application/json; charset=utf-8'})
Update: the OP uses Python 3, so here is an example using httplib2.
import httplib2
h = httplib2.Http(".cache")
h.add_credentials('name', 'password') # Basic authentication
resp, content = h.request("https://host/path/to/resource", "POST", body="foobar")
The below works for python 2.6:
I use pycurl a lot in production for a process which does upwards of 10 million requests per day.
You'll need to import the following first.
import pycurl
import cStringIO
import base64
Part of the basic authentication header consists of the username and password encoded as Base64.
headers = { 'Authorization' : 'Basic %s' % base64.b64encode("username:password") }
In the HTTP header you will see this line Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=. The encoded string changes depending on your username and password.
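For instance, you can check the encoded value yourself in the interpreter (Python 2 here, to match the pycurl example):
>>> import base64
>>> base64.b64encode("username:password")
'dXNlcm5hbWU6cGFzc3dvcmQ='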
We now need a place to write our HTTP response to and a curl connection handle.
response = cStringIO.StringIO()
conn = pycurl.Curl()
We can set various curl options. For a complete list of options, see this. The linked documentation is for the libcurl API, but the options do not change for other language bindings.
conn.setopt(pycurl.VERBOSE, 1)
conn.setopt(pycurl.HTTPHEADER, ["%s: %s" % t for t in headers.items()])
conn.setopt(pycurl.URL, "https://host/path/to/resource")
conn.setopt(pycurl.POST, 1)
If you do not need to verify the certificate (warning: this is insecure, similar to running curl -k or curl --insecure):
conn.setopt(pycurl.SSL_VERIFYPEER, False)
conn.setopt(pycurl.SSL_VERIFYHOST, False)
Set response.write (our cStringIO buffer) as the function that stores the HTTP response.
conn.setopt(pycurl.WRITEFUNCTION, response.write)
When you're making a POST request.
post_body = "foobar"
conn.setopt(pycurl.POSTFIELDS, post_body)
Make the actual request now.
conn.perform()
Do something based on the HTTP response code.
http_code = conn.getinfo(pycurl.HTTP_CODE)
if http_code == 200:
    print response.getvalue()
A correct way to do basic auth in Python 3 urllib.request with certificate validation follows.
Note that certifi is not mandatory. You can use your OS's CA bundle (likely *nix only) or distribute Mozilla's CA bundle yourself. Or, if the hosts you communicate with are just a few, concatenate a CA file yourself from those hosts' CAs, which can reduce the risk of a MitM attack caused by another corrupt CA.
#!/usr/bin/env python3
import urllib.request
import ssl
import certifi
context = ssl.SSLContext(ssl.PROTOCOL_TLSv1)
context.verify_mode = ssl.CERT_REQUIRED
context.load_verify_locations(certifi.where())
httpsHandler = urllib.request.HTTPSHandler(context = context)
manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
manager.add_password(None, 'https://domain.com/', 'username', 'password')
authHandler = urllib.request.HTTPBasicAuthHandler(manager)
opener = urllib.request.build_opener(httpsHandler, authHandler)
# Used globally for all urllib.request requests.
# If it doesn't fit your design, use opener directly.
urllib.request.install_opener(opener)
response = urllib.request.urlopen('https://domain.com/some/path')
print(response.read())
Based on @AndrewCox's answer, with some minor improvements:
from http.client import HTTPSConnection
from base64 import b64encode
client = HTTPSConnection("www.google.com")
user = "user_name"
password = "password"
headers = {
    "Authorization": "Basic {}".format(
        b64encode(bytes(f"{user}:{password}", "utf-8")).decode("ascii")
    )
}
client.request('GET', '/', headers=headers)
res = client.getresponse()
data = res.read()
Note: you should set the encoding if you use the bytes function instead of a b"" literal.
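As a quick sketch of that point, the two spellings produce the same bytes once an encoding is given:
user, password = "user_name", "password"
# bytes(s, encoding) and s.encode(encoding) are equivalent here
assert bytes(f"{user}:{password}", "utf-8") == f"{user}:{password}".encode("utf-8")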
requests.get(url, auth=requests.auth.HTTPBasicAuth(username=token, password=''))
If you authenticate with a token, the password should be an empty string ''.
It works for me.
Using only standard modules and no manual header encoding
...which seems to be the intended and most portable way.
The concept of Python urllib is to group the numerous attributes of the request into various managers/directors/contexts, which then process their parts:
import urllib.request, ssl
# to avoid verifying ssl certificates
httpsHa = urllib.request.HTTPSHandler(context= ssl._create_unverified_context())
# setting up realm+urls+user-password auth
# (top_level_url may be sequence, also the complete url, realm None is default)
top_level_url = 'https://ip:port_or_domain'
# of the std managers, this can send user+passwd in one go,
# not after HTTP req->401 sequence
password_mgr = urllib.request.HTTPPasswordMgrWithPriorAuth()
password_mgr.add_password(None, top_level_url, "user", "password", is_authenticated=True)
handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
# create OpenerDirector
opener = urllib.request.build_opener(handler, httpsHa)
url = top_level_url + '/some_url?some_query...'
response = opener.open(url)
print(response.read())
GET and POST requests are usually used to submit forms. Here is a brief example of their usage.
views.py
def index(request):
    col1 = float(request.GET.get('col1'))
index.html
<div class="form-group col-md-2">
    <label for="col1">Price</label>
    <input type="number" class="form-control" id="col1" name="col1">
</div>
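For context, a slightly fuller sketch of that view; the field name col1 matches the form above, while the template name and response text are just placeholders:
from django.http import HttpResponse
from django.shortcuts import render

def index(request):
    raw = request.GET.get('col1')               # value submitted by the form's GET request
    if raw is None:
        return render(request, 'index.html')    # no form data yet: show the form
    col1 = float(raw)
    return HttpResponse("Price received: {}".format(col1))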
Ok, I'm writing a PyQt program that generates a webpage. Due to some security issues with Chrome and other things, I need a webserver to test the webpage.
So I thought to create a Run button that you can click (or press F5) to start a server and open the browser at the page. The snippet of code that this button calls, simplified (there is some code to do other things, including changing the current directory and such), looks like this:
import sys
import webbrowser
from SimpleHTTPServer import SimpleHTTPRequestHandler as HandlerClass
from BaseHTTPServer import HTTPServer as ServerClass
Protocol = 'HTTP/1.0'
port = 8080
ip = '127.0.0.1'
new = 2 #goes to new tab
url = "http://"+ip+":{0}".format(port)
serverAddress = (ip,port)
HandlerClass.protocol = Protocol
httpd = ServerClass(serverAddress, HandlerClass)
sa = httpd.socket.getsockname()
webbrowser.open(url,new=new)
httpd.serve_forever()
Ok, the problem is that once serve_forever is called, it can be expected to serve forever. Is there a way to kill the server after the browser is closed?
Edit: I understand many people recommend using threads, but I can't find a way to detect that the browser has closed, or to kill the thread from System Monitor (I'm on Ubuntu) while testing.
Edit 2: OK, I've read webbrowser.py; it doesn't seem to return any sort of process identifier...
Edit 3: Reflecting on this, maybe the correct approach would be to check whether anyone is accessing the server and, if not, kill it... That way I could detect that the tab was closed... The problem is that the only way I can think of uses a dummy page with this power that loads whatever page is being tested inside it, which seems too hackish...
It seems that if I can find a way of doing this, maybe through error responses, I could run a webserver in a subprocess that has a while loop and exits by itself, like the one here: https://docs.python.org/2/library/basehttpserver.html#more-examples
import sys
#from threading import Thread
import webbrowser
import BaseHTTPServer
import SimpleHTTPServer

serverClass = BaseHTTPServer.HTTPServer
handlerClass = SimpleHTTPServer.SimpleHTTPRequestHandler

Protocol = "HTTP/1.0"
port = 8080
ip = '127.0.0.1'
new = 2  # 2 goes to new tab, 0 same and 1 window.
url = "http://"+ip+":{0}".format(port)

handlerClass.protocol = Protocol
httpd = serverClass((ip, port), handlerClass)
sa = httpd.socket.getsockname()

print("\n---\nServing HTTP on {0}, port {1}\n---\n".format(sa[0], sa[1]))

browserOk = webbrowser.open(url, new=new)

def runWhileTrue():
    while True:
        #print(vars(httpd))
        httpd.handle_request()

runWhileTrue()
Right now I'm thinking about using a timer as a watchdog: if the server is not used for more than a certain period, it gets killed... But I think this is an awful solution... I would want the browser to ping the server for as long as the tab is open... maybe; I don't know if that's optimal. I'm looking at this code right now: SimpleHTTPServer and SocketServer.
Maybe if the server could understand a message from the website, it could break the loop. The tab closure could be detected in JavaScript like here: Browser/tab close detection using javascript (or any other language). I don't know how to communicate this to the server.
Edit (final):
In the JavaScript code of the webpage, I've inserted:
window.addEventListener('unload', function (e) { e.preventDefault(); jsonLevelGet("http://127.0.0.1:8081/exit.json"); }, false);
Then, the python code is this server.py:
import sys
from threading import Thread
import webbrowser
import BaseHTTPServer
import SimpleHTTPServer

serverClass = BaseHTTPServer.HTTPServer
handlerClass = SimpleHTTPServer.SimpleHTTPRequestHandler

Protocol = "HTTP/1.0"
port = 8080
ip = '127.0.0.1'
admIp = ip
admPort = 8081
new = 2  # 2 goes to new tab, 0 same and 1 window.
url = "http://"+ip+":{0}".format(port)

handlerClass.protocol = Protocol
httpdGame = serverClass((ip, port), handlerClass)
httpdAdm = serverClass((admIp, admPort), handlerClass)
sa = httpdGame.socket.getsockname()
sb = httpdAdm.socket.getsockname()

print("\n---\nServing HTTP on {0}, port {1}\n---\n".format(sa[0], sa[1]))
print("\n---\nAdm HTTP listening on {0}, port {1}\n---\n".format(sb[0], sb[1]))

browserOk = webbrowser.open(url, new=new)

def runGameServer():
    httpdGame.serve_forever()
    print("\nrunGameServer stopped\n")
    httpdAdm.shutdown()
    return

def runAdmServer():
    httpdAdm.handle_request()
    httpdGame.shutdown()
    print("\nrunAdmServer stopped\n")
    return

gameServerThread = Thread(target=runGameServer)
gameServerThread.daemon = True
admServerThread = Thread(target=runAdmServer)

gameServerThread.start()
admServerThread.start()
admServerThread.join()
It works! When the tab is closed, the server.py code exits! Thanks @st.never!
As you said, you could detect (in JavaScript, in the browser) that the window is being closed, and send one last request to the server to shut it down.
If you don't want to inspect all the requests searching for the "poweroff request", you can instead have your server listen on two different ports (probably on different threads). For example, the "main" server listens on port 8080 with the current behaviour, and a separate instance listens on port 8081. Then you can simply shut down the server whenever any request reaches port 8081.
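A minimal Python 3 sketch of that two-port idea, using http.server instead of the Python 2 modules above (the ports and URL are only examples):
import webbrowser
from http.server import HTTPServer, SimpleHTTPRequestHandler
from threading import Thread

main_srv = HTTPServer(('127.0.0.1', 8080), SimpleHTTPRequestHandler)   # serves the page
admin_srv = HTTPServer(('127.0.0.1', 8081), SimpleHTTPRequestHandler)  # only ever sees the shutdown request

def run_main():
    main_srv.serve_forever()

def run_admin():
    admin_srv.handle_request()   # block until the page's unload handler hits port 8081
    main_srv.shutdown()          # then stop the main server

Thread(target=run_main, daemon=True).start()
admin_thread = Thread(target=run_admin)
admin_thread.start()

webbrowser.open('http://127.0.0.1:8080')
admin_thread.join()              # returns once the browser tab is closed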