I am new here and I want to create a custom Discord bot in Python. I was trying to add a YouTube search feature to it, but I got stuck on a user-agent problem that looks too complicated for me. So I went to Developers.Google and created a custom search engine meant to work with YouTube, since I already had an API key for it. Then I realised that the code for it was in JS, which leads to my question: is it possible to make this custom search engine work with my Python bot?
Here is my current code:
import urllib.parse, urllib.request, re
import aiohttp
import asyncio
from googleapiclient.discovery import build

def get_service():
    return build("youtube", "v3", developerKey="edmond-dantefesses")

def search(term, channel):
    service = get_service()
    resp = service.search().list(
        part="id",
        q=term,
        type="video",  # the API requires type="video" when videoDimension is set
        # safeSearch="none" if channel.is_nsfw() else "moderate",
        videoDimension="2d",
    ).execute()
    return resp["items"][0]["id"]["videoId"]

BASE = "https://youtube.com/results"

@bot.command()
async def YT(ctx, *, search):
    p = {"search_query": search}
    # Spoof a user agent header or the request will immediately fail
    h = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"}
    # Name the session "session" so it does not shadow the global bot object
    async with aiohttp.ClientSession() as session:
        async with session.get(BASE, params=p, headers=h) as resp:
            dom = await resp.text()
            # open("debug.html", "w").write(dom)
    found = re.findall(r'href="\/watch\?v=([a-zA-Z0-9_-]{11})', dom)
    return f"https://youtu.be/{found[0]}"
I understand if my question sounds stupid or surreal, and I thank you very much for your patience. Have a good day :)
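Not a full answer, just a minimal sketch of two helpers that may make the code above less fragile. It assumes the response of search().list has the shape used above (an "items" list whose entries carry an "id" dict), and that results which are channels or playlists have no "videoId" key. The names first_video_id and video_ids_from_html are made up for this example.

```python
import re

# Matches the 11-character video ID in links such as href="/watch?v=..."
VIDEO_ID_RE = re.compile(r'watch\?v=([a-zA-Z0-9_-]{11})')


def first_video_id(items):
    """Return the first videoId found in a search 'items' list, or None."""
    for item in items:
        video_id = item.get("id", {}).get("videoId")
        if video_id:
            return video_id
    return None


def video_ids_from_html(html):
    """Return the video IDs found in a page, without duplicates, in page order."""
    seen = []
    for video_id in VIDEO_ID_RE.findall(html):
        if video_id not in seen:
            seen.append(video_id)
    return seen
```

Returning None instead of indexing [0] directly lets the bot reply with "no result" rather than crash when a search comes back empty.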
I'm trying to download files from http://www.oracle.com/technetwork/server-storage/developerstudio/downloads/index.html in a headless context. I have an account (they are free), but the site really doesn't make it easy: apparently it uses a chain of JavaScript forms and redirections. With Firefox I can use the element inspector, copy the URL of the file as cURL when the download starts, and use it on a headless machine to download the file, but so far all my attempts to get the file using only the headless machine have failed.
I've managed to get the login with:
#!/usr/bin/env python3
username="<my username>"
password="<my password>"
import requests
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
caps = DesiredCapabilities.PHANTOMJS
caps["phantomjs.page.settings.userAgent"] = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:55.0) Gecko/20100101 Firefox/55.0"
driver = webdriver.PhantomJS("/usr/local/bin/phantomjs")
driver.set_window_size(1120, 550)
driver.get("http://www.oracle.com/technetwork/server-storage/developerstudio/downloads/index.html")
print("loaded")
driver.find_element_by_name("agreement").click()
print("clicked agreement")
driver.find_element_by_partial_link_text("RPM installer").click()
print("clicked link")
driver.find_element_by_id("sso_username").send_keys(username)
driver.find_element_by_id("ssopassword").send_keys(password)
driver.find_element_by_xpath("//input[contains(@title,'Please click here to sign in')]").click()
print("submitted")
print(driver.get_cookies())
print(driver.current_url)
print(driver.page_source)
driver.quit()
I suspect the login worked, because in the cookies I see some data associated with my username, but in Firefox submitting the form results in a download starting after 3-4 redirections, while here I get nothing and the page_source and current_url still belong to the login page.
Maybe the site is actively blocking this kind of use, or maybe I'm doing something wrong. Any idea how to actually download the file?
Thanks to TheChetan's comment I got it working. I didn't use the javascript-blob route though, but the requests approach suggested by Tarun Lalwani in https://stackoverflow.com/a/46027215. It took me a while to realize I had to modify the user agent in the request too. Finally this works for me:
#!/usr/bin/env python3
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from requests import Session
from urllib.parse import urlparse
from os.path import basename
from hashlib import sha256
import sys
index_url = "http://www.oracle.com/technetwork/server-storage/developerstudio/downloads/index.html"
link_text = "RPM installer"
username="<my username>"
password="<my password>"
user_agent = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:55.0) Gecko/20100101 Firefox/55.0"
# set up browser
caps = DesiredCapabilities.PHANTOMJS
caps["phantomjs.page.settings.userAgent"] = user_agent
driver = webdriver.PhantomJS("/usr/local/bin/phantomjs")
driver.set_window_size(800,600)
# load index page and click through
driver.get(index_url)
print("loaded")
driver.find_element_by_name("agreement").click()
print("clicked agreement")
link = driver.find_element_by_partial_link_text(link_text)
sha = driver.find_element_by_xpath("//*[contains(text(), '{0}')]/following::*[contains(text(), 'sum:')]/following-sibling::*".format(link_text)).text
file_url = link.get_attribute("href")
filename = basename(urlparse(file_url).path)
print("filename: {0}".format(filename))
print("checksum: {0}".format(sha))
link.click()
print("clicked link")
driver.find_element_by_id("sso_username").send_keys(username)
driver.find_element_by_id("ssopassword").send_keys(password)
driver.find_element_by_xpath("//input[contains(@title,'Please click here to sign in')]").click()
print("submitted")
# we should be logged in now
def progressBar(title, value, endvalue, bar_length=60):
    percent = float(value) / endvalue
    arrow = '-' * int(round(percent * bar_length)-1) + '>'
    spaces = ' ' * (bar_length - len(arrow))
    sys.stdout.write("\r{0}: [{1}] {2}%".format(title, arrow + spaces, int(round(percent * 100))))
    sys.stdout.flush()
# transfer the cookies to a new session and request the file
session = Session()
session.headers = {"user-agent": user_agent}
for cookie in driver.get_cookies():
session.cookies.set(cookie["name"], cookie["value"])
driver.quit()
r = session.get(file_url, stream=True)
# now we should have gotten the url with param
new_url = r.url
print("final url {0}".format(new_url))
r = session.get(new_url, stream=True)
print("requested")
length = int(r.headers['Content-Length'])
title = "Downloading ({0})".format(length)
sha_file = sha256()
chunk_size = 2048
done = 0
with open(filename, "wb") as f:
    for chunk in r.iter_content(chunk_size):
        f.write(chunk)
        sha_file.update(chunk)
        done = done+len(chunk)
        progressBar(title, done, length)
print()
# check integrity
if (sha_file.hexdigest() == sha):
    print("checksums match")
    sys.exit(0)
else:
    print("checksums do NOT match!")
    sys.exit(1)
So in the end the idea is to use selenium+phantomjs for logging in, and then reuse the cookies in a plain request.
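For anyone who wants the cookie hand-off without the requests package, here is a minimal standard-library sketch. It assumes the cookies come as a list of dicts with "name" and "value" keys, which is the shape the loop over driver.get_cookies() above relies on. The helper names and the example URL are invented for illustration.

```python
import urllib.request


def cookie_header(cookies):
    """Join browser cookies into the value of a single Cookie header."""
    return "; ".join("{0}={1}".format(c["name"], c["value"]) for c in cookies)


def build_download_request(url, cookies, user_agent):
    """Build (but do not send) a request that carries the browser's cookies.

    The user agent should be the same one the browser session used.
    """
    headers = {"User-Agent": user_agent, "Cookie": cookie_header(cookies)}
    return urllib.request.Request(url, headers=headers)
```

The resulting object can be passed to urllib.request.urlopen(). Note that this flat header ignores each cookie's domain and path, just like session.cookies.set(name, value) does in the script above.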
I'm trying to do an HTTPS GET with basic authentication using Python. I'm very new to Python, and the guides seem to use different libraries to do things (http.client, httplib and urllib). Can anyone show me how it's done? How can you tell which standard library to use?
In Python 3 the following will work. I am using the lower-level http.client from the standard library. Also check out section 2 of RFC 2617 for details of basic authorization. This code won't check that the certificate is valid, but it will set up an HTTPS connection. See the http.client docs on how to do that.
from http.client import HTTPSConnection
from base64 import b64encode
# Authorization token: we need to base64 encode it
# and then decode it to ascii, as Python 3 stores it as a byte string
def basic_auth(username, password):
    token = b64encode(f"{username}:{password}".encode('utf-8')).decode("ascii")
    return f'Basic {token}'
username = "user_name"
password = "password"
#This sets up the https connection
c = HTTPSConnection("www.google.com")
#then connect
headers = { 'Authorization' : basic_auth(username, password) }
c.request('GET', '/', headers=headers)
#get the response back
res = c.getresponse()
# at this point you could check the status etc
# this gets the page text
data = res.read()
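On the certificate check mentioned above: a minimal sketch of one way to make verification explicit is to build a context with ssl.create_default_context() and pass it to HTTPSConnection. The function name make_verified_connection is invented for this example, and no network traffic happens until request() is called.

```python
import ssl
from http.client import HTTPSConnection


def make_verified_connection(host):
    """Return an HTTPSConnection that validates the server certificate."""
    # The default context loads the system CA certificates, requires a
    # valid certificate and checks that it matches the host name.
    context = ssl.create_default_context()
    return HTTPSConnection(host, context=context), context
```

With this, conn.request('GET', '/', headers=headers) should raise an ssl.SSLError subclass instead of silently talking to a server with a bad certificate.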
Use the power of Python and lean on one of the best libraries around: requests
import requests
r = requests.get('https://my.website.com/rest/path', auth=('myusername', 'mybasicpass'))
print(r.text)
The response object r has many more attributes and methods that you can use. The best thing is to pop into the interactive interpreter and play around with it, and/or read the requests docs.
ubuntu@hostname:/home/ubuntu$ python3
Python 3.4.3 (default, Oct 14 2015, 20:28:29)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> r = requests.get('https://my.website.com/rest/path', auth=('myusername', 'mybasicpass'))
>>> dir(r)
['__attrs__', '__bool__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__nonzero__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_content', '_content_consumed', 'apparent_encoding', 'close', 'connection', 'content', 'cookies', 'elapsed', 'encoding', 'headers', 'history', 'iter_content', 'iter_lines', 'json', 'links', 'ok', 'raise_for_status', 'raw', 'reason', 'request', 'status_code', 'text', 'url']
>>> r.content
b'{"battery_status":0,"margin_status":0,"timestamp_status":null,"req_status":0}'
>>> r.text
'{"battery_status":0,"margin_status":0,"timestamp_status":null,"req_status":0}'
>>> r.status_code
200
>>> r.headers
CaseInsensitiveDict({'x-powered-by': 'Express', 'content-length': '77', 'date': 'Fri, 20 May 2016 02:06:18 GMT', 'server': 'nginx/1.6.3', 'connection': 'keep-alive', 'content-type': 'application/json; charset=utf-8'})
Update: OP uses Python 3. So adding an example using httplib2
import httplib2
h = httplib2.Http(".cache")
h.add_credentials('name', 'password') # Basic authentication
resp, content = h.request("https://host/path/to/resource", "POST", body="foobar")
The below works for python 2.6:
I use pycurl a lot in production for a process which does upwards of 10 million requests per day.
You'll need to import the following first.
import pycurl
import cStringIO
import base64
Part of the basic authentication header consists of the username and password encoded as Base64.
headers = { 'Authorization' : 'Basic %s' % base64.b64encode("username:password") }
In the HTTP header you will see this line Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=. The encoded string changes depending on your username and password.
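For readers on Python 3, where b64encode() accepts only bytes, here is a small sketch of how that header value is built and how it can be decoded again. The function names are made up for this example, and the credentials are the same placeholder pair used above.

```python
from base64 import b64decode, b64encode


def basic_auth_header(username, password):
    """Build the value of the Authorization header for basic auth."""
    raw = "{0}:{1}".format(username, password).encode("utf-8")
    return "Basic " + b64encode(raw).decode("ascii")


def parse_basic_auth(header_value):
    """Split a 'Basic <token>' header value back into (username, password)."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    username, _, password = b64decode(token).decode("utf-8").partition(":")
    return username, password


print(basic_auth_header("username", "password"))
# prints: Basic dXNlcm5hbWU6cGFzc3dvcmQ=
```

Decoding it back shows that Base64 is only an encoding, not encryption, which is why basic auth should go over HTTPS.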
We now need a place to write our HTTP response to and a curl connection handle.
response = cStringIO.StringIO()
conn = pycurl.Curl()
We can set various curl options. For a complete list of options, see this. The linked documentation is for the libcurl API, but the options do not change for other language bindings.
conn.setopt(pycurl.VERBOSE, 1)
conn.setopt(pycurl.HTTPHEADER, ["%s: %s" % t for t in headers.items()])
conn.setopt(pycurl.URL, "https://host/path/to/resource")
conn.setopt(pycurl.POST, 1)
If you do not need to verify the certificate, use the following. Warning: this is insecure. It is similar to running curl -k or curl --insecure.
conn.setopt(pycurl.SSL_VERIFYPEER, False)
conn.setopt(pycurl.SSL_VERIFYHOST, False)
Use the write method of the cStringIO object to store the HTTP response.
conn.setopt(pycurl.WRITEFUNCTION, response.write)
When you're making a POST request.
post_body = "foobar"
conn.setopt(pycurl.POSTFIELDS, post_body)
Make the actual request now.
conn.perform()
Do something based on the HTTP response code.
http_code = conn.getinfo(pycurl.HTTP_CODE)
if http_code == 200:
    print response.getvalue()
A correct way to do basic auth in Python3 urllib.request with certificate validation follows.
Note that certifi is not mandatory. You can use your OS bundle (likely *nix only) or distribute Mozilla's CA Bundle yourself. Or, if you communicate with only a few hosts, concatenate a CA file yourself from those hosts' CAs, which reduces the risk of a MitM attack caused by another, compromised CA.
#!/usr/bin/env python3
import urllib.request
import ssl
import certifi
context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # negotiates the highest TLS version both sides support
context.verify_mode = ssl.CERT_REQUIRED
context.load_verify_locations(certifi.where())
httpsHandler = urllib.request.HTTPSHandler(context = context)
manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
manager.add_password(None, 'https://domain.com/', 'username', 'password')
authHandler = urllib.request.HTTPBasicAuthHandler(manager)
opener = urllib.request.build_opener(httpsHandler, authHandler)
# Used globally for all urllib.request requests.
# If it doesn't fit your design, use opener directly.
urllib.request.install_opener(opener)
response = urllib.request.urlopen('https://domain.com/some/path')
print(response.read())
Based on @AndrewCox's answer, with some minor improvements:
from http.client import HTTPSConnection
from base64 import b64encode
client = HTTPSConnection("www.google.com")
user = "user_name"
password = "password"
headers = {
    "Authorization": "Basic {}".format(
        b64encode(bytes(f"{user}:{password}", "utf-8")).decode("ascii")
    )
}
client.request('GET', '/', headers=headers)
res = client.getresponse()
data = res.read()
Note: you have to pass an encoding when you use the bytes() function on a str instead of writing a b"" literal.
requests.get(url, auth=requests.auth.HTTPBasicAuth(username=token, password=''))
When authenticating with a token, pass the token as the username and leave the password as ''.
It works for me.
Using only standard modules and no manual header encoding, which seems to be the intended and most portable way.
The concept of Python's urllib is to group the numerous attributes of the request into various managers/directors/contexts, which then process their parts:
import urllib.request, ssl
# to avoid verifying ssl certificates
httpsHa = urllib.request.HTTPSHandler(context= ssl._create_unverified_context())
# setting up realm+urls+user-password auth
# (top_level_url may be sequence, also the complete url, realm None is default)
top_level_url = 'https://ip:port_or_domain'
# of the std managers, this can send user+passwd in one go,
# not after HTTP req->401 sequence
password_mgr = urllib.request.HTTPPasswordMgrWithPriorAuth()
password_mgr.add_password(None, top_level_url, "user", "password", is_authenticated=True)
handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
# create OpenerDirector
opener = urllib.request.build_opener(handler, httpsHa)
url = top_level_url + '/some_url?some_query...'
response = opener.open(url)
print(response.read())
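To see what the password manager does without making a request, here is a small sketch that builds the same kind of opener and then queries the manager directly. The function name make_prior_auth_opener and the example.com URLs are placeholders for this example.

```python
import urllib.request


def make_prior_auth_opener(top_level_url, user, password):
    """Return (opener, manager) that send credentials on the first request."""
    manager = urllib.request.HTTPPasswordMgrWithPriorAuth()
    # is_authenticated=True marks the URL as already authenticated, so the
    # handler adds the Authorization header without waiting for a 401.
    manager.add_password(None, top_level_url, user, password,
                         is_authenticated=True)
    handler = urllib.request.HTTPBasicAuthHandler(manager)
    return urllib.request.build_opener(handler), manager


opener, manager = make_prior_auth_opener("https://example.com", "user", "password")
# Credentials registered for the top-level URL also cover paths below it.
print(manager.find_user_password(None, "https://example.com/some/path"))
print(manager.is_authenticated("https://example.com/some/path"))
```

URLs on other hosts get no credentials from this manager, so the opener can be reused for unrelated requests.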
GET and POST requests are usually used to submit forms. Here is a brief example of reading a GET parameter:
views.py
from django.http import HttpResponse
def index(request):
    col1 = float(request.GET.get('col1'))
    return HttpResponse(str(col1))  # minimal response so the view is complete
index.html
<div class="form-group col-md-2">
    <label for="col1">Price</label>
    <input type="number" class="form-control" id="col1" name="col1">
</div>
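One caveat with the view above: request.GET.get('col1') returns None when the field is missing, and float(None) raises a TypeError. A minimal sketch of a more forgiving parser follows. It takes any mapping with a .get() method (such as request.GET), and the name float_param is invented for this example.

```python
def float_param(params, name, default=None):
    """Read a float from a query-parameter mapping, or return default."""
    raw = params.get(name)
    if raw is None or raw == "":
        return default
    try:
        return float(raw)
    except ValueError:
        # The value was present but not a number, e.g. "abc".
        return default
```

In the view this would read col1 = float_param(request.GET, 'col1', 0.0).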
OK, I'm writing a PyQt application to generate a webpage. Due to some security issues with Chrome and other things, I need a web server to test the webpage.
So I thought to create a button called run, that you can click or press f5 to start a server, and open the browser to the page. The snippet of code that this button calls, simplified (there is some code to do things, including changing current directory and such), looks like this:
import sys
import webbrowser
from SimpleHTTPServer import SimpleHTTPRequestHandler as HandlerClass
from BaseHTTPServer import HTTPServer as ServerClass
Protocol = 'HTTP/1.0'
port = 8080
ip = '127.0.0.1'
new = 2 #goes to new tab
url = "http://"+ip+":{0}".format(port)
serverAddress = (ip,port)
HandlerClass.protocol_version = Protocol
httpd = ServerClass(serverAddress, HandlerClass)
sa = httpd.socket.getsockname()
webbrowser.open(url,new=new)
httpd.serve_forever()
OK, the problem is that once serve_forever is called, it does what the name says and serves forever. Is there a way to kill the server after the browser is closed?
Edit: I understand many people recommend using threads but I can't find a way to detect that the browser has closed or killing the thread in system monitor (I'm on Ubuntu) while testing.
Edit2: ok, I've read webbrowser.py, it doesn't seem to return any sort of process identifier...
Edit3: I'm reflecting on this, maybe the correct approach would be checking if someone is accessing the server, and if not, then kill it... This way I can detect if the tab was closed... Problem is the only way I can think uses a dummy page with this power that loads whatever page to test inside, which seems too hackish...
It seems if I can find a way of doing this, maybe through error responses...I can do a webserver in a subprocess that has a while and exits by itself like the one here: https://docs.python.org/2/library/basehttpserver.html#more-examples
import sys
#from threading import Thread
import webbrowser
import BaseHTTPServer
import SimpleHTTPServer
serverClass=BaseHTTPServer.HTTPServer
handlerClass=SimpleHTTPServer.SimpleHTTPRequestHandler
Protocol = "HTTP/1.0"
port = 8080
ip = '127.0.0.1'
new = 2 #2 goes to new tab, 0 same and 1 window.
url = "http://"+ip+":{0}".format(port)
handlerClass.protocol_version = Protocol
httpd = serverClass((ip,port), handlerClass)
sa = httpd.socket.getsockname()
print("\n---\nServing HTTP on {0}, port {1}\n---\n".format(sa[0],sa[1]) )
browserOk = webbrowser.open(url,new=new)
def runWhileTrue():
    while True:
        #print(vars(httpd))
        httpd.handle_request()
runWhileTrue()
Right now I'm thinking about using a timer as a watchdog: if the server is not used for more than a given period, it gets killed... But I think this is an awful solution... I wanted the browser to ping it for as long as the tab is open... Maybe, I don't know if it's optimal. I'm looking at this code right now: SimpleHTTPServer and SocketServer.
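For reference, the watchdog idea can be sketched in a few lines. This is written for Python 3's http.server rather than the Python 2 modules used above, and the class name IdleTimeoutServer and the function serve_until_idle are invented for this example. It relies on handle_request() calling handle_timeout() when no request arrives within the server's timeout attribute.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


class IdleTimeoutServer(HTTPServer):
    """HTTP server that notes when it has been idle for too long."""

    def __init__(self, address, handler, idle_seconds):
        super().__init__(address, handler)
        self.timeout = idle_seconds  # how long handle_request() waits
        self.timed_out = False

    def handle_timeout(self):
        # Called by handle_request() when no request arrived in time.
        self.timed_out = True


def serve_until_idle(server):
    """Handle requests one by one until an idle period elapses."""
    try:
        while not server.timed_out:
            server.handle_request()
    finally:
        server.server_close()
```

Something like serve_until_idle(IdleTimeoutServer(('127.0.0.1', 8080), SimpleHTTPRequestHandler, 60)) would stop about a minute after the last request. A page that is open but quiet would also trigger it, unless the page pings the server periodically.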
Thinking maybe if the server could understand a message from the website it could break loop. The tab closure could be detected in javascript like here : Browser/tab close detection using javascript (or any other language). Don't know how to communicate this to the server.
EditFinal:
In the javascript code of the webpage, I've inserted:
window.addEventListener('unload', function (e) { e.preventDefault(); jsonLevelGet("http://127.0.0.1:8081/exit.json"); }, false);
Then, the python code is this server.py:
import sys
from threading import Thread
import webbrowser
import BaseHTTPServer
import SimpleHTTPServer
serverClass=BaseHTTPServer.HTTPServer
handlerClass=SimpleHTTPServer.SimpleHTTPRequestHandler
Protocol = "HTTP/1.0"
port = 8080
ip = '127.0.0.1'
admIp = ip
admPort = 8081
new = 2 #2 goes to new tab, 0 same and 1 window.
url = "http://"+ip+":{0}".format(port)
handlerClass.protocol_version = Protocol
httpdGame = serverClass((ip,port), handlerClass)
httpdAdm = serverClass((admIp,admPort), handlerClass)
sa = httpdGame.socket.getsockname()
sb = httpdAdm.socket.getsockname()
print("\n---\nServing HTTP on {0}, port {1}\n---\n".format(sa[0],sa[1]) )
print("\n---\nAdm HTTP listening on {0}, port {1}\n---\n".format(sb[0],sb[1]) )
browserOk = webbrowser.open(url,new=new)
def runGameServer():
    httpdGame.serve_forever()
    print("\nrunGameServer stopped\n")
    httpdAdm.shutdown()
    return
def runAdmServer():
    httpdAdm.handle_request()
    httpdGame.shutdown()
    print("\nrunAdmServer stopped\n")
    return
gameServerThread = Thread(target=runGameServer)
gameServerThread.daemon = True
admServerThread = Thread(target=runAdmServer)
gameServerThread.start()
admServerThread.start()
admServerThread.join()
It works! When the tab is closed, the server.py code exits! Thanks @st.never!
As you said, you could detect (in Javascript, in the browser) that the window is being closed, and send one last request to the server to shut it down.
If you don't want to inspect all the requests searching for the "poweroff request", you can instead have your server listen on two different ports (probably on different threads). For example, the "main" server listens on port 8080 with the current behaviour, and a separate instance listens on port 8081. Then you can simply shut down the server whenever any request reaches port 8081.
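A rough Python 3 sketch of this two-port idea, using http.server and threading. The names ExitHandler and start_servers are invented for this example, and the ports are parameters so nothing is hard-coded.

```python
import threading
from http.server import (BaseHTTPRequestHandler, HTTPServer,
                         SimpleHTTPRequestHandler)


class ExitHandler(BaseHTTPRequestHandler):
    """Answers any GET on the admin port with an empty 204 response."""

    def do_GET(self):
        self.send_response(204)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the console quiet


def start_servers(main_addr, admin_addr):
    """Serve files on main_addr until one request reaches admin_addr."""
    main = HTTPServer(main_addr, SimpleHTTPRequestHandler)
    admin = HTTPServer(admin_addr, ExitHandler)
    main_thread = threading.Thread(target=main.serve_forever, daemon=True)
    main_thread.start()

    def wait_for_exit():
        admin.handle_request()  # blocks until a single request arrives
        main.shutdown()         # makes serve_forever() return
        main.server_close()
        admin.server_close()

    admin_thread = threading.Thread(target=wait_for_exit)
    admin_thread.start()
    return main, admin, main_thread, admin_thread
```

Calling start_servers(('127.0.0.1', 8080), ('127.0.0.1', 8081)) and then joining the admin thread keeps the program alive until the page requests anything on port 8081.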
I want to display my CPU usage dynamically. I don't want to reload the page to see a new value. I know how to get the CPU usage in Python. Right now I render a template with the value. How can I continually update a page with a value from Flask?
@app.route('/show_cpu')
def show_cpu():
    cpu = getCpuLoad()
    return render_template('show_cpu.html', cpu=cpu)
Using an Ajax request
Python
@app.route('/_stuff', methods= ['GET'])
def stuff():
    cpu=round(getCpuLoad())
    ram=round(getVmem())
    disk=round(getDisk())
    return jsonify(cpu=cpu, ram=ram, disk=disk)
Javascript
function update_values() {
    $SCRIPT_ROOT = {{ request.script_root|tojson|safe }};
    $.getJSON($SCRIPT_ROOT+"/_stuff",
        function(data) {
            $("#cpuload").text(data.cpu+" %")
            $("#ram").text(data.ram+" %")
            $("#disk").text(data.disk+" %")
        });
}
Using Websockets
project/app/views/request/websockets.py
# -*- coding: utf-8 -*-
# OS Imports
import json
# Local Imports
from app import sockets
from app.functions import get_cpu_load, get_disk_usage, get_vmem
@sockets.route('/_socket_system')
def socket_system(ws):
    """
    Returns the system information, JSON format
    CPU, RAM, and Disk Usage
    """
    while True:
        message = ws.receive()
        if message == "update":
            cpu = round(get_cpu_load())
            ram = round(get_vmem())
            disk = round(get_disk_usage())
            ws.send(json.dumps(dict(received=message, cpu=cpu, ram=ram, disk=disk)))
        else:
            ws.send(json.dumps(dict(received=message)))
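If you want to test the reply logic without opening a socket, one option is to pull it out of the loop into a plain function. This is only a sketch: build_reply is an invented name, and the readers argument stands in for get_cpu_load, get_vmem and get_disk_usage from app.functions, which are not shown here.

```python
import json


def build_reply(message, readers):
    """Return the JSON text to send back for one incoming message.

    readers maps a field name to a zero-argument function that returns
    the current value for that field.
    """
    reply = {"received": message}
    if message == "update":
        for name, read in readers.items():
            reply[name] = round(read())
    return json.dumps(reply)
```

Inside socket_system the loop body then becomes ws.send(build_reply(message, readers)), with readers built once from the three functions above.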
project/app/__init__.py
# -*- coding: utf-8 -*-
from flask import Flask
from flask_sockets import Sockets
app = Flask(__name__)
sockets = Sockets(app)
app.config.from_object('config')
from app import views
Using Flask-Sockets made my life a lot easier. Here is the launcher:
launchwithsockets.sh
#!/bin/sh
gunicorn -k flask_sockets.worker app:app
Finally, here is the client code :
custom.js
The code is a bit too long, so here it is.
Note that I'm NOT using things like socket.io, that's why the code is long. This code also tries to reconnect to the server periodically, and can stop trying to reconnect on a user action. I use the Messenger lib to notify the user that something went wrong. Of course it's a bit more complicated than using socket.io but I really enjoyed coding the client side.