Accepting form data from HTML in Python - javascript

I am trying to write a simple web app as a school assignment. Basically, the app uses the twitter stream API to pull tweets, and then I'm going to graph the results using Google Charts.
I have written a Python script that is able to make the connection to Twitter and pull tweets in real time, but I want to be able to define the search criteria based on user input from an HTML page. (I don't have it written yet, but I plan to have a list of radio buttons that a user can select from to choose the search term.)
My question is: how do I get data from an HTML page into my Python script, and then return the results back to my HTML page?
I am very new to Python and web programming in general, so over-simplification would be nice!
This is the code in my python file:
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import time

ckey = ''
csecret = ''
atoken = ''
asecret = ''

class listener(StreamListener):

    def on_data(self, data):
        try:
            tweet = data.split(',"text":"')[1].split('","source')[0]
            print tweet
            saveThis = str(time.time()) + '::' + tweet
            print saveThis
            saveFile = open('twitDB.csv', 'a')
            saveFile.write(saveThis)
            saveFile.write('\n')
            saveFile.close()
            return True
        except BaseException, e:
            print 'failed ondata, ', str(e)
            time.sleep(5)

    def on_error(self, status):
        print status

auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track=["car"])
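One common way to wire this kind of thing up is a small web framework. Below is a minimal sketch assuming Flask; Flask itself, the form field name 'track', the template names, and the run_stream_for helper are all assumptions for illustration, not part of the original assignment:

# Minimal sketch using Flask (an assumption; any small web framework would do).
# The field name 'track', the template names and run_stream_for() are illustrative.
from flask import Flask, request, render_template

app = Flask(__name__)

@app.route('/', methods=['GET', 'POST'])
def index():
    if request.method == 'POST':
        # Value of the radio button selected in the HTML form
        track_term = request.form['track']
        # Here you would feed the term to the tweepy stream, e.g.
        # twitterStream.filter(track=[track_term]), and collect the results.
        results = run_stream_for(track_term)  # hypothetical helper
        return render_template('results.html', results=results)
    return render_template('index.html')

if __name__ == '__main__':
    app.run(debug=True)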

Related

What is the optimal way to show custom data from MSAccess DB at Wordpress site?

I need advice from skilled Wordpress developers. My organization has an internal MS Access database which contains numerous tables, reports and input forms. The structure of the DB is not too complicated (person information, events, third-party info and various relations between them). We would like to show some portion of this info on our Wordpress site, which currently has only a news section.
Because the information in our DB is updated each day, we would also like to set up a simple synchronization between the MS Access DB and Wordpress (MySQL DB). Right now I am trying to find the best way to connect MS Access and Wordpress.
At present I see only these ways to do it:
Make export requests and save them to XML files.
Import them into the MySQL DB of Wordpress.
Show the content on the Wordpress site using the Custom Fields feature (or develop our own plugin).
-OR-
Build our own informational system on some very light PHP engine (for example CodeIgniter) on the same domain as the Wordpress site, which will actually show the imported content.
Both variants need manual transfer of info between the DBs each day, and I don't know Wordpress's capabilities for showing custom data from a DB. Which of these ways would you prefer to use in my case?
P.S. The MS Access version used is 2007+ (.accdb file). Field names, DB names and content are in Russian. In the future we plan to add 2 new languages (English, Ukrainian). The MS Access DB also contains persons' photos (as attachments).
---Updated info---
I was able to set up semi-automatic import/export operations using the following technique:
Javascript library ACCESSdb (slightly modified for the new DB format)
Internet Explorer 11 (for running ADODB ActiveX)
a small VBS script for extracting attached files from MSAccess tables
latest jQuery
Wordpress plugins for custom data (Advanced Custom Fields, Custom Post Type UI)
Wordpress Rest-API enabled (with plugins JSON Basic Authentication, ACF to REST API)
First I constructed the data scheme on the Wordpress site using the custom post and custom field technique. Then I run JS queries against the MSAccess DB locally and send the received info via jQuery to the WP Rest-API endpoints. The whole transfer operation can be done in 1 click.
But I can't upload files automatically via JS due to security limitations; this can be done at the cost of +1 click per file.
Your question is too broad.
It consists of two parts: 1. export from Access and 2. import to Wordpress. Since I'm not familiar with Wordpress I can only give you advice about part 1. At least Google shows that there are some plugins able to import from CSV, like this one:
https://ru.wordpress.org/plugins/wp-ultimate-csv-importer/
You can create a scheduled task that runs Access, which runs a macro, which runs a VBA function, as described here:
Running Microsoft Access as a Scheduled Task
In that VBA function you can use the ADODB.Stream object to create a UTF-8 CSV file with your data and upload it to your site's FTP.
OR
Personally I use a Python script to do something similar. I prefer this way because it is more straightforward and reliable. Here is my code. Note that I have two FTP servers: one of them is for testing only.
# -*- coding: utf-8 -*-
# 2018-10-31
# 2018-11-28
import os
import csv
from time import sleep
from ftplib import FTP_TLS
from datetime import datetime as dt
import msaccess

FTP_REAL = {'FTP_SERVER': r'your.site.com',
            'FTP_USER': r'username',
            'FTP_PW': r'Pa$$word'
            }
FTP_WIP = {'FTP_SERVER': r'192.168.0.1',
           'FTP_USER': r'just_test',
           'FTP_PW': r'just_test'
           }

def ftp_upload(fullpath:str, ftp_folder:str, real:bool):
    ''' Upload file to FTP '''
    try:
        if real:
            ftp_set = FTP_REAL
        else:
            ftp_set = FTP_WIP
        with FTP_TLS(ftp_set['FTP_SERVER']) as ftp:
            ftp.login(user=ftp_set['FTP_USER'], passwd=ftp_set['FTP_PW'])
            ftp.prot_p()
            # Passive mode off otherwise there will be problem
            # with another upload attempt
            # my site doesn't allow active mode :(
            ftp.set_pasv(ftp_set['FTP_SERVER'].find('selcdn') > 0)
            ftp.cwd(ftp_folder)
            i = 0
            while i < 3:
                sleep(i * 5)
                i += 1
                try:
                    with open(fullpath, 'br') as f:
                        ftp.storbinary(cmd='STOR ' + os.path.basename(fullpath),
                                       fp=f)
                except OSError as e:
                    if e.errno != 0:
                        print(f'ftp.storbinary error:\n\t{repr(e)}')
                except Exception as e:
                    print(f'ftp.storbinary exception:\n\t{repr(e)}')
                filename = os.path.basename(fullpath)
                # Check if uploaded file size matches local file:
                # IDK why but single ftp.size command sometimes returns None,
                # run this first:
                ftp.size(filename)
                # input(f'overwrite it: {filename}')
                ftp_size = ftp.size(os.path.basename(fullpath))
                # import pdb; pdb.set_trace()
                if ftp_size != None:
                    if ftp_size == os.stat(fullpath).st_size:
                        print(f'File \'{filename}\' successfully uploaded')
                        break
                    else:
                        print('Transfer failed')
                        # input('Press enter for another try...')
    except OSError as e:
        if e.errno != 0:
            return False, repr(e)
    except Exception as e:
        return False, repr(e)
    return True, None

def make_file(content:str):
    ''' Make CSV file in temp directory and return True and fullpath '''
    fullpath = os.environ['tmp'] + f'\\{dt.now():%Y%m%d%H%M}.csv'
    try:
        with open(fullpath, 'wt', newline='', encoding='utf-8') as f:
            try:
                w = csv.writer(f, delimiter=';')
                w.writerows(content)
            except Exception as e:
                return False, f'csv.writer fail:\n{repr(e)}'
    except Exception as e:
        return False, repr(e)
    return True, fullpath

def query_upload(sql:str, real:bool, ftp_folder:str, no_del:bool=False):
    ''' Run query and upload to FTP '''
    print(f'Real DB: {real}')
    status, data = msaccess.run_query(sql, real=real, headers=False)
    rec_num = len(data)
    if not status:
        print(f'run_query error:\n\t{data}')
        return False, data
    status, data = make_file(data)
    if not status:
        print(f'make_file error:\n\t{data}')
        return False, data
    fi = data
    status, data = ftp_upload(fi, ftp_folder, real)
    if not status:
        print(f'ftp_upload error:\n\t{data}')
        return False, data
    print(f'Done: {rec_num} records')
    if no_del: input('\n\nPress Enter to exit and delete file')
    os.remove(fi)
    return True, rec_num
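A hypothetical invocation of query_upload might look like this (the SQL text and FTP folder below are placeholders only, not from the real setup):

# Placeholder query and folder; adjust for your own tables and site layout
ok, result = query_upload(sql='SELECT * FROM Persons',
                          real=False,          # use the test FTP server
                          ftp_folder='/upload')
print('Records uploaded:' if ok else 'Error:', result)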

Python: Javascript rendered webpage not parsing

I want to parse the information at the following URL: the name of the trade, the strategy description, and the transactions in the "Trading History" and "Open Positions" sections. When I parse the page, I do not get this data.
I am new to parsing JavaScript-rendered webpages, so I would appreciate some explanation of why my code below isn't working.
import bs4 as bs
import urllib
import dryscrape
import sys
import time
url = 'https://www.zulutrade.com/trader/314062/trading'
sess = dryscrape.Session()
sess.visit(url)
time.sleep(10)
sauce = sess.body()
soup = bs.BeautifulSoup(sauce, 'lxml')
Thanks!
The link in your code doesn't get you anything because the original URL you should play with is the one I'm pasting below. The one you tried to work with automatically gets redirected to the one I mention here.
https://www.zulutrade.com/zulutrade-client/traders/api/providers/314062/tradeHistory?
Scraping the JSON data out of the table on that page goes as follows:
import requests

r = requests.get('https://www.zulutrade.com/zulutrade-client/traders/api/providers/314062/tradeHistory?')
j = r.json()
items = j['content']
for item in items:
    print(item['currency'], item['pips'], item['tradeType'], item['transactionCurrency'], item['id'])

Scrapy - making selections from dropdown(e.g.date) on webpage

I'm new to Scrapy and Python, and I am trying to scrape data off the following start URL.
After login, this is my start URL --->
start_urls = ["http://www.flightstats.com/go/HistoricalFlightStatus/flightStatusByFlight.do?"]
(a) From there I need to interact with the webpage to select ---by-airport---
and then make ---airport, date, time period selection---.
How can I do that? I would like to loop over all time periods and past dates. (A rough sketch of one approach is included after my code below.)
I have used Firebug to see the source; I cannot show it here as I do not have enough points to post images.
I read a post mentioning the use of Splinter.
(b) After the selections it will lead me to a page where there are links to the eventual pages with the information I want. How do I collect these links and make Scrapy look into every one of them to extract the information?
Using rules? Where should I insert the rules/LinkExtractor function?
I am willing to try myself; I just hope for help finding posts that can guide me. I am a student and I have spent more than a week on this. I have done the Scrapy tutorial, the Python tutorial, read the Scrapy documentation and searched previous posts on Stack Overflow, but I did not manage to find posts that cover this.
A million thanks.
My code so far to log in, and the items to scrape via XPath from the eventual target site:
import scrapy
from tutorial.items import FlightItem
from scrapy.http import Request, FormRequest

class flightSpider(scrapy.Spider):
    name = "flight"
    allowed_domains = ["flightstats.com"]
    login_page = 'https://www.flightstats.com/go/Login/login_input.do;jsessionid=0DD6083A334AADE3FD6923ACB8DDCAA2.web1:8009?'
    start_urls = [
        "http://www.flightstats.com/go/HistoricalFlightStatus/flightStatusByFlight.do?"]

    def init_request(self):
        """This function is called before crawling starts."""
        return Request(url=self.login_page, callback=self.login)

    def login(self, response):
        """Generate a login request."""
        return FormRequest.from_response(
            response,
            formdata={'loginForm_email': 'marvxxxxxx#hotmail.com',
                      'password': 'xxxxxxxx'},
            callback=self.check_login_response)

    def check_login_response(self, response):
        """Check the response returned by a login request to see if we are
        successfully logged in."""
        if "Sign Out" in response.body:
            self.log("\n\n\nSuccessfully logged in. Let's start crawling!\n\n\n")
            # Now the crawling can begin..
            return self.initialized()  # ****THIS LINE FIXED THE LAST PROBLEM*****
        else:
            self.log("\n\n\nFailed, Bad times :(\n\n\n")
            # Something went wrong, we couldn't log in, so nothing happens.

    def parse(self, response):
        for sel in response.xpath('/html/body/div[2]/div[2]/div'):
            item = FlightItem()
            item['flight_number'] = sel.xpath('/div[1]/div[1]/h2').extract()
            item['aircraft_make'] = sel.xpath('/div[4]/div[2]/div[2]/div[2]').extract()
            item['dep_date'] = sel.xpath('/div[2]/div[1]/div').extract()
            item['dep_airport'] = sel.xpath('/div[1]/div[2]/div[2]/div[1]').extract()
            item['arr_airport'] = sel.xpath('/div[1]/div[2]/div[2]/div[2]').extract()
            item['dep_gate_scheduled'] = sel.xpath('/div[2]/div[2]/div[1]/div[2]/div[2]').extract()
            item['dep_gate_actual'] = sel.xpath('/div[2]/div[2]/div[1]/div[3]/div[2]').extract()
            item['dep_runway_actual'] = sel.xpath('/div[2]/div[2]/div[2]/div[3]/div[2]').extract()
            item['dep_terminal'] = sel.xpath('/div[2]/div[2]/div[3]/div[2]/div[1]').extract()
            item['dep_gate'] = sel.xpath('/div[2]/div[2]/div[3]/div[2]/div[2]').extract()
            item['arr_gate_scheduled'] = sel.xpath('/div[3]/div[2]/div[1]/div[2]/div[2]').extract()
            item['arr_gate_actual'] = sel.xpath('/div[3]/div[2]/div[1]/div[3]/div[2]').extract()
            item['arr_terminal'] = sel.xpath('/div[3]/div[2]/div[3]/div[2]/div[1]').extract()
            item['arr_gate'] = sel.xpath('/div[3]/div[2]/div[3]/div[2]/div[2]').extract()
            yield item
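For part (a), one possible direction (a sketch only; every form field name, selector and value below is a guess, not taken from the real flightstats form) is to submit the search form directly with FormRequest and loop over the airports, dates and time periods inside the spider:

# Sketch only, not working code: the field names 'airport', 'date' and
# 'period' and the XPath selector are placeholders; inspect the real form
# to get the actual names and option values.
import scrapy
from scrapy.http import Request, FormRequest

class FlightSearchSketch(scrapy.Spider):
    name = "flight_search_sketch"
    start_urls = ["http://www.flightstats.com/go/HistoricalFlightStatus/flightStatusByFlight.do?"]

    def parse(self, response):
        airports = ['SIN', 'LHR']          # placeholder airport codes
        dates = ['2014-07-01']             # placeholder past dates
        periods = ['0-6', '6-12']          # placeholder time periods
        for airport in airports:
            for date in dates:
                for period in periods:
                    yield FormRequest.from_response(
                        response,
                        formdata={'airport': airport,
                                  'date': date,
                                  'period': period},
                        callback=self.parse_results)

    def parse_results(self, response):
        # (b): follow every link on the results page to the detail pages
        for href in response.xpath('//a/@href').extract():
            yield Request(response.urljoin(href), callback=self.parse_flight)

    def parse_flight(self, response):
        # Extract the item fields here, as in flightSpider.parse() above
        pass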

Cannot get entire web page after query

I'm trying to scrape the historical NAVPS tables found on this page:
http://www.philequity.net/pefi_historicalnavps.php
All the code here is the content of my minimal working script. It starts with:
import urllib
import urllib2
from BeautifulSoup import BeautifulSoup
opener = urllib2.build_opener()
urllib2.install_opener(opener)
After studying the web page using Chrome's Inspect Element, I find that the Form Data sent is the following:
form_data = {}
form_data['mutualFund'] = '1'
form_data['year'] = '1995'
form_data['dmonth'] = 'Month'
form_data['dday'] = 'Day'
form_data['dyear'] = 'Year'
So I continue building up the request:
url = "http://www.philequity.net/pefi_historicalnavps.php"
params = urllib.urlencode(form_data)
request = urllib2.Request(url, params)
I expect this to be the equivalent of clicking "Get NAVPS" after filling in the form:
page = urllib2.urlopen(request)
Then I read it with BeautifulSoup:
soup = BeautifulSoup(page.read())
print soup.prettify()
But alas! I only get the web page as though I didn't click "Get NAVPS" :( Am I missing something? Is the server sending the table in a separate stream? How do I get to it?
When I look at the POST request in Firebug, I see one more parameter that you aren't passing: "type" is "Year". I don't know if this will get the data for you; there are any number of other reasons it might not serve you the data.
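In code terms, a small sketch reusing the variables from the question, with only that extra field added:

# Add the extra field observed in the POST request
form_data['type'] = 'Year'

params = urllib.urlencode(form_data)
request = urllib2.Request(url, params)
page = urllib2.urlopen(request)
soup = BeautifulSoup(page.read())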

Webscraping Google tasks via Google Calendar

As Gmail and the Tasks API are not available everywhere (e.g. some companies block Gmail but not Calendar), is there a way to scrape Google Tasks through the Calendar web interface?
I wrote a userscript like the one below, but I find it too brittle:
// List of divs to hide
idlist = [
    'gbar',
    'logo-container',
    ...
];

// Hiding by id
function displayNone(idlist) {
    for each (id in idlist) {
        document.getElementById(id).style.display = 'none';
    }
}
The Google Tasks API is now available. You can get a list of your tasks via an HTTP query, with the result returned in JSON. There is a step-by-step example of how to write a Google Tasks webapp on Google App Engine at
http://code.google.com/appengine/articles/python/getting_started_with_tasks_api.html
The sample webapp looks like this:
from google.appengine.dist import use_library
use_library('django', '1.2')

from google.appengine.ext import webapp
from google.appengine.ext.webapp import template
from google.appengine.ext.webapp.util import run_wsgi_app
from apiclient.discovery import build
import httplib2
from oauth2client.appengine import OAuth2Decorator
import settings

decorator = OAuth2Decorator(client_id=settings.CLIENT_ID,
                            client_secret=settings.CLIENT_SECRET,
                            scope=settings.SCOPE,
                            user_agent='mytasks')

class MainHandler(webapp.RequestHandler):

    @decorator.oauth_aware
    def get(self):
        if decorator.has_credentials():
            service = build('tasks', 'v1', http=decorator.http())
            result = service.tasks().list(tasklist='@default').execute()
            tasks = result.get('items', [])
            for task in tasks:
                task['title_short'] = truncate(task['title'], 26)
            self.response.out.write(template.render('templates/index.html',
                                                    {'tasks': tasks}))
        else:
            url = decorator.authorize_url()
            self.response.out.write(template.render('templates/index.html',
                                                    {'tasks': [],
                                                     'authorize_url': url}))

def truncate(string, length):
    return string[:length] + '...' if len(string) > length else string

application = webapp.WSGIApplication([('/', MainHandler)], debug=True)

def main():
    run_wsgi_app(application)
Note that first you need to enable Google Tasks API at the API Console https://code.google.com/apis/console/b/0/?pli=1
I would suggest parsing the Atom feed of the calendars you wish to see. You can get the feeds of individual calendars by selecting the Options Gear > Calendar Settings, then choosing the Calendars tab, and selecting the calendar you want. From the Calendar Details screen, you can get an Atom (XML) feed, an iCal feed, or an HTML/Javascript calendar.
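A minimal sketch of consuming such a feed from Python (the feed URL below is a placeholder you would copy from the Calendar Details screen, and feedparser is just one convenient library choice, not something this answer requires):

# Sketch: read entries from a calendar's private Atom feed.
# The URL is a placeholder; paste the real Atom link from Calendar Details.
import feedparser

feed_url = 'https://www.google.com/calendar/feeds/YOUR_ID/private-KEY/basic'
feed = feedparser.parse(feed_url)

for entry in feed.entries:
    print(entry.title)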
