I am programming in python a script to obtain statistical data of the public schools of the city in which I live. With the following code I get the source code of a page that shows, by pages, the first 100 results of a total of 247 schools:
import requests
url = "http://www.madrid.org/wpad_pub/run/j/BusquedaAvanzada.icm"
post_call = {'Public title': 'S', 'cdMuni': '079', 'cdNivelEdu': '6545'}
r = requests.post(url, data = post_call)
The page can be viewed here.
On that page there is a button that activates a javascript function to download a csv file with all 247 results. I was thinking of using Selenium to download this file, but I have seen, using Tamper Data, that when the button is pressed a POST call occurs, in which the parameter codCentrosExp is sent with the codes of the 247 colleges. The parameter looks like this:
CodCentrosExp = 28077877%3B28077865%3B28063751%3B28018392%3B28018393%...(thus up to the 247 codes)
This makes my work easier, since I do not have to download the csv file, open it, select the code column, etc. And I could do it with Tamper Data, but my question is: how can I get that parameter with my Python script, without having to use Tamper Data?
I finally found the parameter in the page's source code, and extracted them as follows:
schools = BeautifulSoup(r.content, "lxml")
school_codes = schools.findAll(Attrs = {"name": "codCentrosExp", "value": re.compile("^.+$")})[0]["value"]
school_codes = school_codes.split(";")
Anyway, if anyone knows how to respond to the original question, I would be grateful to know how it could be done.
Related
I'm referencing this url: https://tracker.icon.foundation/block/29562412
If you scroll down to "Transactions", it shows 2 transactions with separate links, that's essentially what I'm trying to grab. If I try a simple pd.read_csv(url) command, it clearly omits the data I'm looking for, so I thought it might be JavaScript based and tried the following code instead:
from requests_html import HTMLSession
session = HTMLSession()
r = session.get('https://tracker.icon.foundation/block/29562412')
r.html.links
r.html.absolute_links
and I get the result "set()"
even though I was expecting the following:
['https://tracker.icon.foundation/transaction/0x9e5927c83efaa654008667d15b0a223f806c25d4c31688c5fdf34936a075d632', 'https://tracker.icon.foundation/transaction/0xd64f88fe865e756ac805ca87129bc287e450bb156af4a256fa54426b0e0e6a3e']
Is JavaScript even the right approach? I tried BeautifulSoup instead and found no cigar on that end as well.
You're right. This page is populated asynchronously using JavaScript, so BeautifulSoup and similar tools won't be able to see the specific content you're trying to scrape.
However, if you log your browser's network traffic, you can see some (XHR) HTTP GET requests being made to a REST API, which serves its results in JSON. This JSON happens to contain the information you're looking for. It actually makes several such requests to various API endpoints, but the one we're interested in is called txList (short for "transaction list" I'm guessing):
def main():
import requests
url = "https://tracker.icon.foundation/v3/block/txList"
params = {
"height": "29562412",
"page": "1",
"count": "10"
}
response = requests.get(url, params=params)
response.raise_for_status()
base_url = "https://tracker.icon.foundation/transaction/"
for transaction in response.json()["data"]:
print(base_url + transaction["txHash"])
return 0
if __name__ == "__main__":
import sys
sys.exit(main())
Output:
https://tracker.icon.foundation/transaction/0x9e5927c83efaa654008667d15b0a223f806c25d4c31688c5fdf34936a075d632
https://tracker.icon.foundation/transaction/0xd64f88fe865e756ac805ca87129bc287e450bb156af4a256fa54426b0e0e6a3e
>>>
I need an advice from skilled Wordpress developers. My organization has internal MS Access database which contains numerous tables, reports and input forms. The structure of DB is not too complicated (persons information, events, third parties info and different relations between them). We woild like to show some portion of this info at our Wordpress site, which currently has only news section.
Because information in our DB updated each day, also we would like to make simple synchronization between MS Access DB and Wordpress (MySQL DB). Now I try to find the best way how to connect MS Access and Wordpress.
At present I see only these ways how to do this:
Make export requests and save to XML files.
Import to MySQL DB of Wordpress.
Show content on Wordpress site using Custom fields feature (or develop own plugin).
-OR-
Build own informational system on some very light PHP engine (for example CodeIgniter) on same domain as Wordpress site, which will actually show imported content.
These variants needs manual transfer info between DB each day. And I don't know possibilities of Wordpress to show custom data from DB. Would you suggest me what ways will you prefer to use in my case?
P.S. MS Access used is ver 2007+ (file .accdb). Name of fields, db's and content is on Russian language. In future we planning to add 2 new languages (English, Ukrainian). MS access DB also contains persons photos included.
---Updated info---
I was able to make semi-atomatic import/export operations using following technique:
Javascript library ACCESSdb (little bit modified for new DB format)
Internet Explorer 11 (for running ADODB ActiveX)
small VBS script for extracting attached files from MSAccess tables.
latest jQuery
Wordpress plugins for custom data (Advanced Custom Fields, Custom Post Type UI)
Wordpress Rest-API enabled (with plugins JSON Basic Authentication, ACF to REST API)
At first I've constructed data scheme at Wordpress site using custom post and custom fields technique. Then I locally run JS queries to MSAccess DB, received info I sending via jQuery to WP Rest-API endpoints. Whole transfer operation can be made with in 1 click.
But I can't upload files automatically via JS due to security limitations. This can be done in +1 click per file.
Your question is too broad.
It consist of two parts: 1. export from Access and 2. import to Wordpress. Since i'm not familiar with Wordpress I can only give you advice about 1 part. At least google shows that there is some plugins that able to import from CSV like this one:
https://ru.wordpress.org/plugins/wp-ultimate-csv-importer/
You can create a scheduled task that runs Access that runs macro that runs VBA function as described here:
Running Microsoft Access as a Scheduled Task
In that VBA function you can use ADODB.Stream object to create a UTF-8 CSV file with you data and make upload to FTP of your site.
OR
Personally i use a python script to do something similar. I prefer this way because it is more straitforward and reliable. There is my code. Notice, that i have two FTP servers: one of them is for testing only.
# -*- coding: utf-8 -*-
# 2018-10-31
# 2018-11-28
import os
import csv
from time import sleep
from ftplib import FTP_TLS
from datetime import datetime as dt
import msaccess
FTP_REAL = {'FTP_SERVER':r'your.site.com',
'FTP_USER':r'username',
'FTP_PW':r'Pa$$word'
}
FTP_WIP = {'FTP_SERVER':r'192.168.0.1',
'FTP_USER':r'just_test',
'FTP_PW':r'just_test'
}
def ftp_upload(fullpath:str, ftp_folder:str, real:bool):
''' Upload file to FTP '''
try:
if real:
ftp_set = FTP_REAL
else:
ftp_set = FTP_WIP
with FTP_TLS(ftp_set['FTP_SERVER']) as ftp:
ftp.login(user=ftp_set['FTP_USER'], passwd=ftp_set['FTP_PW'])
ftp.prot_p()
# Passive mode off otherwise there will be problem
# with another upload attempt
# my site doesn't allow active mode :(
ftp.set_pasv(ftp_set['FTP_SERVER'].find('selcdn') > 0)
ftp.cwd(ftp_folder)
i = 0
while i < 3:
sleep(i * 5)
i += 1
try:
with open(fullpath, 'br') as f:
ftp.storbinary(cmd='STOR ' + os.path.basename(fullpath),
fp=f)
except OSError as e:
if e.errno != 0:
print(f'ftp.storbinary error:\n\t{repr(e)}')
except Exception as e:
print(f'ftp.storbinary exception:\n\t{repr(e)}')
filename = os.path.basename(fullpath)
# Check if uploaded file size matches local file:
# IDK why but single ftp.size command sometimes returns None,
# run this first:
ftp.size(filename)
#input(f'overwrite it: {filename}')
ftp_size = ftp.size(os.path.basename(fullpath))
# import pdb; pdb.set_trace()
if ftp_size != None:
if ftp_size == os.stat(fullpath).st_size:
print(f'File \'{filename}\' successfully uploaded')
break
else:
print('Transfer failed')
# input('Press enter for another try...')
except OSError as e:
if e.errno != 0:
return False, repr(e)
except Exception as e:
return False, repr(e)
return True, None
def make_file(content:str):
''' Make CSV file in temp directory and return True and fullpath '''
fullpath = os.environ['tmp'] + f'\\{dt.now():%Y%m%d%H%M}.csv'
try:
with open(fullpath, 'wt', newline='', encoding='utf-8') as f:
try:
w = csv.writer(f, delimiter=';')
w.writerows(content)
except Exception as e:
return False, f'csv.writer fail:\n{repr(e)}'
except Exception as e:
return False, repr(e)
return True, fullpath
def query_upload(sql:str, real:bool, ftp_folder:str, no_del:bool=False):
''' Run query and upload to FTP '''
print(f'Real DB: {real}')
status, data = msaccess.run_query(sql, real=real, headers=False)
rec_num = len(data)
if not status:
print(f'run_query error:\n\t{data}')
return False, data
status, data = make_file(data)
if not status:
print(f'make_file error:\n\t{data}')
return False, data
fi = data
status, data = ftp_upload(fi, ftp_folder, real)
if not status:
print(f'ftp_upload error:\n\t{data}')
return False, data
print(f'Done: {rec_num} records')
if no_del: input('\n\nPress Enter to exit and delete file')
os.remove(fi)
return True, rec_num
Hi I am building a web app with Flask Python. I got a problem here:
#app.route('/analytics/signals/<ticker_url>')
def analytics_signals_com_page(ticker_url):
all_ticker = full_list
ticker_name = com_name
ticker = ticker_url.upper()
pricerec = sp500[ticker_url.upper()].tolist()
timerec = sp500[ticker_url.upper()].index.tolist()
return render_template('company.html', all_ticker=all_ticker, ticker_name=ticker_name, ticker=ticker, pricerec=pricerec, timerec=timerec)
Here I am defining company pages based on the a page will contain different content. The problem is that everything is fine upto ticker = ticker_url.upper(). It works perfectly fine. But for pricerec and timerec, they make problems.
sp500 is a pandas DataFrame columns being companies like "AAPL", "GOOG","MSFT", and so forth 505 companies and the index are timestamps, and values are the prices at each time.
So what I am doing for the pricerec, I am taking the ticker_url and use it to take the specific company's price and make it as a list. And timerec is to take the index (timestamps) and make it as a list. And I am passing these two variables into the company.html page.
But it makes internal server error. I do not know why it happens.
My expectation was that when a user click a button that href to "~/analytics/signals/aapl" then the company.html page will contain the pricerec and timerec for me to draw a graph. But it didn't work like that. It makes internal server error. I defined those two variables in the javascript also like I did for the other variables(all_ticker, ticker_name, and ticker)
Can anyone help me with this issue?
Thanks!
INTRODUCTION
I am working on personal project and using Symfony3.
In order to upload files i am using OneUpUploaderBundle.
And I am not accepting file name that consists of characters with accents, Cyrillic characters, etc.
In order to do so - I am using function from CODE section
TARGET
I would like to use PHP function in CODE section in JavaScript!
CODE
// transliterate text
public function transliterateText($input_text)
{
$input_russian = transliterator_transliterate('Russian-Latin/BGN', $input_text);
$input_german_french = transliterator_transliterate('Any-Latin; Latin-ASCII', $input_russian);
$input_baltic = iconv('UTF-8', 'ASCII//TRANSLIT', $input_german_french);
$transliterated_text = preg_replace('/[^a-zA-Z_0-9\(\)\n]/', '_', $input_baltic );
$transliterated_text = strtolower($transliterated_text);
return $transliterated_text;
}
EXAMPLE
input: "12345 Rūķīši Проверка äöüß àâæçéèêëïîôœùûüÿ.txt"
output: "12345_rukisi_proverka_aouss_aaaeceeeeiiooeuuuy.txt"
QUESTION
I did not found many information about this problem on the Internet.
May be it is not a good idea to use JavaScript for this task...
Or maybe I should create service in Symfony3, that is accessible through AJAX and returns transliterated text instead?
CONCLUSION
Please advise.
Thank You for your time and knowledge.
UPDATE
I would like to use this function in JavaScript in order to show user what the filename would look like when on the server. (File names are going to be transliterated on server anyway). At the moment I am sending (in the UploadListener) following information for each file [{'error':'none'}{'orig':'my file name.txt'}{'t13n':'my_file_name.txt'}]. I would like to send as little as possible informātion from server to browser. So if there was "translation" of the CODE I would need only to send error for each file...
Ive been using an Internet Explorer automation script found here:
http://www.pvle.be/2009/06/web-ui-automationtest-using-powershell/
That lets me easily post form data using commands (functions) like this:
NavigateTo "http://www.websiteURI/"
SetElementValueByName "q" "powershell variable scope"
SetElementValueByName "num" "30"
SetElementValueByName "lr" "lang_en"
ClickElementById "sb_form_go"
The above would let me post values to elements and click to submit the form.
I would like to do the equivalent with Powershell's web client using helper functions. I haven't found such a script. The closest I could find was The Scripting Guys, Send-WebRequest:
http://gallery.technet.microsoft.com/scriptcenter/7e7b6bf2-d067-48c3-96b3-b38f26a1d143
which I'm not even sure it does what I expect (since there's no working examples showing how to do what I want).
Anyway, I'd really appreciate some help to get me started to do the equivalent of what I showed up there with working examples (as simple as possible). A bonus would be to also be able to get a list of element names for a URI in order to know what form elements I want to submit.
PS: I also need to be able to specify user-agent and credentials; so, examples with these included would be ideal.
Have you taken a look at the Invoke-WebRequest commmand? (requires powershell 3.0 or above) I believe the following would work for submitting the data
#POSTing data
Invoke-WebRequest http://www.websiteURI/ `
-UserAgent 'My User Agent' `
-Credential $cred `
-Method Post `
-Body #{
q = 'powershell variable scope'
num = 30
lr = 'lang_en'
}
For your bonus, the result of Invoke-WebRequest contains a collection of the InputFields on the page, which you can use to get a list of form elements to set.
#List input elements
Invoke-WebRequest http://www.websiteURI/ | select -ExpandProperty InputFields