How to export an Array to PDF using jsPDF? - javascript

first time using jsPDF.
I'm trying to export some data via PDF, however when I try to export the array it gives me an error. I have ASSUMED you export arrays using doc.table because I can't get to find any documentation, there's a tag for it in stackoverflow for questions but didn't find anyone with the same question neither.
This is what I have so far
const generatePDF = () => {
var doc = new jsPDF('landscape', 'px', 'a4', 'false');
doc.addImage(AIBLogo, 'PNG', 250,10,100,100)
doc.setFont('Helvertica', "bold")
doc.text(60,150, "Informacion del Pedido");
doc.setFont('Helvertica', "Normal")
doc.text(60,170, "Estado del Pago: "+estadoDePago)
doc.text(60,190, "Estado de Disponibilidad: "+estadoDeDisponibilidad)
doc.text(60,210, "Total del Pedido: "+total)
doc.text(60,230, "Total Pagado: "+totalPagado)
doc.setFont('Helvertica', "bold")
doc.text(360,150, "Informacion del Estudiante");
doc.setFont('Helvertica', "Normal")
doc.text(360,170, "Nombre del Estudiante: "+nombre)
doc.text(360,190, "Escuela: "+escuela)
doc.text(360,210, "Estado del Pago: "+grado)
doc.text(360,240, "Direccion de Entrega: Retirar en institución")
doc.table(100,100, librosData)
doc.save(id+".pdf")
}
Excluding the table bit it prints out like this:
I would like to add the DataTable after all that info and make new pages if is out of bounds cause table can be up to 20 items.
UPDATE
I have try the following but is not working neither I got it from here: StackOverflow Answer
var col = ["Descripcion", "Tipo", "Editorial","Precio"]
row = []
for (let i = 0; i < librosData.length; i++){
var temp = [librosData[i].descripcion, librosData[i].tipo, librosData[i].editorial, librosData[i].precio];
row.push(temp)
}
doc.autoTable(col,row, {startY:380});
doc.save(id+".pdf")
I know all the data is coming correctly this is how the row is printing:
Any help or documentation is appreciate it.
Final Update
To fix the issue that says autotable is not a function just
Solution: StackOverflow

In the end to make the table I recommend this:
To do the table: How to export a table with jsPDF
To fix the compatibility: autoTable is not a function
To fix "deprecated autoTable initiation":
Initialize your autoTable the following way
import autoTable from 'jspdf-autotable'
autoTable(doc, {
head: [col],
body: row
})

Related

How can table_name be a property in my post query?

I want this query to be generic on the table_name meaning that there is in my JSON file a new property named "device" which indicates in which table the data will be inserted.
the problem is that in my SQL request I can't specify it. Here is what I tried:
INSERT INTO ${device} (adc_v, adc_i, acc_axe_x, acc_axe_y, acc_axe_z, temperature, spo2, pa_diastolique, pa_systolique, indice_confiance, received_on, bpm)' +
'values(${adc_v}, ${adc_i}, ${acc_axe_x}, ${acc_axe_y}, ${acc_axe_z}, ${temperature}, ${spo2}, ${pa_diastolique}, ${pa_systolique}, ${indice_confiance}, ${received_on}, ${bpm})'
here is my JSON on postman:
{
"device": "tag_7z8eq73",
"adc_v": 130,
"adc_i": {{RandomCourant}},
"acc_axe_x": {{RandomAccX}},
"acc_axe_y": {{RandomAccY}},
"acc_axe_z": {{RandomAccZ}},
"temperature": {{RandomTemp}},
"spo2": {{RandomSpo2}},
"pa_diastolique": {{RandomDias}},
"pa_systolique": {{RandomSys}},
"indice_confiance": {{RandomIndiceConf}},
"received_on": "{{$isoTimestamp}}",
"bpm": {{RandomBpm}}}
The table name is : tag_7z8eq73
here is the error that is returned to me:
error: erreur de syntaxe sur ou près de « 'tag_7z8eq73' »
Looks like I am close to the solution but there is a syntax problem, the quote ? is my way the right one?
const device = req.body.device;
console.log(device)
return db.none('INSERT INTO '+device+' (adc_v, adc_i, acc_axe_x, acc_axe_y, acc_axe_z, temperature, spo2, pa_diastolique, pa_systolique, indice_confiance, received_on, bpm)' +
'values(${adc_v}, ${adc_i}, ${acc_axe_x}, ${acc_axe_y}, ${acc_axe_z}, ${temperature}, ${spo2}, ${pa_diastolique}, ${pa_systolique}, ${indice_confiance}, ${received_on}, ${bpm})',
req.body)
try this

How to create language selection wrapper from a gist script?

I have a Gist file written in different languages all do the same thing.
So, I would like to create a language select option similar to Google docs documentation.
Is it possible to create such a wrapper class that accepts a Gist script tag and display as above?
As in embed single file, I tried different query command like <script src="https://gist.github.com/gistid.js?language=python">, but none of them work.
This is the processing code that I ended up with.
With some CSS + javascript hide and toggle logic, it would work like google docs documentation.
I'd appreciate it if anyone updates this answer, with css or js.
import requests
from bs4 import BeautifulSoup
def render_gist_by_file(gist_id):
result = requests.get(f'https://gist.github.com/{gist_id}.js', headers=git_credential)
if result.text.startswith("<!DOCTYPE html>"):
return None
result = result.text
result = result.replace("\\/", "/").replace("\\&", "&").replace("\\$", "$").replace("\\<", "<").replace("\\`", "`").replace("\\n", "\n").replace('\\"', '"')
result = html.unescape(result)
result = result.split("document.write('")[-1][:-3]
bs = BeautifulSoup(result, "html.parser")
for tag in bs.find_all(class_="gist"):
file_box = tag.find(class_="file-box")
root = tag.find(class_="file-box")
toggle_div = bs.new_tag('div', attrs={"class": "gist-meta"})
for i, d in enumerate(tag.find_all(class_="file")):
d["class"] = f"file gist-toggle gist-id-{gist_id}"
if i != 0:
file_box.append(d) # combine to first table
for d in tag.find_all(class_="gist-meta"):
siblings = list(d.next_elements)
file_id, file_name = siblings[4].attrs["href"].split("#")[-1], siblings[5]
gist.file_names.append(file_name)
toggle_a = bs.new_tag('a', attrs={"id": file_id, "class": f"gist-toggle gist-id-{gist_id}", "onclick": f"toggle('gist-id-{gist_id}', '{file_id}')", "style": "padding: 0 18px"})
toggle_a.append(file_name)
toggle_div.append(toggle_a)
d.extract() # remove bottom nav
root.insert(0, toggle_div)
for d in islice(tag.find_all(class_="gist-file"), 1, None):
d.extract() # remove except first
gist.html = str(bs)
return gist

How to search an specific button in a grid using webdriverIO

I'm using an ExtJS grid panel. This grid has more than 20 rows of info and I want to search in each row for an icon that represents active mode, using WebdriverIO as a test driver.
How can I search in each row till the test driver finds the first active icon? (Note: the grid I'm testing is hosted on alegra.com).
Consider the following HTML print-screen:
It's hard to know exactly how to do it without knowing which locator you are using but if you first get a list of the rows and then filter them and grab the first match it should do what you are looking for.
public static get rows() { return browser.elements('#someTableId > table > tbody > tr'); }
public static getFirstMatch() {
return this.rows.value.filter((row: WebdriverIO.Element) =>
browser.elementIdElement(row.ELEMENT, 'someLocator').value)[0];
}
This is how i did it
var icon_type = 'delete';
it('Se elimina una factura', function(){
//rellenamos el array de elementos del grid
var elements = browser.getAttribute('#gridInvoices #gridview-1047-table tbody tr .action-icons img:nth-child(7)','class');
//Busca la primera coincidencia en el array con type
var row_num = elements.indexOf(icon_type);
if(row_num === -1){
throw new Error('No se encontraron botones activos del tipo "Eliminar" en todo el grid' )
}else{
$$('#gridInvoices #gridview-1047-table tbody tr')[row_num].click('.action-icons [title="Eliminar"]');
browser.click('=Sí')
};
});

Scrapy crawling not working on ASPX website

I'm scraping the Madrid Assembly's website, built in aspx, and I have no idea how to simulate clicks on the links where I need to get the corresponding politicians from. I tried this:
import scrapy
class AsambleaMadrid(scrapy.Spider):
name = "Asamblea_Madrid"
start_urls = ['http://www.asambleamadrid.es/ES/QueEsLaAsamblea/ComposiciondelaAsamblea/LosDiputados/Paginas/RelacionAlfabeticaDiputados.aspx']
def parse(self, response):
for id in response.css('div#moduloBusqueda div.sangria div.sangria ul li a::attr(id)'):
target = id.extract()
url = "http://www.asambleamadrid.es/ES/QueEsLaAsamblea/ComposiciondelaAsamblea/LosDiputados/Paginas/RelacionAlfabeticaDiputados.aspx"
formdata= {'__EVENTTARGET': target,
'__VIEWSTATE': '/wEPDwUBMA9kFgJmD2QWAgIBD2QWBAIBD2QWAgIGD2QWAmYPZBYCAgMPZBYCAgMPFgIeE1ByZXZpb3VzQ29udHJvbE1vZGULKYgBTWljcm9zb2Z0LlNoYXJlUG9pbnQuV2ViQ29udHJvbHMuU1BDb250cm9sTW9kZSwgTWljcm9zb2Z0LlNoYXJlUG9pbnQsIFZlcnNpb249MTQuMC4wLjAsIEN1bHR1cmU9bmV1dHJhbCwgUHVibGljS2V5VG9rZW49NzFlOWJjZTExMWU5NDI5YwFkAgMPZBYMAgMPZBYGBSZnXzM2ZWEwMzEwXzg5M2RfNGExOV85ZWQxXzg4YTEzM2QwNjQyMw9kFgJmD2QWAgIBDxYCHgtfIUl0ZW1Db3VudAIEFghmD2QWAgIBDw8WBB4PQ29tbWFuZEFyZ3VtZW50BTRHcnVwbyBQYXJsYW1lbnRhcmlvIFBvcHVsYXIgZGUgbGEgQXNhbWJsZWEgZGUgTWFkcmlkHgRUZXh0BTRHcnVwbyBQYXJsYW1lbnRhcmlvIFBvcHVsYXIgZGUgbGEgQXNhbWJsZWEgZGUgTWFkcmlkZGQCAQ9kFgICAQ8PFgQfAgUeR3J1cG8gUGFybGFtZW50YXJpbyBTb2NpYWxpc3RhHwMFHkdydXBvIFBhcmxhbWVudGFyaW8gU29jaWFsaXN0YWRkAgIPZBYCAgEPDxYEHwIFL0dydXBvIFBhcmxhbWVudGFyaW8gUG9kZW1vcyBDb211bmlkYWQgZGUgTWFkcmlkHwMFL0dydXBvIFBhcmxhbWVudGFyaW8gUG9kZW1vcyBDb211bmlkYWQgZGUgTWFkcmlkZGQCAw9kFgICAQ8PFgQfAgUhR3J1cG8gUGFybGFtZW50YXJpbyBkZSBDaXVkYWRhbm9zHwMFIUdydXBvIFBhcmxhbWVudGFyaW8gZGUgQ2l1ZGFkYW5vc2RkBSZnX2MxNTFkMGIxXzY2YWZfNDhjY185MWM3X2JlOGUxMTZkN2Q1Mg9kFgRmDxYCHgdWaXNpYmxlaGQCAQ8WAh8EaGQFJmdfZTBmYWViMTVfOGI3Nl80MjgyX2ExYjFfNTI3ZDIwNjk1ODY2D2QWBGYPFgIfBGhkAgEPFgIfBGhkAhEPZBYCAgEPZBYEZg9kFgICAQ8WAh8EaBYCZg9kFgQCAg9kFgQCAQ8WAh8EaGQCAw8WCB4TQ2xpZW50T25DbGlja1NjcmlwdAW7AWphdmFTY3JpcHQ6Q29yZUludm9rZSgnVGFrZU9mZmxpbmVUb0NsaWVudFJlYWwnLDEsIDEsICdodHRwOlx1MDAyZlx1MDAyZnd3dy5hc2FtYmxlYW1hZHJpZC5lc1x1MDAyZkVTXHUwMDJmUXVlRXNMYUFzYW1ibGVhXHUwMDJmQ29tcG9zaWNpb25kZWxhQXNhbWJsZWFcdTAwMmZMb3NEaXB1dGFkb3MnLCAtMSwgLTEsICcnLCAnJykeGENsaWVudE9uQ2xpY2tOYXZpZ2F0ZVVybGQeKENsaWVudE9uQ2xpY2tTY3JpcHRDb250YWluaW5nUHJlZml4ZWRVcmxkHgxIaWRkZW5TY3JpcHQFIVRha2VPZmZsaW5lRGlzYWJsZWQoMSwgMSwgLTEsIC0xKWQCAw8PFgoeCUFjY2Vzc0tleQUBLx4PQXJyb3dJbWFnZVdpZHRoAgUeEEFycm93SW1hZ2VIZWlnaHQCAx4RQXJyb3dJbWFnZU9mZnNldFhmHhFBcnJvd0ltYWdlT2Zmc2V0WQLrA2RkAgEPZBYCAgUPZBYCAgEPEBYCHwRoZBQrAQBkAhcPZBYIZg8PFgQfAwUPRW5nbGlzaCBWZXJzaW9uHgtOYXZpZ2F0ZVVybAVfL0VOL1F1ZUVzTGFBc2FtYmxlYS9Db21wb3NpY2lvbmRlbGFBc2FtYmxlYS9Mb3NEaXB1dGFkb3MvUGFnZXMvUmVsYWNpb25BbGZhYmV0aWNhRGlwdXRhZG9zLmFzcHhkZAICDw8WBB8DBQZQcmVuc2EfDgUyL0VTL0JpZW52ZW5pZGFQcmVuc2EvUGFnaW5hcy9CaWVudmVuaWRhUHJlbnNhLmFzcHhkZAIEDw8WBB8DBRpJZGVudGlmaWNhY2nDs24gZGUgVXN1YXJpbx8OBTQvRVMvQXJlYVVzdWFyaW9zL1BhZ2luYXMvSWRlbnRpZmljYWNpb25Vc3Vhcmlvcy5hc3B4ZGQCBg8PFgQfAwUGQ29ycmVvHw4FKGh0dHA6Ly9vdXRsb29rLmNvbS9vd2EvYXNhbWJsZWFtYWRyaWQuZXNkZAIlD2QWAgIDD2QWAgIBDxYCHwALKwQBZAI1D2QWAgIHD2QWAgIBDw8WAh8EaGQWAgIDD2QWAmYPZBYCAgMPZBYCAgUPDxYEHgZIZWlnaHQbAAAAAAAAeUABAAAAHgRfIVNCAoABZBYCAgEPPCsACQEADxYEHg1QYXRoU2VwYXJhdG9yBAgeDU5ldmVyRXhwYW5kZWRnZGQCSQ9kFgICAg9kFgICAQ9kFgICAw8WAh8ACysEAWQYAgVBY3RsMDAkUGxhY2VIb2xkZXJMZWZ0TmF2QmFyJFVJVmVyc2lvbmVkQ29udGVudDMkVjRRdWlja0xhdW5jaE1lbnUPD2QFKUNvbXBvc2ljacOzbiBkZSBsYSBBc2FtYmxlYVxMb3MgRGlwdXRhZG9zZAVHY3RsMDAkUGxhY2VIb2xkZXJUb3BOYXZCYXIkUGxhY2VIb2xkZXJIb3Jpem9udGFsTmF2JFRvcE5hdmlnYXRpb25NZW51VjQPD2QFGkluaWNpb1xRdcOpIGVzIGxhIEFzYW1ibGVhZJ',
'__EVENTVALIDATION': '/wEWCALIhqvYAwKh2YVvAuDF1KUDAqCK1bUOAqCKybkPAqCKnbQCAqCKsZEJAvejv84Dtkx5dCFr3QGqQD2wsFQh8nP3iq8',
'__VIEWSTATEGENERATOR': 'BAB98CB3',
'__REQUESTDIGEST': '0x476239970DCFDABDBBDF638A1F9B026BD43022A10D1D757B05F1071FF3104459B4666F96A47B4845D625BCB2BE0D88C6E150945E8F5D82C189B56A0DA4BC859D'}
yield scrapy.FormRequest(url=url, formdata= formdata, callback=self.takeEachParty)
def takeEachParty(self, response):
print response.css('ul.listadoVert02 ul li::text').extract()
Going into the source code of the website, I can see how links look like, and how they send the JavaScript query. This is one of the links I need to access:
<a id="ctl00_m_g_36ea0310_893d_4a19_9ed1_88a133d06423_ctl00_Repeater1_ctl00_lnk_Grupo" href="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("ctl00$m$g_36ea0310_893d_4a19_9ed1_88a133d06423$ctl00$Repeater1$ctl00$lnk_Grupo", "", true, "", "", false, true))">Grupo Parlamentario Popular de la Asamblea de Madrid</a>
I have been reading so many articles about, but probably the problem is my ignorance in respect.
Thanks in advance.
EDITED:
SOLUTION: I finally did it! Translating the very helpul code from Padraic Cunningham into Scrapy way. As I specified the issue for Scrapy, I want to post the result just in case someone has the same problem as I had.
So here it goes:
import scrapy
import js2xml
class AsambleaMadrid(scrapy.Spider):
name = "AsambleaMadrid"
start_urls = ['http://www.asambleamadrid.es/ES/QueEsLaAsamblea/ComposiciondelaAsamblea/LosDiputados/Paginas/RelacionAlfabeticaDiputados.aspx']
def parse(self, response):
source = response
hrefs = response.xpath("//*[#id='moduloBusqueda']//div[#class='sangria']/ul/li/a/#href").extract()
form_data = self.validate(source)
for ref in hrefs:
# js2xml allows us to parse the JS function and params, and so to grab the __EVENTTARGET
js_xml = js2xml.parse(ref)
_id = js_xml.xpath(
"//identifier[#name='WebForm_PostBackOptions']/following-sibling::arguments/string[starts-with(.,'ctl')]")[0]
form_data["__EVENTTARGET"] = _id.text
url_diputado = 'http://www.asambleamadrid.es/ES/QueEsLaAsamblea/ComposiciondelaAsamblea/LosDiputados/Paginas/RelacionAlfabeticaDiputados.aspx'
# The proper way to send a POST in scrapy is by using the FormRequest
yield scrapy.FormRequest(url=url_diputado, formdata=form_data, callback=self.extract_parties, method='POST')
def validate(self, source):
# these fields are the minimum required as cannot be hardcoded
data = {"__VIEWSTATEGENERATOR": source.xpath("//*[#id='__VIEWSTATEGENERATOR']/#value")[0].extract(),
"__EVENTVALIDATION": source.xpath("//*[#id='__EVENTVALIDATION']/#value")[0].extract(),
"__VIEWSTATE": source.xpath("//*[#id='__VIEWSTATE']/#value")[0].extract(),
" __REQUESTDIGEST": source.xpath("//*[#id='__REQUESTDIGEST']/#value")[0].extract()}
return data
def extract_parties(self, response):
source = response
name = source.xpath("//ul[#class='listadoVert02']/ul/li/a/text()").extract()
print name
I hope is clear. Thanks everybody, again!
If you look at the data posted to the form in chrome or firebug you can see there are many fields passed in the post request, there are a few that are essential and must be parsed from the original page, parsing the ids from the div.sangria ul li a tags is not sufficient as the actual data posted is slightly different, what is posted is in the Javascript function, WebForm_DoPostBackWithOptions which is in the href not the id attribute:
href='javascript:WebForm_DoPostBackWithOptions(new
WebForm_PostBackOptions("ctl00$m$g_36ea0310_893d_4a19_9ed1_88a133d06423$ctl00$Repeater1$ctl03$lnk_Grupo", "", true, "", "", false, true))'>
Sometimes all the underscores are replaced with dollar signs so it is easy to do a str.replace to get them in the correct order but not really in this case, we could use a regex to parse but I like the js2xml lib which can parse a javascript function and its args into an xml tree.
The following code using requests shows you how can get the data from the initial request and get to all the pages you want:
import requests
from lxml import html
import js2xml
post = "http://www.asambleamadrid.es/ES/QueEsLaAsamblea/ComposiciondelaAsamblea/LosDiputados/Paginas/RelacionAlfabeticaDiputados.aspx"
def validate(xml):
# these fields are the minimum required as cannot be hardcoded
data = {"__VIEWSTATEGENERATOR": xml.xpath("//*[#id='__VIEWSTATEGENERATOR']/#value")[0],
"__EVENTVALIDATION": xml.xpath("//*[#id='__EVENTVALIDATION']/#value")[0],
"__VIEWSTATE": xml.xpath("//*[#id='__VIEWSTATE']/#value")[0],
" __REQUESTDIGEST": xml.xpath("//*[#id='__REQUESTDIGEST']/#value")[0]}
return data
with requests.Session() as s:
# make initial requests to get the links/hrefs and the from fields
r = s.get(
"http://www.asambleamadrid.es/ES/QueEsLaAsamblea/ComposiciondelaAsamblea/LosDiputados/Paginas/RelacionAlfabeticaDiputados.aspx")
xml = html.fromstring(r.content)
hrefs = xml.xpath("//*[#id='moduloBusqueda']//div[#class='sangria']/ul/li/a/#href")
form_data = validate(xml)
for h in hrefs:
js_xml = js2xml.parse(h)
_id = js_xml.xpath(
"//identifier[#name='WebForm_PostBackOptions']/following-sibling::arguments/string[starts-with(.,'ctl')]")[
0]
form_data["__EVENTTARGET"] = _id.text
r = s.post(post, data=form_data)
xml = html.fromstring(r.content)
print(xml.xpath("//ul[#class='listadoVert02']/ul/li/a/text()"))
If we run the code above we see the different text output from all teh anchor tags:
In [2]: with requests.Session() as s:
...: r = s.get(
...: "http://www.asambleamadrid.es/ES/QueEsLaAsamblea/ComposiciondelaAsamblea/LosDiputados/Paginas/RelacionAlfabeticaDiputados.aspx")
...: xml = html.fromstring(r.content)
...: hrefs = xml.xpath("//*[#id='moduloBusqueda']//div[#class='sangria']/ul/li/a/#href")
...: form_data = validate(xml)
...: for h in hrefs:
...: js_xml = js2xml.parse(h)
...: _id = js_xml.xpath(
...: "//identifier[#name='WebForm_PostBackOptions']/following-sibling::arguments/string[starts-with(.,'ctl')]")[
...: 0]
...: form_data["__EVENTTARGET"] = _id.text
...: r = s.post(post, data=form_data)
...: xml = html.fromstring(r.content)
...: print(xml.xpath("//ul[#class='listadoVert02']/ul/li/a/text()"))
...:
[u'Abo\xedn Abo\xedn, Sonsoles Trinidad', u'Adrados Gautier, M\xaa Paloma', u'Aguado Del Olmo, M\xaa Josefa', u'\xc1lvarez Padilla, M\xaa Nadia', u'Arribas Del Barrio, Jos\xe9 M\xaa', u'Ballar\xedn Valc\xe1rcel, \xc1lvaro C\xe9sar', u'Berrio Fern\xe1ndez-Caballero, M\xaa In\xe9s', u'Berzal Andrade, Jos\xe9 Manuel', u'Cam\xedns Mart\xednez, Ana', u'Carballedo Berlanga, M\xaa Eugenia', 'Cifuentes Cuencas, Cristina', u'D\xedaz Ayuso, Isabel Natividad', u'Escudero D\xedaz-Tejeiro, Marta', u'Fermosel D\xedaz, Jes\xfas', u'Fern\xe1ndez-Quejo Del Pozo, Jos\xe9 Luis', u'Garc\xeda De Vinuesa Gardoqui, Ignacio', u'Garc\xeda Mart\xedn, Mar\xeda Bego\xf1a', u'Garrido Garc\xeda, \xc1ngel', u'G\xf3mez Ruiz, Jes\xfas', u'G\xf3mez-Angulo Rodr\xedguez, Juan Antonio', u'Gonz\xe1lez Gonz\xe1lez, Isabel Gema', u'Gonz\xe1lez Jim\xe9nez, Bartolom\xe9', u'Gonz\xe1lez Taboada, Jaime', u'Gonz\xe1lez-Mo\xf1ux V\xe1zquez, Elena', u'Gonzalo L\xf3pez, Rosal\xeda', 'Izquierdo Torres, Carlos', u'Li\xe9bana Montijano, Pilar', u'Mari\xf1o Ortega, Ana Isabel', u'Moraga Valiente, \xc1lvaro', u'Mu\xf1oz Abrines, Pedro', u'N\xfa\xf1ez Guijarro, Jos\xe9 Enrique', u'Olmo Fl\xf3rez, Luis Del', u'Ongil Cores, M\xaa Gador', 'Ortiz Espejo, Daniel', u'Ossorio Crespo, Enrique Mat\xedas', 'Peral Guerra, Luis', u'P\xe9rez Baos, Ana Isabel', u'P\xe9rez Garc\xeda, David', u'Pla\xf1iol De Lacalle, Regina M\xaa', u'Redondo Alcaide, M\xaa Isabel', u'Roll\xe1n Ojeda, Pedro', u'S\xe1nchez Fern\xe1ndez, Alejandro', 'Sanjuanbenito Bonal, Diego', u'Serrano Guio, Jos\xe9 Tom\xe1s', u'Serrano S\xe1nchez-Capuchino, Alfonso Carlos', 'Soler-Espiauba Gallo, Juan', 'Toledo Moreno, Lucila', 'Van-Halen Acedo, Juan']
[u'Andaluz Andaluz, M\xaa Isabel', u'Ardid Jim\xe9nez, M\xaa Isabel', u'Carazo G\xf3mez, M\xf3nica', u'Casares D\xedaz, M\xaa Luc\xeda Inmaculada', u'Cepeda Garc\xeda De Le\xf3n, Jos\xe9 Carmelo', 'Cruz Torrijos, Diego', u'Delgado G\xf3mez, Carla', u'Franco Pardo, Jos\xe9 Manuel', u'Freire Campo, Jos\xe9 Manuel', u'Gabilondo Pujol, \xc1ngel', 'Gallizo Llamas, Mercedes', u"Garc\xeda D'Atri, Ana", u'Garc\xeda-Rojo Garrido, Pedro Pablo', u'G\xf3mez Montoya, Rafael', u'G\xf3mez-Chamorro Torres, Jos\xe9 \xc1ngel', u'Gonz\xe1lez Gonz\xe1lez, M\xf3nica Silvana', u'Leal Fern\xe1ndez, M\xaa Isaura', u'Llop Cuenca, M\xaa Pilar', 'Lobato Gandarias, Juan', u'L\xf3pez Ruiz, M\xaa Carmen', u'Manguan Valderrama, Eva M\xaa', u'Maroto Illera, M\xaa Reyes', u'Mart\xednez Ten, Carmen', u'Mena Romero, M\xaa Carmen', u'Moreno Navarro, Juan Jos\xe9', u'Moya Nieto, Encarnaci\xf3n', 'Navarro Lanchas, Josefa', 'Nolla Estrada, Modesto', 'Pardo Ortiz, Josefa Dolores', u'Quintana Viar, Jos\xe9', u'Rico Garc\xeda-Hierro, Enrique', u'Rodr\xedguez Garc\xeda, Nicol\xe1s', u'S\xe1nchez Acera, Pilar', u'Sant\xedn Fern\xe1ndez, Pedro', 'Segovia Noriega, Juan', 'Vicente Viondi, Daniel', u'Vinagre Alc\xe1zar, Agust\xedn']
['Abasolo Pozas, Olga', 'Ardanuy Pizarro, Miguel', u'Beirak Ulanosky, Jazm\xedn', u'Camargo Fern\xe1ndez, Ra\xfal', 'Candela Pokorna, Marco', 'Delgado Orgaz, Emilio', u'D\xedaz Rom\xe1n, Laura', u'Espinar Merino, Ram\xf3n', u'Espinosa De La Llave, Mar\xeda', u'Fern\xe1ndez Rubi\xf1o, Eduardo', u'Garc\xeda G\xf3mez, M\xf3nica', 'Gimeno Reinoso, Beatriz', u'Guti\xe9rrez Benito, Eduardo', 'Huerta Bravo, Raquel', u'L\xf3pez Hern\xe1ndez, Isidro', u'L\xf3pez Rodrigo, Jos\xe9 Manuel', u'Mart\xednez Abarca, Hugo', u'Morano Gonz\xe1lez, Jacinto', u'Ongil L\xf3pez, Miguel', 'Padilla Estrada, Pablo', u'Ruiz-Huerta Garc\xeda De Viedma, Lorena', 'Salazar-Alonso Revuelta, Cecilia', u'San Jos\xe9 P\xe9rez, Carmen', u'S\xe1nchez P\xe9rez, Alejandro', u'Serra S\xe1nchez, Isabel', u'Serra S\xe1nchez, Clara', 'Sevillano De Las Heras, Elena']
[u'Aguado Crespo, Ignacio Jes\xfas', u'\xc1lvarez Cabo, Daniel', u'Gonz\xe1lez Pastor, Dolores', u'Iglesia Vicente, M\xaa Teresa De La', 'Lara Casanova, Francisco', u'Marb\xe1n De Frutos, Marta', u'Marcos Arias, Tom\xe1s', u'Meg\xedas Morales, Jes\xfas Ricardo', u'N\xfa\xf1ez S\xe1nchez, Roberto', 'Reyero Zubiri, Alberto', u'Rodr\xedguez Dur\xe1n, Ana', u'Rubio Ruiz, Juan Ram\xf3n', u'Ruiz Fern\xe1ndez, Esther', u'Sol\xeds P\xe9rez, Susana', 'Trinidad Martos, Juan', 'Veloso Lozano, Enrique', u'Zafra Hern\xe1ndez, C\xe9sar']
You can add the exact same logic to your spider, I just used requests to show you a working example. You should also be aware that not every asp.net site behaves the same, you may have to re-validate for every post as in this related answer.
I think that scrapy's from_response could help you a lot (maybe this isn't the best re but for it, but you'll get the idea), try something like this:
import scrapy
import urllib
from scrapy.http.request.form import FormRequest
class AsambleaMadrid(scrapy.Spider):
name = "Asamblea_Madrid"
start_urls = ['http://www.asambleamadrid.es/ES/QueEsLaAsamblea/ComposiciondelaAsamblea/LosDiputados/Paginas/RelacionAlfabeticaDiputados.aspx']
def parse(self, response):
ids_re = r'WebForm_PostBackOptions\(([^,]*)'
for id in response.css('#moduloBusqueda li a').re(ids_re):
target = urllib.unquote(id).strip('"')
formdata = {'__EVENTTARGET': target}
request = FormRequest.from_response(response=response,
formdata=formdata,
callback=self.takeEachParty,
dont_click=True)
yield request
def takeEachParty(self, response):
print response.css('.listadoVert02 li a::text').extract()
Do agree with ELRuLL - Firebug is your best friend while scraping.
If you want to avoid JS simulation then you need reproduce carefully all the params/headers that are being sent.
For example from what I see
for __EVENTTARGET you're sending just
id="ctl00_m_g_36ea0310_893d_4a19_9ed1_88a133d06423_ctl00_Repeater2_ctl01_lnk_Diputado")
and via Firebug we see that:
__EVENTTARGET=ctl00$m$g_36ea0310_893d_4a19_9ed1_88a133d06423$ctl00$Repeater2$ctl01$lnk_Diputado
It maybe the reason and maybe not, just repeat and test.
Firebug link just in case.

PHPExcel - Render HTML markups on output

In my site I have several forms using CKEditor to store it's data. So when I export all the content, in excel I can see markups all over my document. Is there an easy way to convert those to \n or somehow have PHPExcel recognize them as breaks?
I also tried changing the export type to Excel2007 thinking that maybe Excel could just simply render the HTML markup, however when I switch types my report just crashes. So im sticking with Excel5.
Any help would be appreciated.
<?php
/** PHPExcel */
require_once '/Excelphp/Classes/PHPExcel.php';
// Create new PHPExcel object
$objPHPExcel = new PHPExcel();
$rows=2;
$sheet=$objPHPExcel->getActiveSheet();
// Set properties
$objPHPExcel->getProperties()->setCreator("Maarten Balliauw")
->setLastModifiedBy("Maarten Balliauw")
->setTitle("Office 2007 XLSX Test Document")
->setSubject("Office 2007 XLSX Test Document")
->setDescription("Test document for Office 2007 XLSX, generated using PHP classes.")
->setKeywords("office 2007 openxml php")
->setCategory("Test result file");
//This is the hard coded *non dynamic* cell formatting
$sheet->getColumnDimension('A')->setWidth(5);
$sheet->getColumnDimension('B')->setWidth(15);
$sheet->getColumnDimension('C')->setWidth(50);
$sheet->getColumnDimension('D')->setWidth(50);
$sheet->getSheetView()->setZoomScale(90);
$sheet->getStyle('A:D') ->getAlignment()->setVertical(PHPExcel_Style_Alignment::VERTICAL_CENTER);
//Font Setting for the Support group title.
$Support_team = array('font'=> array('bold'=> true,'color' => array('rgb' => '4D4D4D'),'size' => 22,'name' => 'Arial'),'alignment' => array('horizontal' => PHPExcel_Style_Alignment::HORIZONTAL_CENTER,'vertical' => PHPExcel_Style_Alignment::VERTICAL_CENTER),);
//Font settings for the header cells only.
$headers = array('font'=> array('bold'=> true,'color' => array('rgb' => '4D4D4D'),'size' => 12,'name' => 'Arial'),'alignment' => array('horizontal' => PHPExcel_Style_Alignment::HORIZONTAL_CENTER,'vertical' => PHPExcel_Style_Alignment::VERTICAL_CENTER),);
//Border settings
$borders = array('borders' => array('inside'=> array('style' => PHPExcel_Style_Border::BORDER_THIN,'color' => array('argb' => '717171')),'outline' => array('style' => PHPExcel_Style_Border::BORDER_THIN,'color' => array('argb' => '717171'))));
// SQl database connections
$db = mysql_connect("localhost", "IMC_COE2", "IMC123");
mysql_select_db("IMC_COE2",$db);
$sql="select client, team_name,support_team_prime,prime_comments,support_team_backup,backup_comments,escalation1,escalation1_comments,escalation2,escalation2_comments,escalation3,escalation3_comments,escalation4,escalation4_comments,note from tbl_address ORDER BY team_name";
$result=mysql_query($sql);
$numrows=mysql_num_rows($result);
if ($numrows>0)
{
while($data=mysql_fetch_array($result))
{
//Cell Merging
$sheet
->mergeCells('B'.$rows.':D'.$rows)
->mergeCells('B'.($rows+2).':D'.($rows+2))
->mergeCells('B'.($rows+5).':D'.($rows+5))
->mergeCells('B'.($rows+10).':D'.($rows+10))
->mergeCells('C'.($rows+1).':D'.($rows+1))
->mergeCells('B'.($rows+11).':D'.($rows+11));
// Add some data
$objPHPExcel->setActiveSheetIndex(0)
->setCellValue('B'.($rows+1), 'Client:')
->setCellValue('B'.($rows+2), 'Support group contacts')
->setCellValue('B'.($rows+3), 'Prime:')
->setCellValue('B'.($rows+4), 'Backup:')
->setCellValue('B'.($rows+5), 'Escalations')
->setCellValue('B'.($rows+6), 'Escalation 1:')
->setCellValue('B'.($rows+7), 'Escalation 2:')
->setCellValue('B'.($rows+8), 'Escalation 3:')
->setCellValue('B'.($rows+9), 'Escalation 4:')
->setCellValue('B'.($rows+10), 'Notes');
//Format the hardcoded text
$sheet->getStyle('B'.$rows)->applyFromArray($Support_team);
$sheet->getStyle('B'.($rows+2))->applyFromArray($headers);
$sheet->getStyle('B'.($rows+5))->applyFromArray($headers);
$sheet->getStyle('B'.($rows+10))->applyFromArray($headers);
//Row height adjustments
$sheet->getRowDimension($rows+3)->setRowHeight(60);
$sheet->getRowDimension($rows+4)->setRowHeight(60);
$sheet->getRowDimension($rows+6)->setRowHeight(60);
$sheet->getRowDimension($rows+7)->setRowHeight(60);
$sheet->getRowDimension($rows+8)->setRowHeight(60);
$sheet->getRowDimension($rows+9)->setRowHeight(60);
$sheet->getRowDimension($rows+11)->setRowHeight(100);
//Cell Wraptext
$sheet->getStyle('C'.($rows+1).':D'.($rows+1))->getAlignment()->setWrapText(true);
$sheet->getStyle('C'.($rows+3).':D'.($rows+3))->getAlignment()->setWrapText(true);
$sheet->getStyle('C'.($rows+4).':D'.($rows+4))->getAlignment()->setWrapText(true);
$sheet->getStyle('C'.($rows+6).':D'.($rows+6))->getAlignment()->setWrapText(true);
$sheet->getStyle('C'.($rows+7).':D'.($rows+7))->getAlignment()->setWrapText(true);
$sheet->getStyle('C'.($rows+8).':D'.($rows+8))->getAlignment()->setWrapText(true);
$sheet->getStyle('C'.($rows+9).':D'.($rows+9))->getAlignment()->setWrapText(true);
$sheet->getStyle('B'.($rows+11).':D'.($rows+11))->getAlignment()->setWrapText(true);
//Background color on cells
$sheet->getStyle('B'.$rows.':D'.$rows)->getFill()->setFillType(PHPExcel_Style_Fill::FILL_SOLID)->getStartColor()->setARGB('FF9BC2E6');
$sheet->getStyle('B'.($rows+2).':D'.($rows+2))->getFill()->setFillType(PHPExcel_Style_Fill::FILL_SOLID)->getStartColor()->setARGB('FF9BC2E6');
$sheet->getStyle('B'.($rows+5).':D'.($rows+5))->getFill()->setFillType(PHPExcel_Style_Fill::FILL_SOLID)->getStartColor()->setARGB('FF9BC2E6');
$sheet->getStyle('B'.($rows+10).':D'.($rows+10))->getFill()->setFillType(PHPExcel_Style_Fill::FILL_SOLID)->getStartColor()->setARGB('FF9BC2E6');
$sheet->getStyle('B'.($rows+1))->getFill()->setFillType(PHPExcel_Style_Fill::FILL_SOLID)->getStartColor()->setARGB('FFE6F1FA');
$sheet->getStyle('B'.($rows+3))->getFill()->setFillType(PHPExcel_Style_Fill::FILL_SOLID)->getStartColor()->setARGB('FFE6F1FA');
$sheet->getStyle('B'.($rows+4))->getFill()->setFillType(PHPExcel_Style_Fill::FILL_SOLID)->getStartColor()->setARGB('FFE6F1FA');
$sheet->getStyle('B'.($rows+6))->getFill()->setFillType(PHPExcel_Style_Fill::FILL_SOLID)->getStartColor()->setARGB('FFE6F1FA');
$sheet->getStyle('B'.($rows+7))->getFill()->setFillType(PHPExcel_Style_Fill::FILL_SOLID)->getStartColor()->setARGB('FFE6F1FA');
$sheet->getStyle('B'.($rows+8))->getFill()->setFillType(PHPExcel_Style_Fill::FILL_SOLID)->getStartColor()->setARGB('FFE6F1FA');
$sheet->getStyle('B'.($rows+9))->getFill()->setFillType(PHPExcel_Style_Fill::FILL_SOLID)->getStartColor()->setARGB('FFE6F1FA');
//Border range
$sheet->getStyle('B'.$rows.':D'.($rows+11))->applyFromArray($borders);
//This section is the actual data imported from the SQL database *don't touch*
$objPHPExcel->setActiveSheetIndex(0)
->setCellValue('C'.($rows+1), utf8_encode($data['client'])) //this will give cell C2.
->setCellValue('B'.$rows, utf8_encode($data['team_name'])) // this will give cell B2
->setCellValue('C'.($rows+3), utf8_encode($data['support_team_prime'])) //this will give C5
->setCellValue('D'.($rows+3), utf8_encode($data['prime_comments'])) // This will give D5
->setCellValue('C'.($rows+4), utf8_encode($data['support_team_backup'])) //This will give C6
->setCellValue('D'.($rows+4), utf8_encode($data['backup_comments'])) //This will give D6
->setCellValue('C'.($rows+6), utf8_encode($data['escalation1']))//THis will give you C8
->setCellValue('D'.($rows+6), utf8_encode($data['escalation1_comments']))//This will give you D8
->setCellValue('C'.($rows+7), utf8_encode($data['escalation2']))//This will give you C9
->setCellValue('D'.($rows+7), utf8_encode($data['escalation2_comments']))//This will give you D9
->setCellValue('C'.($rows+8), utf8_encode($data['escalation3']))//This will give you C10
->setCellValue('D'.($rows+8), utf8_encode($data['escalation3_comments']))//This will give you D10
->setCellValue('C'.($rows+9), utf8_encode($data['escalation4']))//This will give you C11
->setCellValue('D'.($rows+9), utf8_encode($data['escalation4_comments']))//This will give you D11
->setCellValue('B'.($rows+11), utf8_encode($data['note'])); //This will give you B13
$rows+=14;
}
}
// Rename sheet
$sheet->setTitle('Directory Tool Full dump');
// Set active sheet index to the first sheet, so Excel opens this as the first sheet
$objPHPExcel->setActiveSheetIndex(0);
// Redirect output to a client’s web browser (Excel5)
ob_end_clean();
$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel5');
header('Content-Type: application/vnd.ms-excel');
$today=date("F.d.Y");
$filename = "Directory_Export-$today.xls";
header("Content-Disposition: attachment; filename=$filename");
header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
$objWriter->save('php://output');
exit;
?>
MS Excel has no concept of HTML Markup within a cell, it's simply a string value.... storing a value like ABC<br />DEF will simply be treated as a string containing ABC<br />DEF and not as two lines containing ABC and DEF respectively. If you want it to be treated as two lines then you need to change the <br /> to "\n". You also need to set the cell alignment to wrap.

Categories

Resources