I'm trying to convert HTML and CSS into a PDF using django-weasyprint, but I'm stuck with their tutorial. I want the PDF to render the current page when the user clicks the download button, and the PDF to download automatically with read-only permissions. The whole process feels painful.
Currently, WeasyPrint just converts a Django URL to PDF, but I don't know how to make the button point at the WeasyPrint view.
Maybe I'm overlooking something and overcomplicating it; any assistance would be appreciated.
django-weasyprint's example code:
import os

from django.conf import settings
from django.utils import timezone
from django.views.generic import DetailView
from django_weasyprint import WeasyTemplateResponseMixin
from django_weasyprint.views import CONTENT_TYPE_PNG

from .models import MyModel  # hypothetical: wherever your model lives


class MyModelView(DetailView):
    # vanilla Django DetailView
    model = MyModel
    template_name = 'mymodel.html'


class MyModelPrintView(WeasyTemplateResponseMixin, MyModelView):
    # output of MyModelView rendered as PDF with hardcoded CSS
    pdf_stylesheets = [
        # os.path.join avoids a missing "/" when STATIC_ROOT has no trailing slash
        os.path.join(settings.STATIC_ROOT, 'css/app.css'),
    ]
    # show pdf in-line (default: True, show download dialog)
    pdf_attachment = False
    # suggested filename (is required for attachment!)
    pdf_filename = 'foo.pdf'


class MyModelImageView(WeasyTemplateResponseMixin, MyModelView):
    # generate a PNG image instead
    content_type = CONTENT_TYPE_PNG

    # dynamically generate filename
    def get_pdf_filename(self):
        return 'bar-{at}.png'.format(
            at=timezone.now().strftime('%Y%m%d-%H%M'),
        )
I set up a virtualenv on my PC exactly like the example. I'm currently using Bootstrap 4.
*Edit: if there is a better way of doing this, you are more than welcome to share it :)
Also, I want to target just the body tag, so that only that section is converted to PDF and not the ENTIRE page.
The solution I used before this is https://codepen.io/AshikNesin/pen/KzgeYX, but it doesn't work very well.
*EDIT 2.0
I've moved on to JavaScript, and I'm stuck with this script: the on-click function doesn't create the PDF. Also, is there a way to make the JS function download ONLY the div with the selected id, rather than at a certain scale? (I'm afraid it will use the screen resolution instead of the actual content that needs to be rendered.)
https://jsfiddle.net/u4ko9pzs/18/
Any suggestions would be greatly appreciated.
I tried django-wkhtmltopdf and WeasyPrint. WeasyPrint certainly supports more CSS, but its output still isn't one-to-one with the browser version, and I wasn't happy with the results.
It is fine for tables and all Word/Excel-like data. But in my case I had to make a pixel-perfect 'certificate'-style PDF with an SVG-shaped background, and then the missing features became a real problem.
So if you need to reuse the same CSS with minimal modification to get the same result as in the browser, Puppeteer solves all the compatibility problems. The catch is that it forces you to use Node to run Puppeteer (in my case I already use it to compile assets).
CLI for Puppeteer:
npm i puppeteer-pdf
then install the Python wrapper:
pip install django-puppeteer-pdf
Add the command that starts the Node Puppeteer CLI to your settings:
PUPPETEER_PDF_CMD = 'npx puppeteer-pdf'
and create the view:
import os

from django.conf import settings
from django.http import HttpResponse
from puppeteer_pdf import render_pdf_from_template


def certificate_pdf(request):  # hypothetical view name
    # Rendering code
    context = {
        'data': 'test data'
    }
    output_temp_file = os.path.join(settings.BASE_DIR, 'static', 'temp.pdf')
    pdf = render_pdf_from_template(
        input_template='path to template in templates dir',
        header_template='',
        footer_template='',
        context=context,
        cmd_options={
            'format': 'A4',
            'scale': '1',
            'marginTop': '0',
            'marginLeft': '0',
            'marginRight': '0',
            'marginBottom': '0',
            'printBackground': True,
            'preferCSSPageSize': True,
            'output': output_temp_file,
            'pageRanges': 1
        }
    )
    # You can remove the temp file now, or collect them and clean up with a cron task (performance).
    if os.path.exists(output_temp_file):
        os.remove(output_temp_file)
    # else: raise an error or log something

    filename = 'your download file name'
    response = HttpResponse(pdf, content_type='application/pdf')
    # "attachment" makes the browser download the file instead of displaying it
    response['Content-Disposition'] = f'attachment; filename="{filename}.pdf"'
    return response
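The Content-Disposition header decides whether the browser downloads the PDF ("attachment") or displays it ("inline"), which is also what pdf_attachment toggles in django-weasyprint. A small stdlib-only sketch of building that header value, assuming you want a plain ASCII filename plus the RFC 5987 filename* form for non-ASCII names; content_disposition is a hypothetical helper, not part of either wrapper library. On Django 2.1+ you can also let FileResponse(..., as_attachment=True, filename=...) set the header for you.

```python
from urllib.parse import quote


def content_disposition(filename: str, attachment: bool = True) -> str:
    """Build a Content-Disposition header value for a download response."""
    # "attachment" asks the browser to download; "inline" lets it display the file
    disposition = "attachment" if attachment else "inline"
    # Plain filename= parameter: ASCII only, with backslashes and quotes escaped
    ascii_name = filename.encode("ascii", "replace").decode("ascii")
    ascii_name = ascii_name.replace("\\", "\\\\").replace('"', '\\"')
    header = f'{disposition}; filename="{ascii_name}"'
    # Non-ASCII names also get the filename* form with percent-encoded UTF-8
    if not filename.isascii():
        header += "; filename*=UTF-8''" + quote(filename)
    return header
```

In the view above this would be used as response['Content-Disposition'] = content_disposition('certificate.pdf').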
To keep the original colors, add:
html {
    -webkit-print-color-adjust: exact;
}
To control the page size and margins when you choose 'preferCSSPageSize':
@page {
    size: A4;
    margin: 0 0 0 0;
}
Use wkhtmltopdf; it's very easy to use in Django and provides PDF conversion from plain generated HTML:
pip install django-wkhtmltopdf
https://django-wkhtmltopdf.readthedocs.io/en/latest/
I have a non-traditional, image upload button on my company's website. I want to have an automated way to upload an image using this button, but without having to use a tool like AutoIt in order to interact with the file explorer.
Here's a sample of this button's HTML:
<button ng-click="onClick()" ng-disabled="readOnly" accepted-types="image/*" on-files-selected="onFilesSelected" allow-multiple="true" readonly="readonly">Add images</button>
It's a bit different than the usual input element, e.g. <input type="file">, and it's using AngularJS. Since it's not an input element, I don't think I can use Selenium's sendKeys() function to input the image's file location on my machine.
Is there any hack or workaround for uploading the image? I was considering things like overwriting the onClick() function to read from a specified location (this approach doesn't really seem doable), or intercepting the event that opens the file explorer and hacking my way from there, but these are all unsupported and untested approaches to solving the problem.
Would it be possible to do this in another browser-automation tool, like Microsoft's Playwright?
Use JACOB: it provides a Java native interface through which you can use AutoIt functionality together with Selenium. Here is a sample; I use it in most places, such as MS Teams and Slack automation (upload feature), and it does the job.
Steps to complete before jumping to the code:
Step 1:
Download the JACOB jar
Step 2:
Register the AutoIt COM libraries, e.g. regsvr32 AutoItX3_x64.dll
Add these to your project:
jacob.jar
AutoItX4Java.jar
jacob-1.18-x64.dll
jacob-1.18-x86.dll
Sample Code:
[This Code Interacts with file explorer]
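import java.io.File;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

import com.jacob.com.LibraryLoader;
import autoitx4java.AutoItX;

public class Attachments {
    private final WebDriver driver;

    public Attachments(WebDriver driver) {
        this.driver = driver;
    }

    public void uploadAttachments() throws InterruptedException {
        // Load the JACOB DLL that matches the JVM's bitness (the DLLs listed above)
        String jacobDllVersionToUse = "32".equals(System.getProperty("sun.arch.data.model"))
                ? "jacob-1.18-x86.dll"
                : "jacob-1.18-x64.dll";
        File jacobDll = new File("registerAutoItDll", jacobDllVersionToUse);
        System.setProperty(LibraryLoader.JACOB_DLL_PATH, jacobDll.getAbsolutePath());
        AutoItX x = new AutoItX();

        File[] fil = new File("Location").listFiles();
        if (fil == null) {
            return; // "Location" is not a readable directory
        }
        for (File f : fil) {
            // Upload button XPath
            WebElement uploadFromComp = driver.findElement(By.xpath("//span[contains(text(),'Upload from my computer')]"));
            uploadFromComp.click();
            Thread.sleep(5000);

            // Wait for the native "Open" dialog, type the file path and confirm
            x.winWaitActive("Open");
            x.sleep(5000);
            x.send(f.getAbsolutePath());
            x.send("{ENTER}", false);
        }
    }
}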
I hope it works for you.
It is 100% possible with Playwright, and it is a lot simpler than in Selenium.
// Select one file
await page.setInputFiles('input#upload', 'myfile.pdf');
// Select multiple files
await page.setInputFiles('input#upload', ['file1.txt', 'file2.txt']);
See more on:
https://playwright.dev/docs/input#upload-files
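setInputFiles targets an input element, while the question's control is a plain button. For that case Playwright also exposes a file-chooser event. A sketch using Playwright's Python sync API, assuming the button's onClick() ends up opening a native file dialog (typically by clicking a hidden input type="file"); upload_via_file_chooser is a hypothetical helper name, and the selector and file names are placeholders.

```python
from typing import TYPE_CHECKING, Sequence

if TYPE_CHECKING:  # imported for type hints only; needs `pip install playwright`
    from playwright.sync_api import Page


def upload_via_file_chooser(page: "Page", button_selector: str, files: Sequence[str]) -> None:
    """Click a button that opens a file dialog and answer the dialog with `files`."""
    # expect_file_chooser() waits for the "filechooser" event that Playwright
    # emits when the click opens a file dialog
    with page.expect_file_chooser() as fc_info:
        page.click(button_selector)
    file_chooser = fc_info.value
    # A list works for single files and for allow-multiple buttons alike
    file_chooser.set_files(list(files))
```

Typical use, on a page obtained from sync_playwright(): upload_via_file_chooser(page, "text=Add images", ["cat.png", "dog.png"]). No OS-level tool such as AutoIt is involved, because the dialog is answered inside the browser automation.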
I'm using EPUB.js and Vue to render an Epub. I want to display the cover images of several epub books so users can click one to then see the whole book.
There's no documentation on how to do this, but there are several methods that indicate that this should be possible.
First off, there's the Book.coverUrl() method.
Note that I'm setting an img src property equal to bookCoverSrc in the Vue template. Setting this.bookCoverSrc will automatically update the src of the img tag and cause an image to display (if the src is valid / resolves).
this.book = new Epub(this.epubUrl, {});
this.book.ready.then(() => {
this.book.coverUrl().then((url) => {
this.bookCoverSrc = url;
});
})
The above doesn't work. url is undefined.
Weirdly, there appears to be a cover property directly on book. So, I try:
this.book = new Epub(this.epubUrl, {});
this.book.ready.then(() => {
this.bookCoverSrc = this.book.cover;
});
this.book.cover resolves to OEBPS/#public#vhost#g#gutenberg#html#files#49010#49010-h#images#cover.jpg, so, at least locally, setting it as a src results in a request to http://localhost:8080/OEBPS/#public#vhost#g#gutenberg#html#files#49010#49010-h#images#cover.jpg, which returns a 200 but no content. That's probably a quirk of webpack-dev-server, but if I page through the sources in Chrome dev tools I also don't see any indication that such a URL should resolve.
So, the docs aren't helping. I googled and found this GitHub question from 2015. Their code looks like this:
$("#cover").attr("src", Book.store.urlCache[Book.cover]);
Interesting: there's nothing in the docs about Book.store.urlCache. As expected, urlCache is undefined, though book.store exists. I don't see anything on it that can help me display a cover image, though.
Using epub.js, how can I display the cover image of an EPUB file? Note that simply rendering the first "page" of the EPUB file (which is usually the cover image) doesn't solve my problem, as I'd like to list the cover images of several EPUB files.
Note also that I believe the epub files I'm using do have cover images. The files are Aesop's Fables and Irish Wonders.
EDIT: It's possible I need to use Book.load on the URL provided by book.cover first. I did so and tried to console.log the result, but it's a massive blob of weirdly encoded text that looks something like:
����
So I think it's an image straight up, and I need to find a way to get that onto the Document somehow?
EDIT2: that big blobby blob is type: string, and I can't atob() or btoa() it.
EDIT3: Just fetching the url provided by this.book.cover returns my index.html, default behavior for webpack-dev-server when it doesn't know what else to do.
EDIT4: Below is the code for book.coverUrl from epub.js
key: "coverUrl",
value: function coverUrl() {
var _this9 = this;
var retrieved = this.loaded.cover.then(function (url) {
if (_this9.archived) {
// return this.archive.createUrl(this.cover);
return _this9.resources.get(_this9.cover);
} else {
return _this9.cover;
}
});
return retrieved;
}
If I use this.archive.createUrl(this.cover) instead of this.resources.get, I actually get a functional URL, that looks like blob:http://localhost:8080/9a3447b7-5cc8-4cfd-8608-d963910cb5f5. I'll try getting that out into src and see what happens.
The reason this was happening to me was because the functioning line of code in the coverUrl function was commented out in the source library epub.js, and a non-functioning line of code was written instead.
So, I had to copy down the entire library, uncomment the good code and delete the bad. Now the function works as it should.
To do so, clone the entire epub.js project and copy the dependencies from its package.json into your own. Then take the src, lib, and libs folders and copy them somewhere into your project. Disable ESLint for the location you put these folders in: the project indents with TAB characters, which made ESLint blow up and hang my terminal.
Run npm install so that both your dependencies and epub.js's are in your node_modules.
Open book.js. Uncomment line 661 so that it reads
return this.archive.createUrl(this.cover);
and comment out line 662 so that it reads
// return this.resources.get(this.cover);
Now you can display an image by setting an img tag's src attribute to the URL returned by book.coverUrl().
this.book = new Epub(this.epubUrl, {});
this.book.ready.then(() => {
this.book.coverUrl().then((url) => {
this.bookCoverSrc = url;
});
})
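For context on why book.cover doesn't load as a URL: it is a path inside the EPUB's zip archive, not a file on your dev server, which is why it only works once something like archive.createUrl turns it into a blob: URL. (The '#' characters in that Gutenberg file name also start a URL fragment, so the browser only requests /OEBPS/, which would explain the empty 200.) epub.js reads that path from the package (OPF) file. The following Python sketch illustrates the lookup with only zipfile and ElementTree, not epub.js, assuming a standard EPUB 2 or EPUB 3 layout; find_cover_path is a hypothetical name.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile
from typing import Optional

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def find_cover_path(epub: zipfile.ZipFile) -> Optional[str]:
    """Return the zip-internal path of the cover image, or None if none is declared."""
    # META-INF/container.xml says where the package (OPF) file lives
    container = ET.fromstring(epub.read("META-INF/container.xml"))
    opf_path = container.find("c:rootfiles/c:rootfile", NS).get("full-path")
    opf = ET.fromstring(epub.read(opf_path))
    manifest = opf.find("opf:manifest", NS)

    href = None
    # EPUB 3: the cover's manifest item carries properties="cover-image"
    for item in manifest.findall("opf:item", NS):
        if "cover-image" in (item.get("properties") or "").split():
            href = item.get("href")
            break
    if href is None:
        # EPUB 2: <meta name="cover" content="ITEM-ID"/> names a manifest id
        meta = opf.find("opf:metadata/opf:meta[@name='cover']", NS)
        if meta is not None:
            cover_id = meta.get("content")
            for item in manifest.findall("opf:item", NS):
                if item.get("id") == cover_id:
                    href = item.get("href")
                    break
    if href is None:
        return None
    # Manifest hrefs are relative to the OPF file's folder inside the zip
    return posixpath.join(posixpath.dirname(opf_path), href)
```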
With my limited knowledge of web scraping, I've run into an issue that is very complex for me, which I will try to explain as best I can (so I'm open to suggestions or edits to my post).
I started using the web-crawling framework Scrapy a long time ago for my web scraping, and it's still the one I use nowadays. Lately I came across this website and found that Scrapy was not able to iterate over its pages, because the site uses fragment URLs (#) to load the data (the next pages). I then made a post about that problem (having no idea of the underlying cause yet): my post
After that, I realized that my framework can't handle this without a JavaScript interpreter or browser emulation, and people there mentioned the Selenium library. I read as much as I could about that library (e.g. example1, example2, example3 and example4). I also found this Stack Overflow post that gives some leads on my issue.
So, finally, my biggest questions are:
1 - Is there any way to iterate/yield over the pages of the website shown above using Selenium together with Scrapy?
So far, this is the code I'm using, but it doesn't work...
EDIT:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# The required imports...

def getBrowser():
    path_to_phantomjs = "/some_path/phantomjs-2.1.1-macosx/bin/phantomjs"
    dcap = dict(DesiredCapabilities.PHANTOMJS)
    dcap["phantomjs.page.settings.userAgent"] = (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53 "
        "(KHTML, like Gecko) Chrome/15.0.87")
    browser = webdriver.PhantomJS(executable_path=path_to_phantomjs, desired_capabilities=dcap)
    return browser


class MySpider(Spider):
    name = "myspider"
    browser = getBrowser()

    def start_requests(self):
        the_url = "http://www.atraveo.com/es_es/islas_canarias#eyJkYXRhIjp7ImNvdW50cnlJZCI6IkVTIiwicmVnaW9uSWQiOiI5MjAiLCJkdXJhdGlvbiI6NywibWluUGVyc29ucyI6MX0sImNvbmZpZyI6eyJwYWdlIjoiMCJ9fQ=="
        yield scrapy.Request(url=the_url, callback=self.parse, dont_filter=True)

    def parse(self, response):
        self.get_page_links()

    def get_page_links(self):
        """ This first part, goes through all available pages """
        for i in xrange(1, 3):  # 210
            new_data = {"data": {"countryId": "ES", "regionId": "920", "duration": 7, "minPersons": 1},
                        "config": {"page": str(i)}}
            json_data = json.dumps(new_data)
            new_url = "http://www.atraveo.com/es_es/islas_canarias#" + base64.b64encode(json_data)
            self.browser.get(new_url)
            print "\nThe new URL is -> ", new_url, "\n"
            content = self.browser.page_source
            self.get_item_links(content)

    def get_item_links(self, body=""):
        if body:
            """ This second part, goes through all available items """
            raw_links = re.findall(r'listclickable.+?>', body)
            links = []
            if raw_links:
                for raw_link in raw_links:
                    new_link = re.findall(r'data-link=\".+?\"', raw_link)[0].replace("data-link=\"", "").replace("\"",
                                                                                                            "")
                    links.append(str(new_link))
            if links:
                ids = self.get_ids(links)
                for link in links:
                    current_id = self.get_single_id(link)
                    print "\nThe Link -> ", link
                    # If commented the line below, code works, doesn't otherwise
                    yield scrapy.Request(url=link, callback=self.parse_room, dont_filter=True)

    def get_ids(self, list1=[]):
        if list1:
            ids = []
            for elem in list1:
                raw_id = re.findall(r'/[0-9]+', elem)[0].replace("/", "")
                ids.append(raw_id)
            return ids
        else:
            return []

    def get_single_id(self, text=""):
        if text:
            raw_id = re.findall(r'/[0-9]+', text)[0].replace("/", "")
            return raw_id
        else:
            return ""

    def parse_room(self, response):
        # More scraping code...
        pass
So this is mainly my problem. I'm almost sure that what I'm doing isn't the best way, which is why I'm asking my second question. And to avoid running into these kinds of issues in the future, I'm asking my third question.
2 - If the answer to the first question is negative, how could I tackle this issue? I'm open to other approaches.
3 - Can anyone tell me about, or point me to, pages where I can learn how to combine web scraping with JavaScript and Ajax? More and more websites nowadays use JavaScript and Ajax scripts to load content.
Many thanks in advance!
Selenium is one of the best tools for scraping dynamic data. You can use Selenium with any web browser to fetch data that is loaded by scripts; it works exactly like clicking in the browser. But I don't prefer it.
For dynamic data you can use the Scrapy + Splash combo: Scrapy gets you all the static data, and Splash handles the dynamic content.
Have you looked into BeautifulSoup? It's a very popular web scraping library for Python. As for JavaScript, I would recommend something like Cheerio (if you're asking for a scraping library in JavaScript).
If you mean that the website uses HTTP requests to load content, you could always try to replicate those requests manually with something like the requests library.
Hope this helps
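Along those lines, the page state on this site lives entirely in the URL fragment, which is just base64-encoded JSON (the question's start URL decodes to {"data": {...}, "config": {"page": "0"}}). A fragment is never sent to the server, so the page's own JavaScript must turn it into some background request; that request, visible in the browser's network tab, is what you would replicate. A Python 3 sketch for building and decoding these fragments; build_page_url and decode_fragment are hypothetical helper names. Note that the site's own fragment uses compact JSON, while the json.dumps defaults in the question's spider produce a different (spaced) encoding, which the site may or may not accept.

```python
import base64
import json

BASE_URL = "http://www.atraveo.com/es_es/islas_canarias#"


def build_page_url(page: int) -> str:
    """Encode the listing state for `page` the same way the start URL does."""
    state = {
        "data": {"countryId": "ES", "regionId": "920", "duration": 7, "minPersons": 1},
        "config": {"page": str(page)},
    }
    # Compact separators match the site's fragment; the default ", " and ": "
    # would produce a different base64 string
    raw = json.dumps(state, separators=(",", ":")).encode("utf-8")
    # In Python 3, b64encode takes and returns bytes, unlike the Python 2 spider
    return BASE_URL + base64.b64encode(raw).decode("ascii")


def decode_fragment(url: str) -> dict:
    """Turn a fragment URL back into the JSON state it carries."""
    fragment = url.split("#", 1)[1]
    return json.loads(base64.b64decode(fragment))
```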
You can definitely use Selenium on its own to scrape web pages with dynamic content (like AJAX loading).
Selenium just relies on a WebDriver (basically a web browser) to fetch content from the Internet.
Here are a few of the most commonly used ones:
ChromeDriver
PhantomJS (my favorite)
Firefox
Once that's set up, you can start your bot and parse the HTML content of the web page.
I included a minimal working example below using Python and ChromeDriver :
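from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service('chromedriver'))
try:
    driver.get('https://www.google.com')
    # Then you can search for any element you want on the web page
    search_bar = driver.find_element(By.NAME, 'q')
    search_bar.click()
finally:
    driver.quit()  # ends the whole browser session; close() only closes the window
See the documentation for more details!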
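About the comment "If commented the line below, code works, doesn't otherwise" in the spider: because get_item_links contains a yield, calling self.get_item_links(content) only creates a generator object and never runs its body. Likewise, parse calls get_page_links without yielding anything, so Scrapy never receives the requests. Each level has to pass the items up to the caller. A minimal Scrapy-free Python 3 sketch of that pattern; the function names mirror the spider, and the list of HTML strings is a stand-in for browser.page_source. In the Python 2 spider, where yield from doesn't exist, `for request in self.get_item_links(content): yield request` is the equivalent.

```python
import re
from typing import Iterable, Iterator


def get_item_links(body: str) -> Iterator[str]:
    """Yield the data-link URL of every 'listclickable' element in body."""
    for raw_link in re.findall(r'listclickable.+?>', body):
        match = re.search(r'data-link="(.+?)"', raw_link)
        if match:
            yield match.group(1)


def get_page_links(pages: Iterable[str]) -> Iterator[str]:
    """Forward the links found on every rendered page."""
    for content in pages:
        # A bare get_item_links(content) call would only create a generator and
        # discard it; `yield from` runs it and passes each link upward
        yield from get_item_links(content)


def parse(pages: Iterable[str]) -> Iterator[str]:
    # Scrapy iterates whatever the callback returns, so parse must yield too
    yield from get_page_links(pages)
```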
I want to let users download a specific file by clicking a "Download" button. The button is linked to many switchers, so I wrote a JS script that changes the button's "href" attribute to point to the correct static file.
I tried to follow many Stack Overflow questions and read the documentation about Django static files and media files, but did not understand what I need to do in my case. Any help would be really appreciated; let me explain what I did and ask for your help/opinion.
I want to let people download files that can be found in:
"/home/user/xxxx/xxx/project/my_app/static/"
Here is my function in views.py:
def send_file(request, file_name):
    from django.contrib.staticfiles.storage import staticfiles_storage
    import os
    from wsgiref.util import FileWrapper
    import mimetypes
    filename = staticfiles_storage.url(file_name)
    download_name = file_name
    wrapper = FileWrapper(open(filename))
    content_type = mimetypes.guess_type(filename)[0]
    response = HttpResponse(wrapper, content_type=content_type)
    response['Content-Length'] = os.path.getsize(filename)
    response['Content-Disposition'] = "attachment; filename=%s" % download_name
    return response
I need the exact path to open the file, so I defined STATIC_URL = "/home/user/xxxx/xxx/project/my_app/static/" in my settings. I don't like this solution, because anyone who looks at my page source can then see the exact path of my project. If I define STATIC_URL = "static/", it does not work. I looked for a way to get the exact path of a static file, but nothing worked. Any help with this part?
urls.py:
url(r'^download/static/(?P<file_name>[\w.]{0,256})$',views.send_file, name='send_file'),
template.html, only the button part:
Download
JS, only the part that changes the href attribute of the HTML button when you click on a switcher:
"if you click on a switcher"
id = switcher-checked
var atag = document.getElementById("dl-link");
var url = "http://localhost:8000/test/download/static/"+id+".csv";
atag.setAttribute("href",url);
Is there a way to use the {% url 'my_app:send_file' %} tag in my JS? I found that one solution is to put the script directly in "template.html"; is that good practice?
My download is working perfectly, but I feel like all my choices are pretty bad (STATIC_URL and the JS url variable definition). I know my question is quite dense, but I really need help. Any examples would be more than appreciated. Thank you.
A couple of points. First, for STATIC_URL you can use "/static/", starting and ending with a slash, instead of the complete path. Since you are downloading from the same domain, there's no need to use the complete URL "http://localhost:8000/your-url"; you can simply use "/your-url" in the JS url variable.
Django's template engine doesn't process your static .js files, so the {% url %} tag only works inside a template, and keeping that part in template.html is the only option. To keep the JS maintainable, declare a global variable holding the URLs in the template on document load and use it in your script, so the script itself stays clean. For example:
var all_my_urls = {
    // send_file requires file_name; an empty one yields the ".../download/static/" prefix, so append id + ".csv" in JS
    "url": "{% url 'my_app:send_file' file_name='' %}"
}
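On the STATIC_URL part of the question: staticfiles_storage.url() returns a URL, not a filesystem path, which is why send_file only worked once STATIC_URL was set to a path on disk. A cleaner approach is to resolve the requested name against a directory on disk and leave STATIC_URL as "/static/". A stdlib-only sketch, assuming all downloadable files live in one base directory (such as the app's static folder); resolve_download is a hypothetical helper, and it also rejects names that escape that directory. Inside Django, django.contrib.staticfiles.finders.find(file_name) is another way to get an absolute path for a static file.

```python
import mimetypes
from pathlib import Path
from typing import Tuple


def resolve_download(base_dir: str, file_name: str) -> Tuple[str, str, int]:
    """Map a requested file name to (absolute path, content type, size in bytes)."""
    base = Path(base_dir).resolve()
    candidate = (base / file_name).resolve()
    # Refuse names that escape base_dir, e.g. "../settings.py"
    if base not in candidate.parents:
        raise ValueError(f"{file_name!r} is outside the download folder")
    if not candidate.is_file():
        raise FileNotFoundError(str(candidate))
    content_type = mimetypes.guess_type(candidate.name)[0] or "application/octet-stream"
    return str(candidate), content_type, candidate.stat().st_size
```

In send_file, assuming BASE_DIR points at the project folder, resolve_download(os.path.join(settings.BASE_DIR, 'my_app', 'static'), file_name) would replace the staticfiles_storage.url() call, and the returned path can be opened in binary mode ('rb') for the FileWrapper.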
I am a complete beginner trying to develop for FCKeditor so please bear with me here. I have been tasked with developing a custom plugin that will allow users to browse a specific set of images that the user uploads. Essentially the user first attaches images, then uses the FCKeditor to insert those images.
So I have my plugin directory:
lang
fckplugin.js
img.png (for the toolbar button)
I am looking for some help on a strategy for the custom file browser (let's call it mybrowser.asp).
1) Should mybrowser.asp be in the plugin directory? It is dynamic and only applies to one specific area of the site.
2) How should I pass the querystring to mybrowser.asp?
3) Any other recommendations for developing FCKeditor plugins? Any sample plugins that might be helpful to me?
EDIT: The querystring passed to the plugin page will be the exact same as the one on the host page. (This is a very specific plugin that will only be used in one place)
You don't need the lang directory unless you're planning on supporting multiple languages. But even then, I would get the plugin working in one language first.
I would probably put mybrowser.asp in the plugin directory.
Here's some code for fckplugin.js to get you started.
// Register the related command.
// RegisterCommand takes the following arguments: CommandName, DialogCommand
// FCKDialogCommand takes the following arguments: CommandName,
// Dialog Title, Path to HTML file, Width, Height
FCKCommands.RegisterCommand(
'MyBrowser',
new FCKDialogCommand(
'My Browser',
'Select An Image',
FCKPlugins.Items['MyBrowser'].Path + 'mybrowser.asp',
500,
250)
);
// Create the toolbar button.
// FCKToolbarButton takes the following arguments: CommandName, Button Caption
var button = new FCKToolbarButton( 'MyBrowser', 'Select An Image' ) ;
button.IconPath = FCKPlugins.Items['MyBrowser'].Path + 'img.png' ;
FCKToolbarItems.RegisterItem( 'MyBrowser', button ) ;
Edit: I haven't tested this, but you should be able to append the querystring by doing something along these lines.
'Select An Image',
FCKPlugins.Items['MyBrowser'].Path + 'mybrowser.asp' + window.top.location.search,
500,
You might not need to write your own file browser as this functionality is built in. If you check the fckconfig.js file and search for var _FileBrowserLanguage you can specify your server language and it should hopefully use the equivalent file in the editor -> filemanager -> connectors folder.
The docs should hopefully keep you on the right track.