I'm having an issue rendering a PDF using EVOPdf from a WebAPI controller to an AngularJS app.
This is my code so far:
Angular call:
var url = 'api/form/build/' + id;
$http.get(url, null, { responseType: 'arraybuffer' })
.success(function (data) {
var file = new Blob([data], { type: 'application/pdf' });
if (window.navigator && window.navigator.msSaveOrOpenBlob) {
window.navigator.msSaveOrOpenBlob(file);
}
else {
var objectUrl = URL.createObjectURL(file);
window.open(objectUrl);
}
});
APIController method:
var url = "http://localhost/index.html#/form/build/" + id;
#region PDF Document Setup
HtmlToPdfConverter htmlToPdfConverter = new HtmlToPdfConverter();
htmlToPdfConverter.LicenseKey = "4W9+bn19bn5ue2B+bn1/YH98YHd3d3c=";
//htmlToPdfConverter.HtmlViewerWidth = 1024; //default
htmlToPdfConverter.PdfDocumentOptions.PdfPageSize = PdfPageSize.A4;
htmlToPdfConverter.PdfDocumentOptions.PdfPageOrientation = PdfPageOrientation.Portrait;
htmlToPdfConverter.ConversionDelay = 3;
htmlToPdfConverter.MediaType = "print";
htmlToPdfConverter.PdfDocumentOptions.LeftMargin = 10;
htmlToPdfConverter.PdfDocumentOptions.RightMargin = 10;
htmlToPdfConverter.PdfDocumentOptions.TopMargin = 10;
htmlToPdfConverter.PdfDocumentOptions.BottomMargin = 10;
htmlToPdfConverter.PdfDocumentOptions.TopSpacing = 10;
htmlToPdfConverter.PdfDocumentOptions.BottomSpacing = 10;
htmlToPdfConverter.PdfDocumentOptions.ColorSpace = ColorSpace.RGB;
// Set HTML content destination in PDF page
htmlToPdfConverter.PdfDocumentOptions.Width = 640;
htmlToPdfConverter.PdfDocumentOptions.FitWidth = true;
htmlToPdfConverter.PdfDocumentOptions.StretchToFit = true;
#endregion
byte[] outPdfBuffer = htmlToPdfConverter.ConvertUrl(url);
string outPdfFile = @"c:\temp\forms\" + id + ".pdf";
System.IO.File.WriteAllBytes(outPdfFile, outPdfBuffer);
HttpResponseMessage result = null;
result = Request.CreateResponse(HttpStatusCode.OK);
result.Content = new ByteArrayContent(outPdfBuffer.ToArray());
result.Content.Headers.ContentDisposition = new ContentDispositionHeaderValue("attachment");
result.Content.Headers.ContentDisposition.FileName = "filename.pdf";
result.Content.Headers.ContentType = new MediaTypeHeaderValue("application/pdf");
return result;
When I check the PDF that I write out using WriteAllBytes, it renders perfectly, but when it is returned via the Angular call and opened in Adobe Reader, I get an "Invalid Color Space" error message that pops up quite a few times, and the document does not open. When I change the color space to GrayScale, the PDF opens but it's blank.
I have a feeling that it's the ByteArrayContent conversion that's causing the issue, seeing as that's the only thing that happens between the actual creation of the PDF and sending it back to the Angular call, but I've hit a brick wall and can't figure out what the problem is.
I'd really appreciate any help you guys can offer because I'm so close to sorting this out and I just need the document to "convert" properly when returned from the call.
Thanks in advance for any help.
Regards,
Johann.
The problem seems to lie on the client side; the characters are not properly parsed in the response. For anyone struggling with this, I found my solution here: SO Question
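For reference, a minimal sketch of the likely client-side fix, assuming the linked answer's approach: $http.get takes its config object as the second argument, so the original call was passing { responseType: 'arraybuffer' } as an ignored third argument, and the binary body was being parsed as text.
// Pass the config as the 2nd argument so Angular returns an ArrayBuffer
// instead of a UTF-8 mangled string.
$http.get('api/form/build/' + id, { responseType: 'arraybuffer' })
  .success(function (data) {
    var file = new Blob([data], { type: 'application/pdf' });
    var objectUrl = URL.createObjectURL(file);
    window.open(objectUrl);
  });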
Have you tried Headless Chrome? Here is a nice article about this topic. I was using https://github.com/puppeteer/puppeteer for this purpose and it was an easy solution to integrate.
# install the puppeteer npm package (not puppeteer-core, since the examples
# below rely on the Chromium build that the full package downloads)
npm i puppeteer
# or "yarn add puppeteer"
// example.js
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com');
await page.screenshot({ path: 'example.png' });
await browser.close();
})();
Execute script on the command line
node example.js
Puppeteer sets an initial page size to 800×600px, which defines the screenshot size. The page size can be customized with Page.setViewport().
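For example, a quick sketch using that documented API:
// Enlarge the viewport so the screenshot is 1280x800 instead of the 800x600 default.
await page.setViewport({ width: 1280, height: 800 });
await page.screenshot({ path: 'example.png' });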
Example - create a PDF.
Save file as hn.js
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://news.ycombinator.com', {
waitUntil: 'networkidle2',
});
await page.pdf({ path: 'hn.pdf', format: 'a4' });
await browser.close();
})();
Execute script on the command line
node hn.js
See Page.pdf() for more information about creating pdfs.
Example - evaluate script in the context of the page
Save file as get-dimensions.js
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com');
// Get the "viewport" of the page, as reported by the page.
const dimensions = await page.evaluate(() => {
return {
width: document.documentElement.clientWidth,
height: document.documentElement.clientHeight,
deviceScaleFactor: window.devicePixelRatio,
};
});
console.log('Dimensions:', dimensions);
await browser.close();
})();
Execute script on the command line
node get-dimensions.js
See Page.evaluate() for more information on evaluate and related methods like evaluateOnNewDocument and exposeFunction.
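As a quick hedged illustration of exposeFunction (the md5 name is just a docs-style example): it registers a Node-side function on the page's window so browser code can call back into Node.
const crypto = require('crypto');
// The Node-side callback becomes available as window.md5 inside the page.
await page.exposeFunction('md5', (text) =>
  crypto.createHash('md5').update(text).digest('hex')
);
// page.evaluate awaits the promise returned by the bridged function.
const hash = await page.evaluate(() => window.md5('hello'));
console.log(hash);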
Related
I am working on reactjs/typescript applications. I am trying to download some files from azure storage v2. Below is the sample path from which I am supposed to download files. In this path, enrichment is the container name, and the rest are all folders. I am trying to download the last modified file from the reportdocument folder.
enrichment/report/SAR-1234-56/reportdocument/file1.docs
I tried something below.
@action
public async reportDownload(sarNumber: string) {
let storage = globals.getGlobals('StorageAccount03');
console.log(storage);
let containerName = globals.getGlobals('StorageAccount03ContainerName');
let marker = undefined;
let allUploadPromise: Array<Promise<unknown>> = [];
const config = {
path: `/Storage/getsastoken/?storageName=${storage}&containerName=${containerName}`,
method: "GET",
success: (url: any) => {
const containerURL: ContainerURL = new ContainerURL(
url,
StorageURL.newPipeline(new AnonymousCredential()));
const listBlobsResponse = containerURL.listBlobFlatSegment(
Aborter.none,
marker,
);
}
};
await handleRequest(config);
}
From here I am struggling to download the latest modified file from the above path.
Can someone help me fix this? Any help would be greatly appreciated. Thank you.
It's better to use the @azure/storage-blob library, and then the code would be something like below, instead of directly calling the blob REST API like you were doing, which is unnecessarily reinventing the wheel; the library already does it for you. Refer to this for details.
const { BlobServiceClient } = require("@azure/storage-blob");
const account = "<account name>";
const sas = "<service Shared Access Signature Token>";
const containerName = "<container name>";
const blobName = "<blob name>";
const blobServiceClient = new BlobServiceClient(`https://${account}.blob.core.windows.net${sas}`);
async function download() {
const containerClient = blobServiceClient.getContainerClient(containerName);
const blobClient = containerClient.getBlobClient(blobName);
// Get blob content from position 0 to the end
// In browsers, get downloaded data by accessing downloadBlockBlobResponse.blobBody
const downloadBlockBlobResponse = await blobClient.download();
const downloaded = await blobToString(await downloadBlockBlobResponse.blobBody);
console.log("Downloaded blob content", downloaded);
// [Browsers only] A helper method used to convert a browser Blob into string.
async function blobToString(blob) {
const fileReader = new FileReader();
return new Promise((resolve, reject) => {
fileReader.onloadend = (ev) => {
resolve(ev.target.result);
};
fileReader.onerror = reject;
fileReader.readAsText(blob);
});
}
}
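To get the last-modified file from the reportdocument folder, a hedged sketch on top of the same library (downloadLatest and the prefix value are illustrative, mirroring the path in the question): blob "folders" are just name prefixes, so list with a prefix and keep the newest entry.
async function downloadLatest(containerClient, prefix) {
  let latest;
  // Iterate every blob under the folder prefix and track the newest one.
  for await (const blob of containerClient.listBlobsFlat({ prefix })) {
    if (!latest || blob.properties.lastModified > latest.properties.lastModified) {
      latest = blob;
    }
  }
  if (!latest) return undefined;
  // Download it with the same flow as above.
  return containerClient.getBlobClient(latest.name).download();
}
// e.g. downloadLatest(containerClient, "report/SAR-1234-56/reportdocument/")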
The SAS token expiry bothers me. You cannot use a static SAS token unless you set a long expiry (and a user-delegation SAS token is short-lived by design). Do we really have the capability to create the SAS token dynamically in a JavaScript runtime? I think it's only possible in a NodeJS runtime.
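On that last point: yes, a Node runtime that holds the account key can mint short-lived SAS tokens on demand. A sketch, not production code (makeSas is an illustrative name):
const {
  StorageSharedKeyCredential,
  generateBlobSASQueryParameters,
  ContainerSASPermissions
} = require("@azure/storage-blob");
// Mint a read+list SAS for the container that expires in 15 minutes.
function makeSas(account, accountKey, containerName) {
  const credential = new StorageSharedKeyCredential(account, accountKey);
  return generateBlobSASQueryParameters({
    containerName,
    permissions: ContainerSASPermissions.parse("rl"),
    expiresOn: new Date(Date.now() + 15 * 60 * 1000),
  }, credential).toString();
}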
Trying to make a pdf with Puppeteer on Firebase Functions: send in html snippet, get a pdf back.
But the pdf is corrupted. The problem is, I think, with returning the file / buffer.
// Server:
// setup
const func = require('firebase-functions');
const pptr = require('puppeteer');
const opts = { memory: '1GB', regions: ['europe-west3'] };
const call = func.runWith(opts).https.onCall
let browser = pptr.launch({ args: ['--no-sandbox'] });
// onCall
exports.makePdf = call(async (data) => {
// this works, does open the page testing with { headless: false }
const brws = await (await browser).createIncognitoBrowserContext();
const page = await brws.newPage();
await page.setContent(data, { waitUntil: 'domcontentloaded' });
const file = await page.pdf({ format: 'A4' });
await brws.close();
// the problem is returning the file
return file
});
When file is logged on the server it's a buffer <Buffer 25 50 44 46 2d 31 2e ... 11150 more bytes>, but when logged on the client it's an object, in dec and not hex { data: {0: 37, 1: 80, 2: 68, 3: 70, ... }}.
Convert that back to buffer? Convert back to hex? Which buffer?
// Client:
// send html and receive the file
let html = '<div><h1>Hi</h1></div>';
let file = ((await fns.makePdf(html))).data;
// also tried buffer = new Buffer.from(JSON.stringify(file));
let blob = new Blob(file, { type: 'application/pdf' });
// download the file
let a = document.createElement('a');
a.href = window.URL.createObjectURL(blob);
a.download = `${name}.pdf`;
a.click();
Or is the pdf corrupted because I'm downloading it wrong (createObjectURL)? Or can onCall functions not be used this way?
Point is, I don't know why it's not working. Thanks for your help.
Ok. /insert cursing the gods/
Server:
So it was OK: you CAN use onCall instead of onRequest, because you CAN create a pdf in the end from the JSON response. The reason onCall is a bit better is that you don't need to bother with CORS or headers; Google does it for you.
// setup
const func = require('firebase-functions');
const pptr = require('puppeteer');
const opts = { memory: '1GB', regions: ['europe-west3'] };
const call = func.runWith(opts).https.onCall
// this runs outside of the function because it might not be killed off
// if the function is called quickly in succession (lookup details on
// google cloud functions docs)
let browser = pptr.launch({ args: ['--no-sandbox'] });
// the main function
exports.makePdf = call(async (data) => {
// prep puppeteer so it can print the pdf
const brws = await (await browser).createIncognitoBrowserContext();
const page = await brws.newPage();
await page.setContent(data);
const file = await page.pdf({
format: 'A4',
printBackground: false,
margin: { left: 0, top: 0, right: 0, bottom: 0 }
});
await brws.close();
// and just return the stream
return file
});
Client:
The trick was on the client. I'll note with comments.
// so this is inside some function, has to be because it needs to be async
var a, blob, buffer, d, file, html, i, result;
// this html is for testing, it would obviously be gathered in another way
html = '<div><h1>Hi</h1></div>';
// we call the function from the client
// fns is my shorthand object defined on Google Functions initialization
// it is literally: f.httpsCallable('makePdf') where f is the functions object
// you get from initialization
file = ((await fns.makePdf(html))).data;
// additional .data is needed because onCall automatically wraps
// now the point
// 1. convert object to array
result = []; for (i in file) {
d = file[i]; result.push(d);
}
// 2. convert that to a Uint8Array
buffer = new Uint8Array(result);
// 3. create a blob, BUT the buffer needs to go inside another array
blob = new Blob([buffer], { type: 'application/pdf' });
// finally, download it
a = document.createElement('a');
a.href = window.URL.createObjectURL(blob);
a.download = `${path}.pdf`;
a.click();
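As a side note, steps 1 and 2 can likely be collapsed into one line (a hedged shortcut, relying on Object.values returning the wrapped object's numeric keys in index order):
// Pull the byte values out in index order and build the typed array in one go.
buffer = Uint8Array.from(Object.values(file));
blob = new Blob([buffer], { type: 'application/pdf' });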
I found this project, similar to yours, and I found a difference: in the buffer return you need to specify the headers and the HTTP code; the client browser is possibly misinterpreting the object from the server.
This is the return segment of the other project
const pdfBuffer = await page.pdf({ printBackground: true });
res.set("Content-Type", "application/pdf");
res.status(200).send(pdfBuffer);
Everything in your code looks right, and if transforming the PDF from dec to hex returns the same string, I think that could be a good workaround.
Let's say I have a canvas in an HTML file (nothing special). And I'm using Puppeteer to draw something on the canvas in the browser context and then pass it to NodeJs context where I save it as a PNG file using OpenCV.
Since binary data cannot be transferred between the two contexts (the browser and Node), I have to convert the binary data to a string and reconstruct the binary data in Node. The following code does exactly that:
const puppeteer = require('puppeteer');
const cv = require('opencv4nodejs');
const pup = puppeteer.launch({
headless: false,
args: [
'--use-gl=swiftshader',
'--no-sandbox',
'--enable-surface-synchronization',
'--disable-web-security',
]
}).then(async browser => {
const page = (await browser.pages())[0];
page.on('console', msg => console.log(msg.text()));
page.on("pageerror", function(err) {
const theTempValue = err.toString();
console.log("Page error: " + theTempValue);
});
page.on("error", function (err) {
const theTempValue = err.toString();
console.log("Error: " + theTempValue);
});
await page.exposeFunction('_setupCompleted', async () => {
await page.evaluate(async () => {
const c = document.getElementById("myCanvas");
const dataUrl = c.toDataURL("image/png");
const txt = await new Promise(resolve => c.toBlob((blb) => blb.text().then(resolve), "image/png"));
await window._saveMat(dataUrl, txt);
});
await browser.close();
});
await page.exposeFunction('_saveMat', async (dataUrl, txt) => {
dataUrl = dataUrl.replace(/^data:image\/png;base64,/, "");
const dataUrlBuffer = Buffer.from(dataUrl, 'base64');
const dataUrlMat = cv.imdecode(dataUrlBuffer, -1);
cv.imwrite(`./dataUrl.png`, dataUrlMat);
const txtBuffer = Buffer.from(txt, 'utf8');
const txtMat = cv.imdecode(txtBuffer, -1);
cv.imwrite(`./txtMat.png`, txtMat);
});
await page.goto('http://127.0.0.1:8887/index.html', {
waitUntil: 'networkidle2',
});
});
In the given code, I transfer the PNG in two ways. First, I use the toDataURL method of the canvas. This method encodes the binary data of the canvas using base64 and prepends a signature to it as well. Once received, the code in Node strips the signature and decodes the base64. This part of the code works perfectly and it saves the PNG just fine.
The other approach uses the toBlob method of the canvas. This method returns a Blob which I'll call its text method to convert it to a string. On the Node side, I'll try to reconstruct a buffer out of the transferred string. This part is failing. Apparently, the reconstructed buffer is not a valid PNG and the imdecode method fails to create a mat object based on it.
My question is, how can I reconstruct the same buffer as I get from toDataURL using the text I get from the toBlob method in NodeJs?
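One likely explanation, with a hedged sketch of a fix: blb.text() decodes the bytes as UTF-8, which is lossy for arbitrary binary data, so the PNG cannot be rebuilt from that string. Reading the blob as an ArrayBuffer and base64-encoding it yourself keeps every byte intact:
// Browser side: encode the blob's raw bytes as base64 (lossless for binary).
const txt = await new Promise(resolve => c.toBlob(async (blb) => {
  const bytes = new Uint8Array(await blb.arrayBuffer());
  let bin = '';
  for (const b of bytes) bin += String.fromCharCode(b);
  resolve(btoa(bin));
}, "image/png"));
// Node side: decode straight back to the original buffer.
const txtBuffer = Buffer.from(txt, 'base64');
const txtMat = cv.imdecode(txtBuffer, -1);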
TL;DR
I'm trying to fetch an image, convert it to base64, and put the data url into an img's src attribute, but it's not working:
async function ajax(id) {
const tag = document.getElementById(id);
const path = tag.getAttribute("data-src");
const response = await fetch(path);
const blob = await response.blob();
const base64 = window.btoa(blob);
const content = `data:image/jpeg;base64,${base64}`;
tag.setAttribute("src", content);
}
The details, as well as some other methods, which do work follow.
I have been experimenting with different ways to lazy load:
$ mkdir lazy
$ cd lazy
$ wget https://upload.wikimedia.org/wikipedia/commons/7/7a/Lone_Ranger_and_Silver_1956.jpg # any other example image
now create a file called index.html with this in it:
<script>
// this works
function setAttribute(id) {
const tag = document.getElementById(id);
const path = tag.getAttribute("data-src");
tag.setAttribute("src", path);
}
// this doesn't work for some reason
async function ajax(id) {
const tag = document.getElementById(id);
const path = tag.getAttribute("data-src");
const response = await fetch(path);
const blob = await response.blob();
const base64 = window.btoa(blob);
const content = `data:image/jpeg;base64,${base64}`;
tag.setAttribute("src", content);
}
// this works too
async function works(id) {
const tag = document.getElementById(id);
const path = tag.getAttribute("data-src");
const response = await fetch(path);
const blob = await response.blob();
const content = URL.createObjectURL(blob);
tag.setAttribute("src", content);
}
</script>
<a href="javascript:void(0)" onclick="setAttribute('example')">set attribute</a><br />
<a href="javascript:void(0)" onclick="ajax('example')">data url</a><br />
<a href="javascript:void(0)" onclick="works('example')">object url</a><br />
<img id="example" data-src="Lone_Ranger_and_Silver_1956.jpg"></img><br />
and start a server in that folder:
$ python -m SimpleHTTPServer # or whichever local webserver
and then when I look at it in Chrome, the first and third links both work; however, the middle link does not. Here is what the three links do to the tag respectively:
works:
<img id="example" data-src="Lone_Ranger_and_Silver_1956.jpg" src="Lone_Ranger_and_Silver_1956.jpg">
does not work:
<img id="example" data-src="Lone_Ranger_and_Silver_1956.jpg" src="data:image/jpeg;base64,W29iamVjdCBCbG9iXQ==">
works:
<img id="example" data-src="Lone_Ranger_and_Silver_1956.jpg" src="blob:http://localhost:8000/736a9e18-c30d-4e39-ac2e-b5246105c178">
That data url in the non-working example also looks too short. So what am I doing wrong?
Thanks for the suggestion @dolpsdw. window.btoa doesn't do what I thought it would. If anybody is trying to do the same thing, instructions for reading a blob into a data url are here: https://stackoverflow.com/a/18650249/5203563
I have created this wrapper that fits right into my program as follows:
(it even adds in the data:image/jpeg;base64, part for you and works out the mime type from the blob)
function readBlob(b) {
return new Promise(function(resolve, reject) {
const reader = new FileReader();
reader.onloadend = function() {
resolve(reader.result);
};
reader.onerror = reject;
reader.readAsDataURL(b);
});
}
async function ajax(id) {
const tag = document.getElementById(id);
const path = tag.getAttribute("data-src");
const response = await fetch(path);
const blob = await response.blob();
// const base64 = window.btoa(blob);
// const content = `data:image/jpeg;base64,${base64}`;
const content = await readBlob(blob);
tag.setAttribute("src", content);
}
this gives me the much longer data url that I expected.
When you have the in-memory blob, just generate a URL for that blob:
const urlCreator = window.URL || window.webkitURL;
const url = urlCreator.createObjectURL(blob);
Then create a new IMG with JavaScript and invoke its decode method:
const img = new Image();
img.src = url;
img.decode()
.then(() => {
document.body.appendChild(img);
})
.catch((encodingError) => {
// Do something with the error.
})
Maybe you also want to revoke the URL after load with
URL.revokeObjectURL(objectURL)
As for why window.btoa does not work: it encodes strings to base64 only, so your Blob was first coerced to the string "[object Blob]". Read about blob-to-base64 conversion here.
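You can see that coercion directly (the short base64 payload is exactly the broken src from the question):
// btoa stringifies its argument, so any Blob becomes the text "[object Blob]".
btoa(new Blob());       // "W29iamVjdCBCbG9iXQ=="
btoa("[object Blob]");  // "W29iamVjdCBCbG9iXQ=="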
But createObjectURL is the more elegant solution.
So I'm playing around with the request and cheerio npm packages and I can't seem to find a solution as to why it keeps giving me empty arrays. I used the same code when I scraped reddit and it worked like a charm, but when I use it on YouTube or any other page it doesn't work.
var request = require('request'),
cheerio = require('cheerio'),
fs = require('fs'),
urls = [];
request('https://www.youtube.com/', function(err,resp,body) {
if(!err && resp.statusCode == 200) {
var $ = cheerio.load(body);
$('a.yt-simple-endpoint style-scope ytd-grid-video-renderer', 'primary').each(function() {
var url = $(this);
urls.push(url);
});
}
});
And this is my reddit code (works fine)
var request = require('request'),
cheerio = require('cheerio'),
fs = require('fs'),
urls = [];
request('http://www.reddit.com/', function(err,resp,body) {
if(!err && resp.statusCode == 200) {
var $ = cheerio.load(body);
$('a.title', '#siteTable').each(function() {
var url = $(this).attr('href');
if(url.indexOf('imgur.com')!= -1) {
urls.push(url);
}
});
}
});
Output Example: [ 'http://i.imgur.com/WVrmZ9j.gifv',
'http://i.imgur.com/T0BchYC.gifv',
'http://imgur.com/u59lzux' ]
The HTML that cheerio loads for YouTube is different.
Inspect $.html() (for example with res.send($.html()) if you are in an Express handler, or just log it) to check the HTML structure cheerio actually received, and target it accordingly.
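A quick way to do that in the script from the question (a sketch reusing the fs module it already requires):
// Dump the markup cheerio received -- YouTube serves a mostly empty shell
// that is filled in later by client-side JavaScript, so those selectors match nothing.
fs.writeFileSync('youtube.html', body);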
If you need to get some information from YouTube, you can't use Cheerio. Instead, you need to use browser automation, e.g. Puppeteer, because YouTube dynamically loads content on the page via JavaScript. In the code below I show you how you can do this:
const puppeteer = require("puppeteer-extra");
const StealthPlugin = require("puppeteer-extra-plugin-stealth");
puppeteer.use(StealthPlugin());
const mainPageUrl = "https://www.youtube.com";
async function getUrls() {
const browser = await puppeteer.launch({
headless: false,
args: ["--no-sandbox", "--disable-setuid-sandbox"],
});
const page = await browser.newPage();
await page.setDefaultNavigationTimeout(60000);
await page.goto(mainPageUrl);
await page.waitForSelector("#contents");
const urls = await page.$$eval("a#thumbnail", (els) => {
return els.map(el => el.getAttribute('href') ? "https://www.youtube.com" + el.getAttribute('href') : undefined).filter((el) => el)
});
await browser.close();
return urls;
}
getUrls().then(console.log);
Output
[
"https://www.youtube.com/watch?v=0rliFQ0qyAM",
"https://www.youtube.com/watch?v=36YnV9STBqc",
"https://www.youtube.com/watch?v=w1_hYe3hhjE",
"https://www.youtube.com/watch?v=Uv6iK2kS3qQ",
"https://www.youtube.com/watch?v=hUGWLAFqYUc",
"https://www.youtube.com/watch?v=17TYygDfr28",
"https://www.youtube.com/watch?v=2isYuQZMbdU",
...and other results
]
You can read more about scraping YouTube from my blog posts:
Web scraping YouTube search video results with Nodejs
Web scraping YouTube secondary search results with Nodejs
Web scraping YouTube video page with Nodejs