What I want
I want to be able to copy/paste the entire content of a chat to memory
so I can extract included YouTube urls from it.
What I know
As you may know, the group chat(s) run on a separate url and are loaded page by page. Normally you go to the previous page either by simply scrolling upwards, or by clicking on a show previous link (works differently on different devices I think).
Things I tried
Sadly I can't find the urls to either anymore, but ...
Add a script to Chrome console
The point was to add a script that went looking for the show previous link and clicked it.
Add a start=0 parameter to the url
This assumes you can find out the actual url, either manually or through something like Fiddler.
The idea was that you add something like ?start=0 to the url. This would cause the paging to start from the very first record and load all.
Neither solution worked.
Possibly this is because Facebook made these options obsolete. It's my impression that Facebook initially provided more dev options than it does now.
My question
What can I do to fully load chat content?
I'm not really sure what this has to do with C#, but I'll give a C# solution anyway. You could use something like HtmlAgilityPack to get the InnerHtml of the page once it has loaded. This will obviously require some kind of authentication, so I suggest either sending auth credentials along with a WebClient request, or writing a method that logs in and then reusing the same WebClient to access the chats via URL. Use DownloadString() to get the contents of the page, then use HtmlAgilityPack's methods to get the InnerHtml of whatever element the chat box is identified as.
Right now this is the nearest thing I can find:
https://www.facebook.com/help/community/question/?id=10200611181580779
There is a way to see your complete chat history on Facebook easily.
By this method you can also see Photos or videos you've shared on
Facebook. Your Wall posts etc. -- 'A copy of what you've shared on
Facebook' Follow these steps:
Go to 'Account Settings'
Click on 'Download a copy of your Facebook data' from bottom of General section
Then click 'Start My Archive' -- It may take a little while to gather your photos, wall posts, messages, and other information.
(Usually 20 to 60 minutes)
Once the archive is generated, download it.
Extract and open 'index.html' from the downloaded folder
Now you can see 'Messages' on bottom of the page, click it.
Done!
I got a response in my mail way faster than 20 minutes.
You will get a mail with a link to a zip file, containing your archive:
In the html folder you find: messages.htm
For that I can write a script that looks for YouTube URLs in that file.
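Such a script could look roughly like the sketch below. It assumes the contents of messages.htm have already been read into a string, and it only recognizes the common youtube.com/watch and youtu.be link forms; the function name extractYouTubeUrls is made up for this example.

```javascript
// Minimal sketch: collect unique YouTube links from a blob of text or HTML.
// Only the youtube.com/watch?v=... and youtu.be/... forms are recognized.
function extractYouTubeUrls(text) {
  var pattern = /https?:\/\/(?:www\.|m\.)?(?:youtube\.com\/watch\?[^\s"'<>]*v=[\w-]+[^\s"'<>]*|youtu\.be\/[\w-]+)/gi;
  var matches = text.match(pattern) || [];
  // HTML files escape "&" as "&amp;", so undo that before de-duplicating.
  var cleaned = matches.map(function (url) {
    return url.replace(/&amp;/g, '&');
  });
  // A Set keeps the first occurrence of each URL and drops repeats.
  return Array.from(new Set(cleaned));
}
```

You would call it as extractYouTubeUrls(fileContents), where fileContents is the text of the archive's messages file, and then print or save the returned array.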
I have a PHP script that's outputting a CSV file and up until now I've been just using a link and passing parameters that are used to determine the output in the GET data. However recently the size of the data increased and now that code gets Error 414 - Request URI too Large. I tried using a hidden form to do it with POST but it just reloaded the page and didn't supply a prompt to download the file and all of the suggestions I've been able to find online about doing it with AJAX suggest using a link with GET data instead. Does anyone know a workaround that will have the browser still let the user easily download the data?
Presently I'm just setting the href attribute of an <a> tag.
$("#exportCSV").attr('href', "myscript.php/?data=" + exportData);
exportData has become too long for GET data but I want to maintain the behavior where if you click on a link that has say a CSV file being outputted the browser provides a download dialog for the user.
Basically I have a django app that communicates with a chrome extension. I have a bunch of functionality that interfaces with normal HTML pages that's all done by the extension. I want to allow users to have the same functionalities for PDF files. I have a python script which translates the pdf file into an html page.
The problem I run into is when the pdfs are open locally within chrome.
Like the following file:///home/wcr5048/Downloads/sample_pdf.pdf
This is my current solution: it takes the converted HTML and uses it to replace the page's current HTML, which is just an embedded PDF. But I run into an issue because the "url" isn't really a URL, so I can't append HTML to something that doesn't exist.
function convert_to_html(request) {
console.log('converting to html...');
document.getElementsByTagName('html')[0].innerHTML = request.data;
chrome.runtime.sendMessage({
detail: 'refresh'
});
}
What I don't want to happen is to download a file just like the pdf but one that's been converted into html. I would rather have everything happen automatically.
I only see two possible options:
I create a unique link for the converted pdf file for every user, and then send the raw html string to populate the corresponding view.
I somehow tell the extension to use a popup to cover the entire width of the screen, and then populate it with the data.
Are there any suggested solutions that would be a better fit, and if not, which of these two would be the better solution?
Thanks for viewing
I'm trying to write a Google Chrome extension for showing PDF files. As soon as I detect that browser is redirecting to a URL pointing to a PDF file, I want it to stop loading the default PDF viewer, but start showing my UI instead. The UI will use PDF.JS to render the PDF and jQuery-ui to show some other stuff.
Question: how do I do this? It's very important to block the original PDF viewer, because I don't want to double memory consumption by showing two instances of the document. Therefore, I should somehow navigate the tab to my own view.
As the main author of the PDF.js Chrome extension, I can share some insights about the logic behind building a PDF Viewer extension for Chrome.
How to detect a PDF file?
In a perfect world, every website would serve PDF files with the standard application/pdf MIME-type. Unfortunately, the real world is not perfect, and in practice there are many websites which use an incorrect MIME-type. You will catch the majority of the cases by selecting requests that satisfy any of the following conditions:
The resource is served with the Content-Type: application/pdf response header.
The resource is served with the Content-Type: application/octet-stream response header, and its URL contains ".pdf" (case-insensitive).
Besides that, you also have to detect whether the user wants to view the PDF file or download the PDF file. If you don't care about the distinction, it's easy: Just intercept the request if it matches any of the previous conditions.
Otherwise (and this is the approach I've taken), you need to check whether the Content-Disposition response header exists and its value starts with "attachment".
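The detection rules above can be sketched as plain functions. This is a simplified illustration, not the actual PDF.js code; the names getHeader, isPdfResponse and isDownloadRequested are invented here, and the details argument is assumed to have the shape the webRequest API passes to onHeadersReceived listeners (a url string plus a responseHeaders array of {name, value} objects).

```javascript
// Look up a response header by name, case-insensitively.
// Returns an empty string when the header is absent.
function getHeader(headers, name) {
  var list = headers || [];
  var wanted = name.toLowerCase();
  for (var i = 0; i < list.length; i++) {
    if (list[i].name.toLowerCase() === wanted) {
      return list[i].value || '';
    }
  }
  return '';
}

// True when the response looks like a PDF according to the two conditions above.
function isPdfResponse(details) {
  // Drop parameters such as "; charset=..." before comparing.
  var contentType = getHeader(details.responseHeaders, 'content-type')
    .toLowerCase().split(';')[0].trim();
  if (contentType === 'application/pdf') {
    return true;
  }
  if (contentType === 'application/octet-stream') {
    return details.url.toLowerCase().indexOf('.pdf') !== -1;
  }
  return false;
}

// True when the server asked for a download rather than inline viewing.
function isDownloadRequested(details) {
  var disposition = getHeader(details.responseHeaders, 'content-disposition');
  return /^\s*attachment/i.test(disposition);
}
```

A listener would then intercept the request only when isPdfResponse(details) is true and isDownloadRequested(details) is false.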
If you want to support PDF downloads (e.g. via your UI), then you need to add the Content-Disposition: attachment response header. If the header already exists, then you have to replace the existing disposition type (e.g. inline) with "attachment". Don't bother with trying to parse the full header value, just strip the first part up to the first semicolon, then put "attachment" in front of it. (If you really want to parse the header, read RFC 2616 (section 19.5.1) and RFC 6266).
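As a rough sketch of that header rewrite (again not the actual PDF.js code; forceAttachment is a made-up name), the function below takes a responseHeaders-style array of {name, value} objects and returns a copy in which the disposition type is "attachment":

```javascript
// Return a copy of the headers in which Content-Disposition asks for a download.
// An existing header keeps everything from the first semicolon onwards
// (e.g. the filename parameter); only the disposition type is replaced.
function forceAttachment(headers) {
  var result = [];
  var found = false;
  headers.forEach(function (header) {
    if (header.name.toLowerCase() === 'content-disposition') {
      found = true;
      var value = header.value || '';
      var semicolon = value.indexOf(';');
      var rest = semicolon === -1 ? '' : value.slice(semicolon);
      result.push({ name: header.name, value: 'attachment' + rest });
    } else {
      result.push(header);
    }
  });
  if (!found) {
    result.push({ name: 'Content-Disposition', value: 'attachment' });
  }
  return result;
}
```

In an onHeadersReceived listener registered with "blocking" and "responseHeaders", the result would be handed back as { responseHeaders: forceAttachment(details.responseHeaders) }.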
Which Chrome (extension) APIs should I use to intercept PDF files?
The chrome.webRequest API can be used to intercept and redirect requests. With the following logic, you can intercept and redirect PDFs to your custom viewer that requests the PDF file from the given URL.
chrome.webRequest.onHeadersReceived.addListener(function(details) {
    // isPdfFile is a placeholder: implement it with the detection logic described above.
    if (!isPdfFile(details))
        return; // Nope, not a PDF file. Ignore this request.
    // chrome.runtime.getURL replaces the deprecated chrome.extension.getURL.
    var viewerUrl = chrome.runtime.getURL('viewer.html') +
        '?file=' + encodeURIComponent(details.url);
    return { redirectUrl: viewerUrl };
}, {
    urls: ["<all_urls>"],
    types: ["main_frame", "sub_frame"]
}, ["responseHeaders", "blocking"]);
(see https://github.com/mozilla/pdf.js/blob/master/extensions/chromium/pdfHandler.js for the actual implementation of the PDF detection using the logic described at the top of this answer)
With the above code, you can intercept any PDF file on http and https URLs.
If you want to view PDF files from the local filesystem and/or ftp, then you need to use the chrome.webRequest.onBeforeRequest event instead of onHeadersReceived. Fortunately, you can assume that if the file ends with ".pdf", then the resource is most likely a PDF file. Users who want to use the extension to view a local PDF file have to explicitly allow this at the extension settings page though.
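For local and ftp files, a listener along the following lines might work. Treat it as a sketch under assumptions: looksLikeLocalPdf is an invented helper, viewer.html is the viewer page name used in the snippet above, the URL filters are my guess at reasonable match patterns, and the blocking form of webRequest is the Manifest V2 style used throughout this answer.

```javascript
// True for file:// and ftp:// URLs whose path ends with ".pdf" (any case).
function looksLikeLocalPdf(url) {
  // Ignore any query string or fragment when checking the extension.
  var path = url.split(/[?#]/)[0];
  return /^(file|ftp):\/\//i.test(url) && /\.pdf$/i.test(path);
}

// Only register the listener when running inside an extension.
if (typeof chrome !== 'undefined' && chrome.webRequest) {
  chrome.webRequest.onBeforeRequest.addListener(function (details) {
    if (!looksLikeLocalPdf(details.url)) {
      return; // Not a local PDF. Let the request continue.
    }
    var viewerUrl = chrome.runtime.getURL('viewer.html') +
      '?file=' + encodeURIComponent(details.url);
    return { redirectUrl: viewerUrl };
  }, {
    urls: ['file:///*', 'ftp://*/*'],
    types: ['main_frame', 'sub_frame']
  }, ['blocking']);
}
```

No response headers are available at this stage, which is why the check relies on the file extension alone.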
On Chrome OS, use the chrome.fileBrowserHandler API to register your extension as a PDF Viewer (https://github.com/mozilla/pdf.js/blob/master/extensions/chromium/pdfHandler-vcros.js).
The methods based on the webRequest API only work for PDFs in top-level documents and frames, not for PDFs embedded via <object> and <embed>. Although they are rare, I still wanted to support them, so I came up with an unconventional method to detect and load the PDF viewer in these contexts. The implementation can be viewed at https://github.com/mozilla/pdf.js/pull/4549/files. This method relies on the fact that when an element is put in the document, it eventually has to be rendered. When it is rendered, CSS styles get applied. When I declare an animation for the embed/object elements in the CSS, animation events will be triggered. These events bubble up in the document. I can then add a listener for this event, and replace the content of the object/embed element with an iframe that loads my PDF Viewer.
There are several ways to replace an element or content, but I've used Shadow DOM to change the displayed content without affecting the DOM in the page.
Limitations and notes
The method described here has a few limitations:
The PDF file is requested at least two times from the server: First a usual request to get the headers, which gets aborted when the extension redirects to the PDF Viewer. Then another request to request the actual data.
Consequently, if a file is valid only once, then the PDF cannot be displayed (the first request invalidates the URL and the second request fails).
This method only works for GET requests. There is no public API to directly get response bodies from a request in a Chrome extension (crbug.com/104058).
The method to get PDFs to work for <object> and <embed> elements requires a script to run on every page. I've profiled the code and found that the impact on performance is negligible, but you still need to be careful if you want to change the logic.
(I first tried to use Mutation Observers for detection, which slowed down the page load by 3-20% on huge documents, and caused an additional 1.5 GB peak in memory usage in a complex DOM benchmark).
The method to detect <object> / <embed> tags might still cause any NPAPI/PPAPI-based PDF plugins to load, because it only replaces the <embed>/<object> tag's content after the tag has already been inserted and rendered. When a tab is inactive, animations are not scheduled, and hence the dispatch of the animation event will be significantly delayed.
Afterword
PDF.js is open source; you can view the code for the Chrome extension at https://github.com/mozilla/pdf.js/tree/master/extensions/chromium. If you browse the source, you'll notice that the code is a bit more complex than I explained here. That's because extensions could not redirect requests at the onHeadersReceived event until I implemented it a few months ago (crbug.com/280464, Chrome 35).
And there is also some logic to make the URL in the omnibox look a bit better.
The PDF.js extension continues to evolve, so unless you want to significantly change the UI of the PDF Viewer, I suggest asking users to install PDF.js's official PDF Viewer from the Chrome Web Store, and/or opening issues on PDF.js's issue tracker for reasonable feature requests.
Custom PDF Viewer
Basically, to accomplish what you want to do you'll need to:
Intercept the PDF's URL when it's loaded;
Stop the PDF from loading;
Start your own PDF viewer and load the PDF inside it.
How to
Using the chrome.webRequest API you can easily listen to the web requests made by Chrome, and, more specifically, the ones that are going to load .pdf files. Using the chrome.webRequest.onBeforeRequest event you can listen to all the requests that end with ".pdf" and get the URL of the requested resource.
Create a page, for example display_pdf.html where you will show the PDFs and do whatever you want with them.
In the chrome.webRequest.onBeforeRequest listener, prevent the resource from being loaded by returning {redirectUrl: ...} to redirect to your display_pdf.html page.
Pass the PDF's URL to your page. This can be done in several ways, but, for me, the simplest one is to add the encoded PDF URL at the end of your page's url, like an encoded query string, something like display_pdf.html?url=http%3A%2F%2Fwww.example.com%2Fexample.pdf.
Inside the page, get the URL with JavaScript and process and render the PDF with any library you want, like PDF.js.
The code
Following the above steps, your extension will look like this:
<root>/
/background.js
/display_pdf.html
/display_pdf.js
/manifest.json
So, first of all, let's look at the manifest.json file: you will need to declare the permissions for webRequest and webRequestBlocking, so it should look like this:
{
    "manifest_version": 2,
    "name": "PDF Test",
    "version": "0.0.1",
    "background": {
        "scripts": ["/background.js"]
    },
    "permissions": ["webRequest", "webRequestBlocking", "<all_urls>"]
}
Then, in your background.js you will listen to the chrome.webRequest.onBeforeRequest event and redirect the tab that is loading the PDF to your custom display_pdf.html page, like this:
chrome.webRequest.onBeforeRequest.addListener(function(details) {
    var displayURL;
    if (/\.pdf$/i.test(details.url)) { // the requested URL ends with ".pdf"
        displayURL = chrome.runtime.getURL('/display_pdf.html') + '?url=' + encodeURIComponent(details.url);
        // stop the request and redirect to your custom display page
        return {redirectUrl: displayURL};
    }
}, {urls: ['*://*/*.pdf']}, ['blocking']);
And finally, in your display_pdf.js file you will extract the PDF's url from the query string and use it to do whatever you want:
var PDF_URL = decodeURIComponent(location.href.split('?url=')[1]);
// this will be something like http://www.somesite.com/path/to/example.pdf
alert('The PDF url is: ' + PDF_URL);
// do something with the pdf... like processing it with PDF.js
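If you prefer not to split the string by hand, the standard URL API can do the parsing. The sketch below is an alternative to the line above, not part of the original example; getPdfUrl is a hypothetical helper name, and the extension id in the usage note is invented.

```javascript
// Read the "url" query parameter from the viewer page's address.
// searchParams.get() already percent-decodes the value, and returns null
// when the parameter is missing, so no manual decodeURIComponent is needed.
function getPdfUrl(href) {
  var params = new URL(href).searchParams;
  return params.get('url');
}
```

Inside display_pdf.js you would call getPdfUrl(location.href). For an address like chrome-extension://abcdef/display_pdf.html?url=http%3A%2F%2Fwww.example.com%2Fexample.pdf it returns http://www.example.com/example.pdf, and it still works if further parameters or a fragment follow the PDF's URL.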
Working Example
A working example of what I said above can be found HERE.
Documentation links
I recommend you to take a look at the official documentation of the above specified APIs, that you can find following these links:
chrome.webRequest API
chrome.webRequest.onBeforeRequest event
chrome.runtime API
chrome.runtime.getURL method
I have already built functionality to generate PDF files for the reports that users view.
Currently, when the user clicks the print PDF button, it does the following:
Gets the HTML content of the div that needs to be printed
Sends this content to the controller's method using jQuery's ajax method with POST
In the controller, wraps the content in HTML document tags such as <html> and <body>, and adds some styles
Passes this HTML string to one of the tools I am using, which returns the PDF bytes for the string
Saves those bytes as a PDF file in a folder and returns the path of this file
In its success callback, jQuery then opens a window for this file's path.
This all is working fine.
The problem is
It does not open the window immediately, because it does all the processing first and only opens the window on success.
Also, I am wondering whether I am doing this correctly or taking unnecessary extra steps. Is there a better or shorter way to do this?
For example, after getting the content of the div, could I make some changes to the string and show it directly in the new window as PDF content, avoiding the server processing? Is that possible?
So far I have tried to show the content directly with data:application/pdf, but that didn't work.
If that is not possible, I am thinking of not saving the PDF file at all and instead returning a view that opens as a PDF, maybe by setting its content type. Is that possible?
What you did is the best approach. Browsers generally don't have the capability to convert HTML to PDF, so you can't just tell them to open a page as a PDF. You must serve the PDF file from the server. For more control, you can serve the file from a script at a specified URL and add the appropriate headers:
header('Content-type: application/pdf');
header('Content-Disposition: inline; filename="the.pdf"'); // second parameter is the name of the file
Content-Type means the browser will try to open the file with a program appropriate for this MIME type.
Content-Disposition: inline means the browser will try to open it in the browser itself.
While the server is working, just display a "loading" image to the user. That way the user will know that something is happening and that they need to wait.