I am developing a mobile application (primarily for iPhone/iOS) with HTML5/JS (using the DevExtreme package from DevExpress). Now I need to display the content of MHT files in this application, so I am looking for a document viewer that can display MHT files.
Ideally, the control would not only display MHT files but also let the user edit their content.
Can someone tell me how I can do that? Would it perhaps be possible to use a web browser control to view them?
MHT, or MIME HTML, is an archive format, so you should be able to convert it to HTML and then back to .mht.
You should be able to use https://github.com/zsxsoft/mhtml-parser to parse MHT:
let parser = require('mhtml-parser');

// Load and parse the .mht file, then log the parsed contents.
parser.loadFile(__dirname + "/simple/simple.mht", {
  charset: "gbk"
}, function(err, data) {
  if (err) throw err;
  console.log(data);
});
You should then be able to modify the file contents. After that you need to convert it back to MHT, which I'm not sure the above-mentioned repo can do. Still, you should be able to reverse engineer it and write your own HTML-to-MHT converter; when you do, consider making it open source so other people can use it. ;-)
I haven't used this personally, but it's got six stars... and seems to be the most popular repo for this kind of work.
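If you do end up writing your own converter: an MHT file is essentially a MIME multipart/related message, so a minimal version can be hand-rolled. This is only a sketch (htmlToMht is a hypothetical helper of my own, not part of mhtml-parser):
const fs = require('fs');

// Wrap an HTML string in a minimal MIME multipart/related envelope (.mht).
function htmlToMht(html, subject) {
  const boundary = '----=_NextPart_000_0000';
  return [
    'From: <Saved by Node.js>',
    'Subject: ' + (subject || 'Converted page'),
    'MIME-Version: 1.0',
    'Content-Type: multipart/related; type="text/html"; boundary="' + boundary + '"',
    '',
    '--' + boundary,
    'Content-Type: text/html; charset="utf-8"',
    'Content-Transfer-Encoding: 8bit',
    '',
    html,
    '',
    '--' + boundary + '--',
    ''
  ].join('\r\n');
}

fs.writeFileSync('page.mht', htmlToMht('<html><body><h1>Hello</h1></body></html>'));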
Alternatively...
You can search for "mht" on GitHub to find more candidates suitable for the job ;-)
Problem
I would like to know whether there is any PHP/Node.js API available to convert an editable PDF to a non-editable PDF online. We have a client application with a scenario where the user who downloads the PDF should not be able to modify it in any software (e.g. Foxit Reader, Adobe).
Basically, we are using PDF-LIB right now, and it seems it has no API for setting access privileges to make the PDF non-editable. I have searched a lot but have not found any API for that. I am not using pdf-flatten because we want everything to stay selectable. I appreciate your help.
Libraries I tried that failed to achieve the result:
bpampuch/pdfmake: can't load an existing PDF
PDF-LIB: doesn't support setting permissions
nrhirani/node-qpdf: file restrictions not working properly
I think flattening the PDF might help you make it un-editable, depending on what your target is:
Just the form fields: then you might use the form-flattening support from the PDF-LIB GitHub repo (see the sketch after this list)
The entire PDF: then see if the pdf-flatten package for Node.js helps
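For the form-fields case, here is a rough sketch based on my reading of the PDF-LIB docs (verify against the current API before relying on it):
const fs = require('fs');
const { PDFDocument } = require('pdf-lib');

// Flatten the form so field values become plain, non-editable page content.
async function flattenForm(inputPath, outputPath) {
  const pdfDoc = await PDFDocument.load(fs.readFileSync(inputPath));
  pdfDoc.getForm().flatten();
  fs.writeFileSync(outputPath, await pdfDoc.save());
}

flattenForm('editable.pdf', 'flattened.pdf').catch(console.error);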
After a lot of research and trying multiple libraries in PHP/Node, I did not find any library mature enough to proceed with, so I decided to build an API in a different technology (C# or Java).
Solution
We post the PDF URL to the API; the API downloads that file and applies multiple permissions according to the dataset.
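A hypothetical sketch of that client-side call (the endpoint and payload shape are my own illustration, not the actual API; requires Node 18+ for the global fetch):
// Post the PDF URL plus the desired permission set; the API returns the protected PDF bytes.
async function protectPdf(pdfUrl) {
  const response = await fetch('https://example.com/api/pdf/protect', { // placeholder endpoint
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      url: pdfUrl,
      permissions: { print: true, modify: false, annotate: true, fillForm: true, extract: true }
    })
  });
  return Buffer.from(await response.arrayBuffer());
}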
Library
The library we chose is ASPOSE.
// These can be set to true/false
// Printing is allowed.
config.IsPrint = true;
// Document is allowed to be changed.
config.IsModify = false;
// Annotation is allowed.
config.IsAnnot = true;
// Form filling is allowed.
config.IsFillForm = true;
// Content extraction is allowed.
config.IsExtract = true;
I want to pull data (maybe scrape) from a web site and save it to an external file.
My first thought was to write a Chrome extension to do that for me, but I could not find out how to save to an external file. (I am a newbie with Chrome extensions.) I searched Stack Overflow and found answers:
"You can't do that in a Chrome extension.",
"You can do it, but I'm not going to tell you how. ;)"
"Use localStorage"
localStorage does not write to an external user file, and I may need to save many MB of data.
My second thought is to use Electron and write a special-purpose browser for the task. Electron has node built-in, so saving a file is possible.
Before I put time and energy into doing this, has anyone already tried it? Any pitfalls or roadblocks ahead?
I am posting this quick example as an answer and a follow-up to the comments. If you want to test it, you need to run npm install request jsdom.
const request = require('request');
const jsdom = require('jsdom');

// Fetch the page, parse the returned HTML with jsdom, and print every comment.
request(
  'https://stackoverflow.com/questions/51896635/how-to-save-scraped-data-from-client-side-browser-to-a-user-file-use-electron?noredirect=1',
  (err, result, body) => {
    if (err) throw err;
    const dom = new jsdom.JSDOM(body);
    const comments = dom.window.document.querySelectorAll('.comment-copy');
    comments.forEach(comment => console.log(`>>> ${comment.innerHTML}\n`));
  }
);
The output should be the actual comments on this very same page.
>>> Regarding extensions, the authoritative source is the documentation: they can download the data to a file in the default downloads directory, optionally showing the Save As dialog where the user can manually choose any directory.
>>> You don't really need a browser for that. A simple script (in any scripting language really) should be good for this task. If you want to perform queries on your file, you can either process it later with a different script or you can use Node.js and do everything in a single script; there are a bunch of libraries that simulate DOM objects for Node. Worst case you could even spin up a headless Chrome from Node to do all DOM related tasks.
>>> The "download" will be text that I create in the browser, possibly from multiple web pages. The documentation suggests that it is only possible to download using a URL, not save something created locally. Or am I wrong?
>>> @ErickRuizdeChavez Yes, Node looks like a good way to go, and Electron gives a convenient framework to house it as an app.
>>> In node you can do whatever you want, it is not constrained by the browser sandbox, so you should be able to do whatever you need. Obviously it is not as straightforward as just dropping some javascript on the browser.
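For reference, here is a rough sketch of what the first comment describes, i.e. letting the extension itself save the scraped text via the downloads API (this assumes a Manifest V2 background page with the "downloads" permission; untested):
// Build the scraped text, wrap it in a Blob, and hand it to chrome.downloads.
const text = 'data scraped from one or more pages...';
const url = URL.createObjectURL(new Blob([text], { type: 'text/plain' }));

chrome.downloads.download({
  url: url,
  filename: 'scraped-data.txt',
  saveAs: true  // show the Save As dialog so the user can pick a directory
});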
Ineed is a good start; you can scrape text, images, hyperlinks, tags...
var ineed = require('ineed');

// Collect images, hyperlinks, scripts and stylesheets from the page.
ineed.collect.images.hyperlinks.scripts.stylesheets.from('http://google.com',
  function (err, response, result) {
    console.log(result);
  });
To write and save data, you can use fs:
var fs = require('fs');

fs.appendFile('mynewfile1.txt', 'Hello content!', function (err) {
  if (err) throw err;
  console.log('Saved!');
});
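Putting the two together, a rough sketch (assuming I'm reading the ineed README correctly; the result object is simply serialized as-is):
var ineed = require('ineed');
var fs = require('fs');

// Scrape hyperlinks from the page, then append the whole result object to a file.
ineed.collect.hyperlinks.from('http://google.com', function (err, response, result) {
  if (err) throw err;
  fs.appendFile('scraped.json', JSON.stringify(result, null, 2) + '\n', function (err) {
    if (err) throw err;
    console.log('Saved!');
  });
});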
Python novice here.
I am trying to scrape company information from the Dutch Transparency Benchmark website for a number of different companies, but I'm at a loss as to how to make it work. I've tried
pd.read_html("https://www.transparantiebenchmark.nl/en/scores-0#/survey/4/company/793")
and
requests.get("https://www.transparantiebenchmark.nl/en/scores-0#/survey/4/company/793")
and then working from there. However, it seems like the data is dynamically generated/queried, and thus not actually contained in the html source code these methods retrieve.
If I go to my browser's developer tools and copy the "final" html as shown there in the "Elements" tab, the whole information is in there. But as I'd like to repeat the process for several of the companies, is there any way to automate it?
Alternatively, if there's no direct way to obtain the info from the HTML, there might be a second possibility. The site allows you to download the information as an Excel file for each individual company. Is it possible to somehow automatically "click" the download button and save the file somewhere? Then I might be able to loop over all the companies I need.
Please excuse me if this question is poorly worded, and thank you very much in advance.
A thousand thanks!
Edit: I have also tried it using BeautifulSoup, as @pmkroeker suggested. But I'm not really sure how to make it work so that it first runs all the JavaScript and the site actually contains the data.
I think you will either want to use a library to render the page, or find the underlying API calls the site makes (more on that below). This answer seems to apply to Python. I will also copy the code from that answer for completeness.
You can pip install selenium from a command line, and then run something like:
from selenium import webdriver
from urllib.request import urlopen  # urllib2 in the original Python 2 answer

url = 'http://www.google.com'
file_name = 'C:/Users/Desktop/test.txt'

# Download the raw page source and save it to a local file
conn = urlopen(url)
data = conn.read()
conn.close()

file = open(file_name, 'wb')  # binary mode, since urlopen returns bytes
file.write(data)
file.close()

# Open the saved file in a real browser so its JavaScript runs,
# then grab the rendered HTML
browser = webdriver.Firefox()
browser.get('file:///' + file_name)
html = browser.page_source
browser.quit()
I think you could probably skip the file write and just pass it to that browser.get call, but I'll leave that to you to find out.
The other thing you can do is look for the AJAX calls in the browser developer tools, i.e. in Chrome: the three dots -> More tools -> Developer tools, or press F12. Then look at the Network tab. There will be various requests. Click one, open the Preview tab, and go through each until you find a response that looks like JSON data. You are effectively looking for the API calls the site used to fetch the data it renders. Once you find one, click the Headers tab and you will see the Request URL.
e.g. this https://sa-tb.nl/api/widget/chart/survey/4/sector/38 has lots of data
The problem here is that it may or may not be repeatable (the API may change, IDs may change). You may have a similar problem with plain HTML scraping, as the HTML could change just as easily.
I am trying to grab a JSON file from a website (Trello.com). When I navigate to the web address of the JSON file in IE (or any browser), I am presented with the option to save it or to open it.
So I think to myself "this should be quite easy then".
However I've got some limitations on how to implement it.
I need to display data from the JSON on CRM 2013.
The displaying of the data isn't the issue, it's grabbing the JSON file from the website.
An example URL to use would be https://trello.com/1/boards/dgbi8Gng
I've been trying to use Ajax and JSONP but am encountering issues (likely due to my lack of experience with them).
Could anyone help out a frustrated fellow? Maybe some example code that could be implemented in CRM and a quick explanation?
Many Thanks
You need to include your application key in the URL, e.g.
https://trello.com/1/boards/dgbi8Gng?key=substitutewithyourapplicationkey
Then it's just a standard request (assuming you're using jQuery):
var url = "https://trello.com/1/boards/dgbi8Gng?key=[YOUR KEY]";

// Fetch the board as JSON and log it.
$.getJSON(url, function (data) {
  console.log(data);
});
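If jQuery isn't available on the CRM 2013 form, a plain XMLHttpRequest does the same thing (a sketch; substitute your key as above):
var url = "https://trello.com/1/boards/dgbi8Gng?key=[YOUR KEY]";
var xhr = new XMLHttpRequest();
xhr.open("GET", url);
xhr.onload = function () {
  if (xhr.status === 200) {
    var data = JSON.parse(xhr.responseText); // same object $.getJSON would hand you
    console.log(data);
  }
};
xhr.send();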
Full details are here:
https://trello.com/docs/gettingstarted/#getting-an-application-key
I have an application that needs to create simple OpenXML documents (in particular PowerPoint presentations) using JavaScript.
Can anyone suggest how to get started on this please (or even if it is possible)? I've used the Microsoft OpenXML SDK for doing something similar using C#, and was wondering whether there were any JavaScript libraries with similar functionality.
Essentially the problem is how to create the individual OpenXML documents that make up an unzipped PowerPoint document, then zip them together to create the PowerPoint (.pptx) file, which someone can then save to their disk.
Any ideas welcome!
Use the OpenXML SDK for JavaScript.
Operations such as zipping/unzipping a document or saving it to disk are not easily done client-side with pure JavaScript alone.
However, if you want to do such things, I believe there are Linux packages out there that accept strings as input and give you a ready-to-use Office document as output.
If you're not comfortable with Linux packages, and assuming you want to save this as a Word 2007 document:
<?xml version="1.0" encoding="utf-8"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:pPr>
        <w:pStyle w:val="MyHeading1" />
      </w:pPr>
      <w:r>
        <w:t>This is Heading</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>
You can build this string client-side, then send it to the server through AJAX and let your server deal with it. I have used these APIs myself multiple times: let PHP handle it, save the result somewhere, or force the client's browser to download it (stream the result).
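A minimal sketch of that client-side step ("/convert-to-docx" is a placeholder endpoint of my own, not a real API):
// documentXml is a condensed version of the WordprocessingML shown above.
var documentXml =
  '<?xml version="1.0" encoding="utf-8"?>' +
  '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">' +
  '<w:body><w:p><w:r><w:t>This is Heading</w:t></w:r></w:p></w:body></w:document>';

var xhr = new XMLHttpRequest();
xhr.open('POST', '/convert-to-docx'); // placeholder server endpoint
xhr.setRequestHeader('Content-Type', 'application/xml');
xhr.onload = function () {
  console.log('Server responded with status ' + xhr.status);
};
xhr.send(documentXml);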
Use the Open XML SDK.
You can run it on Node, and in 32 seconds it creates 2,000 documents. Or you can run it in the browser.