I am searching for a JavaScript library that can read .doc and .docx files. I only need the text content; I am not interested in pictures, formulas or other special structures in the MS Word file.
It would be great if the library worked with the JavaScript FileReader, as shown in the code below.
function readExcel(currfile) {
  var reader = new FileReader();
  reader.onload = (function (_file) {
    return function (e) {
      // Here is where the magic should happen: e.target.result holds the file contents
    };
  })(currfile);
  reader.onabort = function (e) {
    alert('File read canceled');
  };
  // readAsArrayBuffer() is the recommended alternative for binary files
  reader.readAsBinaryString(currfile);
}
I searched the internet but could not find what I was looking for.
You can use docxtemplater for this. Although it is normally used for templating, it can also just extract the text of the document:
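// `content` holds the .docx file contents (e.g. the FileReader result).
// new JSZip(content) is the JSZip 2.x API; JSZip 3 no longer accepts constructor arguments.
var zip = new JSZip(content);
var doc = new Docxtemplater().loadZip(zip);
var text = doc.getFullText();
console.log(text);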
See the documentation for installation information (I'm the maintainer of this project).
Note, however, that it only handles .docx, not .doc.
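If you would rather not pull in a templating library, the text of a .docx can also be extracted by hand: a .docx file is a zip archive whose main body lives in word/document.xml, where each paragraph is a <w:p> element and the visible text sits in <w:t> elements. The sketch below assumes you have already unzipped that entry into a string (for example with JSZip 3, via JSZip.loadAsync(data) followed by zip.file("word/document.xml").async("string")). The function names are hypothetical, and the sketch ignores tabs, line breaks, headers, footers and footnotes.

```javascript
// Decode the five predefined XML entities; numeric references are left as-is in this sketch.
function decodeXmlEntities(s) {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&"); // decode &amp; last so "&amp;lt;" becomes "&lt;", not "<"
}

// Extract plain text from the XML of word/document.xml: one line per <w:p> paragraph,
// built by concatenating that paragraph's <w:t> text runs.
function docxXmlToText(documentXml) {
  return documentXml
    .split("</w:p>")
    .slice(0, -1) // the last piece is whatever follows the final paragraph (e.g. </w:body>)
    .map((paragraph) => {
      // <w:t> or <w:t xml:space="preserve">, but not <w:tab/>, <w:tbl>, ...
      const runs = paragraph.match(/<w:t(?:\s[^>]*)?>[^<]*<\/w:t>/g) || [];
      return runs
        .map((run) => decodeXmlEntities(run.replace(/<[^>]+>/g, "")))
        .join("");
    })
    .join("\n");
}
```

Regular expressions are good enough here only because Word escapes `<` inside text runs; for anything beyond plain text, a real XML parser (such as the browser's DOMParser) is the safer choice.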
You can now extract the text content from .doc/.docx files without installing external dependencies.
You can use the Node.js library called any-text.
It currently supports a number of other file formats as well, such as PDF, XLSX, XLS and CSV.
Usage is very simple:
Install the library as a dependency (or dev-dependency):
npm i -D any-text
Use the getText method to read the text content:
var reader = require('any-text');
reader.getText(`path-to-file`).then(function (data) {
console.log(data);
});
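You can also use the async/await notation (inside an async function, because CommonJS modules do not support top-level await):
var reader = require('any-text');
(async () => {
  const text = await reader.getText(`path-to-file`);
  console.log(text);
})();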
Sample test (e.g. with Mocha and Chai):
var reader = require('any-text');
const chai = require('chai');
const expect = chai.expect;
describe('file reader checks', () => {
  it('check doc file content', async () => {
    expect(
      await reader.getText(`${process.cwd()}/test/files/dummy.doc`)
    ).to.contains('Lorem ipsum');
  });
});
I hope it will help!
I am trying to read the content of an XML file. This is probably basic JS stuff, but I can't seem to make it work.
I am using Chrome's experimental Native File System API to read the folders inside a folder:
const opts = {type: 'open-directory'};
handle = await window.chooseFileSystemEntries(opts);
const entries = await handle.getEntries();
...
Later on in the code, I enter one of the folders in the main directory and try to read the file inside it. The file system structure looks like this:
Directory > subdirectory > file
and the second part of the code looks like this:
var subdirHandle = await handle.getDirectory(oneOfTheFolders);
var xmlFile = await subdirHandle.getFile('subject.xml');
xmlDoc = domParser.parseFromString(xmlFile, "text/xml");
parsedNumber = document.evaluate(myXpathFce('nodeInXML'), xmlDoc, null, XPathResult.ANY_TYPE, null).iterateNext();
if(parsedNumber.childNodes.length >0){
...
I believe the issue is the file reading in var xmlFile = await subdirHandle.getFile('subject.xml');. When I loaded the file straight from the input and used FileReader(), I was able to get the content and parse it, but with the directory approach the evaluated result is null, and I get this error: Uncaught (in promise) TypeError: Cannot read property 'childNodes' of null
Edit: I logged the xmlFile variable to the console. I just need to get the content (the XML as text) out of it.
I noticed you're storing the result of subdirHandle.getFile('subject.xml') in the xmlFile variable and passing it directly into the parseFromString method. In this API, that call returns a FileSystemFileHandle, not the file's contents.
parseFromString expects a string, so you cannot parse a document from a file handle or a File object directly. First get the File from the handle with getFile(), then read its text, for example with a FileReader. The readFileAsync function below wraps FileReader in a Promise so you can use it with await:
function readFileAsync(file) {
  return new Promise((resolve, reject) => {
    let reader = new FileReader();
    reader.onload = () => {
      resolve(reader.result);
    };
    reader.onerror = reject;
    reader.readAsText(file);
  });
}
// xmlFile is the FileSystemFileHandle returned by subdirHandle.getFile('subject.xml')
var file = await xmlFile.getFile();
var text = await readFileAsync(file);
var xmlDoc = domParser.parseFromString(text, "text/xml");
To obtain the contents of a FileSystemFileHandle, call getFile(), which returns a File object (a File is a Blob). To get the data, use the Blob methods: text(), arrayBuffer() or stream() read the contents, and slice() returns a smaller Blob that can be read the same way.
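Since a File is a Blob, its text() method avoids the FileReader wrapper entirely. A minimal sketch, assuming a handle whose getFile() resolves to a File as described above; the helper names readHandleText and readHandlePrefix are hypothetical:

```javascript
// Read a FileSystemFileHandle's contents as text.
// Assumes handle.getFile() resolves to a File; File extends Blob,
// so Blob.prototype.text() decodes the bytes as UTF-8.
async function readHandleText(handle) {
  const file = await handle.getFile();
  return file.text();
}

// Read only the first n bytes: slice() returns a smaller Blob that is read the same way.
async function readHandlePrefix(handle, n) {
  const file = await handle.getFile();
  return file.slice(0, n).text();
}
```

In the question's code, that would be `domParser.parseFromString(await readHandleText(xmlFile), "text/xml")`, where xmlFile is the handle returned by subdirHandle.getFile('subject.xml').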
I have been trying to use SheetJS and follow examples that work perfectly in jsfiddle; however, I cannot get them to work when I create a new JS file. I have tried multiple browsers but keep getting the same error: "XLSX is not defined".
I have tried the code from the question "Excel to JSON javascript code?" and wanted to ask there, but I need 50 reputation to leave a comment.
Here is the code snippet. I am including the following files in this order:
shim.js, jszip.js, xlsx.js
var oFileIn;
$(function() {
  oFileIn = document.getElementById('xlf');
  // Register the change handler only once: binding it with both addEventListener
  // and jQuery's .on() would call filePicked twice per selection
  $("#xlf").on("change", function(oEvent) {
    filePicked(oEvent);
  });
});
function filePicked(oEvent) {
  // Get the file from the input
  var oFile = oEvent.target.files[0];
  var sFilename = oFile.name;
  // Create an HTML5 FileReader
  var reader = new FileReader();
  // Handle the file contents once they have been read
  reader.onload = function(e) {
    var data = e.target.result;
    var cfb = XLSX.read(data, {type: 'binary'});
    console.log(cfb);
    cfb.SheetNames.forEach(function(sheetName) {
      // Convert the current sheet to CSV and JSON (XLSX.utils, since only xlsx.js is included)
      var sCSV = XLSX.utils.sheet_to_csv(cfb.Sheets[sheetName]);
      var oJS = XLSX.utils.sheet_to_json(cfb.Sheets[sheetName]);
      $("#my_file_output").html(sCSV);
      console.log(oJS);
      $scope.oJS = oJS; // $scope comes from a surrounding AngularJS controller (not shown)
    });
  };
  // Read as a binary string to match {type: 'binary'} above
  reader.readAsBinaryString(oFile);
}
I have tried numerous examples; this is just the only one I found that worked on jsfiddle. The same error occurs with both XLS and XLSX files.
Other examples, such as the one provided by SheetJS, have
var X = XLSX;
right at the top of the script block, and I immediately get the error that XLSX is not defined on that line.
Has anyone come across this, or does anyone know what the issue is?
Thanks!
The files included with the project weren't correct: one of the JS files was corrupt. I fixed it by manually downloading the SheetJS project and replacing the files with the ones from its download folder.
I want to..
.. convert an ICO file (e.g. http://www.google.com/favicon.ico ) to a PNG file after I downloaded it.
.. preserve transparency.
.. apply the solution in a node.js application.
I don't want to (and have already tried to) ..
.. use native tools such as ImageMagick (that's what I currently use in my application, but it's really bad for platform independence).
.. use tools that internally use native tools (e.g. gm.js).
.. rely on webservices such as http://www.google.com/s2/favicons?domain=www.google.de that don't allow configuring the resulting size or require payments or logins.
Therefore I'd love a JavaScript-only solution. I used Jimp in another application, but it does not support ICO files.
Any help is appreciated. Thanks!
Use a FileReader(): read the file as a Base64 data URL, strip the data-URL prefix and write the decoded data out as favicon.png. Done.
const inputFile = __dirname + "/favicon.ico";
const outputFile = __dirname + "/favicon.png";
(function (inputFile, outputFile) {
  const fileApi = require("file-api");
  const fs = require("fs");
  const File = fileApi.File;
  var fileReader = new fileApi.FileReader();
  // Register the load handler before starting the read
  fileReader.addEventListener("load", function (ev) {
    var rawdata = ev.target.result;
    // Strip the "data:...;base64," prefix and write the decoded bytes
    rawdata = rawdata.replace(/.*base64,/, "");
    fs.writeFileSync(outputFile, rawdata, "base64");
  });
  fileReader.readAsDataURL(new File(inputFile));
})(inputFile, outputFile);
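Decoding the Base64 data URL reproduces the original bytes, so the output still has the ICO layout. An ICO file is a small container: a 6-byte ICONDIR header (reserved = 0, type = 1 for icons, image count, all little-endian uint16), followed by one 16-byte ICONDIRENTRY per image holding the width and height (0 meaning 256), the size of the image data and its offset. Each image is either a complete PNG file (common for large icons) or BMP/DIB data. When an entry is PNG, its bytes can simply be copied out, transparency included, with plain Node Buffer code. A minimal sketch; extractPngsFromIco is a hypothetical name, and BMP-only icons (many small favicons) return an empty array because they would need a real bitmap decoder:

```javascript
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Return the PNG-encoded images embedded in an ICO buffer; BMP/DIB entries are skipped.
function extractPngsFromIco(ico) {
  // ICONDIR: reserved (must be 0), type (1 = icon, 2 = cursor), image count
  if (ico.length < 6 || ico.readUInt16LE(0) !== 0 || ico.readUInt16LE(2) !== 1) {
    throw new Error("not an ICO file");
  }
  const count = ico.readUInt16LE(4);
  const pngs = [];
  for (let i = 0; i < count; i++) {
    const entry = 6 + i * 16; // each ICONDIRENTRY is 16 bytes
    const width = ico.readUInt8(entry) || 256; // 0 encodes 256 pixels
    const height = ico.readUInt8(entry + 1) || 256;
    const size = ico.readUInt32LE(entry + 8); // bytes of image data
    const offset = ico.readUInt32LE(entry + 12); // where the image data starts
    const data = ico.subarray(offset, offset + size);
    // An embedded PNG is a complete PNG file, so it starts with the PNG signature
    if (data.length >= 8 && data.subarray(0, 8).equals(PNG_SIGNATURE)) {
      pngs.push({ width, height, data });
    }
  }
  return pngs;
}
```

To write the largest embedded image, you could sort the result by width and pass the first entry's data to fs.writeFileSync("favicon.png", ...).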
I am not familiar with the Node environment, but I wrote this ES6 module, PNG2ICOjs, which uses only JavaScript ArrayBuffer/Blob and runs entirely in client-side browsers (I assume a Node file should behave like a Blob).
import { PngIcoConverter } from "../src/png2icojs.js";
// ...
const inputs = [...files].map(file => ({
png: file
}));
// Result is a Blob
const resultBlob1 = await converter.convertToBlobAsync(inputs); // Default mime type is image/x-icon
const resultBlob2 = await converter.convertToBlobAsync(inputs, "image/your-own-mime");
// Result is an Uint8Array
const resultArr = await converter.convertAsync(inputs);
I have the following file structure:
test.html
test.json
And the following JS function:
function get_file(){
var app_path = app.activeDocument.path,
file = new File(app_path + '/test.json');
console.log(file);
}
How can I make the function log the file's content?
I'm not sure that everything you can do in the browser environment translates to Photoshop's scripting environment, but there are a few things you should look at.
Doing This in the Browser
The File object.
https://developer.mozilla.org/en-US/docs/Web/API/File
Notably, it extends the Blob object.
https://developer.mozilla.org/en-US/docs/Web/API/Blob
A Blob can be read using the FileReader.
https://developer.mozilla.org/en-US/docs/Web/API/FileReader
So this would work in the browser, but it may or may not work in Photoshop's scripting environment.
// In a browser, `file` has to come from an <input type="file"> or a drop event:
// new File(path) does not read a file from disk there.
function get_file(file){
  var reader = new FileReader();
  reader.onloadend = function() {
    console.log(reader.result);
  };
  reader.readAsText(file);
}
This is asynchronous, so depending on what you're trying to do, you may need to use a callback. You won't be able to return the string from inside the reader.onloadend handler.
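Because the text only arrives asynchronously, the usual pattern is to hand it to a callback instead of returning it. The sketch below swaps FileReader for Blob.prototype.text() (a File is a Blob, so this works for File objects in browsers that support it) and uses a hypothetical getFileText(file, callback) helper with a Node-style (err, text) callback:

```javascript
// Pass the file's text to a callback once it is available.
// The callback follows the Node convention: callback(err) on failure, callback(null, text) on success.
function getFileText(file, callback) {
  file.text().then(
    (text) => callback(null, text),
    (err) => callback(err)
  );
}
```

For test.json you would then call JSON.parse(text) inside the callback rather than trying to return the parsed object from get_file.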
Doing This in Photoshop
Take a look at Adobe's scripting references, specifically the JavaScript reference.
All Resources: http://www.adobe.com/devnet/photoshop/scripting.html
Javascript PDF: http://wwwimages.adobe.com/content/dam/Adobe/en/devnet/photoshop/pdfs/photoshop-cc-javascript-ref-2015.pdf
It looks like they don't have FileReader; instead, the File object itself can be used to read content. The File API section begins on page 109, but it's empty! The documentation is rather thin, so I can see why you'd have trouble finding this. With some searching I found someone doing this in 2012 (I don't know if it still works, but it's worth a shot):
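var b = new File("c:\\test.txt"); // escape the backslash: "\t" would be a tab character
b.open('r');
var str = "";
while (!b.eof) {
  str += b.readln() + "\n"; // readln() strips the line terminator
}
b.close();
alert(str);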
Let me know if that works.
After the user uploads a zipped file, I want to remove the images folder from it before sending it over the network. I am using Kendo for uploading, and the existing functionality works fine; I just want to add the image-removal part. This is what I have so far:
function onSelect(e) {
var file = e.files[0];
if (endsWith(file.name, '.eds')) {
var contents = e.target.result;
var jszip = new JSZip(contents);
jszip.remove("apldbio/sds/images_barcode");
fileToSend = jszip.generate({type: "base64", compression: "DEFLATE"});
}
e.files[0] = fileToSend;
openProgressDialog(e.files.length); //this is existing code, works fine
}
target.result doesn't seem to exist on the event e, and nothing works properly from that point on. e should probably be used inside a FileReader object's onload() (as seen here and here), but I have no idea how to use a FileReader for my purpose with Kendo Upload.
EDIT: I did some more reading, and now I am using FileReader like this:
var reader = new FileReader();
reader.onload = function (e) {
// do the jszip stuff here with e.target.result
};
reader.onerror = function (e) {
console.error(e);
};
reader.readAsArrayBuffer(file);
Note: file = e.files[0], as in the first code block.
With this, though, I get the error:
Failed to execute 'readAsArrayBuffer' on 'FileReader': parameter 1 is not of type 'Blob'.
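The error says e.files[0] is not a Blob. In Kendo UI Upload's select event, each item in e.files is a wrapper object (name, extension, size, ...); the native File is, as far as I know, exposed as its rawFile property, which is only present when the browser supports the File API. That native File is what FileReader expects. A minimal sketch under that assumption, with hypothetical helper names; it uses Blob.prototype.arrayBuffer(), which yields the same bytes as readAsArrayBuffer:

```javascript
// Kendo UI Upload wrappers are assumed to carry the native File in `rawFile`.
// FileReader and the Blob methods need that native File, not the wrapper itself.
function getNativeFile(kendoFile) {
  const nativeFile = kendoFile && kendoFile.rawFile;
  if (!(nativeFile instanceof Blob)) {
    throw new TypeError("rawFile is missing or not a Blob; is the File API supported?");
  }
  return nativeFile;
}

// Read the selected zip into an ArrayBuffer.
async function readSelectedZip(kendoFile) {
  return getNativeFile(kendoFile).arrayBuffer();
}
```

The resulting ArrayBuffer could then go to JSZip 3: JSZip.loadAsync(buffer), then zip.remove("apldbio/sds/images_barcode") and zip.generateAsync({type: "blob", compression: "DEFLATE"}). Note that the code in the question uses the older synchronous JSZip 2 API (new JSZip(data), generate()).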