parse pdf document javascript

I have a PDF document embedded inside a webpage in ASP.NET and want to get a specific field inside the PDF document using JavaScript... plain JavaScript...

JavaScript in a PDF can call JS in a web page and vice versa, if BOTH are set up for it. You can see Acrobat's documentation here.
Check out the HostContainer specification, starting on page 486. In the PDF you'd need script something like:
var document = this; // keep a reference to the Doc object for use inside the handler
this.hostContainer.messageHandler = {
    onDisclose: function() { return true; },
    onMessage: function(msgArrayIgnored) {
        // build a JSON string of field/value pairs
        var outgoingMessage = "{ ";
        for (var i = 0; i < document.numFields; ++i) {
            var fldName = document.getNthFieldName(i);
            var fld = document.getField(fldName);
            var val = fld.value;
            // you'll probably need to escape 'val' to be legal JSON
            outgoingMessage += "\"" + fldName + "\": \"" + val + "\"";
            // stick in a comma unless this is the last field
            if (i != document.numFields - 1) {
                outgoingMessage += ", ";
            }
        }
        outgoingMessage += " }";
        // inside the handler 'this' is the handler object, so go through 'document'
        document.hostContainer.postMessage([outgoingMessage]);
    }
};
In the HTML, you need to set up something similar. Let's assume your PDF is embedded in an object tag, and that element's id is "pdfElem". Your HTML script might look something like:
var pdf = document.getElementById("pdfElem");
pdf.messageHandler = function(message) {
    // the PDF sends its JSON string wrapped in an array; parse it rather than eval it
    var fldValPairs = JSON.parse(message[0]);
    doStuffWithFieldInfo(fldValPairs);
};
Later, any time you want to inspect the PDF's field info you post a message, and the PDF will call back to pdf.messageHandler with its JSON string wrapped in an array:
pdf.postMessage(["this string is ignored"]);
There's probably a bug or two lurking in there somewhere, but this will put you on the right track.
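On the point about escaping 'val' to be legal JSON: one way to avoid hand-escaping is to collect the pairs in a plain object and serialize it in one step. The following is only a sketch. It assumes JSON.stringify is available in the PDF viewer's JavaScript engine, and fakeDoc is a made-up stand-in for the PDF's Doc object so the code can run outside Acrobat.

```javascript
// Builds the outgoing message from any object that exposes numFields,
// getNthFieldName(i) and getField(name), as used in the PDF script above.
function buildFieldMessage(doc) {
  var pairs = {};
  for (var i = 0; i < doc.numFields; ++i) {
    var name = doc.getNthFieldName(i);
    // String() keeps numbers and other value types uniform in the output
    pairs[name] = String(doc.getField(name).value);
  }
  // JSON.stringify takes care of quoting and escaping
  return JSON.stringify(pairs);
}

// Hypothetical stand-in for the Doc object, with made-up field data.
var fakeDoc = {
  numFields: 2,
  names: ["name", "note"],
  values: { name: "Ann", note: 'say "hi"' },
  getNthFieldName: function (i) { return this.names[i]; },
  getField: function (n) { return { value: this.values[n] }; }
};

var message = buildFieldMessage(fakeDoc);
```

Inside the PDF, the handler would call buildFieldMessage(document) and post the result; the receiving page can then use JSON.parse on it.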

Webpage JavaScript will not be able to interact with the PDF form fields directly. You can, however, make the PDF form post to a form processor on a web page and then obtain the values of the form fields there.

Related

Manipulate C# / UWP from HTML / JS

I just managed to implement a small webserver on my Raspberry Pi.
The webserver is created as an UWP headless app.
It can use JavaScript, which is pretty helpful.
I've only just started with HTML and JS, so I'm a big noob at this and need some help.
I already managed to show the same data I show on the webpage in a headed app on the same device.
Now I want to be able to manipulate the data from the webpage.
But I don't know how I'm supposed to do that.
I parse the HTML / JS as a complete string so I can't use variables I defined in code. I would need another way to do this.
My code for the webserver is currently this:
public sealed class StartupTask : IBackgroundTask
{
private static BackgroundTaskDeferral _deferral = null;
public async void Run(IBackgroundTaskInstance taskInstance)
{
_deferral = taskInstance.GetDeferral();
var webServer = new MyWebServer();
await ThreadPool.RunAsync(workItem => { webServer.Start(); });
}
}
class MyWebServer
{
private const uint BufferSize = 8192;
public async void Start()
{
var listener = new StreamSocketListener();
await listener.BindServiceNameAsync("8081");
listener.ConnectionReceived += async (sender, args) =>
{
var request = new StringBuilder();
using (var input = args.Socket.InputStream)
{
var data = new byte[BufferSize];
IBuffer buffer = data.AsBuffer();
var dataRead = BufferSize;
while (dataRead == BufferSize)
{
await input.ReadAsync(buffer, BufferSize, InputStreamOptions.Partial);
// decode only the bytes that were actually read, not the whole buffer
request.Append(Encoding.UTF8.GetString(data, 0, (int)buffer.Length));
dataRead = buffer.Length;
}
}
string query = GetQuery(request);
using (var output = args.Socket.OutputStream)
{
using (var response = output.AsStreamForWrite())
{
string htmlContent = "<html>";
htmlContent += "<head>";
htmlContent += "<script>";
htmlContent += "function myFunction() {document.getElementById('demo').innerHTML = 'Paragraph changed.'}";
htmlContent += "</script>";
htmlContent += "</head>";
htmlContent += "<body>";
htmlContent += "<h2>JavaScript in Head</h2>";
htmlContent += "<p id='demo'>A paragraph.</p>";
htmlContent += "<button type='button' onclick='myFunction()'>Try it!</button>";
htmlContent += "</body>";
htmlContent += "</html>";
var html = Encoding.UTF8.GetBytes(htmlContent);
using (var bodyStream = new MemoryStream(html))
{
var header =
$"HTTP/1.1 200 OK\r\nContent-Length: {bodyStream.Length}\r\nConnection: close\r\n\r\n";
var headerArray = Encoding.UTF8.GetBytes(header);
await response.WriteAsync(headerArray, 0, headerArray.Length);
await bodyStream.CopyToAsync(response);
await response.FlushAsync();
}
}
}
};
}
public static string GetQuery(StringBuilder request)
{
var requestLines = request.ToString().Split(' ');
var url = requestLines.Length > 1
? requestLines[1]
: string.Empty;
var uri = new Uri("http://localhost" + url);
var query = uri.Query;
return query;
}
}

Your question is a bit vague, so I have to guess what you're trying to do. Do you mean that a browser (or another app with a Web view) will connect to your Pi server, grab some data off it, and then manipulate the data to format them / display them in a particular way on the page? If so, then first you need to decide how you get the data. You seem to imply the data will just be a stream of HTML, though it's not clear how you'll be passing that string to the browser. Traditional ways of grabbing the data might be with Ajax and possibly JSON, but it's also possible to use an old-fashioned iframe (maybe a hidden one) -- though if starting from scratch, Ajax would be better.
The basic issue is to know: what page will access the data on the server and in what format? Is it a local page served from the client app's filestore, which then opens a connection to the server, grabs the data and displays them in a <div> or an <iframe>? Or is it a page on your server that comes with the data incorporated in one part of the DOM, and you want to transform them and display them in another element?
Let's now assume your client app has received the data in an element like <div id="myData">data</div>. A script on the client page can grab those data as a string with document.getElementById('myData').innerHTML (see getElementById). You can then transform the data as necessary with JavaScript methods. Then there are various DOM techniques for inserting the transformed data either back in the same element or in a different one.
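As a rough sketch of that read, transform and insert cycle: the helper name transformElementText, the 'output' id and the upper-casing transform are all made up for illustration. The document is passed in as a parameter so the same function can be exercised with a stand-in object outside a browser.

```javascript
// Reads the markup of one element, transforms it, and writes the result
// into another element. Returns the transformed string, or null if either
// element is missing.
function transformElementText(doc, sourceId, targetId, transform) {
  var source = doc.getElementById(sourceId);
  var target = doc.getElementById(targetId);
  if (!source || !target) {
    return null;
  }
  var result = transform(source.innerHTML);
  // textContent inserts the result as plain text, not as markup
  target.textContent = result;
  return result;
}

// Stand-in for the browser's document, only so the sketch runs anywhere.
var fakeElements = {
  myData: { innerHTML: "  data  " },
  output: { textContent: "" }
};
var fakeDocument = {
  getElementById: function (id) { return fakeElements[id] || null; }
};

var shown = transformElementText(fakeDocument, "myData", "output", function (s) {
  return s.trim().toUpperCase();
});
```

In a real page the first argument would simply be document.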
Instead, let's assume you have received the data via XMLHttpRequest. Then you'll need to identify just the data you want from the received object (that might involve turning the object into a string and using a regular expression, or more likely, use DOM selection methods on the object till you have the part of the data you want). When you've extracted the data / node / element, you can insert it into a <div> on your page as above.
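For the XMLHttpRequest route, a minimal sketch might look like the following. It assumes the server answers with JSON shaped like {"readings": [{"name": ..., "value": ...}]}; that shape, the function names and the URL passed to loadReadings are all hypothetical.

```javascript
// Picks just the wanted part out of the response text.
function pickReadings(responseText) {
  var payload = JSON.parse(responseText);
  return payload.readings.map(function (r) {
    return r.name + ": " + r.value;
  });
}

// Browser-only part: fetches the data and hands the extracted lines to a callback.
// It is defined here but not called, so the sketch also loads outside a browser.
function loadReadings(url, onDone) {
  var xhr = new XMLHttpRequest();
  xhr.open("GET", url);
  xhr.onload = function () {
    if (xhr.status === 200) {
      onDone(pickReadings(xhr.responseText));
    }
  };
  xhr.send();
}

// The extraction step can be tried on its own with a sample response.
var sample = pickReadings('{"readings":[{"name":"temp","value":21.5}]}');
```

The callback given to loadReadings would then insert the lines into a <div>, as described for the first case.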
Sorry if this is all a bit vague and abstract, but hopefully it can point you in the right direction to look up further things as needed. https://www.w3schools.com/ is a great resource for beginners.

Remove script tags only from variable containing entire web page

I have a webpage that gets most of its content via API calls to a cloud database solution. The HTML page is fairly barebones but gets much more data injected through a number of JS/JQuery commands, etc.
The resulting page represents a "Quote" which I'd like to save back into the cloud database for reference purposes.
I can get the current state of the page and store it in a variable by using the following command:
var AVMI_thisPage = document.getElementsByTagName('html')[0].outerHTML;
I now need to remove any <script> tags from the variable so that any reimport of the HTML back to the cloud database doesn't contain any JS that is likely to mess with the page again when someone opens it for reference.
I should be able to push the string back to the database but I need to get rid of any <script>.
I've tried JQuery but this seems to kill the HTML, HEAD, and BODY tags.
To be honest, I wasn't expecting the code below to work anyway... but tried it.
E.g.
var AVMI_thisPage = document.getElementsByTagName('html')[0].outerHTML;
var AVMI_tree = $("<div>" + AVMI_thisPage + "</div>");
AVMI_tree.find('script').remove();
AVMI_thisPage = AVMI_tree.html();
Any ideas?
UPDATED - FINAL CODE (including BASE64 encoding and upload)
function b64EncodeUnicode(str) {
return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g, function(match, p1) {
return String.fromCharCode('0x' + p1);
}));
}
var htmlPage = $("html");
$("script", htmlPage).remove();
AVMI_thisPage = htmlPage.html();
AVMI_thisPageB64 = b64EncodeUnicode(AVMI_thisPage);
var req = "";
req += "<qdbapi>";
req += "<rid>" + AVMI_quoteRID + "</rid>";
req += "<field fid='171' filename='Hardcopy of Quote.html'>"+ AVMI_thisPageB64 + "</field>";
req += "</qdbapi>";
$.ajax({
type: "POST",
contentType: "text/xml",
dataType: "xml",
processData: false,
url: "https://xxxx.xxxxxxxx.com/db/" + AVMI_Q_DBID + "?act=API_UploadFile",
data: req
})
.then(function() {
alert("A copy of this quote has been saved into the 'Hardcopy Attachment' field.");
window.close();
});

You can do:
$("script", AVMI_tree).remove();
But mind that you're getting the outerHTML of documentElement, which includes HEAD and BODY, and putting that into a DIV, which is invalid HTML.
You could do:
var htmlPage = $("html");
$("script", htmlPage).remove();
AVMI_thisPage = htmlPage.html();
Note that it doesn't matter that you're actually removing the scripts from the live HTML page rather than from a copied DOM: once a script has been loaded and executed by the JavaScript engine, removing its element from the DOM has no effect on it. The script stays loaded and active.

I am not going to question the reason you do the 'state' saving like this, however here's how you can achieve what you want:
var regex = new RegExp('<script[\\s\\S]*?</script>', 'gi');
var noScript = AVMI_thisPage.replace(regex, '');
You can run it in your console on this page and print noScript to see for yourself.
The regex matches each script element in the stringified page, from its opening <script to the nearest </script>, including any newlines in between, and we replace each match with nothing, using only string operations. The match is lazy (*?) so that content between two separate script elements is not removed as well. I suspect this is faster than doing DOM operations, let alone doing them with jQuery.
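One thing to watch with a regular-expression approach is greediness. The sketch below uses a made-up page string with two script elements and compares a greedy pattern with a lazy one; it is only an illustration, not a general HTML parser.

```javascript
// A made-up page with one script in the head and one in the body.
var page = '<html><head><script src="a.js"></script></head>' +
    '<body><p>Quote</p><script>run();</script></body></html>';

// Greedy: matches from the first <script to the LAST </script>,
// so everything between the two script elements is lost as well.
var greedy = page.replace(/<script(.|\n)*<\/script>/g, '');

// Lazy: each match stops at the nearest </script>,
// so only the script elements themselves are removed.
var lazy = page.replace(/<script[\s\S]*?<\/script>/gi, '');
```

Here the greedy version also drops the closing head tag, the opening body tag and the paragraph, while the lazy version keeps them.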

How to read an attachment content into an array or String by clicking a button on XPage?

I have an XPage with a File Upload/Download control that shows my attachments. I need to read the content of the first file attachment (name not known/random) into a string variable or array by clicking a button.
I am not sure whether XMLHttpRequest can work on an XPage, or whether there is a standard XPages control to do that.
I just need to read the content. (Users don't need to interact with the attachment directly: no select/save/other UI actions.)

You need to clarify what "first" means: oldest, attached first, first in the alphabet? Domino doesn't guarantee a sequence. You can use @AttachmentNames in an evaluate statement. You can then use that name to access the attachment directly from your browser with a REST call using this syntax:
http(s)://[yourserver]/[application.nsf]/[viewname|0]/[UNID| ViewKey]/$File/[AttachmentName]?Open
More details are in this blog entry.
If you want to handle that on the server side then you use document.getAttachment().
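For the browser-side route, the URL can be assembled from its parts following the syntax shown above. This is only a sketch: the function name and all sample values (server, database, key, file name) are made up.

```javascript
// Assembles an attachment URL of the form
// server/application.nsf/view/key/$File/attachmentName?Open
function buildAttachmentUrl(server, nsfPath, view, key, attachmentName) {
  // the attachment name may contain spaces or other characters that need encoding
  return server + '/' + nsfPath + '/' + view + '/' + key +
      '/$File/' + encodeURIComponent(attachmentName) + '?Open';
}

// Made-up sample values; '0' is the view placeholder from the syntax above.
var url = buildAttachmentUrl('https://example.com', 'app.nsf', '0', 'ABC123', 'my file.txt');
```

The resulting URL could then be requested from client-side script, for example with XMLHttpRequest, to read the attachment's content.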

Working example:
importPackage(java.net);
importPackage(java.io);
var valString:String = "";
var nrt:NotesRichTextItem=document1.getDocument().getFirstItem('Body');
if (nrt!=null){
var eos:java.util.Vector = nrt.getEmbeddedObjects();
if (!eos.isEmpty()) {
var eo:NotesEmbeddedObject = eos.get(0);
var inputReader:BufferedReader = new BufferedReader(new InputStreamReader(eo.getInputStream(), "UTF-16"));
while ((inputLine = inputReader.readLine()) != null) {
valString+=inputLine + "<br>";
}
if (inputReader != null){inputReader.close();}
eo.recycle();
}
}
return valString;

Is it possible to access JavaScript data from the code-behind in a different page?

I am trying to export some data that I have in some Backbone collections to a csv file.
So far I am opening a new export page using JavaScript like so:
var href = [];
href.push('ExportAnalysis.aspx?');
href.push('ParamSet=' + this.document.analysisParameterSetView.selectedParamSet + '&');
href.push('Start=' + startDate + '&');
href.push('Finish=' + endDate + '&');
frames["exportIFrame"].location.href = href.join('');
And then in the code-behind of ExportAnalysis.aspx, I am grabbing the variables from the query string, getting the data, building up the CSV file and returning the file like so:
// Get the export parameters from the query string
var paramSet = Request["ParamSet"];
var startUnix = int.Parse(Request["Start"]);
var finishUnix = int.Parse(Request["Finish"]);
var start = DateTime.Parse("1970-01-01").AddSeconds(startUnix);
var finish = DateTime.Parse("1970-01-01").AddSeconds(finishUnix);
// GET DATA using Parameters
var filename = "analysisExport";
var content = "1,2";
Response.Clear();
Response.ContentType = "application/x-unknown";
Response.AddHeader("Content-Disposition", "attachment;filename=" + filename);
Response.Write(content);
Response.End();
}
This works OK, but it seems a little inefficient, as I am having to get the data I need twice. Once for the main page and again for the export page.
It's a bit of a long shot, but is it possible to get the data from the first page in the code-behind of the export page? If it were all client side I could use window.opener.document to get the opener page. Can I do something similar in ASP.NET?
Or am I completely off track, and there is a much better way to achieve this.

This only works if the protocol and domain match between the iframe and the main window.
All of the code below is JavaScript.
Iframe to the parent:
var pDoc = window.parent.document;
var pWin = window.parent.window;
Document to iframe:
var cDoc = document.getElementById("exportIFrame").contentDocument;
var cWin = document.getElementById("exportIFrame").contentWindow;
To call scripts on a parent:
pWin.yourFunction("parameter");
To call scripts in an iframe:
cWin.yourFunction("parameter");
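Applied to the export case, the page in the iframe could ask the parent for the data it already holds and build the CSV on the client. The sketch below is an assumption-heavy illustration: getExportRows is a hypothetical function the parent page would have to define, and the rows are assumed to be arrays of values.

```javascript
// Quotes a value if it contains a comma, a double quote or a line break.
function csvEscape(value) {
  var text = String(value);
  if (/[",\r\n]/.test(text)) {
    return '"' + text.replace(/"/g, '""') + '"';
  }
  return text;
}

// Turns an array of rows (each an array of values) into CSV text.
function rowsToCsv(rows) {
  return rows.map(function (row) {
    return row.map(csvEscape).join(',');
  }).join('\r\n');
}

// Browser-only part: runs in the iframe and only if the parent exposes the
// hypothetical getExportRows() function.
var csvFromParent = null;
if (typeof window !== 'undefined' && window.parent &&
    typeof window.parent.getExportRows === 'function') {
  csvFromParent = rowsToCsv(window.parent.getExportRows());
}

// The conversion itself can be tried with sample rows.
var csv = rowsToCsv([['a', 'b'], ['1,2', 'he said "hi"']]);
```

This keeps the data on the client, so it does not answer the code-behind part of the question; it only avoids fetching the data a second time.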

Open XML content in another window, using JavaScript

I understand I cannot save XML content to a local file because of security restrictions, but is there a way I can show the XML content in another browser window, such as
Window.Open(xmlString, . .. );
that would work the same as -
Window.Open(URL, . . .);
I cannot use server-side language.
I can use JavaScript / jQuery. (I already use them to create the XML.)
I can keep a template XML file next to my HTML. Is there a way to display the template file and change its content? This is almost the same as the questions "window.open: is it possible open a new window with modify its DOM" and "How to write JavaScript to a separate window?", but I need to change XML nodes, not HTML.
EDIT 1: try using myXmlWindow.document.write(xmlString)
=> I tried the suggested code -
var xmlString = xml2Str(xmlDocument);
myXmlWindow = window.open();
myXmlWindow.document.write(xmlString);
myXmlWindow.focus();
but it does not display the whole XML content, just the inner node values. The new window also still displays "Connecting...", as if it had not finished loading the content (missing close tag?).
Maybe I need to tell it that this is XML content and not HTML?
my xmlString :
<root><device1>Name</device1><device2/><device3><Temprature_1>23.5</Temprature_1><Temprature_2>23.4</Temprature_2><Temprature_3>23.4</Temprature_3><Temprature_4>23.3</Temprature_4><Temprature_5>23.2</Temprature_5></device3></root>
the displayed content:
Name23.523.423.423.323.2
EDIT 2: my code -
function xml2Str(xmlNode) {
try {
// Gecko- and Webkit-based browsers (Firefox, Chrome), Opera.
return (new XMLSerializer()).serializeToString(xmlNode);
}
catch (e) {
try {
// Internet Explorer.
return xmlNode.xml;
}
catch (e) {
//Other browsers without XML Serializer
// alert('Xmlserializer not supported');
return('Xmlserializer not supported');
}
}
return false;
}
function fShow_xml_in_win() {
var xmlDocument = $.parseXML("<root/>");
var dev1 = xmlDocument.createElement('device1');
var dev2 = xmlDocument.createElement('device2');
var dev3 = xmlDocument.createElement('device3');
dev1.appendChild(xmlDocument.createTextNode('Name'));
xmlDocument.documentElement.appendChild(dev1);
xmlDocument.documentElement.appendChild(dev2);
xmlDocument.documentElement.appendChild(dev3);
var i;
var xNode;
for (i = 0; i < 5; i++) {
xNode = xmlDocument.createElement('Temprature_' + (i+1));
xNode.appendChild(xmlDocument.createTextNode( "myVal " + ((i+1) * 10) ));
dev3.appendChild(xNode);
}
var xmlString = xml2Str(xmlDocument);
alert(xmlString);
xmlString = "<?xml version='1.0' ?>" + xmlString; // I do not know how to add this node using parseXML :(
alert(xmlString);
myXmlWindow = window.open();
myXmlWindow.document.write(xmlString);
myXmlWindow.document.close(); // !! EDIT 3
myXmlWindow.focus();
return false;
}
EDIT 3: solved the "connecting..." problem
I just needed to add myXmlWindow.document.close();

You can open a blank window and then write content to it as follows:
myWindow=window.open('','','width=200,height=100')
myWindow.document.write(xmlString);
myWindow.focus()
You may need to do some work to format your xmlString, but I think this approach will do what you want. If your xmlString is formatted, try adding:
<?xml version="1.0" ?>
to the start of your string.
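Because document.write treats the string as HTML, the XML tags themselves are not displayed, only their text content. One possible workaround, sketched here under the assumption that showing the XML as plain text is acceptable, is to escape the markup and write it inside a <pre> element. The function names are made up for illustration.

```javascript
// Escapes the characters that HTML would otherwise interpret as markup.
function escapeForHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Browser-only part: opens a window and shows the XML source as text.
// It is defined here but not called, so the sketch also loads outside a browser.
function showXmlAsText(xmlString) {
  var win = window.open('', '', 'width=400,height=300');
  win.document.write('<pre>' + escapeForHtml(xmlString) + '</pre>');
  win.document.close(); // tells the browser the document is complete
  win.focus();
}

var escaped = escapeForHtml('<root><device1>Name</device1></root>');
```

With this, the new window shows the tags literally instead of only the values "Name23.5...".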

My understanding of your post is as follows:
1. (From your first point)
You get the XML from somewhere that is not under your control. My suggestion: why not get it as JSON instead?
2. (From your second point)
If you create the XML yourself, why not write it to the new window through its reference?
For example:
var reference = window.open();
reference.document.write(<some string goes here>)
3. (From your third point)
As I understand from your second point, you can create the XML yourself. So why change it after writing it to the document?
Note: Generally, XML is used for server-to-server communication, while JSON is used for server-to-client (browser) communication.
