Need to compare contents of word template & actual PDF

I have two files (a Word document and a PDF) and need to compare them. The Word document is a template that defines how the PDF should be generated. Below are samples.
Word doc:
<firstname>,<LastName>
<ID>,<organization>
<salary>,<place>
Dear <firstname>,
you are working in the department of <organization> and we are really honored to have you here. Expecting many more successful years of service from you.
Thanks,
Actual PDF:
John, Kennedy
234,google
USD1245,CA
Dear John,
you are working in the department of google and we are really honored to have you here. Expecting many more successful years of service from you.
Thanks,
Can someone help with the comparison logic to validate that both the static and the dynamic content are generated as expected?
We are using TestComplete with JavaScript for the automation.

If you know what is in the template, you could start by using a regex to split the first few lines, which contain all the variables. Then read the template, replace every variable with the values taken from the beginning of the PDF, and compare the result with the PDF text: if they match, the PDF was built from the template.
Regex splitting example: read the Word file, split it into lines, take the first 3 lines, split each of them on ",", and assign the variables from the corresponding indexes.
As I said, this only works if you know the content of the template.
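A minimal sketch of that approach in plain JavaScript. It assumes the Word and PDF text have already been extracted to strings (how to do that in TestComplete is not shown), that the first three lines hold comma-separated `<placeholder>` names, and that whitespace around commas and line breaks should be ignored. The function names are hypothetical.

```javascript
// Hypothetical helpers: fill a template's <placeholder> tokens with values read
// from the PDF's header lines, then compare the result with the PDF text.

// Replace every <name> token with values[name]; unknown tokens are left untouched.
function fillTemplate(template, values) {
  return template.replace(/<(\w+)>/g, (token, name) =>
    Object.prototype.hasOwnProperty.call(values, name) ? values[name] : token
  );
}

// Pair the placeholder names in the first `headerLineCount` template lines with
// the comma-separated values at the same positions in the PDF text.
function extractValues(templateText, pdfText, headerLineCount) {
  const templateLines = templateText.split(/\r?\n/).slice(0, headerLineCount);
  const pdfLines = pdfText.split(/\r?\n/).slice(0, headerLineCount);
  const values = {};
  templateLines.forEach((line, i) => {
    const names = line.split(",").map((s) => s.trim().replace(/^<|>$/g, ""));
    const parts = (pdfLines[i] || "").split(",").map((s) => s.trim());
    names.forEach((name, j) => {
      if (parts[j] !== undefined) values[name] = parts[j];
    });
  });
  return values;
}

// Assumption: spacing around commas and line breaks is not significant.
function normalize(text) {
  return text.replace(/\s*,\s*/g, ",").replace(/\s+/g, " ").trim();
}

// True when the template, filled with the PDF's own header values, equals the PDF text.
function matchesTemplate(templateText, pdfText, headerLineCount = 3) {
  const values = extractValues(templateText, pdfText, headerLineCount);
  return normalize(fillTemplate(templateText, values)) === normalize(pdfText);
}
```

Because the values are read from the PDF itself, this checks the static text and that variables reused in the body (such as `<organization>`) agree with the header; checking the header values against expected test data would need a separate source of truth.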

HTML/Javascript How to make "translator"

How would I make a translator that translates text entered by the user?
After the user enters the text to be translated, it would be stored in a variable. Then the program would look in a text file (english.txt) for the English version of what was entered.
1. Hello
2. There
It would then note which line the English word is on and open another text file (language.txt) containing the translations.
1. Test
2. Pending
Using that line number, it would replace the word with the corresponding line.
So, for example, if the input is Hello There, it would be translated to Test Pending.
How would I go about doing this in HTML/Javascript?
The files would be up to about 50,000 lines long.

You're asking a very open question, and as a result there are a million ways of tackling this problem. Generally, though, you'd want to store these translations in a database and query them through a web API.
You should also realise that a word-by-word translation will not give you very good results, since it doesn't take context or synonyms into account.
For example: "bat" could refer to an animal, a wooden implement with a handle and a solid surface, or the action of batting away something.
But if you do want to implement this with as few different technologies as possible (relying on static files), you could do something like the following.
Convert your two input files to a single JSON file, with a format like:
{
  "Hello": "Test",
  "There": "Pending"
}
When the user enters text to translate, split the input on spaces using string.split(" "), which returns an array of keys.
The next step is to look up each key in your JSON object using json[key].
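A sketch of that lookup in JavaScript, assuming english.txt and language.txt have already been loaded as strings with one entry per line (without the "1." numbering) and that line N of one file translates line N of the other. It uses a `Map` rather than a plain object, which avoids accidental hits on built-in property names such as "constructor"; the case-insensitive matching is an added assumption, and the function names are hypothetical.

```javascript
// Build a word -> translation map from two line-aligned texts.
function buildDictionary(englishText, languageText) {
  const english = englishText.split(/\r?\n/);
  const translated = languageText.split(/\r?\n/);
  const dict = new Map();
  english.forEach((word, i) => {
    const key = word.trim();
    if (key !== "" && translated[i] !== undefined) {
      dict.set(key.toLowerCase(), translated[i].trim());
    }
  });
  return dict;
}

// Translate word by word; words missing from the dictionary are kept unchanged.
function translate(input, dict) {
  return input
    .split(/\s+/)
    .filter((w) => w !== "")
    .map((w) => (dict.has(w.toLowerCase()) ? dict.get(w.toLowerCase()) : w))
    .join(" ");
}
```

For example, `translate("Hello There", buildDictionary("Hello\nThere", "Test\nPending"))` returns "Test Pending". A `Map` lookup stays fast at 50,000 entries, so building it once at page load is enough.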

How to regex Zapier and get output?

I regularly receive emails from the same person, each containing one or more unique identifying codes. I need to get those codes.
The email body contains a host of inconsistent email content, but it is the strings I am interested in. They look like...
loYm9vYzE6Z-aaj5lL_Og539wFer0KfD
FuZTFvYzE68y8-t4UgBT9npHLTGmVAor
JpZDRwYzE6dgyo1legz9sqpVy_F21nx8
ZzZ3RwYzE63P3UwX2ANPI-c4PMo7bFmj
What the strings seem to have in common is that they are all 32 characters long and composed of a mixture of uppercase and lowercase letters, numbers and symbols. But a given email may contain none, one or several of them, and the strings will be in unpredictable positions, not on adjacent lines as above.
I wish to make a Zap workflow in Zapier, the linking tool for web services, to find these strings and use them in another app, i.e. whenever a string is found, create a new Trello card.
I have already started the workflow with Zapier's "Gmail" integration as a "trigger", specifically a search using the "from:" field corresponding to the regular sender. That's the easy part.
But the actual parsing of the email body is foxing me. Zapier has a rudimentary email parser, but it is not suitable for this task. What is suitable is using Zapier's own "Code" integration to execute freeform code - namely, a regular expression to identify those strings.
I have never done this before and am struggling to write working code. Zapier Code can run either Python or JavaScript (each has its own documentation). It supports the data variables "input_data" (Python) or "inputData" (JavaScript) and "output" (both).
Below, you can see how I map the Gmail body into "body" for parsing...
I need to use the Code box to construct a regular expression that finds each unique identifier string and outputs it as input to the next integration in the workflow, i.e. Trello.
For info, in the above screengrab, the existing "hello world" code in the box is Zapier's own test code. The fields "id" and "hello" are made available to the next workflow app in the chain.
But I need to run my process for every string found in an email body, i.e. if an email contains just one code, create one Trello card; but if an email contains four codes, create a Trello card for each of the four.
That is, there could be multiple outputs. I have no idea how this could work, since I think these workflows are only supposed to accommodate one action.
I could use some help getting over the hill. Thank you.

David here, from the Zapier Platform team.
I'm glad you're showing interest in the code step. Assuming your assumption (exactly 32 characters) always holds, this should be fairly straightforward.
First off, the regex. We want to match characters that are letters, numbers, or punctuation. Luckily, JavaScript's \w is equivalent to [A-Za-z0-9_], which covers every character in your examples except the -, which we'll include manually. We want strings of exactly 32 characters, so we'll ask for that, and we add the global flag so we find all matches, not just the first. That gives us:
/[\w-]{32}/g
You've already mapped the body in, so that's covered. The JavaScript code will be as follows:
// stores an array of any length (0 or more) with the matches
var matches = inputData.body.match(/[\w-]{32}/g)
// the .map function executes the nameless inner function once for each
// element of the array and returns a new array with the results
// [{str: 'loYm9vYzE6Z-aaj5lL_Og539wFer0KfD'}, ...]
return (matches || []).map(function (m) { return {str: m} })
Here, you'll be taking advantage of an undocumented feature of code steps: when you return an array of objects, subsequent steps are executed once for each object. If you return an empty array (which is what'll happen if no keys are found), the zap halts and nothing else happens. When you're testing, there'll be no indicator that anything besides the first result does anything. Once your zap is on and runs for real though, it'll fan out as described here.
That's all it takes! Hopefully that all makes sense. Let me know if you've got any other questions!
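To try the regex outside Zapier (where `inputData` is not available), the same logic can be wrapped in a plain function. The sketch below also adds lookarounds, which are an assumption not in the answer above: without them, `/[\w-]{32}/g` would also pull a 32-character slice out of any longer run of word characters, such as a long URL segment. Lookbehind needs a reasonably recent JavaScript engine (e.g. Node.js 10+), and `extractCodes` is a hypothetical name.

```javascript
// Hypothetical standalone version of the Zapier code step, for local testing.
// Returns one { str } object per code, the same shape the code step returns.
function extractCodes(body) {
  // [\w-]{32}: exactly 32 letters, digits, underscores or hyphens.
  // (?<![\w-]) and (?![\w-]): the match must not be part of a longer token.
  const matches = (body || "").match(/(?<![\w-])[\w-]{32}(?![\w-])/g);
  return (matches || []).map((m) => ({ str: m }));
}
```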

How to scrape javascript table in R?

I want to scrape a table from the Citi Bike trip data page: https://s3.amazonaws.com/tripdata/index.html
My goal is to get the URLs of all the zip files at once, instead of manually typing every date and downloading one file at a time. Since the page is updated monthly, I want to get all the up-to-date data files every time I run the function.
I first tried the rvest and XML packages and then realized that the page contains both HTML and a table generated by a JavaScript function. That's where the problem was.
I'd really appreciate any help; please let me know if I can provide further information.

If I go to https://s3.amazonaws.com/tripdata/ (just the root, without index.html), I get a simple XML file. The relevant element is Key (uppercase K, lowercase e and y) if you want to parse the XML, but I would just search the plain text: ignore the XML structure, treat it as a simple text file, take every string between <Key> and </Key> as a file name, and prefix it with https://s3.amazonaws.com/tripdata/ to download it.
It seems the first entry contains all the data together (170 MB), so that alone might be enough for you.
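The question is about R, but the plain-text idea carries over to any language; here is a JavaScript sketch that takes the listing XML as a string and returns full download URLs. `extractFileUrls` is a hypothetical name. One caveat based on how S3 bucket listings generally behave: a single listing response returns at most 1,000 keys, and `<IsTruncated>true</IsTruncated>` means more keys exist, so a large bucket may need additional requests.

```javascript
// Plain-text approach: take every <Key>...</Key> value from the S3 listing
// and prefix the bucket URL to get a downloadable file URL.
const BASE_URL = "https://s3.amazonaws.com/tripdata/";

function extractFileUrls(listingXml) {
  const urls = [];
  const keyPattern = /<Key>([^<]+)<\/Key>/g;
  let m;
  while ((m = keyPattern.exec(listingXml)) !== null) {
    urls.push(BASE_URL + m[1]);
  }
  return urls;
}
```

In Node.js 18+, where `fetch` is built in, `fetch(BASE_URL).then((r) => r.text()).then(extractFileUrls)` would produce the current list each time it runs.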

Extract document extensions from clicks

I'm using this technique to extract the click events on my SharePoint site. It uses jQuery and a regular expression to capture clicks and report them as events to Google Analytics.
I'm also only just past total newbie with regex. It is starting to make some sense to me, but I still have a lot to learn. So here goes.
I have a preapproved list of filetypes that I am interested in based on the site listed above.
var filetypes = /\.(zip|pdf|doc.*|xls.*|ppt.*|mp3|txt|wma|mov|avi|wmv|flv|wav|jpg)$/i;
But it isn't quite working like I need. With the $ I assume it is trying to match to the end of the line. But often in SharePoint we get links like this:
example.org/sharepoint/_layouts/15/wopiframe.aspx?sourcedoc=/sharepointlibrary/the%20document%20name.docx&action=default&defaultitemopen=1
The two problems I have are that I can't count on the file name coming before the query string or hash, and I can't count on it being at the end. I also need to handle all the different Microsoft Office extensions.
I found this thread on extracting extensions, but it doesn't seem to work correctly.
I've put together this version:
var filetypes = /\.(zip|pdf|doc|xls|ppt|mp3|txt|wma|mov|avi|wmv|flv|wav|jpg)[A-Za-z]*/i;
I changed the office bits from doc.* to just plain doc and added the optional alpha character afterwards. And removed the $ end anchor. It seems to be working with my test sample, but I don't know if there are gotchas that I don't understand.
Does this seem like a good solution, or is there a better way to match a predetermined list of extensions (including, for example, Office variants like doc, docx and docm) that appear either before the query string or inside one of its parameters?

I would go with the following, which matches the file name and extension:
/[^/]+\.(zip|pdf|doc[xm]?|xlsx?|pptx?|mp3|txt|wma|mov|avi|wmv|flv|wav|jpg)/i
This outputs the%20document%20name.docx from your example.
There may be formats it doesn't handle, but it should get you what you want.
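Building on that answer, a sketch that also stops the file name at ?, &, = or # (so a name inside a query parameter such as sourcedoc= is isolated), requires the extension to end at the end of the URL or at ?, & or #, and accepts the x/m variants of doc, xls and ppt. The exact extension list and the `findDocument` helper are assumptions for illustration.

```javascript
// Find the first file name with an approved extension anywhere in a URL,
// including inside query-string parameters such as ?sourcedoc=...
const FILE_PATTERN =
  /([^/?&=#]+)\.(zip|pdf|doc[xm]?|xls[xm]?|ppt[xm]?|mp3|txt|wma|mov|avi|wmv|flv|wav|jpg)(?=$|[?&#])/i;

function findDocument(url) {
  const m = url.match(FILE_PATTERN);
  if (!m) return null;
  let fileName = m[1] + "." + m[2];
  try {
    fileName = decodeURIComponent(fileName); // "%20" -> " "
  } catch (e) {
    // malformed %-escapes: keep the raw name
  }
  return { fileName, extension: m[2].toLowerCase() };
}
```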

Need to write a CSV file through code, but need to know what the client's CSV delimiter is

In my web application I have data that I want to offer the user as a CSV file. I used a comma as the delimiter, but when I open the file in Excel it shows all the data for each line in a single field, meaning that my computer doesn't use a comma as its delimiter. However, if I change the delimiter to a semicolon or tab, it might not work well on other PCs.
So the question is: how do I find out which delimiter is correct for the client (in JavaScript) and use it to generate the CSV file?
Thanks in advance!
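As far as I know, no standard browser API exposes the user's Excel list separator, so the sketch below is only a heuristic and an assumption on my part: in locales whose decimal separator is a comma (e.g. de-DE), Excel commonly expects a semicolon, so `guessDelimiter` returns ";" there and "," otherwise. `toCsv` quotes fields in the RFC 4180 style so that delimiters, quotes and line breaks inside values stay safe. Both function names are hypothetical, and letting the user choose the delimiter remains the more reliable fallback.

```javascript
// Heuristic (assumption): comma-decimal locales usually use ";" as the list separator.
function guessDelimiter(locale) {
  const parts = new Intl.NumberFormat(locale).formatToParts(1.5);
  const decimal = parts.find((p) => p.type === "decimal");
  return decimal && decimal.value === "," ? ";" : ",";
}

// Quote a field if it contains the delimiter, a double quote, CR or LF;
// embedded quotes are doubled.
function escapeField(value, delimiter) {
  const text = value === null || value === undefined ? "" : String(value);
  const needsQuotes = text.includes(delimiter) || /["\r\n]/.test(text);
  return needsQuotes ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Join rows (arrays of values) into CSV text with CRLF line endings.
function toCsv(rows, delimiter) {
  return rows
    .map((row) => row.map((v) => escapeField(v, delimiter)).join(delimiter))
    .join("\r\n");
}
```

In the browser this could be called as `toCsv(rows, guessDelimiter(navigator.language))`.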
