How would I make a translator that translates text entered by the user?
After the user enters the text they want translated, it would be stored in a variable. The program would then look up each English word in a text file (english.txt).
1. Hello
2. There
It would then tell the program what line the English word is on and then open another text file (language.txt) with the translations.
1. Test
2. Pending
Using that line number, it would replace the word with the corresponding line from language.txt.
So, for example, if the input is Hello There, it would be translated to Test Pending.
How would I go about doing this in HTML/JavaScript?
** The files would be somewhere around 50,000 lines
You're asking a very open question, and as a result there are a million ways of tackling this problem. Generally, though, I'd say you'd want to store these translations in a database and query them through a web API.
You should also realise that a word-by-word translation will not give you very good results, since you're not taking context or synonyms into account.
For example: "bat" could refer to an animal, a wooden implement with a handle and a solid surface, or the action of batting something away.
But if you do want to implement this with as few different technologies as possible (and rely on static files), then you could do something like the following...
Convert your two input files to a single JSON file, with a format like:
{
"Hello": "Test",
"There": "Pending"
}
When the user enters text to translate, split it on spaces using string.split(" "), which returns an array of keys.
The next step is to look up each key in your parsed JSON object using json[key].
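The steps above can be sketched roughly as follows. This is a minimal illustration, not a full implementation: the `dictionary` object stands in for the parsed JSON file, and the `translate` function name is made up for the example. Words that are not in the dictionary are left unchanged, which is an assumption about the desired behaviour.

```javascript
// Hypothetical dictionary; in practice this would come from the parsed JSON file.
const dictionary = {
  Hello: "Test",
  There: "Pending",
};

// Translate word by word; words missing from the dictionary are left as they are.
function translate(text, dict) {
  return text
    .split(" ")
    .map((word) =>
      Object.prototype.hasOwnProperty.call(dict, word) ? dict[word] : word
    )
    .join(" ");
}

console.log(translate("Hello There", dictionary)); // Test Pending
```

The lookup is case-sensitive and does not strip punctuation, so "hello" or "Hello," would not match without extra normalisation.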
I have 2 files (Word and PDF) and need to compare them. The Word document is the template that defines how the PDF should be generated. Below are the samples.
Word doc:
<firstname>,<LastName>
<ID>,<organization>
<salary>,<place>
Dear <firstname>,
you are working in the department of <organization> and we are really honored to have you here. Expecting many more successful years of service from you.
Thanks,
Actual PDF:
John, Kennedy
234,google
USD1245,CA
Dear John,
you are working in the department of google and we are really honored to have you here. Expecting many more successful years of service from you.
Thanks,
Can someone help with the comparison logic to validate that both the static and the dynamic content are generated as expected?
We are using TestComplete with JavaScript for the automation.
If you know what is in the template, you could start by splitting the first few lines, which contain all the variables. Then read the template, replace every variable with the value taken from the beginning of the PDF, and if the resulting text matches the text in the PDF, you know the PDF was built from the template.
Splitting example: read the Word file, split it into lines, take the first 3 lines, split each of them on ",", and assign the variables to the values at the corresponding indexes in the PDF.
As I said, this works only if you know the content of the template.
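A rough sketch of that idea in plain JavaScript is below. It assumes the text of both documents has already been extracted into strings (how to do that in TestComplete is not covered here), that the first three lines of the template hold only comma-separated placeholders, and that line breaks are the same in both texts. The function name `matchesTemplate` and the `headerLines` parameter are invented for the example.

```javascript
// Returns true when the PDF text equals the template with its placeholders
// replaced by the values found in the first `headerLines` lines of the PDF.
function matchesTemplate(templateText, pdfText, headerLines = 3) {
  const templateLines = templateText.split("\n");
  const pdfLines = pdfText.split("\n");

  // Pair each placeholder in the header with the value at the same index in the PDF.
  const values = new Map();
  for (let i = 0; i < headerLines; i++) {
    const names = (templateLines[i] || "").split(",").map((s) => s.trim());
    const found = (pdfLines[i] || "").split(",").map((s) => s.trim());
    if (names.length !== found.length) return false;
    names.forEach((name, j) => values.set(name, found[j]));
  }

  // Substitute the values into the rest of the template and compare line by line.
  const expectedBody = templateLines.slice(headerLines).map((line) =>
    line.replace(/<[^<>]+>/g, (placeholder) =>
      values.has(placeholder) ? values.get(placeholder) : placeholder
    )
  );
  const actualBody = pdfLines.slice(headerLines);
  return (
    expectedBody.length === actualBody.length &&
    expectedBody.every((line, i) => line.trim() === actualBody[i].trim())
  );
}
```

Only the body after the header is compared, because the header values are taken from the PDF itself. Placeholder names are matched case-sensitively, so `<firstname>` and `<FirstName>` would count as different variables.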
I regularly receive emails from the same person, each containing one or more unique identifying codes. I need to get those codes.
The email body contains a host of inconsistent email content, but it is the strings I am interested in. They look like...
loYm9vYzE6Z-aaj5lL_Og539wFer0KfD
FuZTFvYzE68y8-t4UgBT9npHLTGmVAor
JpZDRwYzE6dgyo1legz9sqpVy_F21nx8
ZzZ3RwYzE63P3UwX2ANPI-c4PMo7bFmj
What the strings seem to have in common is that they are all 32 characters long and composed of a mixture of uppercase letters, lowercase letters, numbers and symbols. But a given email may contain none, one or several of them, and the strings will be in unpredictable positions, not on adjacent lines as above.
I wish to make a Zap workflow in Zapier, the linking tool for web services, to find these strings and use them in another app - ie. whenever a string is found, create a new Trello card.
I have already started the workflow with Zapier's "Gmail" integration as a "trigger", specifically a search using the "from:" field corresponding to the regular sender. That's the easy part.
But the actual parsing of the email body is foxing me. Zapier has a rudimentary email parser, but it is not suitable for this task. What is suitable is using Zapier's own "Code" integration to execute freeform code - namely, a regular expression to identify those strings.
I have never done this before and am struggling to formulate working code. Zapier Code can take either Python (documentation) or Javascript (documentation). It supports data variables "input_data" (Python) or "inputData" (Javascript) and "output" (both).
See, below, how I insert the Gmail body in to "body" for parsing...
I need to use the Code box to construct a regular expression to find each unique identifier string and output it as input to the next integration in the workflow, ie. Trello.
For info, in the above screengrab, the existing "hello world" code in the box is Zapier's own test code. The fields "id" and "hello" are made available to the next workflow app in the chain.
But I need to do my process for all of the strings found within an email body - ie. if an email contains just one code, create one Trello card; but if an email contains four codes, create a Trello card for each of the four.
That is, there could be multiple outputs. I have no idea how this could work, since I think these workflows are only supposed to accommodate one action.
I could use some help getting over the hill. Thank-you.
David here, from the Zapier Platform team.
I'm glad you're showing interest in the code step. Assuming your assumption (exactly 32 characters) always holds, this should be fairly straightforward.
First off, the regex. We want to look for a character that's a letter, number, or punctuation. Luckily, javascript's \w is equivalent to [A-Z0-9a-z_], which covers the bases in all of your examples besides the -, which we'll include manually. Finally, we want exactly 32 character length strings, so we'll ask for that. We also want to add the global flag, so we find all matches, not just the first. So we have the following:
/[\w-]{32}/g
You've already covered mapping the body in, so that's good. The javascript code will be as follows:
// stores an array of any length (0 or more) with the matches
var matches = inputData.body.match(/[\w-]{32}/g)
// the .map function executes the nameless inner function once for each
// element of the array and returns a new array with the results
// [{str: 'loYm9vYzE6Z-aaj5lL_Og539wFer0KfD'}, ...]
return (matches || []).map(function (m) { return {str: m} })
Here, you'll be taking advantage of an undocumented feature of code steps: when you return an array of objects, subsequent steps are executed once for each object. If you return an empty array (which is what'll happen if no keys are found), the zap halts and nothing else happens. When you're testing, there'll be no indicator that anything besides the first result does anything. Once your zap is on and runs for real though, it'll fan out as described here.
That's all it takes! Hopefully that all makes sense. Let me know if you've got any other questions!
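To try the pattern outside Zapier, a small standalone check along these lines may help. The `extractCodes` wrapper and the sample email text are made up for illustration; inside a code step the body would come from `inputData.body` as shown above.

```javascript
// Wraps the same match-and-map logic in a function so it can be run anywhere.
function extractCodes(body) {
  return (body.match(/[\w-]{32}/g) || []).map(function (m) {
    return { str: m };
  });
}

// Hypothetical email body containing two codes in arbitrary positions.
var sample =
  "Hi,\nyour code is loYm9vYzE6Z-aaj5lL_Og539wFer0KfD and also\n" +
  "FuZTFvYzE68y8-t4UgBT9npHLTGmVAor. Thanks";

console.log(extractCodes(sample).length); // 2
console.log(extractCodes("no codes here").length); // 0
```

One caveat: the pattern matches any run of 32 letters, digits, underscores or hyphens, so a longer token (for example part of a long URL) would also produce a 32-character match. If that turns up in real emails, the pattern would need tightening.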
I want to use the cmudict file on a website. It contains 170000 words with their phonetic transcriptions (in ARPAbet symbols).
http://www.speech.cs.cmu.edu/cgi-bin/cmudict
I want to use it in JSON format, look up any word entered by the user and return an explanation of how to pronounce it syllable by syllable. The second part is not very complex in search terms, as there are only 39 different phonemes, but the first one, with its 170000 entries, may take too much time if the user enters a whole text to transcribe instead of a single word.
I wonder if it's worth splitting the JSON into, for example, 26 parts (one per initial letter) and searching only in the corresponding file.
Also, I don't know if JSON is the best format for this, but I want to use it on a free blog such as Tumblr or Blogger (or similar; the point is that I don't want to spend money on this), and JavaScript is what they support. I would welcome suggestions on this too.
Well, that is a tough call, since you must consider download size. I would shorten the names of all your properties to be as small as possible, so instead of repeating "description" : "the short description", I would go with "sd" : "the short description". You are trying to use JavaScript to serve a data file, which is okay since you can rely on caching and so on, but the initial download may be rather large. I would put something like var myDictionary = { }; at the top of the file; that way you can reference the variable, since it is in the global scope. It is an interesting experiment for sure.
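As a rough sketch of how the lookup itself could work, assuming the dictionary is loaded as one global object keyed by the uppercase word: the two entries below are sample entries in cmudict style, and the `transcribe` function is invented for the example. Property lookups on an object are typically fast regardless of the number of keys, so transcribing a whole text is mostly a matter of one lookup per word, and splitting by initial letter would mainly help download size rather than search time.

```javascript
// Sample entries only; the real file would hold the full dictionary.
var myDictionary = {
  HELLO: "HH AH0 L OW1",
  WORLD: "W ER1 L D",
};

// Look up every word of a text; unknown words get phonemes: null.
function transcribe(text) {
  return text
    .toUpperCase()
    .split(/[^A-Z']+/)
    .filter(Boolean)
    .map(function (word) {
      var found = Object.prototype.hasOwnProperty.call(myDictionary, word);
      return {
        word: word,
        phonemes: found ? myDictionary[word].split(" ") : null,
      };
    });
}

console.log(transcribe("Hello, world!"));
```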
I'm trying to create a search bar where users can type in a query, which will then be used to search through my JSON file and retrieve the matching contents.
Examples:
1- User input: "15 inch touchscreen"
Match with: "15-inch", "15", "Touchmonitor", "1537L", "Stand for 1501L-1601L"
2- User input: "3243 ids"
Match with: "3243L", "IDS"
Basically going for a full blown search function - so obviously speed will be a factor.
Questions:
Is there any way to handle partial matches like that in JavaScript or jQuery?
Would it be faster to load all products client side on page load and then search through them later, or to search the JSON file at the time of the query?
I'm dealing with a JSON file of around 5000 lines, so about 200KB.
Hard to say without seeing your code.
The preferred way is to do the heavy calculations and logic on the server side, in case your data and logic are expected to grow in size and complexity.
So I'd definitely suggest looking into a product like Elastic: https://www.elastic.co/
However, if that file is constant (not expected to grow), you could definitely implement this using plain JavaScript on the client.
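For the client-side route, a very simple partial-match search could look something like the sketch below. The product list, the `name` field and the scoring rule (count how many query words appear as substrings) are all assumptions made for the example, not something taken from the question's data.

```javascript
// Hypothetical product data; the real list would come from the JSON file.
const products = [
  { name: "1537L 15-inch Touchmonitor" },
  { name: "Stand for 1501L-1601L" },
  { name: "3243L IDS" },
];

// Rank items by how many query words occur somewhere in their name.
function search(query, items) {
  const tokens = query.toLowerCase().split(/\s+/).filter(Boolean);
  return items
    .map((item) => {
      const haystack = item.name.toLowerCase();
      const score = tokens.filter((t) => haystack.includes(t)).length;
      return { item, score };
    })
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((r) => r.item.name);
}

console.log(search("3243 ids", products)); // [ '3243L IDS' ]
```

Plain substring matching will not relate "touchscreen" to "Touchmonitor"; that kind of fuzzy match would need something extra, such as prefix matching or a synonym list.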
I have a web application where a client side editor is editing a really really large text which is known on the server side.
The client can make any kind of modifications to this text.
What is the most network-efficient way to transmit the result difference in a way that the server understands? Also, since this will happen on client side (Javascript), I would also like it to be 'fast' (or at least not noticeably slow)
Some scenarios:
User modifies ONE character
User modifies several sentences in random positions
User erases everything and results in a blank text.
I cannot use a diff-like syntax since it's not network efficient: it compares lines, so examples 1 and 3 will produce horrible differences (especially the last one, where the diff will be larger than the old text itself).
Does anyone have experience in this matter? The user operates on a really large set of data, around 3-5MB of text, and uploading the whole "new" content is a big no-no.
To be clear, I'm looking for a "protocol" of transfer, string comparison is not the issue.
I'm not very familiar with this topic but I can point you to an open source (Apache License 2.0) project which may be very useful.
It is a Diff, Match and Patch library written in several languages, including JavaScript, from a Google engineer and it is used in several online collaborative editing services.
Here is a list of resources:
The Diff, Match and Patch project
The MobWrite project (Editor implementation based on the above project)
"Differential Synchronization" (A Google Tech Talk by the engineer)
A simple approach, assuming that you know the copy on the server isn't going to change, would just be to send a list of edits (deletions and additions), with the deletions represented as a start and end index, and the additions represented as a start index and the text to insert.
If you have more than a simple diff algorithm to work with (I'm not sure exactly what you mean by "string comparison is not the issue"), you could also detect moved or copied chunks of text, and send those as the start and end index of the moved or copied piece of text, as well as the destination to insert it.
Note that you'll need to make sure to keep track of whether your indices refer to the original document, or the document as edited so far. An easy approach to avoid this problem is to always perform the edits from the end of the document towards the beginning; then earlier edits won't affect the offsets specified by later edits.
For an example of an approach like this, see the ed format that diff -e outputs. This is basically input that could be fed into the ed line-oriented text editor. If you want the absolute smallest diffs to send across you may want to do character based indexing rather than line based indexing, but the same basic approach could work.
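A minimal sketch of applying such an edit list on the receiving side is shown below. It assumes character-based offsets that all refer to the original document, edits that do not overlap, and a combined edit shape `{ start, end, text }` (a pure deletion has empty `text`, a pure insertion has `start === end`). The names are invented for the example.

```javascript
// Applies edits from the end of the document towards the beginning, so that
// offsets of edits closer to the start are not shifted by ones already applied.
function applyEdits(original, edits) {
  const ordered = edits.slice().sort((a, b) => b.start - a.start);
  let result = original;
  for (const { start, end, text } of ordered) {
    result = result.slice(0, start) + text + result.slice(end);
  }
  return result;
}

const edited = applyEdits("The quick brown fox", [
  { start: 4, end: 9, text: "slow" },  // replace "quick"
  { start: 16, end: 19, text: "dog" }, // replace "fox"
]);
console.log(edited); // The slow brown dog
```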
Any edits the user's performing can be efficiently broken down into: delete from X for length Y; insert at X text "whatever". X and Y are offsets in characters from the start of the text; Y is a number of characters; "whatever" is any string of characters. You say you need no help computing the diff, but an example is here, except it's richer in its output than you need, but does identify "removals and insertions", so, just change the output part.
The exact format in which you send the data to the server can be tuned, but I don't think there's much mileage in doing that -- pending measurement, I'd start by sending the commands as D for delete or I for insert, the numbers in decimal, the inserted string in quoted form. Once you have some statistics on actual transfers being performed, you can see how much overhead is in the numbers (decimal vs binary) and quotes, but I suspect that may not be all that meaningful (if it proves to be, there are all sort of things you can try, such as giving offsets from the latest point of insertion or deletion, rather than always from the start, to make things faster).
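As one possible illustration of such a format, the sketch below produces `D offset length` and `I offset "text"` commands for a single changed span, found by trimming the common prefix and suffix of the old and new text. The exact command syntax, the use of `JSON.stringify` for quoting and the function name are assumptions for the example, and the single-span comparison is a simplification rather than a real diff algorithm.

```javascript
// Describes the change from oldText to newText as at most one delete
// and one insert, both at the end of the common prefix.
function diffToCommands(oldText, newText) {
  const maxPrefix = Math.min(oldText.length, newText.length);
  let prefix = 0;
  while (prefix < maxPrefix && oldText[prefix] === newText[prefix]) prefix++;

  // The common suffix must not overlap the common prefix.
  const maxSuffix = maxPrefix - prefix;
  let suffix = 0;
  while (
    suffix < maxSuffix &&
    oldText[oldText.length - 1 - suffix] === newText[newText.length - 1 - suffix]
  ) {
    suffix++;
  }

  const deleted = oldText.length - prefix - suffix;
  const inserted = newText.slice(prefix, newText.length - suffix);
  const commands = [];
  if (deleted > 0) commands.push("D " + prefix + " " + deleted);
  if (inserted.length > 0) commands.push("I " + prefix + " " + JSON.stringify(inserted));
  return commands;
}

console.log(diffToCommands("hello world", "hello brave world")); // [ 'I 6 "brave "' ]
```

With this shape, changing one character or erasing the whole text both produce a command of a few bytes. Several edits scattered across the document collapse into one span covering all of them, so for that case a proper diff library would give smaller output. Offsets here count UTF-16 code units, which is how JavaScript indexes strings.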
You can sample what the user is doing every few seconds, and just send the incremental changes over those last few seconds (if any) -- this way, each packet you're sending will be small, and if the net connection or the user's computer/browser crash, the user won't have lost much work.
You could just send changes every 500ms, so, whatever changes were made in the last 500ms would be sent, but you only send data when there was a change.
In that case you could send the position of the changed word(s) along with the entire word, with the position measured from the front of the text.
A single update won't contain several sentences' worth of changes, but it may involve several words; if you send them in the order they were changed, the result should be consistent.
Because there are so many ways to do edits--even within short periods of time like 500ms--including dragging and dropping, or cutting and pasting, large sections of text around within the document or from outside it--I don't know if there's going to be something that will cover all scenarios really well. This is certainly a non-answer to your question at face value, but I would consider carefully the trouble of developing and maintaining something like this compared to changing the interface to restrict the text size and breaking existing texts into smaller pieces.
Maybe that's not possible in your situation, but if it is, I would guess it would be much less trouble in the end to dodge the issue in this way and just send full documents after an edit.