I am using ExtendScript to build some invoices from downloaded plain-text emails (.txt).
At points in the file there are lines of text such as "Order Number: 123456", after which the line ends. I have a script, assembled from parts I found on this site, that finds the end of "Order Number:" to get the starting position of a substring. I want to use the position where the Return key was pressed (the line break) as the second index to finish the substring. To do this, I have another piece of script, also from the helpful people of this site, that builds an array of the indexes of every instance of a character. I will then use the first entry in that array that is greater than the starting index.
It's a bit convoluted, but I'm not great with JavaScript yet, and if there is an easier way, I don't know it.
What character do I need to use to represent the Return key (a line break) in a .txt file, in JavaScript for ExtendScript in InDesign?
Thank you.
I have tried things like \n, \r\n and ^p, both with and without quotes around them, but none of them show up in the array.
//Load Email as String
var b = new File("~/Desktop/Test/email.txt");
b.open('r');
var str = "";
while (!b.eof)
    str += b.readln();
b.close();
var orderNumberLocation = str.search("Order Number: ") + 14;
var orderNumber = str.substring(orderNumberLocation /*, ARRAY NUMBER GOES HERE */);
var loc = orderNumberLocation.lineNumber;
function indexes(source, find) {
    var result = [];
    for (var i = 0; i < source.length; ++i) {
        // If you want to search case insensitive use
        // if (source.substring(i, i + find.length).toLowerCase() == find) {
        if (source.substring(i, i + find.length) == find) {
            result.push(i);
        }
    }
    alert(result);
}
indexes(str /*, NEW PARAGRAPH CHARACTER GOES HERE */);
I want all my line breaks to show up as an array of indexes in the variable "result".
Edit: My method of reading the file stripped all line breaks from the string (readln() returns each line without its line break). Reading the whole file at once with the code below works better, and now \n works.
var file = new File("~/Desktop/Test/email.txt");
file.encoding = "UTF-8";
file.open("r");
var str = file.read(); // read() returns the whole file, line breaks included
file.close();
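Once the whole file is in one string, the plan from the question works: collect the index of every line break and take the first one after the label. A minimal sketch, assuming the label is exactly "Order Number: " and lines end in "\n" or "\r\n"; indexesOf and getOrderNumber are hypothetical names, and the code sticks to ES3 features (no String.prototype.trim) because ExtendScript is ES3-based. indexOf() with a start position finds that first line break directly, so the array is optional.

```javascript
// Hypothetical helper: indexes of every occurrence of `find` in `source`
// (the "indexes" idea from the question, returning the array instead of alerting it).
function indexesOf(source, find) {
    var result = [];
    var pos = source.indexOf(find);
    while (pos !== -1) {
        result.push(pos);
        pos = source.indexOf(find, pos + find.length);
    }
    return result;
}

// Hypothetical helper: the text between "Order Number: " and the end of that line.
function getOrderNumber(str) {
    var label = "Order Number: ";
    var start = str.indexOf(label);
    if (start === -1) {
        return null; // label not found
    }
    start += label.length;
    // The first "\n" after start, i.e. the first entry of
    // indexesOf(str, "\n") that is greater than start.
    var end = str.indexOf("\n", start);
    if (end === -1) {
        end = str.length; // the order number is on the last line
    }
    // Strip a trailing "\r" (Windows line endings) and surrounding spaces.
    return str.substring(start, end).replace(/^\s+|\s+$/g, "");
}

var sample = "Hello\r\nOrder Number: 123456\r\nTotal: 42.00\r\n";
var breaks = indexesOf(sample, "\n");
var orderNumber = getOrderNumber(sample); // "123456"
```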
You can use regular expressions for this. Depending on which fields you need to search for, you'll need to tweak the regular expression, but I can give you a starting point. If the fields in the email are separated by new lines, something like this will work:
var str; // your string
var fields = {};
var lookFor = /(Order Number:|Address:).*?\n/g;
str.replace(lookFor, function (match) {
    var order = match.split(':');
    var field = order[0].replace(/\s/g, ''); // remove all spaces
    var value = order[1].replace(/^\s+|\s+$/g, ''); // trim spaces and the line break
    fields[field] = value;
});
With (Order Number:|Address:) you are looking for the fields; you can add more fields inside the parentheses, separated by the "or" character |. The .*?\n part matches any characters up to the first line break. The g flag indicates that you want to find all matches. Then you call str.replace, because it lets you perform a task on each match. If the separator between the field and the value is a colon ':', you split the match into an array of two values, ['Order Number', ' 12345\n'], trim the value, and store the pair in an object. That code will produce:
fields = {
    OrderNumber: "12345",
    Address: "my fake address 000"
}
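A variation on the same idea, sketched under the assumption that each field sits on its own line as "Name: value": capture groups hand the field name and the value over directly, so a value that itself contains a colon (such as a time) is not cut off by split(':'). The m flag makes ^ and $ match at line boundaries. extractFields is a hypothetical name, and the field names are examples that must not contain regex special characters.

```javascript
// Collect "Name: value" fields from a plain-text email into an object.
// `names` are example field names; they are inserted into the regex unescaped.
function extractFields(text, names) {
    var fields = {};
    // "m" flag: ^ and $ match at each line start/end; \r? tolerates Windows line endings.
    var pattern = new RegExp("^(" + names.join("|") + "):[ \\t]*(.*?)\\r?$", "gm");
    var match;
    while ((match = pattern.exec(text)) !== null) {
        var key = match[1].replace(/\s/g, ""); // "Order Number" -> "OrderNumber"
        fields[key] = match[2];
    }
    return fields;
}

var email = "Order Number: 12345\r\nAddress: my fake address 000\r\nDelivery: 10:30 am\r\n";
var parsed = extractFields(email, ["Order Number", "Address", "Delivery"]);
```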
Please try \n and \r
Example: indexes(str, "\r");
If I've understood correctly, what you need is str.split():
function indexes(source, find) {
    var order;
    var result = [];
    var orders = source.split('\n'); // returns an array of strings: ["order: 12345", "order: 54321", ...]
    for (var i = 0, l = orders.length; i < l; i++) {
        order = orders[i];
        // /find/ would look for the literal text "find", so search for the variable instead
        if (order.indexOf(find) !== -1) {
            result.push(i);
        }
    }
    return result; // 0-based numbers of the lines that contain `find`
}
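Splitting into lines also makes the order-number lookup itself simple: once each line is a separate string, no line-break index is needed. A minimal sketch, assuming the label starts its line; splitting on /\r\n|\r|\n/ copes with Windows, old Mac and Unix line endings, and valueAfterLabel is a hypothetical name.

```javascript
// Hypothetical helper: the value after `label` on the first line that starts with it.
function valueAfterLabel(text, label) {
    var lines = text.split(/\r\n|\r|\n/); // one entry per line, terminators removed
    for (var i = 0; i < lines.length; i++) {
        if (lines[i].indexOf(label) === 0) {
            // trim without String.prototype.trim, which ExtendScript lacks
            return lines[i].substring(label.length).replace(/^\s+|\s+$/g, "");
        }
    }
    return null; // no line starts with the label
}
```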
Related
I am trying to capture the counts associated with the keywords in the string txt. All the keywords are loaded into an array ahead of time.
This code is in jQuery/JavaScript. I cannot hard-code the keyword strings, which is why they are stored in an array. Please help me figure out what goes in place of "Reg Expression" before and/or after the keyword variable within the loop.
The HTML <br> can be used to end the regex match in each iteration of the loop.
I am trying to end up with keywordCount = "2, 5, 11".
//String I need to search through
var txt = "Edit Req'd2<br>Errors5<br>Pndg App11<br>";
//array of keywords I can use to find associated counts to keywords
var keyword = ["Edit Req'd", "Errors", "Pndg App"];
//empty string declared before loop
var keywordCount = '';
for (i = 0; i < keyword.length; i++) {
// takes the comma off end of first entry in array
// might not be needed or another method might be better?
keyword[i] = $.trim(keyword[i]);
//regex expression generated using keyword and unknown expression
var regexmatch = RegExp("Reg Expression" + keyword + "Reg Expression")
//use regex expression to generate string containing counts
keywordCount += (txt.match(regexmatch)) + ",";
}
Here is an example that may help you achieve the required output.
//String I need to search through
var txt = "Edit Req'd2<br>Errors5<br>Pndg App11<br>";
//array of keywords I can use to find associated counts to keywords
var keyword = ["Edit Req'd", "Errors", "Pndg App"];
var i = 0;
var keywordCount = "";
var splittedValue = "";
while (i < keyword.length) {
    if (txt.indexOf(keyword[i]) !== -1) { // only if the keyword occurs in txt
        splittedValue = txt.split(keyword[i]);
        if (keywordCount === "") {
            keywordCount = splittedValue[1].split("<br>")[0];
        } else {
            keywordCount += ", " + splittedValue[1].split("<br>")[0];
        }
    }
    i += 1;
}
console.log(keywordCount); // "2, 5, 11"
I would use a digit/numeric range to match the number ([0-9]; this is pretty basic regex stuff), and use a group to match any of the keywords:
Any number of digits: [0-9]+
Any of your keywords: (something|somethingelse|other)
You can use capture groups as well, so that the keyword and the number come back separately:
var keyword = ["Edit Req'd", "Errors", "Pndg App"];
var regexmatch = RegExp('\\b(' + keyword.join('|') + ')([0-9]+)', 'g');
(note that we use the word boundary \b, written '\\b' inside a string literal, to make sure a keyword is not part of a longer word, and the 'g' flag (global) to signal that we want multiple results)
Now, with the g flag match() returns every full match but drops the capture groups, and we want the number for every keyword, so we'll use exec() in a loop instead:
var txt = "Edit Req'd2<br>Errors5<br>Pndg App11<br>";
var keyword = ["Edit Req'd", "Errors", "Pndg App"];
var regex = RegExp('\\b(' + keyword.join('|') + ')([0-9]+)', 'g');
var match = regex.exec(txt); // ...find the first match
var numbers = [];
while (match !== null) {
    numbers.push(match[2]); // match[0] is the whole match: "Edit Req'd2"
                            // match[1] is the first capture group: "Edit Req'd"
                            // match[2] is the second capture group: "2"
    match = regex.exec(txt); // ...find the next match
}
console.log("the numbers are:", numbers); // numbers.join(", ") gives "2, 5, 11"
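One caveat with building the pattern from the array: the keywords are inserted as regex source, so a keyword containing a metacharacter such as ., (, + or ? changes the meaning of the pattern. A hedged sketch of escaping each keyword first; escapeRegExp and buildKeywordRegex are hypothetical helpers, not built-ins, and "Items (new)" is an invented keyword for illustration.

```javascript
// Hypothetical helper: backslash-escape characters that have special meaning in a regex.
function escapeRegExp(s) {
    return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build \b(kw1|kw2|...)([0-9]+) with every keyword treated as literal text.
function buildKeywordRegex(keywords) {
    return new RegExp("\\b(" + keywords.map(escapeRegExp).join("|") + ")([0-9]+)", "g");
}

var keywordRegex = buildKeywordRegex(["Items (new)", "Errors"]);
var firstMatch = keywordRegex.exec("Items (new)7<br>Errors5<br>");
// firstMatch[1] is "Items (new)", firstMatch[2] is "7"
```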
Lastly, note that although regular expressions may look neat, they are not always the fastest option performance-wise. If performance matters a lot, you could use (for example) indexOf() instead.
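For comparison, a regex-free sketch with indexOf(), assuming (as in the question) that every count is directly followed by <br>; keywordCounts is a hypothetical name, and keywords missing from the text are simply skipped.

```javascript
// Hypothetical helper: for each keyword, read the text between the keyword and the next "<br>".
function keywordCounts(txt, keywords) {
    var counts = [];
    for (var i = 0; i < keywords.length; i++) {
        var start = txt.indexOf(keywords[i]);
        if (start === -1) continue; // keyword not present
        start += keywords[i].length;
        var end = txt.indexOf("<br>", start);
        counts.push(txt.substring(start, end === -1 ? txt.length : end));
    }
    return counts.join(", ");
}

var keywordCount = keywordCounts("Edit Req'd2<br>Errors5<br>Pndg App11<br>", ["Edit Req'd", "Errors", "Pndg App"]);
```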
From your question it seems like you could brush up on your knowledge of regular expressions a little. There are plenty of articles around (just search for "regular expressions basics" or "regex 101"), like this one:
https://medium.com/factory-mind/regex-tutorial-a-simple-cheatsheet-by-examples-649dc1c3f285
I'd like to know if it is possible to replace every match of a pattern in a string with a different value each time, rather than with one value.
Let's say I found 5 matches in a text and I want to replace first match with a string, second match with another string, third match with another and so on... is it achievable?
var synonyms = ["extremely", "exceedingly", "exceptionally", "especially", "tremendously"];
"I'm very upset, very distress, very agitated, very annoyed and very pissed".replace(/very/g, function() {
    // replace the 5 matches of the keyword "very" with the 5 synonyms in the array
});
You can do the replacement inside a replace() callback function:
var synonyms = ["extremely", "exceedingly", "exceptionally", "especially", "tremendously"];
var cnt = 0;
console.log("I'm very upset, very distress, very agitated, very annoyed and very pissed (and very anxious)".replace(/very/g, function($0) {
    if (cnt === synonyms.length) cnt = 0;
    return synonyms[cnt++]; // replace each match of "very" with the next synonym in the array
}));
If there are more matches than items in the array, resetting cnt makes the replacements start over from the first array item.
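The reset can also be folded into the index with the remainder operator. A small sketch of the same callback, where the counter keeps growing and % wraps it back to the start of the array:

```javascript
var synonyms = ["extremely", "exceedingly", "exceptionally"];
var n = 0;
var result = "very upset, very distressed, very agitated, very annoyed".replace(/very/g, function () {
    // n % synonyms.length cycles 0, 1, 2, 0, 1, ... however many matches there are
    return synonyms[n++ % synonyms.length];
});
```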
A simple recursive approach. Make sure your synonyms array has enough elements to cover all matches in your string (otherwise the remaining matches are replaced with "undefined").
let synonyms = ["extremely", "exceedingly", "exceptionally"]
let yourString = "I'm very happy, very joyful, and very handsome."
let rex = /very/
function r (s, i) {
let newStr = s.replace(rex, synonyms[i])
if (newStr === s)
return s
return r(newStr, i+1)
}
r(yourString, 0)
A caveat: if a replacement would also match your regex, you need an additional check; otherwise the recursion will also replace text it has just inserted.
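One way to avoid that extra check is a single global replace() with a callback: the engine scans the original string once and never re-examines inserted text, so a replacement that contains the pattern (here "every", which contains "very") is left alone. A sketch under those assumptions; replaceInOrder is a hypothetical name, and it keeps the original match once the list runs out.

```javascript
// Replace each match of a global regex with the next item from `list`, in order.
// Inserted text is never rescanned, so replacements may safely contain the pattern.
function replaceInOrder(text, regex, list) {
    var i = 0;
    return text.replace(regex, function (match) {
        // Keep the original match once the list runs out instead of inserting "undefined".
        return i < list.length ? list[i++] : match;
    });
}

var out = replaceInOrder("very happy, very joyful", /very/g, ["every", "truly very"]);
```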
function replaceExpressionWithSynonymsInText(text, regX, synonymList) {
    var list = [];
    function getSynonym() {
        if (list.length <= 0) {
            list = Array.from(synonymList);
        }
        return list.shift();
    }
    return text.replace(regX, getSynonym);
}
var synonymList = ["extremely", "exceedingly", "exceptionally", "especially", "tremendously"],
    textSource = "I'm very upset, very distress, very agitated, very annoyed and very pissed",
    finalText = replaceExpressionWithSynonymsInText(textSource, (/very/g), synonymList);
console.log("synonymList : ", synonymList);
console.log("textSource : ", textSource);
console.log("finalText : ", finalText);
The advantages of the above approach are, first, that it does not alter the list of synonyms, and second, that working internally with a fresh copy of the provided list and shifting items off it makes additional counters unnecessary. It also provides the opportunity to shuffle each new copy (once the previous one has been emptied), thus achieving a more random replacement.
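The shuffling idea could look like this: a Fisher-Yates shuffle of each fresh copy before it is consumed. A hedged sketch; shuffledCopy and replaceWithShuffledSynonyms are hypothetical names, and Math.random() is fine for varying the wording but not for anything security-related.

```javascript
// Fisher-Yates shuffle of a copy; the input array is left untouched.
function shuffledCopy(items) {
    var copy = items.slice();
    for (var i = copy.length - 1; i > 0; i--) {
        var j = Math.floor(Math.random() * (i + 1));
        var tmp = copy[i];
        copy[i] = copy[j];
        copy[j] = tmp;
    }
    return copy;
}

function replaceWithShuffledSynonyms(text, regX, synonymList) {
    var list = [];
    return text.replace(regX, function () {
        if (list.length === 0) {
            list = shuffledCopy(synonymList); // refill with a new random order
        }
        return list.shift();
    });
}

var synonymPool = ["extremely", "exceedingly", "exceptionally"];
var randomText = replaceWithShuffledSynonyms("very upset, very annoyed, very tired", /very/g, synonymPool);
```

With as many matches as synonyms, each synonym is used exactly once, in a random order.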
Using the example you've provided, here's what I would do.
First, I would set up some variables:
var text = "I'm very upset, very distress, very agitated, very annoyed and very pissed";
var regex = /very/;
var synonyms = ["extremely", "exceedingly", "exceptionally", "especially", "tremendously"];
Then count the number of matches:
var count = text.match(/very/g).length;
Then I would run a loop to replace the matches with the values from the array (because regex has no g flag, each replace() call replaces only the first remaining match):
for(var x = 0; x < count; x++) {
text = text.replace(regex, synonyms[x]);
}
You can do it with the replace() function, using the 'g' flag for global matching (which finds all occurrences of the searched expression). As the second argument, you can pass a function that returns values from your predefined array.
Here is a little snippet where you can try it out:
var str = "test test test";
var rep = ["one", "two", "three"];
var ix = 0;
var res = str.replace(/test/g, function() {
if (ix == rep.length)
ix = 0;
return rep[ix++];
});
$("#result").text(res);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<p id="result">
Result...
</p>
Yes, it is achievable. There may be a more efficient answer than this, but the brute-force way is to double the length of your regex, i.e. instead of searching for just A, search for (A)(optionalText)(A) and then replace the groups ($1, $2) as needed. If you need help with the regex itself, post some code showing what you're searching for, and someone with more rep than me can probably comment with the actual regexp.
I need some help improving my code:
I am a beginner with regular expressions.
I would like to fetch NUMBER in the script below and store it in a string or an array. At the moment it outputs "_NUMBER1_,NUMBER1_NUMBER2_,NUMBER2" and I don't understand why; I would like just the NUMBER at the end.
function fetchnumber(){
extract = "";
for(picture = 1 ; picture < 5; picture++){
// get background image as a string as this :
// url('http://www.mywebsite.com/directory/image_NUMBER_.png');
var NumberOfPicture = document.getElementById(picture).style.backgroundImage ;
reg = /\_(.*)\_/;
extract += reg.exec(NumberOfPicture);
}
}
I wrote this small example for you. I hope it helps.
var inputString = 'http://www.mywebsite.com/directory/image_123_.png';
var imageNumber = (/image_([^_]+)_\.\w{3}$/.exec(inputString) || [,false])[1];
// imageNumber now holds "123", or false if there is no match
if (imageNumber) {
    // here you can do something with the
    // matched part of the text.
}
I wish you luck with the implementation.
You asked why there is [1] instead of [0]. The explanation is that we want the same behavior when the regex does not match. This is a quote from MDN:
The exec() method executes a search for a match in a specified string.
Returns a result array, or null.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec
So, if the regular expression matches, the returned array consists of the matched string at index zero ([0]) and the first capture group (backreference) at index one ([1]); capture groups are the parts of the pattern between parentheses, (like this). If there is no match, exec() returns null, so we use [,false] instead, and the result is again read from index [1]. Try the code with a different input, for example:
var inputString = 'some text some text ';
var imageNumber = (/image_([^_]+)_\.\w{3}$/.exec(inputString) || [,false])[1];
// there is no match here, so imageNumber is false
if (imageNumber) {
    // here you can do something with the
    // matched part of the text.
}
So in this case the body of the if (imageNumber) block is not executed at all.
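Applied to the loop from the question, the same idea could look like the sketch below. Two assumptions to note: style.backgroundImage returns the value wrapped in url(...) (browsers may use single or double quotes), so the pattern must not end with $; and only capture group [1] is collected, which is why "_NUMBER_" no longer appears in the output. fetchNumbers is a hypothetical name, and the array of strings stands in for the values read with document.getElementById(...).style.backgroundImage.

```javascript
// Pull the number between "image_" and "_." out of each background-image value.
function fetchNumbers(backgroundImages) {
    var numbers = [];
    for (var k = 0; k < backgroundImages.length; k++) {
        // No "$" anchor: the value ends with ") or '), not with the file extension.
        var match = /image_([^_]+)_\.\w{3}/.exec(backgroundImages[k]);
        if (match) {
            numbers.push(match[1]); // [1] = first capture group; [0] would be "image_123_.png"
        }
    }
    return numbers;
}

var values = [
    "url(\"http://www.mywebsite.com/directory/image_123_.png\")",
    "url('http://www.mywebsite.com/directory/image_456_.png')",
    "none"
];
var numbers = fetchNumbers(values);
```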
I am working on a tool that would receive text that has been copied from a word document, and return an html output for copy/paste into an email client for email marketing.
During this process, one of the steps the tool needs to handle is the replacement of special characters within the copied values. The output needs to show the encoded values so that, when they are copied into the email client, they render correctly during the mail send process.
The problem is that there are multiple inputs the user can populate, and right now the code is VERY WET (the opposite of DRY). I want to make the tool a little cleaner and not repeat the code as often.
Currently the input is given to the tool via prompt().
I am taking that input and replacing the special characters ™, ®, Ø, ´, ”, ‟ and others (partial list given for this example) as needed:
JS (Commented Version)
msg2 = prompt("enter text here");
//long version to tag each replacement with its identifying name
msg2 = msg2.replace(/[\u0027]/g, '&#39;');   // Apostrophe '
msg2 = msg2.replace(/[\u2122]/g, '&trade;'); // Trademark ™
msg2 = msg2.replace(/[\u00AE]/g, '&reg;');   // R-Ball ®
msg2 = msg2.replace(/[\u201c]/g, '&quot;');  // Left Double Quote “
msg2 = msg2.replace(/[\u201D]/g, '&quot;');  // Right Double Quote ”
msg2 = msg2.replace(/[\u2018]/g, '&#39;');   // Left Single Quote ‘
msg2 = msg2.replace(/[\u2019]/g, '&#39;');   // Right Single Quote ’
msg2 = msg2.replace(/[\u2022]/g, '&#2022;'); // Bullet •
JS (Short Version)
msg2 = prompt("enter text here");
msg2 = msg2.replace(/[\u0027]/g, '&#39;')
    .replace(/[\u2122]/g, '&trade;')
    .replace(/[\u00AE]/g, '&reg;')
    .replace(/[\u201c]/g, '&quot;')
    .replace(/[\u201D]/g, '&quot;')
    .replace(/[\u2018]/g, '&#39;')
    .replace(/[\u2019]/g, '&#39;')
    .replace(/[\u2022]/g, '&#2022;');
BUT... I need to run this same replacement on a number of prompts. I don't want to repeat this in the code a bunch of times with each of the variables changing as needed.
What I would rather do is create a function to handle the replacement, and then simply create an array of the variables and run the function on the array...
Example
function txtEncode () {
    // ...replacement code here...
}
var inputTxt = [msg1, msg2, msg3...];
for (var i=0; i < inputTxt.length; i++){
txtEncode(i)
}
Just make an array with replacement pairs:
var replacements = [["&", "&amp;"], ["'", "&#39;"] /* etc. */];
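and apply them one by one (keep the "&" pair first; otherwise the ampersands in the entities inserted by later pairs would be escaped again):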
replacements.forEach(function(pair) {
msg = msg.split(pair[0]).join(pair[1]);
});
split/join is better suited to replacing literal strings than .replace(), which is intended for use with regular expressions (given a plain string pattern, .replace() replaces only the first occurrence).
Also, your encoding doesn't look right: &#2022; is a decimal character reference, so it renders as ߦ (U+07E6), not as the bullet •, which is &#8226; (or &#x2022; in hex).
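Putting the replacement-pair idea together with the question's wish to run it over several inputs, a minimal sketch follows. It assumes numeric entities are wanted (curly quotes kept as curly-quote entities, bullet as &#8226;), and it handles "&" first so the ampersands of inserted entities are not escaped again. txtEncode reuses the question's name but, unlike the question's loop, takes the string and returns the encoded result, which has to be stored back; the two sample strings stand in for prompt() results.

```javascript
// Replacement pairs, applied in order; "&" must come first.
var replacements = [
    ["&", "&amp;"],
    ["'", "&#39;"],
    ["\u2122", "&#8482;"], // trademark
    ["\u00AE", "&#174;"],  // registered sign
    ["\u201C", "&#8220;"], // left double quote
    ["\u201D", "&#8221;"], // right double quote
    ["\u2018", "&#8216;"], // left single quote
    ["\u2019", "&#8217;"], // right single quote
    ["\u2022", "&#8226;"]  // bullet
];

function txtEncode(msg) {
    for (var i = 0; i < replacements.length; i++) {
        // split/join replaces every occurrence of the literal string
        msg = msg.split(replacements[i][0]).join(replacements[i][1]);
    }
    return msg;
}

// Encode several inputs; store each result back into the array.
var inputTxt = ["Brand\u2122 & Co", "\u201CQuoted\u201D"];
for (var j = 0; j < inputTxt.length; j++) {
    inputTxt[j] = txtEncode(inputTxt[j]);
}
```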
You can use a global find/replace function by extending the String prototype. I have this code in one of my fiddles, but I can't find where it originally came from.
Code:
String.prototype.replaceArray = function(find, replace) {
    var replaceString = this;
    var regex;
    for (var i = 0; i < find.length; i++) {
        // find[i] is used as regex source, so escape any special characters in it
        regex = new RegExp(find[i], "g");
        replaceString = replaceString.replace(regex, replace[i]);
    }
    return replaceString;
};
var msg2 = 'demo \u0027';
var find = ["\u0027"];
var replace = ["&#39;"];
msg2 = msg2.replaceArray(find, replace);
Demo: http://jsfiddle.net/IrvinDominin/YQKwN/
Add a method to the String prototype for your code:
String.prototype.myCleanString = function(){
    return this.replace(/[\u0027]/g, '&#39;')
        .replace(/[\u2122]/g, '&trade;')
        .replace(/[\u00AE]/g, '&reg;')
        .replace(/[\u201c]/g, '&quot;')
        .replace(/[\u201D]/g, '&quot;')
        .replace(/[\u2018]/g, '&#39;')
        .replace(/[\u2019]/g, '&#39;')
        .replace(/[\u2022]/g, '&#2022;')
        .replace(/foo/g, 'bar');
};
Call it as needed, e.g. msg2 = msg2.myCleanString(); (demo: http://jsfiddle.net/XKHNt/)
I've got a large array of words in Javascript (~100,000), and I'd like to be able to quickly return a subset of them based on a text pattern.
For example, I'd like to return all the words that begin with a pattern, so typing hap should give me ["happy", "happiness", "happening", etc.] as a result.
If it's possible I'd like to do this without iterating over the entire array.
Something like this is not working fast enough:
// data contains an array of beginnings of words e.g. 'hap'
$.each(data, function(key, possibleWord) {
    var found = $.inArray(possibleWord, words);
    // do something if found
});
Any ideas on how I could quickly reduce the set to possible matches without iterating over the whole word set? The word array is in alphabetical order if that helps.
If you just want to search for prefixes, there are data structures made for exactly that, such as tries and ternary search trees.
A quick Google search turns up some promising JavaScript trie and autocomplete implementations:
http://ejohn.org/blog/javascript-trie-performance-analysis/
Autocomplete using a trie
http://odhyan.com/blog/2010/11/trie-implementation-in-javascript/
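To give a feel for why a trie helps, here is a bare-bones sketch (not one of the linked implementations): each node maps a character to a child node, so looking up "hap" takes three steps down the tree plus a walk over only the matching subtree, regardless of the other ~100,000 words. The cost is higher memory use than a flat array; Trie and withPrefix are hypothetical names.

```javascript
// Minimal prefix tree: each node maps a character to a child node.
function Trie() {
    this.root = { children: {}, isWord: false };
}

Trie.prototype.insert = function (word) {
    var node = this.root;
    for (var i = 0; i < word.length; i++) {
        var ch = word.charAt(i);
        if (!node.children[ch]) {
            node.children[ch] = { children: {}, isWord: false };
        }
        node = node.children[ch];
    }
    node.isWord = true; // mark the end of a complete word
};

// Return every stored word that starts with `prefix`.
Trie.prototype.withPrefix = function (prefix) {
    var node = this.root;
    for (var i = 0; i < prefix.length; i++) {
        node = node.children[prefix.charAt(i)];
        if (!node) return []; // no word has this prefix
    }
    var results = [];
    // Depth-first walk below the prefix node, collecting complete words.
    (function collect(n, soFar) {
        if (n.isWord) results.push(soFar);
        for (var ch in n.children) {
            if (Object.prototype.hasOwnProperty.call(n.children, ch)) {
                collect(n.children[ch], soFar + ch);
            }
        }
    })(node, prefix);
    return results;
};
```

Build it once (for example on page load) by calling insert() for every word, then call withPrefix() on each keystroke.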
I have absolutely no idea if this is any faster (a jsperf test is probably in order...), but you can do it with one giant string and a RegExp search instead of arrays:
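var giantStringOfWords = giantArrayOfWords.join(' ');
function searchForBeginning(beginning, str) {
    // build the pattern from the search term (not the giant string); 'g' returns every match
    // assumes `beginning` contains no regex special characters
    var pattern = new RegExp('\\b' + beginning + '\\w*', 'g'),
        matches = str.match(pattern);
    return matches || [];
}
var hapResults = searchForBeginning('hap', giantStringOfWords);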
The best approach is to structure the data better. Make an object with keys like "hap", where each member holds an array of words (or word suffixes, if you want to save space) or a delimited string of words for regexp searching.
This means you will have shorter objects to iterate/search. Another way is to sort the arrays and use a binary search pattern. There's a good conversation about techniques and optimizations here: http://ejohn.org/blog/revised-javascript-dictionary-search/
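The binary-search option could be sketched as follows, assuming the word array is sorted with JavaScript's default string comparison and uses consistent casing: all words sharing a prefix sit in one contiguous run, so a lower-bound search finds the start of that run in about log2(100,000) ≈ 17 comparisons. lowerBound and wordsWithPrefix are hypothetical names.

```javascript
// Index of the first word >= prefix in a sorted array (lower bound).
function lowerBound(words, prefix) {
    var lo = 0, hi = words.length;
    while (lo < hi) {
        var mid = (lo + hi) >>> 1;
        if (words[mid] < prefix) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return lo;
}

// All words starting with `prefix`: a contiguous run beginning at the lower bound.
function wordsWithPrefix(words, prefix) {
    var result = [];
    for (var i = lowerBound(words, prefix); i < words.length; i++) {
        if (words[i].substring(0, prefix.length) !== prefix) break; // past the matching run
        result.push(words[i]);
    }
    return result;
}

var sortedWords = ["apple", "happening", "happiness", "happy", "hat", "zoo"];
var hapWords = wordsWithPrefix(sortedWords, "hap");
```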
I suppose that using plain JavaScript instead of jQuery can help a bit. You can do:
var arr = ["happy", "happiness", "nothere", "notHereEither", "happening"], subset = [];
for (var i = 0, len = arr.length; i < len; i++) {
    if (arr[i].indexOf("hap") === 0) { // the word starts with "hap"
        subset.push(arr[i]);
    }
}
//subset === ["happy", "happiness", "happening"]
Also, if the array is sorted, you could break out of the loop early once the first letter is greater than the first letter of your search term, instead of looping over the entire array.
var data = ['foo', 'happy', 'happiness', 'foohap'];
jQuery.each(data, function(i, item) {
if(item.match(/^hap/))
console.log(item)
});
If you have the data in an array, you're going to have to loop through the whole thing.
A really simple optimization is on page load go through your big words array and make a note of what index ranges apply to each starting letter. E.g., in my example below the "a" words go from 0 to 2, "b" words go from 3 to 4, etc. Then when actually doing a pattern match only look through the applicable range. Although obviously some letters will have more words than others, a given search will only have to look through an average of 100,000/26 words.
// words array assumed to be lowercase and in alphabetical order
var words = ["a","an","and","be","blue","cast","etc."];
// figure out the index for the first and last word starting with
// each letter of the alphabet, so that later searches can use
// just the appropriate range instead of searching the whole array
var letterIndexes = {},
    i,
    l,
    letterIndex = 0,
    firstLetter;
for (i=0, l=words.length; i<l; i++) {
    if (words[i].charAt(0) === firstLetter)
        continue;
    if (firstLetter)
        letterIndexes[firstLetter] = {first : letterIndex, last : i-1};
    letterIndex = i;
    firstLetter = words[i].charAt(0);
}
// the loop never closes the range of the last letter, so record it here
if (firstLetter)
    letterIndexes[firstLetter] = {first : letterIndex, last : words.length-1};

function getSubset(pattern) {
    pattern = pattern.toLowerCase();
    var subset = [],
        fl = pattern.charAt(0),
        matched = false;
    if (letterIndexes[fl])
        for (var i = letterIndexes[fl].first, l = letterIndexes[fl].last; i <= l; i++) {
            if (pattern === words[i].substr(0, pattern.length)) {
                subset.push(words[i]);
                matched = true;
            } else if (matched) {
                break;
            }
        }
    return subset;
}
Note also that when searching through the (range within the) words array, once a match is found I set a flag, which indicates we've gone past all of the words that are alphabetically before the pattern and are now making our way through the matching words. That way as soon as the pattern no longer matches we can break out of the loop. If the pattern doesn't match at all we still end up going through all the words for that first letter though.
Also, if you're doing this as a user types, when letters are added to the end of the pattern you only have to search through the previous subset, not through the whole list.
P.S. Of course if you want to break the word list up by first letter you could easily do that server-side.
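The idea of searching only the previous subset while the user types could be sketched like this: remember the last pattern and its matches, and when the new pattern merely extends the old one, filter the smaller subset instead of the whole list. makeIncrementalSearch is a hypothetical name, and the full-list search here is a plain prefix filter that could be swapped for getSubset() above.

```javascript
// Returns a search function that reuses the previous result while the pattern keeps growing.
function makeIncrementalSearch(words) {
    var lastPattern = null;
    var lastSubset = words;

    function startsWith(word, prefix) {
        return word.substring(0, prefix.length) === prefix;
    }

    return function search(pattern) {
        // If the new pattern extends the old one, only the old matches can still match.
        var source = (lastPattern !== null && startsWith(pattern, lastPattern)) ? lastSubset : words;
        var subset = [];
        for (var i = 0; i < source.length; i++) {
            if (startsWith(source[i], pattern)) subset.push(source[i]);
        }
        lastPattern = pattern;
        lastSubset = subset;
        return subset;
    };
}

var search = makeIncrementalSearch(["hat", "happy", "happiness", "sad"]);
```

A deleted character (e.g. going from "hap" back to "ha") no longer extends the previous pattern, so the search falls back to the full list.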