Google Apps Script getPlainBody() from GmailMessage class regex not working - javascript

This is my first question on Stack Overflow, so please let me know how I can improve readability for others.
I am trying to use a regex on the string I obtained from getPlainBody() in the GmailMessage class. It doesn't work when I apply it directly to the string returned by getPlainBody(), but it works well when I copy the same text and manually add \n characters.
Code that works:
function RegularExp() {
//manually entered \n characters into string that I copied and pasted from getPlainBody()
var string = "Personal Message\nraw material: oak wood 100kg\nTRACKING NUMBER 7777777777\n<somehyperlink\nFROM SomeBrand";
//my goal is to get: raw material: oak wood 100kg
var regExp = new RegExp("(.*?)\n(?=TRACKING NUMBER)","g");
var PersonalMessage = regExp.exec(string)[1];
Logger.log(PersonalMessage); //works perfectly fine
}
Code that doesn't work:
for (var j in messages){
var message = messages[j];
var plainText = message.getPlainBody(); //getting plainbody of fedex mail of interest
//trying to extract the personal message
var regExp = new RegExp("(.*?)\n(?=TRACKING NUMBER)","g");
var PersonalMessage = regExp.exec(plainText)[1];
Logger.log(PersonalMessage); //won't show anything
}
My question is why does it work when I manually enter \n but not when I use the string that was returned from getPlainBody()? I'm using the exact same regex pattern and can't see why.
Below are the links I used to try to solve my problem (or I might just be dumb not being able to apply the solution to this issue)
Newline in gmail app script getplainbody function
Google Apps Script: getPlainBody() weird behavior
Regex - google apps script
Thanks

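The issue is that . does not match a CR (carriage return) char in a JavaScript regex (ECMAScript flavor). The string returned by getPlainBody() apparently ends its lines with CR LF, so (.*?) cannot get past the CR to reach the \n.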
You can use
var regExp = /(.*)(?=\r?\nTRACKING NUMBER)/g;
The regex matches
(.*) - Group 1: any zero or more chars other than line break chars (it does not match LF and CR chars)
(?=\r?\nTRACKING NUMBER) - a positive lookahead that matches a location that is immediately followed with
\r? - an optional CR (carriage return char)
\n - a line feed char
TRACKING NUMBER - some fixed string (at the start of the next line).
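To see the difference without a mailbox, here is a minimal sketch that runs both patterns against a sample string. The sample is an assumption: it stands in for what getPlainBody() returns and uses CR LF line endings, which is what the answer above implies.

```javascript
// Hypothetical sample standing in for message.getPlainBody(); lines end in CR LF.
const plainText =
  "Personal Message\r\nraw material: oak wood 100kg\r\nTRACKING NUMBER 7777777777\r\nFROM SomeBrand";

// Pattern from the question: "." cannot consume "\r", so the only possible
// match starts right at the "\n" and group 1 is an empty string.
const original = /(.*?)\n(?=TRACKING NUMBER)/.exec(plainText);

// Pattern from the answer: allow an optional CR before the LF in the lookahead.
const fixed = /(.*)(?=\r?\nTRACKING NUMBER)/.exec(plainText);

console.log(JSON.stringify(original[1])); // empty string
console.log(fixed[1]);                    // the line just above TRACKING NUMBER
```

This also explains why the original code logged nothing rather than throwing: the regex did match, but the captured group was empty.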

Related

Regex Match End of Line Unless it Ends with a Closed Bracket

I'm trying to write a JavaScript Regex that will grab the end of a line unless said line ends with a closing bracket, example:
[word]
lengthy text line
[other word]
even lengthier text line! Whoo!
That part I have down pat writing up this Regex new RegExp(/[\n]\n|(?![^\]])$/gm)
But I also need to be able to grab the end of the line even where there isn't a double space, and that is proving to be SUPER difficult since I don't really know a ton about Regex.
-- [word]
These two lines need to be grouped -- lengthy text line
-- [other word]
These two lines need to be grouped -- even lengthier text line! Whoo!
This needs to be its own group -- This text line is the longest of them all!
-- [more words]
These two lines need to be grouped -- The last guy can win...
What's annoying is that there is a very simple regex that accomplishes this goal, a negative lookbehind assertion, (?<!])\n, but it's not currently supported in Firefox, and that's a problem.
EDIT: The method used for the information is splitting, it splits the value placed into a textarea and matches it to array[i].match(/^\[(.*?)\]\n/). It'd look something like this:
var regex = new RegExp(/[\n]\n|(?![^\]])$/gm);
var array = $('#textar').val().split(regex);
for (var i = 0; i < array.length; i++) {
var match = array[i].match(/^\[(.*?)\]\n/)
}
but with a lot more code taking those variables and using them.
SOLUTION:
Wiktor Stribiżew had the solution. Changing .split(regex) to .match(regex) and adding their regex fixed the problem
var regex = new RegExp(/^.*[^\]\n](?:\]\n.*[^\]\n])*$/gm);
var array = $('#textar').val().match(regex);
for (var i = 0; i < array.length; i++) {
var match = array[i].match(/^\[(.*?)\]\n/)
}
You may use String#match:
text.match(/^.*[^\]\n](?:\]\n.*[^\]\n])*$/gm)
Regex details
^ - start of a line
.*[^\]\n] - 0 or more chars other than line break chars, as many as possible and then a char other than a newline and ]
(?:\]\n.*[^\]\n])* - 0 or more repetitions of
\]\n - ] and a newline, LF, char
.*[^\]\n] - 0 or more chars other than line break chars, as many as possible and then a char other than a newline and ]
$ - end of a line.
See the JS demo:
var text = "[word]\nlengthy text line\n\n[other word]\neven lengthier text line! Whoo!\nThis text is the longest of them all!\n[more words]\nThe last gyu can win...";
console.log(text.match(/^.*[^\]\n](?:\]\n.*[^\]\n])*$/gm));
You are looking for a regex like this:
/^\[.+(\n+[^\[]+)/gm
^ at the beginning of a line (because of the m flag),
look for [
.+ followed by one or more characters other than line breaks
(\n+[^\[]+) Group 1: one or more newlines, then one or more characters (newlines included) as long as they are not [
Demo: https://regex101.com/r/c1giqu/3
For your convenience, the full match keeps the text between brackets. The first group includes only the text without the brackets.

Getting element from filename using continous split or regex

I currently have the following string :
AAAAA/BBBBB/1565079415419-1564416946615-file-test.dsv
But I would like to split it to only get the following result (removing all tree directories + removing timestamp before the file):
1564416946615-file-test.dsv
I currently have the following code, but it doesn't work when the filename itself contains a '-', as in the example.
getFilename(str){
return(str.split('\\').pop().split('/').pop().split('-')[1]);
}
I don't want to use a loop, for performance reasons (I may have lots of files to work with). Is there another solution (maybe a regex)?
We can try doing a regex replacement with the following pattern:
.*\/\d+-\b
Replacing the match with empty string should leave you with the result you want.
var filename = "AAAAA/BBBBB/1565079415419-1564416946615-file-test.dsv";
var output = filename.replace(/.*\/\d+-\b/, "");
console.log(output);
The pattern works by using .*/ to first consume everything up to, and including, the final path separator. Then, \d+- consumes the timestamp as well as the dash that follows, leaving only the portion you want.
You may use this regex and get captured group #1:
/[^\/-]+-(.+)$/
RegEx Demo
RegEx Details:
[^\/-]+: Match 1+ characters that are neither / nor -
-: Match literal -
(.+): Match 1+ of any characters and capture them in group #1
$: End
Code:
var filename = "AAAAA/BBBBB/1565079415419-1564416946615-file-test.dsv";
var m = filename.match(/[^\/-]+-(.+)$/);
console.log(m[1]);
//=> 1564416946615-file-test.dsv

JS conditional RegEx that removes different parts of a string between two delimiters

I have a string of text with HTML line breaks. Some of the <br> immediately follow a number between two delimiters «...» and some do not.
Here's the string:
var str = ("«1»<br>«2»some text<br>«3»<br>«4»more text<br>«5»<br>«6»even more text<br>");
I’m looking for a conditional regex that’ll remove the number and delimiters (ex. «1») as well as the line break itself without removing all of the line breaks in the string.
So for instance, at the beginning of my example string, when the script encounters »<br> it’ll remove everything between and including the first « to the left, to »<br> (ex. «1»<br>). However it would not remove «2»some text<br>.
I’ve had some help removing the entire number/delimiters (ex. «1») using the following:
var regex = new RegExp(UsedKeys.join('|'), 'g');
var nextStr = str.replace(/«[^»]*»/g, " ");
I sure hope that makes sense.
Just to be super clear, when the string is rendered in a browser, I’d like to go from this…
«1»
«2»some text
«3»
«4»more text
«5»
«6»even more text
To this…
«2»some text
«4»more text
«6»even more text
Many thanks!
Maybe I'm missing a subtlety here, if so I apologize. But it seems that you can just replace with the regex: /«\d+»<br>/g. This will replace all occurrences of a number between « & » followed by <br>
var str = "«1»<br>«2»some text<br>«3»<br>«4»more text<br>«5»<br>«6»even more text<br>"
var newStr = str.replace(/«\d+»<br>/g, '')
console.log(newStr)
To match letters and digits you can use \w instead of \d
var str = "«a»<br>«b»some text<br>«hel»<br>«4»more text<br>«5»<br>«6»even more text<br>"
var newStr = str.replace(/«\w+?»<br>/g, '')
console.log(newStr)
This snippet assumes that the input within the brackets will always be a number but I think it solves the problem you're trying to solve.
const str = "«1»<br>«2»some text<br>«3»<br>«4»more text<br>«5»<br>«6»even more text<br>";
console.log(str.replace(/(«(\d+)»<br>)/g, ""));
/(«(\d+)»<br>)/g
«(\d+)» Will match any brackets containing 1 or more digits in a row
If you would prefer to match alphanumeric you could use «(\w+)» or for any characters including symbols you could use «([^»]+)»
<br> Will match a literal <br> tag (the HTML line break)
The g flag matches globally so that it can find every instance of the substring
Basically we are only removing the bracketed numbers if they are immediately followed by a line break.

How to check if a string contains specific words in different languages [duplicate]

I have a simple regex which finds a word in text:
var patern = new RegExp("\\bsomething\\b", "gi"); // backslashes must be doubled inside a string literal
This matches the word when it has spaces or punctuation around it.
So it matches:
I have something.
But doesn't match:
I havesomething.
which is fine and exactly what I need.
But I have an issue with, for example, the Arabic language. If I have the regex:
var patern = new RegExp("\\bرياضة\\b", "gi");
and text:
رياضة أنا أحب رياضتي وأنا سعيد حقا هنا لها حبي
The keyword which I am looking for is at the end of the text.
But this doesn't work, it just doesn't find it.
It works if I remove \b from regex:
var patern = new RegExp("رياضة", "gi");
But that is not what I want, because I don't want to find it if it's part of another word, like in the English example above:
I havesomething.
I don't know much about regex, so I'd appreciate any help getting this to work with both English and languages like Arabic.
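First we have to understand what \b means:
\b is an anchor that matches at a position that is called a "word boundary", that is, between a \w character ([A-Za-z0-9_]) and a non-\w character (or the start/end of the string).
Arabic letters are not \w characters, so \b never matches between an Arabic letter and a space. In your case, the boundary you are looking for is a position where the word has no other Arabic letter next to it.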
To match only Arabic letters in Regex, we use unicode:
[\u0621-\u064A]+
Or we can simply use Arabic letters directly
[ء-ي]+
The code above will match any Arabic letters. To make a word boundary out of it, we could simply negate it on both sides:
[^ء-ي]ARABIC TEXT[^ء-ي]
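The code above means: there must be a character that is not an Arabic letter on either side of the Arabic word, which will work in your case. Note that each negated class consumes one character, so the word needs a character on both sides (hence the padding spaces in the example string below).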
Consider this example that you gave us which I modified a little bit:
أنا أحب رياضتي رياض رياضة رياضيات وأنا سعيد حقا هنا
If we are trying to match only رياض, this word will make our search match also رياضة, رياضيات, and رياضتي. However, if we add the code above, the match will successfully be on رياض only.
var x = " أنا أحب رياضتي رياض رياضة رياضيات وأنا سعيد حقا هنا ";
x = x.replace(/([^ء-ي]رياض[^ء-ي])/g, '<span style="color:red">$1</span>');
document.write (x);
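A variant of the same idea that does not need padding spaces, as a rough sketch: the leading side accepts the start of the text, and the trailing side is a lookahead that also accepts the end of the text. The helper name countWholeWord is made up for illustration, and the sketch assumes the search word contains no regex metacharacters.

```javascript
// Arabic letter range used in the answer above (U+0621 to U+064A).
const ARABIC_LETTERS = "\u0621-\u064A";

// Hypothetical helper: counts occurrences of `word` that are not glued to
// other Arabic letters. Assumes `word` has no regex metacharacters.
function countWholeWord(text, word) {
  const pattern = new RegExp(
    "(?:^|[^" + ARABIC_LETTERS + "])" +   // start of text, or a non-Arabic-letter
    word +
    "(?=[^" + ARABIC_LETTERS + "]|$)",    // followed by a non-Arabic-letter or end of text
    "g"
  );
  const matches = text.match(pattern);
  return matches ? matches.length : 0;
}

const arabicText = "رياضة أنا أحب رياضتي وأنا سعيد حقا هنا لها حبي";
console.log(countWholeWord(arabicText, "رياضة")); // whole word at the start of the text
console.log(countWholeWord(arabicText, "رياض"));  // only occurs inside longer words
```

Because the trailing boundary is a lookahead, it is not consumed, so two target words separated by a single space are both found.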
If you would like to account for أآإا with one code, you could use something like this [\u0622\u0623\u0625\u0627] or simply list them all between square brackets [أآإا]. Here is a complete code
var x = "أنا هنا وانا هناك .. آنا هنا وإنا هناك";
x = x.replace(/([أآإا]نا)/g, '<span style="color:red">$1</span>');
document.write (x);
Note: If you want to match every possible Arabic characters in Regex including all Arabic letters أ ب ت ث ج, all diacritics َ ً ُ ٌ ِ ٍ ّ, and all Arabic numbers ١٢٣٤٥٦٧٨٩٠, use this regex: [،-٩]+
Useful link about the ranking of Arabic characters in Unicode: https://en.wikipedia.org/wiki/Arabic_script_in_Unicode
This doesn't work because the regex engine's \b does not treat Arabic letters as word characters, so it cannot find word boundaries around them.
You could search for the unicode chars in the text (Unicode ranges).
Or you could convert the text into Unicode escapes and then build the regex from those somehow (I have never tried this, but it should work).
I used this ء-ي٠-٩ and it works for me
If you don't need a complicated RegEx (for instance, because you're looking for a particular word or a short list of words), then I've found that it's actually easier to tokenize the search text and find it that way:
>>> text = 'رياضة أنا أحب رياضتي وأنا سعيد حقا هنا لها حبي '
>>> tokens = text.split()
>>> print(tokens)
['رياضة', 'أنا', 'أحب', 'رياضتي', 'وأنا', 'سعيد', 'حقا', 'هنا', 'لها', 'حبي']
>>> search_words = ['رياضة', 'رياضت']
>>> found = [w for w in tokens if w in search_words]
>>> print(found)
['رياضة'] # returns only full-word match
I'm sure that this is slower than RegEx, but not enough that I've ever noticed.
If your text had punctuation, you could do a more sophisticated tokenization (so it would find things like 'رياضة؟') using NLTK.
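Since the question is about JavaScript, the same tokenizing idea can be sketched there as well. This is only an illustration, not a replacement for a real tokenizer: it assumes a token is a run of letters in the U+0621 to U+064A range, so punctuation such as the Arabic question mark is treated as a separator, and the function name findWholeWords is hypothetical.

```javascript
// Hypothetical helper: split on anything that is not an Arabic letter,
// then keep only the tokens that appear in the search list.
function findWholeWords(text, searchWords) {
  const tokens = text.split(/[^\u0621-\u064A]+/).filter(Boolean); // drop empty strings
  const wanted = new Set(searchWords);
  return tokens.filter(function (token) {
    return wanted.has(token);
  });
}

const sample = "رياضة أنا أحب رياضتي وأنا سعيد حقا هنا لها حبي";
console.log(findWholeWords(sample, ["رياضة", "رياضت"])); // only the full-word match is returned
```

Text with diacritics would need a wider character range, since those code points fall outside the letters-only range used here.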

Regex find string and replace that line and following lines

I am trying to find a regex to achieve the following criteria which I need to use in javascript.
Input file
some string is here and above this line
:62M:C111111EUR1211498,00
:20:0000/11111000000
:25:1111111111
:28C:00001/00002
:60M:C170926EUR1211498,06
:61:1710050926C167,XXNCHKXXXXX 11111//111111/111111
Output has to be
some string is here and above this line
:61:1710050926C167,XXNCHKXXXXX 11111//111111/111111
Briefly, find :62M: and then replace (and delete) the line starting with :62M: and the following lines starting with :20:, :25:, :28C: and :60M:.
Or, find :62M: and replace (and delete) until the line starting with :61:.
Each line has fixed length of 80 characters followed by newline (CR LF).
Is this really possible with regex?
I know how to find a string and replace the line that contains it. But here multiple lines have to be removed, which is quite hard for me.
Could someone please help me out, if this is possible with regex?
Here it is. The regex matches from the line starting with :62M: up to, but not including, the line starting with :61: (note that I'm using [^]*? to span the lines instead of .*, as [^] also matches newlines). Then I'm replacing the match with an empty string, which deletes those lines.
var regex = /^:62M:[^]*?(?=^:61:)/m;
var text = `some string is here and above this line
:62M:C111111EUR1211498,00
:20:0000/11111000000
:25:1111111111
:28C:00001/00002
:60M:C170926EUR1211498,06
:61:1710050926C167,XXNCHKXXXXX 11111//111111/111111`;
// Delete the :62M: line and every line after it, up to the :61: line
var result = text.replace(regex, '');
console.log(result);
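The question states that each line ends with CR LF, and a real file may contain more than one such block. A sketch of the "delete from :62M: until the line starting with :61:" option under those assumptions follows. The function name removeBlocks and the idea that blocks may repeat are illustrative, not taken from the question.

```javascript
// Hypothetical helper: removes every run of lines that starts with ":62M:"
// and ends just before the next line starting with ":61:".
// [^] matches any character, including CR and LF; the m flag lets ^ match
// at the start of each line; the g flag handles repeated blocks.
function removeBlocks(text) {
  return text.replace(/^:62M:[^]*?(?=^:61:)/gm, "");
}

const input = [
  "some string is here and above this line",
  ":62M:C111111EUR1211498,00",
  ":20:0000/11111000000",
  ":25:1111111111",
  ":28C:00001/00002",
  ":60M:C170926EUR1211498,06",
  ":61:1710050926C167,XXNCHKXXXXX 11111//111111/111111"
].join("\r\n"); // CR LF line endings, as described in the question

console.log(removeBlocks(input));
```

If a :62M: line is never followed by a :61: line, the lookahead cannot succeed and that part of the text is left unchanged.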
