How to strip comments from JavaScript using PHP

I want to remove the comments from this kind of script:
var stName = "MyName"; //I WANT THIS COMMENT TO BE REMOVED
var stLink = "http://domain.com/mydomain";
var stCountry = "United State of America";
What is the best way to accomplish this using PHP?

The best way is to use an actual parser or write at least a lexer yourself.
The problem with regex is that it gets enormously complex once you take into account everything that you have to.
For example, Cagatay Ulubay's suggested regexes /\/\/[^\n]?/ and /\/\*(.*)\*\// will match comments, but they will also match a lot more, like
var a = '/* the contents of this string will be matched */';
var b = '// and here you will even get a syntax error, because the entire rest of the line is removed';
var c = 'and actually, the regex that matches multiline comments will span across lines, removing everything between the first "/*" and here: */';
/*
this comment, however, will not be matched.
*/
While it is rather unlikely that strings contain such sequences, the problem is real with inline regex:
var regex = /^something.*/; // You see the fake "*/" here?
The current scope matters a lot, and you can't possibly know the current scope unless you parse the script from the beginning, character by character.
So you essentially need to build a lexer.
You need to split the code into three different sections:
Normal code, which you need to output again, and where the start of a comment could be just one character away.
Comments, which you discard.
Literals, which you also need to output, but where a comment cannot start.
Now the only literals I can think of are strings (single- and double-quoted), inline regex and template strings (backticks), but those might not be all.
And of course you also have to take escape sequences inside those literals into account, because you might encounter an inline regex like
/^file:\/\/\/*.+/
in which a single-character based lexer would only see the regex /^file:\/ and incorrectly parse the following /*.+ as the start of a multiline comment.
Therefore, upon encountering the second /, you have to look back and check whether the last character you passed was a \ (and that this \ was not itself escaped by another \). The same goes for all kinds of quotes for strings.
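The three-way split described above (normal code, comments, literals) can be sketched as a small character-by-character scanner. The scanning logic does not depend on the language, so it is written here in JavaScript and would need porting to PHP. The names stripComments and REGEX_PRECEDERS are hypothetical, and the rule for deciding when a / starts a regex literal is a simplifying assumption, not the real JavaScript grammar (for example, it does not recognise a regex after a keyword such as return, nor nested template strings).

```javascript
// Characters after which a "/" is assumed to start a regex literal.
// This is a heuristic assumption, not the full JavaScript grammar.
const REGEX_PRECEDERS = '(,=:[!&|?{};+-*%<>~^';

function stripComments(src) {
  let out = '';
  let i = 0;
  let lastSignificant = ''; // last non-whitespace character of normal code

  // Copy a literal verbatim, up to and including its closing delimiter.
  function copyLiteral(close) {
    const start = i;
    let inClass = false; // inside [...] of a regex, "/" does not close it
    i++; // skip the opening delimiter
    while (i < src.length) {
      const c = src[i];
      if (c === '\\') { i += 2; continue; } // skip the escaped character
      if (close === '/' && c === '[') inClass = true;
      else if (close === '/' && c === ']') inClass = false;
      else if (c === close && !inClass) { i++; break; }
      i++;
    }
    out += src.slice(start, i);
    lastSignificant = close;
  }

  while (i < src.length) {
    const c = src[i];
    const next = src[i + 1];
    if (c === '/' && next === '/') {
      // Line comment: discard up to (but not including) the newline.
      while (i < src.length && src[i] !== '\n') i++;
    } else if (c === '/' && next === '*') {
      // Block comment: discard up to and including the closing "*/".
      const end = src.indexOf('*/', i + 2);
      i = end === -1 ? src.length : end + 2;
    } else if (c === '"' || c === "'" || c === '`') {
      copyLiteral(c); // string or template literal
    } else if (c === '/' && REGEX_PRECEDERS.includes(lastSignificant)) {
      copyLiteral('/'); // regex literal (by the heuristic above)
    } else {
      out += c; // normal code
      if (!/\s/.test(c)) lastSignificant = c;
      i++;
    }
  }
  return out;
}

const input = 'var re = /^file:\\/\\/\\/*.+/; /* gone */ var s = "http://x"; // bye';
console.log(stripComments(input));
```

Because escapes are skipped inside the literal-copying loop, the scanner never has to look backwards for a \, which avoids the escaped-backslash corner case.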

I would go with preg_replace(). Assuming all comments are single line comments (// Comment here) you can start with this:
$JsCode = 'var stName = "MyName isn\'t \"Foobar\""; //I WANT THIS COMMENT TO BE REMOVED
var stLink = "http://domain.com/mydomain"; // Comment
var stLink2 = \'http://domain.com/mydomain\'; // This comment goes as well
var stCountry = "United State of America"; // Comment here';
$RegEx = '/(["\']((?>[^"\']+)|(?R))*?(?<!\\\\)["\'])(.*?)\/\/.*$/m';
echo preg_replace($RegEx, '$1$3', $JsCode);
Output:
var stName = "MyName isn't \"Foobar\"";
var stLink = "http://domain.com/mydomain";
var stLink2 = 'http://domain.com/mydomain';
var stCountry = "United State of America";
This solution is far from perfect and might have issues with strings containing "//" in them.

Related

Replace multiple identical characters with a string

Using Javascript, I want to replace:
This is a test, please complete ____.
with:
This is a test, please complete %word%.
The number of underlines isn't consistent, so I cannot just use something like str.replace('_____', '%word%').
I've tried str.replace(/(_)*/g, '%word%') but it didn't work. Any suggestions?
Remove the capturing group, and make sure _ repeats with + (at least one occurrence, matches as many _s as possible):
const str = 'This is a test, please complete ____.';
console.log(
str.replace(/_+/g, '%word%')
);
The regular expression
/(_)*/
means, in plain language: match zero or more underscores, which of course isn't what you're looking for. That will match every position in the string (except positions in the string between underscores).
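To see the difference concretely, here is a small sketch comparing the two quantifiers. The short sample string and the "-" replacement are made up for illustration only.

```javascript
// Compare "zero or more" with "one or more" on a short sample string.
const sample = 'a__b';

// `*` also matches the empty string, so a replacement is inserted at
// positions where there is no underscore at all.
const zeroOrMore = sample.replace(/_*/g, '-');

// `+` requires at least one underscore, so only the run itself is replaced.
const oneOrMore = sample.replace(/_+/g, '-');

console.log(zeroOrMore); // "-a--b-"
console.log(oneOrMore);  // "a-b"
```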
I'm going to suggest a slightly different approach to this. Rather than maintaining the sentence as you currently have it, maintain something like this:
This is the {$1} test, please complete {$2}.
When you want to render this sentence, use a regex replacement to replace the placeholders with underscores:
var sentence = "This is the {$1} test, please complete {$2}.";
var show = sentence.replace(/\{\$\d+\}/g, "____");
console.log(show);
When you want to replace a given placeholder, you may also use a targeted regex replacement. For example, to target the first placeholder you could use:
var sentence = "This is the {$1} test, please complete {$2}.";
var show = sentence.replace(/\{\$1\}/g, "first");
console.log(show);
This is a fairly robust and scalable solution, and is more accurate than just doing a single blanket replacement of all underscores.

Extract properties from a string value in GTM

I'm trying to pull out information from an old AWIN tag we have on the site with GTM. We're working on getting this pushed into the DataLayer, but that will take a while, so this is the next step for the time being.
I've managed to pull the information into a string in GTM, which returns the value below (I've manually removed the values for this post), which is great:
'/* Do not change / var AWIN = {}; AWIN.Tracking = {};
AWIN.Tracking.Sale = {}; / Set your transaction parameters */
AWIN.Tracking.Sale.amount = "00.00"; AWIN.Tracking.Sale.channel =
"aw"; AWIN.Tracking.Sale.currency = "GBP"; AWIN
.Tracking.Sale.orderRef = "00000"; AWIN.Tracking.Sale.parts =
"DEFAULT:00.00" ; AWIN.Tracking.Sale.test = "0";
AWIN.Tracking.Sale.voucher = "";'
The only part I need is the value of
AWIN.Tracking.Sale.parts.
The script we've created to extract this is:
function() {
var awintrackstr = {{DOM - AWIN Image Full}};
return awintrackstr.match(/AWIN.Tracking.Sale.parts = \"(.*)\";$/)[1];
}
However, this is extracting everything past the value we need:
'DEFAULT:00.00"; AWIN.Tracking.Sale.test = "0"; AWIN.Tracking.Sal....
All the tests we've created show the above should work, but it's not working in GTM.
Has anyone got any ideas of how this should work in GTM? Again, all we're looking to extract is the part that says DEFAULT:00.00.
Thanks in advance
This is because of the "(.*)" part in your regular expression.
.* will match anything, including other " characters, so it matches up to the last " that is still followed by the rest of your regular expression.
Replace "(.*)" with "([^"]*)"; this matches only characters that are not ", so the match stops at the first closing quote.
I can also recommend using regex101.com whenever you need to write a regular expression. Using this, you will also notice that the " character has no special meaning in a JavaScript regular expression, so there is no need to escape it.
Edit: here is the modified version of your regular expression at work: https://regex101.com/r/TPUU6z/1
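As a rough standalone illustration of the difference, the sketch below runs both variants against a shortened, single-line stand-in for the tag string. The tag constant is made up for the example, and the escaped dots and the \s* around the = sign are additions that were not in the original expression.

```javascript
// Shortened stand-in for the tag string from the question.
const tag = 'AWIN.Tracking.Sale.parts = "DEFAULT:00.00"; ' +
  'AWIN.Tracking.Sale.test = "0"; AWIN.Tracking.Sale.voucher = "";';

// Greedy: (.*) runs on to the last `";` at the end of the string.
const greedy = tag.match(/AWIN\.Tracking\.Sale\.parts = "(.*)";$/);

// Negated class: ([^"]*) stops at the first closing quote.
const bounded = tag.match(/AWIN\.Tracking\.Sale\.parts\s*=\s*"([^"]*)"/);

console.log(greedy[1]);  // the value plus everything up to the final quote
console.log(bounded[1]); // "DEFAULT:00.00"
```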

JavaScript regex only replacing first match occurrence

I am using regular expressions to do some basic conversion of wiki markup into copy-pastable plain text, and I'm using JavaScript to do the work.
However, JavaScript's regex engine behaves quite differently from the ones I've used previously, as well as from the regex in Notepad++ that I use on a daily basis.
For example, given a test string:
==Section Header==
===Subsection 1===
# Content begins here.
## Content continues here.
I want to end up with:
Section Header
Subsection 1
# Content begins here.
## Content continues here.
Simply remove all equals signs.
I began with the regex setup of:
var reg_titles = /(^)(=+)(.+)(=+)/
This regex searches for lines that begin with one or more equals signs and end with another set of one or more equals signs. Rubular shows that it matches my lines accurately and does not catch equals signs in the middle of content. http://www.rubular.com/r/46PrkPx8OB
The code to replace the string based on regex
var lines = $('.tb_in').val().split('\n'); //use jquery to grab text in a textarea, and split into an array of lines based on the \n
for(var i = 0;i < lines.length;i++){
line_temp = lines[i].replace(reg_titles, "");
lines[i] = line_temp; //replace line with temp
}
$('.tb_out').val(lines.join("\n")); //rejoin and print result
My result is unfortunately:
Section Header==
Subsection 1===
# Content begins here.
## Content continues here.
I cannot figure out why the regex replace function, when it finds multiple matches, seems to only replace the first instance it finds, not all instances.
Even when my regex is updated to:
var reg_titles = /(={2,})/
"Find any two or more equals", the output is still identical. It makes a single replacement and ignores all other matches.
No other regex engine I have used behaves this way. Running the same replace multiple times has no effect.
Any advice on how to get my string replace function to replace ALL instances of the matched regex instead of just the first one?
^=+|=+$
You can use this. Do not forget to add the g and m flags. Replace with an empty string. See demo.
http://regex101.com/r/nA6hN9/28
Add the g modifier to do a global search:
var reg_titles = /^(=+)(.+?)(=+)/g
Your regex is needlessly complex, and yet doesn't actually accomplish what you set out to do. :) You might try something like this instead:
var reg_titles = /^=+(.+?)=+$/;
var lines = $('.tb_in').val().split('\n');
lines.forEach(function(v, i, a) {
a[i] = v.replace(reg_titles, '$1'); // keep only the text between the equals signs
});
$('.tb_out').val(lines.join("\n"));
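For comparison, the ^=+|=+$ suggestion with the g and m flags can be applied to the whole text at once, without splitting it into lines. The sketch below is self-contained and uses the sample markup from the question; the variable names are made up, and it leaves out the jQuery calls.

```javascript
// Sample wiki markup from the question, as one multi-line string.
const wiki = [
  '==Section Header==',
  '===Subsection 1===',
  '# Content begins here.',
  '## Content continues here.'
].join('\n');

// Without the g flag, only the first match is replaced.
const firstOnly = '==Section Header=='.replace(/=+/, '');

// g: replace every match, not just the first.
// m: let ^ and $ match at the start and end of each line.
const plain = wiki.replace(/^=+|=+$/gm, '');

console.log(firstOnly); // "Section Header=="
console.log(plain);
```

The g flag is what addresses the original complaint; the m flag only matters because the anchors have to apply per line in a multi-line string.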

Matching invisible characters in JavaScript RegEx

I've got some strings that contain invisible characters, but they are in somewhat predictable places. Typically they surround the piece of text I want to extract, and then after the 2nd occurrence I want to keep the rest of the text.
I can't seem to figure out how to both key off of the invisible characters, and exclude them from my result. To match invisibles I've been using this regex: /\xA0\x00-\x09\x0B\x0C\x0E-\x1F\x7F/ which does seem to work.
Here's an example: [invisibles]Keep as match 1[invisibles]Keep as match 2
Here's what I've been using so far without success:
/([\xA0\x00-\x09\x0B\x0C\x0E-\x1F\x7F]+)(.+)([\xA0\x00-\x09\x0B\x0C\x0E-\x1F\x7F]+)/(.+)
I've got the capture groups in there, but it's been a while since I've had to use regexes in this way, so I know I'm missing something important. I was hoping to just make the invisible matches non-capturing groups, but it seems that JavaScript does not support this.
Something like this seems like what you want. The second regex you have pretty much works, but the / is in totally the wrong place. Perhaps you weren't properly reading out the group data.
var s = "\x0EKeep as match 1\x0EKeep as match 2";
var r = /[\xA0\x00-\x09\x0B\x0C\x0E-\x1F\x7F]+(.+)[\xA0\x00-\x09\x0B\x0C\x0E-\x1F\x7F]+(.+)/;
var match = s.match(r);
var part1 = match[1];
var part2 = match[2];
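On the point about non-capturing groups: JavaScript regular expressions do support them with the (?:...) syntax. A minimal sketch, reusing the character class from the question and the same sample input, with a lazy (.+?) for the first part (the variable names are made up for illustration):

```javascript
// (?:...) groups without capturing, so only the two text parts are numbered.
const pattern = /(?:[\xA0\x00-\x09\x0B\x0C\x0E-\x1F\x7F]+)(.+?)(?:[\xA0\x00-\x09\x0B\x0C\x0E-\x1F\x7F]+)(.+)/;

const input = '\x0EKeep as match 1\x0EKeep as match 2';
const m = input.match(pattern);

console.log(m.length); // 3: the whole match plus two captured groups
console.log(m[1]);     // "Keep as match 1"
console.log(m[2]);     // "Keep as match 2"
```

In this particular pattern the groups are optional, since a character class followed by + needs no grouping at all; they only become necessary when a quantifier or alternation has to apply to more than one element.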

Javascript string validation using the regex object

I am a complete novice at regex and JavaScript. I have the following problem: I need to check a text field for the existence of one or more consecutive * (asterisk) characters, e.g. * or ** or *** or any number (n) of *. Allowed strings are, for example, *tomato or tomato* or **tomato or tomato** or (n)*tomato(n)*. So far I have tried the following:
var str = 'a string'
var value = encodeURIComponent(str);
var reg = /([^\s]\*)|(\*[^\s])/;
if (reg.test(value) == true ) {
alert ('Watch out your asterisks!!!')
}
By your question it's hard to decipher what you're after... But let me try:
Only allow asterisks at beginning or at end
If you only allow an arbitrary number (at least one) of asterisks either at the beginning or at the end (but not on both sides) like:
*****tomato
tomato******
but not **tomato*****
Then use this regular expression:
reg = /^(?:\*+[^*]+|[^*]+\*+)$/;
Match front and back number of asterisks
If you require that the number of asterisks at the beginning matches the number of asterisks at the end, like
*****tomato*****
*tomato*
but not **tomato*****
then use this regular expression:
reg = /^(\*+)[^*]+\1$/;
Results?
It's unclear from your question what the result should be when each of these regular expressions matches. Whether strings that match the above regular expressions should be accepted or rejected depends on you and your requirements. As long as you have correct regular expressions, you're good to go and can provide the functionality you require.
I've also written my regular expressions to just exclude asterisks within the string. If you also need to reject spaces or anything else simply adjust the [^...] parts of above expressions.
Note: both regular expressions are untested but should get you started to build the one you actually need and require in your code.
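A quick way to check both expressions is to run them against the sample strings mentioned above. This is only a sketch; the variable names oneSide, balanced, samples and results are made up for illustration.

```javascript
// Asterisks on exactly one side (first pattern from the answer above).
const oneSide = /^(?:\*+[^*]+|[^*]+\*+)$/;
// Same number of asterisks on both sides, via the \1 backreference.
const balanced = /^(\*+)[^*]+\1$/;

const samples = ['*****tomato', 'tomato******', '**tomato*****', '*tomato*', 'tomato'];

// Record, for each sample, whether each pattern accepts it.
const results = samples.map(function (s) {
  return { input: s, oneSide: oneSide.test(s), balanced: balanced.test(s) };
});

results.forEach(function (r) {
  console.log(r.input, 'oneSide:', r.oneSide, 'balanced:', r.balanced);
});
```

With these inputs, only the first two strings pass the one-sided pattern and only *tomato* passes the balanced one.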
If I understand correctly you're looking for a pattern like this:
var pattern = /\**[^\s*]+\**/;
This won't match strings like ***** or ** ***, but will match ***d***, *d, or any of the examples that you say are valid (***tomatos etc.). If I misunderstood, let me know and I'll see what I can do to help. PS: we all started out as newbies at some point, nothing to be ashamed of, let alone apologize for :)
After the edit to your question I gather the use of an asterisk is required, either at the beginning or end of the input, but the string must also contain at least 1 other character, so I propose the following solution:
var pattern = /^\*+[^\s*]+|[^\s*]+\*+$/;
pattern.test('****'); // false
pattern.test(' ***tomato**'); // true
If, however, *tomato* is not allowed, you'll have to change the regex to:
var pattern = /^\*+[^\s*]+$|^[^\s*]+\*+$/;
Here's a handy site to help you find your way in the magical world of regular expressions.
