So I'm tackling the task of de-obfuscating some JavaScript code, and using www.jsbeautifier.org I have got readable code. Is there a search-and-replace query of some sort, or another method, to replace the random variable names with their actual content? E.g.:
O7 = "string";
o9 = "test";
function o9(O7) {
.... etc
}
to
function test(string) {
...... etc.
}
Thanks
You can't do this with a pure, language-agnostic regex, because regex substitution on its own has no conditional replacements. That means you can't do something like:
b(a)?, and say: if a is empty, replace the whole match with "c"; otherwise, with "d".
Why would that be useful? Keep reading to see what we'll be using to match the right text.
Currently, some regex flavors allow you to use different 'variables' within the substitution text.
(e.g.: $n, $', $&, $`...) - Take a look at Substitutions in Regular Expressions.
However, assuming you're deobfuscating JavaScript code with JavaScript, the regex you're looking for is:
/"[^"]*"|'[^']*'|\/\*[\s\S]*?\*\/|\/\/.*$|\b(<text>)\b/mg
Explanation
If you try it in Regex101, you'll see it matches \b<text>\b as well as any comment
(/* foo */, // bar) and any quoted text ("baz", 'qux'), which is exactly what we want. The first two parts of the regex are responsible for matching strings:
"[^"]*" - matches: "..."
'[^']*' - matches: '...'
That's intentional: because strings are matched first, a 'variable' name that appears inside a string is consumed as part of the string and is never replaced on its own. (These simple patterns do not handle strings that contain escaped quotes, such as "a\"b".)
The third part is responsible for multiline comments and the fourth for single-line comments:
\/\*[\s\S]*?\*\/ - matches: /*...*/
\/\/.*$ - matches: //... until the line breaks
Finally, the text we're searching for is not only matched by the regex but also captured. Take a look at the last part:
\b(<text>)\b - captures <text> (only the occurrences that were not already consumed by a string or comment alternative).
Now, in our script, we can match all the occurrences and replace a match with the output only when group one ($1) is not empty; otherwise we put the match back unchanged.
Result (TL;DR)
function deobfuscate(code, from, to){
var re = RegExp('"[^"]*"|\'[^\']*\'|\\/\\*[\\s\\S]*?\\*\\/|\\/\\/.*$|\\b('+ from +')\\b', 'gm');
return code.replace(re, function(match, g1) { return (g1) ? to:match; });
}
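The function above builds the pattern from `from` as-is and treats a backslash-escaped quote as the end of a string. A hedged variant is sketched below; the names `escapeRegExp` and `renameIdentifier` are made up for illustration and are not part of the original answer. It escapes regex metacharacters in the name and lets the string alternatives skip over escaped characters. Like the original, it still relies on \b, so names containing `$` are not handled.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Same capture-group trick as deobfuscate(), with two assumptions added:
// escaped quotes may appear inside strings, and `from` is plain text.
function renameIdentifier(code, from, to) {
  var re = new RegExp(
    '"(?:[^"\\\\]|\\\\.)*"' +              // double-quoted string, allowing \" inside
    "|'(?:[^'\\\\]|\\\\.)*'" +             // single-quoted string, allowing \' inside
    '|\\/\\*[\\s\\S]*?\\*\\/' +            // /* multiline comment */
    '|\\/\\/.*$' +                         // // single-line comment
    '|\\b(' + escapeRegExp(from) + ')\\b', // the identifier itself (group 1)
    'gm'
  );
  // Only replace when group 1 matched; strings and comments are put back as-is.
  return code.replace(re, function (match, g1) { return g1 ? to : match; });
}

console.log(renameIdentifier('var a = "say \\"o9\\" here"; o9(o9); // o9', 'o9', 'ok'));
```

The string and the comment keep their o9, while the two identifiers in o9(o9) become ok(ok).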
With the deobfuscate function, you can do what you want, for example by parsing:
O7 = "string";
o9 = "test";
function o9(O7) { ...
After retrieving the <toFind> = <toReplace> pairs from the start of the code, use the function (inside a loop or similar) like this:
code = deobfuscate(code, toFind[i], toReplace[i]);
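The loop described above could look roughly like the sketch below. `collectRenames` and `renameAll` are hypothetical helpers, and the sketch assumes every mapping sits on its own line in the exact form name = "newName"; with names that are plain identifiers. Real obfuscated code may well store its names differently.

```javascript
// deobfuscate() as given in the answer above.
function deobfuscate(code, from, to) {
  var re = RegExp('"[^"]*"|\'[^\']*\'|\\/\\*[\\s\\S]*?\\*\\/|\\/\\/.*$|\\b(' + from + ')\\b', 'gm');
  return code.replace(re, function (match, g1) { return g1 ? to : match; });
}

// Collect pairs from lines shaped like:  O7 = "string";
function collectRenames(code) {
  var assignment = /^\s*([A-Za-z_]\w*)\s*=\s*"([A-Za-z_]\w*)"\s*;\s*$/gm;
  var renames = [];
  var m;
  while ((m = assignment.exec(code)) !== null) {
    renames.push({ from: m[1], to: m[2] });
  }
  return renames;
}

// Apply every collected rename in turn.
function renameAll(code) {
  var renames = collectRenames(code);
  for (var i = 0; i < renames.length; i++) {
    code = deobfuscate(code, renames[i].from, renames[i].to);
  }
  return code;
}

var obfuscated = 'O7 = "string";\no9 = "test";\nfunction o9(O7) {\n  return O7;\n}\n';
console.log(renameAll(obfuscated));
```

With this input, the line function o9(O7) { becomes function test(string) {. The assignment lines are renamed too (string = "string";), so you may want to strip them afterwards.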
Working Example
/* Textarea & Inputs' DOMs */
var code = document.getElementById("code");
var from = document.getElementById("from");
var to = document.getElementById("to");
code.placeholder = "Code goes here...";
from.placeholder = "From";
to.placeholder = "To";
/* Example Values */
code.value = "Example: //Switch(?):\n"+
"function o9(o9) { //true, true\n"+
" o9 = 'o9'; //true, false\n"+
" /*\n"+
" o9 //false\n"+
" */\n"+
" var test = o9+\"o9\"+o9; //true, false, true\n"+
" return o9; //o9 //true, false\n"+
"}\n";
from.value = "o9";
to.value = "ok";
/* Called onclick action */
function doStuff(){
code.value = deobfuscate(code.value, from.value, to.value);
}
function deobfuscate(code, from, to){
var re = RegExp('"[^"]*"|\'[^\']*\'|\\/\\*[\\s\\S]*?\\*\\/|\\/\\/.*$|\\b('+ from +')\\b', 'gm');
return code.replace(re, function(match, g1) { return (g1) ? to:match; });
}
<html>
<body>
<textarea id="code" rows="10" cols="55"></textarea> <br>
<input id="from"/> → <input id="to"/> <br><br>
<button onclick="doStuff()">Deobfuscate</button>
</body>
</html>
OBS
As your question is not clear enough, I can't tell exactly what you are looking for; a lot is possible. For example, should the script find the random variable names in the code by itself and then replace them all? That would require a dictionary of words, and it wouldn't really deobfuscate the code, because you must still specify the input and output. If there's something you think I'm missing, please say so in the comments.
Related
I have such a sentence:
(CAR AND BUS) OR TRAM
I need to add quotes to all words except AND (it can be OR instead of AND).
So I wrote this code:
word.replace(/"/g, '').split(" ").map(e => ["AND", "OR"].includes(e) ? e : '"' + e + '"').join(" ");
but as an output, I have an incorrectly formatted query like
"(CAR" AND "BUS)" OR "TRAM"
I need the quotes not to include the (), so the output I expect is
("CAR" AND "BUS") OR "TRAM"
How can I achieve such a result?
"(CAR AND BUS) OR TRAM".replace(/([a-zA-Z]+)/gi, function(word){
if(["AND", "OR"].indexOf(word) < 0) return `"${word}"`;
else return word
})
There are plenty of approaches to solve your task. Here is one that is more procedural and does not use regular expressions.
Basically your task is to split the sentence into words, then process each word, checking whether it requires processing and, if so, applying your ruleset.
Of course this can be written more concisely with a regular expression (see the other answers), but a more explicit approach can be useful too, especially if some people on your team are less comfortable with regular expressions.
var sentence = "(CAR AND BUS) OR TRAM"; // Input data
var words = sentence.split(" "); // Get each word of the input
var exclude = ["AND", "OR"]; // Words that should be ignored when processing
var result = []; // Result goes here
words.forEach(word => { // Loop over each word
  if (exclude.includes(word)) { // Exclude from further processing?
    result.push(word); // Yes: put into result unchanged
  } else { // No: quote the bare word, keeping a leading ( or trailing ) outside the quotes
    var open = word.startsWith("(") ? "(" : "";
    var close = word.endsWith(")") ? ")" : "";
    var bare = word.slice(open.length, word.length - close.length);
    result.push(open + "\"" + bare + "\"" + close);
  }
});
console.log(result.join(" ")); // ("CAR" AND "BUS") OR "TRAM"
Using replace()
Demo: https://regex101.com/r/1kC6zT/2
console.log("(CAR AND BUS) OR TRAM".replace(/(?!AND|OR)(\b[^\s]+\b)/g, '"$1"'))
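One caveat with the lookahead (?!AND|OR): it also rejects words that merely start with AND or OR, such as ORANGE or ANDROID. A possible variation, sketched here under the assumption that search terms consist only of word characters, adds a word boundary inside the lookahead. The function name `quoteTerms` is made up for this example.

```javascript
// Quote every word except the standalone operators AND and OR.
// (?!(?:AND|OR)\b) rejects a word only when the operator is the whole word.
function quoteTerms(query) {
  return query.replace(/\b(?!(?:AND|OR)\b)\w+\b/g, '"$&"');
}

console.log(quoteTerms("(CAR AND BUS) OR TRAM"));        // ("CAR" AND "BUS") OR "TRAM"
console.log(quoteTerms("(ORANGE AND ANDROID) OR TRAM")); // ("ORANGE" AND "ANDROID") OR "TRAM"
```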
I'm trying to extract a group of words, separated by hyphens, from a larger string/cookie. I would like to replace the hyphens with spaces and assign the result to a variable, using JavaScript or jQuery.
As an example, the larger string has a name and value like this within it:
facility=34222%7CConner-Department-Store;
(notice the leading "C")
So first, I need to match()/find facility=34222%7CConner-Department-Store; with regex. Then break it down to "Conner Department Store"
var cookie = document.cookie;
var facilityValue = cookie.match( REGEX ); ??
var test = "store=874635%7Csomethingelse;facility=34222%7CConner-Department-Store;store=874635%7Csomethingelse;";
var test2 = test.replace(/^(.*)facility=([^;]+)(.*)$/, function(matchedString, match1, match2, match3){
return decodeURIComponent(match2);
});
console.log( test2 );
console.log( test2.split('|')[1].replace(/[-]/g, ' ') );
If I understood correctly, you want to build a phrase from all the words between the hyphens, where a word never contains two successive uppercase letters, so I'd prefer using a regex in this case.
This is a regex solution that works with any cookie in the same format and extracts the wanted phrase from it:
var matches = str.match(/([A-Z][a-z]+)-?/g);
console.log(matches.map(function(m) {
return m.replace('-', '');
}).join(" "));
Demo:
var str = "facility=34222%7CConner-Department-Store;";
var matches = str.match(/([A-Z][a-z]+)-?/g);
console.log(matches.map(function(m) {
return m.replace('-', '');
}).join(" "));
Explanation:
Use this regex, /([A-Z][a-z]+)-?/g, to match the words between the hyphens.
Remove the trailing - from each matched word.
Then just join the array of matches with a space.
Ok,
first, you should decode this string as follows:
var str = "facility=34222%7CConner-Department-Store;"
var decoded = decodeURIComponent(str);
// decoded = "facility=34222|Conner-Department-Store;"
Then you have multiple possibilities to split up this string.
The easiest way is to use substring()
var solution1 = decoded.substring(decoded.indexOf('|') + 1, decoded.length)
// solution1 = "Conner-Department-Store;"
// a regex with the g flag replaces every hyphen; replace('-', ' ') would only replace the first one
solution1 = solution1.replace(/-/g, ' ');
// solution1 = "Conner Department Store;"
As you can see, substring(arg1, arg2) returns the part of the string that starts at index arg1 and ends just before index arg2.
If you want to cut off the trailing ; just pass decoded.length - 1 as arg2 in the snippet above.
decoded.substring(decoded.indexOf('|') + 1, decoded.length - 1)
//returns "Conner-Department-Store"
or all of the above in just one line:
decoded.substring(decoded.indexOf('|') + 1, decoded.length - 1).replace(/-/g, ' ')
If you still want to use a regular expression to retrieve (perhaps more) data from the string, you could use something similar to this snippet:
var solution2 = "";
var regEx= /([A-Za-z]*)=([0-9]*)\|(\S[^:\/?#\[\]\#\;\,']*)/;
if (regEx.test(decoded)) {
solution2 = decoded.match(regEx);
/* returns
[0:"facility=34222|Conner-Department-Store",
1:"facility",
2:"34222",
3:"Conner-Department-Store",
index:0,
input:"facility=34222|Conner-Department-Store;"
length:4] */
solution2 = solution2[3].replace(/-/g, ' ');
// "Conner Department Store"
}
I have applied some rules for the regex to work; feel free to modify them according to your needs.
facility can be any Word built with alphabetical characters lower and uppercase (no other chars) at any length
= needs to be the char =
34222 can be any number but no other characters
| needs to be the char |
Conner-Department-Store can be any characters except one of the following (reserved delimiters): :/?#[]#;,'
Hope this helps :)
Edit: to find only the part
facility=34222%7CConner-Department-Store; just modify the regex to
match facility= instead of ([A-Za-z]*)=:
/(facility)=([0-9]*)\|(\S[^:\/?#\[\]\#\;\,']*)/
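Putting the steps from this answer together (find the cookie, decode it, cut at the |, replace the hyphens), a minimal sketch could look like this. `getFacilityName` is a hypothetical helper name, and the sketch assumes the cookie is always called facility and that its value has the form id|Name-With-Hyphens.

```javascript
// Extract "Conner Department Store" from a cookie string such as document.cookie.
function getFacilityName(cookieString) {
  // Find facility=... at the start of the string or right after a ";" separator.
  var m = /(?:^|;\s*)facility=([^;]*)/.exec(cookieString);
  if (!m) {
    return null; // no facility cookie present
  }
  var decoded = decodeURIComponent(m[1]);                 // "34222|Conner-Department-Store"
  var name = decoded.substring(decoded.indexOf('|') + 1); // text after the "|"
  return name.replace(/-/g, ' ');                         // every hyphen becomes a space
}

var sampleCookie = "store=874635%7Csomethingelse; facility=34222%7CConner-Department-Store; lang=en";
console.log(getFacilityName(sampleCookie)); // "Conner Department Store"
```

In a browser you would pass document.cookie instead of the sample string.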
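You can use cookies.js, a small framework published on MDN (Mozilla Developer Network).
Simply include the cookies.js file in your application, and read the cookie by its name:
docCookies.getItem("facility");
This returns the decoded cookie value (here 34222|Conner-Department-Store), which you still need to split at the | and strip of its hyphens.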
Below I have a sentence and the desiredResult for that sentence. Using the pattern below I can match the t T that needs to be changed to t, t, but I don't know where to go from there.
var sentence = "Over the candidate behaves the patent Then the doctor.";
var desiredResult = "Over the candidate behaves the patent, then the doctor.";
var pattern = /[a-z]\s[A-Z]/g;
I want to correct the sentence by adding a comma and a space before a capital letter other than 'I' (and lowercasing that letter) if the preceding letter is lowercase.
Use .replace() on your sentence and pass a replacer function as the second parameter:
var corrected = sentence.replace(
/([a-z])\s([A-Z])/g,
function(m,s1,s2){ //arguments: whole match (t T), subgroup1 (t), subgroup2 (T)
return s1+', '+s2.toLowerCase();
}
);
As for preserving uppercased I, there are many ways, one of them:
var corrected = sentence.replace(
/([a-z])\s([A-Z])(.)/g,
function(m,s1,s2,s3){
return s1+((s2=='I' && /[^a-z]/i.test(s3))?(' '+s2):(', '+s2.toLowerCase()))+s3;
}
);
But there are more cases where it will fail, for example: His name is Joe., WTF is an acronym for What a Terrible Failure. and many others.
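To make both the behaviour and the limitation concrete, here is the I-preserving replacer wrapped in a function and run on a few inputs. The function name `addCommas` and the extra sample sentences are made up for this sketch.

```javascript
// Insert ", " before a capitalised word that follows a lowercase letter and
// lowercase that capital, except for the standalone pronoun "I".
function addCommas(sentence) {
  return sentence.replace(/([a-z])\s([A-Z])(.)/g, function (m, before, capital, next) {
    // "I" followed by a non-letter is treated as the pronoun and left alone.
    var isPronounI = capital === 'I' && /[^a-z]/i.test(next);
    return before + (isPronounI ? ' ' + capital : ', ' + capital.toLowerCase()) + next;
  });
}

console.log(addCommas("Over the candidate behaves the patent Then the doctor."));
// Over the candidate behaves the patent, then the doctor.
console.log(addCommas("She said I am fine Then left."));
// She said I am fine, then left.
console.log(addCommas("His name is Joe."));
// His name is, joe.   (a proper noun is mangled, which is the limitation noted above)
```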
I have text like the following, with embedded spaces that show indentation of some xml data:
<Style id="KMLStyler"><br>
<IconStyle><br>
<colorMode>normal</colorMode><br>
I need to use JavaScript to replace each LEADING space with &nbsp;
so that it looks like this:
<Style id="KMLStyler"><br>
<IconStyle><br>
<colorMode>normal</colorMode><br>
I have tried a basic replace, but it is matching all spaces, not just the leading ones. I want to leave all the spaces alone except the leading ones. Any ideas?
JavaScript does not have the convenient \G (and, when this was written, not even look-behinds, which only arrived with ES2018), so there was no pure regex solution for this AFAIK. How about something like this:
function foo() {
  var leadingSpaces = arguments[0].length; // arguments[0] is the matched run of leading whitespace
  var str = '';
  while (leadingSpaces > 0) {
    str += '&nbsp;';
    leadingSpaces--;
  }
  return str;
}
var s = "   A B C";
print(s.replace(/^[ \t]+/mg, foo));
which produces:
&nbsp;&nbsp;&nbsp;A B C
Tested here: http://ideone.com/XzLCR
EDIT
Or do it with an anonymous inner function (is it called that?), as suggested by glebm in the comments:
var s = "   A B C";
print(s.replace(/^[ \t]+/gm, function(x){ return new Array(x.length + 1).join('&nbsp;') }));
See that in action here: http://ideone.com/3JU52
Use ^ to anchor your pattern at the beginning of the string, or, if you're dealing with a multiline string (i.e. one with embedded newlines), use the m flag so that ^ also matches after each newline. You will need to match the whole run of leading spaces at once, and then, in the replacement, check the length of what was matched to figure out how many nbsps to insert.
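On engines that support String.prototype.repeat and ES2018 lookbehind, the same idea can be written more compactly. The two functions below are only a sketch; their names are invented for this example, and the second one handles spaces only (not tabs).

```javascript
// Variant 1: match the whole run of leading whitespace and measure it.
function leadingSpacesToNbsp(text) {
  return text.replace(/^[ \t]+/gm, function (run) {
    return '&nbsp;'.repeat(run.length);
  });
}

// Variant 2: match one space at a time, but only when nothing except spaces
// precedes it on its line (requires lookbehind support).
function leadingSpacesToNbspLookbehind(text) {
  return text.replace(/(?<=^ *) /gm, '&nbsp;');
}

var sample = '<Style id="KMLStyler">\n  <IconStyle>\n    <colorMode>normal</colorMode>';
console.log(leadingSpacesToNbsp(sample));
console.log(leadingSpacesToNbspLookbehind(sample));
```

Both calls leave the space inside <Style id="KMLStyler"> alone and turn the two and four leading spaces of the following lines into the same number of &nbsp; entities.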
I'm trying to extract a substring from a file with JavaScript Regex. Here is a slice from the file :
DATE:20091201T220000
SUMMARY:Dad's birthday
The field I want to extract is "SUMMARY". Here is my approach:
extractSummary : function(iCalContent) {
/*
input : iCal file content
return : Event summary
*/
var arr = iCalContent.match(/^SUMMARY\:(.)*$/g);
return(arr);
}
function extractSummary(iCalContent) {
var rx = /\nSUMMARY:(.*)\n/g;
var arr = rx.exec(iCalContent);
return arr[1];
}
You need these changes:
Put the * inside the parentheses, as suggested above. Otherwise your matching group will contain only one character.
Get rid of the ^ and $. Without the m (multiline) flag they match at the start and end of the full string, rather than at the start and end of each line. Match on explicit newlines instead.
I suppose you want the matching group (what's inside the parentheses) rather than the full array? arr[0] is the full match ("\nSUMMARY:...") and the following indexes contain the group matches.
With the g flag, String.match(regexp) returns only the full matches, not the groups (this is what I see in Safari on Mac), but RegExp.exec(string) returns the groups as well.
You need to use the m flag:
multiline; treat beginning and end characters (^ and $) as working
over multiple lines (i.e., match the beginning or end of each line
(delimited by \n or \r), not only the very beginning or end of the
whole input string)
Also put the * in the right place:
"DATE:20091201T220000\r\nSUMMARY:Dad's birthday".match(/^SUMMARY\:(.*)$/gm);
// note the (.*) group and the m flag at the end of the pattern
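Combining the two points above (the m flag and the (.*) group), a small self-contained version of the asker's function might look like the sketch below. It uses exec without the g flag so that the capture group is available, and it returns null when there is no SUMMARY line; that null convention is an assumption of this sketch, not something the question specifies.

```javascript
// Return the text after "SUMMARY:" on the first matching line, or null if absent.
function extractSummary(iCalContent) {
  // m flag: ^ and $ match at the start and end of every line.
  var match = /^SUMMARY:(.*)$/m.exec(iCalContent);
  return match ? match[1] : null;
}

console.log(extractSummary("DATE:20091201T220000\r\nSUMMARY:Dad's birthday\r\n")); // Dad's birthday
console.log(extractSummary("DATE:20091201T220000"));                               // null
```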
Your regular expression most likely wants to be
/\nSUMMARY:(.*)$/gm
(the m flag lets $ match at the end of each line rather than only at the end of the whole input).
A helpful little trick I like to use is to default-assign on match with an array.
var arr = iCalContent.match(/\nSUMMARY:(.*)$/gm) || [""]; //could also use null for empty value
return arr[0];
This way you don't get annoying type errors when you go to use arr.
This code works:
let str = "governance[string_i_want]";
let res = str.match(/[^governance\[](.*)[^\]]/g);
console.log(res);
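res will equal ["string_i_want"]. Note that res is an array, so do not treat it like a string.
Be aware that [^governance\[] is a negated character class, not a group: it matches a single character that is none of g, o, v, e, r, n, a, c or [. The pattern therefore starts matching at the first character outside that set, which here happens to be the s right after the bracket, and it would miss a value that begins with one of those letters.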
You can try it out here: https://www.w3schools.com/jsref/tryit.asp?filename=tryjsref_match_regexp
Good luck.
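Because the pattern above depends on which letters the wanted text starts with, a capture group between literal brackets is a more predictable way to get the same result. The helper name `extractBracketed` is invented for this sketch.

```javascript
// Return the text between the first "[" and the next "]", or null if there is none.
function extractBracketed(text) {
  var match = /\[([^\]]*)\]/.exec(text);
  return match ? match[1] : null;
}

console.log(extractBracketed("governance[string_i_want]")); // string_i_want
console.log(extractBracketed("governance[go]"));            // go
```

The second input is one where "governance[go]".match(/[^governance\[](.*)[^\]]/g) returns null, because every character before the closing bracket is in the excluded set.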
(.*) instead of (.)* would be a start. The latter will only capture the last character on the line.
Also, no need to escape the :.
You should use this:
var arr = iCalContent.match(/^SUMMARY\:(.*)$/gm);
return(arr[0]); // the full matching line, e.g. "SUMMARY:Dad's birthday"
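This is how you can parse iCal files with JavaScript:
function calParse(text) {
  // Unfold continuation lines (a line break followed by one space or tab), then split into lines
  var lines = text.replace(/\r?\n[ \t]/g, "").split(/\r?\n/);
  function parse() {
    var obj = {};
    while (lines.length) {
      var parts = lines.shift().split(":");
      var key = parts.shift();
      var value = parts.join(":"); // re-join so values that contain ":" stay intact
      switch (key) {
        case "BEGIN":
          // nested component, e.g. VEVENT (a repeated component overwrites the previous one)
          obj[value] = parse();
          break;
        case "END":
          return obj;
        case "":
          break; // skip blank lines
        default:
          obj[key] = value;
      }
    }
    return obj;
  }
  return parse().VCALENDAR;
}
var example =
  'BEGIN:VCALENDAR\n'+
  'VERSION:2.0\n'+
  'PRODID:-//hacksw/handcal//NONSGML v1.0//EN\n'+
  'BEGIN:VEVENT\n'+
  'DTSTART:19970714T170000Z\n'+
  'DTEND:19970715T035959Z\n'+
  'SUMMARY:Bastille Day Party\n'+
  'END:VEVENT\n'+
  'END:VCALENDAR\n';
var cal = calParse(example);
alert(cal.VEVENT.SUMMARY); // "Bastille Day Party"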