I need to fetch a particular function and its body as text from a JavaScript file and print that function as output using C#. The function name and the JS file are given as input parameters. I tried using a regex but couldn't achieve the desired result. Here is the regex code:
public void getFunction(string jstext, string functionname)
{
Regex regex = new Regex(@"function\s+" + functionname + @"\s*\(.*\)\s*\{");
Match match = regex.Match(jstext);
}
Is there any other way I can do this?
This answer is based on the assumption you stated in the comments: that the C# function only needs to find function declarations, and not any form of function expression.
As I pointed out in the comments, JavaScript is too complex to be expressed efficiently in a regular expression. The only way to know you've reached the end of the function is when the braces all match up, and even then you still need to take escape characters, comments, and strings into account.
The only way I can think of to achieve this, is to actually iterate through every single character, from the start of your function body, until the brackets match up, and keep track of anything odd that comes along.
Such a solution is never going to be very pretty. I've pieced together an example of how it might work, but knowing how javascript is riddled with little quirks and pitfalls, I am convinced there are many corner cases not considered here. I'm also sure it could be made a bit tidier.
From my first experiments, the following should handle escape characters, multi- and single line comments, strings that are delimited by ", ' or `, and regular expressions (i.e. delimited by /).
This should get you pretty far, although I'm intrigued to see what exceptions people can come up with in comments:
private static string GetFunction(string jstext, string functionname) {
var start = Regex.Match(jstext, @"function\s+" + functionname + @"\s*\([^)]*\)\s*{");
if(!start.Success) {
throw new Exception("Function not found: " + functionname);
}
StringBuilder sb = new StringBuilder(start.Value);
jstext = jstext.Substring(start.Index + start.Value.Length);
var brackets = 1;
var i = 0;
var delimiters = "`/'\"";
string currentDelimiter = null;
var isEscape = false;
var isComment = false;
var isMultilineComment = false;
while(brackets > 0 && i < jstext.Length) {
var c = jstext[i].ToString();
var wasEscape = isEscape;
if(isComment || !isEscape)
{
if(c == @"\") {
// Found escape symbol.
isEscape = true;
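} else if(i > 0 && !isComment && currentDelimiter == "/" && (c == "*" || c == "/") && jstext[i-1] == '/') {
// Found start of a comment block; the preceding '/' was not a regex delimiter after all
isComment = true;
isMultilineComment = c == "*";
currentDelimiter = null;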
} else if(c == "\n" && isComment && !isMultilineComment) {
// Found termination of single-line comment
isComment = false;
} else if(isMultilineComment && c == "/" && jstext[i-1] == '*') {
// Found termination of multiline comment
isComment = false;
isMultilineComment = false;
} else if(delimiters.Contains(c)) {
// Found a string or regex delimiter
currentDelimiter = (currentDelimiter == c) ? null : currentDelimiter ?? c;
}
// The current symbol doesn't appear to be commented out, escaped or in a string
// If it is a bracket, we should treat it as one
if(currentDelimiter == null && !isComment) {
if(c == "{") {
brackets++;
}
if(c == "}") {
brackets--;
}
}
}
sb.Append(c);
i++;
if(wasEscape) isEscape = false;
}
return sb.ToString();
}
I am trying to find a non-deprecated way to compare two strings in TypeScript and/or JavaScript that works for any string made of the special characters available on a standard keyboard. Our application fetches a randomly generated password from the backend and displays it in our PASSWORD and CONFIRM PASSWORD fields, rather than fetching the encrypted password from the DB and including it in the response, where someone attempting to crack our password hashing algorithm could possibly decrypt it with enough effort.
The application compares PASSWORD to CONFIRM PASSWORD using the eval() function, because the validation condition is loaded from the DB and could be "FIELD_1 > FIELD_2", not always an equality check. It replaces the field names in the condition with the fields' current form values using a replaceAll() helper, then evaluates the condition with eval(). However, the helper builds a regex, and a regex cannot contain certain special characters unescaped, or else an error occurs.
So, to try to find a solution that works for ALL strings/combinations of characters, I added code at the bottom of my question which will generate random strings and compare them for equality 3000 times, and if there are no issues, it will not print any failure messages. I noticed I can escape MOST strings successfully using the following:
function escapeSpecialCharactersInTestString(str: string) {
str = str.replace(/\$/g, '$$$$');
str = str.replace(/[']/g, "\\'");
return str;
}
However, it is still failing for strings like
\3aeB296Z=DH\D"]Yu[0;MC.dep.UeE8g]&}sz)6N|M?.]q:%/ because of the \3 giving the error Octal escape sequences are not allowed in strict mode. I don't want to have to disable strict mode somehow and shouldn't be required to. However, if I use the deprecated "escape()" function instead, it always works.
function escapeSpecialCharactersInTestString(str: string) {
return escape(str)
}
Looking for a solution that can replace any sequence of characters / any string, and I don't want to worry about using deprecated code.
Code that tests 3000 equal, randomly generated strings for equality and stops if it can't compare them:
const normalCharacters =
"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
const specialCharacters = "#:;#&-?/%+*[]\\_=^!()|{}\",.";
function escapeSpecialCharactersInTestString(str: string) {
// console.log(str)
// WORKS 100% but is deprecated:
// str = escape(str)
str = str.replace(/\$/g, '$$$$');
str = str.replace(/[']/g, "\\'");
return str;
}
function generateRandomString(length: number) {
let characterList = normalCharacters + specialCharacters;
let result = "";
while (length > 0) {
let index = Math.floor(Math.random() * characterList.length);
result += characterList[index];
length--;
}
return result;
}
function evaluateConditionWithTrueOrFalseResult(testString: string) {
let conditionResult = false
let evalResult;
try {
evalResult = eval(testString);
} catch (e) {
console.log(e)
}
if (evalResult == true) {
conditionResult = true
} else if (evalResult == false) {
conditionResult = false
} else {
conditionResult = false
}
return conditionResult
}
function replaceAll(str: any, find: any, replace: any) {
//console.log("str: ",str,", find: ",find,", replace: ",replace)
if(str == null) {
return str;
}
if(find !== undefined && find !== null && find !== '' && find !== true && find !== false && !isDate(find) && isNaN(find)) {
find = escapeSpecialCharactersInTestString(find)
}
if(replace !== undefined && replace !== null && replace !== '' && replace !== true && replace !== false && !isDate(replace) && isNaN(replace)) {
//console.log("escaping special chars: ",replace)
replace = escapeSpecialCharactersInTestString(replace)
//console.log("after: ",replace)
}
// replaces whole word only because of the \\b 's
return str.replace(new RegExp('\\b' + find + '\\b', "g"),replace);
}
function isDate(val: any) {
let isDate = false
if(val !== undefined && val !== null) {
if(typeof val === 'object') {
if(val._isAMomentObject === true) {
isDate = true
}
}
if(Object.prototype.toString.call(val) === '[object Date]') {
isDate = true
}
}
// console.log("is ",val," a date: ",isDate)
return isDate
}
let myOrigString = "'PASSWORD' === 'CONFIRM_PASSWORD'";
for (let i = 0; i < 3000; i++) {
let formValue = generateRandomString(50)
let escapedFormValue = escapeSpecialCharactersInTestString(formValue)
let myReplacedString = replaceAll(myOrigString, 'PASSWORD', escapedFormValue)
myReplacedString = replaceAll(myReplacedString, 'CONFIRM_PASSWORD', escapedFormValue)
if (i > 0 && i % 100 == 0) {
console.log("at iteration: "+i)
}
if (evaluateConditionWithTrueOrFalseResult(myReplacedString) === false) {
console.log(myReplacedString, " failed at iteration: " + i)
break;
}
}
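One possible way around the escaping problem, offered as a sketch rather than a drop-in fix: escape the regex metacharacters in the field name, and insert each value as a string literal produced by JSON.stringify through a replacer function, so that backslashes, quotes and $ sequences are never interpreted. The helper names escapeRegExp and substituteFields are hypothetical, and the sketch assumes the stored condition uses bare field names (PASSWORD === CONFIRM_PASSWORD) rather than quoted ones.

```javascript
// Hypothetical helpers; the names are not from the original code.

// Escape every character that has a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace each whole-word field name in `condition` with a JS string literal.
// All fields are substituted in a single pass, so a value that happens to
// contain another field's name is never substituted a second time.
function substituteFields(condition, values) {
  const names = Object.keys(values).map(escapeRegExp);
  const pattern = new RegExp("\\b(?:" + names.join("|") + ")\\b", "g");
  // A replacer function inserts the value literally ("$&" is not expanded);
  // JSON.stringify takes care of quotes and backslashes.
  return condition.replace(pattern, (name) => JSON.stringify(values[name]));
}

// The password from the question that triggered the octal escape error.
const password = String.raw`\3aeB296Z=DH\D"]Yu[0;MC.dep.UeE8g]&}sz)6N|M?.]q:%/`;
const expression = substituteFields("PASSWORD === CONFIRM_PASSWORD", {
  PASSWORD: password,
  CONFIRM_PASSWORD: password,
});
console.log(eval(expression));
```

Because the values arrive in the expression as complete string literals, no per-character escaping function (deprecated or otherwise) is needed.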
My intention is to build a simple process with which I can split the word into syllables. The approach is to split the word whenever the vowel occurs. However, the trouble is when a consonant is not followed by a vowel, in such a case the split occurs at that consonant.
My test cases are as follows:
hair = ["hair"]
hairai = ["hai", "rai"]
hatred = ["hat", "red"]
In the first example hair is one syllable, as the final consonant is not followed by a vowel. Similarly, in the final example, the "t" is followed by an "r" and so should be considered together with "ha" as one syllable.
In the second example, ai is considered as one vowel sound and so hai will become one syllable.
More examples include
father = ["fat", "her"]
kid = ["kid"]
lady = ["la","dy"]
Please note that I am using simplistic examples, as the English language is quite complex when it comes to sound.
My code is as follows
function syllabify(input) {
var arrs = [];
for (var i in input) {
var st = '';
var curr = input[i];
var nxt = input[i + 1];
if ((curr == 'a') || (curr == 'e') || (curr == 'i') || (curr == 'o') || (curr == 'u')) {
st += curr;
} else {
if ((nxt == 'a') || (nxt == 'e') || (nxt == 'i') || (nxt == 'o') || (nxt == 'u')) {
st += nxt;
} else {
arrs.push(st);
st = '';
}
}
}
console.log(arrs);
}
syllabify('hatred')
However, my code does not even return the strings. What am I doing wrong?
Problems with your current approach
There are a number of problems with your code:
First thing in the loop, you set st to an empty string. This means that you never accumulate any letters. You probably want that line above, outside the loop.
You are trying to loop over the indexes of the letters with for (var i in input). In JavaScript, a for...in loop gives you the enumerable property keys of an object as strings. So you get strings, not numbers (which turns i + 1 into a string concatenation), plus any enumerable properties that may have been added to String.prototype. Try for (var i = 0; i < input.length; i++) instead.
Maybe not the direct cause of the problems, but still - your code is messy. How about these?
Use clearer names. currentSyllable instead of st, syllables instead of arrs and so on.
Instead of a nested if - else, use one if - else if - else.
You repeat the same code that checks for vowels twice. Separate it into a function isVowel(letter) instead.
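To illustrate the point about for...in, here is a small sketch (the function names are made up for this example) that collects the "next letter" using both loop styles. With for...in the key is a string, so adding 1 builds a key like "01" that does not exist on the string.

```javascript
// Hypothetical demo functions, not part of the original code.
function nextLettersWithForIn(input) {
  const seen = [];
  for (const i in input) {
    // i is a string such as "0", so i + 1 is "01", not 1.
    seen.push(input[i + 1]);
  }
  return seen;
}

function nextLettersWithIndex(input) {
  const seen = [];
  for (let i = 0; i < input.length; i++) {
    // i is a number here, so i + 1 is the following index.
    seen.push(input[i + 1]);
  }
  return seen;
}

console.log(nextLettersWithForIn("hat"));
console.log(nextLettersWithIndex("hat"));
```

The first call yields only undefined entries, while the second yields the following letters, with undefined only past the end of the word.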
A new approach
Use regular expressions! Here is your definition of a syllable expressed in regex:
First, zero or more consonants: [^aeiouy]*
Then, one or more vowels: [aeiouy]+
After that, zero or one of the following:
Consonants, followed by the end of the word: [^aeiouy]*$
A consonant (if it is followed by another consonant): [^aeiouy](?=[^aeiouy])
Taken together you get this:
/[^aeiouy]*[aeiouy]+(?:[^aeiouy]*$|[^aeiouy](?=[^aeiouy]))?/gi
You can see it in action here. To run it in JavaScript, use the match function:
const syllableRegex = /[^aeiouy]*[aeiouy]+(?:[^aeiouy]*$|[^aeiouy](?=[^aeiouy]))?/gi;
function syllabify(words) {
return words.match(syllableRegex);
}
console.log(['away', 'hair', 'halter', 'hairspray', 'father', 'lady', 'kid'].map(syllabify))
Note that this does not work for words without vowels. You would either have to modify the regex to accommodate that case, or use some other workaround.
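One simple workaround, as a sketch: match returns null when nothing matches, so you can fall back to treating the whole word as a single syllable. The name syllabifySafe is just a placeholder for this example.

```javascript
const syllableRegex = /[^aeiouy]*[aeiouy]+(?:[^aeiouy]*$|[^aeiouy](?=[^aeiouy]))?/gi;

// Hypothetical wrapper: falls back to the whole word when the regex finds
// no vowel group (String.prototype.match returns null in that case).
function syllabifySafe(word) {
  return word.match(syllableRegex) || [word];
}

console.log(["hatred", "lady", "hmm"].map(syllabifySafe));
```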
I am weak in the ways of RegEx, and while Anders' example is right most of the time, I did find a few exceptions. Here is what I have found to work so far (but I am sure there are other exceptions I have not found yet). I am sure it can be RegEx'ified by masters of the art. This function returns an array of syllables.
function getSyllables(word){
var response = [];
var isSpecialCase = false;
var nums = (word.match(/[aeiou]/gi) || []).length;
//debugger;
if (isSpecialCase == false && (word.match(/[0123456789]/gi) || []).length == word.length ){
// has digits
response.push(word);
isSpecialCase = true;
}
if (isSpecialCase == false && word.length < 4){
// three letters or less
response.push(word);
isSpecialCase = true;
}
if (isSpecialCase == false && word.charAt(word.length-1) == "e"){
if (isVowel(word.charAt(word.length-2)) == false){
var cnt = (word.match(/[aeiou]/gi) || []).length;
if (cnt == 3){
if (hasDoubleVowels(word)){
// words like "piece, fleece, grease"
response.push(word);
isSpecialCase = true;
}
}
if (cnt == 2){
// words like "phase, phrase, blaze, name",
if (hasRecurringConsonant(word) == false) {
// but not like "syllable"
response.push(word);
isSpecialCase = true;
}
}
}
}
if (isSpecialCase == false){
const syllableRegex = /[^aeiouy]*[aeiouy]+(?:[^aeiouy]*$|[^aeiouy](?=[^aeiouy]))?/gi;
response = word.match(syllableRegex);
}
return response;
}
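The function above relies on three helpers (isVowel, hasDoubleVowels and hasRecurringConsonant) that are not shown. The following is only a guess at what they might look like, based on the comments in the code ("piece, fleece, grease" for double vowels, "syllable" for a recurring consonant); the real implementations may differ.

```javascript
// Assumed implementations; the original helpers were not posted.

// True if the single character is one of a, e, i, o, u (any case).
function isVowel(letter) {
  return /^[aeiou]$/i.test(letter);
}

// True if the word contains two vowels in a row, e.g. "ie" in "piece".
function hasDoubleVowels(word) {
  return /[aeiou]{2}/i.test(word);
}

// True if the same consonant appears twice in a row, e.g. "ll" in "syllable".
function hasRecurringConsonant(word) {
  return /([^aeiou\W\d_])\1/i.test(word);
}

console.log(hasDoubleVowels("piece"), hasRecurringConsonant("syllable"));
```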
The function compress() would accept a sentence and return a string with all the blanks and punctuation removed. This function must call isWhiteSpace() and isPunct().
I've already done the functions to call, but I don't know what's missing from my js code to make it call the functions.
function compress(sent) {
var punc = "; : . , ? ! - '' "" () {}";
var space = " ";
if (punc.test(param)) {
return true
} else {
return false
}
if (space.test(param)) {
return true
} else {
return false
}
isWhiteSpace(x);
isPunct(x);
}
This function must call isWhiteSpace() and isPunct().
So you already have two functions which, I assume, return true when the passed character is whitespace or a punctuation mark, respectively. Then you need not, and should not, duplicate this functionality by implementing a second regex-based test for whitespace and punctuation in your code. Keep it DRY: don't repeat yourself.
A compress function based on these two functions would look as follows:
function isWhiteSpace(char) {
return " \t\n".includes(char);
}
function isPunct(char) {
return ";:.,?!-'\"(){}".includes(char);
}
function compress(string) {
return string
.split("")
.filter(char => !isWhiteSpace(char) && !isPunct(char))
.join("");
}
console.log(compress("Hi! How are you?"));
I agree that a regex test would probably be the go-to choice in a real-world scenario:
function compress(string) {
return (string.match(/\w/g) || []).join("");
}
However, you specifically asked for a solution which calls isWhiteSpace and isPunct.
You can leverage String.indexOf to design the isPunct function.
function isPunct(x) {
// list of punctuation from the original question above
var punc = ";:.,?!-'\"(){}";
// if `x` is not found in `punc` this `x` is not punctuation
if(punc.indexOf(x) === -1) {
return false;
} else {
return true;
}
}
Solving isWhiteSpace is easier.
function isWhiteSpace(x) {
if(x === ' ') {
return true;
} else {
return false;
}
}
You can put it all together with a loop that checks every character in a string using String.charAt:
function compress(sent) {
// a temp string
var compressed = '';
// check every character in the `sent` string
for(var i = 0; i < sent.length; i++) {
var letter = sent.charAt(i);
// add non punctuation and whitespace characters to `compressed` string
if(isPunct(letter) === false && isWhiteSpace(letter) === false) {
compressed += letter;
}
}
// return the temp string which has no punctuation or whitespace
return compressed;
}
If you return something in a function, execution stops there, so nothing after the first if/else in your code ever runs.
From what I can see, your function doesn't need to return true or false at all, so you could just do this:
function compress(sent) {
var punc = ";:.,?!-'\"(){} ";
var array = punc.split("");
for (var x = 0; x < array.length; x++) {
// split/join removes every occurrence; replace() with a string only removes the first
sent = sent.split(array[x]).join("");
}
isWhiteSpace(x);
isPunct(x);
return sent;
}
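To see why the second check in the question never ran, here is a minimal sketch (checkSentence and steps are names invented for this example) that records which steps actually execute when both branches of an if/else return.

```javascript
const steps = [];

// Hypothetical function mirroring the shape of the code in the question.
function checkSentence(sentence) {
  steps.push("punctuation check");
  if (/[;:.,?!]/.test(sentence)) {
    return true;
  } else {
    return false;
  }
  // Unreachable: both branches above have already returned.
  steps.push("whitespace check");
}

checkSentence("Hi! How are you?");
console.log(steps);
```

Only the first step is recorded, because the function exits as soon as a return is reached.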
I am sure there is probably a dupe of this here somewhere, but if so I cannot seem to find it, nor can I glue the pieces together correctly from what I could find to get what I need. I am using JavaScript and need the following:
1) Replace the first character of a string with its Unicode-aware capitalization UNLESS the next (second) character is a - OR ` OR ' (hyphen-minus, backtick, or single quote).
I have come close with what I could find, except for getting the backtick and single quote included (assuming they need to be escaped somehow) and what I believe to be a scope issue with the following, because first returns undefined. I am also not positive which JS/String functions are Unicode-aware:
autoCorrect = (str) => {
return str.replace(/^./, function(first) {
// if next char is not - OR ` OR ' <- not sure how to handle backtick and quote
if(str.charAt(1) != '-' ) {
return first.toUpperCase(); // first is undefined here - scope??
}
});
}
Any help is appreciated!
JavaScript strings are sequences of UTF-16 code units, not UTF-8.
Handling Unicode in JavaScript isn't particularly beautiful, but it is possible. It becomes particularly ugly with characters that need a surrogate pair, such as "🐱", but the for..of loop can handle those. Never index into a string that may contain such characters, as you might get only one half of a surrogate pair (which is not a valid character on its own).
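A quick sketch of the difference between indexing and iterating, using the cat emoji as the sample character:

```javascript
const cat = "🐱";

// length and index access work on UTF-16 code units.
console.log(cat.length);
const firstUnit = cat.charCodeAt(0);
console.log(firstUnit >= 0xd800 && firstUnit <= 0xdbff); // a lone high surrogate

// Spreading and for..of iterate by code point, so the pair stays together.
console.log([...cat].length);
for (const symbol of cat) {
  console.log(symbol === cat);
}
```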
This should handle Unicode well and do what you want:
function autoCorrect(string) {
let i = 0, firstSymbol;
const blacklist = ["-", "`", "'"];
for (const symbol of string) {
if (i === 0) {
firstSymbol = symbol;
}
else if (i === 1 && blacklist.some(char => char === symbol)) {
return string;
}
else {
const rest = string.substring(firstSymbol.length);
return firstSymbol.toUpperCase() + rest;
}
++i;
}
return string.toUpperCase();
}
Tests
console.assert(autoCorrect("δα") === "Δα");
console.assert(autoCorrect("🐱") === "🐱");
console.assert(autoCorrect("d") === "D");
console.assert(autoCorrect("t-minus-one") === "t-minus-one");
console.assert(autoCorrect("t`minus`one") === "t`minus`one");
console.assert(autoCorrect("t'minus'one") === "t'minus'one");
console.assert(autoCorrect("t^minus^one") === "T^minus^one");
console.assert(autoCorrect("t_minus_one") === "T_minus_one");
var regex = /[A-Za-z]\d[A-Za-z] ?\d[A-Za-z]\d/;
var match = regex.exec(value);
if (match){
if ( (value.indexOf("-") !== -1 || value.indexOf(" ") !== -1 ) && value.length == 7 ) {
return true;
} else if ( (value.indexOf("-") == -1 || value.indexOf(" ") == -1 ) && value.length == 6 ) {
return true;
}
} else {
return false;
}
The regex looks for the pattern A0A 1B1.
true tests:
A0A 1B1
A0A-1B1
A0A1B1
A0A1B1C << problem child
so I added a check for "-" or " " and then a check for length.
Is there a regex, or more efficient method?
A user-friendly, strict, and efficient pattern:
/^[ABCEGHJ-NPRSTVXY]\d[ABCEGHJ-NPRSTV-Z][ -]?\d[ABCEGHJ-NPRSTV-Z]\d$/i
Allows:
h2t-1b8
h2z 1b8
H2Z1B8
Disallows:
Z2T 1B8 (leading Z)
H2T 1O3 (contains O)
Any code with a leading W or Z, or one that contains D, F, I, O, Q or U
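As a quick check, the pattern above can be run against the listed examples. This is only a small harness; the variable names are arbitrary.

```javascript
const strictPostal = /^[ABCEGHJ-NPRSTVXY]\d[ABCEGHJ-NPRSTV-Z][ -]?\d[ABCEGHJ-NPRSTV-Z]\d$/i;

// Sample codes taken from the "Allows" and "Disallows" lists.
const samples = ["h2t-1b8", "h2z 1b8", "H2Z1B8", "Z2T 1B8", "H2T 1O3"];

for (const code of samples) {
  console.log(code, strictPostal.test(code));
}
```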
Add anchors to your pattern:
var regex = /^[A-Za-z]\d[A-Za-z][ -]?\d[A-Za-z]\d$/;
^ means "start of string" and $ means "end of string". Adding these anchors prevents the C from slipping into the match, since the pattern now expects the whole string to consist of 6 characters (or 7 when a space or hyphen separates the two halves). As a bonus, this relieves you of having to check the string length afterwards.
Also, since it appears that you want to allow hyphens, you can slip that into an optional character class that includes the space you were originally using. Be sure to leave the hyphen as either the very first or very last character; otherwise, you will need to escape it (using a leading backslash) to prevent the regex engine from interpreting it as part of a character range (e.g. A-Z).
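The effect of the anchors, and of where the hyphen sits inside a character class, can be seen in a short sketch (variable names are arbitrary):

```javascript
const unanchored = /[A-Za-z]\d[A-Za-z][ -]?\d[A-Za-z]\d/;
const anchored = /^[A-Za-z]\d[A-Za-z][ -]?\d[A-Za-z]\d$/;

// Without anchors, the first six characters are enough for a match.
console.log(unanchored.test("A0A1B1C"), anchored.test("A0A1B1C"));

// The optional class accepts a space, a hyphen, or nothing.
console.log(["A0A 1B1", "A0A-1B1", "A0A1B1"].map((code) => anchored.test(code)));

// A hyphen at the end of a class is literal; between two characters it is a range.
const literalHyphen = /^[ -]$/;
const spaceToTilde = /^[ -~]$/;
console.log(literalHyphen.test("A"), spaceToTilde.test("A"));
```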
This one handles US and Canadian codes.
function postalFilter (postalCode) {
if (! postalCode) {
return null;
}
postalCode = postalCode.toString().trim();
var us = new RegExp("^\\d{5}(-{0,1}\\d{4})?$");
var ca = new RegExp(/^([ABCEGHJKLMNPRSTVXY]\d)([ABCEGHJKLMNPRSTVWXYZ]\d){2}$/i);
if (us.test(postalCode.toString())) {
return postalCode;
}
if (ca.test(postalCode.toString().replace(/\W+/g, ''))) {
return postalCode;
}
return null;
}
// these 5 return null
console.log(postalFilter('1a1 a1a'));
console.log(postalFilter('F1A AiA'));
console.log(postalFilter('A12345-6789'));
console.log(postalFilter('W1a1a1')); // no "w"
console.log(postalFilter('Z1a1a1')); // ... or "z" allowed in first position!
// these are valid Canadian codes and are returned as entered (the space is kept)
console.log(postalFilter('a1a 1a1'));
console.log(postalFilter('H0H 0H0'));
// these return unaltered
console.log(postalFilter('H0H0H0'));
console.log(postalFilter('a1a1a1'));
console.log(postalFilter('12345'));
console.log(postalFilter('12345-6789'));
console.log(postalFilter('123456789'));
// strip spaces
console.log(postalFilter(' 12345 '));
You have a problem with the regex. StatsCan has posted the rules for what is a valid Canadian postal code:
The postal code is a six-character code defined and maintained by
Canada Post Corporation (CPC) for the purpose of sorting and
delivering mail. The characters are arranged in the form ‘ANA NAN’,
where ‘A’ represents an alphabetic character and ‘N’ represents a
numeric character (e.g., K1A 0T6). The postal code uses 18 alphabetic
characters and 10 numeric characters. Postal codes do not include the
letters D, F, I, O, Q or U, and the first position also does not make
use of the letters W or Z.
If you want it strict, the regex should be:
/^[ABCEGHJ-NPRSTVXY][0-9][ABCEGHJ-NPRSTV-Z] [0-9][ABCEGHJ-NPRSTV-Z][0-9]$/
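Also, in JavaScript \d matches only the ASCII digits 0-9, but in some other regex engines (for example .NET, or Python 3 with str patterns) it matches any Unicode decimal digit. Writing [0-9] explicitly keeps the pattern safe if it is reused elsewhere and avoids issues for you downstream.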
from: https://trajano.net/2017/05/canadian-postal-code-validation/
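For reference, a small sketch of how \d behaves in a JavaScript engine, using the Arabic-Indic digit three (U+0663) as a non-ASCII digit. In ECMAScript \d is equivalent to [0-9], with or without the u flag; only an explicit Unicode property escape matches other digits.

```javascript
const arabicIndicThree = "\u0663";

// \d is ASCII-only in JavaScript, even in Unicode mode.
console.log(/^\d$/.test(arabicIndicThree));
console.log(/^\d$/u.test(arabicIndicThree));

// A Unicode property escape is needed to match any decimal digit.
console.log(/^\p{Nd}$/u.test(arabicIndicThree));
console.log(/^[0-9]$/.test("7"));
```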
This function does everything in one shot. It accepts codes in the form A1A 1A1, A1A-1A1 or A1A1A1, that is, with or without a separator.
function go_postal(){
let postal = $("#postal").val();
var regex = /^[A-Za-z]\d[A-Za-z][ -]?\d[A-Za-z]\d$/;
var pr = regex.test(postal);
if(pr === true){
//all good
} else {
// not so much
}
}
function postalFilter (postalCode, type) {
if (!postalCode) {
return null;
}
postalCode = postalCode.toString().trim();
var us = new RegExp("^\\d{5}(-{0,1}\\d{4})?$");
// var ca = new RegExp(/^((?!.*[DFIOQU])[A-VXY][0-9][A-Z])|(?!.*[DFIOQU])[A-VXY][0-9][A-Z]\ ?[0-9][A-Z][0-9]$/i);
var ca = new RegExp(/^[ABCEGHJKLMNPRSTVXY]\d[ABCEGHJKLMNPRSTVWXYZ]( )?\d[ABCEGHJKLMNPRSTVWXYZ]\d$/i);
if(type == "us"){
if (us.test(postalCode.toString())) {
console.log(postalCode);
return postalCode;
}
}
if(type == "ca")
{
if (ca.test(postalCode.toString())) {
console.log(postalCode);
return postalCode;
}
}
return null;
}
var regex = /^[ABCEGHJ-NPRSTVXY]\d[ABCEGHJ-NPRSTV-Z][ -]?\d[ABCEGHJ-NPRSTV-Z]\d$/i;
return regex.test(value);
This is a shorter version of the original check, where value is any text value. There is also no need to test the length of value.