C# Regex.Split is working differently than JavaScript - javascript

I'm trying to convert this long JS regex to C#.
The JS code below gives 29 items in an array starting from ["","常","","に","","最新","、","最高"...]
var keywords = /(\ |[a-zA-Z0-9]+\.[a-z]{2,}|[一-龠々〆ヵヶゝ]+|[ぁ-んゝ]+|[ァ-ヴー]+|[a-zA-Z0-9]+|[a-zA-Z0-9]+)/g;
var source = '常に最新、最高のモバイル。Androidを開発した同じチームから。';
var result = source.split(keywords);
But the C# code below returns a single, unsplit item in the string[].
var keywords = @"/(\ |[a-zA-Z0-9]+\.[a-z]{2,}|[一-龠々〆ヵヶゝ]+|[ぁ-んゝ]+|[ァ-ヴー]+|[a-zA-Z0-9]+|[a-zA-Z0-9]+)/g";
var source = @"常に最新、最高のモバイル。Androidを開発した同じチームから。";
var result = Regex.Split(source, keywords);
Many questions on Stack Overflow cover only relatively simple expressions, so I cannot find my mistake.
What am I missing?

Your regex is wrong: it should not start with '/' or end with '/' or '/g'. In C# you pass the pattern as a plain string, not as a JavaScript regex literal. The '/ /' delimiters and the trailing flags are JavaScript syntax.
Actually, the same applies to JavaScript when you use the string constructor, like this:
var regex = new RegExp('//'); // This will match 2 slashes
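To make the difference concrete, here is a minimal sketch (the pattern and sample inputs are illustrative, not taken from the question). In a literal, the slashes and trailing flags are delimiters; inside a string they become characters that the pattern must match.

```javascript
// A regex literal: the slashes and the trailing "g" are syntax, not pattern text.
const literal = /ab+c/g;
// The equivalent string form: pattern and flags are passed separately.
const fromString = new RegExp("ab+c", "g");
// The mistake from the question: the delimiters end up inside the pattern.
const mistaken = new RegExp("/ab+c/g");

console.log(literal.source);                       // ab+c
console.log(fromString.source === literal.source); // true
console.log(mistaken.test("abbc"));                // false: the input has no slashes
console.log(mistaken.test("/abbc/g"));             // true: slashes and "g" are matched literally
```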

Here is some C# example code:
using System;
using System.Text.RegularExpressions;
string keywords = @"(\ |[a-zA-Z0-9]+\.[a-z]{2,}|[一-龠々〆ヵヶゝ]+|[ぁ-んゝ]+|[ァ-ヴー]+|[a-zA-Z0-9]+|[a-zA-Z0-9]+)";
string source = @"常に最新、最高のモバイル。Androidを開発した同じチームから。";
string[] res = Regex.Split(source, keywords);
// Join the pieces into one quoted, comma-separated line so the split is easy to inspect
string single = "";
foreach (string str in res)
    single += "'" + str + "',";
Console.WriteLine("{0}", single);
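For comparison, the empty strings at the start of the JavaScript result come from the capturing group. split() puts each captured separator into the output, together with the (often empty) text between separators. A small sketch with a shortened, illustrative pattern and input:

```javascript
// Shortened version of the pattern: kanji runs or hiragana runs, captured.
const pattern = /([一-龠]+|[ぁ-ん]+)/;
const pieces = "常に最新".split(pattern);
console.log(pieces); // [ '', '常', '', 'に', '', '最新', '' ]

// Dropping the empty strings leaves only the tokens.
const tokens = pieces.filter(function (piece) { return piece !== ""; });
console.log(tokens); // [ '常', 'に', '最新' ]
```

Regex.Split in .NET likewise includes captured text in its result, so filtering out empty entries is usually needed on both sides.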

Related

How to put a variable in my JS regular expression? [duplicate]

I want to add a (variable) tag to values with regex, the pattern works fine with PHP but I have troubles implementing it into JavaScript.
The pattern is (value is the variable):
/(?!(?:[^<]+>|[^>]+<\/a>))\b(value)\b/is
I escaped the backslashes:
var str = $("#div").html();
var regex = "/(?!(?:[^<]+>|[^>]+<\\/a>))\\b(" + value + ")\\b/is";
$("#div").html(str.replace(regex, "" + value + ""));
But this doesn't seem to be right, even though I logged the pattern and it's exactly what it should be.
Any ideas?
To create the regex from a string, you have to use JavaScript's RegExp object.
If you also want to match/replace more than one time, then you must add the g (global match) flag. Here's an example:
var stringToGoIntoTheRegex = "abc";
var regex = new RegExp("#" + stringToGoIntoTheRegex + "#", "g");
// at this point, the line above is the same as: var regex = /#abc#/g;
var input = "Hello this is #abc# some #abc# stuff.";
var output = input.replace(regex, "!!");
alert(output); // Hello this is !! some !! stuff.
In the general case, escape the string before using it as a regex:
Not every string is a valid regex, though: some characters are special, like ( or [. To work around this, escape the string before turning it into a regex. The sample below includes a utility function for that:
function escapeRegExp(stringToGoIntoTheRegex) {
    return stringToGoIntoTheRegex.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}
var stringToGoIntoTheRegex = escapeRegExp("abc"); // this is the only change from above
var regex = new RegExp("#" + stringToGoIntoTheRegex + "#", "g");
// at this point, the line above is the same as: var regex = /#abc#/g;
var input = "Hello this is #abc# some #abc# stuff.";
var output = input.replace(regex, "!!");
alert(output); // Hello this is !! some !! stuff.
Note: the regex in the question uses the s modifier, which JavaScript did not support at the time of the question. JavaScript now has an s (dotAll) flag.
If you are trying to use a variable value in the expression, you must use the RegExp "constructor".
var regex = "(?!(?:[^<]+>|[^>]+<\\/a>))\\b(" + value + ")\\b"; // backslashes must be doubled inside a string literal
new RegExp(regex, "is")
I found I had to double the backslash in \b to get it working. For example, to remove "1x" words from a string using a variable, I needed to use:
str = "1x";
var regex = new RegExp("\\b"+str+"\\b","g"); // same as inv.replace(/\b1x\b/g, "")
inv=inv.replace(regex, "");
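The reason for the doubling can be checked directly. In a JavaScript string literal, "\b" is the backspace character, so the regex engine never sees a word-boundary token unless the backslash itself is escaped. A minimal sketch (the sample input is illustrative):

```javascript
// "\b" inside a string literal is the backspace character (code 8)...
const single = "\b";
// ...while "\\b" is a backslash followed by "b", which RegExp reads as a word boundary.
const doubled = "\\b";

console.log(single.length, single.charCodeAt(0)); // 1 8
console.log(doubled.length);                      // 2

const word = "1x";
const broken = new RegExp("\b" + word + "\b", "g");
const working = new RegExp("\\b" + word + "\\b", "g");
console.log("a 1x b".replace(broken, ""));  // a 1x b  (nothing matched)
console.log("a 1x b".replace(working, "")); // a  b
```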
You don't need the quotes to define a regular expression, so just:
var regex = /(?!(?:[^<]+>|[^>]+<\/a>))\b(value)\b/is; // this is valid syntax
If value is a variable and you want a dynamic regular expression then you can't use this notation; use the alternative notation.
String.replace also accepts strings as input, so you can do "fox".replace("fox", "bear");
Alternative:
// no surrounding slashes in the string form, and backslashes doubled inside the string
var regex = new RegExp("(?!(?:[^<]+>|[^>]+<\\/a>))\\b(value)\\b", "is");
var regex = new RegExp("(?!(?:[^<]+>|[^>]+<\\/a>))\\b(" + value + ")\\b", "is");
var regex = new RegExp("(?!(?:[^<]+>|[^>]+<\\/a>))\\b(.*?)\\b", "is");
Keep in mind that if value contains regular expressions characters like (, [ and ? you will need to escape them.
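A short sketch of what goes wrong without escaping. The helper name escapeForRegex and the sample values are hypothetical, chosen only for illustration:

```javascript
// Hypothetical helper: backslash-escape every regex metacharacter in a string.
function escapeForRegex(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const value = "a.c"; // illustrative value containing a metacharacter

const unescaped = new RegExp("\\b(" + value + ")\\b", "i");
const escaped = new RegExp("\\b(" + escapeForRegex(value) + ")\\b", "i");

console.log(unescaped.test("abc")); // true: the dot matched "b" by accident
console.log(escaped.test("abc"));   // false
console.log(escaped.test("a.c"));   // true

// An unbalanced "(" is worse: the constructor throws a SyntaxError.
let threw = false;
try {
    new RegExp("\\b(" + "f(x" + ")\\b");
} catch (e) {
    threw = e instanceof SyntaxError;
}
console.log(threw); // true
```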
I found this thread useful - so I thought I would add the answer to my own problem.
I wanted to edit a database configuration file (datastax cassandra) from a node application in javascript and for one of the settings in the file I needed to match on a string and then replace the line following it.
This was my solution.
var fs = require('fs');
var dse_cassandra_yaml = '/etc/dse/cassandra/cassandra.yaml';
// a) find the searchString and grab all text on the line following it
// b) replace that next-line text with the newString supplied to the function
// note - leaves the searchString text untouched
function replaceStringNextLine(file, searchString, newString) {
    fs.readFile(file, 'utf-8', function (err, data) {
        if (err) throw err;
        // need to use double escape '\\' when putting regex in strings !
        var re = "\\s+(\\-\\s(.*)?)(?:\\s|$)";
        var myRegExp = new RegExp(searchString + re, "g");
        var match = myRegExp.exec(data);
        // exec() returns null when the search string is not in the file
        if (!match) throw new Error(searchString + ' not found in ' + file);
        var replaceThis = match[1];
        var writeString = data.replace(replaceThis, newString);
        fs.writeFile(file, writeString, 'utf-8', function (err) {
            if (err) throw err;
            console.log(file + ' updated');
        });
    });
}
var searchString = "data_file_directories:";
var newString = "- /mnt/cassandra/data";
replaceStringNextLine(dse_cassandra_yaml, searchString, newString);
After running, it will change the existing data directory setting to the new one:
config file before:
data_file_directories:
- /var/lib/cassandra/data
config file after:
data_file_directories:
- /mnt/cassandra/data
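The text transformation can also be tried without touching the file system. The following is a sketch of one way to do it, not the code used above: replaceNextLine is a hypothetical helper, and other_setting is an invented key that only shows later lines are left alone.

```javascript
// Hypothetical helper: replace the line that follows `searchString` in `text`.
function replaceNextLine(text, searchString, newLine) {
    // Escape regex metacharacters so searchString is matched literally.
    const escaped = searchString.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    // Group 1: the search line up to and including its line break;
    // after it, everything up to the end of the next line.
    const re = new RegExp("(" + escaped + "[ \\t]*\\r?\\n)[^\\r\\n]*");
    // A replacer function avoids "$" in newLine being treated specially.
    return text.replace(re, function (match, head) {
        return head + newLine;
    });
}

const before = "data_file_directories:\n- /var/lib/cassandra/data\nother_setting: 1\n";
const after = replaceNextLine(before, "data_file_directories:", "- /mnt/cassandra/data");
console.log(after);
```

If the search string is absent, replace() finds nothing and the text comes back unchanged.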
Much easier way: use template literals.
var variable = 'foo'
var expression = `.*${variable}.*`
var re = new RegExp(expression) // no 'g' flag: test() on a global regex keeps state between calls
re.test('fdjklsffoodjkslfd') // true
re.test('fdjklsfdjkslfd') // false
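One caveat when calling test() repeatedly on a regex that carries the g flag: the regex remembers where the last match ended (lastIndex), so the same input can give alternating results. A minimal sketch with illustrative inputs:

```javascript
const variable = "foo";
const globalRe = new RegExp(`.*${variable}.*`, "g");
const plainRe = new RegExp(`.*${variable}.*`);

// With "g", test() resumes from lastIndex, so the same input alternates.
console.log(globalRe.test("xxfooxx")); // true
console.log(globalRe.test("xxfooxx")); // false
// Without "g", every call starts from the beginning.
console.log(plainRe.test("xxfooxx"));  // true
console.log(plainRe.test("xxfooxx"));  // true
```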
Using string variable(s) content as part of a more complex composed regex expression (es6|ts)
This example will replace all urls using my-domain.com to my-other-domain (both are variables).
You can build dynamic regexes by combining string values and other regex fragments within a raw template string. Using String.raw stops JavaScript from processing backslash escape sequences in the template, so the backslashes reach the RegExp constructor intact.
// Strings with some data
const domainStr = 'my-domain.com'
const newDomain = 'my-other-domain.com'
// Make sure your string is regex friendly
// This replaces each dot with '\.'
const regexUrl = /\./gm;
const substr = `\\\.`;
const domain = domainStr.replace(regexUrl, substr);
// domain is a regex friendly string: 'my-domain\.com'
console.log('Regex expression for domain', domain)
// HERE!!! You can assemble a complex regex using string pieces.
const re = new RegExp( String.raw `([\'|\"]https:\/\/)(${domain})(\S+[\'|\"])`, 'gm');
// now I'll use the regex expression groups to replace the domain
const domainSubst = `$1${newDomain}$3`;
// const page contains all the html text
const result = page.replace(re, domainSubst);
note: Don't forget to use regex101.com to create, test and export REGEX code.
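What String.raw changes is easiest to see side by side. This sketch uses an illustrative, already-escaped domain and an invented replacement host (example.org):

```javascript
// In an ordinary template literal, "\S" is an escape sequence and the backslash is lost.
const cooked = `\S+`;
// String.raw leaves the backslash in place.
const raw = String.raw`\S+`;

console.log(cooked); // S+
console.log(raw);    // \S+

const escapedDomain = "my-domain\\.com"; // the string my-domain\.com
const urlRe = new RegExp(String.raw`https:\/\/(${escapedDomain})(\S*)`, "g");
console.log("see https://my-domain.com/page".replace(urlRe, "https://example.org$2"));
// see https://example.org/page
```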
var string = "Hi welcome to stack overflow"
var toSearch = "stack"
//case insensitive search
var result = string.search(new RegExp(toSearch, "i")) !== -1 ? 'Matched' : 'notMatched'
https://jsfiddle.net/9f0mb6Lz/
Hope this helps
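search() returns the index of the first match, or -1 when there is none, so a match at the very start of the string has index 0. A minimal sketch (the input is illustrative) of why the comparison should be against -1:

```javascript
const text = "stack overflow is helpful"; // the match starts at index 0
const term = "stack";
const index = text.search(new RegExp(term, "i"));

console.log(index);        // 0
console.log(index > 0);    // false: would wrongly report "no match"
console.log(index !== -1); // true
// test() asks the same yes/no question directly.
console.log(new RegExp(term, "i").test(text)); // true
```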

Javascript regex error " /?/: nothing to repeat " It worked fine earlier [duplicate]

I'm trying to strip a string of any invalid characters so it can be used as a directory name.
I tried a number of methods and this one (custom encoding) eventually worked, but now it doesn't: the console says "nothing to repeat". What does that mean? I'm using Chrome.
Here's the code (using a random string):
var someTitle = "wa?";
var cleanTitle = cleanTitle(someTitle);
function cleanTitle(title){
    var obstructions = ['\\','/',':','*','?','"','<','>','|'];
    var solutions = [92,47,58,42,63,34,60,62,124];
    var encodedTitle = title;
    for (var obstruction = 0; obstruction < obstructions.length; obstruction++){
        var char = obstructions[obstruction];
        if (encodedTitle.includes(char)){
            var enCode = "__i!__"+solutions[obstruction]+"__!i__";
            var rEx = new RegExp(char,"g");
            encodedTitle = encodedTitle.replace(rEx,enCode);
        }
    }
    console.log("CLEAN: "+title);
    console.log("ENCODED: "+encodedTitle);
    return encodedTitle;
}
Here's the error:
Uncaught SyntaxError: Invalid regular expression: /?/: Nothing to repeat
It points to this line -> var rEx = new RegExp(char,"g");
You need to escape some characters when using them as literals in a regular expression. Among those are most of the characters you have in your array.
Given that your function replaces the obstruction characters with their ASCII codes (plus the __i!__ wrapping), I would suggest making it more concise by performing the replacement with one regular expression and a callback passed to .replace():
function cleanTitle(title){
    return title.replace(/[\\/:*?"<>|]/g, function (ch) {
        return "__i!__"+ch.charCodeAt(0)+"__!i__";
    });
}
var someTitle = "wh*r* is |his?";
var result = cleanTitle(someTitle);
console.log(result);
...and if you are in an ES6 compatible environment:
var cleanTitle = t=>t.replace(/[\\/:*?"<>|]/g, c=>"__i!__"+c.charCodeAt(0)+"__!i__");
var someTitle = "wh*r* is |his?";
var result = cleanTitle(someTitle);
console.log(result);
The ? is a regex quantifier, so it is a special character. When you want to search for it literally (and build a regex from it), you need to escape it.
That being said, a redundant escape is harmless, and several of your other search characters are special in a regex too. So go with
var char = '\\' + obstructions[obstruction];
so that every character in the array is escaped in the string passed to the regex.
/?/ is not a valid regex. For it to be a regex, you need /\?/.
Regex here would be awkward, as most of the characters need escaping. Instead, consider using a literal string replacement until it is no longer found:
while (encodedTitle.indexOf(char) > -1) {
    encodedTitle = encodedTitle.replace(char, enCode);
}
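Another regex-free option is split() followed by join(), which replaces every occurrence of a literal character in one pass. This is a sketch only; encodeTitle is a hypothetical name, and it computes the codes with charCodeAt instead of a lookup array:

```javascript
// Hypothetical variant of cleanTitle that never builds a regex.
function encodeTitle(title) {
    const obstructions = ["\\", "/", ":", "*", "?", '"', "<", ">", "|"];
    let encoded = title;
    for (const ch of obstructions) {
        const code = "__i!__" + ch.charCodeAt(0) + "__!i__";
        // Split on the literal character, then glue the pieces back together with the code.
        encoded = encoded.split(ch).join(code);
    }
    return encoded;
}

console.log(encodeTitle("wa?")); // wa__i!__63__!i__
```

The wrapper text contains none of the forbidden characters, so a later pass of the loop never re-encodes an earlier replacement.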

JavaScript regular expression, return everything before and after a pattern

I have the need to return a string (filename) without certain data, example strings are:
eng_somerset_yeovil_montacute-house_962.jpg
eng_south-yorkshire_barnsley_wentworth-castle_0.jpg
eng_staffordshire_harriseahead_mow-cop-castle_1329317.jpg
eng_somerset_weston-super-mare_marine-lake-walkway_29113.jpg
These example strings need to be returned as the following:
eng_somerset_yeovil_montacute-house.jpg
eng_south-yorkshire_barnsley_wentworth-castle.jpg
eng_staffordshire_harriseahead_mow-cop-castle.jpg
eng_somerset_weston-super-mare_marine-lake-walkway.jpg
I've tried using the regex below, but I only see the pattern and everything after it returned:
filename = fl.replace(/(^.*?(?=[_]{1}[0-9]{1,10}))/gi, '');
_962.jpg
_0.jpg
_1329317.jpg
_29113.jpg
Thanks for your help.
This regex should work:
var repl = str.replace(/_\d+(?=\.jpg$)/, "");
TESTING:
str = 'eng_somerset_yeovil_montacute-house_962.jpg';
var repl = str.replace(/_\d+(?=\.jpg$)/, "");
// eng_somerset_yeovil_montacute-house.jpg
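Running the same expression over all four sample filenames from the question, as a quick check:

```javascript
const names = [
    "eng_somerset_yeovil_montacute-house_962.jpg",
    "eng_south-yorkshire_barnsley_wentworth-castle_0.jpg",
    "eng_staffordshire_harriseahead_mow-cop-castle_1329317.jpg",
    "eng_somerset_weston-super-mare_marine-lake-walkway_29113.jpg"
];

// Remove the final "_<digits>" only when it sits right before ".jpg" at the end.
const cleaned = names.map(function (name) {
    return name.replace(/_\d+(?=\.jpg$)/, "");
});

console.log(cleaned.join("\n"));
```

The lookahead (?=\.jpg$) checks for the extension without consuming it, which is why ".jpg" survives the replacement.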

Exclude characters from displaying?

I want to exclude characters from displaying in a vbulletin template.
For example, if a user writes:
"[Hello World] How are you?"
I want to exclude "[" and "]" and everything inside them, so that it only displays:
"How are you?"
Is there a way to do this?
Use the JavaScript string methods .indexOf() and .substring(). Get the position of the opening bracket and of the closing bracket, split the string into three substrings (the middle one being the bracketed section), and then join just the first and third substrings. Like this:
var string = "[Hello World] How are you?";
var bracket1 = string.indexOf("[");
var bracket2 = string.indexOf("]");
var substring1 = string.substring(0, bracket1);            // text before "["
var substring2 = string.substring(bracket1, bracket2 + 1); // "[Hello World]"
var substring3 = string.substring(bracket2 + 1);           // text after "]"
var solution = (substring1 + substring3).trim();           // "How are you?"
At least, that's the concept. Everything may not be right on, but you could futz with the numbers a little to get it perfect.
Or if you don't need to worry about what comes before [], simply use .split():
var string = "[Hello World] How are you?";
var solutionArray = string.split("]");
var solution = solutionArray[1].trim(); // trim() drops the space left after "]"
Hope this helps!
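A regular expression can do the same in one step and also copes with several bracketed sections. This is a sketch with a hypothetical helper name (stripBracketed), and it assumes the brackets are not nested:

```javascript
// Remove every "[...]" group plus any whitespace that follows it.
function stripBracketed(text) {
    return text.replace(/\[[^\]]*\]\s*/g, "");
}

console.log(stripBracketed("[Hello World] How are you?")); // How are you?
console.log(stripBracketed("Hi [aside] there"));           // Hi there
```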

JavaScript insert text after first parenthesis

Hi everyone. I've got a string that looks like
var s = "2qf/tqg4/ad(d=d,s(f)d)"
And I've got another string
var n = "abc = /fd/dsf/sdf/a.doc, "
What I want to do is insert n after the first '('
So it will look like
"2qf/tqg4/ad(abc = /fd/dsf/sdf/a.doc, d=d,s(f)d)"
Just use the replace function:
var result = s.replace("(", "("+n);
This barely needs REs.
var t = s.replace(/\(/, '('+n);
This doesn't need REs at all, as String.replace takes strings as well as REs to specify what should be replaced.
var t = s.replace('(', '('+n);
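Two properties of replace() with a string pattern matter here: only the first occurrence is replaced, which is what the question asks for, and sequences such as $& are special in the replacement string. A sketch using the question's values plus one illustrative value (tricky) that contains such a sequence:

```javascript
const s = "2qf/tqg4/ad(d=d,s(f)d)";
const n = "abc = /fd/dsf/sdf/a.doc, ";

// A string pattern replaces only the first "(".
const result = s.replace("(", "(" + n);
console.log(result); // 2qf/tqg4/ad(abc = /fd/dsf/sdf/a.doc, d=d,s(f)d)

// Caveat: "$&" in a replacement string expands to the matched text.
const tricky = "cost=$&5, ";
console.log("f(x)".replace("(", "(" + tricky));                         // f(cost=(5, x)
// A replacer function returns its string verbatim.
console.log("f(x)".replace("(", function () { return "(" + tricky; })); // f(cost=$&5, x)
```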
