Inverting a rather complex set of regexes

I'm sort of new to regular expressions, and none of the solutions I found online helped or worked.
I'm dealing with a one-line string in JavaScript that will contain five types of data mixed together:
A "#" followed by six numbers/letters (HTML color) (/#....../g)
A forward slash followed by any of a few specific characters (/\/(\+|\^|\-|#|!\+|_|#|\*|%|&|~)/g)
A "$" followed by a sequence of letters and a "|" (/\$([^\|]+)/g)
A "|" alone (/\|/g)
Alphanumeric characters that do not fall under any of these categories
I already have regexes that match the first four categories, which are the important ones.
The problem is that I need a single regex that I can use to replace every character that does NOT match any of the first four regexes with a single character, such as "§".
Example (input, then the desired output):
This#00CC00 is green$Courier| and /^mono|spaced
§§§§#00CC00§§§§§§§§§$Courier|§§§§§/^§§§§|§§§§§§
I know I may be attacking this problem the wrong way; I'm rather new to regular expressions.
Essentially, how do I make a regex that means "anything that doesn't have any matches for regexes x, y, or z"?
Thank you for your time.

Use this pattern:
((#\w{6}|\/[\/\(\+\^\-]|\$\w+\||\|)*).
and replace with $1§
The downside is that each preserved token has to be followed by at least one more character, because every match ends with the final . (any character).
( # Capturing Group (1)
( # Capturing Group (2)
# # "#"
\w # <ASCII letter, digit or underscore>
{6} # (repeated {6} times)
| # OR
\/ # "/"
[\/\(\+\^\-] # Character Class [\/\(\+\^\-]
| # OR
\$ # "$"
\w # <ASCII letter, digit or underscore>
+ # (one or more)(greedy)
\| # "|"
| # OR
\| # "|"
) # End of Capturing Group (2)
* # (zero or more)(greedy)
) # End of Capturing Group (1)
. # Any character except line break
Code copied from Regex101
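var re = /((#\w{6}|\/[\/\(\+\^\-]|\$\w+\||\|)*)./gm;
var str = 'This#00CC00 is green$Courier| and /^mono|spaced|\n';
var subst = '$1§';
// group 1 keeps the preserved tokens; the trailing . (one other character) becomes §
var result = str.replace(re, subst);
console.log(result);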
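A minimal sketch that avoids that downside (a variation, not the pattern above): put the kept tokens in one capturing group, add an alternative that matches any single other character, and let a replacer function choose between keeping the token and emitting §. The token rules are assumptions adapted from the question (six hex digits for the colour, !+ or one of + ^ - # _ * % & ~ after "/", and "$name|"); maskUnmatched is a hypothetical name.

```javascript
// Kept tokens go in group 1; anything else falls through to [\s\S],
// which consumes exactly one character per match.
// Assumed token rules (adapted from the question):
//   #RRGGBB   "#" + six hex digits
//   /x        "/" followed by !+ or one of + ^ - # _ * % & ~
//   $name|    "$" + one or more non-"|" characters + "|"
//   |         a lone "|"
const TOKEN_OR_CHAR = /(#[0-9A-Fa-f]{6}|\/(?:!\+|[+^\-#_*%&~])|\$[^|]+\||\|)|[\s\S]/g;

function maskUnmatched(str, fill = '§') {
  // An unmatched capture group arrives in the replacer as undefined,
  // so "token === undefined" means the single-character fallback matched.
  return str.replace(TOKEN_OR_CHAR, (whole, token) => (token !== undefined ? token : fill));
}

console.log(maskUnmatched('This#00CC00 is green$Courier| and /^mono|spaced'));
console.log(maskUnmatched('ends with a token|')); // the trailing "|" is kept
```

Since every character is consumed either inside a token or on its own, a token at the very end of the string is kept as well.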

This isn't as efficient as a single working regular expression, but it works. Basically, it collects all of the matches and fills the gaps between them with § characters. One nice thing is that you don't have to be a regular expression expert to update it, so hopefully more people can use it.
var str = 'This#00CC00 is green$Courier| and /^mono|spaced';
// the four "keep" patterns from the question, joined with |
var patt = /#\w{6}|\/(\+|\^|\-|#|!\+|_|\*|%|&|~)|\$([^\|]+)\||\|/g;
var ret = "";
var last = 0; // end index of the previous match
var match;
while ((match = patt.exec(str)) !== null) {
  console.log(match.index + ' ' + patt.lastIndex);
  // fill the gap before this match with §, then copy the match itself
  ret += "§".repeat(match.index - last) + match[0];
  last = patt.lastIndex;
}
ret += "§".repeat(str.length - last); // fill whatever follows the last match (or the whole string if nothing matched)
document.body.innerHTML = str + "<br>" + ret;
console.log(str);
console.log(ret);
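The same gap-filling idea can reuse the question's four regexes unchanged: join their .source strings into one alternation (each wrapped in (?:...) so an inner | cannot leak out) and walk the matches with String.prototype.matchAll, which needs ES2020 and a g flag. A sketch; maskGaps is a hypothetical helper name.

```javascript
// The question's four "keep" regexes, copied as-is. Only .source is used,
// so their own flags are ignored.
const keepers = [
  /#....../g,
  /\/(\+|\^|\-|#|!\+|_|#|\*|%|&|~)/g,
  /\$([^\|]+)/g,
  /\|/g,
];

// One alternation; each source is wrapped so its internal | stays local.
const combined = new RegExp(keepers.map(r => '(?:' + r.source + ')').join('|'), 'g');

function maskGaps(str, re, fill = '§') {
  let out = '';
  let last = 0; // end of the previous kept match
  for (const m of str.matchAll(re)) {
    out += fill.repeat(m.index - last) + m[0]; // gap, then the kept text
    last = m.index + m[0].length;
  }
  return out + fill.repeat(str.length - last); // tail after the last match
}

console.log(maskGaps('This#00CC00 is green$Courier| and /^mono|spaced', combined));
```

Because the question's $ pattern stops before the "|", that bar is picked up by the lone "|" pattern, so the output matches the example in the question.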

Related

Regex string contain just digits and *,#,+ in javascript

I am trying to implement a function in JavaScript that verifies a string.
The string must contain only digit characters, *, # and +.
For example:
+182031203
+12312312312*#
+2131*1231#
*12312+#
#123*########
I tried a lot of things, like
/^[\d]{1,}[*,#,+]{1,}$/
but it doesn't work. I'm not sure I understand regex well. Please help.

I think you want the pattern ^[0-9*#+]+$:
var inputs = ["+182031203", "+12312312312*#", "+2131*1231#", "*12312+#", "#123*########"];
for (var i=0; i < inputs.length; ++i) {
console.log(inputs[i] + " => " + /^[0-9*#+]+$/.test(inputs[i]));
}

Using Regex101
/^[0-9\*\#\+]+$/g
0-9 matches a single character in the range between 0 (index 48) and 9 (index 57) (case sensitive)
* matches the character * with index 42₁₀ (2A₁₆ or 52₈) literally (case sensitive)
# matches the character # with index 35₁₀ (23₁₆ or 43₈) literally (case sensitive)
+ matches the character + with index 43₁₀ (2B₁₆ or 53₈) literally (case sensitive)

Try this regular expression:
const rxDigitsAndSomeOtherCharacters = /^(\d+|[*#+]+)+$/;
Breaking it down:
^ start-of-text, followed by
( a [capturing] group, consisting of
\d+ one or more digits
| or...
[*#+]+ one or more of *, # or +
)+ the [capturing] group being repeated one or more times, and followed by
$ end-of-text
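The grouped pattern accepts exactly the same strings as the single character class ^[0-9*#+]+$ from the first answer, because any run of allowed characters can be split into digit runs and symbol runs. A small sketch comparing the two (isDialString is a hypothetical name); the flat class is usually preferable, since nested quantifiers such as (\d+|...)+ can backtrack heavily when a long input fails near the end.

```javascript
// Flat version: one character class, one quantifier.
// No g flag here: with g, test() would carry lastIndex over between calls.
const FLAT = /^[0-9*#+]+$/;
// Grouped version from the answer above (with "+" in the symbol class).
const GROUPED = /^(\d+|[*#+]+)+$/;

function isDialString(s) {
  return FLAT.test(s);
}

const samples = ['+182031203', '+12312312312*#', '+2131*1231#', '*12312+#', '#123*########', '12a3', ''];
for (const s of samples) {
  // Both patterns should agree on every (short) sample.
  console.log(JSON.stringify(s), isDialString(s), GROUPED.test(s));
}
```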

Regex to validate a comma separated list of unique numbers

I am trying to validate a comma-separated list of unique (non-repeating) numbers from 1 to 7.
For example:
2,4,6,7,1 is valid input.
2,2,6 is invalid
2 is valid
2, is invalid
1,2,3,4,5,6,7,8 is invalid (at most 7 numbers)
I tried ^[1-7](?:,[1-7])*$ but it accepts repeated numbers.
var data = [
'2,4,6,7,1',
'2,2,6',
'2',
'2,',
'1,2,3,2',
'1,2,2,3',
'1,2,3,4,5,6,7,8'
];
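data.forEach(function(str) {
// Note: the whole match must start at ^, so only the (?!...) check at position 0 matters; it only
// catches a repeat when the first two numbers are equal, so '1,2,3,2' and '1,2,2,3' still give true.
document.write(str + ' gives ' + /(?!([1-7])(?:(?!\1).)\1)^((?:^|,)[1-7]){1,7}$/.test(str) + '<br/>');
});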

Regexes are not well suited to this. You should split the list into an array and check the different conditions:
function isValid(list) {
  var arrList = list.split(",");
  if (arrList.length > 7) { // more than 7 items cannot all be distinct values from 1-7
    return false;
  }
  var allowed = ["1", "2", "3", "4", "5", "6", "7"];
  var temp = {};
  for (var i = 0; i < arrList.length; i++) {
    // each item must be one of 1-7; this also rejects the empty item left by a trailing comma
    if (allowed.indexOf(arrList[i]) === -1) return false;
    temp[arrList[i]] = "";
  }
  if (Object.keys(temp).length !== arrList.length) { // if they're not the same length, there are duplicates
    return false;
  }
  return true;
}
console.log(isValid("2,4,6,7,1")); // true
console.log(isValid("2,2,6")); // false
console.log(isValid("2")); // true
console.log(isValid("2,")); // false
console.log(isValid("1,2,3,4,5,6,7,8")); // false
console.log(isValid("1,2,3")); // true
console.log(isValid("1,2,3,7,7")); // false

No RegEx is needed:
This is much more maintainable and explicit than a convoluted regular expression would be.
function isValid(a) {
var s = new Set(a);
s.delete(''); // for the hanging comma case, e.g. "2,"
return a.length <= 7 && a.length == s.size; // at most 7 items, all distinct (values are not range-checked)
}
var a = '2,4,6,7,1'.split(',');
alert(isValid(a)); // true
a = '2,2,6'.split(',');
alert(isValid(a)); // false
a = '2'.split(',');
alert(isValid(a)); // true
a = '2,'.split(',');
alert(isValid(a)); // false
a = '1,2,3,4,5,6,7,8'.split(',');
alert(isValid(a)); // false

You were pretty close.
^ # BOS
(?! # Validate no dups
.*
( [1-7] ) # (1)
.*
\1
)
[1-7] # Unrolled-loop, match 1 to 7 numbers
(?:
,
[1-7]
){0,6}
$ # EOS
var data = [
'2,4,6,7,1',
'2,2,6',
'2',
'2,',
'1,2,3,2',
'1,2,2,3',
'1,2,3,4,5,6,7,8'
];
data.forEach(function(str) {
document.write(str + ' gives ' + /^(?!.*([1-7]).*\1)[1-7](?:,[1-7]){0,6}$/.test(str) + '<br/>');
});
Output
2,4,6,7,1 gives true
2,2,6 gives false
2 gives true
2, gives false
1,2,3,2 gives false
1,2,2,3 gives false
1,2,3,4,5,6,7,8 gives false
For a number range that exceeds one digit, just add word boundaries around
the capture group and the back reference.
This isolates a complete number.
This particular one handles the number range 1-31.
^ # BOS
(?! # Validate no dups
.*
( # (1 start)
\b
(?: [1-9] | [1-2] \d | 3 [0-1] ) # number range 1-31
\b
) # (1 end)
.*
\b \1 \b
)
(?: [1-9] | [1-2] \d | 3 [0-1] ) # Unrolled-loop, match 1 to 7 numbers
(?: # in the number range 1-31
,
(?: [1-9] | [1-2] \d | 3 [0-1] )
){0,6}
$ # EOS
var data = [
'2,4,6,7,1',
'2,2,6',
'2,30,16,3',
'2,',
'1,2,3,2',
'1,2,2,3',
'1,2,3,4,5,6,7,8'
];
data.forEach(function(str) {
document.write(str + ' gives ' + /^(?!.*(\b(?:[1-9]|[1-2]\d|3[0-1])\b).*\b\1\b)(?:[1-9]|[1-2]\d|3[0-1])(?:,(?:[1-9]|[1-2]\d|3[0-1])){0,6}$/.test(str) + '<br/>');
});
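The word-boundary trick generalises: the pattern is always "no value appears twice" followed by "one value, then up to maxItems - 1 more". A sketch of a hypothetical builder that assembles it from an item pattern; it assumes the item pattern has no capturing groups of its own (so the back reference stays \1) and that items are plain numbers, since \b relies on word characters.

```javascript
// Hypothetical helper: builds ^(?!.*\b(ITEM)\b.*\b\1\b)ITEM(?:,ITEM){0,maxItems-1}$
// `item` must not contain capturing groups, or \1 would point at the wrong group.
function uniqueListRegex(item, maxItems) {
  const num = '(?:' + item + ')';
  return new RegExp(
    '^(?!.*\\b(' + num + ')\\b.*\\b\\1\\b)' + // reject any value that appears twice
    num + '(?:,' + num + '){0,' + (maxItems - 1) + '}$' // 1..maxItems values
  );
}

const oneToSeven = uniqueListRegex('[1-7]', 7);
const oneToThirtyOne = uniqueListRegex('[1-9]|[1-2]\\d|3[0-1]', 7);

console.log(oneToSeven.test('2,4,6,7,1'));     // true
console.log(oneToThirtyOne.test('2,30,16,3')); // true
console.log(oneToThirtyOne.test('1,11'));      // true: 1 and 11 are different values
```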

Like other commenters, I recommend using something other than regular expressions to solve your problem.
I have a solution, but it is too long to be a valid answer here (answers are limited to 30k characters). My solution is actually a regular expression in the language-theory sense, and is 60616 characters long. I will show you here the code I used to generate the regular expression; it is written in Python, but easily translated into any language you desire. I confirmed that it is working in principle with a smaller example (that uses only the numbers 1 to 3):
^(2(,(3(,1)?|1(,3)?))?|3(,(1(,2)?|2(,1)?))?|1(,(3(,2)?|2(,3)?))?)$
Here's the code used to generate the regex:
def build_regex(chars):
    if len(chars) == 1:
        return list(chars)[0]
    return ('('
            +
            '|'.join('{}(,{})?'.format(c, build_regex(chars - {c})) for c in chars)
            +
            ')')
Call it like this:
'^' + build_regex(set("1234567")) + "$"
The concept is the following:
To match a single number a, we can use the simple regex /a/.
To match two numbers a and b, we can match the disjunction /(a(,b)?|b(,a)?)/
Similarly, to match n numbers, we match the disjunction of all elements, each followed by the optional match for the subset of size n-1 not containing that element.
Finally, we wrap the expression in ^...$ in order to match the entire text.
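The Python generator translates directly. A JavaScript sketch (buildRegex is a hypothetical name) follows; it takes an array so the output order is deterministic, inserts items unescaped (so it assumes distinct items with no regex metacharacters, which holds for single digits), and its output grows factorially with the number of items, which is why the 1-7 version is so long.

```javascript
// For a list of items, match any item c optionally followed by "," and a
// match for the remaining items (same recursion as build_regex above).
function buildRegex(items) {
  const list = [...items];
  if (list.length === 1) return list[0];
  const alternatives = list.map(c => {
    const rest = list.filter(x => x !== c);
    return c + '(,' + buildRegex(rest) + ')?';
  });
  return '(' + alternatives.join('|') + ')';
}

const source = '^' + buildRegex(['1', '2', '3']) + '$';
console.log(source);
// ^(1(,(2(,3)?|3(,2)?))?|2(,(1(,3)?|3(,1)?))?|3(,(1(,2)?|2(,1)?))?)$
```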

Edit:
Fixed error when repeating digit wasn't the first one.
One way of doing it is:
^(?:(?:^|,)([1-7])(?=(?:,(?!\1)[1-7])*$))+$
It captures a digit and then uses a look-ahead to make sure it doesn't appear again later in the list.
^ # Start of line
(?: # Non capturing group
(?: # Non capturing group matching:
^ # Start of line
| # or
, # comma
) #
([1-7]) # Capture digit being between 1 and 7
(?= # Positive look-ahead
(?: # Non capturing group
, # Comma
(?!\1)[1-7] # Digit 1-7 **not** being the one captured earlier
)* # Repeat group any number of times
$ # Up to end of line
) # End of positive look-ahead
)+ # Repeat group (must be present at least once)
$ # End of line
var data = [
'2,4,6,7,1',
'2,2,6',
'2',
'2,',
'1,2,3,4,5,6,7,8',
'1,2,3,3,6',
'3,1,5,1,8',
'3,2,1'
];
data.forEach(function(str) {
document.write(str + ' gives ' + /^(?:(?:^|,)([1-7])(?=(?:,(?!\1)[1-7])*$))+$/.test(str) + '<br/>');
});
Note: I don't know if performance is an issue, but this does it in almost half the number of steps compared to sln's solution ;)

Javascript regexp capture matches delimited by character

I have a string like
classifier1:11:some text1##classifier2:fdglfgfg##classifier3:fgdfgfdg##classifier4
I am trying to capture terms like classifier1:11, classifier2:, classifier3 and classifier4
These classifiers may or may not be followed by a single colon.
So far I came up with
/([^#]*)(?::(?!:))/g
But that does not capture classifier4, and I'm not sure what I'm missing here.

It seems that a classifier in your case consists of word characters that may have : in between, and ends with a digit.
Thus, you may use
/(\w+(?::+\w+)*\d)[^#]*/g
Explanation:
(\w+(?::+\w+)*\d) - Group 1 capturing
\w+ - 1 or more [a-zA-Z0-9_] (word) chars
(?::+\w+)* - zero or more sequences of 1+ :s and then 1+ word chars
\d - a digit should be at the end of this group
[^#]* - zero or more characters other than the delimiter #.
JS:
var re = /(\w+(?::+\w+)*\d)[^#\n]*/g;
var str = 'classifier4##classifier1:11:some text1##classifier2:fdglfgfg##classifier3:fgdfgfdg\nclassifier1:11:some text1##classifier4##classifier2:fdglfgfg##classifier3:fgdfgfdg##classifier4';
var res = [];
var m;
while ((m = re.exec(str)) !== null) {
  res.push(m[1]);
}
document.body.innerHTML = "<pre>" + JSON.stringify(res, 0, 4) + "</pre>";

Based on your pattern, you can use a regex like this:
([^#]*)(?::|$)
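One caveat with this pattern: because $ can match without consuming anything, it also produces an empty match at the very end of the string. A hand-written while (re.exec(str)) loop would get stuck there (lastIndex does not move after an empty match), whereas String.prototype.matchAll steps past empty matches. A sketch that collects group 1 and drops that empty entry:

```javascript
const str = 'classifier1:11:some text1##classifier2:fdglfgfg##classifier3:fgdfgfdg##classifier4';
const re = /([^#]*)(?::|$)/g;

// matchAll advances past the empty match that $ produces at the end of the string.
const terms = [...str.matchAll(re)]
  .map(m => m[1])
  .filter(t => t !== ''); // drop that trailing empty match

console.log(terms.join(', ')); // classifier1:11, classifier2, classifier3, classifier4
```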

Regular expression match all except first occurence

I need a regular expression to match all occurrences of a dot (.) except the first one.
For example if the source is:
aaa.bbb.ccc..ddd
The expression should match the dots after bbb and ccc but not the dot after aaa. In other words, it should match all dots except the first one.
I need it for javascript regex.

With PCRE (PHP, R) you can do this:
\G(?:\A[^.]*\.)?+[^.]*\K\.
details:
\G # anchor for the start of the string or the position after a previous match
(?:\A[^.]*\.)?+ # start of the string (optional possessive quantifier)
[^.]* # all that is not a dot
\K # remove all that has been matched on the left from the match result
\. # the literal dot
With .NET it is easy, since it supports variable-length lookbehind:
(?<!^[^.]*)\.
In JavaScript engines without lookbehind support, there is no way to do it with a single pattern.
using a placeholder:
var result = s.replace('.', 'PLACEHOLDER')
.replace(/\./g, '|')
.replace('PLACEHOLDER', '.');
(or replace all dots with | and then replace the first occurrence of | with a dot).
using split:
var parts = s.split('.');
var result = parts.shift() + (parts.length ? '.': '') + parts.join('|');
with a counter:
var counter = 0;
var result = s.replace(/\./g, (_) => counter++ ? '|' : '.');
With NodeJS (or any other implementation that supports lookbehind, ES2018 and later), the .NET pattern can be used as-is, since JavaScript lookbehinds may also have variable length:
var result = s.replace(/(?<!^[^.]*)\./g, "|");
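If you prefer to avoid regex features entirely, the first dot can also be located with indexOf and only the text after it rewritten. A sketch; replaceAllButFirstDot is a hypothetical name, and "|" is used as the replacement to match the examples above.

```javascript
// Keep the first "." and replace every later one with `repl`.
function replaceAllButFirstDot(s, repl = '|') {
  const first = s.indexOf('.');
  if (first === -1) return s; // no dot at all: nothing to replace
  const head = s.slice(0, first + 1);                    // up to and including the first dot
  const tail = s.slice(first + 1).split('.').join(repl); // every later dot replaced
  return head + tail;
}

console.log(replaceAllButFirstDot('aaa.bbb.ccc..ddd')); // aaa.bbb|ccc||ddd
```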

One-line solution for JavaScript using arrow function (ES6):
'aaa.bbb.ccc..ddd'
.replace(/\./g, (c, i, text) => text.indexOf(c) === i ? c : '|')
-> 'aaa.bbb|ccc||ddd'

Regular expression to parse jQuery-selector-like string

var text = '#container a.filter(.top).filter(.bottom).filter(.middle)';
var regex = /(.*?)\.filter\((.*?)\)/;
var matches = text.match(regex);
console.log(matches);
// matches[1] is '#container a'
// matches[2] is '.top'
I expect to capture
matches[1] is '#container a'
matches[2] is '.top'
matches[3] is '.bottom'
matches[4] is '.middle'
One solution would be to split the string into "#container a" and the rest, then run exec repeatedly on the rest to get each item inside ().
Update: I am posting a solution that does work. However, I am looking for a better one; I don't really like the idea of splitting the string and then processing it.
Here is a solution that works.
var matches = [];
var text = '#container a.filter(.top).filter(.bottom).filter(.middle)';
var regex = /(.*?)\.filter\((.*?)\)/;
var match = regex.exec(text);
var firstPart = match[1]; // '#container a' (the match starts at index 0)
var rest = text.substring(firstPart.length); // '.filter(.top).filter(.bottom).filter(.middle)'
matches.push(firstPart);
regex = /\.filter\((.*?)\)/g;
while ((match = regex.exec(rest)) != null) {
  matches.push(match[1]);
}
console.log(matches);
Looking for a better solution.

This will match the single example you posted:
<html>
<body>
<script type="text/javascript">
text = '#container a.filter(.top).filter(.bottom).filter(.middle)';
matches = text.match(/^[^.]*|\.[^.)]*(?=\))/g);
document.write(matches);
</script>
</body>
</html>
which produces:
#container a,.top,.bottom,.middle
EDIT
Here's a short explanation:
^ # match the beginning of the input
[^.]* # match any character other than '.' and repeat it zero or more times
#
| # OR
#
\. # match the character '.'
[^.)]* # match any character other than '.' and ')' and repeat it zero or more times
(?= # start positive look ahead
\) # match the character ')'
) # end positive look ahead
EDIT part II
The regex looks for two types of character sequences:
zero or more characters from the start of the string up to the first ., the regex: ^[^.]*
or it matches a character sequence starting with a . followed by zero or more characters other than . and ), \.[^.)]*, but must have a ) ahead of it: (?=\)). This last requirement causes .filter not to match.

You have to iterate, I think.
var head, filters = [];
text.replace(/^([^.]*)(\..*)$/, function(_, h, rem) {
head = h;
rem.replace(/\.filter\(([^)]*)\)/g, function(_, f) {
filters.push(f);
});
});
console.log("head: " + head + " filters: " + filters);
The ability to use functions as the second argument to String.replace is one of my favorite things about Javascript :-)

You need to run the match several times, each time starting where the last match ended (see the while example at https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Objects/RegExp/exec):
If your regular expression uses the "g" flag, you can use the exec method multiple times to find successive matches in the same string. When you do so, the search starts at the substring of str specified by the regular expression's lastIndex property. For example, assume you have this script:
var myRe = /ab*/g;
var str = "abbcdefabh";
var myArray;
while ((myArray = myRe.exec(str)) != null)
{
var msg = "Found " + myArray[0] + ". ";
msg += "Next match starts at " + myRe.lastIndex;
print(msg);
}
This script displays the following text:
Found abb. Next match starts at 3
Found ab. Next match starts at 9
However, this case would be better solved using a custom-built parser. Regular expressions are not an effective solution to this problem, if you ask me.
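As an illustration of what a small hand-written parser could look like for this input, here is a sketch that scans with indexOf instead of a regex. parseFilters is a hypothetical name, and it assumes filter arguments contain no ")" of their own, so something like :not(.x) inside a filter would need a real parser that tracks nesting.

```javascript
// Hypothetical helper: split "<selector>.filter(<arg>).filter(<arg>)..."
// into [selector, arg1, arg2, ...] by scanning with indexOf.
function parseFilters(text) {
  const marker = '.filter(';
  let pos = text.indexOf(marker);
  if (pos === -1) return [text]; // no filters: the whole text is the selector
  const result = [text.slice(0, pos)]; // selector before the first ".filter("
  while (pos !== -1) {
    const start = pos + marker.length;
    const end = text.indexOf(')', start); // assumes no ")" inside an argument
    if (end === -1) throw new Error('unclosed .filter( at index ' + pos);
    result.push(text.slice(start, end));
    pos = text.indexOf(marker, end + 1);
  }
  return result;
}

console.log(parseFilters('#container a.filter(.top).filter(.bottom).filter(.middle)').join(' | '));
// #container a | .top | .bottom | .middle
```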

var text = '#container a.filter(.top).filter(.bottom).filter(.middle)';
var result = text.split('.filter');
console.log(result[0]); // '#container a'
console.log(result[1]); // '(.top)' (the parentheses are still included)
console.log(result[2]); // '(.bottom)'
console.log(result[3]); // '(.middle)'

text.split() with regex does the trick.
var text = '#container a.filter(.top).filter(.bottom).filter(.middle)';
var parts = text.split(/(\.[^.()]+)/);
var matches = [parts[0]];
for (var i = 3; i < parts.length; i += 4) {
matches.push(parts[i]);
}
console.log(matches);
