Regex match entire string while grouping

I'm trying to match a currency string that may or may not be suffixed with one of K, M, or Bn, and group them into two parts
Valid matches:
500 K // Expected grouping: ["500", "K"]
900,000 // ["900,000", ""]
2.3 Bn // ["2.3", "Bn"]
800M // ["800", "M"]
PS: I know the first item in the match output array is the entire matched string; the expected grouping above is only an example.
The Regex I've got so far is this:
/\b([-\d\,\.]+)\s?([M|Bn|K]?)\b/i
When I match it with a normal string, it does OK.
"898734 K".match(/\b([-\d\,\.]+)\s?([M|Bn|K]?)\b/i)
=> ["898734 K", "898734", "K"] // output
"500,000".match(/\b([-\d\,\.]+)\s?([M|Bn|K]?)\b/i)
=> ["500,000", "500,000", ""]
Trouble is, it also matches space in there
"89 8734 K".match(/\b([-\d\,\.]+)\s?([M|Bn|K]?)\b/i)
=> ["89 ", "89", ""]
And I'm not sure why. So I thought I'd add the /g flag to match the entire string, but now it doesn't return the groups.
"898734 K".match(/\b([-\d\,\.]+)\s?([M|Bn|K]?)\b/gi)
=> ["898734 K"]
What change do I need to make to get the regex to behave as expected?

You could use a different regular expression, which looks for some digits, an optional comma or dot followed by more digits, optional whitespace, and then one of the wanted suffixes (or the end of the string).
var array = ['500 K', '900,000', '2.3 Bn', '800M'],
regex = /(\d+[.,]?\d*)\s*(K|Bn|M|$)/;
array.forEach(function (a) {
var m = a.match(regex);
if (m) {
m.shift();
console.log(m);
}
});
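To match the entire string while still getting the groups, anchoring the pattern is one option. The sketch below assumes the suffix is exactly one of K, M or Bn (case-insensitive) and that at most one whitespace character separates it from the number; parseAmount is a hypothetical helper name. Note that [M|Bn|K] in the question is a character class that matches a single character out of M, |, B, n and K, not an alternation.

```javascript
// Hypothetical helper: returns [number, suffix], or null when the whole string does not fit.
function parseAmount(text) {
  // ^ and $ anchor the pattern to the entire string;
  // (K|M|Bn) is an alternation, unlike the character class [M|Bn|K].
  const m = /^(-?[\d.,]+)\s?(K|M|Bn)?$/i.exec(text);
  return m ? [m[1], m[2] || ""] : null;
}

console.log(parseAmount("500 K"));     // ["500", "K"]
console.log(parseAmount("900,000"));   // ["900,000", ""]
console.log(parseAmount("89 8734 K")); // null
```

Because the pattern has no /g flag, the capture groups stay available on the match result.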

You have a problem and want to use a regex to solve the problem. Now you have two problems...
Joke aside, I think you can achieve what you want without any regex. In Python (after import itertools):
num = "".join(itertools.takewhile(lambda c: c.isdigit() or c in ',.', s)); num, s[len(num):]
I tried this with s="560 K", s="900,000", etc and it seems to work.

Related

regex exclude matches that don't meet one of two patterns separated by delimiter

In Javascript using string.match():
I have a string like: foo_2:asc,foo2:desc,foo3,foo4:wrong
the matches should look like ["foo_2:asc", "foo2:desc", "foo3"]
but the best I can get so far is a match returning ["foo_2:asc", "foo2:desc", "foo3", "wrong"]
the regex I'm currently using for the above wrong match is: /([a-z0-9_]+?[:asc|:desc]*?)(?=,|$)/gi
I also need a regex that returns the opposite, i.e. every item between the delimiters that does not follow the pattern thing_1:asc, thing_1:desc, or thing_1. This one would be used to validate the string, while the other would be used to gather the values (instead of splitting the string manually). For the string above, the result would be ["foo4:wrong"], the only part that doesn't meet the pattern.

Assuming that the only valid forms are words followed by one of :asc, :desc or nothing, you can do what you want by splitting the string, first on , and then on : and checking whether there are two values as a result of the last split and the second is not one of asc or desc:
const str = 'foo_2:asc,foo2:desc,foo3,foo4:wrong';
const errs = str.split(',').filter(v => v.split(':').length == 2 && ['asc', 'desc'].indexOf(v.split(':')[1]) == -1);
console.log(errs);
If you must use regex, you can split on , and then filter based on the value not matching ^\w+(:(?:asc|desc))?$:
const str = 'foo_2:asc,foo2:desc,foo3,foo4:wrong';
const errs = str.split(',').filter(v => !v.match(/^\w+(:(?:asc|desc))?$/));
console.log(errs);
If the format of the string is guaranteed to be \w+(:\w+)?(,\w+(:\w+)?)* you can simplify to this:
const str = 'foo_2:asc,foo2:desc,foo3,foo4:wrong';
const errs = str.match(/\w+:(?!(?:asc|desc)\b)\w+/g);
console.log(errs);

If you'd like a regex for this purpose, you can probably just require each item to start at a comma or at the start of the string.
/(^|\,)([a-z0-9_]+?(:asc|:desc)*?)(?=,|$)/gi
Also note that [:asc|:desc] was changed to (:asc|:desc) to avoid false positives such as:
foo5:aaa,foo6:d,foo7:,foo8|,et:c
Square brackets define a character class, so the original matches any single character listed inside them.
For the opposite, try something like:
/(^|\,)(?!([a-z0-9_]+?(:asc|:desc)*?)(?=,|$))[^,$]+/gi
which seems to do the job.
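The difference between the bracketed and the parenthesized form can be checked directly. This is a small sketch with simplified, anchored patterns (the foo prefix and the variable names are illustrative only, not the exact regexes from the answer):

```javascript
// Character class: any run of the characters : a s c | d e is accepted.
const charClass = /^foo[:asc|:desc]*$/;
// Group with alternation: only the literal suffixes :asc or :desc are accepted.
const group = /^foo(?::asc|:desc)?$/;

console.log(charClass.test("foo:asc")); // true
console.log(group.test("foo:asc"));     // true
console.log(charClass.test("foo:d"));   // true  (false positive)
console.log(group.test("foo:d"));       // false
```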

For the match I came up with
/(?<=(^|,))((\w+(?!:)|\w+(:asc|:desc)))(?=($|,))/g
Example: https://regex101.com/r/QLJeDV/3/
> "foo_2:asc,foo2:desc,foo3,foo4:wrong".match(/(?<=(^|,))((\w+(?!:)|\w+(:asc|:desc)))(?=($|,))/g)
[ 'foo_2:asc', 'foo2:desc', 'foo3' ]
Or even
/(?<=(^|,))\w+(:asc|:desc)?(?=($|,))/g
should work. Example: https://regex101.com/r/QLJeDV/6/
> "foo_2:asc,foo2:desc,foo3,foo4:wrong".match(/(?<=(^|,))\w+(:asc|:desc)?(?=($|,))/g)
[ 'foo_2:asc', 'foo2:desc', 'foo3' ]
Both use lookahead and lookbehind.
For the "opposite", I don't know how to match something and then "negate" a later pattern; I only know how to negate the result of a complete match, so I had to split the string first. The "opposite":
> "foo_2:asc,foo2:desc,foo3,foo4:wrong".split(",").filter(s => !/^((\w+(?!:)|\w+(:asc|:desc)))$/.test(s))
[ 'foo4:wrong' ]
and the "original":
> "foo_2:asc,foo2:desc,foo3,foo4:wrong".split(",").filter(s => /^((\w+(?!:)|\w+(:asc|:desc)))$/.test(s))
[ 'foo_2:asc', 'foo2:desc', 'foo3' ]
Or it can be simplified as:
> "foo_2:asc,foo2:desc,foo3,foo4:wrong".split(",").filter(s => !/^\w+(:asc|:desc)?$/.test(s))
[ 'foo4:wrong' ]
> "foo_2:asc,foo2:desc,foo3,foo4:wrong".split(",").filter(s => /^\w+(:asc|:desc)?$/.test(s))
[ 'foo_2:asc', 'foo2:desc', 'foo3' ]
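Since the valid and invalid lists come from the same split, both can be gathered in one pass. This is only a sketch: partitionSortKeys is a hypothetical name, and it assumes items never contain a comma.

```javascript
// Hypothetical helper: splits on commas and sorts each item into valid or invalid.
function partitionSortKeys(input) {
  const valid = [];
  const invalid = [];
  for (const part of input.split(",")) {
    // A valid item is a word optionally followed by :asc or :desc.
    if (/^\w+(?::(?:asc|desc))?$/.test(part)) {
      valid.push(part);
    } else {
      invalid.push(part);
    }
  }
  return { valid, invalid };
}

const result = partitionSortKeys("foo_2:asc,foo2:desc,foo3,foo4:wrong");
console.log(result.valid);   // ["foo_2:asc", "foo2:desc", "foo3"]
console.log(result.invalid); // ["foo4:wrong"]
```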

why condition is always true in javascript?

Could you please tell me why my condition is always true? I am trying to validate my value using a regex. I have a few conditions:
Name should not contain test "text"
Name should not contain three consecutive characters example "abc" , "pqr" ,"xyz"
Name should not contain the same character three times example "aaa", "ccc" ,"zzz"
This is what I tried:
https://jsfiddle.net/aoerLqkz/2/
var val = 'ab dd'
if (/test|[^a-z]|(.)\1\1|abc|bcd|cde|def|efg|fgh|ghi|hij|ijk|jkl|klm|lmn|mno|nop|opq|pqr|qrs|rst|stu|tuv|uvw|vwx|wxy|xyz/i.test(val)) {
alert( 'match')
} else {
alert( 'false')
}
I tested my code with the following strings and got unexpected results:
input string "abc": output "match" (fine)
input string "aaa": output "match" (fine)
input string "aa a": output "match". Why does it match when there is a space between the characters?
input string "sa c": output "match". Why does it match when the characters differ and there is a space between them?

The string sa c includes a space, and the pattern [^a-z] (anything other than a to z) matches that space.
Possibly you want to use ^ and $ so your pattern also matches the start and end of the string instead of looking for a match anywhere inside it.

there is space between them why it matched ????
Because of the [^a-z] part of your regular expression, which matches the space:
> /[^a-z]/i.test('aa a');
true

The issue is the [^a-z]. This means that any string that has a non-letter character anywhere in it will be a match. In your example, it is matching the space character.
The solution? Simply remove |[^a-z]. Without it, your regex meets all three criteria.
test checks if the value contains the word 'test'.
abc|bcd|cde|def|efg|fgh|ghi|hij|ijk|jkl|klm|lmn|mno|nop|opq|pqr|qrs|rst|stu|tuv|uvw|vwx|wxy|xyz checks if the value contains three sequential letters.
(.)\1\1 checks if any character is repeated three times.
Complete regex:
/test|(.)\1\1|abc|bcd|cde|def|efg|fgh|ghi|hij|ijk|jkl|klm|lmn|mno|nop|opq|pqr|qrs|rst|stu|tuv|uvw|vwx|wxy|xyz/i
I find it helpful to use a regex tester, like https://www.regexpal.com/, when writing regular expressions.
NOTE: I am assuming that the second criterion actually means "three consecutive letters", not "three consecutive characters" as it is written. If that is not true, then your regex doesn't meet the second criterion, since it only checks for three consecutive letters.
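The corrected pattern can be checked against the inputs from the question. In this sketch the alternation abc|bcd|...|xyz is generated rather than typed out, which assumes the lowercase Latin alphabet; isRejected is a hypothetical helper name.

```javascript
// Build abc|bcd|...|xyz from the alphabet instead of writing it by hand.
const alphabet = "abcdefghijklmnopqrstuvwxyz";
const runs = [];
for (let i = 0; i + 3 <= alphabet.length; i++) {
  runs.push(alphabet.slice(i, i + 3));
}
// Same structure as the complete regex above, without the [^a-z] alternative.
const forbidden = new RegExp("test|(.)\\1\\1|" + runs.join("|"), "i");

// Hypothetical helper: true when the name breaks one of the three rules.
function isRejected(name) {
  return forbidden.test(name);
}

console.log(isRejected("abc"));  // true
console.log(isRejected("aaa"));  // true
console.log(isRejected("aa a")); // false
console.log(isRejected("sa c")); // false
```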

I would not do this with regular expressions. The expression will keep getting more complicated, and you lose the flexibility you would have if you programmed the checks.
The rules you listed suggest the concept of a string derivative. The derivative of a string is the distance between each pair of successive characters. It is especially useful for password security checks and for measuring string variation in general.
const derivative = (str) => {
const result = [];
for(let i=1; i<str.length; i++){
result.push(str.charCodeAt(i) - str.charCodeAt(i-1));
}
return result;
};
//these strings have the same derivative: [0,0,0,0]
console.log(derivative('aaaaa'));
console.log(derivative('bbbbb'));
//these strings also have the same derivative: [1,1,1,1]
console.log(derivative('abcde'));
console.log(derivative('mnopq'));
//up and down: [1,-1,1,-1]
console.log(derivative('ababa'));
With this in mind you can apply each of your rules to each string.
// Rules:
// 1. Name should not contain test "text"
// 2. Name should not contain three consecutive characters example "abc" , "pqr" ,"xyz"
// 3. Name should not contain the same character three times example "aaa", "ccc" ,"zzz"
const derivative = (str) => {
const result = [];
for(let i=1; i<str.length; i++){
result.push(str.charCodeAt(i) - str.charCodeAt(i-1));
}
return result;
};
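// true when sub occurs as a contiguous run inside master (commas delimit whole numbers)
const arrayContains = (master, sub) =>
(',' + master.join(',') + ',').includes(',' + sub.join(',') + ',');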
const rule1 = (text) => !text.includes('text');
const rule2 = (text) => !arrayContains(derivative(text),[1,1]);
const rule3 = (text) => !arrayContains(derivative(text),[0,0]);
const testing = [
"smthing textual",'abc','aaa','xyz','12345',
'1111','12abb', 'goodbcd', 'weeell'
];
const results = testing.map((input)=>
[input, rule1(input), rule2(input), rule3(input)]);
console.log(results);

Based on the 3 conditions in the post, the following regex should work.
Regex: ^(?:(?!test|([a-z])\1\1|abc|bcd|cde|def|efg|fgh|ghi|hij|ijk|jkl|klm|lmn|mno|nop|opq|pqr|qrs|rst|stu|tuv|uvw|vwx|wxy|xyz).)*$
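Unlike the pattern in the question, this regex matches names that are valid, so the condition is inverted. A shortened sketch of the same construction follows; only abc and xyz are listed for brevity, and the variable name is illustrative.

```javascript
// Each character is consumed only if no forbidden sequence starts at that position.
// Shortened variant: the full pattern lists every run from abc to xyz.
const allowed = /^(?:(?!test|([a-z])\1\1|abc|xyz).)*$/;

console.log(allowed.test("sa c"));    // true
console.log(allowed.test("aa a"));    // true
console.log(allowed.test("aaa"));     // false
console.log(allowed.test("contest")); // false
```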

Separating words with Regex

I am trying to get this result: 'Summer-is-here'. Why does the code below generate extra spaces? (Current result: '-Summer--Is- -Here-').
function spinalCase(str) {
var newA = str.split(/([A-Z][a-z]*)/).join("-");
return newA;
}
spinalCase("SummerIs Here");

You are using a variety of split where the regexp contains a capturing group (inside parentheses), which has a specific meaning, namely to include all the splitting strings in the result. So your result becomes:
["", "Summer", "", "Is", " ", "Here", ""]
Joining that with - gives you the result you see. But you can't just remove the unnecessary capture group from the regexp, because then the split would give you
["", "", " ", ""]
because the capitalized words themselves are the separators and are discarded, leaving only what lies between them. So this doesn't really work.
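The two split results described above can be reproduced side by side (variable names are illustrative):

```javascript
const input = "SummerIs Here";

// With a capturing group, the separators themselves are included in the result.
const withGroup = input.split(/([A-Z][a-z]*)/);
// Without it, only the pieces between the separators remain.
const withoutGroup = input.split(/[A-Z][a-z]*/);

console.log(withGroup);           // ["", "Summer", "", "Is", " ", "Here", ""]
console.log(withGroup.join("-")); // "-Summer--Is- -Here-"
console.log(withoutGroup);        // ["", "", " ", ""]
```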
If you want to use split, try splitting on zero-width or space-only matches looking ahead to an uppercase letter:
> "SummerIs Here".split(/\s*(?=[A-Z])/)
^^^^^^^^^ LOOK-AHEAD
< ["Summer", "Is", "Here"]
Now you can join that to get the result you want, but without the lowercase mapping, which you could do with:
"SummerIs Here" .
split(/\s*(?=[A-Z])/) .
map(function(elt, i) { return i ? elt.toLowerCase() : elt; }) .
join('-')
which gives you what you want.
Using replace as suggested in another answer is also a perfectly viable solution. In terms of best practices, consider the following code from Ember:
var DECAMELIZE_REGEXP = /([a-z\d])([A-Z])/g;
var DASHERIZE_REGEXP = /[ _]/g;
function decamelize(str) {
return str.replace(DECAMELIZE_REGEXP, '$1_$2').toLowerCase();
}
function dasherize(str) {
return decamelize(str).replace(DASHERIZE_REGEXP, '-');
}
First, decamelize puts an underscore _ between each two-character sequence of a lower-case letter (or digit) followed by an upper-case letter, and lower-cases the result. Then, dasherize replaces underscores and spaces with dashes. This works perfectly except that it also lower-cases the first word in the string. You can sort of combine decamelize and dasherize here with
var SPINALIZE_REGEXP = /([a-z\d])\s*([A-Z])/g;
function spinalCase(str) {
return str.replace(SPINALIZE_REGEXP, '$1-$2').toLowerCase();
}

You want to separate capitalized words, but you are splitting the string on the capitalized words themselves; that's why you get those empty strings and spaces.
I think you are looking for this :
var newA = str.match(/[A-Z][a-z]*/g).join("-");

([A-Z][a-z]*) *(?!$|[a-z])
You can simply replace with $1-. See demo.
https://regex101.com/r/nL7aZ2/1
var re = /([A-Z][a-z]*) *(?!$|[a-z])/g;
var str = 'SummerIs Here';
var subst = '$1-';
var result = str.replace(re, subst);

var newA = str.split(/ |(?=[A-Z])/).join("-");
You can change the regex like:
/ |(?=[A-Z])/ or /\s*(?=[A-Z])/
Result:
Summer-Is-Here

Javascript regex find variables in a math equation

I want to find the elements of a math expression that are not wrapped in { and }
Examples:
Input: abc+1*def
Matches: ["abc", "1", "def"]
Input: {abc}+1+def
Matches: ["1", "def"]
Input: abc+(1+def)
Matches: ["abc", "1", "def"]
Input: abc+(1+{def})
Matches: ["abc", "1"]
Input: abc def+(1.1+{ghi})
Matches: ["abc def", "1.1"]
Input: 1.1-{abc def}
Matches: ["1.1"]
Rules
The expression is well-formed. (So there won't be start parenthesis without closing parenthesis or starting { without })
The math symbols allowed in the expression are + - / * and ( )
Numbers could be decimals.
Variables could contain spaces.
Only one level of { } (no nested brackets)
So far, I ended up with: http://regex101.com/r/gU0dO4
(^[^/*+({})-]+|(?:[/*+({})-])[^/*+({})-]+(?:[/*+({})-])|[^/*+({})-]+$)
I split the task into 3:
match elements at the beginning of the string
match elements that are between two { and }
match elements at the end of the string
But it doesn't work as expected.
Any idea ?

Matching {}s, especially nested ones is hard (read impossible) for a standard regular expression, since it requires counting the number of {s you encountered so you know which } terminated it.
Instead, a simple string manipulation method could work. This is a very basic parser that just reads the string left to right and consumes it when outside of braces.
var input = "abc def+(1.1+{ghi})"; // I assume well formed, as well as no precedence
var inParens = false;
var output = [], buffer = "", parenCount = 0;
for(var i = 0; i < input.length; i++){
if(!inParens){
if(input[i] === "{"){
inParens = true;
parenCount++;
} else if (["+","-","(",")","/","*"].some(function(x){
return x === input[i];
})){ // got symbol
if(buffer!==""){ // buffer holds a pending token to add to output
output.push(buffer); // add the token collected so far
buffer = "";
}
} else { // letter or number
buffer += input[i]; // push to buffer
}
} else { // inParens is true
if(input[i] === "{") parenCount++;
if(input[i] === "}") parenCount--;
if(parenCount === 0) inParens = false; // consume again
}
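}
if(buffer!=="") output.push(buffer); // flush a token left at the end of the input
console.log(output); // ["abc def", "1.1"]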

This might be an interesting regexp challenge, but in the real world you'd be much better off simply finding all [^+/*()-]+ groups and removing those enclosed in {}'s
"abc def+(1.1+{ghi})".match(/[^+/*()-]+/g).filter(
function(x) { return !/^{.+?}$/.test(x) })
// ["abc def", "1.1"]
That being said, regexes are not the right tool for parsing math expressions. For serious parsing, consider using formal grammars and parsers. There are plenty of parser generators for JavaScript; for example, in PEG.js you can write a grammar like
expr
= left:multiplicative "+" expr
/ multiplicative
multiplicative
= left:primary "*" right:multiplicative
/ primary
primary
= atom
/ "{" expr "}"
/ "(" expr ")"
atom = number / word
number = n:[0-9.]+ { return parseFloat(n.join("")) }
word = w:[a-zA-Z ]+ { return w.join("") }
and generate a parser which will be able to turn
abc def+(1.1+{ghi})
into
[
"abc def",
"+",
[
"(",
[
1.1,
"+",
[
"{",
"ghi",
"}"
]
],
")"
]
]
Then you can iterate over this array as usual and pick out the parts you're interested in.

The variable names you mentioned can be matched by \b[\w.]+\b since they are strictly bounded by word separators.
Since you have well-formed formulas, the names you don't want to capture are strictly followed by }, so you can use a lookahead expression to exclude them:
(\b[\w.]+ \b)(?!})
Will match the required elements (http://regexr.com/38rch).
Edit:
For more complex uses like correctly matching :
abc {def{}}
abc def+(1.1+{g{h}i})
We need to change the lookahead term to (?!({|}))
To include the match of 1.2-{abc def} we need to change the \b [1]. This term relies on lookaround expressions that are not available in JavaScript, so we have to work around it.
(?:^|[^a-zA-Z0-9. ])([a-zA-Z0-9. ]+(?=[^0-9A-Za-z. ]))(?!({|}))
Seems to be a good one for our examples (http://regex101.com/r/oH7dO1).
[1] \b is the boundary between a \w and a \W, \z or \a. Since \w does not include the space and \W does, it is incompatible with our definition of variable names.

Going forward with user2864740's comment, you can replace everything between { and } with an empty string and then match what remains.
var matches = "string here".replace(/{.+?}/g,"").match(/\b[\w. ]+\b/g);
Since you know that expressions are valid, just select \w+
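The remove-then-match idea can be sketched as a function and run against the examples from the question. This version splits on the operators instead of matching \w runs, so that names containing spaces survive; freeTerms is a hypothetical name, and the sketch relies on the stated rules (well-formed input, no nested braces).

```javascript
// Hypothetical helper: returns the terms that are not wrapped in { }.
function freeTerms(expr) {
  return expr
    .replace(/\{[^{}]*\}/g, "")  // drop every {...} group (one nesting level only)
    .split(/[+\-*\/()]/)         // split on operators and parentheses
    .map(term => term.trim())    // tidy surrounding whitespace
    .filter(term => term !== ""); // discard the empty pieces between adjacent symbols
}

console.log(freeTerms("abc+1*def"));           // ["abc", "1", "def"]
console.log(freeTerms("abc+(1+{def})"));       // ["abc", "1"]
console.log(freeTerms("abc def+(1.1+{ghi})")); // ["abc def", "1.1"]
```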

Regex split on upper case and first digit

I need to split the string "thisIs12MyString" to an array looking like [ "this", "Is", "12", "My", "String" ]
I've got so far as to "thisIs12MyString".split(/(?=[A-Z0-9])/) but it splits on each digit and gives the array [ "this", "Is", "1", "2", "My", "String" ]
In other words, I need to split the string on each upper-case letter and on each digit that does not have another digit in front of it.

Are you looking for this?
"thisIs12MyString".match(/[A-Z]?[a-z]+|[0-9]+/g)
returns
["this", "Is", "12", "My", "String"]

As I said in my comment, my approach would be to insert a special character before each sequence of digits first, as a marker:
"thisIs12MyString".replace(/\d+/g, '~$&').split(/(?=[A-Z])|~/)
where ~ could be any other character, preferably a non-printable one (e.g. a control character), as it is unlikely to appear "naturally" in a string.
In that case, you could even insert the marker before each capital letter as well, and omit the lookahead, making the split very easy:
"thisIs12MyString".replace(/\d+|[A-Z]/g, '~$&').split('~')
It might or might not perform better.

In my rhino console,
js> "thisIs12MyString".replace(/([A-Z]|\d+)/g, function(x){return " "+x;}).split(/ /);
this,Is,12,My,String
another one,
js> "thisIs12MyString".split(/(?:([A-Z]+[a-z]+))/g).filter(function(a){return a;});
this,Is,12,My,String

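You can work around JavaScript's lack of lookbehind (in engines predating ES2018) by post-processing the array produced by your current regex, merging consecutive single digits.
Quick sketch:
var result = [];
var digitsFlag = false;
// helper: true when the piece is exactly one digit
function isSingleDigit(word) { return /^\d$/.test(word); }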
"thisIs12MyString".split(/(?=[A-Z0-9])/).forEach(function(word) {
if (isSingleDigit(word)) {
if (!digitsFlag) {
result.push(word);
} else {
result[result.length - 1] += word;
}
digitsFlag = true;
} else {
result.push(word);
digitsFlag = false;
}
});
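In engines that do support lookbehind (ES2018 and later), the "digit not preceded by another digit" condition can be written directly in the split pattern. A minimal sketch, with splitCamelAndDigits as a hypothetical name:

```javascript
// Assumes an engine with lookbehind support (ES2018+).
function splitCamelAndDigits(text) {
  // Split before an uppercase letter, or before a digit not preceded by another digit.
  return text.split(/(?=[A-Z])|(?<!\d)(?=\d)/);
}

console.log(splitCamelAndDigits("thisIs12MyString"));
// ["this", "Is", "12", "My", "String"]
```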

I can't think of any ways to achieve this with a RegEx.
I think you will need to do it in code.
Please check the URL, same question different language (ruby) ->
The code is at the bottom:
http://code.activestate.com/recipes/440698-split-string-on-capitalizeduppercase-char/
