regex capturing with repeating pattern - javascript

I'm trying to capture all parts of a string, but I can't seem to get it right.
The string has this structure: 1+22+33. Numbers with an operator in between. There could be any number of terms.
What I want is ["1+22+33", "1", "+", "22", "+", "33"]
But I get: ["1+22+33", "22", "+", "33"]
I've tried all kinds of regexes, this is the best I've got, but it's obviously wrong.
let re = /(?:(\d+)([+]+))+(\d+)/g;
let s = '1+22+33';
let m;
while (m = re.exec(s))
console.log(m);
Note: the operators may vary. So in reality I'd look for [+/*-].

You can simply use String#split, like this:
const input = '3+8 - 12'; // I've deliberately added some random spaces
console.log(input.split(/\s*(\+|-)\s*/)); // Add as many other operators as needed
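This works because String#split also returns the text captured by any capturing group in the separator regex, so the operators survive the split. A minimal sketch, assuming the operator set [+/*-] from the question's note; the helper name splitTerms is made up for illustration:

```javascript
// Hypothetical helper: split an expression into terms and operators.
// The capturing group around the operator keeps it in the output array.
function splitTerms(expr) {
  return expr.trim().split(/\s*([+\/*-])\s*/);
}

console.log(splitTerms('1+22+33'));  // ["1", "+", "22", "+", "33"]
console.log(splitTerms('3+8 - 12')); // ["3", "+", "8", "-", "12"]
```

Note that this does not validate the input; an expression that starts or ends with an operator yields empty strings at the edges.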

Just thought of a solution: /(\d+)|([+*/-]+)/g;
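The pattern in the question loses the first term because a capturing group inside a repeated group only remembers its last iteration. A small sketch of that behaviour, followed by the token-at-a-time approach; the function name tokenize is illustrative, and the operator set is the one from the question's note:

```javascript
// A repeated capturing group keeps only its final iteration,
// which is why "1" is missing from the result in the question.
const repeated = /(?:(\d+)([+]+))+(\d+)/.exec('1+22+33');
console.log(Array.from(repeated)); // ["1+22+33", "22", "+", "33"]

// Matching one token at a time with the g flag returns every token.
function tokenize(expr) {
  return expr.match(/\d+|[+*\/-]+/g) || [];
}

console.log(tokenize('1+22+33')); // ["1", "+", "22", "+", "33"]
```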

You only have to split on digits:
console.log(
"1+22+33".split(/(\d+)/).filter(Boolean)
);

Related

How to separate string characters and mathematical characters from a string

I want to separate the alphabetic string or number string and mathematical characters from a string.
For example -
var test = "test1+test2*3+(test3*6)";
I want to separate it like this -
var result = ["test1", "+", "test2", "*", "3", "+", "(", "test3", "*", "6",")"];
Can anyone help me to get the result, Thanks in advance.
You can use a regular expression to either match a mathematical character, or any characters other than mathematical characters:
var test = "test1+test2*3+(test3*6)";
var result = test.match(/[+*()]|[^+*()]+/g);
console.log(result);
// ->
// ["test1","+","test2","*","3","+","(","test3","*","6",")"]
[+*()] - Match a single +, *, (, or ) (feel free to add whichever characters you want to isolate here)
| - OR
[^+*()]+ - Match anything but those characters, one or more times
This gives the required matches:
/(\w+|\W)/gm
\w+ matches any token/variable (like test1, test2 etc.)
\W matches all operators
var test = "test1+test2*3+(test3*6)";
var result = test.match(/\w+|\W/g);
console.log(result);
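One caveat with \w+|\W is that whitespace is also matched by \W and comes back as a token. A hedged variant, assuming you want to ignore spaces between tokens (the name tokenizeExpression is illustrative, not from the answers):

```javascript
function tokenizeExpression(expr) {
  // \w+ grabs identifiers and numbers; [^\w\s] grabs any single
  // character that is neither a word character nor whitespace
  // (operators and parentheses).
  return expr.match(/\w+|[^\w\s]/g) || [];
}

console.log(tokenizeExpression('test1 + test2*3 + (test3*6)'));
// ["test1", "+", "test2", "*", "3", "+", "(", "test3", "*", "6", ")"]
```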
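var test = "test1+test2*3+(test3*6)";
var temp = '';
var arr = [];
for (var i = 0; i < test.length; i++) {
  if (/[^A-Za-z0-9]/.test(test[i])) {
    // Non-alphanumeric character: flush the buffered token, then the symbol
    arr.push(temp);
    arr.push(test[i]);
    temp = '';
  } else {
    temp += test[i];
  }
}
arr.push(temp); // flush a trailing token, e.g. the "b" in "a+b"
console.log(arr.filter((value) => value !== ''));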

How to split a string based on space and a limit?

For eg, I have a string:
var s = "Tyrannosaurus Rex";
My limit is 4. So I want to split this string to 4 letter short strings into an array. Which can be done using:
var arr = s.match(/.{1,4}/g);
But the troubling thing is, it should consider space or \n as a splitting criteria as well. So that final output should be:
["Tyra", "nnos", "auru", "s", "Rex"]
and not this:
["Tyra", "nnos", "auru", "s Re", "x"]
Any clean solution would be helpful!
Simply modify your regex to not match spaces or new lines:
var s = "Tyrannosaurus Rex";
console.log(s.match(/[^ \n]{1,4}/g));
Use /\S{1,4}/g. It matches every character except whitespace (spaces, tabs and newlines).
var s = "Tyrannosaurus Rex";
console.log(s.match(/\S{1,4}/g));
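If the limit is not fixed at 4, the same pattern can be built at run time. A minimal sketch, assuming the limit is a positive integer; the function name chunkWords is hypothetical:

```javascript
function chunkWords(text, limit) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  // Builds the equivalent of /\S{1,limit}/g
  const re = new RegExp(`\\S{1,${limit}}`, 'g');
  return text.match(re) || [];
}

console.log(chunkWords('Tyrannosaurus Rex', 4));
// ["Tyra", "nnos", "auru", "s", "Rex"]
```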
Start by splitting it into words then applying your regexp:
var s = "Tyrannosaurus Rex";
var words = s.split(' ');
var arr = [].concat(...words.map(w => w.match(/.{1,4}/g)));
console.log(arr);
EDIT: Leaving this here, but @le_m's answer is provably better

Javascript regex find variables in a math equation

I want to find in a math expression elements that are not wrapped between { and }
Examples:
Input: abc+1*def
Matches: ["abc", "1", "def"]
Input: {abc}+1+def
Matches: ["1", "def"]
Input: abc+(1+def)
Matches: ["abc", "1", "def"]
Input: abc+(1+{def})
Matches: ["abc", "1"]
Input: abc def+(1.1+{ghi})
Matches: ["abc def", "1.1"]
Input: 1.1-{abc def}
Matches: ["1.1"]
Rules
The expression is well-formed. (So there won't be start parenthesis without closing parenthesis or starting { without })
The math symbols allowed in the expression are + - / * and ( )
Numbers could be decimals.
Variables could contain spaces.
Only one level of { } (no nested brackets)
So far, I ended with: http://regex101.com/r/gU0dO4
(^[^/*+({})-]+|(?:[/*+({})-])[^/*+({})-]+(?:[/*+({})-])|[^/*+({})-]+$)
I split the task into 3:
match elements at the beginning of the string
match elements that are between two { and }
match elements at the end of the string
But it doesn't work as expected.
Any idea ?
Matching {}s, especially nested ones is hard (read impossible) for a standard regular expression, since it requires counting the number of {s you encountered so you know which } terminated it.
Instead, simple string manipulation can work. This is a very basic parser that reads the string left to right and collects tokens only while it is outside the braces.
var input = "abc def+(1.1+{ghi})"; // I assume well formed, as well as no precedence
var inParens = false; // true while we are inside { }
var output = [], buffer = "", parenCount = 0;
for (var i = 0; i < input.length; i++) {
  if (!inParens) {
    if (input[i] === "{") {
      inParens = true;
      parenCount++;
    } else if (["+", "-", "(", ")", "/", "*"].some(function (x) {
      return x === input[i];
    })) { // got symbol
      if (buffer !== "") { // buffer has stuff to add to output
        output.push(buffer); // add the token collected so far
        buffer = "";
      }
    } else { // letter or number
      buffer += input[i]; // push to buffer
    }
  } else { // inParens is true
    if (input[i] === "{") parenCount++;
    if (input[i] === "}") parenCount--;
    if (parenCount === 0) inParens = false; // consume again
  }
}
if (buffer !== "") output.push(buffer); // flush a token that ends the string
console.log(output); // ["abc def", "1.1"]
This might be an interesting regexp challenge, but in the real world you'd be much better off simply finding all [^+/*()-]+ groups and removing those enclosed in {}'s
"abc def+(1.1+{ghi})".match(/[^+/*()-]+/g).filter(
function(x) { return !/^{.+?}$/.test(x) })
// ["abc def", "1.1"]
That being said, regexes are not the right tool for parsing math expressions. For serious parsing, consider using formal grammars and parsers. There are plenty of parser generators for JavaScript; for example, in PEG.js you can write a grammar like
expr
= left:multiplicative "+" expr
/ multiplicative
multiplicative
= left:primary "*" right:multiplicative
/ primary
primary
= atom
/ "{" expr "}"
/ "(" expr ")"
atom = number / word
number = n:[0-9.]+ { return parseFloat(n.join("")) }
word = w:[a-zA-Z ]+ { return w.join("") }
and generate a parser which will be able to turn
abc def+(1.1+{ghi})
into
[
"abc def",
"+",
[
"(",
[
1.1,
"+",
[
"{",
"ghi",
"}"
]
],
")"
]
]
Then you can iterate this array just normally and fetch the parts you're interested in.
The variable names you mentioned can be matched by \b[\w.]+\b since they are strictly bounded by word separators.
Since you have well-formed formulas, the names you don't want to capture are strictly followed by }, so you can use a lookahead expression to exclude these:
(\b[\w.]+ \b)(?!})
Will match the required elements (http://regexr.com/38rch).
Edit:
For more complex uses like correctly matching :
abc {def{}}
abc def+(1.1+{g{h}i})
We need to change the lookahead term to (?|({|}))
To include the match of 1.2-{abc def} we need to change the \b [1]. This term uses lookaround expressions, which are not available in JavaScript, so we have to work around it.
(?:^|[^a-zA-Z0-9. ])([a-zA-Z0-9. ]+(?=[^0-9A-Za-z. ]))(?!({|}))
Seems to be a good one for our examples (http://regex101.com/r/oH7dO1).
[1] \b is the separation between a \w and a \W, \z or \a. Since \w does not include space and \W does, it is incompatible with the definition of our variable names.
Going forward with user2864740's comment, you can replace all things between {} with empty and then match the remaining.
var matches = "string here".replace(/{.+?}/g,"").match(/\b[\w. ]+\b/g);
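The strip-then-match idea can be checked against the examples from the question. A minimal sketch, assuming one level of braces and only the math symbols listed in the rules; the function name freeTerms is made up for illustration, and it splits on the symbols instead of using \b so that names containing spaces stay intact:

```javascript
function freeTerms(expr) {
  return expr
    .replace(/\{[^}]*\}/g, '')          // drop every {...} group (one level only)
    .split(/[+\-*\/()]/)                // split on the allowed math symbols
    .map((term) => term.trim())         // tidy surrounding spaces
    .filter((term) => term !== '');     // discard empty pieces
}

console.log(freeTerms('abc+1*def'));           // ["abc", "1", "def"]
console.log(freeTerms('{abc}+1+def'));         // ["1", "def"]
console.log(freeTerms('abc def+(1.1+{ghi})')); // ["abc def", "1.1"]
console.log(freeTerms('1.1-{abc def}'));       // ["1.1"]
```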
Since you know that expressions are valid, just select \w+

Regex split on upper case and first digit

I need to split the string "thisIs12MyString" to an array looking like [ "this", "Is", "12", "My", "String" ]
I've got so far as to "thisIs12MyString".split(/(?=[A-Z0-9])/) but it splits on each digit and gives the array [ "this", "Is", "1", "2", "My", "String" ]
So, in words, I need to split the string on upper-case letters and on digits that do not have another digit in front of them.
Are you looking for this?
"thisIs12MyString".match(/[A-Z]?[a-z]+|[0-9]+/g)
returns
["this", "Is", "12", "My", "String"]
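Wrapped in a function, the match approach looks like the sketch below. The extra first alternative, which keeps runs of capitals such as "HTML" together, is an assumption that goes beyond the question; the name splitCamelCase is illustrative:

```javascript
function splitCamelCase(str) {
  // 1. a run of capitals not followed by a lowercase letter (acronyms)
  // 2. an optional capital followed by lowercase letters (a word)
  // 3. a run of digits
  return str.match(/[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+/g) || [];
}

console.log(splitCamelCase('thisIs12MyString')); // ["this", "Is", "12", "My", "String"]
console.log(splitCamelCase('parseHTMLString'));  // ["parse", "HTML", "String"]
```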
As I said in my comment, my approach would be to insert a special character before each sequence of digits first, as a marker:
"thisIs12MyString".replace(/\d+/g, '~$&').split(/(?=[A-Z])|~/)
where ~ could be any other character, preferably a non-printable one (e.g. a control character), as it is unlikely to appear "naturally" in a string.
In that case, you could even insert the marker before each capital letter as well, and omit the lookahead, making the split very easy:
"thisIs12MyString".replace(/\d+|[A-Z]/g, '~$&').split('~')
It might or might not perform better.
In my rhino console,
js> "thisIs12MyString".replace(/([A-Z]|\d+)/g, function(x){return " "+x;}).split(/ /);
this,Is,12,My,String
another one,
js> "thisIs12MyString".split(/(?:([A-Z]+[a-z]+))/g).filter(function(a){return a;});
this,Is,12,My,String
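You can work around the lack of lookbehind in JS by post-processing the array that your current regex produces.
Quick sketch:
var result = [];
var digitsFlag = false; // true when the previous chunk was a digit
"thisIs12MyString".split(/(?=[A-Z0-9])/).forEach(function (word) {
  if (/^\d$/.test(word)) { // the chunk is a single digit
    if (!digitsFlag) {
      result.push(word);
    } else {
      // the previous chunk was a digit too: glue them together
      result[result.length - 1] += word;
    }
    digitsFlag = true;
  } else {
    result.push(word);
    digitsFlag = false;
  }
});
console.log(result); // ["this", "Is", "12", "My", "String"]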
I can't think of any ways to achieve this with a RegEx.
I think you will need to do it in code.
Please check the URL, same question different language (ruby) ->
The code is at the bottom:
http://code.activestate.com/recipes/440698-split-string-on-capitalizeduppercase-char/

Regex using javascript to return just numbers

If I have a string like "something12" or "something102", how would I use a regex in javascript to return just the number parts?
Regular expressions:
var numberPattern = /\d+/g;
'something102asdfkj1948948'.match( numberPattern )
This would return an Array with two elements inside, '102' and '1948948'. Operate as you wish. If it doesn't match any it will return null.
To concatenate them:
'something102asdfkj1948948'.match( numberPattern ).join('')
Assuming you're not dealing with complex decimals, this should suffice I suppose.
You could also strip all the non-digit characters (\D or [^0-9]):
let word_With_Numbers = 'abc123c def4567hij89'
let word_Without_Numbers = word_With_Numbers.replace(/\D/g, '');
console.log(word_Without_Numbers)
For number with decimal fraction and minus sign, I use this snippet:
const NUMERIC_REGEXP = /[-]{0,1}[\d]*[.]{0,1}[\d]+/g;
const numbers = '2.2px 3.1px 4px -7.6px obj.key'.match(NUMERIC_REGEXP)
console.log(numbers); // ["2.2", "3.1", "4", "-7.6"]
Update: - 7/9/2018
Found a tool which allows you to edit regular expression visually: JavaScript Regular Expression Parser & Visualizer.
Update:
Here's another one with which you can even debug regexps: Online regex tester and debugger.
Update:
Another one: RegExr.
Update:
Regexper and Regex Pal.
If you want only digits:
var value = '675-805-714';
var numberPattern = /\d+/g;
value = value.match( numberPattern ).join([]);
alert(value);
//Show: 675805714
Now you get the digits joined
I guess you want to get number(s) from the string. In which case, you can use the following:
// Returns an array of numbers located in the string
function get_numbers(input) {
return input.match(/[0-9]+/g);
}
var first_test = get_numbers('something102');
var second_test = get_numbers('something102or12');
var third_test = get_numbers('no numbers here!');
alert(first_test); // [102]
alert(second_test); // [102,12]
alert(third_test); // null
IMO the #3 answer at this time by Chen Dachao is the right way to go if you want to capture any kind of number, but the regular expression can be shortened from:
/[-]{0,1}[\d]*[\.]{0,1}[\d]+/g
to:
/-?\d*\.?\d+/g
For example, this code:
"lin-grad.ient(217deg,rgba(255, 0, 0, -0.8), rgba(-255,0,0,0) 70.71%)".match(/-?\d*\.?\d+/g)
generates this array:
["217","255","0","0","-0.8","-255","0","0","0","70.71"]
I've butchered an MDN linear gradient example so that it fully tests the regexp and doesn't need to scroll here. I think I've included all the possibilities in terms of negative numbers, decimals, unit suffixes like deg and %, inconsistent comma and space usage, and the extra dot/period and hyphen/dash characters within the text "lin-grad.ient". Please let me know if I'm missing something. The only thing I can see that it does not handle is a badly formed decimal number like "0..8".
If you really want an array of numbers, you can convert the entire array in the same line of code:
array = whatever.match(/-?\d*\.?\d+/g).map(Number);
My particular code, which is parsing CSS functions, doesn't need to worry about the non-numeric use of the dot/period character, so the regular expression can be even simpler:
/-?[\d\.]+/g
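Combining the -?\d*\.?\d+ pattern with a guard against the null that match returns when nothing is found gives a helper like the following. This is a sketch only; the name extractNumbers is hypothetical:

```javascript
function extractNumbers(text) {
  // match() returns null when there is no match, so fall back to []
  const matches = text.match(/-?\d*\.?\d+/g) || [];
  return matches.map(Number);
}

console.log(extractNumbers('2.2px 3.1px 4px -7.6px obj.key')); // [2.2, 3.1, 4, -7.6]
console.log(extractNumbers('no numbers here!'));               // []
```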
var result = input.match(/\d+/g).join([])
Using split and regex :
var str = "fooBar0123".split(/(\d+)/);
console.log(str[0]); // fooBar
console.log(str[1]); // 0123
The answers given don't actually match your question, which implied a trailing number. Also, remember that you're getting a string back; if you actually need a number, cast the result:
item=item.replace(/^.*\D(\d*)$/, '$1'); // must be a regex literal; a string pattern would be matched literally
if (!/^\d+$/.test(item)) throw 'parse error: number not found';
item=Number(item);
If you're dealing with numeric item ids on a web page, your code could also usefully accept an Element, extracting the number from its id (or its first parent with an id); if you've an Event handy, you can likely get the Element from that, too.
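For the trailing-number case described in this answer, anchoring the digits to the end of the string avoids the separate replace and test steps. A minimal sketch, assuming an error should be thrown when no trailing number exists; the function name trailingNumber is made up:

```javascript
function trailingNumber(item) {
  // (\d+)$ only matches digits that end the string
  const match = /(\d+)$/.exec(item);
  if (match === null) {
    throw new Error('parse error: number not found');
  }
  return Number(match[1]);
}

console.log(trailingNumber('something12'));  // 12
console.log(trailingNumber('something102')); // 102
```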
As per @Syntle's answer, if you have only non-numeric characters you'll get an Uncaught TypeError: Cannot read property 'join' of null.
This will prevent errors if no matches are found and return an empty string:
('something'.match( /\d+/g )||[]).join('')
Here is a solution that converts the string to a valid plain or decimal number using a regex:
// something123.777.321something -> 123.777321
const str = 'something123.777.321something';
// Strip everything that is not a digit or a dot (the g flag removes every run)
let initialValue = str.replace(/[^0-9.]+/g, '');
// initialValue = '123.777.321'
// characterCount counts the occurrences of a character in a given string
const characterCount = (s, ch) => s.split(ch).length - 1;
if (characterCount(initialValue, '.') > 1) {
  const splitValue = initialValue.split('.');
  // splitValue = ['123', '777', '321']
  initialValue = splitValue.shift() + '.' + splitValue.join('');
  // initialValue = '123.777321'
}
If you want dot/comma separated numbers also, then:
\d*\.?\d*
or
[0-9]*\.?[0-9]*
You can use https://regex101.com/ to test your regexes.
Everything that other solutions have, but with a little validation
// value = '675-805-714'
const validateNumberInput = (value) => {
let numberPattern = /\d+/g
let numbers = value.match(numberPattern)
if (numbers === null) {
return 0
}
return parseInt(numbers.join(''), 10)
}
// 675805714
One liner
If you do not care about decimal numbers and only need the digits, I think this one-liner is rather elegant:
/**
* @param {String} str
* @returns {String} - All digits from the given `str`
*/
const getDigitsInString = (str) => str.replace(/[^\d]*/g, '');
console.log([
'?,!_:/42\`"^',
'A 0 B 1 C 2 D 3 E',
' 4 twenty 20 ',
'1413/12/11',
'16:20:42:01'
].map((str) => getDigitsInString(str)));
Simple explanation:
\d matches any digit from 0 to 9
[^n] matches anything that is not n
* matches 0 times or more the predecessor
( It is an attempt to match a whole block of non-digits all at once )
g at the end, indicates that the regex is global to the entire string and that we will not stop at the first occurrence but match every occurrence within it
Together those rules match anything but digits, which we replace with an empty string, resulting in a string containing digits only.
