JavaScript split string by regex


I will have a string that is never longer than 8 characters, e.g.:
// represented as array to demonstrate multiple examples
var strs = [
'11111111',
'1RBN4',
'12B5'
]
When run through a function, I would like every run of adjacent digit characters to be summed, giving a final string:
var strsAfterFunction = [
'8',
'1RBN4',
'3B5'
]
All eight 1 characters in the first string are summed into the single character 8. The second string is unchanged because it contains no adjacent digit characters. In the third string, the adjacent 1 and 2 become a 3 and the rest of the string is unchanged.
I believe the best way to do this, in pseudo-code, would be:
1. split the string by regex to find multiple digit characters that are adjacent
2. if an item in the split array contains digits, add them together
3. join the split array items
What would be the .split regex to split by multiple adjacent digit characters, e.g.:
var str = '12RB1N1'
=> ['12', 'R', 'B', '1', 'N', '1']
EDIT:
A question from the comments:
What about the string "999"? Should the result be "27" or "9"?
To be clear: always sum the digits once, so 999 => 27 and 234 => 9.

You can do the whole transformation in one step:
var results = strs.map(function(s){
return s.replace(/\d+/g, function(n){
return n.split('').reduce(function(s,i){ return +i+s }, 0)
})
});
For your strs array, it returns ["8", "1RBN4", "3B5"].

var results = string.match(/(\d+|\D+)/g);
Testing:
"aoueoe34243euouoe34432euooue34243".match(/(\d+|\D+)/g)
Returns
["aoueoe", "34243", "euouoe", "34432", "euooue", "34243"]
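Note that /(\d+|\D+)/g also groups adjacent non-digits, so '12RB1N1' gives ['12', 'RB', '1', 'N', '1'] rather than the array shown in the question. Below is a minimal sketch, assuming each non-digit character should be its own item, that uses /\d+|\D/g instead and then follows the three steps from the question. The name sumDigitRuns is hypothetical and does not come from any of the answers.

```javascript
// Hypothetical helper (not from the answers above): split, sum, join.
function sumDigitRuns(str) {
  // 1. split into runs of digits and single non-digit characters
  var parts = str.match(/\d+|\D/g) || [];
  // 2. replace every run of digits with the sum of its digits
  var summed = parts.map(function (part) {
    if (!/^\d+$/.test(part)) {
      return part;
    }
    var total = part.split('').reduce(function (sum, digit) {
      return sum + Number(digit);
    }, 0);
    return String(total);
  });
  // 3. join the pieces back together
  return summed.join('');
}

console.log('12RB1N1'.match(/\d+|\D/g)); // ['12', 'R', 'B', '1', 'N', '1']
console.log(sumDigitRuns('12B5'));       // "3B5"
console.log(sumDigitRuns('999'));        // "27"
```

The digits are summed only once per run, which matches the "999 => 27" rule from the question's edit.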

George... My answer was originally similar to dystroy's, but when I got home tonight and found your comment I couldn't pass up a challenge
:)
Here it is without a regexp. FWIW, it might be faster; it would be an interesting benchmark, since the iteration is native.
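function p(s){
// num stays undefined while we are not inside a run of digits
var str = "", num;
s.split("").forEach(function(v){
// note: isNaN(" ") is false, so whitespace is counted as a 0 digit
if(!isNaN(v)){
num = (num||0) + +v;
} else if(num!==undefined){
str += num + v;
num = undefined;
} else {
str += v;
}
});
// append a trailing digit run, even when its sum is 0
return str + (num!==undefined ? num : "");
}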
// TESTING
console.log(p("345abc567"));
// 12abc18
console.log(p("35abc2134mb1234mnbmn-135"));
// 8abc10mb10mnbmn-9
console.log(p("1 d0n't kn0w wh#t 3153 t0 thr0w #t th15 th1n6"));
// 1d0n't0kn0w0wh#t12t0thr0w0#t0th6th1n6
// EXTRA CREDIT
function fn(s){
var a = p(s);
return a === s ? a : fn(a);
}
console.log(fn("9599999gh999999999999999h999999999999345"));
// 5gh9h3

Related

How to determine matched group's offset in JavaScript's replace? [duplicate]

I want to match a regex like /(a).(b)(c.)d/ with "aabccde", and get the following information back:
"a" at index = 0
"b" at index = 2
"cc" at index = 3
How can I do this? String.match returns list of matches and index of the start of the complete match, not index of every capture.
Edit: A test case which wouldn't work with plain indexOf
regex: /(a).(.)/
string: "aaa"
expected result: "a" at 0, "a" at 2
Note: The question is similar to Javascript Regex: How to find index of each subexpression?, but I cannot modify the regex to make every subexpression a capturing group.

The RegExp Match Indices proposal (stage 4, included in ECMAScript 2022) implements this in native JavaScript:
RegExp Match Indices for ECMAScript
ECMAScript RegExp Match Indices provide additional information about the start and end indices of captured substrings relative to the start of the input string.
...We propose the adoption of an additional indices property on the array result (the substrings array) of RegExp.prototype.exec(). This property would itself be an indices array containing a pair of start and end indices for each captured substring. Any unmatched capture groups would be undefined, similar to their corresponding element in the substrings array. In addition, the indices array would itself have a groups property containing the start and end indices for each named capture group.
Here's an example of how things would work. The following snippets run without errors in, at least, Chrome:
const re1 = /a+(?<Z>z)?/d;
// indices are relative to start of the input string:
const s1 = "xaaaz";
const m1 = re1.exec(s1);
console.log(m1.indices[0][0]); // 1
console.log(m1.indices[0][1]); // 5
console.log(s1.slice(...m1.indices[0])); // "aaaz"
console.log(m1.indices[1][0]); // 4
console.log(m1.indices[1][1]); // 5
console.log(s1.slice(...m1.indices[1])); // "z"
console.log(m1.indices.groups["Z"][0]); // 4
console.log(m1.indices.groups["Z"][1]); // 5
console.log(s1.slice(...m1.indices.groups["Z"])); // "z"
// capture groups that are not matched return `undefined`:
const m2 = re1.exec("xaaay");
console.log(m2.indices[1]); // undefined
console.log(m2.indices.groups.Z); // undefined
So, for the code in the question, we could do:
const re = /(a).(b)(c.)d/d;
const str = 'aabccde';
const result = re.exec(str);
// indices[0], like result[0], describes the indices of the full match
const matchStart = result.indices[0][0];
result.forEach((matchedStr, i) => {
const [startIndex, endIndex] = result.indices[i];
console.log(`${matchedStr} from index ${startIndex} to ${endIndex} in the original string`);
console.log(`From index ${startIndex - matchStart} to ${endIndex - matchStart} relative to the match start\n-----`);
});
Output:
aabccd from index 0 to 6 in the original string
From index 0 to 6 relative to the match start
-----
a from index 0 to 1 in the original string
From index 0 to 1 relative to the match start
-----
b from index 2 to 3 in the original string
From index 2 to 3 relative to the match start
-----
cc from index 3 to 5 in the original string
From index 3 to 5 relative to the match start
-----
Keep in mind that the indices array contains the indices of the matched groups relative to the start of the string, not relative to the start of the match.
A polyfill is available here.

I wrote MultiRegExp for this a while ago. As long as you don't have nested capture groups, it should do the trick. It works by inserting capture groups between those in your RegExp and using all the intermediate groups to calculate the requested group positions.
var exp = new MultiRegExp(/(a).(b)(c.)d/);
exp.exec("aabccde");
should return
{0: {index:0, text:'a'}, 1: {index:2, text:'b'}, 2: {index:3, text:'cc'}}

I created a little regexp Parser which is also able to parse nested groups like a charm. It's small but huge. No really. Like Donalds hands. I would be really happy if someone could test it, so it will be battle tested. It can be found at: https://github.com/valorize/MultiRegExp2
Usage:
let regex = /a(?: )bc(def(ghi)xyz)/g;
let regex2 = new MultiRegExp2(regex);
let matches = regex2.execForAllGroups('ababa bcdefghixyzXXXX');
Will output:
[ { match: 'defghixyz', start: 8, end: 17 },
{ match: 'ghi', start: 11, end: 14 } ]

Updated Answer: 2022
See String.prototype.matchAll
The matchAll() method matches the string against a regular expression and returns an iterator of matching results.
Each match is an array, with the matched text as the first item, and then one item for each parenthetical capture group. It also includes the extra properties index and input.
let regexp = /t(e)(st(\d?))/g;
let str = 'test1test2';
for (let match of str.matchAll(regexp)) {
console.log(match)
}
// => ['test1', 'e', 'st1', '1', index: 0, input: 'test1test2', groups: undefined]
// => ['test2', 'e', 'st2', '2', index: 5, input: 'test1test2', groups: undefined]
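On its own, matchAll() only reports the offset of the whole match (match.index), not of each capture group. A minimal sketch, assuming an engine that supports the ES2022 d (hasIndices) flag, combines the two to get per-group offsets. The helper name groupOffsets is hypothetical.

```javascript
// Hypothetical helper: lists every matched capture group with its start offset.
// Assumes the runtime supports the `d` (hasIndices) flag from ES2022.
function groupOffsets(str, regex) {
  // matchAll requires the g flag; d adds the `indices` array to each match
  var flags = regex.flags.replace(/[gd]/g, '') + 'gd';
  var re = new RegExp(regex.source, flags);
  var out = [];
  for (const match of str.matchAll(re)) {
    for (let i = 1; i < match.length; i++) {
      // groups that did not participate in the match have undefined indices
      if (match.indices[i] !== undefined) {
        out.push({ text: match[i], index: match.indices[i][0] });
      }
    }
  }
  return out;
}

console.log(groupOffsets('aabccde', /(a).(b)(c.)d/));
// [ { text: 'a', index: 0 }, { text: 'b', index: 2 }, { text: 'cc', index: 3 } ]
console.log(groupOffsets('aaa', /(a).(.)/));
// [ { text: 'a', index: 0 }, { text: 'a', index: 2 } ]
```

Because the offsets come from the regex engine itself, repeated substrings such as the "aaa" test case from the question are handled without any indexOf guessing.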

Based on the ECMAScript regular expression syntax, I've written a parser (more precisely, an extension of the RegExp class) that solves this problem with a fully indexed exec method, as well as other limitations of the JavaScript RegExp implementation, for example group-based search and replace. You can test and download the implementation here (it is also available as an NPM module).
The implementation works as follows (small example):
//Retrieve content and position of: opening-, closing tags and body content for: non-nested html-tags.
var pattern = '(<([^ >]+)[^>]*>)([^<]*)(<\\/\\2>)';
var str = '<html><code class="html plain">first</code><div class="content">second</div></html>';
var regex = new Regex(pattern, 'g');
var result = regex.exec(str);
console.log(5 === result.length);
console.log('<code class="html plain">first</code>'=== result[0]);
console.log('<code class="html plain">'=== result[1]);
console.log('first'=== result[3]);
console.log('</code>'=== result[4]);
console.log(5=== result.index.length);
console.log(6=== result.index[0]);
console.log(6=== result.index[1]);
console.log(31=== result.index[3]);
console.log(36=== result.index[4]);
I also tried the implementation from @velop, but it seems buggy: for example, it does not handle backreferences correctly, e.g. "/a(?: )bc(def(\1ghi)xyz)/g". When parentheses are added in front, the backreference \1 needs to be incremented accordingly, which his implementation does not do.

So, you have a text and a regular expression:
txt = "aabccde";
re = /(a).(b)(c.)d/;
The first step is to get the list of all substrings that match the regular expression:
subs = re.exec(txt);
Then, you can do a simple search on the text for each substring. You will have to keep in a variable the position of the last substring. I've named this variable cursor.
var cursor = subs.index;
for (var i = 1; i < subs.length; i++){
sub = subs[i];
index = txt.indexOf(sub, cursor);
cursor = index + sub.length;
console.log(sub + ' at index ' + index);
}
EDIT: Thanks to @nhahtdh, I've improved the mechanism and made a complete function:
String.prototype.matchIndex = function(re){
var res = [];
var subs = this.match(re);
for (var cursor = subs.index, l = subs.length, i = 1; i < l; i++){
var index = cursor;
if (i+1 !== l && subs[i] !== subs[i+1]) {
nextIndex = this.indexOf(subs[i+1], cursor);
while (true) {
currentIndex = this.indexOf(subs[i], index);
if (currentIndex !== -1 && currentIndex <= nextIndex)
index = currentIndex + 1;
else
break;
}
index--;
} else {
index = this.indexOf(subs[i], cursor);
}
cursor = index + subs[i].length;
res.push([subs[i], index]);
}
return res;
}
console.log("aabccde".matchIndex(/(a).(b)(c.)d/));
// [ [ 'a', 1 ], [ 'b', 2 ], [ 'cc', 3 ] ]
console.log("aaa".matchIndex(/(a).(.)/));
// [ [ 'a', 0 ], [ 'a', 1 ] ] <-- problem here
console.log("bababaaaaa".matchIndex(/(ba)+.(a*)/));
// [ [ 'ba', 4 ], [ 'aaa', 6 ] ]

I'm not sure exactly what your requirements are for your search, but here's how you could get the desired output in your first example using RegExp.prototype.exec() and a while loop.
JavaScript
var myRe = /^a|b|c./g;
var str = "aabccde";
var myArray;
while ((myArray = myRe.exec(str)) !== null)
{
var msg = '"' + myArray[0] + '" ';
msg += "at index = " + (myRe.lastIndex - myArray[0].length);
console.log(msg);
}
Output
"a" at index = 0
"b" at index = 2
"cc" at index = 3
Using the lastIndex property, you can subtract the length of the currently matched string to obtain the starting index.
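As a small variation on the loop above, the array returned by exec() also carries an index property with the start of the match, so the subtraction is not strictly needed. A minimal sketch using the same regex and string (the variable names found and m are only for illustration):

```javascript
var re = /^a|b|c./g;
var str = 'aabccde';
var found = [];
var m;
// exec() advances re.lastIndex on each call because of the g flag
while ((m = re.exec(str)) !== null) {
  // m.index is the start offset of the current match
  found.push('"' + m[0] + '" at index = ' + m.index);
}
console.log(found.join('\n'));
// "a" at index = 0
// "b" at index = 2
// "cc" at index = 3
```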

Javascript: What does this `Array(i+1)` do?

Hi I found a solution for a problem on codewars and I'm not sure what a piece of the syntax does. The function takes a string of characters, and based on the length, returns it in a certain fashion.
input = "abcd"; output = "A-Bb-Ccc-Dddd"
input = "gFkLM"; output = "G-Ff-Kkk-Llll-Mmmmm"
This guy posted this solution
function accum(str) {
var letters = str.split('');
var result = [];
for (var i = 0; i < letters.length; i++) {
result.push(letters[i].toUpperCase() + Array(i + 1).join(letters[i].toLowerCase()));
}
return result.join('-');
}
Kinda confused about the solution overall, but one thing is particularly nagging me. See that Array(i + 1) ? What does that do? Sorry, not a very easy thing to google.

I believe that this allocates an array of length i + 1. But more importantly, what is the code doing? You have to know what the join() function does: it concatenates the elements of an array, delimited by the function argument. For example:
['one', 'two', 'three'].join(' ') === 'one two three'
In this case, the array contains only empty slots, which join() treats like undefined elements (that is, as empty strings), so you get something like this:
[undefined].join('a') === ''
[undefined, undefined].join('b') === 'b'
[undefined, undefined, undefined].join('c') === 'cc'
[undefined, undefined, undefined, undefined].join('d') === 'ddd'
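For comparison, here is a minimal sketch of the same function written with String.prototype.repeat (ES2015), which states the intent more directly than the Array(i + 1).join() trick. The name accumWithRepeat is hypothetical and is not part of the codewars solution.

```javascript
// Hypothetical rewrite of accum() using String.prototype.repeat.
function accumWithRepeat(str) {
  return str.split('').map(function (ch, i) {
    // one upper-case character followed by i lower-case copies
    return ch.toUpperCase() + ch.toLowerCase().repeat(i);
  }).join('-');
}

// Array(n).join(sep) yields n - 1 separators, which is what the original relies on
console.log(Array(4).join('d') === 'd'.repeat(3)); // true
console.log(accumWithRepeat('abcd'));              // "A-Bb-Ccc-Dddd"
console.log(accumWithRepeat('gFkLM'));             // "G-Ff-Kkk-Llll-Mmmmm"
```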

So in the beginning for statement, i starts out at 0. Now if you go inside the for statement where it says i+1, i would be 1. And then when the for loop updates and i equals 1, i+1 inside the for loop would equal 2. This process would continue for the length of the string. Hope this helps.

I have just checked
let x= Array(3);
console.log(x);
The output is [undefined, undefined, undefined] (newer consoles print it as three empty slots).
So it actually creates an array of size 3 whose elements all read as undefined.
When we call join with a character as the parameter, it creates a string with that character repeated 2 times, i.e. (3-1).
console.log(x.join('a')); // logs aa

Commented code walk-through ....
function accum(str) {
/* converts string to character array.*/
var letters = str.split('');
/* variable to store result */
var result = [];
/* for each character concat (1.) + (2.) and push into results.
1. letters[i].toUpperCase() :
UPPER-CASE of the character.
2. Array(i + 1).join(letters[i].toLowerCase()) :
create an array with EMPTY SLOTS whose length is the current index + 1.
And join them into a string with the current character's LOWER-CASE as the separator.
Ex:
Index | ArrayLength, Array | Separator | Joined String
0 1, [null] 'a' ''
1 2, [null,null] 'b' 'b'
2 3, [null,null,null] 'c' 'cc'
3 4, [null,null,null,null] 'd' 'ddd'
NOTE:
Join on an array with EMPTY SLOTS inserts the separator in between the slot values.
Meaning, if N is the length of the array, then there will be N-1 separators inserted into the joined string
*/
for (var i = 0; i < letters.length; i++) {
result.push(letters[i].toUpperCase() + Array(i + 1).join(letters[i].toLowerCase()));
}
/* finally join all separated by '-' and return ...*/
return result.join('-');
}

How to parse two space-delimited numbers with thousand separator?

I need to parse a string that contains two numbers, which can appear in one of three forms:
"646.60 25.10" => [646.60, 25.10]
"1 395.86 13.50" => [1395.86, 13.50]
"13.50 1 783.69" => [13.50, 1783.69]
In the simple case it is enough to use 'number'.split(' '), but in some cases there is a thousands separator, as in the second and third examples.
So how can I parse these numbers in all cases?
EDIT: All numbers have a decimal separator in the last segment of a number.

var string1 = "646.60 25.10"; // => ["646.60", "25.10"]
var string2 = "1 395.86 13.50"; // => ["1 395.86", "13.50"]
var string3 = "13.50 1 783.69"; // => ["13.50", "1 783.69"]
function getArray(s) {
var index = s.indexOf(" ", s.indexOf("."));
return [s.substring(0,index), s.substring(index+1) ];
}
console.log(getArray(string1));
console.log(getArray(string2));
console.log(getArray(string3));
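getArray() returns the two parts as strings and leaves the thousands separator in place. A minimal sketch, assuming a space is the only thousands separator and every number contains a decimal point (as the question's edit states), that goes one step further and returns actual numbers. The name parseTwoNumbers is hypothetical.

```javascript
// Hypothetical follow-up to getArray(): same split point, but returns numbers.
function parseTwoNumbers(s) {
  // the first space after the first decimal point separates the two numbers
  var index = s.indexOf(' ', s.indexOf('.'));
  return [s.substring(0, index), s.substring(index + 1)].map(function (part) {
    // drop the thousands separators before parsing
    return parseFloat(part.replace(/\s/g, ''));
  });
}

console.log(parseTwoNumbers('646.60 25.10'));   // [646.6, 25.1]
console.log(parseTwoNumbers('1 395.86 13.50')); // [1395.86, 13.5]
console.log(parseTwoNumbers('13.50 1 783.69')); // [13.5, 1783.69]
```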

Assuming every number ends with dot-digit-digit (in the comments you said they do), you can use that to find the right place to split with a regex.
That way, it is robust and general for any number of numbers (although you specified you have only two) and for any number of digits in the numbers, as long as each one ends with dot-digit-digit:
var str1 = "4 435.89 1 333 456.90 7.54";
function splitToNumbers(str){
// remove the spaces, then split on each number; the dot is escaped so it only matches a literal "."
var arr = str.replace(/\s+/g,'').split(/(\d+\.\d\d)/);
//now clear the empty strings however you like
arr.shift();
arr.pop();
arr = arr.join('&').split(/&+/);
return arr;
}
console.log(splitToNumbers(str1));
// ["4435.89", "1333456.90", "7.54"]

How to extract last characters of type number javascript/jquery

I have some strings like:
row_row1_1,
row_row1_2,
row_row1_13,
row_row1_287,
...
and I want to take the trailing number of those strings; it can be 1, 2, 13 or 287. The strings are generated automatically and I can't control whether the number has 1 digit, 2 digits, 3 digits...
I would like to write a function that takes the trailing digits, or the number after the second '_' character. Any ideas?
Thank you very much!

If your strings always follow the pattern str_str_str, then you can use the split method and take the element at index 2 of the array, like this:
var number = str.split('_')[2];

As @PaulS said, you can always use regex for that purpose:
var getLastNumbers = function(str)
{
return str.replace(/.+(_)/, '');
};
getLastNumbers("row_row1_287"); // Will result -> 287

Taking the last numeric characters
function getNumericSuffix(myString) {
var reversedString = myString.split('').reverse().join('');
var i, result="";
for(i = 0; i < reversedString.length; i++) {
if(!isNaN(reversedString[i])) {
result = reversedString[i] + result;
} else break;
}
return parseInt(result, 10); // assuming all numbers are integers; NaN if there is no numeric suffix
}
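The same idea can be sketched more compactly with a regex anchored at the end of the string. This is a minimal sketch, assuming the number is always the very last part of the string; the name getTrailingNumber is hypothetical.

```javascript
// Hypothetical helper: returns the trailing run of digits as a number,
// or null when the string does not end in a digit.
function getTrailingNumber(str) {
  var match = /\d+$/.exec(str);
  return match ? parseInt(match[0], 10) : null;
}

console.log(getTrailingNumber('row_row1_1'));   // 1
console.log(getTrailingNumber('row_row1_287')); // 287
console.log(getTrailingNumber('row_row1_'));    // null
```

Returning null for the no-digits case is a design choice made here so the caller can tell "no number" apart from a parsed value.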

RegEx for filling up string

I have the following input:
123456_r.xyz
12345_32423_131.xyz
1235.xyz
237213_21_mmm.xyz
And now I need to pad the first run of digits with leading zeros to a length of 8 digits:
00123456_r.xyz
00012345_32423_131.xyz
00001235.xyz
00237213_21_mmm.xyz
My attempt was to split at the dot, then split (if present) at the underscore, take the first number and pad it.
But I think there must be a more efficient way using the regex replace function in a single call, right? What would this look like?
TIA
Matt

I would use a regex, but just for the splitting:
var input = "12345_32423_131.xyz";
var output = "00000000".slice(input.split(/_|\./)[0].length)+input;
Result : "00012345_32423_131.xyz"
EDIT :
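the fast, no-splitting and no-regex solution I gave in the comments (a separator that is missing is ignored, since indexOf then returns -1):
var cut = Math.min.apply(null, [input.indexOf('_'), input.indexOf('.')].filter(function(i){ return i >= 0; }));
"00000000".slice(cut)+input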

I wouldn't split at all, just replace:
"123456_r.xyz\n12345_32423_131.xyz\n1235.xyz\n237213_21_mmm.xyz".replace(/^[0-9]+/mg, function(a) {return '00000000'.slice(0, 8-a.length)+a})

There's a simple regexp to find the part of the string you want to replace, but you'll need to use a replace function to perform the action you want.
// The array with your strings
var strings = [
'123456_r.xyz',
'12345_32423_131.xyz',
'1235.xyz',
'237213_21_mmm.xyz'
];
// A function that takes a string and a desired length
function addLeadingZeros(string, desiredLength){
// ...and, while the length of the string is less than desired..
while(string.length < desiredLength){
// ...replaces it with '0' plus itself
string = '0' + string;
}
// And returns that string
return string;
}
// So for each items in 'strings'...
for(var i = 0; i < strings.length; ++i){
// ...replace any instance of the regex (1 or more (+) integers (\d) at the start (^))...
strings[i] = strings[i].replace(/^\d+/, function replace(capturedIntegers){
// ...with the function defined above, specifying 8 as our desired length.
return addLeadingZeros(capturedIntegers, 8);
});
};
// Output to screen!
document.write(JSON.stringify(strings));
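In environments that support String.prototype.padStart (ES2017), the padding helper is not needed. A minimal sketch of the same replace, with the hypothetical name padLeadingDigits:

```javascript
// Hypothetical helper: pads the leading run of digits to `width` with zeros.
function padLeadingDigits(name, width) {
  return name.replace(/^\d+/, function (digits) {
    // padStart leaves the string unchanged if it is already long enough
    return digits.padStart(width, '0');
  });
}

var padded = [
  '123456_r.xyz',
  '12345_32423_131.xyz',
  '1235.xyz',
  '237213_21_mmm.xyz'
].map(function (name) {
  return padLeadingDigits(name, 8);
});

console.log(padded);
// [ '00123456_r.xyz', '00012345_32423_131.xyz', '00001235.xyz', '00237213_21_mmm.xyz' ]
```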
