JavaScript regexp, not getting all matches, what am I missing here? [duplicate]

This question already has answers here:
How can I match overlapping strings with regex?
(6 answers)
Closed 4 years ago.
Let's say for example I have this simple string
let str = '5+81+3+16+42'
Now I want to capture each plus sign together with the numbers on both sides of it.
My attempt was as follows:
let matches = str.match(/\d+\+\d+/g);
What I got with that is:
['5+81', '3+16']
Why is it not matching the cases in between? I expected:
['5+81', '81+3', '3+16', '16+42']

Your regex has to fulfill the whole pattern, which is \d+\+\d+. It first matches 5+81; the next character is a +, which the pattern cannot match because it has to start with a digit. It can then match 3+16, but it cannot match the following +42, giving you ['5+81', '3+16'] as the matches.
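To make the behaviour described above visible, here is a minimal sketch that records where each match starts and where the engine resumes searching. The variable names (pattern, input, trace) are only illustrative and are not part of the original question.

```javascript
// Trace each match of the question's regex together with the position
// at which the next search resumes (RegExp.prototype.lastIndex).
const pattern = /\d+\+\d+/g;
const input = '5+81+3+16+42';
const trace = [];
let match;
while ((match = pattern.exec(input)) !== null) {
  trace.push({ text: match[0], start: match.index, resumeAt: pattern.lastIndex });
}
console.log(trace);
```

After the first match the search resumes at the + that follows 81, so 81 is no longer available as the left-hand number of a second match.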
Without a regex, you might use split and a for loop and check if the next value exists in the parts array:
let str = '5+81+3+16+42'
let parts = str.split('+');
for (let i = 0; i < parts.length; i++) {
if (undefined !== parts[i + 1]) {
console.log(parts[i] + '+' + parts[i + 1]);
}
}
When using a more recent version of Chrome, which supports lookbehinds, you might use lookarounds with capturing groups:
(?<=(\d+))(\+)(?=(\d+))
See the regex demo
const regex = /(?<=(\d+))(\+)(?=(\d+))/g;
const str = `5+81+3+16+42`;
let m;
while ((m = regex.exec(str)) !== null) {
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
console.log(m[1] + m[2] + m[3]);
}

When the regular expression engine completes one iteration of a match, it "consumes" the characters from the source string. The first match of 5+81 leaves the starting point for the next match at the + sign after 81, so the next match for the expression begins at the 3.
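One way around the consumption described above is to make the match itself zero-width and capture the text inside a lookahead, so nothing is consumed. The following is a sketch under two assumptions: String.prototype.matchAll is available, and a word boundary (\b) is an acceptable way to make sure a match only starts at the beginning of a number.

```javascript
// The lookahead captures "number+number" without consuming it, so the
// right-hand number can be reused as the left-hand number of the next pair.
// \b keeps the engine from starting in the middle of a number (e.g. "1+3").
const str = '5+81+3+16+42';
const overlapping = Array.from(str.matchAll(/\b(?=(\d+\+\d+))/g), m => m[1]);
console.log(overlapping);
```

Unlike the lookbehind variant, this pattern relies only on a lookahead.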

Split the string by the + delimiter and use .reduce() to create a new array containing the target result.
let str = '5+81+3+16+42';
let arr = str.split('+').reduce((tot, num, i, arr) => {
i+1 < arr.length ? tot.push(num+"+"+arr[i+1]) : '';
return tot;
}, []);
console.log(arr);

You can do it using split and reduce without making things complex with regex.
let str = '5+81+3+16+42';
const array = str.split('+');
const splited = []
array.reduce((a, b) => {
splited.push(a+'+'+b)
return b
})
console.log(splited);

Related

Insert Spaces into string at an index

I'm trying to do this Codewars problem.
Task
In this simple Kata your task is to create a function that turns a string into a Mexican Wave. You will be passed a string and you must return that string in an array where an uppercase letter is a person standing up.
Rules
The input string will always be lower case but maybe empty.
If the character in the string is whitespace then pass over it as if it was an empty seat.
Example
wave("hello") => ["Hello", "hEllo", "heLlo", "helLo", "hellO"]
My code so far is hosted on this repl.it
My thought process is as follows:
Turn argument into array
manipulate each index of the array at index and then readjust previous index to make a wave pattern
turn array into string
reinsert spaces before logging it to console and restarting the loop
I'm pretty stuck and my mind is stuck on how to use
for(var j = 0; j < indexSpaceNumber.length; j++){
//join and add in the spaces at their former index before returning string
strToArray[indexSpaceNumber[j]].slice(0, " ");
}
to insert the spaces into the string.
If there's any guidance or tips it would be much appreciated. I feel like I'm close, but so frustratingly far.

The main idea would be:
Iterate the characters
Replace the character in the original string with an uppercase version
You can use Array.from() to convert the string to an array, and map each item to a new string. If the character is a space, return something falsy (an empty string in the example). After creating the array, filter out all falsy values:
const wave = str =>
Array.from(str, (c,i) => // convert the string to an array
// replace the character with an uppercase version in the original string
c === ' ' ?
''
:
`${str.substring(0, i)}${c.toUpperCase()}${str.substring(i + 1)}`
).filter(c => c)
const result = wave("hello")
console.log(result)

For string with spaces
function wave(str) {
let res = []
str.toLowerCase().split('').forEach((v, i) => {
if(v == ' ') return;
res.push( str.substr(0, i) + v.toUpperCase() + str.substr(i + 1) )
});
return res
}
console.log(wave("hello hello"))

I'd go recursive ;)
You know that for a string of length n you need an array of the same length. That's your exit condition.
You can use the length of the array at each iteration to work out the shape of the next string:
hello [] [Hello] 0: uppercase 1st char and append
hello [Hello] [Hello hEllo] 1: uppercase 2nd char and append
hello [Hello hEllo] [Hello hEllo heLlo] 2: uppercase 3rd char and append
...
const wave =
(str, arr = []) =>
str.length === arr.length
? arr
: wave
( str
, [ ...arr
, str.slice(0, arr.length)
+ str[arr.length].toUpperCase()
+ str.slice(arr.length + 1)
]
);
console.log(wave('hello'));

Go over each character in the string and build:
the slice of str from the start up to the current character + the current character in upper case + the slice of str from after the current character to the end
const wave = str => {
const res = [];
for (let i = 0; i < str.length; i++) {
res.push(`${str.slice(0, i)}${str[i].toUpperCase()}${str.slice(i + 1)}`);
}
return res;
};
console.log(wave("hi my name is rylan"));
// Alternate way to do with Array.splice
const wave2 = str => {
const res = [];
for (let i in str) {
const temp = Array.from(str);
temp.splice(i, 1, temp[i].toUpperCase());
res.push(temp)
}
return res.map(x => x.join(''));
};
console.log(wave2("hi my name is rylan"));
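The two loops above emit an entry for every position, including spaces, whereas the kata says whitespace should be passed over like an empty seat. A minimal sketch of the same slicing idea with that rule added (the name waveSkippingSpaces is made up for this example):

```javascript
// Build one "wave" string per non-whitespace character.
function waveSkippingSpaces(text) {
  const result = [];
  for (let i = 0; i < text.length; i++) {
    if (/\s/.test(text[i])) continue; // empty seat: no entry for this position
    result.push(text.slice(0, i) + text[i].toUpperCase() + text.slice(i + 1));
  }
  return result;
}
console.log(waveSkippingSpaces('two words'));
```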

How to remove word in string based on array in Javascript when word's character length in string is fewer than in array?

I want to remove some words from a string based on an array, but the words in the string have fewer characters than those in the array. Is it possible to match them using a regex and then replace them with an empty string? If not, what are the alternatives?
I tried using a regex to match the words, but I can't get it to work. I don't know how to make the regex match a minimum of 3 characters from each array entry.
array = ['reading', 'books'];
string = 'If you want to read the book, just read it.';
desiredOutput = 'If you want to the , just it.';
// Desired match
'reading' -> match for 'rea', 'read', 'readi', 'readin', 'reading'
'books' -> match for 'boo', 'book', 'books'

One option is to match 3 or more word characters starting at a word boundary, then use a replacer function to return the empty string if any of the words startsWith the word in question:
const array = ['reading', 'books'];
const string = 'If you want to read the book, just read it.';
const output = string.replace(
/\b\w{3,}/g,
word => array.some(item => item.startsWith(word)) ? '' : word
);
console.log(output);

The answer from CertainPerformance is better: it is easier to implement and to maintain. It's worth noting, though, that you can also generate a regex from the array.
The idea is simple enough: if you want to match r, re, rea, read, readi, readin, reading, the regex for that is reading|readin|readi|read|rea|re|r. The reason you want the longest variation first is that otherwise the regex engine will stop at the first alternative that matches:
let regex = /r|re|rea|read/g
// ↑_________________
console.log( // |
"read".replace(regex, "")// |
// ↑___________________________|
)
So you can take a word and break it out in this pattern to generate a regex from it:
function allSubstrings(word) {
let substrings = [];
for (let i = word.length; i > 0; i--) {
let sub = word.slice(0, i);
substrings.push(sub)
}
return substrings;
}
console.log(allSubstrings("reading"))
With that you can simply generate the regex you need.
function allSubstrings(word) {
let substrings = [];
for (let i = word.length; i > 0; i--) {
let sub = word.slice(0, i);
substrings.push(sub)
}
return substrings;
}
function toPattern(word) {
let substrings = allSubstrings(word);
let pattern = substrings.join("|");
return pattern;
}
console.log(toPattern("reading"))
The final thing is to take an array and convert it to a regex. Which requires treating each word and then combining each individual regex into one that matches any of the words:
const array = ['reading', 'books'];
const string = 'If you want to read the book, just read it.';
//generate the pattern
let pattern = array
.map(toPattern) //first, for each word
.join("|"); //join patterns for all words
//convert the pattern to a regex
let regex = new RegExp(pattern, "g");
let result = string.replace(regex, "");
//desiredOutput: 'If you want to the , just it.';
console.log(result);
function allSubstrings(word) {
let substrings = [];
for (let i = word.length; i > 0; i--) {
let sub = word.slice(0, i);
substrings.push(sub)
}
return substrings;
}
function toPattern(word) {
let substrings = allSubstrings(word);
let pattern = substrings.join("|");
return pattern;
}
So, this is how you can generate a regular expression from that array. In this case it works, but it's not guaranteed to, because there is a danger it could match something you don't want. For example, r will match any r character anywhere; it doesn't need to be in a word that matches the pattern.
const array = ['reading'];
const string = 'The quick brown fox jumps over the lazy dog';
// ^ ^
let pattern = array
.map(word => allSubstrings(word).join("|"))
.join("|");
let regex = new RegExp(pattern, "g");
let result = string.replace(regex, "");
console.log(result);
function allSubstrings(word) {
let substrings = [];
for (let i = word.length; i > 0; i--) {
let sub = word.slice(0, i);
substrings.push(sub)
}
return substrings;
}
Which is when it becomes more complicated, as you want to generate a more complicated pattern for each word. You generally want to match words, so you can use the word boundary character \b which means that the pattern for "reading" can now look like this:
\breading\b|\breadin\b|\breadi\b|\bread\b|\brea\b|\bre\b|\br\b
↑↑ ↑↑ ↑↑ ↑↑ ↑↑ ↑↑ ↑↑ ↑↑ ↑↑ ↑↑ ↑↑ ↑↑ ↑↑ ↑↑
In the interest of keeping the output at least somewhat readable, it can instead be put in a group and the whole group made to match a single word:
\b(?:reading|readin|readi|read|rea|re|r)\b
↑↑
||____ non-capturing group
So, you have to generate this pattern
function toPattern(word) {
let substrings = allSubstrings(word);
//escape backslashes, because this is a string literal and we need \b as content
let pattern = "\\b(?:" + substrings.join("|") + ")\\b";
return pattern;
}
Which leads us to this
const array = ['reading', 'books'];
const string = 'The quick brown fox jumps over the lazy dog. If you want to read the book, just read it.';
let pattern = array
.map(toPattern)
.join("|");
let regex = new RegExp(pattern, "g");
let result = string.replace(regex, "");
console.log(result);
function allSubstrings(word) {
let substrings = [];
for (let i = word.length; i > 0; i--) {
let sub = word.slice(0, i);
substrings.push(sub)
}
return substrings;
}
function toPattern(word) {
let substrings = allSubstrings(word);
let pattern = "\\b(?:" + substrings.join("|") + ")\\b";
return pattern;
}
This will suffice to solve your task. So it's possible to generate a regex. The final one looks like this:
/\b(?:reading|readin|readi|read|rea|re|r)\b|\b(?:books|book|boo|bo|b)\b/g
But most of the generation of it is spent trying to generate something that works. It's not a necessarily complex solution but as mentioned, the one suggested by CertainPerformance is better because it's simpler which means less chance of it failing and it would be easier to maintain for the future.
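The generated pattern above also contains the one- and two-letter prefixes (r, re, b, bo), while the question asked for a minimum of 3 characters. A hedged variation on the same idea is sketched below; the helper names escapeRegExp, prefixes and buildPrefixRegex and the minLength parameter are assumptions introduced for this example, not part of the answer.

```javascript
// Escape regex metacharacters so array entries are treated as literal text.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// All prefixes of `word` that are at least `minLength` long, longest first.
function prefixes(word, minLength) {
  const result = [];
  for (let i = word.length; i >= minLength; i--) {
    result.push(escapeRegExp(word.slice(0, i)));
  }
  return result;
}

// One regex that matches any sufficiently long prefix as a whole word.
function buildPrefixRegex(words, minLength) {
  const alternatives = words.flatMap(word => prefixes(word, minLength));
  return new RegExp('\\b(?:' + alternatives.join('|') + ')\\b', 'g');
}

const prefixRegex = buildPrefixRegex(['reading', 'books'], 3);
const cleaned = 'If you want to read the book, just read it.'.replace(prefixRegex, '');
console.log(cleaned);
```

The surrounding spaces are left in place, just as in the snippets above, so the result contains double spaces where a word was removed.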

I don't know of a straight way to do it, but you can create your own regexp pattern, like so:
// This function create a regex pattern string for each word in the array.
// The str is the string value (the word),
// min is the minimum required letters in each word
function getRegexWithMinChars(str, min) {
var charArr = str.split("");
var length = charArr.length;
var regexpStr = "";
for(var i = 0; i < length; i++){
regexpStr +="[" + charArr[i] + "]" + (i < min ? "" : "?");
}
return regexpStr;
}
// This function returns a regexp object with the patterns of the words in the array
function getStrArrayRegExWithMinChars(strArr, min) {
var length = strArr.length;
var regexpStr = "";
for(var i = 0; i < length; i++) {
regexpStr += "(" + getRegexWithMinChars(strArr[i], min) + ")?";
}
return new RegExp(regexpStr, "gm");
}
var regexp = getStrArrayRegExWithMinChars(searchArr, 3);
// With the given regexp I was able to use string replace to
// find and replace all the words in the string
str.replace(regexp, "");
//The same can be done with one ES6 function
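const getStrArrayRegExWithMinChars = (searchArr, min) => {
const pattern = searchArr.reduce((wordsPatt, word) => {
const patt = word.split("").reduce((wordPatt, letter, index) => {
return wordPatt + "[" + letter + "]" + (index < min ? "" : "?");
}, "");
return wordsPatt + "(" + patt + ")?";
}, "");
// Return a RegExp object like the first version does; a plain string
// passed to replace() would be searched for literally.
return new RegExp(pattern, "gm");
}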
var regexp = getStrArrayRegExWithMinChars(searchArr, 3);
// With the given regexp I was able to use string replace to
// find and replace all the words in the string
str.replace(regexp, "");

Finding the index to a non-specified character

Let's say for example I have a string
thisIsThisTuesday Day
I want to find the index of all the capital letters, test if there is a space before it, and if not insert one. I would need the index of each one.
At least from what I can see, indexOf(String) will only produce the index of the first occurrence of the character T/t.
This :
for(i=0;i<str.length;i++){
let char=str[i];
if(isNaN(char*1)&&char==char.toUpperCase()){
y=str.indexOf(char);
console.log(char,y)
}
}
would produce the capital letters, and their indexes but will only display the first occurrence of the character in question. I feel pretty confident that the part I am missing is a for() loop in order to move the index iteration..but it escapes me.
Thank you in advance!

You can use a regex:
It matches any non-whitespace character followed by a capital letter and replaces it by the two characters with a space between.
const str = "thisIsThisTuesday Day";
const newstr = str.replace(/([^ ])([A-Z])/g, "$1 $2");
console.log(newstr);

You can use the following regular expression:
/(?<=\S)(?=[A-Z])/g
The replace will insert a space between a non-space character and the capital letter that follows it.
See example below:
let str = "thisIsThisTuesday Day";
const res = str.replace(/(?<=\S)(?=[A-Z])/g, ' ');
console.log(res);
Note: As pointed out, ?<= (positive lookbehind) may not be available in all browsers.

Actually, the String.indexOf function can take a second argument, specifying the character it should start searching from. Take a look at: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/indexOf
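As a sketch of how that second argument can be used to find every occurrence rather than only the first, the loop below restarts the search just past the previous hit. The function name allIndexesOf is made up for this illustration.

```javascript
// Collect every index at which `search` occurs in `text`.
function allIndexesOf(text, search) {
  const indexes = [];
  if (search === '') return indexes; // an empty search string would never stop matching
  let from = text.indexOf(search);
  while (from !== -1) {
    indexes.push(from);
    from = text.indexOf(search, from + 1); // second argument: where to resume
  }
  return indexes;
}
console.log(allIndexesOf('thisIsThisTuesday Day', 'T')); // [ 6, 10 ]
```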
But, if you just want to find all capital letters and prefix them with a space character, if one is not found, there are many approaches, for example:
var str = "thisIsThisTuesday Day";
var ret = '';
for (var i=0; i<str.length; i++) {
if (str.substr(i, 1) == str.substr(i, 1).toUpperCase() && str.substr(i, 1) != str.substr(i, 1).toLowerCase()) { // an uppercase letter, not a space or digit
if ((i > 0) && (str.substr(i - 1,1) != " "))
ret += " ";
}
ret += str.substr(i,1);
}
After running this, ret will hold the value "this Is This Tuesday Day"

You could iterate over the string and check if each character is a capital. Something like this:
const s = 'thisIsThisTuesday Day';
const format = (s) => {
let string = '';
for (let c of s) {
if (c.match(/[A-Z]/) && string !== '' && !string.endsWith(' ')) string += ' ';
string += c;
}
return string;
};
console.log(format(s));
Or alternatively with reduce function:
const s = 'thisIsThisTuesday Day';
const format = (s) => s.split('').reduce((acc, c) => c.match(/[A-Z]/) && acc !== '' && !acc.endsWith(' ') ? acc + ` ${c}` : acc + c, '');
console.log(format(s));
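If the indexes themselves are needed, as the question asks, they can be read from the match objects instead of calling indexOf on each character. This is a sketch that assumes String.prototype.matchAll is available; the variable names are illustrative only.

```javascript
const text = 'thisIsThisTuesday Day';

// Index of every capital letter (A-Z).
const capitalIndexes = Array.from(text.matchAll(/[A-Z]/g), m => m.index);

// Keep only the capitals that are not already preceded by a space.
const needsSpace = capitalIndexes.filter(i => i > 0 && text[i - 1] !== ' ');

// Insert from right to left so the earlier indexes stay valid.
let spaced = text;
for (const i of [...needsSpace].reverse()) {
  spaced = spaced.slice(0, i) + ' ' + spaced.slice(i);
}
console.log(capitalIndexes, spaced);
```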

How to match overlapping keywords with regex

This example finds only sam. How to make it find both sam and samwise?
var regex = /sam|samwise|merry|pippin/g;
var string = 'samwise gamgee';
var match = string.match(regex);
console.log(match);
Note: this is a simple example, but my real regexes are created by joining 500 keywords at a time, so it's too cumbersome to search for all the overlaps and make a special case for each with something like /sam(wise)/. The other obvious solution I can think of is to just iterate through all keywords individually, but I think there must be a fast and elegant single-regex solution.

You can use lookahead regex with capturing group for this overlapping match:
var regex = /(?=(sam))(?=(samwise))/;
var string = 'samwise';
var match = string.match( regex ).filter(Boolean);
//=> ["sam", "samwise"]
It is important not to use the g (global) flag in the regex.
filter(Boolean) is used to remove first empty result from matched array.

Why not just map indexOf() on array substr:
var string = 'samwise gamgee';
var substr = ['sam', 'samwise', 'merry', 'pippin'];
var matches = substr.map(function(m) {
return (string.indexOf(m) < 0 ? false : m);
}).filter(Boolean);
console.log(matches);
// Array [ "sam", "samwise" ]
Probably of better performance than using regex. But if you need the regex functionality e.g. for caseless matching, word boundaries, returned matches... use with exec method:
var matches = substr.map(function(v) {
var re = new RegExp("\\b" + v, "i"); var m = re.exec(string);
return (m !== null ? m[0] : false);
}).filter(Boolean);
This one with i-flag (ignore case) returns each first match with initial \b word boundary.
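Both snippets above report each keyword at most once. If every occurrence and its position are wanted, the same per-keyword idea can be extended with the fromIndex argument of indexOf. This is only a sketch; the function name findAllKeywords and the shape of the returned objects are assumptions made for this example.

```javascript
// Return every occurrence of every keyword, overlapping ones included.
function findAllKeywords(text, keywords) {
  const hits = [];
  for (const keyword of keywords) {
    if (keyword === '') continue; // an empty keyword would match everywhere
    let from = text.indexOf(keyword);
    while (from !== -1) {
      hits.push({ keyword, index: from });
      from = text.indexOf(keyword, from + 1);
    }
  }
  // Order by position, shorter keyword first when positions are equal.
  return hits.sort((a, b) => a.index - b.index || a.keyword.length - b.keyword.length);
}
console.log(findAllKeywords('samwise gamgee', ['sam', 'samwise', 'merry', 'pippin']));
```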

I can't think of a simple and elegant solution, but I've got something that uses a single regex:
function quotemeta(s) {
return s.replace(/\W/g, '\\$&');
}
let keywords = ['samwise', 'sam'];
let subsumed_by = {};
keywords.sort();
for (let i = keywords.length; i--; ) {
let k = keywords[i];
for (let j = i - 1; j >= 0 && k.startsWith(keywords[j]); j--) {
(subsumed_by[k] = subsumed_by[k] || []).push(keywords[j]);
}
}
keywords.sort((a, b) => b.length - a.length);
let re = new RegExp('(?=(' + keywords.map(quotemeta).join('|') + '))[\\s\\S]', 'g');
let string = 'samwise samgee';
let result = [];
let m;
while (m = re.exec(string)) {
result.push(m[1]);
result.push.apply(result, subsumed_by[m[1]] || []);
}
console.log(result);

How about:
var re = /((sam)(?:wise)?)/;
var m = 'samwise'.match(re); // gives ["samwise", "samwise", "sam"]
var m = 'sam'.match(re); // gives ["sam", "sam", "sam"]
You can use Unique values in an array to remove duplicates.

If you don't want to create special cases, and if order doesn't matter, why not first match only full names with:
\b(sam|samwise|merry|pippin)\b
and then check whether any of these contains a shorter one? For example, with:
(sam|samwise|merry|pippin)(?=\w+\b)
It is not one elegant regex, but I suppose it is simpler than iterating through all matches.

put dash after every n character during input from keyboard

$('.creditCardText').keyup(function() {
var foo = $(this).val().split("-").join(""); // remove hyphens
if (foo.length > 0) {
foo = foo.match(new RegExp('.{1,4}', 'g')).join("-");
}
$(this).val(foo);
});
I found this tutorial on putting a dash after every 4 characters from here. My question is: what if the character interval is not constant? In this example the dash comes after every 4 characters, but what if the interval is 3 characters, "-", 2 characters, "-", 4 characters, "-", 3 characters, "-", so that it would appear like this: 123-12-1234-123-123?

In this case, it is more convenient to just write normal code to solve the problem:
function format(input, format, sep) {
var output = "";
var idx = 0;
for (var i = 0; i < format.length && idx < input.length; i++) {
output += input.substr(idx, format[i]);
if (idx + format[i] < input.length) output += sep;
idx += format[i];
}
output += input.substr(idx);
return output;
}
Sample usage:
function format(input, format, sep) {
var output = "";
var idx = 0;
for (var i = 0; i < format.length && idx < input.length; i++) {
output += input.substr(idx, format[i]);
if (idx + format[i] < input.length) output += sep;
idx += format[i];
}
output += input.substr(idx);
return output;
}
$('.creditCardText').keyup(function() {
var foo = $(this).val().replace(/-/g, ""); // remove hyphens
// You may want to remove all non-digits here
// var foo = $(this).val().replace(/\D/g, "");
if (foo.length > 0) {
foo = format(foo, [3, 2, 4, 3, 3], "-");
}
$(this).val(foo);
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
<input class="creditCardText" />
While it is possible to do partial matching and capturing with regex, the replacement has to be done with a replacement function. In the replacement function, we need to determine how many capturing groups actually capture some text. Since there is no clean solution with regex, I wrote a more general function, as shown above.

You can split it using a regular expression. In this case, I'm using an expression to check for non-whitespace characters with the interval 3-2-4-3.
The RegExp.exec will return with a "match" array, with the first element containing the actual string. After removing the first element of the match, you can then join them up with dashes.
var mystring = "123121234123"
var myRegexp = /^([^\s]{3})([^\s]{2})([^\s]{4})([^\s]{3})$/g
var match = myRegexp.exec(mystring);
if (match)
{
match.shift();
mystring = match.join("-")
console.log(mystring)
}

Per further comments, the OP clarified they need a fixed interval for when to insert dashes. In that case, there are several ways to implement it; I think a regular expression would probably be the worst: overkill and an overly complicated solution.
Some simpler options would be to create a new character array, and in a loop append character by character, adding a dash every time you get to the index you want. This would probably be the easiest to write and grok after the fact, but a little more verbose.
Or you could convert to a character array and use an 'insert into array at index'-type function like splice() (see Insert Item into Array at a Specific Index or Inserting string at position x of another string for some examples).
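The character-by-character option described above might look roughly like the sketch below. The function name insertDashes and its parameters are invented for this example, and the group sizes are assumed to be positive integers.

```javascript
// Append characters one at a time, adding the separator whenever the
// current group has reached its configured size.
function insertDashes(digits, groupSizes, separator = '-') {
  const out = [];
  let groupIndex = 0;   // which entry of groupSizes is being filled
  let countInGroup = 0; // characters appended to the current group so far
  for (const ch of digits) {
    if (groupIndex < groupSizes.length && countInGroup === groupSizes[groupIndex]) {
      out.push(separator);
      groupIndex++;
      countInGroup = 0;
    }
    out.push(ch);
    countInGroup++;
  }
  return out.join('');
}
console.log(insertDashes('123121234123123', [3, 2, 4, 3, 3])); // 123-12-1234-123-123
```

Because the separator is only added when the next character arrives, partially typed input such as 1231 becomes 123-1 without a trailing dash.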

Pass the input value and the indexes at which to insert the separator. The function first removes any existing separators, then inserts a separator at each of the given positions.
export function addSeparators(
input: string,
positions: number[],
separator: string
): string {
const inputValue = input.replace(/-/g, '').split(''); // remove existing separators and split characters into array
for (let i = 0; i < inputValue.length; i++) {
if (positions.includes(i)) inputValue.splice(i, 0, separator);
}
return inputValue.join('');
}
