Group array with two words, rather than one - javascript

The code below stores each word in its own array as soon as it has been written, meaning every single word gets its own array, which is later checked for reoccurrences.
What I want: instead of creating an array for one word (after the spacebar has been hit), I want it to do so after two words have been written.
I.e., instead of me writing "Hello" + spacebar and the code creating "hello" as an array, I'd like it to wait until I've written "hello my" + spacebar and then create an array with those two words.
I am guessing this has something to do with the regular expression?
I've tried many different things (I'm a bit of a newbie) and I cannot understand how to get it to group two words together rather than one.
const count = (text) => {
const wordRegex = new RegExp(`([\\p{Alphabetic}\]+)`, 'gu');
let result;
const words = {};
while ((result = wordRegex.exec(text)) !== null) {
const word = result[0].toLowerCase();
if (!words[word]) {
words[word] = [];
}
words[word].push(result.index);
words[word].push(result.index + word.length);
}
return words;
};

You may use
const wordRegex = /\p{Alphabetic}+(?:\s+\p{Alphabetic}+)?/gu;
Details
\p{Alphabetic}+ - 1+ alphabetic chars
(?:\s+\p{Alphabetic}+)? - an optional sequence of:
\s+ - 1+ whitespaces
\p{Alphabetic}+ - 1+ alphabetic chars
The second word is matched optionally so that the final odd word could be matched, too.
See the JS demo below:
const count = (text) => {
const wordRegex = /\p{Alphabetic}+(?:\s+\p{Alphabetic}+)?/gu;
let result;
const words = {};
while ((result = wordRegex.exec(text)) !== null) {
const word = result[0].toLowerCase();
if (!words[word]) {
words[word] = [];
}
words[word].push(result.index);
words[word].push(result.index + word.length);
}
return words;
};
console.log(count("abc def ghi"))
A RegExp constructor way of defining this regex is
const wordRegex = new RegExp("\\p{Alphabetic}+(?:\\s+\\p{Alphabetic}+)?", "gu");
However, since the pattern is static (no variables are used to build it), you can use the regex literal notation shown at the top of the answer.
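As a minimal sketch of the same idea, assuming a runtime that supports String.prototype.matchAll and Unicode property escapes, the loop can also be written without managing the exec state by hand. The helper name countPairs is made up for this illustration and is not part of the original answer.

```javascript
// Hypothetical helper (the name is illustrative): records start/end offsets for each
// non-overlapping two-word group, plus a trailing single word if one is left over.
const countPairs = (text) => {
  const pairRegex = /\p{Alphabetic}+(?:\s+\p{Alphabetic}+)?/gu;
  const pairs = {};
  for (const match of text.matchAll(pairRegex)) {
    const key = match[0].toLowerCase();
    if (!pairs[key]) {
      pairs[key] = [];
    }
    // Use the matched text's own length for the end offset.
    pairs[key].push(match.index, match.index + match[0].length);
  }
  return pairs;
};

// "hello my" is recorded at offsets [0, 8, 9, 17] and "friend" at [18, 24].
console.log(countPairs("Hello my hello my friend"));
```

Note that the groups do not overlap: for "a b c" this yields "a b" and "c", never "b c".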

Related

Regex to fetch all spaces as long as they are not enclosed in brackets

Regex to fetch all spaces as long as they are not enclosed in braces
This is for a javascript mention system
ex: "Speak #::{Joseph Empyre}{b0268efc-0002-485b-b3b0-174fad6b87fc}, all right?"
Need to get:
[ "Speak ", "#::{Joseph Empyre}{b0268efc-0002-485b-b3b0-174fad6b87fc}", ",", "all ", "right?" ]
[Edit]
Solved in: https://codesandbox.io/s/rough-http-8sgk2
Sorry for my bad English
I interpreted your question as asking to fetch all spaces as long as they are not enclosed in braces, although your example result isn't what I would expect: it contains a space after "Speak", as well as a separate match for the "," after the {} groups. My output below shows what I would expect for what I think you are asking for: a list of strings split on just the spaces outside of braces.
const str =
"Speak #::{Joseph Empyre}{b0268efc-0002-485b-b3b0-174fad6b87fc}, all right?";
// This regex matches both pairs of {} with things inside and spaces
// It will not properly handle nested {{}}
// It does this such that instead of capturing the spaces inside the {},
// it instead captures the whole of the {} group, spaces and all,
// so we can discard those later
var re = /(?:\{[^}]*?\})|( )/g;
var match;
var matches = [];
while ((match = re.exec(str)) != null) {
matches.push(match);
}
var cutString = str;
var splitPieces = [];
for (var len=matches.length, i=len - 1; i>=0; i--) {
match = matches[i];
// Since we have matched both groups of {} and spaces, ignore the {} matches
// just look at the matches that are exactly a space
if(match[0] == ' ') {
// Note that if there is a trailing space at the end of the string,
// we will still treat it as delimiter and give an empty string
// after it as a split element
// If this is undesirable, check if match.index + 1 >= cutString.length first
splitPieces.unshift(cutString.slice(match.index + 1));
cutString = cutString.slice(0, match.index);
}
}
splitPieces.unshift(cutString);
console.log(splitPieces)
Console:
["Speak", "#::{Joseph Empyre}{b0268efc-0002-485b-b3b0-174fad6b87fc},", "all", "right?"]
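A shorter alternative, offered only as a sketch and not as the approach used in the answer above, is to match the tokens directly instead of locating the spaces. The helper name splitOutsideBraces is hypothetical, and the same limitation applies: nested braces are not handled.

```javascript
// Hypothetical helper; not the approach from the answer above.
// A token is a run of brace groups and/or characters that are neither whitespace nor "{".
// Nested braces are not handled, and an unbalanced "{" is silently skipped.
function splitOutsideBraces(input) {
  return input.match(/(?:\{[^}]*\}|[^\s{])+/g) || [];
}

const mention =
  "Speak #::{Joseph Empyre}{b0268efc-0002-485b-b3b0-174fad6b87fc}, all right?";
console.log(splitOutsideBraces(mention));
```

Unlike the loop above, this version never produces empty strings for consecutive or trailing spaces, which may or may not be what a mention system needs.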

Split and replace text by two rules (regex)

I am trying to split text by two rules:
Split by whitespace
Split words longer than 5 symbols into two separate words (e.g. aaaaawww into aaaaa- and www)
I created a regex that can detect these rules (https://regex101.com/r/fyskB3/2) but can't understand how to make both rules work in text.split(/REGEX/)
Current regex - (([\s]+)|(\w{5})(?=\w))
For example, if the initial text is hello i am markopollo, the result should look like ['hello', 'i', 'am', 'marko-', 'pollo']
It would probably be easier to use .match: match up to 5 characters that aren't spaces:
const str = 'wqerweirj ioqwejr qiwejrio jqoiwejr qwer qwer';
console.log(
str.match(/[^ ]{1,5}/g)
)
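The .match call above yields the chunks but not the trailing hyphens the question asks for. One way to add them, sketched here under the assumption that String.prototype.matchAll is available, is to capture a full chunk only when more of the same word follows it. The helper name hyphenateChunks and its size parameter are hypothetical.

```javascript
// Hypothetical helper: split on whitespace and break long words into hyphenated chunks.
function hyphenateChunks(text, size = 5) {
  // Group 1 matches exactly `size` non-whitespace characters that are followed by more
  // of the same word; otherwise the second alternative takes the rest of the word.
  const chunkRegex = new RegExp(`(\\S{${size}})(?=\\S)|\\S+`, 'g');
  return Array.from(text.matchAll(chunkRegex), (m) => (m[1] ? m[1] + '-' : m[0]));
}

console.log(hyphenateChunks('hello i am markopollo'));
```

The RegExp constructor is used here only because the chunk size is interpolated into the pattern; with a fixed size of 5 a regex literal would do.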
My approach would be to process the string before splitting (I'm a big fan of RegEx):
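1- Search for every run of 5 word characters that does not end the word and replace it with $1- followed by a space (in a JavaScript replacement string the backreference is written $1, not \1).
The pattern (\w{5}\B) will do the trick: \w{5} matches exactly 5 word characters and \B matches only if the last of them is not the final character of the word.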
2- Split the string by spaces.
var text = "hello123467891234 i am markopollo";
var regex = /(\w{5}\B)/g;
var processedText = text.replace(regex, "$1- ");
var result = processedText.split(" ");
console.log(result)
Hope it helps!
Something like this should work:
const str = "hello i am markopollo";
const words = str.split(/\s+/);
const CHUNK_SIZE=5;
const out = [];
for(const word of words) {
if(word.length > CHUNK_SIZE) {
let chunks = chunkSubstr(word,CHUNK_SIZE);
let last = chunks.pop();
out.push(...chunks.map(c => c + '-'),last);
} else {
out.push(word);
}
}
console.log(out);
// credit: https://stackoverflow.com/a/29202760/65387
function chunkSubstr(str, size) {
const numChunks = Math.ceil(str.length / size)
const chunks = new Array(numChunks)
for (let i = 0, o = 0; i < numChunks; ++i, o += size) {
chunks[i] = str.slice(o, o + size) // slice replaces the legacy substr; same result here
}
return chunks
}
i.e., first split the string into words on spaces, and then find words longer than 5 chars and 'chunk' them. I popped off the last chunk to avoid adding a - to it, but there might be a more efficient way if you patch chunkSubstr instead.
Calling .split with a regex doesn't work so well here because it removes whatever the regex matches from the output. In your case, it appears you want to strip the whitespace but keep the words, so splitting on both won't work.
This uses @CertainPerformance's regex ([^ ]{1,5}), applies regex.exec, and finally loops over all matches, appending a hyphen where a word continues.
See the demo below:
const str = 'wqerweirj ioqwejr qiwejrio jqoiwejr qwer qwer'
let regex1 = RegExp('[^ ]{1,5}', 'g')
function customSplit(targetString, regexExpress) {
let result = []
let matchItem = null
while ((matchItem = regexExpress.exec(targetString)) !== null) {
result.push(
matchItem[0] + (
matchItem[0].length === 5 && targetString[regexExpress.lastIndex] && targetString[regexExpress.lastIndex] !== ' '
? '-' : '')
)
}
return result
}
console.log(customSplit(str, regex1))
console.log(customSplit('hello i am markopollo', regex1))

Matching whole words with Javascript's Regex with a few restrictions

I am trying to create a regex that can extract all words from a given string that only contain alphanumeric characters.
Yes
yes absolutely
#no
*NotThis
orThis--
Good *Bad*
1ThisIsOkay2 ButNotThis2)
Words that should have been extracted: Yes, yes, absolutely, Good, 1ThisIsOkay2
Here is the work I have done thus far:
/(?:^|\b)[a-zA-Z0-9]+(?=\b|$)/g
I had found this expression that works in Ruby ( with some tweaking ) but I have not been able to convert it to Javascript regex.
Use /(?:^|\s)\w+(?!\S)/g to match 1 or more word chars in between start of string/whitespace and another whitespace or end of string:
var s = "Yes\nyes absolutely\n#no\n*NotThis\norThis-- \nGood *Bad*\n1ThisIsOkay2 ButNotThis2)";
var re = /(?:^|\s)\w+(?!\S)/g;
var res = s.match(re).map(function(m) {
return m.trim();
});
console.log(res);
Or another variation:
var s = "Yes\nyes absolutely\n#no\n*NotThis\norThis-- \nGood *Bad*\n1ThisIsOkay2 ButNotThis2)";
var re = /(?:^|\s)(\w+)(?!\S)/g;
var res = [], m;
while ((m=re.exec(s)) !== null) {
res.push(m[1]);
}
console.log(res);
Pattern details:
(?:^|\s) - either start of string or whitespace (consumed, that is why trim() is necessary in Snippet 1)
\w+ - 1 or more word chars (in Snippet 2, captured into Group 1 used to populate the resulting array)
(?!\S) - negative lookahead that fails the match if the word chars are immediately followed by a non-whitespace char.
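If the target engine supports lookbehind (ES2018 and later), the trim() step of Snippet 1 can be avoided by asserting the left boundary instead of consuming it. This is only a sketch: the helper name extractPlainWords is hypothetical, and [A-Za-z0-9] is used in place of \w on the assumption that underscores should not count as alphanumeric.

```javascript
// Hypothetical helper; assumes an engine with lookbehind support (ES2018+).
// (?<!\S) asserts start of string or preceding whitespace without consuming it,
// so the matches need no trimming.
function extractPlainWords(text) {
  return text.match(/(?<!\S)[A-Za-z0-9]+(?!\S)/g) || [];
}

const sample =
  "Yes\nyes absolutely\n#no\n*NotThis\norThis-- \nGood *Bad*\n1ThisIsOkay2 ButNotThis2)";
console.log(extractPlainWords(sample));
```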
You can do this (where s is your string) to match all the words:
var m = s.split(/\s+/).filter(function(i) { return !/\W/.test(i); });
If you want to perform a replacement instead, you can do this:
var res = s.split(/(\s+)/).map(function(i) { return i.replace(/^\w+$/, "#");}).join('');

Intelligent regex to understand input

Following Split string that used to be a list, I am doing this:
console.log(lines[line]);
var regex = /(-?\d{1,})/g;
var cluster = lines[line].match(regex);
console.log(cluster);
which will give me this:
((3158), (737))
["3158", "737"]
where 3158 will later be treated as the ID in my program and 737 as the associated data.
I am wondering if there is a way to treat inputs of this kind too:
((3158, 1024), (737))
where the ID will be a pair, and do something like this:
var single_regex = regex_for_single_ID;
var pair_regex = regex_for_pair_ID;
if(single_regex)
// do my logic
else if(pair_regex)
// do my other logic
else
// bad input
Is that possible?
Clarification:
What I am interested in is treating the two cases differently. For example one solution would be to have this behavior:
((3158), (737))
["3158", "737"]
and for pairs, concatenate the ID:
((3158, 1024), (737))
["31581024", "737"]
For a simple way, you can use .replace(/(\d+)\s*,\s*/g, '$1') to merge/concatenate numbers in pair and then use simple regex match that you are already using.
Example:
var v1 = "((3158), (737))"; // singular string
var v2 = "((3158, 1024), (737))"; // paired number string
var arr1 = v1.replace(/(\d+)\s*,\s*/g, '$1').match(/-?\d+/g)
//=> ["3158", "737"]
var arr2 = v2.replace(/(\d+)\s*,\s*/g, '$1').match(/-?\d+/g)
//=> ["31581024", "737"]
We use this regex in .replace:
/(\d+)\s*,\s*/
It matches and captures 1 or more digits, followed by optional whitespace, a comma, and more optional whitespace.
In the replacement we use $1, the backreference to the captured number, thus removing the comma and the whitespace around it.
You may use an alternation operator to match either a pair of numbers (capturing them into separate capturing groups) or a single one:
/\((-?\d+), (-?\d+)\)|\((-?\d+)\)/g
See the regex demo
Details:
\((-?\d+), (-?\d+)\) - a (, a number (captured into Group 1), a ,, space, another number of the pair (captured into Group 2) and a )
| - or
\((-?\d+)\) - a (, then a number (captured into Group 3), and a ).
var re = /\((-?\d+), (-?\d+)\)|\((-?\d+)\)/g;
var str = '((3158), (737)) ((3158, 1024), (737))';
var res = [], m;
while ((m = re.exec(str)) !== null) {
if (m[3]) {
res.push(m[3]);
} else {
res.push(m[1]+m[2]);
}
}
console.log(res);
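The if / else if / else shape from the question can be sketched with two anchored patterns, one per case. This assumes each line consists of exactly one ((ID), (data)) or ((ID, ID), (data)) group and nothing else; the helper name classifyLine and the shape of the returned object are hypothetical.

```javascript
// Hypothetical helper: decides which of the two input shapes a line has.
// Assumes the whole line is a single "((...), (...))" group.
function classifyLine(line) {
  const single = /^\(\((-?\d+)\),\s*\((-?\d+)\)\)$/;
  const pair = /^\(\((-?\d+),\s*(-?\d+)\),\s*\((-?\d+)\)\)$/;
  let m;
  if ((m = single.exec(line))) {
    return { kind: 'single', id: [m[1]], data: m[2] };
  }
  if ((m = pair.exec(line))) {
    return { kind: 'pair', id: [m[1], m[2]], data: m[3] };
  }
  return { kind: 'invalid' }; // bad input
}

console.log(classifyLine('((3158), (737))'));
console.log(classifyLine('((3158, 1024), (737))'));
```

Keeping the pair as two separate strings, rather than concatenating them, avoids ambiguity between IDs such as (31, 58) and (315, 8).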

Regular expression only returning first group

Probably something simple, but I am trying to return the capture groups from this regex...
const expression = /^\/api(?:\/)?([^\/]+)?\/users\/([^\/]+)$/g
The code I am using to do this is the following...
const matchExpression = (expression, pattern) => {
let match;
let matches = [];
while((match = expression.exec(pattern)) != null) {
matches.push(match[1]);
};
return matches;
};
I am expecting the following result when matched against /api/v1/users/1...
['v1', '1']
But instead only seem to get one result which is always the first group.
The expression itself is fine and has been tested across multiple services but can't seem to figure out why this is not working as expected.
Any help would be hugely appreciated.
You must make sure you add the second capturing group contents to the resulting array:
while((match = expression.exec(pattern)) != null) {
matches.push(match[1]);
matches.push(match[2]); // <- here
};
Since you are matching an entire string, you do not need the g (global) modifier: you can use /^\/api(?:\/)?([^\/]+)?\/users\/([^\/]+)$/ and reduce the code to:
const matchExpression = (expression, pattern) => {
let matches = pattern.match(expression);
if (matches) {
matches = matches.slice(1);
}
return matches;
};
The point is that you can use String#match with a regex without global modifier to access capturing group contents.
Demo:
var expr = /^\/api(?:\/)?([^\/]+)?\/users\/([^\/]+)$/;
var matches = "/api/v1/users/1".match(expr);
if (matches) {
console.log(matches.slice(1));
}
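Named capture groups (ES2018 and later) make it harder to forget a group, since each value is read by name rather than by position. The sketch below uses a slightly stricter variant of the pattern in which the optional version segment must be preceded by a slash; the helper name parseUserPath and the group names version and id are hypothetical.

```javascript
// Hypothetical helper; the pattern is a stricter variant of the one in the question:
// the optional segment and its leading "/" are matched together.
function parseUserPath(path) {
  const m = /^\/api(?:\/(?<version>[^\/]+))?\/users\/(?<id>[^\/]+)$/.exec(path);
  if (!m) {
    return null; // no match
  }
  // version is undefined when the optional segment is absent.
  return { version: m.groups.version, id: m.groups.id };
}

console.log(parseUserPath('/api/v1/users/1'));
console.log(parseUserPath('/api/users/1'));
```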
