use named regex groups to output an array of matches - javascript

I'm trying to get the hang of named capturing groups.
Given a string
var a = '{hello} good {sir}, a [great] sunny [day] to you.';
I'd like to output an array which maintains the integrity of the sentence (complete with punctuation, spaces, etc) so I can reassemble the sentence at a later time:
[
{
group: "braces",
word: "hello"
},
{
group: "other",
word: " good " // space on either side is maintained
},
{
group: "braces",
word: "sir"
},
{
group: "other",
word: ", a "
},
{
group: "brackets",
word: "great"
},
{
group: "other",
word: " sunny "
},
{
group: "brackets",
word: "day"
},
{
group: "other",
word: " to you."
},
]
I'm using named capturing groups to try and output this. <braces> captures any text within {}, <brackets> captures any text within [], and <others> captures anything else (\s,.\w+):
var regex = /(?<braces>\{(.*?)\})(?<brackets>\[(.*?)\])(?<others>\s,.\w+)?/g;
console.log(a.match(regex)); outputs nothing.
If I remove <others> group,
var regex = /(?<braces>\{(.*?)\})(?<brackets>\[(.*?)\])?/g;
console.log(a.match(regex)); outputs ["{hello}", "{sir}"]
Question: How do I use capturing groups to find all instances of named groups and output them like the above desired array?

A regex match object will only contain one string for a given named capture group. For what you're trying to do, you'll have to do it in two steps: first separate out the parts of the input, then map it to the array of objects while checking which group was captured to identify the sort of group it needs:
const str = '{hello} good {sir}, a [great] sunny [day] to you.';
const matches = [...str.matchAll(/\{([^}]+)\}|\[([^\]]+)\]|([^[{]+)/g)]
.map(match => ({
group: match[1] ? 'braces' : match[2] ? 'brackets' : 'other',
word: match[1] || match[2] || match[3]
}));
console.log(matches);
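Since the question asks specifically about named groups, here is a sketch of the same two-step idea with one named group per alternative. It assumes an engine with named groups and String.prototype.matchAll (current Node.js or browsers). The groups object holds every name, with undefined for groups that did not take part in the match, so the one defined group tells you which type the piece is:

```javascript
const str = '{hello} good {sir}, a [great] sunny [day] to you.';

// One named group per alternative; in each match exactly one of them is
// defined and the others are undefined.
const rx = /\{(?<braces>[^}]*)\}|\[(?<brackets>[^\]]*)\]|(?<other>[^[{]+)/g;

const parts = [...str.matchAll(rx)].map(m => {
  // Pick the named group that actually participated in this match.
  const [group, word] = Object.entries(m.groups).find(([, v]) => v !== undefined);
  return { group, word };
});

console.log(parts);

// Reassemble the original sentence by restoring the delimiters.
const rebuilt = parts
  .map(p => p.group === 'braces' ? `{${p.word}}`
          : p.group === 'brackets' ? `[${p.word}]`
          : p.word)
  .join('');
console.log(rebuilt === str); // true
```

Checking v !== undefined rather than truthiness keeps an empty pair like {} as a braces entry with an empty word.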

Related

remove extra spaces in string in javascript

I have a text, and after deleting special characters (!##$%^&*()-=+`";:'><.?/) to show just letters and numbers (and floats like 23.4), it returns some extra spaces:
const input = 'this is a signal , entry : 24.30 and side is short';
const text = input.replace(/\.(?!\d)|[^\w.]/g, " ").toUpperCase();
console.log(text.split(" "))
The output:
[
'THIS', 'IS', 'A',
'SIGNAL', '', '',
'', 'ENTRY', '',
'', '24.30', 'AND',
'SIDE', 'IS', 'SHORT'
]
But I want this:
[
'THIS', 'IS', 'A',
'SIGNAL', 'ENTRY', '24.30',
'AND', 'SIDE', 'IS',
'SHORT'
]
And when I replace spaces and newlines with an empty string, it returns this:
[ 'THISISASIGNALENTRY24.30ANDSIDEISSHORT' ]
What is the problem with my code?
Instead of replacing, consider matching the characters you want to keep, which produces the array of words directly. It looks like you want something like:
const input = 'this is a signal , entry : 24.30 and side is short';
const matches = input.toUpperCase().match(/[\w.]+/g);
console.log(matches);
The second parameter in the replace method needs to be an empty string and not a space as you have it in your code.
Just do:
...
const text = input.replace(/\.(?!\d)|[^\w.]/g, "").toUpperCase();
...
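The empty strings come from split(" "): every removed character becomes its own space, so consecutive spaces leave empty entries between them. A minimal sketch that keeps the question's replacement unchanged but splits on runs of whitespace instead:

```javascript
const input = 'this is a signal , entry : 24.30 and side is short';

// Same cleanup as in the question: dots not followed by a digit, and any
// character that is neither a word character nor a dot, become spaces.
const cleaned = input.replace(/\.(?!\d)|[^\w.]/g, ' ').toUpperCase();

// Split on runs of whitespace; filter(Boolean) drops the empty string that
// split() would produce if the text started or ended with whitespace.
const words = cleaned.split(/\s+/).filter(Boolean);

console.log(words);
// [ 'THIS', 'IS', 'A', 'SIGNAL', 'ENTRY', '24.30', 'AND', 'SIDE', 'IS', 'SHORT' ]
```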

Filter array of strings based on a pattern with placeholders

I have been struggling to do this (every DAY!) for at least a month. I have searched stackoverflow, I have read MDN array, string, regex, etc., references over and over and over again, and nothing has helped. I am somewhat familiar with regex, but this is over my head. I trust that somebody here will solve this with one line of code, which is why I waited until I'm about to throw my computer out the window before asking for help. I really wanted to find the solution for myself, but I simply cannot do it.
I was enjoying a game of cryptograms, where random letters are used to sort of 'encode' a poem or story. I probably don't need to describe it here, but here's a picture just in case.
So I thought it would be a good exercise to create a form where you can enter a pattern made up of a combination of letters, numbers, and "?" for unknown. In the image, you see the word represented as "YACAZ"; there are two A's in that word, so you know those two letters are the same. So in my function, you would use any digit 0-9 as a placeholder; using the same example, you would enter "?1a1?".
Here's what I have at the moment. Every time I try to iterate through the arrays that regex gives me, I end up at the same place, trying - and failing - to compare two sets of nested arrays with each other. No matter how I try to break them down and compare them, it ends up becoming a huge non-functioning mess. I can get the placeholder indexes, but then what?
I have nothing against lodash, but I have very little experience with it, so maybe it could help with this? It doesn't do anything that cannot be done with plain vanilla javascript, does it?
const words = [
{ word: 'bargain', score: 1700 },
{ word: 'balloon', score: 1613 },
{ word: 'bastion', score: 1299 },
{ word: 'babylon', score: 634 },
{ word: 'based on', score: 425 },
{ word: 'bassoon', score: 371 },
{ word: 'baldwin', score: 359 },
{ word: 'bahrain', score: 318 },
{ word: 'balmain', score: 249 },
{ word: 'basilan', score: 218 },
{ word: 'bang on', score: 209 },
{ word: 'baseman', score: 204 },
{ word: 'batsman', score: 204 },
{ word: 'bakunin', score: 143 },
{ word: 'barchan', score: 135 },
{ word: 'bastian', score: 133 },
{ word: 'balagan', score: 118 },
{ word: 'balafon', score: 113 },
{ word: 'bank on', score: 113 },
{ word: 'ballpen', score: 111 },
]
const input = 'ba1122n' // those are numeric 1's, not lowercase L's
//matching words from the list above should be 'balloon' and 'bassoon', using the input 'ba1122n'.
export const stringDiff = (a, b) => {
let match = false,
error = ''
const results = []
// Idk why I have a reducer inside a loop. I have tried many, many, MANY other
// ways of looping, usually 'for (const i in whatever)` but they all end up with
// the same problem. I usually have a loop inside a reducer, not the other way around.
const forLoop = (array) => {
a.reduce((acc, curr, next) => {
const aa = [...curr.input.matchAll(curr[0])] // this tells me how many 0's, 1's, etc.
const bChar = b.charAt(curr.index) // this tells me what letters are at those index positions
const bb = [...b.matchAll(bChar)] // if the array 'bb' is not the same length, it's not a match
if (aa.length === bb.length) {
/* console output:
word bargain
aa:
0: ["2", index: 4, input: "ba1122n", groups: undefined]
1: ["2", index: 5, input: "ba1122n", groups: undefined]
bb:
0: ["a", index: 1, input: "bargain", groups: undefined]
1: ["a", index: 4, input: "bargain", groups: undefined]
*/
// matching the lengths only helps narrow down ***some*** of the non-matching words.
// How do I match each index of each letter in each word with
// each index of each placeholder character??? And check the letters match ***EACH OTHER***????
// with any number of placeholders for any digit 0 - 9?
}
}, [])
return array
}
console.log('forLoop', forLoop([]))
return { match, results, error }
}
stringDiff(words,input)
I'm still not quite sure whether the following approach meets the OP's goal.
But if it is about creating a regex from a custom replacement/substitute pattern and then filtering a word list by this regex (and maybe even capturing the matching characters), one might give the following code a try.
There is a limitation, though: the digits used as placeholders are limited to 1 through 9 (zero is excluded), since this matches how regex capture groups are numbered and back-referenced. The placeholders must also be numbered in order of first appearance (e.g. 1122, not 2211), because the n-th (\w) group in the generated regex is the one referenced as \n.
function createRegexFromSubstitutePattern(pattern) {
// - turn e.g. `ba1122n` into `/ba(\w)\1(\w)\2n/`
// - turn e.g. `?1a1?` into `/.(\w)a\1./`
// - turn e.g. `?1b22a1?` into `/.(\w)b(\w)\2a\1./`
return RegExp(
[1, 2, 3, 4, 5, 6, 7, 8, 9].reduce((regXString, placeholder) =>
// programmatically replace the first occurrence of
// any digit (from 1 to 9) with a capture group pattern
// for a single word character.
regXString.replace(RegExp(placeholder, ''), '(\\w)'),
// provide the initial input/pattern as start value.
String(pattern)
)
// replace any further occurrence of any digit (from 1 to 9)
// by a back reference pattern which matches the group's index.
.replace((/([1-9])/g), '\\$1')
// replace the wildcard placeholder with the regex wildcard.
.replace((/\?/g), '.'), '');
}
const wordList = [
{ word: 'bargain', score: 1700 },
{ word: 'balloon', score: 1613 },
{ word: 'bastion', score: 1299 },
{ word: 'babylon', score: 634 },
{ word: 'based on', score: 425 },
{ word: 'bassoon', score: 371 },
{ word: 'baldwin', score: 359 },
{ word: 'bahrain', score: 318 },
{ word: 'balmain', score: 249 },
{ word: 'basilan', score: 218 },
{ word: 'bang on', score: 209 },
{ word: 'baseman', score: 204 },
{ word: 'batsman', score: 204 },
{ word: 'bakunin', score: 143 },
{ word: 'barchan', score: 135 },
{ word: 'bastian', score: 133 },
{ word: 'balagan', score: 118 },
{ word: 'balafon', score: 113 },
{ word: 'bank on', score: 113 },
{ word: 'ballpen', score: 111 },
];
const input = 'ba1122n';
const regXWord = createRegexFromSubstitutePattern(input);
console.log(
'filter word list ...',
wordList
.filter(item => regXWord.test(item.word))
);
console.log(
"filter word list and map each word's match and captures ...",
wordList
.filter(item => regXWord.test(item.word))
.map(item => item.word.match(regXWord))
);
console.log(
"createRegexFromSubstitutePattern('ba1122n')",
createRegexFromSubstitutePattern('ba1122n')
);
console.log(
"createRegexFromSubstitutePattern('?1a1?')",
createRegexFromSubstitutePattern('?1a1?')
);
console.log(
"createRegexFromSubstitutePattern('?1b22a1?')",
createRegexFromSubstitutePattern('?1b22a1?')
);
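The generated regex is not anchored, so a longer word that contains a matching run would also pass, and two different placeholders may match the same letter (ba1122n accepts "baoooon"). A hypothetical alternative that skips regex entirely, assuming (as in a cryptogram) that distinct digits stand for distinct letters, that digits 0-9 are all allowed, and that the whole word must fit the pattern. The matchesPattern helper is my own sketch, not part of the answer above, and it compares characters case-sensitively:

```javascript
// Hypothetical helper: checks whether `word` fits `pattern`, where
//   '?'       matches any single character,
//   '0'..'9'  is a placeholder that must stand for the same character
//             everywhere it appears (distinct digits must differ),
//   anything else must match literally.
// The whole word must fit, so the lengths have to be equal.
function matchesPattern(pattern, word) {
  if (pattern.length !== word.length) return false;
  const digitToChar = new Map(); // placeholder digit -> character it stands for
  const charToDigit = new Map(); // character -> placeholder digit that claimed it
  for (let i = 0; i < pattern.length; i++) {
    const p = pattern[i];
    const c = word[i];
    if (p === '?') continue;
    if (p >= '0' && p <= '9') {
      const bound = digitToChar.get(p);
      if (bound === undefined) {
        // First occurrence of this placeholder: the character must not
        // already belong to a different placeholder.
        if (charToDigit.has(c)) return false;
        digitToChar.set(p, c);
        charToDigit.set(c, p);
      } else if (bound !== c) {
        return false; // same placeholder, different character
      }
    } else if (p !== c) {
      return false; // literal character mismatch
    }
  }
  return true;
}

const wordList = [
  { word: 'bargain', score: 1700 },
  { word: 'balloon', score: 1613 },
  { word: 'bastion', score: 1299 },
  { word: 'based on', score: 425 },
  { word: 'bassoon', score: 371 },
];

const found = wordList.filter(item => matchesPattern('ba1122n', item.word));
console.log(found.map(item => item.word)); // [ 'balloon', 'bassoon' ]
```

Using two maps keeps the placeholder-to-letter mapping one-to-one; drop charToDigit if your puzzle allows two placeholders to stand for the same letter.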

Locating names that have 'a' using .filter()

So I would like to figure out how to grab all the names from my people array that have the letter 'a'. I was able to figure out how to locate a name that starts with the letter 'a', as you can see in the example below. But I want to grab all names that have the letter 'a' anywhere in the name, not just at a specific index position. Thank you! (Arrow syntax preferred if possible.)
const people = [
{
Name: 'Ryan',
Age: 27
},
{
Name: 'Alie',
Age: 27
},
{
Name: 'Lincoln',
Age: 4
},
{
Name: 'Luke',
Age: 2
},
]
const StartsWith = people.filter( letter => letter.Name[0] === 'A' );
console.log(StartsWith)
Just use a simple case-insensitive regex:
const withA = people.filter(p => /a/i.test(p.Name));
Alternatively, convert to lowercase and use String.prototype.includes:
const withA = people.filter(p => p.Name.toLowerCase().includes("a"));
You can use includes:
people.filter(person => person.Name.toLowerCase().includes('a'))
Calling toLowerCase first makes sure you match both 'a' and 'A'.

how to split a string which has multiple repeated keywords in it to an array in javascript?

I have a string like this:
const string = 'John Smith: I want to buy 100 apples\r\nI want to buy 200 oranges\r\n, and add 300 apples';
and now I want to split the string by following keywords:
const keywords = ['John smith', '100', 'apples', '200', 'oranges', '300'];
Now I want to get a result like this:
const result = [
{isKeyword: true, text: 'John Smith'},
{isKeyword: false, text: 'I want to buy '},
{isKeyword: true, text: '100'},
{isKeyword: true, text:'apples'},
{isKeyword: false, text:'\r\nI want to buy'},
{isKeyword: true, text:'200'},
{isKeyword: true, text:'oranges'},
{isKeyword: false, text:'\r\n, and add'},
{isKeyword: true, text:'300'},
{isKeyword: true, text:'apples'}];
Keywords could be lowercase or uppercase; I want the pieces in the array to keep exactly the same text as the original string.
I also want to keep the array in the same order as the string, but mark whether each piece is a keyword.
How could I get it?
I would start by finding the indexes of all your keywords. From these you know where each keyword in the sentence starts and stops, and you can sort them by start index.
Then it's just a matter of taking the substrings up to the start of each keyword (these are the isKeyword: false pieces), then adding the keyword substring itself. Repeat until you are done.
const string = 'John Smith: I want to buy 100 apples\r\nI want to buy 200 oranges\r\n, and add 300 apples Thanks';
const keywords = ['John smith', '100', 'apples', '200', 'oranges', '300'];
// find all indexes of a keyword
function getInd(kw, str) {
let regex = new RegExp(kw, 'gi'), result, pos = []
while ((result = regex.exec(str)) != null)
pos.push([result.index, result.index + result[0].length]);
return pos
}
// find all index of all keywords
let positions = keywords.reduce((a, word) => a.concat(getInd(word, string)), [])
positions.sort((a, b) => a[0] - b[0])
// go through the string and make the array
let start = 0, res = []
for (let next of positions) {
if (start + 1 < next[0])
res.push({ isKeyword: false,text: string.slice(start, next[0]).trim()})
res.push({isKeyword: true, text: string.slice(next[0], next[1])})
start = next[1]
}
// get any remaining text
if (start < string.length) res.push({isKeyword: false, text: string.slice(start, string.length).trim()})
console.log(res)
I'm trimming whitespace as I go, but you may want to do something different.
If you are willing to pick a delimiter
Here's a much more succinct way to do this if you are willing to pick a set of delimiters that can't appear in your text, for example {} as used below.
Here we simply wrap the keywords with the delimiter and then split them out. Grabbing the keyword with the delimiter makes it easy to tell which parts of the split are your keywords:
const string = 'John Smith: I want to buy 100 apples\r\nI want to buy 200 oranges\r\n, and add 300 apples Thanks';
const keywords = ['John smith', '100', 'apples', '200', 'oranges', '300'];
let res = keywords.reduce((str, k) => str.replace(new RegExp(`(${k})`, 'ig'), '{$1}'), string)
.split(/({.*?})/).filter(i => i.trim())
.map(s => s.startsWith('{')
? {isKeyword: true, text: s.slice(1, -1)}
: {isKeyword: false, text: s.trim()})
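Use a regular expression with a capturing group (and the i flag, since the keyword casing differs from the text):
const rx = new RegExp('(' + keywords.join('|') + ')', 'i');
then split with it; the capturing group keeps the keywords in the result:
string.split(rx);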
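Building on the split idea: because of the capturing group, split() returns pieces that alternate between non-keyword and keyword text, so the index parity gives isKeyword. A sketch, assuming keywords should match case-insensitively and literally; the escapeRegExp helper is my own, not a built-in:

```javascript
const string = 'John Smith: I want to buy 100 apples\r\nI want to buy 200 oranges\r\n, and add 300 apples';
const keywords = ['John smith', '100', 'apples', '200', 'oranges', '300'];

// Hypothetical helper: escape regex metacharacters so keywords match literally.
const escapeRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// One capturing group around the alternation: split() keeps the separators,
// so pieces alternate non-keyword, keyword, non-keyword, ...
const rx = new RegExp('(' + keywords.map(escapeRegExp).join('|') + ')', 'i');

const result = string
  .split(rx)
  .map((text, i) => ({ isKeyword: i % 2 === 1, text })) // odd index = keyword
  .filter(piece => piece.text !== '');                  // drop empty edge pieces

console.log(result);
```

The parity is computed before filtering, so removing the empty pieces at the start and end does not shift it. Unlike the desired output in the question, this keeps whitespace-only pieces (such as the single space between 100 and apples); filter on piece.text.trim() !== '' instead to drop them. If one keyword is a prefix of another, list the longer one first, because the alternation takes the first alternative that matches.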

Convert Java tokenizing regex into Javascript

As an answer to my question Tokenizing an infix string in Java, I got the regex (?<=[^\.a-zA-Z\d])|(?=[^\.a-zA-Z\d]). However, now I'm writing the same code in JavaScript, and I'm stuck on how to get a JavaScript regex to do the same thing.
For example, if I have the string sin(4+3)*2, I would need it parsed into ["sin","(","4","+","3",")","*","2"]
What regex would I use to tokenize the string into each individual part.
Previously, I did a string replace of every possible token to put a space around it, then split on that whitespace. However, that code quickly became very bloated.
The operators I would need to split on would be the standard math operators (+,-,*,/,^), as well as function names (sin,cos,tan,abs,etc...), and commas
What is a fast, efficient way to do this?
You can take advantage of regular expression grouping to do this. You need a regex that combines the different possible tokens, and you apply it repeatedly.
I like to separate out the different parts; it makes it easier to maintain and extend:
var tokens = [
"sin",
"cos",
"tan",
"\\(",
"\\)",
"\\+",
"-",
"\\*",
"/",
"\\d+(?:\\.\\d*)?"
];
You glue those all together into a big regular expression with | between each token:
var rtok = new RegExp( "\\s*(?:(" + tokens.join(")|(") + "))\\s*", "g" );
You can then tokenize using regex operations on your source string:
function tokenize( expression ) {
var toks = [], p;
rtok.lastIndex = p = 0; // reset the regex
while (rtok.lastIndex < expression.length) {
var match = rtok.exec(expression);
// Make sure we found a token, and that we found
// one without skipping garbage
if (!match || rtok.lastIndex - match[0].length !== p)
throw new Error("Oops - syntax error");
// Figure out which token we matched by finding the non-null group
for (var i = 1; i < match.length; ++i) {
if (match[i]) {
toks.push({
type: i,
txt: match[i]
});
// remember the new position in the string
p = rtok.lastIndex;
break;
}
}
}
return toks;
}
That just repeatedly matches the token regex against the string. The regular expression was created with the "g" flag, so the regex machinery will automatically keep track of where to start matching after each match we make. When it doesn't see a match, or when it does but has to skip invalid stuff to find it, we know there's a syntax error. When it does match, it records in the token array which token it matched (the index of the non-null group) and the matched text. By remembering the matched token index, it saves you the trouble of having to figure out what each token string means after you've tokenized; you just have to do a simple numeric comparison.
Thus calling tokenize( "sin(4+3) * cos(25 / 3)" ) returns:
[ { type: 1, txt: 'sin' },
{ type: 4, txt: '(' },
{ type: 10, txt: '4' },
{ type: 6, txt: '+' },
{ type: 10, txt: '3' },
{ type: 5, txt: ')' },
{ type: 8, txt: '*' },
{ type: 2, txt: 'cos' },
{ type: 4, txt: '(' },
{ type: 10, txt: '25' },
{ type: 9, txt: '/' },
{ type: 10, txt: '3' },
{ type: 5, txt: ')' } ]
Token type 1 is the sin function, type 4 is left paren, type 10 is a number, etc.
Edit: if you want to match identifiers like "x" and "y", I'd probably use a different set of token patterns, with one pattern that matches any identifier. That means the parser would not learn about "sin", "cos", etc. directly from the lexer, but that's OK. Here's an alternative list of token patterns:
var tokens = [
"[A-Za-z_][A-Za-z_\\d]*",
"\\(",
"\\)",
"\\+",
"-",
"\\*",
"/",
"\\d+(?:\\.\\d*)?"
];
Now any identifier will be a type 1 token.
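A variation on the same tokenizer, assuming an engine that supports the ES2015 sticky (y) flag: with y, exec() only matches starting exactly at lastIndex, so the separate "skipped garbage" check is no longer needed. findIndex picks the first participating group, which yields the same numeric token types as the first token list above. This is a sketch, not a drop-in replacement; throwing a SyntaxError with a position is my own choice:

```javascript
// Same token patterns as the first list in the answer; the 1-based index
// of the alternative that matched is used as the token type.
const tokens = [
  'sin', 'cos', 'tan',
  '\\(', '\\)', '\\+', '-', '\\*', '/',
  '\\d+(?:\\.\\d*)?',
];

// 'y' (sticky): exec() only tries to match exactly at lastIndex, so
// unrecognised input makes it return null instead of skipping ahead.
const rtok = new RegExp('\\s*(?:(' + tokens.join(')|(') + '))\\s*', 'y');

function tokenize(expression) {
  const toks = [];
  rtok.lastIndex = 0; // reset the shared regex before each run
  while (rtok.lastIndex < expression.length) {
    const pos = rtok.lastIndex; // exec() resets lastIndex to 0 on failure
    const match = rtok.exec(expression);
    if (!match) throw new SyntaxError('Unexpected input at position ' + pos);
    // Index of the first capture group that participated in the match.
    const type = match.findIndex((group, i) => i > 0 && group !== undefined);
    toks.push({ type, txt: match[type] });
  }
  return toks;
}

console.log(tokenize('sin(4+3) * cos(25 / 3)'));
```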
I don't know if this does everything you want to achieve, but it works for me:
'sin(4+3)*2'.match(/\d+\.?\d*|[a-zA-Z]+|\S/g);
// ["sin", "(", "4", "+", "3", ")", "*", "2"]
You can replace the [a-zA-Z]+ part with sin|cos|tan|etc. to support only math functions.
Just offer up a few possibilities:
[a-zA-Z]+|\d+(?:\.\d+)?|.
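For reference, engines that implement ES2018 lookbehind assertions (current Node.js and modern browsers) accept the Java lookaround regex from the question, with its closing parenthesis restored, as a split() separator. A sketch, assuming such an engine:

```javascript
// The Java tokenizing regex from the question: split wherever the previous
// or the next character is not a dot, letter, or digit. Needs lookbehind (ES2018).
const splitter = /(?<=[^\.a-zA-Z\d])|(?=[^\.a-zA-Z\d])/;

console.log('sin(4+3)*2'.split(splitter));
// [ 'sin', '(', '4', '+', '3', ')', '*', '2' ]

// Whitespace becomes its own piece, so drop blank pieces if needed.
const pieces = 'sin(4 + 3)'.split(splitter).filter(t => t.trim() !== '');
console.log(pieces); // [ 'sin', '(', '4', '+', '3', ')' ]
```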
