regex: get match and remaining string with regex - javascript

I want to use the RegExp constructor to run a regular expression against a string and get both the match and the remaining string.
This is needed to implement the following UI pattern:
As you can see in the image, I need to separate the match from the rest of the string so that I can style or process each part separately.
/**
* INPUT
*
* input: 'las vegas'
* pattern: 'las'
*
*
* EXPECTED OUTPUT
*
* match: 'las'
* remaining: 'vegas'
*/

Get the match then replace the match with nothing in the string, and return both results.
function matchR(str, regex){
// get the match (null when the regex does not match)
var _match = str.match(regex);
if (!_match) return {match: null, remaining: str};
// return the matched text, and the string with that text removed
return {match:_match[0], remaining:str.replace(_match[0], "")};
}

Here is a function that takes the user input and an array of strings to match as parameters, and returns an array of arrays:
const strings = [
'Las Cruces',
'Las Vegas',
'Los Altos',
'Los Gatos',
];
function getMatchAndRemaining(input, strings) {
let escaped = input.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
let regex = new RegExp('^(' + escaped + ')(.*)$', 'i');
return strings.map(str => {
return (str.match(regex) || [str, '', str]).slice(1);
});
}
//tests:
['l', 'las', 'los', 'x'].forEach(input => {
let matches = getMatchAndRemaining(input, strings);
console.log(input, '=>', matches);
});
Some notes:
You need to escape the user input before creating the regex, because some characters have special meaning.
If there is no match, the before part is empty and the remaining part contains the full string.
You could add an additional parameter to the function with a style or class to apply to the before part, in which case you would return a string instead of an array of [before, remaining].
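The last note can be sketched roughly as follows. This is only an illustration: the names highlightPrefix and escapeHtml and the className parameter are assumptions, not part of the answer above, and the markup it returns is one possible choice.

```javascript
// Hypothetical helpers, named here for illustration only.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
const escapeHtml = (s) =>
  s.replace(/[&<>"']/g, (c) => ({
    '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;',
  }[c]));

// Returns an HTML string in which the matched prefix is wrapped in a span.
function highlightPrefix(input, str, className) {
  const regex = new RegExp('^(' + escapeRegExp(input) + ')(.*)$', 'i');
  const m = str.match(regex);
  // No match (or empty input): return the escaped string unchanged.
  if (!m || m[1] === '') return escapeHtml(str);
  return '<span class="' + escapeHtml(className) + '">' +
    escapeHtml(m[1]) + '</span>' + escapeHtml(m[2]);
}

console.log(highlightPrefix('las', 'Las Vegas', 'hit'));
// <span class="hit">Las</span> Vegas
```

Escaping the text before building the markup matters here because the input comes from the user.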

Related

How can I check for only one occurrence of each provided symbol?

I am given an array of symbols, which can vary, for instance ['#']. One occurrence of each symbol is mandatory, but the string may contain each provided symbol only once.
Now I do like this:
const regex = new RegExp(`^\\w+[${validatedSymbols.join()}]\\w+$`);
But it also fails on symbols such as '='. For example:
/^\w+[#]\w+$/.test('string#=string') // false
So, the result I expect:
'string#string' - ok
'string##string' - not ok
Using a complex regex is most likely not the best solution. I think you would be better off creating a validation function.
In this function you can find all occurrences of the provided symbols in the string. Then return false if no occurrences are found, or if the list of occurrences contains duplicate entries.
// https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#escaping
const escapeRegExp = (string) => string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
function validate(string, symbols) {
if (symbols.length == 0) {
throw new Error("at least one symbol must be provided in the symbols array");
}
const symbolRegex = new RegExp(symbols.map(escapeRegExp).join("|"), "g");
const symbolsInString = string.match(symbolRegex); // <- null if no match
// string must at least contain 1 occurrence of any symbol
if (!symbolsInString) return false;
// symbols may only occur once
const hasDuplicateSymbols = symbolsInString.length != new Set(symbolsInString).size;
return !hasDuplicateSymbols;
}
const validatedSymbols = ["#", "="];
const strings = [
"string!*string", // invalid (has neither "#" nor "=")
"string#!string", // valid
"string#=string", // valid
"string##string", // invalid (max 1 occurrence per symbol)
];
console.log("validatedSymbols", "=", JSON.stringify(validatedSymbols));
for (const string of strings) {
const isValid = validate(string, validatedSymbols);
console.log(JSON.stringify(string), "//=>", isValid);
}
I think you are looking for the following:
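const regex = new RegExp(`^\\w+[${validatedSymbols.join('')}]?\\w+$`);
The question mark means 1 or 0 of the preceding character class.
Note that join() with no argument separates the items with commas, which would put a literal comma into the character class, so pass an empty string instead. You might also need to escape the symbols in validatedSymbols, as some symbols have a special meaning in regex.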
Edit:
For mandatory symbols it would be easier to add a group per symbol:
^\w+(#\w*){1}(#\w*){1}\w+$
Where the group is:
(#\w*){1}
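If a single regex is still preferred, one possible approach is a lookahead per symbol. The sketch below assumes that every symbol is a single character and that each one must appear exactly once; buildExactlyOnceRegExp is a hypothetical name, and the resulting regex does not restrict which other characters the string may contain.

```javascript
// Same escaping helper as in the MDN guide on regular expressions.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical builder: one lookahead per symbol, each asserting that the
// symbol occurs exactly once between the start and the end of the string.
function buildExactlyOnceRegExp(symbols) {
  const lookaheads = symbols.map((symbol) => {
    const s = escapeRegExp(symbol);
    return `(?=[^${s}]*${s}[^${s}]*$)`;
  });
  return new RegExp('^' + lookaheads.join(''));
}

const onlyHash = buildExactlyOnceRegExp(['#']);
console.log(onlyHash.test('string#string'));  // true
console.log(onlyHash.test('string##string')); // false
```

With an empty symbols array the regex matches every string, so a caller would probably want to reject that case up front.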

Extract JPA Named Parameters in Javascript

I am trying to extract JPA named parameters in JavaScript. This is the algorithm that I came up with:
const notStrRegex = /(?<![\S"'])([^"'\s]+)(?![\S"'])/gm
const namedParamCharsRegex = /[a-zA-Z0-9_]/;
/**
* @returns array of named parameters which,
* 1. always begin with :
* 2. have remaining characters that are guaranteed to match {@link namedParamCharsRegex}
*
* @example
* 1. "select * from a where id = :myId3;" -> [':myId3']
* 2. "to_timestamp_tz(:FROM_DATE, 'YYYY-MM-DD\"T\"HH24:MI:SS')" -> [':FROM_DATE']
* 3. "TO_CHAR(ep.CHANGEDT,'yyyy=mm-dd hh24:mi:ss')" -> []
*/
export function extractNamedParam(query: string): string[] {
return (query.match(notStrRegex) ?? [])
.filter((word) => word.includes(':'))
.map((splittedWord) => splittedWord.substring(splittedWord.indexOf(':')))
.filter((splittedWord) => splittedWord.length > 1) // ignore ":"
.map((word) => {
// i starts from 1 because word[0] is :
for (let i = 1; i < word.length; i++) {
const isAlphaNum = namedParamCharsRegex.test(word[i]);
if (!isAlphaNum) return word.substring(0, i);
}
return word;
});
}
I got inspired by the solution in
https://stackoverflow.com/a/11324894/12924700
to filter out all characters that are enclosed in single/double quotes.
The code above handles the 3 use cases listed in the comment, but it fails when a user inputs
const testStr = '"user input invalid string \' :shouldIgnoreThisNamedParam \' in a string"'
extractNamedParam(testStr) // should return [] but it returns [":shouldIgnoreThisNamedParam"] instead
I did visit the source code of hibernate to see how named parameters are extracted there, but I couldn't find the algorithm that is doing the work. Please help.
You can use
/"[^\\"]*(?:\\[\w\W][^\\"]*)*"|'[^\\']*(?:\\[\w\W][^\\']*)*'|(:\w+)/g
Get the Group 1 values only. See the regex demo. The regex matches strings between single/double quotes and captures : + one or more word chars in all other contexts.
See the JavaScript demo:
const re = /"[^\\"]*(?:\\[\w\W][^\\"]*)*"|'[^\\']*(?:\\[\w\W][^\\']*)*'|(:\w+)/g;
const text = "to_timestamp_tz(:FROM_DATE, 'YYYY-MM-DD\"T\"HH24:MI:SS')";
let matches=[], m;
while (m=re.exec(text)) {
if (m[1]) {
matches.push(m[1]);
}
}
console.log(matches);
Details:
"[^\\"]*(?:\\[\w\W][^\\"]*)*" - a ", then zero or more chars other than " and \ ([^"\\]*), and then zero or more repetitions of any escaped char (\\[\w\W]) followed with zero or more chars other than " and \, and then a "
| - or
'[^\\']*(?:\\[\w\W][^\\']*)*' - a ', then zero or more chars other than ' and \ ([^'\\]*), and then zero or more repetitions of any escaped char (\\[\w\W]) followed with zero or more chars other than ' and \, and then a '
| - or
(:\w+) - Group 1 (this is the value we need to get, the rest is just used to consume some text where matches must be ignored): a colon and one or more word chars.
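For convenience, the same regex can be wrapped in a function and checked against the cases from the question. This is a sketch only: extractNamedParams is a hypothetical name, and String.prototype.matchAll is used in place of the exec loop.

```javascript
// Regex copied from the answer above; only Group 1 holds a named parameter.
const NAMED_PARAM_RE =
  /"[^\\"]*(?:\\[\w\W][^\\"]*)*"|'[^\\']*(?:\\[\w\W][^\\']*)*'|(:\w+)/g;

// Hypothetical wrapper: collects Group 1 of every match.
function extractNamedParams(query) {
  const params = [];
  for (const m of query.matchAll(NAMED_PARAM_RE)) {
    // m[1] is undefined when a quoted string was consumed instead.
    if (m[1] !== undefined) params.push(m[1]);
  }
  return params;
}

const testStr = '"user input invalid string \' :shouldIgnoreThisNamedParam \' in a string"';
console.log(extractNamedParams('select * from a where id = :myId3;')); // [ ':myId3' ]
console.log(extractNamedParams(testStr)); // []
```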

How do I correctly apply regex so array does not have empty string at the start

I'm struggling with a regex.
I am able to split the string at the required location, but when it is added to an array, the array has an empty string at the start.
// This is the string I am wanting to split.
// I want the first 4 words to be separated from the remainder of the string
const chatMessage = "This is a string that I want to split";
// I am using this regex
const r = /(^(?:\S+\s+\n?){4})/;
const chatMessageArr = chatMessage.split(r);
console.log(chatMessageArr);
It returns:
[ '', 'This is a string ', 'that I want to split' ]
But need it to return:
[ 'This is a string ', 'that I want to split' ]
I wouldn't use string split here, I would use a regex replacement:
var chatMessage = "This is a string that I want to split";
var first = chatMessage.replace(/^\s*(\S+(?:\s+\S+){3}).*$/, "$1");
var last = chatMessage.replace(/^\s*\S+(?:\s+\S+){3}\s+(.*$)/, "$1");
console.log(chatMessage);
console.log(first);
console.log(last);
Add a second capture group to the regexp and use .match() instead of .split().
// This is the string I am wanting to split.
// I want the first 4 words to be separated from the remainder of the string
const chatMessage = "This is a string that I want to split";
// I am using this regex
const r = /(^(?:\S+\s+\n?){4})(.*)/;
const chatMessageArr = chatMessage.match(r);
chatMessageArr.shift(); // remove the full match
console.log(chatMessageArr);
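The .match() approach can be generalised to any number of leading words. The sketch below is one way to do that; splitAfterWords and its fallback of returning the whole text when there are too few words are assumptions, not part of the answers above.

```javascript
// Hypothetical helper: splits text into [first n words, remainder].
function splitAfterWords(text, n) {
  // Group 1: n runs of "non-space chars + trailing whitespace"; group 2: the rest.
  const re = new RegExp(`^((?:\\S+\\s+){${n}})([\\s\\S]*)$`);
  const m = text.match(re);
  // Fewer than n words followed by whitespace: keep everything in the first part.
  return m ? [m[1], m[2]] : [text, ''];
}

console.log(splitAfterWords('This is a string that I want to split', 4));
// [ 'This is a string ', 'that I want to split' ]
```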

find the pattern and dynamically build regex to match the string

If asterisk * is present in the pattern, then it means a sequence of the same character of length 3 unless it is followed by {N} which represents how many characters should appear in the sequence where N will be at least 1. My goal is to determine if the second string exactly matches the pattern of the first string in the input. I'm having trouble building the Regex pattern
*{2}* mmRRR should return TRUE
*{2}* mRRR should return FALSE
https://jsfiddle.net/82smw9zx/
Sample code:
pattern1 = /'queryStrSubStr.charAt(0){patternCount}'/;
var patternMatch = new RegExp(pattern1);
if(queryStrSubStr.match(patternMatch)) {
result = true;
} else result = false;
You need to use new RegExp() to construct your regex pattern with variables (rather than attempting to include a variable directly in your regular expression literal).
You are trying to include variables queryStrSubStr.charAt(0) and patternCount in a regular expression literal like: /'queryStrSubStr.charAt(0){patternCount}'/, but JavaScript does not interpret those strings as variables inside the literal.
The following example demonstrates how to construct your regex pattern with variables, and incorporates the HTML input from your fiddle so that you can test various patterns. Code comments explain how the code works.
$('.btn').click(() => {
const result = wildcards($('.enter_pattern').val());
console.log(result);
});
const wildcards = (s) => {
if (s.startsWith('*')) { // if input string starts with *
let pattern;
let [count, text] = s.split(' '); // split input string into count and text
count = count.match(/\{\d+\}/); // match count pattern like {n}
if (count) { // if there is a count
pattern = new RegExp(text.charAt(0) + count); // regex: first character + matched count pattern
} else { // if there is no count
pattern = new RegExp(text.charAt(0) + '{3}'); // regex: first character + default pattern {3}
}
return !!s.match(pattern); // return true if text matches pattern or false if not
} else { // if input string does not start with *
return 'No pattern';
}
};
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<input type="text" class="enter_pattern" />
<button type="submit" class="btn">Click</button>
/*
Example test output:
Input: *{2}* mmRRR
Log: true
Input: *{2}* mRRR
Log: false
Input: * mmmRRR
Log: true
Input: * mmRRR
Log: false
Input: mmRRR
Log: No pattern
*/
First you need to calculate the pattern using a regex:
/\*\{(\d+)\}\*/
It matches a star, a left curly brace, one or more digits, a right curly brace, and a closing star.
How to use:
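var text = 'mmRRR';
var char = text.charAt(0);
var pattern = '*{2}*';
var match = /\*\{(\d+)\}\*/.exec(pattern); // null when the pattern has no {N}
var counter = match ? match[1] : '3'; // default run length is 3
var regex = new RegExp('^' + char + '{' + counter + '}'); // checks the leading run only
var result = text.match(regex);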
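To check the whole string against the whole pattern, each star can be translated into a capture group followed by a backreference. This is a rough sketch under the rules stated in the question ("*" is a run of 3 identical characters, "*{N}" a run of N with N at least 1); patternToRegExp is a hypothetical name, and any other character in the pattern is treated as a literal.

```javascript
// Hypothetical converter from the wildcard pattern to an anchored RegExp.
function patternToRegExp(pattern) {
  let group = 0;
  const body = pattern.replace(/\*(?:\{(\d+)\})?|[\s\S]/g, (token, n) => {
    if (token[0] !== '*') {
      // Not a wildcard: escape it so it matches literally.
      return token.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    }
    group += 1;
    const length = n === undefined ? 3 : Number(n);
    // Capture one character, then require it (length - 1) more times.
    return length > 1 ? `(.)\\${group}{${length - 1}}` : '(.)';
  });
  return new RegExp('^' + body + '$');
}

console.log(patternToRegExp('*{2}*').test('mmRRR')); // true
console.log(patternToRegExp('*{2}*').test('mRRR'));  // false
```

Note that nothing here forces adjacent runs to use different characters, so 'mmmmm' would also match '*{2}*'.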

How do you split a javascript string by spaces and punctuation?

I have some random string, for example: Hello, my name is john.. I want that string split into an array like this: Hello, ,, , my, name, is, john, .,. I tried str.split(/[^\w\s]|_/g), but it does not seem to work. Any ideas?
To split str on any run of non-word characters, i.e. anything other than A-Z, a-z, 0-9, and underscore:
var words=str.split(/\W+/); // assumes str does not begin nor end with whitespace
Or, assuming your target language is English, you can extract all semantically useful values from a string (i.e. "tokenizing" a string) using:
var str='Here\'s a (good, bad, indifferent, ...) '+
'example sentence to be used in this test '+
'of English language "token-extraction".',
punct='\\['+ '\\!'+ '\\"'+ '\\#'+ '\\$'+ // since javascript does not
'\\%'+ '\\&'+ '\\\''+ '\\('+ '\\)'+ // support POSIX character
'\\*'+ '\\+'+ '\\,'+ '\\\\'+ '\\-'+ // classes, we'll need our
'\\.'+ '\\/'+ '\\:'+ '\\;'+ '\\<'+ // own version of [:punct:]
'\\='+ '\\>'+ '\\?'+ '\\@'+ '\\['+
'\\]'+ '\\^'+ '\\_'+ '\\`'+ '\\{'+
'\\|'+ '\\}'+ '\\~'+ '\\]',
re=new RegExp( // tokenizer
'\\s*'+ // discard possible leading whitespace
'('+ // start capture group
'\\.{3}'+ // ellipsis (must appear before punct)
'|'+ // alternator
'\\w+\\-\\w+'+ // hyphenated words (must appear before punct)
'|'+ // alternator
'\\w+\'(?:\\w+)?'+ // compound words (must appear before punct)
'|'+ // alternator
'\\w+'+ // other words
'|'+ // alternator
'['+punct+']'+ // punct
')' // end capture group
);
// grep(ary[,filt]) - filters an array
// note: could use jQuery.grep() instead
// @param {Array} ary array of members to filter
// @param {Function} filt function to test truthiness of member,
// if omitted, "function(member){ if(member) return member; }" is assumed
// @returns {Array} all members of ary where result of filter is truthy
function grep(ary,filt) {
var result=[];
for(var i=0,len=ary.length;i<len;i++) { // visit every index, starting at 0
var member=ary[i]||'';
if(filt && (typeof filt === 'function') ? filt(member) : member) { // typeof yields lowercase 'function'
result.push(member);
}
}
return result;
}
var tokens=grep( str.split(re) ); // note: filter function omitted
// since all we need to test
// for is truthiness
which produces:
tokens=[
'Here\'s',
'a',
'(',
'good',
',',
'bad',
',',
'indifferent',
',',
'...',
')',
'example',
'sentence',
'to',
'be',
'used',
'in',
'this',
'test',
'of',
'English',
'language',
'"',
'token-extraction',
'"',
'.'
]
EDIT
Also available as a Github Gist
Try this (I'm not sure if this is what you wanted):
str.replace(/[^\w\s]|_/g, function ($1) { return ' ' + $1 + ' ';}).replace(/[ ]+/g, ' ').split(' ');
http://jsfiddle.net/zNHJW/3/
Try:
str.split(/([_\W])/)
This will split by any non-alphanumeric character (\W) and any underscore. It uses capturing parentheses to include the item that was split by in the final result.
This solution caused a challenge with spaces for me (still needed them), then I gave str.split(/\b/) a shot and all is well. Spaces are output in the array, which won't be hard to ignore, and the ones left after punctuation can be trimmed out.
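Another option is to match the tokens directly instead of splitting, which avoids empty strings in the result. The sketch below is an illustration only: tokenize and its keepWhitespace flag are hypothetical, and unlike the asker's pattern it treats the underscore as a word character.

```javascript
// Hypothetical tokenizer: words, whitespace runs, and single punctuation marks.
function tokenize(str, keepWhitespace = false) {
  const tokens = str.match(/\w+|\s+|[^\w\s]/g) || [];
  // Drop whitespace-only tokens unless the caller asked to keep them.
  return keepWhitespace ? tokens : tokens.filter((t) => !/^\s+$/.test(t));
}

console.log(tokenize('Hello, my name is john.'));
// [ 'Hello', ',', 'my', 'name', 'is', 'john', '.' ]
```

Passing true as the second argument keeps the whitespace runs as separate array entries.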
