I am trying to extract JPA named parameters in JavaScript, and this is the algorithm I came up with:
const notStrRegex = /(?<![\S"'])([^"'\s]+)(?![\S"'])/gm
const namedParamCharsRegex = /[a-zA-Z0-9_]/;
/**
* @returns array of named parameters which
* 1. always begin with :
* 2. have remaining characters that are guaranteed to match {@link namedParamCharsRegex}
*
* @example
* 1. "select * from a where id = :myId3;" -> [':myId3']
* 2. "to_timestamp_tz(:FROM_DATE, 'YYYY-MM-DD\"T\"HH24:MI:SS')" -> [':FROM_DATE']
* 3. "TO_CHAR(ep.CHANGEDT,'yyyy=mm-dd hh24:mi:ss')" -> []
*/
export function extractNamedParam(query: string): string[] {
return (query.match(notStrRegex) ?? [])
.filter((word) => word.includes(':'))
.map((splittedWord) => splittedWord.substring(splittedWord.indexOf(':')))
.filter((splittedWord) => splittedWord.length > 1) // ignore ":"
.map((word) => {
// i starts from 1 because word[0] is :
for (let i = 1; i < word.length; i++) {
const isAlphaNum = namedParamCharsRegex.test(word[i]);
if (!isAlphaNum) return word.substring(0, i);
}
return word;
});
}
I got inspired by the solution in
https://stackoverflow.com/a/11324894/12924700
to filter out all characters that are enclosed in single/double quotes.
The code above handles the 3 use cases listed in the comment, but it fails when a user enters a string like this:
const testStr = '"user input invalid string \' :shouldIgnoreThisNamedParam \' in a string"'
extractNamedParam(testStr) // should return [] but it returns [":shouldIgnoreThisNamedParam"] instead
I looked through the Hibernate source code to see how named parameters are extracted there, but I couldn't find the algorithm that does the work. Please help.
You can use
/"[^\\"]*(?:\\[\w\W][^\\"]*)*"|'[^\\']*(?:\\[\w\W][^\\']*)*'|(:\w+)/g
Use the Group 1 values only. The regex matches (and thereby skips) strings between single/double quotes and captures : + one or more word chars in all other contexts.
See the JavaScript demo:
const re = /"[^\\"]*(?:\\[\w\W][^\\"]*)*"|'[^\\']*(?:\\[\w\W][^\\']*)*'|(:\w+)/g;
const text = "to_timestamp_tz(:FROM_DATE, 'YYYY-MM-DD\"T\"HH24:MI:SS')";
let matches=[], m;
while (m=re.exec(text)) {
if (m[1]) {
matches.push(m[1]);
}
}
console.log(matches);
Details:
"[^\\"]*(?:\\[\w\W][^\\"]*)*" - a ", then zero or more chars other than " and \ ([^"\\]*), and then zero or more repetitions of any escaped char (\\[\w\W]) followed with zero or more chars other than " and \, and then a "
| - or
'[^\\']*(?:\\[\w\W][^\\']*)*' - a ', then zero or more chars other than ' and \ ([^'\\]*), and then zero or more repetitions of any escaped char (\\[\w\W]) followed with zero or more chars other than ' and \, and then a '
| - or
(:\w+) - Group 1 (this is the value we need to get, the rest is just used to consume some text where matches must be ignored): a colon and one or more word chars.
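The pattern above can be wrapped in a small reusable function. This is only a sketch: the names NAMED_PARAM_RE and extractNamedParams are made up for illustration, and the regex is the one from this answer, unchanged.

```javascript
// Hypothetical names; the regex is copied from the answer above.
const NAMED_PARAM_RE = /"[^\\"]*(?:\\[\w\W][^\\"]*)*"|'[^\\']*(?:\\[\w\W][^\\']*)*'|(:\w+)/g;

function extractNamedParams(query) {
  const params = [];
  // matchAll works on a clone of the regex, so the shared regex keeps lastIndex = 0
  for (const m of query.matchAll(NAMED_PARAM_RE)) {
    // m[1] is undefined when a quoted string was consumed instead of a parameter
    if (m[1] !== undefined) params.push(m[1]);
  }
  return params;
}

console.log(extractNamedParams('select * from a where id = :myId3;'));
console.log(extractNamedParams("to_timestamp_tz(:FROM_DATE, 'YYYY-MM-DD\"T\"HH24:MI:SS')"));
console.log(extractNamedParams('"user input invalid string \' :shouldIgnoreThisNamedParam \' in a string"'));
```

One known limitation of this sketch: a PostgreSQL-style cast such as a::int would be reported as :int, because the pattern only looks for a colon followed by word characters.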
I need to manipulate drawing of a SVG, so I have attribute "d" values like this:
d = "M561.5402,268.917 C635.622,268.917 304.476,565.985 379.298,565.985"
What I want is to strip the command letters from the values, do a calculation on each number (for the sake of simplicity, let's say add 100 to each value), and then concatenate everything back together so the final result is something like this:
d = "M661.5402,368.917 C735.622,368.917 404.476,665.985 479.298,665.985"
Keep in mind that:
some values can start with a character
values are delimited by comma
some values within comma delimiter can be delimited by space
values are decimal
This is my try:
let arr1 = d.split(',');
arr1 = arr1.map(element => {
let arr2 = element.split(' ');
if (arr2.length > 1) {
arr2 = arr2.map(el => {
let startsWithChar = el.match(/\D+/);
if (startsWithChar) {
el = el.replace(/\D/g,'');
}
el = parseFloat(el) + 100;
if (startsWithChar) {
el = startsWithChar[0] + el;
}
})
}
else {
let startsWithChar = element.match(/\D+/);
if (startsWithChar) {
element = element.replace(/\D/g,'');
}
element = parseFloat(element) + 100;
if (startsWithChar) {
element = startsWithChar[0] + element;
}
}
});
d = arr1.join(',');
I tried with regex replace(/\D/g,'') but then it strips the decimal dot from the value also, so I think my solution is full of holes.
Maybe another solution would be to modify each of the path values/commands directly. I'm open to that solution as well, but I don't know how.
const s = 'M561.5402,268.917 C635.622,268.917 304.476,565.985 379.298,565.985'
console.log(s.replaceAll(/[\d.]+/g, m=>+m+100))
You might use a pattern to match the format in the string with 2 capture groups.
([ ,]?\b[A-Z]?)(\d+\.\d+)\b
The pattern matches:
( Capture group 1
[ ,]?\b[A-Z]? Match an optional space or comma, a word boundary and an optional uppercase char A-Z
) Close group 1
( Capture group 2
\d+\.\d+ Match 1+ digits, a dot and 1+ digits
) Close group 2
\b A word boundary to prevent a partial word match
First capture the optional delimiter followed by an optional uppercase char in group 1, and the decimal number in group 2.
Then add 100 to the decimal value and join back the 2 group values.
const d = "M561.5402,268.917 C635.622,268.917 304.476,565.985 379.298,565.985";
const regex = /([ ,]?\b[A-Z]?)(\d+\.\d+)\b/g;
const res = Array.from(
d.matchAll(regex), m => m[1] + (+m[2] + 100)
).join('');
console.log(res);
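Both answers boil down to "find each number and replace it with the number plus an offset". A minimal sketch of that idea as a function follows; shiftPathNumbers is a hypothetical name, and the sketch assumes the numbers are non-negative, use no exponent notation, and are separated by commas, spaces, or command letters as in the question.

```javascript
// Hypothetical helper: adds `delta` to every number found in an SVG path string.
// Assumption: numbers are non-negative and never written like "1e3" or "10-5".
function shiftPathNumbers(d, delta) {
  // \d+(?:\.\d+)? matches an integer or a decimal such as 268.917
  return d.replace(/\d+(?:\.\d+)?/g, (num) => String(parseFloat(num) + delta));
}

console.log(shiftPathNumbers('M10.5,20 C30.25,40 50,60.75', 100));
// -> M110.5,120 C130.25,140 150,160.75
```

Because the arithmetic is done in floating point, some sums may print with extra trailing digits; rounding with toFixed() before converting back to a string is one way to deal with that.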
I want to use the RegExp constructor to run a regular expression against a string and get both the match and the remaining string.
This is needed to implement the following UI pattern:
As you can see in the image, I need to separate the match from the rest of the string so that I can style or otherwise process each part separately.
/**
* INPUT
*
* input: 'las vegas'
* pattern: 'las'
*
*
* EXPECTED OUTPUT
*
* match: 'las'
* remaining: 'vegas'
*/
Get the match then replace the match with nothing in the string, and return both results.
function matchR(str, regex){
// get the match (null when nothing matches)
var _match = str.match(regex);
if (!_match) return {match: null, remaining: str};
// return the first match, and the string with that match removed
return {match:_match[0], remaining:str.replace(_match[0], "")};
}
Here is a function that takes the user input and an array of strings to match as parameters, and returns an array of arrays:
const strings = [
'Las Cruces',
'Las Vegas',
'Los Altos',
'Los Gatos',
];
function getMatchAndRemaining(input, strings) {
let escaped = input.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
let regex = new RegExp('^(' + escaped + ')(.*)$', 'i');
return strings.map(str => {
return (str.match(regex) || [str, '', str]).slice(1);
});
}
//tests:
['l', 'las', 'los', 'x'].forEach(input => {
let matches = getMatchAndRemaining(input, strings);
console.log(input, '=>', matches);
});
Some notes:
you need to escape the user input before creating the regex, some chars have special meaning
if there is no match, the before part is empty, and the remaining part contains the full string
you could add an additional parameter to the function with style or class to add to the before part, in which case you would return a string instead of an array of [before, remaining]
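The last note can be sketched roughly as follows. highlightPrefix and the class name are hypothetical, and the sketch assumes the strings are trusted (it does no HTML escaping), so treat it as an illustration rather than a finished implementation.

```javascript
// Hypothetical helper: wraps the matched prefix of `str` in a <span> with the given class.
// Assumption: `str` is trusted text; no HTML escaping is done here.
function highlightPrefix(input, str, className) {
  // escape regex metacharacters in the user input, as in the function above
  const escaped = input.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const regex = new RegExp('^(' + escaped + ')(.*)$', 'i');
  const m = str.match(regex);
  // no match (or empty input): return the string unchanged
  if (!m || m[1] === '') return str;
  return '<span class="' + className + '">' + m[1] + '</span>' + m[2];
}

console.log(highlightPrefix('las', 'Las Vegas', 'hit'));
// -> <span class="hit">Las</span> Vegas
```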
How do I slice a string and get only the parts outside the {} braces?
I'm able to get the characters inside the curly braces, but how do I get the contents outside them?
I have a string like this:
name is {response.vital.name} and disease is {response.vital.some} and drug is {response.vital.dis}
I want output array like this:
["the name is","and disease is","and drug is"]
Sub-Strings Outside Bracketed Sub-Strings:
If you want all sub-strings outside of the curly braces then you can use the String.split() method with a simple Regular Expression:
String strg = "name is {response.vital.name} and disease is " +
"{response.vital.some} and drug is {response.vital.dis}";
String[] array = strg.split("\\{.*?\\}");
and to simply display the sub-strings within the array to console:
System.out.println(Arrays.toString(array));
Console output will be:
[name is , and disease is , and drug is]
Regular Expression Explanation:
\\ is the escape for the brace that follows. Brackets and braces are RegEx special characters, so they must be escaped with \ in order to use them as literal characters. And, because the Regular Expression is written as a Java string literal, the escape character itself must be escaped (\\) to make it legal. You can not place a single \ in front of a brace in a Java String.
{ (escaped) matches the character { literally.
. matches any character (except for line terminators).
*? Quantifier: matches between zero and unlimited times, as few times as possible, expanding as needed (lazy).
\\ again escapes the close brace that follows.
} (escaped) matches the character } literally.
You could create a method to do this and accommodate any type of brace:
/**
* Retrieves all sub-strings on either side of the supplied open brace and close brace.
* Everything within braces (including the braces) is excluded.<br>
*
* @param inputString (String) The string to process. If Null or Null String is
* supplied then this method will throw an IllegalArgumentException.<br><br>
*
* @param openBrace (String) The Open Brace of which to ignore content within.
* This can be Parentheses ( ( ), Square Brackets ( [ ), Curly Braces ( { ), or
* Chevron Brackets ( < ). if Null or Null String is supplied then this method
* will throw a IllegalArgumentException exception.<br>
*
* @param closeBrace (String) The Close Brace of which to ignore content within.
* This can be Parentheses ( ) ), Square Brackets ( ] ), Curly Braces ( } ), or
* Chevron Brackets ( > ). if Null or Null String is supplied then this method
* will throw a IllegalArgumentException exception.<br>
*
* @param trimElements (Optional - Boolean - Default is false) If this optional
* parameter is set to boolean true then the returned array elements will have
* any leading or trailing white-spaces removed.<br>
*
* @return (Single Dimensional String Array) The detected sub-strings within a
* 1D String Array.
*/
public String[] getSubstringsOutsideBraces(String inputString, String openBrace,
String closeBrace, boolean... trimElements) {
if (inputString == null || inputString.equals("") ||
openBrace == null || openBrace.equals("") ||
closeBrace == null || closeBrace.equals("")) {
throw new IllegalArgumentException("\ngetSubstringsOutsideBraces() Method Error! "
+ "A method argument contains Null or Null String!\n");
}
boolean trim = false;
if (trimElements.length > 0) {
trim = trimElements[0];
}
String[] array = inputString.split(
// Using Ternary Operator so as to apply an additional
// RegEx to the split() method in order to remove the
// need for IF/ELSE and a FOR loop to trim elements if
// desired to do so.
(trim) ? "\\s*\\" + openBrace + ".*?\\" + closeBrace +
"\\s*" : "\\" + openBrace + ".*?\\" + closeBrace
);
return array;
}
And to use this method you might do it this way:
String strg = "name is {response.vital.name} and disease is "
+ "{response.vital.some} and drug is "
+ "{response.vital.dis} More string content.";
String[] a = getSubstringsOutsideBraces(strg, "{", "}", true);
System.out.println(Arrays.toString(a));
Console output will be:
[name is, and disease is, and drug is, More string content.]
Get Strings Inside Bracketed Sub-Strings:
If what you are wanting to do is to collect the sub-string(s) or token(s) contained between Brackets or Braces within a string then a Pattern/Matcher approach works quite well for this task:
// Apply the Regular Expression to Pattern
Pattern regex = Pattern.compile("\\{(.*?)\\}");
// Run the pattern to see if there are any matches.
Matcher matcher = regex.matcher(inputString);
// Establish a String List Object to hold any matches.
List<String> list = new ArrayList<>();
// Retrieve any matches found and add to List.
while (matcher.find()) {
list.add(matcher.group(1));
}
Regular Expression Explanation:
Here is the explanation for the Regular Expression supplied to Pattern:
\\ is the escape for the brace that follows. Brackets and braces are RegEx special characters, so they must be escaped with \ in order to use them as literal characters. And, because the Regular Expression is written as a Java string literal, the escape character itself must be escaped (\\) to make it legal. You can not place a single \ in front of a brace in a Java String.
{ (escaped) matches the character { literally.
(.*?) is the 1st (and only) capturing group; it holds the text between the braces.
. matches any character (except for line terminators).
*? Quantifier: matches between zero and unlimited times, as few times as possible, expanding as needed (lazy).
\\ again escapes the close brace that follows.
} (escaped) matches the character } literally.
For ease of use, this concept can also be placed into and used as a method, for example:
/**
* Retrieves all sub-strings inside of the supplied open brace and close brace.
* Everything outside of braces (including the braces) is excluded.<br>
*
* @param inputString (String) The string to process. If Null or Null String is
* supplied then this method will throw an IllegalArgumentException.<br>
*
* @param openBrace (String) The Open Brace of which to get content within.
* This can be Parentheses ( ( ), Square Brackets ( [ ), Curly Braces ( { ), or
* Chevron Brackets ( < ). if Null or Null String is supplied then this method
* will throw a IllegalArgumentException exception.<br>
*
* @param closeBrace (String) The Close Brace of which to get content within.
* This can be Parentheses ( ) ), Square Brackets ( ] ), Curly Braces ( } ), or
* Chevron Brackets ( > ). if Null or Null String is supplied then this method
* will throw a IllegalArgumentException exception.<br>
*
* @param trimElements (Optional - Boolean - Default is false) If this optional
* parameter is set to boolean true then the returned array elements will have
* any leading or trailing white-spaces removed.<br>
*
* @return (Single Dimensional String Array) The detected sub-strings within a
* 1D String Array.
*/
public String[] getSubstringsInsideBraces(String inputString, String openBrace,
String closeBrace, boolean... trimElements) {
// Make sure all arguments are supplied...
if (inputString == null || inputString.equals("") ||
openBrace == null || openBrace.equals("") ||
closeBrace == null || closeBrace.equals("")) {
throw new IllegalArgumentException("\ngetSubstringsInsideBraces() Method Error! "
+ "A method argument contains Null or Null String!\n");
}
// See if the option to trim bracketed content is desired.
boolean trim = false;
if (trimElements.length > 0) {
trim = trimElements[0];
}
// Apply the Regular Expression to Pattern. The supplied
// open and close braces are used within the expression.
Pattern regex = Pattern.compile("\\" + openBrace + "(.*?)\\" + closeBrace);
// Run the pattern to see if there are any matches.
Matcher matcher = regex.matcher(inputString);
// Establish a String List Object to hold any matches.
List<String> list = new ArrayList<>();
// Retrieve any matches found and add to List.
while (matcher.find()) {
// Using a ternary operator against whether or not to
// Trim the matched item if desired before adding it
// to the List. This saves on doing a IF/THEN/ELSE.
list.add((trim) ? matcher.group(1).trim() : matcher.group(1));
}
// Convert the List to a 1D String Array
if (list.isEmpty()) { return null; }
String[] array = new String[list.size()];
array = list.toArray(array);
// Return the 1D String Array
return array;
}
To use the above method:
String strg = "name is {response.vital.name} and disease is "
+ "{response.vital.some } and drug is "
+ "{response.vital.dis } More string content.";
String[] a = getSubstringsInsideBraces(strg, "{", "}", true);
System.out.println(Arrays.toString(a));
The console output will be:
[response.vital.name, response.vital.some, response.vital.dis]
If you want the above method to return a List object (which is what it fills to start with) then simply remove the conversion code within the method, instead of return array; use return list;, and declare your method to return a List<String> instead of String[]. Of course change the text within the JavaDoc above the method to reflect this change.
In JavaScript you can do the following:
s = "name is {response.vital.name} and disease is {response.vital.some} and drug is {response.vital.dis}";
a = s.split(/\s*\{[^\}]*\}\s*/)
The Java solution looks quite similar:
String s = "name is {response.vital.name} and disease is {response.vital.some} and drug is {response.vital.dis}";
String[] a = s.split("\\s*\\{[^\\}]*\\}\\s*");
A solution in JS:
var str = "name is {response.vital.name} and disease is {response.vital.some} and drug is {response.vital.dis}";
var res = str.split('{').reduce((m, o) => {
if (o.indexOf('}') > -1 && o.split('}')[1]) {
m.push(o.split('}')[1].trim());
} else {
m.push(o.trim());
}
return m;
}, []);
console.log(res);
Instead of matching content inside brackets you can just split by content between brackets.
In JavaScript you can use the .split() function with this regex /{[\w\.]+}/:
var str = 'name is {response.vital.name} and disease is {response.vital.some} and drug is {response.vital.dis}';
var arr = str.split(/{[\w\.]+}/).filter(s => s != "");
Demo:
var str = 'name is {response.vital.name} and disease is {response.vital.some} and drug is {response.vital.dis}';
var arr = str.split(/{[\w\.]+}/).filter(s => s != "");
console.log(arr);
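The two needs discussed in this thread (text outside the braces, tokens inside them) can be combined in one small JavaScript function. This is a sketch under the assumption that braces are never nested; splitByBraces is a hypothetical name.

```javascript
// Hypothetical helper: returns both the tokens inside {...} and the text around them.
// Assumption: braces are balanced and never nested.
function splitByBraces(text) {
  // capture group 1 holds the content between { and }
  const inside = Array.from(text.matchAll(/\{([^}]*)\}/g), (m) => m[1].trim());
  // split on each {...} token (plus surrounding whitespace) and drop empty leftovers
  const outside = text.split(/\s*\{[^}]*\}\s*/).filter((s) => s !== '');
  return { inside, outside };
}

const parts = splitByBraces(
  'name is {response.vital.name} and disease is {response.vital.some} and drug is {response.vital.dis}'
);
console.log(parts.outside); // [ 'name is', 'and disease is', 'and drug is' ]
console.log(parts.inside);  // [ 'response.vital.name', 'response.vital.some', 'response.vital.dis' ]
```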
So I have this regular expression, which is supposed to parse a given string into a list of HTML(5) attributes. It currently doesn't do what I need, but that's about to change! (I hope so)
What I'm trying to achieve is that whenever an attribute name is found, the text up to the next occurrence OR the end of the string is selected as the second group. So if you take a look at the current regular expression:
/([a-zA-Z]+|[a-zA-Z]+-[a-zA-Z0-9]+)=["']/g
A string like this: hey="hey world" hey-heyhhhhh3123="Hello world" data-goed="hey"
Would be filtered / matched out like this:
MATCH 1. [0-3] `hey`
MATCH 2. [16-32] `hey-heyhhhhh3123`
MATCH 3. [47-56] `data-goed`
These are the attribute names; now we just have to fetch the attribute values. So the string above should produce a result like this:
MATCH 1.
1 [0-3] `hey`
2 [6-14] `hey world`
MATCH 2.
1 [16-32] `hey-heyhhhhh3123`
2 [35-45] `Hello world`
MATCH 3.
1 [47-56] `data-goed`
2 [59-61] `hey`
Could anyone help me get this working? It would be appreciated a lot!
You can use
/([^\s=]+)=(?:"([^"\\]*(?:\\.[^"\\]*)*)"|(\S+))/g
Pattern details:
([^\s=]+) - Group 1 capturing 1 or more characters other than whitespace and = symbol
= - an equal sign
(?:"([^"\\]*(?:\\.[^"\\]*)*)"|(\S+)) - a non-capturing group of 2 alternatives (one more '([^'\\]*(?:\\.[^'\\]*)*)' alternative can be added to account for single quoted string literals)
"([^"\\]*(?:\\.[^"\\]*)*)" - a double quoted string literal pattern:
" - a double quote
([^"\\]*(?:\\.[^"\\]*)*) - Group 2 capturing 0+ characters other than \ and ", followed with 0+ sequences of any escaped symbol followed with 0+ characters other than \ and "
" - a closing double quote
| - or
(\S+) - Group 3 capturing one or more non-whitespace characters
JS demo (no single quoted support):
var re = /([^\s=]+)=(?:"([^"\\]*(?:\\.[^"\\]*)*)"|(\S+))/g;
var str = 'hey="hey world" hey-heyhhhhh3123="Hello \\"world\\"" data-goed="hey" more=here';
var res = [], m;
while ((m = re.exec(str)) !== null) {
if (m[3]) {
res.push([m[1], m[3]]);
} else {
res.push([m[1], m[2]]);
}
}
console.log(res);
JS demo (with single quoted literal support)
var re = /([^\s=]+)=(?:"([^"\\]*(?:\\.[^"\\]*)*)"|'([^'\\]*(?:\\.[^'\\]*)*)'|(\S+))/g;
var str = 'pseudoprefix-before=\'hey1"\' data-hey="hey\'hey" more=data and="more \\"here\\""';
var res = [], m;
while ((m = re.exec(str)) !== null) {
if (m[2]) {
res.push([m[1], m[2]])
} else if (m[3]) {
res.push([m[1], m[3]])
} else if (m[4]) {
res.push([m[1], m[4]])
}
}
console.log(res);
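The demo above checks the groups for truthiness, so an attribute with an empty value such as a="" is skipped. A possible variation is sketched below; parseAttributes is a hypothetical name, the regex is the single-quote-aware one from this answer, and the ?? operator needs a reasonably modern JavaScript engine.

```javascript
// Hypothetical helper: collects attribute name/value pairs into a plain object.
function parseAttributes(source) {
  const re = /([^\s=]+)=(?:"([^"\\]*(?:\\.[^"\\]*)*)"|'([^'\\]*(?:\\.[^'\\]*)*)'|(\S+))/g;
  const attrs = {};
  for (const m of source.matchAll(re)) {
    // exactly one of groups 2-4 is defined; ?? keeps empty strings, unlike ||
    attrs[m[1]] = m[2] ?? m[3] ?? m[4];
  }
  return attrs;
}

console.log(parseAttributes('hey="hey world" hey-heyhhhhh3123="Hello world" data-goed="hey"'));
console.log(parseAttributes("a='' b=x"));
```

Returning an object means a repeated attribute name overwrites the earlier value; keep an array of pairs instead if duplicates matter.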
function compress(source) {
var keys = {};
source.replace(
/([^=&]+)=([^&]*)/g,
function(full, key, value) {
keys[key] =
(keys[key] ? keys[key] + "," : "") + value;
return "";
}
);
var result = [];
for (var key in keys) {
result.push(key + "=" + keys[key]);
}
return result.join("&");
}
alert(compress("foo=1&foo=2&blah=a&blah=b&foo=3"));
I am still confused by this /([^=&]+)=([^&]*)/g. What are the + and * used for?
The ^ inside [] means NOT these characters, the + means one or more matches of the preceding element, and the () are groups. And the * is any amount of matches (0+).
http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
So by looking at it, it matches one or more characters that are NOT = or &, then an =, then zero or more characters that are NOT &; in other words, each key=value pair in the string.
+ and * are called quantifiers. They determine how many times the element immediately preceding them (usually a set of characters grouped with [] or ()) can repeat.
/ start of regex
( group 1 starts
[^ anything that does not match
=& equals or ampersand
]+ one or more of above
) group 1 ends
= followed by an equals sign, followed by
( group 2 starts
[^ anything that does not match
& ampersand
]* zero or more of above
) group 2 ends
/ end of regex
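To see the two groups from the breakdown above in action, here is a sketch that does the same job as compress() but with matchAll and a Map. compressQuery is a hypothetical name; the regex is the one being discussed.

```javascript
// Hypothetical rewrite of compress(): groups repeated keys of a query string.
function compressQuery(source) {
  const grouped = new Map();
  // group 1 = key (no = or &), group 2 = value (no &, may be empty)
  for (const [, key, value] of source.matchAll(/([^=&]+)=([^&]*)/g)) {
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  }
  // a Map keeps insertion order, so keys come out in first-seen order
  return Array.from(grouped, ([key, values]) => key + '=' + values.join(',')).join('&');
}

console.log(compressQuery('foo=1&foo=2&blah=a&blah=b&foo=3'));
// -> foo=1,2,3&blah=a,b
```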