Using JS to modify user input for a RegExp search

I'm taking user input from a search bar and converting it to a regexp, which I then use to search a JSON file for matching values. It works fine for input without quotes, but for quoted input I'm wrapping the string in "\Q" and "\E" so that the entire string (including spaces and other special characters) is matched literally.
if (searchField.includes('"')) {
    // Strip the surrounding quotes and wrap the rest in \Q...\E
    var tempexpress = searchField.substring(1, searchField.length - 1);
    tempexpress = "\\Q" + tempexpress + "\\E";
    var expression = new RegExp(tempexpress);
} else {
    // Escape the first "(" and ")", every "'", and turn the first "*" into "."
    var tempexpress = searchField.replace('(', "\\(");
    tempexpress = tempexpress.replace(')', "\\)");
    tempexpress = tempexpress.replace(/'/g, "\\'");
    tempexpress = tempexpress.replace('*', ".");
    var expression = new RegExp(tempexpress, "i");
}
if (value.data.label.search(expression) != -1) {
    console.log('found it');
}
If I input "QTT6" into the search field (with quotes for a literal), then it creates the following regexp: /\QQTT6\E/
In my testing, it doesn't match QTT6, and I'm not sure why. Any help is appreciated.
Also, I'm very new to JS and jQuery, so sorry if my code isn't very well put together.

Per Kelly's comment:
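JavaScript regular expressions do not support \Q...\E quoting (that is a PCRE/Java feature). Without the u flag, \Q and \E are treated as the plain letters Q and E, so /\QQTT6\E/ actually looks for the text QQTT6E; with the u flag they are a syntax error. To match the whole label exactly, anchor the pattern with ^ and $ instead, and escape any special characters in the input yourself.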
For more information, see the MDN docs on Regex Assertions:
^:
Matches the beginning of input. If the multiline flag is set to true, also matches immediately after a line break character. For example, /^A/ does not match the "A" in "an A", but does match the first "A" in "An A".
Note: This character has a different meaning when it appears at the start of a character class.
$:
Matches the end of input. If the multiline flag is set to true, also matches immediately before a line break character. For example, /t$/ does not match the "t" in "eater", but does match it in "eat".
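Putting the two pieces together (escape the input, then anchor it), a minimal sketch of the search could look like this. It assumes quoted input should match the entire label exactly and unquoted input should be a case-insensitive substring search; `escapeRegExp` and `buildSearchRegExp` are hypothetical helper names, and the escaping function is the commonly used character-class approach rather than anything from the question.

```javascript
// Escape every character that has a special meaning in a RegExp pattern,
// so user input is matched literally (what \Q...\E does in PCRE/Java).
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: "quoted" input must match the whole label exactly,
// anything else is a case-insensitive substring search.
function buildSearchRegExp(searchField) {
  const isQuoted =
    searchField.length >= 2 && searchField.startsWith('"') && searchField.endsWith('"');
  if (isQuoted) {
    const literal = searchField.slice(1, -1);
    // ^ and $ anchor the escaped literal to the start and end of the label.
    return new RegExp("^" + escapeRegExp(literal) + "$");
  }
  return new RegExp(escapeRegExp(searchField), "i");
}

const expression = buildSearchRegExp('"QTT6"');
console.log(expression.test("QTT6"));       // true
console.log(expression.test("QTT6 (old)")); // false: extra text after the literal
console.log(buildSearchRegExp("(old)").test("QTT6 (old)")); // true
```

Escaping every metacharacter covers cases the original `replace()` chain misses, since a string pattern in `replace('(', ...)` only replaces the first occurrence.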

Related

Replace a repeating set of characters at the end of a string using regex

I want to remove all <br> tags from the end of this string. Currently I am doing this (in JavaScript):
const value = "this is an event. <br><br><br><br>"
let description = String(value);
while (description.endsWith('<br>')) {
description = description.replace(/<br>$/, '');
}
But I want to do it without a while loop, using only a regex with replace(). Is there a way?
To anchor a match at the end of the string, use the special $ symbol.
To match a repeated character or group, use a quantifier: + means one or more occurrences, * means zero or more.
In your case, the final regex is: (<br>)*$
This removes every consecutive occurrence of <br> at the end of the string (zero or more, so a string without trailing <br> tags is left unchanged).
Example:
const value = "this is an event. <br><br><br><br>"
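let description = String(value);
// replace() returns a new string, so the result must be assigned
description = description.replace(/(<br>)*$/, '');
console.log(description); // → "this is an event. "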
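If the trailing line breaks might also be written as `<br/>` or `<BR />`, or be separated by whitespace, the same single `replace()` call can cover those variants. This is a sketch under that assumption (the question only shows plain `<br>`); note that it also trims any whitespace just before the first trailing tag. `stripTrailingBreaks` is a hypothetical name.

```javascript
// Remove any run of <br>, <br/>, <br /> (any case, optional whitespace
// between them) from the end of the string in one replace() call.
function stripTrailingBreaks(html) {
  return html.replace(/(?:\s*<br\s*\/?>)+\s*$/i, "");
}

console.log(JSON.stringify(stripTrailingBreaks("this is an event. <br><br><br><br>")));
// "this is an event."
console.log(JSON.stringify(stripTrailingBreaks("line one<br/>\n<BR />")));
// "line one"
```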
You may try:
var value = "this is an event. <br><br><br><br>";
var output = value.replace(/(<.*?>)\1*$/, "");
console.log(output);
Here is the regex logic being used:
(<.*?>) match AND capture any HTML tag
\1* then match that same tag zero or more additional times
$ only where that run of tags reaches the end of the string
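One caveat with `(<.*?>)`: the lazy `.*?` can still run across a `>` when that is the only way to reach the end of the string, so a trailing run made of two different tags may be removed as if it were one "tag". Restricting the tag body to `[^>]*` keeps each capture inside a single tag. A quick comparison, using made-up sample strings:

```javascript
const lazyTag = /(<.*?>)\1*$/;     // the pattern from the answer above
const strictTag = /(<[^>]*>)\1*$/; // tag body may not contain ">"

const samples = ["this is an event. <br><br><br><br>", "text<b><br>"];
for (const s of samples) {
  console.log(JSON.stringify(s.replace(lazyTag, "")), JSON.stringify(s.replace(strictTag, "")));
}
// "this is an event. " "this is an event. "
// "text" "text<b>"
```

With `lazyTag`, the capture on the second sample grows to `<b><br>` so that `$` can match; `strictTag` cannot do that and removes only the final `<br>`.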

JavaScript regex: match everything after the last occurrence of a string

I am trying to match everything after (but not including!) the last occurrence of a string in JavaScript.
The string being searched is, for example:
[quote="user1"]this is the first quote[/quote]\n[quote="user2"]this is the 2nd quote and some url https://www.google.com/[/quote]\nThis is all the text I\'m wirting about myself.\n\nLook at me ma. Javascript.
Edit: I'm looking to match everything after the last quote block, so I was trying to match everything after the last occurrence of "quote]". I don't know if this is the best solution, but it's what I've been trying.
I'll be honest, I'm not good with regex. Here is what I've been trying, with the results:
regex = /(quote\].+)(.*)/ig; // Returns null
regex = /.+((quote\]).+)$/ig // Returns null
regex = /( .* (quote\]) .*)$/ig // Returns null
I have made a JSFiddle for anyone who wants to experiment with it:
https://jsfiddle.net/au4bpk0e/
One option would be to match everything up to the last [/quote] and then capture whatever follows it:
/.*\[\/quote\](.*)$/i
This works because .* is greedy: it consumes as much as possible and then backtracks only as far as needed to match the last \[\/quote\].
Based on the string you provided, this would be the first capturing group match:
\nThis is all the text I\'m wirting about myself.\n\nLook at me ma. Javascript.
But since your string contains newlines, and . doesn't match newline characters, you could use [\s\S] in place of . in order to match anything:
/[\s\S]*\[\/quote\]([\s\S]*)$/i
You could also avoid regex and use the .lastIndexOf() method along with .slice():
var match = '[\/quote]';
var textAfterLastQuote = str.slice(str.lastIndexOf(match) + match.length);
document.getElementById('res').innerHTML = "Results: " + textAfterLastQuote;
Alternatively, you could also use .split() and then get the last value in the array:
var textAfterLastQuote = str.split('[\/quote]').pop();
document.getElementById('res').innerHTML = "Results: " + textAfterLastQuote;
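The approaches above can be compared side by side. This is a sketch using the sample string from the question; the function names are hypothetical, and the `s` (dotAll) flag variant assumes an ES2018 or newer engine. Note that the `lastIndexOf` snippet does not check for -1, so a string without any `[/quote]` would silently lose its first 7 characters; the version below returns `null` in that case instead.

```javascript
const str =
  '[quote="user1"]this is the first quote[/quote]\n' +
  '[quote="user2"]this is the 2nd quote and some url https://www.google.com/[/quote]\n' +
  "This is all the text I'm wirting about myself.\n\nLook at me ma. Javascript.";

// 1. Regex with [\s\S]: the greedy part backtracks to the LAST [/quote].
function afterLastQuoteRegex(text) {
  const m = /[\s\S]*\[\/quote\]([\s\S]*)$/i.exec(text);
  return m ? m[1] : null; // null when there is no [/quote] at all
}

// 2. Same idea with the "s" (dotAll) flag, so "." also matches "\n".
function afterLastQuoteDotAll(text) {
  const m = /.*\[\/quote\](.*)$/is.exec(text);
  return m ? m[1] : null;
}

// 3. lastIndexOf + slice, checking for "not found" (-1) explicitly.
function afterLastQuoteSlice(text) {
  const marker = "[/quote]";
  const idx = text.lastIndexOf(marker);
  return idx === -1 ? null : text.slice(idx + marker.length);
}

// 4. split + pop: returns the whole string when the marker is absent.
function afterLastQuoteSplit(text) {
  return text.split("[/quote]").pop();
}

console.log(JSON.stringify(afterLastQuoteSlice(str)));
```

All four return the same text for the sample; they differ only in what they return when no `[/quote]` is present.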

Tell If A Multi-Word String Contains A Word

Goal:
(The reason I don't think this is a duplicate is that it involves matching the start of each word in the string, not just anything in the string)
I'm using JavaScript/jQuery. Say I have a string, which is:
Muncie South Gateway Project
I'm creating a live search box, which checks the input against the string with each keystroke. I'd like to return a match if the input matches the beginning of a word, not the middle. Example:
Mu = Match
Muncie = Match
unc = No Match
cie = No Match
Gatewa = Match
atewa = No Match
What I have
I am currently using this check:
if (new RegExp(input).test(string.toLowerCase())) {return '1';}
However, this also matches letters in the middle of a word. With it, my examples give these results:
M = Match
Mu = Match
Mun = Match
Muncie = Match
unc = Match // Should not match
cie = Match // Should not match
Gatewa = Match
atewa = Match // Should not match
Question:
I know this can be done by breaking the string apart into separate words and testing each word. But I'm not sure how efficient that would be. Is there a good way to do this?
You can use a word boundary to make sure the input matches only at the start of a word:
if (new RegExp("\\b" + input).test(string.toLowerCase())) {return '1';}
EDIT: As per a comment, you can also use:
var re = new RegExp("(?:^|\\s)" + input, "i");
if (re.test(string)) {return '1';}
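Because `input` is concatenated straight into the pattern, characters such as `(` or `+` typed into the search box would either throw or change the meaning of the regex. A minimal sketch that escapes the input first, alongside the split-into-words approach mentioned in the question (the helper names are hypothetical):

```javascript
// Escape RegExp metacharacters so the typed text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True if `input` matches the start of a word: at the beginning of `text`
// or right after whitespace, ignoring case.
function matchesWordStart(text, input) {
  const re = new RegExp("(?:^|\\s)" + escapeRegExp(input), "i");
  return re.test(text);
}

// Regex-free alternative: split into words and compare prefixes.
function matchesWordStartSplit(text, input) {
  const needle = input.toLowerCase();
  return text.toLowerCase().split(/\s+/).some((word) => word.startsWith(needle));
}

const title = "Muncie South Gateway Project";
for (const input of ["Mu", "Muncie", "unc", "cie", "Gatewa", "atewa"]) {
  console.log(input, matchesWordStart(title, input) ? "Match" : "No Match");
}
```

Both functions report Match for Mu, Muncie and Gatewa, and No Match for unc, cie and atewa.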
I would use startsWith and a "contains" check (includes() in JavaScript):
// input:
// instring: the user's input
// searchstr: the string to search
if (searchstr.toLowerCase().startsWith(instring.toLowerCase())
    || searchstr.toLowerCase().includes(" " + instring.toLowerCase()))
{
    // etc
}

File path validation in JavaScript

I am trying to validate an XML file path in JavaScript. My regex is:
var isValid = /^([a-zA-Z]:)?(\\{2}|\/)?([a-zA-Z0-9\\s_#-^!#$%&+={}\[\]]+(\\{2}|\/)?)+(\.xml+)?$/.test(str);
It returns true even when the path is wrong.
These are valid paths:
D:/test.xml
D:\\folder\\test.xml
D:/folder/test.xml
D:\\folder/test.xml
D:\\test.xml
First, the obvious errors:
+ is a quantifier meaning "one or more".
So (\.xml+) matches .xm followed by one or more l characters (it would also match .xmlllll). The ? makes the group optional, so (\.xml+)? means the path may end in .xml but doesn't have to.
The same applies to ([a-zA-Z]:)?: it makes the drive letter optional.
Now the less obvious errors:
In [a-zA-Z0-9\\s_#-^!#$%&+={}\[\]] you define a list of allowed characters. You have \\s, and I assume you want to allow whitespace, but this allows the characters \ and s, so you need to change it to \s. Then you have #-^. I assume you want to allow #, - and ^, but - has a special meaning inside [ ]: it defines a range, so you are allowing every character from # to ^. That range includes /, \, :, ., the digits and all uppercase letters, which is why invalid paths still pass. If you want to allow -, escape it: #\-^. You also need to take care with ^: right after the opening [ it negates the class.
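Both character-class pitfalls are easy to check directly. A small sketch; the single-character test strings are arbitrary examples:

```javascript
// "#-^" inside [...] is a RANGE from "#" (U+0023) to "^" (U+005E).
const asRange = /^[#-^]$/;
// With the "-" escaped, the class contains only "#", "-" and "^".
const asLiterals = /^[#\-^]$/;

for (const ch of ["/", "\\", ":", ".", "A", "5"]) {
  console.log(JSON.stringify(ch), asRange.test(ch), asLiterals.test(ch)); // true false
}

// In a regex literal, [\\s] means "a backslash or the letter s", not whitespace.
console.log(/^[\\s]$/.test(" "), /^[\\s]$/.test("s"), /^[\s]$/.test(" ")); // false true true
```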
Your regex should contain the following parts:
^[a-z]: starts with (^) a drive letter followed by a colon
((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+ followed by one or more path parts, each starting with either \ or / and continuing with one or more of your allowed characters (a-z0-9\s_#\-^!#$%&+={}\[\])
\.xml$ ends with ($) .xml
Therefore your final regex should look like this:
/^[a-z]:((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+\.xml$/i.test(str)
(assuming a case-insensitive match using the i flag)
EDIT:
var path1 = "D:/test.xml"; // D:/test.xml
var path2 = "D:\\folder\\test.xml"; // D:\folder\test.xml
var path3 = "D:/folder/test.xml"; // D:/folder/test.xml
var path4 = "D:\\folder/test.xml"; // D:\folder/test.xml
var path5 = "D:\\test.xml"; // D:\test.xml
console.log( /^[a-z]:((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+\.xml$/i.test(path1) );
console.log( /^[a-z]:((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+\.xml$/i.test(path2) );
console.log( /^[a-z]:((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+\.xml$/i.test(path3) );
console.log( /^[a-z]:((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+\.xml$/i.test(path4) );
console.log( /^[a-z]:((\\|\/)[a-z0-9\s_#\-^!#$%&+={}\[\]]+)+\.xml$/i.test(path5) );
UPDATE:
Take care with / and \. Whether you need to escape them depends on how you write the regex: as a string passed to new RegExp('...', "i") or new RegExp("...", "i"), where every backslash the regex needs must be doubled (\\\\ matches a single \), or as a literal /.../i, where a / outside a character class must be escaped as \/ so it doesn't end the literal.
For further information about regular expressions, see e.g. www.regular-expressions.info
This could work for you:
var str = 'D:/test.xml';
var str2 = 'D:\\folder\\test.xml';
var str3 = 'D:/folder/test.xml';
var str4 = 'D:\\folder/test.xml';
var str5 = 'D:\\test\\test\\test\\test.xml';
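// "\\\\" in the string becomes \\ in the regex (one literal backslash),
// and "\\." becomes \. (a literal dot rather than "any character")
var regex = new RegExp('^[a-z]:((\\\\|/)[a-zA-Z0-9_ \\-]+)+\\.xml$', 'i');
console.log(regex.test(str5)); // true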
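The reason for writing \\\\ in the RegExp string to match a single \ in the path is that \ is an escape character at two levels: the string literal turns \\\\ into \\, and the regex engine then reads \\ as one literal backslash. The same applies to other regex escapes written inside a string: "\b" in a string literal is a backspace character and "\." is just ".", so write "\\b" and "\\." instead. Splitting the pattern into path parts also makes it possible to use different rules for folder names and file names (see Update 2).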
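To see the double escaping in action, here is the same path pattern written as a regex literal and as a string passed to `new RegExp()`. The third, deliberately buggy variant writes `\.` with a single backslash inside the string; the string literal turns it into a plain `.`, which then matches any character. The valid paths are the ones from the question; the invalid ones are made-up examples.

```javascript
// Same pattern, two spellings.
const literal = /^[a-z]:((\\|\/)[a-zA-Z0-9_ \-]+)+\.xml$/i;
const fromString = new RegExp("^[a-z]:((\\\\|/)[a-zA-Z0-9_ \\-]+)+\\.xml$", "i");
// Buggy: "\." in a string literal is just ".", so the regex sees ".xml$".
const buggy = new RegExp("^[a-z]:((\\\\|/)[a-zA-Z0-9_ \\-]+)+\.xml$", "i");

const valid = ["D:/test.xml", "D:\\folder\\test.xml", "D:/folder/test.xml", "D:\\folder/test.xml", "D:\\test.xml"];
const invalid = ["D:/testAxml", "/folder/test.xml", "D:/test.txt"];

console.log(valid.every((p) => literal.test(p) && fromString.test(p)));  // true
console.log(invalid.some((p) => literal.test(p) || fromString.test(p))); // false
console.log(buggy.test("D:/testAxml"));                                  // true
```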
Update
The [a-zA-Z0-9_\-]+ part of the regex is what actually matches each file or folder name. To allow more characters in file/folder names, add them to this class; e.g., to allow * make it [a-zA-Z0-9_\-\*]+
Update 2
To add to the answer, the following RegExp adds another check to the validation: it rejects paths that mix / and \.
var str6 = 'D:/This is folder/test # file.xml';
var str7 = 'D:/This is invalid\\path.xml'
var regex2 = new RegExp('^[a-z]:(\/|\\\\)([a-zA-Z0-9_ \-]+\\1)*[a-zA-Z0-9_ #\-]+\.xml?', 'gi');
regex2 matches str, str2, str3, str5 and str6, but rejects the mixed-separator paths str4 and str7.
Update
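My apologies for mistyping a ? instead of $ in regex2. Below is the corrected and intended version:
// "\\1" is a backreference to the first separator, so / and \ can't be mixed;
// "\\." is a literal dot
var regex2 = new RegExp('^[a-z]:(/|\\\\)([a-zA-Z0-9_ \\-]+\\1)*[a-zA-Z0-9_ #\\-]+\\.xml$', 'i');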
Tested using Scratchpad.
var regex = /^[a-z]:((\/|(\\?))[\w .]+)+\.xml$/i;
Each of these prints true in the Web Console (Ctrl+Shift+K in Firefox):
console.log(regex.test("D:/test.xml"));
console.log(regex.test("D:\\folder\\test.xml"));
console.log(regex.test("D:/folder/test.xml"));
console.log(regex.test("D:\\folder/test.xml"));
console.log(regex.test("D:\\test.xml"));
console.log(regex.test("D:\\te st_1.3.xml")); // spaces, dots allowed
Or, using alert boxes:
alert(regex.test("D:/test.xml"));
alert(regex.test("D:\\folder\\test.xml"));
alert(regex.test("D:/folder/test.xml"));
alert(regex.test("D:\\folder/test.xml"));
alert(regex.test("D:\\test.xml"));
alert(regex.test("D:\\te st_1.3.xml"));
Invalid file paths (each returns false):
alert(regex.test("AD:/test.xml")); // invalid drive letter
alert(regex.test("D:\\\\\\folder\\test.xml")); // three backslashes
alert(regex.test("/folder/test.xml")); // drive letter missing
alert(regex.test("D:\\folder/test.xmlfile")); // invalid extension

JavaScript regular expression to replace a word but not within curly brackets

I have some content, for example:
If you have a question, ask for help on StackOverflow
I have a list of synonyms:
a={one typical|only one|one single|one sole|merely one|just one|one unitary|one small|this solitary|this slight}
ask={question|inquire of|seek information from|put a question to|demand|request|expect|inquire|query|interrogate}
I'm using JavaScript to:
Split synonyms based on =
Loop through every synonym and, if the word is found in the content, replace it with {...|...}
The output should look like:
If you have {one typical|only one|one single|one sole|merely one|just one|one unitary|one small|this solitary|this slight} question, {question|inquire of|seek information from|put a question to|demand|request|expect|inquire|query|interrogate} for help on StackOverflow
Problem:
Instead of replacing only whole words, it replaces every match it finds, even inside other words (for example, every letter a in the content). My code:
for(syn in allSyn) {
var rtnSyn = allSyn[syn].split("=");
var word = rtnSyn[0];
var synonym = (rtnSyn[1]).trim();
if(word && synonym){
var match = new RegExp(word, "ig");
postProcessContent = preProcessContent.replace(match, synonym);
preProcessContent = postProcessContent;
}
}
It should replace a word in the content with its synonyms, but not words that are already inside a {...|...} group.
When you build the regexps, you need to include word boundary anchors at both the beginning and the end to match whole words (beginning and ending with characters from [a-zA-Z0-9_]) only:
var match = new RegExp("\\b" + word + "\\b", "ig");
Depending on the specific replacements you are making, you might want to apply your method to individual words (rather than to the entire text at once), matched using a regexp like /\w+/g, to avoid replacing words that are themselves part of the replacements for others. Something like:
content = content.replace(/\w+/g, function(word) {
    for (var i = 0, L = allSyn.length; i < L; ++i) {
        var rtnSyn = allSyn[i].split("=");
        var synonym = (rtnSyn[1] || "").trim();
        if (synonym && rtnSyn[0].toLowerCase() == word.toLowerCase()) return synonym;
    }
    return word; // no synonym found: keep the original word
});
Regular expressions include something called a "word boundary", represented by \b. It is a zero-width assertion (it only checks a condition; it doesn't consume any input) that matches where a word character ([A-Za-z0-9_]) meets a non-word character or the start or end of the string. For example, given the string ' X', the regex / \bX/ matches, because the boundary sits between the space and the X. So to make your code work, you just have to add word boundaries to the beginning and end of your word regex, like this:
for(syn in allSyn) {
var rtnSyn = allSyn[syn].split("=");
var word = rtnSyn[0];
var synonym = (rtnSyn[1]).trim();
if(word && synonym){
var match = new RegExp("\\b"+word+"\\b", "ig");
postProcessContent = preProcessContent.replace(match, synonym);
preProcessContent = postProcessContent;
}
}
[Note that there are two backslashes in each word-boundary matcher because in JavaScript strings the backslash is the escape character: two backslashes turn into a single literal backslash.]
For optimization, don't create a new RegExp on each iteration. Instead, build one big regex like [^{A-Za-z](a|ask|...)[^}A-Za-z] and a hash with a value for each key specifying what to replace it with. I'm not familiar enough with JavaScript to write the code on the fly.
Note the separator classes, which say the match cannot begin with { or end with }. This is not terribly precise, but hopefully acceptable in practice. Because the separators consume one character on each side, a word at the very start or end of the text will not match as written. If you genuinely need to replace words next to { or }, this can certainly be refined, but I'm hoping we won't have to.
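A sketch of the single-regex idea, with two adjustments: it uses `\b` instead of consuming the neighbouring characters, and it skips existing `{...}` groups by matching them as a separate alternative and returning them unchanged. The synonym lines are a shortened version of the question's list; `escapeRegExp` and `buildSynonymReplacer` are hypothetical names, and a `Map` stands in for the "hash".

```javascript
// Escape RegExp metacharacters so each word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one replace function from "word={syn1|syn2|...}" lines.
function buildSynonymReplacer(lines) {
  const table = new Map(); // lower-cased word -> "{...|...}" replacement
  for (const line of lines) {
    const eq = line.indexOf("=");
    if (eq === -1) continue;
    const word = line.slice(0, eq).trim().toLowerCase();
    const synonyms = line.slice(eq + 1).trim();
    if (word && synonyms) table.set(word, synonyms);
  }
  if (table.size === 0) return (content) => content;

  // Longer keys first, so a multi-word key wins over a shorter key it starts with.
  const words = [...table.keys()].sort((x, y) => y.length - x.length).map(escapeRegExp);
  // Alternative 1: an existing {...} group (kept as-is).
  // Alternative 2: one of the words, as a whole word.
  const re = new RegExp("\\{[^{}]*\\}|\\b(?:" + words.join("|") + ")\\b", "gi");
  return (content) =>
    content.replace(re, (m) => (m.startsWith("{") ? m : table.get(m.toLowerCase())));
}

const replaceSynonyms = buildSynonymReplacer([
  "a={one typical|only one|one single}",
  "ask={question|inquire of|seek information from}",
]);
console.log(replaceSynonyms("If you have a question, ask for help on StackOverflow"));
// If you have {one typical|only one|one single} question, {question|inquire of|seek information from} for help on StackOverflow
```

Because all words are replaced in a single pass, the inserted `{...|...}` text is never rescanned, so the "a" inside a replacement is not touched.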
