Tell If A Multi-Word String Contains A Word - javascript

Goal:
(The reason I don't think this is a duplicate is that it involves matching the start of each word in the string, not just anything in the string)
I'm using JavaScript/jQuery. Say I have a string, which is:
Muncie South Gateway Project
I'm creating a live search box, which checks the input against the string with each keystroke. I'd like to return a match if the input matches the beginning of a word, not the middle. Example:
Mu = Match
Muncie = Match
unc = No Match
cie = No Match
Gatewa = Match
atewa = No Match
What I have
I currently am using this as my check:
if (new RegExp(input).test(string.toLowerCase())) {return '1';}
However, this matches all letters, including letters in the middle of a word. With it, my examples give these results:
M = Match
Mu = Match
Mun = Match
Muncie = Match
unc = Match // Should not match
cie = Match // Should not match
Gatewa = Match
atewa = Match // Should not match
Question:
I know this can be done by breaking the string apart into separate words and testing each word. But I'm not sure how efficient that would be. Is there a good way to do this?

You can use word boundaries to make sure given input matches only at start of word character:
if (new RegExp("\\b" + input).test(string.toLowerCase())) {return '1';}
EDIT: As per comment below you can use:
var re = new RegExp("(?:^|\\s)" + input, "i");
if (re.test(string)) {return '1';}
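A minimal sketch of the whitespace-anchored check above, wrapped in a function. The names `escapeRegExp` and `matchesWordStart` are hypothetical, and escaping the input is an added assumption so that characters such as `(` or `.` typed into the search box are treated literally rather than as regex syntax.

```javascript
// Hypothetical helper: escape regex metacharacters so user input is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Returns true when `input` matches the beginning of any whitespace-separated word.
function matchesWordStart(input, text) {
  if (input === '') return false; // assumption: an empty query matches nothing
  var re = new RegExp('(?:^|\\s)' + escapeRegExp(input), 'i');
  return re.test(text);
}

var title = 'Muncie South Gateway Project';
console.log(matchesWordStart('Mu', title));     // true
console.log(matchesWordStart('unc', title));    // false
console.log(matchesWordStart('Gatewa', title)); // true
```

The "i" flag makes the comparison case-insensitive, so lower-casing the string first is not needed.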

I would use StartsWith and Contains:
// input
instring // the user string
searchstr // the string to search
if (searchstr.ToLower().StartsWith(instring.ToLower())
|| searchstr.ToLower().Contains(" "+instring.ToLower()))
{
// etc
}
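The snippet above uses .NET-style method names. A rough JavaScript equivalent of the same idea, assuming words are separated by single spaces only, might look like this (`startsAnyWord` is a hypothetical name):

```javascript
// Mirrors the StartsWith/Contains idea without a regex.
function startsAnyWord(instring, searchstr) {
  var needle = instring.toLowerCase();
  var haystack = searchstr.toLowerCase();
  // Match at the very start, or right after a space.
  // Only spaces count as separators here; tabs and newlines are not handled.
  return haystack.startsWith(needle) || haystack.includes(' ' + needle);
}

console.log(startsAnyWord('Gatewa', 'Muncie South Gateway Project')); // true
console.log(startsAnyWord('atewa', 'Muncie South Gateway Project'));  // false
```

Because no regex is built, the user's input needs no escaping.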

Related

Using JS to modify user input for REGEXP search

I'm taking user input from a searchbar and modifying it to a regexp. From there I can search a json file for valid values and return them. It works fine with input without quotes, but with them, I'm appending "\Q" and "\E" so I can find the entirety of the string (with spaces and other special characters).
if (searchField.includes('"')){
var tempexpress = searchField.substring(1,searchField.length-1);
var tempexpress = "\\Q" + tempexpress + "\\E";
var expression = new RegExp(tempexpress);
} else {
var tempexpress = searchField.replace('(',"\\(");
var tempexpress = tempexpress.replace(')',"\\)");
var tempexpress = tempexpress.replace(/'/g,"\\'");
var tempexpress = tempexpress.replace('*',"\.");
var expression = new RegExp(tempexpress, "i");
};
if (value.data.label.search(expression) != -1){
console.log('found it');
}
If I input "QTT6" into the search field (with quotes for a literal), then it creates the following regexp: /\QQTT6\E/
In my testing, I found that it doesn't match to QTT6 for some reason and I'm not sure why. Any help is appreciated.
Also I'm very new to JS and Jquery, so sorry if my code isn't very well put together.
Per Kelly's comment:
JavaScript regular expressions do not support \Q and \E. To match the entire string, use the ^ and $ anchors instead.
For more information, see the MDN docs on Regex Assertions:
^:
Matches the beginning of input. If the multiline flag is set to true, also matches immediately after a line break character. For example, /^A/ does not match the "A" in "an A", but does match the first "A" in "An A".
Note: This character has a different meaning when it appears at the start of a character class.
$:
Matches the end of input. If the multiline flag is set to true, also matches immediately before a line break character. For example, /t$/ does not match the "t" in "eater", but does match it in "eat".
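Since JavaScript has no \Q...\E quoting, one way to get a literal, whole-string match is to escape the metacharacters by hand and then add the anchors. A sketch, assuming the quoted search term should match the entire label; `escapeRegExp` and `literalWholeMatch` are hypothetical names.

```javascript
// Hypothetical helper: escape regex metacharacters so the term is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Builds an anchored, literal regex from a term wrapped in double quotes.
function literalWholeMatch(quotedTerm) {
  var literal = quotedTerm.substring(1, quotedTerm.length - 1); // strip the quotes
  return new RegExp('^' + escapeRegExp(literal) + '$');
}

var expression = literalWholeMatch('"QTT6"');
console.log(expression.test('QTT6'));  // true
console.log(expression.test('XQTT6')); // false
```

Dropping the `^` and `$` from the sketch would turn it into a literal "contains" search instead of a whole-string match.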

RegEx for detecting a string and a path in one go

Here is an example of the regex I need.
I have many of these lines in a file
build test/testfoo/CMakeFiles/testfoo2.dir/testfoo2.cpp.o: CXX_COMPILER__testfoo2_Debug /home/juxeii/projects/gtest-cmake-example/test/testfoo/testfoo2.cpp || cmake_object_order_depends_target_testfoo2
I need to detect the string between CXX_COMPILER__ and _Debug, which here is testfoo2.
At the same time, I need to also detect the entire file path /home/juxeii/projects/gtest-cmake-example/test/testfoo/testfoo2.cpp, which comes always after the first match.
I could not figure out a regex for this. So far I have .*CXX_COMPILER__(.\w+)_\w+|(\/[a-zA-Z_0-9-]+)+\.\w+ and I am using it in typescript like so:
const fileAndTargetRegExp = new RegExp('.*CXX_COMPILER__(.\w+)_\w+|(\/[a-zA-Z_0-9-]+)+\.\w+', 'gm');
let match;
while (match = fileAndTargetRegExp.exec(fileContents)) {
//do something
}
But I get no matches. Is there an easy way to do this?
Will it always have the || <stuff here> at the end? If so, this regex based on the one you provided should work:
/.*CXX_COMPILER__(\w+)_.+?((?:\/.+)+) \|\|.*/g
As the regex101 breakdown shows, the first capturing group should contain the string between CXX_COMPILER__ and _Debug, while the second should contain the path, using the space and pipes to detect where the latter ends.
let line = 'build test/testfoo/CMakeFiles/testfoo2.dir/testfoo2.cpp.o: CXX_COMPILER__testfoo2_Debug /home/juxeii/projects/gtest-cmake-example/test/testfoo/testfoo2.cpp || cmake_object_order_depends_target_testfoo2';
const matches = line.match(/.*CXX_COMPILER__(\w+)_.+?((?:\/.+)+) \|\|.*/).slice(1); //slice(1) just to not include the first complete match returned by match!
for (let match of matches) {
console.log(match);
}
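To apply the same pattern to a whole file rather than a single line, an exec loop like the one in the question can collect both groups per line. This is only a sketch: `extractTargets` is a hypothetical name, the second input line is made up for illustration, and it assumes every relevant line ends with the " || ..." part.

```javascript
// Collects { target, path } pairs from every matching line of the file contents.
function extractTargets(fileContents) {
  // "." does not match newlines, so each match stays within one line.
  var pattern = /.*CXX_COMPILER__(\w+)_.+?((?:\/.+)+) \|\|.*/g;
  var results = [];
  var match;
  while ((match = pattern.exec(fileContents)) !== null) {
    results.push({ target: match[1], path: match[2] });
  }
  return results;
}

var fileContents = [
  'build test/testfoo/CMakeFiles/testfoo2.dir/testfoo2.cpp.o: CXX_COMPILER__testfoo2_Debug /home/juxeii/projects/gtest-cmake-example/test/testfoo/testfoo2.cpp || cmake_object_order_depends_target_testfoo2',
  // Hypothetical second line, added only for illustration.
  'build test/testfoo/CMakeFiles/testfoo1.dir/testfoo1.cpp.o: CXX_COMPILER__testfoo1_Debug /home/juxeii/projects/gtest-cmake-example/test/testfoo/testfoo1.cpp || cmake_object_order_depends_target_testfoo1'
].join('\n');

console.log(extractTargets(fileContents));
```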
If the pipes won't always be there, then this version should work instead (regex101):
.*CXX_COMPILER__(\w+)_.+?((?:\/(?:\w|\.|-)+)+).*
But it requires you to add all of the valid path characters individually every time you realize a new one might be there, and you'll need to make sure the paths don't have spaces because adding space to the regex would make it detect the stuff after the path too.
Looks good, but you need delimiters. Add "/" before and after your Regex - no quotation marks.
let fileContents = 'build test/testfoo/CMakeFiles/testfoo2.dir/testfoo2.cpp.o: CXX_COMPILER__testfoo2_Debug /home/juxeii/projects/gtest-cmake-example/test/testfoo/testfoo2.cpp || cmake_object_order_depends_target_testfoo2';
const fileAndTargetRegExp = new RegExp(/.*CXX_COMPILER__(.\w+)_\w+|(\/[a-zA-Z_0-9-]+)+\.\w+/, 'gm');
let match;
while (match = fileAndTargetRegExp.exec(fileContents)) {
console.log(match);
}
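The reason the string form behaves differently is that a JavaScript string literal consumes single backslashes before RegExp ever sees them, so '\w' arrives as a plain w. A small sketch comparing the three spellings, using a simplified pattern (the variable names are illustrative only):

```javascript
// In a quoted string, "\w" is just "w", so this pattern looks for a literal "w".
var fromString = new RegExp('CXX_COMPILER__(\w+)_Debug');
// Doubling the backslash keeps it intact inside the string.
var fromEscaped = new RegExp('CXX_COMPILER__(\\w+)_Debug');
// A regex literal needs no extra escaping.
var fromLiteral = /CXX_COMPILER__(\w+)_Debug/;

var sample = 'CXX_COMPILER__testfoo2_Debug';
console.log(fromString.source);           // CXX_COMPILER__(w+)_Debug
console.log(fromString.test(sample));     // false
console.log(fromEscaped.exec(sample)[1]); // testfoo2
console.log(fromLiteral.exec(sample)[1]); // testfoo2
```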
Here's my way of doing it with replace:
I need to detect the string between CXX_COMPILER__ and _Debug, which is here testfoo2.
Try to replace all characters of the string with just the first captured group $1 which is between CXX_COMPILER__ and _Debug:
/.*CXX_COMPILER__(\w+)_Debug.*/
^^^^<--testfoo2
I need to also detect the entire file path /home/juxeii/projects/gtest-cmake-example/test/testfoo/testfoo2.cpp
The same approach, but this time keep only the second captured group, which is whatever comes after our first captured group:
/.*CXX_COMPILER__(\w+)_Debug\s+(.*?)(?=\\|\|).*/
^^^<-- /home/.../testfoo2.cpp
let line = 'build test/testfoo/CMakeFiles/testfoo2.dir/testfoo2.cpp.o: CXX_COMPILER__testfoo2_Debug /home/juxeii/projects/gtest-cmake-example/test/testfoo/testfoo2.cpp || cmake_object_order_depends_target_testfoo2'
console.log(line.replace(/.*CXX_COMPILER__(\w+)_Debug.*/gm,'$1'))
console.log(line.replace(/.*CXX_COMPILER__(\w+)_Debug\s+(.*?)(?=\\|\|).*/gm,'$2'))

How to create a regex that checks the string contains specific pattern in javascript?

I have a requirement where I need to traverse a string and find the first occurrence of a specific pattern, as follows:
i am a new **point**
On the occurrence of two consecutive characters it must return true.
I must *not* be returned or*
The above pattern must return false. I tried to create a regex by following a few links, but the string.match method always returns null.
My code,
var getFormat = function(event) {
var element = document.getElementById('editor');
var startIndex = element.selectionStart;
var selectedText = element.value.slice(startIndex);
var regex = new RegExp(/(\b(?:([*])(?!\2{2}))+\b)/)
var stringMatch = selectedText.match(regex);
console.log('stringMatch', stringMatch);
}
<html>
<body>
<textarea onclick='getFormat(event);' rows='10' cols='10' id='editor'></textarea>
</body>
</html>
As I am new to regex, I couldn't figure out where I went wrong. Could anyone help me out with this one?
On the occurrence of two consecutive characters it must return true.
If I'm understanding you correctly, you just want to check whether a string contains two identical consecutive characters, no matter which character. In that case, this should be enough:
(.)\1
This of course assumes that it's literally any character, so two consecutive whitespace characters are also a valid match.
If you just need to check whether there are two stars next to each other, you don't really need a regex at all.
var s = "i am a new **point**";
if (s.indexOf("**") != -1) {
  // it's a match
}
If you need the positions of the opening and closing pairs of stars:
begin = s.indexOf("**");
end = s.indexOf("**", begin + 2); // start after the opening pair so the two cannot overlap
With a regex, you could do it like this:
((.)\2)(.*?)\1
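If only literal double asterisks matter, a narrower pattern than the general one above can pull out the wrapped text directly. A sketch under that assumption; `firstStarred` is a hypothetical name.

```javascript
// Returns the text between the first "**" pair and the next "**", or null if there is none.
function firstStarred(text) {
  var match = /\*\*(.*?)\*\*/.exec(text);
  return match ? match[1] : null;
}

console.log(firstStarred('i am a new **point**'));         // point
console.log(firstStarred('I must *not* be returned or*')); // null
```

A boolean check is then just `firstStarred(text) !== null`.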

Javascript regex between string delimiters

I have the following string:
%||1234567890||Joe||% some text winter is coming %||1234567890||Robert||%
PROBLEM: I am trying to match all occurrences between %||....||% and process those substring matches
MY REGEX: /%([\s\S]*?)(?=%)/g
MY CODE
var a = "%||1234567890||Joe||% some text winter is coming %||1234567890||Robert||%";
var pattern = /%([\s\S]*?)(?=%)/g;
a.replace( pattern, function replacer(match){
return match.doSomething();
} );
Now the pattern seems to be selecting everything between the first and last occurrence of %|| .... %||
WHAT I NEED:
I want to iterate over the matches
%||1234567890||Joe||%
AND
%||1234567890||Robert||%
and do something
You need to use a callback inside a String#replace and modify the pattern to only match what is inside %|| and ||% like this:
var a = "%||1234567890||Joe||% some text winter is coming %||1234567890||Robert||%";
var pattern = /%\|\|([\s\S]*?)\|\|%/g;
a = a.replace( pattern, function (match, group1){
var chunks = group1.split('||');
return "{1}" + chunks.join("-") + "{/1}";
} );
console.log(a);
The /%\|\|([\s\S]*?)\|\|%/g pattern will match:
%\|\| - a %|| substring
([\s\S]*?) - Capturing group 1 matching any 0+ chars as few as possible up to the first...
\|\|% - a ||% substring
/g - multiple times.
Because [\s\S] basically means "anything", the pattern takes whatever lies between any two % characters, including the text between the tags.
RegExp parts without escaping, exploded for readability
start tag : %||
first info: ([^|]*) // will stop at the first |
separator : ||
last info : ([^|]*) // will stop at the first |
end tag : ||%
Escaped RegExp:
/%\|\|([^\|]*)\|\|([^\|]*)\|\|%/g
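To iterate over the matches rather than replace them, the escaped pattern can be run in an exec loop. A minimal sketch: `parseTags` is a hypothetical name, and calling the two captured fields `id` and `name` is an assumption based on the sample string.

```javascript
// Returns one entry per %||...||...||% tag found in the text.
function parseTags(text) {
  var pattern = /%\|\|([^|]*)\|\|([^|]*)\|\|%/g;
  var tags = [];
  var match;
  while ((match = pattern.exec(text)) !== null) {
    tags.push({ raw: match[0], id: match[1], name: match[2] });
  }
  return tags;
}

var a = "%||1234567890||Joe||% some text winter is coming %||1234567890||Robert||%";
parseTags(a).forEach(function (tag) {
  console.log(tag.raw, tag.id, tag.name);
});
```

Inside a character class the pipe needs no backslash, so `[^|]` is equivalent to `[^\|]`.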

Javascript regular expression to replace word but not within curly brackets

I have some content, for example:
If you have a question, ask for help on StackOverflow
I have a list of synonyms:
a={one typical|only one|one single|one sole|merely one|just one|one unitary|one small|this solitary|this slight}
ask={question|inquire of|seek information from|put a question to|demand|request|expect|inquire|query|interrogate}
I'm using JavaScript to:
Split synonyms based on =
Loop through every synonym; if it is found in the content, replace it with {...|...}
The output should look like:
If you have {one typical|only one|one single|one sole|merely one|just one|one unitary|one small|this solitary|this slight} question, {question|inquire of|seek information from|put a question to|demand|request|expect|inquire|query|interrogate} for help on StackOverflow
Problem:
Instead of replacing the entire word, it's replacing every character found. My code:
for(syn in allSyn) {
var rtnSyn = allSyn[syn].split("=");
var word = rtnSyn[0];
var synonym = (rtnSyn[1]).trim();
if(word && synonym){
var match = new RegExp(word, "ig");
postProcessContent = preProcessContent.replace(match, synonym);
preProcessContent = postProcessContent;
}
}
It should replace content word with synonym which should not be in {...|...}.
When you build the regexps, you need to include word boundary anchors at both the beginning and the end to match whole words (beginning and ending with characters from [a-zA-Z0-9_]) only:
var match = new RegExp("\\b" + word + "\\b", "ig");
Depending on the specific replacements you are making, you might want to apply your method to individual words (rather than to the entire text at once) matched using a regexp like /\w+/g to avoid replacing words that themselves are the replacements for others. Something like:
content = content.replace(/\w+/g, function(word) {
  for (var i = 0, L = allSyn.length; i < L; ++i) {
    var rtnSyn = allSyn[i].split("="); // index with i, the loop counter
    var synonym = (rtnSyn[1] || "").trim();
    if (synonym && rtnSyn[0].toLowerCase() == word.toLowerCase()) return synonym;
  }
  return word; // no synonym found: keep the original word
});
Regular expressions include something called a "word-boundary", represented by \b. It is a zero-width assertion (it just checks something, it doesn't "eat" input) that says in order to match, certain word boundary conditions have to apply. One example is a space followed by a letter; given the string ' X', this regex would match it: / \bX/. So to make your code work, you just have to add word boundaries to the beginning and end of your word regex, like this:
for(syn in allSyn) {
var rtnSyn = allSyn[syn].split("=");
var word = rtnSyn[0];
var synonym = (rtnSyn[1]).trim();
if(word && synonym){
var match = new RegExp("\\b"+word+"\\b", "ig");
postProcessContent = preProcessContent.replace(match, synonym);
preProcessContent = postProcessContent;
}
}
[Note that there are two backslashes in each of the word boundary matchers because in JavaScript strings the backslash introduces escape sequences, so two backslashes turn into one literal backslash.]
For optimization, don't create a new RegExp on each iteration. Instead, build up one big regex like [^{A-Za-z](a|ask|...)[^}A-Za-z] and a hash with a value for each key specifying what to replace it with. I'm not familiar enough with JavaScript to create the code on the fly.
Note the separator regex which says the match cannot begin with { or end with }. This is not terribly precise, but hopefully acceptable in practice. If you genuinely need to replace words next to { or } then this can certainly be refined, but I'm hoping we won't have to.
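The single-regex idea can be sketched in JavaScript roughly as follows. Note that this sketch swaps the [^{A-Za-z] ... [^}A-Za-z] separators for \b word boundaries and relies on the fact that a single replace pass never rescans text it has just inserted; it does not protect {...} groups already present in the content. `escapeRegExp` and `buildReplacer` are hypothetical names, and the synonym lists are shortened for illustration.

```javascript
// Hypothetical helper: escape regex metacharacters in a word.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Builds one regex and one lookup table from lines of the form "word={a|b|c}".
function buildReplacer(allSyn) {
  var table = {};
  allSyn.forEach(function (line) {
    var eq = line.indexOf('=');
    if (eq < 1) return; // skip malformed lines
    var word = line.slice(0, eq).trim().toLowerCase();
    var synonym = line.slice(eq + 1).trim();
    if (word && synonym) table[word] = synonym;
  });
  var words = Object.keys(table).map(escapeRegExp);
  if (words.length === 0) return function (content) { return content; };
  var pattern = new RegExp('\\b(?:' + words.join('|') + ')\\b', 'gi');
  return function (content) {
    // The callback's return value is inserted as-is and is not scanned again.
    return content.replace(pattern, function (found) {
      return table[found.toLowerCase()];
    });
  };
}

var replaceSynonyms = buildReplacer([
  'a={one typical|only one}',
  'ask={question|inquire of}'
]);
console.log(replaceSynonyms('If you have a question, ask for help on StackOverflow'));
// If you have {one typical|only one} question, {question|inquire of} for help on StackOverflow
```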
