jQuery autocomplete RegExp for highlight words [duplicate] - javascript

I have this function that finds whole words and should replace them. It identifies the surrounding spaces but should not replace them, i.e. it should not consume them.
function asd (sentence, word) {
var str = sentence.replace(new RegExp('(?:^|\\s)' + word + '(?:$|\\s)'), "*****");
return str;
};
Then I have the following strings:
var sentence = "ich mag Äpfel";
var word = "Äpfel";
The result should be something like:
"ich mag *****"
and NOT:
"ich mag*****"
I'm getting the latter.
How can I make it so that it identifies the space but ignores it when replacing the word?
This may look like a duplicate at first, but I did not find an answer to this question, which is why I'm asking it.
Thank you

Put the matched whitespace back: use a capturing group (rather than a non-capturing one) and refer to it with a backreference in the replacement pattern. You can also use a lookahead for the right-hand whitespace boundary. Because a lookahead does not consume the whitespace, this helps with consecutive matches when the regex has the g flag:
function asd (sentence, word) {
var str = sentence.replace(new RegExp('(^|\\s)' + word + '(?=$|\\s)'), "$1*****");
return str;
};
var sentence = "ich mag Äpfel";
var word = "Äpfel";
console.log(asd(sentence, word));
Details
(^|\s) - Group 1 (referred to later with the $1 placeholder in the replacement pattern): a capturing group that matches either the start of the string or a whitespace character
Äpfel - the search word
(?=$|\s) - a positive lookahead that requires the end of string or whitespace immediately to the right of the current location.
NOTE: If the word can contain special regex metacharacters, escape them:
function asd (sentence, word) {
// Escape regex metacharacters so the word is matched literally
var escaped = word.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
var str = sentence.replace(new RegExp('(^|\\s)' + escaped + '(?=$|\\s)'), "$1*****");
return str;
};
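The lookahead only pays off once every occurrence is replaced, i.e. when the regex has the g flag. A minimal sketch comparing the two kinds of right-hand boundary (maskAll, maskAllConsuming and escapeRegExp are hypothetical names, not from the question):

```javascript
// Escape regex metacharacters so the word is matched literally
function escapeRegExp(word) {
  return word.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
}

// Lookahead version: the trailing space is checked but not consumed,
// so it is still available as the leading (^|\s) of the next match.
function maskAll(sentence, word) {
  var re = new RegExp("(^|\\s)" + escapeRegExp(word) + "(?=$|\\s)", "g");
  return sentence.replace(re, "$1*****");
}

// Consuming version: group 2 swallows the trailing space,
// so an immediately following occurrence has no space left to match.
function maskAllConsuming(sentence, word) {
  var re = new RegExp("(^|\\s)" + escapeRegExp(word) + "($|\\s)", "g");
  return sentence.replace(re, "$1*****$2");
}

console.log(maskAll("Äpfel Äpfel", "Äpfel"));          // ***** *****
console.log(maskAllConsuming("Äpfel Äpfel", "Äpfel")); // ***** Äpfel
```

With the consuming version, the space after the first Äpfel is used up by group 2, so the second Äpfel has no preceding whitespace for (^|\s) to match.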

Related

Regex match apostrophe inside, but not around words, inside a character set

I'm counting how many times different words appear in a text using regular expressions in JavaScript. My problem is quoted words: 'word' should be counted simply as word (without the quotes, otherwise they'll behave as two different words), while it's should be counted as a whole word.
(?<=\w)(')(?=\w)
This regex can identify apostrophes inside, but not around, words. The problem is that I can't use it inside a character set such as [\w]+.
(?<=\w)(')(?=\w)|[\w]+
It will count it's a 'miracle' of nature as 7 words instead of 5 (it, ' and s become 3 different words). Also, the third word should be selected simply as miracle, not as 'miracle'.
To make things even more complicated, I need to capture diacritics too, so I'm using [A-Za-zÀ-ÖØ-öø-ÿ] instead of \w.
How can I accomplish that?
1) You can simply use the /[^\s]+/g regex. This counts 5 words, but keeps the quotes around 'miracle':
const str = `it's a 'miracle' of nature`;
const result = str.match(/[^\s]+/g);
console.log(result.length);
console.log(result);
2) If you only need the total number of words in a string, you can also use split:
const str = `it's a 'miracle' of nature`;
const result = str.split(/\s+/);
console.log(result.length);
console.log(result);
3) If you want each word without a quote at the start and at the end, you can strip them after matching:
const str = `it's a 'miracle' of nature`;
const result = str.match(/[^\s]+/g).map((s) => {
s = s[0] === "'" ? s.slice(1) : s;
s = s[s.length - 1] === "'" ? s.slice(0, -1) : s;
return s;
});
console.log(result.length);
console.log(result);
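Option 3 can also be written with a single anchored replace per token. This is a sketch (wordsWithoutOuterQuotes is a hypothetical name); like option 3 it removes at most one quote at each end of a token, but it returns an empty array instead of throwing when the string contains no words:

```javascript
// Sketch: option 3 rewritten with one anchored replace per token.
// /^'|'$/g removes at most one quote at the start and one at the end.
function wordsWithoutOuterQuotes(str) {
  // match() returns null when there are no tokens, so fall back to []
  return (str.match(/\S+/g) || []).map(function (s) {
    return s.replace(/^'|'$/g, "");
  });
}

const words = wordsWithoutOuterQuotes("it's a 'miracle' of nature");
console.log(words.length);
console.log(words);
```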
You might use an alternation with 2 capture groups, and then check for the values of those groups.
(?<!\S)'(\S+)'(?!\S)|(\S+)
(?<!\S)' - negative lookbehind asserting a whitespace boundary to the left, then match '
(\S+) - capture group 1: one or more non-whitespace chars
'(?!\S) - match ' and assert a whitespace boundary to the right
| - or
(\S+) - capture group 2: one or more non-whitespace chars
const regex = /(?<!\S)'(\S+)'(?!\S)|(\S+)/g;
const s = "it's a 'miracle' of nature";
for (const m of s.matchAll(regex)) {
// Group 1 holds a quoted word without its quotes, group 2 any other token
console.log(m[1] ?? m[2]);
}
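For the original goal (counting words with internal apostrophes and diacritics), a different technique from the lookaround answers above is to describe a word positively: a run of letters from the asker's class, optionally joined to further runs by a single apostrophe. A sketch, assuming only the straight apostrophe (') is used; countWords and LETTERS are hypothetical names:

```javascript
// Letters from the asker's class: ASCII plus Latin-1 accented letters.
const LETTERS = "A-Za-zÀ-ÖØ-öø-ÿ";

// A word is a run of letters, optionally followed by more runs joined by a
// single apostrophe, so "it's" is one word and the quotes around 'miracle'
// are never part of a match.
const WORD_RE = new RegExp(`[${LETTERS}]+(?:'[${LETTERS}]+)*`, "g");

// Count how often each word occurs.
function countWords(text) {
  const counts = new Map();
  for (const [word] of text.matchAll(WORD_RE)) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

const counts = countWords("it's a 'miracle' of nature, a 'miracle' indeed");
console.log(counts);
```

Unlike the \S+ approaches, this drops punctuation such as commas entirely, and a typographic apostrophe (’) would have to be added to the pattern to be treated as a joiner.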

Split string by all spaces except those in parentheses

I'm trying to split the following line on spaces:
var line = "Text (what is)|what's a story|fable called|named|about {Search}|{Title}"
but I want it to ignore the spaces within parentheses. This should produce an array with:
var words = ["Text", "(what is)|what's", "a", "story|fable", "called|named|about", "{Search}|{Title}"];
I know this should involve some sort of regex with line.match(). Bonus points if the regex removes the parentheses. I know that word.replace() would get rid of them in a subsequent step.
Use the following approach with a specific regex pattern (based on negative lookahead assertions):
var line = "Text (what is)|what's a story|fable called|named|about {Search}|{Title}",
words = line.split(/(?!\(.*)\s(?![^(]*?\))/g);
console.log(words);
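(?!\(.*) is a lookahead, not a lookbehind: it tests the whitespace character itself, which can never be (, so it always succeeds and does no real work here
(?![^(]*?\)) ensures that the separator \s is not followed by a closing ) without an opening ( in between, i.e. that the space is not inside parentheses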
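Not a single regexp: this removes the parentheses and then splits the text on every space. Note that it also splits on the space that was inside the parentheses, so "what is" ends up in two different elements.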
var words = line.replace(/[\(\)]/g,'').split(" ");
One approach which is useful in some cases is to replace the spaces inside parentheses with a placeholder, then split, then restore the spaces:
var line = "Text (what is)|what's a story|fable called|named|about {Search}|{Title}";
var result = line.replace(/\((.*?)\)/g, m => m.replace(/ /g, 'SPACE')) // every space inside (...), not just the first
.split(' ')
.map(x => x.replace(/SPACE/g, ' '));
console.log(result);
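Since the question mentions line.match(), here is a sketch of that route, using a different technique from the answers above: instead of describing the separator, it describes the tokens, treating a whole "( ... )" group as one unit. It assumes parentheses are not nested; splitOutsideParens and stripParens are hypothetical names:

```javascript
// Match runs of non-space characters, where a whole "( ... )" group counts as
// one unit, so spaces inside parentheses never end a token.
function splitOutsideParens(line) {
  // match() returns null for an empty or all-space string
  return line.match(/(?:\([^)]*\)|\S)+/g) || [];
}

// Optional "bonus" step from the question: drop the parentheses afterwards.
function stripParens(words) {
  return words.map(w => w.replace(/[()]/g, ""));
}

const line = "Text (what is)|what's a story|fable called|named|about {Search}|{Title}";
const words = splitOutsideParens(line);
console.log(words);
console.log(stripParens(words));
```

If a ( is never closed, the first alternative fails and that ( is simply matched by \S, so the regex degrades to a plain whitespace split for that part.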

JavaScript regex to replace a whole word

I have a variable:
var str = "#devtest11 #devtest1";
I use this code to replace #devtest1 with another string:
str.replace(new RegExp('#devtest1', 'g'), "aaaa")
However, its result (aaaa1 aaaa) is not what I expect. The expected result is: #devtest11 aaaa. I just want to replace the whole word #devtest1.
How can I do that?
Use the \b zero-width word-boundary assertion.
var str = "#devtest11 #devtest1";
str.replace(/#devtest1\b/g, "aaaa");
// => #devtest11 aaaa
If you also need to prevent matching cases like hello#devtest1, you can do this:
var str = "#devtest1 #devtest11 #devtest1 hello#devtest1";
str.replace(/( |^)#devtest1\b/g, "$1aaaa");
// => aaaa #devtest11 aaaa hello#devtest1
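Use the word boundary \b to limit the match to whole words.
Because # is not a word character, \b cannot mark the start of #devtest1, so match # outside the boundary and put \b only after the word.
\b asserts a position at a word boundary (^\w|\w$|\W\w|\w\W); it only considers the word characters [A-Za-z0-9_], so special characters such as # never count as part of a word.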
var str = "#devtest11 #devtest1";
str = str.replace(/#devtest1\b/g, "aaaa");
document.write(str);
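If the word always starts with # and you want to keep the whitespace in front of it, capture that whitespace and put it back with $1:
var str = "#devtest11 #devtest1";
str = str.replace(/(\s*)#devtest1\b/g, "$1aaaa");
// (\s*) captures any leading whitespace, and "$1" puts it back in the replacement
document.write(str);
Note that \s* can also match zero characters, so this still replaces hello#devtest1; use (\s|^) instead to require whitespace or the start of the string.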
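\b won't work properly if the words are surrounded by non-space characters. I suggest the method below, which passes a regex literal (not a string, which replace would match literally) and checks for whitespace or the ends of the string instead:
var output = str.replace(/(\s|^)#devtest1(?=\s|$)/g, '$1aaaa');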
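The same idea as the last answer, written as a reusable helper (replaceWholeWord is a hypothetical name). It uses lookbehind, which assumes an engine with ES2018 lookbehind support, so no whitespace has to be captured and put back. It also shows why \b fails for words that start with a non-ASCII letter such as Ä:

```javascript
// Replace every whole occurrence of `word`, where "whole" means delimited by
// whitespace or the ends of the string (not by \b, which only knows [A-Za-z0-9_]).
function replaceWholeWord(str, word, replacement) {
  // Escape regex metacharacters so "#devtest1" or "a.b" are matched literally
  const escaped = word.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
  // (?<!\S) = no non-space char before; (?!\S) = no non-space char after
  const re = new RegExp("(?<!\\S)" + escaped + "(?!\\S)", "g");
  // A replacer function keeps "$" in `replacement` from being treated as a pattern
  return str.replace(re, () => replacement);
}

console.log(replaceWholeWord("#devtest11 #devtest1 hello#devtest1", "#devtest1", "aaaa"));
// → #devtest11 aaaa hello#devtest1
console.log(/\bÄpfel\b/.test("ich mag Äpfel")); // false: Ä is not a \w character
console.log(replaceWholeWord("ich mag Äpfel", "Äpfel", "*****")); // → ich mag *****
```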

Javascript - How to join two capitalize first letter of word scripts

I have an Acrobat form with some multiline text fields. My goal is to convert the first letter of every sentence (detected by looking for dots) to uppercase, as well as the first letter of every new line (after Return has been pressed).
I can run each transformation separately, but do not know how to run them together.
To capitalize sentences, I use the following code as a custom validation script:
// make an array split at dot
var aInput = event.value.split(". ");
var sCharacter = '';
var sWord='';
// for each element of word array, capitalize the first letter
for (var i = 0; i < aInput.length; i++)
{
aInput[i] = aInput[i].substr(0, 1).toUpperCase() + aInput[i].substr(1).toLowerCase();
}
// rebuild input string with modified words with dots
event.value = aInput.join('. ');
To capitalize new lines, I use the same code with ". " replaced by "\r".
Thanks in advance for any help.
You can capitalize the first character of each sentence with a regular expression:
event.value = event.value.replace(/.+?[\.\?\!](\s|$)/g, function (txt) {
return txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase();
});
Demo: http://jsfiddle.net/00kzc370/
The regular expression explained:
/.+?[\.\?\!](\s|$)/g is a regular expression literal.
.+?[\.\?\!](\s|$) is the pattern: it lazily matches a sentence ending in ., ? or ! that is followed by a whitespace character or the end of the string. Because . does not match line breaks, text that is not followed by ., ? or ! before the next line break is left unchanged.
g is a modifier that performs a global match (find all matches rather than stopping after the first one).
Source: http://www.w3schools.com/jsref/jsref_obj_regexp.asp
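To run both transformations at once, one option is a single replace whose prefix group covers the start of the text, sentence ends and line breaks. This is a sketch that has not been tested inside Acrobat: capitalizeSentencesAndLines is a hypothetical name, and the code sticks to ES5 syntax on the assumption that Acrobat's JavaScript engine may not support newer features:

```javascript
// Hypothetical helper combining both rules from the question:
// lower-case the text, then upper-case the first non-space character
//   - at the start of the text (after optional leading whitespace),
//   - after ".", "?" or "!" followed by whitespace (this includes "\r"),
//   - after a line break ("\r" or "\n"), even if the previous line has no dot.
function capitalizeSentencesAndLines(text) {
  return text
    .toLowerCase()
    .replace(/(^\s*|[.?!]\s+|[\r\n]\s*)(\S)/g, function (match, prefix, first) {
      return prefix + first.toUpperCase();
    });
}

// In an Acrobat custom validation script this would be used as:
//   event.value = capitalizeSentencesAndLines(event.value);
console.log(capitalizeSentencesAndLines("hello world. this is it!\rnew line here"));
```

Like the question's code, this lower-cases everything else, so proper nouns and acronyms lose their capitals.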

Filter non-alpha-numeric and make titlecase

I have a list of names that I need to turn into alphanumeric usernames. I would like to take each name, remove any non-alphanumeric characters, and convert the result to title case, capitalizing the first letter and each letter that followed removed characters. For example:
johnson -> Johnson
Van Halen -> VanHalen
Torres-hernandez -> TorresHernandez
Rafael van der vaart -> RafaelVanDerVaart
Can this be done with a regular expression?
Using some string manipulation, you can do this fairly simply.
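var name = "Torres-hernandez", parts, i, part, out = "";
parts = name.split(/[^a-z0-9]+/gi);
for (i = 0; i < parts.length; i++) {
part = parts[i];
// skip the empty pieces produced by leading or trailing separators
if (!part) continue;
out += part[0].toUpperCase() + part.slice(1).toLowerCase();
}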
var names = [
'johnson',
'Van Halen',
'Torres-hernandez',
'Rafael van der vaart'
]
for (var i = 0; i < names.length; i++) {
names[i] = names[i].replace(/(\W|^)(\w)/g, function(match) {
return match.substr(-1).toUpperCase();
});
}
console.log(names);
prints
[ 'Johnson', 'VanHalen', 'TorresHernandez', 'RafaelVanDerVaart' ]
You can do it with a simple regexp:
var titleCase = function(s) {
return s.toLowerCase().replace(/(?:^|\W)+(\w|$)/g, function(match, tail) {
return tail.toUpperCase();
});
};
The regular expression /(?:^|\W)+(\w|$)/g matches substrings running from the end of the previous word to the first letter of the next one, which should be capitalized.
The whole match is replaced with the upper-cased captured character, tail.
If your string ends with bad characters (e.g. whitespace), they are matched too, but in that case tail is an empty string, so they are simply removed:
' toRReS $##%^! heRnAndeZ -++--=-=' -> 'TorresHernandez'
Let's examine my regexp:
(^|\W)+ - a sequence (...)+ of non-alphanumeric characters \W or the start of the string ^, which may be followed by any number of non-alphanumeric characters. It contains at least one character unless it's at the start of the string, in which case it may be empty.
(?:^|\W)+ - the same thing, but it isn't captured because of ?:. We don't really care about this part and just want to strip it.
(\w|$) - a single alphanumeric character \w or the end of the string $. This part is captured and passed to the replacement function as tail.
Update: If regular expressions confuse you, you can do the same thing with string and array operations:
var titleCase = function(str) {
return str.split(/\W+/g)
.filter(function(s) {
return s.length > 0;
}).map(function(s) {
return s[0].toUpperCase() + s.slice(1).toLowerCase();
}).join('');
};
This solution was inspired by FakeRainBrigand's answer and is very similar to it. The difference is that my version uses array operations instead of a for loop, and uses filter to handle strings with bad characters at the beginning or at the end.
I used the \w and \W character classes in my regular expressions; they are equivalent to [A-Za-z0-9_] and [^A-Za-z0-9_] respectively (see the JavaScript regular expression docs). If you don't want _ to count as an alphanumeric character, replace \w and \W with explicit character sets (e.g. [A-Za-z0-9] and [^A-Za-z0-9]).
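The substitution suggested in the last paragraph can be sketched as follows (titleCaseAlnum is a hypothetical name). Because the input is lower-cased first, [a-z0-9] is enough for the letter class. With \w, "van_halen" stays "Van_halen"; with the explicit classes, _ is treated as a separator:

```javascript
// Variant of titleCase with explicit ASCII classes instead of \w / \W,
// so "_" counts as a separator rather than as part of a word.
function titleCaseAlnum(s) {
  return s.toLowerCase().replace(/(?:^|[^a-z0-9])+([a-z0-9]|$)/g, function (match, tail) {
    // Drop the separators and upper-case the first character after them
    return tail.toUpperCase();
  });
}

console.log(titleCaseAlnum("van_halen"));            // VanHalen
console.log(titleCaseAlnum("Rafael van der vaart")); // RafaelVanDerVaart
```

Letters outside a-z, such as é, are also treated as separators and dropped, so "josé" becomes "Jos".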
