Regexp search result highlight with accent

Can someone help me improve my search, please? I am trying to highlight several words when a user types one or more terms in the input. I am using this function:
checkHighlightList(originalStr, queries) {
const regexp = new RegExp(queries.join('|'), 'gi');
const matchs = originalStr.match(regexp);
if (matchs) {
const result = originalStr.replace(regexp, match => `<span class="highlight">${ match }</span>`);
return result;
}
}
The problem is that if I have the word "pokémon" and I type "kemon", it doesn't work, because accented characters differ from their base letters (ô !== o). I would like to type "ke" or "ké" in my input and highlight the "ké" part of "pokémon". I use French words, which contain a lot of accents.
Thank you

To match accented text with unaccented search terms, you can define an accent map and compose a regex from it:
const accentMap = {
ae: '(ae|æ|ǽ|ǣ)',
a: '(a|á|ă|ắ|ặ|ằ|ẳ|ẵ|ǎ|â|ấ|ậ|ầ|ẩ|ẫ|ä|ǟ|ȧ|ǡ|ạ|ȁ|à|ả|ȃ|ā|ą|ᶏ|ẚ|å|ǻ|ḁ|ⱥ|ã)',
c: '(c|ć|č|ç|ḉ|ĉ|ɕ|ċ|ƈ|ȼ)',
e: '(e|é|ĕ|ě|ȩ|ḝ|ê|ế|ệ|ề|ể|ễ|ḙ|ë|ė|ẹ|ȅ|è|ẻ|ȇ|ē|ḗ|ḕ|ⱸ|ę|ᶒ|ɇ|ẽ|ḛ)',
i: '(i|í|ĭ|ǐ|î|ï|ḯ|ị|ȉ|ì|ỉ|ȋ|ī|į|ᶖ|ɨ|ĩ|ḭ)',
n: '(n|ń|ň|ņ|ṋ|ȵ|ṅ|ṇ|ǹ|ɲ|ṉ|ƞ|ᵰ|ᶇ|ɳ|ñ)',
o: '(o|ó|ŏ|ǒ|ô|ố|ộ|ồ|ổ|ỗ|ö|ȫ|ȯ|ȱ|ọ|ő|ȍ|ò|ỏ|ơ|ớ|ợ|ờ|ở|ỡ|ȏ|ō|ṓ|ṑ|ǫ|ǭ|ø|ǿ|õ|ṍ|ṏ|ȭ)',
u: '(u|ú|ŭ|ǔ|û|ṷ|ü|ǘ|ǚ|ǜ|ǖ|ṳ|ụ|ű|ȕ|ù|ủ|ư|ứ|ự|ừ|ử|ữ|ȗ|ū|ṻ|ų|ᶙ|ů|ũ|ṹ|ṵ)'
};
function escapeRegExp(string) {
return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}
function checkHighlightList(str, queries) {
const accentRegex = new RegExp(Object.keys(accentMap).join('|'), 'g');
const queryRegex = new RegExp(queries.map(q => {
return escapeRegExp(q).toLowerCase().replace(accentRegex, m => {
return accentMap[m] || m;
});
}).join('|'), 'gi');
return str.replace(queryRegex, m => `<span class="highlight">${ m }</span>`);
}
let source = 'Pokémon & Crème Brulée';
let result = checkHighlightList(source, [ 'kemon', 'creme' ]);
console.log('source:\n "' + source + '"');
console.log('result:\n "' + result + '"');
Output for search terms [ 'kemon', 'creme' ]:
source:
"Pokémon & Crème Brulée"
result:
"Po<span class="highlight">kémon</span> & <span class="highlight">Crème</span> Brulée"
Explanation of accentRegex:
it is an OR regex of all keys of accentMap:
example: /ae|a|c|e|i|n|o|u/g
tweak the map as needed for additional accent chars
Explanation of queryRegex:
it is an OR regex of all query terms, where each key in accentMap gets mapped to an OR regex of all accented version of that key
example: query term cafe results in this (shortened) regex: /c(a|á|ă|ắ|ặ|ǎ|â|ậ|ä|ȧ|à|)f(e|é|ĕ|ě|ệ|ë|ė|è|ẽ)/gi
Note: Since the query terms are user specified and used in a regex, we need to escape the regex symbols in the user input, hence the use of function escapeRegExp().
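A different route, offered only as a sketch and not as part of the answer above: instead of maintaining an accent map by hand, decompose the text with String.prototype.normalize('NFD') so that each accented letter becomes a base letter followed by combining marks, and let every query letter be followed by \p{M}* (any number of combining marks). The name highlightIgnoringAccents is made up for illustration, and the approach assumes an engine with Unicode property escapes (ES2018). Ligatures such as æ are not decomposed by NFD, so they would still need a map like the one above.

```javascript
// Hypothetical helper: highlight query terms regardless of accents.
function highlightIgnoringAccents(text, queries) {
  const escapeRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Remove combining marks from a string ("ké" -> "ke").
  const stripMarks = s => s.normalize('NFD').replace(/\p{M}/gu, '');

  const pattern = queries
    .filter(q => q.length > 0) // an empty alternative would match everywhere
    .map(q => Array.from(stripMarks(q))
      // each base letter may be followed by any number of combining marks
      .map(ch => escapeRegExp(ch) + '\\p{M}*')
      .join(''))
    .join('|');
  if (!pattern) return text;

  const regex = new RegExp(pattern, 'giu');
  return text
    .normalize('NFD') // "é" -> "e" + combining acute accent
    .replace(regex, m => `<span class="highlight">${m}</span>`)
    .normalize('NFC'); // recompose the accented letters
}

const sample = 'Pokémon & Crème Brulée';
console.log(highlightIgnoringAccents(sample, ['kemon', 'creme']));
// Po<span class="highlight">kémon</span> & <span class="highlight">Crème</span> Brulée
```

Because the query is stripped of marks as well, typing "ke" or "ké" highlights the same "ké" in "pokémon".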

Related

How to get the characters / words that lie on both sides, between which our selected character fall?

Let's consider a string str which is defined as :
const str = " 'I am going' - 'I' "
and a function calc() which can be used as :
console.log( calc(str) ) // => am going
So, I decided to make the calc() using regex ! So here is what I thought about.
const calc = (str) => {
const reg = // Not understanding how to get the strings between which '-' falls
str = str.replace(reg, function(_, a) {
const b = remove(a[0], a[1])
return b
})
return str
}
remove() is a function I wrote for removing words from a string. Feel free to modify my code if anything is incorrect; it is only an example of how I imagined it. Please help me complete my function!

You could split the string at - and get the content within '' using match. Then create a dynamic regex using the RegExp constructor and remove every instance of the second match from the first match:
function calc(str) {
const [first, second] = str.split(/\s*-\s*/)
.map(s => s.match(/'([^']+)'/)[1])
return first.replace(new RegExp(second, "g"), '')
}
console.log(calc("'I am going' - 'I'"))
console.log(calc("'Remove this from this string' - 'this'"))
If matchAll is supported in your environment, you could also:
const [first, second] = Array.from(str.matchAll(/'([^']+)'/g), ([,m]) => m)
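One caveat with building a regex from the second operand is that characters such as . or + would be interpreted as regex syntax. The sketch below is a variation on the answer above, not the author's code: it escapes the operand first and tidies up leftover whitespace. The helper name escapeRegExp and the whitespace clean-up are assumptions added for illustration.

```javascript
// Escape characters that have a special meaning in a regular expression.
const escapeRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

function calc(str) {
  // Collect the contents of every '...' pair; the "-" between them is ignored.
  const parts = Array.from(str.matchAll(/'([^']*)'/g), m => m[1]);
  if (parts.length !== 2) {
    throw new Error("expected exactly two quoted operands");
  }
  const [first, second] = parts;
  if (second === '') return first.trim(); // nothing to remove
  return first
    .replace(new RegExp(escapeRegExp(second), 'g'), '')
    .replace(/\s+/g, ' ') // collapse the gaps left behind
    .trim();
}

console.log(calc("'I am going' - 'I'"));                         // am going
console.log(calc("'Remove this from this string' - 'this'"));    // Remove from string
console.log(calc("'a.b a+b' - '.'"));                            // ab a+b
```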

Javascript Regular expression with replace

I'm working on a project that searches for emoji and replaces them with icons,
but I have a problem with my regular expression. My code is below for reference:
var f = ["( :3 )" , "( :P )","\(:star:\)"];
var re = function(s){return new RegExp(s, 'g');};
Now, when I search for emoji and replace them as shown below:
s = "hello :D how are you :P dwdwd";
for(var n in f){
var m;
if ((m = re(f[n]).exec(s)) !== null) {
m.forEach((match, groupIndex) => {
s = s.replace(match,"<img src='http://abs.twimg.com/emoji/v1/72x72/"+ r[n] +".png'>");
});
}}
In this case it works well and replaces the emoji, but only when there is a space before and after the emoji. What should I do to replace an emoji at the beginning or end of the string?
s = ":D hello how are you :)";
This case does not work. How can I edit my regular expression so that it replaces an emoji at the beginning or end of the string, as well as one in the middle of the string with spaces between the words and the emoji?
My second problem is that "\(:star:\)" is never replaced. It should replace :star: with an emoji, but I think I am missing something in the regular expression.

You can use beginning & ending anchors along with pipe to achieve this. For example:
/(^:3\s)|(\s:3\s)|(\s:3$)/g
^ is an anchor that pins :3\s to the beginning of the string.
$ is an anchor that pins \s:3 to the end of the string.
\s matches a whitespace character.
| is the alternation operator, which acts as a logical OR between the different capture groups.
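The three alternatives can also be folded into one pattern. The following is a small sketch of that idea, with a made-up placeholder "[cat]" standing in for the icon markup: a group matches either the start of the string or one whitespace character, and a lookahead checks for whitespace or the end of the string without consuming it.

```javascript
// Replace ":3" only when it stands alone: at the start, at the end,
// or surrounded by whitespace.
const standalone = /(^|\s):3(?=\s|$)/g;

const input = ':3 hello :3 how are you:3 :3';
// $1 puts back the whitespace (or nothing) that preceded the emoji.
const output = input.replace(standalone, '$1[cat]');

console.log(output); // [cat] hello [cat] how are you:3 [cat]
```

The ":3" glued to "you" is left alone because it is not preceded by whitespace.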

Just remove the spaces from your emoji regex.
var f = ["(:3)", "(:P)", "\(:star:\)"];
var r = ["[sadface]", "[tongueface]", "[staremoji]"];
var re = function(s) {
return new RegExp(s, 'g');
};
s = "hello :3 how are you :P dwdwd :star: :3";
console.log(s);
// Replace every occurrence of each emoji; the 'g' flag makes replace() global
for (var n = 0; n < f.length; n++) {
s = s.replace(re(f[n]), r[n]);
}
console.log(s);

var content = "hello :D how are you :P dwdwd";
content = content.replace(/((:D|:P))/g,function(match){
var result = "";
var index = -1;
switch(match)
{
case ":D":
result = "happy";
index = 0
break;
case ":P":
result = "smilie";
index = 1
break;
}
if(index != -1)
{
return "<img src='http://abs.twimg.com/emoji/v1/72x72/"+index+".png'>";
}
return result;
});
console.log(content);
Please try this.

I created a more generic solution, starting with the mapping of emojis to the relevant names. Rather than two lists that need to be kept in synch, I used a single object:
const emojis = {
'(c)': 'a9',
'(r)': 'ae',
'(tm)': '2122'
//, ...
};
This strikes me as a much more useful structure to work with, but the code below could easily be altered to deal with the two-lists version.
Then I use a helper function to escape characters that are not allowed as plain text in a regular expression by prepending them with \:
const escapeSpecials = (() => {
const specials = ['/', '.', '*', '+', '?', '|', '(', ')', '[', ']', '{', '}', '\\'];
const reg = new RegExp('(\\' + specials.join('|\\') + ')', 'g');
return str => str.replace(reg, '\\$1');
})();
Then I have the key function:
const replaceStringsWith = (emojis, convert) => str => Object.keys(emojis).reduce(
(str, em) => str.replace(new RegExp(`(^|\\s+)(${escapeSpecials(em)})($|\\s+)`, 'g'),
(m, a, b, c) => `${a}${convert(emojis[b], b)}${c}`),
str
);
This takes an object containing string/replacement pairs and a converter function that accepts the replacement and returns its final form. It returns a function that takes a string and searches for any of the object's keys (matched only when surrounded by whitespace or string boundaries), replacing each with the result of calling the converter on the object's value for that key.
Thus we can do:
const toUrl = (name) => `<img src='http://abs.twimg.com/emoji/v1/72x72/${name}.png'>`;
const replaceEmojis = replaceStringsWith(emojis, toUrl)
and call it as
const s = "This is Copyright (c) 2017, FooBar is (tm) BazCo (r)";
replaceEmojis(s); //=>
// `This is Copyright <img src='http://abs.twimg.com/emoji/v1/72x72/a9.png'>
// 2017, FooBar is <img src='http://abs.twimg.com/emoji/v1/72x72/2122.png'>
// BazCo <img src='http://abs.twimg.com/emoji/v1/72x72/ae.png'>`
Note that the converter also takes a second parameter. So you could instead use
const toUrl = (name, emoji) =>
`<img src='http://abs.twimg.com/emoji/v1/72x72/${name}.png' title='${emoji}'>`;
to get
//=> `This is Copyright <img
// src='http://abs.twimg.com/emoji/v1/72x72/a9.png' title='(c)'>
// 2017, FooBar is <img src='http://abs.twimg.com/emoji/v1/72x72/2122.png'
// title='(tm)'> BazCo <img src='http://abs.twimg.com/emoji/v1/72x72/ae.png' title='(r)'>"
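One thing to watch for with the pattern above: the trailing ($|\s+) group consumes the whitespace after a match, so in "(c) (c)" the second emoji no longer has whitespace in front of it and is skipped. The escape list also leaves out ^ and $. The sketch below is a variation, not the author's code, that builds a single alternation and checks the right-hand side with a lookahead instead. The names makeReplacer and toTag are hypothetical.

```javascript
const emojis = { '(c)': 'a9', '(r)': 'ae', '(tm)': '2122' };

const escapeRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Build one regex for all keys; longer keys go first so they win the alternation.
const makeReplacer = (map, convert) => {
  const keys = Object.keys(map)
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp);
  // (?=\s|$) checks what follows without consuming it,
  // so adjacent emojis separated by a single space are all replaced.
  const regex = new RegExp(`(^|\\s)(${keys.join('|')})(?=\\s|$)`, 'g');
  return str => str.replace(regex, (m, before, key) => before + convert(map[key], key));
};

const toTag = name => `[${name}]`; // stand-in for the <img> markup
const replaceAll = makeReplacer(emojis, toTag);

console.log(replaceAll('(c) (c) x(c) (tm)')); // [a9] [a9] x(c) [2122]
```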

Get first letter of each word in a string, in JavaScript

How would you go about collecting the first letter of each word in a string, for example to build an abbreviation?
Input: "Java Script Object Notation"
Output: "JSON"

I think what you're looking for is the acronym of a supplied string.
var str = "Java Script Object Notation";
var matches = str.match(/\b(\w)/g); // ['J','S','O','N']
var acronym = matches.join(''); // JSON
console.log(acronym)
Note: this will fail for words containing hyphens or apostrophes: "Help-me I'm Dying" becomes "HmImD". If that's not what you want, splitting on whitespace and taking the first letter of each word may be the better approach.
Here's a quick example of that:
let str = "Java Script Object Notation";
let acronym = str.split(/\s/).reduce((response, word) => response + word.slice(0, 1), '')
console.log(acronym);
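The split-based approach can be made a little more forgiving of messy input. This is a minimal sketch under a few assumptions that are not in the answer above: words are separated by any run of whitespace, empty input yields an empty string, and the initials are uppercased. The function name acronymOf is made up.

```javascript
function acronymOf(phrase) {
  return phrase
    .trim()
    .split(/\s+/)     // any run of spaces, tabs or newlines separates words
    .filter(Boolean)  // "" (from empty input) is dropped
    .map(word => word[0].toUpperCase())
    .join('');
}

console.log(acronymOf('  java   script object\tnotation ')); // JSON
console.log(acronymOf("Help-me I'm Dying"));                 // HID
console.log(acronymOf(''));                                  // (empty string)
```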

I think you can do this with
'Aa Bb'.match(/\b\w/g).join('')
Explanation: Obtain all /g the alphanumeric characters \w that occur after a non-alphanumeric character (i.e: after a word boundary \b), put them on an array with .match() and join everything in a single string .join('')
Depending on what you want to do you can also consider simply selecting all the uppercase characters:
'JavaScript Object Notation'.match(/[A-Z]/g).join('')

Easiest way without regex
var abbr = "Java Script Object Notation".split(' ').map(function(item){return item[0]}).join('');

This is very simple with ES6:
string.split(' ').map(i => i.charAt(0)).join('') // Keep the case of each letter
string.split(' ').map(i => i.charAt(0)).join('').toUpperCase() // Uppercase each letter
string.split(' ').map(i => i.charAt(0)).join('').toLowerCase() // Lowercase each letter
This ONLY works with spaces, or whatever separator is passed to .split(), e.g. .split(', '), .split('; '), etc.
To get an array of uppercase initials instead of a string:
string.split(' ').map(i => i.charAt(0)).toString().toUpperCase().split(',')

To add to the great examples, you could do it like this in ES6
const x = "Java Script Object Notation".split(' ').map(x => x[0]).join('');
console.log(x); // JSON
and this works too but please ignore it, I went a bit nuts here :-)
const [j,s,o,n] = "Java Script Object Notation".split(' ').map(x => x[0]);
console.log(`${j}${s}${o}${n}`);

#BotNet flaw:
I think I solved it after three excruciating days of regular expression tutorials. Given the input
==> I'm a an animal
the word-boundary version also catches the m of I'm. Requiring whitespace or the start of the string before the letter seems to work for me:
/(\s|^)([a-z])/gi

Try -
var text = '';
var arr = "Java Script Object Notation".split(' ');
for (var i = 0; i < arr.length; i++) {
text += arr[i].substr(0,1)
}
alert(text);
Demo - http://jsfiddle.net/r2maQ/

Using map (from functional programming)
'use strict';
function acronym(words)
{
if (!words) { return ''; }
var first_letter = function(x){ if (x) { return x[0]; } else { return ''; }};
return words.split(' ').map(first_letter).join('');
}

Alternative 1:
you can also use this regex to return an array of the first letter of every word
/(?<=(\s|^))[a-z]/gi
(?<=(\s|^)) is a positive lookbehind, which makes sure the element in our search pattern is preceded by (\s|^).
so, for your case:
// in case the input is lowercase & there's a word with apostrophe
const toAbbr = (str) => {
return str.match(/(?<=(\s|^))[a-z]/gi)
.join('')
.toUpperCase();
};
toAbbr("java script object notation"); //result JSON
(by the way, there are also negative lookbehind, positive lookahead, negative lookahead, if you want to learn more)
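For readers who have not met the other three assertions, here is a short side-by-side sketch. The sample strings are invented for illustration, and lookbehind requires an engine that supports ES2018.

```javascript
// Positive lookahead (?=...): digits that are followed by "USD".
const posAhead = '42USD 17EUR'.match(/\d+(?=USD)/g);

// Negative lookahead (?!...): words starting with "foo" not followed by "bar".
const negAhead = 'foobar foobaz'.match(/foo(?!bar)\w+/g);

// Positive lookbehind (?<=...): digits that are preceded by "$".
const posBehind = 'cost: $42, qty: 7'.match(/(?<=\$)\d+/g);

// Negative lookbehind (?<!...): digits not preceded by "$" or another digit.
const negBehind = 'cost: $42, qty: 7'.match(/(?<![\d$])\d+/g);

console.log(posAhead);  // [ '42' ]
console.log(negAhead);  // [ 'foobaz' ]
console.log(posBehind); // [ '42' ]
console.log(negBehind); // [ '7' ]
```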
Alternative 2:
Match every word and use the replace() method to replace it with its first letter, dropping the spaces (the method does not mutate your original string):
// in case the input is lowercase & there's a word with apostrophe
const toAbbr = (str) => {
return str.replace(/(\S+)(\s*)/gi, (match, p1, p2) => p1[0].toUpperCase());
};
toAbbr("java script object notation"); //result JSON
// word = not space = \S+ = p1 (p1 is the first pattern)
// space = \s* = p2 (p2 is the second pattern)

It's important to trim the word before splitting it, otherwise, we'd lose some letters.
const getWordInitials = (word: string): string => {
const bits = word.trim().split(' ');
return bits
.map((bit) => bit.charAt(0))
.join('')
.toUpperCase();
};
$ getWordInitials("Java Script Object Notation")
$ "JSON"

How about this:
var str = "", abbr = "";
str = "Java Script Object Notation";
str = str.split(' ');
for (var i = 0; i < str.length; i++) {
abbr += str[i].substr(0,1);
}
alert(abbr);
Working Example.

If you came here looking for a way to do this that supports non-BMP characters (which use surrogate pairs):
const initials = str.split(' ')
.filter(s => s.length > 0) // skip empty strings left by consecutive spaces
.map(s => String.fromCodePoint(s.codePointAt(0)).toUpperCase())
.join('');
Works in all modern browsers with no polyfills (but not in IE).

Getting first letter of any Unicode word in JavaScript is now easy with the ECMAScript 2018 standard:
/(?<!\p{L}\p{M}*)\p{L}/gu
This regex finds any Unicode letter (see the last \p{L}) that is not preceded with any other letter that can optionally have diacritic symbols (see the (?<!\p{L}\p{M}*) negative lookbehind where \p{M}* matches 0 or more diacritic chars). Note that u flag is compulsory here for the Unicode property classes (like \p{L}) to work correctly.
To emulate a fully Unicode-aware \b, you'd need to add a digit matching pattern and connector punctuation:
/(?<!\p{L}\p{M}*|[\p{N}\p{Pc}])\p{L}/gu
It works in Chrome, Firefox (since June 30, 2020), Node.js, and the majority of other environments (see the compatibility matrix here), for any natural language including Arabic.
Quick test:
const regex = /(?<!\p{L}\p{M}*)\p{L}/gu;
const string = "Żerard Łyżwiński";
// Extracting
console.log(string.match(regex)); // => [ "Ż", "Ł" ]
// Extracting and concatenating into string
console.log(string.match(regex).join("")) // => ŻŁ
// Removing
console.log(string.replace(regex, "")) // => erard yżwiński
// Enclosing (wrapping) with a tag
console.log(string.replace(regex, "<span>$&</span>")) // => <span>Ż</span>erard <span>Ł</span>yżwiński
console.log("_Łukasz 1Żukowski".match(/(?<!\p{L}\p{M}*|[\p{N}\p{Pc}])\p{L}/gu)); // => null
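Since String.prototype.match returns null when nothing matches, chaining .join() directly can throw on input without letters. A small wrapper, sketched here with the made-up name unicodeInitials, guards against that while reusing the first regex from this answer.

```javascript
// A letter that is not preceded by another letter (plus optional combining marks).
const FIRST_LETTERS = /(?<!\p{L}\p{M}*)\p{L}/gu;

function unicodeInitials(text) {
  const letters = text.match(FIRST_LETTERS); // null when there is no match
  return letters ? letters.join('') : '';
}

console.log(unicodeInitials('\u017Berard \u0141y\u017Cwi\u0144ski')); // ŻŁ
console.log(unicodeInitials('123 !!') === '');                        // true
```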

In ES6:
function getFirstCharacters(str) {
let result = [];
str.split(' ').map(word => word.charAt(0) != '' ? result.push(word.charAt(0)) : '');
return result;
}
const str1 = "Hello4 World65 123 !!";
const str2 = "123and 456 and 78-1";
const str3 = " Hello World !!";
console.log(getFirstCharacters(str1));
console.log(getFirstCharacters(str2));
console.log(getFirstCharacters(str3));
Output:
[ 'H', 'W', '1', '!' ]
[ '1', '4', 'a', '7' ]
[ 'H', 'W', '!' ]

This should do it.
var s = "Java Script Object Notation",
a = s.split(' '),
l = a.length,
i = 0,
n = "";
for (; i < l; ++i)
{
n += a[i].charAt(0);
}
console.log(n);

JavaScript regular expressions are not Unicode-aware before ECMAScript 6, so those who need to support characters such as "å" in older environments will have to rely on the non-regex versions.
Even in ES6, you need to enable Unicode mode with the u flag.
More details: https://mathiasbynens.be/notes/es6-unicode-regex
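A few one-liners illustrate what the u flag changes and what it does not. This is only a sketch: the emoji and the letter are written as escapes so the snippet does not depend on file encoding, and \p{L} needs ES2018 property escapes.

```javascript
const face = '\u{1F600}';   // one emoji, stored as a surrogate pair
const aRing = '\u00e5';     // the letter "å"

console.log(face.length);               // 2 (UTF-16 code units)
console.log(/^.$/.test(face));          // false: "." sees two code units
console.log(/^.$/u.test(face));         // true: "." sees one code point
console.log(/^\w$/u.test(aRing));       // false: \w stays ASCII-only
console.log(/^\p{L}$/u.test(aRing));    // true: \p{L} matches any letter
```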

Yet another option using reduce function:
var value = "Java Script Object Notation";
var result = value.split(' ').reduce(function(previous, current){
return {v : previous.v + current[0]};
},{v:""});
$("#output").text(result.v);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<pre id="output"/>

This is similar to others, but (IMHO) a tad easier to read:
const getAcronym = title =>
title.split(' ')
.map(word => word[0])
.join('');

ES6 reduce way:
const initials = inputStr.split(' ').reduce((result, currentWord) =>
result + currentWord.charAt(0).toUpperCase(), '');
alert(initials);

Try This Function
const createUserName = function (name) {
const username = name
.toLowerCase()
.split(' ')
.map((elem) => elem[0])
.join('');
return username;
};
console.log(createUserName('Anisul Haque Bhuiyan'));

Convert camelCaseText to Title Case Text

How can I convert a string either like 'helloThere' or 'HelloThere' to 'Hello There' in JavaScript?

const text = 'helloThereMister';
const result = text.replace(/([A-Z])/g, " $1");
const finalResult = result.charAt(0).toUpperCase() + result.slice(1);
console.log(finalResult);
This capitalizes the first letter, as an example. Note the space in " $1".
Of course, if the first letter is already a capital, you will end up with a leading space to remove.

Alternatively using lodash:
lodash.startCase(str);
Example:
_.startCase('helloThere');
// ➜ 'Hello There'
Lodash is a fine library that provides shortcuts for many everyday JS tasks. It has many other similar string manipulation functions, such as camelCase, kebabCase, etc.

I had a similar problem and dealt with it like this:
stringValue.replace(/([A-Z]+)*([A-Z][a-z])/g, "$1 $2")
For a more robust solution:
stringValue.replace(/([A-Z]+)/g, " $1").replace(/([A-Z][a-z])/g, " $1")
http://jsfiddle.net/PeYYQ/
Input:
helloThere
HelloThere
ILoveTheUSA
iLoveTheUSA
Output:
hello There
Hello There
I Love The USA
i Love The USA

Example without side effects.
function camel2title(camelCase) {
// no side-effects
return camelCase
// inject space before the upper case letters
.replace(/([A-Z])/g, function(match) {
return " " + match;
})
// replace first char with upper case
.replace(/^./, function(match) {
return match.toUpperCase();
});
}
In ES6
const camel2title = (camelCase) => camelCase
.replace(/([A-Z])/g, (match) => ` ${match}`)
.replace(/^./, (match) => match.toUpperCase())
.trim();

The best string I've found for testing camel-case-to-title-case functions is this ridiculously nonsensical example, which tests a lot of edge cases. To the best of my knowledge, none of the previously posted functions handle this correctly:
__ToGetYourGEDInTimeASongAboutThe26ABCsIsOfTheEssenceButAPersonalIDCardForUser_456InRoom26AContainingABC26TimesIsNotAsEasyAs123ForC3POOrR2D2Or2R2D
This should be converted to:
To Get Your GED In Time A Song About The 26 ABCs Is Of The Essence But A Personal ID Card For User 456 In Room 26A Containing ABC 26 Times Is Not As Easy As 123 For C3PO Or R2D2 Or 2R2D
If you want just a simple function that handles cases like the one above (and more cases than many of the previous answers), here's the one I wrote. This code isn't particularly elegant or fast, but it's simple, understandable, and works.
The snippet below contains an online runnable example:
var mystrings = [ "__ToGetYourGEDInTimeASongAboutThe26ABCsIsOfTheEssenceButAPersonalIDCardForUser_456InRoom26AContainingABC26TimesIsNotAsEasyAs123ForC3POOrR2D2Or2R2D", "helloThere", "HelloThere", "ILoveTheUSA", "iLoveTheUSA", "DBHostCountry", "SetSlot123ToInput456", "ILoveTheUSANetworkInTheUSA", "Limit_IOC_Duration", "_This_is_a_Test_of_Network123_in_12__days_", "ASongAboutTheABCsIsFunToSing", "CFDs", "DBSettings", "IWouldLove1Apple", "Employee22IsCool", "SubIDIn", "ConfigureABCsImmediately", "UseMainNameOnBehalfOfSubNameInOrders" ];
// Take a single camel case string and convert it to a string of separate words (with spaces) at the camel-case boundaries.
//
// E.g.:
// __ToGetYourGEDInTimeASongAboutThe26ABCsIsOfTheEssenceButAPersonalIDCardForUser_456InRoom26AContainingABC26TimesIsNotAsEasyAs123ForC3POOrR2D2Or2R2D
// --> To Get Your GED In Time A Song About The 26 ABCs Is Of The Essence But A Personal ID Card For User 456 In Room 26A Containing ABC 26 Times Is Not As Easy As 123 For C3PO Or R2D2 Or 2R2D
// helloThere --> Hello There
// HelloThere --> Hello There
// ILoveTheUSA --> I Love The USA
// iLoveTheUSA --> I Love The USA
// DBHostCountry --> DB Host Country
// SetSlot123ToInput456 --> Set Slot 123 To Input 456
// ILoveTheUSANetworkInTheUSA --> I Love The USA Network In The USA
// Limit_IOC_Duration --> Limit IOC Duration
// This_is_a_Test_of_Network123_in_12_days --> This Is A Test Of Network 123 In 12 Days
// ASongAboutTheABCsIsFunToSing --> A Song About The ABCs Is Fun To Sing
// CFDs --> CFDs
// DBSettings --> DB Settings
// IWouldLove1Apple --> I Would Love 1 Apple
// Employee22IsCool --> Employee 22 Is Cool
// SubIDIn --> Sub ID In
// ConfigureCFDsImmediately --> Configure CFDs Immediately
// UseTakerLoginForOnBehalfOfSubIDInOrders --> Use Taker Login For On Behalf Of Sub ID In Orders
//
function camelCaseToTitleCase(in_camelCaseString) {
var result = in_camelCaseString // "__ToGetYourGEDInTimeASongAboutThe26ABCsIsOfTheEssenceButAPersonalIDCardForUser_456InRoom26AContainingABC26TimesIsNotAsEasyAs123ForC3POOrR2D2Or2R2D"
.replace(/(_)+/g, ' ') // " ToGetYourGEDInTimeASongAboutThe26ABCsIsOfTheEssenceButAPersonalIDCardForUser 456InRoom26AContainingABC26TimesIsNotAsEasyAs123ForC3POOrR2D2Or2R2D"
.replace(/([a-z])([A-Z][a-z])/g, "$1 $2") // " To Get YourGEDIn TimeASong About The26ABCs IsOf The Essence ButAPersonalIDCard For User456In Room26AContainingABC26Times IsNot AsEasy As123ForC3POOrR2D2Or2R2D"
.replace(/([A-Z][a-z])([A-Z])/g, "$1 $2") // " To Get YourGEDIn TimeASong About The26ABCs Is Of The Essence ButAPersonalIDCard For User456In Room26AContainingABC26Times Is Not As Easy As123ForC3POOr R2D2Or2R2D"
.replace(/([a-z])([A-Z]+[a-z])/g, "$1 $2") // " To Get Your GEDIn Time ASong About The26ABCs Is Of The Essence But APersonal IDCard For User456In Room26AContainingABC26Times Is Not As Easy As123ForC3POOr R2D2Or2R2D"
.replace(/([A-Z]+)([A-Z][a-z][a-z])/g, "$1 $2") // " To Get Your GEDIn Time A Song About The26ABCs Is Of The Essence But A Personal ID Card For User456In Room26A ContainingABC26Times Is Not As Easy As123ForC3POOr R2D2Or2R2D"
.replace(/([a-z]+)([A-Z0-9]+)/g, "$1 $2") // " To Get Your GEDIn Time A Song About The 26ABCs Is Of The Essence But A Personal ID Card For User 456In Room 26A Containing ABC26Times Is Not As Easy As 123For C3POOr R2D2Or 2R2D"
// Note: the next regex includes a special case to exclude plurals of acronyms, e.g. "ABCs"
.replace(/([A-Z]+)([A-Z][a-rt-z][a-z]*)/g, "$1 $2") // " To Get Your GED In Time A Song About The 26ABCs Is Of The Essence But A Personal ID Card For User 456In Room 26A Containing ABC26Times Is Not As Easy As 123For C3PO Or R2D2Or 2R2D"
.replace(/([0-9])([A-Z][a-z]+)/g, "$1 $2") // " To Get Your GED In Time A Song About The 26ABCs Is Of The Essence But A Personal ID Card For User 456In Room 26A Containing ABC 26Times Is Not As Easy As 123For C3PO Or R2D2Or 2R2D"
// Note: the next two regexes use {2,} instead of + to add space on phrases like Room26A and 26ABCs but not on phrases like R2D2 and C3PO"
.replace(/([A-Z]{2,})([0-9]{2,})/g, "$1 $2") // " To Get Your GED In Time A Song About The 26ABCs Is Of The Essence But A Personal ID Card For User 456 In Room 26A Containing ABC 26 Times Is Not As Easy As 123 For C3PO Or R2D2 Or 2R2D"
.replace(/([0-9]{2,})([A-Z]{2,})/g, "$1 $2") // " To Get Your GED In Time A Song About The 26 ABCs Is Of The Essence But A Personal ID Card For User 456 In Room 26A Containing ABC 26 Times Is Not As Easy As 123 For C3PO Or R2D2 Or 2R2D"
.trim() // "To Get Your GED In Time A Song About The 26 ABCs Is Of The Essence But A Personal ID Card For User 456 In Room 26A Containing ABC 26 Times Is Not As Easy As 123 For C3PO Or R2D2 Or 2R2D"
;
// capitalize the first letter
return result.charAt(0).toUpperCase() + result.slice(1);
}
for (var i = 0; i < mystrings.length; i++) {
jQuery(document.body).append("<br />\"");
jQuery(document.body).append(camelCaseToTitleCase(mystrings[i]));
jQuery(document.body).append("\"<br>(was: \"");
jQuery(document.body).append(mystrings[i]);
jQuery(document.body).append("\") <br />");
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.2.3/jquery.min.js"></script>

Based on one of the examples above I came up with this:
const camelToTitle = (camelCase) => camelCase
.replace(/([A-Z])/g, (match) => ` ${match}`)
.replace(/^./, (match) => match.toUpperCase())
.trim()
It works for me because it uses .trim() to handle the edge case where the first letter is capitalized and you end up with an extra leading space.
Reference:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/Trim

Ok, I'm a few years late to the game, but I had a similar question, and I wanted to make a one-replace solution for every possible input. I must give most of the credit to #ZenMaster in this thread and #Benjamin Udink ten Cate in this thread.
Here's the code:
var camelEdges = /([A-Z](?=[A-Z][a-z])|[^A-Z](?=[A-Z])|[a-zA-Z](?=[^a-zA-Z]))/g;
var textArray = ["lowercase",
"Class",
"MyClass",
"HTML",
"PDFLoader",
"AString",
"SimpleXMLParser",
"GL11Version",
"99Bottles",
"May5",
"BFG9000"];
var text;
var resultArray = [];
for (var i = 0; i < textArray.length; i++){
text = textArray[i];
text = text.replace(camelEdges,'$1 ');
text = text.charAt(0).toUpperCase() + text.slice(1);
resultArray.push(text);
}
It has three clauses, all using lookahead to prevent the regex engine from consuming too many characters:
[A-Z](?=[A-Z][a-z]) looks for a capital letter that is followed by a capital then a lowercase. This is to end acronyms like USA.
[^A-Z](?=[A-Z]) looks for a non-capital-letter followed by a capital letter. This ends words like myWord and symbols like 99Bottles.
[a-zA-Z](?=[^a-zA-Z]) looks for a letter followed by a non-letter. This ends words before symbols like BFG9000.
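To see the three clauses at work, the loop above can be wrapped in a function and run over the same test words. The function name camelEdgesToTitle is made up for this sketch; the regex is the one from this answer.

```javascript
const camelEdges = /([A-Z](?=[A-Z][a-z])|[^A-Z](?=[A-Z])|[a-zA-Z](?=[^a-zA-Z]))/g;

function camelEdgesToTitle(text) {
  const spaced = text.replace(camelEdges, '$1 '); // add a space after each edge
  return spaced.charAt(0).toUpperCase() + spaced.slice(1);
}

const words = ['lowercase', 'Class', 'MyClass', 'HTML', 'PDFLoader', 'AString',
  'SimpleXMLParser', 'GL11Version', '99Bottles', 'May5', 'BFG9000'];

console.log(words.map(camelEdgesToTitle));
// [ 'Lowercase', 'Class', 'My Class', 'HTML', 'PDF Loader', 'A String',
//   'Simple XML Parser', 'GL 11 Version', '99 Bottles', 'May 5', 'BFG 9000' ]
```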
This question was at the top of my search results, so hopefully I can save others some time!

Here's my version. It adds a space before every uppercase English letter that comes after a lowercase English letter, and also capitalizes the first letter if needed:
For example:
thisIsCamelCase --> This Is Camel Case
this IsCamelCase --> This Is Camel Case
thisIsCamelCase123 --> This Is Camel Case123
function camelCaseToTitleCase(camelCase){
if (camelCase == null || camelCase == "") {
return camelCase;
}
camelCase = camelCase.trim();
var newText = "";
for (var i = 0; i < camelCase.length; i++) {
if (/[A-Z]/.test(camelCase[i])
&& i != 0
&& /[a-z]/.test(camelCase[i-1])) {
newText += " ";
}
if (i == 0 && /[a-z]/.test(camelCase[i]))
{
newText += camelCase[i].toUpperCase();
} else {
newText += camelCase[i];
}
}
return newText;
}

This implementation takes consecutive uppercase letters and numbers into consideration.
function camelToTitleCase(str) {
return str
.replace(/[0-9]{2,}/g, match => ` ${match} `)
.replace(/[^A-Z0-9][A-Z]/g, match => `${match[0]} ${match[1]}`)
.replace(/[A-Z][A-Z][^A-Z0-9]/g, match => `${match[0]} ${match[1]}${match[2]}`)
.replace(/[ ]{2,}/g, match => ' ')
.replace(/\s./g, match => match.toUpperCase())
.replace(/^./, match => match.toUpperCase())
.trim();
}
// ----------------------------------------------------- //
var testSet = [
'camelCase',
'camelTOPCase',
'aP2PConnection',
'superSimpleExample',
'aGoodIPAddress',
'goodNumber90text',
'bad132Number90text',
];
testSet.forEach(function(item) {
console.log(item, '->', camelToTitleCase(item));
});
Expected output:
camelCase -> Camel Case
camelTOPCase -> Camel TOP Case
aP2PConnection -> A P2P Connection
superSimpleExample -> Super Simple Example
aGoodIPAddress -> A Good IP Address
goodNumber90text -> Good Number 90 Text
bad132Number90text -> Bad 132 Number 90 Text

You can use a function like this:
function fixStr(str) {
var out = str.replace(/^\s*/, ""); // strip leading spaces
out = out.replace(/^[a-z]|[^\s][A-Z]/g, function(str, offset) {
if (offset == 0) {
return(str.toUpperCase());
} else {
return(str.substr(0,1) + " " + str.substr(1).toUpperCase());
}
});
return(out);
}
"hello World" ==> "Hello World"
"HelloWorld" ==> "Hello World"
"FunInTheSun" ==> "Fun In The Sun"
Code with a bunch of test strings here: http://jsfiddle.net/jfriend00/FWLuV/.
Alternate version that keeps leading spaces here: http://jsfiddle.net/jfriend00/Uy2ac/.

One more solution based on RegEx.
function respace(str) {
const regex = /([A-Z])(?=[A-Z][a-z])|([a-z])(?=[A-Z])/g;
return str.replace(regex, '$& ');
}
Explanation
The above RegEx consists of two similar parts separated by the OR operator. The first half:
([A-Z]) - matches uppercase letters...
(?=[A-Z][a-z]) - followed by a sequence of uppercase and lowercase letters.
When applied to sequence FOo, this effectively matches its F letter.
Or the second scenario:
([a-z]) - matches lowercase letters...
(?=[A-Z]) - followed by an uppercase letter.
When applied to sequence barFoo, this effectively matches its r letter.
When all replace candidates were found, the last thing to do is to replace them with the same letter but with an additional space character. For this we can use '$& ' as a replacement, and it will resolve to a matched substring followed by a space character.
Example
const regex = /([A-Z])(?=[A-Z][a-z])|([a-z])(?=[A-Z])/g
const testWords = ['ACoolExample', 'fooBar', 'INAndOUT', 'QWERTY', 'fooBBar']
testWords.map(w => w.replace(regex, '$& '))
->(5) ["A Cool Example", "foo Bar", "IN And OUT", "QWERTY", "foo B Bar"]

If you deal with Capital Camel Case, this snippet can help. It also comes with some specs, so you can be sure it behaves appropriately for your case.
export const fromCamelCaseToSentence = (word) =>
word
.replace(/([A-Z][a-z]+)/g, ' $1')
.replace(/([A-Z]{2,})/g, ' $1')
.replace(/\s{2,}/g, ' ')
.trim();
And specs:
describe('fromCamelCaseToSentence', () => {
test('does not fail with a single word', () => {
expect(fromCamelCaseToSentence('Approved')).toContain('Approved')
expect(fromCamelCaseToSentence('MDA')).toContain('MDA')
})
test('does not fail with an empty string', () => {
expect(fromCamelCaseToSentence('')).toContain('')
})
test('returns the separated by space words', () => {
expect(fromCamelCaseToSentence('NotApprovedStatus')).toContain('Not Approved Status')
expect(fromCamelCaseToSentence('GDBState')).toContain('GDB State')
expect(fromCamelCaseToSentence('StatusDGG')).toContain('Status DGG')
})
})

My split case solution which behaves the way I want:
const splitCase = s => !s || s.indexOf(' ') >= 0 ? s :
(s.charAt(0).toUpperCase() + s.substring(1))
.split(/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/g)
.map(x => x.replace(/([0-9]+)/g,'$1 '))
.join(' ')
Input
'a,abc,TheId,TheID,TheIDWord,TheID2Word,Leave me Alone!'
.split(',').map(splitCase)
.forEach(x => console.log(x))
Output
A
Abc
The Id
The ID
The ID Word
The ID2 Word
Leave me Alone!
Because the function above requires lookbehind, which Safari did not support at the time of writing, I've rewritten the implementation below without RegEx:
const isUpper = c => c >= 'A' && c <= 'Z'
const isDigit = c => c >= '0' && c <= '9'
const upperOrDigit = c => isUpper(c) || isDigit(c)
function splitCase(s) {
let to = []
if (typeof s != 'string') return to
let lastSplit = 0
for (let i=0; i<s.length; i++) {
let c = s[i]
let prev = i>0 ? s[i-1] : null
let next = i+1 < s.length ? s[i+1] : null
if (upperOrDigit(c) && (!upperOrDigit(prev) || !upperOrDigit(next))) {
to.push(s.substring(lastSplit, i))
lastSplit = i
}
}
to.push(s.substring(lastSplit, s.length))
return to.filter(x => !!x)
}

try this library
http://sugarjs.com/api/String/titleize
'man from the boondocks'.titleize() // => "Man from the Boondocks"
'x-men: the last stand'.titleize() // => "X Men: The Last Stand"
'TheManWithoutAPast'.titleize() // => "The Man Without a Past"
'raiders_of_the_lost_ark'.titleize() // => "Raiders of the Lost Ark"

Using JS's String.prototype.replace() and String.prototype.toUpperCase()
const str = "thisIsATestString";
const res = str.replace(/^[a-z]|[A-Z]/g, (c, i) => (i? " " : "") + c.toUpperCase());
console.log(res); // "This Is A Test String"

The most compatible answer for consecutive capital-case words is this:
const text = 'theKD';
const result = text.replace(/([A-Z]{1,})/g, " $1");
const finalResult = result.charAt(0).toUpperCase() + result.slice(1);
console.log(finalResult);
It keeps a run of capitals together, so theKD becomes The KD rather than The K D.

None of the answers above worked perfectly for me, so I had to come up with my own bicycle:
function camelCaseToTitle(camelCase) {
if (!camelCase) {
return '';
}
var pascalCase = camelCase.charAt(0).toUpperCase() + camelCase.substr(1);
return pascalCase
.replace(/([a-z])([A-Z])/g, '$1 $2')
.replace(/([A-Z])([A-Z][a-z])/g, '$1 $2')
.replace(/([a-z])([0-9])/gi, '$1 $2')
.replace(/([0-9])([a-z])/gi, '$1 $2');
}
Test cases:
null => ''
'' => ''
'simpleString' => 'Simple String'
'stringWithABBREVIATIONInside' => 'String With ABBREVIATION Inside'
'stringWithNumber123' => 'String With Number 123'
'complexExampleWith123ABBR890Etc' => 'Complex Example With 123 ABBR 890 Etc'

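This works for me, check it out:
CamelcaseToWord("MyName"); // returns "My Name"
function CamelcaseToWord(string){
// The two replaces can each add a space before the same capital,
// so collapse repeated spaces and trim the leading one.
return string.replace(/([A-Z]+)/g, " $1").replace(/([A-Z][a-z])/g, " $1").replace(/\s+/g, " ").trim();
}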

I didn't try everyone's answer, but the few solutions I tinkered with did not match all of my requirements.
I was able to come up with something that did...
export const jsObjToCSSString = (o={}) =>
Object.keys(o)
.map(key => ({ key, value: o[key] }))
.map(({key, value}) =>
({
key: key.replace( /([A-Z])/g, "-$1").toLowerCase(),
value
})
)
.reduce(
(css, {key, value}) =>
`${css} ${key}: ${value}; `.trim(),
'')

I think this can be done just with the reg exp /([a-z]|[A-Z]+)([A-Z])/g and replacement "$1 $2".
ILoveTheUSADope -> I Love The USA Dope
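A minimal sketch of that idea. The helper name splitCamel is hypothetical (it is not part of the answer); only the regex and the replacement string come from the text above.

```javascript
// Hypothetical helper wrapping the regex from the answer above.
// Group 1 is either one lowercase letter or a run of capitals; group 2 is the
// capital that starts the next word. A space is inserted between the two.
const splitCamel = (s) => s.replace(/([a-z]|[A-Z]+)([A-Z])/g, '$1 $2');

console.log(splitCamel('ILoveTheUSADope')); // "I Love The USA Dope"
console.log(splitCamel('myName'));          // "my Name"
```

Note that this only inserts spaces; it does not capitalize the first letter, so a lower camel case input keeps its lowercase first word.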

The following shows a camel case string converted to sentence text using a regex.
Input
myCamelCaseSTRINGToSPLITDemo
Output
my Camel Case STRING To SPLIT Demo
This is the regex for converting camel case to sentence text
(?=[A-Z][a-z])|([A-Z]+)([A-Z][a-rt-z][a-z]*)
with $1 $2 as substitution.

Input
javaScript
Output
Java Script
var text = 'javaScript';
var spaced = text.replace(/([a-z])([A-Z][a-z])/g, "$1 $2");
var result = spaced.charAt(0).toUpperCase() + spaced.slice(1);
console.log(result); // "Java Script"

HTTPRequest_ToServer-AndWaiting --> HTTP Request To Server And Waiting
function toSpaceCase(str) {
return str
.replace(/[-_]/g, ' ')
/*
* insert a space between lower & upper
* HttpRequest => Http Request
*/
.replace(/([a-z])([A-Z])/g, '$1 $2')
/*
* space before last upper in a sequence followed by lower
* XMLHttp => XML Http
*/
.replace(/\b([A-Z]+)([A-Z])([a-z])/g, '$1 $2$3')
// uppercase the first character
.replace(/^./, str => str.toUpperCase())
.replace(/\s+/g, ' ')
.trim();
}
const input = 'HTTPRequest_ToServer-AndWaiting';
const result = toSpaceCase(input);
console.log(input,'-->', result)

Undercover C programmer here. If, like me, you want to preserve acronyms and don't want to look at cryptic patterns, then perhaps you'll like this:
function isUpperCase (str) {
return str === str.toUpperCase()
}
export function camelCaseToTitle (str) {
for (let i = str.length - 1; i > 0; i--) {
if (!isUpperCase(str[i - 1]) && isUpperCase(str[i])) {
str = str.slice(0, i) + ' ' + str.slice(i)
}
}
return str.charAt(0).toUpperCase() + str.slice(1)
}

This solution works also for other Unicode characters which are not in the [A-Z] range. E.g. Ä, Ö, Å.
let camelCaseToTitleCase = (s) => (
s.split("").reduce(
(acc, letter, i) => (
i === 0
? [...acc, letter.toUpperCase()]
: letter === letter.toUpperCase()
? [...acc, " ", letter]
: [...acc, letter]
), []
).join("")
)
const myString = "ArchipelagoOfÅland"
camelCaseToTitleCase(myString)
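As an alternative to the reduce approach above (not the answer's own method), a regex with Unicode property escapes can also handle letters outside [A-Z]. This is only a sketch: the name unicodeCamelToTitle is hypothetical, and it assumes an engine that supports the `u` flag with `\p{...}` classes.

```javascript
// Alternative sketch using Unicode property escapes (requires the `u` flag).
// \p{Ll} matches any lowercase letter, \p{Lu} any uppercase letter.
const unicodeCamelToTitle = (s) => {
  // Insert a space wherever a lowercase letter is directly followed by an uppercase one.
  const spaced = s.replace(/(\p{Ll})(\p{Lu})/gu, '$1 $2');
  // Uppercase the first character.
  return spaced.charAt(0).toUpperCase() + spaced.slice(1);
};

// \u00C5 is the precomposed character "Å".
console.log(unicodeCamelToTitle('archipelagoOf\u00C5land')); // "Archipelago Of Åland"
```

Unlike the reduce version, this splits only at a lowercase-to-uppercase boundary, so a run of capitals stays together.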

Adding yet another ES6 solution that I liked better after not being happy with a few thoughts above.
https://codepen.io/902Labs/pen/mxdxRv?editors=0010#0
const camelize = (str) => str
.split(' ')
.map(([first, ...theRest]) => (
`${first.toUpperCase()}${theRest.join('').toLowerCase()}`)
)
.join(' ');

Javascript: highlight substring keeping original case but searching in case insensitive mode

I'm trying to write a "suggestion search box" and I cannot find a solution that allows to highlight a substring with javascript keeping the original case.
For example if I search for "ca" I search server side in a case insensitive mode and I have the following results:
Calculator
calendar
ESCAPE
I would like to highlight the search string in each of these words, so the result should be:
<b>Ca</b>lculator
<b>ca</b>lendar
ES<b>CA</b>PE
I tried with the following code:
var reg = new RegExp(querystr, 'gi');
var final_str = 'foo ' + result.replace(reg, '<b>'+querystr+'</b>');
$('#'+id).html(final_str);
But obviously this way I lose the original case!
Is there a way to solve this problem?

Use a function for the second argument for .replace() that returns the actual matched string with the concatenated tags.
Try it out: http://jsfiddle.net/4sGLL/
reg = new RegExp(querystr, 'gi');
// The str parameter references the matched string
// --------------------------------------v
final_str = 'foo ' + result.replace(reg, function(str) {return '<b>'+str+'</b>'});
$('#' + id).html(final_str);
JSFiddle Example with Input: https://jsfiddle.net/pawmbude/

ES6 version
const highlight = (needle, haystack) =>
haystack.replace(
new RegExp(needle, 'gi'),
(str) => `<strong>${str}</strong>`
);
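One caveat with passing the needle straight to RegExp is that characters such as + or ( are read as regex syntax. The sketch below escapes the needle first. The name highlightSafe is hypothetical, and the empty-needle guard is an added assumption about the desired behaviour.

```javascript
// Escape characters that have a special meaning inside a RegExp.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical variant of the highlight function above.
// An empty needle returns the haystack unchanged (assumed behaviour).
const highlightSafe = (needle, haystack) =>
  needle
    ? haystack.replace(
        new RegExp(escapeRegExp(needle), 'gi'),
        (match) => `<strong>${match}</strong>`
      )
    : haystack;

console.log(highlightSafe('c++', 'I like C++ and c++'));
// "I like <strong>C++</strong> and <strong>c++</strong>"
```

This still does not HTML-escape the haystack, so it is only suitable for text that is already safe to insert as markup.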

I got nice results with:
function str_highlight_text(string, str_to_highlight){
var reg = new RegExp(str_to_highlight, 'gi');
return string.replace(reg, function(str) {return '<span style="background-color:#ffbf00;color:#fff;"><b>'+str+'</b></span>'});
}
It's also easier to remember.
Thanks to user113716: https://stackoverflow.com/a/3294644/2065594

While the other answers so far seem simple, they can't really be used in many real-world cases because they handle neither HTML escaping of the text nor RegExp escaping of the search term. If you want to highlight every match while escaping the text properly, a function like this returns all the nodes you should add to your suggestion box:
function highlightLabel(label, term) {
if (!term) return [ document.createTextNode(label) ]
// No 'g' flag here: with 'g', match() returns no index and the loop below would never advance
const regex = new RegExp(term.replace(/[\\^$*+?.()|[\]{}]/g, '\\$&'), 'i')
const result = []
let left, match, right = label
while (match = right.match(regex)) {
const m = match[0], hl = document.createElement('b'), i = match.index
hl.innerText = m
left = right.slice(0, i)
right = right.slice(i + m.length)
result.push(document.createTextNode(left), hl)
if (!right.length) return result
}
result.push(document.createTextNode(right))
return result
}

string.replace fails in the general case. If you use .innerHTML, replace can replace matches in tags (like a tags). If you use .innerText or .textContent, it will remove any tags there were previously in the html. More than that, in both cases it damages your html if you want to remove the highlighting.
The true answer is mark.js (https://markjs.io/). I just found this - it is what I have been searching for for such a long time. It does just what you want it to.
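To illustrate the first failure mode described above, here is a small sketch that runs a naive string replace over an HTML string. The name naiveHighlight and the sample markup are made up for the example.

```javascript
// Naive highlighter applied to an HTML string (the approach the answer warns about).
const naiveHighlight = (html, term) =>
  html.replace(new RegExp(term, 'gi'), (match) => `<b>${match}</b>`);

const html = '<a href="/span">span docs</a>';
console.log(naiveHighlight(html, 'span'));
// The match inside the href attribute is wrapped too, which corrupts the link:
// <a href="/<b>span</b>"><b>span</b> docs</a>
```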

I do the exact same thing.
You need to make a copy.
I store in the db a copy of the real string, in all lower case.
Then I search using a lower case version of the query string or do a case insensitive regexp.
Then use the resulting found start index in the main string, plus the length of the query string, to highlight the query string within the result.
You cannot use the query string in the result, since its case may differ from the original. You need to highlight a portion of the original string.
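The steps above can be sketched roughly as follows. The function name highlightByIndex is hypothetical, and the lowercase copy is computed on the fly here rather than stored in a database. The sketch also assumes that lowercasing does not change the string's length, which holds for ASCII but not for every Unicode character.

```javascript
// Sketch: search in lowercase copies, then highlight slices of the ORIGINAL string.
function highlightByIndex(original, query) {
  if (!query) return original;
  const lowerOriginal = original.toLowerCase();
  const lowerQuery = query.toLowerCase();
  let out = '';
  let pos = 0;
  let idx;
  // Find each occurrence in the lowercase copy, starting after the previous match.
  while ((idx = lowerOriginal.indexOf(lowerQuery, pos)) !== -1) {
    // Copy the untouched text, then wrap the original-case slice.
    out += original.slice(pos, idx) +
      '<b>' + original.slice(idx, idx + query.length) + '</b>';
    pos = idx + query.length;
  }
  // Append whatever follows the last match.
  return out + original.slice(pos);
}

console.log(highlightByIndex('Calculator', 'ca')); // "<b>Ca</b>lculator"
console.log(highlightByIndex('ESCAPE', 'ca'));     // "ES<b>CA</b>PE"
```

Because no regex is involved, the query needs no RegExp escaping, although the text would still need HTML escaping before being inserted as markup.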

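.match() with a global, case-insensitive regex returns an array of the matches with their case intact.
// Assumes queryString contains no regex special characters.
var matches = str.match(new RegExp(queryString, 'gi')) || [],
startHere = 0,
nextMatch,
resultStr = '';
for (var i = 0; i < matches.length; i++) {
// Position of this match, searching from the end of the previous one
nextMatch = str.indexOf(matches[i], startHere);
resultStr += str.substring(startHere, nextMatch) + '<b>' + matches[i] + '</b>';
startHere = nextMatch + matches[i].length;
}
// Append the text after the last match
resultStr += str.substring(startHere);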

I have found an easier way to achieve it. A JavaScript regular expression remembers the string it matched in a capture group, and that feature can be used here.
I have modified the code a bit.
reg = new RegExp("("+querystr.trim()+")", 'gi');
final_str = 'foo ' + result.replace(reg, "<b>$1</b>");
$('#'+id).html(final_str);

// Highlight search term and anchor to first occurrence - Start
function highlightSearchText(searchText) {
var innerHTML = document.documentElement.innerHTML;
var newInnerHtml = replaceAll(innerHTML, searchText);
document.documentElement.innerHTML = newInnerHtml;
// Scroll the first highlighted match into view, if there is one
var elmnt = document.documentElement.getElementsByTagName('mark')[0];
if (elmnt) elmnt.scrollIntoView();
}
function replaceAll(str, querystr) {
var reg = new RegExp(querystr, 'gi');
return str.replace(reg, function(match) {return '<mark>'+match+'</mark>'});
}
// Highlight search term and anchor to first occurrence - End
