How can I compare "Ｍ" and "M" (in UTF) using Javascript? - javascript

I have a situation where I have to search a grid to see whether it contains a certain substring. I have a search bar where the user can type the string. The problem is that the grid contains a mix of Japanese text and Unicode characters, for example: ＭＡＧシンチ注 ３３３ＭＢｑ.
How can I compare for content equality the letter "M" that I type from the keyboard and the letter "Ｍ" as in the example above? I am trying to do this using plain JavaScript, not jQuery or another library, and I have to do this in Internet Explorer.
Thanks,

As mentioned in an insightful comment from @Rhymoid on the question, modern JavaScript (ES2015) includes support for normalization of Unicode. One mode of normalization maps "compatibility" letterforms, such as the fullwidth forms used alongside Japanese text, to their most basic equivalents (that is a summary; the details are fairly involved). The .normalize("NFKD") method will map the fullwidth "Ｍ" to its Latin equivalent "M". Thus
"ＭＡＧシンチ注 ３３３ＭＢｑ".normalize("NFKD")
will give
"MAGシンチ注 333MBq"
As of late 2016, .normalize() isn't supported by IE.
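Where .normalize() is available, the search itself could be sketched roughly as follows. The helper name containsNormalized is made up for illustration, and NFKC is used instead of NFKD (an assumption on my part; both fold fullwidth letters to their Latin equivalents, but NFKC also recomposes accented characters). The comparison is still case-sensitive.

```javascript
// Hypothetical helper: normalize both strings, then do a plain substring search.
function containsNormalized(haystack, needle) {
  return haystack.normalize("NFKC").indexOf(needle.normalize("NFKC")) !== -1;
}

var cell = "ＭＡＧシンチ注 ３３３ＭＢｑ"; // fullwidth letters and digits
console.log(containsNormalized(cell, "M"));   // true
console.log(containsNormalized(cell, "333")); // true
console.log(containsNormalized(cell, "X"));   // false
```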
At a lower level, ES2015 also has .codePointAt() (mentioned in another good answer), which is like the older .charCodeAt() described below but which also understands UTF-16 pairs. However, .codePointAt() is (again, late 2016) not supported by Safari.
Below is the original answer, for older browsers.
You can use the .charCodeAt() method to examine the UTF-16 character codes in the string.
"M".charCodeAt(0)
is 77, while
"Ｍ".charCodeAt(0)
is 65325.
This approach is complicated by the fact that for some Unicode characters, the UTF-16 representation involves two separate character positions in the JavaScript string. Older versions of the language do not provide native support for dealing with that, so you have to do it yourself. A character code between 55296 and 56319 (D800 and DBFF hex) indicates the start of a two-character pair; the second half of the pair falls between 56320 and 57343 (DC00 and DFFF hex). The UTF-16 Wikipedia page has more information, and there are various other sources.
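As a rough sketch of doing it yourself, the following ES5-compatible function combines a surrogate pair into a single code point using the standard UTF-16 arithmetic. The name codePointAtCompat is hypothetical, not a built-in.

```javascript
// Hypothetical ES5 stand-in for codePointAt(); works with charCodeAt() only.
function codePointAtCompat(str, index) {
  var first = str.charCodeAt(index);
  // High (leading) surrogate: 0xD800..0xDBFF
  if (first >= 0xD800 && first <= 0xDBFF && index + 1 < str.length) {
    var second = str.charCodeAt(index + 1);
    // Low (trailing) surrogate: 0xDC00..0xDFFF
    if (second >= 0xDC00 && second <= 0xDFFF) {
      return (first - 0xD800) * 0x400 + (second - 0xDC00) + 0x10000;
    }
  }
  return first; // BMP character, or an unpaired surrogate returned as-is
}

console.log(codePointAtCompat("M", 0));            // 77
console.log(codePointAtCompat("\uFF2D", 0));       // 65325 (fullwidth M)
console.log(codePointAtCompat("\uD835\uDFD8", 0)); // 120792 (U+1D7D8)
```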

Building a dictionary should work in any browser: find the char codes at the start of the ranges you want to transform, then map the characters in your favourite way. For example:
function shift65248(str) {
var dict = {}, characters = [],
character, i;
for (i = 0; i < 10; ++i) { // 0 - 9
character = String.fromCharCode(65296 + i);
dict[character] = String.fromCharCode(48 + i);
characters.push(character);
}
for (i = 0; i < 26; ++i) { // A - Z
character = String.fromCharCode(65313 + i);
dict[character] = String.fromCharCode(65 + i);
characters.push(character);
}
for (i = 0; i < 26; ++i) { // a - z
character = String.fromCharCode(65345 + i); // fullwidth lowercase starts at 65345, not 65313
dict[character] = String.fromCharCode(97 + i);
characters.push(character);
}
return str.replace(
new RegExp(characters.join('|'), 'g'),
function (m) {return dict[m];}
);
}
shift65248('ＭＡＧシンチ注 ３３３ＭＢｑ'); // "MAGシンチ注 333MBq"
I tried just moving the whole range 65248..65375 onto 0..127 but it conflicted with the other characters :(
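For what it's worth, a range shift can work if it is limited to the fullwidth ASCII variants, U+FF01 to U+FF5E (65281 to 65374), which sit exactly 65248 (0xFEE0) above their ASCII counterparts. A minimal sketch, with toHalfWidth as a made-up name:

```javascript
// Hypothetical helper: shift only the fullwidth ASCII variants down by 0xFEE0.
function toHalfWidth(str) {
  return str.replace(/[\uFF01-\uFF5E]/g, function (ch) {
    return String.fromCharCode(ch.charCodeAt(0) - 0xFEE0);
  });
}

console.log(toHalfWidth("ＭＡＧシンチ注 ３３３ＭＢｑ")); // "MAGシンチ注 333MBq"
```

The katakana and kanji are outside the range, so they pass through untouched. The ideographic space (U+3000) is also outside it and would need separate handling.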

I am assuming that you have access to those strings, by reading the DOM or in some other way.
If so, codePointAt will be your friend.
console.log("Test of values");
console.log("M".codePointAt(0)); // 77
console.log("Ｍ".codePointAt(0)); // 65325
console.log("Determining end of string");
console.log("M".codePointAt(10)); // undefined
var str = "ＭＡＧシンチ注 ３３３ＭＢｑ .";
var idx = 0;
var point; // declared explicitly to avoid an implicit global
do {
point = str.codePointAt(idx);
idx++;
console.log(point);
} while(point !== undefined);

You could try building your own dictionary and compare function as follows:
var compareDB = {
'm' : ['Ｍ'],
'b' : ['Ｂ']
};
function doCompare(inputChar, searchText){
var inputCharLower = inputChar.toLowerCase();
var searchTextLower = searchText.toLowerCase();
var variants = compareDB[inputCharLower];
var i;
// direct match on the lowercased input
if(searchTextLower.indexOf(inputCharLower) > -1)
return true;
// otherwise try each known variant of the input character
if(variants !== undefined)
{
for(i=0; i<variants.length; i++){
if(searchTextLower.indexOf(variants[i].toLowerCase()) > -1)
return true;
}
}
return false;
}
console.log("searching with m");
console.log(doCompare('m', "searching text with Ｍ")); // true
console.log("searching with m");
console.log(doCompare('m', "searching text with Ｂ")); // false
console.log("searching with B");
console.log(doCompare('B', "searching text with Ｂ")); // true

Related

How do I search for words similar to other words?

I am looking to make a small script in Node.js that will match words with another word that is similar. For example I am searching for ***ing and I have an array like ['loving', 'mating', 'cats', 'wording'] then I would expect it to return ['loving', 'mating'] and exclude ['cats'] (because it does not end in ing), and ['wording'] (because it is seven characters and not six.).
This is my current not working code that I have written.
let foundWords = [];
for (let i = 0, len = wordList.length; i < len; i++) {
for (let j = 0, len = wordList[i].split('').length; j < len; j++) {
if (wordToFind.charAt(j) == '*') {
return;
};
if (wordToFind.charAt(j) === wordList[i].charAt(j)) {
if (foundWords.includes(wordList[i]) == false) {
foundWords.push(wordList[i]);
};
}
}
}
console.log(foundWords);
The objective of writing this code is to allow me to brute force with a dictionary list all the combinations for this cryptogram and the words inside.
I really recommend reading about Levenshtein distance; it sounds exactly like what you are trying to achieve here:
https://en.wikipedia.org/wiki/Levenshtein_distance#Example
There is also a JavaScript implementation:
https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#JavaScript
In information theory and computer science, the Levenshtein distance
is a metric for measuring the amount of difference between two
sequences (i.e. an edit distance). The Levenshtein distance between
two strings is defined as the minimum number of edits needed to
transform one string into the other, with the allowable edit
operations being insertion, deletion, or substitution of a single
character.
Example The Levenshtein distance between "kitten" and "sitting" is 3,
since the following three edits change one into the other, and there
isn't a way to do it with fewer than three edits:
kitten → sitten (substitution of 'k' with 's')
sitten → sittin (substitution of 'e' with 'i')
sittin → sitting (insert 'g' at the end).
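To make the quoted definition concrete, here is a minimal sketch of the usual dynamic-programming computation, keeping only two rows of the table. The function name levenshtein is my own. Note that it measures edit distance and does not implement the * wildcard from the question.

```javascript
// Minimal sketch: classic two-row dynamic programming for edit distance.
function levenshtein(a, b) {
  // prev[j] = distance between the first (i - 1) chars of a and the first j chars of b
  let prev = [];
  let curr = [];
  for (let j = 0; j <= b.length; j++) {
    prev[j] = j; // turning "" into b.slice(0, j) takes j insertions
  }
  for (let i = 1; i <= a.length; i++) {
    curr[0] = i; // turning a.slice(0, i) into "" takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    const tmp = prev;
    prev = curr;
    curr = tmp;
  }
  return prev[b.length];
}

console.log(levenshtein("kitten", "sitting")); // 3
```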
You can use Array.prototype.filter along with a RegExp.
To construct the regex, replace each of your wildcard characters (*) with the regex wildcard character, a dot (.). Then add ^ and $ to anchor the regex so that it matches all the way from the beginning to the end of the string.
function filterMatches(needle, haystack) {
const regex = new RegExp('^' + needle.replace(/\*/g, '.') + '$');
return haystack.filter(word => regex.test(word));
}
console.log(filterMatches('***ing', ['loving', 'mating', 'cats', 'wording']));
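One caveat with the snippet above: characters in the pattern that are special in a regex (such as . or +) are not escaped. A hedged variant, with escapeRegExp and filterMatchesEscaped as hypothetical names, splits on the wildcard and escapes each literal piece first:

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical variant: each * becomes ".", everything else is matched literally.
function filterMatchesEscaped(needle, haystack) {
  const source = needle.split('*').map(escapeRegExp).join('.');
  const regex = new RegExp('^' + source + '$');
  return haystack.filter(word => regex.test(word));
}

console.log(filterMatchesEscaped('***ing', ['loving', 'mating', 'cats', 'wording'])); // [ 'loving', 'mating' ]
console.log(filterMatchesEscaped('a.*', ['a.b', 'axb'])); // [ 'a.b' ]
```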
Hey, I think this should work. If you don't understand a part, try looking up the String.prototype functions on MDN. It really helps to know some of these functions, since they will make your code simpler.
let input = '***ing';
let inputLength = input.length
let results = [];
while (input.charAt(0) === "*") {
input = input.substr(1);
}
const arr = ['loving', 'mating', 'cats', 'wording'];
for (let i = 0; i < arr.length; i++) {
if (inputLength != arr[i].length) {
continue;
}
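// the remaining literal part must sit at the end of the word, not just anywhere in it
if(arr[i].lastIndexOf(input) === arr[i].length - input.length) {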
results.push(arr[i]);
}
}
console.log(results);
Another approach could be:
function getMatches(ts, ss){
var es = ts.split(/\*+/)[1]; // or ts.match(/[^\*]+$/)[0];
return ss.filter(s => s.endsWith(es) && s.length === ts.length)
}
var res = getMatches("***ing",['loving', 'mating', 'cats', 'wording']);
console.log(res);

Where and why would you use tagged template literals? [duplicate]

I understand the syntax of ES6 tagged templates. What I don't see is the practical usability. When is it better than passing an object parameter, like the settings in jQuery's AJAX? $.ajax('url', { /*this guy here*/ })
Right now I only see the tricky syntax but I don't see why I would need/use it. I also found that the TypeScript team chose to implement it (in 1.5) before other important features. What is the concept behind tagged string templates?
You can use tagged templates to build APIs that are more expressive than regular function calls.
For example, I'm working on a proof-of-concept library for SQL queries on JS arrays:
let admins = sql`SELECT name, id FROM ${users}
WHERE ${user => user.roles.indexOf('admin') >= 0}`
Notice it has nothing to do with String interpolation; it uses tagged templates for readability. It would be hard to construct something that reads as intuitively with plain function calls - I guess you'd have something like this:
let admins = sql("SELECT name, id FROM $users WHERE $filter",
{ $users: users, $filter: (user) => user.roles.indexOf('admin') >= 0 })
This example is just a fun side project, but I think it shows some of the benefits of tagged templates.
Another example, maybe more obvious, is i18n - a tagged template could insert locale-sensitive versions of your input.
See Sitepoint's explanation:
The final stage of template strings specification is about adding a custom function before the string itself to create a tagged template string.
...
For instance, here is a piece of code to block strings that try to inject custom DOM elements:
var items = [];
items.push("banana");
items.push("tomato");
items.push("light saber");
var total = "Trying to hijack your site <BR>";
var myTagFunction = function (strings,...values) {
var output = "";
for (var index = 0; index < values.length; index++) {
var valueString = values[index].toString();
if (valueString.indexOf(">") !== -1) {
// Far more complex tests can be implemented here :)
return "String analyzed and refused!";
}
output += strings[index] + values[index];
}
output += strings[index]
return output;
}
result.innerHTML = myTagFunction `You have ${items.length} item(s) in your basket for a total of $${total}`;
Tagged template strings can be used for a lot of things like security, localization, creating your own domain specific language, etc.
They're useful because the function can (almost) completely define the meaning of the text inside it (almost = other than placeholders). I like to use the example of Steven Levithan's XRegExp library. It's awkward to use regular expressions defined as strings, because you have to double-escape things: Once for the string literal, and once for regex. This is one of the reasons we have regular expression literals in JavaScript.
For instance, suppose I'm doing maintenance on a site and I find this:
var isSingleUnicodeWord = /^\w+$/;
...which is meant to check if a string contains only "letters." Two problems: A) There are thousands of "word" characters across the realm of human language that \w doesn't recognize, because its definition is English-centric; and B) It includes _, which many (including the Unicode consortium) would argue is not a "letter."
Suppose in my work I've introduced XRegExp to the codebase. Since I know it supports \pL (\p for Unicode categories, and L for "letter"), I might quickly swap this in:
var isSingleUnicodeWord = XRegExp("^\pL+$"); // WRONG
Then I wonder why it didn't work, *facepalm*, and go back and escape that backslash, since it's being consumed by the string literal:
var isSingleUnicodeWord = XRegExp("^\\pL+$");
// ---------------------------------^
What a pain. Suppose I could write the actual regular expression without worrying about double-escaping?
I can: With a tagged template function. I can put this in my standard lib:
function xrex(strings, ...values) {
const raw = strings.raw;
let result = "";
for (let i = 0; i < raw.length; ++i) {
result += raw[i];
if (i < values.length) { // `values` always has one fewer entry
result += values[i];
}
}
return XRegExp(result);
}
Or alternately, this is a valid use case for reduce, and we can use destructuring in the argument list:
function xrex({raw}, ...values) {
return XRegExp(
raw.reduce(
(acc, str, index) => acc + str + (index < values.length ? values[index] : ""),
""
)
);
}
And then I can happily write:
const isSingleUnicodeWord = xrex`^\pL+$`;
Example:
// My tag function (defined once, then reused)
function xrex({raw}, ...values) {
const result = raw.reduce(
(acc, str, index) => acc + str + (index < values.length ? values[index] : ""),
""
);
console.log("Creating with:", result);
return XRegExp(result);
}
// Using it, with a couple of substitutions to prove to myself they work
let category = "L"; // L: Letter
let maybeEol = "$";
let isSingleUnicodeWord = xrex`^\p${category}+${maybeEol}`;
function test(str) {
console.log(str + ": " + isSingleUnicodeWord.test(str));
}
test("Русский"); // true
test("日本語"); // true
test("العربية"); // true
test("foo bar"); // false
test("$£"); // false
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.2.0/xregexp-all.min.js"></script>
The only thing I have to remember now is that ${...} is special because it's a placeholder. In this specific case, it's not a problem, I'm unlikely to want to apply a quantifier to the end-of-input assertion, but that's a coincidence...
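For plain strings (no XRegExp needed), the built-in String.raw tag shows the same raw-versus-cooked difference. This is only an illustration; the variable names are mine.

```javascript
const cooked = `^\pL+$`;         // untagged: the backslash is consumed as an escape
const raw = String.raw`^\pL+$`;  // tagged with String.raw: the backslash is kept

console.log(cooked);            // ^pL+$
console.log(raw);               // ^\pL+$
console.log(raw === "^\\pL+$"); // true
```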

javascript and string manipulation w/ utf-16 surrogate pairs

I'm working on a twitter app and just stumbled into the world of utf-8(16). It seems the majority of javascript string functions are as blind to surrogate pairs as I was. I've got to recode some stuff to make it wide character aware.
I've got this function to parse strings into arrays while preserving the surrogate pairs. Then I'll recode several functions to deal with the arrays rather than strings.
function sortSurrogates(str){
var cp = []; // array to hold code points
while(str.length){ // loop till we've done the whole string
if(/[\uD800-\uDFFF]/.test(str.substr(0,1))){ // test the first character
// High surrogate found low surrogate follows
cp.push(str.substr(0,2)); // push the two onto array
str = str.substr(2); // clip the two off the string
}else{ // else BMP code point
cp.push(str.substr(0,1)); // push one onto array
str = str.substr(1); // clip one from string
}
} // loop
return cp; // return the array
}
My question is, is there something simpler I'm missing? I see so many people reiterating that javascript deals with utf-16 natively, yet my testing leads me to believe, that may be the data format, but the functions don't know it yet. Am I missing something simple?
EDIT:
To help illustrate the issue:
var a = "0123456789"; // U+0030 - U+0039 2 bytes each
var b = "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡"; // U+1D7D8 - U+1D7E1 4 bytes each
alert(a.length); // javascript shows 10
alert(b.length); // javascript shows 20
Twitter sees and counts both of those as being 10 characters long.
Javascript uses UCS-2 internally, which is not UTF-16. It is very difficult to handle Unicode in Javascript because of this, and I do not suggest attempting to do so.
As for what Twitter does, you seem to be saying that it is sanely counting by code point not insanely by code unit.
Unless you have no choice, you should use a programming language that actually supports Unicode, and which has a code-point interface, not a code-unit interface. Javascript isn't good enough for that as you have discovered.
It has The UCS-2 Curse, which is even worse than The UTF-16 Curse, which is already bad enough. I talk about all this in my OSCON talk, 🔫 Unicode Support Shootout: 👍 The Good, the Bad, & the (mostly) Ugly 👎.
Due to its horrible Curse, you have to hand-simulate UTF-16 with UCS-2 in Javascript, which is simply nuts.
Javascript suffers from all kinds of other terrible Unicode troubles, too. It has no support for graphemes or normalization or collation, all of which you really need. And its regexes are broken, sometimes due to the Curse, sometimes just because people got it wrong. For example, Javascript is incapable of expressing regexes like [𝒜-𝒵]. Javascript doesn’t even support casefolding, so you can’t write a pattern like /ΣΤΙΓΜΑΣ/i and have it correctly match στιγμας.
You can try to use the XRegExp plugin, but you won’t banish the Curse that way. Only changing to a language with Unicode support will do that, and 𝒥𝒶𝓋𝒶𝓈𝒸𝓇𝒾𝓅𝓉 just isn’t one of those.
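For what it's worth, ES2015 later added the u flag for regular expressions, which makes a pattern treat astral characters as single code points. In engines that support it, a range like the script capitals mentioned above can be written directly. A small sketch (the variable name is mine):

```javascript
// U+1D49C is MATHEMATICAL SCRIPT CAPITAL A, U+1D4B5 is MATHEMATICAL SCRIPT CAPITAL Z.
const scriptCapitals = /^[\u{1D49C}-\u{1D4B5}]$/u;

console.log(scriptCapitals.test("\u{1D49C}")); // true
console.log(scriptCapitals.test("A"));         // false

// Without the u flag, "." sees the two UTF-16 code units separately.
console.log(/^.$/u.test("\u{1D49C}")); // true
console.log(/^.$/.test("\u{1D49C}"));  // false
```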
I've knocked together the starting point for a Unicode string handling object. It creates a function called UnicodeString() that accepts either a JavaScript string or an array of integers representing Unicode code points and provides length and codePoints properties and toString() and slice() methods. Adding regular expression support would be very complicated, but things like indexOf() and split() (without regex support) should be pretty easy to implement.
var UnicodeString = (function() {
function surrogatePairToCodePoint(charCode1, charCode2) {
return ((charCode1 & 0x3FF) << 10) + (charCode2 & 0x3FF) + 0x10000;
}
function stringToCodePointArray(str) {
var codePoints = [], i = 0, charCode;
while (i < str.length) {
charCode = str.charCodeAt(i);
if ((charCode & 0xF800) == 0xD800) {
codePoints.push(surrogatePairToCodePoint(charCode, str.charCodeAt(++i)));
} else {
codePoints.push(charCode);
}
++i;
}
return codePoints;
}
function codePointArrayToString(codePoints) {
var stringParts = [];
for (var i = 0, len = codePoints.length, codePoint, offset, codePointCharCodes; i < len; ++i) {
codePoint = codePoints[i];
if (codePoint > 0xFFFF) {
offset = codePoint - 0x10000;
codePointCharCodes = [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)];
} else {
codePointCharCodes = [codePoint];
}
stringParts.push(String.fromCharCode.apply(String, codePointCharCodes));
}
return stringParts.join("");
}
function UnicodeString(arg) {
if (this instanceof UnicodeString) {
this.codePoints = (typeof arg == "string") ? stringToCodePointArray(arg) : arg;
this.length = this.codePoints.length;
} else {
return new UnicodeString(arg);
}
}
UnicodeString.prototype = {
slice: function(start, end) {
return new UnicodeString(this.codePoints.slice(start, end));
},
toString: function() {
return codePointArrayToString(this.codePoints);
}
};
return UnicodeString;
})();
var ustr = UnicodeString("f𝌆𝌆bar");
document.getElementById("output").textContent = "String: '" + ustr + "', length: " + ustr.length + ", slice(2, 4): " + ustr.slice(2, 4);
<div id="output"></div>
Here are a couple scripts that might be helpful when dealing with surrogate pairs in JavaScript:
ES6 Unicode shims for ES3+ adds the String.fromCodePoint and String.prototype.codePointAt methods from ECMAScript 6. The ES3/5 fromCharCode and charCodeAt methods do not account for surrogate pairs and therefore give wrong results.
Full 21-bit Unicode code point matching in XRegExp with \u{10FFFF} allows matching any individual code point in XRegExp regexes.
JavaScript string iterators can give you the actual characters instead of the individual surrogate code units:
>>> [..."0123456789"]
["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
>>> [..."𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡"]
["𝟘", "𝟙", "𝟚", "𝟛", "𝟜", "𝟝", "𝟞", "𝟟", "𝟠", "𝟡"]
>>> [..."0123456789"].length
10
>>> [..."𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡"].length
10
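Building on the iterator, code-point-aware length and slicing can be sketched with Array.from (ES2015). The function names here are hypothetical.

```javascript
// Count code points rather than UTF-16 code units.
function codePointLength(str) {
  return Array.from(str).length;
}

// Slice by code point index, so surrogate pairs are never split.
function codePointSlice(str, start, end) {
  return Array.from(str).slice(start, end).join("");
}

const digits = "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡";
console.log(digits.length);                // 20 (UTF-16 code units)
console.log(codePointLength(digits));      // 10 (code points)
console.log(codePointSlice(digits, 2, 4)); // "𝟚𝟛"
```

This still works per code point, so combining sequences (a base letter followed by a combining accent, for example) count as more than one.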
This is along the lines of what I was looking for. It needs better support for the different string functions. As I add to it I will update this answer.
function wString(str){
var T = this; //makes 'this' visible in functions
T.cp = []; //code point array
T.length = 0; //length attribute
T.wString = true; // (item.wString) tests for wString object
//member functions
var sortSurrogates = function(s){ //returns array of characters, keeping surrogate pairs together
var chrs = [];
while(s.length){ // loop till we've done the whole string
if(/[\uD800-\uDFFF]/.test(s.substr(0,1))){ // test the first character
// High surrogate found low surrogate follows
chrs.push(s.substr(0,2)); // push the two onto array
s = s.substr(2); // clip the two off the string
}else{ // else BMP code point
chrs.push(s.substr(0,1)); // push one onto array
s = s.substr(1); // clip one from string
}
} // loop
return chrs;
};
//end member functions
//prototype functions
T.substr = function(start,len){
if(len){
return T.cp.slice(start,start+len).join('');
}else{
return T.cp.slice(start).join('');
}
};
T.substring = function(start,end){
return T.cp.slice(start,end).join('');
};
T.replace = function(target,str){
//allow wStrings as parameters
if(str.wString) str = str.cp.join('');
if(target.wString) target = target.cp.join('');
return T.toString().replace(target,str);
};
T.equals = function(s){
if(!s.wString){
s = sortSurrogates(s);
T.cp = s;
}else{
T.cp = s.cp;
}
T.length = T.cp.length;
};
T.toString = function(){return T.cp.join('');};
//end prototype functions
T.equals(str)
};
Test results:
// plain string
var x = "0123456789";
alert(x); // 0123456789
alert(x.substr(4,5)) // 45678
alert(x.substring(2,4)) // 23
alert(x.replace("456","x")); // 0123x789
alert(x.length); // 10
// wString object
x = new wString("𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡");
alert(x); // 𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡
alert(x.substr(4,5)) // 𝟜𝟝𝟞𝟟𝟠
alert(x.substring(2,4)) // 𝟚𝟛
alert(x.replace("𝟜𝟝𝟞","x")); // 𝟘𝟙𝟚𝟛x𝟟𝟠𝟡
alert(x.length); // 10

Convert JavaScript String to be all lowercase

How can I convert a JavaScript string value to be in all lowercase letters?
Example: "Your Name" to "your name"
var lowerCaseName = "Your Name".toLowerCase();
Use either toLowerCase or toLocaleLowerCase methods of the String object. The difference is that toLocaleLowerCase will take current locale of the user/host into account. As per § 15.5.4.17 of the ECMAScript Language Specification (ECMA-262), toLocaleLowerCase…
…works exactly the same as toLowerCase
except that its result is intended to
yield the correct result for the host
environment’s current locale, rather
than a locale-independent result.
There will only be a difference in the
few cases (such as Turkish) where the
rules for that language conflict with
the regular Unicode case mappings.
Example:
var lower = 'Your Name'.toLowerCase();
Also note that the toLowerCase and toLocaleLowerCase functions are implemented to work generically on any value type. Therefore you can invoke these functions even on non-String objects. Doing so will imply automatic conversion to a string value prior to changing the case of each character in the resulting string value. For example, you can apply toLowerCase directly on a date like this:
var lower = String.prototype.toLowerCase.apply(new Date());
which is effectively equivalent to:
var lower = new Date().toString().toLowerCase();
The second form is generally preferred for its simplicity and readability. On earlier versions of IE, the first had the benefit that it could work with a null value. The result of applying toLowerCase or toLocaleLowerCase on null would yield null (and not an error condition).
Yes, any string in JavaScript has a toLowerCase() method that will return a new string that is the old string in all lowercase. The old string will remain unchanged.
So, you can do something like:
"Foo".toLowerCase();
document.getElementById('myField').value.toLowerCase();
toLocaleUpperCase() or lower case functions don't behave like they should do. For example, on my system, with Safari 4, Chrome 4 Beta, and Firefox 3.5.x, it converts strings with Turkish characters incorrectly. The browsers respond to navigator.language as "en-US", "tr", "en-US" respectively.
But there isn't any way to get user's Accept-Lang setting in the browser as far as I could find.
Only Chrome gives me trouble, although I have configured every browser with tr-TR as the preferred locale. I think these settings only affect the HTTP header, and we can't access them via JavaScript.
In the Mozilla documentation it says "The characters within a string are converted to ... while respecting the current locale. For most languages, this will return the same as ...". I think it's valid for Turkish, and it doesn't differ if it's configured as en or tr.
In Turkish it should convert "DİNÇ" to "dinç" and "DINÇ" to "dınç" or vice-versa.
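In engines that implement the ECMAScript Internationalization API, toLocaleLowerCase and toLocaleUpperCase accept an explicit locale tag, which sidesteps the question of which locale the browser picked. A small sketch, assuming the runtime ships Turkish locale data (the helper names are made up):

```javascript
// Hypothetical helpers that always apply Turkish casing rules.
function toLowerTr(text) {
  return text.toLocaleLowerCase("tr");
}
function toUpperTr(text) {
  return text.toLocaleUpperCase("tr");
}

console.log(toLowerTr("DİNÇ")); // "dinç" (dotted capital İ becomes i)
console.log(toLowerTr("DINÇ")); // "dınç" (plain capital I becomes dotless ı)
console.log(toUpperTr("dinç")); // "DİNÇ"
```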
Just an example for toLowerCase(), toUpperCase() and a prototype for the not yet available toTitleCase() or toProperCase():
String.prototype.toTitleCase = function() {
// charAt(0) returns "" for empty words (e.g. after a double space) instead of throwing
return this.split(' ').map(i => i.charAt(0).toUpperCase() + i.substring(1).toLowerCase()).join(' ');
}
String.prototype.toProperCase = function() {
return this.toTitleCase();
}
var OriginalCase = 'Your Name';
var lowercase = OriginalCase.toLowerCase();
var upperCase = lowercase.toUpperCase();
var titleCase = upperCase.toTitleCase();
console.log('Original: ' + OriginalCase);
console.log('toLowerCase(): ' + lowercase);
console.log('toUpperCase(): ' + upperCase);
console.log('toTitleCase(): ' + titleCase);
I noticed that lots of people are looking for strtolower() in JavaScript. They are expecting the same function name as in other languages, and that's why this post is here.
I would recommend using a native JavaScript function:
"SomE StriNg".toLowerCase()
Here's the function that behaves exactly the same as PHP's one (for those who are porting PHP code into JavaScript)
function strToLower (str) {
return String(str).toLowerCase();
}
Methods or functions: toLowerCase() and toUpperCase()
Description: These methods are used to convert a string or letter from lowercase to uppercase or vice versa. E.g., "and" to "AND".
Converting to uppercase:
Example code:
<script language=javascript>
var ss = " testing case conversion method ";
var result = ss.toUpperCase();
document.write(result);
</script>
Result: TESTING CASE CONVERSION METHOD
Converting to lowercase:
Example Code:
<script language=javascript>
var ss = " TESTING LOWERCASE CONVERT FUNCTION ";
var result = ss.toLowerCase();
document.write(result);
</script>
Result: testing lowercase convert function
Explanation: In the above examples,
toUpperCase() method converts any string to "UPPER" case letters.
toLowerCase() method converts any string to "lower" case letters.
Note that the function will only work on string objects.
For instance, I was consuming a plugin, and was confused about why I was getting an "extension.toLowerCase is not a function" JavaScript error.
onChange: function(file, extension)
{
alert("extension.toLowerCase()=>" + extension.toLowerCase() + "<=");
Which produced the error "extension.toLowerCase is not a function". So I tried this piece of code, which revealed the problem!
alert("(typeof extension)=>" + (typeof extension) + "<=");
The output was "(typeof extension)=>object<=" - so aha, I was not getting a string var for my input. The fix is straightforward though - just force the darn thing into a String!:
var extension = String(extension);
After the cast, the extension.toLowerCase() function worked fine.
Option 1: Using toLowerCase()
var x = 'ABC';
x = x.toLowerCase();
Option 2: Using your own function
function convertToLowerCase(str) {
var result = '';
for (var i = 0; i < str.length; i++) {
var code = str.charCodeAt(i);
if (code > 64 && code < 91) {
result += String.fromCharCode(code + 32);
} else {
result += str.charAt(i);
}
}
return result;
}
Call it as:
x = convertToLowerCase(x);
Simply use JS toLowerCase()
let v = "Your Name";
let u = v.toLowerCase();
// or, in one step:
let w = "Your Name".toLowerCase();
const str = 'Your Name';
// convert string to lowercase
const lowerStr = str.toLowerCase();
// print the new string
console.log(lowerStr);
In case you want to build it yourself:
function toLowerCase(string) {
let lowerCaseString = "";
for (let i = 0; i < string.length; i++) {
// Find ASCII charcode
let charcode = string.charCodeAt(i);
// If uppercase
if (charcode > 64 && charcode < 91) {
// Convert to lowercase
charcode = charcode + 32
}
// Back to char
let lowercase = String.fromCharCode(charcode);
// Append
lowerCaseString = lowerCaseString.concat(lowercase);
}
return lowerCaseString
}
You can use the in built .toLowerCase() method on JavaScript strings. Example:
var x = "Hello";
x.toLowerCase();
Try this short way:
var lower = (str+"").toLowerCase();
Try
<input type="text" style="text-transform: uppercase"> <!-- uppercase -->
<input type="text" style="text-transform: lowercase"> <!-- lowercase -->
