Locale independent string search in Javascript

Is there a way for searching / comparing strings without consideration of locale?
I mean: if I have two input sources on my keyboard (Russian and English) and I start typing, I want to search for a word regardless of which input source is active at the moment.
So I'd find a string containing "Search" no matter whether I typed "search" or "ыуфкср".
Thanks.

You would simply need to search by two phrases: the original input and the result of converting it to the other keyboard layout. You would have a conversion map like this:
{
a: 'ф',
s: 'ы',
d: 'в',
f: 'а',
...
}
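A minimal sketch of that idea, assuming a US QWERTY layout paired with the standard Russian ЙЦУКЕН layout (other layout pairs need a different table). EN_KEYS, RU_KEYS, convertLayout and layoutAwareIncludes are hypothetical names; only unshifted keys are mapped, and the comparison is case-insensitive.

```javascript
// Assumed layouts: the character at index i of EN_KEYS sits on the same
// physical key as the character at index i of RU_KEYS (unshifted keys only).
const EN_KEYS = "`qwertyuiop[]asdfghjkl;'zxcvbnm,.";
const RU_KEYS = "ёйцукенгшщзхъфывапролджэячсмитьбю";

// Build a character-to-character lookup table from two parallel strings.
function buildMap(from, to) {
  const src = Array.from(from);
  const dst = Array.from(to);
  const map = new Map();
  src.forEach((ch, i) => map.set(ch, dst[i]));
  return map;
}

const EN_TO_RU = buildMap(EN_KEYS, RU_KEYS);
const RU_TO_EN = buildMap(RU_KEYS, EN_KEYS);

// Re-type `text` as if the other layout had been active.
// Characters missing from the table are kept unchanged.
function convertLayout(text, map) {
  return Array.from(text.toLowerCase(), (ch) => map.get(ch) ?? ch).join("");
}

// True if `haystack` contains `query` as typed, or as it would have been
// typed with the other layout active.
function layoutAwareIncludes(haystack, query) {
  const h = haystack.toLowerCase();
  const q = query.toLowerCase();
  return (
    h.includes(q) ||
    h.includes(convertLayout(q, EN_TO_RU)) ||
    h.includes(convertLayout(q, RU_TO_EN))
  );
}

console.log(layoutAwareIncludes("Search", "ыуфкср")); // true
console.log(convertLayout("ghbdtn", EN_TO_RU)); // "привет"
```

Searching all three variants costs three substring scans per query, which is usually negligible for interactive search boxes.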

When I think about it, I come to the conclusion that there is no correct way to implement this, even if we want such an option. One reason: there may be different keyboards (on one keyboard 'a' is equal to 'ф', and on another it is equal to 'ы'). So you should probably implement such functionality yourself.

Related

How can I convert this UTF-8 string to plain text in javascript and how can a normal user write it in a textarea [duplicate]

While reviewing JavaScript concepts, I found String.normalize(). It doesn't show up in W3Schools' "JavaScript String Reference", which is probably why I missed it before.
I found more information about it on HackerRank, which states:
Returns a string containing the Unicode Normalization Form of the
calling string's value.
With the example:
var s = "HackerRank";
console.log(s.normalize());
console.log(s.normalize("NFKC"));
with the output:
HackerRank
HackerRank
Also, on GeeksForGeeks:
The string.normalize() is an inbuilt function in javascript which is
used to return a Unicode normalisation form of a given input string.
with the example:
<script>
// Taking a string as input.
var a = "GeeksForGeeks";
// calling normalize function.
b = a.normalize('NFC')
c = a.normalize('NFD')
d = a.normalize('NFKC')
e = a.normalize('NFKD')
// Printing normalised form.
document.write(b +"<br>");
document.write(c +"<br>");
document.write(d +"<br>");
document.write(e);
</script>
with the output:
GeeksForGeeks
GeeksForGeeks
GeeksForGeeks
GeeksForGeeks
Maybe the examples given are just really bad, as they don't let me see any change.
I wonder... what's the point of this method?

It depends on what you will do with the strings: often you do not need it (if you are just getting input from the user and showing it back to the user). But to check, search, or use such strings as keys, you may want a unique way to identify the same string (semantically speaking).
The main problem is that you may have two strings which are semantically the same but have two different representations: e.g. one with an accented character [one code point], and one with a character combined with an accent [one code point for the character, one for the combining accent]. The user may not be in control of how the input text is sent, so you may end up with two different user names or two different passwords. Also, if you mangle data, you may get different results depending on the initial string. Users do not like it.
Another problem is the order of combining characters. You may have an accent and a lower tail (e.g. a cedilla): you can express this with several combinations: "pure char, tail, accent", "pure char, accent, tail", "char+tail, accent", "char+accent, tail".
And you may have degenerate cases (especially if you type from a keyboard): you may get code points which should be removed (you may have an arbitrarily long string which is equivalent to a few bytes).
In any case, for sorting strings, you (or your library) need a normalized form: if you already provide the right one, the library will not need to transform it again.
So: you want the same (semantically speaking) string to have the same sequence of Unicode code points.
Note: If you are working directly on UTF-8, you should also care about special cases of UTF-8: the same code point could be written in different ways (using more bytes, so-called overlong encodings). This can also be a security problem.
The K forms (NFKC and NFKD) are often used for searches and similar tasks: CO2 and CO₂ will be interpreted in the same manner, but this can change the meaning of the text, so they should usually be used only internally, for temporary tasks, while keeping the original text.
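A sketch of the two points above, assuming a search key that is used only for comparison while the original text is stored unchanged (searchKey and fuzzyIncludes are hypothetical names). NFC puts combining marks with different combining classes (here cedilla and acute) into one canonical order, and NFKC folds compatibility characters such as the subscript in CO₂.

```javascript
// Hypothetical helper: a comparison key for searching. The original text is
// kept as-is; only this temporary key is NFKC-normalized and lower-cased.
function searchKey(s) {
  return s.normalize("NFKC").toLowerCase();
}

// Substring search on the normalized keys of both strings.
function fuzzyIncludes(text, query) {
  return searchKey(text).includes(searchKey(query));
}

// Canonical ordering: c + cedilla + acute vs. c + acute + cedilla
const cedillaFirst = "c\u0327\u0301";
const acuteFirst = "c\u0301\u0327";
console.log(cedillaFirst === acuteFirst); // false
console.log(cedillaFirst.normalize("NFC") === acuteFirst.normalize("NFC")); // true

// Compatibility folding: the subscript two (U+2082) becomes a plain "2"
console.log(fuzzyIncludes("Emissions of CO\u2082 rose", "co2")); // true
```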

As stated in the MDN documentation, String.prototype.normalize() returns the Unicode Normalization Form of the string. This is needed because in Unicode, some characters can have more than one representation.
This is the example (taken from MDN):
const name1 = '\u0041\u006d\u00e9\u006c\u0069\u0065';
const name2 = '\u0041\u006d\u0065\u0301\u006c\u0069\u0065';
console.log(`${name1}, ${name2}`);
// expected output: "Amélie, Amélie"
console.log(name1 === name2);
// expected output: false
console.log(name1.length === name2.length);
// expected output: false
const name1NFC = name1.normalize('NFC');
const name2NFC = name2.normalize('NFC');
console.log(`${name1NFC}, ${name2NFC}`);
// expected output: "Amélie, Amélie"
console.log(name1NFC === name2NFC);
// expected output: true
console.log(name1NFC.length === name2NFC.length);
// expected output: true
As you can see, the string Amélie has two different Unicode representations. With normalization, we can reduce the two forms to the same string.

Very beautifully explained here --> https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize
Short answer: characters are represented through a coding scheme (in practice, Unicode), and some characters have more than one representation as a sequence of code points. So two strings may render identically while their code points differ, and a plain string comparison fails. We use normalize() to convert them to a single representation.
// source from MDN
let string1 = '\u00F1'; // ñ
let string2 = '\u006E\u0303'; // ñ
string1 = string1.normalize('NFC');
string2 = string2.normalize('NFC');
console.log(string1 === string2); // true
console.log(string1.length); // 1
console.log(string2.length); // 1

Normalization of strings isn't exclusive to JavaScript; see, for instance, Python. The valid values for the argument are defined by Unicode (more on Unicode normalization).
When it comes to JavaScript, note that there's documentation referring to both String.normalize() and String.prototype.normalize(). As @ChrisG mentions:
String.prototype.normalize() is correct in a technical sense, because
normalize() is a dynamic method you call on instances, not the class
itself. The point of normalize() is to be able to compare Strings that
look the same but don't consist of the same characters, as shown in
the example code on MDN.
As for its usage, I found a great example of String.normalize() in action:
let s1 = 'sabi\u00e1';  // "sabiá" with the precomposed á (NFC)
let s2 = 'sabia\u0301'; // "sabiá" as a + combining acute accent (NFD)
// one is in NFC, the other in NFD, so they're different
console.log(s1 == s2); // false
// with normalization, they become the same
console.log(s1.normalize('NFC') === s2.normalize('NFC')); // true
// transform string into array of codepoints
function codepoints(s) { return Array.from(s).map(c => c.codePointAt(0).toString(16)); }
// printing the codepoints you can see the difference
console.log(codepoints(s1)); // [ "73", "61", "62", "69", "e1" ]
console.log(codepoints(s2)); // [ "73", "61", "62", "69", "61", "301" ]
So while sabiá and sabiá in this example look the same to the human eye (and even when printed with console.log()), comparing them without normalization gives a different result. Analyzing the code points shows why: they're different.

There are some great answers here already, but I wanted to throw in a practical example.
I enjoy Bible translation as a hobby. I wasn't too thrilled with the flashcard options out there in the wild in my price range (free), so I made my own. The problem is, there is more than one way to write Hebrew and Greek in Unicode and get the exact same thing. For example:
בָּא
בָּא
These should look identical on your screen, and for all practical purposes they are identical. However, the first was typed with the qamats (the little T-shaped thing under it) before the dagesh (the dot in the middle of the letter), and the second was typed with the dagesh before the qamats. Now, since you're just reading this, you don't care. And your web browser doesn't care. But when my flashcards compare the two, they aren't the same. To the code behind the scenes, they are as different as "center" and "centre".
Similarly, in Greek:
ἀ
ἀ
These two should look nearly identical, but the top is one Unicode character and the second one is two Unicode characters. Which one is going to end up typed in my flashcards is going to depend on which keyboard I'm sitting at.
When I'm adding flashcards, believe it or not, I don't always type in vocab lists of 100 words. That's why God gave us spreadsheets. Sometimes the places I import the lists from do it one way, sometimes the other way, and sometimes they mix it. But when I'm typing, I'm not trying to memorize the order in which the dagesh or qamats appear, or whether the accents are typed as a separate character. Whether or not I remember to type the dagesh first, I want to get the right answer, because it really is the same answer in every practical sense.
So I normalize the order before saving the flashcards and I normalize the order before checking it, and the result is that it doesn't matter which way I type it, it comes out right!
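A minimal sketch of that normalize-before-saving-and-checking approach. normalizeAnswer and checkAnswer are hypothetical names, NFC is an assumed choice of form (NFD would work equally well as long as both sides use the same one), and the code points are written as escapes so the two spellings stay distinct.

```javascript
// Hypothetical flashcard helpers: normalize when saving and when checking.
function normalizeAnswer(s) {
  return s.normalize("NFC").trim();
}

function checkAnswer(storedAnswer, typedAnswer) {
  return normalizeAnswer(storedAnswer) === normalizeAnswer(typedAnswer);
}

// Hebrew בָּא: qamats (U+05B8) and dagesh (U+05BC) typed in different orders
const hebrewQamatsFirst = "\u05D1\u05B8\u05BC\u05D0";
const hebrewDageshFirst = "\u05D1\u05BC\u05B8\u05D0";

// Greek ἀ: one precomposed code point vs. alpha + combining comma above
const greekPrecomposed = "\u1F00";
const greekCombining = "\u03B1\u0313";

console.log(hebrewQamatsFirst === hebrewDageshFirst); // false
console.log(checkAnswer(hebrewQamatsFirst, hebrewDageshFirst)); // true
console.log(checkAnswer(greekPrecomposed, greekCombining)); // true
```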
If you want to check out the results:
https://sthelenskungfu.com/flashcards/
You need a Google or Facebook account to log in, so it can track progress and such. As far as I know (or care) only my daughter and I currently use it.
It's free, but eternally in beta.

What is the best way to make a prompt into an integer in JS? [duplicate]

This question already has answers here:
How to get numeric value from a prompt box? [duplicate]
(6 answers)
I am writing a program to perform "Russian Math" (using the Numberphile YouTube video on it as the basis for my algorithm). It works. But to "prove" that it works, I'm giving the user the ability to try their own numbers as input.
When I assign numbers to the variables myself, it works without fail. When I use prompt (var numberOne = prompt('What is the first number you want to multiply?');) for one variable and assign the other myself, it also works. But as soon as I prompt the user for both numbers, it stops working. Presumably this is because a string can be converted to an integer when an operation is performed on it (multiplied by an integer), but that does not seem to work when both are strings.
Adding another line to convert each prompt variable to an integer with parseInt seems like too much extra work.
var numberOne = prompt('What is the first number you want to multiply?');
var numberTwo = prompt('What is the second number you want to multiply?');
var numberOneInt = parseInt(numberOne);
var numberTwoInt = parseInt(numberTwo);
Is this really the best way to do it?

For a prompt, yes, that is pretty much what you want to do. Prompt returns a string, so you need to convert it. There are other ways, e.g. Number(numberOne), but parseInt is fine. They have slightly different behaviors, but for your case they're mostly the same (parseInt stops parsing at the first non-digit, while Number type-casting attempts to convert the whole string).
And kudos for figuring out the edge behavior of having one string and one int multiplied together.
In general, most developers prefer using inputs on the page rather than prompts. The problem with prompts is that they interrupt the user's control of the page. As a bonus, with inputs you can set type=number to give the users number controls on some devices and limit the input to actual numbers.
Edit
I don't ever use prompt, so I was reading up on it. One thing to look out for: if the user hits Escape (or Cancel), prompt returns null, which may break your code. You could prevent that by checking for it first, e.g. if(numberOne){ ... }
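A small sketch combining both points (converting the string and handling null), written as a plain function so it can be tested without a browser. parsePromptNumber is a hypothetical name; it uses Number() for whole-string conversion and treats Cancel/Escape (null), empty input and non-numeric input as "no value".

```javascript
// Hypothetical helper for prompt() results: prompt returns a string, or null
// if the user cancels. Returns a number, or null if the input is unusable.
function parsePromptNumber(raw) {
  if (raw === null) return null; // user pressed Cancel or Escape
  const trimmed = raw.trim();
  if (trimmed === "") return null; // Number("") would silently give 0
  const n = Number(trimmed); // the whole string must be numeric
  return Number.isFinite(n) ? n : null;
}

// The difference between the two conversions mentioned above:
console.log(parseInt("12px", 10)); // 12 (stops at the first non-digit)
console.log(Number("12px")); // NaN (the whole string must be numeric)
```

In the page you would call it as parsePromptNumber(prompt('What is the first number you want to multiply?')) and check the result for null before multiplying.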

How to "unformat" a numerical string? JavaScript

So I know how to format a string or integer like 2000 to 2K, but how do I reverse it?
I want to do something like:
var string = "$2K".replace("/* K with 000 and remove $ symbol in front of 2 */");
How do I start? I am not very good at regular expressions, but I have been taking more time to learn them. If you can help, I'd certainly appreciate it. Is it possible to do the same for M for millions (adding 000000 at the end) or B for billions (adding 000000000 at the end)?

var string = "$2K".replace(/\$(\d+)K/, "$1000");
gives the string
2000
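The regex above handles only K and whole numbers. A sketch of the more general version asked about (K, M and B suffixes, optional "$", optional decimals); unformat and SUFFIX_MULTIPLIERS are hypothetical names, thousands separators such as "1,200K" are deliberately not handled, and because the result comes from a floating-point multiplication, some decimal inputs may pick up tiny rounding errors.

```javascript
// Multipliers for the supported suffixes (hypothetical table).
const SUFFIX_MULTIPLIERS = { K: 1e3, M: 1e6, B: 1e9 };

// Turn strings like "$2K", "1.5M" or "3B" back into numbers; NaN if unparseable.
function unformat(str) {
  // optional "$", digits with optional decimals, optional K/M/B (any case)
  const match = /^\$?(\d+(?:\.\d+)?)([KMB])?$/i.exec(str.trim());
  if (!match) return NaN;
  const [, digits, suffix] = match;
  const multiplier = suffix ? SUFFIX_MULTIPLIERS[suffix.toUpperCase()] : 1;
  return Number(digits) * multiplier;
}

console.log(unformat("$2K")); // 2000
console.log(unformat("1.5M")); // 1500000
```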

I'm going to take a different approach to this, as the best way to do this is to change your app so it doesn't lose the original numeric information. I recognize that this isn't always possible (for example, if you're scraping formatted values), but it could be a useful way to think about it for other users with a similar question.
Instead of just storing the numeric values or the display values (and then trying to convert back to the numeric values later on), try to update your app to store both in the same object:
var value = {numeric: 2000, display: '2K'}
console.log(value.numeric); // 2000
console.log(value.display); // 2K
The example here is a bit simplified, but if you pass around your values like this, you don't need to convert back in the first place. It also allows you to have your formatted values change based on locale, currency, or rounding, and you don't lose the precision of your original values.
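A sketch of that idea where the display string is derived from the number, so nothing ever has to be parsed back. makeValue is a hypothetical name, and the formatting assumes Intl.NumberFormat's compact notation; the exact display text depends on the runtime's locale data.

```javascript
// Formatter for compact US-dollar display strings such as "$2K".
const compactUSD = new Intl.NumberFormat("en-US", {
  style: "currency",
  currency: "USD",
  notation: "compact",
});

// Hypothetical factory: keep the raw number and derive the display from it.
function makeValue(numeric, formatter = compactUSD) {
  return { numeric, display: formatter.format(numeric) };
}

const value = makeValue(2000);
console.log(value.numeric); // 2000
console.log(value.display); // e.g. "$2K"
```

Passing a different formatter (another locale or currency) changes only the display field, never the stored number.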

jQuery zip masking for multiple formats

I have a requirement to mask a ZIP field so that it allows the classic 5-digit ZIP (XXXXX) or the 5 + 4 format (XXXXX-XXXX).
I could do something like:
$('#myZipField').mask("?99999-9999");
but the complication is that the dash should not show if the user enters only 5 digits.
This is the best I've come up with so far. I could extend it to auto-insert the dash when they type the 6th digit, but the problem would be funny behavior on deletion (I could stop them from deleting the dash, but that would be patching the patch and so forth; it becomes a nightmare):
$.mask.definitions['~']='[-]';
$("#myZipField").mask("?99999~9999", {placeholder:""});
Is there any out of the box way of doing this or do I have to roll my own?

You don't have to use a different plug-in. Just move the question mark, so that instead of:
$('#myZipField').mask("?99999-9999");
you should use:
$('#myZipField').mask("99999?-9999");
After all, it isn't the entire string which is optional, just the - and onward.

This ZIP code format is actually simple, but when you have a more complex format to handle, here is how it's solved with the plugin (from the demo page):
var options = {onKeyPress: function(cep, e, field, options){
  var masks = ['00000-000', '0-00-00-00'];
  // pick the mask from the current length (declared with var to avoid a global)
  var mask = (cep.length>7) ? masks[1] : masks[0];
  $('.crazy_cep').mask(mask, options);
}};
$('.crazy_cep').mask('00000-000', options);

If you're using jQuery already, there are probably hundreds of plugins for masks etc, for example:
http://www.meiocodigo.com/projects/meiomask/
So I don't think you'd have to roll your own.

If you use the jQuery Inputmask plugin and want to allow 4- or 5-digit ZIP values, use:
$('#myZipField').inputmask("9999[9]");

Why not make the field transparent and put a text object behind it showing the format in light grey? The user sees #####-#### in the background, and you rig it so the characters disappear as they type. That suggests they should enter a dash if they want to add the extra four digits, right? Then you could rig the script to auto-insert the hyphen if they forget it and type 6 digits.
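A framework-free sketch of the auto-insert idea, assuming the input is re-formatted on every change: because the dash is derived from the digit count rather than typed, deleting the sixth digit also removes the dash. formatZip and ZIP_PATTERN are hypothetical names.

```javascript
// Format raw input as a US ZIP (XXXXX) or ZIP+4 (XXXXX-XXXX).
// The dash only appears once a sixth digit is present.
function formatZip(raw) {
  const digits = raw.replace(/\D/g, "").slice(0, 9); // keep at most 9 digits
  return digits.length > 5 ? `${digits.slice(0, 5)}-${digits.slice(5)}` : digits;
}

// Validation for the finished value: 5 digits, optionally followed by -4 digits.
const ZIP_PATTERN = /^\d{5}(-\d{4})?$/;

console.log(formatZip("12345")); // "12345"
console.log(formatZip("123456")); // "12345-6"
```

Wiring it to a field could look like input.addEventListener("input", e => { e.target.value = formatZip(e.target.value); }), though resetting the value this way can move the caret to the end, which a production version would need to handle.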

word decoder by javascript?

Implement the “Word Decoder” game. This game will present the player with a series of scrambled words (up to 20 words) and challenge him/her to attempt to unscramble them. Each time, a new word is displayed and a text input is provided for the user to write the unscrambled word.
Once the player thinks the word has been properly decoded, he clicks on the “Check answer” button. If the player’s answer is correct, his score is increased by one. If his answer is not correct, he is notified and he is then given a different word.
I understand the question, but I don't know how to build it, or even how to start!
Any help, please?

To start, try breaking down the problem into things you'll need; think nouns and verbs. This is simply rewriting the problem in new terms. You need:
word: just a string, but it's a noun you'll need, so list it.
dictionary: a collection of words to choose from (during testing, you don't need many)
display: these become HTML elements, since you're working with JS
    scrambled word
    text input
    submit button to check answer
    score
    "wrong answer" notifier
to scramble a word
to compare words: how can you compare two words to see if one is a permutation of the other? Do it right and anagrams aren't a problem.
to check an answer
to increment score
to notify user of incorrect answer
to present a new scrambled word
Any item beginning with "to" is a verb; anything else is a noun. Nouns become objects, verbs become methods/functions.
The above is mostly a top-down approach, in contrast with bottom-up (note that top-down vs bottom-up isn't an either-or proposition). Other approaches that might help with not knowing where to start are test driven development or its offshoot, behavior driven development. With these you start by defining, in code, what the program should do, then fill in the details to make it do that.
A hint on comparing words: the problem is basically defining an equivalence class—two strings are equivalent if one is a permutation of the other. The permutations of a string, taken together, form the equivalence class for that string; two strings are in the same equivalence class if the strings are equivalent. As the linked document points out, equivalence classes are well represented by picking a single element of the class as the class representative. Lastly, you can turn the equivalence class definition around: two strings are permutations of each other if they are in the same equivalence class.
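Following that hint, one possible class representative (an assumed choice, not the only one) is the word's letters in sorted order. representative, isPermutation and anagramClasses are hypothetical helper names; the comparison is case-insensitive.

```javascript
// Class representative: the word's letters in sorted order. Two words are
// permutations of each other exactly when their representatives match.
function representative(word) {
  return Array.from(word.toLowerCase()).sort().join("");
}

function isPermutation(a, b) {
  return representative(a) === representative(b);
}

// Grouping a word list by representative collects each equivalence class,
// i.e. every set of anagrams.
function anagramClasses(words) {
  const classes = new Map();
  for (const w of words) {
    const key = representative(w);
    if (!classes.has(key)) classes.set(key, []);
    classes.get(key).push(w);
  }
  return classes;
}

console.log(isPermutation("listen", "Silent")); // true
```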

Look into loading a dictionary via XHR.
There are tons of those available online [http://www.mieliestronk.com/wordlist.html NOTE: it contains some swear words; if you're doing this for academic purposes, and since it's your homework, you should look for a "clean" list]...
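A sketch of the loading step using fetch(), the modern replacement for XHR (a swap, not what the answer literally named). It assumes a plain-text list with one English word per line; parseWordList and loadDictionary are hypothetical names, and the URL you pass is up to you.

```javascript
// Split a plain-text word list (one word per line) into a Set of lowercase words.
function parseWordList(text) {
  return new Set(
    text
      .split(/\r?\n/)
      .map((line) => line.trim().toLowerCase())
      .filter((line) => /^[a-z]+$/.test(line)) // skip blanks and non-alphabetic entries
  );
}

// Load the list over the network; the server must allow your page to request it.
async function loadDictionary(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return parseWordList(await response.text());
}
```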
For scrambling the word: make your string into a char array, then find an array shuffle function [they are simple to write, I wrote one for my implementation of Bogosort]...
function shuffle(b)
{
  var a = b.slice(); // makes a copy of b, so b won't be changed
  var result = [];
  while (a.length != 0)
  {
    // random index from 0 to a.length - 1 (Math.floor gives every index equal odds)
    var targIndex = Math.floor(a.length * Math.random());
    var value = a[targIndex];
    a.splice(targIndex, 1); // arrays have no remove() method; splice removes one element
    result.push(value);
  }
  return result;
}
When the user is done typing, simply compare the input with the answer (case-insensitive, ignoring spaces). As stated in the comments, anagrams are also possible, so be sure to check for those; perhaps you could simply verify that the word exists in the dictionary.
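A sketch pulling these suggestions together, with two substitutions named plainly: the shuffle is the standard Fisher-Yates algorithm (an in-place swap loop rather than the remove-and-push approach above), and the dictionary is assumed to be a Set of lowercase words. shuffle, scrambleWord and isCorrect are hypothetical names.

```javascript
// Fisher-Yates shuffle: returns a new array, leaves the input untouched.
function shuffle(items) {
  const result = items.slice();
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1)); // 0 <= j <= i
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

// Scramble a word, retrying so the player never sees it unscrambled
// (unless every letter is the same, in which case no other order exists).
function scrambleWord(word) {
  const letters = Array.from(word);
  if (new Set(letters).size < 2) return word;
  let scrambled;
  do {
    scrambled = shuffle(letters).join("");
  } while (scrambled === word);
  return scrambled;
}

// Accept the intended word, or any dictionary word made of the same letters
// (an anagram). Case-insensitive and ignores spaces.
function isCorrect(answer, word, dictionary) {
  const clean = (s) => s.toLowerCase().replace(/\s+/g, "");
  const letterKey = (s) => Array.from(clean(s)).sort().join("");
  const guess = clean(answer);
  if (guess === clean(word)) return true;
  return dictionary.has(guess) && letterKey(guess) === letterKey(word);
}

const dictionary = new Set(["listen", "silent", "apple"]);
console.log(isCorrect("Silent", "listen", dictionary)); // true
```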
