Why doesn't my function correctly replace when using some regex pattern

This is an extension of this SO question
I made a function to see if I can correctly format any number. The answers below work on tools like https://regex101.com and https://regexr.com/, but not within my function (tried in Node and in the browser):
const format = (num, regex) => String(num).replace(regex, '$1')
Basically, given any whole number, it should not exceed 15 significant digits. Given any decimal, it should not exceed 2 decimal places.
Now
format(0.12345678901234567890, /^\d{1,13}(\.\d{1,2}|\d{0,2})$/)
returns 0.123456789012345678 instead of 0.123456789012345
but
format(0.123456789012345,/^-?(\d*\.?\d{0,2}).*/)
returns the number formatted to 2 decimal places, as expected.

Let me try to explain what's going on.
For the given input 0.12345678901234567890 and the regex /^\d{1,13}(\.\d{1,2}|\d{0,2})$/, let's go step by step and see what's happening.
^\d{1,13} Does indeed match the start of the string 0
(\. Now you've opened a new group, and it does match .
\d{1,2} It does find the digits 1 and 2
|\d{0,2} So this part is skipped
) So this is the end of your capture group.
$ This indicates the end of the string, but it won't match, because you've still got 345678901234567890 remaining.
JavaScript returns the whole string unchanged because the match failed at the end.
Let's try removing $ at the end, to become /^\d{1,13}(\.\d{1,2}|\d{0,2})/
You'd get back ".12345678901234567890". This generates a couple of questions.
Why did the preceding 0 get removed?
Because it was not part of your matching group, enclosed with ().
Why did we not get only two decimal places, i.e. .12?
Remember that you're doing a replace. Which means that by default, the original string will be kept in place, only the parts that match will get replaced. Since 345678901234567890 was not part of the match, it was left intact. The only part that matched was 0.12.
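The three behaviours described above can be reproduced in a small sketch. It passes a string rather than a number, because String(num) may print fewer digits than the literal you typed; the third pattern (wholeString) is an illustrative assumption and not one of the patterns from the question.

```javascript
// A string input is used so the digits are exactly what we typed.
const input = '0.12345';

// Anchored with $: the pattern cannot reach the end of the string,
// so nothing matches and replace() returns the input unchanged.
const anchored = /^\d{1,13}(\.\d{1,2}|\d{0,2})$/;
const unchanged = input.replace(anchored, '$1');
console.log(unchanged); // "0.12345"

// Without $: "0.12" matches and is replaced by group 1 (".12"),
// while the unmatched tail "345" stays where it was.
const unanchored = /^\d{1,13}(\.\d{1,2}|\d{0,2})/;
const partial = input.replace(unanchored, '$1');
console.log(partial); // ".12345"

// Hypothetical fix: capture what should be kept and let .* consume
// the rest, so the whole string is part of the match.
const wholeString = /^(\d{1,13}(?:\.\d{1,2})?).*$/;
const truncated = input.replace(wholeString, '$1');
console.log(truncated); // "0.12"
```

Note that this truncates rather than rounds, which may or may not be what you want.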

Answer to title question: your function doesn't replace, because there's nothing to replace - the regex doesn't match anything in the string. csb's answer explains that in all details.
But that's perhaps not the answer you really need.
Now, it seems like you have an XY problem. You ask why your call to .replace() doesn't work, but .replace() is not the function you should be using here. The role of .replace() is to replace parts of a string, while you actually want to create a different string. Moreover, in the comments you suggest that your formatting is not only for presenting data to the user, but that you also intend to use it in further computation. You also mention cryptocurrencies.
Let's cope with these problems one-by-one.
What to do instead of replace?
Well, just produce the string you need instead of replacing something in the string you don't like. There are some edge cases. Instead of writing all-in-one regex, just handle them one-by-one.
The following code is definitely not the best possible, but its main aim is to be simple and show exactly what is going on.
function format(n) {
  const max_significant_digits = 15;
  const max_precision = 2;
  let digits_before_decimal_point;
  if (n < 0) {
    // Don't count minus sign.
    digits_before_decimal_point = n.toFixed(0).length - 1;
  } else {
    digits_before_decimal_point = n.toFixed(0).length;
  }
  if (digits_before_decimal_point > max_significant_digits) {
    throw new Error('No good representation for this number');
  }
  const available_significant_digits_for_precision =
    Math.max(0, max_significant_digits - digits_before_decimal_point);
  const effective_max_precision =
    Math.min(max_precision, available_significant_digits_for_precision);
  const with_trailing_zeroes = n.toFixed(effective_max_precision);
  if (!with_trailing_zeroes.includes('.')) {
    // No fractional part, so any trailing zeroes are significant digits.
    return with_trailing_zeroes;
  }
  // I want to keep the string and change just matching part,
  // so here .replace() is a proper method to use.
  const without_trailing_zeroes = with_trailing_zeroes.replace(/\.?0+$/, '');
  return without_trailing_zeroes;
}
So, you got the number formatted the way you want. What now?
What can you use this string for?
Well, you can display it to the user. And that's mostly it. The value was rounded to (1) represent it in a different base and (2) fit in limited precision, so it's pretty much useless for any computation. And, BTW, why would you convert it to String in the first place, if what you want is a number?
Was the value you are trying to print ever useful in the first place?
Well, that's the most serious question here. Because, you know, floating point numbers are tricky. And they are absolutely abysmal for representing money. So, most likely the number you are trying to format is already a wrong number.
What to use instead?
Fixed-point arithmetic is the most obvious answer, and it works most of the time. However, it's pretty tricky in JS, where a number may slip into floating-point representation at almost any time. So it's better to use a decimal arithmetic library. Alternatively, switch to a language that has built-in bignums and decimals, like Python.
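To make the fixed-point idea concrete, here is a minimal sketch that keeps amounts as BigInt counts of the smallest unit. Using BigInt is my assumption (the answer does not name it), and SCALE, toUnits and fromUnits are hypothetical names invented for this illustration; a real decimal library would also cover multiplication, division and rounding.

```javascript
// Assumed number of decimal places kept (2, like cents).
const SCALE = 2;

// Parse a decimal string such as "12.5" into a BigInt number of units.
function toUnits(text) {
  const match = /^(-?)(\d+)(?:\.(\d+))?$/.exec(text);
  if (!match) throw new Error(`Not a decimal number: ${text}`);
  const [, sign, whole, frac = ''] = match;
  if (frac.length > SCALE) throw new Error(`Too many decimal places: ${text}`);
  const units = BigInt(whole + frac.padEnd(SCALE, '0'));
  return sign === '-' ? -units : units;
}

// Format a BigInt number of units back into a decimal string.
function fromUnits(units) {
  const negative = units < 0n;
  const digits = (negative ? -units : units).toString().padStart(SCALE + 1, '0');
  return `${negative ? '-' : ''}${digits.slice(0, -SCALE)}.${digits.slice(-SCALE)}`;
}

console.log(0.1 + 0.2); // 0.30000000000000004
const sum = fromUnits(toUnits('0.1') + toUnits('0.2'));
console.log(sum); // "0.30"
```

The values never pass through a floating-point number: they go from string to integer and back to string.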

Related

How can I convert this UTF-8 string to plain text in javascript and how can a normal user write it in a textarea [duplicate]

While reviewing JavaScript concepts, I found String.normalize(). This is not something that shows up in W3School's "JavaScript String Reference", and, hence, it is the reason I might have missed it before.
I found more information about it in HackerRank which states:
Returns a string containing the Unicode Normalization Form of the
calling string's value.
With the example:
var s = "HackerRank";
console.log(s.normalize());
console.log(s.normalize("NFKC"));
having as output:
HackerRank
HackerRank
Also, in GeeksForGeeks:
The string.normalize() is an inbuilt function in javascript which is
used to return a Unicode normalisation form of a given input string.
with the example:
<script>
// Taking a string as input.
var a = "GeeksForGeeks";
// calling normalize function.
b = a.normalize('NFC')
c = a.normalize('NFD')
d = a.normalize('NFKC')
e = a.normalize('NFKD')
// Printing normalised form.
document.write(b +"<br>");
document.write(c +"<br>");
document.write(d +"<br>");
document.write(e);
</script>
having as output:
GeeksForGeeks
GeeksForGeeks
GeeksForGeeks
GeeksForGeeks
Maybe the examples given are just really bad as they don't allow me to see any change.
I wonder... what's the point of this method?

It depends on what you will do with the strings: often you do not need it (if you are just taking input from the user and showing it back to the user). But to check, search, or use such strings as keys, you may want a unique way to identify the same string (semantically speaking).
The main problem is that you may have two strings which are semantically the same but have two different representations: e.g. one with an accented character [one code point], and one with a base character combined with an accent [one code point for the character, one for the combining accent]. Users may not be in control of how the input text is sent, so you may end up with two different user names, or two different passwords. Also, if you process the data, you may get different results depending on the initial string. Users do not like that.
Another problem is the order of combining characters. You may have an accent and a lower tail (e.g. a cedilla), and you can express this with several combinations: "pure char, tail, accent", "pure char, accent, tail", "char+tail, accent", "char+accent, cedilla".
And you may have degenerate cases (especially if you type from a keyboard): you may get code points which should be removed (you may have an arbitrarily long string which is equivalent to one of just a few bytes).
In any case, for sorting strings, you (or your library) require a normalized form: if you already provide the right one, the library will not need to transform it again.
So: you want the same (semantically speaking) string to have the same sequence of Unicode code points.
Note: if you are working directly on UTF-8, you should also take care of the special cases of UTF-8: the same code point could be written in different ways [using more bytes]. This could also be a security problem.
The K forms are often used for "searches" and similar tasks: CO2 and CO₂ will be interpreted in the same manner. But this can change the meaning of the text, so they should usually be used only internally, for temporary tasks, while keeping the original text.
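As a rough illustration of the point about the K forms, the sketch below builds a temporary search key with NFKC and leaves the stored text untouched. The searchKey helper and the lowercasing step are assumptions added for the example.

```javascript
// Hypothetical helper: a throwaway key for matching, never shown to the user.
function searchKey(text) {
  return text.normalize('NFKC').toLowerCase();
}

const stored = 'CO\u2082 emissions'; // "CO₂ emissions", with U+2082 SUBSCRIPT TWO
const query = 'co2';

const found = searchKey(stored).includes(searchKey(query));
console.log(searchKey(stored)); // "co2 emissions"
console.log(found);             // true

// NFC alone does not fold the subscript, so the original text is preserved.
console.log(stored.normalize('NFC') === stored); // true
```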

As stated in the MDN documentation, String.prototype.normalize() returns the Unicode Normalization Form of the string. This is because in Unicode, some characters can be represented by more than one sequence of code points.
This is the example (taken from MDN):
const name1 = '\u0041\u006d\u00e9\u006c\u0069\u0065';
const name2 = '\u0041\u006d\u0065\u0301\u006c\u0069\u0065';
console.log(`${name1}, ${name2}`);
// expected output: "Amélie, Amélie"
console.log(name1 === name2);
// expected output: false
console.log(name1.length === name2.length);
// expected output: false
const name1NFC = name1.normalize('NFC');
const name2NFC = name2.normalize('NFC');
console.log(`${name1NFC}, ${name2NFC}`);
// expected output: "Amélie, Amélie"
console.log(name1NFC === name2NFC);
// expected output: true
console.log(name1NFC.length === name2NFC.length);
// expected output: true
As you can see, the string Amélie has two different Unicode representations. With normalization, we can reduce the two forms to the same string.

Very beautifully explained here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize
Short answer: characters are represented through an encoding scheme such as ASCII or UTF-8 (we mostly use UTF-8), and some characters have more than one representation. So two strings may render identically while their Unicode code points differ, and a string comparison may fail. We use normalize() to get a single, consistent representation.
// source from MDN
let string1 = '\u00F1'; // ñ
let string2 = '\u006E\u0303'; // ñ
string1 = string1.normalize('NFC');
string2 = string2.normalize('NFC');
console.log(string1 === string2); // true
console.log(string1.length); // 1
console.log(string2.length); // 1

Normalization of strings isn't exclusive to JavaScript; Python, for instance, has it too. The valid values for the argument are defined by Unicode (more on Unicode normalization).
When it comes to JavaScript, note that there's documentation under both String.normalize() and String.prototype.normalize(). As @ChrisG mentions
String.prototype.normalize() is correct in a technical sense, because
normalize() is a dynamic method you call on instances, not the class
itself. The point of normalize() is to be able to compare Strings that
look the same but don't consist of the same characters, as shown in
the example code on MDN.
Then, when it comes to its usage, I found a great example of String.normalize() in use:
let s1 = 'sabi\u00e1'; // "sabiá" with a precomposed á (NFC)
let s2 = 'sabia\u0301'; // "sabiá" as a + combining acute accent (NFD)
// one is in NFC, the other in NFD, so they're different
console.log(s1 == s2); // false
// with normalization, they become the same
console.log(s1.normalize('NFC') === s2.normalize('NFC')); // true
// transform string into array of codepoints
function codepoints(s) { return Array.from(s).map(c => c.codePointAt(0).toString(16)); }
// printing the codepoints you can see the difference
console.log(codepoints(s1)); // [ "73", "61", "62", "69", "e1" ]
console.log(codepoints(s2)); // [ "73", "61", "62", "69", "61", "301" ]
So while sabiá and sabiá in this example look the same to the human eye, and would even if we printed them with console.log(), comparing them without normalization tells us they are different. Then, by analyzing the code points, we can see exactly how they differ.

There are some great answers here already, but I wanted to throw in a practical example.
I enjoy Bible translation as a hobby. I wasn't too thrilled at the flashcard option out there in the wild in my price range (free) so I made my own. The problem is, there is more than one way to do Hebrew and Greek in Unicode to get the exact same thing. For example:
בָּא
בָּא
These should look identical on your screen, and for all practical purposes they are identical. However, the first was typed with the qamats (the little t shaped thing under it) before the dagesh (the dot in the middle of the letter) and the second was typed with the dagesh before the qamats. Now, since you're just reading this, you don't care. And your web browser doesn't care. But when my flashcards compare the two, then they aren't the same. To the code behind the scenes, it's no different than saying "center" and "centre" are the same.
Similarly, in Greek:
ἀ
ἀ
These two should look nearly identical, but the top is one Unicode character and the second one is two Unicode characters. Which one is going to end up typed in my flashcards is going to depend on which keyboard I'm sitting at.
When I'm adding flashcards, believe it or not, I don't always type in vocab lists of 100 words. That's why God gave us spreadsheets. And sometimes the places I'm importing the lists from do it one way, and sometimes they do it the other way, and sometimes they mix it. But when I'm typing, I'm not trying to memorize the order in which the dagesh or qamats appear, or whether the accents are typed as a separate character or not. Regardless of whether I remember to type the dagesh first, I want to get the right answer, because really it's the same answer in every practical sense either way.
So I normalize the order before saving the flashcards and I normalize the order before checking it, and the result is that it doesn't matter which way I type it, it comes out right!
If you want to check out the results:
https://sthelenskungfu.com/flashcards/
You need a Google or Facebook account to log in, so it can track progress and such. As far as I know (or care) only my daughter and I currently use it.
It's free, but eternally in beta.
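The comparison described in this answer might look roughly like the sketch below. The escape sequences spell out the two typing orders explicitly, and sameAnswer is a hypothetical helper, not the code the flashcard site actually runs.

```javascript
// bet + qamats + dagesh + alef versus bet + dagesh + qamats + alef
const qamatsFirst = '\u05D1\u05B8\u05BC\u05D0';
const dageshFirst = '\u05D1\u05BC\u05B8\u05D0';

// Greek: one precomposed character versus alpha + combining comma above
const precomposed = '\u1F00';
const combined = '\u03B1\u0313';

// Hypothetical check: normalize both sides before comparing.
function sameAnswer(typed, expected) {
  return typed.normalize('NFC') === expected.normalize('NFC');
}

console.log(qamatsFirst === dageshFirst);           // false
console.log(sameAnswer(qamatsFirst, dageshFirst));  // true
console.log(precomposed.length, combined.length);   // 1 2
console.log(sameAnswer(precomposed, combined));     // true
```

Normalizing once when saving and once when checking means the stored cards and the typed answers always end up in the same form.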

Regex - creating an input/textarea that correctly interprets numbers

I'm designing a conversion website where I perform calculations on entered numbers, and I need my input or textarea to receive and interpret numbers entered in different fashions
like:
Entry = 3,000,000.1111
Interpreted value = 3000000.1111
or
Entry = 3000000.1111
Interpreted value = 3000000.1111
and I want to include a second input for European decimal notation
(or if possible have the same input do both)
Entry = 3.000.000,1111 (the comma acts as the decimal point, the period as the separator)
Interpreted value = 3000000.1111
I wonder how I could do this. I suspect from some of my research that I could use regex.
Also, should I use an input or a textarea? I want to limit the size of the number to 40 places.
It seems the textarea I'm currently using won't recognize any values after a comma when a comma is used. I realized this is due to parseFloat, so I need to remove the commas using .replace() before parsing. But what do I do in the case of European notation, where the comma IS the decimal point? I suspect I should use regex to identify whether a number is in comma-decimal notation or standard decimal-point notation, and then apply the appropriate replacement based on that. Any ideas how to write a regex to identify a number between .0000000001 and 1,000,000,000,000,000 by only the separator and decimal point? What about when the entry uses neither, 12000 for example? Any help with this would be appreciated. I'm using HTML5 and JavaScript. I am not using a form and am new at this. This is my first web page, so please be patient with my questions.
I was thinking about this:
input = //value from textarea as a string
if (/REGEX which determines that the structure of the number is N,NNN.NN/.test(input)) {
  input = input.replace(/,/g, ""); //replace all the commas with nothing
}
else if (/REGEX which determines that the structure of the number is N.NNN,NN/.test(input)) {
  input = input.replace(/\./g, ""); //replace all the decimal point separators with nothing
  input = input.replace(/,/, "."); //replace the comma decimal with a point decimal
}
else {
  //input unchanged assuming is NNNN without decimal
}
number = parseFloat(input);
I want to keep the possibility open for them to enter large numbers and also to use numbers less than one to 10 decimal places. Thanks to those who contributed.
Best,RP

I believe this should handle everything:
^[1-9](?:\d{0,2}(?:([,.])\d{3})*|\d+)(?:(?!\1)[,.]\d+)?$
You're treading on complicated territory here. Also, the above RegEx does not allow for values less than "1".
Basically, the RegEx does the following:
Allows for no thousands separators ("," or ".") but ensures that, if they are used, they occur in the correct places.
Allows for either "," or "." to be used as the thousands or the cents separator, but ensures that the cents separator is not the same as the thousands separator.
Requires the string equivalent number to begin with any digit other than "0".
To implement this you could attach an event listener to your form element(s) and use JS to do a simple .test.
After reading further, I think I misinterpreted your goal originally. I assumed you simply wanted to validate these values with a RegEx. I also assumed you're trying to work with currency (ie. two decimal places). However, fret not! You can still utilize my original answer if you really want.
You mentioned input and textarea, which are both form elements. You can attach a listener to these element(s) for the input, change, and/or keyup events. As part of the callback you can run the .test method or some other functionality. Personally, I would rethink how you want to handle input. Also, what's your actual goal here? Do you really need to know the thousands separator or keep track of it? Why not just disallow any characters other than digits and the one decimal point/comma?
Also, parsing numbers like .0000000001 as a float is a terrible idea. You will lose precision very quickly if you do any sort of calculation such as multiplication, division, power, etc. You're going to have to figure out a different method, such as storing the digits to the right of the decimal point separately, as integers, and going from there.
I can help you if you describe what you're trying to do in better detail.
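One possible way to interpret the entries from the question without a single large regex is sketched below. The function name normalizeNumberText and its rules are assumptions made for illustration: when both separators appear, the last one is taken as the decimal mark; a separator that appears exactly once is also taken as the decimal mark (so "1,000" is read as 1.000, which may not be what every user means); a separator repeated on its own is treated as grouping. It returns a plain digit string so the caller can decide how to do the arithmetic.

```javascript
// Hypothetical helper: returns a canonical "1234.56" style string, or null.
function normalizeNumberText(text) {
  const s = text.trim();
  // Only digits and separators are allowed, and the last character must be a digit.
  if (!/^[\d.,]*\d$/.test(s)) return null;

  const commas = s.split(',').length - 1;
  const dots = s.split('.').length - 1;

  let decimal = null;
  if (commas > 0 && dots > 0) {
    decimal = s.lastIndexOf(',') > s.lastIndexOf('.') ? ',' : '.';
  } else if (commas === 1) {
    decimal = ','; // assumption: a lone comma is a decimal mark
  } else if (dots === 1) {
    decimal = '.'; // assumption: a lone period is a decimal mark
  }

  if (decimal === null) {
    // No separator, or one separator repeated: strip grouping marks.
    return s.replace(/[.,]/g, '');
  }
  const cut = s.lastIndexOf(decimal);
  return s.slice(0, cut).replace(/[.,]/g, '') + '.' + s.slice(cut + 1);
}

console.log(normalizeNumberText('3,000,000.1111')); // "3000000.1111"
console.log(normalizeNumberText('3.000.000,1111')); // "3000000.1111"
console.log(normalizeNumberText('12000'));          // "12000"
console.log(normalizeNumberText('.0000000001'));    // ".0000000001"
```

Passing the result to Number() gives a float, with the precision caveats mentioned above; keeping the string lets you hand it to a decimal library instead.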

Getting the numeric value after the hyphen in a string

How can I extract and get just the numeric value after the hyphen in a string?
Here is the input string:
var x = "-2147467259"
After some processing.... return:
alert(2147467259)
How do I accomplish this?

You could replace away the hyphen:
alert(+x.replace("-", ""));
And yes, the + is important. It converts a string to a number; so you're removing the hyphen by replacing it with nothing, and then essentially casting the result of that operation into a number. This operation will also work if no hyphen is present.
You could also use substr to achieve this:
alert(+x.substr(1));
You could also use parseInt to convert the string to a number (which will end up negative if a hyphen is present), and then find its absolute value:
alert(Math.abs(parseInt(x, 10)));
As Bergi notes, if you can be sure that the first character in the string is always a hyphen, you can simply return its negative, which will by default cast the value into a number and then negate it:
alert(-x);
You could also check whether the number is negative or positive via a ternary operator and then perform the respective operation on it to ensure that it is a positive Number:
x = x >= 0 ? +x : -x;
This may be cheaper in terms of performance than using Math.abs, but the difference will be minuscule either way.
As you can see, there really are a variety of ways to achieve this. I'd recommend reading up on JavaScript string functions and number manipulation in general, as well as examining JavaScript's Math object to get a feel for what tools are available to you when you go to solve a problem.

How about:
Math.abs(parseInt("-2147467259"))
Or
"-2147467259".replace('-','')
or
"-2147467259".replace(/\-/,'')
Option #1 converts the string to a number. Option #2 removes the first - from the string (with a string pattern, replace() only replaces the first occurrence). Option #3 does the same with a regular expression; it isn't necessary for this example, but I wanted to show that a RegEx can be used with replace().
If you need a number as the final value, #1 is your choice; if you need a string, #2 is your choice.
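A small sketch of how the replace() variants behave when more than one hyphen is present. The sample string '-21-47' is made up for the demonstration; the last line uses the string from the question.

```javascript
const sample = '-21-47';

// A string pattern and a regex without the g flag both stop after the first match.
const firstOnlyString = sample.replace('-', '');
const firstOnlyRegex = sample.replace(/-/, '');
// The g flag removes every hyphen.
const allRemoved = sample.replace(/-/g, '');

console.log(firstOnlyString); // "21-47"
console.log(firstOnlyRegex);  // "21-47"
console.log(allRemoved);      // "2147"

// Anchoring with ^ removes only a leading hyphen, then Number() converts.
const value = Number('-2147467259'.replace(/^-/, ''));
console.log(value); // 2147467259
```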

How to subtract 2 chars in JavaScript to get the difference in ASCII

alert('g' - 'a') returns Not a Number (NaN).
But I expect to get the difference between the ASCII codes, as in alert(103-97) => alert(6). Hence 6 should be the output.
In C, int i = 'g' - 'a', will give i = 6.
How to achieve this subtraction of 2 characters in javascript? (easily without much effort as below)
alert("g".charCodeAt(0) - "a".charCodeAt(0)) is giving 6.
Application : I am using this in chess program.

The only practicable way to do as you want is the way you've already suggested:
alert('g'.charCodeAt(0) - 'a'.charCodeAt(0));
As you know, this retrieves the character code of the 0th character of each string, and subtracts the second from the first.
Unfortunately, this is about the only way to retrieve the ASCII code of a given character. Wrapping it in a function would make it somewhat simpler, though given the brevity of the charCodeAt() solution, not by much.
References:
String.charCodeAt().

JavaScript doesn't treat characters as numbers; they are single-character strings instead. So the subtraction operator will calculate Number('g') - Number('a'), and both operands are NaN.
You should do 'g'.charCodeAt(0) - 'a'.charCodeAt(0) (there is no better way, but you can wrap it in a function)

You can write yourself a custom function. Something like this:
function asciiDif(a,b) {
  return a.charCodeAt(0) - b.charCodeAt(0);
}
And then:
alert(asciiDif('g','a'));
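Since the question mentions a chess program, the same subtraction can turn a square name into board indices. This is only a sketch: squareToCoords and the zero-based { file, rank } shape are assumptions for illustration, not anything stated in the question.

```javascript
// Hypothetical helper: "g1" -> { file: 6, rank: 0 }, both zero-based.
function squareToCoords(square) {
  if (square.length !== 2) throw new RangeError(`Not a square: ${square}`);
  const file = square.charCodeAt(0) - 'a'.charCodeAt(0); // 'a' -> 0 ... 'h' -> 7
  const rank = square.charCodeAt(1) - '1'.charCodeAt(0); // '1' -> 0 ... '8' -> 7
  if (file < 0 || file > 7 || rank < 0 || rank > 7) {
    throw new RangeError(`Not a square: ${square}`);
  }
  return { file, rank };
}

const g1 = squareToCoords('g1');
console.log(g1.file, g1.rank); // 6 0
```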

Find numbers at a specific position

I'm trying to find an expression for JavaScript which gives me the two characters at a specific position.
It's always the same call, so it may not be too complicated.
I always have a 10-character-long number and I want to replace the first two digits, the two at places 3 and 4, or the two at places 5 and 6, and so on.
So far I've done this:
number.replace(/\d{2}/, index);
This replaces my first 2 digits with 2 other digits.
But now I want to include some variables for the position at which the digits should be replaced, something like:
number.replace(/\d{atposx,atposx+1}/, index);
that means:
01234567891
and I want sometimes to replace 01 with 02 and sometimes 23 with 56.
(or something like this with other numbers).
I hope I pointed out what I want.

This function works fine:
function replaceChars(input, startPos, replacement){
  return input.substring(0,startPos) +
    replacement +
    input.substring(startPos+replacement.length);
}
Usage:
replaceChars("0123456789",2,"55") // output: 0155456789
Live example: http://jsfiddle.net/FnkpT/

Numbers are fairly easily interpreted as strings in JS. So, if you're working with an actual number (e.g. 9876543210) and not a number that's represented by a string (e.g. '9876543210'), just turn the number into a string (''.concat(number);) and don't limit yourself to what you can do with just numbers.
Both of the above examples are fine (bah, they beat me to it), but you can even think about it like this:
var numberString = ''.concat(number);
var numberChunks = numberString.match(/(\d{2})/g);
You've now got an array of chunks that you can either walk through, switch through, or whatever other kind of flow you want to follow. When you're done, just say...
numberString = numberChunks.join('');
number = parseInt(numberString, 10);
You've got your number back as a native number (or skip the last part to just get the string back). Aside from that, the more replacements you make in the number, the more efficient it is to break it up into chunks and deal with the chunks. I did a quick test: running the 'replaceChars' function was faster for a single change, but slower than splitting into an array if you're making two or more changes to the data.
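Put together, the chunk approach might look like the sketch below. replaceChunks and the shape of its changes argument are hypothetical, and the code assumes a digit string of even length (a trailing odd digit would be dropped by the match).

```javascript
// Hypothetical helper: changes is a list of [chunkIndex, replacement] pairs.
function replaceChunks(value, changes) {
  // Split the digits into two-character chunks: "0123..." -> ["01", "23", ...]
  const chunks = String(value).match(/\d{2}/g) ?? [];
  for (const [chunkIndex, replacement] of changes) {
    chunks[chunkIndex] = replacement;
  }
  return chunks.join('');
}

// Replace "01" with "02" and "23" with "56" in one pass.
const updated = replaceChunks('0123456789', [[0, '02'], [1, '56']]);
console.log(updated); // "0256456789"
```

The result is kept as a string here, since converting it back with parseInt would drop a leading zero.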
Hope that makes sense!

You can try this
function replaceAtIndex(str,value,index) {
  return str.substr(0,index)+value+str.substr(index+value.length);
}
replaceAtIndex('0123456789','X',3); // returns "012X456789"
replaceAtIndex('0123456789','XY',3); // returns "012XY56789"
