I was wondering how JavaScript interprets numeric strings (e.g. "2"). For instance:
var car = { color: "red", model: "370"};
var anotherAge = ++(car.model) + 2;
or
var b=+"1" + 2; // returns 3
How does JavaScript actually evaluate ++ on a string such as "370"? I am looking for the way JavaScript behaves. What happens under the hood?
It basically works like this:
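var anotherAge = ++(car.model) + 2;
//               ^^^^^^^^^^^^^
//               "370" is converted to the number 370 and incremented to 371
//               (car.model now holds 371), so anotherAge is 373
var b=+"1" + 2;
//    ^
//    unary + converts "1" to the number 1, so b is 3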
Once the string has been converted to a number, the remaining + sees two numbers and performs numeric addition rather than string concatenation.
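A minimal runnable sketch of both cases, using console.log in place of any particular output method. The variable c is added only for contrast and is not part of the original question:

```javascript
// ++ and unary + convert a numeric string to a number before doing arithmetic.
var car = { color: "red", model: "370" };
var anotherAge = ++(car.model) + 2; // "370" -> 370 -> 371 (stored back), then 371 + 2
console.log(anotherAge, typeof car.model); // anotherAge is 373; car.model is now a number

var b = +"1" + 2; // unary plus: "1" -> 1, then 1 + 2 is 3
var c = "1" + 2;  // no conversion of "1": the number 2 is concatenated, giving "12"
console.log(b, c);
```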
I am trying to add numbers stored as strings using basic math. I first set the value in local storage to "0" and then add "1" to it each time. I feel I am on the right path, but when I run this, instead of 0 + 1 = 1 I get "01" in local storage. I want to add 1 to the stored value each time: 0 + 1 gives 1, the next time 1 + 1 gives 2, then 2 + 1 gives 3, and so on.
// sets "points" to 0 when the user first loads the page
if (localStorage.getItem("points") === null) {
    localStorage.setItem("points", "0");
}
// get points
var totalPoints = localStorage.getItem("points");
// add 1 point to the existing total
var addPoint = totalPoints + "1";
// set new total
localStorage.setItem("points", addPoint);
You can convert a string to a number in several ways (not an exhaustive list):
var n = s * 1; // s is the string
var n = s - 0;
var n = parseFloat(s);
var n = Number(s);
var n = ~~s; // force to 32-bit integer
var n = parseInt(s, 10); // also integer, precise up to 53 bits
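The conversions above do not agree on every input. A small comparison sketch; convertAll is a hypothetical helper written only for this illustration:

```javascript
// Compare several string-to-number conversions on the same input.
function convertAll(s) {
  return {
    times1: s * 1,             // same as Number(s) for strings: the whole string must be numeric
    parseFloat: parseFloat(s), // reads the longest leading numeric prefix
    number: Number(s),         // "" and whitespace-only strings become 0
    doubleTilde: ~~s,          // ToNumber, then truncate to a 32-bit integer; NaN becomes 0
    parseInt: parseInt(s, 10)  // leading integer prefix only
  };
}

["42", "3.75", "12px", ""].forEach(function (s) {
  console.log(JSON.stringify(s), convertAll(s));
});
```

For "12px" the prefix parsers return 12 while Number(s) and s * 1 return NaN; for the empty string it is the other way round.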
Convert your strings to numbers when you fetch them from local storage, do the math, and then put the results back.
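Applied to the counter from the question, a minimal sketch might look like this. It assumes only the Web Storage getItem/setItem interface; addPoint is a hypothetical name, and the storage object is passed in so the function can also be exercised with a stand-in object outside the browser:

```javascript
// Increment a counter that Web Storage keeps as a string.
// addPoint is a hypothetical helper; in a browser, pass window.localStorage.
function addPoint(storage) {
  var stored = storage.getItem("points");                 // a string, or null if never set
  var total = (stored === null ? 0 : Number(stored)) + 1; // convert first, then add
  storage.setItem("points", String(total));               // storage holds strings only
  return total;
}

// In a browser: addPoint(localStorage); // stored value goes "1", "2", "3", ...
```

Because getItem returns null for a missing key, the separate "set to 0 on first load" step becomes optional.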
Edit: the thing to keep in mind is that + is more "interesting" than the other arithmetic operators because it also has a meaning for string operands. In fact, JavaScript prefers the string interpretation of +: if there is a string on one side and a number on the other, the operation is string concatenation, not arithmetic addition.
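A short sketch of that asymmetry, using a value like the one read back from local storage in the question:

```javascript
// + concatenates when either operand is a string; -, * and / always convert to numbers.
var points = "0";               // e.g. a value returned by localStorage.getItem
var concatenated = points + 1;  // "01"
var subtracted = points - 1;    // -1
var multiplied = "5" * 2;       // 10
var added = Number(points) + 1; // 1: convert first, then add
console.log(concatenated, subtracted, multiplied, added);
```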
var s= new Date().getHours(); // 20
var m= new Date().getMinutes(); // 38
document.write(s,m); // returns 2038
var time = s,m;
document.write(time); // returns only 20
var time = s+m;
document.write(time); // returns 58
How can I declare a time variable that holds 2038 using the s and m variables?
Cast one to a string.
var time = ''+s+m;
document.write(time);
You have to make sure JavaScript knows it's a string first.
var time = "" + s + m;
The longer answer is that JavaScript applies some rules when it sees code like this.
In this case document.write(s,m) is just outputting each value:
var s= new Date().getHours(); // 20
var m= new Date().getMinutes(); // 38
document.write(s,m); // returns 2038
The following declares two variables (separated by a comma), time and m, and assigns the value of s to time. Under JavaScript's scoping rules you can declare e.g. var m as often as you like in a function and it will always refer to the same variable m, so m keeps its value:
var time = s,m;
document.write(time); // returns only 20
JavaScript uses + for both addition and concatenation. If either operand is a string, + concatenates; otherwise it adds the operands as numbers. In this case s and m are both numbers, so it performs addition:
var time = s+m;
document.write(time); // returns 58
The usual JavaScript idiom for this is to put an empty string at the beginning of the expression. Because + is evaluated left to right, '' + s produces the string "20", and adding m to a string concatenates it as well:
var time = '' + s + m;
document.write(time); // returns 2038
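Because + is evaluated left to right, the position of the empty string matters. The sketch below uses fixed values in place of getHours()/getMinutes(). hhmm is a hypothetical helper that also zero-pads both parts (String.prototype.padStart requires ES2017); plain concatenation does not pad, so 20:05 would otherwise come out as "205":

```javascript
var s = 20, m = 38;          // stand-ins for getHours() and getMinutes()
var prefixed = '' + s + m;   // ('' + 20) is "20", then "20" + 38 is "2038"
var suffixed = s + m + '';   // (20 + 38) is 58, then 58 + '' is "58"
console.log(prefixed, suffixed);

// Hypothetical helper: pad hours and minutes to two digits each.
function hhmm(hours, minutes) {
  return String(hours).padStart(2, "0") + String(minutes).padStart(2, "0");
}
console.log(hhmm(20, 38), hhmm(9, 5));
```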
I'm working on a Twitter app and just stumbled into the world of UTF-8 and UTF-16. It seems most JavaScript string functions are as blind to surrogate pairs as I was. I've got to recode some things to make them aware of wide characters.
I've got this function to parse strings into arrays while preserving the surrogate pairs. Then I'll recode several functions to deal with the arrays rather than strings.
function sortSurrogates(str) {
    var cp = []; // array to hold code points
    while (str.length) { // loop until we've done the whole string
        if (/[\uD800-\uDBFF]/.test(str.substr(0, 1))) { // test the first character
            // high surrogate found, so a low surrogate follows
            cp.push(str.substr(0, 2)); // push the pair onto the array
            str = str.substr(2);       // clip the pair off the string
        } else { // BMP code point
            cp.push(str.substr(0, 1)); // push one unit onto the array
            str = str.substr(1);       // clip it off the string
        }
    } // loop
    return cp; // return the array
}
My question is: is there something simpler I'm missing? Many people repeat that JavaScript handles UTF-16 natively, yet my testing leads me to believe that UTF-16 may be the storage format, but the string functions don't know it yet. Am I missing something simple?
EDIT:
To help illustrate the issue:
var a = "0123456789"; // U+0030 - U+0039 2 bytes each
var b = "𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡"; // U+1D7D8 - U+1D7E1 4 bytes each
alert(a.length); // javascript shows 10
alert(b.length); // javascript shows 20
Twitter sees and counts both of those as being 10 characters long.
Javascript uses UCS-2 internally, which is not UTF-16. It is very difficult to handle Unicode in Javascript because of this, and I do not suggest attempting to do so.
As for what Twitter does, you seem to be saying that it is sanely counting by code point not insanely by code unit.
Unless you have no choice, you should use a programming language that actually supports Unicode, and which has a code-point interface, not a code-unit interface. Javascript isn't good enough for that as you have discovered.
It has The UCS-2 Curse, which is even worse than The UTF-16 Curse, which is already bad enough. I talk about all this in my OSCON talk, "Unicode Support Shootout: The Good, the Bad, & the (mostly) Ugly".
Due to its horrible Curse, you have to hand-simulate UTF-16 with UCS-2 in Javascript, which is simply nuts.
Javascript suffers from all kinds of other terrible Unicode troubles, too. It has no support for graphemes, normalization, or collation, all of which you really need. And its regexes are broken, sometimes due to the Curse, sometimes just because people got it wrong. For example, Javascript is incapable of expressing a regex character class whose range runs between two astral (non-BMP) characters. Javascript doesn't even support casefolding, so you can't write a pattern like /ΣΤΙΓΜΑΣ/i and have it correctly match στιγμας.
You can try to use the XRegExp plugin, but you won't banish the Curse that way. Only changing to a language with Unicode support will do that, and Javascript just isn't one of those.
I've knocked together the starting point for a Unicode string handling object. It creates a function called UnicodeString() that accepts either a JavaScript string or an array of integers representing Unicode code points and provides length and codePoints properties and toString() and slice() methods. Adding regular expression support would be very complicated, but things like indexOf() and split() (without regex support) should be pretty easy to implement.
var UnicodeString = (function() {
    function surrogatePairToCodePoint(charCode1, charCode2) {
        return ((charCode1 & 0x3FF) << 10) + (charCode2 & 0x3FF) + 0x10000;
    }
    function stringToCodePointArray(str) {
        var codePoints = [], i = 0, charCode;
        while (i < str.length) {
            charCode = str.charCodeAt(i);
            if ((charCode & 0xF800) == 0xD800) {
                codePoints.push(surrogatePairToCodePoint(charCode, str.charCodeAt(++i)));
            } else {
                codePoints.push(charCode);
            }
            ++i;
        }
        return codePoints;
    }
    function codePointArrayToString(codePoints) {
        var stringParts = [];
        for (var i = 0, len = codePoints.length, codePoint, offset, codePointCharCodes; i < len; ++i) {
            codePoint = codePoints[i];
            if (codePoint > 0xFFFF) {
                offset = codePoint - 0x10000;
                codePointCharCodes = [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)];
            } else {
                codePointCharCodes = [codePoint];
            }
            stringParts.push(String.fromCharCode.apply(String, codePointCharCodes));
        }
        return stringParts.join("");
    }
    function UnicodeString(arg) {
        if (this instanceof UnicodeString) {
            this.codePoints = (typeof arg == "string") ? stringToCodePointArray(arg) : arg;
            this.length = this.codePoints.length;
        } else {
            return new UnicodeString(arg);
        }
    }
    UnicodeString.prototype = {
        slice: function(start, end) {
            return new UnicodeString(this.codePoints.slice(start, end));
        },
        toString: function() {
            return codePointArrayToString(this.codePoints);
        }
    };
    return UnicodeString;
})();
var ustr = UnicodeString("f\uD835\uDFD8\uD835\uDFD9bar"); // "f", two non-BMP characters (U+1D7D8, U+1D7D9), "bar"
document.getElementById("output").textContent = "String: '" + ustr + "', length: " + ustr.length + ", slice(2, 4): " + ustr.slice(2, 4);
<div id="output"></div>
Here are a couple scripts that might be helpful when dealing with surrogate pairs in JavaScript:
ES6 Unicode shims for ES3+ adds the String.fromCodePoint and String.prototype.codePointAt methods from ECMAScript 6. The ES3/5 fromCharCode and charCodeAt methods do not account for surrogate pairs and therefore give wrong results.
Full 21-bit Unicode code point matching in XRegExp with \u{10FFFF} allows matching any individual code point in XRegExp regexes.
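On engines that support ES2015, String.fromCodePoint and String.prototype.codePointAt are built in, so the shim is only needed for older environments. A minimal sketch; codePointsOf is a hypothetical helper:

```javascript
var sample = "a\uD835\uDFD8b"; // "a", U+1D7D8 (stored as a surrogate pair), "b"
console.log(sample.length);                      // 4 code units
console.log(sample.charCodeAt(1).toString(16));  // d835: only the high surrogate
console.log(sample.codePointAt(1).toString(16)); // 1d7d8: the full code point
console.log(String.fromCodePoint(0x1D7D8) === "\uD835\uDFD8"); // true

// Hypothetical helper: list code points, stepping over the low half of each pair.
function codePointsOf(s) {
  var result = [];
  for (var i = 0; i < s.length; i++) {
    var cp = s.codePointAt(i);
    result.push(cp);
    if (cp > 0xFFFF) i++; // skip the low surrogate we just consumed
  }
  return result;
}
console.log(codePointsOf(sample)); // [ 97, 120792, 98 ]
```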
JavaScript string iterators (ES2015) give you whole code points instead of individual surrogate code units:
>>> [..."0123456789"]
["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
>>> [..."𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡"]
["𝟘", "𝟙", "𝟚", "𝟛", "𝟜", "𝟝", "𝟞", "𝟟", "𝟠", "𝟡"]
>>> [..."0123456789"].length
10
>>> [..."𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡"].length
10
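The same iterator can back length and slicing by code point. A minimal sketch using Array.from, which consumes the string iterator just like the spread syntax above; note that this counts code points, not user-perceived characters (a base letter plus a combining mark still counts as two):

```javascript
var digits = "\uD835\uDFD8\uD835\uDFD9\uD835\uDFDA"; // 𝟘𝟙𝟚 (U+1D7D8..U+1D7DA)
var chars = Array.from(digits);           // same result as [...digits]
console.log(digits.length);               // 6 code units
console.log(chars.length);                // 3 code points
console.log(chars.slice(1, 3).join(""));  // 𝟙𝟚: sliced by code point, not by code unit
```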
This is along the lines of what I was looking for. It needs better support for the different string functions. As I add to it I will update this answer.
function wString(str) {
    var T = this; // makes 'this' visible in the nested functions
    T.cp = [];        // code point array
    T.length = 0;     // length attribute
    T.wString = true; // (item.wString) tests for a wString object
    // private helper (declared with var so it does not leak into the global scope)
    var sortSurrogates = function (s) { // returns an array with one string per code point
        var chrs = [];
        while (s.length) { // loop until we've done the whole string
            if (/[\uD800-\uDBFF]/.test(s.substr(0, 1))) { // test the first character
                // high surrogate found, so a low surrogate follows
                chrs.push(s.substr(0, 2)); // push the pair onto the array
                s = s.substr(2);           // clip the pair off the string
            } else { // BMP code point
                chrs.push(s.substr(0, 1)); // push one unit onto the array
                s = s.substr(1);           // clip it off the string
            }
        } // loop
        return chrs;
    };
    // end private helper
    // instance methods
    T.substr = function (start, len) {
        if (len !== undefined) { // a len of 0 should give "", not the rest of the string
            return T.cp.slice(start, start + len).join('');
        } else {
            return T.cp.slice(start).join('');
        }
    };
    T.substring = function (start, end) {
        return T.cp.slice(start, end).join('');
    };
    T.replace = function (target, str) {
        // allow wStrings as parameters
        if (str.wString) str = str.cp.join('');
        if (target.wString) target = target.cp.join('');
        return T.toString().replace(target, str);
    };
    T.equals = function (s) { // sets this wString's contents from a string or wString
        if (!s.wString) {
            s = sortSurrogates(s);
            T.cp = s;
        } else {
            T.cp = s.cp;
        }
        T.length = T.cp.length;
    };
    T.toString = function () { return T.cp.join(''); };
    // end instance methods
    T.equals(str);
}
Test results:
// plain string
var x = "0123456789";
alert(x); // 0123456789
alert(x.substr(4,5)) // 45678
alert(x.substring(2,4)) // 23
alert(x.replace("456","x")); // 0123x789
alert(x.length); // 10
// wString object
x = new wString("𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡");
alert(x); // 𝟘𝟙𝟚𝟛𝟜𝟝𝟞𝟟𝟠𝟡
alert(x.substr(4,5)) // 𝟜𝟝𝟞𝟟𝟠
alert(x.substring(2,4)) // 𝟚𝟛
alert(x.replace("𝟜𝟝𝟞","x")); // 𝟘𝟙𝟚𝟛x𝟟𝟠𝟡
alert(x.length); // 10
What would be a nice way of handling this?
I already thought of removing the commas and then parsing to float.
Do you know a better or cleaner way?
Thanks
parseFloat( theString.replace(/,/g,'') );
I don't know why no one has suggested this expression-
parseFloat( theString.replace(/[^\d\.]/g,'') );
It removes every character that is not a digit or a period. You don't need custom functions or loops for this either; that's just overkill.
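One caveat with the second pattern: a minus sign is neither a digit nor a period, so it is stripped as well. A quick sketch comparing the two expressions above on a negative value:

```javascript
var plain = "1,234.56";
var negative = "-1,234.56";
var a = parseFloat(plain.replace(/,/g, ""));           // 1234.56
var b = parseFloat(negative.replace(/,/g, ""));        // -1234.56: only commas removed
var c = parseFloat(negative.replace(/[^\d\.]/g, ""));  // 1234.56: the minus sign is removed too
console.log(a, b, c);
```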
Nope. Remove the comma.
You can use the string replace method, but not in a one-liner the way a regexp allows:
while (str.indexOf(',') !== -1) str = str.replace(',', '');
parseFloat(str);
Or, to make a single expression without a regexp:
return parseFloat(str.split(',').join(''));
I'd use the regexp.
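On engines with ES2021 support, String.prototype.replaceAll also removes every comma in a single expression without a regexp. A minimal sketch:

```javascript
// replaceAll with a string pattern replaces every occurrence, not just the first.
var str = "1,123,214.19";
var n = parseFloat(str.replaceAll(",", ""));
console.log(n); // 1123214.19
```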
I don't have enough reputation to add a comment, but for anyone wondering about the performance of regex vs. split/join, here's a quick fiddle: https://jsfiddle.net/uh3mmgru/
var test = "1,123,214.19";
var t0 = performance.now();
for (var i = 0; i < 1000000; i++)
{
    var a = parseFloat(test.replace(/,/g,''));
}
var t1 = performance.now();
document.write('Regex took: ' + (t1 - t0) + ' ms');
document.write('<br>');
var t0 = performance.now();
for (var i = 0; i < 1000000; i++)
{
    var b = parseFloat(test.split(',').join(''));
}
var t1 = performance.now();
document.write('Split/join took: ' + (t1 - t0) + ' ms');
The results I get are (for 1 million loops each):
Regex: 263.335 ms
Split/join: 1035.875 ms
So I think it's safe to say that regex is the way to go in this scenario.
Building on the idea from @kennebec, if you want to make sure that the commas are correct, and you don't want to replace commas, you could try something like this:
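function myParse(num) {
    var n2 = num.split(",");
    var out = 0; // declared with var so it is not an implicit global
    for (var i = 0; i < n2.length; i++) {
        out *= 1000; // each comma-separated group is treated as three digits
        out += parseFloat(n2[i]);
    }
    return out;
}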
alert(myParse("1,432,85"));
// Returns 1432085, as the comma is misplaced.
It may not be as fast, but you wanted alternatives :)
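If the goal is to reject misplaced commas rather than reinterpret them, one option is to validate the format with a regexp before parsing. A minimal sketch; parseGrouped is a hypothetical name, and it assumes US-style grouping (a comma every three digits, a period as the decimal separator). As written it also rejects ungrouped input such as "1234", so loosen the pattern if that should pass:

```javascript
// Hypothetical stricter parser: returns NaN unless commas group digits in threes.
function parseGrouped(num) {
  // optional minus, 1-3 leading digits, then ",ddd" groups, then an optional decimal part
  if (!/^-?\d{1,3}(,\d{3})*(\.\d+)?$/.test(num)) {
    return NaN; // misplaced comma or other unexpected character
  }
  return parseFloat(num.replace(/,/g, ""));
}

console.log(parseGrouped("1,432,085")); // 1432085
console.log(parseGrouped("1,432,85"));  // NaN: the last group has only two digits
```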
What about a simple function to solve most of the common problems?
function getValue(obj) {
    var Value = parseFloat( $(obj).val().replace(/,/g,'') ).toFixed(2);
    return +Value;
}
The above function gets values from fields (using jQuery), assuming the entered values are numeric (I'd rather validate fields while the user is entering data, so I know for sure the field content is numeric).
For floating-point values that are well formatted in the field, the function returns the value correctly, rounded to two decimal places by toFixed(2).
This function is far from complete, but it quickly fixes the "," (comma) issue for values entered as 1,234.56 or 1,234,567. It returns a valid number as long as the content is numeric.
The + (plus) sign in front of the variable Value in the return statement is a "dirty trick" used in JavaScript to ensure that the returned value is numeric.
It is easy to modify the function for other purposes, for instance to convert strings to numeric values while taking care of the "," (comma) issue:
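The trick is needed because toFixed() returns a string. A small sketch with a fixed input in place of the jQuery field value:

```javascript
// toFixed() rounds and returns a string; unary + turns it back into a number.
var Value = parseFloat("1,234.567".replace(/,/g, "")).toFixed(2);
var numeric = +Value;
console.log(typeof Value, Value);     // string 1234.57
console.log(typeof numeric, numeric); // number 1234.57
```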
function parseValue(str) {
    var Value = parseFloat( str.replace(/,/g,'') ).toFixed(2);
    return +Value;
}
Both operations can even be combined in one function. I.e.:
function parseNumber(item, isField = false) {
    var Value = (isField) ? parseFloat( $(item).val().replace(/,/g,'') ).toFixed(2) : parseFloat( item.replace(/,/g,'') ).toFixed(2);
    return +Value;
}
In that case, if the function is called as result = parseNumber('12,092.98'); it parses the value as a string. But if it is called as result = parseNumber('#MyField', true); it tries to obtain the value from '#MyField'.
As I said before, such functions are far from complete and can be extended in many ways. One idea is to check the first character of the given parameter (string) and decide from its format where to obtain the value to be parsed: if the first character is '#', it is the ID of a DOM element; otherwise, if it begins with a digit, it must be a string to be parsed.
Try it... Happy coding.