How to convert a very large hex number to decimal in javascript - javascript

I am trying without much success to convert a very large hex number to decimal.
My problem is that using decimal = parseInt(hex, 16)
gives me errors in the number when I try to convert a hex number above 14 digits.
I have no problem with this in Java, but Javascript does not seem to be accurate above 14 digits of hex.
I have tried "BigNumber" but this gives me the same erroneous result.
I have trawled the web to the best of my ability and found web sites that will do the conversion but cannot figure out how to do the conversion longhand.
I have tried getting each character in turn and multiplying it by its factor i.e. 123456789abcdef
15 * Math.pow(16, 0) + 14 * Math.pow(16, 1).... etc but I think (being a noob) that my subroutines may not have been all they should be because I got a completely different (and I mean really different!) answer.
If it helps you guys I can post what I have written so far for you to look at but I am hoping someone has simple answer for me.
<script>
function Hex2decimal(hex){
var stringLength = hex.length;
var characterPosition = stringLength;
var character;
var hexChars = new Array();
hexChars[0] = "0";
hexChars[1] = "1";
hexChars[2] = "2";
hexChars[3] = "3";
hexChars[4] = "4";
hexChars[5] = "5";
hexChars[6] = "6";
hexChars[7] = "7";
hexChars[8] = "8";
hexChars[9] = "9";
hexChars[10] = "a";
hexChars[11] = "b";
hexChars[12] = "c";
hexChars[13] = "d";
hexChars[14] = "e";
hexChars[15] = "f";
var index = 0;
var hexChar;
var result;
// document.writeln(hex);
while (characterPosition >= 0)
{
// document.writeln(characterPosition);
character = hex.charAt(characterPosition);
while (index < hexChars.length)
{
// document.writeln(index);
document.writeln("String Character = " + character);
hexChar = hexChars[index];
document.writeln("Hex Character = " + hexChar);
if (hexChar == character)
{
result = hexChar;
document.writeln(result);
}
index++
}
// document.write(character);
characterPosition--;
}
return result;
}
</script>
Thank you.
Paul

The New 'n' Easy Way
var hex = "7FDDDDDDDDDDDDDDDDDDDDDD";
if (hex.length % 2) { hex = '0' + hex; }
var bn = BigInt('0x' + hex);
var d = bn.toString(10);
BigInts are now available in most browsers (except IE).
Earlier in this answer:
BigInts are now available in both node.js and Chrome. Firefox shouldn't be far behind.
If you need to deal with negative numbers, that requires a bit of work:
How to handle Signed JS BigInts
Essentially:
function hexToBn(hex) {
if (hex.length % 2) {
hex = '0' + hex;
}
var highbyte = parseInt(hex.slice(0, 2), 16)
var bn = BigInt('0x' + hex);
if (0x80 & highbyte) {
// You'd think `bn = ~bn;` would work... but it doesn't
// manually perform two's complement (flip bits, add one)
// (because JS binary operators are incorrect for negatives)
bn = BigInt('0b' + bn.toString(2).split('').map(function (i) {
return '0' === i ? 1 : 0
}).join('')) + BigInt(1);
bn = -bn;
}
return bn;
}
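As an alternative sketch, the built-in BigInt.asIntN can do the sign handling, assuming the hex string is meant to be read as a two's-complement value whose width is its number of hex digits rounded up to whole bytes. The name hexToSignedBigInt is hypothetical and does not come from the answer above.

```javascript
// Hypothetical helper: interpret a hex string as a signed two's-complement integer.
// Assumption: the bit width is the (byte-padded) length of the hex string.
function hexToSignedBigInt(hex) {
  if (hex.length % 2) {
    hex = "0" + hex; // pad to whole bytes, as in the answer above
  }
  const bits = hex.length * 4; // 4 bits per hex digit
  // asIntN wraps the value into the signed range of the given bit width
  return BigInt.asIntN(bits, BigInt("0x" + hex));
}

console.log(hexToSignedBigInt("ff").toString()); // "-1"
console.log(hexToSignedBigInt("7f").toString()); // "127"
```

Because of the byte padding, a string with an odd number of digits such as "fff" is read as "0fff" and therefore stays positive.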

Ok, let's try this:
function h2d(s) {
function add(x, y) {
var c = 0, r = [];
var x = x.split('').map(Number);
var y = y.split('').map(Number);
while(x.length || y.length) {
var s = (x.pop() || 0) + (y.pop() || 0) + c;
r.unshift(s < 10 ? s : s - 10);
c = s < 10 ? 0 : 1;
}
if(c) r.unshift(c);
return r.join('');
}
var dec = '0';
s.split('').forEach(function(chr) {
var n = parseInt(chr, 16);
for(var t = 8; t; t >>= 1) {
dec = add(dec, dec);
if(n & t) dec = add(dec, '1');
}
});
return dec;
}
Test:
t = 'dfae267ab6e87c62b10b476e0d70b06f8378802d21f34e7'
console.log(h2d(t))
prints
342789023478234789127089427304981273408912349586345899239
which is correct (feel free to verify).
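A variation on the same longhand idea, sketched here under the assumption that the input contains only hex digits: keep the running result as an array of decimal digits and, for each hex digit, multiply the whole array by 16 and add the digit. The function name hexToDecimalString is made up for this illustration.

```javascript
// Hypothetical helper: longhand hex-to-decimal conversion without BigInt.
function hexToDecimalString(hex) {
  // Decimal digits of the running result, least significant digit first
  const digits = [0];
  for (const ch of hex) {
    const value = parseInt(ch, 16);
    if (Number.isNaN(value)) {
      throw new Error("Invalid hex digit: " + ch);
    }
    // digits = digits * 16 + value, done digit by digit with a carry
    let carry = value;
    for (let i = 0; i < digits.length; i++) {
      const v = digits[i] * 16 + carry;
      digits[i] = v % 10;
      carry = Math.floor(v / 10);
    }
    // Any remaining carry becomes new high-order digits
    while (carry > 0) {
      digits.push(carry % 10);
      carry = Math.floor(carry / 10);
    }
  }
  return digits.reverse().join("");
}

console.log(hexToDecimalString("ff")); // "255"
```

Every intermediate value stays far below 2^53, so ordinary Number arithmetic is exact at each step.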

Notice that "0x" + "ff" will be considered as 255, so convert your hex value to a string and add "0x" ahead.
function Hex2decimal(hex)
{
return ("0x" + hex) / 1;
}

If you are using the '0x' notation for your Hex String, don't forget to add s = s.slice(2) to remove the '0x' prefix.
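A minimal sketch of that prefix handling, assuming the prefix may be written as either "0x" or "0X"; stripHexPrefix is a hypothetical name, not part of any answer here.

```javascript
// Hypothetical helper: remove a leading "0x"/"0X" only when it is present.
function stripHexPrefix(s) {
  return /^0x/i.test(s) ? s.slice(2) : s;
}

console.log(stripHexPrefix("0xff")); // "ff"
console.log(stripHexPrefix("ff"));   // "ff"
```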

Keep in mind that JavaScript only has a single numeric type (double), and does not provide any separate integer types. So it may not be possible for it to store exact representations of your numbers.
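The limit can be seen directly with a small experiment. This is only an illustration, assuming an engine with BigInt support; the sample value 2^53 + 1 was picked for the demonstration and is not from the question.

```javascript
// 0x20000000000001 is 2^53 + 1: 14 hex digits, just past the exact-integer range.
const sampleHex = "20000000000001";

const rounded = parseInt(sampleHex, 16); // stored as a double, so it gets rounded
const exact = BigInt("0x" + sampleHex);  // arbitrary precision, no rounding

console.log(Number.MAX_SAFE_INTEGER);       // 9007199254740991 (2^53 - 1)
console.log(Number.isSafeInteger(rounded)); // false: the double is not exact here
console.log(exact.toString(10));            // "9007199254740993"
```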
In order to get exact results you need to use a library for arbitrary-precision integers, such as BigInt.js. For example, the code:
var x = str2bigInt("5061756c205768697465",16,1,1);
var s = bigInt2str(x, 10);
$('#output').text(s);
Correctly converts 0x5061756c205768697465 to the expected result of 379587113978081151906917.
Here is a jsfiddle if you would like to experiment with the code listed above.

The BigInt constructor can take a hex string as argument:
/** @param hex = "a83b01cd..." */
function Hex2decimal(hex) {
return BigInt("0x" + hex).toString(10);
}
Usage:
Hex2decimal("100");
Output:
256
A rip-off from the other answer, but without the meaningless 0 padding =P

Related

How can i make a loop that will show '-' mark x time iteration was? [duplicate]

In Perl I can repeat a character multiple times using the syntax:
$a = "a" x 10; // results in "aaaaaaaaaa"
Is there a simple way to accomplish this in Javascript? I can obviously use a function, but I was wondering if there was any built in approach, or some other clever technique.
These days, the repeat string method is implemented almost everywhere. (It is not in Internet Explorer.) So unless you need to support older browsers, you can simply write:
"a".repeat(10)
Before repeat, we used this hack:
Array(11).join("a") // create string with 10 a's: "aaaaaaaaaa"
(Note that an array of length 11 gets you only 10 "a"s, since Array.join puts the argument between the array elements.)
Simon also points out that according to this benchmark, it appears that it's faster in Safari and Chrome (but not Firefox) to repeat a character multiple times by simply appending using a for loop (although a bit less concise).
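The loop-based approach mentioned above is not shown in the answer, so here is one way it might look. The name repeatByLoop is an assumption for this sketch, and no performance claim is made for it.

```javascript
// Hypothetical helper: repeat a string by plain appending in a for loop.
function repeatByLoop(str, count) {
  let result = "";
  for (let i = 0; i < count; i++) {
    result += str;
  }
  return result;
}

console.log(repeatByLoop("a", 10)); // "aaaaaaaaaa"
```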
ES6 (Harmony) adds a native way to do this with repeat. Although ES6 support was still experimental when this was written, this feature is already available in Edge, Firefox, Chrome and Safari.
"abc".repeat(3) // "abcabcabc"
And of course, if the repeat function is not available, you can use good old Array(n + 1).join("abc")
Convenient if you repeat yourself a lot:
String.prototype.repeat = String.prototype.repeat || function(n){
n= n || 1;
return Array(n+1).join(this);
}
alert( 'Are we there yet?\nNo.\n'.repeat(10) )
Array(10).fill('a').join('')
Although the most voted answer is a bit more compact, with this approach you don't have to add an extra array item.
An alternative is:
for(var word = ''; word.length < 10; word += 'a'){}
If you need to repeat multiple chars, multiply your conditional:
for(var word = ''; word.length < 10 * 3; word += 'foo'){}
NOTE: You do not have to overshoot by 1 as with word = Array(11).join('a')
The best-performing way is https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/repeat
Short version is below.
String.prototype.repeat = function(count) {
if (count < 1) return '';
var result = '', pattern = this.valueOf();
while (count > 1) {
if (count & 1) result += pattern;
count >>>= 1, pattern += pattern;
}
return result + pattern;
};
var a = "a";
console.debug(a.repeat(10));
Polyfill from Mozilla:
if (!String.prototype.repeat) {
String.prototype.repeat = function(count) {
'use strict';
if (this == null) {
throw new TypeError('can\'t convert ' + this + ' to object');
}
var str = '' + this;
count = +count;
if (count != count) {
count = 0;
}
if (count < 0) {
throw new RangeError('repeat count must be non-negative');
}
if (count == Infinity) {
throw new RangeError('repeat count must be less than infinity');
}
count = Math.floor(count);
if (str.length == 0 || count == 0) {
return '';
}
// Ensuring count is a 31-bit integer allows us to heavily optimize the
// main part. But anyway, most current (August 2014) browsers can't handle
// strings 1 << 28 chars or longer, so:
if (str.length * count >= 1 << 28) {
throw new RangeError('repeat count must not overflow maximum string size');
}
var rpt = '';
for (;;) {
if ((count & 1) == 1) {
rpt += str;
}
count >>>= 1;
if (count == 0) {
break;
}
str += str;
}
// Could we try:
// return Array(count + 1).join(this);
return rpt;
}
}
If you're not opposed to including a library in your project, lodash has a repeat function.
_.repeat('*', 3);
// → '***'
https://lodash.com/docs#repeat
For all browsers
The following function will perform a lot faster than the option suggested in the accepted answer:
var repeat = function(str, count) {
var array = [];
for(var i = 0; i < count;)
array[i++] = str;
return array.join('');
}
You'd use it like this :
var repeatedString = repeat("a", 10);
To compare the performance of this function with that of the option proposed in the accepted answer, see this Fiddle and this Fiddle for benchmarks.
For modern browsers only
In modern browsers, you can now do this using String.prototype.repeat method:
var repeatedString = "a".repeat(10);
Read more about this method on MDN.
This option is even faster. Unfortunately, it doesn't work in any version of Internet Explorer. The numbers in the table specify the first browser version that fully supports the method:
In ES2015/ES6 you can use "*".repeat(n)
So just add this to your projects, and you are good to go.
String.prototype.repeat = String.prototype.repeat ||
function(n) {
if (n < 0) throw new RangeError("invalid count value");
if (n == 0) return "";
return new Array(n + 1).join(this.toString())
};
String.repeat() is supported by 96.39% of browsers as of now.
function pad(text, maxLength){
return text + "0".repeat(maxLength - text.length);
}
console.log(pad('text', 7)); //text000
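On engines that provide String.prototype.padEnd (an ES2017 addition, so this is an assumption about the target environment), the same right-padding can be sketched without repeat. The name padRight is hypothetical.

```javascript
// Hypothetical helper: right-pad with zeros using the built-in padEnd.
function padRight(text, maxLength) {
  // padEnd returns the string unchanged when it is already long enough
  return text.padEnd(maxLength, "0");
}

console.log(padRight("text", 7)); // "text000"
```

Unlike the repeat-based version, this does not throw when the text is already longer than maxLength, because no negative repeat count is ever computed.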
/**
* Repeat a string `n`-times (recursive)
* @param {String} s - The string you want to repeat.
* @param {Number} n - The times to repeat the string.
* @param {String} d - A delimiter between each string.
*/
var repeat = function (s, n, d) {
return --n ? s + (d || "") + repeat(s, n, d) : "" + s;
};
var foo = "foo";
console.log(
"%s\n%s\n%s\n%s",
repeat(foo), // "foo"
repeat(foo, 2), // "foofoo"
repeat(foo, "2"), // "foofoo"
repeat(foo, 2, "-") // "foo-foo"
);
Just for the fun of it, here is another way by using the toFixed(), used to format floating point numbers.
By doing
(0).toFixed(2)
(0).toFixed(3)
(0).toFixed(4)
we get
0.00
0.000
0.0000
If the first two characters 0. are deleted, we can use this repeating pattern to generate any repetition.
function repeat(str, nTimes) {
return (0).toFixed(nTimes).substr(2).replaceAll('0', str);
}
console.info(repeat('3', 5));
console.info(repeat('hello ', 4));
Another interesting way to quickly repeat n character is to use idea from quick exponentiation algorithm:
var repeatString = function(string, n) {
var result = '', i;
for (i = 1; i <= n; i *= 2) {
if ((n & i) === i) {
result += string;
}
string = string + string;
}
return result;
};
To repeat a value in my projects I use repeat.
For example:
var n = 6;
for (i = 0; i < n; i++) {
console.log("#".repeat(i+1))
}
but be careful, because this method was only added in the ECMAScript 6 specification, so older browsers do not have it.
function repeatString(n, string) {
var repeat = [];
repeat.length = n + 1;
return repeat.join(string);
}
repeatString(3,'x'); // => xxx
repeatString(10,'🌹'); // => "🌹🌹🌹🌹🌹🌹🌹🌹🌹🌹"
This is how you can call a function and get the result with the help of Array() and join()
using Typescript and arrow fun
const repeatString = (str: string, num: number) => num > 0 ?
Array(num+1).join(str) : "";
console.log(repeatString("🌷",10))
//outputs: 🌷🌷🌷🌷🌷🌷🌷🌷🌷🌷
function repeatString(str, num) {
// Array(num+1) has num gaps between its empty slots, so join(str) yields str repeated num times
return num > 0 ? Array(num+1).join(str) : "";
}
console.log(repeatString("a",10))
// outputs: aaaaaaaaaa
console.log(repeatString("🌷",10))
//outputs: 🌷🌷🌷🌷🌷🌷🌷🌷🌷🌷
Here is what I use:
function repeat(str, num) {
var holder = [];
for(var i=0; i<num; i++) {
holder.push(str);
}
return holder.join('');
}
I realize that it's not a common task, but what if you need to repeat your string a non-integer number of times?
It's possible with repeat() and slice(), here's how:
String.prototype.fracRepeat = function(n){
if(n < 0) n = 0;
var n_int = ~~n; // amount of whole times to repeat
var n_frac = n - n_int; // amount of fraction times (e.g., 0.5)
var frac_length = ~~(n_frac * this.length); // length in characters of fraction part, floored
return this.repeat(n) + this.slice(0, frac_length);
}
And below a shortened version:
String.prototype.fracRepeat = function(n){
if(n < 0) n = 0;
return this.repeat(n) + this.slice(0, ~~((n - ~~n) * this.length));
}
var s = "abcd";
console.log(s.fracRepeat(2.5))
I'm going to expand on @bonbon's answer. His method is an easy way to "append N chars to an existing string", just in case anyone needs to do that. For example, "a googol" is a 1 followed by 100 zeros.
for(var googol = '1'; googol.length < 1 + 100; googol += '0'){}
document.getElementById('el').innerText = googol;
<div>This is "a googol":</div>
<div id="el"></div>
NOTE: You do have to add the length of the original string to the conditional.
Lodash offers similar functionality to the JavaScript repeat() function, which is not available in all browsers. It is called _.repeat and has been available since version 3.0.0:
_.repeat('a', 10);
var stringRepeat = function(string, val) {
var newString = [];
for(var i = 0; i < val; i++) {
newString.push(string);
}
return newString.join('');
}
var repeatedString = stringRepeat("a", 1);
Can be used as a one-liner too:
function repeat(str, len) {
while (str.length < len) str += str.substr(0, len-str.length);
return str;
}
In CoffeeScript:
( 'a' for dot in [1..10]).join('')
String.prototype.repeat = function (n) { n = Math.abs(n) || 1; return Array(n + 1).join(this || ''); };
// console.log("0".repeat(3) , "0".repeat(-3))
// return: "000" "000"

Javascript multiple digit index

I have searched around the net and the solution must be so simple no one has asked?
I just wanted to use an index like + i + to return 001, 002, 003, etc
How about
('000' + i).substr(-3);
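Where String.prototype.padStart is available (an ES2017 feature, so this assumes a reasonably modern engine), the same three-digit index can be sketched in a loop. The labels array exists only for this illustration.

```javascript
// Build zero-padded indexes 001, 002, 003 with padStart.
const labels = [];
for (let i = 1; i <= 3; i++) {
  // padStart works on strings, so convert the number first
  labels.push(String(i).padStart(3, "0"));
}

console.log(labels.join(", ")); // "001, 002, 003"
```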
So something like this?
function number_pad(num,len) {
num = ""+num;
while(num.length < len) num = "0"+num;
return num;
}
// Usage: number_pad(i,3);
Alternatively, extend the native object:
Number.prototype.pad = function(len) {
var num = ""+this;
while(num.length < len) num = "0"+num;
return num;
};
// Usage: i.pad(3);
For future reference, this is called zerofill or zero-padding.
function paddedNumber(n) {
// A string containing the fully padded zero value.
var zeroes = "000";
// The number as a string.
var numstr = "" + n;
// Keep any sign at the front.
var sign = "";
if (/^[\+\-]/.test(numstr)) {
sign = numstr.charAt(0);
numstr = numstr.substring(1);
}
// Count digits only after the sign is stripped, so signed numbers pad correctly.
var nDigits = numstr.length;
// Concatenates the number with just enough zeroes.
// No padding if the number is already longer than the pad.
return sign + zeroes.substring(nDigits) + numstr;
}

Adding extra zeros in front of a number using jQuery?

I have files that are uploaded, which are named like so
MR 1
MR 2
MR 100
MR 200
MR 300
ETC.
What I need to do is add two extra 0s before any number below 10 and one extra 0 before the numbers 10-99
So the files are formatted
MR 001
MR 010
MR 076
ETC.
Any help would be great!
Assuming you have those values stored in some strings, try this:
function pad (str, max) {
str = str.toString();
return str.length < max ? pad("0" + str, max) : str;
}
pad("3", 3); // => "003"
pad("123", 3); // => "123"
pad("1234", 3); // => "1234"
var test = "MR 2";
var parts = test.split(" ");
parts[1] = pad(parts[1], 3);
parts.join(" "); // => "MR 002"
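The split-pad-join steps above can also be sketched as a single regular-expression replace, assuming the number is always the trailing run of digits in the name and that padStart is available. The name padTrailingNumber is hypothetical.

```javascript
// Hypothetical helper: zero-pad the trailing number of a name like "MR 2".
function padTrailingNumber(name, width) {
  // \d+$ matches the digits at the very end of the string
  return name.replace(/\d+$/, (digits) => digits.padStart(width, "0"));
}

console.log(padTrailingNumber("MR 2", 3));   // "MR 002"
console.log(padTrailingNumber("MR 76", 3));  // "MR 076"
console.log(padTrailingNumber("MR 300", 3)); // "MR 300"
```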
I have a potential solution which I guess is relevant, I posted about it here:
https://www.facebook.com/antimatterstudios/posts/10150752380719364
basically, you want a minimum length of 2 or 3, you can adjust how many 0's you put in this piece of code
var d = new Date();
var h = ("0"+d.getHours()).slice(-2);
var m = ("0"+d.getMinutes()).slice(-2);
var s = ("0"+d.getSeconds()).slice(-2);
I knew I would always get a single integer as a minimum (cause hour 1, hour 2) etc, but if you can't be sure of getting anything but an empty string, you can just do "000"+d.getHours() to make sure you get the minimum.
then you want 3 numbers? just use -3 instead of -2 in my code, I'm just writing this because I wanted to construct a 24 hour clock in a super easy fashion.
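For the 24 hour clock use case, the slice trick above might be wrapped up like this. It is only a sketch; formatTime and the inner twoDigits helper are names invented for the example.

```javascript
// Hypothetical helper: format a Date as HH:MM:SS using the ("0" + n).slice(-2) trick.
function formatTime(date) {
  // Prefix one zero and keep the last two characters
  const twoDigits = (n) => ("0" + n).slice(-2);
  return [
    twoDigits(date.getHours()),
    twoDigits(date.getMinutes()),
    twoDigits(date.getSeconds()),
  ].join(":");
}

// Local time 03:04:05 on 1 January 2020 (months are zero-based)
console.log(formatTime(new Date(2020, 0, 1, 3, 4, 5))); // "03:04:05"
```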
Note: see Update 2 if you are using latest ECMAScript...
Here a solution I liked for its simplicity from an answer to a similar question:
var n = 123
String('00000' + n).slice(-5); // returns 00123
('00000' + n).slice(-5); // returns 00123
UPDATE
As @RWC suggested, you can of course wrap this nicely in a generic function like this:
function leftPad(value, length) {
return ('0'.repeat(length) + value).slice(-length);
}
leftPad(123, 5); // returns 00123
And for those who don't like the slice:
function leftPad(value, length) {
value = String(value);
// Clamp at 0: repeat() throws a RangeError for a negative count
length = Math.max(0, length - value.length);
return '0'.repeat(length) + value;
}
But if performance matters I recommend reading through the linked answer before choosing one of the solutions suggested.
UPDATE 2
Since ES2017 the String class comes with an inbuilt padStart method which adds leading characters to a string. Check MDN here for reference on String.prototype.padStart(). And there is also a padEnd method for ending characters.
So with ES2017 it became as simple as:
var n = '123';
n.padStart(5, '0'); // returns 00123
Note: @Sahbi is right, make sure you have a string otherwise calling padStart will throw a type error.
So in case the variable is or could be a number you should cast it to a string first:
String(n).padStart(5, '0');
function addLeadingZeros (n, length)
{
var str = (n > 0 ? n : -n) + "";
var zeros = "";
for (var i = length - str.length; i > 0; i--)
zeros += "0";
zeros += str;
return n >= 0 ? zeros : "-" + zeros;
}
//addLeadingZeros (1, 3) = "001"
//addLeadingZeros (12, 3) = "012"
//addLeadingZeros (123, 3) = "123"
This is the function that I generally use in my code to prepend zeros to a number or string.
The inputs are the string or number (str), and the desired length of the output (len).
var PrependZeros = function (str, len) {
if(typeof str === 'number' || Number(str)){
str = str.toString();
return (len - str.length > 0) ? new Array(len + 1 - str.length).join('0') + str: str;
}
else{
for(var i = 0,spl = str.split(' '); i < spl.length; spl[i] = (Number(spl[i])&& spl[i].length < len)?PrependZeros(spl[i],len):spl[i],str = (i == spl.length -1)?spl.join(' '):str,i++);
return str;
}
};
Examples:
PrependZeros('MR 3',3); // MR 003
PrependZeros('MR 23',3); // MR 023
PrependZeros('MR 123',3); // MR 123
PrependZeros('foo bar 23',3); // foo bar 023
If you split on the space, you can add leading zeros using a simple function like:
function addZeros(n) {
return (n < 10)? '00' + n : (n < 100)? '0' + n : '' + n;
}
So you can test the length of the string and if it's less than 6, split on the space, add zeros to the number, then join it back together.
Or as a regular expression:
function addZeros(s) {
return s.replace(/ (\d$)/,' 00$1').replace(/ (\d\d)$/,' 0$1');
}
I'm sure someone can do it with one replace, not two.
Edit - examples
alert(addZeros('MR 3')); // MR 003
alert(addZeros('MR 23')); // MR 023
alert(addZeros('MR 123')); // MR 123
alert(addZeros('foo bar 23')); // foo bar 023
It will put one or two zeros infront of a number at the end of a string with a space in front of it. It doesn't care what bit before the space is.
Just for a laugh do it the long nasty way....:
(NOTE: ive not used this, and i would not advise using this.!)
function pad(str, new_length) {
// return the result; without this the function always yields undefined
return ('00000000000000000000000000000000000000000000000000' + str).
substr((50 + str.toString().length) - new_length, new_length);
}
I needed something like this myself the other day, but instead of always a 0, I wanted to be able to tell it what to pad the front with. Here's what I came up with for code:
function lpad(n, e, d) {
var o = ''; if(typeof(d) === 'undefined'){ d='0'; } if(typeof(e) === 'undefined'){ e=2; }
if(n.length < e){ for(var r=0; r < e - n.length; r++){ o += d; } o += n; } else { o=n; }
return o; }
Where n is what you want padded, e is the length you want it padded to (number of characters long it should be), and d is what you want it to be padded with. Seems to work well for what I needed it for, but it would fail in some cases if "d" was more than one character long.
var str = "43215";
console.log("Before : \n string :"+str+"\n Length :"+str.length);
var max = 9;
while(str.length < max ){
str = "0" + str;
}
console.log("After : \n string :"+str+"\n Length :"+str.length);
It worked for me !
To increase the zeroes, update the 'max' variable
str could be a number or a string.
formatting("hi",3);
function formatting(str,len)
{
return ("000000"+str).slice(-len);
}
Add more zeros if needs large digits
In simple terms, we can write it as follows:
for(var i=1;i<=31;i++)
i=(i<10) ? '0'+i : i;
//Because most of the time we need this for day, month or amount matters.
Know this is an old post, but here's another short, effective way:
edit: dur. if num isn't string, you'd add:
len -= String(num).length;
else, it's all good
function addLeadingZeros(sNum, len) {
len -= sNum.length;
while (len-- > 0) sNum = '0' + sNum;
return sNum;
}
Try the following, which will convert single and double digit numbers to 3 digit numbers by prefixing zeros.
var base_number = 2;
var zero_prefixed_string = ("000" + base_number).slice(-3);
Add 100 to the number, then take a substring from index 1 to the last position on the right.
var dt = new Date();
var month = (100 + dt.getMonth()+1).toString().substr(1, 2);
var day = (100 + dt.getDate()).toString().substr(1, 2);
console.log(month,day);
You will get this result for the date 2020-11-3:
11,03
I hope the answer is useful

How to make 8 digit number in javascript?

I am trying to make an auto-generator of numbers, but I'm having a problem with how to force the number to 8 digits.
for(i=1;i<=100;i++) {
var i = x++;
var test = i.toFixed(8); // I used this but this is only for decimals
jQuery('.generated_table').append(test+'<br />');;
}
Please help.
Use toPrecision:
(10000000).toPrecision(8); //=> '10000000'
(100).toPrecision(8); //=> '100.00000'
If you meant preceding a number with leading zero's:
var i = (100).toPrecision(8).split('.').reverse().join(''); //=> '00000100'
You can also make a Number.prototype function of that:
Number.prototype.leadingZeros = function(n) {
return this.toPrecision(n).split('.').reverse().join('');
};
(100).leadingZeros(8); //=> '00000100'
Just to be complete: a more precise way to print any (number of) leading character(s) to any number may be:
Number.prototype.toWidth = function(n,chr) {
chr = chr || ' ';
var len = String(parseFloat(this)).length;
function multiply(str,nn){
var s = str;
while (--nn>0){
str+=s;
}
return str;
}
n = n<len ? 0 : Math.abs(len-n);
return (n>1 && n ? multiply(chr,n) : n<1 ? '' : chr)+this;
};
(100).toWidth(8,'0'); //=> 00000100
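Another possible route, sketched under the assumption that the runtime supports the Intl API, is to let Intl.NumberFormat produce the leading zeros through its minimumIntegerDigits option. The variable name eightDigits is made up for the example.

```javascript
// Format integers with at least 8 digits; grouping is disabled so no commas appear.
const eightDigits = new Intl.NumberFormat("en-US", {
  minimumIntegerDigits: 8,
  useGrouping: false,
});

console.log(eightDigits.format(100));      // "00000100"
console.log(eightDigits.format(12345678)); // "12345678"
```

The formatter object can be created once and reused inside the generating loop.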
Whooo!!! I got the answer. Try it:
for(i=1;i<=100;i++) {
//var i = x++;
var test = i.toPrecision(8).replace("\.","");
jQuery('.generated_table').append(test+'<br />');;
}
Check out this SO question for some links to various printf-style functions for Javascript: Javascript printf/string.format
var randNum = "";
var MAX_LENGTH = 8;
while(randNum.toString().length < MAX_LENGTH){
var temp = Math.floor(Math.random() * 10);
randNum += temp.toString();
}
alert(randNum);

JavaScript strings outside of the BMP

BMP being Basic Multilingual Plane
According to JavaScript: the Good Parts:
JavaScript was built at a time when Unicode was a 16-bit character set, so all characters in JavaScript are 16 bits wide.
This leads me to believe that JavaScript uses UCS-2 (not UTF-16!) and can only handle characters up to U+FFFF.
Further investigation confirms this:
> String.fromCharCode(0x20001);
The fromCharCode method seems to only use the lowest 16 bits when returning the Unicode character. Trying to get U+20001 (CJK unified ideograph 20001) instead returns U+0001.
Question: is it at all possible to handle post-BMP characters in JavaScript?
2011-07-31: slide twelve from Unicode Support Shootout: The Good, The Bad, & the (mostly) Ugly covers issues related to this quite well:
Depends what you mean by ‘support’. You can certainly put non-UCS-2 characters in a JS string using surrogates, and browsers will display them if they can.
But, each item in a JS string is a separate UTF-16 code unit. There is no language-level support for handling full characters: all the standard String members (length, split, slice etc) all deal with code units not characters, so will quite happily split surrogate pairs or hold invalid surrogate sequences.
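The difference between code units and code points can be seen with a short experiment. This sketch assumes an ES2015-capable engine (for the \u{...} escape, codePointAt and Array.from); the sample string is invented for the illustration.

```javascript
// "a", U+20001 (outside the BMP), "b"
const sample = "a\u{20001}b";

console.log(sample.length);                      // 4: the ideograph takes two code units
console.log(sample.charCodeAt(1).toString(16));  // "d840": only the high surrogate
console.log(sample.codePointAt(1).toString(16)); // "20001": the full code point
console.log(Array.from(sample).length);          // 3: iteration goes by code point
```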
If you want surrogate-aware methods, I'm afraid you're going to have to start writing them yourself! For example:
String.prototype.getCodePointLength= function() {
return this.length-this.split(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g).length+1;
};
String.fromCodePoint= function() {
var chars= Array.prototype.slice.call(arguments);
for (var i= chars.length; i-->0;) {
var n = chars[i]-0x10000;
if (n>=0)
chars.splice(i, 1, 0xD800+(n>>10), 0xDC00+(n&0x3FF));
}
return String.fromCharCode.apply(null, chars);
};
I came to the same conclusion as bobince. If you want to work with strings containing unicode characters outside of the BMP, you have to reimplement javascript's String methods. This is because javascript counts characters as each 16-bit code value. Symbols outside of the BMP need two code values to be represented. You therefore run into a case where some symbols count as two characters and some count only as one.
I've reimplemented the following methods to treat each unicode code point as a single character: .length, .charCodeAt, .fromCharCode, .charAt, .indexOf, .lastIndexOf, .slice, and .split.
You can check it out on jsfiddle: http://jsfiddle.net/Y89Du/
Here's the code without comments. I tested it, but it may still have errors. Comments are welcome.
if (!String.prototype.ucLength) {
String.prototype.ucLength = function() {
// this solution was taken from
// http://stackoverflow.com/questions/3744721/javascript-strings-outside-of-the-bmp
return this.length - this.split(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g).length + 1;
};
}
if (!String.prototype.codePointAt) {
String.prototype.codePointAt = function (ucPos) {
if (isNaN(ucPos)){
ucPos = 0;
}
var str = String(this);
var codePoint = null;
var pairFound = false;
var ucIndex = -1;
var i = 0;
while (i < str.length){
ucIndex += 1;
var code = str.charCodeAt(i);
var next = str.charCodeAt(i + 1);
pairFound = (0xD800 <= code && code <= 0xDBFF && 0xDC00 <= next && next <= 0xDFFF);
if (ucIndex == ucPos){
codePoint = pairFound ? ((code - 0xD800) * 0x400) + (next - 0xDC00) + 0x10000 : code;
break;
} else{
i += pairFound ? 2 : 1;
}
}
return codePoint;
};
}
if (!String.fromCodePoint) {
String.fromCodePoint = function () {
var strChars = [], codePoint, offset, codeValues, i;
for (i = 0; i < arguments.length; ++i) {
codePoint = arguments[i];
offset = codePoint - 0x10000;
if (codePoint > 0xFFFF){
codeValues = [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)];
} else{
codeValues = [codePoint];
}
strChars.push(String.fromCharCode.apply(null, codeValues));
}
return strChars.join("");
};
}
if (!String.prototype.ucCharAt) {
String.prototype.ucCharAt = function (ucIndex) {
var str = String(this);
var codePoint = str.codePointAt(ucIndex);
var ucChar = String.fromCodePoint(codePoint);
return ucChar;
};
}
if (!String.prototype.ucIndexOf) {
String.prototype.ucIndexOf = function (searchStr, ucStart) {
if (isNaN(ucStart)){
ucStart = 0;
}
if (ucStart < 0){
ucStart = 0;
}
var str = String(this);
var strUCLength = str.ucLength();
searchStr = String(searchStr);
var ucSearchLength = searchStr.ucLength();
var i = ucStart;
while (i < strUCLength){
var ucSlice = str.ucSlice(i,i+ucSearchLength);
if (ucSlice == searchStr){
return i;
}
i++;
}
return -1;
};
}
if (!String.prototype.ucLastIndexOf) {
String.prototype.ucLastIndexOf = function (searchStr, ucStart) {
var str = String(this);
var strUCLength = str.ucLength();
if (isNaN(ucStart)){
ucStart = strUCLength - 1;
}
if (ucStart >= strUCLength){
ucStart = strUCLength - 1;
}
searchStr = String(searchStr);
var ucSearchLength = searchStr.ucLength();
var i = ucStart;
while (i >= 0){
var ucSlice = str.ucSlice(i,i+ucSearchLength);
if (ucSlice == searchStr){
return i;
}
i--;
}
return -1;
};
}
if (!String.prototype.ucSlice) {
String.prototype.ucSlice = function (ucStart, ucStop) {
var str = String(this);
var strUCLength = str.ucLength();
if (isNaN(ucStart)){
ucStart = 0;
}
if (ucStart < 0){
ucStart = strUCLength + ucStart;
if (ucStart < 0){ ucStart = 0;}
}
if (typeof(ucStop) == 'undefined'){
ucStop = strUCLength; // the stop index is exclusive, so default to the full length
}
if (ucStop < 0){
ucStop = strUCLength + ucStop;
if (ucStop < 0){ ucStop = 0;}
}
var ucChars = [];
var i = ucStart;
while (i < ucStop){
ucChars.push(str.ucCharAt(i));
i++;
}
return ucChars.join("");
};
}
if (!String.prototype.ucSplit) {
String.prototype.ucSplit = function (delimeter, limit) {
var str = String(this);
var strUCLength = str.ucLength();
var ucChars = [];
if (delimeter == ''){
for (var i = 0; i < strUCLength; i++){
ucChars.push(str.ucCharAt(i));
}
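// only truncate when a limit is given; 0 + undefined is NaN and would empty the array
if (typeof limit !== 'undefined') { ucChars = ucChars.slice(0, limit); }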
} else{
ucChars = str.split(delimeter, limit);
}
return ucChars;
};
}
More recent JavaScript engines have String.fromCodePoint.
const ideograph = String.fromCodePoint( 0x20001 ); // outside the BMP
Also a code-point iterator, which gets you the code-point length.
function countCodePoints( str )
{
const i = str[Symbol.iterator]();
let count = 0;
while( !i.next().done ) ++count;
return count;
}
console.log( ideograph.length ); // gives '2'
console.log( countCodePoints(ideograph) ); // '1'
Yes, you can. Although support to non-BMP characters directly in source documents is optional according to the ECMAScript standard, modern browsers let you use them. Naturally, the document encoding must be properly declared, and for most practical purposes you would need to use the UTF-8 encoding. Moreover, you need an editor that can handle UTF-8, and you need some input method(s); see e.g. my Full Unicode Input utility.
Using suitable tools and settings, you can write var foo = '𠀁'.
The non-BMP characters will be internally represented as surrogate pairs, so each non-BMP character counts as 2 in the string length.
Using for (c of this) instruction, one can make various computations on a string that contains non-BMP characters. For instance, to compute the string length, and to get the nth character of the string:
String.prototype.magicLength = function()
{
var c, k;
k = 0;
for (c of this) // iterate each char of this
{
k++;
}
return k;
}
String.prototype.magicCharAt = function(n)
{
var c, k;
k = 0;
for (c of this) // iterate each char of this
{
if (k == n) return c + "";
k++;
}
return "";
}
This old topic has now a simple solution in ES6:
Split characters into an array
simple version
[..."😴😄😃⛔🎠🚓🚇"] // ["😴", "😄", "😃", "⛔", "🎠", "🚓", "🚇"]
Then having each one separated you can handle them easily for most common cases.
Credit: DownGoat
Full solution
To handle special emojis such as the one in the comment, one can search for the joiner character (the zero-width joiner, char code 8205, U+200D) and make some modifications. Here is how:
let myStr = "👩\u200D👩\u200D👧\u200D👧😃𝌆" // \u200D is the zero-width joiner
let arr = [...myStr]
for (let i = arr.length - 1; i > 0; i--) {
if (arr[i].charCodeAt(0) == 8205) { // special combination character
arr[i-1] += arr[i] + arr[i+1]; // combine them back to a single emoji
arr.splice(i, 2)
}
}
console.log(arr.length) //3
Haven't found a case where this doesn't work. Comment if you do.
To conclude
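It seems that the 8205 char code (U+200D, the zero-width joiner) is what combines several emojis into one displayed glyph; this is separate from the UTF-16 surrogate pairs that JS uses to represent non-BMP characters.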
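Where Intl.Segmenter is available (an assumption about the runtime; older engines do not ship it), user-perceived characters can be counted without handling the joiner by hand. The function name countGraphemes is hypothetical.

```javascript
// Hypothetical helper: count user-perceived characters (grapheme clusters).
function countGraphemes(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let count = 0;
  for (const segment of segmenter.segment(str)) {
    if (segment.segment.length > 0) count++;
  }
  return count;
}

// Family emoji (four emojis joined by U+200D), then U+1F603 and U+1D306
const family = "\u{1F469}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F467}";
console.log(countGraphemes(family + "\u{1F603}\u{1D306}"));
```

The segmenter applies the Unicode grapheme cluster rules, so joined emoji sequences are treated as one unit rather than being split at each code point.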
