Python - JavaScript encrypt function for a known decrypt function

I found a streaming website that encrypts its iframe code with an interesting JavaScript function. The decrypt function is visible on the webpage (obviously), but the encrypt function is not. This is the function:
function base64_decode(data) {
    var b64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=';
    var o1, o2, o3, h1, h2, h3, h4, bits, i = 0,
        ac = 0,
        dec = '',
        tmp_arr = [];
    if (!data) {
        return data;
    }
    data += '';
    do {
        h1 = b64.indexOf(data.charAt(i++));
        h2 = b64.indexOf(data.charAt(i++));
        h3 = b64.indexOf(data.charAt(i++));
        h4 = b64.indexOf(data.charAt(i++));
        bits = h1 << 18 | h2 << 12 | h3 << 6 | h4;
        o1 = bits >> 16 & 0xff;
        o2 = bits >> 8 & 0xff;
        o3 = bits & 0xff;
        if (h3 == 64) {
            tmp_arr[ac++] = String.fromCharCode(o1);
        } else if (h4 == 64) {
            tmp_arr[ac++] = String.fromCharCode(o1, o2);
        } else {
            tmp_arr[ac++] = String.fromCharCode(o1, o2, o3);
        }
    } while (i < data.length);
    dec = tmp_arr.join('');
    return dec.replace(/\0+$/, '');
}
function ord(string) {
    var str = string + '',
        code = str.charCodeAt(0);
    if (0xD800 <= code && code <= 0xDBFF) {
        var hi = code;
        if (str.length === 1) {
            return code;
        }
        var low = str.charCodeAt(1);
        return ((hi - 0xD800) * 0x400) + (low - 0xDC00) + 0x10000;
    }
    if (0xDC00 <= code && code <= 0xDFFF) {
        return code;
    }
    return code;
}
function decrypt(sData, sKey) {
    var sResult = "";
    sData = base64_decode(sData);
    var i = 0;
    for (i = 0; i < sData.length; i++) {
        var sChar = sData.substr(i, 1);
        var sKeyChar = sKey.substr(i % sKey.length - 1, 1);
        sChar = Math.floor(ord(sChar) - ord(sKeyChar));
        sChar = String.fromCharCode(sChar);
        sResult = sResult + sChar;
    }
    return sResult;
}
So this code:
decrypt('s+Dd6djk3Jfq6dq0md/r6+fqsaam5ufc5ePm2Nul2uam3OTZ3Numy83Zw87aqazMvbimmZfq2unm4+Pg5d60meXmmZfd6djk3Nnm6dvc6bSZp5mX7uDb69+0mainp5yZl9/c4N7f67SZqKennJmX2OPj5u7d7OPj6trp3NzltJnr6ezcmZfu3Nni4OvY4+Pm7t3s4+Pq2unc3OW0mevp7NyZl+Tm8djj4+bu3ezj4+ra6dzc5bSZ6+ns3Jm1s6bg3enY5Ny1', 'w')
will return:
<iframe src="https://openload.co/embed/TVbLWc25UFA/" scrolling="no" frameborder="0" width="100%" height="100%" allowfullscreen="true" webkitallowfullscreen="true" mozallowfullscreen="true"></iframe>
I translated the decrypt function into Python using the math and base64 modules and it works well, but now I need the encrypt function (in Python) that, starting from a string, outputs the encrypted string plus the key. Is it some kind of known encryption?

This seems to be a bad implementation of the Vigenère cipher, without the modulus, so the resulting sChar may take values above the normal character range. It simply adds the value of a key character to each plaintext character, reusing the key when it is depleted. It will mainly function as something to confuse virus scanners or packet-inspecting firewalls, as the encryption itself is of course completely insecure. It won't have a name (and no self-respecting cryptographer would lend their name to it either).
There seems to be a bug in the code here:
sKey.substr(i % sKey.length - 1, 1);
Whenever i % sKey.length is 0 the index becomes -1, and JavaScript's substr treats a negative start as counting from the end of the string, so the last character of the key is used. With a single-character key such as 'w' every lookup still lands on the same character, which is why it happens to work here (this is why languages and APIs should be strict in what they accept).
ord seems to have been implemented to handle surrogate pairs, i.e. characters outside the 16-bit range.
base64_decode simply implements base 64 decoding, nothing to see there.
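Since decrypt base64-decodes the input and then subtracts a key character from each character, the matching encrypt just does the reverse: add the key character, then base64-encode. A minimal Python sketch (the function name encrypt and the latin-1 round-trip are choices of this sketch; it assumes every shifted code point stays below 256, which holds for ASCII plaintext and this key):

import base64

def encrypt(s_data: str, s_key: str) -> str:
    # mirror the JS key lookup sKey.substr(i % sKey.length - 1, 1):
    # when i % len(s_key) == 0 the index is -1, and Python's negative
    # indexing wraps to the last key character exactly like substr does
    shifted = ''.join(
        chr(ord(ch) + ord(s_key[i % len(s_key) - 1]))
        for i, ch in enumerate(s_data)
    )
    # the JS side base64-decodes first, so we encode last; latin-1 keeps
    # one byte per code point (assumes all shifted values are < 256)
    return base64.b64encode(shifted.encode('latin-1')).decode('ascii')

Running encrypt on the iframe string above with the key 'w' should reproduce the encoded blob passed to decrypt.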

Related

Equivalent of Swift &+ in JavaScript

I'm not able to get the same djbhash in JavaScript that I was getting in Swift.
extension String {
    public func djbHash() -> Int {
        return self.utf8
            .map { return $0 }
            .reduce(5381) {
                let h = ($0 << 5) &+ $0 &+ Int($1)
                print("h", h)
                return h
            }
    }
}
var djbHash = function (string) {
    var h = 5381; // our hash
    var i = 0; // our iterator
    for (i = 0; i < string.length; i++) {
        var ascii = string.charCodeAt(i); // grab ASCII integer
        h = (h << 5) + h + ascii; // bitwise operations
    }
    return h;
}
I tried using BigInt, but the value for the string "QHChLUHDMNh5UTBUcgtLmlPziN42" I'm getting is 17760568308754997342052348842020823769412069976n, compared to 357350748206983768 in Swift.
The Swift &+ operator is an “overflow operator”: It truncates the result of the addition to the available number of bits for the used integer type.
A Swift Int is a 64-bit (signed) integer on all 64-bit platforms, and adding two integers would crash with a runtime exception if the result does not fit into an Int:
let a: Int = 0x7ffffffffffffff0
let b: Int = 0x7ffffffffffffff0
print(a + b) // 💣 Swift runtime failure: arithmetic overflow
With &+ the result is truncated to 64-bit:
let a: Int = 0x7ffffffffffffff0
let b: Int = 0x7ffffffffffffff0
print(a &+ b) // -32
In order to get the same result with JavaScript and BigInt one can use the BigInt.asIntN() function:
var a = 0x7ffffffffffffff0n
var b = 0x7ffffffffffffff0n
console.log(a + b) // 18446744073709551584n
console.log(BigInt.asIntN(64, a+b)) // -32n
With that change, the JavaScript function gives the same result as your Swift code:
var djbHash = function (string) {
    var h = 5381n; // our hash
    var i = 0; // our iterator
    for (i = 0; i < string.length; i++) {
        var code = string.charCodeAt(i); // grab UTF-16 code unit
        h = BigInt.asIntN(64, (h << 5n) + h + BigInt(code)); // truncate to signed 64-bit
    }
    return h;
}
console.log(djbHash("QHChLUHDMNh5UTBUcgtLmlPziN42")) // 357350748206983768n
As mentioned in the comments to the other answer, charCodeAt() returns UTF-16 code units, whereas your Swift function works with the UTF-8 representation of a string. So this will still give different results for strings containing any non-ASCII characters.
For identical results for arbitrary strings (umlauts, emojis, flags, ...) it's best to work with the Unicode code points. In Swift that would be
extension String {
    public func djbHash() -> Int {
        return self.unicodeScalars
            .reduce(5381) { ($0 << 5) &+ $0 &+ Int($1.value) }
    }
}
print("äöü€😀🚩".djbHash()) // 6958626281456
(You may also consider using Int64 instead of Int for platform-independent code, or Int32 if a 32-bit hash is sufficient.)
The corresponding JavaScript code is
var djbHash = function (string) {
    var h = 5381n; // our hash
    for (const codePoint of string) {
        h = BigInt.asIntN(64, (h << 5n) + h + BigInt(codePoint.codePointAt(0))); // bitwise operations
    }
    return h;
}
console.log(djbHash("äöü€😀🚩")) // 6958626281456n
I've had a similar issue in which I used the & operator in combination with the + operator. I think the code below should work. It's still under review, but you can check out my post.
var djbHash = function (string) {
    var h = 5381; // our hash
    var i = 0; // our iterator
    for (i = 0; i < string.length; i++) {
        var ascii = string.charCodeAt(i); // grab ASCII integer
        h = (h << 5) + h &+ ascii; // bitwise operations
    }
    return h;
}

How to encode ??

During my research I found the following information, but it doesn't seem to really match my problem:
http://www.cplusplus.com/forum/beginner/31776/
Base 10 to base n conversions
https://cboard.cprogramming.com/cplusplus-programming/83808-base10-base-n-converter.html
So I'd like to implement a custom Base64-to-BaseN encoding and decoding using C++.
I should be able to convert a (Base64) string like "IloveC0mpil3rs" to a custom base (e.g. Base4) string like "10230102010301" and back again.
Additionally, I should be able to use a custom charset (alphabet) for the base values; the default one would probably be "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ".
So I should be able to use a shuffled one, e.g. (as a kind of encoding :) ): "J87opBEyWwDQdNAYujzshP3LOx1T0XK2e+ZrvFnticbCS64a9/Il5GmgVkqUfRMH".
I thought about translating the convertBase function below from JavaScript into C++, but I'm obviously a beginner and ran into big problems: my code is not working as expected and I cannot find the error:
string encoded = convertBase("Test", 64, 4); // gets 313032130131000
cout << encoded << endl;
string decoded = convertBase(encoded, 4, 64); // error
cout << decoded << endl;
C++ code (not working; the two bugs are flagged in comments):
std::string convertBase(string value, int from_base, int to_base) {
    string range = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ+/";
    string from_range = range.substr(0, from_base),
           to_range = range.substr(0, to_base);
    int dec_value = 0;
    int index = 0;
    string reversed(value.rbegin(), value.rend());
    for(std::string::iterator it = reversed.begin(); it != reversed.end(); ++it) {
        index++; // bug: incrementing before use makes the exponent start at 1 instead of 0
        char digit = *it;
        if (!range.find(digit)) return "error"; // bug: find() returns a position (0 for '0'),
                                                // not a bool; compare against string::npos
        dec_value += from_range.find(digit) * pow(from_base, index);
    }
    string new_value = "";
    while (dec_value > 0) {
        new_value = to_range[dec_value % to_base] + new_value;
        dec_value = (dec_value - (dec_value % to_base)) / to_base;
    }
    return new_value;
}
JavaScript code (working):
function convertBase(value, from_base, to_base) {
    var range = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ+/'.split('');
    var from_range = range.slice(0, from_base);
    var to_range = range.slice(0, to_base);
    var dec_value = value.split('').reverse().reduce(function (carry, digit, index) {
        if (from_range.indexOf(digit) === -1) throw new Error('Invalid digit `'+digit+'` for base '+from_base+'.');
        return carry += from_range.indexOf(digit) * (Math.pow(from_base, index));
    }, 0);
    var new_value = '';
    while (dec_value > 0) {
        new_value = to_range[dec_value % to_base] + new_value;
        dec_value = (dec_value - (dec_value % to_base)) / to_base;
    }
    return new_value || '0';
}
let encoded = convertBase("Test", 64, 4)
console.log(encoded);
let decoded = convertBase(encoded, 4, 64)
console.log(decoded);
Any help how to fix my code would be very appreciated!
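For reference while debugging the C++ port, here is the same algorithm in Python, a sketch under the same default alphabet (Python's big integers also sidestep overflow for longer inputs):

RANGE = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ+/"

def convert_base(value: str, from_base: int, to_base: int) -> str:
    from_range, to_range = RANGE[:from_base], RANGE[:to_base]
    dec_value = 0
    for digit in value:  # most significant digit first
        idx = from_range.find(digit)
        if idx < 0:
            raise ValueError("invalid digit %r for base %d" % (digit, from_base))
        dec_value = dec_value * from_base + idx
    out = ""
    while dec_value > 0:
        out = to_range[dec_value % to_base] + out
        dec_value //= to_base
    return out or "0"

print(convert_base("Test", 64, 4))                       # 313032130131
print(convert_base(convert_base("Test", 64, 4), 4, 64))  # Test

Note that the expected base-4 encoding of "Test" is 313032130131; the trailing 000 in the C++ output above is exactly the extra factor of 64 (= 4^3) introduced by the pre-incremented exponent flagged in the comments.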

How to decode Base64 binary content as string, and then write it back on the HDD as binary content in WSH

Basically I'm trying to save a small binary file in my JS file (as a Base64 string). Then, during execution, to Base64-decode it and simply save it to the hard drive.
I've been struggling for hours to find out what's wrong with my code, but all in all, once I write the decoded Base64 back to the HDD, I see different bytes between the original file and the encoded-then-decoded file.
I've also used the following site to encode and decode the Base64 string, just to validate that the input/output is okay.
var base64string = "base64 string";
function Base64Encode(str) {
    if (/([^\u0000-\u00ff])/.test(str)) throw Error('String must be ASCII');
    var b64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";
    var o1, o2, o3, bits, h1, h2, h3, h4, e=[], pad = '', c;
    c = str.length % 3; // pad string to length of multiple of 3
    if (c > 0) { while (c++ < 3) { pad += '='; str += '\0'; } }
    // note: doing padding here saves us doing special-case packing for trailing 1 or 2 chars
    for (c=0; c<str.length; c+=3) { // pack three octets into four hexets
        o1 = str.charCodeAt(c);
        o2 = str.charCodeAt(c+1);
        o3 = str.charCodeAt(c+2);
        bits = o1<<16 | o2<<8 | o3;
        h1 = bits>>18 & 0x3f;
        h2 = bits>>12 & 0x3f;
        h3 = bits>>6 & 0x3f;
        h4 = bits & 0x3f;
        // use hextets to index into code string
        e[c/3] = b64.charAt(h1) + b64.charAt(h2) + b64.charAt(h3) + b64.charAt(h4);
    }
    str = e.join(''); // use Array.join() for better performance than repeated string appends
    // replace 'A's from padded nulls with '='s
    str = str.slice(0, str.length-pad.length) + pad;
    return str;
}
function Base64Decode(str) {
    if (!(/^[a-z0-9+/]+={0,2}$/i.test(str)) || str.length%4 != 0) throw Error('Not base64 string');
    var b64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";
    var o1, o2, o3, h1, h2, h3, h4, bits, d=[];
    for (var c=0; c<str.length; c+=4) { // unpack four hexets into three octets
        h1 = b64.indexOf(str.charAt(c));
        h2 = b64.indexOf(str.charAt(c+1));
        h3 = b64.indexOf(str.charAt(c+2));
        h4 = b64.indexOf(str.charAt(c+3));
        bits = h1<<18 | h2<<12 | h3<<6 | h4;
        o1 = bits>>>16 & 0xff;
        o2 = bits>>>8 & 0xff;
        o3 = bits & 0xff;
        d[c/4] = String.fromCharCode(o1, o2, o3);
        // check for padding
        if (h4 == 0x40) d[c/4] = String.fromCharCode(o1, o2);
        if (h3 == 0x40) d[c/4] = String.fromCharCode(o1);
    }
    str = d.join(''); // use Array.join() for better performance than repeated string appends
    return str;
}
var base64decoded = Base64Decode(base64string);
var TextStream = WScript.CreateObject('ADODB.Stream');
TextStream.Type = 2;
TextStream.charSet = 'windows-1250';
TextStream.Open();
TextStream.WriteText(base64decoded);
var BinaryStream = WScript.CreateObject('ADODB.Stream');
BinaryStream.Type = 1;
BinaryStream.Open();
TextStream.Position = 0;
TextStream.CopyTo(BinaryStream);
BinaryStream.SaveToFile("C:\\file.bin", 2);
BinaryStream.Close();
Turns out I used the wrong charSet.
TextStream.charSet = 'iso-8859-1'; is the correct answer.
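The reason this charset works: iso-8859-1 maps every byte value 0x00-0xFF to the Unicode code point of the same value, so a binary string survives the text stream unchanged, while windows-1250 remaps and even leaves unassigned some byte values. A quick Python illustration of the codepage difference (a sketch of the encoding behaviour only, not of ADODB itself):

data = bytes(range(256))

# iso-8859-1 is byte-transparent: every byte round-trips to the same byte
assert data.decode("iso-8859-1").encode("iso-8859-1") == data

# windows-1250 is not: some bytes (e.g. 0x81) are simply unassigned
try:
    data.decode("windows-1250")
except UnicodeDecodeError as exc:
    print(exc)  # fails on the first unassigned byte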

How to convert a very large hex number to decimal in javascript

I am trying without much success to convert a very large hex number to decimal.
My problem is that using decimal = parseInt(hex, 16) gives me errors in the number when I try to convert a hex number above 14 digits.
I have no problem with this in Java, but JavaScript does not seem to be accurate above 14 digits of hex.
I have tried "BigNumber" but this gives me the same erroneous result.
I have trawled the web to the best of my ability and found websites that will do the conversion, but I cannot figure out how to do the conversion longhand.
I have tried getting each character in turn and multiplying it by its place factor, i.e. for 123456789abcdef:
15 * Math.pow(16, 0) + 14 * Math.pow(16, 1) ... etc., but I think (being a noob) that my subroutines may not have been all they should be, because I got a completely (and I mean really!) different answer.
If it helps you guys, I can post what I have written so far for you to look at, but I am hoping someone has a simple answer for me.
<script>
function Hex2decimal(hex){
    var stringLength = hex.length;
    var characterPosition = stringLength;
    var character;
    var hexChars = new Array();
    hexChars[0] = "0";
    hexChars[1] = "1";
    hexChars[2] = "2";
    hexChars[3] = "3";
    hexChars[4] = "4";
    hexChars[5] = "5";
    hexChars[6] = "6";
    hexChars[7] = "7";
    hexChars[8] = "8";
    hexChars[9] = "9";
    hexChars[10] = "a";
    hexChars[11] = "b";
    hexChars[12] = "c";
    hexChars[13] = "d";
    hexChars[14] = "e";
    hexChars[15] = "f";
    var index = 0;
    var hexChar;
    var result;
    // document.writeln(hex);
    while (characterPosition >= 0)
    {
        // document.writeln(characterPosition);
        character = hex.charAt(characterPosition);
        while (index < hexChars.length)
        {
            // document.writeln(index);
            document.writeln("String Character = " + character);
            hexChar = hexChars[index];
            document.writeln("Hex Character = " + hexChar);
            if (hexChar == character)
            {
                result = hexChar;
                document.writeln(result);
            }
            index++
        }
        // document.write(character);
        characterPosition--;
    }
    return result;
}
</script>
Thank you.
Paul
The New 'n' Easy Way
var hex = "7FDDDDDDDDDDDDDDDDDDDDDD";
if (hex.length % 2) { hex = '0' + hex; }
var bn = BigInt('0x' + hex);
var d = bn.toString(10);
BigInts are now available in most browsers (except IE).
Earlier in this answer:
BigInts are now available in both node.js and Chrome. Firefox shouldn't be far behind.
If you need to deal with negative numbers, that requires a bit of work:
How to handle Signed JS BigInts
Essentially:
function hexToBn(hex) {
    if (hex.length % 2) {
        hex = '0' + hex;
    }
    var highbyte = parseInt(hex.slice(0, 2), 16)
    var bn = BigInt('0x' + hex);
    if (0x80 & highbyte) {
        // You'd think `bn = ~bn;` would work... but it doesn't
        // manually perform two's complement (flip bits, add one)
        // (because JS binary operators are incorrect for negatives)
        bn = BigInt('0b' + bn.toString(2).split('').map(function (i) {
            return '0' === i ? 1 : 0
        }).join('')) + BigInt(1);
        bn = -bn;
    }
    return bn;
}
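The same signed interpretation is nearly a one-liner in Python and makes a handy cross-check (a sketch using int.from_bytes with signed=True; the function name is mine):

def hex_to_signed(h: str) -> int:
    if len(h) % 2:
        h = "0" + h
    # big-endian bytes, two's-complement interpretation
    return int.from_bytes(bytes.fromhex(h), "big", signed=True)

print(hex_to_signed("7ffffffffffffff0"))  # 9223372036854775792
print(hex_to_signed("ffffffffffffffe0"))  # -32, same as hexToBn gives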
Ok, let's try this:
function h2d(s) {
    function add(x, y) {
        var c = 0, r = [];
        var x = x.split('').map(Number);
        var y = y.split('').map(Number);
        while(x.length || y.length) {
            var s = (x.pop() || 0) + (y.pop() || 0) + c;
            r.unshift(s < 10 ? s : s - 10);
            c = s < 10 ? 0 : 1;
        }
        if(c) r.unshift(c);
        return r.join('');
    }
    var dec = '0';
    s.split('').forEach(function(chr) {
        var n = parseInt(chr, 16);
        for(var t = 8; t; t >>= 1) {
            dec = add(dec, dec);
            if(n & t) dec = add(dec, '1');
        }
    });
    return dec;
}
Test:
t = 'dfae267ab6e87c62b10b476e0d70b06f8378802d21f34e7'
console.log(h2d(t))
prints
342789023478234789127089427304981273408912349586345899239
which is correct (feel free to verify).
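Python's built-in integers are arbitrary precision, so they make an easy cross-check for results like this:

# verify the JS result with Python's arbitrary-precision int
print(int("dfae267ab6e87c62b10b476e0d70b06f8378802d21f34e7", 16))
# 342789023478234789127089427304981273408912349586345899239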
Notice that "0x" + "ff" will be considered as 255, so convert your hex value to a string and add "0x" ahead.
function Hex2decimal(hex)
{
    return ("0x" + hex) / 1;
}
If you are using the '0x' notation for your Hex String, don't forget to add s = s.slice(2) to remove the '0x' prefix.
Keep in mind that JavaScript only has a single numeric type (double), and does not provide any separate integer types. So it may not be possible for it to store exact representations of your numbers.
In order to get exact results you need to use a library for arbitrary-precision integers, such as BigInt.js. For example, the code:
var x = str2bigInt("5061756c205768697465",16,1,1);
var s = bigInt2str(x, 10);
$('#output').text(s);
Correctly converts 0x5061756c205768697465 to the expected result of 379587113978081151906917.
Here is a jsfiddle if you would like to experiment with the code listed above.
The BigInt constructor can take a hex string as argument:
/** @param hex = "a83b01cd..." */
function Hex2decimal(hex) {
    return BigInt("0x" + hex).toString(10);
}
Usage:
Hex2decimal("100");
Output:
256
A rip-off from the other answer, but without the meaningless 0 padding =P

JavaScript strings outside of the BMP

BMP being Basic Multilingual Plane
According to JavaScript: the Good Parts:
JavaScript was built at a time when Unicode was a 16-bit character set, so all characters in JavaScript are 16 bits wide.
This leads me to believe that JavaScript uses UCS-2 (not UTF-16!) and can only handle characters up to U+FFFF.
Further investigation confirms this:
> String.fromCharCode(0x20001);
The fromCharCode method seems to only use the lowest 16 bits when returning the Unicode character. Trying to get U+20001 (CJK unified ideograph 20001) instead returns U+0001.
Question: is it at all possible to handle post-BMP characters in JavaScript?
2011-07-31: slide twelve from Unicode Support Shootout: The Good, The Bad, & the (mostly) Ugly covers issues related to this quite well:
Depends what you mean by ‘support’. You can certainly put non-UCS-2 characters in a JS string using surrogates, and browsers will display them if they can.
But, each item in a JS string is a separate UTF-16 code unit. There is no language-level support for handling full characters: all the standard String members (length, split, slice etc) all deal with code units not characters, so will quite happily split surrogate pairs or hold invalid surrogate sequences.
If you want surrogate-aware methods, I'm afraid you're going to have to start writing them yourself! For example:
String.prototype.getCodePointLength= function() {
    return this.length-this.split(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g).length+1;
};
String.fromCodePoint= function() {
    var chars= Array.prototype.slice.call(arguments);
    for (var i= chars.length; i-->0;) {
        var n = chars[i]-0x10000;
        if (n>=0)
            chars.splice(i, 1, 0xD800+(n>>10), 0xDC00+(n&0x3FF));
    }
    return String.fromCharCode.apply(null, chars);
};
I came to the same conclusion as bobince. If you want to work with strings containing Unicode characters outside of the BMP, you have to reimplement JavaScript's String methods. This is because JavaScript counts each 16-bit code unit as a character. Symbols outside of the BMP need two code units to be represented. You therefore run into cases where some symbols count as two characters and some count as only one.
I've reimplemented the following methods to treat each Unicode code point as a single character: .length, .charCodeAt, .fromCharCode, .charAt, .indexOf, .lastIndexOf, .slice, and .split.
You can check it out on jsfiddle: http://jsfiddle.net/Y89Du/
Here's the code without comments. I tested it, but it may still have errors. Comments are welcome.
if (!String.prototype.ucLength) {
    String.prototype.ucLength = function() {
        // this solution was taken from
        // http://stackoverflow.com/questions/3744721/javascript-strings-outside-of-the-bmp
        return this.length - this.split(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g).length + 1;
    };
}
if (!String.prototype.codePointAt) {
    String.prototype.codePointAt = function (ucPos) {
        if (isNaN(ucPos)){
            ucPos = 0;
        }
        var str = String(this);
        var codePoint = null;
        var pairFound = false;
        var ucIndex = -1;
        var i = 0;
        while (i < str.length){
            ucIndex += 1;
            var code = str.charCodeAt(i);
            var next = str.charCodeAt(i + 1);
            pairFound = (0xD800 <= code && code <= 0xDBFF && 0xDC00 <= next && next <= 0xDFFF);
            if (ucIndex == ucPos){
                codePoint = pairFound ? ((code - 0xD800) * 0x400) + (next - 0xDC00) + 0x10000 : code;
                break;
            } else{
                i += pairFound ? 2 : 1;
            }
        }
        return codePoint;
    };
}
if (!String.fromCodePoint) {
    String.fromCodePoint = function () {
        var strChars = [], codePoint, offset, codeValues, i;
        for (i = 0; i < arguments.length; ++i) {
            codePoint = arguments[i];
            offset = codePoint - 0x10000;
            if (codePoint > 0xFFFF){
                codeValues = [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)];
            } else{
                codeValues = [codePoint];
            }
            strChars.push(String.fromCharCode.apply(null, codeValues));
        }
        return strChars.join("");
    };
}
if (!String.prototype.ucCharAt) {
    String.prototype.ucCharAt = function (ucIndex) {
        var str = String(this);
        var codePoint = str.codePointAt(ucIndex);
        var ucChar = String.fromCodePoint(codePoint);
        return ucChar;
    };
}
if (!String.prototype.ucIndexOf) {
    String.prototype.ucIndexOf = function (searchStr, ucStart) {
        if (isNaN(ucStart)){
            ucStart = 0;
        }
        if (ucStart < 0){
            ucStart = 0;
        }
        var str = String(this);
        var strUCLength = str.ucLength();
        searchStr = String(searchStr);
        var ucSearchLength = searchStr.ucLength();
        var i = ucStart;
        while (i < strUCLength){
            var ucSlice = str.ucSlice(i,i+ucSearchLength);
            if (ucSlice == searchStr){
                return i;
            }
            i++;
        }
        return -1;
    };
}
if (!String.prototype.ucLastIndexOf) {
    String.prototype.ucLastIndexOf = function (searchStr, ucStart) {
        var str = String(this);
        var strUCLength = str.ucLength();
        if (isNaN(ucStart)){
            ucStart = strUCLength - 1;
        }
        if (ucStart >= strUCLength){
            ucStart = strUCLength - 1;
        }
        searchStr = String(searchStr);
        var ucSearchLength = searchStr.ucLength();
        var i = ucStart;
        while (i >= 0){
            var ucSlice = str.ucSlice(i,i+ucSearchLength);
            if (ucSlice == searchStr){
                return i;
            }
            i--;
        }
        return -1;
    };
}
if (!String.prototype.ucSlice) {
    String.prototype.ucSlice = function (ucStart, ucStop) {
        var str = String(this);
        var strUCLength = str.ucLength();
        if (isNaN(ucStart)){
            ucStart = 0;
        }
        if (ucStart < 0){
            ucStart = strUCLength + ucStart;
            if (ucStart < 0){ ucStart = 0;}
        }
        if (typeof(ucStop) == 'undefined'){
            ucStop = strUCLength - 1;
        }
        if (ucStop < 0){
            ucStop = strUCLength + ucStop;
            if (ucStop < 0){ ucStop = 0;}
        }
        var ucChars = [];
        var i = ucStart;
        while (i < ucStop){
            ucChars.push(str.ucCharAt(i));
            i++;
        }
        return ucChars.join("");
    };
}
if (!String.prototype.ucSplit) {
    String.prototype.ucSplit = function (delimeter, limit) {
        var str = String(this);
        var strUCLength = str.ucLength();
        var ucChars = [];
        if (delimeter == ''){
            for (var i = 0; i < strUCLength; i++){
                ucChars.push(str.ucCharAt(i));
            }
            ucChars = ucChars.slice(0, 0 + limit);
        } else{
            ucChars = str.split(delimeter, limit);
        }
        return ucChars;
    };
}
More recent JavaScript engines have String.fromCodePoint.
const ideograph = String.fromCodePoint( 0x20001 ); // outside the BMP
Also a code-point iterator, which gets you the code-point length.
function countCodePoints( str )
{
    const i = str[Symbol.iterator]();
    let count = 0;
    while( !i.next().done ) ++count;
    return count;
}
console.log( ideograph.length ); // gives '2'
console.log( countCodePoints(ideograph) ); // '1'
Yes, you can. Although support for non-BMP characters directly in source documents is optional according to the ECMAScript standard, modern browsers let you use them. Naturally, the document encoding must be properly declared, and for most practical purposes you would need to use the UTF-8 encoding. Moreover, you need an editor that can handle UTF-8, and you need some input method(s); see e.g. my Full Unicode Input utility.
Using suitable tools and settings, you can write var foo = '𠀁'.
The non-BMP characters will be internally represented as surrogate pairs, so each non-BMP character counts as 2 in the string length.
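The surrogate-pair arithmetic is the same one used in the fromCodePoint shims above; as a worked example for U+20001 (a short sketch, here in Python just to do the arithmetic):

cp = 0x20001
offset = cp - 0x10000            # 20 payload bits
hi = 0xD800 + (offset >> 10)     # high surrogate: 0xd840
lo = 0xDC00 + (offset & 0x3FF)   # low surrogate:  0xdc01
print(hex(hi), hex(lo))          # 0xd840 0xdc01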
Using the for (c of this) construct, one can perform various computations on a string that contains non-BMP characters. For instance, to compute the string length, or to get the nth character of the string:
String.prototype.magicLength = function()
{
    var c, k;
    k = 0;
    for (c of this) // iterate each char of this
    {
        k++;
    }
    return k;
}
String.prototype.magicCharAt = function(n)
{
    var c, k;
    k = 0;
    for (c of this) // iterate each char of this
    {
        if (k == n) return c + "";
        k++;
    }
    return "";
}
This old topic has now a simple solution in ES6:
Split characters into an array
simple version
[..."πŸ˜΄πŸ˜„πŸ˜ƒβ›”πŸŽ πŸš“πŸš‡"] // ["😴", "πŸ˜„", "πŸ˜ƒ", "β›”", "🎠", "πŸš“", "πŸš‡"]
Then having each one separated you can handle them easily for most common cases.
Credit: DownGoat
Full solution
To overcome special emojis such as the one in the comment, one can search for the connecting character (the ZERO WIDTH JOINER, char code 8205 in UTF-16) and make some modifications. Here is how:
let myStr = "👩‍👩‍👧‍👧😃𝌆"
let arr = [...myStr]
for (let i = arr.length - 2; i >= 0; i--) {
    if (arr[i].charCodeAt(0) == 8205) { // special combination character (ZWJ)
        arr[i-1] += arr[i] + arr[i+1]; // combine them back to a single emoji
        arr.splice(i, 2)
    }
}
console.log(arr.length) //3
console.log(arr.length) //3
Haven't found a case where this doesn't work. Comment if you do.
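The same ZWJ-merging idea expressed in Python, where strings are already sequences of code points, so only the joiner handling is needed (a naive sketch, not a full grapheme segmenter; the function name is mine):

ZWJ = "\u200d"  # ZERO WIDTH JOINER, char code 8205

def split_emoji_naive(s: str) -> list:
    out = []
    for ch in s:
        # glue a joiner, and whatever follows it, onto the previous chunk
        if out and (ch == ZWJ or out[-1].endswith(ZWJ)):
            out[-1] += ch
        else:
            out.append(ch)
    return out

print(len(split_emoji_naive("\U0001F469\u200D\U0001F469\u200D"
                            "\U0001F467\u200D\U0001F467\U0001F603\U0001D306")))  # 3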
To conclude
it seems that emoji sequences use char code 8205 (the ZERO WIDTH JOINER) to combine several code points into a single displayed symbol.
