How to convert a string to a byte array - JavaScript
How can I convert a string into a byte array using JavaScript? The output should be equivalent to the C# code below.
UnicodeEncoding encoding = new UnicodeEncoding();
byte[] bytes = encoding.GetBytes(AnyString);
UnicodeEncoding defaults to UTF-16 with little-endian byte order.
Edit: I have a requirement to match the byte array generated client-side with the one generated server-side using the above C# code.
Update 2018: the easiest way should now be TextEncoder.
let utf8Encode = new TextEncoder();
utf8Encode.encode("abc");
// Uint8Array [ 97, 98, 99 ]
Caveats: the returned value is a Uint8Array, and not all browsers support it. Note also that TextEncoder always encodes to UTF-8, so its output will not match the UTF-16 byte array produced by the C# code in the question.
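If the goal is to match the C# UnicodeEncoding (UTF-16LE) bytes rather than UTF-8, a minimal sketch could look like this (the function name is just illustrative):
function strToUtf16LeBytes(str) {
  const bytes = [];
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i); // UTF-16 code unit, 0x0000-0xFFFF
    bytes.push(code & 0xff, code >> 8); // low byte first = little-endian
  }
  return bytes;
}
strToUtf16LeBytes('Hello'); // [ 72, 0, 101, 0, 108, 0, 108, 0, 111, 0 ]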
If you are looking for a solution that works in node.js, you can use this:
var myBuffer = [];
var str = 'Stack Overflow';
var buffer = Buffer.from(str, 'utf16le'); // Buffer.from replaces the deprecated new Buffer()
for (var i = 0; i < buffer.length; i++) {
myBuffer.push(buffer[i]);
}
console.log(myBuffer);
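For comparison, a quick check (assuming a recent Node.js) that this matches the C# UnicodeEncoding output for "Hello":
console.log([...Buffer.from('Hello', 'utf16le')]);
// [ 72, 0, 101, 0, 108, 0, 108, 0, 111, 0 ]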
In C#, running this
UnicodeEncoding encoding = new UnicodeEncoding();
byte[] bytes = encoding.GetBytes("Hello");
will create an array with
72,0,101,0,108,0,108,0,111,0
For a character whose code is greater than 255, both bytes of its UTF-16 code unit are used; for example, the 竜 character (U+7ADC) is encoded as 220, 122.
If you want very similar behavior in JavaScript, you can do this (v2 is a bit more robust, while the original version will only work for character codes 0x00 ~ 0xff):
var str = "Hello竜";
var bytes = []; // char codes
var bytesv2 = []; // char codes
for (var i = 0; i < str.length; ++i) {
var code = str.charCodeAt(i);
bytes = bytes.concat([code]);
bytesv2 = bytesv2.concat([code & 0xff, code / 256 >>> 0]);
}
// 72, 101, 108, 108, 111, 31452
console.log('bytes', bytes.join(', '));
// 72, 0, 101, 0, 108, 0, 108, 0, 111, 0, 220, 122
console.log('bytesv2', bytesv2.join(', '));
I suppose C# and Java produce equal byte arrays. If you have non-ASCII characters, it's not enough to add an additional 0 byte after each character code. My example contains a few special characters:
var str = "Hell ö € Ω 𝄞";
var bytes = [];
var charCode;
for (var i = 0; i < str.length; ++i)
{
charCode = str.charCodeAt(i);
bytes.push((charCode & 0xFF00) >> 8);
bytes.push(charCode & 0xFF);
}
alert(bytes.join(' '));
// 0 72 0 101 0 108 0 108 0 32 0 246 0 32 32 172 0 32 3 169 0 32 216 52 221 30
I don't know whether C# places a BOM (Byte Order Mark), but when using UTF-16, Java's String.getBytes adds the following bytes: 254 255.
String s = "Hell ö € Ω ";
// now add a character outside the BMP (Basic Multilingual Plane)
// we take the violin-symbol (U+1D11E) MUSICAL SYMBOL G CLEF
s += new String(Character.toChars(0x1D11E));
// surrogate codepoints are: d834, dd1e, so one could also write "\ud834\udd1e"
byte[] bytes = s.getBytes("UTF-16");
for (byte aByte : bytes) {
System.out.print((0xFF & aByte) + " ");
}
// 254 255 0 72 0 101 0 108 0 108 0 32 0 246 0 32 32 172 0 32 3 169 0 32 216 52 221 30
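For reference, a JavaScript sketch that should reproduce the Java output above (the function name is just illustrative; the leading 254, 255 is the big-endian BOM that Java's getBytes("UTF-16") emits):
function strToUtf16BeBytesWithBom(str) {
  var bytes = [254, 255]; // big-endian BOM
  for (var i = 0; i < str.length; i++) {
    var code = str.charCodeAt(i);
    bytes.push((code & 0xFF00) >> 8, code & 0xFF); // high byte, then low byte
  }
  return bytes;
}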
Edit:
Added a special character (U+1D11E) MUSICAL SYMBOL G CLEF (outside the BMP, so it takes not only 2 bytes in UTF-16, but 4).
Current JavaScript versions use "UCS-2" internally, so this symbol takes the space of 2 normal characters.
I'm not sure, but when using charCodeAt it seems we get exactly the surrogate code units also used in UTF-16, so non-BMP characters are handled correctly.
This problem is absolutely non-trivial. It might depend on the JavaScript version and engine in use. So if you want reliable solutions, you should have a look at:
https://github.com/koichik/node-codepoint/
http://mathiasbynens.be/notes/javascript-escapes
Mozilla Developer Network: charCodeAt
BigEndian vs. LittleEndian
UTF-16 Byte Array
JavaScript encodes strings as UTF-16, just like C#'s UnicodeEncoding, so creating a byte array is relatively straightforward.
JavaScript's charCodeAt() returns a 16-bit code unit (aka a 2-byte integer between 0 and 65535). You can split it into distinct bytes using the following:
function strToUtf16Bytes(str) {
const bytes = [];
for (let ii = 0; ii < str.length; ii++) {
const code = str.charCodeAt(ii); // x00-xFFFF
bytes.push(code & 255, code >> 8); // low, high
}
return bytes;
}
For example:
strToUtf16Bytes('🌵');
// [ 60, 216, 53, 223 ]
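To double-check the byte order, you can decode the result back (assuming an environment whose TextDecoder supports the 'utf-16le' label, which browsers and Node.js do):
const cactusBytes = strToUtf16Bytes('🌵');
new TextDecoder('utf-16le').decode(new Uint8Array(cactusBytes)); // '🌵'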
This works between C# and JavaScript because they both support UTF-16. However, if you want to get a UTF-8 byte array from JS, you must transcode the bytes.
UTF-8 Byte Array
The solution feels somewhat non-trivial, but I used the code below in production with great success (original source).
Also, for the interested reader, I published my unicode helpers that help me work with string lengths reported by other languages such as PHP.
/**
 * Convert a string to a UTF-8 byte array
 * @param {string} str
 * @returns {Array} of bytes
 */
export function strToUtf8Bytes(str) {
const utf8 = [];
for (let ii = 0; ii < str.length; ii++) {
let charCode = str.charCodeAt(ii);
if (charCode < 0x80) utf8.push(charCode);
else if (charCode < 0x800) {
utf8.push(0xc0 | (charCode >> 6), 0x80 | (charCode & 0x3f));
} else if (charCode < 0xd800 || charCode >= 0xe000) {
utf8.push(0xe0 | (charCode >> 12), 0x80 | ((charCode >> 6) & 0x3f), 0x80 | (charCode & 0x3f));
} else {
ii++;
// Surrogate pair:
// UTF-16 encodes 0x10000-0x10FFFF by subtracting 0x10000 and
// splitting the 20 bits of 0x0-0xFFFFF into two halves
charCode = 0x10000 + (((charCode & 0x3ff) << 10) | (str.charCodeAt(ii) & 0x3ff));
utf8.push(
0xf0 | (charCode >> 18),
0x80 | ((charCode >> 12) & 0x3f),
0x80 | ((charCode >> 6) & 0x3f),
0x80 | (charCode & 0x3f),
);
}
}
return utf8;
}
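A quick sanity check, where TextEncoder is available, is to compare against the built-in UTF-8 encoder; the two arrays should be identical:
const sample = 'Hell ö € Ω 𝄞';
console.log(strToUtf8Bytes(sample).join(','));
console.log(Array.from(new TextEncoder().encode(sample)).join(','));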
Inspired by @hgoebl's answer. His code is for UTF-16, and I needed something for US-ASCII. So here's a more complete answer covering US-ASCII, UTF-16, and UTF-32.
/** @returns {Array} bytes of US-ASCII */
function stringToAsciiByteArray(str)
{
var bytes = [];
for (var i = 0; i < str.length; ++i)
{
var charCode = str.charCodeAt(i);
if (charCode > 0x7F) // not representable in US-ASCII (7-bit); charCodeAt returns the UTF-16 code unit
{
throw new Error('Character ' + String.fromCharCode(charCode) + ' can\'t be represented by a US-ASCII byte.');
}
bytes.push(charCode);
}
return bytes;
}
/** @returns {Array} bytes of UTF-16 Big Endian without BOM */
function stringToUtf16ByteArray(str)
{
var bytes = [];
//currently the function returns without BOM. Uncomment the next line to change that.
//bytes.push(254, 255); //Big Endian Byte Order Marks
for (var i = 0; i < str.length; ++i)
{
var charCode = str.charCodeAt(i);
//char > 2 bytes is impossible since charCodeAt can only return 2 bytes
bytes.push((charCode & 0xFF00) >>> 8); //high byte (might be 0)
bytes.push(charCode & 0xFF); //low byte
}
return bytes;
}
/** @returns {Array} bytes of UTF-32 Big Endian without BOM */
function stringToUtf32ByteArray(str)
{
var bytes = [];
//currently the function returns without BOM. Uncomment the next line to change that.
//bytes.push(0, 0, 254, 255); //Big Endian Byte Order Marks
for (var i = 0; i < str.length; i++)
{
var charPoint = str.codePointAt(i);
if (charPoint > 0xFFFF) i++; //this code point occupies two UTF-16 code units (a surrogate pair), so skip the low surrogate
//char > 4 bytes is impossible since codePointAt can only return 4 bytes
bytes.push((charPoint & 0xFF000000) >>> 24);
bytes.push((charPoint & 0xFF0000) >>> 16);
bytes.push((charPoint & 0xFF00) >>> 8);
bytes.push(charPoint & 0xFF);
}
return bytes;
}
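For example (expected results assuming the BOM lines above stay commented out):
stringToAsciiByteArray('Hi'); // [72, 105]
stringToUtf16ByteArray('Hi'); // [0, 72, 0, 105]
stringToUtf32ByteArray('Hi'); // [0, 0, 0, 72, 0, 0, 0, 105]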
UTF-8 isn't included because it is variable length and I would have had to write the encoding myself. UTF-8 and UTF-16 are variable length; UTF-8, UTF-16, and UTF-32 each have the minimum number of bits their names indicate. If a UTF-32 character has a code point of 65, that means there are 3 leading 0 bytes, while the same code in UTF-16 has only 1 leading 0 byte. US-ASCII, on the other hand, is a fixed-width single-byte encoding, which means it can be translated directly to bytes.
String.prototype.charCodeAt returns a maximum of 2 bytes and matches UTF-16 exactly. However, UTF-32 needs String.prototype.codePointAt, which was introduced in ECMAScript 6 (ES2015). Because charCodeAt can return values for more characters than US-ASCII can represent, the function stringToAsciiByteArray throws in such cases instead of splitting the character in half and taking either or both bytes.
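A small illustration of the charCodeAt / codePointAt difference described above, using the G clef character from earlier:
'𝄞'.length;         // 2 (stored as two UTF-16 code units)
'𝄞'.charCodeAt(0);  // 55348 (0xD834, the high surrogate)
'𝄞'.codePointAt(0); // 119070 (0x1D11E, the full code point)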
Note that this answer is non-trivial because character encoding is non-trivial. What kind of byte array you want depends on what character encoding you want those bytes to represent.
JavaScript has the option of internally using either UTF-16 or UCS-2, but since it has methods that act as if it were UTF-16, I don't see why any browser would use UCS-2.
Also see: https://mathiasbynens.be/notes/javascript-encoding
Yes, I know the question is 4 years old, but I needed this answer for myself.
Since I cannot comment on the answer, I'd build on Jin Izzraeel's answer
var myBuffer = [];
var str = 'Stack Overflow';
var buffer = Buffer.from(str, 'utf16le'); // Buffer.from replaces the deprecated new Buffer()
for (var i = 0; i < buffer.length; i++) {
myBuffer.push(buffer[i]);
}
console.log(myBuffer);
by saying that you could use this if you want to use a Node.js buffer in your browser.
https://github.com/feross/buffer
Therefore, Tom Stickel's objection is not valid, and the answer is indeed valid.
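For completeness, a minimal sketch of what that looks like with a bundler (assuming browserify or webpack 4, which map require('buffer') to the feross/buffer package; variable names are illustrative):
var Buffer = require('buffer').Buffer;
var myBuffer = [];
var buffer = Buffer.from('Stack Overflow', 'utf16le');
for (var i = 0; i < buffer.length; i++) {
  myBuffer.push(buffer[i]);
}
console.log(myBuffer);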
String.prototype.encodeHex = function () {
return this.split('').map(e => e.charCodeAt(0)); // note: these are decimal char codes, not hex
};
Array.prototype.decodeHex = function () { // defined on Array.prototype, since encodeHex returns an array
return this.map(e => String.fromCharCode(e)).join('');
};
The best solution I've come up with on the spot (though most likely crude) would be:
String.prototype.getBytes = function() {
var bytes = [];
for (var i = 0; i < this.length; i++) {
var charCode = this.charCodeAt(i);
var cLen = Math.max(1, Math.ceil(Math.log(charCode + 1) / Math.log(256))); // at least 1 byte, even for char code 0
for (var j = 0; j < cLen; j++) {
bytes.push((charCode >> (j*8)) & 0xFF); // shift right to extract each byte, low byte first
}
}
return bytes;
}
Though I notice this question has been here for over a year.
I know the question is almost 4 years old, but this is what worked smoothly for me (note that despite the names, these functions work with decimal character codes, not hexadecimal):
String.prototype.encodeHex = function () {
var bytes = [];
for (var i = 0; i < this.length; ++i) {
bytes.push(this.charCodeAt(i));
}
return bytes;
};
Array.prototype.decodeHex = function () {
var str = [];
var hex = this.toString().split(',');
for (var i = 0; i < hex.length; i++) {
str.push(String.fromCharCode(hex[i]));
}
return str.toString().replace(/,/g, "");
};
var str = "Hello World!";
var bytes = str.encodeHex();
alert('The Hexa Code is: '+bytes+' The original string is: '+bytes.decodeHex());
or, if you want to work with strings only, and no Array, you can use:
String.prototype.encodeHex = function () {
var bytes = [];
for (var i = 0; i < this.length; ++i) {
bytes.push(this.charCodeAt(i));
}
return bytes.toString();
};
String.prototype.decodeHex = function () {
var str = [];
var hex = this.split(',');
for (var i = 0; i < hex.length; i++) {
str.push(String.fromCharCode(hex[i]));
}
return str.toString().replace(/,/g, "");
};
var str = "Hello World!";
var bytes = str.encodeHex();
alert('The Hexa Code is: '+bytes+' The original string is: '+bytes.decodeHex());
Here is the same function that @BrunoLM posted, converted to a String prototype function:
String.prototype.getBytes = function () {
var bytes = [];
for (var i = 0; i < this.length; ++i) {
bytes.push(this.charCodeAt(i));
}
return bytes;
};
If you define the function as such, then you can call the .getBytes() method on any string:
var str = "Hello World!";
var bytes = str.getBytes();
You don't need Underscore; just use the built-in map:
var string = 'Hello World!';
document.write(string.split('').map(function(c) { return c.charCodeAt(); }));
Related
Equivalent of Java's getBytes in JavaScript for different encodings
I have a function in Java that I need to convert to JavaScript, and it contains this line:
byte[] bytes = ttText.getBytes(Charset.forName("Cp1250"));
ttText is a String. I need to do the same thing: get the bytes of a string encoded in Cp1250 (windows-1250), modify those bytes, and then convert them back to a string. Is there a way to do that in JavaScript? I discovered TextEncoder and TextDecoder, for example, but support for encodings other than UTF-8 was dropped some time ago.
var cp1250 = '€ ‚ „…†‡ ‰Š‹ŚŤŽŹ ‘’“”•–— ™š›śťžź ˇ˘Ł¤Ą¦§¨©Ş«¬®Ż°±˛ł´µ¶·¸ąş»Ľ˝ľżŔÁÂĂÄĹĆÇČÉĘËĚÍÎĎĐŃŇÓÔŐÖ×ŘŮÚŰÜÝŢßŕáâăäĺćçčéęëěíîďđńňóôőö÷řůúűüýţ˙';

function encodeCP1250(text) {
  var buf = [];
  for (var i = 0; i < text.length; i++) {
    var code = cp1250.indexOf(text[i]);
    if (code >= 0) {
      code += 128;
    } else {
      code = text.charCodeAt(i);
    }
    buf.push(code > 255 ? 32 : code);
  }
  return buf;
}

function decodeCP1250(buf) {
  var text = '';
  for (var i = 0; i < buf.length; i++) {
    var code = buf[i];
    text += code > 127 ? cp1250[code - 128] : String.fromCharCode(code);
  }
  return text;
}

var buf = encodeCP1250('AÁÂĂÄ'); // [65, 193, 194, 195, 196]
var text = decodeCP1250(buf);    // 'AÁÂĂÄ'

Update: Chrome and Firefox have TextDecoder as an experimental feature, but TextEncoder works only with UTF-8.
Try this: https://mths.be/windows-1250
This looks promising. It provides support for both encoding and decoding. All you need to do is add the library and use its methods:
var encodedData = windows1250.encode(text);
Longitudinal redundancy check in Javascript
I'm working with a system that integrates a Point of Sale (POS) device. I use chrome.serial to scan ports and read credit card data. The problem I'm facing is that I need to append the LRC for a string in this format:

STX = '\002' (2 HEX) (Start of text)
LLL = Length of data (doesn't include STX or ETX, but does include the command)
Command C50 {C = a message from PC to POS, 50 = the actual code that "prints" a message on the POS}
ETX = '\003' (3 HEX) (End of text)
LRC = Longitudinal Redundancy Check

A message example would be as follows:

'\002014C50HELLO WORLD\003'

Here we can see 002 as STX, 014 as the length from C50 to D, and 003 as ETX.

I found some algorithms in C# (like this one or this one), and even this one in Java; I also saw a question that was removed from SO on Google's cache, which asks the same thing as I do but had no examples or answers. I also made this Java algorithm:

private int calculateLRC(String str) {
  int result = 0;
  for (int i = 0; i < str.length(); i++) {
    String char1 = str.substring(i, i + 1);
    char[] char2 = char1.toCharArray();
    int number = char2[0];
    result = result ^ number;
  }
  return result;
}

and tried porting it to JavaScript (where I have poor knowledge):

function calculateLRC2(str) {
  var result = 0;
  for (var i = 0; i < str.length; i++) {
    var char1 = str.substring(i, i + 1);
    //var char2[] = char1.join('');
    var number = char1;
    result = result ^ number;
  }
  return result.toString();
}

and after following Wikipedia's pseudocode I tried this:

function calculateLRC(str) {
  var buffer = convertStringToArrayBuffer(str);
  var lrc;
  for (var i = 0; i < str.length; i++) {
    lrc = (lrc + buffer[i]) & 0xFF;
  }
  lrc = ((lrc ^ 0xFF) + 1) & 0xFF;
  return lrc;
}

This is how I call the above method:

var finalMessage = '\002014C50HELLO WORLD\003';
var lrc = calculateLRC(finalMessage);
console.log('lrc: ' + lrc);
finalMessage = finalMessage.concat(lrc);
console.log('finalMessage: ' + finalMessage);

However, after trying all these methods, I still can't send a message to the POS correctly. I have spent 3 days trying to fix this and can't do anything more until I finish it. Does anyone know another way to calculate the LRC, or what am I doing wrong here? I need it in JavaScript, since the POS communicates with the PC through Node.js.

By the way, the code for convertStringToArrayBuffer is from the chrome.serial documentation:

var writeSerial = function(str) {
  chrome.serial.send(connectionId, convertStringToArrayBuffer(str), onSend);
};
// Convert string to ArrayBuffer
var convertStringToArrayBuffer = function(str) {
  var buf = new ArrayBuffer(str.length);
  var bufView = new Uint8Array(buf);
  for (var i = 0; i < str.length; i++) {
    bufView[i] = str.charCodeAt(i);
  }
  return buf;
};

Edit: After testing I came up with this algorithm, which returns a 'z' (lower case) for the input \002007C50HOLA\003:

function calculateLRC(str) {
  var bytes = [];
  var lrc = 0;
  for (var i = 0; i < str.length; i++) {
    bytes.push(str.charCodeAt(i));
  }
  for (var i = 0; i < str.length; i++) {
    lrc ^= bytes[i];
    console.log('lrc: ' + lrc);
    //console.log('lrcString: ' + String.fromCharCode(lrc));
  }
  console.log('bytes: ' + bytes);
  return String.fromCharCode(lrc);
}

However, with some longer inputs, and especially when trying to read card data, the LRC sometimes comes out as a control character, which might be a problem since I already use those characters in my strings. Is there a way to force the LRC to avoid those characters? Or maybe I'm doing it wrong, and that's why I'm getting those characters as output.
I solved the LRC issue by calculating it with the following method, after reading @Jack A.'s answer and modifying it to this:

function calculateLRC(str) {
  var bytes = [];
  var lrc = 0;
  for (var i = 0; i < str.length; i++) {
    bytes.push(str.charCodeAt(i));
  }
  for (var i = 0; i < str.length; i++) {
    lrc ^= bytes[i];
  }
  return String.fromCharCode(lrc);
}

Explanation of what it does:
1st: it converts the received string to its character codes (charCodeAt()).
2nd: it calculates the LRC by XORing the previously calculated LRC (0 on the first iteration) with each character code.
3rd: it converts the result back from a character code to its character (fromCharCode()) and returns that character to whatever function called it.
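For example, with the message format from the question (the expected 'z' matches the result mentioned in the question's edit):
calculateLRC('\002007C50HOLA\003'); // 'z'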
Your pseudocode-based algorithm is using addition. For the XOR version, try this:

function calculateLRC(str) {
  var buffer = new Uint8Array(convertStringToArrayBuffer(str)); // view the ArrayBuffer as bytes; indexing the raw ArrayBuffer yields undefined
  var lrc = 0;
  for (var i = 0; i < str.length; i++) {
    lrc = (lrc ^ buffer[i]) & 0xFF;
  }
  return lrc;
}

I think your original attempt at the XOR version was failing because you needed to get the character code. The number variable still contained a string when you did result = result ^ number, so the results were probably not what you expected.
This is a SWAG since I don't have Node.js installed at the moment, so I can't verify it will work.
Another thing I would be concerned about is character encoding. JavaScript uses UTF-16 for text, so converting any non-ASCII characters to 8-bit bytes may give unexpected results.
how convert binary data in base64 from couchdb attachments
I'm trying to get binary data from my CouchDB server, but I can't use it. The response contains a string that represents binary data, but if I try to encode it in base64 with the btoa function, I get this error:

Uncaught InvalidCharacterError: 'btoa' failed: The string to be encoded contains characters outside of the Latin1 range.

I know that I can get the data already encoded in base64, but I don't want to.

$.ajax({
  url: "http://localhost:5984/testdb/7d9de7a8f2cab6c0b3409d4495000e3f/img",
  headers: {
    Authorization: 'Basic ' + btoa("name:password"),
  },
  success: function(data){
    /*console.log(JSON.parse(jsonData));
    console.log(imageData);*/
    document.getElementById("immagine").src = "Data:image/jpg;base64," + btoa(data);
    console.log(data);
  }
});

Any idea?
Start with the knowledge that each char in a Base64 string

var chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'.split('');

represents a number, specifically from 0 to 63. Next, consider that this range of numbers is all the numbers you can write with 6 bits, and that we normally think about binary data in bytes, which are 8 bits long. So the transformation we wish to achieve is 8-bit integers to 6-bit integers, which looks a bit like this

xxxxxx xxyyyy yyyyzz zzzzzz

where each letter describes which byte the bit is in, and the spaces describe the breaks between the 6-bit integers. Once we have the 6-bit numbers, we can simply transform them to chars, and finally add = signs if we need to indicate that the number of bytes was not a multiple of 3 (i.e. the missing bytes are padding, not simply 0).

So how do we do this?

var arr8 = 'foobar'; // assuming binary string
var i, // to iterate
    s1, s2, s3, s4, // sixes
    e1, e2, e3, // eights
    b64 = ''; // result

// some code I prepared earlier
for (i = 0; i < arr8.length; i += 3) {
  e1 = arr8[i];     e1 = e1 ? e1.charCodeAt(0) & 255 : 0;
  e2 = arr8[i + 1]; e2 = e2 ? e2.charCodeAt(0) & 255 : 0;
  e3 = arr8[i + 2]; e3 = e3 ? e3.charCodeAt(0) & 255 : 0;
  // wwwwwwxx xxxxyyyy yyzzzzzz
  s1 = e1 >>> 2;
  s2 = ((e1 & 3) << 4) + (e2 >>> 4);
  s3 = ((e2 & 15) << 2) + (e3 >>> 6);
  s4 = e3 & 63;
  b64 += chars[s1] + chars[s2];
  if (arr8[i + 2] !== undefined)
    b64 += chars[s3] + chars[s4];
  else if (arr8[i + 1] !== undefined)
    b64 += chars[s3] + '=';
  else
    b64 += '==';
}

// and the result
b64; // "Zm9vYmFy"
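As an aside (not part of the original answer), a sketch of another common approach for this CouchDB case: request the attachment as an ArrayBuffer instead of a string (e.g. with xhr.responseType = 'arraybuffer'), then base64-encode the bytes; btoa is safe here because every value is 0-255. The variable arrayBufferFromServer below is a placeholder for whatever ArrayBuffer you received:

function bytesToBase64(bytes) {
  var binary = '';
  for (var i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]); // each byte maps to a Latin-1 char, so btoa accepts it
  }
  return btoa(binary);
}
// e.g. img.src = 'data:image/jpg;base64,' + bytesToBase64(new Uint8Array(arrayBufferFromServer));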
Get the Value of a byte with Javascript
I'm not sure if the title makes any sense, but here is what I need. This is my byte string:

���������ˇ�����0F*E��ù� �

I have already managed to get the values of this byte string with this PHP snippet:

<?php
$var = "���������ˇ�����0F*E��ù�";
for($i = 0; $i < strlen($var); $i++) {
  echo ord($var[$i])."<br/>";
}
?>

The result was:

0 0 0 0 0 0 0 0 2 2 0 255 0 0 0 0 0 2 48 70 1 42 69 0 0 1 157 0

But now I need to do the exact same thing without PHP, in JavaScript. Any help would be appreciated.
If you want to get the numeric value of each character in a string in JavaScript, that can be done like so:

var someString = "blarg";
for (var i = 0; i < someString.length; i++) {
  var char = someString.charCodeAt(i);
}

String.charCodeAt(index) returns the Unicode code-point value of the specified character in the string. It does not behave like PHP or C, where it returns the numeric value of a fixed 8-bit encoding (i.e. ASCII). Assuming your string is a human-readable string (as opposed to raw binary data), using charCodeAt is perfectly fine. If you're working with raw binary data, don't use a JavaScript string. If your string contains only characters with Unicode code points below 128, then charCodeAt behaves the same as ord in PHP or C's char type; however, the example you've provided contains non-ASCII characters, so Unicode's (sometimes complicated) rules will come into play. See the documentation on charCodeAt here: https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/String/charCodeAt
The PHP string is computed as 8-bit (bytes 0..255), while JavaScript uses 16-bit Unicode characters (0..65535). Depending on your string, you may either split it into (16-bit) char codes or into bytes. If you know your string contains only 8-bit chars, you may ignore the "hiByte" (see below) to get the same results as in PHP.

function toByteVersionA(s) {
  var len = s.length;
  // char codes
  var charCodes = new Array();
  for (var i = 0; i < len; i++) {
    charCodes.push(s.charCodeAt(i).toString());
  }
  var charCodesString = charCodes.join(" ");
  return charCodesString;
}

function toByteVersionB(s) {
  var len = s.length;
  var bytes = new Array();
  for (var i = 0; i < len; i++) {
    var charCode = s.charCodeAt(i);
    var loByte = charCode & 255;
    var hiByte = charCode >> 8;
    bytes.push(loByte.toString());
    bytes.push(hiByte.toString());
  }
  var bytesString = bytes.join(" ");
  return bytesString;
}

function toByteVersionC(s) {
  var len = s.length;
  var bytes = new Array();
  for (var i = 0; i < len; i++) {
    var charCode = s.charCodeAt(i);
    var loByte = charCode & 255;
    bytes.push(loByte.toString());
  }
  var bytesString = bytes.join(" ");
  return bytesString;
}

var myString = "abc"; // whatever your String is
var myBytes = toByteVersionA(myString); // whatever version you want