JavaScript: converting text from Greek to UTF-8

I am trying to help my teacher convert a Greek textbook into an online application. Part of this involves taking a shapefile (which draws polygons on maps, along with descriptions of the polygons) and plotting everything on a map. I cannot directly access the part of the shapefile that holds the data I need to convert, because it is stored in hexadecimal.
Anyway, here is the line I use to print to my console:
console.log(arr[1][i]['PERIOD']);
"arr" is the data array that contains all of the properties I want to convert from Greek into UTF-8. I am only printing "PERIOD", rather than the 12 other properties associated with the array.
When I run my page, the console returns several variations of text (there are several periods). Here is an example of what it returns:
ÎÏÏαÏκή, ÎλαÏική, ÎλληνιÏÏική
ΡÏμαÏκή
ÎθÏμανική
Î¥ÏÏεÏοβÏζανÏινή
Believe it or not, this is not Greek text. So I snooped around and found this function to convert to UTF-8:
function encode_utf8( s ){
return unescape(encodeURI( s ));
}
When I add this function to my console.log, this is what I get:
áÃÂüñÃÂúî
ÃÂøÃÂüñýùúî
ÃÂ¥ÃÂÃÂõÃÂÿòÃÂöñýÃÂùýî
ÃÂøÃÂüñýùúî
I am not 100% positive but I think that the text I am trying to convert is currently in ISO-8859-7.
Any help with this would be amazing.
Thank you.

You can quite easily build a map from the bytes of one character set to another (although it can get tedious).
Assuming ISO 8859-7, which has only 256 code points, this is not too difficult:
function genCharMap() { // ISO 8859-7 to Unicode
var map = [], i, j, str;
map.length = 256;
map[0] = 0; // fill in 0
str = '\u2018\u2019\u00a3\u20ac\u20af\u00a6\u00a7\u00a8\u00a9\u037a\u00ab\u00ac\u00ad\u00ae\u2015\u00b0\u00b1\u00b2\u00b3\u0384\u0385\u0386\u00b7\u0388\u0389\u038a\u00bb\u038c\u00bd\u038e';
for (i = 0; i < str.length; ++i) // fill in 0xA1 to 0xBE
map[0xA1 + i] = str.charCodeAt(i);
for (i = 0; i < 256; ++i) // fill in blanks
if (i in map) j = map[i] - i;
else map[i] = j + i;
return map;
}
Now you can apply this transformation to your bytes:
var byteArr = [0xC1, 0xE2, 0xE3, 0xE4], // Αβγδ
str_out = '',
i,
map = genCharMap();
for (i = 0; i < byteArr.length; ++i) {
str_out += String.fromCharCode(
map[byteArr[i]]
);
}
str_out; // "Αβγδ"
If you're rewriting this code for a character set with "combining characters", it may be safer to replace the string I used in genCharMap with an array of numbers.
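On current browsers (and Node.js builds with full ICU), the built-in WHATWG `TextDecoder` can do the same ISO 8859-7 conversion without a hand-built table. This is a sketch that assumes you can get at the raw byte values and that the runtime accepts the 'iso-8859-7' label; `decodeIso88597` and `binaryStringToBytes` are hypothetical helper names, not part of any library.

```javascript
// Sketch: decode ISO 8859-7 bytes with the built-in TextDecoder.
// Assumes the runtime supports the 'iso-8859-7' label (modern browsers do;
// Node.js needs full ICU, which current releases ship by default).
function decodeIso88597(bytes) {
  return new TextDecoder('iso-8859-7').decode(Uint8Array.from(bytes));
}

// Hypothetical helper: if the data arrived as a "binary string" (one char
// per byte, char codes 0-255), turn it into a byte array first.
function binaryStringToBytes(s) {
  var bytes = new Uint8Array(s.length);
  for (var i = 0; i < s.length; ++i) {
    bytes[i] = s.charCodeAt(i) & 0xFF; // keep only the low byte
  }
  return bytes;
}

console.log(decodeIso88597([0xC1, 0xE2, 0xE3, 0xE4])); // "Αβγδ"
console.log(decodeIso88597(binaryStringToBytes('\xC1\xE2\xE3\xE4'))); // "Αβγδ"
```

The table-based genCharMap approach above still has the advantage of working in very old browsers that lack `TextDecoder`.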

Related

How can I compare "M" and "Ｍ" (in UTF) using JavaScript?

I have to search a grid for a certain substring. I have a search bar where the user can type the string. The problem is that the grid contains a mix of Japanese text and Unicode characters,
for example: ＭＡＧシンチ注 ３３３ＭＢｑ.
How can I compare the letter 'M' that I type on the keyboard with the letter "Ｍ" in the example above for equality? I am trying to do this in plain JavaScript, not jQuery or another library, and it has to work in Internet Explorer.
Thanks,
As mentioned in an insightful comment from @Rhymoid on the question, modern JavaScript (ES2015) supports Unicode normalization. One normalization mode maps "compatibility" variants of characters, such as the fullwidth letters used in Japanese text, down to their basic equivalents (the full rules are somewhat involved). The .normalize("NFKD") method maps the fullwidth "Ｍ" to the Latin "M". Thus
"ＭＡＧシンチ注 ３３３ＭＢｑ".normalize("NFKD")
will give
"MAGシンチ注 333MBq"
As of late 2016, .normalize() isn't supported by IE.
At a lower level, ES2015 also has .codePointAt() (mentioned in another good answer), which is like the older .charCodeAt() described below but which also understands UTF-16 pairs. However, .codePointAt() is (again, late 2016) not supported by Safari.
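A minimal sketch of a search built on normalize(), assuming ES2015 support (so not IE): both the grid text and the typed text are folded the same way before comparing. `foldForSearch` and `containsLoosely` are hypothetical names, and lowercasing is an extra assumption so that typing "m" also matches.

```javascript
// Sketch: width- and case-insensitive substring search using normalize().
// NFKD folds compatibility forms such as fullwidth "Ｍ" (U+FF2D) to "M".
// Requires String.prototype.normalize (ES2015; not available in IE).
function foldForSearch(s) {
  return s.normalize('NFKD').toLowerCase();
}

function containsLoosely(haystack, needle) {
  // Fold both sides identically, then do a plain substring test.
  return foldForSearch(haystack).indexOf(foldForSearch(needle)) !== -1;
}

console.log(containsLoosely('ＭＡＧシンチ注 ３３３ＭＢｑ', 'M'));   // true
console.log(containsLoosely('ＭＡＧシンチ注 ３３３ＭＢｑ', '333')); // true
```

Folding both sides with the same function matters: NFKD can also split accented letters into base letter plus combining mark, and that is harmless only if the search text gets the same treatment.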
Below is the original answer, for older browsers:
You can use the .charCodeAt() method to examine the UTF-16 character codes in the string.
"M".charCodeAt(0)
is 77, while
"Ｍ".charCodeAt(0)
is 65325.
This approach is complicated by the fact that some Unicode characters are represented in UTF-16 by two separate character positions in the JavaScript string. Older versions of the language do not provide native support for dealing with that, so you have to do it yourself. A character code between 55296 and 56319 (D800 and DBFF hex) marks the first half of such a pair, and one between 56320 and 57343 (DC00 and DFFF hex) marks the second half. The UTF-16 Wikipedia page has more information, and there are various other sources.
Building a dictionary should work in any browser: find the char codes at the start of the ranges you want to transform, then map the characters however you like, for example:
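The way the two halves combine is fixed by UTF-16: the code point is 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00). A minimal sketch for engines without codePointAt; `codePointAtCompat` is a hypothetical name, and it returns undefined past the end of the string to mirror codePointAt.

```javascript
// Sketch: read the full code point at index i, joining a UTF-16 surrogate
// pair by hand (for engines without String.prototype.codePointAt).
function codePointAtCompat(str, i) {
  var high = str.charCodeAt(i);
  if (isNaN(high)) return undefined; // index past the end of the string
  if (high >= 0xD800 && high <= 0xDBFF && i + 1 < str.length) {
    var low = str.charCodeAt(i + 1);
    if (low >= 0xDC00 && low <= 0xDFFF) {
      // High half carries the top 10 bits, low half the bottom 10 bits.
      return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
    }
  }
  return high; // ordinary BMP character (or an unpaired surrogate)
}

console.log(codePointAtCompat('\uFF2D', 0));       // 65325
console.log(codePointAtCompat('\uD83D\uDE00', 0)); // 128512 (U+1F600)
```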
function shift65248(str) {
var dict = {}, characters = [],
character, i;
for (i = 0; i < 10; ++i) { // 0 - 9
character = String.fromCharCode(65296 + i);
dict[character] = String.fromCharCode(48 + i);
characters.push(character);
}
for (i = 0; i < 26; ++i) { // A - Z
character = String.fromCharCode(65313 + i);
dict[character] = String.fromCharCode(65 + i);
characters.push(character);
}
for (i = 0; i < 26; ++i) { // a - z
character = String.fromCharCode(65345 + i); // fullwidth a starts at 65345
dict[character] = String.fromCharCode(97 + i);
characters.push(character);
}
return str.replace(
new RegExp(characters.join('|'), 'g'),
function (m) {return dict[m];}
);
}
shift65248('ＭＡＧシンチ注 ３３３ＭＢｑ'); // "MAGシンチ注 333MBq"
I tried shifting the whole range 65248..65375 onto 0..127, but it conflicted with other characters :(
I am assuming that you have access to those strings, by reading the DOM or some other way.
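A likely reason the whole-range shift conflicted is that only U+FF01 to U+FF5E (65281 to 65374) are fullwidth forms of ASCII; each sits exactly 65248 (0xFEE0) above its ASCII counterpart. A sketch that shifts only that block, so everything else (including katakana) is left alone; `fullwidthToAscii` is a hypothetical name.

```javascript
// Sketch: map only the fullwidth ASCII block U+FF01..U+FF5E onto
// U+0021..U+007E by subtracting 0xFEE0 (65248) from each char code.
// Characters outside that block are returned unchanged.
function fullwidthToAscii(str) {
  return str.replace(/[\uFF01-\uFF5E]/g, function (ch) {
    return String.fromCharCode(ch.charCodeAt(0) - 0xFEE0);
  });
}

console.log(fullwidthToAscii('ＭＡＧシンチ注 ３３３ＭＢｑ')); // "MAGシンチ注 333MBq"
```

This only uses a regular expression and charCodeAt, so it should also work in older browsers such as IE.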
If so, codePointAt will be your friend.
console.log("Test of values");
console.log("M".codePointAt(0));
console.log("Ｍ".codePointAt(0));
console.log("Determining end of string");
console.log("M".codePointAt(10)); // undefined: index is past the end
var str = "MAGシンチ注 333MBq .";
var idx = 0;
var point;
do {
point = str.codePointAt(idx);
idx++;
console.log(point);
} while (point !== undefined);
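The loop above advances one UTF-16 unit at a time, so for a character outside the Basic Multilingual Plane it would log the full code point and then the second half of the surrogate pair as well. In ES2015 a for...of loop walks a string by code point instead; a minimal sketch, with `codePoints` as a hypothetical name.

```javascript
// Sketch: list the code points of a string, one entry per character.
// for...of iterates a string by code point (ES2015), so a surrogate pair
// is visited once instead of twice.
function codePoints(str) {
  var points = [];
  for (var ch of str) {
    points.push(ch.codePointAt(0));
  }
  return points;
}

console.log(codePoints('MＭ'));           // [77, 65325]
console.log(codePoints('a\uD83D\uDE00')); // [97, 128512]
```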
You could try building your own dictionary and compare function as follows:
var compareDB = {
'm' : ['Ｍ'],
'b' : ['Ｂ']
};
function doCompare(inputChar, searchText){
var inputCharLower = inputChar.toLowerCase();
var searchTextLower = searchText.toLowerCase();
var i;
// direct match (case-insensitive)
if(searchTextLower.indexOf(inputCharLower) > -1)
return true;
// otherwise try each look-alike listed for this character
if(compareDB[inputCharLower] !== undefined)
{
for(i=0; i<compareDB[inputCharLower].length; i++){
if(searchTextLower.indexOf(compareDB[inputCharLower][i].toLowerCase()) > -1)
return true;
}
}
return false;
}
console.log("searching with m");
console.log(doCompare('m', "searching text with Ｍ"));
console.log("searching with m");
console.log(doCompare('m', "searching text with B"));
console.log("searching with B");
console.log(doCompare('B', "searching text with B"));

Conversion from Byte Array to hex format using bitcoinjs-min.js

I'm trying to generate the public key from the x and y coordinates of object Q in the browser. The problem is that to use this public key for verifying a JWT, I need the key in hexadecimal format. I'm using the keyPair from bitcoinjs.min.js, which does not let me retrieve the hexadecimal form of the public key.
Is there any library or function to convert it into hexadecimal form?
// Taking reference from http://procbits.com/2013/08/27/generating-a-bitcoin-address-with-javascript
var pubX = hdnode.keyPair.Q.x.toByteArrayUnsigned();
var pubY = hdnode.keyPair.Q.y.toByteArrayUnsigned();
var publicKeyBytes = pubX.concat(pubY);
publicKeyBytes.unshift(0x04);
Meanwhile, I tried:
<script src="http://peterolson.github.com/BigInteger.js/BigInteger.min.js"></script>
var publicKeyInt = BigInt.fromByteArrayUnsigned(publicKeyBytes);
but it doesn't work.
Thanks in advance.
OK, I'm going to expand on my comment.
Assume: key is an array (or array-like object) of bytes.
function getHexArray(key) {
// map a value 0-15 to the char code of '0'-'9' or 'A'-'F'
function num2hex(num) {
return num > 9 ? num + 55 : num + 48;
}
var hex_key = [];
var lower, upper;
for (var i = 0; i < key.length; i++) {
lower = key[i] & 0x0f;
upper = (key[i] >> 4) & 0x0f;
// collect every byte as a 2-char string (instead of returning after the first)
hex_key.push(String.fromCharCode(num2hex(upper)) +
String.fromCharCode(num2hex(lower)));
}
return hex_key;
}
Note that each byte is written high nibble first. If you want one long hex string rather than an array of hex bytes, join the result: getHexArray(key).join('').
This function allows you to put in an array of bytes and will output an array of 2-char strings representing the hex value of the bytes.
WORKING:
Below is the working code, which takes a byte array and returns a hexadecimal string.
function toHexString(bytes) {
return bytes.map(function(byte) {
// pad to two digits so that e.g. 0x0a becomes "0a", not "a"
return ('0' + (byte & 0xFF).toString(16)).slice(-2)
}).join('')
}
Thanks @derekdreery for your help :)
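For the original goal, an uncompressed elliptic-curve public key is the byte 0x04 followed by X and Y, each as a fixed-length big-endian value (32 bytes for secp256k1, the curve bitcoinjs uses). A byte array from toByteArrayUnsigned() may be shorter than 32 bytes when a coordinate has leading zero bytes, so padding matters as much as the hex conversion. A sketch under those assumptions; `toHex`, `leftPad` and `uncompressedPublicKeyHex` are hypothetical names.

```javascript
// Sketch: build an uncompressed public key (0x04 || X || Y) as a hex string.
// Assumes xBytes/yBytes are arrays of unsigned byte values such as those from
// toByteArrayUnsigned(); coordinateLength defaults to 32 (secp256k1).
function toHex(bytes) {
  return Array.prototype.map.call(bytes, function (b) {
    return ('0' + (b & 0xFF).toString(16)).slice(-2); // always two digits
  }).join('');
}

function leftPad(bytes, length) {
  var padded = [];
  for (var i = bytes.length; i < length; ++i) padded.push(0); // leading zeros
  return padded.concat(Array.prototype.slice.call(bytes));
}

function uncompressedPublicKeyHex(xBytes, yBytes, coordinateLength) {
  coordinateLength = coordinateLength || 32;
  var keyBytes = [0x04]
    .concat(leftPad(xBytes, coordinateLength))
    .concat(leftPad(yBytes, coordinateLength));
  return toHex(keyBytes);
}

// Tiny example with 2-byte coordinates: prefix "04", X "0aff", Y "0001".
console.log(uncompressedPublicKeyHex([0x0a, 0xff], [0x01], 2)); // "040aff0001"
```

With real 32-byte coordinates the result is always 130 hex characters (65 bytes), which is a quick sanity check before handing the key to a JWT library.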

How do you divide a string of random letters into pairs of substrings and replace with new characters continuously in Javascript?

So I'm working on a project that involves translating a string of text such as "als;kdfja;lsjkdf" into regular characters like "the big dog" by parsing for certain pairs of letters that translate (e.g. "fj" = "D").
The catch is that I can't simply use the .replace() function in JavaScript, because the text will often be something like "fjkl": a search for "jk" would find it across the boundary between the pairs "fj" and "kl". That won't work for me, because in my scheme "jk" is not there; I only want to look at pairs, 2 characters at a time. (For example, "fjkl" can only yield "fj" and "kl".)
In the end, I intend to use just the 8 characters "asdfjkl;" and assign pairs of them to actual letters. (In this substitution method, "fj" OR "jf" would actually be "_" (space).)
While trying to figure out this task in JavaScript (I don't know if another language would handle it more efficiently), I tried using the split function in the following way. (Disclaimer: I'm not sure this is 100% correct.)
<textarea id="textbox"></textarea>
<script>
var text = document.getElementById("textbox").value; //getting string from the textarea
var pairs = text.split(/(..)/).filter(String); //splitting string into pairs
pairs = pairs.map(function (pair) { //some sort of substitution
return pair == "fj" ? " " : pair;
});
text = pairs.join("");
</script>
Additionally, if possible, I would like the replaced characters to be fed directly into the textarea continuously as the user types, so the translation happens almost simultaneously. (I'm assuming this will use some sort of setInterval() function?)
Any tips on which JavaScript tools I should use, and how, would be outstanding. Thanks in advance.
If you're interested, here is the full list of substitutions I'm making for this project:
Syntax: (X OR Y == result)
AJ JA = F
AK KA = V
AL LA = B
A; ;A = Y
SJ JS = N
SK KS = M
SL LS = S
S; ;S = P
DJ JD = A
DK KD = U
DL LD = D
D; ;D = G
FJ JF = _
FK KF = I
FL LF = T
F; ;F = K
AS SA = C
SD DS = L
DF FD = E
JK KJ = O
KL LK = R
L; ;L = Z
AD DA = -
SF FS = ,
AF FA = .
JL LJ = !
K; ;K = :
J; ;J = ?
-Daniel Rehman
I have prepared some code for your requirement. You can bind a function to keydown so the text changes continuously as you type in the textarea.
I am using the replacePair method to replace a pair of characters with its uppercase equivalent. You can inject your own custom logic there.
var tb = document.getElementById('tb');
var processedLength = 0;
var pairEntered=false;
tb.onkeydown = function (e) {
pairEntered=!pairEntered;
if (pairEntered) {
var nextTwoChars = this.value.substr(this.value.length - 2, 2);
var prevPart=this.value.substr(0,this.value.length-2);
var finalText=prevPart+ replacePair(nextTwoChars);
this.value=finalText;
processedLength+=2;
}
}
function replacePair(str){
return str.toUpperCase();
}
jsfiddle: http://jsfiddle.net/218fq7t2/
Updated fiddle using your replacement logic: http://jsfiddle.net/218fq7t2/3/
If you can be assured that certain pairs always translate to the same character, then perhaps a dictionary object can help.
var dict = { as:"A", kl:"B", al:"C", fj:"D" /* ... */ };
And, if your 'decryption' algorithm is 'lazy' (evaluates the first pair it encounters), then you can just travel through the input string.
var outputString = "", c, cl;
for (c = 1, cl = inputString.length; c < cl; c += 2) {
outputString += dict[inputString[c-1] + inputString[c]] || "";
}
If your replacement algorithm is not any more complicated than simply looking up which letter the pair represents, then this should do alright for you. No real logic necessary.
Couldn't you do it as follows:
var text = document.getElementById("textbox").value;
var i;
for (i = 0; i < text.length - 1; i++) {
if (text[i] == "j") {
if (text[i+1] == "f") {
// replace the pair at position i (not just the first "jf" in the string)
text = text.slice(0, i) + "_" + text.slice(i + 2);
}
}
}
For each letter, this also checks the letter after it in the same step. When letters i and i+1 match a pair you are looking for, they are replaced by a single "_" (or whatever you want), so the text becomes one character shorter. When the loop then increments i, it skips the letter that formed the second half of the found pair. Thus "jfkl" will be identified as two different pairs, and your algorithm will not be confused.
Of course, you would also have to work the other pairs/codewords into the for loop so that they are all checked.
I had hoped my previous answer was enough to get you started. I was merely providing an algorithm that you could then use to your liking (wrap it in a function and add your own event listeners, etc).
Here is the solution to your problem. I did not write the entire dictionary. You will need to complete that.
var dictionary = { "aj":"F", "ja":"F", "ak":"V", "ka":"V", "al":"B", "la":"B", "a;":"Y", ";a":"Y" }
var input, output;
function init() {
input = document.getElementById("input");
output = document.getElementById("output");
input.addEventListener("keyup", decrypt, false);
}
function decrypt () {
if (!input || !output) {
return;
}
var i = input.value, o = "", c, cl;
for (c = 1, cl = i.length; c < cl; c += 2) {
o += dictionary[ i[c-1] + i[c] ] || "";
}
while (output.hasChildNodes()) {
output.removeChild(output.firstChild);
}
output.appendChild(document.createTextNode(o));
}
window.addEventListener("load", init, false);
<textarea id="input"></textarea>
<div id="output"></div>
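The dictionary in the answer above is left for you to complete. A sketch that fills it in from the question's substitution list, keeping "_" for the space as that list does and adding the reversed order of every pair automatically; `PAIR_LIST`, `buildTable` and `decodePairs` are hypothetical names. `decodePairs` could be called from a keyup or input handler such as `decrypt` above.

```javascript
// Sketch: decode "asdfjkl;" text two characters at a time using the full
// substitution list from the question. Only one order of each pair is
// written out; buildTable adds the reversed order ("fj" -> "jf").
var PAIR_LIST = {
  'aj': 'F', 'ak': 'V', 'al': 'B', 'a;': 'Y',
  'sj': 'N', 'sk': 'M', 'sl': 'S', 's;': 'P',
  'dj': 'A', 'dk': 'U', 'dl': 'D', 'd;': 'G',
  'fj': '_', 'fk': 'I', 'fl': 'T', 'f;': 'K',
  'as': 'C', 'sd': 'L', 'df': 'E', 'jk': 'O', 'kl': 'R', 'l;': 'Z',
  'ad': '-', 'sf': ',', 'af': '.', 'jl': '!', 'k;': ':', 'j;': '?'
};

function buildTable(list) {
  var table = {};
  Object.keys(list).forEach(function (pair) {
    table[pair] = list[pair];
    table[pair.charAt(1) + pair.charAt(0)] = list[pair]; // reversed order
  });
  return table;
}

var PAIR_TABLE = buildTable(PAIR_LIST);

// Reads the input strictly in aligned pairs (0-1, 2-3, ...), so "fjkl" is
// only ever read as "fj" + "kl". Unknown pairs and a trailing odd character
// are skipped.
function decodePairs(input) {
  var text = input.toLowerCase(), out = '';
  for (var c = 1; c < text.length; c += 2) {
    var pair = text.charAt(c - 1) + text.charAt(c);
    if (Object.prototype.hasOwnProperty.call(PAIR_TABLE, pair)) {
      out += PAIR_TABLE[pair];
    }
  }
  return out;
}

console.log(decodePairs('alfkd;fjdljkd;')); // "BIG_DOG"
```

Because the list never contains both "xy" and "yx" with different meanings, generating the reversed keys cannot overwrite an existing entry.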

Retrieving binary data in Javascript (Ajax)

I'm trying to get this remote binary file and read its bytes, which (of course) are supposed to be in the range 0..255. Since the response is given as a string, I need to use charCodeAt to get the numeric value of every character. The problem is that charCodeAt returns the value in UTF-8 (if I'm not mistaken), so for example the ASCII value 139 gets converted to 8249. This breaks my whole application, because I need the values exactly as they are sent from the server.
The immediate solution is a big switch that returns the corresponding ASCII value for every given UTF-8 code, but I was wondering if there is a more elegant and simpler solution. Thanks in advance.
The following code has been extracted from an answer to this StackOverflow question and should help you work around your issue.
function stringToBytesFaster ( str ) {
var ch, st, re = [], j=0;
for (var i = 0; i < str.length; i++ ) {
ch = str.charCodeAt(i);
if(ch < 127)
{
re[j++] = ch & 0xFF;
}
else
{
st = []; // clear stack
do {
st.push( ch & 0xFF ); // push byte to stack
ch = ch >> 8; // shift value down by 1 byte
}
while ( ch );
// add stack contents to result
// done because chars have "wrong" endianness
st = st.reverse();
for(var k=0;k<st.length; ++k)
re[j++] = st[k];
}
}
// return an array of bytes
return re;
}
var str = "\x8b\x00\x01\x41A\u1242B\u4123C";
alert(stringToBytesFaster(str)); // 139,0,1,65,65,18,66,66,65,35,67
I would recommend encoding the binary data in a character-encoding-independent format such as base64.
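Two sketches of byte-exact alternatives, assuming a modern browser: request the body as an ArrayBuffer (via `fetch` here) so no character decoding happens at all, or, following the base64 suggestion, decode base64 text back into bytes with `atob`. `fetchBytes` and `base64ToBytes` are hypothetical names.

```javascript
// Sketch 1: ask for raw bytes instead of a string; no charset decoding occurs.
function fetchBytes(url) {
  return fetch(url)
    .then(function (res) { return res.arrayBuffer(); })
    .then(function (buf) { return new Uint8Array(buf); }); // values 0..255
}

// Sketch 2: if the server sends base64 text, decode it back into bytes.
function base64ToBytes(b64) {
  var bin = atob(b64); // "binary string": one char per byte, codes 0..255
  var bytes = new Uint8Array(bin.length);
  for (var i = 0; i < bin.length; ++i) {
    bytes[i] = bin.charCodeAt(i);
  }
  return bytes;
}

// "iwABQQ==" is the base64 form of the bytes 0x8B 0x00 0x01 0x41.
console.log(Array.from(base64ToBytes('iwABQQ=='))); // [139, 0, 1, 65]
```

Either way, byte 139 arrives as 139 rather than being reinterpreted as a text character first.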

converting numbers into alphabets

I want to convert numbers into alphabetic characters using JavaScript. For example, 01=n, 02=i, 03=n, 04=a, etc.
When someone enters the numbers 01020304 in the form, they get the response "nina". Whatever the user enters is replaced with the equivalent characters, including spaces.
Update
Thank you all for the quick responses. I found this code on a website. It converts alphabetic characters into numbers, but the code for converting numbers back into characters isn't working. Here is the code for converting characters into numbers:
var i,j;
var getc;
var len;
var num, alpha;
num=new Array("01","02","03","04","05","06","07","08","09","10","11","12","13","14","15","16","17",
"18","19","20","21","22","23","24","25","26","00","##","$$");
alpha=new Array("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u",
"v","w","x","y","z"," ",".",",");
function encode() {
len=document.f1.ta1.value.length;
document.f1.ta2.value="";
for(i=0;i<len;i++) {
getc=document.f1.ta1.value.charAt(i);
getc=getc.toLowerCase();
for(j=0;j<alpha.length;j++) {
if(alpha[j]==getc) {
document.f1.ta2.value+=num[j];
}
}
}
}
Can anyone show me how to convert this to do the opposite character conversion?
I agree with Skrilldrick that you should learn how to do this yourself, but I couldn't help myself: http://jsfiddle.net/dQkxw/
HTML
<html>
<body>
<input type="text" id="code">
<button onclick="decode(document.getElementById('code').value)">
Decode
</button>
</body>
</html>
JavaScript
window.decode = function(numbers) {
if (numbers.length % 2 != 0)
{
alert("invalid code!");
return;
}
var result = "";
for (var i = 0; i < numbers.length; i+=2) {
var number = Number(numbers.substring(i, i+2));
if (number < 1 || number > 26)
{
alert("invalid number: "+number);
return;
}
result += String.fromCharCode(96+number);
}
alert(result);
}
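The decoder above only accepts 01 to 26. A sketch that instead inverts the question's own num/alpha arrays, so "00" (space), "##" (period) and "$$" (comma) decode too; `decodeNumbers` is a hypothetical name. Note that with this table "nina" encodes as 14091401, not 01020304.

```javascript
// Sketch: reverse the question's encoder by inverting its two parallel arrays.
// Same code table as the question: "01".."26" -> a..z, "00" -> space,
// "##" -> ".", "$$" -> ",".
var num = ["01","02","03","04","05","06","07","08","09","10","11","12","13",
  "14","15","16","17","18","19","20","21","22","23","24","25","26","00","##","$$"];
var alpha = ["a","b","c","d","e","f","g","h","i","j","k","l","m",
  "n","o","p","q","r","s","t","u","v","w","x","y","z"," ",".",","];

function decodeNumbers(code) {
  if (code.length % 2 !== 0) throw new Error("invalid code length");
  var lookup = {}, result = "";
  for (var j = 0; j < num.length; j++) lookup[num[j]] = alpha[j]; // code -> char
  for (var i = 0; i < code.length; i += 2) {
    var chunk = code.substr(i, 2);
    if (!Object.prototype.hasOwnProperty.call(lookup, chunk)) {
      throw new Error("invalid code: " + chunk);
    }
    result += lookup[chunk];
  }
  return result;
}

console.log(decodeNumbers("14091401")); // "nina"
```

Building the lookup from the same two arrays the encoder uses keeps encoding and decoding in sync if the table is ever changed.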
A simple, scalable way to do this is a multidimensional array that maps each character to its corresponding character. You can have multiple dimensions, one per conversion, and pick between them.
var myCharArray=new Array(4)
for (i=0; i < 4; i++)
myCharArray[i]=new Array(2)
myCharArray[0][0]="a"
myCharArray[0][1]="1"
myCharArray[1][0]="b"
myCharArray[1][1]="2"
myCharArray[2][0]="c"
myCharArray[2][1]="3"
myCharArray[3][0]="d"
myCharArray[3][1]="4"
Then, upon conversion, loop every single character in your string to be encoded, and search for it in the array. If it is found, switch it with the encoded value. This should be reasonably easy to do.
The method you described is a simple derivative of a Caesar cipher. Also remember that because the script is client-side, it will be incredibly easy to decode, so make sure it's not used for anything important!
