how to render 32bit unicode characters in google v8 (and nodejs) - javascript

does anyone have an idea how to render unicode 'astral plane' characters (whose code points are beyond 0xffff) in google v8, the javascript vm that drives both google chrome and nodejs?
funnily enough, when i give google chrome (it identifies as 11.0.696.71, running on ubuntu 10.4) an html page like this:
<script>document.write( "helo" )
document.write( "𡥂 ⿸𠂇子" );
</script>
it will correctly render the 'wide' character 𡥂 alongside with the 'narrow' ones, but when i try the equivalent in nodejs (using console.log()) i get a single � (0xfffd, REPLACEMENT CHARACTER) for the 'wide' character instead.
i have also been told that for whatever non-understandable reason google have decided to implement characters using a 16bit-wide datatype. while i find that stupid, the surrogate codepoints have been designed precisely to enable the 'channeling' of 'astral codepoints' through 16bit-challenged pathways. and somehow the v8 running inside of chrome 11.0.696.71 seems to use this bit of unicode-foo or other magic to do its work (i seem to remember years ago i always got boxes instead even on static pages).
ah yes, node --version reports v0.4.10, gotta figure out how to obtain a v8 version number from that.
update i did the following in coffee-script:
a = String.fromCharCode( 0xd801 )
b = String.fromCharCode( 0xdc00 )
c = a + b
console.log a
console.log b
console.log c
console.log String.fromCharCode( 0xd835, 0xdc9c )
but that only gives me
���
���
������
������
the thinking behind this is that since that braindead part of the javascript specification that deals with unicode appears to mandate? / not downright forbid? / allows? the use of surrogate pairs, then maybe my source file encoding (utf-8) might be part of the problem. after all, there are two ways to encode 32bit codepoints in utf-8: one is to write out the utf-8 octets needed for the first surrogate, then those for the second; the other way (which is the only way the utf-8 spec allows) is to calculate the resulting codepoint and write out the octets needed for that codepoint. so here i completely exclude the question of source file encoding by dealing only with numbers. the above code does work with document.write() in chrome, giving 𐐀𝒜, so i know i got the numbers right.
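for reference, the arithmetic that turns a codepoint beyond 0xffff into its two surrogates can be sketched like this. it only illustrates the standard utf-16 calculation; the function name toSurrogatePair is made up for this example.

```javascript
// Hypothetical helper: split an astral code point into a UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError('not an astral code point: ' + codePoint);
  }
  var offset = codePoint - 0x10000;     // 20 bits remain after removing the BMP
  var high = 0xD800 + (offset >> 10);   // top 10 bits go into the high surrogate
  var low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits go into the low surrogate
  return [high, low];
}

var pair = toSurrogatePair(0x1D49C);    // MATHEMATICAL SCRIPT CAPITAL A
var text = String.fromCharCode(pair[0], pair[1]);
```

the resulting string has a length of 2, which matches the numbers passed to String.fromCharCode above.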
sigh.
EDIT i did some experiments and found out that when i do
var f = function( text ) {
document.write( '<h1>', text, '</h1>' );
document.write( '<div>', text.length, '</div>' );
document.write( '<div>0x', text.charCodeAt(0).toString( 16 ), '</div>' );
document.write( '<div>0x', text.charCodeAt(1).toString( 16 ), '</div>' );
console.log( '<h1>', text, '</h1>' );
console.log( '<div>', text.length, '</div>' );
console.log( '<div>0x', text.charCodeAt(0).toString( 16 ), '</div>' );
console.log( '<div>0x', text.charCodeAt(1).toString( 16 ), '</div>' ); };
f( '𩄎' );
f( String.fromCharCode( 0xd864, 0xdd0e ) );
i do get correct results in google chrome---both inside the browser window and on the console:
𩄎
2
0xd864
0xdd0e
𩄎
2
0xd864
0xdd0e
however, this is what i get when using nodejs' console.log:
<h1> � </h1>
<div> 1 </div>
<div>0x fffd </div>
<div>0x NaN </div>
<h1> �����</h1>
<div> 2 </div>
<div>0x d864 </div>
<div>0x dd0e </div>
this seems to indicate that both parsing utf-8 with code points beyond 0xffff and outputting those characters to the console is broken. python 3.1, by the way, does treat the character as a surrogate pair and can print the character to the console.
NOTE i've cross-posted this question to the v8-users mailing list.

This recent presentation covers all sorts of issues with Unicode in popular languages, and isn't kind to Javascript: The Good, the Bad, & the (mostly) Ugly
He covers the issue with two-byte representation of Unicode in Javascript:
The UTF‐16 née UCS‐2 Curse
Like several other languages, Javascript
suffers from The UTF‐16 Curse. Except that Javascript has an even
worse form of it, The UCS‐2 Curse. Things like charCodeAt and
fromCharCode only ever deal with 16‐bit quantities, not with real,
21‐bit Unicode code points. Therefore, if you want to print out
something like 𝒜, U+1D49C, MATHEMATICAL SCRIPT CAPITAL A, you have to
specify not one character but two “char units”: "\uD835\uDC9C". 😱
// ERROR!!
document.write(String.fromCharCode(0x1D49C));
// needed bogosity
document.write(String.fromCharCode(0xD835,0xDC9C));

I think it's a console.log issue. Since console.log is only for debugging do you have the same issues when you output from node via http to a browser?
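One way to tell a string problem from a display problem is to look at the bytes node would write instead of what the terminal draws. A minimal sketch, assuming a node version that provides Buffer.from (much newer than the v0.4.10 mentioned in the question):

```javascript
// Build the character from its surrogates so the source file encoding plays no role.
var wide = String.fromCharCode(0xd864, 0xdd0e);

// Encode the string to UTF-8 and show the octets as hex.
var hex = Buffer.from(wide, 'utf8').toString('hex');
console.log(wide.length, hex);
```

If this prints four octets starting with f0, the pair was encoded as one code point and only the rendering is at fault. Octets for U+FFFD (ef bf bd) would mean the data was already damaged before output.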

Related

Javascript string.length does not equal Python len()

Imagine the following text entered in an HTML textarea:
123456
7
If one calculates the length of this text via javascript, i.e. string.length, that comes out to 10.
Now if that input's length is measured in python, i.e. via len(string), it is 13.
It does not look 13 to the human eye, but if one runs print repr(string) in python, we get 123456\r\n\r\n\r\n7. That is 13 characters, not 10. For reference, this test was carried out in Ubuntu OS.
Is there any way for python to report the string length via a mechanism that imitates javascript's string.length's result? I.e. in simpler terms, how do I get 10 in python?
I understand I can manually iterate and collapse \r\n into a single character, but I wonder if there is a more robust - even inbuilt - way to do it? In any case, an illustrative example would be great!
You can make use of regular expressions, which is much more elegant than iterating. Replacing each \r\n pair with a single \n does the trick.
Use the re module of python.
import re
x = '123456\r\n\r\n\r\n7'
y = re.sub(r'\r\n','\n',x)
print(len(y)) #Answer will be 10
For further reference, check out the python docs

Unexpected behaviour of String.fromCodePoint / String#codePointAt (Firefox/ES6)

Since version 29 of Firefox, Mozilla provides the String.fromCodePoint and String#codePointAt methods and also published polyfills on the respective MDN pages.
So it happens that I am currently trying this out and it seems that I am missing something important, as splitting the string "ä☺𠜎" into codepoints and reassembling it from these returns an, at least for me, unexpected result.
I've built a test case: http://jsfiddle.net/dcodeIO/YhwP7/
var str = "ä☺𠜎";
...split it, reassemble it...
Am I missing something?
This is not a problem of .codePointAt, but more of the char encoding of the character 𠜎. 𠜎 has a javascript string length of 2.
Why?
Because Javascript strings are sequences of 16-bit UTF-16 code units. The code point of 𠜎 (132878, U+2070E) is greater than what a single 16-bit unit can hold (0-65535). This means it needs to be encoded as a surrogate pair of two code units (4 bytes). Its UTF-16 representation is 0xD841 0xDF0E, consuming two characters in the string.
When using .charCodeAt() you will see these values:
var string = "𠜎";
console.log( string.charCodeAt(0), string.charCodeAt(1) ); // logs 55361 57102 (0xD841 0xDF0E)
Why doesn't it display 228, 9786, 55361, 57102?
That's because .codePointAt() correctly combines a surrogate pair into a single integer (132878).
So why does it output 57102 then?
Because you are iterating up to str.length in your loop, which is 4 (because "𠜎".length == 2), so .codePointAt() also gets executed at index 3, the lone low surrogate, which is 57102.
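A loop that avoids the stray trailing value has to skip one extra index whenever a code point took two code units. A minimal sketch (the helper name codePoints is made up here, and the test string is written with escapes so the file encoding does not matter):

```javascript
// Collect code points, stepping over the second half of each surrogate pair.
function codePoints(str) {
  var result = [];
  for (var i = 0; i < str.length; i++) {
    var cp = str.codePointAt(i);
    result.push(cp);
    if (cp > 0xFFFF) i++; // this code point used two UTF-16 code units
  }
  return result;
}

var original = '\u00e4\u263a\ud841\udf0e'; // the string from the question, as escapes
var points = codePoints(original);
var rebuilt = String.fromCodePoint.apply(null, points);
```

With this loop the string splits into three code points and reassembles into the original four code units.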

Numbers localization in Web applications

How can I set the variant of Arabic numeral without changing character codes?
Eastern Arabic ۰ ۱ ۲ ۳ ٦ ٥ ٤ ۷ ۸ ۹
Persian variant ۰ ۱ ۲ ۳ ۴ ۵ ۶ ۷ ۸ ۹
Western Arabic 0 1 2 3 4 5 6 7 8 9
(And other numeral systems)
Here is a sample code:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
</head>
<body>
<div lang="fa">0123456789</div>
<div lang="ar">0123456789</div>
<div lang="en">0123456789</div>
</body>
</html>
How can I do this using only client-side technologies (HTML,CSS,JS)?
The solution should have no negative impact on page's SEO score.
Note that in Windows text boxes (e.g. Run) numbers are displayed correctly according to language of surrounding text.
See also: Numbers localization in desktop applications
Note: Localisation of numbers is super easy on the backend using this PHP package https://github.com/salarmehr/cosmopolitan
Here is an approach with code shifting:
// Eastern Arabic (officially "Arabic-Indic digits")
"0123456789".replace(/\d/g, function(v) {
return String.fromCharCode(v.charCodeAt(0) + 0x0630);
}); // "٠١٢٣٤٥٦٧٨٩"
// Persian variant (officially "Eastern Arabic-Indic digits (Persian and Urdu)")
"0123456789".replace(/\d/g, function(v) {
return String.fromCharCode(v.charCodeAt(0) + 0x06C0);
}); // "۰۱۲۳۴۵۶۷۸۹"
DEMO: http://jsfiddle.net/bKEbR/
Here we use Unicode shift, since numerals in any Unicode group are placed in the same order as in latin group (i.e. [0x0030 ... 0x0039]). So, for example, for Arabic-Indic group shift is 0x0630.
Note, it is difficult for me to distinguish Eastern characters, so if I've made a mistake (there are many different groups of Eastern characters in Unicode), you could always calculate the shift using any online Unicode table. You may use either official Unicode Character Code Charts, or Unicode Online Chartable.
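The shift does not have to be hard-coded: it is the code point of the target group's zero minus 0x0030. A small sketch of that idea, where shiftDigits is a hypothetical helper and not part of the demo above:

```javascript
// Hypothetical helper: map ASCII digits onto another block of decimal digits.
// zeroCodePoint is the code point of that block's digit zero (BMP blocks only,
// because String.fromCharCode works on 16-bit units).
function shiftDigits(text, zeroCodePoint) {
  var shift = zeroCodePoint - 0x0030;
  return text.replace(/[0-9]/g, function (d) {
    return String.fromCharCode(d.charCodeAt(0) + shift);
  });
}

var arabicIndic = shiftDigits('2012', 0x0660); // Arabic-Indic digits
var persian = shiftDigits('2012', 0x06F0);     // Persian and Urdu digits
```

Looking up the zero of a group in a Unicode chart is then enough to support it.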
One has to decide if this is a question of appearance or of transformation. One must also decide if this is a question involving character-level semantics or numeral representations. Here are my thoughts:
The question would have entirely different semantics, if we had a situation where Unicode had not separated out the codes for numeric characters.
Then, displaying the different glyphs as appropriate would simply be a matter of using the appropriate font. On the other hand, it would not have been possible to simply write out the different characters as I did below without changing fonts. (The situation is not exactly perfect as fonts do not necessarily cover the whole range of the 16-bit Unicode set, let alone the 32-bit Unicode set.)
9, ٩ (Arabic), ۹ (Urdu), 玖 (Chinese, complex), ๙ (Thai), ௯ (Tamil) etc.
Now, assuming we accept Unicode semantics i.e. that '9' ,'٩', and '۹' are distinct characters, we may conclude that the question is not about appearance (something that would have been in the purview of CSS), but of transformation -- a few thoughts about this later, for now let us assume this is the case.
When focusing on character-level semantics, the situation is not too dissimilar with what happens with alphabets and letters. For instance, Greek 'α' and Latin 'a' are considered distinct, even though the Latin alphabet is nearly identical to the Greek alphabet used in Euboea. Perhaps even more dramatically, the corresponding capital variants, 'Α' (Greek) and 'A' (Latin) are visually identical in practically all fonts supporting both scripts, yet distinct as far as Unicode is concerned.
Having stated the ground rules, let us see how the question can be answered by ignoring them, and in particular ignoring (character-level) Unicode semantics.
(Horrible, nasty and non-backwards compatible) Solution: Use fonts that map '0' to '9' to the desired glyphs. I am not aware of any such fonts. You would have to use @font-face and some font that has been appropriately hacked to do what you want.
Needless to say, I am not particularly fond of this solution. However, it is the only simple solution I am aware of that does what the question asks "without changing character codes" on either the server or the client side. (Technically speaking the Cufon solution I propose below does not change the character codes either, but what it does, drawing text into canvases is vastly more complex and also requires tweaking open-source code).
Note: Any transformational solution i.e. any solution that changes the DOM and replaces characters in the range '0' to '9' to, say, their Arabic equivalents will break code that expects numerals to appear in their original form in the DOM. This problem is, of course, worst when discussing forms and inputs.
An example of an answer taking the transformational approach would be:
$("[lang='fa']").find("*").andSelf().contents().each(function() {
if (this.nodeType === 3)
{
this.nodeValue = this.nodeValue.replace(/\d/g, function(v) {
return String.fromCharCode(v.charCodeAt(0) + 0x0630);
});
}
});
Note: Code taken from VisioN's second jsFiddle. If this is the only part of this answer that you like, make sure you upvote VisioN's answer, not mine!!! :-)
This has two problems:
It messes with the DOM and as a result may break code that used to work assuming it would find numerals in the "standard" form (using digits '0' to '9'). See the problem here: http://jsfiddle.net/bKEbR/10/ For instance, if you had a field containing the sum of some integers the user inputs, you might be in for a surprise when you try to get its value...
It does not address the issue of what goes on inside input (and textarea) elements. If an input field is initialised with, say, "42", it will retain that value. This can be fixed easily, but then there is the issue of actual input... One may decide to change characters as they come, convert the values when they change and so on and so forth. If such conversion is made then both the client side and the server side will need to be prepared to deal with different kinds of numeral. What comes out of the box in Javascript, jQuery and even Globalize (client-side), and ASP.NET, PHP etc. (server-side) will break if fed with numerals in non-standard formats ...
A slightly more comprehensive solution (taking care also of input/textarea elements, both their initial values and user input) might be:
//before the DOM change, test1 holds a numeral parseInt can understand
alert("Before: test holds the value:" +parseInt($("#test1").text()));
function convertNumChar(c) {
return String.fromCharCode(c.charCodeAt(0) + 0x0630);
}
function convertNumStr(s) {
return s.replace(/\d/g, convertNumChar);
}
//the change in the DOM
$("[lang='fa']").find("*").andSelf().contents()
.each(function() {
if (this.nodeType === 3)
this.nodeValue = convertNumStr(this.nodeValue);
})
.filter("input:text,textarea")
.each(function() {
this.value = convertNumStr(this.value)
})
.change(function () {this.value = convertNumStr(this.value)});
//test1 now holds a numeral parseInt cannot understand
alert("After: test holds the value:" +parseInt($("#test1").text()))
The entire jsFiddle can be found here: http://jsfiddle.net/bKEbR/13/
Needless to say, this only solves the aforementioned problems partially. Client-side and/or server-side code will have to recognise the non-standard numerals and convert them appropriately either to the standard format or to their actual values.
This is not a simple matter that a few lines of javascript will solve. And this is but the simplest case of such possible conversion since there is a simple character-to-character mapping that needs to be applied to go from one form of numeral to the other.
Another go at an appearance-based approach:
Cufon-based Solution (Overkill, Non-Backwards Compatible (requires canvas), etc.): One could relatively easily tweak a library like Cufon to do what is envisaged. Cufon can do its thing and draw glyphs on a canvas object, except that the tweak will ensure that when elements have a certain property, the desired glyphs will be used instead of the ones normally chosen. Cufon and other libraries of the kind tend to add elements to the DOM and alter the appearance of existing elements but not touch their text, so the problems with the transformational approaches should not apply. In fact it is interesting to note that while (tweaked) Cufon provides a clearly transformational approach as far as the overall DOM is concerned, it is an appearance-based solution as far as its mentality goes; I would call it a hybrid solution.
Alternative Hybrid-Solution: Create new DOM elements with the arabic content, hide the old elements but leave their ids and content intact. Synchronize the arabic content elements with their corresponding, hidden, elements.
Let's try to think outside the box (the box being current web standards).
The fact that certain characters are unique does not mean they are unrelated. Moreover, it does not necessarily mean that their difference is one of appearance. For instance, 'a' and 'A' are the same letter; in some contexts they are considered to be the same and in others to be different. Having the distinction in Unicode (and ASCII and ISO-Latin-1 etc. before it) means that some effort is required to overcome it.
CSS offers a quick and easy way for changing the case of letters. For instance, body {text-transform:uppercase} would turn all letters in the text in the body of the page into upper case. Note that this is also a case of appearance-change rather than transformation: the DOM of the body element does not change, just the way it is rendered.
Note: If CSS supported something like numerals-transform: 'ar' that would probably have been the ideal answer to the question as it was phrased.
However, before we rush to tell the CSS committee to add this feature, we may want to consider what that would mean. Here, we are tackling a tiny little problem, but they have to deal with the big picture.
Output:
Would this numerals-transform feature allow '10' (2-characters) to appear as 十 (Chinese, simple), 拾 (Chinese, complex), X (Latin) (all 1-character) and so on if instead of 'ar', the appropriate arguments were given?
Input:
Would this numerals-transform feature change '十' (Chinese, simple) into its Arabic equivalent, or would it simply target '10'? Would it somehow cleverly detect that "MMXI" (Latin numeral for 2011) is a number and not a word and convert it accordingly?
The question of number representation is not as simple as one might imagine just looking at this question.
So, where does all this leave us:
There is no simple presentation-based solution. If one appears in the future, it will not be backwards compatible.
There can be a transformational "solution" here and now, but even if this is made to work also with form elements as I have done (http://jsfiddle.net/bKEbR/13/) there need to be server-side and client-side awareness of the non-standard format used.
There may be complex hybrid solutions. They are complex but offer some of the advantages of the presentation-based approaches in some cases.
A CSS solution would be nice, but actually the problem is big and complex when one looks at the big picture which involves other numeric systems (with less trivial conversions from and to the standard system), decimal points, signs etc.
At the end of the day, the solution I see as realistic and backwards compatible would be an extension of Globalize (and server-side equivalents) possibly with some additional code to take care of user input. The idea is that this is not a problem at the character-level (because once you consider the big picture it is not) and that it will have to be treated in the same way that differences with thousands and decimal separators have been dealt with: as formatting/parsing issues.
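To make the parsing half of that concrete, here is a rough sketch of folding localized digits back to the standard form before handing a string to parseInt. The helper toAsciiDigits is hypothetical and covers only the two digit groups discussed in this question; it says nothing about separators or signs.

```javascript
// Hypothetical helper: fold Arabic-Indic (U+0660-0669) and Persian (U+06F0-06F9)
// digits back to ASCII so parseInt and friends keep working.
function toAsciiDigits(text) {
  return text.replace(/[\u0660-\u0669\u06F0-\u06F9]/g, function (d) {
    var code = d.charCodeAt(0);
    var zero = code >= 0x06F0 ? 0x06F0 : 0x0660; // zero of the group d belongs to
    return String.fromCharCode(0x0030 + (code - zero));
  });
}

var raw = '\u06f4\u06f2';                    // Persian digits four and two
var unparsed = parseInt(raw, 10);            // NaN: parseInt only knows ASCII digits
var value = parseInt(toAsciiDigits(raw), 10);
```

A library-level solution would do the same kind of normalisation inside its parse function, next to its handling of decimal and thousands separators.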
I imagine the best way is to use a regexp to search what numeric characters should be changed via adding a class name to the div that needs a different numeric set.
You can do this using jQuery fairly easily.
jsfiddle DEMO
EDIT: And if you don't want to use a variable, then see this revised demo:
jsfiddle DEMO 2
I have been working on a general web page localization technique that does more than just numbers (it's similar to .po files)
The localization files are simple (the strings can contain html if needed)
/* Localization file - save as document_url.lang.js ... index.html.en.js: */
items=[
{"id":"string1","value":"Localized text of string1 here."},
{"id":"string2", "value":"۰ ۱ ۲ ۳ ۴ ۵ ۶ ۷ ۸ ۹ "}
];
rtl=false; /* set to true for rtl languages */
This format is useful to separate out for translators (or mechanical turk)
and a basic page template
<html><meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<head><title>My title</title>
<style>.txt{float:left;margin-left:10px}</style>
</head>
<body onload='setLang()'>
<div id="string1" class="txt">This is the default text of string1.</div>
<div id="string2" class="txt">0 1 2 3 4 5 6 7 8 9 </div>
</body></html>
<script>
function setLang(){
for(var i=0;i<items.length;i++){
var term=document.getElementById(items[i].id)
if(!term)continue /* skip ids that are missing from this page */
term.innerHTML=items[i].value
if(rtl){ /* for rtl languages */
term.style.styleFloat="right"
term.style.cssFloat="right"
term.style.textAlign="right"
}
}
}
var lang=navigator.userLanguage || navigator.language;
var script=document.createElement("script");
script.src=document.URL+"."+lang.substring(0,2)+".js"
var head = document.getElementsByTagName('head')[0]
head.insertBefore(script,head.firstChild)
</script>
I tried to keep it pretty simple, yet cover as many locales as possible so additional css is likely required (I have to admit a lack of exposure to rtl languages, so many more styles may need to be set)
I do have font checking code that would be useful if you know what fonts support your character codes well
function hasFont(f){
var s=document.createElement("span")
s.style.fontSize="72px"
s.innerHTML="MWMWM"
s.style.visibility="hidden"
s.style.fontFamily=[(f=="monospace")?'':'monospace','sans-serif','serif']
document.body.appendChild(s)
var w=s.offsetWidth
s.style.fontFamily=[f,'monospace','sans-serif','serif']
var changed=s.offsetWidth!=w
document.body.removeChild(s) /* clean up the probe element */
return changed
}
usage: if(hasFont("myfont"))myelement.style.fontFamily="myfont"
A new (to date) and simple JS solution would be to use Intl.NumberFormat. It supports numeral localization, formatting variations as well as local currencies (see documentation for more examples).
To use an example very similar to MDN's own:
const val = 1234567809;
console.log('Eastern Arabic (Arabic-Egyptian)', new Intl.NumberFormat('ar-EG').format(val));
console.log('Persian variant (Farsi)',new Intl.NumberFormat('fa').format(val));
console.log('English (US)',new Intl.NumberFormat('en-US').format(val));
Intl.NumberFormat also seems to support string numeric values as well as indicates when it's not a number in the local language.
const val1 = '456';
const val2 = 'Numeric + string example, 123';
console.log('Eastern Arabic', new Intl.NumberFormat('ar-EG').format(val1));
console.log('Eastern Arabic', new Intl.NumberFormat('ar-EG').format(val2));
console.log('Persian variant',new Intl.NumberFormat('fa').format(val1));
console.log('Persian variant',new Intl.NumberFormat('fa').format(val2));
console.log('English',new Intl.NumberFormat('en-US').format(val1));
console.log('English', new Intl.NumberFormat('en-US').format(val2));
For the locale identifier (string passed to NumberFormat constructor indicating locale), I experimented with the values above and they seemed fine. I tried finding a list for all possible values, and through MDN came across this documentation and this list that could be helpful.
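The digits can also be chosen independently of the language by adding the nu Unicode extension key to the locale identifier. A small sketch, assuming a runtime whose Intl data includes the arab and arabext numbering systems:

```javascript
// The "-u-nu-..." extension selects the digits; the language part selects everything else.
var arabicIndic = new Intl.NumberFormat('en-u-nu-arab', { useGrouping: false });
var persian = new Intl.NumberFormat('en-u-nu-arabext', { useGrouping: false });

var a = arabicIndic.format(42);
var p = persian.format(42);
console.log(a, p, arabicIndic.resolvedOptions().numberingSystem);
```

Grouping is switched off here only to keep the output to bare digits; resolvedOptions() reports which numbering system was actually applied.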
I'm not familiar with SEO, and am thus unsure how this answers that part of the question.
you can try this:
This is CSS source code:
@font-face
{
font-family: A1Tahoma;
src: url(yourfont.eot) format('embedded-opentype')
, url(yourfont.ttf) format('truetype')
, url(yourfont.woff) format('woff')
, url(yourfont.svg) format('svg');
}
p{font-family:A1Tahoma; font-size:30px;}
And this is HTML code:
<p>سلام به همه</p>
<p>1234567890</p>
And finally you will see your result. Remember that the four font formats are there to cover different browsers such as IE, Firefox and so on.
"Hi Reza, you can do this to add your desired font to the site."
I have created a jquery plugin that can convert Western Arabic numbers to Eastern ones (Persian only). But it can be extended to convert a number to any desired numeral system. My jQuery plugin has two advantages:
Detect and convert numbers properly in child nodes.
Detect and convert point characters appropriately.
You can clone this plugin from github.
My plugin code:
(function( $ ){
$.fn.persiaNumber = function() {
var groupSelection = this;
for(i=0; i< groupSelection.length ; i++){
var htmlTxt = $(groupSelection[i]).html();
var trueTxt = convertDecimalPoint(htmlTxt);
trueTxt = convertToPersianNum(trueTxt);
$(groupSelection[i]).html(trueTxt);
}
function convertToPersianNum(htmlTxt){
var otIndex = htmlTxt.indexOf("<");
var ctIndex = htmlTxt.indexOf(">");
if(otIndex == -1 && ctIndex == -1 && htmlTxt.length > 0){
var trueTxt = htmlTxt.replace(/1/gi, "۱").replace(/2/gi, "۲").replace(/3/gi, "۳").replace(/4/gi, "۴").replace(/5/gi, "۵").replace(/6/gi, "۶").replace(/7/gi, "۷").replace(/8/gi, "۸").replace(/9/gi, "۹").replace(/0/gi, "۰");
return trueTxt;
}
var tag = htmlTxt.substring(otIndex,ctIndex + 1);
var str = htmlTxt.substring(0,otIndex);
str = convertDecimalPoint(str);
str = str.replace(/1/gi, "۱").replace(/2/gi, "۲").replace(/3/gi, "۳").replace(/4/gi, "۴").replace(/5/gi, "۵").replace(/6/gi, "۶").replace(/7/gi, "۷").replace(/8/gi, "۸").replace(/9/gi, "۹").replace(/0/gi, "۰");
var refinedHtmlTxt = str + tag;
var htmlTxt = htmlTxt.substring(ctIndex + 1, htmlTxt.length);
if(htmlTxt.length > 0 && otIndex != -1 || ctIndex != -1){
var trueTxt = refinedHtmlTxt;
var trueTxt = trueTxt + convertToPersianNum(htmlTxt);
}else{
return refinedHtmlTxt+ htmlTxt;
}
return trueTxt;
}
function convertDecimalPoint(str){
for(j=1;j<str.length - 1; j++){
if(str.charCodeAt(j-1) > 47 && str.charCodeAt(j-1) < 58 && str.charCodeAt(j+1) > 47 && str.charCodeAt(j+1) < 58 && str.charCodeAt(j) == 46)
str = str.substring(0,j) + '٫' + str.substring(j+1,str.length);
}
return str;
}
};
})( jQuery );
http://jsfiddle.net/VPWmq/2/
You can convert numbers in this way:
const persianDigits = ['۰', '۱', '۲', '۳', '۴', '۵', '۶', '۷', '۸', '۹'];
const number = 44653420;
const convertedNumber = String(number).replace(/\d/g, function(digit) {
return persianDigits[digit]
})
console.log(convertedNumber) // ۴۴۶۵۳۴۲۰
If anyone is looking for localizing into Bangla numbers using this code shifting method:
$("[lang='bang']").text(function(i, val) {
return val.replace(/\d/g, function(v) {
return String.fromCharCode(v.charCodeAt(0) + 0x09B6);
});
});
You can also visit here to see the hexadecimal Unicode code points of the Bangla characters

How do I make Internet Explorer include the required line feeds when transferring innerHTML to a <textarea>?

The purpose of this JavaScript program is to enable the user to report a problem on a social network with all the pertinent information in his initial message.
The user enters the information by answering a set of questions on a form consisting of text boxes, &c.  The answers are used to create an array of string literals, the elements of which are concatenated to form a single string. This string is then presented at the end of the page for the user to copy and then to paste on to the social-network page.
Hitherto this has been done by placing the string (using document.getElementById('divName').innerHTML) in a <div> set up for the purpose. That's fine: works in all five browsers and even on the Iphone.
Now, in order to make it easier for the user to make minor changes to the report before posting it (and to make it easier to copy), I want to be able to place the report not in a <div> but in a <textarea> part of the input form. This too is fine: works in Firefox, Chrome, Opera and Safari – even on the Iphone – but...
With some inevitability the only browser on the whole parade ground marching in step – MSIE – cannot handle it. It puts the information in the <textarea>– minus all the line feeds.
The array is initialized:
function createReport() { // [0] and [21] are constant.
outReportArray[0] = 'E M E R G E N C Y' + horizLine;
for (i=1 ; i<outReportArray.length ; i++) {
outReportArray[i] = '';
}
outReportArray[21] = 'Thank you.';
}
(Global variable horizLine is a sequence of m-rules (—) with a line feed at each end.)
As the user progresses through the form, the array is updated:
outReportArray[element] = label + value + (underLine ? horizLine : '&#10;');
(The following also have been tried for generating the line feed: '\n', '\r\n', '\r', '&#10;', '&#13;')
The output string is continually rebuilt and pasted to the <textarea> so that when he arrives at the foot of the form he presses an ‘update’ button and is simply transferred to the <textarea>:
outReportStr = ''; // Initialize the output string.
// Build the output string from the output array.
for (i=0 ; i<outReportArray.length ; i++) {
outReportStr += outReportArray[i];
}
// Populate the <textarea> 'outbox'.
document.getElementById('outbox').innerHTML = outReportStr;
Normally the content of the <textarea> looks like this:
E M E R G E N C Y
———————————
Name: John Doe
Land-line: (213) 555 1234
Cell-phone: (213) 555 1235
E-mail: JDoe@aol.com
———————————
Location of animal now:
Washington (St Landry Parish), La
———————————
&c.
(There’s more to it but this demonstrates the layout required in the output.)
In Internet Explorer, however, each line feed is replaced by a single space:
E M E R G E N C Y ——————————— Name: John Doe Land-line: (213) 555 1234 Cell-phone: (213) 555 1235 E-mail: JDoe@aol.com ——————————— Location of animal now: Washington (St Landry Parish), La ——————————— &c.
My question is this: how do I make Internet Explorer include the required line feeds in the innerHTML transferred to <textarea> 'outbox'?
(I have tried creating a textNode consisting of the innerHTML and then appending it as a child to the <textarea>; no dice: what I then get are all the character-entity codes (&#...) instead of the characters themselves.)
I'm very much an amateur at this game so, quite apart from not wanting to impose anything more complicated than HTML, CSS and JavaScript on the user, I don't want to get involved with complicated add-ins and proprietary libraries. A front seat at the pearly gates to any-one that can help me solve this problem!
The opener re-bids
First let me express sincere thanks to all that responded to my question. I had had some doubt of even getting a response, never mind one so quick.
Your insightful answers not only solved my problem but taught me quite a bit about the languages I'm deploying way above my pay grade – even of one I've never used!
@Kolink and @JayC told me to use .value rather than .innerHTML (and Kolink was quite right to adopt a tone of admonishment). Although I was aware of .value as part of the process of transferring data from an input element to the program, it had not even occurred to me that it might be written to: d'oh! I believe is the term.
Thank you, @RobG also, for your account of the use of \u codes; when it came to using .value vice .innerHTML, that was an important part of the solution.
@deceze recommended, indeed pleaded, that I learn 'Markdown' (which I always thought was something retailers applied to merchandise they were putting in to the clearance sale). He didn't say whether that was for my benefit or for his as a possible respondent but, searching for it on Google, I found a very interesting alternative to the jEdit I use, its strength being that the 'code' (the Markdown version of the text) is legible by a human, which must make it much simpler to edit. Thank you, deceze, I'll look in to that in due course (I've even tried coding this message in it); for the time being, not to the best of my knowledge having Perl to hand, I shall have to stick with what I (almost) know.
And, naturally, you all qualify for a front seat at the pearly gates. Thank you all for making my first visit to such a forum both fruitful and enjoyable.
ΠΞ
Set the textarea's value, not its innerHTML. In HTML, whitespace is stripped - you should know that just from making a basic webpage. Internet Explorer is just doing things right, unlike the other browsers...
textareas are very much like inputs. why don't you use the value attribute?
'\n' should work. Inserting the following as the value of a textarea element in IE 8 puts in the returns (wrapped for convenience only):
var s = 'E M E R G E N C Y\n' +
'\u2014\u2014\u2014\u2014\u2014\u2014\u2014\u2014\u2014\u2014\n' +
'Name: John Doe Land-line';
I've substituted the unicode value rather than the HTML character entity for the "—" character only. You might want to use "\r\n" but I don't think it's necessary.
Oh, you can also do things like:
var t = ['E M E R G E N C Y',
'\u2014\u2014\u2014\u2014\u2014\u2014\u2014\u2014\u2014\u2014',
'Name: John Doe Land-line'];
then set the value to:
t.join('\n') // or t.join('\u000d');
You can also substitute \u000a for linefeed and \u000d for carriage return.
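Putting the pieces together, a minimal sketch might look like this. The id 'outbox' is taken from the question; the helper name buildReport and the shortened sample lines are only for illustration.

```javascript
// Join the report parts with line feeds; the parts carry no HTML entities.
function buildReport(parts) {
  return parts.join('\n');
}

var report = buildReport([
  'E M E R G E N C Y',
  '\u2014\u2014\u2014\u2014\u2014',
  'Name: John Doe'
]);

// In a browser, assign to .value (not .innerHTML) so the line feeds survive.
if (typeof document !== 'undefined') {
  var box = document.getElementById('outbox');
  if (box) box.value = report;
}
```

Because the string goes in through .value, it is treated as plain text, so \u escapes have to replace any character entities.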

What is the significance of the number 93 in Unicode?

Since there is currently no universal way to read live data from an audio track in JavaScript I'm using a small library/API to read volume data from a text file that I converted from an MP3 offline.
The string looks like this
!!!!!!!!!!!!!!!!!!!!!!!!!!###"~{~||ysvgfiw`gXg}i}|mbnTaac[Wb~v|xqsfSeYiV`R
][\Z^RdZ\XX`Ihb\O`3Z1W*I'D'H&J&J'O&M&O%O&I&M&S&R&R%U&W&T&V&m%\%n%[%Y%I&O'P'G
'L(V'X&I'F(O&a&h'[&W'P&C'](I&R&Y'\)\'Y'G(O'X'b'f&N&S&U'N&P&J'N)O'R)K'T(f|`|d
//etc...
and the idea is basically that at a given point in the song the Unicode number of the character at the corresponding point in the text file yields a nominal value to represent volume.
The library translates the data (in this case, a stereo track) with the following (simplified here):
getVolume = function(sampleIndex,o) {
o.left = Math.min(1,(this.data.charCodeAt(sampleIndex*2|0)-33)/93);
o.right = Math.min(1,(this.data.charCodeAt(sampleIndex*2+1|0)-33)/93);
}
I'd like some insight into how the file was encoded in the first place, and how I'm making use of it here.
What is the significance of 93 and 33?
What is the purpose of the bitwise |?
Is this a common means of porting information (ie, does it have a name), or is there a better way to do it?
It looks like the range of the characters in that file are from ! to ~. ! has an ASCII code of 33 and ~ has an ASCII code of 126. 126-33 = 93.
33 and 93 are used for normalizing values between ! and ~.
var data = '!';
Math.min(1,(data.charCodeAt(0*2)-33)/93); // will yield 0
var data = '~';
Math.min(1,(data.charCodeAt(0*2)-33)/93); // will yield 1
var data = '"';
Math.min(1,(data.charCodeAt(0*2)-33)/93); // will yield 0.010752688172043012
var data = '#';
Math.min(1,(data.charCodeAt(0*2)-33)/93); // will yield 0.021505376344086023
// ... and so on
The |0 is there due to the fact that sampleIndex*2 or sampleIndex*2+1 will yield a non-integer value when being passed a non-integer sampleIndex. |0 truncates the decimal part just in case someone sends in an incorrectly formatted sampleIndex (i.e. non-integer).
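How the file was produced is not shown in the question, but an encoder matching this decoder could plausibly look like the sketch below. The function encodeVolume is an assumption made for illustration, not the library's actual code.

```javascript
// Assumed encoder: map a volume in [0, 1] onto the printable ASCII range '!'..'~'.
function encodeVolume(v) {
  var clamped = Math.max(0, Math.min(1, v));
  return String.fromCharCode(33 + Math.round(clamped * 93));
}

// Decoder following the formula in the question, for a single character.
function decodeVolume(ch) {
  return Math.min(1, (ch.charCodeAt(0) - 33) / 93);
}

var quiet = encodeVolume(0);  // '!'
var loud = encodeVolume(1);   // '~'
var truncated = 2.7 | 0;      // 2: bitwise OR with zero drops the fraction
```

Sticking to the printable range keeps the data safe to store in a plain text file, at the cost of only 94 distinct volume levels per character.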
Doing a bitwise OR with zero will truncate the number on the LHS to an integer. Not sure about the rest of your question though, sorry.
93 and 33 are ASCII codes (not unicode) for the characters "]" and "!" respectively. Hope that helps a bit.
This will help you forever:
http://www.asciitable.com/
ASCII codes for everything.
Enjoy!
