How to reverse a string that contains complicated emojis? - javascript

Input:
Hello world👩‍🦰👩‍👩‍👦‍👦
Desired Output:
👩‍👩‍👦‍👦👩‍🦰dlrow olleH
I tried several approaches but none gave me correct answer.
This failed miserably:
const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦';
const reversed = text.split('').reverse().join('');
console.log(reversed);
This kinda works but it breaks 👩‍👩‍👦‍👦 into 4 different emojis:
const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦';
const reversed = [...text].reverse().join('');
console.log(reversed);
I also tried every answer in this question but none of them works.
Is there a way to get the desired output?

If you're able to, use the _.split() function provided by lodash. From version 4.0 onwards, _.split() is capable of splitting unicode emojis.
Using the native .reverse().join('') to reverse the 'characters' should then work just fine with emojis containing zero-width joiners.
function reverse(txt) { return _.split(txt, '').reverse().join(''); }
const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦';
console.log(reverse(text));
<script src="https://cdnjs.cloudflare.com/ajax/libs/lodash.js/4.17.20/lodash.min.js" integrity="sha512-90vH1Z83AJY9DmlWa8WkjkV79yfS2n2Oxhsi2dZbIv0nC4E6m5AbH8Nh156kkM7JePmqD6tcZsfad1ueoaovww==" crossorigin="anonymous"></script>

I took TKoL's idea of using the \u200d character and used it to attempt to create a smaller script.
Note: Not all compositions use a zero width joiner so it will be buggy with other composition characters.
It uses a traditional for loop because we skip some iterations when we find combined emoticons. Within the for loop, a while loop checks whether the next character is \u200d. As long as there is one, we add the next 2 characters as well and advance the loop counter by 2, so combined emoticons are not reversed.
To make it easy to use on any string, I added it as a new method on String.prototype.
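String.prototype.reverse = function() {
  // Spread into code points, so surrogate pairs stay together.
  let textArray = [...this];
  let reverseString = "";
  for (let i = 0; i < textArray.length; i++) {
    let char = textArray[i];
    // Keep gluing "ZWJ + next code point" while a ZWJ follows.
    // (?? '' avoids appending "undefined" if the text ends with a ZWJ.)
    while (textArray[i + 1] === '\u200d') {
      char += textArray[i + 1] + (textArray[i + 2] ?? '');
      i = i + 2;
    }
    // Prepending each cluster reverses the order of the clusters.
    reverseString = char + reverseString;
  }
  return reverseString;
}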
const text = "Hello world👩‍🦰👩‍👩‍👦‍👦";
console.log(text.reverse());
//Fun fact, you can chain them to double reverse :)
//console.log(text.reverse().reverse());

Reversing Unicode text is tricky for a lot of reasons.
First, depending on the programming language, strings are represented in different ways: as a list of bytes, as a list of UTF-16 code units (16 bits wide, often called "characters" in the API), or as UCS-4 code points (4 bytes wide).
Second, different APIs reflect that inner representation to different degrees. Some work on the abstraction of bytes, some on UTF-16 characters, some on code points. When the representation uses bytes or UTF-16 characters, there are usually parts of the API that give you access to the elements of this representation, as well as parts that perform the necessary logic to get from bytes (via UTF-8) or from UTF-16 characters to the actual code points.
Often, the parts of the API performing that logic, and thus giving you access to the code points, were added later: first there was 7-bit ASCII, a bit later everybody thought 8 bits were enough (using different code pages), and even later that 16 bits were enough for Unicode. Code points as integers too wide for 16 bits (up to U+10FFFF) were historically the fourth common character width for encoding text.
Using an API that gives you access to the actual code points seems like the end of the story. But...
Third, there are a lot of modifier code points that attach to the code point before them. E.g. there's a combining diaeresis (U+0308) turning the a before it into an ä, e into ë, &c. Reverse the code points, and the diaeresis in aë (a, e, U+0308) ends up in front of both letters, so instead of ëa you get a stray diaeresis followed by ea. There is a direct representation of e.g. ä as its own code point, but using the combining mark is just as valid.
Fourth, everything is in constant flux. There are also a lot of modifiers among the emoji, as used in the example, and more are added every year. So even if an API can tell you whether a code point is a modifier, whether it knows about a specific new modifier depends on the Unicode version it implements.
Unicode provides a hacky trick, though, for when it's only about the visual appearance:
There are writing-direction overrides. The example text is written left to right. Just add a right-to-left override at the beginning of the text and, depending on the version of the API / browser, it will look correctly reversed 😎
'\u202e' is called RIGHT-TO-LEFT OVERRIDE; it is the strongest version of the right-to-left marker.
See this explanation by w3.org
const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦'
console.log('\u202e' + text)
const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦'
let original = document.getElementById('original')
original.appendChild(document.createTextNode(text))
let result = document.getElementById('result')
result.appendChild(document.createTextNode('\u202e' + text))
body {
font-family: sans-serif
}
<p id="original"></p>
<p id="result"></p>

I know! I'll use RegExp. What could go wrong? (Answer left as an exercise for the reader.)
const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦';
const reversed = text.match(/.(\u200d.)*/gu).reverse().join('');
console.log(reversed);
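To make the exercise concrete, here is a small sketch that wraps the RegExp above in a function (the name `reverseZwj` is made up) and runs it on inputs that contain no zero-width joiner but still render as a single visible character. The pattern only glues code points together across U+200D, so each of these is split apart.

```javascript
// The RegExp approach above, wrapped in a (hypothetically named) function.
const reverseZwj = (s) => s.match(/.(\u200d.)*/gu).reverse().join('');

// The input from the question comes out as desired.
console.log(reverseZwj('Hello world👩‍🦰👩‍👩‍👦‍👦')); // 👩‍👩‍👦‍👦👩‍🦰dlrow olleH

// Skin tone modifier (U+1F44D U+1F3FD): no ZWJ, so the modifier is split off.
console.log(reverseZwj('\u{1F44D}\u{1F3FD}') === '\u{1F3FD}\u{1F44D}'); // true

// Flag 🇺🇸 (two regional indicators) turns into 🇸🇺 when they swap places.
console.log(reverseZwj('\u{1F1FA}\u{1F1F8}') === '\u{1F1F8}\u{1F1FA}'); // true

// "é" written as e + U+0301: the accent lands in front of the e.
console.log(reverseZwj('e\u0301') === '\u0301e'); // true

// `.` does not match line terminators without the s flag, so "\n" is dropped.
console.log(JSON.stringify(reverseZwj('a\nb'))); // "ba"

// match() returns null when nothing matches, e.g. for '', so this would throw:
// reverseZwj('');
```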

An alternative solution is the runes library, small but effective:
https://github.com/dotcypress/runes
const runes = require('runes')
// String.substring cuts the first emoji's surrogate pair in half:
'👨‍👨‍👧‍👧a'.substring(1) // → '�‍👨‍👧‍👧a'
// runes treats the whole family emoji as one character:
runes.substr('👨‍👨‍👧‍👧a', 1) // → 'a'
runes('12👩‍👩‍👦‍👦3🍕✓').reverse().join('')
// results in: "✓🍕3👩‍👩‍👦‍👦21"

You don't just have trouble with emoji, but also with other combining characters.
These things that feel like individual letters but are actually one-or-more unicode characters are called "extended grapheme clusters".
Breaking a string into these clusters is tricky (for example, see these Unicode docs). I would not rely on implementing it myself but would use an existing library. Google pointed me at the grapheme-splitter library. The docs for this library contain some nice examples that will trip up most implementations.
Using this you should be able to write:
const GraphemeSplitter = require('grapheme-splitter'); // npm package
var splitter = new GraphemeSplitter();
var graphemes = splitter.splitGraphemes(string);
var reversed = graphemes.reverse().join('');
ASIDE: For visitors from the future, or those willing to live on the bleeding edge:
There is a proposal to add a grapheme segmenter to the javascript standard. (It actually provides other segmenting options too).
It was at stage 3 at the time of writing and implemented in JSC and V8 (see https://github.com/tc39/proposal-intl-segmenter/issues/114); it has since reached stage 4 and ships in all major browsers and Node.js.
Using this the code would look like:
var segmenter = new Intl.Segmenter("en", {granularity: "grapheme"})
var segment_iterator = segmenter.segment(string)
var graphemes = []
for (let {segment} of segment_iterator) {
graphemes.push(segment)
}
var reversed = graphemes.reverse().join('');
You can probably make this neater if you know more modern javascript than me...
There is an implementation here - but I don't know what it requires.
Note: This points out a fun issue that other answers haven't addressed yet. Segmentation can depend upon the locale that you are using - not just the characters in the string.
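A somewhat neater version of the Intl.Segmenter loop above, assuming a runtime that ships Intl.Segmenter (current browsers and recent Node.js versions do). `Array.from` accepts a mapping function, which replaces the manual push loop, and the `locale` parameter is exposed because segmentation can depend on it. The function name is illustrative.

```javascript
// Reverse a string by extended grapheme clusters rather than by code points.
function reverseGraphemes(text, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  // segment() returns an iterable of { segment, index, input } records;
  // Array.from's second argument pulls out just the segment strings.
  return Array.from(segmenter.segment(text), ({ segment }) => segment)
    .reverse()
    .join('');
}

console.log(reverseGraphemes('Hello world👩‍🦰👩‍👩‍👦‍👦')); // 👩‍👩‍👦‍👦👩‍🦰dlrow olleH
console.log(reverseGraphemes('e\u0301a')); // aé (e + U+0301 stays one cluster)
```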

I just decided to do it for fun; it was a good challenge. I'm not sure it's correct in all cases, so use it at your own risk, but here it is:
function run() {
const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦';
const newText = reverseText(text);
console.log(newText);
}
function reverseText(text) {
// first, create an array of characters
let textArray = [...text];
let lastCharConnector = false;
textArray = textArray.reduce((acc, char, index) => {
if (char.charCodeAt(0) === 8205) {
const lastChar = acc[acc.length-1];
if (Array.isArray(lastChar)) {
lastChar.push(char);
} else {
acc[acc.length-1] = [lastChar, char];
}
lastCharConnector = true;
} else if (lastCharConnector) {
acc[acc.length-1].push(char);
lastCharConnector = false;
} else {
acc.push(char);
lastCharConnector = false;
}
return acc;
}, []);
console.log('initial text array', textArray);
textArray = textArray.reverse();
console.log('reversed text array', textArray);
textArray = textArray.map((item) => {
if (Array.isArray(item)) {
return item.join('');
} else {
return item;
}
});
return textArray.join('');
}
run();

You can use:
yourstring.split('').reverse().join('')
It should turn your string into a list, reverse it then make it a string again.

const text = 'Hello world👩‍🦰👩‍👩‍👦‍👦';
const reversed = text.split('').reverse().join('');
console.log(reversed);

Related

how to stop replace function in JavaScript?

I have a very big string (over 2 million characters).
I need to replace text in this string, but it works very slowly.
How can I stop the replacer function once offset > 65535?
let inReg = MyRegExp();
let str = "very big string....";
str = str.replace(inReg, (match, p1, p2, offset, string) => {
if (offset > 65535) {
return myConvertFunc(match);
} else {
//How i can stop replacer?
}
return match; //this did not stop the replacer function
});
You can't. You could test the offset and just return the original text if it was > 65535, but the search would still continue past that point and your callback would still get called.
Instead, isolate the part you want to do the change in:
const segmentLength = 65536;
str = str.substring(0, segmentLength).replace(inReg, myConvertFunc) +
str.substring(segmentLength);
On the face of it, that seems insane — creating two new strings, one of them a copy of 2M characters! But modern JavaScript engines are very smart about strings (which, being immutable, offer a lot of opportunity for optimization). It's likely any good JavaScript engine would reuse, not copy, the contents of the two strings created via substring above, sharing the underlying character array with the original.
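If the goal really is to stop scanning rather than to limit the input, one swapped-in alternative (not something .replace() itself offers) is a manual RegExp#exec loop that exits once a match starts at or past the limit. `replaceUntil` is a hypothetical helper; it passes the replacer the same leading arguments .replace() would (match, capture groups, offset, whole string), and it also handles a match that starts before the limit but extends past it.

```javascript
// Hypothetical helper: replace matches of `regex` that start before `limit`,
// then stop instead of scanning the rest of the string.
function replaceUntil(str, regex, replacer, limit) {
  // exec() only walks through the string with the g flag, so force it on.
  const flags = regex.flags.includes('g') ? regex.flags : regex.flags + 'g';
  const re = new RegExp(regex.source, flags);
  let out = '';
  let last = 0; // end index of the previous match
  let m;
  while ((m = re.exec(str)) !== null && m.index < limit) {
    // Same leading arguments as a String#replace callback.
    out += str.slice(last, m.index) + replacer(...m, m.index, str);
    last = m.index + m[0].length;
    // Step past empty matches so the loop cannot get stuck.
    if (m[0].length === 0) re.lastIndex++;
  }
  // Everything after the last replaced match is copied unchanged.
  return out + str.slice(last);
}

console.log(replaceUntil('a1b2c3', /\d/g, () => 'X', 3)); // aXb2c3
```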

Removing a String from an Array, but adding the Number amount back in Javascript

I am trying to create a FEN notation converter for chess. I've decided the best way to go about this is to remove any unnecessary values from the string that I don't need. Here is what it looks like.
rnbqkbnr/pppppppp/8/
These are all strings in my code. I've already found solutions for the end of each file which is represented by the / in the string. But my new problem is taking away the number values and adding back in blank squares to cover them.
When you see a number in FEN, that is essentially how many empty squares there are on the chessboard until the next piece. My idea, once again to dumb it down for myself, was to convert numbers like 8 into 11111111, a series of 1s representing each square that has to be empty.
I assumed that in practice I could just splice the 8 out and start filling that index onwards with 1s. This seemed like a viable option, but unfortunately it is quite buggy when the number is in different places.
Would anyone have a better option for this? Yes, I know there are already libraries that accomplish this, but what's the fun of coding if you don't attempt to reinvent the wheel a couple of times?
correctFen function - maps over the string looking for any numbers
const correctFen = () => {
newFen.map((pos, index) => {
if (Number(pos) && pos !== '/'){
for (let i = 0; i < Number(pos); i++) {
console.log('firing')
setNewFen(addAfter(newFen, index + i, '1'))
}
}
})
console.log(newFen)
figureOutPos()
}
After looking at this, it's not really removing the index I want, which could be the problem here: it's adding the 1s after the last /.
function addAfter(array, index, newItem) {
return [
...array.slice(0, index),
newItem,
...array.slice(index)
];
}
Looks like you're modifying newFen from the map callback. This would indeed mess up indices because everything after the current index shifts around.
Instead, return the new character(s) from the map callback. In most cases this will be the original character, but if it's a digit, you'll return a string of multiple characters. Then join the array together into a string again.
Something like this:
// Assuming newFen is a string in some outer scope.
const correctFen = () => {
const correctedFen = newFen
// Split the string into an array of single characters.
.split('')
// Apply a function to each character and gather the results in an array.
.map((char, _index) => {
// Try to parse `char` as an integer.
const number = parseInt(char)
if (isNaN(number)) {
// `char` is not a digit. Return the character itself.
return char
} else {
// `char` is a digit. Return a string of that many 1s.
return '1'.repeat(number)
}
})
// Join the array back into a string.
.join('')
setNewFen(correctedFen)
}
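Since the callback above only ever maps a digit to a run of 1s and leaves every other character alone, the same transformation can also be written as a single String#replace with a global regex. A sketch, with `expandFenDigits` as a made-up name and the same convention that an empty square is written as '1':

```javascript
// Replace each FEN digit with that many '1' characters (one per empty square).
const expandFenDigits = (fen) => fen.replace(/\d/g, (d) => '1'.repeat(Number(d)));

console.log(expandFenDigits('rnbqkbnr/pppppppp/8/')); // rnbqkbnr/pppppppp/11111111/
console.log(expandFenDigits('r3k2r'));                // r111k11r
```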

Why doesn't my function correctly replace when using some regex pattern

This is an extension of this SO question
I made a function to see if I can correctly format any number. The answers below work on tools like https://regex101.com and https://regexr.com/, but not within my function (tried in Node and the browser):
const format = (num, regex) => String(num).replace(regex, '$1')
Basically, given any whole number, it should not exceed 15 significant digits. Given any decimal, it should not exceed 2 decimal places.
so...
Now
format(0.12345678901234567890, /^\d{1,13}(\.\d{1,2}|\d{0,2})$/)
returns 0.123456789012345678 instead of 0.123456789012345
but
format(0.123456789012345,/^-?(\d*\.?\d{0,2}).*/)
returns the number formatted to 2 decimal places, as expected.
Let me try to explain what's going on.
For the given input 0.12345678901234567890 and the regex /^\d{1,13}(\.\d{1,2}|\d{0,2})$/, let's go step by step and see what's happening.
^\d{1,13} Does indeed match the start of the string 0
(\. Now you've opened a new group, and it does match .
\d{1,2} It does find the digits 1 and 2
|\d{0,2} So this part is skipped
) So this is the end of your capture group.
$ This indicates the end of the string, but it won't match, because you've still got 345678901234567890 remaining.
Since the match fails in the end, there is nothing to replace, and JavaScript returns the whole string unchanged.
Let's try removing $ at the end, to become /^\d{1,13}(\.\d{1,2}|\d{0,2})/
You'd get back ".12345678901234567890". This generates a couple of questions.
Why did the preceding 0 get removed?
Because it was not part of your matching group, enclosed with ().
Why did we not get only two decimal places, i.e. .12?
Remember that you're doing a replace. That means the original string is kept in place by default; only the parts that match get replaced. Since 345678901234567890 was not part of the match, it was left intact. The only part that matched was 0.12.
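A small sketch reproducing both behaviors with the `format` function from the question (the sample input 0.12345 is chosen for readability). Without a trailing `.*`, only the matched prefix is replaced and the rest of the string survives; with `.*`, the match covers the whole string, so `'$1'` leaves just the captured group. This is why the question's second regex, `/^-?(\d*\.?\d{0,2}).*/`, appears to work.

```javascript
const format = (num, regex) => String(num).replace(regex, '$1');

// With $: no match at all, so the string comes back unchanged.
console.log(format(0.12345, /^\d{1,13}(\.\d{1,2}|\d{0,2})$/)); // 0.12345

// Without $: "0.12" is matched and replaced by ".12"; "345" is kept.
console.log(format(0.12345, /^\d{1,13}(\.\d{1,2}|\d{0,2})/));  // .12345

// Trailing .* consumes the rest, so only the captured group remains.
console.log(format(0.12345, /^-?(\d*\.?\d{0,2}).*/));          // 0.12
```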
Answer to the title question: your function doesn't replace, because there's nothing to replace; the regex doesn't match anything in the string. csb's answer explains that in full detail.
But that's perhaps not the answer you really need.
Now, it seems like you have an XY problem. You ask why your call to .replace() doesn't work, but .replace() is definitely not the function you should use. The role of .replace() is replacing parts of a string, while you actually want to create a different string. Moreover, in the comments you suggest that your formatting is not only for presenting data to the user, but that you also intend to use it in further computation. You also mention cryptocurrencies.
Let's cope with these problems one-by-one.
What to do instead of replace?
Well, just produce the string you need instead of replacing something in the string you don't like. There are some edge cases. Instead of writing all-in-one regex, just handle them one-by-one.
The following code is definitely not the best possible, but its main aim is to be simple and show exactly what is going on.
function format(n) {
const max_significant_digits = 15;
const max_precision = 2;
let digits_before_decimal_point;
if (n < 0) {
// Don't count minus sign.
digits_before_decimal_point = n.toFixed(0).length - 1;
} else {
digits_before_decimal_point = n.toFixed(0).length;
}
if (digits_before_decimal_point > max_significant_digits) {
throw new Error('No good representation for this number');
}
const available_significant_digits_for_precision =
Math.max(0, max_significant_digits - digits_before_decimal_point);
const effective_max_precision =
Math.min(max_precision, available_significant_digits_for_precision);
const with_trailing_zeroes = n.toFixed(effective_max_precision);
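// I want to keep the string and change just the matching part,
// so here .replace() is a proper method to use.
// Only strip zeroes after a decimal point: with precision 0 there is none,
// and /\.?0*$/ would turn e.g. "100000000000000" into "1".
const without_trailing_zeroes = with_trailing_zeroes.includes('.')
? with_trailing_zeroes.replace(/\.?0+$/, '')
: with_trailing_zeroes;
return without_trailing_zeroes;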
}
So, you got the number formatted the way you want. What now?
What can you use this string for?
Well, you can display it to the user. And that's mostly it. The value was rounded to (1) represent it in a different base and (2) fit in limited precision, so it's pretty much useless for any computation. And, BTW, why would you convert it to String in the first place, if what you want is a number?
Was the value you are trying to print ever useful in the first place?
Well, that's the most serious question here. Because, you know, floating point numbers are tricky. And they are absolutely abysmal for representing money. So, most likely the number you are trying to format is already a wrong number.
What to use instead?
Fixed-point arithmetic is the most obvious answer. Works most of the time. However, it's pretty tricky in JS, where number may slip into floating-point representation almost any time. So, it's better to use decimal arithmetic library. Optionally, switch to a language that has built-in bignums and decimals, like Python.
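To illustrate the fixed-point idea, here is a minimal sketch that stores amounts as BigInt counts of the smallest unit, so additions are exact. The helper names, the strict input format (digits with an optional fraction, no exponents or separators) and the choice of 2 decimal places are all assumptions; cryptocurrency amounts would likely need more decimals and more validation.

```javascript
// Fixed-point sketch: amounts are BigInt counts of the smallest unit.
const DECIMALS = 2;                    // assumption: 2 decimal places
const SCALE = 10n ** BigInt(DECIMALS); // 100n

// Hypothetical helper: parse strings like "12.34", "-0.5" or "7".
function parseAmount(text) {
  const match = /^(-?)(\d+)(?:\.(\d+))?$/.exec(text);
  if (!match) throw new SyntaxError('Not a plain decimal amount: ' + text);
  const [, sign, whole, frac = ''] = match;
  if (frac.length > DECIMALS) throw new RangeError('Too many decimal places: ' + text);
  const units = BigInt(whole) * SCALE + BigInt(frac.padEnd(DECIMALS, '0'));
  return sign ? -units : units;
}

// Hypothetical helper: format a unit count back into "whole.fraction".
function formatAmount(units) {
  const negative = units < 0n;
  const abs = negative ? -units : units;
  const whole = (abs / SCALE).toString(); // BigInt division truncates
  const frac = (abs % SCALE).toString().padStart(DECIMALS, '0');
  return (negative ? '-' : '') + whole + '.' + frac;
}

console.log(0.1 + 0.2);                                             // 0.30000000000000004
console.log(formatAmount(parseAmount('0.1') + parseAmount('0.2'))); // 0.30
```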

Is it better to compare strings using toLowerCase or toUpperCase in JavaScript?

I'm going through a code review and I'm curious if it's better to convert strings to upper or lower case in JavaScript when attempting to compare them while ignoring case.
Trivial example:
var firstString = "I might be A different CASE";
var secondString = "i might be a different case";
var areStringsEqual = firstString.toLowerCase() === secondString.toLowerCase();
or should I do this:
var firstString = "I might be A different CASE";
var secondString = "i might be a different case";
var areStringsEqual = firstString.toUpperCase() === secondString.toUpperCase();
It seems like either should work with limited character sets, like English letters only, so is one more robust than the other?
As a note, MSDN recommends normalizing strings to uppercase, but that is for managed code (presumably C# & F# but they have fancy StringComparers and base libraries):
http://msdn.microsoft.com/en-us/library/bb386042.aspx
Revised answer
It's been quite a while since I answered this question. While the cultural issues still hold true (and I don't think they will ever go away), the development of the ECMA-402 standard has made my original answer... outdated (or obsolete?).
The best solution for comparing localized strings seems to be using function localeCompare() with appropriate locales and options:
var locale = 'en'; // that should be somehow detected and passed on to JS
var firstString = "I might be A different CASE";
var secondString = "i might be a different case";
if (firstString.localeCompare(secondString, locale, {sensitivity: 'accent'}) === 0) {
// do something when equal
}
This will compare the two strings case-insensitively but accent-sensitively (for example ą != a).
If this is too slow for your needs, you may want to use either toLocaleUpperCase() or toLocaleLowerCase(), passing the locale as a parameter:
if (firstString.toLocaleUpperCase(locale) === secondString.toLocaleUpperCase(locale)) {
// do something when equal
}
In theory there should be no differences. In practice, subtle implementation details (or lack of implementation in the given browser) may yield different results...
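On the performance point: when many strings are compared with the same locale and options, MDN recommends creating one Intl.Collator and reusing its compare function instead of calling localeCompare() repeatedly. A sketch, where `equalsIgnoringCase` is a made-up helper name and the locale is hard-coded as an assumption:

```javascript
const locale = 'en'; // assumption: detected elsewhere and passed on to JS

// One collator, reused for every comparison.
const collator = new Intl.Collator(locale, { sensitivity: 'accent' });

// Equal when the strings differ at most in case.
const equalsIgnoringCase = (a, b) => collator.compare(a, b) === 0;

console.log(equalsIgnoringCase('I might be A different CASE', 'i might be a different case')); // true
console.log(equalsIgnoringCase('ą', 'a')); // false: accents still count
```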
Original answer
I am not sure if you really meant to ask this question with the Internationalization (i18n) tag, but since you did...
Probably the most unexpected answer is: neither.
There are tons of problems with case conversion, which inevitably lead to functional issues if you convert the character case without indicating the language (as is the case in JavaScript). For instance:
There are many natural languages that don't have concept of upper- and lowercase characters. No point in trying to convert them (although this will work).
There are language specific rules for converting the string. German sharp S character (ß) is bound to be converted into two upper case S letters (SS).
Turkish and Azerbaijani (or Azeri if you prefer) have a "very strange" concept of two i characters: dotless ı (which converts to uppercase I) and dotted i (which converts to uppercase İ <- this font does not allow for correct presentation, but this is really a different glyph).
The Greek language has many "strange" conversion rules. One particular rule concerns the uppercase letter sigma (Σ), which, depending on its place in a word, has two lowercase counterparts: regular sigma (σ) and final sigma (ς). There are also other conversion rules regarding "accented" characters, but they are commonly omitted when implementing conversion functions.
Some languages have title-case letters, e.g. ǈ, which should be converted to things like Ǉ or, less appropriately, LJ. The same may apply to ligatures.
Finally there are many compatibility characters that may mean the same as what you are trying to compare to, but be composed of completely different characters. To make it worse, things like "ae" may be the equivalent of "ä" in German and Finnish, but equivalent of "æ" in Danish.
I am trying to convince you that it is really better to compare user input literally, rather than converting it. If it is not user-related, it probably doesn't matter, but case conversion will always take time. Why bother?
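A few of these cases can be checked directly in a modern JavaScript engine. The snippet below is a sketch assuming a runtime with ICU case mapping (current browsers and standard Node.js builds have it); it shows that upper- and lower-casing can disagree for ß, and that the Turkish i needs the locale-aware variants.

```javascript
// German sharp s: upper-casing expands one character into two.
console.log('ß'.toUpperCase()); // SS
console.log('ß'.toLowerCase()); // ß

// So the two strategies disagree about whether these strings match:
console.log('straße'.toUpperCase() === 'STRASSE'.toUpperCase()); // true
console.log('straße'.toLowerCase() === 'STRASSE'.toLowerCase()); // false

// Turkish dotted/dotless i: the result depends on the locale passed in.
console.log('i'.toUpperCase());           // I
console.log('i'.toLocaleUpperCase('tr')); // İ (U+0130)
console.log('I'.toLocaleLowerCase('tr')); // ı (U+0131)
```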
Some other options have been presented, but if you must use toLowerCase or toUpperCase, I wanted some actual data on this. I pulled the full list of two-byte characters that fail with toLowerCase or toUpperCase. I then ran this test:
let pairs = [
[0x00E5,0x212B],[0x00C5,0x212B],[0x0399,0x1FBE],[0x03B9,0x1FBE],[0x03B2,0x03D0],
[0x03B5,0x03F5],[0x03B8,0x03D1],[0x03B8,0x03F4],[0x03D1,0x03F4],[0x03B9,0x1FBE],
[0x0345,0x03B9],[0x0345,0x1FBE],[0x03BA,0x03F0],[0x00B5,0x03BC],[0x03C0,0x03D6],
[0x03C1,0x03F1],[0x03C2,0x03C3],[0x03C6,0x03D5],[0x03C9,0x2126],[0x0392,0x03D0],
[0x0395,0x03F5],[0x03D1,0x03F4],[0x0398,0x03D1],[0x0398,0x03F4],[0x0345,0x1FBE],
[0x0345,0x0399],[0x0399,0x1FBE],[0x039A,0x03F0],[0x00B5,0x039C],[0x03A0,0x03D6],
[0x03A1,0x03F1],[0x03A3,0x03C2],[0x03A6,0x03D5],[0x03A9,0x2126],[0x0398,0x03F4],
[0x03B8,0x03F4],[0x03B8,0x03D1],[0x0398,0x03D1],[0x0432,0x1C80],[0x0434,0x1C81],
[0x043E,0x1C82],[0x0441,0x1C83],[0x0442,0x1C84],[0x0442,0x1C85],[0x1C84,0x1C85],
[0x044A,0x1C86],[0x0412,0x1C80],[0x0414,0x1C81],[0x041E,0x1C82],[0x0421,0x1C83],
[0x1C84,0x1C85],[0x0422,0x1C84],[0x0422,0x1C85],[0x042A,0x1C86],[0x0463,0x1C87],
[0x0462,0x1C87]
];
let upper = 0, lower = 0;
for (let pair of pairs) {
let row = 'U+' + pair[0].toString(16).padStart(4, '0') + ' ';
row += 'U+' + pair[1].toString(16).padStart(4, '0') + ' pass: ';
let s = String.fromCodePoint(pair[0]);
let t = String.fromCodePoint(pair[1]);
if (s.toUpperCase() == t.toUpperCase()) {
row += 'toUpperCase ';
upper++;
} else {
row += ' ';
}
if (s.toLowerCase() == t.toLowerCase()) {
row += 'toLowerCase';
lower++;
}
console.log(row);
}
console.log('upper pass: ' + upper + ', lower pass: ' + lower);
Interestingly, one of the pairs fails with both. But based on this, toUpperCase is the best option.
It never depends upon the browser as it is only the JavaScript which is involved.
Both will perform according to the number of characters that need to be changed (case-flipped).
var areStringsEqual = firstString.toLowerCase() === secondString.toLowerCase();
var areStringsEqual = firstString.toUpperCase() === secondString.toUpperCase();
If you use the test prepared by @adeneo, it may look browser-dependent, but try some other test inputs like:
"AAAAAAAAAAAAAAAAAAAAAAAAAAAA"
and
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
and compare.
Javascript performance depends upon the browser if some DOM API or any DOM manipulation/interaction is there, otherwise for all plain JavaScript, it will give the same performance.

Numbers localization in Web applications

How can I set the variant of Arabic numeral without changing character codes?
Eastern Arabic ٠ ١ ٢ ٣ ٤ ٥ ٦ ٧ ٨ ٩
Persian variant ۰ ۱ ۲ ۳ ۴ ۵ ۶ ۷ ۸ ۹
Western Arabic 0 1 2 3 4 5 6 7 8 9
(And other numeral systems)
Here is a sample code:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
</head>
<body>
<div lang="fa">0123456789</div>
<div lang="ar">0123456789</div>
<div lang="en">0123456789</div>
</body>
</html>
How can I do this using only client-side technologies (HTML,CSS,JS)?
The solution should have no negative impact on the page's SEO score.
Note that in Windows text boxes (e.g. Run) numbers are displayed correctly according to language of surrounding text.
See also: Numbers localization in desktop applications
Note: Localisation of numbers is super easy on the backend using this PHP package https://github.com/salarmehr/cosmopolitan
Here is an approach with code shifting:
// Eastern Arabic (officially "Arabic-Indic digits")
"0123456789".replace(/\d/g, function(v) {
return String.fromCharCode(v.charCodeAt(0) + 0x0630);
}); // "٠١٢٣٤٥٦٧٨٩"
// Persian variant (officially "Eastern Arabic-Indic digits (Persian and Urdu)")
"0123456789".replace(/\d/g, function(v) {
return String.fromCharCode(v.charCodeAt(0) + 0x06C0);
}); // "۰۱۲۳۴۵۶۷۸۹"
DEMO: http://jsfiddle.net/bKEbR/
Here we use a Unicode shift, since the digits in each Unicode block are placed in the same order as in the Latin block (i.e. [0x0030 ... 0x0039]). So, for example, the shift for the Arabic-Indic block is 0x0630.
Note, it is difficult for me to distinguish Eastern characters, so if I've made a mistake (there are many different groups of Eastern characters in Unicode), you could always calculate the shift using any online Unicode table. You may use either official Unicode Character Code Charts, or Unicode Online Chartable.
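The shift generalizes to other digit sets as long as their ten digits are encoded contiguously starting at zero, which holds for the blocks listed below. A sketch with a small lookup table of zero code points; the keys borrow CLDR numbering-system identifiers for readability, and both the table contents and the name `localizeDigits` are illustrative:

```javascript
// Code point of the digit zero in a few Unicode decimal-digit blocks.
const DIGIT_ZERO = {
  arab: 0x0660,    // Arabic-Indic
  arabext: 0x06f0, // Extended Arabic-Indic (Persian/Urdu)
  deva: 0x0966,    // Devanagari
  tamldec: 0x0be6, // Tamil
  thai: 0x0e50,    // Thai
};

// Replace ASCII digits 0-9 with the digit of the same value in `system`.
function localizeDigits(str, system) {
  const zero = DIGIT_ZERO[system];
  if (zero === undefined) throw new RangeError('Unknown numbering system: ' + system);
  return str.replace(/[0-9]/g, (d) => String.fromCharCode(zero + (d.charCodeAt(0) - 0x30)));
}

console.log(localizeDigits('0123456789', 'arab'));    // ٠١٢٣٤٥٦٧٨٩
console.log(localizeDigits('0123456789', 'arabext')); // ۰۱۲۳۴۵۶۷۸۹
```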
One has to decide if this is a question of appearance or of transformation. One must also decide if this is a question involving character-level semantics or numeral representations. Here are my thoughts:
The question would have entirely different semantics if Unicode had not separated out the codes for numeric characters.
Then, displaying the different glyphs as appropriate would simply be a matter of using the appropriate font. On the other hand, it would not have been possible to simply write out the different characters as I did below without changing fonts. (The situation is not exactly perfect, as fonts do not necessarily cover the whole Basic Multilingual Plane, let alone the full Unicode range.)
9, ٩ (Arabic), ۹ (Urdu), 玖 (Chinese, complex), ๙ (Thai), ௯ (Tamil) etc.
Now, assuming we accept Unicode semantics i.e. that '9' ,'٩', and '۹' are distinct characters, we may conclude that the question is not about appearance (something that would have been in the purview of CSS), but of transformation -- a few thoughts about this later, for now let us assume this is the case.
When focusing on character-level semantics, the situation is not too dissimilar with what happens with alphabets and letters. For instance, Greek 'α' and Latin 'a' are considered distinct, even though the Latin alphabet is nearly identical to the Greek alphabet used in Euboea. Perhaps even more dramatically, the corresponding capital variants, 'Α' (Greek) and 'A' (Latin) are visually identical in practically all fonts supporting both scripts, yet distinct as far as Unicode is concerned.
Having stated the ground rules, let us see how the question can be answered by ignoring them, and in particular ignoring (character-level) Unicode semantics.
(Horrible, nasty and non-backwards-compatible) Solution: Use fonts that map '0' to '9' to the desired glyphs. I am not aware of any such fonts. You would have to use @font-face and some font that has been appropriately hacked to do what you want.
Needless to say, I am not particularly fond of this solution. However, it is the only simple solution I am aware of that does what the question asks "without changing character codes" on either the server or the client side. (Technically speaking the Cufon solution I propose below does not change the character codes either, but what it does, drawing text into canvases is vastly more complex and also requires tweaking open-source code).
Note: Any transformational solution i.e. any solution that changes the DOM and replaces characters in the range '0' to '9' to, say, their Arabic equivalents will break code that expects numerals to appear in their original form in the DOM. This problem is, of course, worst when discussing forms and inputs.
An example of an answer taking the transformational approach would be:
$("[lang='fa']").find("*").addBack().contents().each(function() { // .addBack() replaces .andSelf(), removed in jQuery 3
if (this.nodeType === 3)
{
this.nodeValue = this.nodeValue.replace(/\d/g, function(v) {
return String.fromCharCode(v.charCodeAt(0) + 0x0630);
});
}
});
Note: Code taken from VisioN's second jsFiddle. If this is the only part of this answer that you like, make sure you upvote VisioN's answer, not mine!!! :-)
This has two problems:
It messes with the DOM and as a result may break code that used to work assuming it would find numerals in the "standard" form (using digits '0' to '9'). See the problem here: http://jsfiddle.net/bKEbR/10/ For instance, if you had a field containing the sum of some integers the user inputs, you might be in for a surprise when you try to get its value...
It does not address the issue of what goes on inside input (and textarea) elements. If an input field is initialised with, say, "42", it will retain that value. This can be fixed easily, but then there is the issue of actual input... One may decide to change characters as they come, convert the values when they change and so on and so forth. If such conversion is made, then both the client side and the server side will need to be prepared to deal with different kinds of numerals. What comes out of the box in JavaScript, jQuery and even Globalize (client-side), and ASP.NET, PHP etc. (server-side) will break if fed with numerals in non-standard formats...
A slightly more comprehensive solution (taking care also of input/textarea elements, both their initial values and user input) might be:
//before the DOM change, test1 holds a numeral parseInt can understand
alert("Before: test holds the value:" +parseInt($("#test1").text()));
function convertNumChar(c) {
return String.fromCharCode(c.charCodeAt(0) + 0x0630);
}
function convertNumStr(s) {
return s.replace(/\d/g, convertNumChar);
}
//the change in the DOM
$("[lang='fa']").find("*").addBack().contents() // .andSelf() in jQuery < 1.8
.each(function() {
if (this.nodeType === 3)
this.nodeValue = convertNumStr(this.nodeValue);
})
.filter("input:text,textarea")
.each(function() {
this.value = convertNumStr(this.value)
})
.change(function () {this.value = convertNumStr(this.value)});
//test1 now holds a numeral parseInt cannot understand
alert("After: test holds the value:" +parseInt($("#test1").text()))
The entire jsFiddle can be found here: http://jsfiddle.net/bKEbR/13/
Needless to say, this only solves the aforementioned problems partially. Client-side and/or server-side code will have to recognise the non-standard numerals and convert them appropriately either to the standard format or to their actual values.
This is not a simple matter that a few lines of javascript will solve. And this is but the simplest case of such possible conversion since there is a simple character-to-character mapping that needs to be applied to go from one form of numeral to the other.
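For that simplest case, converting back can be sketched as a reverse mapping that turns Arabic-Indic (U+0660..U+0669) and Extended Arabic-Indic (U+06F0..U+06F9) digits into ASCII before parsing. The function name is hypothetical, and only these two digit sets are handled:

```javascript
// Map Arabic-Indic and Extended Arabic-Indic digits back to ASCII 0-9
// so that parseInt()/Number() can read them again.
function toAsciiDigits(str) {
  return str.replace(/[\u0660-\u0669\u06F0-\u06F9]/g, (c) => {
    const code = c.charCodeAt(0);
    const zero = code >= 0x06f0 ? 0x06f0 : 0x0660; // pick the block's zero
    return String(code - zero);
  });
}

console.log(parseInt('\u0664\u0662'));                // NaN
console.log(parseInt(toAsciiDigits('\u0664\u0662'))); // 42
```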
Another go at an appearance-based approach:
Cufon-based Solution (Overkill, Non-Backwards Compatible (requires canvas), etc.): One could relatively easily tweak a library like Cufon to do what is envisaged. Cufon can do its thing and draw glyphs on a canvas object, except that the tweak will ensure that when elements have a certain property, the desired glyphs will be used instead of the ones normally chosen. Cufon and other libraries of the kind tend to add elements to the DOM and alter the appearance of existing elements but not touch their text, so the problems with the transformational approaches should not apply. In fact, it is interesting to note that while (tweaked) Cufon provides a clearly transformational approach as far as the overall DOM is concerned, it is an appearance-based solution as far as its mentality goes; I would call it a hybrid solution.
Alternative Hybrid Solution: Create new DOM elements with the Arabic content, and hide the old elements but leave their ids and content intact. Synchronize the Arabic content elements with their corresponding hidden elements.
Let's try to think outside the box (the box being current web standards).
The fact that certain characters are unique does not mean they are unrelated. Moreover, it does not necessarily mean that their difference is one of appearance. For instance, 'a' and 'A' are the same letter; in some contexts they are considered to be the same and in others to be different. Having the distinction in Unicode (and ASCII and ISO-Latin-1 etc. before it) means that some effort is required to overcome it.
CSS offers a quick and easy way for changing the case of letters. For instance, body {text-transform:uppercase} would turn all letters in the text in the body of the page into upper case. Note that this is also a case of appearance-change rather than transformation: the DOM of the body element does not change, just the way it is rendered.
Note: If CSS supported something like numerals-transform: 'ar' that would probably have been the ideal answer to the question as it was phrased.
However, before we rush to tell the CSS committee to add this feature, we may want to consider what that would mean. Here, we are tackling a tiny little problem, but they have to deal with the big picture.
Output:
Would this numerals-transform feature allow '10' (2 characters) to appear as 十 (Chinese, simple), 拾 (Chinese, complex), X (Latin) (all 1 character) and so on if, instead of 'ar', the appropriate arguments were given?
Input:
Would this numerals-transform feature change '十' (Chinese, simple) into its Arabic equivalent, or would it simply target '10'? Would it somehow cleverly detect that "MMXI" (Latin numeral for 2011) is a number and not a word and convert it accordingly?
The question of number representation is not as simple as one might imagine just looking at this question.
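Today's Intl.NumberFormat hints at the same difficulty. In this small sketch (formatInteger is an illustrative name), selecting the Chinese decimal numbering system with the -u-nu-hanidec locale extension renders ten digit by digit as 一〇, not as the single character 十, because, as far as I know, Intl only supports numbering systems that map each digit to one character:

```javascript
// Format an integer using the numbering system named in the locale tag
// ("-u-nu-hanidec" = Chinese decimal digits), without grouping separators.
function formatInteger(locale, n) {
  return new Intl.NumberFormat(locale, { useGrouping: false }).format(n);
}

console.log(formatInteger('zh-u-nu-hanidec', 10));   // "一〇": digit by digit, not "十"
console.log(formatInteger('zh-u-nu-hanidec', 2011)); // "二〇一一"
```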
So, where does all this leave us?
There is no simple presentation-based solution. If one appears in the future, it will not be backwards compatible.
There can be a transformational "solution" here and now, but even if this is made to work also with form elements as I have done (http://jsfiddle.net/bKEbR/13/), there needs to be server-side and client-side awareness of the non-standard format used.
There may be hybrid solutions. They are complex but offer some of the advantages of the presentation-based approaches in some cases.
A CSS solution would be nice, but the problem is actually big and complex when one looks at the big picture, which involves other numeral systems (with less trivial conversions to and from the standard system), decimal points, signs etc.
At the end of the day, the solution I see as realistic and backwards compatible would be an extension of Globalize (and server-side equivalents) possibly with some additional code to take care of user input. The idea is that this is not a problem at the character-level (because once you consider the big picture it is not) and that it will have to be treated in the same way that differences with thousands and decimal separators have been dealt with: as formatting/parsing issues.
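In that spirit, here is a minimal sketch of the parsing half built on the standard Intl.NumberFormat rather than on Globalize. makeNumberParser is a hypothetical name, and the sketch only handles plain positive numbers (no signs, currencies or percentages). It asks the formatter itself which digits and separators a locale uses and maps them back to something parseFloat understands:

```javascript
// Hypothetical helper: build a parser for numbers formatted by Intl.NumberFormat(locale).
function makeNumberParser(locale) {
  const parts = new Intl.NumberFormat(locale).formatToParts(12345.6);
  const find = (type) => (parts.find((p) => p.type === type) || {}).value;
  const group = find('group');     // e.g. "," in en-US, "." in de-DE
  const decimal = find('decimal'); // e.g. "." in en-US, "," in de-DE

  // Map each of the locale's digits (e.g. Persian "۳") to its ASCII counterpart ("3")
  const plain = new Intl.NumberFormat(locale, { useGrouping: false });
  const digitMap = new Map();
  for (let i = 0; i <= 9; i++) {
    const digit = plain.formatToParts(i).find((p) => p.type === 'integer').value;
    digitMap.set(digit, String(i));
  }

  return function parse(str) {
    let ascii = '';
    for (const ch of str) {
      if (/[\u200e\u200f\u061c]/.test(ch)) continue; // drop bidi marks
      if (ch === group) continue;                     // drop grouping separators
      if (ch === decimal) ascii += '.';
      else ascii += digitMap.get(ch) ?? ch;
    }
    return parseFloat(ascii);
  };
}

const text = new Intl.NumberFormat('fa').format(1234567.5);
console.log(text, makeNumberParser('fa')(text)); // Persian digits, then 1234567.5
```

Formatting and parsing through the same locale data is what keeps the two sides consistent; a server-side equivalent would need the same kind of table.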
I imagine the best way is to add a class name to each div that needs a different numeral set, and then use a regexp to find the numeric characters that should be changed inside it.
You can do this fairly easily using jQuery.
jsfiddle DEMO
EDIT: And if you don't want to use a variable, then see this revised demo:
jsfiddle DEMO 2
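The demo code is not reproduced here, so as a rough sketch of the same idea without jQuery: the class names (digits-ar, digits-fa, digits-bn) and the helper names are hypothetical. Each class is mapped to the code point of its digit zero, and only text nodes are rewritten, so tags and attributes stay untouched:

```javascript
// Hypothetical class names, each mapped to the code point of its digit zero.
const DIGIT_ZERO = {
  'digits-ar': 0x0660, // Arabic-Indic
  'digits-fa': 0x06F0, // Extended Arabic-Indic (Persian)
  'digits-bn': 0x09E6, // Bengali
};

// Replace every ASCII digit with the digit at the same offset from `zero`.
function localizeDigits(text, zero) {
  return text.replace(/[0-9]/g, (d) => String.fromCodePoint(zero + Number(d)));
}

// Browser only: rewrite the text nodes inside every element carrying one of the classes.
function localizeByClass(root) {
  for (const [cls, zero] of Object.entries(DIGIT_ZERO)) {
    for (const el of root.querySelectorAll('.' + cls)) {
      const walker = document.createTreeWalker(el, NodeFilter.SHOW_TEXT);
      for (let n = walker.nextNode(); n; n = walker.nextNode()) {
        n.nodeValue = localizeDigits(n.nodeValue, zero);
      }
    }
  }
}
// Usage in a page: localizeByClass(document);
```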
I have been working on a general web page localization technique that does more than just numbers (it's similar to .po files).
The localization files are simple (the strings can contain HTML if needed):
/* Localization file - save as document_url.lang.js ... index.html.en.js: */
items=[
{"id":"string1","value":"Localized text of string1 here."},
{"id":"string2", "value":"۰ ۱ ۲ ۳ ۴ ۵ ۶ ۷ ۸ ۹ "}
];
rtl=false; /* set to true for rtl languages */
This format makes it easy to hand the strings off to translators (or Mechanical Turk).
And here is a basic page template:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>My title</title>
<style>.txt{float:left;margin-left:10px}</style>
</head>
<body onload='setLang()'>
<div id="string1" class="txt">This is the default text of string1.</div>
<div id="string2" class="txt">0 1 2 3 4 5 6 7 8 9 </div>
<script>
// Apply the strings from the loaded localization file, if one was found
function setLang(){
  if(typeof items==="undefined")return; // no localization file: keep the defaults
  for(var i=0;i<items.length;i++){
    var term=document.getElementById(items[i].id);
    if(!term)continue;
    term.innerHTML=items[i].value;
    if(rtl){ /* for rtl languages */
      term.style.styleFloat="right"; // old IE
      term.style.cssFloat="right";
      term.style.textAlign="right";
    }
  }
}
// Load e.g. index.html.en.js, based on the browser language
var lang=navigator.userLanguage || navigator.language;
var script=document.createElement("script");
script.src=document.URL+"."+lang.substring(0,2)+".js";
var head=document.getElementsByTagName('head')[0];
head.insertBefore(script,head.firstChild);
</script>
</body>
</html>
I tried to keep it simple yet cover as many locales as possible, so additional CSS is likely required (I have to admit a lack of exposure to RTL languages, so many more styles may need to be set).
I also have some font-checking code, which is useful if you know which fonts support your character codes well:
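function hasFont(f){
  // Measure a test string using only fallback fonts...
  var s=document.createElement("span");
  s.style.fontSize="72px";
  s.innerHTML="MWMWM";
  s.style.visibility="hidden";
  s.style.fontFamily=(f=="monospace")?"sans-serif,serif":"monospace,sans-serif,serif";
  document.body.appendChild(s);
  var w=s.offsetWidth;
  // ...then with f first: a different width means the browser found and used f
  s.style.fontFamily=f+",monospace,sans-serif,serif";
  var found=s.offsetWidth!=w;
  document.body.removeChild(s); // clean up the test element
  return found;
}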
Usage: if(hasFont("myfont")) myelement.style.fontFamily="myfont";
A newer and simpler JS solution is to use Intl.NumberFormat. It supports numeral localization and formatting variations, as well as local currencies (see the documentation for more examples).
To use an example very similar to MDN's own:
const val = 1234567809;
console.log('Eastern Arabic (Arabic-Egyptian)', new Intl.NumberFormat('ar-EG').format(val));
console.log('Persian variant (Farsi)',new Intl.NumberFormat('fa').format(val));
console.log('English (US)',new Intl.NumberFormat('en-US').format(val));
Intl.NumberFormat also accepts numeric strings, and a string that is not a number is formatted as the locale's representation of NaN.
const val1 = '456';
const val2 = 'Numeric + string example, 123';
console.log('Eastern Arabic', new Intl.NumberFormat('ar-EG').format(val1));
console.log('Eastern Arabic', new Intl.NumberFormat('ar-EG').format(val2));
console.log('Persian variant',new Intl.NumberFormat('fa').format(val1));
console.log('Persian variant',new Intl.NumberFormat('fa').format(val2));
console.log('English',new Intl.NumberFormat('en-US').format(val1));
console.log('English', new Intl.NumberFormat('en-US').format(val2));
For the locale identifier (the string passed to the NumberFormat constructor indicating the locale), I experimented with the values above and they seemed fine. I tried to find a list of all possible values, and through MDN came across this documentation and this list, which could be helpful.
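The digits can also be chosen independently of the language: a Unicode extension such as -u-nu-arabext in the locale identifier selects the numbering system explicitly (recent engines also accept a numberingSystem option). A short sketch; formatDigits is an illustrative name:

```javascript
// Build the locale tag "en-u-nu-<system>" so the numbering system is explicit,
// regardless of which digits the language would use by default.
function formatDigits(n, numberingSystem) {
  return new Intl.NumberFormat('en-u-nu-' + numberingSystem, { useGrouping: false }).format(n);
}

console.log(formatDigits(2012, 'arabext')); // Persian digits: ۲۰۱۲
console.log(formatDigits(2012, 'beng'));    // Bengali digits: ২০১২
```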
I'm not familiar with SEO, and am thus unsure how this answers that part of the question.
You can try this:
This is the CSS:
@font-face
{
font-family: A1Tahoma;
src: url(yourfont.eot) format('embedded-opentype')
, url(yourfont.woff) format('woff')
, url(yourfont.ttf) format('truetype')
, url(yourfont.svg) format('svg');
}
p{font-family:A1Tahoma; font-size:30px;}
And this is the HTML:
<p>سلام به همه</p>
<p>1234567890</p>
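And finally, you will see the result. The idea is that the embedded font draws the ordinary digits 0-9 with Persian glyphs, so the text of the page itself is not changed. Remember that the four font formats are listed so that every browser (IE, Firefox and so on) finds one it supports.
"Hi Reza, you can do it this way to add the font you want to your site."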
I have created a jQuery plugin that converts Western Arabic numerals to Eastern Arabic ones (Persian only), but it can be extended to convert a number to any desired numeral system. My jQuery plugin has two advantages:
It detects and converts numbers properly in child nodes.
It detects and converts decimal point characters appropriately.
You can clone this plugin from GitHub.
My plugin code:
(function( $ ){
  $.fn.persiaNumber = function() {
    var persianDigits = ['۰', '۱', '۲', '۳', '۴', '۵', '۶', '۷', '۸', '۹'];

    // Replace every ASCII digit with its Persian counterpart
    function toPersianDigits(str){
      return str.replace(/[0-9]/g, function(d){ return persianDigits[d]; });
    }

    // Replace a '.' between two ASCII digits with the Persian decimal separator '٫'
    function convertDecimalPoint(str){
      return str.replace(/([0-9])\.(?=[0-9])/g, '$1٫');
    }

    // Convert only the text between tags; tags (and their attribute values) are copied verbatim
    function convertToPersianNum(htmlTxt){
      var otIndex = htmlTxt.indexOf("<");
      if(otIndex == -1){
        return toPersianDigits(convertDecimalPoint(htmlTxt));
      }
      var ctIndex = htmlTxt.indexOf(">", otIndex);
      if(ctIndex == -1){
        // unterminated tag: convert the text before it and keep the rest as-is
        return toPersianDigits(convertDecimalPoint(htmlTxt.substring(0, otIndex))) + htmlTxt.substring(otIndex);
      }
      var str = htmlTxt.substring(0, otIndex);
      var tag = htmlTxt.substring(otIndex, ctIndex + 1);
      var rest = htmlTxt.substring(ctIndex + 1);
      return toPersianDigits(convertDecimalPoint(str)) + tag + convertToPersianNum(rest);
    }

    // Convert each selected element and return the set so calls can be chained
    return this.each(function(){
      $(this).html(convertToPersianNum($(this).html()));
    });
  };
})( jQuery );
http://jsfiddle.net/VPWmq/2/
You can convert numbers in this way:
const persianDigits = ['۰', '۱', '۲', '۳', '۴', '۵', '۶', '۷', '۸', '۹'];
const number = 44653420;
// Each matched digit ('0'..'9') indexes directly into persianDigits
const convertedNumber = String(number).replace(/\d/g, function(digit) {
  return persianDigits[digit];
});
console.log(convertedNumber); // ۴۴۶۵۳۴۲۰
If anyone is looking to localize numbers into Bangla, the same code-point shifting method works (ASCII '0' is U+0030 and the Bangla zero is U+09E6, hence the offset 0x09B6):
$("[lang='bang']").text(function(i, val) {
return val.replace(/\d/g, function(v) {
return String.fromCharCode(v.charCodeAt(0) + 0x09B6);
});
});
You can also visit here to see the Unicode hexadecimal code points of the Bangla characters.
