Chinese characters encoding

I'm working on a multi-language website and have a problem with the color of the Chinese characters. My text color is #333333, but the Chinese characters appear darker than the Latin characters. My content comes from a database.
I thought I could handle it with JavaScript / jQuery. The script tries to detect the Unicode value of the characters in the paragraph with the .fromCharCode() function. From what I have read, that function expects an integer, and the Unicode values for Chinese characters are not integers, which I assume is why my function is not working.
EDIT
Here's an image of what I get:
My function to check for the Unicode value:
if($('#container p').fromCharCode(4E00)){
alert('Chinese');
}
Any help?

The screenshot suggests that different characters have been taken from different fonts. This often happens when the primary font does not contain all the relevant characters. So the odds are that you are trying to solve the wrong problem. Consider instead specifying a font that covers all the characters that will appear in the content.
The code snippet is wrong in several ways. For example, 4E00 should be 0x4E00. And even then, you would be checking for a single character only.
You need to post the full code, or a URL, or both, to get more constructive help.
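For reference, here is a minimal sketch of the kind of check the question seems to be aiming for. String.fromCharCode is a static method that builds a string from numeric code units; it is not a jQuery method and it does not inspect text. The helper name containsCjk is hypothetical, and the range 0x4E00 to 0x9FFF covers only the CJK Unified Ideographs block, so treat this as an illustration rather than a complete test for Chinese text.

```javascript
// Assumption: only the CJK Unified Ideographs block (U+4E00..U+9FFF) is checked.
const CJK_START = 0x4E00;
const CJK_END = 0x9FFF;

// Hypothetical helper: returns true if any character of `text`
// falls inside the range above.
function containsCjk(text) {
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    if (codePoint >= CJK_START && codePoint <= CJK_END) {
      return true;
    }
  }
  return false;
}

// Possible use with jQuery, assuming the markup from the question:
// if (containsCjk($('#container p').text())) { alert('Chinese'); }
```

Even with a working check, this does not fix the color difference; the font fallback described above is the more likely cause.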

Your problem is that you are displaying Simplified Chinese in a font that was designed for Traditional Chinese. When the display engine hits a Simplified character (one that is not in the Traditional font), it falls back to the default Simplified font for that character and then reverts to the Traditional font. Hence the inconsistent look.
You need to find out which Simplified Chinese font (or font family) is most common and use it specifically for Simplified Chinese text, for example Heiti SC for Simplified and Heiti TC for Traditional.
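One way to apply this could be to pick the font stack from the language of the content. The sketch below is only an illustration: the function name fontStackFor, the mapping table, and the use of the language tags zh-Hans and zh-Hant are assumptions, and the font names are just the ones mentioned in this answer.

```javascript
// Hypothetical mapping from language tag to font stack.
// Font names are the ones mentioned in the answer; adjust to what
// is actually available on the target platforms.
const FONT_STACKS = {
  'zh-Hans': '"Heiti SC", sans-serif', // Simplified Chinese
  'zh-Hant': '"Heiti TC", sans-serif', // Traditional Chinese
};

// Returns a font-family value for the given language tag,
// falling back to a generic family for unknown languages.
function fontStackFor(lang) {
  return FONT_STACKS[lang] || 'sans-serif';
}

// Possible use in a page, assuming elements carry a lang attribute:
// element.style.fontFamily = fontStackFor(element.lang);
```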

Related

Javascript wrongly decodes unicode

I'm trying to get the ID of a Font Awesome icon. It is set in the ::before style. When I use
window.getComputedStyle(document.querySelector("[id='5']"), '::before').getPropertyValue('content')
to get it, instead of "\f458", "\"\"" is returned.
I assume that JavaScript is trying to convert the code into a char but fails. Is there any way to prevent this?

The decoding works; the problem is the font. If the font in use has no glyph for this character, it will be shown mangled or as its Unicode value in a box.
Since the code point is in the Private Use Area, depending on the font it might resolve to a glyph, be mangled, or show up empty or as a space.
According to the Font Awesome cheatsheet, it looks like the icon for quidditch.
Perhaps there's a text-to-image/SVG map somewhere on the internet (and if not, just copy-paste locally and create one) which you could use if the font itself isn't suitable or you are decoding in a problematic environment (can't install fonts, etc.).
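If the goal is to recover the hexadecimal code (such as f458) from the computed value, a minimal sketch could look like the following. It assumes the computed content value is the icon character wrapped in quotes; the helper name contentToHex is hypothetical.

```javascript
// Hypothetical helper: turns a computed `content` value such as '"\uf458"'
// into the hex code of its first character ('f458').
function contentToHex(content) {
  // Keywords like `none` or `normal` carry no character.
  if (content === 'none' || content === 'normal') {
    return null;
  }
  // Strip one leading and one trailing quote, if present.
  const unquoted = content.replace(/^["']|["']$/g, '');
  if (unquoted.length === 0) {
    return null;
  }
  return unquoted.codePointAt(0).toString(16);
}

// Possible use in a browser, with the selector from the question:
// const style = window.getComputedStyle(document.querySelector("[id='5']"), '::before');
// contentToHex(style.getPropertyValue('content'));
```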

Can a font omit the space character?

I'm working on a font detection library that needs to be very, very small, suitable for including inline on every page of a website. I've already gotten it pretty small (417 bytes gzipped). You can check it out at the GitHub repo.
Upon further thinking, I can reduce the size of the library significantly again (around another 10%) if I can make a few basic assumptions:
It is impossible to define a working font file (for browsers, anyway) without a space character/definition.
OR, if such a space-omitting font could be created, that all modern browsers and IE9+ will choose the fallback font's space character when a space character is needed. That is, there is otherwise nothing special about the space character in terms of font-fallback.
And, tangentially, all modern browsers and IE9+ will have a different width for the space character in serif and monospace generic fonts provided by the browser (probable, but I will need to test further).
I just attempted to explicitly define a font without a space character using FontSquirrel's font generator. This was done by explicitly omitting the space from the selected Subset and disabling the 'Fix Missing Glyphs' for space. FontSquirrel still generated a font with a space character with a width differing from both serif and monospace.
I understand that some languages do not have a space character in the traditional sense, but due to the nature of font file formats and definitions, I do not think that fonts tailored to such a language could or would omit a space character.
If these assumptions all hold, the library could remove the need to support custom text checking and reduce the number of tests from 3 to 2, also speeding up the library and reducing its memory usage. The new size would be around 380 bytes or less when gzipped.
So how about it, font experts? Is it possible to define a valid font without a space character definition? If there is such a font, can you provide an example?

It appears that while it is possible, it is virtually unheard-of to create a font without a space character. Even icon font services appear to include it. Other than a special case for a font detection library, there are no known fonts that omit the space character.
This research led to an improvement to my onfontready library. Details here. Thanks to Mike 'Pomax' Kamermans for his contributions.

bidirectional text - visual to logical

I'm drawing text on the screen letter by letter.
In English it is very simple: the text is LTR, so the letters are stored in the string in the same order they are shown.
When drawing RTL text, I need to switch the direction of the printing. But when letters, numbers, English and some RTL language are mixed, the mess starts.
For example:
ex.1: שלום לכם
ש is the first letter in the string, but as we can see it is shown last.
ex.2: שלום to all
ש is the first letter in the string, but as we can see it is shown in the middle, just before the English starts.
It gets more complicated when numbers and math signs come into the picture, along with special characters like '(' and ')' that need to be flipped...
I found many Bidi algorithms online that convert the logical order of the letters in a string to the visual order, so that when I run from left to right over the converted string, I can be sure the string will print properly.
BUT,
They are never perfect. There are cases where they do not work properly.
None of them takes the base direction of the text into account either (that is, when we press the right Ctrl+Shift on the keyboard, the visual order changes again).
My questions are:
Does anybody know of a bulletproof Bidi algorithm I can use to convert the string from the order in which it is stored in memory to visual order?
Is there a simpler way to solve my problem? Maybe somehow use the browser's own algorithm for it.
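The logical storage order described in the question can be made visible by splitting a string into directional runs. The sketch below is deliberately simplified and is not an implementation of the Unicode Bidirectional Algorithm: it assumes only Hebrew letters count as RTL and only ASCII letters count as LTR, attaches everything else (spaces, digits, punctuation) to the preceding run, and uses the hypothetical name splitIntoRuns.

```javascript
// Assumptions: Hebrew block only for RTL, ASCII letters only for LTR.
const HEBREW_LETTER = /[\u0590-\u05FF]/;
const LATIN_LETTER = /[A-Za-z]/;

// Hypothetical helper: groups characters, in logical (memory) order,
// into runs that share a direction.
function splitIntoRuns(text) {
  const runs = [];
  for (const ch of text) {
    let dir = null; // null means "no strong direction" in this sketch
    if (HEBREW_LETTER.test(ch)) dir = 'rtl';
    else if (LATIN_LETTER.test(ch)) dir = 'ltr';

    const last = runs[runs.length - 1];
    if (last && (dir === null || dir === last.dir)) {
      last.text += ch; // same direction, or a neutral character
    } else {
      runs.push({ dir: dir || 'ltr', text: ch });
    }
  }
  return runs;
}

// For the second example in the question, the first run holds the Hebrew
// word (stored first in memory) and the second run holds "to all".
const runs = splitIntoRuns('\u05E9\u05DC\u05D5\u05DD to all');
console.log(runs.map((run) => run.dir));
```

A real renderer would still need the full algorithm to handle digits, mirrored brackets and the base direction, which is why a tested Bidi engine remains the safer choice.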

After searching for a long time,
I found that Dojo (luckily, the toolkit I'm using)
has a Bidi engine for drawing its own UI controls. It takes a few layout parameters to handle RTL, LTR, and contextual directions.
In case this helps someone:
http://bill.dojotoolkit.org/api/1.9/dojox/string/BidiEngine
I found another link that might help non-Dojo developers, https://github.com/ibm-js/dbidi, but I have not checked it yet.

How to check special character support with javascript?

How can I check if a special character is available in the user's computer?
For example: ♥ ♦ ♣ ♠ ♪ ♫ ¶
If the user's browser doesn't support one of them, a rectangle (□) will appear instead of the symbol.

I’m afraid there’s no way to test it, and there’s the added complexity that even if a character is available, browsers (especially IE) may fail to render it.
On the other hand, the information would not be particularly useful, except perhaps in the sense that you could dynamically change the character to an image if it can’t be rendered as a character.
A better approach to having your characters rendered properly is to write your style sheets so that they select suitable fonts. This also addresses the problem that a character might be displayed using a font that does not suit the overall design, such as the basic copytext font.
For example, if you need the characters ♥ ♦ ♣ ♠ ♪ ♫ ¶, select a font that contains them and all the other characters you need. This would probably boil down just to
body { font-family: Arial, sans-serif; }

A different solution is to use Google's web fonts:
http://www.google.com/webfonts
If you see a character, every other computer [with a recent browser] will see it too.

Urdu characters joining problem

I'm trying to write Urdu characters into a div using JavaScript. The problem is that they don't change shape when I write two characters that should take a different shape when written together. For example, ﺝ and ا written together should look like جا, but they don't join. The same problem occurs with other characters. Please help!

I found the answer. I was copying characters that looked visually the same but were not the ones I needed. For example, one character from Urdu and another from Arabic will not join properly. So whenever you copy characters, even if they look the same, keep in mind that they may have different Unicode code points in different languages.
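A quick way to spot such look-alikes is to print the code points of the text. The sketch below uses the hypothetical helper name describeCodePoints. As one example of a look-alike pair, U+FE9D is the isolated presentation form of jeem and U+062C is the regular letter; whether NFKC normalization is an acceptable fix for a given text is an assumption to verify, not a general recommendation.

```javascript
// Hypothetical helper: lists the code points of a string as U+XXXX labels.
function describeCodePoints(text) {
  return Array.from(text, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

// Two characters that can look alike in isolation:
console.log(describeCodePoints('\uFE9D')); // [ 'U+FE9D' ] (presentation form)
console.log(describeCodePoints('\u062C')); // [ 'U+062C' ] (regular letter)

// Compatibility normalization maps the presentation form to the regular
// letter, which then takes part in normal contextual shaping.
const normalized = '\uFE9D'.normalize('NFKC');
console.log(describeCodePoints(normalized)); // [ 'U+062C' ]
```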
