Multiple font-families for bidi text - javascript

I have unmarked bidi text like
<span>The verb קָרָא has no expressed subject</span>
I want to style it (in the browser) depending on its direction: That is, I want a Hebrew font for the Hebrew and an English font for the English. I don't suppose there's any way of doing it based on whether it's ltr or rtl when it's not marked but perhaps there's some way using unicode ranges or something.
I could use regex to put spans around text that falls in particular unicode ranges but I'm hoping for a CSS only solution (or one that doesn't involve extra markup - it's user generated text and the user needs to be able to modify it as well).
Something like (I know this doesn't exist):
.bidi-text {
  font-family-rtl: "EZRA SIL";
  font-family-ltr: "Comic Sans";
}
Otherwise, maybe there's a better JS solution? Or some way to tell the browser to use certain fonts for certain glyphs?
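For reference, the regex fallback mentioned above could look roughly like this. It is only a sketch: the names escapeHtml and wrapHebrewRuns and the "hebrew" class are made up for illustration, and the ranges assume that only the Hebrew block (U+0590 to U+05FF) and the Hebrew presentation forms (U+FB1D to U+FB4F) need the Hebrew font.

```javascript
// Hebrew block plus Hebrew presentation forms.
// Whitespace between two Hebrew words stays inside the same run.
const HEBREW_RUN = /[\u0590-\u05FF\uFB1D-\uFB4F]+(?:\s+[\u0590-\u05FF\uFB1D-\uFB4F]+)*/g;

// The text is user generated, so escape it before building markup from it.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Returns an HTML string in which every Hebrew run is wrapped in a span.
// `className` is a hypothetical class name chosen for this example.
function wrapHebrewRuns(text, className = "hebrew") {
  return escapeHtml(text).replace(
    HEBREW_RUN,
    (run) => `<span class="${className}">${run}</span>`
  );
}
```

A rule such as .hebrew { font-family: "EZRA SIL"; } would then style only the wrapped runs. The extra spans are exactly the markup I'd like to avoid, which is why I'm asking for something better.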

Related

Javascript wrongly decodes unicode

I'm trying to get the id of a Font Awesome icon. It is located in the ::before style. When I use
window.getComputedStyle(document.querySelector("[id='5']"), '::before').getPropertyValue('content')
to get it, instead of "\f458", "\"\"" is returned.
I assume that JavaScript is trying to convert the code into a char but fails. Is there any way to prevent this?
The decoding works; the problem is the font. If the font you use has no glyph for this character, it will be rendered mangled or as a box containing the Unicode value.
Since the code point is in the Private Use Area, depending on the font it might resolve to a glyph, be mangled, or just show up empty or as a space.
Checking the Font Awesome Cheatsheet, it looks like an icon for quidditch.
Perhaps there's a text-to-image/SVG map somewhere on the internet (and if not, just copy-paste locally and create one) which you might use if the font itself isn't good or you are decoding in a problematic environment (can't install fonts, etc.).
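To see which character actually came back, you can convert the returned string to its hex code points. This is a minimal sketch; contentToCodePoints is a name invented for this example, and it assumes the computed value is a single quoted string.

```javascript
// Converts a computed `content` value such as "\"\uf458\"" into hex code points.
function contentToCodePoints(content) {
  // Computed `content` strings come back wrapped in quotes; strip one from each end.
  const inner = content.replace(/^["']|["']$/g, "");
  // Array.from iterates by code point, so characters outside the BMP stay intact.
  return Array.from(inner, (ch) => ch.codePointAt(0).toString(16).padStart(4, "0"));
}

// In the browser, with the selector from the question:
// const raw = window
//   .getComputedStyle(document.querySelector("[id='5']"), "::before")
//   .getPropertyValue("content");
// contentToCodePoints(raw);
```

If the result contains f458, the value was decoded correctly and only the display font is missing the glyph.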

HTML/JS end of line punctuation wrongly aligns to the left

I encountered a strange display of punctuation within DIV elements. In my HTML the text is something like:
This is just some
random text...!!
But in the browser window, it systematically becomes:
This is just some
!!...random text
I am using the code from IntroJS, and I wonder if this has to do with default formatting of right-to-left languages (such as Persian or Arabic). I am guessing this because also trying to select the text from the DIV only works when clicking top right to bottom left.
Point is, I don't know how to remove this formatting or setting in order for punctuation to display correctly in English.
Anyone encountered this before?
See if any of your CSS has direction: rtl. If your intention is not to support RTL, then removing this should fix the problem.
If you do need to support it, then I recommend this excellent (but long!) article: http://moriel.smarterthanthat.com/tips/the-language-double-take-dealing-with-bidirectional-text-or-wait-tahw/
TLDR: the reason your punctuation changes order is due to the weak directionality of certain characters... and it's a right PITA when dealing with multilingual sites that mix LTR and RTL!
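If you can't easily tell which stylesheet sets it, a quick check from the browser console can list the elements whose computed direction is rtl. A rough sketch; findRtlElements is a hypothetical helper, and the style lookup is passed in so the function itself does not depend on the DOM.

```javascript
// Returns the elements whose computed `direction` is "rtl".
// `getStyle` is expected to behave like window.getComputedStyle.
function findRtlElements(elements, getStyle) {
  return Array.from(elements).filter((el) => getStyle(el).direction === "rtl");
}

// In the browser console:
// findRtlElements(document.querySelectorAll("*"), (el) => window.getComputedStyle(el));
```

Because direction is inherited, the first element in the result is usually the one (or the ancestor) where the rule or a dir attribute is applied.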

Supporting RTL languages

I want my site to support both LTR and RTL languages.
What I want is: if the text loaded into an element is RTL, switch that element's direction to RTL. The same goes for inputs: when the user types, it should detect whether the text is RTL and change the direction accordingly.
Facebook does this, for example. If you type some Arabic text in the search box, it automatically switches direction to RTL.
I didn't find any practical tutorial or script by googling.
I only found the dir="auto" attribute, which automatically picks the correct direction, but it looks like it is not supported in older browsers.
Any advice, tutorial, or script for doing this would help.
If you only want to support switching the textbox context from LTR to RTL when the user types, then you will have to listen to the input events (input, keypress, keydown, etc, whichever works best for your case) and let the code decide whether the textbox is LTR aligned or RTL aligned.
You should note, though, that the algorithm for this is not all that straightforward, and that different products work differently. A few examples -
Facebook uses an algorithm that, for the most part, tries to recognize the first "strong" character, so typing a sentence with one Hebrew word followed by a lot of English will still show the paragraph as RTL aligned. (They also seem to have a difference between what you see when you type and what you see when the comment is posted, but that's a different issue.)
Google Hangouts seems to switch its RTL/LTR context based on the number of strong characters in each direction. As you type, your context may switch several times between LTR and RTL if you start typing one language over the other.
There is no right or wrong here, there's only preference and which algorithm works best for your case.
You can read about "strong characters" in the Unicode Bidirectional Algorithm here: http://unicode.org/reports/tr9/
You can see an example of how to recognize the first "strong character" in a string for embedding purposes in MediaWiki's language file, with the regex that tests directionality (group 1 is LTR and group 2 RTL). You can use these to create a JavaScript method that sets your textarea's dir="" attribute based on either the first strong character or the majority of characters, as you see fit:
https://github.com/wikimedia/mediawiki/blob/6f19bac69546b8a5cc06f91a81e364bf905dee7f/languages/Language.php#L174
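As a rough illustration of both strategies, here is a minimal sketch. It does not use the MediaWiki regex; instead it assumes a simplified notion of "strong": any letter counts, and letters from the Hebrew, Arabic, Syriac, and Thaana scripts are treated as RTL while every other letter is treated as LTR. The function names are invented for this example.

```javascript
const LETTER = /\p{L}/u;
// Simplified assumption: only these scripts are treated as right-to-left.
const RTL_SCRIPT = /[\p{Script=Hebrew}\p{Script=Arabic}\p{Script=Syriac}\p{Script=Thaana}]/u;

// Returns "rtl" or "ltr" for a letter, or null for digits, punctuation, spaces, marks.
function strongDirection(ch) {
  if (!LETTER.test(ch)) return null;
  return RTL_SCRIPT.test(ch) ? "rtl" : "ltr";
}

// Strategy 1: direction of the first strong character.
function firstStrongDirection(text, fallback = "ltr") {
  for (const ch of text) {
    const dir = strongDirection(ch);
    if (dir) return dir;
  }
  return fallback;
}

// Strategy 2: direction of the majority of strong characters.
function majorityDirection(text, fallback = "ltr") {
  let rtl = 0;
  let ltr = 0;
  for (const ch of text) {
    const dir = strongDirection(ch);
    if (dir === "rtl") rtl += 1;
    else if (dir === "ltr") ltr += 1;
  }
  if (rtl === ltr) return fallback;
  return rtl > ltr ? "rtl" : "ltr";
}

// In the browser, for a textarea or input element `field`:
// field.addEventListener("input", () => {
//   field.dir = firstStrongDirection(field.value);
// });
```

For a sentence with one Hebrew word followed by a lot of English, the first function gives "rtl" and the second gives "ltr", which matches the difference between the two products described above.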
As a side note, I will just point out that supporting RTL/LTR online is not just about typing and textboxes. Changing between LTR and RTL contexts also involves UI adjustments, like mirroring the alignment of the content and/or the positions of things like menus and the logo.
This is relevant if you want to allow your page to be translated to an RTL language, which means you will need to also mirror the layout. If your only goal is to switch contexts in the textbox, you shouldn't worry about this, but if you want to make sure the site allows for translation, you need to consider methods of mirroring your UI and your entire interface.

How to determine font-weight bold without using getComputedStyle

I'm in the process of making an HTML text parser and I would like to be able to determine when a text node appears as a header (visually, not HTML headers).
One thing that can usually be said about headers is that they are emphasized - usually in one of two ways: bold font or larger font size.
I could get both corresponding CSS values using getComputedStyle(), but I want to avoid this because the parser needs high performance (has to run smoothly on, for example, Chromebooks) and getComputedStyle() is not particularly fast when looking through hundreds or thousands of nodes.
Figuring out a font size isn't too hard - I can just select the node with a range and check its client rects from range.getClientRects(). I haven't figured out a smart way to check font weight though, which is why I'm here.
Can anyone think of a higher-performance way of doing this than by using getComputedStyle()?
I'm aware this might not be possible - just looking to see if someone can think of an ingenious way to solve this problem.
Edit
This is for a Google Chrome extension only.
What you're aiming to do here is really messy. Since you want to determine if text is bold visually, on some devices, depending on how they render text, the whole system may just break!
A suggestion I have is to use HTML5 data attributes - find out more here - and use them like so:
<div class="header" data-bold="yes">This will appear bold?</div>
Then, using JavaScript you can just go over all div elements with the data-bold attribute.
I hope this helped ;)
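Going over those elements could look something like this. It is just a sketch: collectBoldMarkedText is a made-up name, and it assumes the data-bold="yes" marker from the example above has already been added to the markup.

```javascript
// Collects the trimmed text of every div marked with data-bold="yes".
// `root` is anything with querySelectorAll, e.g. `document` or an element.
function collectBoldMarkedText(root) {
  const nodes = root.querySelectorAll('div[data-bold="yes"]');
  return Array.from(nodes, (el) => el.textContent.trim());
}

// In the browser:
// collectBoldMarkedText(document);
```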

Chinese characters encoding

I'm working on a multi-language website. I have a problem with the color of the Chinese characters. My text color is #333333 but the Chinese characters appear darker than the Western characters. My content comes from a database.
I thought of doing it with JavaScript / jQuery. The script detects the Unicode characters in the paragraph with the .fromCharCode() function. But from what I read, that function expects an integer and the Unicode values for Chinese characters are not integers. That would be the reason my function is not working.
EDIT
Here's an image from what I got:
My function to check for the Unicode:
if($('#container p').fromCharCode(4E00)){
alert('Chinese');
}
Any help?
The screenshot suggests that different characters have been taken from different fonts. This often happens when the primary font does not contain all the relevant characters. So the odds are that you are trying to solve the wrong problem. Perhaps you should just consider making a font suggestion that is suitable for all the characters that will appear in the content.
The code snippet is in error in several ways. For example, 4E00 should be 0x4E00. And even that way, you would check for a single character only.
You need to post the full code, or a URL, or both, to get more constructive help.
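For what it's worth, a check that covers every character rather than a single one might be sketched as follows. The helper name containsHan is hypothetical, and using the Unicode Han script property is my assumption about what "Chinese" should mean here.

```javascript
// Matches any character whose Unicode script is Han (Chinese ideographs).
const HAN = /\p{Script=Han}/u;

// True if the text contains at least one Han character.
function containsHan(text) {
  return HAN.test(text);
}

// Code points are plain integers; in source code they are written as hex literals.
const firstCjkIdeograph = String.fromCharCode(0x4e00); // the character U+4E00

// With jQuery, using the selector from the question:
// if (containsHan($("#container p").text())) {
//   alert("Chinese");
// }
```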
Your problem is that you are displaying Simplified Chinese in a font that was designed for Traditional Chinese. So when the display engine hits a character that's Simplified (and thus not in the Traditional font), it takes the default Simplified font and uses that instead. Then it switches back to the Traditional font. Hence the unseemly look.
You need to look into what would be the most common Simplified Chinese font (or font family) and use that specifically for Simplified Chinese texts. Something like Heiti TC and Heiti SC.
