bidirectional text - visual to logical - javascript

I'm drawing text on the screen letter by letter.
In English this is very simple: the text is LTR, so the letters are stored in the string in the same order they are shown.
When drawing RTL text, I need to switch the direction of printing. But when letters, numbers, English and some RTL language are mixed, the mess starts.
For example:
ex.1: שלום לכם
ש is the first letter in the string, but as we can see, it is shown last.
ex.2: שלום to all
ש is the first letter in the string, but as we can see, it is shown in the middle, just before the English starts.
It gets more complicated when numbers and math signs come into the picture, along with special characters like '(' and ')' that need to be flipped.
I found many Bidi algorithms online that convert the logical order of the letters in a string to visual order, so that when I walk the converted string from left to right, the text prints properly.
BUT,
They are never perfect. There are cases where they do not work properly.
None of them takes the base direction of the text into account either (that is, when we press the right Ctrl+Shift on the keyboard, the visual order changes again).
My questions are:
Does anybody know a bulletproof Bidi algorithm I can use to convert the string from the order in which it is stored in memory to visual order?
Is there a simpler way to solve my problem? Maybe somehow use the browser's own algorithm for it.

After searching for a long time,
I found that Dojo (luckily, it is the toolkit I'm using)
has a Bidi engine for drawing its own UI controls. It takes a few layout parameters to handle RTL, LTR and contextual directions as well.
In case this helps someone:
http://bill.dojotoolkit.org/api/1.9/dojox/string/BidiEngine
I found another link that might help non-Dojo developers, https://github.com/ibm-js/dbidi, but I have not checked it yet.

Related

how to match continuous string input to passage in javascript, while showing mistakes?

Hey guys, I am working on an application and so far it's going well, but I am a little stuck on a recent feature.
Here is my problem:
Using JavaScript, I am getting continuous user speech input and transcribing it. While I am getting this speech input, I want to be able to identify and highlight the text on the screen that is being read. I want to highlight mistakes (spoken words that don't match) and I want to highlight all the content that is read correctly.
I am not asking for someone to code this for me; I just want to be pointed in the right direction.
A similar implementation you have probably seen is in online typing games, where you try to type parts of a passage as fast as possible and the game highlights the parts you are getting right and the parts you are getting wrong.
Any help is appreciated: libraries, algorithms, methods, or terms I should search for. Thank you!
Are you indexing the text at all? Do you know the text in advance? If you created an in-memory graph database using each word in the text, you could search edges to find 'weighted' hits. It's ambitious, but there's an article here:
https://graphaware.com/neo4j/2016/07/07/mining-and-searching-text-with-graph-databases.html
If you want to go dirt simple and follow your typing game analogy:
In the typing game, an event is fired on each input (keypress).
The key pressed is compared to the expected one.
If it is not correct, it is flagged as wrong.
There is usually no way to go back and correct the mistakes.
The user has to type the next expected letter correctly to get things rolling again.
You could do the same thing.
Underline the next expected word.
Each word (whitespace) is an event.
Match the word returned by speech-to-text against the expected word.
If it is not right, flag it as wrong and strike it out.
The user has to say the next underlined word correctly to get things going smoothly again.
You could allow the user to step back to a previous expected word so they can start over from wherever they want.
This will have some hiccups, as all things speech to text do, but it will work like your typing game and be simple to implement.
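The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: `createPassageTracker`, `hear`, `back` and `state` are hypothetical names, the passage is assumed to be known in advance, and the transcriber is assumed to deliver one word at a time. It also assumes a wrong word is struck out and the pointer still moves on to the next word.

```javascript
// Lowercase and drop punctuation so that "Quick," and "quick" compare equal.
function normalizeWord(word) {
  return word.toLowerCase().replace(/[^\p{L}\p{N}']/gu, '');
}

// Hypothetical helper: tracks the reader's position in a known passage.
function createPassageTracker(passage) {
  const expected = passage.split(/\s+/).filter(Boolean);
  const status = expected.map(() => 'pending'); // 'pending' | 'correct' | 'wrong'
  let position = 0; // index of the word that would be underlined

  return {
    // Judge one transcribed word against the currently expected word.
    hear(spokenWord) {
      if (position >= expected.length) return null; // passage finished
      const index = position;
      const matched =
        normalizeWord(spokenWord) === normalizeWord(expected[index]);
      status[index] = matched ? 'correct' : 'wrong';
      position += 1; // assumption: move on even after a mistake
      return { index, expected: expected[index], matched };
    },
    // Step back one word so the reader can retry it.
    back() {
      if (position > 0) {
        position -= 1;
        status[position] = 'pending';
      }
    },
    // Snapshot for rendering: underline `position`, colour by `status`.
    state() {
      return { position, status: [...status] };
    },
  };
}
```

A rendering layer would call `hear` for every word the speech recognizer emits and redraw the passage from `state()`, for example striking out entries marked `'wrong'`.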

text string / word cloud ignore line break symbol

I have an implementation of the popular d3 word cloud. It has been working great for documents that are very lengthy, but I have been having an issue when I tried to re-purpose the word cloud to visualize a very simple txt file of the following format:
interesting. interesting. interesting.
boring. boring.
amazing. amazing. amazing.
stupid. stupid. stupid.
average. average.
disappointing.
(etc...)
It's basically just a txt file that has around 20 words. Each word repeats anywhere from 1 to 5 times. I would like this information to govern the sizing of the font (repeated 5 times = big font, 1 time = small font). When I put all the words on the same line, the resulting word cloud strangely makes all words the same font size. So I started to play around with it and tried a few different things. The closest I got to the desired effect was with the above txt format, having a line break after each new word.
This did result in the correct font sizing, but it also created a new problem. When I logged the word_count object to the console, I got something like this:
"⏎boring":1,
"boring":1,
"⏎stupid":1,
"stupid":2,
This means that the word cloud has repeated words in the visual (for example, the word "stupid" in a large font size and also "stupid" in a small font size). To make things more confusing, this behavior was isolated to a few words. Some words did not have the line break sign in front of them, even though I hit Enter in Notepad in exactly the same way as for the ones that did wind up with the ⏎ sign.
My next stab at fixing this was to add the "⏎" sign to the list of ignored words, hoping that would make the word cloud library treat them as if they were the same. Unfortunately for me, the problem persisted.
That sums it up pretty well; it's also really easy to reproduce, just need notepad and copy and paste my above words to see what I'm talking about.
Let me know if anyone has ideas for troubleshooting further.
Minimalist Block:
https://bl.ocks.org/diggetybo/cd644316f52465495f39c8fc27f04de8
(refreshing page will randomize the layout)
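One possible direction, sketched under assumptions: the ⏎ entries suggest that a line-break character stays attached to the following word because the text is split on spaces only. The linked block's actual tokenizer is not shown here, so that cause is a guess, and `countWords` below is a hypothetical replacement for the counting step, not the library's own code.

```javascript
// Hypothetical word counter that treats every kind of whitespace,
// including "\n" and Windows "\r\n", as a word separator.
function countWords(text) {
  const counts = Object.create(null); // no inherited keys such as "constructor"
  for (const token of text.split(/\s+/)) {
    // Trim leading/trailing punctuation ("boring." -> "boring") and lowercase.
    const word = token
      .replace(/^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$/gu, '')
      .toLowerCase();
    if (word) counts[word] = (counts[word] || 0) + 1;
  }
  return counts;
}
```

With counts built this way, "stupid" at the start of a line and "stupid" after a space land under the same key, so the font size can be driven by a single total per word.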

HTML/JS end of line punctuation wrongly aligns to the left

I encountered a strange display of punctuation within DIV elements. In my HTML the text is something like:
This is just some
random text...!!
But in the browser window, it systematically becomes:
This is just some
!!...random text
I am using the code from IntroJS, and I wonder if this has to do with default formatting for right-to-left languages (such as Persian or Arabic). I am guessing this because selecting the text in the DIV only works when dragging from top right to bottom left.
The point is, I don't know how to remove this formatting or setting so that punctuation displays correctly in English.
Has anyone encountered this before?
See if any of your CSS has direction: rtl. If your intention is not to support RTL, then removing this should fix the problem.
If you do need to support it, then I recommend this excellent (but long!) article: http://moriel.smarterthanthat.com/tips/the-language-double-take-dealing-with-bidirectional-text-or-wait-tahw/
TLDR: the reason your punctuation changes order is due to the weak directionality of certain characters... and it's a right PITA when dealing with multilingual sites that mix LTR and RTL!

Supporting RTL languages

I want to make my site support both LTR and RTL languages.
What I want is this: if the text loaded into an element is RTL, switch the element's direction to RTL. The same goes for inputs: when the user types text, the site should detect whether it is RTL and change the direction to RTL.
Facebook does this, for example. If you type some Arabic text in the search box, it automatically switches direction to RTL.
I didn't find any practical tutorial or script by googling.
I only found the attribute dir="auto", which automatically picks the correct direction, but it looks like it is not supported by older browsers.
Any advice, tutorial, or script on how to do this would help.
If you only want to support switching the textbox context from LTR to RTL when the user types, then you will have to listen to the input events (input, keypress, keydown, etc, whichever works best for your case) and let the code decide whether the textbox is LTR aligned or RTL aligned.
You should note, though, that the algorithm for this is not all that straightforward, and that different products work differently. A few examples:
Facebook uses an algorithm that, for the most part, tries to recognize the first "strong" character, so typing a sentence with one Hebrew word followed by a lot of English will still show the paragraph as RTL aligned. (They also seem to have a difference between what you see when you type and what you see when the comment is posted but that's a different issue)
Google Hangouts seems to switch its RTL/LTR context based on the number of strong characters in each direction. As you type, your context may switch several times between LTR and RTL if you start typing in one language and continue in the other.
There is no right or wrong here; there is only preference and what works best for your case.
You can read about "strong characters" in the Unicode Bidirectional Algorithm here: http://unicode.org/reports/tr9/
You can see an example of how to recognize the first "strong character" in a string for embedding purposes in MediaWiki's language file, with the regex that tests directionality (group 1 is LTR and group 2 is RTL). You can use these to create a JavaScript method that sets your textarea's dir="" attribute based on either the first strong character or the majority of characters, as you see fit:
https://github.com/wikimedia/mediawiki/blob/6f19bac69546b8a5cc06f91a81e364bf905dee7f/languages/Language.php#L174
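As a rough sketch of both strategies described above (first strong character, and majority of strong characters): the character ranges below are a simplified approximation chosen for illustration, not MediaWiki's regex and not the full Unicode bidi tables, and the function names `firstStrongDirection` and `majorityDirection` are hypothetical.

```javascript
// Simplified, assumed ranges. RTL: Hebrew, Arabic and neighbouring blocks
// plus their presentation forms. LTR: basic Latin letters, Latin
// extensions, Greek and Cyrillic. Digits and punctuation are not "strong".
const RTL_CHARS = '\u0590-\u08FF\uFB1D-\uFDFF\uFE70-\uFEFC';
const LTR_CHARS =
  'A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u024F\u0370-\u03FF\u0400-\u04FF';

// Group 1 matches a strong LTR character, group 2 a strong RTL character.
const STRONG_RE = new RegExp(`([${LTR_CHARS}])|([${RTL_CHARS}])`);
const STRONG_RE_GLOBAL = new RegExp(STRONG_RE.source, 'g');

// Direction of the first strong character, or `fallback` if there is none.
function firstStrongDirection(text, fallback = 'ltr') {
  const match = STRONG_RE.exec(text);
  if (!match) return fallback;
  return match[1] ? 'ltr' : 'rtl';
}

// Direction held by the majority of strong characters; ties use `fallback`.
function majorityDirection(text, fallback = 'ltr') {
  let ltr = 0;
  let rtl = 0;
  for (const match of text.matchAll(STRONG_RE_GLOBAL)) {
    if (match[1]) ltr += 1;
    else rtl += 1;
  }
  if (ltr === rtl) return fallback;
  return rtl > ltr ? 'rtl' : 'ltr';
}
```

In a page, either function could be called from an `input` event listener and its result assigned to the textbox's `dir` attribute. The two strategies can disagree: for "שלום to all" the first strong character is Hebrew, while the majority of strong characters is Latin.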
As a side note, I will just point out that supporting RTL/LTR online is not just about typing and textboxes. Changing between LTR and RTL contexts also involves UI adjustments, like mirroring the alignment of the content and/or the positions of things like menus and the logo.
This is relevant if you want to allow your page to be translated to an RTL language, which means you will need to also mirror the layout. If your only goal is to switch contexts in the textbox, you shouldn't worry about this, but if you want to make sure the site allows for translation, you need to consider methods of mirroring your UI and your entire interface.

Word Cloud for Other Languages

I am using Jason Davies's Word Cloud for my project, but there is a problem: I am using Persian (Farsi) strings, and the words overlap in the SVG.
This is my project's output:
What happened to the Farsi words?
As explained on the About page for the project, the generator needs to retrieve the shape of a glyph to be able to compute where it is "safe" to put other words. The About page explains the process in much more detail, but here is what we care about:
Glyphs are rendered individually to a hidden <canvas> element.
Pixel data is retrieved.
Bounding boxes are derived.
The word cloud is generated.
Now, the critical insight is that in Western (and many other) scripts, glyphs don't change shape based on context often. Yes, there are such things as ligatures, but they are generally rare, and definitely not necessary for the script.
In Persian, however, the glyph shape will change based on context. For non-Persian readers, look at ی and س which, when combined, become یس. Yes, that last one is two glyphs!
The algorithm actually has no problem dealing with Persian characters, as you can see by hacking the demo on the about page, putting a breakpoint just after the d.code is generated, to be able to modify it:
Replacing it with 1740, which is the charCode for the first Persian glyph above, and letting the algorithm run, shows beautiful and perfectly correct bounding boxes around the glyph:
The issue is that when the word cloud is actually rendered, the glyph is placed in context and... changes shape. The generator doesn't know this, though, and continues to use the old bounding data to place other words, thus creating the overlapping you witnessed. In addition, there is probably also an issue around right-to-left handling of text, which certainly would not help.
I would encourage you to take this up with the author of the generator directly. The project has a GitHub page: https://github.com/jasondavies/d3-cloud so opening an issue there (and maybe referring back to this answer) would help!
