HTML/JS end of line punctuation wrongly aligns to the left

I encountered a strange display of punctuation within DIV elements. In my HTML the text is something like:
This is just some
random text...!!
But in the browser window, it systematically becomes:
This is just some
!!...random text
I am using the code from IntroJS, and I wonder if this has to do with default formatting of right-to-left languages (such as Persian or Arabic). I am guessing this because also trying to select the text from the DIV only works when clicking top right to bottom left.
Point is, I don't know how to remove this formatting or setting in order for punctuation to display correctly in English.
Anyone encountered this before?

See if any of your CSS has direction: rtl. If your intention is not to support RTL, then removing this should fix the problem.
If you do need to support it, then I recommend this excellent (but long!) article: http://moriel.smarterthanthat.com/tips/the-language-double-take-dealing-with-bidirectional-text-or-wait-tahw/
TLDR: the reason your punctuation changes order is due to the weak directionality of certain characters... and it's a right PITA when dealing with multilingual sites that mix LTR and RTL!
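To track down where an unwanted direction: rtl comes from, one option is to walk up from the affected element and find the outermost ancestor that still computes as RTL. The following is only a sketch: findRtlOrigin is a hypothetical helper name, and the style lookup is passed in as a function so the logic can be tried outside a browser.

```javascript
// Returns the outermost element in the ancestor chain of `el` (including
// `el` itself) whose computed direction is still "rtl", or null if `el`
// is not RTL at all. `getStyle` is any function returning an object with
// a `direction` property for a node.
function findRtlOrigin(el, getStyle) {
  let origin = null;
  for (let node = el; node; node = node.parentElement) {
    if (getStyle(node).direction !== 'rtl') break;
    origin = node; // still RTL, keep climbing
  }
  return origin;
}

// Browser usage (not executed here; '#my-div' is a placeholder selector):
//   const div = document.querySelector('#my-div');
//   console.log(findRtlOrigin(div, (n) => getComputedStyle(n)));
```

The element it returns is the place to look for a direction: rtl rule or a dir="rtl" attribute.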

Related

Javascript retrieve linebreaks from dom [duplicate]

I need to add line breaks in the positions that the browser naturally adds a newline in a paragraph of text.
For example:
This is some very long text \n that spans a number of lines in the paragraph.
This is a paragraph that the browser chose to break at the position of the \n
I need to find this position and insert a <br> there.
Does anyone know of any JS libraries or functions that are able to do this?
The only solution that I have found so far is to remove tokens from the paragraph and observe the clientHeight property to detect a change in element height. I don't have time to finish this and would like to find something that's already tested.
Edit:
The reason I need to do this is that I need to accurately convert HTML to PDF. Acrobat renders text narrower than the browser does. This results in text that breaks in different positions. I need an identical ragged edge and the same number of lines in the converted PDF.
Edit:
@dtsazza: Thanks for your considered answer. It's not impossible to produce a layout editor that almost exactly replicates HTML; I've written 99% of one ;)
The app I'm working on allows a user to create a product catalogue by dragging on 'tiles'. The tiles are fixed-width, absolutely positioned divs that contain images and text. All elements are styled so font size is fixed. My solution for finding \n in a paragraph is OK 80% of the time, and when it works with a given paragraph the resulting PDF is so close to the on-screen version that the differences do not matter. Paragraphs are the same height (to the pixel), images are replaced with high-res versions and all bitmap artwork is replaced with SVGs generated server side.
The only slight difference between my HTML and PDF is that Acrobat renders text slightly more narrowly, which results in a slightly shorter line length.
Diodeus's solution of adding spans and finding their coords is a very good one and should give me the location of the BRs. Please remember that the user will never see the HTML with the inserted BRs - these are added so that the PDF conversion produces a paragraph that is exactly the same size.
There are lots of people that seem to think this is impossible. I already have a working app that creates extremely accurate HTML->PDF conversions of our docs - I just need a better way of adding BRs because my solution sometimes misses a BR. BTW when it does work my paragraphs are the same height as the HTML equivalents, which is the result we are after.
If anyone is interested in the type of doc I'm converting then you can check out this screencast:
http://www.localsa.com.au/brochure/brochure.html
Edit: Many thanks to Diodeus - your suggestion was spot on.
Solution:
For my situation it made more sense to wrap the words in spans instead of the spaces.
var text = paragraphElement.innerHTML.replace(/ /g, '</span> <span>');
text = "<span>" + text + "</span>"; // wrap first and last words.
This wraps each word in a span. I can now query the document to get all the words, iterate and compare y position. When y pos changes add a br.
This works flawlessly and gives me the results I need - Thank you!
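A rough sketch of the whole approach described above, for anyone who wants to try it. The function names are made up for illustration, and it assumes the paragraph contains plain text separated by single spaces, with no nested markup. The grouping step is a pure function so it can be checked without a browser.

```javascript
// Wrap every space-separated word in a span.
function wrapWordsInSpans(html) {
  return '<span>' + html.replace(/ /g, '</span> <span>') + '</span>';
}

// words: array of { text, top } in document order, where `top` is the
// word's y position. Inserts a <br> wherever the y position changes.
function joinWithBreaks(words) {
  let out = '';
  words.forEach((word, i) => {
    if (i > 0) {
      out += word.top !== words[i - 1].top ? '<br>' : ' ';
    }
    out += word.text;
  });
  return out;
}

// Browser-only glue (not executed here): measure each span, then rewrite
// the paragraph with explicit breaks.
function insertBreaks(paragraphElement) {
  paragraphElement.innerHTML = wrapWordsInSpans(paragraphElement.innerHTML);
  const spans = paragraphElement.querySelectorAll('span');
  const words = Array.from(spans, (s) => ({
    text: s.innerHTML,
    top: s.getBoundingClientRect().top,
  }));
  paragraphElement.innerHTML = joinWithBreaks(words);
}
```

Comparing exact top values is fine when every word has the same font size; with mixed sizes a small tolerance would probably be needed.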

I would suggest wrapping all spaces in a span tag and finding the coordinates of each tag. When the Y-value changes, you're on a new line.

I don't think there's going to be a very clean solution to this one, if any at all. The browser will flow a paragraph to fit the available space, linebreaking where needed. Consider that if a user resizes the browser window, all the paragraphs will be rerendered and almost certainly will change their break positions. If the user changes the size of the text on the page, the paragraphs will be rerendered with different line break points. If you (or some script on your page) changes the size of another element on the page, this will change the amount of space available to a floating paragraph and again - different line break points.
Besides, changing the actual markup of your page to mimic something that the browser does for you (and does very well) seems like the wrong approach to whatever you're doing. What's the actual problem you're trying to solve here? There's probably a better way to achieve it.
Edit: OK, so you want to render to PDF the same as "the screen version". Do you have a specific definitive screen version nominated - in terms of browser window dimensions, user stylesheets, font preferences and adjusted font size? The critical thing about HTML is that it deliberately does not specify a specific layout. It simply describes what is on the page, what they are and where they are in relation to one another.
I've seen several misguided attempts before to produce some HTML that will exactly replicate a printed creative, designed in something like a DTP application where a definitive absolute layout is essential. Those efforts were doomed to failure because of the nature of HTML, and doing it the other way round (as you're trying to) will be even worse because you don't even have a definitive starting point to work from.
On the assumption that this is all out of your hands and you'll have to do it anyway, my suggestion would be to give up on the idea of mangling the HTML. Look at the PDF conversion software - if it's any good it should give you some options for font kerning and similar settings. Playing around with the details here should get you something that approximates the font rendering in the browser and thus breaks lines at the same places.
Failing that, all I can suggest is taking screenshots of the browser and parsing these with OCR to work out where the lines break (it shouldn't require a very accurate OCR since you know what the raw text is anyway, it essentially just has to count spaces). Or perhaps just embed the screenshot in the PDF if text search/selection isn't a big deal.
Finally doing it by hand is likely the only way to make this work definitively and reliably.
But really, this is still just wrong and any attempts to revise the requirements would be better. Keep going up one step in the chain - why does the PDF have to have the exact same ragged edge as some arbitrary browser rendering? Can you achieve that purpose in another (better) way?

Sounds like a bad idea when you account for user-set font sizes, MS Windows accessibility mode, and the hundreds of different mobile devices. Let the browser do its thing - trying to have exact control over the rendering will only cause you hours of frustration.

I don't think you'll be able to do this with any kind of accuracy without embedding Gecko/WebKit/Trident or essentially recreating them.

Maybe an alternative: do all line-breaks yourself, instead of relying on the browser. Place all text in pre tags, and add your own linebreaks. Now at least you don't have to figure out where the browser put them.

Supporting RTL languages

I want to make my site support both LTR and RTL languages.
What I want is: if the text loaded in some element is RTL, switch that element's direction to RTL. The same goes for inputs: when the user types text, it should detect whether the text is RTL and change the direction to RTL.
Facebook does this, for example. If you type some Arabic text in search, it automatically switches direction to RTL.
I didn't find any practical tutorial or script by googling.
I only found the attribute dir="auto", which automatically picks the correct direction, but it looks like it is not supported in older browsers.
Any advice, tutorial, or script on how to do this would help.

If you only want to support switching the textbox context from LTR to RTL when the user types, then you will have to listen to the input events (input, keypress, keydown, etc, whichever works best for your case) and let the code decide whether the textbox is LTR aligned or RTL aligned.
You should note, though, that the algorithm for this is not all that straightforward, and that different products work differently. A few examples:
Facebook uses an algorithm that, for the most part, tries to recognize the first "strong" character, so typing a sentence with one Hebrew word followed by a lot of English will still show the paragraph as RTL aligned. (They also seem to have a difference between what you see when you type and what you see when the comment is posted but that's a different issue)
Google hangouts seems to switch its RTL/LTR contexts based on the number of strong characters in each direction. As you type, your context may switch several times from LTR to RTL if you start typing one language over the other.
There is no right or wrong here, there's only preference and what works best as your algorithm.
You can read about "strong characters" in the Unicode Bidirectional Algorithm here: http://unicode.org/reports/tr9/
You can see an example of how to recognize the first "strong character" in a string for embedding purposes in MediaWiki's language file, with the regex that tests directionality (group 1 is LTR and group 2 RTL). You can use these to create a JavaScript method that sets your textarea's dir="" attribute based on either the first strong character or the majority of characters, as you see fit:
https://github.com/wikimedia/mediawiki/blob/6f19bac69546b8a5cc06f91a81e364bf905dee7f/languages/Language.php#L174
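As a rough illustration of the two strategies (first strong character versus majority), here is a minimal sketch. It does not use MediaWiki's regexes: it approximates "strong RTL" as any letter from the Hebrew, Arabic, Syriac or Thaana scripts and treats every other letter as strong LTR, which is a simplification of the Unicode Bidirectional Algorithm. All function names are hypothetical.

```javascript
// Simplified approximation of strong characters, NOT the full UAX #9 tables.
const ANY_LETTER = /\p{L}/u;
const RTL_SCRIPT =
  /[\p{Script=Hebrew}\p{Script=Arabic}\p{Script=Syriac}\p{Script=Thaana}]/u;

// 'rtl' or 'ltr' for letters, null for digits, punctuation, spaces, etc.
function classify(ch) {
  if (!ANY_LETTER.test(ch)) return null;
  return RTL_SCRIPT.test(ch) ? 'rtl' : 'ltr';
}

// Strategy 1: direction of the first strong character.
function firstStrongDirection(text) {
  for (const ch of text) {
    const dir = classify(ch);
    if (dir) return dir;
  }
  return null;
}

// Strategy 2: whichever direction has more strong characters (ties go LTR).
function majorityDirection(text) {
  let rtl = 0;
  let ltr = 0;
  for (const ch of text) {
    const dir = classify(ch);
    if (dir === 'rtl') rtl++;
    else if (dir === 'ltr') ltr++;
  }
  if (rtl === 0 && ltr === 0) return null;
  return rtl > ltr ? 'rtl' : 'ltr';
}

// Update the dir attribute as the user types; falls back to 'ltr' when
// the text has no strong characters yet.
function attachAutoDirection(input, detect = firstStrongDirection) {
  input.addEventListener('input', () => {
    input.dir = detect(input.value) || 'ltr';
  });
}
```

In a page this could be wired up with something like attachAutoDirection(document.querySelector('textarea'), majorityDirection), where the selector is a placeholder for your own field.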
As a side note, I will just point out that supporting RTL/LTR online is not just about typing and textboxes. Changing between LTR and RTL contexts also involves UI adjustments, like mirroring the alignment of the content and/or the positions of things like menus and the logo.
This is relevant if you want to allow your page to be translated to an RTL language, which means you will need to also mirror the layout. If your only goal is to switch contexts in the textbox, you shouldn't worry about this, but if you want to make sure the site allows for translation, you need to consider methods of mirroring your UI and your entire interface.

bidirectional text - visual to logical

I'm drawing texts on the screen letter by letter.
In English it is very simple, because the text is LTR so the letters are saved in the String in the same order they're shown.
When drawing RTL text I need to switch the direction of the printing. But when the text mixes letters, numbers, English and some RTL language, the mess starts.
For Ex.
ex.1: שלום לכם
ש- is the first letter in the string, but as we can see it is shown last
ex.2: שלום to all
ש- is the first letter in the string, but as we can see it is shown in the middle, before the English starts.
It gets more complicated when numbers and math signs come into the picture, along with special characters like '(' and ')' that need to be flipped...
I found many Bidi algorithms online that change the logical order of the letters in the string to the visual one, so that when I run from left to right over the converted string I can be sure it will print properly.
BUT,
They are never perfect. There are cases where they do not work properly.
None of them considers the base direction of the text either (meaning that when we press the right Ctrl+Shift on the keyboard, the visualization changes again).
My questions are:
Does anybody know of a bulletproof Bidi algorithm I can use to convert the string from the order in which it is saved in memory to visual order?
Is there a simpler way to solve my problem? Maybe somehow reuse the browser's algorithm for it...

After searching for a long time,
I found that Dojo (luckily, the toolkit I'm using)
has a Bidi engine for drawing its own UI controls. It takes a few layout parameters to handle some cases of RTL, LTR, and contextual directions as well.
In case this helps someone:
http://bill.dojotoolkit.org/api/1.9/dojox/string/BidiEngine
I found another link that might help non-Dojo developers: https://github.com/ibm-js/dbidi, but I have not checked it yet.

contentEditable insert br when new line occurs

A contentEditable has automatic word wrapping, creating a new line when you reach the width of the editable area. This is great but I am parsing the contents of this afterwards and I need it to add a <br> when it does this. I have tried everything I can think of and I can't achieve this. Any help greatly received.

This is not possible, the word wrapping point is 'browser discretion' and as such susceptible to font size differences, fonts not being installed, font render engines, anti-aliasing settings etc. etc. The line-wrap point is, so to speak, 'not your problem' from the browser's perspective, and as such it doesn't give this info away.
Theoretically you could rebuild the content word-for-word in JS in a dynamically sized and similarly styled div, and monitor for when the height changes - that's where the newlines occur. It'd be a crap load of crappy code to achieve a dodgy result though.
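A minimal sketch of that word-by-word idea, under some loud assumptions: breakByHeight is a made-up name, and measureHeight is a function you would supply yourself, for example one that sets the text of a hidden div with the same width and styles as the editable area and returns its offsetHeight. Text is treated as plain words separated by whitespace.

```javascript
// Rebuilds `text` word by word and starts a new line whenever adding the
// next word makes the measured height grow. Returns the text with <br>
// inserted at the detected wrap points.
function breakByHeight(text, measureHeight) {
  const words = text.split(/\s+/).filter(Boolean);
  const lines = [];
  let current = [];
  for (const word of words) {
    const before = current.join(' ');
    const after = current.concat(word).join(' ');
    if (current.length > 0 && measureHeight(after) > measureHeight(before)) {
      lines.push(before); // adding the word caused a wrap
      current = [word];
    } else {
      current.push(word);
    }
  }
  if (current.length > 0) lines.push(current.join(' '));
  return lines.join('<br>');
}

// Browser usage (not executed here; `probe` is a hidden, identically
// styled div you create yourself):
//   const measure = (s) => { probe.textContent = s; return probe.offsetHeight; };
//   const withBreaks = breakByHeight(editable.textContent, measure);
```

The result is only as accurate as the probe element's match with the real one, which is exactly the fragility described above.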
I can't help but feel like you're asking for an XY-solution here - if you need newlines at the given point, let the end user give them when he wants to. Simply adding overflow:auto;white-space:nowrap to the editable element forces them to.

Right-to-left and/or top-to-bottom text in html/js?

How can I make a right-to-left and top-to-bottom text field for user input in a browser? Are there any native ways to do it? Or maybe workarounds?
(Top-to-bottom could be like Japanese, or hieroglyphs.)

For an RTL text field, you can use the HTML dir attribute with dir="RTL" (like Šime Vidas has already mentioned) or the direction property in CSS with direction: rtl. You can use these on most visual HTML elements, not just on text fields.
As for top-to-bottom direction, there's no easy standard method that I am aware of, but that's OK, since Japanese web pages rarely use vertical text that's not embedded in images or Flash objects, and I've never seen any site using vertical input. In fact, vertical input on Japanese computers is extremely rare even outside of HTML, and is usually found only in WYSIWYG editors (such as word processors) that produce printed vertical text.

There's also a JavaScript library that flips LTR to RTL automatically:
https://github.com/urigoren/RTLjs
