measure rendered html in javascript without affecting the measurement


I am doing pagination in javascript. This is typographic pagination, not chopping up database results. For the most part it works, but I have run into a Heisenberg issue where I cannot quite measure text without affecting it.
I am not trying to measure text before it is rendered. I want the actual position it shows up at on screen, so I can paginate to where it is naturally wrapped. I am measuring the vertical position of characters, not the horizontal width of strings. The way I do this is similar to this answer in that I am applying a style to a block of text, then measuring the position of the newly created span. If the span does not reach the end of the page, I clear it and make a new span in a linear search.
The problem is that the anti-aliased sub-pixel text layout is different when the span is applied. In rare cases, this causes the text to wrap differently when I measure it. I have only seen this when wrapping at a hyphen, and I assume it would not happen when wrapping at white space.
As a concrete example, "prepared-he" is the string I am having trouble with. When I measure up to "prepare" it appears, as expected, to be within the current page. When I measure "prepared" the whole phrase wraps down to the next line, moving it to the next page, so it looks like the "d" is the character to break at. I break the text between "prepare" and "d-he" and that is wrong. Trying to evaluate individual characters opens a whole can of worms I would rather avoid. The wrapping changes because, with the new span, the line is 1 pixel wider.
A solution to my problem could either be a better way to measure text using javascript, or a way to wrap text in a new element without affecting layout.
I have tried setting margin-right:-1px for the class of the span being created to wrap the text. This had no noticeable effect.
I am doing this in a UIWebView on the iPhone. There are some measurement related calls that are available in normal WebKit that are not available here. For example, Range does not have getBoundingClientRect or support setting an offset other than 0 in setStart or setEnd.
Thank you
Edit:
I tried the suggestion from unomi of making the span dimensionless. It had no effect. The class I am giving the span has no rules and is just used for quick deletion.
I tried iterating backwards instead of forwards through the text. The wrapping errors showed up in different places but the overall problem remained.
The text is mostly paragraphs with some simple styling. I do not think the method I am using would work with tables or lists. Within the paragraphs, I apply the span to one character at a time.
I tried reducing the font size for the span. The wrapping rules seem to allow wrapping at a span even if it is within a word, so that replaces one set of errors with another.

This is a bit of a cop-out, but do you really want to chop up a paragraph?
Wouldn't it improve readability to simply break at the first <p> that wanders off viewport?
Ok, so just to be clear, it sounds like you are testing the position of a span which is moved character by character through a text? If that is correct, and the issue you have is with breaking up words, why don't you simply jump from white space to white space (optionally including hyphens) rather than from character to character?
Keep 1 previous location and break at it when the current one is off viewport.
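To make the "jump between break opportunities and remember the previous one" idea concrete, here is a rough sketch. The helper names are made up for illustration, and isOffPage(start, end) is an assumed callback that would do the real DOM measurement for the chunk of text between those two indexes; nothing here touches the DOM itself.

```javascript
// Hypothetical helper: returns the indexes just after each place the text
// may wrap (after whitespace, and optionally after a hyphen).
function breakOpportunities(text, includeHyphens) {
  const pattern = includeHyphens ? /[\s-]+/g : /\s+/g;
  const positions = [];
  let match;
  while ((match = pattern.exec(text)) !== null) {
    positions.push(match.index + match[0].length);
  }
  // Always treat the end of the text as a final opportunity.
  if (positions[positions.length - 1] !== text.length) {
    positions.push(text.length);
  }
  return positions;
}

// isOffPage(start, end) is an assumed callback: it measures the rendered
// chunk text.slice(start, end) and reports whether it lies beyond the page.
function findPageBreak(text, isOffPage, includeHyphens = true) {
  let previous = 0;
  for (const position of breakOpportunities(text, includeHyphens)) {
    if (isOffPage(previous, position)) {
      return previous; // break at the last opportunity that was still on the page
    }
    previous = position;
  }
  return text.length; // everything fits on the current page
}
```

The returned index is where the next page would start, so text.slice(0, index) stays on the current page.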
I guess before too much else is done, are we sure that we can't make that span truly dimensionless?
span.marker {
border: 0px; padding: 0px; margin: 0px;
width: 0px; overflow: hidden; height: 0px;
}

I did not find an ideal solution. This is the solution I came up with.
I apply the measuring span to one character at a time. I found two cases where there were problems. Sometimes a word would end up being longer, and the word would wrap to the next line when being measured. Sometimes a word with a hyphen or similar character would split differently when being measured.
For the case of a whole word wrapping differently, I change the class of the measuring span to have a smaller font size. If the same character does not wrap to the next line when using a smaller font size, I ignore the measurement as invalid and continue searching.
For the case of a split word wrapping differently, I measure the same character with the previous character. If the span is not wrapped with two characters, then I assume the next character will wrap and break there.
These problems seem to arise because formatting changes the kerning between characters. When I am using a span to measure the position of a character, it is at a slightly different position because it starts on a pixel boundary and ignores the kerning to the previous character.
I did not try spanning the entire block of text instead of a single character. It would add some complexity and I suspect the same problems would crop up in a slightly different way.

Related

Problems with Replace overflowing string characters with three dots

So I've seen some solutions on Stack Overflow for how to do this, but they only work for a fixed number of characters, for example:
if (name.length > 20) {
var shortstring = name.substring(0, 20) + " ...";
}
But this only handles a fixed 20 characters. I'd like to do it whenever the text overflows its container, and replace the overflowing characters with three dots.
For example,
asadasddadasadsssssssssssssssssssssssssssssssssssdddddddddddddddddddddddd

The only way to determine whether a piece of text will overflow a layout container is to actually render the text in that container and then measure its geometry. (This is not true when drawing text in <canvas>, but I believe that's because canvas text does not care about cascading styles.)
The reason you see so many solutions driven by character-count is because it's simpler, much less computationally expensive (and thus faster), and because the dimensions of any text element are very obviously a function of the amount of text, (among other things -- and there's the rub), so people generally treat it as a quick-and-dirty proxy for actual render dimensions. There's a saying: "sometimes a little inaccuracy can save a ton of explanation." That's what's going on here.
But this is StackOverflow, so let's splurge on explanation. Buckle up. If you want to do this right, here is how you would do it:
1. Construct a text node with the desired text, and wrap it in some element that you know will have no impact on its layout within the target container. This requires a deep knowledge of the styling context on your web page; maybe on your page an element within .Column2 is styled such that it will auto-fit to its content, but on my website every element within the target zone receives 2px padding because that's what my layout requires. These details impact the render size.
2. Set the opacity to 0 on your new text element. Do this before you add the element to the target container, or you could get flicker.
3. Insert the text element into the target container.
4. Measure the text element, and the container, using element.getBoundingClientRect.
5. Do math on those two rects. If the text element's width or height is bigger than the container's, you've got overflow.
6. If the text overflows, chop off part of the string and re-measure. This can repeat many, many times, and DOM manipulation is one of the most-expensive things a browser can do (relatively speaking). This means it's not wise to simply remove one character at a time -- if the text is very much longer than the space allows, your loop could repeat thousands of times, which users will notice, especially if you apply this to more than one chunk of text on the page; if triggered at page load, this would be happening simultaneously to many elements, aka DOM thrashing: RIP your webpage. That means you need a more-efficient search algorithm. I think the one that's best for this is binary search.
7. Once you have figured out exactly which characters fit, remove the custom opacity:0 so the text becomes visible. Now you are done.
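The chop-and-re-measure loop with binary search could look roughly like the sketch below. It is not a complete implementation: fits(candidate) is an assumed callback that, in a browser, would render the candidate string into the hidden test element and compare the two rects from getBoundingClientRect. The sketch also assumes that shorter strings never overflow more than longer ones, and that the ellipsis on its own fits.

```javascript
// fits(candidate) is an assumed callback that performs the real measurement.
// truncateToFit is a hypothetical name used only for this illustration.
function truncateToFit(text, fits, ellipsis = " ...") {
  if (fits(text)) {
    return text; // no overflow, nothing to chop
  }
  let low = 0;                 // longest prefix length assumed to fit (ellipsis only)
  let high = text.length - 1;  // the full text is already known not to fit
  while (low < high) {
    const mid = Math.ceil((low + high) / 2);
    if (fits(text.slice(0, mid) + ellipsis)) {
      low = mid;      // this prefix fits, try a longer one
    } else {
      high = mid - 1; // this prefix overflows, try a shorter one
    }
  }
  return text.slice(0, low) + ellipsis;
}
```

Each call to fits costs one layout measurement, so the number of measurements grows with the logarithm of the text length instead of with the length itself.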
You can probably add a few clever optimizations to that algorithm. For example, to protect against huge chunks of text being crammed into tiny containers, you could create an empty 1em square element and use that to come up with an upper-bound for the number of characters that will fit. So that 10k sample might get cut down to just 175 as the first step, and then use the guess-and-test process to get from there to whatever is the final value. You could put this special short-circuiting logic inside a guard that only executes if the text is very long.
Or you might cache the final character counts in the browser's localStorage such that refreshing the page with the same content could just read the already-determined counts and render the right amount of text in one step. I suspect the major challenge there would be devising an organizing scheme for the cache data, because the "coordinates" of a cache entry need to include the full starting text and the target container, and you'd want to invalidate the cache if the text or page styling changes.
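One possible way to organize such a cache is sketched here, purely as an illustration. containerId and styleVersion are hypothetical: any stable name for the target container, and a value you bump yourself whenever the page styling changes. The hash only keeps keys short; it is not collision-proof, so a careful version would also store and compare the text itself.

```javascript
// Small non-cryptographic string hash (a djb2 variant), used only to keep keys short.
function hashString(value) {
  let hash = 5381;
  for (let i = 0; i < value.length; i++) {
    hash = ((hash * 33) ^ value.charCodeAt(i)) >>> 0;
  }
  return hash.toString(36);
}

// The "coordinates" of a cache entry: styling version, container, and text.
function cacheKey(text, containerId, styleVersion) {
  return ["fit", styleVersion, containerId, hashString(text)].join(":");
}

// `storage` is anything with getItem/setItem, such as window.localStorage.
// `compute` is the expensive measure-and-search step, run only on a cache miss.
function cachedFitCount(storage, text, containerId, styleVersion, compute) {
  const key = cacheKey(text, containerId, styleVersion);
  const stored = storage.getItem(key);
  if (stored !== null) {
    return Number(stored);
  }
  const count = compute(text);
  storage.setItem(key, String(count));
  return count;
}
```

Changing styleVersion changes every key, which is a blunt but simple way to invalidate old entries.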
Many years ago, I used a library called three-dots.js to handle this. I always thought it used binary search, but skimming the source now, that doesn't seem to be the case. Still, it illustrates how complex this is. That is why most people go with character counts: a proper solution is surprisingly complex and much less performant.
You may be able to find a good library on NPM, or use three-dots.js. You can write your own, but unless you have an unlimited amount of time for this, you'd do well to ask yourself if it's really that important to get this exactly perfectly right in every single case. It's a lot easier, and probably fine, to just chop the text conservatively based on character count and manual measurements you take using DevTools.

As already mentioned in the comments, this is a much easier task to do in CSS.
CSS has a text-overflow property that was invented just for this. When it is set to ellipsis, it displays a … character at the end of the visible area.
The spec also allows specifying a custom overflow string (e.g. ' ...'); however, this is only supported by Firefox.
It can be used like this:
.overflow{
border: 1px solid black;
width: 300px;
overflow: hidden;
white-space: nowrap;
text-overflow: ellipsis;
}
<div class="overflow" >Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque at enim eget purus tincidunt tincidunt non at orci. Pellentesque urna.</div>

What exactly is paragraph leading for indesign paragraph

I am working on an InDesign script and I would like to know what exactly leading is for a paragraph, as described in the InDesign Paragraph documentation. Is that the height of each line in the paragraph?

Yes, that is correct. Leading is the distance from one baseline to the next.
The leading of the top line can be anything because there's nothing above it to push it down.
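As a small worked example of "baseline to baseline", assuming every line in the paragraph uses the same leading (real paragraphs can mix values), the baselines are simply spaced one leading apart. The function name and parameters below are made up for illustration and are not part of the InDesign scripting API.

```javascript
// Returns the y position of each baseline, assuming uniform leading.
// firstBaseline is wherever the top line happens to sit.
function baselinePositions(lineCount, leading, firstBaseline = 0) {
  const positions = [];
  for (let line = 0; line < lineCount; line++) {
    positions.push(firstBaseline + line * leading);
  }
  return positions;
}
```

So with a leading of 12 and the first baseline at 10, three lines would sit at 10, 22 and 34.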
Note: since fonts can vary in X-height and they allow space for descenders etc., they don't usually take up the full pixel dimension. In the following image, all the text is 100px and the height of the boxes surrounding the text is the same as the leading for each line.

Javascript retrieve linebreaks from dom [duplicate]

I need to add line breaks in the positions that the browser naturally adds a newline in a paragraph of text.
For example:
This is some very long text \n that spans a number of lines in the paragraph.
This is a paragraph that the browser chose to break at the position of the \n
I need to find this position and insert a <br /> there.
Does anyone know of any JS libraries or functions that are able to do this?
The only solution that I have found so far is to remove tokens from the paragraph and observe the clientHeight property to detect a change in element height. I don't have time to finish this and would like to find something that's already tested.
Edit:
The reason I need to do this is that I need to accurately convert HTML to PDF. Acrobat renders text narrower than the browser does. This results in text that breaks in different positions. I need an identical ragged edge and the same number of lines in the converted PDF.
Edit:
@dtsazza: Thanks for your considered answer. It's not impossible to produce a layout editor that almost exactly replicates HTML. I've written 99% of one ;)
The app I'm working on allows a user to create a product catalogue by dragging on 'tiles'. The tiles are fixed-width, absolutely positioned divs that contain images and text. All elements are styled so font size is fixed. My solution for finding \n in a paragraph is OK 80% of the time, and when it works with a given paragraph the resulting PDF is so close to the on-screen version that the differences do not matter. Paragraphs are the same height (to the pixel), images are replaced with high-res versions and all bitmap artwork is replaced with SVGs generated server side.
The only slight difference between my HTML and PDF is that Acrobat renders text slightly more narrowly, which results in a slightly shorter line length.
Diodeus's solution of adding spans and finding their coords is a very good one and should give me the location of the BRs. Please remember that the user will never see the HTML with the inserted BRs - these are added so that the PDF conversion produces a paragraph that is exactly the same size.
There are lots of people that seem to think this is impossible. I already have a working app that creates extremely accurate HTML->PDF conversions of our docs - I just need a better solution for adding BRs because my solution sometimes misses a BR. BTW, when it does work my paragraphs are the same height as the HTML equivalents, which is the result we are after.
If anyone is interested in the type of doc I'm converting then you can check out this screencast:
http://www.localsa.com.au/brochure/brochure.html
Edit: Many thanks to Diodeus - your suggestion was spot on.
Solution:
For my situation it made more sense to wrap the words in spans instead of the spaces.
var text = paragraphElement.innerHTML.replace(/ /g, '</span> <span>');
text = "<span>" + text + "</span>"; // wrap first and last words
This wraps each word in a span. I can now query the document to get all the words, iterate and compare y position. When y pos changes add a br.
This works flawlessly and gives me the results I need - Thank you!
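The "iterate and compare y position" step might look something like this sketch. It assumes the measurements have already been collected into an array of { text, top } objects in document order, where top would come from something like getBoundingClientRect().top on each word's span; the function names and the tolerance parameter are made up for illustration.

```javascript
// words: [{ text, top }] in document order. `top` is the measured y position
// of each word's span; collecting it from the DOM is left out of this sketch.
function groupWordsIntoLines(words, tolerance = 0) {
  const lines = [];
  let lastTop = null;
  for (const word of words) {
    if (lastTop === null || Math.abs(word.top - lastTop) > tolerance) {
      lines.push([]); // y position changed: a new rendered line starts here
      lastTop = word.top;
    }
    lines[lines.length - 1].push(word.text);
  }
  return lines.map((line) => line.join(" "));
}

// Joins the detected lines with explicit breaks for the PDF conversion step.
function withLineBreaks(words) {
  return groupWordsIntoLines(words).join("<br />");
}
```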

I would suggest wrapping all spaces in a span tag and finding the coordinates of each tag. When the Y-value changes, you're on a new line.

I don't think there's going to be a very clean solution to this one, if any at all. The browser will flow a paragraph to fit the available space, linebreaking where needed. Consider that if a user resizes the browser window, all the paragraphs will be rerendered and almost certainly will change their break positions. If the user changes the size of the text on the page, the paragraphs will be rerendered with different line break points. If you (or some script on your page) change the size of another element on the page, this will change the amount of space available to a floating paragraph and again - different line break points.
Besides, changing the actual markup of your page to mimic something that the browser does for you (and does very well) seems like the wrong approach to whatever you're doing. What's the actual problem you're trying to solve here? There's probably a better way to achieve it.
Edit: OK, so you want to render to PDF the same as "the screen version". Do you have a specific definitive screen version nominated - in terms of browser window dimensions, user stylesheets, font preferences and adjusted font size? The critical thing about HTML is that it deliberately does not specify a specific layout. It simply describes what is on the page, what they are and where they are in relation to one another.
I've seen several misguided attempts before to produce some HTML that will exactly replicate a printed creative, designed in something like a DTP application where a definitive absolute layout is essential. Those efforts were doomed to failure because of the nature of HTML, and doing it the other way round (as you're trying to) will be even worse because you don't even have a definitive starting point to work from.
On the assumption that this is all out of your hands and you'll have to do it anyway, my suggestion would be to give up on the idea of mangling the HTML. Look at the PDF conversion software - if it's any good it should give you some options for font kerning and similar settings. Playing around with the details here should get you something that approximates the font rendering in the browser and thus breaks lines at the same places.
Failing that, all I can suggest is taking screenshots of the browser and parsing these with OCR to work out where the lines break (it shouldn't require a very accurate OCR since you know what the raw text is anyway, it essentially just has to count spaces). Or perhaps just embed the screenshot in the PDF if text search/selection isn't a big deal.
Finally, doing it by hand is likely the only way to make this work definitively and reliably.
But really, this is still just wrong and any attempts to revise the requirements would be better. Keep going up one step in the chain - why does the PDF have to have the exact same ragged edge as some arbitrary browser rendering? Can you achieve that purpose in another (better) way?

Sounds like a bad idea when you account for user-set font sizes, MS Windows accessibility mode, and the hundreds of different mobile devices. Let the browser do its thing - trying to have exact control over the rendering will only cause you hours of frustration.

I don't think you'll be able to do this with any kind of accuracy without embedding Gecko/WebKit/Trident or essentially recreating them.

Maybe an alternative: do all line-breaks yourself, instead of relying on the browser. Place all text in pre tags, and add your own linebreaks. Now at least you don't have to figure out where the browser put them.
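A minimal sketch of doing the line breaks yourself, assuming the pre text is set in a monospaced font so that a column count can stand in for the available width. wrapToColumns and maxColumns are hypothetical names; a word longer than maxColumns is simply left on its own line.

```javascript
// Greedy word wrap: keeps adding words to the current line until the next
// word would push it past maxColumns, then starts a new line.
function wrapToColumns(text, maxColumns) {
  const lines = [];
  let current = "";
  for (const word of text.split(/\s+/).filter(Boolean)) {
    if (current !== "" && current.length + 1 + word.length > maxColumns) {
      lines.push(current);
      current = word;
    } else {
      current = current === "" ? word : current + " " + word;
    }
  }
  if (current !== "") {
    lines.push(current);
  }
  return lines.join("\n");
}
```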

How can I style an HTML element to avoid hanging text?

I'm creating a page where there's a textbox that displays a text string from the database, and its length varies. Because of this, some strings happen to be long enough to run onto two lines, but that second line is short, and this doesn't look good:
Here the blue box shows the div that contains the content. It's got a fixed width (80% of the container), and text-align:center.
So my question is: how can I get the text to flow into lines where the line widths are closer to each other? I'm willing to do some math and dynamically adjust the width or font size, but I'm not sure how to do this reliably.

You need JavaScript to do this. CSS alone cannot fix this issue if you are using dynamic text.
Once you've detected that the height of the box goes beyond the limit of one line you can either shrink the font, expand the box or calculate the mid-point to add a break and have two balanced lines.
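For the mid-point option, a rough sketch is to pick the space closest to the middle of the string. This uses character count as a stand-in for rendered width, which is only an approximation with proportional fonts; the function names are made up for illustration.

```javascript
// Returns the index of the space nearest the middle of the text,
// or -1 if there is no space to break at.
function balancedBreakIndex(text) {
  const middle = text.length / 2;
  let best = -1;
  for (let i = 0; i < text.length; i++) {
    if (text[i] !== " ") continue;
    if (best === -1 || Math.abs(i - middle) < Math.abs(best - middle)) {
      best = i;
    }
  }
  return best;
}

// Splits the text into two roughly equal lines at that space.
function balanceIntoTwoLines(text) {
  const index = balancedBreakIndex(text);
  if (index === -1) {
    return [text];
  }
  return [text.slice(0, index), text.slice(index + 1)];
}
```

You would only call this after detecting that the box is taller than one line, then join the two parts with a line break.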
Here are various jQuery plugins that can do this for you. There are many if you look.
http://fittextjs.com/
https://github.com/jquery-textfill/jquery-textfill

Personally, I don't like using JavaScript for "simple things".
You can use the following properties:
overflow
white-space
http://jsfiddle.net/ALWqd/

How to find the first line in auto wrapping html

Assuming I have a div with a fixed width and some auto wrapping text inside. Now I want to insert a span element at the end of the first line. Because the text isn’t written in a specific structure, the first line could contain five words as well as three or just one. So I would need to find the position on which the automatic line-break happens. Is that possible or do I need to insert a manual br or some marker?

It's painfully possible. The question is why are you trying to do this? There may be an easier way to accomplish what you need.
If you decide you still need to do this, what you would have to do is create a clone of this div that cannot be seen. You would set this cloned div's height to 1px or something like that. You can then fill this clone with identical text word by word. Once the div's scrollHeight jumps, you know that the first word wrap has occurred. You can then use this data to figure out where the original paragraph's first line has word wrapped.
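The word-by-word fill can be sketched like this. measureHeight(text) is an assumed callback: in a browser it would put the text into the hidden clone and return its scrollHeight. The sketch also assumes the first word on its own fits on a single line.

```javascript
// Returns how many words end up on the first line.
// measureHeight(text) is an assumed callback that measures the hidden clone.
function firstLineWordCount(words, measureHeight) {
  if (words.length === 0) {
    return 0;
  }
  const oneLineHeight = measureHeight(words[0]);
  for (let count = 2; count <= words.length; count++) {
    const height = measureHeight(words.slice(0, count).join(" "));
    if (height > oneLineHeight) {
      return count - 1; // adding this word caused the first wrap
    }
  }
  return words.length; // the whole text fits on one line
}
```

With the count in hand, the span would go after that many words in the original div.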

You need to add a <br> or some marker. Different browsers calculate differently how many words should be placed on the first line. And if the user changes a browser setting, for instance increasing the font size, the number of words on the first line will change.
