text string / word cloud ignore line break symbol

I have an implementation of the popular d3 word cloud. It has been working great for very lengthy documents, but I ran into an issue when I tried to repurpose the word cloud to visualize a very simple txt file in the following format:
interesting. interesting. interesting.
boring. boring.
amazing. amazing. amazing.
stupid. stupid. stupid.
average. average.
disappointing.
(etc...)
It's basically just a txt file that has around 20 words. Each word repeats anywhere from 1 to 5 times. I would like this information to govern the sizing of the font (repeated 5 times = big font, 1 time = small font). When I put all the words on the same line, the resulting word cloud strangely makes all words the same font size. So I started to play around with it and tried a few different things. The closest I got to the desired effect was with the above txt format, having a line break after each new word.
This did result in the correct font sizing, but it also created a new problem. When I logged the word_count object to the console, I got something like this:
"⏎boring":1,
"boring":1,
"⏎stupid":1,
"stupid":2,
This means that the word cloud has repeated words in the visual (for example, the word "stupid" in a large font size and also "stupid" in a small font size). To make things more confusing, this behavior was isolated to a few words. Some words did not have the line break sign in front of them, even though I hit enter in notepad in exactly the same way as for the ones that did wind up with the ⏎ sign.
My next stab at fixing this was to add the "⏎" sign to the list of ignored words, hoping that would make the word cloud library treat them as if they were the same. Unfortunately for me, the problem persisted.
That sums it up pretty well. It's also really easy to reproduce: you just need notepad, then copy and paste my words above to see what I'm talking about.
Let me know if anyone has ideas for troubleshooting further.
Minimalist Block:
https://bl.ocks.org/diggetybo/cd644316f52465495f39c8fc27f04de8
(refreshing page will randomize the layout)
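
One possible cause, offered as a guess rather than a diagnosis: the tokenizer leaves line-break or carriage-return characters attached to some words, so "boring" and "⏎boring" end up as different keys. A minimal sketch, assuming a hypothetical helper `countWords` (not part of the d3 word cloud library) that normalizes whitespace and punctuation before counting:

```javascript
// Hypothetical helper: count words after stripping line breaks and punctuation.
function countWords(text) {
  const counts = {};
  const words = text
    .toLowerCase()
    .replace(/[.,;:!?"()]/g, " ") // treat punctuation as a separator
    .split(/\s+/)                 // \s matches spaces, tabs, \r and \n
    .filter(function (w) { return w.length > 0; });
  words.forEach(function (w) {
    counts[w] = (counts[w] || 0) + 1;
  });
  return counts;
}

// Sample input with Windows-style (\r\n) line endings, as notepad would save it.
const sample = "boring. boring.\r\nstupid. stupid. stupid.\r\ndisappointing.";
const wordCount = countWords(sample);
```

With this kind of normalization, each word should appear under a single key regardless of whether it follows a line break, and the counts can then drive the font size.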

Related

Specific character changing to Small Caps

I am writing a script to automate making around 100 graphics. I am trying to write it so I do not have to do any fine tuning after running the script. I do not know how to handle adjusting single characters in a text layer though. Here is an example of what I am trying to accomplish. It is for having a last name on the back of a football jersey.
One if statement I want to make is if the last name starts with "Mc" to change the "c" to small caps. I figured the code should start like this, but not sure how to call on just the second character to make that small caps:
if (variable.startsWith("Mc")) {
This would be for instance if the name is "McDonald" to have the "c" as small caps.
Thanks. Sorry I am fairly new to photoshop scripting and have been doing just the simple things. This is by far the most advanced I have done. Let me know if I need to provide anything else.
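
A minimal sketch of the string-handling half only, assuming a hypothetical helper `smallCapsRange`. It returns the character range that would need small caps; applying the style to that range in the text layer is left out because it depends on the Photoshop scripting API. The check uses `indexOf` rather than `startsWith`, on the assumption that the scripting engine may predate `startsWith`:

```javascript
// Hypothetical helper: find which characters of a last name should become small caps.
// Returns null when the name does not start with "Mc".
function smallCapsRange(lastName) {
  if (lastName.indexOf("Mc") === 0) {
    // The "c" is the second character: index 1, up to (not including) index 2.
    return { start: 1, end: 2 };
  }
  return null;
}

var range = smallCapsRange("McDonald");
```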

Javascript retrieve linebreaks from dom [duplicate]

I need to add line breaks in the positions that the browser naturally adds a newline in a paragraph of text.
For example:
This is some very long text \n that spans a number of lines in the paragraph.
This is a paragraph that the browser chose to break at the position of the \n
I need to find this position and insert a <br /> there.
Does anyone know of any JS libraries or functions that are able to do this?
The only solution that I have found so far is to remove tokens from the paragraph and observe the clientHeight property to detect a change in element height. I don't have time to finish this and would like to find something that's already tested.
Edit:
The reason I need to do this is that I need to accurately convert HTML to PDF. Acrobat renders text narrower than the browser does. This results in text that breaks in different positions. I need an identical ragged edge and the same number of lines in the converted PDF.
Edit:
@dtsazza: Thanks for your considered answer. It's not impossible to produce a layout editor that almost exactly replicates HTML. I've written 99% of one ;)
The app I'm working on allows a user to create a product catalogue by dragging on 'tiles'. The tiles are fixed-width, absolutely positioned divs that contain images and text. All elements are styled so font size is fixed. My solution for finding \n in a paragraph is OK 80% of the time, and when it works with a given paragraph the resulting PDF is so close to the on-screen version that the differences do not matter. Paragraphs are the same height (to the pixel), images are replaced with high-res versions and all bitmap artwork is replaced with SVGs generated server side.
The only slight difference between my HTML and PDF is that Acrobat renders text slightly more narrowly, which results in a slightly shorter line length.
Diodeus's solution of adding spans and finding their coords is a very good one and should give me the location of the BRs. Please remember that the user will never see the HTML with the inserted BRs - these are added so that the PDF conversion produces a paragraph that is exactly the same size.
There are lots of people who seem to think this is impossible. I already have a working app that creates extremely accurate HTML->PDF conversions of our docs - I just need a better way of adding BRs because my solution sometimes misses a BR. BTW, when it does work my paragraphs are the same height as the HTML equivalents, which is the result we are after.
If anyone is interested in the type of doc I'm converting, you can check out this screencast:
http://www.localsa.com.au/brochure/brochure.html
Edit: Many thanks to Diodeus - your suggestion was spot on.
Solution:
For my situation it made more sense to wrap the words in spans instead of the spaces.
var text = paragraphElement.innerHTML.replace(/ /g, '</span> <span>');
text = "<span>"+text+"</span>"; //wrap first and last words.
This wraps each word in a span. I can now query the document to get all the words, iterate and compare y position. When y pos changes add a br.
This works flawlessly and gives me the results I need - Thank you!
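
The comparison step described above can be sketched as follows. This is an illustration under assumptions, not the original code: `insertBreaks` is a hypothetical helper, and each word is represented as a plain object whose `top` would come from something like `span.getBoundingClientRect().top` in the browser.

```javascript
// Hypothetical helper: join words, inserting <br /> wherever the y position changes.
// words: array of { text: string, top: number } in document order.
function insertBreaks(words) {
  var out = "";
  for (var i = 0; i < words.length; i++) {
    if (i > 0) {
      // A different top offset means the browser wrapped before this word.
      out += words[i].top !== words[i - 1].top ? "<br />" : " ";
    }
    out += words[i].text;
  }
  return out;
}

// Made-up measurements: two words on the first line, two on the second.
var measured = [
  { text: "This", top: 10 },
  { text: "is", top: 10 },
  { text: "long", top: 28 },
  { text: "text", top: 28 }
];
var withBreaks = insertBreaks(measured);
```

Keeping the measurement (DOM) and the comparison (plain data) separate makes the comparison easy to check without a browser.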

I would suggest wrapping all spaces in a span tag and finding the coordinates of each tag. When the Y-value changes, you're on a new line.

I don't think there's going to be a very clean solution to this one, if any at all. The browser will flow a paragraph to fit the available space, linebreaking where needed. Consider that if a user resizes the browser window, all the paragraphs will be rerendered and will almost certainly change their break positions. If the user changes the size of the text on the page, the paragraphs will be rerendered with different line break points. If you (or some script on your page) change the size of another element on the page, this will change the amount of space available to a floating paragraph and again - different line break points.
Besides, changing the actual markup of your page to mimic something that the browser does for you (and does very well) seems like the wrong approach to whatever you're doing. What's the actual problem you're trying to solve here? There's probably a better way to achieve it.
Edit: OK, so you want to render to PDF the same as "the screen version". Do you have a specific definitive screen version nominated - in terms of browser window dimensions, user stylesheets, font preferences and adjusted font size? The critical thing about HTML is that it deliberately does not specify a specific layout. It simply describes what is on the page, what they are and where they are in relation to one another.
I've seen several misguided attempts before to produce some HTML that will exactly replicate a printed creative, designed in something like a DTP application where a definitive absolute layout is essential. Those efforts were doomed to failure because of the nature of HTML, and doing it the other way round (as you're trying to) will be even worse because you don't even have a definitive starting point to work from.
On the assumption that this is all out of your hands and you'll have to do it anyway, my suggestion would be to give up on the idea of mangling the HTML. Look at the PDF conversion software - if it's any good it should give you some options for font kerning and similar settings. Playing around with the details here should get you something that approximates the font rendering in the browser and thus breaks lines at the same places.
Failing that, all I can suggest is taking screenshots of the browser and parsing these with OCR to work out where the lines break (it shouldn't require a very accurate OCR since you know what the raw text is anyway, it essentially just has to count spaces). Or perhaps just embed the screenshot in the PDF if text search/selection isn't a big deal.
Finally doing it by hand is likely the only way to make this work definitively and reliably.
But really, this is still just wrong and any attempts to revise the requirements would be better. Keep going up one step in the chain - why does the PDF have to have the exact same ragged edge as some arbitrary browser rendering? Can you achieve that purpose in another (better) way?

Sounds like a bad idea when you account for user-set font sizes, MS Windows accessibility mode, and the hundreds of different mobile devices. Let the browser do its thing - trying to have exact control over the rendering will only cause you hours of frustration.

I don't think you'll be able to do this with any kind of accuracy without embedding Gecko/WebKit/Trident or essentially recreating them.

Maybe an alternative: do all line-breaks yourself, instead of relying on the browser. Place all text in pre tags, and add your own linebreaks. Now at least you don't have to figure out where the browser put them.

Limit pasted words in multiple text area fields?

I'm a complete novice in JavaScript, so please forgive my ignorance. I've worked on this for several days, but it's beyond my skill level.
Please, could someone help me implement into this specific script: https://stackoverflow.com/a/11228092, something that limits words in each text area when pasted, too?
The script limits correctly when typing, but pasting sneaks past the word limit. To emphasize for clarity, I do mean counting words rather than characters.
I've tried looking at other methods, but they either limit characters rather than words, don't seem to apply to multiple, differing textarea fields, or I just don't understand enough to implement them properly.
Thanks for any help anyone can give!
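
A minimal sketch of one way to cover pasting, not a drop-in patch for the linked answer: trim the field's value whenever it changes, rather than blocking keystrokes. `limitWords` and `attachWordLimit` are hypothetical names; the `input` event fires for typing and pasting alike.

```javascript
// Hypothetical helper: keep at most maxWords whitespace-separated words.
function limitWords(text, maxWords) {
  var words = text.split(/\s+/).filter(function (w) { return w.length > 0; });
  if (words.length <= maxWords) {
    return text; // under the limit: leave the user's spacing untouched
  }
  // Over the limit: keep the first maxWords words, joined by single spaces.
  return words.slice(0, maxWords).join(" ");
}

// Hypothetical wiring: call once per textarea, each with its own limit.
function attachWordLimit(textarea, maxWords) {
  textarea.addEventListener("input", function () {
    textarea.value = limitWords(textarea.value, maxWords);
  });
}

var trimmed = limitWords("one two\nthree four five", 3);
```

Note that when the text is over the limit, this sketch collapses line breaks and repeated spaces in the kept words into single spaces.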

Word Cloud for Other Languages

I am using Jason Davies's Word Cloud for my project, but I have a problem: I am using Persian [Farsi] strings, and the words overlap in the SVG.
This is my project's output:
What happened to the Farsi words?

As explained on the About page for the project, the generator needs to retrieve the shape of a glyph to be able to compute where it is "safe" to put other words. The About page explains the process in much more detail, but here's what we care about:
Glyphs are rendered individually to a hidden <canvas> element.
Pixel data is retrieved
Bounding boxes are derived
The word cloud is generated.
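
The bounding-box step in the list above can be illustrated with a small sketch. It is not the generator's actual code: `boundingBox` is a hypothetical helper that scans RGBA pixel data, laid out the way a canvas `getImageData` call returns it, for non-transparent pixels.

```javascript
// Hypothetical helper: bounding box of all pixels with non-zero alpha.
// data: flat RGBA array (4 values per pixel, row by row), as in ImageData.data.
function boundingBox(data, width, height) {
  var box = { left: width, top: height, right: -1, bottom: -1 };
  for (var y = 0; y < height; y++) {
    for (var x = 0; x < width; x++) {
      var alpha = data[(y * width + x) * 4 + 3];
      if (alpha > 0) {
        if (x < box.left) box.left = x;
        if (x > box.right) box.right = x;
        if (y < box.top) box.top = y;
        if (y > box.bottom) box.bottom = y;
      }
    }
  }
  return box.right < 0 ? null : box; // null when nothing was drawn
}

// 3x2 test image with two opaque pixels, at (x=1, y=0) and (x=2, y=1).
var pixels = new Uint8ClampedArray(3 * 2 * 4);
pixels[(0 * 3 + 1) * 4 + 3] = 255;
pixels[(1 * 3 + 2) * 4 + 3] = 255;
var glyphBox = boundingBox(pixels, 3, 2);
```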
Now, the critical insight is that in Western (and many other) scripts, glyphs don't change shape based on context often. Yes, there are such things as ligatures, but they are generally rare, and definitely not necessary for the script.
In Persian, however, the glyph shape will change based on context. For non-Persian readers, look at ی and س which, when combined, become یس. Yes, that last one is two glyphs!
The algorithm actually has no problem dealing with Persian characters, as you can see by hacking the demo on the about page, putting a breakpoint just after the d.code is generated, to be able to modify it:
Replacing it with 1740, which is the charCode for the first Persian glyph above, and letting the algorithm run, shows beautiful and perfectly correct bounding boxes around the glyph:
The issue is that when the word cloud is actually rendered, the glyph is placed in context and... changes shape. The generator doesn't know this, though, and continues to use the old bounding data to place other words, thus creating the overlapping you witnessed. In addition, there is probably also an issue around right-to-left handling of text, which certainly would not help.
I would encourage you to take this up with the author of the generator directly. The project has a GitHub page: https://github.com/jasondavies/d3-cloud so opening an issue there (and maybe referring back to this answer) would help!

bidirectional text - visual to logical

I'm drawing texts on the screen letter by letter.
In English it is very simple, because the text is LTR so the letters are saved in the String in the same order they're shown.
When drawing RTL text, I need to switch the direction of the printing. But when letters, numbers, English and some RTL language are mixed, the mess starts.
For example:
ex.1: שלום לכם
ש is the first letter in the string, but as we can see it is shown last.
ex.2: שלום to all
ש is the first letter in the string, but as we can see it is shown in the middle, before the English starts.
It gets more complicated when numbers and math signs come into the picture, along with special characters like '(' and ')' that need to be flipped...
I found many Bidi algorithms online that change the logical order of the letters in the string to the visual one, so that when I run from left to right over the converted string I can be sure it will print properly.
BUT,
They are never perfect. There are cases where they do not work properly.
None of them takes the direction of the text into account either (meaning that when we press the right Ctrl+Shift on the keyboard, the visualization changes again).
My questions are
Does anybody know of a bulletproof Bidi algorithm I can use to convert the string from the order in which it is saved in memory to visual order?
Is there a simpler way to solve my problem? Maybe somehow use the browser's own algorithm for it.

After searching for a long time,
I've found that DOJO (luckily, the toolkit that I'm using)
has a BIDI engine for drawing its own UI controls, which takes a few layout parameters to handle some cases of RTL, LTR, and contextual directions as well.
If this helps someone:
http://bill.dojotoolkit.org/api/1.9/dojox/string/BidiEngine
I found another link that might help non-DOJO developers: https://github.com/ibm-js/dbidi, but I have not checked it yet.
