Word Cloud for Other Languages

Word Cloud for Other Languages - javascript

I using JasonDavies's Word Cloud for my project, but there is a problem that I using Persian[Farsi] Strings and my problem here that words have overlapping in Svg.
This is my project's output:
What happened to the Farsi words?

As explained on the About page for the project, the generator needs to retrieve the shape of a glyph to be able to compute where it is "safe" to put other words. The about page explains the process in much more detail, but here's what we care for:
Glyphs are rendered individually to a hidden <canvas> element.
Pixel data is retrieved
Bounding boxes are derived
The word cloud is generated.
Now, the critical insight is that in Western (and many other) scripts, glyphs don't change shape based on context often. Yes, there are such things as ligatures, but they are generally rare, and definitely not necessary for the script.
In Persian, however, the glyph shape will change based on context. For non-Persian readers, look at ی and س which, when combined, become یس. Yes, that last one is two glyphs!
The algorithm actually has no problem dealing with Persian characters, as you can see by hacking the demo on the about page, putting a breakpoint just after the d.code is generated, to be able to modify it:
Replacing it with 1740, which is the charCode for the first Persian glyph above, and letting the algorithm run, shows beautiful and perfectly correct bounding boxes around the glyph:
The issue is that when the word cloud is actually rendered, the glyph is placed in context and... changes shape. The generator doesn't know this, though, and continues to use the old bounding data to place other words, thus creating the overlapping you witnessed. In addition, there is probably also an issue around right-to-left handling of text, which certainly would not help.
I would encourage you to take this up the author of the generator directly. The project has a GitHub page: https://github.com/jasondavies/d3-cloud so opening an issue there (and maybe referring back to this answer) would help!

Related

How could I improve performance when repeatedly updating an SVG DOM in React JS?

Last year I tried to learn a bit React JS for making the project you can find here. I apologize for my rather vague / imprecise description below, but I'm by no means versed in this.
Basically, there is a single <svg> tag, which will contain a number of paths etc. as created by the user. The problem I have is that things become very slow the more paths are present. To my current understanding, this is due to the fact that the entire SVG DOM gets updated repeatedly upon user interactions that involve dragging the mouse or using the mouse wheel.
This holds true, particularly, for two user interactions:
a) Panning - all paths are being moved at the same time; I think one might circumvent this issue by taking a snapshot image first and moving that around instead. However, that's not a solution for the other user interaction, which is:
b) Expanding/collapsing paths - here, all paths are being modified in terms of coordinates of some of their points. That is, every path must be modified in a different way, but all of them must be modified at once, and this must happen repeatedly because it's a user interaction controlled with the mouse wheel where changes happen gradually and the user requires immediate visual feedback on these changes as they happen.
Particularly for b), I see no alternative that would involve a single transformation or something.
After extensive research last year, I came to the conclusion that choosing SVG to display and modify a lot of things dynamically on screen was a wrong decision in the first place, but I realized too late, so I gave up and have never touched it since. I'm pretty certain that there isn't any way to deal with the low performance that builds upon what I already have; I have no intention to start this project from scratch with a completely different approach. Also, the reason why I chose SVG was that it's easy to manipulate.
In summary, I'd basically like to get confirmation that there is no feasible way to rescue this project.

Javascript retrieve linebreaks from dom [duplicate]

I need to add line breaks in the positions that the browser naturally adds a newline in a paragraph of text.
For example:
This is some very long text \n that spans a number of lines in the paragraph.
This is a paragraph that the browser chose to break at the position of the \n
I need to find this position and insert a 
Does anyone know of any JS libraries or functions that are able to do this?
The only solutuion that I have found so far is to remove tokens from the paragraph and observe the clientHeight property to detect a change in element height. I don't have time to finish this and would like to find something that's already tested.
Edit:
The reason I need to do this is that I need to accurately convert HTML to PDF. Acrobat renders text narrower than the browser does. This results in text that breaks in different positions. I need an identical ragged edge and the same number of lines in the converted PDF.
Edit:
#dtsazza: Thanks for your considered answer. It's not impossible to produce a layout editor that almost exactly replciates HTML I've written 99% of one ;)
The app I'm working on allows a user to create a product catalogue by dragging on 'tiles' The tiles are fixed width, absolutely positioned divs that contain images and text. All elemets are styled so font size is fixed. My solution for finding \n in paragraph is ok 80% of the time and when it works with a given paragrah the resulting PDF is so close to the on-screen version that the differences do not matter. Paragraphs are the same height (to the pixel), images are replaced with high res versions and all bitmap artwork is replaced with SVGs generated server side.
The only slight difference between my HTML and PDF is that Acrobat renderes text slightly more narrowly which results in line slightly shorter line length.
Diodeus's solution of adding span's and finding their coords is a very good one and should give me the location of the BRs. Please remember that the user will never see the HTML with the inserted BRs - these are added so that the PDF conversion produces a paragraph that is exactly the same size.
There are lots of people that seem to think this is impossible. I already have a working app that created extremely accurate HTML->PDF conversion of our docs - I just need a better solution of adding BRs because my solution sometimes misses a BR. BTW when it does work my paragraphs are the same height as the HTML equivalents which is the result we are after.
If anyone is interested in the type of doc i'm converting then you can check ou this screen cast:
http://www.localsa.com.au/brochure/brochure.html
Edit: Many thanks to Diodeus - your suggestion was spot on.
Solution:
for my situation it made more sense to wrap the words in spans instead of the spaces.
var text = paragraphElement.innerHTML.replace(/ /g, ' ');
text = ""+text+""; //wrap first and last words.
This wraps each word in a span. I can now query the document to get all the words, iterate and compare y position. When y pos changes add a br.
This works flawlessly and gives me the results I need - Thank you!

I would suggest wrapping all spaces in a span tag and finding the coordinates of each tag. When the Y-value changes, you're on a new line.

I don't think there's going to be a very clean solution to this one, if any at all. The browser will flow a paragraph to fit the available space, linebreaking where needed. Consider that if a user resizes the browser window, all the paragraphs will be rerendered and almost certainly will change their break positions. If the user changes the size of the text on the page, the paragraphs will be rerendered with different line break points. If you (or some script on your page) changes the size of another element on the page, this will change the amount of space available to a floating paragraph and again - different line break points.
Besides, changing the actual markup of your page to mimic something that the browser does for you (and does very well) seems like the wrong approach to whatever you're doing. What's the actual problem you're trying to solve here? There's probably a better way to achieve it.
Edit: OK, so you want to render to PDF the same as "the screen version". Do you have a specific definitive screen version nominated - in terms of browser window dimensions, user stylesheets, font preferences and adjusted font size? The critical thing about HTML is that it deliberately does not specify a specific layout. It simply describes what is on the page, what they are and where they are in relation to one another.
I've seen several misguided attempts before to produce some HTML that will exactly replicate a printed creative, designed in something like a DTP application where a definitive absolute layout is essential. Those efforts were doomed to failure because of the nature of HTML, and doing it the other way round (as you're trying to) will be even worse because you don't even have a definitive starting point to work from.
On the assumption that this is all out of your hands and you'll have to do it anyway, my suggestion would be to give up on the idea of mangling the HTML. Look at the PDF conversion software - if it's any good it should give you some options for font kerning and similar settings. Playing around with the details here should get you something that approximates the font rendering in the browser and thus breaks lines at the same places.
Failing that, all I can suggest is taking screenshots of the browser and parsing these with OCR to work out where the lines break (it shouldn't require a very accurate OCR since you know what the raw text is anyway, it essentially just has to count spaces). Or perhaps just embed the screenshot in the PDF if text search/selection isn't a big deal.
Finally doing it by hand is likely the only way to make this work definitively and reliably.
But really, this is still just wrong and any attempts to revise the requirements would be better. Keep going up one step in the chain - why does the PDF have to have the exact same ragged edge as some arbitrary browser rendering? Can you achieve that purpose in another (better) way?

Sounds like a bad idea when you account for user set font sizes, MS Windows accessibility mode, and the hundreds of different mobile devices. Let the browser do it's thing - trying to have exact control over the rendering will only cause you hours of frustration.

I don't think you'll be able to do this with any kind of accuracy without embedding Gecko/WebKit/Trident or essentially recreating them.

Maybe an alternative: do all line-breaks yourself, instead of relying on the browser. Place all text in pre tags, and add your own linebreaks. Now at least you don't have to figure out where the browser put them.

Photoshop select script

I spent all day attempting to write a javascript which selects all white pixels in a bitmap
I used a loop within a loop to iterate through all of the pixels one by one (the outer loop went through the vertical lines and inner loop went though the horizontal ones)
and used coloursampler to detect if the pixels RGB values were close enough to 255 or not.
anyway this code took a very very long time to complete
i literally saw the colour sample cursor move over every single pixel one at a time.
I the found out that I could record an action which selects all colours within a range from the whole image and call it from my script, and this worked instantly.
I am not surprised that my way was slow.
but that raises the question
How come Photoshop is able to scan a whole document for pixels which meet certain criteria using select>range and tools such as the magic wand and quick select yet my code runs so slow
Surely photoshop must need to scan each individual pixel so achieve such effects.

For elements of Photoshop which are not directly supported by the Javascript API (color range selection is one of them) I suggest you look into using the Adobe Scripting Listener plugin, and utilize the script listener's output for the core of your script.
I have written a tutorial on how to utilize the script listener for Color selection here. The tutorials use Python, but the overall concepts are exactly the same- the scripting listener even puts out a pure JS file for you!
This will be much faster than iterating over pixels, as it gives you access to the same fast tools and actions that are core parts of the Photoshop Application.
Hope that helps out.

Algorithm for finding empty cell around desired position in a grid

I will describe my problem using the attached image :
The green block is the starting position of my game entity. Next I'd like to move it to the position marked by the orange square. But at the same time, I assume that levitation is not possible or/& this block is a wall. In either case going there is not possible. So I need to figure out a way of finding the first available place (as close to the orange square as possible) for my entity to move (in this case it would be either the top of the grey column or point two rows beneath the orange square).
I have a 2d array describing the grid, where 1 is a wall and 0 is empty space.
data = [
[1,1,1,1,...],
[1,0,0,0,0,...],
[1,0,0,...],
...
]
I was thinking about solution in this way (where for example I can check at 1. if beneath my cell is floor and end the algo, or continue if not to cell 2.) but I can't think of a way of doing this efficiently (and easily).
Does anyone has any ideas how to tackle this ? I'm not really sure what algo should I ask google for :)

You are looking for Q-learning algorithms. This is a form of reinforcement learning. Here's one http://en.wikipedia.org/wiki/SARSA
Basically you run the simulation between source and destination multiple times and each time it gets close and closer to discovering the goal.

I think you can use Cellular Automata for your case, if it is worth the trouble. It is not AI per se, easy to implement and you can replace A* as well as the final position finding problem using one logic.
Consider the eight neighbourhood cells around the game entity. Each cell can be free or blocked (0 or 1). There will be 2^8 combinations of the neighbourhood, but you may or may not have to use that many rules for the CA.
Try looking into this: http://www.cs.sun.ac.za/rw711/2012term1/documents/CABehringPathPlanning.pdf
they implemented CA for path planning in robotics, you can tweak it to suit your need.
The advantage is, with proper rule set, your CA will terminate only when the game entity has reached the appropriate position around the goal (closest to the goal and not levitating).
You can also implement multiple rule sets on the system, thereby making it more robust.

Positioning multiple, random sized, absolutely positioned elements so they don't overlap

Ok I need to be able to position a bunch of random sized absolutely positioned words on a page but I don't want any of the elements to overlap.
The end goal is to have a fluid word cloud that responds to user interaction (remember the Google Balls Doodle?). I would really like to build this from scratch to develop my understanding of this type of development. Any help in this department would also be appreciated :)

I'm not sure if you also want to position the words randomly inside a container, but i've written a fiddle that does just that. You can modify the code to position one word right after the other if you want to though. I think the key part is the method to check if there's a collision.
see http://jsfiddle.net/fZtdt/13/
EDIT: Be aware that this is very simple and unoptimized code. If for example you would add to many words, chances are that the script won't be able to fit all words inside the container, and get into an endless loop.

I have forked Jules' script to add this improvement : the search for a non-overlapping region is bounded (otherwise the original script will loop I believe), and the best region (the one with the smallest overlap) is selected.
see http://jsfiddle.net/Vnyvc/21/
play with the maxSearchIterations variable and/or the size of the whole region, it really makes a difference.

Develop Reference

JavaScript is the programming language of the Web.