Django CMS / WYMEditor Stop Stripping of Whitespace

I'm aware of what WYMEditor is all about and that using paragraphs for spacing is not intended; however, the client requires that we give them this functionality.
I've looked high and low to find where WYMEditor does its stripping of whitespace and can't find it at all.
It seems that when you press Enter it creates a P visually, but the source view doesn't contain it. Furthermore, manually editing the HTML source to contain <p> </p> doesn't work, as WYMEditor strips it out.
Has anybody had this issue before and found a way to disable this behaviour? It's worth noting that I believe the replacement happens both in the 'text' module of Django-CMS and in the JavaScript for WYMEditor.

It turns out that the function that does this stripping has a very simple name; for some reason I missed it in (multiple!) searches for the word 'empty' in the script file.
It's located in jquery.wymeditor.js, around line 3440: the function WYMeditor.XhtmlSaxListener.prototype.removeEmptyTags. Simply stop the replacement by returning the input unchanged:
WYMeditor.XhtmlSaxListener.prototype.removeEmptyTags = function(xhtml)
{
    // Original replacement, disabled so that empty block tags are kept:
    // .replace(new RegExp('<('+this.block_tags.join("|").replace(/\|td/,'').replace(/\|th/, '')+')>(<br \/>|&#160;|&nbsp;|\\s)*<\/\\1>' ,'g'),'');
    return xhtml;
};
That obviously stops the stripping of whitespace!
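To see what the disabled replacement was doing, here is a minimal standalone sketch of the same pattern. It assumes a shortened, hypothetical block tag list (the real list lives in WYMeditor's block_tags) and assumes the entity alternatives in the pattern are &#160; and &nbsp;. The helper name buildEmptyTagPattern is made up for illustration.

```javascript
// Hypothetical, shortened tag list; WYMeditor keeps its own in block_tags.
const blockTags = ['p', 'div', 'h1', 'td', 'th'];

// Build the "empty block tag" pattern the same way the original line does:
// join the tag names, drop td and th, then match a tag that contains only
// <br />, non-breaking-space entities, or whitespace.
function buildEmptyTagPattern(tags) {
  const names = tags.join('|').replace(/\|td/, '').replace(/\|th/, '');
  return new RegExp('<(' + names + ')>(<br \\/>|&#160;|&nbsp;|\\s)*<\\/\\1>', 'g');
}

const html = '<p>Hello</p><p>&nbsp;</p><p>World</p><td></td>';
const stripped = html.replace(buildEmptyTagPattern(blockTags), '');
console.log(stripped);
```

The spacer paragraph is removed while the empty table cell survives, which is why returning xhtml untouched keeps the client's spacing paragraphs.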

Related

HTML generated with Sphinx removes the word "module"

A colleague has pointed out to me this peculiar behavior. We use Sphinx to generate the HTML documentation of our project, and whenever we use the word 'module' in our text, this word is removed by the browser.
After some inspection, I have noticed that the word does appear in the source code of the HTML page:
but the rendered result looks like:
where clearly the word 'module' has been removed.
Further inspection of the HTML source code revealed the following JavaScript code, which seems to be the culprit:
So I wonder what generates this script: is it Sphinx or one of its extensions? Is there any workaround to get the word 'module' displayed in the rendered HTML?
This is the list of Sphinx extensions we have activated:
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.napoleon',
'sphinx.ext.viewcode',
'sphinx.ext.todo',
'sphinx_rtd_theme',
'sphinx.ext.autosectionlabel',
'sphinxcontrib.email',
'sphinxcontrib.bibtex',
'sphinx.ext.graphviz',
'sphinx_git',
]
UPDATE:
I have found the real cause of my problem. For some reason unknown to me, I had a layout.html file in my /source/_templates folder. This file seems to extend the default layout, adding the JavaScript function that removes the word 'module'. The next step will be to find out how that file got there...

I seem to have come up with a workaround (although not the most satisfying solution).
I guessed that this JavaScript code works in a single pass, so typing 'modulemodule' instead of 'module' would result in the removal of one occurrence, leaving a final 'module'. And it worked!
To make it slightly better, I have created a substitution, .. |module| replace:: modulemodule, which I store in another file and include whenever needed. Now I only need to write my sentence like: Hello this is a  |module|. (notice the double space between a and |module|)
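The template's actual script is not shown here, so the following is only a sketch of why the doubling trick can work. It assumes the script calls String.prototype.replace with a plain string pattern, which replaces only the first occurrence; both function names are hypothetical.

```javascript
// Hypothetical stand-in for the template script: a string pattern makes
// replace() remove only the FIRST occurrence of "module".
function stripFirstModule(text) {
  return text.replace('module', '');
}

// For contrast: a global regex removes EVERY occurrence, in which case
// doubling the word would not help.
function stripEveryModule(text) {
  return text.replace(/module/g, '');
}

const sentence = 'Hello this is a modulemodule.';
console.log(stripFirstModule(sentence));
console.log(stripEveryModule(sentence));
```

If the doubled word survives as a single 'module', the script is behaving like the first function rather than the second.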

How to paste plain text in TinyMCE without extra newlines?

I am having a problem in TinyMCE when I use paste_as_text: true in conjunction with forced_root_block: false. Pasting text that is already plain works fine, but pasting from Word adds an extra <br> tag at every newline. I can't simply strip these out, because that would break legitimate double newlines in plain text.
I have noticed that pasting with Ctrl-Shift-V fixes this issue, and I would love to make that the default pasting method, but I can't find out how.
I'm currently trying to write a parser to use in paste_preprocess, but since the correct result is clearly achievable in other ways, I figure there must be a better solution.

Pasting from Microsoft Word is broken in most copy-and-paste/Clipboard APIs. You will need to modify Newline.js or Clipboard.js manually.
For example, replace line 63 in Newline.js:
return p.split(/\n/).join('<br />');
with:
return p.replace(/\r?\n/g, '<br>');
If you can open an issue on the plugin page, I will create a proper pull request.
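The difference between the two lines can be checked in isolation. This sketch assumes the text pasted from Word uses Windows line endings (\r\n), which would explain the stray line breaks; the function names and sample string are made up for illustration.

```javascript
// Original behaviour: split on \n only, so the \r of a \r\n pair is left behind.
function joinWithSplit(p) {
  return p.split(/\n/).join('<br />');
}

// Suggested behaviour: treat \r\n and \n alike as a single line break.
function joinWithReplace(p) {
  return p.replace(/\r?\n/g, '<br>');
}

const pasted = 'first line\r\nsecond line'; // assumed Windows line ending
console.log(JSON.stringify(joinWithSplit(pasted)));
console.log(JSON.stringify(joinWithReplace(pasted)));
```

With the split version a carriage return remains in front of the <br />, which later processing may treat as another newline.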

Chrome inserts non-breaking spaces into copy and pasted content

I'm talking about content from inside a contenteditable div, and the target is the same contenteditable div. So no external programs involved.
The structure of the HTML in this div is that each individual word is inside a span with some data we need to track. Then the whitespace is left as text nodes between the spans. This works fine for the most part (screw you newlines) but I've encountered a strange problem when copy and pasting.
Chrome turns this
<span attrs="stuff">word</span> <span attrs="stuff">another</span>
into this:
<span attrs="stuff">word&nbsp;</span><span attrs="stuff">another</span>
or this:
<span attrs="stuff">word</span><span style="line-height: 16.79999">&nbsp;</span><span attrs="stuff">another</span>
This obviously means that if the user copies and pastes more than one line, the formatting is completely screwed up, and the content of the span has changed, which invalidates the data we need to track.
The core problem is that other content in the div may contain non-breaking spaces for real reasons, so if I start swapping them out globally, I might break that.
For my own spans with my attrs, I know what should be in them, so it's easy to strip out the non-breaking spaces and restore them to how they should be. But for these strange spans with the odd line height, I've no idea how to clean them out without nuking everything.
Right now, I strip all the inserted spans that contain just a non-breaking space. But what I'd really like is either to stop Chrome from doing this in the first place, or an unambiguous means of identifying the problematic extra spans so that I can clean them up safely without breaking similar spans that exist for real reasons. I could use this strange line-height, I guess, but that feels pretty brittle and unsafe.
How can I prevent the spans from appearing or identify them unambiguously?
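One possible heuristic, sketched under the assumption that every tracked span always carries its own attribute (attrs in the sample markup) while the spans Chrome inserts carry nothing but style. The names isInsertedSpacer and cleanSpacers are hypothetical, and this is no more unambiguous than the question fears: a legitimate style-only span holding a single non-breaking space would match too.

```javascript
const NBSP = '\u00A0';

// True for a <span> that has no attribute other than style and whose whole
// text is one non-breaking space. Works on DOM elements and on plain objects
// exposing tagName, attributes, and textContent.
function isInsertedSpacer(el) {
  if (el.tagName.toLowerCase() !== 'span') return false;
  const names = Array.from(el.attributes, (attr) => attr.name);
  const onlyStyle = names.every((name) => name === 'style');
  return onlyStyle && el.textContent === NBSP;
}

// Browser-only helper: swap each matching span for a normal space text node.
function cleanSpacers(root) {
  for (const span of Array.from(root.querySelectorAll('span'))) {
    if (isInsertedSpacer(span)) span.replaceWith(' ');
  }
}
```

Calling cleanSpacers on the contenteditable element after a paste event would restore the plain text node between the word spans.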

This is not only a Chrome problem. Whenever you copy HTML code from somewhere, something like this can happen.
This is why editors like CKEditor exist. They have advanced filtering techniques to remove such bad HTML code.
I recommend using a clipboard viewer to see what the HTML code looks like when you copy from different places: https://softwarerecs.stackexchange.com/questions/17710/see-clipboard-contents-hex-text
But implementing this filtering on your own would be a waste of time, in my opinion.
CKEditor can be configured in detail to prevent the bad HTML code.
Recent versions of CKEditor have a very sophisticated content filtering approach, called "Advanced Content Filter".
Basically, "Advanced Content Filter" means that the whole HTML code is parsed and checked against a set of rules. Any HTML that no rule matches is filtered out.
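As a rough illustration of such rules, here is a configuration sketch for CKEditor 4 using its allowedContent setting. It assumes the tracked spans carry an attribute literally named attrs, as in the question's sample markup, and 'editor' is a hypothetical element id. The exact rules would need tuning against the real content.

```javascript
// Allowed-content rules in CKEditor 4's string syntax: element names, then
// the permitted attributes in square brackets. Content that no rule matches
// is filtered out.
const editorConfig = {
  // Assumption: only paragraphs, line breaks, and spans with an "attrs"
  // attribute are wanted; inline styles on spans are not listed, so this
  // rule does not permit them.
  allowedContent: 'p br; span[attrs]'
};

// Only runs where CKEditor 4 is loaded; "editor" is a hypothetical element id.
if (typeof CKEDITOR !== 'undefined') {
  CKEDITOR.replace('editor', editorConfig);
}
```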

How to make a contenteditable div produce smart quotes instead of dumb quotes?

When I write stuff in Google Docs I noticed I get the “” (smart quotes). But when I create contenteditable divs on my own I get "" (dumb quotes).
How to make my contenteditable divs produce smart quotes instead of dumb quotes?

'content area'... What 'content area'??
On SO, the textarea where you type an answer?
Or what you see when you post the answer?
Or just any div with property 'contenteditable'?
In all cases I'd be pretty <not feeling so nice> if good quotes (that I type/copy/paste) automatically get replaced by those pesky curly/smart quotes (my coder-tainted opinion). So (I'd hate to be proven wrong): Google Docs probably replaces them for you (just like Word (by default), from which you could also get 'infected' by them by opening/importing Word-files or simply copying text from Word to Google Docs) and I'm betting one can turn that off in Google Docs.
Thank <enter deity here>: (in Google Docs) click on "Preferences" under the "Tools" menu in an open document, then uncheck "Use Smart Quotes"
So, IF you'd want that 'feature' in your project:
Be sure one can turn it off and that the setting is easy to find.
Replace the straight quotes with the corresponding curly ones, either before the content enters your DB or when outputting the stored content (from your DB) to your 'content area'.
One could/should use JavaScript on the client side to show the user how their input is going to look, preferably in a preview (like on SO), so as not to interfere with the user's typing and send the cursor jumping around, for which you'd go searching for a jumbo-jet-weight cross-browser library, with all the follow-up problems that brings.
EDIT: I highlighted the word 'replace' once more (in light of the edited question).
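The replacement step itself can be sketched with a few regular expressions. This is a simple heuristic, not what Google Docs does: a quote at the start of the text or after whitespace or an opening bracket is treated as opening, anything else as closing. The function name smartenQuotes is made up for illustration.

```javascript
// Replace straight quotes with curly ones using a position heuristic.
function smartenQuotes(text) {
  return text
    // A double quote at the start, or after whitespace or an opening bracket, opens.
    .replace(/(^|[\s(\[{])"/g, '$1\u201C')
    // Every remaining double quote closes.
    .replace(/"/g, '\u201D')
    // Same rule for single quotes; one right after an opening double quote also opens.
    .replace(/(^|[\s(\[{\u201C])'/g, '$1\u2018')
    // Remaining single quotes are closing quotes or apostrophes.
    .replace(/'/g, '\u2019');
}

console.log(smartenQuotes('"Hello," she said. "It\'s fine."'));
```

Running this on stored text or in a preview pane, rather than on the contenteditable div while the user types, avoids the cursor problems mentioned above.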

jQuery adding unwanted characters to string?

I'm using jQuery to add localised currency signs to a page. Seemingly an innocent and straightforward procedure:
$('.currency').text( userCurrency() )
However, if the currency string is £, instead the output is Â£.
I've got no idea what might be causing it, as I cannot recreate the issue in jsFiddle.
It doesn't happen in Firefox, IE or Safari, only in Chrome.
By setting a breakpoint at the function call, it is possible to see that the text is not actually visible (or loaded?) in the browser, even though the code only runs after the window has loaded.
I understand I'm not giving you much to work with, and I'm sorry - this is a very bizarre issue indeed.
Has anyone out there encountered anything similar, maybe someone has ideas how I can go about troubleshooting?

This is due to a mismatch between encodings. Make sure all documents are saved as UTF-8 and set the meta tag for UTF-8. Also, as Qambar Raza's answer mentions, you should probably use the HTML entity for certain characters instead of the actual character.
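The Â£ output is what appears when the UTF-8 bytes of £ are read as a single-byte encoding. A small sketch that reproduces the effect (the function name is made up; it simulates a Latin-1 reading by turning each byte into one character):

```javascript
// Encode the text as UTF-8, then misread every byte as one Latin-1 character.
function misreadUtf8AsLatin1(text) {
  const bytes = new TextEncoder().encode(text); // "\u00A3" becomes 0xC2 0xA3
  return String.fromCharCode(...bytes);
}

// "\u00A3" is the pound sign; the result is the two characters "\u00C2\u00A3".
console.log(misreadUtf8AsLatin1('\u00A3'));
```

So the script file is most likely saved as UTF-8 but served or interpreted with a different charset.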

You need to use this in the <head> of your HTML page:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
It declares that your page is encoded as UTF-8, so the browser can decode currency symbols correctly.
Also, make sure you use HTML entities like:
&pound; for £
i.e. your function userCurrency() should return HTML entities (e.g. &pound;) rather than the symbol itself. Note that .text() inserts the string literally, so an entity has to be inserted with .html() instead. You can find more details about this topic at the following link:
http://webdesign.about.com/od/localization/l/blhtmlcodes-cur.htm
