CKEditor setData adding P tag - javascript

When I pass HTML to a CKEditor instance, a P tag is inserted into the HTML, producing unexpected results.
For example, with the following code:
CKEDITOR.instances["myEditor"].setData("<div>1</div><div>2</div>");
the editor does not display them as block elements (it renders "12" inline). When I call getData(), I see that the HTML has been reformatted incorrectly as:
"<div>
<p>
1</div><div>2</div></p>
"
I've played with the enterMode configuration based on some research but haven't found a magic combination. Any suggestions? (I am using 3.6.5)

I figured it out: we were using a regex to strip out some tags when pasting, and it was also affecting the initial values.

Related

Is it possible to truncate an excerpt with DOM attributes? HTML/JS

I am trying to shorten an excerpt from the WordPress REST API. I tried the maxLength HTML attribute, but it does not work.
<p
dangerouslySetInnerHTML={{ __html: excerpt.rendered }}
maxLength={10}
/>
Is there any way I can handle it within JS/React?
Thanks in advance
Your question hardly has anything to do with React, it's more about html.
HTML's maxlength attribute applies only to input and textarea elements, so it does nothing when applied to a p tag.
HTML doesn't have anything similar for p natively, so, a custom implementation is needed. You obviously could do:
'Long text of the article that is going to be shortened'.slice(0, 9).concat('…')
which would produce Long text….
However, since you use dangerouslySetInnerHTML, I guess that excerpt.rendered contains HTML tags, so, you can't just slice it.
In this case, the easiest option would be to have 2 strings:
One containing ready-to-use markup.
One containing just text content.
If it's not an option, you may try to parse HTML & extract only text content (be cautious, it might produce unexpected results):
const parsedExcerpt = new DOMParser().parseFromString(excerpt.rendered, 'text/html');
const excerptText = parsedExcerpt.body.innerText.trim();
Now you could use excerptText.slice(0, 9).concat('…') (more or less) safely.
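The slicing step can be wrapped in a small helper so the ellipsis is only added when something was actually cut. This is a minimal sketch, not part of the original answer: truncateText is a hypothetical name, and it assumes the plain text has already been extracted (for example into excerptText as above).

```javascript
// Hypothetical helper: shorten already-extracted plain text to at most
// maxLength characters and append an ellipsis only when text was cut.
function truncateText(text, maxLength) {
  // Array.from splits by code point, so surrogate pairs (e.g. emoji) stay whole.
  const chars = Array.from(text.trim());
  if (chars.length <= maxLength) {
    return chars.join('');
  }
  // Drop trailing whitespace left by the cut before adding the ellipsis.
  return chars.slice(0, maxLength).join('').trimEnd() + '…';
}
```

In the component, the result would then be rendered as an ordinary text child, e.g. truncateText(excerptText, 9), instead of going through dangerouslySetInnerHTML.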

Parsing plain text Markdown from a ContentEditable div

I know there are other questions on editable divs, but I couldn't find one specific to the Markdown-related issue I have.
The user will be typing inside a contenteditable div and may choose to do any number of Markdown-related things, such as code blocks, headers, and so on.
I am having issues extracting the source properly and storing it into my database to be displayed again later by a standard Markdown parser. I have tried two ways:
$('.content').text()
In this method, the problem is that all the line breaks are stripped out and of course that is not okay.
$('.content').html()
In this method, I can get the line breaks working fine by using regex to replace <br\> with \n before inserting into database. But the browser also wraps things like ## Heading Here with divs, like this: <div>## Heading Here</div>. This is problematic for me because when I go to display this afterwards, I don't get the proper Markdown formatting.
What's the best (most simple and reliable) way to solve this problem as of 2015?
EDIT: Found a potential solution here: http://www.davidtong.me/innerhtml-innertext-textcontent-html-and-text/
If you check the documentation of jQuery's .text() method, it says:
The result of the .text() method is a string containing the combined text of all matched elements. (Due to variations in the HTML parsers in different browsers, the text returned may vary in newlines and other white space.)
So preserving whitespace is not guaranteed in all browsers.
Try using the innerText property of the element instead:
document.getElementsByClassName('content')[0].innerText
This returns the text with all whitespace intact, but it is not cross-browser compatible. It works in IE and Chrome, but not in Firefox.
The innerText equivalent for Firefox is textContent, but that strips out the whitespace.
This is what I've been able to come up with using that link I posted above in my edit. It's in Coffeescript.
div = $('.content')[0]
if div.innerText
  text = div.innerText
else
  escapedText = div.innerHTML
    .replace(/(?:\r\<br\>|\r|\<br\>)/g, '\n')
    .replace(/(\<([^\>]+)\>)/gi, "")
  text = _.unescape(escapedText)
Basically, I'm checking whether or not innerText works, and if it doesn't then we do this other thing where we:
Take the HTML, which has escaped text.
Replace all the <br> tags with line breaks.
Strip out any tags (escaped ones won't be stripped, i.e. the stuff the user types).
Unescape the escaped text.
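The fallback steps above can also be sketched in plain JavaScript without jQuery or Underscore. This is an illustrative port under assumptions, not the author's code: htmlToPlainText is a hypothetical name, and the small entity table stands in for _.unescape and covers only a handful of common entities.

```javascript
// Entities handled by this sketch; a stand-in for _.unescape.
const ENTITIES = {
  '&amp;': '&',
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"',
  '&#39;': "'",
};

// Hypothetical helper mirroring the fallback branch: takes the innerHTML
// string of the editable div and returns plain text.
function htmlToPlainText(html) {
  // 1. Replace <br> tags (and stray carriage returns) with line breaks.
  const withNewlines = html.replace(/\r?<br\s*\/?>|\r/gi, '\n');
  // 2. Strip real tags; text typed by the user is escaped and survives.
  const withoutTags = withNewlines.replace(/<[^>]+>/g, '');
  // 3. Unescape in a single pass so "&amp;lt;" becomes "&lt;", not "<".
  return withoutTags.replace(/&(?:amp|lt|gt|quot|#39);/g, (m) => ENTITIES[m]);
}
```

Like the CoffeeScript version, this does not insert a line break between lines that the browser wrapped in separate divs; that case would need extra handling.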

How to retrieve the text in html CDATA section?

I have the following script element section in HTML:
<script type="text/x-markdown"><![CDATA[
# hello, This is Markdown Script Demo]]></script>
When I try to retrieve the inner content via scripttag.innerHTML, it returns the text including the <![CDATA[...]]> parts.
Is there a more efficient way to retrieve the inner part of the CDATA section directly, instead of applying a regexp to remove the markers from the innerHTML data?
I don't think you will be able to retrieve only what's inside the CDATA, as it's not a tag but plain text. When you get the innerHTML of the tag you get everything as a string, so a regexp is the only way I can see to get what's inside.
CDATA is an XML concept. It is a way of specifying a section of text inside which things that look like mark-up or special XML characters are treated as plain text. It is essentially equivalent to escaping < to &lt; etc. everywhere within the CDATA section.
If the document has an HTML doctype, then the CDATA receives no special processing and is just more characters. If the document had an XHTML doctype, then you would be able to retrieve the CDATA section as is, with no further ado.
This question is quite old, but this might help somebody.
You can probably use textContent.
Example from parsing a rss feed node which looks like this:
<title><![CDATA[This contains the title]]></title>
Javascript:
const desc = el.querySelector('title').textContent;
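For the HTML-doctype case, where the CDATA markers come back as ordinary characters in innerHTML, the regexp approach mentioned in the answers can be sketched as follows. stripCdata is a hypothetical helper name, and the sketch assumes the string holds at most one CDATA wrapper around the whole content.

```javascript
// Hypothetical helper: remove a single <![CDATA[ ... ]]> wrapper from a
// string such as the innerHTML of a script element in an HTML document.
function stripCdata(raw) {
  // [\s\S] matches any character including newlines; the lazy quantifier
  // plus the end anchor makes the match stop at the final "]]>".
  const match = raw.match(/^\s*<!\[CDATA\[([\s\S]*?)\]\]>\s*$/);
  // Return the input unchanged when there is no CDATA wrapper.
  return match ? match[1] : raw;
}
```

It would be called as stripCdata(scripttag.innerHTML), where scripttag is the script element from the question.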

why javascript string replace using regex removes a "/" from my br tag

I'm using javascript with a super simple regex to replace a "<" with the HTML character code for it so I can place some code on my site using the pre and code tags and have it done automatically.
jsFiddle link
basically I'm trying to figure out why this js code:
var str = document.getElementById("cleanme").innerHTML;
str=str.replace(/</g,"&lt;");
document.getElementById("cleanme").innerHTML = str;
removes the "/" in the br tag
<pre><code id="cleanme">
<p><br />this is some code</p>
</code></pre>
not a huge deal because I'm just displaying code, but I'd still like to know.
it outputs this:
<p><br>this is some code</p>
thanks
I believe it has to do with the way certain browsers return the innerHTML property. If you use Google Chrome and inspect any <br /> tag using the debugging tools, you'll notice it is shown without the slash. The same is true when Chrome returns an innerHTML property: the slash is stripped out.
So when you pass in:
<pre><code id="cleanme">
<p><br />this is some code</p>
</code></pre>
The browser returns an innerHTML property of:
<pre><code id="cleanme">
<p><br>this is some code</p>
</code></pre>
Your RegEx is not the issue.
Your script is OK.
If you try this:
var str = '<p><br />this is some code</p>';
str=str.replace(/</g,"&lt;");
str=str.replace(/>/g,"&gt;");
document.getElementById("cleanme").innerHTML = str;
It'll correctly print <br />.
Possibly it's an effect of the browser's HTML normalization.
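The string-based escaping shown above can be collected into one function. This is a minimal sketch with a hypothetical name, escapeHtml; it also escapes the ampersand, which the snippets in this thread do not, because otherwise an entity that is already present in the source text would be rendered rather than displayed.

```javascript
// Hypothetical helper: escape a source string so it can be shown as code
// inside <pre><code>. The ampersand must be replaced first, otherwise the
// "&" of the entities produced by the later steps would be escaped again.
function escapeHtml(source) {
  return source
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}
```

Because it works on the original string rather than on innerHTML, the browser never gets a chance to normalize <br /> before it is escaped.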
Maybe too late to help you, and you've accepted a correct answer, but there's another big potential problem.
I tried this with Firefox 3.6.11 on Linux and 3.6.12 on Windows and they both behaved the same --
I did not see the <p><br>this is some code</p> in the Result pane on your fiddle, instead I saw simply this is some code with no markup at all.
Throwing firebug at it by adding a debugger; statement as the first line in the JavaScript pane and tracing through it, I found that str was getting a value of '\n', that is, just a newline was being returned from innerHTML and nothing else.
Thinking about this, but with no way to confirm it, I suspect it's because Firefox is building the DOM tree differently than you expect, because the HTML you're using is invalid. Inline elements are not allowed to contain block elements; specifically, the <code> tag is not allowed to contain a <p> tag, and <pre> is likewise not allowed to contain a <p> tag (again, only limited inline elements can be used inside a <pre> tag).
I think FF is implicitly closing the code block before opening the paragraph so the innerHTML of id="cleanme" is nothing but the newline. It renders with the "pre" font as you expect because you've thrown the browser into Quirks Mode.
innerHTML does not return the literal source code, but the result of the browser's interpretation of it.
Different browsers will return very different results for innerHTML, sometimes omitting some quotes and 'optional' end tags, capitalizing some tag names and attributes, and collapsing extra white-space.
And HTML does not close tags that can't have end tags, such as <br>, so the closing slash is not included either.

Content inside CDATA is not displayed properly when processed through JavaScript

I have an XML document with some sample content like this:
<someTag>
<![CDATA[Hello World]]>
</someTag>
I'm parsing the above XML in JavaScript. When I try to access and render the Hello World text using xmldoc.getElementsByTagName("someTag")[0].childNodes[0].textContent, all I get is blank text on screen.
The code is not returning undefined or any error messages, so I guess it is accessing the message properly, but because of the CDATA it is not rendering properly on screen.
Is there any way to fix the issue and get Hello World out of this XML file?
Note that Firefox's behaviour is absolutely correct. someTag has three children:
A Text node containing the whitespace between the <someTag> and <![CDATA[. This is one newline and one space;
the CDATASection node itself;
another whitespace Text node containing the single newline character between the end of the CDATA and the close-tag.
It's best not to rely closely on what combination of text and CDATA nodes might exist in an element if all you want is the text value inside it. Just call textContent on <someTag> itself and you'll get all the combined text content: '\n Hello World\n'. (You can .trim() this if you like.)
If you're running Firefox, maybe this is the issue you're having. The behavior looks very similar. The following might do the trick:
xmldoc.getElementsByTagName("someTag")[0].childNodes[1].textContent;
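If only the CDATA content is wanted, regardless of where the whitespace text nodes fall, the children can be filtered by node type instead of by index. This is a sketch with a hypothetical helper name, cdataText; it relies only on the standard childNodes, nodeType and nodeValue properties and on the DOM constant value 4 for CDATA section nodes.

```javascript
// Same value as Node.CDATA_SECTION_NODE in the DOM.
const CDATA_SECTION_NODE = 4;

// Hypothetical helper: concatenate the content of every CDATA section
// that is a direct child of the given element, skipping Text nodes.
function cdataText(element) {
  let text = '';
  for (const child of Array.from(element.childNodes)) {
    if (child.nodeType === CDATA_SECTION_NODE) {
      text += child.nodeValue;
    }
  }
  return text;
}
```

With the document from the question it would be called as cdataText(xmldoc.getElementsByTagName("someTag")[0]).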
