I am finding that, for my purposes, XML namespaces cause a lot of headaches and are completely unnecessary. (For example, they complicate XPath.)
Is there a simple way to remove namespaces entirely from an XML document?
(There is a related question, but it deals with removing namespace prefixes on tags, rather than namespace declarations from the document root: "Easy way to drop XML namespaces with javascript".)
Edit: Samples and more detail below:
XML:
<?xml version="1.0" ?>
<main xmlns="example.com">
<primary>
<enabled>true</enabled>
</primary>
<secondary>
<enabled>false</enabled>
</secondary>
</main>
JavaScript:
function useHttpResponse()
{
if (http.readyState == 4)
{
if(http.status == 200)
{
var xml = http.responseXML;
var evalue = getXMLValueByPath('/main/secondary/enabled', xml);
alert(evalue);
}
}
}
function getXMLValueByPath(nodepath, xml)
{
var result = xml.evaluate(nodepath, xml, null, XPathResult.STRING_TYPE, null).stringValue;
return result;
}
The sample XML is just like the actual one I am working with, albeit much shorter. Notice that there are no prefixes on the tags for the namespace. I assume this is the null or default namespace.
The JavaScript is a snippet from my ajax functions. If I remove the xmlns="example.com" portion from the main tag, I am able to successfully get the value. As long as any namespace is present, the value becomes undefined.
Edit 2:
It may be worth mentioning that none of the declared namespaces are actually used in the XML tags (like the sample above). In the actual XML file I am working with, three namespaces are declared, but no tags are prefixed with a namespace reference. Thus, perhaps the question should be re-titled, "How to remove unused XML namespaces using Javascript?" I do not see the reason to retain a namespace if it is 1) never used and 2) complicating an otherwise simple path to a node using xpath.
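One way to keep a simple path without touching the document is to match elements by local name only, which ignores whatever namespace they are in. The sketch below only builds such an XPath string; `toLocalNamePath` is a hypothetical helper name, and it assumes the path is a plain list of element names separated by slashes (no predicates, attributes, or wildcards).

```javascript
// Hypothetical helper: turn a simple absolute path such as
// '/main/secondary/enabled' into an XPath that matches on local names only.
function toLocalNamePath(path) {
  return path
    .split('/')
    .filter(function (step) { return step.length > 0; }) // drop the empty leading step
    .map(function (step) { return "/*[local-name()='" + step + "']"; })
    .join('');
}

console.log(toLocalNamePath('/main/secondary/enabled'));
// /*[local-name()='main']/*[local-name()='secondary']/*[local-name()='enabled']
```

In a browser the result could be passed to xml.evaluate() in place of the original path; the trade-off is that elements with the same local name in different namespaces can no longer be told apart.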
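This should remove any namespace declaration you find. Note that responseXML is a Document, not a string, so the replacement has to run on responseText and the result has to be parsed again:
var text = http.responseText.replace(/\sxmlns(:\w+)?="[^"]*"/g, "");
var xml = new DOMParser().parseFromString(text, "text/xml");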
To remove all the xmlns attributes from an XML string in JavaScript,
you can try the following regex:
xmlns=\"(.*?)\"
NB: The same pattern can be adapted to remove any other attribute by changing the attribute name.
var str = `<?xml version="1.0" ?>
<main xmlns="example.com">
<primary>
<enabled>true</enabled>
</primary>
<secondary>
<enabled>false</enabled>
</secondary>
</main>`;
str = str.replace(/xmlns=\"(.*?)\"/g, '');
console.log(str)
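The pattern above only covers the default declaration written with double quotes. A slightly broader sketch, under the assumption that declarations never appear inside comments, CDATA sections or text content, also handles prefixed declarations (xmlns:foo) and single quotes. The function name is made up for illustration.

```javascript
// Hypothetical helper: strip default and prefixed namespace declarations
// from an XML string. It does NOT rename prefixed elements such as <a:item>.
function stripNamespaceDeclarations(xmlString) {
  return xmlString.replace(
    /\s+xmlns(?::[\w.-]+)?\s*=\s*(?:"[^"]*"|'[^']*')/g,
    ''
  );
}

console.log(
  stripNamespaceDeclarations('<main xmlns="example.com" xmlns:a=\'urn:a\'><primary/></main>')
);
// <main><primary/></main>
```

Because the leading whitespace is consumed together with the attribute, no stray space is left behind in the start tag.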
An approach without regex (note that it also drops all attributes):
let xml = ''; // input XML string
let doc = new DOMParser().parseFromString(xml, "text/xml");
let root = doc.documentElement;
// Create the copy in a plain XML document: createElement() on an HTML document lowercases tag names
let newdoc = new Document();
newdoc.appendChild(removeNameSpace(root));

// Recursively rebuild an element tree using only local names (no namespaces)
function removeNameSpace(root) {
    let parentElement = newdoc.createElement(root.localName);
    let nodeChildren = root.childNodes;
    for (let i = 0; i < nodeChildren.length; i++) {
        let node = nodeChildren[i];
        if (node.nodeType == 1) { // element nodes only
            let child;
            if (node.childElementCount != 0) {
                child = removeNameSpace(node);
            } else {
                child = newdoc.createElement(node.localName);
                child.append(newdoc.createTextNode(node.textContent));
            }
            parentElement.append(child);
        }
    }
    return parentElement;
}
Related
I'm trying to create the following element nodetree:
<v:custProps>
<v:cp v:nameU="Cost">
</v:custProps>
with:
newCustprop = document.createElement("v:custProps");
newcp = document.createElement("v:cp");
newcp.setAttribute("v:nameU", "Cost");
newCustprop.appendChild(newcp);
However, document.createElement("v:custProps") generates <v:custprops> as opposed to <v:custProps>. Is there any way to avoid this lowercasing?
Edit 1:
I'm currently reading this article on nodename case sensitivity. It's slightly irrelevant to my problem though because my code is unparsed with <![CDATA]]> and I'd rather not use .innerHTML.
You need to use createElementNS()/setAttributeNS() and provide the namespace, not only the alias/prefix. The example uses urn:v as namespace.
var xmlns_v = "urn:v";
var newCustprop = document.createElementNS(xmlns_v, "v:custProps");
var newcp = document.createElementNS(xmlns_v, "v:cp");
newcp.setAttributeNS(xmlns_v, "v:nameU", "Cost");
newCustprop.appendChild(newcp);
var xml = (new XMLSerializer).serializeToString(newCustprop);
xml:
<v:custProps xmlns:v="urn:v"><v:cp v:nameU="Cost"/></v:custProps>
It's not recommended to use document.createElement for qualified names. See if the document.createElementNS can better serve your purposes.
I still had issues where createElementNS would attach an "xmlns" attribute to my string when using new XMLSerializer().serializeToString(xmlDoc).
I ended up using the following function to create elements with case sensitive tag names:
function createElement(tagName) {
const doc = new DOMParser().parseFromString(`<${tagName}></${tagName}>`, 'text/xml')
return doc.children[0]
}
I want to remove HTML tags from a given string using JavaScript. I looked into the current approaches, but some problems with them remain unsolved.
Current solutions
(1) Using javascript, creating virtual div tag and get the text
function remove_tags(html)
{
var tmp = document.createElement("DIV");
tmp.innerHTML = html;
return tmp.textContent||tmp.innerText;
}
(2) Using regex
function remove_tags(html)
{
return html.replace(/<(?:.|\n)*?>/gm, '');
}
(3) Using JQuery
function remove_tags(html)
{
return jQuery(html).text();
}
These three solutions work correctly, but if the string is like this
<div> hello <hi all !> </div>
the stripped string is
hello . But I need to remove only HTML tags, so the result should be hello <hi all !>
Edited: The background is that I want to remove all HTML tags a user enters in a particular text area, while still allowing users to enter text like <hi all>. The current approaches remove any content enclosed in <>.
Using a regex might not be a problem if you consider a different approach. For instance, looking for all tags, and then checking to see if the tag name matches a list of defined, valid HTML tag names:
var protos = document.body.constructor === window.HTMLBodyElement;
validHTMLTags =/^(?:a|abbr|acronym|address|applet|area|article|aside|audio|b|base|basefont|bdi|bdo|bgsound|big|blink|blockquote|body|br|button|canvas|caption|center|cite|code|col|colgroup|data|datalist|dd|del|details|dfn|dir|div|dl|dt|em|embed|fieldset|figcaption|figure|font|footer|form|frame|frameset|h1|h2|h3|h4|h5|h6|head|header|hgroup|hr|html|i|iframe|img|input|ins|isindex|kbd|keygen|label|legend|li|link|listing|main|map|mark|marquee|menu|menuitem|meta|meter|nav|nobr|noframes|noscript|object|ol|optgroup|option|output|p|param|plaintext|pre|progress|q|rp|rt|ruby|s|samp|script|section|select|small|source|spacer|span|strike|strong|style|sub|summary|sup|table|tbody|td|textarea|tfoot|th|thead|time|title|tr|track|tt|u|ul|var|video|wbr|xmp)$/i;
function sanitize(txt) {
var // This regex normalises anything between quotes
normaliseQuotes = /=(["'])(?=[^\1]*[<>])[^\1]*\1/g,
normaliseFn = function ($0, q, sym) {
return $0.replace(/</g, '&lt;').replace(/>/g, '&gt;');
},
replaceInvalid = function ($0, tag, off, txt) {
var
// Is it a valid tag?
invalidTag = protos &&
document.createElement(tag) instanceof HTMLUnknownElement
|| !validHTMLTags.test(tag),
// Is the tag complete?
isComplete = txt.slice(off+1).search(/^[^<]+>/) > -1;
return invalidTag || !isComplete ? '&lt;' + tag : $0;
};
txt = txt.replace(normaliseQuotes, normaliseFn)
.replace(/<(\w+)/g, replaceInvalid);
var tmp = document.createElement("DIV");
tmp.innerHTML = txt;
return "textContent" in tmp ? tmp.textContent : tmp.innerHTML;
}
Working Demo: http://jsfiddle.net/m9vZg/3/
This works because browsers parse '>' as text if it isn't part of a matching '<' opening tag. It doesn't suffer the same problems as trying to parse HTML tags using a regular expression, because you're only looking for the opening delimiter and the tag name, everything else is irrelevant.
It's also future proof: the WebIDL specification tells vendors how to implement prototypes for HTML elements, so we try and create a HTML element from the current matching tag. If the element is an instance of HTMLUnknownElement, we know that it's not a valid HTML tag. The validHTMLTags regular expression defines a list of HTML tags for older browsers, such as IE 6 and 7, that do not implement these prototypes.
If you want to keep invalid markup untouched, a regular expression is your best bet. Something like this might work:
text = html.replace(/<\/?(span|div|img|p...)\b[^<>]*>/g, "")
Expand (span|div|img|p...) into a list of all tags (or only those you want to remove). NB: the list must be sorted by length, longer tags first!
This may provide incorrect results in some edge cases (like attributes with <> characters), but the only real alternative would be to program a complete html parser by yourself. Not that it would be extremely complicated, but might be an overkill here. Let us know.
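A minimal sketch of that idea, building the pattern from a list of tag names instead of writing the alternation by hand. The function name and the short tag list are assumptions for illustration, and the tag names are expected to be plain alphanumeric strings (they are not regex-escaped).

```javascript
// Hypothetical helper: remove only tags whose name is in tagNames,
// leaving anything else in angle brackets untouched.
function stripKnownTags(html, tagNames) {
  // Longer names first, as suggested above
  var sorted = tagNames.slice().sort(function (a, b) {
    return b.length - a.length;
  });
  var pattern = new RegExp('<\\/?(?:' + sorted.join('|') + ')\\b[^<>]*>', 'gi');
  return html.replace(pattern, '');
}

console.log(stripKnownTags('<div> hello <hi all !> </div>', ['p', 'div', 'span', 'img']));
// " hello <hi all !> "
```

The same edge cases apply as for the hand-written pattern: an attribute value containing < or > will confuse it.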
var StrippedString = OriginalString.replace(/(<([^>]+)>)/ig,"");
Here is my solution:
function removeTags(){
var txt = document.getElementById('myString').value;
var rex = /(<([^>]+)>)/ig;
alert(txt.replace(rex , ""));
}
I use a regular expression to reject HTML tags in my textarea.
Example
<form>
<textarea class="box"></textarea>
<button>Submit</button>
</form>
<script>
$(".box").focusout( function(e) {
var reg =/<(.|\n)*?>/g;
if (reg.test($('.box').val()) == true) {
alert('HTML Tag are not allowed');
}
e.preventDefault();
});
</script>
<script type="text/javascript">
function removeHTMLTags() {
var str="<html><p>I want to remove HTML tags</p></html>";
alert(str.replace(/<[^>]+>/g, ''));
}</script>
I'm writing a little program in JavaScript in which I want to parse the following little XML snippet:
<iq xmlns="jabber:client" other="attributes">
<query xmlns="jabber:iq:roster">
<item subscription="both" jid="romeo#example.com"></item>
</query>
</iq>
Because I don't know whether the elements and attributes have namespace prefixes, I'm using the namespace-aware functions (getElementsByTagNameNS, getAttributeNS).
var queryElement = iq.getElementsByTagNameNS('jabber:iq:roster', 'query')[0];
if (queryElement) {
var itemElements = queryElement.getElementsByTagNameNS('jabber:iq:roster', 'item');
for (var i = itemElements.length - 1; i >= 0; i--) {
var itemElement = itemElements[i];
var jid = itemElement.getAttributeNS('jabber:iq:roster', 'jid');
};
};
With this code I don't get the value of the attribute jid (I get an empty string), but when I use itemElement.getAttribute('jid') instead of itemElement.getAttributeNS('jabber:iq:roster', 'jid') I'm getting the expected result.
How can I write the code in a namespace-aware manner? In my understanding of XML, the attribute jid is in the namespace jabber:iq:roster, and therefore getAttributeNS should return the value romeo#example.com.
[UPDATE] The problem was (or is) my understanding of how namespaces work with XML attributes; it is not related to the DOM API. Therefore I created another question: XML Namespaces and Unprefixed Attributes. I also did so because XML namespaces and attributes unfortunately doesn't give me an answer.
[UPDATE] What I did now, is to first check if there is the attribute without a namespace and then if it is there with a namespace:
var queryElement = iq.getElementsByTagNameNS('jabber:iq:roster', 'query')[0];
if (queryElement) {
var itemElements = queryElement.getElementsByTagNameNS('jabber:iq:roster', 'item');
for (var i = itemElements.length - 1; i >= 0; i--) {
var itemElement = itemElements[i];
var jid = itemElement.getAttribute('jid') || itemElement.getAttributeNS('jabber:iq:roster', 'jid');
};
};
The important thing is that attributes don't get the namespace until you explicitly prefix them with it:
A default namespace declaration applies to all unprefixed element names within its scope. Default namespace declarations do not apply directly to attribute names
This is unlike elements, which inherit the default namespace from their parent unless they declare their own. With that said, your attributes are not namespaced and that's why getAttribute() works and getAttributeNS() with a namespace value doesn't.
Your source XML would need to look something like this to "namespace" the attribute:
<a:query xmlns:a="jabber:iq:roster">
<a:item a:subscription="both" a:jid="romeo#example.com"></a:item>
</a:query>
Here's some more on the subject: XML namespaces and attributes.
If you want to only use the namespace-aware methods then it should (not sure though, might be implementation specific) work for you with null namespace. Try getAttributeNS(null, "jid"). If it doesn't, you can always work around it with the hasAttributeNS() and only then a fallback to getAttributeNS() or getAttribute().
I am currently trying to code an input form where you can type and format a text for later use as XML entries. In order to make the HTML code XML-readable, I have to replace the code brackets with the corresponding symbol codes, i.e. < with &lt; and > with &gt;.
The formatted text gets transferred as HTML code with the variable inputtext, so we have for example the text
The <b>Genji</b> and the <b>Heike</b> waged a long and bloody war.
which needs to get converted into
The &lt;b&gt;Genji&lt;/b&gt; and the &lt;b&gt;Heike&lt;/b&gt; waged a long and bloody war.
I tried it with the .replace() function:
inputxml = inputxml.replace("<", "&lt;");
inputxml = inputxml.replace(">", "&gt;");
But this would just replace the first occurrence of the brackets. I'm pretty sure I need some sort of loop for this; I also tried using the each() function from jQuery (a friend recommended I looked at the jQuery package), but I'm still new to coding in general and I have troubles getting this to work.
How would you code a loop which would replace the code brackets within a variable as described above?
Additional information
You are, of course, right in the assumption that this is part of something larger. I am a graduate student in Japanese studies and currently, I am trying to visualize information about Japanese history in a more accessible way. For this, I am using the Simile Timeline API developed by MIT grad students. You can see a working test of a timeline on my homepage.
The Simile Timeline uses an API based on AJAX and Javascript. If you don't want to install the AJAX engine on your own server, you can implement the timeline API from the MIT. The data for the timeline is usually provided either by one or several XML files or JSON files. In my case, I use XML files; you can have a look at the XML structure in this example.
Within the timeline, there are so-called "events" on which you can click in order to reveal additional information within an info bubble popup. The text within those info bubbles originates from the XML source file. Now, if you want to do some HTML formatting within the info bubbles, you cannot use code bracket because those will just be displayed as plain text. It works if you use the symbol codes instead of the plain brackets, however.
The content for the timeline will be written by people absolutely and totally not accustomed to codified markup, i.e. historians, art historians, sociologists, among them several persons of age 50 and older. I have tried to explain to them how they have to format the XML file if they want to create a timeline, but they occasionally slip up and get frustrated when the timeline doesn't load because they forgot to close a bracket or to include an apostrophe.
In order to make it easier, I have tried making an easy-to-use input form where you can enter all the information and format the text WYSIWYG style and then have it converted into XML code which you just have to copy and paste into the XML source file. Most of it works, though I am still struggling with the conversion of the text markup in the main text field.
The conversion of the code brackets into symbol code is the last thing I needed to get working in order to have a working input form.
look here:
http://www.bradino.com/javascript/string-replace/
just use this regex to replace all:
str = str.replace(/</g, "&lt;"); // for <
str = str.replace(/>/g, "&gt;"); // for >
To store an arbitrary string in XML, use the native XML capabilities of the browser. It will be a hell of a lot simpler that way, plus you will never have to think about the edge cases again (for example attribute values that contain quotes or pointy brackets).
A tip to keep in mind when working with XML: never, ever build XML from strings by concatenation if there is any way to avoid it. You will get yourself into trouble that way. There are APIs to handle XML; use them.
Going from your code, I would suggest the following:
$(function() {
$("#addbutton").click(function() {
var eventXml = XmlCreate("<event/>");
var $event = $(eventXml);
$event.attr("title", $("#titlefield").val());
$event.attr("start", [$("#bmonth").val(), $("#bday").val(), $("#byear").val()].join(" "));
if (parseInt($("#eyear").val(), 10) > 0) {
$event.attr("end", [$("#emonth").val(), $("#eday").val(), $("#eyear").val()].join(" "));
$event.attr("isDuration", "true");
} else {
$event.attr("isDuration", "false");
}
$event.text( tinyMCE.activeEditor.getContent() );
$("#outputtext").val( XmlSerialize(eventXml) );
});
});
// helper function to create an XML DOM Document
function XmlCreate(xmlString) {
var x;
if (typeof DOMParser === "function") {
var p = new DOMParser();
x = p.parseFromString(xmlString,"text/xml");
} else {
x = new ActiveXObject("Microsoft.XMLDOM");
x.async = false;
x.loadXML(xmlString);
}
return x.documentElement;
}
// helper function to turn an XML DOM Document into a string
function XmlSerialize(xml) {
var s;
if (typeof XMLSerializer === "function") {
var x = new XMLSerializer();
s = x.serializeToString(xml);
} else {
s = xml.xml;
}
return s
}
https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/String/replace
You might use a regular expression with the "g" (global match) flag.
var entities = {'<': '&lt;', '>': '&gt;'};
'<inputtext><anotherinputext>'.replace(
/[<>]/g, function (s) {
return entities[s];
}
);
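If the text can also contain ampersands or quotes, escaping only the angle brackets is not enough for XML. The sketch below extends the same lookup-table idea to the five characters XML predefines entities for; `escapeXml` and `XML_ENTITIES` are names made up for this example.

```javascript
// Lookup table for the five predefined XML entities
var XML_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&apos;'
};

// Hypothetical helper: escape a string for use as XML text or attribute content.
// A single pass over the input means an existing "&" is never escaped twice.
function escapeXml(text) {
  return String(text).replace(/[&<>"']/g, function (ch) {
    return XML_ENTITIES[ch];
  });
}

console.log(escapeXml('The <b>Genji</b> & the "Heike"'));
// The &lt;b&gt;Genji&lt;/b&gt; &amp; the &quot;Heike&quot;
```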
You could also surround your XML entries with the following:
<![CDATA[...]]>
See example:
<xml>
<tag><![CDATA[The <b>Genji</b> and the <b>Heike</b> waged a long and bloody war.]]></tag>
</xml>
Wikipedia Article:
http://en.wikipedia.org/wiki/CDATA
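One catch with CDATA is that the content itself must not contain the sequence ]]>, because that would end the section early. A common workaround is to split such content across two adjacent sections. A rough sketch, with `toCdata` as a hypothetical helper name:

```javascript
// Hypothetical helper: wrap text in a CDATA section, splitting any "]]>"
// inside the text so that it cannot terminate the section prematurely.
function toCdata(text) {
  return '<![CDATA[' + String(text).split(']]>').join(']]]]><![CDATA[>') + ']]>';
}

console.log(toCdata('The <b>Genji</b> and the <b>Heike</b>'));
// <![CDATA[The <b>Genji</b> and the <b>Heike</b>]]>

console.log(toCdata('a]]>b'));
// <![CDATA[a]]]]><![CDATA[>b]]>
```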
What you really need, as mentioned in comments, is to XML-encode the string. If you absolutely want to do this in JavaScript, have a look at the PHP.js function htmlentities.
I created a simple JS function to replace Greater Than and Less Than characters
Here is an example dirty string: < noreply#email.com >
Here is an example cleaned string: [ noreply#email.com ]
function RemoveGLthanChar(notes) {
    var regex = /<[^>](.*?)>/g;
    // match() returns null when nothing matches, so fall back to an empty list
    var strBlocks = notes.match(regex) || [];
    strBlocks.forEach(function (dirtyBlock) {
        let cleanBlock = dirtyBlock.replace("<", "[").replace(">", "]");
        notes = notes.replace(dirtyBlock, cleanBlock);
    });
    return notes;
}
Call it using
$('#form1').submit(function (e) {
e.preventDefault();
var dirtyBlock = $("#comments").val();
var cleanedBlock = RemoveGLthanChar(dirtyBlock);
$("#comments").val(cleanedBlock);
this.submit();
});
I can use the getElementsByTagName() function to get a collection of elements from an element in a web page.
I would like to be able to use a similar function on the contents of a javascript string variable instead of the contents of a DOM element.
How do I do this?
EDIT
I can do this by creating an element on the fly.
var myElement = new Element('div');
myElement.innerHTML = "<strong>hello</strong><em>there</em><strong>hot stuff</strong>";
var emCollection = myElement.getElementsByTagName('em');
alert(emCollection.length); // This gives 1
But creating an element on the fly for the convenience of using the getElementsByTagName() function just doesn't seem right and doesn't work with elements in Internet Explorer.
Injecting the string into DOM, as you have shown, is the easiest, most reliable way to do this. If you operate on a string, you will have to take into account all the possible escaping scenarios that would make something that looks like a tag not actually be a tag.
For example, you could have
<button value="<em>"/>
<button value="</em>"/>
in your markup - if you treat it as a string, you may think you have an <em> tag in there, but in actuality, you only have two button tags.
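To make the pitfall concrete, here is a small sketch of a naive string-based counter run against that markup. The helper name is made up for illustration, and the pattern is just one plausible way someone might search a string for tags.

```javascript
// Naive, string-based tag counter (hypothetical helper for illustration).
function countTagsNaively(html, tagName) {
  var matches = html.match(new RegExp('<' + tagName + '[\\s>]', 'gi'));
  return matches ? matches.length : 0;
}

var markup = '<button value="<em>"/>\n<button value="</em>"/>';

console.log(countTagsNaively(markup, 'em'));     // 1, although there is no <em> element
console.log(countTagsNaively(markup, 'button')); // 2
```

The string search reports an <em> tag that a real HTML parser would treat as part of an attribute value.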
By injecting into DOM via innerHTML you are taking advantage of the browser's built-in HTML parser, which is pretty darn fast. Doing the same via regular expression would be a pain, and browsers don't generally provide DOM like functionality for finding elements within strings.
One other thing you could try would be parsing the string as XML, but I suspect this would be more troublesome and slower than the DOM injection method.
function countTags(html, tagName) {
var matches = html.match(new RegExp("<" + tagName + "[\\s>]", "ig"));
return matches ? matches.length : 0;
}
alert(
countTags(
"<strong>hello</strong><em>there</em><strong>hot stuff</strong>",
"em"
)
); // 1
var domParser = new DOMParser();
var htmlString = "<strong>hello</strong><em>there</em><strong>hot stuff</strong>";
var docElement = domParser.parseFromString(htmlString, "text/html").documentElement;
var emCollection = docElement.getElementsByTagName("em");
for (var i = 0; i < emCollection.length; i++) {
console.log(emCollection[i]);
}
HTML in a string is nothing special. It's just text in a string. It needs to be parsed into a tree for it to be useful. This is why you need to create an element, then call getElementsByTagName on it, as you show in your example.