Parsing MathML to plain math expression - javascript

I am using the MathDox formula editor to produce MathML. Now I want to convert the MathML produced by MathDox into an expression that I can later evaluate to find the answer.
For example:
MathML:
<math xmlns='http://www.w3.org/1998/Math/MathML'>
<mrow>
<mn>3</mn>
<mo>+</mo>
<mn>5</mn>
</mrow>
</math>
Want to convert to expression as:
3+5
Now I can use 3+5 to get answer 8.
I am searching for a JavaScript or C# solution for this conversion. I tried to google it but didn't get much help. The closest solution I found is here, but it is a commercial desktop application. I want an open-source web app solution for my problem. Any help will be appreciated.
Note: For simplicity I've shown only a simple addition in the example above, but the MathML can also contain complex expressions such as derivatives and logarithms.

This can be achieved in JavaScript with the following steps:
1. Convert the MathML string to an XML DOM.
2. Convert the XML DOM to plain text.
3. Use the "eval" function to get the decimal value of the expression.
The following code does precisely that:
function getDOM(xmlstring) {
var parser = new DOMParser();
return parser.parseFromString(xmlstring, "text/xml");
}
function remove_tags(node) {
var result = "";
var nodes = node.childNodes;
var tagName = node.tagName;
if (!nodes.length) {
if (node.nodeValue == "π") result = "Math.PI"; // a bare "pi" would be undefined when passed to eval
else if (node.nodeValue == " ") result = "";
else result = node.nodeValue;
} else if (tagName == "mfrac") {
result = "("+remove_tags(nodes[0])+")/("+remove_tags(nodes[1])+")";
} else if (tagName == "msup") {
result = "Math.pow(("+remove_tags(nodes[0])+"),("+remove_tags(nodes[1])+"))";
} else for (var i = 0; i < nodes.length; ++i) {
result += remove_tags(nodes[i]);
}
if (tagName == "mfenced") result = "("+result+")";
if (tagName == "msqrt") result = "Math.sqrt("+result+")";
return result;
}
function stringifyMathML(mml) {
var xmlDoc = getDOM(mml);
return remove_tags(xmlDoc.documentElement);
}
// Some testing
var s = stringifyMathML("<math><mn>3</mn><mo>+</mo><mn>5</mn></math>");
alert(s);
alert(eval(s));
s = stringifyMathML("<math><mfrac><mn>1</mn><mn>2</mn></mfrac><mo>+</mo><mn>1</mn></math>");
alert(s);
alert(eval(s));
s = stringifyMathML("<math><msup><mn>2</mn><mn>4</mn></msup></math>");
alert(s);
alert(eval(s));
s = stringifyMathML("<math><msqrt><mn>4</mn></msqrt></math>");
alert(s);
alert(eval(s));
Following the previous code, it is possible to extend the accepted MathML. For example, it would be easy to add trigonometry or any other custom function.
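As one way to extend it, the leaf-node branch of remove_tags could look up the node text in a table of known functions. The following is a minimal sketch and not part of the original answer: translateLeaf and FUNCTION_NAMES are hypothetical names, mapping "log" to base 10 is an assumption, and it assumes the MathML wraps function arguments in mfenced so that the existing code adds the parentheses.

```javascript
// Hypothetical lookup table: MathML identifier text -> JavaScript expression.
var FUNCTION_NAMES = {
  sin: "Math.sin",
  cos: "Math.cos",
  tan: "Math.tan",
  log: "Math.log10", // assumption: "log" means base 10
  ln: "Math.log",
  "π": "Math.PI"
};

// Translate the text of a leaf node (what remove_tags reads from node.nodeValue).
function translateLeaf(text) {
  var trimmed = (text || "").trim();
  if (Object.prototype.hasOwnProperty.call(FUNCTION_NAMES, trimmed)) {
    return FUNCTION_NAMES[trimmed];
  }
  return trimmed; // numbers, operators and unknown identifiers pass through
}

// <mi>sin</mi><mfenced><mn>0</mn></mfenced> would then become "Math.sin(0)".
var expression = translateLeaf("sin") + "(" + translateLeaf("0") + ")";
console.log(expression);
console.log(eval(expression));
```

Inside remove_tags, the leaf branch would call translateLeaf(node.nodeValue) instead of comparing against individual symbols.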
For the purpose of this post, I used the tool from mathml editor to build the MathML (used in the test part of the code).

Related

Make a mountain out of a molehill by replacing it with JavaScript

I want to replace multiple words on a website with other words. That is, I am interested in finding all instances of a source word and replacing it with a target word.
Sample Cases:
Source | Target
Molehill => Mountain
Green => Grey
Google => <a href="http://google.com">
Sascha => Monika
Football => Soccer
This is somewhat of a half answer. It shows the basic process, but also illustrates some of the inherent difficulties in a process like this. Detecting capitalization and properly formatting the replacements would be a bit intensive (probably using something like the approach in "How can I test if a letter in a string is uppercase or lowercase using JavaScript?" on a case-by-case basis). Also, when dealing with text nodes, innerHTML isn't an option, so the Google replacement comes out as plain text instead of HTML.
TLDR - If you have another way to do this that doesn't involve javascript, do it that way.
var body = document.querySelector('body')
function textNodesUnder(el){
var n, a=[], walk=document.createTreeWalker(el,NodeFilter.SHOW_TEXT,null,false);
while(n=walk.nextNode()) a.push(n);
return a;
}
function doReplacements(txt){
txt = txt.replace(/sascha/gi, 'monika')
txt = txt.replace(/mountain/gi, 'molehill')
txt = txt.replace(/football/gi, 'soccer')
txt = txt.replace(/google/gi, 'google')
console.log(txt)
return txt
}
var textnodes = textNodesUnder(body),
len = textnodes.length,
i = -1, node
console.log(textnodes)
while(++i < len){
node = textnodes[i]
node.textContent = doReplacements(node.textContent)
}
<div>Mountains of Sascha</div>
<h1>Playing football, google it.</h1>
<p>Sascha Mountain football google</p>
Here is the JS:
function replaceWords () {
var toReplace = [
["Green","Grey"],
["Google","<a href='http://google.com'>"]
];
var input = document.getElementById("content").innerHTML;
console.log("Input: " + input);
for (var i = 0; i < toReplace.length; i++) {
var reg = new RegExp(toReplace[i][0],"g");
input = input.replace(reg,toReplace[i][1]);
}
document.getElementById("content").innerHTML = input;
};
replaceWords();
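One caveat with building a RegExp directly from each source word is that characters such as . or ( are treated as pattern syntax. Below is a minimal sketch of escaping the words first and matching whole words case-insensitively. escapeRegExp and replaceWordsInText are hypothetical helper names, and the sketch works on a plain string rather than on the page.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace every whole-word occurrence of each source word with its target.
function replaceWordsInText(text, pairs) {
  for (var i = 0; i < pairs.length; i++) {
    var pattern = new RegExp("\\b" + escapeRegExp(pairs[i][0]) + "\\b", "gi");
    var target = pairs[i][1];
    // A function replacement keeps "$" in the target from being read as a pattern.
    text = text.replace(pattern, function () { return target; });
  }
  return text;
}

var pairs = [["Molehill", "Mountain"], ["Green", "Grey"], ["Football", "Soccer"]];
console.log(replaceWordsInText("A green molehill near the football pitch.", pairs));
// prints "A Grey Mountain near the Soccer pitch."
```

The replacement always uses the capitalization of the target word; preserving the capitalization of the matched word would need extra logic.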

Get Array of Strings In Between Two Strings with Javascript

This question has been asked a few times before; here's an example. However, the linked question only asks about getting one string out of the result. The text I would like to parse has many different instances of the leading and trailing strings, so the code below does not work:
test.match("SomeString(.*)TrailingString");
As shown in this fiddle. I will show you the intended result below:
If I were to have a string composed of the following elements STARTINGTEXTText I wantENDINGTEXT Text I don't want STARTINGTEXTMore text I wantENDINGTEXT Text I don't want
I would like to have a function that I can pass in the arguments STARTINGTEXT and ENDINGTEXT and it would return an array with "Text I want" and "More text I want"
Thanks!
EDIT - This is a Pebble Application so JQuery isn't an option.
A similar thing has been done in Objective-C:
-(NSMutableArray*)stringsBetweenString:(NSString*)start andString:(NSString*)end
{
NSMutableArray* strings = [NSMutableArray arrayWithCapacity:0];
NSRange startRange = [self rangeOfString:start];
for( ;; )
{
if (startRange.location != NSNotFound)
{
NSRange targetRange;
targetRange.location = startRange.location + startRange.length;
targetRange.length = [self length] - targetRange.location;
NSRange endRange = [self rangeOfString:end options:0 range:targetRange];
if (endRange.location != NSNotFound)
{
targetRange.length = endRange.location - targetRange.location;
[strings addObject:[self substringWithRange:targetRange]];
NSRange restOfString;
restOfString.location = endRange.location + endRange.length;
restOfString.length = [self length] - restOfString.location;
startRange = [self rangeOfString:start options:0 range:restOfString];
}
else
{
break;
}
}
else
{
break;
}
}
return strings;
}
If you would prefer a RegExp solution, you could do something like this:
var test = "STARTINGTEXTText I wantENDINGTEXT Text I don't want STARTINGTEXTMore text I wantENDINGTEXT Text I don't want";
var matches = test.match(/STARTINGTEXT(.*?)ENDINGTEXT/g);
The key to this is the "g" (or global) flag, and the non-greedy repeat operator "*?". See this link for an explanation of the "g" flag and the non-greedy operator.
Here is a modification of your fiddle: link. I changed it so that the alert would show a stringified JSON of the results, so that you could see it matching both strings.
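Note that with the "g" flag, match returns the full matches, delimiters included, rather than the captured groups. A sketch of collecting only the captured text with a RegExp exec loop follows. getStringsBetween and escapeRegExp are hypothetical names, and the delimiters are assumed to be literal strings.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function getStringsBetween(source, start, end) {
  // [\s\S] instead of . so that the wanted text may span line breaks
  var pattern = new RegExp(escapeRegExp(start) + "([\\s\\S]*?)" + escapeRegExp(end), "g");
  var results = [];
  var match;
  while ((match = pattern.exec(source)) !== null) {
    // Guard against an endless loop if both delimiters are empty.
    if (match[0] === "") pattern.lastIndex++;
    results.push(match[1]);
  }
  return results;
}

var test = "STARTINGTEXTText I wantENDINGTEXT Text I don't want STARTINGTEXTMore text I wantENDINGTEXT Text I don't want";
console.log(getStringsBetween(test, "STARTINGTEXT", "ENDINGTEXT"));
// prints [ 'Text I want', 'More text I want' ]
```

The exec loop is used instead of newer APIs on the assumption that the target runtime may be an older JavaScript engine.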
This approach uses very little code:
function getBetweenText(fromString, ignoreStart, ignoreEnd){
var s = fromString.split(new RegExp(ignoreStart+'|'+ignoreEnd)), r = [];
for(var i=1,l=s.length; i<l; i+=2){
r.push(s[i]);
}
return r;
}
console.log(getBetweenText("STARTINGTEXTText I wantENDINGTEXT Text I don't want STARTINGTEXTMore text I wantENDINGTEXT Text I don't want", 'STARTINGTEXT', 'ENDINGTEXT'));
You can do this using jQuery. To select all the elements with specific tag you just do something like this: ** UPDATED WITH NON-JQUERY VERSION **
var HTMLelements = document.getElementsByTagName("tag");
var results = [];
for(var i = 0; i < HTMLelements.length; i++){
results.push(HTMLelements[i].innerHTML);
}

What's the right way to decode a string that has special HTML entities in it? [duplicate]

This question already has answers here:
Unescape HTML entities in JavaScript?
(33 answers)
Closed 5 years ago.
Say I get some JSON back from a service request that looks like this:
{
"message": "We&#39;re unable to complete your request at this time."
}
I'm not sure why that apostrophe is encoded like that (&#39;); all I know is that I want to decode it.
Here's one approach using jQuery that popped into my head:
function decodeHtml(html) {
return $('<div>').html(html).text();
}
That seems (very) hacky, though. What's a better way? Is there a "right" way?
This is my favourite way of decoding HTML characters. The advantage of using this code is that tags are also preserved.
function decodeHtml(html) {
var txt = document.createElement("textarea");
txt.innerHTML = html;
return txt.value;
}
Example: http://jsfiddle.net/k65s3/
Input:
Entity:&nbsp;Bad attempt at XSS:<script>alert('new\nline?')</script><br>
Output:
Entity: Bad attempt at XSS:<script>alert('new\nline?')</script><br>
Don’t use the DOM to do this if you care about legacy compatibility. Using the DOM to decode HTML entities (as suggested in the currently accepted answer) leads to differences in cross-browser results on non-modern browsers.
For a robust & deterministic solution that decodes character references according to the algorithm in the HTML Standard, use the he library. From its README:
he (for “HTML entities”) is a robust HTML entity encoder/decoder written in JavaScript. It supports all standardized named character references as per HTML, handles ambiguous ampersands and other edge cases just like a browser would, has an extensive test suite, and — contrary to many other JavaScript solutions — he handles astral Unicode symbols just fine. An online demo is available.
Here’s how you’d use it:
he.decode("We&#39;re unable to complete your request at this time.");
→ "We're unable to complete your request at this time."
Disclaimer: I'm the author of the he library.
See this Stack Overflow answer for some more info.
If you don't want to use html/dom, you could use regex. I haven't tested this; but something along the lines of:
function parseHtmlEntities(str) {
return str.replace(/&#([0-9]{1,3});/gi, function(match, numStr) {
var num = parseInt(numStr, 10); // read num as normal number
return String.fromCharCode(num);
});
}
[Edit]
Note: this would only work for numeric html-entities, and not stuff like &oring;.
[Edit 2]
Fixed the function (some typos), test here: http://jsfiddle.net/Be2Bd/1/
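A variant that also covers hexadecimal references and code points longer than three digits might look like the sketch below. decodeNumericEntities is a hypothetical name, named entities such as &amp; are deliberately left untouched, and String.fromCodePoint assumes an ES2015 environment.

```javascript
function decodeNumericEntities(str) {
  return str.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, function (match, ref) {
    var isHex = ref.charAt(0).toLowerCase() === "x";
    var codePoint = isHex ? parseInt(ref.slice(1), 16) : parseInt(ref, 10);
    // Leave references outside the Unicode range as they are.
    if (codePoint > 0x10FFFF) {
      return match;
    }
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeNumericEntities("We&#39;re unable to complete your request."));
// prints "We're unable to complete your request."
console.log(decodeNumericEntities("&#x41;&#66;"));
// prints "AB"
```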
There's a JS function to deal with &#xxxx; style entities (function at GitHub):
// encode(decode) html text into html entity
var decodeHtmlEntity = function(str) {
return str.replace(/&#(\d+);/g, function(match, dec) {
return String.fromCharCode(dec);
});
};
var encodeHtmlEntity = function(str) {
var buf = [];
for (var i=str.length-1;i>=0;i--) {
buf.unshift(['&#', str[i].charCodeAt(), ';'].join(''));
}
return buf.join('');
};
var entity = '&#39640;&#32423;&#31243;&#24207;&#35774;&#35745;';
var str = '高级程序设计';
let element = document.getElementById("testFunct");
element.innerHTML = (decodeHtmlEntity(entity));
console.log(decodeHtmlEntity(entity) === str);
console.log(encodeHtmlEntity(str) === entity);
// output:
// true
// true
<div><span id="testFunct"></span></div>
jQuery will encode and decode for you.
function htmlDecode(value) {
return $("<textarea/>").html(value).text();
}
function htmlEncode(value) {
return $('<textarea/>').text(value).html();
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
<script>
$(document).ready(function() {
$("#encoded")
.text(htmlEncode("<img src onerror='alert(0)'>"));
$("#decoded")
.text(htmlDecode("<img src onerror='alert(0)'>"));
});
</script>
<span>htmlEncode() result:</span><br/>
<div id="encoded"></div>
<br/>
<span>htmlDecode() result:</span><br/>
<div id="decoded"></div>
_.unescape does what you're looking for
https://lodash.com/docs/#unescape
This is such a good answer. You can use it with Angular like this:
moduleDefinitions.filter('sanitize', ['$sce', function($sce) {
return function(htmlCode) {
var txt = document.createElement("textarea");
txt.innerHTML = htmlCode;
return $sce.trustAsHtml(txt.value);
}
}]);

Convert HTML Character Entities back to regular text using javascript

The question says it all :)
E.g. we have &gt; and we need >, using only JavaScript.
Update: It seems jQuery is the easy way out, but it would be nice to have a lightweight solution, such as a function that can do this by itself.
You could do something like this:
String.prototype.decodeHTML = function() {
var map = {"gt":">" /* , … */};
return this.replace(/&(#(?:x[0-9a-f]+|\d+)|[a-z]+);?/gi, function($0, $1) {
if ($1[0] === "#") {
return String.fromCharCode($1[1].toLowerCase() === "x" ? parseInt($1.substr(2), 16) : parseInt($1.substr(1), 10));
} else {
return map.hasOwnProperty($1) ? map[$1] : $0;
}
});
};
function decodeEntities(s){
var str, temp= document.createElement('p');
temp.innerHTML= s;
str= temp.textContent || temp.innerText;
temp=null;
return str;
}
alert(decodeEntities('&lt;'))
/* returned value: (String)
<
*/
I know there are libraries out there, but here are a couple of solutions for browsers. These work well when placing HTML entity data strings into human-editable areas where you want the characters to be shown, such as textareas or input[type=text].
I add this answer because I have to support older versions of IE, and I feel that it wraps up a few days' worth of research and testing. I hope somebody finds this useful.
The first one is for more modern browsers and uses jQuery. Please note that it should NOT be used if you have to support versions of IE before 10 (7, 8, or 9), as it will strip out the newlines, leaving you with just one long line of text.
if (!String.prototype.HTMLDecode) {
String.prototype.HTMLDecode = function () {
var str = this.toString(),
$decoderEl = $('<textarea />');
str = $decoderEl.html(str)
.text()
.replace(/<br((\/)|( \/))?>/gi, "\r\n");
$decoderEl.remove();
return str;
};
}
This next one is based on kennebec's work above, with some differences which are mostly for the sake of older IE versions. This does not require jQuery, but does still require a browser.
if (!String.prototype.HTMLDecode) {
String.prototype.HTMLDecode = function () {
var str = this.toString(),
//Create an element for decoding
decoderEl = document.createElement('p');
//Bail if empty, otherwise IE7 will return undefined when
//OR-ing the 2 empty strings from innerText and textContent
if (str.length == 0) {
return str;
}
//convert newlines to <br's> to save them
str = str.replace(/((\r\n)|(\r)|(\n))/gi, " <br/>");
decoderEl.innerHTML = str;
/*
We use innerText first as IE strips newlines out with textContent.
There is said to be a performance hit for this, but sometimes
correctness of data (keeping newlines) must take precedence.
*/
str = decoderEl.innerText || decoderEl.textContent;
//clean up the decoding element
decoderEl = null;
//replace back in the newlines
return str.replace(/<br((\/)|( \/))?>/gi, "\r\n");
};
}
/*
Usage:
var str = "&gt;";
return str.HTMLDecode();
returned value:
(String) >
*/
Here is a "class" for decoding a whole HTML document.
var HTMLDecoder = {
tempElement: document.createElement('span'),
decode: function(html) {
var _self = this;
// return the decoded string; without "return" the function yields undefined
return html.replace(/&(#(?:x[0-9a-f]+|\d+)|[a-z]+);/gi,
function(str) {
_self.tempElement.innerHTML= str;
str = _self.tempElement.textContent || _self.tempElement.innerText;
return str;
}
);
}
}
Note that I used Gumbo's regexp for catching entities, but for fully valid HTML documents (or XHTML) you could simply use /&[^;]+;/g.
There is nothing built in, but there are many libraries that have been written to do this.
Here is one.
And here one that is a jQuery plugin.

JavaScript string.format function does not work in IE

I have a JavaScript snippet from this source, posted in a comment on a blog: frogsbrain
It's a string formatter, and it works fine in Firefox, Google Chrome, Opera and Safari.
The only problem is in IE, where the script does no replacement at all. The output in both test cases in IE is only 'hello', nothing more.
Please help me get this script working in IE as well. I'm no JavaScript guru and I just don't know where to start looking for the problem.
I'll post the script here for convenience. All credits go to Terence Honles for the script so far.
// usage:
// 'hello {0}'.format('world');
// ==> 'hello world'
// 'hello {name}, the answer is {answer}.'.format({answer:'42', name:'world'});
// ==> 'hello world, the answer is 42.'
String.prototype.format = function() {
var pattern = /({?){([^}]+)}(}?)/g;
var args = arguments;
if (args.length == 1) {
if (typeof args[0] == 'object' && args[0].constructor != String) {
args = args[0];
}
}
var split = this.split(pattern);
var sub = new Array();
var i = 0;
for (;i < split.length; i+=4) {
sub.push(split[i]);
if (split.length > i+3) {
if (split[i+1] == '{' && split[i+3] == '}')
sub.push(split[i+1], split[i+2], split[i+3]);
else {
sub.push(split[i+1], args[split[i+2]], split[i+3]);
}
}
}
return sub.join('')
}
I think the issue is with these lines:
var pattern = /({?){([^}]+)}(}?)/g;
var split = this.split(pattern);
JavaScript's regex split function behaves differently in IE than in other browsers.
Please take a look at my other post on SO.
var split = this.split(pattern);
string.split(regexp) is broken in many ways on IE (JScript) and is generally best avoided. In particular:
it does not include match groups in the output array
it omits empty strings
alert('abbc'.split(/(b)/)) // a,c
It would seem simpler to use replace rather than split:
String.prototype.format= function(replacements) {
return this.replace(String.prototype.format.pattern, function(all, name) {
return name in replacements? replacements[name] : all;
});
}
String.prototype.format.pattern= /{?{([^{}]+)}}?/g;
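For reference, here is a standalone sketch of the replace-based approach that avoids extending String.prototype and keeps doubled braces as literal braces, mirroring the '{' and '}' check in the original script. formatString is a hypothetical name, and leaving unknown placeholders in place is an assumption about the desired behaviour.

```javascript
function formatString(template, replacements) {
  return template.replace(/\{\{|\}\}|\{([^{}]+)\}/g, function (all, name) {
    if (all === "{{") return "{"; // escaped opening brace
    if (all === "}}") return "}"; // escaped closing brace
    return Object.prototype.hasOwnProperty.call(replacements, name)
      ? replacements[name]
      : all; // unknown placeholder: leave it in place
  });
}

console.log(formatString("hello {0}", ["world"]));
// prints "hello world"
console.log(formatString("hello {name}, the answer is {answer}.", { answer: "42", name: "world" }));
// prints "hello world, the answer is 42."
console.log(formatString("{{0}} is literal", []));
// prints "{0} is literal"
```

Because it relies only on replace with a callback, it does not depend on how split treats capture groups.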
