How to convert decimal Unicode character references into a string using JavaScript/Node - javascript

Some (Arabic) sentences in my database contain decimal Unicode character references for quotation marks and similar characters.
An example of the text I have:
"كريم نجار: تداعيات &#8220;كورونا&#8221; ستغير مستقبل سوق السيارات العالمية وقد تشهد السوق المحلية إرتفاعاً في الأسعار"
I searched for how to decode something like this in Node.js but didn't find anything useful. For example, I tried the unescape package, but it didn't work for me.

One simple solution is to assign the string to the innerHTML of a textarea element and read its value back. This requires a browser DOM, so it will not work in Node.
<script>
  function decodeEntity(inStr) {
    var textarea = document.createElement("textarea");
    textarea.innerHTML = inStr;
    return textarea.value;
  }
  let str = "كريم نجار: تداعيات &#8220;كورونا&#8221; ستغير مستقبل سوق السيارات العالمية وقد تشهد السوق المحلية إرتفاعاً في الأسعار";
  console.log(decodeEntity(str));
</script>
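Since the textarea approach depends on a DOM, a minimal sketch for Node could replace the references with a regular expression instead. The helper name decodeDecimalEntities is hypothetical, and the sketch assumes the text contains only decimal references such as &#8220; (no named or hexadecimal entities):

```javascript
// Hypothetical helper (not from the original answer): decodes decimal
// numeric character references such as &#8220; without needing a DOM.
function decodeDecimalEntities(inStr) {
  return inStr.replace(/&#(\d+);/g, function (match, dec) {
    const codePoint = parseInt(dec, 10);
    // Leave the reference untouched if it is beyond the Unicode range.
    if (codePoint > 0x10ffff) {
      return match;
    }
    return String.fromCodePoint(codePoint);
  });
}

const sample = "تداعيات &#8220;كورونا&#8221; ستغير";
console.log(decodeDecimalEntities(sample));
```

For input that may also contain named entities (such as &amp;quot;-style names), a dedicated library is likely the safer choice.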

Related

Can't get values past array[0] to translate properly

Okay, to start with I should mention this is a very small personal project, and I've only had a handful of coding classes, several years ago now. I can figure out a lot of the (very) basics, but have a hard time troubleshooting. I'm in a little bit over my head here, and need a dumbed down solution.
I'm trying to put together a VERY simple translator that takes in a word or sentence from the user via a text input box, puts each word of the string into an array, translates each word in order, then spits out each translated word in the order it was input. For example, typing "I like cats" would output "Ich mag Katze" in German.
I've got most of it, but I CAN'T get anything but the first array element to translate. It comes out like "Ich like cats".
I've used a loop, probably because I'm an amateur and don't know another way of doing this, and I'd rather not use any libraries or anything. This is a very small project I want to have a couple of friends utilize locally; and I know there has to be some very simple code that will just take a string, put it into an array, swap one word for another word, and then output the results, but I'm damned if I can make it work.
What I currently have is the closest I've gotten, but like I said, it doesn't work. I've jerry-rigged the loop and clearly that's the totally wrong approach, but I can't see the forest for the trees. If you can help me, please make it "Javascript for Babies" picture book levels of simple, I cannot stress enough how inexperienced I am. This is just supposed to be a fun little extra thing for my D&D group.
function checkForTranslation(input, outputDiv) {
var input = document.getElementById("inputTextField").value;
var outputDiv = document.getElementById("translationOutputDiv");
input = input.toLowerCase();
//puts user input into an array and then outputs it word by word
const myArray = input.split(" "); //added .split, thank you James, still otherwise broken
let output = "";
let translation = "";
for (let i = 0; i < myArray.length; i++) {
output += myArray[i]; //up to here, this works perfectly to put each word in the string into an array
//prints all words but doesnt translate the second onwards
translation += myArray[i];
if (output == "") {
//document.getElementById("print2").innerHTML = "Translation Here";
}
else if (output == "apple") {
translation = "x-ray";
}
else if (output == "banana") {
translation = "yak";
}
else {
translation = "???";
}
output += " "; //adds a space when displaying original user input
} // END FOR LOOP
document.getElementById("print").innerHTML = output; //this outputs the original user input to the screen
document.getElementById("print3").innerHTML = translation; //this should output the translated output to the screen
} // END FUNCTION CHECKFORTRANSLATION
P.S. I'm not worried about Best Practices here, this is supposed to be a quickie project that I can send to a couple friends and they can open the HTML doc, saved locally, in their browser when they want to mess around with it if they want their half-orc character to say "die by my hammer!" or something. If you have suggestions for making it neater great, but I'm not worried about a mess, no one is going to be reading this but me, and hopefully once it's fixed I'll never have to read it again either!
Since it is a manual simple translation, you should just create a "dictionary" and use it to get the translations.
var dictionary = {
"apple": "x-ray",
"banana": "yak"
}
function checkForTranslation() {
var input = document.getElementById("inputTextField").value.toLowerCase();
var words = input
.split(' ') // split string to words
.filter(function(word) { // remove empty words
return word.length > 0
});
var translatedWords = words.map(function(word) {
var wordTranslation = dictionary[word]; // get from dictionary
if (wordTranslation) {
return wordTranslation;
} else { // if word was not found in dictionary
return "???";
}
});
var translatedText = translatedWords.join(' ');
document.getElementById("translationOutputDiv").innerHTML = translatedText;
}
document.getElementById('translate').addEventListener('click', function() {
checkForTranslation();
});
<input type="text" id="inputTextField" />
<button id="translate">translate</button>
<br/>
<hr />
<div id="translationOutputDiv"></div>
Or if you want it a little more organized, you could use
const dictionary = {
"apple": "x-ray",
"banana": "yak"
}
function getTranslation(string) {
return string
.toLowerCase()
.split(' ')
.filter(word => word)
.map(word => dictionary[word] || '???')
.join(' ');
}
function translate(inputEl, outputEl) {
outputEl.innerHTML = getTranslation(inputEl.value);
}
document.querySelector('#translate').addEventListener('click', function() {
const input = document.querySelector('#inputTextField');
const output = document.querySelector('#translationOutputDiv');
translate(input, output);
});
<input type="text" id="inputTextField" />
<button id="translate">translate</button>
<br/>
<hr />
<div id="translationOutputDiv"></div>
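For reference, the loop in the question fails for two reasons: `output` accumulates every word seen so far, so a comparison like `output == "apple"` can only match on the first word, and `translation = ...` overwrites the running result instead of appending to it. A minimal loop-based sketch that keeps the if/else style of the question is shown below. The function name translateWithLoop, and passing the input in as a parameter, are assumptions made for illustration:

```javascript
// Hypothetical loop-based variant: compares each word on its own
// and appends to the result instead of overwriting it.
function translateWithLoop(input) {
  const words = input.toLowerCase().split(" ");
  let translation = "";
  for (let i = 0; i < words.length; i++) {
    const word = words[i];
    if (word === "") {
      continue; // skip the empty strings produced by repeated spaces
    }
    let translatedWord;
    if (word === "apple") {
      translatedWord = "x-ray";
    } else if (word === "banana") {
      translatedWord = "yak";
    } else {
      translatedWord = "???";
    }
    if (translation !== "") {
      translation += " "; // separate words in the output
    }
    translation += translatedWord;
  }
  return translation;
}

console.log(translateWithLoop("Apple banana cat")); // x-ray yak ???
```

The dictionary versions above are shorter to maintain once the word list grows, since adding a word means adding one line rather than another else-if branch.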

Arabic Text issue with PDFKit plugin

To generate dynamic PDF files, I'm using PDFKit.
The generation works fine, but I'm having trouble displaying Arabic characters, even after installing an Arabic font.
Also, Arabic text is generated correctly, but I believe the word order is incorrect.
As an example,
I'm currently using pdfkit: "0.11.0"
Text: مرحبا كيف حالك ( Hello how are you )
Font: Amiri-Regular.ttf
const PDFDocument = require("pdfkit");
var doc = new PDFDocument({
size: [595.28, 841.89],
margins: {
top: 0,
bottom: 0,
left: 0,
right: 0,
},
});
const customFont = fs.readFileSync(`${_tmp}/pdf/Amiri-Regular.ttf`);
doc.registerFont(`Amiri-Regular`, customFont);
doc.fontSize(15);
doc.font(`Amiri-Regular`).fillColor("black").text("مرحبا كيف حالك");
doc.pipe(fs.createWriteStream(`${_tmp}/pdf/arabic.pdf`));
doc.end();
OUTPUT:
PDF with arabic text
I ran into the same problem and ended up here, but I was not convinced by the answers posted, some of which even add a library just to change the text direction with PDFKit.
After several minutes in the PDFKit guide docs, here is the solution:
doc.text("مرحبا كيف حالك", {features: ['rtla']})
You are right: the order of the Arabic words is wrong, and you have to set the direction of the sentence.
Try this:
doc.rtl(true);
or this, as an option for a single line of text:
doc.font(`Amiri-Regular`).fillColor("black").text("مرحبا كيف حالك", {rtl: true});
Answer adapted from the info here:
install the package: npm install twitter_cldr
Run this function to generate the text:
const TwitterCldr = TwitterCldrLoader.load("en");
private maybeRtlize(text: string) {
if (this.isHebrew(text)) {
var bidiText = TwitterCldr.Bidi.from_string(text, { direction: "RTL" });
bidiText.reorder_visually();
return bidiText.toString();
} else {
return text;
}
}
Value = maybeRtlize("مرحبا كيف حالك")
doc.font(`Amiri-Regular`).fillColor("black").text(Value);
Another possible method is to reverse the word order of the text (using something such as text.split(' ').reverse().join(' ')). While this works for simple Arabic text, it starts having issues the moment you introduce English numerals, for example, so the first method is recommended.
I would suggest one of the following, depending on your needs:
1 ) If you have only a few doc.text calls generating the document, you can add {features: ['rtla']} as the second parameter to each call, as follows:
doc.text('تحية طيبة وبعد', { features: ['rtla'] });
2 ) If you have many calls to doc.text, instead of adding {features: ['rtla']} as a parameter to each call, you can reverse all your text beforehand by iterating over your data object and reversing the word order, as follows:
let str = "السلام عليكم ورحمة الله وبركاته";
str = str.split(' ').reverse().join(' ');
doc.text(str);

Getting JQuery from given HTML text?

I have a question about parsing HTML with jQuery.
Let me give a simple example.
As you probably know, when I need to parse ...
<li class="info"> hello </li>
I get text by
$(".info").text()
My question is: given the full HTML and a piece of text, can I find the query string?
In the example above, what I want to get is:
var queryStr = findQuery(html,"hello") // queryStr = '.info'
I know there might be more than one result, and there could be various expressions depending on the DOM hierarchy.
So, in general, if some text (in this example, 'hello') is unique in the whole HTML, I would guess there must be a unique, shortest query string that satisfies $(query).text() == "hello".
If my guess is valid, how can I get the unique (and, if possible, shortest) query string for each given unique text?
Any hint will be appreciated. Thanks for your help!
I created a little function that may help you:
function findQuery(str) {
$("body").children().each(function() {
if ( $.trim($(this).text()) == $.trim(str) ) {
alert("." + $(this).attr("class"))
}
});
}
I am not sure what you're actually trying to achieve, but, based on your specific question, you could do the following.
var queryStr = findQuery($('<li class="info"> hello </li>'), "hello"); // queryStr = '.info'
// OR
var queryStr = findQuery('<li class="info"> hello </li>', "hello"); // queryStr = '.info'
alert (queryStr); // returns a string of class names without the DOT. You may add DOT(s) if need be.
function findQuery(html, str) {
var $html = html instanceof jQuery && html || $(html);
var content = $html.text();
if ( content.match( new RegExp(str, "gi") ) ) {
return $html.attr("class");
}
return "no matching string found!";
}
Hope this demo helps you!!
$(document).ready(function() {
var test = $("li:contains('hello')").attr('class');
alert(test);
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<ul>
<li class="info">hello</li>
</ul>
This uses the jQuery ":contains" selector.

What's the right way to decode a string that has special HTML entities in it? [duplicate]

This question already has answers here:
Unescape HTML entities in JavaScript?
(33 answers)
Closed 5 years ago.
Say I get some JSON back from a service request that looks like this:
{
"message": "We&#39;re unable to complete your request at this time."
}
I'm not sure why that apostrophe is encoded like that (&#39;); all I know is that I want to decode it.
Here's one approach using jQuery that popped into my head:
function decodeHtml(html) {
return $('<div>').html(html).text();
}
That seems (very) hacky, though. What's a better way? Is there a "right" way?
This is my favourite way of decoding HTML characters. The advantage of using this code is that tags are also preserved.
function decodeHtml(html) {
var txt = document.createElement("textarea");
txt.innerHTML = html;
return txt.value;
}
Example: http://jsfiddle.net/k65s3/
Input:
Entity:&nbsp;Bad attempt at XSS:<script>alert('new\nline?')</script><br>
Output:
Entity: Bad attempt at XSS:<script>alert('new\nline?')</script><br>
Don’t use the DOM to do this if you care about legacy compatibility. Using the DOM to decode HTML entities (as suggested in the currently accepted answer) leads to differences in cross-browser results on non-modern browsers.
For a robust & deterministic solution that decodes character references according to the algorithm in the HTML Standard, use the he library. From its README:
he (for “HTML entities”) is a robust HTML entity encoder/decoder written in JavaScript. It supports all standardized named character references as per HTML, handles ambiguous ampersands and other edge cases just like a browser would, has an extensive test suite, and — contrary to many other JavaScript solutions — he handles astral Unicode symbols just fine. An online demo is available.
Here’s how you’d use it:
he.decode("We&#39;re unable to complete your request at this time.");
→ "We're unable to complete your request at this time."
Disclaimer: I'm the author of the he library.
See this Stack Overflow answer for some more info.
If you don't want to use html/dom, you could use regex. I haven't tested this; but something along the lines of:
function parseHtmlEntities(str) {
return str.replace(/&#([0-9]{1,3});/gi, function(match, numStr) {
var num = parseInt(numStr, 10); // read num as normal number
return String.fromCharCode(num);
});
}
[Edit]
Note: this would only work for numeric html-entities, and not stuff like &oring;.
[Edit 2]
Fixed the function (some typos), test here: http://jsfiddle.net/Be2Bd/1/
There's a JS function to deal with &#xxxx; styled entities:
// encode(decode) html text into html entity
var decodeHtmlEntity = function(str) {
return str.replace(/&#(\d+);/g, function(match, dec) {
return String.fromCharCode(dec);
});
};
var encodeHtmlEntity = function(str) {
var buf = [];
for (var i=str.length-1;i>=0;i--) {
buf.unshift(['&#', str[i].charCodeAt(), ';'].join(''));
}
return buf.join('');
};
var entity = '&#39640;&#32423;&#31243;&#24207;&#35774;&#35745;';
var str = '高级程序设计';
let element = document.getElementById("testFunct");
element.innerHTML = (decodeHtmlEntity(entity));
console.log(decodeHtmlEntity(entity) === str);
console.log(encodeHtmlEntity(str) === entity);
// output:
// true
// true
<div><span id="testFunct"></span></div>
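One caveat: String.fromCharCode and charCodeAt work on UTF-16 code units, so characters outside the Basic Multilingual Plane (emoji, for example) do not round-trip as a single reference. A minimal sketch using String.fromCodePoint and codePointAt is shown below; it also accepts hexadecimal references of the form &#x...;. The function names are hypothetical and not part of the answer above:

```javascript
// Hypothetical variants of the functions above that operate on whole
// code points instead of UTF-16 code units.
function decodeNumericEntities(str) {
  return str.replace(/&#(?:x([0-9a-f]+)|(\d+));/gi, function (match, hex, dec) {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Out-of-range values are left as they were.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

function encodeNumericEntities(str) {
  // Array.from iterates by code point, so surrogate pairs stay together.
  return Array.from(str, function (ch) {
    return "&#" + ch.codePointAt(0) + ";";
  }).join("");
}

console.log(encodeNumericEntities("高😀")); // &#39640;&#128512;
console.log(decodeNumericEntities("&#39640;&#x1F600;") === "高😀"); // true
```

Like the regex answers above, this handles numeric references only; named entities still need a lookup table or a library.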
jQuery will encode and decode for you.
function htmlDecode(value) {
return $("<textarea/>").html(value).text();
}
function htmlEncode(value) {
return $('<textarea/>').text(value).html();
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
<script>
$(document).ready(function() {
$("#encoded")
.text(htmlEncode("<img src onerror='alert(0)'>"));
$("#decoded")
.text(htmlDecode("&lt;img src onerror='alert(0)'&gt;"));
});
</script>
<span>htmlEncode() result:</span><br/>
<div id="encoded"></div>
<br/>
<span>htmlDecode() result:</span><br/>
<div id="decoded"></div>
_.unescape does what you're looking for: it converts the entities &amp;, &lt;, &gt;, &quot;, and &#39; back to their characters.
https://lodash.com/docs/#unescape
This is a very good answer. You can use it with Angular like this:
moduleDefinitions.filter('sanitize', ['$sce', function($sce) {
return function(htmlCode) {
var txt = document.createElement("textarea");
txt.innerHTML = htmlCode;
return $sce.trustAsHtml(txt.value);
}
}]);

Markdown And Escaping Javascript Line Breaks

I am writing a markdown compiler in Erlang for server-side use. Because it will need to work with clients I have decided to adopt the client side library (showdown.js) as the specification and then test my code for compatibility with that.
In the first couple of iterations I built up 260-odd unit tests which checked that my programme produced output which was compatible with what I thought was valid markdown based on reading the syntax notes.
But now I am trying to write a javascript programme to generate my unit tests.
I have an input like:
"3 > 4\na"
I want to run 'showdown' on it to get:
"<p>3 > 4\na</p>"
and I want to stitch this up into an EUnit assertion:
"?_assert(conv(\"3 > 4\na\") == \"<p>3 > 4\na</p>\"),",
which is the valid Erlang syntax for the unit test. To make life easy, and to make the unit test generator portable, I am doing it inside a web page, so that by appending some lines to a JavaScript file and then viewing the page you get the new set of unit tests inside a <textarea />, which you then copy into the module to run EUnit.
The problem is that I can't get the line breaks to convert to \n in the string so I end up with:
"?_assert(conv(\"3 > 4
a\") == \"<p>3 > 4
a</p>\"),",
I've tried converting the linefeeds to their escaped versions using code like:
text.replace("\\", "\\\\");
text.replace("\n", "\\n");
but no joy...
Tom McNulty helped me out and pointed out that my regexes were super-pants; in particular, I wasn't using the global flag, so only the first match was being replaced :(
The working code is:
var converter;
var text = "";
var item;
var input;
var output;
var head;
var tail;
converter = new Attacklab.showdown.converter();
item = document.getElementById("tests");
for (var test in tests) {
input = tests[test].replace(/[\n\r]+/gi,"\\n" );
input = input.replace(/[\"]+/gi, "\\\"");
output = converter.makeHtml(tests[test]).replace(/[\n\r]+/gi, "\\n");
output = output.replace(/[\"]+/gi, "\\\"");
text += " ?_assert(conv(\"" + input + "\") == \"" + output + "\"),\n";
};
head = "unit_test_() -> \n [\n";
tail = "\n ].";
text = text.slice(0, text.length - 2);
item.value = head + text + tail;
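Note that /[\n\r]+/ replaces a whole run of consecutive line breaks with a single \n, and that backslashes in the input are not escaped. If blank lines in the Markdown input matter, a minimal sketch of an escaping helper could look like the following. The name toErlangStringLiteral is hypothetical, and the sketch assumes that only backslashes, double quotes, and line breaks need escaping:

```javascript
// Hypothetical helper: escapes text for use inside a double-quoted
// Erlang string literal. Backslashes are handled first so that the
// escapes added afterwards are not escaped a second time.
function toErlangStringLiteral(text) {
  return text
    .replace(/\\/g, "\\\\") // backslash -> two backslashes
    .replace(/"/g, '\\"') // double quote -> backslash + quote
    .replace(/\r\n|\r|\n/g, "\\n"); // each line break -> backslash + n
}

const sampleInput = "3 > 4\n\na";
console.log(toErlangStringLiteral(sampleInput));
```

Because each line break is replaced individually, the blank line in the sample is preserved as two consecutive escapes.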
