Arabic Text issue with PDFKit plugin

To generate dynamic PDF files, I'm using PDFKit.
The generation works fine, but I'm having trouble displaying Arabic characters, even after installing an Arabic font.
The Arabic glyphs themselves are rendered correctly, but I believe the word order is incorrect.
As an example,
I'm currently using pdfkit: "0.11.0"
Text: مرحبا كيف حالك ( Hello how are you )
Font: Amiri-Regular.ttf
const fs = require("fs");
const PDFDocument = require("pdfkit");
const doc = new PDFDocument({
size: [595.28, 841.89], // A4 in PDF points
margins: {
top: 0,
bottom: 0,
left: 0,
right: 0,
},
});
const customFont = fs.readFileSync(`${_tmp}/pdf/Amiri-Regular.ttf`);
doc.registerFont(`Amiri-Regular`, customFont);
doc.fontSize(15);
doc.font(`Amiri-Regular`).fillColor("black").text("مرحبا كيف حالك");
doc.pipe(fs.createWriteStream(`${_tmp}/pdf/arabic.pdf`));
doc.end();
OUTPUT:
PDF with arabic text

I ran into the same problem and ended up here, but I was not convinced by the answers posted, some of which even add a library just to change the text direction in PDFKit.
After several minutes in the PDFKit guide, here is the solution:
doc.text("مرحبا كيف حالك", {features: ['rtla']})

You are right: the order of the Arabic words is wrong, and you have to set the direction of the sentence.
Try this:
doc.rtl(true);
Or pass it as an option for a single line of text:
doc.font(`Amiri-Regular`).fillColor("black").text("مرحبا كيف حالك", {rtl: true});

Answer adapted from the info here:
install the package: npm install twitter_cldr
Run this function to generate the text:
const TwitterCldrLoader = require("twitter_cldr");
const TwitterCldr = TwitterCldrLoader.load("en");
private maybeRtlize(text: string) {
if (this.isHebrew(text)) {
var bidiText = TwitterCldr.Bidi.from_string(text, { direction: "RTL" });
bidiText.reorder_visually();
return bidiText.toString();
} else {
return text;
}
}
const value = maybeRtlize("مرحبا كيف حالك");
doc.font(`Amiri-Regular`).fillColor("black").text(value);
Another possible method is to reverse the word order, using something such as text.split(' ').reverse().join(' '). While this works for simple Arabic text, it starts having issues the moment you introduce, for example, English text or numerals, so the first method is recommended.
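To illustrate the caveat about naive reversal, here is a minimal sketch. The helper name reverseWords and the sample tokens are made up for this illustration. It shows that reversing the word order also reverses any left-to-right phrase embedded in the sentence:

```javascript
// Hypothetical helper: naive word-order reversal, as described above.
function reverseWords(text) {
  return text.split(" ").reverse().join(" ");
}

// Sample sentence built from tokens so the logical order is unambiguous:
// two Arabic words surrounding the two-word Latin phrase "New York".
const tokens = ["زرت", "New", "York", "أمس"];
const reversedTokens = reverseWords(tokens.join(" ")).split(" ");

// The Arabic words swap places as intended, but the Latin phrase is
// reversed as well and now reads "York New".
console.log(reversedTokens.slice(1, 3).join(" "));
```

A proper bidirectional reordering step keeps such left-to-right runs intact, which is why it is preferred for mixed text.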

I would suggest one of the following, depending on your needs:
1 ) If you have only a few doc.text calls in the document, you can add {features: ['rtla']} as the second parameter, as follows:
doc.text('تحية طيبة وبعد', { features: ['rtla'] });
2 ) If you have many calls to doc.text, then instead of adding {features: ['rtla']} to each call, you can reverse all your text beforehand by iterating over your data object and reversing the word order, as follows:
let str = "السلام عليكم ورحمة الله وبركاته";
str = str.split(' ').reverse().join(' ');
doc.text(str);
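If repeating the option on every call is the main concern, a small wrapper may be an alternative to reversing the text. This is only a sketch: makeRtlText is a hypothetical helper, not part of PDFKit, and it assumes doc.text accepts the features option exactly as shown in option 1.

```javascript
// Hypothetical helper (not part of PDFKit): returns a function that
// forwards to doc.text with { features: ['rtla'] } merged in by default.
function makeRtlText(doc) {
  return function rtlText(text, options = {}) {
    // Caller-supplied options are spread last, so they override the default.
    return doc.text(text, { features: ["rtla"], ...options });
  };
}

// Usage with a real PDFDocument instance would look like:
//   const rtlText = makeRtlText(doc);
//   rtlText("تحية طيبة وبعد");
//   rtlText("السلام عليكم", { align: "right" });
```

The text itself stays in logical order this way, so mixed Arabic and Latin content is not affected by a manual word reversal.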

Related

Array of unicode char to string : strange behavior

I am trying to reduce the size of my Font Awesome icon fonts.
First I parse my CSS file to retrieve all the Unicode code points
with
const numericValue = parseInt(unicodeHex, 16)
const character = String.fromCharCode(numericValue)
glyphList.push(character)
Next I generate a string with
const glyphListStr = glyphList.join(' ')
console.log(glyphListStr)
It gives       (tested it contains \uF5EC \uE00F \uF4CD \uE022 \uE2CE \uF3CA ) it is OK
Here is the strange behavior.
Working code:
const fontmin = new Fontmin()
.use(Fontmin.glyph({
text: '     ' ,//glyphListStr
hinting: false
}))
But when I use the variable, it fails.
What am I doing wrong?
const fontmin = new Fontmin()
.use(Fontmin.glyph({
text: glyphListStr, //'     '
hinting: false
}))

After many hours I decided to upgrade TypeScript to 4.8.2,
and although I cannot explain why, it now works.
There was probably a wrong type somewhere.
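For reference, the conversion described in the question can be sketched as a standalone function. buildGlyphString is a hypothetical name, and the sketch assumes the hex codes have already been extracted from the CSS. It uses String.fromCodePoint, which, unlike String.fromCharCode, also handles code points above U+FFFF. Joining without a separator is a choice made for this sketch, not something the subsetting tool is known to require.

```javascript
// Hypothetical helper: turn hex code strings (e.g. "f5ec") into one string
// containing the corresponding characters, with no separator between them.
function buildGlyphString(hexCodes) {
  return hexCodes
    .map((hex) => String.fromCodePoint(parseInt(hex, 16)))
    .join("");
}

const glyphs = buildGlyphString(["f5ec", "e00f", "f4cd"]);

// Round-trip check: print the code points actually stored in the string.
console.log([...glyphs].map((ch) => ch.codePointAt(0).toString(16)));
```

Printing the code points back, rather than the glyphs themselves, makes it easier to compare the variable against a hard-coded literal.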

how to convert a decimal unicode into string using Javascript/Node

Some (Arabic) sentences in my database contain decimal Unicode codes for quotation marks and similar characters.
An example of the text I have:
"كريم نجار: تداعيات “كورونا” ستغير مستقبل سوق السيارات العالمية وقد تشهد السوق المحلية إرتفاعاً في الأسعار"
I searched for how to decode something like this in Node.js but didn't find anything useful. For example, I tried the unescape package, but it didn't work for me.

A simple possible solution is to put your string into an HTML textarea and read the value back. This relies on the DOM, so it will not work in Node.
<script>
function decodeEntity(inStr) {
var textarea = document.createElement("textarea");
textarea.innerHTML = inStr;
return textarea.value;
}
let str = "كريم نجار: تداعيات “كورونا” ستغير مستقبل سوق السيارات العالمية وقد تشهد السوق المحلية إرتفاعاً في الأسعار"
console.log(decodeEntity(str));
</script>
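Since the textarea approach needs a DOM, here is a minimal sketch of a DOM-free alternative that could run in Node. It assumes the stored text contains numeric character references such as &#8220; (decimal) or &#x201C; (hexadecimal). decodeNumericEntities is a hypothetical helper name, and named entities such as &amp; are deliberately left untouched.

```javascript
// Hypothetical helper: decode decimal (&#8220;) and hexadecimal (&#x201C;)
// numeric character references. Named entities are not handled.
function decodeNumericEntities(input) {
  return input.replace(/&#(?:[xX]([0-9a-fA-F]+)|(\d+));/g, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave the reference as-is if it is outside the Unicode range.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

const sample = "&#8220;كورونا&#8221;";
console.log(decodeNumericEntities(sample));
```

If the data also contains named entities, a dedicated HTML-entity decoding package would be a better fit than extending this regex.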

Why aren't there line breaks in this <pre> tag?

I'm using highlight.js to display some JSON I'm receiving from a pubnub subscription. It is coloring the text but it is not adding line breaks as expected (via their demos). Also, a couple places in the documentation give the impression that the library generates new lines. See the useBR option here.
Here is my current code (I've tried a few different things):
pubnub.subscribe({
channel : 'TEST',
message : function(m){
console.log(m);
var hlt = hljs.highlight('json',m);
$('#jsonOutput').html("<pre>" + hlt.value + "</pre>");
}
});
And here is what the DOM looks like:
But here is the output:
How can I get line breaks? I want it to look similar to this:
{
"id":"TESTWIDGET1",
"value":371,
"timestamp":"2016-08-31T11:39:57.8733485-05:00"
}
fiddle: https://jsfiddle.net/vgfnod58/

You don't have any line breaks in your JSON. The highlight function only applies highlighting; it preserves line breaks only if the JSON string was already formatted. Your string is a single line, so you have to format it first and then highlight it:
function print_r(object,html){
if(html) return '<pre>' + JSON.stringify(object, null, 4) + '</pre>';
else return JSON.stringify(object, null, 4);
}
var m = {"id":"TESTWIDGET1","value":351,"timestamp":"2016-08-31T12:03:24.3403952-05:00"};
var hlt = hljs.highlight('json',print_r(m));
$('#codehere').html(hlt.value);
Please be aware that I changed the var m from a string to an object (just remove the surrounding ').
A working fiddle: https://jsfiddle.net/WalterIT/vgfnod58/2/
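If the message may arrive either as a JSON string or as an already-parsed object, the formatting step can be sketched as below. toPrettyJson is a hypothetical helper name; its result is what would then be passed to the highlighter.

```javascript
// Hypothetical helper: accept a JSON string or an object and return
// an indented JSON string with real line breaks.
function toPrettyJson(message, indent = 4) {
  const value = typeof message === "string" ? JSON.parse(message) : message;
  return JSON.stringify(value, null, indent);
}

const raw = '{"id":"TESTWIDGET1","value":351}';
console.log(toPrettyJson(raw));
```

Note that JSON.parse throws on invalid input, so a message that is not valid JSON would need its own error handling.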

You should be able to substitute a <div> element with CSS white-space set to pre for the <pre> element.
Edit, Updated
An alternative approach is to insert a newline and a space before every other highlighted <span> element, and a newline after the last one:
var m = '{"id":"TESTWIDGET1","value":351,"timestamp":"2016-08-31T12:03:24.3403952-05:00"}';
// hljs.configure({useBR: true});
var hlt = hljs.highlight('json',m);
$('#codehere').html(hlt.value)
$('#codehere span').each(function(i) {
if (i % 2 === 0)
$(this).before("\n ");
if (i === $('#codehere span').length -1)
$(this).after("\n")
});
jsfiddle https://jsfiddle.net/vgfnod58/3/

What's the best way to mask a credit card in JavaScript?

In Node, I need to turn a credit card number into something like this before rendering the view layer: ************1234.
Is there a utility or one-liner for this, without loops and ugliness? The card number can look like any of these:
1234567898765432
1234-5678-9876-5432
1234 5678 9876 5432

Here's one way with Ramda and some RegEx:
var ensureOnlyNumbers = R.replace(/[^0-9]+/g, '');
var maskAllButLastFour = R.replace(/[0-9](?=([0-9]{4}))/g, '*');
var hashedCardNumber = R.compose(maskAllButLastFour, ensureOnlyNumbers);
hashedCardNumber('1234567898765432'); // ************5432
Demo : http://jsfiddle.net/7odv6kfk/

No need for a regex:
var cc='1234-5678-9012-3456';
var masked = '************'+cc.substr(-4); // ************3456
Will work for any format provided the last four digits are contiguous.

This is for everyone who said they didn't need another way to mask a credit card. This solution replaces everything except the last 4 characters of the card number with asterisks.
var cardNumber = '4761640026883566';
console.log(maskCard(cardNumber));
function maskCard(num) {
return `${'*'.repeat(num.length - 4)}${num.substr(num.length - 4)}`;
}

I use this function because it masks the credit card number and formats it in blocks of four characters, like **** **** **** 1234. Here is the solution:
const maskCreditCard = (card) => {
return card
.replace(/.(?=.{4})/g, "*") // mask every character that has at least 4 characters after it
.match(/.{1,4}/g)
.join(" ");
};

Here's plain JavaScript using a regex with a lookahead:
var cardNumbers = [
"1234567898765432",
"1234-5678-9876-5432",
"1234 5678 9876 5432"
];
console.log(cardNumbers.map(maskCardNumber));
//> ["************5432", "************5432", "************5432"]
function maskCardNumber(cardNumber) {
return cardNumber.replace(/^[\d-\s]+(?=\d{4})/, "************");
};
Unlike AllienWebguy's implementation, this one:
doesn't require an external library
does everything in one replace() call
replaces any number of digits with a constant number of asterisks (it should be a bit faster, but it may not be what you want)
supports only the described formats (it will not work, for example, with "1B2C3D4E5F6G7H89876-5432" or "1234+5678+9876=54-32")

Remove non digits, generate an asterisk string of that length - 4, append the last 4:
var masked = Array(cc.replace(/[^\d]/g, "").length - 3).join("*") + cc.substr(cc.length - 4);
Or to include space/hyphens in the mask:
var masked = Array(cc.length - 3).join("*") + cc.substr(cc.length - 4);
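The same "strip non-digits, then mask all but the last four" idea can also be sketched with String.prototype.padStart. maskDigits is a hypothetical helper name, and the sketch assumes the input is a string or number whose digits form the card number.

```javascript
// Hypothetical helper: keep the last 4 digits and replace the rest with "*".
function maskDigits(cardNumber) {
  // Drop spaces, hyphens, and anything else that is not a digit.
  const digits = String(cardNumber).replace(/\D/g, "");
  // Pad the last 4 digits back up to the original digit count.
  return digits.slice(-4).padStart(digits.length, "*");
}

console.log(maskDigits("1234-5678-9876-5432"));
```

Because the separators are removed first, all three formats from the question produce the same masked value.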

Extract text from HTML with Javascript

I would like to extract text from HTML with pure Javascript (this is for a Chrome extension).
Specifically, I would like to be able to find text on a page and extract text after it.
Even more specifically, on a page like
https://picasaweb.google.com/kevin.smilak/BestOfAmericaSGrandCircle#4974033581081755666
I would like to find the text "Latitude" and extract the value that follows it. The HTML there is not very structured.
What is an elegant way to do it?

There is no elegant solution in my opinion because, as you said, the HTML is not structured, and the words "Latitude" and "Longitude" depend on the page localization.
The best I can think of is relying on the cardinal points, which might not change...
var data = document.getElementById("lhid_tray").innerHTML;
var lat = data.match(/((\d)*\.(\d)*)°(\s*)(N|S)/)[1];
var lon = data.match(/((\d)*\.(\d)*)°(\s*)(E|W)/)[1];

You could do:
var str = document.getElementsByClassName("gphoto-exifbox-exif-field")[4].innerHTML;
var latPos = str.indexOf('Latitude');
var lat = str.substring(str.indexOf('<em>', latPos) + 4, str.indexOf('</em>', latPos));

The text you're interested in is found inside of a div with class gphoto-exifbox-exif-field. Since this is for a Chrome extension, we have document.querySelectorAll which makes selecting that element easy:
var div = document.querySelectorAll('div.gphoto-exifbox-exif-field')[4],
text = div.innerText;
/* text looks like:
"Filename: img_3474.jpg
Camera: Canon
Model: Canon EOS DIGITAL REBEL
ISO: 800
Exposure: 1/60 sec
Aperture: 5.0
Focal Length: 18mm
Flash Used: No
Latitude: 36.872068° N
Longitude: 111.387291° W"
*/
It's easy to get what you want now:
var lng = text.split('Longitude:')[1].trim(); // "111.387291° W"
I used trim() instead of split('Longitude: ') since that's not actually a regular space character in the innerText (URL-encoded, it's %C2%A0, which is a non-breaking space, U+00A0).

I would query the DOM and just collect the image information into an object, so you can reference any property you want.
E.g.
function getImageData() {
var props = {};
Array.prototype.forEach.apply(
document.querySelectorAll('.gphoto-exifbox-exif-field > em'),
[function (prop) {
props[prop.previousSibling.nodeValue.replace(/[\s:]+/g, '')] = prop.textContent;
}]
);
return props;
}
var data = getImageData();
console.log(data.Latitude); // 36.872068° N

Well, if a more general answer is required for other sites, then you can try something like:
var text = document.body.innerHTML;
text = text.replace(/(<([^>]+)>)/ig,""); //strip out all HTML tags
var latArray = text.match(/Latitude:?\s*[^0-9]*[0-9]*\.?[0-9]*\s*°\s*[NS]/gim);
//search for and return an array of all matches for:
//"Latitude", zero or one ":", white space, optional non-digits, a number, white space, "°", white space, N or S
//flags: g (global), i (ignore case), m (multi-line)
For that example an array of 1 element containing "Latitude: 36.872068° N" is returned (which should be easy to parse).
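Parsing the matched text further can be sketched as a small pure function that works on plain text, so it can be tried outside the browser. extractCoordinate is a hypothetical helper name, and the sketch assumes the label contains no regex special characters. In JavaScript, \s also matches the non-breaking space that appears after the colon on that page.

```javascript
// Hypothetical helper: find "<label>: <number>° <N|S|E|W>" in plain text
// and return the numeric value and the direction, or null if not found.
function extractCoordinate(text, label) {
  const pattern = new RegExp(
    label + ":?\\s*(\\d+(?:\\.\\d+)?)\\s*°\\s*([NSEW])",
    "i"
  );
  const match = text.match(pattern);
  if (!match) return null;
  return { value: parseFloat(match[1]), direction: match[2].toUpperCase() };
}

const sampleText = "Latitude: 36.872068° N\nLongitude: 111.387291° W";
console.log(extractCoordinate(sampleText, "Latitude"));
console.log(extractCoordinate(sampleText, "Longitude"));
```

In a Chrome extension the text argument could come from innerText of the relevant element or from the tag-stripped body text shown above.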
