Combining values from two arrays - javascript

I am working on an Express app and have an issue trying to match up the values of two arrays.
I have a user-entered string which comes through to me from a form (e.g. let analyseStory = req.body.storyText). This string contains line breaks as \r\n.
An example of the string is:
In the mens reserve race, Cambridges Goldie were beaten by Oxfords
Isis, their seventh consecutive defeat. \r\n\r\nIn the womens reserve
race, Cambridges Blondie defeated Oxfords Osiris
However, before I print this to the browser, the string is run through a text analysis library called pos, e.g.:
const tagger = new pos.Tagger();
res.locals.taggedWords = tagger.tag(analyseStory);
This returns to me an array of words in the string and their grammatical type
[ [ 'In', 'Noun, sing. or mass' ],
[ 'the', 'Determiner' ],
[ 'mens', 'Noun, plural' ],
[ 'reserve', 'Noun, sing. or mass' ],
[ 'race', 'Noun, sing. or mass' ],
[ ',', 'Comma' ],
[ 'Cambridges', 'Noun, plural' ],
[ 'Goldie', 'Proper noun, sing.' ],
[ 'were', 'verb, past tense' ],
[ 'beaten', 'verb, past part' ],
[ 'by', 'Preposition' ],
[ 'Oxfords', 'Noun, plural' ],
....
]
Currently, when I print this user-entered text to the screen, I loop through the array and print each word wrapped in a span whose class is derived from its grammatical type. This gives a result like:
<span class="noun-sing-or-mass">In</span>
<span class="determiner">the</span>
<span class="noun-plural">mens</span>
so that I can style them.
This all works fine, but the problem is that I lose my line breaks in the process. I'm really not sure how to solve this, but I was thinking that perhaps I could do it on the client side: break the initial string I get (analyseStory) into an array (where commas and full stops are array items, as they are in the above) and then apply the grammatical types supplied in res.locals.taggedWords to the array generated from the analyseStory string. However, I'm not sure how to do this, or even whether it is the right solution to the problem.
FWIW, if I print analyseStory to the screen without pushing it through text analysis, I handle line breaks by wrapping the string in <span style="white-space: pre-line">User entered string</span> rather than converting to <br />.
Any help much appreciated.

This solution uses ES6 Map, and String.replace() with a RegExp to find all words in the analysis, and replace them with a span that has the relevant class name.
You can see in the demo that it preserves the line breaks. Inspect the elements to see the spans with the classes.
const str = 'In the mens reserve race, Cambridges Goldie were beaten by Oxfords Isis, their seventh consecutive defeat. \r\n\r\nIn the womens reserve race, Cambridges Blondie defeated Oxfords Osiris';
const analyzed = [["In","Noun, sing. or mass"],["the","Determiner"],["mens","Noun, plural"],["reserve","Noun, sing. or mass"],["race","Noun, sing. or mass"],[",","Comma"],["Cambridges","Noun, plural"],["Goldie","Proper noun, sing."],["were","verb, past tense"],["beaten","verb, past part"],["by","Preposition"],["Oxfords","Noun, plural"]];
// create Map from the analyzed array. Use Array.map() to change all keys to lower case, and prepare the class name
const analyzedMap = new Map(analyzed.map(([k, v]) =>
[k.toLowerCase(), v.trim().toLowerCase().replace(/\W+/g, '-')]));
// match a run of word characters, or a single comma or period
const result = str.replace(/\w+|[,.]/g, (m) => {
// get the class name from the Map
const className = analyzedMap.get(m.toLowerCase());
// if there is a class name return the word/character wrapped with a span
if(className) return `<span class="${className}">${m}</span>`;
// return the word
return m;
});
demo.innerHTML = result;
#demo {
white-space: pre-line;
}
<div id="demo"></div>

<span> is not a block level element. By default it will not line break. You need to either make it block level with css or wrap your text in something that is block level like a <p> tag.
CSS To Make Block
span { display: block; }

You can pre-process text before analyzing text and replace line breaks with some special characters. Something like the following:
const story_with_br = analyseStory.replace(/\r?\n/g, "__br__");
const tagger = new pos.Tagger();
res.locals.taggedWords = tagger.tag(story_with_br);
Hopefully, taggedWords array will contain "__br__" and if it does then while rendering you can add line breaks instead of "__br__"
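If the tagger does pass the placeholder through as its own token (an assumption, not something verified against pos), the rendering step could look roughly like the sketch below. The names BREAK_TOKEN, toClassName and renderTagged are hypothetical, and the input is assumed to be the array of [word, tag] pairs shown in the question.

```javascript
// Hypothetical placeholder; it must match whatever was substituted before tagging.
const BREAK_TOKEN = "__br__";

// Turn a tag description such as "Noun, sing. or mass" into a CSS class name.
function toClassName(tag) {
  return tag.trim().toLowerCase().replace(/\W+/g, "-").replace(/^-|-$/g, "");
}

// taggedWords is assumed to be an array of [word, tag] pairs.
function renderTagged(taggedWords) {
  return taggedWords
    .map(([word, tag]) =>
      word === BREAK_TOKEN
        ? "<br>"
        : `<span class="${toClassName(tag)}">${word}</span>`
    )
    .join(" ");
}
```

Note that this sketch does not HTML-escape the words, and joining with a space also puts a space before punctuation tokens, so it would need adjusting before use with real user input.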

What you can do is :
Option 1
Edit the library you're using so that it doesn't ignore your \r\n
Option 2
Define a complex key which will define the newlines :
const newlinesKey = 'yourkeyvalue';
Then you replace all newlines with your newlinesKey (replace() returns a new string, so keep the result):
const keyedStory = analyseStory.replace(/\r\n/g, newlinesKey);
And after that you can call the text analysis library on the new string:
const tagger = new pos.Tagger();
res.locals.taggedWords = tagger.tag(keyedStory);
This way you can detect where to put a new line, provided the tagger doesn't ignore the key.


Arabic Text issue with PDFKit plugin

To generate dynamic PDF files, I'm using PDFKit.
The generation works fine, but I'm having trouble displaying Arabic characters, even after installing an Arabic font.
Also, Arabic text is generated correctly, but I believe the word order is incorrect.
As an example,
I'm currently using pdfkit: "0.11.0"
Text: مرحبا كيف حالك ( Hello how are you )
Font: Amiri-Regular.ttf
const fs = require("fs");
const PDFDocument = require("pdfkit");
var doc = new PDFDocument({
size: [595.28, 841.89],
margins: {
top: 0,
bottom: 0,
left: 0,
right: 0,
},
});
const customFont = fs.readFileSync(`${_tmp}/pdf/Amiri-Regular.ttf`);
doc.registerFont(`Amiri-Regular`, customFont);
doc.fontSize(15);
doc.font(`Amiri-Regular`).fillColor("black").text("مرحبا كيف حالك");
doc.pipe(fs.createWriteStream(`${_tmp}/pdf/arabic.pdf`));
doc.end();
OUTPUT: [image: PDF with Arabic text]
I ran into this same problem, but I was not convinced by the answers posted here, including the suggestion to add a library just to change the direction of the text with pdfkit.
After several minutes with the pdfkit guide docs, here is the solution:
doc.text("مرحبا كيف حالك", {features: ['rtla']})
You are right, the order of the Arabic words is wrong, and you have to set the direction of the sentence.
Try this:
doc.rtl(true);
Or this, as a configuration for a single line of text:
doc.font(`Amiri-Regular`).fillColor("black").text("مرحبا كيف حالك", {rtl: true});
Answer adapted from the info here:
install the package: npm install twitter_cldr
Run this function to generate the text:
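const TwitterCldr = require("twitter_cldr").load("en");
// Assumption: stands in for the isHebrew helper, which the original did not show.
// Matches any character in the Hebrew and Arabic Unicode blocks.
function isRtlText(text) {
return /[\u0590-\u06FF]/.test(text);
}
function maybeRtlize(text) {
if (isRtlText(text)) {
const bidiText = TwitterCldr.Bidi.from_string(text, { direction: "RTL" });
bidiText.reorder_visually();
return bidiText.toString();
}
return text;
}
const value = maybeRtlize("مرحبا كيف حالك");
doc.font(`Amiri-Regular`).fillColor("black").text(value);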
Another possible method is to reverse the text, using something such as text.split(' ').reverse().join(' '). However, while this will work for simple Arabic text, it will start having issues the moment you introduce English words or numerals, for example, so the first method is recommended.
I would suggest you do one of the following, depending on your needs:
1) If you only have a few doc.text calls in the document, you can add {features: ['rtla']} as the second parameter, as follows:
doc.text('تحية طيبة وبعد', { features: ['rtla'] });
2) If you have many calls to doc.text, then instead of adding {features: ['rtla']} to each call, you can reverse the word order of all your text beforehand by iterating over your data object, as follows:
let str = "السلام عليكم ورحمة الله وبركاته";
str = str.split(' ').reverse().join(' ');
doc.text(str);

Build array from text file (weird format)

I have a text file that looks like this:
name
birthday
text
other
name2
birthday2
text2
other2
that goes over 10000 lines.
I want to turn that into a javascript array that looks like this:
[[name,birthday,text,other],[name2,birthday2,text2,other2], ...]
There are 4 lines in between each 2 groups (between "other" and "name2"). It would take me hours to do it manually.
The readfile functions I found for javascript while searching all deal with line by line formats and none have group formatting functions like that.
You can read the text file and parse it accordingly depending the number of lines you expect per group:
const fs = require('fs')
fs.readFile('test.txt', 'utf-8', (err, data) => {
if (err) throw err; // stop early if the file could not be read
const rows = data.split('\n\n\n\n').map(row => row.split('\n'))
console.log(rows)
})
The first split of '\n\n\n\n' is for the 4 lines separation.
This will print:
[ ['name', 'birthday', 'text', 'other'], ['name2', 'birthday2', 'text2', 'other2'] ]
@dandavis's answer works perfectly.
Load the entire file into one string then do that:
var myarray = str.split(/\n{3,}/).map(x=>x.trim().split(/\n/));
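A slightly more defensive variant of the same idea might look like the sketch below. It assumes that groups are separated by at least one blank line, that no record contains a blank line of its own, and that the file may use Windows line endings. The name parseGroups is hypothetical.

```javascript
// Split file contents into groups separated by blank lines,
// then split each group into its non-empty, trimmed lines.
function parseGroups(contents) {
  return contents
    .replace(/\r\n/g, "\n") // normalise Windows line endings
    .split(/\n{2,}/) // two or more newlines means at least one blank line
    .map(group => group.split("\n").map(line => line.trim()).filter(Boolean))
    .filter(group => group.length > 0); // drop empty groups at the file edges
}
```

With the file already loaded into a string (for example via fs.readFileSync('test.txt', 'utf-8')), parseGroups(contents) would return one inner array per record.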

How to count occurrence of multiple sub-string in a long string with JavaScript

I am new to JavaScript. I have tried a lot, but could not find how to count the occurrences of multiple sub-strings in a long string in one pass.
Further information: I need the number of occurrences of each sub-string, and if a sub-string occurs too many times, I need to replace it, so I need all the counts at once.
Here is an example:
The long string Text as below,
Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the "golden anniversary" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as "Super Bowl L"), so that the logo could prominently feature the Arabic numerals 50.
The sub-string is a question, but what I need is to count the occurrences of each word of this sub-string in one pass, for example the words "name", "NFL", "championship", "game", "is" and "the" in this string:
What is the name of NFL championship game?
One of the problems is that some sub-strings are not in the text at all, and some appear many times (those I might replace).
The code I have tried is below. It is wrong; I have tried many different ways but got no good results.
$(".showMoreFeatures").click(function(){
var text= $(".article p").text(); // This is to get the text.
var textCount = new Array();
// Because I use match, the first word "what" returns null, so this is to
// avoid that null. I was planning to get the count, and if it is more than
// 7 or so, replace those words.
var qus = item2.question; //This is to get the sub-string
var checkQus = qus.split(" "); // I split the question to words
var newCheckQus = new Array();
// This is the array I was plan put the sub-string which count number less than 7, which I really needed words.
var count = new Array();
// Because it is a question as sub-string and have many words, so I wan plan to get their number and put them in a array.
for(var k =0; k < checkQus.length; k++){
textCount = text.match(checkQus[k],"g")
if(textCount == null){
continue;
}
for(var j =0; j<checkQus.length;j++){
count[j] = textCount.length;
}
//count++;
}
});
I have tried many different ways and searched a lot, but got no good results. The code above is only meant to show what I have tried and my thinking (which might be totally wrong). It does not work. If you know how to implement this and solve my problem, please just tell me; there is no need to correct my code.
Thanks very much.
If I have understood the question correctly then it seems you need to count the number of times the words in the question (que) appear in the text (txt)...
var txt = "Super Bowl 50 was an American ...etc... Arabic numerals 50.";
var que = "What is the name of NFL championship game?";
I'll go through this in vanilla JavaScript and you can adapt it for jQuery as required.
First of all, to focus on the text we can make things a little simpler by changing the strings to lowercase and removing some of the punctuation.
// both strings to lowercase
txt = txt.toLowerCase();
que = que.toLowerCase();
// remove punctuation
// using double \\ for proper regular expression syntax
var puncArray = ["\\,", "\\.", "\\(", "\\)", "\\!", "\\?"];
puncArray.forEach(function(P) {
// create a regular expression from each punctuation 'P'
var rEx = new RegExp( P, "g");
// replace every 'P' with empty string (nothing)
txt = txt.replace(rEx, '');
que = que.replace(rEx, '');
});
Now we can create a cleaner array from txt and que, as well as a hash table from que, like so...
// Arrays: split at every space
var txtArray = txt.split(" ");
var queArray = que.split(" ");
// Object, for storing 'que' counts
var queObject = {};
queArray.forEach(function(S) {
// create 'queObject' keys from 'queArray'
// and set value to zero (0)
queObject[S] = 0;
});
queObject will be used to hold the words counted. If you were to console.debug(queObject) at this point it would look something like this...
console.debug(queObject);
/* =>
queObject = {
what: 0,
is: 0,
the: 0,
name: 0,
of: 0,
nfl: 0,
championship: 0,
game: 0
}
*/
Now we want to test each element in txtArray to see if it contains any of the elements in queArray. If the test is true we'll add +1 to the equivalent queObject property, like this...
// go through each element in 'queArray'
queArray.forEach(function(A) {
// create regular expression for testing
var rEx = new RegExp( A );
// test 'rEx' against elements in 'txtArray'
txtArray.forEach(function(B) {
// is 'A' in 'B'?
if (rEx.test(B)) {
// increase 'queObject' property 'A' by 1.
queObject[A]++;
}
});
});
We use RegExp test method here rather than String match method because we just want to know if "is A in B == true". If it is true then we increase the corresponding queObject property by 1. This method will also find words inside words, such as 'is' in 'San Francisco' etc.
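If words inside words should not be counted, one alternative is to compare whole tokens instead of testing with a regular expression. The following is only a sketch under that assumption; the name countWords and the tokenizing rule (runs of letters, digits and apostrophes) are illustrative choices, not part of the approach above.

```javascript
// Count how often each word of `question` appears in `text` as a whole word.
function countWords(text, question) {
  // Lowercase, then keep runs of letters, digits and apostrophes as tokens.
  const tokenize = s => s.toLowerCase().match(/[a-z0-9']+/g) || [];

  // Start every question word at zero so that absent words are still reported.
  const counts = new Map(tokenize(question).map(word => [word, 0]));

  for (const word of tokenize(text)) {
    if (counts.has(word)) {
      counts.set(word, counts.get(word) + 1);
    }
  }
  return counts;
}
```

Calling countWords(txt, que) returns a Map keyed by the question's words. Because it compares whole tokens, 'is' no longer matches inside 'francisco', so the counts will generally be lower than with the substring test.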
All being well, logging queObject to the console will show you how many times each word in the question appeared in the text.
console.debug(queObject);
/* =>
queObject = {
what: 0
is: 2
the: 17
name: 0
of: 2
nfl: 1
championship: 0
game: 4
}
*/
Hope that helped. :)
See MDN for more information on:
Array.forEach()
Object.keys()
RegExp.test()

Parsing javascript array with multiple keys

Hi I need to parse a JavaScript array that has multiple keys in it. Here is an example of what I need to do. Any help is appreciated.
[
week1{
Meth:100,
Marijuana:122,
pDrugs:12,
},
week2{
Meth:15,
Marijuana:30,
pDrugs:22,
},
]
I need this to be broken into separate arrays based on if it is week1 or week2. Thanks again in advance.
The end needs to be like this.
week1 = ["Meth:100,Marijuana:122,pDrugs:12"] etc.
Your JSON has severe improper formatting. If it's already an object (which I'm guessing it isn't -- otherwise, you'd be getting unexpected token errors in your browser console), then change the brackets to braces, remove the trailing commas, and add colons after the object items that don't have them (after week1 and week2).
If what you have is a string (obtained from XHR or similar), you'll have to do all the changes mentioned above, as well as enclosing each object item within quotation marks. It should look like:
{
"week1": {
"Meth":100,
"Marijuana":122,
"pDrugs":12
},
"week2": {
"Meth":15,
"Marijuana":30,
"pDrugs":22
}
}
Whatever you're dealing with that's serving such horribly invalid JSON ought to be taken out back and shot. Be that as it may, this'll require some serious string manipulation. You're going to have to do some thorough massaging with String.replace() and some regular expressions.
After you get the JSON valid, then you can get week1 with JSON.parse and drilling down the resulting object.
function log(what) { document.getElementById('out').value += what + '\n------------------\n'; }
var tree = '[ week1{ Meth:100, Marijuana:122, pDrugs:12, }, week2{ Meth:15, Marijuana:30, pDrugs:22, }, ]';
// string is raw
log(tree);
tree = tree.replace(
/\r?\n/g, '' // remove line breaks to make further regexps easier (must be a regex literal, not a quoted string)
).replace(
'[','{' // replace [ with {
).replace(
']','}' // replace ] with }
).replace(
/\w+(?=[\{\:])/g, // add quotes to object items
function($1) { return '"'+$1+'"'; } // using a lambda function
).replace(
/"\{/g, '": {' // add colon after object items
).replace(
/,(?=\s*\})/g, '' // remove trailing commas
);
// string has been fixed
log(tree);
var obj = JSON.parse(tree);
log('obj.week1 = ' + JSON.stringify(obj.week1));
log('obj.week1.Meth = ' + obj.week1.Meth);
#out {
width: 100%;
height: 170px;
}
<textarea id="out"></textarea>
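Once the string has been parsed into an object, getting one array per week could be sketched as follows. This assumes the parsed shape shown above (week names mapping to objects of counts); the name toWeekArrays is hypothetical.

```javascript
// Build one array of "name:count" strings per week from the parsed object.
function toWeekArrays(data) {
  const result = {};
  for (const [week, values] of Object.entries(data)) {
    result[week] = Object.entries(values).map(([name, count]) => `${name}:${count}`);
  }
  return result;
}

const parsed = {
  week1: { Meth: 100, Marijuana: 122, pDrugs: 12 },
  week2: { Meth: 15, Marijuana: 30, pDrugs: 22 }
};
const weeks = toWeekArrays(parsed);
console.log(weeks.week1); // [ 'Meth:100', 'Marijuana:122', 'pDrugs:12' ]
```

If a single comma-separated string per week is wanted instead, as in the question's example, weeks.week1.join(",") would produce it.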

title casing and Abbreviations in javascript

I am trying to Titlecase some text which contains corporate names and their stock symbols.
Example (these strings are concatenated as corporate name, which gets title cased and the symbol in parens): AT&T (T)
John Deere Inc. (DE)
These corporate names come from our database, which draws them from a stock pricing service. I have it working EXCEPT for when the name is an abbreviation like AT&T.
That is returned, as you guessed, as At&t. How can I preserve casing in abbreviations? I thought of using indexOf to get the position of any &'s and uppercasing the two characters on either side of it, but that seems hackish.
Along the lines of(pseudo code)
var indexPos = myString.indexOf("&");
var fixedString = indexPos > 0
? myString.slice(0, indexPos - 1) + myString.slice(indexPos - 1, indexPos + 2).toUpperCase() + myString.slice(indexPos + 2)
: myString;
Oops, forgot to include my titlecase function
function toTitleCase(str) {
return str.replace(/([^\W_]+[^\s-]*) */g, function (txt) {
return txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase();
});
}
Any better suggestions?
A better title case function may be
function toTitleCase(str) {
return str.replace(
/(\b.)|(.)/g,
function ($0, $1, $2) {
return ($1 && $1.toUpperCase()) || $2.toLowerCase();
}
);
}
toTitleCase("foo bAR&bAz a.e.i."); // "Foo Bar&Baz A.E.I."
This will still transform AT&T to At&T, but there's no information in the way it's written to know what to do, so finally
// specific fixes
if (str === "At&T" ) str = "AT&T";
else if (str === "Iphone") str = "iPhone";
// etc
// or
var dict = {
"At&T": "AT&T",
"Iphone": "iPhone"
};
str = dict[str] || str;
Though of course if you can do it right when you enter the data in the first place it will save you a lot of trouble
This is a general solution for title case, without taking your extra requirements of "abbreviations" into account:
var fixedString = String(myString).toLowerCase().replace(/\b\w/g, function (c) { return c.toUpperCase(); });
Although I agree with other posters that it's better to start with the data in the correct format in the first place. Not all proper names conform to title case, with just a couple of examples being "Wernher von Braun" and "Ronald McDonald." There's really no algorithm you can program into a computer to handle the often arbitrary capitalization of proper names, just like you can't really program a computer to spell check proper names.
However, you can certainly program in some exception cases, although I'm still not sure that simply assuming that any word with an ampersand in it should be in all caps is always appropriate either. But that can be accomplished like so:
// String.toUpperCase is not a standard function, so use a small helper as the replacer
var upper = function (s) { return s.toUpperCase(); };
var titleCase = String(myString).toLowerCase().replace(/\b\w/g, upper);
var fixedString = titleCase.replace(/\b\w*\&\w*\b/g, upper);
Note that your second example of "John Deere Inc. (DE)" still isn't handled properly, though. I suppose you could add some other logic to, say, put any word between parentheses in all caps, like so:
var upper = function (s) { return s.toUpperCase(); };
var titleCase = String(myString).toLowerCase().replace(/\b\w/g, upper);
var titleCaseCapAmps = titleCase.replace(/\b\w*\&\w*\b/g, upper);
var fixedString = titleCaseCapAmps.replace(/\(.*\)/g, upper);
Which will at least handle your two examples correctly.
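The three steps can also be wrapped into a single function, sketched below. The name titleCaseCompany is hypothetical, and the parenthesis pattern is narrowed to [^)]* so that each pair of parentheses is handled separately, which is a small change from the /\(.*\)/ pattern used above.

```javascript
// Title-case a company name, keeping "&" abbreviations and
// anything in parentheses (such as ticker symbols) in upper case.
function titleCaseCompany(name) {
  const upper = match => match.toUpperCase();
  return String(name)
    .toLowerCase()
    .replace(/\b\w/g, upper) // first letter of every word
    .replace(/\b\w*&\w*\b/g, upper) // words containing an ampersand
    .replace(/\([^)]*\)/g, upper); // text between parentheses
}

console.log(titleCaseCompany("at&t (t)")); // "AT&T (T)"
console.log(titleCaseCompany("JOHN DEERE INC. (de)")); // "John Deere Inc. (DE)"
```

Like the snippets above, this only covers the two patterns from the question; names such as "iPhone" would still need an exception list.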
How about this: since the number of companies registered with the stock exchange is finite, and there's a well-defined mapping between stock symbols and company names, your best bet is probably to program that mapping into your code, to look up the company name by the ticker abbreviation, something like this:
var TickerToName =
{
A: "Agilent Technologies",
AA: "Alcoa Inc.",
// etc., etc.
}
Then it's just a simple lookup to get the company name from the ticker symbol:
var symbol = "T";
var CompanyName = TickerToName[symbol] || "Unknown ticker symbol: " + symbol;
Of course, I would be very surprised if there was not already some kind of Web Service you could call to get back a company name from a stock ticker symbol, something like in this thread:
Stock ticker symbol lookup API
Or maybe there's some functionality like this in the stock pricing service you're using to get the data in the first place.
The last time I faced this situation, I decided that it was less trouble to simply include the few exceptions here and there as needed.
var titleCaseFix = {
"At&t": "AT&T"
};
function fixit(str) {
for (var oldCase in titleCaseFix) {
var newCase = titleCaseFix[oldCase];
// Replace every occurrence of oldCase with newCase.
// Look here for various string replace options:
// http://stackoverflow.com/questions/542232/in-javascript-how-can-i-perform-a-global-replace-on-string-with-a-variable-insi
str = str.split(oldCase).join(newCase);
}
return str;
}
