Truncate Text with Pattern - javascript

I want to truncate text in a pattern, this is a function to highlight text from an array containing matched indexes and text, but I want to truncate the text which doesn't include the part with match, see code below
const highlight = (matchData, text) => {
var result = [];
var matches = [].concat(matchData);
var pair = matches.shift();
for (var i = 0; i < text.length; i++) {
var char = text.charAt(i);
if (pair && i == pair[0]) {
result.push("<u>");
}
result.push(char);
if (pair && i == pair[1]) {
result.push("</u>");
truncatedIndex = i;
pair = matches.shift();
}
}
return result.join("");
};
console.log(
highlight(
[[23, 29], [69, 74]],
"Some text that doesn't include the main thing, the main thing is the result, you may know I meant that"
)
);
// This returns the highlighted HTML - Result will be => "Some text that doesn't <u>include</u> the main thing, the main thing is the <u>result</u>, you may know I meant that"
But this returns whole text, I want to truncate other texts in the range, I want to truncate other text but not in range of 20 characters before and after the result so the text can be clean as well as understandable. Like
"... text that doesn't <u>include</u> the main thing ... the <u>result</u> you may know I ..."
I can't find out a way to make that. Help is appreciated.

I've modified your function considerably, to make it easier to understand, and so that it works...
Instead of using an array of arrays, which I find cumbersome to deal with, I modified to use an array of objects. The objects are simple:
{
start: 23,
end: 30
}
Basically, it just adds names to the indices you had previously.
The code should be relatively easy to follow. Here's a line-by-line explanation:
Armed with the new structure, you can use a simple substring command to snip the appropriate piece of text.
Since we're in a loop, and I don't want two sets of ellipses between matches, I check to see if we're on the first pass through and only add an ellipses before the match on the first pass.
The text before the piece we've snipped is the 20 characters before the start of the match, or the number of characters to the beginning of the string. Math.max() provides a easy way of getting the highest index available.
The text after the piece we've snippet is the 20 characters after the end of the match, or the number of characters to the end of the string. Math.min() provides a easy way of getting the lowest index available.
Concatentating them together, we get the match's new text. I'm using template literals to make that easier to read than a bunch of + " " + and whatnot.
const highlight = (matches, text) => {
let newText = '';
matches.forEach((match) => {
const piece = text.substring(match.start, match.end);
const preEllipses = newText.length === 0 ? '... ' : '';
const textBefore = text.substring(Math.max(0, match.start - 20), match.start);
const textAfter = text.substring(match.end, Math.min(text.length - 1, match.end + 20));
newText += `${preEllipses}${textBefore}<u>${piece}</u>${textAfter} ... `;
});
return newText.trim();
}
// Sample Usage
const result = highlight([{ start: 23, end: 30 }, { start: 69, end: 75 }], "Some text that doesn't include the main thing, the main thing is the result, you may know I meant that");
console.log(result);
document.getElementById("output").innerHTML = result;
// Result will be => "... e text that doesn't <u>include</u> the main thing, the ... e main thing is the <u>result</u>, you may know I mea ..."
<div id="output"></div>
Note that I am using simple string concatenation here, rather than putting parts into an array and using join. Modern JavaScript engines optimize string concatenation extremely well, to the point where it makes the most sense to just use it. See e.g., Most efficient way to concatenate strings in JavaScript?, and Dr. Axel Rauschmayer's post on 2ality.

Note
There's an update below that I think shows a better version of this same idea. But this is where it started.
Original Version
Here's another attempt, building a more flexible solution out of reusable parts.
const intoPairs = (xs) =>
xs .slice (1) .map ((x, i) => [xs[i], x])
const splitAtIndices = (indices, str) =>
intoPairs (indices) .map (([a, b]) => str .slice (a, b))
const alternate = Object.assign((f, g) => (xs, {START, MIDDLE, END} = alternate) =>
xs .map (
(x, i, a, pos = i == 0 ? START : i == a.length - 1 ? END : MIDDLE) =>
i % 2 == 0 ? f (x, pos) : g (x, pos)
),
{START: {}, MIDDLE: {}, END: {}}
)
const wrap = (before, after) => (s) => `${before}${s}${after}`
const truncate = (count) => (s, pos) =>
pos == alternate.START
? s .length <= count ? s : '... ' + s .slice (-count)
: pos == alternate.END
? s .length <= count ? s : s .slice (0, count) + ' ...'
: // alternate.MIDDLE
s .length <= (2 * count) ? s : s .slice (0, count) + ' ... ' + s .slice (-count)
const highlighter = (f, g) => (ranges, str, flip = ranges[0][0] == 0) =>
alternate (flip ? g : f, flip ? f : g) (
splitAtIndices ([...(flip ? [] : [0]), ...ranges .flat() .sort((a, b) => a - b), str.length], str)
) .join ('')
const highlight = highlighter (truncate (20), wrap('<u>', '</u>'))
#output {padding: 0 1em;} #input {padding: .5em 1em 0;} textarea {width: 50%; height: 3em;} button, input {vertical-align: top; margin-left: 1em;}
<div id="input"> <textarea id="string">Some text that doesn't include the main thing, the main thing is the result, you may know I meant that</textarea> <input type="text" id="indices" value="[23, 30], [69, 75]"/> <button id="run">Highlight</button></div><h4>Output</h4><div id="output"></div> <script>document.getElementById('run').onclick = (evt) => { const str = document.getElementById('string').value; const idxString = document.getElementById('indices').value; const idxs = JSON.parse(`[${idxString}]`); const result = highlight(idxs, str); console.clear(); document.getElementById('output').innerHTML = ''; setTimeout(() => { console.log(result); document.getElementById('output').innerHTML = result; }, 300)}</script>
This involves the helper functions intoPairs, splitAtIndices alternate, wrap and truncate. I think they are best show by examples:
intoPairs (['a', 'b', 'c', 'd']) //=> [['a', 'b'], ['b', 'c'], ['c', 'd']]
splitAtIndices ([0, 3, 7, 15], 'abcdefghijklmno') //=> ["abc", "defg", "hijklmno"]
// ^ ^ ^ ^ `---' `----' `--------'
// | | | | | | |
// 0 3 7 15 0 - 3 4 - 7 8 - 15
alternate (f, g) ([a, b, c, d, e, ...]) //=> [f(a), g(b), f(c), g(d), f(e), ...]
wrap ('<div>', '</div>') ('foo bar baz') //=> '<div>foo bar baz</div>
//chars---+ input---+ position---+ output--+
// | | | |
// V V V V
truncate (10) ('abcdefghijklmnop', ~START~) //=> '... ghijklmnop'
truncate (10) ('abcdefghijklmnop', ~END~) //=> 'abcdefghij ...'
truncate (10) ('abcdefghijklmnop', ~MIDDLE~) //=> 'abcdefghijklmnop'
truncate (10) ('abcdefghijklmnopqrstuvwxyz', ~MIDDLE~) //=> 'abcdefghij ... qrstuvwxyz'
All of these are potentially reusable, and I personally have intoPairs and wrap in my general utility library.
truncate is the only complex one, and that is mostly because it does triple duty, handling the first string, the last string, and all the others in three distinct manners. You first supply a count and the you give a string as well as the position (START, MIDDLE, END, stored as properties of alternate.) For the first string, it includes an ellipsis (...) and the last count characters. For the last one, it includes the first count characters and an ellipsis. For the middle ones, if the length is shorter than double count, it returns the whole thing; otherwise it includes the first count characters, an ellipsis and the last count characters. This behavior might be different from what you want; if so,
The main function is highlighter. It accepts two functions. The first one is how you want to handle the non-highlighted sections. The second is for the highlighted ones. It returns the style function you were looking for, one that accepts an array of two-element arrays of numbers (the ranges) and your input string, returning a string with the highlighted ranges and the non-highlighted ranges.
We use it to generate the highlight function by passing it truncate (20) and wrap('<u>', '</u>').
The intermediate forms might make it clearer what's going on.
We start with these indices:
[[23, 30], [69, 75]]]
and our 103-character string,
"Some text that doesn't include the main thing, the main thing is the result, you may know I meant that"
First we flatten the ranges, prepending a zero if the first range doesn't start there and appending the last index of the string, to get this:
[0, 23, 30, 69, 75, 102]
We pass that to splitAtIndices, along with our string, to get
[
"Some text that doesn't ",
"include",
" the main thing, the main thing is the ",
"result",
", you may know I meant that"
]
Then we map the appropriate functions over each of these strings to get
[
"... e text that doesn't ",
"<u>include</u>",
" the main thing, the main thing is the ",
"<u>result</u>",
", you may know I mea ..."
]
and join those together to get our final results:
"... e text that doesn't <ul>include</ul> the main thing, the main thing is the <ul>result</ul>, you may know I mea ..."
I like the flexibility this offers. It's easy to alter the highlight strategy as well as how you handle the unhighlighted parts -- just pass a different function to highlighter. It's also a useful breakdown of the work into reusable parts.
But there are two things I don't like.
First, I'm not thrilled with the handling of middle unhighlighted sections. Of course it's easy to change; but I don't know what would be appropriate. You might, for instance, want to change the doubling applied to the count there. Or you might have an entirely different idea.
Second, truncate is dependent upon alternate. We have to somehow pass signals from alternate to the two functions supplied to it to let them know where we are. My first pass involved passing the index and the entire array (the Array.prototype.map signature) to those functions. But that felt too coupled. We could make START, MIDDLE, and END into module-local properties, but then alternate and truncate would not be reusable. I'm not going to go back and try it now, but I think a better solution might be to pass four functions to highlighter: the function for the highlighted sections, and one each for start, middle, and end positions of the non-highlighted ones.
Update
I did go ahead and try that alternative I mentioned, and I think this version is cleaner, with all the complexity located in the single function highlighter:
const intoPairs = (xs) =>
xs .slice (1) .map ((x, i) => [xs[i], x])
const splitAtIndices = (indices, str) =>
intoPairs (indices) .map (([a, b]) => str .slice (a, b))
const wrap = (before, after) => (s) => `${before}${s}${after}`
const truncateStart = (count) => (s) =>
s .length <= count ? s : '... ' + s .slice (-count)
const truncateMiddle = (count) => (s) =>
s .length <= (2 * count) ? s : s .slice (0, count) + ' ... ' + s .slice (-count)
const truncateEnd = (count) => (s) =>
s .length <= count ? s : s .slice (0, count) + ' ...'
const highlighter = (highlight, start, middle, end) =>
(ranges, str, flip = ranges[0][0] == 0) =>
splitAtIndices ([...(flip ? [] : [0]), ...ranges .flat() .sort((a, b) => a - b), str.length], str)
.map (
(s, i, a) =>
(flip
? (i % 2 == 0 ? highlight : i == a.length - 1 ? end : middle)
: (i == 0 ? start : i % 2 == 1 ? highlight : i == a.length - 1 ? end : middle)
) (s)
) .join ('')
const highlight = highlighter (
wrap('<u>', '</u>'),
truncateStart(20),
truncateMiddle(20),
truncateEnd(20)
)
console .log (
highlight (
[[23, 30], [69, 75]],
"Some text that doesn't include the main thing, the main thing is the result, you may know I meant that"
)
)
console .log (
highlight (
[[23, 30], [86, 92]],
"Some text that doesn't include the main thing, because you see, the main thing is the result, you may know I meant that"
)
)
There is some real complexity built into highlighter, but I think it's fairly intrinsic to the problem. On each iteration, we have to choose one of our four functions based on the index, the length of the array, and whether the first range started at zero. This bit here simply chooses the function based on all that:
(flip
? (i % 2 == 0 ? highlight : i == a.length - 1 ? end : middle)
: (i == 0 ? start : i % 2 == 1 ? highlight : i == a.length - 1 ? end : middle)
)
where the flip boolean simply reports whether the first range starts at 0, a is the array of substrings to handle., and i is the current index in the array. If you see a cleaner way of choosing the function, I'd love to know.
If we wanted to write a gloss for this sort of highlighting, we could easily write
const truncatingHighlighter = (count, start, end) =>
highlighter (
wrapp(start, end),
truncateStart(count),
truncateMiddle(count),
truncateEnd(count)
)
const highlight = truncatingHighlighter (20, '<u>', '</u>')
I definitely think this is a superior solution.

Related

Using parser combinator to parse simple math expression

I'm using parsimmon to parse a simple math expression and I'm failing to parse a simple math expression that follows order of operation (i.e. */ has higher precedence than +-).
Even if you are not familiar with this library please help me solve precedence problem without left recursion and infinite recursion.
Thank you.
I used TypeScript:
"use strict";
// Run me with Node to see my output!
import * as P from "parsimmon";
import {Parser} from "parsimmon";
// #ts-ignore
import util from "util";
///////////////////////////////////////////////////////////////////////
// Use the JSON standard's definition of whitespace rather than Parsimmon's.
let whitespace = P.regexp(/\s*/m);
// JSON is pretty relaxed about whitespace, so let's make it easy to ignore
// after most text.
function token(parser: Parser<string>) {
return parser.skip(whitespace);
}
// Several parsers are just strings with optional whitespace.
function word(str: string) {
return P.string(str).thru(token);
}
let MathParser = P.createLanguage({
expr: r => P.alt(r.sExpr2, r.sExpr1, r.number),
sExpr1: r => P.seqMap(r.iExpr, P.optWhitespace, r.plusOrMinus, P.optWhitespace, r.expr, (a, s1, b, s2, c) => [a, b, c]),
sExpr2: r => P.seqMap(r.iExpr, P.optWhitespace, r.multiplyOrDivide, P.optWhitespace, r.expr, (a, s1, b, s2, c) => [a, b, c]),
iExpr: r => P.alt(r.iExpr, r.number), // Issue here! this causes infinite recursion
// iExpr: r => r.number // this will fix infinite recursion but yields invalid parse
number: () =>
token(P.regexp(/[0-9]+/))
.map(Number)
.desc("number"),
plus: () => word("+"),
minus: () => word("-"),
plusOrMinus: r => P.alt(r.plus, r.minus),
multiply: () => word("*"),
divide: () => word("/"),
multiplyOrDivide: r => P.alt(r.multiply, r.divide),
operator: r => P.alt(r.plusOrMinus, r.multiplyOrDivide)
});
///////////////////////////////////////////////////////////////////////
let text = "3 / 4 - 5 * 6 + 5";
let ast = MathParser.expr.tryParse(text);
console.log(util.inspect(ast, {showHidden: false, depth: null}));
This is my repo
Currently the grammar implemented by your parser looks like this (ignoring white space):
expr: sExpr2 | sExpr1 | number
sExpr1: iExpr plusOrMinus expr
sExpr2: iExpr multiplyOrDivide expr
// Version with infinite recursion:
iExpr: iExpr | number
// Version without infinite recursion:
iExpr: number
It's pretty easy to see that iExpr: iExpr is a left-recursive production and the cause of your infinite recursion. But even if Parsimmon could handle left-recursion, there just wouldn't be any point in that production. If it didn't mess up the parser, it just wouldn't do anything at all. Just like an equation x = x does not convey any information, a production x: x doesn't make a grammar match anything it didn't match before. It's basically a no-op, but one that breaks parsers that can't handle left recursion. So removing it is definitely the right solution.
With that fixed, you now get wrong parse trees. Specifically you'll get parse trees as if all your operators were of the same precedence and right-associative. Why? Because the left-side of all of your operators is iExpr and that can only match single numbers. So you'll always have leaves as the left child of an operator node and the tree always grows to the right.
An unambiguous grammar to correctly parse left-associative operators can be written like this:
expr: expr (plusOrMinus multExpr)?
multExpr: multExpr (multiplyOrDivide primaryExpr)?
primaryExpr: number | '(' expr ')'
(The | '(' expr ')' part is only needed if you want to allow parentheses, of course)
This would lead to the correct parse trees because there's no way for a multiplication or division to have an unparenthesized addition or subtraction as a child and if there are multiple applications of operators of the same precedence ,such as 1 - 2 - 3, the outer subtraction would contain the inner subtraction as its left child, correctly treating the operator as left-associative.
Now the problem is that this grammar is left recursive, so it's not going to work with Parsimmon. One's first thought might be to change the left recursion to right recursion like this:
expr: multExpr (plusOrMinus expr)?
multExpr: primaryExpr (multiplyOrDivide multExpr)?
primaryExpr: number | '(' expr ')'
But the problem with that is that now 1 - 2 - 3 wrongly associates to the right instead of the left. Instead, the common solution is to remove the recursion altogether (except the one from primaryExpr back to expr of course) and replace it with repetition:
expr: multExpr (plusOrMinus multExpr)*
multExpr: primaryExpr (multiplyOrDivide primaryExpr)*
primaryExpr: number | '(' expr ')'
In Parsimmon you'd implement this using sepBy1. So now instead of having a left operand, an operator and a right operand, you have a left operand and then arbitrarily many operator-operand pairs in an array. You can create a left-growing tree from that by simply iterating over the array in a for-loop.
If your want to learn how to deal with left recursion, you can start from https://en.wikipedia.org/wiki/Parsing_expression_grammar
or more precisely https://en.wikipedia.org/wiki/Parsing_expression_grammar#Indirect_left_recursion
And then read more about PEG online.
But basically a standard way is to use cycles:
Expr ← Sum
Sum ← Product (('+' / '-') Product)*
Product ← Value (('*' / '/') Value)*
Value ← [0-9]+ / '(' Expr ')'
Your can find examples of this grammar everywhere.
If your want to stick to a more pleasant left-recursive grammars, you can read about packrat parser. And find another parser for yourself. Because, I'm no sure, but it looks like parsimmon is not one of them.
If you just what a working code, than you can go to https://repl.it/repls/ObviousNavyblueFiletype
I've implemented the above grammar using parsimmon API
Expr : r => r.AdditiveExpr,
AdditiveExpr: r => P.seqMap(
r.MultExpr, P.seq(r.plus.or(r.minus), r.MultExpr).many(),
left_association
),
MultExpr : r => P.seqMap(
r.UnaryExpr, P.seq(r.multiply.or(r.divide), r.UnaryExpr).many(),
left_association
),
UnaryExpr : r => P.seq(r.minus, r.UnaryExpr).or(r.PrimaryExpr),
PrimaryExpr : r => P.seqMap(
r.LPAREN, r.Expr, r.RPAREN, // without parens it won't work
(lp,ex,rp) => ex // to remove "(" and ")" from the resulting AST
).or(P.digits),
plus : () => P.string('+').thru(p => p.skip(P.optWhitespace)),
minus : () => P.string('-').thru(p => p.skip(P.optWhitespace)),
multiply : () => P.string('*').thru(p => p.skip(P.optWhitespace)),
divide : () => P.string('/').thru(p => p.skip(P.optWhitespace)),
We also need to use parentheses, or else why would you need recursion? Value ← [0-9]+ will suffice. Without parentheses, there is no need to reference Expr inside grammar. And if we do, it would not consume any input, it would have no sense whatsoever, and it would hang in infinite recursion.
So let's also add:
LPAREN : () => P.string('(').thru(p => p.skip(P.optWhitespace)),
RPAREN : () => P.string(')').thru(p => p.skip(P.optWhitespace)),
I also added unary expressions, just for completeness.
And now for the most interesting part - production functions. Without them the result for, let' say:
(3+4+6+(-7*5))
will look like this:
[[["(",[["3",[]],[["+",["4",[]]],["+",["6",[]]],["+",[["(",[[["-","7"],[["*","5"]]],[]],")"],[]]]]],")"],[]],[]]
With them, it'll be:
[[[["3","+","4"],"+","6"],"+",[["-","7"],"*","5"]]]
Much nicer.
So for left associative operators we will need this:
// input (expr1, [[op1, expr2],[op2, expr3],[op3, expr4]])
// -->
// output [[[expr1, op1, expr2], op2, expr3], op3, expr4]
const left_association = (ex, rest) => {
// console.log("input", ex, JSON.stringify(rest))
if( rest.length === 0 ) return ex
if( rest.length === 1 ) return [ex, rest[0][0], rest[0][1]]
let node = [ex]
rest.forEach(([op, ex]) => {
node[1] = op;
node[2] = ex;
node = [node]
})
// console.log("output", JSON.stringify(node))
return node
}

Codewars Challenge - The Deaf Rats of Hamelin - JavaScript - How to Split a String by a Certain Set of Characters?

Link to challenge
Essentially, the idea is that if a rat - O~ - is facing the Piper - P - then it's going the correct way.
Here is a rat going left: O~
Here is a rat going right: ~O
We want to count how many 'deaf rats' exist in a string - how many are facing the wrong way.
Here, there is 1 deaf rat: P O~ O~ ~O O~
My logic is to first check for if the Piper is on the left or right side of the string.
If he's not, then we need to split the string based on where the Piper is located in the string, and then count how many rats on the left side are not facing him, and how many rats on the right side are not facing him.
var countDeafRats = function(town) {
town = town.replace(/[ ]/g, '');
let deafCount = 0;
//if piper's on the left
if (town[0] === 'P') {
for (let i = 0; i < town.length; i++) {
if (town[i] === '~O') {
deafCount ++;
}
}
}
//if piper's on the right
if (town[town.length - 1] === 'P') {
for (let j = 0; j < town.length; j++) {
if (town[j] === 'O~') {
deafCount ++;
}
}
}
let rats = town.split('P');
console.log('ratssss', rats);
let leftRats = [];
let rightRats = [];
return deafCount;
}
console.log(countDeafRats("~O~O P ~O~O"));
In the case above, here is the array of rats: [ '~O~O', '~O~O' ] So there are two rats on the right side that are going the wrong way (deaf rats).
What I'd like to do, is to push the left and right rats into their own arrays, and then count the deaf rats in each individual array.
But I'm not able to split the string ~O~O by either ~O or O~.
I would like to get ['~O', '~O'] for the leftRats and ['~O', '~O'] for the rightRats.
I had tried:
leftRats.push(rats[0].split('~O'));
rightRats.push(rats[1].split('O~'));
console.log('leftRatsssss', leftRats);
console.log('rightRatssss', rightRats);
But it's doing something very different from what was expected:
leftRatsssss [ [ '', '', '' ] ]
rightRatssss [ [ '~', 'O' ] ]
Split is not splitting the string by the string given as a parameter. So that's the real point of my question, then - how can you split a string by a certain set of characters?
I.e. If I wanted to split the string 'the' by 'he', I'd want 'the' to become ['t', 'he'] - I've tried regex for this case?
console.log("the".split(/[he]/gi));
And it seems to be doing the opposite of what I want. Any suggestions, then, for splitting a string by a given set of characters?
You can split the string by the P regardless, then turn each segment (on the left and right of the P) into an array of each 2-character chunk. Eg ~O~O P ~O~O turns into ['~O', '~O'] for the left chunk, and ['~O', '~O'] for the right chunk. Then count up the number of occurrences of O~ in the first chunk, and the number of occurrences of ~O in the second chunk:
var countDeafRats = function(town) {
const [leftRats, rightRats] = town
.replace(/[^O~P]/g, '')
.split('P')
.map(segment => segment.match(/.{2}/g) || []);
return (
leftRats.reduce((a, rat) => a + (rat === 'O~'), 0) +
rightRats.reduce((a, rat) => a + (rat === '~O'), 0)
);
}
console.log(countDeafRats("~O~O P ~O~O"));

Get initials and full last name from a string containing names

Assume there are some strings containing names in different format (each line is a possible user input):
'Guilcher, G.M., Harvey, M. & Hand, J.P.'
'Ri Liesner, Peter Tom Collins, Michael Richards'
'Manco-Johnson M, Santagostino E, Ljung R.'
I need to transform those names to get the format Lastname ABC. So each surename should be transformed to its initial which are appended to the lastname.
The example should result in
Guilcher GM, Harvey M, Hand JP
Liesner R, Collins PT, Richards M
Manco-Johnson M, Santagostino E, Ljung R
The problem is the different (possible) input format. I think my attempts are not very smart, so I'm asking for
Some hints to optimize the transformation code
How do I put those in a single function at all? I think first of all I have to test which format the string has...??
So let me explain how far I tried to solve that:
First example string
In the first example there are initials followed by a dot. The dots should be removed and the comma between the name and the initals should be removed.
firstString
.replace('.', '')
.replace(' &', ', ')
I think I do need an regex to get the comma after the name and before the initials.
Second example string
In the second example the name should be splitted by space and the last element is handled as lastname:
const elm = secondString.split(/\s+/)
const lastname = elm[elm.length - 1]
const initials = elm.map((n,i) => {
if (i !== elm.length - 1) return capitalizeFirstLetter(n)
})
return lastname + ' ' + initals.join('')
...not very elegant
Third example string
The third example has the already the correct format - only the dot at the end has to be removed. So nothing else has to be done with that input.
It wouldn't be possible without calling multiple replace() methods. The steps in provided solution is as following:
Remove all dots in abbreviated names
Substitute lastname with firstname
Replace lastnames with their beginning letter
Remove unwanted characters
Demo:
var s = `Guilcher, G.M., Harvey, M. & Hand, J.P.
Ri Liesner, Peter Tom Collins, Michael Richards
Manco-Johnson M, Santagostino E, Ljung R.`
// Remove all dots in abbreviated names
var b = s.replace(/\b([A-Z])\./g, '$1')
// Substitute first names and lastnames
.replace(/([A-Z][\w-]+(?: +[A-Z][\w-]+)*) +([A-Z][\w-]+)\b/g, ($0, $1, $2) => {
// Replace full lastnames with their first letter
return $2 + " " + $1.replace(/\b([A-Z])\w+ */g, '$1');
})
// Remove unwanted preceding / following commas and ampersands
.replace(/(,) +([A-Z]+)\b *[,&]?/g, ' $2$1');
console.log(b);
Given your example data i would try to make guesses based on name part count = 2, since it is very hard to rely on any ,, & or \n - which means treat them all as ,.
Try this against your data and let me know of any use-cases where this fails because i am highly confident that this script will fail at some point with more data :)
let testString = "Guilcher, G.M., Harvey, M. & Hand, J.P.\nRi Liesner, Peter Tom Collins, Michael Richards\nManco-Johnson M, Santagostino E, Ljung R.";
const inputToArray = i => i
.replace(/\./g, "")
.replace(/[\n&]/g, ",")
.replace(/ ?, ?/g, ",")
.split(',');
const reducer = function(accumulator, value, index, array) {
let pos = accumulator.length - 1;
let names = value.split(' ');
if(names.length > 1) {
accumulator.push(names);
} else {
if(accumulator[pos].length > 1) accumulator[++pos] = [];
accumulator[pos].push(value);
}
return accumulator.filter(n => n.length > 0);
};
console.log(inputToArray(testString).reduce(reducer, [[]]));
Here's my approach. I tried to keep it short but complexity was surprisingly high to get the edge cases.
First I'm formatting the input, to replace & for ,, and removing ..
Then, I'm splitting the input by \n, then , and finally (spaces).
Next I'm processing the chunks. On each new segment (delimited by ,), I process the previous segment. I do this because I need to be sure that the current segment isn't an initial. If that's the case, I do my best to skip that inital-only segment and process the previous one. The previous one will have the correct initial and surname, as I have all the information I neeed.
I get the initial on the segment if there's one. This will be used on the start of the next segment to process the current one.
After finishing each line, I process again the last segment, as it wont be called otherwise.
I understand the complexity is high without using regexp, and probably would have been better to use a state machine to parse the input instead.
const isInitial = s => [...s].every(c => c === c.toUpperCase());
const generateInitial = arr => arr.reduce((a, c, i) => a + (i < arr.length - 1 ? c[0].toUpperCase() : ''), '');
const formatSegment = (words, initial) => {
if (!initial) {
initial = generateInitial(words);
}
const surname = words[words.length - 1];
return {initial, surname};
}
const doDisplay = x => x.map(x => x.surname + ' ' + x.initial).join(', ');
const doProcess = _ => {
const formatted = input.value.replace(/\./g, '').replace(/&/g, ',');
const chunks = formatted.split('\n').map(x => x.split(',').map(x => x.trim().split(' ')));
const peoples = [];
chunks.forEach(line => {
let lastSegment = null;
let lastInitial = null;
let lastInitialOnly = false;
line.forEach(segment => {
if (lastSegment) {
// if segment only contains an initial, it's the initial corresponding
// to the previous segment
const initialOnly = segment.length === 1 && isInitial(segment[0]);
if (initialOnly) {
lastInitial = segment[0];
}
// avoid processing last segments that were only initials
// this prevents adding a segment twice
if (!lastInitialOnly) {
// if segment isn't an initial, we need to generate an initial
// for the previous segment, if it doesn't already have one
const people = formatSegment(lastSegment, lastInitial);
peoples.push(people);
}
lastInitialOnly = initialOnly;
// Skip initial only segments
if (initialOnly) {
return;
}
}
lastInitial = null;
// Remove the initial from the words
// to avoid getting the initial calculated for the initial
segment = segment.filter(word => {
if (isInitial(word)) {
lastInitial = word;
return false;
}
return true;
});
lastSegment = segment;
});
// Process last segment
if (!lastInitialOnly) {
const people = formatSegment(lastSegment, lastInitial);
peoples.push(people);
}
});
return peoples;
}
process.addEventListener('click', _ => {
const peoples = doProcess();
const display = doDisplay(peoples);
output.value = display;
});
.row {
display: flex;
}
.row > * {
flex: 1 0;
}
<div class="row">
<h3>Input</h3>
<h3>Output</h3>
</div>
<div class="row">
<textarea id="input" rows="10">Guilcher, G.M., Harvey, M. & Hand, J.P.
Ri Liesner, Peter Tom Collins, Michael Richards
Manco-Johnson M, Santagostino E, Ljung R.
Jordan M, Michael Jackson & Willis B.</textarea>
<textarea id="output" rows="10"></textarea>
</div>
<button id="process" style="display: block;">Process</button>

Can't get two variables to concatenate

I'm a newbie, making a little exercise to practice with arrays. I've tried solving this from previous articles but none seem to have the relevant scenario.
I want to randomly generate sentences into paragraphs using phrases from an array. I got the random sentence generation part working fine.
var ipsumText = ["adventure", "endless youth", "dust", "iconic landmark", "spontaneous", "carefree", "selvedge","on the road", "open road", "stay true", "free spirit", "urban", "live on the edge", "the true wanderer", "vintage motorcyle", "american lifestyle", "epic landscape", "low slung denim", "naturaL"];
//a simple function to print a sentence //
var sentence = function (y) {
var printSentence = "";
for (i=0; i<7; i++) {
//random selection of string from array //
var x = Math.floor(Math.random() * 20);
printSentence += y [x] + " ";
}
return printSentence
};
console.log(sentence(ipsumText));
But now I want to be able to add a comma or full stop to the end of the sentence.
Because each word/phrase from the array used in the sentence prints with a space after it, I need to add an extra word with a full stop or comma right after it to avoid the space between them. To do this I created an extra variable
// create a word and full stop to end a sentence//
var addFullstop = ipsumText[Math.floor(Math.random() * ipsumText.length)] + ". ";
var addComma = ipsumText[Math.floor(Math.random() * ipsumText.length)] + ", ";
These variables work on their own how I expect. They print a random word with a comma or full stop right after them.
However now I can't work out how to get them to add to the end of the sentence. I have tried quite a few versions referencing articles here, but I'm missing something, because when I test it, I get nothing printing to the console log.
This is what I have most recently tried.
// add the sentence and new ending together //
var fullSentence = sentence(ipsumText) + addFullstop;
console.log(fullSentence)
Can someone explain why this wouldn't work? And suggest a solution to try?
thanks
See ES6 fiddle: http://www.es6fiddle.net/isadgquw/
Your example works. But consider a different approach which is a bit more flexible. You give it the array of words, how long you want the sentence to be, and if you want an ending to the sentence, pass in end, otherwise, just leave it out and it will not be used.
The first line generates an array of length count which is composed of random indices to be used to index into the words array. The next line maps these indices to actual words. The last line joins all of these into a sentence separated by a single space, with an optional end of the sentence which the caller specifies.
const randomSent = (words, count, end) =>
[...Array(count)].map(() => Math.floor(Math.random() * words.length))
.map(i => words[i])
.join(' ') + (end || '')
randomSent (['one','two','x','compound word','etc'], 10, '! ')
// x x one x compound word one x etc two two!
To make it more flexible, consider making a function for each task. The code is reusable, specific, and no mutable variables are used, making it easy to test, understand, and compose however you like:
const randInt = (lower, upper) =>
Math.floor(Math.random() * (upper-lower)) + lower
const randWord = (words) => words[randInt(0, words.length)]
const randSentence = (words, len, end) =>
[...Array(len)].map(() => randWord(words))
.join(' ') + (end || '')
const randWordWithEnd = (end) => randWord(ipsumText) + end
const randWordWithFullStop = randWordWithEnd('. ')
const randWordWithComma = randWordWithEnd(', ')
// Now, test it!
const fullSentWithEnd = randSentence(ipsumText, 8, '!!')
const fullSentNoEnd = randSentence(ipsumText, 5)
const fullSentComposed = fullSentNoEnd + randWordWithFullStop
Link again for convenience: http://www.es6fiddle.net/isadgquw/

JavaScript - Matching alphanumeric patterns with RegExp

I'm new to RegExp and to JS in general (Coming from Python), so this might be an easy question:
I'm trying to code an algebraic calculator in Javascript that receives an algebraic equation as a string, e.g.,
string = 'x^2 + 30x -12 = 4x^2 - 12x + 30';
The algorithm is already able to break the string in a single list, with all values on the right side multiplied by -1 so I can equate it all to 0, however, one of the steps to solve the equation involves creating a hashtable/dictionary, having the variable as key.
The string above results in a list eq:
eq = ['x^2', '+30x', '-12', '-4x^2', '+12x', '-30'];
I'm currently planning on iterating through this list, and using RegExp to identify both variables and the respective multiplier, so I can create a hashTable/Dictionary that will allow me to simplify the equation, such as this one:
hashTable = {
'x^2': [1, -4],
'x': [30, 12],
' ': [-12]
}
I plan on using some kind of for loop to iter through the array, and applying a match on each string to get the values I need, but I'm quite frankly, stumped.
I have already used RegExp to separate the string into the individual parts of the equation and to remove eventual spaces, but I can't imagine a way to separate -4 from x^2 in '-4x^2'.
You can try this
(-?\d+)x\^\d+.
When you execute match function :
var res = "-4x^2".match(/(-?\d+)x\^\d+/)
You will get res as an array : [ "-4x^2", "-4" ]
You have your '-4' in res[1].
By adding another group on the second \d+ (numeric char), you can retrieve the x power.
var res = "-4x^2".match(/(-?\d+)x\^(\d+)/) //res = [ "-4x^2", "-4", "2" ]
Hope it helps
If you know that the LHS of the hashtable is going to be at the end of the string. Lets say '4x', x is at the end or '-4x^2' where x^2 is at end, then we can get the number of the expression:
var exp = '-4x^2'
exp.split('x^2')[0] // will return -4
I hope this is what you were looking for.
function splitTerm(term) {
var regex = /([+-]?)([0-9]*)?([a-z](\^[0-9]+)?)?/
var match = regex.exec(term);
return {
constant: parseInt((match[1] || '') + (match[2] || 1)),
variable: match[3]
}
}
splitTerm('x^2'); // => {constant: 1, variable: "x^2"}
splitTerm('+30x'); // => {constant: 30, variable: "x"}
splitTerm('-12'); // => {constant: -12, variable: undefined}
Additionally, these tool may help you analyze and understand regular expressions:
https://regexper.com/
https://regex101.com/
http://rick.measham.id.au/paste/explain.pl

Categories

Resources