I want to truncate text in a pattern. This is a function that highlights text using an array of matched indexes, but I also want to truncate the text that doesn't include a match. See the code below:
const highlight = (matchData, text) => {
var result = [];
var matches = [].concat(matchData);
var pair = matches.shift();
for (var i = 0; i < text.length; i++) {
var char = text.charAt(i);
if (pair && i == pair[0]) {
result.push("<u>");
}
result.push(char);
if (pair && i == pair[1]) {
result.push("</u>");
pair = matches.shift();
}
}
return result.join("");
};
console.log(
highlight(
[[23, 29], [69, 74]],
"Some text that doesn't include the main thing, the main thing is the result, you may know I meant that"
)
);
// This returns the highlighted HTML - Result will be => "Some text that doesn't <u>include</u> the main thing, the main thing is the <u>result</u>, you may know I meant that"
But this returns the whole text. I want to truncate everything except the 20 characters before and after each match, so the text stays clean as well as understandable. Like:
"... text that doesn't <u>include</u> the main thing ... the <u>result</u> you may know I ..."
I can't find a way to do that. Help is appreciated.
I've modified your function considerably, both to make it easier to understand and so that it works.
Instead of using an array of arrays, which I find cumbersome to deal with, I modified it to use an array of objects. The objects are simple:
{
start: 23,
end: 30
}
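Basically, it just adds names to the indices you had previously. One difference: end is now exclusive (30 rather than 29 for "include"), which matches how substring works.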
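If you already have matches in the question's [start, end] form, where end is inclusive, a small converter can produce these objects. This is a sketch; toMatchObjects is a hypothetical helper name, not part of the answer:

```javascript
// Hypothetical helper: converts [start, end] pairs with an INCLUSIVE end
// (as in the question) into { start, end } objects with an EXCLUSIVE end
// (as String.prototype.substring expects).
const toMatchObjects = (pairs) =>
  pairs.map(([start, end]) => ({ start, end: end + 1 }));

console.log(toMatchObjects([[23, 29], [69, 74]]));
// [ { start: 23, end: 30 }, { start: 69, end: 75 } ]
```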
The code should be relatively easy to follow. Here's a line-by-line explanation:
Armed with the new structure, you can use a simple substring command to snip the appropriate piece of text.
Since we're in a loop, and I don't want two ellipses between matches, I check whether we're on the first pass through and only add an ellipsis before the match on that first pass.
The text before the piece we've snipped is the 20 characters before the start of the match, or all the characters back to the beginning of the string if there are fewer. Math.max() provides an easy way of clamping that start index so it never drops below 0.
The text after the piece we've snipped is the 20 characters after the end of the match, or all the characters up to the end of the string if there are fewer. Math.min() provides an easy way of clamping that end index so it never goes past text.length (substring's end index is exclusive).
Concatenating them together, we get the match's new text. I'm using template literals to make that easier to read than a chain of + " " + and whatnot.
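The clamping step can be isolated and checked on its own. A minimal sketch, assuming { start, end } matches with an exclusive end; contextWindow is a hypothetical name, not from the answer:

```javascript
// Hypothetical helper: returns the [from, to) bounds of the context kept
// around a { start, end } match (end exclusive), clamped to the string.
function contextWindow(match, textLength, radius = 20) {
  const from = Math.max(0, match.start - radius);      // never before index 0
  const to = Math.min(textLength, match.end + radius); // never past the end
  return [from, to];
}

const text = "Some text that doesn't include the main thing, the main thing is the result, you may know I meant that";
console.log(contextWindow({ start: 23, end: 30 }, text.length));  // [ 3, 50 ]
console.log(contextWindow({ start: 5, end: 9 }, text.length));    // [ 0, 29 ]
console.log(contextWindow({ start: 98, end: 102 }, text.length)); // [ 78, 102 ]
```

Because substring's end argument is exclusive, text.length (not text.length - 1) is the correct upper bound. substring also clamps out-of-range arguments on its own, so the Math calls mainly make the intent explicit.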
const highlight = (matches, text) => {
let newText = '';
matches.forEach((match) => {
const piece = text.substring(match.start, match.end);
const preEllipses = newText.length === 0 ? '... ' : '';
const textBefore = text.substring(Math.max(0, match.start - 20), match.start);
const textAfter = text.substring(match.end, Math.min(text.length, match.end + 20)); // end index is exclusive, so clamp to text.length
newText += `${preEllipses}${textBefore}<u>${piece}</u>${textAfter} ... `;
});
return newText.trim();
}
// Sample Usage
const result = highlight([{ start: 23, end: 30 }, { start: 69, end: 75 }], "Some text that doesn't include the main thing, the main thing is the result, you may know I meant that");
console.log(result);
document.getElementById("output").innerHTML = result;
// Result will be => "... e text that doesn't <u>include</u> the main thing, the ... e main thing is the <u>result</u>, you may know I mea ..."
<div id="output"></div>
Note that I am using simple string concatenation here, rather than putting parts into an array and using join. Modern JavaScript engines optimize string concatenation extremely well, to the point where it makes the most sense to just use it. See, e.g., "Most efficient way to concatenate strings in JavaScript?" and Dr. Axel Rauschmayer's post on 2ality.
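One limitation of this approach: when two matches are 40 characters or less apart, their context windows overlap and the shared text is printed twice. In the sample output above, the "e" at index 49 appears on both sides of the middle ellipsis ("the ... e main"). Below is a sketch of a variant that merges overlapping windows before rendering; highlightMerged is a hypothetical name, and it assumes non-overlapping matches sorted by start:

```javascript
// Hypothetical variant: overlapping or touching context windows are merged.
// matches: [{ start, end }] with exclusive end, sorted by start, non-overlapping.
function highlightMerged(matches, text, radius = 20) {
  const windows = [];
  for (const m of matches) {
    const from = Math.max(0, m.start - radius);
    const to = Math.min(text.length, m.end + radius);
    const last = windows[windows.length - 1];
    if (last && from <= last.to) {
      // Overlaps (or touches) the previous window: extend it instead.
      last.to = Math.max(last.to, to);
      last.matches.push(m);
    } else {
      windows.push({ from, to, matches: [m] });
    }
  }

  // Render each window, wrapping its matches in <u>...</u>.
  const parts = windows.map((w) => {
    let out = '';
    let pos = w.from;
    for (const m of w.matches) {
      out += text.slice(pos, m.start) + '<u>' + text.slice(m.start, m.end) + '</u>';
      pos = m.end;
    }
    return out + text.slice(pos, w.to);
  });

  // Ellipses go between windows, and at the ends only if text was cut there.
  const head = windows.length > 0 && windows[0].from > 0 ? '... ' : '';
  const tail = windows.length > 0 && windows[windows.length - 1].to < text.length ? ' ...' : '';
  return head + parts.join(' ... ') + tail;
}

const text = "Some text that doesn't include the main thing, the main thing is the result, you may know I meant that";
console.log(highlightMerged([{ start: 23, end: 30 }, { start: 69, end: 75 }], text));
```

For the sample, the two windows ([3, 50) and [49, 95)) touch, so they are rendered as one continuous span with no repeated text.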
Note
There's an update below that I think shows a better version of this same idea. But this is where it started.
Original Version
Here's another attempt, building a more flexible solution out of reusable parts.
const intoPairs = (xs) =>
xs .slice (1) .map ((x, i) => [xs[i], x])
const splitAtIndices = (indices, str) =>
intoPairs (indices) .map (([a, b]) => str .slice (a, b))
const alternate = Object.assign((f, g) => (xs, {START, MIDDLE, END} = alternate) =>
xs .map (
(x, i, a, pos = i == 0 ? START : i == a.length - 1 ? END : MIDDLE) =>
i % 2 == 0 ? f (x, pos) : g (x, pos)
),
{START: {}, MIDDLE: {}, END: {}}
)
const wrap = (before, after) => (s) => `${before}${s}${after}`
const truncate = (count) => (s, pos) =>
pos == alternate.START
? s .length <= count ? s : '... ' + s .slice (-count)
: pos == alternate.END
? s .length <= count ? s : s .slice (0, count) + ' ...'
: // alternate.MIDDLE
s .length <= (2 * count) ? s : s .slice (0, count) + ' ... ' + s .slice (-count)
const highlighter = (f, g) => (ranges, str, flip = ranges[0][0] == 0) =>
alternate (flip ? g : f, flip ? f : g) (
splitAtIndices ([...(flip ? [] : [0]), ...ranges .flat() .sort((a, b) => a - b), str.length], str)
) .join ('')
const highlight = highlighter (truncate (20), wrap('<u>', '</u>'))
#output {padding: 0 1em;} #input {padding: .5em 1em 0;} textarea {width: 50%; height: 3em;} button, input {vertical-align: top; margin-left: 1em;}
<div id="input"> <textarea id="string">Some text that doesn't include the main thing, the main thing is the result, you may know I meant that</textarea> <input type="text" id="indices" value="[23, 30], [69, 75]"/> <button id="run">Highlight</button></div><h4>Output</h4><div id="output"></div> <script>document.getElementById('run').onclick = (evt) => { const str = document.getElementById('string').value; const idxString = document.getElementById('indices').value; const idxs = JSON.parse(`[${idxString}]`); const result = highlight(idxs, str); console.clear(); document.getElementById('output').innerHTML = ''; setTimeout(() => { console.log(result); document.getElementById('output').innerHTML = result; }, 300)}</script>
This involves the helper functions intoPairs, splitAtIndices, alternate, wrap and truncate. I think they are best shown by examples:
intoPairs (['a', 'b', 'c', 'd']) //=> [['a', 'b'], ['b', 'c'], ['c', 'd']]
splitAtIndices ([0, 3, 7, 15], 'abcdefghijklmno') //=> ["abc", "defg", "hijklmno"]
// slice (0, 3) => "abc", slice (3, 7) => "defg", slice (7, 15) => "hijklmno"
alternate (f, g) ([a, b, c, d, e, ...]) //=> [f(a), g(b), f(c), g(d), f(e), ...]
wrap ('<div>', '</div>') ('foo bar baz') //=> '<div>foo bar baz</div>'
// truncate (count) (input, position) //=> output
truncate (10) ('abcdefghijklmnop', ~START~) //=> '... ghijklmnop'
truncate (10) ('abcdefghijklmnop', ~END~) //=> 'abcdefghij ...'
truncate (10) ('abcdefghijklmnop', ~MIDDLE~) //=> 'abcdefghijklmnop'
truncate (10) ('abcdefghijklmnopqrstuvwxyz', ~MIDDLE~) //=> 'abcdefghij ... qrstuvwxyz'
All of these are potentially reusable, and I personally have intoPairs and wrap in my general utility library.
truncate is the only complex one, and that is mostly because it does triple duty, handling the first string, the last string, and all the others in three distinct ways. You first supply a count, and then you give it a string as well as its position (START, MIDDLE, or END, stored as properties of alternate). For the first string, if it is longer than count, it returns an ellipsis (...) followed by the last count characters. For the last one, it returns the first count characters followed by an ellipsis. For the middle ones, if the length is at most double count, it returns the whole thing; otherwise it returns the first count characters, an ellipsis, and the last count characters. This behavior might be different from what you want; if so, it is easy to change.
The main function is highlighter. It accepts two functions: the first says how you want to handle the non-highlighted sections, the second how to handle the highlighted ones. It returns the sort of function you were looking for: one that accepts an array of two-element arrays of numbers (the ranges) and your input string, and returns a string in which the highlighted and non-highlighted sections have each been transformed by the appropriate function.
We use it to generate the highlight function by passing it truncate (20) and wrap('<u>', '</u>').
The intermediate forms might make it clearer what's going on.
We start with these indices:
[[23, 30], [69, 75]]
and our 102-character string,
"Some text that doesn't include the main thing, the main thing is the result, you may know I meant that"
First we flatten the ranges, prepending a zero if the first range doesn't start there and appending the length of the string (102, one past the last index, since slice's end is exclusive), to get this:
[0, 23, 30, 69, 75, 102]
We pass that to splitAtIndices, along with our string, to get
[
"Some text that doesn't ",
"include",
" the main thing, the main thing is the ",
"result",
", you may know I meant that"
]
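These first two intermediate steps can be reproduced directly with the answer's own intoPairs and splitAtIndices; the sketch below only adds the index-building line as a standalone expression (for this sample, where the first range does not start at 0):

```javascript
// The answer's helpers, unchanged apart from spacing.
const intoPairs = (xs) => xs.slice(1).map((x, i) => [xs[i], x]);
const splitAtIndices = (indices, str) =>
  intoPairs(indices).map(([a, b]) => str.slice(a, b));

const str = "Some text that doesn't include the main thing, the main thing is the result, you may know I meant that";
const ranges = [[23, 30], [69, 75]];

// Flatten and sort the boundaries, prepend 0 (the first range does not start
// there) and append str.length (slice's end is exclusive).
const indices = [0, ...ranges.flat().sort((a, b) => a - b), str.length];
console.log(indices); // [ 0, 23, 30, 69, 75, 102 ]

// Slice between each consecutive pair of indices.
const segments = splitAtIndices(indices, str);
console.log(segments);
```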
Then we map the appropriate functions over each of these strings to get
[
"... e text that doesn't ",
"<u>include</u>",
" the main thing, the main thing is the ",
"<u>result</u>",
", you may know I mea ..."
]
and join those together to get our final result:
"... e text that doesn't <u>include</u> the main thing, the main thing is the <u>result</u>, you may know I mea ..."
I like the flexibility this offers. It's easy to alter the highlight strategy as well as how you handle the unhighlighted parts -- just pass a different function to highlighter. It's also a useful breakdown of the work into reusable parts.
But there are two things I don't like.
First, I'm not thrilled with the handling of middle unhighlighted sections. Of course it's easy to change; but I don't know what would be appropriate. You might, for instance, want to change the doubling applied to the count there. Or you might have an entirely different idea.
Second, truncate is dependent upon alternate. We have to somehow pass signals from alternate to the two functions supplied to it to let them know where we are. My first pass involved passing the index and the entire array (the Array.prototype.map signature) to those functions. But that felt too coupled. We could make START, MIDDLE, and END into module-local properties, but then alternate and truncate would not be reusable. I'm not going to go back and try it now, but I think a better solution might be to pass four functions to highlighter: the function for the highlighted sections, and one each for start, middle, and end positions of the non-highlighted ones.
Update
I did go ahead and try that alternative I mentioned, and I think this version is cleaner, with all the complexity located in the single function highlighter:
const intoPairs = (xs) =>
xs .slice (1) .map ((x, i) => [xs[i], x])
const splitAtIndices = (indices, str) =>
intoPairs (indices) .map (([a, b]) => str .slice (a, b))
const wrap = (before, after) => (s) => `${before}${s}${after}`
const truncateStart = (count) => (s) =>
s .length <= count ? s : '... ' + s .slice (-count)
const truncateMiddle = (count) => (s) =>
s .length <= (2 * count) ? s : s .slice (0, count) + ' ... ' + s .slice (-count)
const truncateEnd = (count) => (s) =>
s .length <= count ? s : s .slice (0, count) + ' ...'
const highlighter = (highlight, start, middle, end) =>
(ranges, str, flip = ranges[0][0] == 0) =>
splitAtIndices ([...(flip ? [] : [0]), ...ranges .flat() .sort((a, b) => a - b), str.length], str)
.map (
(s, i, a) =>
(flip
? (i % 2 == 0 ? highlight : i == a.length - 1 ? end : middle)
: (i == 0 ? start : i % 2 == 1 ? highlight : i == a.length - 1 ? end : middle)
) (s)
) .join ('')
const highlight = highlighter (
wrap('<u>', '</u>'),
truncateStart(20),
truncateMiddle(20),
truncateEnd(20)
)
console .log (
highlight (
[[23, 30], [69, 75]],
"Some text that doesn't include the main thing, the main thing is the result, you may know I meant that"
)
)
console .log (
highlight (
[[23, 30], [86, 92]],
"Some text that doesn't include the main thing, because you see, the main thing is the result, you may know I meant that"
)
)
There is some real complexity built into highlighter, but I think it's fairly intrinsic to the problem. On each iteration, we have to choose one of our four functions based on the index, the length of the array, and whether the first range started at zero. This bit here simply chooses the function based on all that:
(flip
? (i % 2 == 0 ? highlight : i == a.length - 1 ? end : middle)
: (i == 0 ? start : i % 2 == 1 ? highlight : i == a.length - 1 ? end : middle)
)
where the flip boolean simply reports whether the first range starts at 0, a is the array of substrings to handle, and i is the current index in that array. If you see a cleaner way of choosing the function, I'd love to know.
If we wanted to write a convenience wrapper for this sort of highlighting, we could easily write
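One possible way to untangle the nested conditional is to ask two separate questions: is segment i highlighted (which depends only on flip and the parity of i), and, if not, is it the first, last, or a middle segment? The sketch below does that; chooseFn is a hypothetical helper name, and the rest mirrors the updated answer. It should choose the same function as the original expression for every index:

```javascript
const intoPairs = (xs) => xs.slice(1).map((x, i) => [xs[i], x]);
const splitAtIndices = (indices, str) =>
  intoPairs(indices).map(([a, b]) => str.slice(a, b));

const wrap = (before, after) => (s) => `${before}${s}${after}`;
const truncateStart = (count) => (s) =>
  s.length <= count ? s : '... ' + s.slice(-count);
const truncateMiddle = (count) => (s) =>
  s.length <= 2 * count ? s : s.slice(0, count) + ' ... ' + s.slice(-count);
const truncateEnd = (count) => (s) =>
  s.length <= count ? s : s.slice(0, count) + ' ...';

// Hypothetical helper. With flip, the highlighted segments are the even ones;
// otherwise the odd ones. Non-highlighted segments are chosen by position.
const chooseFn = ({ highlight, start, middle, end }, flip) => (i, length) => {
  const isHighlighted = flip ? i % 2 === 0 : i % 2 === 1;
  if (isHighlighted) return highlight;
  if (i === 0) return start;
  if (i === length - 1) return end;
  return middle;
};

const highlighter = (highlight, start, middle, end) =>
  (ranges, str, flip = ranges[0][0] === 0) => {
    const choose = chooseFn({ highlight, start, middle, end }, flip);
    const indices = [...(flip ? [] : [0]), ...ranges.flat().sort((a, b) => a - b), str.length];
    return splitAtIndices(indices, str)
      .map((s, i, a) => choose(i, a.length)(s))
      .join('');
  };

const highlight = highlighter(
  wrap('<u>', '</u>'),
  truncateStart(20),
  truncateMiddle(20),
  truncateEnd(20)
);

console.log(highlight([[23, 30], [69, 75]],
  "Some text that doesn't include the main thing, the main thing is the result, you may know I meant that"));
```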
const truncatingHighlighter = (count, start, end) =>
highlighter (
wrap(start, end),
truncateStart(count),
truncateMiddle(count),
truncateEnd(count)
)
const highlight = truncatingHighlighter (20, '<u>', '</u>')
I definitely think this is a superior solution.
I have some JavaScript that works correctly in Firefox, but it doesn't work in Chrome!
The code builds two lists that are supposed to show a list of movies in alphabetical order:
one list from A to Z and one list from Z to A.
Any ideas what I have done wrong?
Thanks a lot
var Movies = ["Pulp Fiction", "The Dark Knight", "Fight Club", " Terminator", "Matrix", "American History X", "Memento "];
function listMovies()
{
document.writeln("<b>Movies sort from A to B:</b>");
document.writeln(Movies.sort().join("<br>"));
document.writeln("<br><b>Movies sort from B to A:</b>");
document.writeln(Movies.sort(function(a, b){return b-a}).join("<br>"));
}
listMovies();
Two issues:
1. You have a space in front of Terminator, which is throwing off the results.
2. Your "B to A" function uses the - operator on strings, which is not going to give you reliable results.
To solve #2, use a function that doesn't try to subtract strings:
document.writeln(Movies.sort(function(a, b){
return a == b ? 0 : (a < b ? 1 : -1)
}).join("<br>"));
Side note: I'd avoid using document.writeln and similar.
First, you probably shouldn't be using document.writeln. It's almost never necessary. Use DOM methods instead.
The problem, however, is that your code is trying to subtract one string from another in calculating the relative positions:
Movies.sort(function(a, b){return b-a})
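This works fine with numbers (4 - 2 === 2) but not with strings ('Terminator' - 'American History' is NaN). Array.prototype.sort treats a NaN result as 0 ("equal"), so this comparator never reliably reverses anything, and the order you get depends on each browser's sort implementation.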
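A quick way to see the difference, and a comparator that compares the strings themselves instead of subtracting them (zToA is just an illustrative name):

```javascript
// Subtracting strings converts both sides to numbers; non-numeric text becomes NaN.
console.log('Terminator' - 'American History'); // NaN
console.log(4 - 2); // 2

// Compare the strings directly: positive when a should come after b (Z to A).
const zToA = (a, b) => (a === b ? 0 : a < b ? 1 : -1);
console.log(['Matrix', 'Fight Club', 'Pulp Fiction'].sort(zToA));
// [ 'Pulp Fiction', 'Matrix', 'Fight Club' ]
```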
You need to compare with < and > instead, which makes your function a little more complex:
function listMovies()
{
  document.writeln("<b>Movies sort from A to B:</b>");
  document.writeln(Movies.sort().join("<br>"));
  document.writeln("<br><b>Movies sort from B to A:</b>");
  document.writeln(Movies.sort(function(a, b){
    if (b === a) return 0;
    if (b > a) return 1;
    return -1;
  }).join("<br>"));
}
The other oddity is that Terminator has a leading space, which sends it to the beginning of the list alphabetically, because a space sorts before any letter.
What about trimming the titles first?
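Putting both fixes together, here is a sketch that runs outside the browser (console.log instead of document.writeln). Note that Movies.sort() reorders Movies itself, so the sketch sorts copies to leave the original array intact; the variable names are illustrative:

```javascript
const movies = ["Pulp Fiction", "The Dark Knight", "Fight Club", " Terminator", "Matrix", "American History X", "Memento "];

// Remove the stray leading/trailing spaces first.
const cleaned = movies.map((title) => title.trim());

// Array.prototype.sort sorts in place, so sort a copy.
const aToZ = [...cleaned].sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
const zToA = [...aToZ].reverse();

console.log(aToZ.join('\n'));
console.log(zToA.join('\n'));
```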
var Movies = ["Pulp Fiction", "The Dark Knight", "Fight Club", " Terminator", "Matrix", "American History X", "Memento "];
function listMovies(){
for(var i=0;i<Movies.length;i++){
Movies[i] = Movies[i].trim();
}
document.writeln("<b>Movies sort from A to B:</b>");
document.writeln(Movies.sort().join("<br>"));
document.writeln("<br><b>Movies sort from B to A:</b>");
document.writeln(Movies.sort(function(a, b){return a == b ? 0 : (a < b ? 1 : -1)}).join("<br>"));
}
listMovies();
I am facing quite a challenge here: I need to sort certain Chinese "expressions" by pinyin.
The question:
How could I sort by pinyin in Firefox?
Is there a way to sort properly in IE 9 and 10? (The website also has to support them.)
Example:
财经传讯公司
财经顾问
房地产及按揭
According to a translation agency, this is what the sort order of the words should be. The translations are as follows:
Financial communication agencies
Financial consultancies
Real estate and mortgages
The pronunciations in the Latin alphabet:
cai jing chuan xun gong si
cai jing gu wen
fang di chan ji an jie
String.localeCompare (MDN docs):
From what I understand, I have to pass a second argument to the String.localeCompare method that "tells" the method to sort by pinyin. In BCP 47 format, that should be zh-CN-u-co-pinyin.
So the full code should look like this:
var arr = [ "财经传讯公司", "财经顾问", "房地产及按揭"];
console.dir(arr.sort(function(a, b){
return a.localeCompare(b, [ "zh-CN-u-co-pinyin" ]);
}));
I expected this to log the expressions to the console in the order I entered them in the array, but the output differs.
In Firefox 27, the order is: 3, 1, 2
In Chrome 33: 1, 2, 3
In IE 11: 1, 2, 3
Note:
Pinyin is the official phonetic system for transcribing the Mandarin
pronunciations of Chinese characters into the Latin alphabet.
This works on Chrome:
const arr = ["博","啊","吃","世","中","超"]
arr.sort((x,y)=>x.localeCompare(y, 'zh-CN'))
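A variation on the same idea is to create an Intl.Collator once, with the pinyin tag from the question, and pass its compare method to sort. MDN recommends Intl.Collator over repeated localeCompare calls when comparing many strings. This sketch assumes an engine with full ICU locale data (Node.js ships it by default in current versions); with reduced locale data the pinyin collation may be unavailable and a different order can come back:

```javascript
// Build the collator once; its compare method can be passed straight to sort.
const collator = new Intl.Collator('zh-CN-u-co-pinyin');

const arr = ['房地产及按揭', '财经顾问', '财经传讯公司'];
const sorted = [...arr].sort(collator.compare);
console.log(sorted);
```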
In general, people use the following method to sort Chinese characters by pinyin:
var list = [' king ', 'a', 'li'];
list.sort(function (a, b) { return a.localeCompare(b); });
localeCompare() compares two strings using a locale-specific order.
This approach to pinyin sorting is unreliable, because it depends heavily on the operating system and the browser engine.
In other words, depending on your visitors' operating system and browser (Chrome versus Internet Explorer, for example), they will probably not see the pinyin order we expected.
Here is my solution to this problem; I hope it helps.
This method supports the Unicode range 0x4E00 to 0x9FA5, a total of 20,902 consecutive Chinese characters used in China (including Taiwan), Japan, and South Korea, namely the CJK (Chinese, Japanese, Korean) characters.
var CompareStrings = {
  // ......... (elided in the original; presumably includes Db, the pinyin-sorted character table)
  getOrderedUnicode: function (char) {
    var originalUnicode = char.charCodeAt(0);
    if (originalUnicode >= 0x4e00 && originalUnicode <= 0x9fa5) {
      var index = this.Db.indexOf(char);
      if (index > -1) {
        return index + 0x4e00;
      }
    }
    return originalUnicode;
  },
  compare: function (a, b) {
    if (a == b) { return 0; }
    // This can be rewritten for specific needs; here empty strings go to the bottom
    if (a.length == 0) { return 1; }
    if (b.length == 0) { return -1; }
    var count = a.length > b.length ? b.length : a.length;
    for (var i = 0; i < count; i++) {
      var au = this.getOrderedUnicode(a[i]);
      var bu = this.getOrderedUnicode(b[i]);
      if (au > bu) {
        return 1;
      } else if (au < bu) {
        return -1;
      }
    }
    return a.length > b.length ? 1 : -1;
  }
};
// Override the native localeCompare
String.prototype.localeCompare = function (param) {
  return CompareStrings.compare(this.toString(), param);
};
You can download the complete code from the link below.
A brief explanation of how it works:
Building the table of characters sorted by pinyin (Db): there are multiple ways to do this. I used a combination of JavaScript and C#: a script first enumerates all the Chinese characters and submits them to a C# back end, which sorts them and outputs the result to the front end. This is only preparation, so any approach will do.
Determining which of two characters is greater (getOrderedUnicode): when sorting, we have to deal not only with Chinese characters but also with other characters, so the comparator must be able to rank every character. We first check whether a character is a Chinese character. If it is, we look up its index in the pinyin-sorted table and add the code point of the first Chinese character (0x4E00), which gives a "calibrated" Unicode value. If it is not, we return its Unicode value directly.
Comparing two strings (compare): we compare the two strings character by character, up to the length of the shorter string. If a character of a is greater than the corresponding character of b, it returns 1; if it is less, it returns -1.
If there is still a tie after that comparison, the longer string sorts later: for example, with a = '123' and b = '1234', b comes after a.
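To see the table-lookup idea in isolation, here is a standalone sketch of the same comparison, written as plain functions instead of overriding String.prototype.localeCompare. The four-character Db is a hypothetical miniature table (cai, chuan, fang, gu in pinyin order), enough only for the question's three expressions; the real table holds all 20,902 characters:

```javascript
// Hypothetical miniature table, in pinyin order: 财 cai, 传 chuan, 房 fang, 顾 gu.
const Db = '财传房顾';

// Chinese characters found in the table get 0x4E00 + their pinyin rank;
// everything else keeps its UTF-16 code unit.
function getOrderedUnicode(char) {
  const code = char.charCodeAt(0);
  if (code >= 0x4e00 && code <= 0x9fa5) {
    const index = Db.indexOf(char);
    if (index > -1) return index + 0x4e00;
  }
  return code;
}

// Character-by-character comparison up to the shorter length; on a tie the
// longer string sorts later. Empty strings sort to the bottom.
function compare(a, b) {
  if (a === b) return 0;
  if (a.length === 0) return 1;
  if (b.length === 0) return -1;
  const count = Math.min(a.length, b.length);
  for (let i = 0; i < count; i++) {
    const au = getOrderedUnicode(a[i]);
    const bu = getOrderedUnicode(b[i]);
    if (au > bu) return 1;
    if (au < bu) return -1;
  }
  return a.length > b.length ? 1 : -1;
}

console.log(['房地产及按揭', '财经顾问', '财经传讯公司'].sort(compare));
// [ '财经传讯公司', '财经顾问', '房地产及按揭' ]
console.log(compare('123', '1234')); // -1
```

With a partial table, Chinese characters missing from it fall back to their raw code points, which can collide with the calibrated values, so the table must be complete for real use.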
EDIT
You can also plug this into the jQuery DataTables plugin:
jQuery.extend( jQuery.fn.dataTableExt.oSort, {
"chinese-string-asc" : function (s1, s2) {
return s1.localeCompare(s2);
},
"chinese-string-desc" : function (s1, s2) {
return s2.localeCompare(s1);
}
} );
See the original post.
According to MDN, the locales and options arguments of localeCompare() were added in Firefox 29, so you should be able to sort by pinyin now.
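Since engines without this support (Firefox before 29, or IE 9 and 10 from the question) simply ignore the extra arguments, it can help to detect support before relying on them. MDN documents a check along these lines: pass an invalid language tag and see whether a RangeError is thrown. localeCompareSupportsLocales is an illustrative name:

```javascript
// Returns true if this engine honors the locales argument of localeCompare.
// Engines that ignore the argument never validate the invalid tag 'i', so no error is thrown.
function localeCompareSupportsLocales() {
  try {
    'foo'.localeCompare('bar', 'i');
  } catch (e) {
    return e.name === 'RangeError';
  }
  return false;
}

console.log(localeCompareSupportsLocales());
```

Where it returns false, a table-based comparator such as the one described in the previous answer is one possible fallback.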
Here is a solution:
<!--
pinyin_dict_notone.js and pinyinUtil.js is available in URL below:
https://github.com/sxei/pinyinjs
-->
<script src="pinyin_dict_notone.js"></script>
<script src="pinyinUtil.js"></script>
<script>
jQuery.extend(jQuery.fn.dataTableExt.oSort, {
"chinese-string-asc": function(s1, s2) {
s1 = pinyinUtil.getPinyin(s1);
s2 = pinyinUtil.getPinyin(s2);
return s1.localeCompare(s2);
},
"chinese-string-desc": function(s1, s2) {
s1 = pinyinUtil.getPinyin(s1);
s2 = pinyinUtil.getPinyin(s2);
return s2.localeCompare(s1);
}
});
jQuery(document).ready(function() {
jQuery('#mydatatable').dataTable({
"columnDefs": [
{ type: 'chinese-string', targets: 0 }
]
});
});
</script>