I'm building a web scraper that, let's say, scrapes URLs from Google.
I get an array of URLs from the Google results:
const linkSelector = 'div.yuRUbf > a'
let links = await page.$$eval(linkSelector, link => {
return link.map( x => x.href)
})
The output of 'links' is something like this:
[
'https://google.com/.../antyhing',
'https://amazon.com/.../antyhing',
'https://twitter.com/.../antyhing'
]
Now I have a 'blacklist', with something like that:
[
'https://amazon.com'
]
At the moment I'm stuck at the point of comparing both arrays and removing from 'links' the URLs that are listed in my blacklist.
So I came up with the idea of getting the domain of each URL in my links array, like so:
const linkList = []
for ( const link of links ) {
const url = new URL(link)
const domain = url.origin
linkList.push(domain)
}
Yes, now I have two arrays which I can compare against each other to remove the blacklisted domain, but I've lost the complete URL I need to work with...
for( let i = linkList.length - 1; i >= 0; i--){
for( let j=0; j < blacklist.length; j++){
if( linkList[i] === blacklist[j]){
linkList.splice(i, 1);
}
}
}
This code snippet is taken from an answer to the question "Compare two Javascript Arrays and remove Duplicates".
Any ideas how I can do this with Puppeteer and Node.js?
I couldn't find an obvious dupe, so converting my comments to an answer:
.includes:
const allowedLinks = links.filter(link => !blacklist.some(e => link.includes(e)))
.startsWith:
const allowedLinks = links.filter(link => !blacklist.some(e => link.startsWith(e)))
The second version is more precise. If you want to use the URL version, this should work:
const links = [
"https://google.com/.../antyhing",
"https://amazon.com/.../antyhing",
"https://twitter.com/.../antyhing",
];
const blacklist = ["https://amazon.com"];
const allowedLinks = links.filter(link =>
!blacklist.some(black =>
black.startsWith(new URL(link).origin) // or use ===
)
);
console.log(allowedLinks);
As for Puppeteer, I doubt it matters whether you do this Node-side or browser-side, unless these arrays are enormous. On that train of thought, technically we have a quadratic algorithm here, but I wouldn't worry about it unless you have many hundreds of thousands of elements and are noticing slowness. In that case, you can put the blacklisted origins into a Set and look up each link's origin in that. The problem with this is that it's a precise === match, so you'd have to build a prefix set if you need to preserve .startsWith semantics. This is likely unnecessary and out of scope for this answer, but worth mentioning briefly.
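The Set lookup mentioned above could look roughly like the sketch below. The sample URLs and the names `sampleLinks`, `blockedOrigins` and `filterByOrigin` are made up for illustration. Blacklist entries are normalized through `new URL(...).origin` first, on the assumption that every entry is a full URL with a scheme.

```javascript
// Hypothetical sample data, shaped like the arrays in the question.
const sampleLinks = [
  "https://google.com/search?q=anything",
  "https://amazon.com/some/product",
  "https://amazon.com.example.org/lookalike",
  "https://twitter.com/someone",
];
const sampleBlacklist = ["https://amazon.com"];

// Normalize each blacklist entry to its origin so the exact-match lookup is reliable.
const blockedOrigins = new Set(sampleBlacklist.map((entry) => new URL(entry).origin));

// One Set lookup per link instead of scanning the whole blacklist each time.
const filterByOrigin = (urls, blocked) =>
  urls.filter((url) => !blocked.has(new URL(url).origin));

const allowed = filterByOrigin(sampleLinks, blockedOrigins);
console.log(allowed);
```

Because the comparison is on the whole origin, the lookalike host `amazon.com.example.org` is kept here, whereas a plain string-prefix check against "https://amazon.com" would also match it.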
I have thought about this a lot, but I can't find a good solution that is also fast in JavaScript.
I have an array of objects. The objects are game searches, each from a random player.
The array may look like this:
const GameSearches = [
{fromPlayerId:"378329",goalScore:20},
{fromPlayerId:"125342",goalScore:20},
{fromPlayerId:"378329",goalScore:20},
{fromPlayerId:"918273",goalScore:20},
{fromPlayerId:"378329",goalScore:20}
]
For this array I need to run a function called CreateNewGame(Player1,Player2).
In this array I could create games with, for example, index 0 and 1, and index 2 and 3. Index 4 would be left in the array as there are no more players to match with.
Anyone got a good solution to this? It would really help me out.
I have tried different combinations of filter and map without finding a good solution.
The output should be a call to the function createNewGame.
Example:
createNewGame(GameSearches[0].fromPlayerId,GameSearches[1].fromPlayerId)
This function will be called because there are two players looking for a game. They do not have the same fromPlayerId, so they should match.
I see some comments saying that Stack Overflow is not a free coding service. The app has thousands of lines and this is only a small part. I'm asking because I can't figure out the logic of how to do this. I do not need a full working example.
You can try something like this
const GameSearches = [
{fromPlayerId:"378329",goalScore:20},
{fromPlayerId:"125342",goalScore:20},
{fromPlayerId:"378329",goalScore:20},
{fromPlayerId:"918273",goalScore:20},
{fromPlayerId:"378329",goalScore:20}
];
const createNewGames = (player1, player2) => console.log(player1.fromPlayerId, player2.fromPlayerId)
const getMatch = (GameSearches) => {
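// Compare player ids, not object references: keep going only while two different players remain
while(new Set(GameSearches.map(el => el.fromPlayerId)).size > 1){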
const player1 = GameSearches[0];
GameSearches.shift();
const player2Index = GameSearches.findIndex(el => el.fromPlayerId !== player1.fromPlayerId)
const player2 = GameSearches[player2Index];
GameSearches.splice(player2Index,1)
createNewGames(player1, player2);
}
}
getMatch(GameSearches);
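A variant of the same greedy idea that leaves the input array untouched and also reports who is still waiting might look like this. It is only a sketch; `pairPlayers`, `pairs` and `leftovers` are names invented here, and the pairing rule (first waiting player with a different id) is an assumption about what counts as a valid match.

```javascript
// Hypothetical sample data, copied from the shape used in the question.
const searches = [
  { fromPlayerId: "378329", goalScore: 20 },
  { fromPlayerId: "125342", goalScore: 20 },
  { fromPlayerId: "378329", goalScore: 20 },
  { fromPlayerId: "918273", goalScore: 20 },
  { fromPlayerId: "378329", goalScore: 20 },
];

function pairPlayers(allSearches) {
  const waiting = [...allSearches]; // work on a copy, the input is not mutated
  const pairs = [];
  const leftovers = [];
  while (waiting.length > 0) {
    const first = waiting.shift();
    // Find the next waiting search that belongs to a different player.
    const idx = waiting.findIndex((s) => s.fromPlayerId !== first.fromPlayerId);
    if (idx === -1) {
      leftovers.push(first); // nobody to play against yet
      continue;
    }
    const [second] = waiting.splice(idx, 1);
    pairs.push([first, second]);
  }
  return { pairs, leftovers };
}

const { pairs, leftovers } = pairPlayers(searches);
console.log(pairs.map(([a, b]) => `${a.fromPlayerId} vs ${b.fromPlayerId}`));
console.log(leftovers.length, "search(es) still waiting");
```

Each entry in `pairs` can then be passed to whatever function actually creates the game.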
I think maybe I can use the suggested for loop, and it will work fine.
for (let i = 0; i < games20GoalScore.length; i = i + 2) {
if (games20GoalScore[i + 1] !== undefined) {
console.log(games20GoalScore[i] + " : " + games20GoalScore[i + 1]);
if (games20GoalScore[i].from !== games20GoalScore[i + 1].from) {
console.log("Match");
}
}
}
This code runs each time the array gets a new item.
I am attempting to iterate over a very large 2D array in JavaScript within an ionic application, but it is majorly bogging down my app.
A little background: I created a custom search component with StencilJS that provides suggestions on keyup. You feed the component an array of strings (the search suggestions). Each individual string is lowercased, tokenized word by word, and split into an array.
For example, "Red-Winged Blackbird" becomes
['red','winged','blackbird']
So, tokenizing an array of strings looks like this:
[['red','winged','blackbird'],['bald','eagle'], ...]
Now, I have 10,000+ of these smaller arrays within one large array.
Then, I tokenize the search terms the user inputs upon each keyup.
Afterwards, I am comparing each tokenized search term array to each tokenized suggestion array within the larger array.
Therefore, I have 2 nested for-of loops.
In addition, I am using Levenshtein distance to compare each search term to each element of each suggestion array.
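For reference, the setup described above can be sketched as follows. This is not the asker's actual component code: `tokenizeWords`, `levenshtein` and `scoreSuggestion` are hypothetical names, and summing the best per-token distance is only one assumed way of combining the comparisons.

```javascript
// Lowercase and split on anything that is not a letter or digit.
const tokenizeWords = (text) =>
  text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

// Classic dynamic-programming Levenshtein distance, keeping two rows at a time.
function levenshtein(a, b) {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// For every search token take its closest suggestion token, then add the distances up.
const scoreSuggestion = (searchTokens, suggestionTokens) =>
  searchTokens.reduce(
    (sum, term) => sum + Math.min(...suggestionTokens.map((t) => levenshtein(term, t))),
    0
  );

const suggestionTokens = tokenizeWords("Red-Winged Blackbird");
console.log(suggestionTokens); // ['red', 'winged', 'blackbird']
console.log(scoreSuggestion(tokenizeWords("blakbird"), suggestionTokens)); // 1
```

Running `scoreSuggestion` against 10,000+ suggestions on every keyup is exactly the nested-loop cost the question is about.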
I had a couple of drinks, so please be patient while I stumble through this.
To start, I'd do something like a reverse index (not very informative). It's pretty close to what you are already doing, but with a couple of extra steps.
First, go through all your results and tokenize, stem, remove stop words, decap, coalesce, etc. It looks like you've already done this, but I'm adding an example for completeness.
const tokenize = (string) => {
const tokens = string
.split(' ') // just split on words, but maybe rep
.filter((v) => v.trim() !== '');
return new Set(tokens);
};
Next, what we want to do is generate a map that takes a word as a key and returns a list of the indexes of the documents the word appears in.
const documents = ['12312 taco', 'taco mmm'];
const index = {
'12312': [0],
'taco': [0, 1],
'mmm': [1]
};
I think you can see where this is taking us... We can tokenize our search term, find all the documents each token belongs to, work some ranking magic, take the top 5, blah blah blah, and have our results. This is typically the way Google and other search giants do their searches. They spend a ton of time on precomputation so that their search engines can slice down candidates by orders of magnitude and work their magic.
Below is an example snippet. This needs a ton of work (please remember, I've been drinking) but it's running through a million records in >.3ms. I'm cheating a bit by generating 2-letter words and phrases, only so that I can demonstrate queries that sometimes achieve collision. This really doesn't matter since the query time is on average proportionate to the number of records. Please be aware that this solution gives you back records that contain all search terms (terms that appear in no record at all are skipped). It doesn't care about context or whatever. You will have to figure out the ranking (if you care at this point) to achieve the results you want.
const tokenize = (string) => {
const tokens = string.split(' ')
.filter((v) => v.trim() !== '');
return new Set(tokens);
};
const ri = (documents) => {
const index = new Map();
for (let i = 0; i < documents.length; i++) {
const document = documents[i];
const tokens = tokenize(document);
for (let token of tokens) {
if (!index.has(token)) {
index.set(token, new Set());
}
index.get(token).add(i);
}
}
return index;
};
const intersect = (sets) => {
const [head, ...rest] = sets;
return rest.reduce((r, set) => {
return new Set([...r].filter((n) => set.has(n)))
}, new Set(head));
};
const search = (index, query) => {
const tokens = tokenize(query);
const candidates = [];
for (let token of tokens) {
const keys = index.get(token);
if (keys != null) {
candidates.push(keys);
}
}
return intersect(candidates);
}
const word = () => Math.random().toString(36).substring(2, 4);
const terms = Array.from({ length: 255 }, () => word());
const documents = Array.from({ length: 1000000 }, () => {
const sb = [];
for (let i = 0; i < 2; i++) {
sb.push(word());
}
return sb.join(' ');
});
const index = ri(documents);
const st = performance.now();
const query = 'bb iz';
const results = search(index, query);
const et = performance.now();
console.log(query, Array.from(results).slice(0, 10).map((i) => documents[i]));
console.log(et - st);
There are some improvements you can make if you want. Like... ranking! The whole purpose of this example is to show how we can cut down 1M results to maybe a hundred or so candidates. The search function has some post-filtering via intersection, which probably isn't what you want, but at this point it doesn't really matter what you do since the results are so small.
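For the ranking step that is left open, one simple possibility (an assumption, not what the snippet above does) is to count how many query tokens each document matches and sort by that count instead of intersecting. The sketch below is self-contained, so it rebuilds a tiny index; `tokenizeDoc`, `buildIndex` and `rankedSearch` are hypothetical names.

```javascript
const tokenizeDoc = (text) =>
  new Set(text.toLowerCase().split(/\s+/).filter(Boolean));

// Map each token to the set of document indexes that contain it.
const buildIndex = (docs) => {
  const index = new Map();
  docs.forEach((doc, i) => {
    for (const token of tokenizeDoc(doc)) {
      if (!index.has(token)) index.set(token, new Set());
      index.get(token).add(i);
    }
  });
  return index;
};

// Score = number of distinct query tokens found in the document.
const rankedSearch = (index, query, limit = 5) => {
  const hits = new Map();
  for (const token of tokenizeDoc(query)) {
    for (const docId of index.get(token) ?? []) {
      hits.set(docId, (hits.get(docId) ?? 0) + 1);
    }
  }
  return [...hits.entries()]
    .sort((a, b) => b[1] - a[1] || a[0] - b[0]) // best score first, then original order
    .slice(0, limit)
    .map(([docId]) => docId);
};

const sampleDocs = ["red winged blackbird", "bald eagle", "red tailed hawk"];
const sampleIndex = buildIndex(sampleDocs);
const top = rankedSearch(sampleIndex, "red hawk");
console.log(top.map((i) => sampleDocs[i])); // ['red tailed hawk', 'red winged blackbird']
```

A fuzzy measure such as Levenshtein distance could then be applied only to this short candidate list rather than to every record.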
Currently, I have a huge JavaScript array where each element is like this:
[{"open":235.86,
"high":247.13,
"low":231.5,
"close":244.1,
"volume":55025735,
"date":"2019-05-01T21:00:00.000Z"}
...
I need to remove everything except the price after high. What is the most efficient way I can do this?
I've tried popping the individual elements, but I can't help but feel as if there is a more efficient/easier way to do this.
So hopefully the ending array would just be [235.86].
The below code should work. It's efficient enough :D
// Loop through the array and delete the unwanted properties from each object
for (const item of arrayName) {
delete item.high
delete item.low
delete item.close
delete item.volume
delete item.date
}
// Each element is now an object that only has the "open" property
console.log(arrayName)
One potential solution would be to map the array to a new array like so:
const yourArray = [
{"open":235.86, "high":247.13, "low":231.5, "close":244.1, "volume":55025735},
{"open":257.52, "high":234.53, "low":220.2, "close":274.1, "volume":23534060},
]
const mappedArray = yourArray.map(el => el.open);
// mappedArray = [235.86, 257.52]
Check out the MDN documentation for the map method, Array.prototype.map()
Note: The above example uses ECMAScript 6 arrow functions and implicit returns. It is functionally equivalent to:
const yourArray = [
{"open":235.86, "high":247.13, "low":231.5, "close":244.1, "volume":55025735},
{"open":257.52, "high":234.53, "low":220.2, "close":274.1, "volume":23534060},
]
const mappedArray = yourArray.map(function(el){
return el.open
});
You can use reduce for this scenario. Example
var temp = [{"open":235.86, "high":247.13, "low":231.5, "close":244.1, "volume":55025735, "date":"2019-05-01T21:00:00.000Z"}];
var highValArray = temp.reduce((arr, t) => {
return arr.concat(t['high']);
}, []);
You can learn more about reduce function at the MDN website.
This should work:
your_array.map((item) => {
return item.high
})
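Since the answers above disagree on which field is wanted ("open" or "high"), a small helper that takes the property name as a parameter covers both cases. This is a sketch; `pluck` and `sampleRows` are names chosen here, and the rows are the sample values already used above.

```javascript
// Return one property from every object in the array, without modifying the objects.
const pluck = (rows, key) => rows.map((row) => row[key]);

const sampleRows = [
  { open: 235.86, high: 247.13, low: 231.5, close: 244.1, volume: 55025735 },
  { open: 257.52, high: 234.53, low: 220.2, close: 274.1, volume: 23534060 },
];

const opens = pluck(sampleRows, "open");
const highs = pluck(sampleRows, "high");
console.log(opens); // [235.86, 257.52]
console.log(highs); // [247.13, 234.53]
```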
I am using a Firebase storage bucket to save files, and I save their download links to the database.
Everything works well, except that the file indexes get mixed up after I run a forEach loop (lodash's _.each).
getFiles(e){
this.outPutFiles = e;
_.each(this.outPutFiles, ((file) => {
const ref = this._storage.ref(file);
return ref.getDownloadURL().subscribe(url => this.img_array.push(url));
}));
}
Expected behavior should be:
this.outPutFiles = [
0:"o-t-status-files/.....Qd5bGm"
1:"o-t-status-files/.....dz2bd8"
]
And the
this.img_array = [
0:"https://firebase....%2FJ6Qx9........Qd5bGm"
1:"https://firebase....%2FJ6Qx9........dz2bd8"
]
Unfortunately, sometimes the index of a file in this.img_array doesn't match the index of the same file in this.outPutFiles.
For instance, the this.img_array may become something like this...
this.img_array = [
0:"https://firebase....%2FJ6Qx9........dz2bd8"
1:"https://firebase....%2FJ6Qx9........Qd5bGm"
]
The file at index 0 moved to index 1 and the file at index 1 moved to index 0.
How can I prevent this, to make sure the files index match in both the this.outPutFiles and this.img_array arrays?
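Because ref.getDownloadURL() is asynchronous, the request for outPutFiles[1] may finish before the request for outPutFiles[0].
Try something like this (or use RxJS operators such as forkJoin):
// getDownloadURL() returns an Observable here (the question calls .subscribe on it),
// so convert each one to a Promise; Promise.all keeps the results in input order.
Promise.all(_.map(this.outPutFiles, file => this._storage.ref(file).getDownloadURL().toPromise()))
.then(img_array => console.log(img_array))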
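Another way to keep the two arrays aligned is to write each result into the slot of its input index instead of pushing it. The sketch below shows the idea with a hypothetical stand-in, `fakeGetDownloadURL`, that queues callbacks and fires them in reverse order to mimic responses arriving out of order; it is not the Firebase API.

```javascript
// Hypothetical stand-in for an async lookup: callbacks are queued and
// later fired in reverse order to mimic out-of-order responses.
const pending = [];
const fakeGetDownloadURL = (path, callback) => {
  pending.push(() => callback(`https://example.invalid/${path}`));
};

const outPutFiles = ["files/a", "files/b", "files/c"];
const imgArray = new Array(outPutFiles.length);

outPutFiles.forEach((file, index) => {
  // Store by index rather than push(), so arrival order no longer matters.
  fakeGetDownloadURL(file, (url) => {
    imgArray[index] = url;
  });
});

// Resolve the "requests" last-to-first.
pending.reverse().forEach((fire) => fire());
console.log(imgArray);
```

In the real component the same pattern would mean using the index that `_.each` passes as its second callback argument inside the `subscribe` handler.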
I'd like to come up with a good way to have a "suggested" order for how to sort an array in javascript.
So say my first array looks something like this:
['bob','david','steve','darrel','jim']
Now, all I care about is that the sorted result starts out in this order:
['jim','steve','david']
After that, I want the remaining values to be presented in their original order.
So I would expect the result to be:
['jim','steve','david','bob','darrel']
I have an API that I am communicating with, and I want to present the results important to me in the list at the top. After that, I'd prefer they are just returned in their original order.
If this can be easily accomplished with a javascript framework like jQuery, I'd like to hear about that too. Thanks!
Edit for clarity:
I'd like to assume that the preferred values are not guaranteed to be present in the array I want to sort.
So in the original example, if the provided array was:
['bob','steve','darrel','jim']
And I wanted to sort it by:
['jim','steve','david']
Since 'david' isn't in the provided array, I'd like the result to exclude it.
Edit2 for more clarity:
A practical example of what I'm trying to accomplish:
The API will return something looking like:
['Load Average','Memory Usage','Disk Space']
I'd like to present the user with the most important results first, but each of these fields may not always be returned. So I'd like the most important (as determined by the user in some other code), to be displayed first if they are available.
Something like this should work:
var presetOrder = ['jim','steve','david']; // needn't be hardcoded
function sortSpecial(arr) {
var result = [],
i, j;
for (i = 0; i < presetOrder.length; i++)
while (-1 != (j = $.inArray(presetOrder[i], arr)))
result.push(arr.splice(j, 1)[0]);
return result.concat(arr);
}
var sorted = sortSpecial( ['bob','david','steve','darrel','jim'] );
I've allowed for the "special" values appearing more than once in the array being processed, and assumed that duplicates should be kept as long as they're shuffled up to the front in the order defined in presetOrder.
Note: I've used jQuery's $.inArray() rather than Array.indexOf() only because the latter isn't supported by IE until IE9 and you've tagged your question with "jQuery". You could of course use .indexOf() if you don't care about old IE, or if you use a shim.
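For completeness, the same approach without jQuery might look like the sketch below. `sortByPreset` is a hypothetical name; unlike the function above, it takes the preset order as a parameter and works on a copy of the input rather than splicing the original array.

```javascript
function sortByPreset(arr, presetOrder) {
  const remaining = arr.slice(); // copy, so the caller's array is left alone
  const result = [];
  for (const name of presetOrder) {
    let j;
    // Move every occurrence of this preset value to the front, in preset order.
    while ((j = remaining.indexOf(name)) !== -1) {
      result.push(remaining.splice(j, 1)[0]);
    }
  }
  // Whatever is left keeps its original relative order.
  return result.concat(remaining);
}

const preset = ["jim", "steve", "david"];
console.log(sortByPreset(["bob", "david", "steve", "darrel", "jim"], preset));
// ['jim', 'steve', 'david', 'bob', 'darrel']
console.log(sortByPreset(["bob", "steve", "darrel", "jim"], preset));
// ['jim', 'steve', 'bob', 'darrel']
```

The second call shows the case from the question's edit: 'david' is not in the input, so it is simply absent from the result.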
var important_results = {
// object keys are the important results, values is their order
jim: 1,
steve: 2,
david: 3
};
// results is the orig array from the api
results.sort(function(a,b) {
// If compareFunction(a, b) is less than 0, sort a to a lower index than b.
// See https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/Array/sort
var important_a = important_results[a],
important_b = important_results[b],
ret;
if (important_a && !important_b) {ret = -1}
else if (important_b && !important_a) {ret = 1}
else if (important_a && important_b) {ret = important_a - important_b}
else {ret = 0}; // keep original order if neither a or b is important
return(ret);
}
)
Use a sorting function that treats the previously known important results specially: it sorts them to the head of the results if they are present in results.
Items in important_results don't have to be in the results.
Here's a simple test page:
<html>
<head>
<script language="javascript">
function test()
{
var items = ['bob', 'david', 'steve', 'darrel', 'jim'];
items.sort(function(a,b)
{
var map = {'jim':-3,'steve':-2,'david':-1};
return (map[a] || 0) - (map[b] || 0); // names not in the map count as 0, so the result is never NaN
});
alert(items.join(','));
}
</script>
</head>
<body>
<button onclick="javascript:test()">Click Me</button>
</body>
</html>
It works in most browsers because JavaScript typically uses what is called a stable sort algorithm, the defining feature of which is that it preserves the original order of equivalent items. However, I know there have been exceptions (the specification has only required Array.prototype.sort to be stable since ES2019). You can guarantee stability by using the array index of each remaining item as its a1/b1 value.
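One way to read the index-as-tiebreaker idea is sketched below: each item is paired with its original index, and the index decides the order whenever two items have the same priority. `prioritySort` and `rank` are hypothetical names, and this is only one possible interpretation of the suggestion.

```javascript
function prioritySort(items, priority) {
  // Lower rank sorts first; items not in the priority list get Infinity.
  const rank = new Map(priority.map((name, i) => [name, i]));
  const rankOf = (value) => (rank.has(value) ? rank.get(value) : Infinity);

  return items
    .map((value, index) => ({ value, index })) // remember the original position
    .sort((a, b) => {
      const ra = rankOf(a.value);
      const rb = rankOf(b.value);
      if (ra !== rb) return ra < rb ? -1 : 1; // compare, never subtract Infinity
      return a.index - b.index; // same priority: fall back to original order
    })
    .map(({ value }) => value);
}

console.log(prioritySort(["bob", "david", "steve", "darrel", "jim"], ["jim", "steve", "david"]));
// ['jim', 'steve', 'david', 'bob', 'darrel']
```

Because ties are broken explicitly, the result does not depend on whether the engine's sort is stable.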
http://tinysort.sjeiti.com/
I think this might help. Calling $('#yrDiv').tsort({place:'start'}); will put your important list at the start.
You can also use this function to sort any other way you like.
Live demo ( jsfiddle seems to be down)
http://jsbin.com/eteniz/edit#javascript,html
var priorities=['jim','steve','david'];
var liveData=['bob','david','steve','darrel','jim'];
var output=[],temp=[];
for ( var i=0; i<liveData.length; i++){
if( $.inArray( liveData[i], priorities) ==-1){
output.push( liveData[i]);
}else{
temp.push( liveData[i]);
}
}
var temp2=$.grep( priorities, function(name,i){
return $.inArray( name, temp) >-1;
});
output=$.merge( temp2, output);
There is another way to sort by a given order, which also works when the values are objects:
const inputs = ["bob", "david", "steve", "darrel", "jim"].map((val) => ({
val,
}));
const order = ["jim", "steve", "david"];
const vMap = new Map(inputs.map((v) => [v.val, v]));
const sorted = [];
order.forEach((o) => {
if (vMap.has(o)) {
sorted.push(vMap.get(o));
vMap.delete(o);
}
});
const result = sorted.concat(Array.from(vMap.values()));
const plainResult = result.map(({ val }) => val);
Have you considered using Underscore.js? It contains several utilities for manipulating lists like this.
In your case, you could:
Filter the results you want using filter() and store them in a collection.
var results = ['bob','david','steve','darrel','jim'];
// Filter the priority list (not the results) so the matches come out in priority order
var priorities = _.filter(['jim','steve','david'],
function(pName){
return _.contains(results, pName);
});
Get a copy of the other results using without()
var leftovers = _.without(['bob','david','steve','darrel','jim'], 'jim', 'steve', 'david');
Union the arrays from the previous steps using union()
var finalList = _.union(priorities, leftovers);