I have a lot of operations that involve bulk inserting self-generated polygons into a MongoDB collection that has a 2dsphere index. Sometimes these polygons have self-intersections or crossing edges, which cause the bulk insert to fail. As far as I know, these need to be fixed before I retry the insert (if we can simply ignore these errors, that might be a suitable solution too).
Is there a way to identify these errors before I attempt the bulk insert? For example: I generate the polygon geometry, then pass the newly created polygon through a function that returns the same error I would have gotten from the database.
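One way to catch this up front (a sketch, not necessarily matching your stack) is to run each generated polygon through a self-intersection check before the bulk insert, for example with the @turf/kinks package in Node: it returns the points where a polygon's edges cross themselves, so a non-empty result means the 2dsphere index would likely reject the document. It won't reproduce MongoDB's exact error message, but it flags the same class of invalid geometry.

```js
// Sketch: pre-validate generated polygons with @turf/kinks before bulk insert.
// Assumes Node.js with the @turf/kinks package installed.
const kinks = require('@turf/kinks').default;

// A "bowtie" ring whose edges cross at (1, 1): 2dsphere rejects shapes like this.
const bowtie = {
  type: 'Feature',
  properties: {},
  geometry: {
    type: 'Polygon',
    coordinates: [[[0, 0], [2, 2], [2, 0], [0, 2], [0, 0]]],
  },
};

// kinks() returns a FeatureCollection of Points where the rings self-intersect.
const crossings = kinks(bowtie).features;
if (crossings.length > 0) {
  console.log('Self-intersects at:', crossings.map(f => f.geometry.coordinates));
  // fix or skip this polygon instead of letting the whole bulk insert fail
}
```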
We have a MongoDB database in a development environment. There are a lot of collections that contain names of people. What we want to do is the following:
mask the names in each collection; the fields need to be updated directly in the database, not run through some external pipeline
once masked, it is ok if we are unable to retrieve the original names (so one-way masking)
every unique name should result in the same mask
the masking script can be run in the MongoDB CLI (the mongo shell) or a MongoDB GUI like Studio3T
I was thinking of maybe using MD5 or SHA, but I am not sure whether either is available directly in mongo operations like update, or in shell JavaScript without external libraries.
Also, since MD5 always produces the same hash for the same input, and we will not be masking the field name, anyone who got access to a document could feed typical names into the algorithm until the hash matches and figure out the name, but I think we may be able to live with this.
An alternative I was thinking of was to loop through the unique names we have and create a map from names to UUIDs, then go through each collection and use this map to replace the names with their UUIDs. The problem with this is that we would need to keep this mapping around for when we receive additional documents for an existing person.
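If the legacy mongo shell is an option, a minimal sketch of the hash idea could look like the following. hex_md5() is built into the legacy mongo shell (mongosh does not ship it); the collection, field, and salt here are placeholders, not your actual schema.

```js
// Sketch for the legacy mongo shell, where hex_md5() is built in.
// "people" and "name" are placeholder names; repeat per collection/field.
var salt = 'some-secret-salt'; // assumption: choose and protect your own value
db.people.find({ name: { $exists: true } }).forEach(function (doc) {
  var masked = hex_md5(salt + doc.name); // same name -> same mask, one-way
  db.people.updateOne({ _id: doc._id }, { $set: { name: masked } });
});
```

Prepending a secret salt keeps the masking deterministic (same name, same mask) while making the feed-typical-names-until-it-matches attack you describe much harder.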
I have the following problem:
I've got an array with thousands of entries (people with an id and a geolocation (lat, long)). The aim is to connect each person with another person within a given radius (e.g. 20 km), and I'm looking for an efficient way to do so. I've already tried geohashes, but since every entry of the array needs to be compared with every other entry, the execution time is horrible at scale. I would appreciate any great hint! Thanks a lot. I'm using a Node.js server for the matching algorithm.
I don't understand why you would need to compare the geohash of every entry with every other entry.
Start with an empty Map where each key is a geohash and each value is a Set of ids.
For each entry, compute the geohash and add the entry id to the related Set.
Now for each entry you can pick any other id from its set; if there is no other, you'll need to decrease the precision. (The implementation will be a bit different if an entry cannot be matched with multiple others.)
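A minimal sketch of that bucketing in Node, assuming the ngeohash npm package for encoding (any geohash library would do; precision 4 cells are roughly 20 by 40 km):

```js
// Sketch: bucket people by geohash prefix so matching becomes a Map lookup
// instead of an O(N^2) scan. Assumes the ngeohash npm package.
const ngeohash = require('ngeohash');

function buildBuckets(people, precision = 4) {
  const buckets = new Map(); // geohash prefix -> Set of person ids
  for (const p of people) {
    const hash = ngeohash.encode(p.lat, p.lon, precision);
    if (!buckets.has(hash)) buckets.set(hash, new Set());
    buckets.get(hash).add(p.id);
  }
  return buckets;
}

// For each person: candidates are the other ids in the same bucket; if the
// bucket is a singleton, retry with a shorter (coarser) prefix.
```

In practice you would also check the eight surrounding cells (ngeohash.neighbors) so two people sitting just across a cell border can still be matched.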
I can think of several improvements here:
If you calculate the distance from point Ai to point Aj, you already have the distance from Aj to Ai. That means N*(N-1)/2 comparisons instead of N*(N-1).
Because the radius is constant, you can compare squared distances against radius * radius (i.e. radius^2) and avoid taking a square root for every pair (see the sketch after this list).
If all calculations are synchronous, you should use something like worker threads, because Node.js is single-threaded by design.
Maybe you can take a glance at the JSTS package, because it may save you some time on geospatial calculations.
Another approach is to load all the data into a DB with geospatial support, add the needed geospatial indexes, and use that DB's specialized geospatial functions to calculate distances and group people by them.
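To make the first two improvements concrete, here is a sketch of the halved loop with a squared-distance test. It uses an equirectangular approximation, which is reasonable for small radii like 20 km; this is illustrative, not the asker's actual code.

```js
// Sketch: j starts at i + 1 (N*(N-1)/2 comparisons) and squared distances are
// compared against radius^2, so there is no Math.sqrt per pair.
const EARTH_RADIUS_KM = 6371;
const DEG = Math.PI / 180;

function matchWithinRadius(people, radiusKm) {
  const pairs = [];
  const radiusSq = radiusKm * radiusKm; // compare against r^2, skip the sqrt
  for (let i = 0; i < people.length; i++) {
    for (let j = i + 1; j < people.length; j++) {
      const a = people[i], b = people[j];
      // Equirectangular approximation: fine at city scale, cheap to compute.
      const meanLat = ((a.lat + b.lat) / 2) * DEG;
      const dx = (b.lon - a.lon) * DEG * Math.cos(meanLat) * EARTH_RADIUS_KM;
      const dy = (b.lat - a.lat) * DEG * EARTH_RADIUS_KM;
      if (dx * dx + dy * dy <= radiusSq) pairs.push([a.id, b.id]);
    }
  }
  return pairs;
}
```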
Is there a way to get a document's position relative to its collection based on one of its properties in MongoDB?
The use case is that I am building a leaderboard, but want to provide an easy way for the user to know their rank without having to scroll through all the entries.
Is there some Mongoose schema magic or MongoDB query that will help me easily get the position of a user based on their score?
Currently, my solution is to create an index on the score, query the entire collection, and find the index of the user in the result.
Strictly speaking, this is not possible, as MongoDB does not store its documents in any particular order. Your current approach should be fine.
You're not the first one with this question: there's a feature request for this exact thing with a high priority.
https://jira.mongodb.org/browse/SERVER-4588
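In the meantime, a common workaround that avoids fetching the whole collection is to count the documents with a strictly better score: with an index on the score field this is a single indexed count. A sketch with placeholder names (countDocuments needs MongoDB 4.0+; older servers can use count() with the same filter):

```js
// Sketch (shell/driver syntax): a user's rank is one more than the number of
// users with a strictly higher score. "scores" and "score" are placeholder
// names; an index on { score: -1 } keeps the count fast. Ties share a rank.
function getRank(userScore) {
  return db.scores.countDocuments({ score: { $gt: userScore } }) + 1;
}
```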
I am developing a web app based on the Google App Engine.
It has some hundreds of places (name, latitude, longitude) stored in the Datastore.
My aim is to show them on a Google map.
Since there are many, I have registered a JavaScript function on the map's idle event; when executed, it posts the map boundaries (minLat, maxLat, minLng, maxLng) to a request handler, which should retrieve from the Datastore only the places within those boundaries.
The problem is that the Datastore doesn't allow inequality filters on more than one property in a query (i.e. I can't filter on Place.lat > minLat and Place.lng > minLng together).
How should I do that? (trying also to minimize the number of required queries)
You could divide the map into regions, make an algorithm to translate the current position into a region, and then get the places by an equality query. In this case you would need overlapping regions, allow places to be part of many of them, and ideally make regions bigger than the size of the map, in order to minimize the need for multiple queries.
That was just an outline of the idea, an actual implementation would be a little bit more complicated, but I don't have one at hand.
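To sketch the idea anyway (the cell size, key format, and overlap policy are all assumptions, not a tested implementation):

```js
// Sketch: map a lat/lng to a grid-cell key so places can be fetched with
// equality queries on a stored cell property. The 0.5-degree cell is arbitrary.
function regionKey(lat, lng, cellDegrees = 0.5) {
  return Math.floor(lat / cellDegrees) + ':' + Math.floor(lng / cellDegrees);
}

// At write time, store regionKey(place.lat, place.lng) on each place. At read
// time, compute the keys of every cell overlapping the map bounds and run one
// IN/equality query over that key list.
function keysForBounds(minLat, maxLat, minLng, maxLng, cellDegrees = 0.5) {
  const keys = [];
  for (let r = Math.floor(minLat / cellDegrees); r <= Math.floor(maxLat / cellDegrees); r++) {
    for (let c = Math.floor(minLng / cellDegrees); c <= Math.floor(maxLng / cellDegrees); c++) {
      keys.push(r + ':' + c);
    }
  }
  return keys;
}
```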
Another option is using geohashes, which are actually pretty cool; you can read a write-up about them, along with code samples, here: Scalable, fast, accurate geo apps using Google App Engine + geohash + faultline correction
You didn't say how frequently the data points are updated, but assuming 1) they're updated infrequently and 2) there are only hundreds of points, then consider just querying them all once, and storing them sorted in memcache. Then your handler function would just fetch from memcache and filter in memory.
This wouldn't scale indefinitely but it would likely be cheaper than querying the Datastore every time, due to the way App Engine pricing works.
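The in-memory filtering step would then be a plain bounding-box pass over the cached list (a sketch; the lat/lng field names are assumptions):

```js
// Sketch: with all places cached, filtering by the posted map bounds is a
// simple in-memory pass over a few hundred items.
function placesInBounds(cachedPlaces, minLat, maxLat, minLng, maxLng) {
  return cachedPlaces.filter(p =>
    p.lat >= minLat && p.lat <= maxLat &&
    p.lng >= minLng && p.lng <= maxLng);
}
```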
We are creating a tree structure with the help of a custom tool developed in JavaScript/jQuery.
It works great. Now we have to create that tree from a feed file (a CSV file).
I am working on a POC to understand the behavior of the JS tool with 25k nodes.
The problem is how to insert that volume of data into my database so I can check the behavior in the browser.
Let me brief you on our approach for inserting the tree into the DB. We create the left/right values using the nested set model (NSM), then insert them into two tables: one with the collection of node names, the other with the left/right values and some other attributes. So I need to insert that volume of data (at least 10k nodes) together with its left/right values.
We supply a JSON object for rendering the tree on the client side, then recursively call the function to redraw the structure.
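For generating the test data itself, here is a sketch of computing the left/right values with a depth-first walk over a synthetic tree, then emitting CSV rows ready for a bulk loader. The node shape and CSV layout are assumptions, not your actual schema.

```js
// Sketch: assign nested set model (NSM) left/right values depth-first, then
// emit CSV rows (id,name,left,right). Node shape: { id, name, children }.
function assignNestedSet(node, counter = { value: 1 }, rows = []) {
  node.left = counter.value++;
  for (const child of node.children || []) {
    assignNestedSet(child, counter, rows);
  }
  node.right = counter.value++;
  rows.push([node.id, node.name, node.left, node.right].join(','));
  return rows;
}

// Build a synthetic tree big enough for the POC: depth 4, fanout 10 gives
// 1 + 10 + 100 + 1,000 + 10,000 = 11,111 nodes.
function syntheticTree(depth, fanout, next = { id: 0 }) {
  const node = { id: next.id++, name: 'node' + next.id, children: [] };
  if (depth > 0) {
    for (let i = 0; i < fanout; i++) {
      node.children.push(syntheticTree(depth - 1, fanout, next));
    }
  }
  return node;
}

const rows = assignNestedSet(syntheticTree(4, 10));
console.log(rows.length); // 11111 CSV lines, ready to load in bulk
```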
The question is not entirely clear, but whenever I need to insert a large amount of data into SQL Server I use BCP. Especially since your data is already in CSV format, it should be easy:
http://msdn.microsoft.com/en-us/library/ms162802.aspx
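A typical invocation for a comma-delimited file might look like this (the database, table, and file names are placeholders; -c is character mode, -t sets the field terminator, -T uses trusted/Windows authentication):

```
bcp MyDb.dbo.TreeNodes in nodes.csv -c -t, -S localhost -T
```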