How to use mapbox vector tiles in a performant way?


I am a bit confused about Mapbox Vector Tiles (MVT). As I understand it, a tile is a small piece of the map, like a piece of a jigsaw puzzle, but I am not completely sure how MVT works.
https://docs.mapbox.com/data/tilesets/guides/vector-tiles-introduction/#benefits-of-vector-tiles
The page above says that vector tiles are really small, enabling global high-resolution maps, fast map loads, and efficient caching.
I am trying to load all the coordinates from the database, which can be more than 10K points, and I currently get the data from PostGIS like this:
with connection.cursor() as cursor:
    cursor.execute(query)
    rows = cursor.fetchall()
mvt = bytes(rows[-1][-1])
return Response(
    mvt, content_type="application/vnd.mapbox-vector-tile", status=200
)
Now I am wondering about performance, since every visit by a user puts load on the database.
Another problem is that when I use vector tiles as a source, Mapbox calls the source URL (hitting the database) every time I move the map.
type: 'vector',
tiles: [
    'http://url/{z}/{x}/{y}.mvt'
]
Is it possible to call the source URL only at specific zoom levels, so that until then all the points remain on the map?
For example, Mapbox would call the source URL once at zoom level 1 and draw the points on the map for zoom levels 1-7. When the zoom level reaches 7, it would call the source URL once more for zoom levels 7-22 and update the map.
I would be really grateful if anyone can help.

When it comes to tiling data (vector, raster, whatever format), you will almost always want a service that has a caching strategy, especially if the tiles are being created in real time from data in a database. Calling directly into the database every time a tile is needed should only be done during development/testing, or for simple demos. Vector tiles alone are not the solution; you need an end-to-end architecture for serving the data. Here are a couple of examples of caching strategies:
If your data changes every second or faster, you often don't need to show updates that quickly on a map. If you have a lot of users, a 15-second cache expiry on tiles can drastically reduce the number of times a tile has to be created. Azure Maps does this for its Creator platform, which is designed to show near-real-time sensor readings in a digital twin scenario. It is capable of supporting billions of sensors updating the map every 15 seconds. If you truly need to see updates as they happen, then vector tiles likely aren't the right approach; instead, consider using a streaming service like SignalR and limiting the data sent to the map based on bounding box and zoom level.
In most applications the data doesn't update that quickly, so a longer cache lifetime can be set in the cache headers.
In some cases, the data changes so infrequently (or not at all) that it makes more sense to pre-generate the tiles once, host them, and serve them directly. You would still have a caching strategy so that users aren't constantly requesting the same tiles from your server when they could simply pull them from their browser cache. In this case, it is useful to put the tiles into an MBTiles file, which is basically a SQLite database with a standardized structure. The benefit of putting the tiles in an MBTiles file is that you only have to move one file around instead of thousands of tiles (moving that many files creates a ton of IO reads/writes that can make deployment extremely slow).
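The short-expiry strategy described above could be sketched roughly as follows. This is a minimal in-process illustration, not a production setup: `TileCache`, `build_tile`, and `cache_headers` are hypothetical names, and a real deployment would more likely rely on a CDN or a shared cache in front of the tile endpoint.

```python
import time
from typing import Callable, Dict, Tuple

TileKey = Tuple[int, int, int]  # (z, x, y)


class TileCache:
    """Hypothetical in-memory tile cache with a fixed time-to-live."""

    def __init__(
        self,
        build_tile: Callable[[int, int, int], bytes],
        ttl_seconds: float = 15.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._build_tile = build_tile  # assumed to run the database query
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[TileKey, Tuple[float, bytes]] = {}

    def get(self, z: int, x: int, y: int) -> bytes:
        key = (z, x, y)
        now = self._clock()
        entry = self._entries.get(key)
        # Serve the stored tile while it is younger than the TTL.
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        # Otherwise rebuild the tile once and remember when it was built.
        tile = self._build_tile(z, x, y)
        self._entries[key] = (now, tile)
        return tile


def cache_headers(ttl_seconds: int) -> Dict[str, str]:
    """HTTP header that lets browsers and proxies reuse a tile for the TTL."""
    return {"Cache-Control": f"public, max-age={ttl_seconds}"}
```

With something like this in place, many requests for the same tile within the expiry window would trigger a single database query, and the `Cache-Control` header would let browsers skip the request entirely while the tile is fresh.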
It's also important to note that you should optimize the data in your tiles. For example, if you are working with high-resolution polygons, you will likely find that in zoomed-out tiles a lot of coordinates fall inside the same pixel. When you request the polygon from the database, reduce its resolution to match the resolution of the tile. This will drastically reduce both the size of the tile and the amount of data the database outputs.
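As a rough illustration of matching geometry resolution to tile resolution, the sketch below computes how many metres one pixel covers at a given zoom level in Web Mercator. It assumes 256-pixel tiles and distances measured at the equator; the function names are hypothetical. The resulting value could be passed as the tolerance to a simplification step (for example PostGIS `ST_Simplify` on geometry stored in EPSG:3857), but that wiring is an assumption and not something the answer prescribes.

```python
import math

# Equatorial circumference of the Web Mercator sphere (radius 6378137 m).
EARTH_CIRCUMFERENCE_M = 2 * math.pi * 6378137.0


def meters_per_pixel(zoom: int, tile_size: int = 256) -> float:
    """Ground distance covered by one pixel at the equator for a zoom level."""
    return EARTH_CIRCUMFERENCE_M / (tile_size * 2 ** zoom)


def simplify_tolerance(zoom: int, pixels: float = 1.0, tile_size: int = 256) -> float:
    """Tolerance in metres: detail smaller than `pixels` pixels can be dropped."""
    return pixels * meters_per_pixel(zoom, tile_size)


if __name__ == "__main__":
    for z in (0, 7, 14):
        print(z, round(simplify_tolerance(z), 3))
```

Because the tolerance halves with every zoom level, zoomed-out tiles get aggressively simplified geometry while zoomed-in tiles keep their detail.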
You can find a lot of tools and articles for vector tiles here: https://github.com/mapbox/awesome-vector-tiles
There are also lots of blogs out there on how to serve vector tiles and create pipelines.

Related

Efficient/Performant way to visualise a lot of data in javascript + D3/mapbox

I am currently looking at an efficient way to visualise a lot of data in javascript. The data is geospatial and I have approximately 2 million data points.
Now I know that I cannot give that many datapoint to the browser directly otherwise it would just crash most of the time (or the response time will be very slow anyway).
I was thinking of having the JavaScript window communicate with a Python script that would do all the operations on the data and stream JSON back to the JavaScript app.
My idea was to have the JavaScript window send, in real time, the bounding box of the map (lat and lng of the north-east and south-west points) so that the Python script could go through all the entries and send JSON for only the viewable objects.
I wrote a very simple script to do that, which basically:
- Reads the whole CSV and stores the data in a list with lat, lng, and other attributes (2 or 3)
- Uses a naive implementation to check whether points are within the bounding box sent by the JavaScript
Currently, going through all the data points takes approximately 15 seconds, which is way too long, since I then also have to transform them into a GeoJSON object before streaming them to my JavaScript application.
Now of course, I could first of all sort my points in ascending order of lat and lng so that the function checking if a point is within the javascript sent bounding box would be an order of magnitude faster. However, the processing time would still be too slow.
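The sorted-lookup idea mentioned above could look roughly like this. It is a minimal sketch, assuming points are `(lat, lng, ...)` tuples and that the bounding box does not cross the antimeridian; `build_index` and `query_bbox` are hypothetical names. Only latitude is narrowed by binary search here, and longitude is still checked point by point within that band.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence, Tuple

Point = Tuple  # (lat, lng, *other attributes)


def build_index(points: Sequence[Point]) -> Tuple[List[float], List[Point]]:
    """Sort points by latitude once and keep a parallel list of latitudes."""
    ordered = sorted(points, key=lambda p: p[0])
    lats = [p[0] for p in ordered]
    return lats, ordered


def query_bbox(
    index: Tuple[List[float], List[Point]],
    south: float,
    west: float,
    north: float,
    east: float,
) -> List[Point]:
    """Return the points inside the bounding box (no antimeridian crossing)."""
    lats, ordered = index
    # Binary search narrows the candidates to the latitude band.
    lo = bisect_left(lats, south)
    hi = bisect_right(lats, north)
    # Longitude is then filtered linearly within that band.
    return [p for p in ordered[lo:hi] if west <= p[1] <= east]
```

The index is built once when the CSV is loaded, so each request only pays for the binary search plus the scan of one latitude band.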
But even admitting that it is not, I still have the problem that at very low zoom levels, I would get too many points. Constraining the min_zoom_level is not really an option for me. So I was thinking that I should probably try and cluster data points.
My question is therefore do you think that this approach is the right one? If so, how does one compute the clusters... It seems to me that I would have to generate a lot of possible clusters (different zoom levels, different places on the map...) and I am not sure if this is an efficient and smart way to do that.
I would very much like to have your input on that, with possible adjustments or completely different solutions if you have some.
This is almost language agnostic, but I will tag as python since currently my server is running python script and I believe that python is quite efficient for large datasets.
Final note:
I know that it is possible to pre-compute tiles that I could just feed my javascript visualization but as I want to have interactive control over what is being displayed, this is not really an option for me.
Edit:
I know that, for instance, Mapbox provides clustering of data points to make it easier to display something like a million data points.
However, I think (and this is related to an open question here) that while I can easily display clusters of points, I cannot make a data-driven style for my clusters.
For instance, if we take the now famous example of ethnicity maps: if I use Mapbox to cluster data points and a cluster contains 50 people, I cannot give the cluster the color of the most represented ethnicity among the 50 people it gathers.
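One way the "color by most represented category" requirement could be handled on the server is sketched below. It uses plain grid binning, which is an assumption chosen for brevity and not how Mapbox or supercluster compute clusters; `cluster_by_grid` and the `(lat, lng, category)` tuple layout are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def cluster_by_grid(
    points: Iterable[Tuple[float, float, str]],
    cell_size_deg: float,
) -> List[Dict]:
    """Bin (lat, lng, category) points into square cells and summarise each."""
    cells: Dict[Tuple[int, int], Counter] = defaultdict(Counter)
    for lat, lng, category in points:
        key = (math.floor(lat / cell_size_deg), math.floor(lng / cell_size_deg))
        cells[key][category] += 1

    clusters = []
    for (row, col), counts in cells.items():
        dominant, _ = counts.most_common(1)[0]
        clusters.append(
            {
                # The cell centre stands in for the cluster position.
                "lat": (row + 0.5) * cell_size_deg,
                "lng": (col + 0.5) * cell_size_deg,
                "count": sum(counts.values()),
                "dominant": dominant,  # category to drive the colour
            }
        )
    return clusters
```

The cell size would typically shrink as the zoom level grows, so the same function could be run once per zoom level and the results cached.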
Edit 2:
I also learned about supercluster, but I am unsure whether this tool could handle several million data points without crashing either.

Implement progressively rendered layer in Cesium

I'm trying to implement a layer that displays raster data sent by a server.
The format sent by the server is not natively supported by widely used browsers (it is JPEG 2000 data).
So I decode the data myself and hand it to Cesium to display.
What makes it a little complicated:
The server is stateful, so both client and server have to maintain a channel. A channel is associated with a single region of interest. The region may change over time, but at any point in time there is only a single region for which the server sends data on the channel.
I can use several channels in a session, but the server does not perform well with more than a very small number of channels.
The region of interest is of uniform resolution (thus problematic for 3D).
The server supports progressive sending of data that gradually enhances the quality ("quality layers" in JPEG 2000), a property I want to use because very little network bandwidth is available.
The decoding is heavy in terms of CPU time.
As a first stage I implemented an ImageryProvider which just creates a channel for each tile requested by the rendering engine. It worked, but it created too many connections and I did not get the benefit of progressive rendering. In addition the performance was poor, a problem which was almost resolved by implementing a priority mechanism that first decodes tiles in the view area of the Cesium viewer.
Next I implemented a self-rendered raster "layer" which changes the region of interest of the channel according to the view area. This resolved the multiple-channels problem and gave me progressive rendering. However, I encountered the following problems:
a. The method I used to show the decoded pixels was to implement an imagery provider which shows a single Canvas with the decoded pixels. Each time the image was updated (repositioned or progressively decoded) I had to remove the old imagery provider and replace it with a new one. I guess that's not the correct way to do such things, and it may cause bad behavior such as wrong z-ordering when replacing the old provider with a new one. Some of these issues may be resolved by using primitives with an Image material, but then I have to use the data URL form of images. Doing that degrades performance, because it causes a lot of conversions from canvas into data URLs.
b. I had to write special code to work out the view area in order to send it to the server (using pickEllipsoid and similar functionality). I guess this code duplicates something that is done within the Cesium engine. In addition, I saw in some discussions that pickEllipsoid is not supported in 2D. Generally, I would be very happy to have a function which calculates the view area for me, rather than implementing that code myself.
c. The way I implemented it raises an API issue: as opposed to the nice Cesium API for adding and removing imagery providers (the addImageryProvider() and removeLayer() methods), in my implementation the user can only use the methods I expose (for example, an add() method which accepts the Viewer as an argument).
d. In 3D mode, where the resolution is not uniform, the image is not sharp in the near region. I know that's an inherent problem because of the way my server works; I'm just pointing it out.
I think what I'm really missing here is a way to implement a plugin which is more powerful than the interface of ImageryProvider: Implementing a self-rendered raster layer, which receives view area change events from the render engine and can decide when and how to refresh its tiles.
Another alternative (which is even better for me, but is less reusable by others I guess), is to expose a list of the tiles in the view area to the ImageryProvider implementation.
What is the right way to cope with this scenario?

Google App Engine NDB Query on Many Locations

I am developing a web app based on the Google App Engine.
It has some hundreds of places (name, latitude, longitude) stored in the Data Store.
My aim is to show them on a Google map.
Since there are many of them, I have registered a JavaScript function on the idle event of the map. When executed, it posts the map boundaries (minLat, maxLat, minLng, maxLng) to a request handler which should retrieve from the Datastore only the places within the specified boundaries.
The problem is that the Datastore doesn't allow me to apply inequality filters to more than one property in the query (i.e. Place.lat > minLat, Place.lnt > minLng).
How should I do that (while also trying to minimize the number of required queries)?

You could divide the map into regions, make an algorithm to translate the current position into a region, and then get the places by an equality query. In this case you would need overlapping regions, allow places to be part of many of them, and ideally make regions bigger than the size of the map, in order to minimize the need for multiple queries.
That was just an outline of the idea. An actual implementation would be a little more complicated, but I don't have one at hand.
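A rough sketch of how the region idea might look is given below. It swaps the overlapping regions described above for a simpler fixed, non-overlapping grid: each place would store the id of the cell it falls in, and the handler would look up every cell touched by the map bounds with equality filters. The function names and the one-degree cell size are assumptions, and the results would still need an exact bounds check afterwards.

```python
import math
from typing import List


def cell_id(lat: float, lng: float, cell_deg: float = 1.0) -> str:
    """Id of the grid cell containing a point; stored on each place."""
    return f"{math.floor(lat / cell_deg)}:{math.floor(lng / cell_deg)}"


def cells_for_bounds(
    min_lat: float,
    max_lat: float,
    min_lng: float,
    max_lng: float,
    cell_deg: float = 1.0,
) -> List[str]:
    """Ids of every cell that intersects the map bounds."""
    rows = range(math.floor(min_lat / cell_deg), math.floor(max_lat / cell_deg) + 1)
    cols = range(math.floor(min_lng / cell_deg), math.floor(max_lng / cell_deg) + 1)
    return [f"{row}:{col}" for row in rows for col in cols]


if __name__ == "__main__":
    # A place near Milan and the cells a small viewport would need.
    print(cell_id(45.46, 9.19))
    print(cells_for_bounds(45.2, 46.1, 9.0, 9.9))
```

Choosing cells somewhat larger than a typical viewport keeps the number of cell ids per request small, which is the same trade-off the answer describes for region size.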
Another option is using geohashes, which are actually pretty cool. You can read a write-up about them, along with code samples, here: Scalable, fast, accurate geo apps using Google App Engine + geohash + faultline correction

You didn't say how frequently the data points are updated, but assuming 1) they're updated infrequently and 2) there are only hundreds of points, then consider just querying them all once, and storing them sorted in memcache. Then your handler function would just fetch from memcache and filter in memory.
This wouldn't scale indefinitely but it would likely be cheaper than querying the Datastore every time, due to the way App Engine pricing works.

Alternatives to Googles Distance Matrix service?

I am working on a quote calculator that will generate a quote based on mileage between various locations (amongst other conditionals). Up until two days ago, I had planned to use Google's Distance Matrix service until I discovered:
Display of a Google Map
Use of the Distance Matrix service must relate to the display of information on a Google Map; for example, to determine origin-destination pairs that fall within a specific driving time from one another, before requesting and displaying those destinations on a map. Use of the service in an application that doesn't display a Google map is prohibited.
I had hoped to use only the services that I require:
- Distance by Road Measurement between up to three different locations
- Address Autocomplete Service usable on an input box
- Accurate, reliable service that can provide multiple different routes to create an average distance
I know there are other methods available for this, but I doubt many can be as accurate and reliable as Google. I've found it challenging to find anything comparable to Google Maps for the purposes I require.
So, unless you guys can point me to something that I can use, my only option is to use a Google Map where I don't need it, adding additional loading time and altering the UX design I had planned.
Are there any free services available for what I require (preferably with a JS API)?
On a slightly different note
If I do use a Google map, would it have to be displayed immediately, or could I hide it and add an option to 'Show On Map', and have it .slideToggle revealed?

Unfortunately for the Distance Matrix API, Google strictly says you NEED to display the map in your application:
Use of the Distance Matrix API must relate to the display of information on a Google Map; for example, to determine origin-destination pairs that fall within a specific driving time from one another, before requesting and displaying those destinations on a map. Use of the service in an application that doesn't display a Google map is prohibited.
http://developers.google.com/maps/documentation/distancematrix/#Limits
However, what I think is more useful for your need is Google Directions API. The directions API allows you to cover your requirements.
The total distance is returned in the JSON object from the request.
You can find the distances between several locations by adding them as waypoints in your request. The distances between these locations are then returned in each "leg".
You can obtain the average distance from multiple different routes to your destination by specifying the alternatives parameter in your search request to true. See: http://developers.google.com/maps/documentation/directions/#RequestParameters
Best of all, there is no requirement from Google to display the Google Map in your application when using this service.
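The distance bookkeeping described above could be sketched as follows, assuming the Directions API JSON response has already been fetched and parsed into a dictionary. Each route contains `legs`, and each leg carries a `distance` object whose `value` is in metres. The function names are hypothetical, and the request itself (API key, `alternatives=true`) is left out.

```python
from typing import Any, Dict


def route_distance_m(route: Dict[str, Any]) -> int:
    """Total distance of one route: the sum of its legs, in metres."""
    return sum(leg["distance"]["value"] for leg in route["legs"])


def average_route_distance_m(response: Dict[str, Any]) -> float:
    """Average distance over all alternative routes in a parsed response."""
    routes = response.get("routes", [])
    if not routes:
        raise ValueError("response contains no routes")
    return sum(route_distance_m(route) for route in routes) / len(routes)


if __name__ == "__main__":
    # Hand-written sample shaped like a parsed response, not real API output.
    sample = {
        "routes": [
            {"legs": [{"distance": {"value": 1000}}, {"distance": {"value": 2000}}]},
            {"legs": [{"distance": {"value": 5000}}]},
        ]
    }
    print(average_route_distance_m(sample))
```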
I should also mention the drawbacks to this service, if you choose to use it.
The request will take slightly longer to process than it would with the Distance Matrix API.
You'll have a lot of unneeded data in the returned object; for instance, the individual "steps" of the route in the returned JSON are not necessary for your application's requirements.
Given the drawbacks, I'd still highly recommend looking into the Directions API for your application.

I don't know if Google Static Maps count as a map, but it should, since it's a Map and from Google.
You could calculate the route and then show it as an image from Static Maps. No extra map loading times required. Only one image.
https://developers.google.com/maps/documentation/staticmaps/#Paths

Many developers have been able to do this with the Bing Maps REST routing service http://msdn.microsoft.com/en-us/library/ff701705.aspx. It requires a bit more development but works well. Here is an example: http://code.msdn.microsoft.com/Bing-Maps-trip-optimizer-c4e037f7

Streetmap and arcserver can solve a vehicle routing problem but it's not free. Read more here: http://www.esri.com/data/streetmap.

finding shortest paths using google maps for a large number of nodes

I'm trying to do some network analysis for a client. The provided road-network GIS layer is of bad quality; therefore, I have to resort to Google maps to provide me shortest path between 200 points, to produce time and distance matrices between each point.
Is there a way I can input the layer as a set of KML points to obtain the distance and time between these points?
If this is doable via the API, do you have any hints or suggestions on how to write such a script?
EDIT
The ideal final result would be a CSV file of the following form:
node_1, node_2, distance, travel_time
node_n, node_m, distance, travel_time

I won't write the whole script for you, but this can be done with the maps API. Open up the maps sandbox and add to the onGDirectionsLoad function:
alert(gdir.getDistance().meters);
You can find the documentation here - a getDuration() is also available. Then all you need to do is issue a new request once the previous one has finished, getting directions for each pair of start and end points.
However, note that if you're planning on getting 200*200 paths, Google may decide to rate limit you at some point. Use this method at your own risk, and with a delay between requests.
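The "one request per pair, with a delay" loop could be organised along these lines, writing the CSV layout asked for in the question. This is only a sketch: `fetch_route` is a hypothetical callable standing in for whatever directions request is actually used, and it is assumed to return a `(distance, travel_time)` pair for an origin and a destination.

```python
import csv
import itertools
import time
from typing import Callable, Sequence, TextIO, Tuple


def write_matrix_csv(
    nodes: Sequence[str],
    fetch_route: Callable[[str, str], Tuple[float, float]],
    out: TextIO,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Write one CSV row per ordered pair of nodes; return the row count."""
    writer = csv.writer(out)
    writer.writerow(["node_1", "node_2", "distance", "travel_time"])
    count = 0
    # Ordered pairs, because A -> B and B -> A can differ on a road network.
    for origin, destination in itertools.permutations(nodes, 2):
        distance, travel_time = fetch_route(origin, destination)
        writer.writerow([origin, destination, distance, travel_time])
        count += 1
        sleep(delay_s)  # pause between requests to stay under rate limits
    return count
```

For 200 nodes this loop issues 200 * 199 = 39,800 requests, so even a one-second delay adds up to roughly eleven hours of run time.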
Note also that google's builtin KML support doesn't seem to support giving you the list of points - this makes sense, since the client may only have those that are currently onscreen. You might need to write your own KML loader if you want to use KML as the input format. Or use a simpler format, as in this example.
