Scalable international geocoding solutions


We want our website to support international geocoding. We are wary of using APIs that throttle and/or cap requests, which could leave us with our pants down as the service gains volume. How do modern websites such as Facebook implement geocoding? Are there tools to implement accurate, scalable, in-house geocoding solutions for the entire world?

I don't know of any APIs for international geocoding. However, Cassandra (http://cassandra.apache.org/) may add something of value for you. Although it is a storage-centric solution, it certainly can scale.
Twitter uses it to store geolocation and places-of-interest data (http://engineering.twitter.com/2010/07/cassandra-at-twitter-today.html), as does SimpleGeo (https://simplegeo.com/).

I know that this is an older question. You could always use a service, then cache the relevant result in a local database... simply look local first before falling back to the service. After a while, most of your requests will be hits against your local database, with very few fallbacks.
For a U.S.-based project, we started with a seeded U.S. ZIP code database and, using Bing as a fallback, never once hit the 5,000/day (iirc) limit. YMMV with this solution, but it isn't an unreasonable one.
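A minimal sketch of the look-local-first idea, assuming an in-memory Map stands in for the local database and `remoteGeocode` is a hypothetical wrapper around whichever external service you use (neither name comes from a real library):

```javascript
// Cache-first geocoding: check local storage, fall back to the remote service.
// `remoteGeocode` is a hypothetical async function wrapping an external API.
// `cache` is a Map here; in production it would be your local database.
function createCachedGeocoder(remoteGeocode, cache = new Map()) {
  let remoteCalls = 0;

  async function geocode(address) {
    // Normalize so trivially different spellings share one cache entry.
    const key = address.trim().toLowerCase();
    if (cache.has(key)) {
      return cache.get(key);
    }
    remoteCalls += 1;
    const result = await remoteGeocode(address);
    // Only cache successful lookups so failures can be retried later.
    if (result) {
      cache.set(key, result);
    }
    return result;
  }

  return { geocode, getRemoteCalls: () => remoteCalls };
}
```

Seeding the cache up front (for example with a ZIP code table) means the remote counter only grows for addresses you have genuinely never seen, which is what keeps you under a daily request cap.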

Related

Cloud storage provider with REST API available in China

I work on a web application which uses cloud storage as the primary save mechanism for users. We don't have our own cloud storage service, instead we leverage the REST APIs for Google Drive, Dropbox, etc.
Recently we have noticed an increasing number of users in China. Unfortunately, none of the cloud services we currently integrate with are available in China. The alternative forms of saving in the app have limitations which cannot be resolved, and developing our own cloud storage system that functions in China would be a very large engineering undertaking.
As such I have been trying to find a service which we could add that works in China, but have had very little success. Nearly all of the sites are in Chinese, and even with Google Translate I haven't been able to find any company that claims to offer these services. Most have some form of Cloud storage for users, but I haven't been able to find a REST API that would allow us to integrate with them. Any leads someone could provide would be much appreciated.
Requirements:
HTTP API that can be used from the browser similar to Dropbox HTTP documentation
Can be accessed in the UK ( for development ) and China
Preferably some documentation in English...
Ideally free and commonly used in China

I have implemented client-side code to work with Chinese cloud storage services (REST API). However, this already dates back quite some time. Basically, there is often no English documentation and sadly you have to rely on translation. I also suggest using a VPN (if that helps), because accessing these services from outside China is often terribly slow. This also applies to most of the developer docs.
The direction I can point you in is Baidu PCS (sadly all links are 404 so take a look here), which still exists. As the biggest player, it is the first thing I would try. There are probably other providers, but the question is how long they will exist, because others I have implemented are already gone (e.g. Kanbox).
This is just my experience and it might not fully apply today but maybe it helps.
Update 1: What I also found in a quick search is Weiyun, which has a reverse-engineered API. That's far from reliable and not officially supported, but worth a look.
Update 2: Added alternative link for Baidu PCS documentation.

API, back-end and front-end as all three separate components

I tried to find something on the internet but could not find anything similar. So I'm asking it here:
SITUATION: I have a big API which does some heavy calculations and has a lot of functionality. There are some clients using this API who have integrated it into their software. Now I want to write a front-end for that API so users can manage their workflow more easily.
CONSIDERED SOLUTION: I am considering making a separate back-end application which would use the API and serve the front-end (look at the picture attached). The back-end would do authorization / caching / data-adapting operations.
QUESTION: I have never come across an app design with three layers, API-BE-FE. Is it worth doing things this way? Are there any significant drawbacks? Is it safe to put OAuth authorization on the back-end side rather than in the API itself? What are your thoughts?

I agree with your design. You have a specific API which is meant to serve specific endpoints. This way you are separating your concerns, as you can add to your BE things that aren't related to the API itself, but are related to the FE.
Also, many APIs are using credentials and keys so you can implement a similar functionality.

Your considered solution on architecture looks good.
The biggest advantage of putting a back-end between the front-end and the API is that it provides good separation of concerns. In my experience, front-end engineers ask API engineers for new endpoints every time they need one. It looks like plain cooperation, but it can go too far: this kind of conversation tends to result in the API growing endpoints it should never have had. I am not sure what your API team's architecture policy is, but letting the API grow big just to serve the front-end is not good. The more front-end-specific functionality the API takes on, the harder it becomes to maintain.
In your plan, you are implementing a back-end that accesses the API on behalf of the front-end. This is similar to the BFF (Back-end For Front-end) architecture described by Sam Newman (http://samnewman.io/patterns/architectural/bff/). With this concept, the back-end acts as a kind of gateway which handles front-end-specific requests to the API. The back-end can even buffer the API from the impact of front-end changes if needed. Everything can be kept well separated.
In BFF, I don't think the back-end is meant to provide application-related functionality such as authorization, caching, and data-adapting operations, but this is up to you. You can implement new APIs to handle that functionality and have the back-end just be a gateway which ties them together. It would also work to put those things into the back-end itself, as long as it does not get too fat.
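To make the API-BE-FE layering concrete, here is a rough sketch of a back-end handler that does the three jobs mentioned in the question (authorization, caching, data adapting) in front of the API. Everything named here is hypothetical: `apiClient.getJob`, `verifyToken`, and the response fields are placeholders for whatever your API and auth layer actually expose.

```javascript
// Sketch of a BFF-style handler. `apiClient` and `verifyToken` are injected,
// hypothetical dependencies; `now` is injectable so expiry can be tested.
function createBff({ apiClient, verifyToken, ttlMs = 30000, now = Date.now }) {
  const cache = new Map(); // jobId -> { expires, value }

  return async function getJobSummary(token, jobId) {
    // Authorization happens here, so the API itself stays unchanged.
    if (!verifyToken(token)) {
      return { status: 401, body: { error: 'unauthorized' } };
    }
    // Caching: serve a recent answer without calling the heavy API again.
    const hit = cache.get(jobId);
    if (hit && hit.expires > now()) {
      return { status: 200, body: hit.value };
    }
    const raw = await apiClient.getJob(jobId);
    // Data adapting: expose only the shape the front-end needs.
    const value = {
      id: raw.id,
      done: raw.state === 'FINISHED',
      total: raw.result.total,
    };
    cache.set(jobId, { expires: now() + ttlMs, value });
    return { status: 200, body: value };
  };
}
```

Note that this cache is shared between all authorized users, which is only acceptable if every user may see every job; per-user data would need the user identity in the cache key.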
Drawback?
The possible drawback, I suppose, is the effort of keeping scaling consistent. This depends entirely on the infrastructure team or the people you work with, but in production the API and the back-end will each run on a different server or stack, so you may need to take care that they scale consistently under heavy traffic to your application. However, this independence can also be an advantage when monitoring hardware resources. You will need to find a sweet spot.

Can a Google Analytics result be passed to a JavaScript function while the user is still on the page?

On my website, users enter some personal information, including ZIP code. This information will be passed to a function that will determine the display of the next page.
The problem is that the function utilizes an underlying statistical model, for which zip codes have too many possible values (~43,000) to be useful. I want to map zip codes to something broader, like designated market area (DMA has around 200 possible values).
But using Google Analytics and BigQuery, I already have the user's DMA before they even enter their ZIP code. Is there a way to access that information while they are still on the page so I can input it to the function?

In case you are wondering whether you can use Google Analytics information in realtime (not quite clear from the question), that will not work. GA does not work in realtime; data processing time is announced as 4 hours for the "premium" version and 24 hours for the standard version, and even if it's often faster you probably do not want to build your business on an undocumented feature that might or might not work as expected.
Also API limits make realtime data retrieval unfeasible even for smaller sites.
If however you have a stash of precomputed data that can be linked to the current user via an identifier (clientId or similar) it would probably be best to export this to external storage as suggested by Willian Fuks.
Since you mentioned personal data, keep in mind that this must not be stored within Google Analytics as per Google's TOS.

(not quite an answer, more like some thoughts of mine)
I don't think that running queries in BQ for each user you find in production is a good approach.
Costs will increase considerably, performance will not be satisfactory by any means in this scenario, and you might start hitting quota limits for jobs against a single table.
One possibility that might work is having your back-end use a Google Analytics client to retrieve data from G.A. Still, you should check whether the quotas are appropriate for you.
Another possibility (I suspect this might be the best option) that you may consider for your scenario is using Google Datastore. It might suit your needs quite well; you could have some table from BigQuery being exported to Datastore and have your back-end system query it directly for the user DMA.
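A rough sketch of what that back-end lookup could look like, assuming the exported data is keyed by clientId. The `store` object is any key-value client with an async `get(key)`; the Map-backed version below is only a stand-in for a real Datastore client, and the record shape (`{ dma }`) and the DMA values are made up for illustration.

```javascript
// Stand-in for a key-value store holding data exported from BigQuery.
// A real deployment would swap this for an actual Datastore client.
function createMapStore(entries) {
  const data = new Map(entries);
  return { get: async (key) => (data.has(key) ? data.get(key) : null) };
}

// Resolve a visitor's DMA from the precomputed export.
async function resolveDma(store, clientId, fallbackDma = null) {
  const record = await store.get(clientId);
  // Unknown visitors (first visit, cleared cookies) get the fallback value,
  // so the caller can still derive a DMA from the ZIP code they enter.
  return record ? record.dma : fallbackDma;
}
```

The point of the design is that the per-request path is a single key lookup, with no BigQuery job or Analytics API call involved.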

Collaborators chat with Google Drive / Realtime API

I am looking to extend a web application using the new Realtime API in order to support collaboration (JavaScript). For that purpose I would also like to include a chat which will be available to users collaborating on the same document. After an extensive search I cannot find any parts of the Drive API that can be used for this. Furthermore, none of the open-source examples provided by Google implement chat functionality.
Are there existing services or code that I can use to integrate chat into my application, or would I need to implement it myself?
As mentioned before, the chat should be available to those collaborating on same document.
p.s I do not require any special features, just a simple chat as the one found in google docs etc.

Right now I don't know of any out-of-the-box solution to this problem. Some people have implemented chat in a realtime document by just placing the chat messages in the realtime data model.
This works fine so long as you don't also want to use undo/redo feature. If you are using undo/redo, then people would end up undoing the chat messages.
We are interested in adding some better support for this eventually, but no promises on anything in the short term.

You will need to build your own chat system, as Google just killed off XMPP support on the Chat API (which largely kills off the use of the API). The new Hangouts API does not provide access to Chat ( though some additional methods may come in handy: https://developers.google.com/+/hangouts/ for the dev API).
You are therefore left to your own devices. Fear not, however, writing a chat system is pretty easy. I am about to release a (mostly free) service to do so, so if you want to not have to write the code for it, I can keep you posted.
If, however, you'd prefer to build the code for it, you will most likely want to look into either socket.io or postal.js. Both provide the same thing: a pub/sub model. From there, you will need to choose between:
Long polling: supported by all browsers but a bit clunky
Websockets: not supported by IE8 and below
This will serve as your data transfer.
Two other possible options are paid services: you can retrofit RabbitMQ to do what you want to do (this, however, will seem clunky). You can also retrofit the Meteord daemon, which does what you want natively, but has an outdated JS library.
The keyword of all this is pub/sub, though.
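To show what the pub/sub model boils down to, here is a minimal in-process sketch with one channel per document. It is not socket.io's or postal.js's API; `ChatHub` and its methods are made-up names, and in a real app the callbacks would write to a WebSocket or long-polling response.

```javascript
// Minimal pub/sub hub: one channel per document id.
class ChatHub {
  constructor() {
    this.rooms = new Map(); // documentId -> Set of subscriber callbacks
  }

  // Register a callback for a document; returns a function that unsubscribes.
  subscribe(documentId, onMessage) {
    if (!this.rooms.has(documentId)) {
      this.rooms.set(documentId, new Set());
    }
    const subscribers = this.rooms.get(documentId);
    subscribers.add(onMessage);
    return () => {
      subscribers.delete(onMessage);
    };
  }

  // Deliver a message to everyone on the same document.
  // Returns how many subscribers received it.
  publish(documentId, message) {
    const subscribers = this.rooms.get(documentId);
    if (!subscribers) {
      return 0;
    }
    for (const callback of subscribers) {
      callback(message);
    }
    return subscribers.size;
  }
}
```

Keying the channel on the document id is what restricts the chat to people collaborating on the same document.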

Real time collaborative editing - how does it work?

I'm writing an application in which I'd like to have near real time collaborative editing features for documents (Very similar to Google Documents style editing).
I'm aware of how to keep track of cursor position; that's simple. Just poll the server every half second or second with the current user id, filename, line number and row number, which can be stored in a database, and the return value of this polling request is the position of the other users' cursors.
What I don't know how to do is update the document in such a way that it won't throw your cursor off and force a full reload, as that would be far too slow for my purposes.
This really only has to work in Google Chrome, preferably Firefox as well. I don't need to support any other browser.

The algorithm used behind the scenes for merging collaborative edits from multiple peers is called operational transformation. It's not trivial to implement though.
See also this question for useful links.
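To give a feel for the core idea, here is a toy sketch covering only the simplest case: two concurrent inserts into a plain string. Real OT systems also handle deletes, server ordering and history; the operation shape (`{ pos, text }`) and the `opWinsTies` flag are assumptions made for this example, not part of any particular library.

```javascript
// An insert operation is { pos, text }: put `text` at index `pos`.
function applyInsert(doc, op) {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
}

// Rewrite `op` so it can be applied after `other` has already been applied.
// `opWinsTies` decides who goes first when both insert at the same index;
// both sites must agree on this (for example by comparing site ids).
function transformInsert(op, other, opWinsTies) {
  if (other.pos < op.pos || (other.pos === op.pos && !opWinsTies)) {
    // `other` landed before `op`, so shift `op` right by the inserted length.
    return { ...op, pos: op.pos + other.text.length };
  }
  return op;
}

// Two users edit "abc" at the same time.
const base = 'abc';
const opA = { pos: 1, text: 'X' };
const opB = { pos: 2, text: 'Y' };

// Site A applies its own edit first, then the transformed remote edit.
const siteA = applyInsert(applyInsert(base, opA), transformInsert(opB, opA, false));
// Site B does the same in the opposite order.
const siteB = applyInsert(applyInsert(base, opB), transformInsert(opA, opB, true));

console.log(siteA, siteB); // both print "aXbYc"
```

Each site applies the edits in a different order, but transforming the remote edit against the local one makes both end up with the same document.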

Real time collaborative editing requires several things to be effective. Most of the other answers here focus on only one aspect of the problem, namely distributed state (aka shared mutable state). Operational Transformation (OT), Conflict-Free Replicated Data Types (CRDT), Differential Synchronization, and other related technologies are all approaches to achieving near-real-time distributed state. Most focus on eventual consistency, which allows temporary divergence of each participant's state but guarantees that the participants' states will eventually converge when editing stops. Other answers have mentioned several implementations of these technologies.
However, once you have shared mutable state, you need several other features to provide a reasonable user experience. Examples of these additional concepts include:
Identity: Who the people you are collaborating with are.
Presence: Who is currently "here" editing with you now.
Communication: Chat, audio, video, etc., that allow users to coordinate actions
Collaborative Cueing: Features that give indications as to what the other participants are doing and/or are about to do.
Shared cursors and selections are examples of Collaborative Cueing (a.k.a Collaboration Awareness). They help users understand the intentions and likely next actions of the other participants. The original poster was partly asking about the interplay between shared mutable state and collaborative cueing. This is important because the location of a cursor or selection in a document is typically described via locations within the document. The issue is that the location of a cursor (for example) is dependent on the context of the document. When I say my cursor is at index 37, that means character 37 in the document I am looking at. The document you may have right now may be different than mine, due to your edits or those of other users, and therefore index 37 in your document may not be correct.
So the mechanism you use to distribute cursor locations must be somehow integrated into or at least aware of the mechanism of the system that provides concurrency control over the shared mutable state. One of the challenges today is that while there are many OT / CRDT, bidirectional messaging, chat, and other libraries out there, they are isolated solutions that are not integrated. This makes it hard to build an end user system that provides a good user experience, and often results in technical challenges left to the developer to figure out.
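As an illustration of the index problem, here is a small sketch that adjusts a remote cursor index against an edit that was applied to the local document. The operation shapes (`insert` with `pos` and `text`, `delete` with `pos` and `length`) are assumptions made for this example, and treating an insert exactly at the cursor as pushing the cursor right is one possible convention, not the only one.

```javascript
// Adjust a cursor index so it still points at the same place in the text
// after `op` has been applied to the document.
function transformCursor(cursor, op) {
  if (op.type === 'insert') {
    // Text inserted at or before the cursor pushes the cursor right.
    return op.pos <= cursor ? cursor + op.text.length : cursor;
  }
  if (op.type === 'delete') {
    // A deletion starting at or after the cursor does not move it.
    if (op.pos >= cursor) {
      return cursor;
    }
    // Otherwise shift left; if the cursor was inside the deleted range,
    // clamp it to the start of that range.
    return Math.max(op.pos, cursor - op.length);
  }
  return cursor;
}

// "My cursor is at index 37", but three characters were inserted at index 0
// of your copy, so in your document the same spot is index 40.
console.log(transformCursor(37, { type: 'insert', pos: 0, text: 'abc' })); // 40
```

In a real system the operations fed to this function would come from the same OT or CRDT layer that manages the document, which is the integration the paragraph above is describing.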
Ultimately, to implement an effective real time collaborative editing system, you need to consider all of these aspects; and we haven't even discussed history, authorization, application level conflict resolution, and many other facets. You must build or find technologies that support each of these concepts in a way that make sense for your use case. Then you must integrate them.
The good news is that applications that support collaborative editing are becoming much more popular. Technologies that support building them are maturing and new ones are becoming available every month. Firebase was one of the first solutions that tried to wrap many of these concepts into an easy-to-use API. A newcomer, Convergence (full disclosure, I am a founder of Convergence Labs), provides an all-in-one API that supports the majority of these collaborative editing facets and can significantly reduce the time, cost, and complexity of building real time collaborative editing apps.

You don't necessarily need XMPP or Wave for this. Most of the work on an open-source implementation called infinote has already been done with jinfinote (https://github.com/sveith/jinfinote). Jinfinote was recently also ported to Python (https://github.com/phrearch/py-infinote) to handle concurrency and document state centrally. I currently use both within the hwios project (https://github.com/phrearch/hwios), which relies on websockets and JSON transport. You don't really want to use polling for this kind of application. Also, XMPP seems to complicate things unnecessarily imo.

After coming upon this question and doing a more careful search, I think the best standalone application to check out would be Etherpad, which runs as a JS browser app and uses Node.js on the server side. The technology behind this is known as operational transformation.
Etherpad was originally a pretty heavyweight application that was bought by Google and incorporated into Google Wave, which failed. The code was released as open source and the technology was rewritten in JavaScript for Etherpad Lite, now renamed just "Etherpad". Some of the Etherpad technology was probably also incorporated into Google Docs.
Since Etherpad, there have been various versions of this technology, notably some JavaScript libraries that allow for integrating this directly into your web app:
ShareJS
ot.js
I am the maintainer of the meteor-sharejs package for adding realtime editors directly to a Meteor app, which IMHO is the best of both worlds :)

As Gintautas pointed out, this is done by Operational Transformation. As I understand it, the bulk of the research and development on this feature was done as part of the now-defunct Google Wave project, and is known as the Wave Protocol. Fortunately, Google Wave is open-sourced, so you can get some good code samples at http://code.google.com/p/wave-protocol/

The Google Docs team did a little bit of a case study around how the real time collaboration worked, but I can't find the blog entry.
There is some decent stuff on the wikipedia page, though:
http://en.wikipedia.org/wiki/Collaborative_real-time_editor

I've recently published a repository with a working example of what it seems you're trying to achieve:
https://quill-sharedb-cursors.herokuapp.com
It's based on ShareDB (OT) as the backend and the Quill rich text editor on the frontend.
It basically just wires these things together, with some more code to draw the cursors. The code should be fairly simple to understand and to copy over to any specific solution.
Hope it helps with the endeavor.
