Proper API architecture for building AngularJS applications

Lately I have been talking with a lot of my mid-tier developers about how to structure our APIs to better accommodate the two-way binding that AngularJS offers. We have been trying to decide whether the APIs should be very explicit in their definitions, which would work better with Angular but cause a little more work for the mid-tier, or more implicit, with extra logic in Angular to "massage" the data into a good Angular model.
Let's start with an example. Suppose we are talking about some sort of data backup service. The service allows you to back up data and retain it for X number of years OR indefinitely. The UI has two elements to control this logic. There is a <select> that lets the user choose whether to delete the data "Never" or "After" X years. If "Never" is selected we hide the years input, but if "After" is selected we show the years input and allow them to enter a number between 1-99.
Doing this, I have introduced two different element controls, each controlling a different property on the $scope model.
However, on the API my mid-tier guy wants to control all of this using a single property called "YearsRetention". If YearsRetention == 0, that "implicitly" means we want unlimited retention, but if it is set to anything > 0, then retention is set to that value.
So basically he wants to control the retention settings using this single value, which would force me to write some sort of transformation function to set values on the $scope and achieve the same effect in the UI. This transformation would have to happen on both incoming and outgoing data.
In the end, I want to know whether the API should be defined implicitly (the API sends a single value and Angular then has to transform the data into a usable view model) or explicitly (the API sends all the values needed to bind directly to the UI, reducing the need to transform the JSON)?
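For concreteness, here is a minimal sketch of the kind of two-way transformation the implicit design forces on the Angular side; the function and view-model property names are hypothetical:

    // Hypothetical sketch: map the single YearsRetention field to and from
    // a view model the two UI controls can bind to directly.
    function toViewModel(apiData) {
      return {
        retentionType: apiData.YearsRetention === 0 ? 'Never' : 'After',
        retentionYears: apiData.YearsRetention === 0 ? null : apiData.YearsRetention
      };
    }

    function toApiModel(viewModel) {
      return {
        YearsRetention: viewModel.retentionType === 'Never' ? 0 : viewModel.retentionYears
      };
    }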

I think there are two bad ideas in the designs you describe.
Defining the data structures based on UI convenience. This is a bad idea because you want your API to be clear, multipurpose (potentially supporting different clients with different UIs), and long-lived (API refactoring is operationally expensive). Instead, try to represent your data accurately and concisely in its purest, most generalized form, and leave presentation issues such as formatting, truncation, localization, units of measure, and page layout to the UI.
Overloading a single data field to express a concept it doesn't naturally model by way of a "magic value". Assigning extra semantic meaning to the number zero is an example of this, and it's generally regarded as error-prone, confusing, and a leaky abstraction. Every client will have to encode the magic semantic that zero means forever. Of course, there's the glaring cognitive dissonance that the true meaning of zero would be "not at all". I'd model this as two fields: an enumeration called retentionPeriod allowing exactly two values, "PERMANENT" and "YEARS", and a separate field, perhaps retentionValue, to store the integer representing the years. If you end up losing the argument with your back-end developer, I'd at least argue that the magic value should be -1 meaning forever instead of 0. (I also think null matches "not at all" more than "forever", which is why I think -1 is the least bad of the bad magic options. There is some precedent out there for this, at least.)
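For illustration, the explicit two-field model might serialize like this (a sketch; the concrete values are assumed):

    // "Never delete" vs. "delete after 7 years" under the two-field model:
    {"retentionPeriod": "PERMANENT", "retentionValue": null}
    {"retentionPeriod": "YEARS",     "retentionValue": 7}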
In your specific case I'd argue one of your UI controls would drive retentionPeriod and the other retentionValue. But my reasoning for this is not that it happens to pair up with your current UI implementation in a straightforward way (that's more of a happy coincidence); it's that it is a clearer representation of the data.
That said, in my experience this specific instance is fairly mild in its badness. I'd be much more strongly concerned about an incorrect choice of array vs. object, vague or confusing naming, gigantic data structures, overly chatty APIs, etc.

Confusion regarding bounded contexts and interaction between them

I'm trying to implement my first domain-driven application after going through Eric Evans's book on Domain-Driven Design. I'm a bit confused about how to go about this.
In my application, a user can purchase a service that gets them a certain number of views on a video they post to YouTube, which is fulfilled by the other users of my app who watch those videos (basically a replica of the many YouTube promoter apps already available, built for learning).
Say the service is represented in the app as a WatchTime aggregate. The WatchTime entity contains information like the id of the user who purchased the service, the maximum number of views purchased, the number of views already fulfilled, and the points earned by someone who views the video once.
I decided to go with three bounded contexts: one for authentication, one for handling the WatchTimes (adding or removing them), and one for managing users and their data. The user has his personal info and some points that he collects while using the application.
At first I was thinking that all the user data and related actions would live in the third context, like adding points to a user or reducing them. But while making the model, I realized that if the WatchTime purchasing service is going to be in the second context, it will have to call into the third one every time a WatchTime is purchased, to tell a service there to deduct points for the purchase. It wouldn't make sense to keep them in two different contexts.
So instead I'm thinking of having a model of the user in the second bounded context, but with only the points and the WatchTimes that the user purchased, so it doesn't have to call anything in the third context.
My question is how to properly separate things into contexts. Should it be based on the models, or on the functionality, with all models related to that functionality living in the same context?
And another thing: how do I ensure that all objects representing the same entity have the same value and are properly persisted in the database? Should only one object representing a particular entity exist at a time, persisted and disposed of by the end of a function? I'm worried that if two objects representing the same entity exist at the same time, they could hold, or change to, different values.
If I sound like I'm rambling, please let me know and I'll try to be clearer. Thanks.
Bounded contexts basically define areas of functionality where the ubiquitous language (and thus the model) are the same. In different bounded contexts, "user" can mean different things: in a "user profile" context, you might have their email address, but in the "viewing time" context you'd just have the points granted and viewership purchased.
Re "another thing", in general you need to keep an aggregate strongly consistent and only allow an update to succeed if the update is cognizant of every prior update which succeeded, including any updates which succeeded after a read from the datastore. This is the single-writer principle.
There are a couple of ways to accomplish this. First, you can use optimistic concurrency control and store a version number with each aggregate. You then update the aggregate in the DB only if the version hasn't changed; otherwise you attempt the operation (performing all the validations etc.) against the new version of the aggregate. This requires some support in the DB for an atomic check of the version and update (e.g. a transaction).
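A minimal sketch of that version-check pattern, using an in-memory store purely for illustration (a real datastore would perform the check-and-update atomically, e.g., inside a transaction):

    // Optimistic concurrency sketch: commit only if the version is unchanged.
    const store = new Map(); // id -> { version, state }

    function updateAggregate(id, mutate, maxRetries = 3) {
      for (let attempt = 0; attempt < maxRetries; attempt++) {
        const current = store.get(id);
        // Run validations/business logic against the freshly read state.
        const nextState = mutate(current.state);
        // Commit only if no concurrent write bumped the version in between.
        // (Trivially true in single-threaded JS; a real DB checks atomically,
        // e.g. UPDATE ... SET version = version + 1 WHERE id = ? AND version = ?)
        if (store.get(id).version === current.version) {
          store.set(id, { version: current.version + 1, state: nextState });
          return;
        }
      }
      throw new Error('concurrent modification: retries exhausted');
    }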
An alternative approach (my personal preference) is to recognize that a DDD aggregate has a high level of mechanical sympathy to the actor model of computation (e.g. both are units of strong consistency). There are implementations of the actor model (e.g. Microsoft Orleans, Akka Cluster Sharding) which allow an aggregate to be represented by at most one actor at a given time (even if there is a cluster of many servers).

Does anyone know how to retrain Object Detection (coco-ssd) of TFJS for object 91?

So far I have seen many discussions on this topic using different approaches (https://github.com/tensorflow/models/issues/1809), but I want to know if anyone has managed to achieve this successfully using TensorFlow.js.
I know some have also achieved this using transfer learning, but that is not the same as being able to add my own new class.
The short answer: no, not yet. Though it is technically possible, I have not seen an implementation of this in the wild.
The longer answer - why:
Given that "transfer learning" essentially means reusing the existing knowledge in a trained model to help you classify things of a similar nature without having to redo all the prior learning, there are actually two ways to do that:
1) This is the easier route, but it may not be possible for some use cases: use one of the high-level layers of the frozen model that you have access to (e.g., the models released by TF.js are frozen models, I believe - the ones on GitHub). This allows you to reuse some of its lower layers (or final output), which may already be good at picking out features useful for your use case, e.g., object detection in a general sense. You then feed that output into your own unfrozen layers that sit on top of it, which is where the new training happens. This is faster, as you are only updating weights for the new layers you have added. However, because the original model is frozen, if you wanted the full COCO-SSD architecture you would have to replicate in TF.js the layers you bypassed in order to end up with the same resulting model architecture. This may not be trivial to do.
2) Retraining the original model - think of it as fine-tuning the original model - but this is only possible if you have access to the original unfrozen model and the data used to train it. This takes longer, as you are essentially retraining the whole model on all the original data plus your new data. If you do not have the original unfrozen model, the only way to do this is to implement said model in TF.js yourself using the layers/ops APIs as needed, and then train it on your own data.
What?!
So an easier-to-visualize example of this: consider PoseNet - the model that estimates where human joints/skeletons are.
Now, in this PoseNet example, imagine you wanted to make a new ML model that could detect when a person is in a certain position - e.g., waving a hand.
In this example you could use method 1 to simply take the output of the existing PoseNet predictions for all the joints it has detected and feed that into a new layer - something simple like a multilayer perceptron - which could then very quickly learn from example data when a hand is in a waving position. In this case we are simply adding to the existing architecture to achieve a new result - gesture prediction vs. the raw x-y point predictions for the joints themselves.
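A rough TF.js sketch of that idea, assuming @tensorflow/tfjs and treating PoseNet's 17 (x, y) keypoints as a flattened 34-number input (the layer sizes are illustrative):

    const tf = require('@tensorflow/tfjs');

    // Small MLP stacked on PoseNet's output: keypoints in, "waving?" out.
    const gestureModel = tf.sequential();
    gestureModel.add(tf.layers.dense({ inputShape: [34], units: 32, activation: 'relu' }));
    gestureModel.add(tf.layers.dense({ units: 1, activation: 'sigmoid' }));
    gestureModel.compile({ optimizer: 'adam', loss: 'binaryCrossentropy' });

    // xs: flattened keypoints from PoseNet predictions; ys: 1 = waving, 0 = not.
    // await gestureModel.fit(xs, ys, { epochs: 20 });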
Now consider case 2 for PoseNet: you want to be able to recognise a new part of the body that it currently does not detect. For that to happen, you would need to retrain the original model so that it could learn to predict that new body part as part of its output.
This is much harder, as retraining the base model means you need access to the unfrozen model. If you didn't have access to the unfrozen model, you would have no choice but to attempt to recreate the PoseNet architecture entirely yourself and then train that with your own data. As you can see, this second use case is much harder and more involved.

Call SQL "function" (stored procedure?) every time a database column is selected

I am running MySQL 5.6. I have a number of various "name" columns in the database (in various tables). These get imported every year by each customer as a CSV data dump. There are a number of places that these names are displayed throughout this website. The issue is, the names have almost no formatting (and to this point, no sanitization existed upon importation):
Phil Eaton, PHIL EATON, Phil EATON, etc.
Thus, the website sometimes looks like a mess when these names are involved. There are a number of ways I can think of to fix this, but none that are particularly appealing.
First, I can have a filter in JavaScript. However, as I said, these names appear in a number of places throughout this (large) site, and I may end up missing a page. The names do not already live inside nice "name"-classed divs/spans, etc.
Second, I could filter in PHP (the backend). This seems about as effective as doing it in JavaScript. I could do it in the API, but there is still no central method for pulling names from the database, so I could still miss an API call anyway.
Finally, the obvious "best" way is to sanitize the existing data in place for each name column, and at the same time immediately start sanitizing all names that get imported whenever we add a customer. The issue with the first part is that there are hundreds of millions of rows of names in the database. Updating these could take a long time and be disruptive to the clients' daily routines.
So the most appealing way to correct this in the short term is to invoke a function every time a column is selected. That way I could "decorate" every name column with a formatting function so the names appear uniform on the frontend. So ultimately, my question is: is it possible to invoke a specific function in SQL to format each row every time a specific column is selected? In other words, can I call a stored procedure every time a column is selected? (The point being, I'm trying to keep the formatting in SQL to avoid propagating formatting logic throughout the codebase.)
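(For reference, the JavaScript filter from the first option might look like this naive sketch; real names such as "McDonald" or "van der Berg" would need extra rules.)

    // Naive name formatter: lowercase everything, then capitalize the first
    // letter after the string start, whitespace, a hyphen, or an apostrophe.
    function formatName(name) {
      return name
        .toLowerCase()
        .replace(/(^|[\s\-'])(\w)/g, (match, sep, ch) => sep + ch.toUpperCase());
    }

    formatName('PHIL EATON'); // => "Phil Eaton"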
In MySQL you can't trigger something on SELECT, but I have an idea (it's only an idea; I don't have time to try it right now, sorry).
You could probably create a VIEW on this table with the same structure, but with the formatting function applied to the name fields, and select from this view in your PHP.
But it has two drawbacks:
You have to modify all the SELECT statements in your PHP code.
The server will call that function on every query. Maybe you can store the formatted values and then check for them (i.e., cache them).
On the other hand, I agree with HLGEM; I also suggest formatting the data on import, because it's very bad practice to import something into a DB without checking it (SQL injection?). The batch task is also a good idea to clean up the mess.
I presume names are queried frequently, so invoking a sanitization function every time they are selected could severely slow down your system. Further, you can't get this with a simple setting; you would have to change every bit of SQL code that touches names.
Personally, how I would handle it is to fix the imports so they insert a sanitized version of new names. It is a bad idea to put any data directly into a database without some sort of staging and cleanup.
Then I would tackle the old names and fix them in batches in a nightly run scheduled for when the fewest people are using the system. You would have to do some testing on dev to determine how big a batch you can run without interfering with other things the database is doing. The larger the batch, the sooner you get through all the names; even though this will take time, it is the surest method of getting the data cleaned up, and over time the data will appear better to the users. If the design of your database allows you to identify the more active names (such as an is_active flag for a customer, or an order in the last year), I would prioritize the update by that. Alternatively, you could clean up one client at a time, starting with whichever one has noticed the problem and is driving this change.
Other answers give some possible solutions. But the short answer to the specific option you are asking about is: no. There is no such thing as a "SELECT statement trigger", let alone one for a single column. Triggers come close to this kind of expectation, but only for INSERT, UPDATE, and DELETE operations.

Saving and updating the state of an AJAX based graph in Rails

I have a Google Chart's ColumnChart in a Rails project. This is generated and populated in JavaScript, by calling a Rails controller action which renders JSON.
The chart displays a month's worth of information for a customer.
Above the chart I have next and previous arrows which allow a customer to change the month displayed on the chart. These don't have any functionality as it stands.
My question is: what is the best way to save the state of the chart, in terms of its current month, for a customer viewing the chart?
Here is how I was thinking the workflow would go:
One of the arrows is selected.
This event is captured in JavaScript.
Another request to the Rails action rendering JSON is performed, with an additional GET parameter passed based on a data attribute of the arrow button (either + or -).
The chart is re-rendered using the new JSON response.
Would the logic for incrementing or decrementing the graph's current date be performed on the server side, with the chart's date stored in a session array defaulting to the current date on first load?
On the other hand, would it make sense to save the chart state on the client side, within the JavaScript code or in a cookie, and manipulate the date before it's sent to the Rails controller?
I've been developing with Rails for about 6 months and feel comfortable with it, but I have only recently started developing with JavaScript and AJAX. My experience tying JS code together with Rails is somewhat limited at this point, so I'm looking for some advice/best practices on how to approach this.
Any advice is much appreciated.
I'm going to go through a couple of options, some good, some bad.
First, what you definitely don't want to do is maintain any notion of what month you are in via cookies or any other form of persistent server-side storage. Certainly sometimes server state is necessary, but it shouldn't be used when there are easy alternatives. Part of REST (which Rails is largely built around) is trying to represent data in pure attributes rather than letting its state be spread around like that.
From here, most solutions are probably acceptable, and opinion plays a greater role. One thing you could do is calculate a month from the +/- sign using the current month and send that to the server, which will return the information for the month requested.
I'm not a huge fan of this, though, as you have to write JavaScript that's capable of creating valid date ranges, and most of this functionality will probably be on the server already. Just passing a +/- and the current month to the server will work as well; you'll just have to do a bit of additional routing and logic to resolve the sign on the server to a different month.
While either of these would work, my preferred solution would instead have the initial request for the month generate valid representations of the neighbouring months and return them to the client. Then, when you update the graph with the requested data, you also replace the forward/backward links on the graph with the ones provided by the server. This provides a nice fusion of the benefits of the prior two solutions: no additional routing on the server, and no substantive addition to the client-side code. You also get the added benefit of being able to grey out transitions to months for which no data was collected (i.e., before they were a customer, and the future). Without this, you'd have to create separate logic to handle client requests for information that doesn't exist, which is extra work for you and more confusion for the customer.
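A small client-side sketch of that last approach (jQuery-style, as was typical in Rails apps of this era; the URL shape and response field names are assumptions):

    // Re-render the chart and swap in the server-provided neighbour links.
    function loadMonth(url) {
      $.getJSON(url, function (response) {
        drawChart(response.data); // your existing Google Charts rendering code
        $('#prev').data('url', response.prevMonthUrl).toggle(!!response.prevMonthUrl);
        $('#next').data('url', response.nextMonthUrl).toggle(!!response.nextMonthUrl);
      });
    }

    $('#prev, #next').on('click', function () {
      loadMonth($(this).data('url'));
    });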

What is the best way to filter spam with JavaScript?

I have recently been inspired to write spam filters in JavaScript, Greasemonkey-style, for several websites I use that are prone to spam (especially in comments). When considering how to go about this, I realize I have several options, each with pros and cons. My goal for this question is to expand on the list I have created, and hopefully determine the best way of doing client-side spam filtering with JavaScript.
As for what makes a spam filter the "best", I would say these are the criteria:
Most accurate
Least vulnerable to attacks
Fastest
Most transparent
Also, please note that I am trying to filter content that already exists on websites that aren't mine, using Greasemonkey Userscripts. In other words, I can't prevent spam; I can only filter it.
Here is my attempt, so far, to compile a list of the various methods along with their shortcomings and benefits:
Rule-based filters:
What it does: "grades" a message by assigning a point value to different criteria (e.g., all uppercase, all non-alphanumeric, etc.). Depending on the score, the message is discarded or kept. (A small sketch follows this list.)
Benefits:
Easy to implement
Mostly transparent
Shortcomings:
Transparent: it's usually easy to reverse-engineer the code to discover the rules, and thereby craft messages that won't be picked up
Hard to balance point values (false positives)
Can be slow; multiple rules have to be executed on each message, often using regular expressions
In a client-side environment, server interaction or user interaction is required to update the rules
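A minimal sketch of such a grader (the rules, weights, and threshold are all illustrative):

    // Each rule adds points; a message at or over the threshold is discarded.
    const rules = [
      { test: (m) => m === m.toUpperCase() && /[A-Z]/.test(m), score: 2 }, // all caps
      { test: (m) => (m.match(/https?:\/\//g) || []).length > 2, score: 3 }, // link-heavy
      { test: (m) => /viagra|casino/i.test(m), score: 5 } // known bad words
    ];

    function isSpam(message, threshold = 5) {
      const score = rules.reduce((sum, r) => sum + (r.test(message) ? r.score : 0), 0);
      return score >= threshold;
    }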
Bayesian filtering:
What it does: analyzes word frequency (or trigram frequency) and compares it against the data it has been trained with. (A small sketch follows this list.)
Benefits:
No need to craft rules
Fast (relatively)
Tougher to reverse engineer
Shortcomings:
Requires training to be effective
Trained data must still be accessible to JavaScript; usually in the form of human-readable JSON, XML, or flat file
Data set can get pretty large
Poorly designed filters are easy to confuse with a good helping of common words to lower the spamacity rating
Words that haven't been seen before can't be accurately classified; sometimes resulting in incorrect classification of entire message
In a client-side environment, server interaction or user interaction is required to update the rules
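A toy sketch of the idea (a real filter would need persistence, tokenization tuned to the site, and far more training data):

    // Naive Bayes with Laplace-style smoothing; score > 0 leans spam.
    const counts = { spam: {}, ham: {} };
    const totals = { spam: 0, ham: 0 };

    function train(label, message) {
      for (const word of message.toLowerCase().split(/\W+/).filter(Boolean)) {
        counts[label][word] = (counts[label][word] || 0) + 1;
        totals[label]++;
      }
    }

    function spamScore(message) {
      let score = 0; // sum of log-likelihood ratios over the message's words
      for (const word of message.toLowerCase().split(/\W+/).filter(Boolean)) {
        const pSpam = ((counts.spam[word] || 0) + 1) / (totals.spam + 2);
        const pHam = ((counts.ham[word] || 0) + 1) / (totals.ham + 2);
        score += Math.log(pSpam / pHam);
      }
      return score;
    }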
Bayesian filtering- server-side:
What it does: Applies Bayesian filtering server side by submitting each message to a remote server for analysis.
Benefits:
All the benefits of regular Bayesian filtering
Training data is not revealed to users/reverse engineers
Shortcomings:
Heavy traffic
Still vulnerable to uncommon words
Still vulnerable to adding common words to decrease spamacity
The service itself may be abused
To train the classifier, it may be desirable to allow users to submit spam samples for training. Attackers may abuse this service
Blacklisting:
What it does: Applies a set of criteria to a message or some attribute of it. If one or more (or a specific number of) criteria match, the message is rejected. A lot like rule-based filtering, so see its description for details.
CAPTCHAs, and the like:
Not feasible for this type of application. I am trying to apply these methods to sites that already exist. Greasemonkey will be used to do this; I can't start requiring CAPTCHAs in places that they weren't before someone installed my script.
Can anyone help me fill in the blanks? Thank you!
There is no "best" way, especially for all users or all situations.
Keep it simple:
Have the GM script initially hide all comments that contain links and maybe universally bad words (F*ck, Presbyterian, etc.). ;)
Then the script contacts your server and lets the server judge each comment by X criteria (more on that, below).
Show or hide comments based on the server response. In the event of a timeout, show or hide based on a user preference setting ("What to do when the filter server is down?" - show/hide comments with links).
That's it for the GM script (a skeleton sketch follows); the rest is handled by the server.
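A skeleton of that flow; the selectors, server URL, and response shape are all assumptions (GM_xmlhttpRequest is Greasemonkey's cross-origin request API):

    const comments = Array.from(document.querySelectorAll('.comment')); // assumed selector

    // Step 1: initially hide comments that contain links.
    comments.forEach((c) => {
      if (c.querySelector('a')) c.style.display = 'none';
    });

    // Step 2: ask the filter server to judge each comment.
    GM_xmlhttpRequest({
      method: 'POST',
      url: 'https://example.com/filter', // hypothetical filter server
      data: JSON.stringify(comments.map((c) => c.textContent)),
      timeout: 5000,
      onload: (resp) => {
        // Step 3: show or hide based on the verdicts returned.
        JSON.parse(resp.responseText).forEach((spam, i) => {
          comments[i].style.display = spam ? 'none' : '';
        });
      },
      ontimeout: () => { /* fall back to the user's show/hide preference */ }
    });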
As for the actual server/filtering criteria...
Most important: do not dare to assume that you can guess what a user will want filtered! This will vary wildly from person to person, or even mood to mood.
Set up the server to use a combination of bad words, bad link destinations (.ru and .cn domains, for example), and public spam-filtering services.
The most important thing is to offer users some way to choose, and ideally adjust, what is applied for them.
