I have a ton of associated data revolving around a school, students, teachers, classes, locations, etc etc
I am faced with a challenge put fourth by my client; they want to have reports on everything. This means they want the ability to cross reference data points every which way and i think i'm just short of writing a pretty query builder. :/
This stack question is aimed at soliciting opinions on how to structure a reporting interface beautifully.
Any suggestions, references, examples, jQ plugins etc would be amazing.
Thank you!
I find the Trac's query builder rather acceptable for what it is meant to do.
But most probably your clients don't want everything, they are just too lazy to think about what they want now. You could help them decide by analyzing the use cases together, and come up at least with a few kinds of queries with just a few parts customizable -- in the worst case -- or just a few canned queries they really need -- in the best.
You should probably schedule a meeting with your client to determine what they need to do. This does not mean having them speculate about how great it would be if your software could do everything, was ultra-flexible yet totally easy to use, etc... but sit down and find out what they are doing right now. I'm saying this because that "oh, I'd like to be able to cross-reference everything with everything else!" sounds a bit too familiar, and might end in an ugly case of inner-platform effect.
I've found that rapid paper prototyping with the client is a great way to explore possible ideas, as it shifts their attention away from "can you make this button yellow?" issues to The Big Picture, to let them make up their minds what they actually need. Plus, it's ridiculously inexpensive to do.
Apart from that, for inspiration, there are UI pattern languages that address handling potentially large amounts of interconnected data. What's great about these is that you will often be able to use these patterns to communicate ideas to your client, since a well-structured pattern language will guide a non-expert through domain-relevant design decisions in increasing detail.
First, I can only support the other voices: work out with the clients what they actually need. A good argument is "I can do, but it will cost you X thousand dollars, every user will need Y hours of training, and you'll need a $100.000K/year developer to maintain it."
(Unfortunately, most clients at that point prefer to pick the guy who says "yes, can do cheaper!")
Only second, and only if the client says "yes we do need everything":
What works well is a list/grid view progressive filtering. Instead of buildign the SQL query, then running it, let the user directly work with the results: e.g. right clicking a cell, and selecting "limit to this value" could add a WHERE colN = <constant> constraint.
You can generate suggestions for columns from SELECT DISTINCT calls - if it returns less than, say, 20 values, you can offer checkboxes for a OR combination of possible values.
It would be interesting to discuss en elegant UI for the sea of remaining problems: OR'ed conditions across multiple columns, ordering by more than one column, grouping, ...
Related
I’m a music producer/composer who will be submitting works to new music libraries. In some cases, I’d like to use previous projects as a starting point. So while the result will be new, unique compositions, I want to avoid a scenario where an algorithm might mistake a new song (or song segment) for a previous work.
I’d like to develop some rules of thumb to keep in mind to ensure this doesn’t happen. Specifically, to understand more about how music identifying algorithms work and what combination of parameters need to be different - and to what degrees they need to be different - so as to avoid creating false positive identifications against my other works.
For example:
Imagine “song a” is part of “library a”. Then I create “song b” for “library b”. The arrangement is similar, same instruments are used, same tempo, same key, and mix is essentially the same. But the chord progression and melody are different, though a similar vibe. Could that trigger a false positive?
Or a scenario like above where maybe the instrumentation is similar, but also using some alternate voices (Like an alternate synth patch for the baseline, and similar but different percusssion samples). New key, and a speed increase of 5 bpm. Is that enough to differentiate?
Or imagine a scenario where the bulk of the track is significantly different for all parameters, including a new tempo and key, except there is a 20 second break in the middle that resembles a previous work: an ambient tonal bed with light percussion. The same tonal bed is used, but in the new key and tempo, and the percussion is close to the same. Then a user uses only those 20 seconds in a video. How different would those 20 seconds need to be from the original, and across what parameters, to avoid a false positive?
These examples are just thought experiments to try and understand how it all works. I imagine any new compositions I make should easily be adequately different from previous compositions, and the cumulative differences would easily extend beyond tenants listed in above scenarios.
But given the fact that there are some parameters that could be very similar…(even just from a mix perspective and instruments used), I would like to develop a deeper understanding of what gets analyzed. And consequently, what sort of differences I should ensure remain constant - because it seems to me even 20 seconds of enough similarity could trigger a potential issue.
Thanks!
Ps:
Note I welcome any insight offered, and am certainly receptive to the answer being couched in coding language…this is stack exchange after all, and it could be pretty interesting. But at the end of the day, I’m not a coder (though i am coding curious), and need to translate any clarity offered into practical considerations that could be employed from a music production POV. Which is to say, if it’s easy enough to include some language/concepts with that in mind, I’d be very grateful. Parameters like: tempo, key, chord progressions, rhythm elements, frequency considerations, sounds used, overall mix, etc etc. Thanks again!
attempting to actually answer the question, despite the discussion in the comments, I happen to know of the existence of this video by computerphile. at least some of the music matching algorithms out in the wild must be based on that.
P.S. linked is How Shazam Works (Probably!) featuring David Domminney Fowler. I barely remember the details of the video, except its existence, which is why the answer is so bad. edits are welcome.
I'm working about audio but I'm a newbie in this area. I would like to matching sound from microphone to my source audio(just only 1 sound) like Coke Ads from Shazam. Example Video (0.45 minute) However, I want to make it on website by JavaScript. Thank you.
Building something similar to the backend of Shazam is not an easy task. We need to:
Acquire audio from the user's microphone (easy)
Compare it to the source and identify a match (hmm... how do... )
How can we perform each step?
Aquire Audio
This one is a definite no biggy. We can use the Web Audio API for this. You can google around for good tutorials on how to use it. This link provides some good fundametal knowledge that you may want to understand when using it.
Compare Samples to Audio Source File
Clearly this piece is going to be an algorithmic challenge in a project like this. There are probably various ways to approach this part, and not enough time to describe them all here, but one feasible technique (which happens to be what Shazam actually uses), and which is also described in greater detail here, is to create and compare against a sort of fingerprint for smaller pieces of your source material, which you can generate using FFT analysis.
This works as follows:
Look at small sections of a sample no more than a few seconds long (note that this is done using a sliding window, not discrete partitioning) at a time
Calculate the Fourier Transform of the audio selection. This decomposes our selection into many signals of different frequencies. We can analyze the frequency domain of our sample to draw useful conclusions about what we are hearing.
Create a fingerprint for the selection by identifying critical values in the FFT, such as peak frequencies or magnitudes
If you want to be able to match multiple samples like Shazam does, you should maintain a dictionary of fingerprints, but since you only need to match one source material, you can just maintain them in a list. Since your keys are going to be an array of numerical values, I propose that another possible data structure to quickly query your dataset would be a k-d tree. I don't think Shazam uses one, but the more I think about it, the closer their system seems to an n-dimensional nearest neighbor search, if you can keep the amount of critical points consistent. For now though, just keep it simple, use a list.
Now we have a database of fingerprints primed and ready for use. We need to compare them against our microphone input now.
Sample our microphone input in small segments with a sliding window, the same way we did our sources.
For each segment, calculate the fingerprint, and see if it matches close to any from storage. You can look for a partial match here and there are lots of tweaks and optimizations you could try.
This is going to be a noisy and inaccurate signal so don't expect every segment to get a match. If lots of them are getting a match (you will have to figure out what lots means experimentally), then assume you have one. If there are relatively few matches, then figure you don't.
Conclusions
This is not going to be an super easy project to do well. The amount of tuning and optimization required will prove to be a challenge. Some microphones are inaccurate, and most environments have other sounds, and all of that will mess with your results, but it's also probably not as bad as it sounds. I mean, this is a system that from the outside seems unapproachably complex, and we just broke it down into some relatively simple steps.
Also as a final note, you mention Javascript several times in your post, and you may notice that I mentioned it zero times up until now in my answer, and that's because language of implementation is not an important factor. This system is complex enough that the hardest pieces to the puzzle are going to be the ones you solve on paper, so you don't need to think in terms of "how can I do X in Y", just figure out an algorithm for X, and the Y should come naturally.
Due to the reasons outlined in this question I am building my own client side search engine rather than using the ydn-full-text library which is based on fullproof. What it boils down to is that fullproof spawns "too freaking many records" in the order of 300.000 records whilst (after stemming) there are only about 7700 unique words. So my 'theory' is that fullproof is based on traditional assumptions which only apply to the server side:
Huge indices are fine
Processor power is expensive
(and the assumption of dealing with longer records which is just applicable to my case as my records are on average 24 words only1)
Whereas on the client side:
Huge indices take ages to populate
Processing power is still limited, but relatively cheaper than on the server side
Based on these assumptions I started of with an elementary inverted index (giving just 7700 records as IndexedDB is a document/nosql database). This inverted index has been stemmed using the Lancaster stemmer (most aggressive one of the two or three popular ones) and during a search I would retrieve the index for each of the words, assign a score based on overlap of the different indices and on similarity of typed word vs original (Jaro-Winkler distance).
Problem of this approach:
Combination of "popular_word + popular_word" is extremely expensive
So, finally getting to my question: How can I alleviate the above problem with a minimal growth of the index? I do understand that my approach will be CPU intensive, but as a traditional full text search index seems unusably big this seems to be the only reasonable road to go down on. (Pointing me to good resources or works is also appreciated)
1 This is a more or less artificial splitting of unstructured texts into small segments, however this artificial splitting is standardized in the relevant field so has been used here as well. I have not studied the effect on the index size of keeping these 'snippets' together and throwing huge chunks of texts at fullproof. I assume that this would not make a huge difference, but if I am mistaken then please do point this out.
This is a great question, thanks for bringing some quality to the IndexedDB tag.
While this answer isn't quite production ready, I wanted to let you know that if you launch Chrome with --enable-experimental-web-platform-features then there should be a couple features available that might help you achieve what you're looking to do.
IDBObjectStore.openKeyCursor() - value-free cursors, in case you can get away with the stem only
IDBCursor.continuePrimaryKey(key, primaryKey) - allows you to skip over items with the same key
I was informed of these via an IDB developer on the Chrome team and while I've yet to experiment with them myself this seems like the perfect use case.
My thought is that if you approach this problem with two different indexes on the same column, you might be able to get that join-like behavior you're looking for without bloating your stores with gratuitous indexes.
While consecutive writes are pretty terrible in IDB, reads are great. Good performance across 7700 entries should be quite tenable.
I'm trying to nut out a highlevel tech spec for a game I'm tinkering with as a personal project. It's a turn based adventure game that's probably closest to Archon in terms of what I'm trying to do.
What I'm having trouble with is conceptualising the best way to develop a combat system that I can implement simply at first, but that will allow expansion and complexity to be added in the future.
Specifically I'm having trouble trying to figure out how to handle combat special effects, that is, bonuses or negatives that may be applied or removed by an actor, an item or an environment.
Do I have the actor handle all effects that are in play for/against them should the game itself check each weapon, armour, actor and location each time it tries to make a decisive roll.
Are effects handled in individual objects or is there an 'effect' object or a bit of both?
I may well have not explained myself at all well here, and I'm more than happy to try and expand the question if my request is simply too broad and airy. But my intial thinking is that smarter people than me have spent the time and effort in figuring things like this out and frankly I don't want to taint the conversation with the cul-de-sac of my own stupidity too early.
The language in question is javascript, although at this point I don't imagine it makes a great difference.
What you're calling 'special effects' used to be called 'modifiers' but nowadays go by the term popular in MMOs as 'buffs'. Handling these is as easy or as difficult as you want it to be, given that you get to choose how much versatility you want to be able to bestow at each stage.
Fundamentally though, each aspect of the system typically stores a list of the modifiers that apply to it, and you can query them on demand. Typically there are only a handful of modifiers that apply to any one player at any given time so it's not a problem - take the player's statistics and any modifiers imparted by skills/spells/whatever, add on any modifiers imparted by worn equipment, then add anything imparted by the weapon in question. If you come up with a standard interface here (eg. sumModifiersTo(attributeID)) that is used by actors, items, locations, etc., then implementing this can be quick and easy.
Typically the 'effect' objects would be contained within the entity they pertain to: actors have a list of effects, and the items they wear or use have their own list of effects. Where effects are explicitly activated and/or time-limited, it's up to you where you want to store them - eg. if you have magical potions or other consumables, their effects will need to be appended to the Actor rather than the (presumably destroyed) item.
Don't be tempted to try and have the effects modify actor attributes in-place, as you quickly find that it's easy for the attributes to 'drift' if you don't ensure all additions and removals are done following the correct protocol. It also makes it much harder to bypass certain modifiers later. eg. Imagine a magical shield that only protects against other magic - you can pass some sort of predicate to your modifier totalling function that disregards certain types of effect to do this.
Take a look at the book, Head First Design Patterns, by Elisabeth Freeman. Specifically, read up on the Decorator and Factory patterns and the method of programming to interfaces, not implementations. I found that book to be hugely effective in illustrating some of the complex concepts that may get you going on this.
Hope this helps to point you in the right direction.
At first blush I would say that the individual combatants (player and NPC) have a role in determining what their combat characteristics are (i.e. armor value, to-hit number, damage range, etc.) given all the modifiers that apply to that combatant. So then the combat system is not trying to figure out whether or not the character's class gives him/her an armor bonus, whether a magic weapon weighs in on the to hit, etc.
But I would expect the combat system itself to be outside of the individual combatants. That it would take information about an attacker and a desired type of attack and a target or set of targets and resolve that.
To me, that kind of model reflects how we actually ran combat in pencil and paper RPGs. The DM asked each player for the details of his or her character and then ran the combat using that information as the inputs. That it works in the real world suggests its a pretty flexible system.
I am talking about Google Text Translation User Interface, in Google Language Tools.
I like the fact that you can get translations of text for a lot of languages. However, I think is not so good always to show all options of translation. I believe is preferably to show, in first instance, only the most frequent options for text translation.
Really, it has become very annoying trying to translate from English to spanish, for example. Using the keyboard (E, Tab, then S Key repeatedly), the first three options presented are Serbian, Slovak, Slovenian, and finally Spanish...
Another example: from English to French. Using the keyboard again (F key repeatedly) shows Filipino and Finish before French!!!
What sort of ideas do you think can it be applied to this GUI to make it more effective for real people usage?
I think it's probably fine. There are only a little over 30 languages in the list, and close to half of them are pretty common languages, so I don't think it really makes sense to put the common ones first. It's not like a country list where you have to search through 180+ countries to find yours.
The only thing I would probably do is use a cookie to store your last language selection(s).
I think the best would be an autocomplete input field similar to the one used for tags on Stack Overflow and the one used for search on Facebook. Each letter you type narrows the field of results down and allows you to easily choose the right one with either the mouse or the arrow keys.
You could also keep track of the most popular ones and sort the results by most frequently used, like Stack Overflow does with their suggested tags.
I've been frustrated with this interface as well. I think it would be a good idea to (a) use cookies to give preference to the languages this user has selected in the past; and (b) to display a limited list (4-8 languages) of the most common languages, with a "more..." option that expands the list.
I really appreciate the fact that a lot of websites and software applications have started using this approach when asking you to specify your time zone. Why display "Mid-Atlantic", "Azores", etc. if you expect 95% of your users to be in (for example) the 5 U.S. time zones.
The simplest way to do what you are asking is to sort by request frequency and then by alpha/numeric. This will put languages where translation requests are most common to the top. It still won't solve your problem perfectly, but it would be an easy improvement, and one that would work better for most people.
Now, if only there were some google employees who came to this website ;-)
I'd try and detect their locale through browser/ISP meta data if I could, then default to that - but most people expect an alphabeticly-ordered list of languages. What if they're looking for Serbian, but after they hit 'S' once they get spanish, with no Serbian in sight? They might assume that there is no Serbian, since it's not where they expect it (before spanish) and leave. That'd be bad.
I would agree with most previous responses, as plain as this page is there is not much you could do upfront, languages should stay sorted alphabetically.
But there are some things that one could do in the background, store last settings or letting you bookmark translation settings.
Don't forget that some browser will let you do multiple letters for shortcuts, e.g. in Firefox you can type 'SP' to get to spanish.