Parse different sources differently

Let's say I wanted to parse information from different radio stations' websites (the songs that were just played) and store it in a database. The websites differ (obviously), so I need to parse them differently. My approach would be to create a superclass "RadioStation" with the common functions and derive a subclass for each website in which I define the special parse function. However, I don't think that's the right way to go, because I would have to write 100+ subclasses. What is the correct solution here?
Thank you!

You could try to write an intelligent parser, or you could write the 100+ subclasses. There is no simple solution to parsing data from different sources in different formats.
That said, I would not be surprised if web radios provided their data in some kind of standard format (SOAP, XML, something...), as I suppose there are already quite a few applications that use it.
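
One way to keep the number of subclasses down is to describe each site as data and share a single parse routine. This is only a minimal sketch, assuming each site's "just played" entries can be captured with one regular expression; the station ids, markup and patterns below are made up for illustration:

```javascript
// Hypothetical per-station configuration: one entry per site instead of one subclass.
// Each pattern is assumed to capture the artist in group 1 and the title in group 2.
const stations = {
  "station-a": { pattern: /<li class="song">(.+?) - (.+?)<\/li>/g },
  "station-b": { pattern: /<td>(.+?)<\/td>\s*<td>(.+?)<\/td>/g },
};

// Shared parse routine: looks up the station's pattern and applies it to the page HTML.
function parseSongs(stationId, html) {
  const config = stations[stationId];
  if (!config) {
    throw new Error("No parser configured for " + stationId);
  }
  const songs = [];
  for (const match of html.matchAll(config.pattern)) {
    songs.push({ artist: match[1].trim(), title: match[2].trim() });
  }
  return songs;
}

const sample =
  '<ul><li class="song">Queen - Under Pressure</li><li class="song">Blur - Song 2</li></ul>';
console.log(parseSongs("station-a", sample));
```

Sites that need more than a pattern could carry a custom function in their entry instead, so only the odd ones require real code.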

Related

Run Database Stored RegEx against DOM

I have a question about how to approach a certain scenario before I get halfway through it and figure out it was not the best option.
I work for a large company that has a team that creates tools for the team mates to use that aren’t official enterprise tools. We have no access to the database directly, just access to an internal server to store our files to run and be able to access the main site with javascript etc (same domain).
What I am working on is a tool with a ton of options that let you select what I will call “data points” on a page.
These are things like “Account status, Balance, Name, Phone number, email etc.”, and the tool saves them to an Excel sheet.
So you input account numbers, choose what you need and then using IE Objects it navigates to the page and scrapes data you request.
My question is as follows.
I want to make the scraping part pretty dynamic in the way it works. I want to be able to add new data points on the fly.
My idea is to store the regular expression needed to get a specific piece of data in the table alongside the “data point option”.
If I choose “Name”, it looks up the expression for name in the database and runs it against the DOM.
What would be the best way to go about creating that type of function in JavaScript / jQuery?
I need to pass a Regex to a function, have it run against the DOM and then return the result.
I have a feeling that there will be things that require more than 1 step to get the information etc.
I am just trying to think of the best way to approach it without having to hardcode 200+ expressions into the file as the page may get updated and need to be changed.
Any ideas?

IRobotSoft scraper may be the tool you are looking for. Check this forum and see if questions are similar to what you are doing: http://irobotsoft.org/bb/YaBB.pl?board=newcomer. It is free.
What it uses is not regular expressions but a language called HTQL, which may be more suitable for extracting data from web pages. It also supports regular expressions, but not as the main language.
It organizes all your actions well with a visual interface, so you can dynamically compose actions or tasks for changing needs.
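
If you stay with the stored-regex idea from the question, the core function can be quite small. A rough sketch, assuming the patterns come back from the table as strings and that the first capture group holds the value; the data point rows and the sample markup are invented:

```javascript
// Hypothetical rows as they might come back from the data-point table.
// Patterns are stored as strings, so they can be changed without touching the script.
const dataPoints = [
  { name: "Name", pattern: '<span id="custName">([^<]*)</span>', flags: "i" },
  { name: "Balance", pattern: "Balance:\\s*\\$([\\d,.]+)", flags: "i" },
];

// Builds a RegExp from the stored string and returns the first capture group, or null.
function extract(dataPoint, html) {
  const regex = new RegExp(dataPoint.pattern, dataPoint.flags);
  const match = regex.exec(html);
  return match ? match[1] : null;
}

// Runs every requested data point against the page markup.
// In a browser the html argument could be document.documentElement.outerHTML.
function scrape(html, points) {
  const result = {};
  for (const point of points) {
    result[point.name] = extract(point, html);
  }
  return result;
}

const page = '<span id="custName">Jane Roe</span><p>Balance: $1,204.50</p>';
console.log(scrape(page, dataPoints));
```

Data points that need more than one step would not fit this shape and would still need their own code, or a list of patterns applied in sequence.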

How to implement different languages on html page

I am just a newcomer developing an app with html/css/js via PhoneGap. I've been searching for info on how to make my app display in different languages, and Google doesn't understand me.
So the idea is to have a button on index.html that lets the user choose the language in which the app will be displayed, in this case Spanish/English, nothing strange like Arabic blablabla....
So I guess the solution must involve turning all the text that I load in html into variables and then, depending on the language selected, displaying the correct one. I have no idea how to do this, and I'm not able to find examples. So that's what I'm asking for... if someone could give a code snippet to show how html variables work and how I should save the user's language selection...
Appreciated guys!

This can be done with internationalization (i18n). To do this you need a separate file for each language and put all your text in it. Search Google for internationalization.
Otherwise you can look into embedding Google Translate.

This depends on the complexity of language-dependencies in the application. If you have just a handful of short texts in a strongly graphic application, you can just store the texts in JavaScript variables or, better, in properties of an object, with one object per language.
But if you expect to encounter deeper language-dependencies as well (e.g., displaying dynamically computed decimal numbers, which should be e.g. 1.5 in English and 1,5 in Spanish), then it’s probably better to use a library like Globalize.js (described in some detail in my book Going Global with JavaScript and Globalize.js). That way you could use a unified approach, writing e.g. a string using Globalize.localize('greeting') and a number using Globalize.format(x, 'n1') and a date using Globalize.format(date, 'MMM d').
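
For the simple case (a handful of short texts), the "one object per language" idea could look roughly like this. It is just a sketch: the keys, the texts and the function names setLanguage and t are made up, and it does not use Globalize.js:

```javascript
// One object per language; the keys and texts are made-up examples.
const texts = {
  en: { greeting: "Hello", settings: "Settings" },
  es: { greeting: "Hola", settings: "Ajustes" },
};

let currentLanguage = "en";

// Stores the user's choice; in a real app this could also be written to localStorage
// so that the selection survives a restart.
function setLanguage(lang) {
  if (!texts[lang]) {
    throw new Error("Unsupported language: " + lang);
  }
  currentLanguage = lang;
}

// Looks up a text in the current language, falling back to English, then to the key itself.
function t(key) {
  return texts[currentLanguage][key] || texts.en[key] || key;
}

setLanguage("es");
console.log(t("greeting")); // prints "Hola"
```

The button on index.html would then call setLanguage and re-render the visible texts through t.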

Generating a form dynamically in JQuery

I have my form information in a JSON object (custom format). Now I have to populate the form and display it in a predefined placeholder. I came across this post http://neyeon.com/2011/01/creating-forms-with-json-and-jquery/, which is very helpful.
But the problem is, I have to parse my JSON and then create new JSON in the required format so that the form will be created. Is this the right way to do it, or are there any other options available to me?

If the author's code expects a certain structure for its API and it differs from what you're creating, then yes, you'll have to translate. There's no "JSON Form" standard here to really dictate whether you should format your data a certain way.
If there's a well known, popular, jQuery form plugin (I've never heard of this one), it might make sense for you to simply format your data accordingly from the get go. OTOH, you might have better ideas and specialized needs anyway so that might not make sense either.
Either way, it shouldn't be too much work. Just write up a neat conversion function so that you can do your translation consistently.
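
A conversion function of that kind might look like the sketch below. Both shapes are invented for the example: the "custom" format stands in for whatever you already have, and the target format is only an assumption, not the structure the linked post actually expects:

```javascript
// Hypothetical custom format (what the application already has).
const customForm = {
  title: "Contact",
  fields: [
    { key: "email", label: "E-mail", kind: "text", mandatory: true },
    { key: "news", label: "Newsletter", kind: "checkbox", mandatory: false },
  ],
};

// Converts the custom format into the (assumed) shape a form builder expects.
function toPluginFormat(form) {
  return {
    legend: form.title,
    elements: form.fields.map(function (field) {
      return {
        type: field.kind,
        name: field.key,
        caption: field.label,
        required: Boolean(field.mandatory),
      };
    }),
  };
}

console.log(JSON.stringify(toPluginFormat(customForm), null, 2));
```

Keeping the mapping in one place means a change on either side only touches this function.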

Namespaces in JSON

Is there such a thing as JSON namespaces, just like XML namespaces? Has anyone created a spec or libraries for this? Is this a good or a terrible idea?
I want to make a data spec that can be represented in XML as well as JSON. However I also need the namespace concept, that the data can be extended by annotations in different vocabularies.
To be more specific, this is about representing events. My schema will describe the event in basic terms (time and location), though if you think about it, events can be annotated with different information e.g. attendees or image URLs which I don't want to specify in my schema.

JSON-LD might help :
"JSON-LD (JavaScript Object Notation for Linking Data) is a lightweight Linked Data format that gives your data context."

JSON Schema might be the right thing for this:
http://json-schema.org/
Although I don't know how well it's implemented.

This is quite an old thread, but there are JSON prefixes, which are almost like namespaces. If you are using Java server-side with Jettison, you can easily meet them.
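
To make the prefix idea concrete, here is a small sketch in the spirit of a JSON-LD "@context": prefixes in the keys are mapped to vocabulary URIs. The example.org vocabularies are placeholders, and expandKeys is a simplified illustration, not the real JSON-LD expansion algorithm:

```javascript
// An event whose "@context" maps prefixes to vocabulary URIs.
// The example.org vocabularies are placeholders, not real schemas.
const event = {
  "@context": {
    ev: "http://example.org/event#",
    ppl: "http://example.org/people#",
  },
  "ev:time": "2024-05-01T19:00:00Z",
  "ev:location": "Town Hall",
  "ppl:attendees": ["Ann", "Bob"],
};

// Simplified expansion of "prefix:name" keys into full URIs.
// Keys without a known prefix are copied unchanged.
function expandKeys(doc) {
  const context = doc["@context"] || {};
  const expanded = {};
  for (const key of Object.keys(doc)) {
    if (key === "@context") continue;
    const colon = key.indexOf(":");
    const prefix = colon === -1 ? null : key.slice(0, colon);
    const known = prefix !== null && Object.prototype.hasOwnProperty.call(context, prefix);
    const fullKey = known ? context[prefix] + key.slice(colon + 1) : key;
    expanded[fullKey] = doc[key];
  }
  return expanded;
}

console.log(expandKeys(event));
```

The basic schema (time, location) and the annotations (attendees) live in different vocabularies here, which is the role namespaces play in XML.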

Why is JSON important?

I've only recently heard about JSON (Javascript Object Notation).
Can anybody explain why it is considered (by some websites/blogs/etc) to be important?
We already have XML, why is JSON better (apart from being 'native to Javascript')?
Edit: Hmm, the main answer theme seems to be 'it is smaller'. However, the fact that it allows data fetching across domains, seems important to me. Or is this in practice not (yet) much used?

XML has several drawbacks:
It's heavy!
It provides a hierarchical representation of content which is not exactly the same as (but pretty similar to) the JavaScript object model.
JavaScript is available everywhere. Without any external parsers, you can process JSON directly with the JS interpreter.
Clearly it's not meant to replace XML completely. For JS based Web apps, its advantages can be useful.

JSON is generally much smaller than its XML equivalent. Smaller transfer means faster transfer, which results in a better user experience.

JSON is much more concise. XML:
<person>
<name>John Doe</name>
<tags>
<tag>friend</tag>
<tag>male</tag>
</tags>
</person>
JSON:
{"name": "John Doe", "tags": ["friend", "male"]}
There are fewer overlapping features, too. For example, in XML there's tension between choosing to use elements (as above), versus attributes (<person name="John Doe">).

JSON came into popular use primarily because it offers a way to circumvent the same-origin policy used in web browsers and thereby allow mashups.
Let's say you're writing a web service on domain A. You can't load XML data from domain B and parse it because the only way to do that would be XMLHttpRequest, and XMLHttpRequest was originally limited by the same-origin policy to talking to only URLs at the same domain as the containing page.
It turns out that for a variety of reasons, you are allowed to request <script> tags across origins. Clever people realized this was a good way to work around the limitation with XMLHttpRequest. Instead of the server returning XML, it can return a series of JavaScript object and array literals.
(bonus question left as an exercise to the reader: why is <script src="..."> allowed across domains without server opt-in but XHR isn't?)
Of course, returning a <script> which consists of nothing more than object literals is not useful because without assigning the values to some variable, you can't do anything with it. Thus, most services use a variant of JSON, called JSONP (http://bob.pythonmac.org/archives/2005/12/05/remote-json-jsonp/).
With the rise in popularity of mashups, people realized that JSON was a convenient data interchange format in general, especially when JavaScript is one end of the channel. For example, JSON is used extensively in Chromium, even in cases where C++ is on both sides. It's just a nice lightweight way to represent simple data, that good parsers exist for in many languages.
Amusingly, using <script> tags to do mashups is incredibly insecure because it is essentially XSS'ing yourself on purpose. So native JSON (http://ejohn.org/blog/native-json-support-is-required/) had to be introduced, which obviates the original benefits of the format. But by that time, it was already super popular :)
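
The JSONP mechanism described above can be sketched in a few lines. This is a simulation under stated assumptions rather than real cross-domain code: toJsonp stands for what a server might send, and runJsonpResponse mimics the browser executing the returned script; both names, the callback name and the URL in the comment are made up:

```javascript
// Server side (assumed): wrap the JSON payload in a call to the callback named by the client.
function toJsonp(callbackName, data) {
  return callbackName + "(" + JSON.stringify(data) + ");";
}

// Client side: in a browser, the response would be loaded via something like
// <script src="https://other.example/person?callback=handlePerson"> and executed globally.
// Here that execution is simulated by running the response text with the callback in scope.
function runJsonpResponse(responseText, callbackName, callback) {
  const run = new Function(callbackName, responseText);
  run(callback);
}

let received = null;
const response = toJsonp("handlePerson", { name: "John Doe", tags: ["friend", "male"] });
runJsonpResponse(response, "handlePerson", function (data) {
  received = data;
});

console.log(response);
console.log(received.tags[1]);
```

Note that the response is executed as code, not parsed as data, which is exactly why the approach amounts to trusting the other domain completely.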

If you are working in JavaScript, it is much easier to use JSON. This is because JSON can be directly evaluated into a JavaScript object, which is much easier to work with than the DOM.
Borrowing and slightly altering the XML and JSON from above
XML:
<person>
<name>John Doe</name>
<tag>friend</tag>
<tag>male</tag>
</person>
JSON:
{"person": {"name": "John Doe", "tag": ["friend", "male"]}}
If you wanted to get the second tag object with XML, you'd need to use the powerful but verbose DOM apis:
var tag2=xmlObj.getElementsByTagName("person")[0].getElementsByTagName("tag")[1];
Whereas with a Javascript object that came in via JSON, you could simply use:
var tag2=jsonObj.person.tag[1];
Of course, Jquery makes the DOM example much simpler:
var tag2=$("person tag",xmlObj).get(1);
However, JSON just "fits" in a Javascript world. If you work with it for a while, you will find that you have much less mental overhead than involving XML based data.
All the above examples ignore the possibility that one or more nodes are unavailable or duplicated, or that the node has just one or no children. However, to illustrate the native-ness of JSON, to handle this with the jsonObj, you'd just have to:
var tag2=(jsonObj.person && jsonObj.person.tag && jsonObj.person.tag.sort && jsonObj.person.tag.length==2 ? jsonObj.person.tag[1] : null);
(some people might not like that long of ternary, but it works). But XML would be (in my opinion) nastier (I don't think you'd want to go the ternary approach because you'd keep calling the dom methods which may have to do the work over again depending on implementation):
var tag2=null;
var persons=xmlObj.getElementsByTagName("person");
if(persons.length==1) {
  var tags=persons[0].getElementsByTagName("tag");
  if(tags.length==2) { tag2=tags[1]; }
}
jQuery (untested):
var tags=$("person",xmlObj).find("tag");
var tag2=($("person",xmlObj).length==1 && tags.length==2 ? tags.get(1) : null);

These web pages may help:
JSON - The Fat Free alternative to xml
Why JSON is Important to You!

It depends on what you are going to do. There are a lot of answers here that prefer JSON over XML. If you take a deeper look there isn't a big difference.
If you have a tree of objects, you only get a tree of plain JavaScript objects back. If you intend to use OOP-style access, that comes back to bite you. Assume you have objects of types A, B, C that are arranged in a tree. You can easily enable them to be serialized to JSON. If you read them back in, you only get a tree of plain JavaScript objects. To reconstruct your A, B, C you have to stuff the values manually into manually created objects, or resort to some hacks. Sounds like parsing XML and creating objects? Well, yes :)
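
To illustrate the point, here is a small sketch of what that manual reconstruction can look like where a native JSON.parse is available, using its reviver argument. TreeItem is a made-up class standing in for the A, B, C types, and the "has label and children" check is an assumption of this example:

```javascript
// Hypothetical class standing in for the A, B, C types mentioned above.
class TreeItem {
  constructor(label, children) {
    this.label = label;
    this.children = children || [];
  }
  // A method, to show whether the parsed objects are real instances again.
  count() {
    return 1 + this.children.reduce((sum, child) => sum + child.count(), 0);
  }
}

const json = JSON.stringify(new TreeItem("A", [new TreeItem("B"), new TreeItem("C")]));

// Plain JSON.parse yields generic objects without the count() method.
const plain = JSON.parse(json);

// The reviver is called bottom-up for every value; objects that look like a
// TreeItem (assumption: they have label and children) are rebuilt as instances.
const revived = JSON.parse(json, function (key, value) {
  const looksLikeItem =
    value && typeof value === "object" && !Array.isArray(value) &&
    "label" in value && "children" in value;
  return looksLikeItem ? new TreeItem(value.label, value.children) : value;
});

console.log(typeof plain.count); // "undefined"
console.log(revived.count());    // 3
```

So the types do not come back for free; some mapping code like the reviver above is still needed, much as with XML.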
These days only the newest browsers come with native support for JSON. To support more browsers you have two options. The first is to load a JSON parser written in JavaScript that does the parsing for you. So how fat does that sound, regarding fat-freeness? The other option, as I often see, is eval. You can just do eval() on a JSON string to get the objects. But that introduces a whole new set of security problems. JSON is specified so that it can't contain functions. If you are not checking the input for functions, someone can easily send you code that gets executed.
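
The difference between eval and a real JSON parser is easy to see in a short sketch. It assumes an environment with native JSON.parse; the hostile string is a made-up example of input that is valid JavaScript but not valid JSON:

```javascript
// A string that is valid JavaScript but not valid JSON: it contains a function call.
const hostile = '{"name": "John Doe", "tags": (function () { return ["injected"]; })()}';

// eval runs whatever code is in the string, so the embedded function is executed.
const viaEval = eval("(" + hostile + ")");
console.log(viaEval.tags[0]); // prints "injected"

// JSON.parse only accepts data and throws a SyntaxError on anything else.
let rejected = false;
try {
  JSON.parse(hostile);
} catch (err) {
  rejected = err instanceof SyntaxError;
}
console.log(rejected); // prints true

// Well-formed JSON parses as expected.
const safe = JSON.parse('{"name": "John Doe", "tags": ["friend", "male"]}');
console.log(safe.tags[1]); // prints "male"
```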
So it might depend on what you like more: JSON or XML. The biggest difference is probably the way of accessing things, be it script tags, XMLHttpRequest... I would base the decision on that. In my opinion, if there were proper support for XPath in the browsers, I would often decide to use XML. But the fashion is heading towards JSON and loading additional JSON parsers in JavaScript.
If you can't decide and you know you need something really powerful, you might have to take a look at YAML. Reading about YAML is very interesting for getting more insight into the topic. But it really depends on what you are trying to do.

JSON is a way to serialize data in Javascript objects. The syntax is taken from the language, so it should be familiar to the developer dealing with Javascript, and -- being the stringification of an object -- it's a more-natural serialization method for interaction within the browser than a full-fledged XML derivative (with all the arbitrary design decisions that implies).
It's light and intuitive.

JSON's a text-based object serialization format that's more lightweight than XML and that directly integrates with JavaScript's object model. That's most of its advantages right there.
Its disadvantages (compared to XML) are, roughly: fewer available tools (forget about standard validation and/or transformation, to say nothing of syntax highlighting or well-formedness checking in most editors), less likely to be human-readable (there are huge variations in the readability of both JSON and XML, so that's a necessarily fuzzy statement), and tight integration with JavaScript makes for not-so-tight integration with other environments.

It's not that it is better, but that it can tie many things together to allow seamless data transfer without manual parsing!
For example javascript -> C# web service -> javascript
