I would like to get the meaning of a selected word using the Wiktionary API.
The retrieved content should be the same as what is presented in "Word of the day": only the basic meaning, without etymology, synonyms, etc.
For example:
"postiche n
Any item of false hair worn on the head or face, such as a false beard or wig."
I tried using the documentation, but I can't find a similar example. Can anybody help with this problem?
Although MediaWiki has an API (api.php), it might be easiest for your purposes to use the action=raw parameter of index.php if you just want to retrieve the source code of one revision (not wrapped in XML, JSON, etc., as it would be with the API).
For example, this is the raw word of the day page for November 14:
http://en.wiktionary.org/w/index.php?title=Wiktionary:Word_of_the_day/November_14&action=raw
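A minimal sketch of retrieving that page programmatically (using the standard fetch API; note that from browser code you may run into cross-origin restrictions, so you might need to run this server-side, e.g. in Node.js):
// Fetch the raw wikitext of one Word of the day page (sketch).
fetch('https://en.wiktionary.org/w/index.php?title=Wiktionary:Word_of_the_day/November_14&action=raw')
  .then(function (response) { return response.text(); })
  .then(function (wikitext) { console.log(wikitext); })
  .catch(function (err) { console.error('Request failed:', err); });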
What's unfortunate is that the format of wiki pages focuses on presentation (for the human reader) rather than on semantics (for the machine), so you should not be surprised that there is no "get word definition" API command. Instead, your script will have to make sense of the numerous text formatting templates that Wiktionary editors have created and used, as well as complex presentational formatting syntax, including headings, unordered lists, and others. For example, here is the source code for the page "overflow":
http://en.wiktionary.org/w/index.php?title=overflow&action=raw
There is a "generate XML parse tree" option in the API, but it doesn't break much of the presentational formatting into XML. Just see for yourself:
http://en.wiktionary.org/w/api.php?action=query&titles=overflow&prop=revisions&rvprop=content&rvgeneratexml=&format=jsonfm
In case you are wondering whether there exists a parser for MediaWiki-format pages other than MediaWiki, no, there isn't. At least not anything written in JavaScript that's currently maintained (see list of alternative parsers, and check the web sites of the two listed ones). And even then, supporting most/all of the common templates will be a big challenge. Good luck.
OK, I admit defeat.
There are some files relating to Wiktionary in Pywikipediabot, and looking at the code, it does look like you should be able to get it to parse the meaning/definition fields for you.
However, the last half hour has convinced me otherwise. The code is not well written, and I wonder if it has ever worked.
So I defer to idealmachine's answer, but I thought I would post this to save anyone else from making the same mistakes. :)
As mentioned earlier, the content of Wiktionary pages is in a human-readable format, wikitext, so the MediaWiki API can't give you a word's meaning directly, because the data is not structured.
However, each page follows a specific convention, so it's not that hard to extract the meanings from the wikitext. Also, there are some APIs, like Wordnik or Lingua Robot, that parse Wiktionary content and provide it in JSON format.
MediaWiki does have an API but it's low-level and has no support for anything specific to each wiki. For instance it has no encyclopedia support for Wikipedia and no dictionary support for Wiktionary. You can retrieve the raw wikitext markup of a page or a section using the API but you will have to parse it yourself.
The first caveat is that each Wiktionary has evolved its own format but I assume you are only interested in the English Wiktionary. One cheap trick many tools use is to get the first line which begins with the '#' character. This will usually be the text of the definition of the first sense of the first homonym.
Another caveat is that every Wiktionary uses many wiki templates so if you are looking at the raw text you will see plenty of these. The only way to reliably expand these templates is by calling the API with action=parse.
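For illustration, a rough sketch of that cheap trick against the API (it assumes the English Wiktionary layout, requests CORS with origin=*, and skips '#:' example lines and '#*' quotation lines; any templates in the definition are left unexpanded):
// Sketch: fetch a page's wikitext and grab the first definition line.
var word = 'overflow';
var url = 'https://en.wiktionary.org/w/api.php?action=query&titles=' +
  encodeURIComponent(word) +
  '&prop=revisions&rvprop=content&rvslots=main&format=json&origin=*';
fetch(url)
  .then(function (r) { return r.json(); })
  .then(function (data) {
    var page = Object.values(data.query.pages)[0];
    var wikitext = page.revisions[0].slots.main['*'];
    // First line starting with '#' but not '#:' (examples) or '#*' (quotations).
    var firstDef = wikitext.split('\n').find(function (line) {
      return /^#[^:*]/.test(line);
    });
    console.log(firstDef);
  });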
I am a newcomer developing an app with HTML/CSS/JS via PhoneGap. I've been searching for information on how to make my app display in different languages, and Google doesn't understand me.
So the idea is to have a button on index.html that lets the user choose the language in which the app will be displayed, in this case Spanish/English, nothing exotic like Arabic.
So I guess the solution must involve turning all the text that I load in the HTML into variables and then, depending on the selected language, displaying the correct one. I have no idea how to do this, and I'm not able to find examples. So that's what I'm asking for: if someone could give me a code snippet showing how such variables work, and how I should save the user's language selection.
Much appreciated, guys!
This can be done with internationalization (i18n). To do this you need a separate file for each language, with all your text in it. Search Google for internationalization.
Otherwise you can look into embedding Google Translate.
This depends on the complexity of the language dependencies in the application. If you have just a handful of short texts in a strongly graphical application, you can simply store the texts in JavaScript variables or, better, in properties of an object, with one object per language.
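For example, a minimal sketch of that approach (the message keys, the data-msg attribute, and the use of localStorage to remember the choice are all illustrative):
// One object per language; keys are message ids.
var messages = {
  en: { greeting: 'Welcome!', chooseLang: 'Choose your language' },
  es: { greeting: '¡Bienvenido!', chooseLang: 'Elige tu idioma' }
};
var lang = localStorage.getItem('lang') || 'en';
function setLanguage(newLang) {
  lang = newLang;
  localStorage.setItem('lang', newLang); // persists the user's selection
  render();
}
function render() {
  // Every element with a data-msg attribute gets its localized text.
  document.querySelectorAll('[data-msg]').forEach(function (el) {
    el.textContent = messages[lang][el.getAttribute('data-msg')];
  });
}
A language button would then just call setLanguage('es'), which also covers saving the user's selection.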
But if you expect to encounter deeper language dependencies as well (e.g., displaying dynamically computed decimal numbers, which should be 1.5 in English but 1,5 in Spanish), then it's probably better to use a library like Globalize.js (described in some detail in my book Going Global with JavaScript and Globalize.js). That way you could use a unified approach, writing e.g. a string using Globalize.localize('greeting'), a number using Globalize.format(x, 'n1'), and a date using Globalize.format(date, 'MMM d').
Is it possible to use the GData JavaScript API, or any other JavaScript API, to retrieve a list of blog posts based on labels?
My usage case:
Each blog post has a label that denotes its category. Some posts are labelled with 'Summary' plus the category they belong to.
I want to be able to display the summary of MyCategory (label) on the label's page, e.g. http://myblog.blogspot.com/search/label/MyCategory
Is it possible to retrieve the list of blog posts matching 'Summary' and 'MyCategory'?
UPDATE:
more details:
it is a blog I have edit access to
the JS can be placed on Google Sites or inside the blog's HTML
the blog has 18k+ posts, so listing all posts and filtering is not an option
myblog.blogspot was referring to any Blogger blog, not an actual one; I was just asking about label-based Blogger filtering
I've read and re-read this question and the blogspot link a couple of times. It's difficult to understand.
I think it would help if you gave some more information:
where do you want to place this JavaScript? I mean: is it going to be placed on the same blog? I'm asking because this determines cross-site security requirements.
I have a strong feeling this is actually a question where you want to make a cross-domain request (load data from a different domain/server (blogspot.com) that you do not control); otherwise you'd be playing with 'Access-Control-Allow-Origin' on the server side.
Will this script be located in an online or a local (X)HTML source?
Could you please provide a more elaborate example (or sample) of an existing list that contains these labels? Or do you want to crawl a blog like a spider/index robot?
If the above assumptions are correct, the first part of your problem is retrieving cross-domain data (which is hard nowadays using simple solutions like XMLHttpRequest aka AJAX).
You could then start looking at your own server-side scripts (PHP) to get this data and send it (pre-parsed) to your browser application (effectively this is simply a proxy located on your own domain).
I have also heard of using a Java object (or Silverlight, or Flash, which nowadays also suffers from cross-domain security restrictions) to get around modern cross-domain security.
You could then embed one or more of these objects (which retrieve the source) and communicate with them through JavaScript. A variation of this technique is also often used for cross-browser multiple file uploads.
There is a big chance there is already a solution (object) to this part of your problem here on StackOverflow.
If you fix this first part of the problem, the second part simply comes down to parsing (with a regex, for example) the retrieved 'label' data, building new links from it to retrieve the 'summary' content you were after, using the same data-retrieval technique that was used to get the labels list in the first place.
Is this what you are after?
UPDATE:
In pure JavaScript/JSON there is an excellent topic here on SO.
Should you go with Java, you could look at this.
In PHP you can use file_get_contents() or file_get_html(). See also this topic on SO.
UPDATE 2: The accepted answer (from the comments below):
On Google's Blogger developer docs 2.0 you can find: RetrievingWithQuery.
Quote:
/category
Specifies categories (also known as labels) to filter the feed results. For example,
blogger.com/feeds/blogID/posts/default/-/Fritz/Laurie returns entries
with both the labels Fritz and Laurie.
You can also find a working piece of JavaScript that uses this technique over here: list-recent-posts-by-label
Now you can simply continue 'AJAX'ing your summaries out of this filtered list.
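As a rough sketch (the blog URL and the labels are placeholders), loading the label-filtered feed with Blogger's classic JSON-in-script (JSONP) mechanism might look like:
// Load posts carrying BOTH labels via the label-filtered Blogger feed.
function handleFeed(data) {
  (data.feed.entry || []).forEach(function (entry) {
    var alt = entry.link.filter(function (l) { return l.rel === 'alternate'; })[0];
    console.log(entry.title.$t, alt && alt.href);
  });
}
var s = document.createElement('script');
s.src = 'https://myblog.blogspot.com/feeds/posts/default/-/Summary/MyCategory' +
        '?alt=json-in-script&callback=handleFeed&max-results=25';
document.body.appendChild(s);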
Good luck!
For my web front end I have to implement subsets of wiki syntax in my system. Do I need to manually specify rules and reinvent the wheel, or is there an existing JavaScript library or jQuery plugin that could help with this?
For example, a user enters == Header ==. This needs to get converted to, say, a medium header (assuming 'medium' is defined in this context as a span, as below):
<span class="mediumHeader" id="Header">Header</span>
Now, when the user edits the above text, I'm guessing it'll involve replacing the <span...>...</span> with ==...== again.
Now, for every system I design this will be as per 'my rules', and I will almost always have to reinvent the wheel. Is there something I could use to ease this wiki-to/from-HTML transformation using jQuery/JavaScript? I'm sure it's a problem with a known solution.
I would prefer to customize what's acceptable and what isn't, i.e. I don't want everything to be translated into wiki syntax (or HTML), only subsets of it. Should I just roll my own for my application?
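To illustrate, rolling my own for just the heading rule would presumably look something like this sketch (one rule in each direction):
// Sketch: a single hand-rolled rule each way, headings only.
function wikiToHtml(text) {
  return text.replace(/^==\s*(.+?)\s*==$/gm,
    '<span class="mediumHeader" id="$1">$1</span>');
}
function htmlToWiki(html) {
  return html.replace(/<span class="mediumHeader" id="[^"]*">([\s\S]*?)<\/span>/g,
    '== $1 ==');
}
Multiplied across every rule I'd need, though, this is exactly the wheel-reinvention I'd like to avoid.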
It's been long enough that you may not need this, but yours was the top SO hit when I started looking into it.
There are a couple of JavaScript options: you're probably looking at instaview (check out test/test.js), or maybe Wiky.js (the less fully documented of the two).
If you aren't limited to JavaScript, check out the exhaustive list of MediaWiki parsers at http://www.mediawiki.org/wiki/Alternative_parsers - lots of tools for C++, Java, Perl, Ruby, and more. That's the link to watch for new developments.
At the time of writing, Parsoid seems to be the only one that translates in both directions. It also powers the visual editor on Wikipedia. It is not a handy client-side lib to include in your app, though, but a full-blown parsing and transformation server suite. A production version of Parsoid on the Wikimedia cluster can be accessed at http://parsoid-lb.eqiad.wikimedia.org/.
Other JavaScript Libraries, which are translating from WikiText to HTML only (ordered by popularity), are:
Wiky.js - doesn't support full WikiText syntax. (By tanin47; not to be confused with Wiki.js from Requarks, a completely different project.)
wtf_wikipedia - doesn't translate directly to HTML but to JSON, which allows much more powerful possibilities (e.g., info-boxes as key-value pairs). This is the most up-to-date library and "its a combination of instaview, txtwiki, and uses the inter-language data from Parsoid javascript parser."
instaview - no updates in the last 2 years.
Also check out the current and full list of alternative MediaWiki parsers.
I want fulltext search for my JavaScript code, but I'm usually not interested in matches from the comments.
How can I have fulltext search ignoring any commented match? Such a feature would increase my productivity as a programmer.
Also, how can I do the opposite: search within the comments only?
(I'm currently using TextMate, but happy to change.)
See our Source Code Search Engine (SCSE). This tool indexes your code base using the language structure to guide the indexing; it can do so for many languages, including JavaScript. Search queries are then stated in terms of abstract language tokens, e.g., to find identifiers involving the string "tax" multiplied by some constant, you'd write:
I=*tax* '*' N
This will search all indexed languages only for identifiers (in each language) followed by a '*' token, followed by some kind of number. Because the tool understands language structure, it isn't confused by whitespace, formatting or intervening comments. Because it understands comments, you can search inside just comments (say, for authors):
C=*Author*
Given a query, the SCSE finds all the hits across the code base (possibly millions of lines) and offers these as a set of choices; clicking on a choice pulls up the file with the hit in the middle of the view, outlined where the match occurs.
If you insist on searching just raw text, the SCSE provides grep-style searches. If you have only a small set of files, this is still pretty fast. If you have a big set of files, it is a lot slower than language-structure-based searches. In both cases, grep-like searches get you more hits, usually at the cost of false positives (e.g., finding "tax" in a comment, or finding a variable named "Authorization_code"). But at least you have the choice.
While this doesn't operate from inside an editor, you can launch your editor (for most editors) on a file once you've found the hit you want.
Use UltraEdit. It fully supports full-text search ignoring comments, and also searching within comments only.
How about the NetBeans way (Find Symbol in the Navigate menu)?
It searches all variables, functions, objects, etc.
Or you could customize JSLint if you want to integrate it into a web application or something like that.
I personally use Notepad++, which is a great free code editor. It seems you need an editor supporting regular-expression search (in one or many files). If you know regular expressions, you can use powerful searches, such as matching inside or outside JavaScript comments. The work will be to build the right expression and test it on one file covering all the different cases, to be sure it will not miss things during the real search; or maybe you can google for 'javascript comments regular expression' or something like that.
Then have a look at the Notepad++ plugins; one is 'RegEx Helper', which helps with building regular expressions.
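As a rough illustration of the regex approach (a naive sketch: it will misfire on comment markers that appear inside string or regex literals):
// Blank out comments but keep newlines so line numbers stay valid,
// then search only the remaining (non-comment) text.
function stripComments(source) {
  return source
    .replace(/\/\*[\s\S]*?\*\//g, function (m) { return m.replace(/[^\n]/g, ' '); })
    .replace(/\/\/.*$/gm, '');
}
function searchCodeOnly(source, term) {
  return stripComments(source).split('\n')
    .map(function (text, i) { return { line: i + 1, text: text }; })
    .filter(function (l) { return l.text.indexOf(term) !== -1; });
}
Searching only within comments would be the inverse: keep what stripComments removes and search that instead.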
For years I've been reading about XML and I have just not quite grokked it. Most documents I see about it simply explain the syntax (extraordinarily easy to understand) and say that it's portable: I've worked with Unix my whole life so the idea of putting things in plain text to be portable is hardly revolutionary. My specific question is that I have a document (my CV) that I would like to present to web visitors in several formats: as a webpage, as a pdf, or even as plain text. Is XML and Javascript the right approach to take?
What I need is for the document to be easily editable, conversion easy and just easy general upkeep. For example, when I publish a paper, I'd like to take less than five minutes to add the info and then have everything go automatically from there.
Give me your opinions: I also use LaTeX compulsively, so my current approach has been just to keep my CV in LaTeX and convert it to a web page using LaTeXML. However, I sort of have the feeling that with everybody jumping up and down about XML and JavaScript, there might be something good to learn about it.
I would also like to simplify maintaining my homepage by not duplicating the same footer for every single page that I set up.
Thanks,
Joel
Edit: I'll also take any book recommendations!
I think this is a slight misunderstanding of the combination of JavaScript and XML.
XML, in and of itself, is an excellent means of representing data. It's largely human-readable, and easily parsed with libraries in nearly every programming language. That is the main benefit of XML.
Using XML with JavaScript is certainly a solution, but I think it's a matter of the question you're asking. JavaScript can parse XML, and allow you to obtain and manipulate data from your XML document. If you want to grab data from a server without reloading your HTML page (synchronously or asynchronously), then using JavaScript and XML is a valid way to do that.
If you want to, however, display your XML as a webpage, you would likely be better off using XML and XSLT, or perhaps PHP and XPath, to transform the document into browser-readable HTML. On the other hand, you could use nearly any language to convert the XML to a plain-text file, rich text file, or store it in a normalized database.
To sum up, XML is a great way to store data, because it can be used in so many different ways, and by so many different languages. It's an answer to many different questions; you just have to figure out which questions you're asking.
To elaborate on my comment
The transformation to whatever output you desire depends on how you store your CV on your server and whether you have the possibility to process it on the server. If you store it in XML, you can transform it to the desired (binary) output using server-based tools; for PHP that would be PDF and Word (on a Windows server platform), for example. XML would be interesting from a markup point of view, since it would make clear where the table of contents, headers, lists of experience and so on are found.
JavaScript cannot transform something into PDF or Word; that has to be done on the server. What JavaScript can do is fetch a text from the server in XML or JSON using AJAX and turn it into what the user sees on the screen. For XML, that can be done with XSL(T) too. If you want to use JavaScript for self-education purposes, JSON is very nice, since it is in my opinion more readable than XML, and it creates a populated JavaScript object with the least work.
Footer in JavaScript: in the page, have
<script type="text/javascript" src="footer.js"></script> and in footer.js you can, for example, do
var footerText = 'Here goes whatever you want';
document.write(footerText);
Comparison between XML and JSON
I've had a webpage with browser-side XSLT transformation up and running for years. It's a playground, with only some words in German. See how easy it is to build this at heese.net/test. You can switch between "Beispiel" (= demo) and XSL. The source code of the page in the iframe is the XML. You can do this server-side with 3 lines of PHP code.
On JavaScript: you can use it with XSLT, as I show on my site, but it can't interact with the transformation. First the XSLT builds an HTML page out of your XML data, and only after this job is completely done does the JavaScript in the resulting HTML document begin to work.
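For what it's worth, a minimal sketch of that browser-side setup (the file names cv.xml and cv.xsl are placeholders; DOMParser and XSLTProcessor are standard browser APIs):
// Load the XML data and the XSL stylesheet, transform in the browser,
// and append the resulting HTML to the page.
Promise.all(['cv.xml', 'cv.xsl'].map(function (url) {
  return fetch(url)
    .then(function (r) { return r.text(); })
    .then(function (t) { return new DOMParser().parseFromString(t, 'application/xml'); });
})).then(function (docs) {
  var processor = new XSLTProcessor();
  processor.importStylesheet(docs[1]); // the stylesheet
  document.body.appendChild(processor.transformToFragment(docs[0], document));
});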
Parsing XML with JavaScript is a different task.