Yahoo YQL RSS returning no results - javascript

I'm having a bit of an issue with YQL. I'm trying to query an RSS feed from a URL, but I get no results.
I entered this query:
select title from rss where url="http://www.spoilertv.com/feeds/posts/default/-/Aftermath"
...and instead of getting the titles as requested, the console shows:
<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng"
yahoo:count="0" yahoo:created="2017-02-23T17:13:16Z" yahoo:lang="en-US">
<diagnostics>
<publiclyCallable>true</publiclyCallable>
<url execution-start-time="1" execution-stop-time="101"
execution-time="100" id="5080cef9-75a4-4356-9b08-1d68fb3d855d"><![CDATA[http://www.spoilertv.com/feeds/posts/default/-/Aftermath]]></url>
<user-time>118</user-time>
<service-time>100</service-time>
<build-version>2.0.84</build-version>
</diagnostics>
<results/>
</query>
What could be wrong?

I made a few requests to your sample URL using YQL, asking for the results in both JSON and XML, and the results were always null.
According to this link:
YQL produces only xml or json. You can create rss or atom feed but it
always will be inside yql root element. So you need another one tool
to extract a feed from response. I'm using Google Apps Script for
that. It can parse and create xml/rss and uses server-side javascript
e.g:
var url = 'http://query.yahooapis.com/v1/public/yql?q=...'; // rest query to YQL table
var xml = UrlFetchApp.fetch(url).getContentText(); // this is an xml string
var root = XmlService.parse(xml).getRootElement(); // now we can modify this xml or create a new one.
What I suggest is that you run the following test instead.
Check this link and follow step #6:
Click Copy URL. From yql_news_app.html, paste the URL into the src
attribute of the second script tag as seen below which in your case
you can check as follows:
<body>
  <div id='results'></div>
  <script src='https://query.yahooapis.com/v1/public/yql?q=select%20title%20from%20rss%20where%20url%3D%22http%3A%2F%2Fwww.spoilertv.com%2Ffeeds%2Fposts%2Fdefault%2F-%2FAftermath%22&diagnostics=true'>
  </script>
</body>
In your browser, press F12 and check the Console tab to see if there are any errors. If so, please update your question with the details of the results.
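The empty result described above (a null `results` in JSON, `<results/>` with `yahoo:count="0"` in XML) can also be detected in code before trying to read any titles. The following is only a sketch: `extractTitles` is a hypothetical helper, the response shape `{ query: { count, results } }` is an assumption based on the XML shown in the question, and the sample titles are made up.

```javascript
// Hypothetical helper: pulls item titles out of a YQL JSON response.
// Assumed shape: { query: { count, results } }, with results null when nothing matched.
function extractTitles(yqlResponse) {
  const query = yqlResponse && yqlResponse.query;
  if (!query || !query.count || !query.results) {
    return []; // same situation as yahoo:count="0" and <results/> in the XML output
  }
  // Assumption: a single match may arrive as an object instead of an array, so normalize.
  const items = [].concat(query.results.item || []);
  return items.map((item) => item.title);
}

// Made-up sample responses for illustration only.
const emptyResponse = { query: { count: 0, results: null } };
const sampleResponse = {
  query: { count: 2, results: { item: [{ title: "First" }, { title: "Second" }] } },
};

console.log(extractTitles(emptyResponse));
console.log(extractTitles(sampleResponse));
```

Returning an empty array for the no-match case keeps the calling code free of null checks.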


Confluence API - Get child pages of parent also filtered by date range

I am trying to get all the child pages of a parent page, filtered by creation date.
The following query returns all the child pages, but it does not restrict the results to the provided date range.
API endpoint:
https://domain-name.atlassian.net/wiki/rest/api/content/search?cql=parent=2342344&created=2021-01-01%20and%20created=2022-01-01
In the results, I am seeing the child pages, but I am also seeing pages created before 2021.
Your query is almost right. You just have to use < and > to filter your dates, and keep the date conditions inside the cql parameter.
For example, the following (Python) request works for me:
import requests

response = requests.get(
    url="https://jira.<my domain>.com/confu/rest/api/content/search",
    params={
        "cql": "parent=2342344 and created>2021-01-01 and created<2022-12-11"
    },
    headers={"Authorization": "Basic <token>"},
)
Just make sure your URL encoding is correct. Encoded, it should look like:
/confu/rest/api/content/search?cql=parent%3D2342344%20and%20created%3E2021-01-01%20and%20created%3C2022-12-11
You can find documentation here: https://developer.atlassian.com/server/confluence/advanced-searching-using-cql/
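For a JavaScript client, the same encoding can be sketched as below. In the original URL, `&created=...` ends the `cql` parameter and starts a separate query-string parameter, so the date condition never becomes part of the CQL. `buildCqlSearchUrl` is a hypothetical helper, and the base URL is the placeholder from the question.

```javascript
// Hypothetical helper: builds a content search URL for the children of one page
// created inside a date window. The base URL is supplied by the caller.
function buildCqlSearchUrl(baseUrl, parentId, createdAfter, createdBefore) {
  const cql = `parent=${parentId} and created>${createdAfter} and created<${createdBefore}`;
  // encodeURIComponent escapes "=", "<", ">" and spaces, so the whole clause
  // travels as the value of the single cql parameter.
  return `${baseUrl}/rest/api/content/search?cql=${encodeURIComponent(cql)}`;
}

const searchUrl = buildCqlSearchUrl(
  "https://domain-name.atlassian.net/wiki", // placeholder domain from the question
  2342344,
  "2021-01-01",
  "2022-01-01"
);
console.log(searchUrl);
```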

How to filter out non-json documents in MarkLogic?

I have a lot of data loaded in my database where some of the documents loaded are not JSON files & just binary files. Correct data looks like this: "/foo/bar/1.json" but the incorrect data is in the format of "/foo/bar/*". Is there a mechanism in MarkLogic using JavaScript where I can filter out this junk data and delete them?
PS: I'm unable to extract files with mlcp that have a "?" in the URI and maybe when I try to reload this data I get this error. Any way to fix that extract along with this?
If all of the document URIs contain a ? and are in that directory, then you could use cts.uriMatch():
declareUpdate();
for (const uri of cts.uriMatch('/foo/bar/*?*')) {
  xdmp.documentDelete(uri);
}
Alternatively, if you are looking to find the binary() documents, you can apply the format-binary option to a cts.search() with a cts.directoryQuery() and then delete them:
declareUpdate();
// 'format-binary' restricts the search to binary documents, so JSON documents are left alone.
for (const doc of cts.search(cts.directoryQuery("/foo/bar/"), ['format-binary'])) {
  xdmp.documentDelete(fn.baseUri(doc));
}
They are probably being persisted as binary because the URI has no recognizable file extension when it ends with a question mark and some query-string parameter values (i.e. 1.json?foo=bar instead of 1.json).
It is difficult to diagnose and troubleshoot without seeing what your MLCP job configs are and knowing more about what you are doing to load the data.
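Before running either delete, it may help to dry-run the selection on a list of URIs. The sketch below is plain JavaScript with no MarkLogic APIs. `classifyUris` is a hypothetical helper, the sample URIs are made up, and the rule "no query string and ends with .json" is an assumption taken from the description in the question.

```javascript
// Hypothetical dry-run helper (plain JavaScript, no MarkLogic calls):
// splits URIs into ones that look like clean JSON documents and ones that do not.
function classifyUris(uris) {
  const clean = [];
  const suspect = [];
  for (const uri of uris) {
    // Assumed rule from the question: good URIs end in ".json" and carry no "?".
    if (uri.includes("?") || !uri.endsWith(".json")) {
      suspect.push(uri);
    } else {
      clean.push(uri);
    }
  }
  return { clean, suspect };
}

// Made-up sample URIs for illustration only.
const report = classifyUris([
  "/foo/bar/1.json",
  "/foo/bar/1.json?foo=bar",
  "/foo/bar/2",
]);
console.log(report.clean);
console.log(report.suspect);
```

Reviewing the `suspect` list first gives a chance to confirm that only junk documents would be removed.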

Parsing a stringified JSON coming from a Google Sheet Web App

I'm trying to parse a stringified JSON output from a web app created from a google sheets script. I thought it couldn't be that complicated, but I've tried everything I could think of or find out online... so now asking for help if that's OK!
On the web app / Google Sheets side, the code is:
function doGet(e) {
  var spreadsheet = SpreadsheetApp.openById('spreadsheetID');
  var worksheet = spreadsheet.getSheetByName('Rankings C/U');
  var output = JSON.stringify({ data: worksheet.getDataRange().getValues() });
  return HtmlService.createHtmlOutput(output);
}
I've published the script, the web app works, I'm OK with that bit.
I've put random values on the spreadsheet: [[1,2],[3,4]] if we speak in matrix format.
On the other end, I've tried a bunch of stuff, including fetch and JSON.parse(), to get the data into a usable format within the Google Sites embedded code, but I think the real issue is that I can't assign the payload to a variable.
I'm using Google Sites to fetch the data.
With the basic "<> embed" module and the "by URL" option, using the following URL:
https://script.google.com/macros/s/scriptID/exec
I get the following output - that looks what it should be:
{"data":[[1,2],[3,4]]}
but when trying to include this in a script module ("embed code") - no chance!
<form name="get-images">
<input name="test" id="test" value="we'll put the contents of cell A1 here">
</form>
<script>
const form = document.forms['get-images']
var usableVariable = JSON.parse("https://script.google.com/macros/s/scriptID/exec"); // here I'm trying to allocate the stringified JSON to a variable
form.elements['test'].value = usableVariable[1,1]; //allocating the first element of the parsed array
</script>
I'm pretty sure I'm missing something obvious - but now I ran out of ideas!
Thanks for any help :)
I believe your goal is as follows.
The bottom script is embedded in the Google Site.
You want to retrieve the values from doGet and put the value of cell "B2" into the input tag.
The Web App is deployed with Execute the app as: Me and Who has access to the app: Anyone, even Anonymous.
Modification points:
In Google Apps Script, I think return ContentService.createTextOutput(output); is more suitable than return HtmlService.createHtmlOutput(output);.
To retrieve the values from doGet, this modification uses fetch.
To retrieve cell "B2", please change usableVariable[1,1] to usableVariable[1][1].
When these points are reflected in your script, it becomes as follows.
Modified script:
Google Apps Script side:
function doGet(e) {
  var spreadsheet = SpreadsheetApp.openById('spreadsheetID');
  var worksheet = spreadsheet.getSheetByName('Rankings C/U');
  var output = JSON.stringify({ data: worksheet.getDataRange().getValues() });
  return ContentService.createTextOutput(output);
}
HTML & Javascript side:
<form name="get-images">
  <input name="test" id="test" value="we'll put the contents of cell A1 here">
</form>
<script>
let url = "https://script.google.com/macros/s/###/exec";
fetch(url)
  .then((res) => res.json())
  .then((res) => {
    const usableVariable = res.data;
    const form = document.forms['get-images'];
    form.elements['test'].value = usableVariable[1][1]; // usableVariable[1][1] is the cell "B2".
  });
</script>
Note:
When you modify the Google Apps Script of the Web App, please redeploy the Web App as a new version. Otherwise the latest script is not reflected in the Web App. Please be careful about this.
In my environment, I could confirm that the above HTML & Javascript worked when embedded in the Google Site.
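The two JavaScript problems in the original embed code can be reproduced without any network access, using the sample payload from the question. This is only an illustrative sketch. The variable names `parseError`, `wholeRow`, and `cellB2` are made up.

```javascript
// 1) JSON.parse expects JSON text, not a URL, so parsing the URL string throws.
let parseError = null;
try {
  JSON.parse("https://script.google.com/macros/s/scriptID/exec");
} catch (err) {
  parseError = err; // a SyntaxError, because the string is not valid JSON
}
console.log(parseError instanceof SyntaxError); // true

// 2) With the payload shown in the question, compare [1,1] and [1][1].
const payload = JSON.parse('{"data":[[1,2],[3,4]]}');
const usableVariable = payload.data;

// The comma operator evaluates both operands and yields the last one,
// so usableVariable[1,1] is the same as usableVariable[1]: the whole second row.
const wholeRow = usableVariable[1, 1];
const cellB2 = usableVariable[1][1];

console.log(wholeRow); // [ 3, 4 ]
console.log(cellB2);   // 4
```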
References:
Class ContentService
Using Fetch
Web Apps

Performing image scraping using YQL with lowest resources usage possible i.e. lowest number of queries

I am trying to build an image-scraping tool that lets the user scrape all the images contained in a given page using XPath, processes the scraped images to find which have an alt attribute and which don't, and returns the result as two separate JSON objects,
i.e. {alted:["<img ......>","<img ......>"],nonAlted:["<img ......>","<img ......>"]}
Now comes my problem: although I am able to scrape the page, retrieve all the images, and separate them into the alted and nonAlted categories, I can't put them in the response object!
I think some code will clarify my issue, so the following is what I use in the execute block of my YQL table:
query = "select * from html where url='http://www.example.com/page-path' and xpath='//li'";
var result = y.query(query);
y.log(result.results..img.(@alt));
var querieselement = <urls/>;
querieselement.query = result.results..img.(@alt);
response.object = querieselement;
So my question is: how can I set the response object to contain the processed list of images? Note that after running the query the result doesn't show any data, although the log shows the list. I hope someone can point me to the cause of the problem.
P.S. The reason I mentioned "resources usage" in the title is that I am aware I could make two separate calls, one per image category, but that means scraping the same page twice, which I think is inefficient.
P.S. I would also be glad if someone could help me understand the meaning of these two lines:
querieselement = <urls/>;
querieselement.query = result.results..img.(@alt);
Why "<urls/>" and why "querieselement.query"? I don't know what they are supposed to do, yet they seem to do a critical job, as changing them breaks the code.
Thanks.
So my question is how can i set the response object to contain the processed list of the images
Use a stylesheet rather than an XPath selector:
select * from xslt where url="http://www.mysite.com/page-path" and stylesheet="http://www.mysite.com/page-path.xsl"
Define the stylesheet as such:
<xsl:template match="img[@alt]">
  <xsl:for-each select="@alt">
    <script>
      alt.push(<xsl:value-of select="."/>);
    </script>
  </xsl:for-each>
</xsl:template>
<xsl:template match="img[not(@alt)]">
  <xsl:for-each select="@src">
    <script>
      noalt.push(<xsl:value-of select="."/>);
    </script>
  </xsl:for-each>
</xsl:template>
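Independently of the stylesheet approach, the "scrape once, split locally" idea from the question can be sketched in plain JavaScript: the page is fetched a single time and the images are partitioned in one pass. `partitionByAlt` is a hypothetical helper, and the input format (one plain object per `<img>`, attributes as properties) and the sample data are assumptions for illustration.

```javascript
// Hypothetical helper: splits scraped images into those with an alt attribute
// and those without, in a single pass over the list.
// Assumed input: one plain object per <img>, with its attributes as properties.
function partitionByAlt(images) {
  const result = { alted: [], nonAlted: [] };
  for (const img of images) {
    // Presence of the attribute is what counts here, even if its value is empty.
    const hasAlt = "alt" in img;
    (hasAlt ? result.alted : result.nonAlted).push(img);
  }
  return result;
}

// Made-up sample data for illustration only.
const partitioned = partitionByAlt([
  { src: "a.png", alt: "Logo" },
  { src: "b.png" },
  { src: "c.png", alt: "" },
]);
console.log(JSON.stringify(partitioned));
```

The returned object has the same `{alted: [...], nonAlted: [...]}` shape described in the question.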

base link and search api

I am attempting to query a database through an API which I don't fully understand. I have been sent an example of the API being used with a keyword search form. The form is an HTML file that uses jQuery to return JSON documents, format the items into an array, and display them.
I tried to build the design of my application and adapt the form to work within my pages. The file that uses the API requires that a base link be used.
<base href="{{app_root}}">
If I remove this base link, the search functionality is lost. If I use the base link, all of the presentation and CSS is lost.
I thought maybe I could change the base link dynamically when I needed to call the search file with:
<script type="text/javascript">
function setbasehref(basehref) {
var thebase = document.getElementsByTagName("base");
thebase[0].href = basehref;
}
//setbasehref("{{app_root}}");
setbasehref("{{app_root}}");
</script>
Then use setbasehref() to change it back to my original base link, but that didn't work.
I'm new to javascript and JSON, and I'm not entirely sure what app_root is doing. Any thoughts?
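One way to see what the base link changes is to resolve the same relative path against two different bases. This is a minimal sketch with made-up URLs. `appRoot` stands in for whatever `{{app_root}}` is rendered to, which presumably is a template placeholder filled in by the server (an assumption, since the question does not say).

```javascript
// All URLs below are made up for illustration.
const pageUrl = "https://example.com/myapp/pages/search.html"; // the page itself
const appRoot = "https://example.com/api-demo/"; // stand-in for the rendered {{app_root}}

// Without <base>, relative URLs resolve against the page's own URL.
const cssWithoutBase = new URL("css/site.css", pageUrl).href;

// With <base href="...">, the same relative URL resolves against the base instead.
const cssWithBase = new URL("css/site.css", appRoot).href;

// A root-relative URL ignores the path of the base and keeps only its origin.
const cssRootRelative = new URL("/css/site.css", appRoot).href;

console.log(cssWithoutBase);  // https://example.com/myapp/pages/css/site.css
console.log(cssWithBase);     // https://example.com/api-demo/css/site.css
console.log(cssRootRelative); // https://example.com/css/site.css
```

If the stylesheet links on the page are relative, this kind of shift would be consistent with the CSS disappearing once the base link is present.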
