Extracting data from interactive line chart - svg path - python 2.7 - javascript

I would like to get the data depicted on the sentiment value line chart:
http://sentdex.com/financial-analysis/?i=TWTR&tf=7d
Looking for answers I went through
Web scraping data from an interactive chart that seems to be very similar to my case.
Also went through:
Scraping graph data from a website using Python
This is my last attempt:
import re
svg_string = "M 364.5 53 L 364.5 171.35000000000002 M 364.5 184.5 L 364.5 302.85 M 364.5 184.5 L 364.5 302.85"
print repr(svg_string)
data = [map(float, xy.split(',')) for xy in re.split('[ML]', svg_string)[1:]]
print data
I am facing at least 3 issues:
The first one is that the data for svg_string represents coordinates vs. real values so I am not sure how to access the interesting data.
The second is that even when I play with this code I am getting
ValueError: invalid literal for float(): 364.5 53
And last, the string for svg_string does not even represent the graph properly (I cannot find the right code).
How do I extract the values?
Thank you in advance.

It's hard to know exactly what you're after overall, but you are getting the ValueError because your data is not in the same format as in the other question you referenced: your coordinates are separated by spaces, where the other question's were separated by commas.
To fix the ValueError, change:
data = [map(float, xy.split(',')) for xy in re.split('[ML]', svg_string)[1:]]
to:
data = [map(float, xy.split()) for xy in re.split('[ML]', svg_string)[1:]]
Hopefully this gets you onto the next step.
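For anyone poking at the chart from the browser console rather than from Python, the same whitespace-based split can be sketched in JavaScript. This is only an illustration of the parsing step: the function name parsePathPoints is made up here, and the result is still pixel coordinates, not sentiment values.

```javascript
// Hypothetical helper: turn an SVG path string made of M/L commands
// into a list of [x, y] pairs. Handles space- or comma-separated numbers.
function parsePathPoints(d) {
  return d
    .split(/[ML]/)                                   // cut at each move/line command
    .map(function (s) { return s.trim(); })
    .filter(function (s) { return s.length > 0; })   // drop the empty leading piece
    .map(function (s) { return s.split(/[\s,]+/).map(Number); });
}

var svgString = "M 364.5 53 L 364.5 171.35000000000002 M 364.5 184.5 L 364.5 302.85 M 364.5 184.5 L 364.5 302.85";
var points = parsePathPoints(svgString);
console.log(points);
```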
Edit:
Ok so I looked at the page again, and the data is literally just in a js variable that you can grab from the response. The variable name is 'series' so you either need to do some parsing yourself to grab the data or find a library to work with (e.g. BeautifulSoup, etc.).

Related

How to find the javascript code responsible changing data in an HTML element (based on JSON)?

I am scraping a website where there are clothes with javascript rendered size tables.
Sometimes the sizes are in number format like 38, 40, 50 and sometimes in character format like S, M, L etc.
Examples for these 2 types:
https://www2.hm.com/hu_hu/productpage.0872568004.html ->
Available sizes from the khaki colored one: XS, L
https://www2.hm.com/hu_hu/productpage.0881349001.html -> Available sizes from the beige colored one: 40, 42, 44
The size availability data for the first one (khaki color) comes from this json:
https://www2.hm.com/hmwebservices/service/product/hu/availability/0872568.json
In this list the first seven characters (0872568) are the base product code, the next 3 characters are the color code, and the last 3 characters are the size code. This means we have sizes like:
002 = XS
005 = L
The size availability for the second coat comes from this json:
https://www2.hm.com/hmwebservices/service/product/hu/availability/0881349.json
based on the previously mentioned logic for this we have sizes like:
005 = 40
006 = 42
007 = 44
As you can see, the same-looking availability code (005) means size L for one product and size 40 for the other.
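The code layout described above can be sketched as follows. This is not the site's own logic (which is exactly what the question is trying to find); it only shows the 7 + 3 + 3 split and why a per-product lookup is needed. The function names, the SIZE_LABELS table, and the assumption that each availability entry is a 13-digit string are all illustrative.

```javascript
// Assumption: each availability entry is a 13-digit string:
// 7 digits product code + 3 digits color code + 3 digits size code.
function splitAvailabilityCode(code) {
  if (!/^\d{13}$/.test(code)) {
    throw new Error('unexpected availability code: ' + code);
  }
  return {
    product: code.slice(0, 7),
    color: code.slice(7, 10),
    size: code.slice(10, 13)
  };
}

// Hypothetical lookup built from the examples in the question:
// the same size code maps to different labels depending on the product.
var SIZE_LABELS = {
  '0872568': { '002': 'XS', '005': 'L' },
  '0881349': { '005': '40', '006': '42', '007': '44' }
};

function sizeLabel(code) {
  var parts = splitAvailabilityCode(code);
  var table = SIZE_LABELS[parts.product] || {};
  return table[parts.size]; // undefined when the label is not known
}

console.log(sizeLabel('0872568004005'));
console.log(sizeLabel('0881349001005'));
```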
I want to find the code responsible for deciding which format the user will see on the frontend based on this json data.
I went through the source code but I can not find the information I need.
What I've done so far is record in Chrome's Developer Tools while clicking on the size field, then search the recording for the product code and the JSON URL, but I found nothing. Is there any other way to check which function is called when I click on the size field? I guess the way it handles the JSON file must be hidden there.
Any guidance would be greatly appreciated.

Gleaning data from tab-delimited txt file with missing values

I need to make a JSON file from a whitespace-delimited txt file.
However:
1. the whitespaces are inconsistent in length and
2. some of the data of each "column" is missing.
A single row looks like this in the txt file:
5653 Phrakhtaes Phrakhtaes 34.56717 33.02724 L LCTY GB 05 0 32 Asia/Nicosia 2014-09
Ultimately, this data will go onto Redis. But without some means of creating keys for each "column", I don't see how I can work with this data.
Please, I could really use the help!
Thanks in advance!
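Simply split on the runs of spaces between your data (the regex below matches one or more spaces; use / {2,}/ instead if columns are separated by at least two spaces and values may contain single spaces):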
var line = "5653 Phrakhtaes Phrakhtaes 34.56717 33.02724 L LCTY GB 05 0 32 Asia/Nicosia 2014-09";
console.log(line.split(/ +/));
As for missing data, I'd recommend checking the length of the resulting array and simply discarding the row if it has fewer than the expected number of values. The only other option, if there is a variable number of spaces between data points, is to loop through the fields and judge which one may be missing (based on whether it's an integer, uppercase, etc.).
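A minimal sketch of the split-and-check approach, producing keyed objects that could then be serialized to JSON or stored in Redis. The column names in COLUMNS are hypothetical (the question never names its columns), and a row with the wrong number of fields is simply rejected.

```javascript
// Hypothetical column names, one per field in the sample row.
var COLUMNS = ['id', 'name', 'asciiName', 'lat', 'lng', 'featureClass',
               'featureCode', 'country', 'admin1', 'population',
               'elevation', 'timezone', 'modified'];

// Split a row on runs of whitespace and key each field by column name.
// Returns null when the field count does not match (missing or extra values).
function parseRow(line) {
  var fields = line.trim().split(/\s+/);
  if (fields.length !== COLUMNS.length) {
    return null;
  }
  var record = {};
  COLUMNS.forEach(function (key, i) {
    record[key] = fields[i];
  });
  return record;
}

var line = "5653 Phrakhtaes Phrakhtaes 34.56717 33.02724 L LCTY GB 05 0 32 Asia/Nicosia 2014-09";
var record = parseRow(line);
console.log(JSON.stringify(record));
```

Note that splitting on any whitespace breaks rows whose values themselves contain spaces; in that case the tab or multi-space delimiter has to be used instead.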

What is the best way to convert Matlab multidimensional cell array to Javascript array?

I have a very large 4D Matlab matrix (31x31x86x127) that I wish to convert into a Javascript 4D array. What is the best way to do this?
Currently my tentative approach will be to either:
1) Write the Matlab matrix into a binary file, and then read that in and build the Javascript.
2) Use JSONlab (http://www.mathworks.com/matlabcentral/fileexchange/33381-jsonlab--a-toolbox-to-encode-decode-json-files-in-matlab-octave) to convert the Matlab matrix into a JSON string and then write a custom decoder to turn that JSON string into a Javascript Array. Issue is that the JSON text file is 1.98GB...
3) This may be the best way.
fileID = fopen('test.bin', 'w');
fwrite(fileID,value,'double');
test.bin is then around 82MB, which is actually what I expect: 31*31*86*127 values * 8 bytes/double = 82ish MB! However, how do I then read this binary file (in the browser) into a 4D JavaScript array? Thanks!
Thoughts?
Thanks for your help!
save is not the right function to write a text file. Use savejson or saveubjson and pass the filename to the function. Do not use the return argument of these functions. Doing so I get a ubjson with less than 100MB and a json with less than 150MB.
My original answer, based on insufficient knowledge about the used code:
Instead of writing your own binary format, use one of the already available binary formats. Try writing it to Universal Binary JSON; JSONlab supports it. You should end up with reasonably sized data without losing the advantages of a standardized file exchange format.
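If the raw fwrite route from the question is kept instead, one possible way to read the file in JavaScript is to view the bytes as a Float64Array and index into it, rather than building nested arrays. The sketch below assumes the file holds nothing but float64 values in MATLAB's column-major order (first index varies fastest) and that the byte order of the writing machine matches the reading one; makeAccessor and its zero-based indices are inventions for this example.

```javascript
// Wrap a buffer of float64 values written in column-major order
// and return a zero-based accessor get(i, j, k, l).
function makeAccessor(buffer, dims) {
  var data = new Float64Array(buffer);
  var expected = dims.reduce(function (a, b) { return a * b; }, 1);
  if (data.length !== expected) {
    throw new Error('expected ' + expected + ' values, got ' + data.length);
  }
  return function get(i, j, k, l) {
    // Column-major layout: the first index varies fastest.
    return data[i + dims[0] * (j + dims[1] * (k + dims[2] * l))];
  };
}

// Small self-check with a 2x3x2x2 array holding 0..23 in storage order.
var dims = [2, 3, 2, 2];
var demo = new Float64Array(24);
for (var n = 0; n < demo.length; n++) {
  demo[n] = n;
}
var get = makeAccessor(demo.buffer, dims);
console.log(get(1, 2, 1, 1)); // 23, the last element
```

In a browser the buffer could come from fetch('test.bin').then(function (r) { return r.arrayBuffer(); }), with dims set to [31, 31, 86, 127]. Keeping one flat typed array avoids allocating millions of small nested arrays.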
I think the best way is to:
1. Write the matrix out as a string or text file (a binary file is not necessary). You will need n-1 delimiters, where n=4 is the number of dimensions in your case. See this Saturn Fiddle as an example for a 2D matrix (code below).
2. Read the text file into a JavaScript string. How you do this depends on whether you're using JavaScript on the server or in the web browser.
3. Parse the string into a JavaScript array. You will have to use the split function on the delimiters from (1) and then enter the values into an array like this example.
Code part (1):
% Print a 2D matrix as text: elements separated by spaces, rows terminated by ';'.
A = [ 1 2 3 ; 4 5 6 ; 7 8 9 ];
for ii = 1:size(A, 1)
    for jj = 1:size(A, 2)
        fprintf(' %d ', A(ii, jj));
    end
    fprintf(';');
end
Code part (3):
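// Recursively build a dim-deep nested array: the outer dim-1 levels
// have length dim, and the innermost arrays start out empty.
function make(dim, lvl, arr) {
    if (lvl === 1) return [];
    if (!lvl) lvl = dim;
    if (!arr) arr = [];
    for (var i = 0, l = dim; i < l; i += 1) {
        arr[i] = make(dim, lvl - 1, arr[i]);
    }
    return arr;
}
var myMultiArray = make(4);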
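The parsing step itself can be sketched for the 2D case like this, assuming the text has the shape the loop above prints (elements separated by spaces, each row terminated by ";"). The function name parseMatrix is made up; a 4D version would need one extra delimiter and one extra split per additional dimension.

```javascript
// Parse text such as " 1  2  3 ; 4  5  6 ; 7  8  9 ;" into a 2D array.
function parseMatrix(text) {
  return text
    .split(';')                                          // one piece per row
    .map(function (row) { return row.trim(); })
    .filter(function (row) { return row.length > 0; })   // drop the piece after the final ';'
    .map(function (row) { return row.split(/\s+/).map(Number); });
}

var rows = parseMatrix(' 1  2  3 ; 4  5  6 ; 7  8  9 ;');
console.log(rows);
```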

Angular / Javascript 'rounding' long values?

I have the following JSON:
[{"hashcode": 4830991188237466859},{...}]
I have the following Angular/JS code:
var res = $resource('<something>');
...
res.query({}, function(json) { hashcode = json[0].hashcode; });
...
Surprisingly (to me, I'm no JS expert), I find that something (?) is rounding the value to the precision of 1000 (rounding the last 3 digits). This is a problem, since this is a hash code of something.
If, on the other hand I write the value as a String to the JSON, e.g -
[{"hashcode": "4830991188237466859"},{...}]
this does not happen. But this causes a different problem for me with JMeter/JSON Path, which extracts the value ["4830991188237466859"] when I run my query $.hashcode. I can't use that as an HTTP request parameter: I need to add ?hashcode=... to the query, but I end up with ?hashcode=["..."].
So I appreciate help with:
Understanding who and why -- is rounding my hash, and how to avoid it
Help with JMeter/JSON Path
Thanks!
Every number type has a limit on what it can represent exactly. See Number.MAX_SAFE_INTEGER, or paste your number into the console: you'll see the rounding happens at the JavaScript level and has nothing to do with Angular. Since the hash doesn't represent an amount of something, it's perfectly natural for it to be a string. Which leads me to:
Nothing wrong with site.com/page?hashcode=4830991188237466859 - it's treated as a string there and you should keep treating it as such.
The JavaScript Number type is floating-point based, and can only represent all integers in the range between -2^53 and 2^53. Some integers outside this range are therefore subject to "rounding", as you experienced.
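A short sketch that reproduces the effect with the value from the question and shows two ways to keep every digit: leave the hash as a string, or convert the string to a BigInt where the runtime supports it. Nothing here is Angular-specific; the variable names are illustrative.

```javascript
// Parsing the unquoted value goes through the Number type and loses digits.
var text = '[{"hashcode": 4830991188237466859}]';
var asNumber = JSON.parse(text)[0].hashcode;
console.log(Number.isSafeInteger(asNumber));              // false
console.log(String(asNumber) === '4830991188237466859');  // false

// Quoting the value in the JSON keeps it intact as a string.
var textQuoted = '[{"hashcode": "4830991188237466859"}]';
var asString = JSON.parse(textQuoted)[0].hashcode;
console.log(asString);

// If arithmetic on the value is ever needed, BigInt can hold it exactly.
var asBigInt = BigInt(asString);
console.log(asBigInt.toString() === asString);            // true

// The largest integer below which every integer is exact: 2^53 - 1.
console.log(Number.MAX_SAFE_INTEGER);
```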
In regards to JMeter JSON Path Extractor plugin, the correct JSON Path query for your hashcode will look like
$..hashcode[0]
See Parsing JSON chapter of the guide for XPath to JSON Path mappings and more details.

How do I make Internet Explorer include the required line feeds when transferring innerHTML to a <textarea>?

The purpose of this JavaScript program is to enable the user to report a problem on a social network with all the pertinent information in his initial message.
The user enters the information by answering a set of questions on a form consisting of text boxes, &c.  The answers are used to create an array of string literals, the elements of which are concatenated to form a single string. This string is then presented at the end of the page for the user to copy and then to paste on to the social-network page.
Hitherto this has been done by placing the string (using document.getElementById('divName').innerHTML) in a <div> set up for the purpose. That's fine: works in all five browsers and even on the iPhone.
Now, in order to make it easier for the user to make minor changes to the report before posting it (and to make it easier to copy), I want to be able to place the report not in a <div> but in a <textarea> part of the input form. This too is fine: works in Firefox, Chrome, Opera and Safari – even on the iPhone – but...
With some inevitability the only browser on the whole parade ground marching in step – MSIE – cannot handle it. It puts the information in the <textarea>– minus all the line feeds.
The array is initialized:
function createReport() { // [0] and [21] are constant.
    outReportArray[0] = 'E M E R G E N C Y' + horizLine;
    for (i=1 ; i<outReportArray.length ; i++) {
        outReportArray[i] = '';
    }
    outReportArray[21] = 'Thank you.';
}
(Global variable horizLine is a sequence of m-rules (—) with a line feed at each end.)
As the user progresses through the form, the array is updated:
outReportArray[element] = label + value + (underLine ? horizLine : '&#10;');
(The following also have been tried for generating the line feed: '\n', '\r\n', '\r', and HTML character entities for the line break.)
The output string is continually rebuilt and pasted to the <textarea> so that when he arrives at the foot of the form he presses an ‘update’ button and is simply transferred to the <textarea>:
outReportStr = ''; // Initialize the output string.
// Build the output string from the output array.
for (i=0 ; i<outReportArray.length ; i++) {
    outReportStr += outReportArray[i];
}
// Populate the <textarea> 'outbox'.
document.getElementById('outbox').innerHTML = outReportStr;
Normally the content of the <textarea> looks like this:
E M E R G E N C Y
———————————
Name: John Doe
Land-line: (213) 555 1234
Cell-phone: (213) 555 1235
E-mail: JDoe@aol.com
———————————
Location of animal now:
Washington (St Landry Parish), La
———————————
&c.
(There’s more to it but this demonstrates the layout required in the output.)
In Internet Explorer, however, each line feed is replaced by a single space:
E M E R G E N C Y ——————————— Name: John Doe Land-line: (213) 555 1234 Cell-phone: (213) 555 1235 E-mail: JDoe@aol.com ——————————— Location of animal now: Washington (St Landry Parish), La ——————————— &c.
My question is this: how do I make Internet Explorer include the required line feeds in the innerHTML transferred to <textarea> 'outbox'?
(I have tried creating a textNode consisting of the innerHTML and then appending it as a child to the <textarea>; no dice: what I then get are all the character-entity codes (&#...) instead of the characters themselves.)
I'm very much an amateur at this game so, quite apart from not wanting to impose anything more complicated than HTML, CSS and JavaScript on the user, I don't want to get involved with complicated add-ins and proprietary libraries. A front seat at the pearly gates to any-one that can help me solve this problem!
The opener re-bids
First let me express sincere thanks to all that responded to my question. I had had some doubt of even getting a response, never mind one so quick.
Your insightful answers not only solved my problem but taught me quite a bit about the languages I'm deploying way above my pay grade – even of one I've never used!
@Kolink and @JayC told me to use .value rather than .innerHTML (and Kolink was quite right to adopt a tone of admonishment). Although I was aware of .value as part of the process of transferring data from an input element to the program, it had not even occurred to me that it might be written to: d'oh! I believe is the term.
Thank you, @RobG also, for your account of the use of \u codes; when it came to using .value vice .innerHTML, that was an important part of the solution.
@deceze recommended, indeed pleaded, that I learn 'Markdown' (which I always thought was something retailers applied to merchandise they were putting in to the clearance sale). He didn't say whether that was for my benefit or for his as a possible respondent but, searching for it on Google, I found a very interesting alternative to the jEdit I use, its strength being that the 'code' (the Markdown version of the text) is legible by a human, which must make it much simpler to edit. Thank you, deceze, I'll look in to that in due course (I've even tried coding this message in it); for the time being, not to the best of my knowledge having Perl to hand, I shall have to stick with what I (almost) know.
And, naturally, you all qualify for a front seat at the pearly gates. Thank you all for making my first visit to such a forum both fruitful and enjoyable.
ΠΞ
Set the textarea's value, not its innerHTML. In HTML, whitespace is stripped - you should know that just from making a basic webpage. Internet Explorer is just doing things right, unlike the other browsers...
Textareas are very much like inputs. Why don't you use the value property?
'\n' should work. Inserting the following as the value of a textarea element in IE 8 puts in the returns (wrapped for convenience only):
var s = 'E M E R G E N C Y\n' +
'\u2014\u2014\u2014\u2014\u2014\u2014\u2014\u2014\u2014\u2014\n' +
'Name: John Doe Land-line';
I've substituted the unicode value rather than the HTML character entity for the "—" character only. You might want to use "\r\n" but I don't think it's necessary.
Oh, you can also do things like:
var t = ['E M E R G E N C Y',
'\u2014\u2014\u2014\u2014\u2014\u2014\u2014\u2014\u2014\u2014',
'Name: John Doe Land-line'];
then set the value to:
t.join('\n') // or t.join('\u000d');
You can also substitute \u000a for line feed and \u000d for carriage return.
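Putting the answers together, a minimal sketch of the whole step might look like this: keep the report as an array of lines, join with '\n', and assign the result to the textarea's value. The function names buildReport and showReport are illustrative, and the stand-in object is only there so the snippet also runs outside a browser.

```javascript
// Join the report lines with line feeds.
function buildReport(lines) {
  return lines.join('\n');
}

// Write the report into a textarea (or any object with a value property).
function showReport(textarea, lines) {
  textarea.value = buildReport(lines);
}

// Ten em rules, built without String.prototype.repeat so it works in old IE.
var rule = new Array(11).join('\u2014');
var lines = ['E M E R G E N C Y', rule, 'Name: John Doe'];

// Stand-in for the real element; in a page this would be
// document.getElementById('outbox').
var outbox = { value: '' };
showReport(outbox, lines);
console.log(outbox.value);
```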
