I have a large number of .txt files, each containing information in a key-value pair format:
"Site Code": "LEYB"
"Also known as": ""
"Location": "Pier Site, Poblacion del Sur, Villaba, Southern Leyte"
"Contact person(s)": ""
"Coordinates[1]": "11 12 40.302622, 124 23 21.450632"
"Coordinates[2]": "11.211195, 124.389292"
"School ID": ""
"Site Description": "Benchmark LEYB is on end part of right side wall,leading to the seaport"
"Sketch": "./LEYB.docx"
"Constructed": "PHIVOLCS - October 2009"
"Method" : "Campaign"
All I want to do is extract that information to create a master file, perhaps in a columnar format such as CSV, JSON or Excel.
Can you suggest a tool or a file system strategy in Node.js that can achieve my goal?
Try this, assuming file.txt is the file that holds the data as key-value pairs (but not in proper JSON format):
var fs = require("fs");
var content = fs.readFileSync("file.txt", "utf8");
var lines = content.split(/\r?\n/);
var myObj = {};
for (var i = 0; i < lines.length; i++) {
  // Split on the first colon only, so values that contain ':' stay intact
  var idx = lines[i].indexOf(':');
  if (idx === -1) continue; // skip blank or malformed lines
  var key = lines[i].slice(0, idx).trim().replace(/"/g, "");
  var value = lines[i].slice(idx + 1).trim().replace(/"/g, "");
  myObj[key] = value;
}
console.log(myObj);
This gives you a proper object, which you can then convert to CSV, JSON or whatever you need.
To convert it to JSON, use:
JSON.stringify(myObj);
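Since the question is about many files feeding one master file, here is a minimal sketch of how several parsed objects could be combined into CSV text. The function names parseKeyValueText, csvField and toCsv are hypothetical, and the sketch assumes one "key": "value" pair per line. Reading each file from disk (for example with fs.readdirSync and fs.readFileSync) is left out so the example stays self-contained.

```javascript
// Parse one file's text into an object (assumes one "key": "value" pair per line).
function parseKeyValueText(text) {
  var record = {};
  text.split(/\r?\n/).forEach(function (line) {
    var idx = line.indexOf(':');
    if (idx === -1) return; // skip blank or malformed lines
    var key = line.slice(0, idx).trim().replace(/"/g, '');
    var value = line.slice(idx + 1).trim().replace(/"/g, '');
    record[key] = value;
  });
  return record;
}

// Quote a CSV field only when it contains a comma, quote, or newline.
function csvField(value) {
  var s = String(value === undefined ? '' : value);
  return /[",\r\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

// Build CSV text: the header is the union of all keys, in first-seen order.
function toCsv(records) {
  var columns = [];
  records.forEach(function (r) {
    Object.keys(r).forEach(function (k) {
      if (columns.indexOf(k) === -1) columns.push(k);
    });
  });
  var rows = records.map(function (r) {
    return columns.map(function (c) { return csvField(r[c]); }).join(',');
  });
  return [columns.map(csvField).join(',')].concat(rows).join('\n');
}

// Sample text standing in for the contents of one .txt file.
var sample = '"Site Code": "LEYB"\n"Location": "Pier Site, Poblacion del Sur"\n"Method" : "Campaign"';
console.log(toCsv([parseKeyValueText(sample)]));
```

Taking the union of keys for the header means a file that lacks a field simply gets an empty cell in that column, rather than shifting the remaining values.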
I am trying to scrape a javascript web page. Having read some of the posts I managed to write the following:
from bs4 import BeautifulSoup
import requests
website_url = requests.get('https://ec.europa.eu/health/documents/community-register/html/reg_hum_atc.htm').text
soup= BeautifulSoup(website_url,'lxml')
print(soup.prettify())
and retrieve the fourth script element with:
soup.find_all('script')[3]
which gives:
<script type="text/javascript">
// Initialize script parameters.
var exportTitle ="Centralised medicinal products for human use by ATC code";
// Initialise the dataset.
var dataSet = [
{"id":"A","parent":"#","text":"A - Alimentary tract and metabolism"},
{"id":"A02","parent":"A","text":"A02 - Drugs for acid related disorders"},
{"id":"A02B","parent":"A02","text":"A02B - Drugs for treatment of peptic ulcer"},
{"id":"A02BC","parent":"A02B","text":"A02BC - Proton pump inhibitors"},
{"id":"A02BC01","parent":"A02BC","text":"A02BC01 - omeprazole"},
{"id":"ho15861","parent":"A02BC01","text":"Losec and associated names (referral)","type":"pl"},
...
{"id":"h154","parent":"V09IA05","text":"NeoSpect (withdrawn)","type":"pl"},
{"id":"V09IA09","parent":"V09IA","text":"V09IA09 - technetium (<sup>99m</sup>Tc) tilmanocept"},
{"id":"h955","parent":"V09IA09","text":"Lymphoseek (active)","type":"pl"},
{"id":"V09IB","parent":"V09I","text":"V09IB - Indium (<sup>111</sup>In) compounds"},
{"id":"V09IB03","parent":"V09IB","text":"V09IB03 - indium (<sup>111</sup>In) antiovariumcarcinoma antibody"},{"id":"h025","parent":"V09IB03","text":"Indimacis 125 (withdrawn)","type":"pl"},
...
]; </script>
The problem I am facing is getting the text of soup.find_all('script')[3] so that I can recover the JSON from it. When I use .text, the result is an empty string: ''.
So my question is: why is that? Ideally I would like to end up with:
A02BC01 Losec and associated names (referral)
...
V09IA05 NeoSpect (withdrawn)
V09IA09 Lymphoseek
V09IB03 Indimacis 125 (withdrawn)
...
First, read the script body with .string (recent Beautiful Soup versions generally do not treat the contents of script tags as text, which is why .text comes back empty). Then do some string processing: take everything after 'dataSet = ' and drop the trailing ';' so that a valid JSON array remains. Finally, parse the array and print the fields of each element.
import json
data = soup.find_all("script")[3].string
# Cut at the last ';' so a ';' inside a text value cannot truncate the array
dataJson = data.split('dataSet = ')[1].rsplit(';', 1)[0]
jsonArray = json.loads(dataJson)
for jsonElement in jsonArray:
    print(jsonElement['parent'], end=' ')
    print(jsonElement['text'])
I'm trying to assign an R output
my_r_output <- data.frame( Abundances=c(50.00, 30.00, 20.00), Microbes=c("Microbe1", "Microbe2", "Microbe3"), Test=c(1,1,1))
to a javascript variable in my RMarkdown document to obtain something like this when it is converted into a .md document:
var abundances = 'Abundances Microbes Test
50.00 Microbe1 1
30.00 Microbe2 1
20.00 Microbe3 1
'
My R output is a dataframe and the Javascript variable has to be a TSV formatted string.
I tried the following command:
var abundances = '`r write.table( my_r_output, file = "", sep = "\t" )`'
Here file = "" writes to stdout and sep = "\t" produces a TSV-formatted string.
But I obtain the following result when the .Rmd is converted into a .md document:
var abundances = '
UPDATE1:
Based on Martin Schmelzer's reply, I tried the following command in my .Rmd document:
var abundances = '`r paste(capture.output(write.table( my_r_output, file = "", sep = "\t" )), "\n", sep="") `'
And I obtain the following result in my .md document:
var abundances = 'Abundances Microbes Test
, 50.00 Microbe1 1
, 30.00 Microbe2 1
, 20.00 Microbe3 1
'
But it added commas at the beginning of each row.
I have two strings that I would like to extract specific substrings from.
var companySymbol1;
var companyName1;
var companySymbol2;
var companyName2;
var stockString1 = "STOCKDETAILS:TWEETS:FB:Facebook Inc.";
var stockString2 = "I have returned -- Facebook Inc. (FB) -- stock back.";
companySymbol1 = ? //would like this to be "FB"
companyName1 = ? //would like this to be "Facebook Inc."
companySymbol2 = ? //would like this to be "FB"
companyName2 = ? //would like this to be "Facebook Inc."
What regex can I apply to stockString1 to extract "FB" (into the companySymbol1 var) and "Facebook Inc." (into the companyName1 var)? Similarly, I want to extract "FB" (into companySymbol2) and "Facebook Inc." (into companyName2) from stockString2.
The formats of stockString1 and stockString2 are guaranteed to be consistent from the source, so you can assume there could be other symbols and names (e.g. GOOG/Google Inc., MSFT/Microsoft Corp., etc.).
I'd truly appreciate any help.
You could do this with split, and one regular expression just to chop off the closing ) in the second case:
function getStock(s) {
  var parts = s.split(/\)? -- /);
  if (parts.length < 2) { // other format
    return s.split(':').slice(2, 4);
  }
  return parts[1].split(' (').reverse();
}
var stockString1 = "STOCKDETAILS:TWEETS:FB:Facebook Inc.";
var stockString2 = "I have returned -- Facebook Inc. (FB) -- stock back.";
var [companySymbol1, companyName1] = getStock(stockString1);
var [companySymbol2, companyName2] = getStock(stockString2);
console.log(companySymbol1, companyName1);
console.log(companySymbol2, companyName2);
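Since the question asked for a regex specifically, here is a sketch of a regex-only variant, assuming the two formats are exactly as shown in the question. The helper name parseStock and the { symbol, name } return shape are hypothetical.

```javascript
// Hypothetical helper: returns { symbol, name }, or null when neither format matches.
function parseStock(s) {
  // Format 1: "PREFIX:PREFIX:SYMBOL:Name"
  var colon = /^[^:]*:[^:]*:([^:]+):(.+)$/.exec(s);
  if (colon) return { symbol: colon[1], name: colon[2] };
  // Format 2: "... -- Name (SYMBOL) -- ..."
  var dashed = /-- (.+?) \(([^)]+)\) --/.exec(s);
  if (dashed) return { symbol: dashed[2], name: dashed[1] };
  return null;
}

var first = parseStock("STOCKDETAILS:TWEETS:FB:Facebook Inc.");
var second = parseStock("I have returned -- Facebook Inc. (FB) -- stock back.");
console.log(first.symbol, first.name);
console.log(second.symbol, second.name);
```

Returning null for unrecognised input makes a format change at the source show up immediately instead of silently producing wrong values.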
I have this kind of JSON object:
"{\"date\": \" 9 Marzo\", \"time\": \" 07:00 - 13:20\", \"descrizione\": \" concerto\", \"alimenti\": [{ \"macchina\":\"si\", \"pollo\":\"no\" }] }";
I want to get the strings "macchina" and "pollo", which are the keys, and likewise their values "si" and "no", but I cannot reach them. (I get the object from an AJAX call, so " 9 Marzo" is available as response.date.)
I have tried console.log(response.alimenti[i][0]); but it is undefined.
I am inside this loop: for (i = 0; i < response.ruoli.length; i++)
This will get you to the strings "macchina" and "pollo":
var json = "{\"date\": \" 9 Marzo\", \"time\": \" 07:00 - 13:20\", \"descrizione\": \" concerto\", \"alimenti\": [{ \"macchina\":\"si\", \"pollo\":\"no\" }] }";
var obj = JSON.parse(json);
for (var k in obj.alimenti[0]) {
console.log(k);
}
or their values:
for (var k in obj.alimenti[0]) {
console.log(obj.alimenti[0][k]);
}
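If the keys and values are needed together, Object.entries returns them as pairs in one call. This is a small sketch on a shortened copy of the JSON from the question; the variable name pairs is just for illustration.

```javascript
// Shortened copy of the JSON string from the question.
var json = '{"date": " 9 Marzo", "alimenti": [{ "macchina": "si", "pollo": "no" }]}';
var obj = JSON.parse(json);

// Each entry is a [key, value] array taken from the first object in alimenti.
var pairs = Object.entries(obj.alimenti[0]);
pairs.forEach(function (pair) {
  console.log(pair[0] + ' = ' + pair[1]);
});
```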
You'd be better off parsing the JSON string and then reading the values from the resulting JavaScript object, i.e.
var obj = JSON.parse("{\"date\": \" 9 Marzo\", \"time\": \" 07:00 - 13:20\", \"descrizione\": \" concerto\", \"alimenti\": [{ \"macchina\":\"si\", \"pollo\":\"no\" }] }");
console.log(obj.alimenti[0].macchina);
or pollo
console.log(obj.alimenti[0].pollo);
Also, that object structure is a little weird. You might want to remove the array from within the alimenti to better access the data.
Using response.alimenti[i][0] won't work because alimenti is an array of object(s), not an array of arrays.
This instead:
var alimenti = response.alimenti[0];
console.log(alimenti.macchina);
console.log(alimenti.pollo);
Example: http://jsfiddle.net/zcuwfb9s/
I want to generate a .csv file in JavaScript. I have an array named "archivo" (archivo = []), and each element is one line of the file: a string ending in '\n'.
The problem is that when the CSV file is generated, a ',' is added to each line, and I don't know why.
if (navigator.appName == 'Microsoft Internet Explorer') {
var popup = window.open('','csv','');
popup.document.body.innerHTML = '<pre>' + archivo[i] + '</pre>';
}else{
location.href='data:application/download; charset=utf8,' + encodeURIComponent(archivo);
}
Can anyone help me?
You should consider using a CSV generator library, that handles all this for you.
I've written a light-weight client-side CSV generator library that might come in handy. Check it out on http://atornblad.se/github/ (scroll down to the headline saying Client-side CSV file generator)
It requires a functioning FileSaver implementation handling calls to window.saveAs(). Check out Eli Grey's solution on http://eligrey.com/blog/post/saving-generated-files-on-the-client-side
When in place, you simply generate and save CSV files on the fly like this:
var propertyOrder = ["name", "age", "height"];
var csv = new Csv(propertyOrder);
csv.add({ name : "Anders",
          age : 38,
          height : "178cm" });
csv.add({ name : "John Doe",
          age : 50,
          height : "184cm" });
csv.saveAs("people.csv");
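As for where the stray commas in the question come from: passing an array to encodeURIComponent converts it to a string first, and an array's default string conversion joins its elements with ','. Joining the lines explicitly with an empty separator avoids that. The sketch below uses made-up sample rows, and the text/csv data URI is an assumption rather than the MIME type used in the question.

```javascript
// Sample rows standing in for the question's archivo array (each ends in '\n').
var archivo = ['name,age\n', 'Anders,38\n', 'John Doe,50\n'];

// Implicit conversion: elements are joined with ',', so every line after
// the first starts with an unwanted comma.
var implicit = String(archivo);

// Explicit join with an empty separator keeps the lines as they are.
var explicit = archivo.join('');

// Assumed MIME type; the original code used application/download.
var href = 'data:text/csv;charset=utf-8,' + encodeURIComponent(explicit);

console.log(JSON.stringify(implicit));
console.log(JSON.stringify(explicit));
```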