I have collected an array of weather data that looks like this:
const data = [{
"city_name": "London",
"lat": 51.507351,
"lon": -0.127758,
"main": {
"temp": 289.89,
"temp_min": 288.468,
"temp_max": 291.15,
"feels_like": 287.15,
"pressure": 1004,
"humidity": 77
},
"wind": { "speed": 5.1, "deg": 230 },
"clouds": { "all": 90 },
"weather": [
{
"id": 804,
"main": "Clouds",
"description": "overcast clouds",
"icon": "04n"
}
],
"dt": 1593561600,
"dt_iso": "2020-07-01 00:00:00 +0000 UTC",
"timezone": 3600
},
...
];
This data continues in ascending date order (hour by hour) for the last 40 years (sample: https://pastebin.com/ciHJGhnq); the entire dataset is over 140 MB.
From this data, I'd like to obtain the average temperature (object.main.temp) for each Month and Week of month, across the entire dataset.
The questions I am trying to answer with my data are:
What is the average temperature for January across the last 40 years?
What is the average temperature for February across the last 40 years?
...
(average the weekly temperatures within a given January, then repeat for all of the other Januaries in the dataset and average those results as well).
Repeat for the remaining months.
The output I am aiming to create after parsing the data is:
{
[
"JANUARY": {
"weekNumber": {
"avgWeekTemp": 100.00
}
"avgMonthTemp": 69.00,
...
},
...
]
}
The city name and structure of the objects are always the same (in this case, London).
// build a unique number of months
// work through our data to work out the week numbers
// work through the data once again and place the data in the right week inside finalOutput
// work through the final output to determine the average values
Unfortunately I'm not very proficient in JavaScript, so I couldn't get past the second obstacle:
"use strict";
const moment = require("moment");
const data = require("./data.json");
let months = [
{ January: [] },
{ February: [] },
{ March: [] },
{ April: [] },
{ May: [] },
{ June: [] },
{ July: [] },
{ August: [] },
{ September: [] },
{ October: [] },
{ November: [] },
{ December: [] },
];
const finalOutput = [];
finalOutput.push(...months);
data.forEach((object) =>
finalOutput.forEach((month) => {
if (
Object.keys(month)[0] === moment(new Date(object.dt_iso)).format("MMMM")
) {
[month].push(object.dt_iso);
}
})
);
console.log(finalOutput);
Which only returned the array of months with nothing in each month.
[
{ January: [] },
{ February: [] },
{ March: [] },
{ April: [] },
{ May: [] },
{ June: [] },
{ July: [] },
{ August: [] },
{ September: [] },
{ October: [] },
{ November: [] },
{ December: [] }
]
How can I calculate the average values per week & month across my entire data set?
I'm going to write your script for you, but while you wait here's some high-level guidance.
First, let's study your data. Each row is an hourly weather measurement. As a result, each datapoint you want will be an aggregate over a set of these rows. We should organize the script along those lines:
We'll write a function that accepts a bunch of rows and returns the arithmetic mean of the temperatures of those rows: function getAvgTemp(rows) => Number
We'll write another function that takes a bunch of rows, plus the desired month, and returns all the rows for just that month: function getRowsByMonth(rows, month) => Array(rows)
We'll write another function that takes a bunch of rows, plus the desired week number, and returns all the rows for just that week: function getRowsByWeekNumber(rows, weekNumber) => Array(rows)
^^ that's if "week number" means 1-52. But if "week number" means "week within the month," then instead we'll do:
A function will also take a month: function getRowsByMonthWeek(rows, month, weekNumber) => Array(rows)
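For instance, getAvgTemp could be a one-liner (a minimal sketch, assuming rows shaped like your sample objects):
// minimal sketch: arithmetic mean of main.temp over the given rows
function getAvgTemp(rows) {
  return rows.reduce((sum, row) => sum + row.main.temp, 0) / rows.length
}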
From these basic building blocks, we can write a routine that assembles the data you want.
What would that routine look like?
Probably something like this:
Loop through all the months of the year. We won't look in the data for these months, we'll hard-code them.
For each month, we'll call getRowsByMonth on the full data set. Call this set monthRows.
We'll pass monthRows to getAvgTemp -- it doesn't care what the timespan is, it just extracts and crunches the temp data it receives. That's our avgMonthTemp solved for.
Depending on what you mean by "week number," we'll divide the monthRows into smaller sets and then pass each set into getAvgTemp. (The hardest part of your script will be this division logic, but that's not to say it will be that hard.) This gives us your weekly averages.
We'll assemble these values into a data structure and insert it into the final structure that ultimately gets returned/logged.
Here's the implementation. It's a little different from what I described above.
The biggest change is that I did some pre-processing up front so that the date values don't have to be parsed multiple times. While doing that, I also calculate each row's weekNumber. As a consequence, the week logic took the form of grouping rows by their weekNumbers rather than querying the dataset by weekNumber.
Some notes:
I decided that "weekNumber" means "week-of-year."
Instead of using Moment, I found a week-number algorithm on StackOverflow. If you want to use Moment's algo instead, go ahead.
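If you do want Moment's week numbering, a minimal sketch (assuming the moment package is installed) would be a drop-in replacement for the getWeekNum function below:
const moment = require('moment')
// ISO week-of-year via Moment instead of the hand-rolled algorithm
const getWeekNum = ( date ) => moment(date).isoWeek()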
The output data structure is not what you described.
Your example is not valid JSON, so I made a guess as to what you had in mind.
Here's an example of what it looks like:
{
"JUNE": {
"avgMonthTemp": 289.9727083333334,
"avgWeekTemps": {
"25": 289.99106382978727,
"26": 289.11
}
},
"JULY": {
"avgMonthTemp": 289.9727083333334,
"avgWeekTemps": {
"27": 289.99106382978727,
"30": 289.11
}
}
}
The output will include a top-level entry for every month, whether or not there is any data for that month. However, the avgWeekTemps hash will only have entries for weeks that are present in the data. Both behaviors can be changed, of course.
It's a reusable script that processes arbitrary JSON files in the format you shared.
You mentioned that each file has data from one city, so I figured you'll be running this on multiple files. I set it up so you can pass the path to the data file as a command-line argument. Note that the CLI logic is not sophisticated, so if you're doing funky things you will have a bad time. Doing CLI stuff well is a whole separate topic.
If your data for London is in a file named london.json, this is how you would process that file and save the results to the file london-temps.json:
$ node meantemp.js london.json > london-temps.json
// meantemp.js
const FS = require('fs')
// sets the language used for month names
// for language choices, see: http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
const MONTH_NAME_LANG_CODE = 'en-US'
// generate the list of month names once
const MONTH_NAMES = Array(12).fill().map(
( _, monthNum ) => new Date(2020, monthNum).toLocaleDateString(MONTH_NAME_LANG_CODE, { month: 'long' }).toUpperCase()
)
main()
function main() {
let filepath = process.argv[2]
let cityData = readJsonFile(filepath)
// before working on the data, prep the date values for processing
let allRows = cityData.map(row => {
let _date = new Date(row.dt_iso)
let _weekNum = getWeekNum(_date)
return { ...row, _date, _weekNum }
})
let output = MONTH_NAMES.reduce(( hash, monthName, monthNum ) => {
// grab this month's rows
let monthRows = getRowsForMonth(allRows, monthNum)
// calculate monthly average
let avgMonthTemp = getMeanTemp(monthRows)
// calculate weekly averages
let rowsByWeekNum = groupRowsByWeekNum(monthRows)
let avgWeekTemps = Object.keys(rowsByWeekNum)
.reduce(( hash, weekNum ) => ({
...hash,
[weekNum]: getMeanTemp(rowsByWeekNum[weekNum])
}), {})
return {
...hash,
[monthName]: { avgMonthTemp, avgWeekTemps }
}
}, {})
console.log(JSON.stringify(output))
}
function readJsonFile( path ) {
try {
let text = FS.readFileSync(path, 'utf8')
return JSON.parse(text)
} catch ( error ) {
if(error.code === 'ENOENT') {
console.error(`Could not find or read path ${JSON.stringify(path)}`)
process.exit()
} else if(error instanceof SyntaxError) {
console.error(`File is not valid JSON`)
process.exit()
} else {
throw error
}
}
}
function getRowsForMonth( rows, monthNum ) {
return rows.filter(row => monthNum === row._date.getUTCMonth())
}
function groupRowsByWeekNum( rows ) {
return rows.reduce(( hash, row ) => {
if(!hash.hasOwnProperty(row._weekNum)) {
hash[row._weekNum] = []
}
hash[row._weekNum].push(row)
return hash
}, {})
}
// ISO8601-compliant week-of-year function
// taken from https://stackoverflow.com/a/39502645/814463
// modified by me to prevent mutation of args
function getWeekNum( date ) {
// if date is a valid date, create a copy of it to prevent mutation
date = date instanceof Date
? new Date(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate())
: new Date()
let nDay = (date.getDay() + 6) % 7
date.setDate(date.getDate() - nDay + 3)
let n1stThursday = date.valueOf()
date.setMonth(0, 1)
if (date.getDay() !== 4) {
date.setMonth(0, 1 + ((4 - date.getDay()) + 7) % 7)
}
return 1 + Math.ceil((n1stThursday - date) / 604800000)
}
function getMeanTemp( hourlyReadings ) {
let temps = hourlyReadings.map(reading => reading.main.temp)
let mean = getMean(temps)
return mean
}
function getMean( numbers ) {
let sum = numbers.reduce(( sum, num ) => sum + num, 0)
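// note: an empty input array yields NaN here (serialized as null in the JSON output), which is what months or weeks with no data will show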
let mean = sum / numbers.length
return mean
}
I have a little newbie problem; I couldn't find a solution to similar problems that worked for me.
Here is my collection:
{
"_id": ObjectId("5bc712851224ceec702d9bdf"),
"index": "123456",
"name": "Jan",
"surname": "Nowak",
"grades": {
"IABD": [
2,
3.5,
4
],
"NPAD": [
4,
4,
5
]
}
}
Now I need to push additional grades to specific courses (passed as function parameters).
I tried tackling it in a few steps, and I'd love for somebody to walk me through it:
First I wanted to get it working without passing the course as a parameter:
function add_grade(index="123456", course="IABD", grade=5.5)
{
db.students.update( {"index" : index }, { $push: { "grades" : { "IABD" : grade } } } );
}
Well, nothing happened (the grade was not added to the list of grades).
I wanted to see some result, so I checked whether $set would work instead, and it did!
function add_grade(index="123456", course="IABD", grade=5.5)
{
db.students.update( {"index" : index }, { $set: { "grades" : { "IABD" : grade } } } );
}
but it threw away my entire grades object (as expected). At least I know I'm on the right track.
Question 1: Why didn't $push work the way I expected?
Question 2: How do I use the course parameter in $set/$push?
Just to clarify Q2: I'm not being lazy; I've tried many approaches, none of which worked. Please help!
You can try the query below, which pushes 6 into IABD:
db.getCollection('students').update( { "index": "123456" }, { $push: { "grades.IABD" : 6 } });
$push is not working the way you expect because the array field is inside an embedded document, so to push to it you need to use dot notation, i.e. instead of
{ "$push": { "grades" : { "IABD" : grade } } }
you need to specify the field using dot notation:
{ "$push": { "grades.IABD" : grade } }
To use the course parameter in the push, build an update object whose key uses dot notation:
{ "grades.<course>" : grade }
for example
var course = "IABD";
var grade = 5.5;
var update = {};
update["grades." + course] = grade;
printjson(update) // prints { "grades.IABD" : 5.5 }
So your function will look like
function add_grade(index="123456", course="IABD", grade=5.5) {
var update = {};
update["grades." + course] = grade;
db.students.update(
{ "index" : index },
{ "$push": update }
);
}
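For example, after pasting that function into the mongo shell, a quick check against your sample document might look like this (the projection is just to keep the output short):
add_grade("123456", "NPAD", 3.5);
db.students.find({ "index": "123456" }, { "grades.NPAD": 1 }).pretty();
// the NPAD array should now end with the new grade: [ 4, 4, 5, 3.5 ]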
I'm trying to search for the existence of a nested element, combined with a timestamp greater than or equal to a value:
db.stats.find( { $and: [ { 'data.Statistics': { $exists: true } }, { timestamp: { $gte: 1 } } ] } );
From the docs I can't see where I'm going wrong, but I'm not getting anything back.
Just doing:
var query = {};
query["data.Statistics"] = {$exists: true}
works however.
The $and operator is not really necessary in this case, since it is implied when you specify a comma-separated list of expressions, so you can rewrite your query as:
db.stats.find({
"data.Statistics": { "$exists": true },
"timestamp": { "$gte": 1 }
});
If you're building the query object in a variable using square-bracket notation, you can approach it like this:
var query = {};
query["data.Statistics"] = { "$exists": true };
query["timestamp"] = { "$gte": 1 };
db.stats.find(query);
I'm new to MongoDB and Mongoose and I'm trying to use them to save stock ticks for day-trading analysis. So I imagined this schema:
symbolSchema = Schema({
name:String,
code:String
});
quotesSchema = Schema({
date:{type:Date, default: Date.now},
open:Number,
high:Number,
low:Number,
close:Number,
volume:Number
});
intradayQuotesSchema = Schema({
id_symbol:{type:Schema.Types.ObjectId, ref:"symbol"},
day:Date,
quotes:[quotesSchema]
});
From my link I receive information like this every minute:
date | symbol | open | high | low | close | volume
2015-03-09 13:23:00|AAPL|127,14|127,17|127,12|127,15|19734
I have to:
Find the ObjectId of the symbol (AAPL).
Discover if the intradayQuote document of this symbol already exists (symbol and date combination)
Discover if the minute OHLCV data of this symbol exists on the quotes array (because it could be repeated)
Update or create the document and update or create the quotes inside the array
I'm able to accomplish this task without verifying if the quotes already exist, but this method can create repeated entries inside the quotes array:
symbol.find({"code":mySymbol}, function(err, stock) {
intradayQuote.findOneAndUpdate({
{ id_symbol:stock[0]._id, day: myDay },
{ $push: { quotes: myQuotes } },
{ upsert: true },
myCallback
});
});
I already tried:
$addToSet instead of $push, but unfortunately this doesn't seem to work with arrays of documents
{ id_symbol: stock[0]._id, day: myDay, 'quotes["date"]': myDate } on the conditions of findOneAndUpdate; but unfortunately if mongo doesn't find a match, it creates a new document for the minute instead of appending to the quotes array.
Is there a way to get this working without using one more query (I'm already using 2)? Should I rethink my Schema to facilitate this job? Any help will be appreciated. Thanks!
Basically put, the $addToSet operator cannot work for you because your data is not a true "set", which by definition is a collection of "completely distinct" objects.
The other piece of logical sense here is that you would be working on the data as it arrives, either as a single object or a feed. I'll presume it's a feed of many items in some form and that you can use some sort of stream processor to arrive at this structure per document received:
{
"date": new Date("2015-03-09 13:23:00.000Z"),
"symbol": "AAPL",
"open": 127.14
"high": 127.17,
"low": 127.12
"close": 127.15,
"volume": 19734
}
Convert to a standard decimal format as well as a UTC date, since locale formatting really should be the domain of your application once the data is retrieved from the datastore.
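For example, a minimal sketch of that normalization step, assuming each feed line is pipe-delimited with comma decimal separators as in the sample in your question, might be:
// hypothetical helper: turn one raw feed line into the normalized document above
function parseFeedLine(line) {
    var parts = line.split("|");
    var toNum = function (s) { return parseFloat(s.replace(",", ".")); };
    return {
        date: new Date(parts[0].replace(" ", "T") + "Z"),  // treat the timestamp as UTC
        symbol: parts[1],
        open: toNum(parts[2]),
        high: toNum(parts[3]),
        low: toNum(parts[4]),
        close: toNum(parts[5]),
        volume: parseInt(parts[6], 10)
    };
}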
I would also at least flatten out your "intradayQuotesSchema" a little by removing the reference to the other collection and just putting the data in there. You would still need a lookup on insertion, but the overhead of the additional populate on read seems more costly than the storage overhead:
intradayQuotesSchema = Schema({
symbol:{
name: String,
code: String
},
day:Date,
quotes:[quotesSchema]
});
It depends on your usage patterns, but it's likely to be more effective that way.
The rest really comes down to what is acceptable to you. One approach is to simply chain the updates:
stream.on("data",function(data) {
var symbolCode = data.symbol,
myDay = new Date(
data.date.valueOf() -
( data.date.valueOf() % ( 1000 * 60 * 60 * 24 ) ));
delete data.symbol;
symbol.findOne({ "code": symbolCode },function(err,stock) {
intraDayQuote.findOneAndUpdate(
{ "symbol.code": symbol , "day": myDay },
{ "$setOnInsert": {
"symbol.name": stock.name
"quotes": [data]
}},
{ "upsert": true }
function(err,doc) {
intraDayQuote.findOneAndUpdate(
{
"symbol.code": symbol,
"day": myDay,
"quotes.date": data.date
},
{ "$set": { "quotes.$": data } },
function(err,doc) {
intraDayQuote.findOneAndUpdate(
{
"symbol.code": symbol,
"day": myDay,
"quotes.date": { "$ne": data.date }
},
{ "$push": { "quotes": data } },
function(err,doc) {
}
);
}
);
}
);
});
});
If you don't actually need the modified document in the response then you would get some benefit by implementing the Bulk Operations API here and sending all updates in this package within a single database request:
stream.on("data",function(data) {
var symbolCode = data.symbol,
myDay = new Date(
data.date.valueOf() -
( data.date.valueOf() % ( 1000 * 60 * 60 * 24 ) ));
delete data.symbol;
symbol.findOne({ "code": symbol },function(err,stock) {
var bulk = intraDayQuote.collection.initializeOrderedBulkOp();
bulk.find({ "symbol.code": symbol , "day": myDay })
.upsert().updateOne({
"$setOnInsert": {
"symbol.name": stock.name
"quotes": [data]
}
});
bulk.find({
"symbol.code": symbol,
"day": myDay,
"quotes.date": data.date
}).updateOne({
"$set": { "quotes.$": data }
});
bulk.find({
"symbol.code": symbol,
"day": myDay,
"quotes.date": { "$ne": data.date }
}).updateOne({
"$push": { "quotes": data }
});
bulk.execute(function(err,result) {
// maybe do something with the response
});
});
});
The point is that only one of the statements there will actually modify data, and since this is all sent in the same request there is less back and forth between the application and server.
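If you do want some feedback, the BulkWriteResult passed to the execute callback carries counters you can inspect; a minimal sketch (purely optional) would be:
bulk.execute(function(err, result) {
    if (err) throw err;
    // BulkWriteResult exposes counters such as nUpserted, nMatched and nModified
    console.log("upserted:", result.nUpserted, "modified:", result.nModified);
});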
The alternative is that it might just be simpler in this case to keep the actual quote data referenced in another collection. This then just becomes a simple matter of processing upserts:
intradayQuotesSchema = Schema({
symbol:{
name: String,
code: String
},
day:Date,
quotes:[{ type: Schema.Types.ObjectId, ref: "quote" }]
});
// and in the stream processor
stream.on("data",function(data) {
var symbolCode = data.symbol,
myDay = new Date(
data.date.valueOf() -
( data.date.valueOf() % ( 1000 * 60 * 60 * 24 ) ));
delete data.symbol;
symbol.findOne({ "code": symbol },function(err,stock) {
quote.update(
{ "date": data.date },
{ "$setOnInsert": data },
{ "upsert": true },
function(err,num,raw) {
if ( !raw.updatedExisting ) {
intraDayQuote.update(
{ "symbol.code": symbol , "day": myDay },
{
"$setOnInsert": {
"symbol.name": stock.name
},
"$addToSet": { "quotes": data }
},
{ "upsert": true },
function(err,num,raw) {
}
);
}
}
);
});
});
It really comes down to how important it is to you to have the quote data nested within the "day" document. The main distinction is whether you want to query those documents based on the data in some of those "quote" fields, or otherwise live with the overhead of using .populate() to pull in the "quotes" from the other collection.
Of course, if the quotes are referenced and their data is important to your query filtering, then you can always query that collection for the matching _id values and use an $in query on the "day" documents to match only the days that contain those "quote" documents.
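A rough sketch of that two-step lookup (the filter on close is just a made-up example, not something from your data):
// step 1: find the _id values of quotes matching a filter on the quote fields
quote.find({ "close": { "$gte": 127 } }, "_id", function(err, docs) {
    var ids = docs.map(function(q) { return q._id; });
    // step 2: match only the "day" documents that reference those quotes
    intraDayQuote.find({ "quotes": { "$in": ids } }, function(err, days) {
        // days now holds the matching documents
    });
});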
Which path you take is a big decision, and what matters most is how your application uses the data. Hopefully this should guide you on the general concepts behind doing what you want to achieve.
P.S. Unless you are "sure" that your source data is always rounded to an exact "minute", you probably want to employ the same kind of date-rounding math used to get the discrete "day" as well.
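That would be the same trick with a smaller modulus, for example:
// round a timestamp down to the start of its minute, same idea as the "day" rounding above
var myMinute = new Date(data.date.valueOf() - ( data.date.valueOf() % ( 1000 * 60 ) ));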
I am trying to load the values from the local SQLite DB (which I have tested; the values are there) into a global model that I can use in my views. When I try to print the values of the model after creating it, using Ti.API.info(library.at(i)); in index.js, it returns undefined most of the time and sometimes a function such as function lastIndexOf() { [native code] }. What is going on and how can I fix it? Here is my model (UpcomingRaces.js):
exports.definition = {
config: {
columns: {
"Name": "string",
"Location": "string",
"Date": "string",
"Url": "string"//,
//"Id" : "INTEGER PRIMARY KEY AUTOINCREMENT"
},
defaults: {
"Name": "string",
"Location": "string",
"Date": "string",
"Url": "string"
},
adapter: {
type: "sql",
collection_name: "UpcomingRaces",
//idAttribute: "Id"
}
},
extendModel: function(Model) {
_.extend(Model.prototype, {
// extended functions and properties go here
});
return Model;
},
extendCollection: function(Collection) {
_.extend(Collection.prototype, {
// extended functions and properties go here
comparator: function(UpcomingRaces) {
return UpcomingRaces.get('Name');
}
});
return Collection;
}
};
Here is how I am reading it into a model (index.js):
var library = Alloy.Collections.UpcomingRaces;
library.fetch();
function prepareView()
{
// Automatically Update local DB from remote DB
updateRaces.open('GET', 'http://fakeurl.com/api/UpcomingRaces');
updateRaces.send();
library && library.fetch();
// Read DB to create current upcomingRaces model
// Insert the JSON data to the table view
for ( var i in library ) {
Ti.API.info(library.at(i));
data.push(Alloy.createController('row', {
Name : library[i]['Name'],
Date : library[i]['Date']
}).getView());
}
$.table.setData(data);
}
Also, I've got this in my alloy.js file:
Alloy.Collections.UpcomingRaces = Alloy.createCollection('UpcomingRaces');
The problem is with your for loop:
// Insert the JSON data to the table view
for ( var i in library ) { // i here is a property name of the collection (a string), not a numeric index or a model
Ti.API.info(library.at(i)); // <-- so this returns undefined, or sometimes a function
data.push(Alloy.createController('row', {
Name : library[i]['Name'],
Date : library[i]['Date']
}).getView());
}
A for...in loop iterates over the collection object's property names, not over the models it holds, which is why you see undefined and stray functions. Iterate over the models instead, for example with the collection's each method. So the correct code would be:
// Insert the JSON data to the table view
library.each(function(element) {
Ti.API.info(JSON.stringify(element));
data.push(Alloy.createController('row', {
Name : element.get('Name'),
Date : element.get('Date')
}).getView());
});
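If you prefer to keep an index-based loop, a minimal sketch using at() with a numeric index (which is what it expects) would be:
for (var i = 0; i < library.length; i++) {
    var element = library.at(i); // at() takes a numeric index into the collection
    Ti.API.info(JSON.stringify(element));
    data.push(Alloy.createController('row', {
        Name : element.get('Name'),
        Date : element.get('Date')
    }).getView());
}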