Guidance on user data management and comparison in JavaScript

I'm looking for someone to point me in the right direction on a small project I'm working on in JavaScript. The idea is that the user inputs some raw data (copied and pasted from a website) into a form box or input of some sort on day one, then again on day two, and so on.
What I would like the JS to do is compare the two sets of data and return any changes. For example:
Day One Raw Data: (copy and pasted from a website)
Welcome
Home
Contact
Info
A
Andy 29 5'9 $3000 low
B
Betty 19 4'8 $2800 low
Bella 23 5'2 £4300 medium
C
Charles 43 5'3 $5000 high
Your local date/time is Thu Jan 11 2018 20:58:14 GMT+0000 (GMT Standard Time).
Current server date/time is 11-01-2018 | 21:58
Logout
Day Two Raw Data: (copy and pasted from a website)
Welcome
Home
Contact
Info
A
Andy 29 5'9 $3200 low
B
Betty 19 4'8 $2900 low
Bella 23 5'2 £3900 high
C
Charles 43 5'3 $7000 high
Carrie 18 5'8 $1000 medium
Your local date/time is Thu Jan 11 2018 20:58:14 GMT+0000 (GMT Standard Time).
Current server date/time is 11-01-2018 | 21:58
Logout
The only data I'm looking to compare is the name + information lines, for example:
Andy 29 5'9 $3200 low
The rest of the raw data is just noise which should always be the same: the links at the top of the page, the footer at the bottom, and the A, B, C etc. alphabet links.
What I would like the outcome to be is something like the following:
Results: (printed to page)
Andy 29 5'9 $3200 low --- (+ $200)
Betty 19 4'8 $2900 low --- (+ $100)
Bella 23 5'2 £3900 high --- (- $400 medium)
Charles 43 5'3 $7000 high --- (+ $2000)
Carrie 18 5'8 $1000 medium --- (**New Entry**)
How the results are displayed and the actual figures are irrelevant. I'm looking for suggestions for methods of achieving this kind of data comparison, where I ignore certain parts of the raw input and compare those that matter, then report back the new and removed entries and any changes to entries present on both days. The only thing that will ever change is the number of people in the raw data; the headers, footers and alphabet tags will always be there.
Hopefully I've explained this well enough to get pointed in the right direction. Thanks in advance for any help.

OK, this is messy (it's late), but I think it will do what you want.
There is huge room for cleaning this up, so take it as a steer in the right direction. The key is that you need regex to analyse the strings. After that there's a fair amount of manipulation to compare the results.
<script>
var dayOne = `Welcome
Home
Contact
Info
A
Andy 29 5'9 $3000 low
B
Betty 19 4'8 $2800 low
Bella 23 5'2 £4300 medium
C
Charles 43 5'3 $5000 high
Your local date/time is Thu Jan 11 2018 20:58:14 GMT+0000 (GMT Standard Time).
Current server date/time is 11-01-2018 | 21:58
Logout `;
var dayTwo = `
Welcome
Home
Contact
Info
A
Andy 29 5'9 $3200 low
B
Betty 19 4'8 $2900 low
Bella 23 5'2 £3900 high
C
Charles 43 5'3 $7000 high
Carrie 18 5'8 $1000 medium
Your local date/time is Thu Jan 11 2018 20:58:14 GMT+0000 (GMT Standard Time).
Current server date/time is 11-01-2018 | 21:58
Logout `;
/**
 * Converts an array to an object with keys for later comparison
 */
function convertNamesToKeys(arr){
    var obj = {}
    for(var i=0, j=arr.length; i<j; i+=1){
        var name = arr[i].substring(0,arr[i].indexOf(' '));
        obj[name] = arr[i];
    }
    return obj;
}
/**
 * Count object length
 */
function getObjectLength(obj) {
    var length = 0;
    for( var key in obj ) {
        if( obj.hasOwnProperty(key) ) {
            length+=1;
        }
    }
    return length;
};
/**
 * Compares two objects for differences in values
 * retains objects with different keys
 */
function compareObjectValue(primaryObject, secondaryObject){
    for(var name in primaryObject){
        if( primaryObject.hasOwnProperty(name)
            && name in secondaryObject){
            if(primaryObject[name] === secondaryObject[name]){
                delete primaryObject[name];
            }
        }
    }
    //This is your final array which should contain just unique values between the two days
    console.log(primaryObject);
}
//split the large string into lines for manageability and simplicity of regex
var dayOneArray = dayOne.match(/[^\r\n]+/g);
var dayTwoArray = dayTwo.match(/[^\r\n]+/g);
//discard any lines which are noise
var regex = /^[a-z\s0-9']+(\$|£)[0-9\sa-z]+$/i
var dayOneFiltered = dayOneArray.filter(line => regex.test(line));
var dayTwoFiltered = dayTwoArray.filter(line => regex.test(line));
//convert the arrays into objects using name as key for easy comparison
var dayOneConverted = convertNamesToKeys(dayOneFiltered);
var dayTwoConverted = convertNamesToKeys(dayTwoFiltered);
//Determine which of the two objects is the larger and loop that one
//We will unset keys which have values that are the same and leave keys
//in the larger array which must be unique - not sure if you want that?
if( getObjectLength(dayOneConverted) > getObjectLength(dayTwoConverted)){
    compareObjectValue(dayOneConverted, dayTwoConverted)
}
else {
    compareObjectValue(dayTwoConverted, dayOneConverted);
}
</script>
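The question also asks for new entries, removed entries and the size of each change to be reported. A minimal sketch of one way to do that follows. It is not part of the script above; the function names (parseRecords, diffDays), the report wording and the assumed field layout (name, age, height, currency amount, level) are illustrative choices based on the sample data.

```javascript
// Matches a record line: name, age, height, currency symbol + amount, level.
// The field layout is an assumption based on the sample data in the question.
const RECORD = /^(\S+)\s+(\d+)\s+(\S+)\s+([$£])(\d+)\s+(\w+)$/;

// Parse raw pasted text into a Map of name -> record, ignoring noise lines.
function parseRecords(raw) {
  const records = new Map();
  for (const line of raw.split(/\r?\n/)) {
    const m = RECORD.exec(line.trim());
    if (m) {
      records.set(m[1], { line: m[0], amount: Number(m[5]), level: m[6] });
    }
  }
  return records;
}

// Compare two days and report new, changed and removed entries.
function diffDays(rawOne, rawTwo) {
  const one = parseRecords(rawOne);
  const two = parseRecords(rawTwo);
  const report = [];
  for (const [name, rec] of two) {
    const old = one.get(name);
    if (!old) {
      report.push(`${rec.line} --- (new entry)`);
    } else if (old.line !== rec.line) {
      const delta = rec.amount - old.amount;
      const sign = delta >= 0 ? '+' : '-';
      const level = old.level !== rec.level ? `, was ${old.level}` : '';
      report.push(`${rec.line} --- (${sign}${Math.abs(delta)}${level})`);
    }
  }
  for (const [name, rec] of one) {
    if (!two.has(name)) report.push(`${rec.line} --- (removed)`);
  }
  return report;
}

// Shortened sample input (hypothetical, trimmed from the question's data).
const sampleOne = `Welcome
A
Andy 29 5'9 $3000 low
B
Bella 23 5'2 £4300 medium
Logout`;
const sampleTwo = `Welcome
A
Andy 29 5'9 $3200 low
C
Carrie 18 5'8 $1000 medium
Logout`;

const sampleReport = diffDays(sampleOne, sampleTwo);
console.log(sampleReport.join('\n'));
```

Keying the records by name in a Map means the comparison does not depend on the order of the lines or on which day has more entries.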

Related

Strategy to Loop Through 2D Array from Google Sheets To Create Entry for Each Date and ID

I have a Google Sheet that contains data I need to manipulate. It is essentially a list of assignments by student. The original columns are:
Name, ID, Assignment 1, Assignment 2, Assignment 3, Assignment 4, Date, Overall Grade
Using existing code, I am concatenating the Assignments into a single field and creating an array with these columns:
ID, Name, Assignments, Date, Overall Grade
The resulting array looks like this:
[ [ '1234',
'Santa Claus',
'US History Chapter 1.1, , , ',
Fri Nov 18 2022 00:00:00 GMT-0800 (Pacific Standard Time),
'B+' ],
[ '1234',
'Santa Claus',
'US History Chapter 2.1, US History Chapter 1.1, , ',
Thu Nov 17 2022 00:00:00 GMT-0800 (Pacific Standard Time),
'B' ],
[ '12222',
'Mary Poppins',
'US History Chapter 8, , , ',
Fri Nov 18 2022 00:00:00 GMT-0800 (Pacific Standard Time),
'A' ]]
On a separate sheet I have a list of dates. What I want is to have a line for every single student and date combination and the assignment data if it exists. E.g. If the dates are Nov 17, Nov 18, and Nov19 and there are 2 students with data in the list, there would be 6 total entries sorted by student then by date. For the Assignments column, there would only be data if there was an assignment entered for that date and that student. Otherwise it would be blank. For example:
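ID    | Name         | Assignments                         | Date   | Grade
------|--------------|-------------------------------------|--------|------
1234  | Santa Claus  | Chapter 2.1, US History Chapter 1.1 | Nov 17 | B
1234  | Santa Claus  | US History Chapter 1.1              | Nov 18 | B+
1234  | Santa Claus  |                                     | Nov 19 |
12222 | Mary Poppins |                                     | Nov 17 |
12222 | Mary Poppins | US History Chapter 8                | Nov 18 | A
12222 | Mary Poppins |                                     | Nov 19 |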
What I think is needed is something like this:
Get unique list of all student IDs from the array
Get list of dates from the sheet
Use a nested loop to go through each date and each student to create the lines. If there is a match for the assignment data, add it, if not, leave that blank.
I'm just not sure how to manipulate the array to do something like that. How would I approach this?
Unfortunately I cannot see your actual array, but based on the sample array and sample table you showed, how about the following sample script?
Sample script:
const array = [,,,]; // Please set your array.
const res = Object.entries(array.reduce((o, e) => (o[e[0]] = o[e[0]] ? [...o[e[0]], e] : [e], o), {}))
.sort(([a], [b]) => Number(a) > Number(b) ? 1 : -1)
.flatMap(([, v]) => v.sort((a, b) => a[3].getTime() > b[3].getTime() ? 1 : -1));
console.log(res)
In this sample script, an object is first created by grouping the rows on column "A" (the ID). The groups are then sorted by column "A", the rows within each group are sorted by column "D" (the date), and all values are merged back into a 2-dimensional array.
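The nested-loop idea from the question (one row per student and date, blank where no assignment exists) could be sketched as follows. This is an illustration only: the function name expandByDate and the sample values are hypothetical, and it assumes the column order ID, Name, Assignments, Date, Grade with Date objects in column "D".

```javascript
// Build one row per (student, date) pair. When a record exists for that pair
// it is reused; otherwise Assignments and Grade are left blank.
function expandByDate(rows, dates) {
  // Compare dates by local calendar day so differing times of day still match.
  const dayKey = (d) => `${d.getFullYear()}-${d.getMonth()}-${d.getDate()}`;
  // Index existing rows by "id|day" for direct lookup.
  const byKey = new Map(rows.map((r) => [`${r[0]}|${dayKey(r[3])}`, r]));
  // Unique students, keeping the first name seen for each ID.
  const students = new Map();
  for (const [id, name] of rows) {
    if (!students.has(id)) students.set(id, name);
  }
  const sortedIds = [...students.keys()].sort((a, b) => Number(a) - Number(b));
  const sortedDates = [...dates].sort((a, b) => a.getTime() - b.getTime());
  const out = [];
  for (const id of sortedIds) {
    for (const date of sortedDates) {
      const hit = byKey.get(`${id}|${dayKey(date)}`);
      out.push(hit ? hit : [id, students.get(id), '', date, '']);
    }
  }
  return out;
}

// Hypothetical sample data modelled on the question (month index 10 = November).
const sampleRows = [
  ['1234', 'Santa Claus', 'US History Chapter 1.1', new Date(2022, 10, 18), 'B+'],
  ['1234', 'Santa Claus', 'US History Chapter 2.1, US History Chapter 1.1', new Date(2022, 10, 17), 'B'],
  ['12222', 'Mary Poppins', 'US History Chapter 8', new Date(2022, 10, 18), 'A'],
];
const sampleDates = [new Date(2022, 10, 17), new Date(2022, 10, 18), new Date(2022, 10, 19)];

const expanded = expandByDate(sampleRows, sampleDates);
console.log(expanded.length); // 2 students x 3 dates
```

If a student can have more than one record on the same day, the Map keeps only the last one, so that case would need separate handling.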
Note:
If this sample script doesn't return your expected values, can you provide detailed sample input values and sample output values? With those, I would like to confirm it.
References:
reduce()
sort()

DC.js computing mean according to dimension

I have a dataset that looks something like this:
MONTH CAT VAL
may A 1.0
may B 3.2
may C 4.6
jun A 2.7
jun B 4.2
jun C 5.8
jul A 4.1
jul B 9.2
jul C 13.0
I've been able to create a chart in DC.js that shows the sum of VAL according to the CAT variable, with this code:
let chart = dc.barChart('#chart');
let ndx = crossfilter(data);
let catDim = ndx.dimension(function(d){return d.cat;});
let catGroup = catDim.group().reduceSum(function(d){return +d.val;});
chart
.dimension(catDim)
.group(catGroup)
.x(d3.scale.ordinal().domain(catDim))
.xUnits(dc.units.ordinal)
.elasticY(true)
My problem is that instead of the sum, I would like the chart to show the average of VAL per MONTH for each CAT (MONTH can be filtered in another chart).
Is there a way to do it?
Thanks in advance for your answers!
So instead of using crossfilter to "just" keep track of the sum, use a custom reduce that tracks both the sum and the number of items, and add a valueAccessor that returns x.value.sum / x.value.qty.
I would suggest using reductio to handle the custom reduce; check its examples, there is one for the average.
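Without reductio, the custom reduce could look roughly like the sketch below. The three callbacks follow the add/remove/initial shape that crossfilter's group.reduce() takes; the names reduceAdd, reduceRemove, reduceInitial and average are illustrative, and the wiring to catDim and chart from the question is shown only as comments so the block runs on its own.

```javascript
// Initial state for each group: running sum and item count.
function reduceInitial() {
  return { sum: 0, qty: 0 };
}

// Called when a record enters the group (for example, a filter is relaxed).
function reduceAdd(p, v) {
  p.sum += +v.val;
  p.qty += 1;
  return p;
}

// Called when a record leaves the group (for example, a month is filtered out).
function reduceRemove(p, v) {
  p.sum -= +v.val;
  p.qty -= 1;
  return p;
}

// Guard against division by zero when every record of a group is filtered out.
function average(p) {
  return p.qty ? p.sum / p.qty : 0;
}

// Wiring, assuming catDim and chart as defined in the question:
//   const catAvgGroup = catDim.group().reduce(reduceAdd, reduceRemove, reduceInitial);
//   chart.group(catAvgGroup).valueAccessor(d => average(d.value));

// Standalone check without crossfilter: category A across the three months.
const groupA = [{ val: 1.0 }, { val: 2.7 }, { val: 4.1 }].reduce(reduceAdd, reduceInitial());
console.log(average(groupA)); // approximately 2.6
```

Because the group stores sum and count rather than the average itself, filtering months in another chart keeps the result correct: records are added and removed incrementally and the division happens only at display time.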

javascript "with" structure produce undefined vars

I have an object called totalRevenue which is responsible for storing stacked revenue of company on a monthly basis.
var totalRevenue = {
jan : 147,
feb : 290,
mar : 400
};
Not all month names are present in totalRevenue object & they are created dynamically when populating/estimating their respective values.
At the beginning of every month, we estimate the revenue of that month using an algorithm. Something like this:
with(totalRevenue){
mar = mar - feb; // here we calculate pure mar revenue
feb = feb - jan; // here we calculate pure feb revenue
apr = mar - feb; // this is part one of the algo.
}
(I'm using with construct to avoid repetition.)
Now I'm using totalRevenue.apr for the rest of the algorithm's computations. But after some struggle, I found that totalRevenue.apr is undefined!
Does anyone know why? I expected it to have a value equal to mar - feb.
If apr isn't a property of totalRevenue, then it's not brought into scope by with, and your code will create a new global named apr. There would be no way for the interpreter to know whether a given name refers to a global or was intended to refer to a heretofore undefined property of the nearest with block, so the assumption is that it's a global. You can either ensure that totalRevenue has a property for every month or avoid using the with statement entirely. Using with is discouraged; MDN has this to say:
Use of the with statement is not recommended, as it may be the source of confusing bugs and compatibility issues. See the "Ambiguity Contra" paragraph in the "Description" section below for details.
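A minimal sketch of the second option, avoiding with entirely. The short alias r is only an illustrative name; it cuts the repetition while making every assignment land on the object itself.

```javascript
const totalRevenue = { jan: 147, feb: 290, mar: 400 };

// A short alias reduces repetition without changing how names are resolved.
const r = totalRevenue;
r.mar = r.mar - r.feb; // pure March revenue
r.feb = r.feb - r.jan; // pure February revenue
r.apr = r.mar - r.feb; // creates the apr property on totalRevenue itself

console.log(totalRevenue.apr); // -33
```

This version also runs in strict mode, where the with statement is a syntax error.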

Predict next time series data point (time and value)

I have time series data that associates a measurement with a time. Suppose that it is an app where a user enters their height whenever they want. Based on past behavior, I not only want to predict what their next height measurement is, but I also want to predict when the measurement will be entered.
Sample data for a single person:
Date | Measurement
-------------|------------
Nov 8, 2014 | 1.42 m
Nov 23, 2014 | 1.43 m
Mar 8, 2015 | 1.48 m
Jun 15, 2015 | 1.52 m
Dec 18, 2015 | 1.52 m
Mar 1, 2016 | 1.59 m
Nov 8, 2016 | 1.60 m
What I want to predict is the next data point in this series. For example, it might be (Dec 8, 2016, 1.61 m).
My initial thoughts have been to make two separate models, one that is simply the time data with x values being indices. For example
0 | Nov 8, 2014
1 | Nov 23, 2014
2 | Mar 8, 2015
3 | Jun 15, 2015
4 | Dec 18, 2015
5 | Mar 1, 2016
6 | Nov 8, 2016
(where the dates have been converted to minutes since 1970 or something).
Use this model to predict the next time point, then the original model to predict, at that time point, what will be the measurement.
In terms of algorithms to use I was thinking to use a Kalman filter for both models.
My question is that I feel like I am missing something, or possibly over complicating this problem. Does anyone have an idea for an alternative solution?
I will be implementing in javascript with hopefully no external libraries.
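As a baseline for the two-model idea described above, here is a dependency-free sketch that swaps the Kalman filter for ordinary least-squares line fits: one fit maps the entry index to the entry time, the other maps time to the measurement. This is a deliberately simpler technique than the one proposed, and the names fitLine, indexModel and valueModel are hypothetical. Dates are converted to days since 1970.

```javascript
// Ordinary least-squares fit of y = intercept + slope * x.
function fitLine(xs, ys) {
  const n = xs.length;
  const meanX = xs.reduce((s, x) => s + x, 0) / n;
  const meanY = ys.reduce((s, y) => s + y, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < n; i++) {
    sxy += (xs[i] - meanX) * (ys[i] - meanY);
    sxx += (xs[i] - meanX) ** 2;
  }
  const slope = sxy / sxx;
  return { intercept: meanY - slope * meanX, slope };
}

const predict = (model, x) => model.intercept + model.slope * x;

// Sample data from the question (date-only ISO strings are parsed as UTC).
const samples = [
  ['2014-11-08', 1.42], ['2014-11-23', 1.43], ['2015-03-08', 1.48],
  ['2015-06-15', 1.52], ['2015-12-18', 1.52], ['2016-03-01', 1.59],
  ['2016-11-08', 1.60],
];
const DAY_MS = 24 * 60 * 60 * 1000;
const days = samples.map(([d]) => Date.parse(d) / DAY_MS);
const values = samples.map(([, v]) => v);
const indices = samples.map((_, i) => i);

// Model 1: entry index -> entry time. Model 2: entry time -> measurement.
const indexModel = fitLine(indices, days);
const valueModel = fitLine(days, values);

const nextDay = predict(indexModel, samples.length);
const nextValue = predict(valueModel, nextDay);

console.log(new Date(nextDay * DAY_MS).toISOString().slice(0, 10), nextValue.toFixed(2));
```

A straight line is a crude model for both series (entry gaps are irregular and height levels off), so this is mainly useful as a reference point to compare a Kalman filter or other model against.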

Java Date timezone printing different timezones for different years, Workaround needed

While testing my application I got a weird problem. When I put a date having the year before 1945, it changes the timezone.
I have got this simple program to show the problem.
public static void main(String[] args) {
    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ssZ");
    Calendar calendar = Calendar.getInstance();
    System.out.println("**********Before 1945");
    calendar.set(1943, Calendar.APRIL, 12, 5, 34, 12);
    System.out.println(format.format(calendar.getTime()));
    System.out.println(calendar.getTime());
    System.out.println("**********After 1945");
    calendar.set(1946, Calendar.APRIL, 12, 5, 34, 12);
    System.out.println(format.format(calendar.getTime()));
    System.out.println(calendar.getTime());
}
The output I am getting is below:-
**********Before 1945
1943-04-12 05:34:12+0630
Mon Apr 12 05:34:12 IDT 1943
**********After 1945
1946-04-12 05:34:12+0530
Fri Apr 12 05:34:12 IST 1946
For the first one, I am getting it as +0630 and IDT, while for the second one, I am getting +0530 and IST which is expected.
Edit:-
After looking at @Elliott Frisch's answer I tried a date before 1942:-
calendar.set(1915, Calendar.APRIL, 12, 5, 34, 12);
System.out.println(format.format(calendar.getTime()));
System.out.println(calendar.getTime());
output:-
1915-04-12 05:34:12+0553
Mon Apr 12 05:34:12 IST 1915
Here again, it says IST but shows +0553. Shouldn't it be +0530?
Just for a comparison, I tried same thing in javascript:-
new Date("1946-04-12 05:34:12") //prints Fri Apr 12 1946 05:34:12 GMT+0530 (IST)
new Date("1943-04-12 05:34:12") //prints Fri Apr 12 1943 05:34:12 GMT+0530 (IST)
new Date("1915-04-12 05:34:12") //prints Mon Apr 12 1915 05:34:12 GMT+0530 (IST)
Which works fine. I want to know why Java is affected by it, and if it's a known problem, what the possible workaround is.
Thanks in advance.
This is likely the expected behaviour from Java (and not from JavaScript).
As implied by the comment by RobG above, programming languages may or may not support historical time rules (such as DST and timezone offsets). In your case, it appears that your Java runtime supports it, whereas your JavaScript runtime does not.
A list of historical timezones and DST rules for India can be found at timeanddate.com. The list confirms the timezone offsets of your Java dates:
Until 1941: UTC+5:53:20
1941: UTC+6:30
1942: UTC+5:30
1943-44: UTC+6:30
From 1945: UTC+5:30
Checking your dates against Wolfram|Alpha further confirms your Java date UTC offsets: 1915, 1943, 1946
Wikipedia provides more information about time in India:
Calcutta time was officially maintained as a separate time zone until 1948
Calcutta time could either be specified as UTC+5:54 or UTC+5:53:20. The latter is consistent with your code example.
The Wikipedia entry further informs that the current IST timezone with an offset of UTC+5:30 was not in full effect in all of India until 1955.
As pointed out by Elliott Frisch and confirmed by the link to timeanddate.com above, DST was in effect during WWII. In your comment to his answer, you state:
is this the way we are supposed to save in database and use it in applications, or we use some workaround for it
I guess it depends. If you really need to distinguish dates as points in time accurately, you would need a timezone-independent representation such as UTC or unix time (or milliseconds since the unix epoch). If you just work with local dates from the same timezone, a simple string representation (e.g. YYYY-MM-DD hh:mm:ss) could suffice.
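Since the question also tried JavaScript, here is a small sketch of the timezone-independent option in that language. It treats the wall-clock reading from the question as UTC purely for illustration (an assumption), stores it as milliseconds since the Unix epoch, and restores it.

```javascript
// Month index 3 = April. Date.UTC interprets the fields as UTC, so the result
// does not depend on the machine's timezone or on historical offset rules.
const instant = new Date(Date.UTC(1943, 3, 12, 5, 34, 12));

const epochMillis = instant.getTime(); // negative, because 1943 is before 1970
const isoUtc = instant.toISOString();  // "1943-04-12T05:34:12.000Z"

// Round trip: the stored number identifies the same point in time.
const restored = new Date(epochMillis);
console.log(epochMillis, isoUtc, restored.toISOString() === isoUtc);
```

Either the number or the ISO string with the trailing Z can be stored; both identify a point in time without reference to any local offset.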
There was a war. From the wikipedia link, India observed DST during World War 2, from 1942-1945.
java.time
Avoid using the troublesome old date-time classes bundled with the earliest versions of Java. They are now legacy, supplanted by the java.time classes.
ZoneId z = ZoneId.of( "Asia/Kolkata" ); // "Asia/Calcutta"
LocalTime lt = LocalTime.of( 5 , 34 , 12 );
ZonedDateTime zdt1943 = ZonedDateTime.of( LocalDate.of( 1943 , Month.APRIL , 12 ) , lt , z );
ZonedDateTime zdt1945 = ZonedDateTime.of( LocalDate.of( 1945 , Month.APRIL , 12 ) , lt , z );
ZonedDateTime zdt1946 = ZonedDateTime.of( LocalDate.of( 1946 , Month.APRIL , 12 ) , lt , z );
ZonedDateTime zdt2016 = ZonedDateTime.of( LocalDate.of( 2016 , Month.APRIL , 12 ) , lt , z );
Dump to console.
System.out.println( "zdt1943: " + zdt1943 );
System.out.println( "zdt1945: " + zdt1945 );
System.out.println( "zdt1946: " + zdt1946 );
System.out.println( "zdt2016: " + zdt2016 );
See live code in IdeOne.com.
When run, we see the same behavior as described in your Question, with an offset-from-UTC of six and a half hours during the war and five and a half after. We get the same behavior whether using Asia/Kolkata or Asia/Calcutta. The java.time classes use tzdata, formerly known as the Olson Database.
zdt1943: 1943-04-12T05:34:12+06:30[Asia/Kolkata]
zdt1945: 1945-04-12T05:34:12+06:30[Asia/Kolkata]
zdt1946: 1946-04-12T05:34:12+05:30[Asia/Kolkata]
zdt2016: 2016-04-12T05:34:12+05:30[Asia/Kolkata]
In the Question…
When I put a date having the year before 1945, it changes the timezone.
No it does not change the time zone. The results say that in the earlier years “5:34” was defined as six and a half hours ahead of UTC while in later years the definition became five and a half hours ahead of UTC. Just as “5:34” means eight hours behind UTC in Seattle in the summer but seven hours in the winter, because of Daylight Saving Time (DST) nonsense.
But I suspect these may be wrong values; read on.
Time in Calcutta
The behavior we are seeing does not seem to match my reading of this Wikipedia page, Time in Calcutta. That page describes odd offsets other than on-the-hour or on-the-half-hour, such as UTC+05:54, which we are not seeing in any of our respective code samples.
So I suspect tzdata does not contain this historical data for India. But this is just a layperson’s guess; I am no historian.
Do not use date-time types for historical values
While I do not know the specifics of time in this historical period of India and its handling in tzdata, it seems that none of our date-time libraries are handling these historical nuances.
But we should not expect such handling! Know that tzdata does not promise complete coverage of time zones before 1970.
When referring to historical date-time values, I suggest you simply use text rather than any of the date-time data types. The purpose of the data types is validation and calculation. You are likely doing neither for historical values. I cannot imagine you are determining the number of days an invoice is overdue from 1943.
Perhaps you should edit your Question to describe why you are storing these historical date-time values in a database so precisely. If you were merely experimenting and noticed these issues, know that you should not expect precise date-time handling in the far past (before 1970) nor into the far future (past the few weeks notice politicians sometimes give about sudden time zone definition changes).
Upshot: Attempting to precisely handle historical date-time values is fraught with various issues and seems pointless to me.
what is the possible workaround for it
I suggest using “local” date-time values as text in ISO 8601 format without any time zone.
I would recommend keeping the epoch equivalent of the dates in the database. I believe that, irrespective of daylight saving, the date and time recorded for a period represent the actual local time, be it IDT or IST. I would follow the example at https://stackoverflow.com/a/6687502, convert all the dates to epoch values and store them in the DB. I would then reverse the logic to display the date-time from the database, along with an IDT/IST indicator to avoid confusing the user.
