I'm trying to create a hidden Markov model to find recurring payments in this transactions JSON:
https://pastebin.com/tzRaqMxk
I created a similarity score to estimate the likelihood of a transaction's date, amount, and name indicating a recurring transaction.
const nn = require('nearest-neighbor');

// Transactions JSON from https://pastebin.com/tzRaqMxk goes here
const items = [ /* pastebin json here */ ];

const query = { amount: 89.4, name: "SparkFun", date: "2017-05-28" };
const fields = [
  { name: "name", measure: nn.comparisonMethods.word },
  { name: "amount", measure: nn.comparisonMethods.number, max: 100 },
  { name: "date", measure: nn.comparisonMethods.date, max: 31 }
];

nn.findMostSimilar(query, items, fields, function(nearestNeighbor, probability) {
  console.log(query);
  console.log(nearestNeighbor);
  console.log(probability);
});
The first challenge is what to do if the recurring transaction is not on the same day of the month, e.g. it usually happens on the 18th, but because the 18th fell on a Saturday, the payment doesn't clear until the 20th. What statistical measure do I use to score such a transaction as nearly the same, but not with a probability of 1?
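To illustrate what I mean by "nearly the same", here is a rough sketch of one possible measure: a Gaussian-style score on the day-of-month difference (the tolerance value is just a guess, not something from the code above):
// Sketch of a date-proximity score: 1 when the days match exactly,
// decaying smoothly as the difference grows. `tolerance` (in days) is
// an illustrative parameter, not part of the nearest-neighbor library.
function dateProximity(dayA, dayB, tolerance = 3) {
  const raw = Math.abs(dayA - dayB);
  const diff = Math.min(raw, 31 - raw); // wrap around month boundaries
  return Math.exp(-(diff * diff) / (2 * tolerance * tolerance));
}

dateProximity(18, 18); // 1
dateProximity(18, 20); // ~0.8  -> "nearly the same", but not 1
dateProximity(18, 28); // ~0.004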
Then, after I have this array of data, how can I feed it into a hidden Markov model?
Let's say you were given an array of object data from a database by running some query.
And you want to get results for remained_deposit, remained_point, total_balance and refund_amount.
Before we dive into the problem:
'DEPOSIT' is the money that the user deposits.
'POINT' is like additional/extra money that the business provides when the deposit amount is over $30 (a $30 deposit => 3 POINT is given, a $50 deposit => 10 POINT is given).
'POINT' is not refundable.
'DEDUCTION' is the amount of money that the user spent.
The rule is: you need to subtract DEDUCTION from DEPOSIT first, and when DEPOSIT becomes 0 or negative, we subtract the remaining DEDUCTION from POINT (if it exists) - you can see what this means in example 2.
Let's say I got user 1's deposit history by running a query.
const queryResult = [
{
date: '2022-10-10',
amount : 50,
type: 'DEPOSIT',
},
{
date: '2022-10-10',
amount : 10,
type: 'POINT',
},
{
date: '2022-10-10',
amount : -5,
type: 'DEDUCTION',
},
{
date: '2022-10-10',
amount : -5,
type: 'DEDUCTION',
},
];
In this case, the result must be:
const result = {
remained_deposit: 40,
remained_point: 10,
  balance: 50,     // remained_deposit + remained_point
  refundable: 40   // remained_deposit
}
** Explanation: we have a 50 deposit and subtract the two deductions from it
(50 - 5 - 5 = 40).
So, when the user requests a 'refund', the user must get a refund of $40 (not $50).
Let's see another case.
const data2 = [
{
date: '2022-10-10 ',
amount : 50,
type: 'DEPOSIT',
},
{
date: '2022-10-10',
amount : 10,
type: 'POINT',
},
{
date: '2022-10-10',
amount : -30,
type: 'DEDUCTION',
},
{
date: '2022-10-11',
amount : -25,
type: 'DEDUCTION',
},
{
date: '2022-10-11',
amount : 10,
type: 'DEPOSIT',
},
];
In this case, the result must be:
const result2 = {
remained_deposit: 10,
remained_point: 5,
balance: 15,
refundable: 10
}
As we did above, we subtract deduction from deposit first.
50(deposit) - 30(deduction) = 20 (deposit)
20(deposit) - 25(deduction) = -5(deposit.. becomes negative)
In this case, we calculate with the POINT.
-5(deposit) + 10 (POINT) = 5(POINT)
If the user requested a refund at this point, he/she would get nothing,
because the remaining balance, which is POINT (5), is not refundable.
However, since the user deposited 10 after the last deduction,
if he/she requests a refund at this point, the remained_deposit of 10 will be given to the user.
Let's see one more confusing case
const data3 = [
{
date: '2022-10-10',
amount : 50,
type: 'DEPOSIT',
},
{
date: '2022-10-10',
amount : 10,
type: 'POINT',
},
{
date: '2022-10-10',
amount : -40,
type: 'DEDUCTION',
},
{
date: '2022-10-11',
amount : 30,
type: 'DEPOSIT',
},
{
date: '2022-10-11',
amount : 3,
type: 'POINT',
},
{
date: '2022-10-11',
amount : -25,
type: 'DEDUCTION',
},
];
In this case, the result must be:
const result3 = {
remained_deposit: 25,
remained_point: 3,
balance: 28,
refundable: 25
}
**explanation
50(deposit) - 40(deduction) = 10(deposit)
10(deposit) - 25(deduction) = -15(deposit)
**Even though the deposit (30) comes earlier than the deduction (-25), we subtract the deduction from the very first deposit first.
-15(deposit) + 10(point ) = -5(deposit)
-5(deposit) + 30(deposit) = 25(deposit)
What matters is that the first deposit needs to be used up entirely first, and then POINT is next.
I hope the examples gave you some clues about how the refund must be calculated.
If you are still confused, please let me know.
If it makes sense, can you help me write some JS code to get the result?
I tried writing JS code to get the expected result, but I kept failing, which brought me to Stack Overflow.
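For reference, a minimal sketch of one possible implementation of the rules, based purely on the three worked examples above: the total of all deductions is absorbed by the credits (DEPOSIT and POINT entries) in the order they appear, and whatever each credit has left over is the remainder for its type.
// Sketch based on the examples above: deductions are absorbed by credits
// (DEPOSIT and POINT) in the order they appear; what is left per type is
// the remainder. POINT is never refundable.
function calculateRefund(history) {
  // Total amount to cover (DEDUCTION amounts are negative in the data).
  let toDeduct = history
    .filter((e) => e.type === 'DEDUCTION')
    .reduce((sum, e) => sum - e.amount, 0);

  let remainedDeposit = 0;
  let remainedPoint = 0;

  // Let each credit absorb as much of the outstanding deduction as it can.
  for (const entry of history) {
    if (entry.type !== 'DEPOSIT' && entry.type !== 'POINT') continue;
    const used = Math.min(entry.amount, toDeduct);
    toDeduct -= used;
    const left = entry.amount - used;
    if (entry.type === 'DEPOSIT') remainedDeposit += left;
    else remainedPoint += left;
  }

  return {
    remained_deposit: remainedDeposit,
    remained_point: remainedPoint,
    balance: remainedDeposit + remainedPoint,
    refundable: remainedDeposit,
  };
}

console.log(calculateRefund(queryResult)); // { remained_deposit: 40, remained_point: 10, balance: 50, refundable: 40 }
console.log(calculateRefund(data2));       // { remained_deposit: 10, remained_point: 5, balance: 15, refundable: 10 }
console.log(calculateRefund(data3));       // { remained_deposit: 25, remained_point: 3, balance: 28, refundable: 25 }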
I'm working on a project where people fill out a form, and a Mongo aggregation then groups all the people based on the date, time and place they chose.
const matchController = {
generateMatch: async (req, res) => {
const form = await Forms.aggregate([
{
$group: {
_id: { Date: "$date", Time: "$time", Place: "$place" },
Data: {
$addToSet: {
Name: "$firstName",
Surname: "$surname",
Email: "$email",
Date: "$date",
Time: "$time",
Status: "$status",
Place: "$place",
_id: "$_id"
},
},
Total: { $sum: 1 },
},
},
{
$match: {
Total: { $gte: 2 },
},
},
{ $out: "matchs" },
]);
  },
};
But now I would like to add more complexity with some rules; for example, I want each group to be made of just 2 filled forms. I'm also thinking about allowing multiple time selections, so for example, 4 people fill the form with the same date and place but:
Person 1 selects 8:00 a.m. and 12:00 p.m.,
Person 2 selects 12:00 p.m.,
Person 3 selects 8:00 a.m. and 1:00 p.m.,
Person 4 selects 12:00 p.m.
I want a validation so that, in this example, instead of person 1 and 2 matching and person 3 and 4 being left unmatched, it would check and match person 1 with person 3 and person 2 with person 4.
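To illustrate the kind of pairing I'm after, here is a rough sketch in plain JS (outside the aggregation): people with fewer selected times are matched first, so the more flexible ones are kept for the remaining slots. The function and field names here are just illustrative, not part of my schema.
// Greedy pairing sketch: match the least flexible people first.
function pairPeople(people) {
  const unmatched = [...people].sort((a, b) => a.times.length - b.times.length);
  const pairs = [];

  while (unmatched.length > 1) {
    const person = unmatched.shift();
    // Find someone else who shares at least one selected time.
    const idx = unmatched.findIndex((other) =>
      other.times.some((t) => person.times.includes(t))
    );
    if (idx === -1) continue; // nobody left who shares a time with this person
    const partner = unmatched.splice(idx, 1)[0];
    const time = person.times.find((t) => partner.times.includes(t));
    pairs.push({ time, people: [person.name, partner.name] });
  }
  return pairs;
}

pairPeople([
  { name: 'Person 1', times: ['08:00', '12:00'] },
  { name: 'Person 2', times: ['12:00'] },
  { name: 'Person 3', times: ['08:00', '13:00'] },
  { name: 'Person 4', times: ['12:00'] },
]);
// => pairs Person 2 with Person 4 at 12:00, and Person 1 with Person 3 at 08:00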
I know this question is kind of complex so any guidance on how to get there would be deeply appreciated and thanked.
I have one collection that includes values coming from sensors. My collection looks like this:
const MainSchema: Schema = new Schema(
{
deviceId: {
type: mongoose.Types.ObjectId,
required: true,
ref: 'Device',
},
sensorId: {
type: mongoose.Types.ObjectId,
default: null,
ref: 'Sensor',
},
value: {
type: Number,
},
date: {
type: Date,
},
},
{
versionKey: false,
}
);
I want to get data from this collection with my endpoint. This collection has more than 300,000 documents. I want to get the data from this collection together with the sensor data (like the name and description from "Sensor").
My Sensor Collection:
const Sensor: Schema = new Schema(
{
name: {
type: String,
required: true,
min: 3,
},
description: {
type: String,
default: null,
},
type: {
type: String,
},
},
{
timestamps: true,
versionKey: false,
}
);
I use two methods to get data from MainSchema. The first approach looks like this (using aggregate):
startDate, endDate and _sensorId are passed as parameters to these functions.
const data= await MainSchema.aggregate([
{
$lookup: {
from: 'Sensor',
localField: 'sensorId',
foreignField: '_id',
as: 'sensorDetail',
},
},
{
$unwind: '$sensorDetail',
},
{
$match: {
$and: [
{ sensorId: new Types.ObjectId(_sensorId) },
{
date: {
$gte: new Date(startDate),
$lt: new Date(endDate),
},
},
],
},
},
{
$project: {
sensorDetail: {
name: 1,
description: 1,
},
value: 1,
date: 1,
},
},
{
$sort: {
_id: 1,
},
},
]);
The second approach looks like this (using find and populate):
const data= await MainSchema.find({
sensorId: _sensorId,
date: {
$gte: new Date(startDate),
$lte: new Date(endDate),
},
})
.lean()
.sort({ date: 1 })
.populate('sensorId', { name: 1, description: 1});
Execution time for the same data set:
First approach: 25-30 seconds
Second approach: 11-15 seconds
So how can I get this data faster? Which one is best practice? And what else can I do to improve the query speed?
Overall, NeNaD's answer touches on a lot of the important points. What I'm going to say in this one should be considered in addition to that other information.
Index
Just to clarify, the ideal index here would be a compound index of { sensorId: 1, date: 1 }. This index follows the ESR Guidance for index key ordering and will provide the most efficient retrieval of the data according to the query predicates specified in the $match stage.
If the index: true annotation in Mongoose creates two separate single field indexes, then you should go manually create this index in the collection. MongoDB will only use one of those indexes to execute this query which will not be as efficient as using the compound index described above.
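For example, a minimal sketch of defining that compound index on the Mongoose schema from the question (or you can create the equivalent index directly on the collection):
// Compound index following the ESR guidance: equality on sensorId, range on date.
MainSchema.index({ sensorId: 1, date: 1 });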
Also regarding the existing approach, what is the purpose of the trailing $sort?
If the application (a chart in this situation) does not need sorted results then you should remove this stage entirely. If the client does need sorted results then you should:
Move the $sort stage earlier in the pipeline (behind the $match), and
Test if including the sort field in the index improves performance.
As written, the $sort is currently a blocking operation which is going to prevent any results from being returned to the client until they are all processed. If you move the $sort stage up and change it to sort on date (which probably makes sense for sensor data), then it should automatically use the compound index that we mentioned earlier to provide the sort in a non-blocking manner.
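For illustration, a sketch of the pipeline with those two changes applied (sorting on date rather than _id is an assumption about what the chart actually needs):
const data = await MainSchema.aggregate([
  {
    // Equality + range predicates first, so the { sensorId: 1, date: 1 } index is used.
    $match: {
      sensorId: new Types.ObjectId(_sensorId),
      date: { $gte: new Date(startDate), $lt: new Date(endDate) },
    },
  },
  // Non-blocking sort: it can be satisfied by the same compound index.
  { $sort: { date: 1 } },
  {
    $lookup: {
      from: 'Sensor',
      localField: 'sensorId',
      foreignField: '_id',
      as: 'sensorDetail',
    },
  },
  { $unwind: '$sensorDetail' },
  {
    $project: {
      sensorDetail: { name: 1, description: 1 },
      value: 1,
      date: 1,
    },
  },
]);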
Stage Ordering
Ordering of aggregation stages is important, both for semantic purposes as well as for performance reasons. The database itself will attempt to do various things (such as reordering stages) to improve performance so long as it does not logically change the result set in any way. Some of these optimizations are described here. As these are version specific anyway, you can always take a look at the explain plan to get a better indication of what specific changes the database has applied. The fact that performance did not improve when you manually moved the $match to the beginning (which is generally a best practice) could suggest that the database was able to automatically do that on your behalf.
Schema
I'm a little curious about the schema itself. Is there any reason that there are two separate collections here?
My guess is that this is mostly a play at 'normalization' to help reduce data duplication. That is mostly fine, unless you find yourself constantly performing $lookups like this for most of your read operations. You could certainly consider testing what performance (and storage) looks like if you combine them.
Also for this particular operation, would it make sense to just issue two separate queries, one to get the measurements and one to get the sensor data (a single time)? The aggregation matches on sensorId and the value of that field is what is then used to match against the _id field from the other collection. Unless I'm doing the logic wrong, this should be the same data for each of the source documents.
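As a rough sketch of that idea (assuming a Sensor model compiled from the schema in the question, and the same startDate, endDate and _sensorId variables):
// One query for the measurements, without any $lookup...
const measurements = await MainSchema.find({
  sensorId: _sensorId,
  date: { $gte: new Date(startDate), $lt: new Date(endDate) },
})
  .select({ value: 1, date: 1 })
  .sort({ date: 1 })
  .lean();

// ...and a single query for the sensor's name and description.
const sensor = await Sensor.findById(_sensorId)
  .select({ name: 1, description: 1 })
  .lean();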
Time Series Collections
Somewhat related to schema, have you looked into using Time Series Collections? I don't know what your specific goals or pain points are, but it seems that you may be working with IoT data. Time Series collections are purpose-built to help handle use cases like that. Might be worth looking into as they may help you achieve your goals with less hassle or overhead.
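For reference, a time series collection is created with an option on createCollection (MongoDB 5.0+); a minimal sketch, where the collection name and granularity are illustrative and timeField/metaField map to the date and sensorId fields of the schema above:
// Mongo shell sketch: a time series collection for the sensor measurements.
db.createCollection("measurements", {
  timeseries: {
    timeField: "date",
    metaField: "sensorId",
    granularity: "minutes",
  },
});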
First step
Create indexes for the sensorId and date properties in the collection. You can do it by specifying index: true in your model:
const MainSchema: Schema = new Schema(
{
deviceId: { type: mongoose.Types.ObjectId, required: true, ref: 'Device' },
sensorId: { type: mongoose.Types.ObjectId, default: null, ref: 'Sensor', index: true },
value: { type: Number },
date: { type: Date, index: true },
},
{
versionKey: false,
}
);
Second step
Aggregation queries can take advantage of indexes only if your $match stage is the first stage in the pipeline, so you should change the order of the stages in your aggregation query:
const data= await MainSchema.aggregate([
{
$match: {
  sensorId: new Types.ObjectId(_sensorId),
  date: {
    $gte: new Date(startDate),
    $lt: new Date(endDate),
  },
},
},
{
$lookup: {
from: 'Sensor',
localField: 'sensorId',
foreignField: '_id',
as: 'sensorDetail',
},
},
{
$unwind: '$sensorDetail',
},
{
$project: {
sensorDetail: {
name: 1,
description: 1,
},
value: 1,
date: 1,
},
},
{
$sort: {
_id: 1,
},
},
]);
Consider this Mongo collection with the following documents and props:
{ sku: '3344', frequency: 30, lastProccessedDate: ISODate('2021-01-07T15:18:07.576Z') },
{ sku: '2233', frequency: 30, lastProccessedDate: ISODate('2021-02-16T15:18:07.576Z') },
{ sku: '1122', frequency: 30, lastProccessedDate: ISODate('2021-04-13T15:18:07.576Z') }
I want to query and get all the documents with (lastProcessedDate + frequency (days)) <= current date.
Essentially, in the SQL world this is possible, but I can't figure out how to do it in Mongo, or even whether it is possible.
In SQL this would be something like
SELECT * FROM table WHERE DATE_FORMAT(DATE_ADD(FROM_UNIXTIME(lastProcessedDate), INTERVAL frequency DAY), '%Y-%m-%d') <= CURDATE()
If it is not possible, I know I can store the calculated date in the document and just query based on that, but I'd like to know whether it can be done directly.
Thank you all!
Unfortunately, I can't give you a solution for mongoose. But here is the Mongo query that returns the result you want:
db.getCollection("yourCollection").aggregate([
{
$project: {
_id: 1,
sku: 1,
frequency: 1,
lastProccessedDate: 1,
shiftedDate: {
$add: [
"$lastProccessedDate", { $multiply: [ 24 * 60 * 60 * 1000, "$frequency" ] }
]
}
}
}, {
$match: {
shiftedDate: { $lte: new Date() }
}
}, {
$project: {
_id: 1,
sku: 1,
frequency: 1,
lastProccessedDate: 1,
}
}
])
First, we transform the documents into a new form that contains the same fields plus a temporary field - a "shifted" date defined as lastProccessedDate + frequency (days); note that we are actually adding milliseconds, hence the 24 * 60 * 60 * 1000 in the query. Then we select only those documents whose shiftedDate is less than or equal to the current timestamp (new Date() returns the current timestamp). Finally, we project the filtered documents back to their original form, without the temporary field we used for filtering.
Perhaps there's a better solution to get the documents you want, but this one can solve your problem too.
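For what it's worth, essentially the same pipeline can also be run through Mongoose; a rough sketch, where YourModel is a placeholder for whatever model is compiled for this collection:
// Same idea via Mongoose: compute the shifted date, filter on it, then drop it.
const docs = await YourModel.aggregate([
  {
    $addFields: {
      shiftedDate: {
        $add: ["$lastProccessedDate", { $multiply: [24 * 60 * 60 * 1000, "$frequency"] }],
      },
    },
  },
  { $match: { shiftedDate: { $lte: new Date() } } },
  { $project: { shiftedDate: 0 } },
]);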
I have an array that contains plenty of data. The format is always like this:
1:
UserName: "John Smith"
Priority: "2"
Time Occured: "02/09/2019 11:20:23"
Time Ended: "02/09/2019 11:20:23"
2:
UserName: "Tom Bill"
Priority: "4"
Time Occured: "01/08/2019 13:20:23"
Time Ended: "04/08/2019 15:20:23"
3:
UserName: "John Smith"
Priority: "2"
Time Occured: "06/08/2019 13:20:23"
Time Ended: "09/09/2019 15:20:23"
...
Of course there is more stuff, but this gives you an idea of the structure.
The array contains entries that might be under the same user name, as a user can have multiple entries.
What I want to do is sort and modify it into a form I can use in a data table. I am not sure what approach might be best or what is possible.
I was thinking that I need to modify the array and do some math along the way, so that in the data table I can present that John Smith got 8 entries, two of them are sev 4, etc., and Tom Bill got 4 entries, etc. Basically I won't use the original data as-is, since I need to modify some parts of it; for example, I am not interested in the date itself, only whether it was in the past or in the future (I already have scripts for that), but I need to do it for every single user.
A structure something like this seems to be sufficient for your requirement:
data = {
'John Smith' : [{ Priority : 1, .... }, { ...2nd instance }],
'John Doe' : [{...1st instance of John Doe}],
}
Basically an object that has the names for keys, and each key has an array of entries of data.
Whenever you wish to add more entries to John Smith, you get access to the array directly by using data['John Smith']
EDIT
To convert the data to this format.
data = [
{
'UserName': "John Smith",
'Priority': "2",
'Time Occured': "02/09/2019 11:20:23",
'Time Ended': "02/09/2019 11:20:23",
},
{
'UserName': "Tom Bill",
'Priority': "4",
'Time Occured': "01/08/2019 13:20:23",
'Time Ended': "04/08/2019 15:20:23",
},
{
'UserName': "John Smith",
'Priority': "2",
'Time Occured': "06/08/2019 13:20:23",
'Time Ended': "09/09/2019 15:20:23",
}
]
const convertData = (data) => {
  let newData = {};
  for (let i = 0; i < data.length; i++) {
    let name = data[i]['UserName'];
    // Copy over only the properties we care about for the table.
    const tempData = {
      'Priority': data[i]['Priority'],
      'Time Occured': data[i]['Time Occured'],
      // Add more properties here
    };
    if (newData[name] == null) {
      newData[name] = [];
    }
    newData[name] = [...newData[name], tempData];
  }
  console.log(newData);
  return newData;
};
convertData(data)
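Building on that grouped object, a small follow-up sketch of the kind of per-user summary described in the question (the summary field names are just illustrative):
// Sketch: turn the grouped object into rows for a data table -
// one row per user with an entry count and a priority-4 count.
const summarize = (grouped) =>
  Object.keys(grouped).map((userName) => ({
    UserName: userName,
    Entries: grouped[userName].length,
    Priority4: grouped[userName].filter((e) => e['Priority'] === '4').length,
  }));

console.log(summarize(convertData(data)));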
Look at this codepen.
https://codepen.io/nabeelmehmood/pen/jONGQmX