Deduplicating using nodeJS - javascript

My goal is to take in a CSV file that contains approximately 4 million records and process each record while scrubbing the data in one particular field. Our scrubbing process creates a reversible hash but is time consuming (almost 1 second). Since there are only about 50,000 unique values for that field, I would like to set them as properties of an object. Here is a pseudo example of how the object will be built. For duplicates I plan to just overwrite the existing value (this is to avoid having to loop through some if-based search statement).
var csv = require('csv');
var http = require('http');
var CBNObj = new Object;
csv()
.fromPath(__dirname+'/report.csv',{
columns: true
})
.transform(function(data){
CBNObj[data['Field Value']] = data['Field Value'];
});
console.log(CBNObj);
This should create my object something like this.
myObj['fieldValue1'] = 'fieldValue1'
myObj['fieldValue2'] = 'fieldValue2'
myObj['fieldValue3'] = 'fieldValue3'
myObj['fieldValue1'] = 'fieldValue1'
myObj['fieldValue1'] = 'fieldValue1'
I have looked over some good posts on here about iterating over every property in an object (like this one: Iterating over every property of an object in javascript using Prototype?) but I am still not exactly sure how to accomplish what I am doing. How can I take my object with 50k properties and dump the values into an array, so that I end up with something like this?
myArray = ['fieldValue1','fieldValue2','fieldValue3']
EDIT: I could also use some assistance on the first part here because I am getting a null value or undefined when I try and set the object properties. I also still need help then traversing through the object properties to build my array. Any help would be greatly appreciated.

You know that the keys of your object are the unique values you want. You just need an array. In node.js you can use Object.keys().
https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/Object/keys
It's a standard way to take all the keys of an object (that aren't provided by the prototype chain) and put them into an array. So your example looks like this.
var csv = require('csv');
var AcctObj = new Object();
var uniqueArray;
csv()
.fromPath(__dirname+'/report.csv',{
columns: true
})
.on('data',function(data){
AcctObj[data['Some Field Value']] = data['Some Field Value'];
})
.on('end', function(){
uniqueArray = Object.keys(AcctObj);
});
Object.keys returns only the object's own enumerable properties, so it performs the hasOwnProperty check internally. The result is the same as with the loop in the answer by #DvideBy0, but it takes just one step to get the array you want.
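As a minimal sketch of the same idea without the csv module: the rows array, the 'Field Value' column name, and the scrub function below are placeholders for illustration, not the real data or the real hashing routine. The unique values are collected as keys, turned into an array with Object.keys, and each one is scrubbed only once.

```javascript
// Hypothetical sample rows standing in for the parsed CSV records.
var rows = [
  { 'Field Value': 'fieldValue1' },
  { 'Field Value': 'fieldValue2' },
  { 'Field Value': 'fieldValue3' },
  { 'Field Value': 'fieldValue1' },
  { 'Field Value': 'fieldValue1' }
];

// Placeholder for the slow, reversible scrubbing routine (assumption).
function scrub(value) {
  return value.split('').reverse().join('');
}

// Collect unique values as keys; duplicates simply overwrite themselves.
// Object.create(null) avoids clashes with inherited names such as "constructor".
var seen = Object.create(null);
rows.forEach(function (row) {
  seen[row['Field Value']] = true;
});

// One step from the object to the array of unique values.
var uniqueArray = Object.keys(seen);

// Scrub each unique value once and keep the result for later lookups.
var scrubbed = {};
uniqueArray.forEach(function (value) {
  scrubbed[value] = scrub(value);
});

console.log(uniqueArray);
```

With 50,000 unique values out of 4 million records, this kind of lookup table means the slow step runs once per unique value instead of once per record.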

var csv = require('csv');
var AcctObj = new Object();
csv()
.fromPath(__dirname+'/report.csv',{
columns: true
})
.on('data',function(data){
AcctObj[data['Some Field Value']] = data['Some Field Value'];
})
.on('end', function(){
for(var prop in AcctObj) {
if(AcctObj.hasOwnProperty(prop)) {
//Do something here....
}
}
});

Related

Javascript object/array manipulation

Struggling with some javascript array manipulation/updating. Hope someone could help.
I have an array:
array('saved_designs' => array());
Javascript JSON version:
{"saved_designs":{}}
I will be adding a label, and associated array data:
array('saved_designs' => array('label' => array('class' => 'somecssclass', 'styles' => array(/* I'll add more assoc elements here */), 'hover' => array(/* I'll add more assoc elements here */))))
Javascript version:
{"saved_designs":{"label":{"class":"someclass","style":[],"hover":[]}}}
I want to be able to append to and modify this array. If 'label' is already defined, then cycle through the sub data for that element and update it. If 'label' doesn't exist, then append a new data set to the 'saved_designs' array element.
So, if label is not defined, add the following to the 'saved_designs' element:
array('label2' => array('class' => 'someclass2', 'styles' => array(), 'hover' => array()))
Things aren't quite working out as I expect. I'm unsure of the JavaScript notation of [] and {}, and the differences between them.
We'll probably need to discuss this as answers are provided, but here's the code I have at the moment to achieve this:
//saveLabel = label the user chose for this "design"
if(isUnique == 0){//update
//ask user if want to overwrite design styles for the specified html element
if (confirm("There is already a design with that label ("+saveLabel+"). Overwrite this design's data for the given element/styles?")) {
currentDesigns["saved_designs"][saveLabel]["class"] = saveClass;
//edit other subdata here...
}
}else{//create new
var newDesign = [];
newDesign[saveLabel] = [];
newDesign[saveLabel]["class"] = saveClass;
newDesign[saveLabel]["style"] = [];
newDesign[saveLabel]["hover"] = [];
currentDesigns["saved_designs"].push(newDesign);//gives error..push is not defined
}
jQuery("#'.$elementId.'").val(JSON.stringify(currentDesigns));
Thanks in advance. I hope this is clear. I'll update accordingly based on questions and comments.
Shaun
It can be a bit confusing. JavaScript objects look a lot like a map or a dictionary from other languages. You can iterate over them and access their properties with object['property_name'].
Thus the difference between a property and a string index doesn't really exist. That looks like PHP you are creating. It's called an array there, but because you are identifying values by a string, it is going to be serialized into an object in JavaScript.
var thing = {"saved_designs":{"label":{"class":"someclass","style":[],"hover":[]}}}
thing.saved_designs.label is the same thing as thing["saved_designs"]["label"].
In JavaScript an array is a list whose elements are accessed by integer indices. Arrays don't have explicit keys and can be defined like this:
var stuff = ['label', 24, anObject]
So you see the error you are getting about 'push not defined' is because you aren't working on an array as far as javascript is concerned.
currentDesigns["saved_designs"] == currentDesigns.saved_designs
When you have an object, and you want a new key/value pair (i.e. property) you don't need a special function to add. Just define the key and the value:
currentDesigns.saved_designs['key'] = newDesign;
If you have a different label for every design (which is what it looks like) key is that label (a string).
Also when you were defining the new design this is what javascript interprets:
var newDesign = [];
newDesign is an array. It has n elements, accessed by integer indices.
newDesign[saveLabel] = [];
Since newDesign is an array, saveLabel should be a numerical index. The value for that index is another array.
newDesign[saveLabel]["class"] = saveClass;
newDesign[saveLabel]["style"] = [];
newDesign[saveLabel]["hover"] = [];
Here you are explicitly trying to use arrays as objects. Arrays are not meant to be used with ['string_key'].
This might very well 'work', but only because in JavaScript arrays are objects, and there is no rule that says you can't add properties to objects at will. Such properties are not array elements, though: they don't count toward length, and JSON.stringify skips them. All these [] are not helping you at all.
var newDesign = {label: "the label", class: saveClass};
is probably what you are looking for.
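Putting this together, here is a minimal sketch of the update-or-create logic using plain objects keyed by label. The saveDesign helper and the sample values are made up for illustration, and it assumes style and hover should be objects (since they will hold named entries), not arrays.

```javascript
// Hypothetical starting state, mirroring the names used in the question.
var currentDesigns = { saved_designs: {} };

// Hypothetical helper: update the design if the label exists, otherwise add it.
function saveDesign(designs, saveLabel, saveClass) {
  var saved = designs.saved_designs;
  if (Object.prototype.hasOwnProperty.call(saved, saveLabel)) {
    // Label already exists: update the existing entry in place.
    saved[saveLabel]['class'] = saveClass;
  } else {
    // New label: just assign a new key/value pair; no push() involved.
    saved[saveLabel] = { 'class': saveClass, style: {}, hover: {} };
  }
  return designs;
}

saveDesign(currentDesigns, 'label', 'someclass');
saveDesign(currentDesigns, 'label2', 'someclass2');
saveDesign(currentDesigns, 'label', 'otherclass'); // updates the first entry

console.log(JSON.stringify(currentDesigns));
```

Because every level is a plain object, JSON.stringify serializes the whole structure, which is what the hidden form field in the question needs.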

How to add data to a 2 dimensional array in javascript

I am trying to add values into an array but for some reason it is not working. I am new to JavaScript.
Here is my code:
eventsArray = new Array();
$.each(xmlJsonObj.feed.entry, function(index, value){
eventsArray[index] = new Array('title' = value.title, 'date' = value.date[1]);
});
So basically I am pulling out some values from the json object and want to save them as key-value pairs in an array (multidimensional as each event has several values).
This array will later be sorted by date.
I am currently getting the following error:
ReferenceError: Left side of assignment is not a reference.
I am new to JavaScript and don't really understand what's wrong. I tried to look at some examples but still can't find a good example of creating two dimensional arrays with JavaScript (or objects, as everything in JS is an object) in a loop like this.
I would be very thankful for any help or tips.
The cause of the error message is this:
'title' = value.title
That would mean that you are trying to assign a value to a string literal. The rest of the code (except for the other assignment just like it) is actually valid syntax, even if it doesn't do what you intend, which is why the error message points at that part of the code.
To have a collection of key-value pairs you would use an object instead of an array, and you can create it like this:
eventsArray[index] = { title: value.title, date: value.date[1] };
Maybe the simplest one:
var eventsArray = new Array();
$.each(xmlJsonObj.feed.entry, function (index, value) {
eventsArray[index] = { 'title': value.title, 'date': value.date[1] };
});
It works if you change your code to:
var eventsArray = new Array();
$.each(xmlJsonObj.feed.entry, function(index, value){
eventsArray.push({ title : value.title, date: value.date[1] });
});
You should use objects for this:
eventsArray[index] = {};
eventsArray[index].title = value.title;
eventsArray[index].date = value.date[1];
The problem is that you are trying to assign value.title to a string literal. Arrays in JS don't work that way. Arrays also don't support string keys, which is why you need an object here.
If your date property is a timestamp, for example, you can sort the array like this:
eventsArray.sort( function( a, b ) {
return a.date - b.date;
});
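If the date values are strings and not numbers, subtracting them directly gives NaN. A minimal sketch for that case, assuming each date is an ISO 8601 string (the sample entries below are invented), converts them to timestamps inside the comparator:

```javascript
// Hypothetical entries; assumes each date is an ISO 8601 string.
var eventsArray = [
  { title: 'Concert', date: '2012-06-15' },
  { title: 'Lecture', date: '2012-03-01' },
  { title: 'Festival', date: '2012-09-30' }
];

// Convert to millisecond timestamps before subtracting.
eventsArray.sort(function (a, b) {
  return new Date(a.date).getTime() - new Date(b.date).getTime();
});

var titles = eventsArray.map(function (e) { return e.title; });
console.log(titles);
```

Other date formats may need explicit parsing first, since Date parsing of non-ISO strings varies between environments.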

Creating Nested Objects on the Fly With Javascript

I'm new to Javascript, coming from a Python background, where it's easy to create nested data using custom dictionaries and .get methods. What I'm trying to do is create a nested object of artist data that takes this form: artistDict[artist][albumName] = albumYear. I need to create this object on the fly by iterating over an iterable of album objects. Here's the code I'm currently using:
albumDict = {};
albums.forEach(function(item){
albumDict[item.artist][item.name] = item.year;
});
document.write(albumDict);
This doesn't work, which isn't surprising, since something like this wouldn't work in Python either. However, in Python I could use a .get method to check if an entry was in the dictionary and create it if not -- is there something similar, or any other utility that I could use to achieve my goal in JS?
This should work (if the property doesn't exist, you have to initialize it first):
albumDict = {};
albums.forEach(function(item){
albumDict[item.artist] = albumDict[item.artist] || {};
albumDict[item.artist][item.name] = item.year;
});
Try this:
albums.forEach(function(item){
albumDict[item.artist] = albumDict[item.artist] || {};
albumDict[item.artist][item.name] = item.year;
});
The first line in that function sets albumDict[item.artist] to a new object if it doesn't exist yet. Otherwise, it sets it to itself.
Then, you can just set the year on the dict entry.
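For readers coming from Python, the same pattern can be wrapped in a small helper that behaves roughly like dict.setdefault with an empty dict as the default. This is only a sketch: getOrCreate is a made-up name and the album data is invented for illustration.

```javascript
// Hypothetical album list used only for illustration.
var albums = [
  { artist: 'Artist A', name: 'First', year: 1999 },
  { artist: 'Artist A', name: 'Second', year: 2003 },
  { artist: 'Artist B', name: 'Debut', year: 2010 }
];

// Return obj[key], creating an empty object there first if needed.
function getOrCreate(obj, key) {
  if (!Object.prototype.hasOwnProperty.call(obj, key)) {
    obj[key] = {};
  }
  return obj[key];
}

var albumDict = {};
albums.forEach(function (item) {
  getOrCreate(albumDict, item.artist)[item.name] = item.year;
});

console.log(JSON.stringify(albumDict));
```

Printing with JSON.stringify shows the nested structure, whereas document.write(albumDict) would only print [object Object].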

Test for value within array of objects

I am dynamically building an array of objects using a process that boils down to something like this:
//Objects Array
var objects = [];
//Object Structure
var object1 = {"id":"foobar_1", "metrics":90};
var object2 = {"id":"some other foobar", "metrics":50};
objects[0] = object1;
objects[1] = object2;
(Let it be said for the record, that if you can think of a better way to dynamically nest data such that I can access it with objects[i].id I am also all ears!)
There's ultimately going to be more logic at play than what's above, but it's just not written yet. Suffice it to say that the "object1" and "object2" parts will actually be in an iterator.
Inside that iterator, I want to check for the presence of an ID before adding another object to the array. If, for example, I already have an object with the ID "foobar_1", instead of pushing a new member to the array, I simply want to increment its "metrics" value.
If I wasn't dealing with an array of objects, I could use inArray to look for "foobar_1" (a jQuery utility). But that won't look into the object's values. The way I see it, I have two options:
1. Keep a separate simple array of just the IDs. So instead of only relying on the objects array, I simply check inArray (or plain JS equivalent) for a simple "objectIDs" array that is used only for this purpose.
2. Iterate through my existing data object and compare my "foobar_1" needle to each objects[i].id haystack
I feel that #1 is certainly more efficient, but I can't help wondering if I'm missing a function that would do the job for me. A #3, 4, or 5 option that I've missed! CPU consumption is somewhat important, but I'm also interested in functions that make the code less verbose whether they're more cycle-efficient or not.
I'd suggest switching to an object instead of an array:
var objects = {};
objects["foobar_1"] = {metrics: 90};
objects["some other foobar"] = {metrics: 50};
Then, to add a new object uniquely, you would do this:
function addObject(id, metricsNum) {
if (!(id in objects)) {
objects[id] = {metrics: metricsNum};
}
}
To iterate all the objects, you would do this:
for (var id in objects) {
// process objects[id]
}
This gives you a very efficient lookup for whether a given id is already in your list or not. The only thing it doesn't give you that the array gave you before is an order you control: you can't sort or reorder the keys of an object the way you can the elements of an array.
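Since the question asks to bump the metrics when the id already exists, here is a minimal sketch of that variant. The addOrIncrement name is made up, and it assumes "increment" means adding the new metrics value to the stored one.

```javascript
var objects = {};

// Add a new entry, or add to the existing entry's metrics (assumed meaning of "increment").
function addOrIncrement(id, metricsNum) {
  if (Object.prototype.hasOwnProperty.call(objects, id)) {
    objects[id].metrics += metricsNum;
  } else {
    objects[id] = { metrics: metricsNum };
  }
}

addOrIncrement('foobar_1', 90);
addOrIncrement('some other foobar', 50);
addOrIncrement('foobar_1', 10); // existing id: metrics becomes 100

// If an array with objects[i].id access is still needed, rebuild it from the keys.
var asArray = Object.keys(objects).map(function (id) {
  return { id: id, metrics: objects[id].metrics };
});
console.log(asArray);
```

The array built at the end can then be sorted however is needed, which covers the ordering limitation of the object-based approach.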
Hmm, I wonder why you don't use a dictionary, because that fits your case perfectly. Your code would look like this:
//Objects dictionary (a plain object, keyed by id)
var objects = {};
//Object Structure
var object1 = {"metrics":90};
var object2 = {"metrics":50};
objects["foobar_1"] = object1;
objects["some other foobar"] = object2;
// An example showing the existence check.
if (!objects["new id"]){
objects["new id"] = {"metrics": 100};
}
else {
objects["new id"].metrics++;
}

Javascript is passing an Array of Objects instead of an Array of Arrays

I'm passing a Javascript Array() to Flash via FlashVars but Flash complains. Can you point out what I am doing wrong here?
javascript code
// array with the user defined cities
var usercities = new Array(
{'nome':"London", 'lat':51.5002, 'long':-0.1262 },
{'nome':"NYC", 'lat':51.5002, 'long':-0.1262 }
);
flashvars.newcities = usercities;
flash code
// this array is pre-populated so if the users doesn't enter data this is shown
var cities:Array = new Array(
{ nome:"London", lat:51.5002, long:-0.1262 },
{ nome:"NYC", lat:40.7144, long:-74.0060 }
);
// gets FlashVars
var newcities:Object = LoaderInfo(this.root.loaderInfo).parameters.newcities;
if(newcities != null) {
cities = newcities;
};
Doesn't work. I need to have the cities array on the Flash Side exactly as it is. On the Javascript side all code can change.
Thank you for your help.
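JavaScript does not have associative arrays like some other languages. In order to have named indexes, you have to use an object. Assigning a value to an array under a named index does not add an element; it only sets a property on the array object, so length stays the same and the value is skipped when the array is serialized.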
In order to do this, you may need to change your Flash code. As meder said, serializing your array is your best bet. I'd suggest a JSON encode in the JavaScript and a decode in the Flash.
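A minimal sketch of the JavaScript half of that suggestion follows. The flashvars object comes from the question; whether the encodeURIComponent step is needed depends on how the FlashVars are embedded, so treat it as an assumption. The ActionScript side, which would have to decode the string with whatever JSON decoder the Flash project has available, is not shown.

```javascript
var usercities = [
  { nome: 'London', lat: 51.5002, long: -0.1262 },
  { nome: 'NYC', lat: 40.7144, long: -74.0060 }
];

// Serialize to a JSON string, then escape it for use as a FlashVars value.
var json = JSON.stringify(usercities);
var flashvars = {};
flashvars.newcities = encodeURIComponent(json);

// Round trip on the JavaScript side, just to show nothing is lost.
var decoded = JSON.parse(decodeURIComponent(flashvars.newcities));
console.log(decoded[1].nome);
```

Unlike the comma-joined string approach, this keeps the property names and still works when a value contains a comma.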
Well you can just manually make them arrays. Something like this:
var usercities = [];
usercities[0] = [];
usercities[0]["nome"] = "London";
usercities[0]["lat"] = 51.5002
usercities[0]["long"] = -0.1262
usercities[1] = [];
usercities[1]["nome"] = "NYC";
usercities[1]["lat"] = 51.5002
usercities[1]["long"] = -0.1262
I think it is all the same in JavaScript, but Flash may see it differently.
Ended up passing the values as this:
javascript
var cities = new Array(
Array("London", 51.5002, -0.1262),
Array("NYC", 40.7144, -74.0060)
);
Flash receives that as a plain string:
"London",51.5002,-0.1262,"NYC",40.7144,-74.0060
I then exploded the string and converted it to an Array. It's a bit dirty, but in the end it works, as long as the Array always has 3 items per row and no item contains a comma.
Hope this may help someone.
