Google Sheet Scripts: How to compare 2 columns to remove duplicates - javascript

There are 3 duplicate articles from the same domain in Sheet1: https://docs.google.com/spreadsheets/d/1pExqHJQubnSPDKczkF9HMA2QN1cxTYmzyewQdugDRYs/edit#gid=0
Goal: I'd like to remove articles that have the same title AND domain.
Sheet DesiredResult has the desired result.
I'd like to modify this filter script to compare article title(column a) & domain(column C), if they are the same, then remove:
function removeDuplicates() {
const sheet = SpreadsheetApp.getActive().getSheetByName('Sheet1');
const data = sheet.getDataRange().getValues();
var temp = {}; // Added
var newData = data.filter(function(e) { // Added
if (!temp[e[1]]) {
temp[e[1]] = e[1];
return true;
}
return false;
});
sheet.clearContents();
sheet.getRange(1, 1, newData.length, newData[0].length).setValues(newData);
}
Right now it just looks at the link if they're the same. It's not using an if statement so I'm not sure how to add a comparison for 2 columns. Any help is appreciated, thanks!
Update: I've tried looking at !temp[e[1]] to see if there are values to compare in an if statement but it shows as undefined so I'm stuck where to add the second column comparison in this section.
Reference: Previous question, asked to compare 1 column - Google Sheet Scripts: How to compare only one column to remove duplicates

You want to remove the duplicated rows by comparing the column "A" and "C".
In your shared spreadsheet, you want to modify from Sheet1 to DesiredResult.
You want to achieve this using Google Apps Script.
If my understanding is correct, how about this modification? It uses the removeDuplicates() method, which was added on July 26, 2019. Your question was posted on July 15, 2019, so unfortunately the method did not exist yet at that time. Please think of this as just one of several possible answers.
Modified script:
const sheet = SpreadsheetApp.getActive().getSheetByName('Combined');
sheet.getDataRange().removeDuplicates([1, 3]);
Note:
When you test the modified script above on the shared Spreadsheet, please change the sheet name from Combined to Sheet1.
Reference:
removeDuplicates(columnsToCompare)
If I misunderstood your question and this was not the result you want, I apologize.
Added:
If you want to modify your own script instead, how about the following modification? Please think of this as just one of several possible answers, too.
From:
if (!temp[e[1]]) {
temp[e[1]] = e[1];
To:
if (!temp[e[0] + e[2]]) {
temp[e[0] + e[2]] = e[1];
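One caveat with a concatenated key such as e[0] + e[2]: different pairs can produce the same string ("ab" + "c" and "a" + "bc"). Below is a minimal sketch of a variant that avoids this. It works on plain arrays rather than a live sheet, and dedupeByColumns and the sample rows are made up for illustration:

```javascript
// Hypothetical helper (not part of Apps Script): keeps the first row for each
// unique combination of values in the given zero-based column indexes.
function dedupeByColumns(rows, columnIndexes) {
  var seen = {};
  return rows.filter(function (row) {
    // JSON.stringify keeps the column boundaries in the key, so
    // ["ab", "c"] and ["a", "bc"] do not collide the way "ab" + "c" would.
    var key = JSON.stringify(columnIndexes.map(function (i) { return row[i]; }));
    if (seen[key]) return false;
    seen[key] = true;
    return true;
  });
}

// Sample rows in the order [title, link, domain] (made-up data).
var sample = [
  ['Title A', 'http://x.example/1', 'x.example'],
  ['Title A', 'http://x.example/2', 'x.example'], // same title and domain: dropped
  ['Title A', 'http://y.example/1', 'y.example'], // same title, other domain: kept
  ['Title B', 'http://x.example/3', 'x.example']
];
var deduped = dedupeByColumns(sample, [0, 2]);
```

In the original script this would be used as var newData = dedupeByColumns(data, [0, 2]); in place of the data.filter(...) call.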

function removeDuplicates() {
var sheet=SpreadsheetApp.getActive().getSheetByName('Sheet1');
var data=sheet.getDataRange().getValues();
var newData = data.filter(function(r) {
return !(r[0]==r[2]); // keep only rows where column A differs from column C in the same row
});
sheet.clearContents();
sheet.getRange(1, 1, newData.length, newData[0].length).setValues(newData);
}
Test it on this Data:
1,a,a
2,b,a
3,c,a
4,a,a
5,b,a
6,c,6
7,a,7
8,b,8
9,c,9
Array.filter()

Related

Writing array to single row

I have created an API which iterates through JSON format data, reading 2 items per ID. I'm storing this data in an array called values[].
How about this modification?
Modification points:
To put all the values in one row, change values.push([timestamp, price]); to values.push(timestamp, price);. This makes values a one-dimensional array.
To write the values to row 3, this modification first checks whether the target range in row 3 is empty. If it is not, the values are appended as a new row instead.
When these points are reflected in your script, it becomes as follows.
Modified script:
From:
values.push([timestamp, price]);
}
var ss = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("rawStockData");
ss.getRange("A3:BH3").setValues(values);
}
To:
values.push(timestamp, price); // Modified
}
var ss = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("rawStockData");
var range = ss.getRange(3, 1, 1, values.length); // Modified
if (range.isBlank()) { // Added
range.setValues([values]);
} else {
ss.appendRow(values);
}
}
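The reason for range.setValues([values]) is that setValues() expects a two-dimensional array, so a single row has to be wrapped in an outer array. A minimal sketch of the shape involved, using plain arrays only; toSingleRow and the sample pairs are made up for illustration:

```javascript
// Hypothetical helper: flattens [timestamp, price] pairs into one row and
// wraps it in an outer array, the 2D shape that setValues() expects.
function toSingleRow(pairs) {
  var row = [];
  pairs.forEach(function (pair) {
    row.push(pair[0], pair[1]);
  });
  return [row];
}

var pairs = [['2020-01-01', 10.5], ['2020-01-02', 11.2]]; // made-up data
var rowValues = toSingleRow(pairs);
// rowValues has 1 row and 4 columns, which matches getRange(3, 1, 1, 4).
```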
References:
push()
appendRow(rowContents)

Select specific rows to be appended into google sheets via array

Basically I have an array of information that I can currently append into google sheets, the thing is that a lot of the information is not necessary for my need so I wanted to find a way to just append the columns I need.
The picture above shows how everything looks,
basically the idea is to make it looks like in the following picture.
so basically I only need to append columns 4,5,7
Currently, what I do is this:
if (tozip.getContentType() == "application/zip"){ //for ZIP files
var unZip = Utilities.unzip(tozip); //assigns the unzipped files to a new variable
var table = Utilities.parseCsv(unZip[0].getDataAsString());// parses the first file's CSV text into a 2D array
for (var i = 0; i < table.length; i++) {//loops through the array and appends each row as it goes
sheet.appendRow(table[i]);
}
}
the data comes from a csv file and looks like this.
[[isApplication, applicationDate, isQualified, Funded_Date, isFunded, requested_loan_amount, amountFunded], [1, 2020-02-03, 1, 2020-02-03, 1, , 1300.0000], [1, 2019-12-29, 1, 2019-12-30, 1, 3000.0000, 2000.0000], [1, 2020-01-27, 1, 2020-01-28, 1, , 800.0000], [1, 2020-01-08, 1, 2020-01-10, 1, 2500.0000, 2500.0000], [1, 2020-02-04, 1, 2020-02-10, 1, , 1400.0000], [1, 2020-01-21, 1, 2020-01-21, 1, 5000.0000, 2000.0000], [1, 2020-02-06, 1, 2020-02-06, 1, 1100.0000, 1400.0000], [1, 2020-02-01, 1, 2020-02-04, 1, 1500.0000, 601.0000], [1, 2020-02-11, 1, 2020-02-11, 1, 500.0000, 800.0000]]
so yeah a lot of messy csv data.
I tried adding this to the code and a few variations of it so It can select the inside data
for (var i = 0; i < table.length; i++) {//loops trought the array an appends the data as it goes.
var columns = [];
columns.push(3);
columns.push(4);
columns.push(6);
sheet.appendRow(table[i][columns]);
}
but it does not work I'm super new to this type of stuff, so I'm pretty sure that's not the correct way to try and select the information I want from the array.
let me know if I need to elaborate more on this, I'm not super good at explaining this stuff.
Thank you in advance for the answers I really appreciate the help on this.
You want to retrieve the columns "D", "E" and "G" from the data retrieved by parsing the CSV data.
In your script, table of var table = Utilities.parseCsv(unZip[0].getDataAsString()); is the 2 dimensional data shown in your question.
You want to put the retrieved values to the Spreadsheet.
You want to achieve this using Google Apps Script.
If my understanding is correct, how about this answer? Please think of this as just one of several possible answers.
Modification points:
table of var table = Utilities.parseCsv(unZip[0].getDataAsString()); is 2 dimensional array.
When for (var i = 0; i < table.length; i++) {} is used, each row can be retrieved by table[i]. And the values from the columns "D", "E" and "G" can be retrieved by table[i][3], table[i][4], table[i][6].
In this modification, var values = [] is prepared, and each row is put with values.push([table[i][3], table[i][4], table[i][6]]).
Calling appendRow() inside the for loop is slow, because every call writes to the Spreadsheet separately. So in this case, the array is built in the for loop and then written to the Spreadsheet once with setValues(), which reduces the cost.
When above points are reflected to your script, it becomes as follows.
Modified script:
var table = Utilities.parseCsv(unZip[0].getDataAsString());
// I modified below script.
var values = [];
for (var i = 0; i < table.length; i++) {
values.push([table[i][3], table[i][4], table[i][6]]);
}
var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Sheet1");
sheet.getRange(sheet.getLastRow() + 1, 1, values.length, values[0].length).setValues(values);
The script above puts the values in "Sheet1". If you want to change this, please modify getSheetByName("Sheet1").
In this case, table is var table = Utilities.parseCsv(unZip[0].getDataAsString()).
Note:
When var table = Utilities.parseCsv(unZip[0].getDataAsString()) doesn't return the array of CSV data, the modified script above cannot be used. Please be careful about this.
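If the list of wanted columns may change, the column selection can also be written with map(). This is only a sketch on plain arrays: pickColumns is a made-up helper name, and the sample table is the first rows of the CSV data from the question, kept as strings on the assumption that parseCsv() returns strings:

```javascript
// Hypothetical helper: returns a new 2D array that contains only the
// requested zero-based columns of every row, in the order given.
function pickColumns(table, columnIndexes) {
  return table.map(function (row) {
    return columnIndexes.map(function (i) { return row[i]; });
  });
}

var table = [
  ['isApplication', 'applicationDate', 'isQualified', 'Funded_Date', 'isFunded', 'requested_loan_amount', 'amountFunded'],
  ['1', '2020-02-03', '1', '2020-02-03', '1', '', '1300.0000'],
  ['1', '2019-12-29', '1', '2019-12-30', '1', '3000.0000', '2000.0000']
];
var picked = pickColumns(table, [3, 4, 6]); // columns "D", "E" and "G"
```

The result has the same shape as values in the modified script, so it can be passed to setValues() in the same way.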
References:
parseCsv(csv)
getRange(row, column, numRows, numColumns)
setValues(values)
If I misunderstood your question and this was not the direction you want, I apologize.

How to compare two sheets and delete/add any column with a distinct value in row 1? Google Script

I want to compare two sheets (based on header values in row 1) and delete any column with a unique value (without a match). For example, Assuming Sheet1, Row 1 data and Sheet 2, Row 1 are uniform, if a user adds/deletes a column within any sheet, I want to always match the number of columns in both sheets with their values
Screenshots of sheets headings.
If both sheets look like this
and a user adds a new column N
or deletes column N,
how can I ensure that both sheets match by deleting the odd/distinct column in Sheet 1?
I have tried modifying the code below, but I can't get just the unique one out. This code only looks for headers with a defined value.
function deleteAloneColumns(){
var sheet = SpreadsheetApp.getActiveSheet();
var lastColumnPos = sheet.getLastColumn();
var headers = sheet.getRange( 1 ,1, 1, lastColumnPos ).getValues()[0];
for( var i = lastColumnPos ; i < 1; i--){
if( headers[i] === "alone" ) sheet.deleteColumn(i);
}
SpreadsheetApp.getUi().alert( 'Job done!' );
}
Any help to compare and delete the column with the unique value will be appreciated.
Problem
Balancing sheets based on header row values mismatch.
Solution
If I understood you correctly, you have a source sheet against which validation is run, and two primary use cases: a user either adds a new column whose header matches no column in the source sheet, or deletes a column that should be there. (If you want to check that each column strictly matches the one in Sheet1, the script is easy to modify.)
const balanceSheets = (sourceShName = 'Sheet1',targetShName = 'Sheet2') => {
const ss = SpreadsheetApp.getActiveSpreadsheet();
const s1 = ss.getSheetByName(sourceShName);
const s2 = ss.getSheetByName(targetShName);
const s2lcol = s2.getLastColumn();
//keep all vals from source to reduce I/O
const s1DataVals = s1.getDataRange().getValues();
const s2Vals = s2.getRange(1, 1, 1, s2lcol).getValues();
const h1Vals = s1DataVals[0];
const h2Vals = s2Vals[0];
//assume s1 is source (validation) sheet
//assume s2 is target sheet that a user can edit
//case 1: target has value not present in source -> delete column in target
let colIdx = 0;
h2Vals.forEach(value => {
const isOK = h1Vals.some(val => val===value);
isOK ? colIdx++ : s2.deleteColumn(colIdx+1);
});
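//case 2: target does not have values present in source -> insert column from source
h1Vals.forEach((value,index) => {
const isOK = h2Vals.some(val => val===value);
if (isOK) return; //column already exists in target, leave its data untouched
//insertColumnAfter() needs a position >= 1, so a missing first column is inserted before column 1
index === 0 ? s2.insertColumnBefore(1) : s2.insertColumnAfter(index);
const valuesToInsert = s1DataVals.map(row => [row[index]]);
const numRowsToInsert = valuesToInsert.length;
s2.getRange(1,index+1, numRowsToInsert,1).setValues(valuesToInsert);
});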
};
Showcase
Here is a small demo of how it works as a macro:
Notes
Solving your problem with two forEach loops is suboptimal, but I kept the number of I/O calls low (it can be lowered further by, for example, moving deleteColumn out of the loop and only keeping track of column indices in it).
The script uses ES6 features provided by the V8 runtime, so please be careful (although I would recommend migrating as soon as possible: even if you encounter bugs or inconsistencies, it is worth more than it costs).
UPD: made the script more flexible by moving sheet names to the parameter list.
UPD2: after discussing the issue with deleteColumn() behaviour, the answer was updated to keep the column pointer in bounds (for those curious about it: forEach kept incrementing the index, while deleteColumn reduced the bounds for any given index).
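The idea of tracking column indices first and touching the sheet afterwards could be sketched as below. This is an illustration on plain header arrays, not part of the script above; diffHeaders and the sample headers are made up:

```javascript
// Hypothetical helper: compares two header rows and reports what would have
// to change in the target. No spreadsheet calls are made here.
function diffHeaders(sourceHeaders, targetHeaders) {
  var toDelete = []; // 1-based target column positions missing from the source
  targetHeaders.forEach(function (header, i) {
    if (sourceHeaders.indexOf(header) === -1) toDelete.push(i + 1);
  });
  // Highest position first, so deleting one column does not shift the
  // positions of the columns that are still waiting to be deleted.
  toDelete.sort(function (a, b) { return b - a; });

  var toAdd = []; // source headers that the target lacks
  sourceHeaders.forEach(function (header) {
    if (targetHeaders.indexOf(header) === -1) toAdd.push(header);
  });
  return { toDelete: toDelete, toAdd: toAdd };
}

var plan = diffHeaders(['Name', 'Email', 'Phone'], ['Name', 'Notes', 'Email', 'Extra']);
// plan.toDelete is [4, 2] and plan.toAdd is ['Phone']
```

Each position in plan.toDelete could then be passed to deleteColumn() in that order, without any pointer bookkeeping.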
Reference
insertColumnAfter() method reference

Retrieve Google Sheets column by header name

Is there a way to retrieve a column dynamically by its column name (header)?
Instead of:
var values = sheet.getRange("A:A").getValues();
Something like: (Just for simplicity)
var values = sheet.getRange(sheet.column.getHeader("name")).getValues();
Please keep in mind that Google Apps Script is roughly ES3.
You can write one ;)
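function getColValuesByName(sheet, name) {
// indexOf() is zero-based and returns -1 when the header is not in row 1
var index = sheet.getRange(1,1,1,sheet.getLastColumn()).getValues()[0].indexOf(name);
if (index === -1) return null; // header not found
// getRange() columns are one-based, hence index + 1
return sheet.getRange(1,index + 1,sheet.getLastRow(),1).getValues();
}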
Here's a very simple one-line function you can copy. It returns the column number (A = 1, B = 2, etc.) for use in getRange, for example.
function getColByHeader(name) {
return SpreadsheetApp.getActiveSheet().getRange('1:1').getValues()[0].indexOf(name) + 1;
}
Although there is no direct way, there are plenty of ways to get what you want with a little set up:
Get all data and filter it(no set up):
var values = sheet.getDataRange().getValues();
var headers = values.splice(0,1);
var headerIdx = headers[0].indexOf("name");
values = values.map(function(row){return [row[headerIdx]];})
Named ranges set up:
If you have named ranges associated with that column,
spreadsheet.getRangeByName('Sheet Name!name').getValues();//where 'name' is a named range
Developer metadata set up:
If you have developer metadata associated with that column,
SpreadsheetApp.getActive()
.createDeveloperMetadataFinder()
.withKey(/*METADATA_KEY_ASSOCIATED_WITH_COLUMN*/)
.find()[0]
.getLocation()
.getColumn()
.getValues();
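The first approach ("get all data and filter it") can be wrapped in a small function that also handles a missing header. This is a sketch on a plain 2D array, such as the one getDataRange().getValues() returns; getColumnByHeader and the sample values are made up for illustration:

```javascript
// Hypothetical helper: values is a 2D array whose first row holds the headers.
// Returns the cells below the matching header as a one-column 2D array,
// or null when no header matches.
function getColumnByHeader(values, name) {
  var headerIdx = values[0].indexOf(name);
  if (headerIdx === -1) return null; // header not found
  // slice(1) skips the header row without modifying the input array.
  return values.slice(1).map(function (row) { return [row[headerIdx]]; });
}

var values = [['id', 'name'], [1, 'Ann'], [2, 'Bob']]; // made-up data
var names = getColumnByHeader(values, 'name');    // [['Ann'], ['Bob']]
var missing = getColumnByHeader(values, 'email'); // null
```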

Dynamically Validating Multiple Google Sheet Tabs

I am writing a script for google sheet validation on localization tests. I've gotten stuck on some of the logic. The purpose of the script is to 1) Iterate through all tabs. 2) Find the column on row 2 that has the text "Pass/Fail". Lastly, 3) Iterate down that column and return the rows that say Fail.
The correct script to look at is called combined(). Step 1 is close to being correct, I think. Step 2 has been hard coded for the moment and is not dynamic searching the row for the text. Step 3 is done.
Any help would be great :)!!! Thanks in advance.
https://docs.google.com/spreadsheets/d/1mJfDtAi0hHqhqNB2367OPyNFgSPa_tW9l1akByaTSEk/edit?usp=sharing
/*This function is to cycle through all spreadsheets.
On each spreadsheet, it will search the second row for the column that says "Pass/Fail".
Lastly, it will take that column and look for all the fails and return that row*/
function combined() {
var sheets = SpreadsheetApp.getActiveSpreadsheet().getSheets();
var r =[];
for (var i=0 ; i<sheets.length ; i++){//iterate through all the sheets
var sh = SpreadsheetApp.getActiveSheet();
var data = sh.getDataRange().getValues(); // read all data in the sheet
//r.push("test1"); //Testing to make sure all sheets get cycled through
/*I need something here to find which column on row two says "Pass/Fail"*/
for(i=3;i<data.length;++i){ // iterate row by row and examine data in column A
//r.push("test2"); //Testing to make sure the all
if(data[i][7]=='Fail'){ r.push(data[i])}; // if column 7 contains 'fail' then add it to the list
}
}
return r; //Return row of failed results on all tabs
}
This sample first retrieves the values in column G of every sheet, starting at row 3, and then builds the result from them. The result is a two-dimensional array. The index of each element corresponds to the sheet index, and each element lists the row numbers that contain "Fail". If a sheet doesn't include values in column G, its element is an empty array.
For example, in the following situation:
Sheet 0 doesn't include values in column G.
Sheet 1 includes values in column G. "Fail" appears in rows 3, 4 and 5.
Sheet 2 includes values in column G. "Fail" appears in rows 6, 7 and 8.
The result (return r) becomes as below.
[[], [3, 4, 5], [6, 7, 8]]
Sample script 1:
function combined() {
var sheets = SpreadsheetApp.getActiveSpreadsheet().getSheets();
var data =[];
sheets.forEach(function(ss){
try { // A sheet with fewer than 3 rows of data (for example a newly added blank sheet) makes getRange() throw. This try...catch avoids the error.
data.push(ss.getRange(3, 7, ss.getLastRow() - 2, 1).getValues()); // rows 3..last row of column G
} catch(e) {
data.push([]);
}
});
var r = [];
data.forEach(function(e1, i1){
var temp = [];
e1.forEach(function(e2, i2){
if (e2[0] == "Fail") temp.push(i2 + 3);
});
r.push(temp);
});
return r;
}
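The sample above reads column G directly. To find the "Pass/Fail" column dynamically in row 2, as the question asks, the lookup could be sketched as below. It works on one sheet's getDataRange().getValues() result; findFailRows and the sample data are made up for illustration:

```javascript
// Hypothetical helper: data is a 2D array of one sheet's values.
// headerRowNumber is the 1-based row that holds the headers (2 in the question).
// Returns the 1-based row numbers below the header row whose cell in the
// headerLabel column equals "Fail".
function findFailRows(data, headerRowNumber, headerLabel) {
  var headerRow = data[headerRowNumber - 1];
  if (!headerRow) return []; // sheet has fewer rows than the header row
  var col = headerRow.indexOf(headerLabel);
  if (col === -1) return []; // this sheet has no such column
  var failRows = [];
  for (var i = headerRowNumber; i < data.length; i++) {
    if (data[i][col] === 'Fail') failRows.push(i + 1); // 1-based row number
  }
  return failRows;
}

var sheetData = [ // made-up data
  ['Localization test', '', ''],
  ['Step', 'Expected', 'Pass/Fail'],
  ['Open app', 'Home screen', 'Pass'],
  ['Tap login', 'Login form', 'Fail'],
  ['Submit', 'Dashboard', 'Fail']
];
var failRows = findFailRows(sheetData, 2, 'Pass/Fail'); // [4, 5]
```

Inside sheets.forEach(), calling r.push(findFailRows(ss.getDataRange().getValues(), 2, "Pass/Fail")) would give the same result shape as the sample script.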
If I misunderstand your question, I'm sorry.
