Unique rows based on more than one variable

To count the unique rows of a data frame in R based on two columns, you can combine the duplicated() function with logical indexing. Assuming you have a data frame called my_data with two columns named column1 and column2, you can use the following code:

unique_rows <- my_data[!duplicated(my_data[, c("column1", "column2")]),]
n_unique_rows <- nrow(unique_rows)

The duplicated() function returns a logical vector that is TRUE for every row whose combination of values in the specified columns has already appeared in an earlier row. The ! operator negates this vector, so that TRUE values become FALSE and vice versa. When the negated vector is used to index the data frame with the [ operator, only the first occurrence of each combination is kept, with all columns retained. Finally, the nrow() function counts the remaining rows.
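As a minimal sketch of these steps, assuming a small made-up data frame (the values and the extra column value are illustrative only, not taken from real data):

```r
# Hypothetical example data; column1 and column2 follow the naming used above
my_data <- data.frame(
  column1 = c("a", "a", "b", "b", "a"),
  column2 = c(1, 1, 2, 3, 1),
  value   = c(10, 20, 30, 40, 50)
)

# TRUE marks a row whose (column1, column2) pair has already appeared
dup <- duplicated(my_data[, c("column1", "column2")])
print(dup)

# Keep the first occurrence of each pair; all columns are retained
unique_rows <- my_data[!dup, ]
n_unique_rows <- nrow(unique_rows)
print(n_unique_rows)
```

Here rows 2 and 5 repeat the pair ("a", 1) from row 1, so three rows remain. The value column plays no part in the comparison, which is why rows with different value entries are still treated as duplicates.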

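Note that the order of the columns in the c() function does not affect which rows count as duplicates: rows are compared on the combination of values in the selected columns. What does matter is the order of the rows, because duplicated() keeps the first occurrence of each combination and flags the later ones.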
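To check on your own data what does and does not change the result, a short sketch along the following lines may help. It assumes a made-up my_data with the same column names as above, and it also shows unique() applied to the two-column subset as an alternative way to get the count.

```r
# Hypothetical example data (illustrative values only)
my_data <- data.frame(
  column1 = c("a", "a", "b", "b", "a"),
  column2 = c(1, 1, 2, 3, 1),
  value   = c(10, 20, 30, 40, 50)
)

# Count distinct pairs with the columns listed in either order
n_12 <- sum(!duplicated(my_data[, c("column1", "column2")]))
n_21 <- sum(!duplicated(my_data[, c("column2", "column1")]))

# unique() on the two-column subset gives the count directly,
# but it drops the other columns of the data frame
n_unique <- nrow(unique(my_data[, c("column1", "column2")]))

# Row order decides which representative is kept: after reversing the
# rows, the last occurrence of each pair survives instead of the first
kept_first <- my_data[!duplicated(my_data[, c("column1", "column2")]), ]
reversed   <- my_data[rev(seq_len(nrow(my_data))), ]
kept_last  <- reversed[!duplicated(reversed[, c("column1", "column2")]), ]

print(c(n_12, n_21, n_unique))
print(kept_first$value)
print(kept_last$value)
```

If the last occurrence is the one you want to keep, duplicated() also accepts fromLast = TRUE, which avoids reversing the data frame by hand.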

Krzysztof Banas
Principal Research Fellow

I work as a beamline scientist at the Singapore Synchrotron Light Source. My research interests include the application of advanced statistical methods to hyperspectral data processing (dimension reduction, clustering and identification).
