Find columns that include missing values (NA)

To find which columns in an R data frame include missing values (NA), you can combine is.na() with colSums(). is.na(df) returns a logical matrix that is TRUE wherever a value is missing. colSums() then adds up each column of that matrix, treating TRUE as 1 and FALSE as 0. The result is a named vector giving the number of missing values in each column.

Here’s an example of how to use colSums() to identify columns with missing values:

# Create a sample data frame with missing values
df <- data.frame(x = c(1, 2, NA, 4), y = c(NA, 2, 3, 4), z = c(1, 2, 3, NA))

# Use colSums() to count the number of NAs in each column
na_counts <- colSums(is.na(df))

# Print the result
na_counts
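To show why is.na() is needed, the sketch below reuses the same sample df. It compares colSums() on the raw data with colSums() on the logical matrix returned by is.na(). The name na_matrix is only illustrative.

```r
# Same sample data frame as above
df <- data.frame(x = c(1, 2, NA, 4), y = c(NA, 2, 3, 4), z = c(1, 2, 3, NA))

# is.na() on a data frame returns a logical matrix with the same dimensions
na_matrix <- is.na(df)
dim(na_matrix)
# [1] 4 3

# Summing the raw values propagates NA instead of counting missing entries
colSums(df)
#  x  y  z
# NA NA NA

# Summing the logical matrix counts the TRUE values (TRUE is treated as 1)
colSums(na_matrix)
# x y z
# 1 1 1
```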

The resulting na_counts is a named numeric vector with one element per column of df. Each element gives the number of missing values in that column; here x, y and z each contain one. Columns with at least one missing value have a count greater than zero, so you can use logical indexing to extract their names:

# Extract column names with missing values
na_cols <- names(na_counts)[na_counts > 0]

# Print the result
na_cols
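If you need this check repeatedly, you can wrap it in a small function. The sketch below uses a different but equivalent base R approach: vapply() with anyNA(). This returns one TRUE/FALSE per column and does not build the full is.na() matrix. The function name cols_with_na and the extra column w are hypothetical and added only for illustration.

```r
# Sample data: columns x, y and z contain NA; w (hypothetical) is complete
df <- data.frame(x = c(1, 2, NA, 4), y = c(NA, 2, 3, 4), z = c(1, 2, 3, NA),
                 w = c(5, 6, 7, 8))

# Hypothetical helper: names of columns that contain at least one NA
cols_with_na <- function(data) {
  # vapply() applies anyNA() to each column and guarantees a logical result
  has_na <- vapply(data, anyNA, logical(1))
  names(data)[has_na]
}

cols_with_na(df)
# [1] "x" "y" "z"

# Negating the same test keeps only the columns without missing values
complete_df <- df[, !vapply(df, anyNA, logical(1)), drop = FALSE]
names(complete_df)
# [1] "w"
```

Because the test yields a logical vector, negating it with ! selects the complete columns. drop = FALSE keeps the result a data frame even when only one column remains.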
Krzysztof Banas
Principal Research Fellow

I work as a beam-line scientist at Singapore Synchrotron Light Source. My research interests include the application of advanced statistical methods to hyperspectral data processing (dimension reduction, clustering and identification).
