Replace NA with 0 in R

There are different methods to replace NA with 0 in the R language. Here we are going to walk through them, using a different function for each.

First, we should know:

What is NA in R?

In R, missing values are represented as NA (not available). If you simply want to drop them, use the na.omit() function, which returns the data with every row that contains an NA removed.
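
A minimal sketch of detecting and dropping missing values in base R. The vector `x` and the data frame `scores` are made-up names used only for illustration; they do not appear elsewhere in this article.

```r
# Hypothetical data, used only for illustration
x <- c(5, NA, 7)
flags <- is.na(x)        # TRUE at every position that holds NA
n_missing <- sum(flags)  # number of missing values

scores <- data.frame(id = 1:3, score = c(10, NA, 30))
complete <- na.omit(scores)  # keeps only the rows without any NA
print(complete)
```

Dropping rows discards the other values stored in those rows, which is one reason to replace NA instead.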

But what if you want to replace them with 0?

Use the methods below, grouped by category, to replace NA with 0 in R.

  • Replace NA in a dataframe in R.
  • Replace NA in a column in R.
  • Replace NA with 0 in multiple columns in R.
  • Replace NA with 0 in R with dplyr.
  • Replace NA with 0 in an R data.table.

1. Replace NA in dataframe R

This is a simple dataframe that we define and print. Many of its values are NA.

#Simple dataframe
df <- data.frame(
      Class_A =  c(11,11,NA,NA,22,33,33,NA),
      Class_B =  c(99,99,NA,77,77,NA,55,55),
      Class_C =  c("Green","Green",NA,"Blue","Blue",NA,"Red","Red"),
      Class_D =  c(NA,NA,"Blue","Yellow","Yellow","Purple",NA,NA)
         )
print(df)

Output

  Class_A Class_B Class_C Class_D
1      11      99   Green    <NA>
2      11      99   Green    <NA>
3      NA      NA    <NA>    Blue
4      NA      77    Blue  Yellow
5      22      77    Blue  Yellow
6      33      NA    <NA>  Purple
7      33      55     Red    <NA>
8      NA      55     Red    <NA>



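The output dataframe contains multiple NA values. To replace NA with 0 in a dataframe, we use the is.na() function, which returns TRUE at every position that holds NA, and assign 0 to those positions. When the string columns are stored as factors (the default in R versions before 4.0.0), the replacement only succeeds in the numeric columns.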

# Replace NA in dataframe R
df <- data.frame(
      Class_A =  c(11,84,NA,NA,49, 87, 44),
      Class_B =  c(99,98,NA,84, 54,NA,55),
      Class_C =  c("Green","Purple",NA,"Blue","Cyan",NA,"Pink"),
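      Class_D =  c(NA,NA,"XML","Java","JavaScript","C#",NA),
      stringsAsFactors = TRUE  # the default before R 4.0.0; set explicitly to reproduce the warnings below
                  )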
df[is.na(df)] <- 0
print(df)

Output

Warning messages:
1: In `[<-.factor`(`*tmp*`, thisvar, value = 0) :
  invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, thisvar, value = 0) :
  invalid factor level, NA generated
  Class_A Class_B Class_C    Class_D
1      11      99   Green       <NA>
2      84      98  Purple       <NA>
3       0       0    <NA>        XML
4       0      84    Blue       Java
5      49      54    Cyan JavaScript
6      87       0    <NA>         C#
7      44      55    Pink       <NA>



As you may have noticed in the above program, NA is replaced with 0 only in the columns that contain numeric data. The factor columns keep their NA values and R issues a warning.

To solve this problem, we will use “stringsAsFactors = FALSE”. A factor stores categorical data as a fixed set of levels (here, the colour and language names), and 0 is not one of those levels, so assigning it produces NA again. With stringsAsFactors = FALSE the string columns stay as character vectors, so NA can be replaced with 0 (stored as the string "0") in numeric and string columns alike. Since R 4.0.0, FALSE is the default.

# Replace NA in dataframe R
df <- data.frame(
      Class_A =  c(11,84,NA,NA,49, 87, 44),
      Class_B =  c(99,98,NA,84, 54,NA,55),
      Class_C =  c("Green","Purple",NA,"Blue","Cyan",NA,"Pink"),
      Class_D =  c(NA,NA,"XML","Java","JavaScript","C#",NA),
      stringsAsFactors = FALSE
                  )
df[is.na(df)] <- 0
print(df)

You will get output in which NA is replaced with 0 in all numeric and string columns.

Output

  Class_A Class_B Class_C    Class_D
1      11      99   Green          0
2      84      98  Purple          0
3       0       0       0        XML
4       0      84    Blue       Java
5      49      54    Cyan JavaScript
6      87       0       0         C#
7      44      55    Pink          0
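
The difference between factor and character columns can be seen in isolation. The following is only a sketch on a made-up vector (`colours`, `as_factor`, and `as_text` are illustrative names), not part of the dataframe used in this article.

```r
# A factor only accepts values that are among its levels
colours <- factor(c("Green", NA, "Blue"))
levels(colours)

as_factor <- colours
as_factor[is.na(as_factor)] <- 0   # warns: invalid factor level, NA generated
anyNA(as_factor)                   # the NA is still there

# Converting to character first lets the replacement go through
as_text <- as.character(colours)
as_text[is.na(as_text)] <- 0       # 0 is coerced to the string "0"
print(as_text)
```

The replaced value in a character column is the text "0", not the number 0, so such a column still cannot be used in arithmetic.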

2. Replace NA in a column in R

If you want to replace NA in only one column of the dataframe, select that column by name and apply the is.na() function to it. Here we target Class_C and replace its NA values with 0.

# Replace NA in column(single) R
df <- data.frame(Class_A =  c(11,84,NA,NA,49, 87, 44),
                 Class_B =  c(99,98,NA,84, 54,NA,55),
                 Class_C =  c("Green","Purple",NA,"Blue","Cyan",NA,"Pink"),
                 Class_D =  c(NA,NA,"XML","Java","JavaScript","C#",NA),
                 stringsAsFactors = FALSE                 
)
df["Class_C"][is.na(df["Class_C"])] <- 0
print(df)

Output

  Class_A Class_B Class_C    Class_D
1      11      99   Green       <NA>
2      84      98  Purple       <NA>
3      NA      NA       0        XML
4      NA      84    Blue       Java
5      49      54    Cyan JavaScript
6      87      NA       0         C#
7      44      55    Pink       <NA>
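
The same single-column replacement is often written with the `$` operator. The sketch below uses a shortened, hypothetical version of the dataframe and targets the numeric column Class_A, so the column stays numeric after the replacement.

```r
# Shortened example data (illustrative only)
df <- data.frame(
  Class_A = c(11, 84, NA),
  Class_C = c("Green", NA, "Blue"),
  stringsAsFactors = FALSE
)

# Select the column with $, then overwrite only its NA positions
df$Class_A[is.na(df$Class_A)] <- 0
print(df)
```

Only Class_A changes; the NA in Class_C is left as it is.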

3. R replace NA with 0 in multiple columns

Replacing NA with 0 in multiple columns works much like the single-column case, except that you select a vector of column names instead of one name. In this method, we select two columns, “Class_B” and “Class_C”, and replace all NA values in these two columns with 0.

df<-data.frame(
      Class_A =  c(11,84,NA,NA,49, 87, 44),
      Class_B =  c(99,98,NA,84, 54,NA,55),
      Class_C =  c("Green","Purple",NA,"Blue","Cyan",NA,"Pink"),
      Class_D =  c(NA,NA,"XML","Java","JavaScript","C#",NA),
      stringsAsFactors = FALSE
) 
df[,c("Class_B", "Class_C")][is.na(df[,c("Class_B","Class_C")])] <- 0
print(df)

Output

  Class_A Class_B Class_C    Class_D
1      11      99   Green       <NA>
2      84      98  Purple       <NA>
3      NA       0       0        XML
4      NA      84    Blue       Java
5      49      54    Cyan JavaScript
6      87       0       0         C#
7      44      55    Pink       <NA>
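
An alternative formulation, shown here only as a sketch on a shortened, hypothetical dataframe, keeps the column names in a vector (`cols`, an illustrative name) and applies the replacement to each selected column with lapply().

```r
# Shortened example data (illustrative only)
df <- data.frame(
  Class_A = c(11, NA, 49),
  Class_B = c(99, NA, 55),
  Class_C = c("Green", NA, "Pink"),
  stringsAsFactors = FALSE
)

cols <- c("Class_B", "Class_C")   # columns to clean
df[cols] <- lapply(df[cols], function(x) {
  x[is.na(x)] <- 0                # numeric stays numeric, character gets "0"
  x
})
print(df)
```

Keeping the names in one vector avoids repeating the column selection on both sides of the assignment.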

4. Replace NA with 0 in R dplyr

dplyr is an R package for manipulating dataframes. Its mutate_if() function applies a transformation only to the columns that satisfy a condition. In this program, mutate_if() selects the numeric columns with is.numeric and replaces their NA values with 0; the string columns are left unchanged.

#Replace NA with 0 in r dplyr

# install.packages("dplyr")  # run once if dplyr is not installed yet
library(dplyr)
df <- data.frame(
      Class_A =  c(11,84,NA,NA,49, 87, 44),
      Class_B =  c(99,98,NA,84, 54,NA,55),
      Class_C =  c("Green","Purple",NA,"Blue","Cyan",NA,"Pink"),
      Class_D =  c(NA,NA,"XML","Java","JavaScript","C#",NA),
      stringsAsFactors = FALSE
)
df <- mutate_if(df, is.numeric, ~replace(., is.na(.), 0))
df

Output

  Class_A Class_B Class_C    Class_D
1      11      99   Green       <NA>
2      84      98  Purple       <NA>
3       0       0    <NA>        XML
4       0      84    Blue       Java
5      49      54    Cyan JavaScript
6      87       0    <NA>         C#
7      44      55    Pink       <NA>
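
In dplyr 1.0.0 and later, mutate_if() is superseded by across(), although it still works. A sketch of the equivalent call, assuming a recent dplyr is installed and using a shortened, hypothetical dataframe:

```r
library(dplyr)

# Shortened example data (illustrative only)
df <- data.frame(
  Class_A = c(11, NA, 49),
  Class_C = c("Green", NA, "Cyan"),
  stringsAsFactors = FALSE
)

# across() picks the numeric columns; replace() fills their NA positions
df <- df %>%
  mutate(across(where(is.numeric), ~ replace(.x, is.na(.x), 0)))
print(df)
```

As with mutate_if(), the character column Class_C keeps its NA.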

5. Replace NA with 0 in R data.table

A data.table is an extension of the dataframe with similar syntax, and it is often considerably faster on large datasets (how much faster depends on the operation and the data size). Here is how to replace NA with 0 in an R data.table.

# install.packages("data.table")  # run once if data.table is not installed yet
library(data.table)
dt  = data.table(
       Class_A =  c(11,84,NA,NA,49, 87, 44),
       Class_B =  c(99,98,NA,84, 54,NA,55),
       Class_C =  c("Green","Purple",NA,"Blue","Cyan",NA,"Pink"),
       Class_D =  c(NA,NA,"XML","Java","JavaScript","C#",NA),
                 stringsAsFactors = FALSE
)
print(dt)

This is the Output with NA.

   Class_A Class_B Class_C    Class_D
1:      11      99   Green       <NA>
2:      84      98  Purple       <NA>
3:      NA      NA    <NA>        XML
4:      NA      84    Blue       Java
5:      49      54    Cyan JavaScript
6:      87      NA    <NA>         C#
7:      44      55    Pink       <NA>

Now replace NA with 0 in the R data.table by using the command below.

# Replace NA with 0 in an R data.table
library(data.table)
dt <- data.table(
       Class_A =  c(11,84,NA,NA,49, 87, 44),
       Class_B =  c(99,98,NA,84, 54,NA,55),
       Class_C =  c("Green","Purple",NA,"Blue","Cyan",NA,"Pink"),
       Class_D =  c(NA,NA,"XML","Java","JavaScript","C#",NA),
       stringsAsFactors = FALSE
)
# Assign the number 0 (not the string "0") so numeric columns keep their type;
# in character columns the 0 is coerced to "0"
dt[, names(dt) := lapply(.SD, function(x) {x[is.na(x)] <- 0 ; x})]
dt

Output

   Class_A Class_B Class_C    Class_D
1:      11      99   Green          0
2:      84      98  Purple          0
3:       0       0       0        XML
4:       0      84    Blue       Java
5:      49      54    Cyan JavaScript
6:      87       0       0         C#
7:      44      55    Pink          0
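
If only the numeric columns need filling, data.table also provides setnafill(), which modifies the table in place. The sketch below assumes a data.table version that includes setnafill() (1.12.4 or later) and uses a shortened, hypothetical table; `num_cols` is an illustrative name.

```r
library(data.table)

# Shortened example data (illustrative only)
dt <- data.table(
  Class_A = c(11, NA, 49),
  Class_B = c(99, 98, NA),
  Class_C = c("Green", NA, "Cyan")
)

# setnafill() handles numeric columns, so select those first
num_cols <- names(dt)[sapply(dt, is.numeric)]
setnafill(dt, type = "const", fill = 0, cols = num_cols)
print(dt)
```

Because the update happens by reference, no reassignment of `dt` is needed, and the character column Class_C keeps its NA.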

Conclusion

In this article, we covered how to replace NA with 0 in R using different methods. NA can be replaced with 0 in a whole data.frame, a single column, multiple columns, or a data.table, using base R or libraries such as dplyr and data.table.
