Big Computing: Fastest way to read a CSV file into R

Monday, July 13, 2015

Fastest way to read a CSV file into R

So I thought it would be really helpful to see just what the difference is between the two methods, read.csv() and data.table's fread(). For this example I am still using a relatively small data set: a little over five and a half million rows by six columns.
First, the read.csv() function built into base R:
## Start timer
ptm <- proc.time()
test1 <- read.csv("baby_data.csv")
## Stop timer and print time
ptm <- proc.time() - ptm
dim(test1)
## [1] 5674089       6
print(ptm)
##    user  system elapsed 
##  33.427   0.495  33.945
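As an aside, base R's system.time() wraps the same start/stop pattern in a single call, which is a little less error-prone than managing proc.time() by hand. A minimal sketch on the same file (timings will of course vary by machine):

```r
## system.time() runs the expression and returns the user/system/elapsed
## times directly, so no manual start/stop bookkeeping is needed.
system.time(test1 <- read.csv("baby_data.csv"))
```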

Next, the fread() function from the data.table package:
## Start timer
ptm <- proc.time()
require(data.table)
## Loading required package: data.table
test2 <- fread("baby_data.csv")
## Read 64.9% of 5674089 rows
## Read 86.2% of 5674089 rows
## Read 5674089 rows and 6 (of 6) columns from 0.187 GB file in 00:00:05
ptm <- proc.time() - ptm
print(ptm)
##    user  system elapsed 
##   4.027   0.190   4.224

As you can see, fread() is roughly eight times faster than read.csv() on this data set (about 33.9 vs. 4.2 seconds elapsed). That is pretty amazing. There is also a package called readr, by Hadley Wickham, that is a little slower than data.table but has some nice added features.
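For completeness, readr's read_csv() can be timed the same way. This is just a sketch, not a benchmark I ran here; it assumes readr is installed from CRAN and uses the same "baby_data.csv" file as above:

```r
## Time readr's read_csv() with the same proc.time() pattern used above.
library(readr)
ptm <- proc.time()
test3 <- read_csv("baby_data.csv")
print(proc.time() - ptm)
```

One of the "nice added features" is that read_csv() returns column type information as it parses and never converts strings to factors, which read.csv() does by default in this era of R.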
