
1. Saving, Loading, and Removing Data Structures

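# Save the objects x, y, and z to a file
save(x, y, z, file = "mydata.rdata")

# Load the objects stored in mydata.rdata
load("mydata.rdata")

# Save the entire workspace (written to .RData in the working directory)
save.image()

# List all objects in the workspace
ls()

# Remove the objects m and subject1
rm(m, subject1)

# Remove all objects in the workspace
rm(list=ls())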
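
The save, remove, and load steps above can be tried end to end with a small sketch. The objects x, y, and z below are hypothetical stand-ins, and the file is written to a temporary path (an assumption made here so the working directory is left untouched).

```r
# Hypothetical objects standing in for x, y, and z
x <- 1:3
y <- c("a", "b")
z <- list(flag = TRUE)

# Save to a temporary file instead of the working directory
rdata_path <- tempfile(fileext = ".rdata")
save(x, y, z, file = rdata_path)

# Remove the objects and confirm that x is gone
rm(x, y, z)
x_removed <- !exists("x", inherits = FALSE)
print(x_removed)

# load() recreates the objects under their original names and
# invisibly returns those names as a character vector
restored <- load(rdata_path)
print(restored)
print(x)
```

Because load() restores objects under their saved names, it silently overwrites any existing objects with the same names.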

2. Importing and Exporting Data with CSV Files

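# CSV import and export use read.csv() and write.csv():
#   read.csv(file)       reads a CSV file into a data frame
#   write.csv(x, file)   writes a data frame to a CSV file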

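# Set the working directory
setwd("H:\\DongAGraduateSchool\\DongA-GraduateSchool\\데이터마이닝-김상진교수님")
# Show the current working directory
getwd()

# Export pt_data to a CSV file (row.names=TRUE also writes the row names as the first column)
write.csv(pt_data, "pt_data.csv", row.names=TRUE)

# List the files in the working directory
dir()

# Import the CSV file
pt_data1 = read.csv("pt_data.csv")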
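
The effect of row.names on the export and re-import above can be checked with a short sketch. The data frame below is a hypothetical stand-in for pt_data, and a temporary file path is assumed in place of the working directory.

```r
# Hypothetical stand-in for pt_data
pt_data <- data.frame(
  subject = c("A", "B", "C"),
  temperature = c(98.1, 98.6, 101.4)
)

csv_path <- tempfile(fileext = ".csv")

# With row.names = TRUE the row names are written as an unnamed first column,
# which read.csv() then imports as a column named "X"
write.csv(pt_data, csv_path, row.names = TRUE)
with_rownames <- read.csv(csv_path)
print(names(with_rownames))

# With row.names = FALSE only the original columns are written
write.csv(pt_data, csv_path, row.names = FALSE)
without_rownames <- read.csv(csv_path)
print(names(without_rownames))
```

If the row names carry no information, row.names = FALSE keeps the re-imported data frame the same shape as the original.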

3. Inspecting the Data

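# Import the usedcars.csv file
userdcars = read.csv("usedcars.csv")

# Display the structure of the data frame
str(userdcars)

# Result (the Factor columns reflect stringsAsFactors = TRUE, the read.csv default before R 4.0.0)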
# > str(userdcars)
# 'data.frame':	150 obs. of  6 variables:
#  $ year        : int  2011 2011 2011 2011 2012 2010 2011 2010 2011 2010 ...
#  $ model       : Factor w/ 3 levels "SE","SEL","SES": 2 2 2 2 1 2 2 2 3 3 ...
#  $ price       : int  21992 20995 19995 17809 17500 17495 17000 16995 16995 16995 ...
#  $ mileage     : int  7413 10926 7351 11613 8367 25125 27393 21026 32655 36116 ...
#  $ color       : Factor w/ 9 levels "Black","Blue",..: 9 4 7 4 8 7 2 7 7 7 ...
#  $ transmission: Factor w/ 2 levels "AUTO","MANUAL": 1 1 1 1 1 1 1 1 1 1 ...
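
The str() output above shows the text columns as factors. From R 4.0.0 onward read.csv() defaults to stringsAsFactors = FALSE, so the same file would show chr columns unless the argument is set explicitly. The sketch below illustrates the difference on a hypothetical three-row CSV supplied inline through the text argument; it is not the actual usedcars.csv data.

```r
# Hypothetical three-row CSV with the same kinds of columns as usedcars.csv
csv_text <- "year,model,price
2011,SEL,21992
2012,SE,17500
2010,SES,16995"

# Text columns stay as character vectors
as_character <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(as_character)

# Text columns are converted to factors, as in the output above
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
str(as_factor)

# A factor stores integer codes plus a set of levels
print(levels(as_factor$model))
```

Setting stringsAsFactors explicitly makes a script behave the same way across R versions.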

4. Exploring Numeric Variables

# Import the usedcars.csv file
userdcars = read.csv("usedcars.csv")

# Summarize the numeric variables
summary(userdcars)
summary(userdcars$year)
summary(userdcars[c("price", "mileage")])
mean(userdcars$price)
median(userdcars$price)
range(userdcars$price)
diff(range(userdcars$price))
str(userdcars$price)
summary(userdcars$price)
IQR(userdcars$price)
quantile(userdcars$price, probs=c(0.01, 0.99))
quantile(userdcars$price, seq(from=0,to=1, by=0.20))

# Result

# > summary(userdcars)
#       year      model        price          mileage           color    transmission
#  Min.   :2000   SE :78   Min.   : 3800   Min.   :  4867   Black  :35   AUTO  :128  
#  1st Qu.:2008   SEL:23   1st Qu.:10995   1st Qu.: 27200   Silver :32   MANUAL: 22  
#  Median :2009   SES:49   Median :13592   Median : 36385   Red    :25               
#  Mean   :2009            Mean   :12962   Mean   : 44261   Blue   :17               
#  3rd Qu.:2010            3rd Qu.:14904   3rd Qu.: 55125   Gray   :16               
#  Max.   :2012            Max.   :21992   Max.   :151479   White  :16               
#                                                           (Other): 9               

# > summary(userdcars$year)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#    2000    2008    2009    2009    2010    2012 

# > summary(userdcars[c("price", "mileage")])
#      price          mileage      
#  Min.   : 3800   Min.   :  4867  
#  1st Qu.:10995   1st Qu.: 27200  
#  Median :13592   Median : 36385  
#  Mean   :12962   Mean   : 44261  
#  3rd Qu.:14904   3rd Qu.: 55125  
#  Max.   :21992   Max.   :151479  

# > mean(userdcars$price)
# [1] 12961.93

# > median(userdcars$price)
# [1] 13591.5

# > range(userdcars$price)
# [1]  3800 21992
# > diff(range(userdcars$price))
# [1] 18192
# > str(userdcars$price)
#  int [1:150] 21992 20995 19995 17809 17500 17495 17000 16995 16995 16995 ...

# > summary(userdcars$price)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#    3800   10995   13592   12962   14904   21992 

# > IQR(userdcars$price)
# [1] 3909.5

# > quantile(userdcars$price, probs=c(0.01, 0.99))
#       1%      99% 
#  5428.69 20505.00 

# > quantile(userdcars$price, seq(from=0,to=1, by=0.20))
#      0%     20%     40%     60%     80%    100% 
#  3800.0 10759.4 12993.8 13992.0 14999.0 21992.0 
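
The summary functions above are related to one another. In the results, diff(range()) is the maximum minus the minimum (21992 - 3800 = 18192), and IQR() is the distance between the third and first quartiles. The sketch below checks these relationships on a hypothetical price vector, not the usedcars data.

```r
# Hypothetical prices, used only to illustrate how the functions relate
prices <- c(3800, 10995, 12000, 13592, 14904, 15999, 21992)

# range() returns c(min, max), so diff(range()) is max - min
spread <- diff(range(prices))
print(spread == max(prices) - min(prices))

# IQR() is the difference between the 75% and 25% quantiles
q <- quantile(prices, probs = c(0.25, 0.75))
print(isTRUE(all.equal(IQR(prices), unname(q[2] - q[1]))))

# The 50% quantile is the median
mid <- unname(quantile(prices, probs = 0.5))
print(isTRUE(all.equal(mid, median(prices))))
```

Unlike the full range, the IQR ignores the lowest and highest quarters of the data, so it is less affected by extreme values.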

 
