1. Saving, Loading, and Removing Data Structures
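# Save the objects x, y, and z to a file
save(x, y, z, file = "mydata.rdata")
# Load the objects stored in mydata.rdata
load("mydata.rdata")
# Save the entire workspace (written to .RData in the working directory by default)
save.image()
# List all objects in the workspace
ls()
# Remove the objects m and subject1
rm(m, subject1)
# Remove all objects in the workspace
rm(list = ls())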
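The save, remove, and load steps above can be tried end to end with a minimal sketch like the one below. The objects `x`, `y`, and `z` and the temporary file path are assumptions made up for illustration; they are not the data used in this post.

```r
# Hypothetical objects, used only for illustration
x <- 1:5
y <- c("a", "b")
z <- data.frame(id = 1:2, score = c(90, 85))

# Write to a temporary file so the working directory is left untouched
path <- tempfile(fileext = ".rdata")
save(x, y, z, file = path)

# Remove the objects from the current environment
rm(x, y, z)

# load() restores the objects and invisibly returns their names
restored <- load(path)
print(restored)
print(z)
```

Capturing the return value of `load()` is a handy way to see which objects a file brought back, since `load()` otherwise overwrites same-named objects silently.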
2. Importing and Exporting Data with CSV Files
# read.csv() imports a CSV file into a data frame
# write.csv() exports a data frame to a CSV file
# Set working directory
setwd("H:\\DongAGraduateSchool\\DongA-GraduateSchool\\데이터마이닝-김상진교수님")
# show working directory
getwd()
# Export pt_data to a CSV file (row names are written as the first column)
write.csv(pt_data, "pt_data.csv", row.names = TRUE)
# List the files in the working directory
dir()
# Import the CSV file
pt_data1 = read.csv("pt_data.csv")
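One thing worth checking after a round trip is the effect of `row.names = TRUE`: the row names are written as an unnamed first column, and `read.csv()` reads that column back as `X`. The sketch below shows this with a hypothetical data frame `pt_demo` (a stand-in, not the `pt_data` from this post) and a temporary file.

```r
# Hypothetical data frame standing in for pt_data
pt_demo <- data.frame(name = c("Kim", "Lee"), temperature = c(36.5, 37.2))
path <- tempfile(fileext = ".csv")

# Export with row names, then import again
write.csv(pt_demo, path, row.names = TRUE)
with_rownames <- read.csv(path)
print(names(with_rownames))     # an extra leading column named "X" appears

# Export without row names, then import again
write.csv(pt_demo, path, row.names = FALSE)
without_rownames <- read.csv(path)
print(names(without_rownames))  # only the original columns
```

If the extra column is not wanted, `row.names = FALSE` on export is usually the simplest fix.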
3. Inspecting the Data
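# Import the usedcars.csv file
# (stringsAsFactors = TRUE is needed in R 4.0 and later to get the Factor columns shown below)
userdcars = read.csv("usedcars.csv", stringsAsFactors = TRUE)
# Display the structure of the data frame
str(userdcars)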
# result
# > str(userdcars)
# 'data.frame': 150 obs. of 6 variables:
# $ year : int 2011 2011 2011 2011 2012 2010 2011 2010 2011 2010 ...
# $ model : Factor w/ 3 levels "SE","SEL","SES": 2 2 2 2 1 2 2 2 3 3 ...
# $ price : int 21992 20995 19995 17809 17500 17495 17000 16995 16995 16995 ...
# $ mileage : int 7413 10926 7351 11613 8367 25125 27393 21026 32655 36116 ...
# $ color : Factor w/ 9 levels "Black","Blue",..: 9 4 7 4 8 7 2 7 7 7 ...
# $ transmission: Factor w/ 2 levels "AUTO","MANUAL": 1 1 1 1 1 1 1 1 1 1 ...
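In the `str()` output, the integers listed after each `Factor` column are level codes, not the values themselves: each code is an index into the vector of levels. A minimal sketch with a hypothetical four-element factor (not the actual `model` column) makes the mapping visible.

```r
# Hypothetical factor, used only to illustrate how level codes work
model <- factor(c("SEL", "SEL", "SE", "SES"))

# Levels are sorted alphabetically by default
print(levels(model))      # "SE" "SEL" "SES"

# The stored integer codes index into levels(model)
print(as.integer(model))  # 2 2 1 3

# Count the observations per level
print(table(model))
```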
4. Exploring Numeric Variables
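# Import the usedcars.csv file
# (stringsAsFactors = TRUE is needed in R 4.0 and later to get the factor counts shown below)
userdcars = read.csv("usedcars.csv", stringsAsFactors = TRUE)
# Summarize the numeric variables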
summary(userdcars)
summary(userdcars$year)
summary(userdcars[c("price", "mileage")])
mean(userdcars$price)
median(userdcars$price)
range(userdcars$price)
diff(range(userdcars$price))
str(userdcars$price)
summary(userdcars$price)
IQR(userdcars$price)
quantile(userdcars$price, probs=c(0.01, 0.99))
quantile(userdcars$price, seq(from=0,to=1, by=0.20))
# result
# > summary(userdcars)
#       year      model       price          mileage           color     transmission
#  Min.   :2000   SE :78   Min.   : 3800   Min.   :  4867   Black  :35   AUTO  :128
#  1st Qu.:2008   SEL:23   1st Qu.:10995   1st Qu.: 27200   Silver :32   MANUAL: 22
#  Median :2009   SES:49   Median :13592   Median : 36385   Red    :25
#  Mean   :2009            Mean   :12962   Mean   : 44261   Blue   :17
#  3rd Qu.:2010            3rd Qu.:14904   3rd Qu.: 55125   Gray   :16
#  Max.   :2012            Max.   :21992   Max.   :151479   White  :16
#                                                           (Other): 9
# > summary(userdcars$year)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#    2000    2008    2009    2009    2010    2012
# > summary(userdcars[c("price", "mileage")])
#      price          mileage
#  Min.   : 3800   Min.   :  4867
#  1st Qu.:10995   1st Qu.: 27200
#  Median :13592   Median : 36385
#  Mean   :12962   Mean   : 44261
#  3rd Qu.:14904   3rd Qu.: 55125
#  Max.   :21992   Max.   :151479
# > mean(userdcars$price)
# [1] 12961.93
# > median(userdcars$price)
# [1] 13591.5
# > range(userdcars$price)
# [1] 3800 21992
# > diff(range(userdcars$price))
# [1] 18192
# > str(userdcars$price)
# int [1:150] 21992 20995 19995 17809 17500 17495 17000 16995 16995 16995 ...
# > summary(userdcars$price)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 3800 10995 13592 12962 14904 21992
# > IQR(userdcars$price)
# [1] 3909.5
# > quantile(userdcars$price, probs=c(0.01, 0.99))
#       1%      99%
#  5428.69 20505.00
# > quantile(userdcars$price, seq(from=0,to=1, by=0.20))
#      0%     20%     40%     60%     80%    100%
#  3800.0 10759.4 12993.8 13992.0 14999.0 21992.0
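The spread measures used above are closely related: `IQR()` is the third quartile minus the first quartile, and `diff(range())` is the maximum minus the minimum. The sketch below checks this on a hypothetical vector `prices`, chosen so the quartiles are easy to verify by hand; it is not the usedcars data.

```r
# Hypothetical prices, chosen so the quartiles are easy to check by hand
prices <- c(10, 20, 30, 40, 50)

# First and third quartiles (R's default quantile method, type 7)
q <- quantile(prices, probs = c(0.25, 0.75))

# IQR() is the distance between those two quartiles
iqr_by_hand <- unname(q["75%"] - q["25%"])
iqr_builtin <- IQR(prices)

# diff(range()) is the distance between the minimum and the maximum
spread <- diff(range(prices))

print(c(iqr_by_hand = iqr_by_hand, iqr_builtin = iqr_builtin, spread = spread))
```

Because the IQR ignores the lowest and highest quarters of the data, it is much less sensitive to extreme values than the full range.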