R Data Analysis Report with Code and Data: Public Health and Economic Problems Caused by Storms and Other Severe Weather Events
library(R.utils)
library(data.table)
library(dplyr)
library(lubridate)
library(reshape2)
library(scales)
library(stringr)  # provides str_trim(), used when cleaning EVTYPE
Loading the dataset
The data is downloaded from the internet and loaded into the environment.
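fileUrl <- "..."
# Create directory data if needed
if (!file.exists("./data")) {
  dir.create("./data")
}
# Download the compressed file if needed
if (!file.exists("./data/repdata-data-StormData.csv.bz2")) {
  download.file(fileUrl,
                "./data/repdata-data-StormData.csv.bz2",
                mode = "wb")  # mode "wb" for binary files
}
# Extract the file if needed
if (!file.exists("./data/repdata-data-StormData.csv")) {
  bunzip2("./data/repdata-data-StormData.csv.bz2")
}
# Load the CSV
stormData <- fread("./data/repdata-data-StormData.csv")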
## Read 17.6% of 967216 rows
## Read 34.1% of 967216 rows
## Read 49.6% of 967216 rows
## Read 60.0% of 967216 rows
## Read 74.4% of 967216 rows
## Read 81.7% of 967216 rows
## Read 91.0% of 967216 rows
## Read 902297 rows and 37 (of 37) columns from 0.523 GB file in 00:00:09
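The download, extract, and load steps above all follow the same "only if the file is missing" pattern. A minimal sketch of that pattern as a reusable helper is shown here; the name ensure_file and its create argument are hypothetical and do not come from the original report, and the demo writes a small stand-in file instead of calling download.file().

```r
# Hypothetical helper: run `create(path)` only when `path` does not exist yet.
ensure_file <- function(path, create) {
  if (!file.exists(path)) {
    # Make sure the parent directory exists before creating the file
    dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
    create(path)
  }
  invisible(path)
}

# Demo with a stand-in for download.file(): the creator should run only once
calls <- 0
demo_path <- file.path(tempdir(), "ensure-file-demo", "demo.csv")
unlink(dirname(demo_path), recursive = TRUE)  # start from a clean state
make_demo <- function(p) {
  calls <<- calls + 1
  writeLines("EVTYPE,FATALITIES", p)
}
ensure_file(demo_path, make_demo)
ensure_file(demo_path, make_demo)  # file already exists, so nothing is recreated
```

In the report itself, create would wrap download.file() or bunzip2(), so that re-running the analysis does not repeat the slow steps.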
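Event types
The type of each weather event is stored in the EVTYPE column of the dataset.
To improve the quality of the analysis, the reported event types need to be standardized.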
# Remove whitespace at the beginning and the end of the event type
stormData$EVTYPE <- str_trim(stormData$EVTYPE)
# Put all the event types in uppercase
stormData$EVTYPE <- toupper(stormData$EVTYPE)
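To see why this standardization matters, the following sketch applies the same two steps to a few toy labels. The labels are invented for illustration, and base R's trimws() is used here as a stand-in for str_trim() so the example needs no packages.

```r
# Toy event types, invented for illustration only
evtype <- c("  Tornado", "TSTM WIND ", "hail", "HAIL")

# Trim surrounding whitespace, then convert to uppercase
evtype_clean <- toupper(trimws(evtype))

# Count distinct labels before and after the cleanup
n_before <- length(unique(evtype))
n_after <- length(unique(evtype_clean))
```

After the cleanup, "hail" and "HAIL" fall into the same group, so later sums per event type are not split across spelling variants.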
Subset columns relevant to the analysis
stormDataValues <- select(stormData, EVTYPE,
                          FATALITIES, INJURIES,
                          PROPDMG, PROPDMGEXP,
                          CROPDMG, CROPDMGEXP)
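Normalize property damage values
A new column, PropertyDamage, is added to the working dataset to hold the normalized estimate of property damage, expressed in US dollars.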
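# Initialize new column with the value of the original one
stormDataValues$PropertyDamage <- stormDataValues$PROPDMG
# Process K/h/H/blank cases (multiplier by 1,000):
stormDataValues[PROPDMGEXP == "h"]$PROPDMGEXP <- "K"
stormDataValues[PROPDMGEXP == "H"]$PROPDMGEXP <- "K"
stormDataValues[PROPDMGEXP == ""]$PROPDMGEXP <- "K"
stormDataValues[PROPDMGEXP == "K"]$PropertyDamage <-
  stormDataValues[PROPDMGEXP == "K"]$PROPDMG * 1000
# Process m/M (multiplier by 1,000,000):
stormDataValues[PROPDMGEXP == "m"]$PROPDMGEXP <- "M"
stormDataValues[PROPDMGEXP == "M"]$PropertyDamage <-
  stormDataValues[PROPDMGEXP == "M"]$PROPDMG * 1000000
# Process B (multiplier by 1,000,000,000):
stormDataValues[PROPDMGEXP == "B"]$PropertyDamage <-
  stormDataValues[PROPDMGEXP == "B"]$PROPDMG * 1000000000
# Process 1/2/.../8 (multiplier by 10^exponent indicator)
stormDataValues[
  PROPDMGEXP %in% c("1", "2", "3", "4", "5", "6", "7", "8")]$PropertyDamage <-
  stormDataValues[
    PROPDMGEXP %in% c("1", "2", "3", "4", "5", "6", "7", "8")]$PROPDMG *
  10^as.numeric(stormDataValues[
    PROPDMGEXP %in% c("1", "2", "3", "4", "5", "6", "7", "8")]$PROPDMGEXP)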
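The case-by-case rules above amount to a lookup from exponent code to a dollar multiplier. As a rough sketch of that idea in base R, the hypothetical function exp_multiplier below (not part of the original report) returns one multiplier per code; codes the report does not handle are assumed to leave the raw value unscaled, which matches initializing PropertyDamage from PROPDMG.

```r
# Hypothetical helper: map exponent codes to dollar multipliers.
exp_multiplier <- function(code) {
  code <- as.character(code)
  mult <- rep(1, length(code))                 # unhandled codes: value stays as-is
  mult[code %in% c("", "h", "H", "K")] <- 1e3  # blank, h, H and K count as thousands
  mult[code %in% c("m", "M")] <- 1e6           # millions
  mult[code %in% "B"] <- 1e9                   # billions
  is_digit <- code %in% as.character(1:8)      # digits are powers of ten
  mult[is_digit] <- 10^as.numeric(code[is_digit])
  mult
}

# Toy rows, invented for illustration only
toy <- data.frame(
  PROPDMG = c(25, 2.5, 1, 5, 3),
  PROPDMGEXP = c("K", "M", "B", "2", "+"),
  stringsAsFactors = FALSE
)
toy$PropertyDamage <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
```

Computing the multiplier once and multiplying the whole column avoids repeating the same subsetting expression for every code.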
Normalize crop damage values
A new column, CropDamage, is added to the working dataset to hold the normalized estimate of crop damage, expressed in US dollars.
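# Initialize new column with the value of the original one
stormDataValues$CropDamage <- stormDataValues$CROPDMG
# Process k/K (multiplier by 1,000):
stormDataValues[CROPDMGEXP == "k"]$CROPDMGEXP <- "K"
stormDataValues[CROPDMGEXP == "K"]$CropDamage <-
  stormDataValues[CROPDMGEXP == "K"]$CROPDMG * 1000
# Process m/M (multiplier by 1,000,000):
stormDataValues[CROPDMGEXP == "m"]$CROPDMGEXP <- "M"
stormDataValues[CROPDMGEXP == "M"]$CropDamage <-
  stormDataValues[CROPDMGEXP == "M"]$CROPDMG * 1000000
# Process B (multiplier by 1,000,000,000):
stormDataValues[CROPDMGEXP == "B"]$CropDamage <-
  stormDataValues[CROPDMGEXP == "B"]$CROPDMG * 1000000000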
Aggregate by damage kind
An aggregation strategy is devised to establish a ranking for the two groups of information: damage to public health and damage to the economy.
The new column PeopleHarmed is the sum of the FATALITIES and INJURIES columns.
The new column EconomicDamage is the sum of the PropertyDamage and CropDamage columns.
# New column with the sum of fatalities and injuries
stormDataValues$PeopleHarmed <-
  stormDataValues$FATALITIES + stormDataValues$INJURIES
# New column with the sum of property and crop damage
stormDataValues$EconomicDamage <-
  stormDataValues$CropDamage + stormDataValues$PropertyDamage
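With both summary columns in place, one plausible way to turn them into rankings is to total each column per event type and sort in descending order. The report excerpt does not show how its rankings are computed, so the following is only a sketch under that assumption, using base R and toy rows invented for illustration.

```r
# Toy rows, invented for illustration only
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  PeopleHarmed = c(12, 3, 30, 0, 4),
  EconomicDamage = c(2.5e6, 1e9, 4e5, 5e4, 2e8),
  stringsAsFactors = FALSE
)

# Total both measures per event type
totals <- aggregate(cbind(PeopleHarmed, EconomicDamage) ~ EVTYPE,
                    data = toy, FUN = sum)

# One ranking per kind of damage, largest total first
health_rank <- totals[order(-totals$PeopleHarmed), ]
economy_rank <- totals[order(-totals$EconomicDamage), ]
```

Keeping the two rankings separate reflects the two questions of the report: the event types most harmful to people need not be the ones that cause the most economic damage.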
Analyzing the quantiles
The goal is to see how the data is distributed after processing. To that end, the quantiles of the columns relevant to this report are computed.
# Damage to the public health
quantile(stormDataValues$PeopleHarmed)
##   0%  25%  50%  75% 100%
##    0    0    0    0 1742
# Economic damage
quantile(stormDataValues$EconomicDamage)
##           0%          25%          50%          75%         100%
##            0            0            0         1000 115032500000
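When the default quartiles are almost all zero, as in the output above, it can help to look further into the upper tail and to measure the share of zero rows directly. A small sketch with toy values, invented for illustration and not taken from the dataset:

```r
# Toy, heavily skewed values invented for illustration only
harmed <- c(rep(0, 95), 1, 2, 5, 40, 1700)

# Default quartiles: dominated by the zeros
q_default <- quantile(harmed)

# Quantiles further into the upper tail
q_tail <- quantile(harmed, probs = c(0.9, 0.95, 0.99))

# Fraction of events with no harm at all
zero_share <- mean(harmed == 0)
```

On data shaped like this, filtering to rows with a value greater than zero before ranking keeps only the events that contribute to the totals.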
For both cases, most of the useful data to address the ana