Complete Edition: 100 pandas Exercises in Python, with Answers.docx

Uploaded by: b****5 · Document ID: 6881771 · Upload date: 2023-01-12 · Format: DOCX · Pages: 13 · Size: 21.23 KB

1. Import pandas under the name pd.

In [1]:

import pandas as pd
import numpy as np

2. Print the version of pandas that has been imported.

In [2]:

pd.__version__

3. Print out all the version information of the libraries that are required by the pandas library.

In [3]:

pd.show_versions()

4. Create a DataFrame df from the dictionary data, using labels as the index.

In [2]:

data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(data, index=labels)

5. Display a summary of the basic information about this DataFrame and its data.

In [5]:

df.info()

# ...or...

df.describe()

6. Return the first 3 rows of the DataFrame df.

In [6]:

df.iloc[:3]

# or equivalently

df.head(3)

7. Select just the 'animal' and 'age' columns from the DataFrame df.

In [7]:

df.loc[:, ['animal', 'age']]

# or

df[['animal', 'age']]

8. Select the data in rows [3, 4, 8] and in columns ['animal', 'age'].

In [3]:

df.loc[df.index[[3, 4, 8]], ['animal', 'age']]

9. Select only the rows where the number of visits is greater than 3.

In [4]:

df[df['visits'] > 3]

10. Select the rows where the age is missing, i.e. is NaN.

In [5]:

df[df['age'].isnull()]

11. Select the rows where the animal is a cat and the age is less than 3.

In [6]:

df[(df['animal'] == 'cat') & (df['age'] < 3)]

12. Select the rows where the age is between 2 and 4 (inclusive).

In [7]:

df[df['age'].between(2, 4)]

13. Change the age in row 'f' to 1.5.

In [ ]:

df.loc['f', 'age'] = 1.5

14. Calculate the sum of all visits (the total number of visits).

In [ ]:

df['visits'].sum()

15. Calculate the mean age for each different animal in df.

In [8]:

df.groupby('animal')['age'].mean()

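16. Append a new row 'k' to df with your choice of values for each column. Then delete that row to return the original DataFrame.

In [ ]:

# values follow the column order: animal, age, visits, priority
df.loc['k'] = ['dog', 5.5, 2, 'no']

# and then deleting the new row...

df = df.drop('k')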
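
A list assigned through df.loc['k'] is matched to the columns by position, so the values must be given in column order. One way to sidestep ordering mistakes is to assign a Series keyed by column name instead. The sketch below assumes a small stand-in frame (`demo`, a hypothetical name) rather than the full df above:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the puzzle's df (same columns, fewer rows).
demo = pd.DataFrame(
    {"animal": ["cat", "dog"], "age": [2.5, np.nan],
     "visits": [1, 3], "priority": ["yes", "no"]},
    index=["a", "b"],
)

# A Series is aligned on the column labels, so the key order does not matter.
demo.loc["k"] = pd.Series({"age": 5.5, "animal": "dog", "priority": "no", "visits": 2})
row_k = demo.loc["k"].to_dict()

# Dropping the row leaves only the original index labels.
demo = demo.drop("k")
```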

17. Count the number of each type of animal in df.

In [9]:

df['animal'].value_counts()

18. Sort df first by the values in the 'age' column in descending order, then by the values in the 'visits' column in ascending order.

In [10]:

df.sort_values(by=['age', 'visits'], ascending=[False, True])

19. The 'priority' column contains the values 'yes' and 'no'. Replace this column with a column of boolean values: 'yes' should be True and 'no' should be False.

In [ ]:

df['priority'] = df['priority'].map({'yes': True, 'no': False})

20. In the 'animal' column, change the 'snake' entries to 'python'.

In [14]:

df['animal'] = df['animal'].replace('snake', 'python')

print(df)

21. For each animal type and each number of visits, find the mean age. In other words, each row is an animal, each column is a number of visits and the values are the mean ages (hint: use a pivot table).

In [15]:

df.pivot_table(index='animal', columns='visits', values='age', aggfunc='mean')

22. You have a DataFrame df with a column 'A' of integers. For example:

df = pd.DataFrame({'A': [1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 7]})

How do you filter out rows which contain the same integer as the row immediately above?

In [16]:

df = pd.DataFrame({'A': [1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 7]})

df.loc[df['A'].shift() != df['A']]

# Alternatively, we could use drop_duplicates() here. Note
# that this removes *all* duplicates though, so it won't
# give the same result if a value recurs later in the column.
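
The difference between the shift() comparison and drop_duplicates() only shows up when a value comes back after an intervening run. A minimal sketch, assuming a hypothetical column chosen to trigger that case:

```python
import pandas as pd

# Hypothetical column in which the value 1 recurs after a run of 2s.
demo = pd.DataFrame({"A": [1, 1, 2, 2, 1, 1]})

# Keep a row only if it differs from the row immediately above it.
consecutive = demo.loc[demo["A"].shift() != demo["A"], "A"].tolist()

# drop_duplicates() keeps the first occurrence of each value overall.
global_unique = demo["A"].drop_duplicates().tolist()
```

Here `consecutive` retains the second run of 1s while `global_unique` does not, so the two approaches agree only when equal values are always adjacent.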

23. Given a DataFrame of numeric values, say

df = pd.DataFrame(np.random.random(size=(5, 3)))  # a 5x3 frame of float values

how do you subtract the row mean from each element in the row?

In [ ]:

df.sub(df.mean(axis=1), axis=0)

24. Suppose you have a DataFrame with 10 columns of real numbers, for example:

df = pd.DataFrame(np.random.random(size=(5, 10)), columns=list('abcdefghij'))

Which column of numbers has the smallest sum? (Find that column's label.)

In [17]:

df.sum().idxmin()

25. How do you count how many unique rows a DataFrame has (i.e. ignore all rows that are duplicates)?

In [ ]:

len(df) - df.duplicated(keep=False).sum()

# or perhaps more simply...

len(df.drop_duplicates(keep=False))
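
Note that keep=False discards every copy of a repeated row, which is not the same as counting distinct rows. A small sketch on a hypothetical four-row frame makes the distinction concrete:

```python
import pandas as pd

# Hypothetical frame: the first two rows are identical.
demo = pd.DataFrame({"x": [1, 1, 2, 3], "y": ["a", "a", "b", "c"]})

# Rows that occur exactly once (all copies of a repeated row are discarded).
n_unrepeated = len(demo.drop_duplicates(keep=False))

# Same count, computed by subtracting every row flagged as a duplicate.
n_unrepeated_alt = len(demo) - demo.duplicated(keep=False).sum()

# Distinct rows (one representative of each repeated row is kept).
n_distinct = len(demo.drop_duplicates())
```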

26. You have a DataFrame that consists of 10 columns of floating-point numbers. Suppose that exactly 5 entries in each row are NaN values. For each row of the DataFrame, find the column which contains the third NaN value.

(You should return a Series of column labels.)

In [ ]:

(df.isnull().cumsum(axis=1) == 3).idxmax(axis=1)
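
To see why this works, it can help to trace the intermediate running count. The sketch below assumes a hypothetical two-row frame with exactly five NaN entries per row:

```python
import numpy as np
import pandas as pd

nan = np.nan
# Hypothetical 2x10 frame with exactly five NaN entries in each row.
demo = pd.DataFrame(
    [[0.1, nan, nan, 0.4, nan, 0.6, nan, 0.8, nan, 1.0],
     [nan, 0.2, 0.3, nan, 0.5, nan, nan, 0.8, 0.9, nan]],
    columns=list("abcdefghij"),
)

# Running count of NaNs along each row.
nan_count = demo.isnull().cumsum(axis=1)

# idxmax returns the first column where the comparison is True,
# i.e. the column at which the count first reaches 3.
third_nan = (nan_count == 3).idxmax(axis=1)
```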

27. A DataFrame has a column of groups 'grps' and a column of numbers 'vals'. For example:

df = pd.DataFrame({'grps': list('aaabbcaabcccbbc'),
                   'vals': [12, 345, 3, 1, 45, 14, 4, 52, 54, 23, 235, 21, 57, 3, 87]})

For each group, find the sum of the three greatest values.

In [ ]:

df.groupby('grps')['vals'].nlargest(3).groupby(level=0).sum()
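
As a cross-check on the example frame, the sum of the three largest 'vals' per group can also be computed with apply. This is only one possible formulation, shown here as a sketch:

```python
import pandas as pd

df = pd.DataFrame({"grps": list("aaabbcaabcccbbc"),
                   "vals": [12, 345, 3, 1, 45, 14, 4, 52, 54, 23, 235, 21, 57, 3, 87]})

# For each group, take its three largest values and add them up.
top3_sum = df.groupby("grps")["vals"].apply(lambda s: s.nlargest(3).sum())
```

For group 'a' the three largest values are 345, 52 and 12, giving 409.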

28. A DataFrame has two integer columns 'A' and 'B'. The values in 'A' are between 1 and 100 (inclusive). For each group of 10 consecutive integers in 'A' (i.e. (0, 10], (10, 20], ...), calculate the sum of the corresponding values in column 'B'.

In [ ]:

df.groupby(pd.cut(df['A'], np.arange(0, 101, 10)))['B'].sum()

29. Consider a DataFrame df where there is an integer column 'X':

df = pd.DataFrame({'X': [7, 2, 0, 3, 4, 2, 5, 0, 3, 4]})

For each value, count the difference back to the previous zero (or the start of the Series, whichever is closer). These values should therefore be [1, 2, 0, 1, 2, 3, 4, 0, 1, 2]. Make this a new column 'Y'.

In [ ]:

izero = np.r_[-1, (df['X'] == 0).to_numpy().nonzero()[0]]  # indices of zeros

idx = np.arange(len(df))

df['Y'] = idx - izero[np.searchsorted(izero - 1, idx) - 1]
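
The same column can also be built without NumPy index arithmetic. The following is an alternative sketch (not the approach used above) that numbers the rows within each stretch between zeros:

```python
import pandas as pd

df = pd.DataFrame({"X": [7, 2, 0, 3, 4, 2, 5, 0, 3, 4]})

# Each zero starts a new block; block 0 is the stretch before the first zero.
block = (df["X"] == 0).cumsum()

# The position within a block is the distance back to that block's zero.
# Block 0 contains no zero, so it counts from the start of the Series (+1).
df["Y"] = df.groupby(block).cumcount() + (block == 0).astype(int)
```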

30. Consider a DataFrame containing rows and columns of purely numerical data. Create a list of the row-column index locations of the 3 largest values.

In [ ]:

df.unstack().sort_values()[-3:].index.tolist()

31. Given a DataFrame with a column of group IDs, 'grps', and a column of corresponding integer values, 'vals', replace any negative values in 'vals' with the group mean.

In [ ]:

def replace(group):
    mask = group < 0
    # swap each negative entry for the mean of the group's non-negative values
    return group.mask(mask, group[~mask].mean())

df.groupby(['grps'])['vals'].transform(replace)

32. Implement a rolling mean over groups with window size 3, which ignores NaN values. For example, consider the following DataFrame:

>>> df = pd.DataFrame({'group': list('aabbabbbabab'),
                       'value': [1, 2, 3, np.nan, 2, 3,
                                 np.nan, 1, 7, 3, np.nan, 8]})
>>> df
   group  value
0      a    1.0
1      a    2.0
2      b    3.0
3      b    NaN
4      a    2.0
5      b    3.0
6      b    NaN
7      b    1.0
8      a    7.0
9      b    3.0
10     a    NaN
11     b    8.0

The goal is to compute the Series:

0     1.000000
1     1.500000
2     3.000000
3     3.000000
4     1.666667
5     3.000000
6     3.000000
7     2.000000
8     3.666667
9     2.000000
10    4.500000
11    4.000000

In [ ]:

g1 = df.groupby(['group'])['value']            # group values
g2 = df.fillna(0).groupby(['group'])['value']  # fillna, then group values

s = g2.rolling(3, min_periods=1).sum() / g1.rolling(3, min_periods=1).count()  # compute means

s.reset_index(level=0, drop=True).sort_index()
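
Since rolling(...).mean() already skips NaN inside a window, a shorter variant should give the same Series on the example data. This is a sketch of that alternative, not the solution given above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"group": list("aabbabbbabab"),
                   "value": [1, 2, 3, np.nan, 2, 3, np.nan, 1, 7, 3, np.nan, 8]})

# mean() ignores NaN within each 3-row window; min_periods=1 lets a window
# with a single valid observation still produce a value.
rolled = df.groupby("group")["value"].rolling(3, min_periods=1).mean()

# Drop the group level and restore the original row order.
result = rolled.reset_index(level=0, drop=True).sort_index()
```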

33. Create a DatetimeIndex that contains each business day of 2015 and use it to index a Series of random numbers. Let's call this Series s.

In [ ]:

dti = pd.date_range(start='2015-01-01', end='2015-12-31', freq='B')

s = pd.Series(np.random.rand(len(dti)), index=dti)

34. Find the sum of the values in s for every Wednesday.

In [ ]:

s[s.index.weekday == 2].sum()

35. For each calendar month in s, find the mean of values.

In [ ]:

s.resample('M').mean()  # 'M' is the month-end alias (spelled 'ME' from pandas 2.2)

36. For each group of four consecutive calendar months in s, find the date on which the highest value occurred.

In [ ]:

# group into 4-month bins, then take the index label of each bin's maximum
s.groupby(pd.Grouper(freq='4M')).idxmax()

37. Create a DatetimeIndex consisting of the third Thursday in each month for the years 2015 and 2016.

In [ ]:

pd.date_range('2015-01-01', '2016-12-31', freq='WOM-3THU')
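
A quick sanity check on the 'WOM-3THU' (week-of-month) alias: the result should contain one date per month, each a Thursday that falls between the 15th and the 21st. A minimal sketch:

```python
import pandas as pd

third_thursdays = pd.date_range("2015-01-01", "2016-12-31", freq="WOM-3THU")

# Two years of months should yield 24 dates.
n_dates = len(third_thursdays)

# weekday == 3 means Thursday (Monday is 0).
all_thursday = bool((third_thursdays.weekday == 3).all())

# The third occurrence of any weekday always lands on day 15..21.
all_third_week = bool(((third_thursdays.day >= 15) & (third_thursdays.day <= 21)).all())
```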

38. Some values in the FlightNumber column are missing. These numbers are meant to increase by 10 with each row, so 10055 and 10075 need to be put in place. Fill in these missing numbers and make the column an integer column (instead of a float column).

In [ ]:

df['FlightNumber'] = df['FlightNumber'].interpolate().astype(int)
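
The flight DataFrame itself is not shown in this part of the document, so the sketch below assumes a hypothetical stand-in (`flights`) whose FlightNumber values are chosen to match the numbers mentioned in the question:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the flight data; only the column used here.
flights = pd.DataFrame({"FlightNumber": [10045, np.nan, 10065, np.nan, 10085]})

# Linear interpolation fills each gap with the midpoint of its neighbours,
# after which the column can be cast from float to int.
flights["FlightNumber"] = flights["FlightNumber"].interpolate().astype(int)
```

Linear interpolation is only appropriate here because the numbers are stated to increase by a constant step.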

39. The From_To column would be better as two separate columns! Split each string on the underscore delimiter _ to give a new temporary DataFrame with the correct values. Assign the correct column names to this temporary DataFrame.

In [ ]:

temp = df.From_To.str.split('_', expand=True)

temp.columns = ['From', 'To']

40. Notice how the capitalisation of the city names is all mixed up in this temporary DataFrame. Standardise the strings so that only the first letter is uppercase (e.g. "londON" should become "London").

In [ ]:

temp['From'] = temp['From'].str.capitalize()

temp['To'] = temp['To'].str.capitalize()

41. Delete the From_To column from df and attach the temporary DataFrame from the previous questions.

In [ ]:

df = df.drop('From_To', axis=1)

df = df.join(temp)

42. In the Airline column, you can see some extra punctuation and symbols have appeared around the airline names. Pull out just the airline name. E.g. '(British Airways. )' should become 'British Airways'.

In [ ]:

df['Airline'] = df['Airline'].str.extract(r'([a-zA-Z\s]+)', expand=False).str.strip()

# note: using .strip() gets rid of any leading/trailing whitespace
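
To illustrate what the pattern captures, here is a sketch on a few hypothetical airline strings (assumed for illustration, since the Airline column is not shown in this part of the document):

```python
import pandas as pd

# Hypothetical airline strings with stray punctuation and digits.
airline = pd.Series(["KLM(!)", "<Air France> (12)", "(British Airways. )", "12. Air France"])

# extract() returns the first run of letters and whitespace in each string;
# strip() then trims any whitespace left at either end of that run.
cleaned = airline.str.extract(r"([a-zA-Z\s]+)", expand=False).str.strip()
```

Because only the first matching run is kept, a name interrupted by punctuation (for example a hyphenated name) would be cut short by this pattern.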

43. In the RecentDelays column, the values have been entered into the DataFrame as a list. We would like each first value in its own column, each second value in its own column, and so on. If there isn't an Nth value, the value should be NaN.

Expand the Series of lists into a DataFrame named delays, rename the columns delay_1, delay_2, etc. and replace the unwanted RecentDelays column in df with delays.

In [ ]:

delays = df['RecentDelays'].apply(pd.Series)

delays.columns = ['delay_{}'.format(n) for n in range(1, len(delays.columns) + 1)]

df = df.drop('RecentDelays', axis=1).join(delays)
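
An alternative to apply(pd.Series) is to build the frame directly from the list of lists, which pads shorter rows with NaN. The sketch below assumes a hypothetical stand-in frame (`flights`) with made-up delay values:

```python
import pandas as pd

# Hypothetical stand-in: one list of recent delays per flight.
flights = pd.DataFrame({"RecentDelays": [[23, 47], [24, 43, 87], [13]]})

# Constructing from a list of lists pads the shorter rows with NaN.
delays = pd.DataFrame(flights["RecentDelays"].tolist(), index=flights.index)
delays.columns = ["delay_{}".format(n) for n in range(1, len(delays.columns) + 1)]

# Swap the list column for the expanded columns.
flights = flights.drop("RecentDelays", axis=1).join(delays)
```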

44. Given the lists letters = ['A', 'B', 'C'] and num
