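Data Analysis Tutorial: pandas Data Structures

I. pandas Data Structures

pandas works with three kinds of data structures: Series, DataFrame, and Index.

Series and DataFrame are the two most commonly used today.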

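1. Series

A Series is much like a one-dimensional array, except that every value has an index label. When printed, the index appears on the left and the values on the right.

pandas.Series(data, index, dtype, copy)

∙ data: can take many forms, such as a list, a dict, or a scalar.

∙ index: the labels must be hashable, and the index must have the same length as the data. Unique labels are recommended, although pandas does not enforce uniqueness. If no index is passed, it defaults to np.arange(n).

∙ dtype: sets the data type.

∙ copy: whether to copy the input data; defaults to False.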
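As a rough illustration of the parameters listed above, the sketch below passes explicit labels and a dtype to the constructor. The values and labels are made up for the example and are not part of the original tutorial.

```python
import pandas as pd

# Hypothetical values and labels, chosen only for illustration.
s = pd.Series([1, 2, 3], index=["x", "y", "z"], dtype="float64")

print(s.dtype)        # float64
print(list(s.index))  # ['x', 'y', 'z']
print(s["y"])         # 2.0
```

The integers are stored as floats because dtype was given explicitly; without it, pandas would infer an integer dtype from the list.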

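1) Create an empty Series

import numpy as np
import pandas as pd

# Use the Series constructor to create an empty Series.
# dtype is given explicitly because the default dtype of an empty Series
# differs between pandas versions.
s = pd.Series(dtype='float64')
print(s)

"""
Output:
Series([], dtype: float64)
"""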

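2) Create a Series from an ndarray

If data is an ndarray, any index passed must have the same length as the array.

If no index is passed, the default index is range(n), where n is the array length, i.e. [0, 1, 2, ..., len(array) - 1].

# Create a Series from an ndarray
data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data)
print(s)

"""
Output:
0    a
1    b
2    c
3    d
dtype: object
"""

When no index is given, the default index runs from 0 to len(data) - 1.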
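A quick way to check the default index is to inspect it directly. This is a minimal sketch that reuses the array from the example above.

```python
import numpy as np
import pandas as pd

data = np.array(["a", "b", "c", "d"])
s = pd.Series(data)

# With no index argument, the labels run from 0 to len(data) - 1.
print(list(s.index))            # [0, 1, 2, 3]
print(s.index[0], s.index[-1])  # 0 3
```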

3) Pass index values

# Pass explicit index values
data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data, index=[12, 13, 14, 15])
print(s)

"""
Output:
12    a
13    b
14    c
15    d
dtype: object
"""

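4) Create a Series from a dict

If the index passed contains a label that has no matching key in the dict, the missing element is filled with NaN.

data = {'a': 0, 'b': 1, 'c': 2.}
s = pd.Series(data, index=['b', 'a', 'c', 'd'])
print(s)

"""
Output:
b    1.0
a    0.0
c    2.0
d    NaN
dtype: float64
"""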

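Once a label has been filled with NaN, it usually has to be detected and handled. The sketch below shows one possible follow-up using isna, dropna, and fillna; the fill value -1 is an arbitrary choice for illustration.

```python
import pandas as pd

data = {"a": 0, "b": 1, "c": 2.0}
s = pd.Series(data, index=["b", "a", "c", "d"])

# 'd' has no key in the dict, so its value is NaN.
print(s.isna().tolist())    # [False, False, False, True]

# Two common follow-ups: drop the missing entry, or fill it with a default.
dropped = s.dropna()
filled = s.fillna(-1)
print(list(dropped.index))  # ['b', 'a', 'c']
print(filled["d"])          # -1.0
```
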
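5) Create a Series from a scalar

# Create a Series from a scalar; the value is repeated for every index label
s = pd.Series(5, index=['a', 'b', 'c', 'd'])
print(s)

"""
Output:
a    5
b    5
c    5
d    5
dtype: int64
"""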

6) Retrieve data

data = [1, 2, 3, 4]
s = pd.Series(data, index=['a', 'b', 'c', 'd'])
print(s["d"])   # retrieve the value labeled 'd'
print(s[-3:])   # retrieve the last three values

"""
Output:
4
b    2
c    3
d    4
dtype: int64
"""

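The retrieval above mixes label-based and position-based access. As a hedged sketch, the same lookups can be written with the explicit .loc (label) and .iloc (position) accessors, which makes the intent unambiguous.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])

print(s.loc["d"])                  # 4, selected by label
print(s.iloc[-1])                  # 4, selected by position
print(s.iloc[-3:].tolist())        # [2, 3, 4], the last three values
print(s.loc[["a", "c"]].tolist())  # [1, 3], several labels at once
```
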
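2. DataFrame

A DataFrame is a two-dimensional labeled data structure whose columns can have different types.

You can think of it as an Excel spreadsheet, a SQL table, or a dict of Series objects.

It is the most commonly used pandas object.

Like Series, DataFrame accepts many different kinds of input.

pandas.DataFrame(data, index, columns, dtype)

∙ data: a dict of one-dimensional arrays, lists, or Series objects.

∙ index: the row labels of the resulting frame; optional, defaulting to np.arange(n) if no index is passed.

∙ columns: the column labels; optional, defaulting to np.arange(n). This default applies only when no column labels are passed.

∙ dtype: the data type of each column.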
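To make the four parameters concrete, here is a small sketch that builds a DataFrame from a two-dimensional ndarray, first with explicit labels and then with the defaults. The array values and the labels r1 to r3, x, and y are hypothetical.

```python
import numpy as np
import pandas as pd

values = np.array([[1, 2], [3, 4], [5, 6]])  # illustrative data only

# Explicit row labels, column labels, and dtype.
df = pd.DataFrame(values, index=["r1", "r2", "r3"],
                  columns=["x", "y"], dtype="float64")
print(df.shape)       # (3, 2)
print(df["x"].dtype)  # float64

# With index and columns omitted, both default to 0..n-1.
default = pd.DataFrame(values)
print(list(default.index), list(default.columns))  # [0, 1, 2] [0, 1]
```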

1) Create an empty DataFrame

# Create an empty DataFrame
import pandas as pd

df = pd.DataFrame()
print(df)

"""
Output:
Empty DataFrame
Columns: []
Index: []
"""

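2) Create a DataFrame from a list

A DataFrame can be created from a single list or from a list of lists.

data = [1, 2, 3, 4, 5]
df = pd.DataFrame(data)
print(df)

"""
Output:
   0
0  1
1  2
2  3
3  4
4  5
"""

data = [['Al', 9], ['Bl', 8], ['Cl', 10]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
# Convert only the Age column to float. Passing dtype=float to the constructor
# would apply to every column, and Name cannot be cast to float.
df['Age'] = df['Age'].astype(float)
print(df)

"""
Output:
   Name   Age
0    Al   9.0
1    Bl   8.0
2    Cl  10.0
"""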

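3) Create a DataFrame from a dict of ndarrays/lists

All of the ndarrays must have the same length.

If an index is passed, its length must equal the length of the arrays.

If no index is passed, the default index is range(n), where n is the array length.

# Create a DataFrame from a dict of ndarrays/lists
data1 = {'Name': ['Al', 'Bl', 'Cl'], 'Age': [9, 8, 10]}
df1 = pd.DataFrame(data1)
print(df1)

"""
Output (columns follow the insertion order of the dict keys):
   Name  Age
0    Al    9
1    Bl    8
2    Cl   10
"""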

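The equal-length requirement can be seen by deliberately breaking it. The sketch below is only an illustration: the second dict has a shortened Age list, and the constructor is expected to raise a ValueError, whose exact message may vary by pandas version.

```python
import pandas as pd

# Equal-length lists build a 3 x 2 frame.
good = pd.DataFrame({"Name": ["Al", "Bl", "Cl"], "Age": [9, 8, 10]})
print(good.shape)  # (3, 2)

# Deliberately mismatched lengths: 3 names but only 2 ages.
error_message = None
try:
    pd.DataFrame({"Name": ["Al", "Bl", "Cl"], "Age": [9, 8]})
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)
```
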
4) Add row labels

data1 = {'Name': ['Al', 'Bl', 'Cl'], 'Age': [9, 8, 10]}
# Add row labels through the index argument
df1 = pd.DataFrame(data1, index=['rank1', 'rank2', 'rank3'])
print(df1)

"""
Output:
       Name  Age
rank1    Al    9
rank2    Bl    8
rank3    Cl   10
"""

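5) Create a DataFrame from a list of dicts

import pandas as pd

# Keys become column labels; a key missing from one dict yields NaN in that row
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['rank1', 'rank2'])
print(df)

"""
Output:
       a   b     c
rank1  1   2   NaN
rank2  5  10  20.0
"""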

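# Create DataFrames from a list of dicts, a row index, and a list of column labels
data = [{'a': 1, 'b': 2}, {'a': 2, 'b': 10, 'c': 9}]
df1 = pd.DataFrame(data, index=['rank1', 'rank2'], columns=['a', 'b'])
print('df1:')
print(df1)
# 'b1' is not a key in any of the dicts, so that column is filled with NaN
df2 = pd.DataFrame(data, index=['rank1', 'rank2'], columns=['a', 'b1'])
print('df2:')
print(df2)

"""
Output:
df1:
       a   b
rank1  1   2
rank2  2  10
df2:
       a   b1
rank1  1  NaN
rank2  2  NaN
"""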

6) Create a DataFrame from a dict of Series, then add and delete columns

# Create a DataFrame from a dict of Series
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df)

"""
Output:
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4
"""

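Add columns:

# Add columns
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
# A new Series is aligned to the existing row labels
df['three'] = pd.Series([20, 3, 21], index=['a', 'b', 'd'])
# A new column can also be computed from existing columns
df['four'] = df['one'] + df['three']
print(df)

"""
Output:
   one  two  three  four
a  1.0    1   20.0  21.0
b  2.0    2    3.0   5.0
c  3.0    3    NaN   NaN
d  NaN    4   21.0   NaN
"""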

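The NaN entries in column four come from index alignment: addition is carried out label by label, and a label missing on either side produces NaN. The following sketch isolates that behavior with two standalone Series. The use of Series.add with fill_value=0 is an optional variant that is not part of the original example.

```python
import pandas as pd

one = pd.Series([1, 2, 3], index=["a", "b", "c"])
three = pd.Series([20, 3, 21], index=["a", "b", "d"])

# Labels are matched before adding; 'c' and 'd' each lack one operand.
total = one + three
print(total.index.tolist())   # ['a', 'b', 'c', 'd']
print(total.isna().tolist())  # [False, False, True, True]

# Variant: treat a missing operand as 0 instead of producing NaN.
filled = one.add(three, fill_value=0)
print(filled.tolist())        # [21.0, 5.0, 3.0, 21.0]
```
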
Delete columns:

# Build a DataFrame with three columns
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
     'three': pd.Series([20, 3, 21], index=['a', 'b', 'd'])}
df = pd.DataFrame(d)
print(df)

"""
Output:
   one  two  three
a  1.0    1   20.0
b  2.0    2    3.0
c  3.0    3    NaN
d  NaN    4   21.0
"""

# Delete a column in place
del df['one']
print(df)

"""
Output:
   two  three
a    1   20.0
b    2    3.0
c    3    NaN
d    4   21.0
"""

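Besides del, pandas offers drop and pop for removing columns. The sketch below contrasts the two on a small made-up frame: drop returns a new DataFrame, while pop modifies the frame in place and hands back the removed column.

```python
import pandas as pd

# Illustrative data, not taken from the tutorial.
df = pd.DataFrame({"one": [1, 2], "two": [3, 4], "three": [5, 6]})

# drop returns a new DataFrame and leaves df unchanged.
without_one = df.drop(columns=["one"])
print(list(without_one.columns))  # ['two', 'three']
print(list(df.columns))           # ['one', 'two', 'three']

# pop removes the column in place and returns it as a Series.
popped = df.pop("two")
print(popped.tolist())            # [3, 4]
print(list(df.columns))           # ['one', 'three']
```
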
7) Create a DataFrame from a dict, then select, add, and delete rows

# Row selection
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df, '\n')
print(df.loc['b'], '\n')   # select a row by label
print(df.iloc[2], '\n')    # select a row by integer position
print(df[2:4])             # slice rows by position

"""
Output:
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4

one    2.0
two    2.0
Name: b, dtype: float64

one    3.0
two    3.0
Name: c, dtype: float64

   one  two
c  3.0    3
d  NaN    4
"""

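One detail worth checking when selecting rows is how slices behave: a label slice with .loc includes both endpoints, while a positional slice with .iloc excludes the stop position. A minimal sketch on the same frame:

```python
import pandas as pd

d = {"one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
     "two": pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])}
df = pd.DataFrame(d)

# Label slice: both 'b' and 'd' are included.
by_label = df.loc["b":"d"].index.tolist()
print(by_label)             # ['b', 'c', 'd']

# Positional slice: position 3 is excluded.
by_position = df.iloc[1:3].index.tolist()
print(by_position)          # ['b', 'c']

# A single cell can be addressed by (row, column) in either style.
print(df.loc["c", "two"])   # 3
print(df.iloc[2, 0])        # 3.0
```
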
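Add rows:

# Add rows
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
df = pd.concat([df, df2])
print(df)

"""
Output:
   a  b
0  1  2
1  3  4
0  5  6
1  7  8
"""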

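The combined frame above keeps the original row labels, so 0 and 1 each appear twice. If unique labels are wanted, one option is pd.concat with ignore_index=True, sketched here as a variant of the example rather than a replacement for it.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=["a", "b"])

# ignore_index=True discards the old labels and renumbers the rows 0..n-1.
combined = pd.concat([df, df2], ignore_index=True)
print(combined.index.tolist())  # [0, 1, 2, 3]
print(combined["a"].tolist())   # [1, 3, 5, 7]
```
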
Delete rows:

# Delete rows: drop(0) removes every row whose label is 0
df = df.drop(0)
print(df)

"""
Output:
   a  b
1  3  4
1  7  8
"""

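Because drop works on labels, duplicate labels are all removed together, which explains why two rows disappear above. The sketch below reproduces this and, as one possible alternative, uses positional slicing with .iloc to remove only the first row.

```python
import pandas as pd

first = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
second = pd.DataFrame([[5, 6], [7, 8]], columns=["a", "b"])
df = pd.concat([first, second])
print(df.index.tolist())        # [0, 1, 0, 1]

# drop(0) removes both rows labeled 0.
dropped = df.drop(0)
print(dropped.values.tolist())  # [[3, 4], [7, 8]]

# Positional slicing removes only the first row, whatever its label.
rest = df.iloc[1:]
print(rest.values.tolist())     # [[3, 4], [5, 6], [7, 8]]
```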