ZH奶酪: Parsing XML with ElementTree in Python (translation)


A C implementation of this API is available as xml.etree.cElementTree.

Changed in version 2.7: The ElementTree API is updated to 1.3. For more information, see Introducing ElementTree 1.3.
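A common way to prefer the C implementation when it exists is a guarded import. The following is a minimal sketch rather than something the text above prescribes. The alias ET is a convention assumed here, and on Python 3.3 and later the plain xml.etree.ElementTree import already uses the accelerated implementation when it is available.

```python
# Prefer the C accelerator where it exists, fall back to the standard module.
try:
    import xml.etree.cElementTree as ET  # removed in Python 3.9
except ImportError:
    import xml.etree.ElementTree as ET   # same API under the same alias

# Either way the ElementTree API is reachable through ET.
root = ET.fromstring("<data><country name='Panama'/></data>")
first_child_tag = root[0].tag
```

Because both modules expose the same API, the rest of a program can use ET without caring which import succeeded.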

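19.7.1. Overview

This is a brief overview of using xml.etree.ElementTree (ET for short). The goal is to demonstrate some of the building blocks and basic concepts of the module.

19.7.1.1. XML tree and elements

XML is an inherently hierarchical data format, and the most natural way to represent it is with a tree. ET has two classes for this purpose: ElementTree represents the whole XML document as a tree, and Element represents a single node in that tree.

Interactions with the whole document (reading from and writing to files) are usually done at the ElementTree level. Interactions with a single XML element and its sub-elements are done at the Element level.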
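The split between the two levels can be sketched as follows. This is an illustrative example under stated assumptions: the element names and the text value are made up, and an in-memory io.BytesIO buffer stands in for a file on disk.

```python
import io
import xml.etree.ElementTree as ET

# Element level: build and inspect individual nodes.
root = ET.Element("data")
country = ET.SubElement(root, "country", name="Panama")
country.text = "example"

# ElementTree level: wrap the root element to handle whole-document I/O.
tree = ET.ElementTree(root)
buffer = io.BytesIO()   # stands in for a file opened in binary mode
tree.write(buffer)      # serializes the whole document

serialized = buffer.getvalue()
same_root = tree.getroot() is root
```

The ElementTree object adds no data of its own here: getroot() returns the very Element that was wrapped.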

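19.7.1.2. Parsing XML

We will use the following XML document as the sample data for this section:

    <?xml version="1.0"?>
    <data>
        <country name="Liechtenstein">
            <rank>1</rank>
            <year>2008</year>
            <gdppc>141100</gdppc>
            <neighbor name="Austria" direction="E"/>
            <neighbor name="Switzerland" direction="W"/>
        </country>
        <country name="Singapore">
            <rank>4</rank>
            <year>2011</year>
            <gdppc>59900</gdppc>
            <neighbor name="Malaysia" direction="N"/>
        </country>
        <country name="Panama">
            <rank>68</rank>
            <year>2011</year>
            <gdppc>13600</gdppc>
            <neighbor name="Costa Rica" direction="W"/>
            <neighbor name="Colombia" direction="E"/>
        </country>
    </data>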

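There are several ways to import the data.

Reading from a file on disk:

    import xml.etree.ElementTree as ET
    tree = ET.parse('country_data.xml')
    root = tree.getroot()

Reading from a string:

    root = ET.fromstring(country_data_as_string)

fromstring() parses XML from a string directly into an Element, which is the root element of the parsed tree. Other parsing functions may create an ElementTree instead.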
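The difference in return types can be checked directly. The sketch below assumes a shortened, one-country version of the sample data and uses io.BytesIO in place of a real file, since parse() accepts a file object as well as a filename.

```python
import io
import xml.etree.ElementTree as ET

xml_text = "<data><country name='Panama'><rank>68</rank></country></data>"

# fromstring() hands back the root Element directly.
root_from_string = ET.fromstring(xml_text)

# parse() returns an ElementTree; getroot() then yields the root Element.
tree = ET.parse(io.BytesIO(xml_text.encode("utf-8")))
root_from_file = tree.getroot()

is_element = ET.iselement(root_from_string)
is_tree = isinstance(tree, ET.ElementTree)
```

Both routes end at an equivalent root element; only parse() keeps the document-level wrapper that is needed later for write().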

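As an Element, root has a tag and a dictionary of attributes:

    >>> root.tag
    'data'
    >>> root.attrib
    {}

It also has child nodes over which we can iterate:

    >>> for child in root:
    ...     print child.tag, child.attrib
    ...
    country {'name': 'Liechtenstein'}
    country {'name': 'Singapore'}
    country {'name': 'Panama'}

Children are nested, and we can access specific child nodes by index:

    >>> root[0][1].text
    '2008'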
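Index-based access works because an Element behaves like a sequence of its direct children. The sketch below, which assumes a trimmed two-country version of the sample data, shows that behaviour and contrasts it with a lookup by tag, which does not depend on child order.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<data>"
    "<country name='Liechtenstein'><rank>1</rank><year>2008</year></country>"
    "<country name='Singapore'><rank>4</rank><year>2011</year></country>"
    "</data>"
)

# An Element supports len() and iteration over its direct children.
child_count = len(root)
names = [child.attrib["name"] for child in root]

# [0] is the first country, [1] is that country's second child (<year> here).
year_by_index = root[0][1].text

# Looking the child up by tag gives the same result without relying on order.
year_by_tag = root[0].find("year").text
```

If the order of children inside country ever changed, root[0][1] would silently return a different element, while the find() lookup would still return the year.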

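19.7.1.3. Finding interesting elements

Element has some useful methods that help iterate recursively over the whole sub-tree below it (its children, their children, and so on). For example, Element.iter():

    >>> for neighbor in root.iter('neighbor'):
    ...     print neighbor.attrib
    ...
    {'name': 'Austria', 'direction': 'E'}
    {'name': 'Switzerland', 'direction': 'W'}
    {'name': 'Malaysia', 'direction': 'N'}
    {'name': 'Costa Rica', 'direction': 'W'}
    {'name': 'Colombia', 'direction': 'E'}

Element.findall() finds only elements with a given tag that are direct children of the current element. Element.find() finds the first child with a particular tag, and Element.text accesses the element's text content. Element.get() accesses the element's attributes:

    >>> for country in root.findall('country'):
    ...     rank = country.find('rank').text
    ...     name = country.get('name')
    ...     print name, rank
    ...
    Liechtenstein 1
    Singapore 4
    Panama 68

More sophisticated specification of which elements to look for is possible by using XPath.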
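The practical difference between these methods is how deep they search and what they return when nothing matches. The following sketch assumes a reduced version of the sample data; the tag city and the attribute capital are hypothetical names chosen because they are absent from it.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<data>"
    "<country name='Liechtenstein'><rank>1</rank>"
    "<neighbor name='Austria'/><neighbor name='Switzerland'/></country>"
    "<country name='Panama'><rank>68</rank>"
    "<neighbor name='Colombia'/></country>"
    "</data>"
)

# iter() walks the whole subtree, so it reaches the nested <neighbor> elements.
all_neighbors = [n.get("name") for n in root.iter("neighbor")]

# findall() with a bare tag only looks at direct children of root.
direct_neighbors = root.findall("neighbor")

# find() returns the first matching direct child, or None when nothing matches.
first_country = root.find("country")
missing = root.find("city")

# get() reads an attribute and accepts a default for absent ones.
ranks = {c.get("name"): c.find("rank").text for c in root.findall("country")}
capital = first_country.get("capital", "unknown")
```

Since find() can return None, chaining .text onto it, as in country.find('rank').text, is only safe when the child is known to exist.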

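19.7.1.4. Modifying an XML file

ElementTree provides a simple way to build XML documents and write them to files. The ElementTree.write() method serves this purpose.

Once created, an Element object may be manipulated by directly changing its fields (such as Element.text), adding and modifying attributes (Element.set()), and adding new children (for example with Element.append()).

Let's say we want to add one to each country's rank, and add an updated attribute to the rank element:

    >>> for rank in root.iter('rank'):
    ...     new_rank = int(rank.text) + 1
    ...     rank.text = str(new_rank)
    ...     rank.set('updated', 'yes')
    ...
    >>> tree.write('output.xml')

Our XML now looks like this:

    <?xml version="1.0"?>
    <data>
        <country name="Liechtenstein">
            <rank updated="yes">2</rank>
            <year>2008</year>
            <gdppc>141100</gdppc>
            <neighbor name="Austria" direction="E"/>
            <neighbor name="Switzerland" direction="W"/>
        </country>
        <country name="Singapore">
            <rank updated="yes">5</rank>
            <year>2011</year>
            <gdppc>59900</gdppc>
            <neighbor name="Malaysia" direction="N"/>
        </country>
        <country name="Panama">
            <rank updated="yes">69</rank>
            <year>2011</year>
            <gdppc>13600</gdppc>
            <neighbor name="Costa Rica" direction="W"/>
            <neighbor name="Colombia" direction="E"/>
        </country>
    </data>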
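The same update can be checked in memory without writing output.xml. This is a minimal sketch assuming a two-country excerpt of the sample data; ET.tostring() is used here in place of tree.write() purely so the result can be inspected directly.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<data>"
    "<country name='Singapore'><rank>4</rank></country>"
    "<country name='Panama'><rank>68</rank></country>"
    "</data>"
)

for rank in root.iter("rank"):
    rank.text = str(int(rank.text) + 1)   # element text is a string, so convert
    rank.set("updated", "yes")            # adds the attribute or overwrites it

# tostring() serializes a single element in memory instead of writing a file.
first_rank = ET.tostring(root[0][0])
new_ranks = [r.text for r in root.iter("rank")]
```

Note that the changes live only in the in-memory tree until something like write() or tostring() serializes them.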

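We can remove elements using Element.remove(). Let's say we want to remove all countries with a rank higher than 50:

    >>> for country in root.findall('country'):
    ...     rank = int(country.find('rank').text)
    ...     if rank > 50:
    ...         root.remove(country)
    ...
    >>> tree.write('output.xml')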
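One detail worth spelling out is why the loop runs over root.findall('country') rather than over root itself: findall() returns a list, so removing children inside the loop does not disturb the iteration, whereas modifying an element while iterating over it directly can skip children. The sketch below assumes a trimmed version of the sample data and a threshold of 3, chosen only so that two adjacent countries are removed.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<data>"
    "<country name='Liechtenstein'><rank>1</rank></country>"
    "<country name='Singapore'><rank>4</rank></country>"
    "<country name='Panama'><rank>68</rank></country>"
    "</data>"
)

# Iterate over the list returned by findall(), not over root itself.
for country in root.findall("country"):
    if int(country.find("rank").text) > 3:   # threshold assumed for this sketch
        root.remove(country)                 # removes a direct child of root

remaining = [c.get("name") for c in root]
```

Element.remove() only removes direct children, which is why it is called on root, the parent of each country.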

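19.7.1.5. Building XML documents

The SubElement() function also provides a convenient way to create new sub-elements for a given element:

    >>> a = ET.Element('a')
    >>> b = ET.SubElement(a, 'b')
    >>> c = ET.SubElement(a, 'c')
    >>> d = ET.SubElement(c, 'd')
    >>> ET.dump(a)
    <a><b /><c><d /></c></a>

19.7.1.6. Additional resources

See http://effbot.org/zone/element-index.htm for tutorials and links to other docs.

19.7.2. XPath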

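This module provides limited support for XPath expressions. The goal is to support a small subset of the syntax; a full XPath engine is outside the scope of the module.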

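19.7.2.1. Example

    import xml.etree.ElementTree as ET

    root = ET.fromstring(countrydata)

    # Top-level elements
    root.findall(".")

    # All 'neighbor' grand-children of 'country' children of the top-level
    # elements
    root.findall("./country/neighbor")

    # Nodes with name='Singapore' that have a 'year' child
    root.findall(".//year/..[@name='Singapore']")

    # 'year' nodes that are children of nodes with name='Singapore'
    root.findall(".//*[@name='Singapore']/year")

    # All 'neighbor' nodes that are the second child of their parent
    root.findall(".//neighbor[2]")

19.7.2.2. Supported XPath syntax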

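Syntax              Meaning
tag                 Selects all child elements with the given tag. For example, spam selects all child elements named spam, and spam/egg selects all grandchildren named egg in all children named spam.
*                   Selects all child elements. For example, */egg selects all grandchildren named egg.
.                   Selects the current element. This is mostly useful at the beginning of the path, to indicate that it's a relative path.
//                  Selects all subelements, on all levels beneath the current element. For example, .//egg selects all egg elements in the entire tree.
..                  Selects the parent element.
[@attrib]           Selects all elements that have the given attribute.
[@attrib='value']   Selects all elements for which the given attribute has the given value. The value cannot contain quotes.
[tag]               Selects all elements that have a child named tag. Only immediate children are supported.
[position]          Selects all elements that are located at the given position. The position can be either an integer (1 is the first position), the expression last() (for the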

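Several of the syntax forms listed above can be tried against a small tree. The following is a minimal sketch assuming a two-country excerpt of the sample data; it covers only forms that appear in the table and is not a complete tour of the supported syntax.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<data>"
    "<country name='Liechtenstein'><year>2008</year>"
    "<neighbor name='Austria'/><neighbor name='Switzerland'/></country>"
    "<country name='Singapore'><year>2011</year>"
    "<neighbor name='Malaysia'/></country>"
    "</data>"
)

# tag/tag: grandchildren reached through a named child.
neighbors = [n.get("name") for n in root.findall("country/neighbor")]

# [@attrib='value']: children whose attribute has the given value.
singapore = root.findall("country[@name='Singapore']")

# [tag]: elements that have a direct child with that tag.
with_year = [c.get("name") for c in root.findall("country[year]")]

# [position]: 1-based position among same-tag siblings, here the second neighbor.
second = [n.get("name") for n in root.findall(".//neighbor[2]")]

# //: search all levels beneath the current element.
years = [y.text for y in root.findall(".//year")]
```

Each call returns a plain list of Element objects, so an expression that matches nothing yields an empty list rather than raising an error.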