Lemur Index File Format Analysis

IndriBuildIndexC Index File Format

Contents

I. Repository
    A. File list and descriptions
II. DiskIndex
    A. File list and descriptions
    B. File format (see IndexWriter::write())
III. CompressedCollection
    A. File list and descriptions
    B. File format (see CompressedCollection::addDocument())

I. Repository

A. File list and descriptions

File name               Description
index (Directory)       Contains zero or more DiskIndex instances, in numbered subdirectories.
collection (Directory)  Contains a CompressedCollection instance.
deleted                 Bitmap containing a list of all deleted documents.
manifest                XML file storing configuration information about the collection, including index counts, stemmer and stopword information.
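
The repository layout listed above can be checked with a few lines of Python. This is only a sketch: the function name `inspect_repository` is hypothetical, the lower-case on-disk spelling of `index` is an assumption, and the check looks at names only, not at file contents.

```python
from pathlib import Path

# Entry names follow the Repository file list. The lower-case spelling of
# "index" on disk is an assumption.
REPOSITORY_DIRS = ("index", "collection")
REPOSITORY_FILES = ("deleted", "manifest")


def inspect_repository(root):
    """Return (missing_entries, numbered_index_subdirs) for a repository path."""
    root = Path(root)
    missing = [name for name in REPOSITORY_DIRS if not (root / name).is_dir()]
    missing += [name for name in REPOSITORY_FILES if not (root / name).is_file()]

    # DiskIndex instances live in numbered subdirectories of "index".
    numbered = []
    index_dir = root / "index"
    if index_dir.is_dir():
        numbered = sorted(
            int(p.name)
            for p in index_dir.iterdir()
            if p.is_dir() and p.name.isdecimal()
        )
    return missing, numbered
```

Calling `inspect_repository(path)` on a repository directory returns the names that are absent and the DiskIndex subdirectory numbers that were found.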

II. DiskIndex

A. File list and descriptions

File name           Description
directFile          Numeric representation of each document in the collection, useful for query expansion.
documentLengths     Length of each document in words, 4 bytes per document.
documentStatistics  Offset of each document in the directFile, document length in the directFile, document term length, number of unique terms in each document.
fieldsFile          Inverted extent list for fields.
frequentID          BulkTree storing the mapping from termID to termString and term statistics for frequent terms.
frequentString      BulkTree storing the mapping from termString to termID and term statistics for frequent terms.
frequentTerms       List of termID, termString pairs, not stored in a tree, used at index merge time.
infrequentID        BulkTree storing the mapping from termID to termString and term statistics for infrequent terms.
infrequentString    BulkTree storing the mapping from termString to termID and term statistics for infrequent terms.
invertedFile        Inverted lists for each term in the collection.
manifest            XML file storing important collection statistics and configuration information.
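
Because documentLengths is described as 4 bytes per document, the number of documents can be derived from the file size alone. The sketch below reads such a file under assumptions the source does not state: unsigned 32-bit values, little-endian byte order, no header, and entry i belonging to the i-th document. The function name `read_document_lengths` is hypothetical.

```python
import os
import struct


def read_document_lengths(path, byte_order="<"):
    """Read documentLengths as a flat array of 4-byte unsigned integers.

    Assumptions (not stated in the source): no header, unsigned values,
    little-endian by default.
    """
    size = os.path.getsize(path)
    if size % 4:
        raise ValueError("file size is not a multiple of 4 bytes")
    count = size // 4  # one 4-byte entry per document
    with open(path, "rb") as f:
        data = f.read()
    return list(struct.unpack(f"{byte_order}{count}I", data))
```

Pass `byte_order=">"` if the index was written on a big-endian machine.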

B. File format (see IndexWriter::write())

invertedFile

invertedFile ::= <{TermInvDataOffset} TermStatistic, TermInvList>*
TermStatistic ::= RVLDataLength(UINT32), [TermString(CString), TermData]
TermData ::= TermTotalCount(UINT64), TermDocCount(UInt), TermMaxDocLength(UInt), TermMinDocLength(UInt), <TermFieldStatistics>^FieldCount
TermInvList ::= ControlByte(Byte), <TopDocs>?, <BatchData>*
TopDocs ::= TopDocsCount(Int), <...>^TopDocsCount
BatchData ::= NextBatchDocID(DOCID_T), RVLDataLength(Int), [<DocEntry>*]
DocEntry ::= DocID(Int), PositionCount(Int), <...>^PositionCount

frequentTerms, frequentID, frequentString, infrequentID, infrequentString

frequentTerms ::= <DiskTermData1>*
frequentID ::= Map<TermID, DiskTermData2>
frequentString ::= Map<TermString, DiskTermData3>
DiskTermData1 ::= TermData, TermID(TERMID_T), TermString(CString), TermFilePointer
DiskTermData2 ::= TermData, TermString(CString), TermFilePointer
DiskTermData3 ::= TermData, TermID(TERMID_T), TermFilePointer
TermFilePointer ::= TermInvDataOffset(UINT64), TermInvDataLength(UINT64)
infrequentID ::= Map<TermID, DiskTermData2>
infrequentString ::= Map<TermString, DiskTermData3>

fieldsFile

fieldsFile ::= {FieldDataOffset} ControlByte(UINT8), <BatchData>*
BatchData ::= NextBatchDocID(DOCID_T), RVLDataLength(Int), [<ExtentCount(Int), <ExtentLength(Int), ExtentParent(Int)?, ExtentNumber(INT64)?>^ExtentCount>*]

directFile, documentLengths, documentStatistics

directFile ::= <DocDirectData>*
DocDirectData ::= TermCount(Int), FieldCount(Int), <...>^TermCount, <FieldExtent>^FieldCount
FieldExtent ::= FieldID(Int), FieldParentOrdinal(Int), FieldBegin(Int), FieldEnd(Int), FieldNumber(UINT64)
documentLengths ::= <...>*
documentStatistics ::= <DocumentData>*
DocumentData ::= DocDirectDataOffset(UINT64), DocDirectDataLength(Int), DocIndexedLength(Int), DocTotalLength(Int), DocUniqTermCount(Int)

manifest

manifest ::= IndexType(CString), IndexBuildDate(CString), IndriDistribution(CString), CorpusStatistics, <FieldDescription>^FieldCount
CorpusStatistics ::= TotalDocCount(UINT64), TotalTermCount(UINT64), UniqTermCount(UINT64), DocBase(DOCID_T), FrequentTermCount(Int), MaxDocument(DOCID_T)
FieldDescription ::= IsNumeric(Bool), IsOrdinal(Bool), IsParental(Bool), FieldName(CString), ParseName(String)?, TotalDocCount(UInt), TotalTermCount(UINT64), FieldDataOffset(UINT64)
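
The DocumentData record listed above (DocDirectDataOffset, DocDirectDataLength, DocIndexedLength, DocTotalLength, DocUniqTermCount) is made of fixed-width fields, so documentStatistics can be read as an array of records and used to locate a document's bytes in directFile. The sketch below is illustrative only. It assumes a packed little-endian layout with a 64-bit unsigned offset followed by four 32-bit signed integers, which the source does not confirm. The Python names are hypothetical.

```python
import struct
from typing import NamedTuple

# Assumed layout: UINT64 offset, then four 32-bit Ints, little-endian,
# no padding (24 bytes per record).
DOCUMENT_DATA = struct.Struct("<Qiiii")


class DocumentData(NamedTuple):
    direct_offset: int      # DocDirectDataOffset
    direct_length: int      # DocDirectDataLength
    indexed_length: int     # DocIndexedLength
    total_length: int       # DocTotalLength
    unique_term_count: int  # DocUniqTermCount


def parse_document_statistics(buf):
    """Split a documentStatistics buffer into DocumentData records."""
    if len(buf) % DOCUMENT_DATA.size:
        raise ValueError("buffer length is not a multiple of the record size")
    return [DocumentData(*fields) for fields in DOCUMENT_DATA.iter_unpack(buf)]


def direct_data_for(record, direct_file_bytes):
    """Return the slice of directFile that a DocumentData record points at."""
    start = record.direct_offset
    return direct_file_bytes[start:start + record.direct_length]
```

Each record gives the offset and length of one document's entry in directFile, which is all `direct_data_for` needs to return its raw bytes. Decoding those bytes is outside the scope of this sketch.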

III. CompressedCollection

A. File list and descriptions

File name       Description
lookup          Keyfile (B-Tree) that stores the mapping between document ID and offset into storage.
manifest        XML file storing configuration information for this CompressedCollection.
storage         Stores a compressed version of each document in the collection (here using the zlib compression library), along with byte offsets for each word in each document, and various document metadata added at index time.
forwardLookupn  Keyfile (B-Tree) that stores the mapping between a document ID and a metadata string.
reverseLookupn  Keyfile (B-Tree) that stores the mapping between a metadata string and one or more document IDs.
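
The relationship between lookup and storage can be illustrated with a toy in-memory model. This is not the on-disk format: here a plain dict stands in for the lookup Keyfile, each record holds only zlib-compressed text (no word offsets or metadata), and the function names are hypothetical. It only shows how an offset alone is enough to recover one zlib stream from a concatenated blob.

```python
import zlib


def build_storage(documents):
    """Concatenate zlib-compressed documents; return (storage_bytes, lookup).

    `lookup` maps a document ID to the byte offset of its record in storage.
    """
    storage = bytearray()
    lookup = {}
    for doc_id, text in documents.items():
        lookup[doc_id] = len(storage)
        storage += zlib.compress(text.encode("utf-8"))
    return bytes(storage), lookup


def fetch_document(storage, lookup, doc_id):
    """Decompress the single zlib stream that starts at the document's offset."""
    decompressor = zlib.decompressobj()
    # The decompressor stops at the end of the first stream; bytes that belong
    # to later records end up in decompressor.unused_data.
    raw = decompressor.decompress(storage[lookup[doc_id]:])
    if not decompressor.eof:
        raise ValueError("compressed record is truncated")
    return raw.decode("utf-8")
```

Because a zlib stream marks its own end, the toy lookup does not need to store record lengths.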

B. File format (see CompressedCollection::addDocument())

lookup

lookup ::= Map<DocID, StorageDocOffset>

manifest

manifest ::= ForwardMetadataList, ReverseMetadataList
ForwardMetadataList ::= MetadataName(String)
ReverseMetadataList ::= MetadataName(String)

storage

storage ::= <{StorageDocOffset} StorageDocData>*
StorageDocData ::= <KeyValuePair>^PairCount, <...>^PairCount, PairCount(UINT32)
KeyValuePair ::= MetadataPair | TermPositionPair | TextPair | ContentPair | ContentLengthPair
MetadataPair ::= MetadataKey(CString), MetadataValue(Void*)
TermPositionPair ::= TermPositionFlag(CString), <...>*
TextPair ::= TextFlag(CString), TextData(Void*)
ContentPair ::= ContentFlag(CString), ContentOffset(Int)
ContentLengthPair ::= ContentLengthFlag(CString), ContentLength(Int)

forwardLookupn, reverseLookupn

forwardLookupn ::= Map<DocID, MetadataValue>
reverseLookupn ::= Map<MetadataValue, <DocID>*>
