Lemur Index File Format Analysis

Uploaded by: b****6 · Document ID: 17635921 · Upload date: 2022-12-07 · Format: DOCX · Pages: 4 · Size: 16.10 KB


I. Repository

A. File list and descriptions

FileName                  Description
index (Directory)         Contains zero or more DiskIndex instances, in numbered subdirectories.
collection (Directory)    Contains a CompressedCollection instance.
deleted                   Bitmap containing a list of all deleted documents.
manifest                  XML file storing configuration information about the collection, including index counts, stemmer and stopword information.
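The layout in the table can be checked mechanically. The sketch below is a hypothetical helper, not part of Lemur or Indri: it assumes the four entry names from the table (with the index directory spelled in lowercase) and treats every all-digit subdirectory of index as one DiskIndex instance. The demo builds a throwaway directory tree with placeholder manifest content purely for illustration.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

# Entry names taken from the Repository table; the lowercase spelling is an assumption.
EXPECTED_ENTRIES = ("index", "collection", "deleted", "manifest")


def inspect_repository(root: Path) -> Dict[str, object]:
    """Report which expected entries exist and list the numbered index subdirectories."""
    present = {name: (root / name).exists() for name in EXPECTED_ENTRIES}
    index_dir = root / "index"
    disk_indexes: List[str] = []
    if index_dir.is_dir():
        # Assumption: each DiskIndex lives in a subdirectory whose name is a number.
        disk_indexes = sorted(
            (p.name for p in index_dir.iterdir() if p.is_dir() and p.name.isdigit()),
            key=int,
        )
    return {"present": present, "disk_indexes": disk_indexes}


def demo() -> Dict[str, object]:
    """Build a fake repository (no 'deleted' bitmap) and inspect it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for sub in ("index/0", "index/1", "collection"):
            (root / sub).mkdir(parents=True)
        # Placeholder XML, not a real manifest.
        (root / "manifest").write_text("<parameters/>")
        return inspect_repository(root)
```

Calling demo() reports the missing deleted bitmap as absent and lists the two numbered DiskIndex directories.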

II. DiskIndex

A. File list and descriptions

FileName              Description
directFile            Numeric representation of each document in the collection, useful for query expansion.
documentLengths       Length of each document in words, 4 bytes per document.
documentStatistics    Offset of each document in the directFile, document length in the directFile, document term length, and number of unique terms in each document.
fieldsFile            Inverted extent list for fields.
frequentID            BulkTree storing the mapping from termID to termString and term statistics for frequent terms.
frequentString        BulkTree storing the mapping from termString to termID and term statistics for frequent terms.
frequentTerms         List of termID, termString pairs, not stored in a tree, used at index merge time.
infrequentID          BulkTree storing the mapping from termID to termString and term statistics for infrequent terms.
infrequentString      BulkTree storing the mapping from termString to termID and term statistics for infrequent terms.
invertedFile          Inverted lists for each term in the collection.
manifest              XML file storing important collection statistics and configuration information.
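Because documentLengths holds a fixed 4 bytes per document, it can be read as a flat array. The following is a minimal sketch under two assumptions the source does not state: the values are unsigned 32-bit integers, and they are stored little-endian. The function name is illustrative.

```python
import struct
from typing import List


def read_document_lengths(raw: bytes, byteorder: str = "<") -> List[int]:
    """Decode a documentLengths buffer into one length (in words) per document.

    Assumes unsigned 32-bit values; byteorder "<" (little-endian) is a guess
    and can be switched to ">" if the index was written on a big-endian host.
    """
    if len(raw) % 4 != 0:
        raise ValueError("buffer size is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"{byteorder}{count}I", raw))


# Example with three synthetic document lengths.
sample = struct.pack("<3I", 120, 7, 3056)
lengths = read_document_lengths(sample)
```

Since every record has the same size, the length of document i (counted from the first document in the file) sits at byte offset 4 * i, so a single seek is enough for random access.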

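B. File format (see IndexWriter::write())

Notation, as inferred from how the definitions use it: <...>* repeats zero or more times, <...>? is optional, <...>^N repeats N times, {X} marks the file position that the offset field X points to, and [...] appears to enclose the RVL-compressed data whose byte length is given by the preceding RVLDataLength.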

invertedFile

invertedFile          <{TermInvDataOffset} TermStatistic, TermInvList>*
TermStatistic         RVLDataLength(UINT32), [TermString(CString), TermData]
TermData              TermTotalCount(UINT64), TermDocCount(UInt), TermMaxDocLength(UInt), TermMinDocLength(UInt), TermFieldStatistics
TermFieldStatistics   <TermFieldTotalCount(UINT64), TermFieldElementCount(UInt)>^FieldCount
TermInvList           ControlByte(Byte), <TopDocs>?, <BatchData>*
TopDocs               TopDocsCount(Int), <DocID(DOCID_T), PositionCount(Int), DocLength(Int)>^TopDocsCount
BatchData             NextBatchDocID(DOCID_T), RVLDataLength(Int), [<DocEntry>*]
DocEntry              DocID(Int), PositionCount(Int), <TermPosition(Int)>^PositionCount
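The source names the RVL encoding but does not define it at the byte level. The sketch below assumes a common variable-byte scheme (7 payload bits per byte, least significant group first, high bit set on the final byte of each value) and ignores any delta coding of document IDs or positions. Under those assumptions it splits the decoded integers of one [<DocEntry>*] block into entries that follow the DocEntry layout: DocID, PositionCount, then PositionCount positions. All function names are illustrative.

```python
from typing import List, Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer; the last byte carries the 0x80 terminator bit (assumed scheme)."""
    out = bytearray()
    while value >= 0x80:
        out.append(value & 0x7F)
        value >>= 7
    out.append(value | 0x80)
    return bytes(out)


def decode_varints(buf: bytes) -> List[int]:
    """Decode every integer in a buffer written with encode_varint."""
    values: List[int] = []
    current = 0
    shift = 0
    for byte in buf:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:  # terminator bit: this value is complete
            values.append(current)
            current = 0
            shift = 0
        else:
            shift += 7
    if shift != 0:
        raise ValueError("buffer ends in the middle of a value")
    return values


def parse_doc_entries(values: List[int]) -> List[Tuple[int, List[int]]]:
    """Group a flat integer list into (DocID, [TermPosition, ...]) entries."""
    entries: List[Tuple[int, List[int]]] = []
    i = 0
    while i < len(values):
        if i + 2 > len(values):
            raise ValueError("truncated DocEntry header")
        doc_id, position_count = values[i], values[i + 1]
        positions = values[i + 2 : i + 2 + position_count]
        if len(positions) != position_count:
            raise ValueError("truncated position list")
        entries.append((doc_id, positions))
        i += 2 + position_count
    return entries


# Two synthetic entries: document 3 with positions 5 and 9, document 7 with position 4.
block = b"".join(encode_varint(v) for v in [3, 2, 5, 9, 7, 1, 4])
entries = parse_doc_entries(decode_varints(block))
```

If the real encoding stores document IDs or positions as gaps, a running sum over the decoded values would be needed before grouping.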

frequentTerms, frequentID, frequentString, infrequentID, infrequentString

frequentTerms         <DiskTermData1>*
frequentID            Map<TermID(TERMID_T), DiskTermData2>
frequentString        Map<TermString(CString), DiskTermData3>
DiskTermData1         TermData, TermID(TERMID_T), TermString(CString), TermFilePointer
DiskTermData2         TermData, TermString(CString), TermFilePointer
DiskTermData3         TermData, TermID(TERMID_T), TermFilePointer
TermFilePointer       TermInvDataOffset(UINT64), TermInvDataLength(UINT64)
infrequentID          (definition not recoverable from the source; by analogy with frequentID, presumably Map<TermID(TERMID_T), DiskTermData2>)
infrequentString      Map<TermString(), DiskTermData3>

fieldsFile

fieldsFile            {FieldDataOffset} ControlByte(UINT8), <BatchData>*
BatchData             NextBatchDocID(DOCID_T), RVLDataLength(Int), [<DocEntry>*]
DocEntry              DocID(DOCID_T), ExtentCount(Int), <ExtentBegin(Int), ExtentLength(Int), ExtentOrdinal(Int)?, ExtentParent(Int)?, ExtentNumber(INT64)?>^ExtentCount

directFile, documentLengths, documentStatistics

directFile            <RVLDataLength(UINT32), {DocDirectDataOffset} [DocDirectData]>*
DocDirectData         TermCount(Int), FieldCount(Int), <TermID(TERMID_T)>^TermCount, <FieldExtent>^FieldCount
FieldExtent           FieldID(Int), FieldParentOrdinal(Int), FieldBegin(Int), FieldEnd(Int), FieldNumber(UINT64)
documentLengths       <DocLength(UINT32)>*
documentStatistics    <DocumentData>*
DocumentData          DocDirectDataOffset(UINT64), DocDirectDataLength(Int), DocIndexedLength(Int), DocTotalLength(Int), DocUniqTermCount(Int)

manifest

manifest              IndexType(CString), IndexBuildDate(CString), IndriDistribution(CString), CorpusStatistics, <FieldDescription>^FieldCount
CorpusStatistics      TotalDocCount(UINT64), TotalTermCount(UINT64), UniqTermCount(UINT64), DocBase(DOCID_T), FrequentTermCount(Int), MaxDocument(DOCID_T)
FieldDescription      IsNumeric(Bool), IsOrdinal(Bool), IsParental(Bool), FieldName(CString), ParseName(String)?, TotalDocCount(UInt), TotalTermCount(UINT64), FieldDataOffset(UINT64)
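The DocumentData record listed for documentStatistics has a fixed size, which makes per-document lookup a matter of multiplying an index by the record size. The sketch below assumes a packed little-endian layout (one UINT64 followed by four 32-bit signed integers, 24 bytes, no padding); the source gives the field types but not the byte order or alignment, and the Python names are illustrative.

```python
import struct
from typing import NamedTuple


class DocumentData(NamedTuple):
    direct_offset: int      # DocDirectDataOffset(UINT64)
    direct_length: int      # DocDirectDataLength(Int)
    indexed_length: int     # DocIndexedLength(Int)
    total_length: int       # DocTotalLength(Int)
    unique_term_count: int  # DocUniqTermCount(Int)


# Assumed layout: little-endian, no padding -> 8 + 4 * 4 = 24 bytes per record.
RECORD = struct.Struct("<Qiiii")


def read_document_data(raw: bytes, index: int) -> DocumentData:
    """Return the record at position `index` (0-based) in a documentStatistics buffer."""
    start = index * RECORD.size
    if index < 0 or start + RECORD.size > len(raw):
        raise IndexError("document index out of range")
    return DocumentData(*RECORD.unpack_from(raw, start))


# Two synthetic records for illustration.
raw = RECORD.pack(0, 40, 12, 15, 9) + RECORD.pack(40, 64, 30, 33, 21)
second = read_document_data(raw, 1)
```

The direct_offset and direct_length fields of a record then identify where that document's data starts in the directFile and how many bytes it occupies.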

III. CompressedCollection

A. File list and descriptions

FileName           Description
lookup             Keyfile (B-Tree) that stores the mapping between a document ID and an offset into storage.
manifest           XML file storing configuration information for this CompressedCollection.
storage            Stores a compressed version of each document in the collection (here using the zlib compression library), along with byte offsets for each word in each document and various document metadata added at index time.
forwardLookupn     Keyfile (B-Tree) that stores the mapping between a document ID and a metadata string.
reverseLookupn     Keyfile (B-Tree) that stores the mapping between a metadata string and one or more document IDs.

B. File format (see CompressedCollection::addDocument())

lookup

lookup                Map<DocID(DOCID_T), StorageDocOffset(UINT64)>

manifest

manifest              ForwardMetadataList, ReverseMetadataList
ForwardMetadataList   MetadataName(String)
ReverseMetadataList   MetadataName(String)

storage

storage               <{StorageDocOffset} StorageDocData>*
StorageDocData        <KeyValuePair>^PairCount, <KeyOffset(UINT32), ValueOffset(UINT32)>^PairCount, PairCount(UINT32)
KeyValuePair          MetadataPair | TermPositionPair | TextPair | ContentPair | ContentLengthPair
MetadataPair          MetadataKey(CString), MetadataValue(Void*)
TermPositionPair      TermPositionFlag(CString), <TermBegin(Int), TermLength(Int)>
TextPair              TextFlag(CString), TextData(Void*)
ContentPair           ContentFlag(CString), ContentOffset(Int)
ContentLengthPair     ContentLengthFlag(CString), ContentLength(Int)
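Because a StorageDocData record ends with PairCount, preceded by the table of key and value offsets, a reader has to work backwards from the end of the record. The sketch below illustrates that idea under several assumptions the source does not confirm: each record is an independent zlib stream, integers are little-endian, offsets are relative to the start of the decompressed record, keys are NUL-terminated, and a value runs up to the next key (or to the offset table for the last pair). The keys "docno" and "text" and both function names are made up for the example.

```python
import struct
import zlib
from typing import Dict, List, Tuple


def build_record(pairs: List[Tuple[str, bytes]]) -> bytes:
    """Hypothetical writer: pairs, then the offset table, then PairCount, all zlib-compressed."""
    body = bytearray()
    offsets: List[Tuple[int, int]] = []
    for key, value in pairs:
        key_offset = len(body)
        body += key.encode("ascii") + b"\x00"  # key stored as a C string
        value_offset = len(body)
        body += value
        offsets.append((key_offset, value_offset))
    for key_offset, value_offset in offsets:
        body += struct.pack("<II", key_offset, value_offset)
    body += struct.pack("<I", len(pairs))
    return zlib.compress(bytes(body))


def parse_record(compressed: bytes) -> Dict[str, bytes]:
    """Read PairCount from the tail, then the offset table, then slice out each pair."""
    data = zlib.decompress(compressed)
    if len(data) < 4:
        raise ValueError("record too short to hold PairCount")
    (pair_count,) = struct.unpack_from("<I", data, len(data) - 4)
    table_start = len(data) - 4 - 8 * pair_count
    if table_start < 0:
        raise ValueError("PairCount does not fit in the record")
    offsets = [struct.unpack_from("<II", data, table_start + 8 * i) for i in range(pair_count)]
    result: Dict[str, bytes] = {}
    for i, (key_offset, value_offset) in enumerate(offsets):
        key = data[key_offset : value_offset - 1].decode("ascii")  # drop the NUL
        value_end = offsets[i + 1][0] if i + 1 < pair_count else table_start
        result[key] = data[value_offset:value_end]
    return result


record = build_record([("docno", b"DOC-1"), ("text", b"hello world")])
fields = parse_record(record)
```

Placing the count and offset table at the end lets a writer append pairs in a single pass without knowing their number or sizes in advance, which may be why the layout is ordered this way.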

forwardLookupn, reverseLookupn

forwardLookupn        Map<DocID(DOCID_T), MetadataValue(Void*)>
reverseLookupn        Map<MetadataValue(CString), DocIDList(Void*)>
