Lemur Index File Format Analysis
IndriBuildIndexC Index File Format
I. Repository
   A. File list and descriptions
II. DiskIndex
   A. File list and descriptions
   B. File formats (see IndexWriter::write())
III. CompressedCollection
   A. File list and descriptions
   B. File formats (see CompressedCollection::addDocument())
I. Repository
A. File list and descriptions
index (directory): Contains zero or more DiskIndex instances, in numbered subdirectories.
collection (directory): Contains a CompressedCollection instance.
deleted: Bitmap containing a list of all deleted documents.
manifest: XML file storing configuration information about the collection, including index counts, stemmer and stopword information.
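A minimal sketch of how a tool might enumerate the numbered DiskIndex subdirectories of a repository. It assumes the instances live under `<repository>/index/<n>` with purely numeric names; the function name `disk_index_dirs` is hypothetical and not part of Indri.

```python
import os
from typing import List


def disk_index_dirs(repo_path: str) -> List[str]:
    """Return the numbered DiskIndex subdirectories of a repository, sorted numerically.

    Assumption (hypothetical layout): DiskIndex instances live in
    <repo_path>/index/<n>, where <n> is a non-negative decimal integer.
    """
    index_root = os.path.join(repo_path, "index")
    if not os.path.isdir(index_root):
        return []  # a repository may hold zero DiskIndex instances
    numbered = [
        name
        for name in os.listdir(index_root)
        if name.isdecimal() and os.path.isdir(os.path.join(index_root, name))
    ]
    # Sort by integer value so that "10" comes after "9", not after "1".
    return [os.path.join(index_root, n) for n in sorted(numbered, key=int)]
```

Sorting numerically matters once there are ten or more DiskIndex instances, since plain string order would put "10" before "2".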
II. DiskIndex
A. File list and descriptions
directFile: Numeric representation of each document in the collection, useful for query expansion.
documentLengths: Length of each document in words, 4 bytes per document.
documentStatistics: Offset of each document in the directFile, document length in the directFile, document term length, and number of unique terms in each document.
fieldsFile: Inverted extent list for fields.
frequentID: BulkTree storing the mapping from termID to termString and term statistics for frequent terms.
frequentString: BulkTree storing the mapping from termString to termID and term statistics for frequent terms.
frequentTerms: List of (termID, termString) pairs, not stored in a tree, used at index merge time.
infrequentID: BulkTree storing the mapping from termID to termString and term statistics for infrequent terms.
infrequentString: BulkTree storing the mapping from termString to termID and term statistics for infrequent terms.
invertedFile: Inverted lists for each term in the collection.
manifest: XML file storing important collection statistics and configuration information.
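Because documentLengths uses a fixed 4 bytes per document, a document's length can be read by direct offset arithmetic. The sketch below assumes little-endian signed 32-bit values (the byte order is not stated in the source); the helper names are hypothetical.

```python
import struct
from typing import List


def read_document_lengths(data: bytes) -> List[int]:
    """Decode a whole documentLengths buffer: one 4-byte length per document.

    Assumption: lengths are little-endian signed 32-bit integers.
    """
    if len(data) % 4 != 0:
        raise ValueError("documentLengths size must be a multiple of 4 bytes")
    count = len(data) // 4
    return list(struct.unpack(f"<{count}i", data))


def document_length(data: bytes, index: int) -> int:
    """Random access: the entry at position `index` starts at byte index * 4."""
    (length,) = struct.unpack_from("<i", data, index * 4)
    return length
```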
B. File formats (see IndexWriter::write())
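Notation: X* repeats X zero or more times, X? marks X as optional, X^N repeats X N times, <...> groups items, {Offset} marks the position that the named offset field points to, and [...] appears to denote an RVL-compressed block whose byte length is given by the preceding RVLDataLength field.

invertedFile
invertedFile := <{TermInvDataOffset} TermStatistic, TermInvList>*
TermStatistic := RVLDataLength(UINT32), [TermString(CString), TermData]
TermData := TermTotalCount(UINT64), TermDocCount(UInt), TermMaxDocLength(UInt), TermMinDocLength(UInt), TermFieldStatistics^FieldCount
TermInvList := ControlByte(Byte), TopDocs?, BatchData*
TopDocs := TopDocsCount(Int), followed by TopDocsCount top-document entries
BatchData := NextBatchDocID(DOCID_T), RVLDataLength(Int), [DocEntry*]
DocEntry := DocID(Int), PositionCount(Int), followed by PositionCount positions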
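The DocEntry sequence inside a BatchData block is variable-byte compressed. A minimal sketch, assuming an RVL-style scheme that stores 7 data bits per byte, least-significant group first, with the high bit set on the final byte, and assuming DocIDs and positions are delta-encoded (neither detail is stated in the source). All function names are hypothetical.

```python
from typing import List, Tuple

DocEntry = Tuple[int, List[int]]  # (DocID, positions)


def rvl_encode(value: int) -> bytes:
    """Variable-byte encode a non-negative int (assumed RVL convention)."""
    if value < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while value >= 0x80:
        out.append(value & 0x7F)  # low 7 bits, continuation implied by clear high bit
        value >>= 7
    out.append(value | 0x80)  # final byte carries the high bit
    return bytes(out)


def rvl_decode(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one integer starting at `pos`; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            return value, pos
        shift += 7


def encode_doc_entries(entries: List[DocEntry]) -> bytes:
    """Write DocID, PositionCount, positions for each entry (delta-encoded).

    Entries must be sorted by DocID, and positions must be ascending.
    """
    out = bytearray()
    prev_doc = 0
    for doc_id, positions in entries:
        out += rvl_encode(doc_id - prev_doc)
        out += rvl_encode(len(positions))
        prev_pos = 0
        for p in positions:
            out += rvl_encode(p - prev_pos)
            prev_pos = p
        prev_doc = doc_id
    return bytes(out)


def decode_doc_entries(buf: bytes) -> List[DocEntry]:
    entries: List[DocEntry] = []
    pos, doc_id = 0, 0
    while pos < len(buf):
        delta, pos = rvl_decode(buf, pos)
        doc_id += delta
        count, pos = rvl_decode(buf, pos)
        positions, prev = [], 0
        for _ in range(count):
            d, pos = rvl_decode(buf, pos)
            prev += d
            positions.append(prev)
        entries.append((doc_id, positions))
    return entries
```

Delta-encoding keeps most numbers small, so most of them fit in a single byte under this scheme.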
frequentTerms, frequentID, frequentString, infrequentID, infrequentString
frequentTerms := DiskTermData1*
frequentID := Map (TermID -> DiskTermData2)
frequentString := Map (TermString -> DiskTermData3)
infrequentID := Map (TermID -> DiskTermData2)
infrequentString := Map (TermString -> DiskTermData3)
DiskTermData1 := TermData, TermID(TERMID_T), TermString(CString), TermFilePointer
DiskTermData2 := TermData, TermString(CString), TermFilePointer
DiskTermData3 := TermData, TermID(TERMID_T), TermFilePointer
TermFilePointer := TermInvDataOffset(UINT64), TermInvDataLength(UINT64)
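A TermFilePointer is just an (offset, length) pair into invertedFile, so a lookup reduces to a seek and a read. This sketch assumes the two UINT64 fields are packed little-endian with no padding, and that the length covers the term's whole TermStatistic plus TermInvList record; the names are hypothetical.

```python
import struct
from typing import BinaryIO, NamedTuple


class TermFilePointer(NamedTuple):
    offset: int  # TermInvDataOffset (UINT64)
    length: int  # TermInvDataLength (UINT64)


# Assumption: two little-endian unsigned 64-bit integers, 16 bytes total.
_POINTER = struct.Struct("<QQ")


def pack_pointer(ptr: TermFilePointer) -> bytes:
    return _POINTER.pack(ptr.offset, ptr.length)


def unpack_pointer(data: bytes) -> TermFilePointer:
    return TermFilePointer(*_POINTER.unpack(data))


def read_term_data(inverted_file: BinaryIO, ptr: TermFilePointer) -> bytes:
    """Fetch the raw bytes of one term's record from invertedFile."""
    inverted_file.seek(ptr.offset)
    data = inverted_file.read(ptr.length)
    if len(data) != ptr.length:
        raise EOFError("invertedFile is shorter than the pointer claims")
    return data
```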
fieldsFile
fieldsFile := <{FieldDataOffset} ControlByte(UINT8), BatchData*>*
BatchData := NextBatchDocID(DOCID_T), RVLDataLength(Int), [<ExtentCount(Int), <ExtentLength(Int), ExtentParent(Int)?, ExtentNumber(INT64)?>^ExtentCount>*]
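ExtentParent and ExtentNumber are marked optional, which suggests their presence depends on the field type. The sketch below assumes (not confirmed by the source) that ExtentParent is present only when the field's manifest entry has IsParental set and ExtentNumber only when IsNumeric is set, and that the block has already been decompressed to a list of integers. `Extent` and `parse_extents` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Extent:
    length: int                   # ExtentLength
    parent: Optional[int] = None  # ExtentParent, only for parental fields (assumed)
    number: Optional[int] = None  # ExtentNumber, only for numeric fields (assumed)


def parse_extents(values: List[int], pos: int, is_parental: bool,
                  is_numeric: bool) -> Tuple[List[Extent], int]:
    """Parse ExtentCount and then ExtentCount extents; return (extents, next_pos)."""
    count = values[pos]
    pos += 1
    extents = []
    for _ in range(count):
        ext = Extent(length=values[pos])
        pos += 1
        if is_parental:
            ext.parent = values[pos]
            pos += 1
        if is_numeric:
            ext.number = values[pos]
            pos += 1
        extents.append(ext)
    return extents, pos
```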
directFile, documentLengths, documentStatistics
directFile := DocDirectData*
DocDirectData := TermCount(Int), FieldCount(Int), followed by TermCount term IDs, FieldExtent^FieldCount
FieldExtent := FieldID(Int), FieldParentOrdinal(Int), FieldBegin(Int), FieldEnd(Int), FieldNumber(UINT64)
documentLengths := a sequence of 4-byte document lengths (in words), one per document
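Since directFile stores each document as term IDs plus field extents, the original term sequence of a field can be recovered with a termID-to-string map. This sketch assumes the record is already decoded to integers, that the five FieldExtent values appear in the order listed, and that FieldBegin/FieldEnd form a half-open [begin, end) range over term positions; all names are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class FieldExtent(NamedTuple):
    field_id: int
    parent_ordinal: int
    begin: int
    end: int
    number: int


def parse_doc_direct_data(values: List[int]) -> Tuple[List[int], List[FieldExtent]]:
    """Split a decoded DocDirectData record into term IDs and field extents."""
    term_count, field_count = values[0], values[1]
    expected = 2 + term_count + 5 * field_count
    if len(values) != expected:
        raise ValueError("record length does not match TermCount/FieldCount")
    term_ids = values[2:2 + term_count]
    pos = 2 + term_count
    fields = []
    for _ in range(field_count):
        fields.append(FieldExtent(*values[pos:pos + 5]))
        pos += 5
    return term_ids, fields


def field_terms(term_ids: List[int], extent: FieldExtent,
                vocabulary: Dict[int, str]) -> List[str]:
    """Term strings covered by an extent, assuming [begin, end) positions."""
    return [vocabulary[t] for t in term_ids[extent.begin:extent.end]]
```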
documentStatistics := DocumentData*
DocumentData := DocDirectDataOffset(UINT64), DocDirectDataLength(Int), DocIndexedLength(Int), DocTotalLength(Int), DocUniqTermCount(Int)

manifest
manifest := IndexType(CString), IndexBuildDate(CString), IndriDistribution(CString), CorpusStatistics, FieldDescription^FieldCount
CorpusStatistics := TotalDocCount(UINT64), TotalTermCount(UINT64), UniqTermCount(UINT64), DocBase(DOCID_T), FrequentTermCount(Int), MaxDocument(DOCID_T)
FieldDescription := IsNumeric(Bool), IsOrdinal(Bool), IsParental(Bool), FieldName(CString), ParseName(String)?, TotalDocCount(UInt), TotalTermCount(UINT64), FieldDataOffset(UINT64)
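Each DocumentData record in documentStatistics has a fixed size, so record i can be located by arithmetic and then used to slice the document's DocDirectData out of directFile. The sketch assumes little-endian packing with no padding (8 + 4 × 4 = 24 bytes per record); the class and helper names are hypothetical.

```python
import struct
from typing import NamedTuple


class DocumentData(NamedTuple):
    direct_offset: int      # DocDirectDataOffset (UINT64)
    direct_length: int      # DocDirectDataLength (Int)
    indexed_length: int     # DocIndexedLength (Int)
    total_length: int       # DocTotalLength (Int)
    unique_term_count: int  # DocUniqTermCount (Int)


# Assumption: little-endian, no padding -> 24 bytes per record.
RECORD = struct.Struct("<Qiiii")


def read_document_data(stats: bytes, index: int) -> DocumentData:
    """Record `index` starts at index * RECORD.size in documentStatistics."""
    return DocumentData(*RECORD.unpack_from(stats, index * RECORD.size))


def direct_slice(direct_file: bytes, doc: DocumentData) -> bytes:
    """The bytes of this document's DocDirectData record inside directFile."""
    return direct_file[doc.direct_offset:doc.direct_offset + doc.direct_length]
```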
III. CompressedCollection
A. File list and descriptions
lookup: Key file (B-Tree) that stores the mapping between a document ID and an offset into storage.
manifest: XML file storing configuration information for this CompressedCollection.
storage: Stores a compressed version of each document in the collection (here using the zlib compression library), along with byte offsets for each word in each document and various document metadata added at index time.
forwardLookupn: Key file (B-Tree) that stores the mapping between a document ID and a metadata string.
reverseLookupn: Key file (B-Tree) that stores the mapping between a metadata string and one or more document IDs.
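The storage/lookup pair works like an append-only blob store: each document is compressed with zlib, appended to storage, and its starting offset is recorded in lookup. The in-memory toy below illustrates that idea only; it is not Indri's on-disk format. The 4-byte length prefix and the class name `ToyCompressedStore` are hypothetical.

```python
import struct
import zlib
from typing import Dict


class ToyCompressedStore:
    """In-memory stand-in for storage + lookup.

    Hypothetical layout: each record is a 4-byte little-endian length followed
    by zlib-compressed bytes; `lookup` maps document ID -> byte offset in storage.
    """

    def __init__(self) -> None:
        self.storage = bytearray()
        self.lookup: Dict[int, int] = {}

    def add_document(self, doc_id: int, payload: bytes) -> None:
        compressed = zlib.compress(payload)
        self.lookup[doc_id] = len(self.storage)  # offset of this record
        self.storage += struct.pack("<I", len(compressed)) + compressed

    def get_document(self, doc_id: int) -> bytes:
        offset = self.lookup[doc_id]
        (size,) = struct.unpack_from("<I", self.storage, offset)
        start = offset + 4
        return zlib.decompress(bytes(self.storage[start:start + size]))
```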
B. File formats (see CompressedCollection::addDocument())
lookup
lookup := Map (document ID -> StorageDocOffset)

manifest
manifest := ForwardMetadataList, ReverseMetadataList
ForwardMetadataList := MetadataName(String)*
ReverseMetadataList := MetadataName(String)*

storage
storage := <{StorageDocOffset} StorageDocData>*
StorageDocData := PairCount(UINT32), KeyValuePair^PairCount
KeyValuePair := MetadataPair | TermPositionPair | TextPair | ContentPair | ContentLengthPair
MetadataPair := MetadataKey(CString), MetadataValue(Void*)
TermPositionPair := TermPositionFlag(CString), followed by the repeated term position data
TextPair := TextFlag(CString), TextData(Void*)
ContentPair := ContentFlag(CString), ContentOffset(Int)
ContentLengthPair := ContentLengthFlag(CString), ContentLength(Int)

forwardLookupn, reverseLookupn
forwardLookupn := Map (document ID -> metadata string)
reverseLookupn := Map (metadata string -> one or more document IDs)
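A sketch of serializing StorageDocData as a PairCount followed by key/value pairs, plus building a reverseLookup-style map from a forward one. Because the values are typed only as Void*, this sketch assumes each value is prefixed with a UINT32 byte length; the flag string "TEXTFLAG" used in the example is a hypothetical placeholder, since the source does not give the real flag values.

```python
import struct
from typing import Dict, List, Tuple

Pair = Tuple[str, bytes]  # (metadata key or flag string, raw value)


def encode_pairs(pairs: List[Pair]) -> bytes:
    """PairCount (UINT32), then each pair as a NUL-terminated key and a length-prefixed value."""
    out = bytearray(struct.pack("<I", len(pairs)))
    for key, value in pairs:
        out += key.encode("utf-8") + b"\x00"          # CString key or flag
        out += struct.pack("<I", len(value)) + value  # assumed length prefix
    return bytes(out)


def decode_pairs(buf: bytes) -> List[Pair]:
    (count,) = struct.unpack_from("<I", buf, 0)
    pos = 4
    pairs: List[Pair] = []
    for _ in range(count):
        end = buf.index(b"\x00", pos)  # end of the CString key
        key = buf[pos:end].decode("utf-8")
        pos = end + 1
        (size,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        pairs.append((key, buf[pos:pos + size]))
        pos += size
    return pairs


def build_reverse_lookup(forward: Dict[int, str]) -> Dict[str, List[int]]:
    """One metadata string -> one or more document IDs, in ascending ID order."""
    reverse: Dict[str, List[int]] = {}
    for doc_id in sorted(forward):
        reverse.setdefault(forward[doc_id], []).append(doc_id)
    return reverse
```

The reverse map is why reverseLookupn must allow several document IDs per key: distinct documents can share the same metadata string.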