Computer Science Literature Translation: Research on a Massive Information Management Architecture for Digital Libraries
XING Chun-Xiao+, ZENG Chun, LI Chao, ZHOU Li-Zhu
(Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China)
Abstract
This paper investigates the challenging issues and technologies in managing very large digital contents and collections, and gives an overview of the work and enabling technologies in the related areas. Based on an analysis and comparison of the related work, a novel architecture for massive information management in digital libraries is designed. The key components and core services are described in detail. Finally, a case study, THADL (Tsinghua University architecture digital library), that complies with the architectural framework is presented.
Keywords: digital library; architecture; massive information management; interoperability; metadata
1 Introduction
In the recorded history of humankind, printed materials long played a dominant role in the preservation and dissemination of human information and knowledge. However, with the rapid development of computer, communication, multimedia, and storage technologies, this role is giving way to digital resources in the new era. The explosive growth of information in digital form has posed challenges not only to traditional archives and their information providers, but also to organizations in the government, commercial, and non-profit sectors. According to the latest report by Lyman and Varian, the world's total yearly production of print, film, optical, and magnetic content would require roughly 1.5 billion gigabytes of storage, which is roughly 250 megabytes for every person on the earth. Printed documents of all kinds comprise only 0.003% of the total. Magnetic storage is by far the largest medium for storing information and is the most rapidly growing segment, with shipped hard-drive capacity doubling every year. The types of digital resources are diverse. They include digital texts, documents, scientific data, images, animation, video, audio, etc. The applications of digital resources are quite broad, including DL (digital library), movie/video centers, other public media (television, broadcast, newspaper, etc.), museums, and national or cooperative information centers. At the same time, the information highway, represented by the Internet, has become an important tool for the dissemination of digital resources. Governments, companies, groups, research institutes, non-governmental organizations, and educational institutes all over the world put massive amounts of information on the Web.
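As a quick plausibility check, the per-person figure quoted above can be reproduced from the yearly total. The sketch below assumes a world population of about 6 billion and decimal units (1 GB = 1000 MB); both are assumptions for illustration and are not stated in the text.

```python
# Plausibility check of the per-person storage figure.
# ASSUMPTIONS (not from the text): world population of ~6 billion,
# and decimal units where 1 GB = 1000 MB.
TOTAL_YEARLY_GB = 1.5e9        # "roughly 1.5 billion gigabytes"
WORLD_POPULATION = 6.0e9       # assumed value
MB_PER_GB = 1000               # assumed decimal convention

per_person_mb = TOTAL_YEARLY_GB * MB_PER_GB / WORLD_POPULATION
print(f"about {per_person_mb:.0f} MB per person per year")
```

Under these assumptions the result agrees with the "roughly 250 megabytes" given in the text.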
Technology challenges and key issues
These massive digital resources present many challenging issues in the area of data management technology. The following are some examples.
(1) Data model.
Traditional data model theories apply only to structured data, whereas massive digital resources are of various types and are mostly semi-structured or unstructured. Thus, new data models are needed.
(2) System architecture.
Traditional database management systems are designed for business data processing, which is characterized by concurrent, short, update-oriented transactions. Transaction management and concurrency control therefore remain at the center of the system architecture. This architecture is not suitable for managing digital resources, because the classical transaction concept is becoming less important for these resources. We need to pursue novel and universal frameworks for managing massive digital resources.
(3) Massive information storage.
The volume of digital data resources is measured in terabytes or petabytes. Traditional storage devices using SCSI cannot support efficient storage, online migration, and persistent archiving of such massive digital resources. Research on multi-level storage systems, SAN (Storage Area Networks), and other technologies is therefore essential.
(4) Query processing.
In traditional database systems, queries are expressed in a query language such as SQL, but querying and searching massive digital resources calls for many new mechanisms, such as keyword search, full-text search, similarity query, and content-based multimedia retrieval. How to integrate these query methods (including SQL, OQL, and different XML query languages, e.g., XQL, XML-QL, XML-GL) into an efficient and flexible query processing method has not yet been satisfactorily solved.
Solving the problems mentioned above will remain a major goal for researchers in the next few years. To this end, we present in this paper a novel architecture for massive information management of digital resources. This architecture is intended to meet the requirements of managing digital resources that are distributed, dynamic, massive, and heterogeneous.
2 Overview of the Related Work
The IEEE STD 610.12 [2] defines architecture as the structure of components, their relat