Uploaded by: b****7; document ID: 9242642; upload date: 2023-02-03; format: DOCX; pages: 21; size: 22.10 KB

HBase Study Notes: HBase Performance Study (1)

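When interacting with an HBase cluster through the Java API, you need to construct an HTable object and use its methods to perform inserts, deletes, queries, and other operations.

To create an HTable object, first create a configuration object, Configuration conf, that carries the HBase cluster information. It is typically created as follows:

    Configuration conf = HBaseConfiguration.create();
    // Set the ZooKeeper quorum address and client port of the HBase cluster
    conf.set("hbase.zookeeper.quorum", "XX.XXX.X.XX");
    conf.set("hbase.zookeeper.property.clientPort", "2181");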

Once you have conf, you can create an HTable object through either of the following two approaches:

(1) Create the HTable object directly from conf. The corresponding constructor is:

    public HTable(Configuration conf, final TableName tableName)
        throws IOException {
      this.tableName = tableName;
      this.cleanupPoolOnClose = this.cleanupConnectionOnClose = true;
      if (conf == null) {
        this.connection = null;
        return;
      }
      this.connection = HConnectionManager.getConnection(conf);
      this.configuration = conf;
      this.pool = getDefaultExecutor(conf);
      this.finishSetup();
    }

Note the call to HConnectionManager.getConnection(conf): this constructor obtains an HConnection object through the getConnection function of HConnectionManager.

Java APIs for databases generally create a similar connection object to maintain connection-related state (this will be familiar to anyone who knows ODBC or JDBC).

getConnection is implemented as follows:

    public static HConnection getConnection(final Configuration conf)
        throws IOException {
      HConnectionKey connectionKey = new HConnectionKey(conf);
      synchronized (CONNECTION_INSTANCES) {
        HConnectionImplementation connection = CONNECTION_INSTANCES.get(connectionKey);
        if (connection == null) {
          connection = (HConnectionImplementation) createConnection(conf, true);
          CONNECTION_INSTANCES.put(connectionKey, connection);
        } else if (connection.isClosed()) {
          HConnectionManager.deleteConnection(connectionKey, true);
          connection = (HConnectionImplementation) createConnection(conf, true);
          CONNECTION_INSTANCES.put(connectionKey, connection);
        }
        connection.incCount();
        return connection;
      }
    }

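Here CONNECTION_INSTANCES is a LinkedHashMap.

Three steps in this code matter. First, an HConnectionKey object is created from conf. Second, CONNECTION_INSTANCES is searched for that key. Third, if no entry exists, createConnection is called to create an HConnection object, which is then stored in the map; if an entry exists but has been closed, it is deleted and replaced in the same way; otherwise the HConnection found in the map is returned directly. In every case incCount() increments the connection's reference count before it is returned.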
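The lookup-or-create logic described above can be illustrated with a small self-contained sketch. It deliberately does not use the HBase classes: FakeConnection, INSTANCES, and the String key are hypothetical stand-ins for HConnectionImplementation, CONNECTION_INSTANCES, and HConnectionKey, and the closed-connection case is simplified to a plain replacement.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of a shared-connection cache keyed by configuration content (hypothetical names). */
final class ConnectionCacheSketch {

    /** Stand-in for an HConnection; tracks a reference count and a closed flag. */
    static final class FakeConnection {
        private int refCount;
        private boolean closed;

        void incCount() { refCount++; }
        int refCount() { return refCount; }
        boolean isClosed() { return closed; }
        void close() { closed = true; }
    }

    // Plays the role of CONNECTION_INSTANCES; a String key stands in for HConnectionKey.
    private static final Map<String, FakeConnection> INSTANCES = new LinkedHashMap<>();

    /** Returns the cached connection for the key, creating or replacing it when needed. */
    static FakeConnection getConnection(String key) {
        synchronized (INSTANCES) {
            FakeConnection connection = INSTANCES.get(key);
            if (connection == null || connection.isClosed()) {
                // Missing or closed: create a fresh connection and cache it under the same key.
                connection = new FakeConnection();
                INSTANCES.put(key, connection);
            }
            connection.incCount(); // one more user of the shared connection
            return connection;
        }
    }

    public static void main(String[] args) {
        FakeConnection a = getConnection("quorum=host1;port=2181");
        FakeConnection b = getConnection("quorum=host1;port=2181");
        FakeConnection c = getConnection("quorum=host2;port=2181");
        System.out.println(a == b);       // same key, so the same object is shared
        System.out.println(a == c);       // different key, so a different object
        System.out.println(a.refCount()); // incremented once per lookup of that key
    }
}
```

The design point is that callers never construct the connection themselves; they ask the cache, and the key alone decides whether an existing connection is reused.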

For completeness, here are the constructor of HConnectionKey and its overridden hashCode function:

    HConnectionKey(Configuration conf) {
      Map<String, String> m = new HashMap<String, String>();
      if (conf != null) {
        for (String property : CONNECTION_PROPERTIES) {
          String value = conf.get(property);
          if (value != null) {
            m.put(property, value);
          }
        }
      }
      this.properties = Collections.unmodifiableMap(m);
      try {
        UserProvider provider = UserProvider.instantiate(conf);
        User currentUser = provider.getCurrent();
        if (currentUser != null) {
          username = currentUser.getName();
        }
      } catch (IOException ioe) {
        HConnectionManager.LOG.warn("Error obtaining current user, skipping username in HConnectionKey", ioe);
      }
    }

    public int hashCode() {
      final int prime = 31;
      int result = 1;
      if (username != null) {
        result = username.hashCode();
      }
      for (String property : CONNECTION_PROPERTIES) {
        String value = properties.get(property);
        if (value != null) {
          result = prime * result + value.hashCode();
        }
      }
      return result;
    }

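As the code shows, the overridden hashCode starts from username.hashCode() and then folds in the value of each property listed in CONNECTION_PROPERTIES. The username comes from currentUser, which comes from provider, which in turn is instantiated from conf.

It follows that two conf objects with the same content produce the same username and the same property values, so the overridden hashCode of HConnectionKey returns the same value for both.

CONNECTION_INSTANCES is a LinkedHashMap, and its get function uses the hashCode of HConnectionKey (together with equals) to decide whether an entry for that key already exists.

In essence, therefore, getConnection returns a connection object according to the information in conf, and for all conf objects with the same content it returns one and the same connection.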
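To make the role of hashCode concrete, here is a minimal sketch of a key class built the same way. It is not the HBase class: ConnectionKeySketch, KEY_PROPERTIES, the conf helper, and the equals method are assumptions made for illustration, and a plain Map<String, String> stands in for Configuration. The hashCode body mirrors the one quoted above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Sketch of a key whose identity depends only on selected properties and a user name. */
final class ConnectionKeySketch {

    // Hypothetical subset of the properties that identify a connection.
    static final List<String> KEY_PROPERTIES = Arrays.asList(
            "hbase.zookeeper.quorum",
            "hbase.zookeeper.property.clientPort");

    private final Map<String, String> properties;
    private final String username;

    ConnectionKeySketch(Map<String, String> conf, String username) {
        Map<String, String> m = new HashMap<>();
        for (String property : KEY_PROPERTIES) {
            String value = conf.get(property);
            if (value != null) {
                m.put(property, value); // settings outside KEY_PROPERTIES are ignored
            }
        }
        this.properties = Collections.unmodifiableMap(m);
        this.username = username;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        if (username != null) {
            result = username.hashCode();
        }
        for (String property : KEY_PROPERTIES) {
            String value = properties.get(property);
            if (value != null) {
                result = prime * result + value.hashCode();
            }
        }
        return result;
    }

    // Assumed for this sketch: equality compares the same fields that feed hashCode.
    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof ConnectionKeySketch)) return false;
        ConnectionKeySketch other = (ConnectionKeySketch) obj;
        return Objects.equals(username, other.username)
                && properties.equals(other.properties);
    }

    /** Builds a plain map that stands in for a Configuration object. */
    static Map<String, String> conf(String quorum, String extraValue) {
        Map<String, String> c = new HashMap<>();
        c.put("hbase.zookeeper.quorum", quorum);
        c.put("hbase.zookeeper.property.clientPort", "2181");
        c.put("some.unrelated.setting", extraValue);
        return c;
    }

    public static void main(String[] args) {
        Map<ConnectionKeySketch, String> cache = new LinkedHashMap<>();
        cache.put(new ConnectionKeySketch(conf("host1", "a"), "alice"), "connection-1");

        // Same key properties and user: the lookup finds the cached entry.
        System.out.println(cache.get(new ConnectionKeySketch(conf("host1", "b"), "alice")));
        // Different quorum: no entry is found, so a new connection would be created.
        System.out.println(cache.get(new ConnectionKeySketch(conf("host2", "a"), "alice")));
    }
}
```

Two configurations that differ only in settings outside the key properties map to the same cache entry, which is what "same content" means for the purposes of the lookup.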

(2) Call createConnection to create a connection explicitly, then use that connection to create the HTable object.

The createConnection method and the corresponding HTable constructor are as follows:

    public static HConnection createConnection(Configuration conf) throws IOException {
      UserProvider provider = UserProvider.instantiate(conf);
      return createConnection(conf, false, null, provider.getCurrent());
    }

    static HConnection createConnection(final Configuration conf, final boolean managed,
        final ExecutorService pool, final User user)
        throws IOException {
      String className = conf.get("hbase.client.connection.impl",
          HConnectionManager.HConnectionImplementation.class.getName());
      Class<?> clazz = null;
      try {
        clazz = Class.forName(className);
      } catch (ClassNotFoundException e) {
        throw new IOException(e);
      }
      try {
        // Default HCM#HCI is not accessible; make it so before invoking.
        Constructor<?> constructor =
            clazz.getDeclaredConstructor(Configuration.class,
                boolean.class, ExecutorService.class, User.class);
        constructor.setAccessible(true);
        return (HConnection) constructor.newInstance(conf, managed, pool, user);
      } catch (Exception e) {
        throw new IOException(e);
      }
    }

    public HTable(TableName tableName, HConnection connection) throws IOException {
      this.tableName = tableName;
      this.cleanupPoolOnClose = true;
      this.cleanupConnectionOnClose = false;
      this.connection = connection;
      this.configuration = connection.getConfiguration();
      this.pool = getDefaultExecutor(this.configuration);
      this.finishSetup();
    }

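As the code shows, every call to createConnection builds a new HConnection object instead of returning a cached one. If a connection is created this way for each HTable, the tables do not share a single HConnection as they do in approach (1). The constructor also sets cleanupConnectionOnClose to false, so the HTable does not take responsibility for closing a connection that was passed in.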
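The createConnection code quoted earlier picks the implementation class from the hbase.client.connection.impl setting and instantiates it reflectively through a non-public constructor. The following sketch shows that pattern in isolation. Connection, DefaultConnection, and create are hypothetical names, and checked exceptions are wrapped in IllegalStateException rather than IOException to keep the example short.

```java
import java.lang.reflect.Constructor;

/** Sketch of instantiating a configurable implementation class through a non-public constructor. */
final class ReflectiveFactorySketch {

    /** Hypothetical connection interface. */
    interface Connection {
        String quorum();
        boolean isManaged();
    }

    /** Hypothetical default implementation; its constructor is deliberately private. */
    static final class DefaultConnection implements Connection {
        private final String quorum;
        private final boolean managed;

        private DefaultConnection(String quorum, boolean managed) {
            this.quorum = quorum;
            this.managed = managed;
        }

        @Override
        public String quorum() { return quorum; }

        @Override
        public boolean isManaged() { return managed; }
    }

    /** Loads className, makes its (String, boolean) constructor accessible, and invokes it. */
    static Connection create(String className, String quorum, boolean managed) {
        try {
            Class<?> clazz = Class.forName(className);
            Constructor<?> constructor =
                    clazz.getDeclaredConstructor(String.class, boolean.class);
            constructor.setAccessible(true); // the constructor is not public
            return (Connection) constructor.newInstance(quorum, managed);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        String impl = DefaultConnection.class.getName();
        Connection first = create(impl, "host1", false);
        Connection second = create(impl, "host1", false);
        System.out.println(first != second);   // every call builds a new object
        System.out.println(first.isManaged()); // the flag passed in by the caller
    }
}
```

Unlike a cache lookup, nothing here remembers earlier results, so two calls with identical arguments still yield two distinct objects.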

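So how do these two approaches perform when executing inserts, deletes, and lookups?

Let us start by analyzing the code.

For simplicity, first look at what HTable actually does when it executes a put (insert).

The put function of HTable is as follows: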

    public void put(final Put put) throws InterruptedIOException, RetriesExhaustedWithDetailsException {
      doPut(put);
      if (autoFlush) {
        flushCommits();
      }
    }

    private void doPut(Put put) throws InterruptedIOException, RetriesExhaustedWithDetailsException {
      if (ap.hasError()) {
        writeAsyncBuffer.add(put);
        backgroundFlushCommits(true);
      }
      validatePut(put);
      currentWriteBufferSize += put.heapSize();
      writeAsyncBuffer.add(put);
      while (currentWriteBufferSize > writeBufferSize) {
        backgroundFlushCommits(false);
      }
    }

    private void backgroundFlushCommits(boolean synchronous) throws InterruptedIOException, RetriesExhaustedWithDetailsException {
      try {
        do {
          ap.submit(writeAsyncBuffer, true);
        } while (synchronous && !writeAsyncBuffer.isEmpty());
        if (synchronous) {
          ap.waitUntilDone();
        }
        if (ap.hasError()) {
          LOG.debug(tableName + ": One or more of the operations have failed -" +
              " waiting for all operation in progress to finish (successfully or not)");
          while (!writeAsyncBuffer.isEmpty()) {
            ap.submit(writeAsyncBuffer, true);
          }
          ap.waitUntilDone();
          if (!clearBufferOnFail) {
            // if clearBufferOnFailed is not set, we're supposed to keep the failed operation in the
            // write buffer. This is a questionable feature kept here for backward compatibility
            writeAsyncBuffer.addAll(ap.getFailedOperations());
          }
          RetriesExhaustedWithDetailsException e = ap.getErrors();
          ap.clearErrors();
          throw e;
        }
      } finally {
        currentWriteBufferSize = 0;
        for (Row mut : writeAsyncBuffer) {
          if (mut instanceof Mutation) {
            currentWriteBufferSize += ((Mutation) mut).heapSize();
          }
        }
      }
    }

As the code shows, the call chain is put -> doPut -> backgroundFlushCommits -> ap.submit, where ap is an object of class AsyncProcess.

Following the call into the AsyncProcess class, the relevant code is:

    public void submit(List<? extends Row> rows, boolean atLeastOne) throws InterruptedIOException {
      submitLowPriority(rows, atLeastOne, false);
    }

    public void submitLowPriority(List<? extends Row> rows, boolean atLeastOne, boolean isLowPripority) throws InterruptedIOException {
      if (rows.isEmpty()) {
        return;
      }
      // This looks like we are keying by region but HRegionLocation has a comparator that compares
      // on the server portion only (hostname + port) so this Map collects regions by server.
      Map<HRegionLocation, MultiAction<Row>> actionsByServer = new HashMap<HRegionLocation, MultiAction<Row>>();
      List<Action<Row>> retainedActions = new ArrayList<Action<Row>>(rows.size());
      long currentTaskCnt = tasksDone.get();
      boolean alreadyLooped = false;
      NonceGenerator ng = this.hConnection.getNonceGenerator();
      do {
        if (alreadyLooped) {
          // if, for whatever reason, we looped, we want to be sure that something has changed.
          waitForNextTaskDone(currentTaskCnt);
          currentTaskCnt = tasksDone.get();
        } else {
          alreadyLooped = true;
        }
        // Wait until there is at least one slot for a new task.
        waitForMaximumCurrentTasks(maxTotalConcurrentTasks - 1);
        // Remember the previous decisions about regions or region servers we put in the
        // final multi.
        Map regionIncluded = new HashMap();
        Map serverIncluded = new HashMap();
        int posInList = -1;
        Iterator<? extends Row> it = rows.iterator();
        while (it.hasNext()) {
          Row r = it.next();
          HRegionLocation loc = findDestLocation(r, posInList);
          if (loc == null) { // loc is null if there is an error such as meta not available.
            it.remove();
          } else if (canTakeOperation(loc, regionIncluded, serverIncluded)) {
            Action<Row> action = new Action<Row>(r, ++posInList);
            setNonce(ng, r, action);
            retainedActions.add(action);
            addAction(loc, action, actionsByServer, ng);
            it.remove();
          }
        }
      } while (retainedActions.isEmpty() && atLeastOne && !hasError());
      HConnectionManager.ServerErrorTracker errorsByServer = createServerErrorTracker();
      sendMultiAction(retainedActions, actionsByServer, 1, errorsByServer, isLowPripority);
    }

    private HRegionLocation findDestLocation(Row row, int posInList) {
      if (row == null) throw new IllegalArgumentException("#" + id + ", row cannot be null");
      HRegionLocation loc = null;
      IOException locationException = null;
      try {
        loc = hConnection.locateRegion(this.tableName, row.getRow());
        if (loc == null) {
          locationException = new IOException("#" + id + ", no location found, aborting submit for" +
              " tableName=" + tableName +
              " rowkey=" + Arrays.toString(row.getRow()));
        }
      } catch (IOException e) {
        locationException = e;
      }
      if (locationException != null) {
        // There are multiple retries in locateRegion already. No need to add new.
        // We can't continue with this row, hence it's the last retry.
        manageError(posInList, row, false, locationException, null);
        return null;
      }
      return loc;
    }

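The core mechanism in this code is asynchronous submission. Unless autoFlush is set (in which case put calls flushCommits after every operation), a put does not write to HBase immediately; doPut adds it to a write buffer, and a write is issued only once the buffered data exceeds writeBufferSize (2 MB by default).

On the server side this write presumably still has to queue for execution; the details of that mechanism are not covered here.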
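The buffering behaviour described above can be sketched without HBase as follows. This is a simplified model under stated assumptions: WriteBufferSketch and its members are hypothetical, payload length stands in for Put.heapSize(), and flush merely counts batches where a real client would submit them to the region servers asynchronously.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a client-side write buffer that flushes once a size threshold is exceeded. */
final class WriteBufferSketch {

    private final long writeBufferSize;  // flush threshold in bytes
    private final boolean autoFlush;     // when true, every put is flushed immediately
    private final List<byte[]> buffer = new ArrayList<>();
    private long currentWriteBufferSize;
    private int flushCount;

    WriteBufferSketch(long writeBufferSize, boolean autoFlush) {
        this.writeBufferSize = writeBufferSize;
        this.autoFlush = autoFlush;
    }

    /** Buffers one write and flushes when autoFlush is on or the threshold is exceeded. */
    void put(byte[] payload) {
        buffer.add(payload);
        currentWriteBufferSize += payload.length;
        if (autoFlush || currentWriteBufferSize > writeBufferSize) {
            flush();
        }
    }

    /** Sends the buffered batch (here: only counts it) and resets the buffer. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // A real client would send the buffered batch to the servers here.
        flushCount++;
        buffer.clear();
        currentWriteBufferSize = 0;
    }

    int flushCount() { return flushCount; }

    int pending() { return buffer.size(); }

    public static void main(String[] args) {
        WriteBufferSketch buffered = new WriteBufferSketch(2L * 1024 * 1024, false);
        byte[] row = new byte[1024]; // 1 KB per put
        for (int i = 0; i < 5000; i++) {
            buffered.put(row);
        }
        // Far fewer flushes than puts; the remainder stays buffered until flush() is called.
        System.out.println("flushes=" + buffered.flushCount() + ", pending=" + buffered.pending());
        buffered.flush();
    }
}
```

With buffering enabled, the number of round trips depends on the data volume rather than the number of put calls, at the cost of leaving unflushed writes in client memory until the next flush.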
