HBase Study Notes: HBase Performance Research (1)
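When interacting with an HBase cluster through the Java API, you build an HTable object and use its methods to perform operations such as insert, delete, and query.
To create an HTable object, first create a Configuration object, conf, that carries the HBase cluster information. It is typically created like this: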
Configuration conf = HBaseConfiguration.create();
// Set the ZooKeeper quorum host and client port used to reach the HBase cluster
conf.set("hbase.zookeeper.quorum", "XX.XXX.X.XX");
conf.set("hbase.zookeeper.property.clientPort", "2181");
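With conf in hand, an HTable object can be created through either of the following two HTable constructors:
(1) Create the HTable object directly from conf. The corresponding constructor is: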
public HTable(Configuration conf, final TableName tableName)
        throws IOException {
    this.tableName = tableName;
    this.cleanupPoolOnClose = this.cleanupConnectionOnClose = true;
    if (conf == null) {
        this.connection = null;
        return;
    }
    // Key line: obtain the connection from HConnectionManager
    this.connection = HConnectionManager.getConnection(conf);
    this.configuration = conf;
    this.pool = getDefaultExecutor(conf);
    this.finishSetup();
}
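Note the statement this.connection = HConnectionManager.getConnection(conf) (highlighted in red in the original).
This constructor calls the getConnection method of HConnectionManager to obtain an HConnection object.
Java database APIs generally create a connection object of this kind to hold connection-related state (readers familiar with ODBC or JDBC will find this straightforward).
The implementation of getConnection is as follows: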
public static HConnection getConnection(final Configuration conf)
        throws IOException {
    // Build the cache key from conf
    HConnectionKey connectionKey = new HConnectionKey(conf);
    synchronized (CONNECTION_INSTANCES) {
        // Look for a cached connection under that key
        HConnectionImplementation connection = CONNECTION_INSTANCES.get(connectionKey);
        if (connection == null) {
            // None cached: create one and cache it
            connection = (HConnectionImplementation) createConnection(conf, true);
            CONNECTION_INSTANCES.put(connectionKey, connection);
        } else if (connection.isClosed()) {
            // Cached but closed: replace it with a fresh connection
            HConnectionManager.deleteConnection(connectionKey, true);
            connection = (HConnectionImplementation) createConnection(conf, true);
            CONNECTION_INSTANCES.put(connectionKey, connection);
        }
        connection.incCount();
        return connection;
    }
}
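Here CONNECTION_INSTANCES is a LinkedHashMap.
Three statements (highlighted in red in the original) carry the logic. The first, new HConnectionKey(conf), builds an HConnectionKey object from the conf. The second, CONNECTION_INSTANCES.get(connectionKey), looks that key up in CONNECTION_INSTANCES. The third, createConnection(conf, true), runs only when no connection is cached under the key (or the cached one has been closed) and creates a new HConnection object; otherwise the HConnection found in the map is returned directly.
It is also worth looking at the HConnectionKey constructor and its overridden hashCode method, shown in turn below: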
HConnectionKey(Configuration conf) {
    Map<String, String> m = new HashMap<String, String>();
    if (conf != null) {
        for (String property : CONNECTION_PROPERTIES) {
            String value = conf.get(property);
            if (value != null) {
                m.put(property, value);
            }
        }
    }
    this.properties = Collections.unmodifiableMap(m);
    try {
        UserProvider provider = UserProvider.instantiate(conf);
        User currentUser = provider.getCurrent();
        if (currentUser != null) {
            username = currentUser.getName();
        }
    } catch (IOException ioe) {
        HConnectionManager.LOG.warn("Error obtaining current user, skipping username in HConnectionKey", ioe);
    }
}
public int hashCode() {
    final int prime = 31;
    int result = 1;
    if (username != null) {
        result = username.hashCode();
    }
    for (String property : CONNECTION_PROPERTIES) {
        String value = properties.get(property);
        if (value != null) {
            result = prime * result + value.hashCode();
        }
    }
    return result;
}
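As the code shows, the overridden hashCode starts from username.hashCode() and then folds in the hash code of every CONNECTION_PROPERTIES value taken from the conf. The username comes from currentUser, currentUser comes from provider, and provider is instantiated from the conf.
So two conf objects with the same content produce the same username and the same property values, which guarantees that their HConnectionKey objects return the same hashCode.
CONNECTION_INSTANCES is a LinkedHashMap, and its get method uses the key's hashCode (together with equals) to decide whether an entry already exists.
In essence, then, getConnection returns a connection object based on the content of the conf: for all conf objects with the same user and the same CONNECTION_PROPERTIES values, only one connection is ever returned.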
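The caching behavior described above can be modeled with the standard library alone. The following is a minimal sketch, not HBase code: ConnectionCacheSketch, Key, Connection, and the conf helper are hypothetical names, and a plain map of strings stands in for Configuration. It only illustrates how a key with content-based hashCode and equals makes two equal configurations resolve to one cached object.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Simplified model of a connection cache keyed by configuration content.
// All names are hypothetical; this is not the HBase implementation.
class ConnectionCacheSketch {

    /** Cache key: two keys are equal when user name and properties are equal. */
    static final class Key {
        private final String username;
        private final Map<String, String> properties;

        Key(String username, Map<String, String> conf) {
            this.username = username;
            this.properties = new TreeMap<>(conf); // sorted defensive copy
        }

        @Override
        public int hashCode() {
            final int prime = 31;
            int result = (username != null) ? username.hashCode() : 1;
            for (String value : properties.values()) {
                result = prime * result + value.hashCode();
            }
            return result;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            return Objects.equals(username, other.username)
                    && properties.equals(other.properties);
        }
    }

    /** Stand-in for a connection; it only tracks a reference count. */
    static final class Connection {
        int refCount;
        boolean closed;
    }

    private static final Map<Key, Connection> INSTANCES = new LinkedHashMap<>();

    /** Returns the cached connection for this content, creating it if needed. */
    static Connection getConnection(String username, Map<String, String> conf) {
        Key key = new Key(username, conf);
        synchronized (INSTANCES) {
            Connection c = INSTANCES.get(key);
            if (c == null || c.closed) {
                c = new Connection();
                INSTANCES.put(key, c);
            }
            c.refCount++;
            return c;
        }
    }

    /** Hypothetical helper that builds a small configuration map. */
    static Map<String, String> conf(String quorum, String port) {
        Map<String, String> m = new HashMap<>();
        m.put("hbase.zookeeper.quorum", quorum);
        m.put("hbase.zookeeper.property.clientPort", port);
        return m;
    }

    public static void main(String[] args) {
        Connection c1 = getConnection("alice", conf("host-a", "2181"));
        Connection c2 = getConnection("alice", conf("host-a", "2181"));
        Connection c3 = getConnection("alice", conf("host-b", "2181"));

        System.out.println(c1 == c2);    // true: equal content, same connection
        System.out.println(c1 == c3);    // false: different quorum
        System.out.println(c1.refCount); // 2: handed out twice
    }
}
```

Note that the map lookup depends on equals as well as hashCode; equal hash codes alone would not be enough to treat two keys as the same entry.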
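(2) Call the createConnection method to create a connection explicitly, then use that connection to create the HTable object.
The createConnection method and the corresponding HTable constructor are as follows: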
public static HConnection createConnection(Configuration conf) throws IOException {
    UserProvider provider = UserProvider.instantiate(conf);
    return createConnection(conf, false, null, provider.getCurrent());
}

static HConnection createConnection(final Configuration conf, final boolean managed,
        final ExecutorService pool, final User user) throws IOException {
    String className = conf.get("hbase.client.connection.impl",
            HConnectionManager.HConnectionImplementation.class.getName());
    Class<?> clazz = null;
    try {
        clazz = Class.forName(className);
    } catch (ClassNotFoundException e) {
        throw new IOException(e);
    }
    try {
        // Default HCM#HCI is not accessible; make it so before invoking.
        Constructor<?> constructor =
                clazz.getDeclaredConstructor(Configuration.class,
                        boolean.class, ExecutorService.class, User.class);
        constructor.setAccessible(true);
        return (HConnection) constructor.newInstance(conf, managed, pool, user);
    } catch (Exception e) {
        throw new IOException(e);
    }
}
public HTable(TableName tableName, HConnection connection) throws IOException {
    this.tableName = tableName;
    this.cleanupPoolOnClose = true;
    this.cleanupConnectionOnClose = false;
    this.connection = connection;
    this.configuration = connection.getConfiguration();
    this.pool = getDefaultExecutor(this.configuration);
    this.finishSetup();
}
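As the code shows, every call to createConnection builds a new HConnection object (it passes managed = false and never consults CONNECTION_INSTANCES). If a connection is created this way for each HTable object, every HTable gets its own HConnection instead of sharing one as in method (1).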
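To make the ownership difference concrete, here is a minimal stdlib-only sketch. It is not HBase code: ExplicitConnectionSketch, Connection, and Table are hypothetical stand-ins, and it assumes that cleanupConnectionOnClose means what its name suggests (a table closes its connection on close only when the flag is true).

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of explicitly created connections.
// All names are hypothetical; this is not the HBase implementation.
class ExplicitConnectionSketch {

    static final AtomicInteger CREATED = new AtomicInteger();

    static final class Connection implements AutoCloseable {
        final int id = CREATED.incrementAndGet();
        boolean closed;

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Every call builds a new connection; nothing is cached. */
    static Connection createConnection() {
        return new Connection();
    }

    static final class Table implements AutoCloseable {
        final Connection connection;
        final boolean cleanupConnectionOnClose;

        Table(Connection connection) {
            this.connection = connection;
            // Assumption: the caller owns the connection it passed in.
            this.cleanupConnectionOnClose = false;
        }

        @Override
        public void close() {
            if (cleanupConnectionOnClose) {
                connection.close();
            }
        }
    }

    public static void main(String[] args) {
        // One connection per table: the tables share nothing.
        Table t1 = new Table(createConnection());
        Table t2 = new Table(createConnection());
        System.out.println(t1.connection == t2.connection); // false

        // One connection created once and passed to both tables.
        Connection shared = createConnection();
        Table t3 = new Table(shared);
        Table t4 = new Table(shared);
        System.out.println(t3.connection == t4.connection); // true

        t3.close();
        System.out.println(shared.closed); // false: the table did not close it
        shared.close();                    // the caller closes it explicitly
    }
}
```

In this model the caller decides how many connections exist, which is exactly the knob that distinguishes method (2) from the cached lookup of method (1).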
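So how do these two approaches perform when executing inserts, deletes, and lookups?
Let us first analyze this from the code.
For simplicity, start with what HTable actually does when it executes a put (insert) operation.
The put method of HTable is as follows: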
public void put(final Put put) throws InterruptedIOException, RetriesExhaustedWithDetailsException {
    doPut(put);
    if (autoFlush) {
        flushCommits();
    }
}

private void doPut(Put put) throws InterruptedIOException, RetriesExhaustedWithDetailsException {
    if (ap.hasError()) {
        writeAsyncBuffer.add(put);
        backgroundFlushCommits(true);
    }
    validatePut(put);
    currentWriteBufferSize += put.heapSize();
    writeAsyncBuffer.add(put);
    while (currentWriteBufferSize > writeBufferSize) {
        backgroundFlushCommits(false);
    }
}

private void backgroundFlushCommits(boolean synchronous) throws InterruptedIOException, RetriesExhaustedWithDetailsException {
    try {
        do {
            ap.submit(writeAsyncBuffer, true);
        } while (synchronous && !writeAsyncBuffer.isEmpty());
        if (synchronous) {
            ap.waitUntilDone();
        }
        if (ap.hasError()) {
            LOG.debug(tableName + ": One or more of the operations have failed -" +
                    " waiting for all operation in progress to finish (successfully or not)");
            while (!writeAsyncBuffer.isEmpty()) {
                ap.submit(writeAsyncBuffer, true);
            }
            ap.waitUntilDone();
            if (!clearBufferOnFail) {
                // if clearBufferOnFailed is not set, we're supposed to keep the failed operation in the
                // write buffer. This is a questionable feature kept here for backward compatibility
                writeAsyncBuffer.addAll(ap.getFailedOperations());
            }
            RetriesExhaustedWithDetailsException e = ap.getErrors();
            ap.clearErrors();
            throw e;
        }
    } finally {
        currentWriteBufferSize = 0;
        for (Row mut : writeAsyncBuffer) {
            if (mut instanceof Mutation) {
                currentWriteBufferSize += ((Mutation) mut).heapSize();
            }
        }
    }
}
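As the highlighted lines (red in the original) indicate, the call order is put -> doPut -> backgroundFlushCommits -> ap.submit, where ap is an instance of the AsyncProcess class.
Tracing into the AsyncProcess class, the relevant code is as follows: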
public void submit(List<? extends Row> rows, boolean atLeastOne) throws InterruptedIOException {
    submitLowPriority(rows, atLeastOne, false);
}

public void submitLowPriority(List<? extends Row> rows, boolean atLeastOne, boolean isLowPripority) throws InterruptedIOException {
    if (rows.isEmpty()) {
        return;
    }
    // This looks like we are keying by region but HRegionLocation has a comparator that compares
    // on the server portion only (hostname + port) so this Map collects regions by server.
    Map<HRegionLocation, MultiAction<Row>> actionsByServer = new HashMap<HRegionLocation, MultiAction<Row>>();
    List<Action<Row>> retainedActions = new ArrayList<Action<Row>>(rows.size());
    long currentTaskCnt = tasksDone.get();
    boolean alreadyLooped = false;
    NonceGenerator ng = this.hConnection.getNonceGenerator();
    do {
        if (alreadyLooped) {
            // if, for whatever reason, we looped, we want to be sure that something has changed.
            waitForNextTaskDone(currentTaskCnt);
            currentTaskCnt = tasksDone.get();
        } else {
            alreadyLooped = true;
        }
        // Wait until there is at least one slot for a new task.
        waitForMaximumCurrentTasks(maxTotalConcurrentTasks - 1);
        // Remember the previous decisions about regions or region servers we put in the
        // final multi.
        Map regionIncluded = new HashMap();
        Map serverIncluded = new HashMap();
        int posInList = -1;
        Iterator<? extends Row> it = rows.iterator();
        while (it.hasNext()) {
            Row r = it.next();
            HRegionLocation loc = findDestLocation(r, posInList);
            if (loc == null) { // loc is null if there is an error such as meta not available.
                it.remove();
            } else if (canTakeOperation(loc, regionIncluded, serverIncluded)) {
                Action<Row> action = new Action<Row>(r, ++posInList);
                setNonce(ng, r, action);
                retainedActions.add(action);
                addAction(loc, action, actionsByServer, ng);
                it.remove();
            }
        }
    } while (retainedActions.isEmpty() && atLeastOne && !hasError());
    HConnectionManager.ServerErrorTracker errorsByServer = createServerErrorTracker();
    sendMultiAction(retainedActions, actionsByServer, 1, errorsByServer, isLowPripority);
}

private HRegionLocation findDestLocation(Row row, int posInList) {
    if (row == null) throw new IllegalArgumentException("#" + id + ", row cannot be null");
    HRegionLocation loc = null;
    IOException locationException = null;
    try {
        loc = hConnection.locateRegion(this.tableName, row.getRow());
        if (loc == null) {
            locationException = new IOException("#" + id + ", no location found, aborting submit for" +
                    " tableName=" + tableName +
                    " rowkey=" + Arrays.toString(row.getRow()));
        }
    } catch (IOException e) {
        locationException = e;
    }
    if (locationException != null) {
        // There are multiple retries in locateRegion already. No need to add new.
        // We can't continue with this row, hence it's the last retry.
        manageError(posInList, row, false, locationException, null);
        return null;
    }
    return loc;
}
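The loop in submitLowPriority locates the destination of each row and collects the rows per server before sending them. A rough stdlib-only sketch of that grouping step follows. GroupByServerSketch, the locator function, and the server names are all hypothetical, and the real code additionally handles retries, nonces, and per-server concurrency limits that are omitted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Simplified model of grouping buffered rows by destination server.
// All names are hypothetical; this is not the HBase implementation.
class GroupByServerSketch {

    /** Groups row keys by the server a (hypothetical) locator assigns them to. */
    static Map<String, List<String>> groupByServer(List<String> rows,
                                                   Function<String, String> locator) {
        Map<String, List<String>> actionsByServer = new LinkedHashMap<>();
        for (String row : rows) {
            String server = locator.apply(row);
            if (server == null) {
                // No location found: the row is skipped here, whereas the
                // code above records an error for it via manageError.
                continue;
            }
            actionsByServer.computeIfAbsent(server, s -> new ArrayList<>()).add(row);
        }
        return actionsByServer;
    }

    public static void main(String[] args) {
        // Hypothetical placement rule: keys starting with a..m go to rs1,
        // the rest go to rs2, and an empty key has no location.
        Function<String, String> locator = row ->
                row.isEmpty() ? null : (row.charAt(0) <= 'm' ? "rs1:60020" : "rs2:60020");

        List<String> rows = Arrays.asList("apple", "zebra", "mango", "pear", "");
        System.out.println(groupByServer(rows, locator));
        // {rs1:60020=[apple, mango], rs2:60020=[zebra, pear]}
    }
}
```

Grouping this way means one request per server can carry many rows, rather than one request per row.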
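The core mechanism here is asynchronous, buffered submission. When autoFlush is disabled, a put does not write to HBase immediately: the data accumulates in a client-side write buffer, and a write is issued only once the buffered data exceeds a threshold (2 MB by default). As the put method above shows, when autoFlush is enabled flushCommits is called after every put.
Of course, that write presumably still has to queue for execution on the server side; the server-side execution mechanism is not covered here.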
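The effect of the threshold can be seen in a small stdlib-only model. This is a sketch under simplifying assumptions, not HBase code: WriteBufferSketch is a hypothetical class, a flush is just a counter increment standing in for a round trip to the cluster, and record size is the byte array length rather than the heapSize used by HTable.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a client-side write buffer with a size threshold.
// All names are hypothetical; this is not the HBase implementation.
class WriteBufferSketch {
    private final List<byte[]> buffer = new ArrayList<>();
    private final long writeBufferSize;
    private final boolean autoFlush;
    private long currentSize;

    int flushCount;     // number of simulated round trips
    int flushedRecords; // total records sent so far

    WriteBufferSketch(long writeBufferSize, boolean autoFlush) {
        this.writeBufferSize = writeBufferSize;
        this.autoFlush = autoFlush;
    }

    /** Buffers a record; flushes on every put or only past the threshold. */
    void put(byte[] record) {
        buffer.add(record);
        currentSize += record.length;
        if (autoFlush || currentSize > writeBufferSize) {
            flush();
        }
    }

    /** Sends everything buffered so far as one simulated batch. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flushedRecords += buffer.size();
        flushCount++;
        buffer.clear();
        currentSize = 0;
    }

    public static void main(String[] args) {
        long twoMb = 2L * 1024 * 1024;
        WriteBufferSketch buffered = new WriteBufferSketch(twoMb, false);
        WriteBufferSketch immediate = new WriteBufferSketch(twoMb, true);

        byte[] record = new byte[1024];
        for (int i = 0; i < 5000; i++) {
            buffered.put(record);
            immediate.put(record);
        }
        buffered.flush(); // send whatever is still buffered

        System.out.println(buffered.flushCount);  // 3
        System.out.println(immediate.flushCount); // 5000
    }
}
```

With 1 KB records and a 2 MB threshold, the buffered writer in this model flushes after every 2049 records, so 5000 puts cost 3 batches instead of 5000.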