HBase Study Notes: HBase Performance Study (1)
    this.configuration = conf;
    this.pool = getDefaultExecutor(conf);
    this.finishSetup();
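Note the part of the code highlighted in red.
This constructor actually calls the getConnection function of HConnectionManager to obtain an HConnection object.
When working with a database through a Java API, you usually create a similar connection object that holds the connection-related state (readers familiar with ODBC or JDBC will find nothing new here).
The implementation of getConnection is as follows: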
    public static HConnection getConnection(final Configuration conf)
            throws IOException {
        HConnectionKey connectionKey = new HConnectionKey(conf);
        synchronized (CONNECTION_INSTANCES) {
            HConnectionImplementation connection = CONNECTION_INSTANCES.get(connectionKey);
            if (connection == null) {
                connection = (HConnectionImplementation) createConnection(conf, true);
                CONNECTION_INSTANCES.put(connectionKey, connection);
            } else if (connection.isClosed()) {
                HConnectionManager.deleteConnection(connectionKey, true);
                connection = (HConnectionImplementation) createConnection(conf, true);
                CONNECTION_INSTANCES.put(connectionKey, connection);
            }
            connection.incCount();
            return connection;
        }
    }
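Here, CONNECTION_INSTANCES is of type LinkedHashMap<HConnectionKey, HConnectionImplementation>.
Again, note the three lines highlighted in red.
The first line creates an HConnectionKey object from the information in conf.
The second line looks up CONNECTION_INSTANCES to see whether an entry for that HConnectionKey already exists.
The third line calls createConnection to create an HConnection object if no entry exists; otherwise the HConnection object just found in the map is returned.
For completeness, let's also look at the constructor of HConnectionKey and its overridden hashCode function: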
    HConnectionKey(Configuration conf) {
        Map<String, String> m = new HashMap<String, String>();
        if (conf != null) {
            // Keep only the properties that identify a connection.
            for (String property : CONNECTION_PROPERTIES) {
                String value = conf.get(property);
                if (value != null) {
                    m.put(property, value);
                }
            }
        }
        this.properties = Collections.unmodifiableMap(m);
        try {
            UserProvider provider = UserProvider.instantiate(conf);
            User currentUser = provider.getCurrent();
            if (currentUser != null) {
                username = currentUser.getName();
            }
        } catch (IOException ioe) {
            HConnectionManager.LOG.warn(
                "Error obtaining current user, skipping username in HConnectionKey", ioe);
        }
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        if (username != null) {
            result = username.hashCode();
        }
        // Fold the hash of every connection property value into the result.
        for (String property : CONNECTION_PROPERTIES) {
            String value = properties.get(property);
            if (value != null) {
                result = prime * result + value.hashCode();
            }
        }
        return result;
    }
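As the code shows, the overridden hashCode starts from username.hashCode() and then folds in the hash code of every connection property value read from conf. The username comes from currentUser, currentUser comes from provider, and provider is instantiated from conf.
So two conf objects with the same content (used by the same user) produce HConnectionKey objects whose hashCode values are equal.
CONNECTION_INSTANCES is a LinkedHashMap, and its get method uses the key's hashCode (together with equals) to decide whether an entry for that key already exists.
In essence, getConnection returns a connection object determined by conf: for all conf objects with the same content, only one connection is returned.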
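The caching behaviour described above can be illustrated with a minimal, self-contained sketch. It does not use the HBase classes: ConnKey, FakeConnection and ConnectionCache are hypothetical stand-ins for HConnectionKey, HConnectionImplementation and the CONNECTION_INSTANCES map, and the configuration is modelled as a plain Map. The sketch also assumes the key defines equals alongside hashCode, since a hash map needs both to recognise an equal key.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/** Hypothetical stand-in for HConnectionKey: a value object built from config entries. */
final class ConnKey {
    private final Map<String, String> properties;
    private final String username;

    ConnKey(Map<String, String> conf, String username) {
        // Copy into a sorted map so the key does not depend on insertion order.
        this.properties = Collections.unmodifiableMap(new TreeMap<>(conf));
        this.username = username;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        // Start from the username hash, then fold in every property value.
        int result = (username != null) ? username.hashCode() : 1;
        for (String value : properties.values()) {
            result = prime * result + value.hashCode();
        }
        return result;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ConnKey)) return false;
        ConnKey other = (ConnKey) o;
        return Objects.equals(username, other.username)
                && properties.equals(other.properties);
    }
}

/** Hypothetical stand-in for a connection; it only carries an id. */
final class FakeConnection {
    final int id;
    FakeConnection(int id) { this.id = id; }
}

/** Caches one connection per distinct key. */
final class ConnectionCache {
    private final Map<ConnKey, FakeConnection> instances = new LinkedHashMap<>();
    private int created = 0;

    synchronized FakeConnection getConnection(Map<String, String> conf, String user) {
        ConnKey key = new ConnKey(conf, user);
        FakeConnection c = instances.get(key);
        if (c == null) {
            c = new FakeConnection(++created);
            instances.put(key, c);
        }
        return c;
    }

    synchronized int createdCount() { return created; }

    public static void main(String[] args) {
        Map<String, String> a = new TreeMap<>();
        a.put("hbase.zookeeper.quorum", "zk1");
        Map<String, String> b = new TreeMap<>(a); // separate object, same content
        ConnectionCache cache = new ConnectionCache();
        // Both lookups resolve to the same cached instance.
        System.out.println(cache.getConnection(a, "alice") == cache.getConnection(b, "alice"));
        System.out.println(cache.createdCount());
    }
}
```

Two configuration objects with equal content map to one cached connection, while a different property value or a different user name yields a separate one.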
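(2) Call the createConnection method to create a connection explicitly, then use that connection to create the HTable object.
The createConnection method and the corresponding HTable constructor are as follows: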
    public static HConnection createConnection(Configuration conf) throws IOException {
        UserProvider provider = UserProvider.instantiate(conf);
        return createConnection(conf, false, null, provider.getCurrent());
    }

    static HConnection createConnection(final Configuration conf, final boolean managed,
            final ExecutorService pool, final User user) throws IOException {
        // The implementation class is configurable; the default is HConnectionImplementation.
        String className = conf.get("hbase.client.connection.impl",
            HConnectionManager.HConnectionImplementation.class.getName());
        Class<?> clazz = null;
        try {
            clazz = Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IOException(e);
        }
        try {
            // Default HCM#HCI is not accessible; make it so before invoking.
            Constructor<?> constructor =
                clazz.getDeclaredConstructor(Configuration.class,
                    boolean.class, ExecutorService.class, User.class);
            constructor.setAccessible(true);
            return (HConnection) constructor.newInstance(conf, managed, pool, user);
        } catch (Exception e) {
            throw new IOException(e);
        }
    }

    public HTable(TableName tableName, HConnection connection) throws IOException {
        this.tableName = tableName;
        this.cleanupPoolOnClose = true;
        // The caller owns the connection, so closing the table does not close it.
        this.cleanupConnectionOnClose = false;
        this.connection = connection;
        this.configuration = connection.getConfiguration();
        this.pool = getDefaultExecutor(this.configuration);
        this.finishSetup();
    }
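As the code shows, every call to createConnection constructs a new HConnection object, so if a connection is created for each HTable, the HTable objects no longer share a single HConnection as they do in method (1).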
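The reflective instantiation used by createConnection can be sketched in isolation. The following is a simplified model under stated assumptions, not the HBase code: Conn, DefaultConn, ConnFactory and the property key demo.connection.impl are hypothetical names, java.util.Properties stands in for Configuration, and failures are wrapped in an unchecked IllegalStateException rather than IOException to keep the sketch short.

```java
import java.lang.reflect.Constructor;
import java.util.Properties;

/** Hypothetical connection interface. */
interface Conn {
    String describe();
}

/** Hypothetical default implementation with a non-public constructor. */
final class DefaultConn implements Conn {
    private final boolean managed;

    private DefaultConn(Properties conf, boolean managed) {
        this.managed = managed;
    }

    @Override
    public String describe() {
        return "DefaultConn(managed=" + managed + ")";
    }
}

final class ConnFactory {
    /** Hypothetical property key naming the implementation class. */
    static final String IMPL_KEY = "demo.connection.impl";

    /** Builds a NEW connection on every call; nothing is cached here. */
    static Conn create(Properties conf, boolean managed) {
        String className = conf.getProperty(IMPL_KEY, DefaultConn.class.getName());
        try {
            Class<?> clazz = Class.forName(className);
            Constructor<?> ctor =
                    clazz.getDeclaredConstructor(Properties.class, boolean.class);
            // The constructor is private, so make it accessible before invoking.
            ctor.setAccessible(true);
            return (Conn) ctor.newInstance(conf, managed);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        Conn first = create(conf, false);
        Conn second = create(conf, false);
        System.out.println(first.describe());
        System.out.println(first == second); // two calls, two distinct objects
    }
}
```

Because the factory keeps no map of instances, the caller decides how long each connection lives and whether several table objects reuse it.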
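So how do these two approaches perform when executing inserts, deletes and lookups?
Let's first analyze this from the code.
For simplicity, start with what HTable actually does when it executes a put (insert) operation.
The put function of HTable is as follows: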
    public void put(final Put put)
            throws InterruptedIOException, RetriesExhaustedWithDetailsException {
        doPut(put);
        if (autoFlush) {
            flushCommits();
        }
    }

    private void doPut(Put put)
            throws InterruptedIOException, RetriesExhaustedWithDetailsException {
        if (ap.hasError()) {
            writeAsyncBuffer.add(put);
            backgroundFlushCommits(true);
        }
        validatePut(put);
        currentWriteBufferSize += put.heapSize();
        writeAsyncBuffer.add(put);
        // Submit in the background once the buffer exceeds its size limit.
        while (currentWriteBufferSize > writeBufferSize) {
            backgroundFlushCommits(false);
        }
    }

    private void backgroundFlushCommits(boolean synchronous)
            throws InterruptedIOException, RetriesExhaustedWithDetailsException {
        try {
            do {
                ap.submit(writeAsyncBuffer, true);
            } while (synchronous && !writeAsyncBuffer.isEmpty());
            if (synchronous) {
                ap.waitUntilDone();
            }
            if (ap.hasError()) {
                LOG.debug(tableName + ": One or more of the operations have failed -" +
                    " waiting for all operation in progress to finish (successfully or not)");
                while (!writeAsyncBuffer.isEmpty()) {
                    ap.submit(writeAsyncBuffer, true);
                }
                ap.waitUntilDone();
                if (!clearBufferOnFail) {
                    // if clearBufferOnFailed is not set, we're supposed to keep the failed operation in the
                    // write buffer. This is a questionable feature kept here for backward compatibility
                    writeAsyncBuffer.addAll(ap.getFailedOperations());
                }
                RetriesExhaustedWithDetailsException e = ap.getErrors();
                ap.clearErrors();
                throw e;
            }
        } finally {
            // Recompute the buffer size from whatever is still queued.
            currentWriteBufferSize = 0;
            for (Row mut : writeAsyncBuffer) {
                if (mut instanceof Mutation) {
                    currentWriteBufferSize += ((Mutation) mut).heapSize();
                }
            }
        }
    }
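As the parts highlighted in red indicate, the call chain is put -> doPut -> backgroundFlushCommits -> ap.submit, where ap is an instance of the AsyncProcess class.
We therefore follow the call into the AsyncProcess class, whose code is as follows: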
    public void submit(List<? extends Row> rows, boolean atLeastOne)
            throws InterruptedIOException {
        submitLowPriority(rows, atLeastOne, false);
    }

    public void submitLowPriority(List<? extends Row> rows, boolean atLeastOne,
            boolean isLowPripority) throws InterruptedIOException {
        if (rows.isEmpty()) {
            return;
        }
        // This looks like we are keying by region but HRegionLocation has a comparator that compares
        // on the server portion only (hostname + port) so this Map collects regions by server.
        Map<HRegionLocation, MultiAction<Row>> actionsByServer =
            new HashMap<HRegionLocation, MultiAction<Row>>();
        List<Action<Row>> retainedActions = new ArrayList<Action<Row>>(rows.size());
        long currentTaskCnt = tasksDone.get();
        boolean alreadyLooped = false;
        NonceGenerator ng = this.hConnection.getNonceGenerator();
        do {
            if (alreadyLooped) {
                // if, for whatever reason, we looped, we want to be sure that something has changed.
                waitForNextTaskDone(currentTaskCnt);
                currentTaskCnt = tasksDone.get();
            } else {
                alreadyLooped = true;
            }
            // Wait until there is at least one slot for a new task.
            waitForMaximumCurrentTasks(maxTotalConcurrentTasks - 1);
            // Remember the previous decisions about regions or region servers we put in the
            // final multi.
            Map<Long, Boolean> regionIncluded = new HashMap<Long, Boolean>();
            Map<ServerName, Boolean> serverIncluded = new HashMap<ServerName, Boolean>();
            int posInList = -1;
            Iterator<? extends Row> it = rows.iterator();
            while (it.hasNext()) {
                Row r = it.next();
                HRegionLocation loc = findDestLocation(r, posInList);
                if (loc == null) { // loc is null if there is an error such as meta not available.
                    it.remove();
                } else if (canTakeOperation(loc, regionIncluded, serverIncluded)) {
                    Action<Row> action = new Action<Row>(r, ++posInList);
                    setNonce(ng, r, action);
                    retainedActions.add(action);
                    addAction(loc, action, actionsByServer, ng);
                    it.remove();
                }
            }
        } while (retainedActions.isEmpty() && atLeastOne && !hasError());
        HConnectionManager.ServerErrorTracker errorsByServer = createServerErrorTracker();
        sendMultiAction(retainedActions, actionsByServer, 1, errorsByServer, isLowPripority);
    }

    private HRegionLocation findDestLocation(Row row, int posInList) {
        if (row == null) throw new IllegalArgumentException("#" + id + ", row cannot be null");
        HRegionLocation loc = null;
        IOException locationException = null;
        try {
            loc = hConnection.locateRegion(this.tableName, row.getRow());
            if (loc == null) {
                locationException = new IOException("#" + id +
                    ", no location found, aborting submit for" +
                    " tableName=" + tableName +
                    " rowkey=" + Arrays.toString(row.getRow()));
            }
        } catch (IOException e) {
            locationException = e;
        }
        if (locationException != null) {
            // There are multiple retries in locateRegion already. No need to add new.
            // We can't continue with this row, hence it's the last retry.
            manageError(posInList, row, false, locationException, null);
            return null;
        }
        return loc;
    }
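The grouping step in submitLowPriority, where each row is located and its action is collected under the destination server, can be sketched on its own. This is a simplified model and not the HBase implementation: ServerBatcher is a hypothetical name, rows and servers are plain strings, and the locator function stands in for findDestLocation (returning null when no location is found).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class ServerBatcher {
    /**
     * Groups row keys by the server that hosts them, so that one batched
     * request per server can be sent instead of one request per row.
     */
    static Map<String, List<String>> groupByServer(
            List<String> rowKeys, Function<String, String> locator) {
        Map<String, List<String>> byServer = new LinkedHashMap<>();
        for (String row : rowKeys) {
            String server = locator.apply(row);
            if (server == null) {
                continue; // no location found: this row is left out of the batch
            }
            byServer.computeIfAbsent(server, s -> new ArrayList<>()).add(row);
        }
        return byServer;
    }

    public static void main(String[] args) {
        // Hypothetical layout: keys starting with a..m live on rs1, the rest on rs2.
        Function<String, String> locator =
                row -> row.charAt(0) <= 'm' ? "rs1:60020" : "rs2:60020";
        List<String> rows = Arrays.asList("apple", "zebra", "mango", "yak");
        System.out.println(groupByServer(rows, locator));
    }
}
```

With the hypothetical layout in main, the four rows end up in two batches, one per server, which is the reason the client can send a single multi request to each region server.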
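The key mechanism in this put path is asynchronous, buffered submission: when autoFlush is disabled, a put does not write to HBase immediately. Instead, the data accumulates in a client-side write buffer and is sent in one batch once the buffered size exceeds a threshold (2 MB by default).
On the server side the batch presumably still has to be queued for execution; the details of that mechanism are not covered here.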
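The buffering rule can be illustrated with a small sketch. It is a simplified model under stated assumptions, not the HTable code: WriteBufferSketch is a hypothetical class, the size of an entry is taken to be its payload length (HTable uses Put.heapSize()), and flush only counts bytes instead of sending an RPC.

```java
import java.util.ArrayList;
import java.util.List;

final class WriteBufferSketch {
    private final long writeBufferSize;
    private final boolean autoFlush;
    private final List<byte[]> buffer = new ArrayList<>();
    private long currentSize = 0;
    private int flushCount = 0;
    private long bytesSent = 0;

    WriteBufferSketch(long writeBufferSize, boolean autoFlush) {
        this.writeBufferSize = writeBufferSize;
        this.autoFlush = autoFlush;
    }

    /** Buffers the payload; flushes when autoFlush is on or the limit is exceeded. */
    void put(byte[] payload) {
        buffer.add(payload);
        currentSize += payload.length;
        if (autoFlush || currentSize > writeBufferSize) {
            flush();
        }
    }

    /** Sends everything buffered so far as one batch (simulated here). */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        for (byte[] p : buffer) {
            bytesSent += p.length; // stand-in for the real network write
        }
        buffer.clear();
        currentSize = 0;
        flushCount++;
    }

    int flushCount() { return flushCount; }
    int pending() { return buffer.size(); }
    long bytesSent() { return bytesSent; }

    public static void main(String[] args) {
        WriteBufferSketch w = new WriteBufferSketch(2L * 1024 * 1024, false);
        for (int i = 0; i < 5000; i++) {
            w.put(new byte[1024]);
        }
        System.out.println("flushes=" + w.flushCount() + " pending=" + w.pending());
    }
}
```

With autoFlush enabled every put triggers a flush, so the number of round trips equals the number of puts; with it disabled, many small puts are merged into a few large batches, and a final explicit flush is needed to send whatever is still pending.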