Kafka and Flume Study Notes

Kafka Study Notes

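I. Introduction to Kafka

Kafka is a distributed message queue (MQ) system.

Official site: https://kafka.apache.org/

A Kafka cluster consists of multiple broker servers, and each category of message is defined as a topic.

Messages within a topic are partitioned according to a key and a partitioning algorithm, and the partitions are stored on different brokers.

Producers and consumers can produce to and consume from a topic across multiple brokers.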
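
To make the idea of partitioning by key concrete, here is a minimal sketch in plain Java. It is not Kafka's actual partitioner: the class name PartitionSketch, the method partitionFor, and the choice of hash function are all assumptions made for illustration. It only shows the general pattern of hashing a key and taking the result modulo the partition count, so that records with the same key always land in the same partition.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Toy illustration of key-based partitioning; not Kafka's real partitioner. */
final class PartitionSketch {

    /** Maps a message key to a partition index in the range [0, numPartitions). */
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // Assumption: a simple array hash stands in for the real hash function.
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        // floorMod keeps the result non-negative even when the hash is negative.
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 3; // hypothetical partition count
        for (String key : new String[] {"user-1", "user-2", "user-1"}) {
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }
    }
}
```

Both occurrences of "user-1" map to the same partition, which is what lets a per-key ordering guarantee follow from per-partition ordering.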

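Key concepts:

➢ Topics and Logs:

Every message published to a Kafka cluster belongs to a category called a topic. A topic can be subscribed to and consumed by multiple consumers.

Each topic contains one or more partitions. The number of partitions can be specified when the topic is created. Each partition's log holds that partition's data together with its index information.

As shown in the figure below: [figure missing from source]

Kafka guarantees message ordering only within a single partition; it does not guarantee ordering across different partitions of the same topic.

If you need a strict total order over all messages, give the topic a single partition.

Each partition assigns every record a sequential ID (the offset), which uniquely identifies the record within that partition.

The Kafka cluster retains all published records whether or not they have been consumed. Old data is then handled according to a configurable retention policy.

In fact, the only metadata kept for each consumer is its current position (offset) in the log.

The offset is controlled by the consumer, so a consumer can read data from any position by changing its offset.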
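
The relationship between a partition log, its offsets, and a consumer-controlled position can be sketched with a toy in-memory model. This is not Kafka code: OffsetSketch, PartitionLog, and LogConsumer are hypothetical names, and a real partition log lives on disk and is subject to retention. The sketch only shows that offsets are assigned sequentially on append and that the consumer's sole state is the offset it will read next.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one partition log and a consumer-controlled offset. */
final class OffsetSketch {

    /** Append-only log: a record's offset is its index in the list. */
    static final class PartitionLog {
        private final List<String> records = new ArrayList<>();

        /** Appends a record and returns the offset assigned to it. */
        long append(String record) {
            records.add(record);
            return records.size() - 1;
        }

        String read(long offset) {
            return records.get((int) offset);
        }

        /** Offset that the next appended record will receive. */
        long endOffset() {
            return records.size();
        }
    }

    /** The consumer's only state is the offset of the next record to read. */
    static final class LogConsumer {
        private final PartitionLog log;
        private long position = 0;

        LogConsumer(PartitionLog log) {
            this.log = log;
        }

        /** Moves the read position; earlier records can be read again. */
        void seek(long offset) {
            position = offset;
        }

        /** Returns the next record, or null when the consumer has caught up. */
        String poll() {
            return position < log.endOffset() ? log.read(position++) : null;
        }
    }

    public static void main(String[] args) {
        PartitionLog log = new PartitionLog();
        log.append("a");
        log.append("b");
        log.append("c");

        LogConsumer consumer = new LogConsumer(log);
        System.out.println(consumer.poll()); // prints "a"
        System.out.println(consumer.poll()); // prints "b"
        consumer.seek(0);                    // rewind to the start of the log
        System.out.println(consumer.poll()); // prints "a" again
    }
}
```

Because reading does not remove anything from the log, a second consumer with its own position could read the same records independently.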

➢ Distribution

➢ Producers

Send messages to the Kafka brokers, specifying the target topic.

➢ Consumers

Consume the messages of the topics they subscribe to.

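II. Kafka Cluster Deployment

Cluster plan:

The ZooKeeper cluster has three servers: node1, node2, node3.

The Kafka cluster has three servers: node1, node2, node3.

1. Prepare the ZooKeeper cluster

Kafka is a distributed message queue and depends on ZooKeeper, so install the ZooKeeper cluster first.

The ZooKeeper installation steps are omitted here.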

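2. Install Kafka

Download the archive (official download page: http://kafka.apache.org/downloads.html)

Extract it and rename the directory:

tar zxvf kafka_2.10-0.9.0.1.tgz -C /opt/

mv /opt/kafka_2.10-0.9.0.1 /opt/kafka

Edit the configuration file:

config/server.properties

Core configuration parameters:

broker.id: the unique ID of a broker within the cluster, assigned incrementally (0, 1, 2, 3, ...). A broker is one server in the Kafka cluster.

Note: the Kafka cluster here has three nodes, node1, node2, and node3, whose broker.id values are 0, 1, and 2 respectively.

zookeeper.connect: the list of ZooKeeper cluster addresses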

Copy the Kafka directory from node1 to node2 and node3:

scp -r /opt/kafka/ node2:/opt

scp -r /opt/kafka/ node3:/opt

Change broker.id in the Kafka configuration file on node2 and node3 (run the first command on node2 and the second on node3):

sed -i -e 's/broker.id=.*/broker.id=1/' /opt/kafka/config/server.properties

sed -i -e 's/broker.id=.*/broker.id=2/' /opt/kafka/config/server.properties

3. Start the Kafka cluster

A. Start the ZooKeeper cluster.

B. Start the Kafka cluster.

Run the following command on each of the three servers:

bin/kafka-server-start.sh config/server.properties

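4. Test

Create a topic

(Run kafka-topics.sh --help to see the help text.)

Create the topic:

bin/kafka-topics.sh --zookeeper node1:2181,node2:2181,node3:2181 --create --replication-factor 2 --partitions 3 --topic test

(Parameters:

--replication-factor: the number of replicas for each partition; defaults to 1

--partitions: the number of partitions for the topic being created; defaults to 1

--topic: the name of the new topic)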
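
To see what 3 partitions with a replication factor of 2 can look like on a three-broker cluster, the following sketch spreads replicas round-robin over the broker IDs 0, 1, and 2. This is a simplified assumption for illustration, not Kafka's actual assignment algorithm, and the names ReplicaAssignmentSketch and assign are hypothetical. The placement Kafka really chose can be checked with the --describe option of kafka-topics.sh.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified round-robin replica placement; not Kafka's real algorithm. */
final class ReplicaAssignmentSketch {

    /** Returns, for each partition, the broker IDs that hold its replicas. */
    static List<List<Integer>> assign(int[] brokerIds, int partitions, int replicationFactor) {
        if (replicationFactor > brokerIds.length) {
            throw new IllegalArgumentException("replication factor exceeds broker count");
        }
        List<List<Integer>> assignment = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            List<Integer> replicas = new ArrayList<>();
            for (int r = 0; r < replicationFactor; r++) {
                // Replica r of partition p goes to the broker r steps after broker p.
                replicas.add(brokerIds[(p + r) % brokerIds.length]);
            }
            assignment.add(replicas);
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<List<Integer>> assignment = assign(new int[] {0, 1, 2}, 3, 2);
        for (int p = 0; p < assignment.size(); p++) {
            System.out.println("partition " + p + " -> brokers " + assignment.get(p));
        }
        // partition 0 -> brokers [0, 1]
        // partition 1 -> brokers [1, 2]
        // partition 2 -> brokers [2, 0]
    }
}
```

In this layout every broker holds two replicas, and each partition survives the loss of any single broker.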

List the topics:

bin/kafka-topics.sh --zookeeper node1:2181,node2:2181,node3:2181 --list

Describe the "test" topic:

bin/kafka-topics.sh --zookeeper node1:2181,node2:2181,node3:2181 --describe --topic test

Start a producer:

bin/kafka-console-producer.sh --broker-list node1:9092,node2:9092,node3:9092 --topic test

Start a consumer:

bin/kafka-console-consumer.sh --zookeeper node1:2181,node2:2181,node3:2181 --from-beginning --topic test

Note: to see the help text, run:

bin/kafka-console-consumer.sh help

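III. Flume & Kafka

1. Install Flume

The Flume installation steps are omitted here.

2. Flume + Kafka

A. Start the Kafka cluster.

bin/kafka-server-start.sh config/server.properties

B. Configure the Flume cluster and start it.

bin/flume-ng agent -n a1 -c conf -f conf/fk.conf -Dflume.root.logger=DEBUG,console

The Flume configuration file fk.conf contains the following: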

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = node1
a1.sources.r1.port = 41414

# Describe the sink
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = testflume
a1.sinks.k1.brokerList = node1:9092,node2:9092,node3:9092
a1.sinks.k1.requiredAcks = 1
a1.sinks.k1.batchSize = 20
a1.sinks.k1.channel = c1

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000000
a1.channels.c1.transactionCapacity = 10000

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
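
The roles of the channel capacity and the sink batch size in this configuration can be pictured with a toy model in plain Java. It is only a sketch under stated assumptions: ChannelSketch, put, and takeBatch are hypothetical names, the model uses none of Flume's classes, and it ignores transactions (and therefore transactionCapacity) entirely. It shows a bounded buffer that the source side fills and the sink side drains in batches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Toy bounded buffer between a source and a sink; not Flume's memory channel. */
final class ChannelSketch {
    private final BlockingQueue<String> queue;

    ChannelSketch(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Source side: returns false when the channel is already full. */
    boolean put(String event) {
        return queue.offer(event);
    }

    /** Sink side: removes and returns at most batchSize buffered events. */
    List<String> takeBatch(int batchSize) {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch, batchSize);
        return batch;
    }

    public static void main(String[] args) {
        ChannelSketch channel = new ChannelSketch(100); // hypothetical small capacity
        for (int i = 0; i < 50; i++) {
            channel.put("event-" + i);
        }
        List<String> batch;
        // Drain with the same batch size as a1.sinks.k1.batchSize above.
        while (!(batch = channel.takeBatch(20)).isEmpty()) {
            System.out.println("sink sends a batch of " + batch.size() + " events");
        }
        // Prints batches of 20, 20, and 10 events.
    }
}
```

A larger capacity lets the channel absorb longer bursts from the source while the sink catches up, at the cost of more memory and more events lost if the agent process dies.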

3. Test

➢ Start the ZooKeeper, Kafka, and Flume clusters, in that order.

➢ Create the topic and confirm that it exists:

bin/kafka-topics.sh --zookeeper node1:2181,node2:2181,node3:2181 --create --replication-factor 2 --partitions 3 --topic testflume

bin/kafka-topics.sh --zookeeper node1:2181,node2:2181,node3:2181 --list

➢ Start a consumer:

bin/kafka-console-consumer.sh --zookeeper node1:2181,node2:2181,node3:2181 --from-beginning --topic testflume

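➢ Run the "RpcClientDemo" code, which sends data to the Flume cluster through RPC requests.

The Flume source is of type Avro, so the Java program sends RPC requests to it; check the console consumer to verify that the data arrives in Kafka.

The Java code that sends the RPC requests to Flume is shown below.

(See the official Flume documentation: http://flume.apache.org/FlumeDeveloperGuide.html)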

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

import java.nio.charset.Charset;

/**
 * Example from the official Flume documentation:
 * http://flume.apache.org/FlumeDeveloperGuide.html
 *
 * @author root
 */
public class RpcClientDemo {

    public static void main(String[] args) {
        MyRpcClientFacade client = new MyRpcClientFacade();
        // Initialize client with the remote Flume agent's host and port
        client.init("node1", 41414);

        // Send 10 events to the remote Flume agent. That agent should be
        // configured to listen with an AvroSource.
        String sampleData = "Hello Flume!";
        for (int i = 0; i < 10; i++) {
            client.sendDataToFlume(sampleData);
            System.out.println("Sent data: " + sampleData);
        }

        client.cleanUp();
    }
}

class MyRpcClientFacade {
    private RpcClient client;
    private String hostname;
    private int port;

    public void init(String hostname, int port) {
        // Set up the RPC connection
        this.hostname = hostname;
        this.port = port;
        this.client = RpcClientFactory.getDefaultInstance(hostname, port);
        // Use the following method to create a thrift client (instead of the
        // above line):
        // this.client = RpcClientFactory.getThriftInstance(hostname, port);
    }

    public void sendDataToFlume(String data) {
        // Create a Flume Event object that encapsulates the sample data
        Event event = EventBuilder.withBody(data, Charset.forName("UTF-8"));

        // Send the event
        try {
            client.append(event);
        } catch (EventDeliveryException e) {
            // Clean up and recreate the client
            client.close();
            client = null;
            client = RpcClientFactory.getDefaultInstance(hostname, port);
            // Use the following method to create a thrift client (instead of
            // the above line):
            // this.client = RpcClientFactory.getThriftInstance(hostname, port);
        }
    }

    public void cleanUp() {
        // Close the RPC connection
        client.close();
    }
}

IV. Storm & Kafka

Official site: http://storm.apache.org/about/integrates.html
