1. The default storage path of a CDH-installed Kafka is /var/local/kafka/data, as shown in the screenshot; in practice this is usually changed.
For Kafka configuration, see: the Apache Kafka series post on server.properties configuration parameters.
The files under that path look like this:
If there are multiple paths, separate them with commas, e.g. /data01/kafka/data,/data02/kafka/data. Note that the data directories must be owned by the kafka user and kafka group.
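As a sketch, the multi-path setting in server.properties would look like the following (the paths are examples from above, not a recommendation):

```properties
# server.properties
# log.dirs accepts a comma-separated list of data directories;
# Kafka spreads partitions across them.
log.dirs=/data01/kafka/data,/data02/kafka/data
```

The ownership requirement mentioned above can be applied with something like `chown -R kafka:kafka /data01/kafka/data /data02/kafka/data` before starting the broker.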
The corresponding topic in Kafka Manager:
The files under a specific topic partition's directory:
```
lintong@master:/var/local/kafka/data/test_topic-0$ ls
00000000000000000187.index  00000000000000000187.log  00000000000000000187.snapshot  00000000000000000187.timeindex  leader-epoch-checkpoint
```
Here, 187 is the offset at which this log segment starts, and 00000000000000000187.log is the log data file itself.
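The naming convention is easy to check programmatically: the segment's base offset is simply the zero-padded number in the file name. A minimal sketch, using the file name from the listing above:

```java
public class SegmentName {
    public static void main(String[] args) {
        String fileName = "00000000000000000187.log";
        // The 20-digit prefix is the offset of the first record in this segment.
        long baseOffset = Long.parseLong(fileName.substring(0, fileName.indexOf('.')));
        System.out.println(baseOffset); // prints 187
    }
}
```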
The contents of a log segment can be decoded with the DumpLogSegments tool:
```
/opt/cloudera/parcels/KAFKA/bin/kafka-run-class kafka.tools.DumpLogSegments --files ./00000000000000000187.log
20/07/19 17:28:55 INFO utils.Log4jControllerRegistration$: Registered kafka:type=kafka.Log4jController MBean
Dumping ./00000000000000000187.log
Starting offset: 187
baseOffset: 187 lastOffset: 187 count: 1 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: false isControl: false position: 0 CreateTime: 1595149737820 size: 71 magic: 2 compresscodec: NONE crc: 1738564593 isvalid: true
baseOffset: 188 lastOffset: 188 count: 1 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: false isControl: false position: 71 CreateTime: 1595149744449 size: 74 magic: 2 compresscodec: NONE crc: 3794338178 isvalid: true
baseOffset: 189 lastOffset: 189 count: 1 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: false isControl: false position: 145 CreateTime: 1595149757514 size: 75 magic: 2 compresscodec: NONE crc: 3084259622 isvalid: true
```
log.segment.bytes is set to 1G here; once a log file reaches 1G, Kafka rolls over to a new log file.
This parameter has a bug in Kafka 1.0.1 and earlier that can break consumers reading a topic's data (producers are unaffected); see https://issues.apache.org/jira/browse/KAFKA-6292
When a Kafka broker reads a segment file, it checks whether the current read position of the segment, plus another HEADER_SIZE_UP_TO_MAGIC bytes, still fits within `end`, the maximum readable offset of that segment. If log.segment.bytes is raised to 2G (note: 2G equals the maximum signed int value, 2147483647), then position + HEADER_SIZE_UP_TO_MAGIC can exceed the int maximum and wrap around to a negative number. The negative sum compares as smaller than `end`, the read returns null, and data ends up being read from the end of the log file.
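The overflow itself is easy to reproduce: with a 2G segment, position can approach Integer.MAX_VALUE, and adding the header size wraps to a negative int. The value 17 for HEADER_SIZE_UP_TO_MAGIC is taken from Kafka's DefaultRecordBatch; treat the numbers here as an illustration of the arithmetic, not the broker's exact state:

```java
public class OverflowDemo {
    // Header size up to the magic byte; 17 in Kafka's DefaultRecordBatch (assumed here).
    static final int HEADER_SIZE_UP_TO_MAGIC = 17;

    public static void main(String[] args) {
        int end = Integer.MAX_VALUE;          // segment bounded at 2G
        int position = Integer.MAX_VALUE - 5; // a batch header near the end of the segment
        int sum = position + HEADER_SIZE_UP_TO_MAGIC; // wraps past 2^31 - 1
        System.out.println(sum);              // negative: -2147483637
        // The signed comparison now gives the wrong answer: the sum looks
        // smaller than end even though the true (unwrapped) value is larger.
        System.out.println(sum < end);        // true
    }
}
```

The fix in later Kafka versions is to perform this bounds arithmetic in long instead of int, which cannot wrap for any valid segment size.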
See the Kafka 1.0.1 source: https://github.com/apache/kafka/blob/1.0.1/clients/src/main/java/org/apache/kafka/common/record/FileLogInputStream.java