tonglin0325's personal homepage

Installing LZO on CDH 5.16

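1. In the CDH management page, go to Parcels and download the GPLEXTRAS parcel. Once downloaded, it appears in the parcel repository: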

lintong@master:/opt/cloudera/parcel-repo$ ls | grep GPLEXTRAS
GPLEXTRAS-5.16.2-1.cdh5.16.2.p0.8-xenial.parcel
GPLEXTRAS-5.16.2-1.cdh5.16.2.p0.8-xenial.parcel.sha1

Rename the .sha1 file to .sha:

sudo mv GPLEXTRAS-5.16.2-1.cdh5.16.2.p0.8-xenial.parcel.sha1 GPLEXTRAS-5.16.2-1.cdh5.16.2.p0.8-xenial.parcel.sha

If the hash file for a parcel does not exist, it can be generated like this (shown here for a SPARK2 parcel):

sha1sum ./SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p0.1041012-xenial.parcel | cut -d ' ' -f 1 > SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p0.1041012-xenial.parcel.sha1
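The hash generation and the rename to .sha can also be done in one step. The following is a minimal sketch, assuming the hash is expected in a file named <parcel>.sha as in the rename above; the function name write_parcel_sha is hypothetical and not part of any Cloudera tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the SHA-1 of a parcel into <parcel>.sha
write_parcel_sha() {
  local parcel="$1"
  # Fail early if the parcel file is missing
  [ -f "$parcel" ] || { echo "no such parcel: $parcel" >&2; return 1; }
  # sha1sum prints "<hash>  <file>"; keep only the hash
  sha1sum "$parcel" | cut -d ' ' -f 1 > "${parcel}.sha"
}

# Example (run inside /opt/cloudera/parcel-repo):
# write_parcel_sha SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p0.1041012-xenial.parcel
```

Writing straight to .sha avoids the separate mv step; the parcel itself is left untouched.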

2. Distribute and activate the parcel in the UI

 

 

3. In the HDFS configuration, add the following classes to the io.compression.codecs parameter

com.hadoop.compression.lzo.LzoCodec
com.hadoop.compression.lzo.LzopCodec
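After the configuration has been deployed, it may be worth checking that both classes actually ended up in the codec list. The sketch below assumes the value is a comma-separated list of class names; the helper name has_lzo_codecs is hypothetical. The commented example uses hdfs getconf, which reads the client configuration on the host where it runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if both LZO codec classes appear
# in a comma-separated io.compression.codecs value.
has_lzo_codecs() {
  local codecs="$1" cls
  for cls in com.hadoop.compression.lzo.LzoCodec com.hadoop.compression.lzo.LzopCodec; do
    # Drop whitespace, split on commas, and require an exact whole-line match
    printf '%s' "$codecs" | tr -d '[:space:]' | tr ',' '\n' | grep -qxF "$cls" || return 1
  done
}

# Example against a live cluster (requires the hdfs client on PATH):
# has_lzo_codecs "$(hdfs getconf -confKey io.compression.codecs)" && echo "LZO codecs configured"
```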

 

 

 

Reference documentation

https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_gpl_extras.html

4. Install LZO on each node

sudo apt-get install liblzo2-2

Reference documentation

https://docs.cloudera.com/documentation/enterprise/latest/topics/cm_ig_install_gpl_extras.html#xd_583c10bfdbd326ba-3ca24a24-13d80143249--7ec6

5. In the YARN configuration, add the following to mapreduce.application.classpath

/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/*

6. Restart and verify

create table test_table(id int,name string);

-- Compress job output with the LZO codec
set mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec;
set hive.exec.compress.output=true;
set mapreduce.output.fileoutputformat.compress=true;
set mapreduce.output.fileoutputformat.compress.type=BLOCK;

-- Rewrites the table in place; test_table should already hold some rows
-- so that the job has data to write as compressed files
insert overwrite table test_table select * from test_table;
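To confirm that the insert really wrote LZO output, one option is to look at the file extensions under the table directory. This is a sketch under stated assumptions: files written with LzoCodec carry the codec's default .lzo_deflate extension (LzopCodec uses .lzo), the table lives under the default Hive warehouse path /user/hive/warehouse/test_table, and the helper name count_lzo_files is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read file paths on stdin and print how many
# end in an LZO extension (.lzo_deflate for LzoCodec, .lzo for LzopCodec).
count_lzo_files() {
  # grep -c exits non-zero when the count is 0; "|| true" keeps that from being an error
  grep -cE '\.(lzo_deflate|lzo)$' || true
}

# Example against a live cluster (the warehouse path is an assumption):
# hdfs dfs -ls /user/hive/warehouse/test_table | awk '{print $NF}' | count_lzo_files
```

A count of 0 after the insert would suggest that the compression settings were not picked up by the job, or that the table had no rows to write.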

 

Reference: 0003-如何在CDH中使用LZO压缩 (How to use LZO compression in CDH)