tonglin0325's personal homepage

Hive Study Notes: UDF Development

A UDF can be implemented by extending either org.apache.hadoop.hive.ql.exec.UDF or org.apache.hadoop.hive.ql.udf.generic.GenericUDF.

1. Extending UDF. Reference:

https://docs.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-hive-java-udf

Add the dependencies

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>1.1.0-cdh5.16.2</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.0-cdh5.16.2</version>
</dependency>

Code

// Package matches the class name used when registering the function: com.bigdata.hive.MyUDF
package com.bigdata.hive;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@Description(
    name = "hello",
    value = "_FUNC_(str) - from the input string "
        + "returns the value that is \"Hello $str\"",
    extended = "Example:\n"
        + "  > SELECT _FUNC_(str) FROM src;"
)
public class MyUDF extends UDF {

    private static final Logger logger = LoggerFactory.getLogger(MyUDF.class);

    // Hive looks this method up by its name, so it must be public and called evaluate.
    public String evaluate(String str) {
        try {
            return "Hello " + str;
        } catch (Exception e) {
            // Log through slf4j instead of printing the stack trace to stderr.
            logger.error("Failed to evaluate input: " + str, e);
            return "ERROR";
        }
    }

}
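
The UDF base class does not declare an abstract evaluate method. As far as I understand, Hive finds evaluate on the class by reflection, based on the argument types of the call. The sketch below illustrates that idea with the JDK only. HelloFunction and EvaluateLookup are hypothetical names, not Hive classes, and the lookup is simplified: it matches exact runtime classes, does no type conversion, and does not accept null arguments.

```java
import java.lang.reflect.Method;

/** JDK-only illustration of locating and calling an "evaluate" method by reflection. */
final class EvaluateLookup {

    /** Hypothetical stand-in for a UDF class: a public evaluate method, no Hive dependency. */
    public static final class HelloFunction {
        public String evaluate(String str) {
            return "Hello " + str;
        }
    }

    /** Finds a public method named "evaluate" whose parameters match the argument classes, then invokes it. */
    static Object call(Object udf, Object... args) throws ReflectiveOperationException {
        Class<?>[] types = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            types[i] = args[i].getClass();   // simplification: arguments must be non-null
        }
        Method evaluate = udf.getClass().getMethod("evaluate", types);
        return evaluate.invoke(udf, args);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // prints "Hello 123"
        System.out.println(call(new HelloFunction(), "123"));
    }
}
```

This is also why a class that extends UDF can offer several evaluate overloads, for example one per argument type.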

When packaging, make sure all dependency jars are bundled into the jar, then upload the jar to HDFS or S3.
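
Before uploading, it can help to confirm that the UDF class really ended up inside the packaged jar. The following is a minimal sketch that uses only the JDK. JarCheck is a hypothetical helper name, and the jar path and class name are whatever your own build produces.

```java
import java.io.IOException;
import java.util.jar.JarFile;

/** Checks whether a jar file contains the .class entry for a given class name. */
final class JarCheck {

    /** Converts e.g. "com.bigdata.hive.MyUDF" to "com/bigdata/hive/MyUDF.class". */
    static String entryNameFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns true if the jar at jarPath has an entry for the given class. */
    static boolean containsClass(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryNameFor(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: JarCheck <jar-path> <class-name>");
            return;
        }
        System.out.println(containsClass(args[0], args[1]) ? "found" : "missing");
    }
}
```

With Java 11 or later this can be run directly from source, for example: java JarCheck.java bigdata-1.0-SNAPSHOT-jar-with-dependencies.jar com.bigdata.hive.MyUDF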

hive> add jar hdfs:///user/hive/udf/bigdata-1.0-SNAPSHOT-jar-with-dependencies.jar;
converting to local hdfs:///user/hive/udf/bigdata-1.0-SNAPSHOT-jar-with-dependencies.jar
Added [/tmp/5aa66ab6-35ab-45d5-bef1-5acc79d16b23_resources/bigdata-1.0-SNAPSHOT-jar-with-dependencies.jar] to class path
Added resources: [hdfs:///user/hive/udf/bigdata-1.0-SNAPSHOT-jar-with-dependencies.jar]
hive> create temporary function my_lower as "com.bigdata.hive.MyUDF";
OK
Time taken: 0.073 seconds
hive> select my_lower("123");
OK
Hello 123
Time taken: 0.253 seconds, Fetched: 1 row(s)

List the jars

hive> list jar;
/tmp/5aa66ab6-35ab-45d5-bef1-5acc79d16b23_resources/bigdata-1.0-SNAPSHOT-jar-with-dependencies.jar

Delete the jar

hive> delete jar /tmp/5aa66ab6-35ab-45d5-bef1-5acc79d16b23_resources/bigdata-1.0-SNAPSHOT-jar-with-dependencies.jar;

List functions

hive> show functions like '*lower';
OK
lower
Time taken: 0.016 seconds, Fetched: 1 row(s)

Drop the function

hive> drop function if exists my_lower;

 

2. Extending GenericUDF. References: Hive- UDF&GenericUDF and guide-to-writing-hive-udfs