tonglin0325的个人主页

Avro学习笔记——avro-tools工具

1.下载avro-tools.jar#

1
https://archive.apache.org/dist/avro/avro-1.10.1/java/

avro-tools.jar常用命令:Working with Apache Avro files in Amazon S3  

也可以查看help

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
java -jar ./avro-tools-1.10.1.jar help
Version 1.10.1 of Apache Avro
Copyright 2010-2015 The Apache Software Foundation

This product includes software developed at
The Apache Software Foundation (https://www.apache.org/).
----------------
Available tools:
canonical Converts an Avro Schema to its canonical form
cat Extracts samples from files
compile Generates Java code for the given schema.
concat Concatenates avro files without re-compressing.
count Counts the records in avro files or folders
fingerprint Returns the fingerprint for the schemas.
fragtojson Renders a binary-encoded Avro datum as JSON.
fromjson Reads JSON records and writes an Avro data file.
fromtext Imports a text file into an avro data file.
getmeta Prints out the metadata of an Avro data file.
getschema Prints out schema of an Avro data file.
idl Generates a JSON schema from an Avro IDL file
idl2schemata Extract JSON schemata of the types from an Avro IDL file
induce Induce schema/protocol from Java class/interface via reflection.
jsontofrag Renders a JSON-encoded Avro datum as binary.
random Creates a file with randomly generated instances of a schema.
recodec Alters the codec of a data file.
repair Recovers data from a corrupt Avro Data file
rpcprotocol Output the protocol of a RPC service
rpcreceive Opens an RPC Server and listens for one message.
rpcsend Sends a single RPC message.
tether Run a tethered mapreduce job.
tojson Dumps an Avro data file as JSON, record per line or pretty.
totext Converts an Avro data file to a text file.
totrevni Converts an Avro data file to a Trevni file.
trevni_meta Dumps a Trevni file's metadata as JSON.
trevni_random Create a Trevni file filled with random instances of a schema.
trevni_tojson Dumps a Trevni file as JSON.

2.查看avro文件的schema#

1
2
java -jar ./avro-tools-1.10.1.jar getschema ./xxxx.avro

3.查看avro文件内容的json格式#

1
2
java -jar ./avro-tools-1.10.1.jar tojson ./nova_ads_access_log-0-0008589084.avro | less

4.使用avro-tools编译java代码#

编译avro IDL文件,参考

1
2
https://avro.apache.org/docs/current/gettingstartedjava.html
https://yanbin.blog/convert-apache-avro-to-parquet-format-in-java/

定义schema文件kst.avsc

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
{
"namespace": "com.linkedin.haivvreo",
"name": "test_serializer",
"type": "record",
"fields": [
{ "name":"string1", "type":"string" },
{ "name":"int1", "type":"int" },
{ "name":"tinyint1", "type":"int" },
{ "name":"smallint1", "type":"int" },
{ "name":"bigint1", "type":"long" },
{ "name":"boolean1", "type":"boolean" },
{ "name":"float1", "type":"float" },
{ "name":"double1", "type":"double" },
{ "name":"list1", "type":{"type":"array", "items":"string"} },
{ "name":"map1", "type":{"type":"map", "values":"int"} },
{ "name":"struct1", "type":{"type":"record", "name":"struct1_name", "fields": [
{ "name":"sInt", "type":"int" }, { "name":"sBoolean", "type":"boolean" }, { "name":"sString", "type":"string" } ] } },
{ "name":"union1", "type":["float", "boolean", "string"] },
{ "name":"enum1", "type":{"type":"enum", "name":"enum1_values", "symbols":["BLUE","RED", "GREEN"]} },
{ "name":"nullableint", "type":["int", "null"] },
{ "name":"bytes1", "type":"bytes" },
{ "name":"fixed1", "type":{"type":"fixed", "name":"threebytes", "size":3} }
] }

编译avro IDL文件

1
2
java -jar ./src/main/resources/avro-tools-1.10.1.jar compile schema ./src/main/avro/kst.avsc ./src/main/java

 

这时编译出来的java代码中,IDL的string类型实际上是CharSequence

如果想编译成string,则可以添加-string参数

1
2
java -jar ./src/main/resources/avro-tools-1.10.1.jar compile -string schema ./src/main/avro/kst.avsc ./src/main/java