tonglin0325's personal homepage

Thrift, Protobuf, and Avro serialization compared

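This post compares the size of the same data after serialization with Thrift (using the TCompactProtocol protocol), Protobuf, and Avro (using the AvroKeyOutputFormat format).

Because Thrift's binary data type cannot be serialized into binary a second time, the test schema contains no field of type binary.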

1. Avro schema

The Avro schema for the test data is defined as follows:

{
  "namespace": "com.linkedin.haivvreo",
  "name": "test_serializer",
  "type": "record",
  "fields": [
    { "name":"string1", "type":"string" },
    { "name":"int1", "type":"int" },
    { "name":"tinyint1", "type":"int" },
    { "name":"smallint1", "type":"int" },
    { "name":"bigint1", "type":"long" },
    { "name":"boolean1", "type":"boolean" },
    { "name":"float1", "type":"float" },
    { "name":"double1", "type":"double" },
    { "name":"list1", "type":{"type":"array", "items":"string"} },
    { "name":"map1", "type":{"type":"map", "values":"int"} },
    { "name":"struct1", "type":{"type":"record", "name":"struct1_name", "fields": [
      { "name":"sInt", "type":"int" }, { "name":"sBoolean", "type":"boolean" }, { "name":"sString", "type":"string" } ] } },
    { "name":"enum1", "type":{"type":"enum", "name":"enum1_values", "symbols":["BLUE","RED", "GREEN"]} },
    { "name":"nullableint", "type":["int", "null"] }
  ]
}
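
To make the size comparison more concrete, here is a minimal sketch of how Avro's binary encoding lays out the nested struct1_name record from the schema above. It follows the rules in the Avro specification (int and long as zigzag variable-length integers, boolean as one byte, string as a length prefix plus UTF-8 bytes, record fields concatenated in schema order). The function names and the sample values 7, True, and "abc" are illustrative assumptions, not part of the original test.

```python
def zigzag_varint(n: int) -> bytes:
    """Avro int/long: zigzag-map the signed value, then emit 7 bits per byte, low group first."""
    u = (n << 1) ^ (n >> 63)  # small magnitudes (positive or negative) map to small unsigned values
    out = bytearray()
    while u > 0x7F:
        out.append((u & 0x7F) | 0x80)  # high bit set: more bytes follow
        u >>= 7
    out.append(u)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: byte length encoded as a long, followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data


def encode_boolean(b: bool) -> bytes:
    """Avro boolean: a single byte, 1 for true and 0 for false."""
    return b"\x01" if b else b"\x00"


def encode_struct1(s_int: int, s_boolean: bool, s_string: str) -> bytes:
    """Record fields are concatenated in schema order; no field names or tags are written."""
    return zigzag_varint(s_int) + encode_boolean(s_boolean) + encode_string(s_string)


encoded = encode_struct1(7, True, "abc")  # illustrative sample values
print(encoded.hex(), len(encoded))  # 0e0106616263 6
```

Because the record body carries no field identifiers, the reader needs the schema to decode it. Thrift and Protobuf instead write a field id with each value, which is one source of size differences between the formats.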

2. Thrift schema

The Thrift schema for the test data is defined as follows:

namespace java com.linkedin.haivvreo

struct struct1_name {
  1: required i32 sInt;
  2: required bool sBoolean;
  3: required string sString;
}

enum enum1_values {
  BLUE,
  RED,
  GREEN
}

// Not referenced by test_serializer below.
struct union1 {
  1: optional double member0;
  2: optional bool member1;
  3: optional string member2;
}

struct test_serializer {
  1: required string string1;
  2: required i32 int1;
  3: required i32 tinyint1;
  4: required i32 smallint1;
  5: required i64 bigint1;
  6: required bool boolean1;
  // Thrift has no 32-bit float type, so float1 is declared as double.
  7: required double float1;
  8: required double double1;
  9: required list<string> list1;
  10: required map<string, i32> map1;
  11: required struct1_name struct1;
  // Declared as a string here rather than as enum1_values.
  12: required string enum1;
  13: optional i32 nullableint;
}

3. Protobuf schema

Read more >>

Installing and using Git on Ubuntu

1. Set the user name and email

git config --global user.name "xxxx"
git config --global user.email "xxx@xxx.edu.cn"

2. Check the current Git user name and email

git config user.name
git config user.email

3. Generate a key pair; press Enter at each prompt so that no passphrase is set

ssh-keygen -t rsa -C "xxx@xxx.edu.cn" -f ~/.ssh/id_rsa_github

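4. The system-wide SSH configuration directory is /etc/ssh; per-user settings go in ~/.ssh/config

The ~/.ssh/config file looks like this: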

# Personal GitHub account; id_rsa_github is the SSH key generated in step 3
Host github.com
    User xxx
    Hostname ssh.github.com
    PreferredAuthentications publickey
    IdentityFile ~/.ssh/id_rsa_github
    Port 443

# Work account on the company Git server; the Host alias can be any name you choose
Host gitlab.xxx.com
    Hostname gitlab.xxx.com
    Port xxx
    User xxx
    IdentityFile ~/.ssh/id_rsa

Host xx-*
    HostName %h
    User xxx
    Port xxx

Host xxx-*
    HostName %h
    User xxx
    Port xxx

Host xxx-*
    HostName %h
    User xxx
    Port xxx
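
The wildcard blocks above rely on two ssh_config behaviours: a Host pattern such as xx-* matches any alias with that prefix, and %h in HostName is replaced by the host name given on the command line. For each option, the first value obtained wins. The following is a rough sketch of that lookup, not OpenSSH's actual implementation; the dev-* pattern, port 2222, and the resolve function are hypothetical names chosen for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical Host blocks mirroring the layout of the config above.
BLOCKS = [
    ("github.com", {"HostName": "ssh.github.com", "Port": "443"}),
    ("dev-*", {"HostName": "%h", "Port": "2222"}),
]


def resolve(alias: str) -> dict:
    """Collect options from every matching Host block; the first value seen for an option wins."""
    merged = {}
    for pattern, options in BLOCKS:
        if fnmatchcase(alias, pattern):  # '*' and '?' wildcards, as in ssh_config patterns
            for key, value in options.items():
                merged.setdefault(key, value)
    # %h in HostName stands for the host name typed on the command line.
    if "HostName" in merged:
        merged["HostName"] = merged["HostName"].replace("%h", alias)
    return merged


print(resolve("dev-01"))      # {'HostName': 'dev-01', 'Port': '2222'}
print(resolve("github.com"))  # {'HostName': 'ssh.github.com', 'Port': '443'}
```

With blocks like these, `ssh dev-01` would connect to the host dev-01 on the port from the matching block without repeating the settings for every machine.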

5. Upload the .pub public key to GitHub

6. You can now run git clone

Read more >>

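Feature platform: Feast

Feast is an open-source feature platform, originally developed by Gojek in collaboration with Google Cloud. It provides feature registration and management, along with an SDK for interacting with the feature store, the offline store, and the online store. Official documentation: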

https://docs.feast.dev/

At the time of writing, the latest release is v0.24. Its supported offline stores include File, Snowflake, BigQuery, Redshift, Spark, PostgreSQL, Trino, and Azure Synapse, among others. See:

https://docs.feast.dev/reference/offline-stores

Supported online stores include SQLite, Snowflake, Redis, Datastore, DynamoDB, PostgreSQL, and Cassandra, among others. See:

https://docs.feast.dev/reference/online-stores

provider

Read more >>