tonglin0325's personal homepage

Installing Filebeat on Ubuntu 16.04

Official Filebeat documentation:

https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-installation.html

Download and install

curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.3.1-linux-x86_64.tar.gz
tar xzvf filebeat-7.3.1-linux-x86_64.tar.gz

Write filebeat.yml

Start Filebeat

chmod go-w /home/lintong/software/apache/filebeat-7.3.1-linux-x86_64/filebeat.yml
./filebeat -e -c filebeat.yml

codec.format

codec.format:
  string: '%{[@timestamp]} %{[message]}'
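To make the effect of this format string concrete, here is a minimal Python sketch of how a `%{[field]}` template could be expanded against one event. It is only an illustration, not Filebeat's implementation: the `render` helper and the sample event are hypothetical, and the sketch assumes flat top-level fields only.

```python
import re

# Matches placeholders of the form %{[field]} (flat field names only; an
# assumption of this sketch, not a statement about Filebeat's full syntax).
_PLACEHOLDER = re.compile(r"%\{\[([^\]]+)\]\}")


def render(fmt, event):
    """Replace every %{[field]} in fmt with the matching value from event."""
    return _PLACEHOLDER.sub(lambda m: str(event[m.group(1)]), fmt)


# Hypothetical event, for illustration only.
event = {"@timestamp": "2019-09-01T12:00:00.000Z", "message": "hello"}
print(render("%{[@timestamp]} %{[message]}", event))
```

With this format each output line carries only the timestamp and the raw message, rather than the whole event.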


Installing Consul on Ubuntu 16.04

1. Download the package

https://www.consul.io/downloads.html
wget https://releases.hashicorp.com/consul/1.5.3/consul_1.5.3_linux_amd64.zip

2. Unzip

unzip consul_1.5.3_linux_amd64.zip

3. Move the binary onto the PATH

sudo mv consul /usr/local/bin/consul

4. Start

Reference: https://blog.csdn.net/u010046908/article/details/61916389

-dev: in development mode, data is stored in memory and is lost after a restart.

consul agent -dev

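-server: in production mode, an agent that acts as a server must be started with -server; an agent started without -server runs as a client. The -client flag does not select client mode; it sets the address that the client interfaces (HTTP, DNS) bind to. For example:

consul agent -ui -server -bootstrap-expect 1 -data-dir /tmp/consul -node=consul-server -bind=192.168.1.100 -client=192.168.1.100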


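Kerberos Notes

1. Introduction to Kerberos

Kerberos is a computer network authentication protocol that lets parties communicating over an insecure network prove their identity to one another securely. The name also refers to the software suite that MIT developed for this protocol. The software uses a client/server design and supports mutual authentication: the client and the server can each verify the other's identity. It can be used to guard against eavesdropping and replay attacks and to protect data integrity, and it manages keys using symmetric-key cryptography.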

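Some Kerberos concepts:

1) KDC: the Key Distribution Center, which issues and manages tickets and records authorizations.

2) Realm: the identifier of a Kerberos administrative domain.

3) principal: every time a user or service is added, a principal must be added to the KDC. A principal has the form primary/instance@REALM.

4) Primary: the primary can be a user name or a service name; a service name identifies a principal that provides a network service (such as hdfs, yarn, or hive).

5) Instance: the instance can be thought of simply as the host name.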
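The primary/instance@REALM form described above can be illustrated with a small Python sketch that splits a principal into its three parts. The `parse_principal` helper and the example principals are hypothetical, and the sketch ignores escaped characters, which real Kerberos libraries do handle.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # user name or service name
    instance: Optional[str]  # usually a host name; absent for plain users
    realm: str


def parse_principal(text: str) -> Principal:
    """Split 'primary/instance@REALM' (the instance part is optional)."""
    name, sep, realm = text.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError(f"missing realm or name in principal: {text!r}")
    primary, _, instance = name.partition("/")
    return Principal(primary, instance or None, realm)


# Hypothetical principals, for illustration only.
print(parse_principal("hive/node01.example.com@EXAMPLE.COM"))
print(parse_principal("alice@EXAMPLE.COM"))
```

A service principal normally carries the host name as its instance, while a user principal often has no instance at all.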

2. Install and configure the KDC service

Remove the old krb packages

sudo apt remove --purge 'krb*'
sudo rm -rf /etc/krb5kdc
sudo rm -rf /var/lib/krb5kdc

Install the KDC service and the admin server

sudo apt-get install krb5-kdc krb5-admin-server


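Airflow Usage Guide

1. Run a single task only

Deselect the Downstream and Recursive buttons, click Clear, then select Ignore All Deps and click Run.

2. Start from one task and run it together with its downstream tasks

Deselect the Downstream and Recursive buttons, click Clear, then select Ignore Task Deps and click Run.

See also: Introduction to the Airflow scheduling tool, with usage examples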

3. Airflow command line

https://airflow.apache.org/docs/apache-airflow/stable/cli-and-env-variables-ref.html#dags

1. Create an airflow user before the first login

airflow users create --username airflow --role Admin --password airflow --email airflow@xxx.com --lastname airflow --firstname airflow

2. Delete a DAG by its dag id

airflow dags delete {dag_id}

3. Trigger an Airflow DAG

airflow dags trigger --help
usage: airflow dags trigger [-h] [-c CONF] [-e EXEC_DATE] [-r RUN_ID]
                            [-S SUBDIR]
                            dag_id

Trigger a DAG run

positional arguments:
  dag_id                The id of the dag

optional arguments:
  -h, --help            show this help message and exit
  -c CONF, --conf CONF  JSON string that gets pickled into the DagRun's conf attribute
  -e EXEC_DATE, --exec-date EXEC_DATE
                        The execution date of the DAG
  -r RUN_ID, --run-id RUN_ID
                        Helps to identify this run
  -S SUBDIR, --subdir SUBDIR
                        File location or directory from which to look for the dag. Defaults to '[AIRFLOW_HOME]/dags' where [AIRFLOW_HOME] is the value you set for 'AIRFLOW_HOME' config you set in 'airflow.cfg'

airflow dags trigger -e '2022-07-19T08:00:00' your_dag_id
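When the trigger command is issued from a script, the -c JSON string is easy to get wrong because of shell quoting. The following is a minimal Python sketch that assembles the argument list using only the -e and -c flags from the help text above. The `build_trigger_command` helper and the `{"source": "manual"}` payload are hypothetical.

```python
import json
import shlex


def build_trigger_command(dag_id, exec_date=None, conf=None):
    """Return the argv list for `airflow dags trigger`."""
    argv = ["airflow", "dags", "trigger"]
    if exec_date is not None:
        argv += ["-e", exec_date]
    if conf is not None:
        # -c expects a JSON string, so serialize the dict here.
        argv += ["-c", json.dumps(conf)]
    argv.append(dag_id)
    return argv


argv = build_trigger_command(
    "your_dag_id",
    exec_date="2022-07-19T08:00:00",
    conf={"source": "manual"},  # hypothetical payload
)
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv)` avoids shell quoting altogether; `shlex.join` is only used here to print a copy-pasteable command.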

Note that the execution date must fall between the DAG's start_date and end_date; otherwise the command fails with:

ValueError: The execution_date [2022-07-19T08:00:00+00:00] should be >= start_date [2022-07-20T00:00:00+00:00] from DAG's default_args
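The constraint behind this error can be sketched in a few lines of Python. This is not Airflow's own code: the `in_schedule_window` helper is hypothetical and only restates the rule that the execution date must not fall before start_date or after end_date (when one is set).

```python
from datetime import datetime, timezone


def in_schedule_window(execution_date, start_date, end_date=None):
    """Return True if execution_date lies within [start_date, end_date]."""
    if execution_date < start_date:
        return False
    if end_date is not None and execution_date > end_date:
        return False
    return True


# Values taken from the error message above.
start = datetime(2022, 7, 20, tzinfo=timezone.utc)
requested = datetime(2022, 7, 19, 8, tzinfo=timezone.utc)
print(in_schedule_window(requested, start))
```

Here the requested date is one day before start_date, so the check fails, which matches the ValueError shown above.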


Configuring an Impala Data Source in Superset

1. Install impyla

pip install impyla

2. Configure the following URI on the Superset page; in this setup Impala uses Kerberos authentication

impala://xxxx:xx/default?auth_mechanism=GSSAPI&kerberos_service_name=impala
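As a sketch of how this URI is put together, the Python snippet below builds it from its parts with the standard library. The `impala_uri` helper is hypothetical, and the host name and port 21050 are placeholders rather than values from the original post.

```python
from urllib.parse import urlencode


def impala_uri(host, port, database="default", **params):
    """Build an impala:// URI with optional query parameters."""
    uri = f"impala://{host}:{port}/{database}"
    query = urlencode(params)
    return f"{uri}?{query}" if query else uri


# Placeholder host and port, for illustration only.
print(
    impala_uri(
        "impala-host.example.com",
        21050,
        auth_mechanism="GSSAPI",
        kerberos_service_name="impala",
    )
)
```

The two query parameters are the ones shown in the URI above: auth_mechanism=GSSAPI selects Kerberos, and kerberos_service_name names the service principal.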

If you hit the error below, the installed thrift-sasl version is too new

The error message returned was:\n'TSocket' object has no attribute 'isOpen'

Downgrading to version 0.2.1 fixes it

pip list | grep thrift-sasl
thrift-sasl 0.3.0
pip install thrift-sasl==0.2.1

Test the connection again; it now succeeds.


Configuring a Hive Data Source in Superset

1. Set the URI to hive://localhost:10000/default

2. Run a query

3. If your Hive cluster uses Kerberos authentication, configure the Hive data source like this

hive://xxx:xxx/default?auth=KERBEROS&kerberos_service_name=hive

If the connection fails with the following error

Could not start SASL: b'Error in sasl_client_start (-1) SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure

then you have not authenticated with your keytab yet. Run:

kinit -kt xxx.keytab xxx@XXXX


Configuring a MySQL Data Source in Superset

1. Add the MySQL data source

Testing the connection fails with

No module named 'MySQLdb'

Install mysqlclient

pip install mysqlclient

If that in turn fails with

ERROR: /bin/sh: 1: mysql_config: not found

install the development packages

sudo apt-get install libmysqlclient-dev python3-dev


Hadoop Study Notes: HDFS

1. View the block information of an HDFS file

An abnormal file

hdfs fsck /logs/xxx/xxxx.gz.gz -files -blocks -locations
Connecting to namenode via http://xxx-01:50070/fsck?ugi=xxx&files=1&blocks=1&locations=1&path=%2Flogs%2Fxxx%2Fxxx%2F401294%2Fds%3Dxxxx-07-14%2Fxxx.gz.gz
FSCK started by xxxx (auth:KERBEROS_SSL) from /10.90.1.91 for path xxxxx.gz.gz at Mon Jul 15 11:44:13 CST 2019
Status: HEALTHY
Total size: 0 B (Total open files size: 194 B)
Total dirs: 0
Total files: 0
Total symlinks: 0 (Files currently being written: 1)
Total blocks (validated): 0 (Total open file blocks (not validated): 1)
Minimally replicated blocks: 0
Over-replicated blocks: 0
Under-replicated blocks: 0
Mis-replicated blocks: 0
Default replication factor: 3
Average block replication: 0.0
Corrupt blocks: 0
Missing replicas: 0
Number of data-nodes: 99
Number of racks: 3
FSCK ended at Mon Jul 15 11:44:13 CST 2019 in 0 milliseconds

A normal file

Connecting to namenode via http://xxx:50070/fsck?ugi=xxx&files=1&blocks=1&locations=1&path=%2Fxxx%2Fxxx%2Fxxx%2F401294%2Fds%3Dxxx-07-14%2Fxx.gz
FSCK started by xxxx (auth:KERBEROS_SSL) from /10.90.1.91 for path /logs/xxxx.gz at Mon Jul 15 11:46:12 CST 2019
/logs/xxxx.gz 74745 bytes, 1 block(s): OK
0. BP-1760298736-10.90.1.6-1536234810107:blk_1392467116_318836510 len=74745 Live_repl=3 [DatanodeInfoWithStorage[10.90.1.99:1004,DS-9d465b1f-943f-4716-bce0-8b36e5631b4a,DISK], DatanodeInfoWithStorage[10.90.1.216:1004,DS-160924c6-4cd7-4822-93c0-9ac9cf9c5784,DISK], DatanodeInfoWithStorage[10.90.1.191:1004,DS-d0a2e418-610f-4bef-8f1d-4ce045533656,DISK]]

Status: HEALTHY
Total size: 74745 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 74745 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 99
Number of racks: 3
FSCK ended at Mon Jul 15 11:46:12 CST 2019 in 1 milliseconds
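Comparing the two reports, the abnormal file is marked by "Files currently being written: 1" and zero validated blocks, while the normal file has no such marker. A minimal Python sketch for spotting this in captured fsck output might look like the following. The `open_file_count` helper is hypothetical, and the sample lines are copied from the reports above.

```python
import re


def open_file_count(fsck_output: str) -> int:
    """Return the number of files fsck reports as still being written."""
    match = re.search(r"Files currently being written:\s*(\d+)", fsck_output)
    # The marker is absent from reports for cleanly closed files.
    return int(match.group(1)) if match else 0


abnormal = "Total symlinks: 0 (Files currently being written: 1)"
normal = "Total symlinks: 0"
print(open_file_count(abnormal), open_file_count(normal))
```

A non-zero count suggests the file was never closed properly and its lease may still need to be recovered.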

 

2. Command to repair the HDFS file

hdfs debug recoverLease -path /logs/xxxx.gz.gz -retries 3

After the repair

hdfs fsck /logs/xxx.gz.gz -files -blocks -locations
Connecting to namenode via http://xxx-01:50070/fsck?ugi=xxx&files=1&blocks=1&locations=1&path=%2Flogs%2Fnsh%2Fjson%2F401294%2Fds%3D2019-07-14%2Fxxx.gz.gz
FSCK started by xxx (auth:KERBEROS_SSL) from /10.90.1.91 for path /logs/xxxx.gz.gz at Mon Jul 15 11:48:01 CST 2019
/logs/xxxx.gz.gz 67157 bytes, 1 block(s): OK
0. BP-1760298736-10.90.1.6-1536234810107:blk_1392594522_319757834 len=67157 Live_repl=3 [DatanodeInfoWithStorage[10.90.1.213:1004,DS-6aee5c90-c834-475e-8f20-7a0f8bd8d315,DISK], DatanodeInfoWithStorage[10.90.1.207:1004,DS-cd79bacc-89ff-4fb3-82b5-79341391ae8d,DISK], DatanodeInfoWithStorage[10.90.1.97:1004,DS-ba5953f8-c0c3-444a-8996-3bcfa1bcf851,DISK]]

Status: HEALTHY
Total size: 67157 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 67157 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 99
Number of racks: 3
FSCK ended at Mon Jul 15 11:48:01 CST 2019 in 1 milliseconds

 

Related post: Hadoop Study Notes: HDFS


Installing Superset on Ubuntu 16.04

**Superset** is an open-source big-data visualization platform from Airbnb.

Data sources it supports:

https://superset.incubator.apache.org/index.html?highlight=datasource

Data sources supported by Zeppelin, a similar open-source project:

https://zeppelin.apache.org/docs/0.8.0/quickstart/sql_with_zeppelin.html

 

1. Upgrade Python 3.5 to Python 3.6; otherwise the installation fails with ERROR: Sorry, Python < 3.6 is not supported

sudo add-apt-repository ppa:jonathonf/python-3.6
sudo apt update
sudo apt install python3.6

2. Official installation guide

https://superset.incubator.apache.org/installation.html

3. Set up a virtual environment
