tonglin0325's Personal Homepage

Installing flink-1.10.0 on Ubuntu 16.04

I originally wanted to integrate Flink into CDH, but my CDH version is 5.16.2. Judging from the issue below, that version is probably too old; CDH 6 or later seems to be required.

https://github.com/pkeropen/flink-parcel/issues

So I installed Flink standalone instead:

wget https://archive.apache.org/dist/flink/flink-1.10.0/flink-1.10.0-bin-scala_2.11.tgz

Installation path:

/home/lintong/software/apache/flink-1.10.0
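Extract the downloaded archive into that directory; a minimal sketch, assuming the tarball sits in the current directory:

tar -zxf flink-1.10.0-bin-scala_2.11.tgz -C /home/lintong/software/apache/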

Add the following to /etc/profile, then run source /etc/profile:

#flink
export FLINK_HOME=/home/lintong/software/apache/flink-1.10.0
export PATH=${FLINK_HOME}/bin:$PATH
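To verify that the PATH change took effect:

source /etc/profile
which flink    # should resolve to ${FLINK_HOME}/bin/flink
flink          # with no arguments, prints the list of available actions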

Download the flink-shaded-hadoop-2-uber-2.7.5-7.0.jar package and place it in Flink's lib directory:

wget https://repo1.maven.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.7.5-7.0/flink-shaded-hadoop-2-uber-2.7.5-7.0.jar
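A sketch of placing the jar, assuming it was downloaded to the current directory:

mv flink-shaded-hadoop-2-uber-2.7.5-7.0.jar ${FLINK_HOME}/lib/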

Otherwise, Flink on YARN fails with:

Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/exceptions/YarnException
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.exceptions.YarnException
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more

Start a yarn-session:

yarn-session.sh -n 3 -s 5 -jm 1024 -tm 4096 -d

yarn-session parameters:

-n: number of TaskManagers
-d: run in detached mode
-id: attach to a running YARN session by application ID
-j: path to the Flink jar file
-jm: memory for the JobManager container (default unit: MB)
-nl: specify YARN node labels for the YARN application
-nm: set a custom name for the application on YARN
-q: display available YARN resources (memory, cores)
-qu: specify the YARN queue
-s: number of slots per TaskManager
-st: start Flink in streaming mode
-tm: memory per TaskManager container (default unit: MB)
-z: namespace under which Zookeeper sub-paths are created in high-availability mode
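Once the session is up, jobs can be submitted to it with flink run; a minimal sketch using the WordCount example bundled with the distribution (the detached session is discovered via the properties file yarn-session.sh writes, and paths may differ in your setup):

flink run -d ${FLINK_HOME}/examples/streaming/WordCount.jar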

Check in the CDH UI: the first application is running, the second has finished.

Click the application id to drill into the app's page in YARN.


Yarn Learning Notes: MR Jobs

1. After a Hive SQL query is submitted to YARN, it executes as an MR job.

URL for viewing a running MR job's application page; the URL pattern may differ for other job types, such as Spark or Flink:

http://xxxx:8088/cluster/app/application_158225xxxxx_0316
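The same information is exposed by the ResourceManager REST API; a sketch, assuming the same host/port and the placeholder application id above:

curl http://xxxx:8088/ws/v1/cluster/apps/application_158225xxxxx_0316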


Yarn Learning Notes: Common Commands

1. yarn top: view resource usage on YARN.
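It behaves like Unix top, refreshing queue and application metrics interactively:

yarn top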

2. Queue usage status:

yarn queue -status root.xxx_common
Queue Information :
Queue Name : root.xxx_common
State : RUNNING
Capacity : 100.0%
Current Capacity : 21.7%
Maximum Capacity : -100.0%
Default Node Label expression :
Accessible Node Labels :

3. List the applications running on YARN. If the cluster uses Kerberos authentication, run kinit first; once authenticated, you can see all running applications.

yarn application -list

Result:

Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):12
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_15771778xxxxx_0664 xx-flink-test Apache Flink xxx-xx root.xxx_common RUNNING UNDEFINED 100% http://xxx-76:35437
application_15771778xxxxx_0663 xx-flink-debug Apache Flink xx root.xxx_common RUNNING UNDEFINED 100% http://xxx-79:42443
application_15771778xxxxx_0641 xxx-flink Apache Flink xxx-xx root.xxx_common RUNNING UNDEFINED 100% http://xxx-76:38067
application_15771778xxxxx_0182 common_flink Apache Flink xx root.xxx_common RUNNING UNDEFINED 100% http://xxx-79:38583
application_15822552xxxxx_0275 testjar XXX-FLINK xxx root.xxx_common RUNNING UNDEFINED 100% http://xxx-78:36751
application_15822552xxxxx_0259 flinksql XXX-FLINK hdfs root.xxx_common RUNNING UNDEFINED 100% http://xxx-77:37127
application_15822552xxxxx_0026 kudu-test Apache Flink hdfs root.xxx_common RUNNING UNDEFINED 100% http://xxx-78:43071
application_15822552xxxxx_0307 xxx_statistic XXX Flink xxx root.xxx_common RUNNING UNDEFINED 100% http://xxx:18000
application_15822552xxxxx_0308 xxx-statistic XXX Flink xxx root.xxx_common ACCEPTED UNDEFINED 0% N/A
application_15810489xxxxx_0003 xxx-flink Apache Flink xx root.xxx_common RUNNING UNDEFINED 100% http://xxx-78:8081
application_15810489xxxxx_0184 common_flink Apache Flink xx root.xxx_common RUNNING UNDEFINED 100% http://xxx-76:35659
application_15810489xxxxx_0154 Flink session cluster Apache Flink hdfs root.xxx_common RUNNING UNDEFINED 100% http://xxx-80:38797

Filter by state:

yarn application -list -appStates RUNNING
Total number of applications (application-types: [] and states: [RUNNING]):12
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_157717780xxxx_0664 xx-flink-test Apache Flink xxx-xx root.xxx_common RUNNING UNDEFINED 100% http://xxxxx-xx:35437
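Applications can also be filtered by type with -appTypes; a sketch (the type string must match what the application registered as, e.g. "Apache Flink" in the listing above):

yarn application -list -appTypes "Apache Flink"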

4. View an application's status:

yarn application -status application_1582255xxxx_0314
Application Report :
Application-Id : application_1582255xxxx_0314
Application-Name : select count(*) from tb1 (Stage-1)
Application-Type : MAPREDUCE
User : hive
Queue : root.xxxx_common
Start-Time : 1583822835423
Finish-Time : 1583822860082
Progress : 100%
State : FINISHED
Final-State : SUCCEEDED
Tracking-URL : http://xxx-xxxx-xx:19888/jobhistory/job/job_15822552xxxx_0314
RPC Port : 32829
AM Host : xxxx-xxxx-xx
Aggregate Resource Allocation : 162810 MB-seconds, 78 vcore-seconds
Log Aggregation Status : SUCCEEDED
Diagnostics :
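For a finished application with log aggregation enabled (as above), the aggregated logs can be pulled with:

yarn logs -applicationId application_1582255xxxx_0314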


Elasticsearch Shards and Segments

What is a shard?

It is introduced in the following documentation:

https://www.elastic.co/guide/cn/elasticsearch/guide/current/kagillion-shards.html

1. Under the hood, a shard is a Lucene index, which consumes file handles, memory, and CPU cycles.
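Since each shard carries this fixed overhead, the shard count is chosen at index-creation time; a minimal sketch against a local cluster, with my_index as a hypothetical index name:

curl -X PUT "http://localhost:9200/my_index" -H 'Content-Type: application/json' -d '{
  "settings": { "number_of_shards": 3, "number_of_replicas": 1 }
}'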


Changing the brew source on macOS

Reference: https://juejin.im/post/5daec26a51882575d50cd0aa

1. Check brew's current source:

git -C "$(brew --repo)" remote -v
origin https://github.com/Homebrew/brew (fetch)
origin https://github.com/Homebrew/brew (push)

2. Switch to the Tsinghua (TUNA) mirror:

git -C "$(brew --repo)" remote set-url origin https://mirrors.tuna.tsinghua.edu.cn/git/homebrew/brew.git
git -C "$(brew --repo homebrew/core)" remote set-url origin https://mirrors.tuna.tsinghua.edu.cn/git/homebrew/homebrew-core.git
git -C "$(brew --repo homebrew/cask)" remote set-url origin https://mirrors.tuna.tsinghua.edu.cn/git/homebrew/homebrew-cask.git
brew update

The source has now been changed to the Tsinghua mirror:

git -C "$(brew --repo)" remote -v
origin https://mirrors.tuna.tsinghua.edu.cn/git/homebrew/brew.git (fetch)
origin https://mirrors.tuna.tsinghua.edu.cn/git/homebrew/brew.git (push)

Or switch to the USTC mirror:

# Replace Homebrew
git -C "$(brew --repo)" remote set-url origin https://mirrors.ustc.edu.cn/brew.git
# Replace Homebrew Core
git -C "$(brew --repo homebrew/core)" remote set-url origin https://mirrors.ustc.edu.cn/homebrew-core.git
# Replace Homebrew Cask
git -C "$(brew --repo homebrew/cask)" remote set-url origin https://mirrors.ustc.edu.cn/homebrew-cask.git
# Update
brew update

To revert to the official source:

git -C "$(brew --repo)" remote set-url origin https://github.com/Homebrew/brew.git
git -C "$(brew --repo homebrew/core)" remote set-url origin https://github.com/Homebrew/homebrew-core.git
git -C "$(brew --repo homebrew/cask)" remote set-url origin https://github.com/Homebrew/homebrew-cask.git
brew update


Non-consecutive MySQL auto-increment IDs

The project has a table that records users. Whenever a new user passes API authentication, the new user's information is inserted into this table.

In production, however, we found that the table's auto-increment id was not consecutive, always with a gap of 2: if one user's id was 10, the next user's id would be 12. On the frontend, three APIs are called after a user passes authentication, so the initial suspicion was that concurrent MySQL operations caused the non-consecutive auto-increment ids.

The article below lists several causes of non-consecutive auto-increment ids; what I hit was the first case, because I had defined a unique index on the user information. Reference: MySQL实战45讲 Day38 (why auto-increment primary keys are not consecutive).

<1> Unique key conflict is the first cause of non-consecutive auto-increment primary keys (see the sketch after this list).

<2> Transaction rollback is the second cause of non-consecutive auto-increment primary keys.

<3> The batch-allocation strategy for auto-increment ids is the third cause of non-consecutive auto-increment primary keys.
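A minimal sketch reproducing cause <1> with the mysql client (the test database, the t_user table, and its columns are hypothetical; --force lets the client continue past the failing statement):

mysql --force -u root -p test -e "
CREATE TABLE t_user (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(32) UNIQUE);
INSERT INTO t_user (name) VALUES ('alice');  -- gets id 1
INSERT INTO t_user (name) VALUES ('alice');  -- duplicate key error, but id 2 is still consumed
INSERT INTO t_user (name) VALUES ('bob');    -- gets id 3, leaving a gap at 2
SELECT id, name FROM t_user;
"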

That article also mentions that MySQL defaults to innodb_autoinc_lock_mode=1; with innodb_autoinc_lock_mode=1 or innodb_autoinc_lock_mode=2, auto-increment ids can be non-consecutive.

With innodb_autoinc_lock_mode=0, auto-increment ids are consecutive, but inserts take a table-level auto-increment lock, which hurts concurrency.
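To check which mode a server is running (the variable is read-only at runtime; it is set in my.cnf or on the command line at startup):

mysql -u root -p -e "SHOW VARIABLES LIKE 'innodb_autoinc_lock_mode';"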

References: INNODB自增主键的一些问题 (issues with InnoDB auto-increment primary keys) and 再谈MySQL auto_increment空洞问题 (more on MySQL auto_increment holes).

Solution:

1. See: MySQL中自增主键不连续之解决方案 (a solution for non-consecutive auto-increment primary keys in MySQL).
