tonglin0325's personal homepage

CDH Study Notes: Cloudera Manager API

You can use the API provided by Cloudera Manager (CM) to query information about a CDH cluster.

http://cloudera.github.io/cm_api/
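
For example, a minimal sketch of listing the clusters CM manages from the command line; the host and admin credentials are placeholders, and the CM API authenticates with HTTP basic auth:

# List all clusters managed by this Cloudera Manager instance
# (host and credentials below are placeholders)
curl -u admin:admin "https://xxxx:7180/api/v9/clusters"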

API docs for version 7.0.3

https://archive.cloudera.com/cm7/7.0.3/generic/jar/cm_api/apidocs/index.html

API for querying Impala queries

https://archive.cloudera.com/cm7/7.0.3/generic/jar/cm_api/apidocs/json_ApiImpalaQuery.html

For example

https://xxxx:7180/api/v9/clusters/dev-cdh/services/impala/impalaQueries?from=2020-03-10T06:26:01.927Z

The supported parameters are shown in the screenshot.
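
The same query can also be issued from the command line. As a sketch, assuming the endpoint accepts to, filter, limit, and offset in addition to from (host and credentials are placeholders):

# Fetch Impala queries that started after the given timestamp
curl -u admin:admin \
  "https://xxxx:7180/api/v9/clusters/dev-cdh/services/impala/impalaQueries?from=2020-03-10T06:26:01.927Z&limit=100"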

API for querying applications on YARN

https://archive.cloudera.com/cm7/7.0.3/generic/jar/cm_api/apidocs/resource_YarnApplicationsResource.html

For example

https://xxxx:7180/api/v9/clusters/dev-cdh/services/yarn/yarnApplications

The supported parameters are shown in the screenshot; they are the same as for Impala.
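
A sketch of filtering on the server side; the exact filter syntax is documented in the API docs linked above, so treat the expression below as an assumption:

# Fetch only YARN applications submitted by the hive user
# (filter expression syntax per the CM API docs; host/credentials are placeholders)
curl -u admin:admin \
  "https://xxxx:7180/api/v9/clusters/dev-cdh/services/yarn/yarnApplications?filter=user=hive"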


Installing flink-1.10.0 on Ubuntu 16.04

I originally wanted to integrate Flink into CDH, but my CDH version is 5.16.2. Judging from the issues below, that version is probably too old; CDH 6 or later is required.

https://github.com/pkeropen/flink-parcel/issues

So install it standalone instead

wget https://archive.apache.org/dist/flink/flink-1.10.0/flink-1.10.0-bin-scala_2.11.tgz

Installation path

/home/lintong/software/apache/flink-1.10.0
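
A sketch of extracting the downloaded tarball into that path:

# Unpack the distribution and move it to the installation path above
tar -zxf flink-1.10.0-bin-scala_2.11.tgz
mv flink-1.10.0 /home/lintong/software/apache/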

Add the following to /etc/profile, then run source /etc/profile

#flink
export FLINK_HOME=/home/lintong/software/apache/flink-1.10.0
export PATH=${FLINK_HOME}/bin:$PATH

Download the flink-shaded-hadoop-2-uber-2.7.5-7.0.jar package and put it into Flink's lib directory

wget https://repo1.maven.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.7.5-7.0/flink-shaded-hadoop-2-uber-2.7.5-7.0.jar
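
For example, with the installation path above:

# Put the uber jar on Flink's classpath
cp flink-shaded-hadoop-2-uber-2.7.5-7.0.jar /home/lintong/software/apache/flink-1.10.0/lib/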

Otherwise, Flink on YARN will fail with:

Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/exceptions/YarnException
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.exceptions.YarnException
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more

Start a yarn-session

yarn-session.sh -n 3 -s 5 -jm 1024 -tm 4096 -d

yarn-session parameters

-n: number of TaskManagers;
-d: run in detached mode;
-id: attach to a given YARN application ID;
-j: path to the Flink jar file;
-jm: memory for the JobManager container (default unit: MB);
-nl: specify YARN node labels for the YARN application;
-nm: set a custom name for the application on YARN;
-q: display available YARN resources (memory, cores);
-qu: specify the YARN queue;
-s: number of slots per TaskManager;
-st: start Flink in streaming mode;
-tm: memory per TaskManager container (default unit: MB);
-z: namespace used to create the ZooKeeper sub-paths for high-availability mode;
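
Once the session is up, jobs can be submitted to it with flink run; a minimal sketch using the WordCount example bundled with the distribution (in detached mode, flink run locates the session through the properties file that yarn-session.sh writes under /tmp):

# Submit the bundled streaming WordCount job to the running yarn-session
flink run ${FLINK_HOME}/examples/streaming/WordCount.jar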

Check in CDH: the first application is running, the second has finished.

Click the application id to go to the YARN application page.


YARN Study Notes: MR Jobs

1. After a Hive SQL statement is submitted to YARN, it executes as an MR job.

URL for viewing the application page of a running MR job; the URL may differ for other job types, such as Spark, Flink, etc.

http://xxxx:8088/cluster/app/application_158225xxxxx_0316


YARN Study Notes: Common Commands

1. yarn top: view resource usage on YARN.

2. Queue usage status

yarn queue -status root.xxx_common
Queue Information :
Queue Name : root.xxx_common
State : RUNNING
Capacity : 100.0%
Current Capacity : 21.7%
Maximum Capacity : -100.0%
Default Node Label expression :
Accessible Node Labels :

3. View the list of applications running on YARN. If the cluster uses Kerberos authentication, you need to kinit first; once authenticated you can see all running applications.
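
A sketch of authenticating first (the principal and keytab path are placeholders):

# Authenticate to Kerberos before querying a kerberized cluster
# (principal and keytab path below are placeholders)
kinit -kt /path/to/user.keytab user@EXAMPLE.COM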

yarn application -list

Result

Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):12
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_15771778xxxxx_0664 xx-flink-test Apache Flink xxx-xx root.xxx_common RUNNING UNDEFINED 100% http://xxx-76:35437
application_15771778xxxxx_0663 xx-flink-debug Apache Flink xx root.xxx_common RUNNING UNDEFINED 100% http://xxx-79:42443
application_15771778xxxxx_0641 xxx-flink Apache Flink xxx-xx root.xxx_common RUNNING UNDEFINED 100% http://xxx-76:38067
application_15771778xxxxx_0182 common_flink Apache Flink xx root.xxx_common RUNNING UNDEFINED 100% http://xxx-79:38583
application_15822552xxxxx_0275 testjar XXX-FLINK xxx root.xxx_common RUNNING UNDEFINED 100% http://xxx-78:36751
application_15822552xxxxx_0259 flinksql XXX-FLINK hdfs root.xxx_common RUNNING UNDEFINED 100% http://xxx-77:37127
application_15822552xxxxx_0026 kudu-test Apache Flink hdfs root.xxx_common RUNNING UNDEFINED 100% http://xxx-78:43071
application_15822552xxxxx_0307 xxx_statistic XXX Flink xxx root.xxx_common RUNNING UNDEFINED 100% http://xxx:18000
application_15822552xxxxx_0308 xxx-statistic XXX Flink xxx root.xxx_common ACCEPTED UNDEFINED 0% N/A
application_15810489xxxxx_0003 xxx-flink Apache Flink xx root.xxx_common RUNNING UNDEFINED 100% http://xxx-78:8081
application_15810489xxxxx_0184 common_flink Apache Flink xx root.xxx_common RUNNING UNDEFINED 100% http://xxx-76:35659
application_15810489xxxxx_0154 Flink session cluster Apache Flink hdfs root.xxx_common RUNNING UNDEFINED 100% http://xxx-80:38797

Filter by state

yarn application -list -appStates RUNNING
Total number of applications (application-types: [] and states: [RUNNING]):12
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_157717780xxxx_0664 xx-flink-test Apache Flink xxx-xx root.xxx_common RUNNING UNDEFINED 100% http://xxxxx-xx:35437

4. View an application's status information

yarn application -status application_1582255xxxx_0314
Application Report :
Application-Id : application_1582255xxxx_0314
Application-Name : select count(*) from tb1 (Stage-1)
Application-Type : MAPREDUCE
User : hive
Queue : root.xxxx_common
Start-Time : 1583822835423
Finish-Time : 1583822860082
Progress : 100%
State : FINISHED
Final-State : SUCCEEDED
Tracking-URL : http://xxx-xxxx-xx:19888/jobhistory/job/job_15822552xxxx_0314
RPC Port : 32829
AM Host : xxxx-xxxx-xx
Aggregate Resource Allocation : 162810 MB-seconds, 78 vcore-seconds
Log Aggregation Status : SUCCEEDED
Diagnostics :


Elasticsearch Shards and Segments

What is a shard?

It is introduced in the following documentation

https://www.elastic.co/guide/cn/elasticsearch/guide/current/kagillion-shards.html

1. A shard is, under the hood, a single Lucene index, and it consumes a certain amount of file handles, memory, and CPU to run.
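
Since each shard carries that fixed cost, and the number of primary shards cannot be changed once an index exists, it has to be chosen at creation time. A minimal sketch, assuming a local node on port 9200 (the index name and counts are arbitrary):

# Create an index with 3 primary shards, each with 1 replica;
# the primary shard count is fixed after creation
curl -X PUT "localhost:9200/my_index" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}'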


Changing the brew source on macOS

Reference: https://juejin.im/post/5daec26a51882575d50cd0aa

1. Check brew's current source

git -C "$(brew --repo)" remote -v
origin https://github.com/Homebrew/brew (fetch)
origin https://github.com/Homebrew/brew (push)

2. Switch to the Tsinghua mirror

git -C "$(brew --repo)" remote set-url origin https://mirrors.tuna.tsinghua.edu.cn/git/homebrew/brew.git
git -C "$(brew --repo homebrew/core)" remote set-url origin https://mirrors.tuna.tsinghua.edu.cn/git/homebrew/homebrew-core.git
git -C "$(brew --repo homebrew/cask)" remote set-url origin https://mirrors.tuna.tsinghua.edu.cn/git/homebrew/homebrew-cask.git
brew update

The source has now been changed to the Tsinghua mirror

git -C "$(brew --repo)" remote -v
origin https://mirrors.tuna.tsinghua.edu.cn/git/homebrew/brew.git (fetch)
origin https://mirrors.tuna.tsinghua.edu.cn/git/homebrew/brew.git (push)

Or switch to the USTC mirror

# Replace Homebrew
git -C "$(brew --repo)" remote set-url origin https://mirrors.ustc.edu.cn/brew.git
# Replace Homebrew Core
git -C "$(brew --repo homebrew/core)" remote set-url origin https://mirrors.ustc.edu.cn/homebrew-core.git
# Replace Homebrew Cask
git -C "$(brew --repo homebrew/cask)" remote set-url origin https://mirrors.ustc.edu.cn/homebrew-cask.git
# Update
brew update

To restore the defaults

git -C "$(brew --repo)" remote set-url origin https://github.com/Homebrew/brew.git
git -C "$(brew --repo homebrew/core)" remote set-url origin https://github.com/Homebrew/homebrew-core.git
git -C "$(brew --repo homebrew/cask)" remote set-url origin https://github.com/Homebrew/homebrew-cask.git
brew update
