hadoop+spark+zookeeper+hive Distributed Cluster Setup
A big data services collection
2022-09-01 hadoop+spark
First look at hadoop+spark
2022-11-11 zookeeper
Added distributed zookeeper to the cluster
2022-11-19 hive
Also includes configuration adapted for CentOS 6.x
2022-11-19 flume
Log collection
hadoop+spark+zookeeper+hive Distributed Cluster Deployment
1. Environment preparation
Environment preparation is based on my own init script. I use the CentOS 7.x series; an older version of the script supports CentOS/RedHat 6, 7, and 8 but is a bit rough around the edges. If you need it, leave a message by email or on the blog.
| os / ip | hostname | roles |
|---|---|---|
| centos7.9 192.168.222.226 | master | rsmanager, datanode, namenode, snamenode, nmanager |
| centos7.9 192.168.222.227 | node1 | snamenode, nmanager, datanode |
| centos7.9 192.168.222.228 | node2 | datanode, nmanager |
Code hosted on an overseas server, so it may be blocked:

```
git clone https://github.com/linjiangyu2/K.git   # may fail to pull since the hosting server is overseas; retry a few times
```
Code hosted on this site; use with confidence:

```
yum install -y https://mirrors.linjiangyu.com/centos/tianlin-release.noarch-7-1.x86_64.rpm
```
Hosted on a CDN; use with confidence:

```
wget https://cdn.staticaly.com/gh/linjiangyu2/K@master/ksh
```
Use your own IP addresses. Ideally keep the hostnames in /etc/hosts the same as mine; otherwise you will have to adjust the hostnames in the configuration files below to match your own.
```
192.168.222.226 master
192.168.222.227 node1
192.168.222.228 node2
```
2. Setup
Distributed hadoop
Upload the jdk and hadoop tarballs
Binary packages are used here
Configuration
```
tar xf hadoop...   # I don't know which version you're using, hence the "..." (same below); tab-complete or edit to match your package
```
The configuration below is written into the files directly; do this on the master server.
```
cd /opt/hadoop285/etc/hadoop
vim core-site.xml
vim hdfs-site.xml
vim yarn-site.xml
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
vim slaves
```
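The post writes the file contents in directly; as a minimal sketch of what typically goes into each one, assuming the hostnames from the table above, Hadoop 2.8.5 under /opt/hadoop285, and scratch space under /opt/data/hadoop (these values are my assumptions, not the author's exact files):

```
<!-- core-site.xml: default filesystem and scratch dir (values assumed) -->
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://master:9000</value></property>
  <property><name>hadoop.tmp.dir</name><value>/opt/data/hadoop/tmp</value></property>
</configuration>

<!-- hdfs-site.xml: replication factor and secondary namenode address (values assumed) -->
<configuration>
  <property><name>dfs.replication</name><value>2</value></property>
  <property><name>dfs.namenode.secondary.http-address</name><value>node1:50090</value></property>
</configuration>

<!-- yarn-site.xml: resourcemanager host and the mapreduce shuffle service (values assumed) -->
<configuration>
  <property><name>yarn.resourcemanager.hostname</name><value>master</value></property>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
</configuration>

<!-- mapred-site.xml: run MapReduce on YARN -->
<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
</configuration>
```

slaves lists the datanode hosts, which per the table above are all three machines:

```
master
node1
node2
```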
Then, from the master node, push the configuration out to each node.
```
for i in node{1..2}; do rsync -av /usr/local/jdk root@$i:/usr/local/; done
```
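The hadoop tree and the profile presumably get the same treatment; a sketch under that assumption:

```
for i in node{1..2}; do
  rsync -av /opt/hadoop285 root@$i:/opt/          # hadoop itself
  rsync -av /etc/profile root@$i:/etc/profile     # the JAVA_HOME/HADOOP_HOME exports
done
```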
Do this on node1 and node2, and finally on master:
```
hdfs namenode -format   # initialize
```
Run on master:
```
[root@master ~]# start-all.sh
```
Finally, run the jps command on each node to check its components.
```
[root@xxx ~]# jps
```
The web UIs are also available: browse to 192.168.222.226:8088 and 192.168.222.226:50070 (substitute your own IP).
Let's try running a first distributed hadoop job.
```
[root@master ~]# hdfs dfs -put /etc/passwd /t1
```
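With the file in HDFS, the classic first job is the bundled wordcount example (the jar path assumes Hadoop 2.8.5; adjust to your version):

```
[root@master ~]# hadoop jar /opt/hadoop285/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar wordcount /t1 /t1-out
[root@master ~]# hdfs dfs -cat /t1-out/part-r-00000
```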
Distributed spark
Next, let's set up distributed spark.
Spark 3.3.0 is used here.
Upload the spark package to the machine and cd into its directory; spark-3.3.0-bin-hadoop3.tgz is used throughout as the demo package.
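The full commands are in the original; a sketch of the usual steps, assuming /opt/spark as the install path (the spark-start.sh invoked later in the hive section is presumably a wrapper around /opt/spark/sbin/start-all.sh):

```
tar xf spark-3.3.0-bin-hadoop3.tgz
mv spark-3.3.0-bin-hadoop3 /opt/spark
cd /opt/spark/conf
cp spark-env.sh.template spark-env.sh
# assumed environment: point spark at the JDK and the hadoop config
echo 'export JAVA_HOME=/usr/local/jdk' >> spark-env.sh
echo 'export HADOOP_CONF_DIR=/opt/hadoop285/etc/hadoop' >> spark-env.sh
cp workers.template workers      # spark 3.x calls the file "workers", not "slaves"
vim workers                      # list the worker hostnames, e.g. node1 and node2
for i in node{1..2}; do rsync -av /opt/spark root@$i:/opt/; done
/opt/spark/sbin/start-all.sh     # full sbin path, so it doesn't collide with hadoop's start-all.sh
```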
That completes the distributed spark-on-hadoop cluster. Spark has its own web UI too; browse to 192.168.222.226:8080 to view it (substitute your own IP).
Distributed zookeeper
Run on the master machine:
```
tar xf zookeeper*
```
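The elided steps presumably move the tree into place and write zoo.cfg; a sketch assuming /opt/zookeeper (the path the sqoop section's ZOOCFGDIR later points at) and the data directories created below:

```
mv zookeeper* /opt/zookeeper
cd /opt/zookeeper/conf
cp zoo_sample.cfg zoo.cfg
vim zoo.cfg
# assumed zoo.cfg entries:
#   dataDir=/opt/data/zookeeper
#   dataLogDir=/opt/data/zookeeper/logs
#   server.1=master:2888:3888
#   server.2=node1:2888:3888
#   server.3=node2:2888:3888
for i in node{1..2}; do rsync -av /opt/zookeeper root@$i:/opt/; done
```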
Run on every machine:
```
mkdir -p /opt/data/zookeeper/logs
```
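Each node also needs a myid matching its server.N entry in zoo.cfg (the numbering here is my assumption):

```
echo 1 > /opt/data/zookeeper/myid   # use 2 on node1, 3 on node2
```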
Run on the master machine:
```
vim /etc/profile
```
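The profile edit presumably adds the zookeeper paths; a sketch:

```
export ZOOKEEPER_HOME=/opt/zookeeper
export PATH=${ZOOKEEPER_HOME}/bin:$PATH
```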
Run on every machine:
```
source /etc/profile
```
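Then start and check the ensemble on every machine; one node should elect itself leader:

```
zkServer.sh start
zkServer.sh status   # expect "Mode: leader" on one node, "Mode: follower" on the rest
```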
hive
Mariadb
For convenience, mariadb is installed directly to serve as MySQL. The steps differ between CentOS 7.x and CentOS 6.x (wrote the CentOS 6 version for a friend, nearly brought me to tears). Prerequisite: the machine must be able to reach the internet.
CentOS 7.x
```
[root@master ~]# yum install -y mariadb mariadb-server
```
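The follow-up steps are elided; presumably the service gets started and the root password set to 123, which is what the hive and sqoop examples below log in with:

```
[root@master ~]# systemctl enable --now mariadb
[root@master ~]# mysqladmin -uroot password 123
```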
CentOS 6.x
```
[root@master ~]# mkdir /etc/yum.repos.d/bak
```
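The remaining CentOS 6 steps are elided. Since CentOS 6 is EOL, a plausible sketch is to park the stock repos and point yum at a mirror that still serves CentOS 6 packages (the mirror URL is an assumption; substitute one you trust):

```
[root@master ~]# mv /etc/yum.repos.d/CentOS-*.repo /etc/yum.repos.d/bak/
[root@master ~]# curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-6.repo
[root@master ~]# yum clean all && yum makecache
[root@master ~]# yum install -y mysql mysql-server
```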
The hive binary package is used here.
Upload the binary package to the /opt directory on the master machine.
hive configuration
```
[root@master ~]# cd /opt
```
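From /opt, the tarball presumably gets unpacked and renamed to the /opt/hive path everything below relies on:

```
[root@master opt]# tar xf apache-hive-*-bin.tar.gz
[root@master opt]# mv apache-hive-*-bin hive
[root@master opt]# cd hive/conf
```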
```
[root@master conf]# vim hive-site.xml   # change the settings below to match your own setup, per the comments
```
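The file contents are elided here; a minimal metastore configuration sketch, assuming mariadb on master with the root/123 credentials used elsewhere in this post:

```
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <!-- com.mysql.cj.jdbc.Driver matches the 8.x connector jar installed below -->
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123</value>
  </property>
</configuration>
```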
```
[root@master conf]# cp hive-log4j2.properties.template hive-log4j2.properties
```
Upload the jar needed to connect to MySQL:
mysql-connector-java-8.0.17.jar
```
[root@master ~]# mv mysql-connector-java-8.0.17.jar /opt/hive/lib/
```
Connection test
Starting hive requires the hadoop and spark services to be running first.
```
start-all.sh && spark-start.sh
```
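The startup commands are elided; presumably the metastore schema is initialized once and hiveserver2 brought up so node1 can attach with beeline (whose prompt appears below):

```
[root@master ~]# schematool -dbType mysql -initSchema
[root@master ~]# nohup hiveserver2 >/dev/null 2>&1 &
```

and then on node1:

```
[root@node1 ~]# beeline -u jdbc:hive2://master:10000 -n root
```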
Table creation test
On the master machine, prepare the txt file we'll use and upload it to the hdfs filesystem.
```
[root@master ~]# vim t.txt
```
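The file contents and the upload are elided; a tiny sample, assuming comma-separated id/name pairs:

```
[root@master ~]# printf '1,tom\n2,jerry\n3,kk\n' > t.txt
[root@master ~]# hdfs dfs -put t.txt /
```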
Back on node1:
```
0: jdbc:hive2://master:10000> create database k;
```
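A sketch of the rest of the test, with the table layout assumed to match the t.txt sample above:

```
0: jdbc:hive2://master:10000> use k;
0: jdbc:hive2://master:10000> create table t1(id int, name string) row format delimited fields terminated by ',';
0: jdbc:hive2://master:10000> load data inpath '/t.txt' into table t1;
0: jdbc:hive2://master:10000> select * from t1;
```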
flume
Apparently a log-collection application.
Download apache-flume-bin.tar.gz and upload it to the system.
```
# tar xf apache-flume-1.11.0-bin.tar.gz && rm -f apache-flume-1.11.0-bin.tar.gz
# mv apache-flume* /usr/local/flume
# vim /etc/profile
export FLUME_HOME=/usr/local/flume
export PATH=${FLUME_HOME}/bin:$PATH
# source /etc/profile
# cd /usr/local/flume/conf/
# cp flume-env.sh.template flume-env.sh
# vim flume-env.sh
// add this at the very top
export JAVA_HOME=/usr/local/jdk
# vim netcat-logger.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
# flume-ng agent -n a1 -c ./ -f ./netcat-logger.conf -Dflume.root.logger=INFO,console // start the service
# yum install -y telnet
# telnet 127.0.0.1 44444
// type anything and hit Enter; an OK response means it works
```
sqoop
Download the sqoop package and put it under /opt.
Set it up:
```
tar xf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
mv sqoop-1.4.7.bin__hadoop-2.6.0 sqoop
cd sqoop
cp conf/sqoop-env-template.sh conf/sqoop-env.sh
vim conf/sqoop-env.sh
# Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/opt/hadoop285
# Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/opt/hadoop285
# Set the path to where bin/hbase is available
#export HBASE_HOME=
# Set the path to where bin/hive is available
export HIVE_HOME=/opt/hive
# Set the path for where zookeper config dir is
export ZOOCFGDIR=/opt/zookeeper/conf

vim /etc/profile
export SQOOP_HOME=/opt/sqoop
export CLASSPATH=.:${JAVA_HOME}/lib:${SQOOP_HOME}/lib
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${HIVE_HOME}/bin:${SQOOP_HOME}/bin:$PATH

source /etc/profile
cp /opt/hive/lib/mysql-connector-java-8.0.17.jar /opt/sqoop/lib/
```
Edit bin/configure-sqoop and replace the entire file with the following:
```
#!/bin/bash
#
# Copyright 2011 The Apache Software Foundation
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This is sourced in by bin/sqoop to set environment variables prior to
# invoking Hadoop.

bin="$1"

if [ -z "${bin}" ]; then
  bin=`dirname $0`
  bin=`cd ${bin} && pwd`
fi

if [ -z "$SQOOP_HOME" ]; then
  export SQOOP_HOME=${bin}/..
fi

SQOOP_CONF_DIR=${SQOOP_CONF_DIR:-${SQOOP_HOME}/conf}

if [ -f "${SQOOP_CONF_DIR}/sqoop-env.sh" ]; then
  . "${SQOOP_CONF_DIR}/sqoop-env.sh"
fi

# Find paths to our dependency systems. If they are unset, use CDH defaults.

if [ -z "${HADOOP_COMMON_HOME}" ]; then
  if [ -n "${HADOOP_HOME}" ]; then
    HADOOP_COMMON_HOME=${HADOOP_HOME}
  else
    if [ -d "/usr/lib/hadoop" ]; then
      HADOOP_COMMON_HOME=/usr/lib/hadoop
    else
      HADOOP_COMMON_HOME=${SQOOP_HOME}/../hadoop
    fi
  fi
fi
if [ -z "${HADOOP_MAPRED_HOME}" ]; then
  HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
  if [ ! -d "${HADOOP_MAPRED_HOME}" ]; then
    if [ -n "${HADOOP_HOME}" ]; then
      HADOOP_MAPRED_HOME=${HADOOP_HOME}
    else
      HADOOP_MAPRED_HOME=${SQOOP_HOME}/../hadoop-mapreduce
    fi
  fi
fi

# We are setting HADOOP_HOME to HADOOP_COMMON_HOME if it is not set
# so that hcat script works correctly on BigTop
if [ -z "${HADOOP_HOME}" ]; then
  if [ -n "${HADOOP_COMMON_HOME}" ]; then
    HADOOP_HOME=${HADOOP_COMMON_HOME}
    export HADOOP_HOME
  fi
fi

if [ -z "${HBASE_HOME}" ]; then
  if [ -d "/usr/lib/hbase" ]; then
    HBASE_HOME=/usr/lib/hbase
  else
    HBASE_HOME=${SQOOP_HOME}/../hbase
  fi
fi
if [ -z "${HCAT_HOME}" ]; then
  if [ -d "/usr/lib/hive-hcatalog" ]; then
    HCAT_HOME=/usr/lib/hive-hcatalog
  elif [ -d "/usr/lib/hcatalog" ]; then
    HCAT_HOME=/usr/lib/hcatalog
  else
    HCAT_HOME=${SQOOP_HOME}/../hive-hcatalog
    if [ ! -d ${HCAT_HOME} ]; then
      HCAT_HOME=${SQOOP_HOME}/../hcatalog
    fi
  fi
fi
if [ -z "${ACCUMULO_HOME}" ]; then
  if [ -d "/usr/lib/accumulo" ]; then
    ACCUMULO_HOME=/usr/lib/accumulo
  else
    ACCUMULO_HOME=${SQOOP_HOME}/../accumulo
  fi
fi
if [ -z "${ZOOKEEPER_HOME}" ]; then
  if [ -d "/usr/lib/zookeeper" ]; then
    ZOOKEEPER_HOME=/usr/lib/zookeeper
  else
    ZOOKEEPER_HOME=${SQOOP_HOME}/../zookeeper
  fi
fi
if [ -z "${HIVE_HOME}" ]; then
  if [ -d "/usr/lib/hive" ]; then
    export HIVE_HOME=/usr/lib/hive
  elif [ -d ${SQOOP_HOME}/../hive ]; then
    export HIVE_HOME=${SQOOP_HOME}/../hive
  fi
fi

# Check: If we can't find our dependencies, give up here.
if [ ! -d "${HADOOP_COMMON_HOME}" ]; then
  echo "Error: $HADOOP_COMMON_HOME does not exist!"
  echo 'Please set $HADOOP_COMMON_HOME to the root of your Hadoop installation.'
  exit 1
fi
if [ ! -d "${HADOOP_MAPRED_HOME}" ]; then
  echo "Error: $HADOOP_MAPRED_HOME does not exist!"
  echo 'Please set $HADOOP_MAPRED_HOME to the root of your Hadoop MapReduce installation.'
  exit 1
fi

## Moved to be a runtime check in sqoop.
if [ ! -d "${HBASE_HOME}" ]; then
  echo "Warning: $HBASE_HOME does not exist! HBase imports will fail."
  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
fi

## Moved to be a runtime check in sqoop.
if [ ! -d "${HCAT_HOME}" ]; then
  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
fi

# Commented out so sqoop stops warning about the missing Accumulo.
#if [ ! -d "${ACCUMULO_HOME}" ]; then
#  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi

if [ ! -d "${ZOOKEEPER_HOME}" ]; then
  echo "Warning: $ZOOKEEPER_HOME does not exist! Accumulo imports will fail."
  echo 'Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.'
fi

# Where to find the main Sqoop jar
SQOOP_JAR_DIR=$SQOOP_HOME

# If there's a "build" subdir, override with this, so we use
# the newly-compiled copy.
if [ -d "$SQOOP_JAR_DIR/build" ]; then
  SQOOP_JAR_DIR="${SQOOP_JAR_DIR}/build"
fi

function add_to_classpath() {
  dir=$1
  for f in $dir/*.jar; do
    SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:$f;
  done

  export SQOOP_CLASSPATH
}

# Add sqoop dependencies to classpath.
SQOOP_CLASSPATH=""
if [ -d "$SQOOP_HOME/lib" ]; then
  add_to_classpath $SQOOP_HOME/lib
fi

# Add HBase to dependency list
if [ -e "$HBASE_HOME/bin/hbase" ]; then
  TMP_SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:`$HBASE_HOME/bin/hbase classpath`
  SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}
fi

## Add HCatalog to dependency list
if [ -e "${HCAT_HOME}/bin/hcat" ]; then
  TMP_SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:`${HCAT_HOME}/bin/hcat -classpath`
  if [ -z "${HIVE_CONF_DIR}" ]; then
    TMP_SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}:${HIVE_CONF_DIR}
  fi
  SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}
fi

# Add Accumulo to dependency list
if [ -e "$ACCUMULO_HOME/bin/accumulo" ]; then
  for jn in `$ACCUMULO_HOME/bin/accumulo classpath | grep file:.*accumulo.*jar | cut -d':' -f2`; do
    SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn
  done
  for jn in `$ACCUMULO_HOME/bin/accumulo classpath | grep file:.*zookeeper.*jar | cut -d':' -f2`; do
    SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn
  done
fi

ZOOCFGDIR=${ZOOCFGDIR:-/etc/zookeeper}
if [ -d "${ZOOCFGDIR}" ]; then
  SQOOP_CLASSPATH=$ZOOCFGDIR:$SQOOP_CLASSPATH
fi

SQOOP_CLASSPATH=${SQOOP_CONF_DIR}:${SQOOP_CLASSPATH}

# If there's a build subdir, use Ivy-retrieved dependencies too.
if [ -d "$SQOOP_HOME/build/ivy/lib/sqoop" ]; then
  for f in $SQOOP_HOME/build/ivy/lib/sqoop/*/*.jar; do
    SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:$f;
  done
fi
add_to_classpath ${SQOOP_JAR_DIR}

HADOOP_CLASSPATH="${SQOOP_CLASSPATH}:${HADOOP_CLASSPATH}"
if [ ! -z "$SQOOP_USER_CLASSPATH" ]; then
  # User has elements to prepend to the classpath, forcibly overriding
  # Sqoop's own lib directories.
  export HADOOP_CLASSPATH="${SQOOP_USER_CLASSPATH}:${HADOOP_CLASSPATH}"
fi

export SQOOP_CLASSPATH
export SQOOP_CONF_DIR
export SQOOP_JAR_DIR
export HADOOP_CLASSPATH
export HADOOP_COMMON_HOME
export HADOOP_MAPRED_HOME
export HBASE_HOME
export HCAT_HOME
export HIVE_CONF_DIR
export ACCUMULO_HOME
export ZOOKEEPER_HOME
```
Then grant MySQL access for sqoop's JDBC connection:

```
mysql -uroot -p123
create user 'root'@'127.0.0.1' identified by '123';
grant all privileges on *.* to 'root'@'127.0.0.1';
flush privileges;
exit
```

Test the connection
```
sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username root --password 123
```
Output:
```
23/05/20 23:14:23 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
23/05/20 23:14:23 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
23/05/20 23:14:23 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
mysql
information_schema
performance_schema
sys
hive
rsyslog
Syslo
```
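From here a typical next step is an import; a hypothetical example (database and table names are placeholders, not from the original post) pulling a MySQL table into HDFS:

```
sqoop import --connect jdbc:mysql://localhost:3306/testdb --username root --password 123 \
    --table t1 --target-dir /sqoop/t1 -m 1
```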
