Azkaban Installation and Configuration
Software Preparation
- Azkaban Web Server azkaban-web-server-2.5.0.tar.gz
- Azkaban Executor Server azkaban-executor-server-2.5.0.tar.gz
- Azkaban MySQL Setup Script azkaban-sql-script-2.5.0.tar.gz
- Azkaban Plugins
- HDFS Browser azkaban-hdfs-viewer-2.5.0.tar.gz
- Job Types Plugins azkaban-jobtype-2.5.0.tar.gz
- Job Summary azkaban-jobsummary-2.5.0.tar.gz
- Reportal azkaban-reportal-2.5.0.tar.gz
Installing Azkaban
Installing MySQL
1. Install MySQL.
2. Create the 'azkaban' database and grant privileges:
- Create the database: mysql> create database azkaban;
- Create the user and password: mysql> create user 'azkaban'@'%' identified by 'azkaban';
- Grant the user the required privileges: mysql> grant select,insert,update,delete on azkaban.* to 'azkaban'@'%' WITH GRANT OPTION;
If you need to upload jar packages through the web page, edit the configuration in /etc/my.cnf:
[mysqld]
max_allowed_packet=1024M
3. If max_allowed_packet was changed, restart the MySQL service.
4. Run the "create-all-sql" script from the downloaded azkaban-sql-script-2.5.0.tar.gz package.
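A minimal sketch of step 4, assuming the SQL package unpacks into ./azkaban-2.5.0 and the combined script is named create-all-sql-2.5.0.sql (check the actual names after extraction):

# Unpack the SQL scripts and load them into the azkaban database
tar -zxvf azkaban-sql-script-2.5.0.tar.gz
cd azkaban-2.5.0                      # extracted directory name may differ
mysql -uazkaban -pazkaban azkaban < create-all-sql-2.5.0.sql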
Azkaban-Web-Server Installation and Configuration
Extract azkaban-web-server-2.5.0.tar.gz; only azkaban.properties needs to be modified.
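A minimal sketch of the extraction step, assuming everything is installed under /opt/azkaban (the install path and extracted directory name are assumptions):

tar -zxvf azkaban-web-server-2.5.0.tar.gz -C /opt/azkaban
cd /opt/azkaban/azkaban-web-2.5.0     # extracted directory name may differ
vi conf/azkaban.properties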
Changes to azkaban.properties:
# Must be set to Asia/Shanghai, otherwise jobs run on US time
default.timezone.id=Asia/Shanghai
database.type=mysql
mysql.port=3306
# Host where MySQL runs
mysql.host=localhost
# Change to your database name
mysql.database=azkaban
# Change to your database user
mysql.user=azkaban
# Change to your database password
mysql.password=azkaban
# Mail service settings
[email protected]
mail.host=smtp.getui.com
[email protected]
mail.password=*****
Jetty SSL configuration (keytool is a Java tool):
# Generate the Jetty SSL keystore
keytool -genkey -keystore keystore -alias jetty-azkaban -keyalg RSA -validity 3560
# Export the certificate
keytool -export -alias jetty-azkaban -keystore keystore -rfc -file selfsignedcert.cer
# Import it into the truststore
keytool -import -alias certificatekey -file selfsignedcert.cer -keystore truststore
# Update the configuration file
jetty.maxThreads=25
jetty.ssl.port=8443
jetty.port=8081
# Path of the keystore, relative to the web server directory
jetty.keystore=keystore
jetty.password=***
jetty.keypassword=***
jetty.truststore=truststore
jetty.trustpassword=***
Starting and Stopping Azkaban-Web-Server
1. Start the web server:
nohup ./bin/azkaban-web-start.sh &
2. Visit https://localhost:8443. The default username and password are both azkaban; to change them, edit the conf/azkaban-users.xml file.
3. Stop the web server:
nohup ./bin/azkaban-web-shutdown.sh &
Azkaban-Executor-Server Installation and Configuration
Extract azkaban-executor-server-2.5.0.tar.gz and modify conf/azkaban.properties:
default.timezone.id=Asia/Shanghai
database.type=mysql
mysql.port=3306
mysql.host=localhost
mysql.database=azkaban
mysql.user=azkaban
mysql.password=azkaban
Starting and Stopping Azkaban-Executor-Server
1. Start the executor server:
nohup ./bin/azkaban-executor-start.sh &
2. Stop the executor server:
nohup ./bin/azkaban-executor-shutdown.sh &
Azkaban Plugins
HDFS Viewer Plugin
- Extract azkaban-hdfs-viewer-2.5.0.tar.gz into the $AZKABAN_WEB_SERVER/plugins/viewer directory and rename the extracted directory to hdfs (commands are sketched after the jar list below).
Edit the conf/plugin.properties file:
viewer.name=HDFS
viewer.path=hdfs
viewer.order=1
viewer.hidden=false
viewer.external.classpaths=extlib/*
viewer.servlet.class=azkaban.viewer.hdfs.HdfsBrowserServlet
# For Hadoop 2.x
hadoop.security.manager.class=azkaban.security.HadoopSecurityManager_H_2_0
azkaban.should.proxy=true
proxy.user=azkaban
proxy.keytab.location=
allow.group.proxy=true
file.max.lines=1000
Place the following jars into extlib:
commons-cli-1.2.jar
hadoop-auth-2.6.0-cdh5.4.7.jar
hadoop-common-2.6.0-cdh5.4.7.jar
hadoop-hdfs-2.6.0-cdh5.4.7.jar
protobuf-java-2.5.0.jar
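A minimal sketch of the extraction, rename, and jar-copy steps above, assuming $AZKABAN_WEB_SERVER points at the web server install directory and the jars come from a CDH parcel layout (both assumptions):

# Extract the viewer plugin and rename it to hdfs
tar -zxvf azkaban-hdfs-viewer-2.5.0.tar.gz -C "$AZKABAN_WEB_SERVER/plugins/viewer"
cd "$AZKABAN_WEB_SERVER/plugins/viewer"
mv azkaban-hdfs-viewer-2.5.0 hdfs     # extracted directory name may differ
mkdir -p hdfs/extlib                  # extlib is resolved relative to the plugin directory
# Copy the required client jars (the CDH locations below are assumptions)
CDH_LIB=/opt/cloudera/parcels/CDH/lib
cp "$CDH_LIB"/hadoop/lib/commons-cli-1.2.jar              hdfs/extlib/
cp "$CDH_LIB"/hadoop/hadoop-auth-2.6.0-cdh5.4.7.jar       hdfs/extlib/
cp "$CDH_LIB"/hadoop/hadoop-common-2.6.0-cdh5.4.7.jar     hdfs/extlib/
cp "$CDH_LIB"/hadoop-hdfs/hadoop-hdfs-2.6.0-cdh5.4.7.jar  hdfs/extlib/
cp "$CDH_LIB"/hadoop/lib/protobuf-java-2.5.0.jar          hdfs/extlib/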
JobType Plugin
- Set the HADOOP_HOME environment variable (see the sketch after this list).
- Extract azkaban-jobtype-2.5.0.tar.gz into the $AZKABAN_EXECUTOR_HOME/plugins/jobtype directory.
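A minimal sketch of both steps, assuming a CDH parcel layout and that $AZKABAN_EXECUTOR_HOME points at the executor install directory (both assumptions):

# Set HADOOP_HOME system-wide (the CDH parcel path is an assumption)
echo 'export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop' >> /etc/profile
source /etc/profile
# Extract the jobtype plugins into the executor's plugin directory
mkdir -p "$AZKABAN_EXECUTOR_HOME/plugins/jobtype"
tar -zxvf azkaban-jobtype-2.5.0.tar.gz -C "$AZKABAN_EXECUTOR_HOME/plugins/jobtype"
# If the archive extracts into a versioned subdirectory, move its contents up one level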
Edit the jobtype properties configuration:
common.properties
hadoop.home=/opt/cloudera/parcels/CDH/lib/hadoop
hive.home=/opt/cloudera/parcels/CDH/lib/hive
pig.home=/opt/cloudera/parcels/CDH/lib/pig
azkaban.should.proxy=false
jobtype.global.classpath=${hadoop.home}/*.jar,${hadoop.home}/*,${hadoop.home}/lib/*,${hadoop.home}/etc/hadoop/*
commonprivate.properties
azkaban.should.proxy=false
obtain.binary.token=false
hadoop.home=/opt/cloudera/parcels/CDH/lib/hadoop
pig.home=/opt/cloudera/parcels/CDH/lib/pig
hive.home=/opt/cloudera/parcels/CDH/lib/hive
- Point azkaban-executor at the jobtype plugin directory by adding the following line to its azkaban.properties:
azkaban.jobtype.plugin.dir=plugins/jobtype
Restart azkaban-executor:
user@ae01:$AZKABAN_EXECUTOR_HOME$ sh bin/azkaban-executor-shutdown.sh
user@ae01:$AZKABAN_EXECUTOR_HOME$ sh bin/azkaban-executor-start.sh
Example hadoopJava jobtype job
type=hadoopJava
job.class=azkaban.jobtype.examples.java.WordCount
classpath=/home/lanyz/azkaban-executor-2.5.0/lib/*,/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hadoop/lib/*.jar,/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hadoop/hadoop-common-2.6.0-cdh5.4.7.jar,/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/lib/hadoop/hadoop-auth-2.6.0-cdh5.4.7.jar
main.args=hdfs://192.168.10.84:8020/user/lanyizheng/input/test hdfs://192.168.10.84:8020/user/lanyizheng/output
Method.run=test();
force.output.overwrite=true
input.path=hdfs://192.168.10.84:8020/user/lanyizheng/input/test
output.path=hdfs://192.168.10.84:8020/user/lanyizheng/output
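A minimal sketch of how this job definition would typically be packaged and uploaded; the file and archive names (wordcount.job, wordcount.zip) are assumptions:

# Save the properties above as wordcount.job, zip it, and upload the zip
mkdir -p wordcount && cd wordcount
vi wordcount.job                      # paste the job definition shown above
zip ../wordcount.zip wordcount.job
# Then create a project in the web UI (https://localhost:8443) and upload wordcount.zip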
JobSummary Plugin
- Extract azkaban-jobsummary-2.5.0.tar.gz into the $AZKABAN_WEB_SERVER/plugins/viewer directory and rename the extracted directory to jobsummary.
Reportal Plugin
azkaban-web-server configuration
- Extract azkaban-reportal-2.5.0.tar.gz and copy the ./viewer/reportal directory from the extracted folder into the $AZKABAN_WEB_SERVER/plugins/viewer directory.
- Replace azkaban-hadoopsecuritymanager-2.2.0.jar in that directory with azkaban-hadoopsecuritymanager-2.5.0.jar to support Hadoop 2.x (see the sketch after this list).
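A minimal sketch of these two steps, assuming the tarball unpacks into ./azkaban-reportal-2.5.0 and the 2.5.0 security manager jar is available locally (assumptions, including the jar's location inside the plugin):

tar -zxvf azkaban-reportal-2.5.0.tar.gz
cp -r azkaban-reportal-2.5.0/viewer/reportal "$AZKABAN_WEB_SERVER/plugins/viewer/"
cd "$AZKABAN_WEB_SERVER/plugins/viewer/reportal"
# Swap the security manager jar so the viewer works against Hadoop 2.x
rm -f lib/azkaban-hadoopsecuritymanager-2.2.0.jar          # location may differ
cp /path/to/azkaban-hadoopsecuritymanager-2.5.0.jar lib/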
Edit $AZKABAN_WEB_SERVER/plugins/viewer/reportal/conf/plugin.properties:
viewer.name=Reportal
viewer.path=reportal
viewer.order=1
viewer.hidden=false
viewer.external.classpaths=extlib/*
viewer.servlet.class=azkaban.viewer.reportal.ReportalServlet
azkaban.should.proxy=true
proxy.user=azkaban
proxy.keytab.location=
allow.group.proxy=true
reportal.output.filesystem=hdfs
hadoop.security.manager.class=azkaban.security.HadoopSecurityManager_H_2_0
azkaban-executor-server configuration
- Extract azkaban-reportal-2.5.0.tar.gz and copy the reportal directories under ./jobtypes in the extracted folder into the $AZKABAN_EXECUTOR_SERVER/plugins/jobtypes directory.
- Update the dependency jars: in the root directories of the reportalhive and reportaldatacollector plugins, replace both the azkaban-hadoopsecuritymanager and azkaban-jobtype jars with their 2.5 versions.
- Configure reportalhive: in both plugin.properties and private.properties, set the hadoop.home and hive.home that match your system.
- Copy the jars from the system's Hadoop installation into the reportal plugin's lib directory (see the sketch after this list).
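A minimal sketch of these executor-side steps, assuming a CDH parcel layout, that $AZKABAN_EXECUTOR_SERVER points at the executor install directory, and that the 2.5 jars are available locally (all assumptions):

tar -zxvf azkaban-reportal-2.5.0.tar.gz
cp -r azkaban-reportal-2.5.0/jobtypes/reportal* "$AZKABAN_EXECUTOR_SERVER/plugins/jobtypes/"
cd "$AZKABAN_EXECUTOR_SERVER/plugins/jobtypes"
# Replace the bundled jars with the 2.5 versions in each reportal plugin
for d in reportalhive reportaldatacollector; do
  rm -f "$d"/azkaban-hadoopsecuritymanager-*.jar "$d"/azkaban-jobtype-*.jar
  cp /path/to/azkaban-hadoopsecuritymanager-2.5.0.jar /path/to/azkaban-jobtype-2.5.0.jar "$d"/
done
# Copy the system Hadoop jars into the plugin's lib directory
cp /opt/cloudera/parcels/CDH/lib/hadoop/*.jar reportalhive/lib/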
Because azkaban-reportal-2.5.jar contains a bug, it needs to be modified, rebuilt, and used to replace the old jar. In ReportalHiveRunner.java, remove the following if condition:
if (!ShimLoader.getHadoopShims().usesJobShell()) { ... ... }
Then change into ${AZKABAN_PLUGINS_SOURCE}/plugins/reportal and run sudo ant to generate ${AZKABAN_PLUGINS_SOURCE}/dist/reportal/jars/azkaban-reportal-2.5.jar.
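A minimal sketch of the rebuild and replacement, assuming the plugin sources are checked out at ${AZKABAN_PLUGINS_SOURCE} and the rebuilt jar replaces the one in the reportalhive plugin (the destination path is an assumption):

# Rebuild the reportal jar after editing ReportalHiveRunner.java
cd "${AZKABAN_PLUGINS_SOURCE}/plugins/reportal"
sudo ant
# Copy the rebuilt jar over the old azkaban-reportal-2.5.jar (destination may differ)
cp "${AZKABAN_PLUGINS_SOURCE}/dist/reportal/jars/azkaban-reportal-2.5.jar" \
   "$AZKABAN_EXECUTOR_SERVER/plugins/jobtypes/reportalhive/lib/"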