Spark Learning - SparkSQL - 02 - Spark History Server
Configuring and Using the Spark History Server
1. Background of the Spark History Server
Take standalone mode as an example: while a Spark application is running, Spark provides a Web UI that lists the application's runtime information. However, this Web UI shuts down as soon as the application finishes (whether it succeeds or fails), which means that once an application has completed, its history can no longer be viewed.
The Spark History Server was created to address exactly this. With the proper configuration, event log information is recorded while the application executes; after the application finishes, the Web UI can re-render that information and display the application's runtime details.
When Spark runs on YARN or Mesos, the History Server can likewise reconstruct the runtime information of a completed application, provided its event logs were recorded.
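For the History Server to have anything to show, applications must first be configured to write their event logs to a shared location (typically HDFS in a cluster). A minimal sketch of the relevant entries in `$SPARK_HOME/conf/spark-defaults.conf`; the HDFS path mirrors the one used later in this article and is only an example:

```properties
# Enable event logging for every application submitted with this config
spark.eventLog.enabled    true
# Directory applications write their event logs to (must already exist in HDFS)
spark.eventLog.dir        hdfs://mycluster:8020/spark_job_history
```

Without `spark.eventLog.enabled`, finished applications leave no logs behind and the History Server has nothing to render.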
Configuring & Using the Spark History Server
Start the History Server with the default configuration:
cd $SPARK_HOME/sbin
./start-history-server.sh
This fails with an error:
starting org.apache.spark.deploy.history.HistoryServer, logging to /home/spark/software/source/compile/deploy_spark/sbin/../logs/spark-spark-org.apache.spark.deploy.history.HistoryServer-1-hadoop000.out
failed to launch org.apache.spark.deploy.history.HistoryServer:
at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:44)
... 6 more
Restarting with an explicit HDFS log directory passed on the command line still fails:
[root@biluos logs]# /opt/moudles/spark-2.2.0-bin-hadoop2.7/sbin/start-history-server.sh hdfs://mycluster:8020/spark_job_history
starting org.apache.spark.deploy.history.HistoryServer, logging to /opt/moudles/spark-2.2.0-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.history.HistoryServer-1-biluos.com.out
[root@biluos logs]# cat spark-root-org.apache.spark.deploy.history.HistoryServer-1-biluos.com.out
Spark Command: /opt/moudles/jdk1.8.0_121/bin/java -cp /opt/moudles/spark-2.2.0-bin-hadoop2.7/conf/:/opt/moudles/spark-2.2.0-bin-hadoop2.7/jars/*:/opt/moudles/hadoop-2.7.3/etc/hadoop/ -Xmx1g org.apache.spark.deploy.history.HistoryServer hdfs://mycluster:8020/spark_job_history
========================================
17/08/03 03:22:18 INFO HistoryServer: Started daemon with process name: 2666@biluos.com
17/08/03 03:22:18 INFO SignalUtils: Registered signal handler for TERM
17/08/03 03:22:18 INFO SignalUtils: Registered signal handler for HUP
17/08/03 03:22:18 INFO SignalUtils: Registered signal handler for INT
17/08/03 03:22:18 WARN HistoryServerArguments: Setting log directory through the command line is deprecated as of Spark 1.1.0. Please set this through spark.history.fs.logDirectory instead.
17/08/03 03:22:19 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/08/03 03:22:19 INFO SecurityManager: Changing view acls to: root
17/08/03 03:22:19 INFO SecurityManager: Changing modify acls to: root
17/08/03 03:22:19 INFO SecurityManager: Changing view acls groups to:
17/08/03 03:22:19 INFO SecurityManager: Changing modify acls groups to:
17/08/03 03:22:19 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
17/08/03 03:22:19 INFO FsHistoryProvider: History server ui acls disabled; users with admin permissions: ; groups with admin permissions
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:278)
at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
Caused by: java.io.FileNotFoundException: Log directory specified does not exist: hdfs://mycluster:8020/spark_job_history
at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$startPolling(FsHistoryProvider.scala:214)
at org.apache.spark.deploy.history.FsHistoryProvider.initialize(FsHistoryProvider.scala:160)
at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:156)
at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:78)
... 6 more
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://mycluster:8020/spark_job_history
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$startPolling(FsHistoryProvider.scala:204)
... 9 more
Solution
[root@biluos logs]# hdfs dfs -mkdir /spark_job_history
The root cause is simply that the log directory did not exist in HDFS. After creating it and restarting, the error is gone.
The resulting Web UI is shown in the figure.
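Note that the startup log above also warned that passing the log directory on the command line has been deprecated since Spark 1.1.0. The recommended approach is to set `spark.history.fs.logDirectory` in `spark-defaults.conf` and start the server without arguments. A sketch, reusing the path from this article:

```properties
# spark-defaults.conf: directory the History Server polls for event logs
spark.history.fs.logDirectory    hdfs://mycluster:8020/spark_job_history
# Optional: port for the History Server web UI (18080 is the default)
spark.history.ui.port            18080
```

With this in place, `sbin/start-history-server.sh` can be run with no arguments, and the UI is reachable at http://&lt;host&gt;:18080.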