After the TaskScheduler has been built, the next core object to construct is the DAGScheduler.
Stepping into its constructor:
We can see that constructing a DAGScheduler instance requires passing in the TaskScheduler instance as a parameter, along with several other collaborators:
LiveListenerBus: the bus that asynchronously delivers scheduler events to registered listeners.
MapOutputTrackerMaster: the driver-side tracker of shuffle map output locations.
BlockManagerMaster: the driver-side endpoint that tracks the executors' block managers and storage.
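For reference, here is a paraphrased sketch of how these pieces are wired together in the Spark 1.x code base (parameter order and defaults vary slightly between versions, so treat it as an outline rather than a verbatim copy of the source):

private[spark] class DAGScheduler(
    private[scheduler] val sc: SparkContext,
    private[scheduler] val taskScheduler: TaskScheduler,
    listenerBus: LiveListenerBus,
    mapOutputTracker: MapOutputTrackerMaster,
    blockManagerMaster: BlockManagerMaster,
    env: SparkEnv)
  extends Logging {

  // Auxiliary constructor used from SparkContext: only the TaskScheduler is
  // passed in explicitly; the other collaborators are pulled out of the
  // SparkContext / SparkEnv.
  def this(sc: SparkContext, taskScheduler: TaskScheduler) = {
    this(
      sc,
      taskScheduler,
      sc.listenerBus,
      sc.env.mapOutputTracker.asInstanceOf[MapOutputTrackerMaster],
      sc.env.blockManager.master,
      sc.env)
  }

  // ... fields and the event-processing setup discussed below ...
}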
Reading the code, we find that when the DAGScheduler is instantiated it calls the initializeEventProcessActor() method:
private def initializeEventProcessActor() {
  // blocking the thread until supervisor is started, which ensures
  // eventProcessActor is not null before any job is submitted
  implicit val timeout = Timeout(30 seconds)
  val initEventActorReply =
    dagSchedulerActorSupervisor ? Props(new DAGSchedulerEventProcessActor(this))
  eventProcessActor = Await.result(initEventActorReply, timeout.duration).
    asInstanceOf[ActorRef]
}

initializeEventProcessActor()
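The ask (?) above only works because dagSchedulerActorSupervisor answers a Props message by actually creating the child actor and replying with its ActorRef. Paraphrased from the Spark 1.x source (again, not a verbatim copy), the supervisor's receive loop looks roughly like this:

private[scheduler] class DAGSchedulerActorSupervisor(dagScheduler: DAGScheduler)
  extends Actor with Logging {

  def receive = {
    // Create the requested child actor (here the DAGSchedulerEventProcessActor)
    // and send its ActorRef back to the asker, which is blocked in
    // initializeEventProcessActor() waiting for exactly this reply.
    case p: Props => sender ! context.actorOf(p)
    case _ => logWarning("received unknown message in DAGSchedulerActorSupervisor")
  }
}

In the actual source the supervisor also installs a supervision strategy, so that if the event-processing actor throws, running jobs are cancelled and the SparkContext is stopped; that is the reason the child is created under a supervisor in the first place.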
DAGSchedulerEventProcessActor:
private[scheduler] class DAGSchedulerEventProcessActor(dagScheduler: DAGScheduler)
  extends Actor with Logging {

  override def preStart() {
    // set DAGScheduler for taskScheduler to ensure eventProcessActor is always
    // valid when the messages arrive
    dagScheduler.taskScheduler.setDAGScheduler(dagScheduler)
  }

  /**
   * The main event loop of the DAG scheduler.
   */
  def receive = {
    case JobSubmitted(jobId, rdd, func, partitions, allowLocal, callSite, listener, properties) =>
      dagScheduler.handleJobSubmitted(jobId, rdd, func, partitions, allowLocal, callSite,
        listener, properties)

    case StageCancelled(stageId) =>
      dagScheduler.handleStageCancellation(stageId)

    case JobCancelled(jobId) =>
      dagScheduler.handleJobCancellation(jobId)

    case JobGroupCancelled(groupId) =>
      dagScheduler.handleJobGroupCancelled(groupId)

    case AllJobsCancelled =>
      dagScheduler.doCancelAllJobs()

    case ExecutorAdded(execId, host) =>
      dagScheduler.handleExecutorAdded(execId, host)

    case ExecutorLost(execId) =>
      dagScheduler.handleExecutorLost(execId, fetchFailed = false)

    case BeginEvent(task, taskInfo) =>
      dagScheduler.handleBeginEvent(task, taskInfo)

    case GettingResultEvent(taskInfo) =>
      dagScheduler.handleGetTaskResult(taskInfo)

    case completion @ CompletionEvent(task, reason, _, _, taskInfo, taskMetrics) =>
      dagScheduler.handleTaskCompletion(completion)

    case TaskSetFailed(taskSet, reason) =>
      dagScheduler.handleTaskSetFailed(taskSet, reason)

    case ResubmitFailedStages =>
      dagScheduler.resubmitFailedStages()
  }

  override def postStop() {
    // Cancel any active jobs in postStop hook
    dagScheduler.cleanUpAfterSchedulerStop()
  }
}
The key point here is the instantiation of eventProcessActor: it receives and dispatches the DAGScheduler's messages and serves as the DAGScheduler's communication vehicle.
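To make the pattern concrete, here is a minimal, self-contained sketch of the same "event loop actor" idea. All names below (SchedulerEvent, Scheduler, EventProcessActor, JobStarted, JobAborted) are made up for illustration and are not Spark APIs: callers never invoke the handle* methods directly, they post an event message, and the actor's single mailbox serializes the handling.

import akka.actor.{Actor, ActorSystem, Props}

// Hypothetical event types, standing in for JobSubmitted, JobCancelled, etc.
sealed trait SchedulerEvent
case class JobStarted(jobId: Int) extends SchedulerEvent
case class JobAborted(jobId: Int) extends SchedulerEvent

// Stand-in for the DAGScheduler: the real work lives in plain methods.
class Scheduler {
  def handleJobStarted(jobId: Int): Unit = println(s"handling start of job $jobId")
  def handleJobAborted(jobId: Int): Unit = println(s"handling abort of job $jobId")
}

// Stand-in for DAGSchedulerEventProcessActor: pattern-match on the event type
// and delegate to the corresponding handler.
class EventProcessActor(scheduler: Scheduler) extends Actor {
  def receive = {
    case JobStarted(id) => scheduler.handleJobStarted(id)
    case JobAborted(id) => scheduler.handleJobAborted(id)
  }
}

object EventLoopSketch extends App {
  val system = ActorSystem("sketch")
  val eventProcessActor = system.actorOf(Props(new EventProcessActor(new Scheduler)))

  // Fire-and-forget, just like the DAGScheduler posting events to its actor.
  eventProcessActor ! JobStarted(0)
  eventProcessActor ! JobAborted(0)

  Thread.sleep(500)   // give the actor time to drain its mailbox before shutdown
  system.shutdown()   // Akka 2.3-era API, matching the Spark 1.x generation
}

Because every event goes through one mailbox, the handler methods never run concurrently with each other, which is what lets the DAGScheduler keep its internal bookkeeping free of explicit locking.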