package spark
Core Spark functionality. org.apache.spark.SparkContext serves as the main entry point to Spark, while org.apache.spark.rdd.RDD is the data type representing a distributed collection, and provides most parallel operations.
In addition, org.apache.spark.rdd.PairRDDFunctions contains operations available only on RDDs
of key-value pairs, such as groupByKey and join; org.apache.spark.rdd.DoubleRDDFunctions
contains operations available only on RDDs of Doubles; and
org.apache.spark.rdd.SequenceFileRDDFunctions contains operations available on RDDs that can
be saved as SequenceFiles. These operations are automatically available on any RDD of the right
type (e.g. RDD[(Int, Int)]) through implicit conversions.
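As an illustration of the implicit conversions described above, the following minimal sketch calls reduceByKey (from PairRDDFunctions) directly on an RDD[(Int, Int)]. It assumes spark-core is on the classpath and uses a local master; the object name, app name and sample data are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; assumes spark-core is available.
object PairRddSketch {
  def sumsByKey(): Map[Int, Int] = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("pair-rdd-sketch")
    val sc = new SparkContext(conf)
    try {
      // An RDD of key-value pairs: PairRDDFunctions methods become available implicitly.
      val pairs = sc.parallelize(Seq((1, 2), (1, 3), (2, 5)))
      // reduceByKey is defined on PairRDDFunctions, not on RDD itself.
      pairs.reduceByKey(_ + _).collect().toMap
    } finally {
      sc.stop() // always release the context
    }
  }
}
```

No explicit import of the conversion is needed here; the compiler finds it because the element type is a pair.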
Java programmers should reference the org.apache.spark.api.java package for Spark programming APIs in Java.
Classes and methods marked with Experimental are user-facing features which have not been officially adopted by the Spark project. These are subject to change or removal in minor releases.
Classes and methods marked with Developer API are intended for advanced users who want to extend Spark through lower-level interfaces. These are subject to change or removal in minor releases.
- Source
- package.scala
Type Members
- case class Aggregator[K, V, C](createCombiner: (V) ⇒ C, mergeValue: (C, V) ⇒ C, mergeCombiners: (C, C) ⇒ C) extends Product with Serializable
  :: DeveloperApi :: A set of functions used to aggregate data.
  - createCombiner: function to create the initial value of the aggregation.
  - mergeValue: function to merge a new value into the aggregation result.
  - mergeCombiners: function to merge outputs from multiple mergeValue functions.
  - Annotations: @DeveloperApi()
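The roles of the three Aggregator functions can be sketched in plain Scala, without Spark. This is only an illustration of the idea: the object, helper names and sample data below are hypothetical, and the two sequences stand in for two partitions.

```scala
// Plain-Scala sketch (no Spark dependency); all names here are hypothetical.
object AggregatorSketch {
  type SumCount = (Double, Int)

  // createCombiner: turn the first value seen for a key into a combiner.
  val createCombiner: Double => SumCount = v => (v, 1)
  // mergeValue: fold another value from the same partition into the combiner.
  val mergeValue: (SumCount, Double) => SumCount = (c, v) => (c._1 + v, c._2 + 1)
  // mergeCombiners: merge combiners that were built on different partitions.
  val mergeCombiners: (SumCount, SumCount) => SumCount =
    (a, b) => (a._1 + b._1, a._2 + b._2)

  // Combine the records of one partition, key by key.
  def combinePartition(records: Seq[(String, Double)]): Map[String, SumCount] =
    records.foldLeft(Map.empty[String, SumCount]) { case (acc, (k, v)) =>
      val updated = acc.get(k) match {
        case Some(c) => mergeValue(c, v)
        case None    => createCombiner(v)
      }
      acc.updated(k, updated)
    }

  // Merge the per-partition results, as would happen after a shuffle.
  def mergePartitions(parts: Seq[Map[String, SumCount]]): Map[String, SumCount] =
    parts.flatten.groupBy(_._1).map { case (k, kvs) =>
      k -> kvs.map(_._2).reduce(mergeCombiners)
    }

  val partition1 = Seq("a" -> 1.0, "b" -> 4.0, "a" -> 3.0)
  val partition2 = Seq("a" -> 5.0, "b" -> 6.0)

  val totals: Map[String, SumCount] =
    mergePartitions(Seq(combinePartition(partition1), combinePartition(partition2)))

  // Per-key average derived from the (sum, count) combiners.
  val averages: Map[String, Double] =
    totals.map { case (k, (sum, n)) => k -> (sum / n) }
}
```

Keeping a (sum, count) pair as the combiner type C is what allows partial results from separate partitions to be merged correctly; averaging the per-partition averages would not give the same answer.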
 
- class BarrierTaskContext extends TaskContext with Logging
  :: Experimental :: A TaskContext with extra contextual info and tooling for tasks in a barrier stage. Use BarrierTaskContext#get to obtain the barrier context for a running barrier task.
  - Annotations: @Experimental() @Since( "2.4.0" )
 
- class BarrierTaskInfo extends AnyRef
  :: Experimental :: Carries all task infos of a barrier task.
  - Annotations: @Experimental() @Since( "2.4.0" )
 
- class ComplexFutureAction[T] extends FutureAction[T]
  A FutureAction for actions that could trigger multiple Spark jobs. Examples include take, takeSample. Cancellation works by setting the cancelled flag to true and cancelling any pending jobs.
  - Annotations: @DeveloperApi()
 
- class ContextAwareIterator[+T] extends Iterator[T]
  :: DeveloperApi :: A TaskContext aware iterator. As the Python evaluation consumes the parent iterator in a separate thread, it could consume more data from the parent even after the task ends and the parent is closed. If an off-heap access exists in the parent iterator, it could cause a segmentation fault that crashes the executor. Thus, we should use ContextAwareIterator to stop consuming after the task ends.
  - Annotations: @DeveloperApi()
  - Since: 3.1.0
 
- abstract class Dependency[T] extends Serializable
  :: DeveloperApi :: Base class for dependencies.
  - Annotations: @DeveloperApi()
 
- case class ExceptionFailure(className: String, description: String, stackTrace: Array[StackTraceElement], fullStackTrace: String, exceptionWrapper: Option[ThrowableSerializationWrapper], accumUpdates: Seq[AccumulableInfo] = Seq.empty, accums: Seq[AccumulatorV2[_, _]] = Nil, metricPeaks: Seq[Long] = Seq.empty) extends TaskFailedReason with Product with Serializable
  :: DeveloperApi :: Task failed due to a runtime exception. This is the most common failure case and also captures user program exceptions.
  stackTrace contains the stack trace of the exception itself. It still exists for backward compatibility. It's better to use this(e: Throwable, metrics: Option[TaskMetrics]) to create ExceptionFailure as it will handle the backward compatibility properly.
  fullStackTrace is a better representation of the stack trace because it contains the whole stack trace including the exception and its causes.
  exception is the actual exception that caused the task to fail. It may be None in the case that the exception is not in fact serializable. If a task fails more than once (due to retries), exception is the one that caused the last failure.
  - Annotations: @DeveloperApi()
 
- case class ExecutorLostFailure(execId: String, exitCausedByApp: Boolean = true, reason: Option[String]) extends TaskFailedReason with Product with Serializable
  :: DeveloperApi :: The task failed because the executor that it was running on was lost. This may happen because the task crashed the JVM.
  - Annotations: @DeveloperApi()
 
- case class FetchFailed(bmAddress: BlockManagerId, shuffleId: Int, mapId: Long, mapIndex: Int, reduceId: Int, message: String) extends TaskFailedReason with Product with Serializable
  :: DeveloperApi :: Task failed to fetch shuffle data from a remote node. This probably means we have lost the remote executors the task is trying to fetch from, and thus need to rerun the previous stage.
  - Annotations: @DeveloperApi()
 
- trait FutureAction[T] extends Future[T]
  A future for the result of an action to support cancellation. This is an extension of the Scala Future interface to support cancellation.
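A FutureAction is typically obtained from one of the asynchronous RDD actions. The sketch below, which assumes spark-core on the classpath and a local master, uses countAsync() and waits for the result; the object name, app name and timeout are hypothetical choices.

```scala
import scala.concurrent.Await
import scala.concurrent.duration._
import org.apache.spark.{FutureAction, SparkConf, SparkContext}

// Hypothetical example object; assumes spark-core is available.
object FutureActionSketch {
  def asyncCount(): Long = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("future-action-sketch")
    val sc = new SparkContext(conf)
    try {
      // countAsync() submits the job and returns immediately with a FutureAction.
      val future: FutureAction[Long] = sc.parallelize(1 to 1000, 4).countAsync()
      // Unlike a plain Scala Future, the job could be cancelled here with future.cancel().
      Await.result(future, 60.seconds)
    } finally {
      sc.stop()
    }
  }
}
```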
- class HashPartitioner extends Partitioner
  An org.apache.spark.Partitioner that implements hash-based partitioning using Java's Object.hashCode.
  Java arrays have hashCodes that are based on the arrays' identities rather than their contents, so attempting to partition an RDD[Array[_]] or RDD[(Array[_], _)] using a HashPartitioner will produce an unexpected or incorrect result.
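The array caveat above can be seen in plain Scala, without Spark. The partitionFor helper below is a hypothetical stand-in for hash-based partition assignment (a non-negative modulus of hashCode); it is a sketch of the idea, not Spark's source code.

```scala
// Plain-Scala sketch (no Spark dependency); all names here are hypothetical.
object ArrayKeySketch {
  // Map a key to a partition id in the range 0 to numPartitions - 1.
  def partitionFor(key: Any, numPartitions: Int): Int =
    if (key == null) 0
    else {
      val raw = key.hashCode % numPartitions
      if (raw < 0) raw + numPartitions else raw
    }

  val a = Array(1, 2, 3)
  val b = Array(1, 2, 3)

  // Arrays compare by reference, so equal contents do not make equal keys.
  val arraysEqual: Boolean = a == b
  // An immutable Vector has content-based equals and hashCode.
  val vectorsEqual: Boolean = a.toVector == b.toVector
  // Equal Vectors always hash to the same partition.
  val samePartition: Boolean =
    partitionFor(a.toVector, 8) == partitionFor(b.toVector, 8)
}
```

Converting array keys to an immutable collection before partitioning is one way to get content-based hashing; whether that cost is acceptable depends on the workload.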
- class InterruptibleIterator[+T] extends Iterator[T]
  :: DeveloperApi :: An iterator that wraps around an existing iterator to provide task killing functionality. It works by checking the interrupted flag in TaskContext.
  - Annotations: @DeveloperApi()
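The flag-checking pattern described for InterruptibleIterator can be sketched in plain Scala. Everything below is hypothetical: an AtomicBoolean stands in for the interrupted flag in TaskContext, and InterruptedException stands in for whatever exception Spark raises when a task is killed.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Plain-Scala sketch (no Spark dependency); all names here are hypothetical.
object InterruptibleSketch {
  // Wraps a delegate iterator and checks a kill flag before every hasNext.
  final class CancellableIterator[T](interrupted: AtomicBoolean, delegate: Iterator[T])
      extends Iterator[T] {
    override def hasNext: Boolean = {
      if (interrupted.get()) throw new InterruptedException("task was killed")
      delegate.hasNext
    }
    override def next(): T = delegate.next()
  }

  // Consumes an endless iterator until the flag is set, then reports how far it got.
  def consumeUntilKilled(): Int = {
    val flag = new AtomicBoolean(false)
    val it = new CancellableIterator(flag, Iterator.from(1))
    var consumed = 0
    try {
      while (it.hasNext) {
        it.next()
        consumed += 1
        if (consumed == 3) flag.set(true) // simulate a kill request arriving mid-task
      }
    } catch {
      case _: InterruptedException => // the wrapper ended the loop
    }
    consumed
  }
}
```

Checking the flag in hasNext means a long-running loop over the iterator stops at the next element boundary instead of running to completion.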
 
-  sealed abstract final class JobExecutionStatus extends Enum[JobExecutionStatus]
- trait JobSubmitter extends AnyRef
  Handle via which a "run" function passed to a ComplexFutureAction can submit jobs for execution.
  - Annotations: @DeveloperApi()
 
- abstract class NarrowDependency[T] extends Dependency[T]
  :: DeveloperApi :: Base class for dependencies where each partition of the child RDD depends on a small number of partitions of the parent RDD. Narrow dependencies allow for pipelined execution.
  - Annotations: @DeveloperApi()
 
- class OneToOneDependency[T] extends NarrowDependency[T]
  :: DeveloperApi :: Represents a one-to-one dependency between partitions of the parent and child RDDs.
  - Annotations: @DeveloperApi()
 
- trait Partition extends Serializable
  An identifier for a partition in an RDD.
- abstract class Partitioner extends Serializable
  An object that defines how the elements in a key-value pair RDD are partitioned by key. Maps each key to a partition ID, from 0 to numPartitions - 1.
  Note that a partitioner must be deterministic, i.e. it must return the same partition id given the same partition key.
- class RangeDependency[T] extends NarrowDependency[T]
  :: DeveloperApi :: Represents a one-to-one dependency between ranges of partitions in the parent and child RDDs.
  - Annotations: @DeveloperApi()
 
- class RangePartitioner[K, V] extends Partitioner
  An org.apache.spark.Partitioner that partitions sortable records by range into roughly equal ranges. The ranges are determined by sampling the content of the RDD passed in.
  - Note: The actual number of partitions created by the RangePartitioner might not be the same as the partitions parameter, in the case where the number of sampled records is less than the value of partitions.
 
- class SerializableWritable[T <: Writable] extends Serializable
  - Annotations: @DeveloperApi()
 
- class ShuffleDependency[K, V, C] extends Dependency[Product2[K, V]] with Logging
  :: DeveloperApi :: Represents a dependency on the output of a shuffle stage. Note that in the case of shuffle, the RDD is transient since we don't need it on the executor side.
  - Annotations: @DeveloperApi()
 
- class SimpleFutureAction[T] extends FutureAction[T]
  A FutureAction holding the result of an action that triggers a single job. Examples include count, collect, reduce.
  - Annotations: @DeveloperApi()
 
- class SparkConf extends Cloneable with Logging with Serializable
  Configuration for a Spark application. Used to set various Spark parameters as key-value pairs.
  Most of the time, you would create a SparkConf object with new SparkConf(), which will load values from any spark.* Java system properties set in your application as well. In this case, parameters you set directly on the SparkConf object take priority over system properties.
  For unit tests, you can also call new SparkConf(false) to skip loading external settings and get the same configuration no matter what the system properties are.
  All setter methods in this class support chaining. For example, you can write new SparkConf().setMaster("local").setAppName("My app").
  - Note: Once a SparkConf object is passed to Spark, it is cloned and can no longer be modified by the user. Spark does not support modifying the configuration at runtime.
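A minimal sketch of the chaining style and of new SparkConf(false), assuming spark-core is on the classpath. The object name, app name and the chosen property values are hypothetical; no SparkContext is started.

```scala
import org.apache.spark.SparkConf

// Hypothetical example object; assumes spark-core is available.
object SparkConfSketch {
  // Passing false skips spark.* system properties, so the result is reproducible in tests.
  val conf: SparkConf = new SparkConf(false)
    .setMaster("local[2]")               // stored under the key spark.master
    .setAppName("conf-sketch")           // stored under the key spark.app.name
    .set("spark.executor.memory", "1g")  // any other parameter as a key-value pair

  val master: String = conf.get("spark.master")
  val appName: String = conf.get("spark.app.name")
  // Keys that were never set can be read with a fallback value.
  val cores: String = conf.get("spark.executor.cores", "1")
}
```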
 
- class SparkContext extends Logging
  Main entry point for Spark functionality. A SparkContext represents the connection to a Spark cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster.
  - Note: Only one SparkContext should be active per JVM. You must stop() the active SparkContext before creating a new one.
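The one-context-per-JVM rule suggests pairing every SparkContext with a stop() call. A minimal sketch, assuming spark-core on the classpath and a local master; the object name, app name and data are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; assumes spark-core is available.
object SparkContextSketch {
  def sumTo100(): Long = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("context-sketch")
    val sc = new SparkContext(conf)
    try {
      // Create an RDD on the (local) cluster and run an action on it.
      sc.parallelize(1 to 100, numSlices = 4).map(_.toLong).reduce(_ + _)
    } finally {
      sc.stop() // release the context so another one can be created in this JVM
    }
  }
}
```

Using try/finally ensures the context is stopped even when the job throws, which matters in test suites that create a fresh context per test.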
 
- class SparkEnv extends Logging
  :: DeveloperApi :: Holds all the runtime environment objects for a running Spark instance (either master or worker), including the serializer, RpcEnv, block manager, map output tracker, etc. Currently Spark code finds the SparkEnv through a global variable, so all the threads can access the same SparkEnv. It can be accessed by SparkEnv.get (e.g. after creating a SparkContext).
  - Annotations: @DeveloperApi()
 
-  class SparkException extends Exception with SparkThrowable
-  trait SparkExecutorInfo extends Serializable
- class SparkFirehoseListener extends SparkListenerInterface
  - Annotations: @DeveloperApi()
 
-  trait SparkJobInfo extends Serializable
-  trait SparkStageInfo extends Serializable
- class SparkStatusTracker extends AnyRef
  Low-level status reporting APIs for monitoring job and stage progress.
  These APIs intentionally provide very weak consistency semantics; consumers of these APIs should be prepared to handle empty / missing information. For example, a job's stage ids may be known but the status API may not have any information about the details of those stages, so getStageInfo could potentially return None for a valid stage id.
  To limit memory usage, these APIs only provide information on recent jobs / stages. These APIs will provide information for the last spark.ui.retainedStages stages and spark.ui.retainedJobs jobs.
  NOTE: this class's constructor should be considered private and may be subject to change.
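Because getStageInfo returns an Option, callers are expected to handle the None case explicitly. The sketch below assumes spark-core on the classpath; the object name, the describeStage helper and the message formats are hypothetical, and the demo queries a stage id that was never used, on the assumption that no information is retained for it.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; assumes spark-core is available.
object StatusTrackerSketch {
  // Render whatever the status tracker knows about a stage, tolerating missing data.
  def describeStage(sc: SparkContext, stageId: Int): String =
    sc.statusTracker.getStageInfo(stageId) match {
      case Some(info) =>
        s"stage ${info.stageId()}: ${info.numCompletedTasks()}/${info.numTasks()} tasks done"
      case None =>
        s"stage $stageId: no information retained"
    }

  def demo(): String = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("status-tracker-sketch")
    val sc = new SparkContext(conf)
    try describeStage(sc, Int.MaxValue) // an id that no stage has used
    finally sc.stop()
  }
}
```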
- trait SparkThrowable extends AnyRef
  - Annotations: @Evolving()
 
- case class TaskCommitDenied(jobID: Int, partitionID: Int, attemptNumber: Int) extends TaskFailedReason with Product with Serializable
  :: DeveloperApi :: Task requested the driver to commit, but was denied.
  - Annotations: @DeveloperApi()
 
- abstract class TaskContext extends Serializable
  Contextual information about a task which can be read or mutated during execution. To access the TaskContext for a running task, use: org.apache.spark.TaskContext.get()
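A minimal sketch of calling TaskContext.get() from inside a running task, assuming spark-core on the classpath and a local master. The object name, app name and data are hypothetical; the example records how many elements each partition holds.

```scala
import org.apache.spark.{SparkConf, SparkContext, TaskContext}

// Hypothetical example object; assumes spark-core is available.
object TaskContextSketch {
  def partitionSizes(): Map[Int, Int] = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("task-context-sketch")
    val sc = new SparkContext(conf)
    try {
      sc.parallelize(1 to 10, numSlices = 2)
        .mapPartitions { iter =>
          // Inside a task, TaskContext.get() returns the context of that task.
          val ctx = TaskContext.get()
          Iterator(ctx.partitionId() -> iter.size)
        }
        .collect()
        .toMap
    } finally {
      sc.stop()
    }
  }
}
```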
- sealed trait TaskEndReason extends AnyRef
  :: DeveloperApi :: Various possible reasons why a task ended. The low-level TaskScheduler is supposed to retry tasks several times for "ephemeral" failures, and only report back failures that require some old stages to be resubmitted, such as shuffle map fetch failures.
  - Annotations: @DeveloperApi()
 
- sealed trait TaskFailedReason extends TaskEndReason
  :: DeveloperApi :: Various possible reasons why a task failed.
  - Annotations: @DeveloperApi()
 
- case class TaskKilled(reason: String, accumUpdates: Seq[AccumulableInfo] = Seq.empty, accums: Seq[AccumulatorV2[_, _]] = Nil, metricPeaks: Seq[Long] = Seq.empty) extends TaskFailedReason with Product with Serializable
  :: DeveloperApi :: Task was killed intentionally and needs to be rescheduled.
  - Annotations: @DeveloperApi()
 
- class TaskKilledException extends RuntimeException
  :: DeveloperApi :: Exception thrown when a task is explicitly killed (i.e., task failure is expected).
  - Annotations: @DeveloperApi()
 
Value Members
-  val SPARK_BRANCH: String
-  val SPARK_BUILD_DATE: String
-  val SPARK_BUILD_USER: String
-  val SPARK_REPO_URL: String
-  val SPARK_REVISION: String
-  val SPARK_VERSION: String
-  val SPARK_VERSION_SHORT: String
- object BarrierTaskContext extends Serializable
  - Annotations: @Experimental() @Since( "2.4.0" )
 
-  object Partitioner extends Serializable
- object Resubmitted extends TaskFailedReason with Product with Serializable
  :: DeveloperApi :: An org.apache.spark.scheduler.ShuffleMapTask that completed successfully earlier, but we lost the executor before the stage completed. This means Spark needs to reschedule the task to be re-executed on a different executor.
  - Annotations: @DeveloperApi()
 
- object SparkContext extends Logging
  The SparkContext object contains a number of implicit conversions and parameters for use with various Spark features.
-  object SparkEnv extends Logging
- object SparkFiles
  Resolves paths to files added through SparkContext.addFile().
- object Success extends TaskEndReason with Product with Serializable
  :: DeveloperApi :: Task succeeded.
  - Annotations: @DeveloperApi()
 
-  object TaskContext extends Serializable
- object TaskResultLost extends TaskFailedReason with Product with Serializable
  :: DeveloperApi :: The task finished successfully, but the result was lost from the executor's block manager before it was fetched.
  - Annotations: @DeveloperApi()
 
- object UnknownReason extends TaskFailedReason with Product with Serializable
  :: DeveloperApi :: We don't know why the task ended -- for example, because of a ClassNotFound exception when deserializing the task result.
  - Annotations: @DeveloperApi()
 
-  object WritableConverter extends Serializable
-  object WritableFactory extends Serializable