Class StreamingStep

java.lang.Object
com.amazonaws.services.elasticmapreduce.util.StreamingStep

public class StreamingStep extends Object
Class that makes it easy to define Hadoop Streaming steps.

See also: Hadoop Streaming

AWSCredentials credentials = new BasicAWSCredentials(accessKey, secretKey);
AmazonElasticMapReduce emr = new AmazonElasticMapReduceClient(credentials);

HadoopJarStepConfig config = new StreamingStep()
    .withInputs("s3://elasticmapreduce/samples/wordcount/input")
    .withOutput("s3://my-bucket/output/")
    .withMapper("s3://elasticmapreduce/samples/wordcount/wordSplitter.py")
    .withReducer("aggregate")
    .toHadoopJarStepConfig();

StepConfig wordCount = new StepConfig()
    .withName("Word Count")
    .withActionOnFailure("TERMINATE_JOB_FLOW")
    .withHadoopJarStep(config);

RunJobFlowRequest request = new RunJobFlowRequest()
    .withName("Word Count")
    .withSteps(wordCount)
    .withLogUri("s3://log-bucket/")
    .withInstances(new JobFlowInstancesConfig()
        .withEc2KeyName("keypairt")
        .withHadoopVersion("0.20")
        .withInstanceCount(5)
        .withKeepJobFlowAliveWhenNoSteps(true)
        .withMasterInstanceType("m1.small")
        .withSlaveInstanceType("m1.small"));

RunJobFlowResult result = emr.runJobFlow(request);
  • Constructor Details

    • StreamingStep

      public StreamingStep()
      Creates a new default StreamingStep.
  • Method Details

    • getInputs

      public List<String> getInputs()
      Get list of step input paths.
      Returns:
      List of step inputs
    • setInputs

      public void setInputs(Collection<String> inputs)
      Set the list of step input paths.
      Parameters:
      inputs - List of step inputs.
    • withInputs

      public StreamingStep withInputs(String... inputs)
      Add more input paths to this step.
      Parameters:
      inputs - A list of inputs to this step.
      Returns:
      A reference to this updated object so that method calls can be chained together.
    • getOutput

      public String getOutput()
      Get output path.
      Returns:
      Output path.
    • setOutput

      public void setOutput(String output)
      Set the output path for this step.
      Parameters:
      output - Output path.
    • withOutput

      public StreamingStep withOutput(String output)
      Set the output path for this step.
      Parameters:
      output - Output path
      Returns:
      A reference to this updated object so that method calls can be chained together.
    • getMapper

      public String getMapper()
      Get the mapper.
      Returns:
      Mapper.
    • setMapper

      public void setMapper(String mapper)
      Set the mapper.
      Parameters:
      mapper - Mapper
    • withMapper

      public StreamingStep withMapper(String mapper)
      Set the mapper
      Parameters:
      mapper - Mapper
      Returns:
      A reference to this updated object so that method calls can be chained together.
    • getReducer

      public String getReducer()
      Get the reducer
      Returns:
      Reducer
    • setReducer

      public void setReducer(String reducer)
      Set the reducer
      Parameters:
      reducer - Reducer
    • withReducer

      public StreamingStep withReducer(String reducer)
      Set the reducer
      Parameters:
      reducer - Reducer
      Returns:
      A reference to this updated object so that method calls can be chained together.
    • getHadoopConfig

      public Map<String,String> getHadoopConfig()
      Get the Hadoop config overrides (-D values).
      Returns:
      Hadoop config.
    • setHadoopConfig

      public void setHadoopConfig(Map<String,String> hadoopConfig)
      Set the Hadoop config overrides (-D values).
      Parameters:
      hadoopConfig - Hadoop config.
    • withHadoopConfig

      public StreamingStep withHadoopConfig(String key, String value)
      Add a Hadoop config override (-D value).
      Parameters:
      key - Hadoop configuration key.
      value - Configuration value.
      Returns:
      A reference to this updated object so that method calls can be chained together.
    • toHadoopJarStepConfig

      public HadoopJarStepConfig toHadoopJarStepConfig()
      Creates the final HadoopJarStepConfig once you are done configuring the step. You can use this as you would any other HadoopJarStepConfig.
      Returns:
      HadoopJarStepConfig representing this streaming step.