Class StreamingStep
java.lang.Object
com.amazonaws.services.elasticmapreduce.util.StreamingStep
Class that makes it easy to define Hadoop Streaming steps.
See also: Hadoop Streaming
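import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient;
import com.amazonaws.services.elasticmapreduce.model.HadoopJarStepConfig;
import com.amazonaws.services.elasticmapreduce.model.JobFlowInstancesConfig;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowResult;
import com.amazonaws.services.elasticmapreduce.model.StepConfig;
import com.amazonaws.services.elasticmapreduce.util.StreamingStep;

// accessKey and secretKey are your AWS credentials, defined elsewhere.
AWSCredentials credentials = new BasicAWSCredentials(accessKey, secretKey);
AmazonElasticMapReduce emr = new AmazonElasticMapReduceClient(credentials);

// Describe the streaming job: input, output, mapper script, and reducer.
HadoopJarStepConfig config = new StreamingStep()
    .withInputs("s3://elasticmapreduce/samples/wordcount/input")
    .withOutput("s3://my-bucket/output/")
    .withMapper("s3://elasticmapreduce/samples/wordcount/wordSplitter.py")
    .withReducer("aggregate")
    .toHadoopJarStepConfig();

// Wrap the streaming job in a named step.
StepConfig wordCount = new StepConfig()
    .withName("Word Count")
    .withActionOnFailure("TERMINATE_JOB_FLOW")
    .withHadoopJarStep(config);

// Define the job flow (cluster) that runs the step.
RunJobFlowRequest request = new RunJobFlowRequest()
    .withName("Word Count")
    .withSteps(wordCount)
    .withLogUri("s3://log-bucket/")
    .withInstances(new JobFlowInstancesConfig()
        .withEc2KeyName("keypair")
        .withHadoopVersion("0.20")
        .withInstanceCount(5)
        .withKeepJobFlowAliveWhenNoSteps(true)
        .withMasterInstanceType("m1.small")
        .withSlaveInstanceType("m1.small"));

// Submit the job flow.
RunJobFlowResult result = emr.runJobFlow(request);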
-
Constructor Summary
Constructors
Constructor        Description
StreamingStep()    Creates a new default StreamingStep.
-
Method Summary
Modifier and Type      Method                                        Description
Map<String, String>    getHadoopConfig()                             Get the Hadoop config overrides (-D values).
List<String>           getInputs()                                   Get list of step input paths.
String                 getMapper()                                   Get the mapper.
String                 getOutput()                                   Get output path.
String                 getReducer()                                  Get the reducer.
void                   setHadoopConfig(Map<String, String> hadoopConfig)    Set the Hadoop config overrides (-D values).
void                   setInputs(Collection<String> inputs)          Set the list of step input paths.
void                   setMapper(String mapper)                      Set the mapper.
void                   setOutput(String output)                      Set the output path for this step.
void                   setReducer(String reducer)                    Set the reducer.
HadoopJarStepConfig    toHadoopJarStepConfig()                       Creates the final HadoopJarStepConfig once you are done configuring the step.
StreamingStep          withHadoopConfig(String key, String value)    Add a Hadoop config override (-D value).
StreamingStep          withInputs(String... inputs)                  Add more input paths to this step.
StreamingStep          withMapper(String mapper)                     Set the mapper.
StreamingStep          withOutput(String output)                     Set the output path for this step.
StreamingStep          withReducer(String reducer)                   Set the reducer.
-
Constructor Details
-
StreamingStep
public StreamingStep()
Creates a new default StreamingStep.
-
Method Details
-
getInputs
Get list of step input paths.
-
setInputs
Set the list of step input paths.
Parameters:
inputs - List of step inputs.
-
withInputs
Add more input paths to this step.
Parameters:
inputs - A list of inputs to this step.
Returns:
A reference to this updated object so that method calls can be chained together.
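The difference between setInputs ("set the list") and withInputs ("add more input paths") can be illustrated with a small stand-in class. This is only a sketch: InputPathsSketch is a hypothetical class written for this example, not the SDK's StreamingStep, and it assumes that the setter replaces the stored list while the fluent method appends to it. The bucket paths are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for illustration only; NOT the SDK's StreamingStep.
class InputPathsSketch {
    private List<String> inputs = new ArrayList<>();

    // Setter style: replaces the whole list (assumed behaviour).
    void setInputs(Collection<String> inputs) {
        this.inputs = new ArrayList<>(inputs);
    }

    // Fluent style: appends the given paths and returns this for chaining.
    InputPathsSketch withInputs(String... more) {
        inputs.addAll(Arrays.asList(more));
        return this;
    }

    List<String> getInputs() {
        return inputs;
    }

    public static void main(String[] args) {
        InputPathsSketch step = new InputPathsSketch()
            .withInputs("s3://my-bucket/input/a")
            .withInputs("s3://my-bucket/input/b", "s3://my-bucket/input/c");
        System.out.println(step.getInputs().size()); // prints 3

        // The setter discards the three paths added above.
        step.setInputs(Arrays.asList("s3://my-bucket/input/z"));
        System.out.println(step.getInputs()); // prints [s3://my-bucket/input/z]
    }
}
```

Returning this from each with method is what allows the chained calls used in the example at the top of this page.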
-
getOutput
Get output path.
-
setOutput
Set the output path for this step.
Parameters:
output - Output path.
-
withOutput
Set the output path for this step.
Parameters:
output - Output path.
Returns:
A reference to this updated object so that method calls can be chained together.
-
getMapper
Get the mapper.
-
setMapper
Set the mapper.
-
withMapper
Set the mapper.
Parameters:
mapper - Mapper.
Returns:
A reference to this updated object so that method calls can be chained together.
-
getReducer
Get the reducer.
-
setReducer
Set the reducer.
-
withReducer
Set the reducer.
Parameters:
reducer - Reducer.
Returns:
A reference to this updated object so that method calls can be chained together.
-
getHadoopConfig
Get the Hadoop config overrides (-D values).
-
setHadoopConfig
Set the Hadoop config overrides (-D values).
-
withHadoopConfig
Add a Hadoop config override (-D value).
Parameters:
key - Hadoop configuration key.
value - Configuration value.
Returns:
A reference to this updated object so that method calls can be chained together.
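On the Hadoop command line, a -D override is written as a key=value pair following the -D flag. The sketch below shows one way a map of overrides could be turned into such arguments. ConfigOverridesSketch and toArgs are hypothetical names introduced for this example, and the exact arguments that StreamingStep emits may differ. The two property names are ordinary Hadoop 0.20-era settings chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the SDK.
class ConfigOverridesSketch {

    // Renders each override as two arguments: "-D" followed by "key=value".
    static List<String> toArgs(Map<String, String> overrides) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> entry : overrides.entrySet()) {
            args.add("-D");
            args.add(entry.getKey() + "=" + entry.getValue());
        }
        return args;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output is deterministic.
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("mapred.reduce.tasks", "4");
        overrides.put("mapred.task.timeout", "600000");
        System.out.println(toArgs(overrides));
        // prints [-D, mapred.reduce.tasks=4, -D, mapred.task.timeout=600000]
    }
}
```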
-
toHadoopJarStepConfig
Creates the final HadoopJarStepConfig once you are done configuring the step. You can use this as you would any other HadoopJarStepConfig.
Returns:
HadoopJarStepConfig representing this streaming step.
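To give a rough idea of what a finished streaming step amounts to, the sketch below assembles the standard Hadoop Streaming options (-input, -output, -mapper, -reducer) from the four configured values. It is an assumption-laden illustration, not the SDK's implementation: StreamingArgsSketch and build are hypothetical names, the validation rules are guesses, and Hadoop config overrides are left out to keep it short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not the SDK's implementation.
class StreamingArgsSketch {

    // Builds a Hadoop Streaming argument list from the configured values.
    static List<String> build(List<String> inputs, String output,
                              String mapper, String reducer) {
        // Assumed validation: a streaming job needs all four pieces.
        if (inputs == null || inputs.isEmpty()) {
            throw new IllegalStateException("at least one input path is required");
        }
        if (output == null || mapper == null || reducer == null) {
            throw new IllegalStateException("output, mapper and reducer are required");
        }
        List<String> args = new ArrayList<>();
        for (String input : inputs) {
            args.add("-input");   // one -input flag per path
            args.add(input);
        }
        args.add("-output");
        args.add(output);
        args.add("-mapper");
        args.add(mapper);
        args.add("-reducer");
        args.add(reducer);
        return args;
    }

    public static void main(String[] args) {
        // Values taken from the word count example at the top of this page.
        List<String> result = build(
            Arrays.asList("s3://elasticmapreduce/samples/wordcount/input"),
            "s3://my-bucket/output/",
            "s3://elasticmapreduce/samples/wordcount/wordSplitter.py",
            "aggregate");
        System.out.println(result);
    }
}
```

Failing fast on a missing value, as sketched here, surfaces configuration mistakes before a cluster is launched. Whether the SDK performs similar checks is not stated on this page.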
-