Examples of OutputCommitter
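
The snippets below show how org.apache.hadoop.mapreduce.OutputCommitter is implemented and driven; they appear to be drawn mostly from HCatalog, its HBase integration, and Gridmix test code. The commit protocol operates at two levels: setupJob() and commitJob() (or abortJob()) bracket the whole job, while setupTask(), needsTaskCommit(), and commitTask() (or abortTask()) bracket each task attempt. Each snippet is an excerpt, so surrounding declarations are omitted.

For orientation, a committer only has to implement five abstract methods. The following is a minimal no-op sketch; the class name NoOpCommitter is made up for illustration:

    import java.io.IOException;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.OutputCommitter;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;

    // A do-nothing committer covering the five abstract methods of OutputCommitter.
    public class NoOpCommitter extends OutputCommitter {
      @Override
      public void setupJob(JobContext jobContext) throws IOException { }

      @Override
      public void setupTask(TaskAttemptContext taskContext) throws IOException { }

      @Override
      public boolean needsTaskCommit(TaskAttemptContext taskContext) throws IOException {
        return false; // nothing is written, so no task ever needs a commit
      }

      @Override
      public void commitTask(TaskAttemptContext taskContext) throws IOException { }

      @Override
      public void abortTask(TaskAttemptContext taskContext) throws IOException { }
    }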


Examples of org.apache.hadoop.mapreduce.OutputCommitter
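
This fragment drives the commit protocol by hand while copying records between tables. From context, inpy and oupy appear to be the HCatalog input and output formats, and ijob/ojob the corresponding jobs. Note the ordering: setupJob() once, setupTask()/commitTask() per split, commitJob() at the end.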

    HCatSchema tableSchema = inpy.getTableSchema(ijob.getConfiguration());
    System.err.println("Copying from [" + in + "] to [" + out + "] with schema: " + tableSchema);
    oupy.setSchema(ojob, tableSchema);
    oupy.checkOutputSpecs(ojob);

    // Job-level setup: the committer typically creates the job's temporary output area.
    OutputCommitter oc = oupy.getOutputCommitter(createTaskAttemptContext(ojob.getConfiguration()));
    oc.setupJob(ojob);

    for (InputSplit split : inpy.getSplits(ijob)) {

      TaskAttemptContext rtaskContext = createTaskAttemptContext(ijob.getConfiguration());
      TaskAttemptContext wtaskContext = createTaskAttemptContext(ojob.getConfiguration());

      RecordReader<WritableComparable, HCatRecord> rr = inpy.createRecordReader(split, rtaskContext);
      rr.initialize(split, rtaskContext);

      // Task-level setup: each split is treated as its own task attempt.
      OutputCommitter taskOc = oupy.getOutputCommitter(wtaskContext);
      taskOc.setupTask(wtaskContext);
      RecordWriter<WritableComparable<?>, HCatRecord> rw = oupy.getRecordWriter(wtaskContext);

      // Copy every record in the split from reader to writer.
      while (rr.nextKeyValue()) {
        rw.write(rr.getCurrentKey(), rr.getCurrentValue());
      }
      rw.close(wtaskContext);
      // Promote this task's output so the final job commit can pick it up.
      taskOc.commitTask(wtaskContext);
      rr.close();
    }

    // Job-level commit: finalize the output of all successfully committed tasks.
    oc.commitJob(ojob);
  }

Examples of org.apache.hadoop.mapreduce.OutputCommitter
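
A delegating committer from HCatalog's HBase integration: ImporterOutputFormat wraps the committer returned by HFileOutputFormat, forwards every lifecycle call to it, and extends commitJob(), abortJob(), and cleanupJob() to bulk-load the generated HFiles into the target table and tear down the scratch directory.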

  private static class ImporterOutputFormat extends HFileOutputFormat {
    @Override
    public OutputCommitter getOutputCommitter(TaskAttemptContext context) throws IOException {
      final OutputCommitter baseOutputCommitter = super.getOutputCommitter(context);

      return new OutputCommitter() {
        @Override
        public void setupJob(JobContext jobContext) throws IOException {
          baseOutputCommitter.setupJob(jobContext);
        }

        @Override
        public void setupTask(TaskAttemptContext taskContext) throws IOException {
          baseOutputCommitter.setupTask(taskContext);
        }

        @Override
        public boolean needsTaskCommit(TaskAttemptContext taskContext) throws IOException {
          return baseOutputCommitter.needsTaskCommit(taskContext);
        }

        @Override
        public void commitTask(TaskAttemptContext taskContext) throws IOException {
          baseOutputCommitter.commitTask(taskContext);
        }

        @Override
        public void abortTask(TaskAttemptContext taskContext) throws IOException {
          baseOutputCommitter.abortTask(taskContext);
        }

        @Override
        public void abortJob(JobContext jobContext, JobStatus.State state) throws IOException {
          try {
            baseOutputCommitter.abortJob(jobContext, state);
          } finally {
            cleanupScratch(jobContext);
          }
        }

        @Override
        public void commitJob(JobContext jobContext) throws IOException {
          try {
            baseOutputCommitter.commitJob(jobContext);
            Configuration conf = jobContext.getConfiguration();
            // Bulk-load the HFiles the job just wrote into the target HBase table.
            HTable table = null;
            try {
              table = new HTable(conf,
                conf.get(HBaseConstants.PROPERTY_OUTPUT_TABLE_NAME_KEY));
              new LoadIncrementalHFiles(conf)
                .doBulkLoad(HFileOutputFormat.getOutputPath(jobContext), table);
            } catch (Exception e) {
              throw new IOException("BulkLoad failed.", e);
            } finally {
              if (table != null) {
                table.close(); // release the table handle once the load is done
              }
            }
          } finally {
            // Always remove the scratch directory, even if the bulk load failed.
            cleanupScratch(jobContext);
          }
        }

        @Override
        public void cleanupJob(JobContext context) throws IOException {
          try {
            baseOutputCommitter.cleanupJob(context);
          } finally {
            cleanupScratch(context);
          }
        }

Examples of org.apache.hadoop.mapreduce.OutputCommitter
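
A NullOutputFormat subclass whose committer implements every callback as a no-op, useful for jobs that produce no output of their own.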

  private static class NullOutputFormat<K, V> extends
    org.apache.hadoop.mapreduce.lib.output.NullOutputFormat<K, V> {

    @Override
    public OutputCommitter getOutputCommitter(TaskAttemptContext context) {
      return new OutputCommitter() {
        public void abortTask(TaskAttemptContext taskContext) {
        }

        public void cleanupJob(JobContext jobContext) {
        }

Examples of org.apache.hadoop.mapreduce.OutputCommitter
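
A test that commits a job through HCatalog's FileOutputCommitterContainer and then verifies that the expected partition has been published to the metastore.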

    publishTest(job);
  }

  public void publishTest(Job job) throws Exception {
    OutputCommitter committer = new FileOutputCommitterContainer(job, null);
    committer.commitJob(job);

    Partition part = client.getPartition(dbName, tblName, Arrays.asList("p1"));
    assertNotNull(part);

    StorerInfo storer = InternalUtil.extractStorerInfo(part.getSd(), part.getParameters());

Examples of org.apache.hadoop.mapreduce.OutputCommitter
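
Task-side close logic for HCatalog's dynamic partitioning: every per-partition writer is closed, then each partition's underlying committer is asked via needsTaskCommit() whether the task produced output worth committing.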

      if (dynamicPartitioningUsed) {
        // Dynamic partitioning: one writer and one storage driver per partition value.
        for (RecordWriter<? super WritableComparable<?>, ? super Writable> bwriter : baseDynamicWriters.values()) {
          bwriter.close(context);
        }
        for (HCatOutputStorageDriver osd : baseDynamicStorageDrivers.values()) {
          // Each underlying committer decides whether this task produced output to commit.
          OutputCommitter baseOutputCommitter = osd.getOutputFormat().getOutputCommitter(context);
          if (baseOutputCommitter.needsTaskCommit(context)) {
            baseOutputCommitter.commitTask(context);
          }
        }
      } else {
        baseWriter.close(context);
      }

Examples of org.apache.hadoop.mapreduce.OutputCommitter
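
The matching setup path for dynamic partitioning: when a new partition value is first seen, a storage driver and record writer are created, the committer's job- and task-level setup hooks run, and both objects are cached under the partition's hash code.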

         
          HCatOutputStorageDriver localOsd = createDynamicStorageDriver(dynamicPartValues);
          RecordWriter<? super WritableComparable<?>, ? super Writable> baseRecordWriter
            = localOsd.getOutputFormat().getRecordWriter(context);
          // Run the committer's job- and task-level setup for the new partition
          // before caching the writer and storage driver for reuse.
          localOsd.setupOutputCommitterJob(context);
          OutputCommitter baseOutputCommitter = localOsd.getOutputFormat().getOutputCommitter(context);
          baseOutputCommitter.setupTask(context);
          prepareForStorageDriverOutput(localOsd, context);
          baseDynamicWriters.put(dynHashCode, baseRecordWriter);
          baseDynamicStorageDrivers.put(dynHashCode, localOsd);
        }

Examples of org.apache.hadoop.mapreduce.OutputCommitter
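
What appears to be an earlier revision of the same publish test, here using HCatOutputCommitter and the since-deprecated cleanupJob() entry point rather than commitJob().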

    publishTest(job);
  }

  public void publishTest(Job job) throws Exception {
    OutputCommitter committer = new HCatOutputCommitter(job, null);
    committer.cleanupJob(job);

    Partition part = client.getPartition(dbName, tblName, Arrays.asList("p1"));
    assertNotNull(part);

    StorerInfo storer = InitializeInput.extractStorerInfo(part.getSd(), part.getParameters());

Examples of org.apache.hadoop.mapreduce.OutputCommitter
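
An OutputFormat that discards its output: checkOutputSpecs() accepts anything, and the committer's needsTaskCommit() returns false so no task attempt ever commits.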

  @Override
  public void checkOutputSpecs(JobContext context) { }
 
  @Override
  public OutputCommitter getOutputCommitter(TaskAttemptContext context) {
    return new OutputCommitter() {
      public void abortTask(TaskAttemptContext taskContext) { }
      public void cleanupJob(JobContext jobContext) { }
      public void commitTask(TaskAttemptContext taskContext) { }
      public boolean needsTaskCommit(TaskAttemptContext taskContext) {
        return false;

Examples of org.apache.hadoop.mapreduce.OutputCommitter
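
A Gridmix test that assembles a MapContextImpl by hand; the committer slot is filled with a CustomOutputCommitter stub alongside a fake reader, writer, and load split.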

    TaskAttemptID taskId = new TaskAttemptID();
    RecordReader<NullWritable, GridmixRecord> reader = new FakeRecordReader();

    LoadRecordGkGrWriter writer = new LoadRecordGkGrWriter();

    OutputCommitter committer = new CustomOutputCommitter();
    StatusReporter reporter = new TaskAttemptContextImpl.DummyReporter();
    LoadSplit split = getLoadSplit();

    MapContext<NullWritable, GridmixRecord, GridmixKey, GridmixRecord> mapContext = new MapContextImpl<NullWritable, GridmixRecord, GridmixKey, GridmixRecord>(
            conf, taskId, reader, writer, committer, reporter, split);

Examples of org.apache.hadoop.mapreduce.OutputCommitter
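
The reduce-side counterpart: a ReduceContextImpl is built with the same CustomOutputCommitter stub plus counters, a dummy reporter, and a fake raw comparator.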

    Counter counter = new GenericCounter();
    Counter inputValueCounter = new GenericCounter();
    LoadRecordWriter output = new LoadRecordWriter();

    OutputCommitter committer = new CustomOutputCommitter();

    StatusReporter reporter = new DummyReporter();
    RawComparator<GridmixKey> comparator = new FakeRawComparator();

    ReduceContext<GridmixKey, GridmixRecord, NullWritable, GridmixRecord> reduceContext = new ReduceContextImpl<GridmixKey, GridmixRecord, NullWritable, GridmixRecord>(