When a mapper task completes, the TaskTracker does not write the result to HDFS; it keeps the map output on the local file system (LFS) of the node where the mapper ran.
Note: data locality applies only to the mapper phase, not to the sort & shuffle or reducer phases.
The mapper output lives only until the job completes: whether the job ends in success or failure, the local copies of the mapper output are automatically deleted.
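As a quick way to see where these local copies land, here is a minimal sketch (not from the original post, and assuming a Hadoop 1.x classpath) that reads the mapred.local.dir property, which points at the TaskTracker's scratch space for intermediate map output:

// Minimal sketch: print the local directory (or directories) a TaskTracker
// uses for intermediate map output. "mapred.local.dir" is the Hadoop 1.x
// property name; by default it falls back to a path under hadoop.tmp.dir.
import org.apache.hadoop.conf.Configuration;

public class ShowLocalDir {
    public static void main(String[] args) {
        Configuration conf = new Configuration(); // loads *-site.xml from the classpath
        System.out.println(conf.get("mapred.local.dir", "<not set>"));
    }
}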
Thursday, April 30, 2015
Creation of jar file and Execution Process in Clustered Environment
Process of Creation of Jar file:
Step1: Open Eclipse, click on File, click on New, then click on Java Project (if Java Project is not listed, click on Other, select Java, and click on Java Project).
Step2: Copy the provided code, or write the code below.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountNew {

    // Mapper: splits each input line into tokens and emits (word, 1).
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts for each word. Because addition is
    // associative, the same class also serves as the combiner.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCountNew.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // combiner runs map-side
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // must not already exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Step3: Right-click on “src” in the Package Explorer tab -> Build Path -> Configure Build Path -> click on the Libraries tab -> Add External JARs... (add the Hadoop jars from your installation, e.g. hadoop-core-*.jar) -> press the “OK” button.
Step4: Right-click on “src” -> New -> Class -> give the class file the same name as in the code (WordCountNew) -> press the “Finish” button -> paste or write the code.
Step5: Right-click on “src” again -> Export -> select JAR File under Java -> give the path as well as the name of the JAR file -> press the “Finish” button.
Execution Process:
Step1: Copy the JAR file and the provided input text file into the Downloads folder (from the provided server shared path).
Step2: Copy the text file and the JAR file into a local folder.
Step3: Create a new directory in the HDFS path (hadoop fs -mkdir MrInput).
Step4: Copy the text file from the LFS to HDFS using -put or -copyFromLocal (for example, hadoop fs -put Input-Big.txt MrInput). A programmatic equivalent of Steps 3 and 4 is sketched below.
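For reference, the following minimal sketch (an illustration, not part of the original steps) performs Steps 3 and 4 through the HDFS Java API instead of the shell; the paths match the transcript in Step5.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyToHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);                  // the configured HDFS
        Path dir = new Path("/root/user/chandu/MrInput");
        fs.mkdirs(dir);                                        // hadoop fs -mkdir
        fs.copyFromLocalFile(new Path("Input-Big.txt"), dir);  // hadoop fs -put
        fs.close();
    }
}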
Step5: Run the job with the hadoop jar command:
root@ubuntu:/home/chandu# hadoop jar BATCH38-WORDCOUNT.jar WordCountNew /root/user/chandu/MrInput/Input-Big.txt /root/user/chandu/MROutput
14/08/31 02:50:00 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/08/31 02:50:00 INFO input.FileInputFormat: Total input paths to process : 1
14/08/31 02:50:00 WARN snappy.LoadSnappy: Snappy native library is available
14/08/31 02:50:00 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/08/31 02:50:00 INFO snappy.LoadSnappy: Snappy native library loaded
14/08/31 02:50:00 INFO mapred.JobClient: Running job: job_201408310158_0002
14/08/31 02:50:01 INFO mapred.JobClient: map 0% reduce 0%
14/08/31 02:50:07 INFO mapred.JobClient: map 100% reduce 0%
14/08/31 02:50:14 INFO mapred.JobClient: map 100% reduce 33%
14/08/31 02:50:16 INFO mapred.JobClient: map 100% reduce 100%
14/08/31 02:50:16 INFO mapred.JobClient: Job complete: job_201408310158_0002
14/08/31 02:50:16 INFO mapred.JobClient: Counters: 26
14/08/31 02:50:16 INFO mapred.JobClient: Job Counters
14/08/31 02:50:16 INFO mapred.JobClient: Launched reduce tasks=1
14/08/31 02:50:16 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=5140
14/08/31 02:50:16 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/08/31 02:50:16 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/08/31 02:50:16 INFO mapred.JobClient: Launched map tasks=1
14/08/31 02:50:16 INFO mapred.JobClient: Data-local map tasks=1
14/08/31 02:50:16 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=8545
14/08/31 02:50:16 INFO mapred.JobClient: FileSystemCounters
14/08/31 02:50:16 INFO mapred.JobClient: FILE_BYTES_READ=139
14/08/31 02:50:16 INFO mapred.JobClient: HDFS_BYTES_READ=153395
14/08/31 02:50:16 INFO mapred.JobClient: FILE_BYTES_WRITTEN=119182
14/08/31 02:50:16 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=122
14/08/31 02:50:16 INFO mapred.JobClient: Map-Reduce Framework
14/08/31 02:50:16 INFO mapred.JobClient: Map input records=5487
14/08/31 02:50:16 INFO mapred.JobClient: Reduce shuffle bytes=139
14/08/31 02:50:16 INFO mapred.JobClient: Spilled Records=22
14/08/31 02:50:16 INFO mapred.JobClient: Map output bytes=251174
14/08/31 02:50:16 INFO mapred.JobClient: CPU time spent (ms)=2000
14/08/31 02:50:16 INFO mapred.JobClient: Total committed heap usage (bytes)=177016832
14/08/31 02:50:16 INFO mapred.JobClient: Combine input records=25872
14/08/31 02:50:16 INFO mapred.JobClient: SPLIT_RAW_BYTES=125
14/08/31 02:50:16 INFO mapred.JobClient: Reduce input records=11
14/08/31 02:50:16 INFO mapred.JobClient: Reduce input groups=11
14/08/31 02:50:16 INFO mapred.JobClient: Combine output records=11
14/08/31 02:50:16 INFO mapred.JobClient: Physical memory (bytes) snapshot=186167296
14/08/31 02:50:16 INFO mapred.JobClient: Reduce output records=11
14/08/31 02:50:16 INFO mapred.JobClient: Virtual memory (bytes) snapshot=749965312
14/08/31 02:50:16 INFO mapred.JobClient: Map output records=25872
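The WARN line at the top of this transcript is Hadoop suggesting that drivers implement the Tool interface so GenericOptionsParser can handle the standard options (-D, -files, -libjars, and so on). A minimal sketch of that pattern, reusing the mapper and reducer classes above (the class name WordCountDriver is just for illustration):

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already reflects anything passed via -D on the command line.
        Job job = new Job(getConf(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountNew.TokenizerMapper.class);
        job.setCombinerClass(WordCountNew.IntSumReducer.class);
        job.setReducerClass(WordCountNew.IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options before calling run().
        System.exit(ToolRunner.run(new WordCountDriver(), args));
    }
}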
Step6: Verify the input and output in HDFS. The reducer writes its result to part-r-00000; as a sanity check, the per-word counts below sum to 25,872, matching the Map output records counter above.
root@ubuntu:/home/chandu# hadoop fs -ls /root/user/chandu/MrInput
Found 1 items
-rw-r--r-- 1 root supergroup 153270 2014-08-31 02:33 /root/user/chandu/MrInput/Input-Big.txt
root@ubuntu:/home/chandu# hadoop fs -ls /root/user/chandu/MROutput
Found 3 items
-rw-r--r-- 1 root supergroup 0 2014-08-31 02:50 /root/user/chandu/MROutput/_SUCCESS
drwxrwxrwx - root supergroup 0 2014-08-31 02:50 /root/user/chandu/MROutput/_logs
-rw-r--r-- 1 root supergroup 122 2014-08-31 02:50 /root/user/chandu/MROutput/part-r-00000
root@ubuntu:/home/chandu# hadoop fs -cat /root/user/chandu/MROutput/part-r-00000
good 4312
hadoop 4312
having 2156
is 4312
knowledge 1078
leader 1078
learn 1078
market 3234
now 2156
people 1078
the 1078
root@ubuntu:/home/chandu#
Labels: Mapreduce