Counters:
======
Counters are a useful channel for gathering statistics about a job:
* for quality control or application-level statistics
* for problem diagnosis
Built-in Counters:
---------------------
Hadoop maintains some built-in counters for every job, which report various metrics for the job.
Ex: the record counters let you confirm that the expected amount of INPUT was consumed
and the expected amount of OUTPUT was produced.
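Some built-in counters:
1. Map input records --> number of input records consumed by all the maps in the job. Incremented every time a record is
read from the InputSplit (through the RecordReader) and passed to the map() method of the Mapper.
2. Map output records --> number of output records produced by all the maps in the job. Incremented every time the collect()
method is called on a map's OutputCollector (old API), or write() is called on the Context object (new API).
Likewise,
3. Reduce input records --> number of input records consumed by all the reducers in the job.
4. Reduce output records --> number of output records produced by all the reducers in the job.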
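The way the two map-side counters advance can be sketched in plain Java. This is only an illustration under stated assumptions, not Hadoop's implementation: the class `MapCounterSketch`, the enum `TaskCounter`, and the helper methods are hypothetical stand-ins for the framework loop, the user's map() method, and collect(); no Hadoop classes are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class MapCounterSketch {
    // Hypothetical names that mirror the built-in counters described above.
    enum TaskCounter { MAP_INPUT_RECORDS, MAP_OUTPUT_RECORDS }

    private final Map<TaskCounter, Long> counters = new EnumMap<>(TaskCounter.class);
    private final List<String> output = new ArrayList<>();

    private void increment(TaskCounter counter) {
        counters.merge(counter, 1L, Long::sum);
    }

    long get(TaskCounter counter) {
        return counters.getOrDefault(counter, 0L);
    }

    // Stand-in for collect(): every emitted record bumps the output counter.
    private void collect(String key, int value) {
        increment(TaskCounter.MAP_OUTPUT_RECORDS);
        output.add(key + "\t" + value);
    }

    // Stand-in for a user map() method: emits one record per non-empty word.
    private void map(String line) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                collect(word, 1);
            }
        }
    }

    // Stand-in for the framework loop: the input counter is bumped once per
    // record read from the split, before the record is handed to map().
    void run(List<String> split) {
        for (String record : split) {
            increment(TaskCounter.MAP_INPUT_RECORDS);
            map(record);
        }
    }

    public static void main(String[] args) {
        MapCounterSketch task = new MapCounterSketch();
        task.run(Arrays.asList("a b c", "d e"));
        System.out.println("MAP_INPUT_RECORDS=" + task.get(TaskCounter.MAP_INPUT_RECORDS));
        System.out.println("MAP_OUTPUT_RECORDS=" + task.get(TaskCounter.MAP_OUTPUT_RECORDS));
    }
}
```

In this sketch the two counters can differ: two input lines holding five words in total give 2 input records and 5 output records, because map() may emit zero, one, or many records per input record.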
** Counters are maintained by the task with which they are associated, and are periodically sent to the task tracker and
then to the job tracker, so they can be aggregated globally.
The built-in Job Counters are an exception: they are maintained by the job tracker itself, so they do not need to be sent
across the network, unlike all other counters, including user-defined ones.
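The global aggregation step can be sketched as a simple sum over per-task reports. This is a minimal illustration, assuming each task hands in one final map of counter name to value; `CounterAggregationSketch`, `report()`, and `aggregate()` are hypothetical names, and the real transport between task, task tracker, and job tracker is not modeled.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CounterAggregationSketch {
    // Builds the counter report of one hypothetical map task.
    static Map<String, Long> report(long inputRecords, long outputRecords) {
        Map<String, Long> counters = new HashMap<>();
        counters.put("MAP_INPUT_RECORDS", inputRecords);
        counters.put("MAP_OUTPUT_RECORDS", outputRecords);
        return counters;
    }

    // Stand-in for the job-level view: sums each counter over all task reports.
    static Map<String, Long> aggregate(List<Map<String, Long>> taskReports) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Long> taskReport : taskReports) {
            for (Map.Entry<String, Long> entry : taskReport.entrySet()) {
                totals.merge(entry.getKey(), entry.getValue(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Long> totals = aggregate(Arrays.asList(
                report(100, 250),   // task 1
                report(40, 90)));   // task 2
        for (Map.Entry<String, Long> entry : totals.entrySet()) {
            System.out.println(entry.getKey() + "=" + entry.getValue());
        }
    }
}
```

Counters owned by the job tracker itself would skip this step entirely, since there is no per-task report to collect for them.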