# MapReduce

MapReduce projects can be used to process a wide variety of data sets, including text files, images, and sensor data. They are a powerful tool for processing large data sets in a parallel and distributed manner.

# How MapReduce Works

The InputFormat and RecordReader read the input file, break it into records (one per line with the default TextInputFormat), and pass each record to a mapper.

Input file (two lines):

    hadoop is fun
    fun with hadoop

Input to the mapper, one record per map() call. The key is the byte offset at which the line starts (a LongWritable) and the value is the line itself (a Text):

    split 1: (0,  "hadoop is fun")
    split 2: (14, "fun with hadoop")

Each map() call tokenizes its line and emits a (word, 1) pair for every word:

    map() 1: (hadoop, 1), (is, 1), (fun, 1)
    map() 2: (fun, 1), (with, 1), (hadoop, 1)

Shuffle and sort: the framework groups the mapper output by key and sorts the keys:

    fun    -> {1, 1}
    hadoop -> {1, 1}
    is     -> {1}
    with   -> {1}

Input to the reducer is each key together with its list of values. The reducer sums the list and emits (word, count):

    (fun, 2), (hadoop, 2), (is, 1), (with, 1)
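
The flow above is exactly what the classic word count job computes. Below is a minimal sketch of such a program against the Hadoop 2.x MapReduce API, essentially the stock WordCount example: the class name matches the one suggested in step 4 below, and the tokenization is a simple whitespace split.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: receives (byte offset, line) and emits (word, 1) per token.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE); // e.g. (hadoop, 1)
            }
        }
    }

    // Reducer: receives (word, {1, 1, ...}) and emits (word, total count).
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result); // e.g. (hadoop, 2)
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // optional local pre-aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /a.txt
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. /wcoutput
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```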

# Steps to Run MapReduce

1. Open the Cloudera QuickStart VM.

2. Open Eclipse.

3. Go to File > New > Java Project.

4. Right-click your new project in the left-hand pane and create a new class, e.g. WordCount.

5. Write the program in that class.

6. Add the Hadoop jar files (hadoop-common-2.2.0.jar and hadoop-mapreduce-client-core-2.2.0.jar) to the project: right-click the project in the left-hand pane, choose Build Path > Configure Build Path, and add the Hadoop jars.

7. Build the jar file of your program: right-click the project in the left-hand pane, choose Export > Java > JAR file, select the export destination, select the main class, and click Finish.

8. Run the job from the terminal. The general form is `hadoop jar <jar path> <input path> <output directory path>`, for example: `hadoop jar /home/cloudera/Desktop/wc.jar /a.txt /wcoutput` (a full session is shown after this list).

9. See the output: `hadoop fs -ls /wcoutput` lists the output directory, and `hadoop fs -cat /wcoutput/part-r-00000` prints the result.

If the input file is not yet in HDFS, upload it first: `hadoop fs -put /home/cloudera/desktop/adad.txt hdfs:/`
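
Putting the terminal steps together, an end-to-end session on the QuickStart VM might look like the following. The jar, input, and output paths are the examples from the steps above, and the local path of the input file is an assumption; substitute your own file names.

```
# copy the local input file into HDFS (local path assumed; adjust to yours)
hadoop fs -put /home/cloudera/Desktop/a.txt /a.txt

# run the job: hadoop jar <jar path> <input path> <output directory>
hadoop jar /home/cloudera/Desktop/wc.jar /a.txt /wcoutput

# inspect the result
hadoop fs -ls /wcoutput
hadoop fs -cat /wcoutput/part-r-00000
# for the sample input above this prints:
#   fun     2
#   hadoop  2
#   is      1
#   with    1
```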
