Getting Started
The Microsoft Distribution of Hadoop comes with a web-based interactive JavaScript console that starts along with the other Hadoop services. The console allows you to:
- Perform HDFS operations, including uploading/reading files to/from the HDFS
- Run MapReduce programs from .js scripts or JAR files, and monitor their progress
- Run a Pig job specified using a fluent query syntax in JavaScript, and monitor its progress
- Visualize data with graphs built using HTML5
To get started, you can open the console in your browser by going to http://localhost:8080/ after running "isotope start" in a local installation, or by clicking on the appropriate link on the Azure portal after you have signed in.
Walkthrough: Visualizing Word Count
Write the JavaScript MapReduce script
Using Notepad or your favorite text editor, create a text file with the following contents:
var map = function (key, value, context) {
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

var reduce = function (key, values, context) {
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next());
    }
    context.write(key, sum);
};
Save the text file as "WordCount.js" on your hard drive. Note that UTF-8 encoding, often the default in Visual Studio, causes an "illegal character" exception when the Pig job runs, so set the encoding to "US-ASCII – Codepage 20127".
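Before uploading, it can help to exercise the map and reduce functions outside Hadoop. The following is a minimal local sketch, not part of the console: makeContext, makeIterator, and runLocally are hypothetical helpers that imitate only the calls the script itself makes (context.write, values.hasNext, values.next). The real runtime objects may behave differently.

```javascript
// Copy of the map and reduce functions from WordCount.js.
var map = function (key, value, context) {
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

var reduce = function (key, values, context) {
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next());
    }
    context.write(key, sum);
};

// Hypothetical stand-in for the Hadoop context: records every written pair.
function makeContext() {
    var pairs = [];
    return {
        pairs: pairs,
        write: function (k, v) { pairs.push([k, v]); }
    };
}

// Hypothetical stand-in for the values iterator handed to reduce.
function makeIterator(items) {
    var index = 0;
    return {
        hasNext: function () { return index < items.length; },
        next: function () { return items[index++]; }
    };
}

// Runs map over each line, groups values by key, then runs reduce per key.
function runLocally(lines) {
    var mapContext = makeContext();
    lines.forEach(function (line, lineNumber) {
        map(lineNumber, line, mapContext);
    });

    // Object.create(null) avoids clashes with words such as "constructor".
    var groups = Object.create(null);
    mapContext.pairs.forEach(function (pair) {
        (groups[pair[0]] = groups[pair[0]] || []).push(pair[1]);
    });

    var reduceContext = makeContext();
    Object.keys(groups).sort().forEach(function (word) {
        reduce(word, makeIterator(groups[word]), reduceContext);
    });
    return reduceContext.pairs;
}

var counts = runLocally(["The quick fox", "the lazy dog, the end"]);
console.log(counts);
```

With this sample input, the word "the" is counted three times and every other word once, which shows that the script lowercases words and skips the empty strings produced by splitting on punctuation.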
Upload the script and input data
Open the interactive JavaScript console and type:
fs.put()
Then select the WordCount.js file you created in the previous step and upload it to the HDFS.
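If the job later fails with an "illegal character" exception, one way to check the script file is to look for bytes outside the ASCII range. The sketch below is only an illustration: findNonAscii is a hypothetical helper, and the byte order mark in the sample is one possible source of such bytes, assumed here rather than confirmed for this product.

```javascript
// Returns the offsets of all bytes above 0x7F, which cannot occur in a
// pure US-ASCII file.
function findNonAscii(bytes) {
    var offsets = [];
    for (var i = 0; i < bytes.length; i++) {
        if (bytes[i] > 0x7f) {
            offsets.push(i);
        }
    }
    return offsets;
}

// Sample input: the UTF-8 byte order mark (EF BB BF) followed by "var".
var withBom = new Uint8Array([0xef, 0xbb, 0xbf, 0x76, 0x61, 0x72]);
// Sample input: "var" saved as plain ASCII.
var plainAscii = new Uint8Array([0x76, 0x61, 0x72]);

var bomOffsets = findNonAscii(withBom);
var plainOffsets = findNonAscii(plainAscii);
console.log(bomOffsets, plainOffsets);
```

An empty result means the bytes are all ASCII. To check a real file, the bytes would first have to be read from disk, for example with Node's file system module.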
Next, create a directory on the HDFS for the Gutenberg sample by typing:
#mkdir gutenberg
Finally, upload each of the Gutenberg files by typing
fs.put("gutenberg")
and selecting a .txt file from the Gutenberg set (located in C:\Apps\dist\examples\data\gutenberg). Repeat this step for each of the text files.
To make sure the files were uploaded correctly, use the following commands:
#ls
#ls gutenberg
#cat WordCount.js
Run the query
Run the following to find the top 10 most frequent words in the Gutenberg sample texts:
pig.from("gutenberg").mapReduce("WordCount.js", "word, count:long").orderBy("count DESC").take(10).to("gbtop10")
Once the job completes, you can see the output files in the HDFS by typing:
#ls gbtop10
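To make the intent of the orderBy("count DESC") and take(10) steps concrete, here is a small in-memory sketch in plain JavaScript. It does not reflect how Pig executes the query. The topN function and the sample rows are hypothetical, and only illustrate "sort by count descending, then keep the first n rows".

```javascript
// Hypothetical sample rows shaped like the "word, count:long" schema.
var sampleRows = [
    { word: "of", count: 40 },
    { word: "the", count: 95 },
    { word: "a", count: 30 },
    { word: "and", count: 60 }
];

// Copies the input, sorts the copy by count in descending order,
// and returns the first n rows.
function topN(rows, n) {
    return rows
        .slice()
        .sort(function (a, b) { return b.count - a.count; })
        .slice(0, n);
}

var top2 = topN(sampleRows, 2);
console.log(top2);
```

Here topN(sampleRows, 2) keeps the rows for "the" and "and", the two largest counts, and leaves sampleRows unchanged.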
Visualize the results
(Note: if you are using Internet Explorer, this step requires IE9+)
Read the results into the JavaScript context by typing:
file = fs.read("gbtop10")
data = parse(file.data, "word, count:long")
Then make a bar graph of the data:
graph.bar(data)
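For readers curious about what a schema-driven parse step might involve, the following is a rough sketch and not the console's actual parse implementation. It assumes the result text holds one record per line with tab-separated fields, and that the schema string lists field names with an optional ":type" suffix. parseSchema and parseRows are hypothetical names.

```javascript
// Splits a schema such as "word, count:long" into name/type descriptors.
// Fields without an explicit type are treated as plain strings here.
function parseSchema(schema) {
    return schema.split(",").map(function (field) {
        var parts = field.trim().split(":");
        return {
            name: parts[0].trim(),
            type: (parts[1] || "string").trim()
        };
    });
}

// Converts tab-separated text into an array of objects keyed by field name.
// Assumption: one record per line, fields separated by a tab character.
function parseRows(text, schema) {
    var fields = parseSchema(schema);
    return text
        .split(/\r?\n/)
        .filter(function (line) { return line !== ""; })
        .map(function (line) {
            var cells = line.split("\t");
            var row = {};
            fields.forEach(function (field, i) {
                row[field.name] = field.type === "long"
                    ? parseInt(cells[i], 10)
                    : cells[i];
            });
            return row;
        });
}

// Hypothetical sample text standing in for the job output.
var sampleText = "the\t95\nand\t60\nof\t40\n";
var parsedRows = parseRows(sampleText, "word, count:long");
console.log(parsedRows);
```

Each parsed row is an object with a string word and a numeric count, which is a convenient shape for charting.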
Enjoy!
This article was written by David Zhang.