Interactive JavaScript console on MDH

2024-01-17

Getting Started

The Microsoft Distribution of Hadoop (MDH) comes with a web-based interactive JavaScript console that starts along with the other Hadoop services. The console allows you to:

Perform HDFS operations, including uploading files to and reading files from HDFS
Run MapReduce programs from .js scripts or JAR files, and monitor their progress
Run a Pig job specified using a fluent query syntax in JavaScript, and monitor its progress
Visualize data with graphs built using HTML5

To get started, open the console in your browser. On a local installation, run "isotope start" and then go to http://localhost:8080/. On Azure, sign in to the portal and click the appropriate link.

Walkthrough: Visualizing Word Count

Write the JavaScript MapReduce script

Using Notepad or your favorite text editor, create a text file with the following contents:

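// Map: split each input line into words and emit (word, 1) for each one.
var map = function (key, value, context) {
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        // Splitting on non-letters leaves empty strings between adjacent separators.
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

// Reduce: sum the counts emitted for a single word.
var reduce = function (key, values, context) {
    var sum = 0;
    while (values.hasNext()) {
        // An explicit radix keeps the parse in base 10.
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};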
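
Before uploading anything, it can help to exercise the two functions outside the console. The sketch below is one way to do that in Node.js or a browser console. The helpers makeContext and makeIterator are hypothetical stand-ins written for this example; they only mimic the write() and hasNext()/next() calls the script uses and are not part of the console's API.

```javascript
// Local sanity check for the word-count logic.
// NOTE: makeContext and makeIterator are hypothetical stand-ins, not console APIs.

var map = function (key, value, context) {
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

var reduce = function (key, values, context) {
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};

// Collects every (key, value) pair passed to context.write.
function makeContext() {
    var pairs = [];
    return {
        pairs: pairs,
        write: function (k, v) { pairs.push([k, v]); }
    };
}

// Wraps an array in the hasNext()/next() interface the reducer expects.
function makeIterator(items) {
    var i = 0;
    return {
        hasNext: function () { return i < items.length; },
        next: function () { return items[i++]; }
    };
}

// Map phase over a single line of input.
var mapOut = makeContext();
map(0, "The cat saw the other cat", mapOut);

// Shuffle phase: group the emitted values by word.
var groups = {};
mapOut.pairs.forEach(function (p) {
    (groups[p[0]] = groups[p[0]] || []).push(p[1]);
});

// Reduce phase: one call per distinct word.
var reduceOut = makeContext();
Object.keys(groups).forEach(function (word) {
    reduce(word, makeIterator(groups[word]), reduceOut);
});

console.log(reduceOut.pairs);
```

The grouping step in the middle plays the role that Hadoop's shuffle performs between the map and reduce phases on a real cluster.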

Save the text file to your hard drive as “WordCount.js”, using the encoding "US-ASCII – Codepage 20127". Do not use UTF-8, which Visual Studio often selects by default: it causes an "illegal character" exception when the Pig job runs.
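
If you are unsure whether a saved file is really plain ASCII, a small check like the following can help. It is only a sketch: firstNonAsciiOffset is a hypothetical helper written for this example, and the byte-order mark used as sample input is an assumption about what an editor might add, not a confirmed cause of the exception.

```javascript
// Returns the offset of the first byte outside the 7-bit ASCII range,
// or -1 if every byte is plain ASCII.
function firstNonAsciiOffset(bytes) {
    for (var i = 0; i < bytes.length; i++) {
        if (bytes[i] > 0x7f) {
            return i;
        }
    }
    return -1;
}

// Sample input: "var" with and without a leading UTF-8 byte-order mark.
var withBom = [0xef, 0xbb, 0xbf, 0x76, 0x61, 0x72];
var plain = [0x76, 0x61, 0x72];

console.log(firstNonAsciiOffset(withBom)); // 0
console.log(firstNonAsciiOffset(plain));   // -1
```

In Node.js, the same function can be pointed at the saved script with firstNonAsciiOffset(require("fs").readFileSync("WordCount.js")), since readFileSync returns the raw bytes when no encoding is given.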

Upload the script and input data

Open the interactive JavaScript console and type:

fs.put()

Then select the WordCount.js file you created in the previous step and upload it to the HDFS.

Next, create a directory on the HDFS for the Gutenberg sample by typing:

#mkdir gutenberg

Finally, upload each of the Gutenberg files by typing

fs.put("gutenberg")

and selecting a .txt file from the Gutenberg set (located in C:\Apps\dist\examples\data\gutenberg). Repeat this step for each of the text files.

To make sure the files were uploaded correctly, use the following commands:

#ls
#ls gutenberg
#cat WordCount.js

Run the query

Run the following to find the top 10 most frequent words in the Gutenberg sample texts:

pig.from("gutenberg").mapReduce("WordCount.js", "word, count:long").orderBy("count DESC").take(10).to("gbtop10")
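
To make the last two stages of that query concrete, here is a plain-JavaScript illustration of "order by count descending, then take the first N rows". It is not the console's pig API: topN is a hypothetical helper, and the rows are made-up sample data rather than results from the Gutenberg texts.

```javascript
// Made-up (word, count) rows standing in for the MapReduce output.
var rows = [
    { word: "of", count: 3 },
    { word: "the", count: 7 },
    { word: "and", count: 5 }
];

// Sorts a copy of the rows by count, largest first, and keeps the first n.
function topN(rows, n) {
    return rows
        .slice() // copy so the caller's array is not reordered
        .sort(function (a, b) { return b.count - a.count; })
        .slice(0, n);
}

console.log(topN(rows, 2));
// [ { word: 'the', count: 7 }, { word: 'and', count: 5 } ]
```

On the cluster, Pig performs the equivalent sort and limit as distributed jobs instead of in memory.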

Once the job completes, you can see the output files in the HDFS by typing:

#ls gbtop10

Visualize the results

(Note: if you are using Internet Explorer, this step requires IE9 or later.)

Read the results into the JavaScript context by typing:

file = fs.read("gbtop10")
data = parse(file.data, "word, count:long")
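
The schema string "word, count:long" names the fields and gives the type of the second one. As a rough picture of what such a parsing step involves, the sketch below converts delimited text into typed rows. It is a hypothetical stand-in, not the console's parse(): it assumes tab-separated lines and an array-of-objects result, neither of which this article confirms.

```javascript
// Hypothetical stand-in for a schema-driven parser; assumes tab-separated lines.
function parseRows(text, schema) {
    // "word, count:long" -> [{ name: "word" }, { name: "count", type: "long" }]
    var fields = schema.split(",").map(function (f) {
        var parts = f.trim().split(":");
        return { name: parts[0], type: parts[1] };
    });

    return text
        .split(/\r?\n/)
        .filter(function (line) { return line !== ""; })
        .map(function (line) {
            var cols = line.split("\t");
            var row = {};
            fields.forEach(function (field, i) {
                // Fields typed "long" become numbers; untyped fields stay strings.
                row[field.name] = field.type === "long"
                    ? parseInt(cols[i], 10)
                    : cols[i];
            });
            return row;
        });
}

var sample = "the\t7\nand\t5\n";
console.log(parseRows(sample, "word, count:long"));
// [ { word: 'the', count: 7 }, { word: 'and', count: 5 } ]
```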

Then make a bar graph of the data:

graph.bar(data)

Enjoy!

This article was written by David Zhang.