How to Do MongoDB Map-Reduce Queries Easily with 3T MongoChef

In this post, we will see how 3T MongoChef makes it easier to write, debug and run Map-Reduce jobs using its new Map-Reduce screen.

MongoDB’s Map-Reduce is the flexible cousin of the Aggregation pipeline. In general, it works by taking the data through two stages:

  • a map stage that processes each input document and emits zero or more key/value pairs for it
  • a reduce stage that combines the values emitted for each key by the map stage

The main advantage over the Aggregation pipeline is that Map-Reduce can run arbitrary JavaScript in each stage, which enables operations that would otherwise be impossible. The cost is lower performance, typically in the form of longer execution times. You can read more about it in MongoDB’s reference documentation.
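To make the two stages concrete, here is a minimal sketch in plain JavaScript that mimics the data flow outside of MongoDB. The emit() stub, the groups and results variables and the sample documents are all assumptions made for illustration. This is not how the server is implemented, and the tag-counting job is only an example.

```javascript
// Hypothetical in-memory emulation of the two stages (not MongoDB's implementation).
var groups = {};

// Stand-in for the emit() that MongoDB provides to map functions:
// collect every emitted value under its key.
function emit(key, value) {
    if (!Object.prototype.hasOwnProperty.call(groups, key)) {
        groups[key] = [];
    }
    groups[key].push(value);
}

// Assumed sample documents, each carrying a list of tags.
var docs = [
    { _id: 1, tags: ["cats", "travel"] },
    { _id: 2, tags: ["cats"] }
];

// Map stage: emit one (tag, 1) pair per tag of the document bound to `this`.
var map = function () {
    this.tags.forEach(function (tag) {
        emit(tag, 1);
    });
};

// Reduce stage: combine all values emitted for the same key.
var reduce = function (key, values) {
    return values.reduce(function (sum, n) { return sum + n; }, 0);
};

// Run the map stage over every document.
docs.forEach(function (doc) { map.call(doc); });

// Run the reduce stage once per key.
var results = {};
Object.keys(groups).forEach(function (key) {
    var values = groups[key];
    // Like MongoDB, skip reduce() when a key has only a single value.
    results[key] = values.length === 1 ? values[0] : reduce(key, values);
});

console.log(results); // { cats: 2, travel: 1 }
```

Because both stages are ordinary functions, anything expressible in JavaScript can go into them, which is where the flexibility comes from.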

For this tutorial, we will Map-Reduce a collection of documents with image metadata. The relevant parts of the schema are:

{
    "_id" : 592341,
    "tags" : [
        "cats",
        "kittens",
        "travel"
    ]
}

If you haven’t installed 3T MongoChef already, it’s available for Windows, Mac and Linux here: http://3t.io/mongochef/

Objective: Group Images by Tag, Except for Those That Include the “work” Tag

To achieve this, we will need to write a Map-Reduce job that will:

  1. Exclude all images which include the “work” tag.
  2. Have the map() function emit, for each tag of an image, the tag as the key and the image id as the value.
  3. Have the reduce() function combine the image ids for each tag.

Let us start by opening MongoChef’s new Map-Reduce screen: select the Open Map-Reduce option from the context menu.

Opening the Map-Reduce screen in 3T MongoChef

Filtering the Input Data

Clicking on the “Input data” tab and then the “Preview Input” toolbar button shows us a preview of the collection data. It is here that we can shape the data fed into the Map-Reduce job and omit any image tagged “work”, using the following query:

{ "tags": { $ne: "work" } }

We can inspect the data that will be fed into the map function by clicking the “Preview Output” toolbar button.

Input data sample
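The effect of the filter can be sketched in plain JavaScript. When applied to an array field, $ne matches documents whose array does not contain the value, and it also matches documents that lack the field altogether. The function name matchesFilter and the sample documents below are made up for illustration.

```javascript
// Hypothetical plain-JavaScript equivalent of { "tags": { $ne: "work" } }
// for documents whose "tags" field is an array or is missing.
function matchesFilter(doc) {
    return !(Array.isArray(doc.tags) && doc.tags.indexOf("work") !== -1);
}

// Assumed sample documents.
var sample = [
    { _id: 1, tags: ["cats", "kittens"] },
    { _id: 2, tags: ["work", "travel"] },
    { _id: 3 }
];

// Keep only the ids of the documents that pass the filter.
var kept = sample.filter(matchesFilter).map(function (doc) { return doc._id; });
console.log(kept); // [ 1, 3 ]
```

Document 2 is dropped as soon as any one of its tags equals “work”, which is exactly the behaviour the first objective asks for.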

Mapping the Collection

For the second step, we move to the “map()” tab.

In this tab we want to specify the function responsible for emitting one or more key/value pairs for each document. The following function gets the job done:

function () {
    for (var index in this.tags) {
        emit ( this.tags[index] , this._id );
    }
}

We can sample the map() function’s output by clicking the preview button and verify that it indeed does the trick. This feature is particularly handy before submitting a job that could potentially run for hours. The “map() sample output” tab gives us a rich view of how our map() function works, showing the emitted key/value pairs along with the document that produced them and its _id.

Map output preview
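As a rough illustration of what the preview shows, the snippet below runs the same map() function against the sample document from the start of this post. The emit() stub and the emitted array are stand-ins written for this sketch. They are not part of MongoDB or MongoChef.

```javascript
// Stand-in for MongoDB's emit(): record each emitted key/value pair.
var emitted = [];
function emit(key, value) {
    emitted.push({ key: key, value: value });
}

// The map() function from this post, unchanged apart from spacing.
var map = function () {
    for (var index in this.tags) {
        emit(this.tags[index], this._id);
    }
};

// The sample document shown earlier.
var doc = { _id: 592341, tags: ["cats", "kittens", "travel"] };

// MongoDB binds `this` to the current document; call() mimics that.
map.call(doc);

emitted.forEach(function (pair) {
    console.log(pair.key + " -> " + pair.value);
});
// cats -> 592341
// kittens -> 592341
// travel -> 592341
```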

Reducing the Data

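MongoChef’s default implementation of the reduce() function just so happens to do the rest for us. Concatenating the values array with an empty string converts it into a comma-separated list of image ids: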

function (key, values) {    
    var reducedValue = "" + values;
    return reducedValue;
}

Again, the Preview Output toolbar button will let us verify we did it right. Were we writing a more complex reduce() function or trying to debug what was being fed in, we could sample the input by clicking on the preview input button. That gives us a few of the key-value pairs that are emitted and then reduced.

Reduce sample input
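To see why the default works, here is a small sketch run outside of MongoDB with made-up image ids. MongoDB may call reduce() more than once for the same key and feed an earlier result back in as one of the values, so the function has to cope with its own output as input.

```javascript
// MongoChef's default reduce() from this post.
var reduce = function (key, values) {
    var reducedValue = "" + values;
    return reducedValue;
};

// Hypothetical image ids emitted for the key "cats".
var once = reduce("cats", [592341, 592342, 592343]);
console.log(once); // 592341,592342,592343

// Re-reduce: a partial result is combined with a further value.
var partial = reduce("cats", [592341, 592342]);
var again = reduce("cats", [partial, 592343]);
console.log(again === once); // true
```

The string concatenation gives the same result whether the ids arrive in one call or across several, which is the property a reduce() function needs.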

Finally

MongoDB allows an optional final stage in a Map-Reduce job: a finalize() function that post-processes each reduced value. Let’s use it to make the output easier to read:

function (key, reducedValue) {
    var finalValue = "tag '" + key + "' was found in images: " + reducedValue;
    return finalValue;
}

A quick inspection of finalize()’s sample output and we are ready to submit a job that will process all of the data.

Finalize sample output
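A quick sketch of what finalize() returns, using made-up ids. The second call covers a tag that occurs in only one image: MongoDB does not call reduce() for a key with a single value, so finalize() receives the raw number instead of a string.

```javascript
// The finalize() function from this post.
var finalize = function (key, reducedValue) {
    var finalValue = "tag '" + key + "' was found in images: " + reducedValue;
    return finalValue;
};

// Value produced by reduce(): a comma-separated string of ids.
var many = finalize("cats", "592341,592342");
console.log(many);
// tag 'cats' was found in images: 592341,592342

// Value that skipped reduce(): the single emitted id, still a number.
var single = finalize("travel", 592341);
console.log(single);
// tag 'travel' was found in images: 592341
```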

Running the Map-Reduce Job

Now that we have set all the parameters of the job and are sure that all our functions run as intended, we can submit the Map-Reduce job to run over the whole collection by clicking the “Execute” button on the toolbar.

This action will open a new tab which will contain the results of the job when it is finished:

Finished Job!

Clicking on Show details will bring up a dialog showing execution statistics as well as a configuration summary for this job.

Finished Map-Reduce job statistics

Map-Reduce Epilogue

Now that the Map-Reduce job is finished, we can save all this work as a script. The script is plain JavaScript, so the saved file can be run in IntelliShell or even the basic mongo shell and will produce identical results.

// *** 3T Software Labs, MongoChef: MapReduce Job ****

// Variable for db
var __3t_mongochef_db = "exam";

// Variable for map
var __3t_mongochef_map = function () {
    for (var index in this.tags) {
        emit(this.tags[index], this._id);
    }
};

// Variable for reduce
var __3t_mongochef_reduce = function (key, values) {    
    var reducedValue = "" + values;
    return reducedValue;
};

// Variable for finalize
var __3t_mongochef_finalize = function (key, reducedValue) {
    var finalValue = "tag '" + key + "' was found in images: " + reducedValue;
    return finalValue;
};

db.runCommand({ 
    mapReduce: "images",
    map: __3t_mongochef_map,
    reduce: __3t_mongochef_reduce,
    finalize: __3t_mongochef_finalize,
    out: { "inline" : 1},
    query: { "tags": { $ne: "work" } },
    sort: { },
    inputDB: "exam",
 });

Do you have an existing script that you’ve been working with already? No problem: MongoChef will load it into the Map-Reduce screen. Just click the “Open Map-Reduce File” toolbar button, select the file, and there you have it!