How to Do MongoDB Map-Reduce Queries Easily with 3T MongoChef

In this post, we will see how 3T MongoChef makes writing, debugging and running Map-Reduce jobs easy with the amazing new Map-Reduce Screen.

MongoDB’s Map-Reduce is the flexible cousin of the Aggregation pipeline. In general, it works by taking the data through two stages:

  • a map stage that processes each document and emits one or more objects for each input document
  • a reduce stage that combines emitted objects from the output of the map operation

The main advantage over the Aggregation pipeline is that Map-Reduce can use arbitrary JavaScript in each stage, which enables operations that are otherwise impossible, though at the cost of lower performance (potentially longer execution times). You can read more about it in MongoDB’s reference documentation.
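To make the two stages concrete, here is a minimal sketch in plain JavaScript that imitates them in memory. It is not MongoDB's engine: `runMapReduce`, the sample documents and the tag-counting functions are all hypothetical, and `emit` is faked as a global because MongoDB provides it inside map().

```javascript
// In-memory imitation of the two stages; this is NOT MongoDB's engine.
var emit; // MongoDB provides emit() as a global inside map(); we fake it here

function runMapReduce(docs, map, reduce) {
    var groups = {}; // emitted key -> array of emitted values
    emit = function (key, value) {
        if (!groups.hasOwnProperty(key)) {
            groups[key] = [];
        }
        groups[key].push(value);
    };
    // Map stage: call map() once per document, with the document as `this`
    docs.forEach(function (doc) {
        map.call(doc);
    });
    // Reduce stage: combine the values emitted under each key
    var results = {};
    Object.keys(groups).forEach(function (key) {
        var values = groups[key];
        // A key with a single value is passed through without calling reduce()
        results[key] = values.length === 1 ? values[0] : reduce(key, values);
    });
    return results;
}

// Hypothetical example: count how many documents carry each tag
var sampleDocs = [
    { _id: 1, tags: ["cats", "travel"] },
    { _id: 2, tags: ["cats"] }
];
var tagCounts = runMapReduce(
    sampleDocs,
    function () {
        for (var i = 0; i < this.tags.length; i++) {
            emit(this.tags[i], 1);
        }
    },
    function (key, values) {
        return values.reduce(function (sum, n) { return sum + n; }, 0);
    }
);
console.log(tagCounts); // { cats: 2, travel: 1 }
```

The single-value shortcut mirrors MongoDB's documented behaviour of not calling reduce() for a key that has only one emitted value.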

For this tutorial, we will Map-Reduce a collection of documents with image metadata. The relevant parts of the schema are:

{
    "_id" : 592341,
    "tags" : [
        "cats",
        "kittens",
        "travel"
    ]
}

If you haven’t installed 3T MongoChef already, it’s available for Windows, Mac and Linux here: http://3t.io/mongochef/

Objective: Group Images by Tag except for Those Which Include The “work” Tag

To achieve this, we will need to write a Map-Reduce job that will:

  1. Exclude all images which include the “work” tag.
  2. Have the map() function emit the image id once per tag, using the tag as the key.
  3. Have the reduce() function combine the image ids for each tag.

Let us start by opening MongoChef’s new Map-Reduce screen by selecting the Open Map-Reduce option from the context menu:

Open Map-Reduce in MongoChef

Opening the Map-Reduce in 3T MongoChef

Filtering the Input Data

Clicking on the “Input data” tab and then the “Preview Input” toolbar button shows us a preview of the collection data. It is here that we can shape the data fed into the Map-Reduce job and omit any image tagged “work”. This is achieved by the following query:

{ "tags": { $ne: "work" } }

We can inspect the data that will be fed into the map function by clicking the “Preview Output” toolbar button.

Input data sample
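When the field is an array, as "tags" is here, $ne matches only documents in which no element equals the given value (documents without the field match too). A rough plain-JavaScript equivalent, assuming "tags" is either an array or missing, could look like this; `excludeTagged` and the sample documents are hypothetical.

```javascript
// Hypothetical helper: keeps the documents that { tags: { $ne: tag } } would
// match, assuming `tags` is either an array or missing.
function excludeTagged(docs, tag) {
    return docs.filter(function (doc) {
        var tags = doc.tags || [];
        return tags.indexOf(tag) === -1;
    });
}

var images = [
    { _id: 1, tags: ["cats", "kittens"] },
    { _id: 2, tags: ["work", "travel"] },
    { _id: 3 } // no tags field at all
];
var kept = excludeTagged(images, "work").map(function (doc) {
    return doc._id;
});
console.log(kept.join(", ")); // 1, 3
```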


Mapping the Collection

For the second step, we move to the “map()” tab.

In this tab we want to specify the function responsible for emitting one or more key-value pairs for each document. The following function gets the job done:

function () {
    for (var index in this.tags) {
        emit ( this.tags[index] , this._id );
    }
}

We can sample the map() function’s output by clicking the preview button verifying that this indeed did the trick. This feature comes in particularly handy, especially before submitting a job that could potentially run for hours. The “map() sample output” tab gives us a rich view of how our map() function works, showing emitted key/value pairs along with the document that produced them and its _id.

Map output preview
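Outside MongoChef, we can get a feel for what the preview shows by running the same map() function against the sample document in plain JavaScript. This is only a sketch: the `emit` below is a stand-in that records pairs in an array, whereas MongoDB supplies its own emit().

```javascript
// Stand-in for MongoDB's global emit(): just record every emitted pair
var emitted = [];
function emit(key, value) {
    emitted.push({ key: key, value: value });
}

// The map() function from the post, unchanged
var map = function () {
    for (var index in this.tags) {
        emit ( this.tags[index] , this._id );
    }
};

// Call it the way MongoDB does: with the document bound to `this`
map.call({ _id: 592341, tags: ["cats", "kittens", "travel"] });

emitted.forEach(function (pair) {
    console.log(pair.key + " -> " + pair.value);
});
// cats -> 592341
// kittens -> 592341
// travel -> 592341
```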


Reducing the Data

MongoChef’s default implementation of the reduce() function just so happens to do the rest for us:

function (key, values) {    
    var reducedValue = "" + values;
    return reducedValue;
}

Again, the Preview Output toolbar button will let us verify we did it right. Were we writing a more complex reduce() function or trying to debug what was being fed in, we could sample the input by clicking on the preview input button. That gives us a few of the key-value pairs that are emitted and then reduced.

Reduce sample input
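The default reduce() works because `"" + values` coerces the array of ids into a comma-separated string. MongoDB may call reduce() more than once for the same key, feeding earlier results back in, so it is worth checking that the function still composes. The sketch below does that in plain JavaScript with made-up ids. Note that MongoDB's documentation asks reduce() to return the same type that map() emits, which this default (string out, ids in) does not strictly follow, so treat it as a convenience for readable output.

```javascript
// The default reduce() from the post, unchanged
var reduce = function (key, values) {
    var reducedValue = "" + values;
    return reducedValue;
};

// All ids for one tag reduced in a single call (ids are made up)
var allAtOnce = reduce("cats", [101, 102, 103]);

// The same ids reduced in two steps, as can happen on larger inputs
var partial = reduce("cats", [101, 102]);
var inTwoSteps = reduce("cats", [partial, 103]);

console.log(allAtOnce);  // 101,102,103
console.log(inTwoSteps); // 101,102,103
```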


Finally

MongoDB allows a final stage in a Map-Reduce job, in which a finalize() function does some last processing on each reduced value. Let’s use this just so the output is easier to read:

function (key, reducedValue) {
    var finalValue = "tag '" + key + "' was found in images: " + reducedValue;
    return finalValue;
}

A quick inspection of finalize()’s sample output and we are ready to submit a job that will process all of the data.

Finalize sample output
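As a quick sanity check outside the tool, the finalize() function can be called directly in plain JavaScript. The inputs below are made up: the first is a string as produced by reduce(), the second a bare id, which is what finalize() would receive for a tag with a single image, since reduce() is not called in that case.

```javascript
// The finalize() function from the post, unchanged
var finalize = function (key, reducedValue) {
    var finalValue = "tag '" + key + "' was found in images: " + reducedValue;
    return finalValue;
};

var manyImages = finalize("cats", "101,102,103"); // value came out of reduce()
var oneImage = finalize("travel", 592341);        // value came straight from map()

console.log(manyImages); // tag 'cats' was found in images: 101,102,103
console.log(oneImage);   // tag 'travel' was found in images: 592341
```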


Running the Map-Reduce Job

Now that we have set all the parameters of the job, and are sure that all our functions run as intended, we can submit the Map-Reduce job to run through the whole collection dataset by clicking the “Execute” button on the toolbar.

This action will open a new tab which will contain the results of the job when it is finished:

Finished job

Finished Job!

Clicking on Show details will bring up a dialog showing execution statistics as well as a configuration summary for this job.

Job statistics

Finished Map-Reduce job statistics

Map-Reduce Epilogue

Now that the Map-Reduce job is finished, we can save all this work as a script. The format is 100% JavaScript code, so the saved file can be run in IntelliShell or even the basic mongo shell and will produce identical results.

// *** 3T Software Labs, MongoChef: MapReduce Job ****

// Variable for db
var __3t_mongochef_db = "exam";

// Variable for map
var __3t_mongochef_map = function () {
    for (var index in this.tags) {
        emit ( this.tags[index] , this._id );
    }
};

// Variable for reduce
var __3t_mongochef_reduce = function (key, values) {    
    var reducedValue = "" + values;
    return reducedValue;
};

// Variable for finalize
var __3t_mongochef_finalize = function (key, reducedValue) {
    var finalValue = "tag '" + key + "' was found in images: " + reducedValue;
    return finalValue;
}
;

db.runCommand({ 
    mapReduce: "images",
    map: __3t_mongochef_map,
    reduce: __3t_mongochef_reduce,
    finalize: __3t_mongochef_finalize,
    out: { "inline" : 1},
    query: { "tags": { $ne: "work" } },
    sort: { },
    inputDB: "exam",
 });

Do you have an existing script that you’ve been working with already? No problem: MongoChef will load it into the Map-Reduce screen. Just click the “Open Map-Reduce File” toolbar button, select the file, and there you have it!

How to Do MongoDB Aggregation Queries Easily with 3T MongoChef

In this post we’re going to take a look at how to do MongoDB aggregation queries easily with the amazing new Aggregation Screen in 3T MongoChef.

Prefer to watch?
See the accompanying MongoDB Aggregation video.

For this tutorial, we’re going to build a query based on the freely available housing data from the City of Chicago Data Portal to learn how to use the incredible new features and query support the Aggregation Screen provides.

If you haven’t installed 3T MongoChef 3.0 already, it’s available for Windows, Mac and Linux here: http://3t.io/mongochef/

Creating the Aggregation Query

Once we’ve opened up 3T MongoChef and connected to the database, we can select the collection we wish to query:

Select Collection for MongoDB Aggregation Query

We can open the Aggregation Screen by clicking the large ‘Aggregate’ button in the main toolbar at the top, selecting ‘Open Aggregation Screen’ from the right-click context menu, or pressing the ‘F4’ shortcut key.

We now have an empty aggregation query, ready to be filled up, so let’s get cracking!

New Pipeline MongoDB Aggregation Query

Identifying the Question We Want to Answer

The question we want to ask of our data is simple:

Which zip codes have the greatest number of senior housing units available?

To work out how we’ll answer this and how we’ll form our query, let’s take a look at the data. Click ‘Execute full pipeline’ (executing an empty pipeline simply shows the contents of the collection).

Full Pipeline Results MongoDB Aggregation Query

If you prefer a JSON view of the data (and 3T MongoChef supports dynamically switching between tree, table and JSON views of your result data), it’s included below:

{ 
    "_id" : ObjectId("544f9533d4c6dc758c28fde4"), 
    "community_area" : {
        "name" : "Albany Park", 
        "number" : 14
    }, 
    "property" : {
        "type" : "Senior", 
        "name" : "Mayfair Commons"
    }, 
    "address" : "4444 W. Lawrence Ave.", 
    "zip_code" : "60630", 
    "phone_number" : "773-205-7862", 
    "management_company" : "Metroplex, Inc.", 
    "units" : 97, 
    "location" : {
        "x_coordinate" : 1145674.7538177613, 
        "y_coordinate" : 1931569.979044555, 
        "latitude" : 41.9682242321, 
        "longitude" : -87.7397474866, 
        "description" : "4444 W Lawrence Ave\n(41.968224232060564, -87.73974748655358)"
    }
}

OK, so we can see we have the fields we need: we can check "property.type" to see that it’s senior housing, and "zip_code" and "units" give us the zip code and the number of available units, respectively.

To answer our question, we need to combine these into the right aggregation query. Let’s create the first stage of our query where we’ll match against the senior property type.

Adding a New Stage

Click ‘Add New Stage’ and you’ll see a new stage in the ‘Pipeline’ tab.

New Stage MongoDB Aggregation Query

Double-click the new stage to edit it (or simply select the ‘Stage 1’ tab):

Match Operator MongoDB Aggregation Query

The screenshot above jumps ahead a little bit as the stage specification has already been filled, but let’s break down each piece in turn.

First, notice the ‘$match’ in the combo box. This is where we select the stage’s ‘operator’. A stage operator defines what the stage actually does. The ‘$match’ operator takes the input set of documents and outputs only those that match the given criteria. It is essentially a filter. A full list of the supported operators and their meaning is available here: http://docs.mongodb.org/manual/meta/aggregation-quick-reference/ (this link is always readily available by clicking ‘Operator Quick Reference’ in the app).

For convenience, the specification of the Stage 1 ‘$match’ operator is repeated below:

{
    "property.type": "Senior"
}

In the stage’s specification, we can see that we are matching against the "Senior" property type, meaning only documents with a value of "Senior" for the field "property.type" will be passed onto the (yet to be created) next stage of the pipeline for further processing.
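For intuition, the filtering this stage performs can be sketched in plain JavaScript. This is an in-memory imitation that handles simple equality on dotted paths only, not MongoDB's query language; `getPath`, `matchStage` and the second sample document are hypothetical.

```javascript
// Hypothetical helper: read a dotted path such as "property.type" from a document
function getPath(doc, path) {
    return path.split(".").reduce(function (value, part) {
        return value == null ? undefined : value[part];
    }, doc);
}

// Keep only documents whose fields equal every value in the spec
// (covers simple equality only, not MongoDB's full query language)
function matchStage(docs, spec) {
    return docs.filter(function (doc) {
        return Object.keys(spec).every(function (path) {
            return getPath(doc, path) === spec[path];
        });
    });
}

var properties = [
    { property: { type: "Senior", name: "Mayfair Commons" }, zip_code: "60630", units: 97 },
    // made-up document of another type, to show it being filtered out
    { property: { type: "Family", name: "Hypothetical Homes" }, zip_code: "60630", units: 40 }
];
var seniors = matchStage(properties, { "property.type": "Senior" });
console.log(seniors.length);           // 1
console.log(seniors[0].property.name); // Mayfair Commons
```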

We can check the output of this and any other stage at any time by clicking ‘Show output from the selected stage’. Similarly, we can see the input of any stage at any time by clicking ‘Show input to the selected stage’. This is a really nice and convenient feature, as it makes it easy to keep track of the precise form of the data we are working with at each stage in the pipeline.

We can see in the ‘Stage 1 output’ tab that we have the results we need from this stage, so let’s go on and create the next.

Grouping Results

We now need a way to group together the results from Stage 1 by zip code and then add up the available units figures in each group. The ‘$group’ operator is exactly what we need for this.

Group Operator MongoDB Aggregation Query

The Stage 2 ‘$group’ operator specification is repeated below:

{
    _id: "$zip_code",
    totalUnits: { $sum: "$units" }
}

The specification of Stage 2 states that each document output from this stage has an "_id" field holding a distinct zip code, so input documents with the same zip code are grouped together, and a "totalUnits" field whose value is the sum of the "units" values of all the documents in that group. We can see the input and output tabs for this stage in the screenshot and can confirm that a reduction has taken place: the 70 documents input to this stage contain 36 distinct zip codes, so 36 documents are output from this stage.
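The grouping can be imitated in a few lines of plain JavaScript, which may help when reasoning about what the stage outputs. This is a sketch over made-up input (only the first document's figures come from the sample above); `groupUnitsByZip` is a hypothetical name, and MongoDB does not guarantee any particular order for $group output.

```javascript
// In-memory equivalent of { _id: "$zip_code", totalUnits: { $sum: "$units" } }
function groupUnitsByZip(docs) {
    var totals = {}; // zip code -> running sum of units
    docs.forEach(function (doc) {
        totals[doc.zip_code] = (totals[doc.zip_code] || 0) + doc.units;
    });
    return Object.keys(totals).map(function (zip) {
        return { _id: zip, totalUnits: totals[zip] };
    });
}

var stage1Output = [
    { zip_code: "60630", units: 97 },
    { zip_code: "60630", units: 50 },  // made-up second property in the same zip code
    { zip_code: "60640", units: 120 }  // made-up property in another zip code
];
var grouped = groupUnitsByZip(stage1Output);
grouped.forEach(function (group) {
    console.log(group._id + ": " + group.totalUnits);
});
// 60630: 147
// 60640: 120
```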

Finding the Answer

As we want to know the zip codes that have the greatest number of senior housing units available, it would be convenient to sort the results from the greatest to the least total units available.

To do this, we’ll create a third stage using the ‘$sort’ operator with the following specification, giving us exactly what we want:

{
    totalUnits: -1
}

Full Query and Results MongoDB Aggregation Query

Going back to the ‘Pipeline’ tab, we can see the result of executing the full query, as well as the full query itself, all in one place. We can see we have the expected number of results from the full pipeline, and we can now answer our question: we have a list of the zip codes that have the greatest number of senior housing units available.
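Put together, the three stage specifications form a pipeline array along the following lines. The collection name "housing" in the comment is hypothetical, and the small function underneath only imitates the $sort stage in memory, using made-up totals, to show what a value of -1 means.

```javascript
// The three stages from this post, assembled in order
var pipeline = [
    { $match: { "property.type": "Senior" } },
    { $group: { _id: "$zip_code", totalUnits: { $sum: "$units" } } },
    { $sort: { totalUnits: -1 } }
];
// In the mongo shell this would be passed as db.housing.aggregate(pipeline),
// where "housing" is a hypothetical collection name.

// In-memory equivalent of the $sort stage: descending by totalUnits
function sortByTotalUnitsDescending(groups) {
    return groups.slice().sort(function (a, b) {
        return b.totalUnits - a.totalUnits;
    });
}

// Made-up totals, purely for illustration
var sorted = sortByTotalUnitsDescending([
    { _id: "60630", totalUnits: 147 },
    { _id: "60640", totalUnits: 320 },
    { _id: "60616", totalUnits: 210 }
]);
console.log(sorted.map(function (g) { return g._id; }).join(", "));
// 60640, 60616, 60630
```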

Wasn’t that easy? :-)

Specifying Query Options

Depending on your particular query, you may wish to specify options such as using a database cursor for the results (if the results are large), allowing the query to write temporary intermediate results to disk, or explaining how the query is processed rather than running it.

These options can be set in the ‘Options’ tab. Note that these options only became available in MongoDB 2.6, so if you are connected to a MongoDB 2.4 or earlier instance, the ‘Options’ tab is not shown.

Options MongoDB Aggregation Query
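For reference, these three options correspond to the `cursor`, `allowDiskUse` and `explain` options of the mongo shell's aggregate() helper. A sketch of how they might be written by hand follows; the batch size, the collection name "housing" and the variable names are assumptions for illustration only.

```javascript
// Options for a large result set (the values here are illustrative)
var largeResultOptions = {
    allowDiskUse: true,        // let stages write temporary data to disk
    cursor: { batchSize: 100 } // return results through a cursor, 100 per batch
};

// Options for inspecting the query instead of running it
var explainOptions = {
    explain: true              // report how the pipeline would be processed
};

// Usage in the mongo shell, with a hypothetical collection name and a
// `pipeline` array defined elsewhere:
//   db.housing.aggregate(pipeline, largeResultOptions);
//   db.housing.aggregate(pipeline, explainOptions);
console.log(Object.keys(largeResultOptions).join(", ")); // allowDiskUse, cursor
```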

Sharing Aggregation Queries

The aggregation queries you have created can be saved to and loaded from file, so not only can you reload them in future sessions, but you can also share them with colleagues and other users. There is also a preview of the raw MongoDB script of the aggregation query available by selecting ‘Show Query Preview’ from the context menu. This can be rather handy if you simply wish to examine the raw underlying script, make a quick copy to stick in an email, or perhaps combine it into a larger, more complex query in 3T MongoChef’s IntelliShell.

Query Preview MongoDB Aggregation Query

Handy References

It can take a bit of time to master all the different operators available in the aggregation pipeline, so links to the MongoDB Aggregation Pipeline Quick Reference and the Aggregation Section of the MongoDB Manual are always available within a click’s reach directly in the app itself via the ‘Operator Quick Reference’ and ‘Aggregation Tutorial’ links, respectively. It won’t be too long before you’re masterfully producing complex MongoDB aggregation queries of your own!

OK, that’s it for this post. I hope you feel the same delight as we do about the amazing new features, convenience and boost to productivity the new Aggregation Screen in 3T MongoChef offers.

Please do check out 3T MongoChef, the best GUI for MongoDB. A little example of the rich code completion and easy in-line editing experience it offers is shown below:

IntelliShell for MongoDB

We’re always very keen to hear about your experiences and ideas for 3T MongoChef. If you’d like to tell us about them, please visit our feedback page or click the ‘Feedback’ toolbar button in the app.

Also, please check out our Schema Explorer & Documentation and Data Compare & Sync tools at 3T.io, as I expect they’ll also help in making you a much more powerful and productive MongoDB user.