SSL Certificate Validation & Handling Security Improved

MongoChef 4.5 added support for MongoDB 3.4. The 3.4 MongoDB driver brings quite a few bug fixes and changes, one of which is stricter, more secure handling of SSL certificates.

MongoChef will make sure that the certificate presented by the server indeed belongs to the server. The SSL/TLS protocol is now more strictly adhered to.

However…

… some SSL connections may now fail to work if not properly configured.

Each certificate protects a specific entity, which is stated in the Common Name (CN) field of the certificate’s subject (see https://support.dnsimple.com/articles/what-is-common-name/). The certificate is accepted for a connection only if the requested hostname matches the certificate’s common name or one of its subject alternative names.

If this is not the case, MongoChef will now by default not allow the connection.

SSL Connection Issues

If you are having problems connecting, you may be connecting to the MongoDB server by IP address rather than by the hostname stated in the certificate’s CN. The hostname check then looks for a subject alternative name that matches that IP address (xx.xx.xx.xx), finds none, and fails with an error like “CertificateException: No subject alternative names present”.

One way to test this is to connect by name rather than by IP, e.g. “my-ssl-mongod.server.com” instead of “xx.xx.xx.xx”. Make sure that this name resolves to the correct IP address. If your DNS does not resolve it, add an entry to the local operating system’s hosts file (e.g. /etc/hosts on Unix systems).
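
To make the failure mode more concrete, here is a minimal Python sketch of a hostname check. It is a deliberately simplified illustration and not the code of MongoChef or of the MongoDB driver: real TLS libraries also handle wildcards and other details. The function names and the address 192.0.2.10 are made up for the example.

```python
import ipaddress


def is_ip(host):
    """Return True if the requested host is an IP address literal."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def host_matches_certificate(host, common_name, dns_alt_names=(), ip_alt_names=()):
    """Simplified check: an IP must match an IP alternative name,
    a hostname must match the CN or a DNS alternative name."""
    if is_ip(host):
        wanted = ipaddress.ip_address(host)
        return any(ipaddress.ip_address(ip) == wanted for ip in ip_alt_names)
    candidates = [common_name, *dns_alt_names]
    return any(host.lower() == name.lower() for name in candidates)


# Connecting by name matches the CN.
print(host_matches_certificate("my-ssl-mongod.server.com", "my-ssl-mongod.server.com"))
# Connecting by IP fails because the certificate lists no alternative names.
print(host_matches_certificate("192.0.2.10", "my-ssl-mongod.server.com"))
```

In this sketch, the IP-based connection is only accepted if the certificate explicitly lists that IP address as an alternative name.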

Another reason the connection may not be working anymore is that the server’s certificate (and/or the PEM client key file you are using – if any) is invalid. This is usually due to the use of a certificate which was not generated with a proper CN. Note that this may also mean that you are the target of a MITM (man-in-the-middle) attack.

How to Override

For our users’ convenience, we have added a new SSL option in the latest MongoChef 4.5.2 release. It sets your connection to also allow invalid hostnames, which emulates the connection behavior of MongoChef 4.4.x.

SSL Allow Invalid Hostnames
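
For readers who want to see what such an option means at the TLS level, here is a small analogy using Python’s standard `ssl` module. It is not how MongoChef implements the option; `make_context` is a hypothetical helper written for this example. The point it illustrates is that the hostname check can be switched off while the certificate chain is still verified.

```python
import ssl


def make_context(allow_invalid_hostnames=False):
    """Build a client-side TLS context (hypothetical helper for illustration)."""
    # The default context verifies the certificate chain AND the hostname.
    ctx = ssl.create_default_context()
    if allow_invalid_hostnames:
        # Skip only the hostname match; the certificate itself is still verified.
        ctx.check_hostname = False
    return ctx


strict = make_context()
lenient = make_context(allow_invalid_hostnames=True)
print(strict.check_hostname, lenient.check_hostname)
```

Skipping the hostname check removes exactly the protection described above, so it is best reserved for servers you control.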

MongoDB Schema Discovery and Exploration with MongoChef

One of the great things that we love about MongoDB is of course that it is schema-less. This makes adapting your application to changing requirements a breeze. That said, your data will often have a fixed implicit schema – e.g. each document in your employees collection will likely always have a first and last name field. So, making sure from time to time that all documents indeed contain certain fields is probably a good idea. Likewise, having the power to dynamically add new or discontinue old fields in your documents can often lead to a proliferation of various “versions” of your document schema. So, getting a feel for how often a certain field occurs in your collection can be quite helpful.

Luckily, MongoChef Pro makes discovering and exploring the schema in your MongoDB collections super-easy! With MongoChef, you can quickly:

  • check the health of your schema
  • find schema anomalies
  • inspect data outliers
  • visualize data distributions

So, let’s dive right in!

Schema Discovery

Select the collection whose schema you want to explore (in this example “customers”) and click the “Schema” icon in the global toolbar. This will open a tab where you can now configure your schema discovery:

mongodb schema discovery

  1. Here you can control how MongoChef should sample documents from the collection for the schema discovery. By default MongoChef will analyze randomly selected documents. You can choose between “Random”, “First”, “Last”, or “All” – in which case MongoChef will read in all documents in the collection.
  2. MongoChef gives you full control over how many documents should be read for the schema discovery. For our example, we will look at 1,000 documents.
  3. By default, MongoChef will not analyze the elements of any array fields when it encounters them. The reason is that arrays can often contain thousands of elements of the same type, which can lead to bloated schema results. You can of course override this default behavior.
  4. You can also provide a query to further control the document set that should be used for the schema discovery. In our example, we will use an empty query, which will return all documents in the collection.
  5. Click “Run analysis” to start.
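
The four sampling modes from step 1 can be pictured with a short Python sketch. This is only an in-memory illustration under our own assumptions, not MongoChef’s implementation; `sample_documents` and its parameters are hypothetical names.

```python
import random


def sample_documents(docs, mode="Random", limit=1000, seed=None):
    """Pick the documents to analyze from an in-memory list (illustrative only)."""
    if mode == "All":
        return list(docs)
    if mode == "First":
        return list(docs[:limit])
    if mode == "Last":
        # docs[-0:] would return everything, so handle a zero limit explicitly.
        return list(docs[-limit:]) if limit else []
    if mode == "Random":
        rng = random.Random(seed)
        return rng.sample(list(docs), min(limit, len(docs)))
    raise ValueError(f"unknown sampling mode: {mode}")


docs = [{"_id": i} for i in range(10)]
print(sample_documents(docs, mode="First", limit=3))
print(sample_documents(docs, mode="Last", limit=3))
```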

After the analysis has completed, you will see the schema discovery result page:

mongodb schema discovery result

In the left-hand pane (1), you will see the discovered schema tree. For each field you see its name, its global probability – i.e. which percentage of documents were found to contain that field, and its discovered field type(s). You can now easily explore your schema as you would with a JSON document.
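
To illustrate what “global probability” and “field type(s)” mean, here is a minimal Python sketch that computes both for a list of sample documents. It is our own simplified take on the idea, not MongoChef’s algorithm; `discover_schema` is a hypothetical name, and arrays are not descended into, which mirrors the default setting described above.

```python
from collections import Counter, defaultdict


def discover_schema(docs):
    """Return {dotted_field_path: {"probability": float, "types": Counter}}."""
    presence = Counter()          # number of documents containing each path
    types = defaultdict(Counter)  # type names seen for each path

    def walk(value, prefix):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            presence[path] += 1
            types[path][type(child).__name__] += 1
            if isinstance(child, dict):
                walk(child, path)  # embedded documents only, arrays are skipped

    for doc in docs:
        walk(doc, "")
    total = len(docs)
    return {p: {"probability": presence[p] / total, "types": types[p]} for p in presence}


sample = [
    {"name": {"first": "Ann", "last": "Lee"}, "title": "Dr"},
    {"name": {"last": "Kim"}},
]
for path, info in discover_schema(sample).items():
    print(path, info["probability"], dict(info["types"]))
```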

In the right-hand pane (2), MongoChef displays information about the type and data distribution of the currently selected schema field.

Verifying Your Schema

The schema tree is a great tool to understand and verify your schema. For our example, we assume a – fictitious – “customers” collection which contains the personal information of the customers of our – equally fictitious – shop. With the schema tree, we can now easily verify that required fields like “name” and “transactions” do indeed occur in 100% of our documents. We can also observe that the optional field “title” is apparently provided by 56.7% of our customers. The schema tree is also really helpful in discovering schema anomalies:

Discovering Missing Fields

mongodb schema discovery missing fields

Looking at our schema discovery results, we see that field “first”, which stores our customers’ first names, is missing in 0.4% of our documents. This may, for example, suggest a flaw in our web shop software. MongoChef makes it super-easy to explore the actual data and learn more about the documents that are missing the “first” field: if we right-click the “first” field, we can choose “Explore documents not containing selected field”. This will open a new query tab that shows all documents that do not contain the field “first”. Note that this query honors the base query criteria of your analysis but brings up all matching documents in the collection, not just those in the (limited) sample set.

mongodb schema discovery inspect missing fields

We could now inspect those documents which might reveal clues as to what has caused the missing field anomaly.
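
For reference, a query of this kind can be expressed with MongoDB’s `$exists` operator. The Python sketch below only builds the filter document; the helper name `missing_field_query` is ours, and the exact query that MongoChef generates may look different.

```python
def missing_field_query(field, base_query=None):
    """Build a filter for documents that lack `field`, honoring a base query."""
    missing = {field: {"$exists": False}}
    if not base_query:
        return missing
    # Both the base criteria and the "field is missing" condition must hold.
    return {"$and": [base_query, missing]}


# With the empty base query from our example, only the $exists condition remains.
print(missing_field_query("first"))
# With a hypothetical base query, both conditions are combined.
print(missing_field_query("first", {"package": "Free"}))
```

The resulting dictionary could be passed as the filter to a driver call such as PyMongo’s `collection.find(...)`.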

Discovering Unexpected Fields

Looking at our schema discovery results again, we spot another anomaly – an unexpected field:

mongodb schema discovery unexpected field

We see that in 95% of our documents, the field “user_name” is spelled with an underscore. Yet, in 5% of documents, it is spelled with a hyphen. This could for example be down to a typo somewhere in the source code. Once the typo is fixed in the code, MongoChef makes it very easy to also fix it in your collection. Right-click the incorrectly spelled “user-name” field and select “Explore documents containing selected field”.

mongodb-schema-discovery-explore-documents

This will open a new query tab showing all documents containing the selected field (“user-name”). Here right-click the field to rename it:

mongodb schema discovery rename field

and choose “All documents in collection” in the ensuing dialog to rename all occurrences in the collection:

mongodb rename all

Easy-peasy :-)
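
If you prefer to see the rename spelled out, MongoDB’s `$rename` update operator does the same job. The following Python sketch builds the filter and update documents and adds a tiny in-memory equivalent for top-level fields. Both helper names are hypothetical, and this is not necessarily the statement MongoChef runs behind the scenes.

```python
def rename_field_update(old, new):
    """Return (filter, update) documents for renaming a field everywhere it occurs."""
    filter_ = {old: {"$exists": True}}
    update = {"$rename": {old: new}}
    return filter_, update


def apply_rename(docs, old, new):
    """In-memory equivalent for top-level fields, for illustration only."""
    for doc in docs:
        if old in doc:
            doc[new] = doc.pop(old)
    return docs


filter_, update = rename_field_update("user-name", "user_name")
print(filter_, update)
print(apply_rename([{"user-name": "ann"}, {"user_name": "kim"}], "user-name", "user_name"))
```

With a driver such as PyMongo, the pair could be passed to `collection.update_many(filter_, update)`.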

Discovering Incorrect Field Types

Another type of schema outlier that one commonly wants to look out for is incorrect field types. MongoChef makes that really easy to spot too. Consider in our example the “address” field in our customer documents. We store the addresses as an embedded object of the following type:

"address": {"street": {"name": "Scofield", "suffix": "Drive", "number": "34"}}

However, when we look at the “address.street” field in the discovered schema tree, we can see that there appear to be some outliers where “address.street” is of type String:

mongodb schema discovery outlier

When we select the String instance of “address.street” in the schema tree, we can quickly see in the right-hand data pane that in 4 documents something must have gone awry and all street information was stored in a simple string.

mongodb schema discovery type outlier

We can right-click the String instance node in the schema tree to “Explore documents with selected field of type String” to have a closer look at those documents:

mongodb schema discovery explore documents

mongodb schema discovery document view
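
The same kind of type check can be sketched in a few lines of Python. The helpers below count the types found at a dotted field path in a list of documents, and the last lines show a MongoDB filter using the `$type` operator that would select the String outliers. The helper names and sample documents are made up for illustration.

```python
from collections import Counter


def get_path(doc, dotted_path):
    """Follow a dotted path through embedded documents; return (value, found)."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None, False
        current = current[key]
    return current, True


def type_counts(docs, dotted_path):
    """Count the Python type names found at `dotted_path` across the documents."""
    counts = Counter()
    for doc in docs:
        value, found = get_path(doc, dotted_path)
        if found:
            counts[type(value).__name__] += 1
    return counts


docs = [
    {"address": {"street": {"name": "Scofield", "suffix": "Drive", "number": "34"}}},
    {"address": {"street": "34 Scofield Drive"}},
]
print(type_counts(docs, "address.street"))

# MongoDB filter selecting documents where address.street is stored as a string.
string_street_query = {"address.street": {"$type": "string"}}
print(string_street_query)
```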

Exploring Data Distributions

As we have seen, when you click a field or one of its type instances in the schema tree, you can see charts showing – depending on the selected data type – various statistics on the type or data distribution of the field. Now, while MongoChef is not a full-blown BI tool by any means, these data distribution charts can often already give you some very useful insights into your documents.

Value Histograms

For numeric fields, one of the stats charts that the right-hand pane shows is the value histogram. If we look for example at the value histogram of our “transactions” field, we can quickly observe that most customers have engaged in around 50 transactions, with tails of less common numbers of transactions.

mongodb schema discovery histogram
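
Conceptually, a value histogram just groups numeric values into buckets and counts them. Here is a minimal Python sketch with an assumed bucket width of 10 and made-up transaction counts; how MongoChef chooses its buckets is not covered here.

```python
from collections import Counter


def value_histogram(values, bucket_width=10):
    """Count values per bucket; each key is the lower bound of its bucket."""
    buckets = Counter((v // bucket_width) * bucket_width for v in values)
    return dict(sorted(buckets.items()))


transactions = [3, 12, 47, 49, 51, 58, 95]  # made-up sample values
print(value_histogram(transactions))
```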

Top Values

For many data types, the right-hand panel also shows the top values that were found across the analyzed documents. This can often be helpful to spot data outliers. Consider the “package” field in our example. Suppose our customers can subscribe to a “Free”, “Basic”, “Standard”, “XL”, or “XXL” package. However, we see that there are some customers who seem to have a “Beginner” package – which may indicate a backend glitch, for example.

mongodb schema discovery top values

Luckily, we can now use MongoChef to edit those values directly in-place:

mongodb schema discovery in place editing
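
A top-values list is essentially a frequency count, and outliers are the values that fall outside the set you expect. The short Python sketch below shows both for the “package” example. The sample values are made up, and the helper names are ours.

```python
from collections import Counter

# The packages our customers can subscribe to, as listed above.
ALLOWED_PACKAGES = {"Free", "Basic", "Standard", "XL", "XXL"}


def top_values(values, n=10):
    """Return the n most frequent values with their counts."""
    return Counter(values).most_common(n)


def unexpected_values(values, allowed):
    """Return the values (with counts) that are not in the allowed set."""
    return {v: c for v, c in Counter(values).items() if v not in allowed}


packages = ["Basic", "Free", "Basic", "Beginner", "XL", "Basic", "Beginner"]
print(top_values(packages, n=3))
print(unexpected_values(packages, ALLOWED_PACKAGES))
```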

Date Distributions

For date fields, MongoChef shows you in detail the value distributions. Consider, in our example, the “registered_on” field. If we look at its “Monthly value distribution”, we notice that customer registration seems to be particularly strong in the summer as well as in January. This might then provide valuable feedback to the marketing and sales teams.

mongodb schema discovery date distribution
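
A monthly value distribution boils down to counting dates per calendar month. The following Python sketch shows the idea on a handful of made-up registration dates; `monthly_distribution` is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime


def monthly_distribution(dates):
    """Count dates per calendar month (1..12), including months with zero hits."""
    counts = Counter(d.month for d in dates)
    return {month: counts.get(month, 0) for month in range(1, 13)}


registered_on = [
    datetime(2015, 1, 5),
    datetime(2015, 7, 19),
    datetime(2016, 1, 23),
    datetime(2016, 8, 2),
]
print(monthly_distribution(registered_on))
```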

Summary

MongoChef provides a very powerful schema explorer feature that lets you easily discover the schema that is present across the documents in your collection and thereby helps you find schema and data outliers. Drilling down into individual fields, you can see, for each field, various visual statistics relevant to the data type of that field.

MongoDB schema discovery and exploration has never been easier!

How to temporarily enable/disable stages in your MongoDB aggregation pipeline

With the release of MongoChef 4.1.1, we have made it super-easy to quickly disable stages temporarily in your aggregation query. This often comes in very handy while debugging your aggregation query.

To demonstrate this, let’s consider the following simple aggregation query on the City of Chicago’s publicly available housing database.

debug aggregation query

Our aggregation pipeline consists of 4 stages:

  1. A limit stage that restricts the documents going into the pipeline to just 5
  2. A projection stage that keeps only the address, zip and type fields
  3. A match stage that keeps only documents that have “60644” as zip code
  4. A sort stage that sorts the documents in the pipeline by their (property) type
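
Written out as a pipeline definition, the four stages might look like the following Python list of stage documents. The field names `address`, `zip` and `type`, the string zip code, and the ascending sort order are assumptions based on the description above and may differ in the actual data set.

```python
# Each entry is one aggregation stage, in the order described above.
pipeline = [
    {"$limit": 5},                                      # 1. only 5 documents enter the pipeline
    {"$project": {"address": 1, "zip": 1, "type": 1}},  # 2. keep address, zip and type
    {"$match": {"zip": "60644"}},                       # 3. keep documents with zip code 60644
    {"$sort": {"type": 1}},                             # 4. sort by (property) type, ascending
]

for stage in pipeline:
    print(stage)
```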

While designing and debugging our aggregation query, we limited the pipeline to just 5 documents to keep the load on the server as low as possible. Now that we think our query is done, we want to test it for the first time on the entire data set. Since we are not yet entirely convinced that the query does what it is supposed to, we don’t want to get rid of the limit stage just yet. Instead, we’ll just temporarily exclude it from the pipeline. For this, we simply click the stage tab and deselect the “Include in pipeline” option:

disable aggregation stage

When we now run the aggregation query, it will be executed without the (disabled) first stage and we can see that the full result set is returned:

full aggregation result
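
The idea behind an “Include in pipeline” switch can be sketched in Python as a flag next to each stage, with only the flagged stages being sent to the server. The `include` key and the `active_pipeline` helper are hypothetical, and the field names in the stages are assumptions. This is an illustration of the concept, not MongoChef’s internal representation.

```python
def active_pipeline(stages):
    """Return only the stage documents that are currently included."""
    return [s["stage"] for s in stages if s.get("include", True)]


stages = [
    {"stage": {"$limit": 5}, "include": False},  # temporarily disabled
    {"stage": {"$project": {"address": 1, "zip": 1, "type": 1}}},
    {"stage": {"$match": {"zip": "60644"}}},
    {"stage": {"$sort": {"type": 1}}},
]

# The limit stage stays in the definition but is left out of the executed pipeline.
print(active_pipeline(stages))
```

Re-enabling the stage is then just a matter of flipping the flag back, without retyping the stage.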

And that’s how easy it is to temporarily disable aggregation stages with MongoChef! And here once again in one fluid motion :-) .

disable aggregation stage

How to Color Your Connections and Databases in MongoChef

MongoChef now has a coloring feature that helps you quickly tell your live data and test data apart visually, and avoid accidents caused by mixing the two up. This is a short walkthrough on how to use this useful little feature.

Let’s face it, we have all been there. And if you have not yet made this mistake, chances are that at some point you will do the unthinkable and confuse your test data with your live data. If you are religious about avoiding this mistake, you are probably already applying a number of personal tricks to make sure you don’t fall into this trap.

MongoChef makes your life easier. With just one right click of your mouse, you can assign for example a red color to your live database

Coloring DB
and a green color to your test database. (I have made a local copy of my live database for this purpose, which I will consider my test database.)

Coloring test DB

Now, you can have both databases open at the same time and keep switching back and forth between them while you play with your data. Your red tabs will now immediately tell you that you are working on your live database and that you should pay extra attention when you run update queries on them.

Coloring test and live DB
All nice and clean at one glance! This saves you time and trouble, and makes it far less likely that you will update your live database while thinking you are working on your test one!
You can even split tabs horizontally and vertically for a quick and easy glance at pairs of data sets.