Demystifying Aggregation Pipelines in MongoDB

Introduction

Hello everyone, I'm Hasnain. This is my first blog post of 2023, and I'm really excited about it because we're going to learn about Aggregation Pipelines in MongoDB. Before we start, I hope you have learned the basics of MongoDB and are familiar with MongoDB query operators and CRUD (create, read, update, delete) operations. If not, you can check out my previous blog on Databases and CRUD operations in MongoDB using Mongoose, or you can read the official MongoDB documentation.

What are Aggregation Pipelines in MongoDB

As you already know, we use queries to extract data from the database. However, a query only returns documents that already exist in the database. If you want to process the data further, for example by grouping it, sorting it in a specific order, filtering it, and transforming it in a single operation, you'll have to use aggregation operations. MongoDB provides these through its Aggregation Framework, an extremely powerful and useful set of tools.

The idea is that we define a pipeline that all the documents from a certain collection go through, where they are processed step by step and transformed into aggregated results. An aggregation pipeline consists of one or more stages. Each stage operates on the documents it receives as input; for example, a stage can filter documents, group documents, or calculate values. The documents that a stage outputs are passed on to the next stage.
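To make the idea of stages concrete, here is a small plain-JavaScript mental model, not MongoDB code: each stage is modeled as a function that takes an array of documents and returns a new array, and a hypothetical runPipeline helper feeds each stage's output into the next. The stage names (onlyEasy, byPriceAsc, firstTwo) are made up for this sketch, and MongoDB does not execute pipelines this way internally.

```javascript
// Mental model only: a pipeline passes each stage's output to the next stage.
function runPipeline(documents, stages) {
  return stages.reduce((docs, stage) => stage(docs), documents);
}

// A few tours borrowed from the sample data used later in this post.
const tours = [
  { name: "The Forest Hiker", difficulty: "easy", price: 390 },
  { name: "The Sea Explorer", difficulty: "medium", price: 49 },
  { name: "The Sports Lover", difficulty: "difficult", price: 2997 },
  { name: "The Wine Taster", difficulty: "easy", price: 1997 },
];

// Hypothetical toy stages: a filter, a sort (on a copy), and a limit.
const onlyEasy = (docs) => docs.filter((d) => d.difficulty === "easy");
const byPriceAsc = (docs) => [...docs].sort((a, b) => a.price - b.price);
const firstTwo = (docs) => docs.slice(0, 2);

// The same three stages, run in two different orders.
const easyFirst = runPipeline(tours, [onlyEasy, byPriceAsc, firstTwo]);
const limitFirst = runPipeline(tours, [byPriceAsc, firstTwo, onlyEasy]);

console.log(easyFirst.map((d) => d.name)); // [ 'The Forest Hiker', 'The Wine Taster' ]
console.log(limitFirst.map((d) => d.name)); // [ 'The Forest Hiker' ]
```

Running the same stages in a different order gives a different result, which is why the order of stages in a real pipeline matters.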

You'll learn more about stages in the next section, but for now let's see what sorting, filtering, grouping and transforming are.

  • Sorting: You can reorder the documents based on a chosen field.

  • Filtering: Documents can be narrowed down through a set of criteria.

  • Grouping: Processing multiple documents together to generate a summarized result, such as an average.

  • Transforming: Modifying the structure of documents. This means you can add, rename, or remove fields, or rename fields within an embedded document for legibility.

Creating an Aggregation Pipeline and learning more about Stages

To create an aggregation pipeline, you can use MongoDB's aggregate() method. Its syntax is similar to the find() method, which is used to query data in a collection, but instead of a filter, aggregate() accepts an array of stages (the pipeline), where each stage is an object such as { $match: { ... } }. I'll assume you already know how to create a database and a collection, so let's start by inserting the test documents into the collection.

Inserting the sample data into the collection using insertMany

  • Insert the test documents into the collection with the insertMany() command.
// Sample Data
db.tours.insertMany([
  {
    "ratingsAverage": 4.7,
    "ratingsQuantity": 37,
    "_id": "6396fe2ef7e15d333cd4439f",
    "name": "The Forest Hiker",
    "duration": 5,
    "maxGroupSize": 25,
    "difficulty": "easy",
    "price": 390,
    "id": "6396fe2ef7e15d333cd4439f"
  },
  {
    "ratingsAverage": 4.8,
    "ratingsQuantity": 23,
    "_id": "6396fe2ef7e15d333cd443a0",
    "name": "The Sea Explorer",
    "duration": 7,
    "maxGroupSize": 15,
    "difficulty": "medium",
    "price": 49,
    "durationWeeks": 1,
    "id": "6396fe2ef7e15d333cd443a0"
  },
  {
    "ratingsAverage": 4.6,
    "ratingsQuantity": 54,
    "_id": "6396fe2ef7e15d333cd443a2",
    "name": "The City Wanderer",
    "duration": 9,
    "maxGroupSize": 20,
    "difficulty": "easy",
    "price": 1197,
    "durationWeeks": 1.2857142857142858,
    "id": "6396fe2ef7e15d333cd443a2"
  },
  {
    "ratingsAverage": 4.9,
    "ratingsQuantity": 19,
    "_id": "6396fe2ef7e15d333cd443a3",
    "name": "The Park Camper",
    "duration": 10,
    "maxGroupSize": 15,
    "difficulty": "medium",
    "price": 1497,
    "durationWeeks": 1.4285714285714286,
    "id": "6396fe2ef7e15d333cd443a3"
  },
  {
    "ratingsAverage": 4.5,
    "ratingsQuantity": 13,
    "_id": "6396fe2ef7e15d333cd443a1",
    "name": "The Snow Adventurer",
    "duration": 4,
    "maxGroupSize": 10,
    "difficulty": "difficult",
    "price": 997,
    "durationWeeks": 0.5714285714285714,
    "id": "6396fe2ef7e15d333cd443a1"
  },
  {
    "ratingsAverage": 4.7,
    "ratingsQuantity": 28,
    "_id": "6396fe2ef7e15d333cd443a4",
    "name": "The Sports Lover",
    "duration": 14,
    "maxGroupSize": 8,
    "difficulty": "difficult",
    "price": 2997,
    "durationWeeks": 2,
    "id": "6396fe2ef7e15d333cd443a4"
  },
  {
    "ratingsAverage": 4.5,
    "ratingsQuantity": 35,
    "_id": "6396fe2ef7e15d333cd443a5",
    "name": "The Wine Taster",
    "duration": 5,
    "maxGroupSize": 8,
    "difficulty": "easy",
    "price": 1997,
    "durationWeeks": 0.7142857142857143,
    "id": "6396fe2ef7e15d333cd443a5"
  },
  {
    "ratingsAverage": 4.7,
    "ratingsQuantity": 28,
    "_id": "6396fe2ef7e15d333cd443a6",
    "name": "The Star Gazer",
    "duration": 9,
    "maxGroupSize": 8,
    "difficulty": "medium",
    "price": 2997,
    "id": "6396fe2ef7e15d333cd443a6"
  },
  {
    "ratingsAverage": 4.9,
    "ratingsQuantity": 33,
    "_id": "6396fe2ef7e15d333cd443a7",
    "name": "The Northern Lights",
    "duration": 3,
    "maxGroupSize": 12,
    "difficulty": "easy",
    "price": 1497,
    "id": "6396fe2ef7e15d333cd443a7"
  }
])

Creating an Aggregation Pipeline

In the aggregate() method, we pass in an array of stages. The documents then pass through these stages one by one, in the order in which we list them.

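// Example 1

// aggregate method (run in mongosh against the tours collection)
const stats = db.tours.aggregate([
  // Defining the stages

  // 1. The Match Stage: keep tours rated 4.5 or higher
  {
    $match: { ratingsAverage: { $gte: 4.5 } }
  },
  // 2. The Group Stage: one group per tour name
  {
    $group: {
      _id: "$name",
      avgRating: { $avg: "$ratingsAverage" },
      avgPrice: { $avg: "$price" }
    }
  },
  // 3. The Sort Stage: ascending by average price
  {
    $sort: { avgPrice: 1 }
  },
  // 4. The Unwind Stage is left out here: $unwind needs an array field
  //    (written with a $ prefix, e.g. "$startDates"), and neither the sample
  //    tours nor the $group output contain one. See the $unwind section below.
  // 5. Add Fields: copy the group key (_id) into a readable field
  {
    $addFields: { name: "$_id" }
  },
  // 6. Limit: keep only the first 3 documents
  {
    $limit: 3
  },
  // 7. Project Stage: hide _id in the final output
  {
    $project: { _id: 0 }
  }
])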
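To see what the first stages of Example 1 produce, the sketch below approximates $match, $group (by name, with $avg), $sort and $limit in plain JavaScript over the nine sample tours, keeping only the fields those stages use. It is an illustration under those assumptions, not how MongoDB evaluates a pipeline, and it leaves out the remaining stages.

```javascript
// Plain-JavaScript approximation of four stages of Example 1 (no MongoDB).
const tours = [
  { name: "The Forest Hiker", ratingsAverage: 4.7, price: 390 },
  { name: "The Sea Explorer", ratingsAverage: 4.8, price: 49 },
  { name: "The City Wanderer", ratingsAverage: 4.6, price: 1197 },
  { name: "The Park Camper", ratingsAverage: 4.9, price: 1497 },
  { name: "The Snow Adventurer", ratingsAverage: 4.5, price: 997 },
  { name: "The Sports Lover", ratingsAverage: 4.7, price: 2997 },
  { name: "The Wine Taster", ratingsAverage: 4.5, price: 1997 },
  { name: "The Star Gazer", ratingsAverage: 4.7, price: 2997 },
  { name: "The Northern Lights", ratingsAverage: 4.9, price: 1497 },
];

// $match: { ratingsAverage: { $gte: 4.5 } }
const matched = tours.filter((t) => t.ratingsAverage >= 4.5);

// $group: { _id: "$name", avgRating: { $avg: ... }, avgPrice: { $avg: ... } }
// Accumulate sums and counts per key, then divide to get the averages.
const acc = new Map();
for (const t of matched) {
  const g = acc.get(t.name) ?? { sumRating: 0, sumPrice: 0, count: 0 };
  g.sumRating += t.ratingsAverage;
  g.sumPrice += t.price;
  g.count += 1;
  acc.set(t.name, g);
}
const grouped = [...acc].map(([key, g]) => ({
  _id: key,
  avgRating: g.sumRating / g.count,
  avgPrice: g.sumPrice / g.count,
}));

// $sort: { avgPrice: 1 } (ascending), then $limit: 3
const top3 = grouped.sort((a, b) => a.avgPrice - b.avgPrice).slice(0, 3);

console.log(top3);
```

Because every tour name is unique, each group contains exactly one tour, so avgRating and avgPrice simply equal that tour's own values, and the three cheapest tours come out first: The Sea Explorer (49), The Forest Hiker (390) and The Snow Adventurer (997). $group does not guarantee any output order, which is why the $sort stage comes after it.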

Stages of MongoDB Aggregation Pipeline

  1. $project

  2. $match

  3. $group

  4. $sort

  5. $skip and $limit

  6. $addFields

  7. $unwind

The $project stage

This stage reshapes every document in the stream, for instance by adding new fields or getting rid of existing ones.

How does the $project stage work? For each field name, you give a 1 to include it or a 0 to exclude it. Apart from _id, you can't mix 1s and 0s in the same $project stage. The _id field is included by default, so in this example I have given it a 0 to remove it from the output.

// Project Stage
{
    $project:{ _id:0 }
}
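The inclusion and exclusion rules can be sketched with a hypothetical project helper in plain JavaScript. It only handles top-level fields set to 0 or 1 (no expressions or nested paths), and the input document is assumed to look like the output of the $group stage from Example 1.

```javascript
// Hypothetical helper mimicking $project on top-level fields: 1 = include, 0 = exclude.
function project(doc, spec) {
  const entries = Object.entries(spec);
  const others = entries.filter(([field]) => field !== "_id");
  const inclusion = others.some(([, v]) => v === 1);
  // Like MongoDB, refuse to mix inclusion and exclusion (except for _id).
  if (inclusion && others.some(([, v]) => v === 0)) {
    throw new Error("Cannot mix inclusion and exclusion in $project");
  }
  if (inclusion) {
    const out = {};
    // _id is included by default unless it is explicitly set to 0.
    if (spec._id !== 0 && "_id" in doc) out._id = doc._id;
    for (const [field, v] of others) {
      if (v === 1 && field in doc) out[field] = doc[field];
    }
    return out;
  }
  // Exclusion mode: copy everything, then drop each field set to 0.
  const out = { ...doc };
  for (const [field, v] of entries) {
    if (v === 0) delete out[field];
  }
  return out;
}

const group = { _id: "The Sea Explorer", avgRating: 4.8, avgPrice: 49 };
console.log(project(group, { _id: 0 })); // { avgRating: 4.8, avgPrice: 49 }
console.log(project(group, { avgPrice: 1 })); // { _id: 'The Sea Explorer', avgPrice: 49 }
```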

The $match stage

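The $match stage selects, or filters, the documents that satisfy a condition. For every input document, the output is either zero documents (no match) or one document (a match). Because later stages only see the documents that pass, it is usually worth placing $match as early in the pipeline as possible. Have a look at the sample data and Example 1.

Let's say we only want documents whose ratingsAverage is greater than or equal to 4.5. (In the sample data all nine tours meet this condition, so nothing is filtered out; a higher threshold would be more selective.)

// 1. The Match Stage
{
    $match: { ratingsAverage: { $gte: 4.5 } }
}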
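As a rough sketch of how a filter like { ratingsAverage: { $gte: 4.5 } } is evaluated against each document, here is a hypothetical matches function in plain JavaScript. It supports only equality and the $gt, $gte, $lt and $lte operators on top-level fields, a small subset of MongoDB's query language.

```javascript
// Hypothetical comparison operators supported by this sketch.
const comparisons = {
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
};

function matches(doc, filter) {
  // Every field condition in the filter must hold (an implicit AND).
  return Object.entries(filter).every(([field, condition]) => {
    const value = doc[field];
    if (condition !== null && typeof condition === "object") {
      // { field: { $op: arg, ... } }: every operator must hold.
      return Object.entries(condition).every(([op, arg]) => {
        if (!(op in comparisons)) throw new Error(`Unsupported operator ${op}`);
        return comparisons[op](value, arg);
      });
    }
    return value === condition; // { field: value } means equality
  });
}

const tours = [
  { name: "The Forest Hiker", ratingsAverage: 4.7, difficulty: "easy" },
  { name: "The Sea Explorer", ratingsAverage: 4.8, difficulty: "medium" },
  { name: "The City Wanderer", ratingsAverage: 4.6, difficulty: "easy" },
  { name: "The Park Camper", ratingsAverage: 4.9, difficulty: "medium" },
  { name: "The Snow Adventurer", ratingsAverage: 4.5, difficulty: "difficult" },
  { name: "The Sports Lover", ratingsAverage: 4.7, difficulty: "difficult" },
  { name: "The Wine Taster", ratingsAverage: 4.5, difficulty: "easy" },
  { name: "The Star Gazer", ratingsAverage: 4.7, difficulty: "medium" },
  { name: "The Northern Lights", ratingsAverage: 4.9, difficulty: "easy" },
];

const rated45 = tours.filter((t) => matches(t, { ratingsAverage: { $gte: 4.5 } }));
const topMedium = tours.filter((t) =>
  matches(t, { ratingsAverage: { $gte: 4.8 }, difficulty: "medium" })
);
console.log(rated45.length); // 9
console.log(topMedium.map((t) => t.name)); // [ 'The Sea Explorer', 'The Park Camper' ]
```

Listing several fields in the same filter works like an AND: a document has to satisfy every condition to pass.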

The $group stage

As the name says, the $group stage groups documents together and computes values for each group using accumulators. An accumulator operator, such as $avg, $sum, $min or $max, combines the values of a field across all the documents in a group. For example, if we have 5 tours and each of them has a rating, we can calculate their average rating with $avg inside a $group stage.

You must always specify _id, because this is where you say what you want to group by. In Example 1 I group by "$name", so each tour name becomes its own group. Grouping by a field that documents share, such as "$difficulty", would put several tours in each group, and writing _id: null puts every document into one group so that you can calculate statistics for all the tours at once.

// 2. The Group Stage
{
    $group: {
        _id: "$name",
        avgRating: { $avg: "$ratingsAverage" },
        avgPrice: { $avg: "$price" }
    }
}

To calculate the average rating, I define a new field, avgRating, whose value is { $avg: "$ratingsAverage" }. $avg is a MongoDB accumulator operator, and the field it averages is referenced by its name prefixed with $ and wrapped in quotes. The same pattern gives the average price, avgPrice.
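To show what happens when several documents share a group key, the sketch below groups the sample tours by difficulty instead of name; that key is our own choice for illustration. groupByAvgPrice is a hypothetical plain-JavaScript helper that keeps a running total and count per key, roughly what { $group: { _id: "$difficulty", avgPrice: { $avg: "$price" }, numTours: { $sum: 1 } } } computes. Passing null as the key mimics _id: null.

```javascript
// Hypothetical in-memory version of a $group stage with $avg and a count.
function groupByAvgPrice(docs, keyField) {
  const groups = new Map(); // group key -> running total and count
  for (const doc of docs) {
    // keyField === null mimics _id: null (every document in one group).
    const key = keyField === null ? null : doc[keyField];
    const g = groups.get(key) ?? { total: 0, count: 0 };
    g.total += doc.price;
    g.count += 1;
    groups.set(key, g);
  }
  // One output document per group, like $group produces.
  return [...groups].map(([key, g]) => ({
    _id: key,
    avgPrice: g.total / g.count,
    numTours: g.count,
  }));
}

// Difficulty and price of the nine sample tours, in sample-data order.
const tours = [
  { difficulty: "easy", price: 390 },
  { difficulty: "medium", price: 49 },
  { difficulty: "easy", price: 1197 },
  { difficulty: "medium", price: 1497 },
  { difficulty: "difficult", price: 997 },
  { difficulty: "difficult", price: 2997 },
  { difficulty: "easy", price: 1997 },
  { difficulty: "medium", price: 2997 },
  { difficulty: "easy", price: 1497 },
];

const byDifficulty = groupByAvgPrice(tours, "difficulty");
const overall = groupByAvgPrice(tours, null);
console.log(byDifficulty);
console.log(overall);
```

With the sample data, the four easy tours average 1270.25 and the two difficult ones 1997, while the null key produces a single group covering all nine tours.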

The $sort stage

You can sort the documents in an aggregation pipeline using a $sort stage. Pass it an object whose keys are the fields to sort by, with 1 for ascending or -1 for descending order.

Keep in mind that after a $group stage, the only fields that exist are the ones $group outputs (_id and the accumulator fields, such as avgPrice), so you have to sort by those names. The original fields, such as price, no longer exist at that point in the pipeline.

// 3. The Sort Stage
{
    // Sorting in ascending order
    $sort: { avgPrice: 1 }
}
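A $sort specification can be read as an ordered list of tie-breakers. The hypothetical sortBy helper below turns a spec such as { avgPrice: 1 } or { avgPrice: -1, _id: 1 } into a JavaScript comparator; it only compares numbers and strings and does not model MongoDB's full ordering of BSON types.

```javascript
// Hypothetical comparator builder: keys are compared in the order given,
// 1 = ascending, -1 = descending.
function sortBy(spec) {
  const keys = Object.entries(spec);
  return (a, b) => {
    for (const [field, dir] of keys) {
      if (a[field] < b[field]) return -dir;
      if (a[field] > b[field]) return dir;
    }
    return 0; // tie on every key
  };
}

// Documents shaped like the $group output from Example 1.
const stats = [
  { _id: "The Star Gazer", avgRating: 4.7, avgPrice: 2997 },
  { _id: "The Sea Explorer", avgRating: 4.8, avgPrice: 49 },
  { _id: "The Park Camper", avgRating: 4.9, avgPrice: 1497 },
  { _id: "The Northern Lights", avgRating: 4.9, avgPrice: 1497 },
];

const asc = [...stats].sort(sortBy({ avgPrice: 1 }));
// Descending by price; equal prices are ordered by _id.
const desc = [...stats].sort(sortBy({ avgPrice: -1, _id: 1 }));
console.log(desc.map((d) => d._id));
```

MongoDB's $sort may return documents with equal sort values in any order, so adding a unique field such as _id as a final tie-breaker, as in the second call, makes the result deterministic.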

The $skip and $limit stages

$skip skips the first n documents, where n is the number you specify, and passes the remaining documents unmodified to the next stage.

$limit works like limit() on a regular query: it passes only the first n documents on to the next stage.

// 6. Limit
{
    $limit: 3
}
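$skip and $limit are often combined for pagination. The sketch below mirrors them with array slicing in plain JavaScript; skip, limit and paginate are hypothetical helpers, and the input is assumed to be already ordered, as an earlier $sort stage would leave it.

```javascript
// Hypothetical helpers mirroring { $skip: n } and { $limit: n } on an array.
const skip = (docs, n) => docs.slice(n);
const limit = (docs, n) => docs.slice(0, n);

// Page-based pagination: [{ $skip: (page - 1) * pageSize }, { $limit: pageSize }]
function paginate(docs, page, pageSize) {
  return limit(skip(docs, (page - 1) * pageSize), pageSize);
}

// Sample tour names ordered by ascending price.
const byPrice = [
  "The Sea Explorer", "The Forest Hiker", "The Snow Adventurer",
  "The City Wanderer", "The Park Camper", "The Northern Lights",
  "The Wine Taster", "The Sports Lover", "The Star Gazer",
];

console.log(paginate(byPrice, 2, 3)); // [ 'The City Wanderer', 'The Park Camper', 'The Northern Lights' ]
console.log(paginate(byPrice, 4, 3)); // []
console.log(skip(limit(byPrice, 3), 3)); // [] (limit first, then skip)
```

The order matters here too: skipping then limiting returns a page, while limiting to 3 and then skipping 3 leaves nothing.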

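The $addFields stage

The $addFields stage does what it says: it adds fields to each document and keeps all the existing ones. Inside the object, write the name of each field you want to add and its value, which can be an expression such as a field path. If a field with that name already exists, its value is overwritten. Here, the group key stored in _id is copied into a new field called name:

{
    // 5. Add Fields
    $addFields: { name: "$_id" }
}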
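$addFields can also compute values instead of copying them. The durationWeeks values in the sample data appear to be duration divided by 7 (for example, 14 days gives 2), which in a pipeline could be written as { $addFields: { durationWeeks: { $divide: ["$duration", 7] } } }. The plain-JavaScript sketch below mimics that stage with a hypothetical addDurationWeeks helper.

```javascript
// Hypothetical in-memory equivalent of
//   { $addFields: { durationWeeks: { $divide: ["$duration", 7] } } }
// Existing fields are kept; a field with the same name would be overwritten.
function addDurationWeeks(docs) {
  return docs.map((doc) => ({ ...doc, durationWeeks: doc.duration / 7 }));
}

const tours = [
  { name: "The Sports Lover", duration: 14 },
  { name: "The Sea Explorer", duration: 7 },
  { name: "The Snow Adventurer", duration: 4 },
];

const withWeeks = addDurationWeeks(tours);
console.log(withWeeks);
```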

The $unwind stage

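The $unwind stage deconstructs an array field from the input documents and outputs one document for each element of the array; each output document is a copy of the input with the array replaced by one of its elements. The field path must start with $. The sample tours above have no array field, so this snippet assumes each tour has a startDates array. By default, documents in which the field is missing, null, or an empty array produce no output; the document form { path: "$startDates", preserveNullAndEmptyArrays: true } keeps them.

// 4. The Unwind Stage
{
    $unwind: "$startDates"
}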
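Here is a plain-JavaScript sketch of the default $unwind behavior, using a hypothetical unwind helper. The startDates values are invented for illustration, since the sample tours don't have that field.

```javascript
// Hypothetical in-memory version of { $unwind: "$startDates" } with default options.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    if (Array.isArray(value)) {
      // One copy of the document per array element (an empty array yields none).
      for (const element of value) out.push({ ...doc, [field]: element });
    } else if (value !== undefined && value !== null) {
      // A non-array value is treated like a one-element array.
      out.push({ ...doc });
    }
    // Missing or null fields produce no output documents.
  }
  return out;
}

// Hypothetical start dates: the sample tours do not have this field.
const tours = [
  { name: "The Forest Hiker", startDates: ["2023-04-25", "2023-07-20"] },
  { name: "The Park Camper", startDates: "2023-05-01" },
  { name: "The Sea Explorer", startDates: [] },
  { name: "The Star Gazer" },
];

const flat = unwind(tours, "startDates");
console.log(flat.map((d) => `${d.name}: ${d.startDates}`));
```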

If you want to learn more about MongoDB Aggregation Pipeline stages, visit the official documentation.

Conclusion

In this article, I have described only a small subset of aggregation pipeline features provided by MongoDB and tried to simplify it. The best resource would always be the official documentation. I hope this article fulfilled its purpose. Thank you so much for your time. I really appreciate it. If you have any queries, please post them in the comments. I'll see you in the next blog.