Datadog vs AWS CloudWatch vs Grafana
Today, enterprises operate large numbers of servers both in the cloud and in their own data centers to meet ever-increasing demand. As organizations adopt more cloud-native technologies and IT infrastructure becomes increasingly distributed, they must align business objectives and end-user experience with the availability and performance of that infrastructure. This shift requires infrastructure monitoring to ensure that all components work together across cloud environments, operating systems, storage, servers, virtualized systems, and more. In this post, we’ll explore some of the best server monitoring tools currently on the market.
List of tools we will compare:
- Datadog
- Grafana
- AWS CloudWatch
Datadog is a monitoring, security, and analytics platform for developers, IT operations teams, security engineers, and business users in the cloud age. It can monitor servers, tools, and databases.
It helps users see inside any stack and any app, at any scale, anywhere. It was one of the first tools to focus on infrastructure monitoring. What makes it special is that it brings application performance, infrastructure, logs, and user experience monitoring together in one place.
Key Features of Datadog
Datadog is an enterprise SaaS tool that offers an array of services in the monitoring domain. Some of the key features of the Datadog monitoring platform include:
Datadog offers scalable log ingestion and analytics through its log management product. You can search, filter, and analyze log data through its dashboard. You can route all your logs from one central control panel.
Application performance monitoring
Datadog’s APM tool provides end-to-end distributed tracing from frontend devices to databases. You can connect the collected traces to infrastructure metrics, network calls, and live processes.
Using Datadog security monitoring, you can analyze operational and security logs in real-time. It provides built-in threshold and anomaly detection rules to detect threats quickly.
With Datadog network monitoring, you can analyze traffic as it flows across applications, containers, availability zones, and on-premises servers. You can track key network metrics like TCP retransmits, latency, and connection churn.
Real user monitoring
With Datadog’s Real User Monitoring, you get end-to-end visibility into user journeys for web and mobile applications.
Grafana allows you to query, visualize, alert on, and understand your metrics no matter where they are stored. Create, explore, and share beautiful dashboards with your team and foster a data-driven culture. Grafana is primarily used to visualize your time-series database data into meaningful charts from which you can draw insights. Grafana can be used to build an open-source stack for APM, time-series, and logs monitoring.
Key Features of Grafana
Grafana is an open-source dashboard tool. The biggest feature of Grafana is that you can use it to combine different data sources and then visualize data in a central dashboard. It also comes with admin features for effective collaboration with the team.
Some of the key features of Grafana are:
Grafana provides a lot of panels that can be used for building dashboards. To build dashboards that suit your needs, you can choose from multiple chart types like heatmaps, histograms, pie charts, etc.
Grafana provides an extensive set of plugins to extend its capabilities. Some of the plugin types that Grafana offers are:
- Data source plugins
- App plugins
- Panel plugins
Grafana provides a central UI to set and manage alerts.
Amazon CloudWatch is a monitoring and observability service built for DevOps engineers, developers, site reliability engineers (SREs), IT managers, and product owners. CloudWatch provides you with data and actionable insights to monitor your applications, respond to system-wide performance changes, and optimize resource utilization. CloudWatch collects monitoring and operational data in the form of logs, metrics, and events. You get a unified view of operational health and gain complete visibility of your AWS resources, applications, and services running on AWS and on-premises. You can use CloudWatch to detect anomalous behavior in your environments, set alarms, visualize logs and metrics side by side, take automated actions, troubleshoot issues, and discover insights to keep your applications running smoothly.
Key Features of CloudWatch
CloudWatch is a monitoring tool provided by Amazon Web Services. It provides monitoring for applications running on the AWS infrastructure.
Some of the key features of CloudWatch include:
Easy collection of logs and metrics
Using CloudWatch, you can collect logs and metrics from your application, infrastructure, and services. Some of the types of logs that can be collected:
- Logs published by AWS services: currently, over 30 AWS services publish logs to CloudWatch.
- Custom logs: using a CloudWatch agent, you can push logs from your application and on-premises resources.
CloudWatch allows you to collect default metrics from more than 70 AWS services such as Amazon EC2, Amazon DynamoDB, Amazon S3, Amazon ECS, AWS Lambda, etc.
Unified visualization and composite alarms
Amazon CloudWatch provides dashboards that unify data from multiple sources for actionable insights. Some of the key visualization features include:
- Graph metrics and log data side by side
- Graphs for cloud resources and applications in a unified view
Logs and metrics correlation
Using CloudWatch, you can correlate log patterns to a specific metric and set alarms on it.
Container monitoring, Lambda monitoring, and anomaly detection
CloudWatch provides automatic dashboards for container and Lambda insights. Using anomaly detection, you can create alarms that auto-adjust their thresholds based on metric patterns.
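As a rough illustration of how an alarm threshold can adapt to metric patterns, the sketch below derives a band from recent datapoints as the mean plus or minus a multiple of the standard deviation. This is not CloudWatch's actual model, which is not described here; the function names, the `width` parameter, and the sample values are assumptions for illustration only.

```python
from statistics import mean, stdev


def anomaly_band(history, width=2.0):
    """Return (lower, upper) bounds derived from recent datapoints.

    Assumption: a simple mean +/- width * stddev band stands in for
    whatever model the monitoring service really uses.
    """
    if len(history) < 2:
        raise ValueError("need at least two datapoints")
    center = mean(history)
    spread = stdev(history)
    return center - width * spread, center + width * spread


def is_anomalous(value, history, width=2.0):
    """True when the new value falls outside the band built from history."""
    lower, upper = anomaly_band(history, width)
    return value < lower or value > upper


cpu_history = [40.0, 42.0, 41.0, 39.0, 43.0, 40.0]  # hypothetical CPU % samples
print(is_anomalous(41.5, cpu_history))  # inside the band
print(is_anomalous(90.0, cpu_history))  # far outside the band
```

Because the band is recomputed from recent history, the effective threshold moves with the metric instead of being a fixed number.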
1. Getting started
If you are using AWS services, then CloudWatch already offers a default console to monitor the services you use in your AWS account.
For using Datadog, you first need to sign up for a Datadog account. Once you sign up, you can install Datadog agents on your hosts. The Datadog agent reports metrics and events from your host to Datadog.
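Once an agent is running, applications can hand it custom metrics. The following is a minimal sketch, assuming the agent's DogStatsD listener is enabled on its default UDP port 8125 on localhost; the metric name, tags, and helper function names are hypothetical.

```python
import socket


def format_datagram(name, value, metric_type="g", tags=None):
    """Build a DogStatsD-style datagram: name:value|type|#tag1,tag2."""
    datagram = f"{name}:{value}|{metric_type}"
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram


def send_metric(datagram, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; nothing confirms that an agent received it."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


payload = format_datagram("checkout.cart_size", 3, "g", ["env:dev", "service:shop"])
print(payload)

try:
    send_metric(payload)
except OSError as exc:
    # Networking may be unavailable in some environments.
    print(f"could not send: {exc}")
```

In practice, an official client library would usually handle this formatting for you; the sketch only shows the shape of what travels from the application to the agent.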
To get started with Grafana, you first need to install it. Once Grafana is installed, you can connect it to your desired data source and start visualizing the data.
Some of the popular data sources that Grafana supports are:
- AWS CloudWatch
- Azure Monitor
2. Multi-cloud support
Datadog supports multi-cloud monitoring across AWS, Azure, and Google Cloud services.
CloudWatch is used to monitor AWS resources and applications that run on it. You can use the CloudWatch Logs agent installer on an instance to install and configure the CloudWatch Logs agent. After installation is complete, logs automatically flow from the instance to the log stream you create while installing the agent. The agent confirms that it has started and it stays running until you disable it.
Grafana supports multi-cloud monitoring with the help of plugins.
3. Pricing
With Amazon CloudWatch, there is no up-front commitment or minimum fee; you simply pay for what you use. You will be charged at the end of the month for your usage. CloudWatch provides a free tier that you can explore. CloudWatch’s paid tier, called EC2 detailed monitoring, starts at $2.10 per instance per month (assuming 7 metrics per instance). The cost also depends on the number of metrics sent and is divided into multiple tiers. The first 10k metrics are charged at $0.30 per metric per month.
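The $2.10 figure follows directly from the per-metric rate: 7 metrics at $0.30 each. The sketch below works through that arithmetic for the first pricing tier only, using the numbers quoted above; the function name is hypothetical, and rates for later tiers are not given in this post, so the sketch refuses to guess them.

```python
from decimal import Decimal

PRICE_PER_METRIC = Decimal("0.30")  # first-tier rate quoted above, USD per metric per month
FIRST_TIER_LIMIT = 10_000           # number of metrics covered by that rate


def monthly_metric_cost(instances, metrics_per_instance=7):
    """Estimate the monthly cost while the total stays within the first tier."""
    total_metrics = instances * metrics_per_instance
    if total_metrics > FIRST_TIER_LIMIT:
        raise ValueError("beyond the first tier; later-tier rates are not given here")
    return total_metrics * PRICE_PER_METRIC


print(monthly_metric_cost(1))   # 2.10
print(monthly_metric_cost(50))  # 105.00
```

`Decimal` is used instead of floats so that the currency arithmetic stays exact.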
Datadog is an expensive enterprise monitoring tool with many different pricing tiers that vary by use case. For example, enterprise infrastructure monitoring starts at $23 per host per month, while its APM and Continuous Profiler start at $40 per host per month.
The open-source version of Grafana is free, although you do need to account for the cost of data storage and networking. Grafana Labs offers paid cloud plans starting at $49 per month, which scale up based on usage.
4. Dashboards and visibility
Datadog offers better visibility and a better UI experience. Because Datadog interacts with CloudWatch through the CloudWatch API, it exposes more metrics.
CloudWatch enables us to create custom dashboards from metrics and logs. Some features are missing, such as grouping servers.
Grafana also offers better visibility and a better UI experience for end users. However, creating custom dashboards is a bit complex in Grafana.
5. Alerts and notification management
Monitoring all of your infrastructure in one place wouldn’t be complete without the ability to know when critical changes are occurring.
Datadog integrates with partners like PagerDuty to ensure your on-call team members can be added to incidents and appropriately notified. Datadog also supports email and Slack integrations for notifications.
AWS CloudWatch uses alarms for alerting, and several actions can be taken as part of an alarm. For example, when CPU or memory usage is high on web servers, an alarm can trigger autoscaling. Alarms can also send notifications via email and SMS, and they can be integrated with PagerDuty for call alerts.
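The high-CPU example can be sketched as a small evaluation loop. This is a simplified model, not CloudWatch's implementation: it assumes an alarm fires when enough of the most recent datapoints exceed a threshold. The function names, the action names, and the sample values are hypothetical.

```python
def evaluate_alarm(datapoints, threshold, evaluation_periods=3, datapoints_to_alarm=2):
    """Return 'ALARM', 'OK', or 'INSUFFICIENT_DATA' for the latest window.

    Simplified model: look at the most recent `evaluation_periods` datapoints
    and alarm when at least `datapoints_to_alarm` of them exceed the threshold.
    """
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    window = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in window if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


def on_state_change(state):
    """Hypothetical action hook; a real alarm would notify or scale out here."""
    return "scale_out" if state == "ALARM" else "no_action"


cpu = [55.0, 62.0, 91.0, 78.0, 88.0]  # hypothetical CPU % per period
state = evaluate_alarm(cpu, threshold=80.0)
print(state, on_state_change(state))
```

Requiring several breaching datapoints rather than one helps avoid scaling out on a single short spike.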
In Grafana, when an alert changes state, it sends out notifications. Each alert rule can have multiple notifications. To add a notification to an alert rule, you first need to add and configure a notification channel (email, PagerDuty, or another integration).
6. Log management
Logging the important parts of your system’s operations is crucial for maintaining infrastructure health. Modern infrastructure can generate thousands of log events per minute. In this situation, you need to choose which logs to send to a log management solution, and which logs to archive. Filtering your logs before sending them, however, may lead to gaps in coverage or the accidental removal of valuable data.
Datadog Log Management, also referred to as Datadog logs or logging, removes these limitations by decoupling log ingestion from indexing. This enables you to cost-effectively collect, process, archive, explore, and monitor all of your logs, an approach known as Logging without Limits.
CloudWatch Logs enables you to centralize the logs from all of your systems, applications, and AWS services that you use, in a single, highly scalable service. You can then easily view them, search them for specific error codes or patterns, filter them based on specific fields, or archive them securely for future analysis. CloudWatch Logs enables you to see all of your logs, regardless of their source, as a single and consistent flow of events ordered by time, and you can query them and sort them based on other dimensions, group them by specific fields, create custom computations with a powerful query language, and visualize log data in dashboards.
Grafana Loki is a horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus. It is designed to be very cost-effective and easy to operate. It does not index the contents of the logs, but rather a set of labels for each log stream.
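To make the "index labels, not contents" idea concrete, here is a toy in-memory sketch. It is not Loki's implementation or API; the class and method names are hypothetical. Streams are located through their labels, and log lines are only scanned after the matching streams have been selected.

```python
from collections import defaultdict


class LabelIndexedLogs:
    """Toy store that indexes stream labels only, never log line contents."""

    def __init__(self):
        self._streams = defaultdict(list)  # frozenset of label pairs -> log lines

    def push(self, labels, line):
        """Append a line to the stream identified by its full label set."""
        self._streams[frozenset(labels.items())].append(line)

    def query(self, selector, contains=""):
        """Pick streams by label match, then scan their lines for a substring."""
        wanted = set(selector.items())
        results = []
        for stream_labels, lines in self._streams.items():
            if wanted <= stream_labels:
                results.extend(line for line in lines if contains in line)
        return results


store = LabelIndexedLogs()
store.push({"app": "api", "env": "prod"}, "GET /health 200")
store.push({"app": "api", "env": "prod"}, "GET /orders 500 error")
store.push({"app": "worker", "env": "prod"}, "job failed: error")
print(store.query({"app": "api"}, contains="error"))
```

The trade-off in this sketch is that the index stays small, while content searches have to scan every line in the selected streams.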
7. Metrics management
CloudWatch Metrics are data about the performance of your systems. By default, many services provide free metrics for resources (such as Amazon EC2 instances, Amazon EBS volumes, and Amazon RDS DB instances). You can also enable detailed monitoring for some resources, such as your Amazon EC2 instances, or publish your application metrics. Amazon CloudWatch can load all the metrics in your account (both AWS resource metrics and application metrics that you provide) for search, graphing, and alarms.
In Datadog, metric data is ingested and stored as data points with a value and timestamp. A sequence of data points is stored as a time series. Any metrics with fractional-second timestamps are rounded to the nearest second. If any points have the same timestamp, the latest point overwrites the previous ones. Metrics that track system health come automatically through Datadog’s integrations with more than 500 services. You can also track metrics that are specific to your business, known as custom metrics, covering anything from the number of user logins or user cart sizes to the frequency of your team’s code commits. In addition, metrics can help you adjust the scale of your environment to meet the demand from your customers. Knowing exactly how many resources you need to consume can help you save money or improve performance.
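The rounding and overwrite rules described above can be sketched in a few lines. This is an illustration of the stated behavior, not Datadog's code; the function name is hypothetical, and the way exact half-second values are rounded (upward here) is an assumption.

```python
import math


def ingest(points):
    """Collapse raw (timestamp, value) points into a per-second time series.

    Timestamps are rounded to the nearest whole second (halves round up here,
    which is an assumption); a later point with the same rounded timestamp
    overwrites the earlier one.
    """
    series = {}
    for timestamp, value in points:
        second = math.floor(timestamp + 0.5)
        series[second] = value
    return sorted(series.items())


raw = [(100.2, 5), (100.4, 7), (101.6, 1), (102.0, 9)]
print(ingest(raw))  # [(100, 7), (102, 9)]
```

Four raw points collapse to two stored points here, which is why submitting several values within the same second can silently lose data in this model.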
With Grafana, metrics are stored in a time-series database (TSDB), such as Prometheus, which records each metric value and pairs that entry with a timestamp. Each TSDB uses a slightly different data model, but all combine these two aspects, and Grafana Cloud can accept their different metrics formats for visualization. Grafana and Grafana Cloud offer a variety of visualizations to suit different use cases.
8. Compliance
Health organizations need to adhere to HIPAA compliance requirements for application logs as they grow and scale. Audit logs must be safely collected from every service within an organization’s system and stored for six years in case they are needed for an internal or HHS investigation. With Datadog’s HIPAA-compliant log management and security solutions, organizations can capture and store audit logs on a long-term basis, leverage their logs to verify their level of compliance with other HIPAA provisions, and automatically detect security threats in real-time.
Third-party auditors assess the security and compliance of Amazon CloudWatch as part of multiple AWS compliance programs. These include SOC, PCI, FedRAMP, HIPAA, and others. Amazon CloudWatch itself does not produce, store, or transmit PHI. Customers can monitor CloudWatch API calls with AWS CloudTrail.
Grafana Labs maintains PCI compliance through third-party approved scanning vendors. It is also certified for ISO 27001 through an independent third-party audit by A-LIGN.
9. Support
Datadog offers a support platform at help.datadoghq.com and live chat with the Datadog Support Team on any business day between 10:00 and 19:00 ET. It also has a Slack channel where community members and Datadog staff discuss the latest Datadog announcements and features, help with questions, and more.
AWS Support gives customers help on technical issues and additional guidance to operate their infrastructures in the cloud. Customers can choose a tier that meets their specific requirements, continuing the AWS tradition of providing the building blocks of success without bundling or long-term commitments. AWS Support is one-on-one, fast-response support from experienced technical support engineers. The service helps customers use AWS’s products and features. With pay-by-the-month pricing and unlimited support cases, customers are freed from long-term commitments. Customers with operational issues or technical questions can contact a team of support engineers and receive predictable response times and personalized support.
Grafana Cloud offers several account options, each with a different feature set and level of support. There are three types of Grafana Cloud accounts: Free, Pro, and Advanced. In the Free account, support is limited to the documentation and questions in the public community forums, whereas the Pro account includes email support. The Advanced account includes both email and phone support.
10. Serverless APM Monitoring
Datadog lets us view metrics, logs, and traces from our serverless applications in one place. With the help of a Lambda function called the Datadog Forwarder, Datadog can export custom metrics from Lambda.
AWS CloudWatch provides serverless APM monitoring with the help of AWS X-Ray.
Prebuilt dashboards are available in Grafana for serverless monitoring, and we can also create custom dashboards.
Pros and Cons
Now let’s check out the pros and cons of each tool
Pros of Datadog
- Support for log aggregation and analytics
- Support for anomaly detection and alerts
- Support for custom metrics and custom Datadog integrations
Cons of Datadog
- No self-hosted solution
- Complex to use; can be overwhelming for new users
- Limited log analytics due to lack of support for JSON log processing
Pros of AWS CloudWatch
- It lets us configure alarms that trigger a notification (such as an email) when a specified condition is met.
- CloudWatch provides a feature called Events, which is different from alerts. It makes the platform aware of the application’s operational changes as they happen in real time. An Event can even automatically trigger a specified action.
- Very advanced visibility and insights into other integrated AWS services.
- You only pay for what you use.
- Centralized storage and analysis of logs and metrics from all combined AWS resources, with the ability to run queries on this data.
Cons of AWS CloudWatch
- It can only be used for AWS services. There may be some good third-party scripts for collecting metrics from non-AWS servers, but they aren’t an “official” solution.
- Not enough customization of dashboards.
- No metrics for memory usage by default. A custom metric has to be configured to monitor this basic indicator.
- Becomes very expensive at the enterprise level (can be over $50,000 per year).
Pros of Grafana
- Free and open-source with a huge open-source community for support
- Automatic service discovery and support for both push and pull metric scraping models
- Support for custom metrics; a huge number of exporters available to export metrics to Prometheus from different sources
Cons of Grafana
- Complex and time-consuming to manage Prometheus instances; operational overhead if your staff is unfamiliar with the tool
- Need to manually configure and manage Prometheus exporters
- Manual setup required for graphs and alerts
From this comparison, we can see that no single tool is perfect. We have to choose based on our application architecture and organizational needs. For example, if you are planning to launch infrastructure in AWS, AWS CloudWatch is a natural fit for metrics, logs, and dashboards, as it is already preconfigured for these. If you are planning low-cost monitoring, Grafana is a better choice because it is open-source and free. Datadog is compatible with many kinds of infrastructure, but its price is variable and usually determined by the number of agents or hosts installed on your system; you also have to purchase each feature individually. AWS CloudWatch can satisfy almost all monitoring needs, although in most cases it does require installing a plugin or agent, and its pay-as-you-go pricing keeps the cost low. Each tool can be leveraged depending on its audience, pricing, and ultimate application.
Thanks for reading. I would also like to thank my mentor Rajith, who was a great support in creating this post.