OPERATE and MONITOR Phase

Monitoring and Alerting

Monitoring and alerting is the process of collecting logs and metrics from across our infrastructure and sending notifications when a metric crosses a defined threshold.
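As a minimal sketch of threshold-based alerting, the Python snippet below compares sampled metric values against configured limits and returns the notifications that would be sent. The metric names, the limits, and the `evaluate_thresholds` function are hypothetical and serve only to illustrate the idea; production systems typically express this as alerting rules in the monitoring tool itself.

```python
from typing import Dict, List


def evaluate_thresholds(metrics: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return one alert message per metric whose value exceeds its threshold."""
    alerts = []
    for name, limit in sorted(thresholds.items()):
        value = metrics.get(name)
        # Metrics that were not collected are skipped rather than treated as zero.
        if value is not None and value > limit:
            alerts.append(f"ALERT {name}: value {value} exceeds threshold {limit}")
    return alerts


if __name__ == "__main__":
    # Hypothetical sample values and limits.
    sample = {"cpu_usage_percent": 93.5, "disk_usage_percent": 41.0}
    limits = {"cpu_usage_percent": 90.0, "disk_usage_percent": 80.0}
    for message in evaluate_thresholds(sample, limits):
        print(message)
```

In practice the returned messages would be handed to a notification channel rather than printed.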

Metrics monitoring

  • Prometheus: A widely used open-source tool for metrics monitoring. It provides various exporters that can be used to monitor system or application metrics. We can also use Grafana to visualize Prometheus metrics.

  • Nagios and Zabbix: These are open source software tools to monitor IT infrastructures such as networks, servers, virtual machines, and cloud services.

  • Sensu Go: A monitoring and observability solution designed to operate at scale.

Log monitoring

  • OpenSearch/Elasticsearch: A real-time, distributed search and analytics engine that supports many kinds of search operations over log data.

  • Graylog: It provides centralized log management functionality for collecting, storing, and analyzing data.

  • Grafana Loki: A lightweight log aggregation system designed to store and query logs from all your applications and infrastructure.

Alerting

  • Prometheus Alertmanager: The Alertmanager handles alerts sent by client applications such as the Prometheus server.

  • Grafana OnCall: Developer-friendly incident response with phone calls, SMS, Slack, and Telegram notifications.

A security-focused logging and monitoring policy helps prevent sensitive information from being logged in plain text. We can add a test case to our logging system that looks for certain patterns of data, for example a regex that matches sensitive information, so that offending log entries are detected in a lower environment.
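The regex check described above could look roughly like the following Python sketch. The pattern set, the `SENSITIVE_PATTERNS` name, and the `find_sensitive` helper are illustrative assumptions; a real policy would be tuned to the data the application actually handles.

```python
import re
from typing import Dict, List

# Illustrative patterns only (assumptions, not an exhaustive or authoritative list).
SENSITIVE_PATTERNS: Dict[str, re.Pattern] = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "password_assignment": re.compile(r"(?i)\bpassword\s*[=:]\s*\S+"),
}


def find_sensitive(line: str) -> List[str]:
    """Return the names of all patterns that match the given log line."""
    return [name for name, pattern in SENSITIVE_PATTERNS.items() if pattern.search(line)]


if __name__ == "__main__":
    # Hypothetical log lines for demonstration.
    for log_line in ["user login ok", "login failed password=hunter2"]:
        hits = find_sensitive(log_line)
        if hits:
            print(f"sensitive data ({', '.join(hits)}): {log_line}")
```

A check like this can run as a test in a lower environment and fail the build whenever any log line produces a match.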

Application Performance Monitoring (APM) improves visibility into a distributed microservices architecture. APM data can help enhance software security by providing a full view of an application. Distributed tracing tools like Zipkin and Jaeger stitch the logs of a request together and give full visibility of requests from start to end, which speeds up the response to new bugs or attacks.
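The stitching idea can be sketched in a few lines of Python: records emitted by different services share a trace ID, and grouping on it reconstructs the path of each request. The record fields (`trace_id`, `timestamp`, `service`) and the `group_by_trace` function are hypothetical; this is not how Zipkin or Jaeger are implemented, only an illustration of the concept.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_trace(records: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group log records by trace ID and order each group by timestamp."""
    traces: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        traces[record["trace_id"]].append(record)
    for spans in traces.values():
        spans.sort(key=lambda r: r["timestamp"])
    return dict(traces)


if __name__ == "__main__":
    # Hypothetical records from three services handling two requests.
    logs = [
        {"trace_id": "a1", "timestamp": 2, "service": "payments"},
        {"trace_id": "b2", "timestamp": 1, "service": "gateway"},
        {"trace_id": "a1", "timestamp": 1, "service": "gateway"},
        {"trace_id": "a1", "timestamp": 3, "service": "ledger"},
    ]
    for trace_id, spans in group_by_trace(logs).items():
        print(trace_id, " -> ".join(span["service"] for span in spans))
```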

All cloud providers have their own monitoring toolsets, and some tools are available from their marketplaces. There are also paid monitoring providers such as New Relic, Datadog, AppDynamics, and Splunk that provide all types of monitoring.

Security information and event management (SIEM)

Security information and event management (SIEM) offers real-time monitoring and analysis of events, as well as tracking and logging of security data for compliance or auditing purposes. Tools such as Splunk, Elastic SIEM, and Wazuh provide automated detection of suspicious activity, and tools with behavior-based rules can also detect anomalies using prebuilt ML jobs.
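To make the notion of a behavior-based rule concrete, here is a minimal Python sketch that flags a source IP once it accumulates too many failed logins inside a sliding time window. The event shape, the default limits, and the `detect_bruteforce` name are assumptions made for illustration; real SIEM products define such rules in their own rule languages.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple


def detect_bruteforce(
    events: Iterable[Tuple[float, str, bool]],
    max_failures: int = 5,
    window_seconds: float = 60.0,
) -> List[str]:
    """Return source IPs with at least max_failures failed logins in a sliding window.

    Each event is (timestamp_seconds, source_ip, success); events are assumed
    to be sorted by timestamp.
    """
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    flagged: List[str] = []
    for timestamp, source_ip, success in events:
        if success:
            continue
        failures = recent[source_ip]
        failures.append(timestamp)
        # Drop failures that have fallen out of the window.
        while failures and timestamp - failures[0] > window_seconds:
            failures.popleft()
        if len(failures) >= max_failures and source_ip not in flagged:
            flagged.append(source_ip)
    return flagged
```

The window and failure count are the tuning knobs: a shorter window or a higher count reduces false positives at the cost of missing slower attacks.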

Auditing

After deployment, visibility comes from the level of auditing that has been put in place on the application and infrastructure. The goal is to audit at a level that lets you feed the needed data into a security tool. We can enable audits on GCP with Audit Logs, on AWS with CloudTrail, or on Azure with platform logs. For auditing applications, we can enable built-in audit logs and send the audit data to a logging tool such as Elasticsearch (using Auditbeat) or Splunk, and then create an auditing dashboard.
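Where an application has no built-in audit log, one common approach is to emit structured audit events that a log shipper can forward. The sketch below shows one possible shape for such an event in Python; the field names and the `audit_event` function are hypothetical and not tied to any particular tool's schema.

```python
import json
from datetime import datetime, timezone


def audit_event(actor: str, action: str, resource: str, outcome: str) -> str:
    """Serialize one audit event as a single JSON line."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": action,
        "resource": resource,
        "outcome": outcome,
    }
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical event: a user deleting a record.
    print(audit_event("alice", "delete", "orders/1234", "success"))
```

Writing one JSON object per line keeps the events easy to parse and index in whichever logging tool receives them.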

Kubernetes runtime security monitoring

Falco is a cloud-native Kubernetes threat detection tool. It can detect unexpected behavior, intrusions, and data theft in real time. Under the hood, it uses Linux eBPF technology to trace your system and applications at runtime. For example, it can detect when someone tries to read a secret file inside a container or access a pod as the root user, and then trigger a webhook or send logs to the monitoring system. Similar tools such as Tetragon, KubeArmor, and Tracee also provide Kubernetes runtime security.
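On the receiving end of such a webhook, a small filter can decide which alerts deserve a page. The Python sketch below assumes each alert arrives as a JSON object with a `priority` field using syslog-style severity names; the exact payload shape should be checked against the tool's documentation, and the `should_page` function and its default are hypothetical.

```python
import json

# Severity order, most severe first (assumed syslog-style priority names).
PRIORITIES = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
]


def should_page(alert_json: str, minimum: str = "warning") -> bool:
    """Return True when the alert's priority is at least as severe as `minimum`."""
    alert = json.loads(alert_json)
    priority = str(alert.get("priority", "")).lower()
    if priority not in PRIORITIES:
        # Unknown or missing priorities are not paged in this sketch.
        return False
    return PRIORITIES.index(priority) <= PRIORITIES.index(minimum.lower())


if __name__ == "__main__":
    # Hypothetical alert payload.
    payload = '{"rule": "example rule", "priority": "Critical"}'
    print(should_page(payload))
```

Lower-priority alerts can still be forwarded to the monitoring system for later review instead of being dropped.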

So far, we have seen what a DevSecOps CI/CD pipeline looks like. Now, let’s dive into adding more security layers on top.
