L1/L2 Alerts


Our alerting approach depends on the monitoring tools in use:

  • Prometheus and Grafana: If you're using this setup, we can deploy Prometheus agents on all Kubernetes nodes to collect metrics such as CPU and memory usage, and configure alerts in Grafana with Prometheus as the data source.

  • Managed Services: For managed resources such as Kafka and MemoryDB, we'll route CloudWatch metrics into Grafana for alerting.

  • Functional Alerts: For functional checks (e.g., failed runs), we'll send metrics to Prometheus via a connector, which enables alerting in Grafana.

  • Datadog/Splunk: If Datadog or Splunk is the monitoring tool, we can deploy Datadog agents to monitor the entire infrastructure.

Alerts

| Alert Name | Threshold | Severity |
| --- | --- | --- |
| CPU Utilization (Percent) | 70% | L1 Warning |
| CPU Utilization (Percent) | 90% | L2 Critical |
| FreeLocalStorage (Bytes) | 10 GB | L1 Warning |
| FreeLocalStorage (Bytes) | 5 GB | L2 Critical |
| Database Connections (Count) | < 50 | L1 Warning |
| Database Connections (Count) | < 25 | L2 Critical |
| Read Latency (Seconds) | > 3 s | L1 Warning |
| Read Latency (Seconds) | > 5 s | L2 Critical |
| Write Latency (Seconds) | > 3 s | L1 Warning |
| Write Latency (Seconds) | > 5 s | L2 Critical |
| Database Memory Usage Percentage | 85% | L1 Warning |
| Database Memory Usage Percentage | 90% | L2 Critical |
| Engine CPU Utilization | 85 | L1 Warning |
| Engine CPU Utilization | 90 | L2 Critical |
| Client connections over the last hour | < 15 | L2 Critical |
| Authentication failures over the last hour | > 3 | L2 Critical |
| Disk usage by broker | 80% | L1 Warning |
| Disk usage by broker | 90% | L2 Critical |
| CPU (User) usage by broker | 80% | L1 Warning |
| CPU (User) usage by broker | 90% | L2 Critical |
| Lag alerts on topics | > 500 | L1 Warning |
| Lag alerts on topics | > 1000 | L2 Critical |
| Kafka - partition count per broker | > 1000 | L2 Critical |
| Kafka - connection count per broker | <= 0 | L2 Critical |
| Container in waiting status (in minutes) | > 1 min | L2 Critical |
| Container restarts (over last 10 min) | > 5 | L2 Critical |
| Container terminated with error (over last 10 min) | 1 | L2 Critical |
| Pod High CPU Usage (in percentage) | > 80% | L1 Warning |
| Pod High CPU Usage (in percentage) | > 90% | L2 Critical |
| Pod High Memory Usage (in percentage) | > 80% | L1 Warning |
| Pod High Memory Usage (in percentage) | > 90% | L2 Critical |
| Kubernetes Pod Crash Looping (over last 5 min) | 1 | L2 Critical |
| Node Not Ready (duration in minutes) | 4 min | L1 Warning |
| Node Not Ready (duration in minutes) | 5 min | L2 Critical |
| High CPU Node Utilization (in percentage) | > 80% | L1 Warning |
| High CPU Node Utilization (in percentage) | > 90% | L2 Critical |
| Kubernetes PVC available space (in percentage) | < 20% | L1 Warning |
| Kubernetes PVC available space (in percentage) | < 10% | L2 Critical |
| Kubernetes PVC Pending / Lost (over last 10 min) | 1 | L2 Critical |
| Full GC Alerts on pods (over last 5 min) | > 0 | L2 Critical |
| Subnet runs out of free IP addresses | 0 | L2 Critical |
| 401 status code | > 50 hits within 5 minutes | L1 Warning |
| 500 status code | > 10 hits within a minute | L1 Warning |
| Status codes above 500 | > 10 hits within a minute | L1 Warning |
| SQL Routine load error | > 10 hits within 5 minutes | L1 Warning |
| gRPC Exceptions | > 5 hits within 5 minutes | L1 Warning |
| NullPointerException errors | > 5 hits within 5 minutes | L1 Warning |
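
To make the two-level thresholds concrete, the sketch below shows one way a metric value could be mapped to an L1 or L2 severity, and how a "more than N hits within a window" rule could be counted. It is an illustrative example only, not the Grafana or Datadog configuration: `AlertRule`, `classify`, and `HitWindow` are hypothetical names, and the sample rules reuse the pod CPU, PVC available space, and gRPC exception thresholds from the table above.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlertRule:
    name: str
    l1_threshold: Optional[float]  # None when no L1 level is defined
    l2_threshold: Optional[float]  # None when no L2 level is defined
    fire_when_above: bool = True   # False for metrics where lower is worse


def _breached(value: float, threshold: Optional[float], above: bool) -> bool:
    """Strict comparison in the rule's direction; undefined levels never fire."""
    if threshold is None:
        return False
    return value > threshold if above else value < threshold


def classify(rule: AlertRule, value: float) -> Optional[str]:
    """Return the highest severity the value triggers, or None."""
    if _breached(value, rule.l2_threshold, rule.fire_when_above):
        return "L2 Critical"
    if _breached(value, rule.l1_threshold, rule.fire_when_above):
        return "L1 Warning"
    return None


class HitWindow:
    """Counts hits in a sliding time window (timestamps must not decrease)."""

    def __init__(self, window_seconds: float, max_hits: int) -> None:
        self.window_seconds = window_seconds
        self.max_hits = max_hits
        self._hits: deque = deque()

    def record(self, timestamp: float) -> bool:
        """Record one hit; return True if hits in the window exceed max_hits."""
        self._hits.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop hits that have aged out of the window.
        while self._hits and self._hits[0] <= cutoff:
            self._hits.popleft()
        return len(self._hits) > self.max_hits


if __name__ == "__main__":
    pod_cpu = AlertRule("Pod High CPU Usage", 80, 90)
    pvc_free = AlertRule("Kubernetes PVC available space", 20, 10,
                         fire_when_above=False)
    print(classify(pod_cpu, 85))   # L1 Warning
    print(classify(pvc_free, 5))   # L2 Critical

    grpc_errors = HitWindow(window_seconds=300, max_hits=5)
    fired = [grpc_errors.record(t) for t in (0, 10, 20, 30, 40, 50)]
    print(fired[-1])               # True: sixth hit within 5 minutes
```

Checking the L2 threshold first means a value that breaches both levels is reported once, at the higher severity.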