What is infrastructure monitoring
Infrastructure monitoring is the continuous collection of metrics from servers, containers, databases, and the network, together with storing those metrics, visualizing them, and alerting when a value goes out of bounds. The goal is to notice a problem before users do and to find its cause faster.
What exactly gets tracked
- Hosts: CPU, memory, disk (space and I/O), network, load average.
- Containers and orchestration: pod restarts, CPU/memory usage relative to limits, OOM kills.
- Databases: connections, slow queries, replication lag.
- Services and applications: error rate, latency (p50/p95/p99), queue depth.
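Percentile latencies are not averages: p95 is the value that 95% of requests do not exceed. Below is a minimal Python sketch of the nearest-rank method over a batch of raw latency samples. The function name `percentile` and the sample data are illustrative. Real monitoring systems usually estimate percentiles from histograms or streaming sketches instead of sorting every sample.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest value such that at least p% of samples are <= it.
    # Multiply before dividing to avoid float error such as 0.07 * 100 = 7.000000000000001.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    # Latencies of 10 requests in milliseconds (made-up illustrative data).
    latencies_ms = [12, 15, 11, 14, 250, 13, 16, 12, 18, 900]
    for p in (50, 95, 99):
        print(f"p{p} = {percentile(latencies_ms, p)} ms")
```

This prints p50 = 14 ms, p95 = 900 ms, and p99 = 900 ms. The mean of the same data is about 126 ms, which hides both the typical 14 ms request and the 900 ms tail. With only 10 samples, p95 and p99 both land on the slowest request. High percentiles need many samples to be meaningful.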
What it consists of
- Collection: an agent or exporter gathers metrics. Either the agent pushes them to storage or the storage server scrapes them.
- Storage: a time-series database (TSDB) keeps each metric as a series of timestamped values.
- Visualization: dashboards with charts.
- Alerting: rules that notify Slack, Telegram, or the on-call engineer when a metric crosses a threshold.
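The following toy Python sketch makes the storage and alerting steps concrete. An in-memory store keeps each series as (timestamp, value) points and stands in for a real TSDB. A threshold rule is then evaluated against the latest point. The class names, the `host/metric` series key format, and the `notify` callback are hypothetical. A real setup would use an actual TSDB and route notifications to Slack, Telegram, or an on-call tool.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


class SeriesStore:
    """Toy stand-in for a TSDB: series name -> list of (timestamp, value) points."""

    def __init__(self) -> None:
        self._series: dict[str, list[tuple[float, float]]] = defaultdict(list)

    def append(self, name: str, ts: float, value: float) -> None:
        self._series[name].append((ts, value))

    def latest(self, name: str) -> float | None:
        # .get() does not create an empty series for unknown names.
        points = self._series.get(name)
        return points[-1][1] if points else None


@dataclass
class ThresholdRule:
    series: str
    threshold: float
    message: str

    def evaluate(self, store: SeriesStore, notify: Callable[[str], None]) -> bool:
        """Call notify() and return True if the latest value exceeds the threshold."""
        value = store.latest(self.series)
        if value is not None and value > self.threshold:
            notify(f"{self.message}: {self.series}={value} > {self.threshold}")
            return True
        return False


if __name__ == "__main__":
    store = SeriesStore()
    store.append("web-1/disk_used_percent", 1_700_000_000, 78.0)
    store.append("web-1/disk_used_percent", 1_700_000_060, 93.5)
    rule = ThresholdRule("web-1/disk_used_percent", 90.0, "Disk almost full")
    rule.evaluate(store, notify=print)
```

Run as a script, this prints `Disk almost full: web-1/disk_used_percent=93.5 > 90.0`. Passing `notify` in as a callback keeps the rule independent of the delivery channel.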
Push vs pull
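In the pull model, the monitoring server fetches metrics from each target on a schedule (Prometheus works this way by default). In the push model, the agent on each host sends its data to the server. Unimoni uses push over mTLS, in which the agent and the server authenticate each other with certificates. Because the agent only makes outbound connections, you do not need to open inbound ports on your servers.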
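Here is a minimal sketch of the push side in Python, using only the standard library. The agent collects a few host metrics into a JSON payload and sends it with an outbound HTTPS POST. It presents its own client certificate so the server can authenticate it, which is the "mutual" part of mTLS. The endpoint URL, the payload shape, and the certificate file names are hypothetical placeholders. They are not Unimoni's actual agent protocol.

```python
import json
import os
import shutil
import socket
import ssl
import time
import urllib.request

# Hypothetical values: replace with your collector's endpoint and your own certificates.
ENDPOINT = "https://collector.example.com/v1/metrics"
CA_FILE = "ca.pem"            # CA that signed the server certificate
CLIENT_CERT = "agent.pem"     # this agent's certificate
CLIENT_KEY = "agent-key.pem"  # this agent's private key


def build_payload(host: str, ts: float, disk_path: str = "/") -> dict:
    """Collect a few host metrics into a JSON-serializable dict (illustrative shape)."""
    usage = shutil.disk_usage(disk_path)
    # os.getloadavg() exists only on Unix-like systems.
    load1 = os.getloadavg()[0] if hasattr(os, "getloadavg") else None
    return {
        "host": host,
        "timestamp": ts,
        "metrics": {
            "disk_used_percent": round(100 * usage.used / usage.total, 2),
            "load1": load1,
        },
    }


def make_mtls_context() -> ssl.SSLContext:
    """TLS context that verifies the server and presents the agent's client certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=CA_FILE)
    ctx.load_cert_chain(certfile=CLIENT_CERT, keyfile=CLIENT_KEY)
    return ctx


def push(payload: dict, ctx: ssl.SSLContext) -> int:
    """Send one batch over an outbound connection and return the HTTP status."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10, context=ctx) as resp:
        return resp.status


if __name__ == "__main__":
    ctx = make_mtls_context()
    while True:
        try:
            push(build_payload(socket.gethostname(), time.time()), ctx)
        except OSError as exc:  # URLError and ssl.SSLError both subclass OSError
            print(f"push failed: {exc}")
        time.sleep(60)  # push interval
```

The host only ever dials out to `ENDPOINT`. In a pull setup, by contrast, the server would need to reach a listening port on every host.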
Where to start
Start with basic host metrics chosen by the USE method: Utilization, Saturation, and Errors for each resource, such as CPU, memory, disk, and network. Set up a few actionable alerts (host down, low disk space, rising error rate), and avoid noise: an alert that requires no action only dulls attention.
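One common way to keep alerts actionable is to fire only after a condition has held for several consecutive checks, so a brief spike does not page anyone. Prometheus alerting rules express the same idea with a `for` duration. Below is a minimal Python sketch of a debounced low-disk-space check. The 90% threshold, the three-check window, and the `DebouncedAlert` name are illustrative assumptions.

```python
from __future__ import annotations

import shutil


class DebouncedAlert:
    """Fires once after `required` consecutive breaches; resolves when the condition clears."""

    def __init__(self, threshold: float, required: int = 3) -> None:
        self.threshold = threshold
        self.required = required
        self._streak = 0
        self.firing = False

    def observe(self, value: float) -> str | None:
        """Feed one measurement; return 'FIRING' or 'RESOLVED' on state changes, else None."""
        if value > self.threshold:
            self._streak += 1
            if not self.firing and self._streak >= self.required:
                self.firing = True
                return "FIRING"
        else:
            self._streak = 0
            if self.firing:
                self.firing = False
                return "RESOLVED"
        return None


def disk_used_percent(path: str = "/") -> float:
    usage = shutil.disk_usage(path)
    return 100 * usage.used / usage.total


if __name__ == "__main__":
    alert = DebouncedAlert(threshold=90.0, required=3)
    # Simulated readings: a one-off spike (no alert), then a sustained breach.
    for reading in [85.0, 95.0, 80.0, 91.0, 92.0, 93.0, 94.0, 70.0]:
        event = alert.observe(reading)
        if event:
            print(f"{event}: disk used {reading}%")
    # In a real check, call alert.observe(disk_used_percent()) on a schedule instead.
```

With the simulated readings, the single 95.0 spike is ignored. The alert fires once at 93.0, after the third breach in a row, and resolves at 70.0. Sending one notification per state change, rather than one per check, also keeps the channel quiet.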