Central log management and why it really matters
Without logging, we can’t really tell whether our systems are functioning properly. Worse, when something goes wrong, it’s next-to-impossible to understand why.
However, today’s software architecture is far more modular than the monolithic approach of the past. Individual applications are now often isolated in application containers (e.g. LXC, Docker), decoupling complex systems into more easily manageable chunks that can be readily deployed, replicated, secured and provisioned.
This trend poses a challenge to logging: logs are spread across multiple machines (sometimes hundreds), in various application containers or virtual machines. Attempting to keep track of all these files manually is infeasible.
The solution is centralized log management. It’s a means by which individual applications send their log and event data to a remote central server. This remote server processes and stores the data, and usually provides a user interface for managing and searching the logs or events.
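As a minimal illustration of the idea, the sketch below uses Python’s standard-library SysLogHandler to forward every log record over UDP to a central collector. The addresses and the “payment-service” name are made up for the example, and a local UDP socket stands in for the remote server; real deployments would point the handler at an actual log-management host.

```python
import logging
import logging.handlers
import socket

# Hypothetical address for the central log server; a real deployment
# would point this at a remote host rather than localhost.
LOG_HOST, LOG_PORT = "127.0.0.1", 5140

def make_app_logger(name: str) -> logging.Logger:
    """Build a logger that forwards every record to the central server via syslog/UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address=(LOG_HOST, LOG_PORT))
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger

# Stand-in for the central server: a UDP socket that collects records.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind((LOG_HOST, LOG_PORT))

log = make_app_logger("payment-service")
log.error("card authorization failed")

record, _ = server.recvfrom(1024)
received = record.decode()
print(received)
```

Because the handler is attached once at startup, application code simply calls `log.error(...)` as usual; shipping the record elsewhere is entirely the logging configuration’s concern.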
Centralized log management has several advantages over the decentralized approach:
Improved log data availability
If an application container crashes, logs might become inaccessible or disappear altogether, as is the case with Docker’s ephemeral storage. Unfortunately, log data matter most precisely when such crashes occur, not when the system is running smoothly, so having access to them in these cases is crucial. Centralized log management solves this by immediately broadcasting all the data to a central server, so that nothing is lost.
If an application container is breached, the attacker will have a much harder time covering their tracks (modifying log files, deleting command history, etc.), because all their actions are immediately broadcast to a remote server they have no access to. This also aids in detecting the breach in the first place.
Improved system-wide overview
Problems and faults in production environments can be stressful. Every second a service is offline or at reduced capacity costs your business money. Diagnosing systems that consist of dozens or even hundreds of components imposes a huge cognitive overhead on the people troubleshooting. If the logs are streamed into a single place where they can be easily filtered through a good interface, much of this stress is eliminated.
Typically, monitoring only tracks metrics visible from outside the application (response time, memory usage, etc.), but these are only partial heuristics that do not give a thorough picture of the system’s health. Only the application itself knows whether an unexpected error has occurred. For example, a single use case might be failing for a small subset of users while, on the surface, everything appears to the monitoring tools to be working.
Centralized log management can solve this problem. Such systems often allow setting up alerts based on log patterns: for example, when an error-level message appears in the logs, send a notification to whoever is on call. While logging and monitoring are usually considered separate topics in their own right, log event-based alerts combine them well. A further benefit is that the application itself does not need to decide what to do about errors; that is left to the centralized logging system.
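To make the alerting idea concrete, here is a small sketch of pattern-based alerting over a centralized log stream. The function name, the regex, and the sample log lines are all invented for illustration; real log-management systems (Graylog, the ELK stack, etc.) provide this kind of rule engine out of the box.

```python
import re
from typing import Callable, Iterable

def watch_for_errors(lines: Iterable[str], pattern: str,
                     notify: Callable[[str], None]) -> int:
    """Scan a stream of centralized log lines; fire a notification per match."""
    rule = re.compile(pattern)
    hits = 0
    for line in lines:
        if rule.search(line):
            notify(line)  # e.g. page the on-call engineer with the matching line
            hits += 1
    return hits

# Illustrative log stream, as it might arrive at the central server.
stream = [
    "2024-05-01T10:00:00 INFO  checkout started",
    "2024-05-01T10:00:01 ERROR payment gateway timeout",
    "2024-05-01T10:00:02 INFO  retrying request",
]

alerts: list[str] = []
matched = watch_for_errors(stream, r"\bERROR\b", alerts.append)
print(matched, alerts)
```

Note that the alerting rule lives entirely on the server side: the applications emitting these lines know nothing about who gets notified or why.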
In conclusion, if you or your company plan to adopt containerized applications, or applications deployed across many hosts, be sure to consider how your log data will be managed. Centralized log management might prove invaluable.