I did not want to build a smaller copy of New Relic; I needed a slightly different logic. In thinking about a tool to monitor multiple web sites, I wanted something that would alert admins only when it was necessary. The idea was to gather relevant data as unobtrusively as possible, without taxing performance, and then to tell an admin whether everything was basically OK or not.
Thus, the data was to be gathered on page load and transmitted to the server, where it would be analyzed. When visiting the performance review page, the admin would see a general status and, if problems were discovered, an invitation to log into the admin account for more information.
Additionally, I was prompted to display a digest of the watchdog log that would contain only the error messages, sorted by number of occurrences. PHP errors, missing images, file permissions needing correction: all these errors and notices can slow down a site. And they can be analyzed locally, without sending anything to the server, simply by analyzing the watchdog table.
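The "errors only, sorted by occurrences" view can be sketched as a single grouped query. This is a minimal illustration in Python rather than the module's own code. It assumes a table shaped like Drupal's watchdog table with `type`, `message` and `severity` columns, and syslog-style severity levels where a lower number is more severe. The in-memory SQLite database, the sample rows and the function names are stand-ins invented for the example.

```python
import sqlite3

# Assumption: syslog-style severity levels, where lower means more severe
# and 5 corresponds to "notice". Anything less severe is ignored.
MAX_SEVERITY = 5


def top_log_messages(conn, max_severity=MAX_SEVERITY):
    """Return (type, message, occurrences) rows, most frequent first."""
    query = (
        "SELECT type, message, COUNT(*) AS occurrences "
        "FROM watchdog WHERE severity <= ? "
        "GROUP BY type, message "
        "ORDER BY occurrences DESC, message ASC"
    )
    return conn.execute(query, (max_severity,)).fetchall()


def build_sample_db():
    """Create an in-memory stand-in for the watchdog table (sample data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE watchdog (type TEXT, message TEXT, severity INTEGER)"
    )
    rows = [
        ("php", "Undefined index: title", 5),
        ("page not found", "sites/default/files/logo.png", 4),
        ("php", "Undefined index: title", 5),
        ("cron", "Cron run completed.", 6),  # informational, filtered out
        ("php", "Undefined index: title", 5),
        ("page not found", "sites/default/files/logo.png", 4),
    ]
    conn.executemany("INSERT INTO watchdog VALUES (?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    for row in top_log_messages(build_sample_db()):
        print(row)
```

Grouping on the raw message keeps the sketch simple; a real log may store placeholders and variables separately, in which case the grouping key would need to account for that.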
Which statistics are relevant? Drupal has access to the page timer and memory usage, as well as to the query log. The first two are the most important for the litmus test. If the memory limit is being reached, or if page timers are long, then your site is under-performing. Query log analysis, meaning the number of queries and their total time, can also be helpful, mostly for finding the pages that abuse the database layer. However, the database query log is more taxing to collect, and it mainly helps to find the source of a problem, whereas at the initial stage what is needed is to identify that a problem is present.
So, the module would send the page timer and memory usage data only when they were poor, and additional analysis such as query log statistics only if that was separately enabled.
Now, the question demanding a good thought was this: what is the performance threshold at which we want to tap our support staff and say, “Hey friend, there seems to be an issue that we need to address”? Every web site has its own peculiarities. It would be incorrect to put a social network and a small business-card web site into the same weight category. At the same time, we are talking about the same platform here, Drupal, and we also have some general expectations of web site behavior. We expect a page to load in under 0.5 seconds. Now, we are not talking about the front end, where there can be caching and a reverse proxy. We are talking about pages that have the Drupal cache enabled, and the time is the internal PHP time, from Drupal boot to Drupal exit. We can also be sure that we want to be alerted if a script uses more than ½ of the memory limit. In that case, if two people accessed the site at the same time, there would be a bottleneck. Thus, we have some general guidelines that we can apply regardless of the size of the web site, and we can then make adjustments as we go on.
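The two guidelines can be written down as a small check. This is a sketch in Python, not the module's code. The two thresholds come from the text above; the function names, parameter names and problem labels are made up for the illustration.

```python
MB = 1024 * 1024

# Thresholds taken from the guidelines in the text.
MAX_PAGE_SECONDS = 0.5      # internal PHP time, Drupal boot to Drupal exit
MAX_MEMORY_FRACTION = 0.5   # alert above half of the memory limit


def find_problems(page_seconds, peak_memory_bytes, memory_limit_bytes):
    """Return a list of problem labels for one page load (empty if fine)."""
    problems = []
    if page_seconds > MAX_PAGE_SECONDS:
        problems.append("slow_page")
    # A non-positive limit is treated as "no limit configured" (assumption).
    if memory_limit_bytes > 0:
        if peak_memory_bytes > memory_limit_bytes * MAX_MEMORY_FRACTION:
            problems.append("high_memory")
    return problems


def should_report(page_seconds, peak_memory_bytes, memory_limit_bytes):
    """Only page loads with at least one problem are sent to the server."""
    return bool(find_problems(page_seconds, peak_memory_bytes, memory_limit_bytes))


if __name__ == "__main__":
    print(find_problems(0.2, 40 * MB, 128 * MB))
    print(find_problems(0.8, 70 * MB, 128 * MB))
```

Keeping the thresholds as named constants makes it easy to adjust them per site later, which matches the idea of starting from general guidelines and tuning as we go.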
A separate consideration was security. The server host had to use HTTPS, and a secure key had to be passed by which the domain submitting the data would be identified. This opened the door to commercial usage of the service, even though the implementation was still far from that.
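Identifying a domain by its key could look roughly like this on the receiving side. This is only a sketch in Python, under the assumption that each registered domain has exactly one key stored on the server. The registry contents, the key values and the function name are hypothetical, and HTTPS itself is handled by the web server rather than by this code.

```python
import hmac

# Hypothetical registry of domains and their keys (sample values only).
SITE_KEYS = {
    "example.com": "key-for-example-com",
    "example.org": "key-for-example-org",
}


def identify_domain(submitted_key, registry=SITE_KEYS):
    """Return the domain that owns submitted_key, or None if it is unknown."""
    match = None
    for domain, key in registry.items():
        # compare_digest avoids leaking information through comparison timing.
        if hmac.compare_digest(key.encode("utf-8"), submitted_key.encode("utf-8")):
            match = domain
    return match


if __name__ == "__main__":
    print(identify_domain("key-for-example-org"))
    print(identify_domain("not-a-real-key"))
```

Data arriving with an unknown key would simply be discarded.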
On the server side, the data was to be stored for 7 days. Thus, a web site with about 100 visitors daily would have a database table with a size below 4 MB.
These thoughts were implemented, and I am currently testing the solution, raw as it is. You can also join in and have a look at the result: