Since I started running my own VPS and VM servers on my FreeNAS/TrueNAS machine, I’ve learned the importance of logs.
Whether it’s a website, an application, or an operating system, it’s most likely producing logs in some form. These logs hold the clues to everyday operations and errors that can help you understand how well, or how poorly, the application is working.
The problem is that these logs are constantly being generated and are not the most human-friendly. Operating systems alone generate tens of thousands of lines of logs. Keeping up with these logs is difficult. That’s where logging software comes in.
Of course, no conversation about logs can begin without mentioning Ren and Stimpy:
https://youtu.be/5Y0dGHkAkIY
How I Got Into Logging
At work, we use software called Splunk. At first, I could barely get a sense of what to do. This was because I didn’t understand our company’s infrastructure, and I was only given a query to execute to complete my specific task.
Later on, I got the idea of using this software for another project. After reaching out for help to the teams who had dropped this application into my lap, I got no replies and was at a standstill. That didn’t stop my motivation to learn how to approach my goal, because I saw lots of potential in having this information.
Splunk is available for free, but has some limitations:
- Alerting (monitoring) is not available.
- There are no users or roles. This means:
  - There is no login. You are passed straight into Splunk Web as an administrator-level user.
  - The command line or browser can access and control all aspects of Splunk Free with no user and password prompt.
  - There is only the admin role, and it is not configurable. You cannot add roles or create user accounts.
- Restrictions on search, such as user quotas, maximum per-search time ranges, and search filters, are not supported.
- Distributed search configurations including search head clustering are not available.
- Deployment management capabilities are not available.
- Indexer clustering is not available.
- Forwarding in TCP/HTTP formats is not available. This means you can forward data from a Free license instance to other Splunk platform instances, but not to non-Splunk software.
- Report acceleration summaries are not available.
- The Free license gives very limited access to Splunk Enterprise features.
- The Free license is for a standalone, single-instance use only installation.
- The Free license does not expire.
- The Free license allows you to index 500 MB per day. If you exceed that, you will receive a license violation warning.
- The Free license will prevent searching if you accumulate a number of license violation warnings.
The majority of these limitations aren’t terrible. However, the first bullet item is where I have an issue: “Alerting (monitoring) is not available.” Having emails sent to you when an event matches a rule you create is one of the reasons logging software is so powerful.
Therefore, I found Graylog as a possible solution. It offers an interface similar to Splunk’s, supports notifications, and is extensible through plugins and content packs. Best part: it’s Open Source.
There are many open source logging platforms available, but Graylog was the first I tried, and it appeared to work well for me. The installation process is quite extensive: you will need reasonable familiarity with the command line, and I found you need a minimum of 4 GB of RAM for decent performance.
I initially installed Graylog in January 2020 using both the Documentation and a decent YouTube Tutorial, which provided more instruction. It required me to install Java, MongoDB, Elasticsearch, and then Graylog. This stack is the backbone of Graylog, and it’s the reason you need at least 4 GB of RAM. I first installed it all on a VM on my TrueNAS/FreeNAS machine.
Having Graylog installed isn’t enough; you need to send logs to it. Every system and application produces logs in different ways, and there are various ways to send logs to Graylog. That is where I got hung up the most on this project.
Sending Syslog over UDP
The installation instructions above cover ingesting logs from rsyslog, the standard syslog daemon on most Linux systems, into Graylog. I found configuring rsyslog to forward to Graylog rather simple, and it became easier to understand the more I did it across my VMs and RPis.
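As a minimal sketch, the forwarding config boils down to one line in an rsyslog drop-in file. The hostname and port here are placeholders for your own Graylog syslog UDP input, and RSYSLOG_SyslogProtocol23Format is the built-in template the Graylog documentation recommends:

# /etc/rsyslog.d/90-graylog.conf
# A single @ means UDP; @@ would mean TCP.
*.* @graylog.example.com:1514;RSYSLOG_SyslogProtocol23Format

After saving the file, restart rsyslog with sudo systemctl restart rsyslog and the system’s logs start flowing.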
Apache Logs
What really took me some time was figuring out how to send logs from Apache servers. After lots of digging, I found two older blog posts (here and here) about how others had done it years prior, and I finally figured out how to do it myself.
After some time, I was able to put together the following:
LogFormat "{ \"version\": \"1.1\", \"host\": \"%V\", \"short_message\": \"%r\", \"timestamp\": %{%s}t, \"level\": 6, \"user_agent\": \"%{User-Agent}i\", \"source_ip\": \"%a\", \"duration_usec\": %D, \"duration_sec\": %T, \"request_size_byte\": %O, \"http_status_orig\": %s, \"http_status\": %>s, \"http_request_path\": \"%U\", \"http_request\": \"%U%q\", \"http_method\": \"%m\", \"http_referrer\": \"%{Referer}i\", \"from_apache\": \"true\" }" apache_prod_greylog CustomLog ${APACHE_LOG_DIR}/prod_ssl_apache_gelf.log apache_prod_greylog CustomLog "| /bin/nc -u 192.99.167.196 1514" apache_prod_greylog
The first line creates a custom LogFormat.
The second line outputs that format to a new log file in the Apache log directory.
The final line pipes the formatted log to a netcat command that sends the data to an IP address on a specific port. Note: the IP address above is no longer live.
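Before wiring up Apache, it can help to sanity-check the GELF UDP input by hand. This is a hypothetical test message; swap in your own Graylog host and port:

echo '{ "version": "1.1", "host": "test", "short_message": "Hello Graylog" }' | nc -u -w 1 graylog.example.com 1514

If the message shows up in a Graylog search, the input is working and any remaining problems are on the Apache side.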
Apache logs aren’t created unless there is traffic sent to the web server. After visiting a site on the same server Graylog is on, it didn’t take long to see the data ingested in Graylog. This gave me the idea to continuously add more information to the Apache logs to suit my needs. I used the official Apache log format documentation and some Loggly information to adjust the log formats to my liking.
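As a hypothetical example of that kind of extension, capturing the response Content-Type only takes one more key, since %{Header-Name}o is Apache’s format string for a response header (the shortened format and the apache_test_gelf nickname here are made up for illustration):

LogFormat "{ \"version\": \"1.1\", \"host\": \"%V\", \"short_message\": \"%r\", \"content_type\": \"%{Content-Type}o\" }" apache_test_gelf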
Sending Remote Logs to Graylog on the Same Network
Now that I had both Linux system logs and Apache logs working, I replicated these steps on all my VMs, and Graylog was soon obtaining logs from several VMs and two Raspberry Pis.
Sending Remote Logs to Locally Hosted Graylog
Once I had my own proof-of-concept Graylog instance running within my local network, I felt comfortable having the logs generated by this very site and its server hosted at Digital Ocean sent to Graylog as well. This would require opening my local network to the internet so traffic could reach the VM that housed Graylog.
This was another headache.
There’s lots of information on how to do this. Essentially, you port forward traffic arriving at the IP address provided by your ISP. That seems simple with most routers. However, my ISP provides dynamic IP addresses that change once in a while. Normally that has a simple workaround, Dynamic DNS, where a client checks frequently whether the IP address your ISP assigned has changed and, if it has, updates your DNS record.
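As a sketch, the Dynamic DNS piece is often just a cron job hitting your provider’s update URL; this example assumes a Duck DNS subdomain and token, both of which are placeholders:

# Update the DNS record every five minutes if the public IP changed
*/5 * * * * curl -s "https://www.duckdns.org/update?domains=myhome&token=YOUR_TOKEN&ip=" >/dev/null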
Well, that’s where I really got stuck. It turns out that my internet is behind what is called a Carrier-Grade NAT, or CGNAT. There is a finite number of IPv4 addresses in the world, and the availability of those addresses is shrinking. To stretch the addresses an ISP has, it may place entire neighborhoods behind a CGNAT.
The concept is similar to having a router in your home. A router creates a private network, assigns a local IP address to each machine that connects to it, and presents a single public IP address to the outside world. A CGNAT does the same thing for entire neighborhoods. My home connection was assigned an address inside the neighborhood network, and that whole network shares a single public IP address that it broadcasts to the rest of the world. This means port forwarding and Dynamic DNS were not options.
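A quick way to check whether you’re behind a CGNAT is to compare the WAN address your router reports with the address the internet sees. If they differ, and especially if the router’s WAN address falls in the RFC 6598 shared range of 100.64.0.0/10, you’re behind one:

# The address the internet sees:
curl -s https://ifconfig.me
# Compare it against the WAN address on your router's status page.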
This is where I recognized that I had hit another wall. I did find another option called ngrok, but it too didn’t work the way I would have liked. After weighing my known options, I chose to pack up my Graylog project for the time being.
Learning Splunk and New Motivation
I finally got mentorship with Splunk, and after playing with it more, I was able to approach my goal at work and saw more possibilities with logging that reignited my interest in Graylog.
Since I was now a full year into managing a VPS for aaronweiss.me, I felt it might be an excellent opportunity to launch another VPS with the sole goal of logging. However, I knew I needed more RAM than my current little $5 droplet at Digital Ocean has. A droplet with a minimum of 4 GB of RAM would be $20 per month. I felt that was too steep for a little project like this, which led me on a journey to find a VPS with the resources I needed at a reasonable price.
Finding a low-priced VPS with that amount of RAM is not difficult, but you need to vet the companies, as some could be fly-by-night operations. I located Hetzner and OVH first.
Hetzner has extremely low-cost VPSs available. A single-core 4 GB VPS would be €5.68, roughly $6.73 per month at the currency rate at the time of publication. Hetzner’s servers are located in Germany and Finland. Given that I’m solely concerned with logs, I didn’t need low latency, so this would have been fine.
I had also found OVH, a French company with servers located worldwide. They had a server in Quebec for $10.58 per month with 2 vCPUs and 4 GB of RAM. I chose to start with them. After about a day of setting things up, I was ingesting logs from my own website, VMs, and Raspberry Pis, and it was working very well.
But I wanted to reduce that cost even more. I finally found VPSDime, which offers a $7 VPS with 6 GB of RAM and 4 vCPUs. Those resources at that price seemed suspicious to me, but after some due diligence and a number of strong service and support reviews, I thought I’d try it out. The extra resources make a huge difference in speed. When I would restart Graylog or any portion of the stack on my VMs, it could take about 5 minutes to load. OVH took about 4 minutes. VPSDime takes about a minute or less.
Support was great when I had an issue. Surprisingly, it wasn’t my issue or theirs. It was the lovely CenturyLink outage that occurred on August 30th, 2020.
Admittedly, neither OVH’s nor VPSDime’s interface is nearly as intuitive as Digital Ocean’s, but I was able to navigate VPSDime just fine.
Monitoring and Events: My Use Case for Graylog
As I stated earlier, one of my primary goals of creating this logging infrastructure was to be able to have notifications sent when certain conditions trigger.
From time to time, I was getting an “Error Establishing a Database Connection” from WordPress on this website. Since I don’t visit my own website often, this error and the resulting downtime could persist for days. Unsure of when it happened or what caused it, I had a difficult time finding the MySQL error log to see what triggered the error. Luckily, restarting MySQL brought the website back up in less than 10 seconds.
Among the reasons for this error are:
- Incorrect database credentials
- Corrupted database
- Corrupted files
- Issues with the database server
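Whatever the underlying cause, my immediate remedy and diagnosis were the same each time. On Ubuntu, with the default MySQL packages (paths may vary on other distributions), that looks roughly like:

sudo systemctl restart mysql
sudo tail -n 50 /var/log/mysql/error.log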
Once I got Graylog up and running, I created an alert for the website that would email me any time a 500 error occurred. Finally, in late September, I woke up to several dozen emails from Graylog, sent over the previous 8 hours while I was sleeping, stating that there was a 500 error. Lo and behold, my website was down with the “Error Establishing a Database Connection” notification from WordPress.
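The event definition behind that alert boils down to a saved search; something along these lines, using the field names from my Apache LogFormat above (adjust them to your own format), with the email notification attached in the Graylog UI:

http_status:[500 TO 599] AND from_apache:true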
After I restarted MySQL, I used Graylog to find the first time this error occurred, then located the corresponding entry and timestamp in the MySQL log file. The error stated:
2020-09-26T06:02:14.370246Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
A quick search pointed me to a Stack Overflow question and answer showing how to enable explicit_defaults_for_timestamp in MySQL. So now it’s just a matter of waiting to see if this database connection error occurs again. When it does, I’ll have the tools to search, discover, and investigate again.
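For reference, the fix amounts to one line in the MySQL server configuration followed by a restart; on Ubuntu the file is typically /etc/mysql/mysql.conf.d/mysqld.cnf, though your path may differ:

[mysqld]
explicit_defaults_for_timestamp = 1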
Update: The issue wasn’t just explicit_defaults_for_timestamp; it was also that my system needed a swap file. Despite the warnings that swap files can contribute to faster SSD degradation, I followed this Digital Ocean tutorial for Ubuntu Server 18.04 to create a swap file. Since then, there have been no MySQL failures.
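The gist of that tutorial, in case the link ever rots, is a handful of commands (a 1 GB swap file here; size it to your own needs):

sudo fallocate -l 1G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab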
The Future
Logging has been fun and I have a better understanding of how I can monitor the logs that each of my servers and websites produce. I still need to figure out how to get PHP and MySQL logs sent to Graylog, but I’m sure I’ll overcome that obstacle in due time.
Logging provides a smoking gun for why something has occurred, but it may not provide the best picture of what is currently running or other statistics. That is where Nagios and Grafana come in, providing status monitoring, statistics, and graphing.
Additionally, as I discovered in this journey, I’d prefer to run this Graylog instance on a virtual machine at home rather than spend another $7 per month for a VPS. I’ve looked into using a remote VPN server to circumvent the CGNAT and make a direct connection to the VM. That would ultimately provide more features than just letting the remote VPS hosting aaronweiss.me point to my VM. I could also allow this VPN to reach other portions of my network, such as my Plex installation, or use the same VPS to run Pi-hole to block ads. A VPN server doesn’t need the same resources as Graylog, so it could cost less, and I could even use Digital Ocean if I wish.
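For what it’s worth, here’s a rough sketch of what that could look like with WireGuard; I haven’t built it yet, and every key, address, and hostname below is a placeholder. The trick is that the home VM dials out to the cheap public VPS, so the CGNAT never has to accept an inbound connection:

# /etc/wireguard/wg0.conf on the public VPS (the "front door")
[Interface]
Address = 10.8.0.1/24
ListenPort = 51820
PrivateKey = <vps-private-key>

[Peer]
# The home VM running Graylog
PublicKey = <home-vm-public-key>
AllowedIPs = 10.8.0.2/32

# /etc/wireguard/wg0.conf on the home VM (initiates the outbound connection)
[Interface]
Address = 10.8.0.2/24
PrivateKey = <home-vm-private-key>

[Peer]
PublicKey = <vps-public-key>
Endpoint = vps.example.com:51820
AllowedIPs = 10.8.0.0/24
PersistentKeepalive = 25

The PersistentKeepalive line is what keeps the NAT mapping open so the VPS can always reach back into the home VM.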
No Comments?
The time it takes to moderate and fight spam is too great. Therefore, the decision to disable comments was made. If you have a question or a comment about this blog or any other article on this website, please contact me.