OpenRC is a lightweight init system used by several Linux distributions like Alpine, Gentoo, and Artix. While it efficiently manages services during normal operations, it doesn’t automatically restart failed services by default. This can lead to unexpected downtime if critical services crash. Let’s explore how to implement automatic service recovery in OpenRC to keep your system running smoothly.
Using OpenRC’s Built-in Respawn Feature
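OpenRC ships its own process supervisor, supervise-daemon, which can automatically restart services that exit unexpectedly. This method is simple to set up and does not require additional scripts or tools, but it only works for services whose init script starts the daemon through the standard command variables and whose daemon can stay in the foreground.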
Step 1: Open the service configuration file in /etc/conf.d/. For example, if you want to set up automatic recovery for nginx:
sudo nano /etc/conf.d/nginx
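Step 2: Add the following lines to the file:
supervisor=supervise-daemon
respawn_delay=5
respawn_max=0
The first line runs the service under supervise-daemon, which is the component that performs the respawn; without it, the respawn settings have no effect. The other two lines tell it to wait 5 seconds before attempting to restart the service, and to keep trying indefinitely (0 means no limit).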
Step 3: Save the file and exit the editor.
Step 4: Restart the service to apply the changes:
sudo rc-service nginx restart
Now, if nginx crashes or exits unexpectedly, OpenRC will automatically attempt to restart it after a 5-second delay.
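To check that recovery actually happens, it helps to simulate a crash and wait for the process to come back. The helper below is a minimal sketch, not part of OpenRC: wait_until is a hypothetical function name, and the timeout and polling interval are arbitrary choices.

```shell
#!/bin/sh
# wait_until TIMEOUT INTERVAL COMMAND...
# Re-run COMMAND every INTERVAL seconds until it succeeds,
# or give up once TIMEOUT seconds have elapsed.
# (Hypothetical helper; not an OpenRC command.)
wait_until() {
    timeout=$1 interval=$2
    shift 2
    elapsed=0
    while ! "$@" >/dev/null 2>&1; do
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}
```

With the function loaded in a root shell, one way to use it is to run pkill -x nginx and then wait_until 30 1 pgrep -x nginx && echo recovered. This assumes the daemon's process name is nginx; if nothing is printed within 30 seconds, the respawn configuration is probably not being applied.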
Creating a Custom Monitoring Script
For more complex monitoring scenarios or when you need to perform additional actions before restarting a service, a custom script can be useful.
Step 1: Create a new script file:
sudo nano /usr/local/bin/service-monitor.sh
Step 2: Add the following content to the script:
#!/bin/sh
# POSIX sh is used because bash is not installed by default on Alpine.
SERVICE_NAME="nginx"
MAX_RESTARTS=3          # restart attempts allowed per window
RESTART_INTERVAL=300    # window length in seconds
LOG_FILE="/var/log/service-monitor.log"
restart_count=0
last_restart_time=0

while true; do
    # A non-zero exit status means the service is stopped or crashed.
    if ! rc-service "$SERVICE_NAME" status > /dev/null 2>&1; then
        current_time=$(date +%s)
        # Reset the counter once a full window has passed since the last restart.
        if [ $((current_time - last_restart_time)) -ge "$RESTART_INTERVAL" ]; then
            restart_count=0
        fi
        if [ "$restart_count" -lt "$MAX_RESTARTS" ]; then
            echo "$(date): $SERVICE_NAME is down. Attempting restart..." >> "$LOG_FILE"
            rc-service "$SERVICE_NAME" restart
            last_restart_time=$current_time
            restart_count=$((restart_count + 1))
        else
            echo "$(date): $SERVICE_NAME failed to restart $MAX_RESTARTS times. Manual intervention required." >> "$LOG_FILE"
            exit 1
        fi
    fi
    sleep 60
done
This script checks the service status every minute. If the service is down, it attempts to restart it up to 3 times within a 5-minute interval. If the service fails to restart after 3 attempts, the script exits and logs an error message.
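The restart budget described above can be isolated into a small function, which makes the reset behaviour easier to reason about and to test without touching a real service. This is only a sketch: decide_restart is a hypothetical name, and it takes the current time, last restart time, counter, limit, and window as plain arguments rather than reading the script's variables.

```shell
#!/bin/sh
# decide_restart NOW LAST_RESTART COUNT MAX INTERVAL
# Prints "restart" if another attempt is allowed, "give-up" otherwise.
# Mirrors the monitor script's logic; all times are in seconds.
decide_restart() {
    now=$1 last=$2 count=$3 max=$4 interval=$5
    # A gap of at least INTERVAL seconds since the last restart
    # resets the attempt counter.
    if [ $((now - last)) -ge "$interval" ]; then
        count=0
    fi
    if [ "$count" -lt "$max" ]; then
        echo restart
    else
        echo give-up
    fi
}
```

For example, decide_restart 1120 1060 3 3 300 prints give-up because three attempts were already used and only 60 seconds have passed, while decide_restart 2000 1000 3 3 300 prints restart because the 300-second window has expired.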
Step 3: Make the script executable:
sudo chmod +x /usr/local/bin/service-monitor.sh
Step 4: Create a new OpenRC service for the monitoring script:
sudo nano /etc/init.d/service-monitor
Add the following content:
#!/sbin/openrc-run
name="Service Monitor"
command="/usr/local/bin/service-monitor.sh"
command_background=true
pidfile="/run/service-monitor.pid"

depend() {
    need net
    after nginx
}
Step 5: Make the new service file executable:
sudo chmod +x /etc/init.d/service-monitor
Step 6: Add the monitoring service to the default runlevel:
sudo rc-update add service-monitor default
Step 7: Start the monitoring service:
sudo rc-service service-monitor start
This method provides more flexibility and control over the restart process, allowing you to implement custom logic and logging.
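Since the monitor records every attempt in its log, a quick count of those entries shows how often the service has needed a restart. The function below is a small sketch under the assumption that the log lives at /var/log/service-monitor.log and contains the "Attempting restart" message used by the script; count_restart_attempts is a hypothetical name.

```shell
#!/bin/sh
# count_restart_attempts [LOG_FILE]
# Print how many restart attempts the monitor has logged.
# Defaults to the log path used by the monitoring script.
count_restart_attempts() {
    log=${1:-/var/log/service-monitor.log}
    if [ ! -f "$log" ]; then
        echo 0
        return 0
    fi
    # grep -c prints the number of matching lines; it exits non-zero
    # when that number is 0, so the status is deliberately ignored.
    grep -c 'Attempting restart' "$log" || true
}
```

A count that keeps growing suggests the service has an underlying problem that restarts alone will not fix.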
Using Supervisor with OpenRC
For advanced service management and monitoring, Supervisor can be integrated with OpenRC. Supervisor is a process control system that can monitor and automatically restart services.
Step 1: Install Supervisor. On Alpine (use your distribution's package manager elsewhere):
sudo apk add supervisor
Step 2: Create a configuration file for the service you want to monitor:
sudo nano /etc/supervisor.d/nginx.ini
Add the following content:
[program:nginx]
command=/usr/sbin/nginx -g "daemon off;"
autostart=true
autorestart=true
startretries=5
numprocs=1
startsecs=0
process_name=%(program_name)s_%(process_num)02d
stderr_logfile=/var/log/supervisor/%(program_name)s_stderr.log
stderr_logfile_maxbytes=10MB
stdout_logfile=/var/log/supervisor/%(program_name)s_stdout.log
stdout_logfile_maxbytes=10MB
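This configuration tells Supervisor to start nginx in the foreground (daemon off), automatically restart it if it crashes, and manage its log files. The /var/log/supervisor directory must exist before Supervisor starts. Because Supervisor now launches nginx itself, stop the OpenRC-managed instance and remove it from its runlevel first (sudo rc-service nginx stop, then sudo rc-update del nginx default); otherwise the two copies will compete for the same ports.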
Step 3: Create an OpenRC service file for Supervisor (skip this step if your package already installed /etc/init.d/supervisord):
sudo nano /etc/init.d/supervisord
Add the following content:
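#!/sbin/openrc-run
name="Supervisor daemon"
command="/usr/bin/supervisord"
# -n keeps supervisord in the foreground so that OpenRC can
# background it and record the PID file itself.
command_args="-n -c /etc/supervisord.conf"
command_background=true
pidfile="/run/supervisord.pid"

depend() {
    need net
}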
Step 4: Make the service file executable:
sudo chmod +x /etc/init.d/supervisord
Step 5: Add Supervisor to the default runlevel:
sudo rc-update add supervisord default
Step 6: Start the Supervisor service:
sudo rc-service supervisord start
Now Supervisor will manage the nginx service, automatically restarting it if it crashes. You can add more services to Supervisor by creating additional configuration files in /etc/supervisor.d/.
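When several services need the same treatment, the per-program files can be generated rather than written by hand. The function below is a rough sketch: supervisor_program_ini is a hypothetical helper, myapp in the usage note is a placeholder, and only a subset of the options from the nginx example is emitted.

```shell
#!/bin/sh
# supervisor_program_ini NAME COMMAND
# Print a [program:NAME] section modelled on the nginx example.
# COMMAND must keep the process in the foreground.
supervisor_program_ini() {
    name=$1 cmd=$2
    cat <<EOF
[program:$name]
command=$cmd
autostart=true
autorestart=true
startretries=5
stderr_logfile=/var/log/supervisor/%(program_name)s_stderr.log
stdout_logfile=/var/log/supervisor/%(program_name)s_stdout.log
EOF
}
```

For instance, supervisor_program_ini myapp "/usr/local/bin/myapp --foreground" prints a section that could be saved as /etc/supervisor.d/myapp.ini. After adding a file, supervisorctl reread followed by supervisorctl update should make a running Supervisor pick it up.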
Implementing automatic service recovery in OpenRC improves system reliability by reducing downtime caused by service failures. Whether you choose the built-in respawn feature, a custom monitoring script, or Supervisor, each method helps keep critical services running, though none of them fixes the underlying cause of a crash. Test your chosen solution thoroughly and monitor system logs to catch persistent issues that require manual intervention.