How to Set Up Automatic Service Recovery in OpenRC

OpenRC is a lightweight init system used by several Linux distributions like Alpine, Gentoo, and Artix. While it efficiently manages services during normal operations, it doesn’t automatically restart failed services by default. This can lead to unexpected downtime if critical services crash. Let’s explore how to implement automatic service recovery in OpenRC to keep your system running smoothly.

Using OpenRC’s Built-in Respawn Feature

OpenRC ships with supervise-daemon, a built-in supervisor that can automatically restart (respawn) services that exit unexpectedly. This method is simple to set up and doesn’t require additional scripts or tools.

Step 1: Open the service configuration file in /etc/conf.d/. For example, if you want to set up automatic recovery for nginx:

sudo nano /etc/conf.d/nginx

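Step 2: Add the following lines to the file:

supervisor=supervise-daemon
respawn_delay=5
respawn_max=0

The first line hands the service over to supervise-daemon, which is the component that honors the respawn settings; the default start-stop-daemon ignores them. The other two lines tell it to wait 5 seconds before attempting to restart the service, and to keep trying indefinitely (0 means no limit). Note that supervise-daemon expects the daemon to stay in the foreground, so this only works if the service’s init script defines the daemon through the command variables and can run it without forking.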

Step 3: Save the file and exit the editor.

Step 4: Restart the service to apply the changes:

sudo rc-service nginx restart

Now, if nginx crashes or exits unexpectedly, OpenRC will automatically attempt to restart it after a 5-second delay.
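
One way to confirm that recovery actually happens is to stop the process outside of OpenRC and then poll the service status until it reports as started again. The sketch below is only an illustration: wait_for_recovery is a hypothetical helper name, not part of OpenRC, and it simply retries whatever command it is given.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenRC).
# wait_for_recovery ATTEMPTS DELAY COMMAND [ARGS...]
# Runs COMMAND up to ATTEMPTS times, sleeping DELAY seconds after each failure.
# Returns 0 as soon as COMMAND succeeds, 1 if it never does.
wait_for_recovery() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" > /dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, after terminating the nginx process with kill, running wait_for_recovery 10 2 rc-service nginx status would poll for roughly 20 seconds and report through its exit code whether the service came back.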


Creating a Custom Monitoring Script

For more complex monitoring scenarios or when you need to perform additional actions before restarting a service, a custom script can be useful.

Step 1: Create a new script file:

sudo nano /usr/local/bin/service-monitor.sh

Step 2: Add the following content to the script:

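#!/bin/sh
# Poll an OpenRC service and restart it, with a cap on consecutive restarts.
# Uses /bin/sh because bash is not installed by default on some OpenRC
# distributions (Alpine, for example); nothing here needs bash.

SERVICE_NAME="nginx"
MAX_RESTARTS=3          # restarts allowed before giving up
RESTART_INTERVAL=300    # seconds since the last restart after which the counter resets
LOG_FILE="/var/log/service-monitor.log"

restart_count=0
last_restart_time=0

while true; do
    # rc-service exits non-zero when the service is stopped or crashed
    if ! rc-service "$SERVICE_NAME" status > /dev/null 2>&1; then
        current_time=$(date +%s)
        # Reset the counter if the last restart was long enough ago
        if [ $((current_time - last_restart_time)) -ge "$RESTART_INTERVAL" ]; then
            restart_count=0
        fi

        if [ "$restart_count" -lt "$MAX_RESTARTS" ]; then
            echo "$(date): $SERVICE_NAME is down. Attempting restart..." >> "$LOG_FILE"
            rc-service "$SERVICE_NAME" restart
            last_restart_time=$current_time
            restart_count=$((restart_count + 1))
        else
            echo "$(date): $SERVICE_NAME failed to restart $MAX_RESTARTS times. Manual intervention required." >> "$LOG_FILE"
            exit 1
        fi
    fi
    sleep 60
done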

This script checks the service status every minute. If the service is down, it restarts it, allowing up to 3 restarts in a row; the counter resets once 5 minutes have passed since the last restart. After 3 failed attempts, the script logs an error message and exits.
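
To make the restart budget easier to reason about, the same decision can be isolated into a function that takes timestamps as arguments instead of reading the clock. This is a sketch for experimenting with the logic only; restart_decision is a hypothetical name and is not used by the script above.

```shell
#!/bin/sh
# Hypothetical helper mirroring the monitor script's counter logic.
# restart_decision NOW LAST_RESTART COUNT MAX INTERVAL
# Prints "restart N" (N is the updated counter) or "give-up".
restart_decision() {
    now=$1
    last=$2
    count=$3
    max=$4
    interval=$5
    # A gap of at least INTERVAL seconds since the last restart resets the budget.
    if [ $((now - last)) -ge "$interval" ]; then
        count=0
    fi
    if [ "$count" -lt "$max" ]; then
        echo "restart $((count + 1))"
    else
        echo "give-up"
    fi
}
```

With a limit of 3 and an interval of 300 seconds, three failures spaced one minute apart each yield a restart, a fourth yields give-up, and a failure more than five minutes after the last restart starts again at 1.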

Step 3: Make the script executable:

sudo chmod +x /usr/local/bin/service-monitor.sh

Step 4: Create a new OpenRC service for the monitoring script:

sudo nano /etc/init.d/service-monitor

Add the following content:

#!/sbin/openrc-run

name="Service Monitor"
command="/usr/local/bin/service-monitor.sh"
command_background=true
pidfile="/run/service-monitor.pid"

depend() {
    need net
    after nginx
}

Step 5: Make the new service file executable:

sudo chmod +x /etc/init.d/service-monitor

Step 6: Add the monitoring service to the default runlevel:

sudo rc-update add service-monitor default

Step 7: Start the monitoring service:

sudo rc-service service-monitor start

This method provides more flexibility and control over the restart process, allowing you to implement custom logic and logging.
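
Because the monitor writes one line per restart attempt, its log can also be summarized with standard tools. The following sketch counts restart attempts for a service; count_restarts is a hypothetical helper, and it assumes the log lines keep the "is down. Attempting restart" wording used by the script.

```shell
#!/bin/sh
# Hypothetical helper: count restart attempts recorded by the monitor script.
# count_restarts LOGFILE SERVICE
count_restarts() {
    logfile=$1
    service=$2
    # No log file yet means no restarts have been attempted.
    if [ ! -f "$logfile" ]; then
        echo 0
        return 0
    fi
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the function's exit status clean.
    grep -c -F "$service is down. Attempting restart" "$logfile" || true
}
```

A call such as count_restarts /var/log/service-monitor.log nginx prints a single number, which can feed an alert or a periodic report.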


Using Supervisor with OpenRC

For advanced service management and monitoring, Supervisor can be integrated with OpenRC. Supervisor is a process control system that can monitor and automatically restart services.

Step 1: Install Supervisor. On Alpine, for example:

sudo apk add supervisor

Step 2: Create a configuration file for the service you want to monitor:

sudo nano /etc/supervisor.d/nginx.ini

Add the following content:

[program:nginx]
command=/usr/sbin/nginx -g "daemon off;"
autostart=true
autorestart=true
startretries=5
numprocs=1
startsecs=0
process_name=%(program_name)s_%(process_num)02d
stderr_logfile=/var/log/supervisor/%(program_name)s_stderr.log
stderr_logfile_maxbytes=10MB
stdout_logfile=/var/log/supervisor/%(program_name)s_stdout.log
stdout_logfile_maxbytes=10MB

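This configuration tells Supervisor to start nginx in the foreground (daemon off), automatically restart it if it exits, and write its output to log files capped at 10 MB each. Supervisor does not create log directories itself, so make sure /var/log/supervisor exists before starting it.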

Step 3: Create an OpenRC service file for Supervisor (skip this step if your distribution’s package already installed /etc/init.d/supervisord):

sudo nano /etc/init.d/supervisord

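Add the following content. supervisord daemonizes itself by default, so the pidfile path here should match the pidfile setting in /etc/supervisord.conf: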

#!/sbin/openrc-run

name="Supervisor daemon"
command="/usr/bin/supervisord"
command_args="-c /etc/supervisord.conf"
pidfile="/run/supervisord.pid"

depend() {
    need net
}

Step 4: Make the service file executable:

sudo chmod +x /etc/init.d/supervisord

Step 5: Add Supervisor to the default runlevel:

sudo rc-update add supervisord default

Step 6: Start the Supervisor service:

sudo rc-service supervisord start

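Now Supervisor will manage the nginx service, automatically restarting it if it crashes. Because Supervisor starts nginx itself, stop the OpenRC nginx service and remove it from its runlevel (rc-service nginx stop, then rc-update del nginx default) so that the two do not compete for the same ports. You can add more services to Supervisor by creating additional configuration files in /etc/supervisor.d/.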
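
To check on a supervised program from a script, the output of supervisorctl status can be filtered for the state column. The sketch below assumes the usual layout of that output (name, state, then a description) and that, with the process_name pattern above, the process appears as nginx:nginx_00; program_state is a hypothetical helper, and the exact name is worth confirming on your system.

```shell
#!/bin/sh
# Hypothetical helper: read `supervisorctl status` style lines on stdin and
# print the state (second column) of the process whose name matches $1.
program_state() {
    awk -v name="$1" '$1 == name { print $2 }'
}
```

It would be used as supervisorctl status | program_state nginx:nginx_00, which should print RUNNING while the program is up. After adding or changing files in /etc/supervisor.d/, supervisorctl reread followed by supervisorctl update makes Supervisor pick up the changes.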


Implementing automatic service recovery in OpenRC improves system reliability by reducing downtime caused by service failures. Whether you choose the built-in respawn feature, a custom monitoring script, or Supervisor, each method helps keep critical services running, but none of them fixes the underlying cause of a crash. Test your chosen solution thoroughly, and monitor system logs to catch persistent issues that require manual intervention.