How to implement PHP-FPM health checks and failover in Nginx?

To implement PHP-FPM health checks and failover in Nginx, you can follow these steps:

  1. Enable the PHP-FPM ping endpoint: In the pool configuration file (for example www.conf), set ping.path and ping.response. PHP-FPM then answers requests for that path itself, without running any PHP script. Apply the same settings to every pool you want to check. Note that PHP-FPM speaks FastCGI, not HTTP, so the endpoint is reachable only through a FastCGI client such as Nginx's fastcgi_pass. For example:
     ; pool configuration, e.g. www.conf
     ping.path = /ping
     ping.response = pong
  2. Configure Nginx upstream with failover: In your Nginx configuration, define an upstream block that lists your PHP-FPM servers. Add the backup parameter to the secondary server, not the primary; Nginx sends requests to a backup server only when the primary servers are unavailable. For example:
     upstream php-fpm {
         server 127.0.0.1:9000;          # primary
         server 127.0.0.1:9001 backup;   # used only when the primary is unavailable
     }
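  3. Add health check to Nginx configuration: In the server block that passes requests to PHP-FPM, add a location block that exposes the health check from step 1. Use fastcgi_pass rather than proxy_pass, because PHP-FPM does not accept HTTP. For example:
     location /ping {
         access_log off;
         include fastcgi_params;
         fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
         fastcgi_pass php-fpm;
         fastcgi_next_upstream error timeout invalid_header http_500;
         fastcgi_connect_timeout 3s;   # Set appropriate timeout
         fastcgi_read_timeout 3s;      # Set appropriate timeout
     }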
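  4. Configure failover for PHP-FPM: In the same server block, use the fastcgi_pass directive to forward PHP requests to the PHP-FPM upstream. The fastcgi_next_upstream directive tells Nginx which failures should cause the request to be retried on the next server, which here is the backup. Redirecting 502 errors to /ping with error_page is not needed for failover and would answer failed requests with the health check response, hiding the outage. For example:
     location ~ \.php$ {
         try_files $uri =404;
         include fastcgi_params;
         fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
         fastcgi_pass php-fpm;
         fastcgi_next_upstream error timeout invalid_header http_500;
         fastcgi_connect_timeout 3s;   # Set appropriate timeout
     }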
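  5. Test the configuration and reload Nginx: Run nginx -t to validate the configuration, then reload Nginx (for example with nginx -s reload) to apply it. Restart or reload PHP-FPM as well if you changed its pool configuration. Verify that the health check works by requesting the /ping endpoint, which should return the configured ping response (pong by default). To verify failover, stop the primary PHP-FPM server and confirm that requests still succeed; the failed upstream attempts are recorded in the Nginx error log.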

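With these steps, Nginx fails over to the backup PHP-FPM server when the primary server becomes unresponsive. Note that open source Nginx does not probe upstream servers periodically. It relies on passive checks: when real requests to a server fail, Nginx marks that server unavailable for a while and then tries it again. The /ping endpoint is there for you or an external monitoring tool to poll. Active, periodic health checks are a feature of the commercial NGINX Plus.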
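
How quickly open source Nginx stops using a failing server is controlled by the max_fails and fail_timeout parameters of the server directive, which default to 1 and 10 seconds. The following is a minimal sketch, assuming the same two addresses as in the upstream example; the threshold values are illustrative, not recommendations:

```nginx
upstream php-fpm {
    # Assumed thresholds: after 3 failed attempts within 10 seconds,
    # treat the primary as unavailable for the next 10 seconds.
    server 127.0.0.1:9000 max_fails=3 fail_timeout=10s;

    # Receives requests only while the primary is considered unavailable.
    server 127.0.0.1:9001 backup;
}
```

What counts as a failed attempt is determined by the fastcgi_next_upstream directive in the location that uses this upstream, so the two settings should be tuned together.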