The gateway process can be in three states: not running (crashed), running and responsive (healthy), or running but not responding (hung). A simple pgrep check catches the first case but misses the third. On a 1 GB device with aggressive memory pressure, the gateway occasionally enters a hung state where V8 is stuck in garbage collection or waiting on a blocked I/O operation.
The healthcheck uses a two-phase approach: first check if the process exists, then verify it responds to HTTP requests. Two consecutive HTTP failures trigger a kill + restart.
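As a minimal sketch of how the two probes map onto the three states, the helper below takes the PID string and the HTTP probe's exit status and names the state. The function names `classify_gateway` and `probe_gateway` are hypothetical and not part of the healthcheck script; the process name, port, and endpoint are the ones used in this section.

```shell
# Hypothetical helper: map the two probe results onto the three states.
#   $1 = PID string from pgrep (empty when no process matches)
#   $2 = exit status of the HTTP probe (0 means the gateway answered)
classify_gateway() {
    local pid="$1" http_status="$2"
    if [ -z "$pid" ]; then
        echo "crashed"      # phase 1 failed: no process at all
    elif [ "$http_status" -eq 0 ]; then
        echo "healthy"      # process exists and answers HTTP
    else
        echo "hung"         # process exists but does not answer
    fi
}

# Hypothetical wiring with live probes (same commands the healthcheck uses).
probe_gateway() {
    local pid http_status=0
    pid=$(pgrep -f "openclaw-gateway")
    curl -sf -m 5 http://localhost:9000/api/status > /dev/null || http_status=$?
    classify_gateway "$pid" "$http_status"
}
```

Calling `probe_gateway` on the device prints one of `crashed`, `healthy`, or `hung`. A single probe like this is only a diagnostic; the healthcheck itself still requires two consecutive HTTP failures before it kills anything.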
The healthcheck runs as a cron job every 2 minutes:
#!/data/data/com.termux/files/usr/bin/bash
# healthcheck.sh: 2-phase gateway health verification

PID=$(pgrep -f "openclaw-gateway")

# Phase 1: Process check
if [ -z "$PID" ]; then
    # Not running, start it
    echo "$(date) Gateway not running, starting" >> "$PREFIX/tmp/healthcheck.log"
    start-openclaw
    exit 0
fi

# Phase 2: HTTP check (5s timeout)
if ! curl -sf -m 5 http://localhost:9000/api/status > /dev/null; then
    # First failure: wait and retry
    sleep 5
    if ! curl -sf -m 5 http://localhost:9000/api/status > /dev/null; then
        # Second failure: kill and restart
        echo "$(date) Gateway hung (PID $PID), killing" >> "$PREFIX/tmp/healthcheck.log"
        kill -9 $PID
        sleep 2
        start-openclaw
    fi
fi

Install the cron job:
# Add to crontab (busybox crontab):
$PREFIX/bin/applets/crontab -e
# Add line:
*/2 * * * * $PREFIX/bin/healthcheck.sh

# Check cron is running the healthcheck:
$PREFIX/bin/applets/crontab -l | grep healthcheck
# Expected: */2 * * * * .../healthcheck.sh
# Check healthcheck log:
cat $PREFIX/tmp/healthcheck.log
# Expected: restart events with timestamps (if any)
# Simulate hung gateway (test only):
kill -STOP $(pgrep -f openclaw-gateway) # Pause the process
# Wait 2-4 minutes for cron to detect and restart
tail -n 3 "$PREFIX/tmp/healthcheck.log"
# Expected: "Gateway hung" message

- kill -9 (SIGKILL) is used instead of SIGTERM because a hung process may not respond to graceful signals.
- The curl -m 5 timeout must be shorter than the cron interval. With a 2-minute cron interval and a 5-second timeout, there is ample margin.
- pgrep -f "openclaw-gateway" matches the process title set by process.title = 'openclaw-gateway' in start-openclaw.
- Cron on this setup is the busybox applet at $PREFIX/bin/applets/crond, not the standard /usr/sbin/crond.

| Metric | Before | After |
|---|---|---|
| Hung detection | None (manual SSH) | Auto (2 min) |
| Restart on crash | Manual | Auto |
| False positive rate | N/A | ~0 (two consecutive probes must fail) |
| Max downtime | Hours (until noticed) | ~4 min |
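
The timing margin can be checked with a little arithmetic. The sketch below uses only the values from the script and crontab entry in this section (2-minute cron interval, `curl -m 5`, `sleep 5`, `sleep 2`); the function names are hypothetical, and the time `start-openclaw` needs to bring the gateway back up is not known here, so it is left out.

```shell
# Values taken from healthcheck.sh and the crontab entry in this section.
CRON_INTERVAL=120   # seconds between cron runs (*/2 * * * *)
CURL_TIMEOUT=5      # curl -m 5
RETRY_SLEEP=5       # sleep between the two HTTP probes
KILL_SLEEP=2        # sleep after kill -9

# Longest a single healthcheck run can take before it calls start-openclaw:
# first probe times out, wait, second probe times out, kill, wait.
worst_case_run() {
    echo $(( CURL_TIMEOUT + RETRY_SLEEP + CURL_TIMEOUT + KILL_SLEEP ))
}

# Longest time from the gateway hanging to the restart being issued,
# assuming it hangs right after the previous check passed.
worst_case_detection() {
    echo $(( CRON_INTERVAL + $(worst_case_run) ))
}

echo "run: $(worst_case_run)s, detection: $(worst_case_detection)s"
# prints "run: 17s, detection: 137s"
```

A 17-second worst-case run is far below the 120-second interval, so consecutive cron runs should not overlap. The 137-second detection bound sits inside the ~4 min figure in the table; gateway startup time comes on top of it.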