
#46 WiFi Watchdog

Network
Problem
WiFi drops silently and the gateway can't reach API providers. There is no automatic recovery.
Solution
Cron every 2 min: ping 8.8.8.8, toggle WiFi via svc wifi disable/enable if dead.
Lesson
The Android WiFi stack on the Moto E2 silently disconnects under memory pressure or in Doze mode. A simple ping-and-toggle cron job restores connectivity about 10 seconds after it detects the drop.

Context

The Moto E2 running as a headless PocketClaw gateway depends entirely on WiFi for API access. There is no SIM card, no mobile data fallback. When WiFi drops, the gateway keeps running but every API request fails with ENETUNREACH or ETIMEDOUT, and the Telegram bot goes silent.

WiFi drops happen for several reasons on this device:

  • Android Doze mode (API 23+) saves power by suspending network access and Wi-Fi scans during periods of inactivity. Even with battery optimization disabled for Termux, system-level WiFi management can still trigger disconnections
  • Low memory conditions cause Android's wpa_supplicant or connectivity service to crash and fail to reconnect
  • The Moto E2's Qualcomm WiFi chipset (WCN3620) has known firmware issues with certain access points where it loses association after extended uptime
  • After headless debloat (Hack #30), some network management services are disabled, reducing Android's ability to automatically reconnect

The phone has no screen interaction (headless), so there is nobody to notice the drop and manually toggle WiFi. An automated watchdog is essential.

Implementation

The watchdog script lives in Termux and runs via busybox crond:

#!/data/data/com.termux/files/usr/bin/bash
# wifi-watchdog.sh: detect and recover from WiFi drops
# Runs every 2 minutes via busybox cron

# crond may not export PREFIX, so fall back to the standard Termux prefix
PREFIX="${PREFIX:-/data/data/com.termux/files/usr}"
LOG="$PREFIX/tmp/wifi-watchdog.log"

# Timestamp each entry when it is written, not once at script start
log() {
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" >> "$LOG"
}

# One ping with a 3-second timeout; succeeds only if a reply arrives
online() {
  ping -c 1 -W 3 8.8.8.8 > /dev/null 2>&1
}

# Phase 1: Quick connectivity check
if online; then
  # WiFi is fine, exit silently
  exit 0
fi

# Phase 2: WiFi appears dead. Log and attempt recovery.
log "WiFi dead, toggling..."

# Toggle WiFi off and on using Android's svc command
# On the Moto E2 this works from Termux without root (see Gotchas)
svc wifi disable
sleep 2
svc wifi enable

# Wait for WiFi to reassociate and get DHCP lease
sleep 8

# Phase 3: Verify recovery
if online; then
  log "WiFi recovered after toggle"
  exit 0
fi

# Phase 4: First toggle failed. Retry with longer delays.
log "First toggle failed, retrying with longer delay..."
svc wifi disable
sleep 5
svc wifi enable
sleep 15

if online; then
  log "WiFi recovered after second toggle"
else
  log "WiFi STILL DEAD after two toggles"
fi
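
A single lost packet to 8.8.8.8 is enough to make the script above toggle WiFi. One possible refinement, sketched here rather than taken from the original setup, is to probe several targets and treat the link as dead only when all of them fail. The function name link_is_up, the PROBE_CMD override, and the second target 1.1.1.1 are illustrative assumptions.

```shell
# Sketch: declare the link dead only if every probe target fails.
# PROBE_CMD is a hypothetical hook so the check can be exercised without a network.
PROBE_CMD="${PROBE_CMD:-ping -c 1 -W 3}"

link_is_up() {
  local target
  for target in "$@"; do
    # $PROBE_CMD is intentionally unquoted so its arguments split into words
    if $PROBE_CMD "$target" > /dev/null 2>&1; then
      return 0
    fi
  done
  # No target answered (or no targets were given)
  return 1
}

# Example use inside the watchdog (1.1.1.1 is an assumed second target):
# link_is_up 8.8.8.8 1.1.1.1 && exit 0
```

The trade-off is a longer worst-case check: with two targets and a 3-second timeout each, a dead link takes up to 6 seconds to confirm instead of 3.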

Deploy the script and set up the cron job:

# Save the watchdog script
cat > "$PREFIX/bin/wifi-watchdog.sh" << 'SCRIPT'
#!/data/data/com.termux/files/usr/bin/bash
PREFIX="${PREFIX:-/data/data/com.termux/files/usr}"
LOG="$PREFIX/tmp/wifi-watchdog.log"
# Timestamp each entry when it is written, not once at script start
log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" >> "$LOG"; }
online() { ping -c 1 -W 3 8.8.8.8 > /dev/null 2>&1; }
if online; then
  exit 0
fi
log "WiFi dead, toggling..."
svc wifi disable
sleep 2
svc wifi enable
sleep 8
if online; then
  log "WiFi recovered after toggle"
  exit 0
fi
log "First toggle failed, retrying with longer delay..."
svc wifi disable
sleep 5
svc wifi enable
sleep 15
if online; then
  log "WiFi recovered after second toggle"
else
  log "WiFi STILL DEAD after two toggles"
fi
SCRIPT

chmod +x $PREFIX/bin/wifi-watchdog.sh

# Add cron job via busybox crontab
# busybox applets are at $PREFIX/bin/applets/
$PREFIX/bin/applets/crontab -e
# Add this line (absolute path, because crond's environment may not define $PREFIX):
# */2 * * * * /data/data/com.termux/files/usr/bin/wifi-watchdog.sh

To add the cron entry programmatically:

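# Write crontab entry. Safe to re-run: any existing watchdog line is dropped
# first, and the other entries keep their original order.
CRON_LINE="*/2 * * * * $PREFIX/bin/wifi-watchdog.sh"
($PREFIX/bin/applets/crontab -l 2>/dev/null | grep -vF "wifi-watchdog.sh"; echo "$CRON_LINE") | $PREFIX/bin/applets/crontab -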

# Verify crontab
$PREFIX/bin/applets/crontab -l
# Expected: */2 * * * * /data/data/com.termux/files/usr/bin/wifi-watchdog.sh

The watchdog is started automatically by the boot script (~/.termux/boot/start-pocketclaw.sh):

# Inside start-pocketclaw.sh — start busybox crond for watchdog + other crons
$PREFIX/bin/applets/crond -b -c $PREFIX/var/spool/cron/crontabs
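
If the boot script is ever re-run by hand, the line above starts a second crond and every job fires twice. A minimal guard, assuming the busybox applet shows up under the process name crond, could look like the sketch below. The names crond_running, start_crond_once, and CROND_BIN are hypothetical and not part of the original boot script.

```shell
# Sketch: start crond only when no crond process exists yet.
# PREFIX falls back to the standard Termux prefix if unset.
PREFIX="${PREFIX:-/data/data/com.termux/files/usr}"
CROND_BIN="${CROND_BIN:-$PREFIX/bin/applets/crond}"

crond_running() {
  # -x matches the exact process name, avoiding the broad matches of -f
  pgrep -x crond > /dev/null 2>&1
}

start_crond_once() {
  if crond_running; then
    echo "crond already running"
    return 0
  fi
  "$CROND_BIN" -b -c "$PREFIX/var/spool/cron/crontabs" && echo "crond started"
}

# Usage inside start-pocketclaw.sh:
# start_crond_once
```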

Verification

# Verify crond is running:
pgrep -f crond
# Expected: PID of crond process

# Verify crontab contains the watchdog:
$PREFIX/bin/applets/crontab -l | grep wifi-watchdog
# Expected: */2 * * * * .../wifi-watchdog.sh

# Simulate a WiFi drop (from ADB shell or Termux):
svc wifi disable
# Wait 2 minutes for cron to fire
# Check the log:
tail -5 $PREFIX/tmp/wifi-watchdog.log
# Expected: "[timestamp] WiFi dead, toggling..." followed by recovery

# Test the script manually:
$PREFIX/bin/wifi-watchdog.sh
# If WiFi is up: exits silently (exit 0)
# If WiFi is down: toggles and logs

# Check recovery time:
svc wifi disable && time $PREFIX/bin/wifi-watchdog.sh
# Expected: roughly 10 to 14 seconds (initial ping failure, 2 s + 8 s of sleeps, verification ping)
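
Because the watchdog writes a small set of fixed phrases, the log can also be summarized to see how often drops occur and whether toggles succeed. The helper below is a sketch. The name summarize_watchdog_log is hypothetical, and it assumes the log messages are exactly those written by the script above.

```shell
# Sketch: count drops, recoveries, and unrecovered failures in the watchdog log.
summarize_watchdog_log() {
  local log="$1"
  local drops recovered failed
  # grep -c prints 0 but exits non-zero when nothing matches, so ignore its status
  drops=$(grep -c 'WiFi dead, toggling' "$log" || true)
  recovered=$(grep -c 'WiFi recovered' "$log" || true)
  failed=$(grep -c 'STILL DEAD' "$log" || true)
  echo "drops=$drops recovered=$recovered failed=$failed"
}

# Usage:
# summarize_watchdog_log "$PREFIX/tmp/wifi-watchdog.log"
```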

Gotchas

  • On the Moto E2, svc wifi disable/enable works from Termux (untrusted_app) without root. A command launched from Termux inherits Termux's UID rather than the shell user's, so treat this as device-specific behavior and verify it before relying on it elsewhere
  • The 2-minute cron interval means worst-case detection latency is 2 minutes. During that window, all API requests fail. The healthcheck cron (Hack #54) handles restarting the gateway if it enters a bad state from accumulated timeouts
  • ping -W 3 sets a 3-second reply timeout. On slow or lossy networks, a single late or lost packet registers as a drop and triggers an unnecessary toggle. Increase to -W 5 if the log shows toggles while the network was actually up
  • The sleep 8 after re-enabling WiFi is critical. WiFi association takes ~2 seconds, and DHCP lease acquisition takes ~3-5 seconds on most routers. Without the sleep, the verification ping fires before the phone has an IP address and reports a false failure
  • Log rotation is not built in, so the watchdog log grows without bound. Truncate it periodically, for example with : > $PREFIX/tmp/wifi-watchdog.log from a weekly cron, or keep only recent entries by writing tail -n 100 to a temporary file and moving it back over the log
  • On some Android 6 devices, svc wifi may require the CHANGE_WIFI_STATE permission that Termux doesn't declare in its manifest. On the Moto E2, this works without issues. If it fails on another device, use am broadcast to toggle WiFi instead
  • Do NOT use pkill -f to restart the watchdog. pkill -f with broad patterns can kill SSH sessions and other processes. Kill by specific PID only
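
The tail-based log trimming mentioned above can be sketched as a small function. The name trim_log and the default of 100 lines are illustrative. The one detail that matters is writing to a temporary file first, because redirecting tail straight back into the file it is reading would empty the log.

```shell
# Sketch: keep only the newest lines of a log file (default: 100).
trim_log() {
  local log="$1"
  local max_lines="${2:-100}"
  local tmp="$log.tmp"
  # Nothing to do if the log has not been created yet
  [ -f "$log" ] || return 0
  # Write to a temp file first, then replace the original
  tail -n "$max_lines" "$log" > "$tmp" && mv "$tmp" "$log"
}

# Usage, for example from a weekly cron entry:
# trim_log "$PREFIX/tmp/wifi-watchdog.log" 100
```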

Result

| Metric | Before | After |
|---|---|---|
| WiFi drop detection | Manual (never, headless) | Automatic every 2 min |
| Recovery time | Infinite (nobody notices) | ~10 seconds after detection |
| Recovery method | Manual screen toggle | svc wifi disable/enable |
| Cron interval | N/A | Every 2 minutes |
| False positive rate | N/A | Near zero (3s ping timeout) |
| Downtime per WiFi drop | Hours (until noticed) | 2 min detection + 10s recovery |
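
The downtime figure in the last row is the worst case. As a rough worked example, assuming drops occur at a uniformly random point in the 2-minute cron interval and the first toggle succeeds, the average detection delay is half the interval:

```shell
# Sketch: worst-case and average downtime per drop, in seconds.
interval=120   # cron interval (every 2 minutes)
recovery=10    # approximate toggle + DHCP time when the first toggle works
worst=$((interval + recovery))
average=$((interval / 2 + recovery))
echo "worst=${worst}s average=${average}s"
```

When the first toggle fails and the second succeeds, add roughly 20 more seconds for the longer sleeps in the retry phase.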