Troubleshooting
When Mesh Hypervisor hits a snag, this section helps you diagnose and fix common issues. All steps are run from the central orchestration node’s CLI unless noted. For setup details, see Usage.
Prerequisites
Ensure:
- Central node is booted (see Installation).
- You’re logged in (root/toor at the console).
Checking Logs
Start with logs—they’re your first clue:
mesh system logview
This opens lnav in /var/log/pxen/, showing DHCP, PXE, HTTP, and service activity. Scroll with arrows, filter with / (e.g., /error), exit with q.
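For a quick, non-interactive pass over the same directory, plain grep can count matches per file. This is a minimal sketch, not part of the mesh CLI: the function name count_log_matches is hypothetical, and it assumes the files under /var/log/pxen/ are plain text.

```shell
# Count lines matching a pattern in each file under a log directory.
# Hypothetical helper, not a mesh command.
# Usage: count_log_matches <pattern> [log_dir]
count_log_matches() {
    pattern=$1
    # Default to the log directory that mesh system logview opens.
    log_dir=${2:-/var/log/pxen/}
    # -r: recurse, -i: ignore case, -c: print "file:count" for every file.
    # The second grep hides files with zero matches.
    grep -ric -e "$pattern" "$log_dir" | grep -v ':0$'
}
```

Running count_log_matches error prints one file:count line per log file that contains the word, which shows where to start scrolling in lnav.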
Common Log Issues
- DHCP Requests Missing: No nodes booting—check network cables or PXE settings.
- HTTP 403 Errors: Permissions issue on /srv/pxen/http/. Run chmod -R 644 /srv/pxen/http/*. Mode 644 on a directory removes its search bit, so restore 755 on any subdirectories afterwards.
- Kernel Downloaded, Then Stops: APKOVL fetch failed. Verify that the UUID matches in /host0/machines/<folder>/uuid, and check permissions on /srv/pxen/http.
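A recursive chmod 644 also lands on directories, which then can no longer be traversed by the HTTP server. The sketch below sets files and directories separately. The 644/755 split is a common web-root convention assumed here, not something the Mesh Hypervisor documentation prescribes, and fix_http_perms is a hypothetical name.

```shell
# Make files world-readable and keep directories traversable.
# Hypothetical helper; the 644/755 modes are an assumption.
# Usage: fix_http_perms [http_root]
fix_http_perms() {
    http_root=${1:-/srv/pxen/http}
    # Directories need the execute (search) bit to be entered.
    find "$http_root" -type d -exec chmod 755 {} +
    # Regular files only need to be readable.
    find "$http_root" -type f -exec chmod 644 {} +
}
```

Call it with no arguments to act on /srv/pxen/http, or pass another directory to try it on a copy first.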
Node Not Booting
If mesh node info shows no nodes:
- Verify PXE: On the node, ensure BIOS/UEFI is set to network boot.
- Check Logs: In mesh system logview, look for DHCP leases and kernel downloads.
- Test Network: From the central node:
ping <node-ip>
Find <node-ip> in logs (e.g., DHCP lease). No response? Check cables or switches.
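To collect candidate values for <node-ip>, one option is to pull every IPv4-looking string out of the log text. This sketch makes no assumption about the log line format, so it will also pick up the central node's own address and any other dotted quad. list_ipv4 is a hypothetical helper name.

```shell
# Print the unique IPv4-looking strings found on stdin, one per line.
# Hypothetical helper; it matches any dotted quad, not only DHCP leases.
list_ipv4() {
    grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' | sort -u
}

# Example (assumes the logs are plain files directly in /var/log/pxen/):
#   cat /var/log/pxen/* | list_ipv4
# Each address can then be tested with: ping -c 1 <node-ip>
```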
Workload Not Starting
If mesh workload start -n <uuid> -w <name> fails:
- Check Logs: Run mesh system logview and look for QEMU or KVM errors.
- Verify Config: Ensure /var/pxen/monoliths/<name>.conf exists and matches -w <name>; see Configuring Workloads.
- Resources: SSH to the node:
mesh node ctl -n <uuid> "free -m; cat /proc/cpuinfo"
Confirm RAM and CPU suffice (e.g., 500M RAM for qemutest1).
- Restart: Retry:
mesh workload soft-stop -n <uuid> -w <name>
mesh workload start -n <uuid> -w <name>
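The RAM check in the Resources step can be scripted by parsing free -m output. This is a rough sketch: has_free_mem is a hypothetical helper, it reads the "free" column of the Mem: row (a conservative figure that ignores reclaimable cache), and it assumes mesh node ctl passes the remote command's output through unchanged.

```shell
# Read `free -m` output on stdin; succeed if the "free" column of the
# Mem: row is at least the required number of MiB.
# Hypothetical helper, not a mesh command.
# Usage: ... | has_free_mem <required_mb>
has_free_mem() {
    required_mb=$1
    awk -v need="$required_mb" '
        # Column 4 of the Mem: row is "free"; force a numeric comparison.
        /^Mem:/ { found = 1; ok = ($4 + 0 >= need + 0) }
        END { if (found && ok) exit 0; exit 1 }
    '
}

# Example, using the 500M figure given for qemutest1:
#   mesh node ctl -n <uuid> "free -m" | has_free_mem 500 || echo "not enough RAM"
```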
Network Issues
If a node’s IP or VXLAN isn’t working:
- Check IP: On the node:
mesh node ctl -n <uuid> "ip addr"
No static IP? Verify interfaces in manifest; see Network Configuration.
- VXLAN Bridge: Check bridge existence:
mesh node ctl -n <uuid> "ip link show br456"
Missing? Ensure /var/pxen/networks/manage.conf is installed.
- Ping Test: From the node:
mesh node ctl -n <uuid> "ping6 -c 4 fd42:2345:1234:9abc::1"
No reply? Check VXLAN config in /host0/network/.
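The three checks above can be run in one pass. The sketch below only wraps the commands already shown in this section. check_node_network is a hypothetical name, br456 and the fd42: address are the example values from this page, and the failure reporting assumes mesh node ctl exits non-zero when the remote command fails, which is not confirmed here.

```shell
# Run the IP, bridge, and ping checks from this section against one node.
# Hypothetical wrapper around mesh node ctl.
# Usage: check_node_network <uuid>
check_node_network() {
    uuid=$1
    for cmd in \
        "ip addr" \
        "ip link show br456" \
        "ping6 -c 4 fd42:2345:1234:9abc::1"
    do
        echo "== $cmd"
        # Assumption: a failing remote command yields a non-zero exit status.
        mesh node ctl -n "$uuid" "$cmd" || echo "FAILED: $cmd"
    done
}
```

Replace the bridge name and address with the values from your own network configuration before relying on the result.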
Time Sync Problems
If nodes show as offline in mesh node info:
- Check Time: On the node:
mesh node ctl -n <uuid> "date"
Off by hours? Time sync failed.
- Fix Chrony: Ensure ntp_sync group is applied (e.g., in groups file); see Configuring Nodes.
- Restart Chrony: On the node:
mesh node ctl -n <uuid> "rc-service chronyd restart"
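Comparing epoch seconds gives a number for the clock drift instead of an eyeball comparison of date output. This is a minimal sketch: clock_drift is a hypothetical helper, and the usage line assumes mesh node ctl prints only the remote command's output.

```shell
# Print the absolute difference in seconds between two epoch timestamps.
# Hypothetical helper, not a mesh command.
# Usage: clock_drift <remote_epoch> <local_epoch>
clock_drift() {
    remote=$1
    local_now=$2
    delta=$((remote - local_now))
    # Report the magnitude only; the sign does not matter for a sync check.
    if [ "$delta" -lt 0 ]; then
        delta=$((0 - delta))
    fi
    echo "$delta"
}

# Example:
#   clock_drift "$(mesh node ctl -n <uuid> "date +%s")" "$(date +%s)"
```

A result of a few seconds is normal command latency; a result in the thousands matches the "off by hours" symptom described above.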
Notes
- Logs are verbose—most errors trace back to permissions, network, or config mismatches.
- If stuck, rebuild configs with mesh system configure and reboot nodes.
- For manifest tweaks, see Manifest Files; for node control, see Managing Nodes.
Next, explore Advanced Topics.