Starting to learn Proxmox. I thought I'd just replace virt-manager on my desktop to get some hands-on experience in an environment I only use for testing.

I failed.

Here is what I found out by digging through documentation, forum posts and code:

  • The DNS lookup for the local IP needs to return a name with a domain part. A bare hostname doesn't seem to be sufficient: hostname -f should return a domain part.
  • The node name (hostname) needs to be the same in various places:
    • /etc/pve/.members (created from corosync.conf)
    • rrd file in /var/lib/rrdcached/db/pve2-node
  • corosync.conf exists twice, in /etc/pve and in /etc/corosync. I haven't found out yet which one is used under which circumstances.
    • /etc/pve/corosync.conf is generated from the sqlite database in /var/lib/pve-cluster/config.db
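
The domain-part requirement from the first bullet can be checked with a few lines of Python. This is only a sketch: socket.getfqdn() resolves the name in a similar but not identical way to hostname -f, and the helper names split_fqdn and has_domain_part are made up for this example.

```python
import socket


def split_fqdn(fqdn):
    """Split a name into (short hostname, domain part); domain may be empty."""
    host, _, domain = fqdn.strip().rstrip(".").partition(".")
    return host, domain


def has_domain_part(fqdn):
    """True if the name has something after the first dot."""
    return split_fqdn(fqdn)[1] != ""


if __name__ == "__main__":
    # Assumption: getfqdn() is close enough to `hostname -f` for a quick check.
    fqdn = socket.getfqdn()
    host, domain = split_fqdn(fqdn)
    print(f"fqdn={fqdn} host={host} domain={domain or '<none>'}")
    if not has_domain_part(fqdn):
        print("no domain part: check /etc/hosts or the nameserver")
```

The short hostname returned here is the part that should match the node name in the other places listed above.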

For debugging I looked at this code to understand where the Status: unknown pop-over on the host comes from.

I added some logging there, which I could trigger from the CLI using pvesh ls nodes.

In the non-working state, the variables used there were filled like this:

member=node1 get_rrd_key=pve2-node/PureBlackSoul status=unknown

The function gets the node name $node shown in the GUI from DNS for the local IP, I suspect. The $members hash seems to be filled with the data from /etc/pve/.members, which is created by corosync.service from (in my case) /etc/corosync/corosync.conf.

The value of $node is used to find the rrd data for the host.
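
To compare the two names outside of the Perl code, something like the following sketch could be used. It assumes that /etc/pve/.members is JSON with a "nodelist" mapping keyed by node name; that layout is an assumption on my side, so check it with cat /etc/pve/.members first. The function names are made up for this example.

```python
import json


def member_names(members_text):
    """Return the node names listed in the (assumed) "nodelist" mapping."""
    data = json.loads(members_text)
    return sorted(data.get("nodelist", {}))


def name_mismatch(node, members_text):
    """True if members are listed but none of them is called `node`."""
    names = member_names(members_text)
    return bool(names) and node not in names


if __name__ == "__main__":
    # Hypothetical content modelled on my broken state: the member is
    # called node1 while the hostname is PureBlackSoul.
    sample = '{"nodename": "PureBlackSoul", "nodelist": {"node1": {"id": 1, "online": 1}}}'
    print(member_names(sample), name_mismatch("PureBlackSoul", sample))
```

On a real host the text would come from reading /etc/pve/.members, which probably needs root.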

The condition for online is:

There needs to be rrd data that can be found using the $node argument to extract_node_stats, and if %$members (/etc/pve/.members) is not empty, it needs to contain a member named $node that is marked online.
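
Restated as a small Python sketch (a paraphrase of the condition as I understand it, not the actual Perl code): is_online is a made-up name, members is assumed to map node names to a dict with an "online" flag, and rrd_keys stands for the set of available rrd keys such as pve2-node/<name>.

```python
def is_online(node, members, rrd_keys):
    """Apply the 'online' condition described above to plain Python data."""
    # 1. There must be rrd data under the key built from the node name.
    if f"pve2-node/{node}" not in rrd_keys:
        return False
    # 2. If there are members at all, the node must be one of them
    #    and must be marked online.
    if members:
        entry = members.get(node)
        return bool(entry and entry.get("online"))
    # No members: the rrd data alone is enough.
    return True


if __name__ == "__main__":
    # Values from my log line: the member is node1, but the rrd key is
    # built from PureBlackSoul, so the lookup in members fails.
    members = {"node1": {"online": 1}}
    rrd_keys = {"pve2-node/PureBlackSoul"}
    print(is_online("PureBlackSoul", members, rrd_keys))
```

With the values from the log line above, the rrd data is found but the member lookup fails, which matches the state I was stuck in.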

In short: if your host shows Status: unknown, dig into the following places:

  • members hash: /etc/corosync/corosync.conf, /etc/pve/corosync.conf, /var/lib/pve-cluster/config.db
  • dns hostname: nslookup <local IP that is not in 127.0.0.0/8 or on interface lo>, nameserver or /etc/hosts, hostname -i to find IPs, hostname -f to check whether there is a domain part
  • rrd files: /var/lib/rrdcached/db/pve2-node/ - a file named after the hostname without the domain should be present
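
The file locations from this checklist can be checked in one go. A minimal sketch, assuming the paths are exactly the ones listed above and that the rrd file is named after the short hostname; the helper names are made up, and reading below /etc/pve may need root.

```python
import os
import socket

RRD_DIR = "/var/lib/rrdcached/db/pve2-node"
CONFIG_FILES = [
    "/etc/corosync/corosync.conf",
    "/etc/pve/corosync.conf",
    "/var/lib/pve-cluster/config.db",
]


def short_name(fqdn):
    """Hostname without the domain part."""
    return fqdn.split(".", 1)[0]


def files_to_check(fqdn):
    """Config files plus the rrd file expected for this host."""
    return CONFIG_FILES + [f"{RRD_DIR}/{short_name(fqdn)}"]


def missing_files(fqdn, exists=os.path.exists):
    """Return the expected files that are not present."""
    return [path for path in files_to_check(fqdn) if not exists(path)]


if __name__ == "__main__":
    for path in missing_files(socket.getfqdn()):
        print("missing:", path)
```

This only reports missing files. Whether the names inside them agree still has to be compared by hand.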
node seems to be offline Proxmox Support Forum