VARA Linux – Keeping the TNC alive

Required Packages

sudo apt install libpcap-dev expect perl

Required systemd service unit for VARA

Breakdown:

  • Monitoring, stopping, and starting VARA is handled by systemd, the process manager on the Linux host. To set up your VARA instance as a systemd service, follow this guide

Required Perl Modules

The killcx tool needs some additional Perl modules, which can be installed with the CPAN tool. If you are asked whether to use a local or sudo install, choose sudo. This makes the modules available system-wide. Otherwise they are saved into your home directory under /.local/share/CPAN, and killcx, which runs as root, will not be able to find modules that were installed only 'locally'.

cpan -T install Net::RawIP Net::Pcap NetPacket::Ethernet
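
Because killcx runs as root, it can be worth confirming that Perl can actually load each module before going further. The following is a minimal sketch, not part of killcx: the function names perl_module_ok and check_killcx_modules are made up for this example, and the check relies only on the standard perl -M<Module> -e 1 idiom.

```shell
# Hypothetical helper: succeeds if the given Perl module can be loaded
# by whichever perl is first in PATH for the current user.
perl_module_ok() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Report the status of each module named in the cpan command above.
# Returns non-zero if at least one module is missing.
check_killcx_modules() {
    local mod
    local missing=0
    for mod in Net::RawIP Net::Pcap NetPacket::Ethernet; do
        if perl_module_ok "$mod"; then
            echo "ok      $mod"
        else
            echo "MISSING $mod"
            missing=1
        fi
    done
    return "$missing"
}
```

To check a single module as root, sudo perl -MNet::RawIP -e 1 performs the same test; no output and a zero exit status mean the module was found.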

Download killcx tool

Sourceforge

Backup Download:

Save to /home/pi

cd /home/pi/bin

unzip ../killcx.zip

sudo ln -s /home/pi/bin/killcx /usr/local/bin

Breakdown:

  • Testing killcx tool
  • killcx must be run as root, so sudo is always required.
  • Syntax: sudo killcx <host>:<port>
  • The tool spoofs packets to force the Linux kernel to close dead/TIME_WAIT connections, which are left hanging by a broken wineserver process.
  • The scripts below use the tool to automate restarting the remote TNC.
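
When testing killcx by hand, a mistyped port is easy to miss because the tool is given a single <host>:<port> string. The sketch below builds that argument and rejects obviously invalid ports first. The helper name killcx_target is an assumption made for this example and is not part of killcx.

```shell
# Hypothetical helper: print a "<host>:<port>" argument for killcx,
# or fail if the host is empty or the port is not a number in 1..65535.
killcx_target() {
    local host="$1"
    local port="$2"
    if [ -z "$host" ]; then
        echo "missing host" >&2
        return 1
    fi
    # Reject empty, non-numeric, or overly long port strings.
    case "$port" in
        ''|*[!0-9]*|??????*)
            echo "invalid port: $port" >&2
            return 1 ;;
    esac
    if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
        echo "port out of range: $port" >&2
        return 1
    fi
    printf '%s:%s\n' "$host" "$port"
}
```

With the address and command port from the crontab examples further down, a manual test would then be: sudo killcx "$(killcx_target 192.168.1.55 8300)"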

Kicktnc.sh Usage

./kicktnc.sh <host> <cmd port> <data port> <BPQ tnc port number> <remote ssh user id>

Breakdown:

The same syntax is used in the crontab entries.

  • <host> – The host IP address of the remote TNC
  • <cmd port> – The port configured in VARA as the command port
  • <data port> – The port displayed as the Data Port in VARA after setting the command port.
  • <BPQ tnc port number> – The PORTNUM= number the TNC is attached to in BPQ
  • <remote ssh user id> – The user on <host> that runs the remote management commands. An SSH key must be defined for this user so that login to <host> works without a password.
  • Permissions: chmod 755 kicktnc.sh
  • Note: If the process is interrupted by the user, a lock file may be left behind, and subsequent attempts to use kicktnc.sh will produce a "Busy" message. To resolve this, ssh to the remote host and manually remove /tmp/varalock, after making sure no kicktnc.sh instance is actually still running.
#!/bin/bash
script="/home/bpq/nodes/pe1rrr/scripts/kicktnc.expect"
user="MyBPQlogin"
pass="myBPQpassword!"
host="localhost"
port="8010"

#=== No changes required below ===

remotehost="$1"
remoteport1="$2"
remoteport2="$3"
tncport="$4"
sshuser="$5"
lockfile=/tmp/varalock

function KillPorts {
	if ! sudo killcx "${remotehost}:${remoteport1}"; then
		CmdError=1
	else
		CmdError=0
	fi
	if ! sudo killcx "${remotehost}:${remoteport2}"; then
		DataError=1
	else
		DataError=0
	fi
}


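# Probe both ports with nc; a failure means the connection was refused or timed out.
# remoteport1 is the command port, remoteport2 is the data port.
function NCPorts {
	if ! nc -w 2 "$remotehost" "$remoteport1"; then
		NCCmdError=1
	else
		NCCmdError=0
	fi
	if ! nc -w 2 "$remotehost" "$remoteport2"; then
		NCDataError=1
	else
		NCDataError=0
	fi
}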

if ! ssh $sshuser@$remotehost "test -e ${lockfile}"; then
	# No lock file on the TNC host, so no other run is in progress
	if ! $script $host $port $user $pass $tncport ; then
		CmdError=1
		echo "Nothing to do"
	else
		echo "Problem..."
		KillPorts
		if [[ $DataError == 1 ]] && [[ $CmdError == 1 ]] ; then
			echo "No TCP connections Alive. Killed TNC. Locking..."
			ssh ${sshuser}@${remotehost} touch $lockfile
			ssh ${sshuser}@${remotehost} sudo service vara restart
			sleep 30
			echo "Removing Lockfile..."
			ssh ${sshuser}@${remotehost} rm $lockfile
		else
			echo "Attempting to Poke Ports"
			NCPorts
			if [[ $NCDataError == 1 ]] && [[ $NCCmdError == 1 ]] ; then
				echo "Connections refused... TNC is unalive"
				ssh ${sshuser}@${remotehost} touch $lockfile
				ssh ${sshuser}@${remotehost} sudo service vara restart
				sleep 30
				echo "Removing Lockfile..."
				ssh ${sshuser}@${remotehost} rm $lockfile
			fi
		fi 
	fi
else
	echo "Busy"
fi
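
The note on a leftover /tmp/varalock can be turned into a small check that tells a stale lock from a live one by its age. This is only a sketch under assumptions: the helper name lock_is_stale and the age threshold are not part of kicktnc.sh, and the function has to run on the TNC host, where the lock file lives.

```shell
# Hypothetical helper: succeed if FILE exists and was last modified more
# than MAX_AGE seconds ago. NOW (epoch seconds) is optional and defaults
# to the current time; it exists mainly to make the check easy to test.
lock_is_stale() {
    local file="$1"
    local max_age="$2"
    local now="${3:-$(date +%s)}"
    local mtime
    [ -e "$file" ] || return 1
    # GNU stat uses -c %Y for the modification time; BSD stat uses -f %m.
    mtime="$(stat -c %Y "$file" 2>/dev/null)" \
        || mtime="$(stat -f %m "$file" 2>/dev/null)" \
        || return 1
    [ "$(( now - mtime ))" -gt "$max_age" ]
}
```

For example, lock_is_stale /tmp/varalock 300 && rm /tmp/varalock removes the lock only if it is older than five minutes. The script normally holds the lock for a little over 30 seconds, so the 300-second threshold is an arbitrary, generous choice.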

Kicktnc.expect

Breakdown:

  • No code changes are required.
  • Store this script in the same directory as kicktnc.sh
  • Permissions: chmod 755 kicktnc.expect
#!/usr/bin/expect 
log_user 0
set timeout 15

set host [lindex $argv 0]
set port [lindex $argv 1]
set login [lindex $argv 2]
set password [lindex $argv 3]
set tncport [lindex $argv 4]

spawn telnet $host $port
set timeout 5

expect "callsign:"
send "$login\r"

expect "password:"
send "$password\r"

expect "Connected," { send "AT $tncport\r" }

# kicktnc.sh treats exit 0 as "TNC needs attention" and exit 1 as "nothing to do"
expect {
	"Ok" { send "D\r\rB\r"; send_user "Nothing to do\r\n"; exit 1 }
	-re {.+(TNC\ Not\ Ready)} { exit 0 }
	-re {.+(is\ in\ use)} { send "B\r"; send_user "Interlocked port in use\r\n"; exit 1 }
	-re {.+(Port\ in\ use)} { send "B\r"; send_user "Port in use\r\n"; exit 1 }
}
expect eof

Crontab Scheduling

Breakdown:

  • The kicktnc.sh script is called from the crontab every 30 seconds. It attempts a login to the node to check whether the TNC is responsive. If it is not, kicktnc.sh resets the TCP connections and then sends restart commands to the process manager on the TNC's host.
  • The crontab scheduler runs the script and if there is a problem detected, a lock file is written to the TNC host so that subsequent crontab scheduled tasks don't interfere until the triggered one has completed. This prevents the TNC from being put into a perpetual restart loop.
  • If the scheduled task is ended abruptly, the remote lock file may have to be removed by hand, otherwise the script will keep returning a "Busy" message.
  • The path to the scripts may vary from system to system. Be sure to change the crontab lines to reflect your own locations, and check that the arguments following the file path match the usage syntax.
  • Edit the crontab while logged into the BPQ host, using crontab -e
  • It may take two or three automatic crontab cycles of kicktnc.sh to restore VARA to working order. In the worst case this leaves VARA temporarily unavailable for 1-2 minutes.
# VARA health Monitor
# Port 7 TNC
* * * * * /home/bpq/nodes/pe1rrr/scripts/kicktnc.sh 192.168.1.55 8300 8301 7 pi > /dev/null 2>&1
* * * * * ( sleep 30; /home/bpq/nodes/pe1rrr/scripts/kicktnc.sh 192.168.1.55 8300 8301 7 pi > /dev/null 2>&1 )

# Port 3 TNC
* * * * * /home/bpq/nodes/pe1rrr/scripts/kicktnc.sh 192.168.1.27 8300 8301 3 pi > /dev/null 2>&1
* * * * * ( sleep 30; /home/bpq/nodes/pe1rrr/scripts/kicktnc.sh 192.168.1.27 8300 8301 3 pi > /dev/null 2>&1 )

This example targets a TNC running on the BPQ host itself and uses a different SSH user ID, bpq:

# Port 23 TNC
* * * * * /home/bpq/nodes/pe1rrr/scripts/kicktnc.sh 192.168.1.42 8300 8301 23 bpq > /dev/null 2>&1
* * * * * ( sleep 30; /home/bpq/nodes/pe1rrr/scripts/kicktnc.sh 192.168.1.42 8300 8301 23 bpq > /dev/null 2>&1 )
  • The TNC ports in the examples above match the VARA ports configured in my BPQ, which are shown below:
RIJEN:PE1RRR-7} Ports
  1 756 AX300 +1100hz             
  2 756 AX300 +2000hz             
  3 756 VARA500 +1500hz           
  4 756 ARDOP500 +1500hz          
  5 G90 AX300 +1100hz             
  6 G90 AX300 +2000hz             
  7 G90 VARA500 +1500hz           
  8 G90 ARDOP500 +1500hz          
 21 703 AX300 1100hz              
 22 703 AX300 +2000hz             
 23 703 VARA500 +1500hz           
 24 703 ARDOP500 +1500hz          
 15 144.8 MHz APRS                
 16 TCP KISS LINK 
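
Cron cannot schedule anything more often than once a minute, which is why each TNC in the crontab gets two entries: one that runs immediately and one that sleeps 30 seconds first. The sketch below prints such a pair for one TNC. The function name kicktnc_cron_lines is made up for this example, and its arguments follow the kicktnc.sh usage syntax, preceded by the script path.

```shell
# Hypothetical helper: print the pair of crontab lines that run
# kicktnc.sh at the top of each minute and again 30 seconds later.
# Arguments: <script path> <host> <cmd port> <data port> <BPQ port> <ssh user>
kicktnc_cron_lines() {
    local cmd="$1 $2 $3 $4 $5 $6 > /dev/null 2>&1"
    printf '* * * * * %s\n' "$cmd"
    printf '* * * * * ( sleep 30; %s )\n' "$cmd"
}
```

Calling kicktnc_cron_lines /home/bpq/nodes/pe1rrr/scripts/kicktnc.sh 192.168.1.55 8300 8301 7 pi prints the two lines for the port 7 TNC, ready to paste into crontab -e.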

Again, the script may need two runs: the first resets any ports that are left open, and the next forces the TNC to restart. Once the initial re-connection is complete, the TNC stays connected for days at a time, with little to no noticeable interruption and without getting stuck after sessions.
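
Why a recovery can span more than one cron cycle is easier to see when the decision flow of kicktnc.sh is written as a single function. The sketch below is a reading of the script, not a replacement for it. The name next_action and the action labels are invented for this example, and every argument is an exit status (0 = success).

```shell
# Hypothetical helper mirroring the decision flow of kicktnc.sh.
#   $1      status of the remote lock file test (0 = lock file present)
#   $2      exit status of kicktnc.expect (0 = "TNC Not Ready")
#   $3, $4  killcx status for the command port and the data port
#   $5, $6  nc status for the command port and the data port
next_action() {
    # Another run holds the lock: do nothing.
    if [ "$1" -eq 0 ]; then echo "busy"; return; fi
    # The node reported the TNC as fine or in use.
    if [ "$2" -ne 0 ]; then echo "nothing-to-do"; return; fi
    # killcx failed on both ports: no connections left to reset.
    if [ "$3" -ne 0 ] && [ "$4" -ne 0 ]; then echo "restart"; return; fi
    # Connections were reset, but both ports now refuse connections.
    if [ "$5" -ne 0 ] && [ "$6" -ne 0 ]; then echo "restart"; return; fi
    # Connections were reset and a port still answers: check again next cycle.
    echo "wait-for-next-cycle"
}
```

A run in which killcx succeeds on at least one port and nc can still connect ends in "wait-for-next-cycle", so the restart only happens on a later run if the TNC is still not ready.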

Activity can be seen in /var/log/syslog while the TNCs are being checked:

Jul  3 22:59:01 gigabox CRON[13115]: (bpq) CMD (( sleep 30; /home/bpq/nodes/pe1rrr/scripts/kicktnc.sh 192.168.1.27 8300 8301 3 pi > /dev/null 2>&1 ))
Jul  3 22:59:01 gigabox CRON[13121]: (bpq) CMD (( sleep 30; /home/bpq/nodes/pe1rrr/scripts/kicktnc.sh 192.168.1.55 8300 8301 7 pi > /dev/null 2>&1 ))
Jul  3 22:59:01 gigabox CRON[13116]: (bpq) CMD (/home/bpq/nodes/pe1rrr/beacons/baconator > /dev/null 2>&1)
Jul  3 22:59:01 gigabox CRON[13119]: (bpq) CMD (sleep 30; /home/bpq/nodes/pe1rrr/rrd/make_graph.sh > /dev/null 2>&1)
Jul  3 22:59:01 gigabox CRON[13122]: (bpq) CMD (/home/bpq/nodes/pe1rrr/scripts/kicktnc.sh 192.168.1.55 8300 8301 7 pi > /dev/null 2>&1)
Jul  3 22:59:01 gigabox CRON[13123]: (bpq) CMD (/home/bpq/nodes/pe1rrr/scripts/kicktnc.sh 192.168.1.27 8300 8301 3 pi > /dev/null 2>&1)
Jul  3 22:59:01 gigabox CRON[13124]: (bpq) CMD (/home/bpq/nodes/pe1rrr/scripts/snmp.sh > /dev/null 2>&1)
Jul  3 22:59:01 gigabox CRON[13118]: (bpq) CMD (( sleep 30; /home/bpq/nodes/pe1rrr/scripts/kicktnc.sh 192.168.1.42 8300 8301 23 bpq > /dev/null 2>&1 ))
Jul  3 22:59:01 gigabox CRON[13131]: (bpq) CMD (/home/bpq/nodes/pe1rrr/scripts/kicktnc.sh 192.168.1.42 8300 8301 23 bpq > /dev/null 2>&1)
Jul  3 22:59:01 gigabox systemd[1]: Started Session 87362 of user bpq.
Jul  3 22:59:01 gigabox systemd[1]: session-87362.scope: Succeeded.
Jul  3 22:59:02 gigabox LINBPQ[23582]: New Disconnect Port 3 Q 0#015
Jul  3 22:59:03 gigabox LINBPQ[23582]: New Disconnect Port 7 Q 0#015
Jul  3 22:59:04 gigabox LINBPQ[23582]: New Disconnect Port 23 Q 0#015
Jul  3 22:59:31 gigabox systemd[1]: Started Session 87363 of user bpq.
Jul  3 22:59:31 gigabox systemd[1]: session-87363.scope: Succeeded.
Jul  3 22:59:32 gigabox LINBPQ[23582]: New Disconnect Port 3 Q 0#015
Jul  3 22:59:33 gigabox LINBPQ[23582]: New Disconnect Port 23 Q 0#015
Jul  3 22:59:34 gigabox LINBPQ[23582]: New Disconnect Port 7 Q 0#015
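
To get a quick overview from a busy syslog, the LINBPQ lines in the excerpt above can be counted per port. This is a minimal sketch that assumes the message format stays exactly as shown ("New Disconnect Port <n>"). The function name count_disconnects is made up for this example.

```shell
# Hypothetical helper: read syslog lines on stdin and print
# "<port> <count>" for each LINBPQ "New Disconnect Port" message,
# sorted by port number.
count_disconnects() {
    awk '/LINBPQ/ && /New Disconnect Port/ {
             # The port number is the field right after the word "Port".
             for (i = 1; i < NF; i++)
                 if ($i == "Port") { n[$(i + 1)]++; break }
         }
         END { for (p in n) print p, n[p] }' | sort -n
}
```

Typical use is count_disconnects < /var/log/syslog, which may need sudo depending on the permissions of the log file. In the excerpt above, a disconnect message for each of the three ports follows every check.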