Follow-up to this note.
Dragan had asked me to do repeated power-cycle tests with different kernel versions using the patched dtb for #RockPro64 to make sure the kernel #oops wouldn't still be an issue.
I learned that cutting the power of the device could kill the #LPDDR4 #RAM. This is documented in the datasheet referenced on the RockPro64 wiki page (Micron LPDDR4 Mobile LPDDR4 Datasheet), which states on page 37 under Uncontrolled Power-Off:
An uncontrolled power-off sequence can occur a maximum of 400 times over the life of the device.
I had never heard about this before! Cutting power without a shutdown can kill my RAM?
show dmesg and shutdown
To get all the debugging information I needed, I wanted the system to print dmesg to the serial console after booting, wait a short time and then actually shut down.
    root:~# cat /root/bin/dmesg_and_shutdown.sh
    #!/bin/bash
    # a small script that outputs dmesg to serial
    # console, waits 20 seconds and shuts down
    dmesg > /dev/ttyS2
    # show a message how to stop this script and wait 20 seconds
    echo "Will shutdown in 20 seconds - to stop me call 'pkill dmesg_and_shut'" > /dev/ttyS2
    sleep 20
    echo "shutdown -h" > /dev/ttyS2
    shutdown -h now

    # a cronjob that runs after each boot
    root:~# crontab -l
    @reboot /root/bin/dmesg_and_shutdown.sh
powercycle the board
I timed a complete cycle of booting, showing dmesg, waiting and shutting down: well below 2 minutes.
To automate the power cycle I used an #esp8266 based power switch made by #Sonoff (Powr2) running ESP Easy (mega-20210503).
#ESPeasy offers a simple scripting language I used to powercycle after 120 seconds of being switched on:
    On System#Boot do
      gpio,12,0
      gpio,13,1
    endon

    On button#button_state do
      if [blue_led#blue_led_state]=1
        gpio,13,0
        timerSet,1,2
      else
        gpio,13,1
        gpio,12,1
        timerSet,1,0
        timerSet,2,0
      endif
    endon

    On Rules#Timer=1 do
      gpio,12,0
      timerSet,2,1
    endon

    On Rules#Timer=2 do
      gpio,12,1
      timerSet,1,2
    endon
Pressing the button on the Sonoff device toggles between:
- blue led off: timers disabled, relay on permanently
- blue led on: timers switch the relay off for 5 seconds, on for 120 seconds and then repeat
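As a rough sanity check against the datasheet limit quoted earlier, the sketch below estimates how long the switch would need to run before it has cut power 400 times, assuming the 5 seconds off and 120 seconds on described in the list. Only cuts that hit a system which failed to shut down in time count as uncontrolled, so this is a worst-case bound. The function name is made up for this sketch and is not part of ESP Easy.

```shell
#!/bin/bash
# Worst-case estimate: seconds until the relay has cut power MAX_CUTS times.
# seconds_until_cuts OFF_SECONDS ON_SECONDS MAX_CUTS
# (hypothetical helper for illustration only)
seconds_until_cuts() {
    local off=$1 on=$2 max_cuts=$3
    echo $(( (off + on) * max_cuts ))
}

total=$(seconds_until_cuts 5 120 400)
# integer division, so the hours are rounded down
echo "400 cuts are reached after ${total} s, about $(( total / 3600 )) h"
```

So even if every single shutdown failed, the rig could run unattended for roughly half a day before reaching the limit.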
logging
minicom logged the serial output to a file.
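A capture file like this can be summarized with a few lines of shell. The sketch below counts how many boots reached the shutdown script and how many kernel oopses were logged. Both marker strings are assumptions: the first is the message echoed by dmesg_and_shutdown.sh, the second is how an arm64 kernel usually announces an oops, and the function name is hypothetical.

```shell
#!/bin/bash
# Summarize a serial capture file.
# summarize_log LOGFILE  (hypothetical helper for illustration only)
summarize_log() {
    local logfile=$1
    local boots oopses
    # assumption: one such line per boot that got as far as the cron script
    boots=$(grep -c "Will shutdown in 20 seconds" "$logfile")
    # assumption: the kernel reports an oops with this prefix
    oopses=$(grep -c "Internal error: Oops" "$logfile")
    echo "completed boots: ${boots}, oopses: ${oopses}"
}
```

Calling `summarize_log` with the path of the capture file prints one summary line; comparing the number of completed boots with the number of power cycles shows how many boots never reached the shutdown script.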
Further down the #RabbitHole I went when looking at the resulting logfile…
Follow-up to this note
Meanwhile Dragan's changes to the dtb file helped my testing setup to boot without eMMC. So I could test booting manually from scsi devices, like on my production system.
I looked for simple instructions on how to do this, found none, and put together the following information.
u-boot on #RockPro64 uses variables written to flash. The important ones for choosing a device/kernel to boot:
    # this u-boot will look for scsi devices only
    boot_targets=scsi
    # it will scan the devices for
    bootcmd=bootflow scan
Since I have a boot.scr in my /boot, the bootmeth seems to be script. The source of that file, boot.cmd, is also available, and from there I extracted the commands to run on the u-boot console to start any kernel/initrd/dtb I could find on disk:
    # initialize pci bus
    pci enum
    # show devices on pci
    pci
    # reset bus and scan for scsi devices
    scsi reset
    # get partition table
    scsi part
    # find boot files
    ls scsi 0:1 /boot
    # load armbian defaults
    load scsi 0:1 0x800800 /boot/armbianEnv.txt
    # replace the xyz on the following line with the
    # filesize output by 'load' above
    env import -t 0x800800 xyz
    # write uuid of partition to variable partuuid
    part uuid scsi 0:1 partuuid
    # arguments passed to the kernel on boot
    setenv bootargs "root=${rootdev} rootwait rootfstype=${rootfstype} ${consoleargs} consoleblank=0 loglevel=${verbosity} ubootpart=${partuuid} usb-storage.quirks=${usbstoragequirks} ${extraargs} ${extraboardargs}"
    # look for available images, initrds and dtbs
    ls scsi 0:1 /boot
    # get the dtb directory, uInitrd and the vmlinuz
    # from the output to use with the following 'load' commands
    load scsi 0:1 0x02080000 /boot/
    load scsi 0:1 0x06000000 /boot/
    load scsi 0:1 0x01f00000 /boot//rockchip/rk3399-rockpro64.dtb
    booti 0x02080000 0x06000000 0x01f00000
At this point I only needed to wait for the Pine64 sata ctrl to arrive to test the current kernel with the same ctrl used in my production system.
So I went back to the fork in the tunnels and took the other way down the #RabbitHole…
Follow-up to this note:
On one of the occasions when the current #Armbian #linux kernel didn't crash but booted on my #RockPro64, I installed #GotoSocial and yes: no more strange error messages running it. This is the only time I remember having to update a linux kernel to make an application work.
Preparations to get the production system running with the newer kernel:
- in case of problems I'd need to know my way back to boot the old kernel: u-boot recap and learning
- test suspected problems with the #sata #pcie ctrl beforehand (get a second Pine64 sata ctrl for the test system)
Meanwhile Dragan had patched the device tree and asked me to check over a few reboots whether this would make a difference or cause any regressions.
Further down the #RabbitHole the tunnel forks…
Follow-up to this note:
I decided to test whether #GotoSocial would work on the #RockPro64 using the current instead of the legacy #Armbian kernel before risking problems upgrading my production #yunohost.
I have a similar testing setup: old mechanical drives instead of SSDs, a different PCIe SATA controller, an additional eMMC.
I need the additional eMMC to boot from because, for some reason, the PCIe SATA ctrl didn't work when u-boot initialized the controller before the kernel.
Installing the current kernel on my testing setup, I found it not working: no console on hdmi, no network. I had forgotten to install the newer dtb (device tree blob) as well.
With the current kernel and dtb from #Armbian installed, the system booted half of the times I switched it on. Asking around on the rock64 chat I met Dragan Simic, who greatly helped me to get further down the #RabbitHole…
On my #yunohost I tried to update #gotosocial from 0.17.4~ynh1 to 0.18.1~ynh1 and the update failed.
GotoSocial just showed some cryptic error messages when started.
@dumpsterqueer@superseriousbusiness.org traced the problem back to a changed lib version used in the new GotoSocial version, and the developer of the lib answered that my #linux kernel was too old.
The old #armbian legacy kernel is running, because … I don't remember.
I need a kernel update for my #rockpro64.
Down the #RabbitHole…