Assistance needed getting Alpine onto s905x/s905x-v2

I have most of the work done as I maintain the ‘usable’ Raspberry Pi images, but the s905x/s905x-v2 is just kicking my butt and I’m at a loss as to why.
Here is the complete process currently:

  1. Create a 1GB image file (trust me, it’s enough) and set it up as a loop device
  2. Clone libre-computer-project/libretech-flash-tool from GitHub
  3. Run ./lft.sh bl-flash aml-s905x-cc /dev/loop1
  4. Partition as MBR with an msdos (vfat) partition at +4MB
  5. Add a second primary partition, ext4, at 261MB-100%
  6. Lay down the Alpine minirootfs and various packages in bulk
  7. Fight a lot with grub until it seems to have everything
  8. Result: the blue LED stays on constantly, as if it’s not even finding the Libre u-boot
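Steps 4 and 5 can also be done without a partitioning tool rewriting all of sector 0. Below is a minimal Python sketch, under stated assumptions, that writes only the four MBR partition entries and the 0x55AA signature (bytes 446 to 511), leaving bytes 0 to 445 untouched. The layout (FAT at 4 MiB, ext4 from 261 MiB to the end of a 1 GiB image, 512-byte sectors) reads the MB figures above as MiB, which is an assumption; the 0xEF type for the FAT partition follows the Libre Computer reply further down. Function names are hypothetical, and this does not replace re-running bl-flash after partitioning.

```python
import struct

SECTOR = 512
MIB = 1024 * 1024


def mbr_entry(part_type: int, start_lba: int, num_sectors: int,
              bootable: bool = False) -> bytes:
    """Pack one 16-byte MBR partition entry.

    Linux and u-boot locate MBR partitions by the LBA fields, so the CHS
    fields are filled with the conventional 'beyond CHS range' marker.
    """
    chs = b"\xfe\xff\xff"
    return struct.pack("<B3sB3sII", 0x80 if bootable else 0x00,
                       chs, part_type, chs, start_lba, num_sectors)


def build_table(image_bytes: int) -> bytes:
    """Return the 66 bytes that live at offset 446: four entries + 0x55AA."""
    p1_start = 4 * MIB // SECTOR      # assumed FAT start: 4 MiB (sector 8192)
    p2_start = 261 * MIB // SECTOR    # assumed ext4 start: 261 MiB (sector 534528)
    total = image_bytes // SECTOR
    entries = [
        mbr_entry(0xEF, p1_start, p2_start - p1_start),  # EFI system partition (FAT)
        mbr_entry(0x83, p2_start, total - p2_start),     # Linux (ext4 root)
        bytes(16),                                       # unused entry 3
        bytes(16),                                       # unused entry 4
    ]
    return b"".join(entries) + b"\x55\xaa"


def write_table(path: str, image_bytes: int = 1024 * MIB) -> None:
    """Overwrite bytes 446..511 of an existing image, keeping bytes 0..445."""
    with open(path, "r+b") as f:
        f.seek(446)
        f.write(build_table(image_bytes))
```

For example, create a sparse image first (opening it in "wb" mode and calling truncate(1024 * MIB) is enough), then call write_table on its path. Writing only the table region is the design choice here: anything already in the boot-code area of sector 0 survives.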

I’m at wit’s end on this one, and I’m CLOSE. The whole point of this is to provide users with a trusted, reliable, “just works” Alpine image. I’m at a complete loss as to where I’m going wrong at this point, though. What am I missing here? How can I debug why it apparently isn’t finding the official Libre u-boot bootloader?

Run ./lft.sh bl-flash aml-s905x-cc loop1 after you’ve partitioned. Partitioning tools may wipe out everything from the start of the image.

Ah-ha, it also doesn’t help when your offset is ‘419’ not ‘4196.’ That will definitely wipe out the u-boot. :wink:
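A quick way to catch this class of mistake is to compare the partition's starting byte with the end of the bootloader region. The sketch below assumes 512-byte sectors and that ‘419’ and ‘4196’ are sector offsets. It takes the end of bl33 from the BL2 console log posted later in this thread (src 0x0003c200, size 0x00086a00), treating src as an absolute byte offset on the card. That gives 797,696 bytes (779 KiB), which should be read as a lower bound on the area to keep clear, not the exact size of what bl-flash writes. The function name is hypothetical.

```python
SECTOR = 512

# Assumed from the BL2 log in this thread:
#   "Load bl33 from SD, src: 0x0003c200, ... size: 0x00086a00"
# with src read as an absolute byte offset on the card.
BL33_SRC = 0x0003C200
BL33_SIZE = 0x00086A00
BOOTLOADER_END = BL33_SRC + BL33_SIZE  # 0xC2C00 = 797,696 bytes (779 KiB)


def first_partition_is_safe(start_sector: int, sector_size: int = SECTOR) -> bool:
    """True if a partition starting at start_sector begins past the end of bl33.

    The bl33 end is only a lower bound for the region to keep clear.
    """
    return start_sector * sector_size >= BOOTLOADER_END
```

With these numbers, first_partition_is_safe(419) is False, since sector 419 starts at byte 214,528, well inside the bootloader, while first_partition_is_safe(4196) is True.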

The problem now is that it still seems to fail to detect the EFI partition, and no keyboard works with my test unit. Some fail to get state, some just aren’t detected, and some are detected but don’t work.

Loading Environment from FAT... Unable to read "uboot.env" from mmc1:1... (expected - no uboot.env!)
Error (-2): cannot determine file size (I presume related)
Failed to load 'boot.ini'
starting USB...
(all good here)
scanning bus usb@c9000000 for devices... Failed to get keyboard state from device 05ac:024f 
(good here)
switch to partitions #0, OK
mmc1 is current device
scanning mmc 1:1...
No EFI system partition
No EFI system partition
Failed to persist EFI variables
BootOrder not defined
EFI boot manager: Cannot load any image

Some more tweaking has revealed that if mmcblk0p1 is /boot and mmcblk0p2 is / then grub-install works. When mmcblk0p1 is /boot/efi then it returns an unknown filesystem error. I’ve not seen this behavior before, and without the keyboard working, there’s nothing I can do from the prompt. Any suggestions? This is very, very close.

Failed to get keyboard state from device 05ac:024f

Apple keyboards, what can we say?

Some more tweaking has revealed that if mmcblk0p1 is /boot and mmcblk0p2 is / then grub-install works. When mmcblk0p1 is /boot/efi then it returns an unknown filesystem error.

When you partition mmcblk0, you have to give the FAT partition the ef type and not the standard 07 type.

grub-install should work with p1 as /boot/efi
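To confirm the type byte actually landed on the image or card, a short Python sketch can list the MBR partition entries and flag any with type 0xEF, the MBR type ID for an EFI system partition. A 0x07 or 0x0c in that slot would be consistent with the "No EFI system partition" messages in the log above. It assumes the standard MBR layout (four 16-byte entries at byte 446, signature at 510); the helper names are hypothetical.

```python
import struct

SECTOR = 512
ESP_TYPE = 0xEF  # MBR partition type ID for an EFI system partition


def read_mbr(path: str) -> list[tuple[int, int, int, int]]:
    """Return (index, type, start_lba, num_sectors) for each non-empty MBR entry."""
    with open(path, "rb") as f:
        mbr = f.read(SECTOR)
    if len(mbr) < SECTOR or mbr[510:512] != b"\x55\xaa":
        raise ValueError("sector 0 has no 0x55AA MBR signature")
    parts = []
    for i in range(4):
        off = 446 + 16 * i                                      # 16 bytes per entry
        ptype = mbr[off + 4]                                    # partition type byte
        start, count = struct.unpack_from("<II", mbr, off + 8)  # LBA start, length
        if ptype != 0:                                          # type 0 = unused slot
            parts.append((i + 1, ptype, start, count))
    return parts


def has_esp(path: str) -> bool:
    """True if any MBR entry carries the 0xEF EFI system partition type."""
    return any(ptype == ESP_TYPE for _, ptype, _, _ in read_mbr(path))
```

Point read_mbr at the image file, or at the block device as root, before flashing; has_esp should be True once the FAT partition has the ef type.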

We will create a guide eventually on how to bootstrap your own OS. This is fairly easy for us as we do it day in and day out.

Apple keyboards, what can we say?

That’s not an Apple keyboard; it’s a Keychron K8 in Windows mode, and it’s the only one that gives any explanation as to why it’s not working. (Expected to an extent.) The GMMK Pro doesn’t work. The Costar doesn’t work. A custom VIA keyboard doesn’t work. A generic crashcart mushboard doesn’t work.

No keyboard works at all. So something is broken. Question is, what is it? Because it’s not 5+ different keyboards.

Edit: I should note that this is happening on two different s905x v1.0 boards, and I see the same ‘no keyboard’ behavior with Raspbian and Ubuntu.

@librecomputer OK, at this point, I’ve confirmed that the problem is not me - it is, in fact, the latest u-boot in the CI. aml-s905x-cc-2022-07 is the last working one. Here’s what I went through:

  • Latest Debian and Ubuntu from CI
  • 2 different LePotatos (both v1; one early, one later color-coded header)
  • 8 different C10U3 SD cards from 4 different vendors (64GB PNY, 64GB SanDisk, 16GB SanDisk, 32GB SanDisk)
  • 7 different keyboards (toss-away crashcart style, generic USB 104, Costar CST104, GH60, Keychron K8 as control, EVGA Z20 as control)
  • 3 different HDMI attachments: USB-HDMI adapter, Dell U2415, Samsung Odyssey G9
  • 6 different power supply setups from 2.5A to 7.5A capability

And every single configuration has presented precisely identical results.

Debian finds keyboard and GRUB but keyboard input does not work at all, and after entering boot, the green LED turns solid, the HDMI output turns into a solid green or purple, and the processor quickly gets very hot.

Ubuntu finds keyboard and GRUB but keyboard input does not work at all, and after entering boot, the green LED turns solid, the HDMI output turns into a solid green or purple, and the processor quickly gets very hot.

Naked SD with aml-s905x-cc from July 16, 2023 detects keyboard and has expected loader error, but keyboard input does not work at all. Processor does not get hot.

Alpine Imager with aml-s905x-cc from July 16, 2023 detects keyboard, has unexpected loader error (probably uboot.env or ini), and keyboard input does not work at all. Processor does not get hot.

Alpine Imager with aml-s905x-cc-2022-07? This is the only u-boot that works correctly at all. Keyboard works, manually loading works, boots and runs. Yup. Works flawlessly.

Probing around didn’t reveal any clue as to what the board is doing, other than that I/O isn’t working beyond the SD slot and the CPU locks up when transitioning to the kernel.

Thanks for bringing this to our attention. That was a pre-release CI version. Please test the current version.

Thanks! Looking much better with latest; keyboard works again, no lockup!
Only weird thing I see now is that spi uclass doesn’t seem to be present, and pxe and dhcp are also missing. (But IIRC the s905x’s MAC needs a DTB from tree for that, and I know I have the wrong path there.)

So honestly, at this point all I really need is to sort out the uboot env and boot.ini for Alpine (which, it looks like, needs to be /boot). Is there somewhere all the addresses for the LePotato are listed? I’ve looked and not had much luck finding it. Also, the uboot env has fdt_addr_r and kernel_addr_r both at 0x080008000, which doesn’t seem correct?

Network is not enabled on AML-S905X-CC for security and other reasons.

There’s no SPI on AML-S905X-CC. Only AML-S905X-CC-V2 (different product than V1, not to be confused with revision).

The MAC is set in the DT and it’s passed to Linux even without the network.

S805X shares the same memory layout as S905X.
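Since the question above is whether the load addresses collide, here is a small Python sketch that checks (name, start, size) regions for pairwise overlap and for running past the end of RAM. The example values are assumptions drawn from this thread: the 10,611,147-byte vmlinuz-lts from the later logs, the relocated Image range 0x8200000 to 0xa600000 that u-boot reports, 0x080008000 read as 0x08008000 (where the later log finds the FDT), and a RAM end of 0x80000000 for a 2 GB board (the BL2 log later reports two 1024 MB ranks). It ignores firmware carve-outs such as the BL31 region that same log loads at 0x05100000, so a clean result is necessary, not sufficient.

```python
from itertools import combinations
from typing import NamedTuple


class Region(NamedTuple):
    name: str
    start: int
    size: int

    @property
    def end(self) -> int:
        return self.start + self.size  # exclusive end


def problems(regions: list[Region], ram_end: int) -> list[str]:
    """Report regions that run past ram_end or overlap each other (pairwise)."""
    found = []
    for r in regions:
        if r.end > ram_end:
            found.append(f"{r.name} ends at {r.end:#x}, past RAM end {ram_end:#x}")
    for a, b in combinations(regions, 2):
        if a.start < b.end and b.start < a.end:
            found.append(f"{a.name} [{a.start:#x}, {a.end:#x}) overlaps "
                         f"{b.name} [{b.start:#x}, {b.end:#x})")
    return found
```

Passing identical kernel_addr_r and fdt_addr_r values yields an overlap report, and a region at 0x80800000 (the Armbian load address mentioned later) is flagged as past the end of a 2 GB board's RAM. The DTB size used when testing this (0x10000) is a placeholder.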

Okay, at this point, I seem to be hitting a brick wall. I should, in theory, have a bootable image. I have a boot.cmd, I have a fat32 with 0xef at /boot, I have a known good u-boot, a known good kernel, a known good DTB, and my HDMI interface isn’t a broken one (it’s just a MacroSilicon USB.)
Here’s the boot.cmd

But it stubbornly refuses to work. u-boot fails to find any bootflow at mmc@74000.bootdev or mmc@72000.bootdev. ext4ls works on mmc 1:2, fatinfo on 1:1. Manual booting is also not working, but for a very different reason.

... various load commands ...
=> load mmc 1:1 $ramdisk_addr_r /initramfs-lts
23870963 bytes read in 1039 ms (21.9MiB/s)
=> booti $kernel_addr_r $ramdisk_addr_r $fdt_addr_r
   Uncompressing Kernel Image
Moving Image from 0x8080000 to 0x8200000, end=a600000
Wrong Ramdisk Image Format
Ramdisk image is corrupt or invalid

Except this is a known good initramfs-lts; it’s the same one I use on rpi without issue. Just a standard gzip image with everything in it. When I bypass the initramfs by passing - to booti and use the dtb from Libre’s CI with 2023.07+ (Jul 22 2023), I get this instead:

=> booti $kernel_addr_r - $fdt_addr_r
    Uncompressing Kernel Image
Moving Image from 0x8080000 to 0x8200000, end=a600000
## Flattened Device Tree blob at 08008000
    Booting using the fdt blob at 0x8008000
Working FDT set to 8008000
    Loading Device Tree to 00000007ae8e000, end 00000007ae9b177 ... OK
Working FDT set to 7ae8e000

Starting kernel ...

… and then just a hard lock. Even the reset button doesn’t work. When I try the 2022-07 u-boot with the same booti command, I get:

Moving Image from 0x8080000 to 0x8200000, end=a600000
## Flattened Device Tree blob at 08008000
    Booting using the fdt blob at 0x08008000
    Loading Device Tree to 00000007be5800, end 00000007be621a7 ... OK

Starting kernel ...

"Synchronous Abort" handler, esr 0x020000000
elr: ffffffff8c99f770 lr : 0000000000108d5a0 (reloc)

Comparing Armbian 6.1.30 (non-working, because they built a flat ext4 image) with Alpine 6.1.42, the only thing “missing” is:

lib/modules/6.1.30-meson64/kernel/drivers/phy/amlogic:
-rw-r--r--    1 root     root         11992 Jul 29 13:48 phy-meson-axg-mipi-dphy.ko
-rw-r--r--    1 root     root          9192 Jul 29 13:48 phy-meson-g12a-mipi-dphy-analog.ko

So realistically, it’s missing nothing. Going through the bootm flow instead results in a crash at bootm fdt. But both DTBs do pass bootefi selftest $fdt_addr_r (aml-s905x-cc.dtb and dtbs-lts/amlogic/meson-gxl-s905x-libretech-cc.dtb) with 1 expected failure.
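The module comparison above can be automated. This Python sketch (hypothetical function names) collects module base names under two lib/modules/<version> trees, stripping .ko and any compression suffix such as .ko.xz or .ko.zst, and reports what each side has that the other lacks. Matching by name rather than by path is a design choice so that directory moves between kernel versions don't show up as differences. It also means a driver built in (=y) on one side looks "missing" there, so an empty diff does not prove the kernel configs are equivalent.

```python
from pathlib import Path


def module_names(tree: str) -> set[str]:
    """Base names of kernel modules under tree, e.g. 'phy-meson-axg-mipi-dphy'.

    Matches foo.ko as well as compressed foo.ko.gz / foo.ko.xz / foo.ko.zst.
    """
    return {p.name.split(".ko", 1)[0]
            for p in Path(tree).rglob("*.ko*") if p.is_file()}


def diff_modules(tree_a: str, tree_b: str) -> tuple[set[str], set[str]]:
    """Return (names only in tree_a, names only in tree_b)."""
    a, b = module_names(tree_a), module_names(tree_b)
    return a - b, b - a
```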

At this point, I’m completely stumped.

  1. Don’t use Armbian’s boot.cmd as a starting point. It’s a giant non-portable hack and very prone to problems.

  2. Just manually boot via a small boot.scr script like this:

load mmc 1 $kernel_addr_r KERNEL_FILE
load mmc 1 $ramdisk_addr_r INITRAMFS_FILE
bootm $kernel_addr_r $ramdisk_addr_r $fdtcontroladdr

@librecomputer Armbian’s closer to what the end-state needs to be (since this will cover many BSPs.)
But again, even doing that manually has the exact same results.

Latest u-boot results in a hard lock with no output and dead clock.
The previously tested u-boot results in an immediate synchronous abort crash.

Both insist a known and confirmed good gzip initramfs is corrupt or invalid.

=> load mmc 1:1 $kernel_addr_r vmlinuz-lts
10611147 bytes read in 460 ms (22 MiB/s)
=> load mmc 1:1 $ramdisk_addr_r initramfs-lts
23911485 bytes read in 1037 ms (22 MiB/s)
=> bootm $kernel_addr_r $ramdisk_addr_r $fdtcontroladdr
Wrong Image Format for bootm command
ERROR: can't get kernel image!
=> booti $kernel_addr_r $ramdisk_addr_r $fdtcontroladdr
    Uncompressing Kernel Image
Moving Image from 0x80800000 to 0x8200000, end=a660000
Wrong Ramdisk Image Format
Ramdisk image is corrupt or invalid

3d6d8793f710 [/chroot/boot]$ file vmlinuz-lts 
vmlinuz-lts: gzip compressed data, max compression, from Unix, original size modulo 2^32 32324096
3d6d8793f710 [/chroot/boot]$ file initramfs-lts 
initramfs-lts: gzip compressed data, max compression, from Unix, original size modulo 2^32 30432216
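Both files are raw gzip streams with no u-boot header. bootm expects a uImage or FIT, which fits the "Wrong Image Format for bootm command" error above, and per the booti documentation a raw initrd has to be passed as addr:size, which fits "Wrong Ramdisk Image Format". This Python sketch sniffs the first 64 bytes of a file for a few well-known magic numbers (gzip 1f 8b, legacy uImage 0x27051956, FDT/FIT 0xd00dfeed, the arm64 Image "ARM\x64" at offset 0x38, and so on). The list is illustrative, not exhaustive, and the helper names are hypothetical.

```python
def sniff(header: bytes) -> str:
    """Classify a boot file from its leading bytes (illustrative magic list)."""
    if header[:4] == b"\x27\x05\x19\x56":
        return "uImage (legacy u-boot header)"
    if header[:4] == b"\xd0\x0d\xfe\xed":
        return "FDT blob (a .dtb or a FIT image)"
    if header[:2] == b"\x1f\x8b":
        return "gzip (raw, no u-boot header)"
    if header[:4] == b"\x28\xb5\x2f\xfd":
        return "zstd (raw, no u-boot header)"
    if header[:6] == b"\xfd7zXZ\x00":
        return "xz (raw, no u-boot header)"
    if header[:6] == b"070701":
        return "cpio newc (uncompressed initramfs)"
    if header[0x38:0x3C] == b"ARM\x64":
        return "arm64 Image (uncompressed kernel)"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 64 bytes of path and classify them."""
    with open(path, "rb") as f:
        return sniff(f.read(64))
```

Run against vmlinuz-lts and initramfs-lts, both should report raw gzip, matching the file output above.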

Armbian makes a lot of assumptions and does a lot of things incorrectly. For example, it loads the image to 0x80800000, which is beyond usable memory. A lot of Armbian’s out-of-tree changes can be giant hacks and will cause side-band problems if you don’t know what those patches are doing and what their side effects are.

You are using booti incorrectly; see the booti command page in the Das U-Boot documentation.

Armbian makes a lot of assumptions and does a lot of things incorrectly. For example, it loads the image to 0x80800000 which is beyond usable memory. A lot of Armbian’s changes are giant hacks and will cause side-band problems.

Yeah, I noticed their setup was… not good. If you look, my setup explicitly sets all of the addresses correctly (based on the s805x/s905x) and strips out a lot of their extraneous stuff. This is admittedly pretty temporary; it will actually use a board specific env file in final.

You are using booti incorrectly; see the booti command page in the Das U-Boot documentation.

Ah, good catch. But alas, no change at all with that.

With latest u-boot:

=> size mmc 1:1 initramfs-lts
=> load mmc 1:1 $ramdisk_addr_r initramfs-lts
23911485 bytes read in 1037 ms (22 MiB/s)
=> load mmc 1:1 $kernel_addr_r vmlinuz-lts
10611147 bytes read in 460 ms (22 MiB/s)
=> booti $kernel_addr_r $ramdisk_addr_r:$filesize $fdtcontroladdr
    Uncompressing Kernel Image
Moving Image from 0x8080000 to 0x8200000, end=a660000
## Flattened Device Tree blob at 75ea5c20
    Booting using the fdt blob at 0x75ea5c20
Working FDT set to 75ea5c20
    Loading Ramdisk to 737cd000, end 74e9ac3d ... OK
    Loading Device Tree to 00000000737bf000, end 00000000737cc177 ... OK
Working FDT set to 737bf000

Starting kernel ...

<board completely dead>

By completely dead I mean pretty much anything that isn’t voltage is just straight dead. I can’t probe the CPU clock pins, but everything I can probe is just flat. Even the MMC_CLK line has nothing.

With the 2022-07 u-boot:


=> size mmc 1:1 initramfs-lts
=> load mmc 1:1 $ramdisk_addr_r initramfs-lts
23911485 bytes read in 1037 ms (22 MiB/s)
=> load mmc 1:1 $kernel_addr_r vmlinuz-lts
10611147 bytes read in 460 ms (22 MiB/s)
=> booti $kernel_addr_r $ramdisk_addr_r:$filesize $fdtcontroladdr
    Uncompressing Kernel Image
Moving Image from 0x8080000 to 0x8200000, end=a660000
## Flattened Device Tree blob at 7be66dd0
    Booting using the fdt blob at 0x7be66dd0
    Loading Ramdisk to 7b443000, end 7be619cb ... OK
    Loading Device Tree to 000000007b436000, end 000000007b442fc7 ... OK

Starting kernel ...

<CPU lockup?>

Here there are differences; the green LED remains lit, and MMC_CLK has good signal, but nothing on the data lines. There is no signal on GCLK0 but all of the SPI and I2C pins are held at 3.3V steady. No sign at all of clock.
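One detail in the two logs above: $filesize is set by every load, so when the initramfs is loaded first and the kernel second, as in both pasted sequences, $filesize holds the kernel's size by the time booti runs. The 2022-07 log is consistent with that: 0x7be619cb minus 0x7b443000 is 10,611,147 bytes, exactly the kernel's size, not the initramfs's 23,911,485. (The latest-u-boot log shows the full 23,911,485 bytes, so that run may not have executed in the order pasted.) Loading the initramfs last, or saving its size right after loading it (for example into a hypothetical initrd_size variable with setenv), avoids this. This Python sketch just does the arithmetic on a pasted console line; it assumes, consistent with these logs, that the printed end is start plus length. Helper names are hypothetical.

```python
import re

# Matches e.g. "Loading Ramdisk to 7b443000, end 7be619cb ... OK".
_LOADING_RAMDISK = re.compile(
    r"Loading Ramdisk to ([0-9a-fA-F]+), end ([0-9a-fA-F]+)")


def ramdisk_span(console_text: str) -> int:
    """Return the ramdisk length in bytes implied by the 'Loading Ramdisk' line."""
    m = _LOADING_RAMDISK.search(console_text)
    if m is None:
        raise ValueError("no 'Loading Ramdisk' line found")
    start, end = (int(g, 16) for g in m.groups())
    return end - start


def ramdisk_truncated(console_text: str, initramfs_bytes: int) -> bool:
    """True if u-boot handed the kernel fewer bytes than the initramfs file holds."""
    return ramdisk_span(console_text) < initramfs_bytes
```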

Copy the working kernel and initramfs from our images and try it with those. Make sure bootargs are set (check grub.cfg in /boot).

Well, that made it extra weird. (Also note that for the moment, I’m testing using syslinux/extlinux.conf to eliminate grub2 from the variables.)
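For the extlinux.conf route, u-boot's distro boot looks for extlinux/extlinux.conf under the / or /boot/ prefix of a partition and reads LABEL, LINUX, INITRD, FDT and APPEND entries; with INITRD, u-boot works out the ramdisk size itself, and APPEND is where the bootargs go. This Python sketch renders a single-entry file. The paths reuse names from this thread (vmlinuz-lts, initramfs-lts and dtbs-lts/amlogic/meson-gxl-s905x-libretech-cc.dtb, assumed to sit at the root of the FAT partition). The APPEND line is an assumption: console=ttyAML0,115200 is the usual Meson UART console, the root device follows the Raspbian dmesg later in the thread (SD card as mmcblk1), and the Alpine-style modules= list is a placeholder to adjust.

```python
def extlinux_conf(label: str, kernel: str, initrd: str, fdt: str, append: str) -> str:
    """Render a single-entry extlinux.conf for u-boot's extlinux support."""
    return "\n".join([
        f"DEFAULT {label}",
        "",
        f"LABEL {label}",
        f"  LINUX {kernel}",
        f"  INITRD {initrd}",
        f"  FDT {fdt}",
        f"  APPEND {append}",
        "",
    ])


# Example values; the APPEND line is an assumption (root device, modules= list).
CONF = extlinux_conf(
    label="alpine",
    kernel="/vmlinuz-lts",
    initrd="/initramfs-lts",
    fdt="/dtbs-lts/amlogic/meson-gxl-s905x-libretech-cc.dtb",
    append="console=ttyAML0,115200 root=/dev/mmcblk1p2 rootfstype=ext4 "
           "modules=sd-mod,usb-storage,ext4",
)
```

Write CONF to extlinux/extlinux.conf on the FAT partition.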

Using the Armbian 6.1.30 kernel and booti, immediate Synchronous Abort. Honestly can’t say that I’m surprised there. I haven’t had that one working at all.

Using the 2023-05-03 Raspbian Bullseye lite from CI and the 2022-07 u-boot results in similar but different symptoms. Instead of being completely dead, it goes to Starting kernel ... At that point, all three LEDs light for about 1 second, then the MMC data access LED (green) blinks at about 1Hz continuously. However, checking the data lines shows no activity, the reset button doesn’t work, and the GPIO lines are held again.
Unfortunately, it’s not a flaky or cranky SD card. This reproduced with all of them, no matter how slowly or quickly I wrote them. And it’s completely unresponsive to all inputs. I’d say “oh, the board is bad,” but it does exactly the same thing on two s905x’s.
So I decided to try a wildly different tack: an SD card with nothing but u-boot, and everything else on a dead-reliable SanDisk USB drive. That’s the only thing that changed the behavior at all. With the OS on the USB drive, using load usb 0:1..., instead of even attempting anything it just turns all three LEDs on and locks up solid with every kernel.

If you need help with Armbian, please use the Armbian forum. They do too many things differently for us to be able to support it.

Why would you do that? Not sure what you’re trying to do by doing this.

This is not the MMC access LED. It’s a heartbeat LED to indicate Linux has started and is operating correctly.

Again, not sure what you’re trying to accomplish with this. It’s obvious that your kernel and initramfs have an issue booting. It’s best to grab the working kernel and initramfs from our CI images and then try to boot it manually using the script. If it doesn’t boot, the script is bad. If the script does boot, then your custom kernel/initramfs is bad.

Why would you do that? Not sure what you’re trying to do by doing this.

Because we’ve already confirmed this specific u-boot is known working and stable, versus the latest in CI. I’m trying to control variables as much as possible, and since the board has never booted with the fixed CI, I can’t say for certain it’s 100% fixed.

This is not the MMC access LED. It’s a heartbeat LED to indicate Linux has started and is operating correctly.

Which raises a question: why is it showing a heartbeat when it is most definitely not operating? If it were actually executing, there should be output on the HDMI, and I should also see activity on the MMC data lines. It should be attempting to mount root; it does not. It just falls flat on its face.
And just to be extra certain, I have a stack of other boards for testing the generic kernel builds. RPi3, RPi4, Pine A64-LTS, and a Radxa ROCK 3C. (So bcm, Allwinner A64, and RK3566.) Every one of them is able to run the exact same kernel just fine from the same build process when fed the correct DTBs. I also double checked, and their configuration for 6.1.42 checks out.

It’s best to grab the working kernel and initramfs from our CI images and then try to boot it manually using the script. If it doesn’t boot, the script is bad. If the script does boot, then your custom kernel/initramfs is bad.

Except that is exactly what I am saying: the kernel and initramfs from Libre’s CI images are not working.
I specifically grabbed 2023-05-03-raspbian-bullseye-arm64-lite, loop mounted it, copied the complete kernel including System.map to a good u-boot SD, and that kernel produces the exact same behavior.
When I write that exact image as-is, with no changes at all, it behaves exactly the same. The only difference is that it finds grub and executes the kernel load, and then I get a solid cyan or purple screen and everything is locked solid with no heartbeat. On both boards. Sometimes, but not consistently, it will get through the EFI stubs post-kernel and go to a solid yellow screen, still locked solid.

And obviously, with the official image doing largely the same, it’s not simply a kernel issue.

So I had to set aside another project to swipe the FT232 off it (which I’d been putting off, because that project is a real pain to get back to a manageable state). That was illuminating: I was correct. It’s not reaching an operating state.

GXL:BL1:9ac50e:bb16dc;FEAT:ADFC318C:0;POC:0;RCY:0;USB:0;SPI:0;CHK:A7;EMMC:400;N;
no sdio debug board detected                                                 
TE: 1745178                                                                  
                                                                             
BL2 Built : 15:21:18, Aug 28 2019. gxl g1bf2b53 - luan.yuan@droid15-sz       
                                        
set vcck to 1120 mv                     
set vddee to 1000 mv                    
Board ID = 3                            
CPU clk: 1200MHz                        
DQS-corr enabled                        
DDR scramble enabled
DDR3 chl: Rank0+1 @ 912MHz
bist_test rank: 0 1a 03 31 27 13 3b 17 00 2f 2b 14 43 18 01 2f 29 13 40 19 02 3S

Rank0: 1024MB(auto)-2T-13

Rank1: 1024MB(auto)-2T-13
AddrBus test pass!
Load fip header from SD, src: 0x0000c200, des: 0x01400000, size: 0x00004000, pa0
New fip structure!
Load bl30 from SD, src: 0x00010200, des: 0x013c0000, size: 0x0000d600, part: 0
Load bl31 from SD, src: 0x00020200, des: 0x05100000, size: 0x0001b800, part: 0
Load bl33 from SD, src: 0x0003c200, des: 0x01000000, size: 0x00086a00, part: 0
NOTICE:  BL31: v1.3(release):c3714b49be
NOTICE:  BL31: Built : 09:23:36, Jun 20 2023. gxl bl-3.5.0 gc3714b49be - jenkinh
EFI stub: Booting Linux Kernel...
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services...
[BL31]: tee size: 0
... 
usual dmesg stuff
...
[    4.095407] mmc1: new ultra high speed SDR104 SDXC card at address 5048
[    4.097131] mmcblk1: mmc1:5048 SD64G 58.0 GiB 
[    4.103959]  mmcblk1: p1 p2
Starting version 247.3-7+deb11u2
[    4.183012] usb 1-1: new high-speed USB device number 2 using xhci-hcd
[    4.332592] usb 1-1: New USB device found, idVendor=05e3, idProduct=0610, bc8
[    4.335226] usb 1-1: New USB device strings: Mfr=0, Product=1, SerialNumber=0
[    4.342497] usb 1-1: Product: USB2.0 Hub
[    4.403144] hub 1-1:1.0: USB hub found
[    4.403558] hub 1-1:1.0: 4 ports detected
[    4.415441] mmc0: Card stuck being busy! __mmc_poll_for_busy
[    4.516508] meson-vrtc c81000a8.rtc: registered as rtc0
[    4.516584] meson-vrtc c81000a8.rtc: setting system clock to 1970-01-01T00:0)
<halts here>

So after the EFI stub, HDMI falls over, every time, even though it’s a well-known MacroSilicon adapter. And as soon as it hits the RTC, it deadlocks.

And then, Alpine? Alpine’s been booting just fine with booti this whole time aside from a minor DTB issue! I actually had successful bring-up with both 2022-07 and latest u-boot several days ago, except the HDMI output is breaking as soon as the kernel starts booting. So I had a working config that was crashing for a wholly unrelated reason (busybox config mistake on my part.)

The adapter I’m using can also be ruled out; it’s a MacroSilicon MS2109. Is it great? Hell no. But it’s functional and cheap and I’m not admitting how many I own or have misplaced. :wink:

So at this point, we can accurately say the following:

  • Alpine Linux 3.18.2 with linux-lts is able to boot the kernel successfully, but HDMI fails immediately, and the boot fails early due to a distro issue I caused
  • Raspbian (Libre) 2023-05 does NOT boot successfully; HDMI fails either before or immediately after the EFI stub, and then it dies fully when the RTC is initialized. Occurs on both boards.
  • Raspbian (Libre) 2023-05 also intermittently locks up attempting to execute the EFI stubs; this reproduced with multiple SD cards
  • Kernel 6.1.40 (Raspbian) does not have functioning HDMI output, and never makes it past RTC
  • Kernel 6.1.42 (Alpine) does not have functioning HDMI output, but is able to make it past RTC
  • Both show identical results with u-boot 2022-07 or latest CI

So the question really is: why is HDMI breaking here, and why is the Raspbian image dying at RTC or before?

The CI images are tested on a dozen boards continuously. There’s no way it doesn’t work unless there’s a problem with power or MicroSD.

Use the latest in CI. u-boot provides the device tree, which gets updated for our kernel. These two go hand in hand and should not be used separately.

The heartbeat is the kernel heartbeat. If it’s running, the kernel is running code and operating correctly. HDMI output is controlled by the kernel driver, which is a different subsystem. These are confirmed 100% working on all the images released on the distro server. Otherwise they would not be on the distro server.

The distro server images are confirmed working. There’s another problem with your setup if they don’t boot. Either you’re flashing incorrectly or the power is insufficient.

Just because they work on the other boards does not mean it has the necessary configs enabled for a different platform. This assumption is incorrect.
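One way to act on that point is to check the kernel .config directly rather than inferring from other boards. The Python sketch below parses a .config and reports which options from a short list are not set to an acceptable value. The list is an illustrative assumption chosen to match the subsystems in question here (SD/MMC, serial console, HDMI, and the meson-vrtc RTC that appears in the log), not an official requirement set from Libre Computer; the helper names are hypothetical.

```python
# Illustrative list (an assumption, not an official requirement set): options
# covering SD/MMC, the serial console, HDMI and the meson-vrtc RTC on GXL.
REQUIRED = {
    "CONFIG_ARCH_MESON": {"y"},
    "CONFIG_MMC_MESON_GX": {"y", "m"},
    "CONFIG_SERIAL_MESON": {"y"},
    "CONFIG_SERIAL_MESON_CONSOLE": {"y"},
    "CONFIG_DRM_MESON": {"y", "m"},
    "CONFIG_DRM_MESON_DW_HDMI": {"y", "m"},
    "CONFIG_RTC_DRV_MESON_VRTC": {"y", "m"},
}


def parse_config(text: str) -> dict[str, str]:
    """Map CONFIG_* names to values; '# CONFIG_X is not set' becomes 'n'."""
    opts: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            opts[line[2:-len(" is not set")]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            key, _, value = line.partition("=")
            opts[key] = value.strip('"')
    return opts


def missing(text: str, required: dict[str, set[str]] = REQUIRED) -> list[str]:
    """Options from required whose value (absent counts as 'n') is not acceptable."""
    opts = parse_config(text)
    return sorted(k for k, ok in required.items() if opts.get(k, "n") not in ok)
```

Feed missing() the text of the .config the kernel was built from; an empty list only means the listed options are present, not that the kernel is otherwise correct for the board.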

This is not possible.

Provide the full log, we can point out the issue for you.