OneWire frequently drops sensors

With literally the same DS18B20 temperature sensor that worked fine with a BeagleBone when that was the SBC plugged into the test setup, w1_therm keeps dropping it and eventually reattaching it. The logs are just full of [re]attach messages:

# dmesg | grep w1
[   12.622469] w1_master_driver w1_bus_master1: Attaching one wire slave 28.00000f1aca51 crc e4
[   66.437895] w1_master_driver w1_bus_master1: w1_search: max_slave_count 64 reached, will continue next search.
[  211.955823] w1_master_driver w1_bus_master1: Attaching one wire slave 28.00000f1aca51 crc e4
[ 1101.022224] w1_master_driver w1_bus_master1: Attaching one wire slave 28.00000f1aca51 crc e4

And on and on. No, w1_master_slave_count never shows more than 1 slave, so I don’t understand the nonsense about reaching 64. That message has appeared once in 40 days of uptime on a BeagleBone Black running chrony that has logged the temperature the whole time. Also, when I run a simple script to poll the sensor on the La Frite, it gets a value not much more than 10% of the time. The same script, modified only for the different serial number, ran on the BBB for over 900 polls without a single dropout.
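For anyone who wants to reproduce the success-rate measurement, here is a minimal sketch of such a polling script in Python. It is not the exact script used above. It assumes the standard w1_therm sysfs layout, where the slave logged as 28.00000f1aca51 shows up as /sys/bus/w1/devices/28-00000f1aca51/w1_slave, the first line ends in YES when the scratchpad CRC is good, and the second line carries t= in millidegrees C. The function names and the default count and interval are made up for illustration.

```python
import re
import time
from pathlib import Path

# Assumed path: sysfs name derived from the ID in the dmesg output.
SENSOR = Path("/sys/bus/w1/devices/28-00000f1aca51/w1_slave")


def parse_w1_slave(text):
    """Return the temperature in deg C, or None on a bad CRC or malformed output."""
    lines = text.strip().splitlines()
    # w1_therm marks a good scratchpad CRC with a trailing YES on line 1.
    if len(lines) < 2 or not lines[0].rstrip().endswith("YES"):
        return None
    match = re.search(r"t=(-?\d+)", lines[1])
    return int(match.group(1)) / 1000.0 if match else None


def poll(path=SENSOR, count=100, interval=1.0):
    """Read the sensor `count` times; return (successful_reads, count)."""
    ok = 0
    for _ in range(count):
        try:
            temp = parse_w1_slave(path.read_text())
        except OSError:
            # The sysfs node disappears while the slave is detached.
            temp = None
        if temp is not None:
            ok += 1
        time.sleep(interval)
    return ok, count
```

Calling `poll(count=900)` on each board and comparing the two ratios gives the same kind of comparison as the 10% versus 900-for-900 figures above.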

First, what board? Different boards have different IRQ implementations with different characteristics.

Second, how is the 1-Wire driver selecting the IRQ? Are there logs, and can you post the actual overlay you are using?

This was with a La Frite, and I was using the w1-gpio overlay that came with the overlay tools package (or kernel, or whatever ships it). The La Frite board is no longer set up, as I moved back to one of the Beagles to work on other things.

The boot/w1 driver install lines were trimmed to focus on the problem, but there was not much else - a non-error line about installing the w1 driver/master just before the slave attach at 12.6 seconds. Sorry if that annotation of the source of the selected log lines was misleading.
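To put numbers on how often the sensor reattaches, the attach lines can be pulled out of the dmesg output and the gaps between them computed. This is only a sketch: the regular expression is written against the message format quoted in the first post, and the function names are made up for illustration.

```python
import re

# Matches lines like:
# [   12.622469] w1_master_driver w1_bus_master1: Attaching one wire slave 28.00000f1aca51 crc e4
ATTACH = re.compile(r"\[\s*(\d+\.\d+)\]\s.*Attaching one wire slave (\S+)")


def attach_times(log_text):
    """Map each slave ID to the kernel timestamps (seconds) of its attach events."""
    times = {}
    for line in log_text.splitlines():
        match = ATTACH.search(line)
        if match:
            times.setdefault(match.group(2), []).append(float(match.group(1)))
    return times


def gaps(timestamps):
    """Seconds between consecutive attach events."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
```

Feeding it the output of `dmesg | grep w1` gives one list of timestamps per slave ID. A healthy bus should show a single attach at boot and no gaps at all.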

Okay, I’d forgotten some details but I don’t believe they change anything. With the stock w1-gpio, there was a warning message at module load time to the effect that open-collector had been forced, please fix that in the DT. So I did that (edited w1-gpio.dts, rebuilt), but it made no difference other than silencing that warning.

La Frite is based on Amlogic’s GXL, where GPIO interrupts are muxed down to 8 triggers.

https://www.kernel.org/doc/Documentation/devicetree/bindings/interrupt-controller/amlogic%2Cmeson-gpio-intc.txt

The w1-gpio bindings are as follows:

https://www.kernel.org/doc/Documentation/devicetree/bindings/w1/w1-gpio.txt

All IRQs must be explicitly created via the device tree and consumed by the driver.

If you are losing messages over the w1-gpio driver, it could be the device-tree setup, the pull-up/pull-down time, or the interrupt handling.

You would need to scope the IO and add a pattern generator (manual is fine too) to determine the exact cause of the driver malfunction. Your logs indicate a CRC error, so the communication channel is not meeting some timing spec.

The 1-Wire bus has an external 4k7 pull-up, the device tree is as you provided, the driver is what you ship. I did probe the bus, but with a simple oscilloscope - no fancy protocol analyzer here - I can’t see anything useful.

We do not ship drivers. The driver was written by the person described in the driver file. libretech-wiring-tool is an open source project. Per the file headers, we did not write this.

Any peripheral expansion requires configuration and testing due to electrical and platform characteristics. Please do not assume we provide free engineering for your application.

You do ship the unreliable w1-gpio for the La Frite, compiled, as part of the ldto package (or some such - once again, I’ve set the AML boards aside to work on other things for a while). As for the DT not being yours, well, that may be true for the version for the S905 board you linked, but the one for the S805 seems to be libretech’s own.
https://raw.githubusercontent.com/libre-computer-project/libretech-wiring-tool/master/libre-computer/aml-s805x-ac/dt/w1-gpio.dts

So I don’t think I was asking too much for it to actually work, y’know.

Don’t recall you paying for software or support but happy to point you to the part of GPL license at the top of the file that you missed which states:

NO WARRANTY

11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

Feel free to test and fix the issue since this is open source software or hire a consultancy to help you with your specific application.

Already told you that the problem is due to how the w1-gpio driver grabs the interrupts.