BGS5 - Module does not always seem to fully restart on EMERG_RST signal | Telit Cinterion IoT Developer Community
May 29, 2020 - 10:35am
I am trying to diagnose a very infrequent crash with our custom designed PCB which uses a BGS5 (Rev 1) module, and I was wondering if anyone could provide some insight on the behaviour of the EMERG_RST pin.
Firstly, a little background info on the issue:
We have the BGS5 connected via a level shifter to a small microprocessor, which acts as an external watchdog. The Java userware on the BGS5 toggles a GPIO connected to the watchdog roughly once a second, to signal that both the Java userware and the BGS5 are working.
If there are no kicks for 10 minutes, the watchdog resets, and then sends the BGS5 power on sequence (as described in the Hardware Interface Document):
- ON initialises to low (would have been low from before reset, so no change in level)
- EMERG_RST is asserted (driven low)
- ON is pulsed high for 200ms and returned to low
- Wait 500ms
- EMERG_RST is released (driven high)
- Module starts
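For anyone wanting to compare against their own watchdog, the timeout check and the sequence above can be sketched roughly as follows. This is not our actual firmware: `wd_hal_t`, `wd_expired`, `bgs5_restart` and the `sim_*` helpers are hypothetical names, and the three hooks stand in for whatever pin and delay routines the watchdog microprocessor provides. The simulation HAL only records the calls so the ordering can be checked on a PC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical board hooks: replace with the watchdog MCU's own routines. */
typedef struct {
    void (*set_on)(bool high);         /* drives the BGS5 ON line */
    void (*set_emerg_rst)(bool high);  /* drives EMERG_RST (low = asserted) */
    void (*delay_ms)(uint32_t ms);     /* blocking delay */
} wd_hal_t;

#define WD_TIMEOUT_MS (10u * 60u * 1000u) /* 10 minutes without a kick */
#define ON_PULSE_MS   200u                /* ON high time from the list above */
#define RST_HOLD_MS   500u                /* wait before releasing EMERG_RST */

/* True once the time since the last kick reaches the timeout.
 * Unsigned subtraction keeps this correct across tick-counter wraparound. */
static bool wd_expired(uint32_t now_ms, uint32_t last_kick_ms)
{
    return (uint32_t)(now_ms - last_kick_ms) >= WD_TIMEOUT_MS;
}

/* Power-on sequence as listed above, in the same order. */
static void bgs5_restart(const wd_hal_t *hal)
{
    assert(hal && hal->set_on && hal->set_emerg_rst && hal->delay_ms);
    hal->set_on(false);          /* ON initialises low */
    hal->set_emerg_rst(false);   /* assert EMERG_RST */
    hal->set_on(true);           /* pulse ON high... */
    hal->delay_ms(ON_PULSE_MS);
    hal->set_on(false);          /* ...and return it to low */
    hal->delay_ms(RST_HOLD_MS);
    hal->set_emerg_rst(true);    /* release EMERG_RST, module starts */
}

/* Simulation HAL: records each action instead of touching hardware.
 * sig is 'O' for ON, 'R' for EMERG_RST, 'D' for a delay. */
typedef struct { char sig; int value; } wd_event_t;
static wd_event_t wd_trace[16];
static int wd_trace_len = 0;

static void trace_add(char sig, int value)
{
    if (wd_trace_len < 16)
        wd_trace[wd_trace_len++] = (wd_event_t){ sig, value };
}
static void sim_set_on(bool high)        { trace_add('O', high); }
static void sim_set_emerg_rst(bool high) { trace_add('R', high); }
static void sim_delay_ms(uint32_t ms)    { trace_add('D', (int)ms); }

static const wd_hal_t sim_hal = { sim_set_on, sim_set_emerg_rst, sim_delay_ms };
```

Calling `bgs5_restart(&sim_hal)` fills `wd_trace` with the seven steps in order, which is handy for checking against a scope capture of ON and EMERG_RST.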
The issue we have is that occasionally we can see remotely that a system has gone offline, as we stop receiving data server-side. I managed to catch a system in this state a while ago and connected a USB lead and an oscilloscope to try to see what was happening, although I had missed whatever caused the crash in the first place.
- On the AT interface (USB0) I could see ^SYSSTART, ^SYSLOADING and +PBREADY repeating every 10 minutes, which suggests the watchdog is working and the module is trying to restart.
- I connected the oscilloscope to ON and EMERG_RST and could see they were being driven as described above, which confirms the watchdog is triggering.
- Sometimes - maybe 1 in 3 restarts - the following error appears on STDOUT (USB1): "java.lang.IllegalStateException: no factory available", with a stack trace starting with " com.cinterion.jrc.JRC_Factory.a(), bci=12"
- I can see our Java userware attempting to start on STDOUT (USB1). It prints the first line of output, very slowly (character by character) as if the CPU is under heavy load.
- The userware then seems to freeze. I believe it's at the point where it tries to open an ATCommand instance.
- A minute or so later, a ^SYSINFO: 201 URC appears on AT Interface (USB0) and "MIDlet:com.cinterion.jrc.JRC_Midlet abnormal exit" appears on STDOUT (USB1).
- The module then reboots again and the loop starts over.
If I physically unplug the module and then power it on again, everything returns to normal.
When I manually test the watchdog's reset by disabling the kick signal from the BGS5, it works correctly: the module reboots every 10 minutes and resumes as normal, so I believe the watchdog's reset process itself works.
The other issue is that it is nearly impossible to reproduce this on demand, as the shortest time I've seen it take to start happening from power on is about 6 weeks. A couple more systems crashed after 3 months or so. The majority have been running fine since around early February, so not every system seems to do it.
With that in mind, what I'd like to ask is:
- Is there anything in the module which isn't cleared by EMERG_RST compared to disconnecting the power? (E.g. AT interfaces which were in use?)
- Could pulsing the ON signal again after the module has already been started - and while EMERG_RST is asserted - cause this? Although I have manually tested the watchdog reset many times and it has always worked, I've made a modified version of the watchdog code on a test system which only pulls EMERG_RST without touching ON, but it'll be months before I know if it makes any difference.
- Are there any other factors which could stop the BGS5 module starting up correctly? We have a BLE module connected to ASC0 using a 4 wire serial connection with RTS and CTS, and some systems also have an optional WiFi module using a 2 wire serial interface to ASC1. Both modules are held in reset until released by the Java userware (their reset lines are connected to two GPIOs which are by default pulled down by the BGS5 during power up). But if one of them somehow sends spurious data during the BGS5's bootup, might this cause the module to be unable to start up properly?
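To make the second question concrete, the modified watchdog on the test system does something along these lines: EMERG_RST is pulled low and released, and ON is never touched. This is a sketch only. `rst_hal_t`, `bgs5_reset_only` and the `sim_*` helpers are hypothetical names, and the 500ms hold time is simply carried over from the original sequence as an assumption rather than a figure taken from the Hardware Interface Document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical board hooks. There is deliberately no hook for ON here,
 * so this variant cannot pulse it. */
typedef struct {
    void (*set_emerg_rst)(bool high);  /* drives EMERG_RST (low = asserted) */
    void (*delay_ms)(uint32_t ms);     /* blocking delay */
} rst_hal_t;

/* Assumption: reuse the 500 ms wait from the original sequence. */
#define EMERG_RST_HOLD_MS 500u

/* Reset the module using EMERG_RST alone. */
static void bgs5_reset_only(const rst_hal_t *hal)
{
    assert(hal && hal->set_emerg_rst && hal->delay_ms);
    hal->set_emerg_rst(false);           /* assert EMERG_RST */
    hal->delay_ms(EMERG_RST_HOLD_MS);    /* hold it low */
    hal->set_emerg_rst(true);            /* release, module restarts */
}

/* Simulation HAL: records each action instead of touching hardware.
 * sig is 'R' for EMERG_RST, 'D' for a delay. */
typedef struct { char sig; int value; } rst_event_t;
static rst_event_t rst_trace[8];
static int rst_trace_len = 0;

static void rst_trace_add(char sig, int value)
{
    if (rst_trace_len < 8)
        rst_trace[rst_trace_len++] = (rst_event_t){ sig, value };
}
static void sim_rst_set(bool high)      { rst_trace_add('R', high); }
static void sim_rst_delay(uint32_t ms)  { rst_trace_add('D', (int)ms); }

static const rst_hal_t sim_rst_hal = { sim_rst_set, sim_rst_delay };
```

Calling `bgs5_reset_only(&sim_rst_hal)` records three steps: assert, wait, release.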
^SCFG: "Serial/USB/DDD","0","0","0409","1E2D","0059","Cinterion Wireless Modules","Cinterion BGx USB Com Port",""