BGS5 - Module does not always seem to fully restart on EMERG_RST signal
May 29, 2020 - 10:35am
Hi,
I am trying to diagnose a very infrequent crash with our custom designed PCB which uses a BGS5 (Rev 1) module, and I was wondering if anyone could provide some insight on the behaviour of the EMERG_RST pin.
Firstly, a little background info on the issue:
We have the BGS5 connected via a level shifter to a small microprocessor, which acts as an external watchdog. The Java userware on the BGS5 toggles a GPIO connected to the watchdog roughly once a second, to signal that both the Java userware and the BGS5 are working.
If there are no kicks for 10 minutes, the watchdog resets itself and then sends the BGS5 the power-on sequence (as described in the Hardware Interface Document):
- ON initialises to low (would have been low from before reset, so no change in level)
- EMERG_RST is asserted (driven low)
- ON is pulsed high for 200ms and returned to low
- Wait 500ms
- EMERG_RST is released (driven high)
- Module starts
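The sequence above can be written out as a dry-run model, which makes the ordering and timings easy to check. This is only a sketch: the real watchdog firmware is not shown in this thread and would not be written in Java, and the `Board` interface and `RecordingBoard` class are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Dry-run model of the watchdog's power-on sequence; not real firmware. */
class ResetSequenceSketch {

    /** Hypothetical abstraction over the watchdog's two output lines and its timer. */
    interface Board {
        void setOn(boolean high);
        void setEmergRst(boolean high);
        void delayMs(int ms);
    }

    /** Drives the lines in the order and with the timings listed in the post. */
    static void powerOnSequence(Board board) {
        board.setOn(false);       // ON initialises to low
        board.setEmergRst(false); // EMERG_RST asserted (driven low)
        board.setOn(true);        // ON pulsed high...
        board.delayMs(200);       // ...for 200 ms...
        board.setOn(false);       // ...and returned to low
        board.delayMs(500);       // wait 500 ms
        board.setEmergRst(true);  // EMERG_RST released (driven high); module starts
    }

    /** Records each step as text instead of touching hardware. */
    static class RecordingBoard implements Board {
        final List<String> steps = new ArrayList<String>();
        public void setOn(boolean high) { steps.add("ON=" + (high ? "high" : "low")); }
        public void setEmergRst(boolean high) { steps.add("EMERG_RST=" + (high ? "high" : "low")); }
        public void delayMs(int ms) { steps.add("wait " + ms + "ms"); }
    }

    /** Runs the sequence against a recording board and returns the trace. */
    static List<String> trace() {
        RecordingBoard board = new RecordingBoard();
        powerOnSequence(board);
        return board.steps;
    }

    public static void main(String[] args) {
        for (String step : trace()) {
            System.out.println(step);
        }
    }
}
```

Keeping the pin access behind an interface means the same sequence can be compared step by step against a scope capture of ON and EMERG_RST.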
The issue is that occasionally we can see remotely that a system has gone offline, as we stop receiving data server-side. I managed to catch a system in this state a while ago and connected a USB lead and an oscilloscope to try to see what was happening, although I'd missed whatever caused the crash in the first place. What I observed:
- On the AT interface (USB0) I could see ^SYSSTART, ^SYSLOADING and +PBREADY repeating every 10 minutes, which would suggest the watchdog is working and the module is trying to restart.
- I connected the oscilloscope to ON and EMERG_RST and could see they were being driven as described above, which confirms the watchdog is triggering.
- Sometimes - maybe 1 in 3 restarts - the following error appears on STDOUT (USB1): "java.lang.IllegalStateException: no factory available", with a stack trace starting with " com.cinterion.jrc.JRC_Factory.a(), bci=12"
- I can see our Java userware attempting to start on STDOUT (USB1). It prints the first line of output, very slowly (character by character) as if the CPU is under heavy load.
- The userware then seems to freeze. I believe it's at the point where it tries to open an ATCommand instance.
- A minute or so later, a ^SYSINFO: 201 URC appears on AT Interface (USB0) and "MIDlet:com.cinterion.jrc.JRC_Midlet abnormal exit" appears on STDOUT (USB1).
- The module then reboots again and the loop starts over.
If I physically unplug the module and then power it on again, everything returns to normal.
When I manually test the watchdog's reset by disabling the kick signal from the BGS5, it resets correctly: the module reboots and resumes as normal every 10 minutes, so I believe the watchdog's reset process itself works.
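For reference, the timeout behaviour being tested here can be modelled in a few lines. This is a minimal sketch of the logic as described in this post, not the actual watchdog code; the class and method names are hypothetical, and time is passed in as a parameter so the model can be exercised without waiting.

```java
/** Model of an external watchdog that fires when no kick arrives within a timeout. */
class KickTimeoutSketch {

    /** 10 minutes without a kick triggers a reset, as described in the post. */
    static final long TIMEOUT_MS = 10L * 60L * 1000L;

    private long windowStartMs;

    KickTimeoutSketch(long startMs) {
        windowStartMs = startMs;
    }

    /** Called on each GPIO toggle received from the userware. */
    void kick(long nowMs) {
        windowStartMs = nowMs;
    }

    /** True once a full timeout has elapsed since the last kick or restart. */
    boolean shouldReset(long nowMs) {
        return nowMs - windowStartMs >= TIMEOUT_MS;
    }

    /** Called after the reset sequence has been sent, to open a new window. */
    void restart(long nowMs) {
        windowStartMs = nowMs;
    }

    public static void main(String[] args) {
        KickTimeoutSketch wd = new KickTimeoutSketch(0L);
        wd.kick(1000L); // last kick at t = 1 s, then the signal is disabled
        System.out.println("reset at 9 min?  " + wd.shouldReset(9L * 60L * 1000L));
        System.out.println("reset at 11 min? " + wd.shouldReset(11L * 60L * 1000L));
    }
}
```

With the kick signal disabled, this model fires once per timeout window, which matches the module rebooting every 10 minutes in the manual test.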
The other issue is that it is near impossible to reproduce this on demand, as the shortest time I've seen it take to start happening from power on is about 6 weeks. A couple more systems crashed after 3 months or so. The majority have been running fine since around early February, so not every system seems to do it.
With that in mind, what I'd like to ask is:
- Is there anything in the module which isn't cleared by EMERG_RST compared to disconnecting the power? (E.g. AT interfaces which were in use?)
- Could pulsing the ON signal again after the module has already been started - and while EMERG_RST is asserted - cause this? Although I have manually tested the watchdog reset many times and it has always worked, I've made a modified version of the watchdog code on a test system which only pulls EMERG_RST without touching ON, but it'll be months before I'll know if it makes any difference.
- Are there any other factors which could stop the BGS5 module starting up correctly? We have a BLE module connected to ASC0 using a 4 wire serial connection with RTS and CTS, and some systems also have an optional WiFi module using a 2 wire serial interface to ASC1. Both modules are held in reset until released by the Java userware (they are connected to two GPIOs which are by default pulled down by the BGS5 during power up), but if one of them somehow sent spurious data during the BGS5's bootup, might this cause the module to be unable to start up properly?
Module info:
ATI1
Cinterion
BGS5
REVISION 01.100
A-REVISION 00.001.36
AT^SCFG?
^SCFG: "Call/ECC","0"
^SCFG: "GPRS/AutoAttach","enabled"
^SCFG: "Gpio/mode/ASC1","std"
^SCFG: "Gpio/mode/DAI","gpio"
^SCFG: "Gpio/mode/DCD0","gpio"
^SCFG: "Gpio/mode/DSR0","gpio"
^SCFG: "Gpio/mode/DTR0","gpio"
^SCFG: "Gpio/mode/FSR","gpio"
^SCFG: "Gpio/mode/PULSE","gpio"
^SCFG: "Gpio/mode/PWM","gpio"
^SCFG: "Gpio/mode/RING0","gpio"
^SCFG: "Gpio/mode/SPI","rsv"
^SCFG: "Gpio/mode/SYNC","gpio"
^SCFG: "Ident/Manufacturer","Cinterion"
^SCFG: "Ident/Product","BGS5"
^SCFG: "MEShutdown/Fso","0"
^SCFG: "MEopMode/SoR","on"
^SCFG: "Radio/Band","15"
^SCFG: "Radio/OutputPowerReduction","4"
^SCFG: "Serial/Interface/Allocation","1","1"
^SCFG: "Serial/USB/DDD","0","0","0409","1E2D","0059","Cinterion Wireless Modules","Cinterion BGx USB Com Port",""
^SCFG: "Tcp/IRT","3"
^SCFG: "Tcp/MR","10"
^SCFG: "Tcp/OT","6000"
^SCFG: "Tcp/WithURCs","on"
^SCFG: "Trace/Syslog/Otap","0"
^SCFG: "URC/Ringline","local"
^SCFG: "URC/Ringline/ActiveTime","2"
^SCFG: "Userware/Autostart","1"
^SCFG: "Userware/Autostart/Delay","0"
^SCFG: "Userware/Passwd",
^SCFG: "Userware/Stdout","usb1",,,,"off"
^SCFG: "Userware/Watchdog","1"
Many thanks,
David
Hello,
From the software perspective EMERG_RST performs a full reset - the whole system is restarted.
According to what I see in the hardware description document, this procedure only requires pulling down the EMERG_RST line; no change of the ON line is required. So you should not toggle ON, but I can't say whether doing so can cause any particular negative effect.
As for the RS232 interfaces, the documentation states that no data must be sent over the ASC0 interface before the interface is active and ready to receive data. No AT commands should be executed before the ^SYSSTART URC.
As for your observations: the ^SYSLOADING, ^SYSSTART and +PBREADY URCs indicate a proper system start. com.cinterion.jrc.JRC_Factory.a(), bci=12 is an exception thrown by the JRC MIDlet, and ^SYSINFO: 201 confirms that JRC was started but did not manage to fully initialise itself within a certain timeout. The module should restart 5 seconds after this URC.
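When reviewing a long capture from the AT interface, it can help to count these URCs automatically. The following is a minimal sketch that assumes the log has one URC per line with exactly the text quoted in this thread; the class name, the `Summary` type and the sample lines in `main` are made up for illustration, and the code is plain desktop Java for offline analysis rather than something to run on the module.

```java
import java.util.Arrays;
import java.util.List;

/** Counts boot-related URCs in a captured AT-interface log. */
class UrcLogSketch {

    /** Tallies for one captured log. */
    static class Summary {
        int starts;          // ^SYSSTART seen: the system came up
        int jrcInitTimeouts; // ^SYSINFO: 201 seen: JRC started but did not finish initialising
    }

    /** Walks the log line by line and counts the two URCs of interest. */
    static Summary summarise(List<String> lines) {
        Summary s = new Summary();
        for (String raw : lines) {
            String line = raw.trim(); // tolerate stray whitespace around the URC
            if (line.equals("^SYSSTART")) {
                s.starts++;
            } else if (line.equals("^SYSINFO: 201")) {
                s.jrcInitTimeouts++;
            }
        }
        return s;
    }

    public static void main(String[] args) {
        // Made-up sample: one failed start followed by one clean start.
        List<String> log = Arrays.asList(
            "^SYSLOADING", "^SYSSTART", "+PBREADY", "^SYSINFO: 201",
            "^SYSLOADING", "^SYSSTART", "+PBREADY");
        Summary s = summarise(log);
        System.out.println("starts=" + s.starts + " jrcInitTimeouts=" + s.jrcInitTimeouts);
    }
}
```

A log where the number of ^SYSINFO: 201 lines tracks the number of ^SYSSTART lines would point at the JRC failing on every boot rather than only occasionally.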
It's a pity that we don't know what caused this sequence. And it is strange that powering off fixed the issue while without that the app was failing to kick the HW watchdog. That would probably suggest some hardware issue.
On the other hand, the very slow output could indicate that the system is overloaded at startup. Here you could verify that your MIDlet has an Oracle-MIDlet-Autostart parameter greater than 1, to make sure it is started by the system after JRC. Additionally, you could try to modify the starting sequence of your app, for example by adding some delay before it initializes hardware interfaces or performs any heavy actions.
Regards,
Bartłomiej
Hi Bartłomiej,
Thanks for the reply.
It could well be that our Java code and the JRC are competing with each other for resources after a reset (the Java code does do a lot of I/O and modem configuration at startup), and maybe that could cause the module to fall into some sort of deadlock?
I checked Oracle-MIDlet-Autostart in the JAD file and it was set to 2.
I'll add a 10 second delay at the start of the userware to allow the JRC time to finish initialising, as you suggest. I'll also amend the hardware watchdog code so that it only pulls EMERG_RST rather than resetting itself and resending a full power on sequence to the BGS5, just in case that is part of the problem.
"As for the RS232 interfaces, the documentation states that no data must be sent over the ASC0 interface before the interface is active and ready to receive data. No AT commands should be executed before the ^SYSSTART URC."
I'm pretty sure it's not happening in this instance (the external BLE and WiFi modules are deliberately held in reset by default in hardware until the Java code releases them via GPIOs when it's ready for them, and I've verified with a scope that their reset lines are indeed low during boot, so there should never be anything sent), but I was curious as to what would happen if some data (or random noise) got on to those data lines during boot. Would it just cause ASC0's receive buffer to overflow and crash or cause memory corruption, or are those bytes passed through to the modem, which then attempts to parse them like an AT command?
Separately, I found an example schematic within the Hardware Interface Document (section 3.2.4.2 - Disconnect BGS5 BATT+ Lines), we'll have a go at implementing that on a test board here as a fallback option.
I'll let you know how it gets on, although it might be several months before I can be sure if it has worked.
Thanks again,
David.
Hello,
If Oracle-MIDlet-Autostart is set to a value greater than 1 (2 in your case) in your application, it is started by the system after JRC. If two apps have the same value, the sequence is undefined and either one could be started first. So here it's fine. But I can imagine that the module may still be doing some initialization in the background after JRC is started, with network registration ongoing and so on. So I think that, given the problems you observe, it could be worth adding some delay (as you did) before the app takes any heavy actions and initializes hardware interfaces.
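The delay discussed here could be structured roughly as follows. This is a sketch only: `Sleeper`, `ThreadSleeper` and `startAfterDelay` are hypothetical names, the heavy initialisation is a placeholder, and the code is written as plain Java so it can be tried off-module; the module's Java ME runtime may need it adapted. The 10 second figure is the one mentioned earlier in this thread.

```java
/** Sketch of delaying heavy start-up work so the JRC can finish initialising first. */
class DelayedStartSketch {

    /** The 10 second start-up delay mentioned earlier in the thread. */
    static final long STARTUP_DELAY_MS = 10000L;

    /** Hypothetical hook for pausing, so a dry run can skip the real wait. */
    interface Sleeper {
        void sleep(long ms);
    }

    /** Sleeper that really blocks the calling thread. */
    static class ThreadSleeper implements Sleeper {
        public void sleep(long ms) {
            try {
                Thread.sleep(ms);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            }
        }
    }

    /** Waits first, then runs the heavy initialisation (I/O, modem configuration). */
    static void startAfterDelay(long delayMs, Sleeper sleeper, Runnable heavyInit) {
        sleeper.sleep(delayMs);
        heavyInit.run();
    }

    public static void main(String[] args) {
        // Dry run: print instead of waiting; real userware would pass a ThreadSleeper.
        startAfterDelay(STARTUP_DELAY_MS,
            new Sleeper() {
                public void sleep(long ms) { System.out.println("would wait " + ms + " ms"); }
            },
            new Runnable() {
                public void run() { System.out.println("opening interfaces and configuring modem"); }
            });
    }
}
```

Putting the wait in one place ahead of all interface setup keeps it easy to tune later if 10 seconds turns out to be too short or unnecessarily long.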
As for ASC0, I don't think there are any particular negative effects that must happen if the interface is used too early. At least I don't know of any. Maybe the rule is more to prevent something unpredictable. In the case of AT commands before ^SYSSTART it is more obvious, as some of the commands are implemented in JRC and ^SYSSTART is displayed by JRC after it starts.
Best regards,
Bartłomiej