ELS61T falling off the grid in remote location | Telit Cinterion IoT Developer Community
February 13, 2022 - 8:25pm
Hello everybody,
I have a serious issue on my hands and would appreciate any helpful insights.
I have a fleet of 50 ELS61T-E2 LAN modules so far, deployed over the past 3-5 months at hard-to-reach locations in a wide-area IoT system.
All the SIM cards are in roaming mode and connect to various mobile operators in multiple countries. Some locations barely have enough signal to manage the connection. All access technologies are enabled (2G, 3G and LTE).
The problem:
In the initial period after installations in the field (from a week to a couple of months), all terminals were working OK, sending sensor data to the central system every minute.
Over the next two months, we lost all communication with more than 10 terminals. They were all working OK up to a certain point, then all comms were cut instantly; the devices just fell off the grid.
These are remote locations and service intervals are scarce, so it took two months to organize a visit to these locations. From the mobile provider side, all that was said was that they are not registered to the network.
A physical inspection, performed last week on ten locations, revealed the same status everywhere:
All terminals were powered on (green LED), but were attempting to register to the network (orange LED blinking each second).
In an attempt to fix the situation, several options were explored:
- power off/on - NO IMPROVEMENT
- SIM card taken out, wiped, reinserted - NO IMPROVEMENT
- a new SIM from the same provider inserted - NO IMPROVEMENT
- terminal replaced with a spare one (using original SIM card) - NORMAL COMMS ESTABLISHED!!!
This procedure was tried at all ten locations with the same outcome - 10 terminals were replaced to restore the functionality.
Of course, this is by far not the end of the story. In the following days, most of the serviced locations fell off the grid again in the same manner that was observed previously.
The replaced terminals, uninstalled from the visited locations, were tested locally. All but one are fully operational!
In the meantime, 30+ terminals continue to work without issues.
I'm asking for suggestions, where and what to debug in this situation:
- measuring cabinet conditions
- network operator
- device settings
- firmware
Please, ask if more specific info is needed.
thanks in advance...
Jure
Hello,
In such a case, assuming that the problem is related to network registration, it would be good to check on site what is happening with the module: connect to the serial interface and check with AT commands, starting for instance with AT+CREG?, AT+CEREG?, AT+CGREG? (also enable their URCs to see live updates), AT+COPS?, AT^SMONI, and register some useful URCs with AT^SIND etc.
And check AT+CEER output if any error report is available (AT+CEER=0 to reset before the test).
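To make the registration checks above easier to log, here is a minimal Python sketch that decodes the read-command replies of AT+CREG?, AT+CGREG? and AT+CEREG?. The `<stat>` meanings are the standard ones from 3GPP TS 27.007; the function name `parse_reg` is hypothetical, and the sketch only handles the `+CREG: <n>,<stat>` read form, not the URC form.

```python
import re

# Registration <stat> values as defined in 3GPP TS 27.007
# for +CREG, +CGREG and +CEREG.
REG_STAT = {
    0: "not registered, not searching",
    1: "registered, home network",
    2: "not registered, searching",
    3: "registration denied",
    4: "unknown",
    5: "registered, roaming",
}

# Matches the read-command reply "+CREG: <n>,<stat>[,...]".
_REG_RE = re.compile(r"^\+(CREG|CGREG|CEREG):\s*(\d+)\s*,\s*(\d+)")


def parse_reg(line):
    """Return (command, stat, meaning) or None if the line is not a reply."""
    m = _REG_RE.match(line.strip())
    if m is None:
        return None
    stat = int(m.group(3))
    return m.group(1), stat, REG_STAT.get(stat, "other/reserved")
```

A `stat` of 3 (registration denied) would point at a network-side rejection, while 2 (searching) would be consistent with the orange LED blinking every second.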
I would also test with a SIM from some other provider and then with the original one again.
It might also make sense to login to OpenWrt with SSH and check system log or trace the USB interfaces between the OS and module.
It is interesting that it was OK for quite a long time before. As the devices are located in different countries, the only common point seems to be the SIM provider (and the terminals). Please check if the operator was doing any updates on the SIMs recently, or maybe they changed their roaming agreements with some operators. Maybe this has something to do with the problem.
Did you notice any common point for the not working and working devices like local operator used, country etc.?
On the other hand are all the terminals from the same batch? Do they have the same firmware version, were they configured in the same way?
And how about the single terminal which is still not working? Does it look like the same issue? Please start with this one, by getting some data from the serial interface.
Best regards,
Bartłomiej
Thanks for the comments, Bartłomiej
I'm planning an on-site visit next week. I will bring a spare SIM card from some other provider and try the commands you suggested.
I've contacted the operator who is of course denying any actions from their side, but confirming that all the faulty locations are simply not registered to the network. No trace of them anywhere since the first connection loss.
Physically all locations are in the same country, but close to state borders, which is why different operators come into play.
What determines the terminals' batch? Some are still labeled Gemalto, others Thales, so they are probably not from the same batch. But all terminals from the current problematic locations are labeled Gemalto.
ATI1 response is:
Can I check anything else?
Maybe it wasn't clear from the first post, but on all locations where terminals were replaced, the new terminals also failed in the next couple of days (the SIM cards were not replaced). So far we expect the same symptoms. A new inspection is planned next week...
The one terminal that really seems to have failed, indicates a successful registration to network (Orange LED 4s), but System out only outputs:
not even loading the system JRC MIDlet. So this one really seems broken.
I'm much more concerned about the other 10...
More hints?
Thank you,
Jure
Hello,
There is System.out configured on the interface - do you have any own Java MIDlet running on your devices?
Please connect to some other interface and send AT^SCFG?, AT^SJAM=4 and AT^SJAM=5 to verify the JRC and autostart status. You can also disable System.out if you have no MIDlet, with at^scfg="Userware/Stdout","null".
Also check the PIN status with AT+CPIN?. Set AT+CMEE=2 beforehand to see the verbose error replies.
You may also check AT+COPS=? output which will list the available operators. It may take a while to get the response.
You may also try manual network registration - example AT+COPS=1,2,"26001".
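The AT+COPS=? scan result is easier to compare across sites once it is parsed. The following Python sketch assumes the standard 3GPP TS 27.007 reply layout `(<stat>,"long","short","numeric"[,<AcT>])`; the function name and the sample reply are made up for illustration and are not taken from a real device.

```python
import re

# <stat> and <AcT> values per 3GPP TS 27.007 (only the common AcT values listed).
COPS_STAT = {0: "unknown", 1: "available", 2: "current", 3: "forbidden"}
COPS_ACT = {0: "GSM", 2: "UTRAN", 7: "E-UTRAN"}

# One operator entry: (stat,"long name","short name","numeric PLMN"[,AcT])
_ENTRY_RE = re.compile(r'\((\d),"([^"]*)","([^"]*)","(\d+)"(?:,(\d+))?\)')

# Hypothetical sample reply, for illustration only.
SAMPLE = '+COPS: (2,"Operator A","OpA","26201",7),(3,"Operator B","OpB","26202",0),,(0-4),(0,2)'


def parse_cops_scan(response):
    """Return a list of dicts, one per operator found in an AT+COPS=? reply."""
    operators = []
    for stat, long_name, _short, plmn, act in _ENTRY_RE.findall(response):
        operators.append({
            "plmn": plmn,
            "name": long_name,
            "stat": COPS_STAT.get(int(stat), "other"),
            "act": COPS_ACT.get(int(act), "other") if act else None,
        })
    return operators
```

If every visible operator is reported as "forbidden", that would be a useful data point to record before trying the manual registration.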
It might be useful to register some URCs and observe the output for some time.
The background of the batch question is to check if the version of the terminal may have any influence on the situation. Are you able to state if all the failing terminals are of the same version or have the same firmware on the modules? And how about the ones that do not fail - do they have some other firmware or the same, are also branded Gemalto or Thales etc.?
Is the home network available or only roaming?
What else you could try is to clear LOCI, PSLOCI and FPLMN files on the SIM card with these commands:
Read:
AT+CRSM=176,28542,0,0,11
AT+CRSM=176,28531,0,0,14
AT+CRSM=176,28539,0,0,12
Clear:
AT+CRSM=214,28542,0,0,11,"FFFFFFFFFFFFFFFFFFFFFF"
AT+CRSM=214,28531,0,0,14,"FFFFFFFFFFFFFFFFFFFFFFFFFFFF"
AT+CRSM=214,28539,0,0,12,"FFFFFFFFFFFFFFFFFFFFFFFF"
reboot the module and test.
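The read and clear commands above follow one pattern: the same file ID and length, with the clear payload being that many 0xFF bytes in hex. A small Python sketch that generates them, using only the file IDs and lengths from the commands above (the helper names are hypothetical):

```python
# SIM elementary files as used in the commands above: name -> (file ID, length in bytes).
SIM_FILES = {
    "LOCI": (28542, 11),
    "PSLOCI": (28531, 14),
    "FPLMN": (28539, 12),
}


def crsm_read(name):
    """Build the READ BINARY (176) command for the given file."""
    file_id, length = SIM_FILES[name]
    return "AT+CRSM=176,%d,0,0,%d" % (file_id, length)


def crsm_clear(name):
    """Build the UPDATE BINARY (214) command that fills the file with 0xFF."""
    file_id, length = SIM_FILES[name]
    payload = "FF" * length  # two hex digits per byte
    return 'AT+CRSM=214,%d,0,0,%d,"%s"' % (file_id, length, payload)
```

Reading each file first and storing the reply gives a record of what the SIM held (for example the forbidden PLMN list) before it is wiped.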
Please try to verify if the operator changed any roaming partners recently. With the AT+CPOL and AT+CPLS commands you can check the user- and operator-controlled PLMN lists for a preferred operator which is, for instance, no longer valid for your MNO.
Anyway at the moment we are walking around in the fog as we don't know what really happened.
Regards,
Bartłomiej
Hello,
back from my field trip to two locations, trying to make a clear case of the issues. It gets weirder by the hour...
I visited two locations in Germany that fell off the grid a couple of weeks ago. The modems had the orange LEDs blinking every second, trying to register to the network without success. I was able to check on the operator side that no data connection was established at the time.
I took the two modems through a series of checks on-site:
In summary, nothing I tried would make the two devices register to the network. Our Java application on the module did start, but of course could not reach the network - no data connection could be established, no SMS communication was possible.
In the end three different SIM providers (roaming and local) were tested, all with the same outcome.
The failing devices (ELS61T-E2 LAN, one labeled GEMALTO, one THALES) both have the same firmware version:
Both were replaced on-site with spare modems (same specs) which could connect to network instantly.
So far, the situation was as expected from previous experience with this issue, still without a clue about what causes this.
However, the most interesting thing happened on the way home from the site. As we noticed before, a modem failing in production started working normally when brought in for service (in Slovenia). So we kept these two devices powered on in the car for the 8-hour drive home.
For 4 hours of driving through Germany, the modems were unable to register to any mobile network. The minute we crossed the German-Austrian border, both devices registered instantly and all was well. It's been two days and they are still OK.
So, conclusion: SIM cards don't matter, just take the Cinterion module out of the country and it starts working again.
Is there such a thing as a device-level ban at mobile providers', which gets cleared when leaving the country?
This is an extremely severe issue in our current project.
I desperately need advice about how to proceed.
Thanks,
Jure
Hello,
Do you have any logs from the AT commands I suggested?
The network can block the device.
Is it possible that, for example, your application is aggressively trying to re-establish the data connection when it fails, and that it fails frequently due to poor network coverage?
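One common way to avoid hammering the network with reconnect attempts is a capped exponential backoff. The Python sketch below only illustrates the idea; the function name and the default values (30 s base, doubling, 1 h cap) are assumptions for the example, not operator requirements or anything stated in this thread.

```python
def backoff_delays(base_s=30, factor=2, cap_s=3600, attempts=8):
    """Return the wait time in seconds before each successive retry.

    The delay grows by `factor` after every failed attempt and is
    limited to `cap_s` so the device still retries at a bounded rate.
    """
    delays = []
    delay = base_s
    for _ in range(attempts):
        delays.append(min(delay, cap_s))
        delay *= factor
    return delays


# Example: backoff_delays() -> [30, 60, 120, 240, 480, 960, 1920, 3600]
```

The application would sleep for the next value in the list after each failed attach or data-connection attempt, and start again from the first value once a connection succeeds.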
Regards,
Bartłomiej
Hello again,
it seems our problem is escalating, with more of our devices failing to register to any network daily (a new pandemic)...
Lately, as new devices are being installed in the field, some of them only last some hours, then they disappear from the network.
The devices are locked down to prevent local tampering, so I was not able to see local logs or execute AT commands on-site (I can only reconfigure logging remotely through OTAP, which is not possible without the device attaching to the network). When I brought a device in for service (in another country), it was working OK. We've since installed a few partially unlocked devices to enable us to see runtime logs on-site.
I don't think it's poor network coverage. Signal strengths are generally good, and some of the devices that have no problems show very poor signal data.
I opened a support case on the SIMs provider's side to ask them to help with further investigation into the issue. As all devices are in roaming mode, the provider says no event is registered on their side of the devices trying to register to network. They would however try to contact some of their roaming partners the devices are actually registering to. Hope it yields some positive results.
I'm still stumped...
Regards,
Jure
Hello,
I think it's a good direction that you want to enable access to logs. Please collect the application logs. It would also be good to include some diagnostic AT commands in the application, or at least be able to access the AT interface on site, to see what happens when the device boots and tries to register to the network. That would at least give some basic information.
Maybe it will be just that the module is rejected by the network. But this has to have some reason, so it can be important to record and store the moment when this problem actually starts.
BR,
Bartłomiej
Hello guys,
some good news finally...
Browsing some example source code I got from this forum quite some time ago, I stumbled upon a comment saying "do this or you might have problems with network registration on 4G modules", which instantly rang a bell. It was about clearing all PDP contexts (1..11) with AT+CGDCONT and also resetting authentication with AT^SGAUTH on every device startup. This sounded like a sane thing to try, and by now (10 days of testing) it seems it'll do the trick.
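The exact commands used are not given in the post, so the following Python sketch is only a guess at what such a startup sequence might look like. It assumes that AT+CGDCONT=<cid> with no further parameters undefines a context (as described in 3GPP TS 27.007) and that AT^SGAUTH=<cid>,0 selects "no authentication"; the latter in particular should be checked against the ELS61 AT command specification. The function name is hypothetical.

```python
def startup_reset_commands(first_cid=1, last_cid=11):
    """Build the list of AT commands sent once at application startup."""
    commands = []
    for cid in range(first_cid, last_cid + 1):
        # Undefine the PDP context (3GPP TS 27.007: only <cid> given).
        commands.append("AT+CGDCONT=%d" % cid)
        # ASSUMPTION: auth type 0 means "none"; verify in the module's AT spec.
        commands.append("AT^SGAUTH=%d,0" % cid)
    return commands
```

After such a reset the application would presumably define again only the context and APN it actually needs before activating the data connection.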
It still means every failed device installed in Germany has to be uninstalled and taken out of the country (!!) to enable it to register to a network; then we can upgrade the app to a version that performs the initialization described above on startup. So far, every upgraded device is able to successfully register to a German network provider as expected.
Still have no clue in which way Germany is special in this regard, but there is something different there (Swiss, Austrian, French and Slovenian network providers do not trigger this problem). How to explain this to our client, that is another matter...
Thank you for suggestions so far, I'll say this issue is now resolved (knock-on-wood). There are still others, I'll write about them in separate threads here...
Regards,
Jure
Hello,
Thank you for this information.
And what was AT+CGDCONT? output on these devices that you tested? Did you try any other AT commands to diagnose the device state in Germany?
The module uses CID1 for LTE registration but it is not necessary for 2G.
Anyway fingers crossed.
BR,
Bartłomiej