Malfunction of EHS5-US related to simcard | Telit Cinterion IoT Developer Community
January 23, 2019 - 3:04pm
I have the following problem with an EHS5-US module:
I have thousands of EHS5-US modules running, and suddenly one of them stops working after months of correct operation. The symptom is that the EHS5 status LED stays off (it is configured with AT^SLED=2) and there is no further communication with the module.
Resetting or turning off the power supply (20 seconds without VBat; because of the capacitor on VBat the voltage only drops to approx. 1.2 V) does not solve the problem.
If the module is replaced with a new one (tested in the factory) and the same SIM card that was in the faulty module is used, the new module shows the same fault. It is the "SIM card of death". If the SIM card is removed, both modules work normally.
If I then put the "SIM card of death" back in the first module that failed, that module now works normally. It seems that using the SIM card in another module (which also fails) fixes the problem (this behavior is repeated in all cases).
Can the SIM card cause a malfunction in the EHS5-US module?
Where can I find a list of the bugs that the EHS5-US has or had in previous versions?
The EHS5-US reports:
Have you checked if it is possible to communicate with the module over RS232 or USB when this happens? Was it possible to send any AT commands and verify the module state?
It is possible that the SIM card is damaged. Generally, if the module cannot communicate with the SIM, it switches off the SIM interface. You'd have to do some hardware measurements to verify it.
The problem might also be in the hardware design. As long as it affects only one SIM, you could assume that the SIM is faulty.
When you write about a new module, do you mean only the module or the complete hardware?
The equipment has a microcontroller that performs communication through AT commands with the EHS5-US and has been developed by us. The hardware design was reviewed by William Souza of Gemalto in Sao Paulo Brazil.
So far the problem has appeared in 9 devices (9 devices with 9 SIM cards) and always in field conditions (at the end customer), so making measurements or capturing communications is a bit difficult. But judging by the status LEDs on the PCB, it seems as if the EHS5 restarts continuously, a few seconds after booting.
When we replace the equipment, all the hardware is replaced, so a problem related to the hardware is ruled out.
A completely new device (new PCB, microcontroller and EHS5) fitted with the SIM card from the failing equipment shows the same fault, but if that SIM card is put back in the equipment that failed first, it "comes back to life".
In the SIM card management system it can be observed that when the equipment is changed (that is, when the SIM of the equipment that failed is placed in the new equipment), the new equipment registers in the network with a zero data session and disconnects before the new IMEI is identified.
In the laboratory I have 1 of the 9 recovered devices, fitted with the SIM card that made it fail, and it has been working for 6 months without any further failures.
Is there an errata document for the EHS5-US module?
Is there a new firmware version for the EHS5-US that can correct a problem related to the sim card?
Ing. Sergio Lapilli
The module is connected to some MCU. What is the use case, and how is the module used? If it fails somehow, it is probably also visible to the MCU. Do you have any logs from the MCU? Whether the module restarts in a loop or not, and whether it replies to AT commands or not, should be visible in such logs. If the problem is only reproducible in the field, I think it would be helpful to store such logs on the device if possible.
Since the failing SIM card also fails after being moved to a new device, and the old device can come back to life after the SIM is moved back, I think that it can be some hardware problem, either in the SIM card or in the device design. If the design was reviewed by Gemalto, we could assume for now that a SIM failure is more probable. Have you tried putting such a failing SIM into some other device, like a phone for example? And what happens if you insert a new SIM card into the failing device after this issue occurs?
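One way to keep such a trace on the MCU side is a fixed-size ring buffer of the AT traffic that can be dumped later. The sketch below is only an illustration in Python (real firmware would be written in the MCU's own language); the class name `AtTraceLog`, its methods, the capacity and the sample traffic are all hypothetical.

```python
from collections import deque


class AtTraceLog:
    """Fixed-size trace of AT traffic kept by the MCU (hypothetical helper)."""

    def __init__(self, capacity=256):
        # A deque with maxlen silently drops the oldest entry once full.
        self._entries = deque(maxlen=capacity)

    def record(self, timestamp_s, direction, text):
        """Store one line; direction is 'TX' (MCU to module) or 'RX'."""
        self._entries.append((timestamp_s, direction, text.strip()))

    def dump(self):
        """Return the retained lines, oldest first, as printable strings."""
        return [f"{t:.3f} {d} {s}" for t, d, s in self._entries]


# Illustrative traffic only; not captured from a real device.
log = AtTraceLog(capacity=3)
log.record(0.000, "TX", "AT+CREG?")
log.record(0.120, "RX", "+CREG: 0,1\r\n")
log.record(0.125, "RX", "OK\r\n")
log.record(5.000, "TX", "AT^SMSO")
print("\n".join(log.dump()))
```

Because the buffer has a fixed size, the most recent exchanges before a failure are always retained, which is usually the part needed to tell a reboot loop from an unresponsive AT interface.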
Is your application performing any special operations on SIM card? Do you use SIM cards from one operator only?
There is an update for your firmware available but for now I don't know about any particular problem related to SIM that this FW could correct.
Anyway, it seems that it might not be so easy to investigate this problem. Some field tests might be needed, as well as logs from your application, and maybe the whole device would need to be analyzed by Gemalto. So I think that this topic would be more appropriate for the support line than a forum.
The equipment works in the following way: The MCU monitors some physical variables and reports the status to the server using a TCP connection. The MCU selects 1 of 2 possible SIMs and turns on the EHS5-US, connects and sends the information to the server. The MCU constantly monitors the Vcore and V180 voltages of the EHS5 and in case of failure generates a shutdown. When the communication with the server is interrupted and it is not possible to reconnect, the EHS5 is switched off using the AT command (if the ^SHUTDOWN message is not received within 30 seconds, the VBat power of the EHS5 is cut off and the MCU waits 20 seconds before turning it on again), the SIM is changed if there is more than 1 installed, and the module is switched on again.
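The recovery cycle described above can be sketched as follows. This is a minimal model written in Python for readability, not the actual firmware: the `hw` object and all of its methods are hypothetical stand-ins for the MCU drivers, and AT^SMSO is assumed to be the shutdown command because it is the one named later in this thread. The two timing constants come from the description.

```python
SHUTDOWN_URC_TIMEOUT_S = 30  # from the post: wait up to 30 s for ^SHUTDOWN
VBAT_OFF_HOLD_S = 20         # from the post: keep VBat off for 20 s


def recover(hw, sim_count, active_sim):
    """Run one recovery cycle and return the SIM slot used for the next attempt.

    `hw` is a hypothetical hardware abstraction, not a real driver API.
    """
    hw.send_at("AT^SMSO")
    if not hw.wait_for_urc("^SHUTDOWN", SHUTDOWN_URC_TIMEOUT_S):
        # No shutdown confirmation in time: fall back to cutting the supply.
        hw.set_vbat(False)
        hw.sleep(VBAT_OFF_HOLD_S)
        hw.set_vbat(True)
    if sim_count > 1:
        # Rotate to the next installed SIM.
        active_sim = (active_sim + 1) % sim_count
        hw.select_sim(active_sim)
    hw.power_on()
    return active_sim


class FakeHw:
    """Stand-in for the real drivers; records the calls for inspection."""

    def __init__(self, urc_arrives):
        self.urc_arrives = urc_arrives
        self.calls = []

    def send_at(self, cmd):
        self.calls.append(("at", cmd))

    def wait_for_urc(self, urc, timeout_s):
        self.calls.append(("wait", urc, timeout_s))
        return self.urc_arrives

    def set_vbat(self, on):
        self.calls.append(("vbat", on))

    def sleep(self, seconds):
        self.calls.append(("sleep", seconds))

    def select_sim(self, slot):
        self.calls.append(("sim", slot))

    def power_on(self):
        self.calls.append(("on",))


hw = FakeHw(urc_arrives=False)
next_sim = recover(hw, sim_count=2, active_sim=0)
print(next_sim, hw.calls)
```

Writing the sequence out this way makes the two paths explicit: the clean shutdown, and the forced VBat cut that is taken only when ^SHUTDOWN does not arrive.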
The MCU generates a log (not of the communication on the ASC0 port, but of what happens in general), but unfortunately it is recorded in the internal memory of the EHS5, which is not available while the problem is present.
In all the cases that presented failures there was only 1 SIM in the device belonging to the same operator. The signal level is excellent in all cases with RSSI values between -58dBm and -72dBm.
In 1 case, while the device was not communicating, a second SIM from another operator was inserted (without turning off or restarting the MCU) and the device began to communicate normally. When the second SIM was removed to return to the first SIM, with which the communication had failed, the device communicated normally (the same effect as placing the SIM in another device and returning it to the original).
So it seems that, besides the fact that the communication stops working, we don't know for sure what exactly has happened between the MCU and the EHS5 module, because no logs are available. This is a serious problem. We can't be sure why there was no communication: whether it was a network registration problem, a packet domain attach problem or a hardware failure, whether the AT interface was available, and how the module was rebooted by the MCU. Maybe it would be possible to connect to the failing device in the field and grab some live logs or trace the serial interface.
Can you write how the logs are sent to the module? If these logs are missing, we could suspect that the module was unavailable. Or maybe the logs were not sent to the module because it had been rebooted by the MCU before.
And maybe it is only an operator problem (if it only happens with one operator). Additionally, it seems to happen only in the field, so maybe there is something specific about the places where this has happened.
Additionally, I think that the module reset procedure you have described may be causing problems. Cutting off the power is not recommended because it can sometimes damage the module (for example, if a flash operation is ongoing, which can happen while an AT command is being executed). There is an emergency reset line which can be used for reset. I'm not a HW specialist, but to my knowledge, when the module is switched off Vbat should be cut off completely; I think that a residual 1.2 V may be too much and may cause a module malfunction if this procedure is triggered. You have written that the hardware design was reviewed. Could you possibly share the CSP number?
We have some problems with files that disappear from the EHS5 flash, such as the log file, probably because the EHS5 took more than 30 seconds to reply to the AT^SMSO command. This happens very rarely, 2 or 3 times per year for every 1000 devices, and I find out because the device reports the loss of the log file. The problem was reported to Gemalto Brasil and they responded that it is a problem that occurs with the EHS5, that there is no possibility of correcting it, and that the best solution is to add an external flash.
How long is it necessary to wait for the response to the SMSO command before performing an emergency reset or shutdown?
The EHS5 supply has 6 x 47 uF of distributed ceramic capacitance (282 uF in total), which ensures the minimum of 150 uF at the working voltage of 4.0 V over the lifetime of the device, as required by the manual "EHS5_HID_v02.000c". When I cut off the power supply (Texas Instruments TPS27081A switch), the EHS5 is isolated from the circuit (all lines of communication with the MCU are isolated) and the 282 uF discharge through the consumption of the EHS5 itself.
What is the minimum reset voltage of the EHS5, below which the equipment always turns on correctly?
I can wait longer, but below 1.5V the voltage drops very slowly.
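As a rough way to reason about how long the wait would have to be, the discharge time can be estimated with the constant-current relation t = C * dV / I. The Python sketch below is only an order-of-magnitude aid: the load currents are assumed values, not EHS5 datasheet figures, and a real module draws less current as the voltage falls, which is consistent with the slow tail seen below 1.5 V.

```python
def discharge_time_s(capacitance_f, v_start, v_end, current_a):
    """Time for a capacitor to fall from v_start to v_end at constant current."""
    if current_a <= 0:
        raise ValueError("current must be positive")
    return capacitance_f * (v_start - v_end) / current_a


C_NOMINAL = 6 * 47e-6  # 282 uF nominal, from the post

# Assumed off-state load currents in microamps; NOT datasheet values.
for current_ua in (10, 100, 1000):
    t = discharge_time_s(C_NOMINAL, 4.0, 1.2, current_ua * 1e-6)
    print(f"{current_ua:5d} uA -> {t:8.2f} s")
```

The nominal 282 uF is used here; since ceramic capacitors lose capacitance under DC bias, the effective value at 4.0 V will be lower and the real discharge correspondingly faster at the start.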
The design was reviewed by Souza Willian of Gemalto Brasil. The review was done simply by exchanging the schematics via email, and it produced a series of recommendations that were taken into account. I can email you the current design files (and the originals with the recommendations). Please send me your email address.
The device has a battery so I will try to bring the next device that fails to the lab and take a log of the current communication between the MCU and the EHS5.
After the SMSO command the module deregisters from the network, finalizes any flash write operations and performs any other necessary finalizing actions, such as stopping Java MIDlets, which can also take some time depending on the application's logic. To speed things up you can use fast shutdown: the module still will not be damaged, but in this case some actions, including network deregistration, will not be performed.
Can you write how this log file is created?
We have a special procedure for hardware review - it is then being done by the dedicated hardware specialists.
As for the battery: when you test the failing devices in the office, are they still powered from the same battery or from a power supply?
I'll send you an email, and you can send me the design in a reply. I'm not really hardware oriented, but I'll ask a colleague to look at it.