OK. I followed all the advice mentioned and added some measures of my own. I had 6 devices under test. Unfortunately, 3 of them showed the error again. They were all started at the same time, but the three failures occurred at random intervals. All the logs show that an address error trap was thrown. The function pointers recovered from the error locations show that two of them pointed to 0xA838 and one of them to 0x4C26.
At 0x4C26 in flash, there is no code available, the entire flash page at that address is 0xFFFFFF.
At 0xA838 there is code, but it should only be reachable when certain packets are received over the UART, and the CRC has to match before the code at 0xA838 can be executed.
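To make clear what I mean by that gating, here is a rough sketch. It is not the actual firmware: the function names, the frame layout (payload followed by a 16-bit CRC, high byte first) and the choice of CRC-16/CCITT-FALSE are placeholders for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF).
 * Placeholder choice: the real protocol may use a different CRC. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u)
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Hypothetical frame layout: payload bytes, then CRC high byte, CRC low byte.
 * Returns 1 only when the trailing CRC matches the payload, otherwise 0. */
int frame_is_valid(const uint8_t *frame, size_t len)
{
    if (len < 3)
        return 0; /* need at least one payload byte plus the two CRC bytes */

    uint16_t received = (uint16_t)(((uint16_t)frame[len - 2] << 8) | frame[len - 1]);
    return crc16_ccitt(frame, len - 2) == received;
}
```

The handler that lives at 0xA838 would only be called after a check like frame_is_valid() returns 1, so ending up there without any matching UART traffic should not be possible through the normal code path.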
I get the impression that the program counter crashed and is just pointing at random locations in memory. What do you guys think? My last attempt would be to lower the clock drastically to a very slow speed, just to see what happens. Beyond that, I'm out of options. Below is what I've tried so far:
- no floating pins (enabled the internal pull-downs)
- no more nested interrupts (apparently nesting is enabled by default)
- reduced clock speed by a factor of 2 (60 MHz)
- added trap handlers that output to the UART, where a computer logs all the data
- added code in the trap to get the function pointer from the last stack frame, so I should know where it went wrong
- tested the trap handler by deliberately creating an address fault; trap, logging and function pointer all worked. The function pointer indeed pointed me to the location in flash where the fault happened
- replaced the SMPS with another type, just to be sure (from Murata to Recom; these are integrated SMPS modules, direct 7805 replacements)
- enabled the compiler option for errata workarounds (didn't know about this). FYI, the compiler listed a lot of errata that are not mentioned in the errata sheet... weird. See below.
- added the compiler option that puts const variables in RAM to minimize PSV usage, just in case.
- added extra caps on all (A)Vdd / (A)Vss and Vcap pins: 10 nF, 100 nF and 4.7 uF.
- measured the (A)Vdd / (A)Vss and Vcap pins with a 100 MHz oscilloscope using AC coupling. During operation of the microcontroller, nothing unexpected was seen on the power lines.
- checked that no IO pin exceeds its current limit. None does: the highest current is the 4 mA drawn by an indicator LED, the rest is all signal (SPI/UART1/UART2).
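For reference, the function-pointer recovery in the trap boils down to something like the sketch below. It is plain C that runs on a computer, so the decoding can be checked in isolation; on the target the two words would be read from the stack inside the trap handler. The layout assumed here (first stacked word holds PC<15:0>, second word holds the SR low byte in its upper byte, IPL3 in bit 7 and PC<22:16> in bits 6:0) is how I read the family reference manual, so verify it for your device. The names decode_stacked_pc and is_erased_word, and the masking of bit 0, are my own choices for this illustration.

```c
#include <stdint.h>

/* Rebuild the return address from the two words a trap pushes on the stack.
 * Assumed layout (check the reference manual for the exact part):
 *   low_word  = PC<15:0>, bit 0 is not an address bit and is masked off
 *   high_word = SRL<7:0> : IPL3 : PC<22:16>
 */
uint32_t decode_stacked_pc(uint16_t low_word, uint16_t high_word)
{
    uint32_t pc_high = (uint32_t)(high_word & 0x007Fu); /* keep PC<22:16> only */
    uint32_t pc_low  = (uint32_t)(low_word & 0xFFFEu);  /* drop bit 0 */
    return (pc_high << 16) | pc_low;
}

/* An erased 24-bit program word reads back as 0xFFFFFF.
 * Returns 1 for an erased word, 0 otherwise. */
int is_erased_word(uint32_t program_word)
{
    return (program_word & 0x00FFFFFFul) == 0x00FFFFFFul;
}
```

With made-up stack words 0x4C26 and 0x2100 this decodes to 0x004C26. The SR bits in the upper byte of the second word are ignored, so they cannot leak into the address.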
Supported -merrata= errata
  retfie       A retfie interrupted while in the process of
               returning to a repeat instruction can cause an
               AddressError.
  retfie_disi  A retfie interrupted while in the process of
               returning to a repeat instruction can cause an
               AddressError. Disable using a disi instruction
               which will not prevent level 7 interrupts from
               occurring.
  psv          Indirect access to PSV data may cause CPU status
               registers to be corrupted producing incorrect
               results.
  exch         Use of the exch instruction on certain devices
               supporting the DMA peripheral can cause
               corruption in the exchanged data.
  psv_trap     Use of certain access modes can cause an
               _AddressError.
BobAGI
What kind of PCB are you using?
2- or 4-layers?
If you don't have an inner layer dedicated to ground and run all connections to gnd directly down to this plane you may have problems especially at high clock frequencies. This layer should have no traces so it is one continuous metal plane.
But I guess your board is not a simple 2-layer board, right?
The board is a 2-layer type. Ground plane, with a minimum 1 mm trace from the power supply to the microcontroller. No traces under the controller. I met or exceeded all the recommended minimums mentioned in the datasheet. All caps are high-voltage, low-ESR types.
Btw, I actually don't care about the speed; I needed the extra memory, which was not available in the older PIC24FJ range. So I might try lowering the operating clock to 16 MHz; maybe that resolves something.