ECC DRAM

Introducing I’M Intelligent Memory’s ECC DRAM components, built with integrated ECC error correction. The on-die algorithm corrects single-bit errors on the fly, elevating your application to a level of memory reliability previously attainable only in servers.

Intelligent Memory’s ECC DRAM components are drop-in replacements for conventional DRAM chips and do not require hardware or software changes to function. The data correction is performed within the chip itself, without noticeable delays or added latency, and completely independently of the processor.

Customers using I’M ECC DRAM may promote their application with the I’M ECC protected badge.

We offer our ECC DRAM products with operating temperature ranges up to 125 °C (X-Grade). Please contact us for AEC-Q100 Grade 1 qualified parts.

ECC DRAM DDR3, ECC DRAM DDR2, ECC DRAM DDR1, ECC SDRAM, ECC DRAM mobile DDR

ECC DRAMs are memory components with integrated error-correction logic. They internally generate parity data for each 64-bit data-block, which makes it possible to detect and correct a single-bit error within each of these internal blocks. As an example, a 1 Gigabit ECC DRAM internally consists of 16 million blocks of 64 bits. Even in the extremely rare case that each and every block had a single-bit error, the DRAM would still work perfectly, because the ECC algorithm would correct all of these errors. The error-correction algorithm is identical to the one used with server memory-modules, but servers run this algorithm in the CPU, while the ECC DRAMs run it in the DRAM chip itself. This is why ECC DRAMs make it possible to add 'server level memory reliability' to any application, even if the CPU in your application is unable to perform ECC correction.
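To illustrate how parity data can locate and repair a single flipped bit in a 64-bit block, here is a minimal sketch using a textbook Hamming code. It is an assumption for illustration only: the code actually implemented inside the ECC DRAM is not described on this page, and the function names `hamming_encode` and `hamming_correct` are hypothetical.

```python
def hamming_encode(data_bits):
    """Return a Hamming codeword (list of 0/1) for the given data bits."""
    n_data = len(data_bits)
    # Smallest number of parity bits r with 2**r >= n_data + r + 1.
    r = 0
    while (1 << r) < n_data + r + 1:
        r += 1
    total = n_data + r
    code = [0] * (total + 1)  # positions are 1-indexed, index 0 is unused
    data = iter(data_bits)
    for pos in range(1, total + 1):
        if pos & (pos - 1):  # not a power of two: data position
            code[pos] = next(data)
    # XOR of the positions of all set bits; parity bits force it to zero.
    syndrome = 0
    for pos in range(1, total + 1):
        if code[pos]:
            syndrome ^= pos
    for i in range(r):
        if syndrome & (1 << i):
            code[1 << i] = 1
    return code[1:]


def hamming_correct(codeword):
    """Repair at most one flipped bit; return (data_bits, syndrome)."""
    code = [0] + list(codeword)
    syndrome = 0
    for pos in range(1, len(code)):
        if code[pos]:
            syndrome ^= pos
    # A non-zero syndrome is the 1-indexed position of the flipped bit.
    if 0 < syndrome < len(code):
        code[syndrome] ^= 1
    data = [code[pos] for pos in range(1, len(code)) if pos & (pos - 1)]
    return data, syndrome


word = [(i * 7 + 3) % 2 for i in range(64)]   # arbitrary 64-bit test pattern
stored = hamming_encode(word)                 # 64 data bits + 7 parity bits
stored[41] ^= 1                               # simulate a bit-flip at position 42
recovered, syndrome = hamming_correct(stored)
print(len(stored), syndrome, recovered == word)  # prints: 71 42 True
```

This sketch corrects single-bit errors only. Server-style ECC with 8 parity bits per 64 data bits adds one more overall parity bit so that double-bit errors can also be detected.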

With the ECC DRAM we did not change the technology used to manufacture the memory-array of the DRAMs, but we added a validation and correction algorithm to the device-internal logic.

Intelligent Memory’s ECC eXtra Robust DRAM components physically protect the stored data-bits by holding them in larger capacitors, using a redundant data topology combined with built-in error-correction features.

Explanation:
Each databit of a DRAM is stored in a very small capacitor holding a minimal electron-charge that defines the databit to be Zero or One. The data-integrity depends completely on these little capacitor-charges. On eXtra Robustness ECC DRAM, two cells at a time are 'twinned' to hold one databit, doubling the total electron-charge. As a result, the retention-time of each cell increases exponentially. Even increased leakage after cell-degradation no longer causes data-loss, as there is a much higher charge in each cell. External influences such as radiation, antennas, etc. can hardly flip the databits any more. The signal-margins (the difference in charge-level between a Zero and a One) are much greater.

The eXtra Robustness (XR) DRAMs also have the ECC error correction functionality. In the very rare case that an XR ECC DRAM should have a bit-flip, the error-correction will catch and correct it. They are the most reliable DRAMs available.

Yes, ECC DRAMs are 100% fit/form/function compatible with conventional DDR1, DDR2, DDR3 or LPDDR DRAMs according to the JEDEC specifications. They work as drop-in replacements without any changes to your hardware or software. There are also no timing differences compared to conventional DRAM. The error-correction logic of the ECC DRAMs is extremely fast, so there are no additional delays or latencies compared to the JEDEC standard specifications.

Please try our Cross Reference Search to find replacements for the part you are currently using.

Yes, absolutely. If you assemble your board with ECC DRAMs instead of using conventional DRAMs, you will immediately have the error-correction functionality.

Fact is: DRAM components are not perfect. Some databits inside every DRAM will flip from 0 to 1 or from 1 to 0 from time to time. There are multiple analyses and statistics about how often bit-flips in DRAMs occur, but none of them can be applied universally to all applications. One interesting study comes from the University of Toronto and is called 'DRAM Errors in the Wild: A Large-Scale Field Study'. This study monitored the DRAM errors in the thousands of systems of the famous Google server-farm for a period of 2 1/2 years. All those servers were surely perfectly air-conditioned, dust-free and protected from radiation of all kinds. Still, the study arrived at a result of 25000 to 70000 FIT (failures per billion device hours) of 'ECC correctable errors' per Megabit of DRAM. This converts into an average of one single-bit error every 14 to 40 hours per Gigabit of DRAM.
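The conversion from FIT per Megabit to hours between errors per Gigabit can be checked with a few lines. This is a sketch that assumes 1 Gigabit = 1024 Megabit and a constant error rate; the function name `hours_between_errors` is hypothetical.

```python
def hours_between_errors(fit_per_mbit, capacity_mbit=1024):
    """Average hours between correctable errors for a DRAM of the given size.

    FIT = failures per billion (1e9) device hours, here quoted per Megabit.
    """
    errors_per_hour = fit_per_mbit * capacity_mbit / 1e9
    return 1.0 / errors_per_hour


for fit in (25_000, 70_000):
    print(fit, "FIT/Mbit ->", round(hours_between_errors(fit), 1), "hours per error")
```

The lower FIT figure gives roughly 39 hours and the higher one roughly 14 hours, which matches the 14 to 40 hour range quoted above.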

The field study also explains that the error-rate increases with the age of the memory. Brand-new DRAMs might not show any errors for weeks or months, but then the error-rate suddenly goes up.

Uncorrectable errors could be double- or multi-bit errors or complete functional failures of the DRAM. None of these can be corrected, but they are extremely rare.

A 1 Gigabit ECC DRAM contains 16 million blocks of 64-bit datawords. In each of these 64-bit words, one error is correctable. In other words: statistically, one out of 16 million hits might be a double-bit error. If one error hits per day, this would mean that it hypothetically takes 16 million days, or roughly 44,000 years, for a double-bit error to hit. But this is just the maths. In the end, the real numbers depend on the stress and the environment the application is running in.
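The back-of-the-envelope estimate can be reproduced as follows. This is only a sketch of the simplified reasoning in the paragraph above (one error per day, a double-bit error expected once per number-of-blocks hits), not a rigorous reliability model; the function name `years_until_same_block` is hypothetical.

```python
GIGABIT = 2**30        # bits in a 1 Gigabit DRAM
BLOCK_BITS = 64        # size of one internal ECC data-block

blocks = GIGABIT // BLOCK_BITS   # exact block count, about 16.8 million


def years_until_same_block(block_count, errors_per_day=1.0):
    """Days needed for block_count hits at the given rate, expressed in years."""
    days = block_count / errors_per_day
    return days / 365.25


print(blocks, round(years_until_same_block(blocks)))
print(round(years_until_same_block(16_000_000)))   # using the rounded 16 million
```

With the rounded figure of 16 million blocks the estimate is about 44,000 years, and with the exact block count about 46,000 years.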

DRAM errors are transient. They come and go. Look at the electronics in your household. Everybody knows the blue-screen of computers. Navigation systems sometimes suddenly hang or show weird things on the screen. The WiFi router sometimes refuses to connect to the internet or repeatedly asks you for your WiFi password, although you entered it correctly. After a Reset/Reboot these applications work fine again. In most cases a "single event effect" like a DRAM single-bit error was the root cause. Do you return your device to the manufacturer just because you had a non-repeatable problem that got solved after a Reset/Reboot? I guess the answer is No, but still it is annoying.

But not every error results in a system crash. If the error hits graphics, audio data or unused DRAM areas, the user typically does not even notice it. When the error hits important data, calculations or the program function might be corrupted. The worst scenario is a hit in the program-code.

Real 'defects' of DRAM components are extremely rare. The majority of problems are single-event 'effects' which the customers cannot reproduce.
Servers, for example, are expected to run 24 hours a day, 365 days a year. This is why server-CPUs are equipped with ECC logic and require special 72 bit wide memory-modules that can store 8 parity bits in addition to the 64 databits for every access.

You should use error-correction whenever you want your product to run stably, even after months and years of use, even when people put their cellphone right on top of it (disturbance from the antenna) or when the sun heats the product up.

There are many root causes. A major factor is that some memory-cells in the DRAM have slight weaknesses that cause a bit-flip. But even those weak cells are not permanently damaged. A cell might work fine for a million or more accesses and then suddenly lose its data once, in a way that cannot be reproduced. You can only identify it as a weak cell by noticing that the same databit-cell is affected by an error again after some time.

DRAMs also suffer from aging: they degrade. The insulation of some individual bit-cells gets weaker and the leakage increases. Single-bit errors suddenly appear although there has never been such a problem in the past.

Radiation of any kind can influence the DRAM cells. Even the natural ambient radiation we have on earth is able to flip a bit in a DRAM. The higher above sea-level, the stronger the radiation.

Antennas, for example those of cellphones, can also disturb the memory-cells of a DRAM directly.

Heat is a big issue for memory-components. A DRAM stores its data in little capacitors which have a certain leakage anyway. Higher temperatures increase this leakage, and at some point the first one or two memory-cells lose their data before they are refreshed. We call this a 'retention-fail'.

Running DRAMs constantly under high temperatures or radiation accelerates the degradation of the products.

It depends on your CPU bus-width and the DRAM bit-width, because every single ECC DRAM IC performs its own individual error-correction. For example, if your CPU has a 64 bit wide memory-bus and you use 8 pieces of ECC DRAMs in a x8 organization, you will have a total of eight error-corrections running in parallel. This way even multiple single-bit errors within the 64 databits are correctable (one per component).

Let's compare this to error-correction as it is done by the CPU on servers and other applications: If the CPU is ECC-capable, it typically has a 72 bit wide memory bus. 9 pieces of DRAMs in a x8 organization or 18 pieces of DRAMs in a x4 organization have to run in parallel. But across all 72 bits, only one bit can be corrected by the CPU-internal ECC.

Conclusion: In the majority of applications multiple DRAMs are connected to the memory-bus of the CPU in parallel. Using ECC DRAMs is much more effective than CPU-controlled ECC because every single ECC DRAM performs its own individual error-correction, which multiplies the effectiveness.
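The difference can be made concrete with a small, deliberately simplified model. It assumes a 64 bit bus built from x8 devices, treats each device as able to repair one flipped bit per bus word (as stated above), and ignores how bits map onto the internal 64-bit blocks of each chip. The function names are hypothetical.

```python
from collections import Counter


def correctable_on_die(flipped_bits, device_width=8):
    """True if no single DRAM device sees more than one flipped bit."""
    per_device = Counter(bit // device_width for bit in flipped_bits)
    return all(count <= 1 for count in per_device.values())


def correctable_by_cpu(flipped_bits):
    """True if the whole bus word contains at most one flipped bit."""
    return len(set(flipped_bits)) <= 1


flips = {3, 17, 60}   # three bit-flips landing in three different x8 devices
print(correctable_on_die(flips), correctable_by_cpu(flips))  # prints: True False
```

In this model, three simultaneous flips spread across different devices are all repaired by the on-die ECC, while a single bus-wide ECC could correct only one of them.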

PS: It would even be possible to use ECC DRAMs together with an ECC-capable CPU. This way each ECC DRAM performs a correction and the CPU will do another final check and correction of the data.

It depends on the reliability you expect from your application. You decide whether or not you can accept the risk of data-corruption, malfunction or a crash in your application.

Today's conventional DRAMs used in simple game-computers are identical to the DRAMs used in high-end industrial applications. Same part numbers, same products. For automotive applications the DRAM-manufacturers perform a more extensive test process, but even those 'AEC-Q100' qualified DRAMs still have no protection from disturbing external influences. They can also degrade or react sensitively to line-noise, etc. Stricter quality-testing is no guarantee against single-bit errors.

ECC DRAMs are pin-compatible with conventional DRAMs, so they can also be put onto any memory-module PCB, lifting these modules to an unparalleled level of reliability. Any application taking DIMMs, SO-DIMMs, Mini DIMMs or other form factors can take advantage of the ECC protection, no matter whether the CPU supports ECC or not.

Memory modules typically are 64 or 72 bits wide and are made of multiple DRAMs (with x4 or x8 bit-width) running in parallel. Since every single ECC DRAM performs its own error-correction algorithm, a module built with ECC DRAMs can perform multiple data-verifications & corrections in parallel. Even 72 bit wide modules used on applications that perform ECC by the CPU would be able to add multiple levels of reliability.

Any memory-module manufacturing company can build modules based on ECC DRAMs. Our distribution-partner Memphis Electronic has been producing memory-modules for the industry for over 20 years and has a portfolio of several hundred different module-designs. Please contact them at sales@memphis.ag with your memory-module requirements.

Yes, we definitely work on that. If you have an interesting high volume project for which you would like to use ECC DRAM, please contact us.
