A small new test has uncovered a fairly serious and previously unknown problem. It is simple to describe: on some SSD models, data sitting in the drive's cache can be lost in a power cut even after a flush has been issued and acknowledged. That raises the obvious question: how can an SSD lose flushed data from its cache because of a power outage?
Some manufacturers did have the foresight to anticipate this situation, so first of all there is no need to panic: the problem does not affect every model. So far only four solid state drives have been tested, and two of them showed the failure. It was discovered by Apple programmer Russ Bishop, who now wants to widen the range of drives tested to get to the bottom of the matter.
SSDs, cache, and data loss on power failure
Fun story: I tested a random selection of four NVMe SSDs from four vendors. Half lose FLUSH’d data on power loss. That is the flush went to the drive, confirmed, success reported all the way back to userspace. Then I manually yanked the cable. Boom, data gone.
— Russ Bishop (@xenadu02) February 21, 2022
This point deserves emphasis: we are not talking about data already stored in the NAND flash cells, which is the drive's normal storage, but about data still held in the cache, which matters just as much whether you are working or playing. It should also be noted that the tests were run on Apple systems, but the behavior should be no different on a PC, because the SSD works in the same way and its cache is handled through the same NVMe protocol.
With this in mind, the problem is easy for any programmer to reproduce, as Bishop says, but the results are really worrying.
The other half never lost confirmed data after a flush (F_FULLFSYNC on macOS) no matter how much I abused them. All four had performance hits from flushing so they are doing some work.
Top two performers on flush? One lost data 40% of the time. The other never lost any.
— Russ Bishop (@xenadu02) February 21, 2022
What Bishop means is that data was lost even though the flush had been reported as complete, because it had never actually been written to the NAND flash. Why did this happen? First of all, the cache is volatile memory, so when Bishop forced a power loss by pulling the cable, whatever was still in it disappeared. That should not happen: once a flush has been confirmed, the data is supposed to be safely in the NAND already.
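To make the idea concrete, here is a minimal sketch of how a program can request such a flush and later check whether acknowledged data survived. It is not Bishop's actual test tool, which the tweets do not show. The record format and the function names (`flush_to_media`, `append_and_flush`, `highest_intact_seq`, `lost_acknowledged_data`) are assumptions made up for illustration. The only detail taken from the tweets is `F_FULLFSYNC`, which Python exposes through `fcntl` on macOS; on other systems the sketch falls back to `os.fsync`.

```python
import fcntl
import os
import struct

# Hypothetical record format: one little-endian 64-bit sequence number.
RECORD = struct.Struct("<Q")


def flush_to_media(fd: int) -> None:
    """Ask the OS to push written data all the way to stable storage."""
    full_fsync = getattr(fcntl, "F_FULLFSYNC", None)  # only defined on macOS
    if full_fsync is not None:
        fcntl.fcntl(fd, full_fsync)
    else:
        os.fsync(fd)  # fallback on systems without F_FULLFSYNC


def append_and_flush(path: str, seq: int) -> int:
    """Append one record and return its number only after the flush succeeds."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, RECORD.pack(seq))  # 8 bytes; short writes are not handled
        flush_to_media(fd)
    finally:
        os.close(fd)
    return seq  # from this point on the record counts as acknowledged


def highest_intact_seq(path: str) -> int:
    """Return the last number of the unbroken run 0, 1, 2, ... or -1 if none."""
    last = -1
    with open(path, "rb") as f:
        while True:
            chunk = f.read(RECORD.size)
            if len(chunk) < RECORD.size:
                break  # end of file or a torn record
            (seq,) = RECORD.unpack(chunk)
            if seq != last + 1:
                break  # gap or garbage: stop counting here
            last = seq
    return last


def lost_acknowledged_data(path: str, last_acked: int) -> bool:
    """True if a record that was acknowledged as flushed is no longer on disk."""
    return highest_intact_seq(path) < last_acked
```

In a real test the last acknowledged number would have to be recorded somewhere other than the drive under test, for example on a second machine, before the power is cut. After the reboot, `lost_acknowledged_data` returning `True` would correspond to the failure described above.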
Which models are free from this fault?
As noted, this failure should never occur, but evidently some manufacturers have guarded against it and others have not. The four SSDs tested were the Samsung 970 EVO Plus, WD Red SN700 1TB, SK Hynix Gold P31 2TB and Sabrent Rocket 512 with the Phison PH-SBT-RKT-303 controller.
The Samsung and the WD were the two that kept the data intact, flushing it properly, while the SK Hynix and the Sabrent were the ones affected. Bishop hopes to have data from more SSDs this week:
Tomorrow I’ll have results for:
Intel 670p
Samsung 980
WD Black SN750
WD Green SN350
Kingston NV1
Seagate Firecuda 530
Crucial P2
Crucial P5 Plus
— Russ Bishop (@xenadu02) February 23, 2022
This is especially interesting because some models, such as the Samsung 980 or the Kingston NV1, do not have a cache of their own as such and instead use HMB (Host Memory Buffer), borrowing system RAM to do the same job. In principle that should produce the same data loss problem as soon as the PC loses power for whatever reason. It will be interesting to see the results in just a few days.