I've been having an issue with my server where the drives periodically suffer a reset, which causes the array to go offline. I've been troubleshooting for several days without success. I should mention that this setup was working fine for a couple of months before it started displaying this issue.

Setup: I'm running a Threadripper 1990x with 32GB of RAM. The main OS is installed on two NVMe SSDs in RAID 1 (BTRFS), which is working without issue. The server provides NAS via a separate drive enclosure. I don't know the full specs off the top of my head, but it's an IBM 12-drive SAS unit. The server has an LSI 3008-8e HBA with both ports connected to the drive cage. The drive enclosure has twelve 4TB SATA drives set up in BTRFS RAID 10.

Issue: Some time after booting, dmesg shows that the device reset and the drive reappears with a different letter. BTRFS sees the drive and complains about it being a duplicate and, although it says it removed the duplicate, the array is no longer usable and the server requires a reboot. dmesg also used to complain about a mismatched GPT partition on some of the drives, but I followed its advice and fixed the drives with parted, after which I haven't seen that message reappear.

A good, but not infallible, way of checking any drive's health is to check its SMART attributes. However, SMART isn't generally regarded as 100% reliable: Google's study on disk failures found that while there were good correlations between the various SMART early-warning indicators and drive failure, SMART wasn't a useful tool for predicting the failure of an individual drive. For this reason I generally use SMART as a way of proving a drive is bad (if errors are showing, it's probably going to fail sometime soon), rather than proving a drive is still good.

Below is the SMART attribute set for an Intel X25-M G2 160GB disk, taken using smartctl v5.41. (The version is important: earlier versions of smartctl had different attribute-name mappings and didn't correctly understand this drive's specific attribute table.)

SMART Attributes Data Structure revision number: 5
Vendor Specific SMART Attributes with Thresholds:

This shows that the drive has had 1 reallocated sector, has used 1% of its available reserved space (attribute 232) and 2% of its projected program/erase cycles (attribute 233). It has had 148418 × 32MiB (attribute 225) written to it, roughly 4.5 TiB.

There are only really two useful error-condition attributes for these disks, as most of the useful SMART attributes for normal disks don't apply to SSDs. If the drive is showing any significant number of reallocated sectors, it may be a cause for concern, as this probably points to a failing flash chip (in the same way that a significant number of reallocated sectors on a spinning disk generally points towards surface errors). End-to-End errors are also bad: I've had a few X25-M G2 160GB disks fail with a large number (>1000) of End-to-End errors reported.
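The SMART reading described above for the X25-M G2 can be partly automated by parsing the attribute table that `smartctl -A` prints and pulling out the figures the post relies on. This is a minimal Python sketch, not the author's method. It assumes the standard smartctl column layout (ID, name, flag, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE), that reallocated sectors and End-to-End errors are attributes 5 and 184, and that the normalized values of attributes 232 and 233 count down from 100. The function names `parse_attributes` and `summarize` and the warning thresholds `realloc_warn_at` and `e2e_warn_at` are hypothetical choices you would tune.

```python
import re
from dataclasses import dataclass

MIB = 1024 * 1024

# Matches one data row of `smartctl -A` output:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
# Only the leading integer of RAW_VALUE is captured (e.g. "35 (Min/Max 20/45)" -> 35).
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)(?:\s+\S+){5}\s+(\d+)")


@dataclass
class Attribute:
    attr_id: int
    name: str
    value: int  # normalized VALUE column
    raw: int    # leading integer of the RAW_VALUE column


def parse_attributes(text: str) -> dict[int, Attribute]:
    """Return attributes keyed by ID; header and other non-table lines are skipped."""
    attrs: dict[int, Attribute] = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attr_id = int(m.group(1))
            attrs[attr_id] = Attribute(attr_id, m.group(2), int(m.group(3)), int(m.group(4)))
    return attrs


def summarize(attrs: dict[int, Attribute], realloc_warn_at: int = 10, e2e_warn_at: int = 1):
    """Summarize X25-M G2 health figures and list warnings.

    Assumptions (hypothetical): IDs 5 and 184 for reallocated sectors and
    End-to-End errors, normalized 232/233 values counting down from 100,
    and both warning thresholds.
    """
    report = {}
    if 5 in attrs:
        report["reallocated_sectors"] = attrs[5].raw
    if 184 in attrs:
        report["end_to_end_errors"] = attrs[184].raw
    if 232 in attrs:
        report["reserved_space_used_pct"] = 100 - attrs[232].value
    if 233 in attrs:
        report["wear_used_pct"] = 100 - attrs[233].value
    if 225 in attrs:
        # Attribute 225 counts host writes in units of 32 MiB.
        report["host_writes_bytes"] = attrs[225].raw * 32 * MIB

    warnings = []
    if report.get("reallocated_sectors", 0) >= realloc_warn_at:
        warnings.append("reallocated sectors: possible failing flash chip")
    if report.get("end_to_end_errors", 0) >= e2e_warn_at:
        warnings.append("End-to-End errors reported")
    return report, warnings


if __name__ == "__main__":
    # Illustrative rows shaped like smartctl output. Flags and most columns are
    # made up; only the 5/225/232/233 figures mirror those quoted in the post.
    sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0002   100   100   000    Old_age   Always       -       1
225 Host_Writes_32MiB       0x0000   100   100   000    Old_age   Always       -       148418
232 Available_Reservd_Space 0x0003   099   099   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0002   098   098   000    Old_age   Always       -       0
"""
    report, warnings = summarize(parse_attributes(sample))
    print(report)
    print(f"host writes: {report['host_writes_bytes'] / 1024**4:.2f} TiB")
    print(warnings or "no warnings")
```

With the sample rows this reports 1 reallocated sector, 1% of reserved space used, 2% wear and about 4.53 TiB written, with no warnings under the default thresholds. Keeping parsing and judgement in separate functions makes it easy to swap in different thresholds, in keeping with using SMART to prove a drive is bad rather than to prove it is good.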