Is there software that can automatically detect errors and failed disks on any of these types of storage:
- mechanical HDD
- flash storage such as USB sticks or SD cards
I'm working with a large Linux environment and need a monitoring solution for most kinds of storage. We have servers, workstations, Raspberry Pis and so on.
- I know that for mechanical HDDs and SSDs you can use smartctl, and even compare the SMART values against databases of failed disks to anticipate disk failures.
- To find errors on flash storage such as SD cards or USB sticks, you can check the log files on Linux for read/write or USB I/O errors.
- To find errors in RAID arrays there are tools such as MegaCli.
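As a rough illustration of the smartctl part, here is a minimal sketch of what I could script myself today. It assumes a smartmontools version that supports `--json`; the thresholds and the function names `smart_warnings` and `read_smart` are made up for this example, and the raw values of attributes 5 and 199 are assumed to be plain counters, which may not hold for every vendor.

```python
import json
import subprocess

# Hypothetical thresholds, chosen only for illustration.
MAX_REALLOCATED = 0
MAX_NVME_PERCENT_USED = 90


def smart_warnings(device, report):
    """Turn a parsed `smartctl --json` report (a dict) into warning strings."""
    warnings = []

    # ATA/SATA disks list their attributes under ata_smart_attributes.table.
    for attr in report.get("ata_smart_attributes", {}).get("table", []):
        raw = attr.get("raw", {}).get("value", 0)
        if attr.get("id") == 5 and raw > MAX_REALLOCATED:  # Reallocated_Sector_Ct
            warnings.append(f"{device}: {raw} reallocated sectors, disk may be failing.")
        if attr.get("id") == 199 and raw > 0:  # UDMA_CRC_Error_Count
            warnings.append(f"{device}: {raw} interface CRC errors, check SATA/power cables.")

    # NVMe disks report wear as a percentage of rated endurance.
    nvme = report.get("nvme_smart_health_information_log", {})
    used = nvme.get("percentage_used", 0)
    if used >= MAX_NVME_PERCENT_USED:
        warnings.append(f"{device}: {used}% of rated endurance used, plan a replacement.")

    # Overall verdict reported by the drive itself.
    if report.get("smart_status", {}).get("passed") is False:
        warnings.append(f"{device}: SMART overall health check failed.")
    return warnings


def read_smart(device):
    """Run smartctl (usually needs root) and return its JSON report as a dict."""
    # smartctl uses a bitmask exit status, so a non-zero code is not treated as fatal.
    result = subprocess.run(
        ["smartctl", "--json", "-a", device],
        capture_output=True, text=True, check=False,
    )
    return json.loads(result.stdout)
```

On a real machine this would be called as `smart_warnings("/dev/sda", read_smart("/dev/sda"))`. It only covers SMART, though, not the log-file or RAID checks.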
But what I am looking for is a tool that does all of the above automatically, preferably presents the information in a unified fashion, and detects early warning signs. For example:
- "Disk /dev/nvme1n1 has few write cycles left, change it."
- "There are 3 reallocated sectors, disk /dev/sdx is about to break."
- "Disk /dev/mmcblk0 has USB connection errors, change it."
- "Found SATA connection errors, check SATA/power cables."
The program should be open source. Is there a program, script or Ansible playbook like that?
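To show what I mean by unified messages for the log-based checks, here is a small sketch that maps kernel log lines to advisory messages. The regular expressions are my own guesses based on messages I have seen; real kernel output varies by kernel version and driver, so treat the patterns and the `scan_log` helper as hypothetical.

```python
import re

# Illustrative patterns only; each maps a kernel log line to an advisory message.
PATTERNS = [
    (re.compile(r"I/O error, dev (\w+)"),
     "Disk /dev/{0} has I/O errors, check or replace it."),
    (re.compile(r"(ata\d+)(?:\.\d+)?: (?:hard resetting link|SATA link down|exception Emask)"),
     "Found SATA link errors on {0}, check SATA/power cables."),
    (re.compile(r"usb ([\d.-]+): .*error -\d+"),
     "USB device {0} reports connection errors, check the port or device."),
    (re.compile(r"(mmc\d+): .*error -\d+"),
     "SD/MMC host {0} reports errors, check the card."),
]


def scan_log(lines):
    """Return advisory messages for the given log lines, without duplicates."""
    found = []
    for line in lines:
        for pattern, template in PATTERNS:
            match = pattern.search(line)
            if match:
                message = template.format(match.group(1))
                if message not in found:  # keep first occurrence, preserve order
                    found.append(message)
    return found
```

The input could be the lines of a kernel log file or the output of `journalctl -k`. What I am missing is a maintained tool that does this, plus SMART and RAID checks, across many hosts.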