Is there software that can automatically detect errors and failing disks on any of these types of storage?

  • mechanical HDD
  • flash storage such as USB sticks or SD cards
  • RAID
  • SSD
  • NVMe

I'm working with a large Linux environment and need a monitoring solution that covers most kinds of storage. We have servers, workstations, Raspberry Pis and so on.

  • I know that for mechanical HDDs and SSDs you can use smartctl, and even compare the SMART values against databases of failed disks to predict disk failures.
  • To find errors on flash storage such as SD cards or USB sticks, you can check the Linux log files for read/write or USB I/O errors.
  • To find errors in RAID arrays there are tools such as MegaCli, and so on.
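To make the smartctl part concrete, here is a minimal Python sketch of the kind of check I mean. It assumes smartmontools 7.0 or later (for `smartctl --json`) and relies on the JSON keys `smart_status`, `ata_smart_attributes` and `nvme_smart_health_information_log` as I understand them from smartctl's output. The function names and the 90% wear threshold are my own placeholders, not part of any existing tool.

```python
import json
import subprocess

REALLOCATED_SECTOR_ID = 5   # SMART attribute 5: Reallocated_Sector_Ct
NVME_WEAR_LIMIT = 90        # hypothetical threshold, in percent of rated endurance


def read_smart(device):
    """Run smartctl on one device and return its JSON report as a dict.

    smartctl's exit status is a bitmask of findings, so a non-zero
    status is not treated as a failure of the command itself.
    """
    result = subprocess.run(
        ["smartctl", "--json", "-a", device],
        capture_output=True, text=True, check=False,
    )
    return json.loads(result.stdout)


def smart_warnings(device, report):
    """Turn a smartctl JSON report into human-readable warnings."""
    warnings = []

    # Overall health self-assessment reported by the drive.
    if report.get("smart_status", {}).get("passed") is False:
        warnings.append(f"Disk {device} fails its SMART self-assessment, change it.")

    # ATA/SATA drives: look for a non-zero reallocated sector count.
    for attr in report.get("ata_smart_attributes", {}).get("table", []):
        count = attr.get("raw", {}).get("value", 0)
        if attr.get("id") == REALLOCATED_SECTOR_ID and count > 0:
            warnings.append(
                f"Disk {device} has {count} reallocated sectors, it may be about to break."
            )

    # NVMe drives: percentage of the rated write endurance already used.
    used = report.get("nvme_smart_health_information_log", {}).get("percentage_used")
    if used is not None and used >= NVME_WEAR_LIMIT:
        warnings.append(f"Disk {device} has used {used}% of its write endurance, change it.")

    return warnings
```

On a real machine this would be called as `smart_warnings("/dev/sda", read_smart("/dev/sda"))`, usually as root. It covers only the SMART side, not the log files or the RAID controllers.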

But what I am looking for is a tool that does all of the above automatically, preferably presents the information in a unified way, and detects early warning signs. For example:

  • "Disk /dev/nvme1n1 has few write cycles left, change it."
  • "There are 3 reallocated sectors, disk /dev/sdx is about to break."
  • "Disk /dev/mmcblk0 has USB connection errors, change it."
  • "Found SATA connection errors, check SATA/power cables."
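For the log-based messages, the following Python sketch shows roughly how kernel log lines could be mapped to warnings like the ones above. The regular expressions are illustrative guesses at typical kernel messages (block I/O errors, ATA link resets, USB and MMC error codes), not a complete or verified list, and `RULES` and `scan_log` are hypothetical names of my own.

```python
import re

# (pattern, message template) pairs; the patterns are assumptions about
# what typical kernel messages look like, not an exhaustive catalogue.
RULES = [
    (re.compile(r"I/O error, dev (\w+)"),
     "Disk /dev/{0} has I/O errors, check or replace it."),
    (re.compile(r"(ata\d+)(?:\.\d+)?: (?:SError|hard resetting link|exception Emask)"),
     "Found SATA connection errors on {0}, check SATA/power cables."),
    (re.compile(r"usb (\S+): .*error -\d+"),
     "USB device {0} has connection errors, check the port or replace the device."),
    (re.compile(r"(mmc\d+|mmcblk\d+): .*error -\d+"),
     "SD/MMC device {0} reports errors, replace the card."),
]


def scan_log(lines):
    """Return one warning per distinct problem found in kernel log lines."""
    warnings = []
    for line in lines:
        for pattern, template in RULES:
            match = pattern.search(line)
            if match:
                message = template.format(match.group(1))
                if message not in warnings:   # report each problem only once
                    warnings.append(message)
                break                         # first matching rule wins
    return warnings
```

The input could be the output of `dmesg` or `journalctl -k`, split into lines. A real tool would also need time windows and thresholds so that a single transient error does not raise an alarm.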

The program should be open source. Is there a program, script or Ansible playbook like that?
