Table of Contents
- Identifying the failed disk
- Removing the failed disk
- Preparing the new disk
- Adding the new hard drive to the RAID1 array
Identifying the failed disk
A drive has failed in your Linux RAID1 configuration and you need to replace it. We have a dedicated server running CentOS 7 with two 2TB HDDs: /dev/sda and /dev/sdb.
Check which disk in the array has failed with the following command:
# cat /proc/mdstat
We have three active arrays:
/dev/md125 - /boot
/dev/md126 - swap
/dev/md127 - /
To identify whether a RAID array or one of its member disks has failed, look at the string containing [UU]. Each "U" represents a healthy partition in the RAID array. If you see [UU], the array is healthy. If a "U" is missing, e.g. [U_], the RAID array is degraded or faulty.
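For reference, a degraded setup might look roughly like this (block counts are illustrative; note that md127 has lost a member and shows [U_]):

# cat /proc/mdstat
Personalities : [raid1]
md125 : active raid1 sda2[0] sdb2[1]
      1047552 blocks super 1.2 [2/2] [UU]

md126 : active raid1 sda1[0] sdb1[1]
      16760832 blocks super 1.2 [2/2] [UU]

md127 : active raid1 sda3[0]
      1936851968 blocks super 1.2 [2/1] [U_]

unused devices: <none>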
For example, the md125 array (/boot) consists of sda2 and sdb2. The most detailed information about mount points can be found with the command:
# lsblk
To get detailed information about a RAID device, pass the RAID device to the mdadm command with the --detail option:
# mdadm --detail /dev/md125
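As an illustrative fragment (field values and layout will differ between mdadm versions and systems), a degraded array such as /dev/md127 with a failed member might report something like:

# mdadm --detail /dev/md127
...
             State : clean, degraded
    Active Devices : 1
   Working Devices : 1
    Failed Devices : 1
...
    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       -       0        0        1      removed

       1       8       19        -      faulty   /dev/sdb3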
Removing the failed disk
Before we can physically remove the hard drive from the system, we must mark the failed drive's partitions as failed in every RAID array they belong to, and then remove them.
In our example, the /dev/sdb3 partition has failed, while the other two, /dev/sdb1 and /dev/sdb2, are still healthy.
Before a new drive can be installed in the RAID1 array, the damaged disk must first be removed. This has to be done for each partition:
# mdadm /dev/md125 -r /dev/sdb2
# mdadm /dev/md126 -r /dev/sdb1
# mdadm /dev/md127 -r /dev/sdb3
Suppose, for example, that only one of the three arrays on the drive is damaged: you will see the [U_] status only for /dev/md127, while the other two arrays still show [UU]. In this situation only one command is needed:
# mdadm /dev/md127 -r /dev/sdb3
However, /dev/sdb1 and /dev/sdb2 are still active members of their arrays, so trying to remove them in the same way fails with a "device or resource busy" error, as shown below.
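The error looks roughly like this (the array and partition names depend on your setup):

# mdadm /dev/md125 -r /dev/sdb2
mdadm: hot remove failed for /dev/sdb2: Device or resource busy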
To fix this and be able to remove them, first mark those partitions as failed with the following commands:
# mdadm --manage /dev/md125 --fail /dev/sdb2
# mdadm --manage /dev/md126 --fail /dev/sdb1
This changes the status of those arrays to [U_]. Then try again to remove the partitions of the damaged disk.
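As a side note, mdadm can also combine both steps in a single invocation, for example (adjust the array and partition names to your case):

# mdadm --manage /dev/md125 --fail /dev/sdb2 --remove /dev/sdb2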
Let's check each array and confirm that the damaged drive's partitions have finally been removed from all of them:
# mdadm --detail /dev/md125
# mdadm --detail /dev/md126
# mdadm --detail /dev/md127
# cat /proc/mdstat
Now the damaged drive can be replaced with a new one. Please submit a disk replacement request to our ticket system and agree on a time for the work with a technician.
P.S. The server must be shut down for a while!
Preparing the new disk
The newly installed disk is empty, so it does not yet have the partition layout needed for the RAID configuration. To prepare it, its partition layout must be made identical to the one on the working disk. The partition table can be copied from the working disk to the new one with the following utilities:
GPT - sgdisk
MBR - sfdisk
Our test server has two 2TB HDDs with GPT partition tables, so we are going to use the sgdisk utility. To see the details of the partition table, run:
# gdisk -l /dev/sda
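The output will look roughly like the fragment below (partition sizes here are illustrative); the important parts are the "GPT: present" line and the partition list that must be replicated on the new disk:

GPT fdisk (gdisk) version 0.8.10

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/sda: 3907029168 sectors, 1.8 TiB
...
Number  Start (sector)    End (sector)  Size        Code  Name
   1            2048        33556479   16.0 GiB    FD00  Linux RAID
   2        33556480        35653631   1024.0 MiB  FD00  Linux RAID
   3        35653632      3907029134   1.8 TiB     FD00  Linux RAID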
Both utilities are available from the operating system repositories; install them with your package manager. Note that sgdisk is shipped in the gdisk package, while sfdisk is part of util-linux and is normally already installed:
CentOS: # yum install gdisk
Debian/Ubuntu: # apt install gdisk
Creating and restoring MBR/GPT backups
Before writing the partition table to the new disk, create a backup of it. This can help you avoid problems if something goes wrong later.
For MBR
Create:
# sfdisk --dump /dev/sda > sda_parttable_mbr.bak
Restore:
# sfdisk /dev/sdb < sda_parttable_mbr.bak
For GPT
Create:
# sgdisk --backup=sda_parttable_gpt.bak /dev/sda
Restore:
# sgdisk --load-backup=sda_parttable_gpt.bak /dev/sdb
Here "a" (as in /dev/sda) is the disk you copy the partition table from, and "b" (as in /dev/sdb) is the disk you write the copy to.
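Optionally, since the backup/restore copies the GPT identifiers as well, both disks end up with identical disk and partition GUIDs; sgdisk can randomize them on the new disk if desired:

# sgdisk -G /dev/sdb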
Adding the new hard drive to the RAID1 array
After the failed disk has been physically replaced, copy the partition table from the first disk to the new one as described above, and then add the new partitions back to the arrays. This must be done for each partition:
# mdadm /dev/md125 -a /dev/sdb2
# mdadm /dev/md126 -a /dev/sdb1
# mdadm /dev/md127 -a /dev/sdb3
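The rebuild starts automatically once the partitions are added. You can monitor it while it runs, for example (progress figures are illustrative):

# watch cat /proc/mdstat
...
md127 : active raid1 sdb3[2] sda3[0]
      1936851968 blocks super 1.2 [2/1] [U_]
      [===>.................]  recovery = 17.4% (337920000/1936851968) finish=131.8min speed=202166K/sec
...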
When the synchronization is finished, the output of the following command shows [UU] for all arrays:
# cat /proc/mdstat
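An illustrative example of the final state (block counts will differ):

Personalities : [raid1]
md125 : active raid1 sdb2[2] sda2[0]
      1047552 blocks super 1.2 [2/2] [UU]

md126 : active raid1 sdb1[2] sda1[0]
      16760832 blocks super 1.2 [2/2] [UU]

md127 : active raid1 sdb3[2] sda3[0]
      1936851968 blocks super 1.2 [2/2] [UU]

unused devices: <none>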
Reboot the server.
Afterwards you will see the new hard drive with its partitions assembled into the arrays:
# lsblk
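An illustrative lsblk output (sizes will differ); the important point is that the partitions of both sda and sdb are members of the md arrays:

NAME      MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda         8:0    0  1.8T  0 disk
├─sda1      8:1    0   16G  0 part
│ └─md126   9:126  0   16G  0 raid1 [SWAP]
├─sda2      8:2    0    1G  0 part
│ └─md125   9:125  0    1G  0 raid1 /boot
└─sda3      8:3    0  1.8T  0 part
  └─md127   9:127  0  1.8T  0 raid1 /
sdb         8:16   0  1.8T  0 disk
├─sdb1      8:17   0   16G  0 part
│ └─md126   9:126  0   16G  0 raid1 [SWAP]
├─sdb2      8:18   0    1G  0 part
│ └─md125   9:125  0    1G  0 raid1 /boot
└─sdb3      8:19   0  1.8T  0 part
  └─md127   9:127  0  1.8T  0 raid1 /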