Search Results

Search found 303 results on 13 pages for 'raid1'.

  • Looking to replace Ghost with FSArchiver or Clonezilla, few questions about capabilities

    - by Daniel Wright
    I work for a PC repair company and we are looking into setting up a dedicated machine with externally accessible SATA bays to clone hard drives as a safety net in case something goes wrong during a repair. We currently use a SATA/PATA-to-USB bridge called MagicBridge and Norton Ghost on any workstation, but we're looking to move away from Ghost. We have a computer with a large RAID5 array with Windows Server 2008 Standard currently installed, but this can be replaced with a flavour of *nix. I have some experience with Clonezilla, but FSArchiver also seems like a suitable replacement. My Head Technician wants to know if my chosen solution (probably Clonezilla or FSArchiver, but I'm open to free suggestions) is capable of: 1) cloning a degraded RAID, such as a single drive from a RAID1 mirror, without complaining; 2) producing images that are easily mountable (he'd prefer them to be mountable in Windows, but if there is no other easy way, *nix should be fine), akin to Ghost Explorer, so individual files can be restored as well as bare-metal restores being possible. My apologies for the wordiness, but I wanted to be thorough in my explanation. Thanks for any suggestions or tips :) EDIT: I've just found out that Clonezilla has a workaround for cloning RAID1 drives. EDIT2: Found the answer to both of my questions; apparently I wasn't phrasing my searches right. Could this question be deleted, please?
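
    On the mountable-images point, note that a single member of a RAID1 mirror can usually be imaged and inspected with plain dd and losetup, whatever cloning tool ends up on the machine. A minimal sketch, assuming hypothetical device and file names, and a member with its md superblock at the end (0.90/1.0 metadata; a 1.2 superblock sits near the start, so the filesystem would then begin at an offset):

        # image one half of a RAID1 mirror
        dd if=/dev/sdc1 of=/srv/images/job1234.img bs=4M conv=noerror,sync
        # attach the image read-only and mount it to pull out individual files
        losetup --find --show --read-only /srv/images/job1234.img
        mount -o ro /dev/loop0 /mnt/recovered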

  • hp proliant dl360 disk diagnostic issue

    - by user1039384
    We recently got two used drives (15,000 RPM) and installed them in our HP ProLiant DL360 G5 server. We created a RAID1 and used the HP SmartStart CD to perform diagnostics. Interestingly, the Diagnostic tab immediately fails the logical drive test, saying Disk1 should be replaced, while the Test tab successfully runs all the complete tests on both disks and does not find any issue. In the meantime, when booting to ESXi 5, vSphere periodically shows Disk1 as Unknown and the logical drive as in recovery; this happens every 5-10 minutes. Here is the log from the HP SmartStart diagnostic:

        1 - Device, Test: Logical Drive 1, Storage Controller in Slot 0
        1 - Description: The controller has reported a critical error in the drive error log.
        1 - Recommended Repair: This drive should be replaced.
        1 - Failed Count: 44
        1 - Error code: F157

    There is also another error log record (see below):

        2 - Device, Test: test_components/libstorage.so ID
        2 - Description: An unexpected exception occurred while performing an operation.
            Exception message: CISS_StatusHandler::evaluate: commandStatus = 4 (INVALID); hexdump of CISS_ErrorInfo:
            00000000: __ __ 04 __ 20 __ __ __ __ __ __ __ __ __ __ __ .... ... ........
            00000010: __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ ........ ........
            00000020: __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ __ ........ ........
            Device: Hard Drive 2, Storage Controller in Slot 0
            Property name: Bad Target Count
        2 - Recommended Repair: Reboot or restart Insight Diagnostics. Retry the test. If the problem persists, upgrade to the latest version of Insight Diagnostics.
        2 - Failed Count: 48
        2 - Error Code: F62

    Note that rebooting didn't help and I was running the latest diagnostic software version. Anyone have a clue? Is this a real disk issue? BTW, the controller is a Smart Array E200i. Thanks in advance.

  • Homebrew large data cluster access for 2 user levels?

    - by Yegor
    The title probably makes little sense, so here is an example. I have a file hosting site that serves a large amount of semi-randomly accessed files. The setup is as follows: a high-horsepower front-end + DB server that also does encoding for files that need it; a fresh-file server, which stores newly uploaded content that is probably (and usually) rapidly accessed, with 500GB of RAIDed SSD storage that can push over 3Gbit of traffic; and 3 cheap node servers, each containing 2 x 750GB SATA drives in RAID1, to which files older than 2 weeks are archived from the SSD server mentioned above. Files on each server are accessed via subdomains (via modsec) in a straightforward fashion (server1.domain.com, server2.domain.com, etc.). Where I have the problem is this: I introduced a "premium" service where people pay a small fee every month and get ad-free, quick access to stuff on the site. Once they are logged in, they access the same files via premium.server1.domain.com through a different modsec script with a different passphrase. That all works fine and dandy... except that the cheap node servers are all IO-bound, so accessing the files on them via a different, unsaturated network makes no difference, since they cannot read off the drive fast enough. What would be a good way to make files on the site accessible via 2 different network routes, one of which will be saturated (the "free" network), while all other files are on an unsaturated "premium" network?

  • Degraded RAID-5 array with lvm2 lost superblock and partition table

    - by Fred Phillips
    I have a RAID-5 array of 4x1TB hard disks with one lvm2 partition on Ubuntu Linux 10.04 LTS. One of the disks has failed. I have re-assembled the array without this failed disk, but now mdadm --examine claims the array has no superblock and fdisk says it has no partition table. What can I do to recover the data?

        # mdadm -D /dev/md0
        /dev/md0:
                Version : 1.2
          Creation Time : Sat Mar  5 14:43:49 2011
             Raid Level : raid5
             Array Size : 2930276352 (2794.53 GiB 3000.60 GB)
          Used Dev Size : 976758784 (931.51 GiB 1000.20 GB)
           Raid Devices : 4
          Total Devices : 4
            Persistence : Superblock is persistent
            Update Time : Sat Mar  5 15:06:49 2011
                  State : clean, degraded
         Active Devices : 3
        Working Devices : 3
         Failed Devices : 1
          Spare Devices : 0
                 Layout : left-symmetric
             Chunk Size : 512K
                   Name : boba:1  (local to host boba)
                   UUID : 52eb4bc9:c3d8aab5:e0699505:e0e1aa05
                 Events : 18

            Number   Major   Minor   RaidDevice State
               0       8        1        0      active sync   /dev/sda1
               1       8       65        1      active sync   /dev/sde1
               2       8       49        2      active sync   /dev/sdd1
               3       0        0        3      removed
               4       8       17        -      faulty spare  /dev/sdb1

        # mdadm --examine /dev/md0
        mdadm: No md superblock detected on /dev/md0.

        # fdisk -l /dev/md0
        Disk /dev/md0: 3000.6 GB, 3000602984448 bytes
        2 heads, 4 sectors/track, 732569088 cylinders
        Units = cylinders of 8 * 512 = 4096 bytes
        Sector size (logical/physical): 512 bytes / 512 bytes
        I/O size (minimum/optimal): 524288 bytes / 1572864 bytes
        Disk identifier: 0x00000000

        Disk /dev/md0 doesn't contain a valid partition table

        # cat /proc/mdstat
        Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
        md0 : active raid5 sdb1[4](F) sda1[0] sdd1[2] sde1[1]
              2930276352 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]

        unused devices: <none>
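
    Both of those results are arguably red herrings: mdadm --examine reads the superblock of a member device, not of an assembled array, and a whole-device LVM physical volume carries no MBR partition table, so fdisk has nothing to show. A hedged, read-only first step (the volume group and logical volume names are hypothetical):

        # examine a member partition, not the assembled array
        mdadm --examine /dev/sda1
        # look for the LVM physical volume and volume group on the array
        pvscan
        vgscan
        # activate the logical volumes and mount read-only
        vgchange -ay
        mount -o ro /dev/mapper/myvg-mylv /mnt/recovery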

  • Cannot load from raid with grub

    - by Andrew Answer
    I have a RAID1 array on my Ubuntu 12.04 LTS machine, and my /dev/sda HDD was replaced several days ago. I used these commands to replace it:

        # go to superuser
        sudo bash
        # see RAID state; should be "clean, degraded"
        mdadm -Q -D /dev/md0
        # remove the broken disk from the RAID
        mdadm /dev/md0 --fail /dev/sda1
        mdadm /dev/md0 --remove /dev/sda1
        # see partitions
        fdisk -l
        # shut down the computer
        shutdown now
        # physically replace the old disk with the new one, start the system again
        # see partitions
        fdisk -l
        # copy the partition table from sdb to sda
        sfdisk -d /dev/sdb | sfdisk /dev/sda
        # set the partition type for sda1
        sfdisk --change-id /dev/sda 1 fd
        # add sda1 to the RAID
        mdadm /dev/md0 --add /dev/sda1
        # see RAID state; should be "clean, degraded, recovering"
        mdadm -Q -D /dev/md0
        # to watch the rebuild status
        cat /proc/mdstat

    After the rebuild completed, "fdisk -l" says that /dev/md0 has no valid partition table. So: 1) "update-grub" finds only the Linux installs on sda and sdb, not /dev/md0; 2) "dpkg-reconfigure grub-pc" says "GRUB failed to install the following devices: /dev/md0". I cannot boot my system except from /dev/sdb1 or /dev/sda1, and then only in DEGRADED mode... This is my partial fdisk -l output:

        Disk /dev/sdb: 500.1 GB, 500107862016 bytes
        255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
        Units = sectors of 1 * 512 = 512 bytes
        Sector size (logical/physical): 512 bytes / 512 bytes
        I/O size (minimum/optimal): 512 bytes / 512 bytes
        Disk identifier: 0x000667ca

           Device Boot      Start         End      Blocks   Id  System
        /dev/sdb1   *          63   940910984   470455461   fd  Linux raid autodetect
        /dev/sdb2       940910985   976768064    17928540    5  Extended
        /dev/sdb5       940911048   976768064    17928508+  82  Linux swap / Solaris

        Disk /dev/md0: 481.7 GB, 481746288640 bytes
        2 heads, 4 sectors/track, 117613840 cylinders, total 940910720 sectors
        Units = sectors of 1 * 512 = 512 bytes
        Sector size (logical/physical): 512 bytes / 512 bytes
        I/O size (minimum/optimal): 512 bytes / 512 bytes
        Disk identifier: 0x00000000

        Disk /dev/md0 doesn't contain a valid partition table

    Can anybody resolve this issue? I have a big headache with this.
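
    Since /dev/md0 holds the filesystem directly, the missing partition table on it is normal; GRUB itself belongs in the MBR of each physical disk, not on the md device. A hedged sketch of reinstalling it on both halves of the mirror so that either disk can boot:

        # install GRUB to the MBR of both mirror members
        grub-install /dev/sda
        grub-install /dev/sdb
        # regenerate grub.cfg (should pick up the root filesystem on /dev/md0)
        update-grub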

  • My client's solution of a Windows SBS 2011 VM on an Ubuntu host and VirtualBox is pinning the host CPU

    - by Scott Stamp
    Here's my situation: I've got a client hosting two servers (one of them a VM), with the host providing VMware Zimbra and the guest running Windows Small Business Server 2011. Unfortunately, the person before me configured this setup as follows. Host: Ubuntu Desktop Edition 10.04 (I know; again, not my choice) running VMware Zimbra; 8GB of RAM; on-board RAID1 of two 320GB Seagate Barracuda drives for the OS; software RAID5 of four 500GB WD Caviar Black drives on mdadm for bulk storage (sorry, I don't know the model #); a relatively competent quad-core Intel Core i7 CPU from the Nehalem architecture (I'm not suspicious of this as the bottleneck). Guest: Windows Small Business Server 2011; 4GB of RAM; host-equivalent CPU allocation; a VDI file for the OS and a VDI file for storage, both hosted on the on-board RAID. For some reason, when running, the VM locks up while sitting nearly idle, and the VirtualBox process reports values of 240%+ in top (how is that even possible?!). Anyone have any ideas or suggestions? I'm totally stumped on this one. Happy to provide whatever logs you'd like to take a look at. Ideally I'd drop VirtualBox and provision this with VMware Workstation, but the client has objected to the (very nominal) costs involved. If hardware needs to be purchased to help, it will be, but we're considering upgrades a last resort at this time. Thanks in advance! *fingers crossed*

  • How to organise storage for media content such as video and music?

    - by thor
    Currently, we have a single server hosting all content: music, video and software. This content is downloaded by users over HTTP. Now free space is coming to an end and we are exploring ways of extending our storage capacity. We want to do it cheaply, simply and reliably (protected from disk/server faults). Currently, we see two ways: 1) Add a couple of cheap servers with 4 disks each (RAID1?) and run some distributed file system on top, like GlusterFS. Pros: hopefully we will see all our disks as a single flat file system, just dump content into it and be done. Cons: could be tricky to configure and to handle faults. 2) Add a couple of cheap servers, all running HTTP servers. Each piece of content (be it a music file or video) is placed on two randomly selected servers. Pros: no need to deal with RAID, as content is duplicated; a single server failure does not bring down any part of the content; doubled distribution capacity (as any single file can be downloaded from either of the two servers hosting it). Cons: requires some scripting for distributing content and adding/removing servers. Do we miss any other ways? Which of the aforementioned options seems to be the best?
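
    For option 2, the placement logic can stay very small. A rough shell sketch (hostnames and paths are hypothetical) that picks two distinct servers at random and copies the file to both:

        #!/bin/sh
        # place the given file on two randomly chosen content servers
        FILE="$1"
        SERVERS="server1.domain.com server2.domain.com server3.domain.com"
        # shuffle the server list and keep the first two entries
        PICKED=$(echo $SERVERS | tr ' ' '\n' | shuf | head -n 2)
        for HOST in $PICKED; do
            rsync -a "$FILE" "$HOST:/var/content/"
        done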

  • How do I reinitialise a failed RAID 5 drive using terminal on Ubuntu Server

    - by Stephen
    I've recently put together a new system, and part of that has been creating a software RAID 5 using 'mdadm' in Ubuntu Server. I successfully got to the point where I created the array using:

        sudo mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

    I left it to do its thing overnight, then used the following command to check on it:

        watch cat /proc/mdstat

    To which the following was returned:

        Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
        md0 : active raid5 sdd1[4](S) sdc1[2] sdb1[1] sda1[0](F)
              5860535808 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/2] [_UU_]

        unused devices: <none>

    It appears that one drive has failed (and I'm not too savvy with why another is a spare). So, just to be sure that something else isn't amiss, I wanted to try and re-engage the failed drive. Can someone explain how I can do that and what I should do with the spare (if anything)? And also, how do I know when synchronisation is complete? The tutorial I used to get this far is located here: http://sonniesedge.co.uk/2009/06/13/software-raid-5-on-ubuntu-904/ Many thanks! p.s. Here is some extra information that may help:

        sudo mdadm --detail /dev/md0
        /dev/md0:
                Version : 1.2
          Creation Time : Mon Jun 18 21:14:21 2012
             Raid Level : raid5
             Array Size : 5860535808 (5589.04 GiB 6001.19 GB)
          Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
           Raid Devices : 4
          Total Devices : 4
            Persistence : Superblock is persistent
            Update Time : Mon Jun 18 21:50:26 2012
                  State : clean, FAILED
         Active Devices : 2
        Working Devices : 3
         Failed Devices : 1
          Spare Devices : 1
                 Layout : left-symmetric
             Chunk Size : 512K
                   Name : myraidbox:0  (local to host myraidbox)
                   UUID : a269ee94:a161600c:fb1665e7:bd2f27b3
                 Events : 13

            Number   Major   Minor   RaidDevice State
               0       0        0        0      removed
               1       8       17        1      active sync   /dev/sdb1
               2       8       33        2      active sync   /dev/sdc1
               3       0        0        3      removed

               0       8        1        -      faulty spare   /dev/sda1
               4       8       49        -      spare   /dev/sdd1
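
    A hedged sketch of the usual recovery steps (sda1 is the member marked faulty above; the spare sdd1 should normally be pulled into a rebuild automatically once enough members are healthy):

        # drop the faulty member from the array, then offer it back
        sudo mdadm /dev/md0 --remove /dev/sda1
        sudo mdadm /dev/md0 --re-add /dev/sda1    # if --re-add is refused, try --add
        # watch the resync; it is done when [_UU_] becomes [UUUU]
        watch cat /proc/mdstat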

  • How to get data out of a Maxtor Shared Storage II that fails to boot?

    - by Jonik
    I've got a Maxtor Shared Storage II (in RAID1 mode) which has apparently developed a hardware failure: it fails to boot properly and is unreachable via the network. When powering it on, it keeps making clunking/chirping disk noises and then sort of resets itself (with a flash of orange light in the usually-green LEDs); it then repeats this as if stuck in a loop. In fact, even the power button does nothing now; the only way I can affect the device at all is to plug in or pull out the power cord! (To be clear, I've come to regard this piece of garbage (which cost about 460 €) as my worst tech purchase ever. Even before this failure I had encountered many annoyances with the drive: 1) the software to manage it is rather crappy; 2) it is way noisier than a device of this type should be; 3) when your Mac comes out of sleep, Maxtor's "EasyManage" cannot re-mount the drive automatically.) Anyway, the question at hand is how to get my data out of it. As a very concrete first step, is there a way to open this thing without breaking the plastic casing into pieces? It is far from obvious to me how to get beyond this stage; it opens a little from one end but not from the other. If I somehow got the disks out, I could try mounting them on one of the Macs or Linux boxes I have available (although I don't know yet if I'd need some adapters for that). (NB: for the purposes of this question, never mind any warranty or replacement issues; that's secondary to recovering the data.)

  • 100% CPU load on Ubuntu 10.04.3 LTS 64bit

    - by deadtired
    I have spent 2 days trying to fix this issue, with no success. The server is a MySQL database server. Hardware: Dell PowerEdge 1950, 2x Intel Xeon quad-core E5345 @ 2.33GHz, 16GB RAM, 2x 146GB SAS (software RAID1). Software: Ubuntu 10.04.3 LTS, MySQL 5.1.41. Issue: while MySQL is not used and runs with no database, everything seems all right. As soon as I install a database, something brings all 8 cores to 100% with low memory consumption. So, as you can imagine, the load average goes high (I saw a load average of 212 for the first time). The server doesn't become unresponsive, but you can see it's slow while browsing the project installed on it. Additional info: the database used is no more than 24MB, and it was moved from a server with fewer resources and considerably larger databases, so it's not the database/project. my.cnf is not the reason either, as I used both the default one and the one I use on the same distribution on another server. What is interesting is that MySQL doesn't close any processes and runs up to the limit of max_connections. The logs are quiet; nothing there. I switched to this Ubuntu version after I suspected some problems on the newly installed Ubuntu 11.10 server. That one worked all right for an hour after I made a kernel upgrade to 3.0.1 (it was eating the memory too). I tested disk speed and it seems all right. Some more output from the running server: dstat -cndymlp -N total -D total 3 and htop (the screenshots attached to the original post are not reproduced here). Ideas? Did anyone meet the same problem? Any fix you can think of?
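
    A hedged first step is to look at what the busy threads are actually doing rather than at the host metrics (credentials are placeholders):

        # list every running statement and its state
        mysqladmin -u root -p --verbose processlist
        # or, from a shell, check what InnoDB is waiting on
        mysql -u root -p -e 'SHOW ENGINE INNODB STATUS\G'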

  • Recovering ZFS pool with errors on import.

    - by Sqeaky
    I have a machine that had some trouble with some bad RAM. After I diagnosed it and removed the offending stick of RAM, the ZFS pool in the machine was trying to access drives using incorrect device names. I simply exported the pool and re-imported it to correct this. However, I am now getting this error; the pool Storage no longer mounts automatically:

        sqeaky@sqeaky-media-server:/$ sudo zpool status
        no pools available

    A regular import says it's corrupt:

        sqeaky@sqeaky-media-server:/$ sudo zpool import
          pool: Storage
            id: 13247750448079582452
         state: UNAVAIL
        status: The pool is formatted using an older on-disk version.
        action: The pool cannot be imported due to damaged devices or data.
        config:

                Storage                 UNAVAIL  insufficient replicas
                  raidz1                UNAVAIL  corrupted data
                    805066522130738790  ONLINE
                    sdd3                ONLINE
                    sda3                ONLINE
                    sdc                 ONLINE

    A specific import says the vdev configuration is invalid:

        sqeaky@sqeaky-media-server:/$ sudo zpool import Storage
        cannot import 'Storage': invalid vdev configuration

    I should have 4 devices in my ZFS pool: /dev/sda3, /dev/sdd3, /dev/sdc and /dev/sdb. I have no clue what 805066522130738790 is, but I plan on investigating further. I am also trying to figure out how to use zdb to get more information about what the pool thinks is going on. For reference, this was set up this way because, at the time this machine/pool was set up, it needed certain Linux features and booting from ZFS wasn't yet supported on Linux. The partitions sda1 and sdd1 are in a RAID1 for the operating system, and sdd2 and sda2 are in a RAID1 for swap. Any clue on how to recover this ZFS pool?
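
    The long number is how ZFS refers to a vdev it can no longer match to a device, so a reasonable read-only first step is to dump the labels on each candidate disk and compare the GUIDs (device names as listed above):

        # print the ZFS labels on each suspected member
        sudo zdb -l /dev/sdb
        sudo zdb -l /dev/sda3
        # then retry the import by id, telling ZFS which directory to scan
        sudo zpool import -d /dev 13247750448079582452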

  • How to display/define Mirror/Striping pairs with mdadm

    - by Chris
    I want to make a standard Linux software RAID10 over 4 HDDs. The server has 4 HDDs, 2 pairs from different vendors in order to avoid batch problems. I want each mirror to span two different vendors, with the stripe running over the mirror pairs. I could do that by manually creating RAID1/0, but mdadm supports RAID level 10 directly. I just can't figure out how the RAID10 is then handled and how the data is distributed.

        mdadm --detail /dev/md10
        /dev/md10:
                Version : 1.2
          Creation Time : Wed May 28 11:06:23 2014
             Raid Level : raid10
             Array Size : 1953260544 (1862.77 GiB 2000.14 GB)
          Used Dev Size : 976630272 (931.39 GiB 1000.07 GB)
           Raid Devices : 4
          Total Devices : 4
            Persistence : Superblock is persistent
            Update Time : Wed May 28 11:06:23 2014
                  State : clean, resyncing (PENDING)
         Active Devices : 4
        Working Devices : 4
         Failed Devices : 0
          Spare Devices : 0
                 Layout : near=2
             Chunk Size : 512K
                   Name : pdwhost:10  (local to host pdwhost)
                   UUID : a3de0ad5:9e694ee1:addc6786:c4449e40
                 Events : 0

            Number   Major   Minor   RaidDevice State
               0       8        1        0      active sync   /dev/sda1
               1       8       81        1      active sync   /dev/sdf1
               2       8       97        2      active sync   /dev/sdg1
               3       8      113        3      active sync   /dev/sdh1

    This does not really give any information about that. How it should be: RAID1/mirror over /dev/sda1 + /dev/sdf1 and over /dev/sdg1 + /dev/sdh1, with RAID0 over the two RAID1 pairs. Is it possible to do that with the built-in "level=10", and how can I see which pairs are mirrored? Thanks a lot for your help.
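
    With the near=2 layout that mdadm uses by default for RAID10, copies go to adjacent devices in the order given on the command line: RaidDevice 0+1 form one mirror pair and 2+3 the other. So, assuming sda1/sdg1 come from one vendor and sdf1/sdh1 from the other, listing them interleaved should give cross-vendor mirrors:

        # near=2: devices are paired in listed order (0+1, 2+3)
        mdadm --create /dev/md10 --level=10 --layout=n2 --raid-devices=4 \
            /dev/sda1 /dev/sdf1 /dev/sdg1 /dev/sdh1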

  • Software RAID 1 Configuration

    - by Corve
    I created a software RAID 1 quite a while ago, and it always seemed to work for me. However, I am not completely sure that I configured everything right, and I don't have the experience to check, so I would be very grateful for some advice, or just verification that all seems right so far. I am using Linux Fedora 20 (32-bit, with plans to upgrade to 64-bit). The RAID 1 should consist of two 1TB SATA hard drives. This is the output of mdadm --detail /dev/md0:

        /dev/md0:
                Version : 1.2
          Creation Time : Sun Jan 29 11:25:18 2012
             Raid Level : raid1
             Array Size : 976761424 (931.51 GiB 1000.20 GB)
          Used Dev Size : 976761424 (931.51 GiB 1000.20 GB)
           Raid Devices : 2
          Total Devices : 1
            Persistence : Superblock is persistent
            Update Time : Sat Jun  7 10:38:09 2014
                  State : clean, degraded
         Active Devices : 1
        Working Devices : 1
         Failed Devices : 0
          Spare Devices : 0
                   Name : argo:0  (local to host argo)
                   UUID : 1596d0a1:5806e590:c56d0b27:765e3220
                 Events : 996387

            Number   Major   Minor   RaidDevice State
               0       0        0        0      removed
               1       8        0        1      active sync   /dev/sda

    The RAID is mounted successfully:

        friedrich@argo:~ ? sudo mount -l | grep md0
        /dev/md0 on /mnt/raid type ext4 (rw,relatime,data=ordered)

    Basically my questions are: Why do I only have 1 active device? What does the State "removed" at the bottom mean? Also, I noticed some strange error messages on the console at system start and shutdown, always repeating in the background when I switch with Ctrl + Alt + F2:

        ...
        ata2: irq_stat 0x00000040 connection status changed
        ata2: SError: { CommWake DevExch }
        ata2: COMRESET failed (errno=-32)
        ata2: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
        ata2: irq_stat 0x00000040 connection status changed
        ata2: SError: { CommWake DevExch }
        ata2: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
        ...

    Are these errors related to the RAID? Something seems wrong with the SATA devices. All together the system works (I can read and write to the mounted RAID), but I have always had these strange errors on startup and shutdown (probably always in the background). Thanks for your help.
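
    Those two observations likely fit together: the array only sees one of its two members (the other slot shows "removed"), and the ata2 resets suggest the second disk keeps dropping off its SATA link, so the mirror has been running degraded. A hedged sketch for once the cabling/disk behind ata2 is sorted out (the second drive's name is a guess):

        # confirm which block devices the kernel currently sees
        cat /proc/mdstat
        lsblk
        # then re-attach the missing member to the mirror and let it resync
        sudo mdadm /dev/md0 --add /dev/sdb
        watch cat /proc/mdstat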

  • System Center 2012 VMM UI is very slow

    - by Grant
    I've recently set up System Center 2012 on a new Server 2008 R2 machine which I'm using for virtual machines. Everything seems to be working fine, and the virtual machines are nice and fast, but the Virtual Machine Manager interface is always excruciatingly slow, sometimes taking up to 15 seconds to move between screens. It's very frustrating to use when a task that should just involve a couple of clicks ends up taking several minutes. Pages that have a lot of form fields seem to take the longest to load, such as the page for changing a virtual machine's hardware settings. Is this just normal performance for VMM? If not, where can I look to find what is slowing it down? Nothing else on the system seems to suffer: I can load and use Hyper-V Manager with no noticeable slowness, and even programs like Event Viewer that are usually rather slow seem to load fairly fast. Only the System Center programs seem slow. The server is a Dell R710, 2x 16-core Opteron 6274 processors, 96GB RAM. The OS drive is 2x 500GB 7.2k RPM SAS drives in RAID1 (I opted for the less expensive 7.2k drives since pretty much everything is stored on the SAN). Am I just being impatient? Does anyone else use VMM 2012 and find it slow?

  • Removing a device in "removed" state from Linux software RAID array

    - by Sahasranaman MS
    My workstation has two disks (/dev/sd[ab]), both with similar partitioning. /dev/sdb failed, and cat /proc/mdstat stopped showing the second sdb partition. I ran mdadm --fail and mdadm --remove for all partitions from the failed disk on the arrays that use them, although all such commands failed with:

        mdadm: set device faulty failed for /dev/sdb2: No such device
        mdadm: hot remove failed for /dev/sdb2: No such device or address

    Then I hot-swapped the failed disk, partitioned the new disk and added the partitions to the respective arrays. All arrays got rebuilt properly except one, because in /dev/md2 the failed disk doesn't seem to have been removed from the array properly. Because of this, the new partition keeps getting added as a spare to the array, and its status remains degraded. Here's what mdadm --detail /dev/md2 shows:

        [root@ldmohanr ~]# mdadm --detail /dev/md2
        /dev/md2:
                Version : 1.1
          Creation Time : Tue Dec 27 22:55:14 2011
             Raid Level : raid1
             Array Size : 52427708 (50.00 GiB 53.69 GB)
          Used Dev Size : 52427708 (50.00 GiB 53.69 GB)
           Raid Devices : 2
          Total Devices : 2
            Persistence : Superblock is persistent
          Intent Bitmap : Internal
            Update Time : Fri Nov 23 14:59:56 2012
                  State : active, degraded
         Active Devices : 1
        Working Devices : 2
         Failed Devices : 0
          Spare Devices : 1
                   Name : ldmohanr.net:2  (local to host ldmohanr.net)
                   UUID : 4483f95d:e485207a:b43c9af2:c37c6df1
                 Events : 5912611

            Number   Major   Minor   RaidDevice State
               0       8        2        0      active sync   /dev/sda2
               1       0        0        1      removed

               2       8       18        -      spare   /dev/sdb2

    To remove a disk, mdadm needs a device filename, which was /dev/sdb2 originally, but that no longer refers to device number 1. I need help with removing device number 1 with 'removed' status and making /dev/sdb2 active.
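
    For exactly this case, where the stale member no longer has a device node, mdadm accepts the keywords failed and detached in place of a filename. A hedged sketch:

        # drop any array members whose device node has gone away
        mdadm /dev/md2 --remove detached
        # if sdb2 still sits there as a spare, cycle it back in
        mdadm /dev/md2 --remove /dev/sdb2
        mdadm /dev/md2 --add /dev/sdb2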

  • How to diagnose an issue between mobo, RAID, and SSD cache drive?

    - by goober
    Background: This issue is happening on my custom-built desktop. Relevant specs: motherboard: ASUS P8Z68-V PRO, utilizing Intel RST (an application that uses an unused SSD as cache); processor: Intel Core i7-2600K (not overclocked); HDDs: RAID1 of 2x Seagate Barracuda 1TB (ST31000524AS), with the RAID performed via the Z68 chipset. The machine has run fine for ~1 year with no issues and has been well maintained (dust, etc.).
    What happened: Intermittent random freezes. I looked at the RST application screen to see that the acceleration cache was listed as "unavailable", with a recommendation to power down and reconnect the drive. Now the freeze happens whenever something particularly data-driven is loading (a video, a game, etc.).
    Steps attempted: Reconnected the drive, to no avail. Updated the Intel RST software to v11.6.0.1030 to see if that made a difference. Attempted to move the drive to another SATA port; the acceleration option disappeared from the RST software. Connected the drive as its own volume, formatted it, and ran disk checks; all seems fine. Reconnected the drive and selected it again as the cache drive. Now, what happens when there is a freeze: the machine freezes and I am unable to perform any command; the screen then goes black; I hit the reset button; during boot, all drives show as "Disabled" and I am told no volume can be found; I hit the reset button (or power off/on) again; either the next time, or sometimes after repeating this once more, the metadata cache is reconstructed and the system boots fine, showing the SSD as a cache.
    Question: I believe this is an issue with the SSD itself, but how can I be sure, since connecting it separately appeared to show no problems? I want to make sure it's not an issue with the motherboard, SATA ports, etc.

  • Partition is missing in /dev

    - by haimg
    I'm having a strange problem since I moved from CentOS 5 to CentOS 6. I have three disks; the first two are used as a RAID1, and the third is a stand-alone backup disk that is not listed in /etc/fstab (it is mounted when needed and then unmounted). My problem: after a boot, /dev/sdc exists but /dev/sdc1 does not. The links in /dev/disk are also absent for the first partition of sdc. The disk itself is fine, and if I hot-remove it and plug it back in, /dev/sdc1 appears and everything works. My question: what subsystem manages auto-discovery of disks, partitions, etc. during the boot process (e.g. what creates /dev/disk/by-label)? How do I configure it to scan /dev/sdc too and create all relevant files and links in /dev? Edit: Here's the relevant part of the dmesg output (the only place sdc appears). It does list sdc1, but it's not in /dev!

        sd 1:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
        sd 3:0:0:0: [sdc] 976773168 512-byte logical blocks: (500 GB/465 GiB)
        sd 1:0:0:0: [sdb] Write Protect is off
        sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
        sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
        sd 3:0:0:0: [sdc] Write Protect is off
        sd 3:0:0:0: [sdc] Mode Sense: 00 3a 00 00
        sd 3:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
        sdb: sdc:
        sd 0:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
        sd 0:0:0:0: [sda] Write Protect is off
        sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
        sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
        sda:
        DMAR:[DMA Read] Request device [00:1e.0] fault addr 361bc000
        DMAR:[fault reason 06] PTE Read access is not set
        sdb1 sdb2 sdb3 sdc1 sda1
        sd 1:0:0:0: [sdb] Attached SCSI disk
        sd 3:0:0:0: [sdc] Attached SCSI disk
        sda2 sda3
        sd 0:0:0:0: [sda] Attached SCSI disk
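
    Partition nodes are created by the kernel's partition scan plus udev, so a hedged way to poke them without hot-plugging the disk:

        # ask the kernel to re-read sdc's partition table
        partprobe /dev/sdc
        # or replay the udev "add" events that build the /dev/disk/by-* links
        udevadm trigger --action=add --sysname-match='sdc*'
        udevadm settle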

  • Replicated filesystem and EC2 MySQL

    - by El Yobo
    I'm currently investigating migrating our infrastructure over to run on Amazon's EC2 and am trying to figure out the best way to set up a MySQL service. I'm leaning towards running our own MySQL instances rather than going with Amazon's RDS, but am still considering the best approach for performance and cost on the instance itself. In order to have persistent data, the MySQL data needs to be on an EBS volume (with some form of striped RAID, e.g. RAID0 or RAID10) to improve performance. However, EBS IO is limited by the network interface (gigabit, so a theoretical maximum of 128 MB/s), while the ephemeral volumes have no such problem. I did see a suggestion for running two MySQL servers on an instance, with a master running on the ephemeral disk (which we would also RAID) and a slave storing changes to an EBS volume, but this has additional overhead and complexity (two servers). What I was imagining is using some form of replicated file system, such that I could have: a filesystem on top of a RAID0 of ephemeral volumes to maximise performance; all changes from the above immediately replicated to another RAID1 volume backed by multiple EBS volumes, to ensure no data loss. The advantages of this would be: the best possible IO performance for the DB server, with no network delay in IO; decreased IO on the EBS volumes (as all read IO will be done on the ephemeral volumes), so decreased cost; good data security, as it's backed onto redundant EBS volumes. However, I haven't seen an appropriate system to replicate all changes from one volume to the other; is there a filesystem, or any other approach, which will do this? The distributed file systems, e.g. GlusterFS, DRBD etc., seem to focus on replicating disks between servers; can they be set up to do what I'm interested in here? I also haven't seen anything about others taking this approach. Do I have a solution in need of a problem here (i.e. is performance good enough, so this whole idea is redundant)? Is there some flaw in the plan?
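
    One block-level way to approximate this without a distributed filesystem is an mdadm RAID1 spanning the two tiers, with the EBS side flagged write-mostly so that reads stay on the ephemeral stripe. A hedged sketch (device names are hypothetical):

        # stripe the ephemeral disks for speed
        mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/xvdb /dev/xvdc
        # mirror the stripe against an EBS volume; reads prefer the non-write-mostly side
        mdadm --create /dev/md2 --level=1 --raid-devices=2 \
            /dev/md1 --write-mostly /dev/xvdf

    If the instance dies, the EBS half can be assembled on its own as a degraded mirror, which is what would provide the no-data-loss property.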

  • Performance-optimizing Oracle 10g on a server that is also a Tomcat JSP app server?

    - by PKHunter
    I have inherited a simple RedHat 5 64-bit platform. It has SCSI disks in RAID1, 16GB of RAM, a dual-core CPU, and Oracle 10g Release 2. This would be a decent platform for running the DB only, perhaps, but the same server in an "A-A mode" clustering setup (very simple) also runs Tomcat, and there are several Java servlets running on it. Sadly there is no caching platform, etc.; we only use an external CDN for some HTML caching. I am personally more familiar with web environments on the LAMPP platform (Apache, PHP, MySQL, PostgreSQL). PROBLEM: Because the server has both Tomcat JSP/Java and Oracle 10g running on the same machine, with no caching, the server goes down often, sadly. QUESTION: What are my options for improving the performance of all these different apps? Connection pooling? For example, in the PostgreSQL world we have PgBouncer, which really helps things. Does Oracle have something similar? Or is there a well-known Java-based external pooler that people use in production environments? (I'm not familiar with Java.) Any "SQL cache", as in the MySQL and PostgreSQL worlds? Any other kind of application cache, like "APC" or "eAccelerator" in the PHP world? The "OSCache" stuff from the Java world (a JSP thing I found on Google: http://onjava.com/pub/a/onjava/2005/01/05/jspcache.html?page=2)... What else? Sorry if this is a noob question. I have googled and googled, but the problem is I don't know what to google for, other than the broad general concepts above. So if not full answers, I would even appreciate basic pointers, and I am happy to JFGI myself. Thanks!

  • Oracle Linux screen freezes during installation

    - by Fearless
    I was installing Oracle Linux 6.4 on a server, and the screen suddenly froze. Here are the steps leading up to it: I put in the disk, clicked install, checked the disk (no errors), did the pre-install setup (clock, root password, host + domain name, etc.), configured two 40GB hard drives in a RAID1 array (no swap; 3100MB encrypted RAID partitions; a ~100MB ext4 partition mounting to /boot; an encrypted ext4 RAID device mounting to /), selected packages, and hit continue. The system did its short pre-install processes, then went to the main installation screen with the long status bar. The installer proceeded like always, but around package 250 out of ~1000 the screen suddenly went black, with a text cursor in the upper left corner of the screen and the mouse cursor in its previous place. Neither cursor moved, and the only thing that triggered a response was a Ctrl-Alt-Delete, which rebooted it. I have run this in VMs before without this issue. Memtest hasn't reported anything, and the media check went smoothly. The machine has run Ubuntu Server without issues before. Any ideas? I have tried booting after that, but the GRUB bootloader tries to find fd0 for some reason (I have no idea why it would search for the floppy disk). UPDATE: My server successfully installed, but won't boot up. I think that, for some reason, it is still using the old bootloader from the previous installation. Any ideas on how to fix that?

  • How to verify that a physical volume is encrypted? (Ubuntu 10.04 w/ LUKS)

    - by Bob B.
    I am very new to LUKS. During installation, I tried to set up an encrypted physical volume so that everything underneath it would be encrypted. I chose "Use as: physical volume for encryption"; the installation completed, and I have a working environment. How can I verify that the PV is indeed encrypted? I was never prompted to provide a passphrase, so I most likely missed a step somewhere. At the end of the day, I'd like whole-disk encryption if that's possible, so I don't have to worry about which parts of the file system are encrypted and which aren't. If I did miss something, do I have to start over and try again, or can it be done (relatively easily?) after the fact? I would prefer not to introduce more complexity by using TrueCrypt, etc. Environment details: the drives are md RAID1; one volume group; a standard boot LV; an encrypted swap LV using a random key (which seems to be working fine). Thank you in advance for your help. This is very much a learn-as-I-go experience.
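
    A quick, read-only way to check what is actually encrypted in the stack; never being asked for a passphrase at boot suggests only the random-key swap is encrypted, and these commands should confirm or refute that (the device and mapping names below are hypothetical):

        # list device-mapper targets; encrypted volumes show a "crypt" target
        sudo dmsetup table
        # dump the LUKS header; this fails if the volume is not LUKS-formatted
        sudo cryptsetup luksDump /dev/md1
        # for an already-open mapping, show cipher, key size and backing device
        sudo cryptsetup status cryptswap1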

  • Question marks showing in ls of directory. IO errors too.

    - by jaymoo
    Has anyone seen this before? I've got a RAID 5 mounted on my server, and for whatever reason it started showing this:

        jason@box2:/mnt/raid1/cra$ ls -alh
        ls: cannot access e6eacc985fea729b2d5bc74078632738: Input/output error
        ls: cannot access 257ad35ee0b12a714530c30dccf9210f: Input/output error
        total 0
        drwxr-xr-x 5 root root 123 2009-08-19 16:33 .
        drwxr-xr-x 3 root root  16 2009-08-14 17:15 ..
        ?????????? ? ?    ?      ?                ? 257ad35ee0b12a714530c30dccf9210f
        drwxr-xr-x 3 root root  57 2009-08-19 16:58 9c89a78e93ae6738e01136db9153361b
        ?????????? ? ?    ?      ?                ? e6eacc985fea729b2d5bc74078632738

    The md5 strings are actual directory names and not part of the error. The question marks are odd, and any directory with a question mark throws an IO error when you attempt to use/delete/etc. it. I was unable to umount the drive due to "busy". Rebooting the server "fixed" it, but it was throwing some RAID errors on shutdown. I have configured two RAID 5 arrays and both started doing this on random files. Both are using the following config:

        mkfs.xfs -l size=128m -d agcount=32
        mount -t xfs -o noatime,logbufs=8

    Nothing too fancy, but part of an optimized config for this box. We're not partitioning the drives, and that was suggested as a possible issue. Could this be the culprit?
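
    Question marks in an ls listing mean stat() failed on those entries, which on XFS usually calls for a filesystem check once the array itself is trusted. A hedged sketch (the device and mount point are hypothetical; xfs_repair must run on an unmounted filesystem):

        # find whatever is holding the mount "busy", then unmount
        fuser -vm /mnt/raid1
        umount /mnt/raid1
        # dry run first: report what would be fixed without writing
        xfs_repair -n /dev/md0
        # then the real repair
        xfs_repair /dev/md0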

  • Performance of external USB disk with ESXi5

    - by PeterMmm
    I have a new HP DL120 G7 server with ESXi 5. One VM is a Win2003 installation, and I have an external USB 2.0 drive attached via USB Controller and USB Device. I copy a 4GB file from the external USB drive to the server disk. In the VM that takes up to 10 minutes; on a native Win2003 box it takes approx. 3 minutes. I have no explanation for the difference: in either case the bottleneck is the USB connection, which is much slower than the disks (SAS, RAID1). If the USB connection in the VM were USB 1.1 rather than USB 2.0, it would take much more time. (Disk performance between server partitions on the VM is fine; see the update.) Could it be that my native box is extremely fast and the VM is the normal case??? Update: I tried passthrough, and a first run copied the same data in approx. 7 minutes; still 2 times slower than the native connection. I also did another measurement: the copy between partitions on the same VM takes 3 minutes.

  • LSI 9285-8e and Supermicro SC837E26-RJBOD1 duplicate enclosure ID and slot numbers

    - by Andy Shinn
    I am working with 2 x Supermicro SC837E26-RJBOD1 chassis connected to a single LSI 9285-8e card in a Supermicro 1U host. There are 28 drives in each chassis for a total of 56 drives in 28 RAID1 mirrors. The problem I am running in to is that there are duplicate slots for the 2 chassis (the slots list twice and only go from 0 to 27). All the drives also show the same enclosure ID (ID 36). However, MegaCLI -encinfo lists the 2 enclosures correctly (ID 36 and ID 65). My question is, why would this happen? Is there an option I am missing to use 2 enclosures effectively? This is blocking me rebuilding a drive that failed in slot 11 since I can only specify enclosure and slot as parameters to replace a drive. When I do this, it picks the wrong slot 11 (device ID 46 instead of device ID 19). Adapter #1 is the LSI 9285-8e, adapter #0 (which I removed due to space limitations) is the onboard LSI. Adapter information: Adapter #1 ============================================================================== Versions ================ Product Name : LSI MegaRAID SAS 9285-8e Serial No : SV12704804 FW Package Build: 23.1.1-0004 Mfg. Data ================ Mfg. Date : 06/30/11 Rework Date : 00/00/00 Revision No : 00A Battery FRU : N/A Image Versions in Flash: ================ BIOS Version : 5.25.00_4.11.05.00_0x05040000 WebBIOS Version : 6.1-20-e_20-Rel Preboot CLI Version: 05.01-04:#%00001 FW Version : 3.140.15-1320 NVDATA Version : 2.1106.03-0051 Boot Block Version : 2.04.00.00-0001 BOOT Version : 06.253.57.219 Pending Images in Flash ================ None PCI Info ================ Vendor Id : 1000 Device Id : 005b SubVendorId : 1000 SubDeviceId : 9285 Host Interface : PCIE ChipRevision : B0 Number of Frontend Port: 0 Device Interface : PCIE Number of Backend Port: 8 Port : Address 0 5003048000ee8e7f 1 5003048000ee8a7f 2 0000000000000000 3 0000000000000000 4 0000000000000000 5 0000000000000000 6 0000000000000000 7 0000000000000000 HW Configuration ================ SAS Address : 500605b0038f9210 BBU : Present Alarm : Present NVRAM : Present Serial Debugger : Present Memory : Present Flash : Present Memory Size : 1024MB TPM : Absent On board Expander: Absent Upgrade Key : Absent Temperature sensor for ROC : Present Temperature sensor for controller : Absent ROC temperature : 70 degree Celcius Settings ================ Current Time : 18:24:36 3/13, 2012 Predictive Fail Poll Interval : 300sec Interrupt Throttle Active Count : 16 Interrupt Throttle Completion : 50us Rebuild Rate : 30% PR Rate : 30% BGI Rate : 30% Check Consistency Rate : 30% Reconstruction Rate : 30% Cache Flush Interval : 4s Max Drives to Spinup at One Time : 2 Delay Among Spinup Groups : 12s Physical Drive Coercion Mode : Disabled Cluster Mode : Disabled Alarm : Enabled Auto Rebuild : Enabled Battery Warning : Enabled Ecc Bucket Size : 15 Ecc Bucket Leak Rate : 1440 Minutes Restore HotSpare on Insertion : Disabled Expose Enclosure Devices : Enabled Maintain PD Fail History : Enabled Host Request Reordering : Enabled Auto Detect BackPlane Enabled : SGPIO/i2c SEP Load Balance Mode : Auto Use FDE Only : No Security Key Assigned : No Security Key Failed : No Security Key Not Backedup : No Default LD PowerSave Policy : Controller Defined Maximum number of direct attached drives to spin up in 1 min : 10 Any Offline VD Cache Preserved : No Allow Boot with Preserved Cache : No Disable Online Controller Reset : No PFK in NVRAM : No Use disk activity for locate : No Capabilities ================ RAID Level Supported : RAID0, RAID1, RAID5, RAID6, 
RAID00, RAID10, RAID50, RAID60, PRL 11, PRL 11 with spanning, SRL 3 supported, PRL11-RLQ0 DDF layout with no span, PRL11-RLQ0 DDF layout with span Supported Drives : SAS, SATA Allowed Mixing: Mix in Enclosure Allowed Mix of SAS/SATA of HDD type in VD Allowed Status ================ ECC Bucket Count : 0 Limitations ================ Max Arms Per VD : 32 Max Spans Per VD : 8 Max Arrays : 128 Max Number of VDs : 64 Max Parallel Commands : 1008 Max SGE Count : 60 Max Data Transfer Size : 8192 sectors Max Strips PerIO : 42 Max LD per array : 16 Min Strip Size : 8 KB Max Strip Size : 1.0 MB Max Configurable CacheCade Size: 0 GB Current Size of CacheCade : 0 GB Current Size of FW Cache : 887 MB Device Present ================ Virtual Drives : 28 Degraded : 0 Offline : 0 Physical Devices : 59 Disks : 56 Critical Disks : 0 Failed Disks : 0 Supported Adapter Operations ================ Rebuild Rate : Yes CC Rate : Yes BGI Rate : Yes Reconstruct Rate : Yes Patrol Read Rate : Yes Alarm Control : Yes Cluster Support : No BBU : No Spanning : Yes Dedicated Hot Spare : Yes Revertible Hot Spares : Yes Foreign Config Import : Yes Self Diagnostic : Yes Allow Mixed Redundancy on Array : No Global Hot Spares : Yes Deny SCSI Passthrough : No Deny SMP Passthrough : No Deny STP Passthrough : No Support Security : No Snapshot Enabled : No Support the OCE without adding drives : Yes Support PFK : Yes Support PI : No Support Boot Time PFK Change : Yes Disable Online PFK Change : No PFK TrailTime Remaining : 0 days 0 hours Support Shield State : Yes Block SSD Write Disk Cache Change: Yes Supported VD Operations ================ Read Policy : Yes Write Policy : Yes IO Policy : Yes Access Policy : Yes Disk Cache Policy : Yes Reconstruction : Yes Deny Locate : No Deny CC : No Allow Ctrl Encryption: No Enable LDBBM : No Support Breakmirror : No Power Savings : Yes Supported PD Operations ================ Force Online : Yes Force Offline : Yes Force Rebuild : Yes Deny Force Failed : No Deny Force Good/Bad : No Deny Missing Replace : No Deny Clear : No Deny Locate : No Support Temperature : Yes Disable Copyback : No Enable JBOD : No Enable Copyback on SMART : No Enable Copyback to SSD on SMART Error : Yes Enable SSD Patrol Read : No PR Correct Unconfigured Areas : Yes Enable Spin Down of UnConfigured Drives : Yes Disable Spin Down of hot spares : No Spin Down time : 30 T10 Power State : Yes Error Counters ================ Memory Correctable Errors : 0 Memory Uncorrectable Errors : 0 Cluster Information ================ Cluster Permitted : No Cluster Active : No Default Settings ================ Phy Polarity : 0 Phy PolaritySplit : 0 Background Rate : 30 Strip Size : 64kB Flush Time : 4 seconds Write Policy : WB Read Policy : Adaptive Cache When BBU Bad : Disabled Cached IO : No SMART Mode : Mode 6 Alarm Disable : Yes Coercion Mode : None ZCR Config : Unknown Dirty LED Shows Drive Activity : No BIOS Continue on Error : No Spin Down Mode : None Allowed Device Type : SAS/SATA Mix Allow Mix in Enclosure : Yes Allow HDD SAS/SATA Mix in VD : Yes Allow SSD SAS/SATA Mix in VD : No Allow HDD/SSD Mix in VD : No Allow SATA in Cluster : No Max Chained Enclosures : 16 Disable Ctrl-R : Yes Enable Web BIOS : Yes Direct PD Mapping : No BIOS Enumerate VDs : Yes Restore Hot Spare on Insertion : No Expose Enclosure Devices : Yes Maintain PD Fail History : Yes Disable Puncturing : No Zero Based Enclosure Enumeration : No PreBoot CLI Enabled : Yes LED Show Drive Activity : Yes Cluster Disable : Yes SAS Disable : No Auto Detect BackPlane Enable 
: SGPIO/i2c SEP Use FDE Only : No Enable Led Header : No Delay during POST : 0 EnableCrashDump : No Disable Online Controller Reset : No EnableLDBBM : No Un-Certified Hard Disk Drives : Allow Treat Single span R1E as R10 : No Max LD per array : 16 Power Saving option : Don't Auto spin down Configured Drives Max power savings option is not allowed for LDs. Only T10 power conditions are to be used. Default spin down time in minutes: 30 Enable JBOD : No TTY Log In Flash : No Auto Enhanced Import : No BreakMirror RAID Support : No Disable Join Mirror : No Enable Shield State : Yes Time taken to detect CME : 60s Exit Code: 0x00 Enclosure information: # /opt/MegaRAID/MegaCli/MegaCli64 -encinfo -a1 Number of enclosures on adapter 1 -- 3 Enclosure 0: Device ID : 36 Number of Slots : 28 Number of Power Supplies : 2 Number of Fans : 3 Number of Temperature Sensors : 1 Number of Alarms : 1 Number of SIM Modules : 0 Number of Physical Drives : 28 Status : Normal Position : 1 Connector Name : Port B Enclosure type : SES VendorId is LSI CORP and Product Id is SAS2X36 VendorID and Product ID didnt match FRU Part Number : N/A Enclosure Serial Number : N/A ESM Serial Number : N/A Enclosure Zoning Mode : N/A Partner Device Id : 65 Inquiry data : Vendor Identification : LSI CORP Product Identification : SAS2X36 Product Revision Level : 0718 Vendor Specific : x36-55.7.24.1 Number of Voltage Sensors :2 Voltage Sensor :0 Voltage Sensor Status :OK Voltage Value :5020 milli volts Voltage Sensor :1 Voltage Sensor Status :OK Voltage Value :11820 milli volts Number of Power Supplies : 2 Power Supply : 0 Power Supply Status : OK Power Supply : 1 Power Supply Status : OK Number of Fans : 3 Fan : 0 Fan Speed :Low Speed Fan Status : OK Fan : 1 Fan Speed :Low Speed Fan Status : OK Fan : 2 Fan Speed :Low Speed Fan Status : OK Number of Temperature Sensors : 1 Temp Sensor : 0 Temperature : 48 Temperature Sensor Status : OK Number of Chassis : 1 Chassis : 0 Chassis Status : OK Enclosure 1: Device ID : 65 Number of Slots : 28 Number of Power Supplies : 2 Number of Fans : 3 Number of Temperature Sensors : 1 Number of Alarms : 1 Number of SIM Modules : 0 Number of Physical Drives : 28 Status : Normal Position : 1 Connector Name : Port A Enclosure type : SES VendorId is LSI CORP and Product Id is SAS2X36 VendorID and Product ID didnt match FRU Part Number : N/A Enclosure Serial Number : N/A ESM Serial Number : N/A Enclosure Zoning Mode : N/A Partner Device Id : 36 Inquiry data : Vendor Identification : LSI CORP Product Identification : SAS2X36 Product Revision Level : 0718 Vendor Specific : x36-55.7.24.1 Number of Voltage Sensors :2 Voltage Sensor :0 Voltage Sensor Status :OK Voltage Value :5020 milli volts Voltage Sensor :1 Voltage Sensor Status :OK Voltage Value :11760 milli volts Number of Power Supplies : 2 Power Supply : 0 Power Supply Status : OK Power Supply : 1 Power Supply Status : OK Number of Fans : 3 Fan : 0 Fan Speed :Low Speed Fan Status : OK Fan : 1 Fan Speed :Low Speed Fan Status : OK Fan : 2 Fan Speed :Low Speed Fan Status : OK Number of Temperature Sensors : 1 Temp Sensor : 0 Temperature : 47 Temperature Sensor Status : OK Number of Chassis : 1 Chassis : 0 Chassis Status : OK Enclosure 2: Device ID : 252 Number of Slots : 8 Number of Power Supplies : 0 Number of Fans : 0 Number of Temperature Sensors : 0 Number of Alarms : 0 Number of SIM Modules : 1 Number of Physical Drives : 0 Status : Normal Position : 1 Connector Name : Unavailable Enclosure type : SGPIO Failed in first Inquiry commnad FRU Part Number : 
N/A Enclosure Serial Number : N/A ESM Serial Number : N/A Enclosure Zoning Mode : N/A Partner Device Id : Unavailable Inquiry data : Vendor Identification : LSI Product Identification : SGPIO Product Revision Level : N/A Vendor Specific : Exit Code: 0x00 Now, notice that each slot 11 device shows an enclosure ID of 36, I think this is where the discrepancy happens. One should be 36. But the other should be on enclosure 65. Drives in slot 11: Enclosure Device ID: 36 Slot Number: 11 Drive's postion: DiskGroup: 5, Span: 0, Arm: 1 Enclosure position: 0 Device Id: 48 WWN: Sequence Number: 11 Media Error Count: 0 Other Error Count: 0 Predictive Failure Count: 0 Last Predictive Failure Event Seq Number: 0 PD Type: SATA Raw Size: 2.728 TB [0x15d50a3b0 Sectors] Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors] Coerced Size: 2.728 TB [0x15d400000 Sectors] Firmware state: Online, Spun Up Is Commissioned Spare : YES Device Firmware Level: A5C0 Shield Counter: 0 Successful diagnostics completion on : N/A SAS Address(0): 0x5003048000ee8a53 Connected Port Number: 1(path0) Inquiry Data: MJ1311YNG6YYXAHitachi HDS5C3030ALA630 MEAOA5C0 FDE Enable: Disable Secured: Unsecured Locked: Unlocked Needs EKM Attention: No Foreign State: None Device Speed: 6.0Gb/s Link Speed: 6.0Gb/s Media Type: Hard Disk Device Drive Temperature :30C (86.00 F) PI Eligibility: No Drive is formatted for PI information: No PI: No PI Drive's write cache : Disabled Drive's NCQ setting : Enabled Port-0 : Port status: Active Port's Linkspeed: 6.0Gb/s Drive has flagged a S.M.A.R.T alert : No Enclosure Device ID: 36 Slot Number: 11 Drive's postion: DiskGroup: 19, Span: 0, Arm: 1 Enclosure position: 0 Device Id: 19 WWN: Sequence Number: 4 Media Error Count: 0 Other Error Count: 0 Predictive Failure Count: 0 Last Predictive Failure Event Seq Number: 0 PD Type: SATA Raw Size: 2.728 TB [0x15d50a3b0 Sectors] Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors] Coerced Size: 2.728 TB [0x15d400000 Sectors] Firmware state: Online, Spun Up Is Commissioned Spare : NO Device Firmware Level: A580 Shield Counter: 0 Successful diagnostics completion on : N/A SAS Address(0): 0x5003048000ee8e53 Connected Port Number: 0(path0) Inquiry Data: MJ1313YNG1VA5CHitachi HDS5C3030ALA630 MEAOA580 FDE Enable: Disable Secured: Unsecured Locked: Unlocked Needs EKM Attention: No Foreign State: None Device Speed: 6.0Gb/s Link Speed: 6.0Gb/s Media Type: Hard Disk Device Drive Temperature :30C (86.00 F) PI Eligibility: No Drive is formatted for PI information: No PI: No PI Drive's write cache : Disabled Drive's NCQ setting : Enabled Port-0 : Port status: Active Port's Linkspeed: 6.0Gb/s Drive has flagged a S.M.A.R.T alert : No Update 06/28/12: I finally have some new information about (what we think) the root cause of this problem so I thought I would share. After getting in contact with a very knowledgeable Supermicro tech, they provided us with a tool called Xflash (doesn't appear to be readily available on their FTP). When we gathered some information using this utility, my colleague found something very strange: root@mogile2 test]# ./xflash.dat -i get avail Initializing Interface. Expander: SAS2X36 (SAS2x36) 1) SAS2X36 (SAS2x36) (50030480:00EE917F) (0.0.0.0) 2) SAS2X36 (SAS2x36) (50030480:00E9D67F) (0.0.0.0) 3) SAS2X36 (SAS2x36) (50030480:0112D97F) (0.0.0.0) This lists the connected enclosures. You see the 3 connected (we have since added a 3rd and a 4th which is not yet showing up) with their respective SAS address / WWN (50030480:00EE917F). 
Now we can use this address to get information on the individual enclosures: [root@mogile2 test]# ./xflash.dat -i 5003048000EE917F get exp Initializing Interface. Expander: SAS2X36 (SAS2x36) Reading the expander information.......... Expander: SAS2X36 (SAS2x36) B3 SAS Address: 50030480:00EE917F Enclosure Logical Id: 50030480:0000007F IP Address: 0.0.0.0 Component Identifier: 0x0223 Component Revision: 0x05 [root@mogile2 test]# ./xflash.dat -i 5003048000E9D67F get exp Initializing Interface. Expander: SAS2X36 (SAS2x36) Reading the expander information.......... Expander: SAS2X36 (SAS2x36) B3 SAS Address: 50030480:00E9D67F Enclosure Logical Id: 50030480:0000007F IP Address: 0.0.0.0 Component Identifier: 0x0223 Component Revision: 0x05 [root@mogile2 test]# ./xflash.dat -i 500304800112D97F get exp Initializing Interface. Expander: SAS2X36 (SAS2x36) Reading the expander information.......... Expander: SAS2X36 (SAS2x36) B3 SAS Address: 50030480:0112D97F Enclosure Logical Id: 50030480:0112D97F IP Address: 0.0.0.0 Component Identifier: 0x0223 Component Revision: 0x05 Did you catch it? The first 2 enclosures logical ID is partially masked out where the 3rd one (which has a correct unique enclosure ID) is not. We pointed this out to Supermicro and were able to confirm that this address is supposed to be set during manufacturing and there was a problem with a certain batch of these enclosures where the logical ID was not set. We believe that the RAID controller is determining the ID based on the logical ID and since our first 2 enclosures have the same logical ID, they get the same enclosure ID. We also confirmed that 0000007F is the default which comes from LSI as an ID. The next pointer that helps confirm this could be a manufacturing problem with a run of JBODs is the fact that all 6 of the enclosures that have this problem begin with 00E. I believe that between 00E8 and 00EE Supermicro forgot to program the logical IDs correctly and neglected to recall or fix the problem post production. Fortunately for us, there is a tool to manage the WWN and logical ID of the devices from Supermicro: ftp://ftp.supermicro.com/utility/ExpanderXtools_Lite/. Our next step is to schedule a shutdown of these JBODs (after data migration) and reprogram the logical ID and see if it solves the problem. Update 06/28/12 #2: I just discovered this FAQ at Supermicro while Google searching for "lsi 0000007f": http://www.supermicro.com/support/faqs/faq.cfm?faq=11805. I still don't understand why, in the last several times we contacted Supermicro, they would have never directed us to this article :\

  • disks not ready in array causes mdadm to force initramfs shell

    - by RaidPinata
    Okay, this is starting to get pretty frustrating. I've read most of the other answers on this site that have anything to do with this issue, but I'm still not getting anywhere. I have a RAID 6 array with 10 devices and 1 spare. The OS is on a completely separate device. At boot, only three of the 10 devices in the RAID are available; the others become available later in the boot process. Currently, unless I go through initramfs I can't get the system to boot; it just hangs with a blank screen. When I do boot through recovery (initramfs), I get a message asking if I want to assemble the degraded array. If I say no and then exit initramfs, the system boots fine and my array is mounted exactly where I intend it to be. Here are the pertinent files, as near as I can tell. Ask me if you want to see anything else.

        # mdadm.conf
        #
        # Please refer to mdadm.conf(5) for information about this file.
        #
        # by default (built-in), scan all partitions (/proc/partitions) and all
        # containers for MD superblocks. alternatively, specify devices to scan, using
        # wildcards if desired.
        #DEVICE partitions containers

        # auto-create devices with Debian standard permissions
        # CREATE owner=root group=disk mode=0660 auto=yes

        # automatically tag new arrays as belonging to the local system
        HOMEHOST <system>

        # instruct the monitoring daemon where to send mail alerts
        MAILADDR root

        # definitions of existing MD arrays

        # This file was auto-generated on Tue, 13 Nov 2012 13:50:41 -0700
        # by mkconf $Id$
        ARRAY /dev/md0 level=raid6 num-devices=10 metadata=1.2 spares=1 name=Craggenmore:data UUID=37eea980:24df7b7a:f11a1226:afaf53ae

    Here is fstab:

        # /etc/fstab: static file system information.
        #
        # Use 'blkid' to print the universally unique identifier for a
        # device; this may be used with UUID= as a more robust way to name devices
        # that works even if disks are added and removed. See fstab(5).
        #
        # <file system> <mount point>   <type>  <options>       <dump>  <pass>
        # / was on /dev/sdc2 during installation
        UUID=3fa1e73f-3d83-4afe-9415-6285d432c133 /    ext4    errors=remount-ro 0 1
        # swap was on /dev/sdc3 during installation
        UUID=c4988662-67f3-4069-a16e-db740e054727 none swap    sw                0 0
        # mount large raid device on /data
        /dev/md0 /data ext4 defaults,nofail,noatime,nobootwait 0 0

    Output of cat /proc/mdstat:

        Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
        md0 : active raid6 sda[0] sdd[10](S) sdl[9] sdk[8] sdj[7] sdi[6] sdh[5] sdg[4] sdf[3] sde[2] sdb[1]
              23441080320 blocks super 1.2 level 6, 512k chunk, algorithm 2 [10/10] [UUUUUUUUUU]

        unused devices: <none>

    Here is the output of mdadm --detail --scan --verbose:

        ARRAY /dev/md0 level=raid6 num-devices=10 metadata=1.2 spares=1 name=Craggenmore:data UUID=37eea980:24df7b7a:f11a1226:afaf53ae
           devices=/dev/sda,/dev/sdb,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdh,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdd

    Please let me know if there is anything else you think might be useful in troubleshooting this... I just can't seem to figure out how to make the boot process wait until the drives are ready to build the array. Everything works just fine if the drives are given enough time to come online. Edit: changed the title to properly reflect the situation.
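
    One hedged workaround, given that everything works when the drives get enough time: make the initramfs wait before it tries to assemble, for example via the rootdelay kernel parameter. The value below is a guess to tune, and whether this also delays assembly of a non-root array depends on the initramfs scripts, so treat it as an experiment:

        # /etc/default/grub -- give slow-starting disks time to appear
        GRUB_CMDLINE_LINUX_DEFAULT="quiet splash rootdelay=120"

        # then regenerate the boot configuration
        sudo update-grub
        sudo update-initramfs -u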
