Some time after 0800, reports of errors with Webmail were received. Hungyon arrived on site at about 0830 and confirmed that Webmail was failing with an error. She initiated a reboot at 0847. However, the reboot triggered an fsck of the disks, so the system was not up and running until 1000.
The system now appears to be running normally. However, the messages log contains numerous errors concerning the SCSI disks (see below), both before and during the reboot. The kernel's trouble seems to have started at 0316, which more or less coincides with events on the Storage Array at 0313.
Storage Controller A has an amber LED lit on its front panel. SMClient reported a "service needed" condition, but on following the details it was referring to the battery nearing expiration. The batteries in both controllers are due to expire on 2012-06-06+7d. Should we order new ones? The batteries protect the write-cached data in the event of power loss. Perhaps more serious, however, is the information in the "Events Log" concerning a failed SAS port. NB: I cannot find a "PHY LOG".
Jun 5 03:16:52 mail kernel: sd 2:0:0:1: SCSI error: return code = 0x00020000
Jun 5 03:16:52 mail kernel: end_request: I/O error, dev sdf, sector 845704641
Jun 5 03:16:52 mail kernel: printk: 20 messages suppressed.
Jun 5 03:16:52 mail kernel: Buffer I/O error on device dm-2, logical block 211426065
Jun 5 03:16:52 mail kernel: lost page write due to I/O error on dm-2
Jun 5 03:16:52 mail kernel: Aborting journal on device dm-2.
Jun 5 03:16:52 mail kernel: sd 2:0:0:1: SCSI error: return code = 0x00020000
Jun 5 03:16:52 mail kernel: mptbase: ioc1: LogInfo(0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
Jun 5 03:16:52 mail kernel: end_request: I/O error, dev sdf, sector 845730313
Jun 5 03:16:52 mail kernel: Buffer I/O error on device dm-2, logical block 211432483
Jun 5 03:16:52 mail kernel: lost page write due to I/O error on dm-2
Jun 5 03:16:52 mail kernel: sd 2:0:0:1: SCSI error: return code = 0x00020000
Jun 5 03:16:52 mail kernel: mptbase: ioc1: LogInfo(0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
Jun 5 03:16:52 mail kernel: end_request: I/O error, dev sdf, sector 845760193
Jun 5 03:16:52 mail kernel: Buffer I/O error on device dm-2, logical block 211439953
Jun 5 03:16:52 mail kernel: lost page write due to I/O error on dm-2
Jun 5 03:16:52 mail kernel: sd 2:0:0:1: SCSI error: return code = 0x00020000
Jun 5 03:16:52 mail kernel: end_request: I/O error, dev sdf, sector 497
Jun 5 03:16:52 mail kernel: mptbase: ioc1: LogInfo(0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
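
The excerpt above can be summarised mechanically. The following is a minimal sketch, not a tool that was used during the incident: it assumes the usual Linux SCSI result layout (status, message, host and driver bytes packed into one 32-bit value) and the host-byte names from the kernel's SCSI headers. The function names `decode_result` and `summarize` are hypothetical. Under those assumptions, the return code 0x00020000 seen throughout the log has host byte 0x02, i.e. DID_BUS_BUSY, which fits a transport-level problem rather than a medium error on the disk.

```python
import re
from collections import Counter

# Host-byte names as defined by the Linux SCSI midlayer (subset).
HOST_BYTE = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
}

def decode_result(code):
    """Split a 32-bit SCSI result into its four one-byte fields."""
    return {
        "status": code & 0xFF,
        "msg": (code >> 8) & 0xFF,
        "host": (code >> 16) & 0xFF,
        "driver": (code >> 24) & 0xFF,
    }

IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
RETURN_CODE = re.compile(r"SCSI error: return code = (0x[0-9a-fA-F]+)")

def summarize(lines):
    """Count I/O errors per device and occurrences of each SCSI return code."""
    devices, codes = Counter(), Counter()
    for line in lines:
        m = IO_ERROR.search(line)
        if m:
            devices[m.group(1)] += 1
        m = RETURN_CODE.search(line)
        if m:
            codes[int(m.group(1), 16)] += 1
    return devices, codes

# Three lines copied from the log excerpt in this report.
sample = [
    "Jun 5 03:16:52 mail kernel: sd 2:0:0:1: SCSI error: return code = 0x00020000",
    "Jun 5 03:16:52 mail kernel: end_request: I/O error, dev sdf, sector 845704641",
    "Jun 5 03:16:52 mail kernel: end_request: I/O error, dev sdf, sector 845730313",
]

devices, codes = summarize(sample)
for code, count in codes.items():
    host = decode_result(code)["host"]
    print(f"{code:#010x} x{count}: host byte {HOST_BYTE.get(host, hex(host))}")
print(dict(devices))
```

Feeding the full messages log to `summarize` (for example via `open(path)`, with the path left to the reader) would show whether any device other than sdf was affected.
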
Date/Time: 6/5/12 3:13:50 AM
Sequence number: 7547
Event type: 1707
Description: Degraded wide port becomes failed
Event specific codes: 0/0/0
Event category: Error
Component type: Enclosure Component (ESM, GBIC/SFP, Power Supply, or Fan)
Component location: Enclosure 85, Slot 2
Logged by: Controller in slot A

| Event | Description | Recommended action |
|---|---|---|
| 1707 Degraded wide port becomes Failed | A SAS port has been marked as failed. | A degraded port is usually caused by a faulty cable, environmental services monitor (ESM), controller, disk drive, or enclosure connector. Analysis of the PHY Error Logs might help isolate the problem and indicate which component must be replaced. |

Storage Subsystem: PHAS_MailStore
Component reporting problem: Battery
Status: Near expiration
Location: Controller enclosure 85, Controller in Slot A
Smart battery: Yes
Component requiring service: Controller A
Service action (removal) allowed: No
Service action LED on component: No