#11
> Hello.
>
> For about two weeks ago, I had two of these incidents (13 days apart
> between them):
>
> hdb: dma_timer_expiry: dma status == 0x61
> hdb: DMA timeout error
> hdb: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
> ide: failed opcode was: unknown
> hda: DMA disabled
> hdb: DMA disabled
> ide0: reset: success
> hdb: dma_timer_expiry: dma status == 0x41
> hdb: DMA timeout error
> hdb: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
> ide: failed opcode was: unknown
> hdb: DMA disabled
> ide0: reset: success
> hdb: dma_timer_expiry: dma status == 0x41
> hdb: DMA timeout error
> hdb: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
> ide: failed opcode was: unknown
> hdb: DMA disabled
> ide0: reset: success
> hdb: dma_timer_expiry: dma status == 0x41
> hdb: DMA timeout error
> hdb: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
> ide: failed opcode was: unknown
> hdb: DMA disabled
> ide0: reset: success

These messages mean the system is having trouble getting data from/to the
HDD. The cause can be the cabling, the HDD, or the controller on the
mainboard. A failing PSU is unlikely, but also possible.

> From this, my old Linux/Debian system became slow and unresponsive due
> to high CPU usage (e.g., 7.xx in top). I had to shutdown (shutdown -r
> now) Linux/Debian, reboot, and things are back to normal speed. I doubt
> it is temperature related because the room is in the 60s and 70s
> degrees(F) and computer wasn't working intensely (e.g., surfing the
> Web).
>
> Also, I recalled before these problems started, my motherboard (CMOS and
> BIOS) didn't see both of my primary master drives (both HDDs: hda and
> hdb), but can see my secondary master (DVD-ROM drive = hdc). I had to
> open the case, but didn't see anything wrong. I wiggled the cable ends
> for the HDDs. I booted my machine up and it seemed fine for a few days/a
> week and then these errors came up (not disconnections).
>
> I ran smartctl utility on both of my HDDs for information and results:
> http://pastebin.ca/930776 ...

hdb looks fine, and the absence of seek errors may indicate your
PSU is fine too.

For hda, Raw_Read_Error_Rate, Seek_Error_Rate and
Hardware_ECC_Recovered look bad. If it were not for the seek error
rate, I would have said the read circuitry is going bad. As it is, it
looks like the power may be bad, although I would expect a stronger
impact on the Spin_Up_Time. Your data cable is fine, as a problem
with that would have caused the UDMA_CRC_Error_Count to show
something.

As to the tests, you run them, and when they are finished, you look
at the SMART attributes and the self-test log.

> My full system specifications can be found here:
> http://alpha.zimage.com/~ant/antfarm.../computers.txt
> (secondary/backup machine). Does that mean my decade old Quantum 6.4 GB
> HDD (already made a backup just in case) is finally dying? Or is it
> something else?
>
> Thank you in advance.

The Quantum looks fine. If something is dying, it is the Seagate.

I think you should do the following:

1. Remove and reseat the power connector on the Seagate, and
   if you use a splitter cable, connect the drive directly to the PSU.
2. Run a long SMART self-test on the disk to completion and
   then post the SMART attributes again.

If they are unchanged, then I would say that your hda is likely
dying.

Arno
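For anyone following along, step 2 above could be scripted roughly like this. It is only a sketch under some assumptions: smartmontools is installed, the script runs as root, and the suspect drive is /dev/hda as in this thread. The helper names (smart_selftest_commands, run_step) are made up for illustration; the smartctl options used are -t long (start the extended self-test), -l selftest (show the self-test log) and -A (show the vendor attributes).

```python
import shutil
import subprocess


def smart_selftest_commands(device):
    """Return the smartctl command lines for a long self-test and its follow-up.

    The function name and the dictionary keys are illustrative only.
    """
    return {
        "start_long_test": ["smartctl", "-t", "long", device],
        "selftest_log": ["smartctl", "-l", "selftest", device],
        "attributes": ["smartctl", "-A", device],
    }


def run_step(step, device="/dev/hda"):
    """Run one of the steps above and return smartctl's text output.

    Assumes root privileges and an installed smartmontools package.
    """
    if shutil.which("smartctl") is None:
        raise RuntimeError("smartctl not found; install smartmontools first")
    cmd = smart_selftest_commands(device)[step]
    return subprocess.run(cmd, capture_output=True, text=True).stdout


if __name__ == "__main__":
    # Only print the commands here; call run_step() yourself when ready.
    for name, cmd in smart_selftest_commands("/dev/hda").items():
        print(name, "->", " ".join(cmd))
```

The long test runs inside the drive and can take a while, so the log and attribute steps are meant to be run only after smartctl's estimated completion time has passed.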

#12

> > between them):
> >
> > hdb: dma_timer_expiry: dma status == 0x61
> > hdb: DMA timeout error
> > hdb: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
> > ide: failed opcode was: unknown
> > hda: DMA disabled
> > hdb: DMA disabled
> > ide0: reset: success
> > hdb: dma_timer_expiry: dma status == 0x41
> > hdb: DMA timeout error
> > hdb: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
> > ide: failed opcode was: unknown
> > hdb: DMA disabled
> > ide0: reset: success
> > hdb: dma_timer_expiry: dma status == 0x41
> > hdb: DMA timeout error
> > hdb: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
> > ide: failed opcode was: unknown
> > hdb: DMA disabled
> > ide0: reset: success
> > hdb: dma_timer_expiry: dma status == 0x41
> > hdb: DMA timeout error
> > hdb: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
> > ide: failed opcode was: unknown
> > hdb: DMA disabled
> > ide0: reset: success
>
> These are some problems the system has with getting data from/to the
> HDD. It can be cabling, the HDD and the controller on the mainboard.
> Unlikely, but possible, is also a failing PSU.

I had to replace the PSU on 5/14/2007 according to my log:
http://alpha.zimage.com/~ant/antfarm/about/toys.html ... It's almost ten
months old.

> > From this, my old Linux/Debian system became slow and unresponsive due
> > to high CPU usage (e.g., 7.xx in top). I had to shutdown (shutdown -r
> > now) Linux/Debian, reboot, and things are back to normal speed. I doubt
> > it is temperature related because the room is in the 60s and 70s
> > degrees(F) and computer wasn't working intensely (e.g., surfing the
> > Web).
> >
> > Also, I recalled before these problems started, my motherboard (CMOS and
> > BIOS) didn't see both of my primary master drives (both HDDs: hda and
> > hdb), but can see my secondary master (DVD-ROM drive = hdc). I had to
> > open the case, but didn't see anything wrong. I wiggled the cable ends
> > for the HDDs. I booted my machine up and it seemed fine for a few days/a
> > week and then these errors came up (not disconnections).
> >
> > I ran smartctl utility on both of my HDDs for information and results:
> > http://pastebin.ca/930776 ...
>
> hdb looks fine and the absence of seek errors may indicate your
> PSU is fine too.
>
> For hda, Raw_Read_Error_Rate, Seek_Error_Rate and
> Hardware_ECC_Recovered look bad. If it was not for the seek error
> rate, I would have said the read circuitry is going bad. This way it
> looks like the power may be bad, although I would expect a stronger
> impact on the Spin_Up_Time. Your data cable is fine, as a problem
> with that would have caused the UDMA_CRC_Error_Count to show
> something.
>
> As to the tests, you run them and when they are finished, you look
> at the smart attributes and the test-log.

Where are these logs?

> > My full system specifications can be found here:
> > http://alpha.zimage.com/~ant/antfarm.../computers.txt
> > (secondary/backup machine). Does that mean my decade old Quantum 6.4 GB
> > HDD (already made a backup just in case) is finally dying? Or is it
> > something else?
> >
> > Thank you in advance.
>
> The Quantum looks fine. If something is dying, it is the Seagate.
>
> I think you should do the following:
>
> 1. Remove and reseat the power connector on the seagate, and
>    if you use a splitter-cable, connect it directly to the PSU.
> 2. Run a long SMART selftest on the disk to its completion and
>    then post the SMART attributes again.
>
> If they are unchanged, then I would say that your HDA is likely
> dying.

Odd that dmesg and /var/log/messages don't mention hda. Same for SMART.
-- 
"All the best work is done the way that ants do things -- by tiny but
untiring and regular additions." --Lafcadio Hearn
   /\___/\
  / /\ /\ \    Ant @ http://antfarm.home.dhs.org (Personal Web Site)
 | |o   o| |   Ant's Quality Foraged Links (AQFL): http://aqfl.net
    \ _ /      Please remove ANT if replying by e-mail.
     ( )

#13

> In comp.sys.ibm.pc.hardware.storage Ant <ANTant@zimage.com> wrote:
> > Hello.
> >
> > For about two weeks ago, I had two of these incidents (13 days apart
> > between them):
> >
> > hdb: dma_timer_expiry: dma status == 0x61
> > hdb: DMA timeout error
> > hdb: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
> > ide: failed opcode was: unknown
> > hda: DMA disabled
> > hdb: DMA disabled
> > ide0: reset: success
> > hdb: dma_timer_expiry: dma status == 0x41
> > hdb: DMA timeout error
> > hdb: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
> > ide: failed opcode was: unknown
> > hdb: DMA disabled
> > ide0: reset: success
> > hdb: dma_timer_expiry: dma status == 0x41
> > hdb: DMA timeout error
> > hdb: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
> > ide: failed opcode was: unknown
> > hdb: DMA disabled
> > ide0: reset: success
> > hdb: dma_timer_expiry: dma status == 0x41
> > hdb: DMA timeout error
> > hdb: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
> > ide: failed opcode was: unknown
> > hdb: DMA disabled
> > ide0: reset: success
>
> These are some problems the system has with getting data from/to the
> HDD. It can be cabling, the HDD and the controller on the mainboard.
> Unlikely, but possible, is also a failing PSU.
>
> > From this, my old Linux/Debian system became slow and unresponsive due
> > to high CPU usage (e.g., 7.xx in top). I had to shutdown (shutdown -r
> > now) Linux/Debian, reboot, and things are back to normal speed. I doubt
> > it is temperature related because the room is in the 60s and 70s
> > degrees(F) and computer wasn't working intensely (e.g., surfing the
> > Web).
> >
> > Also, I recalled before these problems started, my motherboard (CMOS and
> > BIOS) didn't see both of my primary master drives (both HDDs: hda and
> > hdb), but can see my secondary master (DVD-ROM drive = hdc). I had to
> > open the case, but didn't see anything wrong. I wiggled the cable ends
> > for the HDDs. I booted my machine up and it seemed fine for a few days/a
> > week and then these errors came up (not disconnections).
> >
> > I ran smartctl utility on both of my HDDs for information and results:
> > http://pastebin.ca/930776 ...
>
> hdb looks fine and the absence of seek errors may indicate your
> PSU is fine too.
>
> For hda, Raw_Read_Error_Rate, Seek_Error_Rate and
> Hardware_ECC_Recovered look bad. If it was not for the seek error
> rate, I would have said the read circuitry is going bad. This way it
> looks like the power may be bad, although I would expect a stronger
> impact on the Spin_Up_Time.

> Your data cable is fine, as a problem with that would have caused the
> UDMA_CRC_Error_Count to show something.

Not if the commands that initiate the UDMA data transfers never make it
to the drive. Apparently they didn't, as a bus reset was needed to get it
going again. Conversely, UDMA CRC errors say nothing about cable quality.

> As to the tests, you run them and when they are finished, you look
> at the smart attributes and the test-log.
>
> > My full system specifications can be found here:
> > http://alpha.zimage.com/~ant/antfarm.../computers.txt
> > (secondary/backup machine). Does that mean my decade old Quantum 6.4 GB
> > HDD (already made a backup just in case) is finally dying? Or is it
> > something else?
> >
> > Thank you in advance.
>
> The Quantum looks fine.

> If something is dying, it is the Seagate.

The Seagate is fine.

> I think you should do the following:
>
> 1. Remove and reseat the power connector on the seagate, and
>    if you use a splitter-cable, connect it directly to the PSU.
> 2. Run a long SMART selftest on the disk to its completion and
>    then post the SMART attributes again.
>
> If they are unchanged, then I would say that your HDA is likely
> dying.
>
> Arno

#14

In comp.sys.ibm.pc.hardware.storage Ant <ANTant@zimage.com> wrote:
[...]
>> For hda, Raw_Read_Error_Rate, Seek_Error_Rate and
>> Hardware_ECC_Recovered look bad. If it was not for the seek error
>> rate, I would have said the read circuitry is going bad. This way it
>> looks like the power may be bad, although I would expect a stronger
>> impact on the Spin_Up_Time. Your data cable is fine, as a problem
>> with that would have caused the UDMA_CRC_Error_Count to show
>> something.

>> As to the tests, you run them and when they are finished, you look
>> at the smart attributes and the test-log.

> Where are these logs at?

It seems your hdb cannot log self-test results. For hda, the log
starts at line 139 in your SMART attribute list.

>> > My full system specifications can be found here:
>> > http://alpha.zimage.com/~ant/antfarm.../computers.txt
>> > (secondary/backup machine). Does that mean my decade old Quantum 6.4 GB
>> > HDD (already made a backup just in case) is finally dying? Or is it
>> > something else?
>> >
>> > Thank you in advance.

>> The Quantum looks fine. If something is dying, it is the Seagate.

>> I think you should do the following:

>> 1. Remove and reseat the power connector on the seagate, and
>>    if you use a splitter-cable, connect it directly to the PSU.
>> 2. Run a long SMART selftest on the disk to its completion and
>>    then post the SMART attributes again.

>> If they are unchanged, then I would say that your HDA is likely
>> dying.

> Odd that dmesg and /var/log/messages don't mention hda. Same for SMART.

Indeed, for the message log. However, it is possible that hda does
things to the bus, like keeping it occupied too long, which causes
the error messages for hdb. The error messages in the log are
basically just a timeout and do not necessarily indicate that there
is anything wrong with the disk they were reported for.

As for SMART, it is possible that hdb is actually breaking down,
but there seems to be something wrong with hda, so my first
take would be that hda interferes with hdb in some way. They
are on the same cable, after all.

I especially do not like the values of attributes 1, 7, and 195
on hda, as mentioned before. They indicate there is some problem
with the seeking mechanism. This can be due to power issues. It
can also be a problem with the drive hardware itself.

Admittedly my comments are highly speculative. I have had similar
errors in my logs in the past. One turned out to be a shot
drive controller (possibly inadequate cooling for this particular
chip). Another was a software issue and went away after
a kernel update.

Arno
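To make the "basically just a timeout" point concrete, the status values in the quoted log can be decoded bit by bit. The sketch below is an illustration, not something taken from the kernel source: the table names and the decode() helper are invented here, the drive status labels follow the ones the old IDE driver prints in braces, and the bus-master bit layout (bit 0 active, bit 1 error, bit 2 interrupt, bits 5 and 6 "drive 0/1 DMA capable", bit 7 simplex) is the standard bus-master IDE status register as I understand it.

```python
# Drive (ATA) status register bits, labelled the way the old Linux IDE
# driver prints them inside the braces of the log line.
ATA_STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

# Bus-master IDE status register bits (labels chosen for this sketch).
BUSMASTER_STATUS_BITS = {
    0x80: "SimplexOnly",
    0x40: "Drive1DmaCapable",
    0x20: "Drive0DmaCapable",
    0x04: "Interrupt",
    0x02: "Error",
    0x01: "Active",
}


def decode(value, table):
    """Return the names of all bits set in value, highest bit first."""
    return [name for mask, name in table.items() if value & mask]


if __name__ == "__main__":
    print(hex(0x58), decode(0x58, ATA_STATUS_BITS))
    print(hex(0x61), decode(0x61, BUSMASTER_STATUS_BITS))
    print(hex(0x41), decode(0x41, BUSMASTER_STATUS_BITS))
```

Read this way, status=0x58 has the Error bit clear: the drive reports ready with a data request pending, which fits a transfer that simply never completed. The dma status going from 0x61 to 0x41 drops only the "drive 0 DMA capable" bit, which is consistent with the "hda: DMA disabled" line after the first incident, though that reading is an inference and not something the log states.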

#15

Ant <ANTant@zimage.com> wrote:
>>> For about two weeks ago, I had two of these incidents (13 days apart between them):

>>> hdb: dma_timer_expiry: dma status == 0x61
>>> hdb: DMA timeout error
>>> hdb: dma timeout error: status=0x58 { DriveReady SeekComplete
>>> DataRequest } ide: failed opcode was: unknown
>>> hda: DMA disabled
>>> hdb: DMA disabled
>>> ide0: reset: success
>>> hdb: dma_timer_expiry: dma status == 0x41
>>> hdb: DMA timeout error
>>> hdb: dma timeout error: status=0x58 { DriveReady SeekComplete
>>> DataRequest } ide: failed opcode was: unknown
>>> hdb: DMA disabled
>>> ide0: reset: success
>>> hdb: dma_timer_expiry: dma status == 0x41
>>> hdb: DMA timeout error
>>> hdb: dma timeout error: status=0x58 { DriveReady SeekComplete
>>> DataRequest } ide: failed opcode was: unknown
>>> hdb: DMA disabled
>>> ide0: reset: success
>>> hdb: dma_timer_expiry: dma status == 0x41
>>> hdb: DMA timeout error
>>> hdb: dma timeout error: status=0x58 { DriveReady SeekComplete
>>> DataRequest } ide: failed opcode was: unknown
>>> hdb: DMA disabled
>>> ide0: reset: success

>> That's normally just a bad cable. And since the problem is seen
>> with more than one hard drive, it's almost certainly just a bad cable.

>> Can be a bad hard drive controller on the motherboard etc but that's much less likely.

>>> From this, my old Linux/Debian system became slow and
>>> unresponsive due to high CPU usage (e.g., 7.xx in top).

>> Because it's turned the DMA off, as it says.

> I wonder if I can re-enable DMA without rebooting.

Yes, but it's much better to fix whatever is causing the problem instead.

> I think hdparm controls that?

Dunno, I don't bother with Linux at that level myself.

>>> I had to shutdown (shutdown -r now) Linux/Debian,
>>> reboot, and things are back to normal speed.

>> Because it's turned the DMA on again.

> So why did my DMA go off? As a precaution?

Yes, when it decided that there is a problem with the DMA,
because of the timeouts it can see. Win does the same thing.

>>> I doubt it is temperature related because the room is in the 60s and 70s
>>> degrees(F) and computer wasn't working intensely (e.g., surfing the Web).

>> Yeah, most likely just a bad cable.

>>> Also, I recalled before these problems started, my motherboard (CMOS
>>> and BIOS) didn't see both of my primary master drives (both HDDs: hda
>>> and hdb), but can see my secondary master (DVD-ROM drive = hdc).

>> More evidence of a bad cable to the hard drives.

>>> I had to open the case, but didn't see anything wrong.
>>> I wiggled the cable ends for the HDDs.

>> And that likely got it going again. Those cable piercing connectors
>> can bend one of the things that bite the cable when the cable is made
>> and can get loose if you reef the cable off the drive or motherboard
>> end by pulling on the ribbon etc.

> Hmm, I have those old fashion flat cables. I guess I will go replace it.

Yep, best thing to try first, they are so cheap.

> I assume replacing the whole ribbon cable is enough?

Yes.

> I didn't see how many there were in my mini-tower case (hard to see and it's crowded).

It will have two.

> I assume two are in total for primary and secondary drives.

Yep.

>> If it's a round cable, it's ****ed by design.

> Yeah, I don't have those. My other PC has a SATA cable that are round.

That's why this PC is playing up, it's jealous.

>>> I booted my machine up and it seemed fine for a few days/a
>>> week and then these errors came up (not disconnections).

>>> I ran smartctl utility on both of my HDDs for information and
>>> results: http://pastebin.ca/930776 ...

>>> My full system specifications can be found here:
>>> http://alpha.zimage.com/~ant/antfarm.../computers.txt
>>> (secondary/backup machine). Does that mean my decade old Quantum
>>> 6.4 GB HDD (already made a backup just in case) is finally dying?

>> Nope, just the cable.

> Hmm, OK! I will go try the cable first then!

Yep, that's the best thing to do first.

>>> Or is it something else?

>> Yep, the cable.
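On the hdparm question above: hdparm's -d option does control the using_dma flag on the old IDE driver, with -d alone reporting the current setting, -d1 turning DMA on and -d0 turning it off. A minimal sketch of checking and re-enabling it without a reboot follows; the helper name hdparm_dma_command is hypothetical, /dev/hdb is the drive from this thread, and actually running the commands assumes root and an installed hdparm.

```python
import subprocess


def hdparm_dma_command(device, enable=None):
    """Build an hdparm command line for the DMA flag.

    enable=None queries the current setting, True uses -d1, False uses -d0.
    The function name is illustrative only.
    """
    if enable is None:
        return ["hdparm", "-d", device]
    return ["hdparm", "-d1" if enable else "-d0", device]


def set_dma(device, enable):
    """Apply the setting and return hdparm's output (needs root)."""
    cmd = hdparm_dma_command(device, enable)
    return subprocess.run(cmd, capture_output=True, text=True).stdout


if __name__ == "__main__":
    # Show the commands only; nothing is executed against the drive here.
    print(" ".join(hdparm_dma_command("/dev/hdb")))
    print(" ".join(hdparm_dma_command("/dev/hdb", enable=True)))
```

As said above, this only restores speed; if the cable or a drive is still misbehaving, the kernel will most likely switch DMA off again at the next timeout.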

#16

On Thu, 06 Mar 2008 14:47:46 -0600, Ant wrote:
> Hello.
>
> For about two weeks ago, I had two of these incidents (13 days apart
> between them):
>
> hdb: dma_timer_expiry: dma status == 0x61 hdb: DMA timeout error
> hdb: dma timeout error: status=0x58 { DriveReady SeekComplete
> DataRequest } ide: failed opcode was: unknown
> hda: DMA disabled
> hdb: DMA disabled
> ide0: reset: success
> hdb: dma_timer_expiry: dma status == 0x41 hdb: DMA timeout error
> hdb: dma timeout error: status=0x58 { DriveReady SeekComplete
> DataRequest } ide: failed opcode was: unknown
> hdb: DMA disabled
> ide0: reset: success
> hdb: dma_timer_expiry: dma status == 0x41 hdb: DMA timeout error
> hdb: dma timeout error: status=0x58 { DriveReady SeekComplete
> DataRequest } ide: failed opcode was: unknown
> hdb: DMA disabled
> ide0: reset: success
> hdb: dma_timer_expiry: dma status == 0x41 hdb: DMA timeout error
> hdb: dma timeout error: status=0x58 { DriveReady SeekComplete
> DataRequest } ide: failed opcode was: unknown
> hdb: DMA disabled
> ide0: reset: success
>
> From this, my old Linux/Debian system became slow and unresponsive due
> to high CPU usage (e.g., 7.xx in top). I had to shutdown (shutdown -r
> now) Linux/Debian, reboot, and things are back to normal speed. I doubt
> it is temperature related because the room is in the 60s and 70s
> degrees(F) and computer wasn't working intensely (e.g., surfing the
> Web).
>
> Also, I recalled before these problems started, my motherboard (CMOS and
> BIOS) didn't see both of my primary master drives (both HDDs: hda and
> hdb), but can see my secondary master (DVD-ROM drive = hdc). I had to
> open the case, but didn't see anything wrong. I wiggled the cable ends
> for the HDDs. I booted my machine up and it seemed fine for a few days/a
> week and then these errors came up (not disconnections).
>
> I ran smartctl utility on both of my HDDs for information and results:
> http://pastebin.ca/930776 ...
>
> My full system specifications can be found here:
> http://alpha.zimage.com/~ant/antfarm.../computers.txt
> (secondary/backup machine). Does that mean my decade old Quantum 6.4 GB
> HDD (already made a backup just in case) is finally dying? Or is it
> something else?
>
> Thank you in advance.

Suggest you boot a Live CD and run badblocks on the hard drive.

#17

Ant wrote:
....
>> I would make sure to run a backup ASAP, like right now.
>
> Already did onto hda since that's the only HDD I have to back up to).
> Now, how come before these errors came up my motherboard couldn't see
> BOTH HDDs? Is that related or just a coincident?
>

Well, you have a Quantum and a Seagate, both on the same cable.
Some models from these competing manufacturers (oh, that was a while ago)
could not play nice with each other. That problem need not show up
right from the start, though ....

And one stalling/blocking hard drive (its electronics) can block the whole bus.

-- 
vista policy violation: Microsoft optical mouse found penguin patterns
on mousepad. Partition scan in progress to remove offending
incompatible products. Reactivate MS software.
Linux 2.6.24. [LinuxCounter#295241,ICQ#4918962]