DaemonForums  

#1
13th July 2008
pseudonym
Constant Kernel Panics, Corrupted File System, Reboots Over and Over

I am having some serious issues with what appear to be drive failures. However, this is a new drive. When I reinstall the system, it works perfectly fine for a few days and then crashes with the error below. I have run smartctl -t long and no errors come up. After this crash, it rebooted twice and then came back up. I unmounted the partition, ran fsck on it, and it seemed to be working fine for a few minutes. Then it crashed again, and when the computer rebooted it couldn't find a valid boot partition. Luckily, rebooting again into single-user mode and then running fsck on all drives brings it back up, temporarily.

Does anyone know what is going on here? From what I am reading, this is an issue that has been known since 2004 but no one has bothered to fix. Something about soft updates being broken. System specs are FreeBSD 7.0 running on an Intel 1.66 GHz dual core with 2 GB of DDR2, and an Intel (not sure of the model) motherboard with a JMicron JMB363 controller. The log pasted below is the latest crash. It also occurs when I put in ANY disk that isn't a member of an array. I have pasted in the logs from those crashes as well. ad8, ad10 and ad12 are all brand new and have no issues in any other computer.

I am beginning to panic. At the moment there seems to be a way to bring the computer back up, but as time goes on, sooner or later it just isn't going to work.

Any help you guys could give me would really be appreciated.

Thanks

~m

Most current crash:

Jul 11 22:46:45 octavian kernel: ad8: FAILURE - device detached
Jul 11 22:46:45 octavian kernel: subdisk8: detached
Jul 11 22:46:45 octavian kernel: ad8: detached
Jul 11 22:46:45 octavian kernel: g_vfs_done():ad8s1d[WRITE(offset=215219027968, length=16384)]error = 6
Jul 11 22:46:45 octavian kernel: g_vfs_done():ad8s1d[READ(offset=222849878016, length=2048)]error = 6



Crash 1:

Jul 7 08:39:52 octavian fsck: /dev/ar0s1d: 3694 files, 352639422 used, 89402533 free (2053 frags, 11175060 blocks, 0.0% fragmentation)
Jul 7 08:55:39 octavian kernel: pid 20407 (conftest), uid 0: exited on signal 12 (core dumped)
Jul 7 08:55:55 octavian fsck: /dev/ar1s1d: UNREF FILE I=11 OWNER=root MODE=100400
Jul 7 08:55:55 octavian fsck: /dev/ar1s1d: SIZE=500105236824 MTIME=Jul 7 01:20 2008 (CLEARED)
Jul 7 08:55:55 octavian fsck: /dev/ar1s1d: Reclaimed: 0 directories, 1 files, 0 fragments
Jul 7 08:55:55 octavian fsck: /dev/ar1s1d: 25129 files, 175430918 used, 61076861 free (3965 frags, 7634112 blocks, 0.0% fragmentation)
Jul 7 09:13:08 octavian fsck: /dev/ad8s1d: 7342 files, 186122814 used, 168636875 free (1963 frags, 21079364 blocks, 0.0% fragmentation)
Jul 7 09:15:31 octavian kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=54571231
Jul 7 09:15:45 octavian kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=255543199
Jul 7 09:15:51 octavian kernel: ad10: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=275113503
Jul 7 09:15:51 octavian kernel: ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=275113503
Jul 7 09:15:51 octavian kernel: g_vfs_done():ad10s1d[WRITE(offset=140858081280, length=16384)]error = 5
Jul 7 09:16:03 octavian kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=139638655
Jul 7 09:16:13 octavian kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=257060479
Jul 7 09:16:23 octavian kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=77540607
Jul 7 09:16:29 octavian kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=114423103
Jul 7 09:16:35 octavian kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=148671135
Jul 7 09:17:58 octavian kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=192315999
Jul 7 09:18:05 octavian kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=247263391
Jul 7 09:18:29 octavian kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=89948319
Jul 7 09:18:39 octavian kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=226564095
Jul 7 09:18:45 octavian kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=227316799
Jul 7 09:18:50 octavian kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=236349247
Jul 7 09:19:00 octavian kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=47432415
Jul 7 09:19:05 octavian kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=48937823
Jul 7 09:19:12 octavian kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=96358175
Jul 7 09:19:17 octavian kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=103132511
Jul 7 09:19:23 octavian kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=115552127
Jul 7 09:19:36 octavian kernel: ad10: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=307867999
Jul 7 09:19:36 octavian kernel: ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=307867999
Jul 7 09:19:36 octavian kernel: g_vfs_done():ad10s1d[WRITE(offset=157628383232, length=16384)]error = 5
Jul 7 09:19:42 octavian kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=39152703
Jul 7 09:19:49 octavian kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=87702111
Jul 7 09:19:56 octavian kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=150176543
Jul 7 09:20:02 octavian kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=172381311
Jul 7 09:20:08 octavian kernel: ad10: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=223565183
Jul 7 09:20:08 octavian fsck: /dev/ad10s1d: 3 files, 2 used, 75664270 free (22 frags, 9458031 blocks, 0.0% fragmentation)
Jul 7 09:23:57 octavian syslogd: kernel boot file is /boot/kernel/kernel
Jul 7 09:23:57 octavian kernel: dev = ad10s1d, block = 1, fs = /archive/rome
Jul 7 09:23:57 octavian kernel: panic: ffs_blkfree: freeing free block
Jul 7 09:23:57 octavian kernel: cpuid = 0
Jul 7 09:23:57 octavian kernel: Uptime: 1h8m37s
Jul 7 09:23:57 octavian kernel: Physical memory: 2025 MB
Jul 7 09:23:57 octavian kernel: Dumping 209 MB: 194 178 162 146 130 114 98 82 66 50 34 18 2
Jul 7 09:23:57 octavian kernel: Dump complete
Jul 7 09:23:57 octavian kernel: Automatic reboot in 15 seconds - press a key on the console to abort
Jul 7 09:23:57 octavian kernel: --> Press a key on the console to reboot,
Jul 7 09:23:57 octavian kernel: --> or switch off the system now.
Jul 7 09:23:57 octavian kernel: Rebooting...
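
A quick consistency check on the Crash 1 log, offered as a sketch rather than a diagnosis: each failed WRITE_DMA48 reports a disk LBA, and the g_vfs_done() line right after it reports a byte offset inside ad10s1d. Assuming 512-byte sectors (the log does not state the sector size), the two values differ by exactly 63 sectors for both failures, which would be consistent with the partition beginning at sector 63 of the disk. The helper name `partition_start_sector` is made up for this illustration.

```python
SECTOR_SIZE = 512  # assumption: the log does not state the sector size

# (LBA from the ad10 FAILURE line, byte offset from the g_vfs_done line that follows it)
FAILURES = [
    (275113503, 140858081280),
    (307867999, 157628383232),
]

def partition_start_sector(lba, byte_offset, sector_size=SECTOR_SIZE):
    """Return the disk sector at which the partition would have to begin
    for this partition-relative byte offset to land on this disk LBA."""
    sectors, remainder = divmod(byte_offset, sector_size)
    if remainder:
        raise ValueError("offset is not sector-aligned")
    return lba - sectors

starts = [partition_start_sector(lba, off) for lba, off in FAILURES]
print(starts)
```

Both pairs give the same start sector, so the filesystem-level write errors and the ATA-level failures appear to describe the same two writes.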

Last edited by pseudonym; 13th July 2008 at 02:02 AM. Reason: Wrong MB in hardware description.. sorry.. have so many I lose track =P

#2
13th July 2008
graudeejs (Aldis Berjoza)

What is your disk vendor?
Do you use a custom kernel? If so, post it here...
Have you tried using UFS without soft updates?

Also, have you used tunefs?

#3
13th July 2008
ninjatux (Baqir Majlisi)

I experienced identical symptoms with another hard drive almost two or three weeks ago. You have a failing hard drive. There are certain sectors on your hard drive that are damaged. Writing to those sectors causes the write delays. Since they aren't at the beginning of the disk, you can use your hard drive for some time before you start experiencing those symptoms. With my hard drive, my system ran fine for a couple of days. Then, the kernel panicked infrequently. The file system remained corrupted and irreparable. I gave up and tried to salvage my data. I suggest you do the same because the more you keep spinning that drive the more difficult it'll become to access data.

#4
13th July 2008
DrJ

Either the drive or the cable. The latter is easy enough to swap out. For the former, I'd suggest getting all your data off, and writing some character (like a zero) to the entire disk using dd. That will force bad sector relocation. Then reinstall.

I had to do that a few years ago when I had a bad cable for a while. After getting the bad sectors mapped out, it has worked fine.

Now the disk might be toast, and you have to replace it. BTW, what does the bad sector count show in smartctl? If it is growing, you have troubles.

Correction: not the bad sector info, but the error rate.

Last edited by DrJ; 13th July 2008 at 04:03 PM.

#5
13th July 2008
pseudonym

The drive is fine. It has been low level formatted and I have run the Maxtor diagnostics on it with no errors, no bad sectors.. nothing. I have the same issues with two brand new Seagate 160s I threw in to back up my data with.

Smartctl -t long shows no errors at all. No bad sectors.

I am using the GENERIC kernel. I'm not sure how to use UFS without soft updates, nor have I run tunefs. I'm not really sure how, or which parameter I would want to tune. I've never had to do so before.

Here is the full disk info

ar0: 3 250GB SATAII Seagate RAID5 JM363
ar1: 3 320GB SATAII Seagate RAID0 JM363
ad8: Maxtor 750GB IDE
ad10 and ad12: 160GB SATAII Seagate (unplugged and disabled ATM)

At first I thought it was the drive as well, but I have the same issue with ad10 and ad12. I have also tried connecting ad8 through an IDE-to-SATA bridge and I get the same errors. I then thought the bridge device was fried, but I get the same thing when I connect the drive with a brand new IDE cable.


*** Just tried turning off softupdates.
[root@octavian /home/pseudonym]# tunefs -n 'disable' /dev/ad8s1d
tunefs: soft updates cleared
tunefs: /dev/ad8s1d: failed to write superblock

Last edited by pseudonym; 13th July 2008 at 08:47 PM.

#6
13th July 2008
DrJ

Quote:
Originally Posted by pseudonym
Smartctl -t long shows no errors at all. No bad sectors.
That test is rather cursory. You should look at smartctl -a for the device and look at the uncorrectable error rate, and the non-media errors.

Still, I doubt that that many drives are hosed. Something else must be up.

#7
13th July 2008
pseudonym

Here is what I get when I do a smartctl -a

Code:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   084   069   006    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   096   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       99
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
  7 Seek_Error_Rate         0x000f   045   045   030    Pre-fail  Always       -       9629385353742
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1492
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       153
187 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
189 Unknown_Attribute       0x003a   100   100   000    Old_age   Always       -       0
190 Temperature_Celsius     0x0022   054   040   045    Old_age   Always   In_the_past 774111278
194 Temperature_Celsius     0x0022   046   060   000    Old_age   Always       -       46 (Lifetime Min/Max 0/23)
195 Hardware_ECC_Recovered  0x001a   071   055   000    Old_age   Always       -       51827
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged
Some of those numbers look just.. um.. wrong. Going to run the test again.
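
The table above does not quite say "no bad sectors". As a rough sketch, the attribute rows can be split on whitespace and the sector-related counters pulled out. The choice of which attribute IDs to watch and the helper names `parse_row` and `nonzero_counters` are assumptions made for this illustration; they are not part of smartctl.

```python
# Four rows copied from the smartctl -a attribute table in this post.
SMART_ROWS = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

# Assumption: these IDs are the counters worth flagging for sector or cable trouble.
WATCHED = {5, 197, 198, 199}

def parse_row(line):
    """Split one attribute row into (id, name, raw_value_text)."""
    # Ten columns; the last one (RAW_VALUE) may itself contain spaces.
    fields = line.split(None, 9)
    return int(fields[0]), fields[1], fields[9].strip()

def nonzero_counters(text, watched=WATCHED):
    """Return {attribute_name: raw_count} for watched attributes with a non-zero raw value."""
    found = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        attr_id, name, raw = parse_row(line)
        count = int(raw.split()[0])
        if attr_id in watched and count != 0:
            found[name] = count
    return found

print(nonzero_counters(SMART_ROWS))
```

On these rows it reports 8 reallocated sectors, 1 pending sector and 1 offline-uncorrectable sector, while UDMA_CRC_Error_Count stays at 0.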



=(.

#8
13th July 2008
DrJ

The format (and some of the numbers) depend strongly on the disk you use. Here's what mine displays (with lots of edits):
Code:
Device: SEAGATE  ST336753LW
...
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   25504257        0         0  25504257   28315309       9329.822       20979
write:         0        0         0         0          0       3033.998           0

Non-medium error count:    15235
The error rate seems high, but it has not changed much since I forced the bad-sector relocation (which differs from a low-level format). That is what you keep your eye on.

#9
13th July 2008
pseudonym

This is very odd.. I am not seeing any of that, even when I try it on one of my seagate drives....

Here is the entire output of smartctl -t long /dev/ad8

Code:
[root@octavian /home/pseudonym]# smartctl -a /dev/ad8
smartctl version 5.37 [i386-portbld-freebsd7.0] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10 family
Device Model:     ST3750640A
Serial Number:    5QD3L4N0
Firmware Version: 3.AAE
User Capacity:    750,156,374,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sun Jul 13 03:08:39 2008 MDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
Total time to complete Offline
data collection:                 ( 430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 202) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   084   069   006    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   096   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       99
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
  7 Seek_Error_Rate         0x000f   045   045   030    Pre-fail  Always       -       9629385411627
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1492
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       153
187 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
189 Unknown_Attribute       0x003a   100   100   000    Old_age   Always       -       0
190 Temperature_Celsius     0x0022   052   040   045    Old_age   Always   In_the_past 807665712
194 Temperature_Celsius     0x0022   048   060   000    Old_age   Always       -       48 (Lifetime Min/Max 0/23)
195 Hardware_ECC_Recovered  0x001a   060   055   000    Old_age   Always       -       70299244
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Interrupted (host reset)      90%      1428         -
# 2  Extended offline    Self-test routine in progress 90%      1492         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
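
The huge raw value on attribute 190 looks less strange once it is split into bytes. This is only a sketch: the idea that the drive packs several small numbers into one raw field is an assumption, and `raw_bytes` is a made-up helper. The low byte comes out as 46 for the earlier run and 48 for this one, which matches the raw value of attribute 194 in each run.

```python
def raw_bytes(raw_value, width=4):
    """Return the low `width` bytes of a raw SMART value, least significant byte first."""
    return [(raw_value >> (8 * i)) & 0xFF for i in range(width)]

# Raw values of attribute 190 from the two smartctl runs posted in this thread.
for raw in (774111278, 807665712):
    print(raw, hex(raw), raw_bytes(raw))
```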

Last edited by pseudonym; 13th July 2008 at 09:17 PM. Reason: drive WAS a seagate, not maxtor

13th July 2008
pseudonym

Figured I would also throw in the dmesg as well.

Code:
[root@octavian /home/pseudonym]# dmesg
Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-RELEASE #0: Sun Feb 24 19:59:52 UTC 2008
    root@logan.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Core(TM)2 CPU          6300  @ 1.86GHz (1872.27-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x6f2  Stepping = 2
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xe3bd<SSE3,RSVD2,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM>
  AMD Features=0x20100000<NX,LM>
  AMD Features2=0x1<LAHF>
  Cores per package: 2
real memory  = 2137153536 (2038 MB)
avail memory = 2081665024 (1985 MB)
ACPI APIC Table: <INTEL  DQ965GF >
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
ioapic0: Changing APIC ID to 2
ioapic0 <Version 2.0> irqs 0-23 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
hptrr: HPT RocketRAID controller driver v1.1 (Feb 24 2008 19:59:27)
acpi0: <INTEL DQ965GF> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
cpu0: <ACPI CPU> on acpi0
est0: <Enhanced SpeedStep Frequency Control> on cpu0
p4tcc0: <CPU Frequency Thermal Control> on cpu0
cpu1: <ACPI CPU> on acpi0
est1: <Enhanced SpeedStep Frequency Control> on cpu1
p4tcc1: <CPU Frequency Thermal Control> on cpu1
acpi_button0: <Sleep Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
vgapci0: <VGA-compatible display> port 0x3438-0x343f mem 0x90200000-0x902fffff,0x80000000-0x8fffffff irq 16 at device 2.0 on pci0
agp0: <Intel Q965 SVGA controller> on vgapci0
agp0: detected 7676k stolen memory
agp0: aperture size is 256M
pci0: <simple comms> at device 3.0 (no driver attached)
atapci0: <Intel ATA controller> port 0x3430-0x3437,0x344c-0x344f,0x3428-0x342f,0x3448-0x344b,0x3400-0x340f irq 18 at device 3.2 on pci0
atapci0: [ITHREAD]
ata2: <ATA channel 0> on atapci0
ata2: [ITHREAD]
ata3: <ATA channel 1> on atapci0
ata3: [ITHREAD]
pci0: <simple comms, UART> at device 3.3 (no driver attached)
em0: <Intel(R) PRO/1000 Network Connection Version - 6.7.3> port 0x30e0-0x30ff mem 0x90300000-0x9031ffff,0x90324000-0x90324fff irq 20 at device 25.0 on pci0
em0: Using MSI interrupt
em0: Ethernet address: 00:19:d1:04:50:52
em0: [FILTER]
uhci0: <UHCI (generic) USB controller> port 0x30c0-0x30df irq 16 at device 26.0 on pci0
uhci0: [GIANT-LOCKED]
uhci0: [ITHREAD]
usb0: <UHCI (generic) USB controller> on uhci0
usb0: USB revision 1.0
uhub0: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
uhub0: 2 ports with 2 removable, self powered
uhci1: <UHCI (generic) USB controller> port 0x30a0-0x30bf irq 21 at device 26.1 on pci0
uhci1: [GIANT-LOCKED]
uhci1: [ITHREAD]
usb1: <UHCI (generic) USB controller> on uhci1
usb1: USB revision 1.0
uhub1: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb1
uhub1: 2 ports with 2 removable, self powered
ehci0: <EHCI (generic) USB 2.0 controller> mem 0x90326c00-0x90326fff irq 18 at device 26.7 on pci0
ehci0: [GIANT-LOCKED]
ehci0: [ITHREAD]
usb2: EHCI version 1.0
usb2: companion controllers, 2 ports each: usb0 usb1
usb2: <EHCI (generic) USB 2.0 controller> on ehci0
usb2: USB revision 2.0
uhub2: <Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1> on usb2
uhub2: 4 ports with 4 removable, self powered
pci0: <multimedia> at device 27.0 (no driver attached)
pcib1: <ACPI PCI-PCI bridge> at device 28.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> at device 28.1 on pci0
pci2: <ACPI PCI bus> on pcib2
atapci1: <Marvell 88SX6101 UDMA133 controller> port 0x2018-0x201f,0x2024-0x2027,0x2010-0x2017,0x2020-0x2023,0x2000-0x200f mem 0x90100000-0x901001ff irq 17 at device 0.0 on pci2
atapci1: [ITHREAD]
ata4: <ATA channel 0> on atapci1
ata4: [ITHREAD]
pcib3: <ACPI PCI-PCI bridge> at device 28.2 on pci0
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> at device 28.3 on pci0
pci4: <ACPI PCI bus> on pcib4
pcib5: <ACPI PCI-PCI bridge> at device 28.4 on pci0
pci5: <ACPI PCI bus> on pcib5
uhci2: <UHCI (generic) USB controller> port 0x3080-0x309f irq 23 at device 29.0 on pci0
uhci2: [GIANT-LOCKED]
uhci2: [ITHREAD]
usb3: <UHCI (generic) USB controller> on uhci2
usb3: USB revision 1.0
uhub3: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb3
uhub3: 2 ports with 2 removable, self powered
uhci3: <UHCI (generic) USB controller> port 0x3060-0x307f irq 19 at device 29.1 on pci0
uhci3: [GIANT-LOCKED]
uhci3: [ITHREAD]
usb4: <UHCI (generic) USB controller> on uhci3
usb4: USB revision 1.0
uhub4: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb4
uhub4: 2 ports with 2 removable, self powered
uhci4: <UHCI (generic) USB controller> port 0x3040-0x305f irq 18 at device 29.2 on pci0
uhci4: [GIANT-LOCKED]
uhci4: [ITHREAD]
usb5: <UHCI (generic) USB controller> on uhci4
usb5: USB revision 1.0
uhub5: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb5
uhub5: 2 ports with 2 removable, self powered
ehci1: <EHCI (generic) USB 2.0 controller> mem 0x90326800-0x90326bff irq 23 at device 29.7 on pci0
ehci1: [GIANT-LOCKED]
ehci1: [ITHREAD]
usb6: EHCI version 1.0
usb6: companion controllers, 2 ports each: usb3 usb4 usb5
usb6: <EHCI (generic) USB 2.0 controller> on ehci1
usb6: USB revision 2.0
uhub6: <Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1> on usb6
uhub6: 6 ports with 6 removable, self powered
pcib6: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci6: <ACPI PCI bus> on pcib6
atapci2: <SiI SiI 3512 SATA150 controller> port 0x1018-0x101f,0x1024-0x1027,0x1010-0x1017,0x1020-0x1023,0x1000-0x100f mem 0x90004800-0x900049ff irq 21 at device 0.0 on pci6
atapci2: [ITHREAD]
ata5: <ATA channel 0> on atapci2
ata5: [ITHREAD]
ata6: <ATA channel 1> on atapci2
ata6: [ITHREAD]
fwohci0: <Texas Instruments TSB43AB22/A> mem 0x90004000-0x900047ff,0x90000000-0x90003fff irq 19 at device 3.0 on pci6
fwohci0: [FILTER]
fwohci0: OHCI version 1.10 (ROM=0)
fwohci0: No. of Isochronous channels is 4.
fwohci0: EUI64 00:02:b3:00:37:13:44:37
fwohci0: Phy 1394a available S400, 2 ports.
fwohci0: Link S400, max_rec 2048 bytes.
firewire0: <IEEE1394(FireWire) bus> on fwohci0
dcons_crom0: <dcons configuration ROM> on firewire0
dcons_crom0: bus_addr 0x12dc000
fwe0: <Ethernet over FireWire> on firewire0
if_fwe0: Fake Ethernet address: 02:02:b3:13:44:37
fwe0: Ethernet address: 02:02:b3:13:44:37
fwip0: <IP over FireWire> on firewire0
fwip0: Firewire address: 00:02:b3:00:37:13:44:37 @ 0xfffe00000000, S400, maxrec 2048
sbp0: <SBP-2/SCSI over FireWire> on firewire0
fwohci0: Initiate bus reset
fwohci0: BUS reset
fwohci0: node_id=0xc800ffc0, gen=1, CYCLEMASTER mode
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci3: <Intel ICH8 SATA300 controller> port 0x3418-0x341f,0x3444-0x3447,0x3410-0x3417,0x3440-0x3443,0x3020-0x303f mem 0x90326000-0x903267ff irq 19 at device 31.2 on pci0
atapci3: [ITHREAD]
atapci3: AHCI called from vendor specific driver
atapci3: AHCI Version 01.10 controller with 6 ports detected
ata7: <ATA channel 0> on atapci3
ata7: [ITHREAD]
ata8: <ATA channel 1> on atapci3
ata8: [ITHREAD]
ata9: <ATA channel 2> on atapci3
ata9: [ITHREAD]
ata10: <ATA channel 3> on atapci3
ata10: [ITHREAD]
ata11: <ATA channel 4> on atapci3
ata11: [ITHREAD]
ata12: <ATA channel 5> on atapci3
ata12: [ITHREAD]
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio0: [FILTER]
pmtimer0 on isa0
ata0 at port 0x1f0-0x1f7,0x3f6 irq 14 on isa0
ata0: [ITHREAD]
ata1 at port 0x170-0x177,0x376 irq 15 on isa0
ata1: [ITHREAD]
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/8 bytes threshold
ppbus0: <Parallel port bus> on ppc0
ppbus0: [ITHREAD]
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
ppc0: [GIANT-LOCKED]
ppc0: [ITHREAD]
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ukbd0: <CHESEN USB Keyboard, class 0/0, rev 1.10/1.10, addr 2> on uhub1
kbd2 at ukbd0
uhid0: <CHESEN USB Keyboard, class 0/0, rev 1.10/1.10, addr 2> on uhub1
Timecounters tick every 1.000 msec
hptrr: no controller detected.firewire0: 1 nodes, maxhop <= 0, cable IRM = 0 (me)
firewire0: bus manager 0 (me)

ad8: 715404MB <Seagate ST3750640A 3.AAE> at ata4-master UDMA100
ad14: 305245MB <Seagate ST3320620AS 3.AAE> at ata7-master SATA150
ad16: 238475MB <Seagate ST3250410AS 3.AAC> at ata8-master SATA150
ad18: 238475MB <Seagate ST3250410AS 3.AAC> at ata9-master SATA150
ad20: 305245MB <Seagate ST3320620AS 3.AAC> at ata10-master SATA150
ad22: 238475MB <Seagate ST3250410AS 3.AAC> at ata11-master SATA150
ad24: 305245MB <Seagate ST3320620AS 3.AAE> at ata12-master SATA150
ar0: 915729MB <Intel MatrixRAID RAID0 (stripe 128 KB)> status: READY
ar0: disk0 READY using ad14 at ata7-master
ar0: disk1 READY using ad20 at ata10-master
ar0: disk2 READY using ad24 at ata12-master
ar1: 476944MB <Intel MatrixRAID RAID5 (stripe 128 KB)> status: READY
ar1: disk0 READY using ad16 at ata8-master
ar1: disk1 READY using ad18 at ata9-master
ar1: disk2 READY using ad22 at ata11-master
SMP: AP CPU #1 Launched!
Trying to mount root from ufs:/dev/ar0s1a
WARNING: /archive/babylon was not properly dismounted

13th July 2008
pseudonym

Oops! Sorry.. thought the 750 was a Maxtor. Guess it is also a Seagate =)

13th July 2008
DrJ

Smartctl output has always been more useful for SCSI drives. What you have is typical of IDEs (and probably SATAs); if you don't get the error rates most of the information is not that helpful. Temperatures are, I suppose, but I doubt that this is temperature related.

But do try the -a flag. It may not give you any different information, but it might.

13th July 2008
DrJ

Is this IDE or SATA? What controller are you using (you have a few)?

13th July 2008
pseudonym

That makes sense. I am not using my SCSI stuff right now (I have a 9-bay external SCSI tower that I am going to convert to a SATA RAID box.. damn thing sounds like a jet!). I have a whole mess of 36 GB Ultra320 drives. I'm having quite a few issues with the Adaptec 2100s controllers I have, so they aren't in use. Besides, I don't need the speed, as this is mostly my torrent/file server (it also runs SWARM, which is why it is a beefy system).

Any idea why I am getting "failed to write superblock" when I try tunefs -n disable? It doesn't say to run fsck /dev/ad8s1d like most of the errors I see when I look online.

The output above is with the -a flag =(.

13th July 2008
pseudonym

The 750 is IDE; the rest are SATA. They are all hooked up to a JMicron JMB363 onboard RAID controller. It has 6 SATAII ports and, of course, two IDE connectors. I have another SATA controller in there, but it isn't in use ATM. That is what the two new 160s are hooked up to, but they are unplugged.
Old 13th July 2008
DrJ DrJ is offline
ISO Quartermaster
 
Join Date: Apr 2008
Location: Gold Country, CA
Posts: 507
Default

I *think* the JMicron device was fixed, but it was broken for a long time. I'd check the lists to see if that is so. I don't keep up much with IDE or SATA devices, since all my FreeBSD disks are SCSI. So I can't help you much with that.

Do you have another IDE controller lying around to try?

While it shouldn't matter, you might want to disconnect all your disks other than the one you are having trouble with. Just removing the power connection would be good enough.
Old 13th July 2008
pseudonym pseudonym is offline
Port Guard
 
Join Date: Jul 2008
Posts: 13
Default

Thanks for all the help BTW. I really appreciate it. I really wish I could disconnect all the drives, but root is on the ar0. Do you think that could be it? I actually picked FreeBSD because it supported the JMicron device (I am an OpenBSD guy myself; Theo is pretty much my hero ever since I went down to Calgary to hear him speak =)).

If I could, right now I would back up all my data and try again. However, when you have almost 2TB of data, it gets a little hard to find a place to put it! LOL. =).

Do you think that having root on ar0 might be the issue? I was thinking of doing a reinstall and putting the system on one of the 160s....
Old 13th July 2008
DrJ DrJ is offline
ISO Quartermaster
 
Join Date: Apr 2008
Location: Gold Country, CA
Posts: 507
Default

Quote:
Originally Posted by pseudonym View Post
I really wish I could disconnect all the drives, but root is on the ar0. Do you think that could be it?
I've no idea.
Quote:
I actually picked FreeBSD because it supported the JMicron device.
I *think* it does, but if I were you I would not rely on that.
Quote:
Do you think that having root on ar0 might be the issue? I was thinking of doing a reinstall and putting the system on one of the 160s....
I really can't say. Does the RocketRaid controller (which I assume is what is running your array) use the "standard" driver? If so, it should be OK. But I really am not a RAID expert at all, so again you will have to rely on someone more familiar with your hardware to lend a hand.

But this sure sounds like a driver bug, or a subtle interaction of things on your system. For me, the OS either worked or it didn't. Except when I had bad cables (one was on an IDE CD on the JMicron, which I did get to work).
Old 13th July 2008
pseudonym pseudonym is offline
Port Guard
 
Join Date: Jul 2008
Posts: 13
Default

The RocketRAID is definitely not supported. It is the secondary SATA controller I threw in for the two 160s, and I am just using it for the SATA ports. Nothing is connected to it ATM.

I was thinking that it might be because that controller is on the PCI bus, isn't a PCI-e controller, and is flooding the PCI bus. In fact, that is my big concern: am I flooding the IDE bus and getting read errors? If so, any idea how to slow things down?

It does sound like a driver issue to me as well. Guess I might just have to wait for an updated version...
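
For a rough sense of scale on the PCI concern, here is a back-of-the-envelope sketch. It assumes a conventional 32-bit/33 MHz PCI slot (about 133 MB/s theoretical peak); the per-drive rates are made-up placeholders, not measurements from this system, and the function names are hypothetical. It only estimates whether the bus could be a bottleneck; it says nothing about whether that would produce read errors.

```python
def pci_peak_mb_per_s(bus_width_bits=32, clock_mhz=33.33):
    """Theoretical peak of a parallel PCI bus in MB/s (bytes per clock * MHz)."""
    return bus_width_bits / 8 * clock_mhz


def bus_utilisation(drive_rates_mb_per_s, bus_peak_mb_per_s):
    """Fraction of the bus peak consumed if all drives stream at once."""
    return sum(drive_rates_mb_per_s) / bus_peak_mb_per_s


if __name__ == "__main__":
    peak = pci_peak_mb_per_s()
    # Assumption: two drives each sustaining 70 MB/s (hypothetical figures).
    drives = [70.0, 70.0]
    print(f"PCI peak: {peak:.1f} MB/s")
    print(f"Utilisation with {len(drives)} drives: {bus_utilisation(drives, peak):.0%}")
```

A result above 100% would mean the drives together could ask for more than the bus can carry, so transfers would be throttled by the bus.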
Old 14th July 2008
pseudonym pseudonym is offline
Port Guard
 
Join Date: Jul 2008
Posts: 13
Default

Okay. I think I have found a bad block! W00t! Can't believe I am saying that LOL. Still having issues with ad10 and ad12, but hopefully I can fix this. Here is what I am getting:

1 Extended offline Completed: read failure 30% 1495 980569737

980569737 is the LBA_of_first_error. So now I have to figure out how to mark that block bad. Any ideas?
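
A first step is probably to work out where that LBA sits on the disk and inside the affected partition. The sketch below is only arithmetic, not a repair procedure. It assumes 512-byte sectors, and the partition start (63) and fragment size (2048 bytes) are hypothetical placeholders that would have to be replaced with the real values for the slice in question. The helper names are made up for illustration.

```python
import re

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def parse_first_error_lba(log_line):
    """Pull the trailing LBA out of a self-test log line as pasted above.

    Expects the line to end with: <remaining>% <lifetime hours> <LBA>.
    Returns None if the line does not end that way.
    """
    match = re.search(r"(\d+)%\s+(\d+)\s+(\d+)\s*$", log_line)
    return int(match.group(3)) if match else None


def lba_to_byte_offset(lba, sector_size=SECTOR_SIZE):
    """Absolute byte offset of the sector from the start of the disk."""
    return lba * sector_size


def lba_to_fs_block(lba, partition_start_lba, fs_block_size,
                    sector_size=SECTOR_SIZE):
    """Index of the filesystem block (or fragment) holding the sector,
    counted from the start of the partition."""
    if lba < partition_start_lba:
        raise ValueError("LBA lies before the start of the partition")
    return (lba - partition_start_lba) * sector_size // fs_block_size


if __name__ == "__main__":
    line = "1 Extended offline Completed: read failure 30% 1495 980569737"
    lba = parse_first_error_lba(line)
    print("LBA:", lba)
    print("byte offset on disk:", lba_to_byte_offset(lba))
    # Hypothetical values: partition starts at sector 63, 2048-byte fragments.
    print("fragment in partition:", lba_to_fs_block(lba, 63, 2048))
```

With the real partition start and fragment size plugged in, the resulting number identifies which part of the filesystem overlaps the unreadable sector.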
Content copyright © 2007-2010, the authors
Daemon image copyright ©1988, Marshall Kirk McKusick