Virtualization with a Fedora 13 Host ( Part 2 )

After the first odyssey I (think) learned that btrfs is really experimental! So I tried the robust filesystem XFS 🙂 This ended in a Bugzilla report:

Bug 624293 – XFS internal error / mount: Structure needs cleaning

Description of problem:

This is a virtual KVM system on a Fedora 13 host. The disk is an LVM-based disk
with XFS, attached directly to the guest.

[root@scheat /]# mount /dev/vdb
mount: Structure needs cleaning
[root@scheat /]#

Aug 15 20:30:05 scheat kernel: Filesystem "vdb": Disabling barriers, trial
barrier write failed
Aug 15 20:30:05 scheat kernel: XFS mounting filesystem vdb
Aug 15 20:30:05 scheat kernel: Starting XFS recovery on filesystem: vdb
(logdev: internal)
Aug 15 20:30:05 scheat kernel: ffff88007c7f1400: 00 80 00 01 00 00 00 00 d2 ff
12 0f 01 3c 00 00  .............<..
Aug 15 20:30:05 scheat kernel: Filesystem "vdb": XFS internal error
xfs_read_agi at line 1499 of file fs/xfs/xfs_ialloc.c.  Caller
0xffffffffa009e190
Aug 15 20:30:05 scheat kernel:
Aug 15 20:30:05 scheat kernel: Pid: 20624, comm: mount Not tainted
2.6.33.6-147.2.4.fc13.x86_64 #1
Aug 15 20:30:05 scheat kernel: Call Trace:
Aug 15 20:30:05 scheat kernel: [<ffffffffa009c3aa>] xfs_error_report+0x3c/0x3e
[xfs]
Aug 15 20:30:05 scheat kernel: [<ffffffffa009e190>] ?
xfs_ialloc_read_agi+0x1b/0x62 [xfs]
Aug 15 20:30:05 scheat kernel: [<ffffffffa009c3fa>]
xfs_corruption_error+0x4e/0x59 [xfs]
Aug 15 20:30:05 scheat kernel: [<ffffffffa009e162>] xfs_read_agi+0xc9/0xdc
[xfs]
Aug 15 20:30:05 scheat kernel: [<ffffffffa009e190>] ?
xfs_ialloc_read_agi+0x1b/0x62 [xfs]
Aug 15 20:30:05 scheat kernel: [<ffffffffa009e190>]
xfs_ialloc_read_agi+0x1b/0x62 [xfs]
Aug 15 20:30:05 scheat kernel: [<ffffffffa009e1f4>]
xfs_ialloc_pagi_init+0x1d/0x3f [xfs]
Aug 15 20:30:05 scheat kernel: [<ffffffffa00b100c>]
xfs_initialize_perag_data+0x61/0xea [xfs]
Aug 15 20:30:05 scheat kernel: [<ffffffffa00b1b69>] xfs_mountfs+0x32a/0x60d
[xfs]
Aug 15 20:30:05 scheat kernel: [<ffffffffa009c8df>] ?
xfs_fstrm_free_func+0x0/0x99 [xfs]
Aug 15 20:30:05 scheat kernel: [<ffffffffa00ba6e0>] ? kmem_zalloc+0x11/0x2a
[xfs]
Aug 15 20:30:05 scheat kernel: [<ffffffffa00b245b>] ?
xfs_mru_cache_create+0x117/0x146 [xfs]
Aug 15 20:30:05 scheat kernel: [<ffffffffa00c4a06>]
xfs_fs_fill_super+0x1f4/0x36e [xfs]
Aug 15 20:30:05 scheat kernel: [<ffffffff811039bf>] get_sb_bdev+0x134/0x197
Aug 15 20:30:05 scheat kernel: [<ffffffffa00c4812>] ?
xfs_fs_fill_super+0x0/0x36e [xfs]
Aug 15 20:30:05 scheat kernel: [<ffffffffa00c2c60>] xfs_fs_get_sb+0x13/0x15
[xfs]
Aug 15 20:30:05 scheat kernel: [<ffffffff81103193>] vfs_kern_mount+0xa4/0x163
Aug 15 20:30:05 scheat kernel: [<ffffffff811032b0>] do_kern_mount+0x48/0xe8
Aug 15 20:30:05 scheat kernel: [<ffffffff8111801e>] do_mount+0x752/0x7c8
Aug 15 20:30:05 scheat kernel: [<ffffffff810d3424>] ? copy_from_user+0x3c/0x44
Aug 15 20:30:05 scheat kernel: [<ffffffff810d37b2>] ? strndup_user+0x58/0x82
Aug 15 20:30:05 scheat kernel: [<ffffffff81118117>] sys_mount+0x83/0xbd
Aug 15 20:30:05 scheat kernel: [<ffffffff81009b02>]
system_call_fastpath+0x16/0x1b

Version-Release number of selected component (if applicable):

see sosreport

Additional info:

How should I proceed with this error?

thanks Michael
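
For anyone who runs into the same "Structure needs cleaning" message: the usual first diagnostic step is to run xfs_repair in no-modify mode against the unmounted device, so you can see what is broken before touching anything on disk. A minimal sketch, not part of the bug report above, assuming the guest disk is still /dev/vdb:

# check only, change nothing (-n = no-modify mode)
xfs_repair -n /dev/vdb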

So I opened a support call with LSI to get help from there, because everyone said the controller should handle such errors. They found out that my disks are not compatible with my controller at 3 Gb/s, so I had to limit the link speed to 1.5 Gb/s.

But this did not help either, so I decided to halve the capacity to 2 x 2 TB gross, which also had no success. In the end I got a new controller from LSI (really, thanks a lot to LSI for the good support!). They have excellent scripts for collecting logs and very good support people.

Even with the new controller I got the same errors on XFS, so I opened another Bugzilla report:

Bug 626684 - Filesystem corruption in both xfs & ext4 with KVM guest

Created attachment 440563 [details]
Guest Sosreport Scheat

Description of problem:

I have a Fedora 13 KVM host (Enif, 2 cores / 6 GB memory) with a Fedora 13 KVM guest
(Scheat, 1 core / 2 GB memory). The guest has a 4 TB LVM LUN (from a 3Ware
3690SA-8i controller, RAID 10) attached with XFS.

I tried to copy a complete directory with

 cp -a data data-test

and after a few minutes XFS shut down:

Aug 23 23:47:08 scheat kernel: Pid: 13255, comm: cp Not tainted
2.6.33.6-147.2.4.fc13.x86_64 #1
Aug 23 23:47:08 scheat kernel: Call Trace:
Aug 23 23:47:08 scheat kernel: [<ffffffffa009c3aa>] xfs_error_report+0x3c/0x3e
[xfs]
Aug 23 23:47:08 scheat kernel: [<ffffffffa00b80d9>] ? xfs_create+0x4b8/0x547
[xfs]
Aug 23 23:47:08 scheat kernel: [<ffffffffa00b39d8>] xfs_trans_cancel+0x5f/0xea
[xfs]
Aug 23 23:47:08 scheat kernel: [<ffffffffa00b80d9>] xfs_create+0x4b8/0x547
[xfs]
Aug 23 23:47:08 scheat kernel: [<ffffffffa00c120b>] xfs_vn_mknod+0xd0/0x16d
[xfs]
Aug 23 23:47:08 scheat kernel: [<ffffffffa00c12c3>] xfs_vn_create+0xb/0xd [xfs]
Aug 23 23:47:08 scheat kernel: [<ffffffff81109e66>] vfs_create+0x73/0x95
Aug 23 23:47:08 scheat kernel: [<ffffffff8110c445>] do_filp_open+0x36c/0xad5
Aug 23 23:47:08 scheat kernel: [<ffffffff8120396d>] ? might_fault+0x1c/0x1e
Aug 23 23:47:08 scheat kernel: [<ffffffff81114fdd>] ? alloc_fd+0x76/0x11f
Aug 23 23:47:08 scheat kernel: [<ffffffff810ff79a>] do_sys_open+0x5e/0x10a
Aug 23 23:47:08 scheat kernel: [<ffffffff810ff86f>] sys_open+0x1b/0x1d
Aug 23 23:47:08 scheat kernel: [<ffffffff81009b02>]
system_call_fastpath+0x16/0x1b
Aug 23 23:47:08 scheat kernel: xfs_force_shutdown(vdb,0x8) called from line
1163 of file fs/xfs/xfs_trans.c.  Return address = 0xffffffffa00b39f1
Aug 23 23:47:08 scheat kernel: Filesystem "vdb": Corruption of in-memory data
detected.  Shutting down filesystem: vdb
Aug 23 23:47:08 scheat kernel: Please umount the filesystem, and rectify the
problem(s)
Aug 23 23:47:11 scheat kernel: Filesystem "vdb": xfs_log_force: error 5
returned.
Aug 23 23:47:41 scheat kernel: Filesystem "vdb": xfs_log_force: error 5
returned.
Aug 23 23:48:05 scheat abrtd: Can't load
'/usr/lib64/abrt/libKerneloopsScanner.so':
/usr/lib64/abrt/libKerneloopsScanner.so: cannot open shared object file: No
such file or directory
Aug 23 23:48:05 scheat abrtd: Plugin 'KerneloopsScanner' is not registered
Aug 23 23:48:11 scheat kernel: Filesystem "vdb": xfs_log_force: error 5
returned.
Aug 23 23:48:41 scheat kernel: Filesystem "vdb": xfs_log_force: error 5
returned.
Aug 23 23:49:11 scheat kernel: Filesystem "vdb": xfs_log_force: error 5
returned.
Aug 23 23:49:41 scheat kernel: Filesystem "vdb": xfs_log_force: error 5
returned.


How reproducible:

 cp -a data data-test


After XFS did not bring any success, I tried it with ext4 (that must surely be a rock-solid filesystem), or maybe not! Here is my follow-up comment from the Bugzilla:

Hi Eric,

no idea why I have some special files in my data/home?

I tried to rerun this test. xfs_repair found errors, but I had to discard the
XFS log first.
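
Presumably this means xfs_repair's -L option, which zeroes a dirty log that can no longer be replayed by mounting. A rough sketch, again assuming the guest disk is /dev/vdb; be aware that -L throws away any metadata updates still sitting in the log:

# make sure the filesystem is not mounted, then zero the log and repair
# (destructive: unreplayed log transactions are lost)
umount /dev/vdb
xfs_repair -L /dev/vdb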

In the meantime I tried another test. With my four 2 TB disks I built two RAID 1
arrays of 2 TB gross each and formatted them with ext4 (the assumption being
that either the XFS filesystem is bad or the disk size is too much).

The disks are presented to the host system Enif (Fedora 13), and one is
exported to the guest Scheat (Fedora 13) with LVM on it and then ext4 as a
disk.
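
The two RAID 1 arrays were presumably built on the RAID controller itself, since the host sees each of them as a single disk (/dev/sdb and /dev/sdc below). The LVM layout on top was created with standard commands; the exact invocations were not recorded, but a reconstruction along these lines produces the vg_data1/vg_data2 layout shown next:

# on the host Enif: one PV/VG/LV per hardware RAID 1 array
pvcreate /dev/sdb /dev/sdc
vgcreate vg_data1 /dev/sdb
vgcreate vg_data2 /dev/sdc
lvcreate -l 100%FREE -n lv_data1 vg_data1
lvcreate -l 100%FREE -n lv_data2 vg_data2

# inside the guest Scheat the exported LV shows up as /dev/vdb
# (simplified; any guest-side LVM layer is left out here)
mkfs.ext4 /dev/vdb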

[root@enif ~]# pvs
  PV         VG       Fmt  Attr PSize   PFree 
  /dev/sda3  vg_local lvm2 a-   288.01g 68.01g
  /dev/sdb   vg_data1 lvm2 a-     1.82t     0 
  /dev/sdc   vg_data2 lvm2 a-     1.82t     0 
[root@enif ~]# vgs
  VG       #PV #LV #SN Attr   VSize   VFree 
  vg_data1   1   1   0 wz--n-   1.82t     0 
  vg_data2   1   1   0 wz--n-   1.82t     0 
  vg_local   1   4   0 wz--n- 288.01g 68.01g
[root@enif ~]# lvs
  LV               VG       Attr   LSize   Origin Snap%  Move Log Copy% 
Convert
  lv_data1         vg_data1 -wi-ao   1.82t                                      
  lv_data2         vg_data2 -wi-ao   1.82t                                      
  lv_enif_crash    vg_local -wi-ao  10.00g                                      
  lv_enif_root     vg_local -wi-ao  30.00g                                      
  lv_old_enif_root vg_local -wi-a-  30.00g                                      
  lv_virt          vg_local -wi-ao 150.00g                                      
[root@enif ~]# 

Config snippet from the KVM domain XML:

    <disk type='block' device='disk'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/mapper/vg_data1-lv_data1'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
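
For completeness: instead of editing the domain XML by hand, a disk like this can also be attached with virsh. A hedged example, using the domain name scheat and the target vdb from the snippet above:

# attach the host LV to the guest as a raw virtio disk
virsh attach-disk scheat /dev/mapper/vg_data1-lv_data1 vdb --driver qemu --subdriver raw

Whether the attachment survives a guest restart depends on the libvirt version and the options used, so for a permanent setup editing the XML as above is the safe route.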

And now I copy the data to one 2 TB disk inside the guest and to the other
2 TB disk on the host, as follows:

- mount the data over NFS on the host and on the guest
- rsync the data from the NFS-mounted share to the attached disk
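
The two steps translate into commands roughly like these (the NFS server name and the paths are placeholders, not the real ones):

# mount the share that holds the data
mount -t nfs oldserver:/export/data /mnt/nfsdata

# copy it onto the locally attached 2 TB disk
rsync -a /mnt/nfsdata/ /data/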

Result:

- on the host Enif: no problems, all data are there without errors!
- on the KVM guest Scheat: lots of problems!!
- fsck ran for a very long time


 EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory
#67502197: rec_len is too small for name_len - offset=0, inode=8388608,
rec_len=16, name_len=128
EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory
#69206100: inode out of bounds - offset=0, inode=4294967295, rec_len=4096,
name_len=255
EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory
#69206099: inode out of bounds - offset=0, inode=4294967295, rec_len=4096,
name_len=255
EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory
#70516741: inode out of bounds - offset=0, inode=4294967295, rec_len=4096,
name_len=255
EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory
#70516739: inode out of bounds - offset=0, inode=4294967295, rec_len=4096,
name_len=255
EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory
#69206101: inode out of bounds - offset=0, inode=4294967295, rec_len=4096,
name_len=255
EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory
#73400854: rec_len is too small for name_len - offset=0, inode=12582912,
rec_len=16, name_len=192
EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory
#73531481: directory entry across blocks - offset=0, inode=262672436,
rec_len=248684, name_len=169
EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory
#73531484: directory entry across blocks - offset=0, inode=3230352654,
rec_len=119404, name_len=187
EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory
#73531941: directory entry across blocks - offset=0, inode=1284650619,
rec_len=56464, name_len=143
EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory
#73531488: directory entry across blocks - offset=0, inode=2094283311,
rec_len=176972, name_len=73
EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory
#73531945: directory entry across blocks - offset=0, inode=4200826031,
rec_len=195440, name_len=11
EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory
#73531949: directory entry across blocks - offset=0, inode=2307910799,
rec_len=40712, name_len=108
EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory
#73533776: rec_len is too small for name_len - offset=0, inode=12582912,
rec_len=16, name_len=192
EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory
#73531955: directory entry across blocks - offset=0, inode=2974024306,
rec_len=45480, name_len=211
EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory
#73531954: directory entry across blocks - offset=0, inode=2359655960,
rec_len=64764, name_len=19
EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory
#73531953: directory entry across blocks - offset=0, inode=2773650414,
rec_len=40312, name_len=7
EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory
#73531957: directory entry across blocks - offset=0, inode=2563861061,
rec_len=125588, name_len=200
EXT4-fs error (device vdb): htree_dirblock_to_tree: bad entry in directory
#73269736: inode out of bounds - offset=0, inode=4294967295, rec_len=4096,
name_len=2


IMHO it looks like the virtualization with KVM is causing the problem.

Mike

So I could verify that the virtualization itself was somehow the problem!

Then Daniel Berrange wrote this in my Bugzilla report:

Daniel Berrange 2010-08-26 13:30:31 EDT

> and now I copy the data to a 2TB disk inside the guest and the other 2TB on the
> host as follow:

Ah the magic phrase "2TB disk guest disk". Might well be hitting this bug:

  "2tb virtio disk gets massively corrupted filesystems "

  https://bugzilla.redhat.com/show_bug.cgi?id=605757
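
The disks in my setup were in exactly that size range. A quick way to check from inside the guest how large a virtio disk really is (a hedged example, device name as above):

# prints the disk size in bytes; 2 TiB = 2199023255552 bytes
blockdev --getsize64 /dev/vdb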

So after a few months I now have a running home server again, and I have migrated all my stuff from the old HP ML370 (which actually used quite a lot of power).