GlusterFS: "gfid different on data file"
Posted: Tue Jan 27, 2015 1:55 pm
[PROBLEM]
I'm having problems with a glusterfs share crashing when I try to copy files to certain subfolders.
Once the mount crashes, rsync for example gives the following error:
Transport endpoint is not connected (107)
In the logfiles for that volume in /var/log/glusterfs, I get the following messages:
gfid different on data file on dlp-storage-client-X
[...]
multiple subvolumes (dlp-storage-client-X and dlp-storage-client-Y) have file XXX
The detailed log looks like this:
[2015-01-27 00:24:11.553397] W [dht-common.c:1580:dht_lookup_linkfile_cbk] 0-dlp-storage-dht: /part2/video/12/00/12-00122_B03/HIRES/12-00122_b03.mpg: gfid different on data file on dlp-storage-client-0
[2015-01-27 00:24:11.554221] W [dht-common.c:1335:dht_lookup_everywhere_cbk] 0-dlp-storage-dht: /part2/video/12/00/12-00122_B03/HIRES/12-00122_b03.mpg: gfid differs on subvolume dlp-storage-client-0
[2015-01-27 00:24:11.554299] W [dht-common.c:1335:dht_lookup_everywhere_cbk] 0-dlp-storage-dht: /part2/video/12/00/12-00122_B03/HIRES/12-00122_b03.mpg: gfid differs on subvolume dlp-storage-client-2
[2015-01-27 00:24:11.554318] W [dht-common.c:1397:dht_lookup_everywhere_cbk] 0-dlp-storage-dht: multiple subvolumes (dlp-storage-client-0 and dlp-storage-client-2) have file /part2/video/12/00/12-00122_B03/HIRES/12-00122_b03.mpg (preferably rename the file in the backend, and do a fresh lookup)
[2015-01-27 00:24:11.554558] W [fuse-bridge.c:462:fuse_entry_cbk] 0-glusterfs-fuse: 14805548: LOOKUP() /part2/video/12/00/12-00122_B03/HIRES/12-00122_b03.mpg => -1 (Stale NFS file handle)
[2015-01-27 00:24:11.559353] E [dht-helper.c:1240:dht_inode_ctx_get] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.6/xlator/cluster/distribute.so(dht_lookup_everywhere_done+0xa15) [0x7fb8150234c5] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.6/xlator/cluster/distribute.so(dht_layout_preset+0x59) [0x7fb815009879] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.6/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x34) [0x7fb81500bc44]))) 0-dlp-storage-dht: invalid argument: inode
[2015-01-27 00:24:11.559397] E [dht-helper.c:1259:dht_inode_ctx_set] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.6/xlator/cluster/distribute.so(dht_lookup_everywhere_done+0xa15) [0x7fb8150234c5] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.6/xlator/cluster/distribute.so(dht_layout_preset+0x59) [0x7fb815009879] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.6/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x52) [0x7fb81500bc62]))) 0-dlp-storage-dht: invalid argument: inode
[2015-01-27 00:24:11.559419] W [fuse-bridge.c:397:fuse_entry_cbk] 0-glusterfs-fuse: Received NULL gfid for /part2/video/12/00/12-00122_B03/HIRES/12-00122_b03.mpg. Forcing EIO
[2015-01-27 00:24:11.559475] W [fuse-bridge.c:462:fuse_entry_cbk] 0-glusterfs-fuse: 14805551: LOOKUP() /part2/video/12/00/12-00122_B03/HIRES/12-00122_b03.mpg => -1 (Input/output error)
pending frames:
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2015-01-27 00:24:11
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.6
/lib/x86_64-linux-gnu/libc.so.6(+0x321e0)[0x7fb8199371e0]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.6/xlator/protocol/client.so(client3_3_lookup_cbk+0x88)[0x7fb81526c948]
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_handle_reply+0xa4)[0x7fb81a4cb1a4]
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xcd)[0x7fb81a4cb52d]
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fb81a4c7bf3]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.6/rpc-transport/socket.so(+0x88b6)[0x7fb81651c8b6]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.6/rpc-transport/socket.so(+0xaf9c)[0x7fb81651ef9c]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x63b7a)[0x7fb81a736b7a]
/usr/sbin/glusterfs(main+0x3f5)[0x7fb81ab7efe5]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x7fb819923ead]
/usr/sbin/glusterfs(+0x63b9)[0x7fb81ab7f3b9]
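Because these warnings follow a fixed pattern, the list of affected paths can be pulled out of the client log instead of being collected by hand. Below is a minimal Python sketch that assumes the exact wording of the "multiple subvolumes ... have file ..." line shown in the 3.4.6 log above; other GlusterFS versions may phrase it differently. The names `DuplicateEntry` and `parse_duplicates` are made up for illustration.

```python
import re
import sys
from typing import Iterable, List, NamedTuple


class DuplicateEntry(NamedTuple):
    """One 'multiple subvolumes ... have file ...' warning from the client log."""
    subvol_a: str
    subvol_b: str
    path: str


# Assumption: wording as logged by dht_lookup_everywhere_cbk in glusterfs 3.4.6.
DUP_RE = re.compile(
    r"multiple subvolumes \((?P<a>\S+) and (?P<b>\S+)\) "
    r"have file (?P<path>.+?) \(preferably"
)


def parse_duplicates(lines: Iterable[str]) -> List[DuplicateEntry]:
    """Return each distinct (subvolume, subvolume, path) triple once, in log order."""
    seen = set()
    result: List[DuplicateEntry] = []
    for line in lines:
        m = DUP_RE.search(line)
        if m is None:
            continue  # not a duplicate-file warning
        entry = DuplicateEntry(m.group("a"), m.group("b"), m.group("path"))
        if entry not in seen:
            seen.add(entry)
            result.append(entry)
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python3 parse_dups.py /var/log/glusterfs/<client-log>.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for e in parse_duplicates(fh):
            print(f"{e.path}\t{e.subvol_a}\t{e.subvol_b}")
```

The client log's file name depends on the mount point, so the path above is only a placeholder. Each output line is one affected path plus the two subvolumes that reported it, which gives a worklist for the backend fix.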
That's what the entry in the log clearly says:
multiple subvolumes (dlp-storage-client-X and dlp-storage-client-Y) have file XXX
It actually also mentions a possible solution:
preferably rename the file in the backend, and do a fresh lookup
The system is a 64-bit Debian 7 (Wheezy) with the glusterfs-client package 3.4.6-1 installed from the standard repositories.
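Before renaming anything in the backend, it helps to confirm which files really exist as regular files on more than one brick. The sketch below assumes all brick directories are reachable from the machine running it (the brick paths are hypothetical). It skips the internal `.glusterfs` tree at each brick root and ignores DHT link files, which are legitimately present on a second brick and are detected here by the heuristic "empty file whose only mode bit is the sticky bit (`---------T`)". It can also print the `trusted.gfid` attribute of each copy, which normally requires root.

```python
import os
import stat
import sys
from collections import defaultdict
from typing import Dict, List, Optional


def is_dht_linkfile(st: os.stat_result) -> bool:
    """Heuristic: DHT link files are empty and carry only the sticky bit."""
    return st.st_size == 0 and (st.st_mode & 0o7777) == 0o1000


def read_gfid(path: str) -> Optional[str]:
    """Return the trusted.gfid xattr as hex, or None if it cannot be read."""
    if not hasattr(os, "getxattr"):  # os.getxattr is Linux-only
        return None
    try:
        return os.getxattr(path, "trusted.gfid", follow_symlinks=False).hex()
    except OSError:  # missing attribute, no permission, or no xattr support
        return None


def find_duplicates(brick_roots: List[str]) -> Dict[str, List[str]]:
    """Map brick-relative paths to the bricks holding a real (non-link) copy,
    keeping only paths found on two or more bricks."""
    holders: Dict[str, List[str]] = defaultdict(list)
    for root in brick_roots:
        for dirpath, dirnames, filenames in os.walk(root):
            # Do not descend into GlusterFS's internal gfid tree at the brick root.
            if dirpath == root and ".glusterfs" in dirnames:
                dirnames.remove(".glusterfs")
            for name in filenames:
                full = os.path.join(dirpath, name)
                st = os.lstat(full)
                if not stat.S_ISREG(st.st_mode) or is_dht_linkfile(st):
                    continue  # skip symlinks, special files and DHT link files
                holders[os.path.relpath(full, root)].append(root)
    return {rel: roots for rel, roots in holders.items() if len(roots) > 1}


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (hypothetical paths): python3 brick_dups.py /export/brick1 /export/brick2
    for rel, roots in sorted(find_duplicates(sys.argv[1:]).items()):
        gfids = [read_gfid(os.path.join(r, rel)) for r in roots]
        print(rel, list(zip(roots, gfids)))
```

Differing gfids between the copies of one path would match the "gfid differs on subvolume" warnings in the log.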
In a previous GlusterFS version (3.4.0), the file listing (ls) simply showed the identical filenames multiple times. In v3.4.6, the mount crashes like this instead.
In my case, I'm dealing with a lot of such duplicate files on the bricks (*), scattered across several subfolders.
Renaming them all manually would be very time-consuming, so I'm looking for a quicker, scriptable solution.
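One scriptable take on the log's advice ("rename the file in the backend") is sketched below, under several assumptions. It keeps the copy on the first brick listed, which is an arbitrary choice rather than the subvolume DHT would pick. It renames the other copies with a hypothetical `.dup-backup` suffix, and only after checking that all copies are byte-identical. It defaults to a dry run. Renaming does not touch a file's hard link under `.glusterfs`, so any later cleanup there is a separate step.

```python
import hashlib
import os
from typing import List, Tuple


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large video files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def plan_renames(rel_path: str, brick_roots: List[str],
                 suffix: str = ".dup-backup") -> List[Tuple[str, str]]:
    """Keep the copy on brick_roots[0] and plan a rename for every other copy.

    Refuses to plan anything unless all copies are byte-identical."""
    copies = [os.path.join(root, rel_path) for root in brick_roots]
    if len({sha256_of(p) for p in copies}) != 1:
        raise ValueError(f"copies of {rel_path} differ; resolve by hand")
    return [(p, p + suffix) for p in copies[1:]]


def apply_renames(pairs: List[Tuple[str, str]], dry_run: bool = True) -> None:
    """Print the planned renames; only touch the bricks when dry_run is False."""
    for old, new in pairs:
        if os.path.exists(new):
            raise FileExistsError(new)  # never overwrite an earlier backup
        print("would rename" if dry_run else "renaming", old, "->", new)
        if not dry_run:
            os.rename(old, new)
```

After a real run, the log suggests doing a fresh lookup, for example by running `ls -l` on the affected path through the client mount.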
...to be continued.
(*) NOTE: The problem was very likely caused by manual interference on my part when importing data from a previous GlusterFS test setup into a newly set up machine. So it's most likely not GlusterFS's fault but human error.