GlusterFS: "gfid different on data file"

Post by peter_b

[PROBLEM]
I'm having problems with a glusterfs share crashing when I try to copy files into certain subfolders.
Once the mount has crashed, rsync, for example, reports the following error:
Transport endpoint is not connected (107)
In the log files for that volume under /var/log/glusterfs, I get the following messages:
gfid different on data file on dlp-storage-client-X
[...]
multiple subvolumes (dlp-storage-client-X and dlp-storage-client-Y) have file XXX
The detailed log looks like this:

```
[2015-01-27 00:24:11.553397] W [dht-common.c:1580:dht_lookup_linkfile_cbk] 0-dlp-storage-dht: /part2/video/12/00/12-00122_B03/HIRES/12-00122_b03.mpg: gfid different on data file on dlp-storage-client-0
[2015-01-27 00:24:11.554221] W [dht-common.c:1335:dht_lookup_everywhere_cbk] 0-dlp-storage-dht: /part2/video/12/00/12-00122_B03/HIRES/12-00122_b03.mpg: gfid differs on subvolume dlp-storage-client-0
[2015-01-27 00:24:11.554299] W [dht-common.c:1335:dht_lookup_everywhere_cbk] 0-dlp-storage-dht: /part2/video/12/00/12-00122_B03/HIRES/12-00122_b03.mpg: gfid differs on subvolume dlp-storage-client-2
[2015-01-27 00:24:11.554318] W [dht-common.c:1397:dht_lookup_everywhere_cbk] 0-dlp-storage-dht: multiple subvolumes (dlp-storage-client-0 and dlp-storage-client-2) have file /part2/video/12/00/12-00122_B03/HIRES/12-00122_b03.mpg (preferably rename the file in the backend, and do a fresh lookup)
[2015-01-27 00:24:11.554558] W [fuse-bridge.c:462:fuse_entry_cbk] 0-glusterfs-fuse: 14805548: LOOKUP() /part2/video/12/00/12-00122_B03/HIRES/12-00122_b03.mpg => -1 (Stale NFS file handle)
[2015-01-27 00:24:11.559353] E [dht-helper.c:1240:dht_inode_ctx_get] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.6/xlator/cluster/distribute.so(dht_lookup_everywhere_done+0xa15) [0x7fb8150234c5] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.6/xlator/cluster/distribute.so(dht_layout_preset+0x59) [0x7fb815009879] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.6/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x34) [0x7fb81500bc44]))) 0-dlp-storage-dht: invalid argument: inode
[2015-01-27 00:24:11.559397] E [dht-helper.c:1259:dht_inode_ctx_set] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.6/xlator/cluster/distribute.so(dht_lookup_everywhere_done+0xa15) [0x7fb8150234c5] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.6/xlator/cluster/distribute.so(dht_layout_preset+0x59) [0x7fb815009879] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.6/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x52) [0x7fb81500bc62]))) 0-dlp-storage-dht: invalid argument: inode
[2015-01-27 00:24:11.559419] W [fuse-bridge.c:397:fuse_entry_cbk] 0-glusterfs-fuse: Received NULL gfid for /part2/video/12/00/12-00122_B03/HIRES/12-00122_b03.mpg. Forcing EIO
[2015-01-27 00:24:11.559475] W [fuse-bridge.c:462:fuse_entry_cbk] 0-glusterfs-fuse: 14805551: LOOKUP() /part2/video/12/00/12-00122_B03/HIRES/12-00122_b03.mpg => -1 (Input/output error)
pending frames:
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2015-01-27 00:24:11
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.6
/lib/x86_64-linux-gnu/libc.so.6(+0x321e0)[0x7fb8199371e0]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.6/xlator/protocol/client.so(client3_3_lookup_cbk+0x88)[0x7fb81526c948]
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_handle_reply+0xa4)[0x7fb81a4cb1a4]
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xcd)[0x7fb81a4cb52d]
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fb81a4c7bf3]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.6/rpc-transport/socket.so(+0x88b6)[0x7fb81651c8b6]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.6/rpc-transport/socket.so(+0xaf9c)[0x7fb81651ef9c]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x63b7a)[0x7fb81a736b7a]
/usr/sbin/glusterfs(main+0x3f5)[0x7fb81ab7efe5]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x7fb819923ead]
/usr/sbin/glusterfs(+0x63b9)[0x7fb81ab7f3b9]
```
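To script the clean-up, a first step could be to pull the affected paths out of the client log instead of reading it by hand. This is a minimal sketch, assuming Python 3.5 or later and that the warning keeps exactly the wording shown in the 3.4.6 log above ("multiple subvolumes (X and Y) have file PATH (preferably rename ..."); other glusterfs versions may phrase it differently.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Matches the DHT warning from the log, i.e. lines of the form
# "multiple subvolumes (X and Y) have file PATH (preferably rename the file ...)".
# The lazy PATH group stops at " (preferably rename", so paths with spaces work.
DUP_RE = re.compile(
    r"multiple subvolumes \((?P<a>\S+) and (?P<b>\S+)\) "
    r"have file (?P<path>.+?) \(preferably rename"
)


def affected_files(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return unique (subvolume_a, subvolume_b, path) triples in order of first appearance."""
    seen = set()
    result = []
    for line in lines:
        match = DUP_RE.search(line)
        if match is None:
            continue
        entry = (match.group("a"), match.group("b"), match.group("path"))
        if entry not in seen:  # the same warning repeats on every lookup
            seen.add(entry)
            result.append(entry)
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # sys.argv[1]: the client log of the mount under /var/log/glusterfs
    with open(sys.argv[1], errors="replace") as log:
        for sub_a, sub_b, path in affected_files(log):
            print("\t".join((path, sub_a, sub_b)))
```

Run as e.g. `python3 dup_from_log.py <client log file>` (the script name is hypothetical), it prints one tab-separated line per affected path with the two subvolumes involved, which can then be fed into further checks on the bricks.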
I've checked the location/file mentioned in the logs, and the problem seems to be that copies of identical files exist on several bricks in that folder.
That's what the entry in the log clearly says:
multiple subvolumes (dlp-storage-client-X and dlp-storage-client-Y) have file XXX
It actually also mentions a possible solution:
preferably rename the file in the backend, and do a fresh lookup
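Before renaming anything in the backend, it may be worth confirming that the copies really carry different gfids. GlusterFS keeps each file's gfid in the `trusted.gfid` extended attribute on the brick (the value `getfattr -n trusted.gfid -e hex <file>` prints there). The sketch below assumes Python 3.5 or later on the brick server, root privileges (trusted.* attributes are not readable otherwise), and that the brick roots it is given are locally accessible; the helper names are hypothetical.

```python
import os
import uuid
from typing import Dict, Iterable, Optional

GFID_XATTR = "trusted.gfid"  # extended attribute holding the gfid on a brick


def format_gfid(raw: bytes) -> str:
    """Render the 16 raw gfid bytes in the usual UUID notation."""
    return str(uuid.UUID(bytes=raw))


def gfids_on_bricks(brick_roots: Iterable[str], rel_path: str) -> Dict[str, Optional[str]]:
    """Map each brick root to the gfid of rel_path on it, or None if the path is absent.

    Other OSErrors (e.g. no trusted.gfid attribute, or not running as root)
    are passed on to the caller.
    """
    result = {}  # type: Dict[str, Optional[str]]
    for root in brick_roots:
        full = os.path.join(root, rel_path.lstrip("/"))
        try:
            raw = os.getxattr(full, GFID_XATTR, follow_symlinks=False)
        except FileNotFoundError:
            result[root] = None
            continue
        result[root] = format_gfid(raw)
    return result


def gfids_differ(gfids: Dict[str, Optional[str]]) -> bool:
    """True if the path is present on several bricks with more than one distinct gfid."""
    present = {g for g in gfids.values() if g is not None}
    return len(present) > 1
```

For example, `gfids_on_bricks(["/export/brick1", "/export/brick2"], "/part2/video/12/00/12-00122_B03/HIRES/12-00122_b03.mpg")` (hypothetical brick paths) returns one gfid per brick; if the bricks live on different servers, the function would have to run on each server and the results be compared afterwards.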
The system is a 64-bit Debian 7 (Wheezy) with the glusterfs-client package 3.4.6-1 installed from the standard repositories.
With the previous glusterfs version (3.4.0), a file listing (ls) simply showed the identical filenames multiple times. With v3.4.6, the client crashes as shown in the log above.

In my case, I'm dealing with a lot of such duplicate files on the bricks (*), scattered across several subfolders.
Renaming them manually is therefore very time-consuming, so I'm looking for a quicker, scriptable solution.
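For the bulk case, a scan of the brick directories could list every relative path that exists as a regular file on more than one brick, without waiting for the client to trip over it. This is a minimal sketch, assuming Python 3.5 or later, that all brick roots are reachable from one machine, and a heuristic for DHT link files: those legitimately exist on a second brick and are recognised here only by being empty with mode `---------T` (sticky bit only). Checking their `trusted.glusterfs.dht.linkto` attribute would be more reliable. The `.glusterfs` directory at each brick root (GlusterFS's internal gfid store) is skipped. It only reports; which copy to rename is left to a human.

```python
import os
import stat
from collections import defaultdict
from typing import Dict, Iterable, List


def looks_like_linkfile(st) -> bool:
    """Heuristic: DHT link files are empty and show up as '---------T' (sticky bit only)."""
    return st.st_size == 0 and stat.S_IMODE(st.st_mode) == stat.S_ISVTX


def find_duplicates(brick_roots: Iterable[str]) -> Dict[str, List[str]]:
    """Return {relative path: [brick roots]} for regular files found on more than one brick."""
    where = defaultdict(list)  # type: Dict[str, List[str]]
    for root in brick_roots:
        for dirpath, dirnames, filenames in os.walk(root):
            if dirpath == root and ".glusterfs" in dirnames:
                dirnames.remove(".glusterfs")  # internal gfid store, not user data
            for name in filenames:
                full = os.path.join(dirpath, name)
                st = os.lstat(full)
                # Skip symlinks, devices etc. and (heuristically) DHT link files.
                if not stat.S_ISREG(st.st_mode) or looks_like_linkfile(st):
                    continue
                where[os.path.relpath(full, root)].append(root)
    return {path: roots for path, roots in sorted(where.items()) if len(roots) > 1}
```

Called with the hypothetical brick roots of the volume, e.g. `find_duplicates(["/export/brick1", "/export/brick2"])`, it returns the candidate paths sorted, each with the bricks it was found on.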

...to be continued.

(*) NOTE: The problem was very likely caused by manual interference on my part when importing data from a previous glusterfs test setup into a newly set up machine. So it's most likely not glusterfs's fault, but rather human error.