It seems that when mounting an NFS 4.1 datastore with Windows Server 2012 R2 as the NFS server, people are hitting the following errors in the vmkernel log:
2015-08-15T13:55:59.460Z cpu0:34075 opID=e5c595d5)NFS41: NFS41_VSIMountSet:402: Mount server: 172.20.10.14, port: 2049, path: normal, label: nfs, security: 1 user: , options: <none>
2015-08-15T13:55:59.460Z cpu0:34075 opID=e5c595d5)StorageApdHandler: 982: APD Handle Created with lock[StorageApd-0x43059cff25c0]
2015-08-15T13:55:59.461Z cpu1:33347)NFS41: NFS41ProcessClusterProbeResult:3865: Reclaiming state, cluster 0x43059cff3780 [4]
2015-08-15T13:55:59.462Z cpu0:34075 opID=e5c595d5)NFS41: NFS41FSCompleteMount:3582: Lease time: 120
2015-08-15T13:55:59.462Z cpu0:34075 opID=e5c595d5)NFS41: NFS41FSCompleteMount:3583: Max read xfer size: 0x3fc00
2015-08-15T13:55:59.462Z cpu0:34075 opID=e5c595d5)NFS41: NFS41FSCompleteMount:3584: Max write xfer size: 0x3fc00
2015-08-15T13:55:59.462Z cpu0:34075 opID=e5c595d5)NFS41: NFS41FSCompleteMount:3585: Max file size: 0xfffffff0000
2015-08-15T13:55:59.462Z cpu0:34075 opID=e5c595d5)NFS41: NFS41FSCompleteMount:3586: Max file name: 255
2015-08-15T13:55:59.462Z cpu0:34075 opID=e5c595d5)WARNING: NFS41: NFS41FSCompleteMount:3591: The max file name size (255) of file system is larger than that of FSS (128)
2015-08-15T13:55:59.463Z cpu0:34075 opID=e5c595d5)WARNING: NFS41: NFS41FSCompleteMount:3601: RECLAIM_COMPLETE FS failed: Failure; forcing read-only operation
2015-08-15T13:55:59.463Z cpu0:34075 opID=e5c595d5)NFS41: NFS41FSAPDNotify:5651: Restored connection to the server 172.20.10.14 mount point nfs, mounted as 907c6b39-acb49d79-0000-000000000000 ("normal")
2015-08-15T13:55:59.463Z cpu0:34075 opID=e5c595d5)NFS41: NFS41_VSIMountSet:414: nfs mounted successfully
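If you want to check whether a host is hitting the same condition, the symptoms are easy to pull out of the vmkernel log. A minimal sketch, assuming you have copied /var/log/vmkernel.log off the host somewhere (the patterns are simply taken from the output above):

```python
#!/usr/bin/env python3
"""Scan a copied vmkernel.log for the NFS 4.1 RECLAIM_COMPLETE / read-only symptoms."""

import re
import sys

# Patterns lifted from the log excerpt above; adjust if your build logs differently.
PATTERNS = [
    re.compile(r"RECLAIM_COMPLETE FS failed"),
    re.compile(r"forcing read-only operation"),
    re.compile(r"NFS41FSCompleteMount"),
]

def scan(path: str) -> None:
    with open(path, errors="replace") as log:
        for line in log:
            if any(p.search(line) for p in PATTERNS):
                print(line.rstrip())

if __name__ == "__main__":
    # Usage: python3 scan_vmkernel.py vmkernel.log
    scan(sys.argv[1] if len(sys.argv) > 1 else "vmkernel.log")
```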
A snippet from the NFS 4.1 RFC (RFC 5661, NFS Version 4 Minor Version 1, page 567) has this to say about RECLAIM_COMPLETE:
Whenever a client establishes a new client ID and before it does the
first non-reclaim operation that obtains a lock, it MUST send a
RECLAIM_COMPLETE with rca_one_fs set to FALSE, even if there are no
locks to reclaim. If non-reclaim locking operations are done before
the RECLAIM_COMPLETE, an NFS4ERR_GRACE error will be returned.
Similarly, when the client accesses a file system on a new server,
before it sends the first non-reclaim operation that obtains a lock
on this new server, it MUST send a RECLAIM_COMPLETE with rca_one_fs
set to TRUE and current filehandle within that file system, even if
there are no locks to reclaim. If non-reclaim locking operations are
done on that file system before the RECLAIM_COMPLETE, an
NFS4ERR_GRACE error will be returned.
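To make the two cases concrete, here is how I read the rules above as client-side logic. This is only an illustration (ClientState, reclaim_complete and the filehandle value are made-up names, not ESXi or RFC code):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ClientState:
    """Toy model of the client-side bookkeeping implied by the RFC text above."""
    global_reclaim_done: bool = False                    # rca_one_fs = FALSE already sent?
    fs_reclaim_done: set = field(default_factory=set)    # filehandles covered by rca_one_fs = TRUE

def reclaim_complete(state: ClientState, fs_handle: Optional[str]) -> dict:
    """Build the RECLAIM_COMPLETE the client must send before its first non-reclaim lock."""
    if fs_handle is None:
        # New client ID: cover the whole server, even if there was nothing to reclaim.
        state.global_reclaim_done = True
        return {"op": "RECLAIM_COMPLETE", "rca_one_fs": False}
    # File system newly accessed on this server: cover just that file system.
    state.fs_reclaim_done.add(fs_handle)
    return {"op": "RECLAIM_COMPLETE", "rca_one_fs": True, "current_fh": fs_handle}

# This ordering matches what shows up in the capture below: first the server-wide
# completion, then a per-file-system one with the current filehandle set.
state = ClientState()
print(reclaim_complete(state, None))         # rca_one_fs = FALSE
print(reclaim_complete(state, "fh-normal"))  # rca_one_fs = TRUE
```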
I ran a packet capture during the mount, which shows that the ESXi host sends a RECLAIM_COMPLETE with rca_one_fs set to FALSE and later sends another RECLAIM_COMPLETE with rca_one_fs set to TRUE and a filehandle specified. This looks like the correct behavior.
[Packet capture screenshots showing the two RECLAIM_COMPLETE operations]
Last of all, back in the RFC is this little tidbit:
RECLAIM_COMPLETE should only be done once for each server instance or
occasion of the transition of a file system. If it is done a second
time, the error NFS4ERR_COMPLETE_ALREADY will result. Note that
because of the session feature's retry protection, retries of
COMPOUND requests containing RECLAIM_COMPLETE operation will not
result in this error.
The screenshots also appear to show that both RECLAIM_COMPLETE operations are carried in COMPOUND requests, and they are two distinct operations rather than session retries: one is server-wide with rca_one_fs FALSE, the other is per-file-system with rca_one_fs TRUE. So it seems to me that ESXi is doing what it is supposed to do, but the NFS server is returning an error when it shouldn't, which causes the host to fall back to read-only access mode.
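To make my reading concrete: the server-wide completion and the per-file-system completion should be tracked separately, so the second operation in the capture is not "a second time" in the RFC's sense. Below is a toy sketch of server-side bookkeeping under that reading (the class, names and error value are purely illustrative and not taken from Windows or any real NFS server implementation):

```python
from typing import Optional

NFS4_OK = 0
NFS4ERR_COMPLETE_ALREADY = 10054   # error value as assigned in RFC 5661

class ServerReclaimState:
    """Toy bookkeeping that keeps server-wide and per-FS completions separate."""
    def __init__(self) -> None:
        self.global_done = False   # rca_one_fs = FALSE already seen for this client?
        self.fs_done = set()       # filehandles for which rca_one_fs = TRUE was seen

    def reclaim_complete(self, rca_one_fs: bool, current_fh: Optional[str] = None) -> int:
        if not rca_one_fs:
            if self.global_done:
                return NFS4ERR_COMPLETE_ALREADY   # genuinely done twice, server-wide
            self.global_done = True
            return NFS4_OK
        if current_fh in self.fs_done:
            return NFS4ERR_COMPLETE_ALREADY       # done twice for the same file system
        self.fs_done.add(current_fh)
        return NFS4_OK

# Replaying the two operations from the capture: neither should be an error.
server = ServerReclaimState()
assert server.reclaim_complete(rca_one_fs=False) == NFS4_OK
assert server.reclaim_complete(rca_one_fs=True, current_fh="fh-normal") == NFS4_OK
```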
Any input from the wider community? Happy to stand corrected!