SunService Tip Sheet: Sun NFS

 

INFODOC ID: 11987

SYNOPSIS: NFS PSD/FAQ
DETAIL DESCRIPTION:

SunService Tip Sheet for Sun NFS
 
Revision: 1.8
Date: February 14, 1996

Mail to: support@network.East.Sun.COM
Table of Contents

1.0: About NFS
2.0: Debugging NFS
  2.1: share and exportfs
  2.2: showmount
  2.3: nfsstat
  2.4: rpcinfo
  2.5: etherfind and snoop
  2.6: Running a snoop of NFS requests
  2.7: Lockd debug hints
3.0: Common How Tos
  3.1: How to Export Filesystems Under SunOS
  3.2: How to Export Filesystems Under Solaris
  3.3: How to Mount Filesystems Under SunOS
  3.4: How to Mount Filesystems Under Solaris
  3.5: How to Set Up Secure NFS
4.0: Some Frequently Asked Questions
  4.1: Miscellaneous NFS Questions
  4.2: Problems Mounting Filesystems on a Client
  4.3: Common NFS Client Errors
  4.4: Problems Umounting Filesystems on a Client
  4.5: Interoperability Problems With Non-Sun Systems
  4.6: Common NFS Server Errors
  4.7: Common nfsd Error Message on NFS Servers
  4.8: Common rpc.mountd Error Messages on NFS Servers
  4.9: Common rpc.lockd & rpc.statd Error Messages
  4.10: NFS Related Shutdown Errors
  4.11: NFS Performance Tuning
5.0: Patches
  5.1: Core NFS Patches for SunOS
  5.2: Patches Related to NFS for SunOS
  5.3: Core NFS Patches for Solaris
  5.4: Patches Related to NFS for Solaris
6.0: Known Bugs & RFEs
7.0: References
  7.1: Important Man Pages
  7.2: SunSolve Documents
  7.3: Sun Educational Services
  7.4: Solaris Documentation
  7.5: Third Party Documentation
  7.6: RFCs
8.0: Supportability
9.0: Additional Support

1.0: About NFS
 
This Tip Sheet documents a wide variety of information concerning NFS,
as implemented in the SunOS and Solaris operating systems. It is
intended as both an introduction to NFS, and as a guide to the most
common problems. There are many more complete references to NFS, a few
of which are noted in sections 7.4 and 7.5.
 
The following terms are important to an understanding of NFS:
 
The NFS SERVER is the machine which makes file systems available to
the network. It does so by either EXPORTING (SunOS term) or SHARING
(Solaris term) them.
 
The NFS CLIENT is the machine which accesses file systems which have
been made available. It does so by MOUNTING them.
 
A number of different daemons are involved with NFS:
 
RPC.MOUNTD only runs on NFS servers. It answers initial requests from
clients for file systems.
 
NFSD daemons run on NFS servers. They deal with the majority of the
client NFS requests.
 
On SunOS 4.1.X, BIODS (block I/O daemons) help clients with
their NFS requests.  These do not exist on Solaris 2.X.

LOCKD and STATD are a set of daemons which keep track of locks on NFS
files. There will typically be a set of daemons running on a client
and server.
 
NFS partitions can be mounted in one of two ways, hard or soft. 
 
HARD MOUNTS are permanent mounts designed to look just like any
normal, local file system. If a partition that is hard mounted becomes
unavailable, client programs will keep trying to access it forever.
This will cause local processes to lock up when a hard mounted disk goes
away. Hard mounts are the default type of mount.
 
SOFT MOUNTS will fail after a few retries if a remote partition
becomes unavailable. This is a problem if you are writing to the
partition, because you can never be sure that a write will actually
get processed. On the other hand, your local processes will not lock
up if that partition does go away. In general, soft mounts should only
be used if you are solely reading from a disk, and even then it should
be understood that the mount is an unreliable one. If you soft mount a
partition that will be written to, you are nearly guaranteeing that
you will have problems.
 
There are a number of files related to NFS:
 
/etc/exports (SunOS) or /etc/dfs/dfstab (Solaris) list which
filesystems to export on a server. These files are maintained by hand.
 
/etc/xtab (SunOS) or /etc/dfs/sharetab (Solaris) list the filesystems
which actually are currently exported. They are maintained by exportfs
and share, respectively.
 
/etc/rmtab on a server lists filesystems which are remotely mounted by
clients. This file is maintained by rpc.mountd.
 
/etc/fstab (SunOS) or /etc/vfstab (Solaris) list which filesystems to
mount on a client. These files are maintained by hand.
 
/etc/mtab (SunOS) or /etc/mnttab (Solaris) on a client lists
filesystems which are currently mounted onto that client. The mount
and umount commands modify this file.

2.0: Debugging NFS

General NFS Debugging Note
 
When NFS is not working, or working intermittently, it can be very
difficult to track down what exactly is causing the problem. The
following tools are the best ones available to figure out what exactly
NFS is doing.

2.1: share and exportfs
 
share (on Solaris) and exportfs (on SunOS) are good tools to use to
see exactly how an NFS server is exporting its filesystems. Simply log
on to the NFS server, and run the command that is appropriate for the
OS.
 
  SunOS:
  # exportfs
  /usr -root=koeller
  /mnt
  /tmp
 
The above shows that /mnt and /tmp are exported normally. Since we see
neither rw nor ro as options, this means that the default is being
used, which is rw to the world. In addition /usr gives root
permissions to the machine koeller.
 
  Solaris:
  # share
  -               /var   rw=engineering   ""  
  -               /usr/sbin   rw=lab-manta.corp.sun.com   ""  
  -               /usr/local   rw   ""  
 
The above shows that /usr/local is exported normally, /var is exported
only to engineering (which happens to be a netgroup) and /usr/sbin is
exported only to lab-manta.corp.sun.com (which is a machine).
 
Note: netgroups are only supported if you are running NIS or NIS+.
Consult documentation on those products for how to set up netgroups on
your machines.

2.2: showmount

showmount, used with the -e option, can also show how an NFS server is
exporting its file systems. Its benefit is that it works over the
network, so you can see exactly what your NFS client is being offered.
However, showmount does not show all of the mount options, and thus
you will sometimes need to use share or exportfs, as described in
section 2.1. When you do a test with showmount, you should do it from
the NFS client that is having problems:
 
  # showmount -e psi
  export list for psi:
  /var       engineering
  /usr/sbin  lab-manta.corp.sun.com
  /usr/local (everyone)
 
  # showmount -e crimson
  export list for crimson:
  /usr (everyone)
  /mnt (everyone)
  /tmp (everyone)
 
Note that showmount displays only the partition and who may mount
it. We will not see any other options displayed. In the example above,
there are no restrictions on who may mount crimson's partitions, and
so showmount lists (everyone).

2.3: nfsstat

The nfsstat command gives diagnostics on what type of messages are
being sent via NFS. It can be run with either the -c option, to show
the stats of an NFS client, or the -s option, to show the stats of an
NFS server. When we run 'nfsstat -c' we see the following:
 
  # nfsstat -c
 
  Client rpc:
  calls      badcalls   retrans    badxids    timeouts   waits      newcreds   
  45176      1          45         3          45         0          0          
  badverfs   timers     toobig     nomem      cantsend   bufulocks  
  0          80         0          0          0          0          
 
  Client nfs:
  calls      badcalls   clgets     cltoomany  
  44866      1          44866      0          
 
  Version 2: (44866 calls)
  null       getattr    setattr    root       lookup     readlink   read       
  0 0%       7453 16%   692 1%     0 0%       15225 33%  55 0%      13880 30%  
  wrcache    write      create     remove     rename     link       symlink    
  0 0%       5162 11%   623 1%     914 2%     6 0%       306 0%     0 0%       
  mkdir      rmdir      readdir    statfs     
  15 0%      0 0%       467 1%     68 0%      
 
The rpc stats at the top are probably the most useful. High 'retrans'
and 'timeout' values can indicate performance or network issues. The
client nfs section can show you what types of NFS calls are taking up
the most time. This can be useful if you're trying to figure out what
is hogging your NFS. For the most part, the nfsstat command is most
useful when you are doing network and performance tuning. Sections 7.4
and 7.5 list books which give some information on this; they are useful
for making more sense of the nfsstat statistics.
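To put a rough number on "high retrans", the sketch below reads
'nfsstat -c' output and prints retransmissions as a percentage of
calls. It assumes the header/value line layout shown above and an awk
that supports getline (nawk on Solaris). The function name retrans_pct
is our own and is not part of nfsstat.

```shell
# retrans_pct: read "nfsstat -c" text on stdin and print
# retrans as a percentage of calls from the "Client rpc:" section.
retrans_pct() {
    awk '
        /^ *Client rpc:/ { in_rpc = 1; next }
        in_rpc && $1 == "calls" {
            # remember which column each counter name sits in
            for (i = 1; i <= NF; i++) col[$i] = i
            getline                       # the values are on the next line
            if ($(col["calls"]) > 0)
                printf "%.2f\n", 100 * $(col["retrans"]) / $(col["calls"])
            exit
        }
    '
}
```

Typical use would be 'nfsstat -c | retrans_pct'. With the figures in
the example above (45 retrans out of 45176 calls) it prints 0.10.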

2.4: rpcinfo

You can test that you have a good, solid NFS connection to your NFS
server via the rpcinfo command. As explained in the man page, this
program provides information on various rpc daemons, such as nfsd,
rpc.mountd, rpc.statd and rpc.lockd. Its biggest use is to determine
that one of these daemons is responding on the NFS server. The
following examples all show the indicated daemons correctly
responding. If instead you get complaints about a service 'not
responding' there may be a problem.
 
To see that nfsd is responding:
 
  # rpcinfo -T udp crimson nfs
  program 100003 version 2 ready and waiting
 
[crimson is the name of the remote machine that I am testing]
 
To see that mountd is responding:
 
  # rpcinfo -T udp crimson mountd
  program 100005 version 1 ready and waiting
  program 100005 version 2 ready and waiting
 
To see that lockd is responding:
 
  # rpcinfo -T udp crimson nlockmgr
  program 100021 version 1 ready and waiting
  rpcinfo: RPC: Procedure unavailable
  program 100021 version 2 is not available
  program 100021 version 3 ready and waiting
 
  # rpcinfo -T udp crimson llockmgr
  program 100020 version 2 ready and waiting
 
(the procedure unavailable error for the nlockmgr seems to be normal
for most systems)
 
If you run rpcinfo, and determine that certain rpc services are not
responding, you should check those daemons on the server.
 
If none of the above works, you can verify that RPC services are
working at all on the server by running:
 
  # rpcinfo remote-machine-name
 
If this gives errors too, there is probably an issue with portmap
(SunOS) or rpcbind (Solaris).
 
[Note: the above rpcinfo commands will vary slightly under Solaris 2.5
or higher, as those OSes will offer NFS version 3, running over TCP,
rather than UDP.]
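The individual checks above can be run in one pass. The following is
only a sketch: the function name check_nfs_rpc is hypothetical, it uses
the 'rpcinfo -u' form (UDP), and it treats a service as healthy if at
least one version answers "ready and waiting".

```shell
# check_nfs_rpc SERVER: probe nfs, mountd and nlockmgr over UDP.
# Prints one line per service; returns 1 if any service did not answer.
check_nfs_rpc() {
    server=$1
    status=0
    for svc in nfs mountd nlockmgr; do
        if rpcinfo -u "$server" "$svc" 2>/dev/null |
            grep 'ready and waiting' >/dev/null; then
            echo "$svc: ok"
        else
            echo "$svc: NOT responding"
            status=1
        fi
    done
    return $status
}
```

For example, 'check_nfs_rpc crimson' run from the NFS client that is
having problems.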

2.5: etherfind and snoop
 
If all else fails, and NFS still doesn't seem to be working right, the
last resort in debugging is to use a network sniffer program, such as
etherfind (SunOS) or snoop (Solaris). This can give you some
indication of whether remote machines are responding at all. Below is
an example of a totally normal NFS interaction, shown by snoop on
Solaris:
 
  # snoop psi and rainbow-16
  Using device /dev/le (promiscuous mode)
           psi -> rainbow-16   NFS C GETATTR FH=4141
    rainbow-16 -> psi          NFS R GETATTR OK
           psi -> rainbow-16   NFS C READDIR FH=4141 Cookie=2600
    rainbow-16 -> psi          NFS R READDIR OK 1 entries (No more)
 
These were the results when an 'ls' was run on 'psi' in a directory
that was mounted from 'rainbow-16'. The lines labelled 'C' are NFS
requests, while the lines labelled 'R' are NFS replies. Through snoop
you can easily see NFS requests that are not being responded to (you
would get lots of 'C' lines without 'R' replies to them), and also
certain errors (timeouts and retransmits particularly). The man page
on snoop gives some indication of how to make more in-depth use of
the tool. In
general, it should only be used for very complex issues, where NFS is
behaving very oddly, and even then you need to be very good with NFS
to perceive unexpected behavior.  See the next section for
more tips on snoop.
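A quick way to spot 'C' lines without matching 'R' lines is to count
both. This is a minimal sketch that assumes snoop summary lines of the
form shown above; the function name nfs_call_reply_counts is our own.
It looks for the "->" column first, so any extra leading columns are
ignored.

```shell
# nfs_call_reply_counts: read snoop summary lines on stdin and
# print two numbers: NFS calls (C) and NFS replies (R).
nfs_call_reply_counts() {
    awk '
        {
            # locate the "->" column; fields after it are dest, NFS, C or R
            for (i = 1; i <= NF; i++) if ($i == "->") break
            if (i + 3 <= NF && $(i + 2) == "NFS") {
                if ($(i + 3) == "C") {
                    calls++
                } else if ($(i + 3) == "R") {
                    replies++
                }
            }
        }
        END { print calls + 0, replies + 0 }
    '
}
```

If the first number is much larger than the second, requests are going
unanswered. A matching pair of counts does not prove that every
individual request was answered.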

2.6: Running a snoop of NFS requests
 
This is best done from a third, uninvolved machine.  If the
machine that you are trying to debug is on "systemA", then trace
the packets going to and from "systemA" as follows:
 
snoop -o /var/snoop.out systemA
 
Alternatively, snoop between systemA and clientB:

snoop -o /var/snoop.out systemA clientB

snoop will run in the window (do not put into background), with a counter
of the packets in the corner of the screen.  It will dump all the
packets into a raw snoop file.  When the "network event" occurs,
wait a couple of seconds and then kill the snoop job.  If the network
event includes error messages in /var/adm/messages, please send
us the /var/adm/messages and the raw snoop file (/var/snoop.out).
 
You can read the snoop file with snoop -i /var/snoop.out | more.  There
are a variety of options in the snoop man page to increase verbosity,
and/or to look at specific packets.
 
Please note that if disk space becomes an issue, you will need to
take steps similar to those listed above.  A very large snoop file
can be created in two or three minutes, so snooping is best reserved
for easily reproduced events.

2.7: Lockd debug hints

Please see section 4.9: Common rpc.lockd & rpc.statd Error Messages
for information regarding specific lockd and statd problems.

Generally you can pick out problem clients by snooping and/or
putting lockd into debug mode.  Sections 2.5 and 2.6 cover snoop.


How to put the Solaris 2.3 and 2.4 lockd into debug mode:
Edit the line in the /etc/init.d/nfs.client script that starts
up lockd to start it with -d3 and redirect stdout to a 
filesystem with A LOT of disk space.

   /usr/lib/nfs/lockd -d3 > /var/lockd.debug.out
   
Note that lockd always creates an empty file in the pwd, called
logfile when it is running in debug mode.  Disregard this file.

If disk space becomes an issue from doing the lockd debug mode,
you will have to stop lockd and restart it.

If you run the above command from a shell, make sure it is
a bourne or korn shell (sh or ksh).

How to put the Solaris 2.5 lockd into debug mode:

You will have to do this from a shell, preferably
a command tool window.  You will need to capture the debug output
that scrolls by into a script file:
 
  script /var/lockd.out
  /usr/lib/nfs/lockd -d3

After you are done debugging, CTRL/C the lockd job since it does
not fork and exec to the background.  Then exit or CTRL/D the
script job.  The debug output will be in /var/lockd.out.

Please note that Solaris 2.5 will also log more detailed debug output into 
the /var/adm/messages file.  We will need that also. 


How to run a truss of lockd (rarely do you ever need to do this):

Just modify the start of lockd to be a truss of the lockd process.
You will need even more disk space to do this!

For Solaris 2.3 and 2.4:
truss -o /var/truss.out -vall -f /usr/lib/nfs/lockd -d3 > /var/lockd.debug.out

For Solaris 2.5:
  script /var/lockd.out
  truss -o /var/truss.out -vall -f /usr/lib/nfs/lockd -d3

CTRL/C the script job (and exit from the script shell if on 2.5) after you
have reproduced the problem.

If disk space becomes an issue from doing the truss, use a cron job
to:
1.  stop running truss
2.  move "current" truss file to "old" truss file
3.  get PID of lockd
4.  truss -o /var/truss.out.current -vall -f -p PID (PID from step 3).

3.0: Common How-Tos

3.1: How to Export Filesystems Under SunOS
 
In order to export a fs under SunOS, you must first edit the file
/etc/exports on your NFS server, adding a line for the new filesystem.
For example, the following /etc/exports file is for a server that
makes available the filesystems /usr, /var/spool/mail and /home:
 
  # cat /etc/exports
  /usr
  /var/spool/mail
  /home
 
You may add normal mount options to these lines, such as ro, rw and
root. These options are fully described in the exports man page. The
following example shows our /etc/exports file, but this time with the
filesystems all being exported read only:
 
  # cat /etc/exports
  /usr  -ro
  /var/spool/mail       -ro
  /home -ro
 
If your machine is already exporting filesystems, and you are adding a
new one, you should simply run the exportfs command to make this new
filesystem available:
 
  # exportfs -a
 
If you have never exported filesystems from this machine before, you
should reboot it after editing the /etc/exports file. This will cause
rpc.mountd and nfsd to get started, and will also automatically export
out the filesystems.

3.2: How to Export Filesystems Under Solaris
 
You must edit the file /etc/dfs/dfstab in order to make files
automatically export on a Solaris system. The standard syntax of lines
in that file is:
 
  share -F nfs partition
 
For example, the following /etc/dfs/dfstab file is for a server that
makes available the filesystems /usr, /var/spool/mail and /home:
 
  share -F nfs /usr
  share -F nfs /var/spool/mail
  share -F nfs /home
 
You may add normal mount options to these lines, such as ro, rw and
root. This is done by preceding the options with a -o flag. These
options are fully described in the share man page. The following
example shows our /etc/dfs/dfstab file, but this time with the
filesystems all being exported read only:
 
  share -F nfs -o ro /usr
  share -F nfs -o ro /var/spool/mail
  share -F nfs -o ro /home
 
If your machine is already exporting filesystems, and you are adding a
new one, you should simply run the shareall command to make this new
filesystem available:
 
  # shareall
 
If you have never exported filesystems from this machine before, you
need to run the nfs.server script:
 
  # /etc/init.d/nfs.server start
 
(The NFS Server will come up fine on the next boot, now that an
/etc/dfs/dfstab file exists)

3.3: How to Mount Filesystems Under SunOS
 
You can always mount file systems with the mount command, with the
following syntax:
 
  mount remotemachine:/remotepartition /localpartition
 
For example:
 
  mount bigserver:/usr/local /usr/local
 
You may also give the mount command any of the general mount options.
For example, to mount /usr/local read only, you would use the command:
 
  mount -o ro bigserver:/usr/local /usr/local
 
If you wish a filesystem to get mounted every time the machine is
booted, you need to edit the /etc/fstab file. The syntax is:
 
  remotemach:/remotepart        /localpart      nfs     [options]       0 0
 
The options field is optional, and may be left out if none are needed.
To make /usr/local mount automatically, you would add the following to
your /etc/fstab:
 
  bigserver:/usr/local  /usr/local      nfs     0 0
 
To make it mount read only, you could use:
 
  bigserver:/usr/local  /usr/local      nfs     ro      0 0

3.4: How to Mount Filesystems Under Solaris
 
Section 3.3, above, shows how to correctly use the mount command, to
interactively mount filesystems. It works exactly the same under Solaris.
 
If you wish a filesystem to get mounted every time the machine is
booted, you need to edit the /etc/vfstab file. The syntax is:
 
  remotemach:/remotepart - localpart nfs - yes [options]
 
For example, to mount the /usr/local partition, you would enter:
 
  bigserver:/usr/local - /usr/local nfs - yes -
 
To mount it readonly, you would enter:
 
  bigserver:/usr/local - /usr/local nfs - yes ro
 
Consult the vfstab man page if you're interested in knowing what the
fields that contain "-"s and "yes" are for. For the most part, they're
only relevant for non-NFS mounts.
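If you add many entries, a small helper can keep the field order
straight. This is only an illustration of the syntax above; the
function name vfstab_nfs_line is hypothetical, and it simply prints a
line for you to review and append to /etc/vfstab yourself.

```shell
# vfstab_nfs_line SERVER REMOTE_PATH MOUNT_POINT [OPTIONS]
# Prints one /etc/vfstab line for an NFS mount; "-" is used when
# no mount options are given.
vfstab_nfs_line() {
    echo "$1:$2 - $3 nfs - yes ${4:--}"
}
```

For example, 'vfstab_nfs_line bigserver /usr/local /usr/local ro'
prints the read-only entry shown above.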

3.5: How to Set Up Secure NFS
 
NFS has, built into it, private-key encryption routines which can
provide added security. In order for this functionality to be used, a
partition must be both exported with the "secure" option, and mounted
with the "secure" option. In addition, either NIS or NIS+ must be
available. Secure NFS will not work without one of these naming
services.
 
To add the -secure option to the /secret/top partition on a SunOS
machine, the following exports entry would be needed on the server:
 
  /secret/top   -secure
 
In addition, the following fstab entry would be needed on the client:
 
  server:/secret/top    /secret/top     nfs     rw,secure       0 0
 
(Solaris machines would have to have the -secure option similarly
added.)
 
If you are running NIS+, you will not need to do anything further to
access the partition, since NIS+ users and NIS+ hosts will already have
credentials created for them.
 
If you are running NIS, you must create credentials for all users and
hosts which may want to access the secure partition.
 
Root may add credentials for users with the following command:
 
  # newkey -u username
 
Users may create their own credentials with the following command:
 
  $ chkey
 
The passwd supplied to these programs should be the same as the user's
passwd.
 
Root may add credentials for hosts with the following command:
 
  # newkey -h machinename
 
The passwd supplied to newkey in this case should be the same as the
machine's root passwd.
 
It is important to note that rpc.yppasswd must be running on your NIS
server for these commands to work. In addition, you should push out
publickey maps afterwards, to make sure that the most up-to-date
credential information is available.
 
Once this is all done, secure NFS should work on your NIS network,
with two caveats: First, keyserv must be running on your client
machines. If this is not the case, adjust your rc files, so that it
automatically starts up. Second, if a user does not supply a passwd
when logging in (due to a .rhosts or /etc/hosts.equiv for example) or
if his secure key is different than his passwd, then he will need to
execute the command 'keylogin' before he can access the secure NFS
partition.

4.0: Frequently Asked Questions

4.1: Miscellaneous NFS Questions
 
Q: What version of NFS does Sun implement?
 
A: All of the currently supported revisions of SunOS and Solaris
support NFS version 2, over UDP. In addition, Solaris 2.5 will support
NFS version 3, over TCP. Although NFS version 3 is the default
for Solaris 2.5 and up, NFS will fall back to version 2 if other
machines do not have version 3 capability.

Q: What do these NFS Error Codes mean (e.g. NFS write error 49)?
 
A: On SunOS, you can find a list of error codes in the intro(2) man
page:
 
  # man 2 intro
 
On Solaris, you can consult the /usr/include/sys/errno.h file. SRDB
#10946, available through SunSolve also lists some of the NFS error
codes.
 
Q: Why isn't my netgroup entry working?
 
A1: There are lots of factors related to netgroup. First, you must be
using either NIS or NIS+, to propagate the netgroup. Second, netgroup
will only work as ro or rw arguments, and even then only when the ro
or rw is not being used to override another ro or rw option. Netgroups
may not be used as an argument to the root option.

A2: NFS requires that the "reverse lookup" capability work such that
the hostname returned by looking up the IP address (gethostbyaddr)
matches EXACTLY the text specified in the netgroup entry.
Otherwise the NFS mount will fail with "access denied".

For example, if the NFS server has the following NIS netgroup entry:

goodhosts   (clienta,,) (clientb,,) (blahblah,,)

clienta is at 192.1.1.1 and the Server uses DNS for hostname lookups.

The NFS request to do the mount arrives from IP address 192.1.1.1
The NFS server looks up the IP address of 192.1.1.1 to get the hostname
associated with that IP address.

The gethostbyaddr MUST return "clienta".  If it does not, the NFS
request will fail with "access denied".  To check, telnet from the NFS
client to the NFS server and run "who am i".  The hostname in
parentheses is the name that should be in the netgroup:

hackley    pts/13       Jan 24 09:21	(mercedes)

The most common cause of this failure is failure of a DNS administrator
to properly manage the "reverse lookup maps" e.g. 192.1.1.IN-ADDR.ARPA.
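The EXACT match requirement can be illustrated with a small string
check. This is a sketch only: the function name netgroup_has_host is
hypothetical, it compares against the host field of each triple in a
single netgroup line, it assumes no spaces inside the triples, and it
does not expand nested netgroups or treat an empty host field as a
wildcard. It assumes an awk that accepts -v (nawk on Solaris).

```shell
# netgroup_has_host LINE HOSTNAME
# Returns 0 if HOSTNAME appears literally as the host field of a
# (host,user,domain) triple in the netgroup LINE, 1 otherwise.
netgroup_has_host() {
    echo "$1" | awk -v host="$2" '
        {
            for (i = 2; i <= NF; i++) {      # field 1 is the netgroup name
                t = $i
                sub(/^\(/, "", t)            # drop the opening parenthesis
                split(t, f, ",")             # f[1] is the host field
                if (f[1] == host) found = 1
            }
        }
        END { exit(found ? 0 : 1) }'
}
```

With the goodhosts entry above, "clienta" matches, but a fully
qualified form such as "clienta.corp.sun.com" does not, which is the
situation that produces "access denied".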


Q:  What can you tell me about CacheFS?

A:  CacheFS is the "cache file system".  It allows a Solaris 2.X NFS client
to cache a remote file system to improve performance.  For example,
CacheFS allows you to be on "clienta" and cache your home directory, which
is mounted via NFS from an NFS server.

Because most often CacheFS is used in conjunction with the automounter,
we have some basic information on CacheFS in our automounter tips sheet
(Product Support Document).  You can read more about CacheFS in the
"NFS Administration Guide", and in the "mount_cachefs" man page.

Q:  Is there a showfh for Solaris to show the NFS File Handle?

A:  Yes, here it is:

#!/bin/sh
#
# fhfind: takes the expanded filehandle string from an
# NFS write error or stale filehandle message and maps
# it to a pathname on the server.
#
# The device id in the filehandle is used to locate the
# filesystem mountpoint.  This is then used as the starting
# point for a find for the file with the inode number
# extracted from the filehandle.
# 
# If the filesystem is big - the find may take a long time.
# Since there's no way to terminate the find upon finding
# the file, you may need to kill fhfind after it prints
# the path.
#

if [ $# -ne 8 ]; then
        echo
        echo "Usage: fhfind <filehandle> e.g."
        echo
        echo "  fhfind 1540002 2 a0000 4df07 48df4455 a0000 2 25d1121d"
        exit 1
fi

# Filesystem ID

FSID1=$1
FSID2=$2

# FID for the file

FFID1=$3
FFID2=`echo $4 | tr '[a-z]' '[A-Z]'` # uppercase for bc
FFID3=$5

# FID for the export point (not used)

EFID1=$6
EFID2=$7
EFID3=$8

# Use the device id to find the /etc/mnttab
# entry and thus the mountpoint for the filesystem.

E=`grep $FSID1 /etc/mnttab`
if [ "$E" = "" ]; then
        echo
        echo "Cannot find filesystem for devid $FSID1"
        exit 0
fi

set - $E
MNTPNT=$2

INUM=`echo "ibase=16; $FFID2" | bc` # hex to decimal for find

echo
echo "Now searching $MNTPNT for inode number $INUM"
echo

find $MNTPNT -mount -inum $INUM -print 2>/dev/null
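
The hex-to-decimal step in fhfind can be checked by hand. The sketch
below is an alternative to bc that assumes a printf which accepts
C-style hex constants (as POSIX printf does); hex2dec is a hypothetical
helper name and is not part of the script above.

```shell
# hex2dec HEX: print the decimal value of a hex string (no 0x prefix).
hex2dec() {
    printf '%d\n' "0x$1"
}
```

For the usage example in the script, 'hex2dec 4df07' prints 319239,
which is the inode number that find would then search for.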

4.2: Problems Mounting Filesystems on a Client

Q: Why do I get "permission denied" or "access denied" when I try to
mount a remote filesystem?

A1: Your remote NFS server is not exporting/sharing its file systems.
You can verify this by running the showmount command as follows:
 
  # showmount -e servername
 
That will provide you with a list of all the file systems that are
being sent out. If a file system is not being exported, you should
consult section 3.1 or 3.2, as applicable.
 
A2: Your remote NFS server is exporting file systems, but only to a
limited number of client machines, which does not include you. To
verify this, again use the command showmount:
 
  # showmount -e psi
  /var       engineering
  /usr/sbin  lab-manta.corp.sun.com
  /usr/local (everyone)
 
In this example, /usr/local is being exported to everyone, /var is
being exported to the engineering group, and /usr/sbin is only being
exported to the machine lab-manta.corp.sun.com. So, I might get the
denial message if I tried to mount /var from a machine not in the
engineering netgroup, or if I tried to mount /usr/sbin from anything
but lab-manta.corp.sun.com.
 
A3: Your machine is given explicit permission to mount the partition,
but the server does not list your correct machine name. In the example
above, psi is exporting to "lab-manta.corp.sun.com", but the machine
might actually identify itself as "lab-manta" without the suffix. Or,
alternatively, a machine might be exporting to "machine-le0" while the
mount request actually comes from "machine-le1". You can test this by
first running "showmount -e" and then physically logging in to the
server, from the client that cannot mount, and then typing "who". This
will show you if the two names do not match. For example, I am on
lab-manta, trying to mount /usr/sbin from psi:
 
  lab-manta# mount psi:/usr/sbin /test
  mount: access denied for psi:/usr/sbin
 
I use showmount -e to verify that I am being exported to:
 
  lab-manta# showmount -e psi
  export list for psi:
  /usr/sbin  lab-manta.corp.sun.com
 
I then login to psi, from lab-manta, and execute who:
 
  lab-manta% rsh psi
  ...
  psi# who
  root       pts/6        Sep  8 14:02    (lab-manta)
 
As can be seen, the names "lab-manta" and "lab-manta.corp.sun.com" do
not match. The entry shown by who, lab-manta, is what should appear in
my export file. When I change it, and re-export, I can verify it with
showmount, and then see that mounts do work:
 
  lab-manta[23] showmount -e psi
  export list for psi:
  /usr/sbin  lab-manta
  lab-manta[24] mount psi:/usr/sbin /test
  lab-manta[25] 
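
The name comparison described in A2 and A3 can be sketched as a filter
over 'showmount -e' output. The function name export_allows is our own;
it assumes multiple clients on one line are separated by commas, it
cannot tell whether a listed name is a netgroup, and it needs an awk
that accepts -v (nawk on Solaris).

```shell
# export_allows PATH CLIENT: read "showmount -e" output on stdin.
# Returns 0 if PATH is exported to (everyone) or to CLIENT by exact name.
export_allows() {
    awk -v path="$1" -v client="$2" '
        $1 == path {
            n = split($2, who, ",")
            for (i = 1; i <= n; i++)
                if (who[i] == "(everyone)" || who[i] == client) ok = 1
        }
        END { exit(ok ? 0 : 1) }'
}
```

For example, 'showmount -e psi | export_allows /usr/sbin lab-manta'
fails while the export list names lab-manta.corp.sun.com, and succeeds
once the export is changed to lab-manta.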

A4: Your client is a member of a netgroup, but it seems that the netgroup
does not work.  See Section 4.1 for notes on debugging netgroups.


Q: Why do I get the following error when I try and mount a remote file
system:
 
  nfs mount: remote-machine:: RPC: Program not registered
  nfs mount: retrying: /local-partition
 
A: rpc.mountd is not running on the server. You probably just exported
the first filesystem from a machine that has never done NFS serving
before. If the NFS server is running SunOS 4.X, you should reboot it.
If the NFS server is running Solaris 2.X, run the following:

/etc/init.d/nfs.server start

Note: Consult section 3.1 or 3.2 for information on how to create
the exports file on a SunOS 4.X system, or on how to create the dfstab
file on a Solaris 2.X system.


Q: Why doesn't the mountd respond?  After I try the mount I get NFS SERVER
NOT RESPONDING.  When I try to talk to the mountd, rpcinfo gives an rpc
timed out error.  How can I debug or fix a hung mountd on the NFS server?

A: First, try killing the mountd process on the server and restarting it.
This gets around many hung mountd issues.

Second, make sure the NFS server is "patched up".  There is a mountd patch
for Solaris 2.3, and we've seen cases where patch 101973 helps
on 2.4.

Further troubleshooting tips to debug the hung mountd on Solaris 2.X:
1.  get the PID of the running mountd
2.  truss -f -vall -p PID
3.  start a snoop at the same time you start the truss
4.  if you have access to it, run "gcore" or "pstack" (unsupported utilities
    made available by SunService) to get the stack trace of the mountd PID.

4.3: Common NFS Client Errors
 
If a file system has been successfully mounted, you may encounter the
following errors when accessing it.

 
Q: Why do I get the following error message:
 
  Stale NFS file handle
 
A1: This means that a file or directory that your client has open has
been removed or replaced on the server. It happens most often when a
dramatic change is made to the file system on the server, for example
if it was moved to a new disk, or totally erased and restored. The
client should be rebooted to clear Stale NFS file handles.
 
A2: If you prefer not to reboot the machine, you may create a new
mount point on the client for the mount point with the Stale NFS file
handle.


Q: Why do I get the following error message:
 
  NFS Server <server> not responding
  NFS Server ok

  Note, this error will occur when using HARD mounts.
  This troubleshooting section applies to HARD or SOFT mounts.

A1: If this problem is happening intermittently, while some NFS
traffic is occurring, though slowly, you have run into the performance
limitations of either your current network setup, or your current NFS
server. This issue is beyond the scope of what SunService can support.
Consult sections 7.4 & 7.5 for some excellent references which can help you
tune NFS performance. Section 9.0 can point you to where you can get
additional support on this issue from Sun.
 
A2: If the problem lasts for an extended period of time, during which
no NFS traffic at all is going through, it is possible that your NFS
server is no longer available. 

You can verify that the server is still responding by running the commands:
 
  # ping server
and
  # ping -s server 8000 10
(this will send 10 8k ICMP Echo request packets to the server)
 
If your machine is not available by ping, you will want to check the
server machine's health, your network connections and your routing. 

If the ping works, check to see that the NFS server's nfsd and 
mountd are responding with the "rpcinfo" command:

   # rpcinfo -u server nfs

program 100003 version 2 ready and waiting

   # rpcinfo -u server mountd

program 100005 version 1 ready and waiting
program 100005 version 2 ready and waiting

If there is no response, go to the NFS server and find out why
the nfsd and/or mountd are not working over the network.  From
the server, run the same commands.  If they work OK from the
server, the network is the culprit.  If they do NOT work, 
check to see if they are running.  If not, restart them and
repeat this process.  If either nfsd or mountd IS running but
does not respond, then kill it and restart it and retest.
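
The checks above can be strung together in a small shell function.
This is only a sketch: the name nfs_triage is made up for illustration,
and it assumes the Solaris form "ping host timeout", which exits 0 when
the host answers. The block only defines the function; it runs nothing
by itself.

```shell
# Hypothetical helper: walk through the ping and rpcinfo checks in order.
# Assumes Solaris-style "ping host timeout" (exit status 0 = host is alive).
nfs_triage() {
    server=$1

    # Step 1: is the server reachable at all?
    if ! ping "$server" 5 >/dev/null 2>&1; then
        echo "ping failed: check server health, network connections and routing"
        return 1
    fi

    # Step 2: do nfsd and mountd answer RPC calls over UDP?
    for svc in nfs mountd; do
        if ! rpcinfo -u "$server" "$svc" >/dev/null 2>&1; then
            echo "$svc not responding: run the same rpcinfo check on the server itself"
            return 2
        fi
    done

    echo "ping, nfs and mountd all respond"
}
```

Run "nfs_triage server" from the client first. If a daemon fails to
answer, run the same function on the server itself to tell a network
problem from a daemon problem.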
 
A3: Some older bugs might have caused this symptom. Make sure that you
have the most up-to-date Core NFS patches on the NFS server.
These are listed in Section 5.0 below. In addition, if you are running
quad ethernet cards on Solaris, you should install the special quad
ethernet patches listed in Section 5.4.

A4:  Try cutting down the NFS read and write size with the NFS mount
options:  rsize=1024,wsize=1024.  This will eliminate problems with
packet fragmentation across WANs, routers, hubs, and switches in a
multivendor environment, until the root cause can be pinpointed.
THIS IS THE MOST COMMON RESOLUTION TO THIS PROBLEM.
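
As an illustration, the reduced sizes can be set in the client's mount
table so that they apply at every boot. The server name and paths below
are placeholders, not values from any real site; adjust the other
fields to match your existing entries.

```text
# Solaris /etc/vfstab (one line per mount):
server:/export/home  -  /home  nfs  -  yes  rsize=1024,wsize=1024

# SunOS /etc/fstab:
server:/export/home  /home  nfs  rw,rsize=1024,wsize=1024  0 0
```

The file system must be unmounted and mounted again before the new
sizes take effect.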

A5: If the NFS server is running Solaris 2.3 or 2.4, 'nfsreadmap' occasionally
caused the "NFS server not responding" message on Sun and non-Sun
NFS clients.  You can resolve this by adding the following entry to 
your /etc/system file on the NFS server:
 
set nfs:nfsreadmap=0
 
Then reboot the machine.  The nfsreadmap function was removed in 2.5
because it really didn't work.
 
A6: If you are using FDDI on Solaris, you need to enable fragmentation
by turning off path MTU discovery with the command:

  # ndd -set /dev/ip ip_path_mtu_discovery 0

To make the setting persist across reboots, add this command to
/etc/init.d/inetinit, after the other ndd command on line 18.

A7:  Another possible cause applies IF the NFS SERVER is Ultrix, old AIX,
Stratus, or older SGI, and you ONLY get this error on Solaris 2.4 and 2.5
clients, while the 2.3 and 4.X clients are OK.

The NFS Version 2 and 3 protocols allow the NFS READDIR request to be
1048 bytes in length.  Some older implementations incorrectly assumed the
request had a maximum length of 1024.  To work around this, either mount
those problem servers with rsize=1024,wsize=1024, or add the following
to the NFS client's /etc/system file, and reboot:

set nfs:nfs_shrinkreaddir=1


Q: Why can't I write to a NFS mounted file system as root?
 
A: Due to security concerns, the root user is given "nobody"
permissions when it tries to read from or write to a NFS file system.
This means that root has less access than any user, and will only be
able to read from things with world read permissions, and will only be
able to write to things with world write permissions.
 
If you would like your machine to have normal root permissions to a
filesystem, the filesystem must be exported with the option
"root=clientmachine". 
 
An alternative is to export the filesystem with the "anon=0" option.
This will allow everyone to mount the partition with full root
permissions.
 
Sections 3.1 and 3.2 show how to include options when exporting
filesystems.

 
Q1: Why do 'ls'es of NFS mounted directories sometimes get mangled on
    my SunOS machine?  
Q2: Why do I get errors when looking at a NFS file on my SunOS
    machine?
 
A: By default, SunOS does not have UDP checksums enabled. This can
cause problems if NFS is being done over an extended distance,
especially if it is going across multiple routers. If you are seeing
very strange errors on NFS, or are getting corruption of directories
when you view them, you should try turning UDP checksums on.
 
You can do so by editing the kernel file /usr/sys/netinet/in_proto.c,
changing the following:
 
  int  udp_cksum = 0;           /* turn on to check & generate udp checksums */
 
to:
 
  int  udp_cksum = 1;           /* turn on to check & generate udp checksums */
 
Afterwards, you will need to build a new kernel, install it and
reboot. UDP checksums must be enabled on both the NFS client and NFS
server for it to have any effect. 
 
This is only an issue on SunOS machines, as Solaris machines have UDP
checksums enabled by default.

 
Q1: Why do I get intermittent errors writing to an NFS partition?
Q2: Why do I get intermittent errors reading from an NFS partition?
Q3: Why do I get the following error on my NFS partition?
 
  "nfs read error on <machine> rpc: timed out"
 
A: These symptoms can all be caused by failures of soft mounts.
Soft mounts time out instead of logging an "NFS SERVER NOT RESPONDING"
message.  For this and other reasons, it is recommended that
you mount only non-critical, read-only file systems with soft mounts
(e.g. man pages).

To resolve the problem, you need to solve the underlying problem
(see the section above on "NFS server not responding" for troubleshooting
assistance).  Alternatively, you can mount the NFS server with hard,intr
instead of soft, but this will have the effect of causing applications
to hang instead of timing out when the NFS servers are unavailable or
unreachable.

4.4: Problems Umounting Filesystems on a Client
 
Q: When I try and umount a partition, why do I get the following
error:
 
  /partition: Device busy
 
A: This means that someone is actively using the partition you are
trying to unmount. They might be running a program from it, or they
might simply be sitting in a subdirectory of the partition.
 
In Solaris, you can run the command fuser to determine what processes
are using a partition:
 
  # fuser /test
  /test:     1997c    1985c
 
The above example shows that pids 1985 and 1997 are accessing the
/test partition. Either kill the processes, or run fuser -k /test to
have fuser do this for you.

 
NOTES:  This functionality is not available under SunOS.  It does
not always identify an automounted process on Solaris.
 
In many cases, it is necessary to reboot a machine in order to clear
out all of the processes which could be making a file system busy.

4.5: Interoperability Problems With Non-Sun Systems
 
The following problems are relevant to Suns which are doing mounts
from non-Sun systems.
 
Q: Why do I get the following error when mounting from my HP or SunOS
3.5 machine or other machine running an older version of NFS:
 
  nfsmount server/filesystem server not responding RPC authentication error why = invalid client credential.
 
A: Older versions of NFS only allowed users to be in eight groups or
less. Reduce root's number of groups to eight or less, and the problem
will go away. Users planning to access this partition should also
reduce their number of groups to eight.
 
Q: When I NFS mount filesystems to my Sun, from my PC, why does the
Sun never see changes I make to those filesystems?
 
A: Most PC NFS servers do not seem to correctly notify their NFS
clients of changes made to their filesystems. It appears that this is
due to the fact that file timestamps on PCs are very coarse. If you
are having this problem, you should speak with the vendor of your PC
NFS product.
 
Q: Why do mounts from my SGI fail with "not a directory" ?
 
A: For some reason, certain versions of the SGI NFS server sometimes
begin using port 860 rather than 2049 for NFS. When this occurs,
mounts will fail. In order to get around this bug, always use the
"port" option, with 2049 as a value, when doing mounts from an SGI,
eg:
 
  mount -o port=2049 sgi:/partition /localpartition
 
If you are mounting from an SGI via autofs, be sure you have the
newest version of the kernel patch (101318-74 or better for 5.3,
101945-32 or better for 5.4), as older versions of the kernel patch
did not support the port option for autofs.
 
Q: Why can't I NFS mount from my old, old machine?
 
A: If you have a very old machine, it is probably running NFS version
1. Such machines often have problems talking to newer versions of NFS.
If you have a very old machine, you should speak with the manufacturer
to see if they've ported NFS version 2 or 3.

4.6: Common NFS Server Errors

Q: Why do I get the following error when I run exportfs/shareall?
 
  exportfs: /var/opt: parent-directory (/var) already exported
  share_nfs: /var/opt: parent-directory (/var) already shared
 
A: NFS specs forbid you from exporting both a parent directory and a
sub-directory. If you try and export a sub-directory when the parent
directory is already exported, you will get the above error. The above
example showed an export of the subdirectory /var/opt being attempted,
after the directory /var was already available.
 
A very similar error will occur in the opposite case:
 
  exportfs: /var: sub-directory (/var/spool/mail) already exported
 
This shows the directory /var being exported after /var/spool/mail was
already available.
 
If you want to have both a parent directory, and its sub-directory
exported, you must export just the parent directory. Among other
things, this means that you can not have different options on parent
and sub-directories, for example -ro on a parent directory, and -rw on
a specific subdirectory.
 
Q: Why is my NFS server getting totally overrun by quota errors?
 
A: Solaris 2.4 experienced an error relating to the way that quotas and
NFS interacted.  Obtain 101945-34 or later, if quota messages from NFS
partitions are having a serious impact on your machine.
 
If you are running into this problem where your client is Solaris, and
your server is SunOS, you will not have this option, and it is
recommended that you simply upgrade your SunOS system.
 
Q: Why does the /etc/rmtab file get huge?
 
A: The rmtab contains the list of all the file systems currently being
mounted by remote machines. When a filesystem is unmounted by a remote
machine, the line in the rmtab is just commented out, not deleted.
This can make the rmtab file get very large, maybe even filling the
root partition.
 
If this is a problem at your site, you should add the following lines
to your rc, prior to the starting of the rpc.mountd:
 
  if [ -f /etc/rmtab ]; then
    sed -e "/^#/d" /etc/rmtab > /tmp/rmtab 2>/dev/null
    mv /tmp/rmtab /etc/rmtab >/dev/null 2>&1
  fi
 
This will cause the rmtab file to be trimmed every time the system boots.

4.7: Common nfsd Error Message on NFS Servers
 
Q: Why do I get the following error message when nfsd starts?
 
  /usr/lib/nfs/nfsd[247]: netdir_getbyname (transport udp,
       host/serv \1/nfs), Bad file number
 
A: This problem is usually the result of an nfsd line not being in
your services map. Consult your naming service (files, nis, nis+), and
insert the following entry, if it is missing:
 
  nfsd            2049/udp        nfs             # NFS server daemon
 
...and at 2.5, you must also have:
  nfsd		2049/tcp	nfs

Q: Why do I get the following error message when nfsd starts?

/usr/lib/nfs/nfsd[2943]: t_bind to wrong address
/usr/lib/nfs/nfsd[2943]: Cannot establish NFS service over /dev/udp: transport setup problem.
/usr/lib/nfs/nfsd[2943]: t_bind to wrong address
/usr/lib/nfs/nfsd[2943]: Cannot establish NFS service over /dev/tcp: transport setup problem.
/usr/lib/nfs/nfsd[2943]: Could not start NFS service for any protocol. Exiting.

A: This problem is caused by trying to start a second nfsd when one is
already running.

4.8: Common rpc.mountd Error Messages on NFS Servers
 
Q: Why do I constantly get the following error message on my NFS
server:
 
  Aug 15 13:13:56 servername mountd[930]: couldn't register TCP MOUNTPROG
  Aug 15 13:13:58 servername inetd[141]: mountd/rpc/udp server failing
 
A: This problem occurs most often on SunOS machines. It typically
means that you are starting rpc.mountd from the rc.local, but also
have a line in your inetd.conf:
 
  mountd/1       dgram   rpc/udp wait root /usr/etc/rpc.mountd   rpc.mountd
 
You can resolve this problem by commenting out the mountd line in the
/etc/inetd.conf file, and then killing and restarting your inetd.

4.9: Common rpc.lockd & rpc.statd Error Messages
 
Q: What does it mean when I get the following error:
 
  lock manager: rpc error (#): RPC: Program/version mismatch
 
A: Some of your systems are running up-to-date versions of lockd,
while others are outdated. You should install the most up-to-date
lockd patch on all of your systems. See section 5.0 below for a list
of lockd patches.
 
Q: What does it mean when I get the following error:
 
  rpc.statd: cannot talk to statd on [machine]
 
A: Either [machine] is down, or it is no longer doing NFS services.
It's possible that the machine may still be around, but have changed
name, or something similar. If these changes are going to be
permanent, you should clear out the statmon directories on your
machine. Do this by rebooting the machine into single user mode, and
running the following command:
 
  SunOS:
  rm /etc/sm/* /etc/sm.bak/*
 
  Solaris:       
  rm /var/statmon/sm/* /var/statmon/sm.bak/*
 
Afterwards, execute reboot to bring your machine back up. 

Alternatively, if you cannot put the system into single user mode:
- Kill the statd and lockd processes
- Clear out the "sm" and "sm.bak" directories
- Restart statd and lockd, in that order

Q: How can I fix these errors?  The SunOS 4.1.X lockd reports:

  lockd[136]: fcntl: error Stale NFS file handle
  lockd[136]: lockd: unable to do cnvt.

the lockd error message is different on Solaris 2.3 and 2.4:

  lockd: unable to do cnvt.
  _nfssys: error Stale NFS file handle

A: Generally, this is caused by an error from a client.  The client
has submitted a request for a lock on a stale file handle.  Sometimes,
older or unpatched lockd clients will continually resubmit these
requests.  See section 2.7, "Lockd debug hints", for help in
identifying the client making the request.  See section 5.0 for info
on the NFS and lockd patches.  If the client is a non-Sun system,
contact the client system vendor for their latest lockd patch.

Q: How can I fix the following errors:
     nlm1_reply: RPC unknown host
     create_client: no name for inet address 0x90EE4A14.
We also see
     nlm1_call: RPC: Program not registered
     create_client: no name for inet address 0x90EE4A14.

A: There are THREE items to check in order.

1.  This first item applies if the hexadecimal address 0x90EE4A14
    corresponds to an IP address in use on your network, and it is not
    in your hosts database (/etc/hosts, NIS, NIS+ or DNS as appropriate).

    In this case, 0x90EE4A14 corresponds to 144.238.74.20, and the
    customer did not have that host in the NIS+ hosts table.
    The customer can find the host name for that IP address by using
    telnet to connect to the IP address and checking the hostname,
    then add the entry to the NIS+ hosts table.

    Then verify that gethostbyaddr() works with the new
    IP/hostname entry in NIS+ with:
      ping -s 144.238.74.20
    The responses should show the hostname in place of the IP address.

2.  If you do the above, and the messages continue, kill and 
    restart the lockd as it appears lockd caches name service
    information.    

3.  Patch levels:

Solaris 2.4: 
101945-34 or better kernel jumbo patch
101977-04 or better lockd jumbo patch
102216-05 or better klm kernel locking patch (See note below)

Note:
Patch 102216-05 contains a fix for a bug that can cause this error message:
1164679 KLM doesn't initialize rsys & rpid correctly

Solaris 2.3:
101318-75 or better kernel jumbo patch
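
For item 1 above, the hexadecimal address in the lockd message can be
converted to dotted-decimal form with a few lines of shell. This is a
sketch only; the function name hex_to_ip is made up for illustration,
and it relies on the shell's arithmetic accepting 0x constants.

```shell
# Hypothetical helper: convert a 32-bit hex address such as 0x90EE4A14
# into dotted-decimal notation, one byte per field.
hex_to_ip() {
    hex=${1#0x}          # strip a leading 0x or 0X if present
    hex=${hex#0X}
    n=$((0x$hex))        # shell arithmetic understands hex constants
    echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}
```

For example, "hex_to_ip 0x90EE4A14" prints 144.238.74.20, the address
used in the example above.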


Q:  Why do I get the following error message on Solaris?

lockd[2269]:netdir_getbyname (transport udp, host/serv 
\1/lockd), Resource temporarily unavailable
lockd[2269]: Cannot establish LM service over /dev/udp: 
bind problem. Exiting.

A:  This is caused by missing entries for lockd in /etc/services,
NIS services map, and/or NIS+ services table. 
Verify this with:
getent services lockd

If you don't get the lockd entries, add the following
entry to the appropriate services database if it does not exist:
lockd		4045/udp
lockd		4045/tcp

Check your /etc/nsswitch.conf file's services entry to determine
which services database you are using.
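
If your services database is the local /etc/services file, the check
for both entries can be scripted. The sketch below uses a made-up
function name, check_service_entries, and only looks at a flat file; it
does not consult NIS or NIS+, for which "getent services" remains the
right test.

```shell
# Hypothetical helper: report which of the udp/tcp entries for a service
# are missing from a services-format file. Comment lines are ignored
# because their first field never equals the service name.
check_service_entries() {
    file=$1
    name=$2
    port=$3
    for proto in udp tcp; do
        if ! awk -v n="$name" -v p="$port/$proto" \
            '$1 == n && $2 == p { found = 1 } END { exit !found }' "$file"
        then
            echo "missing: $name $port/$proto"
        fi
    done
}
```

For example, "check_service_entries /etc/services lockd 4045" prints
nothing when both entries are present, and one "missing:" line for each
entry that needs to be added.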

Q:  Why do I get the following error message on Solaris?

lockd[2947]: t_bind to wrong address
lockd[2947]: Cannot establish LM service over /dev/udp: bind problem. Exiting.

A:  This is caused by trying to start lockd when it is already running.
If you see this message at bootup, you will need to inspect your startup
scripts in /etc/rc2.d and /etc/rc3.d to determine the cause.

4.10: NFS Related Shutdown Errors
 
Q: Why do I get the following error, when running 'shutdown' on my
Solaris machine:
 
  "showmount: machine: RPC:program not registered"
 
A: This is due to a bug in the /usr/sbin/shutdown command. shutdown
executes the showmount command as part of its scheme to warn other
machines that it will not be available. If the machine you executed
shutdown on is not a nfs server, shutdown will complain with the above
message. This will cause no impact to your machine, but if it annoys
you, you can run the older /usr/ucb/shutdown program:
 
  # /usr/ucb/shutdown
 
Q: Why do I get the following error, when running 'shutdown' on my
Solaris machine:
 
  "nfs mount:machine(vold(PID###):server not responding:RPC not registered"
 
A: This is due to a bug in vold, which causes it to be shut down too
late. This will cause no impact to your machine, but if it annoys you,
you can stop vold before executing shutdown:
 
  # /etc/init.d/volmgt stop
  # shutdown

4.11: NFS Performance Tuning

Q: How do I determine how many nfsds to run on a SunOS 4.1.X or on a
   Solaris 2.X system?

A:  It is difficult to provide NFS tuning in a short technical note,
but here are some general guidelines.  For more specific guidelines,
consult the O'Reilly and Associates book "Managing NFS and NIS" and
the SunSoft Press book, "Sun Performance and Tuning".  If you need
NFS performance consulting assistance from SunService, please refer
to Sections 8 and 9 of this document on supportability and support
providers.

In SunOS 4.1.X, the number of nfsd's specifies
the number of nfsd processes that run.  In Solaris 2.X, the
number of nfsd's specifies the number of nfsd threads
that run inside the single nfsd Unix process.

Here are some general guidelines for SunOS 4.1.X:

To determine how many nfsds to run, use any of the
formulas below to pick a starting value.  Then use the
procedures below to adjust the number of nfsds until it
is right for the particular environment.

--------------------------------------------------------
  VARIATION                    FORMULA                  
--------------------------------------------------------
 Variation 1   #(disk spindles) + #(network interfaces) 
--------------------------------------------------------
 Variation 2   4 for a desktop system that is both      
               client and server,                       
               8 for a small dedicated server,          
               16 for a large NFS and compute server,   
               24 for a large NFS-only server           
--------------------------------------------------------
 Variation 3   2 * max#(simultaneous disk operations)   
--------------------------------------------------------
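
As a worked example of Variations 1 and 3, the short shell function
below computes both starting values. The function name and its three
arguments are made up for illustration; Variation 2 is a simple lookup
by server type and is not computed here.

```shell
# Hypothetical helper: starting values for the number of nfsds on SunOS 4.1.X.
# Arguments: disk spindles, network interfaces, max simultaneous disk operations.
nfsd_start_values() {
    spindles=$1
    interfaces=$2
    max_disk_ops=$3
    echo "variation1=$((spindles + interfaces))"   # spindles + interfaces
    echo "variation3=$((2 * max_disk_ops))"        # 2 * simultaneous disk ops
}
```

For a server with 6 disk spindles, 2 network interfaces and at most 5
simultaneous disk operations, "nfsd_start_values 6 2 5" gives 8 for
Variation 1 and 10 for Variation 3. Either is only a starting point to
be adjusted for the particular environment.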


On Solaris 2.X, this number will be different.  The SunSoft
press book recommends taking the highest number obtained by
applying the following three rules:

* Two NFS threads per active client process
* 32 NFS threads on a SPARCclassic server, 64 NFS threads per
  SuperSPARC processor.  
* 16 NFS threads per ethernet, 160 per FDDI

The default for 2.X is 16 threads.
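
The "highest of three rules" calculation can be sketched in shell as
follows. The function name and arguments are made up for illustration.
The sketch covers only SuperSPARC processors (for a SPARCclassic server
the CPU rule gives 32 instead), and it assumes that the per-network
figures are added together across interfaces, which is one reading of
the rule.

```shell
# Hypothetical helper: suggested number of nfsd threads on Solaris 2.X.
# Arguments: active client processes, SuperSPARC CPUs, ethernets, FDDI rings.
nfsd_threads() {
    client_procs=$1
    supersparc_cpus=$2
    ethernets=$3
    fddis=$4

    best=$((2 * client_procs))                    # rule 1: 2 per client process
    by_cpu=$((64 * supersparc_cpus))              # rule 2: 64 per SuperSPARC
    by_net=$((16 * ethernets + 160 * fddis))      # rule 3: 16 per ethernet, 160 per FDDI

    # Take the highest of the three results.
    if [ "$by_cpu" -gt "$best" ]; then best=$by_cpu; fi
    if [ "$by_net" -gt "$best" ]; then best=$by_net; fi
    echo "$best"
}
```

For example, a server with 10 active client processes, 2 SuperSPARC
processors and 1 ethernet gives 20, 128 and 16 for the three rules, so
"nfsd_threads 10 2 1 0" prints 128.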

5.0: Patches

General Information on Patches

The following is the list of all of the NFS related patches for 4.1.3,
4.1.3_u1, 4.1.4, 2.3, 2.4, and 2.5.  If you are having NFS problems,
installing the patches is a good place to start, especially if you
recognize the general symptoms noted below.
 
In order for a machine to be stable, all of the recommended patches
should be installed as well. The list of recommended patches for your
operating system is available from sunsolve1.sun.com.

5.1: Core NFS Patches for SunOS 4.1.X
 
100173-13 SunOS 4.1.3: NFS Jumbo Patch  
102177-04 SunOS 4.1.3_U1: NFS Jumbo Patch  
 
  Resolve a large number of NFS problems. Should be installed on any
  machine doing NFS.
 
100988-04 SunOS 4.1.3: UFS File system and NFS locking Jumbo Patch.  
101784-04 SunOS 4.1.3_U1: rpc.lockd/rpc.statd jumbo patch  
 
  Fixes a wide variety of rpc.lockd and rpc.statd problems.
 
102264-02 SunOS 4.1.4: rpc.lockd patch for assertion failed panic  
 
  Fixes an "Assertion failed" panic related to the lockd.

5.2: Patches Related to NFS for SunOS
 
100361-04 SunOS 4.1.1 4.1.2 4.1.3: server not responding due to limits of  
 
  Resolves an error which could cause "NFS server not responding"
  errors on a machine which had more than 500 machines in its arp cache.
  Only a problem at sites with very large local nets.
 
101849-01 SunOS 4.1.3: rpc.quotad is very slow on busy NFS servers  
 
  Speeds up slow rpc.quotads on NFS servers.

5.3: Core NFS Patches for Solaris
  
SOLARIS 2.3:

101318-75 SunOS 5.3: Jumbo patch for kernel (includes libc, lockd)  
 
  Resolves a large number of problems involving both nfs and the
  lockd, as well as the related autofs program. Should be installed 
  on any 5.3 machine, but is an absolute necessity on a machine doing
  NFS.

102654-01 SunOS 5.3: rmtab grows without bounds  

  This patch solves problems where the mountd hangs up, but the nfsd
  continues to process NFS requests from existing NFS mounts.

103059-01 SunOS 5.3: automountd /dev rdev not in mnttab  

  This patch fixes a variety of issues where the automounter loses
  entries from mnttab, often seen with lofs (loopback) mounts.

101930-01 SunOS 5.3: some files may not show up under cachefs  

  This patch is required with the "autoclient" product, which is needed
  to cache the / and /usr file systems with cachefs.

SOLARIS 2.4 and 2.4x86:

101945-36 SunOS 5.4: jumbo patch for kernel  
101946-29 SunOS 5.4_x86: jumbo patch for kernel  

  Resolves a large number of problems involving nfs, as well as the
  related autofs program. Should be installed on any 5.4 machine, but
  is an absolute necessity on a machine doing NFS.

102685-01 SunOS 5.4: lofs - causes problems with 400+ PC-NFS users  

  This patch resolves some mountd hangs seen after sharing a lofs mount point.

101977-04 SunOS 5.4: lockd fixes  
101978-03 SunOS 5.4_x86: lockd fixes  
 
  Resolves various lockd error messages, as well as a lockd memory
  leak.

102216-05 SunOS 5.4: klmmod and rpcmod fixes  

  Resolves problems with NFS file locking.  It is needed whenever
  patching lockd.  

102769-01 SunOS 5.4: statd requires enhancements in support of HADF  

  This patch is generally needed in high availability server application.

102209-01 SunOS 5.4: No way to cache the root and /usr file systems with CacheFS  
102210-01 SunOS 5.4_x86: No way to cache root & /usr file systems with CacheFS  

  This patch is required with the "autoclient" product, which is needed
  to cache the / and /usr file systems with cachefs.

102217-01 SunOS 5.4_x86: NFS client starts using unreserved UDP port numb  

  Resolves a problem specific to the x86 port of 5.4, which caused NFS
  clients to begin using unreserved ports. [look up bug 1179403]

5.4: Patches Related to NFS for Solaris

We STRONGLY recommend you install these patches, especially if you have
had any problems with "NFS SERVER NOT RESPONDING":

SOLARIS 2.3:

101546-01 SunOS 5.3: nfs: multiple quota -v may not return info or too slow  

101581-02 SunOS 5.3: quotaon/quotaoff/quotacheck fixes  
  Resolves a problem which caused rquotad to hang on some NFS systems, and
  resolves other quota issues.

101306-11 SunOS 5.3: Jumbo Patch for le & qe drivers  
  This is a "must install" patch for systems with Ethernet.

102272-01 SunOS 5.3: Ethernet and ledmainit fixes  
  Resolve dma problems with le interface, possible causes of NFS server hangs

101734-03 SunOS 5.3: iommu fixes for sun4m  
 Resolve iommu problems mainly on Sparc 5, possible causes of NFS server hangs.


SOLARIS 2.4:

101973-15 SunOS 5.4: jumbo patch for libnsl and ypbind  
  This patch resolves a variety of name service issues that can cause
  a 2.4 NFS server to not respond to certain requests.  This is a "must have" patch.

102001-09 SunOS 5.4: le, qe, be Ethernet driver Jumbo Patch  
  This is a "must install" patch for systems with Ethernet.  

102332-01 SunOS 5.4: ledma fix  
  Resolve dma problems with le interface, possible causes of NFS server hangs

102038-02 SunOS 5.4: iommunex_dma_mctl Sparc 5 only  
  Resolves iommu problems on Sparc 5, possible causes of NFS server hangs.

6.0 Bugs and RFEs (Request for Enhancement)

This section should be considered under construction, and fairly
dynamic.

Bugs:

1149389 - Under heavy load, a 2.3 NFS server may see the following errors:
Oct 18 08:57:12 cobra unix: xdrmblk_getmblk failed
Oct 18 08:57:12 cobra unix: NOTICE: nfs_server: bad getargs 

There is no fix for this bug in 2.3.  One case of this bug
was fixed in the 2.4 FCS.

Note:  it is possible that this bug is caused by UDP checksum errors from
the clients.  This is most often seen with SunOS and PC clients.  Enable
UDP checksumming as a potential workaround.  Another workaround is to
set rsize=1024,wsize=1024 on all of the NFS clients so that there
are no UDP packet reassembly problems.

In any case, the root cause is corruption of a UDP packet, or incorrect
or nonexistent creation of UDP checksums for requests from an NFS client.

See the automount Tips Sheet "PSD" for some further information about
automount bugs.

1222181   2.4's mountd allows automount "lofs" mount point created by loopback 
mount to be shared or exported.  This bug has been known to cause mountd hangs
on 2.4!!

RFEs:

To be investigated and added.

7.0 Documentation

7.1: Important Man Pages
 
  dfmounts              (Solaris only)
  dfshares              (Solaris only)
  exportfs
  exports
  lockd
  mnttab
  mount
  mountd
  nfs
  nfsd
  rmtab
  share                 (Solaris only)
  share_nfs             (Solaris only)
  shareall              (Solaris only)
  sharetab              (Solaris only)
  showmount
  statd
  xtab
  unshare               (Solaris only)

7.2 Sunsolve Documents
 
There are a huge number of Sunsolve documents related to NFS. The ones
noted below are primarily those which have expanded information, not
in this document.
 
7.2.1 Sun Infodocs
 
2016      How does NFS work?
 
7.2.2 Sun FAQs
 
1025      nfs mounting from non-Solaris system fails
1128      Using nfsstat to gather network integrity data from NFS
 
7.2.3 Sun SRDBs
 
3874      Getting "Stale NFS Handles" errors
4459      What is the procedure for optimizing the number of nfsds?
4726      error running exportfs: "Too many levels of remote in path"
4727      exportfs doesn't recog. netgroup root access exported dir.
4769      How to use rpcinfo program to troubleshoot RPC daemons?
4840      Secure NFS failing with authentication errors
5594      nfs mount fails with "not owner"
5925      NFS request from unprivileged port.
6682      quota -v return no information on an NFS client
7334      rpc.lockd error- fcntl: error Stale NFS file handle
10609     diskless client boot gave nfs mount error 13
10946     NFS errors
11058     ls of file system mounted from ULTRIX server hangs on Solaris 2.4

7.3 Sun Educational Services
 
Sun Education provides general SunOS and Solaris network administration
classes.  In the USA, contact Sun Education at 1-800-422-8020 for a
current catalog and set of course descriptions. 

7.4: Solaris Documentation
 
_NFS Administration Guide_, Part #801-6634-10
 
  Information on how to set up, maintain and debug NFS and
  autofs.
 
_SMCC NFS Server Performance and Tuning Guide_, Part #801-7289-10
 
  A very good resource for analyzing and improving NFS performance on
  a Solaris server.

7.5: Third Party Documentation
 
_Managing NFS and NIS_, by Hal Stern, published by O'Reilly &
Associates, Inc, ISBN #0-937175-75-7
 
  The definitive source for managing NFS in a SunOS environment. Has a
  section on performance tuning which is quite helpful. Gives some
  information on the automounter as well. The underlying concepts are
  still the same for Solaris, but some of the commands and file names
  have changed.
 
_TCP/IP Network Administration_, by Craig Hunt, published by O'Reilly
& Associates, Inc, ISBN #0-937175-82-X
 
  A good overview of TCP/IP, with a limited introduction to SunOS NFS.

7.6: RFCs
 
RFCs are the internet-written documents that define the specifications
of many common networking programs. RFCs can be retrieved from
nic.ddn.mil, in the /rfc directory.
 
1094  NFS: Network File System Protocol specification
 
  The official spec on the NFS protocol.

8.0: Supportability 
 
SunService is not responsible for the initial configuration of your
NFS environment. In addition, SunService can not diagnose your NFS
performance problems, or suggest NFS tuning guidelines.  Consulting
services are available from Sun to provide these services on
a flat fee or per hour consulting rate.  Contact your local Sun
office for further information on those services.
 
We can help resolve problems where NFS is not behaving correctly, but
in such cases the contact must be a system administrator who has a
good understanding of NFS client-server interaction.  We cannot
guarantee a solution to problems involving non-Sun hosts, nor
across complex routed networks, but can provide limited debugging
support and explanation of debugging tools.

9.0: Additional Support
 
For initial configuration, or NFS performance tuning guidelines,
please contact your local SunService office for possible consulting
offerings. Sun's Customer Relations organization can put you in touch
with your local SunIntegration or Sales office. You can reach Customer
Relations at 800-821-4643.

SOLUTION SUMMARY:

 

PATCH ID: n/a
PRODUCT AREA: Gen. Network
PRODUCT: NFS
SUNOS RELEASE: any
UNBUNDLED RELEASE: n/a
HARDWARE: n/a