IJ40279 |
High Importance
|
Fileset df doesn't report correct limit and usage for a fileset.
Symptom |
Unexpected results |
Environment |
Linux |
Trigger |
Enable fileset df and set a quota limit on blocks but not on inodes. |
Workaround |
Upgrade the cluster version to 5.1.1.0. |
|
5.1.4.1 |
Filesetdf |
IJ40567 |
Medium Importance |
Restart of the pmsensors process in the container environment failed due to a race condition on a PID file. pmsensors remained down and did not collect perfmon statistics.
Symptom |
mmhealth event pmsensors_down in CNSA. |
Environment |
Linux |
Trigger |
Failover of perfmon singleton node CLUSTER_PERF_SENSOR to another node requires pmsensors restart on old and new node. |
Workaround |
Manually start pmsensors process. |
|
5.1.4.1 |
Performance monitoring, Sysmon |
IJ40568 |
High Importance
|
On s390x, the "mmvdisk simulate-dead" and "mmvdisk replace --prepare" commands power off the device even though the power control stanza is not specified. An error 5 can be seen.
Symptom |
Error output/message |
Environment |
Linux (s390x) |
Trigger |
"mmvdisk simulate-dead" or "mmvdisk replace --prepare" |
Workaround |
On "Pdisk state is missing" or err 5, power on the device before further processing. |
|
5.1.4.1 |
ESS, GNR |
IJ40569 |
Suggested |
GPFS fails to process the kmipServerUri field in a remote key manager stanza in the RKM.conf file if provided as an IPv6 address, e.g., kmipServerUri = tls://[fd9a:f0d0:1002:11::31]:5696.
Symptom |
Failure to read files from encrypted file systems/sets. |
Environment |
All |
Trigger |
None |
Workaround |
Use the hostname instead. |
|
5.1.4.1 |
Security |
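For IJ40569, the following minimal sketch shows how a URI of the form quoted above splits into host and port when the host is a bracketed IPv6 literal. It uses Python's standard `urllib.parse` purely as an illustration; it is not how GPFS parses RKM.conf, and the helper name `split_kmip_uri` is hypothetical.

```python
from urllib.parse import urlsplit


def split_kmip_uri(uri):
    """Hypothetical helper: return (scheme, host, port) for a tls:// URI.

    The brackets around an IPv6 literal separate the colons inside the
    address from the colon that introduces the port number.
    """
    parts = urlsplit(uri)
    return parts.scheme, parts.hostname, parts.port


# Example value taken from the APAR text above.
print(split_kmip_uri("tls://[fd9a:f0d0:1002:11::31]:5696"))
```

A parser that splits on the first or last colon without honoring the brackets would misread the address, which is why the bracketed form needs explicit handling.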
IJ40570 |
High Importance
|
Assert or SIGSEGV in writeAllocSumBlock after offline mmfsckx.
Symptom |
Abend/Crash |
Environment |
All |
Trigger |
Offline mmfsckx |
Workaround |
None |
|
5.1.4.1 |
FSCK |
IJ37871 |
Suggested |
The mmlsquota command reports duplicate lines when the -C option is specified.
Symptom |
Duplicate output |
Environment |
All |
Trigger |
Specify the Device argument that also belongs to the remote cluster in the -C argument. |
Workaround |
Specify the Device argument that does not belong to the -C ClusterName. |
|
5.1.4.1 |
Admin Commands |
IJ33574 |
Suggested |
Trace parameters set through the mmtracectl command do not keep the node classes.
Symptom |
Unexpected behavior |
Environment |
All |
Trigger |
Set trace parameters with the mmtracectl command. |
Workaround |
Explicitly set the trace parameters via the mmchconfig command. |
|
5.1.4.1 |
Admin |
IJ40607 |
High Importance
|
When recovery runs and encounters extra entries within a directory that was deleted from the cache, it determines the mode of the remote entry and queues a Remove or an Rmdir accordingly. Sometimes it gets the mode wrong and queues an Rmdir on a file instead of a Remove, which causes the queue to be stuck forever.
Symptom |
Unexpected Behavior |
Environment |
Linux (AFM Gateway nodes) |
Trigger |
Recovery on an AFM fileset with a large number of removes/rmdirs to be captured by the recovery. |
Workaround |
None |
|
5.1.4.1 |
AFM |
IJ40608 |
High Importance
|
A node delete in an ECE cluster causes the declustered array to be stuck in critical rebuild, preventing the system from doing any data rebuild function.
Symptom |
Unexpected Results/Behavior |
Environment |
Linux |
Trigger |
Remove an ECE node with mmvdisk. |
Workaround |
None |
|
5.1.4.1 |
ESS, GNR |
IJ40609 |
High Importance
|
The SUID and SGID bits are not cleared after a successful write/truncate to a file by a non-owner.
Symptom |
Unexpected Results/Behavior |
Environment |
Linux |
Trigger |
Create a file with the SUID and SGID bits set. As a non-owner or non-root user, write to the file with the write() system call or truncate the file with the truncate() system call. |
Workaround |
Ensure that only owners can write to an executable binary file that has the SUID/SGID bit set. |
|
5.1.4.1 |
Core GPFS |
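For IJ40609, one way to observe the symptom is to inspect the file mode after the non-owner write or truncate and see whether the set-ID bits are still present. The sketch below only decodes mode bits with Python's standard `stat` module; the helper names `setid_bits` and `check_file` are hypothetical, and the octal modes in the demo are illustrative values, not output from a GPFS system.

```python
import os
import stat


def setid_bits(mode):
    """Return the names of the set-ID bits present in a st_mode value."""
    names = []
    if mode & stat.S_ISUID:
        names.append("SUID")
    if mode & stat.S_ISGID:
        names.append("SGID")
    return names


def check_file(path):
    """Hypothetical check: report the set-ID bits still present on path."""
    return setid_bits(os.stat(path).st_mode)


# Illustrative modes: a regular file with both bits set, then with both cleared.
print(setid_bits(0o106755))
print(setid_bits(0o100755))
```

Running `check_file` on the target file before and after the write, as the non-owner, shows whether the bits survived the operation.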
IJ40573 |
High Importance
|
Deadlock while accessing the data from AFM cascading relationship filesets because of token conflicts if the home fileset is AFM+COS enabled.
Symptom |
Unexpected Results |
Environment |
Linux |
Trigger |
AFM cascading relationship with AFM+COS fileset. |
Workaround |
None |
|
5.1.4.1 |
AFM |
IJ40834 |
High Importance
|
The mmafmcosconfig options -gcs and -vhb do not work together. The -vhb option is used for virtual hosting of buckets.
Symptom |
Unexpected Results |
Environment |
Linux |
Trigger |
Accessing an AFM+COS fileset that was created with both the -gcs and -vhb options. |
Workaround |
None |
|
5.1.4.1 |
AFM |
IJ40835 |
High Importance
|
AFM Recovery procedure sometimes fails with error 112.
Symptom |
Unexpected Behavior |
Environment |
Linux (AFM Gateway nodes) |
Trigger |
Running recovery on a fileset whose .ptrash directory has the local bit reset. |
Workaround |
Setting the ptrash bit manually on the .ptrash directory (if it is found to be reset). |
|
5.1.4.1 |
AFM |
IJ40841 |
High Importance
|
The mmafmcosconfig options -gcs and -vhb do not work together. The -vhb option is used for virtual hosting of buckets.
Symptom |
Unexpected Results |
Environment |
Linux |
Trigger |
Accessing an AFM+COS fileset that was created with both the -gcs and -vhb options. |
Workaround |
None |
|
5.1.4.1 |
AFM |
IJ40844 |
High Importance
|
A readdir/read operation on an AFM+COS fileset does not preserve file times, causing a file time mismatch after the download operation.
Symptom |
Unexpected Results |
Environment |
Linux |
Trigger |
AFM+COS download operation. |
Workaround |
None |
|
5.1.4.1 |
AFM |
IJ40845 |
High Importance
|
Object readdir messages are not filtered if multiple readdir operations for the same directory come to the gateway node. This causes performance overhead and deadlocks.
Symptom |
Long Waiters/Deadlock |
Environment |
Linux |
Trigger |
AFM+COS caching mode with multiple readdirs on the same uncached directory. |
Workaround |
None |
|
5.1.4.1 |
AFM |
IJ40894 |
Suggested |
The mmauth show command can be slow on a cluster that has authorized a large number of remote accesses to the file systems it owns.
Symptom |
Performance |
Environment |
All |
Trigger |
Large number of remote accesses. |
Workaround |
None |
|
5.1.4.1 |
Admin commands |
IJ40947 |
High Importance
|
When a user application uses the Fine Grain Write Sharing hint, GPFS_FINE_GRAIN_WRITE_SHARING, to overwrite an existing file that is also in a snapshot, the file content in the snapshot might not be preserved and might instead be changed to match the file content in the active file system.
Symptom |
Data loss in the snapshot file when the GPFS_FINE_GRAIN_WRITE_SHARING hint is used to overwrite the file in the active file system. |
Environment |
Linux |
Trigger |
Use the GPFS_FINE_GRAIN_WRITE_SHARING hint to overwrite an existing file which is also in a snapshot. |
Workaround |
None |
|
5.1.4.1 |
Core GPFS |
IJ40959 |
High Importance
|
Objects are not fully prefetched at the cache on reading the 4th block when afmPrefetchThreshold is set to 0 and the I/O pattern is random.
Symptom |
Unexpected Behavior |
Environment |
Linux (AFM Gateway nodes) |
Trigger |
- In RO/LU/IW/SW mode of operation with AFM COS as the backend, have an uncached file (in SW or IW mode, evict the file from the cache).
- Read 4 data blocks randomly on the file at the cache (make sure no 2 blocks are read sequentially).
|
Workaround |
Read 4 blocks sequentially instead of randomly. |
|
5.1.4.1 |
AFM |
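The IJ40959 trigger calls for reading 4 blocks with no 2 blocks read sequentially. A minimal sketch of one way to produce such a read pattern follows. It assumes that drawing block indices only from even positions is an acceptable way to avoid adjacent blocks; the helper names and the block size passed to `read_blocks` are hypothetical and not taken from the APAR.

```python
import random


def pick_nonadjacent_blocks(num_blocks, count=4, seed=None):
    """Pick `count` distinct block indices such that no two are adjacent.

    Drawing only from even indices guarantees that any two picks differ
    by at least 2, so no read immediately follows its neighboring block.
    """
    rng = random.Random(seed)
    candidates = range(0, num_blocks, 2)
    if len(candidates) < count:
        raise ValueError("file has too few blocks for the requested count")
    return rng.sample(candidates, count)


def read_blocks(path, block_size, indices):
    """Read one block at each chosen index, in the given (random) order."""
    with open(path, "rb") as f:
        for index in indices:
            f.seek(index * block_size)
            f.read(block_size)


# Choose 4 non-adjacent blocks out of a hypothetical 64-block file.
print(pick_nonadjacent_blocks(64, seed=1))
```

The block size to pass to `read_blocks` depends on the file system configuration and has to be supplied by the caller.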
IJ40987 |
High Importance
|
When rename/remove operations are performed on dependent filesets that are linked inside AFM independent filesets, and these operations are replicated to the remote site, the locally removed/renamed inodes are not reclaimed. As a result, more inodes are held in use than necessary.
Symptom |
Unexpected Behavior |
Environment |
Linux (AFM Gateway nodes) |
Trigger |
Remove/Rename performed on the dependent fileset inodes when the dependent fileset is linked under an AFM independent fileset. |
Workaround |
None |
|
5.1.4.1 |
AFM |
IJ41004 |
High Importance
|
Files are not re-validated in an AFM cascading relationship because of readdir optimizations. This happens if the home fileset is AFM enabled with COS backend.
Symptom |
Unexpected Results |
Environment |
Linux |
Trigger |
AFM cascading relationship with AFM+COS fileset. |
Workaround |
None |
|
5.1.4.1 |
AFM |
IJ41031 |
Suggested |
Changing to a new callhome group server node using the `mmcallhome group change` command may cause the daily/weekly schedules to no longer function properly.
Symptom |
Callhome information may not be uploaded to IBM according to the established daily and weekly schedules. mmhealth may show callhome in a degraded or failed state if uploads to IBM are not occurring. |
Environment |
Linux |
Trigger |
This issue may occur when using the mmcallhome command to change the callhome group server to a different node. |
Workaround |
The node class CALLHOME_SERVERS may be manually changed to address this issue. The command `mmchnodeclass CALLHOME_SERVERS replace -N <new_server_node>` may be used to update the node class to reflect the new callhome group server node. |
|
5.1.4.1 |
Callhome |
IJ41032 |
High Importance
|
A directory is not re-validated under some conditions in AFM caching modes, so the directory attributes are not fetched from the home.
Symptom |
Unexpected Results |
Environment |
Linux |
Trigger |
AFM caching mode with directory updates. |
Workaround |
None |
|
5.1.4.1 |
AFM |
IJ41040 |
High Importance
|
Mutex contention could lead to slow write performance on AIX when multiple threads try to flush the same file, which contains many blocks, at the same time.
Symptom |
Performance Impact/Degradation |
Environment |
AIX/Power, Windows (x86_64) |
Trigger |
Multiple threads invoking sync on the same file at the same time. |
Workaround |
None |
|
5.1.4.1 |
Core GPFS |
IJ41042 |
High Importance
|
AFM gateway asserts when replicating the Rmdir operation on a dependent fileset.
Symptom |
Assert |
Environment |
Linux |
Trigger |
AFM caching with dependent filesets. |
Workaround |
None |
|
5.1.4.1 |
AFM |
IJ41053 |
Suggested |
Recovery/resync cannot find the correct path for files that are under a mapped directory and fails with error 2 because the mapped directory length was skipped.
Symptom |
Operation queue gets dropped. |
Environment |
Linux |
Trigger |
Error 2 is hit, the queue is dropped, and the cache state becomes Needresync. |
Workaround |
None |
|
5.1.4.1 |
AFM COS |
IJ41072 |
High Importance
|
"mmsdrrestore --ccr-repair" is not removing CCR tiebreaker disks from the cluster configuration when those CCR tiebreaker disks aren't available when this command is executed. This happens only when the CCR nodes file '/var/mmfs/ccr/ccr.nodes' is not available on the quorum nodes.
Symptom |
Unexpected results/behavior |
Environment |
All |
Trigger |
'/var/mmfs/ccr/ccr.nodes' not available on the quorum nodes in conjunction with CCR tiebreaker disks not accessible on those quorum nodes. |
Workaround |
None |
|
5.1.4.1 |
CCR Admin command |
IJ40902 |
High Importance
|
Lookup on hardlinks fails intermittently on AFM cache filesets. This is due to a race between multiple threads performing the lookup of the same hardlink from different directories.
Symptom |
Unexpected results |
Environment |
Linux |
Trigger |
AFM caching with hardlinks. |
Workaround |
None |
|
5.1.4.1 |
AFM |
IJ41074 |
Suggested |
mmvdisk pdisk list --rg all --not-ok -L prints extraneous information when all pdisks are ok. Specifically, a recovery group separator is printed with no pdisk information after it. This might confuse users, who might expect blank output if all disks are ok.
Symptom |
Unexpected results |
Environment |
Linux |
Trigger |
Running the mmvdisk pdisk list command on a healthy system. |
Workaround |
None |
|
5.1.4.1 |
ESS, GNR |
IJ39624 |
Suggested |
On recent Cygwin versions (≥ 3.3), an attempt to uninstall GPFS on Windows might display a dialog box reporting that access is denied on uninstall.lnk. The dialog box presents options to Abort, Retry, or Ignore the error. Ignoring the error bypasses the issue and results in a successful uninstall.
Symptom |
Upgrade/Install failure. |
Environment |
Windows (x86_64) |
Trigger |
Cygwin version ≥ 3.3 |
Workaround |
When presented with the dialog box complaining about uninstall.lnk, click on "Ignore" and that should let the uninstall complete. Then from an elevated Cygwin terminal: cd /usr/lpp/mmfs/support; chmod 777 uninstall.lnk; rm uninstall.lnk |
|
5.1.4.0 |
Install, Upgrade |