This document describes the authorized program analysis reports (APARs) resolved in IBM Spectrum Scale 5.0.5.x releases.
This document was last updated on 28th April, 2022.
APAR | Severity | Description | Resolved in | Feature Tags |
---|---|---|---|---|
IJ38923 | Critical | While updating the symlink target path on an AFM-enabled fileset, the inode is not copied to the previous snapshot, causing an assert. (show details) | 5.0.5.14 | AFM |
IJ38924 | Suggested | Issuing io_uring IORING_OP_READ_FIXED requests to read data into preallocated buffers fails with an error. (show details) | 5.0.5.14 | Core GPFS |
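The io_uring failure above involves a public kernel interface, so the request shape can be illustrated outside of GPFS. Below is a minimal sketch, assuming liburing is installed and that /gpfs/fs0/testfile is a hypothetical file on a Spectrum Scale mount: it registers one preallocated buffer with the ring and submits an IORING_OP_READ_FIXED request against it; on affected levels the completion result (cqe->res) came back as an error.

```c
/* Minimal fixed-buffer read with liburing (a sketch, not the GPFS
 * reproducer). IORING_OP_READ_FIXED requires the buffer to be
 * registered with the ring before the read is submitted. */
#include <liburing.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    struct io_uring ring;
    struct io_uring_sqe *sqe;
    struct io_uring_cqe *cqe;
    struct iovec iov;
    int fd;

    if (io_uring_queue_init(8, &ring, 0) < 0)
        return 1;

    /* Preallocate and register the buffer the fixed read targets. */
    iov.iov_len = 4096;
    if (posix_memalign(&iov.iov_base, 4096, iov.iov_len) != 0)
        return 1;
    if (io_uring_register_buffers(&ring, &iov, 1) < 0)
        return 1;

    fd = open("/gpfs/fs0/testfile", O_RDONLY);  /* hypothetical path */
    if (fd < 0)
        return 1;

    sqe = io_uring_get_sqe(&ring);
    /* Read 4 KiB from file offset 0 into registered buffer index 0. */
    io_uring_prep_read_fixed(sqe, fd, iov.iov_base, iov.iov_len, 0, 0);
    io_uring_submit(&ring);

    if (io_uring_wait_cqe(&ring, &cqe) == 0) {
        printf("read result: %d\n", cqe->res);  /* negative on failure */
        io_uring_cqe_seen(&ring, cqe);
    }
    close(fd);
    io_uring_queue_exit(&ring);
    return 0;
}
```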
IJ38925 | Suggested | There is currently no command to bring an Inactive AFM fileset back to the Active state. (show details) | 5.0.5.14 | AFM |
IJ38926 | Suggested | When the handler for AFM replication is created on the gateway node, the handler create time, the last replay time, and the last sync time are all initialized to the current time. If for some reason the handler cannot be mounted and replicate to home, AFM prints the last replay time as the same time as the handler create time, giving the misleading impression that replication has actually happened. (show details) | 5.0.5.14 | AFM |
IJ38927 | High Importance | When running I/O through kNFS with file audit logging enabled, an invalid pointer might be accessed. (show details) | 5.0.5.14 | File audit logging |
IJ38928 | Suggested | The 32-bit GPFS API library is not available in the default path on Ubuntu. (show details) | 5.0.5.14 | GPFS API |
IJ38949 | High Importance | The SUID and SGID bits are not cleared after a successful write/truncate to a file by a non-owner. (show details) | 5.0.5.14 | Core GPFS |
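The SUID/SGID clearing described above is standard POSIX behavior that can be verified directly. A minimal sketch, assuming it is run as a non-owner who has write permission on a hypothetical test file whose setuid/setgid bits were set beforehand:

```c
/* Sketch of the expected POSIX behavior: when a process that does not
 * own a file (and lacks the relevant privilege) writes to it, the
 * kernel clears the SUID/SGID bits. The path is illustrative. */
#include <stdio.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/gpfs/fs0/suid_test";  /* owned by another user,
                                                  e.g. mode 06666 */
    struct stat st;
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return 1;

    if (write(fd, "x", 1) != 1)                /* non-owner write */
        return 1;
    if (fstat(fd, &st) != 0)
        return 1;

    /* On affected levels the bits survived the write; with the fix
     * both checks should print "cleared". */
    printf("SUID %s, SGID %s\n",
           (st.st_mode & S_ISUID) ? "still set" : "cleared",
           (st.st_mode & S_ISGID) ? "still set" : "cleared");
    close(fd);
    return 0;
}
```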
IJ36560 | Suggested | If a workload involves opening and creating lots of files concurrently under the same directory, some of the opens may suffer high open times. (show details) | 5.0.5.14 | Core GPFS |
IJ39127 | High Importance | An error 22 (EINVAL) is hit when trying to get the valid data blocks on a file in resync. (show details) | 5.0.5.14 | AFM |
IJ38848 | Suggested | The NFS mount point is not torn down if the home fileset is unresponsive or hung. This causes multiple NFS mounts to be created for the same fileset. (show details) | 5.0.5.14 | AFM DR |
IJ39144 | Critical | A DMAPI read event is generated on AFM deferred-deletion files, causing unnecessary recalls when only the AFM recovery snapshot exists. (show details) | 5.0.5.14 | AFM |
IJ39384 | Critical | AFM fails to upload the object if the name starts with a '-' character. (show details) | 5.0.5.14 | AFM |
IJ39388 | Suggested | If the system pool is also used for data, auto recovery miscalculates the available metadata failure group count and might incorrectly trigger tsrestripefs -r. (show details) | 5.0.5.14 | FPO |
IJ39411 | Suggested | The IBM Spectrum Scale admin commands and handling of file system encryption keys require the use of more robust settings. (show details) | 5.0.5.14 | Admin commands |
IJ37872 | High Importance | Missing sqlite-3 packages on IBM Spectrum Scale Erasure Code Edition environments can cause admin command hangs. (show details) | 5.0.5.13 | Admin commands |
IJ37873 | High Importance | SGNotQuiesced assertion in dbshLockInode during file system quiesce (show details) | 5.0.5.13 | Snapshots |
IJ37875 | Suggested | An error message "Could not retrieve minReleaseVersion" is logged in the systemhealth monitor log file (mmsysmonitor.log). (show details) | 5.0.5.13 | System health |
IJ36358 | High Importance | mmap reads from lots of threads might cause a deadlock in DeclareResourceUsage. (show details) | 5.0.5.13 | Core GPFS |
IJ37909 | High Importance | When there are multiple threads trying to flush the same file and the file is large with many blocks, there could be mutex contention which can lead to performance degradation. (show details) | 5.0.5.13 | Core GPFS |
IJ37910 | High Importance | SGNotQuiesced assertion in dbshLockInode during file system quiesce. (show details) | 5.0.5.13 | Snapshots |
IJ37978 | High Importance | Inodes are not reclaimed after hardlinks are corrected during AFM prefetch. This causes more inodes to be in use than the actual number of files present in the fileset. (show details) | 5.0.5.13 | AFM |
IJ37107 | High Importance | AFM fileset resync failed with EINVAL error (22) (show details) | 5.0.5.13 | AFM |
IJ37979 | Suggested | When SGPanic occurs, the dealloc queue subblocks count could be wrong and cause "(deallocHighSeqNum - deallocFlushedSeqNum) >= deallocQueueSubblocks" assertion failure. (show details) | 5.0.5.13 | Core GPFS |
IJ38041 | Suggested | When a fileset is in the chmodAndUpdateAcl permission change mode, creating a file with the open() system call under a parent directory with inherit entries and then changing permissions of the newly created file via NFS results in duplicated and incorrect entries in the file's NFSv4 ACL. (show details) | 5.0.5.13 | NFS |
IJ38052 | High Importance | Due to a change in procps output in Cygwin version 3.3, IBM Spectrum Scale fails to start. (show details) | 5.0.5.13 | Admin commands |
IJ38077 | Suggested | mmvdisk recovery group conversion might conflict with settings for nsdRAIDSmallBufferSize from the previous deployment scripts. mmvdisk will apply a value of -1 to this setting, which conflicts with the original value of 256KiB. The result is that the Daemon will print a warning message on start up, warning the user that nsdRAIDSmallBufferSize has been reduced to a value of 4KiB. This might impact performance. (show details) | 5.0.5.13 | ESS, GNR |
IJ38081 | High Importance | AFM prefetch with the --dir-list-file option, where the list contains encoded directory names, is not being processed and queued. (show details) | 5.0.5.13 | AFM |
IJ38148 | High Importance | Given a parent directory with the SGID bit set, a file created with the SGID bit specified by a user who does not belong to the same group as the directory can still have the SGID bit set. (show details) | 5.0.5.13 | Core GPFS |
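The create-time rule above can be observed the same way. A minimal sketch, assuming a hypothetical directory prepared beforehand (chgrp to a group the test user does not belong to, then chmod g+s); the kernel is expected to strip the SGID bit requested for the new file:

```c
/* Sketch of the rule being enforced: a file created in a directory
 * with the SGID bit set inherits the directory's group; if the
 * creating user is not a member of that group, a requested SGID bit
 * on the new file should be stripped. Paths are illustrative. */
#include <stdio.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    /* Run as a user who is NOT in the group owning sgid_dir. */
    const char *path = "/gpfs/fs0/sgid_dir/newfile";
    struct stat st;
    int fd = open(path, O_CREAT | O_WRONLY, S_IRWXU | S_ISGID);
    if (fd < 0)
        return 1;
    if (fstat(fd, &st) != 0)
        return 1;
    printf("SGID on new file: %s (expected: cleared)\n",
           (st.st_mode & S_ISGID) ? "set" : "cleared");
    close(fd);
    return 0;
}
```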
IJ36547 | High Importance | A newly mounting node, either due to a user mount or an expelled node rejoining the cluster, can fail the assert 'llfP->lockRangeNode != NodeAddr(-1U, 0, NodeAddr::naNormal)' if the mount happens in the middle of an mmrestripefs, mmadddisk, mmdeldisk, or mmfsck operation. (show details) | 5.0.5.12 | Core GPFS |
IJ34995 | High Importance | mmlsquota reports wrong results: (1) extra output lines with "no limits" for users or groups that have no usage on the fileset; (2) extra output lines, all showing "no limits", when no quota limits are set for a user or group in the fileset. (show details) | 5.0.5.12 | Quotas |
IJ36553 | Suggested | When the last block of a file is not a full GPFS block, replica compare function could report false replica mismatch. (show details) | 5.0.5.12 | Core GPFS |
IJ36556 | Critical | If there are node failures during burst of file create or delete activity, then it is possible for the cached free inode counters on the file system manager to become out of date. (show details) | 5.0.5.12 | Core GPFS |
IJ36855 | High Importance | The IBM Spectrum Scale HDFS Transparency connector versions 3.1.0-9, 3.1.1.7, and 3.3.0-0 contain Apache Log4j libraries that are affected by the security vulnerabilities CVE-2019-17571 and CVE-2021-4104. (show details) | 5.0.5.12 | HDFS Connector |
IJ36338 | High Importance | "More than 22 minutes searching for a free buffer in the pagepool" assertion failure. (show details) | 5.0.5.12 | Core GPFS |
IJ36861 | Critical | GPFS daemon could assert while running mmadddisk. This can only happen if a new storage pool is being created as a result of running mmadddisk and a storage pool had been deleted in the past by using mmdeldisk. (show details) | 5.0.5.12 | Core GPFS |
IJ36862 | High Importance | When split reads are spawned out to helper gateway nodes and a remote error at the home site causes the fileset to be put into the unmounted state, there is a window where the transition to unmounted can conflict with an ongoing split read message. (show details) | 5.0.5.12 | AFM |
IJ34184 | High Importance | Daemon assert going off when generating DMAPI event: addr.isReserved() || addr.getClusterIdx() == clusterIdx in file cfgmgr.h, resulting in a daemon crash. (show details) | 5.0.5.12 | DMAPI |
IJ36967 | Critical | The AFM gateway node crashes during fileset recovery because invalid file handles are used to get inodes in the kernel. (show details) | 5.0.5.12 | AFM |
IJ36969 | Suggested | Certain characters, such as newline (\n) and backslash (\), were not escaped correctly, resulting in invalid JSON. JSON parsers are not able to read the event correctly. (show details) | 5.0.5.12 | File audit logging, Watch folder |
IJ36970 | High Importance | While trying to set extended attributes, SetXAttrHandlerThread could deadlock with itself trying to obtain a WW lock on the buffer while holding XW lock. (show details) | 5.0.5.12 | Core GPFS |
IJ36558 | Suggested | When running file audit logging, signal 11 is possible at FileMetadata::set_mtimeUpdate(unsigned int) (show details) | 5.0.5.12 | File audit logging |
IJ36557 | Suggested | If the number of quorum nodes in the cluster is not greater than the minQuorumNodes configuration setting, the mmchconfig command fails without a clear message. (show details) | 5.0.5.12 | Admin commands |
IJ36536 | High Importance | Workloads that issue many highly concurrent lookups to GPFS suffer a performance impact. (show details) | 5.0.5.12 | Core GPFS |
IJ36842 | Critical | GPFS daemon could assert while mounting the file system on a client node with code level prior to V5.1.1.0. This can only happen if a new storage pool is being created by mmadddisk and the storage pool had been deleted in the past by using mmdeldisk. (show details) | 5.0.5.12 | Core GPFS |
IJ37027 | High Importance | NULL pointer dereference in kxGanesha (show details) | 5.0.5.12 | NFS |
IJ34842 | Suggested | If the NFSv4 client holds a file lock for read/write operations, the client may report an I/O error after CES-IP failover. (show details) | 5.0.5.12 | NFS |
IJ34927 | High Importance | logAssertFailed: exclLockWord == 0 (show details) | 5.0.5.11 | Core GPFS |
IJ35248 | High Importance | One thread is trying to initialize the AFM relationship with the remote site while another thread has initiated an unlink on the same fileset. The relationship initialization thread does not give up, causing the unlink thread to wait forever and resulting in a deadlock. (show details) | 5.0.5.11 | AFM |
IJ35257 | Suggested | Eviction tries to create a pruned list file in the current working directory instead of putting it into the internal IBM Spectrum Scale directories on /var. If the working directory is read-only, the eviction command fails with a "Could not open file" error. (show details) | 5.0.5.11 | AFM |
IJ35258 | Suggested | Reading an evicted file in a snapshot at the AFM cache site should always fail with an EIO error. (show details) | 5.0.5.11 | AFM |
IJ35259 | High Importance | mmrepquota command failure with message size too big due to empty quota entries. (show details) | 5.0.5.11 | Quotas |
IJ35285 | Suggested | Running the mmfsck command with --status-report fails with the error "Option '-N' is incorrect." if the defaultHelperNodes configuration parameter is set on the cluster. (show details) | 5.0.5.11 | FSCK |
IJ35440 | High Importance | Drives on an ESS 3000 may not show up after a boot or reboot of a canister. You can detect these errors using lspci -s 0x87 | grep DpcSta | grep Trigger+ or lspci -s 0x3c | grep DpcSta | grep Trigger+ (show details) | 5.0.5.11 | ESS, GNR |
IJ35441 | Critical | When offline fsck is run in repair mode (-y) on a file system with an inode size of less than 4K, the following assert is possible: "logAssertFailed: (dacIb[i] == (*DiskAddr::invalidDiskAddrP)) || (dacIb[i] == (*DiskAddr::dittoDiskAddrP)) || (dacIb[i] == (*DiskAddr::cdittoDiskAddrP)) || (dacIb[i] == (*DiskAddr::invalidZDiskAddrP)) || (dacIb[i] == (*DiskAddr::brokenZDiskAddrP)) || (dacIb[i] == (*DiskAddr::brokenDiskAddrP))" (show details) | 5.0.5.11 | FSCK |
IJ33526 | Suggested | A mechanism is needed so that, when mmfsd receives a fatal signal (SIGSEGV, etc.), the signal handler either completes in finite time or the daemon is forcibly terminated. Lack of such a mechanism can result in a deadlock. (show details) | 5.0.5.11 | Core GPFS |
IJ35535 | High Importance | Race condition during daemon shutdown could lead to kernel crash if DIO workload is running. (show details) | 5.0.5.11 | Core GPFS |
IJ35687 | High Importance | If a sufficiently large number of inodes are deleted on a node, then it is possible for the background deletion process to miss processing some of the inodes. (show details) | 5.0.5.11 | Core GPFS |
IJ34384 | Critical | Online fsck can report lost blocks that are false positives. Repairing this can allow the block to be used by multiple files at the same time causing corruption. (show details) | 5.0.5.11 | Core GPFS |
IJ35688 | Suggested | Copying an inode block which contains a bad deleted inode could trigger a SIGFPE signal and then crash the mmfsd daemon. (show details) | 5.0.5.11 | Snapshots |
IJ35689 | Suggested | ACLs are changed when running AFM failover in SW (single-writer) mode. (show details) | 5.0.5.11 | AFM |
IJ35919 | Critical | GPFS API calls from 32bit application fail on SLES15 SP3. (show details) | 5.0.5.11 | GPFS API |
IJ35921 | Suggested | IBM Spectrum Scale ships several ILM samples. One of them is the mmfind tool; to use it, findUtil_processOutputFile.c needs to be compiled, but the compilation of findUtil_processOutputFile.c fails on some Linux distros. (show details) | 5.0.5.11 | Admin commands |
IJ34331 | Suggested | When the mmchmgr command is used to assign a new file system manager, it could fail with a "No log available" message after the file system panics with a "No log available" error. This can happen if the file system is not externally mounted on any node. (show details) | 5.0.5.10 | Core GPFS |
IJ34346 | High Importance | If FIPS is enabled, call home uploads fail; manual call home uploads crash with an error, mentioning FIPS. (show details) | 5.0.5.10 | Call home |
IJ34351 | High Importance | On a cluster with two quorum nodes and tiebreaker disks, an unexpected quorum loss can be seen on the challenger node when the current cluster manager sees an mmshutdown (or node reboot). (show details) | 5.0.5.10 | Cluster Manager |
IJ34354 | High Importance | When an application reads with an I/O size that is a multiple of the GPFS block, prefetching doesn't start until the application issues a second read request, unless the read starts at the beginning of the file or prefetchAggressiveness is set to prefetchOnFirstAccess. This can cause slow read performance when the read I/O size is very large. (show details) | 5.0.5.10 | Core GPFS |
IJ34355 | High Importance | mmkeyserv client register, deregister or rkm change command will fail if the new RKM.conf contains expired certificates. (show details) | 5.0.5.10 | Admin commands, Encryption |
IJ34356 | High Importance | When a file system has a high number of block allocation regions, the processing of the allocation manager RPC could be slower than expected. (show details) | 5.0.5.10 | Core GPFS |
IJ34357 | High Importance | With thousands of client nodes mounted in the file system, adding some more disks serviced by ESS 3000 nodes can cause long waiters trying to get NSD disk information on each client node. (show details) | 5.0.5.10 | Admin commands |
IJ34381 | Suggested | The timestamps displayed by "mmdiag --iohist" on Windows nodes may show incorrect values, especially for the decimal part of the seconds. This may also cause misreporting of the duration of the affected I/O operations. (show details) | 5.0.5.10 | Admin commands |
IJ34391 | Critical | Running online fsck in repair mode (-o -y) can cause it to detect and repair false positive lost blocks (i.e., blocks that are assigned to files) and mark them as free, which can lead to duplicate block corruption. (show details) | 5.0.5.10 | Online FSCK |
IJ34783 | Suggested | Kernel assert: Signal 11 at SharedHashTab::htInit (show details) | 5.0.5.10 | Core GPFS |
IJ34784 | Suggested | File data not synced after a recovery (show details) | 5.0.5.10 | AFM |
IJ34785 | Suggested | File data not synced after a recovery (show details) | 5.0.5.10 | AFM |
IJ34805 | High Importance | Assert: SGNotQuiesced sgmrpc.C (show details) | 5.0.5.10 | Core GPFS |
IJ34609 | High Importance | Deadlock while queueing PIO read if there is one active gateway node and home becomes unresponsive. (show details) | 5.0.5.10 | AFM |
IJ34822 | Suggested | When multiple nodes are creating files in the same directory, creates can slow down during recovery. (show details) | 5.0.5.10 | Core GPFS |
IJ34813 | Critical | A hard lockup between two pemsmod kernel threads can panic the kernel. A kernel panic means system downtime and possibly quorum loss for the customer. The stack trace in vmcore-dmesg.txt will contain something like: [88432.803601] CPU: 27 PID: 14563 Comm: pemsRollUpQueue Kdump: loaded Tainted: G (show details) | 5.0.5.10 | ESS, GNR |
IJ34943 | High Importance | AFM gateway node crashes if the home is not responding while mounting the fileset target path. (show details) | 5.0.5.10 | AFM |
IJ33370 | High Importance | If the disks for a file system are not ready to be used yet and the command "mmfsadm dump deferreddeletions" is run at the same time, the command will fail with the side effect of causing a long waiter 'waiting for SG cleanup' when the file system is deleted and recreated. (show details) | 5.0.5.9 | Core GPFS |
IJ33371 | High Importance | GPFS allows NSD names of up to 255 characters, and no rule requires a name to contain an alphabetic character. NSD names consisting entirely of digits can be a problem if they are long enough: with long all-digit names, two NSDs can incorrectly be identified as the same NSD. (show details) | 5.0.5.9 | Admin commands |
IJ33372 | Critical | During a reconnect in the middle of a write operation, the following error may be reported: 2021-03-30_12:59:35.050-0400: [W] Encountered first checksum error on network I/O from NSD Client 10.10.10.10 (show details) | 5.0.5.9 | Core GPFS |
IJ33386 | Suggested | Command: err 46: tsunlinkfileset | 5.0.5.9 | Filesets |
IJ33392 | High Importance | With the introduction of 5-level page tables, supported by Intel's Ice Lake processor generation, user space memory is expanded by a factor of 512. This changed the kernel base address, and as a result GPFS asserts with the message "logAssertFailed: (UIntPtr)(vmallocStart)" while validating kernel addresses. (show details) | 5.0.5.9 | Core GPFS |
IJ33393 | High Importance | When users run the mmlsfileset command, it randomly omits the junction paths of some filesets. (show details) | 5.0.5.9 | Filesets |
IJ33394 | Suggested | Assert "(verify == 0) || (ofP == __null) || (ofP->sgP == __null) || ofP->isRoSnap() || (ofP->metadata.getInodeStatus() != 1) || (!ofP->sgP->isFileIncludedInSnapshot(ofP->getInodeNum(), ofP->getSnapId(), getInodeStatus())) || (ofP->assertInodeWasCopiedToPrevSnapshot()) || (ofP->isBeingRestriped() || ofP->beenRestriped)". (show details) | 5.0.5.9 | Core GPFS |
IJ33410 | Suggested | --safe-limit option is ignored when eviction is invoked manually using the "mmafmctl device evict" command. (show details) | 5.0.5.9 | AFM |
IJ33627 | Suggested | When a thread performing shutdown and a thread initiating startup run concurrently, a kernel crash can result. (show details) | 5.0.5.9 | Core GPFS |
IJ33680 | Suggested | If a Linux node is overloaded and a thread cannot be scheduled quickly, a kernel panic can result: RIP list_del_entry_valid.cold (show details) | 5.0.5.9 | Core GPFS |
IJ33702 | High Importance | cNFS does not work on RHEL8.x. This is due to a change in the pidof command in RHEL8. (show details) | 5.0.5.9 | cNFS |
IJ33704 | Suggested | AFM prefetch uses a character count to read the actual mount path from a given directory path. If a directory mount matches the same characters up to that count, prefetch starts processing it successfully. (show details) | 5.0.5.9 | AFM |
IJ33715 | Suggested | GPFS has fileset-level permissions which can deny setting the mode or EAs on fileset entities, depending on which mode the operation targets. AFM does not consider this flag on the fileset, so we end up getting E_PERM from the home, which causes the queue to stall. The normal queue is fine; it is mostly the recovery or resync queue that hits this. (show details) | 5.0.5.9 | AFM |
IJ33740 | High Importance | AFM Prefetch is not generating the prefetch end callback event registered through the afmPrepopEnd event. (show details) | 5.0.5.9 | AFM |
IJ33741 | High Importance | The automatic restart of NFS (remedy action) is blocked by an open unmounted_fs_check event which is not relevant for NFS/SMB exports. (show details) | 5.0.5.9 | System health |
IJ33759 | High Importance | The Mellanox firmware manager was called frequently (around every minute) by the system health monitor. That caused a high CPU load. (show details) | 5.0.5.9 | System health |
IJ33778 | Suggested | The RAS event dir_sharedroot_perm_problem was sometimes raised by mmhealth without cause; even when justified, the event description does not say what is wrong with the permissions or which permissions should be provided. (show details) | 5.0.5.9 | System health |
IJ33860 | High Importance | cNFS does not work on RHEL8.x. This is due to a change in the pidof command in RHEL8. (show details) | 5.0.5.9 | cNFS |
IJ33861 | Suggested | If a file system is set to maintenance mode, it is listed as 'SUSPENDED', but only an 'unmounted_fs_check' event is shown as the reason. It should say 'maintenance state' instead. (show details) | 5.0.5.9 | System health |
IJ33862 | Suggested | Ganesha fails to open files when over 1 million files are open. (show details) | 5.0.5.9 | CES NFS |
IJ32972 | Suggested | Manual procedure to decommission a DataNode is not supported. (show details) | 5.0.5.9 | HDFS Transparency |
IJ31047 | High Importance | Assertion (!OWNED_BY_CALLER(lockWordCopy, lockWordCopy) Failure at line 1275 in file dSynch.C when accessing snapshot files (show details) | 5.0.5.8 | Snapshots |
IJ32501 | High Importance | When getting the stats of a file, users could run into the assert: "Assert exp((verify == 0) || (ofP == __null) || (ofP->sgP == __null) || ofP->isRoSnap() || (ofP->metadata.getInodeStatus() != 1) || !ofP->sgP->isFileIncludedInSnapshot(ofP->getInodeNum(), ofP->getSnapId(), getInodeStatus())) || (ofP->assertInodeWasCopiedToPrevSnapshot()) || (ofP->isBeingRestriped() || ofP->beenRestriped)" if there are writes to the same file from other nodes. (show details) | 5.0.5.8 | gpfs_statlite API |
IJ32365 | Critical | AFM prefetch fails with "too many open files" error. (show details) | 5.0.5.8 | AFM |
IJ32345 | Suggested | The systemhealth monitor did not detect all paths for RDMA support (the libibverbs.so library) on Ubuntu machines and therefore reports an "ib_rdma_libs_wrong_path" issue. (show details) | 5.0.5.8 | System health |
IJ32361 | Suggested | After converting a legacy recovery group to an mmvdisk-managed recovery group, poor write performance was observed from an application, and the GPFS daemon did not come up on some nodes because of an OOM issue. (show details) | 5.0.5.8 | ESS, GNR |
IJ32375 | Critical | Application performance degradation while running on AFM filesets. (show details) | 5.0.5.8 | AFM, AFM DR |
IJ32503 | Critical | The GPFS daemon could assert with Assert exp(start + offsetToRef(elen) <= dhP->hashTabRef) when operating on a corrupted directory block. The assert also prevents repairs using mmfsck. (show details) | 5.0.5.8 | Core GPFS |
IJ32504 | HIPER | AFM recovery may incorrectly delete the files at home or secondary if there are any network issues while doing the home readdir. (show details) | 5.0.5.8 | AFM, AFM DR |
IJ32581 | High Importance | When doing preallocation and writes (e.g., Spectrum Protect Plus copy restore), the block usage of the file system is more than the total data size of these files. (show details) | 5.0.5.8 | Disk space preallocation of files |
IJ32601 | High Importance | mmhealth reports degraded network with reason "ib_rdma_nic_unrecognized" even though all RDMA ports are operational. (show details) | 5.0.5.8 | RDMA |
IJ32653 | High Importance | AFM prefetch fails with error 238 if the prefetch list file contains symlinks and if their target paths do not exist as part of the same fileset. (show details) | 5.0.5.8 | AFM |
IJ32796 | High Importance | Operations that require allocation of full metadata blocks are affected. Examples: expanding the number of allocated inodes; creating a new independent fileset. (show details) | 5.0.5.8 | Core GPFS |
IJ32797 | Suggested | The Linux fallocate(2) API doesn't work correctly on Spectrum Scale file systems when punching a hole beyond the end of file. (show details) | 5.0.5.8 | fallocate(2) |
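For reference, the fallocate(2) case in question can be reproduced with a few lines of C. A minimal sketch, assuming a hypothetical file path on a Spectrum Scale mount; FALLOC_FL_PUNCH_HOLE must be combined with FALLOC_FL_KEEP_SIZE, so the punched range may cross EOF without changing the file size:

```c
/* Sketch: punch a hole whose range extends beyond end-of-file. The
 * call should succeed and leave st_size unchanged. */
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void)
{
    struct stat st;
    int fd = open("/gpfs/fs0/holefile", O_CREAT | O_RDWR, 0644);
    if (fd < 0)
        return 1;
    if (ftruncate(fd, 8192) != 0)          /* 8 KiB file */
        return 1;

    /* Punch 8 KiB starting at 4 KiB: the second half of the range
     * lies beyond EOF. KEEP_SIZE means the size must stay 8192. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  4096, 8192) != 0)
        perror("fallocate");

    if (fstat(fd, &st) == 0)
        printf("size after punch: %lld (expected 8192)\n",
               (long long)st.st_size);
    close(fd);
    return 0;
}
```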
IJ32813 | High Importance | Issuing an "mmchnode --daemon-interface" command attempts to change the cluster configuration repository (CCR). When this mmchnode is issued from a Windows node, CCR gets committed with invalid IPv4 information, rendering the cluster in a non-working state. (show details) | 5.0.5.8 | CCR |
IJ32814 | Suggested | Offline fsck is not able to repair all corruptions when using the option of applying a patch file (i.e., mmfsck FSchk -v --patch-file path-to-write-patchfile --patch). When repairing corruption by applying a patch file, the fsck output shows messages like the following, indicating the issue: "Invalid BlockType Inode. Skipping patch." (show details) | 5.0.5.8 | FSCK |
IJ32859 | Suggested | When the mmdf command is run from a directory where the current working directory has become stale (directory was deleted after going to it), the command states it was run from an invalid directory. (show details) | 5.0.5.8 | Core GPFS |
IJ33000 | High Importance | In the current implementation of eviction on a file, the eviction program first acquires a DMAPI lock on the file and then punches a hole in it. The program can be terminated at any point without the DMAPI lock being released, causing a lock leak; a later DMAPI lock acquisition on the file can then deadlock, and the only way to recover is to restart mmfsd. (show details) | 5.0.5.8 | AFM |
IJ32892 | High Importance | On an AIX node, on some occasions, such as the /var file system becoming full, mmfsd is unable to run child processes, resulting in different failures depending on the process that mmfsd attempts to run. Operations that have been seen to fail include mmadddisk and mmauth. Once the problem is triggered, it persists until the mmfsd daemon is restarted; if the problem was initiated by the /var file system getting full, freeing up space on that file system is not enough to solve the problem. An indication that the problem is taking place is in the output of the command /usr/lpp/mmfs/bin/tslsfs nonexistent_FS (that is, passing the name of a nonexistent file system as the parameter). On a system where the problem is occurring, the output will be "mmcommon getEFOptions nonexistent_FS failed. Return code 1.", while on a system without the problem, the output will be "mmcommon: File system nonexistent_FS is not known to the GPFS cluster." (show details) | 5.0.5.8 | Core GPFS |
IJ32893 | High Importance | When a dependent fileset is created and linked under an AFM independent fileset, ACLs from the home dependent fileset are not fetched and set at the cache dependent fileset. This happens only for the dependent fileset root path. (show details) | 5.0.5.8 | AFM |
IJ32906 | High Importance | DEADLOCK PROBECLUSTERTHREAD WAITING FOR SG CLEANUP (show details) | 5.0.5.8 | File audit logging, Watch folder |
IJ32929 | Suggested | The mmfs.log may be filled with "netstat: not found" messages on systems running SLES 15. This is the result of running the mmdiag --network command, either explicitly or through the mmhealth monitoring service, which uses the netstat command. (show details) | 5.0.5.8 | GUI, System health |
IJ30160 | High Importance | When mmbackup or tsapolicy is called to scan files, it could report "no such file or directory" for existing files. (show details) | 5.0.5.7 | mmbackup, tsapolicy, GPFS API, DMAPI, AFM |
IJ30308 | Suggested | mmcrfs fails to create file systems when the cluster is configured with minQuorumNodes greater than one and tiebreakerDisks are in use. (show details) | 5.0.5.7 | Admin commands |
IJ31134 | High Importance | mmbackup does not honor --max-backup-size in a snapshot backup. (show details) | 5.0.5.7 | mmbackup |
IJ30673 | High Importance | When the aioSyncDelay config is enabled, the buffer steal and the AIO writes that need to be done as buffered I/O may race with each other and cause the log assert isSGPanicked in clearBuffer. (show details) | 5.0.5.7 | Core GPFS |
IJ30700 | Suggested | GPFS command reports incorrect default for nsdRAIDMaxRecoveryRetries. (show details) | 5.0.5.7 | Admin commands |
IJ31060 | High Importance | State of Physical disk shown as "unknown" in mmhealth and GUI for ECE. (show details) | 5.0.5.7 | System health, GUI |
IJ31785 | Suggested | mmhealth has issues with high inode consumption. (show details) | 5.0.5.7 | System health |
IJ31208 | High Importance | Daemon assert: (ofP == NULL) || (getPseudoIbdP() == ibdP) || (ibdP->assignedIndDA.isALLOC()) in file Metadata.h, resulting in mmfsd daemon crash. (show details) | 5.0.5.7 | Core GPFS |
IJ30797 | High Importance | The GPFS daemon could fail with logAssertFailed: getDeEntType() == detUnlucky when reading a directory block that contains unexpected data due to corruption. (show details) | 5.0.5.7 | Core GPFS |
IJ32008 | High Importance | The free space statistics reported through the "df" command or the statfs API are very out of date on client nodes. (show details) | 5.0.5.7 | Core GPFS |
IJ29239 | High Importance | logAssertFailed: ofP->mnodeStatusIs(0x4) (show details) | 5.0.5.7 | Core GPFS |
IJ31851 | Suggested | While running offline fsck the node asserts with signal 11 when checking log files. (show details) | 5.0.5.7 | FSCK |
IJ31852 | High Importance | Assert exp(isValidSocket(sock)) in line 2722 of file thread.C (show details) | 5.0.5.7 | Core GPFS |
IJ32009 | High Importance | GNR rebalance unable to complete after many days. (show details) | 5.0.5.7 | ESS, ECE, GNR |
IJ31853 | Suggested | The provided improvements result in more robust functionality of the mm command interface. (show details) | 5.0.5.7 | Core GPFS |
IJ31880 | High Importance | GPFS daemon assert: retryCount <= 300 (show details) | 5.0.5.7 | Core GPFS |
IJ29447 | High Importance | When performing I/O on a very large file, contention for InodeCacheObjMutex could occur as the number of buffers for the file increases. This is more likely to happen on file systems with smaller block sizes. (show details) | 5.0.5.7 | Core GPFS |
IJ31902 | High Importance | There is a small possibility for both replicas to be placed in the same failure group when there is a disk configuration change and one failure group is low on free space. (show details) | 5.0.5.7 | Core GPFS |
IJ31927 | Critical | GPFS daemon could fail unexpectedly with assert: commitRanges[i].valid || commitRanges[i].numBytes <= lfP->sgP->getWriteCacheThreshold(). This could happen after mmchfs command was issued to reduce write cache threshold while applications are actively writing to the file system. (show details) | 5.0.5.7 | HAWC |
IJ32035 | Critical | Assert in mmfsd "Signal 6 at verbs::parseConfigVerbsPorts, at verbsInit.C:4407", resulting in a Spectrum Scale crash at start up. (show details) | 5.0.5.7 | RDMA |
IJ32038 | High Importance | There appears to be an issue at the systemd layer that causes the startup service to fail with a connection timeout during reboot. If auto load is set to yes, GPFS may not be able to start up, or it may get stuck waiting for the environment to be initialized. (show details) | 5.0.5.7 | GPFS startup, CCR, systemd |
IJ30432 | High Importance | When mmdelnode is issued against a node whose mmfsd daemon is still up, several of the nodes in the cluster can fail with messages such as the following: [E] Deleted node 169.28.113.36 (nodeX) is still up. [E] Node 169.28.113.36 (nodeX) has been deleted from the cluster (show details) | 5.0.5.6 | Cluster Membership |
IJ30346 | Suggested | Some processes may not be woken up as they should during a cluster manager change. That might lead to potential deadlocks. (show details) | 5.0.5.6 | Core GPFS |
IJ30393 | High Importance | The GPFS daemon could fail with logAssertFailed: fromNode != regP->owner. This could occur when a file system's disk configuration is changed just as a new file system manager is taking over. (show details) | 5.0.5.6 | Core GPFS |
IJ30402 | High Importance | A "Disk in use" error occurs when using unpartitioned DASD devices: DASD '/dev/dasdk' is in use. Unmount it first! mmcrnsd: Unexpected error from fdasd -a /dev/dasd. Return code: 1 mmcrnsd: [E] Unable to partition DASD device /dev/disk/by-path/ccw-0.0.0500 mmcrnsd: Failed while processing disk stanza on node node01.abc.de %nsd: device=/dev/disk/by-path/ccw-0.0.0500 nsd=scale_data01 servers=node01.abc.de usage=dataAndMetadata (show details) | 5.0.5.6 | Installation toolkit |
IJ30408 | Suggested | AIO operations on encrypted files are handled as buffered I/O, further decreasing the performance of the AIO operation in addition to the cryptographic overhead introduced by the encryption of files in the file system. (show details) | 5.0.5.6 | Encryption |
IJ30451 | Suggested | When a user starts the mmrestripefile command against a big file with the -b option, it could take a long time (for example, more than 20 minutes) to return, but no data movement is seen between disks. This is because the big file is already balanced. (show details) | 5.0.5.6 | mmrestripefile command |
IJ30409 | Suggested | Kernel v4.7 changed the inode ACL caching mechanism, and GPFS (5.0.5.2+, 4.2.3.23+) does not adapt to the new kernel behavior. The following two typical issues are observed: 1. A normal user can access a file, and root removes the user's access privilege with the chmod command => the user can still access the file. 2. A normal user cannot access a file, and root grants the user access privilege with the chmod command => the user still cannot access the file. (show details) | 5.0.5.6 | Core GPFS |
IJ30458 | Suggested | skipRecall config does not work. (show details) | 5.0.5.6 | DMAPI |
IJ30427 | Suggested | The mmfs.log shows several "sdrServ: Communication error" messages. (show details) | 5.0.5.6 | System health |
IJ30461 | High Importance | mmbackup could back up files unnecessarily after a failure. (show details) | 5.0.5.6 | mmbackup |
IJ30462 | High Importance | Memory leak on file system manager node during quota revoke storm (show details) | 5.0.5.6 | Quotas |
IJ30463 | Medium Importance | While migrating a file to the cloud, the gpfs daemon might hit a signal in StripeGroup::decNumAccessRights() (show details) | 5.0.5.6 | TCT |
IJ30429 | High Importance | mmfsd crashed due to signal 11 when verifying the file system descriptor. (show details) | 5.0.5.6 | Core GPFS |
IJ30466 | Suggested | mmsmb exportacl list doesn't show the "@" of the SMB share name. (show details) | 5.0.5.6 | SMB |
IJ30397 | High Importance | mmvdisk throws an exception for a list operation when the daemon node name is not identical to the admin node name. (show details) | 5.0.5.6 | GNR, ESS |
IJ30465 | High Importance | Assert failure luEnclosureSlotP == __null (show details) | 5.0.5.6 | GNR, ESS |
IJ30352 | Critical | logAssertFailed (*respPP != __null) cacheops.C (show details) | 5.0.5.6 | AFM |
IJ30493 | Suggested | The administrator is unable to change the page pool setting on the GNR recovery group server. The problem is seen only on recovery groups not managed by mmvdisk. The mmchconfig command will fail, and the following error message is displayed: The --force-rg-server flag must be used to change the pagepool (show details) | 5.0.5.6 | GNR, ESS |
IJ30143 | High Importance | After a dm_punch_hole call, dm_get_allocinfo could return improper results for the data block allocation information. (show details) | 5.0.5.6 | DMAPI |
IJ30621 | Suggested | Running "mmces events list" prints many trailing whitespace characters (empty spaces) to stdout unnecessarily. (show details) | 5.0.5.6 | CES |
IJ30634 | High Importance | An RPC message could be handled twice when a TCP reconnect happens. This could cause a log assertion or an FS struct error, or be silently handled, depending on the type of RPC. (show details) | 5.0.5.6 | Core GPFS |
IJ30785 | Suggested | If the mmvdisk command takes more than 60 seconds to complete, mmhealth reports all pdisks as vanished. On larger systems with many I/O nodes and pdisks, a 60-second timeout is not enough. (show details) | 5.0.5.6 | System health, GUI |
IJ30864 | High Importance | In a mixed AIX/Linux cluster, the mmbackup command could fail with gskit/ssl errors after upgrading IBM Spectrum Protect code to 8.1.11, which introduced a new rpm for gskit 8.0-55.17 that is not compatible with the gpfs.gskit version. (show details) | 5.0.5.6 | mmbackup |
IJ29444 | Suggested | After Node-B successfully reestablishes a broken connection to Node-A, Node-A still shows the reconnect_start state (DEGRADED). (show details) | 5.0.5.6 | System health |
IJ30878 | HIPER | AFM gateway node crashes if the home is not responding and multiple threads are trying to read the same file. (show details) | 5.0.5.6 | AFM |
IJ30973 | HIPER | AFM gateway node asserts if the home is not responding and multiple threads are trying to read the same file. (show details) | 5.0.5.6 | AFM |
IJ30675 | Critical | Daemon (AFM) assert goes off: getReqP->r_length <= ksP->r_bufSize (show details) | 5.0.5.6 | AFM |
IJ30976 | HIPER | Revalidation on an AFM fileset fails on a RHEL 8.3 gateway node, and home changes may not be detected, causing data or metadata mismatch between cache and home. (show details) | 5.0.5.6 | AFM, AFM DR |
IJ30778 | Critical | With async refresh enabled, file system quiesce is blocked during the remote operation and it might result in a deadlock if the remote is not responding. (show details) | 5.0.5.6 | AFM |
IJ29433 | Medium Importance | The systemhealth monitor reported a gpfs_down event and triggered a failover even though the system was fine. (show details) | 5.0.5.5 | System health |
IJ29517 | Critical | When an uncached file is renamed in the local-updates mode, the file is not copied to the previous snapshot causing the setInodeDirtyAndVerify assert. (show details) | 5.0.5.5 | AFM |
IJ29434 | High Importance | While the GPFS daemon is shutting down, there is a chance that a specific trace will be logged, and it may crash the kernel. (show details) | 5.0.5.5 | Core GPFS |
IJ29530 | Suggested | IBM Spectrum Scale has a core dump triggered in dAssignSharedBufferSpace() due to a segmentation fault hit by the mmfsd or lxtrace daemon. (show details) | 5.0.5.5 | Trace |
IJ29435 | High Importance | On zLinux, while running an mmap workload with traceIoData configuration enabled, the trace code may trigger a page fault and cause the kernel to crash. (show details) | 5.0.5.5 | mmap |
IJ25754 | High Importance | Quota clients request quota shares based on the workload, and most of the time the quota shares given to an active client are much larger than the previously predefined amount (e.g., 20 file system blocks). Unused or excess quota shares are returned to the quota manager periodically. On the quota manager side, when the quota usage exceeds the established soft quota limits, the grace period is triggered; at this event, quota shares are reclaimed and the quota share distribution falls back to a more conservative fashion (based on the predetermined amount). In certain workloads, when partial quota shares are returned to the manager along with usage updates, and this triggers the soft quota limit exceeded event, some quota shares are lost due to mismanagement of quota shares between the client and the manager, leading to a permanent loss of quota shares that is correctable by using the mmcheckquota command. (show details) | 5.0.5.5 | Quotas |
IJ29453 | High Importance | In cases with a small pagepool size and a large file system block size, GPFS may wait for page reservation unnecessarily because GPFS tends to reserve more pages than necessary. (show details) | 5.0.5.5 | mmap |
IJ29535 | Critical | After mmimgrestore, the mmfsd could assert when handling the mmlsfileset command for a dependent fileset: logAssertFailed: fsOfP->getDirLayoutP() != __null (show details) | 5.0.5.5 | DMAPI |
IJ29490 | High Importance | Under heavy workload (especially with file creation/deletion involved) with quota function enabled, some race issues are exposed such that the filesetId is not handled correctly, causing a GPFS daemon assert. (show details) | 5.0.5.5 | Quotas |
IJ29495 | High Importance | mmchnode fails when more nodes than the current number of quorum nodes become quorum nodes again. (show details) | 5.0.5.5 | mmchnode --quorum, CCR |
IJ29502 | Suggested | If the cluster is configured with a separate daemon and admin interfaces, the -Y output of mmgetstate only shows the admin node name. (show details) | 5.0.5.5 | Admin commands |
IJ29678 | High Importance | logAssertFailed: !"Cleanup hit contended Fileset lock." (show details) | 5.0.5.5 | Filesets |
IJ29679 | Suggested | The mmkeyserv command displays the latest expiration date from the KMIPT certificate chain. It should display the expiration date of the end-entity certificate. (show details) | 5.0.5.5 | Admin commands, Encryption |
IJ29682 | Critical | When truncating a migrated immutable file with DMAPI interfaces, the data of the file becomes zero, although the file is immutable. (show details) | 5.0.5.5 | Immutable and append-only files |
IJ29514 | Critical | File system manager could assert with exp(isStoragePoolIdValid(poolId)) during log recovery if a node fails shortly after running mmdeldisk. (show details) | 5.0.5.5 | Core GPFS |
IJ29515 | Suggested | Incorrect quota check result due to OpenFile reuse/updateShadowTab (show details) | 5.0.5.5 | Quotas |
IJ29686 | High Importance | On clusters with minReleaseLevel at 5.0.1, with mixed-version nodes from 5.0.1.X through 5.0.5.X and gateway nodes at level 5.0.5.X, the newer-level gateway nodes cannot coexist with the older-level nodes, causing repeated recovery failures. (show details) | 5.0.5.5 | AFM |
IJ29719 | Suggested | While a file is being read, it can be evicted, and the checksum captured for the open file shows an inconsistency. (show details) | 5.0.5.5 | AFM |
IJ29533 | Suggested | AFM prefetch doesn't print how many files/inodes it has completed processing for queueing. This would help users understand the progress, because the actual queueing happens only when enough files/inodes have been accumulated, so it takes a long time before any progress is reported to the user. (show details) | 5.0.5.5 | AFM |
IJ29683 | Medium Importance | A NFS client might be blocked for a while after a failover before it continues I/O. (show details) | 5.0.5.5 | System health |
IJ29685 | Suggested | I/O hangs with mirrored disk while recovery group resigns repetitively due to vdisk fault tolerance exceeded. (show details) | 5.0.5.5 | GNR |
IJ29690 | Suggested | The systemhealth monitor reports file systems used with NFS/SMB exports as unmounted even when they are mounted and functional. (show details) | 5.0.5.5 | System health |
IJ29542 | Suggested | Several RAS events had inconsistent values in their SEVERITY and STATE. For instance, the event "network_bond_degraded", whose STATE is DEGRADED, has SEVERITY=INFO. As a result, related failures were not propagated properly. (show details) | 5.0.5.5 | System health, GUI |
IJ29910 | Suggested | In IBM Spectrum Scale Erasure Code Edition, it is possible for all of a server's pdisks (physical disks) to become missing, either due to network failure, node failure, or through a planned "node suspend" maintenance procedure. When this happens, the system will continue to function if there is sufficient remaining fault tolerance. However, smaller configurations with fewer ECE nodes are exposed to a race condition where pdisk state changes can interrupt a system-wide descriptor update, which causes the recovery group to resign. It is also possible to experience this problem with higher probability when using small ESS configurations, such as the GS1 or GS2 enclosures. For both ESS and ECE, a possible symptom may appear in the mmfs.log in this form, when a pdisk state change is quickly followed by a resign message claiming VCD write failures before the system fault tolerance is exceeded: 2020-12-01_19:01:36.696-0400: [D] Pdisk n004p005 of RG rg1 state changed from ok/00000.180 to missing/suspended/00050.180. 2020-12-01_19:01:36.697-0400: [E] Beginning to resign recovery group rg1 due to "VCD write failure", caller err 217 when "updating VCD: RGD" Note that a "VCD write failure" with err 217 is a generic message issued when fault tolerance is exceeded during critical system updates, but in this case the race condition resigns the system when only a handful of missing disks are found. (show details) | 5.0.5.5 | GNR, ESS |
IJ29916 | Suggested | When the file system is in panic on a quota client node, the outstanding quota share is not relinquished. The quota share is reported as an in-doubt value, and the shares can only be reclaimed by mmcheckquota. (show details) | 5.0.5.5 | Quotas |
IJ29919 | Suggested | Incorrect quota check results on small files with fragments (show details) | 5.0.5.5 | Quotas |
IJ28891 | Suggested | On kernels newer than 4.10, with file sizes that are a multiple of the page size, a false error is returned once the read offset reaches the file size. (show details) | 5.0.5.5 | Core GPFS |
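The expected read(2) semantics here are simple to state: once the offset reaches end-of-file, read() returns 0, not an error. A minimal sketch, assuming a hypothetical file whose size is an exact multiple of 4096:

```c
/* Sketch of the expected behavior this APAR restores: reading to
 * EOF must end with read() returning 0, including when the file
 * size is a page-size multiple. Path is illustrative. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];
    ssize_t n;
    int fd = open("/gpfs/fs0/page_multiple_file", O_RDONLY);
    if (fd < 0)
        return 1;

    while ((n = read(fd, buf, sizeof(buf))) > 0)
        ;                        /* consume the whole file */

    if (n < 0)
        perror("read at EOF");   /* the reported bug */
    else
        printf("clean EOF\n");   /* the expected result */
    close(fd);
    return 0;
}
```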
IJ29312 | High Importance | Node crash under massive parallel access to a file with NFS (show details) | 5.0.5.5 | kNFS |
IJ29826 | Critical | AFM gateway nodes run out of memory during resync. glibc is known to use as many arenas as 8 times the number of CPU threads a system has; this makes a multi-threaded program like AFM, which allocates memory for queues, use much more memory than actually needed. (show details) | 5.0.5.5 | AFM |
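For background on the glibc behavior described above (this is illustration, not the AFM fix itself): the arena count can be capped per process with mallopt or the MALLOC_ARENA_MAX environment variable, both standard glibc knobs. A minimal sketch:

```c
/* Sketch: cap glibc malloc arenas for this process. Fewer arenas
 * means threads share heaps (more lock contention, but much lower
 * address-space and resident-memory overhead). */
#include <malloc.h>
#include <stdio.h>

int main(void)
{
    /* mallopt returns 1 on success, 0 on error. Equivalent to
     * running the program with MALLOC_ARENA_MAX=4 in the env. */
    if (mallopt(M_ARENA_MAX, 4) == 0)
        fprintf(stderr, "mallopt(M_ARENA_MAX) failed\n");

    /* ... a multi-threaded allocation workload would run here ... */
    return 0;
}
```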
IJ29939 | Critical | Assertion 'exp(lfVersion != other.lfVersion || lfVersion == 0 || tailLsn == other.tailLsn)' in line 99 of file openlog.C (show details) | 5.0.5.5 | File IO Cluster manager election |
IJ29763 | Suggested | The tsfindinode utility incorrectly reports file path as not found for valid inodes. (show details) | 5.0.5.5 | tsfindinode |
IJ29209 | High Importance | logAssertFailed: ofP->isInodeValid() at mnUpdateInode when doing stat() or gpfs_statlite() (show details) | 5.0.5.5 | Core GPFS |
IJ30125 | Suggested | When there are many threads doing sync writes through the same file descriptor, contention on InodeCacheObjMutex between them could impact the performance of writes. (show details) | 5.0.5.5 | Sync writes |
IJ30076 | Suggested | Inodes not getting freed after user deleted them (show details) | 5.0.5.5 | Core GPFS |
IJ29780 | Suggested | mmvdisk --replace command results in message: Location XXX contains multiple disk devices. (show details) | 5.0.5.5 | GNR, ESS |
IJ29106 | High Importance | The IBM Spectrum Scale HDFS Transparency connector version 3.1.0-6 contains 2 NullPointerExceptions in the HDFS NameNode service. The application accessing the data is not impacted, but these exceptions are seen in the NameNode log file. (show details) | 5.0.5.5 | HDFS Connector |
IJ29133 | Suggested | The IBM Spectrum Scale HDFS Transparency connector version 3.1.0-6 modified the label for the open operation when the configuration is set to "Scale" for the ranger.enabled parameter. When retrieving the JMX stats, the open is reported as GetBlockLocations. (show details) | 5.0.5.5 | HDFS Connector |
IJ28470 | Suggested | The CLiC cryptographic engine used by IBM Spectrum Scale has been sunset. (show details) | 5.0.5.4 | Encryption |
IJ26212 | Suggested | mmdumpkthreads is stuck in zombie state. (show details) | 5.0.5.4 | System health |
IJ27098 | High Importance | mmfsd daemon asserting with Assert exp(cfg) in FSFlashDevice.C. (show details) | 5.0.5.4 | LROC |
IJ27087 | High Importance | When QoS throttling is in use and an application uses ionice with certain I/O priorities, it is possible not only for that application to experience degraded performance due to the throttling, but also for other file system operations to be delayed or to fail. (show details) | 5.0.5.4 | QoS |
IJ27923 | Suggested | When a user turns off the file system maintenance mode, the file system cannot be mounted. (show details) | 5.0.5.4 | Core GPFS |
IJ28498 | High Importance | Long waiters (show details) | 5.0.5.4 | Core GPFS |
IJ28505 | Suggested | Opening or closing parenthesis not being accepted in bind_dn password. (show details) | 5.0.5.4 | Authentication |
IJ28604 | Suggested | The file system unmount operation does not succeed because of open files. (show details) | 5.0.5.4 | AFM, AFM DR |
IJ28605 | Suggested | mmlsmount with the --report and -Y options may not take into account nodes which do not have the file system mounted. (show details) | 5.0.5.4 | Core GPFS |
IJ28606 | Suggested | mmhealth cluster show: faster heartbeat_missing (show details) | 5.0.5.4 | System health |
IJ28434 | Suggested | mmgetstate -Y format is showing a negative value (show details) | 5.0.5.4 | AFM, AFM DR |
IJ28611 | HIPER | AFM uses special control files to replicate ACLs, EAs and to check fileset mode at the home/secondary site. This special control file is not being recognized correctly thus affecting the EA and ACLs replication and fileset mode recognition. (show details) | 5.0.5.4 | AFM, AFM DR |
IJ28609 | Suggested | In mmcheckquota, when a quota entry is processed from a deleted fileset, the quota entry is correctly skipped, but this makes the mmcheckquota process exit with an error. (show details) | 5.0.5.4 | Core GPFS |
IJ28584 | HIPER | File and memory leaks in the kernel (show details) | 5.0.5.4 | AFM |
IJ27905 | Critical | Failback performance issues (show details) | 5.0.5.4 | CES |
IJ28184 | High Importance | mmfsd daemon assert going off: Assert exp(rmsgP != __null) in file llcomm.C, resulting in a daemon crash. (show details) | 5.0.5.4 | Core GPFS |
IJ27087 | High Importance | An application runs with an I/O priority that maps to an unsupported QoS class, which has an IOPS limitation of 1 IOPS, leading to I/Os being queued to wait for enough tokens to service the I/O operation. This causes long waiters. (show details) | 5.0.5.4 | QoS |
IJ28608 | High Importance | If the call home data collection process was interrupted because of a power loss, the following data collection of the same schedule will fail due to the directory already existing. (show details) | 5.0.5.4 | Call home |
IJ28626 | Suggested | Read-only offline fsck reports references to a down disk as corrupt, which is not correct behavior. (show details) | 5.0.5.4 | FSCK |
IJ27414 | High Importance | mmfsck --estimate-only panics at the end of fsck and a new stripe group manager disallows new mounts. (show details) | 5.0.5.4 | FSCK |
IJ28631 | High Importance | Given an IBM Spectrum Scale cluster with 'verbsRdmaCm' set to 'enable' and configured to use RDMA via RoCE, individual nodes will fail to establish a RDMA connection to other nodes when the IP addresses configured on the RDMA adapters include a non-link local IPv6 address. (show details) | 5.0.5.4 | RDMA |
IJ28610 | Suggested | While a node is trying to join a cluster, mmfsd start could encounter a null pointer dereference and crash with a signal 11, with a stack that looks like this: [D] #0: 0x0000559601506BCE RGMaster::getNodeFullDomainName(NodeAddr, char**) + 0xAE at ??:0 [D] #1: 0x000055960150CAA2 RGMaster::rgListServers(int, unsigned int) + 0x212 at ??:0 [D] #2: 0x000055960145F21C runTSLsRecoveryGroupV2(int, StripeGroup*, int, char**) + 0xA8C at ??:0 [D] #3: 0x0000559601460371 runTSLsRecoveryGroup(int, StripeGroup*, int, char**) + 0xB1 at ??:0 (show details) | 5.0.5.4 | GNR |
IJ28848 | Suggested | Cannot create an SMB share using UTF-8 characters through the CLI. (show details) | 5.0.5.4 | SMB |
IJ28801 | High Importance | A bad extended encryption attribute is present in a snapshot file, and an attempt is then made to delete that snapshot. (show details) | 5.0.5.4 | Snapshots, Encryption |
IJ28849 | Critical | On file system with HAWC enabled, data written to disk could be lost after a node failure when using the system call fdatasync() or Ganesha. (show details) | 5.0.5.4 | HAWC |
IJ28877 | High Importance | When file audit logging or watch folder is enabled on a file system, unmounting the file system might result in a waiter that will not clear. This may cause other commands to hang. (show details) | 5.0.5.4 | File audit logging, Watch folder |
IJ28889 | Critical | mmhealth does not work on AIX. (show details) | 5.0.5.4 | System health |
IJ29002 | Critical | If the default replication (-m or -r) setting for a file system is set to 1 and mmvdisk is used to add an additional vdisk set to the file system, an exception will be hit if the --failure-groups option is not used. (show details) | 5.0.5.4 | ESS, GNR |
IJ29004 | Suggested | The systemhealth monitor reports data and name nodes as down for the HadoopConService. In fact, both were running. (show details) | 5.0.5.4 | System health |
IJ28890 | Critical | AFM metadata prefetch does not handle hardlinks. (show details) | 5.0.5.4 | AFM |
IJ28897 | Suggested | When using --skip-inode-check option of offline fsck, it reports false positive extendedAcl corruption. (show details) | 5.0.5.4 | FSCK |
IJ29182 | Critical | The --metadata-only option hit the assert Assert exp(!"Assert on Structure Error") in prefetch. (show details) | 5.0.5.4 | AFM, AFM DR |
IJ29210 | High Importance | mmvdisk recovery group creation fails when creating log vdisks for a new recovery group in a cluster with preexisting recovery groups. An error message "Disk XXX is already registered for use by GPFS" will appear on the command console, and the recovery group creation will fail. Once the problem condition is hit, IBM support must be contacted to correct the conflicting cluster information. (show details) | 5.0.5.4 | Admin commands, ESS, GNR |
IJ29313 | Suggested | Running prefetch stats is failing with err 22. (show details) | 5.0.5.4 | AFM |
IJ29337 | Critical | GPFS maintains an EA (extended attribute) registry to verify EA priority. Due to an incorrect EA registry addition without an SG format version check, policy and inode scans might fail in a mixed-node cluster environment. This problem could occur while running policy or inode scans in a mixed-node environment with 5.0.5.2, 5.0.5.3, or 5.1.0.0 nodes and older-version nodes as the file system manager. (show details) | 5.0.5.4 | AFM, Core GPFS |
IJ28321 | Critical | logAssertFailed:isValidSocket(sock) line 2661 of file thread.C (show details) | 5.0.5.3 | Core GPFS |
IJ28305 | HIPER | When the fileset is in the stopped state (mmafmctl device stop -j fileset), modifying the file's metadata such as owner permissions and file times may not be replicated to the home/secondary site after the fileset is restarted (mmafmctl device start -j fileset). (show details) | 5.0.5.3 | AFM, AFM DR |
IJ28314 | Suggested | In a distributed IBM Spectrum Scale environment, repetitive node failures can result in the declustered array becoming stuck in transition. Long waiters may occur, and file system operations may become stalled. (show details) | 5.0.5.3 | GNR |
IJ28316 | Critical | AFM list file prefetch hangs. (show details) | 5.0.5.3 | AFM |
IJ28162 | HIPER | AFM replicating extended attributes and ACLs can cause resync/recovery performance issues. (show details) | 5.0.5.3 | AFM, AFM DR |
IJ28161 | Suggested | Changing the file system name via the mmchfs -W option does not work if the file system manager is on another node that is not a Linux node. This is due to the new device name not being created in /dev at the time the daemon opens the file system with the new device name. (show details) | 5.0.5.3 | Core GPFS |
IJ27709 | High Importance | When a node becomes a quorum node (via the mmchnode command), and this node was a quorum node sometime in the past with no mmfsd/mmsdrserv restart on it since, the mmchnode command will fail with output like "initialize (1, 'node-11', ('192.168.1.1', 1191)) failed (err 73) mmchnode: 6027-1639 Command failed. Examine previous error messages to determine cause." Also, an assertion of the following type occurs on the node which should become a quorum node (GPFS log): "ccrmmfsd.C:806: assertion 'nodeId == ccrNodeId'" (show details) | 5.0.5.3 | mmchnode, CCR |
IJ27801 | Suggested | The GPFS kernel module exports an ioctl interface used by the mmfsd daemon and some of the mm* commands. The provided improvements result in a more robust functionality of the kernel module. (show details) | 5.0.5.3 | Core GPFS |
IJ27929 | High Importance | When recycling the recovery group nodes, access to the recovery groups can be lost, even if the nodes are recycled one at a time. (show details) | 5.0.5.3 | GNR |
IJ27922 | Medium Importance | A race condition within the GNR fast write logging mechanism could lead to a thread incorrectly considering itself the candidate to start zone flushing operations while other threads have not yet completed the fast (short) writing operation in this zone. The resulting assert stopped the flushing operation. (show details) | 5.0.5.3 | GNR, ECE, ESS 3000 |
IJ27711 | High Importance | When a metanode receives many RPC requests from client nodes, it is possible for mutex contention to occur, which in turn can lead to high CPU usage. This could happen when many nodes share access to the same file/directory, such as the root directory of the file system. (show details) | 5.0.5.3 | Core GPFS |
IJ25651 | High Importance | GPFS shuts down with the following message "logAssertFailed: holdCount > 0" found in the GPFS log file (show details) | 5.0.5.3 | Core GPFS |
IJ26952 | High Importance | logAssertFailed: isValidSocket(sock): This assertion goes off when the sock file descriptor is too large (show details) | 5.0.5.3 | Core GPFS |
IJ26702 | Suggested | mmauth grant sets the local file system (which is remote for the cache) to read-only access. So when you access the fileset over the control file, it returns an E_ROFS error from the remote file system and logs irrelevant information. (show details) | 5.0.5.3 | AFM |
IJ27144 | High Importance | When reconfiguring Object protocol authentication using the mmuserauth command, the command may occasionally hang while waiting for systemctl to shut down a service. Looking at the process table may show systemctl waiting for the child process "pkttyagent" to complete. (show details) | 5.0.5.3 | Object |
IJ26697 | High Importance | When multiple applications on a single node perform readdir and lookups on the same directory in a loop, it could lead to token starvation on other nodes trying to perform rename/create operations on the same directory. This will show up as slow application performance on affected nodes. (show details) | 5.0.5.3 | Core GPFS |
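For illustration, a minimal sketch in C of the kind of workload that can trigger this contention; the directory path and loop shape are assumptions for the example, not taken from the APAR (run several copies on one node; stop with Ctrl-C):

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    char path[4096];
    for (;;) {                                     /* tight readdir+lookup loop */
        DIR *d = opendir("/gpfs/fs0/shared");      /* hypothetical shared dir */
        if (d == NULL) { perror("opendir"); return 1; }
        struct dirent *e;
        while ((e = readdir(d)) != NULL) {         /* readdir ... */
            struct stat st;
            snprintf(path, sizeof path, "/gpfs/fs0/shared/%s", e->d_name);
            stat(path, &st);                       /* ... plus a lookup per entry */
        }
        closedir(d);
    }
    return 0;
}
```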
IJ26830 | Critical | If a file is unlinked after opening it, fallocate(2) on that fd will fail with ENOENT. (show details) | 5.0.5.3 | Core GPFS |
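A minimal reproducer sketch for this scenario in C; the path is hypothetical and the expected-vs-buggy outcomes follow the APAR description above:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/gpfs/fs0/testfile", O_CREAT | O_RDWR, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Unlink while the fd is still open; the inode stays alive. */
    if (unlink("/gpfs/fs0/testfile") != 0) { perror("unlink"); return 1; }

    /* Preallocate 1 MiB; on affected levels this failed with ENOENT. */
    if (fallocate(fd, 0, 0, 1024 * 1024) != 0)
        perror("fallocate");   /* the bug: ENOENT instead of success */
    else
        puts("fallocate on unlinked fd succeeded");

    close(fd);
    return 0;
}
```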
IJ27150 | Suggested | Disabling enableIpv6 by using mmchconfig enableIpv6=no does not work. The command treats the no value like the yes value. (show details) | 5.0.5.3 | Admin commands |
IJ27241 | High Importance | On a Linux node defined to have the GPFS node role of snmp_collector, the GPFS control scripts might not properly determine if the GPFS Net-SNMP subagent is running. Consequently, multiple GPFS Net-SNMP subagent processes might be running at the same time. This might affect the ability of the snmp_collector node to respond to SNMP queries or send SNMP traps. Recurring error messages might be found in GPFS log files showing a GPFS Net-SNMP subagent is unable to register with the Net-SNMP agent (snmpd). (show details) | 5.0.5.3 | SNMP, Admin commands |
IJ27249 | Suggested | The ECE disk inventory utility that lists all the disk slots in the system could hit an exception when one of the LSI adapters doesn't have a disk. This may result in missing slot locations for the pdisks. The slot location is important for identifying the disk drive during disk replacement, so it may cause the disk replacement to fail. If multiple disks fail without being replaced, the data reliability of the ECE storage system may be at risk. (show details) | 5.0.5.3 | GNR, ECE |
IJ26972 | Suggested | Grafana bridge returns "- ERROR - Metric mynode|GPFSNSDFS|exfld|gpfs_nsdfs_bytes_read cannot be found. Please check if the corresponding sensor is configured - " (show details) | 5.0.5.3 | Core GPFS |
IJ27264 | High Importance | The check for mismatched NSDs in a storage pool may fail if the regular NSD was created with a pool designation different than the one used in the mmadddisk stanza file. (show details) | 5.0.5.3 | GNR, ESS |
IJ27236 | High Importance | In certain conditions, the monitoring code will not work properly resulting in an erroneous state being shown. (show details) | 5.0.5.3 | System health |
IJ27285 | Suggested | The remote mount is not responsive and setting the control file failed (-1). Due to this, it returns E_STALE and the lookup gets requeued with E_RESTART every time. (show details) | 5.0.5.3 | AFM |
IJ27339 | Suggested | Notification messages for TLS socket disconnects resulting from a peer's idle disconnect were printed in the mmfs log file and the sys log, creating confusion about whether there is a real problem. (show details) | 5.0.5.3 | Authentication |
IJ27366 | Critical | With parallel IO enabled, it is possible that all the afmMaxWorkerThreads are used and there are no threads to handle the parallel IO responses. (show details) | 5.0.5.3 | AFM, AFM DR |
IJ27375 | High Importance | Windows offline bit is set on the directory after AFM replication. (show details) | 5.0.5.3 | AFM |
IJ27003 | Critical | AFM sets incorrect creation time on symlinks during the migration. (show details) | 5.0.5.3 | AFM |
IJ27038 | Critical | AFM incorrectly sets secondary mode fileset permissions to primary mode fileset permissions during a resync operation. (show details) | 5.0.5.3 | AFM |
IJ27646 | Suggested | mmadquery fails with error - size limit exceeded (show details) | 5.0.5.3 | Authentication |
IJ27683 | Critical | With parallel IO enabled, the PIO response handler thread on MDS might deadlock due to an EIO error from the helper gateway node. (show details) | 5.0.5.3 | AFM, AFM DR |
IJ16663 | High Importance | When multiple applications on a single node perform readdir and lookups on the same directory in a loop, it could lead to token starvation on other nodes trying to operate on the same directory. This will show up as slow application performance on affected nodes. (show details) | 5.0.5.2 | Core GPFS |
IJ22152 | Suggested | There is no automatic method to generate the GPL installable package for customized RHEL release. (show details) | 5.0.5.2 | Core GPFS |
IJ26349 | Critical | For a DMAPI enabled file system, migrating files into an external storage pool may cause problems with snapshot files. In some cases, it might assert "okToIncreaseIndLevel". In another case when DMAPI is not enabled, adding extended attributes to a sparse file may trigger the same assert. (show details) | 5.0.5.2 | Snapshots, DMAPI |
IJ24387 | Suggested | For mmces related log files, set mode bits to 622. (show details) | 5.0.5.2 | CES |
IJ25803 | Critical | open(2) with O_NOATIME flag may not work as expected. (show details) | 5.0.5.2 | Core GPFS |
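For context, a minimal sketch in C of the expected O_NOATIME behavior that this APAR restores; the path is hypothetical (note O_NOATIME generally requires file ownership or CAP_FOWNER, and atime updates can be deferred, so this is an illustration rather than a strict test):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    struct stat before, after;
    char buf[4096];

    int fd = open("/gpfs/fs0/data.bin", O_RDONLY | O_NOATIME);
    if (fd < 0) { perror("open"); return 1; }

    fstat(fd, &before);
    read(fd, buf, sizeof buf);      /* should leave atime untouched */
    fstat(fd, &after);

    printf("atime %s\n", before.st_atime == after.st_atime
                         ? "unchanged (expected with O_NOATIME)"
                         : "changed (unexpected with O_NOATIME)");
    close(fd);
    return 0;
}
```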
IJ26355 | Suggested | Inode operations ->set_acl and ->get_acl are not supported in GPFS, and starting with kernel v3.14 some commands such as nfs4_setfacl may fail. (show details) | 5.0.5.2 | Core GPFS |
IJ26510 | Critical | GPFS daemon crashes and the file system gets unmounted. The GPFS daemon crashes because the daemon code hit an assert indicating "the allocated disk address is not expected". (show details) | 5.0.5.2 | Core GPFS |
IJ26341 | High Importance | GPFS Access Control Lists (ACL) can only store limited types of Access Control Entries (ACE), specifically plain Access-Allowed-ACE, Access-Denied-ACE, System-Audit-ACE, and System-Alarm-ACE. GPFS does not support storing of any of the Object-specific-ACEs corresponding to ACL_REVISION_DS. An attempt to set an ACL (containing the unsupported ACE types, such as the Object-specific-ACEs), can result in a kernel bugcheck. (show details) | 5.0.5.2 | Windows, ACLs |
IJ26342 | Critical | If incorrect arguments are passed to tschcarrier, a cleanup routine tries to remove some in-memory objects that were never created, and this leads to a segmentation fault. (show details) | 5.0.5.2 | GNR |
IJ26356 | Suggested | When a data management application receives a DMAPI postrename event, it fails to get the file handle for the renamed file with a "no such file" error. This is because Spectrum Scale is delivering a DMAPI postrename event before the Linux kernel updates its directory lookup cache for the file being renamed. (show details) | 5.0.5.2 | DMAPI |
IJ26022 | High Importance | After upgrading the Spectrum Scale version to 5.0.4.4 or later, hit this assert (logAssertFailed: status == ASumWriting) on the upgraded node when it takes over the file system manager role. (show details) | 5.0.5.2 | Core GPFS |
IJ26358 | Suggested | Eviction is failing to execute if there is a space character in the path name. (show details) | 5.0.5.2 | AFM |
IJ26423 | High Importance | When adding a vdisk set which contains multiple node classes to a file system, some node classes may be omitted. (show details) | 5.0.5.2 | GNR |
IJ26438 | High Importance | For pre-4.1 file systems, after the mmquotaoff command deactivates user/group/fileset quota, the old quota file will be deinstalled and converted to a normal file. If the system pool cannot contain the data, the old quota file will need to be moved from the system pool to a data pool. If the file system has DMAPI enabled (-z yes), the deinstallation process will encounter assert exp(context != unknownOp) in moveDataBetweenPools. (show details) | 5.0.5.2 | Quotas |
IJ26436 | Suggested | The Spectrum Scale mmbackup command translates include/exclude options into policy rules for backup candidate file lists. If the path name specified in the include/exclude option contains any white space, mmbackup translates it incorrectly because space is used as the default delimiter in "dsmc query inclexcl" output. (show details) | 5.0.5.2 | mmbackup |
IJ26348 | High Importance | A race condition between the RG master resign/recovery and the mdi operation on the worker side could lead to a bug in the RG master recovery and cause the working index (WI) entry to be stuck in the Assigned state. This would further cause the integrity manager thread to block and cause the long waiter "wait for working index entry to be committed" on the RG master node. Under this state, it could lead to data integrity issues when the next RG master resign/recovery event occurs. Note that this could only happen in an ECE/ESS 3000 environment, and not in a legacy ESS environment. (show details) | 5.0.5.2 | GNR, ECE, ESS 3000 |
IJ26789 | Critical | This is a deadlock issue between the file deletion thread and another thread that is finding the inode from the Linux kernel inode hash list while holding the scale file lock. The other thread is mostly doing DMAPI calls. (show details) | 5.0.5.2 | DMAPI |
IJ26835 | High Importance | Calling dm_path_to_handle API from DMAPI RENAME event handler could run into a deadlock on the i_mutex lock that is obtained when the file's parent is renamed on V3.0.x Linux kernel versions. (show details) | 5.0.5.2 | DMAPI |
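A hedged sketch in C of the deadlock-prone pattern: calling dm_path_to_handle() from within a DMAPI RENAME event handler. The session/event plumbing is omitted, the path argument is hypothetical, and the prototypes follow the XDSM-standard dmapi.h; check them against your installed headers:

```c
#include <dmapi.h>
#include <stdio.h>

/* Called from a RENAME event handler (assumed surrounding context). */
void on_rename_event(const char *new_path)
{
    void  *hanp;
    size_t hlen;

    /* On affected 3.0.x kernels this could deadlock on the parent
     * directory's i_mutex, which the in-flight rename already holds. */
    if (dm_path_to_handle((char *)new_path, &hanp, &hlen) != 0) {
        perror("dm_path_to_handle");
        return;
    }
    printf("handle length: %zu\n", hlen);
    dm_handle_free(hanp, hlen);
}
```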
IJ26679 | Suggested | Rare deadlock causing mmrestripefs to hang under high load (show details) | 5.0.5.2 | Policy, ILM restripe, Rebalance |
IJ25547 | Critical | GPFS daemon crashes, and the file system gets unmounted on this node. The GPFS daemon crashes because the daemon code hit an assert which indicates the allocated disk address is not expected. (show details) | 5.0.5.2 | Core GPFS |
IJ25843 | High Importance | If a multi-threaded program reads or writes to a file in regular or mmap mixed mode, it may assert with "logAssertFailed: TokenPermits(get_token_mode(), lm_want)" (show details) | 5.0.5.2 | Core GPFS |
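For illustration, a minimal sketch in C of the mixed-mode access pattern described above: one thread reads through read(2) while another reads the same region through mmap(2). The path, sizes, and iteration counts are assumptions for the example (compile with -pthread; the file must already contain at least 1 MiB):

```c
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define LEN (1 << 20)
static int fd;

static void *reader(void *arg)             /* regular read(2) path */
{
    (void)arg;
    char buf[4096];
    for (int i = 0; i < 1000; i++)
        pread(fd, buf, sizeof buf, (i * 4096L) % LEN);
    return NULL;
}

static void *mapper(void *arg)             /* mmap(2) path on same file */
{
    (void)arg;
    volatile long sum = 0;
    for (int i = 0; i < 1000; i++) {
        char *p = mmap(NULL, LEN, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) continue;
        for (long off = 0; off < LEN; off += 4096)
            sum += p[off];                 /* touch each page */
        munmap(p, LEN);
    }
    return NULL;
}

int main(void)
{
    fd = open("/gpfs/fs0/mixed.bin", O_RDWR);   /* hypothetical path */
    if (fd < 0) { perror("open"); return 1; }

    pthread_t t1, t2;
    pthread_create(&t1, NULL, reader, NULL);
    pthread_create(&t2, NULL, mapper, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    close(fd);
    return 0;
}
```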
IJ26434 | High Importance | The EA overflow block is a metadata block that should be read using a continuous buffer, but due to a code error, it is considered to be a data block, so a scatter buffer is used which causes a log assert failure. (show details) | 5.0.5.2 | Core GPFS |
IJ26921 | Critical | When deleting an old snapshot or accessing snapshot files from an old snapshot, the operations could run into an out-of-stack-space error if a snapshot file contains an overflow block. This causes the mmfsd process to crash with a memory fault error. (show details) | 5.0.5.2 | Snapshots |
IJ25905 | High Importance | When a new fileset is created, the in-doubt quota shares for that fileset should start out at 0. In some situations, a fileset can be created and start out with a non-zero in-doubt value. This can occur when a fileset had been deleted previously, and then the new fileset re-uses the same ID that the deleted fileset had used. (show details) | 5.0.5.2 | Quotas |
IJ26301 | High Importance | mmadddisk hits assert exp(isStoragePoolIdValid(poolId)) when trying to open a disk due to stripe group descriptor update. (show details) | 5.0.5.2 | Core GPFS |
IJ25906 | Critical | After an upgrade from 5.0.4.3 to 5.0.5.0, none of the CES IP addresses can be assigned. (show details) | 5.0.5.2 | CES |
IJ34824 | High Importance | The Ganesha process crashes or the Ganesha work pool threads hang. The crash occurs when nfs4_acl_release_entry() calls hashtable_getlatch(). See the Symptom section of this APAR to view the stack traces for both scenarios. (show details) | 5.0.5.2 | NFS |
IJ25321 | High Importance | Signal 11 occurs while mmdiag --threads is running. For example: [E] Signal 11 at location 0x55F6FF53A47D in process 7840 (show details) | 5.0.5.1 | Core GPFS |
IJ25146 | High Importance | A race condition between disks having errors and recovery groups or log groups resigning could lead to a bug in the GNR log: 'vtrack recovery failed to scrub and repair the stale data on the disk'. It can further lead to data corruption if all good copies of the mirrored data are lost. (show details) | 5.0.5.1 | GNR |
IJ25463 | High Importance | GPFS daemon assert: exp(this->mutexMagic == MUTEX_MAGIC_VALID) dSynch.C. This could occur during file system unmount. (show details) | 5.0.5.1 | Core GPFS |
IJ25468 | High Importance | GPFS daemon assert: ofP->mnodeStatusIs(0x4) || ofP->mnodeStatusIs(0x2) && indAccessLock.isLockedExclusive() in sync.C (show details) | 5.0.5.1 | Core GPFS |
IJ25714 | High Importance | Potential deadlock when a file is accessed concurrently via mmap and regular file access methods. (show details) | 5.0.5.1 | Core GPFS |
IJ25469 | Suggested | GPFS daemon failed to start with "Cannot allocate memory" error when prefetchThreads is set to less than 3. (show details) | 5.0.5.1 | Core GPFS |
IJ25478 | High Importance | GPFS daemon assert: !"Log file migrate check failed: need" in sgmdata.C. This could happen during mmrestripefs/mmdeldisk/mmrpldisk command. (show details) | 5.0.5.1 | Core GPFS |
IJ25511 | High Importance | Under rare circumstances, all quorum nodes could be expelled if the current cluster manager is expelled due to a network-level error on that node, resulting in a cluster-wide quorum loss. This applies only when RDMA has been activated and all GPFS RPCs go over RDMA (verbsPorts, verbsRdma and verbsRdmaSend must be set). The current cluster manager is expelled due to the network error (as expected), but the newly elected cluster manager cannot make progress during its following group protocol because it waits for a 10-second linger timeout down in the CCR when a cached socket connection to the former cluster manager gets closed. This way all quorum nodes are expelled. (show details) | 5.0.5.1 | RDMA, Cluster Membership, Cluster Manager |
IJ22372 | High Importance | If the gpfsready script fails during GPFS startup, GPFS goes down automatically. This causes GPFS shutdown to hang for 5 minutes. (show details) | 5.0.5.1 | mmfsd startup |
IJ25555 | Suggested | Input file with carriage return causes mmfileid to fail with arithmetic syntax error. (show details) | 5.0.5.1 | Core GPFS |
IJ24043 | High Importance | The mmunlinkfileset command hangs and long waiter "waiting to quiesce" appears. A thread is hung and waiting inside the gpfs_s_delete_inode kernel extension routine. (show details) | 5.0.5.1 | Filesets, Snapshots |
IJ25557 | High Importance | fallocate(2) will set the wrong file size if the file has fragments and the end position of the fallocate range fits in the last block. (show details) | 5.0.5.1 | Core GPFS |
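A minimal reproducer sketch for this scenario in C; the path and offsets are hypothetical and merely arranged so that the file tail is fragmented and the fallocate range ends inside the last block:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/gpfs/fs0/frag.bin", O_CREAT | O_RDWR, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Create a small file so its tail is stored as a fragment. */
    if (pwrite(fd, "x", 1, 5000) != 1) { perror("pwrite"); return 1; }

    /* Extend such that the range end fits in the last block;
     * without FALLOC_FL_KEEP_SIZE, st_size should become 9000. */
    off_t want = 9000;
    if (fallocate(fd, 0, 0, want) != 0) { perror("fallocate"); return 1; }

    struct stat st;
    fstat(fd, &st);
    printf("st_size = %lld (expected %lld)\n",
           (long long)st.st_size, (long long)want);
    close(fd);
    return 0;
}
```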
IJ25587 | Suggested | File system could panic with an error code 2 during the unmount process. This could happen if mmdelsnapshot command is running at the time of the unmount. (show details) | 5.0.5.1 | Snapshots |
IJ24352 | Suggested | LogAssert in file vnodeops.C (oldCount >= 0) (show details) | 5.0.5.1 | Snapshots |
IJ25591 | Suggested | The time of the certificateExpiration field of mmkeyserv -Y output is not correct. (show details) | 5.0.5.1 | Admin commands |
IJ24499 | High Importance | When attempting to move or rename a file from a source to a destination, and the destination file already exists and has permissions set such that its deletion is not allowed, the move/rename operation wrongly ends up overwriting the destination file. (show details) | 5.0.5.1 | ACLs |
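A minimal sketch in C of the check this fix restores; the paths are hypothetical, and the deny-delete permissions/ACL on the destination are assumed to be set up beforehand:

```c
#include <errno.h>
#include <stdio.h>

int main(void)
{
    /* Destination exists and its permissions forbid deletion. */
    if (rename("/gpfs/fs0/src.txt", "/gpfs/fs0/protected.txt") == 0)
        puts("rename succeeded: destination overwritten (the bug)");
    else
        printf("rename refused as expected: errno=%d\n", errno);
    return 0;
}
```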
IJ24742 | Suggested | If root squash is enabled on a remote file system, the root user is remapped to a user-specified UID and AFM fails to access the remote file system. (show details) | 5.0.5.1 | AFM, AFM DR |
IJ25388 | Suggested | The configuration variable "sharedTmpDir" instructs mmapplypolicy where to create temporary files shared among nodes during a policy scan. The internal configuration variable table was missing this variable, which caused an error during GPFS daemon initialization but does not result in any functional problem. (show details) | 5.0.5.1 | Policy, ILM |
IJ25592 | High Importance | The timeout option provided by the mmkeyserv command allows a system admin to fine tune the communication with the key server by providing a timeout value. This timeout value does not apply to the communication with the key server occurring during the execution of the mmkeyserv command, creating the potential of the command returning timeout errors. (show details) | 5.0.5.1 | Admin commands |
IJ25600 | High Importance | The GPFS kernel module's cryptographic functionality used block chaining ciphers that were previously deprecated. Newer versions of the Linux distro removed the deprecated ciphers and, consequently, the block chaining ciphers are not available any longer. On those distros, the GPFS kernel module is updated to use symmetric key ciphers. (show details) | 5.0.5.1 | Encryption |
IJ25601 | Medium Importance | If a file is migrated to a cloud using TCT, accessing the file in snapshot will not show any contents. (show details) | 5.0.5.1 | TCT |
IJ25613 | Suggested | In the unlikely scenario that the GPFS configuration file (mmfs.cfg) becomes corrupted, the mmfsd daemon may be affected. (show details) | 5.0.5.1 | Core GPFS |
IJ25617 | Suggested | Even though a tenant contains no keys, it cannot be deleted while other clients are registered to it. On the same cluster, the client should have already deregistered, so the registered client is likely from another cluster. (show details) | 5.0.5.1 | Admin commands, Encryption |
IJ24558 | Suggested | When a fileset is in the deleteRequired state, a blank character is missing between the "latest:" string and the snapshot name if it is too long, thus leading to parsing issues on the snapshot name from the output of mmlsfileset. (show details) | 5.0.5.1 | mmlsfileset command |
IJ25620 | High Importance | AFM doesn't allow tuning the AFM tunables per node in the cluster; they can only be set at the whole-cluster level. A few of them, such as afmHardMemThreshold and afmMaxParallelRecoveries, need to be tuned at each gateway node. (show details) | 5.0.5.1 | AFM |
IJ25624 | Suggested | When a GPFS command is blocked by another command, an informational message is displayed to remind the user that the blocked command will resume after the conflicting running command completes. This does not happen for some long-running commands, like mmlssnapshot and mmdelsnapshot. (show details) | 5.0.5.1 | Conflicting GPFS commands |
IJ25369 | High Importance | Prefetch enhancements in 5.0.2 introduced a minor internal check such that if the list file is present in an NFS mount common across the application and gateway nodes, AFM skips copying this list file from the application node to the gateway node and uses the list file as is on the gateway node. But sometimes the same path and file name can exist and yet refer to 2 entirely unrelated files. (show details) | 5.0.5.1 | AFM |
IJ25652 | High Importance | Some log records cannot be committed successfully, but the VtrackMap might already have been updated. Due to the resign, the update to the in-memory metadata may not have happened. In such a scenario, if another thread performs a VtrackMap flush operation and successfully writes the metadata block, the metadata version updated by the VtrackMap entry and the metadata block will be the same, although the log will contain the latest version of the record. (show details) | 5.0.5.1 | GNR |
IJ25656 | High Importance | AFM builds up temporary files in the /var/ directory for recovery procedures of the AFM fileset. These files are not deleted until the next recovery on the same fileset. (show details) | 5.0.5.1 | AFM |
IJ25578 | Suggested | mmces event list command does not accept all options described in the man page. (show details) | 5.0.5.1 | CES |
IJ25657 | High Importance | AFM is sending lookups on nonexistent or newly created files to the old system even though readdir was performed on the directory. This causes too many lookups to be sent to the old system, leading to performance degradation. (show details) | 5.0.5.1 | AFM |
IJ25547 | Critical | GPFS daemon crashes and the file system gets unmounted. The GPFS daemon crashes because the daemon code hit an assert which indicates the disk address is not expected. (show details) | 5.0.5.1 | Core GPFS |
IJ25660 | Suggested | Applications that have lots of GPFS system calls may fail with a SIGSEGV. (show details) | 5.0.5.1 | GPFS system call library |
IJ25661 | High Importance | RPC message sending thread hung, such as the following: Waiting 34581.7807 sec since 10:04:23, monitored, thread 13813 EEWatchDogThread: on ThCond 0x3FFCDC00E418 (MsgRecordCondvar), reason 'RPC wait' (show details) | 5.0.5.1 | Core GPFS |
IJ25663 | Critical | A CES IP was declared in /etc/hosts with just the IP address and without a host name. This causes hangs in processing. (show details) | 5.0.5.1 | CES |
IJ25664 | High Importance | In a rare case, mmap(2)/munmap(2) system call may block file system quiesce and cause quiesce timeout. (show details) | 5.0.5.1 | Filesets, Snapshots |
IJ25472 | Medium Importance | tsbuhelper checkprotection might not work correctly if the filename contains two or more spaces. (show details) | 5.0.5.1 | tsbuhelper checkprotection subcommand |
IJ25685 | High Importance | On systems that are booted in FIPS mode, the ssh client produces extra messages on stdout. The message "FIPS mode initialized" causes GPFS commands to fail. GPFS requires that the shell command produce no extraneous messages. (show details) | 5.0.5.1 | Admin commands |
IJ25686 | Suggested | If files are already cached and no file is queued for prefetch, then prefetch returns an error. (show details) | 5.0.5.1 | AFM |
IJ25579 | High Importance | Cygwin version 3.1.5 released on June 1, 2020, has changed its implementation of symlinks. Cygwin symlinks are now Windows reparse points instead of the older-style system file with header. Due to this change, GPFS on Windows fails to interpret the new Cygwin symlinks. This results in errors during the GPFS daemon startup, specifically in its attempt to load the authorized public key. (show details) | 5.0.5.1 | Windows |
IJ25687 | High Importance | When a snapshot gets deleted, some of its blocks are copied to the previous snapshot to maintain data and metadata consistency. If the deletion of the snapshot is interrupted, there can be a scenario where the blocks are copied to the previous snapshot but the block disk addresses have not yet been removed from the snapshot being deleted; the status of such snapshots changes to DeleteRequired. This condition is expected for DeleteRequired snapshots and will be handled properly when the deletion of the snapshot is retried. But if an offline fsck is run on the file system before the DeleteRequired snapshot is deleted, fsck will falsely report such blocks as duplicate addresses between the DeleteRequired snapshot and its previous snapshots. (show details) | 5.0.5.1 | FSCK |
IJ25689 | Suggested | Fsck reports a bad DA in a to-be-deleted inode even though such inodes will be cleaned up during normal GPFS operations. (show details) | 5.0.5.1 | FSCK |
IJ25692 | Suggested | Running an administrative command within a few seconds after running a read-only offline fsck can lead to an assert. (show details) | 5.0.5.1 | FSCK |
IJ25698 | Suggested | Offline fsck reports false positive replica mismatch if the NSD goes down midway. (show details) | 5.0.5.1 | FSCK |
IJ25791 | High Importance | Due to a delayed file close in the VFS layer and a context mismatch, closing the file after replication does not wait for the file system quiesce, causing the remote log assert. (show details) | 5.0.5.1 | AFM, AFM DR |
IJ25792 | High Importance | AFM does not replicate the file times correctly when the times are set using the gpfs_set_times or gpfs_set_times_path API. (show details) | 5.0.5.1 | AFM, AFM DR |
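For context, the APAR concerns the GPFS-specific gpfs_set_times()/gpfs_set_times_path() APIs (prototypes in gpfs.h), which set file times outside the usual VFS path. Shown below, as a safe stand-in, is the POSIX analogue utimensat(2) setting the same atime/mtime values; the path and timestamps are hypothetical:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    struct timespec times[2] = {
        { .tv_sec = 1000000000, .tv_nsec = 0 },   /* atime */
        { .tv_sec = 1000000000, .tv_nsec = 0 },   /* mtime */
    };
    /* On an AFM fileset, these new times should be replicated to home. */
    if (utimensat(AT_FDCWD, "/gpfs/cache/fset1/file1", times, 0) != 0) {
        perror("utimensat");
        return 1;
    }
    return 0;
}
```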
IJ25856 | Suggested | Offline fsck requires a certain amount of pagepool memory to run with a single inode scan pass. If the needed amount of pagepool memory is not available, it displays a warning message before starting the fsck scan indicating the number of inode scan passes it will take with the currently available pagepool memory, along with the amount of pagepool memory needed to run a complete single inode scan pass. For example: "Available pagepool memory will require 3 inode scan passes by mmfsck. To scan inodes in a single pass, total pagepool memory of 11767119872 bytes is needed. The currently available total memory for use by mmfsck is 8604614656 bytes. Continue fsck with multiple inode scan passes? n" The problem is that in some cases an incorrect value is displayed for the pagepool memory needed. Another side effect is that in some cases fsck might not show the above message and instead shows the truncated, incorrect message "There is not enough free memory available for use by mmfsck in". | 5.0.5.1 | FSCK |
IJ25665 | HIPER | mmap may expand the file size to a page boundary if the file is a sparse file and the file size is not a multiple of page size. (show details) | 5.0.5.1 | Core GPFS |
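A minimal reproducer sketch for this scenario in C; the path and the 6000-byte size are hypothetical, chosen only so the file size is not a multiple of the (assumed 4 KiB) page size:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/gpfs/fs0/sparse.bin", O_CREAT | O_RDWR, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Sparse file, size 6000 bytes: not a multiple of the page size. */
    if (ftruncate(fd, 6000) != 0) { perror("ftruncate"); return 1; }

    char *p = mmap(NULL, 6000, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    p[100] = 'x';                    /* dirty one page via the mapping */
    munmap(p, 6000);

    struct stat st;
    fstat(fd, &st);
    /* On affected levels, st_size could grow to the page boundary. */
    printf("st_size = %lld (expected 6000)\n", (long long)st.st_size);
    close(fd);
    return 0;
}
```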