Filesystem troubleshooting

Introduction

This guide will help diagnose filesystem problems one may come across on a GNU/Linux system. New sections are still being added to this howto.

Basic filesystem checks and repairs

The most common way to check a filesystem's health is to run its fsck utility. This tool should only be run against an unmounted filesystem. Nearly all well-established filesystem types have their own fsck tool, e.g. ext2/3/4 filesystems have e2fsck. The most notable exception until very recently was btrfs. Some filesystems do not need a check tool at all, i.e. read-only filesystems like iso9660 and udf.

e2fsprogs - ext2, ext3, ext4 filesystems

Ext2/3/4 have the previously mentioned e2fsck tool for checking and repairing the filesystem. It is part of the e2fsprogs package, which needs to be installed for the fsck tool to be available. Unless it was deselected in aptitude during installation, it should already be installed.

There are 4 ways the fsck tool usually gets run (listed in order of frequency of occurrence):

  1. it runs automatically during computer bootup every X days or Y mounts (whichever comes first). This is determined during the creation of the filesystem and can later be adjusted using tune2fs.
  2. it runs automatically if a filesystem has not been cleanly unmounted (e.g. after a power cut)
  3. user runs it against an unmounted filesystem
  4. user makes it run at next bootup
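The thresholds from item 1 can be inspected and changed with tune2fs. A minimal sketch, assuming an ext filesystem on /dev/sda1 (an example device only); check_policy is a helper name made up for this illustration, and 30 mounts / 180 days are arbitrary values:

```shell
# Keep only the check-scheduling fields from `tune2fs -l` output (read on stdin).
# check_policy is a hypothetical helper name used for this sketch.
check_policy() {
    grep -E '^(Mount count|Maximum mount count|Last checked|Check interval|Next check after):'
}

# Typical use as root (example device; values are arbitrary):
#   tune2fs -l /dev/sda1 | check_policy
#   tune2fs -c 30 -i 180d /dev/sda1    # check every 30 mounts or 180 days
```

The filter reads from stdin so that it can be tried on saved tune2fs output without touching a real device.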

case 1

When a filesystem check is run automatically X days after the last check or after Y mounts, Ubuntu gives the user the option to interrupt the check and continue bootup normally. It is recommended to let the check finish.

case 2

If a filesystem has not been cleanly unmounted, the system detects a dirty bit on the filesystem during the next bootup and starts a check. It is strongly recommended that one lets it finish. It is almost certain there are errors on the filesystem that fsck will detect and attempt to fix. Nevertheless, one can still interrupt the check and let the system boot up on a possibly corrupted filesystem.

Two things can go wrong:

  1. fsck dies - If fsck dies for whatever reason, you have the option to press ^D (Ctrl + D) to continue with an unchecked filesystem, or to run fsck manually. See the e2fsck cheatsheet below for how.

  2. fsck fails to fix all errors with default settings - In this case it will ask to be run manually by the user. See the e2fsck cheatsheet below for how.
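When fsck stops and asks for a manual run, its exit status says why. The sketch below decodes the exit status bits listed in the e2fsck man page; describe_e2fsck_status is a helper name invented for this example, and the wording of the messages is mine, not e2fsck's:

```shell
# Translate an e2fsck exit status into words. The status is a bit mask,
# so several conditions can be reported at once.
# describe_e2fsck_status is a hypothetical helper name.
describe_e2fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    if [ $((status & 1)) -ne 0 ]; then echo "errors corrected"; fi
    if [ $((status & 2)) -ne 0 ]; then echo "errors corrected, system should be rebooted"; fi
    if [ $((status & 4)) -ne 0 ]; then echo "errors left uncorrected, run fsck manually"; fi
    if [ $((status & 8)) -ne 0 ]; then echo "operational error"; fi
    if [ $((status & 16)) -ne 0 ]; then echo "usage or syntax error"; fi
    if [ $((status & 32)) -ne 0 ]; then echo "cancelled by user request"; fi
    if [ $((status & 128)) -ne 0 ]; then echo "shared library error"; fi
    return 0
}

# Typical use as root on an unmounted filesystem (example device):
#   fsck.ext4 -p /dev/sda1; describe_e2fsck_status $?
```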

case 3

The user may run fsck against any filesystem that can be unmounted on a running system, e.g. if you can issue umount /dev/sda3 without an error, you can run fsck against /dev/sda3.
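Before a manual check it is worth confirming that the device really is unmounted. A small sketch, assuming the kernel lists mounts in /proc/mounts; is_mounted is a hypothetical helper name, and it only compares the device name literally, so a filesystem mounted under another name (for example a /dev/mapper or by-uuid path) would not be detected:

```shell
# Succeed (exit 0) if device $1 is listed as a mount source in the mounts
# table $2 (default: /proc/mounts). is_mounted is a hypothetical helper name.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Typical use as root (example device from the text above):
#   if is_mounted /dev/sda3; then
#       echo "/dev/sda3 is mounted, unmount it first"
#   else
#       fsck.ext4 -p /dev/sda3
#   fi
```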

case 4

You can make your system run fsck at the next boot by creating an empty 'forcefsck' file in the root of your root filesystem, i.e. touch /forcefsck. Filesystems that have 0 or nothing specified in the sixth column of /etc/fstab will not be checked.

Up to Ubuntu 6.06 you could also issue shutdown -rF now to reboot your system and check all partitions with a non-zero value in the sixth column of /etc/fstab. Later versions of Ubuntu use the Upstart version of shutdown, which no longer supports the -F option.

Refer to man fstab for what values are allowed.
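To see which filesystems a forced boot-time check would cover, the sixth column can be read straight from the fstab file. A minimal sketch; fstab_checked is a hypothetical helper name, and it simply treats a missing or zero sixth field as "not checked":

```shell
# List the entries of an fstab file whose sixth field (fs_passno) is non-zero,
# i.e. the ones eligible for a boot-time check.
# Output columns: passno, device, mount point.
# fstab_checked is a hypothetical helper name; the file defaults to /etc/fstab.
fstab_checked() {
    awk '!/^[ \t]*#/ && $6 + 0 > 0 { print $6, $1, $2 }' "${1:-/etc/fstab}"
}

# Typical use:
#   fstab_checked              # reads /etc/fstab
#   fstab_checked /mnt/etc/fstab   # made-up path, e.g. from a rescue system
```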

e2fsck cheatsheet

e2fsck has softlinks in /sbin that keep the names of fsck tools more uniform, i.e. fsck.ext2, fsck.ext3 and fsck.ext4 (similarly, other filesystem types have e.g. fsck.vfat). This cheatsheet uses these softlinks, with ext4 and /dev/sda1 as an example.

  • fsck.ext4 -p /dev/sda1 - will check the filesystem on the /dev/sda1 partition. It will also automatically fix all problems that can be fixed without human intervention. It will do nothing if the partition is deemed clean (no dirty bit set).

  • fsck.ext4 -p -f /dev/sda1 - same as before, but fsck will ignore the fact that the filesystem is clean and check+fix it nevertheless.

  • fsck.ext4 -p -f -C0 /dev/sda1 - same as before, but with a progress bar.

  • fsck.ext4 -f -y /dev/sda1 - whereas previously fsck would ask for user input before fixing any nontrivial problems, -y means that it will simply assume you want to answer "YES" to all its suggestions, thus making the check completely non-interactive. This is potentially dangerous but sometimes unavoidable; especially when one has to go through thousands of errors. It is recommended that (if you can) you back up your partition before you have to run this kind of check. (see dd command for backing up filesystems/partitions/volumes)

  • fsck.ext4 -f -c -C0 /dev/sda1 - will attempt to find bad blocks on the device and make those blocks unusable by new files and directories.

  • fsck.ext4 -f -cc -C0 /dev/sda1 - a more thorough version of the bad blocks check.

  • fsck.ext4 -n -f -C0 /dev/sda1 - the -n option allows you to run fsck against a mounted filesystem in a read-only mode. This is almost completely pointless and will often result in false alarms. Do not use.
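The backup recommended before a -y run can be taken with dd. A minimal sketch, assuming an unmounted partition and enough free space for the image; backup_partition is a hypothetical helper name and the paths in the usage comment are made up. Note that a plain dd like this stops at the first read error, so it is only suitable for a disk that can still be read completely:

```shell
# Copy a partition (or any file) to an image and verify the copy byte for byte.
# backup_partition is a hypothetical helper name. Run it as root against an
# unmounted partition, with enough free space for the image.
backup_partition() {
    src=$1
    img=$2
    dd if="$src" of="$img" bs=1048576 && cmp "$src" "$img"
}

# Typical use (example device; the image path is made up):
#   backup_partition /dev/sda1 /mnt/backup/sda1.img && fsck.ext4 -f -y /dev/sda1
```

Chaining the check with && means fsck only starts once the image has been written and compared successfully.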

dosfstools - FAT12, FAT16 and FAT32 (vfat) filesystem

To create, check and repair these Microsoft(TM) filesystems, the dosfstools package needs to be installed. Like the ext filesystem tools, dosfsck has softlinks too: fsck.msdos and fsck.vfat. The options, however, differ slightly.

dosfsck cheatsheet

These examples will use FAT32 and /dev/sdc1

  • fsck.vfat -n /dev/sdc1 - a simple non-interactive read-only check

  • fsck.vfat -a /dev/sdc1 - checks the file system and fixes non-interactively. Least destructive approach is always used.

  • fsck.vfat -r /dev/sdc1 - interactive repair. User is always prompted when there is more than a single approach to fixing a problem.

  • fsck.vfat -l -v -a -t /dev/sdc1 - a very verbose way of checking and repairing the filesystem non-interactively. The -t parameter will mark unreadable clusters as bad, thus making them unavailable to newly created files and directories.

Recovered data will be dumped in the root of the filesystem as fsck0000.rec, fsck0001.rec, etc. This is similar to the CHK files created by scandisk and chkdsk on MS Windows.
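After a repair, the recovered files can be listed once the filesystem is mounted again. A small sketch, assuming the volume is mounted at a path you pass in; list_recovered is a hypothetical helper name, and the -maxdepth and -iname options assume GNU or BSD find:

```shell
# List files recovered by dosfsck in the root of a mounted FAT filesystem.
# The match is case-insensitive because FAT short names may show up in
# upper or lower case depending on mount options.
# list_recovered is a hypothetical helper name.
list_recovered() {
    find "$1" -maxdepth 1 -type f -iname 'fsck[0-9][0-9][0-9][0-9].rec' | sort
}

# Typical use (made-up mount point):
#   list_recovered /mnt/usbstick
```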

ntfs-3g (previously also ntfsprogs) - NTFS filesystem

Due to the closed-source nature of this filesystem and its complexity, there is no fsck.ntfs available on GNU/Linux (ntfsck is no longer being developed). There is a simple tool called ntfsfix included in the ntfs-3g package. It does not focus on fixing NTFS volumes that have been seriously corrupted; its sole purpose seems to be making an NTFS volume mountable under GNU/Linux.

Normally, NTFS volumes are non-mountable if their dirty bit is set. ntfsfix can help with that by trying to fix the most basic NTFS problems:

  • ntfsfix /dev/sda1 - will attempt to fix basic NTFS problems, e.g. it detects and fixes a Windows XP bug that leads to a corrupt MFT, clears bad cluster marks and fixes boot sector problems.

  • ntfsfix -d /dev/sda1 - will clear the dirty bit on an NTFS volume.

  • ntfsfix -b /dev/sda1 - clears the list of bad sectors. This is useful after cloning an old disk with bad sectors to a new disk.

    Windows 8 and GNU/Linux cohabitation problems

    This segment is taken from http://www.tuxera.com/community/ntfs-3g-advanced/

    When Windows 8 is restarted using its fast restarting feature, part of the metadata of all mounted partitions are restored to the state they were at the previous closing down. As a consequence, changes made on Linux may be lost. This can happen on any partition of an internal disk when leaving Windows 8 by selecting “Shut down” or “Hibernate”. Leaving Windows 8 by selecting “Restart” is apparently safe.

    To avoid any loss of data, be sure the fast restarting of Windows 8 is disabled. This can be achieved by issuing the following command as an administrator: powercfg /h off

reiserfsprogs - reiserfs

Install the reiserfsprogs package to have reiserfsck and the softlink fsck.reiserfs available. Reiserfsck is a very talkative tool that will let you know what to do should it find errors.

  • fsck.reiserfs /dev/sda1 - a readonly check of the filesystem, no changes made (same as running with --check). This is what you should run before you include any other options.

  • fsck.reiserfs --fix-fixable /dev/sda1 - does basic fixes but will not rebuild filesystem tree

  • fsck.reiserfs --scan-whole-partition --rebuild-tree /dev/sda1 - if basic check recommends running with --rebuild-tree, run it with --scan-whole-partition and do NOT interrupt it! This will take a long time. On a non-empty 1TB partition, expect something in the range of 10-24 hours.

xfsprogs - xfs

If a check is necessary, it is performed automatically at mount time. Because of this, fsck.xfs is just a dummy shell script that does absolutely nothing. If you want to check the filesystem consistency and/or repair it, you can do so using the xfs_repair tool.

  • xfs_repair -n /dev/sda - will only scan the volume and report what fixes are needed. This is the no modify mode and you should run this first.

    • xfs_repair will exit with exit status 0 if it found no errors and with exit status 1 if it found some. (You can check exit status with echo $?)

  • xfs_repair /dev/sda - will scan the volume and perform all fixes necessary. Large volumes take long to process.
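The exit status of the no-modify run can drive the decision whether a real repair is needed. A minimal sketch built only on the two statuses described above; explain_xfs_check is a hypothetical helper name and any other status is just reported as unexpected:

```shell
# Turn the exit status of `xfs_repair -n` into a message.
# 0 and 1 are the statuses described above; anything else is reported as-is.
# explain_xfs_check is a hypothetical helper name.
explain_xfs_check() {
    case $1 in
        0) echo "no errors found" ;;
        1) echo "errors found, run xfs_repair without -n" ;;
        *) echo "unexpected exit status $1" ;;
    esac
}

# Typical use as root on an unmounted volume (example device):
#   xfs_repair -n /dev/sda; explain_xfs_check $?
```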

XFS filesystem has a feature called allocation groups (AG) that enable it to use more parallelism when allocating blocks and inodes. AGs are more or less self contained parts of the filesystem (separate free space and inode management). mkfs.xfs creates only a single AG by default.

xfs_repair checks and fixes your filesystems by going through 7 phases. Phase 3 (inode discovery and checks) and Phase 4 (extent discovery and checking) work sequentially through the filesystem's allocation groups (AGs). With multiple AGs, this can be heavily parallelised. xfs_repair is clever enough not to process multiple AGs on the same disks.

Do NOT bother with the multi-threaded options below if any of these is true for your system:

  • you created your XFS filesystem with only a single AG.
  • your xfs_repair is older than version 2.9.4; with older versions this makes the checks even slower on GNU/Linux. You can check your version with xfs_repair -V
  • your filesystem does not span across multiple disks

otherwise:

  • xfs_repair -o ag_stride=8 -t 5 -v /dev/sda - same as previous example but reduces the check/fix time by utilising multiple threads, reports back on its progress every 5 minutes (default is 15) and its output is more verbose.

    • if your filesystem had 32 AGs, the -o ag_stride=8 would start 4 threads, one to process AGs 0-7, another for 8-15, etc... If ag_stride is not specified, it defaults to the number of AGs in the filesystem.

  • xfs_repair -o ag_stride=8 -t 5 -v -m 2048 /dev/sda - same as above but limits xfs_repair's memory usage to a maximum of 2048 megabytes. By default, it would use up to 75% of available RAM. Please note that -o bhash=xxx has been superseded by the -m option.
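The thread arithmetic from the 32 AG example can be worked out before picking an ag_stride value. A rough sketch: agcount_from_info and repair_threads are hypothetical helper names, the agcount=N field is assumed to appear in xfs_info output, and rounding up when the AG count is not a multiple of the stride is an assumption of this sketch rather than something stated above:

```shell
# Extract the AG count from `xfs_info` output (read on stdin).
# agcount_from_info is a hypothetical helper name.
agcount_from_info() {
    sed -n 's/.*agcount=\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Number of worker threads for AG count $1 and ag_stride $2.
# Rounding up for a remainder is an assumption of this sketch.
repair_threads() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Typical use (made-up mount point; xfs_info wants a mounted filesystem):
#   ags=$(xfs_info /mnt/data | agcount_from_info)
#   repair_threads "$ags" 8
```

With 32 AGs and a stride of 8 this gives 4, matching the example above.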

jfsutils - jfs

btrfs

Missing superblock

Bad blocks

Sources and further reading

FilesystemTroubleshooting (last edited 2014-10-14 09:14:04 by pabouk)