Start Here When Things Go Wrong on Your Linux System

If you’ve run any operating system for any length of time, you have probably encountered strange phenomena. When it comes to computers, strange is usually unwelcome. The longer you run any given OS installation without a reinstall, the more likely you are to see at least a few quirks. These can be anything from programs freezing, to your cooling fan suddenly revving up, to all manner of other oddities.

For the commercial desktop OSes with massive install bases, it’s easy to find support in the form of official manufacturer (OEM) or OS developer troubleshooting and documentation pages. However, for Linux such resources aren’t always available. Even when they are, they don’t always issue consistent guidance from distribution to distribution and aren’t guaranteed to account for the user’s specific hardware.

In this piece I will offer up a few routes you can traverse to track down suspicious behavior on your Linux system. This sequence of diagnostics is neither definitive nor rigid. I don’t claim to know everything you should do to figure out what’s gone awry on your Linux system, and even if I did it would make for an epic poem of an article.

It’s quite possible that not every procedure is applicable to the problem at hand. My aim, though, is to put forward a good number of tests that should at least give you a place to start. Conveniently, these will (with one exception) serve you well on desktop or server Linux, as they use command line tools.

What follows will proceed in order of high to low layer of abstraction, namely from the application level down to the OS level. Without further ado, let’s get digging.

Browser Off Task? Open Its Task Manager

Browsers have become so robust and central to the desktop computing experience that they now have their own OS-style process manager. These tools allow users to see which open web pages are using system resources, and how much.

If your web browser is the main program running on your computer when resource spikes or slowdowns crop up, the process manager is an invaluable resource. It provides a clearer picture than your OS process manager because the browser process manager is aware of which of its constituent processes are driven by which web pages.

Every browser has its own way of getting to its task manager. In Firefox and Chrome, you can access the task manager from their respective upper-right menus. Chromium and close derivatives (like Chrome) also offer the option of hitting the Shift + Escape keys to access the tool. Once you have the task manager open, you can sort processes by CPU or memory usage to determine what’s hogging either one. Finally, you can kill off a browser process that tries to cling onto your computer’s hardware.

Take It From the ‘top’

If your browser isn’t the star of the show, you will probably want to see all the processes your system is juggling. The best way to do that is to open your terminal and use the top command. Essentially, it’s a task manager for Unix-like systems (like Linux). With it, you can view the CPU usage, memory usage, and much more for every active process. As you’d expect, you can sort by these statistics, too. Any out-of-control processes can be killed right from top.

But if you think top is your average task manager, think again. You can sort by any available metric, including running time and “niceness” (basically process priority). Oh yeah, there’s process priority. You can also choose to display processes as a tree, indicating which processes begot others. Best of all, you can search for any text sequence, a feature sorely lacking in many competing OSes’ task managers.
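If you would rather capture a one-off snapshot than sit in the interactive view, top can also run in batch mode. What follows is a minimal sketch, assuming the procps-ng version of top found on most mainstream distributions (other implementations, such as the BusyBox one, may not accept -o). The helper name top_snapshot is made up for illustration.

```shell
# Hypothetical helper: print one sorted snapshot of top and exit.
# Assumes procps-ng top: -b = batch mode, -n 1 = a single iteration,
# -o FIELD = sort by that column (for example %CPU, %MEM, TIME+ or NI).
top_snapshot() {
    field="${1:-%CPU}"   # sort column, defaults to CPU usage
    rows="${2:-20}"      # number of output lines to keep
    top -b -n 1 -o "$field" | head -n "$rows"
}
```

Calling top_snapshot %MEM 15 should print the summary header followed by the heaviest memory users. Inside the interactive view of procps-ng top, the matching keys are Shift+M (sort by memory), Shift+P (sort by CPU), Shift+V (tree view), L (search) and k (kill).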

Overview of Open Files

If you suspect the problem isn’t CPU or memory consumption but unusual disk I/O, pull out lsof. It’s a tool I both love and don’t use anywhere near enough. This CLI command lists all the files that are currently open. In other words, it lets users review all files being read from or written to.

The lsof command has powerful options, too numerous to cover in detail, for limiting which files appear in the output. One of my favorites is the “-u” flag, which filters for (or excludes) files by the user accessing them. If you have a lineup of shady processes (perhaps from top), you can use the “-p” flag to look up only those processes (by PID) and see the files they’re working on.

My favorite way of making short work of lsof’s output is to pipe it into grep and see what I can find. This way, I can search for any pattern present, whether that’s user, path, or anything else I can think of.
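To make those filters concrete, here is a rough sketch of the three approaches as shell functions. The function names are invented for this example, and lsof may need to be installed separately on some distributions.

```shell
# Hypothetical helpers wrapping the lsof filters described above.

# Files opened by one user (defaults to the current user).
# Prefixing the name with ^ (for example '^root') excludes that user instead.
open_by_user() {
    lsof -u "${1:-$(id -un)}"
}

# Files opened by one or more processes, given as a comma-separated PID list.
open_by_pid() {
    lsof -p "$1"
}

# Search the full listing for any pattern: a user, a path, a command name.
open_matching() {
    lsof 2>/dev/null | grep -- "$1"
}
```

For example, open_by_pid 1234,5678 would list the files held by those two (made-up) PIDs, and open_matching /var/log would show anything with that path open. Run as a regular user, lsof generally shows full detail only for your own processes, so prefix the calls with sudo for a system-wide view.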

Don’t Mind if I Decode

Looking for the bird’s-eye view of all the hardware on your system? Look no further than dmidecode.

Executing dmidecode in the shell with superuser privileges will print a summary of your system hardware as recorded in the firmware’s DMI (SMBIOS) tables, listing the make, model, and modes of the equipment that your OS sits on top of. This is especially helpful if you’re using a more DIY flavor of Linux, or trying to get uncommon hardware to be functional.

For instance, if you need to install a nonstandard kernel module, running dmidecode will inform you what device the system detects, and thus what module you’ll need to add.
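The full dmidecode dump can run long, so it often helps to ask for one section at a time. The sketch below loops over a few of the section keywords dmidecode accepts with -t. The helper name hw_summary is hypothetical, and the choice of sections is only a suggestion.

```shell
# Hypothetical helper: print a handful of DMI sections, one after another.
# dmidecode reads firmware tables, so it normally has to run as root.
hw_summary() {
    for section in system baseboard processor memory; do
        echo "== $section =="
        sudo dmidecode -t "$section"
    done
}
```

If you only need a single value, dmidecode also takes -s with a string keyword, as in sudo dmidecode -s system-product-name. Bear in mind that the output is only as accurate as what the firmware vendor put in those tables.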

Linux Isn’t a Destination, It’s a Process

If things are starting to get really hairy, you can start digging into your system’s lower-level operation.

First on our deep dive is the /proc directory. Unlike typical directories that persist with static contents between boots, /proc gets dynamically populated with information read from the kernel and hardware on boot, continuously updated during operation, and whisked out of existence on shutdown. As everything here is treated as a file, all users need to do is read the files to see what was written to them.

I could definitely stand to get better acquainted with what’s here, but some poking around yielded interesting finds. For instance, you can see the mount options for all your physical disks. You can also get counts for failed kernel operations like hangs and panics. You can even peruse all the hardware drivers loaded at boot.

To give a more concrete example, I could see myself dumping out /proc/scsi/device_info to check why an inserted SCSI interface wasn’t being detected. You might have to get a bit creative with /proc, but it won’t disappoint if you do.
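Two of the finds mentioned above, mount options and loaded drivers, can be pulled straight out of /proc with standard text tools. This is a minimal sketch with made-up helper names; it assumes /proc/mounts and /proc/modules are present, which is the norm on a regular Linux install but not guaranteed in every container or sandbox.

```shell
# Hypothetical helper: show device, filesystem type and mount options
# for one mount point. Each line of /proc/mounts holds six fields:
# device, mount point, filesystem type, options, dump flag, fsck pass.
mount_opts() {
    awk -v mp="$1" '$2 == mp { print $1, $3, $4 }' /proc/mounts
}

# Hypothetical helper: names of the kernel modules currently loaded.
# The module name is the first space-separated field of /proc/modules.
loaded_modules() {
    cut -d ' ' -f 1 /proc/modules | sort
}
```

Running mount_opts / prints the options your root filesystem was mounted with, while loaded_modules | grep -i usb is a quick way to check whether a driver you expected actually made it into the kernel.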

Get ‘dmesg’

Speaking of the kernel, you can discover exactly what it’s been up to by running dmesg with superuser authority. This outputs the kernel log to your console in chronological order from boot. If the kernel ever tried to work with some hardware and came up short, it will journal its rebuffed advance here.

While you likely won’t need to resort to dmesg often, it’s a command every Linux user needs to know purely because of how quickly it lets you get to the bottom of hardware problems. It’s the command that forum denizens expect you to run so they have what they need to point you in the right direction.
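Since the kernel log gets noisy, it is usually worth narrowing it down before pasting it anywhere. The sketch below assumes the util-linux version of dmesg, which most distributions ship. Both helper names are invented for this example.

```shell
# Hypothetical helper: show only warnings and worse, with readable timestamps.
# Assumes util-linux dmesg: -T = human-readable time, --level = priority filter.
kernel_problems() {
    sudo dmesg -T --level=emerg,alert,crit,err,warn
}

# Hypothetical helper: search the kernel log for a device or driver name,
# ignoring case.
kernel_grep() {
    sudo dmesg -T | grep -i -- "$1"
}
```

Something like kernel_grep usb right after plugging in a misbehaving device tends to surface the relevant lines quickly. The -T timestamps can drift on machines that have been suspended and resumed, so treat them as approximate.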

Linux is loaded with all sorts of great system diagnostic tooling, but when something on your system goes wrong, you probably won’t go wrong starting with the ones above.

Jonathan Terrasi has been an ECT News Network columnist since 2017. In addition to his work as a freelance writer, he is a full-time computer science educator and IT decision-maker. His main interests are information security, with a focus on Linux desktops, and the influence of technology trends on current events. His background also includes providing technical commentary and analysis for the Chicago Committee to Defend the Bill of Rights.
