Chapter 7 Basic Shell Commands

7.1 Introduction

A shell is a computer program that presents a command-line interface. It lets you control your computer with commands typed at a keyboard, instead of operating a graphical user interface (GUI) with a mouse and keyboard.

There are many reasons to learn about the shell:

  • Many bioinformatics tools can only be used through a command-line interface, or have extra capabilities in the command-line version that are not available in the GUI.

  • The shell makes your work less error-prone and more reproducible.

  • Many bioinformatics tasks require large amounts of computing power and cannot realistically be run on your own machine. These tasks are best performed on remote computers or cloud services, which are typically accessed through a shell.

7.2 Usage

7.2.1 Manipulating files and directories

pwd: print working directory

ls: list the files and sub-directories in the current directory

cd: change directory. The parent of a directory is the directory above it, and you can use cd .. to move there. The tilde character (~) means “your home directory,” so cd ~ will always take you home.
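As a rough illustration of how pwd, ls, and cd fit together, the sketch below walks through a small directory tree. The directory names (demo, seasonal) and the use of mktemp -d to create a scratch area are assumptions made for this example, not part of the original material.

```shell
# Work in a throwaway directory so nothing real is touched.
# "demo" and "seasonal" are made-up names for this sketch.
workdir=$(mktemp -d)
mkdir -p "$workdir/demo/seasonal"

cd "$workdir/demo/seasonal"
pwd                          # prints a path ending in demo/seasonal

cd ..                        # move up to the parent directory
here=$(basename "$(pwd)")    # keep only the last part of the path
echo "$here"                 # prints: demo

ls                           # prints: seasonal

cd ~ && pwd                  # prints the path of your home directory
```

Running pwd after each cd is a cheap way to confirm where you are before you copy, move, or delete anything.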

cp: copy files. For instance, the following command copies two files from the seasonal directory into the backup directory. When the last argument is a directory, every file named before it is copied there.

cp seasonal/autumn.csv seasonal/winter.csv backup

mv: move and/or rename files. One warning: just like cp, mv will overwrite existing files with the same names.

rm: delete files

mkdir: make directories

rmdir: delete empty directories
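The file commands above can be tried safely in a scratch directory. The following is a minimal sketch, assuming mktemp -d is available; the file contents and the name winter-old.csv are invented for the example.

```shell
# Set up a scratch area with two small sample files (contents are made up).
workdir=$(mktemp -d)
cd "$workdir"
mkdir seasonal backup
echo "Date,Tooth" > seasonal/autumn.csv
echo "Date,Tooth" > seasonal/winter.csv

# Copy both files into backup; the last argument is the destination directory.
cp seasonal/autumn.csv seasonal/winter.csv backup
ls backup                    # lists autumn.csv and winter.csv

# Rename a file. mv would overwrite winter-old.csv if it already existed.
mv backup/winter.csv backup/winter-old.csv

# Delete the copies, then remove the now-empty directory.
rm backup/autumn.csv backup/winter-old.csv
rmdir backup

ls                           # prints: seasonal
```

If silent overwriting is a concern, both cp and mv accept the -i flag, which asks for confirmation before replacing an existing file.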

7.2.2 Manipulating data

cat: view a file’s content

less: page through a file, displaying one screen at a time. Press the space bar to page down, or type q to quit. If you give less the names of several files, you can type :n to move to the next file, :p to go back to the previous one, or :q to quit.

head: print the first 10 lines of a file. You won’t always want exactly 10 lines, so you can change head’s behavior by giving it a command-line flag. For instance, head -n 5 hi.csv prints the first 5 lines of hi.csv. A flag’s name usually indicates its purpose (-n is meant to signal “number of lines”). Note: it’s considered good style to put all flags before any file names.
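To see the effect of the -n flag, the sketch below builds a 12-line sample file and compares the default output with a shortened one. The file contents are made up, and wc -l (which counts lines) is used here only to check the result.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Build a 12-line sample file named hi.csv (contents are made up).
i=1
while [ "$i" -le 12 ]; do
    echo "row $i"
    i=$((i + 1))
done > hi.csv

head hi.csv          # prints row 1 through row 10
head -n 5 hi.csv     # prints row 1 through row 5

# Counting the lines confirms the default of 10.
count=$(head hi.csv | wc -l)
echo $count          # prints: 10
```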

cut: select columns from a file. For instance,

cut -f 2-5,8 -d , values.csv

means “select columns 2 through 5 and column 8, using a comma as the separator.” cut uses -f (meaning “fields”) to specify columns and -d (meaning “delimiter”) to specify the separator. You need to specify the latter because some files use spaces, tabs, or colons to separate columns. Note: cut is a simple-minded command. In particular, it doesn’t understand quoted strings, so a comma inside a quoted value is still treated as a separator.
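The sketch below runs the cut command on a small file and then shows the quoted-string limitation in action. Both sample files, including the name everyone.csv and their contents, are invented for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A nine-column sample file (contents are made up).
printf '%s\n' 'c1,c2,c3,c4,c5,c6,c7,c8,c9' 'a,b,c,d,e,f,g,h,i' > values.csv

cut -f 2-5,8 -d , values.csv
# prints:
# c2,c3,c4,c5,c8
# b,c,d,e,h

# A quoted value that contains a comma confuses cut:
printf '%s\n' 'Name,Age' '"Johel,Ranjit",28' > everyone.csv

cut -f 2 -d , everyone.csv
# prints:
# Age
# Ranjit"
```

In the second file the intended answer for the data row is 28, but cut splits on every comma and returns the second half of the quoted name instead. For files with quoted fields, a tool that parses CSV properly is a safer choice.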

grep: select lines containing specific values. For example, grep bicuspid seasonal/winter.csv prints lines from winter.csv that contain “bicuspid.”

Some common flags of grep:

  • -c: print a count of matching lines rather than the lines themselves
  • -h: do not print the names of files when searching multiple files
  • -i: ignore case (e.g., treat “Regression” and “regression” as matches)
  • -l: print the names of files that contain matches, not the matches
  • -n: print line numbers for matching lines
  • -v: invert the match, i.e., only show lines that don’t match
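Each of the flags above can be tried on a small sample. This is a minimal sketch; the contents of winter.csv and the second file, spring.csv, are made up for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir seasonal

# Two small sample files (contents are made up).
printf '%s\n' 'Date,Tooth' '2017-01-03,bicuspid' '2017-01-05,incisor' \
    '2017-01-21,Bicuspid' > seasonal/winter.csv
printf '%s\n' 'Date,Tooth' '2017-03-01,canine' > seasonal/spring.csv

grep bicuspid seasonal/winter.csv         # 2017-01-03,bicuspid
grep -c bicuspid seasonal/winter.csv      # 1
grep -i -c bicuspid seasonal/winter.csv   # 2 (also matches "Bicuspid")
grep -n incisor seasonal/winter.csv       # 3:2017-01-05,incisor
grep -v -i bicuspid seasonal/winter.csv   # the header line and the incisor line

# With several files, -l names the files that match and -h hides file names.
grep -l bicuspid seasonal/winter.csv seasonal/spring.csv   # seasonal/winter.csv
grep -h canine seasonal/winter.csv seasonal/spring.csv     # 2017-03-01,canine
```

Flags can be combined, as in -i -c above, which counts matches regardless of case.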