# 3.2 Searching and Extracting Data from Files

**Weight:** 3

**Description:** Search and extract data from files in the home directory.

**Key Knowledge Areas:**

* Command line pipes
* I/O redirection
* Basic Regular Expressions using ., \[ ], \*, and ?

**The following is a partial list of the used files, terms and utilities:**

* grep
* less
* cat, head, tail
* sort
* cut
* wc

## cat <a href="#cat" id="cat"></a>

The `cat` command is one of the most frequently used commands on Linux and other Unix-like operating systems. Its name comes from “concatenate”, and it is primarily used to read, display, and concatenate text files.

* Primarily used to read and display the contents of files on the terminal.
* Can concatenate multiple files and display them as a single continuous output.
* Combined with output redirection (`>` or `>>`), it can create new files or append data to existing ones.
* Useful for quick file inspection and merging without opening a text editor.

**View the Contents of a Single File with cat**

The most basic use of `cat` is to display the contents of a file on the terminal. To do so, simply pass the file name as an argument:

**Syntax:**

```
cat file_name
```

**Example:** to display the contents of a file named `output.txt`:

```
cat output.txt
```
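
The bullets above also mention concatenating several files and creating or appending to files. Here is a minimal sketch of those uses; `part1.txt`, `part2.txt`, and `combined.txt` are hypothetical file names created just for the demonstration:

```shell
# Create two small sample files (hypothetical names)
printf 'first\n' > part1.txt
printf 'second\n' > part2.txt

# Concatenate both files into one continuous output on the terminal
cat part1.txt part2.txt

# Save the concatenation to a new file, then append part1.txt once more
cat part1.txt part2.txt > combined.txt
cat part1.txt >> combined.txt

# -n numbers every output line
cat -n combined.txt
```

After these commands, `combined.txt` contains the three lines `first`, `second`, and `first`.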

## streams <a href="#streams" id="streams"></a>

A stream is nothing more than a sequence of bytes that is passed from one file, device, or program to another.

Streams are a fundamental concept in Linux for handling input, output, and communication between processes. They give programs a uniform interface for reading and writing data, whether it comes from a file, a device such as the terminal, or another program.

Every command run from the shell has three standard streams:

* **standard input stream (stdin)**, which provides input to commands.
* **standard output stream (stdout)**, which displays output from commands.
* **standard error stream (stderr)**, which displays error output from commands.

Each stream is also identified by a number, its file descriptor: **stdin (0)**, **stdout (1)**, and **stderr (2)**.

<figure><img src="/files/NULgDQXIHq7GJBC3d3AZ" alt=""><figcaption></figcaption></figure>
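
To see that stdout and stderr really are separate streams, here is a small sketch: the `{ ...; }` group writes one line to each stream, and the file descriptor numbers send them to different places. `out.txt` and `err.txt` are hypothetical file names:

```shell
# Write one line to stdout (fd 1) and one line to stderr (fd 2, via >&2)
{ echo "to stdout"; echo "to stderr" >&2; } 1> out.txt 2> err.txt

cat out.txt   # to stdout
cat err.txt   # to stderr

# 0< makes out.txt the standard input (fd 0) of wc; a plain < means the same
wc -l 0< out.txt
```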

## I/O Redirection

The shell provides redirection operators for each stream. They can be used to feed a file to a command's standard input, or to write its standard output or standard error to a file. If you write to a file that does not exist, the shell creates it before writing.

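Operators with a single angle bracket *overwrite* the destination’s existing contents.

**Overwrite**

* **>** - standard output
* **2>** - standard error

Operators with a double angle bracket *do not* overwrite the destination’s existing contents; they append to it.

**Append**

* **>>** - standard output
* **2>>** - standard error

Input redirection works in the other direction: it supplies a command's standard input, so it neither overwrites nor appends anything.

**Input**

* **<** - standard input read from a file
* **<<** - standard input read from a *here document*: the lines that follow the command, up to a delimiter word you choose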

Examples:

```
[payam@earth Working]$ echo "hello"
hello
[payam@earth Working]$ echo "hello" > output.txt
[payam@earth Working]$ cat output.txt 
hello
[payam@earth Working]$ echo "how are you?" > output.txt 
[payam@earth Working]$ cat output.txt 
how are you?
[payam@earth Working]$ echo "I'm fine, thank you" >> output.txt 
[payam@earth Working]$ cat output.txt 
how are you?
I'm fine, thank you
[payam@earth Working]$ 
```

```
[payam@earth Working]$ cat Blahblah.txt
cat: Blahblah.txt: No such file or directory
[payam@earth Working]$ cat Blahblah.txt > result.txt
cat: Blahblah.txt: No such file or directory
[payam@earth Working]$ cat result.txt 
[payam@earth Working]$ cat Blahblah.txt > result.txt 2>error.txt
[payam@earth Working]$ cat error.txt 
cat: Blahblah.txt: No such file or directory
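```

Notice that `> result.txt` only redirects standard output: the error message still appears on the terminal, and `result.txt` is created empty. Adding `2>error.txt` sends the error message (standard error) to its own file instead.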
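
The input operators and `2>>` are not shown in the examples above. A minimal sketch, using hypothetical file names (`fruits.txt`, `greeting.txt`, `errors.log`) and `EOF` as the here-document delimiter (any word works):

```shell
# < : use a file as the standard input of sort
printf 'banana\napple\ncherry\n' > fruits.txt
sort < fruits.txt

# << : a here document; every line up to the delimiter (EOF) becomes
# the stdin of cat, and > saves the result in greeting.txt
cat << EOF > greeting.txt
hello
how are you?
EOF

# 2>> : append error messages instead of overwriting them
rm -f errors.log                  # start from an empty log for this demo
ls no_such_file 2>> errors.log
ls no_such_file 2>> errors.log
wc -l < errors.log                # 2: one error line per run
```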

#### piping with | <a href="#piping-with-or" id="piping-with-or"></a>

A pipe is a form of redirection (the transfer of standard output to some other destination) used in Linux and other Unix-like operating systems to send the output of one command, program, or process to another for further processing. Unix/Linux systems let you connect the stdout of one command to the stdin of another with the pipe character **'|'**, which on most US keyboards shares the backslash `\` key and is typed with Shift.

The pipe is used to combine two or more commands: the output of one command becomes the input of the next, whose output can in turn become the input of the command after it, and so on. You can think of it as a temporary connection between two or more commands, programs, or processes. The command-line programs that do the further processing are referred to as filters.

Because the commands are connected directly, they run simultaneously and data flows between them continuously, rather than passing through temporary text files or the display screen. Pipes are unidirectional, i.e. **data flows from left to right through the pipeline.**

```
command1 | command2
```

Either command can have options or arguments. We can also use | to redirect the output of the second command in the pipeline to a third command, and so on.

```
command1 | command2 | command3 | command4 | ...
```
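
Here is a small, runnable three-stage pipeline as a sketch. `printf` produces a few made-up words, `grep` keeps only the lines containing the letter `a`, and `wc -l` counts how many lines are left:

```shell
# printf -> grep -> wc: each command's stdout feeds the next command's stdin
printf 'cat\ndog\nbat\nowl\n' | grep a | wc -l
```

This prints `2` (for `cat` and `bat`), without creating any temporary file along the way.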

{% hint style="success" %}

#### View Kernel Messages in Linux

The **dmesg** command (sometimes expanded as “driver message” or “display message”) is used to examine the kernel ring buffer and print the kernel's messages. Its output contains the messages produced by the kernel and its device drivers, for example during boot.

```
[payam@earth Working]$ dmesg 
[    0.000000] Linux version 5.14.0-611.9.1.el9_7.x86_64 (mockbuild@iad1-prod-build001.bld.equ.rockylinux.org) (gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-11), GNU ld version 2.35.2-67.el9) #1 SMP PREEMPT_DYNAMIC Tue Nov 25 17:53:21 UTC 2025
[    0.000000] The list of certified hardware and cloud instances for Enterprise Linux 9 can be viewed at the Red Hat Ecosystem Catalog, https://catalog.redhat.com.
[    0.000000] Command line: BOOT_IMAGE=(hd0,gpt1)/vmlinuz-5.14.0-611.9.1.el9_7.x86_64 root=/dev/mapper/vg--os-lv--root ro resume=/dev/mapper/vg--os-lv--swap rd.lvm.lv=vg-os/lv-root rd.lvm.lv=vg-os/lv-swap rhgb quiet crashkernel=1G-2G:192M,2G-64G:256M,64G-:512M
[    0.000000] x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks
[    0.000000] BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009efff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009f000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000039f98fff] usable
[    0.000000] BIOS-e820: [mem 0x0000000039f99000-0x000000003a898fff] reserved
[    0.000000] BIOS-e820: [mem 0x000000003a899000-0x00000000434aefff] usable
[    0.000000] BIOS-e820: [mem 0x00000000434af000-0x00000000452fefff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000452ff000-0x0000000045b2efff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x0000000045b2f000-0x0000000045bfefff] ACPI data
[    0.000000] BIOS-e820: [mem 0x0000000045bff000-0x0000000045bfffff] usable
[    0.000000] BIOS-e820: [mem 0x0000000045c00000-0x0000000049ffffff] reserved
[    0.000000] BIOS-e820: [mem 0x000000004a200000-0x000000004a3fffff] reserved
[    0.000000] BIOS-e820: [mem 0x000000004b000000-0x00000000503fffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fe010000-0x00000000fe010fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed20000-0x00000000fed7ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x00000004afbfffff] usable

```

{% endhint %}

Now let's pipe the output of `dmesg` into the input of the `less` command:

```
dmesg | less
```

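This way the kernel messages do not scroll past on the terminal: `less` lets you page through them and use its own commands to read the log, for example `/` to search and `q` to quit.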

## Filters

*Filters* are a class of programs that are commonly used with output piped from another program. Many of them are also useful on their own, but they illustrate piping behavior especially well.

* **grep** - prints the lines that match the pattern passed to grep.
* **head** - displays the first few lines of one or more text files.
* **tail** - displays the last part of a file, showing recent content such as logs or updates.
* **sort** - sorts lines of text, arranging them in a particular order.
* **wc** - counts characters, lines, and words.
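
`wc` does not get its own section here, so here is a minimal sketch of it: by default it prints the line, word, and byte counts followed by the file name, and `-l`, `-w`, and `-c` select a single count. `text.txt` is a hypothetical file:

```shell
printf 'one two\nthree\n' > text.txt

wc text.txt        # lines, words, bytes, then the file name
wc -l text.txt     # line count and file name: 2 text.txt
wc -w < text.txt   # words only: 3 (reading stdin, so no file name is printed)
wc -c < text.txt   # bytes only: 14 ("one two\n" is 8 bytes, "three\n" is 6)

# As a filter at the end of a pipeline: how many lines contain "t"?
grep t text.txt | wc -l
```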

### grep

Grep, short for “global regular expression print”, is one of the most useful tools on Linux and Unix systems. It searches text files for specific words, phrases, or patterns and prints the matching lines on your screen.

`grep` is useful when you need to quickly find certain keywords or phrases in logs or documents. Let’s consider an example:

#### Search for a word in a file <a href="#example-1-search-for-a-word-in-a-file" id="example-1-search-for-a-word-in-a-file"></a>

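If you have a file called `notes.txt` and you want to find all lines containing the word `python`, you can use the command below. By default grep is case sensitive, so this does not match `Python`; use the `-i` option described below for that.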

```
grep "python" notes.txt
```

**Syntax:**

The basic syntax of the **`grep`** command is as follows:

```
grep [options] pattern [files]
```

* **`[options]`**: command-line flags that modify the behavior of `grep`.
* **`pattern`**: the regular expression you want to search for.
* **`[files]`**: the name of the file(s) you want to search. You can list several files to search them all at once; if none are given, `grep` reads standard input, which is what makes it useful at the end of a pipe.

#### **Commonly Used `grep` Options** <a href="#commonly-used-grep-options" id="commonly-used-grep-options"></a>

| **Option** | **What It Does**                                                                                                                                                                      | **Example Command**                                                                                    |
| ---------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ |
| **`-i`**   | **Case-insensitive search** | `grep -i "unix" myfile.txt` |
| **`-c`**   | **Count matching lines:** prints how many lines match instead of the lines themselves | `grep -c "unix" myfile.txt` |
| **`-l`**   | **List matching file names:** prints only the names of the files that contain a match | <p><code>grep -l "unix" \*</code></p><p><code>grep -l "unix" f1.txt f2.txt f3.txt f4.txt</code></p> |
| **`-w`**   | **Match whole words:** By default, grep matches the given pattern even if it is found as a substring of a longer word. The `-w` option makes it match only whole words. | `grep -w "unix" myfile.txt` |
| **`-o`**   | **Show only the matched part:** By default, grep displays the entire line that contains the match. The `-o` option makes grep display only the matched string. | `grep -o "unix" myfile.txt` |
| **`-n`**   | **Show line numbers** | `grep -n "unix" myfile.txt` |
| **`-v`**   | **Invert the match:** displays the lines that do *not* match the given pattern. | `grep -v "unix" myfile.txt` |
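
To try these options, here is a minimal sketch that creates a small hypothetical `myfile.txt` with a here document and runs several commands from the table on it. Note how case sensitivity and `-w` change which lines match:

```shell
# Hypothetical sample file; quoting 'EOF' stops the shell from expanding the text
cat > myfile.txt << 'EOF'
Unix is an operating system
Linux is a Unix-like OS
unixlike tools are handy
Windows is another OS
EOF

grep "unix" myfile.txt         # case sensitive: only "unixlike tools are handy"
grep -i "unix" myfile.txt      # -i: the first three lines
grep -c -i "unix" myfile.txt   # -c: prints 3
grep -w -i "unix" myfile.txt   # -w: lines 1 and 2; "unixlike" is not a whole word
grep -o -i "unix" myfile.txt   # -o: Unix, Unix, unix (one match per line)
grep -n "OS" myfile.txt        # -n: lines 2 and 4, prefixed with "2:" and "4:"
grep -v -i "unix" myfile.txt   # -v: only "Windows is another OS"
```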

### Regular Expressions

Regular expressions (*regex* or *regexp* for short) are patterns built from ordinary and special characters that help us search for data and match complex text patterns. They are most commonly used with Linux commands such as `grep`, `sed`, `awk`, and `vi`.

The following are some basic regular expressions:

| Symbol  | Description |
| ------- | ----------- |
| `.`     | Wildcard: matches any single character except a newline. |
| `^`     | Matches the start of a line. |
| `$`     | Matches the end of a line. |
| `*`     | Matches zero or more occurrences of the preceding character, i.e. it may appear any number of times, including not at all. |
| `\`     | Escapes the following character so that it is matched literally, e.g. `\.` matches a real dot. |
| `()`    | Groups part of a pattern so that an operator such as `*` applies to the whole group. Plain `( )` works with `grep -E`; basic regular expressions use `\(` and `\)`. |
| `?`     | Matches zero or one occurrence of the preceding character, i.e. makes it optional. It is an extended regex operator, so use it with `grep -E` (GNU grep also accepts `\?` in a basic regex). |
| `[ ]`   | Matches any one of the characters inside the brackets, e.g. `[abc]`. |
| `[ - ]` | Matches any one character in a range, e.g. `[a-z]` or `[0-9]`. |
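
A small sketch to see these symbols in action with `grep`. The word list in the hypothetical file `words.txt` is made up for the demonstration; note that `?` needs `-E`:

```shell
printf 'cat\ncut\ncot\ncoot\nct\nbat\nscatter\n' > words.txt

grep "c.t" words.txt        # . any one char: cat, cut, cot, scatter
grep "co*t" words.txt       # * zero or more o: cot, coot, ct
grep -E "co?t" words.txt    # ? zero or one o: cot, ct
grep "^c[ao]t$" words.txt   # ^ and $ anchor the whole line, [ao] is one of a/o: cat, cot
grep "^[a-c]" words.txt     # [a-c] range at line start: every word except scatter
```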

{% hint style="info" %}
**Globbing and Regex: So Similar, So Different**

Beginners sometimes confuse **wildcards** (globbing) with **regular expressions** when using grep, but they are not the same. **Wildcards** are a feature provided by the shell to expand file names, whereas **regular expressions** are a text-filtering mechanism intended for use with utilities like grep, sed, and awk.
{% endhint %}
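
The difference is easiest to see with `?`, which means something different in each. A minimal sketch; the empty files created by `touch` are hypothetical:

```shell
# Create a few empty files to match against
touch file1.txt file2.txt file10.txt

# Globbing: the shell expands file?.txt into FILE NAMES before ls runs.
# Here ? stands for exactly one character, so file10.txt is not listed.
ls file?.txt

# Regex: the quoted pattern reaches grep untouched and is matched against TEXT.
# Here ? means "zero or one e" and . means "any one character",
# so all three input lines match.
printf 'file.txt\nfileXtxt\nfil.txt\n' | grep -E "file?.txt"
```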

`grep` supports regex for advanced searching:

| Command                    | Description                             |
| -------------------------- | --------------------------------------- |
| `grep "^unix" myfile.txt`  | **Match lines starting with a string**  |
| `grep "os$" myfile.txt`    | **Match lines ending with a string**    |

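> **Quoting:** Always put your regular expressions, basic or extended, between quotes (the examples here use double quotes). Otherwise characters such as `*`, `?`, `[ ]`, or `|` might be interpreted by the shell before grep sees them, giving different results. Single quotes protect the pattern even more strongly, because the shell expands nothing inside them.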

To use extended regular expression (ERE) operators such as `?`, `|`, and `( )` without backslashes, run `grep` with the `-E` option, which treats the pattern as an extended regular expression.

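| Command                                         | Matched parts (highlighted when grep uses color) |
| ----------------------------------------------- | ------------------------------------------------ |
| `echo "aa ab ba aaa bbb AB BA" \| grep -E "a*b"` | `ab`, the `b` in `ba`, and each `b` in `bbb` |
| `echo "aa ab ba aaa bbb AB BA" \| grep -E "a.b"` | `a b`: the last `a` of `aaa`, the space, and the first `b` of `bbb` |
| `echo "aa ab ba aaa bbb AB BA" \| grep -E "a?b"` | `ab`, the `b` in `ba`, and each `b` in `bbb` |

In each case grep prints the whole line, because the line contains at least one match; only the color highlighting (or the `-o` option) shows which parts matched. `AB` and `BA` never match, because the search is case sensitive.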
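
Adding `-o` makes grep print each matched part on its own line, which is a handy way to check the table above. A short sketch; the extra input `aab` is made up to show where `*` and `?` differ:

```shell
line="aa ab ba aaa bbb AB BA"

echo "$line" | grep -o -E "a*b"   # ab, b, b, b, b
echo "$line" | grep -o -E "a.b"   # a b
echo "$line" | grep -o -E "a?b"   # ab, b, b, b, b

# * takes as many a's as it can, ? takes at most one
echo "aab" | grep -o -E "a*b"     # aab
echo "aab" | grep -o -E "a?b"     # ab
```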

{% hint style="success" %}

### egrep <a href="#egrep" id="egrep"></a>

**egrep** is a pattern searching command which belongs to the family of grep functions. It works the same way as **`grep -E`** does. It treats the pattern as an extended regular expression and prints out the lines that match the pattern. If there are several files with the matching pattern, it also displays the file names for each line.

```
egrep [ options ] 'PATTERN' files 
```

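**Options:** Most of the options for this command are the same as for grep.

So instead of using `grep -E` in the examples above, we can simply use `egrep`. Note that recent GNU grep releases (3.8 and later) mark `egrep` as obsolescent and print a warning when it is used, so `grep -E` is the more future-proof habit.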
{% endhint %}

### Head and Tail Commands

<figure><img src="/files/HgdwVxrBduvy5e7DmEJH" alt=""><figcaption></figcaption></figure>

### head

The head command in Linux is used to display the first few lines of one or more text files directly in the terminal.

* The head command reads a file and prints the top portion (default is the first 10 lines) to standard output.
* It’s commonly used when you want to quickly preview the beginning of a file without opening it in an editor.
* It supports options to specify the number of lines or bytes to display.
* You can use it with multiple files at once to view the first lines of each.

The basic `head` command displays the first 10 lines of the `sample.txt` file:

```
head sample.txt
```

**Example:**

```
[payam@earth Working]$ dmesg | head
[    0.000000] Linux version 5.14.0-611.9.1.el9_7.x86_64 (mockbuild@iad1-prod-build001.bld.equ.rockylinux.org) (gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-11), GNU ld version 2.35.2-67.el9) #1 SMP PREEMPT_DYNAMIC Tue Nov 25 17:53:21 UTC 2025
[    0.000000] The list of certified hardware and cloud instances for Enterprise Linux 9 can be viewed at the Red Hat Ecosystem Catalog, https://catalog.redhat.com.
[    0.000000] Command line: BOOT_IMAGE=(hd0,gpt1)/vmlinuz-5.14.0-611.9.1.el9_7.x86_64 root=/dev/mapper/vg--os-lv--root ro resume=/dev/mapper/vg--os-lv--swap rd.lvm.lv=vg-os/lv-root rd.lvm.lv=vg-os/lv-swap rhgb quiet crashkernel=1G-2G:192M,2G-64G:256M,64G-:512M
[    0.000000] x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks
[    0.000000] BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009efff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009f000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000039f98fff] usable
[    0.000000] BIOS-e820: [mem 0x0000000039f99000-0x000000003a898fff] reserved
[    0.000000] BIOS-e820: [mem 0x000000003a899000-0x00000000434aefff] usable

```

**Syntax:**

```
head [options] [file(s)]
```

> If no file name is specified, `head` reads from standard input (stdin), which is how it works at the end of a pipe such as `dmesg | head`.

Common `head` options:

| Option   | Long-Form       | Description                                                       |
| -------- | --------------- | ----------------------------------------------------------------- |
| **`-n`** | **`--lines`**   | show the specified number of lines                                |
| **`-c`** | **`--bytes`**   | show the specified number of bytes                                |
| **`-v`** | **`--verbose`** | always print a header with the file name                          |
| **`-q`** | **`--quiet`**   | never print file name headers, even when showing multiple files   |

Example:

```
[payam@earth Working]$ dmesg | head -n 5
[    0.000000] Linux version 5.14.0-611.9.1.el9_7.x86_64 (mockbuild@iad1-prod-build001.bld.equ.rockylinux.org) (gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-11), GNU ld version 2.35.2-67.el9) #1 SMP PREEMPT_DYNAMIC Tue Nov 25 17:53:21 UTC 2025
[    0.000000] The list of certified hardware and cloud instances for Enterprise Linux 9 can be viewed at the Red Hat Ecosystem Catalog, https://catalog.redhat.com.
[    0.000000] Command line: BOOT_IMAGE=(hd0,gpt1)/vmlinuz-5.14.0-611.9.1.el9_7.x86_64 root=/dev/mapper/vg--os-lv--root ro resume=/dev/mapper/vg--os-lv--swap rd.lvm.lv=vg-os/lv-root rd.lvm.lv=vg-os/lv-swap rhgb quiet crashkernel=1G-2G:192M,2G-64G:256M,64G-:512M
[    0.000000] x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks
[    0.000000] BIOS-provided physical RAM map:

```
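
A sketch of the options from the table, using `seq` to generate two hypothetical numbered files so the results are easy to check:

```shell
seq 1 20 > numbers.txt      # lines 1..20
seq 100 105 > more.txt      # lines 100..105

head -n 3 numbers.txt               # 1, 2, 3
head -c 4 numbers.txt               # first 4 bytes: "1\n2\n", i.e. the lines 1 and 2
head -n 2 numbers.txt more.txt      # each file preceded by a "==> name <==" header
head -q -n 2 numbers.txt more.txt   # 1, 2, 100, 101 without headers
```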

### tail

The `tail` command in Linux displays the last part of a file, showing recent content such as logs or updates.

* By default, it shows the last 10 lines of a file.
* Commonly used for monitoring log files and debugging.
* You can customize the number of lines displayed using the `-n` option.
* Useful for viewing the most recent entries without opening the entire file.

Without any option, it displays only the last 10 lines of the specified file:

```
tail myfile.txt
```

Another example:

```
[payam@earth Working]$ dmesg | tail 
[  106.485984] Bluetooth: RFCOMM socket layer initialized
[  106.485996] Bluetooth: RFCOMM ver 1.11
[  122.145191] rfkill: input handler enabled
[  129.412381] rfkill: input handler disabled
[  130.176701] exFAT-fs (sdb): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[ 1370.338863] input: INK'D+ WIRELESS (AVRCP) as /devices/virtual/input/input20
[ 4323.759520] input: INK'D+ WIRELESS (AVRCP) as /devices/virtual/input/input21
[ 6955.285011] input: INK'D+ WIRELESS (AVRCP) as /devices/virtual/input/input22
[ 7067.130053] input: INK'D+ WIRELESS (AVRCP) as /devices/virtual/input/input23
[13692.287103] input: INK'D+ WIRELESS (AVRCP) as /devices/virtual/input/input24

```

**Syntax:**

```
tail [OPTION]... [FILE]...
```

Common `tail` options:

| Short Form | Long Form                           | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| ---------- | ----------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **`-c`**   | **`--bytes=[+]NUM`**                | Shows the last **`NUM`** bytes of a file. With a leading **`+`**, output starts at byte **`NUM`** of each file. |
| **`-f`**   | **`--follow[={name\|descriptor}]`** | <p>Monitors file for changes and outputs new data as the file grows. When no value is specified after <strong><code>--follow=</code></strong>, <strong><code>descriptor</code></strong> is used as the default value. This means that the update mode continues to run even when the file is renamed or moved.<br>Specify the <strong><code>--max-unchanged-stats=N</code></strong> argument to reopen a <strong><code>\[file]</code></strong> that has not changed size after <strong><code>N</code></strong> (default 5) iterations to check if it has been unlinked or renamed.<br>Specify the <strong><code>--pid=PID</code></strong> argument to exit <strong><code>tail</code></strong> after the process with the <strong><code>PID</code></strong> process ID terminates.</p> |
| **`-F`**   | **`--follow=name --retry`**         | Instructs **`tail`** to keep updating the output even if the original file is removed during log rotation and replaced by a new one with the same name. |
| **`-n`**   | **`--lines=[+]NUM`**                | Shows the last **`NUM`** lines instead of the default 10. Using **`-n +NUM`** causes the output to start with the line **`NUM`**.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| **`-q`**   | **`--quiet, --silent`**             | Omits the file names from the output, displaying only the contents.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| **`-s`**   | **`--sleep-interval=N`**            | Used in combination with **`-f`**. Instructs **`tail`** to wait for **`N`** seconds (default 1) between iterations.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| **`-v`**   | **`--verbose`**                     | Makes **`tail`** always print the file name before displaying the contents.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| **`-z`**   | **`--zero-terminated`**             | Uses **`NUL`** as the line delimiter instead of the newline character.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
|            | **`--help`**                        | Displays the help file.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
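
A short sketch of `-n`, `-n +NUM`, and `-c`, again using `seq` to build a hypothetical file. The follow mode is shown commented out, because `app.log` is only a placeholder name and the command keeps running until you stop it:

```shell
seq 1 20 > numbers.txt      # hypothetical file with lines 1..20

tail -n 3 numbers.txt       # 18, 19, 20
tail -n +18 numbers.txt     # from line 18 to the end: 18, 19, 20
tail -c 3 numbers.txt       # last 3 bytes: "20\n"

# Follow a growing log and print new lines as they arrive (Ctrl+C to stop).
# app.log is a hypothetical file name.
# tail -F app.log
```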

### sort

The `sort` command prints the lines of its input files (or of standard input), concatenated together, in sorted order. By default, fields are separated by the transition from non-blank to blank characters, and the entire line is used as the sort key. Note that `sort` does not change the files themselves. It only writes the sorted result to standard output, which you can redirect to a file (or write with the `-o` option).

**Syntax**

```
sort [OPTION]... [FILE]...
```
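Because `sort` only writes its result to standard output, the input file is left untouched. A minimal sketch, assuming hypothetical scratch files named `names.txt` and `sorted.txt` in the current directory. `LC_ALL=C` is used only to make the ordering predictable across systems:

```shell
# Create a small, hypothetical input file
printf 'pear\napple\nfig\n' > names.txt

# Print the sorted lines; names.txt itself is not modified
LC_ALL=C sort names.txt          # apple, fig, pear (one per line)

# Save the result to a new file with redirection...
LC_ALL=C sort names.txt > sorted.txt

# ...or overwrite the input with -o, which is allowed even when FILE is also an input.
# Do NOT use "sort names.txt > names.txt": the shell truncates the file before sort reads it.
LC_ALL=C sort -o names.txt names.txt
cat names.txt                    # apple, fig, pear (one per line)
```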

**Example:**

```
[payam@earth Working]$ cat 1.txt 
D 1
d 1
c 2
C 2
A 3
B 4
f 14

[payam@earth Working]$ sort 1.txt 
A 3
B 4
c 2
C 2
d 1
D 1
f 14

```

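If a file has lines beginning with both uppercase and lowercase letters, their relative order depends on the locale (set by `LC_ALL`, `LC_COLLATE` or `LANG`): in the C locale all uppercase letters sort before lowercase ones, while many UTF-8 locales interleave them, as in the output above. The `-f` command line option tells `sort` to ignore case by folding lowercase letters to uppercase before comparing: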

```
[payam@earth Working]$ sort -f 1.txt 
A 3
B 4
C 2
c 2
D 1
d 1
f 14
```
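The locale dependence can be seen by forcing the C locale, where characters are compared by byte value, so every uppercase ASCII letter comes before every lowercase one. The input here is made up for illustration:

```shell
# C locale: uppercase letters sort before all lowercase letters
printf 'c\nC\nb\nB\n' | LC_ALL=C sort       # B, C, b, c (one per line)

# With -f, case is folded before comparing, so b/B and c/C become neighbours;
# ties are then broken by comparing the whole lines byte by byte
printf 'c\nC\nb\nB\n' | LC_ALL=C sort -f    # B, b, C, c (one per line)
```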

The `-n` option sorts the contents numerically. You can also sort a file based on its *n*th column with the `-k` *n* option (here `-k2`, which starts the sort key at the second column):

```
[payam@earth Working]$ sort -n -k2 1.txt 
d 1
D 1
c 2
C 2
A 3
B 4
f 14

```
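Without `-n`, numbers are compared as text, character by character. A small sketch with made-up input showing the difference. It also uses `-k2,2`, which limits the key to the second field only (plain `-k2` extends the key from field 2 to the end of the line):

```shell
# Text comparison: "100" sorts before "9" because '1' < '9'
printf '10\n9\n100\n' | LC_ALL=C sort       # 10, 100, 9
# Numeric comparison
printf '10\n9\n100\n' | LC_ALL=C sort -n    # 9, 10, 100

# Sort numerically on the second blank-separated field only
printf 'x 10 b\ny 9 a\nz 100 c\n' | LC_ALL=C sort -n -k2,2   # y 9 a, x 10 b, z 100 c
```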

Use `-r` to reverse the result of comparisons. Other common options of the `sort` command:

| **Short option form** | **Long option form**                  | **Description**                                                                                                                                                         |
| --------------------- | ------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **`-b`**              | **`--ignore-leading-blanks`**         | Causes **`sort`** to ignore leading blanks.                                                                                                                             |
| **`-d`**              | **`--dictionary-order`**              | Causes **`sort`** to consider only blanks and alphanumeric characters.                                                                                                  |
| **`-f`**              | **`--ignore-case`**                   | Ignores the default case sorting rule and changes all lowercase letters to uppercase before comparison.                                                                 |
| **`-M`**              | **`--month-sort`**                    | Sorts lines according to months (Jan-Dec).                                                                                                                              |
| **`-h`**              | **`--human-numeric-sort`**            | Compares human-readable numbers (e.g., 2K 1G).                                                                                                                          |
| **`-n`**              | **`--numeric-sort`**                  | Compares data according to string numerical values.                                                                                                                     |
| **`-R`**              | **`--random-sort`**                   | Sorts data by a random hash of keys but groups identical keys together.                                                                                                 |
| **`-r`**              | **`--reverse`**                       | Reverses the comparison results.                                                                                                                                        |
|                       | **`--sort=WORD`**                     | Sort data according to the specified **`WORD`**: general-numeric **`-g`**, human-numeric **`-h`**, month **`-M`**, numeric **`-n`**, random **`-R`**, version **`-V`**. |
| **`-c`**              | **`--check, --check=diagnose-first`** | Checks if the input is already sorted but doesn't sort it.                                                                                                              |
|                       | **`--debug`**                         | Annotates the part of the line used for sorting.                                                                                                                        |
| **`-k`**              | **`--key=KEYDEF`**                    | Sort data using the specified **`KEYDEF`**, which gives the key location and type.                                                                                      |
| **`-m`**              | **`--merge`**                         | Causes **`sort`** to merge already sorted files.                                                                                                                        |
| **`-o`**              | **`--output=FILE`**                   | Writes the output to **`FILE`** instead of standard output.                                                                                                             |
| **`-t`**              | **`--field-separator=SEP`**           | Uses the specified **`SEP`** separator instead of non-blank to blank transition.                                                                                        |
| **`-z`**              | **`--zero-terminated`**               | Causes sort to use **`NUL`** as the line delimiter instead of the newline character.                                                                                    |
|                       | **`--help`**                          | Displays the help file with full options list and exits.                                                                                                                |
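Several of these options are usually combined. The following sketch uses hypothetical comma-separated data (modeled on the fruit file at the end of this page, saved under the made-up name `fruit.csv`) and sorts it by its second field in reverse order:

```shell
# Hypothetical CSV data in NAME,COLOR,SIZE form
printf '%s\n' apple,red,medium lemon,yellow,medium grape,green,small banana,yellow,medium > fruit.csv

# -t,    : use a comma as the field separator
# -k2,2  : the sort key is field 2 (COLOR) only
# -r     : reverse the comparison (this also applies to the whole-line tie-break)
# -o     : write the result to sorted.csv instead of standard output
LC_ALL=C sort -t, -k2,2 -r -o sorted.csv fruit.csv
cat sorted.csv
# lemon,yellow,medium
# banana,yellow,medium
# apple,red,medium
# grape,green,small

# -c exits with a non-zero status if the file is not sorted by the same rules
LC_ALL=C sort -c -t, -k2,2 -r sorted.csv && echo "sorted.csv is sorted"
```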

### cut

The `cut` command is a command-line utility that cuts sections from each line of its input files and writes the result to standard output. It can select parts of a line by byte **position**, by **character** position, or by **delimiter**-separated field.

**Syntax:**

```
cut OPTION... [FILE]...
```

> **Note**: If `FILE` is not specified, **`cut`** reads from standard input (stdin).

#### **cut by byte position:**

```
[payam@earth Working]$ echo "linux" | cut -b 1
l
[payam@earth Working]$ echo "linux" | cut -b 1,5
lx
[payam@earth Working]$ echo "linux" | cut -b 1-4
linu
```
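A byte `LIST` can also use open-ended ranges: `N-` means from position N to the end of the line, and `-M` means from the start of the line up to position M. Ranges and single positions can be mixed with commas. A quick sketch on the same word:

```shell
echo "linux" | cut -b 3-      # nux  (byte 3 to the end)
echo "linux" | cut -b -2      # li   (bytes 1 to 2)
echo "linux" | cut -b 1,3-4   # lnu  (byte 1, then bytes 3 to 4)
```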

#### **cut by character:**

```
[payam@earth Working]$  echo '♣foobar' | cut -c 1,7
♣r
[payam@earth Working]$  echo '♣foobar' | cut -c 5-7
bar
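```

> **Note**: This output assumes a `cut` that understands multibyte characters. Upstream GNU coreutils `cut` has traditionally treated `-c` the same as `-b`, so on such systems a multibyte character like ♣ (3 bytes in UTF-8) may be split.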

#### **cut based on a delimiter:**

<figure><img src="/files/EMOgIsmpZOzIqdjJM9hK" alt=""><figcaption></figcaption></figure>

To cut by delimiter, use the **`-d`** option. It is normally combined with the **`-f`** option, which specifies the field(s) to print. In the examples below, each line of `1.txt` has the form `1:a,w`, so either `:` or `,` can serve as the delimiter:

```
[payam@earth Working]$ cut 1.txt -d: -f1
1
2
3
4
[payam@earth Working]$ cut 1.txt -d: -f2
a,w
b,x
c,y
d,z
[payam@earth Working]$ cut 1.txt -d, -f1
1:a
2:b
3:c
4:d
[payam@earth Working]$ cut 1.txt -d, -f2
w
x
y
z
```
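The same technique works on any delimited data. A sketch using a single `/etc/passwd`-style line (fields separated by `:`) supplied through a pipe; field lists accept commas and ranges just like byte lists:

```shell
line='root:x:0:0:root:/root:/bin/bash'

# Fields 1 and 7: the user name and the login shell
echo "$line" | cut -d: -f1,7    # root:/bin/bash

# Field 6 to the end of the line
echo "$line" | cut -d: -f6-     # /root:/bin/bash

# On a real system, "cut -d: -f1 /etc/passwd" lists all user names
```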

#### Options Available in cut Command <a href="#syntax-of-cut-command" id="syntax-of-cut-command"></a>

Here is a list of the most commonly used options with the Linux cut command:

| Option                | Description                                                                                         |
| --------------------- | --------------------------------------------------------------------------------------------------- |
| -b, --bytes=LIST      | Selects only the bytes specified in `LIST` (e.g., `-b 1-3,7`).                                      |
| -c, --characters=LIST | Selects only the characters specified in `LIST` (e.g., `-c 1-3,7`).                                 |
| -d, --delimiter=DELIM | Uses `DELIM` as the field delimiter character instead of the tab character.                         |
| -f, --fields=LIST     | Selects only the fields specified in `LIST`, separated by the delimiter character (default is tab). |
| -n                    | Do not split multibyte characters (GNU `cut` accepts this option but ignores it).                   |
| --complement          | Invert the selection of fields/characters. Print the fields/characters not selected.                |
| --output-delimiter=STRING | Uses `STRING` as the output delimiter instead of the input delimiter.                           |
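A brief sketch of the last two options, which are GNU `cut` extensions, using a hypothetical CSV line:

```shell
row='apple,red,medium'

# Print every field except field 2
echo "$row" | cut -d, -f2 --complement                 # apple,medium

# Select fields 1 and 3, joining them with " | " in the output
echo "$row" | cut -d, -f1,3 --output-delimiter=' | '   # apple | medium
```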

### wc

**wc** (short for **word count**) is a command-line tool in Unix/Linux operating systems that counts the newlines, words, bytes, and characters in each file given as a ***File*** argument and writes the counts to standard output. When several files are named, it also prints a total for all of them.

When you specify a ***File*** parameter, the **wc** command prints the file name alongside the requested counts. If you do not specify a file, it reads standard input and prints only the counts, without a file name. For example:

```
[payam@earth Working]$ wc /etc/inittab 
 16  76 490 /etc/inittab
```

The three numbers shown above are **16** (number of **lines**), **76** (number of **words**, *i.e. sequences of characters delimited by whitespace*) and **490** (number of **bytes**) of the file.

**Syntax:**

```
wc [OPTION]... [FILE]...
```

> If no file is specified, it will read from **standard input**, meaning you can type text manually or pipe it from another command.
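For example, piping text into `wc` (a sketch with made-up input and a hypothetical file `letters.txt`): the first input has 2 lines, 3 words and 14 bytes (8 bytes for `one two` plus its newline, 6 for `three` plus its newline). Column widths in the output can differ between systems:

```shell
printf 'one two\nthree\n' | wc       # 2 lines, 3 words, 14 bytes
printf 'one two\nthree\n' | wc -l    # 2

# Redirecting a file into stdin hides the file name in the output
printf 'a\nb\nc\n' > letters.txt
wc -l letters.txt                    # prints 3 followed by the file name
wc -l < letters.txt                  # prints only 3
```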

The following are the most common options of the **wc** command:

* `wc -l` – Prints the number of lines in a file.
* `wc -w` – Prints the number of words in a file.
* `wc -c` – Prints the number of bytes in a file.
* `wc -m` – Prints the number of characters in a file.
* `wc -L` – Prints only the length of the longest line in a file.
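The difference between `-c` and `-m` shows up with multibyte characters. A sketch assuming a UTF-8 locale (the `-m` result depends on the active locale), plus two hypothetical files, `one.txt` and `two.txt`, to show the `total` line:

```shell
# "♣" is 3 bytes in UTF-8; with the newline that makes 4 bytes
printf '♣\n' | wc -c      # 4
printf '♣\n' | wc -m      # 2 in a UTF-8 locale (1 character + newline)

# With several files, wc prints one line per file plus a "total" line
printf 'a\nb\n' > one.txt
printf 'c\n' > two.txt
wc -l one.txt two.txt     # 2 one.txt, 1 two.txt, 3 total
```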

That's all.


***

Sources:

<https://serveracademy.com/blog/the-linux-cat-command/>\
<https://www.geeksforgeeks.org/linux-unix/cat-command-in-linux-with-examples/>\
<https://www.geeksforgeeks.org/linux-unix/input-output-redirection-in-linux/>\
<https://www.geeksforgeeks.org/linux-unix/redirect-output-to-a-file-and-stdout/>\
<https://www.geeksforgeeks.org/linux-unix/dmesg-command-linux-driver-messages/>\
<https://www.geeksforgeeks.org/linux-unix/grep-command-in-unixlinux/>\
<https://www.geeksforgeeks.org/linux-unix/regular-expression-grep/>\
<https://phoenixnap.com/kb/linux-head>\
<https://www.geeksforgeeks.org/linux-unix/tail-command-linux-examples/>\
<https://phoenixnap.com/kb/linux-tail>\
<https://www.geeksforgeeks.org/linux-unix/sort-command-linuxunix-examples/>\
<https://phoenixnap.com/kb/linux-sort>\
<https://www.geeksforgeeks.org/linux-unix/cut-command-linux-examples/>

Example fruit file to practice with:

```
NAME,COLOR,SIZE
orange,orange,medium
grape,green,small
grape,red,small
apple,red,medium
banana,yellow,medium
watermelon,green,large
avocado,green,medium
lemon,yellow,medium
honeydew,green,large

```
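A sketch of how the tools in this section combine with pipes on the fruit file above. It assumes the data is saved as `fruit.txt` (a hypothetical file name); the here-document simply recreates it:

```shell
# Recreate the example fruit file (the file name is an assumption)
cat > fruit.txt <<'EOF'
NAME,COLOR,SIZE
orange,orange,medium
grape,green,small
grape,red,small
apple,red,medium
banana,yellow,medium
watermelon,green,large
avocado,green,medium
lemon,yellow,medium
honeydew,green,large
EOF

# Skip the header line, keep NAME and COLOR, then sort by COLOR and then by NAME
tail -n +2 fruit.txt | cut -d, -f1,2 | LC_ALL=C sort -t, -k2,2 -k1,1

# Count how many fruits are green (COLOR is field 2)
cut -d, -f2 fruit.txt | grep -c '^green$'    # 4

# Count the data lines, excluding the header
tail -n +2 fruit.txt | wc -l                 # 9
```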

