Mastering sed: part 1
In this tutorial, we will go over the basics of using sed, an ancient and powerful text manipulator. This tutorial assumes basic experience with bash.
We will work through real sed examples that you can copy and paste into your own terminal, each exploring a feature or fundamental idea of sed. At the end, we will summarize the fundamentals.
Print the 2nd line
Example
echo 'first
second
third
fourth
fifth' | sed -n 2p
Output:
second
Explanation
There are 3 important ideas in this example:
2
Sed applies commands to an address within the text. Here our address is 2, which means the 2nd line (sed line numbers start from 1, not 0).
p
Sed commands are single letters. Here our command is p which means print.
-n
Normally sed prints every line after the command has been applied. Since our command is itself print, that default would show all five lines, with "second" appearing twice. We use -n to tell sed not to print lines automatically, so the only output comes from p.
Altogether
Altogether, this command prints only the 2nd line.
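To see why -n matters, you can drop it and compare. This sketch generates the same five lines with printf (a shorthand for the multi-line echo above); without -n, sed prints every line automatically and p prints line 2 once more.

```shell
# Same five input lines, generated with printf instead of a multi-line echo.
# No -n: sed auto-prints all five lines, and p prints "second" a second time.
printf '%s\n' first second third fourth fifth | sed 2p
```

The output is first, second, second, third, fourth, fifth.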
Print the 2nd through 4th lines
Example
echo 'first
second
third
fourth
fifth' | sed -n 2,4p
Output:
second
third
fourth
Explanation
The only thing that’s changed from the previous example is the address.
It changed from "2" to "2,4", which means lines 2 through 4: lines 2, 3, and 4.
p is the command which means print.
-n overrides sed’s default behavior of printing every line.
Print the 2nd line, plus 2 more
Example
echo 'first
second
third
fourth
fifth' | sed -n 2,+2p
Output:
second
third
fourth
Explanation
An address can also take an offset in the 2nd position, where a,+b is equivalent to a,a+b. This offset form is a GNU sed extension rather than part of POSIX, so some other versions of sed may reject it.
The address 2,+2 is equivalent to 2,2+2 = 2,4, which is why it prints lines 2, 3, and 4.
p is the command which means print.
-n overrides sed’s default behavior of printing every line.
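A quick way to convince yourself that 2,+2 and 2,4 select the same lines is to compare their output directly. This sketch assumes GNU sed, since the +offset form is not POSIX; the variable names are just for illustration.

```shell
# Assumes GNU sed: the addr,+N address form is a GNU extension.
input=$(printf '%s\n' first second third fourth fifth)

offset=$(printf '%s\n' "$input" | sed -n '2,+2p')   # start at 2, plus 2 more
explicit=$(printf '%s\n' "$input" | sed -n '2,4p')  # lines 2 through 4

if [ "$offset" = "$explicit" ]; then
  echo "same lines"
else
  echo "different lines"
fi
```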
Print the 3rd through last lines
Example
echo 'first
second
third
fourth
fifth' | sed -n '3,$p'
Output:
third
fourth
fifth
Explanation
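$ is a special address meaning the last line. So, the address 3,$ means the 3rd through the last line. Because $ is also special to the shell, keep the sed script in single quotes; otherwise the shell may expand $p as a variable before sed ever sees it.
p is the command which means print.
-n overrides sed’s default behavior of printing every line.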
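Here are two quoting styles that both pass the literal address 3,$ through to sed: single quotes, or double quotes with the dollar sign escaped. Which one you use is mostly taste; single quotes are the simpler habit for sed scripts.

```shell
# Single quotes: the shell leaves $ alone.
printf '%s\n' first second third fourth fifth | sed -n '3,$p'

# Double quotes: escape $ so the shell does not treat $p as a variable.
printf '%s\n' first second third fourth fifth | sed -n "3,\$p"
```

Both commands print third, fourth, and fifth.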
Print all lines with an “o” and the next one
Example
echo 'first
second
third
fourth
fifth' | sed -n /o/,+1p
Output:
second
third
fourth
fifth
Explanation
An address can also be a regular expression wrapped in slashes. Unlike a line number, which selects one line, a regular expression address selects every line that matches it, not just the first. Both “second” and “fourth” match /o/, so each of them, along with the line after it, is in the range. This prints lines 2, 3, 4, and 5.
p is the command which means print.
-n overrides sed’s default behavior of printing every line.
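One subtlety is worth checking in your own terminal: a line that matches the regular expression while a range is already open is simply part of that range and does not start a new one. This sketch assumes GNU sed (for the +1 offset) and uses made-up input lines.

```shell
# Assumes GNU sed (addr,+N is a GNU extension).
# "one" matches /o/ and opens the range one..two.
# "two" also contains an o, but it closes that range instead of opening a
# new one, so "six" (the line after "two") is not printed.
printf '%s\n' one two six ten | sed -n '/o/,+1p'
```

The output is just one and two.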
Wait, isn’t this grep: a historical tangent
Wait, isn’t printing the lines that match a regular expression exactly what grep does? Why would we do this when we have grep? You’re right, but what does grep stand for? It stands for:
g - global, meaning all lines,
re - a /regularexpression/,
p - print, like here.
Huh, this looks a lot like a sed command. It is. grep and sed (and vim) share a common ancestor, ed, the original Unix text editor. g/re/p was an ed command which was later made into its own program, grep. Later on, sed was created as a way to perform ed commands on streams in a non-interactive way.
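The g/re/p connection is easy to see side by side. In this sketch, the sed form and the grep form print the same lines; the letter o is just an example pattern.

```shell
input=$(printf '%s\n' first second third fourth fifth)

# g/re/p as a sed command: print only lines matching /o/.
printf '%s\n' "$input" | sed -n '/o/p'

# The same search with the standalone grep program.
printf '%s\n' "$input" | grep 'o'
```

Both commands print second and fourth.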
Print every line
Example
echo 'first
second
third
fourth
fifth' | sed -n p
Output:
first
second
third
fourth
fifth
Explanation
If we don’t specify an address, sed uses the “zero address”, which is every line.
p is the command which means print.
-n overrides sed’s default behavior of printing every line.
As such, this command prints every line. It’s a poor way to use sed, but it illustrates the “zero address” well.
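For comparison, here are three commands that all print their input unchanged: sed -n p (explicit print on the zero address), sed with an empty script (relying only on the default printing), and cat.

```shell
# All three print first, second, third unchanged.
printf '%s\n' first second third | sed -n p   # explicit p, default printing off
printf '%s\n' first second third | sed ''     # no commands, default printing on
printf '%s\n' first second third | cat
```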
Delete the 3rd and 4th lines
Example
echo 'first
second
third
fourth
fifth' | sed 3,+1d
Output:
first
second
fifth
Explanation
3,+1 is the range which means the 3rd and 4th lines.
d is the delete command, which deletes the lines in the specified address. d, like any sed command, can accept any address range.
Since we’re not doing an explicit print, we drop the -n flag, so sed prints every line except the deleted ones.
Altogether this means delete lines 3 and 4, and then print all remaining lines.
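Because d accepts any address, a regular expression works too. This sketch deletes every line containing an o from the same five lines.

```shell
# Delete lines matching /o/ ("second" and "fourth"); the rest are auto-printed.
printf '%s\n' first second third fourth fifth | sed '/o/d'
```

The output is first, third, fifth.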
Insert text above the first line
Example
echo 'first
second
third
fourth
fifth' | sed "1i zeroth"
Output:
zeroth
first
second
third
fourth
fifth
Explanation
1 is the address, which is just the first line. i, like any sed command, can accept any address range.
i is the insert command; it inserts text as a new line before every line in the specified address.
"zeroth" is the command option, which for insert means the text to be inserted. Writing the text on the same line as i is a GNU sed extension; POSIX sed expects i\ followed by the text on the next line.
Since -n is not specified, sed prints every line.
Altogether this means insert “zeroth” before the first line, and then print every line.
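The same idea works with a regular expression address: i then inserts its text before every matching line. This sketch assumes GNU sed, which accepts the text on the same line as i; the marker text --- is arbitrary.

```shell
# Assumes GNU sed (one-line i syntax).
# Insert "---" before each line containing an o.
printf '%s\n' first second third fourth fifth | sed '/o/i ---'
```

The marker appears before second and before fourth.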
Append text after the last line
Example
echo 'first
second
third
fourth
fifth' | sed '$a last'
Output:
first
second
third
fourth
fifth
last
Explanation
$ is the address, which is just the last line.
a is the append command; it appends text as a new line after every line in the specified address.
"last" is the command option, which for append means the text to be appended.
Find and Replace only on the 2nd line
Example
echo 'find
find
find' | sed "2s/find/replace/"
Output:
find
replace
find
Explanation
2 is the address, which is just the second line. That’s why the first and third lines are unchanged.
s is the substitute (find and replace) command; it replaces the first match of the pattern on every line in the specified address.
"/find/replace/" is the option.
/ is the delimiter which needs to go in 3 places.
find is the string to find.
replace, the second string, is the text that replaces the match. The find string is actually a regular expression (a plain word like find is just the simplest case), but we’ll cover this more in next week’s post.
Since -n is not specified, this will then print all lines, including the lines which had their strings changed.
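One detail worth knowing: without extra flags, s replaces only the first match on each selected line. In this sketch, each line contains find twice, and only line 2 is addressed.

```shell
# Line 2 is the only line addressed, and only its first "find" is replaced.
printf '%s\n' 'find find' 'find find' | sed '2s/find/replace/'
```

The output is "find find" followed by "replace find".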
Fundamentals Overview
Now that we’ve seen various usages of sed, we will go over the fundamentals.
Every sed command we have seen follows this format:
sed "ACO"
A
A is the address.
Zero-address
The zero address is an empty string which matches every line.
One-address
There are 2 types of one-addresses:
Number
A single number matches that line number, for example "1" matches the first line.
$
$ is a special line number which always means the last line.
/regularexpression/ address
A /regularexpression/ matches all lines that contain a string matching the regular expression. For example, /o/ matches all lines containing an o.
2-address
A 2-address matches all lines from the first address through the second address, inclusive.
"2,4" matches lines 2 through 4: lines 2, 3, and 4.
/o/,/n/ matches every line from a line containing an o through the next line containing an n.
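A regular expression range is easiest to understand with input where the start and end are a few lines apart. In this sketch (the fruit names are arbitrary), the range opens at orange. Because sed only looks for the end pattern starting from the line after the one that opened the range, orange itself (which contains an n) does not close it; lemon does.

```shell
# /o/ first matches "orange", opening the range.
# /n/ is checked from the following line: "kiwi" no, "lemon" yes, so it closes.
# "grape" has no o, so no new range starts.
printf '%s\n' apple orange kiwi lemon grape | sed -n '/o/,/n/p'
```

The output is orange, kiwi, lemon.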
2nd position offset address
The second position of a 2-address can also be an offset. If so, it matches each line selected by the first address plus the next offset lines after it.
"2,+2" matches lines 2, 3 (2+1), and 4 (2+2).
/o/,+2 matches every line with an o, plus the next 2 lines.
C
C is the command, which is always a single character. So far we have covered:
p(rint) - prints the line, commonly used with the -n flag
d(elete) - deletes the line
i(nsert) - inserts text on the line before
a(ppend) - appends text on the line after
s(ubstitute) - find and replace on the line
O
O are the command options.
Some commands like p take no options.
Some commands like s take a whole slew of options.
Some commands like a take a string literal.
-n
By default, sed prints every line; the -n flag disables this default behavior. This is useful when using the p command to print selected portions of stdin.
Sneak peek
Next week, we’ll dig more into sed’s versatile s command.