In 1977, awk, a text-processing tool that is also a programming language, was born at Bell Labs.
The name comes from the first letters of the last names of three famous people:
Alfred Aho
Peter Weinberger
Brian Kernighan
Like the shell (bash, csh, zsh, and ksh), awk has produced several derivatives over its history:
awk: Born at Bell Labs in 1977.
nawk (new awk): It was born in 1985 and is an updated and enhanced version of awk. It was widely used with Unix System V Release 3.1 (1987). The old version of awk is called oawk (old awk).
gawk (GNU awk): It was written by Paul Rubin in 1986. The GNU Project was born in 1984.
mawk: An interpreter of the awk programming language, written in 1996 by Mike Brennan.
jawk: An implementation of awk in Java.
In the GNU/Linux operating system, the usual awk refers to gawk. However, some distributions, such as Ubuntu or Debian, use mawk as their default awk.
Although awk is a tool for processing text, it has some programming language features:
variable
process control (loop)
data type
logical operation
function
array
...
How awk works: Similar to a relational database, awk processes data as records (rows) and fields (columns). By default, awk treats each line of input as a record, reads and processes the records one at a time, and splits each record into fields. By default, fields are separated by runs of spaces and tabs. $1, $2, and so on refer to the first, second, and later fields of the current record, and $0 refers to the whole record. To print multiple fields, separate them with commas (awk joins them with the output field separator, a single space by default) or insert an explicit tab such as "\t" between them.
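A minimal sketch of records and fields, using made-up input rather than a real file: the second line mixes a tab and several spaces, yet awk still splits it on whitespace. NR is the current record number and NF the number of fields in it, so $NF is the last field. The last three commands show the difference between separating fields with a comma, writing them side by side, and inserting an explicit tab.

```shell
#!/bin/sh
# Two made-up records; the second uses a tab and a run of spaces as separators.
printf 'alpha beta gamma\none\t two   three four\n' |
  awk '{ print NR ": " NF " fields, first=" $1 ", last=" $NF }'
# 1: 3 fields, first=alpha, last=gamma
# 2: 4 fields, first=one, last=four

echo 'a b c' | awk '{ print $1, $3 }'      # comma: joined with OFS (a space) -> a c
echo 'a b c' | awk '{ print $1 $3 }'       # no comma: concatenated          -> ac
echo 'a b c' | awk '{ print $1 "\t" $3 }'  # explicit tab                     -> a<TAB>c
```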
Before formally learning awk, beginners need to understand the command printf.
printf: format and print data. Usage: printf FORMAT [ARGUMENT]...
FORMAT: Controls the content of the output. The following common escape sequences are supported:
\a - alert (BEL)
\b - backspace
\f - form feed
\n - new line
\r - carriage return
\t - horizontal tab
\v - vertical tab
%Ns - Outputs a string. The optional N is the minimum field width; for example, %10s right-aligns the string in 10 columns. Use one %s for each string argument, for example: %s %s %s
%Ni - Outputs an integer. The optional N is the minimum field width, for example: %5i. Use one %i for each integer argument, for example: %i %i
%m.nf - Outputs a floating point number. m is the minimum total width of the output (including the decimal point), and n is the number of digits after the decimal point. For example: %8.5f
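To make the width and precision rules concrete, here is a small sketch for a POSIX shell. LC_ALL=C is set only so that the decimal separator is "." regardless of the local locale; the arguments are arbitrary.

```shell
#!/bin/sh
export LC_ALL=C   # force "." as the decimal separator
# %5s right-aligns in 5 columns, %-5s left-aligns, %3i pads an integer to 3
# columns, and %8.3f prints 3 decimals in a field 8 characters wide.
printf '%5s|%-5s|%3i|%8.3f\n' ab cd 7 3.14159
#    ab|cd   |  7|   3.142

# With more arguments than conversions, printf reuses the format.
printf '%s-%s\n' a b c d
# a-b
# c-d
```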
ARGUMENT: The values to format. printf does not read files, so if the data is in a file, you must pass its contents as arguments first (for example, with command substitution).
Shell > cat /tmp/printf.txt
ID  Name   Age  Class
1   Frank  20   3
2   Jack   25   5
3   Django 16   6
4   Tom    19   7

# Example of incorrect syntax: printf prints the file name, not the file contents
Shell > printf '%s %s %s\n' /tmp/printf.txt
/tmp/printf.txt

# Change the format of the text
Shell > printf '%s' $(cat /tmp/printf.txt)
IDNameAgeClass1Frank2032Jack2553Django1664Tom197
# Change the format of the text
Shell > printf '%s\t%s\t%s\n' $(cat /tmp/printf.txt)
ID      Name    Age
Class   1       Frank
20      3       2
Jack    25      5
3       Django  16
6       4       Tom
19      7

Shell > printf "%s\t%s\t%s\t%s\n" a b c d 1 2 3 4
a       b       c       d
1       2       3       4
Rocky Linux has no standalone print command; print is available only inside awk. Unlike printf, print automatically appends a newline (the value of the output record separator, ORS) to each line it outputs. For example:
Shell > cat /etc/services | awk '/[^0-9a-zA-Z]9[1-9]{2}\/tcp/ || /91{2}\/tcp/ {print $0}'
telnets         992/tcp
imaps           993/tcp         # IMAP over SSL
pop3s           995/tcp         # POP-3 over SSL
mtp             1911/tcp        #
rndc            953/tcp         # rndc control sockets (BIND 9)
xact-backup     911/tcp         # xact-backup
apex-mesh       912/tcp         # APEX relay-relay service
apex-edge       913/tcp         # APEX endpoint-relay service
ftps-data       989/tcp         # ftp protocol, data, over TLS/SSL
nas             991/tcp         # Netnews Administration System
vsinet          996/tcp         # vsinet
maitrd          997/tcp         #
busboy          998/tcp         #
garcon          999/tcp         #
#puprouter      999/tcp         #
blockade        2911/tcp        # Blockade
prnstatus       3911/tcp        # Printer Status Port
cpdlc           5911/tcp        # Controller Pilot Data Link Communication
manyone-xml     8911/tcp        # manyone-xml
sype-transport  9911/tcp        # SYPECom Transport Protocol
Shell > cat /etc/services | awk 'BEGIN{RS="\n";ORS="\n"} NR<=10 {print NR,$0}'
1 # /etc/services:
2 # $Id: services,v 1.49 2017/08/18 12:43:23 ovasik Exp $
3 #
4 # Network services, Internet style
5 # IANA services version: last updated 2016-07-08
6 #
7 # Note that it is presently the policy of IANA to assign a single well-known
8 # port number for both TCP and UDP; hence, most entries here have two entries
9 # even if the protocol doesn't support UDP operations.
10 # Updated from RFC 1700, ``Assigned Numbers'' (October 1994). Not all ports
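print and printf also differ in how they use awk's output separators. In this sketch (the input numbers are arbitrary), print joins comma-separated items with OFS and ends each record with ORS, while printf uses neither, so the format string has to supply any separators and the final newline.

```shell
#!/bin/sh
# print: items joined by OFS, each record terminated by ORS.
printf '1 2\n3 4\n' | awk 'BEGIN { OFS = "-"; ORS = ";\n" } { print $1, $2 }'
# 1-2;
# 3-4;

# printf: no automatic separators or newline.
printf '1 2\n3 4\n' | awk '{ printf "%s+%s", $1, $2 } END { printf "\n" }'
# 1+23+4
```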
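Print odd rows:
Shell > seq 1 10 | awk 'i=!i'
1
3
5
7
9
Why? When awk reads the first line, "i" has not been assigned a value yet (an unassigned variable is empty, which counts as false), so "i=!i" sets i to 1 and the pattern is TRUE: the line is printed.
When awk reads the second line, "i=!i" sets i back to 0, so the pattern is FALSE and the line is not printed.
And so on: only the odd-numbered lines are printed.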
Print even rows:
Shell > seq 1 10 | awk '!(i=!i)'
# or
Shell > seq 1 10 | awk '!(i=!i) {print $0}'
2
4
6
8
10
Note
As you can see, you can omit the "action" part entirely; an omitted action is equivalent to "{print $0}".
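The odd/even trick can be checked on a shorter sequence. This sketch also shows NR % 2, a more explicit pattern that selects the same lines; paste only joins the output onto one line for readability.

```shell
#!/bin/sh
seq 1 6 | awk 'i = !i' | paste -s -d ' ' -                      # 1 3 5
seq 1 6 | awk '!(i = !i)' | paste -s -d ' ' -                   # 2 4 6
seq 1 6 | awk 'NR % 2 == 1' | paste -s -d ' ' -                 # 1 3 5 (explicit form)
seq 1 6 | awk 'NR % 2 == 0 { print $0 }' | paste -s -d ' ' -    # 2 4 6 (action written out)
```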
Reversal
Shell > cat /etc/services | awk '!/(tcp)|(udp)|(^#)|(^$)/ {print $0}'
http            80/sctp                         # HyperText Transfer Protocol
bgp             179/sctp
https           443/sctp                        # http protocol over TLS/SSL
h323hostcall    1720/sctp                       # H.323 Call Control
nfs             2049/sctp       nfsd shilp      # Network File System
rtmp            1/ddp                           # Routing Table Maintenance Protocol
nbp             2/ddp                           # Name Binding Protocol
echo            4/ddp                           # AppleTalk Echo Protocol
zip             6/ddp                           # Zone Information Protocol
discard         9/sctp                          # Discard
discard         9/dccp                          # Discard SC:DISC
...
Please pay attention! When an awk program runs an external command (for example, print $0 | "sort"), you must enclose the command in double quotes.
Regular expression
Here, we cover basic examples of regular expressions. You can use regular expressions on row records.
Shell > cat /etc/services | awk '/[^0-9a-zA-Z]1[1-9]{2}\/tcp/ {print $0}'
# Equivalent to:
Shell > cat /etc/services | awk '$0~/[^0-9a-zA-Z]1[1-9]{2}\/tcp/ {print $0}'
If the file contains a large amount of text, you can also apply regular expressions to individual fields, which narrows each match and can make processing more efficient. For example:
Shell > cat /etc/services | awk '$0~/^(ssh)/ && $2~/tcp/ {print $0}'
ssh             22/tcp                          # The Secure Shell (SSH) Protocol
sshell          614/tcp                         # SSLshell
ssh-mgmt        17235/tcp                       # SSH Tectia Manager
Shell > cat /etc/services | grep -v -E "(^#)|(^$)" | awk '$2!~/(tcp)|(udp)/ {print $0}'
http            80/sctp                         # HyperText Transfer Protocol
bgp             179/sctp
https           443/sctp                        # http protocol over TLS/SSL
h323hostcall    1720/sctp                       # H.323 Call Control
nfs             2049/sctp       nfsd shilp      # Network File System
rtmp            1/ddp                           # Routing Table Maintenance Protocol
nbp             2/ddp                           # Name Binding Protocol
...
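The ~ and !~ operators can be tried without /etc/services. The sample lines below are made up but follow its layout (name, port/protocol, comment), so $1 is the service name and $2 the port/protocol field.

```shell
#!/bin/sh
sample='ssh 22/tcp # SSH
ssh 22/udp # SSH
http 80/sctp # HTTP
sshell 614/tcp # SSLshell'

# ~ tests one field against a regular expression.
echo "$sample" | awk '$1 ~ /^ssh/ && $2 ~ /tcp/ { print $1, $2 }'
# ssh 22/tcp
# sshell 614/tcp

# !~ negates the test: keep lines whose protocol is neither tcp nor udp.
echo "$sample" | awk '$2 !~ /tcp|udp/ { print $1 }'
# http
```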
array: A collection of data arranged in a certain order. Each item in an array is called an element.
Like most programming languages, awk also supports arrays, which are divided into indexed arrays (with numbers as subscripts) and associative arrays (with strings as subscripts).
awk has a lot of functions, and the functions related to arrays are:
Shell > cat /etc/passwd | awk -F":" '{username[NR]=$1} END{print username[2]}'
bin
Shell > cat /etc/passwd | awk -F":" '{username[NR]=$1} END{print username[1]}'
root
Info
The subscript of an awk array can be a positive integer, a negative integer, 0, or a string, so awk arrays have no fixed starting index. (awk converts every subscript to a string, so a[1] and a["1"] are the same element.) This differs from indexed arrays in bash, whose subscripts start at 0.
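A short sketch of these subscript rules (the array name a and its values are arbitrary). It also uses the in operator, which tests whether a subscript exists without creating it, and counts the elements with a for (k in a) loop, because length() on an array is not available in every awk implementation.

```shell
#!/bin/sh
awk 'BEGIN {
  a[-1] = "negative"; a[0] = "zero"; a["k"] = "string"
  a[1] = "one"
  print a["1"]                 # subscripts are strings: a[1] and a["1"] are one element
  has_neg = ((-1) in a)        # 1: the subscript exists
  has_five = (5 in a)          # 0: "in" does not create a[5]
  print has_neg, has_five
  n = 0
  for (k in a) n++             # visit every subscript (order is unspecified)
  print n
}'
# one
# 1 0
# 4
```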
Shell > tail -n 5 /etc/group | awk -F":" '{ a[x++]=$1 } END{ for(i in a) print a[i],i }'
slocate 0
unbound 1
docker 2
cgred 3
redis 4
Use a field as the subscript of an array
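Shell > tail -n 5 /etc/group | awk -F":" '{ a[$1]=$3 } END{ for(i in a) print a[i],i }'
991 docker
21 slocate
989 redis
992 unbound
990 cgred
Note that for (i in a) visits the elements in an unspecified order, as this output shows.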
Count the number of occurrences of the same field
Count the number of occurrences of the same IPv4 address. Basic idea:
First use the grep command to filter out all IPv4 addresses
Then hand it over to the awk program for processing
Shell > cat /var/log/secure | egrep -o "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" | awk '{ a[$1]++ } END{ for(v in a) print a[v],v }'
4 0.0.0.0
4 192.168.100.2
Info
a[$1]++ is equivalent to a[$1]+=1
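The counting idiom works on any input. In this sketch the IPv4 addresses are made-up sample data instead of /var/log/secure; because for (v in a) returns elements in an unspecified order, the result is piped through sort to make it predictable.

```shell
#!/bin/sh
printf '10.0.0.1\n10.0.0.2\n10.0.0.1\n' |
  awk '{ a[$1]++ } END { for (v in a) print a[v], v }' |
  sort -k2
# 2 10.0.0.1
# 1 10.0.0.2
```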
Count the number of occurrences of words regardless of case (the IGNORECASE variable used here is a gawk extension). Basic idea:
Split all fields into multiple rows of records
Then hand it over to the awk program for processing
Shell > cat /etc/services | awk -F" " '{for(i=1;i<=NF;i++) print $i}'
Shell > cat /etc/services | awk -F" " '{for(i=1;i<=NF;i++) print $i}' | awk 'BEGIN{IGNORECASE=1;OFS="\t"} /^netbios$/ || /^ftp$/ {a[$1]++} END{for(v in a) print a[v],v}'
3       NETBIOS
18      FTP
7       ftp
Shell > cat /etc/services | awk -F" " '{ for(i=1;i<=NF;i++) print $i }' | awk 'BEGIN{IGNORECASE=1;OFS="\t"} /^netbios$/ || /^ftp$/ {a[$1]++} END{for(v in a) if(a[v]>=5) print a[v],v}'
18      FTP
7       ftp
You can first filter specific row records and then perform statistics, such as:
Shell > ss -tulnp | awk -F" " '/tcp/ {a[$2]++} END{for(i in a) print a[i],i}'
2 LISTEN
Print lines based on the number of occurrences of a specific field
Shell > tail /etc/services
aigairserver    21221/tcp               # Services for Air Server
ka-kdp          31016/udp               # Kollective Agent Kollective Delivery
ka-sddp         31016/tcp               # Kollective Agent Secure Distributed Delivery
edi_service     34567/udp               # dhanalakshmi.org EDI Service
axio-disc       35100/tcp               # Axiomatic discovery protocol
axio-disc       35100/udp               # Axiomatic discovery protocol
pmwebapi        44323/tcp               # Performance Co-Pilot client HTTP API
cloudcheck-ping 45514/udp               # ASSIA CloudCheck WiFi Management keepalive
cloudcheck      45514/tcp               # ASSIA CloudCheck WiFi Management System
spremotetablet  46998/tcp               # Capture handwritten signatures
Shell > tail /etc/services | awk 'a[$1]++ {print $0}'
axio-disc       35100/udp               # Axiomatic discovery protocol
Reverse:
Shell > tail /etc/services | awk '!a[$1]++ {print $0}'
aigairserver    21221/tcp               # Services for Air Server
ka-kdp          31016/udp               # Kollective Agent Kollective Delivery
ka-sddp         31016/tcp               # Kollective Agent Secure Distributed Delivery
edi_service     34567/udp               # dhanalakshmi.org EDI Service
axio-disc       35100/tcp               # Axiomatic discovery protocol
pmwebapi        44323/tcp               # Performance Co-Pilot client HTTP API
cloudcheck-ping 45514/udp               # ASSIA CloudCheck WiFi Management keepalive
cloudcheck      45514/tcp               # ASSIA CloudCheck WiFi Management System
spremotetablet  46998/tcp               # Capture handwritten signatures
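Why does !a[$1]++ keep only the first occurrence? The post-increment returns the old value (empty, which is false) the first time a key is seen, and only then stores 1. A sketch with made-up two-column input:

```shell
#!/bin/sh
# First occurrence of each key in column 1 (like "uniq", but without sorting).
printf 'a 1\nb 2\na 3\nc 4\nb 5\n' | awk '!a[$1]++'
# a 1
# b 2
# c 4

# Only the repeats.
printf 'a 1\nb 2\na 3\nc 4\nb 5\n' | awk 'a[$1]++'
# a 3
# b 5
```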
Multidimensional array
awk does not support true multidimensional arrays, but it can simulate them: a subscript written as a[i,j] is stored as a single string made of i, the subscript separator SUBSEP, and j. By default, SUBSEP is "\034".
Please note the following differences when using multidimensional arrays:
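Shell > awk 'BEGIN{ a["1,0"]=100 ; a[2,0]=200 ; a["3","0"]=300 ; for(i in a) print a[i],i }'
200 20
300 30
100 1,0
# In "20" and "30", the separator "\034" between the two subscripts is present but not visible.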
Redefine the delimiter:
Shell > awk 'BEGIN{ SUBSEP="----" ; a["1,0"]=100 ; a[2,0]=200 ; a["3","0"]=300 ; for(i in a) print a[i],i }'
300 3----0
200 2----0
100 1,0
Reorder:
Shell > awk 'BEGIN{ SUBSEP="----" ; a["1,0"]=100 ; a[2,0]=200 ; a["3","0"]=300 ; for(i in a) print a[i],i | "sort" }'
100 1,0
200 2----0
300 3----0
Count the number of times the field appears:
Shell > cat c.txt
A 192.168.1.1 HTTP
B 192.168.1.2 HTTP
B 192.168.1.2 MYSQL
C 192.168.1.1 MYSQL
C 192.168.1.1 MQ
D 192.168.1.4 NGINX
Shell > cat c.txt | awk 'BEGIN{SUBSEP="----"} {a[$1,$2]++} END{for(i in a) print a[i],i}'
1 A----192.168.1.1
2 B----192.168.1.2
2 C----192.168.1.1
1 D----192.168.1.4
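Because a[$1,$2] is really a single string key, the original subscripts can be recovered with split() using SUBSEP as the separator, and the (i, j) in a form tests whether such a key exists. A minimal sketch with arbitrary values:

```shell
#!/bin/sh
awk 'BEGIN {
  a["A", "192.168.1.1"] = 1      # stored under "A" SUBSEP "192.168.1.1"
  if (("A", "192.168.1.1") in a) print "found"
  for (k in a) {
    n = split(k, part, SUBSEP)   # break the key back into its parts
    print n, part[1], part[2], a[k]
  }
}'
# found
# 2 A 192.168.1.1 1
```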
rand()
Returns a random number N where 0 <= N < 1. The result is not different on every run: in gawk, unless srand() sets a new seed, the program produces the same sequence each time it runs.
srand([expr])
Sets "expr" as the seed for rand(). If "expr" is omitted, the current time of day is used as the seed. The return value is the previous seed.
asort(a,b)
Sorts the values of array "a" and stores them as the elements of a new array "b", whose subscripts start at 1; "a" itself is left unchanged. Returns the number of elements. This is a gawk extension.
asorti(a,b)
Sorts the subscripts of array "a" and stores them as the elements of a new array "b", whose subscripts start at 1. This is a gawk extension.
sub(r,s[,t])
Replaces the first match of regular expression "r" with "s". By default the match is made against the whole record $0; the optional "t" names another target, such as a field or a variable. Returns the number of replacements: 0 or 1. Similar to sed s///
gsub(r,s[,t])
Global replacement: like sub, but replaces every match of "r" in the target ($0 if "t" is omitted). Returns the number of replacements. Similar to sed s///g
gensub(r,s,h[,t])
Replaces matches of regular expression "r" in "t" (or in $0 if "t" is omitted) with "s". "h" selects which matches to replace: "g" or "G" replaces all of them, and a number n replaces only the nth match. Unlike sub and gsub, gensub returns the modified string and leaves the target unchanged. This is a gawk extension.
index(s,t)
Returns the index position of the string "t" in the string "s" (the string index starts from 1). If the function returns 0, it means it does not exist
length([s])
Returns the length of "s". If "s" is omitted, returns the length of $0
match(s,r[,a])
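Tests whether string "s" matches regular expression "r". If it does, returns the position where the match starts (string positions start at 1); if not, returns 0. It also sets RSTART to that position and RLENGTH to the length of the match. The optional array "a" (a gawk extension) receives the matched text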
split(s,a[,r[,seps]])
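Splits string "s" into array "a" using "r" as the field separator (FS if "r" is omitted). The subscripts of the array start at 1. The optional array "seps" (a gawk extension) receives the separator strings. Returns the number of elements.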
substr(s,i[,n])
Extracts a substring. "s" is the string to process; "i" is the starting position (positions start at 1); "n" is the length. If "n" is omitted, the rest of the string from position "i" is returned
tolower(str)
Returns a copy of "str" with all uppercase letters converted to lowercase
toupper(str)
Returns a copy of "str" with all lowercase letters converted to uppercase
systime()
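Returns the current time as a UNIX timestamp (the number of seconds since 1970-01-01 00:00:00 UTC). This is a gawk extension.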
strftime([format[,timestamp[,utc-flag]]])
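Formats a timestamp as a string according to "format", which uses the same conversions as the C strftime() function. If "timestamp" is omitted, the current time is used; if "utc-flag" is true, the time is shown in UTC instead of the local time zone. This is a gawk extension.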
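The portable string functions from this list can be tried on a single /etc/passwd-style line. This sketch sticks to functions that POSIX awk also provides (gawk-only ones such as gensub, asort, systime, and strftime are left out), and works on copies (s and t) so that sub and gsub do not modify $0.

```shell
#!/bin/sh
echo "root:x:0:0:root:/root:/bin/bash" | awk -F: '{
  s = $0
  n = sub(/root/, "ROOT", s)       # first match only; returns 0 or 1
  print n, s
  t = $0
  m = gsub(/root/, "ROOT", t)      # every match; returns the count
  print m, t
  print index($0, "bash"), length($1), substr($6, 2)
  k = split($7, parts, "/")        # "/bin/bash" -> "", "bin", "bash"
  print k, parts[2], toupper(parts[3])
  if (match($0, /[0-9]+/)) print RSTART, RLENGTH
}'
# 1 ROOT:x:0:0:root:/root:/bin/bash
# 3 ROOT:x:0:0:ROOT:/ROOT:/bin/bash
# 28 4 root
# 3 bin BASH
# 8 1
```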
The int function truncates a number to an integer. When its argument is a string, awk first converts it to a number: a string that does not start with a number becomes 0, and a string that starts with a number is converted using only that leading number.
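A quick check of these conversion rules (the values are arbitrary). Note that int() truncates toward zero, so a negative number moves up, not down.

```shell
#!/bin/sh
awk 'BEGIN { print int(3.9), int(-3.9), int("42abc"), int("abc") }'
# 3 -3 42 0
```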
sqrt function
Shell > awk 'BEGIN{print sqrt(9)}'
3
rand function and srand function
The example of using the rand function is as follows:
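A minimal sketch of rand() and srand() (the seed 42 is arbitrary). Seeding with a fixed value makes the sequence reproducible, while srand() with no argument seeds from the time of day; the last command is a common way to turn rand() into a random integer, here a die roll from 1 to 6.

```shell
#!/bin/sh
# rand() returns a value in [0, 1).
awk 'BEGIN { srand(42); r = rand(); print (r >= 0 && r < 1) }'    # 1

# The same seed gives the same sequence on every run.
first=$(awk 'BEGIN { srand(42); print rand() }')
second=$(awk 'BEGIN { srand(42); print rand() }')
[ "$first" = "$second" ] && echo "same sequence"

# Random integer from 1 to 6, seeded from the current time.
awk 'BEGIN { srand(); print int(rand() * 6) + 1 }'
```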
Shell > vim /tmp/tmp-file1.txt
A 192.168.1.1 HTTP
B 192.168.1.2 HTTP
B 192.168.1.2 MYSQL
C 192.168.1.1 MYSQL
C 192.168.1.1 MQ
D 192.168.1.4 NGINX
# Add a line of text before the second line
Shell > cat /tmp/tmp-file1.txt | awk 'NR==2 {gsub(/.*/,"add a line\n&")} {print $0}'
A 192.168.1.1 HTTP
add a line
B 192.168.1.2 HTTP
B 192.168.1.2 MYSQL
C 192.168.1.1 MYSQL
C 192.168.1.1 MQ
D 192.168.1.4 NGINX
# Add a string after the IP address in the second line
Shell > cat /tmp/tmp-file1.txt | awk 'NR==2 {gsub(/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/,"&\tSTRING")} {print $0}'
A 192.168.1.1 HTTP
B 192.168.1.2   STRING HTTP
B 192.168.1.2 MYSQL
C 192.168.1.1 MYSQL
C 192.168.1.1 MQ
D 192.168.1.4 NGINX
# The length of the output field
Shell > tail -n 5 /etc/services | awk '{print length($1)}'
9
8
15
10
14
# The length of the output array
Shell > cat /etc/passwd | awk -F":" '{a[NR]=$1} END{print length(a)}'
22
Shell > echo "365%tmp%dir%number" | awk '{split($1,a1,"%") ; for(i in a1) print i,a1[i]}'
1 365
2 tmp
3 dir
4 number
substr function
Shell > head -n 5 /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
# I need this part of the content - "emon:/sbin:/sbin/nologin"
Shell > head -n 5 /etc/passwd | awk '/daemon/ {print substr($0,16)}'
emon:/sbin:/sbin/nologin
Shell > tail -n 5 /etc/services
axio-disc       35100/udp               # Axiomatic discovery protocol
pmwebapi        44323/tcp               # Performance Co-Pilot client HTTP API
cloudcheck-ping 45514/udp               # ASSIA CloudCheck WiFi Management keepalive
cloudcheck      45514/tcp               # ASSIA CloudCheck WiFi Management System
spremotetablet  46998/tcp               # Capture handwritten signatures
# I need this part of the content - "tablet"
Shell > tail -n 5 /etc/services | awk '/^sp/ {print substr($1,9)}'
tablet
What is a UNIX timestamp?
According to the history of UNIX, UNIX V1 was born in 1971, and the first edition of the "UNIX Programmer's Manual" was published on November 3 of the same year. A UNIX timestamp counts the seconds elapsed since the UNIX epoch, 1970-01-01 00:00:00 UTC.
The conversion between a timestamp and a natural date time in days:
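Because a day has 86400 seconds (UNIX time does not count leap seconds), converting a timestamp to days is integer division. A sketch using the arbitrary timestamp 1700000000, which corresponds to 2023-11-14 22:13:20 UTC:

```shell
#!/bin/sh
awk 'BEGIN {
  ts = 1700000000
  days = int(ts / 86400)          # whole days since 1970-01-01
  rest = ts - days * 86400        # seconds into that day
  printf "%d days, %02d:%02d:%02d\n", days, rest / 3600, rest % 3600 / 60, rest % 60
  print days * 86400              # back to the timestamp of that day at 00:00:00 UTC
}'
# 19675 days, 22:13:20
# 1699920000
```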
getline
Reads the next input record into "$0" and updates NF, NR, and FNR. Return values: 1 means a record was read; 0 means the end of the input was reached; a negative value means an error occurred
getline var
Reads the next input record into the variable "var" and updates NR and FNR ($0 and NF are not changed)
command | getline [var]
Runs "command" and reads the next line of its output into "$0" (or into the variable "var", if given). Each further call reads the following line
next
Stops processing the current input record: awk skips the remaining rules and starts over with the next record
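A minimal sketch of two getline forms (the command echo hello and the input lines are arbitrary). The command string is stored in a variable so that exactly the same string can be passed to close(), which closes the pipe when it is no longer needed.

```shell
#!/bin/sh
# command | getline var: read one line of a command's output into var.
awk 'BEGIN {
  cmd = "echo hello"
  if ((cmd | getline line) > 0) print "got:", line
  close(cmd)
}'
# got: hello

# Plain getline: replace $0 with the next record, pairing up lines.
printf 'a\nb\nc\nd\n' | awk '{ first = $0; getline; print first "+" $0 }'
# a+b
# c+d
```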
Earlier, we introduced the break statement and the continue statement: break terminates a loop, and continue skips the rest of the current loop iteration. next works at the level of input records instead: when its condition is met, awk stops processing the current record, skips all remaining actions for it, and continues with the next record.
Shell > seq 1 5 | awk '{if(NR==3) {next} print $0}'
1
2
4
5
# equivalent to
Shell > seq 1 5 | awk '{if($1!=3) print $0}'
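next is handy for skipping whole classes of lines before the main rule runs. In this sketch with made-up input, comment lines are skipped and the remaining lines are processed normally:

```shell
#!/bin/sh
printf 'keep 1\n# comment\nkeep 2\n' | awk '/^#/ { next } { print "data:", $2 }'
# data: 1
# data: 2
```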
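Inside an awk program, ">" writes to a file and overwrites its previous contents (the file is emptied only when awk first opens it; later writes in the same run are added after the earlier ones). To append to the existing contents, use ">>". As a reminder, enclose a literal file path in double quotes, for example: print $0 > "/tmp/output.txt".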
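A sketch of both redirection operators. To keep the example self-contained, it writes to a temporary file created by mktemp and passes the path to awk with -v as the variable f; the sequence values are arbitrary.

```shell
#!/bin/sh
out=$(mktemp)                                     # temporary file for the demo
seq 1 3 | awk -v f="$out" '{ print $0 > f }'      # ">" empties the file once, then writes 1 2 3
echo 4 | awk -v f="$out" '{ print $0 >> f }'      # ">>" appends 4
paste -s -d ' ' "$out"                            # 1 2 3 4
seq 5 6 | awk -v f="$out" '{ print $0 > f }'      # a new run with ">" overwrites the file
paste -s -d ' ' "$out"                            # 5 6
rm -f "$out"
```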
If you already have programming experience, awk is relatively easy to learn. However, for sysadmins with limited programming experience (including the author), awk can be quite complicated to learn. For topics not covered here, refer to the awk manual page (man awk).