2016-06-14 06:53:59 +00:00
# BBO-Bioinformatics-Bash-Oneliner
Hi bioinformaticians and bash learners, welcome to BBO, the Bioinformatics Bash Oneliner learning station. I started studying bioinformatics data three years ago, and my CS friend installed Ubuntu on my lab computer. I was amazed by those single-word bash commands, which are much faster than my dull scripts, so I started using bash and have insisted on it ever since. Not all the code here is a oneliner (if the ';' counts..), but I put effort into making them brief and fast.
This blog will focus on bash commands for parsing biological data, most of which are tsv files (tab-separated values); some of them are for Ubuntu system maintenance. I have been recording the bash commands in my notebook, but putting them on the web will help others and myself to 'Ctrl+F'. I apologize that there won't be any citations for the code, because I haven't kept any record of the sources, but they are probably from dear Google and Stackoverflow.
English and bash are not my first languages, so... correct me anytime, sorry.
In case you would like to check out and like my stupid questions on Stackoverflow, here's my page:
http://stackoverflow.com/users/4290753/once
##Handy Bash 'oneliner' commands for tsv file editing
- [Grep](#grep)
- [Sed](#sed)
- [Awk](#awk)
- [Xargs](#xargs)
- [Find](#find)
- [Others](#others)
##Grep
>extract text between words (e.g. w1,w2)
grep -o -P '(?<=w1).*(?=w2)'
>grep lines without word (e.g. bbo)
grep -v bbo filename
>grep and count matching lines (e.g. bbo)
grep -c bbo filename
>case-insensitive grep (e.g. bbo/BBO/Bbo)
grep -i "bbo" filename
>count occurrences (e.g. three times on a line counts as three)
grep -o bbo filename | wc -l
>COLOR the match (e.g. bbo)!
grep --color bbo filename
>grep search all files in a directory (e.g. bbo)
grep -R bbo /path/to/directory
or
grep -r bbo /path/to/directory
>search all files in directory, only output file names with matches (e.g. bbo)
grep -Rl bbo /path/to/directory
or
grep -rl bbo /path/to/directory
>grep OR (e.g. A or B or C or D)
grep 'A\|B\|C\|D'
>grep AND (e.g. A and B)
grep 'A.*B'
//A has to come before B on the line; for either order: grep A filename | grep B
>grep every pattern listed in fileA from fileB (one pattern per line)
grep -f fileA fileB
>grep a tab
grep $'\t'
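A minimal sketch of how the counting flags above differ, run on a made-up three-line file. The file content and variable names are only for illustration, and the tab is built with `printf` so the sketch does not depend on bash's `$'\t'` quoting.

```shell
# Made-up sample: line 1 has "bbo" twice (tab-separated), line 2 has "BBO".
tmp=$(mktemp)
printf 'bbo\tbbo\nBBO only\nnothing here\n' > "$tmp"

# -c counts matching LINES; -o prints every match, so wc -l counts occurrences.
line_count=$(grep -c bbo "$tmp")
hit_count=$(grep -o bbo "$tmp" | wc -l)

# -i ignores case, so the "BBO only" line is counted too.
ci_count=$(grep -ci bbo "$tmp")

# Match a literal tab character.
tab=$(printf '\t')
tab_count=$(grep -c "$tab" "$tmp")

echo "lines=$line_count hits=$hit_count case-insensitive=$ci_count tab-lines=$tab_count"
rm -f "$tmp"
```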
##Sed
[[back to top](#handy-bash--oneliner--commands-for-tsv-file-editing)]
>remove lines with word (e.g. bbo)
sed "/bbo/d" filename
>edit the file in place (edit and save)
sed -i "/bbo/d" filename
>when using variable (e.g. $i), use double quotes " "
e.g. add >$i to the first line (to make a FASTA file)
sed "1i >$i" filename
//notice the double quotes! in other examples, you can use a single quote, but here, no way!
//'1i' means insert before the first line
>delete empty lines
sed '/^\s*$/d'
or
sed '/^$/d'
//the first one also deletes lines that hold only whitespace
>delete last line
sed '$d'
>add \n every nth character (e.g. every 4th character)
sed 's/.\{4\}/&\n/g'
>substitution (e.g. replace A by B)
sed 's/A/B/g' filename
>select lines starting with string (e.g. bbo)
sed -n '/^bbo/p'
>delete lines with string (e.g. bbo)
sed '/bbo/d' filename
>print every nth line (e.g. every 3rd line)
sed -n '0~3p' filename
//0: start; 3: step
>print every odd-numbered line
sed -n '1~2p'
>print the first line plus every third line (lines 1, 3, 6, 9...)
sed -n '1p;0~3p'
>remove leading whitespace and tabs
sed -e 's/^[ \t]*//'
//notice a whitespace before '\t'!!
>remove only leading whitespace
sed 's/^ *//'
//notice a whitespace before '*'!!
>remove ending commas
sed 's/,$//g'
>add a column to the end
sed "s/$/\t$i/"
//$i is the variable you want to add
e.g. add the filename to every last column of the file
for i in $(ls);do sed -i "s/$/\t$i/" $i;done
>remove newline\ nextline
sed ':a;N;$!ba;s/\n//g'
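A small sketch of the double-quote rule and a few of the deletions above, assuming GNU sed (the one-line `1i` form and `\t` are GNU extensions). The sequence name and the sample lines are made up.

```shell
name='seq1'   # hypothetical sequence name

# Drop empty lines, then insert ">seq1" before line 1.
# Double quotes let the shell expand $name before sed sees the script.
fasta=$(printf 'ACGT\n\nTTGA\n' | sed '/^$/d' | sed "1i >$name")

# '$d' deletes the last line; single quotes keep the shell away from the $.
without_last=$(printf 'a\nb\nc\n' | sed '$d')

# Append a tab-separated column holding the variable to every line.
with_column=$(printf 'x\ny\n' | sed "s/$/\t$name/")

printf '%s\n---\n%s\n---\n%s\n' "$fasta" "$without_last" "$with_column"
```

With single quotes around the script, `$name` would reach sed as the literal text `$name`, which is why the variable examples insist on double quotes.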
##Awk
>set tab as field separator
awk -F '\t'
>set tab as output field separator
awk -v OFS='\t'
>pass variable
a=bbo;b=obb;
awk -v a="$a" -v b="$b" '$1==a && $10==b' filename
>print number of characters on each line
awk '{print length ($0);}' filename
>find number of columns
awk '{print NF}'
>reverse column order
awk '{print $2, $1}'
>check if there is a comma in a column (e.g. column $1)
awk '$1~/,/ {print}'
>split and do for loop
awk '{split($2, a,",");for (i in a) print $1"\t"a[i]}' filename
>print all lines up to the nth occurrence of a string (e.g. stop printing lines when bbo has appeared 7 times)
awk -v N=7 '{print}/bbo/&& --N<=0 {exit}'
>add string to the beginning of a column (e.g. add "chr" to column $3)
awk 'BEGIN{OFS="\t"} $3="chr"$3'
>remove lines with string (e.g. bbo)
awk '!/bbo/' file
>column subtraction (e.g. total of column $3 minus column $2)
cat file| awk -F '\t' 'BEGIN {SUM=0}{SUM+=$3-$2}END{print SUM}'
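A rough sketch of three of the awk commands above on tiny made-up tsv rows (the data and variable names are assumptions for illustration only). It compares column 2 instead of column 10 just to keep the sample rows short.

```shell
a=bbo; b=obb

# Pass shell variables with -v, then compare them against columns.
matched=$(printf 'bbo\tobb\t5\nbbo\txyz\t7\n' |
    awk -F '\t' -v a="$a" -v b="$b" '$1==a && $2==b {print $3}')

# Prefix a column with a string; setting FS and OFS keeps the tabs on output.
tagged=$(printf '1\t100\t200\n' |
    awk 'BEGIN{FS=OFS="\t"} {$1="chr"$1} 1')

# Sum of (column 3 - column 2) over all rows: (15-10) + (28-20).
diff_sum=$(printf '1\t10\t15\n2\t20\t28\n' |
    awk -F '\t' '{sum += $3-$2} END{print sum}')

echo "matched=$matched tagged=$tagged diff_sum=$diff_sum"
```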
>usage and meaning of NR and FNR
e.g.
fileA:
a
b
c
fileB:
d
e
awk '{print FILENAME, NR,FNR,$0}' fileA fileB
fileA 1 1 a
fileA 2 2 b
fileA 3 3 c
fileB 4 1 d
fileB 5 2 e
//NR: line number counted across all input files; FNR: line number within the current file
>and gate
e.g.
fileA:
1 0
2 1
3 1
4 0
fileB:
1 0
2 1
3 0
4 1
awk -v OFS='\t' 'NR==FNR{a[$1]=$2;next} NF {print $1,((a[$1]==$2)? $2:"0")}' fileA fileB
1 0
2 1
3 0
4 0
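The and gate example can be replayed as a runnable sketch like this, assuming the two sample files above are written to a temporary directory (the directory and variable names are made up):

```shell
dir=$(mktemp -d)
printf '1 0\n2 1\n3 1\n4 0\n' > "$dir/fileA"
printf '1 0\n2 1\n3 0\n4 1\n' > "$dir/fileB"

# NR==FNR is true only while awk reads the first file: remember its flag by key.
# For the second file, print the flag when both files agree, otherwise "0".
result=$(awk -v OFS='\t' \
    'NR==FNR{a[$1]=$2; next} NF{print $1, ((a[$1]==$2) ? $2 : "0")}' \
    "$dir/fileA" "$dir/fileB")

echo "$result"
rm -rf "$dir"
```

The `NR==FNR{...;next}` pattern is the usual way to load one file into an array before processing a second one.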
>round all numbers of file (e.g. 2 decimal places)
awk '{while (match($0, /[0-9]+\.[0-9]+/)){
printf "%s%.2f", substr($0,1,RSTART-1),substr($0,RSTART,RLENGTH)
$0=substr($0, RSTART+RLENGTH)
}
print
}' filename
>give number/index to every row
awk '{printf("%s\t%s\n",NR,$0)}'
##Xargs
>set tab as delimiter (default: whitespace)
xargs -d '\t'
>display 3 items per line
echo 1 2 3 4 5 6| xargs -n 3
//1 2 3
//4 5 6
>prompt before execution
echo a b c |xargs -p -n 3
>print command along with output
echo abcd |xargs -t
//prints the command that is run (e.g. /bin/echo abcd), then the output:
//abcd
>with find and rm
find . -name "*.html"|xargs rm -rf
>delete files with whitespace in filename (e.g. "hello 2001")
find . -name "*.c" -print0|xargs -0 rm -rf
>show limits
xargs --show-limits
>move files to folder
find . -name "*.bak" -print0|xargs -0 -I {} mv {} ~/old
or
find . -name "*.bak" -print0|xargs -0 -I file mv file ~/old
>move the first 100 files to a directory (e.g. d1)
ls |head -100|xargs -I {} mv {} d1
>parallel
time echo {1..5} |xargs -n 1 -P 5 sleep
a lot faster than
time echo {1..5} |xargs -n1 sleep
>copy all files from A to B
find /dir/to/A -type f -name "*.py" -print0| xargs -0 -r -I file cp -v -p file --target-directory=/path/to/B
//v: verbose
//p: keep detail (e.g. owner)
>with sed
ls |xargs -I file sed -i '/^Pos/d' file
>add the file name to the first line of file
ls |sed 's/\.txt$//'|xargs -I file sed -i -e '1 i\>file' file.txt
>count lines in each file
ls |xargs -n1 wc -l
>turn output into a single line
ls -l| xargs
>count matching files within directories (e.g. ls -la entries containing 74, in mso1 to mso8)
echo mso{1..8}|xargs -n1 bash -c 'echo -n "$1:"; ls -la "$1"| grep -w 74 |wc -l' --
//"--" fills $0 of the inline script, so each directory name arrives as $1
>download dependencies files and install (e.g. requirements.txt)
cat requirements.txt| xargs -n1 sudo pip install
>count lines in all files, also count total lines
ls|xargs wc -l
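A minimal sketch of why `-print0` and `-0` matter, assuming GNU find and xargs. The temporary directories and file names are made up for the demonstration.

```shell
src=$(mktemp -d)
dst=$(mktemp -d)
touch "$src/hello 2001.bak" "$src/plain.bak" "$src/keep.txt"

# -print0 / -0 pass NUL-terminated names, so the space in
# "hello 2001.bak" does not split the name into two arguments.
find "$src" -name '*.bak' -print0 | xargs -0 -I {} mv {} "$dst"

moved=$(ls "$dst" | wc -l)   # number of .bak files that arrived
left=$(ls "$src")            # what stayed behind

# -n 3 hands echo three items at a time, one output line per batch.
grouped=$(echo 1 2 3 4 5 6 | xargs -n 3)

echo "moved=$moved left=$left"
echo "$grouped"
rm -rf "$src" "$dst"
```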
##Find
>list all sub directory/file in the current directory
find .
>list all files under the current directory
find . -type f
>list all directories under the current directory
find . -type d
>edit all files under current directory (e.g. replace 'www' with 'w')
find . -name '*.php' -exec sed -i 's/www/w/g' {} \;
>if no subdirectory
replace "www" "w" -- *
//a space before *
>find and output only filename (e.g. "mso")
find mso*/ -name 'M*' -printf "%f\n"
>find and delete file with size less than (e.g. 74 byte)
find . -name "*.mso" -size -74c -delete
//c for bytes, k for KiB, M for MiB, etc
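A short sketch of the type and size tests above on a throwaway directory, assuming GNU find (`-delete` and `-printf` are GNU extensions). The file names and sizes are invented for the example.

```shell
dir=$(mktemp -d)
mkdir "$dir/sub"
printf 'tiny' > "$dir/a.mso"               # 4 bytes
head -c 100 /dev/zero > "$dir/sub/b.mso"   # 100 bytes

n_files=$(find "$dir" -type f | wc -l)     # regular files only
n_dirs=$(find "$dir" -type d | wc -l)      # the directory itself plus sub/

# Delete .mso files smaller than 74 bytes; only a.mso qualifies.
find "$dir" -name '*.mso' -size -74c -delete

# %f prints the file name without its leading directories.
remaining=$(find "$dir" -type f -printf '%f\n')

echo "files=$n_files dirs=$n_dirs remaining=$remaining"
rm -rf "$dir"
```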
##Others
>remove newline / nextline
tr --delete '\n' < input.txt > output.txt
>replace newline (e.g. with a space)
tr '\n' ' ' < filename
>compare files (e.g. fileA, fileB)
diff fileA fileB
//a: added; d: deleted; c: changed
or
sdiff fileA fileB
//side-by-side merge of file differences
>number a file (e.g. fileA)
nl fileA
or
nl -nrz fileA
//add leading zeros
>combine/ paste two files (e.g. fileA, fileB)
paste fileA fileB
//default tab separated
>reverse string
echo 12345| rev
>read .gz file without extracting
zmore filename
or
zless filename
>run in background, output error file
(command here) 2>log &
or
(command here) 2>&1| tee logfile
or
(command here) >>outfile 2>&1
//0: standard input; 1: standard output; 2: standard error
//order matters: put 2>&1 after >>outfile to send both streams to the file
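A small sketch of why the position of `2>&1` matters. `noisy` is a made-up function standing in for any command that writes to both streams.

```shell
# Hypothetical command: one line to stdout, one line to stderr.
noisy() { echo "to stdout"; echo "to stderr" >&2; }

log=$(mktemp)

# stdout goes to the file first, then stderr follows it: both lines are logged.
noisy >"$log" 2>&1
both=$(wc -l < "$log")

# Reversed order: stderr is pointed at the OLD stdout (captured here by $(...))
# before stdout is sent to the file, so only "to stdout" reaches the log.
leaked=$(noisy 2>&1 >"$log")
only_out=$(cat "$log")

echo "lines logged=$both leaked=[$leaked] log=[$only_out]"
rm -f "$log"
```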
>send mail
echo 'heres the content'| mail -A 'file.txt' -s 'mail.subject' me@gmail.com
//use -a flag to set send from (-a "From: some@mail.tld")
>.xls to csv
xls2csv filename
>append to file (e.g. hihi)
echo 'hihi' >>filename
>make BEEP sound
speaker-test -t sine -f 1000 -l1
>set beep duration
(speaker-test -t sine -f 1000) & pid=$!;sleep 0.1s;kill -9 $pid
>history edit/ delete
~/.bash_history
or
history -d [line_number]
>reuse the last argument of the previous command (e.g. a filename)
head !$
>clean screen
clear
or
Ctrl+l
>send data to last edited file
cat /directory/to/file
echo 100 >!$
//keep the space before '>': 100> would be read as file descriptor 100
>run history number (e.g. 53)
!53
>run last command
!!
>run last command that began with (e.g. cat filename)
!cat
or
!c
//run cat filename again
>extract .xz
1. unxz filename.tar.xz
2. tar -xf filename.tar
>install python package
pip install packagename
>random order (lucky draw)
for i in a b c d e; do echo $i; done| shuf
>echo a random number
echo $RANDOM
>Download file if necessary
data=file.txt
url=http://www.example.com/$data
if [ ! -s $data ];then
echo "downloading test data..."
wget $url
fi
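The same check can be wrapped in a function, as a rough sketch. `fetch_if_missing` is a made-up helper name, not part of wget, and the messages are only illustrative.

```shell
# fetch_if_missing FILE URL (hypothetical helper)
# Downloads URL into FILE only when FILE is missing or empty.
fetch_if_missing() {
    if [ ! -s "$1" ]; then            # -s: file exists and is not empty
        echo "downloading $1..."
        wget -q -O "$1" "$2"
    else
        echo "$1 already present, skipping"
    fi
}

# Usage (would hit the network, so it is left commented out here):
# fetch_if_missing file.txt http://www.example.com/file.txt
```

Quoting `"$1"` also keeps the test working for file names that contain spaces.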
>wget to a filename (when a long name)
wget -O filename "http://example.com"
>wget files to a folder
wget -P /path/to/directory "http://example.com"
>delete current bash command
Ctrl+U
or
Ctrl+C
or
Alt+Shift+#
//to make it to history
>add things to history (e.g. "addmetohistory")
#addmetohistory
//just add a "#" before~~
>sleep awhile or wait for a moment or schedule a job
sleep 5;echo hi
>count the time for executing a command
time echo hi
>backup with rsync
rsync -av filename filename.bak
rsync -av directory directory.bak
rsync -av --ignore-existing directory/ directory.bak
rsync -av --update directory directory.bak
//--update: skip files that are newer on receiver (i prefer this one!)
>make all directories at one time!
mkdir -p project/{lib/ext,bin,src,doc/{html,info,pdf},demo/stat}
//-p: make parent directory
//this will create project/doc/html/; project/doc/info; project/lib/ext, etc
>run command only if another command returns zero exit status (well done)
cd tmp/ && tar xvf ~/a.tar
>run command only if another command returns non-zero exit status (not finish)
cd tmp/a/b/c ||mkdir -p tmp/a/b/c
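A minimal sketch of `||` and `&&` in action, using a temporary directory so nothing real is touched. The variable names are made up.

```shell
start=$PWD
base=$(mktemp -d)

# The directory does not exist yet, so cd fails (non-zero exit)
# and the command after || runs.
created=no
cd "$base/a/b/c" 2>/dev/null || { mkdir -p "$base/a/b/c" && created=yes; }

# Now cd succeeds (zero exit), so the command after && runs.
entered=no
cd "$base/a/b/c" && entered=yes

echo "created=$created entered=$entered"
cd "$start" && rm -rf "$base"
```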
>extract to a path
tar xvf filename.gz -C /path/to/directory
//the archive name must come right after the f option
>use backslash "\" to break long command
cd tmp/a/b/c \
> || \
>mkdir -p tmp/a/b/c
>get pwd
VAR=$PWD; cd ~; tar xvf file.tar -C $VAR
//PWD needs to be in capital letters
>list file type of file (e.g. /tmp/)
file /tmp/
// /tmp/: directory
>bash script
#!/bin/bash
file=${1#*.}
//remove everything up to and including the first "."
file=${1%.*}
//remove the last "." and everything after it
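A quick sketch of the four related expansions, using a made-up file name with two dots so the difference between the short and long forms shows up. `split_name` is a hypothetical helper used only to get a `$1` to work on.

```shell
# Prints: ${1#*.}  ${1%.*}  ${1##*.}  ${1%%.*}
split_name() {
    echo "${1#*.} ${1%.*} ${1##*.} ${1%%.*}"
}

# #  removes the shortest match from the front, ## the longest;
# %  removes the shortest match from the end,   %% the longest.
parts=$(split_name sample.fastq.gz)
echo "$parts"
```

For `sample.fastq.gz` this gives `fastq.gz`, `sample.fastq`, `gz` and `sample`, in that order.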
=-=-=-=-=-A lot more coming!! =-=-=-=-=-=-=-=-=-=waitwait-=-=-=-=-=-=-=-=-=-