Toggling string case in linux bash

2012-10-31
#sed #awk #python #bash #tr #dd #perl #console #linux

It’s quite an academic task, but anyway useful sometimes. I’ve collected different ways to do it in terminal in linux. Some of them work with UTF-8 characters (some it will toggle case for “й”, “ё” and so on. It will not in general handle special ligatures, such as “ß”" and “fi”.)

Ways are: sed, perl, python, awk, tr, bash, dd.

§ Using sed

Works with UTF-8 characters.

It is quite straightforward and allows to add custom rules easily. For example, I’ve added special handling for ligatures “ß”" and “fi”. I should note that conversion SS -> ß is not correct in general. So you may want to remove it.

$ echo ''Wie hйЮёßen Siefi тест'' | sed ''s/.*/\U&/;s/ß/SS/g;s/fi/FI/g''
WIE HЙЮЁSSEN SIEFI ТЕСТ

$ echo ''WIE HЙЮЁSSEN SIEFI ТЕСТ'' | sed ''s/ß/SS/g;s/fi/FI/g;s/.*/\L&/''
wie hйюёssen siefi тест

§ Using perl

Doesn’t work with UTF-8 characters.

I’m not a perl-ninja, may be there is a more efficient way. But it works.

$ echo ''Wie hйЮёßen Siefi тест'' | perl -ne ''print uc($_)''
WIE HйЮёßEN SIEfi тест

$ echo ''WIE HЙЮЁSSEN SIEFI ТЕСТ'' | perl -ne ''print lc($_)''
wie hЙЮЁssen siefi ТЕСТ

§ Using python

Doesn’t work with UTF-8 characters.

Python nowadays sometimes said to be replacement for perl. It can not convert Cyrillic letters (UTF-8) too.

$ echo ''Wie hйЮёßen Siefi тест'' | python -c "import sys; [sys.stdout.write(arg.upper()) for arg in raw_input()]; print "\n""
WIE HйЮёßEN SIEfi тест

$ echo ''WIE HЙЮЁSSEN SIEFI ТЕСТ'' | python -c "import sys; [sys.stdout.write(arg.lower()) for arg in raw_input()]; print "\n""
wie hЙЮЁssen siefi ТЕСТ

§ Using awk

Doesn’t work with UTF-8 characters in mawk, works with UTF-8 characters in gawk.

Default awk in Ubuntu 12.04 is mawk. To get UTF-8 support you have to install gawk and use it.

$ echo ''Wie hйЮёßen Siefi тест'' | gawk ''{for (i=1; i<=NF; i++) printf toupper($i)" "} END {print ""}''
WIE HЙЮЁßEN SIEfi ТЕСТ

$ echo ''WIE HЙЮЁSSEN SIEFI ТЕСТ'' | gawk ''{for (i=1; i<=NF; i++) printf tolower($i)" "} END {print ""}''
wie hйюёssen siefi тест

§ Using tr

Doesn’t work with UTF-8 characters.

It works with current locale. But I work in us locale and my native language is Russian.

It is the easiest way I believe. It also fits the purpose of tr — to translate and delete characters. It is possible to add custom rules such as “tr ‘ё’ ‘Ё’", but it caused new strange symbols to appear in the output.

$ echo ''Wie hйЮёßen Siefi тест'' | tr ''[:lower:]'' ''[:upper:]''
WIE HйЮёßEN SIEfi тест

$ echo ''WIE HЙЮЁSSEN SIEFI ТЕСТ'' | tr ''[:upper:]'' ''[:lower:]''
wie hЙЮЁssen siefi ТЕСТ

§ Using bash

Doesn’t work with UTF-8 characters.

Warning! It’s weird way for converting strings, but a good way to convert variables in bash scripts.

$ export a=''Wie hйЮёßen Siefi тест'' ; echo ${a^^}
WIE HйЮёßEN SIEfi тест

$ export a=''WIE HЙЮЁSSEN SIEFI ТЕСТ'' ; echo ${a,,}
wie hЙЮЁssen siefi ТЕСТ

§ Using dd

Doesn’t work with UTF-8 characters.

The bad news it outputs more information than only toggled string. Just for collection.

$ echo ''Wie hйЮёßen Siefi тест'' | dd conv=ucase
WIE HйЮёßEN SIEfi тест
0+1 records in
0+1 records out
32 bytes (32 B) copied, 0.000124458 s, 257 kB/s

$ echo ''WIE HЙЮЁSSEN SIEFI ТЕСТ'' | dd conv=lcase
wie hЙЮЁssen siefi ТЕСТ
0+1 records in
0+1 records out
31 bytes (31 B) copied, 7.913e-05 s, 392 kB/s

§ Using php

Doesn’t work with UTF-8 characters.

PHP can be used as scripting language for general purposes with php-cli.

$ echo ''Wie hйЮёßen Siefi тест'' | php -r "print strtoupper(fgets(STDIN));"
WIE HйЮёßEN SIEfi тест

$ echo ''WIE HЙЮЁSSEN SIEFI ТЕСТ'' | php -r "print strtolower(fgets(STDIN));"
wie hЙЮЁssen siefi ТЕСТ