InfinityQuest - Programming Code Tutorials and Examples with Python, C++, Java, PHP, C#, JavaScript, Swift and more

Processing Fixed-Length Records in bash

InfinityCoder February 21, 2017

You need to read and process data that is in a fixed-length (also called fixed-width) form.

Use Perl or gawk 2.13 or greater. Given a file like:

$ cat fixed-length_file
Header1-----------Header2-------------------------Header3---------
Rec1 Field1       Rec1 Field2                     Rec1 Field3
Rec2 Field1       Rec2 Field2                     Rec2 Field3
Rec3 Field1       Rec3 Field2                     Rec3 Field3

You can process it with GNU awk (gawk) by setting FIELDWIDTHS to the correct field lengths, setting OFS as desired, and assigning a field to itself ($1 = $1) so that gawk rebuilds the record using the new separator.

However, gawk does not remove the spaces used to pad the original fields, so we use two gsub calls to do that: one for the internal fields and one for the last field in each record. Finally, we just print. Note that ➝ denotes a literal tab character in the output.

The output is a little hard to read, so there is a hex dump as well.

Recall that ASCII tab is 09 while ASCII space is 20.
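If hexdump is not installed, od can show the same byte values. The following is a minimal sketch, assuming a POSIX od; show_bytes is a hypothetical helper name used only for illustration.

```shell
# Sketch: dump input bytes in hex so tabs (09) and spaces (20) can be told apart.
# show_bytes is a hypothetical helper, not part of the recipe.
show_bytes() {
    # -An suppresses the offset column; -tx1 prints each byte as two hex digits
    od -An -tx1
}

# The input is: a, tab, b, space, c, newline.
# The dump contains the bytes 61 09 62 20 63 0a (exact spacing varies by od version).
printf 'a\tb c\n' | show_bytes
```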

$ gawk ' BEGIN { FIELDWIDTHS = "18 32 16"; OFS = "\t" }
      { $1 = $1; gsub(/ +\t/, "\t"); gsub(/ +$/, ""); print }' fixed-length_file
Header1----------- ➝ Header2------------------------- ➝ Header3---------
Rec1 Field1 ➝ Rec1 Field2 ➝ Rec1 Field3
Rec2 Field1 ➝ Rec2 Field2 ➝ Rec2 Field3
Rec3 Field1 ➝ Rec3 Field2 ➝ Rec3 Field3
 
$ gawk ' BEGIN { FIELDWIDTHS = "18 32 16"; OFS = "\t" }
      { $1 = $1; gsub(/ +\t/, "\t"); gsub(/ +$/, ""); print }' fixed-length_file | hexdump -C
00000000 48 65 61 64 65 72 31 2d 2d 2d 2d 2d 2d 2d 2d 2d |Header1---------|
00000010 2d 2d 09 48 65 61 64 65 72 32 2d 2d 2d 2d 2d 2d |--.Header2------|
00000020 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d |----------------|
00000030 2d 2d 2d 09 48 65 61 64 65 72 33 2d 2d 2d 2d 2d |---.Header3-----|
00000040 2d 2d 2d 2d 0a 52 65 63 31 20 46 69 65 6c 64 31 |----.Rec1 Field1|
00000050 09 52 65 63 31 20 46 69 65 6c 64 32 09 52 65 63 |.Rec1 Field2.Rec|
00000060 31 20 46 69 65 6c 64 33 0a 52 65 63 32 20 46 69 |1 Field3.Rec2 Fi|
00000070 65 6c 64 31 09 52 65 63 32 20 46 69 65 6c 64 32 |eld1.Rec2 Field2|
00000080 09 52 65 63 32 20 46 69 65 6c 64 33 0a 52 65 63 |.Rec2 Field3.Rec|
00000090 33 20 46 69 65 6c 64 31 09 52 65 63 33 20 46 69 |3 Field1.Rec3 Fi|
000000a0 65 6c 64 32 09 52 65 63 33 20 46 69 65 6c 64 33 |eld2.Rec3 Field3|
000000b0 0a                                              |.|
000000b1
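The $1 = $1 assignment in the gawk command can look like a no-op. The sketch below, which uses plain awk with default whitespace splitting rather than FIELDWIDTHS, suggests why it matters: OFS is only applied once a field assignment forces awk to rebuild the record. The function names are hypothetical and exist only for this illustration.

```shell
# Sketch: awk applies OFS only after a field assignment rebuilds $0.
# Uses default field splitting, so no gawk-specific features are required.

# Hypothetical helper: print the record without touching any field.
no_rebuild() {
    printf 'a b c\n' | awk 'BEGIN { OFS = "-" } { print }'
}

# Hypothetical helper: same input, but assign $1 to itself before printing.
with_rebuild() {
    printf 'a b c\n' | awk 'BEGIN { OFS = "-" } { $1 = $1; print }'
}

no_rebuild      # prints: a b c
with_rebuild    # prints: a-b-c
```

The same mechanism is what replaces the fixed-width field boundaries with tabs in the recipe.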

If you don’t have gawk, you can use Perl, which is more straightforward anyway.

We use a non-printing while input loop (-n), unpack each record ($_) as it’s read, and turn the resulting list back into a scalar by joining the elements with a tab.

We then print each record, adding a newline at the end:

$ perl -ne 'print join("\t", unpack("A18 A32 A16", $_) ) . "\n";' fixed-length_file
Header1----------- ➝ Header2------------------------- ➝ Header3---------
Rec1 Field1 ➝ Rec1 Field2 ➝ Rec1 Field3
Rec2 Field1 ➝ Rec2 Field2 ➝ Rec2 Field3
Rec3 Field1 ➝ Rec3 Field2 ➝ Rec3 Field3
 
$ perl -ne 'print join("\t", unpack("A18 A32 A16", $_) ) . "\n";' fixed-length_file |
hexdump -C
00000000 48 65 61 64 65 72 31 2d 2d 2d 2d 2d 2d 2d 2d 2d |Header1---------|
00000010 2d 2d 09 48 65 61 64 65 72 32 2d 2d 2d 2d 2d 2d |--.Header2------|
00000020 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d |----------------|
00000030 2d 2d 2d 09 48 65 61 64 65 72 33 2d 2d 2d 2d 2d |---.Header3-----|
00000040 2d 2d 2d 2d 0a 52 65 63 31 20 46 69 65 6c 64 31 |----.Rec1 Field1|
00000050 09 52 65 63 31 20 46 69 65 6c 64 32 09 52 65 63 |.Rec1 Field2.Rec|
00000060 31 20 46 69 65 6c 64 33 0a 52 65 63 32 20 46 69 |1 Field3.Rec2 Fi|
00000070 65 6c 64 31 09 52 65 63 32 20 46 69 65 6c 64 32 |eld1.Rec2 Field2|
00000080 09 52 65 63 32 20 46 69 65 6c 64 33 0a 52 65 63 |.Rec2 Field3.Rec|
00000090 33 20 46 69 65 6c 64 31 09 52 65 63 33 20 46 69 |3 Field1.Rec3 Fi|
000000a0 65 6c 64 32 09 52 65 63 33 20 46 69 65 6c 64 33 |eld2.Rec3 Field3|
000000b0 0a                                              |.|
000000b1

See the Perl documentation for the pack and unpack template formats.

Anyone with a Unix background will automatically use some kind of delimiter in output, since the textutils toolchain is never far from mind, so fixed-length records are rare in the Unix world.

They are very common in the mainframe world, however, so they occasionally crop up in large applications that originated on big iron, such as some applications from SAP.

As we’ve just seen, such records are no problem to handle.
One caveat to this recipe is that it requires each record to end in a newline.
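If the data instead arrives as one unbroken stream, one possible workaround (not part of the recipe itself) is to cut it into lines of the record length first. The sketch below assumes single-byte characters and a POSIX fold; split_records is a hypothetical helper name.

```shell
# Sketch: split an unbroken byte stream into fixed-size lines so that
# line-oriented tools such as gawk or perl -n can process each record.
# split_records is a hypothetical helper, not part of the recipe.
split_records() {
    # $1 = record length in bytes; -b makes fold count bytes, not columns
    fold -b -w "$1"
}

# Demonstration with 3-byte records; prints AAA, BBB and CCC on separate lines.
printf 'AAABBBCCC' | split_records 3
```

For the sample file, the record length would be 66 (18 + 32 + 16), and the output of split_records could be piped into either command shown earlier.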
