
When you create a model for your domain, C++ offers you flexibility and increased type safety with so-called strong types. Rather than working with simple built-in types, you can create a set of well-defined classes that better suit your needs. This post shows one concrete example of such a design practice.

This is a guest post by prof. Bogusław Cyganek:

Prof. Cyganek is a researcher and lecturer at the Department of Electronics, AGH University of Science and Technology in Cracow, Poland. He has worked as a software engineer for a number of companies such as Nisus Writer USA, Compression Techniques USA, Manta Corp. USA, Visual Atoms UK, Wroclaw University in Poland, and Diagnostyka Inc. Poland. His research interests include computer vision and pattern recognition, as well as the development of embedded systems. See his recent book at Amazon and his home page.

Often when you work on projects that process people’s identity, you might need a class representing a Person. As a start, let’s consider the following implementation:

struct Person
{
    std::string firstName;
    std::string lastName;

    int pesel_id {};
    
    // ...
};

Person myself { "Adam", "Kowal", 94120612345 };

std::cout << "I'm " << myself.firstName << " ";
std::cout << myself.lastName << " and my ID is: " << myself.pesel_id << std::endl;

But the output was not exactly as expected:

I'm Adam Kowal and my ID is: -368668167

The pesel_id field holds a PESEL number, an ID used in Poland, similar to the Social Security Number in the USA or the 15-digit Carte Vitale number in France. It is quite a useful field in a database since it is unique to every citizen. Moreover, it encodes the date of birth and gender, so we gain both pieces of information in one member. However, a PESEL has 11 digits, which in our example simply did not fit into the int type. Then again, it would be even worse if it did, because we would leave the class without a proper lesson and with a potential bomb in the code…

The first lesson is to always check whether the range of our data fits into the range of the chosen type, such as int. To do this, we need to answer the question: how many bits are necessary to store an 11-digit number?

The largest unsigned value that fits in 8 bits is 2^8 - 1 = 255, so our question boils down to finding the lowest number of bits N that fulfills 2^N - 1 ≥ 99999999999.

A simple calculation gives N ≥ 37, since 2^36 = 68719476736 is too small while 2^37 = 137438953472 is sufficient. Now we see what happened: since int on our system is stored in 4 bytes, as can be easily verified with the sizeof( int ) operator, only the lowest 32 bits could be stored, and the leftmost of them, the sign bit, made the value negative. This error seems obvious now, but how many times do we set values, for example read from a spreadsheet, without checking their range? We also frequently overlook messages issued by the compiler, which warned us in this case as well.
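The calculation above can be double-checked at compile time. The following is only a minimal sketch: the helper names BitsNeeded and FitsElevenDigits are assumptions made for this illustration and do not come from the original code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: the smallest N such that 2^N - 1 >= maxValue.
constexpr int BitsNeeded( std::uint64_t maxValue )
{
    int bits = 0;
    while( maxValue != 0 )
    {
        ++bits;
        maxValue >>= 1;
    }
    return bits;
}

// The largest 11-digit number needs 37 bits ...
static_assert( BitsNeeded( 99999999999ULL ) == 37, "unexpected bit count" );
// ... while 8 bits are enough for values up to 255.
static_assert( BitsNeeded( 255 ) == 8, "unexpected bit count" );

// Hypothetical helper: true if every 11-digit number fits into
// the integer type T (the sign bit is not counted by digits).
template < typename T >
constexpr bool FitsElevenDigits()
{
    return std::numeric_limits< T >::digits >= BitsNeeded( 99999999999ULL );
}
```

On a platform with a 4-byte int, FitsElevenDigits< int >() evaluates to false, whereas FitsElevenDigits< unsigned long long >() is true.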

The second lesson comes when we try to fix the above problem. The simplest fix is to choose a type with more bits, such as unsigned long long. Useful, but it still does not free us from checking that 8 * sizeof(unsigned long long) > 36 on all systems on which we want our code to work. An alternative is to use an array, with one cell for each digit. Surely std::vector will work, but std::array<unsigned char, 11> may be more efficient since the number of digits is always the same. Still, it takes at least 11 bytes of storage, which may be further padded depending on the system's alignment rules. No good for a database.

Can we do better? Yes, we can ;)

Since only 4 bits are needed to store a digit 0-9, each byte can hold two digits. The bit field feature of C++ comes in handy here, as in the following structure:

struct NibblePair
{
    unsigned char fFirst  : 4;        // define two bit fields
    unsigned char fSecond : 4;        // of a total size of 1 byte

    NibblePair() : fFirst( 0 ), fSecond( 0 ) {}
};  

The above allows for storage in the binary coded decimal format (BCD), today a little forgotten but still in use on embedded platforms and for precise computations.
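As a small illustration of the BCD idea, the sketch below packs two decimal digits into a single NibblePair and reads them back. The helper names PackDigits and UnpackToInt are assumptions made for this example, as is the choice to treat fFirst as the tens digit.

```cpp
#include <cassert>

struct NibblePair
{
    unsigned char fFirst  : 4;        // define two bit fields
    unsigned char fSecond : 4;        // of a total size of 1 byte

    NibblePair() : fFirst( 0 ), fSecond( 0 ) {}
};

// Hypothetical helper: stores two decimal digits (0-9) in one NibblePair.
inline NibblePair PackDigits( int first, int second )
{
    assert( first >= 0 && first <= 9 );
    assert( second >= 0 && second <= 9 );

    NibblePair pair;
    pair.fFirst  = static_cast< unsigned char >( first );
    pair.fSecond = static_cast< unsigned char >( second );
    return pair;
}

// Hypothetical helper: reads the pair back as a two-digit value,
// treating fFirst as the tens digit (an assumption of this sketch).
inline int UnpackToInt( const NibblePair & pair )
{
    return pair.fFirst * 10 + pair.fSecond;
}
```

For example, UnpackToInt( PackDigits( 4, 2 ) ) gives back 42, with both digits stored in a single NibblePair.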

Now we can cut the size of our previous array in half, that is, we may end up with something like this:

std::array<NibblePair, 6> thePESEL;

That is 6 bytes, even fewer than the 8 bytes typically occupied by unsigned long long.
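The storage sizes of the three candidate representations can be compared directly with sizeof. This is just a sketch: the constant names are assumptions made for the example, and the exact layout of bit fields is implementation-defined, although common compilers pack NibblePair into one byte.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct NibblePair
{
    unsigned char fFirst  : 4;
    unsigned char fSecond : 4;

    NibblePair() : fFirst( 0 ), fSecond( 0 ) {}
};

// The check mentioned earlier in the text, expressed as a compile-time assertion.
static_assert( 8 * sizeof( unsigned long long ) > 36,
               "unsigned long long is too narrow for 11 decimal digits" );

// Hypothetical constants: bytes used by each representation of 11 digits.
constexpr std::size_t kBytesAsInteger     = sizeof( unsigned long long );
constexpr std::size_t kBytesAsDigitArray  = sizeof( std::array< unsigned char, 11 > );
constexpr std::size_t kBytesAsNibbleArray = sizeof( std::array< NibblePair, 6 > );
```

On typical desktop compilers these constants come out as 8, 11 and 6, respectively, but it is worth verifying them on every target platform.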

However, although we are almost done, we easily notice that reading and writing 4-bit chunks is not that convenient in practice, so some helper functions would be useful here. We are just about to write them when we notice that such long numbers can come up again in the future: ISBN numbers of books, ISSN numbers of journals, passport serial numbers, or car chassis IDs are just a few examples. So, instead of writing something specific exclusively for PESEL numbers, we come up with the idea of a more general class for this and similar situations: the TLongNumberFor class.

The class to store long numbers

The above code snippets can be joined together into one class able to efficiently store numbers of any, but fixed, length in the BCD format. One version of it is presented here.

Listing 1. Definition of the TLongNumberFor class.

// This class efficiently stores a series of numbers, such as 12345678901234567890
// of a given length. Each number is stored in a nibble (i.e. 4 bits).
//
// The auto keyword in a template parameter -
// the type is deduced at the point of instantiation. 
//
template < auto MAX_NUMBERS >
class TLongNumberFor
{
public:
    static const auto kMaxNumbers { MAX_NUMBERS };

private:
    // --------------------------------
    struct NibblePair
    {
        unsigned char fFirst  : 4;    // define two bit fields
        unsigned char fSecond : 4;    // of a total size of 1 byte

        NibblePair() : fFirst( 0 ), fSecond( 0 ) {}
    };
    // --------------------------------

    static const auto kNumOfBytes = (kMaxNumbers >> 1) + (kMaxNumbers & 0x01);

    using NibbleArray = std::array< NibblePair, kNumOfBytes >;
    NibbleArray    fData {}; // Here we efficiently store the nibbles 

    // Helper functions 
    // Returns true if first nibble 
    bool IsFirstNibble( int index ) const { return ( index & 0x01 ) == 0; }
        
    // Returns address of a number in the fData structure
    auto ReComputeIndex( int index ) const { return index >> 1; }

The most interesting parts are the following setter and getter functions.

public:
    int GetNumberAt( int position ) const
    {
        // Negative positions are rejected as well
        assert( position >= 0 && position < kMaxNumbers );
        if( position < 0 || position >= kMaxNumbers )
            throw std::out_of_range( "position out of range" );

        return IsFirstNibble( position ) ?
                fData[ ReComputeIndex( position ) ].fFirst :
                fData[ ReComputeIndex( position ) ].fSecond;
    }

    void SetNumberAt( int position, int val )
    {
        assert( val >= 0 && val <= 9 ); // check that we don't abuse it
        assert( position >= 0 && position < kMaxNumbers );
        if( position < 0 || position >= kMaxNumbers )
            throw std::out_of_range( "position out of range" );

        IsFirstNibble( position ) ?
                ( fData[ ReComputeIndex( position ) ].fFirst = val ) :
                ( fData[ ReComputeIndex( position ) ].fSecond = val );
    }
};
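To see the setter and getter at work, here is a short usage sketch. So that it compiles on its own, it repeats a condensed variant of the class from Listing 1 (using static constexpr members, which is a simplification made for this example). The function DemoRoundTrip is hypothetical, as is the choice to treat position 0 as the least significant digit.

```cpp
#include <array>
#include <cassert>
#include <stdexcept>

// Condensed variant of Listing 1, repeated so the example is self-contained.
template < auto MAX_NUMBERS >
class TLongNumberFor
{
public:
    static constexpr auto kMaxNumbers = MAX_NUMBERS;

private:
    struct NibblePair
    {
        unsigned char fFirst  : 4;
        unsigned char fSecond : 4;

        NibblePair() : fFirst( 0 ), fSecond( 0 ) {}
    };

    static constexpr auto kNumOfBytes = ( kMaxNumbers >> 1 ) + ( kMaxNumbers & 0x01 );

    std::array< NibblePair, kNumOfBytes > fData {};

    bool IsFirstNibble( int index ) const { return ( index & 0x01 ) == 0; }
    auto ReComputeIndex( int index ) const { return index >> 1; }

public:
    int GetNumberAt( int position ) const
    {
        if( position < 0 || position >= kMaxNumbers )
            throw std::out_of_range( "position out of range" );

        const auto & cell = fData[ ReComputeIndex( position ) ];
        return IsFirstNibble( position ) ? cell.fFirst : cell.fSecond;
    }

    void SetNumberAt( int position, int val )
    {
        assert( val >= 0 && val <= 9 );
        if( position < 0 || position >= kMaxNumbers )
            throw std::out_of_range( "position out of range" );

        auto & cell = fData[ ReComputeIndex( position ) ];
        if( IsFirstNibble( position ) )
            cell.fFirst = static_cast< unsigned char >( val );
        else
            cell.fSecond = static_cast< unsigned char >( val );
    }
};

// Hypothetical usage: stores the digits 1, 2, 3 at positions 0, 1, 2
// and reads them back, treating position 0 as the least significant digit.
inline int DemoRoundTrip()
{
    TLongNumberFor< 3 > number;
    number.SetNumberAt( 0, 1 );
    number.SetNumberAt( 1, 2 );
    number.SetNumberAt( 2, 3 );

    return number.GetNumberAt( 2 ) * 100
         + number.GetNumberAt( 1 ) * 10
         + number.GetNumberAt( 0 );
}
```

Three digits occupy two NibblePair cells here, and the second half of the last cell simply stays unused.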

Good, but… why not define the subscript operator? Let’s try:

// Overloaded subscript operator but ONLY to READ.
// To write, we will need a proxy pattern (see below).
int operator [] ( int position ) const
{
    assert( position < kMaxNumbers );
    return GetNumberAt( position );
}

It seems that the above operator [] works fine, but only for read operations. When we try to both read and write, we encounter a problem: we cannot simply return a reference to a nibble, i.e. the first or the second 4-bit field in which we store our digits. Can we fix this? Yes, with an interesting proxy pattern, but that is a slightly longer story, maybe for another post. This, as well as the full definition of the TLongNumberFor class, which also contains conversions to and from std::string, can be found in my recent book. Don’t worry, though: the code is available right away on GitHub.
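For the curious, the general shape of such a proxy can be sketched as follows. This is not the implementation from the book: TDigitsWithProxy and DigitProxy are hypothetical names, and the class is a simplified stand-in for TLongNumberFor that keeps only what is needed to show the idea. The non-const operator [] returns a small object that remembers the position and forwards reads and writes to the getter and setter.

```cpp
#include <array>
#include <cassert>

// Hypothetical, simplified stand-in for TLongNumberFor.
template < int MAX_NUMBERS >
class TDigitsWithProxy
{
    struct NibblePair
    {
        unsigned char fFirst  : 4;
        unsigned char fSecond : 4;

        NibblePair() : fFirst( 0 ), fSecond( 0 ) {}
    };

    std::array< NibblePair, ( MAX_NUMBERS + 1 ) / 2 > fData {};

public:
    int GetNumberAt( int position ) const
    {
        assert( position >= 0 && position < MAX_NUMBERS );
        const NibblePair & cell = fData[ position / 2 ];
        return position % 2 == 0 ? cell.fFirst : cell.fSecond;
    }

    void SetNumberAt( int position, int val )
    {
        assert( position >= 0 && position < MAX_NUMBERS );
        assert( val >= 0 && val <= 9 );
        NibblePair & cell = fData[ position / 2 ];
        if( position % 2 == 0 )
            cell.fFirst = static_cast< unsigned char >( val );
        else
            cell.fSecond = static_cast< unsigned char >( val );
    }

    // The proxy stands for one digit of one particular object.
    class DigitProxy
    {
        TDigitsWithProxy & fOwner;
        int                fPosition;

    public:
        DigitProxy( TDigitsWithProxy & owner, int position )
            : fOwner( owner ), fPosition( position ) {}

        // Reading: implicit conversion to int
        operator int () const { return fOwner.GetNumberAt( fPosition ); }

        // Writing: assignment from int
        DigitProxy & operator = ( int val )
        {
            fOwner.SetNumberAt( fPosition, val );
            return * this;
        }

        // Writing from another proxy copies the digit, not the proxy
        DigitProxy & operator = ( const DigitProxy & other )
        {
            return * this = static_cast< int >( other );
        }
    };

    DigitProxy operator [] ( int position ) { return DigitProxy( * this, position ); }
    int        operator [] ( int position ) const { return GetNumberAt( position ); }
};

// Hypothetical usage of the read-and-write subscript operator.
inline int DemoProxy()
{
    TDigitsWithProxy< 11 > digits;
    digits[ 0 ] = 5;                // write through the proxy
    digits[ 1 ] = 4;
    digits[ 2 ] = digits[ 0 ];      // copy one digit to another
    return digits[ 2 ] * 100 + digits[ 1 ] * 10 + digits[ 0 ];
}
```

The same trick is used by std::vector<bool>, whose operator [] also returns a proxy object rather than a real reference.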

The PESEL class

Now the class to represent a specific series of digits can be defined as a wrapper around the TLongNumberFor<D> object fData, where D denotes the number of digits. This way PESEL can be defined as follows.

Listing 2. Definition of the PESEL class.

class PESEL
{
    // Some constants specific to the Polish PESEL number
    enum { kBirthYear_Dec = 10, kBirthYear_Sngl = 9, kSex = 1 };

public:
    enum class ESex { kMan, kWoman };

private:
    using LongNumberObject = TLongNumberFor< 11 >;
    LongNumberObject    fData;

public:
    PESEL( void ) {}
    PESEL( const std::string & s ) : fData( s ) {}

public:
    // Returns the two-digit year of birth
    auto GetYearOfBirth( void ) const
    {
        return fData.GetNumberAt( kBirthYear_Dec ) * 10
            + fData.GetNumberAt( kBirthYear_Sngl );
    }

    // An odd digit denotes a man, an even one a woman
    ESex GetSex( void ) const
    {
        return ( fData.GetNumberAt( kSex ) & 0x01 ) == 0x01 ?
                ESex::kMan : ESex::kWoman;
    }
};

A useful thing to have is the converting constructor of PESEL that takes a string, which allows for initialization with a PESEL number in the std::string format. This, in turn, requires a matching converting constructor in the TLongNumberFor class. For simplicity, it is omitted here. However, you can look it up in the code on GitHub.
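To give a feeling for what such a conversion involves, here is a minimal sketch of the digit extraction step. It is not the code from GitHub: the function DigitsFromString is hypothetical, and it assumes that index 0 holds the rightmost character, which is consistent with the constants kBirthYear_Dec = 10 and kSex = 1 in Listing 2. A converting constructor could pass each extracted digit to SetNumberAt in the same order.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a string of exactly N decimal digits into
// an array in which index 0 holds the rightmost (last) character.
template < std::size_t N >
std::array< int, N > DigitsFromString( const std::string & s )
{
    if( s.size() != N )
        throw std::invalid_argument( "unexpected number of digits" );

    std::array< int, N > digits {};
    for( std::size_t i = 0; i < N; ++i )
    {
        const unsigned char c = static_cast< unsigned char >( s[ N - 1 - i ] );
        if( ! std::isdigit( c ) )
            throw std::invalid_argument( "non-digit character" );

        digits[ i ] = c - '0';
    }
    return digits;
}
```

With this ordering, DigitsFromString< 11 >( "94120612345" ) places the two year digits, 9 and 4, at indices 10 and 9.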

Now, we can amend our Person class, as follows.


struct Person
{
    std::string firstName;
    std::string lastName;

    PESEL person_id;
    
    // ...
};

Person myself { "Adam", "Kowal", "94120612345" };
std::cout << "I'm " << myself.firstName << " ";
// Printing person_id assumes an operator << for PESEL, omitted here
std::cout << myself.lastName << " and my ID is: " << myself.person_id << std::endl;

And now the output is as expected:

I'm Adam Kowal and my ID is: 94120612345

Warning: the code in this article assumes a perfect PESEL number that conforms to all the rules of the simplified PESEL specification (see Wikipedia). However, some IDs may have been assigned manually by clerks, sometimes with errors. A production-ready system should also handle those special cases.
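One of those rules is the check digit: the first ten digits are multiplied by the weights 1, 3, 7, 9, 1, 3, 7, 9, 1, 3, and the last digit must complement the sum to a multiple of 10. A minimal sketch of such a check could look as follows. The function name HasValidPeselChecksum is an assumption for this example, and the function verifies only the length, the characters, and the check digit, not the encoded date.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: checks the length, the characters and the check digit
// of a PESEL given as a string (leftmost character first).
inline bool HasValidPeselChecksum( const std::string & pesel )
{
    static const int kWeights[ 10 ] = { 1, 3, 7, 9, 1, 3, 7, 9, 1, 3 };

    if( pesel.size() != 11 )
        return false;

    int sum = 0;
    for( std::size_t i = 0; i < 11; ++i )
    {
        const unsigned char c = static_cast< unsigned char >( pesel[ i ] );
        if( ! std::isdigit( c ) )
            return false;

        if( i < 10 )
            sum += kWeights[ i ] * ( c - '0' );
    }

    // The check digit complements the weighted sum to a multiple of 10
    const int expected = ( 10 - sum % 10 ) % 10;
    return expected == pesel[ 10 ] - '0';
}
```

Note that the sample number used in this article, 94120612345, is purely illustrative and would not pass this check.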

What next?

The TLongNumberFor class and the PESEL class built on top of it are examples of strong types, i.e. rather than using built-in types, such as int or long long, we define dedicated classes to represent specific entities.

With these two we have also encountered two nice and very useful design patterns - the wrapper, as well as the mentioned proxy.

Conclusions

We have come a long way from a simple int to the TLongNumberFor and PESEL classes. The former prepares us for any fixed-length number. PESEL, in turn, helps to safely and efficiently store unique ID numbers, a very useful feature in any relational database. Moreover, we get the date of birth and a gender flag encoded in each PESEL ID as a bonus, so we can save on storing these as well. Here are some hints:

  • Always check the types and the range of values to store, and based on these choose the proper data type to represent them in C++
  • Use setters to control the range of valid entries
  • Prefer strong types over the built-in ones
  • Pay attention to the compiler warnings

This and other examples can be found in my latest book Introduction to Programming with C++ for Engineers.

Have fun!

References

  1. Cyganek B.: Introduction to Programming with C++ for Engineers. Wiley, 2021. @Amazon
  2. Source Code For The book @GitHub
  3. International Standard Book Number - Wikipedia
  4. PESEL - Wikipedia
  5. XBCD_Math - Extended Precision