How To Parallelise CSV Reader - New Chapter

Learning new language and library features in isolation might not be the best choice. That’s why I also added bigger examples to my book, where many C++ elements work together.

This time I’d like to describe another book update where I create and walk through a CSV reader application. The application uses a lot of language and library components. And the main task is to parallelise it. Can it work faster than the sequential version?

The New Chapter - How To Parallelise CSV Reader

To have a more extensive example where I could present multiple C++17 elements, I decided to go for a CSV reader application. The app exposes a few problems, and it requires combining not only the filesystem library but also different algorithms and utilities from the STL.

Imagine you work with some sales data, and one task is to calculate a sum of orders for some products. Your shopping system is elementary, and instead of some database, you have CSV files with the data about orders. There’s one file per product.

For example, here are the book sales:

date	coupon code	price	discount	quantity
5-12-2018		10.0	0	2
5-12-2018		10.0	0	1
6-12-2018	Santa	10.0	0.25	1
7-12-2018		10.0	0	1

Each line shows a book sale on a specific date. For example, on 5th Dec three books were sold at $10 each, and one person bought two of them. On 6th Dec we had one transaction with a coupon code.

The data is encoded as a CSV file, sales/book.csv:

5-12-2018;;10.0;0;2;
5-12-2018;;10.0;0;1;
6-12-2018;Santa;10.0;0.25;1;
7-12-2018;;10.0;0;1;

The application should read the data and then calculate the sum. In the above case we have:

sum = 10*2+10*1+       // 5th Dec
      10*(1-0.25)*1 +  // 6th Dec with 25% coupon
      10*1;            // 7th Dec

For the above sales data, the final sum is 47.5.
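The same arithmetic can be sketched in code. This is only an illustration of the formula above, not the code from the book: the OrderRecord type, its field names, and the helper functions are hypothetical, and the discount is assumed to be stored as a fraction (0.25 for a 25% coupon).

```cpp
#include <vector>

// Hypothetical record for one CSV row; the names are illustrative only.
struct OrderRecord {
    double price;       // unit price
    double discount;    // fraction in [0, 1], e.g. 0.25 for a 25% coupon
    unsigned quantity;  // number of items in the order
};

// Value of a single order: unit price, reduced by the discount, times quantity.
double OrderValue(const OrderRecord& r) {
    return r.price * (1.0 - r.discount) * r.quantity;
}

// Plain sequential sum over all records.
double SumOrders(const std::vector<OrderRecord>& records) {
    double sum = 0.0;
    for (const auto& r : records)
        sum += OrderValue(r);
    return sum;
}
```

Feeding the four rows from sales/book.csv into SumOrders gives 20 + 10 + 7.5 + 10, which matches the sum computed by hand.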

Here are the requirements of the application we want to build:

The app loads all CSV files in a given folder - read from the first argument in the command line
The files might contain thousands of records but will fit into memory. There’s no need to provide extra support for huge files
Optionally, the app reads the start and the end date from the second and the third command line arguments
Each CSV line has the following structure: date;coupon code;unit price;discount;quantity;
The application sums all orders between given dates and prints the sum to the standard output
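To make the line format more concrete, here is a minimal sketch of how one line could be split into fields and how the date field could be converted. It is not the book's implementation: SplitLine, ParseInt, ParseDate and the Date struct are hypothetical names, and the day-month-year order is an assumption based on the sample data (5-12-2018 being 5th Dec).

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>
#include <vector>

// Split on a delimiter without copying; the views point into the input.
// A trailing delimiter (as in the sample file) does not add an empty field.
std::vector<std::string_view> SplitLine(std::string_view line, char delim = ';') {
    std::vector<std::string_view> fields;
    while (!line.empty()) {
        const auto pos = line.find(delim);
        if (pos == std::string_view::npos) {
            fields.push_back(line);
            break;
        }
        fields.push_back(line.substr(0, pos));
        line.remove_prefix(pos + 1);
    }
    return fields;
}

// Convert the whole view to an int, or return nullopt on any error.
std::optional<int> ParseInt(std::string_view sv) {
    int value = 0;
    const char* last = sv.data() + sv.size();
    const auto [ptr, ec] = std::from_chars(sv.data(), last, value);
    if (ec != std::errc{} || ptr != last)
        return std::nullopt;
    return value;
}

// Hypothetical date type; assumes the day-month-year order of the samples.
struct Date {
    int day;
    int month;
    int year;
};

std::optional<Date> ParseDate(std::string_view sv) {
    const auto parts = SplitLine(sv, '-');
    if (parts.size() != 3)
        return std::nullopt;
    const auto d = ParseInt(parts[0]);
    const auto m = ParseInt(parts[1]);
    const auto y = ParseInt(parts[2]);
    if (!d || !m || !y)
        return std::nullopt;
    return Date{*d, *m, *y};
}
```

The price and discount fields could be converted the same way with the floating-point overloads of std::from_chars, but those need a fairly recent standard library, so the sketch sticks to integers.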

I’ll guide you through the design of the app and explain which places can be parallelised. In the end, we’ll discuss what worked, and what were the issues and possible improvements.

For example, here’s a flow diagram of the parallel version:

The code uses not only parallel algorithms but also new language and library features. For example, to parse data, it uses std::string_view, new conversion routines (std::from_chars), and std::optional. To process files, the application leverages several methods from std::filesystem.

The code contains more than 300 lines… x2, as there’s a sequential and a parallel version.

Here’s the link to the book: C++17 In Detail @Leanpub

Acknowledgements

Special thanks to JFT, Jacek Galowicz, Michał Czaja, Łukasz Rachwalski, Billy O’Neil and other reviewers who contributed to the chapter!

Book Mentions

So far the book has been mentioned in several places.

The book is listed in one of the articles from the Visual C++ Team: Books on C++17 | Visual C++ Team Blog
There’s a review at CppDepend blog: C++ 17 In Detail Book Review – CppDepend Blog (including a little discount)
And there’s also a GoodReads page: C++17 in Detail @GoodReads

The Plans

The book is still not 100% ready, but it’s getting close to the end. Here’s the current plan:

rewrite the filesystem chapter (in progress)
describe missing features: polymorphic allocators, aggregate initialisation, scoped_lock, update structured bindings intro
polishing across the whole book

The filesystem chapter should be ready in mid-January.

Until the book is 100% done, you have a chance to buy it much cheaper and get free updates later.

Your Feedback

I appreciate your initial feedback and support! The book now has almost 800 readers (and only six refunds)! That’s not too bad, I think :)

Let me know what your experience with the book is. What would you like to change? What would you like to see more of?

You can use this comment site:
https://leanpub.com/cpp17indetail/feedback

Or forum:
https://community.leanpub.com/c/cpp17indetail

To celebrate the update, I offer a nice 10% discount, available till the end of the year.

Just use this link to buy the book:

leanpub.com/cpp17indetail/EndOfYearPromo