Boost File Scanning Speed: Query File Attributes on Windows 50x Faster
Imagine you’re developing a tool that needs to scan for file changes across thousands of project files. In such scenarios, retrieving file attributes efficiently becomes critical. In this article, I’ll demonstrate a technique for getting file attributes that, in my tests, was more than 50 times faster than other standard Windows methods.
Let’s dive in and explore how we can achieve this.
Inspiration & Disclaimer
The inspiration for this article came from a recent update for Visual Assist - a tool that heavily improves Visual Studio experience and productivity for C# and C++ developers.
In one of their blog posts, they shared:
The initial parse is 10..15x faster!
What’s New in Visual Assist 2024—Featuring lightning fast parser performance [Webinar] - Tomato Soup
After watching the webinar, I noticed some details about efficiently getting file attributes and decided to give it a try on my machine. In other words, I tried to recreate their results.
Disclaimer: This post was written with the support and sponsorship of Idera, the company behind Visual Assist.
Understanding File Attribute Retrieval Methods on Windows
On Windows, there are at least a few options to check for a file change:
FindFirstFile[Ex]
GetFileAttributesEx
std::filesystem
Below, you can see the basic usage of each approach:
FindFirstFileEx
FindFirstFileEx is a Windows API function that allows for efficient searching of directories. It retrieves information about files that match a specified file name pattern. The function can be used with different information levels, such as FindExInfoBasic and FindExInfoStandard, to control the amount of file information fetched.
WIN32_FIND_DATA findFileData;
HANDLE hFind = FindFirstFileEx((directory + "\\*").c_str(), FindExInfoBasic, &findFileData, FindExSearchNameMatch, NULL, 0);
if (hFind != INVALID_HANDLE_VALUE) {
    do {
        // Process file information
    } while (FindNextFile(hFind, &findFileData) != 0);
    FindClose(hFind);
}
GetFileAttributesEx
GetFileAttributesEx is another Windows API function that retrieves file attributes for a specified file or directory. Unlike FindFirstFileEx, which is used for searching and listing files, GetFileAttributesEx is typically used for retrieving attributes of a single file or directory.
WIN32_FILE_ATTRIBUTE_DATA fileAttributeData;
if (GetFileAttributesEx((directory + "\\" + fileName).c_str(), GetFileExInfoStandard, &fileAttributeData)) {
    // Process file attributes
}
std::filesystem
Introduced in C++17, the std::filesystem library provides a modern and portable way to interact with the file system. It includes functions for file attribute retrieval, directory iteration, and other common file system operations.
// assumes: namespace fs = std::filesystem;
for (const auto& entry : fs::directory_iterator(directory)) {
    if (entry.is_regular_file()) {
        // Process file attributes
        auto ftime = fs::last_write_time(entry);
        ...
    }
}
The Benchmark
To evaluate the performance of different file attribute retrieval methods, I developed a small benchmark. This application measures the time taken by each method to retrieve file attributes for N number of files in a specified directory.
Here’s a rough overview of the code:
The FileInfo struct stores the file name and last write time.
struct FileInfo {
    std::string fileName;
    FILETIME lastWriteTime;
};
Each retrieval technique has to go over a directory and build a vector of FileInfo objects.
BenchmarkFindFirstFileEx
void BenchmarkFindFirstFileEx(const std::string& directory,
                              std::vector<FileInfo>& files,
                              FINDEX_INFO_LEVELS infoLevel)
{
    WIN32_FIND_DATA findFileData;
    HANDLE hFind = FindFirstFileEx((directory + "\\*").c_str(),
                                   infoLevel,
                                   &findFileData,
                                   FindExSearchNameMatch, NULL, 0);
    if (hFind == INVALID_HANDLE_VALUE) {
        std::cerr << "FindFirstFileEx failed ("
                  << GetLastError() << ")\n";
        return;
    }
    do {
        if (!(findFileData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)) {
            FileInfo fileInfo;
            fileInfo.fileName = findFileData.cFileName;
            fileInfo.lastWriteTime = findFileData.ftLastWriteTime;
            files.push_back(fileInfo);
        }
    } while (FindNextFile(hFind, &findFileData) != 0);
    FindClose(hFind);
}
BenchmarkGetFileAttributesEx
void BenchmarkGetFileAttributesEx(const std::string& directory,
                                  std::vector<FileInfo>& files)
{
    WIN32_FIND_DATA findFileData;
    HANDLE hFind = FindFirstFile((directory + "\\*").c_str(),
                                 &findFileData);
    if (hFind == INVALID_HANDLE_VALUE) {
        std::cerr << "FindFirstFile failed ("
                  << GetLastError() << ")\n";
        return;
    }
    do {
        if (!(findFileData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)) {
            WIN32_FILE_ATTRIBUTE_DATA fileAttributeData;
            if (GetFileAttributesEx((directory + "\\" + findFileData.cFileName).c_str(), GetFileExInfoStandard, &fileAttributeData)) {
                FileInfo fileInfo;
                fileInfo.fileName = findFileData.cFileName;
                fileInfo.lastWriteTime = fileAttributeData.ftLastWriteTime;
                files.push_back(fileInfo);
            }
        }
    } while (FindNextFile(hFind, &findFileData) != 0);
    FindClose(hFind);
}
BenchmarkStdFilesystem
And the last one, the most portable technique:
void BenchmarkStdFilesystem(const std::string& directory,
                            std::vector<FileInfo>& files)
{
    for (const auto& entry : std::filesystem::directory_iterator(directory)) {
        if (entry.is_regular_file()) {
            FileInfo fileInfo;
            fileInfo.fileName = entry.path().filename().string();
            auto ftime = std::filesystem::last_write_time(entry);
            memcpy(&fileInfo.lastWriteTime, &ftime, sizeof(FILETIME));
            files.push_back(fileInfo);
        }
    }
}
The code assumes that file_time_type values map to FILETIME on Windows. Read more in this explanation: std::filesystem::file_time_type does not allow easy conversion to time_t - Developer Community
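For context on what the copied value means: a FILETIME holds a 64-bit count of 100-nanosecond intervals since January 1, 1601 (UTC), split into two 32-bit halves. The sketch below shows the arithmetic for turning such a value into Unix seconds. The helper names are hypothetical, and plain integers stand in for the dwLowDateTime and dwHighDateTime members so that the snippet also compiles outside Windows.

```cpp
#include <cstdint>

// FILETIME counts 100-nanosecond ticks since 1601-01-01 00:00:00 UTC.
constexpr std::uint64_t kTicksPerSecond = 10'000'000ULL;
// Seconds between 1601-01-01 and the Unix epoch (1970-01-01).
constexpr std::uint64_t kSecondsFrom1601To1970 = 11'644'473'600ULL;

// Hypothetical helper: joins the two 32-bit halves of a FILETIME
// (dwLowDateTime, dwHighDateTime) into a single tick count.
constexpr std::uint64_t FileTimeToTicks(std::uint32_t low, std::uint32_t high)
{
    return (static_cast<std::uint64_t>(high) << 32) | low;
}

// Hypothetical helper: converts a tick count to whole seconds
// relative to the Unix epoch (negative for dates before 1970).
constexpr std::int64_t TicksToUnixSeconds(std::uint64_t ticks)
{
    return static_cast<std::int64_t>(ticks / kTicksPerSecond)
         - static_cast<std::int64_t>(kSecondsFrom1601To1970);
}
```

With a real FILETIME ft, the call would be TicksToUnixSeconds(FileTimeToTicks(ft.dwLowDateTime, ft.dwHighDateTime)). For change detection alone, comparing the raw tick counts is enough.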
The Main Function
The main function sets up the benchmarking environment, runs the benchmarks, and prints the results.
// Benchmark FindFirstFileEx (Basic)
auto start = std::chrono::high_resolution_clock::now();
BenchmarkFindFirstFileEx(directory,
                         filesFindFirstFileExBasic,
                         FindExInfoBasic);
auto end = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> elapsedFindFirstFileExBasic = end - start;

// Benchmark FindFirstFileEx (Standard)
start = std::chrono::high_resolution_clock::now();
BenchmarkFindFirstFileEx(directory,
                         filesFindFirstFileExStandard,
                         FindExInfoStandard);
end = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> elapsedFindFirstFileExStandard = end - start;
// ...
This benchmark code measures the performance of FindFirstFileEx with both FindExInfoBasic and FindExInfoStandard, GetFileAttributesEx, and std::filesystem. The results are then formatted and displayed in a table.
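The repeated start/end pattern can be wrapped in a small helper. This is only a sketch: MeasureSeconds is a hypothetical name and not part of the original benchmark. It also swaps in std::chrono::steady_clock, which is guaranteed to be monotonic, in place of high_resolution_clock.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs `fn` once and returns the elapsed
// wall-clock time in seconds.
template <typename Fn>
double MeasureSeconds(Fn&& fn)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}
```

A benchmark call could then be written as MeasureSeconds([&] { BenchmarkFindFirstFileEx(directory, files, FindExInfoBasic); }). A single run like this is noisy, so repeating the measurement, as the benchmark does with its two executions, is still worthwhile.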
Performance Results
To measure the performance of each file attribute retrieval method, I ran the benchmarks on directories containing 1000, 2000, or 5000 random text files. The tests were performed on a laptop equipped with an Intel i7 4720HQ CPU and an SSD. I measured the time taken by each method and compared the results to determine the fastest approach.
Each test run consisted of two executions: the first with uncached file attributes and the second likely benefiting from system-level caching.
The speedup factor is the time of the slowest technique in a given run divided by the time of the current technique, so the slowest technique always gets a factor of 1.000.
Directory with 1000 files, first run (uncached):
| Method | Time (seconds) | Speedup Factor |
|---|---|---|
| FindFirstFileEx (Basic) | 0.0131572000 | 17.876 |
| FindFirstFileEx (Standard) | 0.0018139000 | 129.665 |
| GetFileAttributesEx | 0.2351992000 | 1.000 |
| std::filesystem | 0.0607928000 | 3.869 |

Directory with 1000 files, second run (likely cached):
| Method | Time (seconds) | Speedup Factor |
|---|---|---|
| FindFirstFileEx (Basic) | 0.0009740000 | 61.956 |
| FindFirstFileEx (Standard) | 0.0009998000 | 60.358 |
| GetFileAttributesEx | 0.0602633000 | 1.001 |
| std::filesystem | 0.0603455000 | 1.000 |

Directory with 2000 files, first run (uncached):
| Method | Time (seconds) | Speedup Factor |
|---|---|---|
| FindFirstFileEx (Basic) | 0.0023182000 | 54.402 |
| FindFirstFileEx (Standard) | 0.0044334000 | 28.446 |
| GetFileAttributesEx | 0.1261137000 | 1.000 |
| std::filesystem | 0.1259038000 | 1.002 |

Directory with 2000 files, second run (likely cached):
| Method | Time (seconds) | Speedup Factor |
|---|---|---|
| FindFirstFileEx (Basic) | 0.0022301000 | 55.417 |
| FindFirstFileEx (Standard) | 0.0040665000 | 30.391 |
| GetFileAttributesEx | 0.1235858000 | 1.000 |
| std::filesystem | 0.1220140000 | 1.013 |

Directory with 5000 random, small text files, first run (uncached):
| Method | Time (seconds) | Speedup Factor |
|---|---|---|
| FindFirstFileEx (Basic) | 0.0059723000 | 113.144 |
| FindFirstFileEx (Standard) | 0.0125500000 | 53.843 |
| GetFileAttributesEx | 0.6757297000 | 1.000 |
| std::filesystem | 0.3098593000 | 2.181 |

Directory with 5000 random, small text files, second run (likely cached):
| Method | Time (seconds) | Speedup Factor |
|---|---|---|
| FindFirstFileEx (Basic) | 0.0060349000 | 52.300 |
| FindFirstFileEx (Standard) | 0.0136566000 | 23.112 |
| GetFileAttributesEx | 0.3156277000 | 1.000 |
| std::filesystem | 0.3075732000 | 1.026 |
In every run, FindFirstFileEx was by far the fastest method. In the first (uncached) run on the 1000-file directory, the Standard variant reached a speedup of about 129x over GetFileAttributesEx, while in the 2000- and 5000-file directories the Basic variant led with about 54x and 113x. In the second (likely cached) runs, FindFirstFileEx (Basic) stayed above 50x in every directory, while the Standard variant ranged from about 23x to 60x.
For the directory with 2000 files, FindFirstFileEx (Basic) showed a speedup of over 54x in the first run and about 55x in the second. In the directory with 5000 files, the Basic version achieved 113x initially and 52x in the subsequent run. That drop comes from the baseline: GetFileAttributesEx itself took about half as long in the second run (0.316 s versus 0.676 s), while the FindFirstFileEx time barely changed. Notably, std::filesystem performed on par with GetFileAttributesEx in all second runs, and was about 2x to 4x faster only in the first runs on the 1000- and 5000-file directories.
Further Techniques
Getting file attributes is only part of the story, and while important, it may contribute only a small portion of the overall performance of a whole project. The Visual Assist team, who contributed to this article, improved their initial parse time by avoiding GetFileAttributes[Ex], using the same techniques as in this article. But Visual Assist also improved performance through further techniques. My simple benchmark showed 50x speedups, but we cannot directly compare it with the final Visual Assist results, as the tool does many more things with files.
The main item being optimised was the initial parse, where VA builds a symbol database when a project is opened for the first time. This involves parsing all code and all headers. They decided that it’s a reasonable assumption that headers won’t change while a project is being loaded, and so the file access is cached during the initial parse, avoiding the filesystem entirely. (Changes after a project has been parsed the first time are, of course, still caught.) The combination of switching to a much faster method for checking filetimes and then avoiding file IO completely contributed to the up-to-15-times-faster performance improvement they saw in version 2024.1 at the beginning of this year.
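To illustrate the general idea of answering file-time queries from memory during a load phase, here is a minimal sketch. It is not Visual Assist's implementation, whose details are not public in this article; the class WriteTimeCache and its member names are hypothetical. The cache would be filled once from a directory scan and then consulted instead of the file system.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical cache: maps a file path to its last write time,
// stored as a 64-bit tick count (for example, a FILETIME value).
class WriteTimeCache {
public:
    // Records the write time observed during the directory scan.
    void Store(const std::string& path, std::uint64_t lastWriteTicks)
    {
        times_[path] = lastWriteTicks;
    }

    // Returns the cached write time, or std::nullopt if the path
    // was never stored; the caller can then fall back to the OS.
    std::optional<std::uint64_t> Lookup(const std::string& path) const
    {
        const auto it = times_.find(path);
        if (it == times_.end()) {
            return std::nullopt;
        }
        return it->second;
    }

    // Drops all entries, for example once the initial parse is done,
    // so that later changes are picked up again.
    void Clear() { times_.clear(); }

private:
    std::unordered_map<std::string, std::uint64_t> times_;
};
```

The trade-off is the one described above: while the cache is active, changes made to files on disk are not seen, so it only makes sense for a phase in which files are assumed not to change.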
Read further details on their blog Visual Assist 2024.1 release post - January 2024 and Catching up with VA: Our most recent performance updates - Tomato Soup.
Summary
In this text, we went through a benchmark that compares several techniques for fetching file attributes. In short, it’s best to gather attributes at the same time as you iterate through the directory, using FindFirstFileEx. And if you need to do this operation hundreds of times, it’s best to measure the time and choose the fastest technique for your case.
The benchmark also showed one more thing: while C++17 and its filesystem library offer a robust and standardized way to work with files and directories, they can be limited in terms of performance. In many cases, if you need optimal performance, you need to open the hood and work with the specific operating system API.
The code can be found in my GitHub repository: FileAttribsTest.cpp
Back to you
- Do you use std::filesystem for tasks involving hundreds of files?
- Do you know other techniques that offer greater performance when working with files?
Share your comments below.