Data structures are of vital importance in computer programs. When programmers are going to optimise the programs, they generally either improve their algorithms, or perform optimisations on data structures, from the implementation of data structures to the access methods over the data structures. Recently, there have been a few research on optimising an existing computer program by switching out the data structures based on different criteria. For example, the size of data often impact the proper choice of a data structure - there are vectors that are tailored for a small amount data, and that they are not suitable to store large amount of data such as all the data of the customers in Amazon. By swapping out the concrete data structures when faced with different data size, the programs can have a more suitable container to store the data and thus provide better performance associated with the operations over the data structure.
For compiled programming language, it is not hard to alter the data structures during compile. However, it is quite difficult to make the change at runtime as the program has already been compiled into binaries. Let’s look at C++ - if we have already declared a list in the source code, how can we make it a vector after it has already been compiled and linked? The answer is, though it is not straghtforward, but yes we can!
Actually, this can be done through a design pattern specified for C++ - pointer to implementation(pimpl). Pimpl helps manage dependencies, reduce compilation times, and improve encapsulation. The idea of pimp is to move the implementation details of a class into a separate class, which is accessed through a pointer. This helps in achieving better separation between the interface and the implementation of a class.
There are several benefits that pimpl provides to the program. Encapsulation: By hiding the implementation details, the interface of a class is kept clean and stable. Changes to the implementation do not affect the interface, reducing the need to recompile code that depends on the class. Reduced Compilation Time: Since implementation details are hidden, changes in the implementation do not trigger recompilation of the code that includes the header of the class. This can significantly reduce compilation times in large projects. Binary Compatibility: By hiding implementation details, it is easier to maintain binary compatibility between different versions of a library. Changes to the private implementation do not affect the public interface, so clients can use new versions of the library without recompiling. Improved Modularity: Separating the interface and implementation promotes better modularity and allows for more flexible code organization.
To implement a pimpl pattern, it is suggested to firstly declare a class in the header file including declaring a pointer to the implementation class, then define the implementation class in the source file. Class Declaration (Header File): The public interface of the class is declared in the header file. A pointer to the implementation class is included in the declaration. Class Definition (Source File): The implementation class is defined in the source file. The public methods of the class delegate to the corresponding methods of the implementation class.
Now, let us delve into some code to go through how to swap out a data structure during runtime in C++. Assume we would like to decide the data structure based on the input data size which can only be available until runtime. And let’s say when the data size is larger than 1000, we would like a std::vector, otherwise we would like a std::deque. Firstly, we design the header file including the declarations of the class and the pointer to the concrete implementation.
1class Collection{
2 public:
3 ~Collection();
4 Collection(size_t size);
5
6 Collection(const Collection& collection);
7 Collection& operator=(Collection rhs);
8
9 size_t getSize();
10 void setSize(size_t);
11 std::variant<std::vector<int>, std::deque<int>> func();
12
13 private:
14 class Impl;
15 std::unique_ptr<Impl> pimpl;
16};
Firstly, we declare a class Collection
and explicitly define its destructor and constructor. We also want a copy constructor so we define one thereafter. Then we overload the =
symbol and declare a bunch of member functions regarding the size of the collection. We declare a class Impl
as a private member of the Collection and the pointer pimpl to an class(object) of type Impl
. Note that we use a smart pointer instead of raw pointer to prevent potential memory leaks.
After that, we work on the concrete definitions on a seperate source file.
1struct Collection:Impl{
2 Impl(size_t size) : size(std::move(size)) {};
3 ~Impl() = default;
4
5 std::variant<std::vector<int>, std::deque<int>> func(){
6 if(size > 1000){
7 return std::vector<int> (size);
8 }
9 else
10 return std::deque<int> (size);
11 }
12 size_t size;
13};
Here we implement the class Impl
and its constructor and destructor. We develop the decision mechanism in class Impl, when the size is larger than 1000, a std::vector will be returned; otherwise, a std::deque will be returned.
1Collection::Collection(size_t size) : pimpl(new Impl(std::move(size))){
2 pimpl->func();
3}
4
5Collection::~Collection() = default;
6
7Collection::Collection(const Collection& other) : pimpl(new Impl(*other.pimpl)) {}
8
9Collection& Collection::operator=(Collection rhs){
10 std::swap(pimpl, rhs.pimpl);
11 return *this;
12}
13
14size_t Collection::getSize(){
15 return pimpl->size;
16}
17
18void Collection::setSize(size_t size){
19 pimpl->size = size;
20}
21
22std::variant<std::vector<int>, std::deque<int>> Collection::func(){
23 return pimpl->func();
24}
Here, we implement the Collection class declared earlier in the header file. After that, we can write a simple test program to check the runtime data structure switch,
1size_t size = 0;
2std::cout<<"Type the size of the data collection"<<std::endl;
3std::cin>>size;
4
5Collection c(size);
6
7auto t = c.func();
8
9//...print the type of t
That’s it! We have successfully implemented a dynamic data structure switching prototype. The advantage of this implementation is that we do not need to modify any code from the compiler side - the program compiles on any standard C++ compiler and the data structure switching is done.
Moreover, there can be more strategies to implement a dynamic data structure switching without modifying the compiler, and I will introduce in my future posts!
Comments