
Particle systems are awesome! Not only can you create amazing effects, but you can also optimize the code and push more and more pixels to the screen. This post series will cover how to design a flexible particle system and how to apply a bunch of optimizations to make it run faster. Flexible means that it can be used in real applications and for a variety of graphics effects.


For some time I have been playing with my own little particle system. One previous post shows some effects that I was able to make using the system. Since then I have not created any new effects; instead, I have spent the time on optimizations and improvements.

I would like to show you more, or say that I optimized the code by 100000%… but it is not that easy :) Still, I think it is valuable to share my current experience.

This post will cover the basics of the particle system and my assumptions.

Let’s start!

The Series  

Big Picture  

What is needed to create a particle system:

  • array of particles - we need some container to keep particles. Particles are dynamic things, so we also need an efficient way of making a particle alive or dead. It seems that even std::vector is not enough for this purpose. Another question is what data one particle should contain. Should we use Array of Structs (AoS) or maybe Struct of Arrays (SoA)?
  • generators/emitters - they create particles (make them alive) and set their initial parameters.
  • updaters - when a particle is alive there has to be a system that updates it and manages its movement.
  • a renderer - finally, we need a way to push all the data to the screen and render the whole system. Rendering a particle system is an interesting topic on its own because there are lots of possible solutions and techniques.

And probably that is all for a good start.
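To make the container question from the first bullet more concrete, here is a minimal sketch of the two layouts. All names (Vec3, ParticleAoS, ParticlesSoA, wake, kill) are hypothetical, and the "swap with the last alive particle" trick is only one possible way to make particles alive or dead cheaply; it is an assumption for illustration, not necessarily what the final system will use.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical names throughout; this only illustrates the two layouts.
struct Vec3 { float x, y, z; };

// Array of Structs (AoS): one object per particle, all fields adjacent.
struct ParticleAoS {
    Vec3  pos;
    Vec3  vel;
    float time;
    bool  alive;
};
using ParticlesAoS = std::vector<ParticleAoS>;

// Struct of Arrays (SoA): one array per field, indexed by particle id.
struct ParticlesSoA {
    std::vector<Vec3>  pos;
    std::vector<Vec3>  vel;
    std::vector<float> time;
    std::size_t aliveCount = 0;  // particles in [0, aliveCount) are alive

    // Allocate everything once; no reallocation while the system runs.
    explicit ParticlesSoA(std::size_t maxCount)
        : pos(maxCount), vel(maxCount), time(maxCount) {}

    std::size_t capacity() const { return pos.size(); }

    // Make one more particle alive; returns false when the pool is full.
    bool wake() {
        if (aliveCount >= capacity()) return false;
        ++aliveCount;
        return true;
    }

    // Kill particle `id` in O(1) by swapping it with the last alive one.
    void kill(std::size_t id) {
        if (id >= aliveCount) return;
        const std::size_t last = aliveCount - 1;
        std::swap(pos[id], pos[last]);
        std::swap(vel[id], vel[last]);
        std::swap(time[id], time[last]);
        --aliveCount;
    }
};
```

With this layout, alive particles always occupy a contiguous range at the front of each array, so an updater can loop over [0, aliveCount) without checking a per-particle flag.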

Stateless vs State preserving particle systems  

When implementing a particle system it is important to notice that we can update particles in two ways:

Stateless way

This means that we compute the current position/data/state from the initial values and do not store the calculated state. Take a look at this simple movement equation used in a simple particle system:

pos = pos_start + vel_start*time + 0.5*acc*time*time;

The computed pos is usually used only for rendering. In the next frame the time will change, and thus we will get a different value for pos.

Lots of graphics tutorials have such particle systems. They are especially common as examples for vertex shaders: you pass the start data of the particles to the vertex shader and then update only the time value. It looks nice, but it is hard to create advanced effects using this technique.


Pros:

  • simple to use: no additional data is needed, just the start values
  • very fast: the initial data is created once, and the particle buffer needs updating only when a particle is born or killed

Cons:

  • works only for simple movement equations

State preserving way

As the name suggests, we store the current state of the particles and use the previous state(s) to compute the current one. One of the most popular ways to do this is called the Euler method:

vel = vel + delta_time * acc;
pos = pos + delta_time * vel;


Pros:

  • can be used to create advanced effects

Cons:

  • needs storage for the internal/current state
  • needs more computations and updates than a stateless system

I will leave this topic for now, but it will come back when I show the actual implementation of the system.


What would I like to achieve with the system:

Usability - the whole system will not be just a little experiment with some simple update loop; it should be usable for creating several different effects.

Easy to extend - it should consist of separate modules, with the option to write your own parts.

Performance - it should be fast enough. This is quite a vague spec, but the whole optimization part will be a great playground for testing new ideas.

  • I aim for at least 100k particles running smoothly (60fps) on my system. It would be nice to have 1M, but this will not be that easy with a CPU version.

CPU only - I know that GPU implementations are currently better, but for this experiment I chose CPU only. Maybe in the second version I will rewrite it using OpenCL or OpenGL Compute Shaders.

  • The CPU version also gives a chance to experiment with CPU to GPU buffer transfers.
  • I often used a great book: Video Game Optimization - with lots of valuable information about CPU/cache/GPU

So far, only a simple OpenGL 3.3+ renderer

What’s Next  

In the next article I will write about particle data and its container used in the system.

Read next: Particle Container 1 - problems

Here is a bunch of links and resources that helped me (or will help) in the implementation: