Adding multithreading to a raytracer

I’ve been working on adding multithreading to my raytracer.

My simple Whitted raytracer (no indirect lighting yet) only had one main thread doing all of the rendering work. Since the renders were taking a while, I decided to try making the raytracer multithreaded. My implementation is very simple and revolves around splitting the rendered image into several grid sections that are handed over to the threads.

So basically what I did was:

  1. Determine the number of threads to use (if it is not a perfect square, it gets rounded up (ceiled) to the nearest perfect square)
  2. Split the screen into multiple sections (each corresponding to a thread)
  3. Create the threads, hand each one its start and end coordinates (x, y), and render
  4. Save the rendered image to a framebuffer

1. Determining the number of threads to use.

For now, I still set the number of threads manually (I found that somewhere around 16 to 49 works well).
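
As a rough sketch of the rounding step (my own illustration, not the actual code in the repo), ceiling a requested thread count to the nearest perfect square could look like this:

//Round a requested thread count up to the nearest perfect square,
//e.g. 8 -> 9, 10 -> 16, 16 -> 16 (hypothetical helper, not from the repo)
fn ceil_to_square(requested: u32) -> u32 {
    let root = (requested as f64).sqrt().ceil() as u32;
    root * root
}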

2. Split the screen into multiple grid sections.

Currently, I’m using a very simple implementation to divide the screen into grid sections. The thread count needs to be a perfect square because the screen is split in a square manner. Say we use 9 threads:

1  2  3
4  5  6
7  8  9

If we were to use only 8 threads, we would be missing the bottom-right portion of the image (unless you shove the workload that was supposed to go to the 9th thread onto another thread, which complicates the calculations).

Let’s say we have an image that is 815 pixels wide (x axis) and that we are using 16 threads (a 4 × 4 grid).

We first find the offset (the width of one grid section) using integer division: 815 / 4 = 203.

That leaves 815 – (203 * 4) = 3 leftover pixels, which are handed back to the 1st, 2nd, and 3rd sections to balance things out, so the end result for the x axis looks like [(0, 204), (204, 408), (408, 612), (612, 815)]. Repeat the same steps to get the ranges for the y axis. The full implementation is at https://github.com/Xyten/rust-tracer/blob/master/src/threading/eventhread.rs (it should stay there as long as I don’t refactor the code later).
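
Here is a minimal sketch of that split along one axis (a simplified illustration; the repo code may be organized differently):

//Split `length` pixels into `n` contiguous (start, end) ranges.
//The first `length % n` ranges get one extra pixel each to soak up the leftover.
fn split_axis(length: u32, n: u32) -> Vec<(u32, u32)> {
    let base = length / n;     //815 / 4 = 203
    let leftover = length % n; //815 % 4 = 3
    let mut ranges = Vec::with_capacity(n as usize);
    let mut start = 0;
    for i in 0..n {
        let size = base + if i < leftover { 1 } else { 0 };
        ranges.push((start, start + size));
        start += size;
    }
    ranges
}

//split_axis(815, 4) == [(0, 204), (204, 408), (408, 612), (612, 815)]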

3. Create the threads, hand them their start and end render coordinates & render

If you are using Rust, you probably want crossbeam for a scoped thread (with a plain std::thread the spawned threads might outlive the data they borrow, since the main thread can finish without waiting for all of them to be done; a scoped thread is guaranteed to join before the scope ends). I am also using channels (https://rustbyexample.com/std_misc/channels.html) to pass the rendered radiance values back to the framebuffer.
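
The tx sender and rx receiver used in the code below come from a standard mpsc channel created before entering the scope; my setup looks roughly like this (a sketch, the exact types live in the repo):

use std::sync::mpsc;

//Channel carrying (pixel index, radiance) tuples from the render threads
//back to the main thread; each spawned thread clones tx, rx stays on the
//main thread to fill the framebuffer (sketch of the setup, not verbatim)
let (tx, rx) = mpsc::channel();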

crossbeam::scope(|scope|{
    //separate into multiple threads
    for k in 0..self.num_of_threads {
        let tx = tx.clone(); //each thread gets its own clone of the sender
        let thread_render_info = thread_render_infos.get(k as usize).unwrap();

        scope.spawn(move || {
            /* Make Sampler, determines sample number */
            let mut sampler = RandomSampler::new(32);

            for i in thread_render_info.start_y..thread_render_info.end_y {
                for j in thread_render_info.start_x..thread_render_info.end_x {

                    sampler.start_pixel(Point2i::new(i, j));
                    let mut l = Spectrum::zero();

                    //Anti alias
                    while sampler.is_sample_exists() {

                        let offset = sampler.get_2d();
                        let mut ray = self.camera.project_to_world(j, i, offset.x, offset.y);

                        //Calculate radiance on ray (render)
                        l += self.li(&mut ray, &scene, 0, integrator);
                    }

                    //Send result tuple of (index, li) to framebuffer
                    tx.send((
                        (i * self.sw) + j,
                        l / (sampler.get_max_sample() as f32)
                    )).unwrap();
                }
            }
        });
    }
});
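
A nice property of crossbeam::scope is that it blocks until every spawned thread has finished, which is what lets the closures borrow things like self and the scene without a 'static lifetime bound.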

4. Save the rendered image to a framebuffer

Receive the radiance values and store them in the framebuffer as usual.

//Results can arrive in any order, so each message carries its pixel index
for k in 0..self.num_of_threads {
    let thread_render_info = thread_render_infos.get(k as usize).unwrap();
    for i in thread_render_info.start_y..thread_render_info.end_y {
        for j in thread_render_info.start_x..thread_render_info.end_x {
            let (index, li) = rx.recv().unwrap();
            framebuffer[index as usize] = li;
        }
    }
}
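
Since every thread sends exactly one message per pixel in its section, those nested loops are really just a way to receive one message per pixel of the image. An equivalent, arguably simpler, way to drain the channel (a sketch assuming self.sw and self.sh are the screen width and height) would be:

//Receive one (index, radiance) message per pixel; arrival order doesn't matter
//because the index says where the value belongs (assumes self.sh = screen height)
for _ in 0..(self.sw * self.sh) {
    let (index, li) = rx.recv().unwrap();
    framebuffer[index as usize] = li;
}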


[Rendered image: foo_aa_100sample]

In the end, rendering the scene above with 1 thread takes around:

Num of threads 1 is ceiled to 1 (pow of 2)
Start Render...
Writing to file...
Render Done 24.04498775799948ms

While rendering with 16 threads takes around:

Num of threads 16 is ceiled to 16 (pow of 2)
Start Render...
Writing to file...
Render Done 5.082824091000475ms

That is a significant improvement overall. The downside of this method is that it can’t parallelize the raycasts for the anti-aliasing samples within a single pixel, since the threads are split by screen grid.
