My simple Whitted ray tracer (no indirect lighting yet) did all of its rendering on a single main thread. Since the renders were taking a while, I decided to make it multi-threaded. The implementation is very simple: split the rendered image into a grid of sections and hand each section to its own thread.

So basically what I did was:

1. Determine the number of threads to use (if it is not a perfect square such as 9 or 16, it is rounded up to the next one)
2. Split the screen into multiple sections (each corresponding to a thread)
3. Create the threads, hand each one its start and end coordinates (x, y), and render
4. Save the rendered image to a framebuffer

#### 1. Determining the number of threads to use.

For now, the number of threads is still entered manually (I found that somewhere between 16 and 49 works well).
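Because the grid in the next section is n × n, the count that actually gets used is a perfect square (9, 16, 25, ...). A minimal sketch of that rounding step follows; `ceil_to_square` is a hypothetical name for illustration, not the function in my repository.

```rust
/// Rounds `requested` up to the nearest perfect square.
/// Returns `(side, total)` where `total == side * side`.
/// A request of 0 is treated like 1 (assumption for this sketch).
fn ceil_to_square(requested: usize) -> (usize, usize) {
    let mut side = 1;
    // Grow the side length until the square is large enough.
    while side * side < requested {
        side += 1;
    }
    (side, side * side)
}
```

Asking for 8 threads would give a 3 × 3 grid (9 threads), and asking for 16 would stay at 4 × 4.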

#### 2. Split the screen into multiple grid sections.

Currently, I’m using a very simple scheme to divide the image into grid sections. The thread count has to be a perfect square (n × n; the log output further down labels this “pow of 2”) because the sections are laid out as a square grid. Let’s say we are using 9 threads:

1     2     3

4     5     6

7     8     9

If we used only 8 threads, the bottom-right portion of the image would never be rendered (unless the work meant for the 9th thread is given to another thread, which complicates the calculations).
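The numbering in the grid above maps to a row and column with one division and one remainder. This is only a sketch: `grid_cell` is a hypothetical helper, and it uses 0-based thread indices, so thread 9 in the picture is index 8.

```rust
/// Maps a 0-based thread index to its `(row, column)` in a
/// `side` x `side` grid, numbered left to right, top to bottom.
/// `side` must be greater than zero.
fn grid_cell(thread_index: usize, side: usize) -> (usize, usize) {
    (thread_index / side, thread_index % side)
}
```

With a side of 3, index 8 lands on row 2, column 2, which is the bottom-right cell that goes missing when only 8 threads are used.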

Let’s say the image is 815 pixels wide (x axis) and we are using 16 threads (4 × 4).

The base size of one grid cell along x is 815 / 4 = 203 (integer division).

That leaves 815 - (203 * 4) = 3 leftover pixels, which are handed out one each to the 1st, 2nd, and 3rd columns to balance them out. The end result for the x axis is [(0, 204), (204, 408), (408, 612), (612, 815)]. Repeat the same steps to get the ranges for the y axis. The full implementation is at https://github.com/Xyten/rust-tracer/blob/master/src/threading/eventhread.rs, which should stay up as long as I don’t refactor the code later.
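The split described above can be sketched roughly like this. The names `split_axis` and `grid_sections` are made up for this example and are not the ones used in the linked file; ranges are end-exclusive.

```rust
/// Splits `length` pixels into `parts` end-exclusive ranges.
/// Leftover pixels are given one each to the first ranges.
/// `parts` must be greater than zero.
fn split_axis(length: usize, parts: usize) -> Vec<(usize, usize)> {
    let base = length / parts;
    let leftover = length % parts;
    let mut ranges = Vec::with_capacity(parts);
    let mut start = 0;
    for i in 0..parts {
        // The first `leftover` ranges are one pixel larger.
        let size = base + if i < leftover { 1 } else { 0 };
        ranges.push((start, start + size));
        start += size;
    }
    ranges
}

/// Builds the `side` x `side` sections as
/// `(x_start, x_end, y_start, y_end)`, row by row.
fn grid_sections(width: usize, height: usize, side: usize) -> Vec<(usize, usize, usize, usize)> {
    let xs = split_axis(width, side);
    let ys = split_axis(height, side);
    let mut sections = Vec::with_capacity(side * side);
    for &(y0, y1) in &ys {
        for &(x0, x1) in &xs {
            sections.push((x0, x1, y0, y1));
        }
    }
    sections
}
```

Calling `split_axis(815, 4)` reproduces the x ranges worked out by hand above.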

#### 3. Create the threads, hand them the start and end render coordinates, and render

If you are using Rust, you will probably want crossbeam for scoped threads. A thread started with plain `std::thread::spawn` could outlive the local data it borrows, so the compiler rejects such borrows, whereas a scope joins every thread before it returns. I am also using channels (https://rustbyexample.com/std_misc/channels.html) to pass the rendered radiance values to the framebuffer.

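```rust
crossbeam::scope(|scope| {
    // One thread per grid section. The loop headers were elided in the
    // original listing; `sections` and the range names are assumed.
    for &(x_start, x_end, y_start, y_end) in &sections {
        let tx = tx.clone();

        scope.spawn(move || {
            for i in y_start..y_end {
                for j in x_start..x_end {
                    /* Make Sampler, determines sample number */
                    let mut sampler = RandomSampler::new(32);

                    sampler.start_pixel(Poi2i::new(i, j));
                    let mut l = Spectrum::zero();

                    // Anti-aliasing: accumulate radiance over all samples
                    while sampler.is_sample_exists() {
                        let offset = sampler.get_2d();
                        let mut ray = self.camera.project_to_world(j, i, offset.x, offset.y);

                        l += self.li(&mut ray, &scene, 0, integrator);
                    }

                    // Send result tuple of (index, li) to framebuffer
                    tx.send((
                        (i * self.sw) + j,
                        l / (sampler.get_max_sample() as f32),
                    )).unwrap();
                }
            }
        });
    }
});
```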
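For anyone who wants to try the same pattern without the rest of the ray tracer, here is a self-contained sketch. It swaps crossbeam for `std::thread::scope` from the standard library (available since Rust 1.63), and `shade` is a hypothetical stand-in for the real radiance computation, so treat it as an illustration rather than my actual renderer.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for the per-pixel radiance computation.
fn shade(x: usize, y: usize) -> f32 {
    (x + y) as f32
}

/// Renders a `width` x `height` image with one thread per section.
/// Each section is `(x_start, x_end, y_start, y_end)`, end-exclusive.
fn render(width: usize, height: usize, sections: &[(usize, usize, usize, usize)]) -> Vec<f32> {
    let mut framebuffer = vec![0.0_f32; width * height];
    let (tx, rx) = mpsc::channel::<(usize, f32)>();

    thread::scope(|scope| {
        for &(x0, x1, y0, y1) in sections {
            // Each worker gets its own sender.
            let tx = tx.clone();
            scope.spawn(move || {
                for y in y0..y1 {
                    for x in x0..x1 {
                        tx.send((y * width + x, shade(x, y))).unwrap();
                    }
                }
            });
        }
        // Drop the original sender so the loop below ends once
        // every worker has finished and dropped its clone.
        drop(tx);
        for (index, value) in rx {
            framebuffer[index] = value;
        }
    });

    framebuffer
}
```

Dropping the original sender and iterating over the receiver means the collecting loop does not need to know how many messages to expect.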

#### 4. Save the rendered image to a framebuffer

Save the radiance to the framebuffer as usual:

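```rust
// This bound assumes each thread sends exactly one message. If tuples
// are sent per pixel, as in the listing above, the bound has to be the
// total pixel count instead, otherwise the remaining pixels are never
// written to the framebuffer.
for _k in 0..self.num_of_threads {
    let (index, li) = rx.recv().unwrap();
    framebuffer[index as usize] = li;
}
```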
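One way to make a receive loop that runs once per thread line up with the sending side is to have every worker send a single batch containing all of its pixels. The sketch below shows that idea under simplifying assumptions: `collect_batches` is a hypothetical function, each thread owns a contiguous run of pixel indices instead of a grid section, and the "radiance" is just the thread number.

```rust
use std::sync::mpsc;
use std::thread;

/// Each of `num_of_threads` workers fills `pixels_per_thread` pixels
/// and sends them as one batch, so exactly one `recv()` per thread
/// collects everything.
fn collect_batches(num_of_threads: usize, pixels_per_thread: usize) -> Vec<f32> {
    let mut framebuffer = vec![0.0_f32; num_of_threads * pixels_per_thread];
    let (tx, rx) = mpsc::channel::<Vec<(usize, f32)>>();

    thread::scope(|scope| {
        for t in 0..num_of_threads {
            let tx = tx.clone();
            scope.spawn(move || {
                let first = t * pixels_per_thread;
                // Placeholder values: every pixel stores its thread number.
                let batch: Vec<(usize, f32)> = (first..first + pixels_per_thread)
                    .map(|index| (index, t as f32))
                    .collect();
                tx.send(batch).unwrap();
            });
        }

        // One message per thread, so one recv() per thread.
        for _ in 0..num_of_threads {
            for (index, li) in rx.recv().unwrap() {
                framebuffer[index] = li;
            }
        }
    });

    framebuffer
}
```

Batching also cuts the number of channel operations from one per pixel to one per thread.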

In the end, rendering the scene with 1 thread takes around:

```
Num of threads 1 is ceiled to 1 (pow of 2)
Start Render...
Writing to file...
Render Done 24.04498775799948ms
```

Rendering with 16 threads takes around:

```
Num of threads 16 is ceiled to 16 (pow of 2)
Start Render...
Writing to file...
Render Done 5.082824091000475ms
```

That is a significant improvement overall, roughly 4.7 times faster. The downside of this method is that it cannot trace the anti-aliasing samples of a single pixel in parallel, since the work is split across threads by screen region.