Correctly implement iterative mean for camera attribution #72

Open
opened 2024-05-13 13:17:31 +02:00 by Benjamin_Loison · 5 comments

Implementation doubt

I doubt that the current iterative mean makes the best use of its knowledge of the training set.

More precisely, we add to the estimated PRNU mean the image minus the average of the already considered training image noises, but each training image noise depends only on the knowledge available when it was computed. So maybe it would be better to recompute the estimated PRNU *totally*, recomputing its components with the up-to-date mean.
Maybe both choices are equivalent.

Formalization

Let us formalize it step-by-step for the few first steps:

Let us denote by $image_{camera}^{training_j}$ (respectively $image_{camera}^{test_j}$) the $j$-th training (respectively testing) image for the camera $camera$, with $camera = raise$ for RAISE flat-field and $camera = rafael$ for Rafael.
Let us denote the mean-based denoiser trained on the camera $camera$ training images up to (and including) index $k$ as $\text{denoiser}(k, camera)$, and the estimated PRNU for the camera $camera$ at training step $l$ as $prnu_{camera}^l$. That is, $\text{denoiser}(k, camera) = \text{mean}(image_{camera}^{training_{[0..k]}})$.

As a reminder, the mean-based denoiser principle is based on $prnu_{camera}^{\text{len}(image_{camera}^{training}) - 1} = \text{mean}(training\_image - \text{denoiser}(\text{len}(image_{camera}^{training}) - 1, camera) \text{ for } training\_image \text{ in } image_{camera}^{training})$.
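As a side note, these definitions can be sketched in numpy (an illustrative sketch, not the repository's actual code; `denoiser` here takes the image array directly instead of a camera argument):

```python
import numpy as np

def denoiser(training_images, k):
    # Mean-based denoiser trained on training images 0..k (inclusive).
    return np.mean(training_images[:k + 1], axis=0)

# Two tiny 2x2 "training images" for one camera.
training_images = np.array([[[1.0, 2.0], [3.0, 4.0]],
                            [[5.0, 6.0], [7.0, 8.0]]])

l = len(training_images) - 1
# PRNU estimate following the reminder formula above.
prnu = np.mean([image - denoiser(training_images, l)
                for image in training_images], axis=0)
```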

In what follows we consider a single arbitrary camera:

Let us consider the first image $image_{camera}^{training_0}$; then $prnu_{camera}^0 = image_{camera}^{training_0} - \text{denoiser}(0, camera)$. However, since by definition $\text{denoiser}(0, camera) = image_{camera}^{training_0}$, $prnu_{camera}^0 = image_0$, with $image_0$ denoting the null image.

Let us consider the first 2 images $image_{camera}^{training_{[0, 1]}}$; then there is a choice between:

  • $prnu_{camera}^1 = \text{mean}(image_{camera}^{training_j} - \text{denoiser}(j, camera) \text{ for } j \text{ in } [0, 1])$
  • $prnu_{camera}^1 = \text{mean}(image_{camera}^{training_j} - \text{denoiser}(1, camera) \text{ for } j \text{ in } [0, 1])$

So more generally, at learning step $l$, we have to choose between:

  • $prnu_{camera}^l = \text{mean}(image_{camera}^{training_j} - \text{denoiser}(j, camera) \text{ for } j \text{ in } [0, 1, ..., l])$
  • $prnu_{camera}^l = \text{mean}(image_{camera}^{training_j} - \text{denoiser}(l, camera) \text{ for } j \text{ in } [0, 1, ..., l])$

The second seems to leverage more of the current knowledge of the training set, but are both choices equivalent?

I have the feeling that the two choices differ; let us show a counter-example for the first-2-images case:

  • $prnu_{camera}^1 = \text{mean}(image_{camera}^{training_j} - \text{denoiser}(j, camera) \text{ for } j \text{ in } [0, 1]) = \text{mean}(image_0, image_{camera}^{training_1} - \text{mean}(image_{camera}^{training_0}, image_{camera}^{training_1})) = \frac{1}{2}\left(image_{camera}^{training_1} - \text{mean}(image_{camera}^{training_0}, image_{camera}^{training_1})\right)$

  • $prnu_{camera}^1 = \text{mean}(image_{camera}^{training_j} - \text{denoiser}(1, camera) \text{ for } j \text{ in } [0, 1]) = \text{mean}(image_{camera}^{training_j} - \text{mean}(image_{camera}^{training_{[0, 1]}}) \text{ for } j \text{ in } [0, 1])$
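A quick numerical check of this non-equivalence, with scalar stand-ins for images (an illustrative sketch; all names are hypothetical):

```python
import numpy as np

images = [2.0, 6.0]  # two scalar "training images"

def denoiser(k):
    # Mean of training images 0..k (inclusive).
    return np.mean(images[:k + 1])

# First choice: each residual uses the denoiser available at its own step j.
choice_1 = np.mean([images[j] - denoiser(j) for j in range(2)])
# Second choice: every residual uses the up-to-date denoiser at step l = 1.
choice_2 = np.mean([images[j] - denoiser(1) for j in range(2)])

print(choice_1, choice_2)  # → 1.0 0.0, so the two choices indeed differ
```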

Conclusion

So the two choices are not equivalent: compared to the first, the second choice contains an additional component for $j = 0$, namely $image_{camera}^{training_0} - \text{mean}(image_{camera}^{training_{[0, 1]}})$, which is very probably not null.

Related to #57.

Benjamin_Loison added the
enhancement
high priority
medium
labels 2024-05-13 13:17:31 +02:00
Author
Owner

Implementation

So now let us correct the implementation to compute:

  • $prnu_{camera}^l = \text{mean}(image_{camera}^{training_j} - \text{denoiser}(l, camera) \text{ for } j \text{ in } [0, 1, ..., l])$

$\text{denoiser}(l, camera)$ can be implemented efficiently thanks to `iterativeMean`, but it has to be computed before starting any PRNU estimation step.
In this case $prnu_{camera}^l$ cannot itself be implemented with `iterativeMean`, as we are not just adding components; at least it is not clear what equivalent added component would allow leveraging `iterativeMean`.

We have to pay attention to memory usage in the implementation.
The order of magnitude of the memory necessary to load the raw images into memory is unclear. `ls -lS` shows file sizes ranging from 18,697,276 to 20,720,477 bytes, so these files are probably compressed, otherwise they would not show such a significant size difference. However, `file` returns `flat-field/nef/flat_001.NEF: TIFF image data, big-endian, direntries=27, height=0, bps=0, compression=none, PhotometricInterpretation=RGB, manufacturer=NIKON CORPORATION, model=NIKON D7000, orientation=upper-left, width=0`, which seems to indicate no compression.

Anyway, we can first implement the first approach, which was roughly already the case, and then the better second one.

If we expand:

$\text{denoiser}(k, camera) = \text{mean}(image_{camera}^{training_{[0..k]}})$

in:

$prnu_{camera}^l = \text{mean}(image_{camera}^{training_j} - \text{denoiser}(l, camera) \text{ for } j \text{ in } [0, 1, ..., l]) = \text{mean}(image_{camera}^{training_j} - \text{mean}(image_{camera}^{training_{[0..l]}}) \text{ for } j \text{ in } [0, 1, ..., l])$

Python pseudo-code:

```python
mean_image_training_0_l_camera = iterativeMean()

for l in range(len(image_training_camera)):
    mean_image_training_0_l_camera.add(image_training_l_camera)
    # `range(l + 1)` to include the current image j = l, matching j in [0..l].
    prnu_l_camera = mean([image_training_j_camera - mean_image_training_0_l_camera.mean for j in range(l + 1)])
```
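A runnable sketch of this loop (assuming numpy arrays and an `IterativeMean` helper modeled on the repository's `iterativeMean`; all names here are illustrative):

```python
import numpy as np

class IterativeMean:
    # Minimal iterative mean, modeled on the repository's `iterativeMean`.
    def __init__(self):
        self.mean = None
        self.numberOfElementsInMean = 0

    def add(self, element):
        if self.mean is None:
            self.mean = element
        else:
            self.mean = ((self.mean * self.numberOfElementsInMean) + element) / (self.numberOfElementsInMean + 1)
        self.numberOfElementsInMean += 1

training_images = [np.full((2, 2), v) for v in (1.0, 3.0, 8.0)]

denoiser_mean = IterativeMean()
for l, image in enumerate(training_images):
    denoiser_mean.add(image)
    # The PRNU estimate is recomputed from scratch with the up-to-date mean,
    # which requires keeping training images 0..l in memory.
    prnu = np.mean([training_images[j] - denoiser_mean.mean
                    for j in range(l + 1)], axis=0)
```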

This is about training but there is also testing.

In theory we should achieve the same accuracy as when training directly on the whole dataset, that is, based on experiments, 100 % accuracy.

`cameraColorMeans` is the actual variable corresponding to `mean_image_training_0_l_camera`.
`cameraColorMeans[camera][color].add(singleColorChannelImages[color])` is equivalent to `mean_image_training_0_l_camera.add(image_training_l_camera)`.

Author
Owner

50 images for both cameras take about 22 GB of memory.
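As a rough back-of-the-envelope check (assuming 16.2 MP D7000 frames decoded to 3 float32 color channels and 50 images per camera; these figures are assumptions, not measurements):

```python
pixels = 16.2e6        # assumed Nikon D7000 resolution
channels = 3           # assumed RGB decoding
bytes_per_value = 4    # assumed float32 values
images = 100           # 50 images for each of the 2 cameras

bytes_per_image = pixels * channels * bytes_per_value  # about 194 MB per image
total_gb = bytes_per_image * images / 1e9
print(round(total_gb, 1))  # → 19.4, the same order of magnitude as the observed 22 GB
```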

Author
Owner
```python
class iterativeMean:
    mean = None
    numberOfElementsInMean = 0

    def add(self, element):
        if self.mean is None:
            self.mean = element
        else:
            self.mean = ((self.mean * self.numberOfElementsInMean) + element) / (self.numberOfElementsInMean + 1)
        self.numberOfElementsInMean += 1

a = iterativeMean()
a.add(42)
b = iterativeMean()
b.add(43)
print(list(map(iterativeMean.mean, [a, b])))
```

does not work as expected: `iterativeMean.mean` is a plain attribute (here still the class attribute `None`), not a callable, so the `map` call raises a `TypeError`.
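For reference, the last line can be fixed by reading the attribute instead of trying to call it, e.g. with `operator.attrgetter` or a list comprehension (a sketch):

```python
from operator import attrgetter

class iterativeMean:
    mean = None
    numberOfElementsInMean = 0

    def add(self, element):
        if self.mean is None:
            self.mean = element
        else:
            self.mean = ((self.mean * self.numberOfElementsInMean) + element) / (self.numberOfElementsInMean + 1)
        self.numberOfElementsInMean += 1

a = iterativeMean()
a.add(42)
b = iterativeMean()
b.add(43)

# `attrgetter('mean')` returns a callable that reads the `mean` attribute.
print(list(map(attrgetter('mean'), [a, b])))  # → [42, 43]
print([x.mean for x in [a, b]])               # → [42, 43]
```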

Related to #62.

Author
Owner

Based on [PRNU_extraction/issues/8](https://codeberg.org/Benjamin_Loison/PRNU_extraction/issues/8), it seems quite clear that we should not end up without a good prediction for one class. However, note that the mentioned example does not consider all images, only a crop, and uses a Wavelet denoiser.

Author
Owner
Related to [src/commit/be83fcf154ba144045e296f2f0e6d0d8deb58ca4/datasets/raise/fft/verify_dots.py#L22](https://gitea.lemnoslife.com/Benjamin_Loison/Robust_image_source_identification_on_modern_smartphones/src/commit/be83fcf154ba144045e296f2f0e6d0d8deb58ca4/datasets/raise/fft/verify_dots.py#L22).