Correctly implement iterative mean for camera attribution #72
Reference: Benjamin_Loison/Robust_image_source_identification_on_modern_smartphones#72
Implementation doubt
I doubt that the current iterative mean leverages its knowledge of the training set as well as it could.
More precisely, we add to the estimated PRNU mean the image minus the average of the already-considered training image noises, but each training image noise depends on the knowledge that was available when it was computed. So maybe it would be better to recompute the estimated PRNU entirely, recomputing its components with the up-to-date mean.
Maybe both are equivalent.
Formalization
Let us formalize it step by step for the first few steps.

Let us denote by $image_{camera}^{training_j}$ and $image_{camera}^{test_j}$ the $j$-th training (respectively testing) image for the camera $camera$, with $camera = raise$ for RAISE flat-field and $camera = rafael$ for Rafael. Let us denote the output of the denoiser trained up to (included) the $k$-th training image of the camera $camera$ as $\text{denoiser}(k, camera)$, and the estimated PRNU for the camera $camera$ at training step $l$ as $prnu_{camera}^l$. For the mean-based denoiser, the denoised estimate is simply the mean of the training images seen so far:

$\text{denoiser}(k, camera) = \text{mean}(image_{camera}^{training_{[0..k]}})$
As a reminder, the mean-based denoiser principle is:

$prnu_{camera}^{\text{len}(image_{camera}^{training}) - 1} = \text{mean}(training\_image - \text{denoiser}(\text{len}(image_{camera}^{training}) - 1, camera) \text{ for } training\_image \text{ in } image_{camera}^{training})$

The following in fact considers one arbitrary camera.
Let us consider the first image $image_{camera}^{training_0}$; then $prnu_{camera}^0 = image_{camera}^{training_0} - \text{denoiser}(0, camera)$. However, as by definition $\text{denoiser}(0, camera) = image_{camera}^{training_0}$, we get $prnu_{camera}^0 = image_0$, with $image_0$ denoting the null image.

Let us consider the first 2 images $image_{camera}^{training_{[0, 1]}}$; then there is a choice between:

$prnu_{camera}^1 = \text{mean}(image_{camera}^{training_j} - \text{denoiser}(j, camera) \text{ for } j \text{ in } [0, 1])$

$prnu_{camera}^1 = \text{mean}(image_{camera}^{training_j} - \text{denoiser}(1, camera) \text{ for } j \text{ in } [0, 1])$
So more generally, at learning step $l$, we have to choose between either:

$prnu_{camera}^l = \text{mean}(image_{camera}^{training_j} - \text{denoiser}(j, camera) \text{ for } j \text{ in } [0, 1, ..., l])$

or:

$prnu_{camera}^l = \text{mean}(image_{camera}^{training_j} - \text{denoiser}(l, camera) \text{ for } j \text{ in } [0, 1, ..., l])$

The second seems to leverage the current knowledge of the training set more, but are both choices equivalent?
I have the feeling that both choices are different; let us show a counter-example for the first 2 images case.

First choice (note that the $j = 0$ term is the null image, so the mean halves the remaining term):

$prnu_{camera}^1 = \text{mean}(image_{camera}^{training_j} - \text{denoiser}(j, camera) \text{ for } j \text{ in } [0, 1]) = \text{mean}(image_0,\; image_{camera}^{training_1} - \text{mean}(image_{camera}^{training_{[0, 1]}})) = \frac{1}{2}\left(image_{camera}^{training_1} - \text{mean}(image_{camera}^{training_{[0, 1]}})\right)$

Second choice:

$prnu_{camera}^1 = \text{mean}(image_{camera}^{training_j} - \text{denoiser}(1, camera) \text{ for } j \text{ in } [0, 1]) = \text{mean}(image_{camera}^{training_j} - \text{mean}(image_{camera}^{training_{[0, 1]}}) \text{ for } j \text{ in } [0, 1])$
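The non-equivalence above can be checked numerically; here is a minimal sketch, where the two 2×2 "training images" `t0` and `t1` are arbitrary illustrative data, not taken from the dataset:

```python
import numpy as np

# Two arbitrary "training images" for one camera (hypothetical data).
t0 = np.array([[1.0, 2.0], [3.0, 4.0]])
t1 = np.array([[2.0, 1.0], [5.0, 3.0]])

# denoiser(k, camera) is the mean of the training images seen so far.
denoiser_0 = t0
denoiser_1 = (t0 + t1) / 2

# First choice: each residual uses the denoiser available at its own step j.
first_choice = np.mean([t0 - denoiser_0, t1 - denoiser_1], axis=0)

# Second choice: every residual uses the up-to-date denoiser of step l = 1.
second_choice = np.mean([t0 - denoiser_1, t1 - denoiser_1], axis=0)

print(np.allclose(first_choice, second_choice))  # False: the choices differ
```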
Conclusion
So the first choice is not equivalent to the second: the second choice contains an additional component, the $j = 0$ residual $image_{camera}^{training_0} - \text{mean}(image_{camera}^{training_{[0, 1]}})$, which is very probably not null.
Related to #57.
Implementation
So now let us correct the implementation to implement:
prnu_{camera}^l = \text{mean}(image_{camera}^{training_j} − denoiser(l, camera) \text{ for } j \text{ in } [0, 1, ..., l])
$\text{denoiser}(l, camera)$ can be implemented efficiently thanks to `iterativeMean`, but it has to be computed before starting any PRNU estimation step. In this case $prnu_{camera}^l$ cannot be implemented with `iterativeMean`, as we are not just adding components; at least it is not clear what equivalent added component would allow leveraging `iterativeMean`.

We have to pay attention to memory usage in the implementation. The order of magnitude of the memory necessary to load the raw images into memory is unclear: `ls -lS` shows file sizes ranging from 18,697,276 to 20,720,477 bytes, so these files are probably compressed, otherwise they would not show such a significant size difference. However, `file` returns:

flat-field/nef/flat_001.NEF: TIFF image data, big-endian, direntries=27, height=0, bps=0, compression=none, PhotometricInterpretation=RGB, manufacturer=NIKON CORPORATION, model=NIKON D7000, orientation=upper-left, width=0

which seems to indicate no compression. Anyway, we can first implement the first approach, which is roughly what was already in place, and then the second, better one.
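Since `iterativeMean` is referenced but not shown here, the following is a minimal sketch of a running mean in that spirit; the class and method names are assumptions, and the actual repository implementation may differ:

```python
import numpy as np

class IterativeMean:
    """Running mean that avoids keeping every image in memory:
    only the current mean and the sample count are stored."""

    def __init__(self):
        self.mean = None
        self.count = 0

    def add(self, image):
        self.count += 1
        if self.mean is None:
            self.mean = image.astype(np.float64)
        else:
            # Incremental update: mean += (x - mean) / n
            self.mean += (image - self.mean) / self.count

# The running result matches the batch mean of all added images.
m = IterativeMean()
images = [np.full((2, 2), float(v)) for v in (1, 2, 6)]
for img in images:
    m.add(img)
print(np.allclose(m.mean, np.mean(images, axis=0)))  # True
```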
If we expand:
\text{denoiser}(k, camera) = \text{mean}(image_{camera}^{training_{[0..k]}})
in:
prnu_{camera}^l = \text{mean}(image_{camera}^{training_j} − denoiser(l, camera) \text{ for } j \text{ in } [0, 1, ..., l]) = \text{mean}(image_{camera}^{training_j} − \text{mean}(image_{camera}^{training_{[0..l]}}) \text{ for } j \text{ in } [0, 1, ..., l])
Python pseudo code:
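The pseudo code itself appears to be missing from this page; the following is a minimal sketch of the corrected computation, assuming the training images are already loaded as NumPy arrays (the function and variable names are hypothetical):

```python
import numpy as np

def estimate_prnu(training_images):
    """Estimate the PRNU after training step l = len(training_images) - 1,
    subtracting the up-to-date mean denoiser(l, camera) from every image."""
    denoiser = np.mean(training_images, axis=0)  # denoiser(l, camera)
    residuals = [image - denoiser for image in training_images]
    return np.mean(residuals, axis=0)

# Hypothetical example with two small grayscale "images":
images = [np.array([[1.0, 2.0], [3.0, 4.0]]),
          np.array([[2.0, 1.0], [5.0, 3.0]])]
prnu = estimate_prnu(images)
```

Note that `denoiser` must be fully computed before the residual loop starts, which is why the estimation can no longer be expressed as a single pass over the images with `iterativeMean` alone.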
This is about training, but there is also testing.
In theory we should achieve the same accuracy as when training directly on the whole dataset, that is, based on experiments, 100 % accuracy.
`cameraColorMeans` is the actual variable for `mean_image_training_0_l_camera`: `cameraColorMeans[camera][color].add(singleColorChannelImages[color])` is equivalent to `mean_image_training_0_l_camera.add(image_training_l_camera)`.

50 images for both cameras take about 22 GB of memory.
It does not work as expected.
Related to #62.
Based on PRNU_extraction/issues/8, it seems quite clear that we should not end up with no good prediction for one class. However, note that the mentioned example does not consider all images, only a crop, and uses a Wavelet denoiser.
Related to src/commit/be83fcf154ba144045e296f2f0e6d0d8deb58ca4/datasets/raise/fft/verify_dots.py#L22.