Estimate PRNU of devices on actual dataset and evaluate our method #30

Open
opened 2024-04-02 10:48:01 +02:00 by Benjamin_Loison · 6 comments

Considering first a RAW dataset of uniform objects (sky for instance) with one instance of each model seems to be a good starting point.

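All flat-field images for NikonD7000 (http://loki.disi.unitn.it/RAISE/Flat-field/Flat-field.zip, source: http://loki.disi.unitn.it/RAISE/download.html) look like the attached one.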

They seem to have a light source near the middle that is fairly isotropic; there is just a shadow at the top right:

image

http://loki.disi.unitn.it/RAISE/getFile.php?p=all: the Keyword values are just the few categories listed at http://loki.disi.unitn.it/RAISE/download.html.

Only the first two Keyword columns contain landscape entries.

ra0cc3d11t,http://193.205.194.113/RAISE/NEF/ra0cc3d11t.NEF,http://193.205.194.113/RAISE/TIFF/ra0cc3d11t.TIF,9/20/2014 15:20,8/27/2011 19:28,11.0 MB,L (4288 x 2848),28:51.0,"UTC+7, DST:OFF",Compressed RAW (12-bit),,,,Nikon D90,VR 18-105mm f/3.5-5.6G,25mm,AF-A,Auto,ON,,f/9,1/2000s,Aperture Priority,0EV,,Matrix,ISO 800,,,,"Color Temp. (6670K), B4, M2",sRGB,ON (Normal),OFF,Extra High,,,,LANDSCAPE-02,[LS] Landscape,,6,Active D-Lighting,Active D-Lighting,3,3,,,,,,,,,,,,,,,,,landscape; outdoor
landscape; outdoor

This seems to be a single cell, not two cells with the second one starting with a space.

=ISNUMBER(FIND("landscape",BK2))
=VALUE(SUBSTITUTE(F2, " MB", ""))
=SUM(G2:G8149)

46,246.04 MB
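
The spreadsheet formulas above could also be expressed as a short script. This is a minimal sketch, assuming the file size sits in the 6th column (F) and the keyword cell in the 63rd column (BK), as in the formulas. The helper name `landscape_total_mb` is hypothetical, and restricting the sum to landscape rows is an assumption about how the total was obtained:

```python
import csv

SIZE_COLUMN = 5      # spreadsheet column F (0-based index), e.g. "11.0 MB"
KEYWORD_COLUMN = 62  # spreadsheet column BK (0-based index), e.g. "landscape; outdoor"


def landscape_total_mb(csv_file):
    """Sum the sizes (in MB) of rows whose keyword cell contains 'landscape'."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    total = 0.0
    for row in reader:
        if len(row) <= KEYWORD_COLUMN:
            continue  # row too short to hold a keyword cell
        # Case-sensitive match, like the spreadsheet FIND function.
        if 'landscape' in row[KEYWORD_COLUMN]:
            total += float(row[SIZE_COLUMN].replace(' MB', ''))
    return total
```

It could be called as `landscape_total_mb(open('RAISE_all.csv', newline=''))`. Unlike `cut -d ','`, `csv.reader` keeps quoted cells containing commas in one piece.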

FINISHED --2024-04-02 14:39:34--
Total wall clock time: 30m 24s
Downloaded: 2522 files, 51G in 28m 20s (30.7 MB/s)
Benjamin_Loison added the enhancement, high priority and epic labels 2024-04-02 10:48:01 +02:00
Benjamin_Loison pinned this 2024-04-02 10:48:49 +02:00
Benjamin_Loison unpinned this 2024-04-02 14:13:36 +02:00
Author
Owner

The clearest figure would be a 2D table with a colormap and the actual accuracy value written in each cell, showing the accuracy of our method for every combination of the number of images used to learn the PRNU and the number used to evaluate it. The complexity needs attention to make this doable, if it is even possible initially.
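
A minimal sketch of how such a table could be filled, assuming a hypothetical callback `accuracy_fn(n_learn, n_eval)` that runs the method and returns an accuracy. Nothing here is the actual evaluation code. It mostly makes the cost explicit: one full evaluation per cell, so `len(learn_counts) * len(eval_counts)` runs in total.

```python
def accuracy_grid(learn_counts, eval_counts, accuracy_fn):
    """Return grid[i][j] = accuracy_fn(learn_counts[i], eval_counts[j])."""
    return [[accuracy_fn(n_learn, n_eval) for n_eval in eval_counts]
            for n_learn in learn_counts]


def format_grid(learn_counts, eval_counts, grid):
    """Render the grid as a plain-text table with one accuracy per cell."""
    # Rows: number of images used to learn; columns: number used to evaluate.
    header = 'learn\\eval' + ''.join(f'{n:>8}' for n in eval_counts)
    lines = [header]
    for n_learn, row in zip(learn_counts, grid):
        lines.append(f'{n_learn:>10}' + ''.join(f'{acc:>8.2f}' for acc in row))
    return '\n'.join(lines)
```

The colormap itself could then be layered on top of `grid` with a plotting library, writing each value into its cell.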

Author
Owner

How many images per device are there?

grep 'landscape' RAISE_all.csv | grep 'DEVICE' | wc -l
| Device | Number of landscape images |
| --- | --- |
| Nikon D90 | 482 |
| Nikon D7000 | 2023 |
| Nikon D40 | 17 |

What is the resolution per device?

| Device | Resolution |
| --- | --- |
| Nikon D90 | 4288 x 2848 |
| Nikon D7000 | 4928 x 3264 |
| Nikon D40 | 3008 x 2000 |

Care is needed to compare images at an identical, meaningful resolution. Cropping to the smallest one, 3008 x 2000, seems to make sense.
The images could also be split to get more of them, especially for the Nikon D40: 17 * 2 * 2 = 68 images seems to be a good start.

Note that the resolution is sometimes reversed, apparently because the image is vertical rather than horizontal. Image Size does not change. The differences among the Picture Control and Base values are unclear.
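
A minimal sketch of the crop and 2 x 2 split arithmetic. Assumptions: the crop is centered (the position is not decided above) and vertical images get the target dimensions swapped. The function names are hypothetical. Boxes are `(left, upper, right, lower)` tuples, the convention used for instance by Pillow's `Image.crop`.

```python
TARGET_WIDTH, TARGET_HEIGHT = 3008, 2000  # smallest resolution (Nikon D40)


def center_crop_box(width, height,
                    target_width=TARGET_WIDTH, target_height=TARGET_HEIGHT):
    """Return the (left, upper, right, lower) box of a centered crop."""
    if height > width:
        # Vertical image: swap the target dimensions (assumption).
        target_width, target_height = target_height, target_width
    if width < target_width or height < target_height:
        raise ValueError('image smaller than the target crop')
    left = (width - target_width) // 2
    upper = (height - target_height) // 2
    return (left, upper, left + target_width, upper + target_height)


def split_box(box, columns=2, rows=2):
    """Split a box into columns x rows equal tiles, in row-major order."""
    left, upper, right, lower = box
    tile_width = (right - left) // columns
    tile_height = (lower - upper) // rows
    return [
        (left + c * tile_width, upper + r * tile_height,
         left + (c + 1) * tile_width, upper + (r + 1) * tile_height)
        for r in range(rows) for c in range(columns)
    ]
```

For a Nikon D7000 image, `center_crop_box(4928, 3264)` gives `(960, 632, 3968, 2632)`, and `split_box` of that box gives four 1504 x 1000 tiles.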

Author
Owner

Maybe pay attention to camera settings potentially affecting the PRNU computation.

Author
Owner
cut -d ',' -f 66 RAISE_all.csv | tr '; ' '\n' | sort | uniq

Manually processed:

buildings
Indoor
landscape
nature
objects
outdoor
people

Should get rid of columns with identical values, especially the empty ones.

Should compute the number of images across devices per category.

I would say the idea is to consider the category with the most images per device.
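
A minimal sketch for counting images per device and category. Assumptions: the keyword columns are the ones whose header contains `Keyword` (the exact header names are not checked here), a cell looks like `landscape; outdoor`, and the function name is hypothetical.

```python
import csv
from collections import Counter


def count_images_per_category(csv_file):
    """Return a Counter mapping (device, category) to a number of images."""
    reader = csv.reader(csv_file)
    header = next(reader)
    device_index = header.index('Device')
    # Assumption: keyword columns have 'Keyword' in their header name.
    keyword_indexes = [i for i, name in enumerate(header) if 'Keyword' in name]
    counts = Counter()
    for row in reader:
        categories = set()
        for i in keyword_indexes:
            if i < len(row):
                # Split a cell such as "landscape; outdoor" into its categories.
                categories.update(
                    part.strip() for part in row[i].split(';') if part.strip())
        # Each image counts once per category, even if a keyword is repeated.
        for category in categories:
            counts[(row[device_index], category)] += 1
    return counts
```

The category with the most images for a given device can then be read from the returned counter, for instance by filtering its keys on the device.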

import csv
import json

# Distinct values seen in each column, keyed by column name.
columns = {}

with open('RAISE_all.csv', newline = '') as csvFile:
    reader = csv.DictReader(csvFile)
    fieldNames = reader.fieldnames
    for row in reader:
        for fieldName in fieldNames:
            if fieldName not in columns:
                columns[fieldName] = set()
            columns[fieldName].add(row[fieldName])

# Print the number of distinct values per column.
for fieldName in fieldNames:
    column = columns[fieldName]
    columnLen = len(column)
    print(fieldName, columnLen)
    # Only list the values themselves for columns with few distinct values.
    if columnLen < 264:
        print(json.dumps(list(column), indent = 4))
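
A minimal sketch for getting rid of the columns with identical values, assuming the rows were loaded as dictionaries, for instance with `list(csv.DictReader(csvFile))`. Both function names are hypothetical.

```python
def constant_columns(rows, field_names):
    """Return the field names whose value is identical in every row."""
    return [name for name in field_names
            if len({row[name] for row in rows}) <= 1]


def drop_columns(rows, names_to_drop):
    """Return copies of the rows without the given columns."""
    dropped = set(names_to_drop)
    return [{name: value for name, value in row.items() if name not in dropped}
            for row in rows]
```

Always-empty columns are a special case of constant columns, so they are dropped as well.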
Author
Owner
| Device | Number of images |
| --- | --- |
| Nikon D40 | 76 |
| Nikon D7000 | 5804 |
| Nikon D90 | 2276 |

According to:

grep 'Nikon D40' RAISE_all.csv | wc -l
Author
Owner

download_images.py:

#!/usr/bin/env python

import csv
import urllib.request
from io import BytesIO, TextIOWrapper
from zipfile import ZipFile

from tqdm import tqdm

RAISE_URL = 'http://193.205.194.113/RAISE'
IS_RAISE_ALL_CSV_DOWNLOADED = False
IMAGE_TYPE = 'tif'

IMAGE_TYPE_UPPERCASE = IMAGE_TYPE.upper()
# Server directory per image type, following the URLs listed in `RAISE_all.csv`
# (for instance `.../RAISE/TIFF/ra0cc3d11t.TIF` and `.../RAISE/NEF/ra0cc3d11t.NEF`).
IMAGE_DIRECTORIES = {'tif': 'TIFF', 'nef': 'NEF'}

def treatCsv(csvFile):
    # Return the file names of all Nikon D7000 images.
    images = []
    reader = csv.DictReader(csvFile)
    for row in reader:
        if row['Device'] == 'Nikon D7000':
            # All images of this device are expected to share the same resolution.
            assert row['Image Size'] == 'L (4928 x 3264)'
            images += [row['File']]
    return images

if IS_RAISE_ALL_CSV_DOWNLOADED:
    with open('RAISE_all.csv') as csvFile:
        images = treatCsv(csvFile)
else:
    # Read `RAISE_all.csv` directly from the zip archive served by RAISE.
    resp = urllib.request.urlopen(f'{RAISE_URL}/getFile.php?p=all')
    with ZipFile(BytesIO(resp.read())) as zf:
        with zf.open('RAISE_all.csv') as infile:
            images = treatCsv(TextIOWrapper(infile))

# Only download the first 100 images.
for image in tqdm(images[:100]):
    urllib.request.urlretrieve(f'{RAISE_URL}/{IMAGE_DIRECTORIES[IMAGE_TYPE]}/{image}.{IMAGE_TYPE_UPPERCASE}', f'{image}.{IMAGE_TYPE}')