Estimate PRNU of devices on actual dataset and evaluate our method #30


Benjamin_Loison · 2024-04-02T10:48:01+02:00

Benjamin_Loison commented

2024-04-02 10:48:01 +02:00

Considering first a RAW dataset of uniform objects (sky, for instance) with one instance of each model seems to be a good starting point.

All [flat-field images for NikonD7000](http://loki.disi.unitn.it/RAISE/Flat-field/Flat-field.zip) ([source](http://loki.disi.unitn.it/RAISE/download.html)) look like the attached one.

They seem to have a light source near the middle that is fairly isotropic; there is just a shadow at the top right:

![image](/attachments/770104ef-688e-446b-9884-b89ed0fbee67)

The full RAISE CSV can be downloaded from http://loki.disi.unitn.it/RAISE/getFile.php?p=all. Its `Keyword` values are just the few categories of http://loki.disi.unitn.it/RAISE/download.html.

Only the first two `Keyword` columns contain `landscape` entries.

```
ra0cc3d11t,http://193.205.194.113/RAISE/NEF/ra0cc3d11t.NEF,http://193.205.194.113/RAISE/TIFF/ra0cc3d11t.TIF,9/20/2014 15:20,8/27/2011 19:28,11.0 MB,L (4288 x 2848),28:51.0,"UTC+7, DST:OFF",Compressed RAW (12-bit),,,,Nikon D90,VR 18-105mm f/3.5-5.6G,25mm,AF-A,Auto,ON,,f/9,1/2000s,Aperture Priority,0EV,,Matrix,ISO 800,,,,"Color Temp. (6670K), B4, M2",sRGB,ON (Normal),OFF,Extra High,,,,LANDSCAPE-02,[LS] Landscape,,6,Active D-Lighting,Active D-Lighting,3,3,,,,,,,,,,,,,,,,,landscape; outdoor
```

`landscape; outdoor` seems to be a single cell, not two cells with the second one starting with a space.

```vb
=ISNUMBER(FIND("landscape",BK2))
```

```vb
=VALUE(SUBSTITUTE(F2, " MB", ""))
```

```vb
=SUM(G2:G8149)
```

46,246.04 MB
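
The same computation can be sketched in Python, as one possible reading of the three formulas above: keep the rows whose keyword cell contains `landscape`, strip the ` MB` suffix of the size cell, and sum. The function name and the column indexes (`F` → 5 and `BK` → 62, 0-based) are assumptions derived from the spreadsheet references, not something checked against the actual CSV header.

```python
import csv

def total_size_mb(csv_file, keyword="landscape", size_column=5, keyword_column=62):
    """Sum the sizes (in MB) of the rows whose keyword cell contains `keyword`.

    `size_column` and `keyword_column` are 0-based and default to the
    spreadsheet columns F and BK (assumption based on the formulas).
    """
    reader = csv.reader(csv_file)
    next(reader)  # Skip the header row.
    total = 0.0
    for row in reader:
        # Ignore blank or truncated rows.
        if len(row) <= max(size_column, keyword_column):
            continue
        if keyword in row[keyword_column]:
            # Same as =VALUE(SUBSTITUTE(F2, " MB", "")).
            total += float(row[size_column].replace(" MB", ""))
    return total

# Possible usage, assuming a local copy of the CSV:
# with open("RAISE_all.csv", newline="") as csv_file:
#     print(total_size_mb(csv_file))
```

Using `csv.reader` instead of splitting on commas keeps quoted cells such as `"UTC+7, DST:OFF"` intact.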

```
FINISHED --2024-04-02 14:39:34--
Total wall clock time: 30m 24s
Downloaded: 2522 files, 51G in 28m 20s (30.7 MB/s)
```


flat_001.NEF (19 MiB)

landscape_tif.txt (121 KiB)

image.png (2.5 MiB)

Benjamin_Loison added the enhancement, high priority, epic labels 2024-04-02 10:48:01 +02:00

Benjamin_Loison referenced this issue 2024-04-02 10:48:10 +02:00: Estimate PRNU on easy real images #29

Benjamin_Loison pinned this 2024-04-02 10:48:49 +02:00

Benjamin_Loison unpinned this 2024-04-02 14:13:36 +02:00

Benjamin_Loison commented

2024-04-03 00:18:10 +02:00

The clearest figure would be a 2D table with a colormap and the actual accuracy value written in each cell, showing the accuracy of our method for every combination of the number of images used to *learn* the PRNU and the number used to evaluate it. We have to pay attention to the complexity to keep this doable, if it is even possible initially.
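
A minimal sketch of how such a table could be filled, assuming a hypothetical `evaluate(n_train, n_test)` callback returning the accuracy obtained when learning the PRNU on `n_train` images and evaluating on `n_test` images. Neither the callback nor the helper names come from the project. The grid costs one evaluation per cell, which is where the complexity concern comes from.

```python
def accuracy_table(train_counts, test_counts, evaluate):
    """Return one row per training count and one column per test count."""
    return [
        [evaluate(n_train, n_test) for n_test in test_counts]
        for n_train in train_counts
    ]

def format_table(train_counts, test_counts, table):
    """Render the table as tab-separated text with the accuracy in each cell."""
    lines = ["train\\test\t" + "\t".join(str(n) for n in test_counts)]
    for n_train, row in zip(train_counts, table):
        cells = "\t".join(f"{accuracy:.2f}" for accuracy in row)
        lines.append(f"{n_train}\t{cells}")
    return "\n".join(lines)
```

The nested list returned by `accuracy_table` could then be given to a plotting library to add the colormap, for instance matplotlib's `imshow` with `Axes.text` for the values written in the cells.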


Benjamin_Loison commented

2024-04-03 01:02:34 +02:00

How many images per device are there?

```bash
grep 'landscape' RAISE_all.csv | grep 'DEVICE' | wc -l
```

| Device | Number of landscape images |
| --- | --- |
| Nikon D90 | 482 |
| Nikon D7000 | 2023 |
| Nikon D40 | 17 |

What is the resolution per device?

| Device | Resolution |
| --- | --- |
| Nikon D90 | 4288 x 2848 |
| Nikon D7000 | 4928 x 3264 |
| Nikon D40 | 3008 x 2000 |
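
Both questions can also be answered in a single pass over the CSV. The following is a sketch rather than the commands actually used: it mimics `grep 'landscape'` by keeping rows where any cell contains `landscape`, and it relies on the `Device` and `Image Size` column names used by `download_images.py`. The function name is hypothetical.

```python
import csv
from collections import Counter, defaultdict

def summarize_devices(csv_file, keyword="landscape"):
    """Return (number of matching images, set of image sizes) per device."""
    reader = csv.reader(csv_file)
    header = next(reader)
    device_index = header.index("Device")
    size_index = header.index("Image Size")
    counts = Counter()
    sizes = defaultdict(set)
    for row in reader:
        # Like grep: keep the row if the keyword appears in any cell.
        if not any(keyword in cell for cell in row):
            continue
        device = row[device_index]
        counts[device] += 1
        sizes[device].add(row[size_index])
    return counts, sizes

# Possible usage, assuming a local copy of the CSV:
# with open("RAISE_all.csv", newline="") as csv_file:
#     counts, sizes = summarize_devices(csv_file)
```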

We have to make sure to only compare images of identical resolution. Cropping to the smallest one, 3008 x 2000, seems to make sense.
We could also split the images to have more of them, especially for Nikon D40: having 17 * 2 * 2 = 68 images seems to be a good start.

Note that the resolution is sometimes reversed, seemingly because the image is vertical instead of horizontal. `Image Size` does not change. The differences among the `Picture Control` and `Base` values are unclear.
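
The cropping and splitting arithmetic above can be sketched as follows. The centered crop is an assumption, as where to crop is not decided here, and the function names are hypothetical. Boxes are `(left, upper, right, lower)` tuples, the convention also accepted by Pillow's `Image.crop`. Vertical images are not handled.

```python
def centered_crop_box(width, height, crop_width=3008, crop_height=2000):
    """Return the box of a centered crop (centering is an assumption)."""
    if width < crop_width or height < crop_height:
        raise ValueError("image is smaller than the requested crop")
    left = (width - crop_width) // 2
    upper = (height - crop_height) // 2
    return (left, upper, left + crop_width, upper + crop_height)

def tile_boxes(box, columns=2, rows=2):
    """Split a box into `rows` x `columns` equally sized tiles, row by row."""
    left, upper, right, lower = box
    tile_width = (right - left) // columns
    tile_height = (lower - upper) // rows
    return [
        (
            left + column * tile_width,
            upper + row * tile_height,
            left + (column + 1) * tile_width,
            upper + (row + 1) * tile_height,
        )
        for row in range(rows)
        for column in range(columns)
    ]
```

With the default 2 x 2 split, each of the 17 Nikon D40 images gives 4 tiles of 1504 x 1000, hence the 68 images mentioned above.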


Benjamin_Loison commented

2024-04-03 14:53:55 +02:00

Maybe pay attention to camera settings potentially affecting the PRNU computation.

Benjamin_Loison commented

2024-04-04 12:21:46 +02:00

```bash
cut -d ',' -f 66 RAISE_all.csv | tr '; ' '\n' | sort | uniq
```

Manually processed:

```
buildings
Indoor
landscape
nature
objects
outdoor
people
```

We should get rid of the columns whose values are all identical, especially the empty ones.

We should compute the number of images across devices per category.

The idea, I would say, is to consider the category with the most images per device.

```py
import csv
import json

# Maps each column name to the set of distinct values it takes.
columns = {}

with open('RAISE_all.csv', newline = '') as csvFile:
    reader = csv.DictReader(csvFile)
    fieldNames = reader.fieldnames
    for row in reader:
        for fieldName in fieldNames:
            if fieldName not in columns:
                columns[fieldName] = set()
            columns[fieldName].add(row[fieldName])

for fieldName in fieldNames:
    column = columns[fieldName]
    columnLen = len(column)
    print(fieldName, columnLen)
    # Only list the values of columns having few distinct values.
    if columnLen < 264:
        print(json.dumps(list(column), indent = 4))
```
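
Counting the images per category and device could look like the sketch below. It assumes the keyword cell is the spreadsheet column `BK` (0-based index 62) and holds `;`-separated categories, as in the `landscape; outdoor` example. It also takes "the category with the most images per device" to mean the category whose least represented device has the most images, which is only one possible interpretation. All names are hypothetical.

```python
import csv
from collections import Counter

def count_by_category_and_device(csv_file, keyword_column=62):
    """Count the images per (category, device) pair."""
    reader = csv.reader(csv_file)
    header = next(reader)
    device_index = header.index("Device")
    counts = Counter()
    for row in reader:
        # Ignore blank or truncated rows.
        if len(row) <= max(device_index, keyword_column):
            continue
        device = row[device_index]
        for category in row[keyword_column].split(";"):
            category = category.strip()
            if category:
                counts[(category, device)] += 1
    return counts

def best_category(counts):
    """Return the category maximizing its smallest per-device count."""
    categories = sorted({category for category, _ in counts})
    devices = {device for _, device in counts}
    return max(
        categories,
        key=lambda category: min(counts[(category, device)] for device in devices),
    )
```

A category without any image for a device gets a count of 0 for that device, so it can only win if every category misses at least one device.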


Benjamin_Loison commented

2024-04-15 01:38:48 +02:00

| Device | Number of images |
| --- | --- |
| Nikon D40 | 76 |
| Nikon D7000 | 5804 |
| Nikon D90 | 2276 |

According to:

```bash
grep 'Nikon D40' RAISE_all.csv | wc -l
```


Benjamin_Loison commented

2024-06-04 11:38:06 +02:00

`download_images.py`:

```python
#!/usr/bin/env python

import csv
import urllib.request
from io import BytesIO, TextIOWrapper
from zipfile import ZipFile

from tqdm import tqdm

RAISE_URL = 'http://193.205.194.113/RAISE'
# Set to `True` to read a local `RAISE_all.csv` instead of downloading it.
IS_RAISE_ALL_CSV_DOWNLOADED = False
IMAGE_TYPE = 'tif'

IMAGE_TYPE_UPPERCASE = IMAGE_TYPE.upper()
# The URLs listed in `RAISE_all.csv` use a `TIFF` folder for `.TIF` files and a `NEF` folder for `.NEF` files.
IMAGE_FOLDER = 'TIFF' if IMAGE_TYPE == 'tif' else IMAGE_TYPE_UPPERCASE

def treatCsv(csvFile):
    # Return the file names of all Nikon D7000 images.
    images = []
    reader = csv.DictReader(csvFile)
    for row in reader:
        if row['Device'] == 'Nikon D7000':
            assert row['Image Size'] == 'L (4928 x 3264)'
            images += [row['File']]
    return images

if IS_RAISE_ALL_CSV_DOWNLOADED:
    with open('RAISE_all.csv') as csvFile:
        images = treatCsv(csvFile)
else:
    # Download the zipped CSV and read it in memory.
    resp = urllib.request.urlopen(f'{RAISE_URL}/getFile.php?p=all')
    with ZipFile(BytesIO(resp.read())) as zf:
        with zf.open('RAISE_all.csv') as infile:
            images = treatCsv(TextIOWrapper(infile))

# Only download the first 100 images.
for image in tqdm(images[:100]):
    urllib.request.urlretrieve(f'{RAISE_URL}/{IMAGE_FOLDER}/{image}.{IMAGE_TYPE_UPPERCASE}', f'{image}.{IMAGE_TYPE}')
```



Reference: Benjamin_Loison/Robust_image_source_identification_on_modern_smartphones#30