# 2.6 Image Segmentation {.unnumbered}

This minimal example script shows how to run image segmentation
over a set of images stored on the same machine that runs the
model. To start, we load the modules needed for the task.

```{python}
from os import listdir
from os.path import splitext, join
from transformers import pipeline
from PIL import Image

import torch
import polars as pl
import numpy as np
```

Next, we load the model that we are interested in using.
Hugging Face hosts a large number of image segmentation models;
most can be used in exactly the same way by changing the name
of the model in the function call below.


```{python}
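# Note: this assignment rebinds the name `pipeline` from the imported
# factory function to the loaded model, so re-import the function
# before running this cell a second time in the same session.
pipeline = pipeline(
  "image-segmentation",
  "nvidia/segformer-b0-finetuned-ade-512-512"
)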
```

With the model loaded, the next step is to build a list of
paths to the images that we want to process. Here, we list
all of the JPEG files in the directory containing the FSA-OWI
images and, to save time, keep just the first five. We could
process all of the images, or use a different collection, by
changing the variables below.

```{python}
collection = 'fsaowi'
num_images = 5

paths = sorted(listdir(join('img', collection)))
paths = [x for x in paths if splitext(x)[1] == '.jpg']
paths = paths[:num_images]
```
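
The filter above keeps only files whose extension is exactly `.jpg`. If a
collection mixes extensions or capitalization (for example `.JPG` or `.jpeg`),
a more permissive filter may be useful. The following is a minimal sketch and
not part of the original script; the helper name `list_images` and the default
set of extensions are assumptions chosen for illustration.

```python
from os import listdir
from os.path import splitext

def list_images(directory, extensions=('.jpg', '.jpeg', '.png'), limit=None):
    """Return sorted image file names in `directory`.

    Hypothetical helper: extensions are compared case-insensitively,
    and `limit` optionally keeps only the first few names.
    """
    names = sorted(listdir(directory))
    names = [x for x in names if splitext(x)[1].lower() in extensions]
    return names if limit is None else names[:limit]
```

Under these assumptions, the three lines that build `paths` could be replaced
by `paths = list_images(join('img', collection), limit=num_images)`.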

Now, we load each of the images and run the model over it.
For each image, the pipeline returns a list of dictionaries,
where each dictionary has a score, a label, and an image mask.
The mask is a black-and-white image in which white marks the
pixels associated with the category. We save both the full
output and a summary statistic (the share of pixels covered by
each category) for each of the images. The full output is used
for the visualization below. Many other kinds of summarization
are possible, including combining the masks with the output of
other models, such as depth estimation. Play around with
different options for your task!

```{python}
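output_image = {}  # full pipeline output per image, used for visualization
output = {'path': [], 'score': [], 'label': [], 'percent': []}
for path in paths:
    # Load the image and make sure it has three color channels.
    image = Image.open(join('img', collection, path))
    image = image.convert('RGB')
    outputs = pipeline(image)
    output_image[path] = []
    for oput in outputs:
        output_image[path].append(oput)
        output['path'].append(path)
        output['score'].append(oput['score'])
        output['label'].append(oput['label'])
        # The mask is white (255) where the category is present, so the
        # mean of the comparison is the fraction of the image it covers.
        mask = np.asarray(oput['mask'])
        output['percent'].append(np.mean(mask == 255))
```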
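
To make the summary statistic concrete, here is a small sketch on synthetic
masks rather than real model output. It computes the covered fraction of a
single mask in the same way as the loop above, and shows one possible way to
combine several category masks into a single coverage value. The helper names
`coverage` and `union_coverage`, and the toy `sky` and `tree` arrays, are
assumptions made for illustration.

```python
import numpy as np

def coverage(mask):
    """Fraction of pixels in a 0/255 mask that are set to 255."""
    return float(np.mean(np.asarray(mask) == 255))

def union_coverage(masks):
    """Fraction of pixels covered by at least one of several 0/255 masks."""
    combined = np.zeros(np.asarray(masks[0]).shape, dtype=bool)
    for mask in masks:
        combined |= (np.asarray(mask) == 255)
    return float(combined.mean())

# Synthetic 4x4 masks standing in for the model output.
sky = np.zeros((4, 4), dtype=np.uint8)
sky[:2, :] = 255      # top half of the image
tree = np.zeros((4, 4), dtype=np.uint8)
tree[1:3, :2] = 255   # a 2x2 block that overlaps the sky in two pixels

print(coverage(sky))                # 0.5
print(coverage(tree))               # 0.25
print(union_coverage([sky, tree]))  # 0.625
```

Because overlapping pixels are counted only once, the union is smaller than
the sum of the two individual fractions.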

We can convert the structured outputs produced above into a
data frame to save and explore further. Note that this
particular model does not return score values, but we include
the column in the code because other segmentation models do.
Unlike with many other models, the ordering of the categories is
fixed rather than sorted by proportion or probability. So, below
we sort by the proportion of the image that is covered by each
category, which is stored as a value between 0 and 1 in the
`percent` column.

```{python}
dt = pl.from_dict(output)
dt = dt.sort(by=['path', 'percent'], descending=[False, True])
dt
```
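
One simple summary that can be derived from these columns is the single
largest category in each image. The sketch below works directly on a
dictionary shaped like `output`, using plain Python so that it does not depend
on any particular data frame API. The helper name `dominant_labels` and the
sample values are made up for illustration.

```python
def dominant_labels(output):
    """Map each path to the (label, percent) pair with the largest percent."""
    best = {}
    rows = zip(output['path'], output['label'], output['percent'])
    for path, label, percent in rows:
        if path not in best or percent > best[path][1]:
            best[path] = (label, percent)
    return best

# Invented values in the same layout as the `output` dictionary.
example = {
    'path': ['a.jpg', 'a.jpg', 'b.jpg', 'b.jpg'],
    'label': ['sky', 'person', 'wall', 'floor'],
    'percent': [0.6, 0.1, 0.3, 0.5],
}
print(dominant_labels(example))
# {'a.jpg': ('sky', 0.6), 'b.jpg': ('floor', 0.5)}
```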

To visualize the output, we can display any of the masks as an
image. For example, below we display the 'person' mask for each
image. You can replace this label with any category that you are
interested in. Note that we only show images in which at least
some pixels were assigned the label 'person'.

```{python}
for path in paths:
    masks = output_image[path]
    mask_person = [x['mask'] for x in masks if x['label'] == 'person']
    if mask_person:
        display(mask_person[0])
```
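
A mask on its own can be hard to relate back to the photograph. One option is
to tint the masked pixels of the original image. The following is a rough
sketch under the assumption that the mask has the same height and width as the
image; the function name `overlay_mask` and its `color` and `alpha` parameters
are hypothetical and not part of the original script.

```python
import numpy as np

def overlay_mask(image, mask, color=(255, 0, 0), alpha=0.5):
    """Blend `color` into an RGB array wherever the 0/255 mask is set.

    `image` has shape (height, width, 3) and `mask` has shape
    (height, width); `alpha` controls the strength of the tint.
    """
    image = np.asarray(image, dtype=np.float64)
    selected = np.asarray(mask) == 255
    blended = image.copy()
    blended[selected] = (1 - alpha) * image[selected] + alpha * np.array(color)
    return blended.round().astype(np.uint8)
```

In the notebook, the result could be turned back into an image with
`Image.fromarray(overlay_mask(image, mask_person[0]))`, where `image` is the
RGB image re-opened from the corresponding path.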