[Openexr-user] Per-pixel Object Identifiers?

Discussion:

Florian Kainz

2011-04-14 22:13:45 UTC

Hi,

At ILM we want to implement a workflow where a computer graphics artist
can bring up an OpenEXR image of, say, a scene from Rango on his or her
screen, point to a pixel, and find out that the object seen at that
pixel is called "Beans/dress/button3."

This will require storing per-pixel object identifiers in an OpenEXR file.
In order to avoid re-inventing the wheel, I would like to find out if any
OpenEXR users have done something like this already. If you have done it,
would you be willing to share how you did it?

Also, is there any interest in per-pixel object identifiers outside ILM?

Florian

Unless somebody has a better idea, we'll probably do something like this:

Add a channel called objectID, of type UINT, to the image.
If the image has multiple views, then add an objectID channel
to every view.

Add an attribute called objectID to the header. The type of the
attribute is a map from unsigned integers to lists of strings.
If the image has multiple views, then add multiple attributes
with names of the form <view name>.objectID.

For each unsigned integer value that occurs in one or more pixels
in the objectID channel, a corresponding map entry in the objectID
attribute contains a list of all objects that are visible in those
pixels. More than one object may be visible in a given pixel because
of transparency, motion blur, reflections, or anti-aliasing.

To find out which object or objects cover a given pixel, application
software first looks up the value stored in the objectID channel for
that pixel, then it looks up the corresponding list of object names
in the objectID attribute.
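
A minimal Python sketch of the lookup described above. The objectID channel is modelled as a nested list and the objectID attribute as a dict; no OpenEXR API is used. The pixel values, every name other than "Beans/dress/button3", and the helper objects_at are hypothetical.

```python
# Hypothetical in-memory stand-ins; no OpenEXR file I/O is performed here.
# objectID channel: one unsigned integer per pixel (two rows of three pixels).
object_id_channel = [
    [0, 1, 1],
    [0, 2, 2],
]

# objectID attribute: map from pixel value to the list of visible objects.
object_id_attribute = {
    0: ["Background/sky"],
    1: ["Beans/dress/button3"],
    2: ["Beans/dress/button3", "Beans/dress/collar"],  # e.g. an anti-aliased edge
}

def objects_at(x, y):
    """Return the names of all objects visible at pixel (x, y)."""
    pixel_value = object_id_channel[y][x]              # step 1: read the channel
    return object_id_attribute.get(pixel_value, [])    # step 2: look up the names

print(objects_at(1, 0))  # ['Beans/dress/button3']
print(objects_at(2, 1))  # ['Beans/dress/button3', 'Beans/dress/collar']
```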

In images with lots of transparency or motion blur any given object
name may occur multiple times in the map from unsigned integers to
string lists. In order to save disk space the objectID attribute
could be compressed by using a two-stage lookup, where the attribute
contains two maps, one from pixel values to lists of integer object
identifiers, and one from object identifiers to object names.
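
The two-stage lookup could be sketched as follows, again with plain Python dicts standing in for the attribute. All identifiers and values are made up for illustration; the point is only that each object name is stored once, however many pixel values refer to it.

```python
# Stage 1 (hypothetical data): pixel value -> list of integer object identifiers.
pixel_value_to_object_ids = {
    0: [10],
    1: [11],
    2: [11, 12],
    3: [10, 11, 12],
}

# Stage 2: object identifier -> object name. Each name appears exactly once.
object_id_to_name = {
    10: "Background/sky",
    11: "Beans/dress/button3",
    12: "Beans/dress/collar",
}

def names_for_pixel_value(pixel_value):
    """Resolve a pixel value to object names via both maps."""
    ids = pixel_value_to_object_ids.get(pixel_value, [])
    return [object_id_to_name[i] for i in ids]

print(names_for_pixel_value(2))  # ['Beans/dress/button3', 'Beans/dress/collar']
```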

With 32-bit UINT pixel values this scheme could run out of object
identifiers for images with more than four Gigapixels, but in VFX
production images that large are rare.

Scott Squires

2011-04-14 23:26:10 UTC

Yes, this would be useful.

I'm assuming it might also be useful to have a simple version of this where an ObjectID is simply saved as a UINT for each pixel without being
forced to go through the lookup/remap process.
That's doable now, but if you're standardizing on ObjectID formats for EXR it should probably be covered as well.

Scott

Florian Kainz

2011-04-15 00:04:10 UTC

Hi Scott,

Our use case for the object IDs would be kind of like this: "the skin on that
CG creature doesn't move right. What is the creature's name, and which version
was used when this picture was rendered?"

Storing object IDs directly in the pixels, without a lookup, seems to imply
that your production process identifies assets using 32-bit integers instead
of names. 10-digit numbers aren't very user-friendly. Do artists really
refer to "asset number 3780456321" instead of "Davy Jones, version 3.4?"

Florian

Scott Squires

2011-04-15 01:08:55 UTC

I'm thinking of cases where the numeric values are processed and manipulated where necessary.
This makes it useful for other things than user ID of 3D objects such as segments or other aux data.

Scott

Florian Kainz

2011-04-15 01:18:54 UTC

Storing plain integer IDs without any auxiliary data is already possible;
no additional work is required for that. What I am proposing would be a
way to associate those integer IDs with additional information; I don't
think it would interfere with storing IDs that aren't mapped to lists of
strings. The two types of data would simply be stored in separate channels.

Florian

Chris Cox

2011-04-14 23:33:00 UTC

Per pixel metadata is probably overkill, but some people have specified
object/surface identification with masks -- using one mask for each surface
or object. When querying a pixel for information, you look in the matching
list of masks for non-zero values and add to the list that should be
associated with that pixel.
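
As a rough illustration of the mask-based query, assuming one coverage mask per object held as a nested list (the object names and coverage values are invented):

```python
# Hypothetical data: one mask per object, each with a coverage value per pixel.
masks = {
    "soldier_017": [[0.0, 0.5], [0.0, 1.0]],
    "shield_017":  [[0.0, 0.5], [0.0, 0.0]],
    "ground":      [[1.0, 0.0], [1.0, 0.0]],
}

def objects_from_masks(x, y):
    """Collect every object whose mask is non-zero at pixel (x, y)."""
    return sorted(name for name, mask in masks.items() if mask[y][x] != 0.0)

print(objects_from_masks(1, 0))  # ['shield_017', 'soldier_017']
```

Note that the query touches every mask, so its cost grows with the number of objects in the image.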

Chris

Florian Kainz

2011-04-15 00:48:25 UTC

Hi Chris,

Unless OpenEXR was modified, object identification with masks would be
prohibitively expensive for images that contain a large number of objects,
such as an aerial view of a city or a battle scene with thousands of
soldiers and their equipment.

Per-object masks could be implemented if the image allowed a variable number
of data samples per pixel. An image might contain two channels for object
identification. One, called objectMask, stores a vector of HALFs per pixel;
the other, called objectID, stores a vector of UINTs per pixel. objectID[i]
would identify an object, and objectMask[i] would specify how much that
object contributes to the final pixel color.

The scheme I proposed would be equivalent to binary per-object masks,
and it could be implemented in the current version of OpenEXR, which does
not support variable-length per-pixel data. The scheme would be sufficient
for ILM's use case.
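
A small sketch of the variable-length idea, with Python lists standing in for the per-pixel vectors. The dict-of-lists layout, the IDs and the weights are assumptions made for illustration. The second helper shows how dropping the weights leaves just the set of objects per pixel, which is the information the binary scheme stores.

```python
# Hypothetical variable-length per-pixel data, keyed by (x, y).
object_id = {
    (0, 0): [7],
    (1, 0): [7, 9],
}
object_mask = {
    (0, 0): [1.0],
    (1, 0): [0.75, 0.25],
}

def contributions(x, y):
    """Pair objectID[i] with objectMask[i] for every sample in the pixel."""
    return list(zip(object_id[(x, y)], object_mask[(x, y)]))

def binary_object_set(x, y):
    """Collapse the weights to a binary mask: keep only which objects are present."""
    return frozenset(oid for oid, weight in contributions(x, y) if weight > 0.0)

print(contributions(1, 0))  # [(7, 0.75), (9, 0.25)]
```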

Florian

Ger Hobbelt

2011-04-15 03:09:49 UTC

Did do something like that, but hadn't thought of using custom attribute(s)
to store the related data at the time, so I ended up storing the (integer)
references in the EXR and using an external store for the mapping, using the
integers (one per pixel) as record ids. Each (externally stored) record
could list one to many items.
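
The message does not say what kind of external store was used. Purely as an illustration, the same idea could be sketched with an in-memory SQLite table, where the per-pixel integer is the record id and each record lists one or more items. The table name, column names and items are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for the unspecified external store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record_items (record_id INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO record_items VALUES (?, ?)",
    [(1, "itemA"), (2, "itemA"), (2, "itemB")],
)

def items_for(record_id):
    """Return all items listed under the record id read from a pixel."""
    rows = conn.execute(
        "SELECT item FROM record_items WHERE record_id = ? ORDER BY item",
        (record_id,),
    )
    return [row[0] for row in rows]

print(items_for(2))  # ['itemA', 'itemB']
```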

If I read correctly what you intend to do, then I suggest you provide an
option to page-align the attribute data: that way you can memory-map the
metadata and smartly employ any page-oriented tree storage algorithms. When
you think of the pixel to objectID collection mapping (N:M relation) as a
database relation/table, the memory-map/align suggestion becomes pretty
obvious: when you've got that data page-aligned you can just point your
B-tree or R-tree at the attribute data space and have a field day. :-) That
gives fast retrieval of arbitrarily large sets of objects per pixel. When
OpenEXR can provide the handle and a pointer/reference to the (aligned) start
of the attribute space, anyone can do what they want: think of it as an
attribute acting as a 'blob'; the rest of this stuff can be done application
side. No need to load OpenEXR with the peculiarities.
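
To make the alignment suggestion concrete, here is a hedged sketch that uses an invented file layout (some header bytes, zero padding, then a blob). It is not the OpenEXR file format and uses no OpenEXR API; write_aligned and map_blob are hypothetical helpers. The blob starts on a multiple of mmap.ALLOCATIONGRANULARITY, so that region alone can be memory-mapped.

```python
import mmap
import os
import tempfile

ALIGN = mmap.ALLOCATIONGRANULARITY  # mmap offsets must be a multiple of this

def write_aligned(path, header, blob):
    """Write header, pad to the next aligned offset, then write the blob."""
    padding = (-len(header)) % ALIGN
    with open(path, "wb") as f:
        f.write(header)
        f.write(b"\0" * padding)
        f.write(blob)
    return len(header) + padding  # aligned start of the blob

def map_blob(path, offset, length):
    """Map only the blob region of the file, read-only."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), length, access=mmap.ACCESS_READ, offset=offset)

fd, path = tempfile.mkstemp()
os.close(fd)
blob = b"record-0;record-1;record-2"
offset = write_aligned(path, b"made-up header bytes", blob)
view = map_blob(path, offset, len(blob))
data = view[:]   # copy out of the mapping; a tree structure could index it in place
view.close()
os.remove(path)
```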

Post by Florian Kainz
Also, is there any interest in per-pixel object identifiers outside ILM?

If you plan to do something like the above (which would be only an
augmentation of the attributes as far as OpenEXR itself is concerned, I
suppose -- I assume the knowledge how to relate the pixel int32 to attribute
X is kept at the application side), then yes, I'm interested!

Post by Florian Kainz
With 32-bit UINT pixel values this scheme could run out of object
identifiers for images with more than four Gigapixels, but in VFX
production images that large are rare.

Nitpicking: wouldn't that rather imply a limit of at most 2^32 'references'
and no limit on the number of pixels themselves? (In your case: 2^32 sets of
objects, each unique set having its own reference int32, so multiple pixels
can have the same reference int32 value when they are covered by the same
set of objects in your scene. Only when you don't want to spend the effort
of combining identical sets for multiple pixels does this translate to a
2^32 pixel limit.)
Or am I completely off track here?

(2^32 is okay with me, unless you want to provide for hash-based indexes and
then I'd _hugely_ favor a 64-bit int per pixel, instead. But that's just
picking the right type of pixel channel, anyway. Could even map float to
attribute. I had just never considered storing such a load of data in an
attribute and that's the idea here if I read it correctly. Cool!! Now I
hope AE/PS are already capable of ignoring such large attributes; then I can
keep on using them for visualization while storing some other interesting
tidbits in the EXR, as well.)
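
The message does not spell out why 64 bits would be preferable for hash-based indexes. One plausible reading is collision risk, which the sketch below estimates with the standard birthday-bound approximation. The helper names, the use of BLAKE2b as the hash, and the figure of 100,000 names are all assumptions.

```python
import hashlib
import math

def name_to_id(name, bits):
    """Derive a fixed-width integer ID from an object name (hypothetical helper)."""
    digest = hashlib.blake2b(name.encode("utf-8"), digest_size=bits // 8).digest()
    return int.from_bytes(digest, "little")

def collision_probability(n, bits):
    """Birthday-bound approximation: chance that n random IDs are not all distinct."""
    return -math.expm1(-n * (n - 1) / 2.0 / 2.0 ** bits)

p32 = collision_probability(100_000, 32)  # more likely than not to collide
p64 = collision_probability(100_000, 64)  # vanishingly small
```
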
--
Met vriendelijke groeten / Best regards,

Ger Hobbelt

--------------------------------------------------
web: http://www.hobbelt.com/
http://www.hebbut.net/
mail: ***@hobbelt.com
mobile: +31-6-11 120 978
--------------------------------------------------

Florian Kainz

2011-04-15 03:27:41 UTC

Hi Ger,

In the file, the objectID attribute will store the mapping from pixel values
to lists of strings in a compact form, without any indexing structure. The
data will be expanded into a form that can be queried quickly during reading,
and packed tightly during writing. The in-memory form of the attribute will
probably consist of two STL maps (which are usually implemented as RB trees
or AVL trees).
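
The thread does not specify the on-disk layout. The following is only a sketch of the pack-on-write, expand-on-read idea, using an invented little-endian byte layout and Python dicts in place of the two STL maps.

```python
import struct

def pack(id_sets, names):
    """Pack {pixel_value: [object_id, ...]} and {object_id: name} into bytes.

    Hypothetical layout: counts and integers are little-endian 32-bit values.
    """
    out = [struct.pack("<I", len(names))]
    for oid in sorted(names):
        raw = names[oid].encode("utf-8")
        out.append(struct.pack("<II", oid, len(raw)) + raw)
    out.append(struct.pack("<I", len(id_sets)))
    for pixel_value in sorted(id_sets):
        ids = id_sets[pixel_value]
        out.append(struct.pack("<II", pixel_value, len(ids)))
        out.append(struct.pack("<%dI" % len(ids), *ids))
    return b"".join(out)

def unpack(data):
    """Expand the packed bytes back into the two lookup maps."""
    pos = 0

    def read_u32():
        nonlocal pos
        (value,) = struct.unpack_from("<I", data, pos)
        pos += 4
        return value

    names = {}
    for _ in range(read_u32()):
        oid = read_u32()
        length = read_u32()
        names[oid] = data[pos:pos + length].decode("utf-8")
        pos += length
    id_sets = {}
    for _ in range(read_u32()):
        pixel_value = read_u32()
        count = read_u32()
        id_sets[pixel_value] = [read_u32() for _ in range(count)]
    return id_sets, names
```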

Regarding your "nitpicking": you are absolutely correct, the scheme is limited
by the number of unique per-pixel object sets, not by the number of pixels
in the image. However, there cannot be more unique object sets than pixels,
so the scheme is guaranteed to work for any image with fewer than 2^32 pixels.
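
The argument can be illustrated with a short sketch that assigns one integer to each distinct object set. The function name and sample data are hypothetical. Because pixels covered by the same set share a value, the number of values in use can never exceed the number of pixels.

```python
def assign_pixel_values(object_sets_per_pixel):
    """Give each distinct object set one integer; identical sets share it."""
    value_for_set = {}
    channel = []
    for objects in object_sets_per_pixel:
        key = frozenset(objects)
        if key not in value_for_set:
            value_for_set[key] = len(value_for_set)  # next unused integer
        channel.append(value_for_set[key])
    return channel, value_for_set

pixels = [["a"], ["a", "b"], ["b", "a"], ["a"], []]
channel, table = assign_pixel_values(pixels)
print(channel)     # [0, 1, 1, 0, 2]
print(len(table))  # 3
```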

Florian

Ger Hobbelt

2011-04-15 15:44:28 UTC

Post by Florian Kainz
In the file, the objectID attribute will store the mapping from pixel values
to lists of strings in a compact form, without any indexing structure. The
data will be expanded into a form that can be queried quickly during reading,
and packed tightly during writing. The in-memory form of the attribute will
probably consist of two STL maps (which are usually implemented as RB trees
or AVL trees).

I can see why you'd want to do it that way (and it's far more in line with
how OpenEXR 'does' EXR files right now than my 'direct access' thought), but
that approach is much less useful for me as it's no big improvement over
what I have currently (one channel with an int per pixel plus an external
direct-access (mapped) store). Alas.

If you're going to store the strings in compacted form, I guess applying
zlib to them is the easiest way to get a nice size reduction. (I've played
with suffix trees and such for 'optimal' storage of strings before (kinda
delta compression on string data), but it's a lot of hassle and little gain
over feeding zlib an ordered stream of string+meta data (index number or
other) to compress, particularly when you're doing a transform (to STL maps)
on write and read anyway. And since we already have a zlib dependency...)

Thanks for giving me the idea to store more than just a few bytes in
attributes in an EXR though, i.e. treating an attribute as a BLOB store,
where the blob might even be larger than the image data itself. That hadn't
crossed my mind before. Sometimes the simplest things... :-)
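
A minimal sketch of the zlib suggestion using Python's standard zlib module. The tab-separated "index, name" record format and the sample names are assumptions. Sorting by name places shared prefixes next to each other, which is the kind of redundancy zlib exploits; for inputs as tiny as this one the compressed form may not be smaller.

```python
import zlib

# Hypothetical object names with their integer identifiers.
names = {12: "Beans/dress/collar", 11: "Beans/dress/button3", 10: "Background/sky"}

# Ordered stream: one "id<TAB>name" record per line, sorted by name.
lines = ["%d\t%s" % (oid, name)
         for oid, name in sorted(names.items(), key=lambda kv: kv[1])]
stream = "\n".join(lines).encode("utf-8")
compressed = zlib.compress(stream, 9)

# Reading: decompress and rebuild the map.
restored = {}
for line in zlib.decompress(compressed).decode("utf-8").split("\n"):
    oid, name = line.split("\t", 1)
    restored[int(oid)] = name
```
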

Kevin Wheatley

2011-04-15 09:28:05 UTC

Post by Florian Kainz
At ILM we want to implement a workflow where a computer graphics artist
can bring up an OpenEXR image of, say, a scene from Rango on his or her
screen, point to a pixel, and find out that the object seen at that
pixel is called "Beans/dress/button3."

How would you deal with composited pixels which are built from several
different objects?

We've attempted to use OID passes before and ran into this exact
problem: motion blur, transparency, etc. all got in the way of being
able to use simple per-pixel data like this, so we're looking into a
more 'deep' pixel implementation. We've embedded only IDs and use
other 'databases' to store human-meaningful data.

Kevin