Lossless Codec FFV1 for preservation @arkthis180

ArkThis AV-RD Sounds like an election campaign ;)

There's even the hashtag "#FFWeWon". Funny.
There's a whole worth-a-documentary-about-it story behind this open-source lossless video compression format.

The presentation gives insight into "what is FFV1", and how and why it is useful for long-term preservation and usage of digital video.

*This video was originally recorded for a workshop for professional film-archivists.*

Big thanks to the Cineteca Chile for hosting the workshops, and providing the Spanish subtitles! 😄️🌟️🌈️

Contents:

* A short history about almost everything
* FFV1 what does it do?
* What is "lossless"?
* Funded improvements (FOSS-license)
* Error detection and concealment
* Performance comparison (J2K, H264, Dirac)
* Size estimations (mostly 8 bpc)
* Hardware and Speed
* Which container(s) to use?
* Introduction MKV (Matroska) container format
* Showing off BFI stats for moving to FFV1/MKV
* Format normalization / Reference to Whitelist approach
* FFV1 costs
* RAWcooked (for DPX/TIFF+WAV film needs)

Lossless Codec FFV1 for preservation

ArkThis AV-RD 2024-06-06 | Sounds like an election campaign ;)

AHAObjectWorld: Grundidee (2023)

ArkThis AV-RD 2024-09-04 | This was my first try to capture the concept and idea of resolving the separation of "metadata and payload" (aka "meta + data") on the filesystem level.

It describes in an overview how this would instantly (positively) affect file/data handling - and resolve many existing "issues/challenges" that come with the current "files-in-folders and metadata stored somewhere else" paradigms.

It's in German, because it was intended as a "braindump note" to exchange these ideas and concepts with a good friend of mine.

I am assuming that "Object Storage" is able to save any kind of key/value data as-is generically.

At the time of this recording, I had never used or set up an Object Storage before. However, more than 1 year later, the concept still stands very well.

The following remarks may be added:

a) "S3-compatible" systems are (IMO) currently unsuitable for awesome meta+data handling as described here, due to arbitrary limitations of metadata/tagging length and encoding.

b) Object Storage systems designed and implemented before Amazon S3 are more likely to support this properly.
Something like [OpenStack Swift](ubuntu.com/openstack), etc.
The ANSI T10 Object-Based Storage standard may also be a good reference:
ieeexplore.ieee.org/document/5388616

c) There is no need to go "object storage" to provide the core features that come with meta+data on the filesystem level: Extended File Attributes (or similar filesystem features) are very likely sufficient.
See: en.wikipedia.org/wiki/Extended_file_attributes

AHAObjectWorld S01E10: Performance Concerns 1 - Metadata Size

ArkThis AV-RD 2024-09-04 | This video addresses concerns regarding data-size when storing metadata on the filesystem level, compared to where it is commonly stored today: in textfiles, spreadsheets, databases, or embedded in the files themselves.

Here's an overview:

* Metadata is usually way smaller than what we call "data" (payload) right now.
* Most descriptive annotation metadata is text-based, so we're talking 1-2 bytes per text character. That currently adds up to kilobytes or megabytes.
* I therefore say the added data-size of metadata - even as uncompressed plaintext - is negligible compared to the gain in handling.
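
To put rough numbers on this, here is a small back-of-the-envelope calculation in shell arithmetic. The three input figures (10,000 characters of metadata, 2 bytes per character, a 1 GiB payload) are made-up assumptions for illustration, not measurements from the video.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope: how big is text metadata compared to its payload?
# All three input numbers are illustrative assumptions.

chars=10000                            # characters of descriptive metadata
bytes_per_char=2                       # upper end of the 1-2 bytes estimate
payload_bytes=$((1024 * 1024 * 1024))  # a 1 GiB video file

meta_bytes=$((chars * bytes_per_char))

# Integer share of the payload, in parts per million (rounded down).
ppm=$((meta_bytes * 1000000 / payload_bytes))

echo "metadata: ${meta_bytes} bytes"
echo "payload:  ${payload_bytes} bytes"
echo "share:    ${ppm} ppm of the payload"
```

With these assumed numbers the metadata comes to 20,000 bytes, or about 18 parts per million of the payload.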

AHAObjectWorld S01E07: Data Relationships

ArkThis AV-RD 2024-09-04 | aka "Related Objects".

If a filesystem is able to store data objects with "meta+data", it is trivial to support link/reference information between any of these objects.

This makes it possible to depict and handle any related-object scenario: from "this image is the cover art of this musical album, which consists of the following audio tracks" - to having meta-only data objects which serve as stand-in replacements for database entries.

AHAObjectWorld S01E06: Filesystem as Database Engine

ArkThis AV-RD 2024-09-04 | ...

AHAObjectWorld S01E05: Resolving Media Container Formats

ArkThis AV-RD 2024-09-04 | Any "container format" for files exists, because mixed-type information is supposed to feel like one thing.

Imagine a videofile:
* video track(s)
* audio track(s)
* subtitle(s)
* descriptive metadata
* etc.

If you consider each of these components "a Data Object" - and are already able to think of and store data like that: What if you resolved any container format into a set of "linked, related data objects"?

Example:
If you want to use, change or add an audio track, you currently need to (de)multiplex the container format. That takes special applications, time and know-how.

FFmpeg is awesome, but imagine this:
Adding a new audio track is simply drag-n-drop of an audio "object", and creating a relationship.

And so on...

AHAObjectWorld S01E04: Resolving Embedded Metadata

ArkThis AV-RD 2024-09-04 | If we can store key/value (meta)data information on the filesystem level, and even depict related objects natively on that same level:

What if one simply copy/pasted any embedded metadata out of a file-format, onto the filesystem layer? As easily accessible/editable and transparent as the filename?

AHAObjectWorld S01E03: Files and Foldernames

ArkThis AV-RD 2024-09-04 | The concept of digital data being "a file in a folder" is just an idea.
It's an imaginary paradigm that was invented in the 60s - when the main use case for computing was representing office data. Which literally were files in folders.

Things have changed since then.

What if a filename was merely "yet another (text) metadata field"?
And the foldername too was "just another metadata field".
And not even mandatory information to save and access any data?

What if tagging and (auto-)generated metadata would suffice to work with data?
This is nothing new for "cloud service" users - but these features are technically neither specific to "the cloud" - nor to "online or offline".

Archives Unmasked (S01E02): Embedded Metadata - An Introduction (Show-n-Tell)

ArkThis AV-RD 2024-09-04 | This is a short introduction to "what is embedded metadata?",
in contrast and comparison to filename, database and sidecar (textfile) metadata.

It shows:
* Embedded metadata for an audio file (mp3)
* How the same file looks different in different applications
* How embedded metadata looks natively (raw): in a hex-editor.
* That editing embedded metadata changes the whole file (and timestamp)
* etc.

At the end it also shows how "Extended File Attributes" (xattr) of the file-system can be used to store key/value metadata literally "where the filename is".

en.wikipedia.org/wiki/Extended_file_attributes
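
As a rough idea of what this looks like on the commandline, here is a minimal sketch for Linux. It assumes the `setfattr`/`getfattr` tools are installed (Debian/Ubuntu package `attr`) and that the filesystem has user attributes enabled. The helper names `meta_name`, `meta_set` and `meta_get` are made up for this example and are not taken from the video.

```shell
#!/usr/bin/env bash
# Sketch: key/value metadata in Extended File Attributes on Linux.
# Needs setfattr/getfattr and a filesystem with user xattr support.
# meta_name, meta_set and meta_get are hypothetical helper names.

# Regular users may only write attributes in the "user." namespace.
meta_name() { printf 'user.%s' "$1"; }

# meta_set FILE KEY VALUE
meta_set() { setfattr -n "$(meta_name "$2")" -v "$3" "$1"; }

# meta_get FILE KEY  (prints the bare value, without the attribute name)
meta_get() { getfattr --only-values -n "$(meta_name "$2")" "$1"; }
```

For example, `meta_set song.mp3 title "My Song"` stores the value and `meta_get song.mp3 title` reads it back. `getfattr -d song.mp3` lists all user attributes of the file.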

Archives Unmasked (S01E03): Embedded Metadata - Different Formats, Different Libraries

ArkThis AV-RD 2024-09-04 | This highlights a few facts good to know about "embedded metadata":

* Which metadata fields are available?
* Same metadata field, but different format: Completely different code required.
* Reading/writing embedded metadata requires extra effort for each new format: in any application/device handling the files.

ArkThis ShowNTell (S01E01): Optical Carrier Ingest (CD/DVD) using `dvdisaster`

ArkThis AV-RD 2024-06-11 | After watching this clip, you'll be able to rip data CDs, DVDs and (possibly) even BluRay in no time.

I'm introducing the tool "dvdisaster" by Carsten Gnörlich (DE) - from 2004:
en.wikipedia.org/wiki/Dvdisaster

It's a great example of how valuable a good tool is, and how easy it is to read data from optical disks into an ISO image file.

Archives Unmasked (S01E01): Optical Carrier (CD/DVD) ingest & station setup (Show-n-Tell)

ArkThis AV-RD 2024-06-09 | After watching this clip, you'll be able to rip data CDs, DVDs and (possibly) even BluRay in no time.

I'm introducing the tool "dvdisaster" by Carsten Gnörlich (DE) - from 2004:
It's a great example of how valuable a good tool is. And how FOSS allows even software to age well.

This video is dedicated to the team of OSCAL (Open Source Conference ALbania)
oscal.openlabs.cc/about

# Hardware Setup

Additionally, I'm doing this live:
Booting an LTS (Long-Term-Support) live-distro Xubuntu (2018).

All you need is the following:

* an empty notebook from 2010 (~1 GHz, 2 GB RAM, 300 EUR new)
* 1 (or more) external USB DVD drives
* 1 ventoy USB boot stick (32 GB)
* 1 target storage USB stick (64 GB, exfat)

# Contents

* Disable auto-mount/auto-open of inserted removable media.
It's simply distracting (me).

* Install support for exfat filesystem and dvdisaster (in one command):
`$ sudo apt install exfat-fuse dvdisaster`

* Read original "backup data" CD-R (from 1998)

* Interrupt ISO file extraction and continue reading on another DVD drive.
Just to show off.
And to show that drives can be swapped, as an option to maybe recover from bad-sector read errors.

* How the extracted .iso image (ISO 9660 format) can simply be opened/accessed in the default file manager by double-clicking it, like a zip file.
It simply automagically mounts the iso file in the background.
You could also do it manually, if you like (this needs root, and the mount point must already exist):

`$ sudo mount -o loop image.iso /mnt/cdrom`

# Commands / Instructions

Open a Shell (`Ctrl+Alt+T` / `Super+T`), then run the following code:
`$ sudo apt install exfat-fuse dvdisaster`

This installs exfat filesystem support (in FUSE) and installs dvdisaster.
After that, you can simply open dvdisaster from the applications menu or the commandline.

# Links

[Wikipedia: Dvdisaster](en.wikipedia.org/wiki/Dvdisaster)
[Ventoy multi-ISO boot](ventoy.net/en/index.html)
[Xubuntu GNU/Linux Distribution](xubuntu.org)

# Why the voice?

I like switching personas when I talk or present, as this is my way of honoring the amazing differences in style, sound - and especially emotions. This is how I am. This is how I perceive and understand things. And I like learning by having fun. And if theater is fun, why not apply that to howto clips or presentations?

Why not comedy-sketch-enjoy our professional work?
Why not make workshops and videos fun to make and watch?

# Why this flavor of culture?

I have close friends in eastern, especially Balkan countries - and I looove their way of life, and admire their practical approach to everyday life "challenges". Kudos, greetings and thanks to Slivki, Marija, Anton, Livia and others.

So if anyone should feel offended in any way:
I am truly sorry.

If there's anything I could adjust in my "improv" on that character, please let me know.

This is for fun, and edutainment purposes.
However, this /is/ the way I think and present. I value cultural impressions, styles and ways of thinking - improvising them is literally how I am. If you consider this offensive, you consider me "not being allowed to be me".

I seriously simply am a big fan of this eastern cool smart style.
And I honored that by creating this tutorial as a play.
And I find the one-eyed glasses a fun accessory.
Things don't always have to make sense.

It's okay every once in a while to just laugh about it.
And know how to extract data from CDs/DVDs - and maybe save some precious personal memory-files - after "enjoying the show".

And all that using tools you may have been (wrongly) told to be unprofessional - on (old) hardware possibly lying around somewhere.

# Why the glasses?

Because I think pirates look cool. 😎️

And I liked the idea of producing highly professionally valuable content, with one-eye-pirate-patch sunglasses - and small plastic sparkling crystals on the edge.
(And honestly, I wasn't ready yet to let my old broken sunglasses go...)

I love it.
Have fun!

Format Normalization: A Whitelist Approach

ArkThis AV-RD 2024-06-06 | Format Normalization = Change the data encoding formats to reduce the number of code/format variations.

This is very useful: it (a) makes your life easier, and (b) serves you well if you want to store and use (AV) data long-term.

*This video was originally recorded for a workshop for professional film-archivists.*

Big thanks to the Cineteca Chile for hosting the workshops, and providing the Spanish subtitles! 😄️🌟️🌈️

*Contents:*

* Reasons/intro to "why change a format?"
* Pros/cons of format normalization
* What "whitelist" - and "why"
* What to whitelist? How does a format "qualify"?
* With focus on long-term preservation considerations
* Avoiding generation loss:
* Uncompressed
* Lossless compression (FFV1, J2K)
* Why Audio to PCM - always?
* Practical format examples
* Examples/discussions on other popular formats

Big thanks again to FFmpeg: Nothing possible without all of you.

FFmpeg: Show-n-Tell (Digital Long Term Preservation workshop)

ArkThis AV-RD 2024-06-06 | *This video was originally recorded for a workshop for professional film-archivists.*

Big thanks to the Cineteca Chile for hosting the workshops, and providing the Spanish subtitles! 😄️🌟️🌈️

*Contents:*

* Convert image sequence + audiofiles into video format (mp4/m4v, mkv, mov, etc)
* Change container format, without re-encoding
* Change audio/video encoding/properties separately, and remux
* Show tech-metadata in VLC
* Change DAR (Display Aspect Ratio)
* Cut out segments/clips (seek, trim, etc)

Update:

I've recently tried *generating* DPX and TIFF images from video files. Works perfectly, and up to more than 8 bits per color component.
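
Three of the recipes above can be sketched as small shell functions. This is a hedged sketch, not the exact commands from the video: the function names, file names and the `DRY_RUN` switch are assumptions made for this example, while the ffmpeg options themselves (`-c copy`, `-ss`, `-t`) are standard.

```shell
#!/usr/bin/env bash
# Sketch of three recipes from the list above. File names are placeholders.
# Set DRY_RUN=1 to print each command instead of running it.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Change the container without re-encoding (stream copy): rewrap IN OUT
# Note: by default ffmpeg picks one stream per type (video, audio, subtitle).
rewrap() { run ffmpeg -i "$1" -c copy "$2"; }

# Cut a clip without re-encoding: cut_clip IN START DURATION OUT
# With stream copy, the cut can only start on a keyframe.
cut_clip() { run ffmpeg -ss "$2" -i "$1" -t "$3" -c copy "$4"; }

# Export every frame as a numbered TIFF image: to_tiff IN OUTDIR
# OUTDIR must already exist. ffmpeg chooses a pixel format the TIFF
# encoder supports.
to_tiff() { run ffmpeg -i "$1" "$2/frame_%06d.tiff"; }
```

For example, `DRY_RUN=1 cut_clip input.mkv 00:01:00 30 clip.mkv` only prints the ffmpeg call, which is handy for checking a recipe before letting it loose on real files.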

Checking AV Properties - Basics

ArkThis AV-RD 2024-06-06 | This presentation gives some insights into technical properties of a digital audiovisual file - and how to "check" or validate them - or simply "know what you're at" with audiovisual files :)

So-called "video-", "movie-" or audio-files:
mkv, mov, avi, flv, 3gp, mp4, mp3, mp2, wav - etc.

*This video was originally recorded for a workshop for professional film-archivists.*

Big thanks to the Cineteca Chile for hosting the workshops, and providing the Spanish subtitles! 😄️🌟️🌈️

*Contents:*

* Digital Video Trinity:
Container, Video-encoding, Audio-encoding
* Multimedia Container Structure
* Using VLC for tech-inspection: show encoding, resolution, framerate, etc.
* Using MediaInfo to deep-inspect AV files (and some images)
* Machine-processing tech-metadata output by code
* Quick comparison of XML, JSON, CSV
* 09:18 Show-n-tell with actual files.
(That have been synthetically generated by bash+ffmpeg scripts, to show typical and interesting variations)
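
For the machine-processing part, here is a minimal sketch. The video uses MediaInfo. This example swaps in `ffprobe` (which ships with FFmpeg), because its options are the ones this sketch can vouch for. The helper name `probe` is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: machine-readable technical metadata via ffprobe.
# probe is a hypothetical helper name.

# probe FORMAT FILE  -> FORMAT is one of: json, xml, csv
probe() {
  case "$1" in
    json|xml|csv) ;;
    *) echo "unsupported format: $1" >&2; return 2 ;;
  esac
  # -v error keeps log noise out of the output,
  # -show_format/-show_streams print container and per-track properties.
  ffprobe -v error -show_format -show_streams -of "$1" "$2"
}
```

`probe json video.mkv > video.json` would write the container and stream properties as JSON, ready to be read by a script. The same call with `xml` or `csv` makes it easy to compare the three formats side by side.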

FFmpeg: Basics on the swiss army knife for AV cross conversion

ArkThis AV-RD 2024-06-06 | This presentation introduces FFmpeg:
en.wikipedia.org/wiki/FFmpeg

An invaluable tool that drives the whole 21st century audio-video demands of the whole world - including "The Industry". This is one of the many reasons why it is great to know some commandline interface (CLI) "magic spells and recipes".

Here I'll show some use of FFmpeg for DLTP (Digital Long Term Preservation) use cases.

*Contents:*

* 04:49 Digital Video Trinity
* 06:00 Basic Syntax
* 11:16 Rewrap/change multimedia container format
* 11:55 Reading ffmpeg's text output
* 13:45 handling A/V separately
* 15:00 Setting encoding format
* Transcode "anything" to FFV1/PCM (24/16 bits)
* Change resolution, aspect ratio, framerate, pixel-format, subsampling, bit-depth, anything!
* Create short cut-out clips without re-encoding
* Encoding: x264/H.264
* ProRes
* FFV1
* Convert Video to image+wav (like film)
* And vice versa: From `*.png/dpx/tiff + wav` to `output.mkv`

It does NOT include how to install ffmpeg, though.
On Debian/Ubuntu-style Linux, this is done like this:

`$ sudo apt install ffmpeg`

For Windows and Mac it's hard(er). Wouldn't have to be.
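
Two of the recipes from the contents list, sketched as shell functions. Function names, file names and the `DRY_RUN` switch are assumptions made for this example. `-level 3` selects FFV1 version 3, and `pcm_s24le` is 24-bit PCM.

```shell
#!/usr/bin/env bash
# Sketch of two recipes from the contents list. File names, the frame
# pattern and the frame rate are placeholders.
# Set DRY_RUN=1 to print each command instead of running it.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Transcode video to FFV1 (version 3) and audio to 24-bit PCM: to_ffv1 IN OUT
# Use an output name ending in .mkv to get a Matroska container.
to_ffv1() { run ffmpeg -i "$1" -c:v ffv1 -level 3 -c:a pcm_s24le "$2"; }

# Image sequence plus WAV into one file: seq_to_mkv PATTERN WAV FPS OUT
# PATTERN is a printf-style name such as frame_%06d.dpx
# The WAV audio is copied as-is.
seq_to_mkv() {
  run ffmpeg -framerate "$3" -i "$1" -i "$2" -c:v ffv1 -level 3 -c:a copy "$4"
}
```

Example: `seq_to_mkv frame_%06d.dpx audio.wav 24 output.mkv`.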

File Format Normalization: FFV1/PCM & RAWcooked

ArkThis AV-RD 2024-06-05 | This is a show-and-tell regarding changing audiovisual file-formats for (possibly) better long-term preservation properties.

I'll show how to use `ffmpeg` and some simple commandline-magic to easily convert any video encoding to FFV1 - a lossless-only (!) video codec - and audio to uncompressed PCM, and change the container format too: all in one step.

At 16m00s, I'll introduce "[RAWcooked](mediaarea.net/RAWcooked)":

A tool designed to (mainly) convert digital film (DPX/TIFF + WAV) to FFV1 in MKV (Matroska).

*This video was originally recorded for a workshop for professional film-archivists.*

Big thanks to the Cineteca Chile for hosting the workshops, and providing the Spanish subtitles! 😄️🌟️🌈️

*Contents:*

* Using ffmpeg in a loop (bash)
* RAWcooked to convert digital film
* Show bit-proof reversibility of RAWcooked MKVs (using MD5)
* Additional information about RAWcooked
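
The loop and the MD5 check can be sketched like this. It is a hedged example, not the script from the video: the helper names and the `_ffv1.mkv` suffix are assumptions, and `md5sum`, `sort -z` and `xargs -r` assume GNU tools as found on a typical Linux system.

```shell
#!/usr/bin/env bash
# Sketch: batch-convert with a loop, then compare two folders by MD5.
# ffv1_name, convert_all, tree_md5 and same_tree are hypothetical names.

# Output name for an input file: clip.mov -> clip_ffv1.mkv
ffv1_name() { printf '%s_ffv1.mkv' "${1%.*}"; }

# Convert every file given as argument to FFV1 + 24-bit PCM in Matroska.
convert_all() {
  local f
  for f in "$@"; do
    ffmpeg -i "$f" -c:v ffv1 -level 3 -c:a pcm_s24le "$(ffv1_name "$f")"
  done
}

# MD5 list of all files below a folder, with paths relative to that folder.
tree_md5() {
  (cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r md5sum)
}

# Succeeds only if both folders hold the same files with the same content.
same_tree() { [ "$(tree_md5 "$1")" = "$(tree_md5 "$2")" ]; }
```

`convert_all *.mov` converts a whole folder in one go. After a RAWcooked round trip (image folder to MKV, and MKV back to an image folder), `same_tree original/ restored/` should succeed if the conversion really is reversible. The folder names here are placeholders.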

Digital Audiovisual Properties - Advanced

ArkThis AV-RD 2024-06-04 | A more in-depth look at more advanced technical properties of A/V.

*This video was originally recorded for a workshop for professional film-archivists.*

Big thanks to the Cineteca Chile for hosting the workshops, and providing the Spanish subtitles! 😄️🌟️🌈️

*Contents:*

* Image Aspect Ratio (DAR, SAR, PAR)
* Anamorphic Video
* Interlacing/Progressive, field-order (TFF, BFF)
* GOP: Group Of Pictures
Keyframes/Intra-Frames, Inter-Frames (I,P,B)
* Color models: RGB vs YUV
* Shades of Gray = bit-depth (bits-per-component/pixel/sample)
* Chroma subsampling
+ Relation to filesize

Digital Audiovisual Properties - Basics

ArkThis AV-RD 2024-06-04 | An easy introduction into the basic technical properties of digital A/V.

*This video was originally recorded for a workshop for professional film-archivists.*

Big thanks to the Cineteca Chile for hosting the workshops, and providing the Spanish subtitles! 😄️🌟️🌈️

*Contents:*

* Image resolution
* Common standard/norm resolutions
* PAL, NTSC, FILM
* Framerate (fps)
* Constant Framerate (CFR) vs Variable Framerate (VFR)
* Audio: Samplerate, bit-depth, tracks and channels
* Linear PCM (uncompressed)
* Minimum samplerate (Nyquist-Shannon sampling theorem)
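
Two of the audio points above as shell arithmetic. This is a small sketch with made-up helper names. The formulas themselves are the standard ones: the sampling theorem's lower bound, and samplerate × bit-depth × channels for the PCM data rate.

```shell
#!/usr/bin/env bash
# Two audio rules of thumb as shell arithmetic.
# min_samplerate and pcm_bytes_per_sec are hypothetical helper names.

# Nyquist-Shannon: the sample rate must be more than twice the highest
# frequency to capture. This prints that lower bound, 2 * f_max (in Hz).
min_samplerate() { echo $(( 2 * $1 )); }

# Linear PCM data rate in bytes per second:
# samplerate * bit-depth * channels / 8
pcm_bytes_per_sec() { echo $(( $1 * $2 * $3 / 8 )); }

min_samplerate 20000           # 40000
pcm_bytes_per_sec 48000 24 2   # 288000
pcm_bytes_per_sec 44100 16 2   # 176400
```

So capturing frequencies up to 20 kHz needs a sample rate above 40 kHz, which is why 44.1 kHz and 48 kHz are such common values.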

Digital Audiovisual File Formats: Quality and Size

ArkThis AV-RD 2024-06-04 | More in-depth information about the relationships of digital audiovisual file-formats, and encoding decisions/parameters when it comes to "quality and size".

*This video was originally recorded for a workshop for professional film-archivists.*

Big thanks to the Cineteca Chile for hosting the workshops, and providing the Spanish subtitles! 😄️

*Contents:*

* bit-rate: constant (CBR), variable (VBR)
* lossy, lossless, uncompressed
* digital generation loss
* performance (and trade-offs)
* (typical) format examples for: preservation, mezzanine and access use-cases
* tips for keeping the "best" quality (or at least not make it worse ;))

Quick show-n-tell: Using Unix file command to identify file-formats

ArkThis AV-RD 2024-05-21 | This is my very first quick-n-dirty show and tell evening recording.

I'm showing how to identify the filetype (and some basic technical properties, like resolution, encoding, etc.) using the good old Unix "file" command, which has been installed by default on every GNU/Linux distribution I've encountered so far.

Documentation is easily available on any command-line (CLI) terminal interface, by typing:

`file --help` or `man file`

AFAIK it also exists by default on macOS (see: ss64.com/mac/file.html)

This is a great tool to use in DLTP (Digital Long Term Preservation) use-cases. The clip also shows that default Linux GUI file-browsers (Thunar/XFCE4 in my setup here) are able to identify file types on their own - and show them with a preview image or the proper associated-application icon.
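
A minimal sketch of how the `file` command can be used from a script. The helper names `describe` and `mime_of` are made up for this example. `--brief` and `--mime-type` are regular options of `file`.

```shell
#!/usr/bin/env bash
# Sketch: identify files with the Unix "file" command.
# describe and mime_of are hypothetical helper names.

# Human-readable description, without the leading file name.
describe() { file --brief "$1"; }

# MIME type only, e.g. "text/plain" - handy for scripts.
mime_of() { file --brief --mime-type "$1"; }
```

For example, `mime_of notes.txt` prints `text/plain` for an ordinary text file. The file name is a placeholder.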

AHAObjectWorld S01E02 - Meta WITH Data

ArkThis AV-RD 2024-05-16 | What if Object Storages become so interoperable, they support:

* guaranteed consistency and linkage of meta+data as Object
* metadata: bit-proof in=out storage for key=value
* metadata: unicode if read as text
* arbitrary separation of "meta and data" given simply by their byte-size (for multi-tier storing)

By "payload" I mean "the heavy stuff". What we now call a file - stripped of its (meta)data header. That's the whole point ;)

At that moment, I was not aware of the unreasonably limiting (why??! 😭️) charset and amount restrictions on S3.

AHAObjectWorld S01E01 - Introduction

ArkThis AV-RD 2024-05-16 | This is for everyone who:

* has digital data
* somewhere
* and wants to save/search/find/use it

Imagine this:
You have a folder, a text-document, a song, a video and a bunch of photos, as digital "files". Somewhere in folders.

Do you always know "where" you put your files?
Do your kids have that concept even? A "folder location" - a file name?

Looking into the possibilities of using Object Storage implementations to finally store "meta AND data" as Data Objects. File-and-folder names become mere metadata strings. Full unicode support - and even bit-proof (*).

Simply imagine, you could "drag-n-drop-link-and-tag" any "file" (=object) on your storage to any other "file" (=Object) on your storage - and have any kind of "key=value" information along with it.

Regardless of its file format.
This concept even challenges the necessity of the following "best-and-common practices":

* de-embedding all metadata onto filesystem level. plaintext.
* using spreadsheet files to organize your actual files.
* requiring a database to annotate your data. Especially with relationships involved.

Existing Object Storages provide even more: version-control, failover, etc...

This may be really great for all GLAM and memory institutions doing digital long-term preservation (DLTP)?

(*) Despite some current limitations of S3 regarding the size, charset and amount of metadata or tag entries in an Object Storage, I'm convinced that allowing full bit-proof storage if binary, and unicode if text, cannot be that much of a problem.

AHAObjectWorld S01E08 - Object IDs and Linked Open Data (LOD)

ArkThis AV-RD 2024-05-16 | Letting go of "a file and foldername and position" in favor of "data object with file-and-foldername as 2 separate key=value metadata pairs".

How nice.

A data object is now referred to by its "Object Identifier" (ID). This clip includes an introduction to the idea of "collision-friendly IDs" - or at least a human-machine-readable ID schema:

github.com/ArkThis/AHA_ObjectWorld/blob/master/raw/plans/AHA-P9-Collision_Friendly_Identifiers-Intro.md

That CFI Identifier scheme actually allows a simple, even filename-safe ID string that may serve as "distinct-enough" network-cloud identifier. While staying perfectly human-readable.

And by the way, auto-collide "common" objects, merging their metadata: Therefore organically, shroom-cleaning such object-datasets.

This is a different kind/technique of de-duplication.

Since each Object can easily hold multiple key=value pairs holding IDs - very similar to WikiData - this supports Linked Open Data (LOD) by design, as a built-in option.

Any existing dataset may therefore be referenced by caching (= copying) its metadata/IDs as Object Metadata. Object-collision cases may easily be fine-tuned, using common data-merging options - configured in profiles.

I believe this to be quite simple, at least do-able, actually.
If not already existing.

AHAObjectWorld S01E09 - Filesystem Metadata Structure

ArkThis AV-RD 2024-05-16 | Having only key=value pairs - flat, without hierarchy - how to structure all kinds of meta/data in ways that make access to metadata suitable for daily search/retrieval use?

This includes considerations and experiences from working professionally with mass/automated metadata import/export, mapping and designing layouts and standards.

It's a mess.
But there's hope.

A BMP, PNG and TIFF meet on a filesystem...

ArkThis AV-RD 2024-05-15 | This is a short clip introducing the differences between conventional "files-in-folders" and file-format paradigms when it comes to accessing and handling metadata.

#AHAlodeck

(Some) Understanding (of) Digital Audiovisual File Formats

ArkThis AV-RD 2022-12-20 | A short introduction about "what is a digital video file".
(Note: There's a small issue with audio. I'm sorry)

*This video was originally recorded for a workshop for professional film-archivists.*

Big thanks to the Cineteca Chile for hosting the workshops, and providing the Mexican-Spanish subtitles! 😄️🌟️🌈️

*Contents:*

* Digital Video Trinity:
Container, video-encoding, audio-encoding
* What is a container?
* What is a codec?
* Format naming
* How to look "inside" a video-file to check its technical properties?
(VLC, MediaInfo)