Sometimes you think you know a lot and then…
This was that kind of story. After years of working with different data formats, I thought I knew them quite well. However, in his webinar “Data Formats”, Vladimir Alekseichenko presented the topic from a Machine Learning perspective. Admittedly, the webinar was to some extent a revision for me, but thanks to the lecturer I became more familiar with storage layout models and with some new file types and data formats used in ML and Big Data, including Parquet and Feather. The presentation also covered important issues such as data portability and benchmarks focusing on speed or file size. It is a good starting point for further study of this topic.