Data file formats define standard ways of storing information in a file or database. We need different file formats for different use cases. For example, if we know that only Python systems are going to read our file, then we can choose the Pickle format, which is optimized for serializing Python objects. CSV has been the most widely used option for data storage. Using CSV, we can read from and write to most data software. However, there is no schema attached, there is no standard way to handle control characters, and it is not the best way to deal with complex data. In this article, we will discuss the evolving ways of the data file format:

Parquet: Parquet is one of the most common data storage formats for Big Data because it is very fast for column-oriented reads. It also handles the data types commonly used by Pandas, including multi-index data frames. It is optimized to work with complex data in bulk and offers several efficient compression and encoding schemes. It is mostly used as a data warehouse or data lak...
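To make the point about CSV having no schema more concrete, here is a minimal sketch using only the Python standard library. The sample records and the variable names (`rows`, `csv_rows`, `pickle_rows`) are made up for illustration and are not from any real dataset. It writes the same records through a CSV round trip and a Pickle round trip and compares what comes back.

```python
import csv
import io
import pickle

# Hypothetical sample records with three different Python types.
rows = [
    {"id": 1, "price": 9.5, "active": True},
    {"id": 2, "price": 12.0, "active": False},
]

# CSV round trip: write to an in-memory text buffer, then read it back.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "price", "active"])
writer.writeheader()
writer.writerows(rows)
buffer.seek(0)
csv_rows = list(csv.DictReader(buffer))

# Pickle round trip: serialize to bytes, then deserialize.
pickle_rows = pickle.loads(pickle.dumps(rows))

# CSV carries no type information, so every value comes back as a string.
print(type(csv_rows[0]["id"]).__name__)     # str
# Pickle restores the original Python types.
print(type(pickle_rows[0]["id"]).__name__)  # int
```

After the CSV round trip the reader has to know, from somewhere outside the file, that `id` is an integer and `active` is a boolean. Pickle keeps that information but can only be read reliably from Python, and it should only be loaded from trusted sources.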