Apache Hive CSV SerDe example
I’m going to show you a neat way to work with CSV files and Apache Hive. Usually, you’d have to do some preparatory work on CSV data before you can consume it with Hive, but I’d like to show you a built-in SerDe (Serializer/Deserializer) for Hive that makes it a lot more convenient to work with CSV. This work was merged in Hive 0.14, and there are no additional steps necessary to work with CSV from Hive.

Suppose you have a CSV file with the following entries:

    id  first_name  last_name  email              gender  ip_address
    1   James       Coleman    [email protected]  Male    136.90.241.52
    2   Lillian     Lawrence   [email protected]  Female  101.177.15.130
    3   Theresa     Hall       [email protected]  Female  114.123.153.64
    4   Samuel      Tucker     [email protected]  Male    89.60.227.31
    5   Emily       Dixon      [email protected]  Female  119.92.21.19

To consume it from within Hive, you’ll need to upload it to HDFS:

    hdfs dfs -put sample.csv /tmp/serdes/

Now all it takes is to create a table schema on top o...