Position Details:
Big Data Developer
Job Duties:
Design, develop, configure, and implement high-performance data processing applications using the Big Data technology stack: Apache Hive, Sqoop, HDFS, MapReduce, Oozie, Apache Spark, Scala, Kafka, and Kerberos. Develop Hadoop MapReduce jobs in Java to process web log data on HDFS. Develop batch processing applications on the Hadoop platform using Apache Hive and Apache Spark in Scala. Develop ETL jobs using data integration tools such as Informatica to build Data Warehouse applications on Oracle. Perform performance tuning of Data Warehouse SQL queries using database indexes and join tuning. Develop Apache Sqoop scripts to pull data from Relational Database Management Systems and ingest it into the Hortonworks Hadoop cluster. Develop UNIX shell scripts to automate file transmission jobs in distributed environments. Develop jobs in Apache Spark using high-level Application Programming Interfaces such as DataFrames/Datasets and Spark SQL to process and aggregate data in the Hadoop cluster. Optimize Hive table design by applying partitioning and bucketing techniques to reduce Hive query execution time to sub-second. Develop Business Intelligence SQL queries and test them against low-latency analytical processing engines: Druid, Apache Phoenix, Kinetica, Brytlyt, Hive LLAP, Jethro, and Presto. Develop job orchestration scripts in Oozie to automate Hive, Sqoop, and Spark jobs on the Hortonworks Hadoop cluster. Perform System Integration testing and resolve defects arising from that testing. Work on production installation activities and perform code and data validation. Develop back-out plans in case of data or code issues during production installation. Provide post-installation support. Will work in Glastonbury, CT and/or at various client sites throughout the U.S. Must be willing to travel and/or relocate.