Saturday, July 16, 2011

Hadoop Performance Tuning (Hadoop-Hive)

Tuning a Hadoop cluster for performance is somewhat tedious, because the Hadoop framework uses every type of resource while processing and analyzing data. There is therefore no single static set of parameter values that performs well everywhere. Parameter values should be changed according to the following characteristics of the cluster:
  • Operating system
  • Processor and its number of cores
  • Memory (RAM)
  • Number of nodes in the cluster
  • Storage capacity of each node
  • Network bandwidth
  • Amount of input data
  • Number of jobs in the business logic

The recommended OS for Hadoop clusters is Linux, because Windows and other GUI-based operating systems run many GUI (graphical user interface) processes that occupy a large share of memory.

Each node should have at least 5GB of storage to spare after storing its share of the distributed HDFS input data. For example, if the input data is 1TB and the cluster has 1000 nodes, then (1024GB x 3 (replication factor)) / 1000 nodes = approximately 3GB of distributed data on each node, so it is recommended to have at least 8GB of storage on each node. The extra space is needed because each data node writes logs and needs some room for swapping memory.
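The storage arithmetic above can be written as a small Python sketch. The function name and its parameters are hypothetical, chosen only for illustration; the 5GB headroom and the replication factor of 3 are the figures from the example above.

```python
def storage_per_node_gb(input_gb, replication, nodes, headroom_gb=5):
    """Return (HDFS data per node, suggested minimum disk per node) in GB.

    headroom_gb is the spare space for logs and swap suggested in the post.
    """
    data_per_node = input_gb * replication / nodes
    return data_per_node, data_per_node + headroom_gb


# 1TB of input, replication factor 3, 1000 nodes (the example above)
per_node, suggested = storage_per_node_gb(1024, 3, 1000)
print(f"HDFS data per node: {per_node:.2f} GB")
print(f"Suggested minimum disk per node: {suggested:.2f} GB")
```

This is only a lower bound for the input data itself; intermediate and output data are not counted here.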


A network bandwidth of at least 100 Mbps is recommended. Hadoop is well known to move a lot of data over the network while processing and loading data into HDFS, so a lower-bandwidth link degrades the performance of the cluster.

The number of nodes required for a cluster depends on the amount of data to be processed and the capacity of each node. For example, a node with 2GB of memory and a 2-core processor can process 1GB of data in average time, and it can process 2 data blocks (of 256MB or 512MB) simultaneously. To process 5TB of data and produce results in a few minutes, it is recommended to have 1000 nodes, each with a 4-to-8-core processor and 8-to-10GB of memory.



Hadoop Parameters:

Data block size (Chunk size): 
        The dfs.block.size parameter is set in the hdfs-site.xml file, and its value is given in bytes. The block size should be chosen based on the memory capacity of each node: if memory is limited, set a smaller block size, because the TaskTracker brings a whole block of data into memory while processing. So for 512MB of RAM, it is advisable to set the block size to 64MB or 128MB. On a dual-core processor the TaskTracker can process 2 blocks at the same time, so two data blocks are brought into memory during processing. Plan the block size accordingly, and set the TaskTracker's concurrent-task parameter to match.
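Since dfs.block.size is given in bytes, a short Python sketch may help when converting sizes and checking them against node memory. The helper names and the `concurrent_tasks` argument are hypothetical; the memory estimate simply follows the post's assumption that each running task holds one whole block in memory.

```python
MB = 1024 * 1024


def block_size_bytes(block_mb):
    """Convert a block size in MB to the byte value used for dfs.block.size."""
    return block_mb * MB


def blocks_in_memory_mb(block_mb, concurrent_tasks):
    """Memory taken by blocks if each concurrent task holds one whole block
    (the assumption made in the post)."""
    return block_mb * concurrent_tasks


print(block_size_bytes(64))        # 67108864
print(block_size_bytes(128))       # 134217728
# Dual-core node with 512MB RAM, 2 concurrent tasks, 128MB blocks
print(blocks_in_memory_mb(128, 2))  # 256
```

With 128MB blocks and two concurrent tasks, half of a 512MB node would be taken by block data alone under this assumption, which is why the smaller 64MB setting may be the safer choice on such a node.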

Number of Maps and Reducer:
           The mapred.reduce.tasks and mapred.map.tasks parameters are set in the mapred-site.xml file. By default, the number of maps equals the number of data blocks. For example, if the input data is 2GB and the block size is 256MB, 8 maps run during processing, regardless of memory capacity and the number of processors. So this parameter should be tuned to (number of nodes) x (number of cores in each node). Note that Hadoop treats mapred.map.tasks only as a hint; the actual number of maps is determined by the input splits.

Number of Maps = Total number of processor cores available in the cluster.

In the example above, 8 maps run. If the cluster has only 4 processor cores, multiple threads start running and keep swapping data in and out of memory, which degrades the performance of the cluster. In the same way, set the number of reducers to the number of cores in the cluster. After the map phase is over, most nodes go idle while a few nodes work on the reduce phase; to make the reduce phase finish faster, set the number of reducers to the number of nodes or the number of processor cores.
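The map-count reasoning above can be sketched in Python as follows. The function names are hypothetical and the calculation is a simplification: it assumes one map per data block, as described in the post, and ignores everything else that affects input splits.

```python
import math


def default_map_count(input_mb, block_mb):
    """Number of maps when one map runs per data block."""
    return math.ceil(input_mb / block_mb)


def oversubscription(maps, total_cores):
    """How many maps compete for each core if all maps run at once."""
    return maps / total_cores


maps = default_map_count(2 * 1024, 256)  # 2GB input, 256MB blocks
print(maps)                       # 8
print(oversubscription(maps, 4))  # 2.0, i.e. two maps per core on a 4-core cluster
```

A ratio above 1.0 corresponds to the situation the post warns about, where more maps are running than there are cores to serve them.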

Logging Level:
            HADOOP_ROOT_LOGGER = ERROR: set this value in the hadoop script file. By default it is set to INFO mode, in which Hadoop logs everything, including all events, completed jobs and tasks, I/O information, warnings, and errors. Changing it will not bring a huge performance improvement, but it reduces the number of log-file I/O operations and gives a small improvement in performance.


Hadoop performance tuning part 2 >> Click Here

The suggestions above were observed on a Hadoop cluster running Hive queries. Please leave a comment with your own observations.

45 comments:

  1. An old white paper on the same topic- http://www.slideshare.net/ImpetusInfo/ppt-on-advanced-hadoop-tuning-n-optimisation

  2. Thanks indoos. The above content is written purely from my experience, observed while running a 10-node cluster. Your suggestion gives me ideas about more parameters for optimizing a Hadoop cluster; I suggest readers go through the above link as well.

    I expect more readers to share their ideas & suggestions to help all readers.

    Thanks once again.

  3. Hi, I am new to cloud computing, mainly Hadoop. I am working on configuration and performance tuning of Hadoop. Can you suggest which configuration parameters, and which values for them, give the best output?

  4. @Ratan: I hope this post is helpful for getting good output. My next post is a continuation of this one and gives more details about the configuration parameters. Please wait for my next post (just a couple of days).

    Thanks for your support.

  5. Thank you very much. I am waiting for your next post. If it's possible please include configuration parameter for big cluster also. I am waiting for your great suggestion.

  6. Hi Venkat,

    Where can I find the full list of parameters and their descriptions for the three configuration files, i.e.
    hdfs-site.xml parameters (all) list
    core-site.xml parameters (all) list
    mapred-site.xml parameters (all) list

    Please help me !!

    -Ravi

  7. Glad !!! Finally I am able to find default.xml files for hdfs/core/mapred

    src/hdfs/hdfs-default.xml
    src/mapred/mapred-default.xml
    src/core/core-default.xml

    Thanks !!!

    -Ravi
