Performance evaluation of a Hadoop cluster using HiBench benchmarks

Assignment 4
Performance evaluation of a Hadoop cluster using HiBench benchmarks

Weight: 5 points
Due date: May 20, 2024.

Disclaimer: To complete this assignment, you need to complete these two tutorials:
https://www.ieeepsu.org/basit/cs435/notes/Hadoop1.txt
https://www.ieeepsu.org/basit/cs435/notes/Hadoop2.txt

Some useful resources to complete this assignment:

    Daemon      GUI port
    Namenode    Hadoop:9870
    YARN        Hadoop:8088

  • The Namenode GUI allows you to observe the status of the cluster’s datanodes and the HDFS file system. You can also browse the files in HDFS.
  • The YARN GUI allows you to interact with YARN, the daemon responsible for executing MapReduce tasks.

Problem statement

Hadoop benchmarks such as Pi, WordCount, and TeraSort are used to assess the performance and scalability of Hadoop clusters on CPU-intensive and IO-intensive tasks.

  • The Pi benchmark calculates the value of Pi using the Monte Carlo method. It exercises the cluster’s capacity for parallel computation, which makes it a good benchmark for CPU-intensive workloads.
  • WordCount counts the occurrences of words in a large dataset. It measures how efficiently the cluster handles common data manipulation tasks, which are often IO-intensive.
  • TeraSort measures sorting speed. It stresses both CPU and IO, which makes it a comprehensive benchmark for tuning Hadoop cluster configuration and resource allocation in real-world scenarios.

These benchmarks help ensure the performance and reliability of Hadoop deployments across diverse big data applications.

Cluster configurations

Configuration 1 (single person submission)
  • 3 datanodes + 1 namenode
  • All VMs execute on a single machine.

Configuration 2 (two per group submission)
  • 7 datanodes + 1 namenode
  • 4 datanodes execute on one machine. The other machine runs 1 namenode and 3 datanodes.

In this assignment you will complete various experiments on your cluster and observe the runtimes of the executed jobs. Collect the data and write a report analyzing the performance of your cluster using the CPU-intensive and IO-intensive benchmarks. The following can be used to guide your experimentation:

1. Pi computation [1 point]

    Maps    N             Execution time
    3       1000
    3       1000000
    3       1000000000
    10      1000000000
    100     1000000000
    1000    1000000000

    > cd /usr/local/hadoop/share/hadoop/mapreduce
    > hadoop jar hadoop-mapreduce-examples-3.3.6.jar pi maps N

2. Experiments with WordCount [1 point]

Create a dataset for the wordcount program. Use your knowledge from assignment 3 to download various text files from Project Gutenberg. The goal is to create datasets of different sizes to test the performance of wordcount. Use the information in Hadoop tutorial 2/2 to upload your files to HDFS.

    > cd /usr/local/hadoop/share/hadoop/mapreduce
    > hadoop jar hadoop-mapreduce-examples-3.3.6.jar wordcount

    Dataset size    # of blocks in HDFS    Execution time    # of maps/reduce
    128 MB
    512 MB
    1 GB
    1.5 GB

Check the YARN GUI to observe the following for your wordcount jobs:

  • Number of Map tasks
  • Number of Reduce tasks
  • Node Local requests
  • Rack Local requests
  • Container placement in the cluster: which container runs on which node?

Download the resulting output file to verify that the job executed correctly on your input. These observations could be useful to improve your report analysis.

3. Experiments with TeraSort [1 point]

Create a dataset for the terasort program. Learn about using TeraGen/TeraSort. Build the datasets by executing TeraGen and observe the runtimes as follows:

    Dataset size    # of blocks in HDFS    Execution time    # of maps/reduce
    128 MB
    512 MB
    1 GB
    1.5 GB

Writing Report

Write a comprehensive report detailing your experimental evaluation using Hadoop. At the very least your report should consist of the following sections:

  • Experiments
  • Analysis of results
  • Conclusions

Assignment Deliverables

The deliverables for the project are the following. These need to be uploaded to LMS.

  • A complete report
  • A URL for a video: students are required to screen-capture a video showcasing the execution of any one of the experiments (Pi, wordcount or terasort). The video can be posted on YouTube. A link to the video must be submitted for review.

Submission and Grading

  • All submissions are through LMS.
  • Upload the two deliverables to the LMS.

Grading: correct execution of the experiments and the report are weighted as follows:

  • Pi: 20%
  • Wordcount: 20%
  • Terasort: 20%
  • Report: 40% (your report should have the following sections: Experiments, Analysis of results, and Conclusions, with appropriate illustrations, graphs, etc.)

Additional Notes:

  • Any student may be asked to present their work.
  • The instructor reserves the right to “interview” any student on their submission to check their understanding of it. The instructor may also ask the student to run the programs to satisfy any test case(s) therein.
  • It is the student’s responsibility to verify that all files have been uploaded to the LMS.
  • Incomplete submissions or wrong file types that do not execute will NOT be graded.
  • After an assignment/project has been graded, re-submission with the intention of improving the assignment’s score will not be allowed.
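To make the Pi experiment in step 1 easier to reason about, here is a minimal single-machine sketch of the Monte Carlo idea in Python. It is an illustration only, not the code inside the Hadoop examples jar: the function names `map_task` and `estimate_pi` are hypothetical, and it assumes that N is the number of samples drawn by each map, so the total sample count is maps × N.

```python
import random


def map_task(samples, seed):
    """One 'map': count random points that land inside the unit quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside


def estimate_pi(maps, samples_per_map, seed=0):
    """The 'reduce': add up the per-map counts and scale the hit ratio by 4."""
    inside = sum(map_task(samples_per_map, seed + i) for i in range(maps))
    return 4.0 * inside / (maps * samples_per_map)


# Small sample counts only; the larger rows of the table are meant for the cluster.
for maps, n in [(3, 1000), (3, 100000)]:
    print(f"maps={maps} N={n} estimate={estimate_pi(maps, n):.5f}")
```

Each map works independently and the reduce step only sums a handful of integers, which is why this benchmark mostly stresses CPU rather than disk or network.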
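For the "# of blocks in HDFS" column in steps 2 and 3, and for choosing a TeraGen row count, the arithmetic can be sketched as follows. This assumes the default HDFS block size of 128 MB (`dfs.blocksize` may be set differently on your cluster), treats MB and GB as binary units, and relies on TeraGen writing 100-byte rows. The helper names are hypothetical, and the values actually reported by the Namenode GUI should take precedence over these estimates.

```python
MIB = 1024 * 1024
BLOCK_BYTES = 128 * MIB      # assumption: default dfs.blocksize
TERAGEN_ROW_BYTES = 100      # TeraGen writes fixed 100-byte rows


def blocks_for_file(file_bytes, block_bytes=BLOCK_BYTES):
    """Blocks used by one file: ceiling division, the last block may be partial."""
    return -(-file_bytes // block_bytes)


def total_blocks(file_sizes, block_bytes=BLOCK_BYTES):
    """Blocks used by a set of files; blocks are never shared between files."""
    return sum(blocks_for_file(size, block_bytes) for size in file_sizes)


def teragen_rows(target_bytes):
    """Row count to pass to TeraGen for roughly target_bytes of data."""
    return target_bytes // TERAGEN_ROW_BYTES


SIZES = {"128 MB": 128 * MIB, "512 MB": 512 * MIB,
         "1 GB": 1024 * MIB, "1.5 GB": 1536 * MIB}

for label, size in SIZES.items():
    print(f"{label:>7}: {blocks_for_file(size):>2} blocks if stored as one file, "
          f"{teragen_rows(size)} TeraGen rows")
```

A dataset assembled from many small Project Gutenberg files will occupy more blocks than a single file of the same total size, because every file takes at least one block. The same applies to TeraGen output when it is written as several part files. `total_blocks` covers that case when given the individual file sizes.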
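One possible way to fill in the "Execution time" column is a small timing wrapper around the commands from step 1. The sketch below is only one option: `pi_command`, `time_command` and `run_pi_experiments` are hypothetical helper names, and it assumes `hadoop` is on the PATH and the examples jar sits in the directory given in the assignment. It measures wall-clock time on the client, which includes JVM start-up and job submission, so it will be slightly longer than the elapsed time shown in the YARN GUI.

```python
import subprocess
import time

EXAMPLES_JAR = "hadoop-mapreduce-examples-3.3.6.jar"
EXAMPLES_DIR = "/usr/local/hadoop/share/hadoop/mapreduce"

# (maps, N) pairs taken from the Pi table in step 1.
PI_RUNS = [(3, 1000), (3, 1000000), (3, 1000000000),
           (10, 1000000000), (100, 1000000000), (1000, 1000000000)]


def pi_command(maps, n):
    """Build the argument list for one Pi run."""
    return ["hadoop", "jar", EXAMPLES_JAR, "pi", str(maps), str(n)]


def time_command(argv, cwd=None):
    """Run a command and return (wall-clock seconds, exit code)."""
    start = time.perf_counter()
    completed = subprocess.run(argv, cwd=cwd)
    return time.perf_counter() - start, completed.returncode


def run_pi_experiments(cwd=EXAMPLES_DIR):
    """Run every row of the Pi table in turn and print one result line per run."""
    for maps, n in PI_RUNS:
        seconds, code = time_command(pi_command(maps, n), cwd=cwd)
        print(f"maps={maps} N={n} exit={code} time={seconds:.1f}s")
```

Calling `run_pi_experiments()` on the namenode would run the six configurations back to back. The last rows may take a long time on a small cluster, so it can be worth running them individually.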
