Integrating LVM with Hadoop and providing Elasticity to DataNode Storage

In this blog, I am going to discuss how we can integrate LVM with Hadoop to provide elasticity to the DataNode storage. Let me first explain the task that I am going to perform.

In this task, we have to make the DataNode storage elastic, so that whenever we need to change its size we can do so without shutting down the DataNode or stopping its service.

For this task, there are some prerequisites:

  • Basic knowledge of Hadoop
  • Knowledge of Linux Partition and LVM

Hadoop is an open-source framework that allows us to store and process big data in a distributed manner across clusters of computers. It provides massive storage for any kind of data, enormous processing power, and the ability to handle virtually limitless concurrent tasks or jobs.

LVM is a tool for logical volume management which includes allocating disks, striping, mirroring, and resizing logical volumes. With LVM, a hard drive or set of hard drives is allocated to one or more physical volumes; a physical volume can also be placed on other block devices which might span two or more disks. The physical volumes are combined into volume groups, from which logical volumes are carved out.
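Each of these three layers has its own summary command, which is handy for a quick check at any point during this task:

pvs   # list physical volumes and the volume group they belong to
vgs   # list volume groups with their total and free size
lvs   # list logical volumes carved out of each volume group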

Now let's begin our practical with a simple Hadoop cluster of one DataNode only.

In this step, we will start the NameNode service:

hadoop-daemon.sh start namenode
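This assumes the NameNode is already configured. For reference, in a Hadoop 1.x setup the core-site.xml on the cluster nodes might look roughly like the sketch below; the IP and port are placeholders for your own NameNode address.

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.10:9001</value>
  </property>
</configuration>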

We can also check that no DataNode is connected to the NameNode yet.

hadoop dfsadmin -report
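On Hadoop 2.x and later, hadoop dfsadmin is deprecated in favour of the hdfs front end, so the equivalent command there would be:

hdfs dfsadmin -report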

Here we can clearly see that no DataNode is available.

Now I am adding one new hard disk to the DataNode, because we will share storage from this hard disk only.

Now we can check the hard disk with the fdisk command.

fdisk -l
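Alternatively, lsblk prints a compact tree of all block devices along with their sizes and mount points:

lsblk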

Here we can see that a new 50GiB hard disk named /dev/sdb has been added.

To create the physical volume, we can use the pvcreate command.

pvcreate /dev/sdb

Also, we can display the physical volume with the pvdisplay command.

pvdisplay /dev/sdb

Here a physical volume of size 50GiB has been created. We can also see that the physical volume is not yet allocatable, so we have to allocate it to some volume group.

In this step, I am going to create one volume group with the name dnvg, and I will allocate to it the physical volume that we created in the previous step.

To create the volume group, we have to use the vgcreate command.

vgcreate dnvg /dev/sdb

Also, we can display the volume group with the vgdisplay command.

vgdisplay dnvg

Now we can check whether the physical volume is allocated or not.

We can clearly see that Allocatable is now yes and that the physical volume is allocated to the volume group dnvg.

In this step, we will create one logical volume named dnlv of size 30GiB and mount it on the directory used by the DataNode.

To create the logical volume, we have to use the lvcreate command.

lvcreate --size 30G --name dnlv dnvg

We can display the logical volume with the lvdisplay command.

lvdisplay dnvg/dnlv

We can clearly see that one logical volume of size 30GiB has been created.

In this step, we will format the logical volume with the ext4 file system so that we can use it to store the data.

To format the logical volume, we have to use the mkfs command.

mkfs.ext4 /dev/dnvg/dnlv
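Note that mkfs creates a brand-new filesystem, so any data already present on the logical volume is lost. If you prefer XFS over ext4, the equivalents would be mkfs.xfs here and xfs_growfs (instead of resize2fs) at the resize step later on.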

To mount the logical volume, we have to use the mount command. If the /dn mount point does not exist yet, create it first with mkdir /dn.

mount /dev/dnvg/dnlv /dn
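For the DataNode to actually serve storage from this mount, /dn must be configured as its data directory. This is an assumption about how the cluster was set up; in a Hadoop 1.x hdfs-site.xml the entry would look roughly like:

<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/dn</value>
  </property>
</configuration>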

We can also check whether the logical volume is mounted with the df command.

df -h
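Keep in mind that a mount done this way does not survive a reboot. To make it persistent, an /etc/fstab entry along these lines would do (a sketch; adjust the mount options to your needs):

/dev/dnvg/dnlv  /dn  ext4  defaults  0  0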

In this step, we will start the DataNode service:

hadoop-daemon.sh start datanode
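We can verify that the DataNode process is actually running with jps, which lists the Java processes on the machine; its output should now include a DataNode entry alongside the NameNode:

jps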

Now we can see the report of the Hadoop cluster to check how much storage is shared.

hadoop dfsadmin -report

Here we can clearly see that 30GiB of storage is shared. Now we have to increase the storage online, i.e. the storage will grow elastically without stopping the DataNode.

In this step, we have to increase the logical volume size by 10GiB. For this, we can use the lvextend command.

lvextend --size +10G /dev/dnvg/dnlv

Now we can see that the size of the logical volume has been increased from 30GiB to 40GiB.

But if we check the size of the volume mounted on the /dn directory, it will still be 30GiB.

It is still 30GiB because the filesystem on the partition (logical volume) has not yet been resized; we still have to update its metadata, including the inode table, to cover the new space.

In this step, we have to resize the filesystem over the extended portion of the volume, and with LVM and ext4 we can do this online, which means we don't need to unmount the logical volume.

To resize the filesystem on the extended logical volume, we have to use the resize2fs command.

resize2fs /dev/dnvg/dnlv
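As a side note, lvextend can extend the volume and resize the filesystem in one shot via its -r (--resizefs) flag, which invokes the appropriate resize tool for us:

lvextend --size +10G -r /dev/dnvg/dnlv

Doing it in two separate steps, as we did here, just makes each stage of the process explicit.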

We can also check whether the size of the logical volume increased with the df command.

df -h

Initially, the size of the logical volume was 30GiB, but after resizing it increased to 40GiB.

We can also check the report of the Hadoop cluster to confirm that the storage size has increased.

Thank You !!

Hope you like it !!
