Integrating LVM with Hadoop and providing Elasticity to DataNode Storage

Aditya Raj
6 min read · Nov 1, 2020

In this blog, I am going to discuss how we can integrate LVM with Hadoop to provide elasticity to the DataNode storage. Let me first explain the task that I am going to perform.

Task Description :

In this task, we have to make the DataNode storage elastic so that whenever we need to change the size of the DataNode storage, we can do it without shutting down the DataNode or stopping its service.

For this task, there are some prerequisites

Prerequisites :

  • Basic knowledge of Hadoop
  • Knowledge of Linux Partition and LVM

Hadoop :

Hadoop is an open-source framework that allows storing and processing big data in a distributed storage environment across clusters of computers. It provides massive storage for any kind of data, enormous processing power, and the ability to handle virtually limitless concurrent tasks or jobs.

Logical Volume Management :

LVM is a tool for logical volume management which includes allocating disks, striping, mirroring, and resizing logical volumes. With LVM, a hard drive or set of hard drives is allocated to one or more physical volumes. LVM physical volumes can be placed on other block devices, which might span two or more disks. The physical volumes are combined into volume groups, which are then divided into logical volumes.
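As a quick reference for the steps below, the whole stack can be inspected at any point with the standard LVM summary commands (they are read-only, so they are safe to run at any stage):

pvs   # physical volumes and the volume group they belong to
vgs   # volume groups with their total and free size
lvs   # logical volumes and their sizes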

Now let’s begin our practical with a simple Hadoop cluster of one data node only.

Step 1: Start the NameNode Service

In this step, we will start the NameNode service.

hadoop-daemon.sh start namenode
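A quick way to confirm that the NameNode process actually came up (assuming the JDK's jps tool is on the PATH) is:

jps

It should list a NameNode process among the running Java processes.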

We can also check that no DataNode is connected to the NameNode yet.

hadoop dfsadmin -report

Here we can clearly see that no DataNode is available.

Step 2: Add Hard Disk to the DataNode

I am adding one new hard disk to the DataNode because we will share storage to the cluster from this hard disk only.

Now we can check the hard disk with the fdisk command.

fdisk -l

Here we can see that a new hard disk of 50GiB has been added with the name /dev/sdb.
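If you prefer a more compact view than fdisk, lsblk shows the same disk along with any partitions and mount points:

lsblk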

Step 3: Create a Physical Volume from the Hard Disk

To create the physical volume we can use the pvcreate command.

pvcreate /dev/sdb

Also, we can display the physical volume with the pvdisplay command.

pvdisplay /dev/sdb

Here a physical volume of size 50GiB has been created. We can also see that the physical volume is not yet allocatable, so we have to allocate it to some volume group.

Step 4: Create the Volume Group

In this step, I am going to create one volume group with the name dnvg and allocate the physical volume that we created in the previous step to it.

To create the volume group we have to use the vgcreate command.

vgcreate dnvg /dev/sdb

Also, we can display the volume group with the vgdisplay command.

vgdisplay dnvg

Now we can check whether the physical volume is allocated or not.

We can clearly see that Allocatable is now yes and the physical volume is allocated to the volume group dnvg.

Step 5: Create Logical Volume of Size 30GiB

In this step, we will create one logical volume named dnlv of size 30GiB, which will later be mounted on the directory associated with the DataNode.

To create the logical volume we have to use the lvcreate command.

lvcreate --size 30G --name dnlv dnvg

We can display the logical volume with the lvdisplay command.

lvdisplay dnvg/dnlv

We can clearly see that one logical volume of size 30GiB is created.
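As a side note, if you wanted to hand the entire volume group to the DataNode instead of a fixed 30GiB, lvcreate can also take the size in extents; this is just an alternative, not what I do here:

lvcreate --extents 100%FREE --name dnlv dnvg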

Step 6: Format the Logical Volume

In this step, we will format the logical volume with the ext4 file system so that we can use it to store the data.

To format the logical volume we have to use the mkfs command.

mkfs.ext4 /dev/dnvg/dnlv

Step 7: Mount the Logical Volume with the DataNode directory

To mount the logical volume we have to use the mount command.

mount /dev/dnvg/dnlv /dn
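A note on /dn: this is assumed to be whatever directory is configured as the DataNode's data directory in hdfs-site.xml (dfs.data.dir, or dfs.datanode.data.dir on newer Hadoop releases), and it must exist before mounting. If it does not exist yet, create it first:

mkdir -p /dn   # mount point; must match the DataNode data directory in hdfs-site.xml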

We can also check whether the logical volume is mounted or not with the df command.

df -h
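One caveat: a mount done this way will not survive a reboot. If you want it to come back automatically, an entry along these lines can be added to /etc/fstab (a sketch; adjust the options to your setup):

/dev/dnvg/dnlv   /dn   ext4   defaults   0 0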

Step 8: Start the DataNode Service

In this step, we will start the DataNode service.

hadoop-daemon.sh start datanode

Now we can see the report of the Hadoop cluster to check how much storage is shared.

hadoop dfsadmin -report

Here we can clearly see that 30GiB of storage is shared. Now we have to increase the storage online, i.e., the storage will grow elastically without stopping the DataNode.

Step 9: Increase the Logical Volume Size

In this step, we have to increase the logical volume size by 10GiB. For this, we can use the lvextend command.

lvextend --size +10G /dev/dnvg/dnlv
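As a side note, recent LVM versions can extend the volume and grow the file system in a single command with the -r (--resizefs) option, which would make Step 10 unnecessary; I keep the two steps separate here to show what is actually happening underneath:

lvextend --size +10G --resizefs /dev/dnvg/dnlv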

Now we can see that the size of the logical volume has increased from 30GiB to 40GiB.

But if we check the size of the volume mounted on the /dn directory, it will still be 30GiB.

It is still 30GiB because the file system on the logical volume does not yet know about the extra space; we have to resize (extend) the file system as well.

Step 10: Resize the File System on the Extended Logical Volume

In this step, we have to resize the file system so that it covers the extended logical volume. With ext4 this can be done online, which means we don't need to unmount the logical volume.

To resize the file system on the extended logical volume we have to use the resize2fs command.

resize2fs /dev/dnvg/dnlv

We can also check whether the size of the mounted volume has increased or not with the df command.

df -h

Initially, the size reported for the mounted volume was 30GiB, but after resizing it has increased to 40GiB.

We can also check the report of the Hadoop cluster to confirm that the shared storage size has increased.
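The same report command used earlier shows the DataNode's configured capacity, which should now reflect the full 40GiB:

hadoop dfsadmin -report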

Thank You !!

Hope you like it !!
