Netbackup buffer tuning.

Applies to Netbackup versions 3.x, 4.x, 5.x, 6.x and 7.x

Any site using Netbackup should take the time to configure and adjust the BPTM buffers. It improves performance – a LOT.

Just to show what a difference the buffer settings really make for an LTO3 drive:

Before tuning: 19MB/sec (default values)

Second tuning attempt: 49MB/sec (using 128K block size, 64 buffers)

Final result: 129MB/sec  (using 128K block size and 256 buffers)

Since an LTO3 drive has a native transfer rate of 80MB/sec (anything above that comes from the drive's compression), any further tuning attempts are meaningless.

Buffer tuning advice could not be found anywhere in the Netbackup manuals until NBU 5.x was released. It’s now firmly documented in tech notes and in the “Netbackup backup planning and performance tuning guide”.

All Netbackup installations have a directory in /usr/openv/netbackup called db. The db directory is home to various configuration files, and the files controlling the size and number of memory buffers live in its config subdirectory. For a full understanding of what the settings control, I need to get a little technical. Incoming backup data from the network is not written to tape immediately; it is buffered in memory. If you have ever wondered why the Netbackup activity monitor shows files being written even though no tape is mounted, this is why.

Data is received by a bptm process and written into memory. The bptm process works on a block basis. Once a block is full, it is written to tape by another bptm process. Since the memory block size is equal to the SCSI block size on tape, we need to take care! By using a bigger block size we can improve tape writes by reducing system overhead. Too small a block size causes extensive repositioning (shoe shining), which also leads to:

  • Slow backups, because the tape drive spends time repositioning instead of writing data.
  • Increased wear on media and tape drive, because of multiple passes of the tape over the R/W head.
  • Increased build-up of magnetic material on the Read/Write head. Often a cleaning tape can’t remove this dirt!

The block size and buffer count are controlled by the Netbackup config files SIZE_DATA_BUFFERS and NUMBER_DATA_BUFFERS. SIZE_DATA_BUFFERS defines the size (see table below) and NUMBER_DATA_BUFFERS controls how many buffers are reserved for each stream. Memory used for buffers is allocated in shared memory. Shared memory cannot be paged or swapped.

Block size                      SIZE_DATA_BUFFERS value
32K block size                  32768
64K block size                  65536
128K block size                 131072
256K block size (default size)  262144

As far as I know you can’t set the value to anything higher than 256KB because it is the biggest supported SCSI block size. To ensure enough free memory, use this formula to calculate memory consumption:

number of tape drives  * MPX * SIZE_DATA_BUFFERS * NUMBER_DATA_BUFFERS = TOTAL AMOUNT OF SHARED MEMORY

or a real world example

8 tape drives * 5 MPX streams * 128K block size * 128 buffers per stream = 640MB

If all drives and streams are in use, total memory usage is 640 MB of shared memory (8 * 5 * 131072 * 128 = 671,088,640 bytes). Just remember that this amount changes if any parameter is changed, e.g. number of drives, larger MPX setting etc. I think a reasonable NUMBER_DATA_BUFFERS value would be 128 or 256. The value must be a power of 2.
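
The formula is easy to check with plain shell arithmetic before you touch any settings. This is only a small sketch: shm_bytes is a name I made up for the example, not a Netbackup command, and the figures are the ones from the example above.

```shell
#!/bin/sh
# Estimate the shared memory used by the bptm buffers.
# Usage: shm_bytes DRIVES MPX SIZE_DATA_BUFFERS NUMBER_DATA_BUFFERS
# (shm_bytes is a hypothetical helper, not a Netbackup command)
shm_bytes() {
    echo $(( $1 * $2 * $3 * $4 ))
}

# 8 drives, MPX 5, 128K buffers, 128 buffers per stream
bytes=$(shm_bytes 8 5 131072 128)
echo "$bytes bytes = $(( bytes / 1024 / 1024 )) MB"
```

Rerun it with your own drive count and MPX setting whenever one of the four parameters changes.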

Configuring SIZE_DATA_BUFFERS & NUMBER_DATA_BUFFERS:

The cookbook (the bullet proof version). We assume that the wanted configuration is 128K block size and 256 memory buffers per stream

touch /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS
touch /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS
echo 131072 > /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS
echo 256 > /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS
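
If you have many media servers, it may be worth wrapping the commands in a small function that refuses obviously wrong values. The sketch below is one way to do it. write_buffer_config is a hypothetical helper name, and the check that the buffer size is a multiple of 1024 is my own assumption about what a sane value looks like; the power of 2 check follows the rule stated above.

```shell
#!/bin/sh
# Write SIZE_DATA_BUFFERS and NUMBER_DATA_BUFFERS after basic sanity checks.
# write_buffer_config is a hypothetical helper, not part of Netbackup.
# Usage: write_buffer_config CONFIG_DIR SIZE NUMBER
write_buffer_config() {
    dir=$1 size=$2 number=$3

    # Assumption: a valid buffer size is a multiple of 1024 bytes.
    if [ $(( size % 1024 )) -ne 0 ]; then
        echo "SIZE_DATA_BUFFERS should be a multiple of 1024: $size" >&2
        return 1
    fi
    # The number of buffers must be a power of 2.
    if [ "$number" -lt 1 ] || [ $(( number & (number - 1) )) -ne 0 ]; then
        echo "NUMBER_DATA_BUFFERS must be a power of 2: $number" >&2
        return 1
    fi

    # '>' overwrites, so each file ends up holding exactly one value.
    echo "$size"   > "$dir/SIZE_DATA_BUFFERS" &&
    echo "$number" > "$dir/NUMBER_DATA_BUFFERS"
}

# Example (run as root on each media server):
# write_buffer_config /usr/openv/netbackup/db/config 131072 256
```

The function only writes the two files; it does not talk to Netbackup in any way.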

The configuration files need to be created on all media servers. You don’t need to bounce any daemons; the very next backup will use the new settings. But be careful to set the SIZE_DATA_BUFFERS value right. A misconfigured value will impact performance negatively. The last thing to do is to verify the settings. We can get the information from the bptm logs in /usr/openv/netbackup/logs/bptm. Look for the io_init messages in the bptm log.

00:03:25 [17555] <2> io_init: using 131072 data buffer size
00:03:25 [17555] <2> io_init: CINDEX 0, sched Kbytes for monitoring = 10000
00:03:25 [17555] <2> io_init: using 24 data buffers

The io_init messages show up for every new bptm process. The value in the brackets is the PID. If you need to see what happened, do a grep on the PID.
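
Both lookups are plain grep calls. A minimal sketch: show_io_init and show_pid are names I made up, and the log.* file name pattern in the commented example is an assumption about how the bptm log files are named on your server, so adjust it to what you actually find in the directory.

```shell
#!/bin/sh
# Print the buffer settings each bptm process reported at start-up.
# show_io_init is a hypothetical helper; pass it one or more bptm log files.
show_io_init() {
    grep -h 'io_init: using' "$@"
}

# Print every line logged by one bptm process, given its PID.
# The PID appears in square brackets, e.g. [17555].
show_pid() {
    pid=$1
    shift
    grep -h "\[$pid\]" "$@"
}

# Example (log file names are an assumption):
# show_io_init /usr/openv/netbackup/logs/bptm/log.*
# show_pid 17555 /usr/openv/netbackup/logs/bptm/log.*
```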

Getting data back – NUMBER_DATA_BUFFERS_RESTORE

Storing data on tape is one thing; getting it back fast is more important. When restoring data the bptm process looks for a file named NUMBER_DATA_BUFFERS_RESTORE. The default value is 8 buffers; in my opinion this is WAY TOO LOW. Use a value of 256 or larger. To verify, grep for “mpx_setup_restore_shm”:

3:08:14.308 [3328] <2> mpx_setup_restore_shm: using 512 data buffers, buffer size is 262144
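
To pull just the two numbers out of that message, a sed one-liner is enough. This is a sketch that assumes the message format is exactly as in the line above; restore_buffers is a hypothetical name, and the log file pattern in the commented example is an assumption.

```shell
#!/bin/sh
# Read bptm log lines on stdin and print "<buffers> <buffer size>" for each
# mpx_setup_restore_shm message. restore_buffers is a hypothetical helper.
restore_buffers() {
    sed -n 's/.*mpx_setup_restore_shm: using \([0-9][0-9]*\) data buffers, buffer size is \([0-9][0-9]*\).*/\1 \2/p'
}

# Example:
# echo 256 > /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS_RESTORE
# ...run a restore, then (log file names are an assumption):
# cat /usr/openv/netbackup/logs/bptm/log.* | restore_buffers
```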

Disk Buffers NUMBER_DATA_BUFFERS_DISK.

Works the same way as NUMBER_DATA_BUFFERS. In NBU 5.1 the default buffer size was raised to 256KB (largest SCSI block size possible). You can however lower that value with SIZE_DATA_BUFFERS_DISK. If NUMBER_DATA_BUFFERS_DISK/SIZE_DATA_BUFFERS_DISK don’t exist, the values from NUMBER_DATA_BUFFERS/SIZE_DATA_BUFFERS are used.
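
The fallback rule can be written down as a few lines of shell. This is only a model of the rule as described here, useful for checking which file wins on a given media server. effective_disk_setting is a hypothetical helper that reads the files and does not ask Netbackup anything.

```shell
#!/bin/sh
# Model of the fallback described above: prefer the *_DISK file, otherwise
# fall back to the tape setting. effective_disk_setting is a hypothetical
# helper that only reads the config files; it does not query Netbackup.
# Usage: effective_disk_setting CONFIG_DIR NUMBER_DATA_BUFFERS|SIZE_DATA_BUFFERS
effective_disk_setting() {
    dir=$1 name=$2
    if [ -f "$dir/${name}_DISK" ]; then
        cat "$dir/${name}_DISK"
    elif [ -f "$dir/$name" ]; then
        cat "$dir/$name"
    else
        # Neither file exists, so the built-in default applies.
        echo "default"
    fi
}

# Example:
# effective_disk_setting /usr/openv/netbackup/db/config SIZE_DATA_BUFFERS
```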

Do a backup/restore test.

Since Netbackup does automatic block size determination, everything should work without any problem. However, please do a backup/restore test.

ACSLS volume access control

Oracle ACSLS volume access control is a very useful feature when sharing a tape library between multiple hosts. Normal ACSLS operation allows all hosts to see all tapes. This is not wanted if you have multiple Netbackup domains attached, as tapes may be overwritten because of Netbackup’s “greedy” design of using any tape available. Careful configuration of the host applications may avoid this scenario, but that solution is vulnerable to errors. ACSLS’s volume access control feature adds an extra layer of security. This guide explains in detail how to configure it.

Step 1: Enable volume access control by starting acsss_config and choosing option 4, “Set Access Control Variables”.

Answer TRUE to “Access control is active for volumes”

Answer NOACCESS to “Default access for volumes ACCESS/NOACCESS”.

Step 2: Go to /export/home/ACSSS/data/external and edit the file vol_attr.dat. This file specifies which ranges of tapes are owned by whom. The owner is a definition, not a host.

Sample of vol_attr.dat – each field is delimited by a pipe sign |
000000-019999|ob-nile||force|
200000-299999|ob-nile||force|
300000-399999|ob-nile||force|
500000-599999|ob-main||force|
D20000-D39999|ob-triton||force|
D40000-D49999|ob-triton||force|
D10000-D19999|ob-proteus||force|

Field 1: tape range – Specify a range. Cleaning tapes live their own lives – you can’t set ownership on them.

Field 2: Owner of tape range (a definition, not a host name). In this example ob means “owned by” and the last part is the master server name. But the name can be anything – just be careful with special characters. Some versions of ACSLS have problems with the underscore sign “_”.

Field 3: pool id – not used at our site.

Field 4: force or blank. This option allows ACSLS to override previous volume ownership. I recommend setting this field to “force”.

Field 5: move-to-lsm (not used at our site). Here you can define a home LSM for the defined volume series. We let our tapes flow freely so this field is blank.
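
Since the file is pipe delimited, awk gives a quick overview of who owns what. A small sketch: list_volume_owners is a hypothetical helper name, and skipping lines that start with # assumes your vol_attr.dat uses # for comments.

```shell
#!/bin/sh
# Print "range owner force" for every entry in a vol_attr.dat style file.
# list_volume_owners is a hypothetical helper; it only reads the file.
list_volume_owners() {
    awk -F'|' '
        # Assumption: comment lines start with "#"; also skip blank lines.
        /^#/ || NF < 2 { next }
        # Field 1 = range, field 2 = owner, field 4 = force (or "-" if blank).
        { print $1, $2, ($4 == "" ? "-" : $4) }
    ' "$1"
}

# Example:
# list_volume_owners /export/home/ACSSS/data/external/vol_attr.dat
```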

Step 3: Go to /export/home/ACSSS/data/external/access_control/. Edit the file internet.addresses. This file converts IP addresses to names. The names do not need to match DNS names, but I highly recommend keeping them that way. Add all hosts that will do mount/dismount requests.

Sample from internet.addresses (shortened for easy reading):

10.1.1.1 main
10.1.1.2 triton
10.1.1.3 proteus
10.1.1.4 congo
10.1.1.5 tyne

Step 4: Edit users.ALL.allow. This file decides which hosts defined in internet.addresses are allowed to see the tape ranges defined in vol_attr.dat. Specify all servers that are allowed to share/see the same tapes.

Sample of users.ALL.allow
ob-nile donau ganges mekong volga tyne hudson oder
ob-triton triton atlas gaia nyx rhea
ob-main congo gobi klat darwin indus

You can read the file as: tapes owned by “ob-nile” are accessible to hosts “donau ganges mekong volga tyne hudson and oder”. All other tape series are filtered out by ACSLS.
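
When the file grows, it helps to be able to ask "can this host see tapes from that owner?" without reading every line. The sketch below mimics the reading rule described above. host_allowed is a hypothetical helper, not an ACSLS command, and it only looks at the file, not at what ACSLS has actually loaded.

```shell
#!/bin/sh
# Succeed if HOST is listed for OWNER in a users.ALL.allow style file.
# host_allowed is a hypothetical helper, not an ACSLS command.
# Usage: host_allowed FILE OWNER HOST
host_allowed() {
    awk -v owner="$2" -v host="$3" '
        # First word on the line is the owner, the rest are host names.
        $1 == owner {
            for (i = 2; i <= NF; i++)
                if ($i == host) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Example:
# host_allowed users.ALL.allow ob-nile tyne && echo "tyne may use ob-nile tapes"
```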

Step 5: Type the command acsss_config and choose option 6 – Rebuild Access Control information. Do a ps -ef and check that the process watch_vols is started.

Step 6: From now on all tapes entered through the CAP will have permissions set by ACSLS. Tapes already in the LSM will need to have permissions set by the admin. From cmd_proc do a:

set owner {owner} volume {barcode_start}-{barcode_end}

or, as a real world example:

set owner "ob-nile" volume 000000-199999

Tapes not matched in vol_attr.dat will be owned by SYSTEM. You can see volume ownership by issuing the command:

# /export/home/ACSSS/bin/volrpt -d -f /export/home/ACSSS/data/external/volrpt/owner_id.volrpt

Step 7: Keep an eye on acsss_event.log in case a misconfiguration prevents tape mounts/dismounts. An error message similar to the one below is displayed:

16:15:31 29-09-2008 QUERY[0]:
728 N cl_ac_vol_access.c 1 265
cl_ac_vol_access: Volume Access Denied
Command , Volume <500499>, Host ID <10.1.22.102>, Access ID <>
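
Keeping an eye on the log is easier with a small filter. This sketch assumes the event always looks like the sample above, with the Volume/Host ID line directly after the "Volume Access Denied" line; denied_volumes is a hypothetical helper name, and the location of acsss_event.log is left to you.

```shell
#!/bin/sh
# Read acsss_event.log lines on stdin and print "<volume> <host>" for each
# "Volume Access Denied" event. denied_volumes is a hypothetical helper and
# assumes the Volume/Host ID line follows the message line, as in the sample.
denied_volumes() {
    awk '
        /Volume Access Denied/ { denied = 1; next }
        denied && /Volume </ {
            # Cut out the text between "Volume <" and the next ">".
            vol = $0;  sub(/.*Volume </, "", vol);   sub(/>.*/, "", vol)
            # Same for the host address after "Host ID <".
            host = $0; sub(/.*Host ID </, "", host); sub(/>.*/, "", host)
            print vol, host
            denied = 0
        }
    '
}

# Example:
# denied_volumes < acsss_event.log | sort | uniq -c
```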

A monitoring routine should be implemented for tapes without an ACSLS owner. This can happen when multiple events occur at the same time.

Iperf & Data Domain

Having a good IP connection to your Data Domain is vital for good operation. One excellent tool to either reassure or concern you is the iperf tool. This is a small guide on how to use this great tool.

Prerequisite:
Backup server must have iperf installed. For Red Hat Linux you can install the package with:
yum install iperf.x86_64

Log on to DDOS by ssh and run the following command:
# net iperf server run

On the backup server run:
# iperf -c {DNS name of Data Domain} -t 10 -i 2

-t 10 tells iperf to run for 10 seconds
-i 2 will cause iperf to report in 2 sec intervals.

This will output something like:
------------------------------------------------------------
Client connecting to dd01.acme.com, TCP port 5001
TCP window size: 512 KByte (default)
------------------------------------------------------------
[ 3] local 10.1.1.1 port 10066 connected with 10.1.1.2 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 2.0 sec 1.20 GBytes 5.14 Gbits/sec
[ 3] 2.0- 4.0 sec 1.18 GBytes 5.05 Gbits/sec
[ 3] 4.0- 6.0 sec 1.17 GBytes 5.04 Gbits/sec
[ 3] 6.0- 8.0 sec 1.18 GBytes 5.07 Gbits/sec
[ 3] 8.0-10.0 sec 1.18 GBytes 5.06 Gbits/sec
[ 3] 0.0-10.0 sec 5.90 GBytes 5.07 Gbits/sec

This shows there is 5Gbit/sec of available network bandwidth, and it’s safe to conclude that no network issues exist between the backup server and the Data Domain appliance.
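
If you want to run the check from a script instead of reading the output by eye, the summary line can be picked out with awk. A sketch under a few assumptions: the output format is the one shown above, the bandwidth is reported in Gbits/sec, and the function names and the 4 Gbit/sec limit in the commented example are made up for illustration.

```shell
#!/bin/sh
# Read iperf client output on stdin and print the bandwidth of the last
# report line, which is the summary for the whole run.
# Only lines reported in Gbits/sec are considered.
iperf_summary_gbits() {
    awk '/Gbits\/sec/ { bw = $(NF - 1) } END { if (bw != "") print bw }'
}

# Succeed if the measured bandwidth is at least MIN Gbit/sec.
# Usage: iperf ... | bandwidth_at_least MIN
bandwidth_at_least() {
    bw=$(iperf_summary_gbits)
    [ -n "$bw" ] &&
        awk -v bw="$bw" -v min="$1" 'BEGIN { exit((bw + 0 >= min + 0) ? 0 : 1) }'
}

# Example (the limit of 4 is only an illustration):
# iperf -c {DNS name of Data Domain} -t 10 -i 2 | bandwidth_at_least 4 || echo "network looks slow"
```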

Update: the iperf client and server used here are not compatible with version 3 of iperf (iperf3).