Netbackup buffer tuning.

Applies to Netbackup versions 3.x, 4.x, 5.x, 6.x and 7.x

Any site using Netbackup should take the time to configure and adjust the BPTM buffers. It does improve performance, a LOT.

Just to show what a difference the buffer settings really make for an LTO3 drive:

Before tuning: 19MB/sec (default values)

Second tuning attempt: 49MB/sec (using 128K block size, 64 buffers)

Final result: 129MB/sec  (using 128K block size and 256 buffers)

Since an LTO3 drive has a native transfer rate of 80MB/sec, any further tuning attempts are pointless.

Buffer tuning advice could not be found anywhere in the Netbackup manuals until NBU 5.x was released. It’s now firmly documented in tech notes and in the “Netbackup backup planning and performance tuning guide”.

All Netbackup installations have a directory called db in /usr/openv/netbackup. The db directory is home to various configuration files, and the files that control the memory buffer size and the number of buffers live here (in db/config). To fully understand what the settings control, I need to get a little technical. Incoming backup data from the network is not written to tape immediately; it’s buffered in memory first. If you have ever wondered why the Netbackup activity monitor shows files being written even though no tape was mounted, this is why.

Data is received by a bptm process and written into memory. The bptm process works on a block basis. Once a block is full, it’s written to tape by another bptm process. Since the memory block size is equal to the SCSI block size on tape, we need to take care! By using a bigger block size we can improve tape writes by reducing system overhead. Too small a block size causes extensive repositioning (shoe shining), which also leads to:

  • Slow backups, because the tape drive spends time repositioning instead of writing data.
  • Increased wear on media and tape drive, because of multiple tape passes over the R/W head.
  • Increased build-up of magnetic material on the Read/Write head. Often a cleaning tape can’t remove this dirt!

The block size and buffer count are controlled by the Netbackup config files SIZE_DATA_BUFFERS and NUMBER_DATA_BUFFERS. SIZE_DATA_BUFFERS defines the size (see table below) and NUMBER_DATA_BUFFERS controls how many buffers are reserved for each stream. Memory used for buffers is allocated in shared memory. Shared memory cannot be paged or swapped.

Block size                          SIZE_DATA_BUFFERS value
32K block size                      32768
64K block size                      65536
128K block size                     131072
256K block size (default size)      262144

As far as I know you can’t set the value to anything higher than 256KB because it is the biggest supported SCSI block size. To ensure enough free memory, use this formula to calculate memory consumption:

number of tape drives  * MPX * SIZE_DATA_BUFFERS * NUMBER_DATA_BUFFERS = TOTAL AMOUNT OF SHARED MEMORY

or a real world example

8 tape drives * 5 MPX streams * 128K block size * 128 buffers per stream = 640MB

If all drives and streams are in use, total memory usage is 640MB (shared memory). Just remember that this amount changes if any parameter is changed, e.g. number of drives, larger MPX setting etc. I think a reasonable NUMBER_DATA_BUFFERS value would be 128 or 256. The value must be a power of 2.
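
The formula is easy to get wrong by hand, so here is a minimal sketch that does the arithmetic in the shell. The function names shm_bytes and shm_mb are my own, not Netbackup commands, and 1 MB is taken as 1024 * 1024 bytes.

```sh
#!/bin/sh
# Shared memory needed by the bptm buffers, in bytes.
# Arguments: tape drives, MPX streams per drive,
#            SIZE_DATA_BUFFERS, NUMBER_DATA_BUFFERS
shm_bytes() {
    echo $(( $1 * $2 * $3 * $4 ))
}

# Same figure expressed in MB (1 MB = 1024 * 1024 bytes here).
shm_mb() {
    echo $(( $(shm_bytes "$1" "$2" "$3" "$4") / 1024 / 1024 ))
}

# 8 drives, 5 MPX streams, 128K buffers, 128 buffers per stream
shm_bytes 8 5 131072 128    # prints 671088640
shm_mb 8 5 131072 128       # prints 640
```

Re-run it with your own drive count and MPX setting before changing the buffer files, and compare the result with the free memory on the media server.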

Configuring SIZE_DATA_BUFFERS & NUMBER_DATA_BUFFERS:

The cookbook (the bullet proof version). We assume that the wanted configuration is a 256K block size and 256 memory buffers per stream.

touch /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS
touch /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS
echo 262144 > /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS
echo 256 > /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS
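
If you change these values often, a small helper can make the step repeatable. The sketch below is only a suggestion: set_buffer_value is a hypothetical function name, and the check that SIZE_DATA_BUFFERS is a multiple of 1024 is an assumption on my part that you should verify against the tuning guide for your version. It overwrites the file instead of appending, so running it twice never leaves two values in the file.

```sh
#!/bin/sh
# Write one buffer setting file, replacing any previous value.
# $1 = config directory, $2 = file name, $3 = value
set_buffer_value() {
    dir=$1 name=$2 value=$3
    case $value in
        ''|*[!0-9]*)
            echo "value must be a whole number: $value" >&2
            return 1 ;;
    esac
    # Assumption: SIZE_DATA_BUFFERS should be a multiple of 1024.
    if [ "$name" = "SIZE_DATA_BUFFERS" ] && [ $(( value % 1024 )) -ne 0 ]; then
        echo "SIZE_DATA_BUFFERS must be a multiple of 1024: $value" >&2
        return 1
    fi
    printf '%s\n' "$value" > "$dir/$name"
}

# Example (run on every media server):
# set_buffer_value /usr/openv/netbackup/db/config SIZE_DATA_BUFFERS 262144
# set_buffer_value /usr/openv/netbackup/db/config NUMBER_DATA_BUFFERS 256
```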

The configuration files need to be created on all media servers. You don’t need to bounce any daemons; the very next backup will use the new settings. But be careful to set the SIZE_DATA_BUFFERS value right: a misconfigured value will impact performance negatively. The last thing to do is to verify the settings. We can get the information from the bptm logs in /usr/openv/netbackup/logs/bptm. Look for the io_init messages in the bptm log.

00:03:25 [17555] <2> io_init: using 131072 data buffer size
00:03:25 [17555] <2> io_init: CINDEX 0, sched Kbytes for monitoring = 10000
00:03:25 [17555] <2> io_init: using 24 data buffers

The io_init messages show up for every new bptm process. The value in the brackets is the PID. If you need to see what happened, do a grep on the PID.
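
Instead of reading the log by eye, the io_init lines can be boiled down to one line per setting. This is a rough sketch that assumes the log line layout shown above (time, [PID], <level>, io_init: ...); bptm_buffer_settings is a name I made up.

```sh
#!/bin/sh
# Print "PID size <bytes>" and "PID count <buffers>" for every io_init
# buffer message read from standard input.
bptm_buffer_settings() {
    awk '
        $4 == "io_init:" && $5 == "using" && $7 == "data" {
            pid = substr($2, 2, length($2) - 2)   # strip the [ ] around the PID
            if ($8 == "buffer")  print pid, "size",  $6
            if ($8 == "buffers") print pid, "count", $6
        }
    '
}

# Example (the log file name is a placeholder):
# bptm_buffer_settings < /usr/openv/netbackup/logs/bptm/log.MMDDYY
```

For the three sample lines above this prints "17555 size 131072" and "17555 count 24".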

Getting data back – NUMBER_DATA_BUFFERS_RESTORE

Storing data on tape is one thing; getting it back fast is more important. When restoring data, the bptm process looks for a file called NUMBER_DATA_BUFFERS_RESTORE. The default value is 8 buffers, which in my opinion is WAY TOO LOW. Use a value of 256 or larger. To verify, grep for “mpx_setup_restore_shm”:

3:08:14.308 [3328] <2> mpx_setup_restore_shm: using 512 data buffers, buffer size is 262144
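
A large restore buffer count costs shared memory too, so it can help to see what each restore process actually allocates. The sketch below multiplies the two numbers from the log line; it assumes the line format shown above, and restore_shm_usage is a hypothetical name.

```sh
#!/bin/sh
# For each mpx_setup_restore_shm line on standard input, print
# "PID buffers size bytes", where bytes = buffers * size.
restore_shm_usage() {
    awk '
        $4 == "mpx_setup_restore_shm:" && $5 == "using" {
            pid = substr($2, 2, length($2) - 2)   # strip the [ ] around the PID
            buffers = $6
            size = $NF                            # last field is the buffer size
            printf "%s %d %d %d\n", pid, buffers, size, buffers * size
        }
    '
}
```

For the line above this prints "3328 512 262144 134217728", i.e. 128MB of shared memory for that restore.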

Disk Buffers NUMBER_DATA_BUFFERS_DISK.

Works the same way as NUMBER_DATA_BUFFERS. In NBU 5.1 the default buffer size was raised to 256KB (largest SCSI block size possible). You can, however, lower that value with SIZE_DATA_BUFFERS_DISK. If NUMBER_DATA_BUFFERS_DISK/SIZE_DATA_BUFFERS_DISK don’t exist, the values from NUMBER_DATA_BUFFERS/SIZE_DATA_BUFFERS are used.
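
The fallback rule can be written down as a few lines of shell. This is only an illustration of the lookup order described above, not how bptm is implemented; effective_disk_setting is a made-up name, and printing "default" when neither file exists is a placeholder of this sketch.

```sh
#!/bin/sh
# Print the value a disk backup would use for a setting:
# the *_DISK file if it exists, otherwise the tape file.
# $1 = config directory, $2 = NUMBER_DATA_BUFFERS or SIZE_DATA_BUFFERS
effective_disk_setting() {
    if [ -f "$1/${2}_DISK" ]; then
        cat "$1/${2}_DISK"
    elif [ -f "$1/$2" ]; then
        cat "$1/$2"
    else
        echo "default"   # neither file exists: the built-in default applies
    fi
}

# Example:
# effective_disk_setting /usr/openv/netbackup/db/config SIZE_DATA_BUFFERS
```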

Do a backup/restore test.

Since Netbackup does automatic block size determination, everything should work without any problem. However, please do a backup/restore test.

SCSI re-scan on Linux

No need to reboot your Linux system if you add new fibre channel devices to it. You can discover them online.

The command below initiates the Loop Initialization Protocol (LIP) for the selected adapter. The LIP tells the HBA to look for new devices. Since you can have multiple fibre channel HBAs, make sure to send the LIP to the right adapter.

Example:
# echo 1 > /sys/class/fc_host/host<number>/issue_lip

After a LIP the FC adapter knows about the new devices, but the OS does not know about them yet. The next command initiates a device discovery in the kernel SCSI layer.

# echo "- - -" > /sys/class/scsi_host/host<number>/scan

A tail on /var/log/messages or dmesg should unveil the newly found devices.

Feb 4 13:09:38 tamar kernel: Vendor: HP Model: Ultrium 5-SCSI Rev: I2DS
Feb 4 13:09:38 tamar kernel: Type: Sequential-Access ANSI SCSI revision: 06
Feb 4 13:09:38 tamar kernel: scsi 0:0:0:0: Attached scsi generic sg0 type 1>
Feb 4 13:09:38 tamar kernel: st: Version 20070203, fixed bufsize 32768, s/g segs 256
Feb 4 13:09:38 tamar kernel: st 0:0:0:0: Attached scsi tape st0
Feb 4 13:09:38 tamar kernel: st0: try direct i/o: yes (alignment 512 B)
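
On a host with many adapters the two steps can be wrapped in a loop. Treat the following as a sketch under assumptions: rescan_all_hosts is a hypothetical name, it must run as root, and it sends a LIP to every FC HBA, which may disturb running I/O, so on a production system you may prefer to name the adapter explicitly as shown above. The optional argument only exists so the function can be tried against a scratch directory instead of /sys.

```sh
#!/bin/sh
# Issue a LIP on every FC HBA, then rescan every SCSI host.
# $1 = sysfs mount point (defaults to /sys)
rescan_all_hosts() {
    sys=${1:-/sys}
    for lip in "$sys"/class/fc_host/host*/issue_lip; do
        [ -e "$lip" ] || continue      # no FC HBAs present
        echo 1 > "$lip"
    done
    for scan in "$sys"/class/scsi_host/host*/scan; do
        [ -e "$scan" ] || continue     # no SCSI hosts present
        echo "- - -" > "$scan"         # wildcard channel, target and LUN
    done
}

# Example: rescan_all_hosts && dmesg | tail
```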

Netbackup OID

Originator ID as of NBU 7.6.0.4

Listed here because that is easier than looking them up in the manual 😀

18: Authentication Broker nbatd
111: Enterprise Media Manager nbemm
116: Policy Execution Manager nbpem
117: Job Manager nbjm
118: Resource Broker nbrb
119: BMR Master Server Daemon bmrd
121: BMR Save Configuration bmrsavecfg
122: BMR Client Utility bmrc
123: BMR Server Utility bmrs
124: BMR Create Floppy bmrcreatefloppy
125: BMR Create SRT bmrsrt
126: BMR Prepare to Restore bmrprep
127: BMR Setup Commands bmrsetup
128: BMR Libraries and Common Code bmrcommon
129: BMR Edit Configuration Utility bmrconfig
130: BMR Create Package bmrcreatepkg
131: BMR Restore Utility bmrrst
132: NetBackup Service Layer nbsl
134: NDMP Agent ndmpagent
137: Libutil and libmessaging
140: Media Server UI
142: BMR External Procedure bmrepadm
143: EMM Media and Device Selection
144: EMM Device Allocator
151: NDMP ndmp
154: BMR Override Table Admin Utility bmrovradm
156: NBACE
158: Resource Access Interface
159: Transmater
163: NetBackup Service Monitor nbsvcmon
166: nbvault nbvault
178: Disk Service Manager
199: FT Server nbftsrvr
200: FT Client nbftclnt
201: FT Service Manager
202: Storage Service
210: Exchange FireDrill Wizard ncfive
219: Resource Event Manager
220: Disk Polling Service
221: Media Performance Monitor Service mpms
222: Remote Monitoring & Management Service nbrmms
226: Storage Services nbstserv
230: Remote Disk Service Manager
231: Event Management Service nbevtmgr
248: BMR Launcher Utility bmrlaunch
254: NetBackup Recovery Assistant for Sharepoint Portal Server
261: Artifact Generator Generated Source aggs
263: Windows GUI wingui
264: Windows BAR GUI winbargui
271: Legacy Error Codes
272: Expiration Manager expmgr
286: Encryption Key Management Service nbkms
293: Netbackup Audit Service nbaudit
294: Netbackup Audit Messages
309: NetBackup Client Framework ncf
311: NetBackup Client/Server Communications
317: NetBackup Client Beds Plugin
318: NetBackup Client Windows Plugin
321: NetBackup Relational Database access library
348: NetBackup Client Oracle Plugin
351: Live Browse Client ncflbc
352: Granular Restore ncfgre
355: NetBackup TAR Plugin
356: NetBackup Client VxMS Plugin
357: NetBackup Restore ncfnbrestore
359: NetBackup Browser ncfnbbrowse
360: NetBackup OraUtil ncforautil
361: NetBackup Client DB2 Plugin
362: NetBackup Agent Request Server nbars
363: Database Agent Request Service
366: NetBackup Client Services ncfnbcs
369: Import manager impmgr
371: NetBackup Indexing Manager nbim
372: NetBackup Hold Service Manager
373: NetBackup Indexing Service Manager
375: NetBackup Client Search Server Plugin
377: NetBackup Client Component Discovery ncfnbdiscover
380: NetBackup Client Component Quiescence/Unquiescence ncfnbquiescence
381: NetBackup Client Component Offline/Online ncfnbdboffline
385: NetBackup Content Indexer ncfnbci
386: NetBackup Client VMware Plugin
387: NetBackup Remote Network Transport nbrntd
395: STS Event Manager
396: NetBackup Utilities nbutils
398: NB Search EV Ingest nbevingest
400: NetBackup Discovery nbdisco
401: NetBackup Client MSSQL Plugin
402: NetBackup Client Exchange Plugin
403: NetBackup Client SharePoint Plugin
412: NetBackup File System Plugin ncffilesyspi
428: NetBackup FlashBackup
433: BMR P2V Request utility bmrb2v
434: BMR P2V Restore utility bmrb2vrst
436: NetBackup Web Management Console nbwmc
439: NetBackup Web Service nbwebservice
443: NetBackup Core Web Service
444: NetBackup VM search Plugin
445: NetBackup Hardware Snapshot File Restore ncfnbhfr
447: NetBackup Session Manager ncfnbsessionmgr
448: NetBackup Single File Restore ncfsfr
450: NetBackup OpenStorage Proxy Server nbostpxy
451: NetBackup VM Copyback ncfnbvmcopyback
453: BMR Job Error Codes

 

bpflist

The Netbackup bpflist command is difficult to get working, but I got it working by remembering a few options:

  • Recursion level (-rl 999)
  • Policy Type (-pt {NAME})
  • -option GET_ALL_FILES

Using bpflist to list UNIX files
# bpflist -d 01/15/2011  -client acme123456789  -rl 999 -option GET_ALL_FILES

Using bpflist to list files on a Windows host
# bpflist -d 01/1/2011 -client acme123456789  -rl 999 -pt MS-Windows-NT  -option GET_ALL_FILES

Using bpflist to list files from a SAP backup
# bpflist -d 01/1/2011 -client acme123456789 -rl 999 -pt SAP -option GET_ALL_FILES
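
Since the options are easy to forget, a tiny helper that prints the full command line can save some typing. It only assembles the options used in the examples above and does not run bpflist itself; bpflist_cmd is a hypothetical name.

```sh
#!/bin/sh
# Print a bpflist command line with the options used above.
# $1 = date (MM/DD/YYYY), $2 = client name, $3 = optional policy type
bpflist_cmd() {
    cmd="bpflist -d $1 -client $2 -rl 999"
    if [ -n "${3:-}" ]; then
        cmd="$cmd -pt $3"
    fi
    echo "$cmd -option GET_ALL_FILES"
}

# Example:
# bpflist_cmd 01/15/2011 acme123456789
# bpflist_cmd 01/1/2011 acme123456789 SAP
```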

Valid Policy types in Netbackup 7.6
0 = Standard (UNIX)
4 = Oracle
6 = Informix-On-BAR
7 = Sybase
8 = MS-SharePoint
10 = NetWare
11 = DataTools-SQL-BackTrack
13 = MS-Windows
14 = OS/2
15 = MS-SQL-Server
16 = MS-Exchange-Server
17 = SAP
18 = DB2
19 = NDMP
20 = FlashBackup
21 = Split-Mirror
22 = AFS
25 = Lotus Notes
29 = FlashBackup-Windows
35 = NBU-Catalog
39 = Enterprise-Vault
40 = VMware
41 = MS-Hyper-V

Netbackup Status code 2074 & disk volume is down

Symptom:
Backups to a Disk Storage Unit (DSU) intermittently fail with status code 2074 and an “EMM status: Disk volume is down” message.

31-07-2011 15:24:08 - requesting resource jpto-dsu
31-07-2011 15:24:08 - requesting resource affmaster01.NBU_CLIENT.MAXJOBS.somehost001.acme.net
31-07-2011 15:24:08 - requesting resource affmaster01.NBU_POLICY.MAXJOBS.WIN_FS_JPTO
31-07-2011 15:24:08 - awaiting resource jpto-dsu.
31-07-2011 19:32:27 - Error nbjm (pid=30793) NBU status: 2074, EMM status: Disk volume is down

Cause:
McAfee was scanning the backup images written to the DSU.

Resolution:
Exclude the Netbackup image files in the DSU area from McAfee scanning.

See also Symantec Tech Note:
General recommendations for virus scanner exclusions working with NetBackup

The next tech note has a full description of how to exclude Netbackup in McAfee:

3RD PARTY: NetBackup Services are randomly shutting down on Windows servers after applying a patch for McAfee McShield 8.5 or 8.7i

 

Delay in NFS write operation using Data Domain and Netbackup

Symptom:
A 5-minute delay may be experienced when Netbackup 6.x or 7.x writes to an EMC Data Domain via an NFS share.

21/2012 11:59:14 - connecting
02/21/2012 11:59:14 - connected; connect time: 0:00:00
02/21/2012 11:59:15 - Info bptm (pid=28467) start
02/21/2012 11:59:15 - Info bptm (pid=28467) using 131072 data buffer size
02/21/2012 11:59:15 - Info bptm (pid=28467) using 128 data buffers
02/21/2012 11:59:15 - Info bptm (pid=28467) start backup
02/21/2012 12:05:16 - begin writing
02/21/2012 12:05:38 - Info bpbkar (pid=28466) bpbkar waited 4885 times for empty buffer, delayed 4917 times
02/21/2012 12:05:38 - Info bptm (pid=28467) waited for full buffer 1 times, delayed 28 times

Cause:
Incorrect NFS mount options. The Netbackup debug logs (bptm) do not give any indication of why this delay is present, even with VERBOSE = 5.

Resolution:
EMC recommends the following mount options (kernel 2.6 or newer):

mount -t nfs -o hard,intr,nfsvers=3,tcp,bg rstr01:/backup /dd/backup

or /etc/fstab options (all in one line):

duplo01:/data/col1/backup2 /backup2 nfs nolock,hard,intr,nfsvers=3,tcp,retry=10,rsize=32768,wsize=32768,bg 0 0
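
To check an existing /etc/fstab entry against the recommended options, something like the following sketch can be used. missing_mount_opts is a name I made up, and the list of required options is whatever you pass in. Note that it compares option strings literally, so it fits fstab entries better than /proc/mounts, where the kernel may spell the same options differently.

```sh
#!/bin/sh
# Print every required option that is missing from a comma-separated
# mount option string.
# $1 = option string, remaining arguments = required options
missing_mount_opts() {
    have=",$1,"
    shift
    for opt in "$@"; do
        case $have in
            *",$opt,"*) ;;          # option present
            *) echo "$opt" ;;       # option missing
        esac
    done
}

# Example: check the options (4th field) of the /backup2 entry
# opts=$(awk '$2 == "/backup2" { print $4 }' /etc/fstab)
# missing_mount_opts "$opts" hard intr nfsvers=3 tcp
```

No output means all the listed options are present.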

How to retrieve tape drive serial numbers with sg_inq

This small piece of code lets you retrieve the serial numbers of your tape drives. Quite convenient when configuring or replacing a lot of tape drives.

#!/usr/bin/ksh
# Print the serial number of every tape drive configured in Netbackup.
OS=`uname`
if [ "$OS" != "Linux" ]
then
    echo "Script will not work on non-Linux variants"
    exit 1
fi
if [ ! -f /usr/bin/sg_inq ]
then
    echo "sg_inq command not found. Do you have the sg3_utils package installed?"
    exit 1
fi
# Fields 8 and 9 of the tpconfig drive lines hold the drive name and device path
tpconfig -l | grep drive | grep "/dev/n" | awk '{ print $8 " " $9 }' | while read NAME DEVICE
do
    # No newline here, so the serial number ends up on the same line
    printf "Drive %s, %s" "$NAME" "$DEVICE"
    /usr/bin/sg_inq "$DEVICE" | grep "serial"
done

Output will look like this:
# ./get_device_serial_number

Drive 0315-2F, /dev/nst5 Unit serial number: HU19477MA8
Drive 0011-2F, /dev/nst0 Unit serial number: HU10623E8J
Drive 0116-2F, /dev/nst4 Unit serial number: HU1052790W
Drive 10115-TN, /dev/nst1sg_inq: error opening file: /dev/nst1: Device or resource busy

A “Device or resource busy” message means the tape drive is busy (e.g. in use via SSO), which prevents sg_inq from reading the tape drive serial number. Wait for the current tape operation to finish and re-run the script.
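
If you want only the serial number itself, for example to paste into a spreadsheet, the sg_inq output can be trimmed with sed. This sketch assumes the "Unit serial number: <value>" line format shown in the output above; serial_from_sg_inq is a hypothetical name.

```sh
#!/bin/sh
# Read sg_inq output on standard input and print only the serial number.
serial_from_sg_inq() {
    sed -n 's/^.*[Uu]nit serial number:[[:space:]]*//p'
}

# Example: an empty result usually means the drive was busy
# serial=$(/usr/bin/sg_inq /dev/nst0 2>/dev/null | serial_from_sg_inq)
# [ -n "$serial" ] || echo "/dev/nst0: no serial number read, drive busy?"
```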

The efficiencies of EMC BOOST Distributed Segment Processing

This shows what EMC Data Domain BOOST and distributed segment processing can do to reduce bandwidth usage. The picture shows the load on a 10G Ethernet link to a Data Domain before and after Distributed Segment Processing was enabled. Distributed segment processing was enabled on 29-05-2013 during the day. The difference was so big that the network department warned us that backup had stopped working 🙂

[Figure: boost-graph]

 

Distributed segment processing is turned on by default from DD OS 4.8 (the current DD OS version is 5.5*).
The graph is labelled Mbits/sec, but the values are actually in GB/sec.

RMAN input & output using Intelligent Policies in Netbackup 7.6

This is a tip on how to find the RMAN input/output when using Intelligent Policies in Netbackup 7.6.

Go to [install_path]\netbackup\logs\bpdbsbora

If the directory does not exist, create it and let Netbackup write debug information into the directory.

A series of log files is located there, named in the format log.MMDDYY

Where MM is the month
Where DD is the day
Where YY is the two-digit year

View one of the files and search for either:

“BEGIN LOGGING RMAN OUTPUT”

or

“BEGIN LOGGING RMAN INPUT”

Shell output will also be captured in the log file.

***************** BEGIN LOGGING RMAN INPUT *****************
# -----------------------------------------------------------------
# RMAN command section
# -----------------------------------------------------------------
RUN {
# Backup Archived Logs
sql 'alter system archive log current';
ALLOCATE CHANNEL ch00
TYPE 'SBT_TAPE'
PARMS 'SBT_LIBRARY=/usr/openv/netbackup/bin/libobk.so64';

***************** BEGIN LOGGING RMAN OUTPUT *****************
tset: standard error: Invalid argument
stty: standard input: Inappropriate ioctl for device
stty: standard input: Inappropriate ioctl for device
/opt/oracle/.profile[39]: tabs: not found [No such file or directory]
/opt/oracle/.profile[69]: .: ./.profile_body: cannot open [No such file or directory]
Recovery Manager: Release 11.2.0.2.0 - Production on Tue Jan 14 03:50:53 2014
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: ODSP112 (DBID=3565432847)
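
Searching by hand gets tedious on a busy server, so here is a rough sketch that pulls one section out of a log file. It assumes the marker lines look like the ones above and that a section runs until the next "LOGGING RMAN" marker or the end of the file; rman_section is a name I made up.

```sh
#!/bin/sh
# Print the lines that follow a "BEGIN LOGGING RMAN <section>" marker,
# up to the next "LOGGING RMAN" marker or end of file.
# $1 = INPUT or OUTPUT; the log is read from standard input.
rman_section() {
    awk -v want="BEGIN LOGGING RMAN $1" '
        /LOGGING RMAN/ { printing = (index($0, want) > 0); next }
        printing       { print }
    '
}

# Example: today's RMAN input, run from inside the bpdbsbora log directory
# rman_section INPUT < "log.$(date +%m%d%y)"
```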

If you are only interested in the RMAN script generated from templates, see /usr/openv/netbackup/logs/user_ops/dbtemplates/oracle

# cat runrman.1389786675.520.tmp
connect target /
# -----------------------------------------------------------------
# RMAN command section
# -----------------------------------------------------------------

RUN {

# Backup Archived Logs
sql 'alter system archive log current';

ALLOCATE CHANNEL ch00
TYPE 'SBT_TAPE'
PARMS 'SBT_LIBRARY=/usr/openv/netbackup/bin/libobk.so64';
SEND 'NB_ORA_CLIENT=clnt.acme.com,NB_ORA_SID=ODSP112,NB_ORA_SERV=srv.acme.com,NB_ORA_POLICY=ORA_IP,NB_ORA_PARENT_JOBID=847,NB_ORA_SCHED=3h';
SEND 'NB_ORA_PLOG=BRMJBD5';
BACKUP
FORMAT 'arch_d%d_u%u_s%s_p%p_t%t'
ARCHIVELOG
ALL;

RELEASE CHANNEL ch00;

# Control file backup

ALLOCATE CHANNEL ch00
TYPE 'SBT_TAPE'
PARMS 'SBT_LIBRARY=/usr/openv/netbackup/bin/libobk.so64';
SEND 'NB_ORA_CLIENT=clnt.acme.com,NB_ORA_SID=ODSP112,NB_ORA_SERV=srv.acme.com,NB_ORA_POLICY=ORA_IP,NB_ORA_PARENT_JOBID=847,NB_ORA_SCHED=3h';
SEND 'NB_ORA_PLOG=BRMJBD5';
BACKUP
FORMAT 'ctrl_d%d_u%u_s%s_p%p_t%t'
CURRENT CONTROLFILE;
RELEASE CHANNEL ch00;
}

Netbackup 7.6 KMS & FIPS

Netbackup 7.6 GA shows this message when configuring KMS (Key Management Service) for the first time:

KMS can be configured to run in FIPS or non-FIPS mode. Please make certain you have read the documentation to understand the differences and best practices for these two modes.

Also, the listing of a configured key group makes it look like FIPS compliance is supported.

# nbkmsutil -listkeys -kgname ENCR_acme
Key Group Name : ENCR_acme
Supported Cipher : AES_256
Number of Keys : 1
Has Active Key : Yes
Creation Time : Mon Jan 6 17:33:47 2014
Last Modification Time: Mon Jan 6 17:33:47 2014
Description : -

Key Tag : xxx
Key Name : acme_encryption
Current State : ACTIVE
Creation Time : Mon Jan 6 17:37:31 2014
Last Modification Time: Mon Jan 6 17:37:31 2014
Description : -
FIPS Approved Key : No

Solution:
Both text messages should have been removed from the Netbackup 7.5 GA version. At the time of writing, the code needed to support FIPS is not implemented in Netbackup. An Etrack has been opened to remove the text until FIPS support has been added to Netbackup KMS.