Sharing what I know – Page 4 – A Netbackup Admin hideout

Converting MS SQL’s NBIMAGE to Netbackup backupid.

This short article shows how to find Netbackup’s backup ID from SQL Server’s NBIMAGE name.
This is the NBIMAGE id from SQL Server: NBIMAGE “SQLDKBA057.MSSQL7.SQLDKBA057.db.portus1_SITE.~.0.001of001.20070407082742..C”
First, run bpflist on the master server. bpflist is not well known but very useful. Narrow down the time frame as much as possible.

# bpflist -d 04/5/2007 -e 04/07/2007 -policy NT_SQL -client sqldkba057.corp.novocorp.net -U

Client: sqldkba057.corp.novocorp.net
Policy: NT_SQL
Backup ID: sqldkba057.corp.novocorp.net_1175927291
Backed up: Sat Apr 07 2007 08:28:11 (1175927291)
Software Version: ?
Policy Type: MS-SQL-Server
Schedule Type: UBAK
Version: 7
Keyword: ?
Num Files: 4
Files:
FN=1 L=698674688 PL=76 DL=151 BK=0 II=1 RS=0 GB=141733920801 DN=-1 P=/SQLDKBA057.MSSQL7.SQLDKBA057.db.portus1_SITE.~.0.001of001.20070407082742..C D=33152 spsservices spsservices 698674688 1175929566 1175929566 1175929566 1 12 13 0 14 0 0 3 1175927278 1 2 MSSQL Client NetBackup App MSSQL_DATA Mdb
FN=2 L=0 PL=76 DL=153 BK=70570619 II=1 RS=0 GB=0 DN=-1 P=/SQLDKBA057.MSSQL7.SQLDKBA057.db.portus1_SITE.~.0.001of001.20070407082742..C D=33152 spsservices spsservices 0 1175929566 1175929566 1175929566 1 12 13 7 17 0 0 3 1175929566 2 2 MSSQL Client NetBackup App PRIMARY MSSQL_METADATA_FG
FN=3 L=0 PL=76 DL=198 BK=70570623 II=1 RS=0 GB=0 DN=-1 P=/SQLDKBA057.MSSQL7.SQLDKBA057.db.portus1_SITE.~.0.001of001.20070407082742..C D=33152 spsservices spsservices 0 1175929566 1175929566 1175929566 1 12 13 35 20 0 12 3 1175929566 3 2 MSSQL Client NetBackup App D:\SqlData\Data01\\portus1_SITE.mdf MSSQL_METADATA_FILES portus1_SITE
FN=4 L=0 PL=76 DL=207 BK=70570627 II=1 RS=0 GB=0 DN=-1 P=/SQLDKBA057.MSSQL7.SQLDKBA057.db.portus1_SITE.~.0.001of001.20070407082742..C D=33152 spsservices spsservices 0 1175929566 1175929566 1175929566 1 12 13 38 22 0 16 3 1175929566 4 2 MSSQL Client NetBackup App D:\SqlData\Log01\\portus1_SITE_log.LDF MSSQL_METADATA_LOGFILE portus1_SITE_log

Look at each FN= line and compare the path after P= with the NBIMAGE id. If they match, the Backup ID shown at the top of that listing is the one you want. If not, there may be several database backups within the time frame you specified, so keep looking.
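The manual comparison above can be scripted. The following is only a sketch: it assumes the 14-digit field in the NBIMAGE name is a YYYYMMDDhhmmss stamp, that bpflist -U output looks like the listing above, and that one day either side is a sensible search window. The function names are made up for this example.

```python
import re
from datetime import datetime, timedelta

NBIMAGE = "SQLDKBA057.MSSQL7.SQLDKBA057.db.portus1_SITE.~.0.001of001.20070407082742..C"

# Shortened copy of the bpflist -U listing above (header lines plus part of FN=1).
SAMPLE = """\
Client: sqldkba057.corp.novocorp.net
Policy: NT_SQL
Backup ID: sqldkba057.corp.novocorp.net_1175927291
Files:
FN=1 L=698674688 PL=76 DL=151 BK=0 II=1 RS=0 GB=141733920801 DN=-1 P=/SQLDKBA057.MSSQL7.SQLDKBA057.db.portus1_SITE.~.0.001of001.20070407082742..C D=33152 spsservices
"""


def nbimage_timestamp(nbimage):
    """Extract the 14-digit stamp (assumed YYYYMMDDhhmmss) from an NBIMAGE name."""
    match = re.search(r"\.(\d{14})\.", nbimage)
    if match is None:
        raise ValueError("no 14-digit timestamp found in NBIMAGE name")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


def search_window(nbimage, days=1):
    """Suggest -d / -e dates for bpflist: the stamp minus and plus `days`."""
    stamp = nbimage_timestamp(nbimage)
    fmt = "%m/%d/%Y"
    start = (stamp - timedelta(days=days)).strftime(fmt)
    end = (stamp + timedelta(days=days)).strftime(fmt)
    return start, end


def find_backup_id(bpflist_output, nbimage):
    """Return the Backup ID whose FN= lines carry P=/<nbimage>, else None."""
    current_id = None
    for raw in bpflist_output.splitlines():
        line = raw.strip()
        if line.startswith("Backup ID:"):
            # Remember the most recent header; FN= lines below belong to it.
            current_id = line.split(":", 1)[1].strip()
        elif line.startswith("FN=") and f"P=/{nbimage} " in line:
            return current_id
    return None


print(search_window(NBIMAGE))
print(find_backup_id(SAMPLE, NBIMAGE))
```

In practice you would feed the function the captured output of the bpflist command instead of the SAMPLE string.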

A Netbackup admin test toolbox

As a Netbackup admin you are probably no stranger to solving other people’s problems. The most common one is very bad backup performance. This article describes the test tools I use in my day-to-day work.

vxbench
A utility created by Symantec (previously Veritas). It’s a splendid tool for finding bad disk performance, read or write. It works best on a VxFS file system (obviously). Vxbench has different workloads built in (sequential read/write, random read/write) and you can specify the block size as well. I always check new disk storage units with vxbench before putting them into production. Vxbench is available for Solaris, AIX, HP-UX and Linux. The package is called VRTSspt and can be downloaded from the Symantec site.

Examples:
vxbench -w write -i iosize=128,iocount=262144 /diskstu4/dsu/testfile1

output:
total: 111.531 sec 300852.32 KB/s cpu: 48.65 sys 0.04 user
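To sanity-check a vxbench result, the reported rate can be recomputed from the arguments. This is a small sketch that assumes iosize is given in KB; the numbers in the example are consistent with that, but it is an assumption, not something taken from the vxbench documentation.

```python
def vxbench_rate_kb_s(iosize_kb, iocount, seconds):
    """Throughput in KB/s, assuming iosize is expressed in KB."""
    return iosize_kb * iocount / seconds


# Values from the example command and its output above.
total_kb = 128 * 262144          # 33,554,432 KB, i.e. 32 GiB written
rate = vxbench_rate_kb_s(128, 262144, 111.531)

print(f"written: {total_kb / 1024 ** 2:.0f} GiB")
print(f"rate:    {rate:.0f} KB/s (vxbench reported 300852.32 KB/s)")
```

The recomputed figure lands within 1 KB/s of what vxbench printed; the small difference comes from the elapsed time being rounded to three decimals.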

You can get the VRTSspt package from here: http://www.symantec.com/docs/TECH27451

tcpdump
Whenever a firewall closes in on you, tcpdump is your friend. You don’t need to understand all the output; it’s reasonably easy to see connections in and out.

Netbackup only uses ports 1556 (PBX) and 13724 (vnetd, for backward compatibility).

Here is a list of my most often used tcpdump commands. I always use the following arguments:

-i Specifies which interface to listen on, e.g. eth6

-f Prints foreign IPv4 addresses numerically rather than symbolically

-n Prevents addresses and service ports from being translated into names (prints 1556 instead of VRTSpbx).

Listen for traffic for an entire network
# tcpdump -n -f -i eth6 net 10.10.10.0/24

Listen for traffic for just one host
# tcpdump -n -f -i eth6 host 10.1.1.1

Or just one service port.
# tcpdump -n -f -i eth2 port ssh

You can also trace traffic for two hosts on the IP layer only.
# tcpdump -n -f -i eth1 ip host 10.224.13.1 or 10.224.13.2

Listen for traffic but don’t clutter the picture with your own SSH traffic
# tcpdump -n -f -i eth5 ip and not port 22
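As a complement to watching packets, a plain TCP connect test against the two Netbackup ports mentioned above shows quickly whether a firewall lets the connection through. This is only a sketch; the function names and the 3 second timeout are choices made for this example.

```python
import socket

# The two ports named in the text above.
NETBACKUP_PORTS = (1556, 13724)


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable or unresolvable all count as closed.
        return False


def check_host(host, ports=NETBACKUP_PORTS):
    """Map each port to True/False for the given host."""
    return {port: port_open(host, port) for port in ports}
```

Calling check_host with the name of a client or media server returns a dictionary such as {1556: True, 13724: False}. A False here while tcpdump on the far end sees nothing arriving usually points at something in between dropping the traffic.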

Using Netbackup bpbkar as test tool
You can run bpbkar (the process responsible for reading from disk) by hand to see how performance is when the network/tape drive layer is cut off. When bpbkar is run by hand, data is read from disk and thrown in the bit bucket. This lets the admin find out whether the problem is on the client side or the server side.

# Windows
[INSTALL_PATH]\NetBackup\bin\bpbkar32.exe -nocont D:\ 1> nul 2> nul

# Unix
/usr/openv/netbackup/bin/bpbkar -nocont -nofileinfo -nokeepalives /var > /dev/null 2> /tmp/file.out

Make sure you have created the bpbkar debug directory in [INSTALL_PATH]\NetBackup\logs before starting. The command returns immediately, but the process will be visible in Task Manager, and the debug log will grow in size as well.

If running bpbkar by hand takes the same amount of time as a “real” backup to tape or disk, you know the problem is on the client, and you know where to chase the next bottleneck. If it finishes much faster than the real backup, look at the network or the server side instead.

nbperfchk
Netbackup includes a disk performance test tool.

The tool is described in this tech note:
http://www.symantec.com/docs/HOWTO94369

ACSLS volume access control

Oracle ACSLS volume access control is a very useful feature when sharing a tape library between multiple hosts. Normal ACSLS operation allows all hosts to see all tapes. This is not wanted if you have multiple Netbackup domains attached, as tapes may be overwritten because of Netbackup’s “greedy” design of using any tape available. Careful configuration of the host applications may avoid this scenario, but that solution is vulnerable to errors. ACSLS’s volume access control feature adds an extra layer of security. This guide explains in detail how to configure it.

Step 1: Enable volume access control by starting acsss_config and choosing option 4, “Set Access Control Variables”.

Answer TRUE to “Access control is active for volumes”

Answer NOACCESS to “Default access for volumes ACCESS/NOACCESS”.

Step 2: Go to /export/home/ACSSS/data/external and edit the file vol_attr.dat. This file specifies which ranges of tapes are owned by whom. The owner is a definition, not a host.

Sample of vol_attr.dat – each field is delimited by a pipe sign |
000000-019999|ob-nile||force|
200000-299999|ob-nile||force|
300000-399999|ob-nile||force|
500000-599999|ob-main||force|
D20000-D39999|ob-triton||force|
D40000-D49999|ob-triton||force|
D10000-D19999|ob-proteus||force|

Field 1: tape range. Specify a range. Cleaning tapes live their own lives; you can’t set ownership on them.

Field 2: Owner of the tape range (a definition, not a host name). In this example ob means “owned by” and the last part is the master server name, but the name can be anything. Just be careful with special characters. Some versions of ACSLS have problems with the underscore sign “_”.

Field 3: pool id (not used at our site).

Field 4: force or blank. This option allows ACSLS to override previous volume ownership. I recommend setting this field to “force”.

Field 5: move-to-lsm (not used at our site). Here you can define a home LSM for the defined volume series. We let our tapes flow freely, so this field is blank.
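Before rebuilding access control it can help to check which owner a given barcode would get. The sketch below parses the five pipe-delimited fields described above. It assumes that barcodes within a range have the same length and compare as plain strings, and that lines starting with # are comments; neither is confirmed here, and the function names are made up for the example.

```python
from typing import NamedTuple


class VolAttr(NamedTuple):
    first: str        # first barcode in the range (field 1)
    last: str         # last barcode in the range (field 1)
    owner: str        # field 2
    pool_id: str      # field 3
    force: bool       # field 4
    move_to_lsm: str  # field 5


SAMPLE = """\
000000-019999|ob-nile||force|
200000-299999|ob-nile||force|
300000-399999|ob-nile||force|
500000-599999|ob-main||force|
D20000-D39999|ob-triton||force|
D40000-D49999|ob-triton||force|
D10000-D19999|ob-proteus||force|
"""


def parse_vol_attr(text):
    """Parse vol_attr.dat style lines into VolAttr entries."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # comment syntax is an assumption
            continue
        fields = line.split("|")
        fields += [""] * (5 - len(fields))    # pad short lines
        first, _, last = fields[0].partition("-")
        entries.append(VolAttr(first, last or first, fields[1], fields[2],
                               fields[3].lower() == "force", fields[4]))
    return entries


def owner_of(barcode, entries):
    """Return the owner of the first range containing barcode, else None."""
    for entry in entries:
        if len(barcode) == len(entry.first) and entry.first <= barcode <= entry.last:
            return entry.owner
    return None


entries = parse_vol_attr(SAMPLE)
print(owner_of("500499", entries))
print(owner_of("D25000", entries))
print(owner_of("450000", entries))
```

A barcode that falls outside every range, like the last one, comes back as None, which is a hint that the range list has a gap.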

Step 3: Go to /export/home/ACSSS/data/external/access_control/. Edit the file internet.addresses. This file maps IP addresses to names. The names do not need to match DNS, but I highly recommend keeping them that way. Add all hosts that will do mount/dismount requests.

Sample from internet.addresses (shortened for easy reading):

10.1.1.1 main
10.1.1.2 triton
10.1.1.3 proteus
10.1.1.4 congo
10.1.1.5 tyne

Step 4: Edit users.ALL.allow. This file decides which hosts defined in internet.addresses are allowed to see the tape ranges defined in vol_attr.dat. Specify all servers that are allowed to share/see the same tapes.

Sample of users.ALL.allow
ob-nile donau ganges mekong volga tyne hudson oder
ob-triton triton atlas gaia nyx rhea
ob-main congo gobi klat darwin indus

You can read the file as: tapes owned by “ob-nile” are accessible to hosts donau, ganges, mekong, volga, tyne, hudson and oder. All other tape series are filtered out by ACSLS.
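The way the two files work together can be modelled in a few lines. This is an illustrative sketch of the lookup as described above (IP address to name, then name against the owner’s allow list), not a description of how ACSLS implements it internally. The function names are made up, and # comment handling is an assumption.

```python
INTERNET_ADDRESSES = """\
10.1.1.1 main
10.1.1.2 triton
10.1.1.3 proteus
10.1.1.4 congo
10.1.1.5 tyne
"""

USERS_ALL_ALLOW = """\
ob-nile donau ganges mekong volga tyne hudson oder
ob-triton triton atlas gaia nyx rhea
ob-main congo gobi klat darwin indus
"""


def parse_internet_addresses(text):
    """Map IP address -> name, one pair per line."""
    hosts = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and not parts[0].startswith("#"):
            hosts[parts[0]] = parts[1]
    return hosts


def parse_users_allow(text):
    """Map owner -> set of names allowed to see that owner's tapes."""
    allow = {}
    for line in text.splitlines():
        parts = line.split()
        if parts and not parts[0].startswith("#"):
            allow[parts[0]] = set(parts[1:])
    return allow


def may_access(ip, owner, hosts, allow):
    """True if the host behind ip is listed for the given owner."""
    name = hosts.get(ip)
    return name is not None and name in allow.get(owner, set())


hosts = parse_internet_addresses(INTERNET_ADDRESSES)
allow = parse_users_allow(USERS_ALL_ALLOW)
print(may_access("10.1.1.5", "ob-nile", hosts, allow))   # tyne on ob-nile's list
print(may_access("10.1.1.2", "ob-nile", hosts, allow))   # triton is not
```

Running a check like this over every host that mounts tapes is a cheap way to spot a name that was added to one file but forgotten in the other.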

Step 5: Type the command acsss_config and choose option 6, “Rebuild Access Control information”. Do a ps -ef and check that the process watch_vols is started.

Step 6: From now on all tapes entered through the CAP will have permissions set by ACSLS. Tapes already in the LSM need to have their permissions set by the admin. From cmd_proc do a:

set owner {owner} volume {barcode_start}-{barcode_end}

or, as a real world example:

set owner "ob-nile" volume 000000-199999

Tapes not matched in vol_attr.dat will be owned by SYSTEM. You can see volume ownership by issuing the command:

# /export/home/ACSSS/bin/volrpt -d -f /export/home/ACSSS/data/external/volrpt/owner_id.volrpt

Step 7: Keep an eye on acsss_event.log in case a misconfiguration prevents tape mounts/dismounts. An error message similar to the one below is logged:

16:15:31 29-09-2008 QUERY[0]:
728 N cl_ac_vol_access.c 1 265
cl_ac_vol_access: Volume Access Denied
Command , Volume <500499>, Host ID <10.1.22.102>, Access ID <>
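Entries like the one above are easy to pick out of the log with a short script. The sketch below assumes the volume and host always appear on the line directly after “Volume Access Denied”, in the format shown; the function name is made up for the example.

```python
import re

# Matches the detail line that follows "Volume Access Denied".
DETAIL = re.compile(
    r"Volume <(?P<volume>[^>]*)>, Host ID <(?P<host>[^>]*)>, "
    r"Access ID <(?P<access_id>[^>]*)>"
)

SAMPLE = """\
16:15:31 29-09-2008 QUERY[0]:
728 N cl_ac_vol_access.c 1 265
cl_ac_vol_access: Volume Access Denied
Command , Volume <500499>, Host ID <10.1.22.102>, Access ID <>
"""


def access_denied_events(log_text):
    """Return one dict per 'Volume Access Denied' entry found in the log."""
    events = []
    lines = log_text.splitlines()
    for index, line in enumerate(lines):
        if "Volume Access Denied" in line and index + 1 < len(lines):
            match = DETAIL.search(lines[index + 1])
            if match:
                events.append(match.groupdict())
    return events


for event in access_denied_events(SAMPLE):
    print(event["volume"], event["host"])
```

The host address it reports is the one to look up in internet.addresses and users.ALL.allow.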

A monitoring routine should be implemented for tapes without an ACSLS owner. This can happen when multiple events occur at the same time.

NB_ORA_PLOG

There is a new environment variable in Netbackup 7.6 called NB_ORA_PLOG. The variable is not documented in the Symantec NetBackup 7.6 for Oracle Administrator’s Guide.

Nevertheless, my curiosity wanted an answer to what the new variable is used for. This is the answer I got:

NB_ORA_PLOG parameter is a NetBackup internal mechanism to identify the progress log to be shared/referenced by multiple processes involved in template/Guided Recovery/Intelligent Policy operations. Users should never configure this setting, hence the reason it is not included in the NetBackup for Oracle Admin Guide.

So now you know 🙂

ddboost storage-unit show compression

Be aware that “ddboost storage-unit show compression {storage-unit}” only shows information about 16384 backup files. According to EMC this is “by design”.

With Data Domain OS 5.3 or newer you can instead use this command, which displays all files in an MTree:
# filesys show compression /data/col1/{mtree name} recursive no-sync

Iperf & Data Domain

Having a good IP connection to your Data Domain is vital for good operation. One excellent tool to either reassure or worry you is iperf. This is a small guide on how to use this great tool.

Prerequisite:
The backup server must have iperf installed. For Red Hat Linux you can install the package with:
yum install iperf.x86_64

Log on to DDOS by ssh and run the following command:
# net iperf server run

On the backup server run:
# iperf -c {DNS name of Data Domain} -t 10 -i 2

-t 10 runs iperf for 10 seconds
-i 2 makes iperf report in 2 second intervals.

This will output something like:
------------------------------------------------------------
Client connecting to dd01.acme.com, TCP port 5001
TCP window size: 512 KByte (default)
------------------------------------------------------------
[ 3] local 10.1.1.1 port 10066 connected with 10.1.1.2 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 2.0 sec 1.20 GBytes 5.14 Gbits/sec
[ 3] 2.0- 4.0 sec 1.18 GBytes 5.05 Gbits/sec
[ 3] 4.0- 6.0 sec 1.17 GBytes 5.04 Gbits/sec
[ 3] 6.0- 8.0 sec 1.18 GBytes 5.07 Gbits/sec
[ 3] 8.0-10.0 sec 1.18 GBytes 5.06 Gbits/sec
[ 3] 0.0-10.0 sec 5.90 GBytes 5.07 Gbits/sec

This shows there is about 5 Gbit/s of available network bandwidth, and it’s safe to conclude that no network issues exist between the backup server and the Data Domain appliance.

Update: the iperf client and server used here are not compatible with version 3 of iperf.
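If you run this test regularly, the interval lines can be parsed so that a dip in a single interval is not hidden by the average. The sketch below works on the output format shown above and treats the last matching line as the summary row. The cross-check at the end assumes iperf counts GBytes in units of 2^30 bytes and Gbits in units of 10^9 bits, which fits these numbers but is an assumption.

```python
import re

ROW = re.compile(
    r"\[\s*\d+\]\s+([\d.]+)-\s*([\d.]+)\s+sec\s+([\d.]+)\s+GBytes\s+([\d.]+)\s+Gbits/sec"
)

SAMPLE = """\
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 2.0 sec 1.20 GBytes 5.14 Gbits/sec
[ 3] 2.0- 4.0 sec 1.18 GBytes 5.05 Gbits/sec
[ 3] 4.0- 6.0 sec 1.17 GBytes 5.04 Gbits/sec
[ 3] 6.0- 8.0 sec 1.18 GBytes 5.07 Gbits/sec
[ 3] 8.0-10.0 sec 1.18 GBytes 5.06 Gbits/sec
[ 3] 0.0-10.0 sec 5.90 GBytes 5.07 Gbits/sec
"""


def parse_rows(output):
    """Return (start, end, gbytes, gbits_per_sec) for every data line."""
    rows = []
    for line in output.splitlines():
        match = ROW.search(line)
        if match:
            rows.append(tuple(float(value) for value in match.groups()))
    return rows


rows = parse_rows(SAMPLE)
intervals, summary = rows[:-1], rows[-1]   # last row is taken to be the summary

rates = [row[3] for row in intervals]
mean_rate = sum(rates) / len(rates)
worst_rate = min(rates)

# Recompute the summary rate from the transferred volume (unit assumption above).
start, end, gbytes, _ = summary
derived = gbytes * 2 ** 30 * 8 / (end - start) / 1e9

print(f"mean {mean_rate:.2f}, worst {worst_rate:.2f}, derived {derived:.2f} Gbits/sec")
```

A worst interval far below the mean is worth a second look even when the summary line looks healthy.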

Setting the Instance or Database field in the Netbackup 7.6 activity monitor from RMAN

You may have noticed the new field called “Instance or Database” in the Netbackup 7.6 activity monitor. This field is populated with the SID during backups using Intelligent Policies. But the joy does not stop here: you can also populate this field from custom scripts. Just set NB_ORA_SID= to the database SID in the RMAN SEND command in the RCV script.

This is a sample script - database SID is NMATEST:

# -----------------------------------------------------------------
# RMAN command section
# -----------------------------------------------------------------
RUN {
ALLOCATE CHANNEL ch00
TYPE 'SBT_TAPE'
PARMS 'SBT_LIBRARY=/usr/openv/netbackup/bin/libobk.so64';
SEND 'NB_ORA_CLIENT=ora1.mass.dk,NB_ORA_SID=NMATEST,NB_ORA_SERV=srv1.mass.dk,NB_ORA_POLICY=ORA_MANUAL,NB_ORA_SCHED=daily';
BACKUP
INCREMENTAL LEVEL=1
FORMAT 'bk_d%d_u%u_s%s_p%p_t%t'
DATABASE;
RELEASE CHANNEL ch00;
# Backup Archived Logs
sql 'alter system archive log current';
ALLOCATE CHANNEL ch00
TYPE 'SBT_TAPE'
PARMS 'SBT_LIBRARY=/usr/openv/netbackup/bin/libobk.so64';
ALLOCATE CHANNEL ch01
TYPE 'SBT_TAPE'
PARMS 'SBT_LIBRARY=/usr/openv/netbackup/bin/libobk.so64';
SEND 'NB_ORA_CLIENT=ora1.mass.dk,NB_ORA_SID=NMATEST,NB_ORA_SERV=srv1.mass.dk,NB_ORA_POLICY=ORA_MANUAL,NB_ORA_SCHED=daily';
BACKUP
FORMAT 'arch_d%d_u%u_s%s_p%p_t%t'
ARCHIVELOG
ALL;
DELETE ARCHIVELOG ALL BACKED UP 2 TIMES to DEVICE TYPE sbt;
RELEASE CHANNEL ch00;
RELEASE CHANNEL ch01;
# Control file backup
ALLOCATE CHANNEL ch00
TYPE 'SBT_TAPE'
PARMS 'SBT_LIBRARY=/usr/openv/netbackup/bin/libobk.so64';
SEND 'NB_ORA_CLIENT=ora1.mass.dk,NB_ORA_SID=NMATEST,NB_ORA_SERV=srv1.mass.dk,NB_ORA_POLICY=ORA_MANUAL,NB_ORA_SCHED=daily';
BACKUP
FORMAT 'ctrl_d%d_u%u_s%s_p%p_t%t'
CURRENT CONTROLFILE;
RELEASE CHANNEL ch00;
}
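The same SEND line appears three times in the script above, so it is easy for NB_ORA_SID to go missing from one of them. If you generate your RCV scripts, a small helper can build the line in one place. This is a hypothetical helper written for this example, not part of Netbackup or RMAN.

```python
def build_send(settings):
    """Build an RMAN SEND line from name/value pairs, in the order given."""
    for name, value in settings.items():
        # A comma or single quote in a value would break the SEND string.
        if "," in value or "'" in value:
            raise ValueError(f"unsupported character in {name}")
    pairs = ",".join(f"{name}={value}" for name, value in settings.items())
    return f"SEND '{pairs}';"


send_line = build_send({
    "NB_ORA_CLIENT": "ora1.mass.dk",
    "NB_ORA_SID": "NMATEST",
    "NB_ORA_SERV": "srv1.mass.dk",
    "NB_ORA_POLICY": "ORA_MANUAL",
    "NB_ORA_SCHED": "daily",
})
print(send_line)
```

The printed line is identical to the SEND lines in the sample script and can be dropped into each channel section.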

Clustered Netbackup 7.6 master server softlinks

Clustered Netbackup 7.6 master servers use an undocumented link that will cause issues when performing a catalog restore if the nbu_server resource is in either a stopped or failed state.

VCS creates a link upon startup from /usr/openv/var/global to /opt/VRTSnbu/var/global and removes it again when the nbu_server resource is taken offline.

Before performing any restore of the NBDB/EMM database, create the link manually:

# cd /usr/openv/var
# ln -s /opt/VRTSnbu/var/global global

Catalog restores will then work as described in the troubleshooting manual.
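If the restore is part of a scripted procedure, the link check can be automated. The sketch below creates the link only when it is missing and refuses to touch anything unexpected. The function name and return values are made up for this example; the default paths are the ones given above.

```python
import os


def ensure_link(link_path="/usr/openv/var/global",
                target="/opt/VRTSnbu/var/global"):
    """Create the symlink if it is missing; return 'present' or 'created'."""
    if os.path.islink(link_path):
        current = os.readlink(link_path)
        if current != target:
            raise RuntimeError(f"{link_path} points to {current}, expected {target}")
        return "present"
    if os.path.exists(link_path):
        # A real file or directory is in the way; do not overwrite it.
        raise RuntimeError(f"{link_path} exists and is not a symlink")
    os.symlink(target, link_path)
    return "created"
```

Calling ensure_link() with no arguments before the restore does the same as the two commands above, and is safe to repeat.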

Netbackup FT debugging

1: On the SAN client, set the debug logging:

vxlogcfg -a -p 51216 -o 200 -s DebugLevel=6 -s DiagnosticLevel=6
vxlogcfg -a -p 51216 -o 137 -s DebugLevel=6 -s DiagnosticLevel=6
vxlogcfg -a -p 51216 -o 156 -s DebugLevel=6 -s DiagnosticLevel=6

2: Stop and start the “nbftclnt” service.

3: Capture the debug logging from the SAN client console.

nbftclnt -console “monitor
Successful discovery:
DeviceInquiry: EVPD Page 0x83 “SYMANTECFATPIPE 0.0 tamar”
GetScsiAddress:GetScsiAddress: m_DeviceName = (/dev/sg435)
AddDevice:/dev/sg435
Inquiry “SYMANTECFATPIPE 0.0 tamar”
TargetHBA:LUN:InitiatorHBA = 0:0:0x10 State = 1 RefCount = 0
ClosePTDeviceHandle:/dev/sg435 m_HandleOpenCount 0
DeviceInquiry: EVPD Page 0x83 “SYMANTECFATPIPE 0.1 tamar”
GetScsiAddress:GetScsiAddress: m_DeviceName = (/dev/sg436)
AddDevice:/dev/sg436
Inquiry “SYMANTECFATPIPE 0.1 tamar”
TargetHBA:LUN:InitiatorHBA = 0:1:0x10 State = 1 RefCount = 0
ClosePTDeviceHandle:/dev/sg436 m_HandleOpenCount 0
DeviceInquiry: EVPD Page 0x83 “0”