NetBackup admin’s test tools

As a NetBackup admin, you are probably no stranger to troubleshooting problems that lie outside NetBackup itself. The one I see most often is very poor backup performance. This article describes the test tools I use in my day-to-day work.

GEN_DATA

NetBackup can generate huge amounts of random data in memory through a file directive, which lets you performance test the underlying hardware. It’s quite a hidden gem. The GEN_DATA file directive works on UNIX and Linux only. Sorry, no Windows.

See tech note:
Documentation: How to use the GEN_DATA file list directives with NetBackup for UNIX/Linux Clients for Performance Tuning

vxbench

A utility created by Symantec (previously Veritas). It’s a splendid tool for finding bad disk performance, whether read or write. It works best on a VxFS file system (obviously). vxbench has different workloads built in (sequential read/write and random read/write), and you can specify the block size as well. I always check new disk storage units with vxbench before putting them into production. vxbench is available for Solaris, AIX, HP-UX and Linux. The package is called VRTSspt and can be downloaded from the Symantec site.

Examples:

vxbench -w write  -i iosize=128,iocount=262144 /diskstu4/dsu/testfile1

output:

total:  111.531 sec  300852.32 KB/s  cpu: 48.65 sys 0.04 user
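The KB/s figure can be cross-checked by hand. The sketch below assumes that iosize is given in KB (the numbers in the example are consistent with that), so the total amount written is iosize × iocount = 32 GB, divided by the elapsed time. The function name is my own and not part of vxbench.

```shell
# Cross-check a vxbench result: throughput = iosize * iocount / seconds.
# Assumption: iosize is in KB, which matches the sample output above.
vxbench_kbps() {
    # $1 = iosize in KB, $2 = iocount, $3 = elapsed seconds
    awk -v s="$1" -v c="$2" -v t="$3" 'BEGIN { printf "%.0f\n", s * c / t }'
}

vxbench_kbps 128 262144 111.531
```

The result should land very close to the 300852.32 KB/s that vxbench reported. If it is far off, the run was probably interrupted or the arguments were not what you thought.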

ttcp

A freeware Java-based network performance utility. It runs on any OS with a Java VM and can be obtained from Netcordia. To get reliable figures, the amount of data sent and received must be tuned.

On the receiver side: java ttcp -r -l 65536 -n 16384

And on the sender side: java ttcp -l 65536 -n 16384

The output looks like this:

Transmit: buflen= 65536  nbuf= 16384 port= 5001
Transmit connection:  Socket[addr=lena/10.1.22.134,port=5001,localport=59154].
Transmit: 1073741824 bytes in 10913 milli-seconds = 98391.08 KB/sec (787128.6 Kbps).
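The numbers in the summary line follow directly from the -l and -n arguments. A small sketch to reproduce them, assuming that ttcp counts a KB as 1000 bytes (which is what the sample output implies); the function name is made up for illustration:

```shell
# Reproduce the ttcp summary line: bytes = buflen * nbuf.
# Assumption: ttcp's "KB" is 1000 bytes, so KB/sec equals bytes per millisecond.
ttcp_rate() {
    # $1 = buflen (-l), $2 = nbuf (-n), $3 = elapsed milliseconds
    awk -v l="$1" -v n="$2" -v ms="$3" 'BEGIN {
        bytes = l * n
        printf "%d bytes, %.2f KB/sec, %.1f Kbps\n", bytes, bytes / ms, bytes * 8 / ms
    }'
}

ttcp_rate 65536 16384 10913
```

With 64 KB buffers and 16384 of them you move 1 GiB per run, which takes about 11 seconds on this link. On faster networks, raise -n so the test runs long enough to give a stable figure.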

tcpdump

Whenever a firewall closes in on you, tcpdump is your friend. You don’t need to understand every detail of the output; it’s reasonably easy to see connections going in and out.

Here is a list of my most often used tcpdump commands. I always use the following arguments:

-i  Specifies which interface to listen on, e.g. eth6

-f  Prints “foreign” IPv4 addresses numerically rather than symbolically

-n  Prevents addresses and service ports from being translated into names (prints 13720 instead of bpcd).

Listen for traffic for an entire network

# tcpdump -n -f -i eth6  net  10.10.10.0/24

Listen for traffic for just one host

# tcpdump -n -f -i eth6 host 10.1.1.1

Or just one service port:

# tcpdump -n -f -i eth2 port ssh

You can also trace traffic for two hosts, IP traffic only:

# tcpdump -n -f -i eth1 ip host 10.224.13.1 or 10.224.13.2

Listen for traffic, but don’t clutter the picture with your own SSH traffic

# tcpdump -n -f -i eth5 ip and not port 22

Using NetBackup bpbkar as a test tool

You can run bpbkar (the process responsible for reading from disk) by hand to see how the client performs when the network and tape drive layers are taken out of the picture. When you run bpbkar by hand, data is read from disk and thrown into the bit bucket. This lets you find out whether the problem is on the client side or the server side.

# Windows

d:\VERITAS\NetBackup\bin\bpbkar32.exe -nocont  D:\  1> nul 2> nul

# Unix

/usr/openv/netbackup/bin/bpbkar  -nocont -nofileinfo -nokeepalives /var  > /dev/null 2> /tmp/file.out

On Windows, make sure you have created the bpbkar debug directory in C:\Program Files\VERITAS\NetBackup\logs before starting. The command returns immediately, but the process is visible in Task Manager, and the debug log grows while it runs.

If bpbkar run by hand takes about the same amount of time as a real backup, you know the problem is on the client, and you know where to chase the next bottleneck.

Deleting an Application Rollback shadow image

The normal vssadmin command can’t delete shadow copies of type “ApplicationRollback”. To delete such a shadow image you need the vshadow command. On Windows 2003, get it from the Volume Shadow Copy Service SDK from Microsoft. Windows 2008 and newer have the command built in.

The vshadow command can do very powerful stuff – indeed a very interesting command.

N:\>vssadmin list shadows
vssadmin 1.1 – Volume Shadow Copy Service administrative command-line tool
(C) Copyright 2001 Microsoft Corp.

Contents of shadow copy set ID: {047a3cb0-04fe-4298-bfe9-0124ec79410b}
Contained 1 shadow copies at creation time: 8/27/2008 2:51:14 PM
Shadow Copy ID: {0d456a73-e8f4-4695-b0dd-59e55c190753}
Original Volume: (D:)\\?\Volume{8a9334e5-c416-11dc-95ba-806e6f6e6963}\
Shadow Copy Volume: \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy1
Originating Machine: appdkba795.acme.com
Service Machine: appdkba795.acme.com
Provider: ‘Microsoft Software Shadow Copy provider 1.0’
Type: ApplicationRollback
Attributes: Persistent, No auto release, Differential

Run the vshadow command to delete all shadow copies. Note that -da deletes every shadow copy on the system. The output below is for reference.

D:\vss_tools>vshadow -da

VSHADOW.EXE 2.2 – Volume Shadow Copy sample client
Copyright (C) 2005 Microsoft Corporation. All rights reserved.

(Option: Delete all shadow copies)
This will delete all shadow copies in the system. Are you sure? [Y/N] y

– Setting the VSS context to: 0xffffffff
– Deleting shadow copy {0d456a73-e8f4-4695-b0dd-59e55c190753} on \\?\Volume{8a9334e5-c416-11dc-95ba-806e6f6e6963}\ from provider {b5946137-7b9f-4925-af80-51abd60b20d5} [0x00020009]…

Alternate file location using install_client_files

I bet we have all been there: you need to install a NetBackup client, only to discover that there is no space left in /tmp. What now? Well, there is hope. With a little script hack it is possible to make the “install_client_files” script put its files in a directory other than /tmp.
  • Identify the client you want to push to.
  • Go to /usr/openv/netbackup/client/{HW}/{OS}/. For SuSE2.6 that would be /usr/openv/netbackup/client/Linux/Linux/SuSE2.6
  • Edit the file install_client.
  • Find the variable DEST_DIR=/tmp/bp.${pid} on line 891 (as of 6.5.4) and change the destination folder, e.g. DEST_DIR=/var/bp.${pid}
  • Save the file.
  • Push the agent.
  • Revert the change.
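The edit and the revert can be scripted so you don't forget to undo the change. This is only a sketch: the function name is my own, and it assumes the line really starts with DEST_DIR=/tmp/ as described above (the line number varies between releases, so it matches on the text instead).

```shell
# Sketch: point DEST_DIR in install_client at another base directory.
# Assumes the variable is set on a line of the form DEST_DIR=/tmp/bp.${pid}.
patch_dest_dir() {
    # $1 = path to install_client, $2 = new base directory (e.g. /var)
    file=$1
    newdir=$2
    cp -p "$file" "$file.orig" || return 1      # keep the original for the revert step
    # Rewrite only the /tmp/ prefix; the bp.${pid} part is left untouched.
    sed "s|^\([[:space:]]*DEST_DIR=\)/tmp/|\1$newdir/|" "$file.orig" > "$file"
}
```

Usage would be something like patch_dest_dir /usr/openv/netbackup/client/{HW}/{OS}/install_client /var before the push, and mv install_client.orig install_client afterwards to revert.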

How to use a LUN larger than 2TB with Veritas Volume Manager (VXVM)

If you want to use LUNs larger than 2TB with VxVM on a Linux host, you need to use parted instead of fdisk to create partitions, because fdisk is limited to 2TB partitions. The CDS feature is a no-go as well.
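The 2TB ceiling is not arbitrary. The classic MBR (msdos) partition table stores partition sizes as 32-bit sector counts, so the arithmetic below shows where the limit comes from. It assumes 512-byte sectors; the function name is only for illustration.

```shell
# Largest size a 32-bit sector count can describe, assuming 512-byte sectors.
mbr_limit_bytes() {
    echo $(( (1 << 32) * 512 ))
}

mbr_limit_bytes                                               # size in bytes
echo "$(( $(mbr_limit_bytes) / 1024 / 1024 / 1024 / 1024 )) TiB"
```

A GPT label uses 64-bit sector addresses, which is why the limit disappears once the disk is relabelled.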

We assume the disk is sdu. Before VxVM can use the disk, it must have a GPT label. Be careful: the change is written to the disk immediately.

# parted /dev/sdu mklabel gpt

The default format for vxdisksetup is CDS, which we can’t use here, so we revert to the old simple format:

# /etc/vx/bin/vxdisksetup -i sdu format=simple

From here it’s business as usual 😀

A related Symantec support page:

Initializing a LUN that is greater than 2TB creates usable space equal to, or less than 2TB with Volume Manager 5.0 on Red Hat Enterprise Linux 4 Update 3 or higher.

Unable to process. Duplication session in progress. return value = [134]

From time to time the nbstlutil command may return the following message when activating or inactivating Storage Lifecycle Policies (SLPs):

Unable to process. Duplication session in progress. return value = [134]

Since nbstlutil always returns status code 0, this is quite a problem if the command is part of a script. However, specifying the -wait argument makes nbstlutil retry until the duplication session is no longer in progress.

Please note the -wait argument is picky about where it’s specified.

This command will retry/wait:

# nbstlutil -wait inactive -lifecycle {slp_name}

But this one doesn’t:

# nbstlutil inactive -lifecycle {slp_name} -wait
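If a script cannot afford to block on -wait, another option is to look at the output instead of the exit status. The wrapper below is a sketch of that idea; the function name is hypothetical, and it relies only on the message text quoted above.

```shell
# Sketch: turn the "Duplication session in progress" message into a
# non-zero exit status, since nbstlutil itself returns 0.
fail_if_busy() {
    # Run the given command, capture stdout and stderr, and echo them back.
    out=$("$@" 2>&1)
    printf '%s\n' "$out"
    case $out in
        *"Duplication session in progress"*) return 1 ;;
    esac
    return 0
}
```

In a script it would be used like this: fail_if_busy nbstlutil inactive -lifecycle "$slp_name" || echo "SLP not changed, try again later" >&2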

Is your PCIe card running at full speed?

It’s pretty simple to check if you are running Linux. At a command prompt, just type:

# lspci -vv

The command will return a list of PCI-Express devices along with the supported and actual speed/width.

Any difference between the supported and the actual speed (or width) should make you consider whether to rearrange the HBA layout.

This card isn’t properly placed. The current PCIe slot has a width of x4, but the card actually requires an x8 slot:

4c:00.0 Ethernet controller: NetXen 10G Ethernet PCI Express (rev 25)
Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 0
Link: Latency L0s <64ns, L1 <1us
Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
Link: Speed 2.5Gb/s, Width x4

Sample of an HBA running as intended:

4f:00.1 Fibre Channel: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express
Link: Supported Speed 2.5Gb/s, Width x4, ASPM L0s, Port 0
Link: Latency L0s <4us, L1 unlimited
Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch+
Link: Speed 2.5Gb/s, Width x4
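On a server with many cards, comparing the lines by eye gets tedious. The awk sketch below does the comparison for the output format shown above ("Link: Supported Speed ..." versus "Link: Speed ..."). Newer lspci versions label these lines LnkCap and LnkSta instead, so the patterns would need adjusting there. The function name is my own.

```shell
# Sketch: report devices whose negotiated link differs from what they support.
# Assumes the "Link: Supported Speed" / "Link: Speed" format shown above.
check_pcie_links() {
    awk '
        # Pull "speed width" (e.g. "2.5Gb/s x8") out of a Link: line.
        function link_fields(line,    speed, width) {
            speed = width = "?"
            if (match(line, /Speed [^,]+/))   speed = substr(line, RSTART + 6, RLENGTH - 6)
            if (match(line, /Width x[0-9]+/)) width = substr(line, RSTART + 6, RLENGTH - 6)
            return speed " " width
        }
        /^[0-9a-f:]+\.[0-7] /   { dev = $1 }                  # device header line
        /Link: Supported Speed/ { cap = link_fields($0) }      # what the card supports
        /Link: Speed/ {                                        # what was negotiated
            sta = link_fields($0)
            if (sta != cap) print dev ": supports " cap ", running at " sta
        }
    '
}
```

Run it as lspci -vv | check_pcie_links. No output means every link runs at its supported speed and width.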

Status code 50 – nbrb status: NBRB deallocated orphaned resources

Symptom:
Backup jobs spanning media exit with status code 50 “client process aborted” and the error “nbrb status: RB deallocated orphaned resources”. Master and media servers are separated by a firewall.

Cause:
When an application opens a connection (socket), it uses the OS keepalive setting. However, the default keepalive interval may be longer than the firewall’s TCP close interval. If that happens, neither the master nor the media server can reuse an existing connection, because the firewall has already dropped it. This results in various CORBA/COMMS failures.

Cisco firewalls are configured by default to close inactive TCP sessions after 1 hour.

Resolution:
Set the OS to send keepalives at an interval shorter than the firewall’s TCP close interval, on both master and media servers.

Temporary change (Linux):
echo 1800 > /proc/sys/net/ipv4/tcp_keepalive_time

Permanent (Linux):
Add net.ipv4.tcp_keepalive_time=1800 to /etc/sysctl.conf and issue command “sysctl -p”
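Linux ships with a default tcp_keepalive_time of 7200 seconds, which is longer than the one-hour Cisco default mentioned above. A quick check like the sketch below can be dropped into a health script. The function names are made up, and the 3600-second default is an assumption taken from the Cisco figure; use your own firewall's value.

```shell
# Sketch: verify that the TCP keepalive timer is shorter than the
# firewall idle timeout. Both values are in seconds.
keepalive_ok() {
    # $1 = keepalive time, $2 = firewall idle timeout
    [ "$1" -lt "$2" ]
}

check_keepalive() {
    fw_timeout=${1:-3600}      # assumed firewall timeout, 1 hour by default
    current=$(cat /proc/sys/net/ipv4/tcp_keepalive_time) || return 2
    if keepalive_ok "$current" "$fw_timeout"; then
        echo "OK: keepalive ${current}s < firewall timeout ${fw_timeout}s"
    else
        echo "WARNING: keepalive ${current}s >= firewall timeout ${fw_timeout}s"
        return 1
    fi
}
```

Call check_keepalive on both master and media servers, or check_keepalive 1800 if your firewall drops idle sessions after 30 minutes.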

Netbackup version:
NetBackup 6.5.x and 7.x

How to set keep alive on other OS:
DOCUMENTATION: COMM_FAILURE as a consequence of reusing a transport that has been inactive across a firewall

Remove null byte from bplist output

bplist terminates each line of its output with a null byte. The null byte is not directly visible on the console, but it may show up as ^@. Piping the output through od -c (a UNIX tool) also shows that each line ends with \0\n.

Some regular expressions will not work correctly because of this extra invisible character, particularly those that must match at end of line (e.g. file extensions).

To resolve the issue use sed to strip off the null byte:

/usr/openv/netbackup/bin/bplist -B -C $CLIENT -R -t 13 / | sed 's/\x0//g'

You can also use a Perl one-liner (much faster):

/usr/openv/netbackup/bin/bplist -B -C $CLIENT -R -t 13 / | perl -pe 's/\x0+$//'

Regular expressions will now work as intended.
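The \x0 escape is not understood by every sed implementation, so on older UNIX systems the sed variant may silently do nothing. A portable alternative, sketched here with a helper name of my own, is to delete the null bytes with tr:

```shell
# Sketch: strip NUL bytes from a stream with plain tr.
strip_nul() {
    tr -d '\000'
}
```

It slots into the same pipeline, e.g. /usr/openv/netbackup/bin/bplist -B -C $CLIENT -R -t 13 / | strip_nul | grep '\.txt$' (the .txt pattern is only an example).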

Error in opening VxPBX CLI resource bundle

Symptom:
When starting Symantec PBX, the error message “Error in opening VxPBX CLI resource bundle” is returned. Running “vxlogview -p 50936 -i 103 -d all -t 00:10:00” returns:

11/18/11 15:36:43.841 [Application] VxICS 50936 103 PID:29866 TID:140051967465216 File ID:103 [No context] [Error] V-103-6 Error: Invalid argument.

Cause:
Incorrect link in /etc/init.d/:

/etc/init.d/vxpbx_exchange points to /opt/VRTSpbx/bin/pbx_exchange (which is a binary)

but should point to

/opt/VRTSpbx/bin/vxpbx_exchanged (which is a script)

Note the difference in the file names: pbx_exchange versus vxpbx_exchanged.

Resolution:
Remove the link in /etc/init.d and create the correct one:
# ln -s /opt/VRTSpbx/bin/vxpbx_exchanged /etc/init.d/vxpbx_exchanged
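To confirm the link before and after the fix, readlink shows where it actually points. The helper below is a small sketch with a hypothetical name; it only compares the link target with the path you expect.

```shell
# Sketch: check that a symlink points at the expected target.
check_init_link() {
    # $1 = symlink to inspect, $2 = expected target
    target=$(readlink "$1") || { echo "$1 is not a symlink"; return 2; }
    if [ "$target" = "$2" ]; then
        echo "OK: $1 -> $target"
    else
        echo "WRONG: $1 -> $target (expected $2)"
        return 1
    fi
}
```

For this case: check_init_link /etc/init.d/vxpbx_exchanged /opt/VRTSpbx/bin/vxpbx_exchanged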

NetBackup and the ACSLS firewall feature

The intent of this document is to show how the ACSLS firewall feature is configured in NetBackup. To be honest, both Symantec/Veritas and Sun/StorageTek have done a really sloppy job of documenting how to implement this feature. I hope this page helps.

Firewall-compliant ACSLS is supported from NetBackup 4.5 with FP9 onward. The feature uses port 30031/tcp by default, but another port can be chosen (not that I recommend it). It is essential that the firewall allows connections on 30031 to be initiated from both sides. If you are in doubt, use snoop on the ACSLS side and tcpdump on the client side to verify the traffic flow.

You also need to configure the ACSLS server for firewall operations. See the ACSLS manual.

How to configure on the client side:

An existing ACSLS server named “emulator” runs the old-style portmapper feature. A new server, “moldau”, runs firewall-compliant ACSLS.

# denotes commands.

1: Edit vm.conf. Add the following entries:

ACS_TCP_RPCSERVICE
ACS_CSI_HOSTPORT = {ACSLS_servername} 30031
ACS_SSI_INET_PORT = {ACSLS_servername} 30031

From the real world:

ACS_TCP_RPCSERVICE
ACS_CSI_HOSTPORT = emulator 0
ACS_SSI_INET_PORT = emulator 0
ACS_CSI_HOSTPORT = moldau 30031
ACS_SSI_INET_PORT = moldau 30031

A “0” (zero) as the port number selects the old-style RPC portmapper feature (port 111). If your media server has multiple NICs and you plan to direct traffic to and from the ACSLS server through a NIC that is not connected to the default gateway, add ACS_SSI_HOSTNAME = { DNS host name to use } to vm.conf. Otherwise traffic won’t flow correctly. If you are in doubt about which IP address ACSLS thinks it is using, see /usr/openv/volmgr/debug/acsssi/event.log. Look for a line like this:

[csi_rpctinit.c:433] ONC RPC: csi_rpctinit(): B2 SOCKET 3: family= 2 port=30031 IPaddr= 10.1.22.37

2: Add the devices in NetBackup (otherwise acsd won’t start any acsssi daemons for the new robot).

3: Stop NetBackup on the media server and make sure to kill any daemons that did not stop. On NetBackup 5.x in particular, acsd and acsssi need to be killed.

4: Delete any previously registered RPC services on the NetBackup servers. A stop/start of NetBackup may not do the job. If any previously registered RPC services are found, delete them with rpcinfo -d:

#rpcinfo -d 1073741824 1
#rpcinfo -d 1073741824 2

5: Start Netbackup.

6: Verify with acstest and rpcinfo -p. Look for two entries like these:

#rpcinfo -p
1073741824 2 tcp 49263 <-- Old style RPC
1073741825 2 tcp 30031 <-- Firewall compliant port.

7: Run NetBackup’s acstest like this:

#acstest -r ACSLS_HOST -s SSI_SOCKET -C qserver

If you have multiple ACSLS servers connected, you need to specify the -s SSI socket option; otherwise you can omit it. The first SSI socket is 13741 by default, the next 13742, and so on (13740 is the default for acssel).

8: Coffee or beer.
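The rpcinfo check in step 6 can be scripted if you want it in a health check. The sketch below reads rpcinfo -p output on stdin and succeeds when a TCP registration exists on the given port. The function name is my own, and it assumes the usual column order (program, version, protocol, port).

```shell
# Sketch: succeed if `rpcinfo -p` output (on stdin) has a tcp entry on a port.
rpc_port_registered() {
    # $1 = port number to look for
    awk -v port="$1" '$3 == "tcp" && $4 == port { found = 1 } END { exit !found }'
}
```

Example: rpcinfo -p | rpc_port_registered 30031 && echo "firewall-compliant SSI is registered"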

Update:

If acsssi and acssel don’t start but acsd does, try setting the following variables in vm.conf:

ACS_CSI_HOSTNAME = ACSLS_SERVER_NAME

ACS_SSI_HOSTNAME = MEDIA_SERVER_NAME