SRDF/A - Best Practices

QUALIFICATION: The BCSD tool must be used to size new configurations. It is recommended that all SRDF configurations, including SRDF/A, be qualified by your EMC support representative via the SVC group.

R2 FRAME: The R2 frame should be AT LEAST as fast as the R1.
For the standard volumes, this means using the same number, size, and type of drives and the same protection schemes on both the R1 and the R2. For example, if the source frame uses RAID 1/0 with 15k drives, the target frame should also use RAID 1/0 with 15k drives. If additional volumes such as BCVs are configured on the R2 side, additional drives and cache should be used. Consideration should be given to segregating standard and BCV volumes onto separate drives.

The default device write pending limit (amount of cache slots per volume) should be the same or higher in the R2 as in the R1. This may require more physical cache in the R2 than in the R1.

  • When defining CLONE on the R2, keep the clone devices on segregated drives and use the pre-copy option.
  • QOS with an initial value of 2 can be used to help reduce the copy impact.
  • SNAP is NOT ALLOWED on the R2 volumes.

BANDWIDTH: Sufficient bandwidth must be provided to run SRDF/A, and it is imperative to understand the workload before configuring it. SRDF/A can sometimes reduce the overall bandwidth requirement by 20% compared with Synchronous SRDF, but this is highly dependent on the workload. It is best to keep this bandwidth reduction in reserve until the actual solution is implemented and the data can be analyzed.

Symmetrix DMX GigE adapters and some Fibre Channel switches offer compression, but the actual compression values realized are highly dependent on the data and can fluctuate drastically over the business day. For example, certain batch workloads can get better compression than online workloads. The actual compression values can be pulled off of the GigE adapters via an Inline or out of the Fibre Channel switches. Compression can help reduce the overall bandwidth required for SRDF/A, but be extremely careful when counting on compression as cycle times can drastically elongate if the compression is not being realized.

Your support representative can use the SYMMMERGE and BCSD tools to model data and correctly size a proper configuration. Bandwidth needs to be at least equal to the average number of writes entering the sub-system. This will not guarantee minimum cycle times.

If you are targeting minimum cycle times, then sufficient bandwidth needs to be configured to handle the peak number of writes entering the system. Keep in mind that we are typically using 10 or 15 MINUTE data to model 30 SECOND cycle times. A sufficient amount of cache MUST be configured to keep SRDF/A active over the interval at which the data was collected. In other words, if you model on 15 minute data, you must configure enough cache and bandwidth to keep SRDF/A active for 15 minutes at a minimum. Cycle times may elongate past the minimum during this period, so never guarantee minimum cycle times. The required bandwidth must be dedicated to SRDF/A; do not share it with network traffic, tape, etc.
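As a rough illustration of the sizing logic above (not a substitute for BCSD or SYMMMERGE), the following Python sketch compares a link speed against the average and peak write rates and estimates the largest backlog that cache would have to absorb. The function name, the sample values, and the simple backlog model are assumptions made for illustration only.

```python
def check_link_sizing(write_rates_mb_s, interval_s, link_mb_s):
    """Compare a dedicated SRDF/A link against collected write-rate samples.

    write_rates_mb_s: average host write rate per collection interval (MB/s)
    interval_s:       length of each collection interval in seconds (e.g. 900)
    link_mb_s:        bandwidth dedicated to SRDF/A (MB/s)
    """
    average = sum(write_rates_mb_s) / len(write_rates_mb_s)
    peak = max(write_rates_mb_s)

    # Simplified model: whatever the link cannot carry in an interval
    # accumulates as backlog and drains when the write rate falls again.
    backlog = 0.0
    worst_backlog = 0.0
    for rate in write_rates_mb_s:
        backlog = max(0.0, backlog + (rate - link_mb_s) * interval_s)
        worst_backlog = max(worst_backlog, backlog)

    return {
        "average_mb_s": average,
        "peak_mb_s": peak,
        "covers_average": link_mb_s >= average,  # minimum requirement
        "covers_peak": link_mb_s >= peak,        # needed for minimum cycle times
        "worst_backlog_mb": worst_backlog,
    }


# Hypothetical 15-minute samples (MB/s) against a 60 MB/s link.
result = check_link_sizing([40, 60, 100, 20], interval_s=900, link_mb_s=60)
```

Because each sample is itself a 10 or 15 minute average, shorter bursts inside an interval are invisible to this calculation, which is one reason minimum cycle times should never be guaranteed.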

RA COUNT: The correct number of RAs needs to be configured. There should be at least N+1 RAs, where N is the number of RAs required, so that a service action can be performed to replace an RA if necessary.

Synchronous groups and SRDF/A groups should be segregated onto their own physical adapters. Do not mix Synchronous and SRDF/A on the same adapters. Directors supporting SRDF/A should not be shared with any other SRDF solution.

Caution! When moving from a Synchronous solution to SRDF/A, in many cases we have seen bandwidth and adapter utilization INCREASE as a result of the overall response time to the system decreasing.

MONITORING: SRDF/A should be monitored during the initial roll-out to ensure that all components were properly sized and configured. Data needs to be collected via STP or WLA and then run through the tools again to verify that the initial projections were correct. STP at 5x71 microcode includes SRDF/A statistics, which can be very beneficial.

Do not forget that Mainframe MSC customers can also monitor for issues through the SCF1562I and SCF1563I messages. These messages indicate whether transmit or restore issues are occurring and which box is the source of the issue.

The SYMSTAT commands were created specifically for monitoring open systems SRDF/A, but when issued from the Service Processor on the DMX they can be quite informative for both mainframe and open systems.

There are three options:

  1. Cycle
  2. Requests
  3. Cache

Using different combinations of the three options can help determine what caused the CACA, and monitoring the cache utilization closely can even prevent a drop. SRDF/A should be monitored on a regular basis to look for workload changes and to predict increases in CACHE or BANDWIDTH requirements due to growth.

VERIFICATION: The network should always be verified to ensure that the projected amount of bandwidth is configured. STP or WLA should be collected during the initial Adaptive Copy Synchronization to ensure that the required bandwidth is configured and that the network runs error free. Compression ratios should also be checked either at the switches or on the GigE adapters to verify that the correct numbers were used.

Upgrade or Reconfiguration: Always re-evaluate the SRDF/A solution prior to doing any upgrades or reconfigurations. This includes drive upgrades, adding volumes to the SRDF/A links, or changing the front-end connectivity, for example changing ESCON to FICON.

Starting SRDF/A: SRDF/A activation is considerate of cache utilization. SRDF/A will capture a delta set of writes and send them in cycles across the link. In addition to the new writes, SRDF/A will include up to 30,000 invalid tracks per cycle. This is a design feature and the 30,000 track value was chosen to prevent cache from being flooded by the invalid tracks. Therefore, EMC generally recommends as a best practice to synchronize the boxes in Adaptive Copy Disk mode to below 30,000 invalid tracks before activating SRDF/A. This will ensure that SRDF/A will become secondary consistent within a few cycles.

SRDF/A will activate with many more than 30,000 invalid tracks, and in fact some customers choose to activate SRDF/A when they have thousands or millions of invalid tracks. This is allowed, but only a maximum of 30,000 invalid tracks will be sent with each SRDF/A cycle. As a result, it will take many cycles before the frames are secondary consistent.
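To get a feel for what "many cycles" means, here is a small Python sketch that turns an invalid track count into a best-case number of cycles and elapsed time. It assumes every cycle carries the full 30,000 invalid tracks and completes in the 30 second cycle time mentioned earlier; the function name is hypothetical, and real cycles can elongate, so treat the result as a lower bound.

```python
import math

MAX_INVALID_TRACKS_PER_CYCLE = 30_000  # per-cycle limit quoted in the text


def cycles_to_consistency(invalid_tracks, cycle_time_s=30):
    """Return (cycles, seconds) needed to drain the invalid tracks, best case."""
    cycles = math.ceil(invalid_tracks / MAX_INVALID_TRACKS_PER_CYCLE)
    return cycles, cycles * cycle_time_s


# One million invalid tracks at activation versus a pre-synchronized frame.
large = cycles_to_consistency(1_000_000)
small = cycles_to_consistency(25_000)
```

With one million invalid tracks the best case is 34 cycles (about 17 minutes at 30 second cycles), compared with a single cycle when the frames were synchronized to below 30,000 tracks first.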

Fiber RDF Directors: Enable RF flow control. See emc152051 for a description of this feature.

Page Data Sets: Your EMC CE needs to set Enable Page Data Set Mode to YES in the IMPL.bin file to ensure synchronous replication of all page data sets. Refer to emc100913.

Configuring Delta Set Extension (DSE): See emc204521 for best practices for configuring DSE.

Notes: SRDF/A will drop when 94% of the System WP limit is reached. A parameter called the “Snow Cache Use” or “Max Cache Usage” limit controls this, and it can be lowered to cause SRDF/A to drop sooner. Only SRDF/A devices count against this value, so if only a subset of the devices in the DMX has SRDF/A running, the parameter may need to be lowered. If DSE is configured in the frame, Engineering recommends lowering the SRDF/A “Snow Cache Use” percentage to 74%. The limit can be changed via Inlines, Host Component, or SymCLI; the recommendation is to have the customer change it with their software.

As of October 6, 2009, the recommendation from EMC Engineering is to lower the SRDF/A “Snow Cache Use” percentage to 74% on all Symmetrix systems running SRDF/A. The setting is normally applied on the R1 (Source) side, since that is where the host is typically configured, but if there is ever a failover to the R2 (Target) side you would want it set there as well. As a best practice, set it to the recommended 74% on both the Source and Target boxes.
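The effect of lowering the limit can be illustrated with a short Python sketch. It assumes you already know the System WP limit and the number of write pending slots currently held by SRDF/A devices; the function name and the slot counts are hypothetical, and the percentage is simply passed in as a parameter.

```python
def srdfa_cache_headroom(system_wp_limit_slots, srdfa_wp_slots, max_cache_pct):
    """Return (drop_threshold, headroom, would_drop) for a given cache limit.

    system_wp_limit_slots: System WP limit, in cache slots
    srdfa_wp_slots:        slots currently held by SRDF/A devices only
    max_cache_pct:         the SRDF/A maximum cache usage percentage
    """
    drop_threshold = system_wp_limit_slots * max_cache_pct // 100
    headroom = drop_threshold - srdfa_wp_slots
    return drop_threshold, headroom, headroom <= 0


# Same hypothetical workload evaluated against two different limits.
at_94 = srdfa_cache_headroom(1_000_000, 800_000, 94)
at_74 = srdfa_cache_headroom(1_000_000, 800_000, 74)
```

In this made-up case SRDF/A stays active at the 94% limit but would already have dropped at 74%, which is the intended trade-off: SRDF/A drops earlier so that it cannot consume as much of the cache.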

Symmetrix VMAX:
Starting at Enginuity 5874.207.166, the SRDF/A “Snow Cache Use” percentage is automatically lowered to 74%.
Notes: To make the change using Solutions Enabler, create a text file with the following contents and apply it with the symconfigure command:

file.txt:

set Symmetrix rdfa_cache_percent=75;

Preview the change first to check that the command is valid, then commit it:

symconfigure -sid XXXX -file c:\file.txt preview
symconfigure -sid XXXX -file c:\file.txt commit

Determining the EMC Symmetrix Remote Data Facility Pair State

The resource status message reflects the role and state of the RDF pair. For example, the resource status Faulted with the status message Split is reported when the RDF pair is in a Split state.

The RDF pair state is mapped to the associated resource status as described in the following table.

Table 2–2 Mapping From the RDF Pair State to the Resource Status

 
| Condition | Resource Status | Status Message |
|---|---|---|
| The RDF pair state is Invalid and the pair state is not Incorrect Role. | Faulted | Invalid state |
| The RDF pair state is Partitioned and the pair state is not Incorrect Role or Invalid. | Faulted | Partitioned |
| The RDF pair state is Suspended and the pair state is not Incorrect Role, Invalid, or Partitioned. | Faulted | Suspended |
| The RDF pair state is SyncInProg and the pair state is not Incorrect Role, Invalid, Partitioned, or Suspended. | Degraded | SyncInProg |
| The RDF pair state is R1 UpdInProg and the pair state is not Incorrect Role, Invalid, Partitioned, Suspended, or SyncInProg. | Faulted | R1 UpdInProg |
| The RDF pair state is Split and the pair state is not Incorrect Role, Invalid, Partitioned, Suspended, SyncInProg, or R1 UpdInProg. | Faulted | Split |
| The RDF pair state is Failed over and the pair state is not Incorrect Role, Invalid, Partitioned, Suspended, SyncInProg, R1 UpdInProg, or Split. | Faulted | Failed over |
| The RDF pair state is R1 Updated and the pair state is not Incorrect Role, Invalid, Partitioned, Suspended, SyncInProg, R1 UpdInProg, Split, or Failed over. | Faulted | Replicating with role change |
| The RDF pair state is Synchronized. | Online | Replicating |
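The table reads as a precedence list: each row applies only when none of the states named in the rows above it is present. A minimal Python sketch of that lookup follows. It assumes the input is the collection of pair states observed across a device group, which is an interpretation rather than something the table states outright. The Incorrect Role state has no row in Table 2–2, so the sketch does not handle it, and the function and variable names are hypothetical.

```python
# (pair state, resource status, status message), highest precedence first,
# in the same order as the rows of Table 2-2.
PRECEDENCE = [
    ("Invalid", "Faulted", "Invalid state"),
    ("Partitioned", "Faulted", "Partitioned"),
    ("Suspended", "Faulted", "Suspended"),
    ("SyncInProg", "Degraded", "SyncInProg"),
    ("R1 UpdInProg", "Faulted", "R1 UpdInProg"),
    ("Split", "Faulted", "Split"),
    ("Failed over", "Faulted", "Failed over"),
    ("R1 Updated", "Faulted", "Replicating with role change"),
    ("Synchronized", "Online", "Replicating"),
]


def resource_status(pair_states):
    """Return (status, message) for the highest-precedence state present.

    Returns None when no observed state appears in the table.
    """
    observed = set(pair_states)
    for state, status, message in PRECEDENCE:
        if state in observed:
            return status, message
    return None


mixed = resource_status(["Synchronized", "Split", "SyncInProg"])
healthy = resource_status(["Synchronized"])
```

In the mixed example SyncInProg outranks Split, so the resource is reported as Degraded rather than Faulted.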

The state of the RDF pair determines the availability of consistent data in the partnership. When the state of the RDF resource on the primary or secondary cluster is Degraded or Faulted, the data volumes might not be synchronized even if the application can still write data from the primary volume to the secondary volume. The RDF pair will be in a Partitioned state and the invalid entries will be logged as the data is written to the primary volume. Manual recovery operations are required to resolve the error and resynchronize the data.

Posted May 4, 2012 by g6237118

21 responses to “SRDF/A - Best Practices”

  1. Hi Govindagouda,

    I am reading your piece because we are running into an issue with our SRDF/A environment. Randomly, we are saturating the cache on the R2 side. The cache on the R2 side is smaller (designed by EMC) than the R1 side. This is a virtualized environment hosting only Exchange 2010. Sometimes the SRDF suspends are 3-6 weeks apart, other times only 3 days apart.

    EMC Support has been working on this and is not seeing issues on the SAN side. However, your recommendation above of a larger cache on the R2 side is not what we have in place, although it seems sensible.

    I’m wondering what issues you’ve seen, if any, with a smaller cache on the R2 side.

    • Usually what happens is that the write pending (WP) count increases until it reaches more than 75 percent of cache. At that point, by design, SRDF gets suspended.

      You can look at enabling DSE if it is not already enabled.

  2. Hi,

    I am trying to move a new device pair into an SRDF/A session and I am getting this message:

    The Cache Partition setup is invalid

    Here is the command I am typing:

    $ symrdf movepair -sid 4315 -rdfg 2 -new_rdfg 10 -cons_exempt -f "D:\EMC_Management\Dev_SRDF\4315\rdf_create_pair_4315.txt" -nop

    An RDF ‘Move Pair’ operation execution is in progress for device
    file ‘D:\EMC_Management\Dev_SRDF\4315\rdf_create_pair_4315.txt’. Please wait…

    The Cache Partition setup is invalid
