Tuesday 8 November 2011

Replication Problems & Solutions

The following section completely deals with various problems faced by the real time customers and solutions provided by the Sybase.

1. Is there a way to temporarily disable a trigger for only one session?

Solution:

The 'SET TRIGGERS OFF' command is used in Replication Server and briefly described in Replication Server Adminisration Guide and Reference Manual. Use the 'SET TRIGGERS OFF' command to disable a trigger for a session. This command affects only the session in which it is issued. Triggers will still be active for other sessions. The login must have "replication role" to issue the 'SET TRIGGERS OFF' command. This setting can be toggled on and off within the same connection without requiring that the client disconnects.


2. Using the rs_subcmp utility to reconcile tables that contain IDENTITY columns. Getting the following error:

Message from server: Message: 584, State 1, Severity 16 -- 'Explicit value specified for identity field in table 'xxx' when IDENTITY_INSERT is set to OFF.

Solution:

Specifying "IDENTITY = identity_col_name" in the rs_subcmp.cfg file or using the -i parameter in the command line should resolve the problem.

For example:

PDS                =   REL125_SHAN_coriander
RDS                =   REL125_SHAN_coriander
PDB                =   adb125
RDB                =   sdb125
PTABLE             =   identity_tbl
RTABLE             =   identity_tbl
PSELECT            =   select * from identity_tbl order by col1
RSELECT            =   select * from identity_tbl order by col1
PUSER              =  sa
RUSER              =  sa
PPWD               =
RPWD               =
KEY                = col1
RECONCILE          =   Y
RECONCILE_CHECK    =   N
WAIT               =   5
NUM_TRIES          =   2
VISUAL             =   Y
IDENTITY           =  col1

3. To eliminate "empty" begin/commit pairs sent to Replication Server, should you use the "set replication off" command in DBA scripts that perform such tasks as dbcc, dump database, dump transaction, update statistics, or sp_sysmon?


Solution:

The following DBA tasks are routinely run: update statistics, dump database, dump transaction, dump transaction...truncate_only, dbcc checkcatalog, dbcc checkalloc, dbcc checkdb, dbcc tablealloc, dbcc checktable, sp_sysmon.   Only update statistics, dump database and dump tran cause "empty" begin/commit pairs to be sent to Replication Server.

4. Getting a 1028 error on a connection to a replicate database (on a DSI thread) that reports an Adaptive Server Enterprise (ASE) error 9137:

ERROR #1028 DSI EXEC(116(1) xxxxx.gremsdb) - /dsiutil.c(401)

Message from server: Message: 9137, State 2, Severity 16 -- 'LTM trunc. point not yet established with DBCC SETTRUNC().  Until the LTM trunc. point is established, changes to the object 'spxxxxx' may not be replicated.

Solution:

If replication is configured for warm standby, Replication Server will replicate the sp_setreplicate and sp_setrepproc stored procedures when you execute them on the primary dataserver.  These procedures check that the primary database (where these procedures are executed) has a valid truncation point.  There will be a valid truncation point in the active/primary side, but on the standby/replicate database, there is no valid truncation point, and the errors 9137 and 1028 are raised.

When the DSI thread executes the procedure, ASE raises a 9137 error.  However, this ASE error is assigned the error action of IGNORE, so the message is written to the Replication Server log and the DSI thread continues processing.  The error action is defined this way so that the stored procedure can get sent to the replicate side, but it is ignored.

5. Getting error 9137 in rep server log with warm standby

Solution:

This is expected. sp_...  (mark 'x' for replication) issued in active database sp_reptostandby is turned on so this is replicated to standby database. Because secondary truncation point is not valid in standby database, this message gets generated.  This IS EXPECTED and is not an error.

6. Can we configure multiple warm stanby setup with Replication Server 12.1? When would the "multiple warm standby" feature be available?

Solution:

With the versions of Replicaion Server 12.1 and below, Replication Server can manage only "1 warm standby" setup. In Replication Server 12.6, introduced a new feature that will allow configuring multiple warm standby databases.  Please go through Replication Administration Guide and chapter Multi Site Availability.

7. Is it possible to send DDL and not the data to a replicate site?

Solution:

            No.  However you can use sp_reptostandby <PDB>, ‘all | none’
                        ‘all’ – set schema and data replication
                        ‘none’ – turns of schema and data replication

or have sp_setreptable 'false' prevent tables from replicating".


8. How to get a stored procedure to replicate in the Warm Standby environment if not using the database level replication?

Solution:

There are two ways to replicate in the Warm Standby environment:

- Database level, where you run the 'sp_setreptostandby' stored procedure to mark the entire database for replication

- Object level, where you use the 'sp_setreptable' and 'sp_setrepproc' stored procedures to mark individual tables and stored procedures for replication.

Use the "sp_setrepproc proc_name, 'function'" to mark stored procedures for replication in the Warm Standby environment that is not set to replicate the entire database.

9. Cannot create a replication definition in a database. Getting the 15189 error:

Msg 15189, Level 12, State 0: Server 'repservername': Log transfer from 'dataserver.database' is not ON.

Slution:

Open the isql session into Replication Server and issue the "alter connection to dataserver.database set log transfer on" command. Then start Replication Agent and proceed to create the replication definition.

10.  Starting Replication Server. Replication Server does not start because it cannot allocate 2 MB of memory:

F. 2003/01/06 . FATAL ERROR #7035 SQT(105096:1 DIST DATASERVER.DATABASE) - t/sqtint.c(1932)
Additional allocation would exceed the memory_limit of '20971520' specified in the configuration.
T. 2003/01/06 . (74): Exiting due to a fatal error

Solution :

The default value for 'memory_limit' in Replication Server 12.0 is 20 MB. Replication Server may need more than 20 MB. This depends on the Replication Server configuration.

That Replication Server was not able to start with default 20 MB, but it started fine with 'memory_limit' set to 50 MB.

The 'memory_limit' parameter needed to be set to a higher value -- 20 MB or more. Increasing 'memory_limit' resolved the problem. Replication Server started fine.

11. Not able to see a 'where' clause for a publication via Replication Server Manager (RSM) in Sybase Central. In RSM, after adding a 'where' clause and clicking OK, reopening the 'where' tab of the article properties dialog does not display the 'where' clause ?

Use the following Replication Server commands as a workaround:

- rs_helppub with article name.
- rs_helppub [publication_name, primary_dataserver, primary_db, article_name]
- rs_helpsub with subscription name.

12. When Replication Server starts up, it reports the following errors with some SQM and/or DSI threads down:

WARNING #75 DSI(1165 ) - seful/cm.c(3058)

CT_WARNING: retryable error not CS_SV_RETRY_FAIL or CS_SV_COMM_FAIL. severity = '4', msgnumber = '84083714', text ='ct_connect(): network packet layer: internal net library error: Attempt to use invalid address handle'.

ERROR #1027 DSI(1165 ) - seful/cm.c(3070)

Open Client Client-Library error: Error: 84083714, Severity 4 -- 'ct_connect(): network packet layer: internal net library error: Attempt to use invalid address handle'.

Solution:

The 'max_thread_proc' parameter sets the limit on the number of threads allowed for each process in the operating system.

Increase the HP-UX kernel parameter 'max_thread_proc' and restart the box. The default value is 64.

13. Experiencing a problem with replicating a table which has an identity column.

Solution:

Replication Server issues the "set identity insert {on |off}" command when replicating a table with an identity column. Because of this the maintenance user MUST be the owner of the table or aliased to the "dbo" login name.

Here are the steps to set a table with an identity column for replication:

           I.    In the replication definition, define a column as IDENTITY (not numeric).

          II.    In the replicate/standby database, alias the maintenance user to the "dbo" login:
sp_dropuser maint_usr
go

                        sp_addalias maint_user, dbo
                        go

         III.    In Replication Server, resume a connection to the replicate/standby database:

                        resume connection to Replicate_DS.Replicate_DB
                        go


14. How can data in Replication Server (RS) queues be purged?

Solution:

Internal Use ONLY

a. Determine queue number and type to be purged

                        isql -Usa -P -S<REP_SERVER>
                        >admin who,sqm
                        Replicate db     nnn:0  Server.DB  (queue type is 0)
                        Primary DB      nnn:1  Server.DB  (queue type is 1)
                        Warm Standby Replication      nnn:1  Logical Connection  (queue type is 1)

b. Shutdown rep server and restart in single user mode
                                                or
                        >sysadmin hibernate_on
                        >go

c. Connect to rep server

                        isql -Usa -P -S<REP_SERVER>
                        >sysadmin sqm_purge_queue,nnn,<QUEUE_TYPE>   (type from above 0 or 1)
                        >go

 either
                        >sysadmin hibernate_off
                        >go
    or
shutdown rep server and start in normal (multi-access) mode

15. The DSI thread shuts down due to the following error in the replicate database:

Message: 10330, State 1, Severity 14 -- 'INSERT permission denied on object al_activity_log, database atlpldb, owner dbo'

Solution:

If the maintenance user is not the owner of the object or database, alias the maintenance user to the dbo or grant 'insert', 'update', 'delete', 'select' and 'execute' permissions to the maintenance user and then resume the DSI connection.

Note: If the maintenance user is not the dbo, nor aliased to the dbo nor the owner of the object, the maintenance user requires the 'replication role' in order to replicate the 'truncate table' command.

16. Received the following error message from the Replication Server: Msg 14029 Replication Agent connection for ... is not allowed because the stable queue for this source is not active. Problem is because the database that is being loaded is from the same database on a different dataserver.

Solution:

Disable rep_agent and configure it back with correct repservername, repusername, and repuserpasswd.  Specifically, if you do not care about data in the log, you can issue the following command:

sp_config_rep_agent <dbname>, 'disable'  (Or) issue the following if you wish to preserve it:
sp_config_rep_agent <dbname>, 'disable','preserve truncpt'

Then issue:

      sp_config_rep_agent <dbname>,'enable','repservername','repusername','reppassword'

use <rssd>
go

rs_zeroltm ds,db
go

use dbname
go

sp_start_rep_agent <dbname>
go

17. WARNING #6131 GLOBAL RS(GLOBAL_RS) -qm/sqmsp.c (1561)  Replication server has no partitions.

Solution:

rs_init failed while rebooting repserver.

a. Add partition
b. create user "Repserver_rsi" set password to "Repserver_rs_ps"
c. grant CONNECT SOURCE to "Repserver_rsi"
b. create user "Repserver_ra" set password to "Repserver_ra_ps"
e. grant CONNECT SOURCE to "Repserver_ra"
f.  create user "Repserver_id_user" set password to "Repserver_id_passwd"




18.  Recovered both primary and replicate. Started replication and first transaction does not replicate. Happens consistently after recover. Why?? Replicate is Oracle if that makes a difference.

Solution :

After recovering the database, the user needed to increase the gen_id by one based on what the gen_id was before the restore.  User used dbcc gettrunc to get the before gen_id and then used dbcc settrunc to set the gen_id 1 higher.  Now the first transaction is not skipped due to the Replication Server thinking it is a duplicate.  The syntax for dbcc gettrunc and dbcc settrunc can be found in the Replication Server Reference Manual.

19.  When dumping the stable queue to a file using dump_queue the file has the sql statements of the dumped transactions. When the statements hit a certain length, the sql is truncated (partial sql is printed). Is there a way to issue dump_queue and tell it to dump the full sql?

Solution:

rs_configure queue_dump_buffer_size,'nnn' in RSSD and recycle rep server


20.  How do you mark text columns for replication ?

Solution:

If there are text columns to be marked, and there is text data, create the rep defs and sp_setreptable to mark the columns.

In RS 12.6 use, sp_setrepcol table_name, {‘column_name’ | ‘null’},{do_not_replicate | always_replicate | replicate_if_canged}


21. Can Sybase Replication Server replicate a Microsoft (MS) SQL Server Database?

Solution:

Beginning with version 12, the Replication Agent enables replication between Sybase and MS SQL Server 6.5, 7.0, and 2000.

22. Is it possible to replicate ESPs (Extended Stored Procedures)?

Solution:  No

23. Getting lots of 32020 messages in the log.

Solution:

32020 mean that the table is marked for replication, but no replication definition exists for the table. sp_setreptable and sp_setreplicate are different indicators than sp_reptostandby.  Depending upon circumstances, it is possible for sp_reptostandby to be on without a warm standby to be configured.

24. If replication is stopped while the warm standby server is upgraded to ASE 12.0 but needs to be rolled back from the upgrade to ASE 11.9.2, how can the warm standby be resynchronized?

Solution:

Consider resynchronizing a warm standby db with its primary db using dump marker while replication is active.

If replication to the warm standby is backed up on an error condition, for example, when the error has been corrected, resynchronizing the warm standby with the primary eliminates the need to process backed up transactions in the stable queue.

Steps included below to dump from an 'active' primary database and load to a 'warmstandby' database.  User opts to resynch their warm standby with the active primary when RS has been down due to errors and, after the RS errors have been corrected, the time required for RS to process the backed-up transactions in the stable queues is lengthy.

The steps are based on "Using dump and load with Dump Marker" in the Replication Server Administration Guide with the additional steps of dropping the standby connection and then re-adding the standby to the 'logical connection'(logical connection being unique to warm standby) with the Dump Marker option.

Dropping the standby connection insures that only the warmstandby queues are affected.  The Dump Marker option handles all in-flight transactions.  Other replicate target databases that are in the RS system are not affected and replication remains active.

Steps to dump the Active primary database and load to the standby database:

a. Drop the physical connection to the standby;

Ø  drop connection to warmstandby_dataserver.database:

In warm standby there are no subscriptions to be dropped.  Replication definitions are dropped only when the connection to the primary is dropped, not to the standby.    

  
b. Use rs_init to add the standby to the logical connection with Dump Marker option:
           
See RS_INIT: below for prompts to add standby to logical connection     


c. Dump at the primary

d. Load at the standby

e. When the load is complete, resume connection to standby

f. After loading the database, turn of the ltm marker -

            dbcc settrunc(ltm, ignore) - on the warmstandby database

Documentation on adding the standby to the logical connection with dump marker and messages in Errorlog are shown below.

EXAMPLE OF RS_INIT TO RE-ADD THE STANDBY FROM STEP 10:

Is this a Physical Connection for Existing Logical Connection: yes  
Logical DB Setup                                        Incomplete        

LOGICAL CONNECTION FORMATION                                                

Is this an Active Connection or Standby Connection: standby                

DEFINITIONS FOR PROMPTS FROM 'REPLICATION SERVER CONFIGURATION GUIDE FOR UNIX':

Is This an Active Connection or Standby Connection?

Enter "active" if you're adding the active database for the warm standby application.

Enter "standby" if you're adding the standby database for the warm standby application.

Logical DS Name

Enter the name of the logical data server for the logical connection.  You must have already created the logical connection using the create logical connection command.


Logical DB Name

Enter the name of the logical database for the logical connection.  You must have already created the logical connection using the create logical connection command.

The remaining items in the "Logical Connection Information" section apply only if you checked "standby" for "Is This an Active Connection or Standby Connection?"

Active DS Name

Enter the name of the data server with the active database. This is the actual data server name, not the logical data server name defined for the warm standby database pair.

Active DB Name

Enter the name of the active database.  This is the actual database name, not the logical database name defined for the warm standby database pair.

Active DB SA User

Enter the login name of the System Administrator for the data server for the active database.  Replication Server uses this login name for various operations involved in setting up warm standby applications.



Active DB SA Password

Enter the password for the System Administrator login name for the active data server.


Initialize Standby Using Dump and Load?

Check "yes" if you plan to initialize the standby database with the contents of the active database using the dump and load commands.


Check "no" if you plan to initialize the standby database using the bcp program, or if you do not need to initialize the standby database.


Use Dump Marker to Start Replicating to Standby?


Check "yes" if there will be transactions executed in the active database while you initialize the standby database.

Check "no" if you plan to suspend transaction processing in the active database while you initialize the standby database.

25.  WARNING #5072 DSI(129 DSI: no transactions for database 'lucertola.qa' from origin 'precious.qa' will be applied or deleted from the queue, because loss was detected.

This condition may occur when the command rebuild queues was issued on the Replication Server.  If the queue cannot be rebuilt due to any of a number of reasons then it will go into this state.

To correct this log into the ASE that has the RSSD and use the RSSD database.  Do the following queries:

select * from rs_oqid where valid > 0
select * from rs_exceptslast where status > 0

NOTE: If either table returns any rows then either the valid column or status column must be updated back to 0.
           
            update rs_oqid set valid = 0 where valid > 0
update rs_exceptslast set status =0 where status > 0

Stop and restart the Replication Server and it come back and start replicating.

26. Steps to drop partition

Solution:

Drain the queues to make sure there is no data on the partition being dropped.  Suspend the route from primary to replicate.  On replicate repserver issue:

drop partition <partition name>

Resume route and check status of partition being dropped by executing: admin disk_space.  If it shows as being dropped, issue rebuild queues.  Recycle repserver, and check log for loss detected messages.  If there are messages, then issue ignore loss command and query the following tables:

            At RS : ignore loss from PDS.PDB to RDS.RDB

            At DS : select * from rs_oqid where valid >0
select * from rs_exceptslast where status >0

If any rows are found, shutdown both repservers and delete these rows from these tables.

27.  A way to remove ddl replication from existing warm/standby

sp_reptostandby <dbname>,'none'

29. DSI thread went down due to bad tran.  After skipping tran, data is still not replicated. (See below how customer (ct) rectified the problem)

Told ct to check admin who, sqt, for open tran.  Admin who, sqt showed open tran.  Told ct to dump the queue and then execute:

sysadmin purge_first_open <queueno>, <queuetype>
go

After purging queue, data still was not replicated.  Told ct to check RSSD:

select * from rs_oqid where valid > 0
select * from rs_exceptslast where status > 0
(returned 0 rows)

Told ct to recycle repserver; still data not replicating.

Told ct to login to repserver and issue:

rebuild queues
go

Then ran the below sql in rssd database:

select * from rs_oqid where valid > 0
select * from rs_exceptslast where status > 0

Returned two rows in primary and replicate. 

Ct shutdown both repservers and then issued:

delete rs_oqid where valid >0
delete rs_exceptslast where status >0

Then restarted both repservers.  It started replicating.

30. How to increase RS memory_limit

In order to increase the memory limit for RS, you can reconfigure 'memory limit' in the RSSD or in the RS:

a. In the RS's ASE RSSD database, use rs_configure to set the new memory value in mg:

               rs_configure "memory_limit", 200
                go
            
Reboot RS for the change to take effect. Upon reboot, RS will login to the RSSD database to determine the memory configuration.  It is not necessary to reboot the the ASE dataserver housing the RSSD database.

b.  You can change memory_limit in the RS, if you prefer, using:

                        configure replication server
                        set memory_limit to '200'

RS will log into the RSSD as the maintenance user and run rs_configure.  Reboot RS for the change to take effect.

31.  FATAL ERROR #21 DSI(965 DSS_COLRPRD1.phipldb) - dsisched.c(3525)  Open server call to routine 'srv_deletemsgq' fail Exiting due to fatal error.

num_msgqueues and num_mutex must have a value greater than num_threads. Increase the num_threads parameter because num_msgqueues and num_mutex must have a value greater than num_threads.

32. User wants to know how to check if primary and replicate tables are in sync.

RS includes a utility called rs_subcmp.  It is a stand-alone executable and basically does a select statement, ordering the data in primary key order as defined in the rep def on the primary and replicate tables and looks for differences.  See Rep server documentation (also see Question No 2 in this documentation).

Alternatively, you can:

a. BCP the data out from the two tables using -c (character mode), then go to DOS and do a FC (file check) on the two output files to see if they are the same.  On UNIX use the diff command on the two files.

b. You can do a select * from table1 UNION select * from table2 and store the results in a new table3.  You could then count(*) the rows in table1 and count (*) rows in table3.  If they are the same, table1 and table2 are identical.







33. After upgraded ASE, received the following error message when trying to run admin who,sqm: Msg 91, Level 12, State 0: Server ... : Failed to execute the '' RPC to the remote site.

Solution:

This is because the replication status of some RS stored procedures is not set correctly.  Reset replication status of the RS stored procedures in RSSD and the primary databases.

a. On the server with the RSSD database issue the following commands:

> use RSSD
> go

> sp_setrepproc rs_marker, 'false'
> go

> sp_setrepproc rs_marker, 'table'
> go

> sp_setrepproc rs_cmd_marker, 'false'
> go

> sp_setrepproc rs_cmd_marker, 'function'
> go

> sp_setrepproc rs_section_marker, 'false'
> go

> sp_setrepproc rs_section_marker, 'function'
> go

sp_setrepproc
go

It should return the following:

Name                                        Type                 Log Mode
------------------------------                  --------------          --------------
rs_marker                                  table                 log_sproc
rs_cmd_marker              function            log_sproc
rs_section_marker                     function            log_sproc


b. On the server with the primary database issue the following commands:

> use PDB
> go

> sp_setrepproc rs_marker, 'false'
> go


> sp_setrepproc rs_marker, 'table'
> go

> sp_setrepproc rs_cmd_marker, 'false'
> go

> sp_setrepproc rs_cmd_marker, 'function'
> go

> sp_setrepproc rs_section_marker, 'false'
> go

1> sp_setrepproc rs_section_marker, 'function'
> go

34. After 'switch active', DSI-EXEC gets shut down with following message:
Message: 241, State 2, Severity 16 -- 'Scale error during implicit conversion of NUMERIC value ... to a DECIMAL field.

Solution:

Steps to find the victim datarow:

Log on to Rep Server

- alter connection set batch 'off'
- trace 'on', dsi, dsi_buf_dump
- resume connection to SDS.SDB
- wait for few seconds
- trace 'off', dsi, dsi_buf_dump
- check the rep server errorlog

Reason for the error:

Discrepancy datatype between the primary and the replicate on this case, BAL_PLD_AMT is defined as float on the primary and defined as decimal(22,5) on the replicate, which is a warmstandby system.  There is an user defined function strings to rounded the scale to five decimal points.  However, the user defined function string cannot be defined for the logical connection but for the physical  connection.  So, you have to alter connection set function string for both active and standby databases.

35.  How to materialize subscription for large table without filling up stable queue.

Solution:

Simulate Nonatomic Materialization.  To simulate Nonatomic Materialization:

a. Verify that the entire replication system is working.

b. Execute the define subscription command at the replicate Replication Server.

c. Wait for the subscription to be defined at both the primary and replicate Replication Servers.  Execute the check subscription command at both the primary and replicate Replication Servers to verify that the subscription status is DEFINED.

d. Execute the activate subscription command, using the with suspension clause, at the replicate Replication Server.

e. Wait for the subscription to become active at both the primary and replicate Replication Servers. Execute the check subscription command at the replicate Replication Server to verify that the subscription status is ACTIVE.  When the subscription status is ACTIVE at the replicate Replication Server, the database connection for the replicate database has been suspended.

f. As soon as the subscription becomes active at the primary Replication Server, retrieve the data from the primary database using a select or a database dump.

g. Find the ID number (subid) for the subscription by querying the rs_subscriptions system table.

h. Execute the rs_marker stored procedure in the primary database:

            rs_marker 'validate subscription subid'

WARNING! Be sure that you execute the rs_marker stored procedure with the correct subid number for the subscription. The subid column in the rs_subscriptions system table contains the unique ID number for each subscription. Entering any other number or character string may cause serious problems.

i. Load the subscription data into the replicate database.

j. Enable autocorrection for the replication definition at the replicate database.

k. Use the resume connection command to resume the database connection for the replicate database.

l. Wait for the subscription to become valid at both the primary and replicate Replication Servers. Execute the check subscription command at the replicate Replication Server to verify that the subscription status is VALID.  Once the subscription status is VALID, the replicate data is consistent with the primary data.

m. Disable autocorrection for the replicate database.

Now the subscription is created and replication is active.

37. What understand for the replication server is quiesced?

A replication system is 'quiesced' (or 'quiescent') when all RepServers have processed all their incoming messages and delivered all outgoing messages, when there are no open transactions, and when no subscription (de)materialization is in progress. Quiescing a replication system or an
individual RepServer is sometimes required, for example to change the schema of a replicated table, or for troubleshooting. To determine whether a RepServer is currently quiesced, run 'admin
quiesce_check'. Unless all log transfer and incoming routes have been suspended, there is no guarantee that a quiesced state will be maintained."  In other words: 'unquiescing' a queisced RepServer happens automatically when a new transaction comes into the RepServer.


38. What is the exact use of rs_marker procedure ?

It is an internal mechanism that RS uses to notify itself of particular events from the primary db.  For example, when a connection is created,  it uses rs_marker in the primary db to tell it when to start replicating   - records prior to that are assumed to be materialized already (i.e.  think of a WS created via dump/load).  It is also used in some cases for subscriptions.

39. Why rs_lastcommit table is in each user database ?

Strictly speaking, rs_lastcommit has nothing to do with the RSSD. It's a table that exists in each replicate database to record the most recent committed transaction. Since the RSSD can be a replicate database itself, rs_lastcommit will be create in the RSSD as well (the same applies to
rs_threads, BTW).

40. medi_maint faded out. "faded out" mean ?

No help required, its just an information message that the connection has nothing to do, when more data arrives at the the repserver it will login again.

41. Migrating Data Off of a Stable Queue

Simplest and safest way is to add the new devices as partitions and drop the old ones - once all the data is off the old ones (it may take a while depending on the save interval) the old devices will be gone.  One way to speed up the drop is to do this after running a purge queue (assuming system is quiesced).

There probably is a slightly faster method but all would assume that the queues are identical in size to the old ones - i.e. shutdown RS, dd the devices to the new ones, change the pointers in the RSSD, and try to restart RS.....

42. How to skip "deletes" to replicated databases ?

By default, the standby connection uses rs_default_function_class, which you cannot modify. There is no supported way of changing the function string class for a standby connection (an undocumented+unsupported command actually exists, but there's probably a good reason why it isn't official -- so don't consider using it).

So the answer is no... As an alternative, you can just run 'set replication off'  before running a delete in the active DB. When finished, do 'set replication on'. Basically, anything executed in a primary (or active) DB while 'set replication off' is in effect will not be picked up by the RepAgent, so it won't be replicated.

If you're on RS 12.6, you can also use database repdefs, which are quite similar to a warm standby in many ways. When combining database repdefs with table repdefs, you can actually modify the function strings for those repdefs.

43. Stuck queue? 

Check RSSD Log full - one solution.
Fist Seg. Block.  Last Seg. Block Next Seg. Block - sqt also needs to be check.


44. How to check the transaction logs have been applied ?

Actually, you need to do a couple of things:

a. sp_help_rep_agent - check the rep agent status - if sleeping, it is caught up.

b. Then compare Next.Read to LastSeg.Block - not FirstSeg.Block  FirstSeg.Block includes save interval as well as latency due to asynchronous space management.


45.  How do you switch active to standby in warm-standby environment?

Ex : Presently Active : sybase_pri.pridb   Standby : sybase_bu.wsdb

  1. Disconnect Clients to Active Databases
  2. At RS

Switch active for sybase_pri.pridb to sybase_bu.wsdb

  1. admin logical_status
  2. restart repagent

46. How to keep transactions in stable queue for some time before the replication server or database is recovered.

If replicate replication server is down, the messages will be stored in Primary Replication Server, without discarding after sending to Replicate Replication Server (now the server is down) till the time given in save intervals.

admin who,sqm - save_int:seg  (will show the save intervals time and segment no)

Setting save Intervals:

AT Route Level :            alter route to REP_RS set save_interval to ‘60’
AT RS to DS :               alter connection to DS.DB set save_interval to ‘60’

47.  When will you take backup of RSSD ?

You should perform dumps of RSSD following any replication DDL, such as changing routes or adding replications, publications, subscriptions and configuration changes.

48.  How do you recover from partition loss or failure?

Replication Server will shuts down stable queue, if there is failure.

a.         Log on to Replication Server     

            1> drop partition logical_name
            2>

            1>add partition logical_name on ‘/export/data2/paritions/sq1’ size 100 starting at 1
            2>

.           1>rebuild queues
            2>

b.         Check replication server log for loss detection.

c.         If replication server detected message loss, you can get from:

            Message Recovery from off-line database logs.

49.  Explain Message Recovery from Off-line Database Logs.

a.         Start Replication Server in Standalone Mode (use –m in run_server file)

b.         rebuild queues

c.         Check Replication Server Logs for Message Loss.

d.         Use the date and time in the error log messages to determine which dumps to load.

e.         Enable RepAgent for a temporary recovery Database.

            sp_config_rep_agent tempdb, ‘enable’, ‘rs_name’,’rs_user_name’,’rs_password’

f.          Load the Database Dump and Log Dump into a temporary recovery database.

g.         Start RepAgent in Recovery Mode

            sp_start_rep_agent temp_db,’for_recovery’, ‘connect_dataserver’,’connect_database’,
      ’rs_name’, ‘rs_user_name’, ’rs_password’

h.         See Error Logs

i.          sp_help_rep_agent dbname, ‘recovery’

            Recovery Status: ‘not running’, ‘end of log’ means complete.
            Recovery Status: ‘initial’ or ‘scanning’ means in process.

j.          Repeate Steps f, g & h for each transaction dump.

k.          Restart Replication Server in normal mode.

l.          Restart Rep Agent for Primary DB in normal mode.

50.  How do you recover from Truncated Primary Database Logs?

Failure causes by truncating a primary transaction log before Replication Server has received the messages.

a.         Start Replication Server in Standalone mode

b.         In Replication Server, execute ‘set log recovery’ for primary Database.

                        set log recovery for sybase_lab1.pridb

c.         Execute ‘allow connection’ to all Replicate Servers to accept connection.

d.         Enable RepAgent for a temporary recovery Database.

            sp_config_rep_agent tempdb, ‘enable’, ‘rs_name’,’rs_user_name’,’rs_password’

e.         Load the Database Dump and Log Dump into a temporary recovery database.

f.          Start RepAgent in Recovery Mode

            sp_start_rep_agent temp_db,’for_recovery’, ‘connect_dataserver’,’connect_database’,
      ’rs_name’, ‘rs_user_name’, ’rs_password’

g.         See Error Logs

h.         sp_help_rep_agent dbname, ‘recovery’

            Recovery Status: ‘not running’, ‘end of log’ means complete.
            Recovery Status: ‘initial’ or ‘scanning’ means in process.

i.          Repeat Steps e, f & g for each transaction dump.

j.          Restart Replication Server in normal mode.

k.          Restart Rep Agent for Primary DB in normal mode.

51.  How do you recover from Primary Database Failure?

a.         Shutdown Rep Agent
b.         Suspend DSI Connection
c.         Load Dumps
d.         Resume DSI
e.         Restart primary RS in standalone mode.
f.          Log into primary RS and get generation no.
                        admin get_generation, DS, DB
g.         set log recovery for DS.db
h.         allow connection
i.          Log into active DB and checkpoint DB
j.          Sart Rep Agent in recovery mode.
k.          Verify Logs
l.          If loss was detected, it must be “false loss”, in RS
                        ignore loss from ADS.DB to RDS.DB
m.        Shutdown Rep Agent for ADS.DB
n.     Log into ADS
Use DB
Dbcc setrunct(‘ltm’,’gen_id’,101) (if ID generated from ‘f’ is 100 then keep 101)
o.         Restart All
p.         use rs_subcmp to compare




52.        How do you recover from RSSD failure ?

a.         Shutdown Rep. Agent
b.         Shutdown RS
c.         Load DB dump & tran dump.
d.         Restart RS in Standalone mode.
e.         In Replication Server
                        admin get_generation, DS, RSSD_DB
f.          Rebuild Queues
g.         Start Rep. Agent
h.         Check Error Log.
i.          Shutdown Rep Agent for all Primary Databases.
j.          use rssd_db
            go
            dbcc setrun(‘ltm’,’ignore’)
            go       
            dump tran rssd_name with truncate_only
            go
            begin tran commit tran
            go 40
            rs_zeroltm  DS, rssd_db
            go
k.          dbcc settrun(‘ltm’,’valid’)
l.          Start Replication Server in normal mode.
m.        Start Replication Agent


51. DSI thread is down due to duplicate rows in replicate database table.  Error log is as follows :

E. 2004/09/29 . ERROR #1028 DSI EXEC(104(1) sybase_lab1.repdb) - dsiqmint.c(2888)         Message from server: Message: 2601, State 1, Severity 14 -- 'Attempt to insert duplicate key row in object 'employee' with unique index 'employee_empno_1920006841'
'
I. 2004/09/29 . The DSI thread for database 'sybase_lab1.repdb' is shutdown.

> trace ‘on’, ‘dsi,’ dsi_buf_dump’
see RS error log and correct the problem.  If this does’t work then follow :

Steps to be followed:

at RS :
resume connection to sybase_lab1.pridb skip transaction

at DS :

use rs_RSSD
rs_helpexception - displayed xact no 107
rs_helpexception 107, v

where you will see the transaction causing the problem.  Apply this transaction manually.

52.        Permanently stopping replication from active database to warmstandy ?

a)     run sp_stop_repagent dbname, 'nowait" in active DB
b)    run sp_config_repagent dbname, 'disable' in active DB
c)     drop all the repdef's in RS for the WS setup
d)    run drop connection to active.dbname in RS
e)     run drop connection to standby.dbname in RS
f)     run drop logical connection to WS.dbname in RS
g)    run sp_config_repagent dbname, 'disable' in standby DB (repagent shouldn't be running in standby)
h)     unalias the maint login in the standby database (or drop the user)
i)      drop the maint login from the standby server
j)      unalias the main login from the active database (or drop the user)
k)     drop the maint login from the active server
l)      unmark the tables/database for both the primary and standby databases (sp_reptostandby, sp_setreptable, sp_setrepproc as appropriate   - specifying false).
m)   verify clean up in RS via admin who and admin who,sqm commands  


53. Replication Agent for active connection sybase_lab1.ws must be started with Warm Standby option -W to send all transactions.
               For an ASE RepAgent thread, the equivalent option is
               "send_warm_standby_xacts" (set with sp_config_rep_agent).

No comments:

Post a Comment