August 18, 2011

Thinking my way through a tough MS SQL Server 2000 replication problem

August 18, 2011 - Categories: geek, ibm, work

Thinking through a particularly tough replication problem at work. You see, I’ve been working on a dashboard for a small call center. Since they sometimes have really short calls, they’d like the display to be as close to real-time as possible. Cisco handles the calls and stores the information in either of two Microsoft SQL Server 2000 servers depending on the failover situation, which use merge replication to stay in sync. We have a custom script that takes the call information and adds it to our own database. Then the PHP dashboard I wrote uses JQuery to poll the server every 5 seconds and display updates.

Since our database wasn’t part of the standard configuration, I also set up SQL Server replication for it. I have little experience with Microsoft SQL Server and I found it even more stressful trying to look up steps while people were watching (ah, web conferences), so I heaved a huge sigh of relief when I saw a row replicate properly.

I originally programmed the dashboards to check only for any updates since it last checked. It turns out that I need to account for late synchronization, too. It seems to take about ten seconds, maybe more, for the information to replicate from one server. This meant that if the dashboard on the second computer had already checked for updates, it would miss the older updates that had just newly replicated, and dashboards polling the two servers would eventually get out of sync. Both servers needed to support read/write operations. The systems don’t have a static master-slave relationship. Either one of the two servers could become the primary server – the first server to receive updates – transparently, depending on the failovers.

I’m modifying the script to check for older updates. It’s a trade-off. A larger window means more transfer time, higher server load, possibly more delays. A smaller window could miss updates. If I set the dashboard to check the last 45 seconds, I get a delay of about 25 seconds from the time that a call shows up on a dashboard looking at the primary server and the time the call shows up on a dashboard looking at the secondary server. I’m reasonably happy with the code.

I also need to change the SQL Server configurations, so I’m waiting for a maintenance window and the go-ahead to do so. Anything I can learn or clarify before then will help me make the most of the limited time (and reduce stress, which is also important). Here are some things I need to properly figure out:

Working with Microsoft SQL Server 2000, I wish I had lot of these things I’m reading about, like MS SQL Server 2005’s Database Mirroring and MS SQL Server 2008’s Configure Peer-to-Peer Topology Wizard. There are these instructions for setting up nonpartitioned, bidirectional, transactional replication, but I don’t know if they’ll really get me to where I want to go, and they look complicated. The stored procedures look familiar, though, so maybe that’s how I got replication working even though the config doesn’t reflect it.

Theoretically, I should be able to just reach out into this really big company of ours and find a Microsoft SQL Server expert who can set this up properly in all of five minutes, but it’s complicated. Something’s changed, or maybe I have. I’m letting the focus on utilization get to me. I hate having to say no to other people because I’m supposed to focus on billable work, and I know it’s a hassle and a half to set up project codes for spot consulting, and I don’t feel comfortable putting other people in that position. Anyway, I’ve posted a question in a community with 500 members, and I’ll try reaching out to one of the community SMEs tomorrow.

You know how it’s a good learning experience to work outside your comfort zone? This is really far out of my comfort zone. I tried asking for a proper DBA. I didn’t get one. No amount of searching and speed-reading is going to turn me into an experienced one, at least for this project. I really hate this feeling – wanting to do right by the client, but feeling the limitations of my experience, and worrying about messing things up even more.

I’m going to see if another attempt to set up proper continuous push replication will do the trick. I’m also going to try e-mailing people to ask for their help, and see if our resource deployment managers know of anyone in between engagements who might be able to help. It’s a long shot, but if I take enough long shots, it might work.

You know, this makes dealing with cross-browser issues look like fun. But it needs to be done, and there will be rare moments like this anyway, so I might as well learn how to even more effectively ask for help in the company.

2011-08-17 Wed 20:13