14 Day Free Trial - Eve Online


Progress Update: Restoring Tranquility to Tranquility


Since November 25, 2009, a few days before the deployment of Dominion, we have experienced frequent unscheduled reboots of Tranquility. Almost all of them were caused by a bug in the networking subsystem that makes SQL Server fail over.

Why, oh why? And what are you doing about it?

As soon as the first failover happened, we opened a support case with the vendor, per our policy, because our logs surprisingly showed nothing. The vendor's response was that the problem had been caused by a race condition in the system.

We have worked closely with the vendor's support and development teams to isolate the bug, collected vast amounts of diagnostic data, and implemented changes the vendor considered potential solutions. We believe we have found a workaround that makes the bug unlikely to be triggered, although it does not prevent it entirely. This has yet to be confirmed, however.

As one can imagine, it is difficult to diagnose a running, high-performance production environment like ours without causing lag or other performance or reliability problems. The vendor has been working diligently to reproduce the issue in their lab, but collecting diagnostic data from similar systems presents the same major challenge: doing so without degrading performance for customers.

We have programmers and virtual world system administrators putting together a test script to run on the database server we use for Singularity and Multiplicity. If we can reproduce the issue there, we can supply our vendor with code that reproduces the problem in their lab.
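The post does not say what the test script does. Purely as a hypothetical illustration of the general shape of such a reproduction harness, the sketch below opens, uses and closes many short-lived TCP connections against a throwaway loopback echo server, so that socket close handling is exercised over and over. The names `start_echo_server` and `churn` are invented for this example; it does not talk to SQL Server and is not expected to reproduce the actual bug.

```python
import socket
import threading


def start_echo_server(host="127.0.0.1"):
    """Hypothetical helper: start a loopback echo server on an OS-assigned port.

    Returns (listening_socket, port). Connections are served one at a time.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, 0))
    srv.listen(128)
    port = srv.getsockname()[1]

    def serve():
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:
                return  # listening socket was closed; stop serving
            with conn:
                # Echo everything until the client closes its end.
                while True:
                    data = conn.recv(64)
                    if not data:
                        break
                    conn.sendall(data)

    threading.Thread(target=serve, daemon=True).start()
    return srv, port


def churn(port, iterations, host="127.0.0.1"):
    """Hypothetical harness: open, use and close `iterations` short-lived
    connections, returning how many echo round trips succeeded."""
    ok = 0
    for i in range(iterations):
        with socket.create_connection((host, port), timeout=5) as s:
            msg = f"ping {i}".encode()
            s.sendall(msg)
            buf = b""
            while len(buf) < len(msg):  # recv may return partial data
                chunk = s.recv(64)
                if not chunk:
                    break
                buf += chunk
            if buf == msg:
                ok += 1
        # Leaving the `with` block closes the client socket each iteration.
    return ok
```

A real harness would presumably target the test database server and run for far longer; the loop count and message format here are arbitrary.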

I have personally spent a large part of my working hours over the last three months communicating directly with the vendor, collecting diagnostic data, setting up collection tools and working on other tasks related to solving the SQL Server issues.

In short, we are using all the resources at our disposal to resolve this issue. It is a high-priority issue for all parties involved, as it affects not only our system and customers but could equally affect other massive systems and user bases that use similar network and database solutions.

What have we done already? What do we know?

We know that the problem lies in the TCP stack and likely has something to do with the handling of closed or closing sockets. Our vendor has asked us to implement a few potential fixes or workarounds. We have adjusted various networking features and upgraded our SQL Server engine to a version that includes a workaround for issues of this nature. The database handler in the EVE application server uses session pooling, and we have experimented with changing various settings there. Turning off recycling of idle sessions looks promising as a workaround that makes the bug less likely to be triggered.
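One plausible reading of why turning off idle-session recycling helps: every recycled session means one more connection being closed and reopened, and so one more pass through the socket close path where the bug appears to live. The sketch below is a toy session pool, not the EVE database handler, whose code and settings are not described in this post; `SessionPool`, `recycle_idle`, `idle_timeout` and `simulate` are hypothetical names. It simply counts how many sessions get opened and closed under a workload where requests arrive less often than the idle timeout.

```python
import time
from dataclasses import dataclass


@dataclass
class Session:
    """Stand-in for a pooled database session (one TCP connection)."""
    sid: int
    last_used: float
    closed: bool = False


class SessionPool:
    """Toy pool. With recycle_idle=True, a session idle longer than
    idle_timeout is closed and replaced on checkout; otherwise it is reused."""

    def __init__(self, recycle_idle, idle_timeout, clock=time.monotonic):
        self.recycle_idle = recycle_idle
        self.idle_timeout = idle_timeout
        self.clock = clock
        self.idle = []      # sessions available for reuse (LIFO)
        self.next_id = 0
        self.opened = 0     # connections opened
        self.closed = 0     # connections closed

    def _open(self):
        self.next_id += 1
        self.opened += 1
        return Session(self.next_id, self.clock())

    def acquire(self):
        while self.idle:
            s = self.idle.pop()
            if self.recycle_idle and self.clock() - s.last_used > self.idle_timeout:
                s.closed = True  # recycling: tear down the stale connection
                self.closed += 1
                continue
            return s
        return self._open()

    def release(self, s):
        s.last_used = self.clock()
        self.idle.append(s)


class FakeClock:
    """Manually advanced clock so the simulation is deterministic."""
    def __init__(self):
        self.t = 0.0

    def __call__(self):
        return self.t


def simulate(recycle_idle, requests=100, gap=10.0, idle_timeout=5.0):
    """Run `requests` sequential checkouts spaced `gap` apart; return (opened, closed)."""
    clock = FakeClock()
    pool = SessionPool(recycle_idle, idle_timeout, clock)
    for _ in range(requests):
        s = pool.acquire()
        pool.release(s)
        clock.t += gap
    return pool.opened, pool.closed
```

With these arbitrary parameters, `simulate(True)` returns `(100, 99)`: almost every request closes one connection and opens another. `simulate(False)` returns `(1, 0)`: a single connection is reused throughout, so the close path is hardly exercised.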

As I said before, we are still working toward a fix, and the latest workaround appears to make the failovers less frequent. Expect to hear more about our progress on this issue in the near future.

Misc News March 3rd 2010


www.Dust514-Fan.co.uk Website
EVE Online, the EVE logo, EVE and all associated logos and designs are the intellectual property of CCP hf. All artwork, screenshots, characters, vehicles, storylines, world facts or other recognizable features of the intellectual property relating to these trademarks are likewise the intellectual property of CCP hf. EVE Online and the EVE logo are the registered trademarks of CCP hf. All rights are reserved worldwide. All other trademarks are the property of their respective owners. CCP hf. has granted permission to www.eve-online-fan.co.uk to use EVE Online and all associated logos and designs for promotional and information purposes on its website but does not endorse, and is not in any way affiliated with, www.eve-online-fan.co.uk CCP is in no way responsible for the content on or functioning of this website, nor can it be liable for any damage arising from the use of this website.
