Dealing with SIGSEGV in php5-fpm and Nginx

I screwed up my self-hosted WordPress blog three ways in one evening.

First, I created an infinite loop with the Display Posts Shortcode plugin: I used include_content in a post that had the same tag the shortcode was listing, so the post ended up including itself. Right. Don’t do that. Remember to remove include_content or exclude the post right before publishing.

Then I published a post without checking several times that my site was still up. I checked twice, which was apparently not enough. I should probably check five times. Or ten. Or at least a few times after restarting PHP. Since I hadn’t checked again, I spent a couple of hours playing a video game with W- instead of, say, stressing out about my site. I could’ve solved the problem sooner.

And then, when I did stress out about my site, I “fixed it” in entirely the wrong way. I reduced php5-fpm’s pm.max_children setting instead of increasing it. This made it worse.

I had been worrying about running out of memory when I should’ve been worrying about running out of worker processes.
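For context: pm.max_children caps how many PHP worker processes php5-fpm runs at once. When every worker is busy, new requests wait, and nginx may eventually give up with a 502 or 504. A common rule of thumb for an upper bound is to divide the memory you can spare for PHP by the memory one worker uses. Here is a minimal sketch of that arithmetic, assuming you have measured the average worker size yourself (for example, from the RSS column of `ps` for the php5-fpm processes). The function name and the numbers are hypothetical, not measurements from my server.

```python
def estimate_max_children(total_ram_mb: int, reserved_mb: int, avg_worker_mb: int) -> int:
    """Rough upper bound for php-fpm's pm.max_children.

    Assumes each worker uses about avg_worker_mb of resident memory and that
    reserved_mb must stay free for nginx, MySQL, and the OS.
    """
    if avg_worker_mb <= 0:
        raise ValueError("avg_worker_mb must be positive")
    available = total_ram_mb - reserved_mb
    # Never suggest fewer than one worker, even if the reservation eats everything.
    return max(1, available // avg_worker_mb)


# Hypothetical numbers: 1 GB VPS, 400 MB reserved, about 40 MB per worker.
print(estimate_max_children(1024, 400, 40))  # 15
```

A lower value is not automatically safer: with too few workers, a handful of slow requests can tie up every process, which is the failure described above.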

I felt that panic-induced haze setting in: I was scrambling through Google, adding all sorts of bits to my config, ripping out all sorts of plugins, and wondering how I could get a core dump. When I noticed I was making things worse, I made myself stop. I took a deep breath and started untangling what was going on. I tried the opposite of what I had been trying, raising max_children instead of lowering it. That worked.

Good timing, actually. Well, it could have been better timing, but that’s one of the nice things about minimizing commitments and making good stuff for free; it lowers the risk of trying things out and learning something new.

I’m still getting SIGSEGV errors. It happens even if there are few active server processes. But at least it’s sporadic instead of constant, and 1:59 AM is not the best time to dig into something like that. After sleep and reading, perhaps.

I’m a little bit nervous about my setup, and I’ll probably set aside some time this week to dig into system administration and build my skills. I really should have a good plan for downtime, and a better plan for learning the essentials outside of a fire. But mistakes are great because they show you multiple holes in your system, so this is not too bad. Better now than when I’m managing a client site.

On the plus side, I did have the presence of mind to temporarily redirect sach.ac to the static URL, switching it back after the PHP issues seemed to have cleared. Good strategy. Should do that first in the future.

With any luck, the blog is still up today. If you’re reading this, yay!

I’m still happy that I self-host, even if I make mistakes like this. =) Good time to make mistakes and learn from them.

So, what am I going to change for next time?

  • Prepare a contingency plan for Stuff Happening, possibly involving throwing up a quick maintenance page that collects e-mails so that I can send abject apologies and link updates. Make this static so that it loads quickly, and have an external copy (e.g., on Dropbox) just in case my server is down and I have to redirect at the domain level.
  • Check several times after restarting PHP. Just because it loads once doesn’t mean it’s going to load again.
  • Consider redirecting URLs to static sketchnotes or external pages. Links and comments are nice, but viewing is essential.
  • Start screen right away instead of juggling different commands in a single terminal. Set up windows that follow the logs and sit in the relevant configuration directories, to minimize typing.
  • Don’t panic. Yes, website failures are embarrassing, but the nice thing about making this a gift is that I don’t have to worry about racking up the business losses. When I notice that I’m panicking (making wild guesses in terms of changes, for example), I should slow down and remind myself that This Is Not The End of The World.
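The “check several times” rule is easy to automate. The following is a minimal sketch using only Python’s standard library: it requests a page a fixed number of times, pausing between requests, and reports any response that is not a 200. The function names, the ten attempts, and the two-second delay are illustrative assumptions, not a tested recipe.

```python
import sys
import time
import urllib.error
import urllib.request
from typing import Callable, List


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or 0 if the request failed outright."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 502 when nginx cannot reach php-fpm
    except OSError:
        return 0  # URLError, refused connections, and timeouts are all OSErrors


def check_repeatedly(fetch: Callable[[], int], attempts: int = 10,
                     delay: float = 2.0) -> List[int]:
    """Call fetch() several times, pausing between calls, and collect the statuses."""
    statuses: List[int] = []
    for i in range(attempts):
        statuses.append(fetch())
        if i < attempts - 1:
            time.sleep(delay)  # give php-fpm time to recycle or crash workers
    return statuses


if __name__ == "__main__":
    if len(sys.argv) == 2:
        target = sys.argv[1]
        results = check_repeatedly(lambda: fetch_status(target))
        failures = [s for s in results if s != 200]
        print(f"{len(failures)} of {len(results)} requests failed: {results}")
    else:
        print("usage: python check_site.py URL")
```

Run it against the front page right after restarting php5-fpm (the script name above is hypothetical). One success on its own says little; a run of ten with no failures is more reassuring.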

Everything’s going to be all right.

  • http://gravatar.com/justinhj justinhj

    Hehe, exciting night! Have you tried any of the free monitoring services such as New Relic, or set up Munin, so you can get a good idea of what’s going wrong on your server?

  • https://plus.google.com/110405271596528803342 Archimedes Trajano

    I have yet to set it up myself, but I would build a staging environment to test configuration changes and content on before pushing them live.

    • http://sachachua.com Sacha Chua

      Hmm, maybe a virtual machine image and some kind of load tester… I’ve been meaning to set up a clone of my server anyway. =)

  • Alan Pearce

    Perhaps you should consider setting up fastcgi caching in nginx. It will help you reduce the load on your server and there’s an option to specify that nginx should serve stale files from the cache if, say, PHP returns a 500 or 502 error.

    However, by default nginx will respect any caching headers sent by the application. If that’s a problem, either find a way to remove or replace the headers, or tell nginx to ignore those particular headers.

    If you need any more help, let me know and I’ll help you out.

    • http://sachachua.com Sacha Chua

      That looks promising, particularly with the conditional purging from http://rtcamp.com/wordpress-nginx/tutorials/single-site/fastcgi-cache-with-purging/ . I should figure out how to clone my environment and version-control it so that I can try out configuration changes without disrupting my main site. I have a staging area for my blog, but none yet for my servers. Thanks for the pointer! I’ll see how far I can get on my own, and then I’ll e-mail if I get totally stuck. I’ll probably also post notes along the way, and I’d love it if you pointed out any misunderstandings or missed steps! =)