Blogs
Centos Console Problem
Sun, 09/23/2007 - 01:07 — Derek Anderson
Just a note to self. In the case of a serial console never getting around to the login prompt on a Xen DomU, even though it still displays all of the normal console boot messages, just change /etc/inittab as follows:
from:
1:2345:respawn:/sbin/mingetty tty1
to:
1:2345:respawn:/sbin/mingetty --noclear console
Also note: Why do we still call these things tty(s)? That is SOOO 70s!
Enomalism 0.7.2 Released
Thu, 05/24/2007 - 16:24 — Derek Anderson
Just a heads up, Enomalism version 0.7.2 has been released. Tonnes of bugfixes, and some new features (the Firewall module is functional now).
- New firewall module can create firewall rules in the Dom0 which do not require iptables support in the DomU. Great for additional security on insecure DomU images.
- New CentOS 5 VM image in the VMCasting feed.
- Much more extensible API, with hooks on all functions for extending Enomalism with new features.
- Widget style modules, which work with TurboGears widget system.
- Intelligent thread locking and queueing throughout entire Enomalism core to prevent blocking and deadlocks.
- Improved caching and lower utilization in all Enomalism libraries.
- Better non-blocking calls for starting and stopping virtual machines
- Improved CPU usage monitor for DomU machines
- Selenium based regression tests for UI.
- Better provisioning management tools, with feedback on space free, and
Annoying Twisted Python Problem
Fri, 05/18/2007 - 00:58 — Derek Anderson
So... Twisted has an issue. A very annoying one.
Subprocess doesn't work. The reason? No idea ;) Twisted installs default signal handlers for EVERYTHING, which results in some odd behavior. The specific problem we are running into with Enomalism is here (via bug 2535):
Twisted currently needs to install a SIGCHLD handler for reactor.spawnProcess to work reliably. This is problematic when Twisted is used with other libraries that depend on SIGCHLD (e.g. the standard library's subprocess module depends on SIGCHLD being left as SIG_DFL, otherwise it can get EINTR at unexpected times). Hence the reactor can be run with the installSignalHandlers=False flag, but then reactor.spawnProcess isn't reliable.
Seriously though, launching subprocess calls from inside Twisted Python is a good way to end up with a dead reactor. To get around this, make sure that you call your subprocess Popens from inside a fork whenever possible. This means that if you pull in libraries that need to use subprocess, the Twisted web methods MUST fork before calling them. You can check the outcome using return codes, or redirect stdout to a file descriptor if necessary. Actually, if you want (unlike me) to not be an idiot, this little snippet completely works around the problem AFAICT, except for the fact that reactor.spawnProcess will no longer work (nope. It still crashes, it just takes longer):
THERE IS A COMPLETE WORKING LIBRARY POSTED WAY BELOW
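As a rough illustration of the "fork first, then call subprocess" advice above, here is a minimal sketch. It is not the library mentioned in the post. The helper name run_in_fork is hypothetical, it assumes a POSIX system, and it blocks the caller until the child finishes, so treat it as a starting point only.

```python
import os
import signal
import subprocess
import sys


def run_in_fork(argv):
    """Run argv through subprocess inside a forked child and return its exit code.

    Hypothetical helper, not part of Twisted or Enomalism.
    """
    pid = os.fork()
    if pid == 0:
        # Child: restore the default SIGCHLD disposition that subprocess
        # expects, so any handler inherited from the parent is gone.
        exit_code = 1
        try:
            signal.signal(signal.SIGCHLD, signal.SIG_DFL)
            exit_code = subprocess.call(argv)
        finally:
            # os._exit skips cleanup handlers inherited from the parent process.
            os._exit(exit_code & 0xFF)

    # Parent: block until the forked child exits, then decode its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)


if __name__ == "__main__":
    # The child interpreter exits with status 3, which is passed back up.
    print(run_in_fork([sys.executable, "-c", "import sys; sys.exit(3)"]))
```

The command's result travels back as the forked child's exit code, which matches the "check the return codes" suggestion. Anything richer would need a pipe or a file.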
Ruby on Rails Workarounds
Wed, 05/16/2007 - 06:15 — Derek Anderson
I ran into some nasty bugs in RoR this weekend, so I thought I would share the fixes with you briefly, and perhaps get somebody to email me with a link to a fix to a separate nasty bug I have not been able to remedy...
This is what I was seeing on a new rails install inside of Webrick (mod_ruby/CGI/fastCGI failed to work at all):
Expected /var/www/vhosts/NAMEOFCLIENTGOESHERE/rubysite/controllers/home_controller.rb to define HomeController
No end of messing with the config would help, but I found some useful advice here: Slicehost Forum Post. Reinstall an old rails. LAME!
gem uninstall rails
gem install -v=1.1.6 rails
Admittedly, I had to rewrite some stuff with the refactorings on TurboGears, but at least I could figure it out. This rails thing was totally impenetrable.
Next, I ran into a wall trying to figure out how to get FastCGI working. FastCGI works on a freshly created rails app, but fails utterly on this older one. It is definitely something to do with the app specifically. Has anybody seen this error before (and fixed it)?
Dispatcher failed to catch: You have a nil object when you didn't expect it!
You might have expected an instance of Array.
The error occurred while evaluating nil.split (NoMethodError)
/usr/lib/ruby/1.8/cgi.rb:897:in `parse'
/usr/lib/ruby/gems/1.8/gems/actionpack-1.13.3/lib/action_controller/cgi_ext/raw_post_data_fix.rb:45:in `initialize_query'
/usr/lib/ruby/1.8/cgi.rb:2274:in `initialize'
/usr/lib/ruby/gems/1.8/gems/fcgi-0.8.7/lib/fcgi.rb:606:in `new'
/usr/lib/ruby/gems/1.8/gems/fcgi-0.8.7/lib/fcgi.rb:606:in `each_cgi'
/usr/lib/ruby/gems/1.8/gems/rails-1.2.3/lib/fcgi_handler.rb:141:in `process_each_request!'
/usr/lib/ruby/gems/1.8/gems/rails-1.2.3/lib/fcgi_handler.rb:55:in `process!'
/usr/lib/ruby/gems/1.8/gems/rails-1.2.3/lib/fcgi_handler.rb:25:in `process!'
./dispatch.fcgi:24
killed by this error
FOLLOWUP: Found the nastiest Enomalism bug yet!
Mon, 05/14/2007 - 15:54 — Derek Anderson
So... Even when you think you have it all figured out, sometimes, you are wrong.
Enomalism seemed to live a lot longer once SimpleFirewall was removed, but it would still go down overnight when I DoS-attacked it with some automated JavaScript stuff. By the way, if you have not yet checked out the Selenium project, you really should. Awesome automated regression testing framework.
Anyways, I did a LOT of digging, and wrote a test harness in addition to my NOSE tests that re-ran tight loops on the various low-level XEN services in my API, and found another bug. It turns out that when you retrieve (or try to retrieve) details for a non-running xen domain enough times, sooner or later, the XenD socket library starts leaking sockets. I think. I only know that it blocks on the thread forever, leading to a socket being used up. I also discovered that it is not only non-re-entrant, but also non-concurrent!
Solution:
- Thread locks on all cheezy API calls
- Cache all running machine states, and request a list of running ones only to avoid the dead machine bug
- More regression testing :(
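The first two bullets can be sketched roughly as follows. This is only an illustration and not Enomalism's actual code. The class name SerializedXenClient, the backend methods list_running() and get_details(), and the ttl value are all hypothetical stand-ins for whatever the real XenD wrapper exposes.

```python
import threading
import time


class SerializedXenClient:
    """Serialize calls to a non-concurrent backend and cache the running list.

    `backend` is a hypothetical object with list_running() and
    get_details(name); it stands in for the real XenD wrapper.
    """

    def __init__(self, backend, ttl=5.0):
        self._backend = backend
        self._lock = threading.Lock()   # one backend call at a time
        self._ttl = ttl                 # seconds before the cache is refreshed
        self._running = frozenset()
        self._fetched_at = None

    def _running_locked(self):
        # Caller must hold self._lock. Refresh the cached set when it is stale.
        now = time.monotonic()
        if self._fetched_at is None or now - self._fetched_at > self._ttl:
            self._running = frozenset(self._backend.list_running())
            self._fetched_at = now
        return self._running

    def running(self):
        with self._lock:
            return self._running_locked()

    def details(self, name):
        with self._lock:
            # Never ask the backend about a domain that is not running,
            # since that is the call that eventually hangs.
            if name not in self._running_locked():
                return None
            return self._backend.get_details(name)
```

A single lock is the bluntest option. It trades throughput for safety, which seems reasonable when the underlying library is neither re-entrant nor concurrent.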
Found the nastiest Enomalism bug yet!
Thu, 05/10/2007 - 06:51 — Derek Anderson
Wow. That was a painful week and a half.
Periodically, Enomalism would hang, and I could not find out why. The system would load up with CLOSE_WAIT sockets, and Enomalism would fall over (not to be confused with the "Apache is stupid" bug which is a separate problem which has been remedied with nginx. Trust the Russians to generate a nearly unknown, indestructible, high quality web server, with all of lighttpd's functionality, but without the memory leaks, and in half the RAM).
Note to Derek: Brackets considered harmful, especially when they contain run on sentences.
So, right, the bug again... When Enomalism starts and stops machines, and has the SimpleFirewall (note to self, bad name) module installed, iptables is called in subprocess, and this shuts down the firewall for that machine (or starts it). Starting is no problem, since we perform a series of operations out of sync, but stopping is more problematic. When we stop the machines, iptables is called with -F (to flush the filter chain), -D (to remove the jump to the filter chain), and then -X (to delete the old filter chain). The problem is that sometimes iptables hangs indefinitely, which blocks the next filter chain, and worse, results in a futex_wait on the subprocess completing. The solution? Not sure yet. I am sure that I will be able to work around it nicely, but the sequential blocking behavior is necessary to prevent early shutdown of the firewall, and also to do the operations without stepping on each other's toes. Perhaps a clever del P in the right place? More details in a couple of days!
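For reference, the three-step teardown described above could look something like this. It is a sketch only and not the fix that was eventually used. The function names, the FORWARD parent chain, and the 10-second timeout are assumptions. The timeout is one possible way to keep a hung iptables from blocking the caller forever while still running the steps strictly in order.

```python
import subprocess


def teardown_commands(chain, parent="FORWARD"):
    """Build the flush / unlink / delete sequence for one per-machine chain.

    `parent` is an assumption: whichever built-in chain holds the jump rule.
    """
    return [
        ["iptables", "-F", chain],                # flush the filter chain
        ["iptables", "-D", parent, "-j", chain],  # remove the jump to it
        ["iptables", "-X", chain],                # delete the now-empty chain
    ]


def run_teardown(chain, timeout=10.0, runner=subprocess.run):
    """Run the steps sequentially; give up at the first hang or failure."""
    for cmd in teardown_commands(chain):
        try:
            # subprocess.run kills the child and raises if the timeout expires.
            result = runner(cmd, timeout=timeout)
        except subprocess.TimeoutExpired:
            return False
        if result.returncode != 0:
            return False
    return True
```

The runner parameter exists only so the sequence can be exercised without touching a real firewall.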
Could not find rails (> 0) in any repository
Wed, 05/09/2007 - 18:48 — Derek Anderson
The problem is solved by following the instructions here: Delete Your RubyGems Cache. Here is the secret sauce:
gem env
#which will give you something like this:
RubyGems Environment:
- VERSION: 0.9.2 (0.9.2)
- INSTALLATION DIRECTORY: /usr/lib/ruby/gems/1.8
- GEM PATH:
- /usr/lib/ruby/gems/1.8
- REMOTE SOURCES:
- http://gems.rubyforge.org
Next you type the following:
Five things mainstream XEN distributions (other than Xensource) get wrong.
Mon, 05/07/2007 - 19:11 — Derek Anderson
So, there are some things wrong with the stock XEN setups in a lot of the mainstream operating systems. Not to <cough>Centos5/RHEL5</cough> name any names. I thought I would share the most egregious ones with you today :)
So here are some use cases:
- You want to download jailtime.org images
- You want to use Enomalism
- You want to be able to resize your hard drive image some day
- You want to move partitions between DomU images (this is actually very convenient)
- You hate pyGrub and want to avoid it.
Virtual Machine Upgrades and VMCasting
Mon, 05/07/2007 - 17:44 — Derek Anderson
I just read this story: RSS Tapped to update virtual machines, and thought I would comment on it a little bit.
First of all, VMCasting is designed to provide a distribution method for virtual machines. Other people have tried to put something together before us, and it never really stuck. I am not saying we have it all figured out, but we have something that works with our Enomalism platform, and that is good enough for me right now. Also, we are not trying to provide an upgrade path for running machines anyways, just for the images. I am sure that our management for this will get better as time passes, but for now, again, this works.
As for the upgrading of instances/VMs, I think that people really don't understand the entire purpose of the VM in the first place. I mean, the idea is that you can aggregate servers together, sure, but also you can use different distributions and leverage their different strengths, and that includes upgrade paths. People have spent YEARS on these upgrade systems. VM management is not the silver bullet for that, and we are not going to surpass the richness and flexibility of existing package management any time soon.
Do I think that there is a better way? Sure. This field is immature, and there are a lot of players. rPath has some fantastic ideas, as long as you can stump the $$$ to use their platform. We are targeting a more fully open (as in no lock-in) platform, so we are investigating other methods. Of course, since we are fully modular, I would love to develop a plugin to pull rPath machines. I think that would be hella sexy.
NGINX+FastCGI+PHP+Python==teh funs!
Sun, 05/06/2007 - 19:58 — Derek Anderson
I have started using nginx web server for testing Enomalism (to supply the https proxy), and I am really impressed. I also tried to run some php/static serving stuff over it, and the speed increase blew me away. Then I checked out the capacity to handle massively concurrent connections, and was REALLY blown away. I was so excited I actually sent benchmark results to Khaz.
Anyhow, I think that I would like to investigate using nginx with fastCGI and PHP for future high traffic sites, and I am thinking that the combo might actually be the magic bullet for memory heavy sites like Typo3 where the apache process, even with extraneous modules disabled, easily adds another 10M or so per pageview. This becomes even more important on VPS servers where RAM is the most limiting resource you have.
As an added bonus, nobody is targeting nginx with attacks right now.
Benchmarks here:
http://blog.kovyrin.net/2006/08/28/ruby-performance-results/


