Ruby Blues in OpenShift Origin v2 (or How I broke my app by installing the New Relic gem)

The day I tried to install the New Relic RPM Ruby gem, it caused almost a day of downtime for a Rails application run by Phusion Passenger. It turns out there are several things worth noting about Ruby dependency management.

The First Error

The error message behind the downtime looked like this:

Ruby (Rack) application could not be started

These are the possible causes:
  • There may be a syntax error in the application's code. Please check for such errors and fix them.
  • A required library may not installed. Please install all libraries that this application requires.
  • The application may not be properly configured. Please check whether all configuration files are written correctly, fix any incorrect configurations, and restart this application.
  • A service that the application relies on (such as the database server or the Ferret search engine server) may not have been started. Please start that service.
Further information about the error may have been written to the application's log file. Please check it in order to analyse the problem.
Error message:
Could not find newrelic_rpm-3.18.1.330 in any of the sources (Bundler::GemNotFound)
Exception class:
PhusionPassenger::UnknownError
Application root:
/var/www/openshift/console

The nature of this error is tightly coupled to how 'bundle' and 'gem install' work.

Bundle install and Gemfile.lock

When we execute bundle install, bundle will:
1) connect to the internet to check for the latest gem versions
2) sometimes update Gemfile.lock when it finds a newer gem version (I still need to explore exactly when it does)
3) install the gems using the standard gem environment
Any of these three can break your application under some circumstances. In my case, the environment running the application is a bit different from the environment on the command line.
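
To see whether a bundle install run actually touched the lock file, one option is to compare the locked versions before and after. The sketch below is only an illustration: the helper names locked_versions and lock_changes are made up for this post, and the parser relies on the usual Gemfile.lock layout in which resolved gems are indented by exactly four spaces. It assumes a saved copy of the old lock file exists, such as the Gemfile.lock.copy in the console directory.

```ruby
# Extract "name => version" pairs from the specs sections of a Gemfile.lock.
# Resolved gems sit at exactly four spaces of indentation; their own
# dependencies are indented six spaces and are skipped here.
def locked_versions(lockfile_text)
  versions = {}
  lockfile_text.each_line do |line|
    match = line.chomp.match(/\A {4}(\S+) \(([^)]+)\)\z/)
    versions[match[1]] = match[2] if match
  end
  versions
end

# Report gems whose locked version differs between two lock files.
# A nil on either side means the gem was added or removed.
def lock_changes(before_text, after_text)
  before = locked_versions(before_text)
  after = locked_versions(after_text)
  changes = {}
  (before.keys | after.keys).sort.each do |name|
    next if before[name] == after[name]
    changes[name] = [before[name], after[name]]
  end
  changes
end

# Hypothetical usage: run from the application directory, if both files exist.
if __FILE__ == $PROGRAM_NAME && File.exist?("Gemfile.lock.copy") && File.exist?("Gemfile.lock")
  changes = lock_changes(File.read("Gemfile.lock.copy"), File.read("Gemfile.lock"))
  changes.each do |name, (old, current)|
    puts "#{name}: #{old || 'absent'} -> #{current || 'absent'}"
  end
end
```

An empty result would suggest that bundle install left the pinned versions alone, so the problem is more likely in where the gems are looked up than in which versions are requested.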

Gem environment

The output of 'gem environment' tells a lot about the situation:

RubyGems Environment:
  - RUBYGEMS VERSION: 1.8.24
  - RUBY VERSION: 1.9.3 (2013-11-22 patchlevel 484) [x86_64-linux]
  - INSTALLATION DIRECTORY: /opt/rh/ruby193/root/usr/local/share/gems
  - RUBY EXECUTABLE: /opt/rh/ruby193/root/usr/bin/ruby
  - EXECUTABLE DIRECTORY: /opt/rh/ruby193/root/usr/local/bin
  - RUBYGEMS PLATFORMS:
    - ruby
    - x86_64-linux
  - GEM PATHS:
     - /opt/rh/ruby193/root/usr/local/share/gems
     - /root/.gem/ruby/1.9.1
     - /opt/rh/ruby193/root/usr/share/gems
  - GEM CONFIGURATION:
     - :update_sources => true
     - :verbose => true
     - :benchmark => false
     - :backtrace => false
     - :bulk_threshold => 1000
  - REMOTE SOURCES:
     - http://rubygems.org/

Note the difference between INSTALLATION DIRECTORY and GEM PATHS. Both /opt/rh/ruby193/root/usr/local/share/gems and /opt/rh/ruby193/root/usr/share/gems appear in GEM PATHS, but gem install puts new gems under the INSTALLATION DIRECTORY, /opt/rh/ruby193/root/usr/local/share/gems.

The environment in which the application runs Ruby is a bit different too:


broker ~ # cat /var/www/openshift/console/script/console_ruby
export LD_LIBRARY_PATH=/opt/rh/ruby193/root/usr/local/lib64:/opt/rh/ruby193/root/usr/lib64:/opt/rh/v8314/root/usr/lib64
export GEM_HOME=/opt/rh/ruby193/root/usr/share/gems
export GEM_PATH=/opt/rh/root/usr/local/share/gems:/opt/rh/ruby193/root/usr/share/gems

ruby193-ruby $@

So the first GEM_PATH entry is missing the ruby193 component: it reads /opt/rh/root/usr/local/share/gems instead of /opt/rh/ruby193/root/usr/local/share/gems. Without it, gems installed in the actual local directory cannot be found by the application, which causes the error. What's with the local directory anyway? This is not the first time I have been confused by the Linux/Unix directory scheme. It has historical significance, but IMHO it is better not to have too many directories nowadays.
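
A broken entry like this can be spotted with a few lines of Ruby. This is a minimal sketch, not part of OpenShift: missing_gem_dirs is a name invented here, and it only checks that each GEM_PATH entry is an existing directory.

```ruby
# Split a GEM_PATH-style string and return the entries that are not
# existing directories on this machine.
def missing_gem_dirs(gem_path)
  gem_path.split(File::PATH_SEPARATOR)
          .reject(&:empty?)
          .reject { |dir| File.directory?(dir) }
end

# When run as a script, check the GEM_PATH of the current environment.
if __FILE__ == $PROGRAM_NAME
  missing_gem_dirs(ENV.fetch("GEM_PATH", "")).each do |dir|
    warn "GEM_PATH entry does not exist: #{dir}"
  end
end
```

The check is only meaningful when it runs with the same variables as the application. Since console_ruby exports them and then passes its arguments to ruby193-ruby, running the script through that wrapper should presumably reproduce the application's view.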

Solution to Problem I (or how to install newrelic_rpm in the OpenShift broker & console)

Instead of fixing the GEM_PATH, I opted to install the gem into the /usr/share/gems directory by using --install-dir:


gem install newrelic_rpm -v 3.18.1.330 --install-dir=/opt/rh/ruby193/root/usr/share/gems

Then add this one-liner to /var/www/openshift/broker/Gemfile:

gem 'newrelic_rpm', '3.18.1.330'
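
To confirm that the gem landed somewhere the application can see, one can look for its gemspec in each GEM_PATH directory. The sketch below assumes the standard RubyGems layout, where a pure-Ruby gem is registered as specifications/<name>-<version>.gemspec under the gem directory; dirs_with_gem is a hypothetical helper name, and platform-specific gems (which carry a platform suffix in the file name) are not handled.

```ruby
# Return the directories in gem_path that contain a gemspec for the given
# gem name and version, using the layout
# <dir>/specifications/<name>-<version>.gemspec.
def dirs_with_gem(gem_path, name, version)
  gem_path.split(File::PATH_SEPARATOR).select do |dir|
    File.file?(File.join(dir, "specifications", "#{name}-#{version}.gemspec"))
  end
end

# When run as a script, look for the gem from this post in the current GEM_PATH.
if __FILE__ == $PROGRAM_NAME
  found = dirs_with_gem(ENV.fetch("GEM_PATH", ""), "newrelic_rpm", "3.18.1.330")
  puts found.empty? ? "newrelic_rpm 3.18.1.330 not found in GEM_PATH" : found
end
```

An empty result under the application's environment would match the Bundler::GemNotFound error shown earlier.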


Second error

Actually, before I found the solution above, another error cropped up with these symptoms:


broker console # bundle install --local
Could not find rake-0.9.2.2 in any of the sources
broker console # bundle install
Fetching gem metadata from https://rubygems.org/...........
Fetching gem metadata from https://rubygems.org/..
Could not find openshift-origin-console-1.26.3.1 in any of the sources

As someone who is not familiar with Ruby development, I found these two errors confusing. The first one, where bundle cannot find the rake gem, is very strange, because version 0.9.2.2 is right there:
broker console # gem list rake

*** LOCAL GEMS ***

rake (0.9.2.2)
broker console # ls -l /opt/rh/ruby193/root/usr/share/gems/gems | grep rake
drwxr-xr-x.  4 root root 4096 Nov 18  2014 rake-0.9.2.2
broker console #

Because version 0.9.2.2 is available on the 'net (see https://rubygems.org/gems/rake/versions), I tried going online by removing the --local flag. This time bundle complained about openshift-origin-console, which is not available on the public internet.

The clue to this second error is the output of the bundle config command:
broker console # bundle config
Settings are listed in order of priority. The top value will be used.

path
Set for your local app (/var/www/openshift/console/.bundle/config): "vendor"

disable_shared_gems
Set for your local app (/var/www/openshift/console/.bundle/config): "1"

This differs from the neighboring application, which has no settings at all:

broker broker # bundle config
Settings are listed in order of priority. The top value will be used.

broker broker # 

Solution to problem II

The solution is to remove the .bundle/config file, which was inadvertently created when I tried running bundle install --path vendor. The file redirects local gem searches to the ./vendor directory, skipping the /opt/rh/ruby193/root/usr/share/gems directory and causing the second error. Bundler was unable to find the openshift gem because it was only searching the ./vendor directory.


broker console # ls -al
total 104
drwxr-x---. 14 apache apache  4096 Dec 10 20:46 .
drwxr-xr-x.  4 root   root    4096 Jun 15  2014 ..
drwxr-x---.  8 apache apache  4096 Dec  8 08:05 app
drwxr-xr-x.  2 root   root    4096 Dec 10 20:46 .bundle
drwxr-x---.  5 apache apache  4096 Dec  8 08:38 config
-rw-r-----.  1 apache apache   166 Jul 11  2014 config.ru
-rw-r-----.  1 apache apache   809 Dec  8 08:37 Gemfile
-rw-r--r--.  1 root   root    3487 Dec 10 20:31 Gemfile.lock
-rw-r--r--.  1 root   root    3453 Dec  8 07:18 Gemfile.lock.copy
drwxr-x---.  6 apache apache  4096 Dec  9 13:12 httpd
drwxr-x---.  2 apache apache  4096 Dec  8 08:38 log
drwxr-x---.  2 apache apache  4096 Jul 11  2014 .openshift
-rw-r-----.  1 apache apache 11754 Jul 11  2014 openshift-origin-console.spec
drwxr-x---.  3 apache apache  4096 Dec  8 08:05 public
-rw-r-----.  1 apache apache   398 Jul 11  2014 Rakefile
-rw-r-----.  1 apache apache  9208 Jul 11  2014 README.rdoc
drwxr-x---.  2 apache apache  4096 Jul 11  2014 run
drwxr-x---.  2 apache apache  4096 Dec  9 13:12 script
drwxr-x---.  7 apache apache  4096 Dec  8 08:05 test
drwxr-x---.  6 apache apache  4096 Dec  8 08:05 tmp
drwxr-x---.  6 apache apache  4096 Dec 10 20:46 vendor
-rw-r--r--.  1 root   root    1485 Dec 10 20:27 versionlist.log
broker console # rm -rf .bundle
broker console #

Now I can run bundle install --local again.
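
A stray .bundle/config can also be caught before it causes trouble. The file is a small YAML mapping in which the two settings reported by bundle config are stored under the keys BUNDLE_PATH and BUNDLE_DISABLE_SHARED_GEMS. The sketch below is only an illustration, and gem_location_settings is a made-up helper name.

```ruby
require "yaml"

# Parse the text of a .bundle/config file and return only the settings
# that change where Bundler looks for gems.
def gem_location_settings(config_text)
  settings = YAML.load(config_text)
  settings = {} unless settings.is_a?(Hash)
  watched = %w[BUNDLE_PATH BUNDLE_DISABLE_SHARED_GEMS]
  settings.select { |key, _value| watched.include?(key) }
end

# When run as a script from the application directory, warn about overrides.
if __FILE__ == $PROGRAM_NAME && File.exist?(".bundle/config")
  gem_location_settings(File.read(".bundle/config")).each do |key, value|
    warn "#{key} is set to #{value.inspect} in .bundle/config"
  end
end
```

Any output here means Bundler is not searching the system gem directories the way the neighboring application does.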

Lessons learned

What I learned is this: bundle install is evil. It installs new gems and changes application dependencies, and sometimes this breaks your app (especially if your app was written by Red Hat and has a broken GEM_PATH entry). You must understand bundle's basic mechanism, because incorrect parameters can break your application.
