Practical Development: 2009

Monday, November 23, 2009

Recursion in Ruby

I recently took part in the RPCFN. This month's challenge was a shortest-path problem, and I've done enough discrete maths papers to know that the best graph traversal algorithm for this is Dijkstra's least-cost path, so I went away and implemented it in a couple of hours. Many others did exactly the same thing, often much more elegantly than me. What struck me was that the markers were quite obviously looking for people to dream up their own method of finding the shortest path, and had a massive bias towards recursive algorithms.

Now the winning submission is a very nice piece of code: it's concise, easy to read, simple and well tested, and it clearly lays out its goal, which is to find a recursive solution to the problem without regard for performance. It performed OK, but I did wonder how the recursion would impact performance on larger problem sets.

So I took Todd's code and refactored mine to work with his data structures. I also changed up my code to match his module's interface and packaged both implementations into their own classes so that there wouldn't be clashes while trying to test them. (Modules are for adding abilities to classes, not for providing whole feature sets; if you see yourself including a module into main, you're doing it wrong...)

And here it is.

I ran benchmarks on a low-end computer (Pentium 2.8) against 3 graphs: one with 2 nodes and 2 edges, the RPCFN example with 7 nodes and 10 edges, and a larger, more complicated one with 7 nodes and 21 edges. All of this is in the code linked above. Here are the results over 10000 iterations:

Simple Graph:
Iterative Approach took 1.54700second(s).
Recursive Approach took 0.95300second(s).
RPCFN Graph:
Iterative Approach took 6.37500second(s).
Recursive Approach took 20.62500second(s).
Large Graph:
Iterative Approach took 10.70300second(s).
Recursive Approach took 472.96200second(s).

The numbers speak for themselves: this approach doesn't scale at all well. In fact, at first I thought I'd broken some of Todd's code and created an infinite loop of some sort. It's more than likely that the recursive function could be tweaked to get better performance, but the short version is that if I were ever to need to implement graph traversal in Ruby for any production system, recursion would be the last approach I'd look at.
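For anyone wanting to run this kind of comparison themselves, a timing harness only takes a few lines. The following is a minimal sketch and not the harness from the linked code: the method name time_iterations, its parameters and the stand-in workload are all assumptions made for illustration.

```ruby
# Time a block over a number of iterations and report it in the same style
# as the results above. The method name and signature are hypothetical.
def time_iterations(label, iterations)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  puts format('%s took %.5fsecond(s).', label, elapsed)
  elapsed
end

# Stand-in workload; swap in a call to the shortest-path implementation under test.
time_iterations('Example Approach', 10_000) { (1..50).reduce(:+) }
```

Using the monotonic clock rather than Time.now keeps the measurement from being skewed if the system clock is adjusted mid-run.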

Final word
Todd's decision to use a recursive solution in this case was totally justified; readability and simplicity were massive factors in this event. That said, I believe that outside of the smaller, more contrived problem scopes, recursion brings a lot more headaches and scaling issues than the iterative approaches.
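To make "the iterative approach" concrete, here is a minimal sketch of an iterative Dijkstra in Ruby. It is not the code from either submission: the graph layout (a hash of node => { neighbour => cost }), the method name shortest_path and the example graph are assumptions made for illustration, and it uses a linear scan instead of a priority queue to stay short.

```ruby
# Iterative Dijkstra. Returns [total_cost, path] or nil if the target is unreachable.
# The graph layout is an assumption for this sketch, not the RPCFN data structure.
def shortest_path(graph, source, target)
  dist = Hash.new(Float::INFINITY)
  prev = {}
  dist[source] = 0
  unvisited = graph.keys

  until unvisited.empty?
    # Pick the closest unvisited node (linear scan; a priority queue would be faster).
    current = unvisited.min_by { |node| dist[node] }
    break if dist[current] == Float::INFINITY || current == target
    unvisited.delete(current)

    # Relax every edge leaving the current node.
    graph[current].each do |neighbour, cost|
      candidate = dist[current] + cost
      if candidate < dist[neighbour]
        dist[neighbour] = candidate
        prev[neighbour] = current
      end
    end
  end

  return nil if dist[target] == Float::INFINITY

  # Walk the predecessor chain back from the target to rebuild the path.
  path = [target]
  path.unshift(prev[path.first]) while prev.key?(path.first)
  [dist[target], path]
end

# Hypothetical example graph with symmetric edge costs.
EXAMPLE_GRAPH = {
  'a' => { 'b' => 7, 'c' => 9, 'f' => 14 },
  'b' => { 'a' => 7, 'c' => 10, 'd' => 15 },
  'c' => { 'a' => 9, 'b' => 10, 'd' => 11, 'f' => 2 },
  'd' => { 'b' => 15, 'c' => 11, 'e' => 6 },
  'e' => { 'd' => 6, 'f' => 9 },
  'f' => { 'a' => 14, 'c' => 2, 'e' => 9 }
}

p shortest_path(EXAMPLE_GRAPH, 'a', 'e') # => [20, ["a", "c", "f", "e"]]
```

Each node is settled once and each edge is relaxed once per settled node, which is why this style of solution keeps its running time in check as edges are added.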

Exception Errno::ENOENT in PhusionPassenger

As Passenger fails so rarely there's not much documentation on the error messages, so I'm posting this for the benefit of googlers everywhere... Scroll to the bottom for the Cliffs Notes.

Passenger is a fantastic piece of software, I can't praise it highly enough. It's brought Ruby deployments from the complexity of reverse proxying to a mongrel cluster up to being on par of ease with PHP, a difficult and noteworthy achievement that can only increase the adoption of Rack based technologies in the future.

That said, in very rare cases the Passenger spawner blows up in spectacular ways that the old Mongrel deployments never even dreamed of. When this happens all of your current and future Rack instances running on Passenger will respond to every request with a 500 error, your only course of action being a full Apache restart, and since it's so rare there's usually nothing that can be found on the net to fix it.

Here's an example from one of my company's Sinatra services:


*** Exception Errno::ENOENT in PhusionPassenger::Rack::ApplicationSpawner (No such file or directory - /tmp/passenger.19250/backends/backend.Hoa3NPf9xWIsBy3D4sBZtTKE1DgjhTKjbSpdwSPWW6FakeTwddzh2hwteMo6qYecQCzgV) (process 8942):
      from /opt/ruby-enterprise-1.8.6-20090610/lib/ruby/gems/1.8/gems/passenger-2.2.4/lib/phusion_passenger/abstract_request_handler.rb:279:in `initialize'
      from /opt/ruby-enterprise-1.8.6-20090610/lib/ruby/gems/1.8/gems/passenger-
...snip...
[ pid=725 file=ext/apache2/Hooks.cpp:688 time=2009-11-24 06:31:02.381 ]:
Unexpected error in mod_passenger: Cannot spawn application '/opt/number_service': The spawn server has exited unexpectedly.
Backtrace:
   in 'virtual boost::shared_ptr Passenger::ApplicationPoolServer::Client::get(const Passenger::PoolOptions&)' (ApplicationPoolServer.h:471)
   in 'int Hooks::handleRequest(request_rec*)' (Hooks.cpp:485)

The root cause here is actually this ticket - the /tmp/passenger.****/ folder will be almost empty because some temp watcher cleaned it out in a quiet period. The ticket has been closed but unless you feel like building from source you'll have to wait until the next release to fix the issue.

Our fix was to move the Passenger temp directory out of /tmp in the Apache httpd.conf to stop temp watchers from nuking our deployments. Instructions on setting this up are found in the fantastic Passenger documentation.

Wednesday, September 16, 2009

Serving Rails sites using Passenger with SSL

This comes up a lot in the rails community, and it's not too difficult to do.

This brief guide assumes that you have a working site hosted on a Linux server using Passenger (and also having installed mod_ssl for Apache) and that you want to secure the whole thing using SSL. If not, you'll need to add extra steps like deploying the project, installing the gems, running your migrations, installing mod_ssl and Passenger for Apache etc.

We will generate our own (horrible...) temporary SSL cert assuming that later we will get a real one from a company like VeriSign and can swap out the temp one. Also, because the staging environment I've been deploying to runs on XAMPP and we've still needed phpMyAdmin while developing the site, I'll throw in the "Serving a PHP folder from passenger" whirlwind tour (for a much better explanation, see AbleTech's description).

Firstly Passenger will want a log file it can write to.
You'll want your Passenger logs to go into a log file in your Rails site's log folder (near the end of the post you'll see I've configured "ErrorLog /opt/railssite/log/apache.log" in my httpd.conf). Create one in your Rails app's log directory and run "chmod 0666" against it so that Apache will have read/write rights. If you don't, Apache will write all of the Passenger logging to the standard error_log and you'll waste time trying to see why your site isn't working like you thought it should.

Second create your SSL key/crt files.
Create a folder to hold your ssl key/crt files. We're going to use openssl to generate our key and certs so if your flavor of Linux doesn't have it installed you'll need to use apt or yum or something to get it. Here's the brief rundown on how to generate a valid key/cert with openssl, with the password "password". You may need to do this as root.

Run "openssl genrsa -des3 -out your.domain.name.key 1024"
When prompted enter the passphrase "password"
This generates the file your.domain.name.key. Because you want to get into good security habits early, run "chmod 400" against this file. This makes the key read-only for its owner (root, if you generated it as root) and inaccessible to everyone else.

To generate the self signed cert from your key run "openssl req -new -key your.domain.name.key -x509 -out your.domain.name.crt"
This generates the file your.domain.name.crt which is the certificate that we're going to use to secure this site.

Thirdly configure Apache's httpd.conf.
We're going to forward all traffic that comes in on port 80 to port 443 by configuring a RewriteRule in a virtual host listening on port 80 to forward all requests to the virtual host listening on port 443. You don't have to do this but users are easy to frustrate. Since HTTP is roughly equivalent to HTTPS in the mind of the average end user while they're typing a URL, making both "just work" is easy and highly recommended.

For the purposes of the following httpd.conf snippet, my SSL key/crt files were located in /opt/lampp/ssl and my Rails site was in /opt/railssite


<VirtualHost *:80>
  RewriteEngine On
  RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [L,R=permanent]
</VirtualHost>

<IfDefine SSL>
  <VirtualHost *:443>
      ServerName your.domain.name
      RailsEnv staging
      DocumentRoot /opt/railssite/public
      ErrorLog /opt/railssite/log/apache.log
      SSLEngine on
      SSLCertificateFile /opt/lampp/ssl/your.domain.name.crt
      SSLCertificateKeyFile /opt/lampp/ssl/your.domain.name.key
      SSLProtocol all -SSLv2
      SetEnvIf User-Agent ".*MSIE.*" nokeepalive ssl-unclean-shutdown
   
      <Location /phpmyadmin>
          PassengerEnabled off
      </Location>
  </VirtualHost>
</IfDefine>

A few notes on the above.
"PassengerEnabled off" in the location phpmyadmin ensures that when someone requests your.domain.name/phpmyadmin Passenger doesn't try to serve it up and PHP takes over. If Apache can't make the jump and find the directory you need it to serve up on its own, add "Alias /phpmyadmin /dir/to/phpmyadmin" above the Location node to force it to look in the right physical location. This is particularly useful if you need to serve up something like a PHP blog alongside your Rails application, and want to keep the source code for this in the Rails project.

Often you see people configure the Rails document root in both virtual hosts. It's not necessary and, depending on how your copy of Apache is running, it can cause Passenger to spin up a new Rails instance just to have the user forwarded to the secured site. On a heavy usage site this could cause a lot of memory to be consumed for no reason.

It goes without saying that the secured virtualhost (443) should be configured within an <IfDefine SSL> node. This ensures that the site isn't served up unsecured if SSL fails for some reason.

Monday, August 24, 2009

Drop-down selections for belongs_to in Rails

Rails does a lot of work for you out of the box, and with its form builders you can get it to do some things that make you never want to go back to .NET and databind a control ever again...

Example: Recently I needed to bash together a quick admin screen with a couple of models. We have an Organisation model that belongs_to :region and a Region model that has_many :organisations. This dictates a foreign key in organisation that points at a single region. In our generated Rails forms we enter that as a number, but it'd really be a nicer user experience to select the region from a drop-down list, right?

Enter collection_select. This little helper generates a selection drop-down from a collection (not surprisingly). So if in my organisations controller's edit method I set @regions = Region.all, we can get instant Rails gratification.

Finally, since Edit and New use EXACTLY the same form, we're going to try to be DRY and pull all the duplicated code out into a partial. I usually use the default form builder to base it on (you know, "form_for(@organisation) do |f|" ) and call "render :partial => f"

When this renders it looks for the partial pointed at by f, which is a form, so _form.html.erb will be rendered. A final note: even though the partial here will be able to see @regions, I like to explicitly pass it in the locals hash, so I know where the heck it came from in 6 months' time. If that's confusing you can just strip out that code and use the instance variable set in the organisations controller. Here's the code:

models/organisation.rb


class Organisation < ActiveRecord::Base
  belongs_to :region
end

models/region.rb


class Region < ActiveRecord::Base
  has_many :organisations
end

controllers/organisations_controller.rb


class OrganisationsController < ApplicationController
  # GET /organisations/new
  # GET /organisations/new.xml
  def new
    @organisation = Organisation.new
    @regions = Region.all

    respond_to do |format|
      format.html # new.html.erb
      format.xml  { render :xml => @organisation }
    end
  end

  # GET /organisations/1/edit
  def edit
    @organisation = Organisation.find(params[:id])
    @regions = Region.all
  end
end

views/organisations/_form.html.erb


<%= form.error_messages %>

<p>
  <%= form.label :org_name %><br />
  <%= form.text_field :org_name %>
</p>

<p>
  <%= form.label :org_code %><br />
  <%= form.text_field :org_code %>
</p>

<p>
  <%= form.label :provider %><br />
  <%= form.text_field :provider %>
</p>

<p>
  <%= form.label :region_id %><br />
  <%= collection_select(:organisation, :region_id, regions, :id, :full_name, {:prompt => false}) %>
</p>

views/organisations/edit.html.erb


<% form_for(@organisation) do |f| %>
  <%# :regions => @regions exposes the regions array as a local variable in the partial %>
  <%= render :partial => f, :locals => { :regions => @regions } %>

  <p>
    <%= f.submit 'Update' %>
  </p>
<% end %>

Monday, August 17, 2009

SOAP4R clients with proxy and basic authentication

If you go and google "soap4r proxy" you'll get a lot of helpful hits that show you how to set your http_proxy environment variable. Unfortunately there aren't many (read: none) that are devoted to configuring it programmatically. So here it is.

If you run the wsdl2ruby tool to generate code you'll end up with a defaultDriver.rb file. This defines the RPC driver that you can call the web services against. You'll want to ensure that it's going to use the HTTPClient libraries. If you installed it manually you should be fine, but if you installed it with rubygems you'll need to add require 'rubygems' to the top of your file.

Next you'll need to add the basic authentication configuration and the proxy configuration to the SOAP driver (using some horrendously poorly documented features of the SOAP4R library). The best place to do this is in your initialize method. You'll end up with a class that resembles the following (I generated mine from a WSDL pertaining to sending SMS messages):


require 'rubygems'
require 'xsd/qname'
require 'httpclient'
require 'soap/rpc/driver'

class MessagingService < ::SOAP::RPC::Driver
 
  DefaultEndpointUrl = "https://a.secure.soap/endpoint"
  MappingRegistry = ::SOAP::Mapping::Registry.new
 
  Methods = [
    [ XSD::QName.new("https://somesoapgeneratedmappingname", "sendSMS"),
      "",
      "sendSMS",
      [ ["in", "to", ["::SOAP::SOAPString"]],
        ["in", "application", ["::SOAP::SOAPString"]],
        ["in", "message", ["::SOAP::SOAPString"]],
        ["in", "options", ["::SOAP::SOAPInt"]],
        ["retval", "sendSMSReturn", ["::SOAP::SOAPString"]] ],
      { :request_style =>  :rpc, :request_use =>  :encoded,
        :response_style => :rpc, :response_use => :encoded }
    ]
  ]

  def initialize(endpoint_url = nil)
    endpoint_url ||= DefaultEndpointUrl
    super(endpoint_url, nil)
    self.mapping_registry = MappingRegistry
    init_methods
    self.options["protocol.http.basic_auth"] << [endpoint_url,'username','password']
    self.options["protocol.http.proxy"] = "http://yourproxyserver:8080/"
  end

private

  def init_methods
    Methods.each do |definitions|
      opt = definitions.last
      if opt[:request_style] == :document
        add_document_operation(*definitions)
      else
        add_rpc_operation(*definitions)
        qname = definitions[0]
        name = definitions[2]
        if qname.name != name and qname.name.capitalize == name.capitalize
          ::SOAP::Mapping.define_singleton_method(self, qname.name) do |*arg|
            __send__(name, *arg)
          end
        end
      end
    end
  end
end

Thursday, July 16, 2009

Rails with a legacy Oracle DB

It's a bit annoying...
It seems like every RoR tutorial assumes two things: firstly, that you're writing your web application on Mac OS X with TextMate, and secondly, that you're starting with a fresh database schema on MySQL. For a large percentage of people this is correct, but for the rest of us it's more than frustrating.

I am currently investigating using Rails to replace a rather old (think 6 years+) and difficult (JSP, the kind that looks like PHP gone bad) reporting site. The site shows various metrics to our business partners by analyzing SMS traffic sent through our gateway. I need to replace it with something simpler and more maintainable. Rails seemed to fit the bill; however, there are some hurdles.

Firstly, Ruby likes UNIX. I program on a Windows box and deploy to UNIX environments, and there's nothing I can do to change that. Setting up a Rails environment on Windows without resorting to the horribly outdated installers is a feat in itself; I'll cover that in a later post.

Secondly, my company likes Oracle. For the moment. It wouldn't be my first choice on a new project, but our software (with high availability requirements, somewhere in the 99.9% up time region) has been running smoothly on Oracle databases since I was in first year university. There's nothing like a good track record to make loyal followers of upper management.

Thirdly the schemas of the data I need to report on are sorely outdated. Everything is capitalized, there's no standard "id" column on any table, and there's a scattering of prefixes on half the columns. This makes for a slow start with ActiveRecord...

The Oracle to Rails "Stack"
Slow it may be but start we will. My "Secondly" was pretty easy to fix. Basically we need to plug 3 gaps to get ActiveRecord to talk to an Oracle Database, that can be summarised as: Your pc needs a connection to the Oracle Database Server, Ruby needs to see this connection, and ActiveRecord needs to see Ruby's connection. To make it a bit easier (maybe) here's a diagram, drawn in glorious MS Paint =>

Oracle Client: Since I already do a lot of work on Oracle databases I didn't need a client, but if you don't already have one, go find an Oracle client, preferably a thin (hah!) one. Google Oracle XE if you want to have a local Oracle database; it's Oracle's attempt at an express edition, however at over a gigabyte to install on Windows it's not for the netbook developer types...

Ruby-OCI8: Go to your command line and get the Ruby-OCI8 gem via "gem install ruby-oci8". On Windows you'll get the ruby-oci8-2.0.2-x86-mswin32 version. This gem lets Ruby talk to your Oracle client.

Oracle-Enhanced: This gem is an ActiveRecord adapter. Get it by running gem install activerecord-oracle_enhanced-adapter

These different parts will get you a path from ActiveRecord in Rails all the way to the Oracle database you need to work on, but there's a few things we need to tweak in our application's config to get it all the way, see here for a lot of info that is much better than I can give.

Schema woes...
Most things in rails "Just work", so long as you followed EVERY SINGLE CONVENTION. This is a bit much to ask of the developers that designed our legacy database 6 years ago. Luckily you can override and alias enough values to shoehorn your data into a valid ActiveRecord model, and once that (admittedly long and tedious) task is done you can develop like your database schema was never an issue.

I'm going to cover a few of the more common issues by rebuilding a hypothetical (cough) table called ORGANISATION. ORGANISATION has the following columns: OR_ID, FK_RE_ID, OR_NAME, OR_CODE, FK_PMP_ID. As you can see it's kind of well structured, but definitely not "Rails Safe".

Issue 1: Rails expects database tables to be pluralised.
ActiveRecord offers a method set_table_name that takes a string. We'll add set_table_name "organisation" to our model.

Issue 2: Rails wants a column called "id" for its primary key.
ActiveRecord offers a method set_primary_key that takes a string. We'll add set_primary_key "OR_ID" to our model.

OK we're half way there. If we were building from a scaffold command our views might work (maybe) but I can think of a few places where they'll break.

Issue 3: My ERB code is breaking, what the hell?
There are a few issues here. The first is that if we use the scaffold command we'll probably get what we asked for. If we asked for upper case column names (as you'd expect) then the views will break when they try to extract the info in "organisation.FK_RE_ID". Oracle SQL is very lenient on the casing of its table and column names, while something in Rails or our adapters hates uppercase. Use the lowercase "organisation.fk_re_id".

Issue 4: My ERB code is still breaking, what the hell?
Shortcuts... The scaffolded views will invariably at some stage try to call a method that takes the organisation and tries to guess at its id. In the index.html.erb for example (we're scaffolding here) it will try to do something like: "<%= link_to 'Edit', edit_organisation_path(organisation) %>". Since the organisation has no "id" column it will just send everything else. Poof, your code broke.
To fix it you need to alias your primary key as "id" in your model: add alias_attribute :id, :or_id

Final niggling issue 5: I don't like all the unintuitive column names in my views.
Since we've aliased OR_ID to id, we might as well do the same on all our columns. Just remember to alias to LOWERCASED versions of the column names, otherwise your views will throw errors again. Below is the code for the model I've described above, it works a treat.


class Organisation < ActiveRecord::Base
 set_table_name "organisation"
 set_primary_key "OR_ID" 
 
 alias_attribute :id, :or_id
 alias_attribute :region_id, :fk_re_id
 alias_attribute :org_name, :or_name
 alias_attribute :org_code, :or_code
 alias_attribute :provider, :fk_pmp_id 
end

Thursday, June 25, 2009

Installing Ruby on Windows - it's all about the libraries

Ruby on Windows is a bit lackluster, primarily because poor C compiler support makes it hard to compile the source there, the documentation doesn't mention the extra libraries it requires, and the One-Click-Installer is always several versions behind.

The issue here is that Ruby doesn't give you any hints as to where to find the required libraries, or even that it uses them, so installing is a bit of a trial and error affair. The other issue is that a couple of the error messages are a bit cryptic (there's one that states "Ordinal 3873 could not be located in dynamic library..."; that means OpenSSL isn't installed), making it hard to pinpoint the fix.

So: The Cabin has a great tutorial (that I heartily recommend) for installing 1.9 on windows from the zipped binary distro, I found it only after I'd done most of these steps myself.
For those of you who want the Cliffs Notes version, here it is:

Download the Ruby binary distribution from http://www.ruby-lang.org/en/downloads

Add the ruby/bin directory to your system path.

Download the extra libraries from http://gnuwin32.sourceforge.net

Rename zlib1.dll to zlib.dll, libssl32.dll to ssleay32.dll and readline5.dll to readline.dll

And that's it. Installing an up to date Ruby distribution in 4 easy steps.

Tuesday, June 16, 2009

Using dsget and dsquery

Being a developer in the support department in a medium-large organisation forces you to do some odd bits and pieces that would usually be performed by the system admins in a bigger company. So this week I was asked to do a bit of command line trickery using some Active Directory administration tools to extract some user information.

The company I work for wanted mobile phone numbers extracted to files based on the email groups. For example I belong to a group, let's call it "Support" (how original), that is an email group in AD. Management wanted the name and mobile phone number of each person who receives an email when it's sent to "Support" spat out to a file called "Support.csv" in the format "Username", "Mobile Number", to be used for updating an SMS application.

I won't cover the conversion from the odd format that comes out of the tools to CSV; suffice to say it's easy with Python, Ruby or Perl. So here's the command:

dsquery group ou="User Groups",dc=domainname,dc=net -name "Support" | dsget group -members -expand | dsget user -display -mobile -c > c:\Support.txt

In order from left to right, this command runs dsquery to find any group called Support in the User Groups organisational unit on the domainname.net domain. It then runs dsget against it to spit out the full list of member objects; the -expand makes it expand all of the groups below it. Finally it runs dsget against all of these objects and, if an object is a user, pipes the display name and mobile number out to Support.txt. The -c is important: it ensures that any groups that come out of the "dsget group" call don't crash the "dsget user" call.
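As for the conversion to CSV, a minimal Ruby sketch might look like the following. It rests on assumptions about the text the tools produce: that the first non-blank line is a header naming the columns ("display" and "mobile"), that each value starts at the same offset as its header, and that the final line is a status message starting with "dsget". The sample data and the method names dsget_to_csv and csv_field are made up for illustration, so check them against your own Support.txt first.

```ruby
# Quote a value for CSV, doubling any embedded quote characters.
def csv_field(value)
  '"' + value.gsub('"', '""') + '"'
end

# Convert column-aligned "dsget user -display -mobile" text into CSV lines.
# Assumes a header line whose column names mark where each value starts.
def dsget_to_csv(text)
  lines = text.lines.map(&:chomp).reject { |line| line.strip.empty? }
  header = lines.shift
  display_at = header.index('display')
  mobile_at = header.index('mobile')

  # Skip the trailing status line (assumed to start with "dsget").
  rows = lines.reject { |line| line.start_with?('dsget') }
  rows.map do |line|
    name = line[display_at...mobile_at].to_s.strip
    mobile = line[mobile_at..-1].to_s.strip
    "#{csv_field(name)},#{csv_field(mobile)}"
  end.join("\n")
end

# Made-up sample in the assumed layout; the second user has no mobile number.
SAMPLE_OUTPUT = <<TEXT
  display       mobile
  Jane Smith    021 555 0101
  Bob Jones
dsget succeeded
TEXT

puts dsget_to_csv(SAMPLE_OUTPUT)
# "Jane Smith","021 555 0101"
# "Bob Jones",""
```

Slicing by the header offsets rather than splitting on whitespace keeps display names and phone numbers that contain spaces intact.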

Monday, June 1, 2009

Simple SOAP calls over SSL with Ruby

I deal with web services a lot in my day to day, some good, some nightmarish. Having scripts and example code to deal with them makes my job a lot easier. My example last week was a piece of Python code that implemented CONNECT to allow you to make SSL encrypted requests across a proxy in Python. I had to write this code because the company I work for exposes its web services to clients via SSL, and being able to offer example code to leverage our services in multiple languages is a good thing. Not being able to offer example code in a pretty mainstream (in web terms) language is a bad thing.

Therefore I wrote a similar script to last week's example that uses Ruby to post an XML SOAP envelope request to an SSL secured site over a proxy. For enterprise use in apps expecting to make thousands of SOAP calls a day I would recommend building a full SOAP app using SOAP4R or some equivalent framework. However, as this is often overkill for smaller apps that simply want to make a few calls to a web service, doing the request manually as a raw HTTP POST to send the SOAP envelope to the service is usually enough, and doesn't obscure the details of what SOAP really is: an HTTP POST request with a rigorously defined XML payload.

As an aside, to build the SOAP envelope for this example I heartily recommend SoapUI, it allows you to open a WSDL for a particular service and easily generate the XML for each SOAP action. It's really good, especially if you build SOAP web services in your day to day job. It's so good, in fact, that as you get more and more used to using it you might find it replacing the test harnesses you inevitably build to test the web services you build.

Back on topic: because we are building the XML as a string we need to inject any parameters for the SOAP call after we have built it. Luckily Ruby lets us replace substrings in strings pretty easily with the .sub! method. I use it to template my SOAP requests with {1} and {2} etc... Also, to avoid being stung by content length issues, make sure you finalize your XML before creating the headers hash so that your Content-Length (the 'Content-Length' entry in the headers) is correct. If you don't, the next 2 hours will be wasted while you go round in circles with HTTP errors that don't make too much sense.
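The ordering matters enough that it's worth a small sketch: fill in the template first, then measure it. The envelope, its element names and the build_request helper here are hypothetical placeholders rather than any real service's schema. Note that on Ruby 1.9 and later String#length counts characters, so the sketch uses bytesize, which is what Content-Length actually wants.

```ruby
# Hypothetical envelope; the element names are placeholders, not a real schema.
SOAP_TEMPLATE = <<XML
<?xml version="1.0" encoding="utf-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Body>
    <sendMessage>
      <to>{1}</to>
      <text>{2}</text>
    </sendMessage>
  </soapenv:Body>
</soapenv:Envelope>
XML

# Fill in the placeholders, then build the headers from the finished body.
# Values are inserted verbatim; escape them first if they can contain markup.
def build_request(to, text)
  body = SOAP_TEMPLATE.dup
  # The block form of sub! avoids backslashes in the value being treated as
  # back-references.
  body.sub!('{1}') { to }
  body.sub!('{2}') { text }

  headers = {
    'Content-Type' => 'text/xml; charset=utf-8',
    'SOAPAction' => '""',
    # Measured in bytes, after templating, so it matches what goes on the wire.
    'Content-Length' => body.bytesize.to_s
  }
  [body, headers]
end

body, headers = build_request('0215550101', 'Hello from Ruby')
puts headers['Content-Length']
```

Building the headers inside the same method as the substitutions makes it hard to measure the template by accident instead of the finished envelope.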

Anyway, here's the code. You'll notice that it's a lot shorter than the Python code from last week; this is because the standard library gives you native SSL over proxy support. Note the call 'http_session.use_ssl = true'.


require 'net/https'
require 'uri'

# Basic auth credentials for the web service (placeholders)
mip_user = 'username'
mip_password = 'password'

# Create the SOAP Envelope
soap_data = '
This is where the SOAP xml would be drafted. I use {1} to template value fields.
'
# normally I'd inject parameters into the SOAP Envelope using a call like:
# soap_data.sub!('{1}', "example string")

# Set Headers (only after the envelope is final, so the length is right)
headers = {
  'Content-Type'=> 'text/xml; charset=utf-8',
  'SOAPAction'=> '""',
  'User-Agent'=> 'The useragent you wish to use, useful if you ever have to debug at the other end...',
  'Host'=>  'www.securedurl.com',
  # length in bytes, as a string
  'Content-Length'=> soap_data.bytesize.to_s
}

# create session object
uri = URI.parse("https://www.securedurl.com")
path = '/WebServiceHome/services/'
proxy = Net::HTTP::Proxy("aproxyserver", 8080)
http_session = proxy.new(uri.host, uri.port)
http_session.use_ssl = true

# start the http session
http_session.start { |http|
  # create the request
  req = Net::HTTP::Post.new(path)
  req.basic_auth mip_user, mip_password
  # assign rather than append, so defaults such as User-Agent are replaced
  headers.each { |key, val| req[key] = val }
  # Post the request; the response body is read from the response object
  resp = http.request(req, soap_data)
  puts 'Code = ' + resp.code
  puts 'Message = ' + resp.message
  resp.each { |key, val| puts key + ' = ' + val }
  puts resp.body
}

Sunday, May 24, 2009

Python 3.0 SSL over Proxy

With the IT world the way it is now I'm certain the majority of enterprises put their users behind proxy servers, and I'm also certain that a lot of users behind said proxies need to access SSL secured sites programmatically. With Python touting itself as an enterprise-worthy product and the batteries-included philosophy of the core language libraries, I'm surprised that there is no built-in support for accessing SSL secured sites over a proxy. Admittedly there is a patch in the root bug report for this feature, however it's been around for years and it's only just getting to the implementation stage. Hopefully it's going to make it into 3.1.

The Python HOW TO documentation points to a cookbook recipe that handles this issue, but it's still only available in a 2.x version. So, since I had to implement this for a little code snippet, here it is: SSL over proxy, the Python 3.0 version.


import urllib, urllib.parse, ssl, http.client, socket
from urllib.request import Request, urlopen
from urllib.error import  URLError, HTTPError

class ProxyHTTPConnection(http.client.HTTPConnection):
    _ports = {'http' : 80, 'https' : 443}
    def request(self, method, url, body=None, headers={}):
        #request is called before connect, so can interpret url and get
        #real host/port to be used to make CONNECT request to proxy
        proto, rest = urllib.parse.splittype(url)
        if proto is None:
            raise ValueError("unknown URL type: %s" % url)
        #get host
        host, rest = urllib.parse.splithost(rest)
        #try to get port
        host, port = urllib.parse.splitport(host)
        #if port is not defined try to get from proto
        if port is None:
            try:
                port = self._ports[proto]
            except KeyError:
                raise ValueError("unknown protocol for: %s" % url)
        self._real_host = host
        self._real_port = port
        http.client.HTTPConnection.request(self, method, url, body, headers)

    def connect(self):
        http.client.HTTPConnection.connect(self)
        #send proxy CONNECT request
        connect_string="CONNECT {0}:{1} HTTP/1.0\r\n\r\n".format(self._real_host, self._real_port)
        self.send(connect_string.encode('utf-8'))
        #expect a HTTP/1.0 200 Connection established
        response = self.response_class(self.sock, strict=self.strict, method=self._method)
        (version, code, message) = response._read_status()
        #probably here we can handle auth requests...
        if code != 200:
            #proxy returned an error, abort connection, and raise exception
            self.close()
            raise socket.error("Proxy connection failed: %d %s" % (code, message.strip()))
        #eat up header block from proxy....
        while True:
            #should probably not use fp directly
            line = response.fp.readline()
            print(line)
            if line == b'\r\n': break


class ProxyHTTPSConnection(ProxyHTTPConnection):
    default_port = 443
    def __init__(self, host, timeout = 10, port = None, key_file = None, cert_file = None, strict = None):
        ProxyHTTPConnection.__init__(self, host, port)
        self.key_file = key_file
        self.cert_file = cert_file

    def connect(self):
        ProxyHTTPConnection.connect(self)
        #make the sock ssl-aware
        self.sock = ssl.wrap_socket(self.sock, self.key_file, self.cert_file)


class ConnectHTTPHandler(urllib.request.HTTPHandler):
    def do_open(self, http_class, req):
        return urllib.request.HTTPHandler.do_open(self, ProxyHTTPConnection, req)

class ConnectHTTPSHandler(urllib.request.HTTPSHandler):

    def do_open(self, http_class, req):
        return urllib.request.HTTPSHandler.do_open(self, ProxyHTTPSConnection, req)


if __name__ == '__main__':
    import sys
    # build Proxy handler
    proxies = {'http': 'http://aproxyserver:8080/', 'https': 'http://aproxyserver:8080/'}
    proxy_handler = urllib.request.ProxyHandler(proxies)
    # build basic authentication handler
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, 'https://www.securedurl.com/', 'username', 'password')
    auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    # create "opener" (OpenerDirector instance)
    opener = urllib.request.build_opener(ConnectHTTPHandler, ConnectHTTPSHandler, proxy_handler, auth_handler)
    urllib.request.install_opener(opener)
    request_url = "https://www.securedurl.com/default.html"
    req = Request(request_url)
    try:
        response = urlopen(req)
        print(response.read())
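    except HTTPError as e:
        # the server answered, but with an HTTP error status
        print(e.code)
        print(e.headers)
    except URLError as e:
        # the connection itself failed; URLError has a reason but no code or headers
        print(e.reason)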