Wednesday, February 8, 2012

ETL : Moving Data From Oracle To PostgreSQL

Covered in this post are (on Fedora Core 16 x86_64)
  • Installing and configuring oracle instant client x86_64
  • Installing perl-DBD-Oracle
  • Installing and configuring ora2pg

I have a system written in java, using hibernate and JPA. This allows me to run on a variety of different databases by only making a small change in a configuration file. Specifically I run this system on Oracle in production, and use Postgres on my development box.

Yesterday I decided to dump my production data into my development database which proved to be a bit more of a challenge then I had originally anticipated. As you know every database likes to make small additions to the standard ANSI SQL language which are only applicable to their flavour. In my specific scenario, the structure of the database was not my problem; at jboss startup hibernate automatically creates the database structure for me.

Originally I had hoped that I could just make use of Oracle SQL Developer to dump my data into SQL insert statements which I could just run into Postgres. However I quickly found out that this wasn't going to work because Postgress doesn't support all of the Oracle keywords (eg. to_timestamp).

A quick google search left me with a few possible solutions, the most possible being ora2pg, so that's the route I went for.

In summary, ora2pg is a perl ETL tool which specialises in migrating oracle to postgres. As I'm not really all too familiar with PERL, I did encounter a few snags getting ora2pg to run, but learned a lot in the process.

So let's get down to it then :

  1. Install and Configure Oracle Instant Client for linux
  2. I'm going to cover this step in quite a bit of detail; even though there are pre-built .rpm packages available from oracle for both i386 and x86_64 platforms, getting everything in running can be quite a mission if you omit a few vital steps...

    1. Download the Oracle Instant Client .rpm's from Oracle here

      Make sure you get the following :

      • oracle-instantclient11.2-basic-
      • oracle-instantclient11.2-sqlplus-
      • oracle-instantclient11.2-devel-

    2. Once you've downloaded the .rpm's, you can install them either with rpm or yum (I generally prefer yum as it will automatically handle resolving and installing dependencies for you).

      $ sudo yum -y localinstall oracle-instantclient*.rpm

    3. Update the ld library cache

    4. $ cat > /etc/ <<EOF

      $ ldconfig

    5. Set the Oracle environment variable

      $ cat >> /etc/bashrc <<EOF
      export TNS_ADMIN=$HOME
      export EDITOR=vim
      export ORACLE_HOME=/usr/lib/oracle/11.2/client64

      $ source /etc/bashrc

    6. Place your tnsnames.ora file in your home directory
    7. You can test your oracle client installation by running sqlplus

      $ sqlplus64 /nolog

  3. Install Perl-DBD-Oracle

    1. Before we start ensure that you have the following packages installed
      • perl
      • perl-CPAN
      • perl-DBD
      • perl-DBI
      • gcc
      • libaio
      • libaio-devel

    2. Download the Perl-DBD-Oracle Source with CPAN

    $ perl -MCPAN -e shell
    CPAN> get DBD::Oracle
    CPAN> exit

    $ cd ~/.cpan/build/DBD-Oracle*
    $ perl Makefile.PL
    $ make
    $ make install

  4. Install and Configure ora2pg

    1. Download ora2pg here
    2. Uncompress the tar archive

      $ tar xjvf ora2pg*.tar.bz2 && rm ora2pg*.tar.bz2

    3. Build ora2pg

    4. $ cd ora2pg*
      $ perl Makefile.PL
      $ make && make install

    5. Configure ora2pg

    6. $ gedit /etc/ora2pg/ora2pg.conf

      Look out for the following configuration properties

      • ORACLE_HOME - you can set this to $ORACLE_HOME
      • ORACLE_DSN - set this to the desired entry in you tnsnames.ora
      • ORACLE_USER - the oracle username
      • ORACLE_PWD - the oracle password
      • USER_GRANTS - set this to 1 if you're not connecting as a DBA role
      • TYPE - set this to DATA if you only want to export data and ignore structure

    7. run ora2pg
    8. Simply run ora2pg from the command line, by default your data will be dumped to a file called output.sql in the current directory.

In conclusion, yes this wasn't really a "download and run" application, but now that I have it set up I'll be using it quite a lot going forward with minimal effort. I've also found that it's relatively light-weight and will dump a 5GB database in a matter of a few minutes; so I'm quite pleased.

1 comment:

  1. May I know how to change the config file so that I can export the datas alone.

    I used | INSERT |1|

    In what way shud we use it ..please do explain me via mail