yoyo-indexima

Unix Build Status PyPI Version PyPI License

Versions following Semantic Versioning

Overview

Indexima migration schema based on yoyo and pyhive.

The little story

In the land of database migration tool, i have tried:

  • flyway
  • liquidbase with hive extention

Both either did not support hive (as flyway), or indexima did not fully compliant with hive (wich cause probleme with liquidbase)

So I try to found a module not too complex in order to migrate our indexima schema in a safe way.

In this early release, I just trying to do the job.

Setup

Requirements

  • Python 3.7+

Installation

Install this library directly into an activated virtual environment:

$ pip install yoyo-indexima

or add it to your Poetry project:

$ poetry add yoyo-indexima

Usage

Hive connection

  1. backend ui must start with indexima://
  2. If you have trouble to obtain an hive connection, please read http://dwgeek.com/guide-connecting-hiveserver2-using-python-pyhive.html/

Note: If you using python in docker, you should install :

apt-get update -qq
apt-get install -qqy gcc libsasl2-dev libsasl2-2 libsasl2-modules-gssapi-mit 

Migration

You could see a complete sample under 'example' folder.

using python client

yoyo_indexima
usage: yoyo_indexima [-h] [-s SOURCE] -u URI {show,apply}

example:

yoyo_indexima  apply  -s $(pwd)/example/migrations/ -u "indexima://admin:super_password@localhost:10000/default"

Commands:

  • show Show migrations
  • apply Apply migrations
  • reapply Reapply migrations
  • rollback Rollback migrations
  • mark Mark migrations as applied, without running them
  • unmark Unmark applied migrations, without rolling them back
  • break-lock Break migration locks

Help for apply:

> yoyo_indexima  apply -h
usage: yoyo_indexima apply [-h] [-s SOURCE] -u URI [-f] [-a] [-r REVISION]
                           [-d]

optional arguments:
  -h, --help            show this help message and exit
  -s SOURCE, --source SOURCE
                        source path of migration script (default ./migrations)
  -u URI, --uri URI     backend uri
  -f, --force           Force apply/rollback of steps even if previous steps
                        have failed
  -a, --all             Select all migrations, regardless of whether they have
                        been previously applied
  -r REVISION, --revision REVISION
                        Apply/rollback migration with id REVISION
  -d, --dry-run         Dry run: no modification will be applied

within python code

If your migrations script are under directory migration folder

import os

from yoyo_indexima import get_backend, read_migrations


if __name__ == "__main__":

    # obtain IndeximaBackend
    backend = get_backend('indexima://admin:super_password@localhost:10000/default?auth=CUSTOM')

    # Read migrations folder
    migrations = read_migrations(os.path.join(os.getcwd(), 'migrations/**/*'))
    print(f'migrations: {migrations}')
    if migrations:
        # apply migration
        with backend.lock():
            backend.apply_migrations(backend.to_apply(migrations))

Management table

This tool create in your default schema:

  • a log table: 'yoyo_log'
  • a lock_table: 'yoyo_lock'
  • a version table: 'yoyo_version'
  • a migration table: 'yoyo_migration'

Migration script template

"""
{message}
"""

from yoyo import step

__depends__ = {{{depends}}}

steps = [
    step("create ...", "drop ...")
]

Configure hive connection

In python script, on IndeximaBackendinstance, you could use:

  • set_hive_configuration: A dictionary of Hive settings (functionally same as the set command)
  • set_hive_thrift_transport: an instance of TSaslClientTransport

As see in https://github.com/dropbox/PyHive/issues/162, you could do things like that:

import sasl
from thrift_sasl import TSaslClientTransport
from thrift.transport.TSocket import TSocket


def create_hive_plain_transport(host, port, username, password, timeout=60):
    socket = TSocket(host, port)
    socket.setTimeout(timeout * 1000)

    sasl_auth = 'PLAIN'

    def sasl_factory():
        sasl_client = sasl.Client()
        sasl_client.setAttr('host', host)
        sasl_client.setAttr('username', username)
        sasl_client.setAttr('password', password)
        sasl_client.init()
        return sasl_client

    return TSaslClientTransport(sasl_factory, sasl_auth, socket)


backend = get_backend('indexima://admin:super_password@localhost:10000/default?auth=CUSTOM')
backend.set_hive_thrift_transport(create_hive_plain_transport(...))

Extends IndeximaBackend

If you extends IndeximaBackend you could register your classes, in the function: get_backend(uri: str, backend=IndeximaBackend) -> DatabaseBackend:

TODO: add a client parameter to specify full class name in cli.

License

The MIT License (MIT)

Contributing

See Contributing

Next step

  • apply socket timeout and serialization encoding
  • release a v1.0.0