Monday, May 11, 2015

Fixing Unicode issues in Pentaho CDA

Recently I encountered a problem when sending a query parameter containing Unicode text to Pentaho CDA. For this type of query, CDA returns an empty result set even though there are matching items. After some research I fixed the issue by adding extra parameters (useUnicode and characterEncoding) to the JDBC connection URL. The JDBC connection now looks like this.

<DataSources>
        <Connection id="1" type="sql.jdbc">
            <Driver>com.mysql.jdbc.Driver</Driver>
            <Url>jdbc:mysql://host:3306/DB?useUnicode=true&amp;characterEncoding=UTF-8</Url>
            <User>user</User>
            <Pass>pass</Pass>
        </Connection>
 </DataSources>
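
As a rough illustration of what the extra parameters look like outside the CDA file, here is a minimal Java sketch that appends them to a base JDBC URL and then escapes the result for use inside XML. The class and method names (JdbcUrlSketch, withUnicodeParams, escapeForXml) are hypothetical and are not part of Pentaho or the MySQL driver; only the two parameter names come from the connection above.

```java
final class JdbcUrlSketch {

    // Appends the two parameters used above, choosing '?' or '&' depending on
    // whether the URL already has a query string.
    static String withUnicodeParams(String baseUrl) {
        String separator = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + separator + "useUnicode=true&characterEncoding=UTF-8";
    }

    // Inside an XML file, a literal '&' has to be written as '&amp;'.
    static String escapeForXml(String url) {
        return url.replace("&", "&amp;");
    }

    public static void main(String[] args) {
        String url = withUnicodeParams("jdbc:mysql://host:3306/DB");
        System.out.println(url);               // plain form, e.g. for a Java client
        System.out.println(escapeForXml(url)); // form used inside the <Url> element
    }
}
```

The point of the second method is that the `&amp;` in the <Url> element is only XML escaping; the driver itself sees a plain `&` between the two parameters.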


Saturday, March 14, 2015

Creating a Pentaho BI server cluster

Note - This is applicable to Pentaho BI server community edition 5.x only.

Pentaho BI server provides a large set of features which are essential for BI applications. To use it in production we might need to create a CDA cluster to provide high availability as well as load balancing.
To create a cluster we need to configure the BI server instances to use a common data source for storing their configuration. I configured the following setup for this.



Follow these steps to create the cluster.

  1. Install MySQL servers and set up master-master replication.
  2. Make sure Oracle Java 7 is installed on all nodes. Using other Java versions will cause runtime errors.
  3. Follow this document to install a CDA instance. Make sure to follow the document named "Install with Your Own BA Repository" and the configuration steps related to MySQL.
  4. Start the server and install all the components needed.
  5. Modify the configuration for clustering as described in this document. https://help.pentaho.com/Documentation/5.2/0P0/000/060
    1. This document is missing the information related to Quartz clustering with MySQL; only the PostgreSQL configuration is there. Use the following configuration instead.
      1. #_replace_jobstore_properties
        org.quartz.jobStore.misfireThreshold = 60000
        org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.StdJDBCDelegate
        org.quartz.jobStore.useProperties = false
        org.quartz.jobStore.dataSource = myDS
        org.quartz.jobStore.tablePrefix = QRTZ5_
        org.quartz.jobStore.isClustered = true
        org.quartz.jobStore.clusterCheckinInterval = 20000
    2. When configuring Jackrabbit clustering, replace the unique ID in <Cluster id="Unique_ID"> with an ID like CDA1.
  6. Recompress the CDA folder and copy it to all other nodes.
  7. Extract the archive on each node and replace the unique ID in <Cluster id="Unique_ID"> accordingly, e.g. CDA2, CDA3.
  8. Start each node and make sure there are no errors logged in tomcat/logs/pentaho.log.
  9. Make sure all CDA changes are replicated between cluster nodes.
  10. When configuring the ELB, make sure to enable sticky sessions. http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/US_StickySessions.html
  11. Set up alerts to monitor the CDA instances as well as MySQL replication.
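
Steps 5.2 and 7 above boil down to a text substitution that has to be repeated on every node, so it can be scripted. The following is only a sketch of that substitution in Java; the class and method names are hypothetical, and it assumes the placeholder appears in the configuration file exactly as written in the steps above.

```java
final class ClusterIdSketch {

    // Placeholder exactly as it is written in the steps above.
    private static final String PLACEHOLDER = "<Cluster id=\"Unique_ID\">";

    // Returns the configuration text with the placeholder replaced by a
    // per-node ID such as CDA1, CDA2, CDA3.
    static String withNodeId(String configXml, int nodeNumber) {
        return configXml.replace(PLACEHOLDER, "<Cluster id=\"CDA" + nodeNumber + "\">");
    }

    public static void main(String[] args) {
        String template = "<Cluster id=\"Unique_ID\">";
        for (int node = 1; node <= 3; node++) {
            System.out.println(withNodeId(template, node));
        }
    }
}
```

What matters is that no two nodes end up with the same ID; the CDA prefix is just the naming used in this post.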

Sunday, February 15, 2015

Deploying a HA Redis setup



The Sentinel process takes responsibility for electing a slave as the new master if a failure occurs. For more information refer to this.
  • Install Redis on each node. Either of the following methods can be used.
    1. Using Ubuntu repositories.
      1. sudo apt-get install redis-server
    2. Manual installation
      1. You can download a Redis distribution from http://redis.io/download. Follow the instructions on this page to set up Redis from the downloaded archive: https://www.digitalocean.com/community/tutorials/how-to-install-and-use-redis
  • Set the requirepass property in the configuration file in /etc/redis to set the password. Note that this should be the same on all nodes.
  • Set up replication
    1. Set the following properties on the slaves to enable replication.
      1. slaveof <masterip> <masterport>
      2. masterauth <master-password> //The same password we used previously
    2. Also set the masterauth property on the master, in case the master goes down and later rejoins as a slave (once a slave has been elected as the new master).
  • To test whether replication is working, log in to the Redis console of the master using the redis-cli command and add some data. Then log in to each slave and check that the data is there.
  • Set up Sentinel on each node
    1. Create a file named sentinel.conf in the directory where your Redis configuration exists.
    2. Add the following content to sentinel.conf (change the values according to your setup).
      1. sentinel monitor mymaster <ip> <port> 2
        sentinel down-after-milliseconds mymaster 60000
        sentinel failover-timeout mymaster 180000
        sentinel parallel-syncs mymaster 1
        This tells Redis Sentinel that the master is at <ip>, that the master's name is "mymaster", and that a failover should start once at least two Sentinels have detected that the master has failed.
  • Start the Sentinel process on each node using the following command.
    sudo redis-sentinel <path to the configuration file> &
    To stop a sentinel process use the following command.
    sudo redis-sentinel <path to the configuration file> shutdown
  • To test failover you can kill the master process and check that you can still write to the cluster from your client application. Note that when the master changes, Sentinel rewrites the Redis configuration on each node. To get back to the original state you will have to revert those changes.

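  • How to access the above setup programmatically.

    I will be using Java with the Jedis library for this example.
    import java.util.HashSet;
    import java.util.Set;
    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.JedisPoolConfig;
    import redis.clients.jedis.JedisSentinelPool;

    Set<String> sentinels = new HashSet<>();
    JedisPoolConfig jedisPoolConfig = getJedisPoolConfig();
    // Addresses of the sentinel processes, each in "host:port" form
    String[] nodes = { /* sentinel addresses go here */ };
    for (String node : nodes) {
        sentinels.add(node);
    }
    // pool is a JedisSentinelPool field; the first argument is the master name
    // configured in sentinel.conf (e.g. "mymaster")
    pool = new JedisSentinelPool(getConfig(sentinelMasterKey), sentinels, jedisPoolConfig);
    Jedis jedis = pool.getResource();
    jedis.auth(redisPassword);
    jedis.set(key, value);
    // Hand the connection back to the pool when done (in recent Jedis versions)
    jedis.close();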

    private JedisPoolConfig getJedisPoolConfig() {
        JedisPoolConfig jedisPoolConfig = new JedisPoolConfig();
        if (application().configuration().getInt("redis.pool.maxIdle") != null) {
            jedisPoolConfig.setMaxIdle(application().configuration().getInt("redis.pool.maxIdle"));
        }
        if (application().configuration().getInt("redis.pool.minIdle") != null) {
            jedisPoolConfig.setMinIdle(application().configuration().getInt("redis.pool.minIdle"));
        }
        if (application().configuration().getInt("redis.pool.maxTotal") != null) {
            jedisPoolConfig.setMaxTotal(application().configuration().getInt("redis.pool.maxTotal"));
        }
        if (application().configuration().getInt("redis.pool.maxWaitMillis") != null) {
            jedisPoolConfig.setMaxWaitMillis(application().configuration().getInt("redis.pool.maxWaitMillis"));
        }
        if (application().configuration().getBoolean("redis.pool.testOnBorrow") != null) {
            jedisPoolConfig.setTestOnBorrow(application().configuration().getBoolean("redis.pool.testOnBorrow"));
        }
        if (application().configuration().getBoolean("redis.pool.testOnReturn") != null) {
            jedisPoolConfig.setTestOnReturn(application().configuration().getBoolean("redis.pool.testOnReturn"));
        }
        if (application().configuration().getBoolean("redis.pool.testWhileIdle") != null) {
            jedisPoolConfig.setTestWhileIdle(application().configuration().getBoolean("redis.pool.testWhileIdle"));
        }
        if (application().configuration().getLong("redis.pool.timeBetweenEvictionRunsMillis") != null) {
            jedisPoolConfig.setTimeBetweenEvictionRunsMillis(application().configuration().getLong("redis.pool.timeBetweenEvictionRunsMillis"));
        }
        if (application().configuration().getInt("redis.pool.numTestsPerEvictionRun") != null) {
            jedisPoolConfig.setNumTestsPerEvictionRun(application().configuration().getInt("redis.pool.numTestsPerEvictionRun"));
        }
        if (application().configuration().getLong("redis.pool.minEvictableIdleTimeMillis") != null) {
            jedisPoolConfig.setMinEvictableIdleTimeMillis(application().configuration().getLong("redis.pool.minEvictableIdleTimeMillis"));
        }
        if (application().configuration().getLong("redis.pool.softMinEvictableIdleTimeMillis") != null) {
            jedisPoolConfig.setSoftMinEvictableIdleTimeMillis(application().configuration().getLong("redis.pool.softMinEvictableIdleTimeMillis"));
        }
        if (application().configuration().getBoolean("redis.pool.lifo") != null) {
            jedisPoolConfig.setLifo(application().configuration().getBoolean("redis.pool.lifo"));
        }
        if (application().configuration().getBoolean("redis.pool.blockWhenExhausted") != null) {
            jedisPoolConfig.setBlockWhenExhausted(application().configuration().getBoolean("redis.pool.blockWhenExhausted"));
        }
        return jedisPoolConfig;
    }
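
Coming back to the last argument of the "sentinel monitor" line in sentinel.conf (the 2), it may help to see the decision it controls written out. The sketch below is a simplified model and not Redis code: the class and method names are made up for illustration. It assumes the behaviour described in the Redis Sentinel documentation, where the quorum is used to agree that the master is down, and a majority of all Sentinels is then needed to authorize the failover.

```java
final class SentinelQuorumSketch {

    // The master is considered down by the group once at least `quorum`
    // Sentinels report it as unreachable.
    static boolean reachesQuorum(int sentinelsReportingDown, int quorum) {
        return sentinelsReportingDown >= quorum;
    }

    // Separately, a majority of all known Sentinels has to authorize the
    // failover before it is actually carried out.
    static boolean hasMajority(int sentinelsVoting, int totalSentinels) {
        return sentinelsVoting > totalSentinels / 2;
    }

    public static void main(String[] args) {
        int totalSentinels = 3; // assumed: one Sentinel per node in a three-node setup
        int quorum = 2;         // the value used in sentinel.conf above
        for (int reporting = 0; reporting <= totalSentinels; reporting++) {
            boolean failover = reachesQuorum(reporting, quorum)
                    && hasMajority(reporting, totalSentinels);
            System.out.println(reporting + " Sentinel(s) see the master down, failover possible: " + failover);
        }
    }
}
```

With three Sentinels and a quorum of 2, a single Sentinel losing sight of the master is not enough to trigger a failover, which is the reason for running Sentinel on every node rather than on just one.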

    Thursday, February 5, 2015

    Bind a remote server's port to a local port

    If you have a remote server (say, on Amazon EC2) you might want to access a particular port on that server. But there can be situations where that port is not globally open. If you do not want to bother opening it globally, you can reach it by binding it to a local port on your workstation via SSH. The following command binds port 9000 of the remote machine to local port 8000 on your workstation. For example, if it is a web server you can access it by typing localhost:8000 in your web browser.

    ssh -L 8000:localhost:9000 username@host
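
The argument to -L packs three values into one string, and it is easy to mix up which side each one belongs to. The small Java sketch below just takes the string apart to make that explicit; the class name and fields are hypothetical, and it only handles the three-part form used above (ssh also accepts an optional bind address in front, which is ignored here).

```java
final class LocalForwardSketch {
    final int localPort;      // port opened on your workstation
    final String targetHost;  // host name as resolved on the remote side
    final int targetPort;     // port on that target host

    private LocalForwardSketch(int localPort, String targetHost, int targetPort) {
        this.localPort = localPort;
        this.targetHost = targetHost;
        this.targetPort = targetPort;
    }

    // Parses "localPort:targetHost:targetPort" as passed to ssh -L.
    static LocalForwardSketch parse(String spec) {
        String[] parts = spec.split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected localPort:host:port but got " + spec);
        }
        return new LocalForwardSketch(Integer.parseInt(parts[0]), parts[1], Integer.parseInt(parts[2]));
    }

    String describe() {
        return "localhost:" + localPort + " on the workstation -> "
                + targetHost + ":" + targetPort + " as seen from the remote server";
    }

    public static void main(String[] args) {
        System.out.println(parse("8000:localhost:9000").describe());
    }
}
```

Note that the "localhost" in the middle is resolved by the remote server, not by your workstation, which is why the command reaches port 9000 on the remote machine itself.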

    Sunday, September 7, 2014

    Implementing session timeout in playframework

    According to the Play documentation, "There is no technical timeout for the Session. It expires when the user closes the web browser. If you need a functional timeout for a specific application, just store a timestamp into the user Session and use it however your application needs (e.g. for a maximum session duration, maximum inactivity duration, etc.)."

    So I implemented an inactivity timeout by storing a last-seen timestamp in the session and checking it on every request. The following custom authenticator class does this.

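    import play.Play;
    import play.mvc.Http;
    import play.mvc.Result;
    import play.mvc.Security;

    public class ValidateUserSessionAction extends Security.Authenticator {

        @Override
        public String getUsername(Http.Context ctx) {
            long currentTime = System.currentTimeMillis();
            // "sessionTimeout" is configured in minutes; convert it to milliseconds
            long timeOut = Long.parseLong(Play.application().configuration().getString("sessionTimeout")) * 1000 * 60;
            String temp = ctx.session().get(Constants.LAST_SEEN_KEY);
            if (temp == null) {
                // No timestamp stored yet, so treat the session as seen just now
                temp = String.valueOf(currentTime);
            }
            if ((currentTime - Long.parseLong(temp)) < timeOut) {
                // Still within the timeout: refresh the last-seen timestamp.
                // If multiple instances are running, time should be synchronized between nodes
                ctx.session().put(Constants.LAST_SEEN_KEY, String.valueOf(System.currentTimeMillis()));
                return ctx.session().get(Constants.SESSION_USER_KEY);
            } else {
                // Timed out: clear the session; returning null marks the request as unauthorized
                ctx.session().clear();
                return null;
            }
        }

        @Override
        public Result onUnauthorized(Http.Context ctx) {
            // Send timed-out or anonymous users to the sign-in page
            return redirect(controllers.routes.UserController.signIn());
        }
    }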
    

    The authenticator class above can be used to protect controller actions as shown below.

    @Security.Authenticated(ValidateUserSessionAction.class)
    public static F.Promise<Result> updateEmail() {
        //do something
        return F.Promise.<Result>pure(ok());
    }
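
The timeout decision inside the authenticator is plain arithmetic, so it can be looked at (and unit tested) without Play. The following is a minimal sketch of that check on its own; the class and method names are hypothetical, and the timeout is assumed to be given in minutes as in the authenticator above.

```java
final class SessionTimeoutSketch {

    // Converts the configured timeout from minutes to milliseconds.
    static long timeoutMillis(long timeoutMinutes) {
        return timeoutMinutes * 60 * 1000;
    }

    // A session is still active while the time since it was last seen is
    // strictly less than the timeout.
    static boolean isActive(long lastSeenMillis, long nowMillis, long timeoutMinutes) {
        return (nowMillis - lastSeenMillis) < timeoutMillis(timeoutMinutes);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long tenMinutesAgo = now - timeoutMillis(10);
        long oneHourAgo = now - timeoutMillis(60);
        System.out.println("seen 10 minutes ago, 30 minute timeout: " + isActive(tenMinutesAgo, now, 30));
        System.out.println("seen 60 minutes ago, 30 minute timeout: " + isActive(oneHourAgo, now, 30));
    }
}
```

Because the last-seen value is refreshed on every authorized request, this behaves as an inactivity timeout rather than a maximum session duration.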
    

    Saturday, September 6, 2014

    Optimizing Apache Storm deployment

    Recently we happened to run some fairly large Storm topologies on a Storm cluster running on Linux. While running them, two main issues occurred due to system limits. The first one was logged in the Storm logs as:

    "java.lang.OutOfMemoryError: unable to create new native thread".


    We fixed this problem by increasing the ulimit for Storm. Usually Storm spawns its processes as a user named storm, so we have to increase the ulimit for the storm user. You can see the current limit using the command "ulimit -u". To increase the limit you can follow an approach like this.
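
To judge how close a JVM is getting to that limit, it can help to look at how many threads it currently has. The sketch below uses the standard java.lang.management API for this; the class name is hypothetical, and it only reports on the JVM it runs in, whereas the limit shown by "ulimit -u" applies to everything the user runs, so treat the numbers as a rough indication only.

```java
import java.lang.management.ManagementFactory;

final class ThreadCountSketch {

    // Number of live threads (daemon and non-daemon) in this JVM.
    static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    // Highest live thread count seen since the JVM started or the peak was reset.
    static int peakThreads() {
        return ManagementFactory.getThreadMXBean().getPeakThreadCount();
    }

    public static void main(String[] args) {
        System.out.println("live threads: " + liveThreads());
        System.out.println("peak threads: " + peakThreads());
    }
}
```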

    The second problem was communication link failures between Storm nodes, as well as between Storm and other services (e.g. external APIs, databases, etc.). To resolve this we had to enable TCP TIME_WAIT reuse, and we also increased the local port range for TCP. That can be done in the following manner.

     Add these lines to the /etc/sysctl.conf file and run 'sysctl -f'
        net.ipv4.tcp_tw_reuse = 1
        net.ipv4.ip_local_port_range = 18000    65000
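
To get a feel for how many local ports that range makes available for outgoing connections, the arithmetic can be written down as a tiny Java helper. This is only an illustration with a hypothetical class name; it parses the value in the same "low high" form used in sysctl.conf.

```java
final class PortRangeSketch {

    // Parses a value such as "18000    65000" and returns how many ports the
    // range contains, counting both ends.
    static int portCount(String range) {
        String[] bounds = range.trim().split("\\s+");
        if (bounds.length != 2) {
            throw new IllegalArgumentException("expected two port numbers but got: " + range);
        }
        int low = Integer.parseInt(bounds[0]);
        int high = Integer.parseInt(bounds[1]);
        return high - low + 1;
    }

    public static void main(String[] args) {
        System.out.println(portCount("18000    65000") + " local ports available");
    }
}
```

For the range above this gives 65000 - 18000 + 1 = 47001 ports.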

    Note that we have to apply this setting on every node in the Storm cluster. In addition, there were some errors like the one below due to Netty timeouts (Storm uses Netty as its underlying messaging layer).

    java.lang.RuntimeException: java.lang.RuntimeException: Client is being closed, and does not take requests any more
    at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:107) ~[storm-core-0.9.1.2.1.4.0-632.jar:0.9.1.2.1.4.0-632]
    at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:78)

    This can be corrected by increasing the Netty timeout values in the Storm configuration.

    Friday, June 13, 2014

    Introduction to Play-Framework modules

    The Play framework inherently supports modularization. In other words, we can develop Play modules and reuse them in different Play applications. This article provides a good guide on how to do that, but it is a little outdated for the latest Play distributions. I will describe the changes needed to create a Play module on the latest Play versions.
    Play no longer has a play console; it uses Typesafe Activator instead. So you need to download Activator and add it to your environment path.
    You can create boilerplate code for a Play app using the template called Just play java. In my case I needed to create an authentication module, so I created a Play action called auth in the controllers package. Then, to publish the module, go to the project directory and issue the clean command followed by the publish-local command. If it is successful you will get output like this.

    [info] published ivy to /home/prabhath/.ivy2/local/authmodule/authmodule_2.10/1.0-SNAPSHOT/ivys/ivy.xml
    [success] Total time: 15 s, completed Jun 13, 2014 10:49:13 AM

    Then create a new application which will use the previously created auth module. To add the dependency to the new project, update its build.sbt file. The content of my build.sbt file is shown below.

    name := """Test"""

    version := "1.0-SNAPSHOT"

    libraryDependencies ++= Seq(
      javaCore,  // The core Java API
      "junit" % "junit" % "4.8.2",
       "authmodule"%"authmodule_2.10"%"1.0-SNAPSHOT"
    )

    play.Project.playJavaSettings

    Next you have to tell the application where your local repository is. To do that, edit the plugins.sbt file in the /<your project>/project directory. I added the following line to plugins.sbt.

    resolvers += "Local Play Repository" at "file:///home/prabhathp/.ivy2/local/"

    Now build the project and you will be able to use the classes in your module's controllers package inside the classes in your own controllers package. Note that you do not need to add imports when reusing module components.