CruiseControl + Parallel + Remoteant + Daemons = Joy

Peter Davison has a post describing how to do Distributed Builds with Ant and Cruisecontrol.

On my recent project, we used this strategy (i.e., using the ‘remoteant’ Ant task to call multiple slave build boxes from a master CruiseControl build box). Since upgrading to CruiseControl 2.6.2, I’ve had to reboot the master build box every day.

The problem is that CruiseControl hangs intermittently if you use the ‘parallel’ task (and we were, via Ant-Contrib’s ‘foreach’ task) to start up multiple threads, and those threads then call another Ant task that starts a thread of its own (in this case, ‘remoteant’). It hangs because it is waiting for the Ant target to complete, which happens only when all the threads started by the ‘parallel’ task have completed, and in this scenario that sometimes never happens.
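
To make the failure mode concrete, here is a minimal sketch of the kind of target that can hang. It is a reconstruction for illustration, not an excerpt from our old build file; it assumes the same ‘pods’ directory and ‘run_remote_ant_script’ target that appear in the excerpts further down.

```xml
<!-- Sketch only: assumed shape of the hang-prone approach -->
<target name="build">
  <!-- With parallel="true", 'foreach' calls the target once per pod file,
       each call in its own thread, and waits for every call to return
       before the 'build' target can complete -->
  <foreach param="podfile" parallel="true" target="run_remote_ant_script" inheritall="true">
    <path>
      <fileset dir="${basedir}/pods" includes="*.properties"/>
    </path>
  </foreach>
</target>
```

If a single ‘remoteant’ call inside ‘run_remote_ant_script’ never returns, the ‘foreach’ never returns either, and CruiseControl keeps waiting.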

What’s the solution? Instead of using the ‘foreach’ task to fire off the remote builds, use Ant-Contrib’s ‘for’ task and wrap each remote call in ‘parallel’ with a nested ‘daemons’ element. The ‘parallel’ task does not wait for tasks nested in ‘daemons’, so the top-level Ant target can complete regardless of whether the threads created by ‘parallel’ have completed. Below are some excerpts of the build file on the master build box that illustrate this approach:

<target name="build">
  <!-- ... -->

  <!-- Fire off the appropriate build on each build machine -->
  <for param="podfile">
    <path>
      <fileset dir="${basedir}/pods" includes="*.properties"/>
    </path>
    <sequential>
      <antcall target="delete_semaphore_file">
        <param name="podfile" value="@{podfile}"/>
      </antcall>
      <!-- Tasks nested in 'daemons' run in daemon threads; 'parallel' does not wait for them -->
      <parallel>
        <daemons>
          <antcall target="run_remote_ant_script">
            <param name="podfile" value="@{podfile}"/>
          </antcall>
        </daemons>
      </parallel>
    </sequential>
  </for>

  <!-- Wait until each pod's semaphore file appears or the wait times out -->
  <foreach param="podfile" parallel="true" target="wait_for_build_completion" inheritall="true">
    <path>
      <fileset dir="${basedir}/pods" includes="*.properties"/>
    </path>
  </foreach>

  <!-- ... -->
</target>

<target name="delete_semaphore_file">
  <basename property="name" file="${podfile}" suffix=".properties"/>
  <property name="semaphore" value="${pod.log.dir}/${name}-complete.txt"/>
  <delete file="${semaphore}"/>
</target>

<!-- Run Build on remote machine -->
<target name="run_remote_ant_script">
  <!-- ... -->

  <trycatch property="exception">
    <try>
      <remoteant machine="${name}">
        <runtarget target="run_build">
          <!-- ... -->
        </runtarget>
      </remoteant>
    </try>
    <catch>
      <antcall target="fail_build">
        <!-- ... -->
      </antcall>
    </catch>
  </trycatch>
</target>

<target name="wait_for_build_completion">
  <basename property="name" file="${podfile}" suffix=".properties"/>
  <property file="${podfile}"/>
  <property name="semaphore" value="${pod.log.dir}/${name}-complete.txt"/>

  <waitfor timeoutproperty="build.timed.out" maxwaitunit="${max.wait.unit}" maxwait="${max.wait.time}" checkeveryunit="second" checkevery="1">
    <available file="${semaphore}"/>
  </waitfor>

  <!-- ... -->
</target>
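
If you want to see the ‘daemons’ behaviour in isolation, a tiny stand-alone build file along these lines should do it. This is an illustrative sketch, not part of our build: the project and target names are made up, and the 60-second ‘sleep’ merely stands in for a ‘remoteant’ call that never returns. It uses only core Ant tasks.

```xml
<!-- Hypothetical demo build file; names are for illustration only -->
<project name="daemons-demo" default="demo">
  <target name="demo">
    <parallel>
      <daemons>
        <!-- Stand-in for a remote call that hangs -->
        <sleep seconds="60"/>
      </daemons>
      <!-- Regular nested task: 'parallel' waits for this one -->
      <echo message="foreground task done"/>
    </parallel>
    <echo message="target finished without waiting for the daemon"/>
  </target>
</project>
```

Running the default target should print both messages and return right away rather than after 60 seconds, because ‘parallel’ does not wait for tasks nested in ‘daemons’. The daemon thread is simply terminated when Ant exits, which is why the real build still needs the separate semaphore-file wait.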