[mlpack] Regarding profiling for parallelization

Ryan Curtin ryan at ratml.org
Thu Mar 30 17:06:17 EDT 2017

On Fri, Mar 31, 2017 at 12:43:04AM +0530, Shikhar Bhardwaj wrote:
> Hi Yannis,
> Thanks a lot for the detailed reply.
> I had given some thought to the problem of oversubscription of hardware
> threads during testing if we were to test already parallelized methods in
> parallel. In the model that I am proposing, each test suite would have a
> different executable, which can be run in parallel with other test suites,
> using ctest. ctest has the option to declare a test to not be run in
> parallel with any other test, called RUN_SERIAL
> <https://cmake.org/cmake/help/v2.8.12/cmake.html#prop_test:RUN_SERIAL>. In
> this way, we could declare already parallelized methods to not be run in
> parallel with other tests.

Hm, we would have to mark tests that are parallelized somehow.  Right
now I think only the LSH and DET tests are parallelized.  That isn't impossible
of course; maybe the Boost Unit Test Framework has some support for
something like this, or possibly we can graft something on top of it to
work with Marcus' code to extract the test names.

Thanks for the clarification on the MapReduce paper---you are right that
OpenMP could be used for a lot of the implementations to accomplish the
same goal.  We should just be careful to ensure that what we do doesn't
make the code difficult to understand (though, lucky for us, OpenMP is
pretty easy to understand).  The reason for this is that mlpack already
stands at the intersection of modern C++ and machine learning, which is
a pretty small niche---so already not very many people can contribute
effectively; making that the intersection of modern C++, machine
learning, and parallel programming makes it an even smaller niche.
Parallelism is definitely not bad---it's great!---but we just have to be
somewhat cautious of how much we require developers to know to be able
to contribute.

Hence, I have tried to focus on "simple" parallelism like OpenMP (though
there is still a lot of OpenMP work to be done). It still gives a pretty
great speedup even if it is not crazy MPI code. :)
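As a rough illustration of the kind of "simple" OpenMP parallelism meant here, below is a minimal sketch. It assumes a hypothetical `SquaredDistances` helper (not an mlpack function) working on plain `std::vector` data instead of Armadillo matrices. A single `#pragma omp parallel for` splits the outer loop across threads. Because each iteration writes only to its own element of the output, no locking is needed. Compile with `-fopenmp` (GCC/Clang). Without that flag the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only, not an mlpack function):
// computes the squared Euclidean distance from `query` to every point in
// `references`.
std::vector<double> SquaredDistances(
    const std::vector<std::vector<double>>& references,
    const std::vector<double>& query)
{
  std::vector<double> out(references.size(), 0.0);

  // Signed loop index for compatibility with older (OpenMP 2.0) compilers.
  const long n = static_cast<long>(references.size());

  // Each iteration only reads shared input and writes only out[i], so the
  // iterations are independent and OpenMP can divide them among threads.
  #pragma omp parallel for
  for (long i = 0; i < n; ++i)
  {
    const std::vector<double>& point =
        references[static_cast<std::size_t>(i)];
    assert(point.size() == query.size());  // dimensions must match

    double sum = 0.0;  // declared inside the loop, so private to each thread
    for (std::size_t d = 0; d < query.size(); ++d)
    {
      const double diff = point[d] - query[d];
      sum += diff * diff;
    }
    out[static_cast<std::size_t>(i)] = sum;
  }

  return out;
}
```

For example, with reference points {0, 0}, {3, 4}, and {1, 1} and query {0, 0}, the result is {0, 25, 2}. The code reads the same with or without the pragma. That is the point made above: the parallel version stays easy to follow for contributors who are not parallel-programming specialists.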

Ryan Curtin    | "I love it when a plan comes together."
ryan at ratml.org |   - Hannibal Smith
