[haskell-llvm] target dependent vector operations

Thu May 24 11:59:30 BST 2012

  I am currently using my llvm-extra package for target-dependent vector
operations. For example, I provide a 'max' operation for vectors. If SSE
is present and a 'max' of Float vectors is required, then this operation
is split into 'maxps' calls on chunks of the vector. If AVX is present and
the vector is large enough, then 'vmaxps' should be used. More variants
exist for Double, Int8, Int16, Int32. If no native 'max' instruction is
available, then the function is composed from 'neg', 'cmp', 'select'.
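
  The dispatch described above can be sketched as a pure Haskell model.
This is only an illustration under stated assumptions: it does not use the
llvm-extra API, lists stand in for LLVM vectors, and every name in it
(Features, nativeWidth, chunksOf, selectMax, nativeMax, vectorMax) is
hypothetical. The fallback shows only the 'cmp' and 'select' part.

```haskell
-- Pure model of the feature-dependent 'max' dispatch for Float vectors.
-- No LLVM bindings are used; all names here are hypothetical.

data Features = Features { hasSSE :: Bool, hasAVX :: Bool }

-- Number of Float lanes covered by one native max instruction, if any:
-- 8 for 'vmaxps' (AVX, 256 bit), 4 for 'maxps' (SSE, 128 bit).
nativeWidth :: Features -> Int -> Maybe Int
nativeWidth fs n
  | hasAVX fs && n >= 8 = Just 8
  | hasSSE fs           = Just 4
  | otherwise           = Nothing

-- Split a list into chunks of at most k elements (k must be positive).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf k xs = let (h, t) = splitAt k xs in h : chunksOf k t

-- Generic fallback: a lane-wise compare followed by a select.
selectMax :: [Float] -> [Float] -> [Float]
selectMax = zipWith (\a b -> if a > b then a else b)

-- Stand-in for one native instruction call on a single chunk.
nativeMax :: [Float] -> [Float] -> [Float]
nativeMax = zipWith max

-- Use native calls on chunks when a native width is available,
-- otherwise fall back to compare and select on the whole vector.
vectorMax :: Features -> [Float] -> [Float] -> [Float]
vectorMax fs xs ys =
  case nativeWidth fs (length xs) of
    Just k  -> concat (zipWith nativeMax (chunksOf k xs) (chunksOf k ys))
    Nothing -> selectMax xs ys
```

  In the real setting the Features value would have to describe the
target rather than the machine that generates the code.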
  It works after a fashion, but I have the impression that I am not doing
it the right way.
  1. I use the cpuid instruction to detect the available features. This
will fail for cross-compilation, because cpuid describes the host and not
the target. Unfortunately I do not have access to LLVM's Subtarget
detection via the C interface.
  2. Even if I succeed in accessing the Subtarget detection via C++ glue
code, I think that Subtarget-specific code should be inserted at a later
step. Currently I insert the target-specific intrinsics when generating
the LLVM code, that is, before optimization and code generation. However,
LLVM already does target-specific code generation for the built-in
instructions like 'fadd' and common intrinsics like 'exp'. I would like to
insert my enhancements at the same place, or at least close to it. I think
this must be after optimization and before or within code generation.
  3. I think many applications, including non-Haskell applications, could
benefit from such optimizations. One could try to add more intrinsics to
the core LLVM library. But it sounds more reasonable to write something
that can be plugged in by programmers who really do vector computations.
Including such an extended collection of vector operations should make a
set of generic intrinsics like 'max' available, which are then lowered
optimally for the target architecture.

Are my concerns justified, and is writing an LLVM pass the right answer?
If yes, does anyone have experience with writing an LLVM pass in Haskell
and inserting it into the compiler pipeline? How does one cope with the
DAG structure? I know there was a GSoC proposal for it in 2010, but did it
become reality? Or is there a tutorial for writing a simple LLVM pass at
all?