class Bones::Engine

This class holds the main functionality: the Bones source-to-source compilation engine based on algorithmic skeletons. This class processes command-line arguments, calls the Bones preprocessor and the CAST gem, analyzes the source code, performs source transformations, instantiates the skeletons, and finally writes the output code to file.

Constants

BONES_DIR_SKELETONS

The location of the skeletons directory.

COMMON_FILES

A list of files to be found in the common directory of the skeleton library (excluding timer files).

COMMON_GLOBALS

The name of the file containing the globals, as found in the skeleton library.

COMMON_GLOBALS_KERNEL

The name of the file containing the globals for the kernel files, as found in the skeleton library.

COMMON_HEADER

The name of the file containing the header file for the original C code, as found in the skeleton library.

COMMON_SCHEDULER

The name of the file containing the scheduler code (included for the GPU-CUDA target only).

GLOBAL_TIMERS

The name of the file containing the global timer code, as found in the skeleton library.

OUTPUT_DEVICE

The suffix added to the generated output file for the device file. See also OUTPUT_HOST.

OUTPUT_HOST

The suffix added to the generated output file for the host file. See also OUTPUT_DEVICE.

OUTPUT_VERIFICATION

The suffix added to the generated verification file. See also OUTPUT_DEVICE and OUTPUT_HOST.

SKELETON_DEVICE

The extension of a device file in the skeleton library. See also SKELETON_HOST.

SKELETON_FILE

The name of the transformations file, as found in the skeleton library.

SKELETON_HOST

The extension of a host file in the skeleton library. See also SKELETON_DEVICE.

TIMER_FILES

A list of timer files to be found in the skeleton library.

Public Class Methods

new()

Initializes the engine and processes the command line arguments. This method uses the ‘trollop’ gem to parse the arguments and to create a nicely formatted help menu. This method additionally initializes a result-hash and reads the contents of the source file from disk.

Command-line usage:

bones --application <input> --target <target> [OPTIONS]

Options:

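     --application, -a <s>:   Input application file
          --target, -t <s>:   Target processor (choose from: 'GPU-CUDA','GPU-OPENCL-AMD','CPU-OPENCL-INTEL','CPU-OPENCL-AMD','CPU-OPENMP','CPU-C')
        --measurements, -m:   Enable/disable timers
              --verify, -c:   Verify correctness of the generated code
 --only-alg-number, -o <i>:   Only generate code for the x-th species (99 -> all)
    --merge-factor, -f <i>:   Thread merge factor, default is 1 (==disabled)
--register-caching, -r <i>:   Enable register caching: 1:enabled (default), 0:disabled
       --zero-copy, -z <i>:   Enable OpenCL zero-copy: 1:enabled (default), 0:disabled
       --skeletons, -s <i>:   Enable non-default skeletons: 1:enabled (default), 0:disabled
             --version, -v:   Print version and exit
                --help, -h:   Show this message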
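
The listing below parses these flags with the trollop gem. As an illustration only, the same validation rules can be expressed with Ruby's standard-library OptionParser; this is a swapped-in technique, not how Bones does it. The method name parse_bones_options and the hard-coded TARGETS list are hypothetical (Bones derives its targets from the skeleton directory), only some of the flags are modeled, and unlike Bones the sketch does not check that the input file exists:

```ruby
require 'optparse'

# Hypothetical target list; Bones builds the real list from the
# subdirectories of its skeleton library.
TARGETS = %w[CPU-C CPU-OPENCL-AMD CPU-OPENCL-INTEL CPU-OPENMP GPU-CUDA GPU-OPENCL-AMD].freeze

# Parses a Bones-like command line with OptionParser (a stand-in for trollop).
# Returns an options hash, or raises ArgumentError on invalid input.
def parse_bones_options(argv)
  options = { measurements: false, verify: false, only_alg_number: 99,
              merge_factor: 0, register_caching: 1, zero_copy: 1, skeletons: 1 }
  OptionParser.new do |opts|
    opts.banner = 'Usage: bones --application <input> --target <target> [OPTIONS]'
    opts.on('-a', '--application FILE', String, 'Input application file') { |v| options[:application] = v }
    opts.on('-t', '--target TARGET', String, 'Target processor') { |v| options[:target] = v }
    opts.on('-m', '--measurements', 'Enable timers') { options[:measurements] = true }
    opts.on('-c', '--verify', 'Verify correctness of the generated code') { options[:verify] = true }
    opts.on('-o', '--only-alg-number N', Integer, 'Only generate code for the N-th species') { |v| options[:only_alg_number] = v }
  end.parse!(argv.dup)

  raise ArgumentError, 'no input file supplied (use: --application)' unless options[:application]
  raise ArgumentError, 'no target supplied (use: --target)' unless options[:target]
  # Targets are matched case-insensitively, as in Bones::Engine#initialize.
  options[:target] = options[:target].upcase
  unless TARGETS.include?(options[:target])
    raise ArgumentError, "target not supported, supported targets are: #{TARGETS.join(', ')}"
  end
  options
end
```

For example, parse_bones_options(%w[-a saxpy.c -t gpu-cuda -m]) yields a hash whose :target is "GPU-CUDA" and whose :measurements is true.
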
    # File lib/bones/engine.rb
    def initialize
      @result = {:original_code            => [],
                 :header_code              => [],
                 :host_declarations        => [],
                 :host_code_lists          => [],
                 :algorithm_declarations   => [],
                 :algorithm_code_lists     => [],
                 :verify_code              => [],
                 :host_device_mem_globals  => []}
      @state = 0

      # Provides a list of possible targets (e.g. 'GPU-CUDA', 'CPU-OPENCL-INTEL').
      targets = []
      Dir[File.join(BONES_DIR_SKELETONS,'*')].each do |entry|
        if (File.directory?(entry)) && !(entry =~ /verification/)
          targets.push(File.basename(entry))
        end
      end
      targets = targets.sort

      # Parse the command line options using the 'trollop' gem.
      pp_targets = targets.inspect.gsub(/("|\[)|\]/,'')
      @options = Trollop::options do
        version 'Bones '+File.read(BONES_DIR+'/VERSION').strip+' (c) 2012 Cedric Nugteren, Eindhoven University of Technology'
        banner  NL+'Bones is a parallelizing source-to-source compiler based on algorithmic skeletons. ' +
                'For more information, see the README.rdoc file or visit the Bones website at http://parse.ele.tue.nl/bones/.' + NL + NL +
                'Usage:' + NL +
                '    bones --application <input> --target <target> [OPTIONS]' + NL +
                'using the following flags:'
        opt :application,     'Input application file',                               :short => 'a', :type => String
        opt :target,          'Target processor (choose from: '+pp_targets+')',       :short => 't', :type => String
        opt :measurements,    'Enable/disable timers',                                :short => 'm', :default => false
        opt :verify,          'Verify correctness of the generated code',             :short => 'c', :default => false
        opt :only_alg_number, 'Only generate code for the x-th species (99 -> all)',  :short => 'o', :type => Integer, :default => 99
        opt :merge_factor,    'Thread merge factor, default is 1 (==disabled)',       :short => 'f', :type => Integer, :default => 0
        opt :register_caching,'Enable register caching: 1:enabled (default), 0:disabled',      :short => 'r', :type => Integer, :default => 1
        opt :zero_copy       ,'Enable OpenCL zero-copy: 1:enabled (default), 0:disabled',      :short => 'z', :type => Integer, :default => 1
        opt :skeletons       ,'Enable non-default skeletons: 1:enabled (default), 0:disabled', :short => 's', :type => Integer, :default => 1
      end
      Trollop::die 'no input file supplied (use: --application)'              if !@options[:application_given]
      Trollop::die 'no target supplied (use: --target)'                       if !@options[:target_given]
      Trollop::die 'input file "'+@options[:application]+'" does not exist'   if !File.exist?(@options[:application])
      Trollop::die 'target not supported, supported targets are: '+pp_targets if !targets.include?(@options[:target].upcase)
      @options[:name] = File.basename(@options[:application], ".*")
      @options[:target] = @options[:target].upcase

      # Extension for the host files corresponding to the target.
      @extension = File.extname(Dir[File.join(BONES_DIR_SKELETONS,@options[:target],'common','*')][0])

      # Extension for the device files corresponding to the target.
      @algorithm_extension = File.extname(Dir[File.join(BONES_DIR_SKELETONS,@options[:target],'kernel','*.kernel.*')][0])

      # Set a prefix for functions called from the original file but defined in a host file
      @prefix = (@options[:target] == 'GPU-CUDA') ? '' : ''

      # Setting to include the scheduler (CUDA only)
      @scheduler = (@options[:target] == 'GPU-CUDA') ? true : false

      # Skip analysis passes for certain targets
      @skiptarget = false #(@options[:target] == 'PAR4ALL') ? true : false

      # Set the location for the skeleton library
      @dir = {}
      @dir[:library] = File.join(BONES_DIR_SKELETONS,@options[:target])
      @dir[:skeleton_library] = File.join(@dir[:library],'kernel')
      @dir[:common_library] = File.join(@dir[:library],'common')
      @dir[:verify_library] = File.join(BONES_DIR_SKELETONS,'verification')

      # Obtain the source code from file
      @source = File.open(@options[:application],'r'){|f| f.read}
      @basename = File.basename(@options[:application],'.c')
    end
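
Note that initialize derives both the list of targets and the host/device file extensions from the layout of the skeleton library instead of hard-coding them. A minimal sketch of that discovery logic, using the hypothetical helper names discover_targets and target_extensions, is:

```ruby
require 'tmpdir'
require 'fileutils'

# Returns the sorted list of target names: every subdirectory of the
# skeleton directory except those whose path matches /verification/.
def discover_targets(skeleton_dir)
  Dir[File.join(skeleton_dir, '*')]
    .select { |entry| File.directory?(entry) && entry !~ /verification/ }
    .map { |entry| File.basename(entry) }
    .sort
end

# Derives the host and device file extensions of a target from the first
# file found in its common/ and kernel/ directories, as initialize does.
def target_extensions(skeleton_dir, target)
  host   = File.extname(Dir[File.join(skeleton_dir, target, 'common', '*')].first)
  device = File.extname(Dir[File.join(skeleton_dir, target, 'kernel', '*.kernel.*')].first)
  [host, device]
end
```

For instance, a directory containing GPU-CUDA/common/header.cu, GPU-CUDA/kernel/map.kernel.cu, an empty CPU-OPENMP directory and a verification directory yields the targets CPU-OPENMP and GPU-CUDA, and the extensions ['.cu', '.cu'] for GPU-CUDA (these file names are made up for the example). As in Bones, File.extname raises a TypeError if a target has no matching file, because Dir[...].first is then nil.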

Public Instance Methods

process()

Processes the input file and generates the target code, which is kept in memory until write_output is called. This method calls all relevant private methods.

Tasks:

  • Run the preprocessor to obtain algorithm information.

  • Use the ‘CAST’ gem to parse the source into an AST.

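  • Call the code generator to perform the real work and produce output.

  • If the scheduler is enabled (GPU-CUDA only), generate the memory allocation, copy, and free code.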
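
Before code generation, process also decides which algorithms to handle: an --only-alg-number value of 99 selects all of them, and any other value selects a single zero-based index, clamped to the last algorithm. The rule can be isolated in a small helper; selected_indices and ALL_ALGORITHMS are hypothetical names used only for illustration:

```ruby
ALL_ALGORITHMS = 99 # sentinel value of --only-alg-number meaning "all"

# Returns the zero-based indices of the algorithms that process would
# generate code for, given the --only-alg-number value and the algorithm count.
def selected_indices(only_alg_number, algorithm_count)
  return (0...algorithm_count).to_a if only_alg_number == ALL_ALGORITHMS
  # Out-of-range requests are clamped to the last algorithm.
  last = algorithm_count - 1
  (0...algorithm_count).select { |i| i == [only_alg_number, last].min }
end
```

With three algorithms, selected_indices(1, 3) returns [1], while an out-of-range request such as selected_indices(7, 3) falls back to [2].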

    # File lib/bones/engine.rb
    def process

      # Run the preprocessor
      preprocessor = Bones::Preprocessor.new(@source,File.dirname(@options[:application]),@basename,@scheduler)
      preprocessor.process
      @result[:header_code] = preprocessor.header_code
      @result[:device_header] = preprocessor.device_header
      @result[:header_code] += '#include <sys/time.h>'+NL if @options[:measurements]

      # Parse the source code into AST
      parser = C::Parser.new
      parser.type_names << 'FILE'
      parser.type_names << 'size_t'
      ast = parser.parse(preprocessor.target_code)
      ast.preprocess

      # Add the scheduler's global code
      if @scheduler
        @result[:host_code_lists].push(File.read(File.join(@dir[:common_library],COMMON_SCHEDULER+@extension)))
      end

      # Set the algorithm's skeleton and generate the global code
      one_time = true
      preprocessor.algorithms.each_with_index do |algorithm,algorithm_number|
        algorithm.species.set_skeleton(File.join(@dir[:library],SKELETON_FILE))
        if @options[:skeletons] == 0
          algorithm.species.skeleton_name = 'default'
          # gsub! returns nil when nothing is replaced, so the calls must not be chained
          %w[10 20 30].each { |setting| algorithm.species.settings.gsub!(setting,'00') }
        end
        if algorithm.species.skeleton_name && one_time
          @result[:host_code_lists].push(File.read(File.join(@dir[:common_library],COMMON_GLOBALS+@extension)))
          @result[:algorithm_code_lists].push(File.read(File.join(@dir[:common_library],COMMON_GLOBALS_KERNEL+@extension)))
          one_time = false
        end
      end

      # Perform code generation (per-species code)
      @result[:original_code] = ast
      arrays = []
      preprocessor.algorithms.each_with_index do |algorithm,algorithm_number|
        if @options[:only_alg_number] == 99 || algorithm_number == [@options[:only_alg_number],preprocessor.algorithms.length-1].min
          puts MESSAGE+'Starting code generation for algorithm "'+algorithm.name+'"'
          if algorithm.species.skeleton_name
            algorithm.merge_factor = @options[:merge_factor] if (@options[:target] == 'GPU-CUDA')
            algorithm.register_caching_enabled = @options[:register_caching]
            algorithm.set_function(ast)
            algorithm.populate_variables(ast,preprocessor.defines) if !@skiptarget
            algorithm.populate_lists()
            algorithm.populate_hash() if !@skiptarget
            generate(algorithm)
            puts MESSAGE+'Code generated using the "'+algorithm.species.skeleton_name+'" skeleton'
            arrays.concat(algorithm.arrays)
          else
            puts WARNING+'Skeleton "'+algorithm.species.name+'" not available'
          end
        end
      end

      # Only if the scheduler is included
      if @scheduler

        # Perform code generation (sync statements)
        @result[:host_declarations].push('void bones_synchronize(int bones_task_id);')

        # Perform code generation (memory allocs)
        allocs = []
        preprocessor.copies.each do |copy|
          name_scop = Set.new([copy.name, copy.scop])
          if !allocs.include?(name_scop)
            generate_memory('alloc',copy,arrays,0)
            allocs << name_scop
          end
        end

        # Perform code generation (memory copies)
        preprocessor.copies.each_with_index do |copy,index|
          #puts MESSAGE+'Generating copy code for array "'+copy.name+'"'
          generate_memory('copy',copy,arrays,index)
        end

        # Perform code generation (memory frees)
        frees = []
        preprocessor.copies.each do |copy|
          name_scop = Set.new([copy.name, copy.scop])
          if !frees.include?(name_scop)
            generate_memory('free',copy,arrays,0)
            frees << name_scop
          end
        end

      end

    end
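
When the scheduler is enabled, the allocation and free functions are emitted once per distinct pair of array name and scop, even if the preprocessor reports several copies for that pair. The sketch below reproduces that de-duplication; the Copy struct and first_per_name_scop are hypothetical stand-ins for the preprocessor's copy objects and the inline loops in process:

```ruby
require 'set'

# Hypothetical stand-in for a copy record produced by the preprocessor.
Copy = Struct.new(:name, :scop, :direction)

# Returns the copies for which an allocation (or free) would be emitted:
# the first copy seen for each distinct {name, scop} key, as in process.
def first_per_name_scop(copies)
  seen = []
  copies.each_with_object([]) do |copy, emitted|
    key = Set.new([copy.name, copy.scop])
    next if seen.include?(key)
    seen << key
    emitted << copy
  end
end
```

Because a Set is unordered, this key treats a pair (x, y) as equal to (y, x) and collapses to a single element when name and scop are equal; an ordered key such as [copy.name, copy.scop] would avoid both effects, should that ever matter.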

write_output()

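This method writes the output code to files. It creates a new directory next to the input file, named after the input file without its extension followed by an underscore and the target (formatted as ‘name_TARGET’), and writes three files into it, plus a fourth if verification is enabled.

Output files:

  • main - a file with the same name as the input, containing the original code with function calls substituting the original algorithms.

  • verification - a file containing the verification code (only written when the ‘--verify’ flag is given).

  • target - a file containing the host code for the target.

  • kernel - a file containing the kernel code for the target.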
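
The output locations are derived from the input path and the target name. A sketch of that computation follows; the suffix values in SUFFIXES are hypothetical placeholders for the OUTPUT_HOST, OUTPUT_DEVICE, and OUTPUT_VERIFICATION constants, whose actual values are not listed on this page, and output_paths is a hypothetical helper name:

```ruby
# Hypothetical suffix values; the real ones are the OUTPUT_HOST,
# OUTPUT_DEVICE and OUTPUT_VERIFICATION constants of Bones::Engine.
SUFFIXES = { host: '_host', device: '_device', verification: '_verification' }.freeze

# Computes the paths write_output would use for an input file, a target
# name, and the host/device extensions of that target.
def output_paths(application, target, host_ext, device_ext, verify: false)
  directory = application.rpartition('.').first + '_' + target
  name = File.basename(application, '.*')
  paths = {
    main:   File.join(directory, application.split(File::SEPARATOR).last),
    host:   File.join(directory, name + SUFFIXES[:host] + host_ext),
    device: File.join(directory, name + SUFFIXES[:device] + device_ext)
  }
  # The verification file uses the host extension, as in write_output.
  paths[:verification] = File.join(directory, name + SUFFIXES[:verification] + host_ext) if verify
  paths
end
```

For example, output_paths('examples/saxpy.c', 'GPU-CUDA', '.cu', '.cu') places all files in examples/saxpy_GPU-CUDA. Because String#rpartition returns an empty first element when no '.' is present, an input path without any dot would produce a directory named just _GPU-CUDA.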

    # File lib/bones/engine.rb
    def write_output

      # Create a new directory for the output
      directory = @options[:application].rpartition('.').first+'_'+@options[:target]
      Dir.mkdir(directory,0744) unless File.directory?(directory)

      parser = C::Parser.new
      parser.type_names << 'FILE'
      parser.type_names << 'size_t'

      # Populate the main file
      File.open(File.join(directory,@options[:application].split(File::SEPARATOR).last),'w') do |main|
        main.puts '#include <string.h>' if @options[:verify]
        main.puts @result[:header_code]
        main.puts File.read(File.join(@dir[:common_library],COMMON_HEADER+@extension))
        main.puts @result[:host_declarations]
        main.puts
        begin
          main.puts parser.parse(@result[:original_code]).to_s
        rescue
          puts WARNING+'Recovering from CAST parse error'
          main.puts parser.parse(@result[:original_code].clone).to_s
        end
      end

      # Populate the verification file
      if @options[:verify]
        File.open(File.join(directory,@options[:name]+OUTPUT_VERIFICATION+@extension),'w') do |verification|
          verification.puts @result[:header_code]
          verification.puts File.read(File.join(@dir[:verify_library],'header.c'))
          verification.puts
          verification.puts @result[:verify_code]
        end
      end

      # Populate the target file (host)
      File.open(File.join(directory,@options[:name]+OUTPUT_HOST+@extension),'w') do |target|
        target.puts '#include <cuda_runtime.h>'+NL if @options[:target] == 'GPU-CUDA'
        target.puts "#define ZEROCOPY 0"+NL if @options[:zero_copy] == 0 && @options[:target] == 'CPU-OPENCL-INTEL'
        target.puts "#define ZEROCOPY 1"+NL if @options[:zero_copy] == 1 && @options[:target] == 'CPU-OPENCL-INTEL'
        target.puts @result[:header_code]
        target.puts
        target.puts @result[:host_device_mem_globals].uniq
        target.puts
        target.puts @result[:algorithm_declarations]
        target.puts @result[:host_code_lists]
        target.puts
        target.puts File.read(File.join(@dir[:common_library],GLOBAL_TIMERS+@extension))
      end

      # Populate the algorithm file (device)
      File.open(File.join(directory,@options[:name]+OUTPUT_DEVICE+@algorithm_extension),'w') do |algorithm|
        algorithm.puts @result[:device_header]
        algorithm.puts @result[:algorithm_code_lists]
      end

    end

Private Instance Methods

generate(algorithm)

This method takes an individual algorithm as input and generates the corresponding output code. It first creates a search-and-replace hash, after which it instantiates a skeleton.

If the skeleton files for the algorithm's species are not available in the skeleton library, the method raises an error. The messages reporting that code was generated successfully, or that a skeleton is not available, are printed by process.
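
The search-and-replace step can be pictured as filling named placeholders in a skeleton with values from a hash such as the minihash entries built in generate (:array, :type, :flatten, ...). The placeholder syntax <name> used below is an assumption for illustration; the marker format of the real skeleton library and the implementation of search_and_replace are not shown on this page, and instantiate_skeleton is a hypothetical name:

```ruby
# Hypothetical placeholder syntax: <name> inside a skeleton is replaced by
# hash[:name]. The real Bones marker syntax is not shown in this listing.
def instantiate_skeleton(hash, skeleton)
  skeleton.gsub(/<(\w+)>/) {
    key = Regexp.last_match(1).to_sym
    # Placeholders without a matching hash entry are left untouched.
    hash.key?(key) ? hash[key].to_s : Regexp.last_match(0)
  }
end
```

For example, instantiating 'cudaMalloc((void**)&device_<array>, sizeof(<type>)*<size>);' with { array: 'A', type: 'float' } fills in the array name and element type and leaves <size> in place.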

    # File lib/bones/engine.rb
    def generate(algorithm)

      # Determine the skeleton filenames and load the skeletons from the skeleton library
      file_name_host = File.join(@dir[:skeleton_library],algorithm.species.skeleton_name+SKELETON_HOST)
      file_name_device = File.join(@dir[:skeleton_library],algorithm.species.skeleton_name+SKELETON_DEVICE)
      if !File.exist?(file_name_host+@extension) || !File.exist?(file_name_device+@algorithm_extension)
        raise_error('Skeleton files for skeleton "'+algorithm.species.skeleton_name+'" not available')
      end
      skeletons = {:host   => File.read(file_name_host+@extension),
                   :device => File.read(file_name_device+@algorithm_extension)}

      # Perform the transformations on the algorithm's code
      algorithm.perform_transformations(algorithm.species.settings) if !@skiptarget

      # Load the common skeletons from the skeleton library
      COMMON_FILES.each do |skeleton|
        skeletons[skeleton.to_sym] = File.read(File.join(@dir[:common_library],skeleton+@extension))
      end

      # Load the timer code from the skeleton library (only if the '--measurements' flag is given)
      TIMER_FILES.each do |skeleton|
        skeletons[skeleton.to_sym] = @options[:measurements] ? File.read(File.join(@dir[:common_library],skeleton+@extension)) : ''
      end

      # Perform search-and-replace on the device skeleton
      search_and_replace!(algorithm.hash,skeletons[:device])
      skeletons[:device].remove_extras

      # Replace mathematical functions with their equivalent device functions
      if @options[:target] == 'GPU-CUDA'
        math_functions = {:sqrt => 'sqrtf', :max  => 'fmaxf', :min  => 'fminf'}
        math_functions.each do |original, replacement|
          skeletons[:device].gsub!(/\b#{original}\(/,replacement+'(')
        end
      end

      # Create the algorithm declaration list from the header supplied in the skeletons
      algorithm_declaration = skeletons[:device].scan(/#{START_DEFINITION}(.+)#{END_DEFINITION}/m).join.strip.remove_extras
      @result[:algorithm_declarations].push(algorithm_declaration)

      # Remove the (commented) algorithm declaration from the code and push the skeleton to the output
      # (gsub rather than gsub!, which would push nil if the skeleton contains no declaration)
      @result[:algorithm_code_lists].push(skeletons[:device].gsub(/#{START_DEFINITION}(.+)#{END_DEFINITION}/m,''))

      # Set up some variables to create the host body function, including memory allocation and memory copies
      processed = {:mem_prologue => '', :mem_copy_H2D => '', :mem_copy_D2H => '', :mem_epilogue => ''}
      counter = {:out => 0, :in => 0}

      # Iterate over all the array variables and create a mini-search-and-replace hash for each array (all arrays)
      algorithm.arrays.each_with_index do |array, arrayid|
        minihash = { :array               => array.name,
                     :type                => array.type_name,
                     :flatten             => array.flatten,
                     :variable_dimensions => array.size.join('*'),
                     :state               => @state.to_s}
        @state += 1

        # Apply the mini-search-and-replace hash to create the memory allocations, memory copies (if input only), etc.
        processed[:mem_prologue] += search_and_replace(minihash,skeletons[:mem_prologue])
        processed[:mem_copy_H2D] += search_and_replace(minihash,skeletons[:mem_copy_H2D]) if array.input? || array.species.shared?
        processed[:mem_epilogue] += search_and_replace(minihash,skeletons[:mem_epilogue])

        # Add the device declarations
        @result[:host_device_mem_globals].push(search_and_replace(minihash,skeletons[:mem_global]))
      end

      # Iterate over all the array variables and create a mini-search-and-replace hash for each array (output arrays)
      algorithm.arrays.select(OUTPUT).each_with_index do |array, num_array|
        hash = algorithm.hash["out#{num_array}".to_sym]
        minihash = { :array               => array.name,
                     :type                => array.type_name,
                     :flatten             => array.flatten,
                     :offset              => '('+hash[:dimension0][:from]+')',
                     :variable_dimensions => '('+hash[:dimensions]+')',
                     :state               => @state.to_s}
        @state += 1

        # Perform selective copy for arrays with 2 dimensions (uses a for-loop over the memory copies; currently disabled by '&& false')
        if array.dimensions == 2 && @options[:target] == 'GPU-CUDA' && false
          x_from = '('+hash[:dimension0][:from]+')'
          x_to   = '('+hash[:dimension0][:to]+')'
          x_sum  = '('+hash[:dimension0][:sum]+')'
          x_size = array.size[0]
          y_from = '('+hash[:dimension1][:from]+')'
          y_to   = '('+hash[:dimension1][:to]+')'
          y_sum  = '('+hash[:dimension1][:sum]+')'
          y_size = array.size[1]
          processed[:mem_copy_D2H] += NL+INDENT+"for(int bones_x=#{x_from}; bones_x<=#{x_to}; bones_x++) {"+INDENT*2
          minihash[:offset] = "(bones_x*#{y_size})+#{y_from}"
          minihash[:variable_dimensions] = "#{y_sum}"
        # Don't do selective copy for multi-dimensional arrays (yet)
        elsif array.dimensions > 1
          minihash[:offset] = '0'
          minihash[:variable_dimensions] = array.size.join('*')
        end

        # Apply the mini-search-and-replace hash to create the memory copies from device to host
        processed[:mem_copy_D2H] += search_and_replace(minihash,skeletons[:mem_copy_D2H])
        if array.dimensions == 2 && @options[:target] == 'GPU-CUDA' && false
          processed[:mem_copy_D2H] += INDENT+'}'
        end
      end

      # Apply the search-and-replace hash to all timer skeletons and the host skeleton
      (['host']+TIMER_FILES).each do |skeleton|
        search_and_replace!(algorithm.hash,skeletons[skeleton.to_sym])
      end

      # Repair some invalid syntax that could have been introduced by performing the search-and-replace
      skeletons[:host].remove_extras

      # Run the prologue/epilogue code through the search-and-replace hash
      search_and_replace!(algorithm.hash,skeletons[:prologue])
      search_and_replace!(algorithm.hash,skeletons[:epilogue])

      # Construct the final host function, including the timers and memory copies
      if @scheduler
        host = skeletons[:prologue     ] +
               skeletons[:timer_2_start] + skeletons[:host         ] + skeletons[:timer_2_stop ] +
               skeletons[:epilogue     ]
      else
        host = skeletons[:prologue     ] +
               skeletons[:timer_1_start] + processed[:mem_prologue ] + processed[:mem_copy_H2D ] +
               skeletons[:timer_2_start] + skeletons[:host         ] + skeletons[:timer_2_stop ] +
               processed[:mem_copy_D2H ] + processed[:mem_epilogue ] + skeletons[:timer_1_stop ] +
               skeletons[:epilogue     ]
      end

      # Generate code to replace the original code, including verification code if specified by the option flag
      verify_skeleton = File.read(File.join(@dir[:verify_library],'verify_results.c'))
      timer_start = (@options[:measurements]) ? File.read(File.join(@dir[:verify_library],'timer_start.c')) : ''
      timer_stop  = (@options[:measurements]) ? File.read(File.join(@dir[:verify_library],'timer_stop.c')) : ''
      replacement_code, original_definition, verify_definition = algorithm.generate_replacement_code(@options, verify_skeleton, @result[:verify_code], @prefix, timer_start, timer_stop)
      @result[:host_declarations].push(verify_definition)

      # Add a performance model to the original code
      #replacement_code.insert(0,algorithm.performance_model_code('model'))

      # Replace mallocs and frees in the original code with aligned memory allocations (only for the CPU-OPENCL-INTEL target with zero-copy enabled)
      if @options[:zero_copy] == 1 && @options[:target] == 'CPU-OPENCL-INTEL'
        @result[:original_code].search_and_replace_function_call(C::Variable.parse('malloc'),C::Variable.parse(VARIABLE_PREFIX+'malloc_128'))
        @result[:original_code].search_and_replace_function_call(C::Variable.parse('free'),C::Variable.parse(VARIABLE_PREFIX+'free_128'))
      end

      # Give the original main function a new name
      @result[:original_code].search_and_replace_function_definition('main',VARIABLE_PREFIX+'main')

      # Replace the original code with a function call to the newly generated code
      @result[:original_code].search_and_replace_node(algorithm.code,replacement_code)

      # The host code is generated, push the data to the output hashes
      accelerated_definition = 'void '+algorithm.name+'_accelerated('+algorithm.lists[:host_definition]+')'
      @result[:host_code_lists].push(@prefix+accelerated_definition+' {'+NL+host+NL+'}'+NL+NL)
      @result[:host_declarations].push(@prefix+accelerated_definition+';'+NL+@prefix+original_definition+';')
    end
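
For the GPU-CUDA target, generate rewrites calls to sqrt, max and min in the device skeleton into their single-precision CUDA counterparts, anchoring the regular expression on a word boundary so that identifiers which merely end in these names are left alone. A standalone version of that step follows; replace_math_functions is a hypothetical name and, unlike Bones, it returns a new string instead of modifying the skeleton in place:

```ruby
# Maps C math calls to their single-precision CUDA equivalents, mirroring
# the GPU-CUDA branch of Engine#generate.
CUDA_MATH_FUNCTIONS = { sqrt: 'sqrtf', max: 'fmaxf', min: 'fminf' }.freeze

# Returns a copy of code with each "name(" call replaced by "replacement(".
def replace_math_functions(code)
  CUDA_MATH_FUNCTIONS.reduce(code) do |result, (original, replacement)|
    # \b prevents matches inside longer identifiers such as my_sqrt or fmax.
    result.gsub(/\b#{original}\(/, replacement + '(')
  end
end
```

Applied to 'y = sqrt(x) + my_sqrt(e) + fmax(f, g);', it rewrites only the first call, because \b does not match between the word characters in my_sqrt or fmax.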
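
generate_memory(type,copy,arrays,index)

Generates the host code for one memory operation on the array described by copy: type is 'alloc', 'copy', or 'free', and the corresponding mem_async_ skeleton from the common library is instantiated. The array definition, type, and dimensions are taken from the first entry of arrays that has the same name and a matching direction (or is INOUT). The instantiated code is appended to the host code, and a forward declaration is added to the host declarations. In this class, the method is called only from process, when the scheduler is enabled (GPU-CUDA).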
    # File lib/bones/engine.rb
    def generate_memory(type,copy,arrays,index)

      # Find the corresponding array
      arrays.each do |array|
        if array.name == copy.name && (array.direction == copy.direction || array.direction == INOUT)

          # Load the skeleton from the skeleton library
          type += copy.direction if type == 'copy'
          skeleton = File.read(File.join(@dir[:common_library],'mem_async_'+type+@extension))

          # Create the find-and-replace hash
          minihash = { :array               => copy.name,
                       :id                  => copy.id,
                       :index               => index.to_s,
                       :direction           => copy.direction,
                       :definition          => array.definition,
                       :type                => array.type_name,
                       :flatten             => array.flatten,
                       :offset              => '0',
                       :variable_dimensions => array.size.join('*'),
                       :state               => copy.deadline}

          # Instantiate the skeleton and add it to the final result
          @result[:host_code_lists].push(search_and_replace(minihash,skeleton))

          # Add a forward declaration of this function
          @result[:host_declarations].push(copy.get_definition(array.definition,type))

          # Done
          return
        end
      end
    end
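
generate_memory selects its skeleton by operation type and, for copies, by direction, and it uses the first array that matches the copy's name and direction (or is INOUT); if no array matches, nothing is emitted. The sketch below models these two rules. ArrayInfo, CopyInfo, matching_array, memory_skeleton_name, and the string values of INOUT and of the directions are hypothetical stand-ins for the Bones objects and constants:

```ruby
# Hypothetical stand-ins for Bones' array and copy records and its INOUT constant.
ArrayInfo = Struct.new(:name, :direction)
CopyInfo  = Struct.new(:name, :direction)
INOUT = 'inout'

# Finds the first array matching a copy: same name, and either the same
# direction or an array that is both read and written (INOUT).
# Returns nil when nothing matches, in which case no code would be emitted.
def matching_array(arrays, copy)
  arrays.find { |a| a.name == copy.name && (a.direction == copy.direction || a.direction == INOUT) }
end

# Name of the common skeleton file for an operation ('alloc', 'copy', 'free');
# for copies the direction is appended, e.g. 'copy' + 'in' -> 'copyin'.
def memory_skeleton_name(type, copy, extension)
  type += copy.direction if type == 'copy'
  'mem_async_' + type + extension
end
```

With a hypothetical direction value of 'in', a copy operation would load mem_async_copyin plus the target's host extension, while allocations and frees ignore the direction.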