class Bones::Engine
This class holds the main functionality: the Bones source-to-source compilation engine based on algorithmic skeletons. It processes command-line arguments, calls the Bones preprocessor and the CAST gem, analyzes the source code, performs source transformations, instantiates the skeletons, and finally writes the output code to file.
Constants
- BONES_DIR_SKELETONS
Locate the skeletons directory.
- COMMON_FILES
A list of files to be found in the common directory of the skeleton library (excluding timer files).
- COMMON_GLOBALS
The name of the file containing the globals as found in the skeleton library
- COMMON_GLOBALS_KERNEL
The name of the file containing the globals for the kernel files as found in the skeleton library
- COMMON_HEADER
The name of the file containing the header file for the original C code as found in the skeleton library
- COMMON_SCHEDULER
The name of the file containing the scheduler code
- GLOBAL_TIMERS
Global timers
- OUTPUT_DEVICE
The suffix added to the generated output file for the device file. See also OUTPUT_HOST.
- OUTPUT_HOST
The suffix added to the generated output file for the host file. See also OUTPUT_DEVICE.
- OUTPUT_VERIFICATION
The suffix added to the generated verification file. See also OUTPUT_DEVICE and OUTPUT_HOST.
- SKELETON_DEVICE
The extension of a device file in the skeleton library. See also SKELETON_HOST.
- SKELETON_FILE
Set the name of the transformations file as found in the skeleton library.
- SKELETON_HOST
The extension of a host file in the skeleton library. See also SKELETON_DEVICE.
- TIMER_FILES
A list of timer files to be found in the skeleton library.
Public Class Methods
Initializes the engine and processes the command line arguments. This method uses the ‘trollop’ gem to parse the arguments and to create a nicely formatted help menu. This method additionally initializes a result-hash and reads the contents of the source file from disk.
Command-line usage:
bones --application <input> --target <target> [OPTIONS]
Options:
--application, -a <s>: Input application file
--target, -t <s>: Target processor (choose from: 'GPU-CUDA','GPU-OPENCL-AMD','CPU-OPENCL-INTEL','CPU-OPENCL-AMD','CPU-OPENMP','CPU-C')
--measurements, -m: Enable/disable timers
--version, -v: Print version and exit
--help, -h: Show this message
# File lib/bones/engine.rb, line 60
def initialize
  @result = {:original_code            => [],
             :header_code              => [],
             :host_declarations        => [],
             :host_code_lists          => [],
             :algorithm_declarations   => [],
             :algorithm_code_lists     => [],
             :verify_code              => [],
             :host_device_mem_globals  => []}
  @state = 0

  # Provides a list of possible targets (e.g. GPU-CUDA, 'CPU-OPENCL-INTEL').
  targets = []
  Dir[File.join(BONES_DIR_SKELETONS,'*')].each do |entry|
    if (File.directory?(entry)) && !(entry =~ /verification/)
      targets.push(File.basename(entry))
    end
  end
  targets = targets.sort

  # Parse the command line options using the 'trollop' gem.
  pp_targets = targets.inspect.gsub(/("|\[)|\]/,'')
  @options = Trollop::options do
    version 'Bones '+File.read(BONES_DIR+'/VERSION').strip+' (c) 2012 Cedric Nugteren, Eindhoven University of Technology'
    banner  NL+'Bones is a parallelizing source-to-source compiler based on algorithmic skeletons. ' +
            'For more information, see the README.rdoc file or visit the Bones website at http://parse.ele.tue.nl/bones/.' + NL + NL +
            'Usage:' + NL +
            '    bones --application <input> --target <target> [OPTIONS]' + NL +
            'using the following flags:'
    opt :application,      'Input application file', :short => 'a', :type => String
    opt :target,           'Target processor (choose from: '+pp_targets+')', :short => 't', :type => String
    opt :measurements,     'Enable/disable timers', :short => 'm', :default => false
    opt :verify,           'Verify correctness of the generated code', :short => 'c', :default => false
    opt :only_alg_number,  'Only generate code for the x-th species (99 -> all)', :short => 'o', :type => Integer, :default => 99
    opt :merge_factor,     'Thread merge factor, default is 1 (==disabled)', :short => 'f', :type => Integer, :default => 0
    opt :register_caching, 'Enable register caching: 1:enabled (default), 0:disabled', :short => 'r', :type => Integer, :default => 1
    opt :zero_copy,        'Enable OpenCL zero-copy: 1:enabled (default), 0:disabled', :short => 'z', :type => Integer, :default => 1
    opt :skeletons,        'Enable non-default skeletons: 1:enabled (default), 0:disabled', :short => 's', :type => Integer, :default => 1
  end
  Trollop::die 'no input file supplied (use: --application)' if !@options[:application_given]
  Trollop::die 'no target supplied (use: --target)' if !@options[:target_given]
  Trollop::die 'input file "'+@options[:application]+'" does not exist' if !File.exists?(@options[:application])
  Trollop::die 'target not supported, supported targets are: '+pp_targets if !targets.include?(@options[:target].upcase)
  @options[:name] = File.basename(@options[:application], ".*")
  @options[:target] = @options[:target].upcase

  # Extension for the host files corresponding to the target.
  @extension = File.extname(Dir[File.join(BONES_DIR_SKELETONS,@options[:target],'common','*')][0])

  # Extension for the device files corresponding to the target.
  @algorithm_extension = File.extname(Dir[File.join(BONES_DIR_SKELETONS,@options[:target],'kernel','*.kernel.*')][0])

  # Set a prefix for functions called from the original file but defined in a host file
  @prefix = (@options[:target] == 'GPU-CUDA') ? '' : ''

  # Setting to include the scheduler (CUDA only)
  @scheduler = (@options[:target] == 'GPU-CUDA') ? true : false

  # Skip analyse passes for certain targets
  @skiptarget = false #(@options[:target] == 'PAR4ALL') ? true : false

  # Set the location for the skeleton library
  @dir = {}
  @dir[:library] = File.join(BONES_DIR_SKELETONS,@options[:target])
  @dir[:skeleton_library] = File.join(@dir[:library],'kernel')
  @dir[:common_library] = File.join(@dir[:library],'common')
  @dir[:verify_library] = File.join(BONES_DIR_SKELETONS,'verification')

  # Obtain the source code from file
  @source = File.open(@options[:application],'r'){|f| f.read}
  @basename = File.basename(@options[:application],'.c')
end
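The constructor derives the list of supported targets from the sub-directories of the skeleton library. The following is a minimal sketch of that step in isolation. The helper names `discover_targets` and `pretty_targets` are hypothetical and not part of Bones, and the directory layout is created in a temporary folder purely for illustration.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not part of Bones): every sub-directory of the
# skeleton library counts as a target, except paths matching 'verification'.
def discover_targets(skeleton_dir)
  Dir[File.join(skeleton_dir, '*')]
    .select { |entry| File.directory?(entry) && entry !~ /verification/ }
    .map { |entry| File.basename(entry) }
    .sort
end

# Same formatting step as in the constructor: strip quotes and brackets
# from the inspected array to obtain a human-readable list.
def pretty_targets(targets)
  targets.inspect.gsub(/("|\[)|\]/, '')
end

Dir.mktmpdir do |dir|
  # Illustrative layout only: two targets, the verification folder, one file.
  %w[GPU-CUDA CPU-C verification].each do |name|
    FileUtils.mkdir_p(File.join(dir, name))
  end
  File.write(File.join(dir, 'README'), 'not a target')

  puts pretty_targets(discover_targets(dir))  # prints "CPU-C, GPU-CUDA"
end
```

As in the original code, the `verification` check is applied to the full path, so a skeleton library located below a directory whose name contains "verification" would yield no targets.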
Public Instance Methods
Method to process a file and to output target code. This method calls all relevant private methods.
Tasks:
- Run the preprocessor to obtain algorithm information.
- Use the ‘CAST’ gem to parse the source into an AST.
- Call the code generator to perform the real work and produce output.
# File lib/bones/engine.rb, line 140
def process

  # Run the preprocessor
  preprocessor = Bones::Preprocessor.new(@source,File.dirname(@options[:application]),@basename,@scheduler)
  preprocessor.process
  @result[:header_code] = preprocessor.header_code
  @result[:device_header] = preprocessor.device_header
  @result[:header_code] += '#include <sys/time.h>'+NL if @options[:measurements]

  # Parse the source code into AST
  parser = C::Parser.new
  parser.type_names << 'FILE'
  parser.type_names << 'size_t'
  ast = parser.parse(preprocessor.target_code)
  ast.preprocess

  # Add the scheduler's global code
  if @scheduler
    @result[:host_code_lists].push(File.read(File.join(@dir[:common_library],COMMON_SCHEDULER+@extension)))
  end

  # Set the algorithm's skeleton and generate the global code
  one_time = true
  preprocessor.algorithms.each_with_index do |algorithm,algorithm_number|
    algorithm.species.set_skeleton(File.join(@dir[:library],SKELETON_FILE))
    if @options[:skeletons] == 0
      algorithm.species.skeleton_name = 'default'
      algorithm.species.settings.gsub!('10','00').gsub!('20','00').gsub!('30','00')
    end
    if algorithm.species.skeleton_name && one_time
      @result[:host_code_lists].push(File.read(File.join(@dir[:common_library],COMMON_GLOBALS+@extension)))
      @result[:algorithm_code_lists].push(File.read(File.join(@dir[:common_library],COMMON_GLOBALS_KERNEL+@extension)))
      one_time = false
    end
  end

  # Perform code generation (per-species code)
  @result[:original_code] = ast
  arrays = []
  preprocessor.algorithms.each_with_index do |algorithm,algorithm_number|
    if @options[:only_alg_number] == 99 || algorithm_number == [@options[:only_alg_number],preprocessor.algorithms.length-1].min
      puts MESSAGE+'Starting code generation for algorithm "'+algorithm.name+'"'
      if algorithm.species.skeleton_name
        algorithm.merge_factor = @options[:merge_factor] if (@options[:target] == 'GPU-CUDA')
        algorithm.register_caching_enabled = @options[:register_caching]
        algorithm.set_function(ast)
        algorithm.populate_variables(ast,preprocessor.defines) if !@skiptarget
        algorithm.populate_lists()
        algorithm.populate_hash() if !@skiptarget
        generate(algorithm)
        puts MESSAGE+'Code generated using the "'+algorithm.species.skeleton_name+'" skeleton'
        arrays.concat(algorithm.arrays)
      else
        puts WARNING+'Skeleton "'+algorithm.species.name+'" not available'
      end
    end
  end

  # Only if the scheduler is included
  if @scheduler

    # Perform code generation (sync statements)
    @result[:host_declarations].push('void bones_synchronize(int bones_task_id);')

    # Perform code generation (memory allocs)
    allocs = []
    preprocessor.copies.each do |copy|
      name_scop = Set.new([copy.name, copy.scop])
      if !allocs.include?(name_scop)
        generate_memory('alloc',copy,arrays,0)
        allocs << name_scop
      end
    end

    # Perform code generation (memory copies)
    preprocessor.copies.each_with_index do |copy,index|
      #puts MESSAGE+'Generating copy code for array "'+copy.name+'"'
      generate_memory('copy',copy,arrays,index)
    end

    # Perform code generation (memory frees)
    frees = []
    preprocessor.copies.each do |copy|
      name_scop = Set.new([copy.name, copy.scop])
      if !frees.include?(name_scop)
        generate_memory('free',copy,arrays,0)
        frees << name_scop
      end
    end

  end

end
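The per-algorithm loop above decides whether to generate code based on the `:only_alg_number` option, where 99 selects every algorithm and any other value is clamped to the index of the last algorithm. The sketch below isolates that condition. The method names `selected?` and `selected_indices` and the constant `ALL_ALGORITHMS` are hypothetical names introduced for illustration; only the comparison itself is taken from the listing.

```ruby
# Sentinel taken from the listing: 99 means "generate code for all algorithms".
ALL_ALGORITHMS = 99

# Hypothetical helper reproducing the condition used in the process loop.
# An out-of-range request is clamped to the last algorithm index.
def selected?(index, only_alg_number, total)
  only_alg_number == ALL_ALGORITHMS ||
    index == [only_alg_number, total - 1].min
end

# Hypothetical convenience wrapper: which indices would be processed?
def selected_indices(only_alg_number, total)
  (0...total).select { |index| selected?(index, only_alg_number, total) }
end

p selected_indices(99, 4)  # prints [0, 1, 2, 3]
p selected_indices(1, 4)   # prints [1]
p selected_indices(7, 4)   # prints [3] (clamped to the last algorithm)
```

One consequence of using 99 as the sentinel is that the algorithm with index 99 cannot be selected on its own in a source file that contains 100 or more algorithms.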
This method writes the output code to files. It creates a new directory formatted as ‘name_target’ and produces three files.
Output files:
- main - a file containing the original code with function calls substituting the original algorithms.
- target - a file containing the host code for the target.
- kernel - a file containing the kernel code for the target.
# File lib/bones/engine.rb, line 242
def write_output

  # Create a new directory for the output
  directory = @options[:application].rpartition('.').first+'_'+@options[:target]
  Dir.mkdir(directory,0744) unless File.directory?(directory)

  parser = C::Parser.new
  parser.type_names << 'FILE'
  parser.type_names << 'size_t'

  # Populate the main file
  File.open(File.join(directory,@options[:application].split(File::SEPARATOR).last),'w') do |main|
    main.puts '#include <string.h>' if @options[:verify]
    main.puts @result[:header_code]
    main.puts File.read(File.join(@dir[:common_library],COMMON_HEADER+@extension))
    main.puts @result[:host_declarations]
    main.puts
    begin
      main.puts parser.parse(@result[:original_code]).to_s
    rescue
      puts WARNING+'Recovering from CAST parse error'
      main.puts parser.parse(@result[:original_code].clone).to_s
    end
  end

  # Populate the verification file
  if @options[:verify]
    File.open(File.join(directory,@options[:name]+OUTPUT_VERIFICATION+@extension),'w') do |verification|
      verification.puts @result[:header_code]
      verification.puts File.read(File.join(@dir[:verify_library],'header.c'))
      verification.puts
      verification.puts @result[:verify_code]
    end
  end

  # Populate the target file (host)

  File.open(File.join(directory,@options[:name]+OUTPUT_HOST+@extension),'w') do |target|
    target.puts '#include <cuda_runtime.h>'+NL if @options[:target] == 'GPU-CUDA'
    target.puts "#define ZEROCOPY 0"+NL if @options[:zero_copy] == 0 && @options[:target] == 'CPU-OPENCL-INTEL'
    target.puts "#define ZEROCOPY 1"+NL if @options[:zero_copy] == 1 && @options[:target] == 'CPU-OPENCL-INTEL'
    target.puts @result[:header_code]
    target.puts
    target.puts @result[:host_device_mem_globals].uniq
    target.puts
    target.puts @result[:algorithm_declarations]
    target.puts @result[:host_code_lists]
    target.puts
    target.puts File.read(File.join(@dir[:common_library],GLOBAL_TIMERS+@extension))
  end

  # Populate the algorithm file (device)
  File.open(File.join(directory,@options[:name]+OUTPUT_DEVICE+@algorithm_extension),'w') do |algorithm|
    algorithm.puts @result[:device_header]
    algorithm.puts @result[:algorithm_code_lists]
  end

end
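To make the naming scheme of the output directory and files concrete, the sketch below recomputes the paths without writing anything to disk. The helper `output_paths` is hypothetical, and the suffixes `_host` and `_device` as well as the `.cu` extensions are placeholders: the actual values of `OUTPUT_HOST` and `OUTPUT_DEVICE` and the target extensions are not shown on this page.

```ruby
# Hypothetical helper (not part of Bones) that mirrors how write_output
# builds its paths. Suffix defaults are placeholders, not the real
# values of the OUTPUT_HOST and OUTPUT_DEVICE constants.
def output_paths(application, target, host_ext, device_ext,
                 host_suffix: '_host', device_suffix: '_device')
  # Directory: the input path without its extension, plus '_<target>'.
  directory = application.rpartition('.').first + '_' + target
  name = File.basename(application, '.*')
  {
    directory: directory,
    main:      File.join(directory, application.split(File::SEPARATOR).last),
    host:      File.join(directory, name + host_suffix + host_ext),
    device:    File.join(directory, name + device_suffix + device_ext)
  }
end

# Example input path and extensions are illustrative assumptions.
paths = output_paths('examples/fir.c', 'GPU-CUDA', '.cu', '.cu')
paths.each { |kind, path| puts "#{kind}: #{path}" }
```

Because the directory name is derived with `rpartition('.')`, an input path without any dot yields an empty prefix, so the directory would be named `_<target>` in that case.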
Private Instance Methods
This method takes an individual algorithm as input and generates the corresponding output code. The method first creates a search-and-replace hash, after which it instantiates a skeleton.
This method returns a message informing the user whether the code was successfully generated or the skeleton was not available.
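The idea of instantiating a skeleton through a search-and-replace hash can be sketched as follows. This is an illustration under assumptions, not the Bones implementation: the helper `instantiate`, the `<key>` marker syntax, and the skeleton text are all hypothetical, since the real `search_and_replace` helper and the skeleton file format are not shown on this page.

```ruby
# Hypothetical stand-in for the engine's search-and-replace step.
# Assumption: placeholders in a skeleton are written as <key>.
def instantiate(hash, skeleton)
  hash.reduce(skeleton) do |code, (key, value)|
    # Block form of gsub avoids special treatment of backslashes in values.
    code.gsub("<#{key}>") { value.to_s }
  end
end

# Illustrative skeleton fragment and per-array hash (names are made up).
skeleton = '<type>* device_<array> = alloc(<variable_dimensions>);'
minihash = { array: 'A', type: 'float', variable_dimensions: '128*64' }

puts instantiate(minihash, skeleton)
# prints "float* device_A = alloc(128*64);"
```

The same pattern is applied twice in the method: once with the algorithm-wide hash on the host and device skeletons, and once per array with a small hash on the memory-related skeletons.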
# File lib/bones/engine.rb, line 312
def generate(algorithm)

  # Determine the skeleton filenames and load the skeletons from the skeleton library
  file_name_host = File.join(@dir[:skeleton_library],algorithm.species.skeleton_name+SKELETON_HOST)
  file_name_device = File.join(@dir[:skeleton_library],algorithm.species.skeleton_name+SKELETON_DEVICE)
  if !File.exists?(file_name_host+@extension) || !File.exists?(file_name_device+@algorithm_extension)
    raise_error('Skeleton files for skeleton "'+algorithm.species.skeleton_name+'" not available')
  end
  skeletons = {:host   => File.read(file_name_host+@extension),
               :device => File.read(file_name_device+@algorithm_extension)}

  # Perform the transformations on the algorithm's code
  algorithm.perform_transformations(algorithm.species.settings) if !@skiptarget

  # Load the common skeletons from the skeleton library
  COMMON_FILES.each do |skeleton|
    skeletons[skeleton.to_sym] = File.read(File.join(@dir[:common_library],skeleton+@extension))
  end

  # Load the timer code from the skeleton library (only if the '--measurements' flag is given)
  TIMER_FILES.each do |skeleton|
    skeletons[skeleton.to_sym] = @options[:measurements] ? File.read(File.join(@dir[:common_library],skeleton+@extension)) : ''
  end

  # Perform search-and-replace on the device skeleton
  search_and_replace!(algorithm.hash,skeletons[:device])
  skeletons[:device].remove_extras

  # Replace mathematical functions with their equivalent device functions
  if @options[:target] == 'GPU-CUDA'
    math_functions = {:sqrt => 'sqrtf', :max => 'fmaxf', :min => 'fminf'}
    math_functions.each do |original, replacement|
      skeletons[:device].gsub!(/\b#{original}\(/,replacement+'(')
    end
  end

  # Create the algorithm declaration list from the header supplied in the skeletons
  algorithm_declaration = skeletons[:device].scan(/#{START_DEFINITION}(.+)#{END_DEFINITION}/m).join.strip.remove_extras
  @result[:algorithm_declarations].push(algorithm_declaration)

  # Remove the (commented) algorithm declaration from the code and push the skeleton to the output
  @result[:algorithm_code_lists].push(skeletons[:device].gsub!(/#{START_DEFINITION}(.+)#{END_DEFINITION}/m,''))

  # Set up some variables to create the host body function including memory allocation and memory copies
  processed = {:mem_prologue => '', :mem_copy_H2D => '', :mem_copy_D2H => '', :mem_epilogue => ''}
  counter = {:out => 0, :in => 0}

  # Iterate over all the array variables and create a mini-search-and-replace hash for each array (all arrays)
  algorithm.arrays.each_with_index do |array, arrayid|
    minihash = { :array               => array.name,
                 :type                => array.type_name,
                 :flatten             => array.flatten,
                 :variable_dimensions => array.size.join('*'),
                 :state               => @state.to_s}
    @state += 1

    # Apply the mini-search-and-replace hash to create the memory allocations, memory copies (if input only), etc.
    processed[:mem_prologue] += search_and_replace(minihash,skeletons[:mem_prologue])
    processed[:mem_copy_H2D] += search_and_replace(minihash,skeletons[:mem_copy_H2D]) if array.input? || array.species.shared?
    processed[:mem_epilogue] += search_and_replace(minihash,skeletons[:mem_epilogue])

    # Add the device declarations
    @result[:host_device_mem_globals].push(search_and_replace(minihash,skeletons[:mem_global]))
  end

  # Iterate over all the array variables and create a mini-search-and-replace hash for each array (output arrays)
  algorithm.arrays.select(OUTPUT).each_with_index do |array, num_array|
    hash = algorithm.hash["out#{num_array}".to_sym]
    minihash = { :array               => array.name,
                 :type                => array.type_name,
                 :flatten             => array.flatten,
                 :offset              => '('+hash[:dimension0][:from]+')',
                 :variable_dimensions => '('+hash[:dimensions]+')',
                 :state               => @state.to_s}
    @state += 1

    # Perform selective copy for arrays with 2 dimensions (uses a for-loop over the memory copies)
    if array.dimensions == 2 && @options[:target] == 'GPU-CUDA' && false
      x_from = '('+hash[:dimension0][:from]+')'
      x_to   = '('+hash[:dimension0][:to]+')'
      x_sum  = '('+hash[:dimension0][:sum]+')'
      x_size = array.size[0]
      y_from = '('+hash[:dimension1][:from]+')'
      y_to   = '('+hash[:dimension1][:to]+')'
      y_sum  = '('+hash[:dimension1][:sum]+')'
      y_size = array.size[1]
      processed[:mem_copy_D2H] += NL+INDENT+"for(int bones_x=#{x_from}; bones_x<=#{x_to}; bones_x++) {"+INDENT*2
      minihash[:offset] = "(bones_x*#{y_size})+#{y_from}"
      minihash[:variable_dimensions] = "#{y_sum}"
    # Don't do selective copy for multi-dimensional arrays (yet)
    elsif array.dimensions > 1
      minihash[:offset] = '0'
      minihash[:variable_dimensions] = array.size.join('*')
    end

    # Apply the mini-search-and-replace hash to create the memory copies from device to host
    processed[:mem_copy_D2H] += search_and_replace(minihash,skeletons[:mem_copy_D2H])
    if array.dimensions == 2 && @options[:target] == 'GPU-CUDA' && false
      processed[:mem_copy_D2H] += INDENT+'}'
    end
  end

  # Apply the search-and-replace hash to all timer skeletons and the host skeleton
  (['host']+TIMER_FILES).each do |skeleton|
    search_and_replace!(algorithm.hash,skeletons[skeleton.to_sym])
  end

  # Repair some invalid syntax that could have been introduced by performing the search-and-replace
  skeletons[:host].remove_extras

  # Run the prologue/epilogue code through the search-and-replace hash
  search_and_replace!(algorithm.hash,skeletons[:prologue])
  search_and_replace!(algorithm.hash,skeletons[:epilogue])

  # Construct the final host function, including the timers and memory copies
  if @scheduler
    host = skeletons[:prologue] +
           skeletons[:timer_2_start] + skeletons[:host] + skeletons[:timer_2_stop] +
           skeletons[:epilogue]
  else
    host = skeletons[:prologue] +
           skeletons[:timer_1_start] + processed[:mem_prologue] + processed[:mem_copy_H2D] +
           skeletons[:timer_2_start] + skeletons[:host] + skeletons[:timer_2_stop] +
           processed[:mem_copy_D2H] + processed[:mem_epilogue] + skeletons[:timer_1_stop] +
           skeletons[:epilogue]
  end

  # Generate code to replace the original code, including verification code if specified by the option flag
  verify_skeleton = File.read(File.join(@dir[:verify_library],'verify_results.c'))
  timer_start = (@options[:measurements]) ? File.read(File.join(@dir[:verify_library],'timer_start.c')) : ''
  timer_stop  = (@options[:measurements]) ? File.read(File.join(@dir[:verify_library],'timer_stop.c')) : ''
  replacement_code, original_definition, verify_definition = algorithm.generate_replacement_code(@options, verify_skeleton, @result[:verify_code], @prefix, timer_start, timer_stop)
  @result[:host_declarations].push(verify_definition)

  # Add a performance model to the original code
  #replacement_code.insert(0,algorithm.performance_model_code('model'))

  # Replace mallocs and frees in the original code with aligned memory allocations (only for CPU-OpenCL targets with zero-copy)
  if @options[:zero_copy] == 1 && @options[:target] == 'CPU-OPENCL-INTEL'
    @result[:original_code].search_and_replace_function_call(C::Variable.parse('malloc'),C::Variable.parse(VARIABLE_PREFIX+'malloc_128'))
    @result[:original_code].search_and_replace_function_call(C::Variable.parse('free'),C::Variable.parse(VARIABLE_PREFIX+'free_128'))
  end

  # Give the original main function a new name
  @result[:original_code].search_and_replace_function_definition('main',VARIABLE_PREFIX+'main')

  # Replace the original code with a function call to the newly generated code
  @result[:original_code].search_and_replace_node(algorithm.code,replacement_code)

  # The host code is generated, push the data to the output hashes
  accelerated_definition = 'void '+algorithm.name+'_accelerated('+algorithm.lists[:host_definition]+')'
  @result[:host_code_lists].push(@prefix+accelerated_definition+' {'+NL+host+NL+'}'+NL+NL)
  @result[:host_declarations].push(@prefix+accelerated_definition+';'+NL+@prefix+original_definition+';')
end
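For the GPU-CUDA target, the listing rewrites calls to `sqrt`, `max` and `min` in the device code into their single-precision counterparts. The standalone sketch below shows what the regular expression with the `\b` word boundary does and does not match. The wrapper `use_device_math` and the sample C statement are hypothetical; the mapping and the pattern are copied from the listing.

```ruby
# Mapping taken from the listing above.
MATH_FUNCTIONS = { sqrt: 'sqrtf', max: 'fmaxf', min: 'fminf' }.freeze

# Hypothetical wrapper (not part of Bones) around the same substitution.
# Returns a new string instead of modifying its argument in place.
def use_device_math(code)
  MATH_FUNCTIONS.reduce(code) do |result, (original, replacement)|
    # \b requires a word boundary before the name, so identifiers that
    # merely end in 'sqrt', 'max' or 'min' are left untouched.
    result.gsub(/\b#{original}\(/, replacement + '(')
  end
end

puts use_device_math('y = sqrt(max(a, b)) + my_sqrt(c) + fmin(d, e);')
# prints "y = sqrtf(fmaxf(a, b)) + my_sqrt(c) + fmin(d, e);"
```

The pattern only matches when the opening parenthesis follows the name directly, so a call written as `sqrt (x)` with a space would not be rewritten.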
# File lib/bones/engine.rb, line 468
def generate_memory(type,copy,arrays,index)

  # Find the corresponding array
  arrays.each do |array|
    if array.name == copy.name && (array.direction == copy.direction || array.direction == INOUT)

      # Load the skeleton from the skeleton library
      type += copy.direction if type == 'copy'
      skeleton = File.read(File.join(@dir[:common_library],'mem_async_'+type+@extension))

      # Create the find-and-replace hash
      minihash = { :array               => copy.name,
                   :id                  => copy.id,
                   :index               => index.to_s,
                   :direction           => copy.direction,
                   :definition          => array.definition,
                   :type                => array.type_name,
                   :flatten             => array.flatten,
                   :offset              => '0',
                   :variable_dimensions => array.size.join('*'),
                   :state               => copy.deadline}

      # Instantiate the skeleton and add it to the final result
      @result[:host_code_lists].push(search_and_replace(minihash,skeleton))

      # Add a forward declaration of this function
      @result[:host_declarations].push(copy.get_definition(array.definition,type))

      # Done
      return
    end
  end
end
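The first steps of this method, finding the array that belongs to a copy and choosing the memory skeleton, can be sketched in isolation. Everything below is an illustration under assumptions: the structs `ArrayInfo` and `CopyInfo`, the helpers `find_array` and `skeleton_basename`, and the direction strings `'in'`, `'out'` and `'inout'` are hypothetical, since the real Bones classes and the value of the `INOUT` constant are not shown on this page.

```ruby
# Placeholder value; the actual INOUT constant in Bones is not shown here.
INOUT = 'inout'

# Hypothetical minimal stand-ins for the array and copy objects.
ArrayInfo = Struct.new(:name, :direction)
CopyInfo  = Struct.new(:name, :direction)

# An array matches a copy when the names agree and the array either has
# the same direction as the copy or is both read and written.
def find_array(arrays, copy)
  arrays.find do |array|
    array.name == copy.name &&
      (array.direction == copy.direction || array.direction == INOUT)
  end
end

# Only 'copy' skeletons are specialised by direction; 'alloc' and 'free'
# use a single skeleton each.
def skeleton_basename(type, copy)
  suffix = (type == 'copy') ? copy.direction : ''
  'mem_async_' + type + suffix
end

arrays = [ArrayInfo.new('A', 'in'), ArrayInfo.new('B', 'inout')]
copy   = CopyInfo.new('B', 'out')

puts find_array(arrays, copy).name     # prints "B"
puts skeleton_basename('copy', copy)   # prints "mem_async_copyout"
puts skeleton_basename('alloc', copy)  # prints "mem_async_alloc"
```

As in the listing, only the first matching array is used, and a copy without any matching array silently produces no code.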