class Ikra::Translator::CommandTranslator::KernelLauncher

Builds the launch of the kernel. This class is responsible for generating the invocation of the kernel.

For example: kernel<<<…, …>>>(env, result, d_a, …);

Attributes

debug_free_previous_input_immediately[RW]

Debug flag only: Frees all input after launching kernel. This causes an error if data is used twice or kept (using the `keep` flag)

additional_arguments[RW]

Additional parameters that this kernel should accept (to access the result of previous kernels)

block_dim[RW]
cached_results[RW]

IDs and types of commands whose results are kept on the GPU

grid_dim[RW]

Block/grid dimensions (should be 1D)

kernel_builder[RW]
kernel_result_var_name[R]

Pointer to the resulting array (device memory)

num_threads[RW]

Number of threads (elements to be processed)

previous_kernel_input[RW]

Additional parameters that this kernel should accept (to access the result of previous kernels)

previously_cached_results[R]

IDs and types of commands that were previously computed and shall now be used in this kernel as input

reuse_memory[RW]

Whether the launch allocates new memory beforehand or uses previous memory

Public Class Methods

new(kernel_builder) click to toggle source
# File lib/translator/kernel_launcher/kernel_launcher.rb, line 46
def initialize(kernel_builder)
    @kernel_builder = kernel_builder
    @additional_arguments = []
    @previous_kernel_input = []
    @reuse_memory = false
    @kernel_result_var_name = "_kernel_result_" + CommandTranslator.next_unique_id.to_s
    @cached_results = {}
    @previously_cached_results = {}
end

Public Instance Methods

add_additional_arguments(*arguments) click to toggle source

Add additional arguments to the kernel function that might be needed for some computations

# File lib/translator/kernel_launcher/kernel_launcher.rb, line 101
def add_additional_arguments(*arguments)
    @additional_arguments.push(*arguments)
end
add_cached_result(result_id, type) click to toggle source

Adds command whose result will be kept on GPU

# File lib/translator/kernel_launcher/kernel_launcher.rb, line 78
def add_cached_result(result_id, type)
    @cached_results[result_id] = type
end
add_previous_kernel_parameter(parameter) click to toggle source
# File lib/translator/kernel_launcher/kernel_launcher.rb, line 96
def add_previous_kernel_parameter(parameter)
    kernel_builder.add_previous_kernel_parameter(parameter)
end
assert_ready_to_build() click to toggle source
# File lib/translator/kernel_launcher/kernel_launcher.rb, line 139
def assert_ready_to_build
    required_values = [:num_threads, :grid_dim, :block_dim]

    for selector in required_values
        if send(selector) == nil
            raise AssertionError.new(
                "Not ready to build (KernelBuilder): #{selector} is not set")
        end
    end
end
build_device_memory_free() click to toggle source
# File lib/translator/kernel_launcher/kernel_launcher.rb, line 229
def build_device_memory_free
    Log.info("Building kernel post-launch CUDA free")

    assert_ready_to_build

    if KernelLauncher.debug_free_previous_input_immediately == true
        Log.warn("Debug flag set... Freeing input memory immediately and some memory not at all!")
        return ""
    end

    return Translator.read_file(file_name: "free_device_memory.cpp", replacements: {
        "name" => kernel_result_var_name})
end
build_device_memory_free_in_host_section() click to toggle source

Same as above, but also removes item from the list of allocated memory chunks.

# File lib/translator/kernel_launcher/kernel_launcher.rb, line 244
def build_device_memory_free_in_host_section
    Log.info("Building kernel post-launch CUDA free (host section")

    assert_ready_to_build

    return Translator.read_file(file_name: "host_section_free_device_memory.cpp", replacements: {
        "name" => kernel_result_var_name})
end
build_kernel_launcher() click to toggle source

Build the code that launches this kernel. The generated code performs the following steps:

  1. Allocate device memory for the result.

  2. If result should be written back: Allocate host memory for the result.

  3. Launch the kernel (+ error checking, synchronization)

  4. If result should be written back: Copy result back to host memory.

# File lib/translator/kernel_launcher/kernel_launcher.rb, line 158
def build_kernel_launcher
    
    Log.info("Building kernel launcher")

    assert_ready_to_build

    result = ""
    if !reuse_memory
        # Allocate device memory for kernel result
        result = result + Translator.read_file(file_name: "allocate_device_memory.cpp", replacements: {
            "name" => kernel_result_var_name,
            "bytes" => "(sizeof(#{kernel_builder.result_type.to_c_type}) * #{num_threads})",
            "type" => kernel_builder.result_type.to_c_type})
    end

    previously_cached_results.each do |result_id, type|
        result = result + "    #{type.to_c_type} *prev_" + result_id.to_s + " = (#{type.to_c_type} *) " + Constants::ENV_HOST_IDENTIFIER + "->prev_" + result_id.to_s + ";\n"
    end 

    # Allocate device memory for cached results
    cached_results.each do |result_id, type|
        result = result + Translator.read_file(file_name: "allocate_device_memory.cpp", replacements: {
            "name" => Constants::RESULT_IDENTIFIER + result_id,
            "bytes" => "(#{type.c_size} * #{num_threads})",
            "type" => type.to_c_type})
    end

    # Build arguments
    a_env = Constants::ENV_DEVICE_IDENTIFIER
    a_result = kernel_result_var_name

    previous_kernel_args = []
    for var in kernel_builder.previous_kernel_input
        previous_kernel_args.push(var.name.to_s)
    end

    a_cached_results = cached_results.map do |result_id, type|
        Constants::RESULT_IDENTIFIER + result_id
    end

    if reuse_memory
        previous_kernel_args[0] = a_result
    end

    arguments = ([a_env, num_threads, a_result] + a_cached_results + previous_kernel_args + additional_arguments).join(", ")

    # Launch kernel
    result = result + Translator.read_file(file_name: "launch_kernel.cpp", replacements: {
        "kernel_name" => kernel_builder.kernel_name,
        "arguments" => arguments,
        "grid_dim" => grid_dim,
        "block_dim" => block_dim})

    # ---- DEBUG ONLY: Free input after computation so that we can process larger
    #                  data sets in benchmarks without running out of memory
    # TODO: Implement analysis and do this automatically
    if KernelLauncher.debug_free_previous_input_immediately == true
        for var in kernel_builder.previous_kernel_input
            result = result + Translator.read_file(file_name: "free_device_memory.cpp", replacements: {
                "name" => var.name.to_s})
        end 
    end
    # ---- END DEBUG ONLY

    cached_results.each do |result_id, type|
        result = result + "    " + Constants::ENV_HOST_IDENTIFIER + "->prev_" + result_id + " = " + Constants::RESULT_IDENTIFIER + result_id + ";\n"
    end

    return result
end
configure_grid(size, block_size: 256) click to toggle source

Configures grid size and block size. Also sets number of threads.

# File lib/translator/kernel_launcher/kernel_launcher.rb, line 117
def configure_grid(size, block_size: 256)
    if block_size == nil
        block_size = 256
    end
    
    if size.is_a?(Fixnum)
        # Precompute constants
        @grid_dim = [size.fdiv(block_size).ceil, 1].max.to_s
        @block_dim = (size >= block_size ? block_size : size).to_s
        @num_threads = size
    else
        if !size.is_a?(String)
            raise AssertionError.new("Fixnum or String expected")
        end

        # Source code string determines the size
        @grid_dim = "max((int) ceil(((float) #{size}) / #{block_size}), 1)"
        @block_dim = "(#{size} >= #{block_size} ? #{block_size} : #{size})"
        @num_threads = size
    end
end
kernel_builders() click to toggle source
# File lib/translator/kernel_launcher/kernel_launcher.rb, line 70
def kernel_builders
    # The program builder accesses kernel builders via kernel launchers through
    # this method, because some specialized launchers might have multiple kernel
    # builders.
    return [kernel_builder]
end
prepare_additional_args_for_launch(command) click to toggle source

Some of the values stored in `@additional_arguments` might be blocks, because not all information was known when adding something to that list. This method replaces those blocks (evaluates them) with actual strings, based on the command that is being launched.

# File lib/translator/kernel_launcher/kernel_launcher.rb, line 60
def prepare_additional_args_for_launch(command)
    @additional_arguments = @additional_arguments.map do |arg|
        if arg.is_a?(String)
            arg
        else
            arg.call(command)
        end
    end
end
result_size() click to toggle source

The size of the result array is the number of threads.

# File lib/translator/kernel_launcher/kernel_launcher.rb, line 112
def result_size
    return num_threads
end
result_type() click to toggle source

The result type of this kernel launcher. Same as the result type of its kernel builder.

# File lib/translator/kernel_launcher/kernel_launcher.rb, line 107
def result_type
    return kernel_builder.result_type
end
reuse_memory!(parameter_name) click to toggle source
# File lib/translator/kernel_launcher/kernel_launcher.rb, line 87
def reuse_memory!(parameter_name)
    @reuse_memory = true
    @kernel_result_var_name = parameter_name
end
reuse_memory?() click to toggle source
# File lib/translator/kernel_launcher/kernel_launcher.rb, line 92
def reuse_memory?
    return @reuse_memory
end
use_cached_result(result_id, type) click to toggle source

Adds a previously computed result which will be used in this launche as input

# File lib/translator/kernel_launcher/kernel_launcher.rb, line 83
def use_cached_result(result_id, type)
    @previously_cached_results[result_id] = type
end