Halide  17.0.2
Halide compiler and libraries
Schedule.h
1 #ifndef HALIDE_SCHEDULE_H
2 #define HALIDE_SCHEDULE_H
3 
4 /** \file
5  * Defines the internal representation of the schedule for a function
6  */
7 
8 #include <map>
9 #include <string>
10 #include <utility>
11 #include <vector>
12 
13 #include "DeviceAPI.h"
14 #include "Expr.h"
15 #include "FunctionPtr.h"
17 #include "Parameter.h"
18 #include "PrefetchDirective.h"
19 
20 namespace Halide {
21 
22 class Func;
23 struct VarOrRVar;
24 
25 namespace Internal {
26 class Function;
27 struct FunctionContents;
28 struct LoopLevelContents;
29 } // namespace Internal
30 
31 /** Different ways to handle a tail case in a split when the
32  * factor does not provably divide the extent. */
33 enum class TailStrategy {
34  /** Round up the extent to be a multiple of the split
35  * factor. Not legal for RVars, as it would change the meaning
36  * of the algorithm. Pros: generates the simplest, fastest
37  * code. Cons: if used on a stage that reads from the input or
38  * writes to the output, constrains the input or output size
39  * to be a multiple of the split factor. */
40  RoundUp,
41 
42  /** Guard the inner loop with an if statement that prevents
43  * evaluation beyond the original extent. Always legal. The if
44  * statement is treated like a boundary condition, and
45  * factored out into a loop epilogue if possible. Pros: no
46  * redundant re-evaluation; does not constrain input or
47  * output sizes. Cons: increases code size due to separate
48  * tail-case handling; vectorization will scalarize in the tail
49  * case to handle the if statement. */
50  GuardWithIf,
51 
52  /** Guard the loads and stores in the loop with an if statement
53  * that prevents evaluation beyond the original extent. Always
54  * legal. The if statement is treated like a boundary condition,
55  * and factored out into a loop epilogue if possible.
56  * Pros: no redundant re-evaluation; does not constrain input or
57  * output sizes. Cons: increases code size due to separate
58  * tail-case handling. */
59  Predicate,
60 
61  /** Guard the loads in the loop with an if statement that
62  * prevents evaluation beyond the original extent. Only legal
63  * for innermost splits. Not legal for RVars, as it would change
64  * the meaning of the algorithm. The if statement is treated like
65  * a boundary condition, and factored out into a loop epilogue if
66  * possible.
67  * Pros: does not constrain input sizes, output size constraints
68  * are simpler than full predication. Cons: increases code size
69  * due to separate tail-case handling, constrains the output size
70  * to be a multiple of the split factor. */
71  PredicateLoads,
72 
73  /** Guard the stores in the loop with an if statement that
74  * prevents evaluation beyond the original extent. Only legal
75  * for innermost splits. Not legal for RVars, as it would change
76  * the meaning of the algorithm. The if statement is treated like
77  * a boundary condition, and factored out into a loop epilogue if
78  * possible.
79  * Pros: does not constrain output sizes, input size constraints
80  * are simpler than full predication. Cons: increases code size
81  * due to separate tail-case handling, constrains the input size
82  * to be a multiple of the split factor. */
83  PredicateStores,
84 
85  /** Prevent evaluation beyond the original extent by shifting
86  * the tail case inwards, re-evaluating some points near the
87  * end. Only legal for pure variables in pure definitions. If
88  * the inner loop is very simple, the tail case is treated
89  * like a boundary condition and factored out into an
90  * epilogue.
91  *
92  * This is a good trade-off between several factors. Like
93  * RoundUp, it supports vectorization well, because the inner
94  * loop is always a fixed size with no data-dependent
95  * branching. It increases code size slightly for inner loops
96  * due to the epilogue handling, but not for outer loops
97  * (e.g. loops over tiles). If used on a stage that reads from
98  * an input or writes to an output, this strategy only requires
99  * that the input/output extent be at least the split factor,
100  * instead of a multiple of the split factor as with RoundUp. */
101  ShiftInwards,
102 
103  /** Equivalent to ShiftInwards, but protects values that would be
104  * re-evaluated by loading the memory location that would be stored to,
105  * modifying only the elements not contained within the overlap, and then
106  * storing the blended result.
107  *
108  * This tail strategy is useful when you want to use ShiftInwards to
109  * vectorize without a scalar tail, but are scheduling a stage where that
110  * isn't legal (e.g. an update definition).
111  *
112  * Because this is a read-modify-write, this tail strategy cannot be
113  * used on any dimension the stage is parallelized over as it would cause a
114  * race condition.
115  */
116  ShiftInwardsAndBlend,
117 
118  /** Equivalent to RoundUp, but protects values that would be written beyond
119  * the end by loading the memory location that would be stored to,
120  * modifying only the elements within the region being computed, and then
121  * storing the blended result.
122  *
123  * This tail strategy is useful when vectorizing an update to some sub-region
124  * of a larger Func. As with ShiftInwardsAndBlend, it can't be combined with
125  * parallelism.
126  */
127  RoundUpAndBlend,
128 
129  /** For pure definitions use ShiftInwards. For pure vars in
130  * update definitions use RoundUp. For RVars in update
131  * definitions use GuardWithIf. */
132  Auto
133 };
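The trade-offs above can be made concrete with a small standalone sketch. This is plain C++, not Halide code: the `Tail` enum and `iterations` helper are hypothetical models of which loop iterations a split would visit under three of these strategies.

```cpp
#include <vector>

// Hypothetical model (not Halide's implementation) of a loop of size
// `extent` split by `factor`, showing which iterations each strategy visits.
enum class Tail { RoundUp, GuardWithIf, ShiftInwards };

std::vector<int> iterations(int extent, int factor, Tail t) {
    std::vector<int> visited;
    int tiles = (extent + factor - 1) / factor;  // ceil(extent / factor)
    for (int outer = 0; outer < tiles; outer++) {
        int base = outer * factor;
        if (t == Tail::ShiftInwards && base + factor > extent) {
            base = extent - factor;  // shift the last tile inwards (needs extent >= factor)
        }
        for (int inner = 0; inner < factor; inner++) {
            int i = base + inner;
            if (t == Tail::GuardWithIf && i >= extent) {
                continue;  // the guarding if statement skips the overrun
            }
            visited.push_back(i);
        }
    }
    return visited;
}
```

For extent 10 and factor 4: RoundUp visits 0..11, evaluating past the extent; GuardWithIf visits exactly 0..9; ShiftInwards visits 0..7 and then 6..9, re-evaluating points 6 and 7 but never passing the extent.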
134 
135 /** Different ways to handle the case when the start/end of the loops of stages
136  * computed with (fused) are not aligned. */
137 enum class LoopAlignStrategy {
138  /** Shift the start of the fused loops to align. */
139  AlignStart,
140 
141  /** Shift the end of the fused loops to align. */
142  AlignEnd,
143 
144  /** compute_with will make no attempt to align the start/end of the
145  * fused loops. */
146  NoAlign,
147 
148  /** By default, LoopAlignStrategy is set to NoAlign. */
149  Auto
150 };
151 
152 /** A reference to a site in a Halide statement at the top of the
153  * body of a particular for loop. Evaluating a region of a Halide
154  * function is done by generating a loop nest that spans its
155  * dimensions. We schedule the inputs to that function by
156  * recursively injecting realizations for them at particular sites
157  * in this loop nest. A LoopLevel identifies such a site. The site
158  * can either be a loop nest within all stages of a function
159  * or it can refer to a loop nest within a particular function's
160  * stage (initial definition or updates).
161  *
162  * Note that a LoopLevel is essentially a pointer to an underlying value;
163  * all copies of a LoopLevel refer to the same site, so mutating one copy
164  * (via the set() method) will effectively mutate all copies:
165  \code
166  Func f;
167  Var x;
168  LoopLevel a(f, x);
169  // Both a and b refer to LoopLevel(f, x)
170  LoopLevel b = a;
171  // Now both a and b refer to LoopLevel::root()
172  a.set(LoopLevel::root());
173  \endcode
174  * This is quite useful when splitting Halide code into utility libraries, as it allows
175  * a library to schedule code according to a caller's specifications, even if the caller
176  * hasn't fully defined its pipeline yet:
177  \code
178  Func demosaic(Func input,
179  LoopLevel intermed_compute_at,
180  LoopLevel intermed_store_at,
181  LoopLevel output_compute_at) {
182  Func intermed = ...;
183  Func output = ...;
184  intermed.compute_at(intermed_compute_at).store_at(intermed_store_at);
185  output.compute_at(output_compute_at);
186  return output;
187  }
188 
189  void process() {
190  // Note that these LoopLevels are all undefined when we pass them to demosaic()
191  LoopLevel intermed_compute_at, intermed_store_at, output_compute_at;
192  Func input = ...;
193  Func demosaiced = demosaic(input, intermed_compute_at, intermed_store_at, output_compute_at);
194  Func output = ...;
195 
196  // We need to ensure all LoopLevels have a well-defined value prior to lowering:
197  intermed_compute_at.set(LoopLevel(output, y));
198  intermed_store_at.set(LoopLevel(output, y));
199  output_compute_at.set(LoopLevel(output, x));
200  }
201  \endcode
202  */
203 class LoopLevel {
204  Internal::IntrusivePtr<Internal::LoopLevelContents> contents;
205 
206  explicit LoopLevel(Internal::IntrusivePtr<Internal::LoopLevelContents> c)
207  : contents(std::move(c)) {
208  }
209 
210 public:
211  /** Return the index of the function stage associated with this loop level.
212  * Asserts if undefined */
213  int stage_index() const;
214 
215  /** Identify the loop nest corresponding to some dimension of some function */
216  // @{
217  LoopLevel(const Internal::Function &f, const VarOrRVar &v, int stage_index = -1);
218  LoopLevel(const Func &f, const VarOrRVar &v, int stage_index = -1);
219  // @}
220 
221  /** Construct an undefined LoopLevel. Calling any method on an undefined
222  * LoopLevel (other than set()) will assert. */
223  LoopLevel();
224 
225  /** For deserialization only. */
226  LoopLevel(const std::string &func_name, const std::string &var_name,
227  bool is_rvar, int stage_index, bool locked = false);
228 
229  /** Construct a special LoopLevel value that implies
230  * that a function should be inlined away. */
231  static LoopLevel inlined();
232 
233  /** Construct a special LoopLevel value which represents the
234  * location outside of all for loops. */
235  static LoopLevel root();
236 
237  /** Mutate our contents to match the contents of 'other'. */
238  void set(const LoopLevel &other);
239 
240  // All the public methods below this point are meant only for internal
241  // use by Halide, rather than user code; hence, they are deliberately
242  // documented with plain comments (rather than Doxygen) to avoid being
243  // present in user documentation.
244 
245  // Lock this LoopLevel.
246  LoopLevel &lock();
247 
248  // Return the Func name. Asserts if the LoopLevel is_root() or is_inlined() or !defined().
249  std::string func() const;
250 
251  // Return the VarOrRVar. Asserts if the LoopLevel is_root() or is_inlined() or !defined().
252  VarOrRVar var() const;
253 
254  // Return true iff the LoopLevel is defined. (Only LoopLevels created
255  // with the default ctor are undefined.)
256  bool defined() const;
257 
258  // Test if a loop level corresponds to inlining the function.
259  bool is_inlined() const;
260 
261  // Test if a loop level is 'root', which describes the site
262  // outside of all for loops.
263  bool is_root() const;
264 
265  // For serialization only. Do not use in other cases.
266  int get_stage_index() const;
267 
268  // For serialization only. Do not use in other cases.
269  std::string func_name() const;
270 
271  // For serialization only. Do not use in other cases.
272  std::string var_name() const;
273 
274  // For serialization only. Do not use in other cases.
275  bool is_rvar() const;
276 
277  // For serialization only. Do not use in other cases.
278  bool locked() const;
279 
280  // Return a string of the form func.var -- note that this is safe
281  // to call for root or inline LoopLevels, but asserts if !defined().
282  std::string to_string() const;
283 
284  // Compare this loop level against the variable name of a for
285  // loop, to see if this loop level refers to the site
286  // immediately inside this loop. Asserts if !defined().
287  bool match(const std::string &loop) const;
288 
289  bool match(const LoopLevel &other) const;
290 
291  // Check if two loop levels are exactly the same.
292  bool operator==(const LoopLevel &other) const;
293 
294  bool operator!=(const LoopLevel &other) const {
295  return !(*this == other);
296  }
297 
298 private:
299  void check_defined() const;
300  void check_locked() const;
301  void check_defined_and_locked() const;
302 };
303 
304 struct FuseLoopLevel {
305  LoopLevel level;
306  /** Contains alignment strategies for the fused dimensions (indexed by the
307  * dimension name). If not in the map, use the default alignment strategy
308  * to align the fused dimension (see \ref LoopAlignStrategy::Auto).
309  */
310  std::map<std::string, LoopAlignStrategy> align;
311 
312  FuseLoopLevel()
313  : level(LoopLevel::inlined().lock()) {
314  }
315  FuseLoopLevel(const LoopLevel &level, const std::map<std::string, LoopAlignStrategy> &align)
316  : level(level), align(align) {
317  }
318 };
319 
320 namespace Internal {
321 
322 class IRMutator;
323 struct ReductionVariable;
324 
325 struct Split {
326  std::string old_var, outer, inner;
327  Expr factor;
328  bool exact; // Is it required that the factor divides the extent
329  // of the old var. True for splits of RVars. Forces
330  // tail strategy to be GuardWithIf.
331  TailStrategy tail;
332 
333  enum SplitType { SplitVar = 0,
334  RenameVar,
335  PurifyRVar,
336  FuseVars };
337 
338  // If split_type is Rename, then this is just a renaming of the
339  // old_var to the outer and not a split. The inner var should
340  // be ignored, and factor should be one. Renames are kept in
341  // the same list as splits so that ordering between them is
342  // respected.
343 
344  // If split type is Purify, this replaces the old_var RVar to
345  // the outer Var. The inner var should be ignored, and factor
346  // should be one.
347 
348  // If split_type is Fuse, then this does the opposite of a
349  // split, it joins the outer and inner into the old_var.
350  SplitType split_type;
351 
352  bool is_rename() const {
353  return split_type == RenameVar;
354  }
355  bool is_split() const {
356  return split_type == SplitVar;
357  }
358  bool is_fuse() const {
359  return split_type == FuseVars;
360  }
361  bool is_purify() const {
362  return split_type == PurifyRVar;
363  }
364 };
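The index bookkeeping a Split encodes can be sketched in plain C++ (hypothetical helpers, not Halide's API): a SplitVar relates the old loop variable to the new pair as old = outer * factor + inner, and FuseVars is the inverse mapping.

```cpp
#include <utility>

// Hypothetical sketch of the SplitVar index relationship: reconstruct
// old_var from the (outer, inner) pair produced by a split by `factor`.
int split_index(int outer, int inner, int factor) {
    return outer * factor + inner;
}

// FuseVars is the inverse: recover (outer, inner) from the fused index.
std::pair<int, int> fuse_index(int old_var, int factor) {
    return {old_var / factor, old_var % factor};
}
```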
365 
366 /** Each Dim below has a dim_type, which tells you what
367  * transformations are legal on it. When you combine two Dims of
368  * distinct DimTypes (e.g. with Stage::fuse), the combined result has
369  * the greater enum value of the two types. */
370 enum class DimType {
371  /** This dim originated from a Var. You can evaluate a Func at
372  * distinct values of this Var in any order over an interval
373  * that's at least as large as the interval required. In pure
374  * definitions you can even redundantly re-evaluate points. */
375  PureVar = 0,
376 
377  /** The dim originated from an RVar. You can evaluate a Func at
378  * distinct values of this RVar in any order (including in
379  * parallel) over exactly the interval specified in the
380  * RDom. PureRVars can also be reordered arbitrarily in the dims
381  * list, as there are no data hazards between the evaluation of
382  * the Func at distinct values of the RVar.
383  *
384  * The most common case where an RVar is considered pure is RVars
385  * that are used in a way which obeys all the syntactic
386  * constraints that a Var does, e.g:
387  *
388  \code
389  RDom r(0, 100);
390  f(r.x) = f(r.x) + 5;
391  \endcode
392  *
393  * Other cases where RVars are pure are where the sites being
394  * written to by the Func evaluated at one value of the RVar
395  * couldn't possibly collide with the sites being written or read
396  * by the Func at a distinct value of the RVar. For example, r.x
397  * is pure in the following three definitions:
398  *
399  \code
400 
401  // This definition writes to even coordinates and reads from the
402  // same site (which no other value of r.x is writing to) and odd
403  // sites (which no other value of r.x is writing to):
404  f(2*r.x) = max(f(2*r.x), f(2*r.x + 7));
405 
406  // This definition writes to scanline zero and reads from the
407  // same site and scanline one:
408  f(r.x, 0) += f(r.x, 1);
409 
410  // This definition reads and writes over non-overlapping ranges:
411  f(r.x + 100) += f(r.x);
412  \endcode
413  *
414  * To give two counterexamples, r.x is not pure in the following
415  * definitions:
416  *
417  \code
418  // The same site is written by distinct values of the RVar
419  // (write-after-write hazard):
420  f(r.x / 2) += f(r.x);
421 
422  // One value of r.x reads from a site that another value of r.x
423  // is writing to (read-after-write hazard):
424  f(r.x) += f(r.x + 1);
425  \endcode
426  */
427  PureRVar,
428 
429  /** The dim originated from an RVar. You must evaluate a Func at
430  * distinct values of this RVar in increasing order over precisely
431  * the interval specified in the RDom. ImpureRVars may not be
432  * reordered with respect to other ImpureRVars.
433  *
434  * All RVars are impure by default. Those for which we can prove
435  * no data hazards exist get promoted to PureRVar. There are two
436  * instances in which ImpureRVars may be parallelized or reordered
437  * even in the presence of hazards:
438  *
439  * 1) In the case of an update definition that has been proven to be
440  * an associative and commutative reduction, reordering of
441  * ImpureRVars is allowed, and parallelizing them is allowed if
442  * the update has been made atomic.
443  *
444  * 2) ImpureRVars can also be reordered and parallelized if
445  * Func::allow_race_conditions() has been set. This is the escape
446  * hatch for when there are no hazards but the checks above failed
447  * to prove that (RDom::where can encode arbitrary facts about
448  * non-linear integer arithmetic, which is undecidable), or for
449  * when you don't actually care about the non-determinism
450  * introduced by data hazards (e.g. in the algorithm HOGWILD!).
451  */
452  ImpureRVar,
453 };
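The purity analysis described above can be illustrated with a tiny standalone check (plain C++, not Halide's actual analysis): an RVar can only be treated as pure if distinct values of it never write to the same site.

```cpp
#include <set>

// Hypothetical illustration: does any site get written by two distinct
// values of r over [r_min, r_min + r_extent)? If so, there is a
// write-after-write hazard and r cannot be considered pure.
bool writes_collide(int (*site)(int), int r_min, int r_extent) {
    std::set<int> seen;
    for (int r = r_min; r < r_min + r_extent; r++) {
        if (!seen.insert(site(r)).second) {
            return true;  // the same site was written twice
        }
    }
    return false;
}
```

Mirroring the examples above: `f(2*r.x)` writes a distinct even site for each value of r.x, so there is no collision, while `f(r.x / 2)` maps r.x = 0 and r.x = 1 to the same site.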
454 
455 /** The Dim struct represents one loop in the schedule's
456  * representation of a loop nest. */
457 struct Dim {
458  /** Name of the loop variable */
459  std::string var;
460 
461  /** How are the loop values traversed (e.g. unrolled, vectorized, parallel) */
462  ForType for_type;
463 
464  /** On what device does the body of the loop execute (e.g. Host, GPU, Hexagon) */
465  DeviceAPI device_api;
466 
467  /** The DimType tells us what transformations are legal on this
468  * loop (see the DimType enum above). */
469  DimType dim_type;
470 
471  /** The strategy for loop partitioning. */
472  Partition partition_policy;
473 
474  /** Can this loop be evaluated in any order (including in
475  * parallel)? Equivalently, are there no data hazards between
476  * evaluations of the Func at distinct values of this var? */
477  bool is_pure() const {
478  return (dim_type == DimType::PureVar) || (dim_type == DimType::PureRVar);
479  }
480 
481  /** Did this loop originate from an RVar (in which case the bounds
482  * of the loops are algorithmically meaningful)? */
483  bool is_rvar() const {
484  return (dim_type == DimType::PureRVar) || (dim_type == DimType::ImpureRVar);
485  }
486 
487  /** Could multiple iterations of this loop happen at the same
488  * time, with reads and writes interleaved in arbitrary ways
489  * according to the memory model of the underlying compiler and
490  * machine? */
491  bool is_unordered_parallel() const {
492  return Halide::Internal::is_unordered_parallel(for_type);
493  }
494 
495  /** Could multiple iterations of this loop happen at the same
496  * time? Vectorized and GPULanes loop types are parallel but not
497  * unordered, because the loop iterations proceed together in
498  * lockstep with some well-defined outcome if there are hazards. */
499  bool is_parallel() const {
500  return Halide::Internal::is_parallel(for_type);
501  }
502 };
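The ordered/unordered distinction drawn above can be sketched with a simplified model (plain C++; real Halide has more ForTypes, e.g. GPUBlock and GPUThread, which are also unordered-parallel): Parallel loops are unordered, while Vectorized and GPULanes loops are parallel but proceed in lockstep, so they are ordered.

```cpp
// Simplified model of the predicates above (not Halide's actual helpers).
enum class ForType { Serial, Unrolled, Parallel, Vectorized, GPULanes };

// Iterations may interleave arbitrarily: only plain Parallel in this model.
bool model_is_unordered_parallel(ForType t) {
    return t == ForType::Parallel;
}

// Iterations may run at the same time, ordered or not.
bool model_is_parallel(ForType t) {
    return t == ForType::Parallel || t == ForType::Vectorized ||
           t == ForType::GPULanes;
}
```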
503 
504 /** A bound on a loop, typically from Func::bound */
505 struct Bound {
506  /** The loop var being bounded */
507  std::string var;
508 
509  /** Declared min and extent of the loop. min may be undefined if
510  * Func::bound_extent was used. */
511  Expr min, extent;
512 
513  /** If defined, the number of iterations will be a multiple of
514  * "modulus", and the first iteration will be at a value congruent
515  * to "remainder" modulo "modulus". Set by Func::align_bounds and
516  * Func::align_extent. */
517  Expr modulus, remainder;
518 };
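The modulus/remainder constraint can be sketched numerically. This is an assumed reading of the comment above, not Halide's implementation: widen the bound so that the min becomes congruent to `remainder` modulo `modulus` and the extent becomes a multiple of `modulus`.

```cpp
// Hypothetical sketch: expand [min, min + extent) so that
// min % modulus == remainder and extent is a multiple of modulus,
// while still covering the original interval.
void align_bound(int &min, int &extent, int modulus, int remainder) {
    int max = min + extent - 1;
    int shift = ((min - remainder) % modulus + modulus) % modulus;
    min -= shift;  // min is now congruent to remainder mod modulus
    int new_extent = max - min + 1;
    extent = ((new_extent + modulus - 1) / modulus) * modulus;  // round up
}
```

For example, aligning min = 3, extent = 10 to modulus 4, remainder 0 shifts the min down to 0 and rounds the extent up to 16, so the aligned interval still contains the original one.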
519 
520 /** Properties of one axis of the storage of a Func */
521 struct StorageDim {
522  /** The var in the pure definition corresponding to this axis */
523  std::string var;
524 
525  /** The bounds allocated (not computed) must be a multiple of
526  * "alignment". Set by Func::align_storage. */
527  Expr alignment;
528 
529  /** The bounds allocated (not computed). Set by Func::bound_storage. */
530  Expr bound;
531 
532  /** If the Func is explicitly folded along this axis (with
533  * Func::fold_storage) this gives the extent of the circular
534  * buffer used, and whether it is used in increasing order
535  * (fold_forward = true) or decreasing order (fold_forward =
536  * false). */
537  Expr fold_factor;
538  bool fold_forward;
539 };
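Storage folding as described above amounts to circular addressing along the folded axis. A sketch under that assumption (plain C++, not Halide internals):

```cpp
// Hypothetical sketch: with fold_factor F, coordinate x along the folded
// axis lands in slot x mod F, so at most F consecutive values are live
// in the circular buffer at any moment.
int folded_slot(int x, int fold_factor) {
    // Non-negative modulo, so negative coordinates also map into [0, F).
    return ((x % fold_factor) + fold_factor) % fold_factor;
}
```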
540 
541 /** This represents two stages with fused loop nests from outermost to
542  * a specific loop level. The loops to compute func_1(stage_1) are
543  * fused with the loops to compute func_2(stage_2) from outermost to
544  * loop level var_name and the computation from stage_1 of func_1
545  * occurs first.
546  */
547 struct FusedPair {
548  std::string func_1;
549  std::string func_2;
550  size_t stage_1;
551  size_t stage_2;
552  std::string var_name;
553 
554  FusedPair() = default;
555  FusedPair(const std::string &f1, size_t s1, const std::string &f2,
556  size_t s2, const std::string &var)
557  : func_1(f1), func_2(f2), stage_1(s1), stage_2(s2), var_name(var) {
558  }
559 
560  bool operator==(const FusedPair &other) const {
561  return (func_1 == other.func_1) && (func_2 == other.func_2) &&
562  (stage_1 == other.stage_1) && (stage_2 == other.stage_2) &&
563  (var_name == other.var_name);
564  }
565  bool operator<(const FusedPair &other) const {
566  if (func_1 != other.func_1) {
567  return func_1 < other.func_1;
568  }
569  if (func_2 != other.func_2) {
570  return func_2 < other.func_2;
571  }
572  if (var_name != other.var_name) {
573  return var_name < other.var_name;
574  }
575  if (stage_1 != other.stage_1) {
576  return stage_1 < other.stage_1;
577  }
578  return stage_2 < other.stage_2;
579  }
580 };
581 
582 struct FuncScheduleContents;
583 struct StageScheduleContents;
584 struct FunctionContents;
585 
586 /** A schedule for a Function of a Halide pipeline. This schedule is
587  * applied to all stages of the Function. Right now this interface is
588  * basically a struct, offering mutable access to its innards.
589  * In the future it may become more encapsulated. */
590 class FuncSchedule {
591  IntrusivePtr<FuncScheduleContents> contents;
592 
593 public:
594  FuncSchedule(IntrusivePtr<FuncScheduleContents> c)
595  : contents(std::move(c)) {
596  }
597  FuncSchedule(const FuncSchedule &other) = default;
598  FuncSchedule();
599 
600  /** Return a deep copy of this FuncSchedule. It recursively deep copies all
601  * called functions, schedules, specializations, and reduction domains. This
602  * method takes a map of <old FunctionContents, deep-copied version> as input
603  * and would use the deep-copied FunctionContents from the map if exists
604  * instead of creating a new deep-copy to avoid creating deep-copies of the
605  * same FunctionContents multiple times.
606  */
607  FuncSchedule deep_copy(
608  std::map<FunctionPtr, FunctionPtr> &copied_map) const;
609 
610  /** This flag is set to true if the schedule is memoized. */
611  // @{
612  bool &memoized();
613  bool memoized() const;
614  // @}
615 
616  /** This flag is set to true if the schedule is memoized and has an attached
617  * eviction key. */
618  // @{
619  Expr &memoize_eviction_key();
620  Expr memoize_eviction_key() const;
621  // @}
622 
623  /** Is the production of this Function done asynchronously */
624  bool &async();
625  bool async() const;
626 
627  /** The list and order of dimensions used to store this
628  * function. The first dimension in the vector corresponds to the
629  * innermost dimension for storage (i.e. which dimension is
630  * tightly packed in memory) */
631  // @{
632  const std::vector<StorageDim> &storage_dims() const;
633  std::vector<StorageDim> &storage_dims();
634  // @}
635 
636  /** The memory type (heap/stack/shared/etc) used to back this Func. */
637  // @{
638  MemoryType memory_type() const;
639  MemoryType &memory_type();
640  // @}
641 
642  /** You may explicitly bound some of the dimensions of a function,
643  * or constrain them to lie on multiples of a given factor. See
644  * \ref Func::bound and \ref Func::align_bounds and \ref Func::align_extent. */
645  // @{
646  const std::vector<Bound> &bounds() const;
647  std::vector<Bound> &bounds();
648  // @}
649 
650  /** You may explicitly specify an estimate of some of the function
651  * dimensions. See \ref Func::set_estimate */
652  // @{
653  const std::vector<Bound> &estimates() const;
654  std::vector<Bound> &estimates();
655  // @}
656 
657  /** Mark calls of a function by 'f' to be replaced with its identity
658  * wrapper or clone during the lowering stage. If the string 'f' is empty,
659  * it means replace all calls to the function by all other functions
660  * (excluding itself) in the pipeline with the global identity wrapper.
661  * See \ref Func::in and \ref Func::clone_in for more details. */
662  // @{
663  const std::map<std::string, Internal::FunctionPtr> &wrappers() const;
664  std::map<std::string, Internal::FunctionPtr> &wrappers();
665  void add_wrapper(const std::string &f,
666  const Internal::FunctionPtr &wrapper);
667  // @}
668 
669  /** At what sites should we inject the allocation and the
670  * computation of this function? The store_level must be outside
671  * of or equal to the compute_level. If the compute_level is
672  * inline, the store_level is meaningless. See \ref Func::store_at
673  * and \ref Func::compute_at */
674  // @{
675  const LoopLevel &store_level() const;
676  const LoopLevel &compute_level() const;
677  const LoopLevel &hoist_storage_level() const;
678  LoopLevel &store_level();
679  LoopLevel &compute_level();
680  LoopLevel &hoist_storage_level();
681  // @}
682 
683  /** Pass an IRVisitor through to all Exprs referenced in the
684  * Schedule. */
685  void accept(IRVisitor *) const;
686 
687  /** Pass an IRMutator through to all Exprs referenced in the
688  * Schedule. */
689  void mutate(IRMutator *);
690 };
691 
692 /** A schedule for a single stage of a Halide pipeline. Right now this
693  * interface is basically a struct, offering mutable access to its
694  * innards. In the future it may become more encapsulated. */
695 class StageSchedule {
696  IntrusivePtr<StageScheduleContents> contents;
697 
698 public:
699  StageSchedule(IntrusivePtr<StageScheduleContents> c)
700  : contents(std::move(c)) {
701  }
702  StageSchedule(const StageSchedule &other) = default;
703  StageSchedule();
704  StageSchedule(const std::vector<ReductionVariable> &rvars, const std::vector<Split> &splits,
705  const std::vector<Dim> &dims, const std::vector<PrefetchDirective> &prefetches,
706  const FuseLoopLevel &fuse_level, const std::vector<FusedPair> &fused_pairs,
707  bool touched);
708 
709  /** Return a copy of this StageSchedule. */
710  StageSchedule get_copy() const;
711 
712  /** This flag is set to true if the dims list has been manipulated
713  * by the user (or if a ScheduleHandle was created that could have
714  * been used to manipulate it). It controls the warning that
715  * occurs if you schedule the vars of the pure step but not the
716  * update steps. */
717  // @{
718  bool &touched();
719  bool touched() const;
720  // @}
721 
722  /** RVars of reduction domain associated with this schedule if there is any. */
723  // @{
724  const std::vector<ReductionVariable> &rvars() const;
725  std::vector<ReductionVariable> &rvars();
726  // @}
727 
728  /** The traversal of the domain of a function can have some of its
729  * dimensions split into sub-dimensions. See \ref Func::split */
730  // @{
731  const std::vector<Split> &splits() const;
732  std::vector<Split> &splits();
733  // @}
734 
735  /** The list and ordering of dimensions used to evaluate this
736  * function, after all splits have taken place. The first
737  * dimension in the vector corresponds to the innermost for loop,
738  * and the last is the outermost. Also specifies what type of for
739  * loop to use for each dimension. Does not specify the bounds on
740  * each dimension. These get inferred from how the function is
741  * used, what the splits are, and any optional bounds in the list below. */
742  // @{
743  const std::vector<Dim> &dims() const;
744  std::vector<Dim> &dims();
745  // @}
746 
747  /** You may perform prefetching in some of the dimensions of a
748  * function. See \ref Func::prefetch */
749  // @{
750  const std::vector<PrefetchDirective> &prefetches() const;
751  std::vector<PrefetchDirective> &prefetches();
752  // @}
753 
754  /** Innermost loop level of fused loop nest for this function stage.
755  * Fusion runs from outermost to this loop level. The stages being fused
756  * should not have a producer/consumer relationship. See
757  * \ref Func::compute_with */
758  // @{
759  const FuseLoopLevel &fuse_level() const;
760  FuseLoopLevel &fuse_level();
761  // @}
762 
763  /** List of function stages that are to be fused with this function stage
764  * from the outermost loop to a certain loop level. Those function stages
765  * are to be computed AFTER this function stage at the last fused loop level.
766  * This list is populated when realization_order() is called. See
767  * \ref Func::compute_with */
768  // @{
769  const std::vector<FusedPair> &fused_pairs() const;
770  std::vector<FusedPair> &fused_pairs();
771 
772  /** Are race conditions permitted? */
773  // @{
774  bool allow_race_conditions() const;
775  bool &allow_race_conditions();
776  // @}
777 
778  /** Use atomic update? */
779  // @{
780  bool atomic() const;
781  bool &atomic();
782  // @}
783 
784  /** Atomic updates are only allowed on associative reductions.
785  * We try to prove the associativity, but the user can override
786  * the associativity test and suppress compiler error if the prover
787  * fails to recognize the associativity or the user does not care. */
788  // @{
789  bool override_atomic_associativity_test() const;
790  bool &override_atomic_associativity_test();
791  // @}
792 
793  /** Pass an IRVisitor through to all Exprs referenced in the
794  * Schedule. */
795  void accept(IRVisitor *) const;
796 
797  /** Pass an IRMutator through to all Exprs referenced in the
798  * Schedule. */
799  void mutate(IRMutator *);
800 };
801 
802 } // namespace Internal
803 } // namespace Halide
804 
805 #endif
bool is_unordered_parallel(ForType for_type)
Check if for_type executes for loop iterations in parallel and unordered.
bool is_rename() const
Definition: Schedule.h:352
A reference to a site in a Halide statement at the top of the body of a particular for loop...
Definition: Schedule.h:203
A base class for algorithms that need to recursively walk over the IR.
Definition: IRVisitor.h:19
bool & touched()
This flag is set to true if the dims list has been manipulated by the user (or if a ScheduleHandle wa...
A schedule for a single stage of a Halide pipeline.
Definition: Schedule.h:695
A fragment of Halide syntax.
Definition: Expr.h:258
const LoopLevel & hoist_storage_level() const
At what sites should we inject the allocation and the computation of this function? The store_level must be outside of or equal to the compute_level.
Prevent evaluation beyond the original extent by shifting the tail case inwards, re-evaluating some p...
Defines the PrefetchDirective struct.
bool locked() const
FusedPair(const std::string &f1, size_t s1, const std::string &f2, size_t s2, const std::string &var)
Definition: Schedule.h:555
void accept(IRVisitor *) const
Pass an IRVisitor through to all Exprs referenced in the Schedule.
bool is_fuse() const
Definition: Schedule.h:358
A halide function.
Definition: Func.h:706
Defines DeviceAPI.
Partition
Different ways to handle loops with a potentially optimizable boundary conditions.
Expr fold_factor
If the Func is explicitly folded along this axis (with Func::fold_storage) this gives the extent of t...
Definition: Schedule.h:537
A possibly-weak pointer to a Halide function.
Definition: FunctionPtr.h:27
TailStrategy
Different ways to handle a tail case in a split when the factor does not provably divide the extent...
Definition: Schedule.h:33
bool & async()
Is the production of this Function done asynchronously.
Round up the extent to be a multiple of the split factor.
int get_stage_index() const
FuncSchedule deep_copy(std::map< FunctionPtr, FunctionPtr > &copied_map) const
Return a deep copy of this FuncSchedule.
StageSchedule(IntrusivePtr< StageScheduleContents > c)
Definition: Schedule.h:699
FuseLoopLevel(const LoopLevel &level, const std::map< std::string, LoopAlignStrategy > &align)
Definition: Schedule.h:315
std::string var
Name of the loop variable.
Definition: Schedule.h:459
ForType for_type
How are the loop values traversed (e.g.
Definition: Schedule.h:462
The dim originated from an RVar.
bool operator==(const FusedPair &other) const
Definition: Schedule.h:560
bool is_unordered_parallel() const
Could multiple iterations of this loop happen at the same time, with reads and writes interleaved in ...
Definition: Schedule.h:491
STL namespace.
Expr min
Declared min and extent of the loop.
Definition: Schedule.h:511
StageSchedule get_copy() const
Return a copy of this StageSchedule.
Defines the Partition enum.
DeviceAPI device_api
On what device does the body of the loop execute (e.g.
Definition: Schedule.h:465
int stage_index() const
Return the index of the function stage associated with this loop level.
This file defines the class FunctionDAG, which is our representation of a Halide pipeline, and contains methods that use Halide's bounds tools to query properties of it.
bool atomic() const
Use atomic update?
A class that can represent Vars or RVars.
Definition: Func.h:29
const std::map< std::string, Internal::FunctionPtr > & wrappers() const
Mark calls of a function by 'f' to be replaced with its identity wrapper or clone during the lowering...
Guard the loads in the loop with an if statement that prevents evaluation beyond the original extent...
Guard the stores in the loop with an if statement that prevents evaluation beyond the original extent...
bool is_parallel() const
Could multiple iterations of this loop happen at the same time? Vectorized and GPULanes loop types ar...
Definition: Schedule.h:499
DimType
Each Dim below has a dim_type, which tells you what transformations are legal on it.
Definition: Schedule.h:370
bool is_parallel(ForType for_type)
Returns true if for_type executes for loop iterations in parallel.
Expr & memoize_eviction_key()
This flag is set to true if the schedule is memoized and has an attached eviction key...
const LoopLevel & compute_level() const
At what sites should we inject the allocation and the computation of this function? The store_level must be outside of or equal to the compute_level.
std::string var_name() const
TailStrategy tail
Definition: Schedule.h:331
bool is_root() const
bool override_atomic_associativity_test() const
Atomic updates are only allowed on associative reductions.
bool is_split() const
Definition: Schedule.h:355
bool & memoized()
This flag is set to true if the schedule is memoized.
void mutate(IRMutator *)
Pass an IRMutator through to all Exprs referenced in the Schedule.
DimType dim_type
The DimType tells us what transformations are legal on this loop (see the DimType enum above)...
Definition: Schedule.h:469
LoopAlignStrategy
Different ways to handle the case when the start/end of the loops of stages computed with (fused) are...
Definition: Schedule.h:137
Guard the prefetch with if-guards that ignore the prefetch if any of the prefetched region ever goes...
A schedule for a Function of a Halide pipeline.
Definition: Schedule.h:590
Base classes for Halide expressions (Halide::Expr) and statements (Halide::Internal::Stmt) ...
MemoryType memory_type() const
The memory type (heap/stack/shared/etc) used to back this Func.
static LoopLevel root()
Construct a special LoopLevel value which represents the location outside of all for loops...
void add_wrapper(const std::string &f, const Internal::FunctionPtr &wrapper)
Mark calls of a function by 'f' to be replaced with its identity wrapper or clone during the lowering...
bool is_purify() const
Definition: Schedule.h:361
Expr alignment
The bounds allocated (not computed) must be a multiple of "alignment".
Definition: Schedule.h:527
Shift the end of the fused loops to align.
bool is_inlined() const
const std::vector< Bound > & bounds() const
You may explicitly bound some of the dimensions of a function, or constrain them to lie on multiples ...
Guard the loads and stores in the loop with an if statement that prevents evaluation beyond the origi...
The dim originated from an RVar.
const LoopLevel & store_level() const
At what sites should we inject the allocation and the computation of this function? The store_level must be outside of or equal to the compute_level.
VarOrRVar var() const
Expr bound
The bounds allocated (not computed).
Definition: Schedule.h:530
Expr modulus
If defined, the number of iterations will be a multiple of "modulus", and the first iteration will be...
Definition: Schedule.h:517
Let Halide select a storage type automatically.
Partition partition_policy
The strategy for loop partitioning.
Definition: Schedule.h:472
A bound on a loop, typically from Func::bound.
Definition: Schedule.h:505
Not visible externally, similar to &#39;static&#39; linkage in C.
compute_with will make no attempt to align the start/end of the fused loops.
bool is_rvar() const
Did this loop originate from an RVar (in which case the bounds of the loops are algorithmically meani...
Definition: Schedule.h:483
const std::vector< Bound > & estimates() const
You may explicitly specify an estimate of some of the function dimensions.
std::map< std::string, LoopAlignStrategy > align
Contains alignment strategies for the fused dimensions (indexed by the dimension name).
Definition: Schedule.h:310
Defines the internal representation of parameters to Halide pipelines.
bool defined() const
Shift the start of the fused loops to align.
ForType
An enum describing a type of loop traversal.
Definition: Expr.h:401
LoopLevel()
Construct an undefined LoopLevel.
static LoopLevel inlined()
Construct a special LoopLevel value that implies that a function should be inlined away...
std::string var
The var in the pure definition corresponding to this axis.
Definition: Schedule.h:523
const std::vector< Split > & splits() const
The traversal of the domain of a function can have some of its dimensions split into sub-dimensions...
bool allow_race_conditions() const
Are race conditions permitted?
std::string to_string() const
Properties of one axis of the storage of a Func.
Definition: Schedule.h:521
const std::vector< FusedPair > & fused_pairs() const
List of function stages that are to be fused with this function stage from the outermost loop to a ce...
A reference-counted handle to Halide's internal representation of a function.
Definition: Function.h:38
Equivalent to RoundUp, but protects values that would be written beyond the end by loading the memor...
bool is_rvar() const
const std::vector< PrefetchDirective > & prefetches() const
You may perform prefetching in some of the dimensions of a function.
FuncSchedule(IntrusivePtr< FuncScheduleContents > c)
Definition: Schedule.h:594
void mutate(IRMutator *)
Pass an IRMutator through to all Exprs referenced in the Schedule.
void accept(IRVisitor *) const
Pass an IRVisitor through to all Exprs referenced in the Schedule.
std::string func() const
This represents two stages with fused loop nests from outermost to a specific loop level...
Definition: Schedule.h:547
std::string func_name() const
bool match(const std::string &loop) const
const std::vector< Dim > & dims() const
The list and ordering of dimensions used to evaluate this function, after all splits have taken place...
LoopLevel & lock()
const std::vector< StorageDim > & storage_dims() const
The list and order of dimensions used to store this function.
const FuseLoopLevel & fuse_level() const
Innermost loop level of fused loop nest for this function stage.
Equivalent to ShiftInwards, but protects values that would be re-evaluated by loading the memory loca...
A base class for passes over the IR which modify it (e.g.
Definition: IRMutator.h:26
bool is_pure() const
Can this loop be evaluated in any order (including in parallel)? Equivalently, are there no data haza...
Definition: Schedule.h:477
std::string var
The loop var being bounded.
Definition: Schedule.h:507
const std::vector< ReductionVariable > & rvars() const
RVars of reduction domain associated with this schedule if there is any.
DeviceAPI
An enum describing a type of device API.
Definition: DeviceAPI.h:15
The Dim struct represents one loop in the schedule's representation of a loop nest.
Definition: Schedule.h:457
bool operator!=(const LoopLevel &other) const
Definition: Schedule.h:294
bool operator<(const FusedPair &other) const
Definition: Schedule.h:565
bool operator==(const LoopLevel &other) const
This dim originated from a Var.
std::string old_var
Definition: Schedule.h:326
MemoryType
An enum describing different address spaces to be used with Func::store_in.
Definition: Expr.h:348