SmartHLS Pragmas Manual

SmartHLS accepts pragma directives in the source code to guide the hardware generation. This reference section explains all of the pragmas available for SmartHLS.

Pragmas use the following syntax:

#pragma HLS <category> <feature> <parameter>(<value>)

The category refers to the general usage class of the pragma. Each pragma has one of the following categories:

  • function: configure a hardware function.
  • loop: configure loop optimizations.
  • interface: configure hardware interfaces (arguments / global variables).
  • memory: configure hardware memory implementation.

Each category can have different configurable features. Some categories / features have parameters to be passed to the pragma. A parameter can be optional, with a default behaviour if not specified.

The value of a parameter can be an integer, a boolean (true|false), a name (variable / argument), or one of a set of pre-specified values.

Note

For integer parameters, the user may use constants (or expressions of constants) defined using the #define directive. For example, this is allowed:

#define N 10
void fun() {
  #pragma HLS loop unroll factor(N+1)
  for (int i = 0; i < 100; i++)
    ...
}

The pragma position is not arbitrary, and placing a pragma in an incorrect position will cause an error. Each pragma has one of the following positions:

  • At the beginning of function definition block before any other statements.
  • Before global / local variable declaration.
  • Before loop block.
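As an illustration of the three positions, the sketch below places one pragma of each kind, using syntax documented later in this section. The names weighted_sum and coeffs are made up for this example. The snippet is written as ordinary C++, where a regular compiler ignores an unknown #pragma HLS line; it has not been verified against SmartHLS.

```cpp
#include <cassert>

// Position: before a global variable declaration.
#pragma HLS memory partition variable(coeffs) type(complete)
int coeffs[4] = {1, 2, 3, 4};

// Hypothetical function used only to show pragma placement.
int weighted_sum(const int *in) {
// Position: at the beginning of the function body, before any statement.
#pragma HLS function top
  int acc = 0;
// Position: directly before the loop block.
#pragma HLS loop pipeline II(1)
  for (int i = 0; i < 4; i++) {
    acc += coeffs[i] * in[i];
  }
  return acc;
}
```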

Pipeline Loop

Syntax

#pragma HLS loop pipeline II(<int>)

Description

This pragma enables pipelining for a given loop in the code. Loop pipelining allows a new iteration of the loop to begin before the current one has completed, achieving higher throughput. The pragma can be applied to a single loop or to a loop in a loop nest. If specified on a single loop or on an inner loop of a nested loop, that loop will be pipelined. If specified on the outer loop of a nested loop, the outer loop will be pipelined and all of its inner loops will be automatically unrolled.

Parameters

Parameter Value Optional Default Description
II Integer Yes 1 Pipeline initiation interval

Position

Before the beginning of the loop. If there is a loop label, the pragma should be placed after the label.

Examples

#pragma HLS loop pipeline II(2)
for (int i = 0; i < 10; i++) {
  ...
}
LOOP_LABEL:
#pragma HLS loop pipeline
while (i < 10) {
  ...
}
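To see why starting a new iteration early raises throughput, the following sketch compares idealized cycle counts with and without pipelining. It is a simplified model, not something SmartHLS reports: the function names are hypothetical, and depth (the number of cycles a single iteration takes) is assumed to be constant with no stalls.

```cpp
#include <cassert>

// Idealized cycle count when iterations run strictly one after another.
// `depth` is the assumed number of cycles a single iteration takes.
int sequential_cycles(int iterations, int depth) {
  return iterations * depth;
}

// Idealized cycle count when the loop is pipelined: the first iteration
// takes `depth` cycles and every later iteration starts `ii` cycles after
// the one before it.
int pipelined_cycles(int iterations, int depth, int ii) {
  if (iterations <= 0) return 0;
  return depth + (iterations - 1) * ii;
}
```

Under this model, 10 iterations of depth 4 take 40 cycles without pipelining, 13 cycles with II(1), and 22 cycles with II(2).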

Unroll Loop

Syntax

#pragma HLS loop unroll factor(<int>)

Description

Specifies a loop to be unrolled. The factor indicates how many times to unroll the loop. If it is not specified, or specified as N (the total number of loop iterations), the loop will be fully unrolled. If it is specified as 2, the loop will be unrolled 2 times, where the number of loop iterations will be halved and the loop body will be replicated twice. If it is specified as 1, the loop will NOT be unrolled.

Parameters

Parameter Value Optional Default Description
factor Integer Yes N (fully unroll) Unroll count

Position

Before the beginning of the loop. If there is a loop label, the pragma should be placed after the label.

Examples

#pragma HLS loop unroll factor(1)
for (int i = 0; i < 10; i++) {
  ...
}
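The effect of factor(2) can be pictured with a hand-written equivalent. The sketch below is only a software illustration of the description above (iteration count halved, body replicated twice); the function names are hypothetical, and the code SmartHLS actually generates may differ.

```cpp
#include <cassert>

// Original loop: 10 iterations, one addition per iteration.
int sum_rolled(const int *a) {
  int sum = 0;
  for (int i = 0; i < 10; i++) {
    sum += a[i];
  }
  return sum;
}

// Hand-unrolled by 2: 5 iterations, loop body replicated twice.
// The trip count (10) divides evenly by 2, so no remainder loop is needed.
int sum_unrolled_by_2(const int *a) {
  int sum = 0;
  for (int i = 0; i < 10; i += 2) {
    sum += a[i];
    sum += a[i + 1];
  }
  return sum;
}
```

Both functions return the same result; the unrolled form simply exposes two additions per iteration that hardware can schedule together.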

Bound Loop

Syntax

#pragma HLS loop bounds lower(<int>) upper(<int>)

Description

Specifies the bounds on the number of times a loop will iterate. This pragma does not affect the synthesized circuit. It’s only used to improve reporting. By manually specifying bounds, the overall loop latency can be computed and reported.

Parameters

Parameter Value Optional Default Description
lower Integer No   Lower bound
upper Integer No   Upper bound

Position

Before the beginning of the loop. If there is a loop label, the pragma should be placed after the label.

Examples

#pragma HLS loop bounds lower(5) upper(10)
// Here, the bounds tell us the trip count (number of iterations) will be between 5 and 10, inclusive.
for (int i = 0; i < N; i++) {
  ...
}
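The sketch below shows how trip-count bounds make a latency range computable for a loop whose iteration count is only known at run time. The helper loop_latency, the struct LatencyRange, and the fixed cycles_per_iteration figure are assumptions made for illustration; they are not part of SmartHLS or its reports.

```cpp
#include <cassert>

// Hypothetical container for a reported latency range.
struct LatencyRange {
  int min_cycles;
  int max_cycles;
};

// Hypothetical helper: with known bounds and an assumed fixed cost per
// iteration, the overall loop latency range follows directly.
LatencyRange loop_latency(int lower, int upper, int cycles_per_iteration) {
  return {lower * cycles_per_iteration, upper * cycles_per_iteration};
}

// A loop whose trip count depends on the run-time value n.
int sum_first_n(const int *data, int n) {
  int sum = 0;
#pragma HLS loop bounds lower(5) upper(10)
  for (int i = 0; i < n; i++) {
    sum += data[i];
  }
  return sum;
}
```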

Inline Function

Syntax

#pragma HLS function inline

Description

This pragma forces a given function to be inlined.

Position

At the beginning of the function definition block.

Examples

int sum(int *a) {
#pragma HLS function inline
  ...
}

Noinline Function

Syntax

#pragma HLS function noinline

Description

This pragma prevents a given function from being inlined.

Position

At the beginning of the function definition block.

Examples

int sum(int *a) {
#pragma HLS function noinline
  ...
}

Replicate Function

Syntax

#pragma HLS function replicate

Description

This pragma specifies a function to be replicated every time it is called. By default, when the circuit is not pipelined, SmartHLS creates a single instance for each function which is shared across multiple calls to the function. When using this pragma on the function, SmartHLS will create a new instance of the function for every function call.

Position

At the beginning of the function definition block.

Examples

int sum(int *a) {
#pragma HLS function replicate
  ...
}

Flatten Function

Syntax

#pragma HLS function flatten branchless(true|false)

Description

This pragma unrolls all loops and inlines all subfunctions for a given function. If the branchless option is set to true, all branches (e.g., if-else, switch) in the specified function will also be flattened to allow more parallelism between operations, specifically between the operations that are under different-yet-independent conditions.

Parameters

Parameter Value Optional Default Description
branchless true|false Yes true true to flatten branch statements

Position

At the beginning of the function definition block.

Examples

int sum(int *a) {
#pragma HLS function flatten branchless(true)
  ...
}
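Branch flattening can be pictured as computing both sides of a condition and then selecting one result. The sketch below is a conceptual software illustration with hypothetical function names; it is not the transformation SmartHLS performs internally.

```cpp
#include <cassert>

// Branching form: only one of the two multiplications executes.
int scale_branchy(int x) {
  if (x < 0) {
    return x * 3;
  } else {
    return x * 5;
  }
}

// Flattened form: both results are computed unconditionally and one is
// selected afterwards. The two multiplications are independent of each
// other, so hardware can perform them in parallel.
int scale_flattened(int x) {
  int if_result = x * 3;
  int else_result = x * 5;
  return (x < 0) ? if_result : else_result;
}
```

This rewrite is only valid when both sides are free of side effects, as they are here.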

Set Top-Level Function

Syntax

#pragma HLS function top

Description

This pragma specifies the top-level C/C++ function. The top-level function and all of its descendant functions will be compiled to hardware.

Position

At the beginning of the function definition block.

Examples

int sum(int *a) {
#pragma HLS function top
  ...
}

Pipeline Function

Syntax

#pragma HLS function pipeline II(<int>)

Description

This pragma enables pipelining for a given function in the code. Function pipelining allows a new invocation of a function to begin before the current one has completed, achieving higher throughput.

Parameters

Parameter Value Optional Default Description
II Integer Yes 1 Pipeline initiation interval

Position

At the beginning of the function definition block.

Examples

int sum(int *a) {
#pragma HLS function pipeline
  ...
}
int conv(int a[], int b[]) {
#pragma HLS function pipeline II(3)
  ...
}

Dataflow

Syntax

#pragma HLS function dataflow

Description

Specifies that a function should execute using dataflow parallelism.

Position

At the beginning of the function definition block.

Examples

void canny(hls::FIFO<unsigned char> &input_fifo,
           hls::FIFO<unsigned char> &output_fifo) {
#pragma HLS function dataflow

#pragma HLS dataflow_channel variable(output_gf) type(fifo)
    unsigned char output_gf [HEIGHT * WIDTH];
#pragma HLS dataflow_channel variable(output_sf) type(fifo)
    unsigned short output_sf [HEIGHT * WIDTH];
#pragma HLS dataflow_channel variable(output_nm) type(fifo)
    unsigned char output_nm [HEIGHT * WIDTH];

    gaussian_filter(input_fifo, output_gf);
    sobel_filter(output_gf, output_sf);
    nonmaximum_suppression(output_sf, output_nm);
    hysteresis_filter(output_nm, output_fifo);
}

Dataflow Channel

Syntax

#pragma HLS dataflow_channel variable(<var_name>) type(fifo|shared_buffer|double_buffer) depth(<int>)

Description

Specifies that a variable in a dataflow function should be converted to use a particular channel type.

Parameters

Parameter Value Optional Default Description
variable string No   Variable Name
type fifo|shared_buffer|double_buffer No double_buffer Channel type
depth Integer Yes 2 FIFO depth (only when type is FIFO)

Position

Before the variable declaration.

Examples

FIFO channels are used for a Canny edge detector, since the pixels are processed in sequential order.

void canny(hls::FIFO<unsigned char> &input_fifo,
           hls::FIFO<unsigned char> &output_fifo) {
#pragma HLS function dataflow

#pragma HLS dataflow_channel variable(output_gf) type(fifo)
    unsigned char output_gf [HEIGHT * WIDTH];
#pragma HLS dataflow_channel variable(output_sf) type(fifo)
    unsigned short output_sf [HEIGHT * WIDTH];
#pragma HLS dataflow_channel variable(output_nm) type(fifo)
    unsigned char output_nm [HEIGHT * WIDTH];

    gaussian_filter(input_fifo, output_gf);
    sobel_filter(output_gf, output_sf);
    nonmaximum_suppression(output_sf, output_nm);
    hysteresis_filter(output_nm, output_fifo);
}
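To make the depth parameter concrete, the following sketch models a fixed-depth FIFO in software. BoundedFifo and its methods are hypothetical and are not the hls::FIFO API; the model only shows that a producer must wait once depth elements are queued and that elements leave in the order they arrived.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>

// Hypothetical software stand-in for a fixed-depth FIFO channel.
// This is NOT hls::FIFO; the class and method names are made up here.
template <typename T>
class BoundedFifo {
public:
  explicit BoundedFifo(std::size_t depth) : depth_(depth) {}

  // Returns false when the FIFO is full (a producer would have to stall).
  bool try_write(const T &value) {
    if (items_.size() >= depth_) return false;
    items_.push(value);
    return true;
  }

  // Returns false when the FIFO is empty (a consumer would have to stall).
  bool try_read(T &value) {
    if (items_.empty()) return false;
    value = items_.front();
    items_.pop();
    return true;
  }

private:
  std::size_t depth_;
  std::queue<T> items_;
};
```

With a depth of 2 (the documented default), a third write is refused until the consumer has read one element.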

Scalar Argument Interface

Syntax

#pragma HLS interface argument(<arg_name>) type(<simple|axi_target>) stable(<false|true>)

Description

This pragma configures the RTL interface for a scalar argument. This pragma is ignored if the enclosing function is not specified as the top-level.

Parameters

Parameter Value Optional Default Description
argument string No   Argument name
type simple|axi_target No simple Interface type
stable true|false Yes false Only available for simple type; true if the argument is stable.

Position

At the beginning of the function definition block.

Examples

int fun(int a) {
#pragma HLS function top
#pragma HLS interface argument(a) type(simple) stable(true)
  ...
}

Memory Interface for Pointer Argument/Global Variable

Syntax

#pragma HLS interface argument(<arg_name>) type(memory) num_elements(<int>)

#pragma HLS interface variable(<var_name>) type(memory) num_elements(<int>)

Description

This pragma specifies the memory interface type for a pointer (including array, struct, class types) argument or shared global variable. For more details, see the memory interface section. This pragma is ignored if the enclosing function is not specified as the top-level.

Parameters

Parameter Value Optional Default Description
argument/variable string No   Argument/Variable name
type memory No   Interface type
num_elements integer Yes   Specifies the number of elements of the argument array. Can override the array size in the argument.

Position

Argument - At the beginning of the function definition block.

Variable - Before the global variable declaration.

Examples

#pragma HLS interface variable(c) type(memory) num_elements(100)
int c[100];

int fun(int a[], int b[]) {
#pragma HLS function top
#pragma HLS interface argument(a) type(memory) num_elements(100)
#pragma HLS interface argument(b) type(memory)
  ...
}

AXI4 Target Interface for Pointer Argument

Syntax

#pragma HLS interface argument(<arg_name>) type(axi_target) num_elements(<int>) dma(true|false) requires_copy_in(true|false)

Description

This pragma specifies the AXI4 target interface type for a pointer (including array, struct, class types) argument. For more details, see the AXI4 Target Interface section. This pragma is ignored if the enclosing function is not specified as the top-level.

Parameters

Parameter Value Optional Default Description
argument string No   Argument name
type axi_target No   Interface type
num_elements integer Yes   Specifies the number of elements of the argument array. Can override the array size in the argument.
dma, requires_copy_in true|false Yes false Specifies the transfer method and copy-in behaviour in the top-level driver function. See Top-level Driver Options in Pointer Arguments’ AXI4 Target Interface Pragma

Position

At the beginning of the function definition block.

Examples

int fun(int a[], int b[101]) {
#pragma HLS function top
#pragma HLS interface argument(a) type(axi_target) num_elements(100) dma(true) requires_copy_in(false)
#pragma HLS interface argument(b) type(axi_target)
  ...
}

AXI4 Initiator Interface for Pointer Argument

Syntax

#pragma HLS interface argument(<arg_name>) type(axi_initiator) ptr_addr_interface(<simple|axi_target>) num_elements(<int>) max_burst_len(<int>) max_outstanding_reads(<int>) max_outstanding_writes(<int>)

Description

This pragma specifies the AXI4 initiator interface type for a pointer (including array, struct, class types) argument. For more details, see the AXI4 Initiator Interface section. This pragma is ignored if the enclosing function is not specified as the top-level.

Parameters

Parameter Value Optional Default Description
argument string No   Argument name
type axi_initiator No   Interface type
num_elements integer Yes   Specifies the number of elements of the argument array. Can override the array size in the argument. Only needed by the SW/HW Co-Simulation feature and does not affect HLS-generated RTL.
ptr_addr_interface simple|axi_target Yes   Specifies the interface type for setting the base address of the memory being accessed. The default type is simple, but it changes to axi_target if Default All Interfaces to Use AXI4 Target is set.
max_burst_len integer Yes 16 Specifies the maximum burst length for each AXI initiator transaction. Transfers that are larger than the maximum burst length will be split into multiple AXI transactions. The permitted values are between 1 and 256.
max_outstanding_reads integer Yes 1 Specifies the maximum number of read burst requests to external AXI4 Targets that can be left outstanding (waiting for response) before the accelerator stalls. This infers a FIFO of size max_outstanding_reads*addr_size for the AR channel, and a FIFO of size max_outstanding_reads*max_burst_len*word_size for the R channel. The permitted values are between 1 and 8.
max_outstanding_writes integer Yes 1 Specifies the maximum number of write burst requests to external AXI4 Targets that can be left outstanding (waiting for response) before the accelerator stalls. This infers a FIFO of size max_outstanding_writes*addr_size for the AW channel, and a FIFO of size max_outstanding_writes*max_burst_len*word_size for the W channel. The permitted values are between 1 and 8.

Position

At the beginning of the function definition block.

Examples

int fun(int a[]) {
#pragma HLS function top
#pragma HLS interface argument(a) type(axi_initiator) num_elements(100) ptr_addr_interface(axi_target) max_burst_len(8)
#pragma HLS interface argument(a) type(axi_initiator) num_elements(100) max_outstanding_reads(4) max_outstanding_writes(4)
  ...
}
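The sizing rules in the parameter table can be worked through with a few helper functions. This is a back-of-the-envelope sketch: the function names are hypothetical, and treating addr_size and word_size as widths in bits is an assumption, since the table does not state the unit.

```cpp
#include <cassert>

// Number of AXI transactions for a transfer of `beats` data words, given
// that transfers longer than max_burst_len are split into several bursts.
int bursts_needed(int beats, int max_burst_len) {
  return (beats + max_burst_len - 1) / max_burst_len;
}

// AR channel FIFO size: max_outstanding_reads * addr_size.
// addr_size is assumed to be in bits.
int ar_fifo_size(int max_outstanding_reads, int addr_size) {
  return max_outstanding_reads * addr_size;
}

// R channel FIFO size: max_outstanding_reads * max_burst_len * word_size.
// word_size is assumed to be in bits.
int r_fifo_size(int max_outstanding_reads, int max_burst_len, int word_size) {
  return max_outstanding_reads * max_burst_len * word_size;
}
```

For example, a 100-word transfer with max_burst_len(8) needs 13 bursts, and max_outstanding_reads(4) with 32-bit words gives an R channel FIFO of 4 * 8 * 32 = 1024 under these assumptions. Larger values reduce stalls but cost more buffering.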

Legacy AXI4 Slave Interface for Global Variable

Syntax

#pragma HLS interface variable(<var_name>) type(axi_slave) concurrent_access(true|false)

Description

This pragma specifies the legacy AXI4 slave interface for a global struct. When the concurrent_access option is set to true (the default is false), external logic can read/write the AXI4 slave interface while the SmartHLS module is running. Concurrent access will, however, reduce the SmartHLS module’s throughput when accessing the memory. For more details, see the Legacy AXI4 Slave Interface section. This pragma is ignored if the enclosing function is not specified as the top-level.

Parameters

Parameter Value Optional Default Description
variable string No   Variable name
type axi_slave No   Interface type
concurrent_access true|false Yes false Enable/disable concurrent access

Position

Before the global variable declaration.

Examples

#pragma HLS interface variable(b) type(axi_slave) concurrent_access(true)
int b[SIZE];

Module Control Interface

Syntax

#pragma HLS interface control type(<simple|axi_target>)

Description

This pragma configures the module control interface. This pragma is ignored if the enclosing function is not specified as the top-level.

Parameters

Parameter Value Optional Default Description
type simple|axi_target No simple Interface type

Position

At the beginning of the function definition block.

Examples

int fun() {
#pragma HLS function top
#pragma HLS interface control type(simple)
  ...
}

Default All Interfaces to Use AXI4 Target

Syntax

#pragma HLS interface default type(axi_target)

Description

This pragma specifies the default interface to AXI4 target for all arguments and module control. This pragma is ignored if the enclosing function is not specified as the top-level.

Parameters

Parameter Value Optional Default Description
type axi_target No   Interface type

Position

At the beginning of the function definition block.

Examples

// The following two functions have the same interface configurations.

// Without using default interface pragma:
int fun(int a, int b[10], int c[20], int d[30]) {
#pragma HLS function top
#pragma HLS interface control     type(axi_target)
#pragma HLS interface argument(a) type(axi_target)
#pragma HLS interface argument(b) type(axi_target)
#pragma HLS interface argument(c) type(axi_target) dma(true)
#pragma HLS interface argument(d) type(axi_initiator) ptr_addr_interface(axi_target)
  ...
}

// Use default interface pragma:
int fun(int a, int b[10], int c[20], int d[30]) {
#pragma HLS function top
#pragma HLS interface default     type(axi_target)
#pragma HLS interface argument(c) type(axi_target) dma(true)
// Note that 'ptr_addr_interface(axi_target)' can be omitted when default interface is set to axi_target.
#pragma HLS interface argument(d) type(axi_initiator)
  ...
}

Contention-Free Memory Access

Syntax

#pragma HLS memory impl variable(<var_name>) contention_free(true|false)

Description

This pragma is used for variables accessed by parallel functions (hls::thread) so that SmartHLS does not create arbiters for the specified variable. The specified variable can still be accessed by multiple concurrently running functions, but without contention. It is the user’s responsibility to ensure that at most one function accesses the shared variable in any clock cycle. If the pragma is not specified, SmartHLS by default creates arbiters for variables that are accessed by parallel functions.

Parameters

Parameter Value Optional Default Description
variable string No   Variable Name
contention_free true|false Yes false true for contention-free access

Position

Before the global / local variable declaration.

Examples

#pragma HLS memory impl variable(b) contention_free(true)
int b[100];

Struct Variable/Argument Packing

Syntax

#pragma HLS memory impl variable(<var_name>) pack(bit|byte) byte_enable(true|false)

#pragma HLS memory impl argument(<arg_name>) pack(bit|byte) byte_enable(true|false)

Description

This pragma packs a struct-typed global interface variable, local memory variable, or argument. There are two packing modes: bit mode packs the struct fields using their exact bit-widths, and byte mode packs the fields with 8-bit alignment. When set to true, the byte_enable option creates an interface / memory with byte-enable signals so that individual fields can be written. Note that byte_enable is only valid with byte packing.

Parameters

Parameter Value Optional Default Description
variable/argument string No   Variable/Argument Name
pack bit|byte No   Packing Mode
byte_enable true|false Yes false Use byte-enable to write struct fields

Position

Argument - At the beginning of the function definition block.

Variable - Before the global / local variable declaration.

Examples

#pragma HLS memory impl variable(b) pack(bit)
struct S b[100];

int sum(struct S &s) {
#pragma HLS function top
#pragma HLS memory impl argument(s) pack(byte) byte_enable(true)
 ...
}
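The difference between the two packing modes can be estimated from the field widths alone. The sketch below is one reading of the description above, in which byte mode rounds every field up to a multiple of 8 bits; the function names are hypothetical, and the exact layout SmartHLS produces has not been checked.

```cpp
#include <cassert>
#include <vector>

// Total width when each field occupies exactly its own bit-width.
int bit_packed_width(const std::vector<int> &field_bits) {
  int total = 0;
  for (int bits : field_bits) {
    total += bits;
  }
  return total;
}

// Total width when each field is rounded up to a multiple of 8 bits
// (assumed meaning of "8-bit alignment").
int byte_packed_width(const std::vector<int> &field_bits) {
  int total = 0;
  for (int bits : field_bits) {
    total += ((bits + 7) / 8) * 8;
  }
  return total;
}
```

For fields of 5, 12, and 1 bits, this gives 18 bits in bit mode and 32 bits in byte mode. Byte mode is wider, but its byte boundaries are what allow byte_enable to write a single field.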

Partition Memory

Syntax

#pragma HLS memory partition variable(<var_name>) type(block|cyclic|complete|struct_fields|none) dim(<int>) factor(<int>)

Description

This pragma specifies a variable to be partitioned. Dimension 1 corresponds to the left-most dimension of an array and higher dimensions correspond to right-ward dimensions. The dim parameter is only applicable for block|cyclic|complete types. If dim is 0, the specified partitioning will be applied to all dimensions. The factor parameter is only applicable for block|cyclic types to specify the number of partitions. factor must be larger than 1. For more details about the pragma options, see User-Specified Memory Partitioning.

Parameters

Parameter Value Optional Default Description
variable string No   Variable name
type block, cyclic, complete, struct_fields, none Yes complete Partition type
dim integer Yes 0 Partition dimension
factor integer Yes   Number of partitions

Position

Before the global / local variable declaration.

Examples

#pragma HLS memory partition variable(b) type(none)
int b[100];

int fun(int *a) {
  ...
  #pragma HLS memory partition variable(c) type(block) dim(1) factor(2)
  int c[100][100];
  ...
}
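The block and cyclic types differ in how array indices are spread across partitions. The sketch below uses the conventional definitions of these two schemes for a one-dimensional array; the function names are hypothetical, and the User-Specified Memory Partitioning section remains the authority on the exact mapping SmartHLS applies.

```cpp
#include <cassert>

// Block partitioning: consecutive elements stay in the same partition.
// Each partition holds ceil(num_elements / factor) elements.
int block_partition(int index, int num_elements, int factor) {
  int partition_size = (num_elements + factor - 1) / factor;
  return index / partition_size;
}

// Cyclic partitioning: consecutive elements are dealt out round-robin,
// so neighbouring indices land in different partitions.
int cyclic_partition(int index, int factor) {
  return index % factor;
}
```

With 100 elements and factor(2), block places indices 0 to 49 in partition 0 and 50 to 99 in partition 1, while cyclic places even indices in partition 0 and odd indices in partition 1. Cyclic tends to suit loops that touch neighbouring elements in the same cycle.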

Partition Top-Level Interface

Syntax

#pragma HLS memory partition argument(<arg_name>) type(block|cyclic|complete|struct_fields|none) dim(<int>) factor(<int>)

Description

This pragma specifies a top-level argument to be partitioned. Dimension 1 corresponds to the left-most dimension of an array and higher dimensions correspond to right-ward dimensions. The dim parameter is only applicable for block|cyclic|complete types. If dim is 0, the specified partitioning will be applied to all dimensions. The factor parameter is only applicable for block|cyclic types to specify the number of partitions. factor must be larger than 1. For more details about the pragma options, see User-Specified Memory Partitioning.

Parameters

Parameter Value Optional Default Description
argument string No   Argument name
type block, cyclic, complete, struct_fields, none Yes complete Partition type
dim integer Yes 0 Partition dimension
factor integer Yes   Number of partitions

Position

At the beginning of the function definition block.

Examples

int sum(int *a, int *b) {
#pragma HLS function top
#pragma HLS memory partition argument(a) type(cyclic) dim(2) factor(4)
#pragma HLS memory partition argument(b)
}

Replicate ROM

Syntax

#pragma HLS memory replicate_rom variable(<rom_var_name>) max_replicas(<int>)

Description

This pragma can be used to replicate constant memory (i.e., arrays) to achieve better throughput (lower cycle latency) at the expense of extra resources (e.g., block RAM). Typically, when an array is implemented in block RAMs, there are up to two RAM ports, allowing a maximum of two reads per clock cycle. To allow more parallel read accesses in each clock cycle, constant read-only memories (ROMs) can be replicated by using this pragma.

The optional max_replicas can be used to control the maximum number of replicas. If a max_replicas of N is specified, SmartHLS will make sure to use no more than N replicas of the ROM in the generated circuit; the generated circuit may use fewer than N replicas when the throughput cannot be further improved with more replicas. When max_replicas is unspecified or set to 0, the number of replicas is unlimited and SmartHLS will use as many replicas as it needs to maximize throughput. A max_replicas of 1 means only one copy is allowed, hence no replication, equivalent to not having the pragma.

Parameters

Parameter Value Optional Default Description
variable string No   ROM variable name
max_replicas integer Yes 0 The maximum number of replicas allowed

Position

Before the global / local variable declaration.

Examples

#pragma HLS memory replicate_rom variable(my_rom) max_replicas(10)
const int my_rom[100];
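The interaction between read parallelism and max_replicas can be approximated as follows. This is a deliberately rough model: replicas_used is a hypothetical helper, it assumes every ROM copy offers exactly two read ports, and SmartHLS bases its real decision on whether throughput actually improves.

```cpp
#include <algorithm>
#include <cassert>

// Rough estimate of how many ROM copies would be used.
// Assumptions: each copy provides two read ports, and max_replicas == 0
// means "no limit", as described above.
int replicas_used(int reads_per_cycle, int max_replicas) {
  const int ports_per_copy = 2;
  int wanted =
      std::max(1, (reads_per_cycle + ports_per_copy - 1) / ports_per_copy);
  if (max_replicas == 0) {
    return wanted;
  }
  return std::min(wanted, max_replicas);
}
```

Under this model, five parallel reads want three copies; max_replicas(2) caps that at two, and max_replicas(1) leaves a single copy, which matches the no-replication case.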