Software
Design Document
for the
Computational Engine (Task 1.1)
for the
Tool for Automating Estimation of
DSP Resource Statistics for Waveform Components
Submitted under Subcontract
FP-19738-430292
An Integrated Tool for SCA Waveform
Development, Testing, and Debugging and a Tool for Automated Estimation of DSP
Resource Statistics for Waveform Components
Version 1.0
Revision History
| Version | Summary of Changes | Date |
| --- | --- | --- |
| 0.1 (JN) | Internal Release (C++ Engine) | |
| 1.0 (JN) | Updated for Python Implementation | |
| 1.1 (JN) | Updated for new program memory calculation | |
1.2 Document Purpose and Goals
5.3 String (equation) Evaluation
7 Integration into Other Packages
Figure 2: Basic Operation of Computational Engine
Figure 3: Sample Component File
Figure 5: Sample Parameter File
Figure 6: Example Output File when Successful
Figure 7: Example Output File when Unsuccessful
Figure 8: Basic Flow of Operations in Calculation
Figure 9: VLIW Adjustment Calculation
Table 1: Failure Strings Returned by the CE in the Output File
Table 2: Functional Breakdown by File
Table 3: File Dependency Structure Matrix
This Software Design Document establishes the software design for the Computational Engine (CE) for the “Tool for Automating Estimation of DSP Resource Statistics for Waveform Components.” This document has been prepared in accordance with the requirements of Task 2 of Subcontract FP-19738-430292 as part of the project to develop “An Integrated Tool for SCA Waveform Development, Testing, and Debugging and a Tool for Automated Estimation of DSP Resource Statistics for Waveform Components.”
The project aims to develop an open-source, stand-alone program for estimating the cycles, power, and memory required to implement waveform components on arbitrary processors, which can then be incorporated into OSSIE. This tool is intended as an aid when beginning a new software-defined radio (SDR) design or when investigating the feasibility of porting an existing waveform to a new platform. The intent is to provide immediately available guidance to rapidly winnow down candidate platforms to a handful for more detailed systems analysis.
The stand-alone program consists of the following three principal components:
In the envisioned operation illustrated
in Figure 1, the user first defines in the GUI
For each combination of DSP and waveform
component, the GUI calls the computational engine and passes the following
information:
Using methods first described in [1],
the computational engine uses this information to estimate the following values
if the component were to be implemented on the target DSP:
This information is then passed back to
the GUI. After all user-specified component/DSP pairs have been evaluated by
successive calls to the CE, the GUI tabulates the following five tables for
each DSP/component pair:

Figure 1: Primary Components for the “Tool for Automating Estimation of DSP Resource Statistics for Waveform Components”
This document serves as the blueprint
for the software development and implementation of the Computational Engine (CE).
This document covers the design of all
the CE software components. The document focuses primarily on the Python implementation
of the CE software on a stand-alone Windows or Linux based system. This
document does not cover design for other hardware/software platforms.
Specifically, this design covers the
following aspects of the CE:
The remainder of this document is organized as follows. Section 3 describes the CE user interface. Section 4 describes the CE’s software structure (architecture) and interaction of primary components. Section 5 describes the methods to be used in the primary procedures. Section 6 provides suggestions as to how the software could be upgraded in subsequent releases. Section 7 describes how the CE could be integrated into other packages.
The computation engine is responsible for estimating the cycles, computational time, energy, program memory, and data memory which would be used by a specified parameterized waveform component when implemented on a specified DSP. The design of the CE has the following primary goals:
· Accuracy – when possible, the CE should give the best possible estimate of the desired design parameters
· Extensibility – the CE should support the use of components and DSPs not envisioned when first created.
· Reusability – code which specifies a particular component should be able to be readily reused for different waveforms with a minimal impact to the user.
The basic operation of the computational engine is illustrated in Figure 2. Upon initialization, the engine evaluates the number of input arguments present. If 5 arguments are being passed, the engine operates in its normal mode wherein another program or a user has called the computation engine with files which specify the component, DSP, parameters for the particular estimation, the output file and a flag which indicates if the results should be echoed to the screen. When an argument is not assigned by the caller, the engine assigns the following default names for these files:
The engine then reads in the associated input files (component, DSP,
parameter), performs its calculation routines and prints the results to the
specified output file. The expected format for these files is documented in
Section 3. Throughout this process, the engine tests for failure modes which may prevent it from successfully executing. Failures and their descriptions are written to the output file to provide detailed feedback to the entity which called the engine. A listing of failure strings and their meanings is tabulated in Section 3.

Figure 2: Basic Operation of Computational
Engine
To facilitate the goal of extensibility, many different components and DSPs must be supported beyond initial specifications. Thus the CE does not maintain an internal compiled library of components and DSPs, but rather processes specific components and DSPs when called. As there is great variation between how components are implemented and how they map to different DSPs, a different set of calculations effectively needs to be performed for every component-DSP pair. In general, the calculation which is most involved and most sensitive to variations in DSPs and components is the estimate of cycles. To mitigate this sensitivity, each component file specifies a number of different equations which the engine parses and evaluates, and which collectively produce the appropriate estimates. Loosely, the engine makes an estimate of the total number of operations required to support the desired component independent of the DSP and then forms a cycle estimate by determining how many of these operations could be performed in parallel. Once armed with a cycle estimate, estimates of processing time, power consumption, and memory accesses are relatively trivial. A more detailed explanation of this calculation routine is given in Section 5.
Another goal of the engine is to support code (component file/DSP file) reuse to minimize the burden placed on the user. To support this goal, the equations used in the component files are variable-specified equations, e.g., x + 2*(y-x) where x = 3, y = 4, which evaluates to 5. In this manner, the cycles required to implement a component can be parameterized for many different waveforms. This approach allows the same component code to be reused across waveforms and DSPs with the user only specifying a handful of parameters at run-time (e.g., filter length or code rate).
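As a minimal sketch of this reuse, the same equation string can be evaluated against different parameter sets. The helper name evaluate is hypothetical and is not part of the CE; it simply relies on Python's built-in eval().

```python
def evaluate(equation, params):
    """Evaluate an arithmetic equation string using named parameter values.

    Hypothetical helper for illustration only. Emptying __builtins__ keeps
    names such as open() out of scope, but this is not a full sandbox.
    """
    return eval(equation, {"__builtins__": {}}, dict(params))


equation = "x + 2*(y-x)"

# The same equation string is reused with two different parameter sets.
first = evaluate(equation, {"x": 3, "y": 4})    # 3 + 2*(4-3)
second = evaluate(equation, {"x": 10, "y": 4})  # 10 + 2*(4-10)
```

Only the parameter values change between the two calls; the equation itself, as it would appear in a component file, is untouched.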
Such an approach requires the CE to be capable of reading and interpreting equations with variables (specified in the component file) and values (specified in the parameter file). This means that the most frequently called routine in the CE is the routine which evaluates these variable-specified equations. In fact, the vast majority of the equations evaluated by the CE are specified in this general fashion, with only a handful of calculations (e.g., SIMD cycle reduction, power estimates, processing time estimates) hard-coded into the engine. The method by which the CE performs this evaluation is explained in Section 5.
In general, any estimation which the engine makes based on the programmed specifications for the component and DSP will be inferior to actually hand-coding and measuring the result. So towards the goal of maximizing accuracy from available information, the engine also supports the use of estimations based on library code which has either been developed by the user or some third-party vendor for use on specific DSPs. Thus when the CE encounters a component/DSP pair for which a known library exists, the CE abandons its normal calculation routine and instead uses the equations supplied by the library vendor.
The following describes the input and output interfaces of the CE.
As currently defined, the CE is called via a file interface with the following arguments:
component_file_name
dsp_file_name
parameter_file_name
output_file_name
echo_flag
If no arguments are passed to the CE, then the following default values will be used.
component_file_name = sample_component.txt
dsp_file_name = sample_dsp.txt
parameter_file_name = sample_parameter.txt
output_file_name = sample_output.txt
echo_flag = 1
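The default handling can be sketched as follows. This is an illustration only: the function name resolve_arguments is hypothetical, and the sketch assumes that arguments are positional and that any trailing arguments which are not supplied take the defaults listed above.

```python
# Default values as listed above, in argument order.
DEFAULTS = [
    "sample_component.txt",
    "sample_dsp.txt",
    "sample_parameter.txt",
    "sample_output.txt",
    "1",
]


def resolve_arguments(argv):
    """Return the five CE arguments, filling in defaults for missing ones.

    Hypothetical helper: assumes positional arguments in the documented order.
    """
    supplied = list(argv[:len(DEFAULTS)])
    component, dsp, parameter, output, echo = supplied + DEFAULTS[len(supplied):]
    return component, dsp, parameter, output, int(echo)


no_args = resolve_arguments([])
two_args = resolve_arguments(["viterbi.txt", "c62.txt"])
```

In a script, this would typically be called as resolve_arguments(sys.argv[1:]).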
The following describes the required formatting for the component file, the dsp file, and the parameter file. The output file is described in Section 3.2.
A component file should be formatted as illustrated in Figure 3 and described in the following. Failure to adhere to this format will result in the CE returning a relevant error message.
Figure 3: Sample Component File
Component Name: xxxxx
Pseudo-code File: xxxxxxx
Precision: 16
Requirement: 2
Fixed Saturate
Num parameters: 3
Meaning:
a: input data length
b: constraint length
c: rate
Total Operations Equation
Memory + Arithmetic + ACS +
Num Operation Types: 5
Memory
5, 3 + a
Multiplication
5000, 10
Arithmetic
2*(1+b*c)*(a+b), 3*a + b
ACS
(1+b*c)*(a+b), c - b
(1+b*c)*(a+b), c+25*a
Subtractive Modifiers: 3
(ACS) ACS: 2*a*b, 3*c
(
(Arithmetic) MAC: -2*c, 2
Synergistic Modifiers: 1
(Memory), 2, ACS VLIW, target + a*b, 3*a
Library Code: 1
ID: TMS3206201
Code file: xxxxxx
Cycles: (2*b*c + a)
Data_Memory: (2*b*c - a)
Program_Memory: (3*b*c + a)
These fields have the following restrictions.
Component Name: A string without intervening whitespace characters. The string will be discarded by the CE, but will be useful for the GUI.
Pseudo-code File: A string without intervening whitespace characters. The string will be discarded by the CE, but will be useful for the GUI.
Precision: An integer specifying the native bit field width assumed by the component. This will be a requirement tested by the CE.
Requirement: An integer specifying processor requirements (other than precision) assumed by the component. This is followed on the next line by a whitespace delimited list of requirements of length equal to the specified integer.
Num parameters: An integer specifying the number of parameters (variables) used in the component’s equations.
Meaning: A list of strings on separate lines formatted as follows.
parameter_name: description string.
The engine is responsible for parsing this string to extract the parameter names and discard the descriptions (useful in the GUI). Values for these parameters will be passed to the engine in a different file.
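A minimal sketch of this parsing step is shown below. The function name parse_meaning is hypothetical; the rejection of "target" as a parameter name follows the reserved-word failure string listed in Table 1.

```python
def parse_meaning(lines):
    """Split 'parameter_name: description' lines into names and descriptions.

    Hypothetical helper. Raises ValueError with the Table 1 failure string
    if the reserved word 'target' is used as a parameter name.
    """
    names, descriptions = [], []
    for line in lines:
        name, _, description = line.partition(":")
        name = name.strip()
        if name == "target":
            raise ValueError("Reserved Word, 'target', used as parameter name")
        names.append(name)
        descriptions.append(description.strip())
    return names, descriptions


# The Meaning block from the sample component file in Figure 3.
names, descriptions = parse_meaning(
    ["a: input data length", "b: constraint length", "c: rate"]
)
```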
Total Operations Equation: A line which describes how the engine should evaluate its total operations in terms of its primary operations. The variable names used in this equation must be further specified as operation type names later in this file.
Num Operation Types: An integer which specifies the number of operation types which this component supports. This number must be at least 3. This is followed by a listing of operation type variable names with the operation equation and program memory equation following on the subsequent line separated by a comma, e.g.,
operation equation, program equation.
These equations must be parameterized in terms of parameters named in the Meaning field. Equations must be given for “Memory”, “Multiplication”, and “Arithmetic” though the equations for each may be “0”.
Subtractive Modifiers: An integer specifying the number of subtractive modifiers which will alter the results of the equations listed after the Num Operation Types field. This is followed by one line per subtractive modifier, each naming the target operation type, the DSP capability which enables the modifier, and the operations and program memory equations. Each subtractive modifier line is formatted as follows:
(target_operation_type_name) DSP_capability: operations_equation, program_equation
Synergistic Modifiers: An integer specifying the number of synergistic modifiers for the component. Synergistic modifiers are used for estimations which are dependent on the simultaneous presence of multiple modifiers which a DSP may have.
This integer will be followed by one line per synergistic modifier which defines the target operation type, the number of modifiers which must be present to use the synergistic modifier, the requisite modifiers themselves (white-space delimited), the modifier equation for operations, and the modifier equation for program memory. This is formatted as:
(target_operation_type_name), number, requisite_modifiers, operation_equation, program_equation
The synergistic modifiers are intended to either replace whatever calculation may have been performed previously for an operation type or to modify the existing result. This is supported via the keyword “target”. When the synergistic equation does not include the word “target”, the existing result for the operation type will be replaced. When “target” is included, the existing result will be substituted into the equation in place of the word “target”.
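As an illustration of the line format, the following sketch splits a synergistic modifier line into its five fields. The function name and the returned dictionary keys are hypothetical, and the sketch assumes that the equations themselves contain no commas.

```python
def parse_synergistic_line(line):
    """Split one synergistic modifier line into its five documented fields.

    Hypothetical helper; assumes the equations contain no commas.
    """
    target, number, requisites, op_eq, prog_eq = [
        field.strip() for field in line.split(",", 4)
    ]
    required = requisites.split()
    if len(required) != int(number):
        raise ValueError("requisite modifier count does not match the stated number")
    return {
        "target": target.strip("()"),
        "required": required,
        "operation_equation": op_eq,
        "program_equation": prog_eq,
    }


# The synergistic modifier line from the sample component file in Figure 3.
parsed = parse_synergistic_line("(Memory), 2, ACS VLIW, target + a*b, 3*a")
```

Here the operation equation contains "target", so it would modify the existing Memory estimate, while the program memory equation does not and would therefore replace the existing value.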
Library Code: An integer which specifies the number of DSPs for which library code is known and should be used for a more accurate estimation. This is followed by a listing of known library codes formatted as follows:
ID: A text string giving the name of a DSP to which the library code belongs.
Code file: A text string giving the name of the file which documents the library code. This will be discarded by the engine.
Cycles: An equation for the number of cycles required by the DSP as parameterized by the variables listed in the Meaning field.
Data_Memory: An equation for the amount of data memory words required by the DSP as parameterized by the variables listed in the Meaning field.
Program_Memory: An equation for the amount of program memory words required by the DSP as parameterized by the variables listed in the Meaning field.
A DSP file should be formatted as illustrated in Figure 4 and described in the following. Failure to adhere to this format will result in the CE returning a relevant error message.
Figure 4: Sample DSP File
DSP ID: TMS3206202
Number Capabilities: 2
Saturate Fixed
Native Precision: 32
Clock Rate (MHz): 200
Peak Power (mW): 2500
Cycle Dividers
VLIW_flag: 1
VLIW_max: 5
VLIW Memory: 2
VLIW
VLIW Mult: 2
SIMD_flag: 1
Min Precision: 16
Cycle Modifiers
Number modifiers: 4
Circular_Addressing
Bit_Reversal_Addressing
ACS
ZOL
Each of these fields has the following restrictions.
DSP ID: A string giving the name of the DSP. This is used by the engine when evaluating if a component can make use of known library code in its estimations.
Number Capabilities: An integer specifying the number of capabilities the DSP has. If non-zero, this will be followed on the next line by a series of white-space delimited strings listing the DSP capabilities. These will be used by the engine to evaluate if a DSP can support the component as designed.
Native Precision: An integer which describes the native bit-field width of the processor. This is used in SIMD calculations and to evaluate if a DSP can support the component as designed.
Clock Rate (MHz): A double which gives the clock rate of the DSP. This is used by the engine to calculate execution time.
Peak Power (mW): A double which gives the peak power consumption of the DSP. This is used by the engine to calculate energy consumption.
VLIW_flag: An integer taking the value 0 or 1 which indicates if the DSP should be considered a VLIW processor. VLIW indicates the ability to support the issuance and execution of multiple simultaneous instructions.
VLIW_max: An integer which specifies the maximum number of simultaneous instructions a DSP can issue. If VLIW_flag = 0, this value is discarded.
VLIW Memory: An integer which specifies the number of simultaneous data memory operations which the DSP supports. For the purposes of the CE read and write operations are treated as equivalent.
VLIW
VLIW Mult: An integer which specifies the number of units capable of multiplication present on a DSP. A MAC unit counts as a multiplier unit, not an ALU unit.
SIMD_flag: An integer taking the value 0 or 1 which indicates if the DSP should be considered a SIMD processor. SIMD indicates the ability to simultaneously operate on multiple smaller words packed into a word of the native precision, for instance, performing two 16-bit multiplications with two 32-bit words.
Min Precision: The smallest word for which the DSP can exploit SIMD.
Cycle Modifiers: A list of strings used to characterize modifiers the DSP has. This is used by the CE to determine when an equation should be modified for a particular DSP.
A parameter file should be formatted as illustrated in Figure 5 and described in the following. Failure to adhere to this format will result in the CE returning a relevant error message.
Figure 5: Sample Parameter File
Available Execution Time: 4
a: 20
b: 3
c: 5
Each of these fields has the following restrictions.
Available Execution Time: A double which describes the time available to execute a component. This is discarded by the CE, but is used by the GUI.
This is followed by a list of the parameters and their values. All parameter names must conform to the parameters given in the component file and there must be an equal number of parameters in the parameter file and the component file. Parameter names will be followed by a colon, white-space and then a double giving the parameter’s value. Each parameter appears on a new line.
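The parameter file rules above can be sketched as follows. The function name parse_parameter_lines is hypothetical. The first error string is taken from Table 1; the second is an assumed message, since Table 1 does not list a failure string for a parameter count mismatch.

```python
def parse_parameter_lines(lines, component_names):
    """Return (available_time, values) from the lines of a parameter file.

    Hypothetical helper operating on a list of text lines rather than a file.
    """
    _, _, time_text = lines[0].partition(":")
    available_time = float(time_text)

    values = {}
    for line in lines[1:]:
        if not line.strip():
            continue  # tolerate blank lines
        name, _, value_text = line.partition(":")
        name = name.strip()
        if name not in component_names:
            raise ValueError("Invalid Parameter Name in Parameter File")
        values[name] = float(value_text)

    if len(values) != len(component_names):
        # Assumed message: no matching failure string appears in Table 1.
        raise ValueError("Parameter count differs from component file")
    return available_time, values


# The sample parameter file from Figure 5.
available_time, values = parse_parameter_lines(
    ["Available Execution Time: 4", "a: 20", "b: 3", "c: 5"],
    ["a", "b", "c"],
)
```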
The file output by the engine has two different output formats depending on the success of evaluating the component on the engine. If the component could be successfully implemented on the DSP, the output file will appear as shown in Figure 6. Each string will be followed by a double giving the associated estimation of the desired parameter (cycles, time, data memory, program memory, energy).
Figure 6: Example Output File when Successful
Success
Estimated_Cycles: xxx
Estimate_Time: xxx
Estimated_Data_Memory: xxx
Estimated_Program_Memory: xxx
Estimated_Energy: xxx
If unsuccessful, the routine will return a value of -1 and the output file will appear as shown in Figure 7 (assuming an output file was specified). The first string will be “Failure:” followed, after a white-space, by a string describing the failure. The various failure strings and their meanings are described in Table 1.
Figure 7: Example Output File when Unsuccessful
Failure: failure string
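The two output formats can be sketched as a single formatting routine. The function name format_output and the dictionary keys are hypothetical; the labels are copied from Figure 6 as shown, and the sketch returns the text rather than writing a file.

```python
# Labels as they appear in Figure 6, paired with hypothetical result keys.
OUTPUT_LABELS = (
    ("Estimated_Cycles", "cycles"),
    ("Estimate_Time", "time"),
    ("Estimated_Data_Memory", "data_memory"),
    ("Estimated_Program_Memory", "program_memory"),
    ("Estimated_Energy", "energy"),
)


def format_output(results=None, failure=None):
    """Return the output file text for a successful or failed estimation."""
    if failure is not None:
        return "Failure: %s\n" % failure
    lines = ["Success"]
    for label, key in OUTPUT_LABELS:
        lines.append("%s: %s" % (label, float(results[key])))
    return "\n".join(lines) + "\n"


failed = format_output(failure="Insufficient Precision")
succeeded = format_output(results={
    "cycles": 100, "time": 1e-4, "data_memory": 64,
    "program_memory": 128, "energy": 1e-4,
})
```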
Table 1: Failure Strings Returned by the CE in the Output File
| Failure String | Significance |
| --- | --- |
| Reserved Word, 'target', used as parameter name | ‘target’ was listed as a parameter which is forbidden |
| Insufficient Precision | Component’s precision > DSP’s precision |
| DSP lacks a requirement | Component has a requirement for which the DSP does not have a matching capability |
| No Memory Classification | No Operation Type with the name “Memory” was found |
| Invalid Parameter Name in Parameter File | Parameter in the parameter file not found in component file |
| Operations List Lacks Memory | No Operation Type with the name “Memory” was found |
| Operations List Lacks Multiplication | No Operation Type with the name “Multiplication” was found |
The following presents a structural view of the PCET engine as implemented in Python in terms of the software structure and the file structure. Further details on the basic procedural flow, user interface, and design of specific functions are documented in Sections 2, 3, and 5, respectively.
The PCET engine as implemented in Python consists of the following objects (responsibilities):
Additionally, the following key routines are defined in the software:
Reviewing the basic operation of the computational engine illustrated in Figure 2, the EngineMain handles all tasks, except for the following tasks:
The PCET computational engine is implemented using the following files (responsibilities):
Each of these files implements the functions indicated and described in Table 2.
Table 2: Functional Breakdown by File
| File | Functions (arguments) | Purpose |
| --- | --- | --- |
| engine.py | StringInList(num_to_search, string_list, target_string) | Finds index corresponding to a target_string in a string_list |
| | EngineMain(fileWave, fileProc, filePara, outputFileName) | Attempts to map the component described in fileWave onto the DSP described in fileProc using the component parameters stored in filePara. Results of this mapping are stored in outputFileName |
| engine_calculate.py | StringInList(num_to_search, string_list, target_string) | Finds index corresponding to a target_string in a string_list |
| | Calculate(EC, DSP, params) | Calculates the cycles, memory and energy usage based on the component information stored in the Estimator Component (EC), DSP (DSP), and params. |
| | EvaluateString(input_string, L, var_names, var_values) | Evaluates a string (equation) which contains variables by substituting in passed in values for the variable names. Assumes that var_values[index] corresponds to var_names[index] |
| | AdjustVLIW(EC, DSP, cycle_classes) | Performs the VLIW calculations for the calculation routine. Initial values are read from, and results are stored in, cycle_classes. |
| engine_dsp.py | ClassDSP::__init__(self, filename) | Initializes a ClassDSP object using the data in filename (see Section 3.1.2) |
| | ClassDSP::has_capability(self, s) | Checks to see if ClassDSP object has a string in its capability_list which matches the string s. Returns true if successful, false otherwise. |
| | ClassDSP::has_modifier(self, s) | Checks to see if ClassDSP object has a string in its subtractive_modifiers list which matches the string s. Returns true if successful, false otherwise. |
| engine_para.py | ClassPara::__init__(self, filename) | Initializes a ClassPara object using the data in filename (see Section 3.1.3) |
| engine_wave.py | ClassWave::__init__(self, filename) | Initializes a ClassWave (component) object using the data in filename (see Section 3.1.1) |
The engine.py and engine_calculate.py files both depend on importing (including) other files from the computational engine. The dependencies between all files and Python modules in the computational engine can be visualized in the dependency structure matrix shown in Table 3, where an x in a cell indicates that the file named in the cell’s row depends on the file named in the cell’s column. An empty cell indicates that no dependency relationship exists between the files.
Table 3: File Dependency Structure Matrix
| | os | string | math | engine_dsp.py | engine_para.py | engine_wave.py | engine_calculate.py | engine.py |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| os | | | | | | | | |
| string | | | | | | | | |
| math | | | | | | | | |
| engine_dsp.py | x | x | | | | | | |
| engine_para.py | x | x | | | | | | |
| engine_wave.py | x | x | | | | | | |
| engine_calculate.py | | | x | x | x | x | | |
| engine.py | x | | | x | x | x | x | |
The following are key methods to be developed in this project and documented in the following subsections:
Other methods (such as File I/O) are considered to be sufficiently documented by the user interface description in Section 3.
This method is implemented in the file engine_calculate.py in the function Calculate.
This is the primary computational method of the program. It returns true if successful and false if it fails. If successful it sets the following values:
double cycles
double computation_time
double data_memory
double prog_memory
double energy
The basic flow of the
component estimation routine is shown in Figure 8. The process begins by first verifying that the target
DSP satisfies all requirements specified by the component (specifically listed
requirements and precision). If successful, the process then checks to see if
the component has a library file for the target DSP, thereby enabling
cycle-accurate estimations when known. If a library is known for the target
DSP, the associated equations for cycles, data memory, and program memory are
then evaluated.
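The first two decisions in this flow can be sketched as follows. The function names and argument layout are hypothetical; the failure strings are those listed in Table 1, and the example values are taken from the sample component and DSP files in Figures 3 and 4.

```python
def check_requirements(component_precision, component_requirements,
                       dsp_precision, dsp_capabilities):
    """Return None if the DSP can host the component, else a failure string."""
    if component_precision > dsp_precision:
        return "Insufficient Precision"
    for requirement in component_requirements:
        if requirement not in dsp_capabilities:
            return "DSP lacks a requirement"
    return None


def find_library(libraries, dsp_id):
    """Return the library equations for dsp_id, or None if none are known."""
    return libraries.get(dsp_id)


# Component from Figure 3: precision 16, requirements Fixed and Saturate.
# DSP from Figure 4: precision 32, capabilities Saturate and Fixed.
failure = check_requirements(16, ["Fixed", "Saturate"], 32, ["Saturate", "Fixed"])

# The component lists library code for TMS3206201 only.
libraries = {"TMS3206201": {"Cycles": "(2*b*c + a)",
                            "Data_Memory": "(2*b*c - a)",
                            "Program_Memory": "(3*b*c + a)"}}
library = find_library(libraries, "TMS3206202")
```

For this pair the requirements check passes, but no library is known for the target DSP, so the longer procedure would be used.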
If no library is known,
a longer procedure is required. First, the process evaluates the Operation Type
equations to make a raw estimate of operations.
Modifiers
The process then runs through the component’s list of known modifiers. For each modifier in the list which the DSP is capable of implementing, the engine evaluates the modifier’s equation and subtracts the result from the target operation type estimate. This is then repeated for program memory. This step models the effect of specialized circuits in DSPs, such as MAC (Multiply-and-Accumulate) units and single-cycle butterfly circuits, which permit multiple operations in a single cycle.
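A minimal sketch of the subtractive step for operations is given below, using values from the sample files in Figures 3, 4, and 5. The function name and the tuple layout of the modifier list are hypothetical.

```python
def apply_subtractive(operations, modifiers, dsp_modifiers, params):
    """Subtract each supported modifier's result from its target operation type.

    operations: dict of operation type name -> raw estimate
    modifiers: list of (target_type, dsp_capability, equation) tuples
    """
    adjusted = dict(operations)
    for target, capability, equation in modifiers:
        if capability in dsp_modifiers:
            adjusted[target] -= eval(equation, {"__builtins__": {}}, dict(params))
    return adjusted


params = {"a": 20, "b": 3, "c": 5}
raw = {
    "Arithmetic": 2 * (1 + 3 * 5) * (20 + 3),  # 2*(1+b*c)*(a+b)
    "ACS": (1 + 3 * 5) * (20 + 3),             # (1+b*c)*(a+b)
}
modifiers = [("ACS", "ACS", "2*a*b"), ("Arithmetic", "MAC", "-2*c")]
dsp_modifiers = ["Circular_Addressing", "Bit_Reversal_Addressing", "ACS", "ZOL"]

adjusted = apply_subtractive(raw, modifiers, dsp_modifiers, params)
```

The sample DSP lists ACS but not MAC among its modifiers, so only the ACS estimate changes. The same loop would be repeated with the program memory equations.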
SIMD
The process then checks
to see if the DSP supports SIMD (Single-Instruction-Multiple-Data). When a DSP
supports SIMD, it is capable of processing several smaller packed words inside
of its native word width. For example a component designed for 16-bit precision
implemented on a 32-bit SIMD DSP could pack 2 16-bit words into each operation.
To model this effect, the process checks to ensure that the DSP’s precision is
greater than the component’s precision, and if so, divides all operation type
cycle estimates and program memory estimates by the ratio of the DSP’s
precision divided by the larger of the DSP’s minimum precision and the
component precision. This can be expressed as the following equation.
Operations = Operations / floor(DSP_precision / max(DSP_min_precision, component_precision))
Program Memory = Program Memory / floor(DSP_precision / max(DSP_min_precision, component_precision))
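The SIMD divisor can be sketched as a small function. The name simd_divisor is hypothetical; the guard conditions follow the text above (the DSP must support SIMD and its precision must exceed the component’s precision).

```python
def simd_divisor(dsp_precision, dsp_min_precision, component_precision, simd_flag):
    """Return the factor by which operation and program memory estimates are divided."""
    if not simd_flag or dsp_precision <= component_precision:
        return 1  # no SIMD benefit
    # floor(DSP_precision / max(DSP_min_precision, component_precision))
    return dsp_precision // max(dsp_min_precision, component_precision)


# 16-bit component on the 32-bit SIMD DSP of Figure 4 (Min Precision 16).
divisor = simd_divisor(32, 16, 16, 1)
acs_operations = 368 / divisor

# An 8-bit component cannot pack below the DSP's minimum precision of 16.
divisor_8bit = simd_divisor(32, 16, 8, 1)

# A 24-bit component does not fit twice into a 32-bit word.
divisor_24bit = simd_divisor(32, 16, 24, 1)
```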

Figure 8: Basic Flow of Calculation Process
Memory
The process then calculates the required data memory. Data memory is given by the number of remaining operations of type “Memory” multiplied by the DSP’s bit field width. This step can be expressed as the following equation.
Data Memory = Memory cycles (before VLIW) x DSP Bit Field Width
VLIW
Note that these memory estimates are performed before applying adjustments for VLIW (Very-long instruction word). This is because VLIW is the process of simultaneously dispatching and executing multiple instructions and does not alter the number of instructions used in the implementation nor the number of accesses to data memory. However, VLIW does reduce the number of cycles required to implement a component by facilitating the simultaneous execution of multiple operations. Because the impact of VLIW is highly dependent on the types of instructions used by a component, the numbers and types of functional units available on the DSP, and the maximum number of simultaneously dispatchable instructions, adjusting cycles for VLIW is a complicated process and is addressed in detail in Section 5.2.
Synergistic Effects
After adjusting the estimated operations for VLIW, the process then considers the synergistic effects which occur when combinations of modifiers are present. These modifiers may include the component’s specified modifier list as well as whether the DSP uses SIMD and/or VLIW. For each synergistic modifier possibility, there is a set of required modifiers, a target operation type, and equations for adjusting the cycle and program memory values in that operation type. When a DSP satisfies all of the required modifiers, the associated synergistic equations are evaluated.
Unlike other equations, synergistic equations support one variable name in addition to the parameter values defined in the component: “target”. When “target” is not present, the synergistic equation replaces the value in the operational type estimate. When “target” is present, the previous value for the operational type is substituted into the equation, which is then evaluated. Note that when multiple synergistic equations target the same operational type, they are evaluated in the order listed in the component file.
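The ordered evaluation and the two behaviors of “target” can be sketched as follows. The function name and the (required_modifiers, equation) tuple layout are hypothetical, and the starting estimate of 5 is an arbitrary illustrative value.

```python
def apply_synergistic(estimate, modifiers, dsp_modifiers, params):
    """Apply synergistic equations, in listed order, to one operation type estimate.

    modifiers: list of (required_modifiers, equation) tuples for a single
    target operation type, in component file order.
    """
    for required, equation in modifiers:
        if all(name in dsp_modifiers for name in required):
            # "target" carries the previous estimate; an equation which omits
            # it simply replaces the estimate with its own result.
            names = dict(params, target=estimate)
            estimate = eval(equation, {"__builtins__": {}}, names)
    return estimate


params = {"a": 20, "b": 3, "c": 5}
dsp_modifiers = {"ACS", "VLIW", "ZOL"}

# Two equations targeting the same type are chained in order: (5 + 60) * 2.
chained = apply_synergistic(
    5, [(["ACS", "VLIW"], "target + a*b"), (["ACS", "VLIW"], "2*target")],
    dsp_modifiers, params)

# An equation without "target" replaces the previous estimate.
replaced = apply_synergistic(5, [(["ACS", "VLIW"], "3*a")], dsp_modifiers, params)

# A missing requisite modifier leaves the estimate untouched.
unchanged = apply_synergistic(5, [(["ACS", "MAC"], "3*a")], dsp_modifiers, params)
```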
Cycle Estimate
After evaluating the synergistic modifiers, a cycle estimate is formed by summing all operation types corresponding to cycle estimates.
Program Memory Estimate
After evaluating the synergistic modifiers, a program memory estimate is formed by summing all operation types corresponding to program memory estimates. This is then multiplied by the DSP Bit Field Width.
Final Calculations
The process concludes by calculating the required computation time and energy consumption. Computation time is given by the number of cycles divided by the DSP clock rate (equivalently, multiplied by the clock period), or
computation_time = cycles / DSP clock rate
For example, 100 cycles at a clock rate of 1 MHz requires 100 * 1e-6 s = 100 us.
Energy is calculated as the peak DSP power consumption multiplied by the estimated computation time. This formulation operates under the assumption that while the DSP is in calculation, it will be performing as many operations as it possibly can.
energy = computation_time x Peak DSP power consumption
For example, 100 us on a DSP which has a peak power consumption of 1000 mW is 100 us * 1000 mW = 100 uJ.
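The two worked examples can be reproduced with the sketch below. The function names are hypothetical; the units follow the DSP file fields (clock rate in MHz, peak power in mW), and the results are expressed in seconds and joules.

```python
def computation_time_s(cycles, clock_rate_mhz):
    """Time in seconds: cycles multiplied by the clock period."""
    return cycles / (clock_rate_mhz * 1e6)


def energy_j(time_s, peak_power_mw):
    """Energy in joules, assuming the DSP draws peak power while computing."""
    return time_s * (peak_power_mw * 1e-3)


time_s = computation_time_s(100, 1)   # 100 cycles at 1 MHz, about 100 us
energy = energy_j(time_s, 1000)       # 100 us at 1000 mW, about 100 uJ
```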
This method is implemented in the file engine_calculate.py in the function AdjustVLIW.
When adjusting the estimated cycles under VLIW, the system should have a vector of cycle estimations with each element of the vector associated with one of the declared Operational Types. Two elements of this vector must be “Memory” and “Multiplication”. This process classifies the remaining operations as “Other”, which loosely corresponds to operations which would be expected to be performed by Arithmetic Logic Units (ALUs).
As illustrated in Figure 9, the process begins by verifying that the target DSP supports VLIW operations (simultaneous execution of multiple instructions). If successful, it then identifies the cycles associated with the three broad classifications (Multiplication, Memory, and Other) and fetches the maximum number of VLIW instructions the DSP can dispatch (stored in d) and the maximum number of simultaneous memory, multiplication, and ALU (other) instructions (stored in d1, d2, d3, respectively). Each cycle class is then adjusted by dividing it by the smaller of the maximum number of instructions of all types and the maximum number of class-specific instructions.
A
3-bit flag vector (CF) is then formed by checking to see if each class of
cycles has nonzero cycles (true = 1, false = 0). The eight possible
combinations are then evaluated to determine if the DSP could simultaneously
support multiple classes of instructions. When no or only one class of cycle is
present (CF = 0,1,2,4), no further adjustments are made. When only two classes
of cycles are present (CF = 3,5,6), simultaneous execution of both classes is
checked for feasibility. If feasible, they are assumed to be implemented
simultaneously, thereby reducing the required cycles by a factor of 2.

Figure 9: VLIW Adjustment Calculation
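The per-class division and the formation of the CF flag vector can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the bit order of CF (Memory as the lowest bit) is assumed because the text does not fix it, each limit is assumed to be at least 1, and the limit for “Other” is an assumed value. The feasibility test for executing classes simultaneously is not specified here and is left out of the sketch.

```python
CLASS_ORDER = ("Memory", "Multiplication", "Other")  # assumed bit order for CF


def divide_by_issue_width(cycles, d, class_limits):
    """Divide each cycle class by min(d, class-specific limit).

    d: maximum simultaneous instructions (VLIW_max)
    class_limits: per-class limits (d1, d2, d3 in the text), assumed >= 1
    """
    return {name: cycles[name] / min(d, class_limits[name]) for name in CLASS_ORDER}


def class_flags(cycles):
    """Pack the presence of each cycle class into the 3-bit flag vector CF."""
    return sum(1 << bit for bit, name in enumerate(CLASS_ORDER) if cycles[name] > 0)


cycles = {"Memory": 100, "Multiplication": 50, "Other": 0}
limits = {"Memory": 2, "Multiplication": 2, "Other": 1}  # "Other" limit is assumed

adjusted = divide_by_issue_width(cycles, 5, limits)
cf = class_flags(adjusted)
classes_present = bin(cf).count("1")
```

With two classes present, the routine would go on to check whether both can feasibly execute simultaneously.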
When all three classes of cycles are present (CF = 7) a more complicated process is required as shown in Figure 10. If all three classes can be feasibly implemented, they are assumed to be implemented simultaneously, thereby reducing the required cycles by a factor of 3. If this is not possible, then the routine attempts to find the two classes of cycles with the largest number of cycles present. The largest feasible pair of classes is assumed to be implemented simultaneously, thereby reducing the required cycles for those classes by a factor of 2.

Figure 10: VLIW processing for the case
where Memory, Multiplication, and
This method is implemented in the file engine_calculate.py in the function EvaluateString.
This method is responsible for evaluating a string that corresponds to one of the equations used in the component definition. It is intended to support the run-time evaluation of equations with arbitrary variable names and values. The basic structure of the function takes the following form:
double = EvaluateString(string, num_variables, variable_names, variable_values)
This method evaluates an input string. It parses the input string, replaces the variables identified in the variable_names list with the values (doubles) in variable_values, and evaluates the resulting equation. This is quite complicated to implement in C++ but is trivial in Python using the built-in eval() function. Passing in num_variables makes it possible to substitute only a subset of the variables, which simplifies the handling of synergistic equations.
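A minimal sketch of how such a function might look in Python is shown below. It is not the code in engine_calculate.py: the lower-case name evaluate_string, the choice to bind variables through an eval() namespace rather than by textual replacement, and the set of math helpers exposed to equations are all assumptions made for illustration.

```python
import math


def evaluate_string(expression, num_variables, variable_names, variable_values):
    """Evaluate an equation string using the first num_variables bindings."""
    # Math helpers made available to equations (an assumed set).
    namespace = {"log2": math.log2, "sqrt": math.sqrt, "ceil": math.ceil}
    # Bind only the first num_variables names, so a caller can substitute
    # a subset of the variables.
    for name, value in zip(variable_names[:num_variables],
                           variable_values[:num_variables]):
        namespace[name] = float(value)
    # Emptying __builtins__ limits what the expression can reach, but it is
    # not a security boundary; equations should come from trusted files.
    return float(eval(expression, {"__builtins__": {}}, namespace))


result = evaluate_string("N * log2(N) + taps", 2,
                         ["N", "taps", "unused"], [1024, 16, 99])
print(result)
```

Because eval() executes arbitrary expressions, this approach is only appropriate when the component files supplying the equations are trusted.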
The PCET computational engine can be readily extended to support additional DSPs and components, FPGAs, different methods for interfacing with the program, and new calculations.
The PCET computational engine was designed to facilitate the estimation of computational resources for DSPs and components not included in its initial deployment. These can be created simply by following the file formats described in Sections 3.1.1 and 3.1.2 to describe the new components and DSPs in a manner the engine understands.
The suggested method for creating a new component file is the following.
The suggested method for creating a new DSP file is the following.
Adding support for FPGAs to the calculation engine will not be a straightforward process, as they use a radically different architecture than DSPs. FPGAs do, however, exhibit much less variation in processing fabric from one device to another, differing mostly in routing structures and in what is embedded into the fabric (some include multipliers, some have full processors, and some have co-processors such as DSPs). Ignoring routing, placement, and co-processor considerations, similar resources could be expected to be used when moving from one FPGA to another, but radically different results will be seen when different relative weights are given to area, speed, and memory in the optimization processes.
It seems likely that adding FPGAs will be sufficiently different that referring to the process as an “upgrade” would be misleading. However, it seems reasonable that the same basic input interface with the GUI could be preserved (specifying target processor, component, and parameters), though the output interface might have to change (to specify the fabric and embedded elements consumed in the mapping and to eliminate program memory).
In later incarnations, it may be desirable to change the current engine interface. In general, this will require modifying the main routine.
The main routine can be eliminated altogether if the component/DSP files are hard coded as Python objects. The interfacing entity would then act like the main routine in passing the objects to the calculation routine.
Separate routines are included to handle the component, DSP, and parameter files. Changing these file formats requires that the corresponding routines be modified. Support for multiple file formats could be added by implementing new file handling routines for the respective data sources and then either adding an argument to the main routine to specify the file formats or writing a master file handling routine for each file type that autodetects the format in use.
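The two options above (an explicit format argument or autodetection) can be combined in one master routine, as in the following sketch. Everything here is hypothetical: the two formats ("keyvalue" and "json"), the loader names, and the detection rule are invented for illustration and do not describe the engine's actual file formats.

```python
import json


def load_keyvalue(text):
    """Parse hypothetical 'name = value' lines, skipping blanks and # comments."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip()
    return result


def load_json(text):
    """Parse a hypothetical JSON parameter file."""
    return json.loads(text)


# Registry of file handling routines, one per supported format.
LOADERS = {"keyvalue": load_keyvalue, "json": load_json}


def detect_format(text):
    """Guess the format from the first non-blank character (assumed rule)."""
    return "json" if text.lstrip().startswith("{") else "keyvalue"


def load_parameters(text, file_format=None):
    """Master routine: use the given format, or autodetect when omitted."""
    fmt = file_format or detect_format(text)
    return LOADERS[fmt](text)


print(load_parameters('{"N": 1024}'))
print(load_parameters("# sample\nN = 1024\n"))
```

Supporting a new format then amounts to writing one loader and registering it in the table, leaving the main routine untouched.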
While the engine is designed to operate as a stand-alone package, there are two relatively straightforward methods for integrating the engine into other packages.
The engine can be easily integrated into any external package by using the main engine interface. In such a scenario the external package calls the engine while specifying the component, DSP, and parameter files the engine should use for its estimation and then reads the resulting output file.
This can be accomplished by using the input interface specified in Section 3.1 and calling a compiled version of the engine or by directly calling the main routine from within the Python environment. This approach requires the least understanding of the inner workings of the engine.
Python-based external packages can interface more directly with the engine code and can in fact directly call the component, DSP, parameter, and calculation routines. When speed is of critical importance (e.g., when real-time estimates are needed), directly calling these routines would be the preferred method.
In such a setup, the external package should maintain component, DSP, and parameter objects instead of component, DSP, and parameter files, directly call the calculation routine, and then access the results stored in the component object.
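The calling pattern for this direct approach is sketched below. The Component and DSP classes and the calculate function are simplified stand-ins written for this example; they are not the engine's real classes or signatures, and the single cycle equation and clock rate are invented values. The point is only the pattern: hold objects in memory, call the calculation routine, and read the results from the component object.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical stand-in for the engine's component object."""
    name: str
    cycle_equation: str
    results: dict = field(default_factory=dict)


@dataclass
class DSP:
    """Hypothetical stand-in for the engine's DSP object."""
    name: str
    clock_hz: float


def calculate(component, dsp, parameters):
    """Stand-in calculation routine: stores its results on the component."""
    cycles = float(eval(component.cycle_equation,
                        {"__builtins__": {}}, dict(parameters)))
    component.results["cycles"] = cycles
    component.results["seconds"] = cycles / dsp.clock_hz


# The external package keeps the objects alive and re-runs the calculation
# with new parameters, with no file input or output in the loop.
fir = Component("fir_filter", "taps * samples")
dsp = DSP("example_dsp", clock_hz=200e6)
for taps in (16, 32, 64):
    calculate(fir, dsp, {"taps": taps, "samples": 1000})
    print(taps, fir.results["cycles"], fir.results["seconds"])
```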
A GUI will typically be responsible for:
Either integration approach could be used, but because of the non-real-time nature and the possibly much greater set of DSPs that would need to be managed than would be expected in real-time applications embedded on a radio, the simpler file interface method appears preferable.
Consider a cognitive SDR speculatively trying different parameterizations as guided by a genetic algorithm. To evaluate the fitness of various solutions, these novel waveform parameterizations are tested in internal models and the results measured. In this case, the rapid estimation of the PCET computational engine makes it an ideal model for estimating the impact of possible adaptations on power consumption and the feasibility of waveforms generated by the cognitive engine. Because of the critical requirement for speed (needed to track and exploit rapidly changing operating conditions), this would likely need to be done by directly managing engine classes.
However, the inherent speed limitations of Python imply that the engine will likely need to be ported to a different language more amenable to embedded processing to make it suitable for integration with deployed cognitive engines.
The required execution time for a waveform partitioner falls somewhere between that of a design GUI and that of an embedded cognitive engine. When the partitioner is part of a design tool, its operation is analogous to that of the GUI, except that the set of available processors is predefined. In such a case, the main file-handling interface should be used to minimize development time. In an embedded system, execution time is more critical than code-development time, so interfacing with the engine by directly managing engine classes seems more appropriate, especially since the potential DSPs will be largely unchanged from instantiation to instantiation and a unique parameterization for each component will already exist.
However, because of the generally long times required to switch between waveforms, this is likely not an absolute necessity.
[1] J. Neel, P. Robert, J. Reed, “A formal methodology for estimating the feasible processor solution space for a software radio,” SDR Forum Technical Conference 2005, paper # 1.2-03. Available online: http://www.sdrforum.org/pages/sdr05/1.2%20Reconfigurable%20Hardware/1.2-03%20Neel%20et%20al.pdf