99 resources related to Customizable Processors
The world's premier EDA and semiconductor design conference and exhibition. DAC features over 60 sessions on design methodologies and EDA tool developments, keynotes, panels, plus the NEW User Track presentations. A diverse worldwide community representing more than 1,000 organizations attends each year, from system designers and architects, logic and circuit designers, validation engineers, CAD managers, senior managers and executives to researchers and academicians from leading universities.
2020 IEEE International Conference on Industrial Technology (ICIT)
ICIT focuses on industrial and manufacturing applications of electronics, controls, communications, instrumentation, and computational intelligence.
2020 IEEE International Symposium on Circuits and Systems (ISCAS)
The International Symposium on Circuits and Systems (ISCAS) is the flagship conference of the IEEE Circuits and Systems (CAS) Society and the world’s premier networking and exchange forum for researchers in the highly active fields of theory, design and implementation of circuits and systems. ISCAS2020 focuses on the deployment of CASS knowledge towards Society Grand Challenges and highlights the strong foundation in methodology and the integration of multidisciplinary approaches which are the distinctive features of CAS contributions. The worldwide CAS community is exploiting such CASS knowledge to change the way in which devices and circuits are understood, optimized, and leveraged in a variety of systems and applications.
All areas of ionizing radiation detection - detectors, signal processing, analysis of results, PET development, PET results, medical imaging using ionizing radiation
The ICASSP meeting is the world's largest and most comprehensive technical conference focused on signal processing and its applications. The conference will feature world-class speakers, tutorials, exhibits, and over 50 lecture and poster sessions.
Video A/D and D/A, display technology, image analysis and processing, video signal characterization and representation, video compression techniques and signal processing, multidimensional filters and transforms, analog video signal processing, neural networks for video applications, nonlinear video signal processing, video storage and retrieval, computer vision, packet video, high-speed real-time circuits, VLSI architecture and implementation for video technology, multiprocessor systems--hardware and software-- ...
Methods, algorithms, and human-machine interfaces for physical and logical design, including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, and documentation of integrated-circuit and systems designs of all complexities. Practical applications of aids resulting in producible analog, digital, optical, or microwave integrated circuits are emphasized.
Design and analysis of algorithms, computer systems, and digital networks; methods for specifying, measuring, and modeling the performance of computers and computer systems; design of computer components, such as arithmetic units, data storage devices, and interface devices; design of reliable and testable digital devices and systems; computer networks and distributed computer systems; new computer organizations and architectures; applications of VLSI ...
The design and manufacture of consumer electronics products, components, and related activities, particularly those used for entertainment, leisure, and educational purposes
IEEE Embedded Systems Letters seeks to provide a forum for quick dissemination of research results in the domain of embedded systems, with a target turnaround time of no more than three months. The journal is published quarterly and consists of new, short, critically refereed technical papers. Submissions are welcome on any topic in the broad area of embedded systems ...
2007 Design, Automation & Test in Europe Conference & Exhibition, 2007
2009 46th ACM/IEEE Design Automation Conference, 2009
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2015
2007 Design, Automation & Test in Europe Conference & Exhibition, 2007
2012 IEEE 30th International Conference on Computer Design (ICCD), 2012
High Throughput Neural Network based Embedded Streaming Multicore Processors - Tarek Taha: 2016 International Conference on Rebooting Computing
Raspberry Pi High Speed SerDes Characterization Platform
Challenges and Opportunities of the NISQ Processors (Noisy Intermediate Scale Quantum Computing) - 2018 IEEE Industry Summit on the Future of Computing
Architecture and Dissipation: Thermodynamic Costs of General Purposeness in von Neumann Processors: IEEE Rebooting Computing 2017
Quantum Annealing: Current Status and Future Directions - Applied Superconductivity Conference 2018
2017 IEEE Donald O. Pederson Award in Solid-State Circuits: Takao Nishitani and John S. Thompson
Going Beyond Moore's Law: IEEE at SXSW 2017
What members say about IEEE Communications Society
Development of Quantum Annealing Technology at D-Wave Systems - 2018 IEEE Industry Summit on the Future of Computing
WIE ILC: Exception to Expectation: Women in Engineering
The Prospects for Scalable Quantum Computing with Superconducting Circuits - Applied Superconductivity Conference 2018
2011 IEEE/RSE Wolfson James Clerk Maxwell Award - Marcian E. Hoff
Customizable processors are being used increasingly often in SoC designs. During the past few years, they have proven to be a good way to solve the conflicting flexibility and performance requirements of embedded systems design. While their usefulness has been demonstrated in a wide range of products, a few challenges remain to be addressed: 1) Is extending a standard core template the right approach to customization, or is it preferable to design a fully customized core from scratch? 2) Is the automation offered by current toolchains, in particular generation of complex instructions and their reuse, enough for what users would like to see? 3) And when we look at the future with the increasing use of multi-processor SoCs, do we see a sea of identical customized processors, or a heterogeneous mix? We comment and elaborate here on these challenges and open questions.
The short time-to-market window for embedded systems demands automation of design methodologies for customizable processors. Recent research advances in this direction have mostly focused on single-criterion optimization, e.g., optimizing performance through custom instructions under a pre-defined area constraint. From the designer's perspective, however, it would be more interesting if the conflicting trade-offs among multiple objectives (e.g., performance versus area) were exposed, enabling informed decision making. Unfortunately, identifying the optimal trade-off points turns out to be computationally intractable. In this paper, we present a polynomial-time approximation algorithm to systematically evaluate the design trade-offs. In particular, we explore performance-area trade-offs in the context of multi-tasking real-time embedded applications to be implemented on a customizable processor.
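The multi-objective view described above is usually summarized as a Pareto front of non-dominated design points. A minimal sketch with hypothetical (speedup, area) candidates, not taken from the paper:

```python
def pareto_front(points):
    """Return the design points not dominated by any other point.

    A point (perf, area) dominates another if it is at least as good on
    both axes (higher perf, lower area) and not identical to it.
    """
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] >= p[0] and q[1] <= p[1]
            for q in points
        )
        if not dominated:
            front.append(p)
    return front

# Hypothetical (speedup, area-cost) candidates for custom-instruction sets
designs = [(1.2, 10), (1.5, 25), (1.4, 30), (2.0, 40), (1.8, 60)]
front = sorted(pareto_front(designs))
```

Every point off the front is matched or beaten on both axes by some point on it; exposing exactly this set is what the abstract means by enabling informed decision making.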
Automatic identification of custom instructions (CIs) is the process of automatically selecting, on the programmer's behalf, beneficial parts of the application source code that can then be synthesized and run on dedicated hardware. Identification is typically modeled as choosing a subgraph from a graph representing the application that has the highest speedup potential when implemented in custom hardware and that fulfills the constraints of convexity and of a given maximum number of inputs and outputs. Existing algorithms for CI identification either enumerate all the valid subgraphs under the constraints of convexity and I/O, or return the subset of all maximal valid subgraphs with respect to convexity only. The downside of the former approach is that enumerating all valid subgraphs is costly, especially for large values of the input and output constraints, even though we may be interested only in the subgraphs that obtain the best speedup. The latter approach, instead, may fail to find a feasible solution, since subgraphs that are valid with respect to convexity only can be too large to be useful. In this paper, we present a novel approach that attempts to fill the gap between the existing methods. In particular, we present an algorithm that enumerates the subset of all maximum valid subgraphs with respect to convexity and the number of inputs and outputs. Our method revisits and combines the existing approaches and yields an algorithm that is effective and outperforms the state of the art for large values of the input and output constraints.
This paper proposes a novel algorithm that, given a data-flow graph and an input/output constraint, enumerates all convex subgraphs under the given constraint in polynomial time with respect to the size of the graph. These subgraphs have been shown to represent efficient instruction-set extensions for customizable processors. The search space for this problem is inherently polynomial, but this is the first paper to prove this and to present a practical algorithm for the problem with polynomial complexity. The algorithm is based on properties of convex subgraphs that link them to the concept of multiple-vertex dominators. The paper discusses several pruning techniques that, without sacrificing the optimality of the algorithm, make it practical for data-flow graphs of a thousand nodes or more.
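To make the enumeration problem concrete, here is a brute-force sketch (exponential, unlike the paper's polynomial algorithm) that lists convex subgraphs of a toy data-flow graph under input/output constraints; the graph and constraint values are hypothetical:

```python
from itertools import combinations

def is_convex(graph, sub):
    """A subgraph is convex if no path between two of its nodes
    leaves the subgraph and re-enters it."""
    sub = set(sub)
    # Walk everything reachable from sub through nodes outside sub;
    # if that walk hits sub again, some path left and re-entered it.
    outside_reach = set()
    stack = [s for n in sub for s in graph[n] if s not in sub]
    while stack:
        n = stack.pop()
        if n in outside_reach:
            continue
        outside_reach.add(n)
        stack.extend(graph[n])
    return not (outside_reach & sub)

def io_counts(graph, sub):
    """Inputs: external nodes feeding the subgraph.
    Outputs: subgraph nodes feeding external nodes."""
    sub = set(sub)
    ins = {u for u in graph if u not in sub
           for v in graph[u] if v in sub}
    outs = {u for u in sub for v in graph[u] if v not in sub}
    return len(ins), len(outs)

def enumerate_convex(graph, max_in, max_out):
    nodes = list(graph)
    found = []
    for k in range(1, len(nodes) + 1):
        for sub in combinations(nodes, k):
            if is_convex(graph, sub):
                i, o = io_counts(graph, sub)
                if i <= max_in and o <= max_out:
                    found.append(frozenset(sub))
    return found

# Tiny hypothetical data-flow graph: a -> b -> d, a -> c -> d
g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
subs = enumerate_convex(g, max_in=2, max_out=2)
```

For this graph, {a, d} is rejected because the paths a->b->d and a->c->d leave the subgraph and re-enter it, while {b, c} is a valid candidate (one input, two outputs).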
Recent work has shown that hardware-based runtime monitoring techniques can significantly enhance security and reliability of computing systems with minimal performance and energy overheads. However, the cost and time for implementing such a hardware-based mechanism presents a major challenge in deploying the run-time monitoring techniques in real systems. This paper addresses this design complexity problem through a common architecture framework and high-level synthesis. Similar to customizable processors such as Tensilica Xtensa where designers only need to write a small piece of code that describes a custom instruction, our framework enables designers to only specify monitoring operations. The framework provides common functions such as collecting a trace of execution, maintaining meta-data, and interfacing with software. To further reduce the design complexity, we also explore using a high-level synthesis tool (Cadence C-to-Silicon) so that hardware monitors can be described in a high-level language (SystemC) instead of in RTL such as Verilog and VHDL. To evaluate our approach, we implemented a set of monitors including soft-error checking, uninitialized memory checking, dynamic information flow tracking, and array boundary checking in our framework. Our results suggest that our monitor framework can greatly reduce the amount of code that needs to be specified for each extension and the high-level synthesis can achieve comparable area, performance, and power consumption to handwritten RTL.
This paper describes an integer-linear-programming (ILP)-based system called custom hardware instruction processor synthesis (CHIPS) that identifies custom instructions for critical code segments, given the available data bandwidth and transfer latencies between custom logic and a baseline processor with architecturally visible state registers. Our approach enables designers to optionally constrain the number of input and output operands for custom instructions. We describe a design flow to identify promising area, performance, and code-size tradeoffs. We study the effect of input/output constraints, register-file ports, and compiler transformations such as if-conversion. Our experiments show that, in most cases, the solutions with the highest performance are identified when the input/output constraints are removed. However, input/output constraints help our algorithms identify frequently used code segments, reducing the overall area overhead. Results for 11 benchmarks covering cryptography and multimedia are shown, with speed-ups between 1.7 and 6.6 times, code-size reductions between 6% and 72%, and area costs ranging between 12 and 256 adders for maximum speed-up. Our ILP-based approach scales well: benchmarks with basic blocks consisting of more than 1000 instructions can be optimally solved, most of the time within a few seconds.
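The underlying selection problem — choose custom instructions maximizing cycles saved under an area budget — can be illustrated with an exhaustive sketch. CHIPS itself uses ILP; the candidate names, savings, and area numbers below are invented:

```python
from itertools import combinations

def best_selection(candidates, area_budget):
    """Exhaustively pick the subset of custom-instruction candidates
    maximizing total cycles saved within an area budget.
    candidates: list of (name, cycles_saved, area_cost)."""
    best, best_saved = (), 0
    for k in range(len(candidates) + 1):
        for combo in combinations(candidates, k):
            area = sum(c[2] for c in combo)
            saved = sum(c[1] for c in combo)
            if area <= area_budget and saved > best_saved:
                best, best_saved = combo, saved
    return best, best_saved

# Hypothetical candidates: (name, cycles saved, area in adder-equivalents)
cands = [("mac", 400, 8), ("crc", 250, 3), ("sbox", 300, 12), ("perm", 150, 2)]
sel, saved = best_selection(cands, area_budget=13)
```

With a budget of 13 adder-equivalents, the three cheap candidates together beat the large "sbox" unit — the kind of area/performance tradeoff the paper's design flow exposes.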
Configuration of an application-specific instruction-set processor (ASIP) through an exhaustive search of the design space is computationally prohibitive. We propose a novel algorithm that models the design space using local regressions. With only a small subset of the design space sampled, our model uses statistical inference to estimate all remaining points. We used our approach to tune a two-level cache with 19,278 legal configurations. Only 1% of the design space was simulated, resulting in a 100x speedup over a brute-force approach. In doing so, we were able to identify near-optimal configurations for most benchmarks and reduce the overall power of the processor by 13.9% on average, with one benchmark as high as 53%.
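The idea of estimating unsimulated configurations from a few simulated neighbors can be sketched with a simple inverse-distance-weighted estimator — a stand-in for the paper's local regressions; the cache configurations and energy numbers are hypothetical:

```python
def estimate(samples, query, k=3):
    """Estimate a metric at an unsimulated configuration from the k
    nearest simulated samples, weighted by inverse distance.
    samples: list of (config_vector, measured_value)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    nearest = sorted(samples, key=lambda s: dist(s[0], query))[:k]
    if dist(nearest[0][0], query) == 0:
        return nearest[0][1]  # configuration was actually simulated
    weights = [1.0 / dist(c, query) for c, _ in nearest]
    return sum(w * v for (_, v), w in zip(nearest, weights)) / sum(weights)

# Hypothetical cache configs (log2 size, associativity) with measured energy
simulated = [((10, 1), 5.0), ((12, 1), 4.2), ((10, 4), 4.6), ((14, 2), 4.0)]
energy_11_1 = estimate(simulated, (11, 1), k=2)
```

Sampling a small fraction of configurations and inferring the rest is what lets the paper cover 19,278 configurations while simulating only 1% of them.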
Instruction-set extensible processors allow an existing processor core to be extended with application-specific custom instructions. In this paper, we explore a novel application of instruction-set extensions to meet timing constraints in real-time embedded systems. In order to satisfy real-time constraints, the worst-case execution time (WCET) of a task should be reduced as opposed to its average-case execution time. Unfortunately, existing custom instruction selection techniques based on average-case profile information may not reduce a task's WCET. We first develop an Integer Linear Programming (ILP) formulation to choose optimal instruction-set extensions for reducing the WCET. However, ILP solutions for this problem are often too expensive to compute. Therefore, we also propose an efficient and scalable heuristic that obtains results quite close to the optimal. Experimental results indicate that a suitable choice of custom instructions can reduce the WCET of our benchmark programs by as much as 42% (23.5% on average).
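A greedy heuristic in the spirit described above — repeatedly applying the extension that most reduces the current worst-case path, within an area budget — can be sketched as follows. This is an illustrative heuristic, not the paper's ILP formulation; the block times and extensions are invented:

```python
def wcet(paths, times):
    """Worst-case execution time over all program paths."""
    return max(sum(times[b] for b in p) for p in paths)

def greedy_wcet_reduction(paths, times, extensions, area_budget):
    """Greedily apply the custom-instruction extension that most
    reduces the current WCET, until the area budget is exhausted.
    extensions: list of (name, block, cycles_saved, area)."""
    times = dict(times)
    chosen, remaining = [], list(extensions)
    while remaining:
        base = wcet(paths, times)
        best = None
        for ext in remaining:
            name, block, saved, area = ext
            if area > area_budget:
                continue
            trial = dict(times)
            trial[block] = max(0, trial[block] - saved)
            gain = base - wcet(paths, trial)
            if gain > 0 and (best is None or gain > best[0]):
                best = (gain, ext)
        if best is None:
            break  # nothing affordable still shortens the worst path
        _, (name, block, saved, area) = best
        times[block] = max(0, times[block] - saved)
        area_budget -= area
        chosen.append(name)
        remaining.remove(best[1])
    return chosen, wcet(paths, times)

# Hypothetical CFG paths and per-block execution times (cycles)
paths = [["A", "B", "D"], ["A", "C", "D"]]
times = {"A": 10, "B": 50, "C": 30, "D": 20}
exts = [("e1", "B", 25, 4), ("e2", "C", 20, 3), ("e3", "A", 5, 2)]
chosen, worst = greedy_wcet_reduction(paths, times, exts, area_budget=7)
```

Note how "e2" is initially useless (it speeds up a path that is not the worst case) and becomes worthwhile only after "e1" shifts which path dominates — exactly why average-case profile-driven selection can fail to reduce the WCET.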
Extensible processors allow addition of application-specific custom instructions to the core instruction set architecture. These custom instructions are selected through an analysis of the program's dataflow graphs. The characteristics of certain applications and modern compiler optimization techniques (e.g., loop unrolling, region formation, etc.) have led to substantially larger dataflow graphs. Hence, it is computationally expensive to automatically select the optimal set of custom instructions. Heuristic techniques are often employed to quickly search the design space. In order to leverage the full potential of custom instructions, our previous work proposed an efficient algorithm for exact enumeration of all possible candidate instructions (or patterns) given the dataflow graphs. But that algorithm was restricted to connected computation patterns. In this paper, we describe an efficient algorithm to generate all feasible disjoint patterns starting with the set of feasible connected patterns. Compared to the state-of-the-art technique, our algorithm achieves orders-of-magnitude speedup while generating the identical set of candidate disjoint patterns.
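The combination step — building disjoint patterns from feasible connected ones — can be sketched for pairs. The paper handles the general case; the patterns and constraints below are hypothetical, and for simplicity the sketch assumes the two patterns share no inputs, so I/O counts simply add:

```python
from itertools import combinations

def disjoint_patterns(connected, max_in, max_out):
    """Combine feasible connected patterns into feasible disjoint
    patterns (pairs only, for brevity). Each pattern is
    (node_set, n_inputs, n_outputs); assuming no shared inputs,
    the I/O counts of node-disjoint patterns add up."""
    result = []
    for (n1, i1, o1), (n2, i2, o2) in combinations(connected, 2):
        if n1 & n2:
            continue  # patterns overlap, so they are not disjoint
        if i1 + i2 <= max_in and o1 + o2 <= max_out:
            result.append((n1 | n2, i1 + i2, o1 + o2))
    return result

# Hypothetical feasible connected patterns: (nodes, inputs, outputs)
connected = [
    (frozenset({"a", "b"}), 2, 1),
    (frozenset({"c", "d"}), 1, 1),
    (frozenset({"b", "c"}), 2, 2),
]
pairs = disjoint_patterns(connected, max_in=3, max_out=2)
```

Only the two node-disjoint patterns whose combined I/O stays within the constraint survive; overlapping pairs are pruned immediately, which is where the speedup over re-enumerating from scratch comes from.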
Customizable and extensible processors can efficiently meet the growing demand for application-specific IC designs in performance and flexibility. Due to the increasing complexity of software applications, it is essential to automatically decide which operations are to be carried out in custom function units from high-level application code. This paper addresses efficient techniques for identifying application-specific instruction candidates. New pruning criteria are proposed and combined with the latest prior work to reduce the search space, resulting in a fast algorithm for enumerating all valid candidates corresponding to given micro-architectural constraints. Experimental results show that the runtime of the latest prior algorithm is improved by up to 50% for the single-output constraint and up to 18% for the multiple-output constraint.