Conferences related to Checkpointing

2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA)

Computer Architecture


2020 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM)

All topics related to engineering and technology management, including applicable analytical methods and economical/social/human issues to be considered in making engineering decisions.


2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)

Cluster Computing, Grid Computing, Edge Computing, Cloud Computing, Parallel Computing, Distributed Computing


2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT)

Bring together researchers from architecture, compilers, applications and languages to present and discuss innovative research of common interest.


2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

Promote the exchange of ideas between academia and industry in the field of computer and network dependability.


Periodicals related to Checkpointing

Computer Architecture Letters

Rigorously peer-reviewed forum for publishing early, high-impact results in the areas of uni- and multiprocessor computer systems, computer architecture workload characterization, performance evaluation and simulation techniques, and power-aware computing


Computers, IEEE Transactions on

Design and analysis of algorithms, computer systems, and digital networks; methods for specifying, measuring, and modeling the performance of computers and computer systems; design of computer components, such as arithmetic units, data storage devices, and interface devices; design of reliable and testable digital devices and systems; computer networks and distributed computer systems; new computer organizations and architectures; applications of VLSI ...


Dependable and Secure Computing, IEEE Transactions on

The purpose of TDSC is to publish papers in dependability and security, including the joint consideration of these issues and their interplay with system performance. These areas include but are not limited to: System Design: architecture for secure and fault-tolerant systems; trusted/survivable computing; intrusion and error tolerance, detection and recovery; fault- and intrusion-tolerant middleware; firewall and network technologies; system management ...


Distributed Systems Online, IEEE

After nine years of publication, DS Online will be moving into a new phase as part of Computing Now (http://computingnow.computer.org), a new website providing the front end to all of the Computer Society's magazines. As such, DS Online will no longer be publishing standalone peer-reviewed articles.


Information Forensics and Security, IEEE Transactions on

Research on the fundamental contributions and the mathematics behind information forensics, information security, surveillance, and systems applications that incorporate these features.


Xplore Articles related to Checkpointing

Adaptive Framework for Reliable Cloud Computing Environment

IEEE Access, 2016

Cloud computing technology has become an integral trend in the market of information technology. The virtualized, Internet-based nature of cloud computing leads to various types of failures, so reliability and availability have become crucial issues. To ensure cloud reliability and availability, a fault tolerance strategy should be developed and implemented. Most of the early ...


LAN protocols for high-performance distributed and concurrent computing

TENCON '91. Region 10 International Conference on EC3-Energy, Computer, Communication and Control Systems, 1991

The number and variety of evolving high-performance distributed and concurrent applications motivate the development of novel and more suitable protocols for their support. We have developed a protocol framework within which concurrent applications may be developed and executed in a straightforward manner on heterogeneous LANs. The protocols are specifically oriented towards the needs of concurrent applications, and are designed to ...


A Checkpointing Technique for Rollback Error Recovery in Embedded Systems

2006 International Conference on Microelectronics, 2006

In this paper, a general checkpointing technique for rollback error recovery in embedded systems is proposed and evaluated. This technique is independent of the processor used and employs the most important feature of control flow error detection mechanisms to simplify checkpoint selection and minimize the overall code overhead. In this way, during the implementation of a control flow checking mechanism, ...


Demo abstract: Sensornet checkpointing between simulated and deployed networks

2009 International Conference on Information Processing in Sensor Networks, 2009

Sensor network development is notoriously difficult due to the low visibility of sensor platforms and systems. We propose sensornet checkpointing to increase the visibility of sensor networks. With sensornet checkpointing, we transfer network-wide application checkpoints between simulated and real networks. This approach enables advances in many research areas: visualization, repeatable experiments, fault injection, and application debugging. We demonstrate sensornet checkpointing ...


Improved EDF Algorithm for Fault Tolerance with Energy Minimization

2015 IEEE International Conference on Computational Intelligence & Communication Technology, 2015

The aim is to achieve schedulability of tasks in a real-time system with fault tolerance and energy minimization. Fault tolerance can be achieved by maintaining enough time redundancy that a task can be re-executed in the presence of a fault. During re-execution, a checkpointing policy provides reliability. First, an optimal number of checkpoints is calculated to reduce the ...


Educational Resources on Checkpointing

IEEE.tv Videos

No IEEE.tv Videos are currently tagged "Checkpointing"

IEEE-USA E-Books

  • Adaptive Framework for Reliable Cloud Computing Environment

    Cloud computing technology has become an integral trend in the market of information technology. The virtualized, Internet-based nature of cloud computing leads to various types of failures, so reliability and availability have become crucial issues. To ensure cloud reliability and availability, a fault tolerance strategy should be developed and implemented. Most of the early fault-tolerance strategies focused on using only one method to tolerate faults. This paper presents an adaptive framework to cope with the problem of fault tolerance in cloud computing environments. The framework employs both replication and checkpointing in order to obtain a reliable platform for carrying out customer requests. The algorithm also determines the most appropriate fault tolerance method for each selected virtual machine. Simulation experiments are carried out to evaluate the framework's performance. The results show that the proposed framework improves the performance of the cloud in terms of throughput, overhead, monetary cost, and availability.

  • LAN protocols for high-performance distributed and concurrent computing

    The number and variety of evolving high-performance distributed and concurrent applications motivate the development of novel and more suitable protocols for their support. We have developed a protocol framework within which concurrent applications may be developed and executed in a straightforward manner on heterogeneous LANs. The protocols are specifically oriented towards the needs of concurrent applications, and are designed to be robust, efficient and lightweight. The protocols offer reliable and fast packet delivery in a variety of modes, including point-to-point, broadcast, and dynamically variable multicast semantics. In addition, they support several high-level concurrent computing functions in an integral manner, including global extrema and other arithmetic operations, polling, distributed consensus, and synchronization mechanisms. We discuss the salient design aspects of these protocols, describe the syntax and semantics of the user-interface protocol primitives, and present performance results for varying network sizes and types.

  • A Checkpointing Technique for Rollback Error Recovery in Embedded Systems

    In this paper, a general checkpointing technique for rollback error recovery in embedded systems is proposed and evaluated. This technique is independent of the processor used and employs the most important feature of control flow error detection mechanisms to simplify checkpoint selection and minimize the overall code overhead. In this way, the checkpoints are added to the program during the implementation of a control flow checking mechanism. To evaluate the checkpointing technique, a pre-processor is implemented that selects and adds the checkpoints to three workload programs running on an 8051 microcontroller-based system. The evaluation is based on 3000 experiments for each checkpoint.

  • Demo abstract: Sensornet checkpointing between simulated and deployed networks

    Sensor network development is notoriously difficult due to the low visibility of sensor platforms and systems. We propose sensornet checkpointing to increase the visibility of sensor networks. With sensornet checkpointing, we transfer network-wide application checkpoints between simulated and real networks. This approach enables advances in many research areas: visualization, repeatable experiments, fault injection, and application debugging. We demonstrate sensornet checkpointing on a network of Tmote Sky motes running Contiki.

  • Improved EDF Algorithm for Fault Tolerance with Energy Minimization

    The aim is to achieve schedulability of tasks in a real-time system with fault tolerance and energy minimization. Fault tolerance can be achieved by maintaining enough time redundancy that a task can be re-executed in the presence of a fault. During re-execution, a checkpointing policy provides reliability. First, an optimal number of checkpoints is calculated to reduce the required redundancy while still allowing the system to complete re-execution. Energy minimization can be achieved by DVFS (Dynamic Voltage and Frequency Scaling). In this paper, the existing non-preemptive EDF scheduling algorithm is modified for fault tolerance and energy minimization. We adjust the voltage level according to the stored energy available in the system and perform a feasibility test on each task. The worst-case execution time is associated with the voltage level. The approach is designed to enhance task schedulability and minimize energy consumption in the presence of faults. Experimental results at the end of the paper show that the proposed algorithm performs better than the existing algorithm.

  • On the choice of checkpoint interval using memory usage profile and adaptive time series analysis

    This paper presents a new checkpoint scheme that uses the memory usage profile and time series analysis to keep checkpoint overhead low. When deciding whether to take a checkpoint, the proposed scheme estimates the current and future checkpoint overhead from changes in memory size, using the memory profile and adaptive time series analysis. Unlike previous work, which does not use the memory usage profile, the scheme can reduce the total execution-time overhead. We also present experimental results showing that the checkpoint overhead of the proposed scheme is lower than that of the previously developed checkpoint scheme.

  • Egida: an extensible toolkit for low-overhead fault-tolerance

    We discuss the design and implementation of Egida, an object-oriented toolkit designed to support transparent rollback recovery. Egida exports a simple specification language that can be used to express arbitrary rollback recovery protocols. From this specification, Egida automatically synthesizes an implementation of the specified protocol by gluing together the appropriate objects from an available library of "building blocks". Egida is extensible and facilitates rapid implementation of rollback recovery protocols with minimal programming effort. We have integrated Egida with the MPICH implementation of the MPI standard. Existing MPI applications can take advantage of Egida without any modifications: fault tolerance is achieved transparently, and all that is needed is a simple re-link of the MPI application with Egida.

  • Recoverable distributed shared virtual memory: memory coherence and storage structures

    An examination is made of the problem of implementing rollback recovery in multicomputer distributed shared virtual memory environments, in which the shared memory is implemented in software and exists only virtually. A user-transparent checkpointing recovery scheme and a twin-page disk storage management scheme are presented to implement a recoverable distributed shared virtual memory. The checkpointing scheme is integrated with the shared virtual memory management. The twin-page disk approach allows incremental checkpointing without an explicit 'undo' at the time of recovery. A single consistent checkpoint state is maintained on stable disk storage. The recoverable distributed shared virtual memory allows the system to restart computation from a previous checkpoint after a processor failure without a global restart.

  • Cluster-based coordinated checkpointing protocol in wireless ad-hoc networks

    The intrinsic characteristics of mobile communication systems make them more prone to faults. Hence it is imperative to equip such devices with a fault-tolerance mechanism in order to run a reasonable application on them. Fault tolerance is essential in wireless ad-hoc networks, where mobile devices are vulnerable to physical damage and theft or can be exposed to radiation. Rollback recovery is one of the most widely adopted methods for providing fault tolerance in distributed systems. Checkpointing is a low-cost fault-tolerance technique that can achieve fault tolerance transparently, even against unanticipated faults. Mobile networks can be categorized as infrastructure-based and infrastructure-less (ad-hoc) networks. In infrastructure-based networks, mobile hosts are supported by mobile support stations (MSS). In infrastructure-less ad-hoc networks, mobile hosts are not supported by any MSS, which makes checkpointing very difficult to implement. In this paper, a cluster-based coordinated checkpointing protocol is proposed that has minimal checkpointing overhead and requires neither extra synchronization messages nor blocking.

  • Issues in the design of a reflective library for checkpointing C++ objects

    Object persistence is an important feature of object-oriented languages. The C++ language specification does not include or discuss any method of providing persistence for C++ objects. Several schemes have been developed for adding persistence to C++. Some of them require persistent objects to be allocated and treated differently from non-persistent objects, while others require the programmer to provide vital parts of the persistence mechanism. It is desirable to make the persistence feature transparent, but the nature of C++ makes this difficult. This paper discusses in detail the language issues to be considered when adding persistence to C++ and how they lead to the design of the reflective object-checkpointing library, MemberAnalyzer.
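The basic idea behind "A Checkpointing Technique for Rollback Error Recovery in Embedded Systems" is to save state at checkpoints and, when an error is detected, roll back and re-execute from the last checkpoint. The following is only a generic sketch of that rollback loop, not the paper's technique: the class CheckpointedTask, the function run_with_rollback, and the error_detected callback (a stand-in for a real control-flow checker) are all hypothetical, and checkpoints are simply taken after every verified step.

```python
import copy


class CheckpointedTask:
    """Holds mutable task state plus one saved snapshot (hypothetical class)."""

    def __init__(self, state):
        self.state = state
        self._saved = copy.deepcopy(state)

    def checkpoint(self):
        # Record a snapshot of the current, verified state.
        self._saved = copy.deepcopy(self.state)

    def rollback(self):
        # Discard everything done since the last checkpoint.
        self.state = copy.deepcopy(self._saved)


def run_with_rollback(steps, task, error_detected, max_retries=3):
    """Run each step; on a detected error, roll back and re-execute the step.

    error_detected(step_index, attempt) is an assumed stand-in for a real
    error detector such as a control-flow checker.
    """
    for i, step in enumerate(steps):
        for attempt in range(max_retries + 1):
            step(task.state)
            if not error_detected(i, attempt):
                task.checkpoint()  # checkpoint after each verified step
                break
            task.rollback()
        else:
            raise RuntimeError(f"step {i} failed after {max_retries} retries")
    return task.state


def add(n):
    """Build a step that adds n to state["x"] (illustrative workload)."""
    def step(state):
        state["x"] += n
    return step


if __name__ == "__main__":
    task = CheckpointedTask({"x": 0})
    # Pretend the detector flags the first attempt of step 1.
    result = run_with_rollback([add(1), add(10), add(100)], task,
                               lambda i, attempt: (i, attempt) == (1, 0))
    print(result["x"])  # prints 111
```

Because the flagged attempt is undone by the rollback before the step is re-executed, the result is 111 rather than the 121 an unrecovered double execution would produce.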
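"Improved EDF Algorithm for Fault Tolerance with Energy Minimization" starts by calculating an optimal number of checkpoints. The abstract does not give its formula, so the sketch below assumes a common textbook model instead: a task with worst-case execution time C is split into n equal segments, each ending in a checkpoint of overhead O, and each of up to k faults forces re-execution of one segment. The worst-case response time is then R(n) = C + n*O + k*C/n, whose continuous minimum lies at n* = sqrt(k*C/O). The function names are hypothetical.

```python
import math


def worst_case_time(n, C, O, k):
    """Worst-case response time with n checkpointed segments (assumed model)."""
    if n == 0:
        # Without checkpoints, each fault re-executes the whole task.
        return C + k * C
    return C + n * O + k * C / n


def optimal_checkpoints(C, O, k):
    """Integer n minimizing worst_case_time; the continuous optimum is sqrt(k*C/O)."""
    if O <= 0:
        raise ValueError("checkpoint overhead O must be positive")
    n_star = math.sqrt(k * C / O)
    # Compare the integers around n_star with the option of not checkpointing.
    candidates = {0, max(1, math.floor(n_star)), max(1, math.ceil(n_star))}
    # sorted() makes ties resolve to the smaller n deterministically.
    return min(sorted(candidates), key=lambda n: worst_case_time(n, C, O, k))


if __name__ == "__main__":
    n = optimal_checkpoints(C=100, O=1, k=1)
    print(n, worst_case_time(n, 100, 1, 1))  # prints 10 120.0
```

A per-task feasibility test in the spirit of the abstract would then check whether worst_case_time(n, C, O, k) fits within the task's deadline; how the paper scales C with the DVFS voltage level is not stated and is not modeled here.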
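"On the choice of checkpoint interval using memory usage profile and adaptive time series analysis" decides whether to checkpoint by comparing current and expected future checkpoint overhead derived from memory size. The sketch below is one way to express that idea under stated assumptions: checkpoint cost is taken to be proportional to memory size, simple exponential smoothing stands in for the paper's (unspecified) adaptive time series model, and the class name and the min_interval/max_interval bounds are hypothetical.

```python
class MemoryAwareCheckpointer:
    """Decide at each step whether to checkpoint, preferring moments when the
    memory image (and hence, by assumption, the checkpoint cost) is small."""

    def __init__(self, alpha=0.5, min_interval=2, max_interval=5):
        self.alpha = alpha                # smoothing factor, 0 < alpha <= 1
        self.min_interval = min_interval  # steps before a checkpoint is considered
        self.max_interval = max_interval  # steps after which a checkpoint is forced
        self.level = None                 # smoothed memory size = next-step forecast
        self.elapsed = 0                  # steps since the last checkpoint

    def observe(self, mem_size):
        """Feed one memory-size sample; return True if a checkpoint should be taken now."""
        self.elapsed += 1
        if self.level is None:
            self.level = float(mem_size)
        # Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}.
        self.level = self.alpha * mem_size + (1 - self.alpha) * self.level
        if self.elapsed < self.min_interval:
            return False
        # Checkpoint if saving now is expected to cost no more than saving later,
        # or if the maximum interval has been reached.
        if mem_size <= self.level or self.elapsed >= self.max_interval:
            self.elapsed = 0
            return True
        return False


if __name__ == "__main__":
    cp = MemoryAwareCheckpointer()
    samples = [10, 20, 30, 15, 40, 50, 60, 70, 80]
    print([i for i, m in enumerate(samples) if cp.observe(m)])  # prints [3, 8]
```

Here the dip to 15 at step 3 triggers a cheap checkpoint, while the steady growth afterwards never undercuts the forecast, so the max_interval bound forces the next one at step 8.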
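The twin-page disk approach in "Recoverable distributed shared virtual memory: memory coherence and storage structures" allows incremental checkpointing without an undo step at recovery. A minimal in-memory sketch of the general twin-page idea, assuming two storage slots per page and a per-page bit that marks the committed slot: only dirty pages are written, always into the non-committed twin, so a crash before commit leaves the last checkpoint intact. The class and method names are hypothetical, and a real system would make the commit step a single atomic write to stable storage.

```python
class TwinPageStore:
    """Two slots per page; current[p] selects the slot holding the committed checkpoint."""

    def __init__(self, num_pages, initial=b""):
        self.slots = [[initial, initial] for _ in range(num_pages)]
        self.current = [0] * num_pages  # committed slot index per page
        self.pending = {}               # page -> slot written by the in-progress checkpoint

    def write_checkpoint_page(self, page, data):
        # Incremental: only dirty pages are written, into the non-committed twin,
        # so the committed copy is never overwritten.
        target = 1 - self.current[page]
        self.slots[page][target] = data
        self.pending[page] = target

    def commit(self):
        # Flip the selector bits for every page written in this checkpoint.
        # (Assumed atomic here; on disk this would be one atomic metadata write.)
        for page, slot in self.pending.items():
            self.current[page] = slot
        self.pending.clear()

    def recover(self):
        # After a failure, uncommitted writes are simply ignored: no undo is needed.
        self.pending.clear()
        return [self.slots[p][self.current[p]] for p in range(len(self.slots))]


if __name__ == "__main__":
    store = TwinPageStore(3, b"0")
    store.write_checkpoint_page(0, b"a")
    store.commit()
    store.write_checkpoint_page(1, b"b")  # failure occurs before commit
    print(store.recover())  # prints [b'a', b'0', b'0']
```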
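For context on "Cluster-based coordinated checkpointing protocol in wireless ad-hoc networks", the sketch below shows a generic blocking two-phase coordinated checkpoint: every process takes a tentative checkpoint, and the coordinator commits only if all of them succeed. This is the kind of blocking, message-heavy scheme that the abstract contrasts its protocol with, not the cluster-based protocol itself, whose details the abstract does not give. The Process class, its can_checkpoint failure flag, and coordinated_checkpoint are hypothetical, and message passing and blocking are not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Process:
    pid: int
    state: dict
    permanent: dict = field(default_factory=dict)  # last committed checkpoint
    tentative: Optional[dict] = None               # checkpoint awaiting a decision
    can_checkpoint: bool = True                    # hypothetical failure flag

    def take_tentative(self):
        # Phase 1 vote: save a tentative copy of local state, or refuse.
        if not self.can_checkpoint:
            return False
        self.tentative = dict(self.state)
        return True

    def finalize(self, commit):
        # Phase 2: make the tentative checkpoint permanent, or discard it.
        if commit and self.tentative is not None:
            self.permanent = self.tentative
        self.tentative = None


def coordinated_checkpoint(processes):
    """Commit a global checkpoint only if every process took a tentative one."""
    votes = [p.take_tentative() for p in processes]
    commit = all(votes)
    for p in processes:
        p.finalize(commit)
    return commit


if __name__ == "__main__":
    ps = [Process(0, {"x": 1}), Process(1, {"y": 2})]
    print(coordinated_checkpoint(ps), ps[0].permanent)  # prints True {'x': 1}
```

Committing all-or-nothing keeps the set of permanent checkpoints mutually consistent; the cost is the extra request and decision round, which is what the cluster-based protocol reportedly avoids.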



Standards related to Checkpointing

No standards are currently tagged "Checkpointing"