Atomic Operation

An atomic operation is a complex operation that either completes successfully or fails altogether, guaranteeing that no intermediate, incorrect result is left behind.

godi is read-only when verifying data, so atomicity doesn't apply. When sealing, it writes a single file per source, and only if it managed to read all source files successfully.

In sealed-copy mode, it may write hundreds of thousands of files to multiple destinations. If one of these writes fails, it removes all files it has written underneath that destination so far, but keeps writing to unaffected destinations.

This feature implies that it has to remember all files written so far. Tests showed that this requires about 250MB of RAM for one million files; for thousands of files, memory consumption stays well below 10MB.
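The rollback behaviour described above can be sketched as follows. This is a minimal illustration, not godi's actual implementation: the `Destination` class and the `sealed_copy` function are hypothetical names, and real code would stream data rather than hold whole files in memory.

```python
import os
import tempfile


class Destination:
    """Hypothetical helper: tracks every file written below one destination root."""

    def __init__(self, root):
        self.root = root
        self.written = []    # remembered paths; this list is what costs memory
        self.failed = False

    def write(self, rel_path, data):
        if self.failed:
            return  # a faulty destination receives no further writes
        path = os.path.join(self.root, rel_path)
        # Remember the path first, so a partially written file is removed too.
        self.written.append(path)
        try:
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as fh:
                fh.write(data)
        except OSError:
            self.rollback()

    def rollback(self):
        """Remove everything written so far and mark the destination as faulty."""
        self.failed = True
        for path in reversed(self.written):
            try:
                os.remove(path)
            except OSError:
                pass  # best effort: the file may never have been created
        self.written.clear()


def sealed_copy(files, destinations):
    """Write each (rel_path, data) pair to every destination that is still healthy."""
    for rel_path, data in files:
        for dest in destinations:
            dest.write(rel_path, data)
    return [d for d in destinations if not d.failed]
```

Keeping one path per written file is what makes the memory cost grow with the file count: 250MB for one million files works out to roughly 250 bytes per remembered file.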

Input File-Filters

During seal and sealed-copy operations, godi traverses directories to find files for reading. Which files it picks depends on the input file filter, specified using the --file-exclude-patterns flag.

By default, it excludes files which are known to change a lot, like .DS_Store on OS X, but you may also choose to ignore hidden files, symbolic links, godi seal files, and files matching a glob pattern.

This is an example of the file-exclude-pattern in action. Note the increasing number of skipped files when the filter is in use.

The verify operation always verifies all files mentioned in the seal; a filter does not apply.
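To make the filtering idea concrete, here is a small sketch of an exclude filter. It is not godi's code: the function names, the `skip_hidden` and `skip_symlinks` options, and the choice to match patterns against the file name only are all assumptions made for illustration.

```python
import fnmatch
import os

# Assumed default; the text above names .DS_Store only as one example.
DEFAULT_EXCLUDES = [".DS_Store"]


def is_excluded(path, patterns, skip_hidden=False, skip_symlinks=False):
    """Return True if the file at path should be skipped during traversal."""
    name = os.path.basename(path)
    if skip_hidden and name.startswith("."):
        return True
    if skip_symlinks and os.path.islink(path):
        return True
    # Glob patterns are matched case-sensitively against the file name.
    return any(fnmatch.fnmatchcase(name, pat) for pat in patterns)


def select_files(paths, patterns=DEFAULT_EXCLUDES, **options):
    """Split paths into (kept, skipped) according to the filter."""
    kept, skipped = [], []
    for path in paths:
        target = skipped if is_excluded(path, patterns, **options) else kept
        target.append(path)
    return kept, skipped
```

Adding a pattern such as `*.tmp` or enabling `skip_hidden` moves more paths into the skipped list, which mirrors the growing skip count mentioned above.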

Seal Formats

A seal is a file that stores signatures of data files, each identifying the contents of one file. If a single bit within that data file changes, the signature changes as well. In information technology, such a signature is called a hash. godi computes not one, but two of these: MD5 and SHA1.

Currently there are two seal file formats which can be written and verified.

All seal files generated by godi will carry a signature to assure that changes to any information stored in the file will be detected. This is helpful to detect silent corruption of the file as well as intentional adjustments.
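As an illustration of the hashing described in this section, the sketch below computes MD5 and SHA1 in a single pass over a stream using Python's standard `hashlib`. The function name `hash_stream` and the chunk size are assumptions; the seal file formats themselves are not modelled here.

```python
import hashlib
import io


def hash_stream(stream, chunk_size=1 << 20):
    """Read stream once and return its (md5, sha1) hex digests."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Every chunk is fed into both hashers, so the data is read only once.
        md5.update(chunk)
        sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


# Flipping a single bit yields different signatures.
original = hash_stream(io.BytesIO(b"\x00"))
flipped = hash_stream(io.BytesIO(b"\x01"))
```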

Performance Considerations

To understand this section, it helps to know how data is processed in godi. Without going into too much detail: data is first read from storage, then hashed, and, in sealed-copy mode, also written.

[Figure: architecture]

All data is handled in parallel: as soon as a chunk of data is read, it is fed into the two hashers and into all outputs simultaneously.
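The fan-out described above can be sketched with one worker thread per consumer. This is only an illustration of the idea, not godi's architecture: the names `fan_out` and `_consume`, the queue depth, and the chunk size are assumptions, and error handling is left out (a failing consumer would stall the reader in this sketch).

```python
import hashlib
import io
import queue
import threading


def _consume(q, sink):
    """Drain chunks from q into sink until the None sentinel arrives."""
    while True:
        chunk = q.get()
        if chunk is None:
            break
        sink(chunk)


def fan_out(source, outputs, chunk_size=64 * 1024):
    """Read source once; feed every chunk to two hashers and all outputs."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    sinks = [md5.update, sha1.update] + [out.write for out in outputs]
    # One bounded queue and one thread per consumer, so they work side by side.
    queues = [queue.Queue(maxsize=8) for _ in sinks]
    workers = [
        threading.Thread(target=_consume, args=(q, sink))
        for q, sink in zip(queues, sinks)
    ]
    for worker in workers:
        worker.start()
    while True:
        chunk = source.read(chunk_size)
        for q in queues:
            q.put(chunk if chunk else None)  # None signals end of data
        if not chunk:
            break
    for worker in workers:
        worker.join()
    return md5.hexdigest(), sha1.hexdigest()
```

The bounded queues keep the reader from running far ahead of the slowest consumer, which is one simple way to limit memory use in such a pipeline.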

As each hasher can easily deliver 450MB/s per core, the bottleneck will usually be disk-based input or output. For example, reading from an SSD with a cold filesystem cache will rarely deliver more than 500MB/s, and writing to an SSD would not be much faster either.

Nonetheless, depending on the type of storage, you might benefit from multiple simultaneous reads and/or multiple simultaneous writes, which may drastically increase the perceived performance. The number of simultaneous reads and writes is set up per device you are reading from or writing to.

It is vital to test for good values for --streams-per-input-device (-spid) and --streams-per-output-device (-spod) to get optimal performance for your respective hardware. By default, there may be as many hashers as you have cores, and this rarely needs to change unless godi is competing with other programs for the CPU.

Have a look at this video, showing how a fast input device needs 6 cores to process all the data. CPU usage was maxed out, as the device could feed data even faster.

In summary, there are a few rules to remember:

As each input stream is fed by exactly one file, you need to have enough files to keep them busy. For example, if you have only one big file, there are only about 2 cores to work on it, no matter how many input streams are set up.
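The rule above can be turned into a rough back-of-the-envelope estimate. The function below is a simplified model, not a measurement: it assumes exactly two hashers (MD5 and SHA1) per file being read, and the name `busy_hashers` is made up for this example.

```python
HASHERS_PER_FILE = 2  # assumption: one MD5 and one SHA1 hasher per file in flight


def busy_hashers(num_files, input_streams, cores):
    """Estimate how many cores can be kept busy with hashing."""
    # Each input stream is fed by exactly one file at a time.
    active_files = min(num_files, input_streams)
    # By default there are at most as many hashers as cores.
    return min(cores, active_files * HASHERS_PER_FILE)


single_big_file = busy_hashers(num_files=1, input_streams=8, cores=16)
many_files = busy_hashers(num_files=100, input_streams=8, cores=16)
```

Under this model, a single big file keeps only 2 hashers busy regardless of the stream count, while many files with 8 input streams can occupy all 16 cores.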

Error Handling

godi will handle every error it encounters and report it to the user in any case. On error, it will abort the entire operation only if no chance of successful completion remains.

If a file could not be read from a source, the entire operation will abort, as from this point on neither seal nor sealed-copy operations can complete successfully. However, a verify operation will continue, just to provide as much information to you as possible.

If a file could not be written during a sealed-copy operation, the respective destination will be marked as faulty and rolled back. Nonetheless, godi will continue to write to all remaining destinations to generate as many duplicates as possible.
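The rules in this section amount to a small decision table, sketched below. The `Action` enum, the `on_error` function and the operation names as strings are hypothetical; the fallback to aborting for cases the text does not cover is an assumption.

```python
from enum import Enum


class Action(Enum):
    ABORT = "abort the entire operation"
    CONTINUE = "report the error and continue"
    ROLLBACK_DESTINATION = "roll back this destination, keep writing the others"


def on_error(operation, kind):
    """Map an error to the reaction described above.

    operation: "seal", "sealed-copy" or "verify"
    kind: "read" or "write"
    """
    if kind == "read":
        # A verify keeps going to report as much as possible; sealing cannot.
        return Action.CONTINUE if operation == "verify" else Action.ABORT
    if kind == "write" and operation == "sealed-copy":
        return Action.ROLLBACK_DESTINATION
    # Assumption: anything not covered by the text is treated as fatal.
    return Action.ABORT
```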

Limitations

Windows

General