Interface Summary

| Interface | Description |
|---|---|
| ArchiveDetector | Detects archive files solely by scanning file paths - usually by testing for file name suffixes like .zip or the like. |
| ArchiveStatistics | A proxy interface which encapsulates statistics about the total set of archives operated by this package. |
| FileFactory | This interface is not intended for public use! |
Class Summary

| Class | Description |
|---|---|
| AbstractArchiveDetector | Implements the FileFactory part of the ArchiveDetector interface. |
| ArchiveEntryMetaData | This class is not intended for public use! |
| DefaultArchiveDetector | An ArchiveDetector which matches file paths against a pattern of archive file suffixes in order to detect prospective archive files and look up their corresponding ArchiveDriver in its registry. |
| File | A drop-in replacement for its super class which provides transparent read/write access to archive files and their entries as if they were (virtual) directories and files. |
| FileInputStream | A drop-in replacement for java.io.FileInputStream which provides transparent read access to archive entries as if they were (virtual) files. |
| FileOutputStream | A drop-in replacement for java.io.FileOutputStream which provides transparent write access to archive entries as if they were (virtual) files. |
| FileReader | A drop-in replacement for java.io.FileReader which provides transparent read access to archive entries as if they were (virtual) files. |
| FileWriter | A drop-in replacement for java.io.FileWriter which provides transparent write access to archive entries as if they were (virtual) files. |
| InputArchiveMetaData | This class is not intended for public use! |
| OutputArchiveMetaData | This class is not intended for public use! |
| RaesFiles | Saves and restores the contents of arbitrary files to and from the RAES file format for encryption and decryption. |
| RaesFileUtils | Deprecated. Use the base class instead. |
Exception Summary

| Exception | Description |
|---|---|
| ArchiveBusyException | Thrown if an archive file could not get updated because some input or output streams for its entries are still open. |
| ArchiveBusyWarningException | Thrown if an archive file has been successfully updated, but some input or output streams for its entries have been forced to close. |
| ArchiveEntryStreamClosedException | Thrown if an input or output stream for an archive entry has been forced to close when the archive file was (explicitly or implicitly) unmounted. |
| ArchiveException | Represents a chain of exceptions thrown by the File.umount() and File.update() methods to indicate an error condition which does incur loss of data. |
| ArchiveInputBusyException | Like its super class, but indicates the existence of open input streams. |
| ArchiveInputBusyWarningException | Like its super class, but indicates the existence of open input streams. |
| ArchiveOutputBusyException | Like its super class, but indicates the existence of open output streams. |
| ArchiveOutputBusyWarningException | Like its super class, but indicates the existence of open output streams. |
| ArchiveWarningException | Represents a chain of exceptions thrown by the File.umount() and File.update() methods to indicate an error condition which does not incur loss of data and may be ignored. |
| ChainableIOException | Represents a chain of IOExceptions. |
| ContainsFileException | Thrown if two paths refer to the same file or contain each other. |
| FileBusyException | Thrown if an archive entry cannot get accessed because either (a) the client application is trying to input or output to the same archive file concurrently and the respective archive driver does not support this, or (b) the archive file needs an implicit unmount which cannot get performed because the client application is still using some other open streams for the same archive file. |
| InputIOException | Thrown if an IOException happened on the input side rather than the output side when copying an InputStream to an OutputStream. |
This package provides transparent read/write access to archive files and their entries as if they were (virtual) directories and files. Archive files may be arbitrarily nested, and the nesting level is only limited by heap and file system size.
In order to create a new archive file, the client application can simply call File.mkdir(). To delete it, File.delete() can be used; like with a regular directory, this is only possible if the archive file is empty. Alternatively, the client application can call File.deleteAll() in order to delete the virtual directory in one go, regardless of its contents.
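Putting these calls together, the lifecycle looks like this (a minimal sketch; it assumes TrueZIP's de.schlichtherle.io classes are on the class path and the file name archive.zip is made up):

```java
import de.schlichtherle.io.File;

public class ArchiveLifecycle {
    public static void main(String[] args) {
        File archive = new File("archive.zip");
        archive.mkdir();     // creates an empty archive file
        // ... populate the virtual directory here ...
        archive.deleteAll(); // deletes the archive regardless of its contents
    }
}
```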
To read an archive entry, the client application can simply create a FileInputStream or a FileReader with the path or a File instance as its constructor parameter. Note that you cannot create a FileInputStream or a FileReader to read an archive file itself (unless it's a false positive, i.e. a regular file or directory with an archive file suffix).
Likewise, to write an archive entry, the client application can simply create a FileOutputStream or a FileWriter with the path or a File instance as its constructor parameter. Note that you cannot create a FileOutputStream or a FileWriter to write an archive file itself (unless it's a false positive, i.e. a regular file or directory with an archive file suffix).
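For example, an entry can be written and read back like this (a sketch assuming TrueZIP on the class path; the entry name readme.txt is made up):

```java
import de.schlichtherle.io.File;
import de.schlichtherle.io.FileReader;
import de.schlichtherle.io.FileWriter;
import java.io.IOException;

public class EntryStreams {
    public static void main(String[] args) throws IOException {
        File entry = new File("archive.zip/readme.txt");

        // Write the archive entry as if it were a regular file.
        FileWriter out = new FileWriter(entry);
        try {
            out.write("Hello, archive entry!");
        } finally {
            out.close();
        }

        // Read it back.
        FileReader in = new FileReader(entry);
        try {
            char[] buf = new char[1024];
            int n = in.read(buf);
            System.out.println(new String(buf, 0, n));
        } finally {
            in.close();
        }
    }
}
```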
If the client application just needs to copy data, however, using one of the copy methods in the File class is highly recommended over using FileInputStream and FileOutputStream directly. These methods use asynchronous I/O (though they return synchronously), pooled big buffers and pooled threads (on JSE 5 and later), and for supported archive types they do not need to decompress and recompress archive entry data when copying from one archive file to another. In addition, they are guaranteed to fail gracefully, while many Java applications fail to close their streams if an IOException occurs.
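A sketch of such a copy (the method name copyTo and its boolean return value are assumptions about the File class's API here, not verbatim from this page):

```java
import de.schlichtherle.io.File;

public class CopyIntoArchive {
    public static void main(String[] args) {
        File src = new File("some.file");
        File dst = new File("archive.zip/some.file");
        // One of the copy methods of the File class: when copying between
        // two archive files of a supported type, the entry data need not be
        // decompressed and recompressed.
        if (!src.copyTo(dst))
            System.err.println("copy failed");
    }
}
```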
Note that there is no equivalent to java.io.RandomAccessFile in this package because it's impossible to seek within compressed archive entry data.
When using streams, the client application should always close them in a finally block like this:

```java
FileOutputStream out = new FileOutputStream(file);
try {
    // Do I/O here...
} finally {
    out.close(); // ALWAYS close the stream!
}
```
This ensures that the stream is closed even if an exception occurs.
Note that for various (mostly archive driver specific) reasons, the close() method may throw an IOException, too. The client application needs to deal with this appropriately, for example by enclosing the entire block in another try-catch block like this:

```java
try {
    FileOutputStream out = new FileOutputStream(file);
    try {
        // Do I/O here...
    } finally {
        out.close(); // ALWAYS close the stream!
    }
} catch (IOException ex) {
    ex.printStackTrace();
}
```
This idiom is not at all specific to TrueZIP: streams often utilize OS resources such as file descriptors or database and network connections. All OS resources are limited, however, and sometimes they are even exclusively allocated to a stream, so a stream should always be closed as soon as possible, especially in long-running server applications (relying on finalize() to do this during garbage collection is unsafe). Unfortunately, many Java applications and libraries fail in this respect.
TrueZIP is affected by open archive entry streams in the following ways:

- An attempt to access an archive entry which conflicts with another open entry stream may fail with a FileBusyException.
- If any entry streams are still open when an archive file gets unmounted, an ArchiveBusyException is thrown. If the entry streams are forced to close instead, the archive file is unmounted and an ArchiveBusyWarningException is thrown to indicate that subsequent I/O operations on these entry streams (other than close()) will fail with an ArchiveEntryStreamClosedException.

Neither solution is optimal. In order to prevent these exceptions, TrueZIP automatically closes entry streams when they are garbage collected. However, the client application should never rely on this, because the delay and order in which streams are processed by the finalizer thread is not specified, and any unwritten data gets lost in output streams.
In general, a file system operation is either atomic or not. In its strict sense, an atomic operation meets the following conditions:

1. The operation either completely succeeds or completely fails.
2. No other party can monitor or influence the changes while the operation is in progress; they can only see the result.

All reliable file system implementations meet the first condition, and so does TrueZIP. However, the situation is different for the second condition: other File instances could monitor and influence changes in progress. This implies that TrueZIP cannot provide any operations which are atomic in the strict sense. However, many file system operations in this package are declared to be virtually atomic according to their Javadoc. A virtually atomic operation meets the following conditions:

1. The operation either completely succeeds or completely fails.
2. Other File instances which recognize the same set of archive files in the path and share the same definition of the classes in this package can't monitor or influence the changes as they are in progress; they can only see the result.

These conditions apply regardless of whether the File instances are used by different threads or not. In other words, TrueZIP is as thread safe as you could expect from a real file system.
To provide random read/write access to archive files, TrueZIP needs to associate some state with every recognized archive file, on the heap and in the folder for temporary files, while the client application is operating on the VFS.
TrueZIP automatically mounts the VFS from an archive file on the first access. The client application can then operate on the VFS in an arbitrary manner. Finally, an archive file must get unmounted in order to update it with the cumulated modifications. Note that an archive entry gets modified by any operation which creates, modifies or deletes it.
Archive file unmounting is performed semi-automatically:

- Explicit unmounting is performed by calling File.umount() or File.update().
- Implicit unmounting is performed by a JVM shutdown hook which TrueZIP installs automatically.

Explicit unmounting is required to support third-party access to an archive file (see below) or to monitor progress (see below). It also allows some control over any exceptions thrown: both umount() and update() may throw an ArchiveWarningException or an ArchiveException. The client application may catch these exceptions and act on them individually (see below).
However, calling umount() or update() too often may increase the overall runtime: on each call, all remaining entries in the archive file are copied to the archive file again if the archive file already existed. If the client application explicitly unmounts the archive file after each modification, this may lead to an overall runtime of O(s*s), where s is the size of the archive file in bytes (see below).
In comparison, implicit unmounting provides the best performance because archive files are only updated if there's really a need to. It also works reliably: the JVM shutdown hook is always run unless the JVM crashes (note that an uncaught throwable terminates the JVM, but does not crash it - a JVM crash is an extremely rare situation which indicates a bug in the JVM implementation, not a bug in the JRE or the application). Furthermore, it obviates the need to introduce a call to umount() or update() in legacy applications.
The disadvantage is that the client application cannot easily detect and deal with any exceptions thrown as a result of updating an archive file: Depending on where the implicit unmount happens, either an arbitrary IOException is thrown, a boolean value is returned, or - when called from the JVM shutdown hook - just a stack trace is printed. In addition, updating an existing archive file takes linear runtime (see below). However, using long running JVM shutdown hooks is generally discouraged: They can't use java.util.logging, they can't use a GUI to monitor progress (see below) and they can only get debugged on JSE 5 or later.
Because TrueZIP associates some state with any archive file which is read and/or write accessed by the client application, it requires exclusive access to these archive files until they get unmounted again.
Third parties must not concurrently access these archive files or their entries unless the precautions outlined below have been taken! In this context, third parties are:

- Instances of the class java.io.File which are not instances of the class de.schlichtherle.io.File.
- Instances of the class de.schlichtherle.io.File which do not recognize the same set of archive files in the path due to the use of a differently working ArchiveDetector.

As a rule of thumb, the same archive file or entry within an archive file should not be accessed by different File classes (java.io.File versus de.schlichtherle.io.File) or by File instances with different ArchiveDetector parameters. This ensures that the state associated with an archive file is not shadowed or bypassed.
To ensure that all File instances recognize the same set of archive files in a path, it's recommended not to use constructors or methods of the File class with explicit ArchiveDetector parameters unless there is good reason to.

To ensure that all File instances share the same definition of the classes in this package, it's recommended to add TrueZIP's JAR to the boot class path or the extension class path.
If the prerequisites for these recommendations don't apply or the recommendations can't be followed, the client application may call File.umount() (File.update() will not work) to perform an explicit unmount. This clears all state information so that a third party can then safely access any archive file. In addition, the client application must make sure not to access the same archive file or any of its entries in any way while the third party is still accessing it.

Failure to comply with these guidelines may result in unpredictable behavior and may even cause loss of data!
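The handoff described above can be sketched like this (assuming TrueZIP on the class path; file names are made up):

```java
import de.schlichtherle.io.File;

public class ThirdPartyHandoff {
    public static void main(String[] args) throws Exception {
        // Work on the archive file through TrueZIP's VFS...
        new File("archive.zip/entry.txt").createNewFile();

        // ...then clear all associated state before any third party
        // (e.g. plain java.io.File code or an external tool) touches it.
        File.umount();

        // Now plain java.io.File may safely access the archive file --
        // provided we don't access it through TrueZIP at the same time.
        java.io.File plain = new java.io.File("archive.zip");
        System.out.println("archive size: " + plain.length() + " bytes");
    }
}
```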
umount() and update() are guaranteed to process all archive files which are in use or have been touched by the client application. However, processing some of these archive files may fail for a number of I/O related reasons. Hence, during processing, a sequential chain of archive exceptions is constructed and thrown upon termination unless it's empty. Note that sequential exception chaining is a concept which is completely orthogonal to Java's general exception cause chaining: in a sequential archive exception chain, each archive exception may still have a chain of other exceptions as its cause (most likely IOExceptions).
Archive exceptions fall into two categories:

- An ArchiveWarningException is the root of all warning exception types. These exceptions are thrown if an archive file has been completely updated, but some warning conditions apply. No data has been lost.
- An ArchiveException is the root of all other exception types (unless it's an ArchiveWarningException again). These exceptions are thrown if an archive file could not get updated completely. This implies loss of some or all data in the respective archive file.

Note that the effect which is indicated by an archive exception is local: an exception thrown when processing one archive file does not imply an archive exception or loss of data when processing another archive file.
When the archive exception chain is thrown, it's first sorted according to (1) descending order of priority and (2) ascending order of appearance, and the resulting head exception is then thrown. Since ArchiveWarningExceptions have a lower priority than ArchiveExceptions, they are always pushed back to the end of the chain, so that an application can use the following simple idiom to detect whether only some warnings or at least one severe error has occurred:

```java
try {
    File.umount(); // with or without parameters
} catch (ArchiveWarningException oops) {
    // Only instances of the class ArchiveWarningException exist in
    // the sequential chain of exceptions. We decide to ignore this.
} catch (ArchiveException ouch) {
    // At least one exception occurred which is not just an
    // ArchiveWarningException. This is a severe situation that
    // needs to be handled.

    // Print the sequential chain of exceptions in order of
    // descending priority and ascending appearance.
    //ouch.printStackTrace();

    // Print the sequential chain of exceptions in order of
    // appearance instead.
    ouch.sortAppearance().printStackTrace();
}
```

Note that the Throwable.getMessage() method (and hence Throwable.printStackTrace()) will concatenate the detail messages of the exceptions in the sequential chain in the given order.
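The sorting rule can be illustrated with a small pure-JDK model (an illustration of the concept only, not TrueZIP's actual ChainableIOException implementation; the class and field names are made up):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified model of a sequential exception chain: each element carries a
// priority (higher = more severe) and an appearance index.
class ChainedEx {
    final String message;
    final int priority;
    final int appearance;

    ChainedEx(String message, int priority, int appearance) {
        this.message = message;
        this.priority = priority;
        this.appearance = appearance;
    }
}

public class ChainDemo {
    // Sort by descending priority, then ascending appearance, so the most
    // severe, earliest exception becomes the head of the chain and warnings
    // trail behind.
    static List<ChainedEx> sortPriority(List<ChainedEx> chain) {
        List<ChainedEx> copy = new ArrayList<>(chain);
        copy.sort(Comparator.comparingInt((ChainedEx e) -> -e.priority)
                            .thenComparingInt(e -> e.appearance));
        return copy;
    }

    public static void main(String[] args) {
        List<ChainedEx> chain = new ArrayList<>();
        chain.add(new ChainedEx("warning on a.zip", 0, 0));
        chain.add(new ChainedEx("error on b.zip", 1, 1));
        chain.add(new ChainedEx("warning on c.zip", 0, 2));
        // The severe exception becomes the head; warnings move to the end.
        System.out.println(sortPriority(chain).get(0).message);
    }
}
```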
Unmounting a modified archive file is a linear runtime operation: If the size of the resulting archive file is s bytes, the operation always completes in O(s), even if only a single, small archive entry has been modified within a very large archive file. Unmounting an unmodified or newly created archive file is a constant runtime operation: It always completes in O(1). These magnitudes are independent of whether unmounting was performed explicitly or implicitly.
Now if the client application modifies each entry in a loop and accidentally triggers unmounting the archive file on each iteration, then the overall runtime increases to O(s*s)! Here's an example:
```java
String[] names = { "a", "b", "c" };
int n = names.length;
for (int i = 0; i < n; i++) {                       // n * ...
    File entry = new File("archive.zip", names[i]); // O(1)
    entry.createNewFile();                          // O(1)
    File.umount();                                  // O(i + 1) !!
}
// Overall: O(n*n) !!!
```
The bad runtime is because umount() is called within the loop. Moving it out of the loop fixes the issue:

```java
String[] names = { "a", "b", "c" };
int n = names.length;
for (int i = 0; i < n; i++) {                       // n * ...
    File entry = new File("archive.zip", names[i]); // O(1)
    entry.createNewFile();                          // O(1)
}
File.umount(); // new file: O(1); modified: O(n)
// Overall: O(n)
```
In essence: if at all possible, the client application should never call umount() or update() in a loop which modifies an archive file.
The situation gets more complicated with implicit remounting: If a file entry shall get modified which already has been modified before, TrueZIP implicitly remounts the archive file in order to avoid writing duplicated entries to it (which would waste space and may even confuse other utilities). Here's an example:
```java
String[] names = { "a", "b", "c" };
int n = names.length;
for (int i = 0; i < n; i++) {                       // n * ...
    File entry = new File("archive.zip", names[i]); // O(1)
    entry.createNewFile(); // First modification: O(1)
    entry.createNewFile(); // Second modification triggers remount: O(i + 1) !!
}
// Overall: O(n*n) !!!
```
Each call to createNewFile() is a modification operation. Hence, on the second call, TrueZIP needs to do an implicit remount, which writes all entries in the archive file created so far to disk again.
Unfortunately, a modification operation is not always so easy to spot. Consider the following example to create an archive file with empty entries which all share the same last modification time:
```java
long time = System.currentTimeMillis();
String[] names = { "a", "b", "c" };
int n = names.length;
for (int i = 0; i < n; i++) {                       // n * ...
    File entry = new File("archive.zip", names[i]); // O(1)
    entry.createNewFile();       // First modification: O(1)
    entry.setLastModified(time); // Second modification triggers remount: O(i + 1) !!
}
// Overall: O(n*n) !!!
```
When setLastModified() gets called, the entry has already been written, so an implicit remount is triggered which writes all entries in the archive file created so far to disk again.
Detail: This deficiency is caused by archive file formats: all currently supported archive types require an entry's meta data (including the last modification time) to be written before its content in the archive file. So if the meta data is to be modified, the archive entry and hence the whole archive file needs to get rewritten, which is what the implicit remount is doing.
To avoid accidental remounting when copying data, you should consider using the advanced copy methods instead. These methods are easy to use, work reliably and provide superior performance.
When unmounting, the client application can monitor the progress from another thread using File.getLiveArchiveStatistics(). The returned instance is a proxy which returns live statistics about the update process.
Here's an example how to monitor unmounting progress on standard error output after an initial delay of two seconds:
```java
class ProgressMonitor extends Thread {
    Long[] args = new Long[2];
    ArchiveStatistics liveStats = File.getLiveArchiveStatistics();

    ProgressMonitor() {
        setPriority(Thread.MAX_PRIORITY);
        setDaemon(true);
    }

    public void run() {
        boolean run = false;
        for (long sleep = 2000; ; sleep = 200, run = true) {
            try {
                Thread.sleep(sleep);
            } catch (InterruptedException shutdown) {
                break;
            }
            showProgress();
        }
        if (run) {
            showProgress();
            System.err.println();
        }
    }

    void showProgress() {
        // Round up to kilobytes.
        args[0] = new Long(
                (liveStats.getUpdateTotalByteCountRead() + 1023) / 1024);
        args[1] = new Long(
                (liveStats.getUpdateTotalByteCountWritten() + 1023) / 1024);
        System.err.print(MessageFormat.format(
                "Top level archive IO: {0} / {1} KB        \r", args));
    }

    void shutdown() {
        interrupt();
        try {
            join();
        } catch (InterruptedException interrupted) {
            interrupted.printStackTrace();
        }
    }
}

// ...

ProgressMonitor monitor = new ProgressMonitor();
monitor.start();
try {
    File.umount();
} finally {
    monitor.shutdown();
}
```
Here are some guidelines to find the right balance between performance and control:

- Calling umount() is recommended in order to handle exceptions explicitly, but not required, because TrueZIP's JVM shutdown hook takes care of unmounting anyway and prints the stack trace of any exceptions on the standard error output.
- Otherwise, umount() or update() should not get called unless either third party access or explicit exception handling is required.
- umount() is generally preferred over update() for safety reasons.

The top level entries in an archive file build its root directory. The root directory is never written to the output when an archive file is modified.
To the client application, the root directory behaves like any other directory and is addressed by naming the archive file in a path: for example, the client application may list its contents by calling File.list() or File.listFiles().
The root directory receives its last modification time from the archive file whenever it's read. Likewise, the archive file will receive the root directory's last modification time whenever it's written.
While this is a proper emulation of the behavior of real file systems, it may confuse users if only entries which are located one level or more below the root directory have been changed in an existing archive file: In this case, the last modification time of the root directory is not updated and hence the archive file's last modification time will not reflect the changes in the deeper directory levels.
As a workaround, the client application can use the idiom File.isArchive() && File.isDirectory() to detect an archive file and then explicitly change the last modification time of its root directory by calling File.setLastModified(long).
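This workaround can be sketched as follows (assuming TrueZIP on the class path; the file name is made up):

```java
import de.schlichtherle.io.File;

public class TouchArchiveRoot {
    public static void main(String[] args) {
        File file = new File("archive.zip");
        // Detect a real archive file (not a false positive) and touch its
        // root directory so the archive file's own last modification time
        // reflects changes made in deeper directory levels.
        if (file.isArchive() && file.isDirectory())
            file.setLastModified(System.currentTimeMillis());
    }
}
```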
An archive may contain directories for which no entry is present in the file, although they contain at least one member in their directory tree for which an entry is actually present in the file. Similarly, if File.isLenient() returns true (which is the default), an archive entry may be created in an archive file although its parent directory hasn't been explicitly created by calling File.mkdir() before.
Such a directory is called a ghost directory: Like the root directory, a ghost directory is not written to the output whenever an archive file is modified. This is to mimic the behavior of most archive utilities which do not create archive entries for directories.
To the client application, a ghost directory behaves like a regular directory, with the exception that its last modification time as returned by File.lastModified() is 0L. If the client application sets the last modification time explicitly using File.setLastModified(long), then the ghost directory reincarnates as a regular directory and will be written to the archive file.
Mind that a ghost directory can only exist within an archive file, but not every directory within an archive file is actually a ghost directory.
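The ghost directory behavior described above can be sketched like this (assuming TrueZIP on the class path; the entry names are made up):

```java
import de.schlichtherle.io.File;
import java.io.IOException;

public class GhostDirectory {
    public static void main(String[] args) throws IOException {
        // With File.isLenient() == true (the default), creating this entry
        // implicitly creates its parent "dir" as a ghost directory.
        File entry = new File("archive.zip/dir/entry.txt");
        entry.createNewFile();

        File ghost = new File("archive.zip/dir");
        System.out.println(ghost.lastModified()); // 0L for a ghost directory

        // Setting a last modification time reincarnates it as a regular
        // directory, which will then be written to the archive file.
        ghost.setLastModified(System.currentTimeMillis());
    }
}
```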
File paths may be composed of elements which either refer to regular nodes in the real file system (directories, files or special files), including top level archive files, or refer to entries within an archive file.
As usual in Java, elements in a path which refer to regular nodes may be case sensitive or not in TrueZIP's VFS, depending on the real file system and/or the platform.
However, elements in a path which refer to archive entries are always case sensitive. This enables the client application to address all files in existing archive files, regardless of the operating system they've been created on.
For existing archive files, redundant elements in entry names such as the empty string (""), the dot (".") directory, or the dot-dot ("..") directory are removed in the VFS when the archive file is read and are not retained when the archive file is modified.
If an entry name contains characters which have no representation in the character set of the corresponding archive file type, then all file operations to create the archive entry will fail gracefully according to the documented contract of the respective operation. This is to protect the client application from creating archive entries which cannot get encoded and decoded again correctly. For example, the Euro sign (€) does not have a representation in the IBM437 character set and hence cannot be used for entries in ordinary ZIP files unless TrueZIP's configuration is customized to use another charset.
If an archive file contains entries with absolute entry names, such as /readme.txt rather than readme.txt, the client application cannot address these entries using the VFS in this package. However, these entries are retained like any other entry whenever the client application modifies the archive file. This should not impose problems, as absolute entry names should never be used anyway, and I'm not aware of any recent tools which would allow creating them.
If an archive file contains both a file and a directory entry with the same name, it's up to the individual methods how they behave in this case. This can only happen with archive files created by external tools. Both File.isDirectory() and File.isFile() will return true in this case, and in fact they are the only methods the client application can rely upon to act properly in this situation: many other methods use a combination of isDirectory() and isFile() calls and will show undefined behavior.
The good news is that both the file and the directory coexist in the virtual archive file system implemented by this package. Thus, whenever the archive file is modified, both entries will be retained and no data gets lost. This allows you to use another tool to fix the issue in the archive file. TrueZIP never allows the client application to create such an archive file, however.