The slowest possible method is to compare two files byte by byte. The fastest I've been able to come up with is a similar comparison, but instead of one byte at a time, you use an array of bytes sized to Int64 and then compare the resulting numbers. Here's what I came up with:

    const int BYTES_TO_READ = sizeof(Int64);

    static bool FilesAreEqual(FileInfo first, FileInfo second)
    {
        if (first.Length != second.Length)
            return false;

        if (string.Equals(first.FullName, second.FullName, StringComparison.OrdinalIgnoreCase))
            return true;

        int iterations = (int)Math.Ceiling((double)first.Length / BYTES_TO_READ);

        using (FileStream fs1 = first.OpenRead())
        using (FileStream fs2 = second.OpenRead())
        {
            byte[] one = new byte[BYTES_TO_READ];
            byte[] two = new byte[BYTES_TO_READ];

            for (int i = 0; i < iterations; i++)
            {
                fs1.Read(one, 0, BYTES_TO_READ);
                fs2.Read(two, 0, BYTES_TO_READ);

                if (BitConverter.ToInt64(one, 0) != BitConverter.ToInt64(two, 0))
                    return false;
            }
        }

        return true;
    }

In my testing, I was able to see this outperform a straightforward ReadByte() scenario by almost 3:1. Averaged over 1000 runs, I got this method at 1063ms, and the method below (straightforward byte-by-byte comparison) at 3031ms. Hashing always came back sub-second, at around an average of 865ms. This testing was with an ~100MB video file.

Here are the ReadByte and hashing methods I used, for comparison purposes:

    static bool FilesAreEqual_OneByte(FileInfo first, FileInfo second)
    {
        if (first.Length != second.Length)
            return false;

        using (FileStream fs1 = first.OpenRead())
        using (FileStream fs2 = second.OpenRead())
        {
            for (int i = 0; i < first.Length; i++)
            {
                if (fs1.ReadByte() != fs2.ReadByte())
                    return false;
            }
        }

        return true;
    }

    static bool FilesAreEqual_Hash(FileInfo first, FileInfo second)
    {
        byte[] firstHash = MD5.Create().ComputeHash(first.OpenRead());
        byte[] secondHash = MD5.Create().ComputeHash(second.OpenRead());

        for (int i = 0; i < firstHash.Length; i++)
        {
            if (firstHash[i] != secondHash[i])
                return false;
        }

        return true;
    }

If you do decide you truly need a full byte-by-byte comparison (see other answers for discussion of hashing), then the easiest solution is:

    public static bool AreFileContentsEqual(String path1, String path2) =>
        File.ReadAllBytes(path1).SequenceEqual(File.ReadAllBytes(path2));

Or, taking FileInfo arguments and skipping the reads entirely when the lengths already differ:

    public static bool AreFileContentsEqual(FileInfo fi1, FileInfo fi2) =>
        fi1.Length == fi2.Length &&
        (fi1.Length == 0 || File.ReadAllBytes(fi1.FullName).SequenceEqual(
                            File.ReadAllBytes(fi2.FullName)));

Unlike some other posted answers, this is conclusively correct for any kind of file: binary, text, media, executable, etc. But as a full binary comparison, files that differ only in "unimportant" ways (such as BOM, line endings, character encoding, media metadata, whitespace, padding, source-code comments, etc.; see note 1) will always be considered not-equal.

This code loads both files into memory entirely, so it should not be used for comparing truly gigantic files. Beyond that important caveat, full loading isn't really a penalty given the design of the .NET GC (because it's fundamentally optimized to keep small, short-lived allocations extremely cheap), and it could even be optimal when file sizes are expected to be less than 85K, because using a minimum of user code (as shown here) means maximally delegating file-performance issues to the CLR, BCL, and JIT, which benefit from (e.g.) the latest design technology, system code, and adaptive runtime optimizations.

Furthermore, for such workaday scenarios, concerns about the performance of byte-by-byte comparison via LINQ enumerators (as shown here) are moot, since hitting the disk at all for file I/O will dwarf, by several orders of magnitude, the benefits of the various memory-comparing alternatives. For example, even though SequenceEqual does in fact give us the "optimization" of abandoning on the first mismatch, this hardly matters after having already fetched the files' contents, each fully necessary for any true-positive case.

Note 1: An obscure exception: NTFS alternate data streams are not examined by any of the answers discussed on this page and thus may differ for files otherwise considered the "same."

I reduced the average comparison time to 1/4. It gets even faster if you don't read in small 8-byte chunks but instead read a larger chunk and put a loop around the Int64 comparison:

    public static bool FilesContentsAreEqual(FileInfo fileInfo1, FileInfo fileInfo2)
    {
        bool result;

        if (fileInfo1.Length != fileInfo2.Length)
        {
            result = false;
        }
        else
        {
            using (var file1 = fileInfo1.OpenRead())
            using (var file2 = fileInfo2.OpenRead())
            {
                result = StreamsContentsAreEqual(file1, file2);
            }
        }

        return result;
    }

    private static bool StreamsContentsAreEqual(Stream stream1, Stream stream2)
    {
        const int bufferSize = 1024 * sizeof(Int64);
        var buffer1 = new byte[bufferSize];
        var buffer2 = new byte[bufferSize];

        while (true)
        {
            int count1 = stream1.Read(buffer1, 0, bufferSize);
            int count2 = stream2.Read(buffer2, 0, bufferSize);

            if (count1 != count2)
                return false;

            if (count1 == 0)
                return true;

            int iterations = (int)Math.Ceiling((double)count1 / sizeof(Int64));
            for (int i = 0; i < iterations; i++)
            {
                if (BitConverter.ToInt64(buffer1, i * sizeof(Int64)) !=
                    BitConverter.ToInt64(buffer2, i * sizeof(Int64)))
                {
                    return false;
                }
            }
        }
    }
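The chunked-loop idea can be taken a step further on newer runtimes. Here is a minimal sketch, assuming .NET Core 2.1+ (or .NET 5+), where ReadOnlySpan&lt;byte&gt;.SequenceEqual performs an optimized memory comparison of each chunk; the ReadFull helper is my own name, added because Stream.Read may legally return fewer bytes than requested:

```csharp
using System;
using System.IO;

static class FileCompare
{
    // Sketch: read a large chunk from each stream in a loop and compare the
    // chunks with ReadOnlySpan<byte>.SequenceEqual (optimized memory compare).
    public static bool StreamsContentsAreEqual(Stream stream1, Stream stream2)
    {
        const int bufferSize = 1024 * sizeof(long);
        byte[] buffer1 = new byte[bufferSize];
        byte[] buffer2 = new byte[bufferSize];

        while (true)
        {
            int count1 = ReadFull(stream1, buffer1);
            int count2 = ReadFull(stream2, buffer2);

            if (count1 != count2)
                return false;            // remaining lengths differ

            if (count1 == 0)
                return true;             // both streams exhausted together

            if (!new ReadOnlySpan<byte>(buffer1, 0, count1)
                    .SequenceEqual(new ReadOnlySpan<byte>(buffer2, 0, count2)))
                return false;
        }
    }

    // Stream.Read may return fewer bytes than requested, so keep reading
    // until the buffer is full or the stream ends.
    private static int ReadFull(Stream stream, byte[] buffer)
    {
        int total = 0;
        int read;
        while (total < buffer.Length &&
               (read = stream.Read(buffer, total, buffer.Length - total)) > 0)
        {
            total += read;
        }
        return total;
    }
}

class Program
{
    static void Main()
    {
        var same1 = new MemoryStream(new byte[] { 1, 2, 3 });
        var same2 = new MemoryStream(new byte[] { 1, 2, 3 });
        var other = new MemoryStream(new byte[] { 1, 2, 4 });

        Console.WriteLine(FileCompare.StreamsContentsAreEqual(same1, same2)); // True
        same1.Position = 0;
        Console.WriteLine(FileCompare.StreamsContentsAreEqual(same1, other)); // False
    }
}
```

Filling the buffer completely before comparing matters: a plain Read can return short even mid-stream, in which case comparing raw counts alone could misreport two equal streams as different.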
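For the hashing route from the timings above, here is a hedged sketch (the string-path signature and variable names are mine, not from the original answers): it wraps the streams and the hash object in using blocks so they are disposed, and compares the two digests with LINQ's SequenceEqual instead of an index loop. MD5 matches the benchmarked method; SHA256.Create() can be substituted where collision resistance matters.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

static class HashCompare
{
    // Sketch: hash both files and compare the digests. Note that hashing
    // still reads both files in full, so it cannot exit early on the
    // first mismatching byte.
    public static bool FilesAreEqual_Hash(string path1, string path2)
    {
        using (var hasher = MD5.Create())   // or SHA256.Create()
        using (var fs1 = File.OpenRead(path1))
        using (var fs2 = File.OpenRead(path2))
        {
            byte[] firstHash = hasher.ComputeHash(fs1);
            byte[] secondHash = hasher.ComputeHash(fs2);
            return firstHash.SequenceEqual(secondHash);
        }
    }
}

class Program
{
    static void Main()
    {
        string p1 = Path.GetTempFileName();
        string p2 = Path.GetTempFileName();

        File.WriteAllText(p1, "same content");
        File.WriteAllText(p2, "same content");
        Console.WriteLine(HashCompare.FilesAreEqual_Hash(p1, p2)); // True

        File.WriteAllText(p2, "other content");
        Console.WriteLine(HashCompare.FilesAreEqual_Hash(p1, p2)); // False

        File.Delete(p1);
        File.Delete(p2);
    }
}
```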