I started doing the real work a few days ago. So what gives? I have
branched darcs and started porting the relevant bits over to
hashed-storage. Along the way, hashed-storage has received some
improvements. For the most part, these were darcs compatibility improvements
(in the darcs hashed pristine code) and improvements in the tree diffing
department. The tree diff is now fully symmetric, which is required for
--look-for-adds. Efficiency has suffered a little, but I don’t really expect
this to show up in profiles.
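To make the symmetry concrete, here is a minimal sketch that models a tree as a flat map from paths to content hashes. This is a simplification: hashed-storage’s real Tree type and diffTree look different, and flatDiff, Change and isSymmetricOn are hypothetical names. Every path on either side is reported, so a file that exists only in the working tree shows up as an add, which is the kind of result --look-for-adds needs. Diffing in the other direction gives the same changes with adds and removals swapped:

```haskell
import Data.List (sort)
import qualified Data.Map.Strict as Map

-- Hypothetical flat model of a tree: path -> content hash.
-- (hashed-storage's real Tree type is nested; this is a simplification.)
type FlatTree = Map.Map FilePath String

-- One entry of a diff from an old tree to a new tree.
data Change
  = Added FilePath     -- present only in the new tree
  | Removed FilePath   -- present only in the old tree
  | Modified FilePath  -- present in both, with different content
  deriving (Eq, Ord, Show)

-- Walk the union of both trees: paths that exist on only one side
-- are reported too, instead of being silently skipped.
flatDiff :: FlatTree -> FlatTree -> [Change]
flatDiff old new =
  [Removed p | p <- Map.keys (Map.difference old new)]
    ++ [Added p | p <- Map.keys (Map.difference new old)]
    ++ [ Modified p
       | (p, (a, b)) <- Map.toList (Map.intersectionWith (,) old new)
       , a /= b
       ]

-- Reverse the direction of a single change.
invert :: Change -> Change
invert (Added p)    = Removed p
invert (Removed p)  = Added p
invert (Modified p) = Modified p

-- Symmetry property: diffing in the other direction yields the same
-- changes, with adds and removals swapped.
isSymmetricOn :: FlatTree -> FlatTree -> Bool
isSymmetricOn a b = sort (map invert (flatDiff a b)) == sort (flatDiff b a)
```

An asymmetric diff that only walked the old tree would never notice paths present solely in the new one, which is why a one-sided walk is not enough for finding adds.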
In darcs, I have mostly implemented safe index manipulation, i.e. not allowing the index to get out of date with regard to tracked files. The nature of the index requires that each tracked file be present in it, so that we don’t need to read the actual working or pristine directory contents.
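A minimal sketch of why a complete index lets whatsnew skip reading file contents, assuming each index entry caches a file’s size, modification time and hash. The record fields and the needsRead and filesToRead helpers are hypothetical, not the actual darcs-hs index format. A tracked file whose current stat matches its entry is taken as unchanged; a tracked file with no entry at all has to be read, which is what "out of date with regard to tracked files" costs:

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical cached metadata for one tracked file.
data IndexEntry = IndexEntry
  { ixSize  :: Integer  -- size in bytes when the file was last indexed
  , ixMTime :: Integer  -- modification time (seconds) when last indexed
  , ixHash  :: String   -- content hash computed when last indexed
  } deriving (Eq, Show)

-- Hypothetical subset of what stat() reports for a working file.
data Stat = Stat
  { stSize  :: Integer
  , stMTime :: Integer
  } deriving (Eq, Show)

type Index = Map.Map FilePath IndexEntry

-- True when the cached hash cannot be trusted and the file's contents
-- have to be read. A tracked file missing from the index always has
-- to be read, which is why the index must cover every tracked file.
needsRead :: Index -> FilePath -> Stat -> Bool
needsRead ix path st = case Map.lookup path ix of
  Nothing -> True
  Just e  -> ixSize e /= stSize st || ixMTime e /= stMTime st

-- Given stat results for the tracked files, list those that must be read.
filesToRead :: Index -> [(FilePath, Stat)] -> [FilePath]
filesToRead ix stats = [p | (p, st) <- stats, needsRead ix p st]
```

In the common case where nothing changed, this reduces whatsnew to one stat per tracked file plus a lookup, with no file contents read at all.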
Unfortunately, the index still doesn’t work very well with paths that have spaces in them (which is weird, since the index doesn’t particularly care about what is stored in the path, but I’ll investigate that later). This also means that I can’t test on ghc-hashed, but I can test on ghc-testsuite, which seems to be the more interesting of the two anyway. The numbers (with a hot cache in both cases):
darcs wh 0,87s user 0,12s system 94% cpu 1,046 total
darcs-hs wh 0,06s user 0,03s system 84% cpu 0,100 total
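Dividing the two total (wall-clock) times above:

$$\frac{1{,}046\ \mathrm{s}}{0{,}100\ \mathrm{s}} \approx 10{,}5$$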
That gives about a tenfold speedup for whatsnew on hashed repositories. This also fixes the infamous “timestamps get out of sync all the time” bug, which usually manifests as darcs spending an extraordinarily long time “reading pristine”. After branching the ghc-testsuite repo, I get the following in the newly created branch, which has broken timestamps wherever hardlinks work (hot cache again):
darcs wh 5,91s user 0,56s system 91% cpu 7,033 total
To get back to darcs-hs: at least on my machine, it seems to pass the darcs testsuite (although it took some tweaking to get there). Nevertheless, I have discovered some further issues that the suite does not cover. Still, at least for now, it should be safe to use darcs-hs, as the new code is “read-only”: it is only used for whatsnew, never for creating patches.
Next week, I’ll work some more on getting record to use the new (index-based) diffing code. I have already started, but I’m still failing a bunch of tests, and they are not yet trivial to fix. I should also look into bringing back the optimised version of the filepath-restricted diff. I had to disable it, since it’s not clear how to make it work with pending renames (the original darcs approach doesn’t apply to my version, sadly).
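For reference, a minimal sketch of the restriction step a filepath-restricted diff performs before any diffing happens, on a hypothetical flat path-to-hash model of a tree (restrictTree and underAny are illustrative names, not darcs or hashed-storage API). It filters a tree down to the requested paths and their subtrees. It deliberately does not attempt to handle pending renames, which is the open problem mentioned above:

```haskell
import Data.List (isPrefixOf)
import qualified Data.Map.Strict as Map

-- Hypothetical flat tree model: relative '/'-separated path -> content hash.
type FlatTree = Map.Map FilePath String

-- Is `path` one of the requested paths, or inside one of them?
-- The "/" suffix keeps "src" from matching an unrelated "srcx".
underAny :: [FilePath] -> FilePath -> Bool
underAny roots path = any covers roots
  where
    covers r = r == path || (r ++ "/") `isPrefixOf` path

-- Keep only the entries the user asked about, so a diff of two
-- restricted trees never has to look at unrelated files.
-- Note: this does nothing about pending renames.
restrictTree :: [FilePath] -> FlatTree -> FlatTree
restrictTree roots = Map.filterWithKey (\p _ -> underAny roots p)
```

The gain comes from never touching the unrelated parts of the tree; the disabled, unrestricted version is slower but, as the changelog below puts it, correct.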
That’s it. I’m attaching a summary of the changes in the individual repositories. The first one is hashed-storage (get it from http://repos.mornfall.net/hashed-storage):
- Make the diffTree implementation symmetric.
- Implement unlink and rename in Monad.
- Omit missing files when reading an indexed tree.
- Handle empty hashed pristine directories (they may omit gzip header).
- Allow item/subtree removal in modifyTree.
- Implement zipAllFiles, zipAllDirs.
- Add emptyBlob, in addition to emptyTree.
- Concede to also darcs-formatting sha1 sums sans the size prefix.
- Fix cabal build-type to custom (we implement cabal test now).
- Allow checking for file existence as a result of stat.
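The "empty hashed pristine directories" entry above can be illustrated with a small sketch, assuming an empty directory listing is stored as a zero-length file with no gzip header at all (an assumption; the entry only says such files may omit the header). The check relies on the gzip magic bytes 0x1f 0x8b from RFC 1952. Decompression itself is left out, since it would need the zlib package, and DirBlob and classifyDirBlob are hypothetical names:

```haskell
import qualified Data.ByteString as B

-- gzip streams begin with the magic bytes 0x1f 0x8b (RFC 1952).
gzipMagic :: B.ByteString
gzipMagic = B.pack [0x1f, 0x8b]

-- How a stored hashed directory listing should be handled before parsing.
data DirBlob
  = EmptyDir                  -- no bytes at all: empty directory, no gzip header
  | Gzipped B.ByteString      -- gzip data: decompress, then parse the listing
  | Unrecognised B.ByteString -- anything else (hypothetical error case)
  deriving (Eq, Show)

-- Check for the empty case first, so a headerless empty file is never
-- handed to the decompressor.
classifyDirBlob :: B.ByteString -> DirBlob
classifyDirBlob bs
  | B.null bs                   = EmptyDir
  | gzipMagic `B.isPrefixOf` bs = Gzipped bs
  | otherwise                   = Unrecognised bs
```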
The other one is darcs-hs, from http://repos.mornfall.net/darcs/darcs-hs:
- Factor out a common bit in WhatsNew.lhs.
- Import relevant bits of gorsvet, for now under Darcs.Gorsvet.
- Handle adds and removals in treeDiff.
- Kill a bunch of unused imports.
- Convenience wrapper for restrict_paths for use in Darcs.
- Make the trailing newline shuffling in treeDiff a little less fragile.
- Appease haskell_policy. (Sigh.)
- Disable restriction in unrecordedChanges for now (less efficient but correct).
- Implement basic index maintenance functionality.
- Bomb out from unrecordedChanges when pending is buggy.
- Invalidate index at key positions in relevant (pristine-modifying) commands.
- Use index for diffing in the basic whatsnew scenario.