Prerequisite packages to install
================================

- dev-vcs/cvs
- dev-vcs/cvs-fast-export
- dev-vcs/git
- dev-libs/libxslt (for userinfo.xml conversion)

Create the author map
=====================

Extract userinfo.xml from LDAP on dev.gentoo.org::

    $ perl_ldap -U

Create authormap.txt from userinfo.xml::

    $ ./make-authormap.sh >authormap.txt

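
Before starting the long export run, it may be worth checking that
every line of authormap.txt has the form that cvs-fast-export(1)
documents for author maps, i.e. ``login = Full Name <address>``
optionally followed by a timezone. The following is only a sketch:
the function name ``check_authormap`` and the regular expression are
illustrative and not part of make-authormap.sh, and the check is
deliberately loose::

    # Print (with line numbers) every line that does not look like
    #   login = Full Name <address> [timezone]
    # Returns 0 if all lines match, 1 if some do not, and 2 if the
    # file cannot be read.
    check_authormap() {
        [ -r "$1" ] || return 2
        if LC_ALL=C grep -n -v -E \
            '^[^ =]+ *= *[^<>]+ <[^<> ]+>( +[^ ]+)?$' "$1"
        then
            return 1    # grep printed at least one malformed line
        fi
        return 0
    }

Calling ``check_authormap authormap.txt`` should print nothing when
the map is well-formed.
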
Fetch and unpack the CVS repository
===================================

Fetch a copy of the archived gentoo-x86 CVS repository from
https://projects.gentoo.org/vcs-history/gentoo-x86.tar.gz
and unpack it.

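
This step could be scripted roughly as follows. It is a sketch under
assumptions: ``wget`` is just one possible download tool, the helper
name ``unpack_cvs_tarball`` is made up for illustration, and the
archive is assumed to unpack into ``var/cvsroot/gentoo-x86``
(inferred from the ``cd`` command used for cvs-fast-export)::

    url=https://projects.gentoo.org/vcs-history/gentoo-x86.tar.gz

    # Unpack tarball $1 into directory $2 (default: current directory)
    # and check that the expected CVS module directory exists afterwards.
    unpack_cvs_tarball() {
        dest=${2:-.}
        mkdir -p "$dest" &&
        tar -xzf "$1" -C "$dest" &&
        [ -d "$dest/var/cvsroot/gentoo-x86" ]
    }

    # Typical use (downloads a large file, hence commented out):
    #   wget -c "$url" && unpack_cvs_tarball "${url##*/}"
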
Run cvs-fast-export
===================

::

    $ cd var/cvsroot/gentoo-x86
    $ find . | cvs-fast-export -A /path/to/authormap.txt -l /path/to/gentoo-x86-export.log -p >/path/to/gentoo-x86-export.out

This will run for some time (8 hours on an i7-8700), mostly as a single
thread, and produce a 21 GiB output file.

The CVS repository contains a package app-backup/Attic, which confuses
cvs-fast-export: "Files in CVS Attic and RCS directories are treated
as though the 'Attic/' or 'RCS/' portion of the path were absent."
This can be seen in the output file (note that the ``Attic`` path
component is missing)::

    ----------------------------------------------------------------------
    commit refs/heads/master
    mark :5149424
    committer Hanno Böck <hanno@gentoo.org> 1431281161 +0000
    data 118
    Initial commit of Attic
    (Portage version: 2.2.18/cvs/Linux x86_64, signed Manifest commit with key A5880072BBB51E42)
    from :5149420
    M 100644 :5149421 app-backup/Attic-0.15.ebuild
    M 100644 :5149422 app-backup/ChangeLog
    M 100644 :5149423 app-backup/metadata.xml
    ----------------------------------------------------------------------
    ----------------------------------------------------------------------
    commit refs/heads/master
    mark :5149426
    committer Hanno Böck <hanno@gentoo.org> 1431281167 +0000
    data 118
    Initial commit of Attic
    (Portage version: 2.2.18/cvs/Linux x86_64, signed Manifest commit with key A5880072BBB51E42)
    from :5149424
    M 100644 :5149425 app-backup/Manifest
    ----------------------------------------------------------------------

This is fixed by an additional sed filter in the following step.

Import into Git
===============

::

    $ mkdir gentoo-x86-git
    $ cd gentoo-x86-git
    $ git init
    $ LC_ALL=C sed '/^Initial commit of Attic$/,/^M [0-7]\{6\} .* app-backup\/Manifest/{s:^\(M [0-7]\{6\} .* app-backup/\)\(.*\):\1Attic/\2:}' \
        /path/to/gentoo-x86-export.out | git fast-import

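
To see what the sed filter does, it can be tried on a small excerpt
first. The input below is abbreviated from the two ``Initial commit of
Attic`` commits in the export output, except for its last ``M`` line
(``app-backup/bacula/ChangeLog`` with mark ``:5149430``), which is a
made-up stand-in for a later commit that must stay untouched. The
variable names are illustrative only::

    sample='Initial commit of Attic
    M 100644 :5149421 app-backup/Attic-0.15.ebuild
    M 100644 :5149422 app-backup/ChangeLog
    M 100644 :5149423 app-backup/metadata.xml
    Initial commit of Attic
    M 100644 :5149425 app-backup/Manifest
    M 100644 :5149430 app-backup/bacula/ChangeLog'

    # Same sed expression as in the import command: from the first
    # "Initial commit of Attic" line up to and including the Manifest
    # line, re-insert the Attic/ path component that cvs-fast-export
    # dropped from the file paths.
    filtered=$(printf '%s\n' "$sample" | LC_ALL=C sed '/^Initial commit of Attic$/,/^M [0-7]\{6\} .* app-backup\/Manifest/{s:^\(M [0-7]\{6\} .* app-backup/\)\(.*\):\1Attic/\2:}')
    printf '%s\n' "$filtered"

With this input, the first four ``M`` lines come out with
``app-backup/Attic/`` paths, while the bacula line, which lies after
the end of the address range, is passed through unchanged.
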
Differences from the old conversion
===================================

- cvs-fast-export(1) says:
  "A set of file operations is coalesced into a changeset if either
  (a) they all share the same commitid, or (b) all have no commitid
  but identical change comments, authors, and modification dates
  within the window defined by the time-fuzz parameter."

  For our case this means that for commits after 2006-03-04T10:23:03Z
  (commit 531f1a00a131) the commitid has been used to group them
  together, while earlier ones have been grouped by authors and commit
  messages, within a 5-minute time window (which is the default
  for the fuzz parameter).

  This results in a total of 1688447 commits in the master branch,
  while the old conversion has only 788893 commits. Most of the
  difference can be explained by the fact that ``repoman commit``
  actually did two CVS commits, the second one for the Manifest to
  catch up with the updated $Header$ keywords. Since this reflects
  the actual workflow, no attempts have been made to squash these
  pairs of commits.

- The new conversion has a complete author map. Previously, users
  cbrannon, jerrya, luke-jr, and uid2214 (darkside) were missing.

- Commit messages have been left alone. For example, no conversion
  to Git footer lines has taken place. Conversion of character sets
  wasn't attempted either. (There are 310 commit messages with
  non-UTF-8 characters. About 80% of them appear to be latin-1,
  but the rest are in some other encoding or just contain garbage
  characters.)

- Category app-backup is now present (it was missing from the old
  conversion).

- File sci-libs/qfits/Manifest in HEAD differs. The new conversion
  agrees with the last CVS checkout.

- The new conversion has a .gitignore file in its top-level directory.
  Also, metadata/.cvsignore was renamed to metadata/.gitignore
  (cvs-fast-export does this automatically).

- Output of ``diff -qr --exclude=.git`` between tips of old and new
  repo::

      Only in gentoo-x86-git: .gitignore
      Only in gentoo-x86-git: app-backup
      Files historical/header.txt and gentoo-x86-git/header.txt differ
      Only in historical/metadata: .cvsignore
      Only in gentoo-x86-git/metadata: .gitignore
      Files historical/sci-libs/qfits/Manifest and gentoo-x86-git/sci-libs/qfits/Manifest differ

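
The commit counts and the tree comparison above can be reproduced with
standard Git and diffutils commands. A minimal sketch, assuming the
old conversion is checked out in ``historical`` and the new one in
``gentoo-x86-git`` (the directory names from the diff output), and
that the branch of interest is called ``master`` in both; the helper
name ``count_commits`` is illustrative::

    # Number of commits reachable from a ref (default: master)
    # in the repository located at $1.
    count_commits() {
        git -C "$1" rev-list --count "${2:-master}"
    }

    # Typical use, run from the directory containing both checkouts:
    #   count_commits gentoo-x86-git
    #   count_commits historical
    #   diff -qr --exclude=.git historical gentoo-x86-git
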
Notes
=====

Keyword expansion
-----------------

Although the man page of cvs-fast-export (version 1.57) says that the
program "does the equivalent of cvs -kb when checking out masters, not
performing any $-keyword expansion at all", it actually does expand
$-keywords.

For the tip of the trunk, expanded keywords appear to be correct,
as can be verified with Manifest checksums. This is not always true
earlier in history. For example, the CVS repository was located in
/home/cvsroot and moved to /var/cvsroot later (``$Header$`` lines
suggest that this move happened in early 2004). Also it is known that
some files were moved in the raw repository. Expanded keywords from
before such a move won't match.

Branch points
-------------

cvs-fast-export-1.57 gets confused about branch points if a file
doesn't have any commits on the trunk that are newer than those on the
branch.

This triggers some warnings during conversion::

    cvs-fast-export: warning - non-vendor ./app-admin/analog/files/analog.cfg,v branch RELEASE-1_4 has no parent
    [and many more of the same type]
    cvs-fast-export: warning - branch point import-1.1.1 -> master later than branch
    cvs-fast-export: trunk(85563): 2005-11-30T09:36:17Z en.txt 1.1
    cvs-fast-export: branch(85563): 2005-11-30T09:38:30Z app-accessibility/SphinxTrain/files/digest-SphinxTrain-0.9.1-r1 1.1

It also results in commits from the branch showing up in the converted
Git master branch. The problem has been `reported upstream`__.

For the time being, this is worked around by adding an extra commit to
the trunk (and removing it from the converted repository later)::

    $ export CVSROOT=/var/cvsroot
    $ cvs checkout gentoo-x86
    $ cd gentoo-x86
    $ for file in $(find . -type d -name CVS -prune -o -type f -print); do echo >>${file}; done
    $ cvs commit -m "extra commit in trunk"

__ https://gitlab.com/esr/cvs-fast-export/-/issues/57

Missing app-games category
--------------------------

It is known that some files and directories have been moved, copied or
even deleted in the (server-side) RCS directory. This was advocated__
as late as 2005. For example, the whole ``app-games`` category was
deleted__ server-side at some time in late 2003 or early 2004, after
its packages had been moved to ``games-*`` categories.

Obviously, the history of these files is lost and there is no way for
the conversion to recover it.

__ https://archives.gentoo.org/gentoo-dev/message/029e91bdc515ddc5ae205b4694e00e91
__ https://archives.gentoo.org/gentoo-dev/message/ad7fa1ecae70e59d43ac70548076afcd

.. This work is licensed under the Creative Commons
   Attribution-ShareAlike 4.0 International License.
   https://creativecommons.org/licenses/by-sa/4.0/

.. Local Variables:
.. mode: rst
.. indent-tabs-mode: nil
.. End: