This is weekly progress report no. 2 for Project Grumpy.
As reported previously, I am building a system to index portage packages
and related metadata to make package maintainership a bit easier for
developers.
First, a few words about the document metadata storage. For this project, the
plan is to use a document-oriented, schema-free database (MongoDB) instead
of a regular relational database system (like SQLite or PostgreSQL).
This also means that we can create a single document collection in which
each document corresponds to one "category/package" and the collection as a
whole contains the entire ebuild tree.
Each document in the collection is just a JSON-formatted dictionary with the
following structure (beware, this is a work in progress, so some things are
still missing)::
{
    # "category/package" (primary index, unique)
    '_id' : string,
    # Version of the schema, used internally (just in case)
    'schema_ver' : integer,
    # Package category
    'cat' : string,
    # Package name
    'pkg' : string,
    ## Data from metadata.xml
    # List of herds maintaining this package
    'herds' : [ string, ... ],
    # Long description of the package
    'ldesc' : string,
    # List of maintainers (by email addresses)
    'maintainers' : [ string, ... ],
    ## Data from the ebuilds themselves (but should be general)
    # Description
    'desc' : string,
    # Upstream URL(s) (FIXME: Do we need a list here?)
    'homepage' : string,
    # Array of all the package versions and their specific info
    'ebuilds' : [
        {
            # Package version (from category/package-version)
            'version' : string,
            # EAPI version
            'eapi' : integer,
            # List of USE flags supported by this ebuild
            'iuse' : [ string, ... ],
            # Package keywords ("x86", "~amd64", ...)
            'keywords' : [ string, ... ],
            # Licenses
            'licence' : [ string, ... ],
            # Package slot
            'slot' : string,
            # Need to figure out proper structure for these, so we can also
            # map out USE flags ;)
            'depend' : TODO!!!,
            'rdepend' : TODO!!!
        },
        ...
    ]
}
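To make the structure more concrete, here is a sketch of what one such document might look like as a Python dictionary. Only the key names come from the structure above; every value (package name, herd, email address, versions) is made up for illustration, and `make_id` is a hypothetical helper, not something taken from Grumpy's code. The `depend` and `rdepend` keys are left out because their structure is still undecided.

```python
# Hypothetical example document; all values are invented for illustration.

def make_id(cat, pkg):
    """Build the "category/package" string used as the unique _id."""
    return "%s/%s" % (cat, pkg)

example_doc = {
    "_id": make_id("dev-python", "example-pkg"),
    "schema_ver": 1,
    "cat": "dev-python",
    "pkg": "example-pkg",
    # Data from metadata.xml
    "herds": ["python"],
    "ldesc": "A longer description taken from metadata.xml.",
    "maintainers": ["someone@example.org"],
    # Data from the ebuilds
    "desc": "Short description from the ebuild",
    "homepage": "http://example.org/",
    # One sub-document per package version
    "ebuilds": [
        {
            "version": "1.0",
            "eapi": 0,
            "iuse": ["doc"],
            "keywords": ["x86", "amd64"],
            "licence": ["MIT"],
            "slot": "0",
        },
        {
            "version": "1.1",
            "eapi": 2,
            "iuse": ["doc", "test"],
            "keywords": ["~x86", "~amd64"],
            "licence": ["MIT"],
            "slot": "0",
        },
    ],
}

print(example_doc["_id"])
```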
So how about querying the data? That's easy. (Please note that we are using the
MongoDB shell.) For example, what if a developer wants to know which packages
they are supposedly maintaining::
> db.ebuilds.find({'maintainers' : '...@gentoo.org' })
{... document data ...} # (Too much info :) )
> db.ebuilds.find({'maintainers' : '...@gentoo.org' }).count()
7
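The query above works on a list-valued field because MongoDB matches an array field when any one of its elements equals the given value. A plain-Python sketch of that matching rule follows; it does not talk to the database, the documents and addresses are made up, and `find_by_maintainer` is a hypothetical helper name.

```python
def find_by_maintainer(docs, email):
    """Return the documents whose 'maintainers' list contains the address."""
    return [d for d in docs if email in d.get("maintainers", [])]

# Invented sample documents, reduced to the fields needed here.
docs = [
    {"_id": "dev-python/foo", "maintainers": ["alice@example.org"]},
    {"_id": "dev-python/bar", "maintainers": ["alice@example.org", "bob@example.org"]},
    {"_id": "app-misc/baz", "maintainers": ["bob@example.org"]},
]

matches = find_by_maintainer(docs, "alice@example.org")
print(len(matches))
```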
And the results come fast. I mean really fast.
OK, how about checking how many packages under 'dev-python' use a specific
EAPI version::
> db.ebuilds.find({'cat' : 'dev-python', 'ebuilds.eapi' : 0}).count()
202
> db.ebuilds.find({'cat' : 'dev-python', 'ebuilds.eapi' : 1}).count()
3
> db.ebuilds.find({'cat' : 'dev-python', 'ebuilds.eapi' : 2}).count()
255
> db.ebuilds.find({'cat' : 'dev-python', 'ebuilds.eapi' : 3}).count()
125
> db.ebuilds.find({'cat' : 'dev-python', 'ebuilds.eapi' : 4}).count()
0
> db.ebuilds.find({'cat' : 'dev-python' }).count()
504
> 202+3+255+125 - 504
81
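Ahem... the per-EAPI counts add up to 81 more than the total number of
packages, presumably because a package whose ebuilds use different EAPIs
matches more than one of these queries. Looks like we have a "design issue"
with our document structure. So back to the drawing board.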
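One plausible reading of the mismatch in the counts above: a query on 'ebuilds.eapi' matches a package as soon as any one of its ebuilds has that EAPI, so a package with ebuilds at two different EAPIs is counted twice. The following plain-Python sketch reproduces the effect on three made-up packages; `count_with_eapi` is a hypothetical stand-in for the shell query, not Grumpy code.

```python
from collections import Counter

def count_with_eapi(docs, eapi):
    """Count packages having at least one ebuild with the given EAPI.

    Mimics db.ebuilds.find({'ebuilds.eapi': eapi}).count().
    """
    return sum(1 for d in docs if any(e["eapi"] == eapi for e in d["ebuilds"]))

# Invented sample data: two of the three packages mix EAPI versions.
docs = [
    {"_id": "dev-python/foo",
     "ebuilds": [{"version": "1.0", "eapi": 0}, {"version": "2.0", "eapi": 2}]},
    {"_id": "dev-python/bar",
     "ebuilds": [{"version": "1.0", "eapi": 2}]},
    {"_id": "dev-python/baz",
     "ebuilds": [{"version": "0.5", "eapi": 0}, {"version": "0.6", "eapi": 3}]},
]

# Package-level counts overlap, so their sum exceeds the number of packages.
per_eapi = {n: count_with_eapi(docs, n) for n in range(4)}
overlap = sum(per_eapi.values()) - len(docs)

# Counting individual ebuilds instead gives buckets that do not overlap.
ebuild_counts = Counter(e["eapi"] for d in docs for e in d["ebuilds"])
total_ebuilds = sum(len(d["ebuilds"]) for d in docs)

print(per_eapi, overlap, dict(ebuild_counts), total_ebuilds)
```

Counting per ebuild rather than per package is only one way to get consistent numbers; which unit is the right one depends on the question being asked.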
Last week's progress report
===========================
Last week's progress has been a bit slow. I have mostly played with the
document structure and a bit with pkgcore's internals. Although I now have
portage contents inside the database, the document structure itself is far from
ideal (as you can see from the example with EAPI counts given earlier).
I have committed some of the stuff I have been working on into Grumpy's repo,
so in case you are interested, check it out from [1].
[1] http://git.overlays.gentoo.org/gitweb/?p=proj/grumpy.git;a=summary
First, a warning: the portage->mongodb syncer is slow. I mean really slow: it
takes about 3 hours (or even more) on my laptop to fully scan the contents of
portage and store the data in the database.
Plans for current week
======================
1) Speed up the portage syncer
2) Improve document structure