I wanted to bring up something Aryan mentioned in this ticket on the
subject of recent contributions by Ven and himself visualizing Git data
https://github.com/sbenthall/bigbang/issues/136#issuecomment-75007815
There are now a few different Git-related notebooks in the repository. It
would be good if we could standardize on:
* coding patterns -- will we be creating new objects to represent Git data
or will we depend on the preprocessed Pandas dataframe? I see advantages to
either.
* notebook idioms -- we should come up with standards for what gets included
in notebooks and what gets pushed to underlying libraries so the notebooks
stay as clean as possible.
* visualization idiom -- the out-of-the-box networkx graph visualizations
are a good start but not pretty. We have some options for graph
visualization, including more artful use of what networkx/matplotlib gives
us, use of d3 within notebooks, etc. It would be cool to have a coherent
set of design principles for this.
Hi guys,
Just an update, mainly for Aryan's benefit since he couldn't make the
meeting today.
Ven -- please sign up for the development mailing list here, where I will
be sending emails like this in the future:
https://lists.sudoroom.org/listinfo/bigbang-dev
Aryan -- Ven is starting to dig into analyzing Git repositories by
unpacking the Git commit network. This is the network you see in
visualizations like these <https://github.com/sbenthall/bigbang/network>.
He's working on these tickets for next week:
https://github.com/sbenthall/bigbang/issues/135
https://github.com/sbenthall/bigbang/issues/136
What we discovered while looking at the GitRepo
<https://github.com/sbenthall/bigbang/blob/master/git_data/GitRepo.py> code
is that the Pandas dataframe does not yet have in it two kinds of data
about the commits:
* information about which commit each commit is a descendant of in the
network. It might be the descendant of more than one commit if it is a
'merge' commit, for example.
* the information about the 'diff' of each commit.
Could you take a look at this and see if it's feasible to add this to
GitRepo? I've made a ticket for this and assigned it to you.
https://github.com/sbenthall/bigbang/issues/137
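As a rough sketch of the parent information described above: `git log --format="%H %P"` prints one commit per line, the commit hash followed by its parent hashes, so a merge commit shows two or more parents. The function name and the sample hashes below are made up for illustration; this is not how GitRepo currently works, just one way the data could be parsed before it goes into the dataframe.

```python
def parse_parents(log_output):
    """Map each commit hash to the list of its parent hashes.

    Expects text in the format produced by `git log --format="%H %P"`:
    one commit per line, the commit hash followed by zero or more
    space-separated parent hashes.
    """
    parents = {}
    for line in log_output.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        commit, commit_parents = fields[0], fields[1:]
        parents[commit] = commit_parents
    return parents


# Made-up, shortened hashes for illustration only (newest commit first).
sample = """\
d4 b2 c3
c3 a1
b2 a1
a1
"""

parents = parse_parents(sample)

# A commit with more than one parent is a merge commit.
merges = [commit for commit, p in parents.items() if len(p) > 1]
```

The diff information is a separate question and is not covered by this sketch.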
Looking forward to seeing you both on Wednesday,
Seb
An upcoming development priority for us will need to be providing a
configuration file as the place to put paths to data storage folders (I'm
thinking where we are keeping archived email lists, git repos, etc.)
Currently those paths are hard-coded, which is poor form and is starting to
lead to bugs.
https://github.com/sbenthall/bigbang/issues/113
https://github.com/sbenthall/bigbang/issues/129
There are a number of options for how to do this. We could use a YAML file
for configuration, or the .cfg format which Python's ConfigParser reads. We
could also do what Django does and have a settings.py file for
configuration constants.
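To make the .cfg option concrete, here is a minimal sketch using the standard library module (named `configparser` in Python 3, `ConfigParser` in Python 2). The `[paths]` section, the key names, and the directory values are all hypothetical placeholders, not paths the project actually uses.

```python
import configparser

# Hypothetical config contents; the section name, keys, and values
# are illustrative placeholders only.
SAMPLE_CFG = """\
[paths]
mail_archives = archives
git_repos = git_data/repos
"""


def load_paths(text):
    """Parse .cfg-style text and return the [paths] section as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["paths"])


paths = load_paths(SAMPLE_CFG)
```

In practice the text would come from a file in the repository (for example via `parser.read(filename)`), and the rest of the code would look paths up in the returned dict instead of hard-coding them.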
I'm pretty indifferent to which method we use myself; my high-level
understanding is that they all do pretty much the same thing. But I was
wondering if anybody else had a strong preference.
Thanks,
Seb