SVN Users and Git Authors
Anchor
SVNGitAuthors
SVNGitAuthors

Both Subversion and Git keep author names in commits, but the notion of an author differs between the two systems.

In SVN, the author is stored as an unversioned revision property, svn:author. Every time a Subversion user makes a commit, SVN creates a new revision and sets its svn:author property to that user's name, for example, johndoe:

Code Block
languagetext
themeFadeToGrey
titleSVN revision 163
$ svn log -v --revision 163
------------------------------------------------------------------------
r163 | johndoe | 2017-06-07 20:22:15 +0500 (Wed, 07 Jun 2017) | 1 line
Changed paths:
   A /project
   A /project/branches
   A /project/tags
   A /project/trunk

initial layout for the project
------------------------------------------------------------------------

Git also stores an author name with every commit, but that name differs from the one in SVN: whereas SVN stores the actual username, Git stores the name set by the user.name configuration option, for example, as a global setting:

$ git config --global user.name "John Doe"

In addition to user.name, Git relies on user.email, which can be set as follows:

$ git config --global user.email johndoe@example.com

Both user.name and user.email are parts of the Git user identity, that is, a Git user is referred to as

Git User <gituser@domain.com>

everywhere; for example, the user John Doe configured above is referred to as:

John Doe <johndoe@example.com>

and this exact line appears as the author name whenever the user makes a commit in Git. For completeness, it is worth mentioning that Git stores not only an author name but also a committer name with every commit, and the two may differ:

$ git cat-file commit HEAD
    tree 905df23db37b33320483fc6676bfc684078ed248
    parent 4a0cf06baa9aefaa20a13820265ef401d7b1c2b6
    author John Doe <johndoe@example.com> 1496849115 +0000
    committer Jane Doe <janedoe@example.com> 1496849115 +0000

The Pro Git book describes the difference between those names as follows: the author is the person who originally wrote the work, whereas the committer is the person who last applied the work. So, if you send in a patch to a project and one of the core members applies the patch, both of you get credit: you as the author, and the core member as the committer.

Most often, though, the author and committer names are the same, since most programmers commit their own work. Also, SubGit works with the Git author name, so the committer name doesn't matter much for the SVN-to-Git translation process.

Authors Mapping and SubGit Licensing

Author names don't affect the projects themselves much, in either SVN or Git: they are just part of the project history, reflecting who did the job and who committed it. The authors mapping feature is intended to keep that history clean and consistent on both the SVN and Git sides. However, if that information isn't worth preserving, you can leave the mapping unconfigured; in that case, SubGit derives author names using automatic authors mapping. The project history most probably won't be exactly the same, since author names will differ between SVN and Git, but the rest of the commit and revision information (date and time, revision number, commit message and so on) stays the same, so it works well for that purpose.

This works perfectly for a one-time import, since the import is free and doesn't require a license, in contrast to the mirror, which does. There are a few kinds of licenses, which differ in how licensed users are counted: both SVN and Git users for the free license for small teams, or only Git users for commercial licenses; see the pricing for details. Because the license depends on the number of users, a missing authors mapping may cause trouble.

Say you have obtained a free license for up to 10 SVN and Git users. You have mirrored one of your SVN projects to a Git repository, and your users commit to both SVN and Git. One of them, John Doe, commits both to the SVN project and to the Git repository. In this case SubGit generates an automatic authors mapping that works as follows:

  • When John makes a commit to Git, the commit's author name and email are set to his Git user.name and user.email, say, John Doe and <johndoe@example.com>.
  • SubGit then translates this commit into an SVN revision; since no authors mapping is set, the SVN revision author is set to John Doe.
  • When he then commits to the SVN project, the new revision's author is set to his SVN username, say, johndoe.
  • SubGit now translates the SVN revision into a Git commit and, since no authors mapping is provided, sets the commit's author to johndoe and the email to <johndoe@example.com>.

As a result, John Doe is counted twice instead of once, because his author names differ in SVN and Git.

This situation cannot occur with commercial licenses, since they limit the number of Git users only. But there is another situation in which one actual user is counted twice or even more. SubGit counts the Git users that push to the mirrored Git repository, and it distinguishes users by their names and email addresses: both Git user.name and user.email matter for SubGit in this context. If either of them differs, SubGit considers that Git user new and increases the licensed users counter by one. Say one user has a laptop and a desktop computer and makes commits from both of them. Any difference between the Git user.name and user.email settings on those computers causes this user to be counted twice. For example, a user named John Doe works on two workstations with the following Git settings:

  • workstation 1:

    [user]
        name = John Doe
        email = johndoe@example.com

  • workstation 2:

    [user]
        name = John M. Doe
        email = johndoe@example.com

When John pushes to the mirrored Git repository from the first workstation, SubGit increases the licensed users counter by one; when he then pushes from the second workstation, SubGit counts him as a new user and increases the counter once more, since user.name differs. Thus there are now two committers instead of one.

The same applies when user.name changes over time: when a new user.name appears, SubGit treats the commits as made by a new user and increases the licensed users counter by one. That's why it is especially important to set the correct user.name and user.email before establishing an SVN to Git mirror.
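
Before establishing a mirror, it can help to check how many distinct identities already exist in a repository's history. The sketch below only illustrates the counting rule described above (a different name or a different email makes a different identity); it is not SubGit's actual license counter, and the count_identities function name is made up for this example.

```shell
#!/bin/sh
# Count distinct "Name <email>" identities read from standard input.
# Two lines are the same identity only when both name and email match.
count_identities() {
    sort -u | wc -l | tr -d ' '
}

# Hypothetical usage inside a Git repository (not run here):
#   git log --format='%an <%ae>' | count_identities

# The two-workstation case from the text: same email, different user.name.
printf '%s\n' \
    'John Doe <johndoe@example.com>' \
    'John M. Doe <johndoe@example.com>' \
    'John Doe <johndoe@example.com>' | count_identities   # prints 2
```

A result larger than the real head count usually points at inconsistent user.name or user.email settings.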

Configuration Options

There are two configuration options that relate to authors:

  • core.authorsFile

    This option represents a path to the authors mapping file or an authors mapping helper program. The path can be either relative to the Git repository or absolute. The authors mapping file is a text file that lists pairs of SVN and Git usernames; see more details below. The authors mapping helper program is either a script or a binary executable that provides author-related data in a certain form; see the helpers chapter below for details.

    Note that more than one authorsFile option may be set in the file, for example:

        [core]
            authorsFile = subgit/authors.txt
            authorsFile = /etc/authors.txt

    The contents of all the listed files are merged into a full list, with one caveat: if an SVN username appears more than once, only its first occurrence is applied. For example, if the SVN username johndoe appears both in subgit/authors.txt:

        johndoe = John Doe <johndoe@example.com>

    and in /etc/authors.txt:

        johndoe = John M. Doe <john_doe@example.com>

    then the mapping from subgit/authors.txt is applied, since that file appears before /etc/authors.txt in the list.

  • core.defaultDomain

    This option provides a domain name that is appended to the username to form an email address in Git when automatic authors mapping is used. SubGit automatically fills this option with the hostname when the subgit configure command is invoked. If the option is not set or is omitted from the configuration file, SubGit does not generate an email address for Git commits, and the author's email appears empty in the commit (just a pair of angle brackets with nothing in between):

        $ git log -v
        commit d5d46afc3aa33240de8b5200e72611d4e0d72a99
        Author: john_doe <>
        Date:   Tue Jun 6 10:25:02 2017 +0200

            minor changes

Those are the two author-related SubGit options, but they are not the only configuration that may be needed for authors mapping to work correctly: an additional setting may be needed on the SVN side, depending on how SubGit logs in to the SVN repository.

There are two alternatives: SubGit can use one dedicated SVN account to log in to the SVN repository, or it can use several different accounts. The subgit/passwd file is intended to store the list of SVN accounts that SubGit can use to authenticate. When SubGit translates a Git commit into an SVN revision (when a mirror is established), it searches for the commit author in the authors file. If there is a match, SubGit then searches the passwd file for that exact SVN username. If the password for that account is found, SubGit uses that username to log in to SVN and create a new revision. In this case, the correct revision author is set automatically, since SubGit is logged in with the correct account.

If SubGit uses one dedicated SVN account (with cached SVN credentials, when only one SVN account is provided, or when no matching SVN account is found in subgit/passwd), it works a little differently. It connects to SVN, creates a new revision and sets the revision's author to the SVN username it used to log in. The problem is that this username is usually not the correct author name; it might be, but commonly it differs. So SubGit then connects to the SVN server a second time and changes the svn:author property of the newly created revision to the correct author name.

And some additional configuration may be needed here, namely:

  • if SVN server 1.7.20, 1.8.12, 1.9.0 or later is used and it is accessed over the http(s):// protocol
  • or if the SVN server is accessed over the svn:// protocol

then the pre-revprop-change hook has to be enabled in the SVN repository. This requirement is imposed by SVN, which is why some changes are needed on the SVN side.

The hook itself is pretty simple: it is just an executable file, a script or binary, that may even do nothing but start and exit. So you can create a script as simple as

  • Linux and OS X:

    #!/bin/sh
    exit 0

  • Windows:

    @echo off
    exit 0

place it into SVN repository hooks directory:

SVN_REPOSITORY/
            hooks/
                pre-revprop-change     # for Linux and OS X
                pre-revprop-change.bat # for Windows

make the file executable on Linux and OS X:

chmod +x pre-revprop-change

and that's it!
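
On Linux and OS X, the steps above can be combined into one small function. This is a minimal sketch that assumes a filesystem path to the SVN repository on the server; install_revprop_hook is a name made up for this example, and the hook it writes allows every revision property change, exactly like the do-nothing script shown above.

```shell
#!/bin/sh
# Create a no-op pre-revprop-change hook inside the given SVN repository.
# $1: path to the SVN repository directory (the one that contains hooks/).
install_revprop_hook() {
    repo="$1"
    hook="$repo/hooks/pre-revprop-change"
    mkdir -p "$repo/hooks" || return 1
    # The hook only has to start and exit successfully.
    printf '#!/bin/sh\nexit 0\n' > "$hook" || return 1
    chmod +x "$hook"
}

# Example call (the path is a placeholder):
#   install_revprop_hook /path/to/SVN_REPOSITORY
```

A hook that always exits with 0 lets any authenticated user change revision properties, so a stricter script may be preferable where that matters.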

Automatic Authors Mapping

When SubGit starts translation between SVN and Git, it looks for authors mapping files or authors helper programs. If none are present, it generates the mapping automatically, following these rules:

  • Subversion svnusername is translated to svnusername <svnusername@defaultDomain> in Git
  • Git Author Name <email@domain.com> is translated to Author Name in Subversion

Here defaultDomain stands for the core.defaultDomain SubGit configuration option. SubGit fills that setting with the hostname during subgit configure, but it can be changed later. Also, if subgit configure is invoked with the --layout auto option, SubGit fills the authors file with an automatically generated mapping: it connects to SVN, goes through the project history and records all the SVN users found there. SubGit then generates Git names and emails from those SVN usernames according to the rules above and records the resulting mapping in the authors file.

Say a user makes commits as the SVN user john_doe; an SVN revision he made may look like this:

------------------------------------------------------------------------
    r167 | john_doe | 2017-06-06 10:25:02 +0200 (Tue, 06 Jun 2017) | 1 line
    Changed paths:
       M /project/trunk/foo.c

    minor changes
    ------------------------------------------------------------------------

At some point, the SVN project is translated to Git. If no explicit authors mapping is provided, SubGit creates an automatic mapping according to the rules mentioned above, so revision 167 shown above looks like this in Git:

$ git log -v
    commit d5d46afc3aa33240de8b5200e72611d4e0d72a99
    Author: john_doe <john_doe@git.example.com>
    Date:   Tue Jun 6 10:25:02 2017 +0200

        minor changes

assuming the Git machine's hostname is git.example.com.

Conversely, if the user John Doe <johndoe@example.com> makes a commit to the Git repository:

commit 7faaf52c41a0325d4686f2a6f2851dc3e3739136
    Author: John Doe <johndoe@example.com>
    Date:   Thu Jun 8 20:06:31 2017 +0200

        minor changes to bar.c

then, once mirrored to SVN, it looks like this:

------------------------------------------------------------------------
    r173 | John Doe | 2017-06-08 20:06:31 +0200 (Thu, 08 Jun 2017) | 1 line
    Changed paths:
       M /project/trunk/bar.c

    minor changes to bar.c
    ------------------------------------------------------------------------

Note that since the SVN username and Git user.name commonly differ, the licensed committers counter might be affected; see the details in chapter 2.

Authors File

...

$ svn proplist -v --revprop --revision 163

Unversioned properties on revision 163:
  svn:author
    johndoe
  svn:date
    2017-06-07T15:22:15.655243Z
  svn:log
    initial layout for the project

Git also stores an author name with each commit, but that name differs from the one in SVN: whereas SVN stores the actual username, a Git user identity consists of a name and an email:

No Format
Git User <gituser@domain.com>

The name and email are not related to the actual username used to log in to the Git repository; they are set in the Git configuration, for example, with these commands:

Code Block
languagetext
themeFadeToGrey
titlegit config
$ git config --global user.name "John Doe"
$ git config --global user.email johndoe@example.com

The Git user John Doe is then referred to as

No Format
John Doe <johndoe@example.com>

 This exact line then appears as the author name in every commit that John Doe makes.

It is worth mentioning that Git holds not only an author name but also a committer name:

Code Block
languagetext
themeFadeToGrey
titleGit commit example
$ git cat-file commit HEAD
    tree 905df23db37b33320483fc6676bfc684078ed248
    parent 4a0cf06baa9aefaa20a13820265ef401d7b1c2b6
    author John Doe <johndoe@example.com> 1496849115 +0000
    committer Jane Doe <janedoe@example.com> 1496849115 +0000

The Pro Git book describes the difference between those names as follows: the author is the person who originally wrote the work, whereas the committer is the person who last applied the work. So, if you send in a patch to a project and one of the core members applies the patch, both of you get credit: you as the author, and the core member as the committer.

Before version 3.3.4, SubGit used only the Git author name, so the committer name did not matter much.

Since v.3.3.4, SubGit uses the committer name for authors mapping by default. Since v.3.3.6, there is also an option to switch back to using the author name, the core.mapGitCommitter setting:

No Format
[core]
   mapGitCommitter = true|false

When it’s set to true (default), SubGit uses committer's name. Set it to false to make SubGit use author's name.

Info
titleAuthors mapping affects licensing

Note that SubGit uses author names to count licensed users; see the Licensing manual for details.

Configuration options
Anchor
ConfigOptions
ConfigOptions

All the configuration options reside in the SubGit configuration file, which is located in the subgit subdirectory of the newly created Git repository:

No Format
GIT_REPOS_PATH/subgit/config

There are two configuration options that relate to authors:

  • core.authorsFile
    Anchor
    core.authorsFile
    core.authorsFile

    This option represents a path to the authors mapping file or an authors mapping helper program. The path is either relative to the Git repository or absolute. The default authors file is located in the SubGit directory:

    No Format
    [core]
       authorsFile = subgit/authors.txt

    There may be more than one authorsFile option set in the file:

    No Format
    [core]
       authorsFile = subgit/authors.txt
       authorsFile = /etc/authors.txt

    The contents of all the listed files are merged into a full list. If an SVN username appears more than once, only its first occurrence is applied. For example, if the SVN username johndoe appears in both authors files:

    No Format
    subgit/authors.txt    
          johndoe = John Doe <johndoe@example.com>
    …
    /etc/authors.txt    
          johndoe = John M. Doe <john_doe@example.com>
    
    

    In this case, the SVN name johndoe is mapped to the Git name John Doe <johndoe@example.com>, as it appears first in the list.

  • core.defaultDomain
    Anchor
    core.defaultDomain
    core.defaultDomain

    This option specifies the domain name used to construct a Git user email address when no explicit authors mapping has been provided. If the option is not set, the Git user email appears empty in commits, as a pair of angle brackets with nothing in between:

    No Format
    Author: john_doe <>
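
The first-occurrence rule for several authorsFile entries can be sketched as a small shell function. This is an illustration only, assuming each mapping sits on one line in the svn_user_name = Git Name <email> form; merge_authors is a hypothetical name, and SubGit's own parsing may differ in details.

```shell
#!/bin/sh
# Merge authors files in the order given on the command line.
# For every SVN username only the first mapping seen is kept;
# lines without "=" are skipped.
merge_authors() {
    awk -F'=' '
        NF >= 2 {
            key = $1
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim the SVN username
            if (key != "" && !(key in seen)) {
                seen[key] = 1
                print                          # keep the original line
            }
        }' "$@"
}

# Example call, using the two files from the configuration above:
#   merge_authors subgit/authors.txt /etc/authors.txt
```

With the two example files shown earlier, only the johndoe line from subgit/authors.txt survives, because that file is listed first.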


Info
titleAuthors mapping for SVN server 1.7.20, 1.8.12 or 1.9.0

If you are using SVN server 1.7.20, 1.8.12, 1.9.0 or later, and the SVN repository is accessed over the http(s):// protocol, then the pre-revprop-change hook may need to be enabled in the SVN repository; see the SVN authors are not being set correctly article.


Automatic Authors Mapping

Anchor
AutoMapping
AutoMapping

When SubGit starts translation between SVN and Git, it looks for authors mapping files or authors helper programs. If none are present, it generates the mapping automatically, following these rules:

  • Subversion svn_user_name is translated to svn_user_name <svn_user_name@defaultDomain> in Git
  • Git Author Name <email@domain.com> is translated to Author Name in Subversion

Here defaultDomain stands for the core.defaultDomain SubGit configuration option.

During the initial subgit configure call, this setting is set to the hostname. Also, if subgit configure is invoked with the --layout auto option, SubGit connects to the specified SVN project, checks its history and generates an authors mapping from the SVN usernames found and the given defaultDomain. SubGit then fills the default authors file with the resulting mapping.

For example, suppose defaultDomain is set as follows:

No Format
[core]
   defaultDomain = example.com

A user made commits in SVN under john_doe SVN account.

When SubGit translates these commits to Git, it sets author name in Git commits in the following way:

No Format
Author: john_doe <john_doe@example.com>

Similarly, Git commits made by the Git user John Doe <johndoe@example.com> appear in SVN with the author name John Doe.
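
The two automatic mapping rules can be written down as a pair of shell functions. This is only a sketch of the rules as stated above, not SubGit's implementation; the function names svn_to_git and git_to_svn are made up for this example, and the empty-domain branch follows the behavior described for an unset core.defaultDomain.

```shell
#!/bin/sh
# Subversion -> Git: "svn_user_name" becomes
# "svn_user_name <svn_user_name@defaultDomain>".
# With an empty domain the email stays empty: "svn_user_name <>".
svn_to_git() {
    user="$1"
    domain="$2"
    if [ -n "$domain" ]; then
        printf '%s <%s@%s>\n' "$user" "$user" "$domain"
    else
        printf '%s <>\n' "$user"
    fi
}

# Git -> Subversion: "Author Name <email@domain.com>" becomes "Author Name".
git_to_svn() {
    printf '%s\n' "$1" | sed 's/[[:space:]]*<[^<>]*>[[:space:]]*$//'
}

svn_to_git john_doe example.com               # john_doe <john_doe@example.com>
git_to_svn 'John Doe <johndoe@example.com>'   # John Doe
```

Note that the two rules are not inverses of each other: translating johndoe to Git and back yields johndoe again, but John Doe coming from Git never becomes johndoe in SVN, which is why an explicit authors file is usually preferable for mirrors.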


Authors File

Anchor
AuthorsFile
AuthorsFile

The authors mapping file is a text file of SVN username and Git author pairs. Each pair maps an SVN username to a Git author like this:

No Format
svn_user_name = Git Name <gitname@domain.name>

For example, a mapping for a user named John Doe may look like this:

No Format
john_doe = John Doe <john_doe@example.com>

Each mapping pair must appear on a new line.

During translation, SubGit searches all the specified authors files for a mapping pair. If a matching pair is found, SubGit uses the corresponding author name. If there is no match, SubGit generates the author name according to the automatic mapping rules.

It is possible to map more than one SVN username to the same Git author:

No Format
john_doe = John Doe <john_doe@example.com>
johndoe = John Doe <john_doe@example.com>

Revisions created by either john_doe or johndoe are translated to Git commits with the author name John Doe <john_doe@example.com>. However, Git commits made by John Doe <john_doe@example.com> are translated to SVN revisions using the first SVN username in the authors files that matches that Git name, john_doe in this case.

Similarly, one SVN username can be mapped to different Git authors, for example:

No Format
jdoe = John Doe <john_doe@example.com>
jdoe = Jane Doe <jane_doe@example.com>
jdoe = James Doe <james_doe@example.com>

Again, every Git commit made by those authors is translated to SVN with the revision author set to jdoe. SVN revisions made by jdoe are always mapped to the first matching Git user in the authors files, John Doe <john_doe@example.com> in this case.

Changes made to authors files are applied immediately; there is no need to restart mirroring or reinstall SubGit.
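
The first-match behavior in both directions can be illustrated with two lookup functions over a single authors file. This is a sketch under the assumption of one svn_user_name = Git Name <email> pair per line; git_author_for and svn_user_for are hypothetical names, and printing nothing on a miss is a choice made for this example.

```shell
#!/bin/sh
# Print the Git author mapped to an SVN username; the first match wins.
# $1: SVN username, $2: authors file
git_author_for() {
    awk -F'=' -v user="$1" '
        NF >= 2 {
            key = $1
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            if (key == user) {
                val = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", val)
                print val
                exit
            }
        }' "$2"
}

# Print the SVN username mapped to a Git author; the first match wins.
# $1: Git author in "Name <email>" form, $2: authors file
svn_user_for() {
    awk -F'=' -v author="$1" '
        NF >= 2 {
            key = $1
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            val = substr($0, index($0, "=") + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (val == author) { print key; exit }
        }' "$2"
}
```

For a file that lists john_doe and then johndoe for the same Git author, svn_user_for returns john_doe; for a file that lists jdoe three times, git_author_for returns the author from the first jdoe line.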

Scriptable Authors Mapping
Anchor
ScriptableMapping
ScriptableMapping

In addition to the authors files, there is another way to establish SVN to Git authors mapping: an authors helper program. The authors helper is an executable, a script or binary, that reads data from standard input and sends its result to standard output. Input and output data must follow these formats:

  • for Git to Subversion mapping:

    No Format
    INPUT:
       Author Name
       author email
    OUTPUT:
       subversion_user_name

  • for Subversion to Git mapping:

    No Format
    INPUT:
       subversion_user_name
    OUTPUT:
       Author Name
       author email


Every time SubGit needs to map an author name during translation, it invokes the authors mapping helper program, passes the name to it and expects the helper to answer with the matching author name.

The authors helper program can be extremely useful when you have many authors and the authors list changes constantly: new users are added, names and emails change, and so on. If you use a directory to store accounts (LDAP, Active Directory, OpenID and so forth), you can create a script that gathers the needed information from the directory and provides it to SubGit.

On configuration or installation, SubGit creates a simple authors.sh script in the subgit/samples subdirectory. This script doesn't do much; it is just a proof of concept that demonstrates how input data is read and output data is provided.

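Expand
titleThe simple authors helper script
#!/bin/sh
# Read up to two lines from standard input: the first goes to "name",
# the second to "email".
while read input
do
  if [ -z "$name" ]; then
    name="$input"
  elif [ -z "$email" ]; then
    email="$input"
  fi
done

# One input line: an SVN username was passed, answer with a Git name and email.
# Two input lines: a Git name and email were passed, answer with an SVN username.
if [ -z "$email" ]; then
  echo Full Name
  echo author@email.com
else
  echo shortSvnUserName
fi

exit 0;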

Depending on what was sent to its input, the script returns either a Git author name and email or an SVN short name. It can be extended to, say, fetch the data from a directory or database and thereby facilitate the authors mapping. For more details on the authors helper, see the Script-provided authors mapping article.
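
As a step beyond the proof of concept, a helper can answer from a lookup table instead of fixed strings. The sketch below follows the input and output formats described above; the AUTHORS_TABLE variable, its svn_user|Name|email layout and the authors_helper function name are all assumptions made for this example, and returning a non-zero status when nothing matches is a choice of this sketch rather than documented SubGit behavior.

```shell
#!/bin/sh
# Hypothetical lookup table, one "svn_user|Git Name|email" entry per line.
# A real helper might fill this from LDAP or a database instead.
AUTHORS_TABLE='john_doe|John Doe|john_doe@example.com
jane_doe|Jane Doe|jane_doe@example.com'

authors_helper() {
    first=''
    second=''
    # Read up to two lines from standard input.
    IFS= read -r first || [ -n "$first" ] || return 1
    IFS= read -r second || true
    if [ -z "$second" ]; then
        # One line: Subversion -> Git, answer with name and email.
        printf '%s\n' "$AUTHORS_TABLE" |
            awk -F'|' -v u="$first" '
                $1 == u { print $2; print $3; found = 1; exit }
                END { exit !found }'
    else
        # Two lines: Git -> Subversion, answer with the SVN username.
        printf '%s\n' "$AUTHORS_TABLE" |
            awk -F'|' -v n="$first" -v e="$second" '
                $2 == n && $3 == e { print $1; found = 1; exit }
                END { exit !found }'
    fi
}

# Example calls:
#   printf 'john_doe\n' | authors_helper
#   printf 'Jane Doe\njane_doe@example.com\n' | authors_helper
```

To use something like this as a real helper, the function body would go into an executable script referenced by core.authorsFile.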