MW Import&Export

From Wiki4Intranet

The stock MediaWiki import/export mechanism exports only page texts, not uploaded files.

This is fixed in the MediaWiki4Intranet distribution (Internal Bug 47362) by a patch to the MediaWiki code (see #Links). The patch also adds many other useful features.

The features are listed below. Despite all these changes, the import mechanism remains backward compatible: it can still import dumps produced by MediaWiki installations without the patch.

Upload export/import

We've implemented file upload export and import. There are two options:

  • Either the file data is embedded into the export file, which then becomes a multipart/related "archive" instead of plain XML.
  • Or the export file remains plain XML and contains the HTTP link and SHA1 hash of each exported file, so the target wiki can download the files it does not have yet. This greatly reduces the export file size, but requires the target wiki to have HTTP access to the source wiki.
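In the XML-only mode the target wiki has to decide, file by file, whether it needs to download anything. Below is a minimal Python sketch of that decision. It assumes the export carries each file's URL and its SHA1 in MediaWiki's base36 form (see the 2011-03-15 change). The names `ExportedFile`, `sha1_base36` and `files_to_download` are hypothetical and do not come from the patch. The 31-character padding is simply the number of base36 digits a 160-bit SHA1 needs.

```python
import hashlib
from typing import Iterable, List, NamedTuple, Set

BASE36_DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def sha1_base36(data: bytes) -> str:
    """SHA1 of `data` as lowercase base 36, left-padded with zeros to 31 chars."""
    n = int(hashlib.sha1(data).hexdigest(), 16)
    digits = []
    while n:
        n, rem = divmod(n, 36)
        digits.append(BASE36_DIGITS[rem])
    return "".join(reversed(digits)).rjust(31, "0")


class ExportedFile(NamedTuple):
    # Hypothetical record for one file reference in an XML-only export.
    name: str
    url: str   # where the target wiki can fetch the file from the source wiki
    sha1: str  # base36 SHA1, as produced by sha1_base36()


def files_to_download(exported: Iterable[ExportedFile],
                      local_hashes: Set[str]) -> List[ExportedFile]:
    """Keep only the files whose content the target wiki does not have yet."""
    return [f for f in exported if f.sha1 not in local_hashes]
```

A real importer would presumably also re-hash each downloaded file and compare the result with the exported hash before storing it.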

To implement this, we also had to change filerepo/LocalFile.php slightly. In stock MediaWiki, old upload revisions are named with a strange "archiving timestamp", usually equal to the timestamp of the next revision, and sometimes differing from it by a second or two. The patch fixes this: each old upload gets its own timestamp in its name, so the names no longer depend on the next revision, and import/export becomes more correct as well.

Because of this change, if you apply the patch to a non-empty MediaWiki installation, you need to run a maintenance tool from #Links that renames the old uploaded files.
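The renaming that such a tool has to perform can be sketched in Python as follows. This is not file-upload-renamer.php itself. It assumes stock archive names of the form `<timestamp>!<file name>` and a hypothetical `OldUpload` record that carries each revision's own upload timestamp; `rename_plan` is likewise a hypothetical name.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class OldUpload:
    # Hypothetical record for one archived (non-current) upload revision.
    archive_name: str  # stock name: "<archiving timestamp>!<file name>"
    timestamp: str     # the revision's own upload time, YYYYMMDDHHMMSS


def rename_plan(file_name: str,
                old_uploads: Iterable[OldUpload]) -> List[Tuple[str, str]]:
    """Return (old name, new name) pairs, oldest revision first."""
    plan = []
    used = set()
    for up in sorted(old_uploads, key=lambda u: u.timestamp):
        # New name is built from the revision's own timestamp.
        new_name = f"{up.timestamp}!{file_name}"
        if new_name in used:
            raise ValueError(f"two old uploads share the timestamp {up.timestamp}")
        used.add(new_name)
        if new_name != up.archive_name:
            plan.append((up.archive_name, new_name))
    return plan
```

The pairs are returned oldest first. A revision's new name often equals the stock name of the revision before it, so applying the renames in this order frees each target name before it is reused.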

Advanced page selection

We've also implemented advanced selection of pages for export:

  • From a category, including its subcategories.
  • And/or from a namespace.
  • Inclusion of used templates, images and/or linked pages («template/image/page link closure»).
    It is important that this is a true closure: it works correctly for any template inclusion depth, for images included into templates, etc.
    Also, linked pages are now selected by «Add pages» instead of by «user-invisible» features that ran after clicking "Export". That is, "Export" now exports exactly the pages that are in the list.
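One way to compute such a closure is a breadth-first traversal over page dependencies. In this Python sketch, the maps `templates`, `images` and `links` are hypothetical stand-ins for MediaWiki's link tables. Templates and images are followed to any depth, so images used inside nested templates are found. Page links are followed one level only; this is an assumption made here so that link selection does not pull in the whole wiki.

```python
from collections import deque
from typing import Dict, Iterable, Set


def closure(pages: Iterable[str],
            templates: Dict[str, Set[str]],
            images: Dict[str, Set[str]],
            links: Dict[str, Set[str]],
            *, with_templates: bool = True, with_images: bool = True,
            with_links: bool = False) -> Set[str]:
    """Selected pages plus everything they (transitively) depend on."""
    pages = list(pages)
    result: Set[str] = set()
    queue: deque = deque()

    def add(name: str) -> None:
        # Each page enters the queue once, so cycles cannot loop forever.
        if name not in result:
            result.add(name)
            queue.append(name)

    for p in pages:
        add(p)
    if with_links:
        # Assumption: linked pages are added one level deep only.
        for p in pages:
            for q in links.get(p, ()):
                add(q)
    while queue:
        page = queue.popleft()
        if with_templates:
            for t in templates.get(page, ()):
                add(t)
        if with_images:
            for img in images.get(page, ()):
                add(img)
    return result
```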

After page selection, the following filters can be applied:

  • Selection of only those pages changed after a given date, which allows "incremental" replication.
  • Selection of only those pages that are NOT in a selected category («not in category» filter).
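Both filters are applied to the final list, after all additions (see the 2011-05-19 change), so pages pulled in by the closure are filtered too. A minimal Python sketch, assuming the caller has already looked up each page's last modification time and the members of the excluded category; `apply_filters` and its parameters are hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Set


def apply_filters(pages: Iterable[str],
                  last_modified: Dict[str, datetime],
                  *, modified_after: Optional[datetime] = None,
                  excluded: Set[str] = frozenset()) -> List[str]:
    """Drop pages in the excluded category and pages not changed after the date."""
    kept = []
    for page in pages:
        if page in excluded:
            continue  # «not in category» filter
        if modified_after is not None and last_modified[page] <= modified_after:
            continue  # unchanged since the given date
        kept.append(page)
    return kept
```

For incremental replication, `modified_after` would be set to the time of the previous export.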

Also, IntraACL rights are supported: pages closed for reading are not added to the selection, and in that case the extra «Access denied» page does not end up in the list either.

Conflict detection

The stock MediaWiki import works very simply: it creates the revisions and reports «done». Later versions report «done, something uploaded» or «done, nothing uploaded» for each imported page. But this is not enough if you import often: you want to see where the newer version was, whether the page was changed by someone, etc. Our version provides these features and thus adds DVCS-like functionality to MediaWiki: you can exchange revision sets between many wiki installations.

We call this "import conflict detection" and the advanced import report. The report contains a message for each imported page, of one of 5 types:

  1. All revisions were previously imported. No local changes.
  2. All revisions were previously imported. Page changed locally.
  3. N revisions imported.
  4. N revisions imported (new page).
  5. N revisions imported (conflict: XX (import) and YY (local)).
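The five report types can be sketched as a classification in Python. This is one plausible rule set, not the patch's actual logic. `Rev`, its `key` field (assumed to identify the same revision on both wikis) and `import_report` are hypothetical. "Changed locally" is taken to mean that the local head is not among the imported revisions and is newer than the oldest of them.

```python
from typing import NamedTuple, Sequence


class Rev(NamedTuple):
    timestamp: str  # YYYYMMDDHHMMSS; sorts chronologically as a string
    key: str        # hypothetical cross-wiki identity of a revision


def import_report(imported: Sequence[Rev], local: Sequence[Rev]) -> str:
    """Classify one imported page; `imported` is assumed to be non-empty."""
    local_keys = {r.key for r in local}
    new = [r for r in imported if r.key not in local_keys]
    if not local:
        return f"{len(new)} revisions imported (new page)"
    head = max(local)  # local head before the import
    imported_keys = {r.key for r in imported}
    changed_locally = (head.key not in imported_keys
                       and head.timestamp > min(imported).timestamp)
    if not new:
        return ("All revisions were previously imported. "
                + ("Page changed locally." if changed_locally else "No local changes."))
    if changed_locally:
        return (f"{len(new)} revisions imported "
                f"(conflict: {max(imported).key} (import) and {head.key} (local))")
    return f"{len(new)} revisions imported"
```

The "newer than the oldest imported revision" check keeps an «Only last revision» export that simply moves a page forward from being reported as a conflict.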

Installation

Installation is very simple:

  • Download the patch for your MediaWiki version and apply it with patch -p0 < downloaded_patch.diff.
  • If there is no patch for your MediaWiki version, either upgrade MediaWiki or try the patch for the closest version; failing that, you will have to port it by hand.
  • If you are applying the patch to a non-empty MediaWiki installation, also download the maintenance script, put it into the maintenance directory and run it from there with php file-upload-renamer.php.

TODO

  • Upload import report. Currently the import report covers only the text revisions of the File:XXX page. So if the page exists without an actual upload, and an export file containing the upload is imported, the upload will appear, but the import report will say nothing about it.

Changes

  • 2011-02-22: Disabled adding an empty marker revision with the comment "N revisions imported", as it could lead to infinite multiplication of revisions in the case of cross-replication.
  • 2011-03-15: Fixed the «Only last revision» checkbox behaviour for uploads; it now works as expected. Fixed SHA1 hash calculation: MediaWiki stores SHA1 hashes base36-encoded, not in hex.
  • 2011-05-19: (Changes are available only for MediaWiki4Intranet and MediaWiki 1.16.2, not for older or trunk versions): Improvements in page selection:
    1. Selection of used images, templates and page links is moved into «Add pages». Export now exports only the pages listed in the textbox.
    2. Added the «only pages not in category» filter, applied after all additions to the page list, which allows excluding a category from export.
    3. Changed the "modification time" filter behaviour: it is now also applied after all additions, just like the "not in category" filter. This allows "incremental" replication of pages.
    4. Page selection code is optimized and works much faster now.

Links