MW Import&Export

From Wiki4Intranet
Revision as of 15:17, 20 May 2011 by VitaliyFilippov


The stock MediaWiki import/export mechanism exports only page text, not uploaded files.

This is fixed in the MediaWiki4Intranet MW distribution (Bug:47362) by a patch to the MediaWiki code (see #Links). The patch also adds several other useful features.

The features are described below.

Upload export/import

We've implemented file upload export and import. There are two options:

  • The file data is embedded in the export file, which turns it from plain XML into a multipart/related "archive".
  • The export file stays plain XML and contains an HTTP link and a SHA1 hash for each exported file, so the target wiki can download the files it does not have yet. This greatly reduces the export file size, but requires the target wiki to have HTTP access to the source wiki.
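To illustrate the first option, here is a minimal Python sketch that packs an XML dump and file contents into one multipart/related container using the standard email package. The part layout (XML dump first, each file addressed by a Content-ID equal to its title) and the helpers build_archive/read_archive are assumptions for illustration, not the patch's actual archive format.

```python
from email import message_from_bytes
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_archive(xml_dump: str, files: dict[str, bytes]) -> bytes:
    """Pack an XML dump plus uploaded files into one multipart/related blob.

    Hypothetical layout: root part = XML dump, then one part per file.
    """
    archive = MIMEMultipart("related", type="text/xml")
    # Root part: the ordinary XML dump (base64-encoded as UTF-8).
    root = MIMEText(xml_dump, "xml", "utf-8")
    root.add_header("Content-ID", "<dump.xml>")
    archive.attach(root)
    # One application/octet-stream part per file, addressed by its title.
    for name, data in files.items():
        part = MIMEApplication(data)
        part.add_header("Content-ID", f"<{name}>")
        archive.attach(part)
    return archive.as_bytes()


def read_archive(raw: bytes) -> tuple[str, dict[str, bytes]]:
    """Split the archive back into the XML dump and the file contents."""
    msg = message_from_bytes(raw)
    root, *file_parts = msg.get_payload()
    xml_dump = root.get_payload(decode=True).decode("utf-8")
    files = {
        part["Content-ID"].strip("<>"): part.get_payload(decode=True)
        for part in file_parts
    }
    return xml_dump, files
```

Using a standard MIME container means any mail or MIME library can take such a file apart, while a reader that expects plain XML will clearly fail instead of silently misreading it.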

To implement this, we also had to slightly change filerepo/LocalFile.php. In stock MediaWiki, old upload revisions are named after a strange "archival timestamp", which is usually equal to the timestamp of the next revision but sometimes differs from it by a second or two. The patch fixes this: each old upload gets its own timestamp in its name, so the names no longer depend on the next revision, and import/export also becomes more correct.
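The difference between the two naming schemes can be sketched as follows, assuming the usual MediaWiki convention of archiving old versions as TIMESTAMP!Name with YYYYMMDDHHMMSS timestamps. The helper archive_names is hypothetical, and its stock branch uses the next upload's timestamp as an approximation of the archival time, which in reality may drift by a second or two.

```python
def archive_names(name: str, upload_timestamps: list[str], patched: bool = True) -> list[str]:
    """Archive names of all old versions of `name` (every upload except the newest)."""
    ts = sorted(upload_timestamps)  # YYYYMMDDHHMMSS strings sort chronologically
    if patched:
        # Patched scheme: each old version is named after its own upload time.
        return [f"{t}!{name}" for t in ts[:-1]]
    # Stock scheme (approximation): named after the moment it was archived,
    # i.e. roughly when the next version was uploaded.
    return [f"{t}!{name}" for t in ts[1:]]
```

Under the patched scheme, a version's name is determined by that version alone, so two wikis that exchange the same revisions end up with the same archive names.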

Caution: because of this change, if you apply the patch to a non-empty MediaWiki installation, you need to run the maintenance tool from #Links, which renames old uploaded files.
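The following sketch shows what such a renaming has to compute. OldVersion and rename_plan are hypothetical, not the actual maintenance script. The sketch assumes each old version's own upload timestamp is known (MediaWiki keeps it in the oldimage table) and only plans the renames, without touching the database or the disk.

```python
from dataclasses import dataclass


@dataclass
class OldVersion:
    """One archived upload revision (hypothetical mirror of oldimage columns)."""
    name: str          # file title, e.g. "Example.png"
    timestamp: str     # this version's own upload time, YYYYMMDDHHMMSS
    archive_name: str  # current on-disk name under the stock scheme


def rename_plan(versions: list[OldVersion]) -> list[tuple[str, str]]:
    """Return (old, new) archive-name pairs, ordered so no rename clobbers a file."""
    plan = []
    seen = set()
    # Ascending order matters: version i's new name usually equals the old
    # name of version i-1, so version i-1 must be moved out of the way first.
    for ver in sorted(versions, key=lambda v: (v.name, v.timestamp)):
        new = f"{ver.timestamp}!{ver.name}"
        if new in seen:
            raise ValueError(f"two old versions of {ver.name} share timestamp {ver.timestamp}")
        seen.add(new)
        if new != ver.archive_name:
            plan.append((ver.archive_name, new))
    return plan
```

Executing the pairs in the returned order avoids overwriting a file whose old name coincides with another version's new name.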

Advanced page selection

We've also implemented advanced selection of pages for export:

  • From a category, including its subcategories.
  • And/or from namespace.
  • Inclusion of used templates, images and/or linked pages («template/image/page link closure»).
    It's very important that this "closure" is a real closure: it works correctly at any template inclusion depth, for images included into templates, and so on.
    Also, linked pages are now selected by «Add pages» instead of by «user-invisible» features that ran after clicking "Export". That is, "Export" now exports exactly the pages that are in the list.
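A closure that works at any inclusion depth is naturally computed as a breadth-first search. In this minimal sketch, the edges mapping is a hypothetical stand-in for MediaWiki's templatelinks, imagelinks and pagelinks tables, and export_closure is an illustrative name. The source does not say whether page links are followed transitively; following "link" repeatedly, as this sketch does, can pull in large parts of a wiki.

```python
from collections import deque
from typing import Mapping


def export_closure(
    pages: set[str],
    edges: Mapping[str, Mapping[str, set[str]]],
    kinds: set[str],
) -> set[str]:
    """Return `pages` plus everything reachable through the selected link kinds.

    edges[page][kind] holds the pages that `page` uses via that kind
    ("template", "image" or "link").
    """
    result = set(pages)
    queue = deque(pages)
    while queue:
        page = queue.popleft()
        for kind in kinds:
            for target in edges.get(page, {}).get(kind, set()):
                if target not in result:  # visit each page only once
                    result.add(target)
                    queue.append(target)
    return result
```

Because traversal continues from every newly added page, an image used by a template nested two levels deep is still found, as the "any depth" requirement above demands.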

After page selection, the following filters can be applied:

  • Selection of only the pages changed after a given date, which allows "incremental" replication.
  • Selection only of pages which are NOT in the selected category («not in category» filter).
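Both filters run on the already-expanded page list. The following minimal sketch illustrates this; apply_filters, the touched map of last-modification timestamps and the categories map are hypothetical, and the sketch assumes «not in category» checks direct membership only.

```python
from typing import Optional


def apply_filters(
    pages: set[str],
    touched: dict[str, str],
    categories: dict[str, set[str]],
    since: Optional[str] = None,
    not_in: Optional[str] = None,
) -> list[str]:
    """Filter an already-expanded page list; timestamps are YYYYMMDDHHMMSS strings."""
    kept = []
    for page in sorted(pages):
        # "Changed after a date": keep only pages modified strictly later.
        if since is not None and touched[page] <= since:
            continue
        # "Not in category": drop direct members of the excluded category.
        if not_in is not None and not_in in categories.get(page, set()):
            continue
        kept.append(page)
    return kept
```

Because the filters run after the closure, a template pulled in by the closure is also dropped if it has not changed since the given date, which is what makes incremental replication work.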

Also, IntraACL rights are supported: pages the user cannot read are not included in the selected list, and in that case the extra «Access denied» page does not get into the list either.

Conflict detection

The original MediaWiki import is simple: it loads the revisions and says "done". Newer versions say a little more for each page: "done, imported something" or "done, imported nothing". However, if you use import/export often, this is not enough: you want to see whose version of a page was newer, whether someone changed it locally, and so on. In essence, this adds DVCS capabilities to MediaWiki: sets of article revisions can be exchanged among a wide range of wikis.

That is what import "conflict detection" and the extended report on imported pages provide. The report contains one message per imported page, in one of 5 variants:

  1. All revisions were previously imported. No local changes.
  2. All revisions were previously imported. The page was changed locally.
  3. N revisions.
  4. N revisions (new page).
  5. N revisions (conflict: XX (import) and YY (local)).
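One way to decide between these five messages is sketched below. It assumes revisions are identified by their timestamps and that XX and YY are the latest imported and latest local revision; both are assumptions, and the patch may compare revisions differently. import_report is a hypothetical name.

```python
def import_report(imported: set[str], local: set[str]) -> str:
    """Classify one page's import into the five report messages.

    `imported` and `local` are revision timestamps (YYYYMMDDHHMMSS).
    """
    new = imported - local
    n = len(new)
    word = "revision" if n == 1 else "revisions"
    if not local:
        return f"{n} {word} (new page)"
    common = imported & local
    base = max(common) if common else None
    # Local revisions unknown to the import file, made after the last
    # revision both sides share (older local history is not a change).
    local_only = {r for r in local - imported if base is None or r > base}
    if not new:
        if local_only:
            return "All revisions were previously imported. The page was changed locally."
        return "All revisions were previously imported. No local changes."
    if local_only:
        return f"{n} {word} (conflict: {max(new)} (import) and {max(local_only)} (local))"
    return f"{n} {word}"
```

Comparing against the last common revision, rather than against the whole local history, means importing an "only last revision" export into a wiki with older local revisions is not reported as a local change.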

Despite all these changes, the import/export mechanism stays backward compatible with export files from stock MediaWiki versions: an export file from a stock wiki can be imported into the modified one, and an export file from the modified wiki can be imported into a stock one, provided it does not include uploaded files.

Installation

Installation is very simple:

  • Download the patch for your MediaWiki version and apply it with patch -p0 < downloaded_patch.diff.
  • If there is no patch for your MediaWiki version, either update MediaWiki, try the patch for the closest version, or, sadly, try to fix it up by hand.
  • If you are applying the patch to a non-empty MediaWiki installation, also download the maintenance script, put it into the maintenance directory and run it from there with php file-upload-renamer.php.

TODO

  • Upload import report. Currently, the import report covers only the text revisions of the File:XXX page. So if the page exists but has no actual upload, and an export file that adds the upload is imported, the file will appear, but the import report will say nothing about it.

Changes

  • 2011-02-22: Disabled adding the empty marker revision with the comment "N revisions imported", as it could multiply revisions indefinitely under cross-replication.
  • 2011-03-15: Fixed the «Only last revision» checkbox behaviour for uploads; it now works as expected. Fixed SHA1 hash calculation: MediaWiki uses base36 encoding for it, not hex.
  • 2011-05-19: Improvements in page selection (available only for MediaWiki4Intranet and MediaWiki 1.16.2, not for older or trunk versions):
    1. Selection of used images, templates and page links moved into «Add pages». Export now exports only the pages listed in the textbox.
    2. Added the filter «only pages not in category», applied after all additions to the page list, which allows excluding a category from export.
    3. Changed the "modification time" filter behaviour: it's now also applied after all additions, just like the "not-category" filter. This allows "incremental" replication of pages.
    4. Page selection code is optimized and works much faster now.
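The 2011-03-15 SHA1 fix concerns the encoding: MediaWiki stores file hashes in base 36, lowercase and zero-padded to 31 characters, rather than as 40 hex digits. A minimal sketch of producing the same encoding follows; needs_download is a hypothetical helper showing how a target wiki could compare a hash from the export file with its own.

```python
import hashlib

_BASE36 = "0123456789abcdefghijklmnopqrstuvwxyz"


def sha1_base36(data: bytes) -> str:
    """SHA-1 of `data` as a 31-character, zero-padded, lowercase base-36 string."""
    n = int.from_bytes(hashlib.sha1(data).digest(), "big")
    digits = []
    while n:
        n, rem = divmod(n, 36)
        digits.append(_BASE36[rem])
    # 36**31 > 2**160, so 31 digits always suffice for a 160-bit hash.
    return "".join(reversed(digits)).rjust(31, "0")


def needs_download(remote_sha1: str, local_sha1s: set[str]) -> bool:
    """Hypothetical check: fetch the file only if no local copy has this hash."""
    return remote_sha1 not in local_sha1s
```

A hex digest compared against a base-36 value never matches, so every file would look missing and be downloaded again.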

Links