Importing Data with OpenRefine (Wikimedia Commons)

  • go to https://hub-paws.wmcloud.org/ and sign in with your Wikimedia account
  • make a new folder called images and upload your images.
  • make sure you keep the links to their sources somewhere (not to the image file itself, but to the page with information about it, which needs to include copyright information!)

  • make sure you are in your images folder
  • click on Bash in the Console category on the right side of your PAWS interface
  • enter the command ls and hit Shift + Enter to run it

  • open a new tab in your PAWS hub and click on OpenRefine
  • choose Create Project > Get data from > Clipboard
  • write filepath, source, copyright, creator, date, description, wikitext, location into the textbox
  • paste your image paths under that, then click Next
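If you have many images, typing the file names by hand is error-prone. The following Python sketch, which could be run in a PAWS notebook, builds the clipboard text from a folder listing. The function name clipboard_text and the list of extensions are assumptions made for illustration and are not part of PAWS or OpenRefine. The header is written without spaces after the commas so that no column name starts with a blank.

```python
import os

# Column names used in this guide, written without spaces after the commas.
HEADER = "filepath,source,copyright,creator,date,description,wikitext,location"

# Assumption: which file endings count as images. Adjust to your own files.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".tif", ".tiff", ".svg")


def clipboard_text(folder):
    """Return the header line followed by one image file name per line."""
    names = sorted(
        name
        for name in os.listdir(folder)
        if name.lower().endswith(IMAGE_EXTENSIONS)
    )
    return "\n".join([HEADER] + names)


# Only runs where the folder from this guide actually exists (e.g. on PAWS).
if os.path.isdir("/home/paws/images"):
    print(clipboard_text("/home/paws/images"))
```

The printed text can be copied into the OpenRefine textbox as it is.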

  • in the next window change these settings:
    • Parse data as… > CSV / TSV / separator-based files
    • Columns are separated by > commas (CSV)
  • give your project a name and click Create project
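To see what the comma-separated parse does with this input, here is a rough Python equivalent using the standard csv module. It only illustrates the idea (first line becomes the column names, missing cells stay empty) and is not how OpenRefine itself is implemented. The sample file names are made up.

```python
import csv
import io


def parse_clipboard(text):
    """Parse header + file names the way a CSV import would, as a list of dicts."""
    reader = csv.DictReader(
        io.StringIO(text),
        skipinitialspace=True,  # ignore the blank after each comma in the header
        restval="",             # columns without data become empty strings
    )
    return list(reader)


# Hypothetical sample input: the header from this guide plus two file names.
sample = (
    "filepath, source, copyright, creator, date, description, wikitext, location\n"
    "castle.jpg\n"
    "old_map.png\n"
)

rows = parse_clipboard(sample)
for row in rows:
    print(row["filepath"], "| empty columns:", sum(1 for v in row.values() if v == ""))
```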

  • now you should have a project with eight columns:
    • filepath, which has data, and
    • source,
    • copyright,
    • creator,
    • date,
    • description,
    • wikitext and
    • location, which are empty
  • click on the little arrow next to filepath
  • select Edit cells > Transform

  • in the window, replace value with '/home/paws/images/'+value and click OK
  • this will allow OpenRefine to find the images you uploaded to PAWS
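The transform above is plain string concatenation. As a minimal Python sketch of the same step, with an extra check for files that do not exist at the resulting path (the helper names add_prefix and missing_files are hypothetical):

```python
import os

# Folder used in this guide; change it if your folder has a different name.
PREFIX = "/home/paws/images/"


def add_prefix(value, prefix=PREFIX):
    """Mirror the expression '/home/paws/images/'+value."""
    return prefix + value


def missing_files(paths):
    """Return the paths that do not point to an existing file."""
    return [path for path in paths if not os.path.isfile(path)]


print(add_prefix("castle.jpg"))
```

Running missing_files on the prefixed paths in PAWS is a quick way to catch typos in file names before uploading.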

  • click the arrow again
  • select Edit column > Add column based on this column…

  • enter filename as the new column name and
    • either leave the textbox empty to generate an empty column,
    • or use this expression: value.replace(/.*\//, "").replace(/[-_]/, " ").toTitlecase() to generate a name from your filepath
  • give your items descriptive and unique names (per Commons’ guidelines)
  • make sure to keep the file ending!
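To preview what the expression above produces, here is an approximate Python re-implementation. It is a sketch only: the function name make_filename is hypothetical, and the title-casing step may differ from GREL's toTitlecase() in edge cases. Note that the file ending is lowercased here.

```python
import re


def make_filename(filepath):
    """Approximate the GREL expression from this guide in Python."""
    name = re.sub(r".*/", "", filepath)  # drop everything up to the last slash
    name = re.sub(r"[-_]", " ", name)    # hyphens and underscores become spaces
    # Upper-case the first letter of each space-separated word, lower-case the rest.
    return " ".join(word.capitalize() for word in name.split(" "))


print(make_filename("/home/paws/images/old_town-hall.jpg"))
```

The generated names are only a starting point; they still need to be made descriptive and unique by hand.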

Now you can fill in the rest of your data. You can do it by hand or, if every cell in a column should contain the same value, use the Transform method from above. In that case, delete value from the expression and enter your data in quotes instead.

  • paste your saved links from earlier into source
    • you can leave this column empty if you’re the creator of your images
  • enter Public domain into copyright
  • enter the name(s) of whoever took your images into creator
  • date means date(s) of publication (in YYYY-MM-DD format)
  • add a short description to description
  • add the name of your GLAM institution to location
  • copy and paste this into wikitext:
=={{int:filedesc}}==
{{Information}}
=={{int:license-header}}==
{{PD-old-70}}
{{PD-US}}
[[Category:HSH Kurs 2025]]
  • you might need to change the {{PD-old-70}} to another license
    • {{PD-heirs}} > for images released into the public domain by the original creator’s heirs
    • {{PD-self}} > for images you took yourself
    • {{PD-author|author}} > for images someone else released into the public domain
    • see Commons:Copyright tags for more
  • (You can create your own category if you want to keep all your images in one place. To do this, add [[Category:Uploaded by YOURUSERNAME]] to the wikitext. This will generate a link at the bottom of your Commons pages. Click on it to create your category, and pick User Category as its parent. See also Commons:Categories for more information.)
  • from the filename menu select Reconcile > Actions > Create a new item for each cell…
  • pick Wikimedia Commons (en) from the dropdown menu and click OK
  • this will create new items to upload to Commons
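For the wikitext column filled in earlier, a small Python sketch can assemble the text with a different license tag or an optional user category. The function build_wikitext and the user name ExampleUser are hypothetical. The template names and the category are the ones given in this guide, and choosing the right license tag remains your responsibility.

```python
def build_wikitext(license_template="PD-old-70", username=None):
    """Assemble the wikitext from this guide, optionally with a user category."""
    lines = [
        "=={{int:filedesc}}==",
        "{{Information}}",
        "=={{int:license-header}}==",
        "{{" + license_template + "}}",  # e.g. PD-heirs, PD-self, PD-author|author
        "{{PD-US}}",
        "[[Category:HSH Kurs 2025]]",
    ]
    if username:
        # Optional personal category, as described in this guide.
        lines.append("[[Category:Uploaded by " + username + "]]")
    return "\n".join(lines)


# ExampleUser is a placeholder, not a real account.
print(build_wikitext("PD-self", username="ExampleUser"))
```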

  • the header of your filename column should be underlined in green and every entry should have a new marker next to it
  • the other columns that represent entities or concepts (like people, organisations, places…) should be reconciled against Wikidata in order to standardise them and link them to their Wikidata entries
  • things like URLs, dates or freetext like descriptions won’t be reconciled
  • in our case, do the following for copyright, creator, and location:
    • Reconcile > Start reconciling…
    • choose Wikidata reconci.link (en) as the service to reconcile against.
    • choose the most fitting entity type (usually that is the pre-selected one)
    • click Start reconciling…
    • (you can also search for a type to reconcile against or choose to reconcile against no type in particular)

  • the cells will automatically connect to matches that most likely fit your data or show you a selection
  • check their name and description to see if they’re correct
  • if they are, click match this cell or match all identical cells
  • if they aren’t, check if there is a better match under See more, or use Search for match
  • you can also search on Wikidata to see if the entity you want to link to exists
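Searching Wikidata can also be done from a script. The sketch below uses the wbsearchentities action of the Wikidata API with the Python standard library only. The helper names and the User-Agent string are assumptions for illustration, and the sample reply in the code is hand-written to show the shape of the data, not an actual API response.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.wikidata.org/w/api.php"


def search_url(term, language="en", limit=5):
    """Build the request URL for an entity search on Wikidata."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def summarize(response):
    """Reduce an API reply to (id, label, description) tuples."""
    return [
        (hit.get("id"), hit.get("label"), hit.get("description"))
        for hit in response.get("search", [])
    ]


def search_wikidata(term, language="en"):
    """Run the search over the network and return the summarized hits."""
    request = urllib.request.Request(
        search_url(term, language),
        headers={"User-Agent": "openrefine-guide-example/0.1"},  # placeholder value
    )
    with urllib.request.urlopen(request, timeout=10) as reply:
        return summarize(json.load(reply))


# Hand-written sample in the shape of an API reply, for illustration only.
sample_reply = {"search": [{"id": "Q42", "label": "Douglas Adams", "description": "English writer"}]}
print(summarize(sample_reply))
```

Calling search_wikidata("your search term") requires network access. If it returns no hits, the entity may not exist on Wikidata yet.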

  • now from the Wikibase button in the upper right choose Edit Wikibase schema

  • choose Wikimedia Commons as Target Wikibase instance
  • open the menu again and select Manage schemas…
  • either download this file or copy-paste its contents into a .json file: mediawiki_schema on Github Gist
  • upload the file
  • go back into the Schema tab and choose new_wikimedia_schmema from the dropdown menu Start from an existing schema:
  • (You might need to reload the page.)
  • it should look like this:

  • if your description is in a language other than English, make sure to replace en in the Captions section with the matching language code
  • if you’re the creator of the images:
    • delete the described at URL qualifier and
    • replace file available on the internet with original creation by uploader
  • click Wikibase in the upper right again and choose Upload edits to Wikibase…
  • OpenRefine will warn you about New media uploaded, which is what we want and not a problem
  • write a short note about your upload in the summary field, then click the Upload edits button and you’re done!
  • your filename column should now contain links that take you to the newly generated Wikimedia Commons pages, which should look like this:

Link to Commons

  • Note: You cannot change the images and changing their names is highly discouraged by Commons
  • the process for editing or adding to data is the same as uploading, with the difference that you don’t use the full schema
    • remove every value that doesn’t have to be changed and only keep the file name and new data
    • you can decide for each value if it will replace the old data or merge with it (in statement > configure)
    • then upload as before