Importing Data with OpenRefine (Wikimedia Commons)
Upload Your Images to PAWS
- go to https://hub-paws.wmcloud.org/ and sign in with your Wiki account
- make a new folder called images and upload your images
- make sure you keep the links to their sources somewhere (not the link to the image file itself, but to the page with information about it; that page needs to include copyright information!)
Get Your Image Paths
- make sure you are in your images folder
- click on Bash in the Console category on the right side of your PAWS interface
- enter the command ls and hit Shift + Enter to run it
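As an alternative to ls, the same list can be produced from a notebook cell in PAWS. The following is a minimal Python sketch, assuming your images sit in a folder named images inside your PAWS home directory; the function name list_image_names is made up for this example.

```python
from pathlib import Path


def list_image_names(folder):
    """Return the names of all regular files in `folder`, sorted alphabetically."""
    return sorted(p.name for p in Path(folder).iterdir() if p.is_file())


if __name__ == "__main__":
    # Assumption: the upload folder is ~/images, as created in the first step.
    images = Path.home() / "images"
    if images.is_dir():
        # One name per line, ready to be copied into OpenRefine.
        print("\n".join(list_image_names(images)))
```

Subfolders are skipped, so only the files themselves end up in the list.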
Create OpenRefine Project
- open a new tab in your PAWS hub and click on OpenRefine
- choose Create Project > Get data from > Clipboard
- write filepath, source, copyright, creator, date, description, wikitext, location into the textbox
- paste your image paths under that, then click Next
- in the next window change these settings:
  - Parse data as… > CSV / TSV / separator-based files
  - Columns are separated by > commas (CSV)
- give your project a name and click Create project
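If you prefer to prepare the clipboard text in a notebook first, the header row and the file names can be assembled programmatically. This is only a sketch under a few assumptions: the function name build_clipboard_text and the sample file names are invented, and the header is written without spaces after the commas. Using the csv module means that a file name containing a comma is quoted instead of being split into two columns.

```python
import csv
import io

# Column names from the step above; only filepath gets data at this point.
COLUMNS = ["filepath", "source", "copyright", "creator",
           "date", "description", "wikitext", "location"]


def build_clipboard_text(file_names):
    """Return CSV text: one header row, then one row per file name."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    for name in file_names:
        # csv.writer adds quotes when a name contains a comma or a quote,
        # so such names still end up in a single cell.
        writer.writerow([name])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(build_clipboard_text(["old_town-hall.jpg", "harbour, 1920.jpg"]))
```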
Add Data
- now you should have a project with eight columns:
- filepath, which has data, and
- source,
- copyright,
- creator,
- date,
- description,
- wikitext and
- location, which are empty
- click on the little arrow next to filepath
- select Edit cells > Transform
- in the window, replace value with '/home/paws/images/'+value and click OK
- this will allow OpenRefine to find the images you uploaded to PAWS
- click the arrow again
- select Edit column > Add column based on this column…
- enter filename as the new column name and
  - either leave the textbox empty to generate an empty column,
  - or use this expression: value.replace(/.*\//, "").replace(/[-_]/, " ").toTitlecase() to generate a name from your filepath
- give your items descriptive and unique names (per Commons' guidelines)
- make sure to keep the file ending!
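To see what the two expressions above do to a value, here is a rough Python equivalent. It is a sketch, not OpenRefine's own implementation: the function names are invented, the sample file name is hypothetical, and the title-casing step assumes that toTitlecase() capitalises the first letter of each space-separated word and lowercases the rest.

```python
import re

# Folder prefix from the Transform step above.
IMAGE_DIR = "/home/paws/images/"


def to_filepath(file_name):
    """Mirror of the expression '/home/paws/images/'+value."""
    return IMAGE_DIR + file_name


def suggest_filename(filepath):
    """Approximate the filename expression from the step above."""
    name = re.sub(r".*/", "", filepath)   # drop everything up to the last slash
    name = re.sub(r"[-_]", " ", name)     # turn hyphens and underscores into spaces
    # Capitalise the first letter of each space-separated word and lowercase
    # the rest; this is an assumption about how toTitlecase() behaves.
    return " ".join(word.capitalize() for word in name.split(" "))


if __name__ == "__main__":
    path = to_filepath("old_town-hall.jpg")
    print(path)                    # /home/paws/images/old_town-hall.jpg
    print(suggest_filename(path))  # Old Town Hall.jpg
```

A generated name like this is only a starting point; it still has to be made descriptive and unique by hand.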
Now you can fill in the rest of your data. You can do it by hand or, if all cells of a column take the same data, use the Transform method from above. In that case, delete value and enter your data in quotes instead (for example "Public domain").
- paste your saved links from earlier into source
  - you can leave this column empty if you’re the creator of your images
- enter Public domain into copyright
- enter the name(s) of whoever took your images into creator
- date means date(s) of publication (in YYYY-MM-DD format)
- add a short description to description
- add the name of your GLAM institution to location
- copy and paste this into wikitext:
=={{int:filedesc}}=={{Information}}
=={{int:license-header}}=={{PD-old-70}}{{PD-US}}
[[Category:HSH Kurs 2025]]
- you might need to change the {{PD-old-70}} to another license:
  - {{PD-heirs}}: for images released into the public domain by the original creator’s heirs
  - {{PD-self}}: for images you took yourself
  - {{PD-author|author}}: for images someone else released into the public domain
  - see Commons:Copyright tags for more
- (You can create your own category if you want to keep your images all in one place. To do this, add [[Category:Uploaded by YOURUSERNAME]] to the wikitext. This will generate a link at the bottom of your Commons pages. Click on that and you can create your category. Pick User Category as a parent. See also Commons:Categories for more information.)
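The choice of license template described above can be written down as a small lookup. This is an illustrative Python sketch only: the function name license_tag and the situation labels "old", "heirs", "self" and "author" are invented for this example, and the author name is a placeholder. It returns just the template that would take the place of {{PD-old-70}}; the rest of the wikitext stays as given.

```python
def license_tag(situation="old", author=None):
    """Return the template that would take the place of {{PD-old-70}}.

    `situation` is a label made up for this sketch:
    "old", "heirs", "self" or "author".
    """
    if situation == "author":
        if not author:
            raise ValueError("the 'author' situation needs an author name")
        return "{{PD-author|" + author + "}}"
    tags = {
        "old": "{{PD-old-70}}",      # default used in the wikitext above
        "heirs": "{{PD-heirs}}",     # released by the original creator's heirs
        "self": "{{PD-self}}",       # images you took yourself
    }
    if situation not in tags:
        raise ValueError("unknown situation: " + situation)
    return tags[situation]


if __name__ == "__main__":
    print(license_tag("self"))                 # {{PD-self}}
    print(license_tag("author", "Jane Doe"))   # {{PD-author|Jane Doe}}
```

For any case not covered here, Commons:Copyright tags remains the place to look.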
Reconcile Data
- from the filename menu select Reconcile > Actions > Create a new item for each cell…
- pick Wikimedia Commons (en) from the dropdown menu and click OK
- this will create new items to upload to Commons
- the header of your filename column should be underlined in green and every entry should be followed by a new label
- those of the other columns that represent entities or concepts (like people, organisations, places…) should be reconciled against Wikidata in order to standardise them and link them to their Wikidata entries
  - things like URLs, dates or free text like descriptions won’t be reconciled
- in our case, do the following for copyright, creator, and location:
  - Reconcile > Start reconciling…
  - choose Wikidata reconci.link (en) as the service to reconcile against
  - choose the most fitting entity type (usually that is the pre-selected one)
  - click Start reconciling…
  - (you can also search for a type to reconcile against or choose to reconcile against no type in particular)
- the cells will automatically connect to matches that most likely fit your data or show you a selection
- check their name and description to see if they’re correct
- if they are, click match this cell or match all identical cells
- if they aren’t, check if there is a better match under See more or Search for match
- you can also search on Wikidata to see if the entity you want to link to exists
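Checking whether an entity exists on Wikidata can also be done from a PAWS notebook through the wbsearchentities action of the Wikidata API. The sketch below is not part of the OpenRefine workflow; the function names and the User-Agent string are placeholders, and the call needs network access.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.wikidata.org/w/api.php"


def build_search_url(term, language="en", limit=5):
    """Build a wbsearchentities request URL for `term`."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def search_wikidata(term, language="en"):
    """Return (id, label, description) tuples for entities matching `term`."""
    request = urllib.request.Request(
        build_search_url(term, language),
        # Placeholder User-Agent; replace it with something that identifies you.
        headers={"User-Agent": "openrefine-tutorial-example/0.1"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        data = json.load(response)
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in data.get("search", [])
    ]
```

Calling search_wikidata with the name of your creator or institution returns the first few candidates, which you can compare with what OpenRefine suggests in the cell.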
Upload to Commons
- now from the Wikibase button in the upper right choose Edit Wikibase schema
- choose Wikimedia Commons as Target Wikibase instance
- open the menu again and select Manage schemas…
- either download this file or copy-paste its contents into a .json file: mediawiki_schema on GitHub Gist
- upload the file
- go back into the Schema tab and choose new_wikimedia_schmema from the Start from an existing schema: dropdown menu
  - (You might need to reload the page.)
- it should look like this:
- if your description is in a language other than English, make sure to replace en in the Captions section with the matching language code
- if you’re the creator of the images:
  - delete the described at URL qualifier and
  - replace file available on the internet with original creation by uploader
- click Wikibase in the upper right again and choose Upload edits to Wikibase…
- OpenRefine will warn you about New media uploaded, which is what we want and not a problem
- write a short note about your upload in the summary field, then click the Upload edits button and you’re done!
- your filename column should now contain links that take you to the newly generated Wikimedia Commons pages, which should look like this:
Edit Data on Commons
- Note: you cannot change the images, and changing their names is strongly discouraged by Commons
- the process for editing or adding to data is the same as uploading, with the difference that you don’t use the full schema
- remove every value that doesn’t have to be changed and only keep the file name and new data
- you can decide for each value if it will replace the old data or merge with it (in statement > configure)
- then upload as before