Wikipedia:Request a query
This page has archives. Sections older than 14 days may be automatically archived by Lowercase sigmabot III when more than 4 sections are present.
This is a page for requesting one-off database queries that match certain criteria. Users who are interested in and able to run SQL queries on the projects can provide results from the Quarry website.
You may also be interested in the following:
- If you are interested in writing SQL queries or helping out here, visit our tips page.
- If you need to obtain a list of article titles that meet certain criteria, consider using PetScan (user manual) or the default search. PetScan can generate lists of articles in subcategories, articles that transclude a given template, and so on.
- If you need to make changes to a number of articles based on a particular query, you can post to the bot requests page, depending on how many changes are needed.
- For long-term review and checking, database reports are available.
Quarry does not have access to page content, so queries which require checking wikitext cannot be answered with Quarry. However, someone may be able to assist by using Quarry in another way (e.g. checking the table of category links rather than the "Category:" text) or suggest an alternative tool.
Talk pages
Hey, I'm looking for help with creating two SQL queries:
- Find all talk pages (excluding archive talk pages) that do not use {{WikiProject banner shell}}.
- Find all articles with no talk page (red link).
Gonnym (talk) 13:11, 15 January 2025 (UTC)
- There are at least hundreds of thousands, likely millions, of each. quarry:query/89907 and 89908 have the first 10k. —Cryptic 14:18, 15 January 2025 (UTC)
- Thanks! The first query is not producing correct results. The second result on the list is Talk:Destroy-Oh-Boy! which does use the template. Gonnym (talk) 16:50, 15 January 2025 (UTC)
- No, the second result is Talk:!!Destroy-Oh-Boy!!. The page it redirects to transcludes the template. —Cryptic 17:40, 15 January 2025 (UTC)
- (Which, of course, isn't that helpful an answer, so I've updated the query in-place to exclude talk pages that are redirects. —Cryptic 17:43, 15 January 2025 (UTC))
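For readers following along, the two requests above could be sketched in SQL along these lines. This is an untested sketch, assuming the current `templatelinks`/`linktarget` schema on the Toolforge replicas and a simple `Archive` title filter; the actual Quarry queries linked above are authoritative.

```sql
-- 1. Talk pages (not redirects, not archives) with no transclusion of
--    Template:WikiProject banner shell.
SELECT page_title
FROM page
WHERE page_namespace = 1                 -- Talk:
  AND page_is_redirect = 0
  AND page_title NOT LIKE '%Archive%'
  AND NOT EXISTS (
    SELECT 1
    FROM templatelinks
    JOIN linktarget ON lt_id = tl_target_id
    WHERE tl_from = page_id
      AND lt_namespace = 10              -- Template:
      AND lt_title = 'WikiProject_banner_shell'
  )
LIMIT 10000;

-- 2. Articles (mainspace, non-redirect) whose talk page does not exist.
SELECT p.page_title
FROM page AS p
LEFT JOIN page AS t
  ON t.page_namespace = 1
 AND t.page_title = p.page_title
WHERE p.page_namespace = 0
  AND p.page_is_redirect = 0
  AND t.page_id IS NULL
LIMIT 10000;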
Serial commas in page titles
I posted the following request at WP:BOTR and was advised to come here.
Extended content
Hello, I'm not sure that this request can be completed automatically; please accept my apology if it can't. I just want some lists, without edits to anything except the page where you put the lists, so it's not a CONTEXTBOT issue: just a "good use of time" issue. Could you compile some lists of pages in which serial commas are present or are omitted? I just discovered List of cities, towns and villages in Cyprus and created List of cities, towns, and villages in Cyprus as a redirect to support serial commas. Ideally, whenever a page could have a serial comma in the title, we'd have a redirect for the form not used by the current title, but I assume this isn't always the case. First off, I'd like a list of all mainspace pages (whether articles, lists, disambiguation pages, anything else except redirects) that use a serial comma. I think the criteria might be:
I'm unsure whether they're rigid enough, or whether they might return a lot of false positives. Secondly, I'd like a list of all pages whose titles are identical to the first list, except lacking a serial comma. Redirects would be acceptable here, since if I'm creating serial-comma redirects, it helps to know if it already exists. Thirdly, I'd like a list of all mainspace pages (whether articles, lists, disambiguation pages, anything else except redirects) that could use a serial comma but don't. I think the criteria would be:
Once the list is complete, the bot checks each page with the following process: "if I inserted a comma immediately before 'and' or 'or', would it appear on the first list?" If the answer is "no", the bot removes it from the list. Fourthly, I'd like a list of all pages whose titles are identical to the third list, except they have a serial comma. Again, redirects are acceptable.
After my request, User:Qwerfjkl suggested that I come here and offered some comments: intitle:/[A-Za-z ]+, [A-Za-z ]+, (and|or) [A-Za-z ]+/
would work for the first request and intitle:/[A-Za-z ]+, [A-Za-z ]+ (and|or) [A-Za-z ]+/
would work for the second.
The latter two lists are trickier. To this, I replied: "Is there a way to download a list of results from a particular search? As far as I know, the only way to get a list of results is to copy/paste the whole thing somewhere and delete everything that's not a page title. (With 11,544 results for the first search, this isn't something I want to do manually.) Also, the first search includes redirects, e.g. Orders, decorations, and medals of the United Nations is result #1."
Please ping me if you reply; I've never heard of this page before, and I might not remember to come back if you don't ping me. Thanks for your help. Nyttend (talk) 03:22, 20 January 2025 (UTC)
- @Nyttend: Doing the whole thing would be very awkward; we don't have access to create temporary tables on the replicas, so it would have to all be in one query, and I haven't been able to cajole the query pessimizer into doing it efficiently. What I can do quite easily is give you lists of all mainspace titles that match each of those regexes, along with whether each is a redirect, and what the titles would be if the comma is added/removed (as appropriate). quarry:query/90019 and 90020. Those should be sufficient to construct your four lists, though perhaps not trivially. I can do that too if you're unable, though not tonight and maybe not tomorrow either; let me know. —Cryptic 04:32, 20 January 2025 (UTC)
- Could you give me two tables with four columns of data? First column is the link, second column is existing status (red, blue redirect, blue non-redirect), third column is alternate-comma variant (with-comma for the without-comma table; without-comma for the with-comma table), and fourth column is status of variant. If you do this in a simple table, I should be able to copy/paste it into Excel, sort so I can easily remove everything I don't want, and move the remaining content back to MediaWiki. Nyttend (talk) 04:44, 20 January 2025 (UTC)
- Ah, never mind, I've explored the links and just realised that there's a way to download a CSV with almost all this information. This should suffice. Thanks! Nyttend (talk) 04:47, 20 January 2025 (UTC)
- @Nyttend: I've updated both queries to include the status of the variant titles. —Cryptic 05:00, 20 January 2025 (UTC)
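The title-matching query described above could be sketched like this. It is an untested sketch, assuming MariaDB's `RLIKE` and `REGEXP_REPLACE` (with `\\1` back-references) on the replicas, and that `page_title` stores spaces as underscores; the real queries are at the Quarry links above.

```sql
-- Mainspace titles with a serial comma, each title's redirect status, the
-- comma-removed variant, and whether that variant exists.
SELECT p.page_title,
       p.page_is_redirect,
       REGEXP_REPLACE(p.page_title, ',(_(and|or)_)', '\\1') AS variant,
       v.page_id IS NOT NULL AS variant_exists,
       v.page_is_redirect    AS variant_is_redirect
FROM page AS p
LEFT JOIN page AS v
  ON v.page_namespace = 0
 AND v.page_title = REGEXP_REPLACE(p.page_title, ',(_(and|or)_)', '\\1')
WHERE p.page_namespace = 0
  AND p.page_title RLIKE '^[A-Za-z_]+,_[A-Za-z_]+,_(and|or)_[A-Za-z_]+$';
```

Swapping the pattern for the comma-less form and inserting (rather than removing) the comma in the `variant` expression would give the companion list.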
Ref count
Can anyone create/link me to a modified version of User:Bunnypranav/Reports#Probable "draftify because of no sources" candidates that includes only pages with 1-5 refs (inline citations)? I would appreciate it if you could just add another section below the above link and create/test the query. Thanks in advance! ~/Bunnypranav:<ping> 13:50, 24 January 2025 (UTC)
- References aren't stored in the database at all except in the page text, which isn't copied to the toolforge replicas. —Cryptic 14:37, 24 January 2025 (UTC)
- Hmm. How is the no-refs query made? Is there any way to change it from the existing "no refs exist" to "x number of refs exist"? From my very novice understanding, the existing query checks the number of external links; can we modify it to find a specific number of them for the requested query? Thanks for your help. ~/Bunnypranav:<ping> 16:32, 24 January 2025 (UTC)
- It's looking at whether there are any external links, yes, and I'll go ahead and change the query. But it's going to get hardly any results either way - the only pages it finds even without limiting external links at all are Battle of Nalos (3 links) and Operation Qazanchi (6), I suspect by its requirement of a no-citations tag, and one specifically added through page-triage at that. —Cryptic 16:50, 24 January 2025 (UTC)
- Thanks Cryptic for that change, that will help catch some more possibilities eventually. @DreamRimmer, since I see you have expertise in Python scripts, could a script help find the number of citation tags for pages younger than 90 days and list them somewhere? ~/Bunnypranav:<ping> 06:28, 25 January 2025 (UTC)
- I would say this is technically doable, but it's not advisable due to the large number of pages involved, roughly 50,000 to 60,000 pages created in the last 90 days. Checking the number of references for these pages would require making the same number of API requests, which would significantly waste resources. A more efficient approach would be to download the relevant database dumps and generate your list from that data. – DreamRimmer (talk) 07:40, 25 January 2025 (UTC)
- Is there any way to limit the number of pages queried from the beginning? In any case, I am ready to at least try to process the database dumps from my end; could you possibly help me with a Python script, since I don't know enough to start from scratch? Up to you though, and thanks for the reply! ~/Bunnypranav:<ping> 07:43, 25 January 2025 (UTC)
- There's a tool in WP:AWB called the database scanner that may be helpful. –Novem Linguae (talk) 13:35, 25 January 2025 (UTC)
- Noted, thanks! ~/Bunnypranav:<ping> 13:36, 25 January 2025 (UTC)
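The dump-processing approach suggested above might start from something like this. It is only a sketch: the regex is a rough heuristic that counts opening `<ref>` tags in wikitext (it will miss references emitted from inside templates), and the commented dump-iteration code assumes the standard `pages-articles` dump file name and XML export namespace, which varies by dump version.

```python
import re

# Matches opening <ref> tags: paired <ref>...</ref> and self-closing
# <ref name="x" /> reuses, case-insensitively. Closing </ref> tags and
# <references /> do not match, so each citation is counted once.
REF_OPEN = re.compile(r"<ref[\s>/]", re.IGNORECASE)

def count_refs(wikitext: str) -> int:
    """Return an approximate count of <ref> citations in the given wikitext."""
    return len(REF_OPEN.findall(wikitext))

def has_one_to_five_refs(wikitext: str) -> bool:
    """True when the page has 1-5 citations, per the request above."""
    return 1 <= count_refs(wikitext) <= 5

# Usage sketch over a pages-articles dump (stream it with iterparse to keep
# memory flat; adjust the namespace URI to match your dump's <mediawiki> tag):
#
#   import bz2, xml.etree.ElementTree as ET
#   NS = "{http://www.mediawiki.org/xml/export-0.11/}"
#   with bz2.open("enwiki-latest-pages-articles.xml.bz2", "rt") as f:
#       for _, elem in ET.iterparse(f):
#           if elem.tag == NS + "page":
#               text = elem.findtext(NS + "revision/" + NS + "text") or ""
#               if has_one_to_five_refs(text):
#                   print(elem.findtext(NS + "title"))
#               elem.clear()
```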
Website Infobox parameter
Hi Cryptic, could you query me a list of articles that are part of {{WikiProject Academic Journals}}, have an {{Infobox journal}}, and don't have a |website= parameter? Thanks Nobody (talk) 18:00, 24 January 2025 (UTC)
- There's no way to see whether they have a parameter. The usual solution is to add a hidden tracking category in the template. —Cryptic 18:51, 24 January 2025 (UTC)
- Thanks for the reply, I've asked for it at Template talk:Infobox journal#Template-protected edit request on 24 January 2025, which was the reason for this request. Nobody (talk) 19:45, 24 January 2025 (UTC)
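The tracking-category approach suggested above usually amounts to a small addition inside the infobox template itself. A minimal sketch, assuming the parameter is named |website= and using a hypothetical category name (the edit request linked above would settle the real one):

```wikitext
<!-- Inside Template:Infobox journal: categorize mainspace pages where
     |website= is blank or absent. Category name is hypothetical. -->
{{#if:{{{website|}}}
  | <!-- website given: no tracking needed -->
  | {{main other|[[Category:Infobox journal articles with missing website parameter]]}}
}}
```

Once such a category populates, the original request reduces to a trivial `categorylinks` intersection query on Quarry.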