First, please see “Oricon extended ranking data, backlog complete! Explanation and summary” to get the background on what this data is.

I’ve taken my source file for this (which is messy and complicated and slow) and put together a Google Doc with just the relevant bits. Before that though, a lot of caveats:

1. Columns are mostly self-explanatory (Rank, Title, Release date). High is a disc’s peak position in the charts. Times is the number of weeks it’s ranked (in the extended rankings). Known Sales is a number I can cross-reference against known sources. Minimum sales is the least a disc could have sold considering the next lowest Known Sales number.

2. Even back-to-back scrapes (with no weekly update between) return different orderings. This seems to be because discs with the same sales sort randomly. For example, one scrape gives you Release A > Release B > Release C; scrape again and it may be Release C > Release A > Release B. This is why you currently see three DVDs with a Minimum Sales number of 211 even though they’re *below* the lowest known threshold disc (ロボットガールズZ VOL.1). When I scraped previously, they were above it. This just means they’re all tied at 211.

3. Oricon’s site often duplicates items across two pages. In the latest scrape for example, “ターザン(04.02)” was returned at #9339 and #9342. This happens often and I try to dedupe.

4. My spreadsheet has more rows than there are discs on Oricon’s site. Sometimes this might be missed dupes; sometimes Oricon’s list may no longer include a disc that it previously did. I don’t delete discs from mine, so I may have discs Oricon no longer lists. A few totally random examples: AKB0048 VOL.01, ケロロ軍曹 5thシーズン 11, PSYCHO-PASS サイコパス VOL.6 DVD, ヨスガノソラ 4. They were clearly on Oricon’s list at one point because I have them. But in the latest scrape, they weren’t there anymore. They could easily return next week! This seems to affect DVD more than BD.

5. To merge updated data with my ordered list, I match on title. This works in virtually all cases save a handful of movies (mostly Ghibli) where Oricon’s title is identical between original and re-release, e.g. となりのトトロ from 2001 and となりのトトロ from 2014. I think I’ve handled most of these situations, but it’s possible you’ll find a few discs that look like they’re in totally the wrong ranking spot. If so, I don’t think it’ll be very many. I’d match on both title and date to avoid this, but my spreadsheet takes upwards of 30 seconds to save or calculate formulas with 8 cores running as is. I’d go nuts if I bogged it down more.

6. The scraping tool I’m using is really cool. Especially given it’s free *and* they run the scrape from their servers, which means both that it’s faster and that I won’t get my IP banned if I scrape too aggressively. But it gets the encoding wrong on some non-standard numeral characters and outputs �W instead of Ⅳ or �C instead of ④, for example. I eventually figured out what each maps to, but it would be confusing if I just dumped it as is. So I did a find and replace on the cleaned data to fix it. These are yet more manual steps though, another reason why I don’t intend to update this all that frequently.

7. Two discs are clearly errors on Oricon’s part. D.C.~ダ・カーポ~ DVD-BOXⅡ (DA Capo s1 v1) and フルーツバスケット 4 (Fruits Basket v4) are ranking in a spot that would, suspiciously, be exactly double their known sales. There’s nothing about either release that would justify such a thing, so I kept their old numbers in my data.
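
For caveats 3 and 5, here is a rough Python sketch of deduping a scrape and keying on title plus release date instead of title alone. It is only an illustration, not the spreadsheet's actual logic (which matches on title only). The `Row` type, the `dedupe` helper, the placeholder date and the Totoro rank numbers are all made up for the example; only the ターザン ranks and the two Totoro years come from the notes above.

```python
from __future__ import annotations

from typing import NamedTuple


class Row(NamedTuple):
    rank: int
    title: str
    release: str  # release date or year as text; the format is an assumption


def dedupe(rows: list[Row]) -> list[Row]:
    """Keep the best-ranked occurrence of each (title, release) pair."""
    seen: set[tuple[str, str]] = set()
    unique: list[Row] = []
    for row in sorted(rows, key=lambda r: r.rank):
        key = (row.title, row.release)  # composite key, not title alone
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


scrape = [
    Row(9339, "ターザン(04.02)", "placeholder-date"),
    Row(9342, "ターザン(04.02)", "placeholder-date"),  # same disc repeated on a later page
    Row(101, "となりのトトロ", "2001"),  # hypothetical rank
    Row(102, "となりのトトロ", "2014"),  # hypothetical rank
]
unique = dedupe(scrape)
```

With the composite key, the duplicated ターザン row collapses to one entry while the two となりのトトロ releases stay separate. Keying on title alone would merge them.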

With that out of the way (I’m sure you read it all, right?)…

(Two sheets, one for BD, the other for DVD)

    As mentioned above I don’t know if I’ll update this much, but I did do some fixes while cleaning it up that might make it quicker to do in the future. Maybe once a month or so? If I remember. Committing to maintaining yet another sales list is probably not my best idea but…

    Oh and uh, it should be obvious but I know I’ve overlooked it in other google docs before myself: there are two sheets in that spreadsheet, one for BDs and one for DVDs. Not just one.

    This is much much easier to work with, thanks!

    Minor correction I’d like to point out… LL Vol.1 LE known sales is actually 115,870. It is the 396 RE sales that make the Vol.1 total 116,266. Unless you did that so you wouldn’t have to remember those RE sales in the average/vol calculation?

      Sorry, I just noticed this one too…

      Gundam UC 7 should be 118,997, from the Yearly 2014 Anime BD list.

      Nope, just forgot to remove the REs. Will fix, thanks.

        You put 118,997 for Gundam UC Vol.1, when it should be for Vol.7.
        2014年 BD年間 アニメTOP50 (2013/12/24付~2014/12/15付)
        *,118,997 機動戦士ガンダムUC 7 初回限定版

        Some other stuff I noticed…

        #155 for BDs, is that right? The 12,049 sales seems out of place there when #154 is at 27,197 and #156 is at 27,088. Perhaps that is the release from back on 2008/12/03?

        #2081 for BDs (はぐれ勇者の鬼畜美学 Vol.3), that also seems to be out of place. Perhaps it should be at #2,151?

        #991 for BDs (エウレカセブンAO 1【初回限定版】), probably a typo, but it should be 4,461.

        2012年6月度 BD月間TOP50 (06/18~07/09付)
        31 4,461 エウレカセブンAO 1 【初回限定版】

          Oh, misread the UC comment, fixed.

          Lupin is one of those that appears under the same name twice, which screws up my sorting. I checked DVD for those, but I guess there was one in BD too. It should be down at 319 and the 2008 release at 155.

          Fixed 16/61 typo for Eureka AO, adjusted WCW, Idolmaster, Barakamon accordingly.

          Hagure Yuusha is out of place because it disappeared from the Oricon list between scrapes. It’s annoying but this happens quite a bit (it may suddenly reappear next download), though much less so in the BD list. I don’t want to remove a disc I’ve already captured, so I have to use some kind of ranking number. In cases where a title is not found in this week’s Oricon scrape, I fall back to whatever number the disc ranked at last time. It’s a better alternative than having no ranking number at all, which would sort it at the bottom and lose the context.

          Unfortunately this will create some places here and there where a title is out of sync with those around it. If anything we’ll be seeing more of these over time! I only started filling in the minimum sales column in the last week, so it was all clean until the latest update. I suppose I could try to identify all of these situations each week but it’s not even completely clear how to resolve them, or if it would be worth the considerable time it could take. Especially if it stays missing from the list and needs to be bounced around every week as stuff around it changes.

          I might do a periodic cleanup of these once in a while, but not every week.
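
Roughly, the fallback described above looks like this in Python. It is a sketch only, not the actual spreadsheet formulas: `merge_ranks` is a hypothetical helper, "Disc A" and "Disc B" and their ranks are invented, and #2081 is the spot Hagure Yuusha Vol.3 was reported at earlier in this thread.

```python
from __future__ import annotations


def merge_ranks(
    previous: dict[str, int], latest: dict[str, int]
) -> tuple[dict[str, int], list[str]]:
    """Use this week's rank when a title was scraped, else keep last week's."""
    merged: dict[str, int] = {}
    stale: list[str] = []  # titles missing from the latest scrape
    for title in sorted(previous.keys() | latest.keys()):
        if title in latest:
            merged[title] = latest[title]
        else:
            merged[title] = previous[title]  # fall back to the old rank
            stale.append(title)
    return merged, stale


previous = {"はぐれ勇者の鬼畜美学 Vol.3": 2081, "Disc A": 2080, "Disc B": 2082}
latest = {"Disc A": 2079, "Disc B": 2150}  # Hagure Yuusha missing this week
merged, stale = merge_ranks(previous, latest)
```

Keeping a `stale` list at least makes it easy to see which discs are sitting on an old rank and may be out of sync with their neighbors.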

      Oh, heh, I guess I should probably try harder to keep it up to date now. Still, this is good, it brings everyone in alignment. Added a third tab to the spreadsheet with links back to these explanatory posts for some context.

    Even after all that scraping, Diabolik Lovers remains a 0? And yet it is getting a second season. I guess the game must sell well or something as I don’t think the music and drama CDs sell particularly well.

    Here’s an example of how obnoxious the extended rankings can be to work with week to week:
    Redownload, merge in, sort, suddenly discs appear in places they shouldn’t be!

    銀河鉄道999(12.11) goes down a hundred spots and was above a 252 before but is now below a 223.
    ノブナガ・ザ・フール 8 【DVD】was below the lowest known 211, now it’s a bunch of spots above it.
    キャプテン・アース VOL.5 初回生産限定版 was above a 244 now it’s barely in the 211 clump.

    There’s a lot of these every week. The changes are, by and large, very very small. The ones I cite as examples above really are totally negligible, unimportant differences. But they mean the Minimum column is never going to be as clean as I’d like, and it’ll just get worse over time. I can update it periodically to give every disc the min of the next known disc below it but can’t put every one of those changes in my data spreadsheets because I’d be spending hours and hours every week making tiny tweaks to sales numbers that’ll probably just change again next week.

    Stupid extended ranking. I love you, but I hate you.

      Any theories on why this happens?

        I think their website just sucks? Like, I don’t know how they’re building the list but considering there are hundreds of duplicates in the DVD list each week there must be some issues in how it’s sorting discs. It’s not enough to make any significant difference in what I’m doing but man it’s enough to bug the shit out of me when I like everything to be all nice and orderly! It seems to be much more prevalent in the DVD list than the BDs. In general, the BD list has been much more stable both to scrape and in terms of weird fluctuations in ranking.

        I’m going to try reducing the scrape speed to 1 page at a time next time. Maybe it gets confused when two requests come in at the same time, even though it totally should not matter… page 1036 or whatever should have the same data no matter what. Maybe all the duplicates throw off the counts and jumble up the ordering just enough to goof things up?

        Well, anyway, I start the scrape on Wednesday and don’t look at it until Thursday nights anyway, so I don’t care if it takes longer only scraping with one connection at a time. I don’t know if it’ll help but it’s worth a shot.

    Don’t want to sound too greedy, but could you provide the latest update of that Google Docs spreadsheet for the extended rankings? I understand how much effort it takes to merge the new data and fix some of the problems that come up, but now it’s almost 2 months out of date.

      “I hate Oricon” rant time!

      I’ve considered doing so a few times but every time I start, I get incredibly frustrated at Oricon and how messy and unreliable (in small but annoying ways) their list is – and it keeps getting worse.

      The whole bottom, I dunno, half? of the DVD list (and a much smaller portion of the BD list) is a stupid jumble of discs that are out of numerical order in so, so many places due to the way discs just randomly shift around every single week.

      The simplest way of fixing this (assign every disc the sales of the disc below it, unless it is itself a “reference disc” with known sales) doesn’t work because there are plenty of places where the “known” sales of a disc clearly don’t match with the reference discs above and below. For example, right now:

      252 – #12838 – 桜Trick 2
      240 – #12849 – がんばれ!ルルロロ「ルルロロのおばけたいじ」
      244 – #12872 – フューチャーカード バディファイト【2】

      Ganbare! Lulu Lolo obviously has to be wrong, but that’s the reported sales we have from that week’s Oricon data. So I have to manually make sure the discs between 12839 and 12848 are 244 rather than 240. They’d be the latter if I tried to clean it up with a simple formula. They all ranked just one week too.
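
To make that concrete, here is a small Python sketch of the "simple formula" and why it goes wrong on this exact example. Both function names are hypothetical, and the list layout (rows ordered by rank, `None` for discs with no known sales, two and one unknown discs standing in for the real gaps) is an assumption for illustration.

```python
from __future__ import annotations

from typing import Optional


def fill_minimum(known: list[Optional[int]]) -> list[Optional[int]]:
    """Give each row its own known sales, else those of the next known row below."""
    result: list[Optional[int]] = [None] * len(known)
    floor: Optional[int] = None
    for i in range(len(known) - 1, -1, -1):  # walk up from the bottom of the list
        if known[i] is not None:
            floor = known[i]
        result[i] = floor
    return result


def find_inversions(known: list[Optional[int]]) -> list[int]:
    """Indices of known rows that sold less than a known row ranked below them."""
    flagged: list[int] = []
    best_below: Optional[int] = None
    for i in range(len(known) - 1, -1, -1):
        value = known[i]
        if value is None:
            continue
        if best_below is not None and value < best_below:
            flagged.append(i)
        else:
            best_below = value
    return sorted(flagged)


# 252 (桜Trick 2), two unknown discs, 240 (Lulu Lolo), one unknown disc, 244 (Buddyfight)
column = [252, None, None, 240, None, 244]
naive = fill_minimum(column)
suspect = find_inversions(column)
```

The naive fill hands 240 to the unknown discs above Lulu Lolo, when they should be 244. Flagging the inversion first (index 3 here) at least points at the rows that need a manual look before any formula is trusted.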

      There are also places where, for example, we had:

      500 [known]
      490 [known]
      480 [known]

      But then, say, the 490 disc got a wk2, and was maybe like 600, no longer in this range. Then we get:
      500 [known]
      480 [known]

      If I didn’t know that there was previously a 490 in the middle, I’d be setting those 490s as 480s, which is wrong. So I can’t just calculate the simple way because of those either. [Bonus fun: differentiating between this phenomenon and just plain old Oricon shuffling randomness!]

      And let’s not forget the increasing scourge of Oricon glitches randomly doubling some discs’ apparent sales, which we know is totally wrong. I’ve caught a few dozen but there could be hundreds…?

      The first time I did cleanup two months ago, I *did* use a formula, which only made things worse because I lost a couple weeks of week-by-week narrowing down due to not accounting for the issues above. A lot of that has been fixed over time (I tend to check some discs above and below each newly added or modified one to at least clean up its neighborhood) but short of a row by row cleanup it’ll never be complete.

      And never mind actually making these changes on the separate spreadsheet that tracks the sales! That would take days and days, transferring over every little shift up or down a few (or few dozen) discs, and repeat every week… I usually only transfer over anything that’s more than about 10-20 discs, and only for the discs I’m cleaning up around the newly edited/added discs for the week.

      All this to say… the data just makes me a bit irritated to look at lately and the prospect of taking ~6 hours (or more?) out of an already hyper-short weekend to fix it all up just so the next week’s scrape can fuck it all up again has been kinda depressing! And at the same time I’ve been reluctant to update the google doc with messy information, particularly since the JP sales community is aware of it.

      I know all these differences account for very little in terms of the effect on averages but it still just looks… I hesitate to say “unprofessional” because haha as if any of this is at all professional but that’s kinda the feeling I have nonetheless.

      I’ll see what I can do this weekend. I’m looking forward to spending as much time as I can starting Diablo III season three and marathoning spread Wixoss (marathoned infected last weekend, was really good) but maybe I’ll give it a go when I’m updating the extended rankings this week.

      I really need some time off from my real job to catch up on, well, everything. I hate that the weekly ranking summaries have been slipping from weekly to every 2-3 weeks but that’s because the time usually spent on that is now spent 2- or 3-fold on the extended ranking updates instead.

      Aaaaand rant over, finally.

        A bit confused by your example. It looks right to me?

        252 – #12805 – 桜Trick 2
        244 – #12837 – フューチャーカード バディファイト【2】
        240 – #12849 – がんばれ!ルルロロ「ルルロロのおばけたいじ」

          Oh, copied from the wrong thing in my comment, order is right but ranking number was wrong, should be:

          252 – #12838 – 桜Trick 2
          240 – #12849 – がんばれ!ルルロロ「ルルロロのおばけたいじ」
          244 – #12872 – フューチャーカード バディファイト【2】

          Which indeed has Lulu Lolo in the wrong place. It’s just wonky like this in quite a lot of spots. I don’t know if they apply a correction sometimes after reporting, or if the original report on the jp threads was wrong, or what, but Lulu Lolo is not where it should be based on the sales (or conversely, the sales are not what they should be based on the ranking).

    Can anyone help me find the standard singles chart sales figures for October 2003?
    I can’t find weeks 1, 2 and 3 anywhere; it’s as if they vanished.
    I found the week 4 sales figures only.

    Can someone please explain the “glitch” where people see double sales?
    Someone told me that these are not always glitches and that Oricon is inflating some figures for releases it favors.

      Take Kanon (2006) Vol. 1 for example. Based on the weekly/monthly/yearly charts, it sold 20,097 after 5 weeks of ranking. But then we look at its position on the Oricon extended rankings: #264, between 40,343 (Haruhi LE v4) and 39,989 (some DreamWorks CG movie called Over the Hedge).

      That implies sales exactly double what we know its actual sales to be. But Oricon clearly says its data is based on 5 weeks of ranking, which we already know was when it sold that 20.1k. This pattern is repeated for a number of other discs. They’re all listed at a rank implying double the sales they were last reported at, which is simply impossible no matter how long their tail was.

      The simplest explanation is Oricon has a duplicate record in their database and is double-counting some of the entries. There are at least 54 discs affected by this.
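
The check itself is simple enough to sketch in Python. This is an illustration under assumptions, not the process actually used to find the affected discs: `looks_doubled` and its parameters are hypothetical names, and "above" and "below" mean the nearest discs with known sales on either side in the extended ranking.

```python
def looks_doubled(known_sales: int, above: int, below: int) -> bool:
    """True if the rank slot fits twice the known sales but not the known sales."""
    fits_doubled = below <= 2 * known_sales <= above
    fits_known = below <= known_sales <= above
    return fits_doubled and not fits_known


# Kanon (2006) Vol. 1: known 20,097, sitting between 40,343 and 39,989
kanon_flag = looks_doubled(20097, above=40343, below=39989)
```

Doubling 20,097 gives 40,194, which lands inside the 39,989 to 40,343 window. A match like this is only consistent with double-counting and does not prove it, so each hit still needs a look at how many weeks the disc actually ranked.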

        This also happens in the singles chart, I believe. It’s a shame that Oricon is meant to be a trusted institution but is not playing fairly.

          There’s no reason to believe Oricon is doing this intentionally. This is clearly just a simple data glitch, and not one that affects their official data reports in any way. I don’t imagine they’d randomly select 50 discs out of almost 20,000 and double their sales in a list that nobody was aware existed until a year ago and maybe a handful of people in the world actually bother to deduce sales rankings from!

    Thank you so much for your help. You have a nice skill for data collection.
    Do you have a copy of this information that I could get?

