A service of J.C.'s blog at tacticalsecret.com. Contact me on Twitter @JamesPugJones.
Relevant blog post: Let's Encrypt's Growth to 10 Million Active Unique FQDNs
This box is maintaining a state of Certificate Transparency using github.com/jcjones/ct-mapreduce. The CT "fetcher" that writes whole certificates to storage is written in Go, but certificate processing is now done in Python to ease the transition to Amazon EMR / Spark jobs.
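The processing side can be sketched as a simple "map" step: bucket the FQDNs seen on each issuance day. This is only an illustration of the shape of that step — the record layout below is hypothetical and does not match ct-mapreduce's real storage format.

```python
from collections import defaultdict
from datetime import date

# Hypothetical parsed-certificate records: (issuance day, expiry day, SAN names).
# The real ct-mapreduce on-disk format differs; this only shows the map step.
certs = [
    (date(2017, 7, 1), date(2017, 9, 29), {"example.com", "www.example.com"}),
    (date(2017, 7, 1), date(2017, 9, 29), {"example.org"}),
    (date(2017, 7, 2), date(2017, 9, 30), {"example.com"}),
]

def map_names_by_day(certs):
    """Map step: collect the set of FQDNs appearing on each issuance day."""
    by_day = defaultdict(set)
    for issued, _expiry, names in certs:
        by_day[issued] |= names
    return by_day

by_day = map_names_by_day(certs)
```

A reduce step over these per-day sets can then produce unique-FQDN counts for any date range.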
This is the testbed and the source of the Let's Encrypt Statistics page on letsencrypt.org.
There are datasets here, too, which you're welcome to look at. You shouldn't pull data directly from here, though, because long-term plans include moving this to a more powerful machine.
There is a methodology cutover point annotated on the graphs; see the Methodology Cutover section.
There was a data methodology cutover from the old ct-sql code to the new ct-mapreduce code on 3 July 2017, visible as an obvious drop in the various counts. After debugging, I believe the prior ct-sql code/queries had been overcounting, and the new code is producing more accurate values.
Active certificate counts appear to have been inflated by ~14%, while FQDN and Registered Domain counts were inflated by ~7% each.
Some of the domain overcounting appears to have been due to domains on SAN certificates not always being purged when those certificates expired without being renewed. This happened when a SAN certificate was re-issued with a somewhat different set of domains: the dropped domains, though present only on expired certificates, were still counted.
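That purge bug can be illustrated with a toy example (names and dates hypothetical): when a SAN certificate is replaced by one with a different name set, the dropped names should fall out of the active count once the old certificate expires, but the old code effectively kept them.

```python
from datetime import date

# Hypothetical SAN certificates as (expiry day, SAN names). cert_v1 expired
# and was re-issued as cert_v2, which dropped "old.example.com" from the SANs.
cert_v1 = (date(2017, 6, 1), {"example.com", "old.example.com"})
cert_v2 = (date(2017, 9, 1), {"example.com", "new.example.com"})
today = date(2017, 7, 3)

def active_fqdns(certs, today):
    """Correct count: a name is active only if some unexpired cert carries it."""
    names = set()
    for expiry, san in certs:
        if expiry >= today:
            names |= san
    return names

correct = active_fqdns([cert_v1, cert_v2], today)
# The buggy behavior amounted to unioning every name ever seen:
buggy = cert_v1[1] | cert_v2[1]
```

Here "old.example.com" appears in the buggy union but not in the correct active set.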
The active certificate overcounting is in part due to the timing of new certificates being added during nightly maintenance, which were essentially double-counted. Jacob Hoffman-Andrews pointed out that, at Let's Encrypt's average issuance rate, every hour of maintenance would inflate the active certificate count by ~5%. Maintenance with the SQL code took between 1 and 4 hours to complete each night.
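The mechanism is simple back-of-the-envelope arithmetic: certificates added during the maintenance window get counted twice, so the relative overcount is roughly the issuance during that window divided by the active count. The numbers below are hypothetical placeholders, not Let's Encrypt's real rates.

```python
def inflation(active_certs, issued_per_hour, maintenance_hours):
    """Certificates added during maintenance are double-counted, so the
    overcount is roughly the issuance volume during the window."""
    return issued_per_hour * maintenance_hours / active_certs

# Hypothetical example: 1,000,000 active certs, 50,000 new certs/hour,
# 2-hour maintenance window -> 0.10, i.e. ~10% inflation.
rate = inflation(1_000_000, 50_000, 2)
```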
There are likely other more subtle counting errors, too.
The new Map/Reduce approach produces discrete lists of domains for each issuance day, which are more easily inspected for debugging, so I feel more confident in it. These domain lists are also available as datasets, below.
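Because each issuance day is a discrete list, debugging a count change is just set arithmetic over two days' lists. A minimal sketch (domains inline here; the real datasets are plain text files, one domain per line):

```python
# Hypothetical per-day domain lists; the real ones come from the datasets below.
day1 = {"example.com", "example.org", "old.example.net"}
day2 = {"example.com", "example.org", "new.example.net"}

added = day2 - day1      # names first seen on day 2
removed = day1 - day2    # names that stopped appearing on day 2
```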
These data sets are generated daily, between 00:30 and 02:00 UTC:
These data sets are generated at least weekly, with nothing in them to tell you when. I should fix that. These lists are big, and should probably be moved to S3 so they serve faster, but I imagine very few people care about them. If you do, send me a note on Twitter.