Details

    Description

      For various compliance purposes we need to generate a Software Bill of Materials (SBOM) for a server build. It is a JSON document that follows a specific schema. Its main purpose is to list the dependencies of the built binaries. "Dependencies" here is meant in the vulnerability-management sense: if X contains a security vulnerability, will Y have it too? In that sense, CONNECT depends on minizip.

      CMake knows which libraries it links targets with, and bundled sources (like gzip or minizip) have their version embedded, which CMake can read with FILE(STRINGS ...), for example.

      It seems that it should be possible to dump this information into a json template, e.g. with CONFIGURE_FILE().
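
A minimal sketch of that idea, written in Python rather than CMake so that it runs standalone. It mimics the two steps named above: pulling a version string out of a bundled header (what FILE(STRINGS ...) with a regex would do) and filling `@NAME@` placeholders in a JSON template (what CONFIGURE_FILE() would do). The header contents, the macro `EXAMPLE_LIB_VERSION`, and the placeholder names are hypothetical, not taken from the server sources.

```python
import json
import re


def read_version(header_text: str, macro: str) -> str:
    """Return the quoted value of `#define <macro> "x.y.z"` found in header_text."""
    match = re.search(
        r'^\s*#\s*define\s+' + re.escape(macro) + r'\s+"([^"]+)"',
        header_text,
        flags=re.MULTILINE,
    )
    if match is None:
        raise ValueError(f"{macro} not found in header")
    return match.group(1)


def fill_template(template: str, variables: dict) -> str:
    """Replace every @NAME@ placeholder; raise KeyError if a name has no value."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value for @{name}@")
        return variables[name]

    return re.sub(r"@([A-Za-z_][A-Za-z0-9_]*)@", substitute, template)


# Hypothetical inputs, for illustration only.
HEADER = '#define EXAMPLE_LIB_VERNUM 0x1230\n#define EXAMPLE_LIB_VERSION "1.2.3"\n'
TEMPLATE = '{"type": "library", "name": "@COMPONENT_NAME@", "version": "@COMPONENT_VERSION@"}'


def build_component() -> dict:
    version = read_version(HEADER, "EXAMPLE_LIB_VERSION")
    text = fill_template(
        TEMPLATE,
        {"COMPONENT_NAME": "example-lib", "COMPONENT_VERSION": version},
    )
    return json.loads(text)  # parsing also checks that the result is valid JSON
```

One deliberate difference in this sketch: an undefined placeholder raises an error instead of silently expanding to an empty string, so a missing version cannot slip into the generated file unnoticed.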

          Activity

            elenst Elena Stepanova added a comment (edited)

            Since security engineering has already approved the generated SBOM file, I rely on their expertise and did not check the files' compliance with requirements and regulations. I'll only add some observations on the contents and the further process.

            My main concern is maintainability. The solution implies that somebody will keep the list of dependencies up to date, but I doubt that is realistic. Developers won't always know or remember about SBOM considerations when adding new features, which may bring in new components that should be listed in the SBOM, and as far as I can see there is no way for us to detect that this has happened. Besides, there are different builds and different environments.

            Even now, if we look for example at the CS bintar, we link it statically with at least libzstd and libncurses. I'm not sure about libpmem, but libzstd.a and libncurses.a are definitely used, yet neither of them shows up in the SBOM, while according to the spec they should be there.

            There is a set of cmake options for adding extra components to the SBOM, but using them doesn't seem realistic either, because you need to explicitly provide not only the name but also the version etc., and that cannot be done for automated builds.

            Some other notes

            • properties.name=package_name, properties.value=<...>
              I'm not sure what is meant by "package" here, but for example looking at the RPM build,

                      "name": "package_name",
                      "value": "mariadb-11.8.0-linux-x86_64"
              

              apparently it will be the same for all RPMs of the same architecture, thus indistinguishable, and the file also doesn't have any indication of the OS flavor. I don't know if it's a problem, but at least it is likely to be an inconvenience. Note that even the example here, while it has OS and architecture, doesn't specify OS version, so we would still have it identical for all RHELs, etc.
              For DEB packages, we have e.g.

                      "name": "package_name",
                      "value": "mariadb-deb11-11.8.0-linux-aarch64"
              

              At least it specifies the system uniquely, although it doesn't look like any package name we create.

            • There is a problem with the CPE for components which (happen to be) of an intermediate version. For example, here is what we have for the CS main branch at the time of writing:

              {
                    "bom-ref": "mariadb-connector-c-52d0a38e",
                    "type": "library",
                    "name": "mariadb-connector-c",
                    "version": "52d0a38e",
                    "purl": "pkg:github/mariadb/mariadb-connector-c@52d0a38e",
                    "cpe": "cpe:2.3:a:mariadb:connector\\/c:52038:*:*:*:*:*:*:*",
                    "supplier": {
                        "name": "MariaDB"
                     }
                  },
              

              The "version" is set to a git commit hash, which I suppose means the component doesn't have any release tag. But what the CPE contains is just wrong: apparently it tries to use the "version" but drops the letters. I don't know what is supposed to be done in this case; maybe the CPE should be omitted altogether.

            serg Sergei Golubchik added a comment

            • We add new bundled third-party code very rarely. The last time was ~3 years ago, and before that it was probably >10 years ago. Keeping it in sync with the SBOM won't be too difficult.
            • Statically linked libraries: yes, but this is mainly for packages. We'll extend it to bintars if needed.
            • "ES" comments aren't applicable in this project.
            • The strange version in the CPE looks like a bug. I don't have a good suggestion for what to do here; perhaps omit the CPE altogether when a version cannot be determined?
            wlad Vladislav Vaintroub added a comment (edited)

            elenst, thanks for looking. I'll try to answer what I know.

            1. Static libraries in general, or how we derive "external code"

            The SBOM generator uses CMake, and by itself it cannot introspect the whole build: take every linked library with the .a extension, find the version of each arbitrary static library, find the GitHub URL of its sources, and look up the CPE identifier in the official CPE database.
            That is the information we need: version, vendor, download or GitHub URL, and CPE identifier. There is no generic solution that takes an arbitrary .a file and derives all of that. Some things we can derive automatically (git submodules); some are currently semi-automatic, like ExternalProject_Add, where at least the download location can be mapped; and there is internal knowledge about the parts of zlib copied into the CONNECT engine. There is also the extensibility you mention, "injecting" dependency information, which can be done on CI (this is what you say "can't" be done).
            If you name the libraries you use, e.g. "ncurses", I can try to find a version in a header file and look up where the sources exist on GitHub, or whether the library has a CPE ID. There is still a chance that some random guy compiled this static library several years ago and put it on the build server, so that the version taken from the system header file does not match the actual source. Also, what do we use "zstd" for, and why do we link it statically?

            2. component.bom-ref, component.name, vendor and product name in component.cpe

            First, bom-ref can be a completely arbitrary value; it can even be a UUID. It is only used for referencing this element elsewhere in the same document.
            The name, vendor, and product name are all taken from how the project is known in CMake. Only very few things are hardcoded.

            "component": {
                  "bom-ref": "@CPACK_PACKAGE_NAME@",
                  "type": "application",
                  "name": "@CPACK_PACKAGE_NAME@",
                  "version": "@CPACK_PACKAGE_VERSION@",
                  "supplier": {
                    "name": "@CPACK_PACKAGE_VENDOR@",
                    "url": [
                      "@CPACK_PACKAGE_URL@"
                    ]
                  },
                  "purl": "pkg:github/@GITHUB_REPO_USER@/@GITHUB_REPO_NAME@@@GIT_REV_SHORT@",
                  "cpe": "cpe:2.3:a:mariadb:mariadb:@CPACK_PACKAGE_VERSION@:*:*:*:*:*:*"
                },
            

            All these variables are standard CMake variables related to packaging.
            I would say that if the name were important, it would be set accordingly elsewhere in CMake.

            The CPE vendor and product name are not something we can decide on. An authority, I think NIST, publishes a dictionary, https://nvd.nist.gov/feeds/xml/cpe/dictionary/official-cpe-dictionary_v2.3.xml.zip, and I had to look up there how mariadb is referred to.
            There are "cpe:2.3:a:mariadb:mariadb" entries, and there is no mention of "mariadb-enterprise".

            3. properties.name=package_name, properties.value=<...>
            Christian said "highly recommended, not mandated, a property that can help to identify the package".
            So it is not mandatory, it *helps*, and there is no strict requirement that it be unique to the package. That is impossible in the general case anyway, because on Windows, for example, the same build creates both a ZIP and an MSI package.
            The source for that one is

             "properties": [
                  {
                    "name": "package_name",
                    "value": "@CPACK_PACKAGE_FILE_NAME@"
                  }
                ],
            

            CPACK_PACKAGE_FILE_NAME is a standard variable that is supposed to hold the resulting base file name of the package. How it is supposed to work for component-based packaging, which creates several packages, I honestly do not know, but something (what?) could be done if we detect -DRPM=1 during the build.

            If you have now started creating and renaming tar.gz files for some kinds of packages, maybe you can also start renaming sbom.json to that same name, I don't know.
            Anyway, it is an "optional" element. It tells you something about the package. It does not have to be there.

            4. Connector/C intermediate version. OK, I can fix that and omit the CPE, but we should not release our packages with an untagged intermediate version, right?

            Connector/C backslashes. In a CPE, a forward slash needs to be escaped. The product name can't be "connector-c", as that is not what it is called in the official dictionary.
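
A minimal Python sketch of the two CPE points above: omit the CPE when the version is not a release version, and backslash-escape the forward slash in the product name. The release-version pattern and the helper names are assumptions made for illustration, and the escaping rule is a simplification (alphanumerics, `_`, `-` and `.` pass through, everything else gets a backslash); the actual generator is written in CMake and may differ.

```python
import json
import re
from typing import Optional

# Assumption: a "release" version is dotted digits such as 3.4.1; a git hash is not.
RELEASE_VERSION = re.compile(r"\d+(\.\d+)+")


def escape_cpe_field(value: str) -> str:
    """Backslash-escape characters such as '/' in one CPE 2.3 field (simplified rule)."""
    return re.sub(r"([^A-Za-z0-9_.\-])", r"\\\1", value)


def make_cpe(vendor: str, product: str, version: str) -> Optional[str]:
    """Return a CPE 2.3 string, or None when the version is not a release version."""
    if not RELEASE_VERSION.fullmatch(version):
        return None
    fields = ["cpe", "2.3", "a",
              escape_cpe_field(vendor),
              escape_cpe_field(product),
              escape_cpe_field(version)]
    fields += ["*"] * 7  # update, edition, language, sw_edition, target_sw, target_hw, other
    return ":".join(fields)


def make_component(name: str, vendor: str, product: str, version: str) -> dict:
    """Build a component entry; the "cpe" key is left out when no CPE can be formed."""
    component = {"type": "library", "name": name, "version": version}
    cpe = make_cpe(vendor, product, version)
    if cpe is not None:
        component["cpe"] = cpe
    return component


# Hypothetical tagged version versus the commit hash from the example in this thread.
tagged = make_component("mariadb-connector-c", "mariadb", "connector/c", "3.4.1")
untagged = make_component("mariadb-connector-c", "mariadb", "connector/c", "52d0a38e")
tagged_json = json.dumps(tagged)
```

Note that the CPE string itself contains a single backslash before the slash; the doubled backslash seen in the generated file comes from JSON string escaping, which `json.dumps` applies here as well.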


            serg Sergei Golubchik added a comment

            1. There are "generic" solutions that try to solve the problem of an "arbitrary build with arbitrary rules, whatever build system, etc, etc". They're slow (2x build time or more), mostly expensive, and very imprecise. But external libraries (.a) are not difficult to handle with our approach: CMake knows all the libraries it links with, so we can loop over them, extract the version of known libraries (like ncurses), and abort the build on unknown libraries. It's not required for rpm/deb packages, but if we want an SBOM for bintars, I suspect this will be the way to go.
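
The loop described in this comment can be sketched as follows, in Python for illustration only (the real logic would live in CMake). The list of link items, the table of known versions, and all names here are hypothetical; how the version of a known library is actually extracted is left out.

```python
import os
from typing import Optional


class UnknownLibraryError(RuntimeError):
    """Raised when a static library has no SBOM metadata, to abort the build."""


def static_library_name(link_item: str) -> Optional[str]:
    """Map '/usr/lib/libzstd.a' to 'zstd'; return None for anything that is not lib*.a."""
    base = os.path.basename(link_item)
    if base.startswith("lib") and base.endswith(".a"):
        return base[len("lib"):-len(".a")]
    return None


def static_components(link_items, known_versions):
    """Return one SBOM component per linked static library, failing on unknown ones."""
    components = []
    for item in link_items:
        name = static_library_name(item)
        if name is None:
            continue  # shared library or linker flag: not handled by this sketch
        if name not in known_versions:
            raise UnknownLibraryError(f"no SBOM metadata for static library {item}")
        components.append(
            {"type": "library", "name": name, "version": known_versions[name]}
        )
    return components


# Hypothetical link line and version table.
LINK_ITEMS = ["/usr/lib/libzstd.a", "/usr/lib/libncurses.a", "-lpthread", "/usr/lib/libssl.so"]
KNOWN_VERSIONS = {"zstd": "1.5.5", "ncurses": "6.4"}
components = static_components(LINK_ITEMS, KNOWN_VERSIONS)
```

Failing on an unknown `.a` file is what makes a newly added static dependency visible at build time instead of silently missing from the SBOM.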

            wlad Vladislav Vaintroub added a comment

            Pushed a while ago.

            People

              wlad Vladislav Vaintroub
              serg Sergei Golubchik