CodeQL .qll Contribution


Why .qll Contributions Are Necessary

It appears that the Python CodeQL queries recognize Sinks only for frameworks that are modeled in QLL files under the python/ql/lib/semmle/python/frameworks/ directory. For example, MarkupSafe is detected as a Sink because it has a QLL file in that location, whereas Markup is not detected because no corresponding QLL definition exists. I would therefore like to create a QLL file for Markup, write unit tests for it, and submit a pull request. Would it be possible to have the PR reviewed and, if it is acceptable, merged once submitted? I would greatly appreciate your guidance on this.

Best regards,

Kim Soo Hyun

Comparison of Sinks: When .qll Files Are Defined vs. Not Defined

When a .qll file is defined (example: MarkupSafe):
- Frameworks.qll
  
  https://github.com/github/codeql/blob/main/python/ql/lib/semmle/python/Frameworks.qll → python/ql/lib/semmle/python/Frameworks.qll
- MarkupSafe.qll
  
  https://github.com/github/codeql/blob/main/python/ql/lib/semmle/python/frameworks/MarkupSafe.qll → python/ql/lib/semmle/python/frameworks/MarkupSafe.qll
- Taint Tracking
  
  As shown above, when MarkupSafe.qll is defined and registered in Frameworks.qll, it can be correctly detected as a Sink.
When a .qll file is not defined (example: Markup):

Because Markup has no .qll definition, it is not detected as a Sink.

Contributing to .qll

python/ql/lib/semmle/python/Frameworks.qll

private import semmle.python.frameworks.Markup

python/ql/lib/semmle/python/frameworks/Markup.qll

/**
 * Provides classes modeling security-relevant aspects of the hypothetical `markup` PyPI package
 * (imported as `markup`)
 *
 * This models parsing functions that may process untrusted input.
 */

private import python
private import semmle.python.dataflow.new.DataFlow
private import semmle.python.Concepts
private import semmle.python.ApiGraphs

private module Markup {
  /**
   * A call to one of the functions modeled from `markup`: the parsing functions
   * (`parse`, `parse_document`, `unsafe_parse`, `unsafe_parse_document`, `safe_parse`,
   * `safe_parse_document`) as well as `div` and `page`.
   *
   * These functions may be unsafe if they process untrusted markup content.
   */
  private class MarkupParseCall extends Decoding::Range, DataFlow::CallCfgNode {
    override CallNode node;
    string func_name;

    MarkupParseCall() {
      func_name in [
          "parse", "parse_document", "unsafe_parse", "unsafe_parse_document",
          "safe_parse", "safe_parse_document", "div", "page"
        ] and
      this = API::moduleImport("markup").getMember(func_name).getACall()
    }

    /**
     * Holds if this call may unsafely execute its input.
     *
     * `unsafe_parse` and `unsafe_parse_document` are always considered unsafe;
     * `parse` and `parse_document` are considered unsafe unless a safe mode is passed.
     */
    override predicate mayExecuteInput() {
      func_name in ["unsafe_parse", "unsafe_parse_document"]
      or
      func_name in ["parse", "parse_document"] and
      // If no safe mode argument is set, assume unsafe
      not exists(DataFlow::Node mode_arg |
        mode_arg in [this.getArg(1), this.getArgByName("mode")]
      |
        mode_arg =
          API::moduleImport("markup")
              .getMember(["SafeMode", "StrictMode"])
              .getAValueReachableFromSource()
      )
    }

    override DataFlow::Node getAnInput() { result in [this.getArg(0), this.getArgByName("input")] }

    override DataFlow::Node getOutput() { result = this }

    override string getFormat() { result = "Markup" }
  }
}
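
To make the rule in `mayExecuteInput()` easier to check by hand, the same decision can be restated as a small Python truth-table function. This is only an illustrative sketch: `may_execute_input` and the constant names are hypothetical, the mode is passed as a plain string instead of being resolved through the API graph, and nothing here is part of CodeQL or of the `markup` package.

```python
# Hypothetical restatement of the mayExecuteInput() rule from Markup.qll.
# The function names and mode names are copied from the model above;
# everything else is an assumption made for illustration.
from typing import Optional

ALWAYS_UNSAFE = {"unsafe_parse", "unsafe_parse_document"}
UNSAFE_WITHOUT_SAFE_MODE = {"parse", "parse_document"}
SAFE_MODES = {"SafeMode", "StrictMode"}


def may_execute_input(func_name: str, mode: Optional[str] = None) -> bool:
    """Return True when the modeled call would be treated as executing its input."""
    if func_name in ALWAYS_UNSAFE:
        # Flagged regardless of any mode argument.
        return True
    if func_name in UNSAFE_WITHOUT_SAFE_MODE:
        # Flagged unless one of the safe modes is passed.
        return mode not in SAFE_MODES
    # safe_parse, safe_parse_document, div, page: matched by the class
    # but never flagged by this predicate.
    return False


if __name__ == "__main__":
    for name, mode in [("parse", None), ("parse", "SafeMode"), ("unsafe_parse", "StrictMode"), ("div", None)]:
        print(name, mode, may_execute_input(name, mode))
```

Under these assumptions, `parse` without a mode is flagged, `parse` with `SafeMode` is not, and `div` or `page` are never flagged even though the class matches them.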

CodeQL Taint Tracking Query

/**
 * @kind path-problem
 * @id python/taint-tracking
 */

import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking
import semmle.python.dataflow.new.RemoteFlowSources

module Config implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  // Deliberately matches every node, so every node reached by tainted data is reported.
  predicate isSink(DataFlow::Node sink) { any() }
}

module Flow = TaintTracking::Global<Config>;

import Flow::PathGraph

from Flow::PathNode source, Flow::PathNode sink
where Flow::flowPath(source, sink)
select sink.getNode(), source, sink,
  "sink function: " + sink.getNode().asExpr().getScope().(Function).getName()
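
To try the query, a small Python target is needed. The file below is a hypothetical example that uses only the standard library; `extract_fragment`, `render`, and `Handler` are names invented for illustration, and `render` is a stand-in for a markup call, not the real `markup` API. It assumes that CodeQL's standard library model treats `self.path` of a `BaseHTTPRequestHandler` subclass as a `RemoteFlowSource`, which may depend on the CodeQL version.

```python
"""Hypothetical analysis target for the taint-tracking query; not part of the CodeQL repository."""
from http.server import BaseHTTPRequestHandler
from urllib.parse import parse_qs, urlparse


def extract_fragment(path: str) -> str:
    """Return the first value of the `fragment` query parameter, or "" if it is absent."""
    query = parse_qs(urlparse(path).query)
    return query.get("fragment", [""])[0]


def render(fragment: str) -> str:
    """Stand-in for a markup-rendering call; it embeds the untrusted text unchanged."""
    return "<div>" + fragment + "</div>"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # self.path is attacker-controlled (assumed to be a remote flow source).
        body = render(extract_fragment(self.path)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.end_headers()
        self.wfile.write(body)
```

Because `isSink` is `any()`, the query should report tainted nodes throughout a file like this, and the message column names the function that encloses each one. The exact set of results depends on which flow steps the installed CodeQL libraries model.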